Large Language Models (LLMs) vs. Small Language Models (SLMs)

Matt Sartain, Solutions Architect, Rackspace Technology

Introduction

In natural language processing (NLP), the development of language models has revolutionized how machines understand and generate human language. With the arrival of models in the GPT series, such as GPT-3, the capabilities of these models have reached unprecedented levels. However, there is growing interest in how the effectiveness and efficiency of smaller language models (SLMs) compare with those of their larger counterparts (LLMs). In this article, we will dive into the distinctions between LLMs and SLMs, their respective advantages, and the scenarios where each type is the better fit.

Understanding Large Language Models

Large Language Models (LLMs) are characterized by their enormous size, typically involving billions of parameters. In this context, parameters are the variables that define the structure and behavior of the model. These models undergo extensive training on large datasets sourced from many origins, enabling them to learn complex patterns and relationships within language. LLMs can craft text that aligns with the given context, drawing on information from the entire input sequence to deliver coherent, contextually fitting responses. They also exhibit impressive generative ability, proving invaluable for tasks like language translation, text generation and summarization, and question answering. Some models even display multimodal abilities, enabling them to process diverse data types such as images or audio alongside textual inputs.

The primary advantage of LLMs lies in their capacity to capture complex patterns and relationships within the data, resulting in impressive performance across a wide range of NLP tasks. Due to their extensive training, LLMs often exhibit higher accuracy and fluency in generating human-like text, making them a suitable choice for applications where precision and naturalness are key.

However, along with their potential benefits, large language models raise ethical concerns regarding misinformation, bias, and privacy. Given their ability to generate highly convincing fake content, there is a need for responsible AI practices to mitigate these risks. Despite these challenges, large language models offer immense potential for a multitude of applications across different domains, with ongoing research focusing on addressing ethical concerns while harnessing the power of these transformative technologies.

Unveiling Small Language Models (SLMs)

Small Language Models (SLMs) are characterized by their reduced scale and simplified architecture compared to larger models. Because they are trained on less data and have fewer parameters, they may struggle to capture intricate language patterns as effectively.

These models are crafted to be more lightweight and resource-efficient, making them suitable for deployment in environments with limited computational resources.

While SLMs may not match the scale and performance of LLMs, they offer distinct advantages in terms of speed, memory footprint and energy consumption. This makes them particularly appealing for real-time applications where low latency and efficient resource utilization are crucial factors.

As demonstrated in the table below, one example of an SLM keeping pace with LLMs is Microsoft’s Phi-2, an open-source model intended for research purposes. Phi-2 significantly outperforms the four compared models on coding tasks, and it outperforms all of them except Llama-2-70B on mathematics.


Source: Microsoft Research, “Phi-2: The surprising power of small language models” (12/12/23)

While these models excel in task-specific adaptability and cost-effective training, they encounter scalability issues when confronted with larger datasets or more intricate tasks. As a result, determining the suitability of small language models hinges on evaluating the specific needs of the application and striking a balance between resource efficiency and performance.

Use cases and applications

• LLMs: LLMs excel in applications requiring high precision and naturalness in language generation, such as chatbots, content generation and language translation services. They can produce human-like text for purposes including storytelling and creative writing, and they are valuable for research and for exploring complex language tasks.

• SLMs: SLMs are well suited for tasks that require specialized knowledge or expertise in a specific field. They are suitable where resource efficiency and low latency are critical, such as real-time language processing in mobile applications, voice assistants and IoT devices. They are also suitable for educational purposes and smaller-scale NLP projects.

Contrasting characteristics

While there is ongoing discussion as to what constitutes an LLM vs. an SLM in terms of parameters, for this article, I am considering a threshold of 7 billion parameters. What that means is that any model with 7 billion parameters or fewer will be considered an SLM.

• Model size: LLMs are characterized by their large size, typically containing hundreds of billions of parameters, whereas SLMs have a smaller parameter count to optimize resource consumption. The largest LLMs, such as the model behind ChatGPT, are reported to contain as many as 1.7 trillion parameters, while an open SLM such as Phi-2 contains 2.7 billion parameters.

• Performance: LLMs tend to outperform SLMs in terms of accuracy and fluency, thanks to their extensive training on vast datasets. However, SLMs can still achieve satisfactory performance for many NLP tasks while offering faster inference times.

• Resource requirements: LLMs demand substantial computational resources, including high-end GPUs or TPUs for training and inference. In contrast, SLMs can run efficiently on devices with limited resources, making them more accessible for deployment in various settings.

• Deployment flexibility: SLMs are more flexible for deployment in resource-constrained environments such as mobile devices, IoT devices, or edge computing platforms, whereas LLMs are better suited for cloud-based applications where ample computational resources are available.

• Understanding and field specialization: SLMs are often trained on data from specialized domains, potentially limiting their grasp of broader information across multiple fields. However, they often demonstrate exceptional proficiency within their designated domain. In contrast, LLMs aim to replicate human intelligence on a much larger scale. Leveraging vast training datasets, they are expected to perform capably across many domains compared to domain-specific SLMs. Additionally, LLMs possess greater adaptability and can be refined for enhanced performance in downstream tasks like programming.

• Bias: LLMs can often exhibit biases due to insufficient fine-tuning and training on raw data from the internet. This source of training data may lead to a misrepresentation of certain groups or ideas, as well as erroneous labeling. Inherent biases in language, influenced by factors like dialect, geographic location and grammar rules, add further complexity. On the other hand, SLMs, which are trained on smaller, domain-specific datasets, generally carry a lower risk of this kind of bias than LLMs.

Demonstration

This is a project demonstrating basic usage of Phi-2 as our SLM and Llama 2 (via Ollama) as our LLM. We will issue identical instructions to both in order to compare and contrast the models' comprehension and responses.

Phi-2

Phi-2, a language model with 2.7 billion parameters, was developed by Microsoft Research. This compact language model aims to match or exceed the performance of models many times its size.

Llama 2 using Ollama

Ollama enables local execution of open-source large language models by bundling configuration and data into a single package. This approach streamlines setup and configuration.

This blog will test each model on the following tasks:

  • Text generation
  • Sentiment analysis
  • Language translation
  • Code generation
  • Mathematics

Phi-2

Install required dependencies

!pip install sentencepiece transformers accelerate einops torch

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig, TextStreamer, pipeline

Load the pretrained microsoft/phi-2 model along with its tokenizer. Note that options such as torch_dtype and device_map apply to the model rather than the tokenizer.

MODEL_NAME = "microsoft/phi-2"

# Load the tokenizer (the slow tokenizer is used here for compatibility)
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    use_fast=False)

# Load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True)
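
As an optional sanity check (not part of the original walkthrough), you can confirm the size of the loaded model, which should be roughly 2.7 billion parameters:

# Count the parameters of the loaded Phi-2 model
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")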

Define a generate function that accepts an instruction, builds a Phi-2 style prompt, runs the model and returns the decoded output.

def generate(
        instruction,
        max_new_tokens=1024,
        temperature=0.0001,
        top_p=0.75,
        top_k=40,
        **kwargs,
):
    # Build the Phi-2 instruct-style prompt; the sampling parameters above
    # are accepted for convenience, but generation below uses greedy decoding
    prompt = "Instruct: " + instruction + "\nOutput:"
    # Tokenize and move the inputs to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]

    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_new_tokens=max_new_tokens,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True
        )
    # Decode the full sequence and return only the text after "Output:"
    output = tokenizer.decode(generation_output[0])
    return output.split("\nOutput:")[1].lstrip("\n")

 

Text generation

For the first example, we will test Phi-2's text generation and analyze its language comprehension.

%%time
instruction = "What are the pros/cons of LLMs vs SLMs?"
print(generate(instruction))

Out-of-the-box sentiment analysis

%%time
statement = """
The food at the restaurant was mediocre at best. I wouldn't recommend it to anyone.
"""

sentimentPrompt = f"""
Do sentiment analysis on the following statement:
{statement}
and rewrite it in the words of Yoda.
"""

print(generate(sentimentPrompt))

Language Translation

statement = "I am looking for the bus stop, do you know where it is?"

translationPrompt = f"""
Translate the following statement to French:
{statement}
"""
print(generate(translationPrompt))

Code generation

%%time
codeGenPrompt = "Design a class in python that can categorize transactions from a credit card statement."
print(generate(codeGenPrompt))
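
Mathematics

The task list above also included mathematics. As a quick, illustrative check (the word problem below is my own example, not part of any published benchmark), the same generate helper can be reused:

%%time
mathPrompt = "A train travels at 60 miles per hour for 2.5 hours. How far does it travel? Explain your reasoning."
print(generate(mathPrompt))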

Ollama, using the Llama 2 model

  • Download Ollama: Visit the official Ollama website or GitHub repository to download the latest version of the Ollama software.
  • Install: Extract and/or install Ollama and follow the on-screen installation prompts.
  • Install dependencies (if necessary): Depending on your system configuration, Ollama may require certain dependencies, such as Python and other libraries. Refer to the installation instructions provided with Ollama to identify and install anything that is missing.
  • Start using Ollama: Once Ollama is installed and configured, you can use it to run language models locally. Follow the usage instructions provided by Ollama to perform tasks such as pulling models, generating text or evaluating performance.

Quick notes from terminal or command prompt:

Available commands:

  • serve: Start Ollama
  • create: Create a model from a Modelfile
  • show: Show information for a model
  • run: Run a model
  • pull: Pull a model from a registry
  • push: Push a model to a registry
  • list: List models
  • cp: Copy a model
  • rm: Remove a model
  • help: Help about any command

For this tutorial, after installation, I pulled the model:

ollama pull llama2

Then I verified that llama2 was downloaded:

ollama list

Finally, to start Ollama, I ran:

ollama serve

Once Ollama is running and serving the appropriate model, run through the prompts below and compare and contrast the Llama 2 output with what Phi-2 produced. Pay close attention to the consistency and coherence of the results.

!pip install ollama

import ollama

# Confirm the locally available models, which should include llama2
ollama.list()

Text Generation

%%time
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': instruction,
  },
])
print(response['message']['content'])

%%time
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': sentimentPrompt,
  },
])
print(response['message']['content'])

%%time
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': translationPrompt,
  },
])
print(response['message']['content'])

%%time
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': codeGenPrompt,
  },
])
print(response['message']['content'])
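
For completeness, the same illustrative mathematics prompt defined in the Phi-2 section can also be sent to Llama 2:

%%time
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': mathPrompt,
  },
])
print(response['message']['content'])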

Conclusion

In conclusion, both LLMs and SLMs have unique strengths and applications in natural language processing. While LLMs offer unmatched performance and accuracy, SLMs provide efficiency and flexibility, particularly in resource-constrained environments. The choice between them depends on the specific requirements of the use case and the resources at one's disposal, balancing performance against resource constraints to achieve optimal results across NLP tasks. For some businesses, using an LLM as a chat agent for support teams may be advantageous because it can handle large volumes of inquiries.

For function-specific tasks, SLMs typically excel where specialized and proprietary knowledge is of the utmost importance. In such cases, training an SLM in-house and leveraging domain-specific expertise can yield a highly capable model for these specialized sectors.

As the field continues to evolve, advancements in both types of models will further broaden the landscape of possibilities in natural language processing and AI-driven applications.

Learn more about FAIR