Greg Robison

Specialized Intelligence

HOW SMALL LANGUAGE MODELS ARE REVOLUTIONIZING AI APPLICATIONS


Phi-3 models significantly outperform language models of the same and larger sizes on key benchmarks. Phi-3-mini does better than models twice its size, and Phi-3-small and Phi-3-medium outperform much larger models, including GPT-3.5. -- Microsoft on the power of their small, but smart models

Large Language Models (LLMs) like OpenAI’s GPT-4 series are capable of impressive feats, matching (or surpassing) human performance on the SAT, the GRE, AP exams, and even the Bar Exam. Big LLMs like GPT-4 and Anthropic’s Claude contain hundreds of billions of parameters, which enables them to process complicated inputs and create seemingly intelligent and accurate responses. Generally speaking, the larger the model, the more capable it is and the more tasks it can accomplish at a high level. When I’m writing Python code, I will pick the largest model I have available to help me because it’s likely to write code with fewer errors. But not always. Smaller models have their place too. Large models require massive computation, high-end hardware, and significant energy consumption to do their job; smaller models require far less energy, run on far more devices, and, with specialized training, can be very good at a narrower range of tasks. Small models can help democratize AI through efficiency and accessibility.


Energy Savings and Environmental Impact

As we have discussed before, the training and usage of today’s foundational LLMs require huge amounts of energy. Since sustainability and combating climate change are dear to F’inn, we are particularly interested in the reduced energy requirements of smaller models. Smaller models require less processing power and memory to operate effectively, so they consume significantly less energy during training and inference.


They estimated teaching the neural super-network in a Microsoft data center using Nvidia GPUs required roughly 190,000 kWh, which using the average carbon intensity of America would have produced 85,000 kg of CO2 equivalents, the same amount produced by a new car in Europe driving 700,000 km, or 435,000 miles, which is about twice the distance between Earth and the Moon, some 480,000 miles. -- The Register
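That arithmetic roughly checks out. Here’s a quick back-of-the-envelope version in Python, assuming a US average grid intensity of about 0.45 kg CO2e per kWh (my round number, not The Register’s):

```python
# Back-of-the-envelope check of the training-emissions figures quoted above.
energy_kwh = 190_000       # estimated training energy
grid_intensity = 0.45      # kg CO2e per kWh, approximate US grid average

emissions_kg = energy_kwh * grid_intensity
print(f"{emissions_kg:,.0f} kg CO2e")        # ~85,500 kg, close to the quoted 85,000

# The per-km figure implied by the comparison car:
car_kg_per_km = 85_000 / 700_000
print(f"{car_kg_per_km:.3f} kg CO2e per km") # ~0.121, plausible for a new European car
```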

By reducing the energy demands of AI applications, we can decrease the overall carbon footprint associated with AI. And as AI is increasingly adopted (whether everyone wants it or not), smaller models can play an important role in mitigating its environmental impact. If we prioritize energy efficiency, we can improve how AI affects our environment.


Accessibility and Wider Adoption

Today’s smartest models run on server-grade hardware in huge datacenters, while smaller models can run effectively on common hardware with limited processing power, like laptops and smartphones. That makes them accessible to a much broader range of people, including those without high-end computing devices, and gives them the potential to bring the benefits of natural language processing to a huge audience. Google has developed the Gemma series (one of which has 2 billion parameters, compared to GPT-4’s rumored 1.7 trillion), models much smaller than their Gemini cousins that can run on a smartphone. With a billion Android phones sold each year, more and more devices will be running AI, designed to help users with settings, automate phone functions, and more. Apple is also enabling on-device AI with its latest devices, built into iOS 18.
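As a rough illustration of how little code this takes, here’s a sketch that loads Gemma 2B through Hugging Face’s transformers library (the model is gated, so this assumes you’ve accepted Google’s license on the Hub; the prompt and generation settings are just examples):

```python
from transformers import pipeline

# A ~2B-parameter model is small enough for a laptop GPU, or even CPU-only use.
generator = pipeline("text-generation", model="google/gemma-2b")

response = generator(
    "Explain in one sentence why small language models matter:",
    max_new_tokens=60,
)
print(response[0]["generated_text"])
```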


AI can be used to identify and learn about plants on your hike without an internet connection

Being able to run an AI model on a smartphone also means it can run offline, without a central server to process incoming requests and generate responses. We can run AI-powered applications in areas with limited or no internet connectivity, so regions with unreliable connections can still benefit. Edge devices, such as smart-home appliances, can also benefit from running small models – our home assistant runs speech-to-text and text-to-speech models locally (like using Alexa-style voice commands completely privately, without any voice or sensitive information ever leaving your house). These small models can help bridge the digital divide by ensuring the benefits of AI are accessible to everyone. We can better democratize AI, making it more inclusive and empowering people and communities to join the AI revolution.
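For the speech side, here’s a minimal sketch of local speech-to-text using the open-source openai-whisper package and its smallest model (the audio filename is a placeholder); nothing ever leaves the machine:

```python
import whisper

# "tiny" is ~39M parameters -- quick enough for near-real-time use on a CPU.
model = whisper.load_model("tiny")

# Transcription runs entirely on-device; there are no network calls.
result = model.transcribe("kitchen_command.wav")
print(result["text"])   # e.g. "turn on the living room lights"
```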


Specialization and Task-Specific Performance

Despite the benefits, smaller models are typically less capable than their larger siblings. However, one of the key strengths of small language models is their ability to be trained for individual tasks. Unlike large models that are trained on vast amounts of diverse data, small models can be fine-tuned or trained from scratch on a specific task. For example, if you want a home automation system that can respond with natural language, you don’t need a large model that can also program in Python or write Shakespearean sonnets. We can create smaller, specialized datasets relevant to the task, like the various commands someone might use to turn on lights, check the video doorbell, or start the robot vacuum cleaner. This specialized training allows the model to capture the nuances and intricacies of the specific domain, leading to better performance in that area. By concentrating on a single task, small models can achieve performance on par with, or exceeding, larger models while requiring a fraction of the energy.
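Here’s a minimal sketch of that kind of specialization: fine-tuning a small pretrained encoder into a home-automation intent classifier with transformers (the intents and example commands are invented for illustration; a real dataset would have thousands of rows):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy dataset mapping user commands to intents.
data = Dataset.from_dict({
    "text": ["turn on the kitchen lights",
             "show me the front door camera",
             "start the robot vacuum"],
    "label": [0, 1, 2],   # 0 = lights, 1 = camera, 2 = vacuum
})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3)

tokenized = data.map(lambda batch: tokenizer(batch["text"], truncation=True),
                     batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="intent-model", num_train_epochs=3),
    train_dataset=tokenized,
    tokenizer=tokenizer,   # enables automatic padding of each batch
)
trainer.train()
```

The resulting ~67M-parameter classifier only knows lights, cameras, and vacuums, and that’s exactly the point.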


A generalized AI isn’t necessary to run a home system - a small, specialized model should excel

The use cases and need for smaller models continue to grow. For example, a small model trained specifically for sentiment analysis can accurately classify the emotional tone of text as positive, negative, or neutral. Small models can also be trained to create concise summaries of longer documents, program in specific languages, or efficiently extract information from a database. It’s like using a specialized screwdriver to fix your watch instead of a power drill. Specialization can lead to improved efficiency and accuracy across natural language applications, creating highly optimized and effective solutions tailored to specific use cases and ultimately enhancing the performance and user experience of AI applications.
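The sentiment case is about the simplest possible example; the transformers sentiment pipeline defaults to a small distilled BERT model fine-tuned for exactly this task:

```python
from transformers import pipeline

# The default sentiment model is a ~67M-parameter distilled BERT.
classifier = pipeline("sentiment-analysis")

print(classifier("This specialized screwdriver is fantastic!"))
# [{'label': 'POSITIVE', 'score': 0.99...}]
```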


One of my favorite small models is Microsoft’s Phi-3 series, which performs well beyond its size. Microsoft prioritized a curated training dataset that focuses on language, reasoning, coding, and math abilities. Phi-3-mini has only 3.8 billion parameters but performs as well as or better than 7-billion-parameter models trained on ordinary datasets. However, this strong performance on math and reasoning comes at the cost of less general knowledge: these models are smart and can work through some complex tasks, but they don’t know as much trivia as larger models. The Phi models prove that specialized training datasets allow small models to be highly performant.


Reduced Latency and Real-Time Applications

Small language models also process inputs faster, which reduces latency (the time it takes a model to generate a response after receiving an input). With smaller models, the computational requirements drop, allowing faster processing times. In real-time applications where quick responses are important, small models can be the right solution. LLMs are very good at translation, so faster processing gets us closer to real-time translation, which could mean seamless communication across languages. Voice assistants also benefit from quicker responses and are thus good candidates for smaller, specialized models. Interactions can be more natural and conversational – a real-time experience rather than a turn-based one.
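To make the latency point concrete, here’s a sketch that times a small open translation model (Helsinki-NLP’s opus-mt models are on the order of 75M parameters; actual timings depend entirely on your hardware):

```python
import time
from transformers import pipeline

# A small English-to-German translation model.
translator = pipeline("translation_en_to_de",
                      model="Helsinki-NLP/opus-mt-en-de")

start = time.perf_counter()
result = translator("Where is the nearest train station?")
elapsed_ms = (time.perf_counter() - start) * 1000

print(result[0]["translation_text"])
print(f"latency: {elapsed_ms:.0f} ms")
```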


Real-time translation with AI ready for first contact?

And my favorite use of low-latency models is training small models to recognize the signs of Artificial General Intelligence (AGI – see lots more on the topic here, here, and here!). We don’t know what will happen when AI surpasses our abilities, so we need to tread this line carefully. These quick AGI-detectors would spot a large, slow, pre-AGI model approaching a critical level and pull the plug or pause processing before the large model finishes. By having quick, responsive small models watching, we can react before it’s too late. That same ability to monitor quickly could also transform cybersecurity, with small models watching network traffic and providing swift remediation and patches.


Privacy and Security Considerations

Finally, being able to run locally on a device like your smartphone means the cloud isn’t needed for processing, so all your data and results stay private on your device. When your device is talking to the cloud, there’s an inherent risk of data breaches and unauthorized access to sensitive information (as the CTO of F’inn, who undergoes rigorous yearly IT-security assessments, I can tell you this one hits particularly close to home). This decentralized approach to AI enhances privacy and security because the data never leaves the device, minimizing the risk of interception or unauthorized access. We take the privacy and security of our clients’ data very seriously, so I often use local models to analyze sensitive information (there can be some accuracy tradeoff for that privacy, but it’s worth it).


Keeping your data local also means that you are in control of your data. The General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) emphasize data privacy, including transparency about sharing personal information with third parties. By processing data locally, users retain control over their data and how it is shared. Small language models can enable the development of AI-powered tools that prioritize user privacy, such as secure personal assistants, encrypted messaging services, and privacy-preserving recommendation systems. By embedding small language models into these applications, developers can create innovative solutions that harness the power of AI while respecting and protecting user privacy. Whether you are processing your baby-camera footage, getting feedback on new products, or admitting your deepest secrets to your AI friend, privacy is key.


Challenges and Future Developments

These benefits don’t come without a cost, which is usually broader ability, so one key challenge is how best to balance model size and performance. As models become smaller, there is a risk of reducing their efficacy or making more mistakes. Researchers and developers must find ways to optimize the architectures and training processes of small language models to ensure they still deliver satisfactory results despite the reduced size. One approach is compressing large models via quantization (like making an MP3 file from a CD: smaller, but at reduced quality), which allows larger models to run faster and on less-impressive hardware with acceptable levels of competence.
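In practice, that compression is often just a flag at load time. Here’s a sketch using transformers with bitsandbytes 4-bit quantization (the model name is an example of a gated model you’d need access to; a CUDA GPU plus the bitsandbytes and accelerate packages are assumed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Store weights in 4 bits; do the math in 16-bit floats.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_name = "meta-llama/Meta-Llama-3-8B"   # example; any causal LM works
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# An 8B model now fits in roughly 6 GB of VRAM instead of ~16 GB at fp16.
```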

Perplexity is a measure of how accurately LLMs predict the next word – lower is better

This graph shows that bigger models have lower perplexity than smaller models, and that higher levels of quantization do eventually hurt performance.
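For reference, perplexity is just the exponential of the model’s average per-token loss, so measuring it takes only a few lines (gpt2 here is a stand-in for whatever model you want to evaluate):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"   # example; swap in any causal language model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Small language models trade breadth for efficiency."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the average next-token loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.1f}")   # lower is better
```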


Future developments will continue the rise of small models, especially improvements in architecture and training data. New neural network architectures that are more efficient than the current transformer, such as Mamba, will enable faster, less energy-hungry computation at the same level of performance. Incorporating techniques such as transfer learning, few-shot learning, or meta-learning could make small models much more capable. Improvements in training datasets can also improve generalization and reasoning abilities, as Microsoft’s Phi dataset demonstrates.


It’s also important to be transparent and acknowledge the potential limitations of small language models, such as reduced generalization abilities or weaker complex contextual understanding. By using repeatable benchmarks, we can directly measure how well models perform relative to their size. For example, on F’inn’s GRAIG benchmark, which measures several LLM capabilities relevant to our business, a highly compressed version of Meta’s Llama3-70B model (IQ quantization brings it to about 1/12 the size of the full model) shows only somewhat diminished reasoning skills, but significantly worse instruction-following. Understanding these limitations can help us harness the power of small models while mitigating their weaknesses.


Conclusion

Small language models can address some of today’s issues with AI’s energy consumption, latency, and reliance on cloud servers. The importance of reducing energy consumption cannot be overstated, and it’s why small models provide a path toward more sustainable AI. The ability of more devices to run AI models locally democratizes access to AI, empowering individuals and communities who have been excluded. And as AI develops, expect more emphasis on specialized, smaller models in industries from healthcare to finance and education, especially where privacy and local processing are necessary. The big models are impressive, but small models offer practical, efficient, and accessible solutions for real-world applications, enabling broader and more equitable use of AI technology.




