Alessandro – Malted AI

DeepSeek and the Future of Distillation

This blog post examines how DeepSeek’s distillation process, alongside Malted AI’s task-specific approach, highlights the potential for developing efficient, small language models that focus on accuracy, specialised knowledge, and reduced computational overhead, offering a more tailored solution for enterprise AI applications. There has been significant hype around DeepSeek building a performant model for only $6M, […]


Teaching small models to think big: the secrets of knowledge distillation


This blog post explores how knowledge distillation, combined with synthetic data, enables the development of small, efficient AI models that retain the capabilities of larger ones, addressing data scarcity, reducing resource requirements, and delivering practical and secure solutions for enterprise applications. Access to quality data remains a persistent challenge for organisations striving to build effective,


Large language models are not always the answer: the rise of small language models


This blog post explores the key differences between small language models (SLMs) and large language models (LLMs), focusing on how they’re built, their trade-offs in efficiency and resource consumption, the situations where one is more appropriate than the other, and what happens when models are combined. In recent years, advancements in natural language processing
