From the Alexa Challenge to Malted AI

The Malted AI founders leveraged small language models (SLMs) and distillation to win the Amazon Alexa Challenge. These learnings are the cornerstone of Malted AI's unique approach to solving domain-specific tasks in secure enterprise environments.

What was the 2021/22 Amazon Alexa TaskBot Challenge?

A 12-month competition involving 125 top AI teams from around the globe [1].
Each team developed an AI agent, or TaskBot, to help users solve day-to-day tasks such as cooking a meal or fixing a bike.
TaskBots supported Amazon Alexa users and had to manage 100,000s of interactions.
This challenge required production-grade AI, i.e., low latency, 100% uptime, and data security (self-hosting to ensure no personal data left Amazon’s cloud).
The Malted AI founders' team won based on user ratings and the judgment of domain experts.

Problem 1 – System

We mapped out a dream user experience. For example, imagine cooking at home with Gordon Ramsay, designing and preparing a dinner party for four guests. The AI agent (Gordon Ramsay) would help you find the best recipes from 100,000s of options, deal with any issues in real time, and make it a factually grounded and engaging experience.

Thus, we needed a flexible, scalable system to construct these knowledge-grounded experiences. We assessed two system options but found that neither was suitable (a rough sketch of both approaches follows this comparison):


Intent-based: These are traditional tree-based conversational systems where you can design detailed user flows [2]. These flows give a sense of a controlled user experience, offer low latency, and can be self-hosted, which was required because we were dealing with personal/sensitive user information. However, these systems often do not sound natural, are brittle when dealing with edge cases, and are hard to scale to millions of tasks.

Large Language Models (LLMs): By the time of the challenge, OpenAI had launched GPT-3, a general-purpose AI model containing 175 billion parameters [3]. These systems can create natural and fluent text, are flexible when dealing with conversational edge cases, and are scalable to millions of tasks by varying the prompt [4]. However, we couldn't create complex, knowledge-grounded conversations, self-host a model of that size, or respond to the user with low latency.
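As a rough illustration of this trade-off, the sketch below contrasts a hand-written intent tree with a prompt-driven LLM call. It is not the TaskBot code; the intents, keywords, and prompt template are invented for illustration.

```python
# Illustrative sketch only -- the intents, keywords, and prompt below are
# invented examples, not the actual TaskBot implementation.

# Intent-based: a hand-written decision tree. Predictable, fast, and easy to
# self-host, but every new phrasing or edge case needs another branch.
def intent_based_reply(utterance: str) -> str:
    text = utterance.lower()
    if "recipe" in text:
        return "Sure - what would you like to cook today?"
    if "swap" in text or "substitute" in text:
        return "You can usually swap butter for oil at roughly a 1:1 ratio."
    return "Sorry, I didn't catch that."  # brittle fallback for anything unseen

# LLM-based: a single prompt template flexibly covers millions of tasks, but a
# 175-billion-parameter model is hard to self-host and slow to answer in
# real time.
PROMPT_TEMPLATE = (
    "You are a helpful cooking assistant guiding a user through a task.\n"
    "User: {utterance}\n"
    "Assistant:"
)

def llm_prompt(utterance: str) -> str:
    # The prompt would be sent to a hosted LLM; generation is omitted here.
    return PROMPT_TEMPLATE.format(utterance=utterance)
```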

Problem 2 – Data


We solved the system problem by leveraging a network of SLMs and rich task data. But how would we train our SLMs for each sub-task? We needed to build representative, high-quality datasets for our machine-learning models.

“In machine learning, you need three things: data, data, data.”

Building traditional machine learning datasets often requires hundreds or thousands of hours of manual annotation. This may produce high-quality data, but it is neither cost-effective nor scalable. Furthermore, there are privacy concerns around annotators viewing sensitive data such as personal conversations.

Distillation
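
Distillation transfers the behaviour of a large "teacher" model into a much smaller "student" model [7]. In our setting, the teacher generates the high-quality, task-specific training data that a small language model (e.g., a model at the scale of T5 or BERT [5][6]) is then fine-tuned on, replacing thousands of hours of manual annotation. Below is a minimal sketch of that idea, assuming a Hugging Face Transformers setup; the model names, the toy cooking task, and the example utterances are illustrative assumptions rather than Malted AI's actual pipeline.

```python
# Minimal sketch of distillation as data generation (illustrative only).
# A large "teacher" model labels unannotated, domain-specific inputs; the
# resulting synthetic pairs are used to fine-tune a much smaller "student" SLM.
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Teacher: a large general-purpose model (illustrative choice).
teacher = pipeline("text2text-generation", model="google/flan-t5-xl")

# Unlabelled, domain-specific user utterances (toy examples).
utterances = [
    "My cake came out flat - what went wrong?",
    "Can I swap butter for oil in this recipe?",
]

# 1) The teacher generates target outputs, i.e., synthetic annotations.
synthetic_pairs = []
for text in utterances:
    target = teacher(
        f"Answer the cooking question: {text}", max_new_tokens=64
    )[0]["generated_text"]
    synthetic_pairs.append({"input": text, "target": target})

# 2) The synthetic pairs become supervised training data for a small student
#    model (here a T5-small seq2seq model [5]); a standard fine-tuning loop
#    (e.g., Seq2SeqTrainer) teaches the student to reproduce the teacher's
#    behaviour at a fraction of the size and serving cost.
student_tokenizer = AutoTokenizer.from_pretrained("t5-small")
student_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```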

Founding Malted AI

Malted AI was inspired by overcoming the system and data challenges the team encountered during the Alexa Challenge. We showed that leveraging small language models and distillation can solve complex domain-specific problems. After winning the global competition, Iain Mackie, Carlos Gemmell, and Federico Rossetto founded what is today Malted AI. Alan Turing Fellow Jeff Dalton is Malted's Chief Scientific Advisor, and Paul Owoicho is a Machine Learning Engineer.

Malted AI partners with enterprises to build custom AI applications utilising distilled small language models, trained on their proprietary data in a secure environment. Our distillation technology creates high-quality training data for SLMs that would otherwise have required thousands of human hours to annotate manually. Thus, we support enterprises in building factually accurate and reliable AI solutions that are 10-100x smaller than current general LLMs, ultimately showing that smaller can be better.

References

  1. Gottardi, Anna, Osman Ipek, et al. “Alexa, Let’s Work Together: Introducing the First Alexa Prize TaskBot Challenge on Conversational Task Assistance.” arXiv preprint arXiv:2209.06321 (2022).
  2. Xie, Tian, et al. “Converse: A tree-based modular task-oriented dialogue system.” arXiv preprint arXiv:2203.12187 (2022).
  3. Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).
  4. Foosherian, Mina, et al. “Enhancing pipeline-based conversational agents with large language models.” arXiv preprint arXiv:2309.03748 (2023).
  5. Raffel, Colin, et al. “Exploring the limits of transfer learning with a unified text-to-text transformer.” Journal of Machine Learning Research 21.140 (2020): 1-67.
  6. Devlin, Jacob, et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” North American Chapter of the Association for Computational Linguistics (2019).
  7. Gou, Jianping, et al. “Knowledge distillation: A survey.” International Journal of Computer Vision 129.6 (2021): 1789-1819.
