Sesame AI has publicly released the underlying technology powering its wildly popular Maya voice assistant. The company unveiled CSM-1B, a sophisticated speech generation model containing 1 billion parameters, on March 13, 2025, making it freely available under the Apache 2.0 license.
CSM-1B stands out for its ability to generate remarkably human-like speech from both text and audio inputs. It relies on residual vector quantization (RVQ), a technique for encoding audio as discrete tokens that is also used in Google’s SoundStream and Meta’s EnCodec. The model is built on a backbone from Meta’s Llama family and can produce a variety of voices without being fine-tuned on any specific voice.
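For readers unfamiliar with RVQ, the core idea can be shown with a toy sketch. The Python below is not Sesame's code: the function names and the tiny hand-made codebooks are assumptions chosen purely for illustration. Each stage picks the closest entry in its codebook, subtracts it, and passes the leftover (the residual) to the next stage, so a vector ends up represented by a short list of integer codes.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    def dist(codeword):
        return sum((a - b) ** 2 for a, b in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage, returning one code index per codebook."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Whatever this stage failed to capture is handed to the next stage.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Rebuild an approximation by summing the chosen codeword from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks: a coarse stage, then a finer correction.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.0]],
]
codes = rvq_encode(codebooks, [1.4, 1.0])
approx = rvq_decode(codebooks, codes)
```

Real neural audio codecs learn their codebooks from data and apply this to many frames of audio per second. The sketch only shows the encode and decode arithmetic, not how a model like CSM-1B predicts the codes.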
The Apache 2.0 license attached to CSM-1B allows commercial applications with few limitations, potentially triggering a wave of innovation across industries ranging from customer service to accessibility tools. Developers worldwide now have access to the same core technology that powers Maya, the voice assistant that captured public attention for its conversational abilities.
By making this 1-billion-parameter model publicly available, Sesame AI has effectively lowered barriers to entry in the voice AI space, creating opportunities for researchers and smaller companies to build upon this foundation. The move could accelerate advancements in how machines and humans communicate through speech, with implications extending far beyond current applications.
However, the release also raises ethical concerns. CSM-1B has fewer restrictions than some of its rivals, and open-sourcing it puts powerful, largely unrestricted voice AI in anyone’s hands. Rather than technical safeguards that prevent certain uses, the model comes only with ethical guidelines asking users to avoid unauthorized voice impersonation, the creation of misleading content, and other potentially harmful activities.
