Microsoft AI models: Microsoft introduced three new artificial intelligence models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, on April 2, expanding its AI offerings for developers and enterprise users. The models are designed to handle speech recognition, voice generation, and image creation, and are now available through the company’s Foundry platform.
MAI-Transcribe-1 is Microsoft’s first-generation speech recognition model. It supports up to 25 languages and is built to deliver accurate transcription across different accents and real-world audio conditions. The company said the model achieves competitive accuracy at nearly 50% lower GPU cost than similar tools.
MAI-Voice-1, by contrast, focuses on speech generation. It can produce up to 60 seconds of expressive audio in less than one second using a single GPU. The model is aimed at applications that require quick and realistic voice output.
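To put that throughput figure in perspective, the ratio of audio duration to generation time shows how much faster than real time the model runs. The short Python sketch below works through the arithmetic using the numbers quoted above. The function name is illustrative and not part of any Microsoft API, and the one-second generation time is treated as a worst case, since the company only says "less than one second".

```python
def speedup_over_real_time(audio_seconds: float, generation_seconds: float) -> float:
    """Return seconds of audio produced per second of generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# Figures from the announcement: up to 60 s of audio in under 1 s on a single GPU.
# Assuming the full 1 s is used gives a lower bound on the speed-up.
lower_bound = speedup_over_real_time(60.0, 1.0)
print(f"At least {lower_bound:.0f}x faster than real time")
```

Any generation time shorter than one second would push the ratio above 60.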
Both MAI-Transcribe-1 and MAI-Voice-1 are available through Azure Speech services and are intended for real-world deployment across various industries.
Use cases across industries
The speech-based models are designed for multiple use cases, including conversational AI systems such as virtual assistants and call-centre tools. They can also support live captioning for events and meetings, helping improve accessibility.
Other applications include media production, where the models can automate subtitling and transcription, and education platforms, where lectures and training materials can be converted into text. Businesses can also use these tools to analyse customer interactions and generate insights from spoken data.
MAI-Image-2: Image generation
Microsoft also introduced MAI-Image-2, a text-to-image model that can generate visuals based on written prompts. The model has been ranked among the top image model families on the Arena.ai leaderboard.

MAI-Image-2 is designed for use in creative and business workflows. It can help designers and content creators quickly generate visual concepts, while organisations can use it to produce customised graphics for communication and branding.
Integration with existing products
According to Microsoft, these models are already being used in its products such as Copilot, Bing, PowerPoint, and Azure Speech. By making them available to developers, the company aims to expand their use across different platforms and applications.
With this release, Microsoft continues to focus on building AI tools that can handle a wide range of tasks, from speech and language processing to visual content creation.



