Key Highlights:

Microsoft is reportedly rolling out a new lineup of AI models right before Build 2026, expanding its MAI family across image generation, speech recognition, and text-to-speech.
The lineup includes MAI-Image-2.5, MAI-Image-2.5e, MAI-Transcribe-1.5, and MAI-Voice-2, all upgrades to models Microsoft has already launched within its AI ecosystem.
These new models promise better image quality and editing tools, improved multilingual speech capabilities, more accurate transcription, and smarter, faster inference.

The expected announcements focus on three major categories that have become increasingly important in the AI market: image generation, speech recognition, and voice synthesis.

The upcoming releases build on Microsoft’s existing MAI family: MAI-Image-2, MAI-Transcribe-1, and MAI-Voice-1. If these reports prove to be accurate, the new generation will bring expanded capabilities and reduce Microsoft’s reliance on third-party AI providers by strengthening its own model ecosystem.

New MAI Models Expected at Build 2026

BUILD 🔥: Microsoft is preparing new image and voice models for the announcement on June 2.

> MAI Voice 2, a multilingual model supporting 15 news languages and a wider range of emotional spectrum (check voice samples in the article)

> MAI Transcribe 1.5, a new model for… pic.twitter.com/kfiPEP6YoZ
— 🚨 AI News | TestingCatalog (@testingcatalog) May 30, 2026

1. MAI-Image-2.5

MAI-Image-2.5 appears to be Microsoft’s top image-generation model. Reports say it brings sharper images, better prompt understanding, and all-around stronger performance. A standout feature might be its support for image editing so users can upload their own images and modify them with AI instructions, not just generate new images from text.

The model is also said to have achieved strong results on image-generation benchmark leaderboards, reflecting Microsoft’s continued efforts to compete in a market that is increasingly defined by advanced visual AI systems.

2. MAI-Image-2.5e

Microsoft is also said to be preparing MAI-Image-2.5e. The “e” stands for efficiency. It’s expected to be faster and more resource-friendly, while maintaining the visual quality offered by the flagship model.

This gives businesses and developers more flexibility, letting them choose between the best possible image quality and quicker, cheaper results, depending on what they need.

3. MAI-Transcribe-1.5

Speech recognition is getting an update, too, with MAI-Transcribe-1.5. This version, built on the foundation established by MAI-Transcribe-1, reportedly focuses on improving accuracy, handling more languages, and performing better in different speaking environments.

As companies rely more on automated meeting notes, support analytics, accessibility tools, and transcription, these upgrades matter. Better speech-to-text is fast becoming a must-have for modern AI platforms.

4. MAI-Voice-2

MAI-Voice-2 is expected to be Microsoft’s next-gen text-to-speech model, and possibly the biggest upgrade in the lineup. It is reportedly set to add several languages, including Hindi, Spanish, French, Japanese, Korean, Portuguese, Turkish, Vietnamese, and Chinese.

It’s not just about speaking more languages, either. The new version is supposed to sound more expressive, handling things like whispering and showing a bigger range of emotional tones. That’s vital for making AI-generated voices feel more natural across lots of uses, whether you’re building assistants, making content, or engaging customers.

How the New MAI Models Compare With Previous Versions

Microsoft is putting emphasis on making its AI not just more capable, but more practical in everyday business settings.

In image generation, going from MAI-Image-2 to 2.5 is all about higher-quality output and, now, built-in editing. MAI-Image-2 was mainly about creating images. The new version turns the platform into a full visual AI suite: create, then edit, all in one place.

MAI-Image-2.5e isn’t replacing the flagship; it’s a new option entirely. It gives users the choice between top-tier quality and speed.

For speech recognition, MAI-Transcribe-1.5 refines the capabilities of MAI-Transcribe-1. The main focus is even better accuracy and reliability, which are crucial in real-world settings where background noise, accents, and varying audio quality can affect transcription performance.

The biggest changes come with MAI-Voice-2. Compared to MAI-Voice-1, this new model covers a lot more languages and introduces nuanced, emotional expressions. Features like whispering and subtler vocal delivery mark a real shift toward realistic speech.


Wrapping Up

Microsoft hasn’t confirmed these announcements yet, but the rumoured MAI upgrades make it clear where things are headed. With MAI-Image-2.5 and 2.5e focused on visuals, MAI-Transcribe-1.5 boosting speech recognition, and MAI-Voice-2 widening voice synthesis capabilities, Microsoft is building a deep multimodal AI ecosystem, all under one platform.

Devanshi Kashyap

Devanshi is a curious learner who enjoys exploring new ideas and expressing creativity through art.