
Apple Quietly Drops Two New AI Models on Hugging Face

Apple launches AI models on Hugging Face

Key Highlights:

  • Apple has rolled out two AI models on Hugging Face: FastVLM and MobileCLIP2
  • The release comes a week before its “Awe Dropping” event on September 9
  • Apple says both models run locally and deliver near real-time output

Apple has made a quiet move in the AI landscape by releasing two of its models on Hugging Face – FastVLM and MobileCLIP2. The Cupertino-based tech giant says both models run locally and deliver near real-time output.

The announcement gained traction when Clem Delangue, CEO and co-founder of Hugging Face, took to X to share his thoughts on the models.

But what are these models capable of, and how does this position Apple in the AI race? Let’s find out.

Apple’s FastVLM and MobileCLIP2 AI Models & What They Do

At the heart of Apple’s new release are two models designed to excel at processing and understanding both visual and textual data. Both run locally on a device and are optimized for Apple silicon through MLX, the company’s machine learning framework. The emphasis on on-device processing is a key differentiator, as it allows for privacy-preserving, near real-time output without sending data to the cloud.

FastVLM addresses the long-standing tradeoff between accuracy and latency in vision-language models. It processes high-resolution inputs in near real time, enabling it to efficiently understand and generate content across both images and text.

The FastVLM family includes more powerful versions with 1.5 billion and 7 billion parameters, which improve both performance and response time. Apple has also released a lightweight version, FastVLM-0.5B, which can be tried directly in a web browser, showcasing its accessibility and efficiency.
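
For developers curious what this looks like in practice, the snippet below is a rough sketch of loading a FastVLM-style checkpoint from Hugging Face with the transformers library and asking it to describe an image. The repo id apple/FastVLM-0.5B and the processor call are assumptions for illustration, not verified usage; the Hugging Face model card carries the exact, supported loading code.

  # Hedged sketch: load a FastVLM-style checkpoint and caption a local image.
  # The repo id and processor behaviour are assumptions; see the model card for exact usage.
  from PIL import Image
  from transformers import AutoModelForCausalLM, AutoProcessor

  repo_id = "apple/FastVLM-0.5B"  # assumed identifier for the lightweight variant

  # trust_remote_code lets the repo's own modelling code define the architecture
  processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

  image = Image.open("photo.jpg")  # any local image
  inputs = processor(images=image, text="Describe this image.", return_tensors="pt")

  output_ids = model.generate(**inputs, max_new_tokens=64)
  print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])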

The companion model, MobileCLIP2, pushes Apple’s efficiency-first approach even further. It brings vision and language capabilities to mobile devices, and Apple claims it is 85 times faster and 3.4 times smaller than earlier versions.

Both models draw on Apple’s FastViT family of hybrid vision encoders; FastVLM in particular uses a new encoder, FastViT-HD, which produces fewer but higher-quality tokens, enabling faster processing without compromising accuracy. Their ability to interpret images and videos in real time makes them highly practical for applications in accessibility (e.g., describing what’s in a picture for the visually impaired), robotics, and user interface navigation.
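
To make the CLIP side more concrete, here is a rough zero-shot classification sketch built on the open_clip library: the model embeds an image and a handful of text labels in the same space and picks the label that matches best. The hub identifier apple/MobileCLIP2-S0, and the assumption that the checkpoint loads through open_clip at all, are placeholders; the official model card documents the supported loading path.

  # Hedged sketch: zero-shot labelling with a CLIP-style model via open_clip.
  # The hub id is an assumption; check the Hugging Face model card for the real one.
  import torch
  import open_clip
  from PIL import Image

  hub_id = "hf-hub:apple/MobileCLIP2-S0"  # assumed identifier
  model, _, preprocess = open_clip.create_model_and_transforms(hub_id)
  tokenizer = open_clip.get_tokenizer(hub_id)
  model.eval()

  image = preprocess(Image.open("photo.jpg")).unsqueeze(0)
  labels = ["a dog", "a cat", "a bicycle"]
  text = tokenizer(labels)

  with torch.no_grad():
      # embed image and labels, then compare with cosine similarity
      image_features = model.encode_image(image)
      text_features = model.encode_text(text)
      image_features /= image_features.norm(dim=-1, keepdim=True)
      text_features /= text_features.norm(dim=-1, keepdim=True)
      probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

  print(dict(zip(labels, probs[0].tolist())))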

If you’re looking to try these models, you can find FastVLM and MobileCLIP2 on Hugging Face. For an in-browser demo and source code, check out the FastVLM WebGPU space.

Apple’s Current Position in the AI Race

The release of these models just days before the highly anticipated “Awe Dropping” iPhone 17 event does not seem like a coincidence. While AI leaders like OpenAI and Google have been pushing ever-larger language models, Apple is taking a different path instead of hyping up chatbots.

Rather than focusing on conversational AI bots, the company is doubling down on on-device AI, which could then be integrated into MacBooks, iMacs, iPhones, and other Apple devices. More of this could be revealed at the upcoming Apple event on September 9, 2025.

Apple is also reportedly planning to use OpenAI’s ChatGPT in some capacity within iOS. In response, xAI owner Elon Musk has filed a lawsuit against both Apple and OpenAI, accusing them of creating an unfair monopoly in the AI market.

Meanwhile, fans are excited for the upcoming Apple event. You can join the livestream on September 9, starting at 1000 hours.

Abhijay Singh Rawat
Abhijay is the News Editor at TimesofAI and TimesofGames, and loves to follow the latest tech and AI trends. After office hours, you will find him either grinding ranked competitive games or trekking his way up the hills of Uttarakhand.