
Key Highlights –
- Apple rolls out two AI models on Hugging Face: FastVLM and MobileCLIP2
- This comes a week before its “Awe Dropping” event on September 9
- Apple says that both these models run locally and offer almost real-time output
Apple has made a quiet move in the AI landscape by releasing two of its models on Hugging Face: FastVLM and MobileCLIP2. The Cupertino-based tech giant claims that both models run locally and deliver near real-time output.
The announcement gained traction when Clem Delangue, CEO and co-founder of Hugging Face, took to X to share his opinion on the models. He wrote:
If you think @Apple is not doing much in AI, you’re getting blindsided by the chatbot hype and not paying enough attention!
They just released FastVLM and MobileCLIP2 on @huggingface. The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time vision… pic.twitter.com/jYCPukNuiK
— clem 🤗 (@ClementDelangue) September 1, 2025
But what are these models capable of, and how does this position Apple in the AI race? Let’s find out.
Apple’s FastVLM and MobileCLIP2 AI Models & What They Do
At the heart of Apple’s new release are two models designed to excel at processing and understanding both visual and textual data. Both models run locally on-device and are optimized for Apple silicon through MLX, the company’s machine learning framework. The emphasis on on-device processing is a key differentiator, as it allows for privacy-preserving, real-time output without sending data to the cloud.
FastVLM, a vision-language model, addresses the long-standing tradeoff between accuracy and latency in this class of models. It can process high-resolution images in near real time, enabling it to efficiently understand and generate content across both images and text.
The FastVLM family also includes more capable versions with 1.5 billion and 7 billion parameters for higher-quality results. Apple has released a lightweight version, FastVLM-0.5B, which can be tried directly in a web browser, showcasing its accessibility and efficiency.
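To give a concrete sense of what running a vision-language model locally looks like, here is a minimal Python sketch. It assumes the FastVLM checkpoints can be loaded through Hugging Face’s standard transformers classes; the repo id "apple/FastVLM-0.5B" and the AutoProcessor/AutoModelForVision2Seq pairing are assumptions, so check the model card for the exact recommended usage.

```python
# Minimal sketch: caption a local image with a small vision-language model.
# Assumptions (not confirmed by the article): the checkpoint id
# "apple/FastVLM-0.5B" and that it loads via the generic transformers
# AutoProcessor / AutoModelForVision2Seq classes with trust_remote_code.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "apple/FastVLM-0.5B"  # assumed repo id; see the Hugging Face page

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, trust_remote_code=True)

image = Image.open("photo.jpg")
prompt = "Describe this image in one sentence."
inputs = processor(images=image, text=prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

Because everything in a flow like this runs on the local machine, no image data leaves the device, which is the privacy argument Apple is making with these releases.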
The companion model, MobileCLIP2, pushes Apple’s efficiency-first approach even further, bringing CLIP-style vision-and-language understanding to mobile devices. Apple claims the new models are up to 85 times faster and 3.4 times smaller than previous work.
Much of this speedup comes from a new hybrid vision encoder, FastViT-HD, which produces fewer but higher-quality tokens, enabling faster processing without compromising accuracy. The ability to interpret images and video in near real time makes these models highly practical for accessibility (for example, describing a picture for visually impaired users), robotics, and user-interface navigation.
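To illustrate the kind of image-text matching a CLIP-style model like MobileCLIP2 performs, here is a short zero-shot classification sketch using the open_clip library. The checkpoint id "hf-hub:apple/MobileCLIP2-S0" and the assumption that the released weights load through open_clip are not confirmed by the article, so treat them as placeholders and consult the model card.

```python
# Sketch: CLIP-style zero-shot matching of an image against text labels.
# Assumption: the MobileCLIP2 weights load through open_clip under the
# hub id below; substitute whatever the official model card recommends.
import torch
import open_clip
from PIL import Image

CKPT = "hf-hub:apple/MobileCLIP2-S0"  # assumed checkpoint id

model, _, preprocess = open_clip.create_model_and_transforms(CKPT)
tokenizer = open_clip.get_tokenizer(CKPT)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)
labels = ["a dog", "a cat", "a city street at night"]
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then compare: higher cosine similarity = better match.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.2%}")
```

This embed-and-compare pattern involves no text generation at all, which is part of what makes CLIP-style models small and fast enough for on-device tasks such as photo search and retrieval.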
If you’re looking to try out these models, you can find the Hugging Face pages for FastVLM and MobileCLIP2 here. For an in-browser demo and source code, check out the FastVLM WebGPU demo.
Apple’s Current Position in the AI Race
The release of these models just days before the highly anticipated iPhone 17 “Awe Dropping” event does not seem like a coincidence. While AI leaders such as OpenAI and Google have been pushing large language models, Apple is taking a different path rather than chasing the chatbot hype.
Rather than focusing on conversational AI bots, the company is doubling down on on-device AI, which could then be integrated into MacBooks, iMacs, iPhones, and other Apple devices. We may see more of this at the upcoming Apple event on September 9, 2025.
Apple is also reportedly planning to use OpenAI’s ChatGPT in some capacity within iOS. In response, xAI owner Elon Musk has filed a lawsuit against both Apple and OpenAI, accusing them of creating an unfair monopoly in the AI market.
Meanwhile, fans are excited for the upcoming Apple event. You can join the livestream on September 9, starting at 10:00 a.m.