
Microsoft introduces three new Phi 3.5 models

  • August 21, 2024




Microsoft is making three new Phi 3.5 models available to developers on Hugging Face. The models post strong benchmark results, competing with offerings from several major AI companies.

Microsoft introduces three new Phi-3.5 models: Phi-3.5-vision, Phi-3.5-MoE, and Phi-3.5-mini. Phi-3.5-mini is suited to demanding reasoning tasks such as summarizing long meetings, while Phi-3.5-vision can process both text and images. The third model, Phi-3.5-MoE, uses the mixture-of-experts (MoE) technique, which has proven itself in benchmarks. This technique combines multiple expert sub-models within a single model, each specialized for a specific task.

Developers can now download all three models from Hugging Face under Microsoft's MIT license. The models deliver state-of-the-art performance on some benchmarks, beating rivals including Google's Gemini 1.5 Flash, Meta's Llama 3.1 8B, and in some cases even OpenAI's GPT-4o.

Phi 3.5 models

Microsoft has made the new Phi 3.5 models available on Hugging Face. Phi-3.5-mini is a lightweight AI model with 3.8 billion parameters and a context length of 128,000 tokens. It is designed for memory- and compute-constrained environments that still require strong reasoning, making it well suited to tasks such as summarizing long documents or meetings.

Phi-3.5-vision is an advanced multimodal model that combines text and image processing. It is designed for tasks such as understanding charts and tables, summarizing videos, and general image understanding. Like the other Phi-3.5 models, it supports a context length of 128,000 tokens. Microsoft emphasizes that the model was trained on a combination of synthetic and filtered, publicly available datasets.

Mixture of experts

The latest model, Phi-3.5-MoE, uses the mixture-of-experts method, in which one model combines several expert sub-models, each specialized for different tasks. It is the first model in the series to use this technique and immediately proves its worth in the benchmarks. The model has 42 billion parameters and supports a context length of 128,000 tokens, making it suitable for a range of demanding applications.
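To illustrate the routing idea behind mixture-of-experts, here is a minimal sketch in plain NumPy. This is not Microsoft's Phi-3.5-MoE implementation; the dimensions, the single-linear-layer "experts", and the top-k routing are simplifying assumptions made purely for illustration. The core idea it shows is real, though: a small router scores the experts for each input, only the top-k experts run, and their outputs are blended by the routing weights, so most parameters stay inactive on any given input.

```python
import numpy as np

# Toy mixture-of-experts forward pass (illustrative sketch only).
rng = np.random.default_rng(0)

DIM = 8          # toy hidden dimension
NUM_EXPERTS = 4  # number of expert sub-networks
TOP_K = 2        # experts activated per input

# Each "expert" is a single linear layer here for simplicity;
# in a real MoE transformer each expert is a feed-forward block.
expert_weights = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router_weights = rng.standard_normal((DIM, NUM_EXPERTS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x):
    """Route input x to the top-k experts and mix their outputs."""
    scores = softmax(x @ router_weights)      # router probability per expert
    top = np.argsort(scores)[-TOP_K:]         # indices of the top-k experts
    gate = scores[top] / scores[top].sum()    # renormalized gating weights
    # Only the selected experts are evaluated; the rest stay idle.
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gate, top))

x = rng.standard_normal(DIM)
y = moe_forward(x)
print(y.shape)  # (8,)
```

Because only TOP_K of the NUM_EXPERTS experts run per input, a model can carry a large total parameter count (such as Phi-3.5-MoE's 42 billion) while spending far less compute per token than a dense model of the same size.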


Source: IT Daily
