Snowflake integrates Llama 3.1 into Cortex AI and introduces optimization stack

August 1, 2024


Snowflake’s AI research team introduces an open-source inference and fine-tuning system that delivers fast and memory-efficient performance for models with hundreds of billions of parameters.

Snowflake is bringing the Llama 3.1 LLM family to Cortex AI, allowing customers to get started with these large models directly in their own Snowflake environment. The line-up includes the large open-source Llama 3.1 405B model, which Snowflake has optimized for both inference and fine-tuning on proprietary data that remains secure and private.
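
To make this concrete, here is a minimal sketch of calling a Llama 3.1 model from Python via Cortex AI’s COMPLETE SQL function. The connection parameters are placeholders, and model availability (including llama3.1-405b) varies by account and region, so treat this as an illustration rather than Snowflake’s official quick-start.

```python
# Minimal sketch: querying Llama 3.1 through Cortex AI from Python.
# Assumes the SNOWFLAKE.CORTEX.COMPLETE SQL function and the
# 'llama3.1-405b' model are enabled for your account and region.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",        # placeholder credentials
    user="your_user",
    password="your_password",
    warehouse="your_warehouse",
)
cur = conn.cursor()
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)",
    ("llama3.1-405b", "Summarize the key risks in this contract: ..."),
)
print(cur.fetchone()[0])  # the completion comes back as a single string
conn.close()
```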

Snowflake’s AI research team has also open-sourced its accompanying Massive LLM Inference and Fine-Tuning System Optimization Stack. The move follows the launch of Llama 3.1 405B within Snowflake and marks an important milestone in the collaboration with DeepSpeed, Hugging Face, vLLM and the broader AI community.

Challenges and solutions

Models the size of Llama 3.1 405B present significant challenges, particularly in terms of memory requirements and low-latency inference, both of which matter for real-time applications and low-cost processing. Storing and processing the model weights and activation states requires large GPU clusters, which is often a barrier for data scientists who do not have access to such resources.

Snowflake’s Massive LLM Inference and Fine-Tuning System Optimization Stack offers a solution to these problems. By using advanced parallelism techniques and memory optimizations, Snowflake enables fast and efficient AI processing without the need for overly complex and therefore very expensive infrastructure.
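
A back-of-envelope calculation shows why those memory optimizations matter (illustrative numbers, not Snowflake’s benchmarks):

```python
import math

# Weights alone for a 405B-parameter model in 16-bit precision:
params = 405e9
bytes_per_param = 2                             # FP16/BF16
weights_gb = params * bytes_per_param / 1e9     # ~810 GB

gpu_mem_gb = 80                                 # e.g. one A100/H100 80 GB card
min_gpus = math.ceil(weights_gb / gpu_mem_gb)

print(f"weights: {weights_gb:.0f} GB -> at least {min_gpus} GPUs for weights alone")
# Lower-precision formats such as FP8 roughly halve this footprint (~405 GB),
# which is what makes serving from a single 8-GPU node plausible.
```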

The system can deliver real-time performance on a single GPU node and supports context windows of up to 128K tokens in multi-node configurations. It deploys flexibly on both new and legacy hardware, which puts it within reach of a wider range of organizations.
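
The jump to multiple nodes for long contexts follows from KV-cache arithmetic. A rough sketch, using Llama 3.1 405B’s published architecture (126 layers, 8 grouped-query KV heads of dimension 128); treat the numbers as approximate:

```python
layers, kv_heads, head_dim = 126, 8, 128   # Llama 3.1 405B (published specs)
bytes_per_value = 2                        # 16-bit K and V cache entries

kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
context = 128_000

print(f"KV cache per token: {kv_per_token / 1e6:.2f} MB")
print(f"{context:,}-token context: {kv_per_token * context / 1e9:.0f} GB "
      f"on top of the weights")
```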

Benefits for data scientists

This system enables data scientists to optimize Llama 3.1 405B using mixed precision techniques on fewer GPUs, reducing the dependence on large GPU clusters and making it easier to develop and deploy high-performance, enterprise-level generative AI applications.
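
Snowflake’s fine-tuning stack itself is purpose-built, but the general technique — mixed precision plus parameter-efficient adapters — can be sketched with off-the-shelf tooling. The following uses Hugging Face Transformers and PEFT on a smaller Llama 3.1 sibling; it illustrates the idea, not Snowflake’s implementation, and the 405B model would still require multi-GPU sharding in practice.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Smaller sibling used for illustration; 405B needs sharding across GPUs.
model_name = "meta-llama/Llama-3.1-8B"

# Mixed precision: load weights in bfloat16 to halve memory vs FP32.
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA: train small low-rank adapters instead of all base weights.
config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32,
                    target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```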

In addition, Snowflake has developed an optimized infrastructure for fine-tuning, including model distillation, safety guardrails, retrieval-augmented generation (RAG) and synthetic data generation, so that enterprises can easily get started with these applications within Cortex AI.
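
As an example of how RAG could be wired up on top of these building blocks, the sketch below reuses the cursor from the earlier connection example and assumes Cortex’s EMBED_TEXT_768 function, Snowflake’s VECTOR_COSINE_SIMILARITY, and a hypothetical docs table with a text chunk and a precomputed embedding emb:

```python
# Hypothetical schema: docs(chunk TEXT, emb VECTOR(FLOAT, 768)).
retrieval = """
WITH q AS (
  SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', %s) AS v
)
SELECT chunk
FROM docs, q
ORDER BY VECTOR_COSINE_SIMILARITY(docs.emb, q.v) DESC
LIMIT 3
"""
question = "How do I rotate my API keys?"
cur.execute(retrieval, (question,))
context = "\n".join(row[0] for row in cur.fetchall())

# Ground the model's answer in the retrieved chunks.
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)",
    ("llama3.1-405b",
     f"Answer using only this context:\n{context}\n\nQ: {question}"),
)
print(cur.fetchone()[0])
```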

The announcements fit squarely into Snowflake’s strategy: as a cloud data specialist, it wants to develop into an AI partner as well. On the one hand, Snowflake is uniquely positioned to connect AI developments with customers’ data; on the other hand, it has little choice. The AI hype is in full swing, and without relevant AI solutions for their data, customers could look elsewhere.

Source: IT Daily
