Hugging Face has announced HUGS, an alternative to Nvidia Inference Microservices (NIMs) that lets users deploy and run AI models on a wide range of hardware.
HUGS, short for Hugging Face Generative AI Services, is built on the open-source Text Generation Inference (TGI) and Transformers frameworks, which makes the containers compatible with a range of hardware, including Nvidia and AMD GPUs. According to The Register, support for specialized AI accelerators such as Amazon Inferentia or Google's TPUs could be added in the future.
Like Nvidia's NIMs, HUGS offers preconfigured container images that can be easily deployed via Docker or Kubernetes and accessed through OpenAI-compatible API calls, as in the sketch below.
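As a rough sketch of what that looks like in practice, a running HUGS container could be queried with the standard OpenAI Python client pointed at the container's endpoint. The base URL, port, API key handling, and model identifier below are illustrative assumptions, not documented values:

```python
# Minimal sketch: querying a HUGS container through its
# OpenAI-compatible chat completions API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local container endpoint
    api_key="not-needed",  # assumption: a locally run container ignores the key
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model ID
    messages=[
        {"role": "user", "content": "Summarize what HUGS is in one sentence."}
    ],
)
print(response.choices[0].message.content)
```

Because the interface mirrors the OpenAI API, existing client code can in principle be repointed at a HUGS deployment by changing only the base URL and model name.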
Although HUGS is built on open-source technologies, the service itself is not free. Deployed on AWS or Google Cloud, HUGS costs about $1 per container per hour. By comparison, Nvidia charges $1 per GPU per hour for NIMs in the cloud, or $4,500 per GPU per year for on-premises use (roughly $0.51 per GPU-hour if run around the clock). Support for multiple hardware platforms, however, gives customers more flexibility.
Flexibility for smaller users
For smaller deployments, HUGS containers will be available through DigitalOcean with no additional charge for the software; customers still pay for the underlying compute. DigitalOcean recently started offering GPU virtual machines powered by Nvidia's H100 accelerators, with prices ranging from $2.50 to $6.74 per hour depending on the number of GPUs and the length of the commitment.
Hugging Face will also make the new service available to its Enterprise Hub subscribers, who pay $20 per user per month and can deploy HUGS on their own infrastructure.
As for supported models, Hugging Face is targeting these open models for now: Meta's Llama 3.1, Mistral's Mixtral, Alibaba's Qwen 2.5, and Google's Gemma 2. The company expects to add more models in the future, including Microsoft's Phi series.