Ryan Daws is a senior editor at TechForge Media with over a decade of experience in crafting compelling narratives and making complex topics accessible. His articles and interviews with industry leaders have earned him recognition as a key influencer by organisations like Onalytica. Under his leadership, publications have been praised by analyst firms such as Forrester for their excellence and performance. Connect with him on X (@gadget_ry) or Mastodon (@gadgetry@techhub.social)
Hugging Face has joined forces with NVIDIA to bring inference-as-a-service capabilities to one of the world’s largest AI communities. This collaboration, announced at the SIGGRAPH conference, will provide Hugging Face’s four million developers with streamlined access to NVIDIA-accelerated inference on popular AI models.
The new service enables developers to swiftly deploy leading large language models, including the Llama 3 family and Mistral AI models, with optimisation from NVIDIA NIM microservices running on NVIDIA DGX Cloud. This integration aims to simplify the process of prototyping with open-source AI models hosted on the Hugging Face Hub and deploying them in production environments.
For Enterprise Hub users, the offering includes serverless inference, promising increased flexibility, minimal infrastructure overhead, and optimised performance through NVIDIA NIM. This service complements the existing Train on DGX Cloud AI training service available on Hugging Face, creating a comprehensive ecosystem for AI development and deployment.
The new tools are designed to address the challenges faced by developers navigating the growing landscape of open-source models.
By providing a centralised hub for model comparison and experimentation, Hugging Face and NVIDIA are lowering the barriers to entry for cutting-edge AI development. Accessibility is a key focus, with the new features available through simple “Train” and “Deploy” drop-down menus on Hugging Face model cards, allowing users to get started with minimal friction.
At the heart of this offering is NVIDIA NIM, a collection of AI microservices that includes both NVIDIA AI foundation models and open-source community models. These microservices are optimised for inference using industry-standard APIs, offering significant improvements in token processing efficiency – a critical factor in language model performance.
The benefits of NIM extend beyond mere optimisation. When accessed as a NIM, models like the 70-billion-parameter version of Llama 3 can achieve up to 5x higher throughput compared to off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems. This performance boost translates to faster, more robust results for developers, potentially accelerating the development cycle of AI applications.
Underpinning this service is NVIDIA DGX Cloud, a platform purpose-built for generative AI. It offers developers scalable GPU resources that support every stage of AI development, from prototype to production, without the need for long-term infrastructure commitments. This flexibility is particularly valuable for developers and organisations looking to experiment with AI without significant upfront investments.
As AI continues to evolve and find new applications across industries, tools that simplify development and deployment will play a crucial role in driving adoption. This collaboration between NVIDIA and Hugging Face empowers developers with the resources they need to push the boundaries of what’s possible with AI.
(Image Credit: NVIDIA)
See also: OpenAI slashes AI costs with high-performance GPT-4o mini
Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.
Explore other upcoming enterprise technology events and webinars powered by TechForge here.
Tags: AI, ai development, artificial intelligence, development, hugging face, llama 3, nvidia