Anyscale Teams With NVIDIA to Supercharge LLM Performance and Efficiency

Integration of Ray and Anyscale with NVIDIA AI Software Accelerates Computing Speeds, End-to-End Development and Deployment of Generative AI LLMs and Applications.

Update June 2024: Anyscale Endpoints (Anyscale's LLM API offering) and Private Endpoints (self-hosted LLMs) are now available as part of the Anyscale Platform. Click here to get started on the Anyscale Platform.

SAN FRANCISCO – Ray Summit – Sept. 18, 2023 – Anyscale, the AI infrastructure company built by the creators of Ray, the world’s fastest-growing open-source unified framework for scalable computing, today announced a collaboration with NVIDIA to further boost the performance and efficiency of large language model (LLM) development on Ray and the Anyscale Platform for production AI.

The companies are integrating NVIDIA AI software into Anyscale’s scalable computing platforms, including Ray open source, the Anyscale Platform, and Anyscale Endpoints, announced separately today.

The open-source integrations will bring NVIDIA software, including NVIDIA TensorRT-LLM, NVIDIA Triton Inference Server, and NVIDIA NeMo, to Ray to supercharge end-to-end AI development and deployment. Making cutting-edge AI software available via open source democratizes access and greatly expands the number of developers who can use this integration.

For production AI, the companies will certify the NVIDIA AI Enterprise software suite for the Anyscale Platform, bringing enterprise-grade security, stability, and support to companies deploying AI. An additional integration with Anyscale Endpoints will bring support for the NVIDIA software to a greatly expanded pool of AI application developers via easy-to-use application programming interfaces.

“Realizing the incredible potential of generative AI requires computing platforms that help developers iterate quickly and save costs when building and tuning LLMs,” said Robert Nishihara, CEO and co-founder of Anyscale. “Our collaboration with NVIDIA will bring even more performance and efficiency to Anyscale’s portfolio so that developers everywhere can create LLMs and generative AI applications with unprecedented speed and efficiency.”

“LLMs are at the heart of today’s generative AI transformation, and the developers creating and customizing these models require full-stack computing with efficient orchestration throughout the AI life cycle,” said Manuvir Das, vice president of Enterprise Computing at NVIDIA. “The combination of NVIDIA AI and Anyscale unites incredible performance with ease of use and the ability to scale rapidly with success.”

NVIDIA AI Acceleration Speeds End-to-End Anyscale Development

NVIDIA’s open-source and production software helps boost accelerated computing performance and efficiency for generative AI development.

The integration delivers numerous benefits for customers and users:

NVIDIA TensorRT-LLM automatically scales inference to run models in parallel over multiple GPUs, which can provide up to 8X higher performance when running on NVIDIA H100 Tensor Core GPUs, compared to prior-generation GPUs. These capabilities will bring further acceleration and efficiency to Ray, which ultimately results in significant cost savings for at-scale LLM development.
NVIDIA Triton Inference Server standardizes AI model deployment and execution across every workload. It supports inference across cloud, data center, edge, and embedded devices on GPUs, CPUs, and other processors. By running multiple models concurrently, it raises GPU utilization and throughput for LLMs and reduces end-to-end latency. These capabilities will add more efficiency for developers deploying AI in production on Ray and the Anyscale Platform.
NVIDIA NeMo is an end-to-end, cloud-native framework for building, customizing, and deploying generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI. The integration of NeMo with Ray and the Anyscale Platform will enable developers to fine-tune and customize models with enterprise data, paving the way for LLMs that understand the unique offerings of individual businesses.
Anyscale Endpoints is a service that enables developers to integrate fast, cost-efficient, and scalable LLMs into their applications using popular LLM APIs. Endpoints can be tailored to specific use cases and fine-tuned with additional content and context to serve users’ specific needs while ensuring the best combination of price and performance. Endpoints is less than half the cost of comparable proprietary solutions for general workloads and up to 10X less expensive for specific tasks.

More details are available on the NVIDIA blog.

Availability

NVIDIA AI integrations with Anyscale are under development and expected to be available in Q4 2023. Practitioners interested in early access are encouraged to apply here.

To learn more about Anyscale or to join the company’s growing team, please visit https://www.anyscale.com/. To join the Ray open source community, please go to Ray.io.

About Anyscale

Anyscale is the leading AI application platform. With Anyscale, developers can build, run, and scale AI applications instantly. Anyscale is built by the creators of Ray, the world’s fastest-growing open-source unified framework for scalable computing. Thousands of companies rely on technology from Anyscale to accelerate the delivery of AI products to market at significantly reduced cost. Backed by Andreessen Horowitz, NEA, Addition, Intel Capital, and Foundation Capital, Anyscale is headquartered in San Francisco, CA. www.anyscale.com

Contact and press inquiries: anyscale@launchsquad.com