
Anyscale and Lambda - Addressing AI Scarcity with Engineering

By The Anyscale Team | November 21, 2023

The Generative AI boom is well underway. Seemingly overnight, large language models (LLMs) have created a global awareness of the enormous untapped potential the technology presents. The world has woken up, and “the race is on” across the platform vendor community to win more AI workloads.

The news isn’t all good. While ChatGPT has shown “the art of the possible”, most large organizations face a host of challenges when trying to deploy AI in production. Those challenges include everything from defining clear use cases to securing executive sponsorship to building AI skills on the deployment team. But the biggest constraint on Generative AI is the scarcity of compute, specifically specialized compute for AI workloads.

In the category of “simple, not easy” solutions, one could alleviate this resource constraint by:

More efficiently using the compute resources that we already have, so that we squeeze every drop of productivity out of what is currently available.
Increasing the supply of compute resources so that it’s faster and less expensive to acquire hardware for AI.

The good news for AI/ML practitioners is that we’re doing both - one directly, and one with an exciting partner.

At Anyscale, we continue to push the boundaries of compute efficiency - setting the record in the CloudSort benchmark, exploiting new inference engines like vLLM to deliver significant performance breakthroughs, speeding up loading of large language models, and more.

And our partners at Lambda are increasing access to specialized AI hardware by creating a purpose-built accelerated cloud. They have a long history of applying hardware in innovative ways to address compute challenges, offering Blade GPU servers back in 2017, and then introducing Lambda GPU Cloud to support deep learning workloads for customers like Apple, Raytheon, and Sony.

As regular readers know, Anyscale has a long-standing partnership with Nvidia, who took the stage earlier this fall at Ray Summit 2023 to talk about the latest integrations and optimizations of Ray and Anyscale with Nvidia hardware and software.

Given that Lambda offers a wide range of Nvidia-based instances to support AI workloads, it was natural to also validate and support our Nvidia integrations with Lambda’s offerings and tooling.

We had a great opportunity to take advantage of the agility that Lambda’s model offers. Just a couple of weeks ago, we saw LLM performance claims from different LLM API vendors circulating on social media. We wanted both to investigate those claims and to give LLM developers an open, self-serve way to make their own performance assessments.

While we at Anyscale have an abundance of GPUs supporting customer workloads, we wanted to access a robust testing configuration quickly, without any disruption to customer-supporting infrastructure.

Enter Lambda

Lambda was able to offer on-demand access to Nvidia GPUs, accelerating our testing as well as the delivery of new LLMPerf tooling. Read the details and see the test results here.

Thanks to Ray, the Anyscale Platform, and Lambda, we were able to rapidly configure a custom AI pipeline, run tests, and share the results, test harness, and other artifacts. There was no hardware-procurement cycle, and no need to repurpose machines that were serving other production workloads. That’s a win for Anyscale, and more importantly, a win for the LLM developer community, made possible by the collaboration between Lambda and Anyscale.

Accelerating the AI Future

At Anyscale, we’re always thrilled when we see newer players creating greater customer choice and advancing the state of the art in AI. We’re looking forward to offering services powered by Lambda hardware and supporting more joint customers as Lambda expands operations in 2024 and beyond.


