Announcing Aviary: Open Source Multi-LLM Serving

By Waleed Kadous   

Exec Summary: Open source LLMs are getting better every day, so today we’re open sourcing Aviary to make testing, evaluating and deploying open source LLMs easier. We found this harder than we thought it should be, so we used Ray Serve to fix it. 

  • Try it out with Aviary Explorer here

  • Github Repo here

  • Video demo and explanation here

  • Slack support here. Forum support here

Want a managed version of Aviary? Sign up here.

We’re excited to announce the release of Aviary: a new open source project that makes it easy to efficiently self-host and serve multiple LLMs. 

We’re big fans of open source LLMs here at Anyscale. The rate of improvement of open source LLMs has been nothing short of phenomenal, and it has given the AI community many options beyond the big “closed” players such as OpenAI, Anthropic and Cohere. 

Why are companies exploring self-hosted open source LLMs? There are a few reasons:

  • Cost: LLMs are very expensive to operate. A single query can cost tens of cents. Using open source can be considerably cheaper. 

  • Latency: Latency is one of the biggest issues with deploying LLMs. Colocating LLMs with business logic, sometimes even on the same machine, keeps it low.  

  • Transparency: Organizations want to understand exactly what is happening inside their models, and open source makes that possible.

  • Deployment flexibility: Open source models can be deployed on premises, in the cloud using the user’s own cloud resources, or as part of a SaaS offering.  

  • Data control: Data governance is much easier to guarantee if the data never leaves your control. 

  • Customization: Open models can be customized or fine-tuned on proprietary data to produce high-quality, domain-specific answers.

We were writing some tools for ourselves to really understand these trade-offs, as well as the advantages of closed models, which center mainly on quality. A funny thing happened along the way, however: we discovered that while open source models have improved, the infrastructure for serving LLMs has not kept up. We started building our own library on top of Ray Serve to load the models, autoscale them efficiently, and handle variations between models in things like stop tokens. 

It was then that we realized the rest of the LLM community might benefit from this library as well. So today we are releasing Aviary as open source. 

Aviary helps users take advantage of open source models by building on the solid foundation of Ray, the popular framework for scalable AI. In particular, it builds on Ray Serve, the highly flexible serving framework that is part of Ray. 

Aviary does this by: 

  • Providing an extensive suite of pre-configured open source LLMs, with reasonable defaults that work out of the box. 

  • Integrating acceleration approaches like DeepSpeed with the packaged LLMs. 

  • Simplifying the deployment of multiple LLMs within a single unified framework. 

  • Simplifying the addition of new LLMs, which in most cases takes only minutes

  • Offering unique autoscaling support, including scale-to-zero (a first in open source). 
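Scale-to-zero means that a model with no traffic holds no replicas, and therefore no GPUs, until a request arrives. As a minimal sketch of the idea, the function here computes a target replica count, assuming the autoscaler aims for a fixed number of in-flight requests per replica. The `desired_replicas` function and its parameters are hypothetical; this is not Aviary's or Ray Serve's implementation, although Ray Serve's autoscaling configuration does expose a comparable per-replica target along with minimum and maximum replica counts.

```python
import math


def desired_replicas(ongoing_requests: int,
                     target_per_replica: float,
                     min_replicas: int = 0,
                     max_replicas: int = 4) -> int:
    """Return how many model replicas to run for the current load.

    Hypothetical rule: provision enough replicas that each handles roughly
    `target_per_replica` in-flight requests, clamped to
    [min_replicas, max_replicas]. With min_replicas=0, an idle model
    releases all of its replicas (scale-to-zero).
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("require 0 <= min_replicas <= max_replicas")
    needed = math.ceil(ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


# Idle model: no traffic, so no replicas (and no GPUs) are held.
print(desired_replicas(0, target_per_replica=2))   # prints 0
# First request arrives: scale up from zero to one replica.
print(desired_replicas(1, target_per_replica=2))   # prints 1
# Burst of traffic: capped at max_replicas.
print(desired_replicas(20, target_per_replica=2))  # prints 4
```

A production autoscaler would typically also smooth the load signal over a time window and wait before scaling down, because scaling back up from zero means paying the cold-start cost of loading model weights onto a GPU.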

We’ve also included a demo Gradio frontend that shows off what’s possible with these capabilities, as well as some command line tools to address the evaluation questions that originally motivated this project. 

[Image: Aviary Light Mode]

Existing solutions offer some of these features individually, and we are grateful to build on their foundations (such as Hugging Face’s text-generation-inference). However, none of them brings these capabilities together in a way that is convenient for users. We also plan to keep expanding Aviary’s feature set, adding support for streaming, continuous batching and more.

Aviary is also open to community contributions, especially new LLMs. We’ll redeploy contributed models in production so the rest of the world can use them too. We will also support Aviary through Slack, Discourse, GitHub issues and an LLM-based RayBot. 

At the same time, we understand that not everyone wants to take on the additional challenges of maintaining their own aviary of models. For that reason, we will also offer a hosted version of Aviary that builds on our Anyscale managed Ray platform and adds deployment features, including:

  • Spot instances with on-demand fallback, for potentially large savings

  • Zero downtime upgrades with no dropped requests

  • Faster deployment and autoscaling

  • Improved observability tools

We are also announcing that Aviary will be available at no additional charge to existing Anyscale customers via our workspaces solution, and we’re actively onboarding new Aviary customers now. If you’d like to deploy Aviary, please reach out to us here.

Next steps

Anyscale's Platform in your Cloud

Get started today with Anyscale's self-service AI/ML platform:


  • Powerful, unified platform for all your AI jobs from training to inference and fine-tuning
  • Powered by Ray. Built by the Ray creators. Ray is the high-performance technology behind many of the most sophisticated AI projects in the world (OpenAI, Uber, Netflix, Spotify)
  • AI app building and experimentation without the infrastructure and ops headaches
  • Multi-cloud and on-prem hybrid support