Anyscale Endpoints Preview: Fast, Cost-Efficient, and Scalable LLM APIs

By Ameer Haj Ali and Robin Singh   

TLDR: Anyscale is releasing a preview of Anyscale Endpoints, which lets LLM developers run and fine-tune open-source LLMs quickly, cost-efficiently, and at scale. Get started now.


Generative AI and Large Language Models (LLMs) have propelled the AI domain to new heights in recent times. However, along with the evolution and capabilities of these sophisticated models, numerous challenges arise when it comes to their deployment and fine-tuning. Recognizing these hurdles, Anyscale is releasing a preview of Anyscale Endpoints to help developers integrate fast, cost-efficient, and scalable LLM APIs.

Here's a quick glimpse at what Anyscale Endpoints offers:

  • State-of-the-art open-source and proprietary performance and cost optimizations.

  • A serverless approach to running open-source LLMs, mitigating infrastructure complexities.

  • A seamless transition for running base or fine-tuned models on your cloud.

  • Streaming responses.

  • Built on the power of the Ray Serve and Ray Train libraries.

Getting started with Anyscale Endpoints:

  • Integration into your workflow is straightforward: signing up takes less than two minutes.

  • Compatibility with the OpenAI API and SDK enables a smooth integration of Anyscale LLM Endpoints with minimal code changes.
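Because Anyscale Endpoints is compatible with the OpenAI API, an existing OpenAI-style chat-completion request should need little more than a base-URL and API-key change. Here is a minimal sketch of that request shape; the base URL and model name below are illustrative assumptions, not verified values, so check the Anyscale Endpoints documentation for the real ones:

```python
import os

# Assumed values for illustration only -- consult Anyscale's docs
# for the actual endpoint URL and the list of supported models.
ANYSCALE_BASE_URL = "https://api.endpoints.anyscale.com/v1"
API_KEY = os.environ.get("ANYSCALE_API_KEY", "<your-api-key>")

def build_chat_request(model, user_message, stream=False):
    """Assemble a chat-completion payload in the OpenAI API shape,
    which Anyscale Endpoints is described as being compatible with."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        # Set stream=True to receive tokens incrementally as they arrive.
        "stream": stream,
    }

payload = build_chat_request("meta-llama/Llama-2-70b-chat-hf", "Say hello.")
# An actual call would POST this payload to
# f"{ANYSCALE_BASE_URL}/chat/completions" with an Authorization header --
# or, equivalently, reuse the OpenAI SDK after pointing its base URL
# and API key at Anyscale.
```

Because the payload matches the OpenAI schema, existing client code, retry logic, and tooling written against the OpenAI SDK can in principle be redirected with minimal changes.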

Additional services from the Anyscale Platform:

  • If you need to fine-tune and deploy models on Anyscale's cloud infrastructure, our platform offers a comprehensive suite of services.

  • For a more secure, customized environment, the Anyscale platform provides the resources and support to fine-tune and deploy LLMs in your own cloud, managed by Anyscale.

Over the years, we have developed a robust and scalable infrastructure through continuous iterations and valuable feedback from users and customers. The stability, reliability, and observability of our managed Ray solution on the Anyscale platform allow you to focus on your AI applications, leaving the complexities of infrastructure management to us. With an emphasis on advanced autoscaling, smart instance selection, intelligent spot instance support, and user-customized Docker images, Anyscale Endpoints serves as the ideal platform for accelerating your LLM application development.

We are excited to see the innovation that Anyscale Endpoints will bring to your generative AI applications and look forward to your feedback! Join us in this transformative journey by diving into the Anyscale platform.

Next steps

Anyscale's Platform in your Cloud

Get started today with Anyscale's self-service AI/ML platform:

  • Powerful, unified platform for all your AI jobs from training to inference and fine-tuning
  • Powered by Ray. Built by the Ray creators. Ray is the high-performance technology behind many of the most sophisticated AI projects in the world (OpenAI, Uber, Netflix, Spotify)
  • AI App building and experimentation without the Infra and Ops headaches
  • Multi-cloud and on-prem hybrid support