Run, Fine-Tune, and Scale LLMs via production-ready APIs
Choose how to deploy
Fast, cost-efficient, serverless APIs for LLM serving and fine-tuning
Easy-to-use APIs you can query and fine-tune to power your apps without managing infrastructure. Get started in minutes.
Dedicated GPUs to deploy your custom models and scale your production applications.
| | Endpoints | Dedicated Endpoints |
| --- | --- | --- |
| General | | |
| Infrastructure | Anyscale Cloud | Anyscale Cloud |
| Accelerator | Auto | Select from A10, A100, H100 |
| Scaling | Auto | Configurable, up to 1,000 GPUs |
| Rate Limiting | 30 concurrent requests; cap increased upon request | No rate limiting |
| Compatibility | OpenAI-compatible API, integration with Weights & Biases | OpenAI-compatible API, integration with Weights & Biases |
| Pricing | $/token | Based on compute and accelerator usage |
| Observability | Token usage, system status | Built-in Grafana support, alerting |
| Support | Fast response through email | Dedicated support |
| Customizability | | Bring your custom model and optimize for throughput or latency |
| Performance Optimization | | |
| Models | | |
| Base Model Support | Llama 2 family (7B, 13B, 70B, Code Llama), Mistral 7B, and more | Any compatible LLM or embedding model |
| Fine-Tuning | | |
| Fine-tuning | | |
| Security | | |
| Data Privacy | Data is never used for training. | Data is never used for training. |
| Security | API key | SSO/SAML + API key |
| Logging | | |
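
Because both plans expose an OpenAI-compatible API, existing OpenAI client code can typically be pointed at an endpoint just by swapping the base URL and API key. The sketch below illustrates that pattern with the openai Python SDK; the base URL, model id, and ANYSCALE_API_KEY environment variable are illustrative assumptions, not documented values, so check the current docs for the exact ones.

```python
# Minimal sketch: query an OpenAI-compatible endpoint with the openai Python SDK.
# Base URL, model id, and env var name below are illustrative assumptions.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",  # assumed base URL
    api_key=os.environ["ANYSCALE_API_KEY"],            # assumed env var holding your key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # illustrative model id; see the model list
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of serverless LLM APIs."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Apart from the base URL and key, the request and response shapes follow the familiar chat-completions format, which is what makes switching an existing app over largely a configuration change.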
For $1 or less per million tokens, use our growing list of high-performance models or deploy your own.
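
To get a rough sense of what $/token pricing implies, here is a back-of-the-envelope estimate; the price, request size, and request count are assumptions chosen only to make the arithmetic concrete.

```python
# Back-of-the-envelope cost estimate under $/token pricing.
# Price, request size, and request count are assumptions for illustration only.
price_per_million_tokens = 1.00   # USD, the upper bound quoted above
tokens_per_request = 2_000        # assumed prompt + completion tokens
requests = 5_000                  # assumed request volume

total_tokens = tokens_per_request * requests                 # 10,000,000 tokens
cost = total_tokens / 1_000_000 * price_per_million_tokens   # ~$10.00
print(f"{requests} requests ~ {total_tokens:,} tokens ~ ${cost:.2f}")
```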
Validate your ideas with a familiar API, fine-tune LLMs to get the quality you need, deploy your apps, and repeat.
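
That validate, fine-tune, deploy loop can be scripted against the same OpenAI-compatible surface, assuming the endpoint exposes OpenAI-style file and fine-tuning routes. The sketch below is a hypothetical outline only: the base URL, model id, and training-file format are assumptions, and the actual fine-tuning workflow may differ.

```python
# Hedged sketch: submit a fine-tuning job through OpenAI-style routes.
# Assumes the endpoint supports /files and /fine_tuning/jobs; names are illustrative.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",  # assumed base URL
    api_key=os.environ["ANYSCALE_API_KEY"],
)

# Upload a JSONL file of chat-formatted training examples (format is assumed).
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job on an assumed base model id.
job = client.fine_tuning.jobs.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    training_file=training_file.id,
)

print(job.id, job.status)
```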
Anyscale Endpoints quickly adds new models, optimizations, and integrations to give you the best tools to build the best apps.