
Anyscale team

Ray Summit 2024, the annual Ray community conference, returns to San Francisco, CA, from September 30 through October 2. It is a two-day conference, with a third day dedicated to training.


Ray Summit 2024 Call for Proposals is now open

Ramit Hora

Intel and Anyscale have integrated Ray with Intel Gaudi 3, an AI accelerator optimized for efficient training and other machine learning workloads.


Accelerating AI: Harnessing Intel® Gaudi® 3 with Ray 2.10


Update on Ray CVE-2023-48022: New Verification Tooling Available

Neelay Shah

Akshay Malik

Anyscale is teaming with NVIDIA to combine the developer productivity of Ray Serve and RayLLM with the cutting-edge optimizations from NVIDIA Triton Inference Server software and the NVIDIA TensorRT-LLM library.


Low-latency Generative AI Model Serving with Ray, NVIDIA Triton Inference Server, and NVIDIA TensorRT-LLM

Artur Niederfahrenhorst

Kourosh Hakhamaneshi

Drawing on the popular “Needle in a Haystack” benchmark and on RAG, we describe how we created a problem-specific fine-tuning dataset that extends the context length of models, in order to build better RAG systems.


Fine-tuning LLMs for longer context and better RAG systems

Scott Lee

Kyle Huang

Cheng Su

Hao Chen

Generating embeddings is a critical task in developing successful RAG applications, and Anyscale combined with Pinecone is the most cost-efficient solution for this workflow.


RAG at Scale: 10x Cheaper Embedding Computations with Anyscale and Pinecone


Comparing LLM performance: Introducing the Open Source Leaderboard for LLM APIs

Endpoints Team

Anyscale Endpoints is the first LLM API to offer a wide range of capabilities, letting developers build applications not only by serving and fine-tuning LLMs but also by using embedding services and function calling.


Anyscale Endpoints: JSON Mode, Function calling, New models: Llama Guard and Mistral-7B-OpenOrca


Anyscale Endpoints: JSON Mode and Function calling Features

Portkey and Anyscale Endpoints, when used in combination, offer a comprehensive LLMOps stack for developing AI applications with open-source Large Language Models (LLMs).


Portkey ♥️ Anyscale Endpoints

Update on Ray CVEs CVE-2023-6019, CVE-2023-6020, CVE-2023-6021, CVE-2023-48022, CVE-2023-48023

Retrieval-augmented generation (RAG) applications are among the most popular applications built with LLMs. The embedding endpoint enables developers to use open-source embedding models. We are starting today with gte-large, which developers can access at $0.05 per million tokens. We plan to add more models in the future, and users can request newer embedding models by filling out this Google form. For more information, visit here.


Anyscale Endpoints: Embedding endpoint, Llama-2 70B fine-tuning and improved sign-up experience


Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.
