The Anyscale Team

import openai
client = openai.OpenAI(
    base_url = "https://api.endpoints.anyscale.com/v1",
    api_key = "esecret_YOUR_API_KEY"
)
embedding = client.embeddings.create(
    model="thenlper/gte-large",
    input="Your text string goes here",
)
print(embedding.model_dump())


{
    'data': [
        {'embedding': [...],
         'index': 0,
         'object': 'embedding'
         }
     ],
     'model': 'thenlper/gte-large'
   ...
}

import openai

client = openai.OpenAI(
    base_url = "https://api.endpoints.anyscale.com/v1",
    api_key = "esecret_yourAuthTokenHere"
)

# Upload the file
file_name = "train.jsonl"
file = client.files.create(
  file=open(file_name, "rb"),
  purpose="fine-tune",
  user_provided_filename=file_name,
)

# Launch the finetuning job
client.fine_tuning.jobs.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    training_file="file_123",
)

Retrieval-augmented generation, or RAG applications are among the most popular applications built with LLMs. Embedding endpoints enables developers to use open-source embedding models. Today, we are starting with gte-large, and developers can access it at $0.05/MTokens. We plan to add more models in the future, and users can request newer embedding models by filling out this google form. For more info visit here.

anyscale-endpoints-llama-2

Access Anyscale today to see how companies using Anyscale and Ray benefit from rapid time-to-market and faster iterations across the entire AI lifecycle.

Anyscale Endpoints: Embedding endpoint, Llama-2 70B fine-tuning and improved sign-up experience

The Call for Speakers for Ray Summit 2025 has Been Extended!

Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.

Model	Fixed Cost/Run	Price ($/M tokens)
Llama-2-7b-chat-hf	5	1
Llama-2-13b-chat-hf	5	2
Llama-2-70b-chat-hf	5	4

Model	Price ($/M tokens)
Llama-2-7b-chat-hf	0.25
Llama-2-13b-chat-hf	0.50
Llama-2-70b-chat-hf	1.00

Anyscale Endpoints: Embedding endpoint, Llama-2 70B fine-tuning and improved sign-up experience

LinkEmbedding Endpoints

LinkLlama-2 70B fine tuning

LinkFine-tuning Pricing

LinkFine-tuned model inference Pricing

LinkImprovements to user experience

Table of contents

Sharing

Sign up for product updates

Recommended content

Your Data and AI Frameworks Evolved – What About Your Distributed Compute Framework?

Ray on Alibaba Cloud: Building an ML Platform

An Open Source Stack for AI Compute: Kubernetes + Ray + PyTorch + vLLM

Ready to try Anyscale?