Anyscale Endpoints: Embedding endpoint, Llama-2 70B fine-tuning and improved sign-up experience

By Anyscale team   

Update June 2024: Anyscale Endpoints (Anyscale's LLM API Offering) and Private Endpoints (self-hosted LLMs) are now available as part of the Anyscale Platform. Click here to get started on the Anyscale platform.

LinkEmbedding Endpoints

Retrieval-augmented generation, or RAG applications are among the most popular applications built with LLMs. Embedding endpoints enables developers to use open-source embedding models. Today, we are starting with gte-large, and developers can access it at $0.05/MTokens. We plan to add more models in the future, and users can request newer embedding models by filling out this google form. For more info visit here.

Example usage:

1
2
3
4
5
6
7
8
9
10
import openai
client = openai.OpenAI(
    base_url = "https://api.endpoints.anyscale.com/v1",
    api_key = "esecret_YOUR_API_KEY"
)
embedding = client.embeddings.create(
    model="thenlper/gte-large",
    input="Your text string goes here",
)
print(embedding.model_dump())

The output:

1
2
3
4
5
6
7
8
9
10
{
    'data': [
        {'embedding': [...],
         'index': 0,
         'object': 'embedding'
         }
     ],
     'model': 'thenlper/gte-large'
   ...
}

LinkLlama-2 70B fine tuning

Fine tuning is a popular technique to allow for model personalization and optimization, making it possible to improve model quality for specific uses, while also reducing costs and improving performance.

We have seen good traction on Llama-2 7B and 13B fine-tuning API. Today we are extending the fine-tuning functionality to the Llama-2 70B model. Llama-2 70B is the largest model in the Llama 2 series of models, and starting today, you can fine-tune it on Anyscale Endpoints with a $5 fixed cost per job run and $4/M tokens of data. You can start inference on the fine-tuned model at $1/M tokens. For more info visit here.

LinkFine-tuning Pricing

Model

Fixed Cost/Run

Price ($/M tokens)

Llama-2-7b-chat-hf

5

1

Llama-2-13b-chat-hf

5

2

Llama-2-70b-chat-hf

5

4

LinkFine-tuned model inference Pricing

Model

Price ($/M tokens)

Llama-2-7b-chat-hf

0.25

Llama-2-13b-chat-hf

0.50

Llama-2-70b-chat-hf

1.00

Example usage:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import openai

client = openai.OpenAI(
    base_url = "https://api.endpoints.anyscale.com/v1",
    api_key = "esecret_yourAuthTokenHere"
)

# Upload the file
file_name = "train.jsonl"
file = client.files.create(
  file=open(file_name, "rb"),
  purpose="fine-tune",
  user_provided_filename=file_name,
)

# Launch the finetuning job
client.fine_tuning.jobs.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    training_file="file_123",
)

LinkImprovements to user experience

Users can now get started with Anyscale Endpoints without a credit card. Get started with free credits and add payment information on the account later.

Ready to try Anyscale?

Access Anyscale today to see how companies using Anyscale and Ray benefit from rapid time-to-market and faster iterations across the entire AI lifecycle.