Open Source LLMs: Viable for Production or a Low-Quality Toy?

By Anyscale Ray Team   

This blog post is part of our Ray Summit 2023 highlights series, where we summarize the most exciting talks from our recent LLM developer conference.

Disclaimer: This summary was AI-generated from the video transcript, with human edits.

Key Takeaways

  1. Open source large language models (LLMs) like LLaMA are viable for production use cases. Companies like Anyscale and Hugging Face offer managed hosting of open source LLMs, making them easy to use.

  2. Open source LLMs can be much cheaper (e.g. 30x cheaper) than proprietary models like GPT-3 while still providing good enough performance for many applications. This cost difference allows some products/use cases to exist that wouldn't be feasible with proprietary models.

  3. Fine-tuned open source LLMs can sometimes outperform even the largest proprietary general-purpose LLMs like GPT-4. However, fine-tuning is not a silver bullet: it works better for teaching language formats and style than for adding factual knowledge.

  4. Open source LLMs currently lag behind proprietary models in some areas like output quality, following instructions precisely, support for function templates, and large context window sizes. But many of these gaps are being actively worked on.

  5. Using a hybrid approach with both open source and proprietary LLMs can provide a good cost/performance tradeoff. The speaker gave an example where they send 5% of queries to GPT-4 and 95% to an open source LLM.
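That hybrid routing can be sketched in a few lines. The model names and the 5% split below mirror the talk's example; the identifiers are placeholders for whatever endpoints you actually use:

```python
import random

# Placeholder model identifiers; substitute your actual endpoints.
PROPRIETARY_MODEL = "gpt-4"
OPEN_MODEL = "meta-llama/Llama-2-70b-chat-hf"

def route_query(query: str, proprietary_fraction: float = 0.05, rng=random) -> str:
    """Return the model a query should be sent to.

    A fixed fraction of traffic (5% in the talk's example) goes to the
    proprietary model; the rest is served by the cheaper open model.
    """
    if rng.random() < proprietary_fraction:
        return PROPRIETARY_MODEL
    return OPEN_MODEL

# Rough sanity check: over many queries, ~5% should hit the proprietary model.
random.seed(0)
choices = [route_query("example") for _ in range(10_000)]
share = choices.count(PROPRIETARY_MODEL) / len(choices)
```

Real systems usually route on query difficulty or a confidence score rather than at random, but even this naive split captures most of the cost savings.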

Open source large language models (LLMs) are increasingly becoming viable for production, challenging the notion that they are low-quality toys. This blog post explores the strengths and weaknesses of open LLMs compared to proprietary ones like GPT-4. We delve into factors such as model quality, instruction following, function templates, and context window sizes, highlighting the cost-effectiveness and potential feasibility of open LLMs in various applications.

Open Source LLMs: Viable for Production or a Low-Quality Toy?

In recent years, the landscape of language models has undergone a significant transformation, with open-source models gaining attention and challenging the perception that they are merely low-quality toys. This shift raises the question: are Open Source Language Models (LLMs) truly viable for production applications? To answer this question, we'll dissect the insights provided in a recent talk titled "Open Source LLMs: Viable for Production or a Low-Quality Toy?" and explore the strengths and weaknesses of open LLMs in comparison to their proprietary counterparts.

Quality Matters

The talk begins by addressing the elephant in the room: model quality. Historically, proprietary LLMs such as GPT-4 have been considered the gold standard for high-quality outputs. They excel in analogical reasoning, planning, and generating refined answers. However, the speaker introduces a twist: open LLMs like Llama 2 70B are not to be dismissed. While GPT-4 might lead in certain tasks, open LLMs can rival their proprietary counterparts, particularly in tasks like summarization and fine-tuning.

Instruction Following: Alignment and Flexibility

A surprising revelation emerges: proprietary LLMs follow explicit instructions more reliably, thanks to reinforcement learning from human feedback (RLHF). Open LLMs, on the other hand, might need an additional fine-tuning step to ensure precise instruction following. However, the speaker suggests a workaround: task-specific LLMs that instruct other LLMs, resulting in better performance and cost-effectiveness.
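One low-cost way to get better instruction following out of an open model, before any fine-tuning, is simply to use the chat template it was trained with. As an illustration, Llama 2's chat models expect the [INST]/&lt;&lt;SYS&gt;&gt; format; the helper below is a minimal single-turn sketch (consult the model card for the full multi-turn template):

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 2 chat format.

    Llama 2's chat models were fine-tuned on this template; sending a
    bare instruction without it often degrades instruction following.
    """
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_prompt(
    "Answer in exactly one sentence.",
    "Summarize the benefits of open source LLMs.",
)
```

Hosted endpoints and libraries typically apply this template for you, but it is worth knowing what is happening under the hood when an open model seems to "ignore" instructions.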

Function Templates: Locking in Format

Proprietary LLMs introduce a convenient feature—function templates. These templates allow users to lock in the desired format, eliminating the need for post-processing corrections. Open LLMs, initially lacking this feature, are catching up. While function templates might not be readily available in all open-source models, ongoing projects indicate that this gap could be closed in the near future.

Context Window Sizes: The Larger, the Better

Context window sizes play a crucial role in handling information, and proprietary LLMs have been ahead of the curve with larger context windows. Open-source models like Llama 2 70B might lag in this aspect, but promising projects are underway to address this limitation. As technology evolves, the hope is that open LLMs will soon match, if not exceed, the context window sizes offered by their proprietary counterparts.
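Until then, a common workaround is to split long inputs into chunks that each fit the open model's window, process each chunk, and combine the results. A minimal sketch (whitespace word count stands in for a real tokenizer here, which is an assumption; production code should count tokens with the model's actual tokenizer):

```python
def chunk_text(text: str, max_tokens: int = 4096, overlap: int = 128) -> list[str]:
    """Split text into overlapping chunks that fit a model's context window.

    Overlap preserves some context across chunk boundaries so sentences
    cut in half at a boundary still appear whole in the next chunk.
    """
    words = text.split()
    if not words:
        return []
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks

# A 10,000-word document against a 4,096-token window yields three chunks.
chunks = chunk_text("word " * 10_000, max_tokens=4096, overlap=128)
```

Map-reduce summarization and retrieval-augmented generation are both elaborations of this basic chunking pattern.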

Cost-Effectiveness: The Open LLM Advantage

One of the most compelling arguments for open LLMs is cost-effectiveness. The talk emphasizes that open models are not just marginally cheaper but can be radically cheaper. This cost advantage opens the door to a myriad of possibilities, making certain applications economically feasible that would otherwise be financially prohibitive using proprietary models.
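A back-of-envelope calculation shows why "radically cheaper" matters. The per-million-token prices below are illustrative placeholders, not quoted rates for any provider; the point is the ratio, which echoes the roughly 30x figure from the takeaways:

```python
# Illustrative per-million-token prices (placeholders, not quoted rates).
PRICE_PER_M_TOKENS = {
    "proprietary": 30.00,  # e.g. a large closed model
    "open-hosted": 1.00,   # e.g. a hosted open 70B model
}

def monthly_cost(model: str, tokens_per_day: int) -> float:
    """Estimated monthly spend for a given daily token volume."""
    return PRICE_PER_M_TOKENS[model] * tokens_per_day / 1_000_000 * 30

volume = 50_000_000  # 50M tokens per day
closed = monthly_cost("proprietary", volume)   # $45,000/month
opened = monthly_cost("open-hosted", volume)   # $1,500/month
ratio = closed / opened
```

At this volume the difference is not a line item, it is the difference between a product that can exist and one that cannot, which is exactly the feasibility argument the talk makes.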

The Road Ahead: Challenges and Opportunities

While open LLMs have made significant strides, challenges persist. Quality improvements, expanded context windows, and the integration of function templates are ongoing efforts within the open-source community. However, the potential of open LLMs is unmistakable. Their ability to challenge proprietary models in various tasks and the cost advantage they offer make them a formidable option for developers and businesses.

Conclusion: Choosing the Right Tool for the Job

In conclusion, the debate about open LLMs being viable for production or mere toys is shifting in favor of viability. The talk provides a nuanced perspective, urging us to choose the right tool for the right job. Open LLMs are not a one-size-fits-all solution, but they bring cost advantages and capabilities that can be leveraged effectively in specific scenarios.

As the open-source community continues to innovate and address current limitations, the future looks promising for open LLMs. Developers and businesses alike should keep a watchful eye on this evolving landscape, ready to embrace the opportunities presented by these increasingly powerful and cost-effective language models.

Sign up now for Anyscale Endpoints to get started fast, or contact sales if you are looking for a comprehensive overview of the Anyscale platform.

Next steps

Anyscale's Platform in your Cloud

Get started today with Anyscale's self-service AI/ML platform:


  • Powerful, unified platform for all your AI jobs from training to inference and fine-tuning
  • Powered by Ray. Built by the Ray creators. Ray is the high-performance technology behind many of the most sophisticated AI projects in the world (OpenAI, Uber, Netflix, Spotify)
  • AI App building and experimentation without the Infra and Ops headaches
  • Multi-cloud and on-prem hybrid support