Case Study

Notion Enhances Search Capabilities with Anyscale

With Anyscale, the Notion team effortlessly scales out inference for their reranking models to deliver fast and relevant results.

Quick Takeaways

  • 2-3x faster deployment
  • 20% better latency
  • 2 months to migrate all workflows to Anyscale
  • AI infrastructure usage doubled
  • Less time spent planning and coordinating AI infrastructure

Introduction

Notion is the AI workspace teams can make their own – connecting docs, projects, and knowledge in a single, powerful platform. As one of the fastest-growing applications globally with over 100 million users, Notion helps teams create, organize, and automate their work. Its AI-powered search ensures everyone can instantly find trusted answers across their workspace and connected tools like Slack, Google Drive, GitHub, and many more.

As a team's workspace content grows, delivering fast, relevant search results becomes increasingly complex, particularly for Notion AI's Enterprise Search, which uses machine learning to re-rank and personalize search results. Providing these results at scale – and with low latency – requires not only sophisticated models but also infrastructure capable of handling intensive computational demands.

To address their growing search infrastructure demands, Notion sought a solution that would both resolve current challenges and support their roadmap for other compute-intensive AI workloads, particularly embedding generation. Anyscale's performant, resilient, and developer-friendly experience for Ray – an AI-native distributed compute framework for Python – made it the clear choice for the evolution of Notion’s AI infrastructure.

"Anyscale gives us the confidence that when we build, it just works. It removes the friction around environment management and scaling, so our teams can focus on delivering fast, intelligent experiences to our users."
Sarah Sachs | Engineering Leader, AI Modeling

The Challenge: Delivering Personalized Search at Scale

Notion's ML-based search reranking model analyzes key information and usage patterns to deliver the most relevant results, helping users quickly find content across their Notion workspace or any of the apps connected to it.
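Notion has not published details of its reranking model or serving code, but to make the workload concrete, here is a minimal sketch of how a cross-encoder reranker might be served as a Ray Serve deployment. The model name (a public Hugging Face checkpoint), request schema, and replica count are assumptions chosen purely for illustration.

```python
# Illustrative only – not Notion's model or serving code.
# A minimal cross-encoder reranker served with Ray Serve, using a public
# Hugging Face checkpoint (BAAI/bge-reranker-base) as a stand-in model.
from ray import serve
from sentence_transformers import CrossEncoder


@serve.deployment(num_replicas=2)
class Reranker:
    def __init__(self):
        # Each replica loads the cross-encoder once and keeps it in memory.
        self.model = CrossEncoder("BAAI/bge-reranker-base")

    async def __call__(self, request) -> list:
        body = await request.json()
        query, docs = body["query"], body["documents"]
        # Score every (query, document) pair, then sort documents by relevance.
        scores = self.model.predict([(query, doc) for doc in docs])
        ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
        return [{"document": doc, "score": float(score)} for doc, score in ranked]


app = Reranker.bind()
# serve.run(app)  # then POST {"query": "...", "documents": ["...", "..."]} to the endpoint
```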

Initially, a fixed-size deployment could handle relatively stable search demand based on known throughput requirements. However, as Notion's user base grew and AI adoption accelerated, the original infrastructure reached the limits of its intended design – prompting the team to look for a more scalable, future-proof architecture that could maintain consistently low-latency performance as both data volume and usage grew.

"At Notion, we knew our rapid growth would push our infrastructure beyond its limits. We needed a solution that could scale horizontally with our growth while maintaining strict low-latency performance requirements for our users. Anyscale was the answer."
Jake Sager | Software Engineer

The team was confident in Ray's proven scale and performance for AI-native workloads, but they didn’t want to shift valuable development time to managing infrastructure. What they needed was a platform that could scale horizontally with minimal overhead while providing high performance for every user, at any scale.

The Anyscale Advantage: Improved Performance, Reliability, and Developer Experience

After evaluating several providers in the space, the Notion team chose Anyscale to address both their immediate search application challenges and future ambitions for their embedding generation pipeline. 

"Anyscale made horizontal scaling easy out of the box. It gave us seamless scalability to keep up with our fast-growing, low-latency inference needs – all without adding operational burden."
Jake Sager | Software Engineer

Requirement: Low latency and high throughput
Anyscale Advantage:
  • Scale-proven, Ray-based architecture for distributed Python.
  • Direct access to Ray experts via a dedicated Slack channel for implementation guidance and ongoing support.

Requirement: Resilient, adaptive scaling
Anyscale Advantage:
  • Compute resources automatically scale up and down based on the application's demands.
  • Advanced failure handling and rollbacks to support a blue-green deployment model.

Requirement: Intuitive developer experience
Anyscale Advantage:
  • Out-of-the-box integration with popular IDEs to work locally and debug remotely.
  • Familiar Python interface with support for open-source models.
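In Ray terms, "resilient, adaptive scaling" typically maps to Ray Serve's replica autoscaling. The snippet below is a minimal sketch of that mechanism, not Notion's configuration: the replica counts, GPU setting, and target load are illustrative, and the autoscaling field names can differ slightly across Ray versions.

```python
# Illustrative Ray Serve autoscaling sketch – not Notion's production config.
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 2,             # keep a warm baseline for low latency
        "max_replicas": 20,            # cap cost while absorbing traffic spikes
        "target_ongoing_requests": 5,  # add replicas as per-replica load rises
    },
    ray_actor_options={"num_gpus": 1},  # assumes one GPU per model replica
)
class RerankerService:
    async def __call__(self, request):
        ...  # score and rerank as in the earlier sketch
```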

Powering Search at Massive Scale

Notion's search infrastructure faced an immediate challenge: handling millions of daily searches while maintaining low latency. They needed a solution that could scale quickly to meet growing demand without compromising the seamless experience their users expected.

In addition to performance, the Notion team needed something they could implement quickly. Working closely with Anyscale's support team, they completed onboarding in less than two months. The transition went smoothly thanks to Anyscale's comprehensive support framework:

  • A dedicated Slack channel connecting Notion’s engineers with over 20 Anyscale specialists across field engineering, customer success, product, engineering, and leadership

  • Workspace mirroring technology that allowed Anyscale engineers to troubleshoot directly in Notion's environments, accelerating problem resolution

  • Proactive technical guidance including regular check-ins, best practice sharing, and early access to relevant product enhancements

The impact was clear from day one – search completion times dropped substantially and users could find answers and insights instantly. Teams moved faster and accomplished more, making Notion an even more essential part of their daily work.

"With Anyscale, we've achieved 20% better latency compared to our previous solution – all while supporting continued user growth."
Jake Sager | Software Engineer

Fail-Safe Deployments at Scale

Model deployments were another critical factor for Notion's always-on search API. With Anyscale, the team was able to implement a blue-green deployment model, which runs two identical production environments: at any given time, one serves production traffic while the other stages the newest deployment.
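The pattern itself is simple to picture. The sketch below is purely conceptual – plain Python rather than Anyscale's actual API – showing how an atomic traffic flip between two identical environments makes cutover and rollback near-instant.

```python
# Conceptual illustration of blue-green deployment, not Anyscale's API:
# two identical environments exist; a router atomically flips which one
# receives production traffic, and flipping back is an instant rollback.
from dataclasses import dataclass


@dataclass
class Environment:
    name: str
    version: str
    healthy: bool = True


class BlueGreenRouter:
    def __init__(self, blue: Environment, green: Environment):
        self.envs = {"blue": blue, "green": green}
        self.live = "blue"  # the environment currently serving production traffic

    @property
    def standby(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_new_version(self, version: str) -> None:
        # New versions go to the idle environment first, where they can be
        # validated against production-like traffic before any cutover.
        self.envs[self.standby].version = version

    def promote(self) -> None:
        # Atomic cutover: the standby becomes live only if it is healthy.
        if self.envs[self.standby].healthy:
            self.live = self.standby

    def rollback(self) -> None:
        # Rolling back is just flipping traffic to the previous environment.
        self.live = self.standby


router = BlueGreenRouter(Environment("blue", "v1"), Environment("green", "v1"))
router.deploy_new_version("v2")  # stage v2 on the idle environment
router.promote()                 # cut traffic over to v2
router.rollback()                # instantly revert to v1 if issues appear
```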

This deployment pattern provides several key advantages for Notion's search infrastructure, including:

  • Uninterrupted search during updates: Zero-downtime deployment ensures users continue to have a seamless search experience – even during model or infrastructure changes.

  • Fast, safe rollbacks: The team can quickly route traffic back to a previous version if issues arise, providing a safety net for millions of daily users.

  • Higher confidence in new versions: Testing in a production-like environment helps surface hidden issues before a full rollout.

  • Clear validation through comparison: Running two production environments side by side allows the team to easily monitor and compare metrics before committing to changes.

"Anyscale gives me confidence in my deployments. If it works in my workspace environment, I know it’ll perform reliably in production as well."
Jake Sager | Software Engineer

Best of all, this newfound velocity hasn't come at the expense of reliability. In fact, the team reports greater stability in production – a critical benefit for a feature as essential as search to the Notion experience.

A Developer Experience That Accelerates Innovation

While many platforms promise scale, Anyscale distinguishes itself by delivering both seamless scalability and a frictionless developer experience for the entire ML workflow. With Anyscale Workspaces, the Notion team can easily install and cache packages to reduce friction on the management and deployment side. 

Key to Anyscale's developer experience is its IDE integration, which allows developers to use tried-and-loved development environments like VSCode and Cursor to edit code, install dependencies, and monitor resources – just as they would on their local machines.

"The IDE integration is epic. Anyscale lets me use my preferred tools like VSCode or Cursor on my laptop while still accessing distributed compute power. It's obvious that the Anyscale platform was built by developers for developers."
Jake Sager | Software Engineer


This developer-centric approach has already delivered significant benefits beyond the search team. What started as a solution for one use case has evolved – as of early 2025, three teams are exploring Anyscale and one other team has active models in production.

"With Anyscale, we've been able to show people how easy it is to pull a model from Hugging Face and run it ourselves on-prem – it's become a compelling selling point as we advocate for adoption across our search and AI organization."
Jake Sager | Software Engineer

Conclusion

With Anyscale as a key part of their AI infrastructure, Notion was able to achieve better throughput, lower latency, and increased developer adoption – all within two months of onboarding. Beyond these immediate gains, the platform positions Notion to scale for continued growth and to drive the next wave of machine learning innovation.

"Choosing Anyscale was a strategic bet on the future. As Notion AI use cases get more complex, I’m confident that Anyscale can handle it."
Jake Sager | Software Engineer

"We chose Anyscale not just for what we needed today, but for where we know we’re heading. As our AI workloads grow more complex, Anyscale gives us the infrastructure to scale without limits."

Sarah Sachs

Engineering Leader, AI Modeling @ Notion
