Case Study
onepot Scales Molecule Synthesis for Drug Discovery on Anyscale
Using Anyscale, onepot runs ML simulations across 3.4B compounds and tens of billions of possible reactions, narrowing them down to the ones most likely to synthesize successfully, to move faster through the design, make, test, and analyze (DMTA) cycle.

10B+
reactions enumerated and scored with Anyscale
10K+
vCPUs deployed across CPU and GPU inference workloads
50%
of workloads run on spot instances for cost efficiency
onepot AI is building an AI-native drug discovery platform that makes small-molecule fabrication faster, cheaper and more predictable.
Drug discovery follows a design–make–test–analyze (DMTA) loop: design candidate molecules, make them in the lab, test their properties, and analyze the results. Computational chemistry has accelerated the design step dramatically, but the make step has remained the bottleneck. onepot closes that gap with their flagship product, onepot CORE, by combining robotic lab automation with machine learning models that predict, before any reaction runs, which compounds are most likely to synthesize successfully.
To build and continuously grow a catalog of synthesizable compounds at this scale, onepot runs large-scale feasibility inference across tens of billions of candidate reactions, with workloads scaling from quick exploratory jobs to runs spanning tens of thousands of parallel jobs. Having built with Anyscale at prior companies, the onepot team chose to use Anyscale as its distributed AI compute platform from day one.
Challenges
Building a reliable, customer-facing library at billion-molecule scale requires solving three compounding infrastructure problems simultaneously: the operational complexity of managing compute for generating and scoring the space, the security constraints that govern where that compute can run, and the cost pressure of sustaining that scale without a dedicated platform engineering team.
Three challenges stood in the way:
ML inference on billions of molecules demands seamless CPU/GPU orchestration. To identify which compounds can actually be synthesized in the lab, the onepot team enumerates a raw chemical space of tens of billions of candidates, each requiring featurization, de-duplication, and structural validation before it is scored by feasibility ML models that predict success in the lab. That is a distinct two-stage distributed problem: CPU-heavy enumeration, followed by large-scale batch inference that mixes CPU and GPU instances. Together the two stages consume tens of thousands of vCPUs, and the system must support everything from small jobs to massive parallel runs with the same workflow, without manual configuration for each.
Proprietary data required strong security guarantees on a multi-cloud compute estate. The molecules in onepot CORE and the synthesis outcomes that train the feasibility models are proprietary. In drug discovery, where the patentability of a chemical space can determine the commercial viability of an entire research program, running computational workflows on shared or externally managed infrastructure is not a viable option. onepot's workloads also moved between AWS and GCP based on compute capacity, so seamless multi-cloud support was a baseline requirement, not a preference.
Growing the library of unique compounds without growing compute costs. The scale at which onepot operates would be economically infeasible if every job ran on on-demand instances at full price. The team needed to shift a large fraction of that compute to spot, but without taking on the engineering burden of building and maintaining custom fault-tolerance logic. Beyond raw instance pricing, the operational overhead of managing spot instance reliability at scale was itself a cost that had to be designed around.

Andrei Tyrin | Co-founder, onepot AI
The Solution
Both of onepot's co-founders arrived at the company with direct experience in the Ray and Anyscale ecosystem. That prior familiarity meant the team could move quickly, and they adopted Anyscale as their distributed compute platform from the first weeks of building the company.
With Anyscale, onepot is able to:
Run an end-to-end CPU/GPU data curation and ML inference pipeline on one scalable platform. Tens of thousands of vCPUs fan out across tens of billions of candidate reactions, and the same platform runs the feasibility models that filter the catalog down to reliable, synthesizable molecules across both CPU and GPU instance types, with no separate pipeline configuration required for each.
Keep proprietary data fully within their own cloud accounts. Anyscale runs inside onepot’s AWS and GCP environments (BYOC), keeping synthesis outcomes and molecular data that underpin their models and catalog within their control, all while enabling them to move workloads between cloud providers without reconfiguration overhead.
Offload up to half of all workloads to spot instances without engineering effort. Jobs are structured to complete quickly and write results incrementally, making the entire pipeline well-suited to spot instances. Anyscale handles preemption automatically, checkpointing inference workloads and resuming them once capacity comes back online.

Brandon Wang | ML Researcher, onepot AI
End-to-end preprocessing and feasibility ML inference on one platform
Anyscale powers onepot's full chemical-space pipeline, covering both the enumeration stage that generates candidate molecules and the inference stage that determines which of them are worth offering to customers. The team deploys tens of thousands of vCPUs to fan out across the full combinatorial space of purchasable building blocks and validated reaction templates, with each pairing featurized, de-duplicated, and written to storage in a single coordinated job. What would otherwise be an infeasible process on modest infrastructure completes in hours, and the same Anyscale workspaces and job primitives used for a quick exploratory run scale directly to a full catalog refresh without any reconfiguration.
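The enumeration stage described above can be illustrated with a toy, single-process sketch. This is not onepot's code: the building-block names, the template names, and the `featurize` helper are hypothetical stand-ins, and a real pipeline would operate on chemical structures and distribute the work across many machines. The sketch only shows the shape of the step: pair building blocks under each reaction template, featurize each pairing, and drop duplicates.

```python
from itertools import product

# Hypothetical stand-ins: real building blocks and reaction templates would
# be chemical structures, not short strings. "acid_B" is listed twice on
# purpose to show de-duplication.
BUILDING_BLOCKS = ["amine_A", "acid_B", "acid_B", "halide_C"]
TEMPLATES = ["amide_coupling", "suzuki"]


def featurize(block_a, block_b, template):
    """Toy featurization: an order-independent key for one pairing."""
    first, second = sorted((block_a, block_b))
    return (template, first, second)


def enumerate_candidates(blocks, templates):
    """Yield each unique (template, block, block) candidate exactly once."""
    seen = set()
    for a, b, t in product(blocks, blocks, templates):
        if a == b:
            continue  # skip self-pairings in this toy example
        key = featurize(a, b, t)
        if key in seen:
            continue  # de-duplicate: this pairing was already emitted
        seen.add(key)
        yield key


candidates = list(enumerate_candidates(BUILDING_BLOCKS, TEMPLATES))
```

With three distinct blocks and two templates, the sketch emits six unique candidates. At production scale the in-memory `seen` set would not fit on one machine, which is part of why the case study describes this stage as a distributed, CPU-heavy job.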
Once enumeration is complete, onepot's feasibility models score every candidate reaction and remove the ones unlikely to succeed in a real lab. These models are trained on the company's proprietary pool of experimental outcomes and applied across tens of billions of candidates in batch. Many are compact enough to run economically on CPU instances rather than GPUs, a flexibility Anyscale handles without requiring separate pipelines. The result is onepot CORE: more than 3 billion synthesizable compounds with a 70 to 80% synthesis success rate, guaranteed purity of 90% or better by LC/MS, and delivery timelines as short as five business days. Filtering by a model trained on real experimental data is critical for achieving such high success rates.
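The scoring-and-filtering step can be sketched in the same spirit. Everything here is an assumption made for illustration: `toy_feasibility_model` is a hypothetical scorer that returns made-up probabilities, and the 0.5 threshold and batch size of 2 are arbitrary. The case study does not describe onepot's actual models, thresholds, or batching.

```python
def batched(items, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def toy_feasibility_model(batch):
    """Hypothetical scorer: one made-up success probability per candidate."""
    return [0.9 if c[0] == "amide_coupling" else 0.4 for c in batch]


def filter_feasible(candidates, model, threshold, batch_size=2):
    """Score candidates in batches and keep those at or above threshold."""
    kept = []
    for batch in batched(candidates, batch_size):
        scores = model(batch)
        kept.extend(c for c, s in zip(batch, scores) if s >= threshold)
    return kept


candidates = [
    ("amide_coupling", "acid_B", "amine_A"),
    ("suzuki", "acid_B", "halide_C"),
    ("amide_coupling", "amine_A", "halide_C"),
]
kept = filter_feasible(candidates, toy_feasibility_model, threshold=0.5)
```

Scoring in batches rather than one candidate at a time is what lets the same loop run on CPU or GPU workers: only the `model` callable changes, not the surrounding pipeline.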
Brandon, who leads ML and infrastructure at onepot, noted that the reliability of the Anyscale platform at scale stood out most after coming from past environments where sporadic failures were a routine part of large distributed jobs. Seeing thousands of machines running simultaneously without interruption was a concrete sign that the infrastructure would not become the constraint as the scale of computational work grew.

Andrei Tyrin | Co-founder, onepot AI
Multi-cloud deployment with BYOC security
The molecules in onepot CORE and the synthesis outcomes that train the feasibility models carry real patentability implications. In drug discovery, where IP position often determines whether a research program is commercially viable, running computational workflows on shared or externally managed infrastructure could directly undermine the product's value. From the beginning, onepot ran Anyscale inside its own AWS and GCP cloud accounts using a bring-your-own-cloud model, keeping all data and computation within environments the team already controls.
Multi-cloud support proved to be an operational advantage as much as a security one. Anyscale treats both AWS and GCP as first-class targets with no reconfiguration required when a workload moves between them, meaning the team can optimize for resource availability and cost across providers without any engineering overhead at the cloud boundary. For a seven-person company where every engineering hour counts, that translates directly into more time spent on the chemistry and modeling problems rather than on infrastructure logistics.

Andrei Tyrin | Co-founder, onepot AI
Spot instance reliability at scale
Running at tens of thousands of vCPUs per run would be cost-prohibitive on on-demand instances alone. onepot's pipeline is deliberately structured so that individual jobs complete quickly and write results to storage incrementally, which means any single spot interruption rarely loses meaningful progress on an overall run. That design lets the team run roughly half of their compute on spot today.
What that reliability has meant in practice is that the team has never had to treat spot as a special case: no separate recovery workflows, no manual job restarts, no engineering time budgeted against the risk of preemption. Anyscale absorbs the operational complexity that would otherwise require a dedicated platform engineering team, leaving onepot's team free to focus on what actually matters. Every hour not spent managing spot recovery is an hour available for the generative models and synthesis intelligence that define onepot's next phase of growth.
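The pipeline design described in this section, short jobs that write results incrementally, can be sketched as an application-level pattern. This is a minimal illustration under stated assumptions, not Anyscale's preemption handling or onepot's code: the shard layout, file names, and the `fail_at` parameter (which simulates a spot interruption) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def process_shard(shard):
    """Stand-in for real work, such as scoring one shard of candidates."""
    return [x * x for x in shard]


def run(shards, out_dir, fail_at=None):
    """Process shards in order, skipping any whose result file exists."""
    processed = []
    for i, shard in enumerate(shards):
        out_path = out_dir / f"shard_{i:05d}.json"
        if out_path.exists():
            continue  # finished in an earlier attempt
        if fail_at == i:
            raise RuntimeError("simulated spot preemption")
        # Write to a temporary file, then rename, so an interrupted write
        # never leaves a partial file that looks like a finished result.
        tmp_path = out_path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(process_shard(shard)))
        tmp_path.replace(out_path)
        processed.append(i)
    return processed


shards = [[1, 2], [3, 4], [5, 6]]
with tempfile.TemporaryDirectory() as d:
    out = Path(d)
    try:
        run(shards, out, fail_at=2)  # first attempt is interrupted
    except RuntimeError:
        pass
    resumed = run(shards, out)  # second attempt redoes only shard 2
```

Because each shard is small and its result is persisted as soon as it finishes, an interruption costs at most one shard of work, which matches the case study's point that a single spot interruption rarely loses meaningful progress.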

Andrei Tyrin | Co-founder, onepot AI
What's Next
onepot is working on the next generation of its chemical space, with ambitions to grow from billions to trillions of compounds, alongside generative models that allow medicinal chemists to specify desired molecular properties and receive back synthesizable candidates with high predicted success rates. Both directions will require significantly more distributed training and inference than today, and the infrastructure to support that scale is already in place.

Brandon Wang | ML Researcher, onepot AI
“Running thousands of parallel jobs would be an infrastructure cost and operational challenge for most teams. With Anyscale reliably managing that complexity, including running 50% of workloads on spot instances for cost efficiency, our ML team can stay focused on the modeling and chemistry.”
Andrei Tyrin
Co-founder, onepot AI
