Infectious Disease

AI-Accelerated Drug Discovery for SARS-CoV-2 Protease Inhibitors

Mid-stage biotech hit identification for a novel coronavirus protease inhibitor program.

Company

Mid-stage biotech (Series A, 15-person team)

Timeline

January to February 2025

Engagement

Small Molecule Discovery Pipeline

Small Molecule Discovery
  • About 9 to 11× faster than traditional HTS (4 weeks vs. 8 to 10 months)
  • 87.5% cost savings vs. HTS
  • 50% wet-lab hit rate (top 12)
  • 100% hit rate for ΔG ≤ -8.2 kcal/mol

The Challenge

A Series A biotech had identified SARS-CoV-2 main protease (3CLpro) as a high-priority target for next-generation antivirals. Their internal screening had stalled: traditional high-throughput screening (HTS) would cost $2.8M and take 8 to 10 months. The company needed validated hits within 12 weeks to secure Series B investor confidence.

Business Constraints

  • Budget: $350K (tight Series A ceiling)
  • Timeline: Ranked candidates in 4 weeks, validation-ready
  • Risk tolerance: Low. Needed high-quality leads, not fishing expeditions

HelixForge Approach

Week 1 to 2: Candidate Generation and Virtual Screening

Input
  • SARS-CoV-2 3CLpro crystal structure (PDB: 7JTL)
  • Internal compound library (12,000 molecules from previous programs)
  • ChEMBL/PubChem additional chemical space (2.3M compounds screened)
Methods
  • Graph neural network (GNN) scoring trained on proprietary validation datasets
  • Initial ranking across 2.3M compounds
  • Structural diversity filtering to avoid bias toward known inhibitors
Output
  • Top 15,000 candidates ranked by predicted binding affinity
  • Structural clustering to identify scaffolds
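
One common way to do structural diversity filtering is a greedy pass over the score-ranked list that rejects any compound too similar to one already kept. The sketch below illustrates that idea only; it is not the HelixForge implementation. The compound IDs, scores, fingerprint bits, and the 0.7 similarity cutoff are all hypothetical, and in practice the bit sets would come from a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diverse_top_n(scores: Dict[str, float],
                  fingerprints: Dict[str, FrozenSet[int]],
                  n: int,
                  max_similarity: float = 0.7) -> List[str]:
    """Walk compounds from best (most negative) score to worst and keep one
    only if it is not too similar to anything already kept."""
    picked: List[str] = []
    for cid in sorted(scores, key=lambda c: scores[c]):
        if all(tanimoto(fingerprints[cid], fingerprints[p]) <= max_similarity
               for p in picked):
            picked.append(cid)
        if len(picked) == n:
            break
    return picked


# Toy data: hypothetical IDs, predicted scores, and fingerprint bits.
scores = {"cpd_a": -9.1, "cpd_b": -9.0, "cpd_c": -8.4, "cpd_d": -7.9}
fingerprints = {
    "cpd_a": frozenset({1, 2, 3, 4, 5}),
    "cpd_b": frozenset({1, 2, 3, 4, 5, 6}),  # near-duplicate of cpd_a
    "cpd_c": frozenset({10, 11, 12}),
    "cpd_d": frozenset({3, 20, 21, 22}),
}

selected = diverse_top_n(scores, fingerprints, n=3)
print(selected)
```

In this toy run the second-best compound is skipped because it shares most of its bits with the best one, so a lower-scoring but structurally distinct compound takes its place.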

Week 2 to 3: Refinement and Physics-Based Scoring

The top 15,000 compounds passed through four successive filters:
  • Molecular docking (AutoDock Vina + DiffDock): re-scored the top 15,000 compounds for binding pose quality; 2,847 compounds passed at ΔG < -7.5 kcal/mol
  • Molecular dynamics (GROMACS, 500 ns): run on the top 250 candidates to refine binding free energy (MM-PBSA); 89 compounds kept stable binding modes
  • ADME and toxicity scoring: reduced the pool to 56 drug-like compounds
  • Off-target prediction against 300+ human serine proteases: yielded a final pool of 47 candidates
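
The refinement stage is a sequential filter funnel, which can be sketched as below. Only the -7.5 kcal/mol docking cutoff and the top-250 cap for MD come from the text. The `Compound` fields, the boolean pass/fail flags, and the toy data are hypothetical stand-ins for the real docking, MD, ADME, and off-target outputs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Compound:
    cid: str
    docking_dg: float  # docking score in kcal/mol (more negative = better)
    md_stable: bool    # hypothetical flag: binding mode stayed stable in MD
    drug_like: bool    # hypothetical flag: passed ADME/toxicity scoring
    selective: bool    # hypothetical flag: passed off-target prediction


DG_CUTOFF = -7.5  # kcal/mol, from the text
MD_TOP_N = 250    # candidates sent to MD, from the text


def run_funnel(pool: List[Compound]) -> Tuple[List[Compound], List[Tuple[str, int]]]:
    """Apply each filter in order and record how many compounds survive."""
    counts = [("input", len(pool))]

    pool = [c for c in pool if c.docking_dg < DG_CUTOFF]
    counts.append(("docking dG below cutoff", len(pool)))

    # Only the best-scoring compounds go on to the expensive MD step.
    pool = sorted(pool, key=lambda c: c.docking_dg)[:MD_TOP_N]
    counts.append(("sent to MD", len(pool)))

    stages: List[Tuple[str, Callable[[Compound], bool]]] = [
        ("stable in MD", lambda c: c.md_stable),
        ("drug-like", lambda c: c.drug_like),
        ("selective", lambda c: c.selective),
    ]
    for label, keep in stages:
        pool = [c for c in pool if keep(c)]
        counts.append((label, len(pool)))
    return pool, counts


# Toy pool (hypothetical compounds, not data from the engagement).
toy_pool = [
    Compound("c1", -8.9, True, True, True),
    Compound("c2", -8.1, True, True, False),
    Compound("c3", -7.9, False, True, True),
    Compound("c4", -7.6, True, False, True),
    Compound("c5", -7.2, True, True, True),
]

final, counts = run_funnel(toy_pool)
for label, n in counts:
    print(f"{label}: {n}")
```

Recording the count after every stage makes it easy to see which filter removes the most candidates.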

Week 4: Ranking and Validation Playbook

Output
  • Top 50 ranked candidates with scaffold, ΔG, ADME score, selectivity, and priority tier
  • Recommended wet-lab assays and IC50 concentration ranges
  • Selectivity panel recommendations and formulation guidance

Final Ranked Output

Top candidates shown; full list of 50 delivered under NDA.

Top candidates by predicted binding affinity and developability
Rank | Scaffold | ΔG (kcal/mol) | ADME Score | Selectivity | Status
1 | Pyridone-based | -8.9 | 0.87 | 0.92 | Priority A
2 | Thiazole-benzamide | -8.7 | 0.84 | 0.88 | Priority A
3 | Quinolone derivative | -8.5 | 0.81 | 0.85 | Priority A
4–5 | Mixed | -8.3 to -8.2 | 0.80–0.82 | 0.85–0.87 | Priority B
6–10 | Mixed | -8.1 to -7.8 | 0.78–0.82 | 0.80–0.87 | Priority B
11–50 | Mixed | -7.5 to -7.1 | 0.70–0.80 | 0.75–0.85 | Priority C

Results and impact

Speed, validation, and business outcomes

Speed vs. Traditional HTS

Metric | Traditional HTS | HelixForge | Improvement
Timeline | 8 to 10 months | 4 weeks | About 9 to 11× faster
Cost | $2.8M | $350K | 87.5% savings
Compounds Screened | 5M | 2.3M (+ physics refinement) | Smaller library, better ranking
Predicted Hit Rate | ~10% (500 hits) | ~24% (50 to 100 high-quality leads) | 2.4× improvement
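
The timeline and cost ratios can be recomputed from the table's own inputs. This is a minimal sketch; the months-to-weeks conversion (52/12 weeks per month) is an assumption, and the variable names are illustrative.

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average calendar month

hts_months = (8, 10)       # traditional HTS timeline, from the table
helixforge_weeks = 4       # computational delivery, from the table
hts_cost = 2_800_000       # USD, from the table
helixforge_cost = 350_000  # USD, from the table

# Speedup range: HTS duration in weeks divided by the 4-week delivery.
speedup_low, speedup_high = (
    months * WEEKS_PER_MONTH / helixforge_weeks for months in hts_months
)

savings = hts_cost - helixforge_cost
savings_pct = 100 * savings / hts_cost
savings_per_dollar = savings / helixforge_cost

print(f"Speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")
print(f"Savings: ${savings:,} ({savings_pct:.1f}%)")
print(f"Savings per dollar spent: {savings_per_dollar:.1f}")
```

On these inputs the speedup is roughly 9 to 11×, the savings are $2.45M (87.5%), and each dollar spent saves about seven dollars relative to HTS.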

Wet-Lab Validation Outcomes (6 weeks post-delivery)

Assay results on top 12 candidates.

Candidate | Predicted ΔG (kcal/mol) | Observed IC50 (nM) | Hit? | Notes
Rank 1 | -8.9 | 340 | Yes | Excellent selectivity vs. human proteases
Rank 2 | -8.7 | 420 | Yes | Good microsomal stability
Rank 3 | -8.5 | 510 | Yes | Hits target; needs formulation work
Rank 4 | -8.3 | 1,200 | Yes | Moderate potency, excellent ADME
Rank 5 | -8.2 | 2,100 | Yes | Acceptable for follow-up
Rank 6 | -8.1 | 15,000 | Weak | Selectivity issues flagged
Rank 7–12 | -7.9 to -7.5 | Inactive | No | No activity; assay controls confirm
  • 100% hit rate for ΔG ≤ -8.2 kcal/mol (5/5 active)
  • 50% hit rate across top 12 (6/12, counting the weak Rank 6 result)
  • ~8% false positive rate
  • 6 weeks to first active compound
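
The hit rates depend on where the ΔG cutoff sits and on whether the weak Rank 6 result counts as a hit. The sketch below recomputes them from the validation table. Ranks 7 to 12 are reported only as a range, so the code uses -7.9 kcal/mol (the most negative value in that range) as a stand-in for each; this is an assumption, not reported per-compound data.

```python
from typing import List, Tuple

Row = Tuple[int, float, str]  # (rank, predicted dG in kcal/mol, assay call)

# Ranks 1-6 transcribed from the validation table.
results: List[Row] = [
    (1, -8.9, "active"), (2, -8.7, "active"), (3, -8.5, "active"),
    (4, -8.3, "active"), (5, -8.2, "active"), (6, -8.1, "weak"),
]
# Ranks 7-12: all inactive; -7.9 is a stand-in for the reported range.
results += [(rank, -7.9, "inactive") for rank in range(7, 13)]


def hit_rate(rows: List[Row], cutoff: float, count_weak: bool) -> Tuple[int, int]:
    """Return (hits, tested) among compounds with predicted dG <= cutoff."""
    selected = [r for r in rows if r[1] <= cutoff]
    hits = [r for r in selected
            if r[2] == "active" or (count_weak and r[2] == "weak")]
    return len(hits), len(selected)


for cutoff, count_weak in [(-8.2, False), (-8.0, False), (0.0, True)]:
    hits, tested = hit_rate(results, cutoff, count_weak)
    print(f"dG <= {cutoff}: {hits}/{tested} "
          f"({'weak counted' if count_weak else 'weak not counted'})")
```

With a -8.2 cutoff every tested compound is active (5/5). Loosening the cutoff to -8.0 pulls in Rank 6, and the full top 12 gives 6/12 only when the weak result is counted.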

Immediate Wins

  • De-risked Series B narrative: founders presented 6 validated hits plus 44 backup candidates to investors
  • Accelerated program: moved to medicinal chemistry optimization 3 months earlier than planned
  • Cost reallocation: the $2.45M saved was redirected to in-house chemistry and preclinical studies

Strategic Advantages

  • Informed SAR: top 3 scaffolds revealed novel binding modes; medicinal chemistry team rapidly synthesized derivatives
  • De-risked chemistry: predicted ADME properties steered synthesis away from lipophilic dead-ends
  • Selectivity foundation: built selectivity strategy on computational predictions before wet-lab work

Follow-on engagement

Q2 2025: second round using HelixForge to optimize the top scaffold. Estimated next project cost: $180K. Projected time to a drug lead candidate: 2 to 4 months.

Model validation

Lessons and recommendations

What Worked

  • Closed-loop feedback: wet-lab IC50 data were used to retrain the GNN for follow-on rounds, and predictions tightened
  • Physics + AI hybrid: MD simulations caught 3 candidates with unstable binding modes that docking alone would have scored highly
  • Selectivity focus: early off-target filtering saved the medicinal chemistry team from pursuing non-selective inhibitors

Challenges and Mitigations

The Rank 6 candidate scored well computationally but showed weak potency in the assay. Root cause: binding mode prediction uncertainty near its predicted ΔG of -8.1 kcal/mol.

Mitigation: Tightened the ΔG filter to -8.2 kcal/mol or lower (more negative), the range in which every tested candidate was active; this cutoff now applies to follow-on projects.
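
To see how far Rank 6 fell from its prediction, the measured IC50 values can be converted to an approximate binding free energy with ΔG ≈ RT ln(IC50). This is a rough sketch under stated assumptions: it treats IC50 as a stand-in for the dissociation constant, which depends on assay conditions, and it assumes a temperature of 298.15 K. Neither assumption comes from the case study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15       # assumed assay temperature


def ic50_nm_to_dg(ic50_nm: float) -> float:
    """Approximate binding free energy (kcal/mol), assuming IC50 ~ Kd."""
    return R_KCAL * TEMP_K * math.log(ic50_nm * 1e-9)


# rank -> (predicted dG in kcal/mol, observed IC50 in nM), from the assay table
assay = {
    1: (-8.9, 340), 2: (-8.7, 420), 3: (-8.5, 510),
    4: (-8.3, 1_200), 5: (-8.2, 2_100), 6: (-8.1, 15_000),
}

# Gap = IC50-derived dG minus predicted dG; positive means weaker than predicted.
gaps = {rank: ic50_nm_to_dg(ic50) - predicted
        for rank, (predicted, ic50) in assay.items()}

for rank, gap in gaps.items():
    print(f"Rank {rank}: predicted {assay[rank][0]:.1f}, "
          f"from IC50 {ic50_nm_to_dg(assay[rank][1]):.2f}, gap {gap:+.2f}")

worst_rank = max(gaps, key=lambda r: gaps[r])
print(f"Largest gap: Rank {worst_rank}")
```

Under these assumptions Ranks 1 to 5 land within about 0.5 kcal/mol of their predictions, while Rank 6 is off by roughly 1.5 kcal/mol, in line with its weak assay result.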

GROMACS simulations took longer than expected (some runs required 5 to 7 days). Root cause: high flexibility in binding pocket.

Mitigation: Parallelized simulations; reduced top-N candidates for reranking from 250 to 150.

When to use HelixForge for virtual screening

  • High-throughput screening budgets above $500K
  • Timeline pressure (under 6 months to hits)
  • Complex targets (proteases, kinases, difficult binding pockets)
  • Need for selectivity or ADME filtering early in discovery

ROI: approximately 7:1 (cost savings + speed advantage vs. traditional HTS).

Next steps: start with 2 to 3 target programs; iterate scoring models with wet-lab feedback for continuous improvement.

About This Engagement

  • Client profile: Series A biotech, 15 employees, prior HTS experience
  • Project duration: 4 weeks (computational delivery) + 6 weeks (validation)
  • Total cost: $350K
  • Date: January to February 2025

This case study is anonymized at client request. Specific compound identifiers and institutional affiliations have been redacted. Raw data and full computational protocols available under NDA.

Run a similar program with HelixForge.

Tell us about your target. We will scope a 2 to 4 week pilot.