Modern semiempirical electronic structure methods and machine learning potentials for drug discovery: conformers, tautomers and protonation states

The Journal of Chemical Physics, vol. 158, p. 124110. DOI: 10.1063/5.0139281. Published: 2023-03-06

Jinzhe Zeng, Yujun Tao, Timothy J. Giese, Darrin M. York


Modern semiempirical electronic structure methods have considerable promise in drug discovery as universal "force fields" that can reliably model biological and drug-like molecules. Herein, we compare the performance of several NDDO-based semiempirical models (MNDO/d, AM1, PM6 and ODM2) and density-functional tight-binding models (DFTB3, GFN1-xTB and GFN2-xTB) with pure machine learning potentials (ANI-1x and ANI-2x) and hybrid quantum mechanical/machine learning potentials (AIQM1 and QDπ) for a wide range of data computed at a consistent ωB97X/6-31G* level of theory (as in the ANI-1x database). These data include conformational energies, intermolecular interactions, tautomers, and protonation states. Additional comparisons are made to a set of natural and synthetic nucleic acids from the artificially expanded genetic information system (AEGIS). This dataset has important implications for the design of new biotechnology and therapeutics. Finally, we examine acid/base chemistry relevant for RNA cleavage reactions catalyzed by small nucleolytic ribozymes and ribonucleases. Overall, the recently developed QDπ model performs exceptionally well across all datasets, with especially high accuracy for tautomers and protonation states relevant to drug discovery.