De novo-designed proteins

Free software: AlphaFold (structural protein database; predicts a protein's 3D structure from its amino acid sequence) OpenFold3 (collaboration with Nvidia)

Latent-X (a de novo protein design model by Latent Labs) SandboxAQ (announced the Structurally Augmented IC50 Repository (SAIR), an open-access repository that leverages the Boltz series of models to generate computationally folded protein-ligand structures linked to corresponding experimental drug affinity values). Boltz-2 (predicts molecular binding affinity; YouTube video) RFdiffusion (enables users to create completely novel proteins based on molecular specifications)

Paid software: ProteinMPNN (takes a protein structure and generates an amino acid sequence that will fold into that structure) ThermoMPNN (predicts changes in protein stability due to point mutations).

Companies: Tamarind Bio (has built a web interface where researchers can access hundreds of AI tools, some open source and some licensed, and get guidance and support on how to use them). OpenFold (non-profit AI research and development consortium developing free and open-source software tools for biology and drug discovery). Insilico Medicine (small-molecule drug discovery through AI) Genesis AI (has a foundation model called Pearl for protein-ligand cofolding) Nabla Bio (Boston-based AI protein therapeutics company) Cradle (AI-based protein design company) Chai Discovery (AI-driven biologics company developing therapeutics against undruggable targets; its de novo antibody design model is capable of generating full-length antibodies with therapeutic attributes) Boltz (AI research and product company) GATC Health (an AI-driven therapeutic discovery company; also has clinical trial insurance products) Absci (focuses on rational antibody design and computational design of antibodies from scratch, or “de novo”, to bypass labor-intensive experimental screens).

Introduction/Definitions:

Computational docking: based on maximizing the shape and chemical complementarity between a given pair of interacting proteins.

Artificial Intelligence:

DeepCure (uses a platform called MolGen, which builds custom libraries for a specific set of requirements)

Typically, scientists start by screening massive libraries of small molecules in the hope of finding a starting point for a long optimization process. However, many companies are starting to use intentional engineering, for example when developing small-molecule drugs for therapeutic targets. Instead of just screening massive libraries and hoping that something will stick, the drug is designed from the beginning to meet specific requirements.

Iktos generates innovative molecules by using an AI-driven retrosynthesis platform. The company uses generative AI to find ways of breaking down a complicated target molecule. Its proprietary generative AI, trained on millions of organic reactions, generates molecules like a chemist by leveraging commercial building blocks and organic reactions over several steps.

Exscientia milestones include final results for Phase I/II trials for its cyclin-dependent kinase 7 inhibitor later in 2024. The company believes that AI-driven drug design will result in unprecedented drugs as opposed to incremental gains in efficiency and speed.

De novo or ab initio methods:

De novo methods predict the structure from sequence alone, without relying on similarity at the fold level between the modeled sequence and any known structures. These methods assume that the native structure corresponds to the global free-energy minimum and attempt to find this minimum by exploring many conceivable protein conformations.
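The idea of searching conformations for a free-energy minimum can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the conformation is reduced to a short list of torsion angles, `toy_energy` is a hypothetical rugged function (not a real force field), and the search is a simple Metropolis simulated-annealing loop rather than the method of any particular tool.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy: global minimum (0.0) when every angle is -60 degrees,
    with an extra periodic term that creates local minima."""
    total = 0.0
    for a in angles:
        shifted = math.radians(a + 60)
        total += (1 - math.cos(shifted)) + 0.3 * (1 - math.cos(3 * shifted))
    return total


def metropolis_search(n_angles=5, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Explore conformations by random single-angle moves with annealing."""
    rng = random.Random(seed)
    current = [rng.uniform(-180, 180) for _ in range(n_angles)]
    e_current = toy_energy(current)
    best, e_best = list(current), e_current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        trial = list(current)
        i = rng.randrange(n_angles)
        # Perturb one angle and wrap it back into [-180, 180).
        trial[i] = (trial[i] + rng.gauss(0, 30) + 180) % 360 - 180
        e_trial = toy_energy(trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


best, e_best = metropolis_search()
print("lowest energy found:", round(e_best, 4))
```

Accepting occasional uphill moves at high temperature is what lets the search leave local minima; real de novo methods use far richer energy functions and move sets.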

Grinter discloses an RFdiffusion-based protein design approach to create de novo binders that block hemoglobin binding to ChuA and thereby block heme acquisition from hemoglobin. Using an AlphaFold2 model of ChuA as a target, they used RFdiffusion and ProteinMPNN to design binders targeting extracellular loops 7 and 8 of ChuA, which their model indicated were responsible for hemoglobin binding. They screened a limited number of these designs, identifying several binders that inhibit E. coli growth at low nanomolar concentrations when hemoglobin or myoglobin was the sole available iron source. See Grinter.

Antibiotics: (MIT, US 2022/0310198) discloses a method for identifying molecule(s) that possess antimicrobial activity that includes (a) providing a first training set of molecules for which antimicrobial activity is known, (b) applying a machine learning algorithm to the first training set of molecules, thereby generating a machine learning model, (c) assessing the ability of the machine learning model to predict antimicrobial activity of the molecules in the first training set, (d) applying the machine learning model to a second training set of molecules, (e) assessing the ability of the machine learning model to predict antimicrobial activity of the molecules in the second training set, (f) altering the machine learning model to integrate results obtained in step (e), thereby generating an updated machine learning model, and (g) applying the updated machine learning model to a test set of molecules that includes molecules unknown to the updated machine learning model, thereby identifying molecule(s) of the test set as predicted to possess antimicrobial activity. The in silico modeling was performed on a vast number of test compounds. One of the compounds, halicin, was discovered to be effective against C. difficile, pan-resistant A. baumannii, carbapenem-resistant Enterobacteriaceae species, M. tuberculosis and MRSA.
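Steps (a) through (g) can be sketched as a small train, assess, update, and apply loop. This is only an illustration under stated assumptions: molecules are represented by hypothetical two-number feature vectors, the "machine learning model" is a toy nearest-centroid classifier, and step (f) is approximated by retraining on both training sets. None of this is taken from the patent itself.

```python
def fit_centroids(examples):
    """Steps (b)/(f): 'train' by averaging feature vectors per activity label."""
    sums, counts = {}, {}
    for features, active in examples:
        acc = sums.setdefault(active, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[active] = counts.get(active, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    def dist2(centroid):
        return sum((x - y) ** 2 for x, y in zip(features, centroid))
    return min(model, key=lambda label: dist2(model[label]))


def accuracy(model, examples):
    """Steps (c)/(e): fraction of molecules whose activity is predicted correctly."""
    return sum(predict(model, f) == y for f, y in examples) / len(examples)


# Hypothetical data: (feature vector, known antimicrobial activity).
first_set = [([0.9, 0.1], True), ([0.8, 0.2], True),
             ([0.1, 0.9], False), ([0.2, 0.7], False)]
second_set = [([0.7, 0.4], True), ([0.3, 0.8], False), ([0.6, 0.3], True)]
test_set = {"mol_A": [0.85, 0.15], "mol_B": [0.15, 0.85]}  # activity unknown

model = fit_centroids(first_set)                 # (a), (b)
acc_first = accuracy(model, first_set)           # (c)
acc_second = accuracy(model, second_set)         # (d), (e)
updated = fit_centroids(first_set + second_set)  # (f), assumed to mean retraining
hits = [name for name, f in test_set.items() if predict(updated, f)]  # (g)
print(acc_first, acc_second, hits)
```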

Designing Antibody-Antigen Binders:

Chai Discovery (has a product called Chai-2, a multimodal generative model that achieved a 16% hit rate on de novo antibody designs).

Using the 3-D structure of antibody-antigen complexes, it is possible to enhance antibody-antigen binding affinities by in silico mutations of antibody residues. In the best situation, when the antibody-antigen complex structures are available, it is relatively straightforward to perform affinity maturation in silico. First, the protein backbone is treated as rigid, and the conformation of each side chain is determined by a discrete side-chain rotamer search. Second, the lowest-energy structures are re-evaluated using more accurate, but computationally more expensive, models. (Zhao, “In silico methods in antibody design”, Antibodies, 2018)
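The two-stage procedure (a fast rotamer search on a rigid backbone, then re-evaluation of the lowest-energy candidates with a costlier model) can be sketched as follows. All numbers and names are hypothetical: the rotamer library, the per-position energies, and the "expensive" clash penalty are toy stand-ins, not values from Zhao or from any real scoring function.

```python
from itertools import product

# Hypothetical rotamer library: residue type -> allowed side-chain angles (degrees).
ROTAMERS = {"A": [0], "S": [-60, 60, 180], "Y": [-60, 180], "W": [-60, 60, 180]}
# Hypothetical base energies per mutable position (lower is better).
BASE = {
    0: {"A": 0.0, "S": -0.5, "Y": -1.2, "W": -1.0},
    1: {"A": 0.0, "S": -0.8, "Y": -0.3, "W": -0.9},
}
IDEAL_ANGLE = {0: 180, 1: -60}
BULKY = {"Y", "W"}


def cheap_energy(position, residue, angle):
    """Fast toy score on a rigid backbone: base term plus an angular penalty."""
    diff = abs(angle - IDEAL_ANGLE[position]) % 360
    diff = min(diff, 360 - diff)
    return BASE[position][residue] + 0.005 * diff


def best_rotamer(position, residue):
    """Discrete rotamer search: pick the lowest-energy angle for this residue."""
    return min((cheap_energy(position, residue, a), a) for a in ROTAMERS[residue])


def cheap_score(sequence):
    return sum(best_rotamer(p, r)[0] for p, r in enumerate(sequence))


def expensive_score(sequence):
    """Toy 'more accurate' model: adds a clash penalty the fast score ignores."""
    clash = 1.5 if all(r in BULKY for r in sequence) else 0.0
    return cheap_score(sequence) + clash


def affinity_maturation(keep=3):
    candidates = list(product(ROTAMERS, repeat=len(BASE)))
    shortlist = sorted(candidates, key=cheap_score)[:keep]  # stage 1
    best = min(shortlist, key=expensive_score)              # stage 2
    return shortlist, best


shortlist, best = affinity_maturation()
print("shortlist:", shortlist, "best after re-scoring:", best)
```

In this toy setup the top candidate from the fast stage is demoted by the second stage, which is the reason for re-evaluating only a shortlist with the costlier model.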

Homology modeling:

Homology modeling relies on detectable similarity spanning most of the modeled sequence and at least one known structure. It relies on finding known structures related to the sequence to be modeled, aligning the sequence with the related structures, building a model, and assessing the model.
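The template-selection step can be illustrated with a minimal sketch that scores already-aligned sequence pairs by percent identity and picks the most similar known structure. The sequences and template names are hypothetical, and real pipelines use dedicated alignment and model-building tools rather than this simple count.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of identical residues over aligned (non-gap) columns."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    matches = aligned = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip gap columns
        aligned += 1
        matches += q == t
    return 100.0 * matches / aligned if aligned else 0.0


def pick_template(alignments):
    """Choose the template whose alignment to the query has the highest identity."""
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))


# Hypothetical pre-computed alignments: name -> (aligned query, aligned template).
alignments = {
    "tmpl_A": ("MKT-AYIA", "MKTQAYLA"),
    "tmpl_B": ("MKTAYIA", "MRSAYVA"),
}
chosen = pick_template(alignments)
print(chosen, round(percent_identity(*alignments[chosen]), 1))
```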

Bioinformatic analysis of genomic sequences:

Multiple sequence alignments and protein structure

–ConPLex: an in silico screening tool that makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. (Singh, Biophysics and Computational Biology, “Contrastive learning in protein language space predicts interactions between drugs and protein targets”, 120(24), 2023).

Subtractive Genomics:
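Prediction from the distance between learned representations can be sketched in a few lines. This is not the ConPLex API: the embeddings below are made-up three-dimensional vectors and cosine similarity is an assumed choice of metric, used only to show why such a comparison scales to very large libraries (each compound needs one cheap vector comparison per target).

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_compounds(target_vec, compound_vecs):
    """Order compounds from most to least similar to the target embedding."""
    scored = [(cosine_similarity(target_vec, v), name)
              for name, v in compound_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical embeddings in a shared space (a real model would learn these).
target = [1.0, 0.0, 0.5]
compounds = {
    "cmpd_1": [0.9, 0.1, 0.4],
    "cmpd_2": [-1.0, 0.2, 0.0],
    "cmpd_3": [0.0, 1.0, 0.0],
}
ranking = rank_compounds(target, compounds)
print(ranking)
```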

KEGG (Kyoto Encyclopedia of Genes and Genomes; database that assimilates information on genes, proteins, pathways and diseases). UniProt (find your protein)

NCBI (sequence analysis tools, etc.) antiSMASH (tool that can be used to gain insights into potential drug candidates, enzymes and secondary metabolites from genomic data).

PSORTdb (subcellular localization prediction tool)

YouTube videos: Galaxy Bioinformatics (tutorial)

Subtractive genomics is a computational method used to identify unique genetic elements or features of an organism by comparing the genomes of closely related organisms to identify regions that are present in one organism but absent or significantly different in another. For example, the proteomes of M. luteus, which is known to cause infections, and of humans can be acquired from the UniProt database. Human proteins that resemble pathogen proteins might interfere with the binding of therapeutic compounds. Thus, sequences that are homologous and functionally comparable between the M. luteus and human proteomes are screened out, and only non-homologous sequences are considered for further analysis. The dataset, for example, is submitted to the Galaxy tool for sequence similarity searching using BLASTp against the human proteome. The resulting set of proteins is listed as non-homologous proteins. The study resulted in the identification of the important extracellular protein SOD as a drug target. Then, genome mining was used to identify potential ligands from B. licheniformis genomic data, which resulted in the identification of five important lead molecules against SOD. (Yaraguppi, “Molecular dynamic and simulation analysis against superoxide dismutase (SOD) target of Micrococcus luteus with secondary metabolites from Bacillus licheniformis recognized by genome mining approach”. J. Biol. Sciences 30 (2023)).
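The subtraction step itself can be sketched as a filter over BLASTp results. The sketch assumes the hits against the human proteome are available in the standard 12-column tabular format (query ID in column 1, E-value in column 11); the protein IDs, the example hits, and the E-value cutoff of 1e-4 are hypothetical and are not taken from the cited study.

```python
def non_homologous(pathogen_ids, blast_tabular, evalue_cutoff=1e-4):
    """Return pathogen proteins with no significant hit in the human proteome."""
    homologous = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:
            homologous.add(query_id)  # significant human homolog: subtract it
    return [pid for pid in pathogen_ids if pid not in homologous]


# Hypothetical pathogen proteins and BLASTp hits against the human proteome.
pathogen_proteins = ["prot_1", "prot_2", "prot_3"]
blast_hits = "\n".join([
    "prot_1\thuman_X\t62.5\t200\t75\t0\t1\t200\t5\t204\t1e-80\t250",
    "prot_2\thuman_Y\t28.0\t50\t36\t0\t10\t59\t3\t52\t2.5\t20.1",
])
candidates = non_homologous(pathogen_proteins, blast_hits)
print(candidates)
```

Here `prot_1` is removed because of its strong human hit, `prot_2` is kept because its only hit is not significant, and `prot_3` is kept because it has no hit at all.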