protein function
X-Refs: National Human Genome Research Institute (comparison of genomes of other species with our species to determine protein function)
See also Protein-Protein Interaction assays under biotechnology.
The Daunting Task of Predicting Protein Function Based on Structure
Proteomics is the study of how proteins interact with each other and with other molecules in metabolic pathways. The three billion nucleic acid base pairs identified in the human genome are believed to make up about 40,000 genes, which after full cotranslational and posttranslational modification are believed to yield millions of proteins. Unlike that of DNA and RNA, the activity of proteins is based on molecular structure. While there have been countless attempts to predict protein activity with supercomputers, these efforts have produced few useful results because of the complex structure of proteins. (Haaft, “Separations in Proteomics: Use of Camelid Antibody Fragments in the Depletion and Enrichment of Human Plasma Proteins for Proteomics Applications”, http://www.captureselect.com/downloads/sperations-in-proteomics.pdf, pp. 29-40, 2005).
Although the sequencing of complete genomes provides a list that includes the proteins responsible for cellular regulation, this does not reveal what these proteins do, nor how they are assembled into the molecular machines and functional networks that control cellular behavior. With the genomes of so many organisms completely sequenced, science and its new biomedical discipline of functional genomics are faced with understanding the function of these newly discovered genes. Attwood states that “it is presumptuous to make functional assignments merely on the basis of some degree of similarity between sequences” because “very few structures are known compared with the number of sequences, and structure prediction methods are unreliable (and knowing structure does not inherently tell us function)” (Science, 290, 2000). Bowie also states that although it should be possible to predict structure from sequence and subsequently to infer detailed aspects of function from the structure, both problems are extremely complex and it seems unlikely that either will be solved in an exact manner in the near future (“Deciphering the message in protein sequences: tolerance to amino acid substitutions” Science, 247, 1990, pp. 1306-1310).
The regulation of many different cellular processes requires the use of protein interaction domains to direct the association of polypeptides with one another and with phospholipids, small molecules, or nucleic acids. Interaction domains can target proteins to a specific subcellular location, provide a means for recognition of protein posttranslational modifications or chemical second messengers, nucleate the formation of multiprotein signaling complexes, and control the conformation, activity, and substrate specificity of enzymes.
Enzymes such as kinases often generate modified amino acids on their substrates that are then recognized by interaction modules in signal transduction. For example, phosphotyrosine (pTyr) sites formed by the actions of tyrosine kinases bind effectors with pTyr recognition domains, whereas phosphoinositides produced by phosphoinositide kinases recruit pleckstrin homology domains.
Mutant cellular proteins that cause inherited disorders can exert their effects through the loss of protein-protein interactions, or conversely, by the creation of aberrant protein complexes. This suggests that rewiring of protein-protein interactions could be used experimentally to alter cellular function. Understanding the network of cellular protein interactions should expand the scope for creating novel biological responses through engineered proteins or small molecules.
It is becoming increasingly clear that an important level of organization is provided by multiprotein complexes: instead of proteins and substrates colliding in a diffusion-dependent manner, proteins generally interact with each other and form larger assemblages in a time- and space-dependent manner.
Studying complexes is important because it allows proteins with unknown roles to be placed into a functional context that is provided by their associated partners, some of which may have a known function.
Analysis of protein complexes poses special challenges: more than 10,000 different genes might be expressed at the same time in a single cell or tissue, and diversity at the protein level is much higher. In addition to diversity at the level of primary protein sequence and the presence of modifications, complexity increases further when the dynamic range of expression levels of individual proteins is considered. While some proteins are present in thousands of copies per cell, others are represented by just a few molecules.
Prlic (“Impact of genetic variation on three dimensional structure and function of proteins” PLOS One, March 15, 2017) discloses a wide range of structural and functional changes caused by single amino acid differences, including changes in enzyme activity, aggregation propensity, structural stability, binding and dissociation. For example, delta-aminolevulinic acid dehydratase (ALAD) catalyzes an early step in tetrapyrrole biosynthesis. The Phe-Leu mutation (F12L) causes ALAD porphyria, a rare autosomal recessive disease. Despite being located far from active-site residues 199 and 252, this variant changes the preferred protein assembly from octamer to hexamer. In addition, the optimal pH for enzyme activity is shifted from pH 7 to pH 9 in the mutant. The mutant enzyme is barely active under physiological conditions.
Antibodies:
Kabat and colleagues compared the sequences of the hypervariable regions then known and found that the residues are conserved at 13 sites in the light chains and at seven positions in the heavy chains. They argued that the residues at these sites are involved in the structure, rather than the specificity, of the hypervariable regions. They suggested that these residues have a fixed position in antibodies and that this could be used in the model building of combining sites to limit the conformations and positions of the sites whose residues varied. (Lesk, “Canonical structures for the hypervariable regions of immunoglobulins” J. Mol. Biol. (1987) 196, 901-917)
Antibody complementarity determining region (CDR) H3 loops are critical for adaptive immunological functions. Although the other five CDR loops adopt predictable canonical structures, H3 conformations have proven unclassifiable, other than an unusual C-terminal “kink” present in most antibodies. High structural conservation among antibodies makes it possible to model the framework and the five CDR loops that adopt canonical conformations, but the exceptionally diverse CDR H3 loop evades current methods, thus making structure prediction of the antigen-binding region difficult. (Weitzner, “The origin of CDR H3 structural diversity” Structure 23, 302-311, 2015).
Effects of Single Nucleotide Variations (SNVs):
on Activity:
Prlic (“Impact of genetic variation on three dimensional structure and function of proteins” PLOS One, March 15, 2017) discloses that 52 of 374 SNV related changes in their dataset either increase or decrease protein activity. In some cases, SNVs lead to complete loss of function. For example, human glycyl-tRNA synthetase loses detectable enzymatic activity due to a G526R mutation, which is causative of Charcot-Marie-Tooth disease. The Ile-Val mutation in von Willebrand factor (vWF) causes the blood clotting disorder von Willebrand disease. The mutation has a “gain of function” effect, producing a constitutively active form of vWF that binds platelets in the absence of shear forces.
on Aggregation:
Prlic (“Impact of genetic variation on three dimensional structure and function of proteins” PLOS One, March 15, 2017) discloses that 28 of 374 SNVs in their dataset gave rise to protein aggregation, which is a hallmark of some neurodegenerative diseases such as Alzheimer's disease.
on Stability:
Prlic (“Impact of genetic variation on three dimensional structure and function of proteins” PLOS One, March 15, 2017) discloses that 58 of 374 SNV related changes in their dataset led to reduced protein stability.
on Binding:
Prlic (“Impact of genetic variation on three dimensional structure and function of proteins” PLOS One, March 15, 2017) discloses that 44 of 374 SNV related changes in their dataset affect the ligand or macromolecule binding properties of the protein. An SNV can change the affinity of binding to partners such as activators, repressors, or substrates. Such changes can also affect the kinetics of interactions with partners or alter binding specificity.
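To put the counts cited in the four subsections above on a common scale, the following minimal Python sketch converts each one to a percentage of the 374-entry dataset. The dictionary keys and the function name are shorthand invented for this sketch and do not come from the paper. The notes do not say whether the four categories are mutually exclusive, so the percentages are reported separately and are not summed.

```python
# Counts as cited in the notes above (Prlic, PLOS One, 2017).
# The category labels are shorthand chosen for this sketch.
TOTAL_SNV_CHANGES = 374

reported_counts = {
    "activity": 52,
    "aggregation": 28,
    "stability": 58,
    "binding": 44,
}


def percent_of_dataset(count, total=TOTAL_SNV_CHANGES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Percentage of the dataset falling into each cited effect category.
percentages = {
    effect: percent_of_dataset(count)
    for effect, count in reported_counts.items()
}

for effect, pct in percentages.items():
    print(f"{effect}: {reported_counts[effect]} of {TOTAL_SNV_CHANGES} ({pct}%)")
```

On these figures, each effect category accounts for roughly 7 to 16 percent of the dataset.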
Conservative Amino Acid Substitution
There is tremendous variability in the importance of individual amino acids in protein sequences. On the one hand, nonconservative residue substitutions can be tolerated with no loss of activity at many residue positions, especially those exposed on the protein surface. On the other hand, destabilizing mutations can occur at a large number of different sites in a protein, and for many proteins such mutations account for more than half of the randomly isolated missense mutations that confer a defective phenotype. At sites that are key determinants of stability or activity, even residue substitutions that are generally considered to be conservative (e.g., Glu-Asp, Asn-Asp, Ile-Leu, Lys-Arg and Ala-Gly) can have severe phenotypic effects. Unfortunately, this means that there is no simple way to infer the likely effect of an amino acid substitution on the basis of sequence information alone. A nonconservative Gly-Arg substitution could be phenotypically silent at one position while a conservative Asn-Asp change could lead to complete loss of activity at another position. (Pakula, “Genetic analysis of protein stability and function” Annu. Rev. Genet. 1989, 23: 289-310).
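The distinction drawn above can be illustrated with a minimal Python sketch that labels a substitution as conservative only if it matches one of the five pairs named in the paragraph. The lookup table and the function name are assumptions made for this sketch; it is not a published substitution matrix or a method from Pakula. As the paragraph stresses, the label describes chemical similarity only and does not predict the phenotypic effect at a given position.

```python
# Pairs the paragraph above lists as "generally considered to be conservative"
# (Pakula, Annu. Rev. Genet. 1989), written with three-letter residue codes.
# This table is illustrative only and deliberately limited to those five pairs.
CONSERVATIVE_PAIRS = {
    frozenset(("Glu", "Asp")),
    frozenset(("Asn", "Asp")),
    frozenset(("Ile", "Leu")),
    frozenset(("Lys", "Arg")),
    frozenset(("Ala", "Gly")),
}


def is_conservative(original, replacement):
    """Return True if the residue pair is in the illustrative table.

    The check is order-independent: Asp-Glu is treated like Glu-Asp.
    It says nothing about whether the change is tolerated in a real protein.
    """
    return frozenset((original, replacement)) in CONSERVATIVE_PAIRS


# The two contrasting cases from the text: the label alone cannot tell
# which substitution is silent and which abolishes activity.
print("Gly -> Arg conservative?", is_conservative("Gly", "Arg"))
print("Asn -> Asp conservative?", is_conservative("Asn", "Asp"))
```

A sequence-only lookup of this kind returns the same answer at every position in a protein, which is why it cannot capture the position-dependent effects described above.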
Peptides can be designed de novo, but most peptides of biological interest are derived from N-terminal, C-terminal, or internal sequences of native proteins. Unfortunately, there are valid reasons why certain native sequences sometimes need to be altered. Even for relatively short sequences, there are essential and non-essential amino acid residues, although the relative importance of the individual amino acid residues is not always easy to determine. The “not-so-straightforward” rule of thumb is to make the changes in the non-essential residues. These changes may include amino acid substitution (e.g., for solubility, stability, etc.), chemical modification (e.g., for stability, structure-function studies), attachment of ligands and conjugation. (SIGMA “Designing custom peptides” 2004, pp. 1-2).