Protein bioinformatics

Useful links:
Human Proteome Map Portal: contains data on the translated protein products of over 17k human genes.
Proteomic databases: the Worldwide Protein Data Bank (wwPDB), which contains more than 126,000 experimentally determined, atomic-level 3D structures of biological macromolecules.
Peptide Atlas: a compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments.

Proteomics, generally:

The genome can be viewed as the blueprint of a cell, and the transcriptome as the first step in executing it: the parts of the genome that are being transcribed at a given time point. The proteome, however, describes the sum of the working parts of a cell, which makes proteomics the most direct platform for measuring cellular activity. Importantly, changes occur both during transcription from DNA to RNA and during translation from RNA to protein, multiplying the number of variants that can arise from a single gene. These include transcription errors, epigenetic changes, and other events, as well as translation errors, posttranslational modifications such as phosphorylation, and differential modes of protein folding. These changes markedly increase complexity, and because the proteome reflects all of them, studying it gives the most direct and most precise insight into a cell. See Kaufman.

Proteomics aims to provide the most detailed insights into cellular processes by analyzing mature proteins, including modifications such as posttranslational processing or cleavage, which cannot be captured by genomics or transcriptomics. Furthermore, studies in numerous species and cell types have indicated that the cellular concentrations of mRNA and protein encoded by the same locus do not strictly correlate, and that this correlation is state specific.

High-throughput proteomics data can be harnessed to evaluate and refine genome annotation, a strategy called proteogenomics, by providing experimental evidence for missing genes, correcting translational start site annotations, and corroborating existing open reading frames. Typically, the first step of a proteogenomic analysis is the six-frame translation of the genome to capture all possible translated genomic regions. Extensive tandem mass spectrometry (MS/MS) data are then searched against this translated genome database to provide evidence for previously unannotated open reading frames. To date, several such proteogenomic analyses have been carried out, for example to improve M. tuberculosis genome annotations. See Kaufman.
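The six-frame translation step can be sketched in Python as follows. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and a plain DNA string; real proteogenomic pipelines also handle alternative codon tables, ambiguous bases, and the splitting of frames into candidate open reading frames. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order
# (first base varies slowest), matching the order of AMINO_ACIDS; '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    demo = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    for frame, protein in six_frame_translation(demo).items():
        print(frame, protein)
```

For the demo sequence, frame +1 reads MAIVMGR*KGAR*. In a proteogenomic search database, each frame would typically be split at stop codons ('*') into candidate sequences before the MS/MS spectra are searched against them.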

Proteomics has been developed for the large-scale study of protein patterns in organisms. Typical goals of proteomic analysis are the identification and quantification of proteins present in a specific tissue under specific circumstances. Proteomic technologies, in combination with bioinformatics, are powerful tools for protein identification and characterization. Commonly, two-dimensional (2D) electrophoresis is used for protein separation, and mass spectrometry followed by database searching is used for protein identification. Up to 10,000 proteins can be studied simultaneously. Strategies for characterizing changes in complex mixtures have been developed using both two-dimensional gels and mass spectrometry.

Functional Proteomics:

Expression proteomics compares protein concentrations over different conditions to infer protein activity. In contrast, functional proteomics focuses on the regulation of proteins by posttranslational modifications and protein turnover, as well as the organization of proteins into multiprotein complexes, signaling pathways, and protein networks. Ultimately these factors all contribute to the regulation of cell function, adaptation to environmental stimuli, virulence, and pathogenicity.

Differential Proteomics

The single largest proteomics market is in the field of differential proteomics, in which serum samples from diseased and nondiseased populations are compared to search for differences in protein levels. Proteins whose levels differ can become diagnostic markers of disease, targets for clinical therapeutic intervention, or therapeutics themselves (Haaft, “Separations in Proteomics: Use of Camelid Antibody Fragments in the Depletion and Enrichment of Human Plasma Proteins for Proteomics Applications,” http://www.captureselect.com/downloads/sperations-in-proteomics.pdf, pp. 29-40, 2005).
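A minimal sketch of how a difference in one protein's level might be scored between diseased and nondiseased groups, using a log2 fold change and Welch's t statistic. The abundance values are made-up placeholders and the function names are hypothetical; real studies additionally need normalization, p-values, and multiple-testing correction across thousands of proteins.

```python
import math
from statistics import mean, variance


def log2_fold_change(case: list[float], control: list[float]) -> float:
    """log2 of the ratio of mean abundances (case / control)."""
    return math.log2(mean(case) / mean(control))


def welch_t(case: list[float], control: list[float]) -> float:
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(case) / len(case) + variance(control) / len(control)
    )
    return (mean(case) - mean(control)) / standard_error


if __name__ == "__main__":
    # Hypothetical abundances of one protein in serum samples (arbitrary units).
    diseased = [4.0, 4.4, 3.6]
    healthy = [2.0, 2.2, 1.8]
    print(f"log2 FC = {log2_fold_change(diseased, healthy):.2f}")
    print(f"Welch t = {welch_t(diseased, healthy):.2f}")
```

For these placeholder values the script prints a log2 fold change of 1.00 (a twofold increase) and a Welch t of 7.75.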

Techniques Used for Proteomics

Humans have about 20k protein-coding genes, and the proteins they express span a concentration range of about 12 orders of magnitude. Yet only about 0.1% of proteins contribute 99% of the total plasma protein mass. The remaining 1% of the mass contains highly relevant, low-abundance biomarkers, including cytokines, chemokines, interferons, and proteins released by early-stage tumors. Detecting these crucial biomarkers requires a very sensitive protein assay.
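The two figures in the paragraph above, dynamic range in orders of magnitude and the number of proteins making up 99% of the mass, can be computed from a list of concentrations as sketched below. The helper names are hypothetical, and the input is assumed to be measured or literature plasma concentrations in one common unit.

```python
import math


def proteins_for_mass_fraction(concentrations: list[float], fraction: float) -> int:
    """Smallest number of proteins (most abundant first) whose summed
    concentration reaches the given fraction of the total."""
    total = sum(concentrations)
    cumulative = 0.0
    for count, conc in enumerate(sorted(concentrations, reverse=True), start=1):
        cumulative += conc
        if cumulative >= fraction * total:
            return count
    return len(concentrations)


def dynamic_range_logs(concentrations: list[float]) -> float:
    """Dynamic range in orders of magnitude: log10(max / min)."""
    return math.log10(max(concentrations) / min(concentrations))
```

With the figures quoted above, proteins_for_mass_fraction(..., 0.99) over a full plasma protein list would be expected to return roughly 0.001 × 20,000 ≈ 20, and dynamic_range_logs would return about 12.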

Affinity Chromatography:

–VHH fragments as ligands: Affinity chromatography that uses naturally occurring camelid single-chain antibody (VHH) fragments as ligands can solve problems in proteomics by providing high-affinity, high-specificity binders. Unlike conventional antibody reagents, VHH fragments can be easily manufactured and are very stable (Haaft, “Separations in Proteomics: Use of Camelid Antibody Fragments in the Depletion and Enrichment of Human Plasma Proteins for Proteomics Applications,” http://www.captureselect.com/downloads/sperations-in-proteomics.pdf, pp. 29-40, 2005).

For two-dimensional gels, samples may be run on separate gels, stained, and protein abundances compared with the use of imaging software. However, in practice, protein pattern comparisons can be difficult to achieve due to poor reproducibility of protein separations on two-dimensional gels.

Mass spectrometry (MS) is not a quantitative technique per se, as ion yields are highly dependent on the chemical and physical nature of the sample. However, isotopic labeling combined with MS has been used extensively for many years to produce accurate quantitation of small molecules, and more recently this approach has been extended to peptides and proteins.

Isotope-coded affinity tag (ICAT): a gel-free method for quantitative proteomics that relies on chemical labeling reagents referred to as ICATs. These chemical probes consist of three general elements: a reactive group capable of labeling a defined amino acid side chain (e.g., iodoacetamide to modify cysteine residues), an isotopically coded linker, and a tag (e.g., biotin) for the affinity isolation of labeled proteins/peptides. For the quantitative comparison of two proteomes, one sample is labeled with the isotopically light (d0) probe and the other with the isotopically heavy (d8) version. To minimize error, both samples are then combined, digested with a protease (e.g., trypsin), and subjected to avidin affinity chromatography to isolate the peptides labeled with isotope-coded tagging reagents. These peptides are then analyzed by mass spectrometry. The ratios of signal intensities of differentially mass-tagged peptide pairs are quantified to determine the relative levels of proteins in the two samples.

The development of isotope-coded affinity tag (ICAT) reagents allows for quantitation through isotopic labeling. These reagents consist of three functional parts:

an iodoacetamide group that reacts with the free sulfhydryl group of a reduced cysteine side chain
a biotin moiety to aid isolation of modified peptides by avidin affinity chromatography
a linker group that contains either heavy or light isotopic variants

In a typical experiment, one sample is labeled with the light reagent and the other with the heavy reagent. After labeling, the samples are combined, and the cysteine-containing components are purified by means of the biotin tag. After MS data acquisition, the resulting mass spectra are searched for pairs of isotope envelopes differing in mass by 8 Da per labeled cysteine, and the relative quantities of the proteins are determined by comparing the corresponding isotope profiles. Collision-induced dissociation (CID) of peptides of interest by tandem mass spectrometry (MS/MS) gives rise to sequence-specific fragmentation patterns, from which the identity of the parent protein can be derived.
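The pair search described above can be sketched as follows. This is a simplified illustration, not any vendor's software: it assumes a centroided peak list with known charge states, uses a per-label d8/d0 difference computed from the 1H and 2H atomic masses (about 8.050 Da, consistent with the 16.100 Da quoted below for two labels), and takes the light:heavy intensity ratio as the relative protein level. The names `Peak` and `find_icat_pairs` are hypothetical.

```python
from dataclasses import dataclass

# Heavy-minus-light mass per ICAT label: 8 x (2H - 1H) ≈ 8.050 Da.
ICAT_D8_SHIFT = 8 * (2.014102 - 1.007825)


@dataclass
class Peak:
    mz: float         # observed m/z of the monoisotopic peak
    charge: int       # charge state z
    intensity: float  # e.g. peak height or extracted-ion area


def find_icat_pairs(peaks: list[Peak], tolerance: float = 0.01, max_labels: int = 2):
    """Pair light/heavy peaks whose neutral-mass difference equals
    n x ICAT_D8_SHIFT (n = number of labeled cysteines) and return
    (light, heavy, n, light/heavy intensity ratio) tuples."""
    pairs = []
    for light in peaks:
        for heavy in peaks:
            if heavy.charge != light.charge:
                continue
            # Same charge, so the m/z difference times z is the mass difference.
            delta_mass = (heavy.mz - light.mz) * light.charge
            for n_labels in range(1, max_labels + 1):
                if abs(delta_mass - n_labels * ICAT_D8_SHIFT) <= tolerance:
                    pairs.append(
                        (light, heavy, n_labels, light.intensity / heavy.intensity)
                    )
    return pairs
```

For example, a doubly charged light peak at m/z 600.30 pairs with a heavy peak about 4.025 m/z units higher (8.050 Da / 2); a light:heavy ratio of 2 would suggest the protein is twice as abundant in the light-labeled sample.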

A major innovation of the ICAT approach was using the affinity tag (biotin) to purify cysteine-containing peptides, which reduces the complexity of a peptide mixture by about a factor of 10. As a result, several proteins that usually cannot be observed with approaches such as two-dimensional electrophoresis (2DE) could be identified and quantified.

First-generation ICAT reagents had several problems: 1) the biotin tag was bulky, and fragmentation of modified peptides produced many fragments in the CID spectrum related to the tag rather than the peptide; 2) the substantial mass added by the tag could shift larger peptides outside the optimum mass range for detection by standard MS instruments; 3) the 8 Da mass difference chosen for the heavy ICAT reagent created potential ambiguity between peptides containing two ICAT-labeled cysteine residues (ΔM = 16.100 Da) and the common oxidation of methionine residues (ΔM = +15.995 Da); and 4) the d0- and d8-modified peptides did not coelute in reverse-phase chromatography, making quantitation less accurate.

These problems have been addressed by second-generation ICAT reagents, which contain a cleavable linker connecting the biotin moiety to the sulfhydryl-reactive isotope tag, so that the bulky biotin can be removed before MS analysis. In addition, rather than using deuterium as the heavy isotope, these reagents employ nine 13C atoms as the isotopic label for the heavy reagent. As a result, the heavy and light modified peptides coelute in reverse-phase chromatography, making quantitation simpler to achieve and the results more reliable.
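The mass arithmetic behind the first-generation ambiguity (point 3 above) and the 13C-based heavy label can be checked with a few lines of Python. The atomic masses are standard monoisotopic values; the 2000 Da peptide mass used for the ppm figure is an arbitrary illustrative choice, and the constant names are hypothetical.

```python
# Standard monoisotopic atomic masses (Da).
MASS_1H = 1.007825
MASS_2H = 2.014102
MASS_12C = 12.000000
MASS_13C = 13.003355
MET_OXIDATION = 15.994915  # one added 16O atom on methionine

# Heavy-minus-light mass difference per ICAT label.
D8_SHIFT = 8 * (MASS_2H - MASS_1H)       # first generation: eight deuteriums
C13_9_SHIFT = 9 * (MASS_13C - MASS_12C)  # second generation: nine 13C atoms


def ppm(delta_da: float, peptide_mass_da: float) -> float:
    """Express a mass difference in parts per million of the peptide mass."""
    return delta_da / peptide_mass_da * 1e6


if __name__ == "__main__":
    gap = 2 * D8_SHIFT - MET_OXIDATION
    print(f"d8 label:      {D8_SHIFT:.3f} Da")
    print(f"two d8 labels: {2 * D8_SHIFT:.3f} Da vs Met oxidation {MET_OXIDATION:.3f} Da")
    print(f"gap:           {gap:.3f} Da ({ppm(gap, 2000.0):.0f} ppm at 2000 Da)")
    print(f"13C9 label:    {C13_9_SHIFT:.3f} Da")
```

Two d8 labels add about 16.100 Da, only about 0.106 Da (roughly 53 ppm at 2000 Da) more than a methionine oxidation, a gap that instruments of modest resolution may not separate. The 13C9 label adds about 9.030 Da per cysteine.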

ICAT reagents: Applied Biosystems

Next-generation sequencing (NGS): Platinum Pro (an NGS platform that can be operated on a lab benchtop)

Newer affinity-based approaches use oligonucleotide aptamers (short single-stranded oligonucleotides that bind proteins and peptides with high affinity and specificity). Because the aptamers can be sequenced, these approaches enable an NGS readout with all the resulting benefits, including scalability, streamlined workflows, and multiomic endeavors (see Illumina).

Stable isotope labeling by amino acids in cell culture (SILAC)

In this quantification procedure, labeled essential amino acids (usually deuterated leucine, Leu-d3) are added to amino-acid-deficient cell culture media and are thus incorporated into all proteins as they are synthesized. No chemical labeling or affinity purification steps are performed.

In a typical experiment, one cell population is grown in labeled medium and the other in unlabeled medium, and the experimental population is treated in a specific way, such as by cytokine stimulation. Proteins from both the experimental and the control populations are then harvested, and because the label is encoded directly into the amino acid sequence of every protein, the extracts can be mixed directly. Purified proteins or peptides preserve the exact ratio of labeled to unlabeled protein, as no further synthesis takes place. Quantitation takes place at the level of the peptide mass spectrum or peptide fragment mass spectrum, exactly as in other stable isotope methods (such as ICAT).
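Because the heavy and light forms of each peptide are measured side by side, a protein-level ratio is commonly summarized from the ratios of its peptides. The sketch below uses the median, which is one robust choice among several; this aggregation rule and the function name are assumptions, not part of the SILAC labeling method itself. Which channel counts as "heavy" depends on which population received the labeled amino acid.

```python
from statistics import median


def protein_ratio(peptide_pairs: list[tuple[float, float]]) -> float:
    """Median heavy/light ratio over a protein's quantified peptides.

    Each tuple holds (heavy_intensity, light_intensity) for one peptide pair.
    Pairs with no light signal are skipped to avoid division by zero.
    """
    ratios = [heavy / light for heavy, light in peptide_pairs if light > 0]
    if not ratios:
        raise ValueError("no quantifiable peptide pairs")
    return median(ratios)
```

For peptide ratios of 2, 3, and 100, the median is 3, so a single outlier peptide has little influence on the protein-level estimate.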

Advantages of SILAC over ICAT include the fact that almost 70% of unique tryptic peptides predicted from the human genome contain at least one leucine, while only ~25% contain cysteine, the common target for chemical tagging. A disadvantage of SILAC is that it is limited to cells that can be grown in culture.
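Coverage figures like these come from digesting protein sequences in silico and counting which peptides contain a given residue. A minimal sketch, assuming the common simplified trypsin rule (cleave after K or R, not before P), no missed cleavages, and an arbitrary minimum peptide length of 6; reproducing the quoted percentages would require running it over a full human proteome FASTA file, which this toy example does not attempt. The function names and the per-leucine Leu-d3 shift helper are hypothetical.

```python
import re

# Mass added per Leu-d3 residue: 3 x (2H - 1H) ≈ 3.019 Da.
LEU_D3_SHIFT = 3 * (2.014102 - 1.007825)


def tryptic_digest(protein: str, min_length: int = 6) -> list[str]:
    """Split after K or R unless the next residue is P; drop short peptides."""
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if len(p) >= min_length]


def fraction_containing(peptides: list[str], residue: str) -> float:
    """Fraction of unique peptides that contain the given residue."""
    unique = set(peptides)
    return sum(residue in p for p in unique) / len(unique)


def heavy_shift(peptide: str) -> float:
    """Expected heavy-minus-light mass difference for a Leu-d3 SILAC peptide."""
    return peptide.count("L") * LEU_D3_SHIFT
```

On the toy sequence ACDEFGKLLLLLLRPAAAAAAKCCCCCCR the digest yields three peptides (the R before P is not cleaved), one of which contains leucine and two of which contain cysteine. A leucine-free peptide shows no SILAC mass shift and cannot be quantified with Leu-d3.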

Proteolysis in the presence of 16O and 18O

In this procedure, Mirgorodskaya et al. introduced an isotopic label during proteolysis by digesting one sample in 18O water and the other in regular (16O) water. The sample digested in 18O water incorporates 18O into the newly formed peptide C-termini, generating an isotopic label that is used for relative quantitation. However, quantitation can be complicated by loss or incomplete incorporation of the label.
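A minimal sketch of how a labeled/unlabeled ratio might be computed when incorporation is incomplete. It assumes that trypsin can add either one or two 18O atoms to a peptide C-terminus (+2.004 Da or +4.008 Da), so both the +2 and +4 peaks count as labeled signal, and that the +2 peak is corrected only for the unlabeled peptide's natural M+2 isotope using a fraction the user supplies. This is a simplified illustration, not the procedure of Mirgorodskaya et al. It cannot correct for molecules from the 18O sample that acquired no label at all, because these add to the M+0 peak and are indistinguishable from the 16O sample. The function name is hypothetical.

```python
# Mass added per 18O atom relative to 16O (monoisotopic, Da).
O18_SHIFT = 17.999160 - 15.994915  # ≈ 2.004 Da; two atoms ≈ 4.008 Da


def o18_ratio(i_m0: float, i_m2: float, i_m4: float,
              natural_m2_fraction: float = 0.0) -> float:
    """Labeled (18O) / unlabeled (16O) ratio for one peptide.

    i_m0, i_m2, i_m4: intensities of the unlabeled monoisotopic peak and the
    peaks 2 Da (one 18O) and 4 Da (two 18O) higher.
    natural_m2_fraction: expected natural M+2/M+0 ratio of the unlabeled
    peptide (an assumed input; 0.0 disables the correction).
    """
    if i_m0 <= 0:
        raise ValueError("unlabeled (M+0) peak intensity must be positive")
    # First-order correction: remove the unlabeled peptide's natural M+2 peak,
    # clamping at zero so noise cannot produce a negative intensity.
    singly_labeled = max(i_m2 - natural_m2_fraction * i_m0, 0.0)
    return (singly_labeled + i_m4) / i_m0
```

Summing the singly and doubly labeled peaks keeps the ratio from being underestimated when only part of the sample has exchanged both C-terminal oxygens.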