Introduction:

RNA interference (RNAi) is an endogenous cellular process in which double-stranded small interfering RNA molecules (siRNAs) bind to a specific mRNA target and trigger its degradation, resulting in reduced protein levels in the cell. This gene silencing method allows researchers to study the function of proteins within biological pathways by transiently removing them and analyzing the impact on cellular function.

Gene transcription control is also mediated using other endogenous RNA molecules, such as microRNAs (miRNAs) and small nuclear RNAs (snRNAs).

While RNAi has proven to be an effective tool for transient gene control, the discovery of CRISPR has revolutionized the field of permanent gene editing. The CRISPR-Cas9 system can be used in various gene editing applications, including gene knock-out through the creation of stop codons or splice site variants. CRISPR is often used to validate gene function data that were initially obtained using RNAi approaches. CRISPR editing relies on the creation of a DNA double-strand break followed by homology-directed repair or the less desirable nonhomologous end joining.

RNA interference (RNAi):

The term RNA interference (RNAi) was coined after the discovery that injection of dsRNA into the nematode Caenorhabditis elegans leads to specific silencing of genes highly homologous in sequence to the delivered dsRNA. RNAi was subsequently also observed in insects and other animals. The natural function of RNAi appears to be protection of the genome against invasion by mobile genetic elements such as transposons and viruses, which produce aberrant RNA or dsRNA in the host cell when they become active. Specific mRNA degradation prevents transposon and virus replication. dsRNA triggers the specific degradation of homologous RNAs only within the region of identity with the dsRNA. (Tuschl, “RNA interference is mediated by 21- and 22-nucleotide RNAs”, Genes & Development, 2001).

RNA interference (RNAi) is used by eukaryotic cells in a variety of organisms such as fungi, plants, worms, mice and probably humans. In plants, RNAi protects cells against RNA viruses. In other types of organisms it may protect against the proliferation of transposable elements that replicate via RNA intermediates. The presence of free, double-stranded RNA triggers RNAi by attracting a protein complex containing an RNA nuclease and an RNA helicase. This complex cleaves the double-stranded RNA into small fragments. The bound RNA fragments then direct the enzyme complex to other RNA molecules that have complementary nucleotide sequences, which can be single or double stranded, and the enzyme degrades these as well. In this way, introduction of a double-stranded RNA molecule can be used to inactivate specific cellular mRNAs. Thus, RNA interference is typically a two-step process. In the first step, input dsRNA is digested into 21-23 nucleotide (nt) small interfering RNAs (siRNAs), probably by the action of Dicer, a member of the RNase III family of double-strand-specific ribonucleases, which cleaves double-stranded RNA in an ATP-dependent manner. Successive cleavage events degrade the RNA to 19-21 bp duplexes (siRNA), each with 2-nucleotide 3′ overhangs. In the second step, siRNA duplexes bind to a nuclease complex to form the RNA-induced silencing complex (RISC). An ATP-dependent unwinding of the siRNA duplex is required for activation of the RISC. The active RISC (containing a single siRNA and an RNase) then targets the homologous transcript by base-pairing interactions and typically cleaves the mRNA about 12 nucleotides from the 3′ terminus of the siRNA.
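The two-step process described above can be illustrated with a small Python sketch. It is a toy model only: the sequences, the function names (`dice`, `find_target`) and the fixed 21 nt fragment size are assumptions made for illustration, and overhangs, strand selection and the cleavage chemistry are ignored.

```python
# Toy model of RNAi: "dice" the antisense strand of a dsRNA into 21 nt guides,
# then locate the fully complementary site of each guide on an mRNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(strand, size=21):
    """Chop one strand into consecutive fragments of `size` nt (hypothetical Dicer stand-in)."""
    return [strand[i:i + size] for i in range(0, len(strand) - size + 1, size)]


def find_target(mrna, guide):
    """Return the 0-based index of the mRNA site complementary to the guide, or -1."""
    return mrna.find(reverse_complement(guide))


# Hypothetical sequences: two adjacent 21 nt target sites embedded in an mRNA.
site1 = "AUGGCUAGCUAGGAUCCGAUC"
site2 = "GCAUUACGGCUAAGCUUGCAA"
mrna = "GGGAAA" + site1 + site2 + "CCCUUU"

# The antisense strand of the input dsRNA covers both sites.
antisense = reverse_complement(site1 + site2)
guides = dice(antisense)
positions = [find_target(mrna, guide) for guide in guides]
print(positions)
```

Each guide is matched only where it is perfectly complementary to the mRNA, which mirrors the statement that degradation is restricted to the region of identity with the dsRNA.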

siRNA:

siRNAs are double-stranded RNAs with a MW of about 13 kDa which suppress protein translation by recruiting RISC to mRNA via Watson-Crick base pairing. Through the action of the catalytic RISC protein Ago2, a member of the Argonaute family, the target mRNA is cleaved. Alternatively, other Ago proteins (Ago1, Ago3 and Ago4) catalyse endonuclease-mediated nonspecific mRNA degradation by localizing the bound mRNA in processing (P)-bodies. (Dahlman, “Drug delivery systems for RNA therapeutics”, Nature Reviews Genetics, 23, May 2022).
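As a rough plausibility check on the molecular weight quoted above, the mass of a duplex can be estimated from its length. The average residue mass of 330 Da used below is an assumed round figure (values of roughly 320 to 340 Da are commonly used), and end groups and chemical modifications are ignored, so this is only an order-of-magnitude sketch.

```python
# Assumed average mass of one nucleotide residue within an RNA strand, in daltons.
AVG_NT_MASS_DA = 330.0


def duplex_mass_kda(length_nt, strands=2):
    """Estimate the mass in kDa of an RNA made of `strands` strands of `length_nt` nt each."""
    return strands * length_nt * AVG_NT_MASS_DA / 1000.0


# A 21 nt duplex comes out in the low-to-mid teens of kDa.
print(round(duplex_mass_kda(21), 2))
```

Under these assumptions a 21 nt duplex lands close to the stated figure of about 13 kDa.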

siRNA-mediated gene silencing has been used safely in humans. siRNA can reduce the expression of any protein-coding gene and has been approved by the FDA and EMA in the form of drugs such as patisiran, which is used to treat hereditary transthyretin-mediated amyloidosis (hATTR), givosiran, which is used to treat acute hepatic porphyria, lumasiran, which is used to treat primary hyperoxaluria type 1, and inclisiran, which is used to treat hypercholesterolaemia. Given that siRNA interferes with mature mRNA, it requires only cytoplasmic delivery, which is easier to achieve than nuclear delivery. (Dahlman, “Drug delivery systems for RNA therapeutics”, Nature Reviews Genetics, 23, May 2022).

Antisense Oligonucleotides (ASOs):

Companies: Sarepta Therapeutics 

ASOs are a second class of RNA therapeutics, and are oligonucleotides with a MW of 6-9 kDa. ASOs have the same manufacturing advantages as siRNA and have been approved by the FDA to treat familial hypercholesterolaemia, hATTR amyloidosis with polyneuropathy, specific subtypes of Duchenne muscular dystrophy, and infantile-onset spinal muscular atrophy. ASOs can act through three mechanisms of action. First, similar to siRNAs, ASOs bind mRNA via Watson-Crick base pairing, but unlike siRNAs, the ASO DNA-RNA heteroduplex recruits RNase H1 rather than RISC. RNase H1-dependent ASOs are also known as gapmers and lead to cleavage of the target RNA. Second, ASOs can also interfere with splicing machinery by interacting with pre-mRNA, thereby promoting alternative splicing and increasing target protein expression. Thus, unlike siRNA, which silences target genes, ASOs can be used to increase protein activity in diseases including Duchenne muscular dystrophy and spinal muscular atrophy. (Dahlman, “Drug delivery systems for RNA therapeutics”, Nature Reviews Genetics, 23, May 2022).

MicroRNA (miRNAs): 

MicroRNAs (miRNAs) recruit RISC to complementary mRNA sequences, thereby facilitating targeted RNA interference. As a result, miRNA mimics, which are designed to increase native miRNA activity, and anti-miRNAs or antago-miRNAs, which inhibit miRNA activity, have been studied in animal models and used in clinical trials. (Dahlman, “Drug delivery systems for RNA therapeutics”, Nature Reviews Genetics, 23, May 2022).

In 2001, several groups used a cloning method to isolate and identify a large group of miRNAs from C. elegans, Drosophila and humans. Several hundred miRNAs have been identified in plants and animals which do not appear to have endogenous siRNAs. Thus, while similar to siRNAs, miRNAs are distinct. The miRNAs observed thus far have been about 21-22 nucleotides in length and arise from longer precursors, which are transcribed from non-protein-encoding genes. The precursors form structures that fold back on themselves in self-complementary regions; they are then processed by the nuclease Dicer in animals or DCL1 in plants. miRNA molecules interrupt translation through precise or imprecise base-pairing with their targets. Studies have shown that expression levels of numerous miRNAs are associated with various cancers (US 2009/0175827).

Animal cells have been shown to express a range of noncoding RNAs of about 22 nucleotides termed microRNAs (miRNAs). The human mir-30 miRNA can be excised from irrelevant, endogenously transcribed mRNAs encompassing the predicted 71-nucleotide mir-30 precursor. One common feature of miRNAs is that they all reside within a putative arm of a predicted precursor RNA stem-loop of about 70 nt. Dicer, an RNase III-type enzyme, is believed to be important for the processing of these miRNA precursors into the mature miRNAs of about 22 nt. (Dicer is also involved in siRNA production from longer dsRNAs, as described above.) Expression of the mir-30 miRNA specifically blocks the translation in human cells of an mRNA containing artificial mir-30 target sites. Similarly designed miRNAs can also be excised from transcripts encompassing artificial miRNA precursors and can inhibit the expression of mRNAs containing a complementary target site. This approach offers a number of important advantages when contrasted with siRNAs: transfection of miRNA expression plasmids is simple and inexpensive and can result in continuous miRNA production. Inhibitory miRNAs could also be expressed using viral vectors, thus allowing the production of miRNAs in primary cells or in other cells that are not readily transfectable with synthetic siRNAs. As the inhibitory RNA is expressed as part of an mRNA, it should also be possible to use regulatable promoters to control miRNA production. (Zeng, Molecular Cell, 9, 1327-1333, June 2002).

It is thought that about 30% of the total genes of the human genome are regulated by miRNAs. The miRNAs are generated through transcription of individual genes in the non-coding regions. A miRNA derives from a pri-miRNA, a precursor transcribed in the nucleus by RNA polymerase II. The pri-miRNA is cleaved by the RNase III enzyme Drosha (a dsRNA-specific ribonuclease) to produce a pre-miRNA with a hairpin loop structure. The pre-miRNA hairpin is exported out of the nucleus by the protein exportin-5 with Ran-GTP as a cofactor, and is processed into a miRNA duplex about 22 nucleotides in length by the action of the RNase III enzyme Dicer and TRBP (transactivation-responsive RNA binding protein). The miRNA duplex binds with RISC (RNA-induced silencing complex) and regulates genes by cleaving mRNAs or preventing translation.

Various kinds of miRNAs and the target genes they regulate may be useful in predicting the mechanisms of various diseases. Since abnormally increased or decreased miRNA expression is observed in various diseases such as cancer, diabetes and cardiovascular diseases, miRNA is recognized as a biomarker for diagnosing and predicting diseases.

One phase I clinical trial investigated the use of MRX34, which uses liposomes to deliver a double-stranded miRNA-34a mimic, for the treatment of advanced solid tumours. In a phase II clinical trial, the anti-miRNA-122 miravirsen, which binds miRNA-122 and leads to its subsequent inactivity, was tested for the treatment of hepatitis C. (Dahlman, “Drug delivery systems for RNA therapeutics”, Nature Reviews Genetics, 23, May 2022).

Small Nuclear RNAs (snRNAs):

snRNAs contribute to the regulation of pre-mRNA splicing rather than binding to mRNA and causing degradation. These RNA molecules are also available as synthetic gene modulators and are able to up- or down-regulate protein expression transiently.

In molecular biology and genetics, a transcription factor (sometimes called a sequence-specific DNA-binding factor) is a protein that binds to specific DNA sequences, thereby controlling the flow (or transcription) of genetic information from DNA to mRNA. Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes.

Companies: Tevard Biosciences (tRNA therapies)

See Prokaryotics for differences with their translation

Translation is the process of converting the nucleotide sequence of mRNA into a protein. This is accomplished in the cell by using a genetic code consisting of codons, which are sequences 3 nucleotides long. Each codon specifies 1 of the 20 different amino acids used during protein synthesis or a stop signal. Since there are 4 possible nucleotides (A, C, G, U), there are 64 possible combinations of 3 nucleotides. Of these, 61 code for the 20 amino acids commonly used in protein synthesis and 3 are stop codons, which means that the code is degenerate: more than 1 codon can code for the same amino acid. This degeneracy of the code also means that nucleotide mutations, particularly in the 3rd position of the codon, often do not change the amino acid sequence. This is particularly true of transitions, where one purine is replaced by another purine (i.e., A to G or G to A) or one pyrimidine is replaced by another pyrimidine. However, transversions, where a pyrimidine is substituted for a purine or vice versa, are typically less protected by the genetic code and more often lead to altered amino acid sequences.
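The degeneracy argument above can be checked by enumerating the standard genetic code. The sketch below builds the table from the usual compact 64-letter encoding (bases in the order U, C, A, G; `*` marks a stop codon) and counts how often a third-position transition or transversion leaves the amino acid unchanged. The function and variable names are illustrative choices, not part of the original text.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def is_transition(a, b):
    """True if a->b swaps purine for purine or pyrimidine for pyrimidine."""
    return (a in PURINES) == (b in PURINES)


def third_position_summary():
    """Count synonymous vs. total third-position substitutions over all sense codons."""
    counts = Counter()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only mutate codons that encode an amino acid
        for new_base in BASES:
            if new_base == codon[2]:
                continue
            kind = "transition" if is_transition(codon[2], new_base) else "transversion"
            counts[kind, "total"] += 1
            if CODON_TABLE[codon[:2] + new_base] == aa:
                counts[kind, "synonymous"] += 1
    return counts


summary = third_position_summary()
sense_codons = sum(1 for aa in CODON_TABLE.values() if aa != "*")
print(len(CODON_TABLE), sense_codons, len(set(AMINO) - {"*"}))
print(dict(summary))
```

On the standard table this gives 58 of 61 third-position transitions as synonymous, against 68 of 122 third-position transversions, which is the asymmetry the paragraph describes.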

Definitions:

Initiation Codon: also referred to as the “AUG codon,” the “start codon” or the “AUG start codon”, it is typically 5′-AUG (in transcribed mRNA molecules; 5′-ATG in the corresponding DNA molecule).

Translation termination codon: also called the “stop codon”, it may have one of three sequences, i.e., 5′-UAA, 5′-UAG and 5′-UGA (the corresponding DNA sequences are 5′-TAA, 5′-TAG and 5′-TGA).

ORF: open reading frame, also called “coding region”, is known in the art to refer to the region between the translation initiation codon and the translation termination codon. The first codon in an ORF is the “start” codon, which encodes methionine (a modified form, formylmethionine, in bacteria). Each amino acid in the polypeptide chain is encoded by a subsequent set of three base pairs, until translation is terminated at the stop codon, which does not itself encode an amino acid but rather signals the end of translation. Thus a double-stranded length of DNA can have six different reading frames, depending on the starting base pair of the first codon and the direction in which the strand is read (two strands times three base pairs per codon equals six reading frames).
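The six reading frames can be enumerated directly, as in the following sketch. The DNA sequence is a made-up example, and `six_frames` and `first_orf` are hypothetical helper names; a real ORF finder would also handle minimum lengths and multiple ORFs per frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frames(dna):
    """Split a DNA sequence into codons for all 3 offsets on both strands."""
    frames = {}
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement, read 5'->3'
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            # Keep complete codons only.
            frames[strand, offset] = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return frames


def first_orf(codons):
    """Return the codons from the first ATG through the first in-frame stop, or []."""
    if "ATG" not in codons:
        return []
    start = codons.index("ATG")
    for end in range(start + 1, len(codons)):
        if codons[end] in STOP_CODONS:
            return codons[start:end + 1]
    return []


frames = six_frames("CCATGGCTTGATAAC")  # hypothetical sequence
for key, codons in frames.items():
    print(key, codons, first_orf(codons))
```

In this example only the forward strand at offset 2 contains a start codon followed by an in-frame stop, so only that frame yields an ORF.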

The codons in an mRNA molecule do not directly recognize the amino acids they specify. Instead, the translation of mRNA into protein depends on adaptor molecules called transfer RNAs (tRNAs) that can recognize and bind both to the codon at one end, using 3 nucleotides called the anticodon, and to the appropriate amino acid at the other end. The anticodon region of some tRNAs is sequenced such that they require accurate base pairing with the codon at the first two positions of the codon but can tolerate a mismatch at the 3rd position. For example, a special possible anticodon base, inosine (I), can recognize U, C or A. This “wobble” base-pairing between codons and anticodons accounts for degeneracy in the genetic code (the same amino acid can be specified by different codons). Inosine is just one of the many modified nucleotides on tRNA. tRNAs are transcribed by RNA polymerase III.
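Wobble pairing can be written as a small lookup. In the sketch below the inosine rule (I pairs with U, C or A) comes from the text; the remaining wobble entries follow the commonly cited wobble rules and are an assumption added for completeness. Anticodons and codons are both written 5′ to 3′, so the first anticodon base pairs with the third codon base.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Bases the 5' anticodon position can pair with at codon position 3.
# Only the inosine entry is taken from the text; the others are assumed.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read(anticodon):
    """List the codons (5'->3') that a 3-base anticodon (5'->3') can recognize."""
    first, second, third = anticodon
    # Codon positions 1 and 2 pair strictly with anticodon positions 3 and 2.
    stem = WATSON_CRICK[third] + WATSON_CRICK[second]
    return [stem + wobble_base for wobble_base in WOBBLE[first]]


print(codons_read("IGC"))  # an inosine-containing anticodon
print(codons_read("CAU"))  # anticodon complementary to the AUG start codon
```

The anticodon IGC reads three codons that share their first two bases, which is how one tRNA can serve several synonymous codons.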

Recognition and attachment of the correct amino acid to a tRNA depends on enzymes called aminoacyl-tRNA synthetases. For most cells there is a different synthetase enzyme for each amino acid. In this linkage process, the amino acid is first activated through the linkage of its carboxyl group to an AMP moiety using ATP. The AMP-linked carboxyl group on the amino acid is then transferred to an OH of the terminal A on the sugar at the 3′ end of the tRNA molecule using an activated ester linkage. 

Protein synthesis occurs by the formation of a peptide bond between the carboxyl group at the end of a growing polypeptide chain and a free amino group on an incoming amino acid. Thus a protein is synthesized from its N-terminal end to its C terminal end. To maintain the correct reading frame and to ensure accuracy, synthesis is carried out in the ribosome which is a complex consisting of more than 50 different proteins and several RNA molecules, the ribosomal RNAs. Eukaryotic and procaryotic ribosomes are very similar. Both are composed of one large and one small subunit that is divided up into further subunits. Ribosomal components are usually designated by their S values which refers to their rate of sedimentation in an ultracentrifuge.

A ribosome contains 4 binding sites for RNA molecules. One is for the mRNA and three, called the E-site, P-site and A-site (“EPA”), are for tRNAs. Initiation of translation occurs with the codon AUG and a special initiator tRNA which carries the amino acid methionine (in bacteria, a modified form of methionine, formylmethionine, is used). This Met will typically be removed from the protein later by a protease.

In eucaryotes, the initiator tRNA coupled to Met is loaded into the small ribosomal subunit along with additional proteins called eukaryotic initiation factors (eIFs). One such initiation factor, eIF-2, forms a complex with GTP and mediates the binding of the methionyl initiator tRNA to the small ribosomal subunit, which then binds to the 5′ end of the mRNA and begins scanning along the mRNA. When an AUG codon is recognized, the bound GTP is hydrolyzed to GDP by the eIF-2 protein, causing a conformational change in the eIF-2 protein and releasing it from the small ribosomal subunit. The large ribosomal subunit then joins the small one to form a complete ribosome that begins protein synthesis.

At this point, the initiator tRNA is bound to the P-site. The process of amino acid addition to form a protein occurs in a series of repeated steps. (1) A tRNA carrying the next amino acid in the chain binds to the ribosomal A-site by forming base pairs with the codon in the mRNA which is positioned at the A-site. The aminoacyl-tRNA is tightly bound to an elongation factor (EF-Tu), which pairs transiently with the codon at the A-site. The codon-anticodon pairing triggers GTP hydrolysis by EF-Tu, causing it to dissociate from the tRNA. The delay that this step introduces between tRNA binding and peptide bond formation increases the accuracy of translation. At this point, the A and P sites contain adjacent tRNAs. (2) The carboxyl end of the polypeptide chain is released from the tRNA at the P-site and joined to the free amino group of the amino acid linked to the tRNA at the A-site, forming a new peptide bond in a reaction catalyzed by peptidyl transferase. This reaction is accompanied by conformational changes in the ribosome which shift the tRNAs into the E and P sites. (3) Additional conformational changes move the mRNA exactly 3 nucleotides through the ribosome and reset the ribosome so it is ready to receive the next aminoacyl-tRNA.

The end of translation is signaled by one of 3 stop codons (UAA, UAG, or UGA) which are not recognized by a tRNA. Proteins called release factors (which mimic the shape and charge of a tRNA) bind to the ribosome with a stop codon positioned in the A site causing the peptidyl transferase in the ribosome to catalyze the addition of a water molecule instead of an amino acid to the peptidyl-tRNA. This frees the carboxyl end of the growing polypeptide chain from its attachment to a tRNA.
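The repeated elongation steps and the termination event described above can be traced with a simple state model of the three tRNA sites. This is a deliberately simplified sketch: site occupancy is represented by the codon being read, steps (2) and (3) are merged into a single shift, the exit of the E-site tRNA is not modelled, and `elongation_cycle` is a hypothetical name.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def elongation_cycle(codons):
    """Return snapshots of (E-site, P-site, A-site, residues in chain) during translation."""
    if not codons or codons[0] != "AUG":
        raise ValueError("translation is assumed to start at AUG")
    e_site, p_site, a_site = None, codons[0], None  # initiator tRNA sits in the P-site
    residues = 1
    history = [(e_site, p_site, a_site, residues)]
    for codon in codons[1:]:
        if codon in STOP_CODONS:
            # No tRNA reads a stop codon; a release factor occupies the A-site instead.
            history.append((e_site, p_site, "release factor", residues))
            break
        a_site = codon  # step 1: aminoacyl-tRNA binds at the A-site
        history.append((e_site, p_site, a_site, residues))
        residues += 1  # step 2: peptide bond formed, chain now on the A-site tRNA
        e_site, p_site, a_site = p_site, a_site, None  # step 3: shift by one codon
        history.append((e_site, p_site, a_site, residues))
    return history


for snapshot in elongation_cycle(["AUG", "GCU", "UAA"]):
    print(snapshot)
```

Each sense codon contributes two snapshots (binding, then shift), and the stop codon ends the trace without adding a residue.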

In eukaryotes, translation can occur either in the cytoplasm or on the RER. Proteins that are translated on the RER are targeted there based on their own initial amino acid sequence. The ribosomes found on the RER are actively translating and are not permanently bound to the ER. A polypeptide that starts with a short series of amino acids called a signal sequence is specifically recognized and bound by a cytoplasmic complex of proteins called the signal recognition particle (SRP). The complex of signal sequence and SRP is in turn recognized by a receptor protein in the ER membrane.

Protein synthesis requires a lot of energy. At least 4 high-energy phosphate bonds are split to make each new peptide bond. Two are consumed in charging a tRNA with an amino acid and two more drive steps in peptide synthesis. It would therefore not only be a waste of energy if incomplete mRNAs were translated, but it would also be very harmful to the cell because aberrant proteins would be produced. Eukaryotes avoid this mistake by recognizing the 5′ cap and the poly-A tail. Bacteria must solve this problem another way since there are no signals at the 3′ ends of bacterial mRNAs. Instead, the bacterial ribosome translates to the end of an incomplete RNA and then a special RNA, tmRNA, enters the A-site of the ribosome and is itself translated. This adds a special 11-amino-acid tag to the C terminus of the truncated protein that signals for degradation.

One deviation from the genetic code is the use of a 21st amino acid called selenocysteine that can be incorporated into a growing polypeptide chain through translational recoding. Selenocysteine is essential for the efficient function of a variety of enzymes and is produced when a specialized tRNA is charged with serine which is then converted enzymatically. A specific RNA structure in the mRNA (a stem and loop structure with a particular nucleotide sequence) signals that selenocysteine is to be inserted at a UGA stop codon.

Another deviation from the genetic code that is produced by translational recoding is called translational frameshifting, which is commonly used by retroviruses like HIV, in which it allows more than one protein to be synthesized from a single mRNA. These viruses commonly make both the capsid proteins (Gag proteins) and the viral reverse transcriptase and integrase (Pol proteins) from the same RNA transcript. Such a virus needs more copies of the Gag proteins than of the Pol proteins, and it achieves this by encoding the pol genes just after the gag genes but in a different reading frame. A stop codon at the end of the gag coding sequence can be bypassed on occasion. The frameshift occurs because features in the local RNA structure cause the tRNALeu attached to the C terminus of the growing polypeptide chain to slip backward on occasion by one nucleotide on the ribosome.
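A backward slip of one nucleotide can be illustrated by reading the same mRNA with and without the slip. The sequence below is a hypothetical toy, not an actual viral sequence, and `read_codons` with its `slip_after` parameter is an illustrative name. The slip is modelled as the ribosome advancing 2 nucleotides instead of 3 after one chosen codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_codons(mrna, slip_after=None):
    """Read codons from position 0 until a stop codon.

    If `slip_after` is the index of a codon, the ribosome slips back one
    nucleotide after reading that codon (a -1 frameshift).
    """
    codons, pos = [], 0
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
        # Normal translocation is 3 nt; a -1 slip advances only 2 nt.
        pos += 2 if len(codons) - 1 == slip_after else 3
    return codons


mrna = "AUGAAUUUUUUAGGGUAACCGGCUGAC"  # hypothetical toy message
print(read_codons(mrna))                # original frame ends at the first stop
print(read_codons(mrna, slip_after=3))  # slip after the UUA codon bypasses that stop
```

Without the slip, reading ends at the in-frame UAA; with the slip after the UUA (leucine) codon, the ribosome continues in the new frame past that stop until it reaches a stop codon in the shifted frame.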

There is evidence that histone acetylation and deacetylation are mechanisms by which transcriptional regulation in a cell is achieved (Grunstein, M. (1997) Nature 389: 249-52). These effects are thought to occur through changes in the structure of chromatin by altering the affinity of histone proteins (e.g., H1, H2A, H2B, H3 and H4) for coiled DNA in the nucleosome. It is believed that when the histone proteins are hypoacetylated, there is a greater affinity of the histone for the DNA phosphate backbone. This affinity causes DNA to be tightly bound to the histone and renders the DNA inaccessible to transcriptional regulatory elements and machinery. The regulation of acetylated states occurs through the balance of activity between two enzyme complexes, HAT and HDAC, which are covered below.

Histone acetyl transferases (HATs) 

1. Introduction:

HATs add acetyl groups to histone (particularly H3 and H4) lysine residues, which can eliminate higher-order chromatin structures. Histone acetylation tends to destabilize chromatin structure, perhaps because adding an acetyl group removes the positive charge from lysine, thereby making it difficult for histones to neutralize the charges on DNA. In addition to reducing the interaction between DNA and the histones, HATs may also provide binding sites for proteins that recognize acetylated lysines but not de-acetylated lysines. Decondensed or “open” chromatin is characterized by hyperacetylation of associated histones as well as by increased accessibility to restriction enzymes, nucleases and transcription factors. Histone acetylation may play a role in excision repair of DNA damage. Reports have shown that treatment of fibroblast strains with sodium butyrate, an agent which inhibits histone deacetylase enzymes, results in a 2-3 fold stimulation in repair synthesis occurring immediately after UV irradiation.

Histone deacetylases (HDACs) 

1. Introduction: HDACs remove acetyl groups from the lysine residues of histone proteins, which has the reverse effect of the above (it condenses chromatin and prevents gene expression). Histone deacetylation is a major mechanism of methylated DNA silencing. US Patent No. 6,541,661, entitled “Inhibitors of Histone Deacetylase”, provides compounds and methods for inhibiting HDAC enzymes.

2. Therapeutic applications of HDAC inhibitors:

Baopoulos (US 2007/0190022A1) discloses methods of treating cancer by administering to a subject an HDAC inhibitor.

Epigenetics has two main elements: DNA methylation and histone modifications. Transcriptional activation of tissue-specific genes generally involves both DNA demethylation and changes in chromatin structure, evident as changes in the pattern of accessibility to restriction enzymes or DNase I. These elements control how the DNA is structured, and the structure of DNA determines its function. DNA methylation is a modification of DNA typically associated with a resting state which turns down DNA activity. Histone modifications act by either ramping DNA activity up or toning it down.

Methylation of DNA

Methylation of CpG: Addition of a methyl moiety to the 5th position of the DNA cytosine base (5-methylcytosine) is an evolutionarily conserved feature of most vertebrate and plant genomes. Such DNA methylation yields a fifth coding element for the genome, and plays diverse roles in modulation and expression of the associated genomic information. In mammals, DNA methylation occurs predominantly at CpG dinucleotides, and underlies a variety of transcriptional regulatory phenomena, including imprinting, X-chromosome inactivation, transgenerational epigenetic inheritance and stable silencing of gene activity. Vertebrate cells contain a family of proteins that bind methylated DNA. These proteins interact with chromatin remodeling complexes and histone deacetylases that condense chromatin so it becomes transcriptionally inactive. DNA binding factors which bind methylated CpG residues, such as MeCP2, associate with the mSin3/histone deacetylase (HDAC) corepressor complex, providing a mechanistic link between DNA methylation and the formation of repressed, higher-order chromatin structures.

The proteins mediating CpG methylation have been identified, and their roles in mammalian development have been investigated. The DNA methyltransferase Dnmt1 maintains CpG methylation during DNA replication by methylating the newly synthesized daughter DNA strand, using the methylation pattern of the parental strand as a template. Dnmt1 binds the proliferating cell nuclear antigen PCNA and can be found in a complex bound to hemimethylated DNA at replication foci in late S phase. However, Dnmt1 does not efficiently modify unmethylated CpG in vivo; de novo DNA methylation is accomplished by two other DNA methyltransferases, Dnmt3a and Dnmt3b, which symmetrically methylate CpG pairs on both DNA strands.
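The templated copying of methylation marks described above can be sketched as follows. The sequence and the set-based representation of methylated positions are assumptions made for illustration; positions on both strands are given in top-strand coordinates, so the cytosine on the bottom strand of a CpG at index i lies opposite the G at index i + 1.

```python
def cpg_sites(top_strand):
    """Return the 0-based indices of the C in every CpG dinucleotide on the top strand."""
    return [i for i in range(len(top_strand) - 1) if top_strand[i:i + 2] == "CG"]


def maintenance_methylation(top_strand, methylated_top):
    """Positions (top-strand coordinates) methylated on a newly made bottom strand.

    Mimics a Dnmt1-like maintenance activity: a bottom-strand cytosine is
    methylated only where the parental top-strand CpG is already methylated.
    """
    return {i + 1 for i in cpg_sites(top_strand) if i in methylated_top}


top = "ACGTTCGACGG"      # hypothetical sequence with three CpG sites
parental_marks = {1, 8}  # methylated cytosines on the parental strand

print(cpg_sites(top))
print(sorted(maintenance_methylation(top, parental_marks)))
```

The CpG at index 5 is unmethylated on the parental strand and stays unmethylated on the new strand, reflecting the point that maintenance methylation copies an existing pattern rather than creating new marks.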

In vertebrates, DNA methylation is found primarily on transcriptionally silent regions of the genome, such as the inactive X chromosome (discussed below). X inactivation is the transcriptional inactivation of one of the 2 X chromosomes in female somatic cells; it occurs early in the development of a female embryo, when it consists of a few thousand cells. One of these 2 chromosomes becomes highly condensed. X inactivation is initiated at and spreads from a single site in the middle of the X chromosome called the X-inactivation center (XIC), a large DNA sequence that codes for an RNA molecule, XIST RNA, which is expressed solely from the inactive X chromosome. The XIST RNA is not translated into protein but rather stays in the nucleus, where it participates in the formation and spread of heterochromatin. Inactivation of one of the female's X chromosomes is absolutely necessary to ensure an equal dosage of X chromosome gene products between males and females. Mutations that interfere with such dosage are lethal.

Genomic imprinting involves the methylation of genes, which usually silences nearby gene expression. The imprinting is epigenetic, which means that it is heritable without a change in the DNA sequence. Imprinted genes are usually found organized as clusters. The clusters often include genes for noncoding RNAs, the expression of which often correlates with repression of nearby protein-coding genes. One studied example of coding and noncoding gene partners is Igf2r/Air. In one model, an Air transcript recruits a repressor complex to chromatin in a way similar to X inactivation above. Specific sequences around the imprinted genes in the cluster are recognized by the Air-containing ribonucleoprotein complexes. Air RNA coats this region in a manner similar to X inactivation.

Demethylation of DNA

In contrast to what is known about how genes are methylated, little is known about how genes are demethylated, either during development or during somatic differentiation. An “active” model of tissue-specific gene demethylation has been proposed in which demethylases catalyze localized demethylation of those genetic loci that have been targeted for activation by differentiative signals. An alternative “passive” model of CpG demethylation proposes that lineage-specific DNA binding factors synthesized in differentiating cells interfere with maintenance methylation by inhibiting access of Dnmt1 to unmodified CpG on newly replicated DNA strands.

A large number of covalent modifications of histones have been documented, including acetylation, phosphorylation, methylation, ubiquitination, and ADP ribosylation, that take place on the amino terminus “tail” domains of histones. Such diversity in the types of modifications and the remarkable specificity for residues undergoing these modifications suggest a complex hierarchy of order and combinatorial function that remains unclear. Of the covalent modifications known to take place on histone amino-termini, acetylation is perhaps the best studied and appreciated. Recent studies have identified previously characterized coactivators and corepressors that acetylate or deacetylate, respectively, specific lysine residues in histones in response to their recruitment to target promoters in chromatin. These studies provide compelling evidence that chromatin remodeling plays a fundamental role in the regulation of transcription from nucleosomal templates.

Epigenetics is the study of heritable changes in gene expression or cellular phenotype which do not involve changes in the underlying DNA sequence itself. Epigenetic modifications do not change the DNA sequence but alter the transcriptional activity of genes, changing the repertoire of genes expressed by the cell. The two major epigenetic mechanisms are DNA methylation at CpG islands in the promoter and histone acetylation (see below). Methylation at CpG islands silences gene transcription in most instances, but, rarely, it results in activation. Similarly, deacetylation of histones is thought to result in transcriptional silencing because of the condensation of chromatin. Recent evidence suggests that the two processes are related. Methylated DNA appears to preferentially associate with histone deacetylase protein complexes and histone methylases. Histone methylation has been shown to result in DNA methylation.

The chromatin in the nucleus of eukaryotic cells is regulated to permit or exclude access of the enzymatic machinery for processes such as transcription and recombination. Specific regulatory sequences in the DNA are ultimately responsible for this regulation, serving as binding sites for proteins or protein complexes that recruit specific chromatin-modifying activities. In the case of transcriptional regulation, specific DNA-binding activators or repressors recruit histone-modifying enzymes and nucleosome remodeling complexes, generating localized modifications of the chromatin that govern the access of the transcription machinery. In addition to such localized chromatin modifications, there are developmentally regulated large-scale reorganizations of chromatin structure into active and inactive domains.

The amino-terminal (NH2) tails of histones H3 and H4 protrude from nucleosomes and are subject to diverse modifications, including phosphorylation, methylation and acetylation. For instance, methylation of Lys 9 of histone H3 (K9/H3) is involved in the formation of stable repressive heterochromatin, whereas methylation of K4/H3 is associated with transcriptional activity. These covalent modifications may alter the interaction of histone tails with DNA or serve as docking sites for chromatin-associated proteins. Such modifications can loosen or further condense chromatin, or they can create recognition (binding) sites for other proteins that regulate gene expression. In this last process, deposition of a given modification on the histone tail is thought to specify a code that dictates the regulatory features of a gene (the “histone code” hypothesis).

Core histone acetylation is a reversible post-translational modification, and transcriptional activators and repressors recruit histone acetyltransferases (HATs) and histone deacetylases (HDACs) to gene promoters and enhancers. Chromatin remodeling and histone acetylation at regulatory regions of IL4 and IFNG are also associated with T cell differentiation. Mechanisms and examples of such chromatin alteration are as follows:

Chromatin remodeling complexes, which are protein machines that use the energy of ATP hydrolysis, can also change the structure of nucleosomes temporarily so that DNA becomes less tightly bound to the histone core.

RNA Capping:

In eukaryotes, transcription is only the first step leading up to a mature mRNA which can be transported out of the nucleus to the cytoplasm of the cell for translation. Intron (noncoding) sequences must also be removed from the primary transcript and covalent modifications must be made at the ends of the mRNA. However, these events are coupled and can occur simultaneously during transcription. For example, an RNA cap is added and splicing of introns typically begins before transcription has been completed. This coupling is achieved through the RNA polymerase tail where proteins on the tail jump onto the nascent RNA molecule to begin processing it as soon as it emerges from the polymerase.

Eukaryotic mRNA is modified in the nucleus by the addition of a methylated guanine nucleotide to the 5′ end of the transcript, called the 5′ cap, and a long chain of adenine residues to the 3′ end of the transcript, called the 3′ poly-A tail. When the transcript reaches about 20 nucleotides, it is modified by the addition of GTP to the 5′ phosphate group. The G in the GTP is also modified by the addition of a methyl group, so it is often called a methyl-G cap. While the polymerase is still elongating, the transcript is cleaved a short distance downstream of a specific signal sequence (AAUAAA), and a series of 100-200 adenosine residues, the 3′ poly-A tail, is added to the mRNA. The enzyme responsible for this is poly-A polymerase.

RNA capping at the 5′ end is the first modification of eukaryotic pre-mRNAs. This cap is an unusual 5′ to 5′ linkage of 7-methylguanosine to the mRNA. The 5′ methyl cap helps to distinguish mRNAs from other RNA molecules present in the cell. For example, RNA polymerases I and III produce uncapped RNAs during transcription. The cap is also used to help the RNA to be properly processed and exported and will have an important role in the translation of mRNAs in the cytosol.

In humans, only 1-1.5% of the genome is devoted to the exons that encode proteins; 24% is devoted to the noncoding introns within which these exons are embedded. The primary transcript is cut and put back together to produce the mature mRNA, in a process known as pre-mRNA splicing. It occurs in the nucleus prior to the export of the mRNA to the cytoplasm.

Splicing:

The intron-exon junctions are recognized by small nuclear ribonucleoprotein particles (snRNPs, or “snurps”). The snRNPs are complexes composed of snRNA and protein. These snRNPs cluster together with other associated proteins to form a larger complex called the spliceosome, which is responsible for the splicing, or removal, of the introns. Nearly all introns begin with the same 2 base sequence (GU in the RNA) and end with another 2 base sequence (AG) that tags them for removal. No rules govern the number of introns per gene or the sizes of introns and exons. Some genes have no introns; others have as many as 50. The sizes of exons range from a few nucleotides to 7500 nt and the sizes of introns are equally variable. Also, if every gene were spliced to include all exons, then the number of genes, transcripts and proteins would be equal. But this is not the case. A single primary transcript can be spliced into different mRNAs by using different sets of exons in a process called alternative splicing. It is estimated that up to 95% of human genes produce multiple splice products.

RNA splicing involves the joining of two exons while removing an intron as a “lariat.” RNA splicing is performed mostly by snRNAs rather than proteins. The snRNAs, packaged with proteins into snRNPs, form the core of the spliceosome, a large assembly of RNA and protein molecules that performs pre-mRNA splicing. The process starts with a specific adenine nucleotide at a “branch point” in the intron sequence, which attacks the 5′ splice site, cutting the sugar phosphate backbone of the RNA. The 5′ end of the intron becomes covalently linked to the A residue. The released 3′ OH end of the exon then reacts with the start of the next exon sequence, joining the 2 exons together and releasing the intron sequence in the shape of a lariat.
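The net result of splicing (exons joined, intron removed) can be sketched as a simple string operation. This is only a toy illustration: the function name, the coordinates and the example sequence are hypothetical, the GU...AG boundary check reflects the common intron boundary rule rather than anything the spliceosome computes, and the lariat chemistry is not modeled.

```python
def splice(pre_mrna, introns):
    """Remove introns from a pre-mRNA string and join the flanking exons.

    introns: list of (start, end) index pairs, end exclusive.
    Each intron is required to begin with GU and end with AG.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[cursor:start])  # keep the exon before this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # keep the final exon
    return "".join(exons)


# Hypothetical two-exon transcript with one intron.
exon1, intron, exon2 = "AUGGCC", "GUAAGUACUAACAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mature)
```

Passing intron coordinates explicitly keeps the sketch honest: real splice-site choice depends on many signals beyond the two boundary dinucleotides.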

Generating the 3′ end of a eukaryotic mRNA is much more complicated than in bacteria, where RNA polymerase simply stops at a termination signal and releases both the 3′ end of its transcript and the DNA template. In eukaryotes, there is a consensus sequence in the DNA which directs cleavage of the transcribed mRNA as well as polyadenylation. Two proteins called CstF (cleavage stimulation factor) and CPSF (cleavage and polyadenylation specificity factor) are crucial for this process. First the RNA is cleaved, and then another enzyme called poly-A polymerase adds about 200 A nucleotides to the 3′ end produced by the cleavage. Unlike the usual RNA polymerase, poly-A polymerase does not require a template.
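The cleave-then-polyadenylate order described above can be sketched in a few lines. This is a toy model under stated assumptions: the function name and the fixed 20 nt offset between the AAUAAA signal and the cut are illustrative choices (in cells the cut falls a variable short distance downstream of the signal), and the example transcript is made up.

```python
def cleave_and_polyadenylate(pre_mrna, signal="AAUAAA", offset=20, tail_length=200):
    """Cut a transcript downstream of the poly-A signal, then add a poly-A tail.

    Returns None when the transcript carries no signal.
    offset is an assumed, fixed distance between the signal and the cut.
    """
    pos = pre_mrna.find(signal)
    if pos == -1:
        return None
    cut = min(pos + len(signal) + offset, len(pre_mrna))
    cleaved = pre_mrna[:cut]            # step 1: cleavage
    return cleaved + "A" * tail_length  # step 2: template-independent A addition


# Hypothetical transcript: coding region, signal, then downstream sequence.
transcript = "GGGCCC" + "AAUAAA" + "C" * 30
processed = cleave_and_polyadenylate(transcript, tail_length=5)
print(processed)
```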

Protein binding to a pre-mRNA is necessary not only for processing of the mRNA, but also for transportation of the transcript out of the nucleus to the cytoplasm where translation will occur. Only if the proper set of proteins is bound to an mRNA will it be guided through the nuclear pore complex into the cytosol. Nuclear pore complexes are aqueous channels in the nuclear membrane that allow small molecules to diffuse. But macromolecules like mRNA need special processes to move them. Signals on the macromolecule determine whether it is exported, and many of the proteins necessary for splicing are of key importance.

Regulated alternative splicing of pre-mRNA is a critical mechanism by which functionally different proteins are generated from the same gene. Splice site selection and specificity are influenced by 5′ and 3′ splice sites located at the exon-intron boundaries of pre-mRNAs and by exonic splicing enhancer (ESE) and suppressor (ESS) elements. In general, the binding of serine/arginine rich proteins (SR proteins) to the ESEs enhances splicing, and the binding to the ESSs by members of the heterogeneous nuclear ribonucleoprotein (hnRNP) family results in a suppression of splicing. SR proteins stimulate the selection of intron-proximal 5′ splice sites in pre-mRNAs that contain 2 or more alternative 5′ splice sites, while hnRNPs have the opposite effect, promoting the selection of intron-distal 5′ splice sites.

See also transcription

Comparative genome analyses indicate that increases in gene number do not account for increases in morphological and behavioral complexity. For example, the nematode worm has about 20,000 genes, but lacks the full range of cell types and tissues seen in the fruitfly, which contains fewer than 14,000 genes. How has evolutionary diversity arisen? Evidence suggests that organismal complexity arises not from gene number but rather from progressively more elaborate regulation of gene expression.

There are many ways in which a relatively small number of genes can be exploited so as to generate more complex organisms over evolutionary time. Two mechanisms are alternative splicing (the production of different RNA species from a given gene during mRNA splicing) and DNA rearrangement, where genes themselves are rearranged during cellular differentiation, as used to generate diversity in mammalian immune systems. Another way is greater elaboration of regulatory DNA sequences, which control the expression of nearby genes, and increased complexity in the multiprotein transcription complexes that regulate gene expression.

Transcription is initiated at gene promoters, but many classes of transcriptional regulators, including DNA-binding transcription factors, coactivators, corepressors and proteins that alter epigenetic modifications of DNA and nucleosomal histones, combine to influence the function of minimal promoters. These transcriptional regulators act through a variety of DNA regulatory elements including enhancers, silencers and insulators, which are often located far from the gene promoters. DNA-based regulatory modules may be located in the introns of the regulated gene, a few hundred base pairs to hundreds of kilobases 5′ or 3′ of the gene, or even in the introns of a neighboring irrelevant gene.

Genome sequence comparisons are used to detect noncoding genomic regions that have been evolutionarily conserved, presumably to maintain some critical biological function. Such regions, called conserved noncoding sequences (CNSs), often correspond to dispersed transcriptional regulatory elements. CNSs may also correspond to loci for noncoding RNAs or provide signals necessary for regulated mRNA splicing. Knowledge of the genomic location of CNS regions facilitates analysis of their biological functions. The regions can be cloned and tested in cell-based reporter assays, or deleted or mutated in their natural genomic context or in the context of bacterial or yeast artificial chromosome transgenes. Different cell types expressing the same gene choose different subsets of the CNS regions as DNase I hypersensitive sites, implying that the evolutionary conservation of these regions derives not from their simultaneous participation in gene expression in all expressing cell types but rather from their specialized roles as protein-binding regulatory elements in individual cell types.

Typically each DNA regulatory element binds a different subset of transcriptional regulators and thus is independently controlled, imparting modularity to gene regulation.

Control of Transcription:

Gene DNAs and Regulatory Proteins:

Alterations in Chromatin Structure as a transcriptional control

Transcription Attenuation: In this type of transcriptional control RNA synthesis occurs but for some reason transcription is terminated prematurely. This could happen, for example, where the nascent RNA chain adopts a structure that causes it to interact with RNA polymerase so as to abort transcription. HIV uses a form of transcription attenuation in its life cycle.

Alternative RNA Splicing: is used by many organisms to make different polypeptide chains from the same gene. The regulation of RNA splicing can generate different versions of a protein in different cell types. Recent estimates from expressed sequence tag (EST) studies indicate that 40-60% of human genes are alternatively spliced, and in many cases alternative isoforms result in proteins of distinct function. For example, isoforms generated by alternative splicing may show change or loss of specific function(s) or localization of the respective product, or even a gain of a novel unexpected function (Sykorova, Biol. Cell, 2009, p. 381, 2nd column, lines 1-6).

Biologically relevant isoform differences range from subtle, such as a few nucleotides at an alternative 5′ or 3′ splice site, to skipping several consecutive exons. Variant isoforms can be specific to tissue types or developmental stages and are involved in a large number of normal cellular functions. Defects in splicing also account for a substantial fraction of human genetic disease.
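Exon skipping, one of the isoform differences mentioned above, can be illustrated by enumerating which exon combinations a transcript could yield. This is a minimal sketch with hypothetical exon names; it assumes exon order is always preserved and that each optional ("cassette") exon is included or skipped independently, which real splicing regulation does not guarantee.

```python
from itertools import combinations


def exon_skipping_isoforms(exons, cassette):
    """List every isoform obtainable by skipping any subset of cassette exons.

    exons: ordered list of exon names in the primary transcript.
    cassette: set of exon names that may be skipped; all others are constitutive.
    """
    optional = [e for e in exons if e in cassette]
    isoforms = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            isoforms.append([e for e in exons if e not in skipped])
    return isoforms


# Hypothetical four-exon gene in which E2 and E3 are optional.
for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"}):
    print("-".join(isoform))
```

With n independent cassette exons this gives 2^n isoforms from one gene, which is the arithmetic behind the point that transcript number can far exceed gene number.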

A human example of tissue-specific alternative splicing involves proteins produced by the thyroid gland and the hypothalamus. These two organs produce two distinct hormones, calcitonin and CGRP (calcitonin-gene-related peptide), from one gene. Calcitonin controls calcium uptake and the balance of calcium in tissues such as bones and teeth. CGRP is involved in a number of neural and endocrine functions. Despite their different physiological roles, they are produced from the same primary transcript. Whether calcitonin or CGRP is produced depends on different splicing factors in the thyroid and the hypothalamus.

RNA 3′ End Cleavage: In eukaryotes, the 3′ end of a mRNA results from the termination of RNA synthesis by the RNA polymerase and then a cleavage reaction. Cells can regulate where this cleavage event occurs so as to change, for example, the C terminus of the resultant protein. Early in the development of a B lymphocyte, for example, the antibody it produces is anchored in the plasma membrane, where it serves as a receptor for antigen. Antigen stimulation causes B lymphocytes to multiply and to begin secreting their antibody, which is identical to the membrane-bound form except that it ends in a shorter string of hydrophilic amino acids instead of the long string of hydrophobic amino acids found on the membrane-bound form. This change is generated through a change in the site of cleavage at the 3′ end. In unstimulated B lymphocytes, the first cleavage/poly-A addition site encountered by an RNA polymerase is suboptimal and usually skipped, leading to production of the longer transcript. When antigen stimulation causes an increase in CstF concentration, cleavage occurs at the suboptimal site, which results in a shorter transcript but one that includes some intron sequence encoding the hydrophilic portion (since the early cleavage occurs upstream of the 3′ splice site necessary for intron removal).

RNA Editing: alters the nucleotide sequences of mRNA transcripts once they are transcribed. In mammals, for example, there can be enzymatic deamination of adenosine to produce inosine, which can change the splicing pattern in the RNA or even change the meaning of codons, since inosine can base pair with cytosine and is therefore read like guanosine. Editing is carried out by protein enzymes.
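How a single A-to-I edit can change the meaning of a codon is easy to show with a lookup. The sketch below is illustrative only: the function names are hypothetical, the codon table is a deliberately tiny excerpt of the standard genetic code, and inosine (written I) is simply treated as G when the codon is read, reflecting its pairing with cytosine.

```python
# Small excerpt of the standard genetic code (RNA codons), enough for the examples.
CODON_TABLE = {"CAG": "Gln", "CGG": "Arg", "AUA": "Ile", "AUG": "Met"}


def edit_a_to_i(codon, position):
    """Deaminate the adenosine at the given 0-based position to inosine (I)."""
    if codon[position] != "A":
        raise ValueError("only adenosine can be edited to inosine")
    return codon[:position] + "I" + codon[position + 1:]


def read_codon(codon):
    """Translate a codon, reading inosine as guanosine."""
    return CODON_TABLE[codon.replace("I", "G")]


edited = edit_a_to_i("CAG", 1)  # CAG becomes CIG
print(read_codon("CAG"), "->", read_codon(edited))
```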

Post Transcriptional Controls

Translational Repressors: Bacterial mRNAs have a Shine-Dalgarno sequence upstream of the initiating AUG that base pairs with the 16S rRNA in the small ribosomal subunit, correctly positioning the initiating AUG codon in the ribosome. Many bacterial mRNAs have specific translational repressor proteins that can bind in this vicinity and inhibit translation of only that species of mRNA. Eukaryotic mRNAs do not contain a Shine-Dalgarno sequence; instead, the small ribosomal subunit binds at the 5′ cap of the mRNA and scans for an initiating AUG codon. Some repressor proteins bind at the 5′ cap and inhibit translation initiation.

Phosphorylation of Initiation Factor: The initiation factor eIF-2 for translation in eukaryotes binds very tightly to GDP, such that another initiation factor protein is required to cause GDP release so that a new GTP molecule can bind and eIF-2 can be reused. The reuse of eIF-2 is inhibited when it is phosphorylated. Regulation of the level of active eIF-2 is important in mammalian cells, allowing them to enter a nonproliferating resting state called G0.
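The bacterial arrangement (a Shine-Dalgarno sequence a short distance upstream of the initiating AUG) can be sketched as a sequence search. Several things here are assumptions not stated in the text: the consensus AGGAGG, the 4 to 10 nt spacing window, the function name and the example mRNA. Real ribosome binding sites often match the consensus only partially.

```python
import re


def find_sd_start_codons(mrna, sd="AGGAGG", min_gap=4, max_gap=10):
    """Return (sd_start, aug_start) pairs where an AUG has a Shine-Dalgarno
    sequence ending min_gap..max_gap nucleotides upstream of it."""
    hits = []
    for match in re.finditer("(?=AUG)", mrna):  # every AUG, overlaps allowed
        aug = match.start()
        for gap in range(min_gap, max_gap + 1):
            sd_start = aug - gap - len(sd)
            if sd_start >= 0 and mrna[sd_start:sd_start + len(sd)] == sd:
                hits.append((sd_start, aug))
                break
    return hits


# Hypothetical leader: SD sequence, 5 nt spacer, then the start codon.
mrna = "CC" + "AGGAGG" + "UAACC" + "AUGGCU"
print(find_sd_start_codons(mrna))
```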

mRNA degradation: Most mRNAs in bacteria are very unstable, having a half life of about 3 minutes. Many mRNAs in eucaryotic cells that code for regulatory proteins also have half lives of 30 minutes or less. Exonucleases are responsible for the rapid destruction of bacterial mRNAs. In eucaryotes, the poly A tails are gradually shortened by an exonuclease once the mRNAs enter the cytoplasm. Once a critical threshold is reached, the 5′ cap is removed and the RNA degraded.
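The practical meaning of these half lives can be made concrete with the standard exponential decay relation, fraction remaining = (1/2)^(t / half life). The sketch below assumes simple first-order decay with no new synthesis; the function name is illustrative, and the two half lives are the ones quoted above.

```python
def fraction_remaining(minutes, half_life):
    """Fraction of an mRNA population left after a given time,
    assuming first-order decay and no new transcription."""
    return 0.5 ** (minutes / half_life)


# Compare a bacterial mRNA (3 min half life) with a eukaryotic
# regulatory mRNA (30 min half life) over half an hour.
for label, half_life in [("bacterial", 3), ("eukaryotic regulatory", 30)]:
    print(label, round(fraction_remaining(30, half_life), 4))
```

After 30 minutes a message with a 3 minute half life has gone through ten halvings, so under 0.1% remains, while half of the 30 minute message is still present.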

Some mRNAs also have sequences in their 3′ untranslated region (UTR) that serve as recognition sequences for endonucleases, which cleave off the poly A tail in one step. For example, many contain AU-rich elements (AREs) in their 3′ UTR. These AREs characteristically comprise clusters of the motif AUUUA and confer instability. Activity in the p38 MAPK pathway can counter this destabilizing effect and so stabilizes the mRNA in question.

The sequence TTATTTAT is detected in the 3′ UTR of many cytokine genes: TNFalpha, TNFbeta, IL-1alpha, IL-1beta, M-CSF, IFNalpha, IFNbeta, c-fos. This element is important for enhanced turnover of cytokine mRNAs.

See Immunol. 2005 Jan 15;174(2):953-61; Arthritis Res Ther. 2004;6(6):248-64; and Science. 1998 Aug 14;281(5379):1001-5.
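Counting AUUUA motifs in a 3′ UTR is a simple first screen for AU-rich elements. The sketch below is a rough illustration, not a validated predictor: the function name and the example sequences are hypothetical, DNA input is converted to RNA by replacing T with U, and overlapping motifs are counted separately.

```python
import re


def count_are_motifs(utr, motif="AUUUA"):
    """Count (possibly overlapping) occurrences of an ARE motif in a 3' UTR.

    Accepts RNA or DNA letters; T is read as U.
    """
    rna = utr.upper().replace("T", "U")
    return len(re.findall(f"(?={motif})", rna))


print(count_are_motifs("AUUUAUUUAUUUA"))  # clustered, overlapping pentamers
print(count_are_motifs("TTATTTAT"))       # the DNA element quoted above
```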

RNA interference (RNAi): see outline

See also RNA capping and splicing

Transcription in eukaryotic cells occurs in the nucleus followed by translation in the cytoplasm.

Transcription is the transfer of information from a double stranded DNA molecule to a single RNA molecule. Like DNA, RNA is a linear polymer made up of four types of nucleotides linked together by phosphodiester bonds. It differs from DNA in that (1) its sugar units are riboses (OH attached to the 2’C) rather than deoxyriboses, (2) uracil is substituted for the base thymine (uracil lacks the methyl group which thymine has), (3) it is single stranded except in some viruses and (4) it does not form a double helix although it can fold back onto itself to form double helical regions.

Most of the genes which are transcribed into RNA specify the amino acid sequence of proteins. The RNA which is transcribed from such genes is referred to as messenger RNA (mRNA). However, the final product after transcription of many genes is not protein but rather the RNA itself. Like proteins, these RNAs serve as enzymatic and structural components for a wide variety of processes in the cell. Three major classes of such RNAs are small nuclear RNAs (snRNAs), which direct the splicing of pre-mRNA to form mRNA; ribosomal RNAs (rRNAs), which form the core of ribosomes; and transfer RNA (tRNA) molecules, which serve in translation of the mRNA. Transfer RNA molecules have amino acids covalently attached to one end and an anticodon that can base pair with an mRNA codon at the other end.

Because DNA is double-stranded and RNA is single-stranded, only one of the two DNA strands needs to be copied. The strand that is copied is called the template strand. The RNA transcript’s sequence is complementary to the template strand. The strand of DNA not used as a template is called the coding or sense strand because it has the same “sense” as the RNA. It has the same sequence as the RNA transcript, except that U is used instead of T. The template strand is referred to as the anti-sense strand.
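The relationship between the two DNA strands and the transcript can be checked with a short sketch. The function names and the example sequence are hypothetical; the template strand is written 3′ to 5′ here so that base-by-base complementation yields the RNA in the conventional 5′ to 3′ orientation.

```python
# DNA template base -> RNA base paired with it during transcription.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_from_template(template_3_to_5):
    """Build the RNA (5'->3') complementary to a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5)


def transcribe_from_coding(coding_5_to_3):
    """The RNA matches the coding (sense) strand, with U in place of T."""
    return coding_5_to_3.replace("T", "U")


coding = "ATGGCT"    # 5'->3' sense strand
template = "TACCGA"  # 3'->5' antisense strand, base paired with the coding strand
print(transcribe_from_template(template), transcribe_from_coding(coding))
```

Both routes give the same transcript, which is why the coding strand is the one usually quoted when a gene sequence is written out.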

Another type of RNA is the SRP RNA. In eukaryotes, where some proteins are synthesized by ribosomes on the rough ER, this process is mediated by the signal recognition particle, or SRP, which contains both RNA and proteins.

Yet another type of RNA is the small RNAs, which include both micro RNA (miRNA) and small interfering RNA (siRNA) and are involved in the control of gene expression.

Transcription starts with the opening and unwinding of a small portion of the DNA double helix to expose the bases on each DNA strand. The RNA chain is synthesized in the 5′ to 3′ direction (the template strand is read 3′ to 5′), so the RNA produced by transcription starts from its 5′ end. In contrast to DNA replication, the RNA single strand product does not remain hydrogen bonded to the DNA template but is rather displaced just behind the region where the ribonucleotides are being added. This release means that many RNA copies can be made from the same gene.

The enzymes which transcribe DNA are called RNA polymerases. They differ from DNA polymerases in that 1) they catalyze the linkage of ribonucleotides, not deoxyribonucleotides, 2) they can start an RNA chain without a primer and 3) they lack the high fidelity or accuracy of DNA polymerases (there is an error rate of about 1 in 10^4 nucleotides copied into RNA, in comparison with about 1 in 10^7 for DNA polymerase).
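What these error rates mean for a single copy can be worked out directly. The sketch below uses the two rates quoted above and assumes errors are independent at each position; the function names and the 5000 nt example length are illustrative.

```python
def expected_errors(length, error_rate):
    """Average number of misincorporated nucleotides in one copy."""
    return length * error_rate


def prob_error_free(length, error_rate):
    """Probability that a copy contains no errors, assuming independent positions."""
    return (1 - error_rate) ** length


length = 5000  # hypothetical gene length in nucleotides
for name, rate in [("RNA polymerase", 1e-4), ("DNA polymerase", 1e-7)]:
    print(name, expected_errors(length, rate), round(prob_error_free(length, rate), 4))
```

At 1 in 10^4, roughly four in ten transcripts of this length would carry at least one error, which the cell tolerates because each RNA copy is short-lived and one of many.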

Eukaryotes have 3 types of RNA polymerases which transcribe different types of genes. RNA polymerase I transcribes the 5.8S, 18S and 28S rRNA genes. RNA polymerase II transcribes all the protein coding genes (mRNA) along with some small nucleolar RNA (snoRNA) and small nuclear RNA (snRNA) genes. RNA polymerase III transcribes tRNA genes, 5S rRNA genes and some other snRNA genes.

RNA polymerase II requires the help of a large set of proteins called general transcription factors, which must assemble at the promoter with the polymerase before transcription can begin. The proteins are called “general” because they must assemble on all promoters used by RNA polymerase II. The assembly starts at a short double helical DNA sequence composed mainly of T and A nucleotides, located about 25 nucleotides upstream from the transcription start site and referred to as the TATA box. The TATA box is bound by TBP (TATA box binding protein), a subunit of one of the general transcription factors, TFIID (transcription factor D for polymerase II). After RNA polymerase II has been guided onto the promoter DNA to form a transcription initiation complex, it gains access to the template strand through another general transcription factor, TFIIH, which contains a DNA helicase. Some eukaryotic promoters use a GC box rather than a TATA box; the GC box serves as a binding site for the transcription factor SP1.

Transcription in the eukaryotic cell is very complex and requires more proteins than it does on purified DNA. In addition to the general factors, regulatory proteins called transcriptional activators bind to specific DNA sequences called enhancers, sometimes several thousand nucleotide pairs away from the transcription start site, to help RNA polymerase and the general factors assemble at the promoter. Enhancers can be located anywhere with respect to the gene (5′ of the promoter, 3′ of the gene or even in an intron of the gene). In addition, activators bound at enhancers attract ATP-dependent chromatin remodeling complexes and histone acetylases, which allow greater accessibility to the DNA present in chromatin and thereby facilitate the assembly of the transcription initiation machinery on DNA.

These proteins bind to specific nucleotide sequences within promoters and enhancers and act either to enhance or suppress their activity. Such proteins have been identified by a variety of techniques like DNA footprinting.

Once transcription has started, RNA polymerase also requires elongation factors, which are proteins that help polymerases move along the DNA template. DNA supercoiling must also be dealt with (1 large DNA supercoil will form to compensate for each 10 nucleotide pairs that are unwound), and DNA topoisomerase enzymes are necessary to remove this tension.

Ribosomal RNA genes exist in multiple copies in order to produce the necessary quantities of rRNA. This is important because, unlike mRNAs, rRNAs are not translated. The final product from transcription in the case of rRNA is the rRNA itself, and thus amplification cannot be achieved through multiple rounds of translation. There are 4 types of eukaryotic rRNA. Three of the 4 types (18S, 5.8S, and 28S), which are transcribed by RNA polymerase I, are made by chemically modifying and cleaving a single large precursor rRNA, and the 4th (5S) is transcribed by a different polymerase, RNA polymerase III. The specific positions at which chemical modifications are made, as well as the cleavage of the precursor rRNA into the mature rRNAs, are guided by a large class of RNAs called small nucleolar RNAs (snoRNAs), so named because they perform their functions in a subcompartment of the nucleus called the nucleolus, which is a large aggregate of macromolecules including the rRNA genes, snoRNPs and other proteins.

Prokaryotic Transcription:

Unlike eukaryotes, bacteria have a single RNA polymerase. Accurate initiation of transcription requires two sites in DNA; one called a promoter that forms a recognition and binding site for the RNA polymerase and then the actual start site. The polymerase also needs a termination site.

The first base transcribed is referred to as +1. Bases upstream of the start site receive negative numbers starting at -1. The promoter is a short sequence found upstream of the start site; it is not transcribed by the polymerase. Two 6 base sequences are common to bacterial promoters; one is located 35 nt upstream and the other is located 10 nt upstream of the start site. The binding of RNA polymerase to the promoter is the first step in transcription. Promoter binding is controlled by the sigma subunit of the RNA polymerase holoenzyme, which recognizes the -35 sequence in the promoter and positions the RNA polymerase at the correct start site, oriented to transcribe in the correct direction. Once bound to the promoter, the RNA polymerase begins to unwind the DNA helix at the -10 site. The polymerase covers a region of about 75 bp but unwinds only about 12-14 bp.
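The two-box layout of a bacterial promoter lends itself to a simple sequence scan. The following is a sketch under stated assumptions: the consensus hexamers TTGACA (-35) and TATAAT (-10) and the 15 to 19 bp spacer window are widely quoted values that the text above does not give, and the function names and test sequence are hypothetical. Real promoters usually deviate from consensus, hence the mismatch allowance.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(dna, box35="TTGACA", box10="TATAAT",
                   spacers=range(15, 20), max_mismatch=1):
    """Return (pos35, pos10) index pairs for candidate promoters on one strand."""
    hits = []
    n35, n10 = len(box35), len(box10)
    for i in range(len(dna) - n35 + 1):
        if hamming(dna[i:i + n35], box35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + n35 + spacer
            if j + n10 > len(dna):
                break  # the -10 box would run off the end of the sequence
            if hamming(dna[j:j + n10], box10) <= max_mismatch:
                hits.append((i, j))
    return hits


# Hypothetical sequence: -35 box, 17 bp spacer, -10 box, then downstream DNA.
dna = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGGGGA"
print(find_promoters(dna, max_mismatch=0))
```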

The region containing the RNA polymerase, the DNA template and the growing RNA transcript is called the transcription bubble because it contains a locally unwound “bubble” of DNA.

The end of a bacterial transcription unit is marked by terminator sequences that signal “stop” to the polymerase. The simplest terminators consist of a series of G-C base pairs followed by a series of A-T base pairs. The RNA transcript of this stop region can form a double stranded structure in the GC region called a hairpin, which is followed by four or more uracil ribonucleotides. Formation of the hairpin causes the RNA polymerase to pause, placing it directly over the run of four uracils. The pairing of U with the DNA’s A is the weakest of the four hybrid base pairs, and it is not strong enough to hold the hybrid strands together when the polymerase pauses.

In prokaryotes, the mRNA produced by transcription begins to be translated before transcription is finished. Thus, transcription and translation are coupled. As soon as the 5′ end of the mRNA becomes available, ribosomes are loaded onto it to begin translation. This is not the case with eukaryotes, where transcription occurs in the nucleus and is followed by translation in the cytoplasm.
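The terminator signature described above (a GC-rich hairpin immediately followed by a run of U) can be searched for in a transcript. This is a crude sketch, not a terminator prediction tool: it requires perfect stem complementarity, and the stem lengths, loop lengths, GC threshold, function names and example RNA are all assumptions chosen for illustration.

```python
RNA_COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna):
    """Return the sequence that would base pair with rna, written 5'->3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def longest_hairpin_ending_at(rna, end, stem_lengths, loop_lengths, min_gc):
    """Start index of the longest perfect, GC-rich hairpin ending at `end`, or None."""
    for stem in sorted(stem_lengths, reverse=True):
        right = rna[end - stem:end]
        gc = (right.count("G") + right.count("C")) / stem
        if len(right) < stem or gc < min_gc:
            continue
        for loop in loop_lengths:
            start = end - 2 * stem - loop
            if start >= 0 and rna[start:start + stem] == reverse_complement(right):
                return start
    return None


def find_intrinsic_terminators(rna, stem_lengths=range(4, 9), loop_lengths=range(3, 9),
                               min_u_run=4, min_gc=0.6):
    """Return (hairpin_start, u_run_start) pairs for hairpins directly followed by U's."""
    hits = []
    for u_start in range(len(rna) - min_u_run + 1):
        starts_run = rna[u_start:u_start + min_u_run] == "U" * min_u_run
        if not starts_run or (u_start > 0 and rna[u_start - 1] == "U"):
            continue  # only consider the first U of each run
        start = longest_hairpin_ending_at(rna, u_start, stem_lengths, loop_lengths, min_gc)
        if start is not None:
            hits.append((start, u_start))
    return hits


# Hypothetical transcript end: 6 bp GC stem, 4 nt loop, then six U's.
rna = "AA" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUU" + "AA"
print(find_intrinsic_terminators(rna))
```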

Amino Acid Table

Amino Acid      3-letter code   1-letter code   Side chain (R)
Alanine         Ala             A               -CH3
Arginine        Arg             R               -(CH2)3NHC(NH2)NH2+
Asparagine      Asn             N               -CH2CONH2
Aspartic Acid   Asp             D               -CH2COOH
Cysteine        Cys             C               -CH2SH
Glutamic Acid   Glu             E               -(CH2)2COOH
Glutamine       Gln             Q               -(CH2)2CONH2
Glycine         Gly             G               -H
Histidine       His             H               -CH2-imidazole
Isoleucine      Ile             I               -CH(CH3)CH2CH3
Leucine         Leu             L               -CH2CH(CH3)2
Lysine          Lys             K               -(CH2)4NH3+
Methionine      Met             M               -(CH2)2SCH3
Phenylalanine   Phe             F               -CH2-phenyl
Proline         Pro             P               -(CH2)3- (cyclic; joins the alpha carbon to the backbone N)
Serine          Ser             S               -CH2OH
Threonine       Thr             T               -CH(OH)CH3
Tryptophan      Trp             W               -CH2-indole
Tyrosine        Tyr             Y               -CH2-phenyl-OH
Valine          Val             V               -CH(CH3)2
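The code columns of the table translate directly into a lookup for converting between notations. A minimal sketch follows; the dictionary and function names are hypothetical, and the hyphen-separated input format is an assumed convention.

```python
# 3-letter to 1-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide, sep="-"):
    """Convert a peptide such as 'Met-Ala-Tyr' to one-letter notation."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split(sep))


print(to_one_letter("Met-Ala-Tyr"))
```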

