Mass Fingerprinting using Tryptic Peptides.

When applied to the mixture of peptides derived from a protein spot, mass spectrometry yields a complex series of molecular masses, each of which corresponds to a peptide produced by the trypsin digestion. Since the protein spot in the 2D gel is assumed to have been well isolated, all the measured peptides are taken to derive from a single protein. Because trypsin specifically cleaves peptide bonds on the carboxyl side of lysine and arginine residues, the masses of the resulting peptides are determined by the sequence of the protein from which they were derived. The mass spectrum therefore gives a characteristic profile of peptide masses for a given protein, commonly referred to as the "mass fingerprint" of that protein. Although at this stage there is no way of knowing which peptide gives rise to which peak in the mass spectrum, this data set is very useful when combined with genomic DNA sequence information.

Modern mass spectrometers can go one step further in peptide analysis. The instrument can be tuned to select one particular mass peak (i.e. one chosen peptide), which is then allowed to collide with an inert gas (usually argon). The collisions fragment the peptide into charged polypeptide derivatives, and the differences between the masses of these "daughter ions" (mD, mE etc. in the figure below) can be analysed to derive the amino acid sequence of the chosen peptide. In this way, not only the mass of the peptide but also its amino acid sequence can be found. This peptide mass and sequence information is all that is needed for a comparison with the protein databases to identify the protein. The comparison works best when the protein database is derived from a genomic DNA sequencing project of the same organism.
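The idea of reading a sequence from daughter-ion mass differences can be illustrated with a minimal Python sketch. It assumes an idealised case that real spectra rarely provide: a single, complete series of singly charged fragments with no noise or missing peaks. The function names and the tolerance value are hypothetical choices made for this illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for(delta, tol=0.02):
    """Return the residue(s) whose mass matches a mass difference.

    Leucine and isoleucine have identical masses, so they are
    reported together as "I/L". Unmatched differences give "?".
    """
    hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)
    return "/".join(hits) if hits else "?"


def read_ladder(ion_masses, tol=0.02):
    """Assign a residue to each gap between consecutive fragment masses.

    Assumption: ion_masses is one clean fragment series, so each
    consecutive pair differs by exactly one residue.
    """
    ordered = sorted(ion_masses)
    return [residues_for(b - a, tol) for a, b in zip(ordered, ordered[1:])]


# Hypothetical ladder: an arbitrary starting fragment of 100.0 Da
# extended by G, then A, then L.
ladder = [100.0, 157.02146, 228.05857, 341.14263]
print(read_ladder(ladder))
```

The sketch also shows a limit of the method: residues of equal mass (leucine and isoleucine) cannot be told apart from mass differences alone, and lysine and glutamine differ by only about 0.036 Da, so they are separated only when the mass accuracy is good enough.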

One of the benefits of the rapid expansion in the area of genomics is the availability of hundreds of whole genome sequences. Although the ability to analyse and interpret this data is barely keeping pace with DNA sequencing technology, the number of reliably annotated genomes increases daily. An important part of genome annotation is the identification of the open reading frames of all encoded proteins, which relies heavily on sophisticated (although not infallible) DNA sequence analysis algorithms. Since the predicted protein sequences will be compared with mass spectrometry data, reliable and complete genome annotation is as important to the success of a proteomic project as the quality of the experimentally derived peptide mass data.

The positions of lysine and arginine residues in the amino acid sequence of a predicted protein can be located, and so the theoretical masses of the peptides resulting from an in silico tryptic digestion of this protein can be calculated. This analysis can rapidly be performed for all the predicted proteins in a genome, and the results compared with the peptide mass and peptide sequence data obtained by mass spectrometry from a sample derived from a single spot in the 2D gel. In this way, the measured and predicted mass fingerprints can be matched, and the protein in a given spot in the 2D gel can be identified.
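The in silico digestion and fingerprint matching described above can be sketched in a few lines of Python. This is a simplified illustration rather than the procedure used by any particular search program. It assumes cleavage after every lysine (K) and arginine (R), ignores missed cleavages and chemical modifications, and assumes the measured values have already been converted to neutral monoisotopic masses. The function names, the example sequences and the 0.2 Da tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(measured, theoretical, tol=0.2):
    """Number of measured masses within tol Da of any theoretical mass."""
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in measured)


def best_match(measured, proteome, tol=0.2):
    """Score every predicted protein; return the best name and all scores."""
    scores = {
        name: count_matches(
            measured, [peptide_mass(p) for p in tryptic_digest(seq)], tol
        )
        for name, seq in proteome.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-protein "genome" and a measured fingerprint.
proteome = {"protA": "MKAGRTW", "protB": "GGGKSSSR"}
measured = [277.146, 302.170, 305.138]
print(tryptic_digest("MKAGRTW"))
print(best_match(measured, proteome))
```

Counting matched peaks is the simplest possible score. It treats every peak equally and takes no account of how many peptides a large protein produces by chance, so it should be read only as a demonstration of the matching principle.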