Parag Mallick

Assistant Adjunct Professor of Biochemistry

B.S., Washington University, St. Louis; Ph.D, Chemistry & Biochemistry, UCLA

Phone: (310) 423-7600
Fax: (310) 423-1998

E-mail: mallick@chem.ucla.edu

UCLA Department of Chemistry & Biochemistry
Box 951569 (post)
607 Charles E. Young Drive East (courier)
Los Angeles, CA 90095-1569

Current Research Interests

Molecularly targeted therapy holds great promise as a new paradigm for treatment. The primary research interests of my lab center on developing and applying systems approaches that quantitatively describe organisms' physiologic states, with the goal of enabling personalized, predictive medicine. We are currently developing experimental techniques and computational methods for quantitative proteomic profiling and pattern discovery in order to identify prognostic fingerprints. To validate fingerprints, rapidly classify samples, and better understand the chemical processes underlying proteomic technologies, we are developing computational and experimental tools for the discovery and application of proteotypic peptides. In addition, we are developing tools to integrate genomic information with proteomic information so as to better elucidate the genome and to develop more detailed models of regulation. Our general aim is to apply the complementary computational and experimental methods of systems biology so that experimental results motivate large-scale computational studies, which in turn initiate new experimental explorations. We hope this synergistic combination will provide insight into the relationship between molecular phenomena and organismic phenomena.

Background

Knowledge of the complete gene set of a species provides the opportunity to systematically study the organization and expression of genes and their products. We are now able to interpret DNA and protein sequences in terms of structure, biochemical function, cellular networks and cellular control systems. Organismic states, both healthy and diseased, are hypothesized to arise from the alteration of systems' normal network structures through a combination of genetic modifications and environmental agents. Whole-cell and whole-organism genome-wide analyses of gene expression, protein expression and protein state (e.g. glycosylation, phosphorylation), together with related differential analyses in differentially perturbed systems, have been widely applied to study biological processes and disease states. Consequently, genome- and proteome-based techniques are rapidly permitting the examination of cellular processes and their relationship to physiologic effects in greater detail than previously possible, enabling better characterizations of pathologic states, such as cancer.

Quantitative differential proteomics, defined as the comparison of relative protein changes in different samples, is an important component of the emerging sciences of systems biology and functional genomics. It provides accurate and comprehensive quantitation of the components of differentially perturbed cell systems. The technology is expected to facilitate the detection and identification of diagnostic or prognostic markers, to identify proteins for use as therapeutic targets and to provide insights into biological processes.

Differential genomic and proteomic analyses assume that specific alterations in the molecular composition and/or organization of constituent molecules distinguish cells or organisms in different states. Alterations consequential to a specific state may serve as diagnostic markers; examples include protein markers such as PSA for prostate cancer, progesterone receptors for breast cancer and alpha-fetoprotein for testicular cancers. On the other hand, alterations causal to the induction of a specific state reveal fundamental biological control mechanisms and relationships between cellular and physiologic processes, thereby identifying potential therapeutic targets. Examples include the Ras oncogene, targeted by inhibitors of farnesyltransferase, and the transmembrane tyrosine kinase Her-2/neu, a target of antibody-based therapies.

Discovering Serum Markers Using Glycopeptide Enrichment

Quantitative proteomic analysis of blood serum seeks to detect and identify prognostic and diagnostic markers. Blood serum is readily accessible for diagnostic purposes and contains enormous information about organismic physiologic state: as blood circulates through the body, proteins secreted from cells, shed from cell surfaces and released from dead cells are deposited into the serum. The physiologic state, the genetic background of the individual and pathological changes in organs, such as cancer, can all affect serum protein composition. A key problem with the proteomic analysis of serum and many other body fluids is their highly skewed composition; serum is dominated by a few highly abundant proteins, and albumin alone represents over 50% of total serum protein content. Enrichment methods are therefore typically employed, as the range of abundance of serum proteins is approximately 12 orders of magnitude, whereas the dynamic range of a mass spectrometer is only 4 orders of magnitude.

Our strategy for serum marker discovery is outlined here:   First, serum from annotated patients is enriched for glycopeptides.   Next, LCMS profiles of the mixture are generated.   From these profiles, computational tools are applied to extract and quantify the abundance of likely-peptide features.   Pattern-discovery methods are then applied to the extracted sets of putative peptides to identify peptides discriminating amongst annotation groups.   Discriminating peptides are submitted for MSMS sequencing.  
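The pattern-discovery step above can be illustrated with a minimal sketch. It assumes that peptide features have already been extracted and quantified, and it ranks them by Welch's t-statistic between two annotation groups. This is a generic stand-in for pattern discovery, not the method used in the lab; the function names, feature identifiers and abundance values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if denom == 0:
        # No within-group spread: perfectly separated or identical groups.
        return 0.0 if diff == 0 else float("inf")
    return diff / denom


def rank_features(abundance, groups):
    """Rank features by |t| between the two annotation groups.

    abundance: {feature_id: [abundance in each sample]}
    groups:    annotation label of each sample, in the same order
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch compares exactly two annotation groups")
    scored = []
    for feature_id, values in abundance.items():
        a = [v for v, g in zip(values, groups) if g == labels[0]]
        b = [v for v, g in zip(values, groups) if g == labels[1]]
        scored.append((abs(welch_t(a, b)), feature_id))
    scored.sort(reverse=True)
    return [(feature_id, t) for t, feature_id in scored]


# Hypothetical abundances for two features across six annotated sera.
groups = ["control"] * 3 + ["case"] * 3
abundance = {
    "feature_1": [10.0, 11.0, 9.0, 30.0, 31.0, 29.0],
    "feature_2": [5.0, 6.0, 5.0, 6.0, 5.0, 6.0],
}
ranked = rank_features(abundance, groups)
print(ranked[0][0])  # feature_1 separates the groups best
```

In practice the top-ranked features would be the candidates submitted for MSMS sequencing; a real analysis would also need multiple-testing correction and validation on independent samples.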

Serum Glycopeptide Enrichment

Protein glycosylation has long been recognized as a common post-translational modification prevalent in proteins destined for extracellular environments. These include proteins on the extracellular side of the plasma membrane, secreted proteins and proteins contained in body fluids (e.g., blood serum). These proteins happen to be the most easily accessible for diagnostic and therapeutic purposes. It is therefore not surprising that many clinical biomarkers and therapeutic proteins, such as Her-2/neu, human chorionic gonadotropin, alpha-fetoprotein, PSA and CA125, are glycosylated. It is interesting to note that many of the most abundant (and uninformative) proteins in serum, like albumin, are not glycosylated.

Quantifying and Comparing LCMS Experiments

Proteomic LCMS spectra are an encoding of the composition of a peptide/protein mixture; experimental variability, process artifacts, chemical noise and machine noise obfuscate the determination of a sample's true composition.

We decode the spectra by modeling noise and then by identifying and quantifying peptide-generated spectral features with a charge state-dependent template function.   This approach has allowed us to identify more peptide features with smaller error rates than traditional methods and to generate better estimates of peptides' relative abundance.   For accurate experiment superposition, dynamic programming-based retention time normalization methods were developed.   In addition, we recently developed a set of software benchmarks to aid the development of computational methods for LCMS quantification and comparison.  

These benchmarks allow for developing improved methods for decoding of proteomic experiments.   In addition, we have been applying existing algorithms to study general problems in the biological sciences, like protein degradation.
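To illustrate the idea behind dynamic programming-based retention time normalization, here is a minimal sketch using classic dynamic time warping on two intensity traces. It is a generic textbook algorithm shown under simplifying assumptions (one-dimensional traces, absolute-difference cost), not the lab's implementation; the function name and the example traces are hypothetical.

```python
def dtw_path(x, y):
    """Minimum-cost monotone alignment between two intensity traces.

    Returns a list of (index_in_x, index_in_y) pairs from start to end.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both runs
                                 cost[i - 1][j],      # advance x only
                                 cost[i][j - 1])      # advance y only
    # Trace back from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


# Hypothetical traces: the same peak elutes one scan later in run_b.
run_a = [0, 0, 1, 5, 1, 0, 0, 0]
run_b = [0, 0, 0, 1, 5, 1, 0, 0]
path = dtw_path(run_a, run_b)
print((3, 4) in path)  # True: the peak apex in run_a maps to the apex in run_b
```

The resulting path gives a scan-to-scan mapping that can be used to express the retention times of one run on the time axis of the other before features are compared.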

Rapid Fingerprint Quantification with Proteotypic Peptides

The development of a high-throughput screening technology for the detection and quantification of specific proteins within complex mixtures is an essential extension of the above technologies. We have developed an approach wherein we first identify targeted proteins' proteotypic peptides, which are defined as the peptides that uniquely identify each protein and are consistently observed when a sample mixture is interrogated by a [tandem] mass spectrometer. We then chemically synthesize a panel of isotopically labeled proteotypic peptides, add that panel of peptides to a sample mixture of tryptic peptides and analyze the sample with a MALDI TOF/TOF mass spectrometer.
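The uniqueness half of the proteotypic definition can be sketched computationally. The example below performs an in-silico tryptic digest (cleavage after K or R, except before P, with no missed cleavages) and keeps the peptides that occur in only one protein. Whether a peptide is consistently observed must be determined empirically and is not modeled here. The function names, the minimum length and the toy sequences are hypothetical.

```python
from collections import defaultdict


def tryptic_digest(sequence, min_length=6):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


def unique_peptides(proteins, min_length=6):
    """Map each protein name to the tryptic peptides found in no other protein."""
    owners = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, min_length):
            owners[peptide].add(name)
    unique = defaultdict(list)
    for peptide, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(peptide)
    return dict(unique)


# Toy sequences (not real proteins) sharing the peptide EDPEPTIDEK.
proteins = {
    "protA": "MSTLLEKAGDFVNRSHAREDPEPTIDEK",
    "protB": "MAAQWERSHAREDPEPTIDEKLLGGNPK",
}
candidates = unique_peptides(proteins)
print(sorted(candidates["protA"]))  # ['AGDFVNR', 'MSTLLEK']
```

Peptides surviving this filter are only candidates; the observed-in-practice criterion would be applied afterwards using experimental identifications.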

Use of Proteotypic Peptides with LC-MALDI TOF/TOF MS

Stable isotope-labeled proteotypic peptides are synthesized and combined with a digested sample.   The combined mixture is separated by capillary reverse phase liquid chromatography and the eluting peptides are deposited on a MALDI sample plate.   For detection and quantification the sample is analyzed using a MALDI tandem mass spectrometer.   Acquired MALDI-MS spectra contain two types of signals: one representing the signals of the peptides for which no reference peptide was added, appearing as single peaks, and the other representing the signals for those peptides for which a reference peptide was added, appearing as paired signals with a mass difference that precisely corresponds to the mass difference encoded in the stable isotope tag.   The relative signal intensities of the isotopically heavy and light forms of a signal pair are determined and can be used to calculate the absolute abundance of the peptide derived from the protein sample. Proteins can be primarily identified by array position and mass of each isotopically-labeled peptide pair.   Peptide sequences can be confirmed by MS/MS methods.
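The quantification described above reduces to simple arithmetic once signal pairs are found. The sketch below pairs peaks whose mass difference matches the isotope tag and scales the light/heavy intensity ratio by the known amount of spiked heavy reference. It assumes that the heavy form is the synthetic reference and that both forms respond equally in the instrument; the function names, tolerance, mass shift, peak list and spiked amount are hypothetical.

```python
def find_isotope_pairs(peaks, mass_shift, tolerance=0.01):
    """Find (light, heavy) peak pairs separated by the tag's mass shift.

    peaks: list of (mass, intensity) tuples for singly charged signals.
    """
    peaks = sorted(peaks)
    pairs = []
    for i, light in enumerate(peaks):
        for heavy in peaks[i + 1:]:
            if abs((heavy[0] - light[0]) - mass_shift) <= tolerance:
                pairs.append((light, heavy))
    return pairs


def absolute_amount(light_intensity, heavy_intensity, spiked_amount):
    """Endogenous amount = (light / heavy) * amount of heavy reference added."""
    return light_intensity / heavy_intensity * spiked_amount


# Hypothetical spectrum: one labeled pair and one unpaired peptide signal.
peaks = [(1000.50, 200.0), (1006.52, 400.0), (1200.60, 150.0)]
pairs = find_isotope_pairs(peaks, mass_shift=6.0201)  # assumed tag mass shift
(light_mass, light_int), (heavy_mass, heavy_int) = pairs[0]
amount = absolute_amount(light_int, heavy_int, spiked_amount=50.0)  # e.g. fmol
print(amount)  # 25.0
```

The unpaired peak at 1200.60 corresponds to a peptide for which no reference was added, so it yields no absolute estimate.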

Annotation of the Human Genome with Protein Sequences Obtained by High-Throughput Mass Spectrometry

The recent definition of the complete nucleotide sequence of the human genome has motivated the comprehensive annotation of the sequence.   The true promise of the human genome project, as the foundation for medical and biological research, can only be realized if the coding sequences are conclusively identified, intron/exon structures are accurately described and isozymes determined. It is not presently possible to reliably predict all features of the genome from sequence alone.   Therefore, the value of the human genome sequence can be enhanced through the collection of different types of experimental data and their integration and validation in a genomic context.

Recently, peptides derived from accurate interpretations of protein tandem mass spectrometry (MS) data were mapped to the draft human genome. In addition, an expandable resource for integrating data from many diverse proteomics experiments was created, the so-called PeptideAtlas.

For some proteins, 100 or more peptide matches were observed, allowing for the confirmation of 4,800 intron/exon boundaries and 600 differential splicing phenomena. In most cases, these boundaries were already known to exist from cDNA information. In some cases we observed medically informative peptides highlighting differentially skipped exons, for instance in the expression of the A-type lamins in the lung adenocarcinoma cell line GLC-A1. The new peptide information confirms the existence of this splice variant and shows that low-abundance proteins can be detected through high-throughput proteomics technologies.

In our initial studies, 1086 peptides identified numerous times in different experiments could not be mapped to the genome.   These peptides are of special interest as they often document interesting biological phenomena such as single-nucleotide polymorphisms, demonstrating the need for annotating the human genome sequence with high-quality experimental data obtained from expressed proteins.   For example, the peptide AGKPVICATQMLESMIK was identified 525 times at different charge states and with different mass modifications in 20 distinct experiments and mapped to KPY1_HUMAN, a pyruvate kinase M1 isozyme.   Interestingly, the protein appears likely to have a valine to isoleucine polymorphism.
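One simple way to look for a candidate polymorphism behind an unmapped peptide is to search reference protein sequences for windows that differ from the peptide at exactly one residue. The sketch below does this by brute force; it is an illustration under stated assumptions, not the procedure used to build the PeptideAtlas. The function name and the toy sequences are hypothetical and are not taken from KPY1_HUMAN.

```python
def single_substitution_matches(peptide, protein):
    """Find windows of `protein` that differ from `peptide` at exactly one site.

    Returns tuples of (offset in protein, position in peptide,
    reference residue, observed residue).
    """
    hits = []
    k = len(peptide)
    for offset in range(len(protein) - k + 1):
        window = protein[offset:offset + k]
        diffs = [i for i in range(k) if window[i] != peptide[i]]
        if len(diffs) == 1:
            i = diffs[0]
            hits.append((offset, i, window[i], peptide[i]))
    return hits


# Toy example: the observed peptide carries I where the reference has V.
reference = "MKTAYVAKQRQISFVK"
observed = "TAYIAK"
hits = single_substitution_matches(observed, reference)
print(hits)  # [(2, 3, 'V', 'I')]
```

A hit of this kind is only a hypothesis; it would still need to be supported by the spectra and, ideally, by nucleotide-level evidence.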

There are several natural extensions to the PeptideAtlas, including its translation to other genomes and development of tools for automated discovery of differential phenomena and their relationship to human disease.   The Atlas is also a potentially valuable resource for peptide assignment and for the generation of targeted-profiling experiments.

Open Source Software Tools

Our group recently released the ProteoWizard libraries, an open source set of libraries that simplifies the development of proteomics tools. They read and write the HUPO-PSI mzML standard and have been incorporated into the ISB's Trans-Proteomic Pipeline.

Representative Publications

Kuester B, Schirle M, Mallick P, Aebersold R: Innovation: Scoring Proteomes with Proteotypic Peptide Probes. Nature Reviews Molecular Cell Biology 2005, 6:577-583.

Zhang H, Yi E, Li X, Mallick P, Aebersold R: High throughput quantitative analysis of serum proteins using glycopeptide capture and liquid chromatography mass spectrometry. Mol Cell Proteomics 2005 Feb;4(2):144-55.

Desiere F, ... , Mallick P, ... , Aebersold R: Annotation of the Human Genome with Peptide Sequences Obtained by High-Throughput Mass Spectrometry. Genome Biology 2004, 6:R9.

Mallick P, Marcotte E: Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions and interactions. Proteins and Proteomics: A Laboratory Manual, Dec. 2002, Cold Spring Harbor Laboratory Press.

Mallick Lab Members

Senior Scientists
Roland Luthy, Roland.Luethy@cshs.org
Maryann Vogelsang, Maryann.Vogelsang@cshs.org
Bob Rice, Robert.Rice@cshs.org

Post-doctoral Fellows
Kian Kani, Kian.Kani@cshs.org
Mohamad Abbani, abbanim@cshs.org

Software Developers & Bioinformaticists
Darren Kessner, darren.kessner@cshs.org
Robert Burke, Robert.Burke@cshs.org
Sul-Min Kim, Sul-Min.Kim@cshs.org
Kate Hoff, Katherine.Hoff@cshs.org
Eva Orpelli, Eva.Orpelli@cshs.org

Research Associates
Damien Wood, wooddm@cshs.org
Rose Silvas, Rose.Silvas@cshs.org
Jenny Wan, Jenny.Wan@cshs.org
Anthony Nguyen, Anthony.Nguyen@cshs.org
Ah Young Joo, AhYoung.Joo@cshs.org