Assistant Professor, Public Health Sciences
- BS, Physics, Peking University
- PhD, Physics, George Washington University
- Postdoc, Biostatistics and Computational Biology, Harvard University
Bioinformatics and Genomics, Biology, Biomedical Engineering, Biophysics, Biotechnology, Cancer Biology, Computational Biology, Computer Science, Epigenetics, Genetics, Immunology, Statistics
Bioinformatics methodology development; Epigenetics and chromatin biology; Transcriptional regulation; Cancer genomics and epigenomics; Statistical methods for biomedical data integration; Advanced machine learning; Theoretical and computational biophysics
<p>The research in my lab focuses on developing computational methodologies and integrative genomics approaches to study epigenetics and transcriptional regulation of gene expression in a variety of mammalian cell systems and human diseases such as cancer.
How gene expression is regulated in chromatin is a fundamental question in molecular biology. High-throughput technologies such as next-generation sequencing (NGS) have become powerful tools for studying gene regulation at the genomic scale. We conduct computational research that leverages these genomics technologies. Some research directions include:
1. <i>Next-generation sequencing bioinformatics</i>
We are interested in developing statistical methods and novel algorithms for analyzing massive data from next-generation sequencing (NGS) coupled with various assays for studying genomic chromatin profiles, such as ChIP-seq for transcription factor and histone mark profiling, and ATAC-seq and DNase-seq for chromatin accessibility profiling. As pioneers in ChIP-seq bioinformatics, we developed SICER (<a href="https://academic.oup.com/bioinformatics/article/25/15/1952/212783" target="_blank"><i>Bioinformatics</i> 2009</a>), one of the most widely used methods for ChIP-seq data analysis, with exceptional performance for broad histone modification marks. We are developing novel statistical models and computational methods for analyzing DNase/ATAC-seq data and for studying chromatin dynamics.
2. <i>Chromatin, epigenetics, and transcriptional regulation</i>
Our ultimate goal is to understand the fundamental mechanisms of transcriptional regulation and the functions of chromatin. We characterized dozens of histone modifications and histone modifying enzymes at the genomic scale (<a href="http://www.nature.com/ng/journal/v40/n7/full/ng.154.html" target="_blank"><i>Nat Genet</i> 2008</a>, <a href="http://www.nature.com/ng/journal/v41/n8/full/ng.409.html" target="_blank"><i>Nat Genet</i> 2009</a>, <a href="http://www.cell.com/abstract/S0092-8674(09)00841-1" target="_blank"><i>Cell</i> 2009</a>, <a href="http://www.cell.com/cell-stem-cell/abstract/S1934-5909(08)00582-1" target="_blank"><i>Cell Stem Cell</i> 2009</a>). Leveraging the large amount of publicly available ChIP-seq data, we developed MARGE (<a href="http://genome.cshlp.org/content/26/10/1417" target="_blank"><i>Genome Res</i> 2016</a>), a computational method to predict cis-regulatory profiles from differentially expressed gene sets using integrative learning approaches. We are specifically interested in studying functional enhancer regulation of gene expression in cancers.
3. <i>Genomic data integration for chromatin dynamics and regulatory networks</i>
High-dimensional genomic data analysis is challenging because of noise and biases in high-throughput experiments. We developed MANCIE (<a href="http://www.nature.com/articles/ncomms11305" target="_blank"><i>Nat Commun</i> 2016</a>), a method for bias correction and data integration of cross-platform genomic profiles on the same samples, using a Bayesian-supported principal component analysis (PCA)-based approach. We are interested in using statistical modeling and machine-learning approaches to integrate public genomic data for characterizing physical properties of mammalian epigenomes and dynamic interactions between chromatin and DNA in human cell systems.