CMB 2020 Retreat
Friday, October 2, 2020
Abstract: Genes are regulated by cis-regulatory elements, which contain transcription factor (TF) binding motifs in specific arrangements. To understand the syntax of these motif arrangements and its influence on cooperative TF binding, we developed BPNet, a new convolutional neural network that models the relationship between regulatory DNA sequence and base-resolution binding profiles from ChIP-exo/nexus experiments targeting four pluripotency TFs (Oct4, Sox2, Nanog and Klf4) in mouse embryonic stem cells. BPNet predicts base-resolution binding profiles and footprints on sequences not used in training with unprecedented accuracy, on par with replicate experiments. We developed a suite of robust model interpretation methods, including a new Fourier-transform-based attribution prior, to learn novel motif representations, accurately map predictive motif instances in the genome, and identify higher-order rules by which combinatorial motif syntax influences cooperative binding of these TFs. We discovered several novel motifs bound by these TFs, supported by distinct footprints. We further found that instances of strict motif spacing are largely due to retrotransposons, but that soft motif syntax, such as helical periodicity, influences TF binding over protein- and nucleosome-scale distances in a directional manner. We then validated our model's predictions using CRISPR-induced point mutations of motif instances. The sequence representations learned by the binding models can also be seamlessly transferred to accurately predict differential chromatin accessibility after TF depletion and intrinsic regulatory activity from massively parallel reporter experiments. BPNet readily adapts to other types of profiling experiments (e.g., ChIP-seq, DNase-seq, ATAC-seq, PRO-seq), paving the way to decipher the complexity of the cis-regulatory code using deep learning oracle models of functional genomics data.
Bio: Anshul Kundaje is an Assistant Professor of Genetics and Computer Science at Stanford University. The Kundaje lab develops statistical and machine learning methods for large-scale integrative analysis of functional genomic data to decode regulatory elements and pathways across diverse cell types and tissues, and to understand their role in cellular function and disease. Anshul received his Ph.D. in Computer Science from Columbia University in 2008. As a postdoc at Stanford University from 2008 to 2012 and a research scientist at MIT and the Broad Institute from 2012 to 2014, he led the integrative analysis efforts for two of the largest functional genomics consortia: the Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project. Dr. Kundaje is a recipient of the 2019 Chen Award of Excellence from the Human Genome Organization, the 2016 NIH Director's New Innovator Award and a 2014 Alfred P. Sloan Foundation Fellowship. Anshul is also a member of the NIH Director's Advisory Committee for Artificial Intelligence in Biomedical Research.
Past retreats / symposia: 2019 | 2014 | 2012 | 2011 | 2010 | 2009 | 2008 | 2007
CMB Journal Club
The CMB journal club is CSE 590C, a weekly seminar on Readings and Research in Computational Biology offered every autumn, winter, and spring.
Related Seminar Series at University of Washington
Several seminar series on campus regularly have talks of interest to the computational biology community:
Email Lists
Announcements for these and other seminars of relevance to computational biologists are usually sent to the compbio-seminars@cs.washington.edu mailing list. All are welcome to subscribe and/or post relevant announcements.
Announcements of local, national and international conferences, occasional job postings and other topics of potential relevance to computational biologists are usually sent to the companion list compbio-group@cs.washington.edu. Again, all are welcome to subscribe and/or post relevant messages.