Biological systems contain a large number of components whose physical interactions bring about cellular processes. A fundamental problem in molecular biology is to catalog these interactions and to decipher their functional consequences. High-throughput sequencing has made it possible to characterize some of these interactions rapidly, at high resolution, and in vivo (e.g., protein-DNA binding via ChIP-Seq and protein-RNA binding via RIP-Seq). But many interactions are not susceptible to these methods (e.g., RNA-RNA complexes, ncRNA-DNA binding, and, aside from recent work described below, DNA-DNA contacts and genome folding). This gap may be bridged by coupling high-throughput sequencing with proximity-ligation-based methods. In proximity ligation, spatially proximate nucleic acids ligate to one another, forming a chimeric oligonucleotide. Observation of a chimera composed of X and Y suggests that X and Y were near one another in the original sample. As a result, questions about spatial arrangement become questions about sequence composition, making it possible to take advantage of high-throughput sequencing. Nevertheless, the development of these approaches is challenging: they involve subtle molecular biology and produce massive, high-dimensional datasets requiring wholly new analytical paradigms, including extensive physical modeling. We recently developed Hi-C, the first technology that couples proximity ligation and high-throughput sequencing in an unbiased, genome-wide fashion (Lieberman-Aiden et al., Science, 2009). Hi-C uses a DNA-DNA proximity ligation step to identify long-range physical contacts between genomic DNA loci in vivo.
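To illustrate how a question about spatial arrangement becomes a question about sequence composition, the following minimal sketch tallies chimeric read pairs into binned contact counts. It is not the published Hi-C pipeline: the function name `bin_contacts`, the input format (each chimera given as two already-mapped `(chromosome, position)` ends), the 1 Mb bin size, and the example coordinates are all assumptions made for illustration.

```python
from collections import Counter


def bin_contacts(pairs, bin_size):
    """Count chimeric read pairs per pair of genomic bins.

    `pairs` is an iterable of ((chrom, pos), (chrom, pos)) tuples, one per
    chimera, giving the mapped position of each end (hypothetical format).
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in pairs:
        a = (chrom_a, pos_a // bin_size)
        b = (chrom_b, pos_b // bin_size)
        # Order the two ends so that (a, b) and (b, a) count as one contact.
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts


# Made-up example reads: two chimeras link the same pair of chr1 bins,
# and one links chr1 to chr2.
pairs = [
    (("chr1", 120_000), ("chr1", 5_430_000)),
    (("chr1", 5_410_000), ("chr1", 150_000)),
    (("chr1", 2_100_000), ("chr2", 900_000)),
]
contacts = bin_contacts(pairs, bin_size=1_000_000)
```

A higher count for a pair of bins is taken as evidence that the two loci were more often in spatial proximity in the original sample. A real analysis would also need read mapping, filtering, and normalization, none of which are shown here.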
We used Hi-C to create a low-resolution three-dimensional map of the human genome, and made two significant discoveries: (1) genetic regulation is accompanied by the three-dimensional movement of genes from an 'on' compartment to an 'off' compartment, and vice versa; and (2) a never-before-seen macromolecular state, the fractal globule, which couples extraordinary spatial density with a total absence of knots. Here, we propose to dramatically extend the above work by building a new generation of tools for systematically exploring the spatial organization of genomes, RNAs, and proteins, and by applying these tools to explore how RNAs and proteins establish and regulate the three-dimensional architecture of the genome. We will accomplish this through three specific research aims: (1) We will create an ensemble of new technologies combining proximity ligation and sequencing to enable comprehensive mapping of (a) DNA-RNA contacts [via DNA-RNA proximity ligation]; (b) RNA-RNA complexes [via RNA-RNA proximity ligation]; and (c) selected protein-protein complexes [via probe-coupled proximity ligation]. We will use these methods to generate maps of biomolecular contacts in vivo. (2) We will create high-resolution Hi-C maps of mammalian genomes, comprehensively mapping promoter-enhancer contacts and exploring large-scale organizational features such as transcription factories. (3) We will develop new analytical approaches that combine the data produced by (1) and (2) with new (a) informatic tools, (b) computational analyses, (c) physical simulations, and (d) rigorous theoretical methods. We will characterize how physical interactions change during differentiation and tumorigenesis; identify the RNAs, proteins, and pathways that are most crucial in regulating genome folding; and produce detailed physical models of these pathways and of how they modulate the physical structure of the genome.
We plan to apply these techniques first to murine ES cells differentiating down a neural lineage, and later to differentiating human ES cells and to primary tumors. This effort will produce powerful new molecular methods that will dramatically improve our ability to assess the spatial arrangement of cellular components. It will transform our understanding of how mammalian genomes fold inside the nucleus. It will reveal how specific physical interactions between DNA, RNA, and protein play a role in differentiation, tumorigenesis, and genome folding, and suggest new drug targets in the process. Finally, this work will generate a series of datasets that will serve as valuable resources for the scientific community as a whole.
Public Health Relevance: Biological systems contain a large number of components whose physical interactions bring about cellular processes, but our tools for identifying many of these biomolecular interactions are laborious and slow. We recently developed the Hi-C method for reconstructing the architecture of the human genome, and will extend this technological approach to map interactions between DNA, RNA, and protein in vivo and at high throughput. We will use these maps to study how genome folding regulates cell function, and to characterize the processes of cellular differentiation and tumorigenesis, identifying crucial biomolecular pathways and potential drug targets.