The 4DN Hi-C data processing pipeline produces contact lists and contact matrices. The contact matrices show signals of genomic compartments (Lieberman-Aiden et al. 2009) and local regions of enriched intra-region contacts. These local contact enrichments have been interpreted as topologically associating domains, or TADs (Dixon et al. 2012). Two domain calling workflows in the 4DN Data Portal use the contact matrices to report compartments and regions of local contact enrichment. Eigenvector decomposition of the matrix can be used to identify active (A) and inactive (B) compartments, and dips in the insulation score along the diagonal of the matrix define boundaries between domains with high intra-domain contact frequency (Crane et al. 2015). To perform the domain calls, we use the cooltools library, which implements algorithms commonly used in the literature and operates on contact matrices in the cooler format.
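To make the insulation-score idea concrete, the following is a minimal sketch in plain Python on a toy matrix. It is not the cooltools implementation: the function names (`insulation_profile`, `local_minima`), the window size, and the toy contact values are all assumptions chosen for illustration. The score at each position sums the contacts between the bins just upstream and just downstream of it, so a dip marks a position that few contacts cross.

```python
import math


def insulation_profile(matrix, window):
    """Return {position: log2 insulation score} for a square contact matrix.

    Position i is the border between bin i-1 and bin i. The raw score sums
    contacts between the `window` bins upstream and the `window` bins
    downstream of that border. Assumes every raw sum is positive.
    """
    n = len(matrix)
    raw = {}
    for i in range(window, n - window + 1):
        raw[i] = sum(
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        )
    mean = sum(raw.values()) / len(raw)
    # Normalize by the mean so that dips show up as negative values.
    return {i: math.log2(score / mean) for i, score in raw.items()}


def local_minima(profile):
    """Return positions whose score is lower than both neighbours."""
    keys = sorted(profile)
    return [
        k
        for prev, k, nxt in zip(keys, keys[1:], keys[2:])
        if profile[k] < profile[prev] and profile[k] < profile[nxt]
    ]


# Hypothetical toy data: 12 bins forming three domains of 4 bins each,
# with 10 contacts inside a domain and 1 contact between domains.
N_BINS, DOMAIN_SIZE = 12, 4
toy = [
    [10 if i // DOMAIN_SIZE == j // DOMAIN_SIZE else 1 for j in range(N_BINS)]
    for i in range(N_BINS)
]

profile = insulation_profile(toy, window=2)
boundaries = local_minima(profile)
print(boundaries)
```

On this toy matrix the dips fall at the borders between the three blocks. Real data are noisier, so a production workflow would also score each dip and filter weak ones rather than accept every local minimum.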
Lieberman-Aiden, E., van Berkum, N., Williams, L. et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 326, 289–293 (2009). https://doi.org/10.1126/science.1181369
Dixon, J., Selvaraj, S., Yue, F. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012). https://doi.org/10.1038/nature11082
Crane, E., Bian, Q., McCord, R. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015). https://doi.org/10.1038/nature14450
The top plots show the number of boundaries retained when a minimum boundary strength score is applied as a threshold. The bottom plots show the proportion of those boundaries that lie within 5 kb of a CTCF region at the same threshold. Thresholds of 0.2 and 0.5 were selected for weak and strong boundaries, respectively.
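The two quantities in these plots can be sketched in a few lines of plain Python. This is an illustration only: the function names, the record layout, and the toy coordinates are hypothetical, CTCF regions are simplified to single positions on one chromosome, and the threshold is assumed to be inclusive.

```python
def boundaries_above(boundaries, min_strength):
    """Keep boundaries whose strength is at least `min_strength`."""
    return [b for b in boundaries if b["strength"] >= min_strength]


def fraction_near_ctcf(boundaries, ctcf_positions, max_dist=5_000):
    """Proportion of boundaries within `max_dist` bp of any CTCF position."""
    if not boundaries:
        return 0.0
    near = sum(
        1
        for b in boundaries
        if any(abs(b["pos"] - c) <= max_dist for c in ctcf_positions)
    )
    return near / len(boundaries)


# Hypothetical toy data on a single chromosome (positions in bp).
toy_boundaries = [
    {"pos": 100_000, "strength": 0.1},
    {"pos": 250_000, "strength": 0.3},
    {"pos": 400_000, "strength": 0.6},
    {"pos": 620_000, "strength": 0.9},
]
toy_ctcf = [252_000, 401_000, 700_000]

# Evaluate both quantities at the weak (0.2) and strong (0.5) thresholds.
summary = {}
for threshold in (0.2, 0.5):
    kept = boundaries_above(toy_boundaries, threshold)
    summary[threshold] = (len(kept), fraction_near_ctcf(kept, toy_ctcf))
    print(threshold, summary[threshold])
```

Sweeping `threshold` over a finer grid of values would give the two curves described above: the boundary count for the top plot and the CTCF-proximal proportion for the bottom plot.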
The top plots show the boundary counts for the Micro-C set, the Hi-C set, and their overlap (boundaries within 10 kb of each other), and the bottom plots show the proportion of the overlap set relative to the number of boundaries called. The left plots are the results for a dataset with 2.5 billion reads (