Hi-C pairs file format

Published: 2023-04-06 23:45:55  Author: emanlee


## pairs format v1.0
#sorted: chr1-chr2-pos1-pos2
#shape: upper triangle
#chromsize: chr1 248956422
#chromsize: chr2 242193529
#chromsize: chr3 198295559
#chromsize: chr4 190214555
#chromsize: chr5 181538259
#chromsize: chr6 170805979
#chromsize: chr7 159345973
#chromsize: chr8 145138636
#chromsize: chr9 138394717
#chromsize: chr10 133797422
#chromsize: chr11 135086622
#chromsize: chr12 133275309
#chromsize: chr13 114364328
#chromsize: chr14 107043718
#chromsize: chr15 101991189
#chromsize: chr16 90338345
#chromsize: chr17 83257441
#chromsize: chr18 80373285
#chromsize: chr19 58617616
#chromsize: chr20 64444167
#chromsize: chr21 46709983
#chromsize: chr22 50818468
#chromsize: chrX 156040895
#chromsize: chrY 57227415
#chromsize: chrM 16569
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2
.    chr1    1    chr1    51659    -    -    1    98
.    chr1    1    chr1    73925    -    -    0    152
.    chr1    1    chr1    184432    -    -    1    437
.    chr1    1    chr1    443977    -    -    1    848
.    chr1    1    chr1    509430    -    +    1    992
.    chr1    1    chr1    631351    -    +    1    1194
.    chr1    1    chr1    632024    -    +    1    1195
.    chr1    1    chr1    632032    -    +    1    1195
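The listing above can be read with a short Python sketch. It assumes that header lines start with `#`, that `#chromsize:` lines carry a name and a length, and that `#columns:` names the fields of every data line. The function name `read_pairs` is made up for this note, and splitting on any whitespace is an assumption chosen so that both tab- and space-separated copies of the file are accepted.

```python
import io


def read_pairs(handle):
    """Return (chrom_sizes, columns, records) from a pairs-style text stream."""
    chrom_sizes = {}
    columns = []
    records = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split()
            if parts[0] == "#chromsize:":
                # e.g. "#chromsize: chr1 248956422"
                chrom_sizes[parts[1]] = int(parts[2])
            elif parts[0] == "#columns:":
                # Everything after the key is a column name.
                columns = parts[1:]
            continue
        # Data line: pair each value with its column name.
        record = dict(zip(columns, line.split()))
        for key in ("pos1", "pos2"):
            if key in record:
                record[key] = int(record[key])
        records.append(record)
    return chrom_sizes, columns, records


# A shortened copy of the listing above (one chromsize line, two records).
sample = """## pairs format v1.0
#sorted: chr1-chr2-pos1-pos2
#shape: upper triangle
#chromsize: chr1 248956422
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2
.\tchr1\t1\tchr1\t51659\t-\t-\t1\t98
.\tchr1\t1\tchr1\t509430\t-\t+\t1\t992
"""

chrom_sizes, columns, records = read_pairs(io.StringIO(sample))
print(chrom_sizes["chr1"], len(records), records[1]["strand2"])
```

Positions are converted to integers; the other fields are kept as strings so that values such as `.` for a missing readID pass through unchanged.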

 

 

Long format

The long format is the one used by Juicer; a merged_nodups.txt file can be used directly as input. It is a whitespace-separated file in which each line contains:
<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <mapq1> <cigar1> <sequence1> <mapq2> <cigar2> <sequence2> <readname1> <readname2>

    • str = strand (0 for forward, anything else for reverse)
    • chr = chromosome (must be a chromosome in the genome)
    • pos = position
    • frag = restriction site fragment
    • mapq = mapping quality score
    • cigar = cigar string as reported by aligner
    • sequence = DNA sequence

If the restriction site file option is not used, frag is ignored, but see the note on dummy values in the source documentation (see REF). If the mapping quality filter is not used, mapq is ignored. readname, strand, cigar, and sequence are currently not stored in .hic files.
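The 16-field layout of the long format can be checked with a small Python sketch. The names `LONG_FIELDS` and `parse_long_line` are made up for this note, the example line is invented (it is not taken from a real merged_nodups.txt), and treating pos, frag, and mapq as integers is an assumption. The strand rule follows the description above: 0 means forward, anything else means reverse.

```python
# Field order as given in the long format description.
LONG_FIELDS = (
    "str1", "chr1", "pos1", "frag1",
    "str2", "chr2", "pos2", "frag2",
    "mapq1", "cigar1", "sequence1",
    "mapq2", "cigar2", "sequence2",
    "readname1", "readname2",
)

# Assumption: these fields hold integers.
INT_FIELDS = ("pos1", "pos2", "frag1", "frag2", "mapq1", "mapq2")


def parse_long_line(line):
    """Split one long-format line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(LONG_FIELDS):
        raise ValueError(
            f"expected {len(LONG_FIELDS)} fields, got {len(values)}"
        )
    record = dict(zip(LONG_FIELDS, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    # 0 is forward; any other value is reverse.
    record["reverse1"] = record["str1"] != "0"
    record["reverse2"] = record["str2"] != "0"
    return record


# Invented example line, for illustration only.
example = "0 chr1 10000 25 16 chr1 52000 98 60 4M ACGT 42 4M TTGA read1/1 read1/2"
rec = parse_long_line(example)
print(rec["chr1"], rec["pos1"], rec["reverse1"], rec["reverse2"])
```

Raising an error on a wrong field count is a choice made here so that short-format or truncated lines are caught early rather than silently misread.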

 

 

REF

https://github-wiki-see.page/m/jianlin-cheng/GenomeFlow/wiki/Data-Format