Using awk on Linux to determine the base-quality encoding of a FASTQ file

Published: 2023-09-30 22:18:22  Author: 小鲨鱼2018

 

001.

[root@pc1 test]# ls
a.fastq
[root@pc1 test]# head -n 4 a.fastq        ## test data in FASTQ format
@SRR12342886.1 1/1
TCTTCAAAAATTTCTCACAGCTTGTTGTGATCCACACAGTCAAAGGCTTTAAGTGTAGTCAGTGAAGCAGAAGTGGATATTTTTCTGGAATTCCCTTGCTTTCTCTGTGATCCAAGGGATTTGATCTCTGGTTCCTCTGCTTTTTCTAAAC
+
FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF:F
[root@pc1 test]# head -n 12 a.fastq | awk '{if(NR%4==0) printf("%s",$0);}' | od -A n -t u1 -v | awk 'BEGIN{min=100;max=0;} {for(i=1;i<=NF;i++) {if($i>max) max=$i; if($i<min) min=$i;}} END {if(max<=126 && min<59) print "Phred33"; else if(max>73 && min>=64) print "Phred64"; else if(min>=59 && min<64 && max>73) print "Solexa64"; else print "Unknown score encoding";}'
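Phred33                   ## How the check works: each quality character is converted to its ASCII code, the minimum and maximum codes are found, and the encoding is inferred from that range (only the first 3 reads, i.e. 12 lines, are sampled here)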
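
The one-liner above can be wrapped in a small function so that more reads are sampled and the observed range is printed next to the guess. This is only a sketch: the function name `guess_quality_encoding` and the default of 1000 reads are hypothetical choices, not part of the original post, and it assumes plain 4-line FASTQ records (no wrapped sequence or quality lines). The thresholds are the same as in the command above.

```shell
# Hypothetical helper (name and default read count are assumptions for illustration).
# Usage: guess_quality_encoding FILE [NUMBER_OF_READS]
guess_quality_encoding() {
    # $1 = FASTQ file, $2 = number of reads to sample (defaults to 1000)
    head -n "$(( ${2:-1000} * 4 ))" "$1" |
        awk 'NR % 4 == 0 { printf "%s", $0 }' |   # keep only quality lines, joined without newlines
        od -A n -t u1 -v |                        # print every character as its decimal ASCII code
        awk '
            BEGIN { min = 256; max = 0 }
            {
                # track the smallest and largest ASCII code seen so far
                for (i = 1; i <= NF; i++) {
                    if ($i > max) max = $i
                    if ($i < min) min = $i
                }
            }
            END {
                if (max == 0) enc = "Unknown score encoding"   # no quality characters were read
                else if (max <= 126 && min < 59) enc = "Phred33"
                else if (max > 73 && min >= 64) enc = "Phred64"
                else if (min >= 59 && min < 64 && max > 73) enc = "Solexa64"
                else enc = "Unknown score encoding"
                printf "%s\tmin=%d\tmax=%d\n", enc, min, max
            }'
}
```

It could be called as `guess_quality_encoding a.fastq 5000`. Sampling more than three reads matters because the decision depends on the lowest and highest codes actually observed: for example, a sample whose quality strings contain only `F` (ASCII 70) matches none of the three ranges and is reported as "Unknown score encoding".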

 

 


Reference: https://mp.weixin.qq.com/s?__biz=Mzg4NzA4MzUxOA==&mid=2247486721&idx=1&sn=c268b78f600d9acbe25831a62a47df12&chksm=cf8e9590f8f91c861157fe3dcbc8439826ba8134f7de67515de23409d746c6d34892695fcfee&cur_album_id=3101294931740213257&scene=189#wechat_redirect