Metagenomics: Annotating "Fungi" with Kraken2 (self-compiled, detailed, the only one on the web) - 526互联

(Newbase) [wz@localhost ~]$ kraken2-build -h
Usage: kraken2-build [task option] [options]

Task options (exactly one must be selected):
  --download-taxonomy        Download NCBI taxonomic information
  --download-library TYPE    Download partial library
                             (TYPE = one of "archaea", "bacteria", "plasmid",
                             "viral", "human", "fungi", "plant", "protozoa",
                             "nr", "nt", "UniVec", "UniVec_Core")
  --special TYPE             Download and build a special database
                             (TYPE = one of "greengenes", "silva", "rdp")
  --add-to-library FILE      Add FILE to library
  --build                    Create DB from library
                             (requires taxonomy d/l'ed and at least one file
                             in library)
  --clean                    Remove unneeded files from a built database
  --standard                 Download and build default database
  --help                     Print this message
  --version                  Print version information

Options:
  --db NAME                  Kraken 2 DB name (mandatory except for
                             --help/--version)
  --threads #                Number of threads (def: 1)
  --kmer-len NUM             K-mer length in bp/aa (build task only;
                             def: 35 nt, 15 aa)
  --minimizer-len NUM        Minimizer length in bp/aa (build task only;
                             def: 31 nt, 12 aa)
  --minimizer-spaces NUM     Number of characters in minimizer that are
                             ignored in comparisons (build task only;
                             def: 7 nt, 0 aa)
  --protein                  Build a protein database for translated search
  --no-masking               Used with --standard/--download-library/
                             --add-to-library to avoid masking low-complexity
                             sequences prior to building; masking requires
                             dustmasker or segmasker to be installed in PATH,
                             which some users might not have.
  --max-db-size NUM          Maximum number of bytes for Kraken 2 hash table;
                             if the estimator determines more would normally be
                             needed, the reference library will be downsampled
                             to fit. (Used with --build/--standard/--special)
  --use-ftp                  Use FTP for downloading instead of RSYNC; used with
                             --download-library/--download-taxonomy/--standard.
  --skip-maps                Avoids downloading accession number to taxid maps,
                             used with --download-taxonomy.
  --load-factor FRAC         Proportion of the hash table to be populated
                             (build task only; def: 0.7, must be
                             between 0 and 1).
  --fast-build               Do not require database to be deterministically
                             built when using multiple threads.  This is faster,
                             but does introduce variability in minimizer/LCA
                             pairs.  Used with --build and --standard options.

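This is the help text for kraken2-build, the database-building command of Kraken 2. Kraken 2 is a tool for fast, accurate classification of sequences from metagenomes or metatranscriptomes. The help text is explained below: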

Task options (exactly one must be selected):

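--download-taxonomy: Download the NCBI taxonomy information.
--download-library TYPE: Download a partial library. TYPE is one of "archaea", "bacteria", "plasmid", "viral", "human", "fungi", "plant", "protozoa", "nr", "nt", "UniVec", "UniVec_Core".
--special TYPE: Download and build a special database; TYPE is one of "greengenes", "silva", "rdp".
--add-to-library FILE: Add FILE to the library.
--build: Create the database from the library (requires the taxonomy to have been downloaded and at least one file in the library).
--clean: Remove unneeded files from a built database.
--standard: Download and build the default (standard) database.
--help: Print the help message.
--version: Print version information.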

Options:

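--db NAME: Kraken 2 database name (mandatory except with --help/--version).
--threads #: Number of threads (default: 1).
--kmer-len NUM: K-mer length in bp/aa; build task only; default 35 nt, 15 aa.
--minimizer-len NUM: Minimizer length in bp/aa; build task only; default 31 nt, 12 aa.
--minimizer-spaces NUM: Number of characters in the minimizer that are ignored in comparisons; build task only; default 7 nt, 0 aa.
--protein: Build a protein database for translated search.
--no-masking: Used with --standard/--download-library/--add-to-library to skip masking low-complexity sequences before building; masking requires dustmasker or segmasker to be installed in PATH.
--max-db-size NUM: Maximum size of the Kraken 2 hash table in bytes; if the estimator determines that more would normally be needed, the reference library is downsampled to fit (used with --build/--standard/--special).
--use-ftp: Download via FTP instead of RSYNC; used with --download-library/--download-taxonomy/--standard.
--skip-maps: Skip downloading the accession-number-to-taxid maps; used with --download-taxonomy.
--load-factor FRAC: Proportion of the hash table to be populated; build task only; default 0.7, must be between 0 and 1.
--fast-build: Do not require the database to be built deterministically when using multiple threads. This is faster, but introduces variability in the minimizer/LCA pairs. Used with --build and --standard.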

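In short, these options let you download different reference libraries, add your own sequences, build and clean up a Kraken 2 database, and speed up the build with features such as multithreading.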

Next, download and build the Kraken 2 fungi database.

1. Download the taxonomy:
This step is required for building any Kraken 2 database.
kraken2-build --download-taxonomy --db [DB_NAME]
2. Download the fungi library:
kraken2-build --download-library fungi --db [DB_NAME]
3. Build the database:
kraken2-build --build --db [DB_NAME]
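The three steps can be wrapped in one small function so that a failed download stops everything before the build starts. This is a minimal sketch under stated assumptions. The function names build_kraken2_fungi_db and run, the DRY_RUN switch, and the default of 20 threads (taken from the example value below) are all hypothetical. Only the kraken2-build options come from the help text above.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the three kraken2-build steps.
#   $1  database directory, e.g. /home/wz/Kraken2/fungi
#   $2  number of threads (defaults to 20, the example value used in this post)
build_kraken2_fungi_db() {
  local db="$1"
  local threads="${2:-20}"
  # The && chain stops at the first failing step and returns that step's exit status.
  run kraken2-build --download-taxonomy --db "$db" --threads "$threads" &&
    run kraken2-build --download-library fungi --db "$db" --threads "$threads" &&
    run kraken2-build --build --db "$db" --threads "$threads"
}

# Real run:    build_kraken2_fungi_db /home/wz/Kraken2/fungi 20
# Print only:  DRY_RUN=1 build_kraken2_fungi_db /home/wz/Kraken2/fungi 20
```

With DRY_RUN=1 the function only prints the three commands, so you can check them before committing to a long download.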


You can use more threads by adding the --threads option to the commands above, for example --threads 20.

Note: make sure the system has enough disk space, because the downloaded data and the built database can take up a lot of it.
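Before starting the download, you can check the free space on the filesystem that will hold the database with df. This is a small sketch. The name free_gib is hypothetical, and the helper only reports a number. It makes no assumption about how much space the fungi database actually needs.

```shell
# Hypothetical helper: print the free space, in whole GiB, of the filesystem holding a directory.
# df -P gives one POSIX-format line per filesystem; -k makes column 4 ("Available") 1024-byte blocks.
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}
```

For example, free_gib /home/wz/Kraken2 prints the free space of the filesystem where the database will be built.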

A Kraken2 database normally has a main directory (in my case /home/wz/Kraken2/fungi/) that contains the taxonomy and library subdirectories.

/home/wz/Kraken2/fungi/taxonomy holds the taxonomy information downloaded from NCBI.
/home/wz/Kraken2/fungi/library/fungi holds the fungal sequence data.
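Before running --build, it can help to confirm that both inputs are in place. This is a minimal sketch with three assumptions. First, the NCBI taxonomy dump files names.dmp and nodes.dmp end up in taxonomy/. Second, the downloaded sequences are .fna files somewhere under library/. Third, the name check_kraken2_inputs is hypothetical. Adjust the names if your directory looks different.

```shell
# Hypothetical pre-build check for a Kraken 2 database directory (e.g. /home/wz/Kraken2/fungi).
# Succeeds only if taxonomy/ has non-empty names.dmp and nodes.dmp (assumed file names)
# and library/ contains at least one .fna file in any subdirectory (e.g. library/fungi/).
check_kraken2_inputs() {
  local db="$1" f
  for f in names.dmp nodes.dmp; do
    if [ ! -s "$db/taxonomy/$f" ]; then
      echo "missing or empty: $db/taxonomy/$f" >&2
      return 1
    fi
  done
  if [ -z "$(find "$db/library" -name '*.fna' 2>/dev/null | head -n 1)" ]; then
    echo "no .fna file under $db/library" >&2
    return 1
  fi
  return 0
}
```

For example, check_kraken2_inputs /home/wz/Kraken2/fungi && echo ready prints "ready" only when both checks pass.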

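To build the database, run the command from the /home/wz/Kraken2 directory with fungi as the database name. Because --db takes a directory path, --db /home/wz/Kraken2/fungi also works from any directory: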

kraken2-build --build --db fungi --threads [NUMBER_OF_THREADS]
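When the build succeeds, the database directory contains the three files Kraken 2 loads at classification time: hash.k2d, opts.k2d and taxo.k2d. The hypothetical function below (the name check_kraken2_db is an assumption) only checks that they exist and are non-empty. It is a quick sanity check before moving on to Bracken, not a full validation of the database.

```shell
# Hypothetical post-build check: a usable Kraken 2 database directory holds these three files.
check_kraken2_db() {
  local db="$1" f missing=0
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [ ! -s "$db/$f" ]; then
      echo "missing or empty: $db/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_kraken2_db /home/wz/Kraken2/fungi reports every missing file, not just the first one.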

The error encountered when starting annotation with Bracken after the build:

bracken -d /home/wz/Kraken2/ fungi -i ./test.report -o ./test.S.bracken -w ./test.S.bracken.report -r 150 -l S
 >> Checking for Valid Options...
 ERROR: /home/wz/Kraken2/database100mers.kmer_distrib does not exist
        Run bracken-build to generate the kmer distribution file.

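The error message says that Bracken expects a file named database100mers.kmer_distrib, but the file does not exist. Bracken cannot run without a kmer_distrib file. The file holds the k-mer distribution information of the database, computed for one specific read length (the number in the file name), and Bracken relies on it to estimate species abundance.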
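Bracken looks for this file directly inside the directory given to -d, and the file name encodes the read length as database<READ_LEN>mers.kmer_distrib. The hypothetical helper below reproduces that naming, so you can see which file a given database path and read length point to. It also strips a single trailing slash, which is an assumption made so that the joined path matches the one in the error message.

```shell
# Hypothetical helper mirroring the file name Bracken expects:
#   <database dir>/database<READ_LEN>mers.kmer_distrib
bracken_kmer_distrib_path() {
  local db="${1%/}"   # drop one trailing slash so the path joins cleanly
  local read_len="$2"
  printf '%s/database%smers.kmer_distrib\n' "$db" "$read_len"
}
```

With /home/wz/Kraken2/ and 100 it gives exactly the path in the error message above. With /home/wz/Kraken2/fungi and 150 it gives /home/wz/Kraken2/fungi/database150mers.kmer_distrib, the file that bracken-build creates when run with -l 150 on that database.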

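As the error message says, bracken-build must be run to generate this file. This is an additional build step for Bracken, run after the Kraken database itself has been built.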

The general form of the bracken-build command is:

bracken-build -d [path to the Kraken database] -t [number of threads] -k [k-mer length used by Kraken] -l [read length]

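In this case, try the following command to generate the required k-mer distribution file. Replace [THREADS] with the number of threads you want to use. Replace [KMER_SIZE] with the k-mer length used when building the Kraken database, which is 35 by default. The value of -l must match the read length later passed to bracken with -r, here 150: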

bracken-build -d /home/wz/Kraken2/fungi -t [THREADS] -k [KMER_SIZE] -l 150
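The two steps can be chained so that bracken-build only runs when the distribution file for the requested read length is missing. This is a minimal sketch with several hypothetical parts: the function names bracken_species and run, the DRY_RUN switch, and the default thread count. The default k-mer length of 35 is Kraken 2's default. The bracken-build and bracken options are the ones used in this post. bracken-build reads the sequences in library/ and the files in taxonomy/ of the Kraken database, so run it before any kraken2-build --clean.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: create the k-mer distribution file if it is missing, then run Bracken
# at species level (-l S) with the same output names as the command in this post.
#   $1 database dir   $2 Kraken 2 report    $3 output prefix
#   $4 read length    $5 threads (default 1) $6 k-mer length (default 35)
bracken_species() {
  local db="${1%/}" report="$2" prefix="$3" read_len="$4"
  local threads="${5:-1}" kmer_len="${6:-35}"
  # bracken-build writes database<READ_LEN>mers.kmer_distrib into the database dir
  if [ ! -f "$db/database${read_len}mers.kmer_distrib" ]; then
    run bracken-build -d "$db" -t "$threads" -k "$kmer_len" -l "$read_len" || return 1
  fi
  # -r must equal the -l given to bracken-build, otherwise Bracken cannot find the file
  run bracken -d "$db" -i "$report" -o "$prefix.S.bracken" \
    -w "$prefix.S.bracken.report" -r "$read_len" -l S
}

# Example: bracken_species /home/wz/Kraken2/fungi ./test.report ./test 150 20
```

Keeping the read length in a single argument avoids the case where bracken-build and bracken are given different values.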

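After it finishes, the Bracken command should run without this error. In the command above, -d /home/wz/Kraken2/ fungi contains a space, so Bracken treated /home/wz/Kraken2/ as the database directory. That is why the error message shows /home/wz/Kraken2/database100mers.kmer_distrib. When rerunning, pass the database as -d /home/wz/Kraken2/fungi. Also make sure your Bracken version is compatible with your Kraken version, because mismatched versions can cause errors.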
