In this section we will focus on the taxonomic classification of shotgun metagenomic reads using two different tools: Kraken 2 and Kaiju. We will use the data obtained in the data retrieval section.
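Before classifying, it can be useful to check how long the reads actually are. The Bracken step later in this section passes `--p-read-len 150`, and Bracken's estimates rely on k-mer distributions computed for a specific read length. The sketch below is not part of MOSHPIT: `modal_read_length` is a hypothetical helper, and it assumes that one of the FASTQ files is available on disk (plain or gzipped, four lines per record). It samples reads from the top of the file and prints the most common length.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of MOSHPIT, QIIME 2 or Bracken.
# Prints the most frequent read length among the first reads of a FASTQ file.
# Assumes four-line FASTQ records; files ending in .gz are decompressed first.
modal_read_length() {
    local fastq="$1"
    local max_reads="${2:-10000}"   # number of reads to sample from the top

    local reader="cat"
    case "$fastq" in
        *.gz) reader="gzip -dc" ;;
    esac

    # Sequence lines are every 4th line, starting at line 2.
    $reader "$fastq" |
        head -n "$((max_reads * 4))" |
        awk 'NR % 4 == 2 { counts[length($0)]++ }
             END {
                 best_len = 0; best_count = 0
                 for (len in counts) {
                     n = counts[len]
                     # Ties go to the longer length so the result is deterministic.
                     if (n > best_count || (n == best_count && len + 0 > best_len)) {
                         best_len = len + 0; best_count = n
                     }
                 }
                 print best_len
             }'
}

# Small self-contained demo with three made-up reads (lengths 8, 8 and 4).
demo_fastq=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n@r3\nACGT\n+\nIIII\n' > "$demo_fastq"
modal_read_length "$demo_fastq"   # prints 8
rm -f "$demo_fastq"
```

If the value printed for your own data is far from 150, a different `--p-read-len` value may be a better fit for the Bracken step.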
Approach 1: Kraken 2¶
Before we can use Kraken 2, we need to build or download a database. We will use the build-kraken-db action to fetch the PlusPF database
from here; this database covers RefSeq sequences for archaea, bacteria, viruses, plasmids,
human, UniVec_Core, protozoa and fungi. The same action also stores the matching Bracken database (--o-bracken-db), which is used for abundance estimation further down.
mosh annotate build-kraken-db \
--p-collection pluspf \
--o-kraken2-db cache:kraken2_db \
--o-bracken-db cache:bracken_db \
--verbose
We can now use the classify-kraken2 command to run Kraken 2 using the paired-end reads as a query and the PlusPF database retrieved in the previous step:
mosh annotate classify-kraken2 \
--i-seqs cache:reads_filtered \
--i-db cache:kraken2_db \
--p-threads 72 \
--p-confidence 0.5 \
--p-no-memory-mapping \
--p-report-minimizer-data \
--o-reports cache:kraken_reports_reads \
--o-outputs cache:kraken_hits_reads \
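--verbose
Next, we use the estimate-bracken action to re-estimate taxon abundances from the Kraken 2 reports, using the Bracken database fetched earlier:
mosh annotate estimate-bracken \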
--i-kraken2-reports cache:kraken_reports_reads \
--i-db cache:bracken_db \
--p-threshold 5 \
--p-read-len 150 \
--o-taxonomy cache:bracken_taxonomy \
--o-table cache:bracken_ft \
--o-reports cache:bracken_reports \
--verbose
To remove the unclassified read fraction, we can use the filter-table action from the q2-taxa QIIME 2 plugin:
mosh taxa filter-table \
--i-table cache:bracken_ft \
--i-taxonomy cache:bracken_taxonomy \
--p-exclude Unclassified \
--o-filtered-table cache:bracken_ft_filtered
Approach 2: Kaiju¶
Similarly to Kraken 2, Kaiju requires a reference database to perform taxonomic classification. We will use the fetch-kaiju-db
action to download the nr_euk database that includes both
prokaryotes and eukaryotes (more info on the taxa here).
mosh annotate fetch-kaiju-db \
--p-database-type nr_euk \
--o-db cache:kaiju_nr_euk \
--verbose
We run Kaiju with a confidence of 0.1, using the paired-end reads as a query and the database artifact generated in the previous step:
mosh annotate classify-kaiju \
--i-seqs cache:reads_paired \
--i-db cache:kaiju_nr_euk \
--p-z 16 \
--p-c 0.1 \
--o-taxonomy cache:kaiju_taxonomy \
--o-abundances cache:kaiju_ft \
--verbose
Finally, we filter the table to remove the unclassified reads:
mosh taxa filter-table \
--i-table cache:kaiju_ft \
--i-taxonomy cache:kaiju_taxonomy \
--p-exclude unclassified,belong,cannot \
--o-filtered-table cache:kaiju_ft_filtered \
--verbose
Visualization¶
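The bar plot command in this section reads sample metadata from metadata.tsv. QIIME 2 validates that file itself, but a quick check of the ID column header can catch a common formatting problem before the command is run. The sketch below is only a rough pre-check, not QIIME 2's own validation: `metadata_id_header_ok` is a hypothetical helper, and the list of accepted headers reflects our understanding of the QIIME 2 metadata format and should be checked against its documentation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of QIIME 2 or MOSHPIT.
# Succeeds (exit status 0) if the first column header of a TSV metadata file
# looks like an identifier header that QIIME 2 is expected to accept.
metadata_id_header_ok() {
    local file="$1"
    local header lower

    # First field of the first line that is neither blank nor a comment.
    # Legacy headers such as "#SampleID" start with "#" but are not comments.
    header=$(awk -F'\t' '
        /^[ \t]*$/ { next }
        /^#/ && $1 !~ /^#(SampleID|Sample ID|OTUID|OTU ID)$/ { next }
        { print $1; exit }
    ' "$file" | tr -d '\r')

    # Headers assumed to be matched case-sensitively.
    case "$header" in
        '#SampleID'|'#Sample ID'|'#OTUID'|'#OTU ID'|sample_name) return 0 ;;
    esac

    # Headers assumed to be matched case-insensitively.
    lower=$(printf '%s' "$header" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
        id|sampleid|'sample id'|sample-id|featureid|'feature id'|feature-id) return 0 ;;
    esac
    return 1
}

# Demo on a throwaway file with made-up sample IDs.
demo_md=$(mktemp)
printf 'sample-id\tgroup\nS1\tcontrol\nS2\ttreatment\n' > "$demo_md"
if metadata_id_header_ok "$demo_md"; then
    echo "ID column header looks fine"
else
    echo "unexpected ID column header"
fi
rm -f "$demo_md"
```

To check the real file, call `metadata_id_header_ok metadata.tsv`. The check looks at the header only; it does not verify that the sample IDs match those in the feature table.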
You can now generate a taxa bar plot from either set of results. We will continue with the Kaiju results; to create the plot, run:
mosh taxa barplot \
--i-table cache:kaiju_ft_filtered \
--i-taxonomy cache:kaiju_taxonomy \
--m-metadata-file metadata.tsv \
--o-visualization results/kaiju_barplot.qzv
Your visualization should look similar to this one.
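The plot is built from the filtered Kaiju table, so it helps to understand what the three terms passed to `--p-exclude` (unclassified, belong and cannot) are meant to remove. As far as we are aware, Kaiju's summary tables group reads without a usable assignment under labels such as "unclassified", "cannot be assigned to ..." and "belong to a ... with less than ...", and the search terms target those labels. The sketch below imitates that kind of filtering with `grep`. It assumes case-insensitive substring matching, and `preview_excluded` is a hypothetical helper that does not call q2-taxa. The labels are made up for illustration, and the exact wording may differ between Kaiju versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of q2-taxa or Kaiju.
# Reads taxon labels on stdin and prints those that contain any of the
# comma-separated search terms, ignoring case. This imitates
# "contains"-style matching only; it does not call q2-taxa.
preview_excluded() {
    local terms="$1"
    local pattern
    # "a,b,c" becomes the extended regex "a|b|c"
    # (terms are assumed to contain no regex metacharacters).
    pattern=$(printf '%s' "$terms" | tr ',' '|')
    grep -i -E -- "$pattern"
}

# Made-up labels, modelled on the wording of Kaiju summary tables.
printf '%s\n' \
    'Bacteria;Bacteroidota;Bacteroidia' \
    'unclassified' \
    'cannot be assigned to a (non-viral) species' \
    'belong to a (non-viral) species with less than 0.1% of all reads' |
    preview_excluded 'unclassified,belong,cannot'
```

Running the demo prints the last three labels and leaves out the Bacteria lineage, which is the behaviour the filter-table call relies on. If such labels still appear in your bar plot, check the q2-taxa documentation for how search terms are matched.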