Jul 07, 2020

Taxonomic classification of IonTorrent-sequenced 16S amplicon sequences

  • Cardiff University
Protocol Citation: Laura Espina 2020. Taxonomic classification of IonTorrent-sequenced 16S amplicon sequences. protocols.io https://dx.doi.org/10.17504/protocols.io.bh8pj9vn
Manuscript citation:
Espina L (2020) An approach to increase the success rate of cultivation of soil bacteria based on fluorescence-activated cell sorting. PLoS ONE 15(8): e0237748. doi: 10.1371/journal.pone.0237748
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: July 04, 2020
Last Modified: July 07, 2020
Protocol Integer ID: 38895
Disclaimer
Protocol using QIIME2 pipeline, v2019-4 (doi: 10.1038/s41587-019-0209-9) and the Naïve Bayes classifier implemented in the q2-feature-classifier plugin (doi: 10.1186/s40168-018-0470-z). This classifier was trained on the Greengenes 13_8 99% OTU database (doi: 10.1038/ismej.2011.139). R v3.6.1 environment was used with Phyloseq v1.29.0 (doi: 10.1371/journal.pone.0061217).
Pre-processing

Software: Torrent Suite (Thermo Fisher Scientific)
Demultiplex sequences. Quality-filter sequences. Trim barcodes and adapters. Export as fastq files (1 fastq file per sample).

Note
Example of fastq files at this point:
fcm1_1_L001_R2_001.fastq
fcm2_2_L001_R2_001.fastq


Import fastq files.
Trim primers.
Remove sequences shorter than 100 bp.
Suggested software: Geneious (Biomatters Ltd), or Python scripts.
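As an alternative to Geneious for the length filter only, the step can be sketched with standard shell tools. This is a minimal sketch, not part of the original protocol: the function name `filter_fastq_by_length` and the output file name are hypothetical, and it assumes 4-line FASTQ records (sequence lines not wrapped). Primer trimming is not covered here.

```shell
# Hypothetical helper (not from the protocol): keep only reads with at
# least MIN_LEN bases. Assumes each FASTQ record spans exactly 4 lines.
filter_fastq_by_length() {
    min_len="$1"; in_fastq="$2"; out_fastq="$3"
    awk -v min="$min_len" '
        NR % 4 == 1 { header = $0 }   # @read_id line
        NR % 4 == 2 { seq = $0 }      # nucleotide sequence
        NR % 4 == 3 { plus = $0 }     # "+" separator line
        NR % 4 == 0 {                 # quality line completes the record
            if (length(seq) >= min)
                print header "\n" seq "\n" plus "\n" $0
        }
    ' "$in_fastq" > "$out_fastq"
}

# Example call (output name is illustrative):
# filter_fastq_by_length 100 fcm1_1_L001_R2_001.fastq fcm1_1_filtered.fastq
```

Reads of exactly 100 bp are kept, matching "remove sequences shorter than 100 bp".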

(Optional) Check the Phred quality score of the sequences.
Suggested software: FastQC (Simon Andrews).

On Qiime2
To train a Naïve Bayes classifier: Download the Greengenes 13_8 99% OTU database. Upload the files "99_otus.fasta" and "99_otu_taxonomy.txt" to a folder accessible by the Qiime2 environment. With Qiime2 activated and inside the folder containing those files, type in the command line:
>qiime tools import --type 'FeatureData[Sequence]' --input-path 99_otus.fasta --output-path otus.qza
>qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path 99_otu_taxonomy.txt --output-path taxonomy.qza
>qiime feature-classifier extract-reads --i-sequences otus.qza --p-f-primer AGAGTTTGATCMTGGCTCAG --p-r-primer CYNACTGCTGCCTCCCGTAG --o-reads ref-seqs.qza
>qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy taxonomy.qza --o-classifier classifier.qza
Expected result
The artifact "classifier.qza" can be copied to any other folder for the following steps.
Example of classifier trained on the Greengenes 99% OTU database for the V1-V2 region of the 16S rRNA gene:
classifier.qza


To perform taxonomy classification: In a new folder, upload the artifact "classifier.qza" and the file with the metadata corresponding to the fastq files in the correct format. Example of file:
metadata.tsv
Also create a subfolder "fastq" containing the compressed fastq files in the correct naming format (e.g. "fcm1_1_L001_R1_001.fastq.gz" and "fcm2_2_L001_R1_001.fastq.gz").
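Creating the "fastq" subfolder can be sketched as follows. This is an illustrative sketch rather than part of the original protocol: the function name `compress_fastq_into` is hypothetical, and it assumes the uncompressed fastq files already follow the naming pattern shown above, since it only compresses them and does not rename them.

```shell
# Hypothetical helper (not from the protocol): gzip every *.fastq file
# found in SRC_DIR into DEST_DIR, leaving the original files untouched.
compress_fastq_into() {
    src_dir="$1"; dest_dir="$2"
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*.fastq; do
        [ -e "$f" ] || continue    # skip if the glob matched nothing
        gzip -c "$f" > "$dest_dir/$(basename "$f").gz"
    done
}

# Example call from the working folder:
# compress_fastq_into . fastq
```

Keeping the uncompressed originals makes it easy to redo the step if a file name needs correcting before import.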
In the command line, type in:
>qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path fastq --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-single-end.qza
>qiime vsearch dereplicate-sequences --i-sequences demux-single-end.qza --o-dereplicated-table dereplicatedtable.qza --o-dereplicated-sequences rep-seqs.qza
>qiime feature-classifier classify-sklearn --i-classifier classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza
>qiime taxa barplot --i-table dereplicatedtable.qza --i-taxonomy taxonomy.qza --m-metadata-file metadata.tsv --o-visualization taxa-bar-plots.qzv
Expected result
The artifact "taxa-bar-plots.qzv" can be interactively visualized in https://view.qiime2.org/
Example of artifact:
taxa-bar-plots.qzv


The components of the biom table (the OTU abundance table and the taxonomy table) can be obtained from the "taxa-bar-plots.qzv" artifact or with the following commands:
>qiime tools export --input-path dereplicatedtable.qza --output-path exported-feature-table
>cd exported-feature-table
>biom convert -i feature-table.biom -o otutable.tsv --to-tsv
>cd ..
>qiime tools export --input-path taxonomy.qza --output-path taxonomy

Expected result
The files "taxonomy.tsv" and "otutable.tsv" should be downloaded and can be manually converted into .csv files and adapted for the analysis with R. The additional file "metadata.csv" is also necessary.
Example of files:
taxonomy.csv, otutable.csv, metadata.csv
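The manual conversion to .csv can also be sketched on the command line. This is only a sketch under stated assumptions: the function names are hypothetical; "taxonomy.tsv" is assumed to have a header row followed by the columns feature ID, taxon string and confidence, with Greengenes ranks separated by ";"; "otutable.tsv" is assumed to start with the "# Constructed from biom file" comment line written by biom convert; and no field is assumed to contain a comma. The column names written to the taxonomy file (Kingdom to Species) are a choice made here so that a "Family" column is available in R, and may differ from the example files.

```shell
# Hypothetical helper: split the taxon string into one column per rank.
# Ranks missing from a classification are left as empty fields.
taxonomy_tsv_to_csv() {
    awk -F '\t' '
        BEGIN { print "OTU,Kingdom,Phylum,Class,Order,Family,Genus,Species" }
        NR > 1 {                          # skip the header row
            n = split($2, ranks, /; */)
            line = $1
            for (i = 1; i <= 7; i++)
                line = line "," (i <= n ? ranks[i] : "")
            print line
        }
    ' "$1" > "$2"
}

# Hypothetical helper: drop the biom comment line, remove the leading "#"
# from the header, and replace tabs with commas.
otutable_tsv_to_csv() {
    awk -F '\t' -v OFS=',' '
        /^# Constructed from biom file/ { next }
        { sub(/^#/, "", $1); $1 = $1; print }   # $1 = $1 rebuilds the line with OFS
    ' "$1" > "$2"
}

# Example calls:
# taxonomy_tsv_to_csv taxonomy/taxonomy.tsv taxonomy.csv
# otutable_tsv_to_csv exported-feature-table/otutable.tsv otutable.csv
```

The confidence column is dropped from the taxonomy output, so every column after the first holds a taxonomic rank.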


Analysis with R
In R, with the phyloseq package loaded, several types of analysis can be performed.

The first step is the creation of the phyloseq object using the type of files shown in step 5.5:
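> library(phyloseq)
> # Read the OTU counts, taxonomy and sample metadata (first column = row names)
> otu_df <- read.csv("otutable.csv", sep=",", row.names=1)
> tax_df <- read.csv("taxonomy.csv", sep=",", row.names=1)
> metadata <- read.csv("metadata.csv", sep=",", row.names=1)
> # Build the phyloseq components; names differ from the phyloseq functions to avoid shadowing them
> OTU <- otu_table(as.matrix(otu_df), taxa_are_rows=TRUE)
> TAX <- tax_table(as.matrix(tax_df))
> meta <- sample_data(metadata)
> fcm <- phyloseq(OTU, TAX, meta)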

Basic operations such as tax_glom are used to prepare the data for further analyses. For example, with the following commands a table containing the taxonomic abundance for the families present in each sample is created:
> fcm.family <- tax_glom(fcm,"Family", NArm= FALSE)
> fcm.family.m <- psmelt(fcm.family)
> write.table(x=fcm.family.m, sep = ",", file = "fcm_family.csv")

Expected result
At this point a table with the taxonomic abundance of each family is produced. Example of file:
fcm_family.csv