Taxonomic classification of IonTorrent-sequenced 16S amplicon sequences

Laura Espina

Jul 07, 2020

PLOS One

DOI: dx.doi.org/10.17504/protocols.io.bh8pj9vn

Laura Espina¹

¹Cardiff University

External link: https://doi.org/10.1371/journal.pone.0237748

Protocol Citation: Laura Espina 2020. Taxonomic classification of IonTorrent-sequenced 16S amplicon sequences. protocols.io https://dx.doi.org/10.17504/protocols.io.bh8pj9vn

Manuscript citation:

Espina L (2020) An approach to increase the success rate of cultivation of soil bacteria based on fluorescence-activated cell sorting. PLoS ONE  15(8): e0237748. doi: 10.1371/journal.pone.0237748

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: July 04, 2020

Last Modified: July 07, 2020

Protocol Integer ID: 38895

Disclaimer

This protocol uses the QIIME2 pipeline, v2019.4 (doi: 10.1038/s41587-019-0209-9), and the Naïve Bayes classifier implemented in the q2-feature-classifier plugin (doi: 10.1186/s40168-018-0470-z). The classifier was trained on the Greengenes 13_8 99% OTU database (doi: 10.1038/ismej.2011.139). The R v3.6.1 environment was used with Phyloseq v1.29.0 (doi: 10.1371/journal.pone.0061217).

Pre-processing

Software: Torrent Suite (Thermo Fisher Scientific)
Demultiplex sequences. Quality-filter sequences. Trim barcodes and adapters. Export as fastq files (1 fastq file per sample).

Note
Example of fastq files at this point:
fcm1_1_L001_R2_001.fastq  
fcm2_2_L001_R2_001.fastq  

Import fastq files.
Trim primers.
Remove sequences shorter than 100 bp.
Suggested software: Geneious (Biomatters Ltd), or Python scripts.
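For readers taking the Python route, the primer-trimming and length-filtering steps above can be sketched as follows. This is a minimal sketch, not the script used by the author: it assumes the forward primer quoted in the classifier-training step of this protocol sits at the very start of each read, matches degenerate bases through IUPAC codes, and tolerates no mismatches. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FORWARD_PRIMER = "AGAGTTTGATCMTGGCTCAG"  # forward primer quoted in this protocol
MIN_LENGTH = 100                         # minimum read length stated in this step


def primer_regex(primer):
    """Compile a regex matching the (possibly degenerate) primer at the read start."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def read_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines (assumes 4 lines per record)."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def trim_and_filter(records, primer=FORWARD_PRIMER, min_length=MIN_LENGTH):
    """Remove a leading primer when present, then drop reads shorter than min_length."""
    pattern = primer_regex(primer)
    for header, seq, qual in records:
        match = pattern.match(seq.upper())
        if match:
            # Trim sequence and quality string by the same amount.
            seq, qual = seq[match.end():], qual[match.end():]
        if len(seq) >= min_length:
            yield header, seq, qual
```

A typical call would be `trim_and_filter(read_fastq(open("sample.fastq")))`, writing each surviving record back out as four lines. Reads whose primer contains sequencing errors are left untrimmed by this sketch, so a dedicated tool may be preferable for real data.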

(Optional) Check the Phred quality score of the sequences.
Suggested software: FastQC (Simon Andrews).
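As a quick illustration of what a Phred quality check measures, the sketch below decodes a FASTQ quality string into Phred scores and converts a score into an error probability. It assumes the Phred+33 ASCII encoding, which should be confirmed for the exported files; the function names are hypothetical, and this is no substitute for a full FastQC report.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (offset 33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality, offset=33):
    """Arithmetic mean of the Phred scores of one read (0.0 for an empty string)."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def error_probability(q):
    """Probability that a base call is wrong for Phred score q: 10^(-q/10)."""
    return 10 ** (-q / 10)
```

For example, a Phred score of 20 corresponds to a 1% chance that the base call is wrong, and a score of 30 to 0.1%.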

On Qiime2

To train a Naïve Bayes classifier: Download the Greengenes 13_8 99% OTU database. Upload the files "99_otus.fasta" and "99_otu_taxonomy.txt" to a folder accessible by the Qiime2 environment. With Qiime2 activated and inside the folder containing those files, type in the command line:

>qiime tools import --type 'FeatureData[Sequence]' --input-path 99_otus.fasta --output-path otus.qza

>qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path 99_otu_taxonomy.txt --output-path taxonomy.qza

>qiime feature-classifier extract-reads --i-sequences otus.qza --p-f-primer AGAGTTTGATCMTGGCTCAG --p-r-primer CYNACTGCTGCCTCCCGTAG --o-reads ref-seqs.qza

>qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy taxonomy.qza --o-classifier classifier.qza
Expected result
The artifact "classifier.qza" can be copied to any other folder for the following steps.
Example of classifier trained on the Greengenes 99% OTU database for the V1-V2 region of the 16S rRNA gene: 
classifier.qza  

To perform taxonomy classification: In a new folder, upload the artifact "classifier.qza" and the file with the metadata corresponding to the fastq files in the correct format. Example of file:
metadata.tsv  
Also create a subfolder "fastq" containing only the gzip-compressed fastq files, named in the format expected by the importer (e.g. "fcm1_1_L001_R1_001.fastq.gz" and "fcm2_2_L001_R1_001.fastq.gz"). The classifier artifact stays in the parent folder.
In the command line, type in:

>qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path fastq --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-single-end.qza

>qiime vsearch dereplicate-sequences --i-sequences demux-single-end.qza --o-dereplicated-table dereplicatedtable.qza --o-dereplicated-sequences rep-seqs.qza

>qiime feature-classifier classify-sklearn --i-classifier classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza

>qiime taxa barplot --i-table dereplicatedtable.qza --i-taxonomy taxonomy.qza --m-metadata-file metadata.tsv --o-visualization taxa-bar-plots.qzv
Expected result
The artifact "taxa-bar-plots.qzv" can be interactively visualized in https://view.qiime2.org/
Example of artifact:
taxa-bar-plots.qzv  
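The import command in this step expects a "fastq" subfolder of gzip-compressed files with Casava-style names. One way to prepare that subfolder is sketched below in Python. The file-name pattern is an assumption based on the example names in this protocol (sample ID, barcode, lane, read number, set number), and the function name is hypothetical.

```python
import gzip
import re
import shutil
from pathlib import Path

# Assumed Casava 1.8-style name, e.g. fcm1_1_L001_R1_001.fastq.gz
CASAVA_NAME = re.compile(r"^.+_.+_L[0-9]{3}_R[12]_001\.fastq\.gz$")


def compress_fastq(src_dir, dest_dir):
    """Gzip every *.fastq file in src_dir into dest_dir and return the new file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).glob("*.fastq")):
        target = dest / (src.name + ".gz")
        # Fail early on names the importer is unlikely to accept.
        if not CASAVA_NAME.match(target.name):
            raise ValueError(f"{target.name} does not follow the assumed naming pattern")
        with open(src, "rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target.name)
    return written
```

Calling `compress_fastq("raw", "fastq")` leaves the original files in "raw" (a hypothetical folder name) untouched and writes the compressed copies to "fastq".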

The components of the biom table (the OTU abundance table and the taxonomy table) can be obtained from the "taxa-bar-plots.qzv" artifact or with the following commands:
>qiime tools export --input-path dereplicatedtable.qza --output-path exported-feature-table
>cd exported-feature-table
>biom convert -i feature-table.biom -o otutable.tsv --to-tsv
>cd ..
>qiime tools export --input-path taxonomy.qza --output-path taxonomy

Expected result
The files "taxonomy.tsv" (in the "taxonomy" folder) and "otutable.tsv" (in the "exported-feature-table" folder) should be downloaded, converted into .csv files, and adapted for the analysis with R. An additional file, "metadata.csv", is also necessary.
Example of files:
taxonomy.csv otutable.csv metadata.csv    
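The manual conversion can also be scripted. The sketch below is one possible approach, not the author's procedure. It assumes that the exported "taxonomy.tsv" has the columns "Feature ID", "Taxon" and "Confidence", that taxon strings follow the Greengenes style ("k__Bacteria; p__..."), and that "otutable.tsv" begins with a "# " comment line followed by a "#OTU ID" header. The function names and output column names are hypothetical.

```python
import csv
import io

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def split_taxon(taxon):
    """Split a Greengenes-style taxon string into seven rank values ('' when missing)."""
    parts = [p.strip() for p in taxon.split(";")]
    # Drop rank prefixes such as "k__" or "g__"; leave other labels untouched.
    values = [p[3:] if len(p) > 2 and p[1:3] == "__" else p for p in parts]
    values = values[:len(RANKS)]
    return values + [""] * (len(RANKS) - len(values))


def taxonomy_tsv_to_csv(tsv_text):
    """Convert taxonomy TSV text into CSV text with one column per rank."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["OTU"] + RANKS)
    for row in reader:
        writer.writerow([row["Feature ID"]] + split_taxon(row["Taxon"]))
    return out.getvalue()


def otutable_tsv_to_csv(tsv_text):
    """Convert biom-derived TSV text into CSV text, skipping the comment line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in tsv_text.splitlines():
        if not line or line.startswith("# "):
            continue  # blank line or comment such as "# Constructed from biom file"
        fields = line.split("\t")
        fields[0] = fields[0].lstrip("#")  # "#OTU ID" becomes "OTU ID"
        writer.writerow(fields)
    return out.getvalue()
```

Both functions take and return text, so they can be applied to the contents of the exported files and the results written to "taxonomy.csv" and "otutable.csv". The first column of each output holds the OTU identifiers, which suits reading the files in R with row.names=1.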

Analysis with R

In R, with the phyloseq package loaded, different types of analysis can be performed easily.

The first step is the creation of the phyloseq object using the type of files shown in step 5.5:
 
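> library(phyloseq)
> # Read the abundance and taxonomy tables; the first column holds the OTU IDs
> otu_df <- read.csv("otutable.csv", sep=",", row.names=1)
> tax_df <- read.csv("taxonomy.csv", sep=",", row.names=1)
> # Data frames are named otu_df/tax_df so they do not mask the phyloseq functions
> OTU <- otu_table(as.matrix(otu_df), taxa_are_rows=TRUE)
> tax_matrix <- as.matrix(tax_df)
> TAX <- tax_table(tax_matrix)
> # Sample metadata; row names must match the sample names in the OTU table
> metadata <- read.csv("metadata.csv", sep=",", row.names=1)
> meta <- sample_data(metadata)
> fcm <- phyloseq(OTU, TAX, meta)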

Basic operations such as tax_glom are used to prepare the data for further analyses. For example, with the following commands a table containing the taxonomic abundance for the families present in each sample is created:
> fcm.family <- tax_glom(fcm,"Family", NArm= FALSE)
> fcm.family.m <- psmelt(fcm.family)
> write.table(x=fcm.family.m, sep = ",", file = "fcm_family.csv")

Expected result
At this point a table with the taxonomic abundance of each family is produced. Example of file:
fcm_family.csv  
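To make the idea behind agglomeration at the family level concrete, the Python sketch below sums OTU counts that share the same lineage from kingdom down to family. It is a conceptual illustration only, not phyloseq's implementation: the data structures and function name are hypothetical, and the handling of missing ranks controlled by NArm is not modelled.

```python
from collections import defaultdict


def agglomerate_by_family(counts, taxonomy):
    """Sum per-sample counts of OTUs sharing the same Kingdom-to-Family lineage.

    counts:   {otu_id: {sample: count}}
    taxonomy: {otu_id: (kingdom, phylum, class, order, family, ...)}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for otu, per_sample in counts.items():
        lineage = tuple(taxonomy[otu][:5])  # first five ranks: Kingdom..Family
        for sample, count in per_sample.items():
            totals[lineage][sample] += count
    # Convert nested defaultdicts into plain dicts for the caller.
    return {lineage: dict(samples) for lineage, samples in totals.items()}
```

Two OTUs from different genera of the same family end up in a single row, which is the kind of per-family table written to "fcm_family.csv".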
