Feb 04, 2020

Phylogenetic tree and ancestral sequence reconstruction

  • University of California, Santa Barbara
  • Santoro Lab @ UCSB
Protocol Citation: Matthew Kellom 2020. Phylogenetic tree and ancestral sequence reconstruction. protocols.io https://dx.doi.org/10.17504/protocols.io.6qrhdv6
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: In development
We are still developing and optimizing this protocol
Created: August 21, 2019
Last Modified: February 04, 2020
Protocol Integer ID: 27121
Keywords: ancestral, sequence, ASR, phylogenetic tree
Abstract
Protocol for the alignment, tree building, and ancestral sequence reconstruction of metagenomic protein coding sequences.
Guidelines
This protocol was used specifically for AMT ammonium transporter sequences from the TARA Oceans and HOT/ALOHA datasets.
Materials
Programs used:
DIAMOND
TranslatorX
MAFFT
RAxML
IQ-TREE
MrBayes
FastML (Web)
FigTree
Before start
Make sure all metagenome sequences are in nucleotide FASTA format. Target amino acid sequences should also be in FASTA format.
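A quick format check can catch malformed input before the search step. The sketch below is a minimal, standard-library FASTA parser plus an alphabet check. It is not part of the original protocol, and the accepted character sets (IUPAC nucleotide codes; the 20 amino acids plus X, B, Z, U, O and the stop symbol *) are assumptions you may want to tighten.

```python
from typing import List, Set, Tuple

# Assumption: IUPAC ambiguity codes are acceptable in nucleotide input
NUCLEOTIDE_CHARS = set("ACGTUNRYKMSWBDHV")
# Assumption: 20 standard amino acids plus X, B, Z, U, O and '*' (stop)
PROTEIN_CHARS = set("ACDEFGHIKLMNPQRSTVWYXBZUO*")


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; sequences are upper-cased."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_records(records: List[Tuple[str, str]], alphabet: Set[str]) -> List[str]:
    """Return headers of records containing characters outside `alphabet`."""
    return [h for h, s in records if not set(s) <= alphabet]
```

For example, parse a metagenome file (hypothetical name metagenome.fna) with read_fasta and confirm that invalid_records(records, NUCLEOTIDE_CHARS) returns an empty list before running DIAMOND.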
Obtain protein-coding metagenome FASTA files.
Obtain a set of amino acid sequences of known, trusted homologs of the target protein (UniProt works well for this).
Use DIAMOND to search the metagenome sequences for hits to the target proteins (BLAST blastx also works but is slower on large datasets). We recommend an e-value cutoff of 1e-20 or more stringent (smaller).
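The search and e-value filtering can be scripted. The sketch below builds DIAMOND argument lists (makedb, then blastx with tabular output and the 1e-20 cutoff) and keeps the best-scoring hit per metagenome sequence. It assumes the default 12-column tabular format (--outfmt 6); the file names and the database name "targets" are hypothetical, and the commands are only constructed, not executed.

```python
import csv
import io
from typing import Dict, List

# Default column order of BLAST/DIAMOND tabular output (--outfmt 6)
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def diamond_commands(targets_faa: str, metagenome_fna: str, out_tsv: str,
                     evalue: float = 1e-20) -> List[List[str]]:
    """Argument lists: build the target protein database, then run blastx."""
    return [
        ["diamond", "makedb", "--in", targets_faa, "-d", "targets"],
        ["diamond", "blastx", "-d", "targets", "-q", metagenome_fna,
         "-o", out_tsv, "--outfmt", "6", "--evalue", str(evalue)],
    ]


def best_hits(tsv_text: str, max_evalue: float = 1e-20) -> Dict[str, dict]:
    """Keep the highest-bitscore hit per query among hits with evalue <= max_evalue."""
    best: Dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue  # not significant enough
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best
```

Each argument list can be passed to subprocess.run(cmd, check=True); re-filtering in Python makes it easy to try stricter cutoffs without rerunning the search.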
Translate protein-coding target hits to amino acid sequences with TranslatorX.
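TranslatorX performs this step in the protocol. To illustrate what the translation itself does, the sketch below translates an in-frame nucleotide sequence with the standard genetic code (NCBI translation table 1). It is not a replacement for TranslatorX; mapping ambiguous codons to X and dropping a trailing partial codon are assumptions of this sketch.

```python
from typing import List

# NCBI standard genetic code (table 1); codons enumerated in T, C, A, G order
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(nt: str, frame: int = 0) -> str:
    """Translate from the given frame (0, 1 or 2).

    Codons with ambiguity codes become 'X'; an incomplete final codon is dropped."""
    seq = nt.upper().replace("U", "T")
    protein: List[str] = []
    for pos in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)
```

For example, translate("ATGGCCTAA") gives "MA*", with the stop codon shown as *.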
Use DIAMOND (or BLAST) to annotate hits against a local copy of the NCBI nr database.
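To choose taxa from the nr annotations, it helps to tally the organisms of the top hits. The sketch below assumes you requested subject titles in the tabular output (for example with DIAMOND's stitle output field and --max-target-seqs 1) and that titles follow the usual nr convention of ending in the organism name in square brackets. The helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# nr titles usually end with the organism in brackets,
# e.g. "ammonium transporter [Nitrosopumilus maritimus]"
_ORGANISM = re.compile(r"\[([^\[\]]+)\]\s*$")


def organism_from_title(stitle: str) -> Optional[str]:
    """Return the trailing bracketed organism name, or None if absent."""
    match = _ORGANISM.search(stitle)
    return match.group(1) if match else None


def tally_organisms(rows: Iterable[Tuple[str, str]]) -> Counter:
    """rows: (qseqid, stitle) pairs for each query's top nr hit."""
    counts: Counter = Counter()
    for _, stitle in rows:
        counts[organism_from_title(stitle) or "unassigned"] += 1
    return counts


def queries_for_taxon(rows: Iterable[Tuple[str, str]], taxon: str) -> List[str]:
    """Query IDs whose top-hit organism contains `taxon` (case-insensitive)."""
    wanted = taxon.lower()
    return [q for q, stitle in rows
            if wanted in (organism_from_title(stitle) or "").lower()]
```

The tally gives a quick overview of which lineages are represented before you decide which taxa to keep.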
Choose taxa of interest and, from the metagenome hits, select sequences of appropriate length for the protein.
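The protocol does not give a length threshold, so the sketch below treats "appropriate length" as a window around the median length of the trusted reference proteins. The 0.8 to 1.2 window is a hypothetical default, not a value from the original protocol; adjust it to your protein and to how fragmented your metagenome hits are.

```python
import statistics
from typing import Iterable, List, Tuple


def reference_length(ref_seqs: Iterable[str]) -> float:
    """Median length of trusted reference proteins (e.g. from UniProt)."""
    return statistics.median(len(s) for s in ref_seqs)


def filter_by_length(records: Iterable[Tuple[str, str]], expected_len: float,
                     min_fraction: float = 0.8,
                     max_fraction: float = 1.2) -> List[Tuple[str, str]]:
    """Keep (header, protein) pairs whose length, ignoring stop symbols '*',
    lies within [min_fraction, max_fraction] x expected_len."""
    lo, hi = min_fraction * expected_len, max_fraction * expected_len
    kept = []
    for header, seq in records:
        n = len(seq.replace("*", ""))
        if lo <= n <= hi:
            kept.append((header, seq))
    return kept
```

With references of median length 400, the default window keeps hits of roughly 320 to 480 residues.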
Align the collected sequences, together with one or more closely related, complete outgroup sequences, using MAFFT. Recommended settings: --maxiterate 1000 --localpair
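The alignment step can be wrapped in a small script. In MAFFT, --localpair combined with --maxiterate 1000 is the L-INS-i strategy, and MAFFT writes the alignment to standard output. The sketch assumes mafft is on your PATH; the function names and file names are hypothetical.

```python
import subprocess
from typing import List


def mafft_command(input_fasta: str) -> List[str]:
    """MAFFT arguments with the recommended settings (L-INS-i)."""
    return ["mafft", "--maxiterate", "1000", "--localpair", input_fasta]


def merge_fasta(paths: List[str], merged_path: str) -> None:
    """Concatenate ingroup and outgroup FASTA files into a single MAFFT input."""
    with open(merged_path, "w") as out:
        for path in paths:
            with open(path) as fh:
                text = fh.read()
            out.write(text if text.endswith("\n") else text + "\n")


def run_mafft(input_fasta: str, output_fasta: str) -> None:
    """Run MAFFT and redirect the alignment it prints on stdout into a file."""
    with open(output_fasta, "w") as out:
        subprocess.run(mafft_command(input_fasta), stdout=out, check=True)
```

Usage, with hypothetical file names: merge_fasta(["amt_hits.faa", "outgroup.faa"], "amt_input.faa"), then run_mafft("amt_input.faa", "amt_aln.faa").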
Build a tree with RAxML. Recommended settings: -f a -k -m PROTGAMMAAUTO
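Besides the recommended flags, RAxML's rapid bootstrap plus ML search (-f a) also needs a parsimony seed (-p), a bootstrap seed (-x) and a replicate count (-N), and -o roots the tree on named outgroup sequences. The sketch below builds such a command. The seed, the 100 replicates and the executable name raxmlHPC are assumptions (builds are often named differently, e.g. with -PTHREADS or -SSE3 suffixes), and which alignment formats are accepted depends on your RAxML version.

```python
from typing import List


def raxml_command(alignment: str, run_name: str, outgroup: str = "",
                  bootstraps: int = 100, seed: int = 12345,
                  executable: str = "raxmlHPC") -> List[str]:
    """Rapid bootstrap + ML search (-f a) with automatic protein model selection."""
    cmd = [executable, "-f", "a", "-k", "-m", "PROTGAMMAAUTO",
           "-s", alignment, "-n", run_name,
           "-p", str(seed),         # seed for parsimony starting trees
           "-x", str(seed),         # rapid bootstrap seed (required with -f a)
           "-N", str(bootstraps)]   # number of bootstrap replicates
    if outgroup:
        cmd += ["-o", outgroup]     # comma-separated outgroup sequence name(s)
    return cmd
```

RAxML names its outputs after -n, e.g. RAxML_bestTree.<run_name> and RAxML_bipartitions.<run_name> (best tree with bootstrap support).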

Inspect the tree for plausibility. If a node looks out of place, check the alignment to work out why, and decide whether to keep that sequence in the dataset.
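When a node looks out of place, simple per-sequence statistics can point to the cause. The sketch below reports each aligned sequence's gap fraction and its mean pairwise identity to all other sequences, computed over columns where neither sequence has a gap. This is one possible heuristic, not part of the original protocol: a sequence that is mostly gaps or unusually dissimilar is worth a closer look in the alignment.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def alignment_report(aln: List[Tuple[str, str]]) -> Dict[str, Tuple[float, float]]:
    """Map each name to (gap fraction, mean identity to every other sequence)."""
    identities: Dict[str, List[float]] = {name: [] for name, _ in aln}
    for (n1, s1), (n2, s2) in combinations(aln, 2):
        pid = pairwise_identity(s1, s2)
        identities[n1].append(pid)
        identities[n2].append(pid)
    report = {}
    for name, seq in aln:
        gap_frac = seq.count("-") / len(seq) if seq else 0.0
        vals = identities[name]
        report[name] = (gap_frac, sum(vals) / len(vals) if vals else 0.0)
    return report
```

Sorting the report by mean identity lists the most divergent sequences first.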
Check the amino acid substitution model chosen by RAxML, and assess branch support, with IQ-TREE. Recommended settings: -m TEST -nt AUTO -alrt 10000
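Comparing the two model choices amounts to comparing base substitution matrices: IQ-TREE reports models such as LG+I+G4, where +I+G4 adds invariable sites and a four-category gamma (comparable to rates=invgamma in MrBayes). The sketch below splits such a string and checks it against the matrix name RAxML selected (e.g. LG). The mapping to MrBayes aamodelpr names is a hypothetical helper covering only common matrices; fixed(lg) requires a MrBayes version that includes LG.

```python
from typing import List, Tuple


def split_iqtree_model(model: str) -> Tuple[str, List[str]]:
    """Split an IQ-TREE model string such as 'LG+I+G4' into ('LG', ['I', 'G4'])."""
    parts = [p.strip() for p in model.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty model string")
    return parts[0].upper(), parts[1:]


def models_agree(raxml_matrix: str, iqtree_model: str) -> bool:
    """True if RAxML's selected matrix matches IQ-TREE's base matrix."""
    base, _ = split_iqtree_model(iqtree_model)
    return raxml_matrix.strip().upper() == base


def mrbayes_aamodel(iqtree_model: str) -> str:
    """Hypothetical helper: MrBayes 'prset aamodelpr' value for the base matrix."""
    names = {"LG": "lg", "WAG": "wag", "JTT": "jones", "DAYHOFF": "dayhoff",
             "MTREV": "mtrev", "MTMAM": "mtmam", "RTREV": "rtrev",
             "CPREV": "cprev", "VT": "vt", "BLOSUM62": "blosum"}
    base, _ = split_iqtree_model(iqtree_model)
    if base not in names:
        raise KeyError(f"no fixed MrBayes matrix listed for {base}")
    return f"fixed({names[base]})"
```

If models_agree returns False, it is worth deciding which model to carry forward before running MrBayes and FastML.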
Assess branch support with MrBayes. This requires a NEXUS alignment file and an mbatch (MrBayes batch) file. Recommended mbatch settings:
lset nst=6 rates=invgamma;
prset aamodelpr=fixed(lg);
mcmcp ngen=1000000 nruns=2 nchains=4 samplefreq=100 printfreq=10000 relburnin=yes burninfrac=0.25;
mcmc;
sumt;
sump;
The amino acid model set by prset aamodelpr should match the model selected by RAxML and IQ-TREE (fixed(lg) specifies the LG model).
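Both MrBayes input files can be generated from the aligned sequences. The sketch below writes a NEXUS data block (datatype=protein) and an mbatch file as a "begin mrbayes" block that executes the NEXUS file and then runs the settings listed above. Two parts are assumptions added here: "set autoclose=yes nowarn=yes;" keeps MrBayes from stopping for interactive prompts, and safe_name replaces characters that may break NEXUS parsing (it does not detect name collisions).

```python
import re
from typing import List, Tuple

MB_COMMANDS = [
    "lset nst=6 rates=invgamma;",
    "mcmcp ngen=1000000 nruns=2 nchains=4 samplefreq=100 printfreq=10000 "
    "relburnin=yes burninfrac=0.25;",
    "mcmc;",
    "sumt;",
    "sump;",
]


def safe_name(name: str) -> str:
    """Replace characters outside a conservative set with '_' (assumption)."""
    return re.sub(r"[^A-Za-z0-9_.]", "_", name)


def nexus_data_block(aln: List[Tuple[str, str]]) -> str:
    """NEXUS file text for aligned (name, protein sequence) pairs."""
    lengths = {len(seq) for _, seq in aln}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    lines = ["#NEXUS", "begin data;",
             f"  dimensions ntax={len(aln)} nchar={lengths.pop()};",
             "  format datatype=protein gap=- missing=?;",
             "  matrix"]
    lines += [f"  {safe_name(name)} {seq}" for name, seq in aln]
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


def mrbayes_batch(nexus_path: str, aamodelpr: str = "fixed(lg)") -> str:
    """mbatch text: load the alignment, set the model, run and summarize."""
    body = ["set autoclose=yes nowarn=yes;",  # added: avoid interactive prompts
            f"execute {nexus_path};",
            MB_COMMANDS[0],
            f"prset aamodelpr={aamodelpr};",
            *MB_COMMANDS[1:]]
    return "begin mrbayes;\n" + "".join(f"  {c}\n" for c in body) + "end;\n"
```

Write the two strings to files (e.g. hypothetical amt.nex and amt_mbatch.nex) and run the batch file with the mb executable.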
Submit the MAFFT alignment and the RAxML best tree to the FastML web portal (http://fastml.tau.ac.il/). A local version of FastML is also available, but I found the web portal easier to use. Select Amino Acids and the substitution model chosen during tree building, and use a Gamma distribution with Maximum Likelihood. Deselect Optimize Branch Lengths and joint reconstruction.
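Before uploading, it is worth confirming that every leaf in the tree has a matching sequence in the alignment, since mismatched names are a likely cause of upload failures (an assumption, not a documented FastML requirement in this protocol). The sketch below extracts leaf labels with a simplified Newick reader that handles unquoted labels only, such as those in a RAxML best-tree file, and reports names found in only one of the two inputs.

```python
import re
from typing import List, Set, Tuple

# A leaf label directly follows '(' or ',' and ends at ':', ',', ')' or ';'
_LEAF = re.compile(r"[(,]\s*([^()\[\]:,;]+)")


def newick_leaf_names(newick: str) -> Set[str]:
    """Leaf labels in a Newick string (simplified: no quoted labels)."""
    return {m.strip() for m in _LEAF.findall(newick) if m.strip()}


def check_names(newick: str,
                aln: List[Tuple[str, str]]) -> Tuple[Set[str], Set[str]]:
    """Return (names only in the tree, names only in the alignment)."""
    tree_names = newick_leaf_names(newick)
    aln_names = {name for name, _ in aln}
    return tree_names - aln_names, aln_names - tree_names
```

Two empty sets mean the tree and alignment names correspond one to one.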
Obtain the 25 most probable ancestral sequences at the nodes of interest from the FastML results page and visualize the tree results in FigTree.