Mitogenome (or other Multi-gene) Maximum Likelihood Phylogenetic Analyses

Dakota Betz

Jun 12, 2025

DOI

https://dx.doi.org/10.17504/protocols.io.bp2l613kzvqe/v1

Dakota Betz¹

¹UCSD

Rouse Lab

Sonja Huc

UNCW

Protocol Citation: Dakota Betz 2025. Mitogenome (or other Multi-gene) Maximum Likelihood Phylogenetic Analyses. protocols.io https://dx.doi.org/10.17504/protocols.io.bp2l613kzvqe/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: April 29, 2022

Last Modified: June 12, 2025

Protocol Integer ID: 61648

Keywords: maximum likelihood phylogenetic analysis, maximum likelihood whole mitogenome, whole mitogenome, mitogenome, analyses in RAxML, RAxML, doing maximum likelihood

Abstract

A guide to doing maximum likelihood whole mitogenome (or multigene) analyses in RAxML. This was written after Avery's instructions!

Create one text file in FASTA format for each gene you want to use, containing that gene's sequences from all of your taxa (e.g. Pilargidae_12S.txt, Gastropods_ATP6.txt). Make sure the same organism has exactly the same name in ALL of the gene files, spaces included (there should be none, but sometimes they sneak in).
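If you'd like to double-check the names before opening Mesquite, here is a minimal Python sketch (not part of the original workflow). It assumes plain FASTA files where each header line starts with ">" and the whole header is the taxon name; the function names are made up for illustration. A name reported as missing from a file is either a typo or a gene you genuinely don't have for that taxon, so it is only a prompt to look.

```python
from pathlib import Path


def read_fasta_names(path):
    """Return the sequence names (header lines without the '>') in a FASTA file."""
    names = []
    for line in Path(path).read_text().splitlines():
        if line.startswith(">"):
            names.append(line[1:])
    return names


def check_gene_files(paths):
    """List names that contain whitespace and names that are absent from some files."""
    names_by_file = {str(p): read_fasta_names(p) for p in paths}
    problems = []
    # Flag any name with a space or tab in it (including trailing ones).
    for fname, names in names_by_file.items():
        for name in names:
            if any(ch.isspace() for ch in name):
                problems.append(f"{fname}: whitespace in name {name!r}")
    # Flag names that appear in at least one file but not in this one.
    all_names = set().union(*names_by_file.values())
    for fname, names in names_by_file.items():
        for missing in sorted(all_names - set(names)):
            problems.append(f"{fname}: no sequence named {missing!r}")
    return problems
```

For example, `check_gene_files(["Gastropods_12S.txt", "Gastropods_ATP6.txt"])` returns an empty list when every name matches across both files.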

Open Mesquite, create a new file, make sure to check “Make Character Matrix” and then select “DNA Data” as the type of data. Drag any one of your gene text files into it and import as FASTA, click “Don’t Adjust”. Delete the top empty taxa that were there by default.

For non-protein-coding genes (12S, 16S), or if you're doing a nucleotide-only analysis, align the sequences using MUSCLE: click on “Matrix”, “Align Multiple Sequences”, then “MUSCLE Align”. Select “No” for running on multiple threads, then click “OK”. Check that the alignment looks reasonable, then export it in Phylip (DNA/RNA) format. Repeat this for all non-protein-coding genes, or for all genes in a nucleotide-only analysis.

If you're doing an amino-acid-only analysis, or a mixed analysis (nucleotides for non-protein-coding genes and amino acids for protein-coding genes), follow these next steps for each protein-coding gene:

Create a nexus file as before, but DO NOT align it yet. Save it, then duplicate it. Name the duplicate something like “Gastropods_ATP6_NT.nex”. The file you have open can stay “Gastropods_ATP6.nex” or become something like “Gastropods_ATP6_AA.nex”, as long as you know which file will be nucleotide-only and which will be protein-only.

Click on “Character Matrix” on the left sidebar and select “List & Manage Characters”.

In the toolbar, select “Columns”, then “Current Genetic Codes”.

Click on the “Genetic Code” tab in your file window, then under “Genetic Code” choose Invertebrate Mitochondrial (or whatever applies, but in our lab it should mostly be that one). Make sure you actually selected the column, so the values all change from displaying “Standard” to “Invertebrate Mitochondrial”.

Click on “Codon Position”, select “Minimize Stop Codons”. When you do this, MAKE SURE that the column is selected! The values should change from “N” to numbers 1-3, depending on the position.

Your window should look like this:

In the toolbar, go to “Characters”, select “Make New Matrix From”, and then “Translate DNA to Protein”. Click “OK”. This should open a new tab with your data displayed as amino acids. You should only have stop codons (black with asterisks) at the ends of your sequences if they are complete, but sometimes if the sequence doesn’t start on the first codon, there may be more stop codons throughout. This only means this one sequence needs to be removed, not that all of the others are wrong, too. They’re fine!
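To see why the reading frame matters, here is a small, self-contained Python sketch of the idea behind picking codon positions that minimize stop codons. It is an illustration only and is not how Mesquite implements the feature. It uses NCBI translation table 5 (invertebrate mitochondrial), and the function names are made up for this example.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial).
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ... (first base changes slowest).
TABLE_5 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), TABLE_5)}


def translate(seq, offset=0):
    """Translate seq starting at offset; codons with gaps or ambiguity become 'X'."""
    seq = seq.upper().replace("U", "T")
    aas = []
    for i in range(offset, len(seq) - 2, 3):
        aas.append(CODON_TO_AA.get(seq[i:i + 3], "X"))
    return "".join(aas)


def best_frame(seq):
    """Return (offset, n_stops) for the frame with the fewest stop codons.

    Ties go to the smaller offset.
    """
    counts = [(translate(seq, off).count("*"), off) for off in range(3)]
    n_stops, offset = min(counts)
    return offset, n_stops
```

In this table TGA codes for tryptophan and AGA/AGG for serine, which is why translating with the standard code instead would show extra stop codons in the middle of perfectly good sequences.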

It’s a good idea to save your file here, in case you run into issues in this next step, because Mesquite doesn’t have a useful undo feature. Go to the end of the sequences. Delete any empty columns, and then delete all of the stop codons, because they interfere with the RAxML analysis. You should be able to delete the stop codons by selecting them with the “Select” tool and deleting them, but if that doesn’t work for some reason, you can use the “Edit” tool, click on each stop codon, and type “-” instead of “*”; the end result is the same. Make sure you get ALL of them, even those from shorter sequences that may be hiding closer to the beginning of the alignment.

You go from this:

To this:

You want one file with just the nucleotides and one file with just the amino acids, so you don't confuse RAxML. Because you duplicated this file in the beginning and already have a nucleotide version, go ahead and delete the character matrix from the sidebar by clicking on it, then selecting “Delete Matrix”. You should be left with just the “Protein Translation of Character Matrix” tab.

Now align the sequences. Click on “Matrix”, “Align Multiple Sequences”, then “MUSCLE Align”. Select “No” for running on multiple threads, then click “OK”. Check that the alignment looks reasonable (depending on how closely related your organisms are, it might look nicer or not as nice). Then delete any empty columns at the end of the alignment. Save the file.

Export as a Phylip (protein) file. MAKE SURE it says “protein” in parentheses next to it! You can name it something like “Gastropods_ATP6_AA.phy”. Change the number for “Maximum length of taxon names” so that you will get the whole name of the sequence in the phylip file. I usually put something large like 100 or 200 just to be safe.

You might get an error saying “Sorry, this data matrix can't be exported to this format (some character states aren't represented by a single symbol [char. 91, taxon 24])”. In that case, find the character and change the value (which will be something like “W/M/V/C”) to a “?” using the “Edit” tool. Make sure to save the file again before retrying to export.
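Before loading the exported file anywhere else, a quick sanity check can catch leftover stop codons or truncated names. The sketch below is a hypothetical helper, not part of Mesquite or RAxML. It assumes the Phylip file is sequential, with a header line giving the number of taxa and characters and then one row per taxon in the form name, whitespace, sequence. If your export is interleaved, it will not apply as written.

```python
def check_phylip(text, forbidden="*"):
    """Return a list of problems found in a sequential, one-row-per-taxon Phylip file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    rows = lines[1:]
    problems = []
    if len(rows) != ntax:
        problems.append(f"header says {ntax} taxa, found {len(rows)} rows")
    seen = set()
    for row in rows:
        parts = row.split(None, 1)
        if len(parts) != 2:
            problems.append(f"cannot split name from sequence: {row[:30]!r}")
            continue
        name, seq = parts[0], "".join(parts[1].split())
        # Duplicate names are a typical sign that names were cut off on export.
        if name in seen:
            problems.append(f"duplicate taxon name {name!r}")
        seen.add(name)
        if len(seq) != nchar:
            problems.append(f"{name}: length {len(seq)}, header says {nchar}")
        for ch in forbidden:
            if ch in seq:
                problems.append(f"{name}: contains {seq.count(ch)} x {ch!r}")
    return problems
```

Something like `check_phylip(open("Gastropods_ATP6_AA.phy").read())` should return an empty list for a clean export.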

Repeat this for every protein-coding gene you want to use amino acids for.

Open raxmlGUI and load all of your alignments: the two non-protein-coding gene nucleotide Phylip files and the protein-coding gene amino acid Phylip files (or whatever applies for your analysis). Drag the first one in, then add the rest with the “ADD ALIGNMENT” feature.

Note
You can opt to do this part on the lab computer remotely, if you want to spare your computer! I will either do that, or have the analysis run overnight on my laptop. Depending on the capacity of your computer, this might take a really long time; it only takes a few hours on the lab computer.

Select “RUN MODELTEST” for all of them (each gene you've added should have its own "RUN MODELTEST" button). Make sure you have 15 genes if you’re doing a mitogenome analysis, and make sure you have actually run the model test for every one of them.
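For orientation only: each alignment you load is treated as its own partition with its own model, and the program handles that bookkeeping for you. The sketch below just illustrates how per-gene column counts turn into column ranges written in the style of a RAxML partition file ("model, name = start-end"). The gene lengths and model names in the usage line are placeholders, not ModelTest output, and the function name is made up.

```python
def partition_lines(genes):
    """Build partition-style lines from (name, model, n_columns) tuples.

    The tuples must be in the order the alignments are joined together.
    """
    lines, start = [], 1
    for name, model, n_columns in genes:
        end = start + n_columns - 1
        lines.append(f"{model}, {name} = {start}-{end}")
        start = end + 1
    return lines
```

With the placeholder input `[("12S", "DNA", 800), ("ATP6", "WAG", 220)]` this gives the two lines `DNA, 12S = 1-800` and `WAG, ATP6 = 801-1020`.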

Set the analysis to “ML + thorough bootstrap + consensus”, set Runs to 10, and keep Reps. at 100 (optionally, you can increase both numbers). Leave "<none>" as the outgroup and keep whatever seed RAxML decided on.

Run the analysis (the button is in the bottom right corner of the top right window; sometimes you need to expand the program window to find it).
I usually open a YouTube window and play a long timer video (e.g. 10 h) so my computer doesn't go to sleep while it's computing.

Once the analysis is done, find the output file that ends in ".support" and change the suffix to ".tre", so you can open the file in FigTree.
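Renaming by hand works fine. If you have several runs, a small Python sketch like this one can do it for a whole output folder. It copies rather than renames, so the original ".support" file is left untouched; the function name is made up for illustration.

```python
import shutil
from pathlib import Path


def copy_support_as_tre(folder):
    """Copy every '*.support' file in folder to a sibling file ending in '.tre'."""
    made = []
    for src in sorted(Path(folder).glob("*.support")):
        dst = src.with_suffix(".tre")  # only the final '.support' is swapped
        shutil.copyfile(src, dst)
        made.append(dst)
    return made
```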

In FigTree, to display the bootstrap support values on the tree, tick the "Node Labels" box and choose "labels" in the "Display" drop-down menu.