Download and install all of the software listed above according to each tool's manual. The notes below may help with some of the installation steps. (Note: I did everything on a remote Linux server, not a local macOS or Windows machine, so all instructions are written for Linux.)
______________________________________________________________________________________________________________
diamond: this installation is simple; follow the instructions in the README on the GitHub page. Make sure the `diamond` executable is in your PATH.
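As a rough sketch of the PATH step, assuming you copied the `diamond` binary into `$HOME/bin` (a hypothetical location; use whatever directory you actually chose), the following adds that directory to PATH for the current shell and reports whether `diamond` can be found. To make the change permanent, put the `export` line in your `~/.bashrc`.

```shell
# Assumption: the diamond binary was copied to $HOME/bin (hypothetical location).
DIAMOND_DIR="$HOME/bin"

# Prepend the directory to PATH only if it is not already there.
case ":$PATH:" in
    *":$DIAMOND_DIR:"*) ;;
    *) export PATH="$DIAMOND_DIR:$PATH" ;;
esac

# Report whether the shell can now find the executable.
if command -v diamond >/dev/null 2>&1; then
    echo "diamond found at: $(command -v diamond)"
else
    echo "diamond not found; check that the binary is in $DIAMOND_DIR"
fi
```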
______________________________________________________________________________________________________________
sortmerna: use the GitHub release, as building from source is a major pain, and follow the README instructions. Once it is installed, you need to generate the indices before you run it. There are 6 databases that need to be indexed (small and large rRNA subunits for bacteria, archaea, and eukarya), so the command below has to be repeated for each database. Make sure the `indexdb_rna` executable is in your PATH. I placed my `sortmerna` directory in my home directory. Then run the following (note: everything inside of `< >` depends on your particular files; when entering the command, do not type the `< >` symbols):
$ indexdb_rna --ref ~/sortmerna-<version>/rRNA_databases/silva-arc-16S-id95.fasta,/home/<username>/sortmerna-<version>/index/silva-arc-16S-id95 -v
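To avoid typing the command six times, a loop along these lines may help. It is only a sketch under a few assumptions: `SORTMERNA_DIR` is a variable introduced here for illustration and should point at your unpacked `sortmerna-<version>` directory, the six databases are assumed to be the files matching `silva-*.fasta` inside `rRNA_databases`, and `ref_arg` is a hypothetical helper. It uses `$HOME` rather than `~` because the shell does not expand `~` after the comma.

```shell
# Assumption: set SORTMERNA_DIR to your unpacked sortmerna-<version> directory.
SORTMERNA_DIR="${SORTMERNA_DIR:-$HOME/sortmerna}"

# Hypothetical helper: build the "<fasta>,<index prefix>" value for --ref.
ref_arg() {
    db_fasta="$1"
    db_name="$(basename "$db_fasta" .fasta)"
    printf '%s,%s/index/%s\n' "$db_fasta" "$SORTMERNA_DIR" "$db_name"
}

# Index every database that matches the assumed naming pattern.
for fasta in "$SORTMERNA_DIR"/rRNA_databases/silva-*.fasta; do
    [ -e "$fasta" ] || continue   # nothing matched: skip quietly
    indexdb_rna --ref "$(ref_arg "$fasta")" -v
done
```

Check the file names in your own `rRNA_databases` directory first; if they differ from the pattern above, adjust the glob.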
______________________________________________________________________________________________________________
RefSeq database: download the protein FASTA files from NCBI:
$ wget ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/*.faa.gz
Note: this database is HUGE, so make sure you have the available space. You will then need to pull all of the files together:
$ cat *.faa.gz > refseq.faa.gz
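Concatenating with `cat` works because the gzip format allows several compressed members in one file, and decompressing yields the members back to back. The small sketch below demonstrates this on two made-up sequences in a temporary directory (all file names and sequences here are hypothetical). If you ever rerun the real command, remove the old `refseq.faa.gz` first, since it also matches `*.faa.gz`.

```shell
# Work in a throwaway directory so no real data is touched.
tmp="$(mktemp -d)"

# Two tiny gzip-compressed FASTA files (made-up sequences).
printf '>seq_a\nMKT\n' | gzip > "$tmp/1.faa.gz"
printf '>seq_b\nMVL\n' | gzip > "$tmp/2.faa.gz"

# Concatenate the compressed files directly, as in the command above.
cat "$tmp/1.faa.gz" "$tmp/2.faa.gz" > "$tmp/combined.gz"

# gzip -t checks integrity; counting '>' lines counts the sequences.
gzip -t "$tmp/combined.gz" && echo "archive OK"
n_seqs="$(gzip -dc "$tmp/combined.gz" | grep -c '^>')"
echo "sequences: $n_seqs"

rm -rf "$tmp"
```

Running `gzip -t refseq.faa.gz` on the real file is a cheap way to catch a truncated download before starting the long `makedb` step.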
Then, you will need to prepare the database for `diamond`:
$ nohup nice diamond makedb --in refseq.faa.gz --db refseq &
______________________________________________________________________________________________________________
SEED Subsystems database: download the subsystem files:
$ wget ftp://ftp.theseed.org/subsystems/subsys*
Then, unzip the `.gz` files using:
$ gunzip <filename>.gz
Next, run the following:
$ python2.7 subsys_db_rebuilder.py subsystems.complex subsystems2role subsystems2pg subsys.txt
This will produce a `subsystems.complex.merged` file. Note: this file will be used further down the line, so keep it somewhere you will remember. Next, run the following command:
$ sed 's/\t/ /g' subsystems.complex.merged > notabs.subsystems.complex.merged
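To see what this step does, here is a small sketch on a made-up line (the sample text is hypothetical, not taken from the real file). It uses `tr`, which performs the same tab-to-space replacement as the `sed` command above.

```shell
# A made-up, tab-separated line standing in for one line of the merged file.
sample="$(printf 'role A\tsubsystem B\tcategory C')"

# Replace every tab with a single space.
converted="$(printf '%s\n' "$sample" | tr '\t' ' ')"
echo "$converted"
```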
Finally, run the following, which will generate a `reduced` file:
$ python2.7 duplicate_counter.py notabs.subsystems.complex.merged
Now, you will generate the `diamond` compatible database:
$ nohup nice diamond makedb --in notabs.subsystems.complex.merged.reduced --db seed_subsystems &
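`diamond makedb` writes its output to `<name>.dmnd`, so once the `makedb` runs have finished, a quick check like the following can confirm that both database files exist and are not empty. This is a minimal sketch; `check_db` is a hypothetical helper, and it assumes you run it from the directory where the databases were built.

```shell
# Hypothetical helper: report whether <name>.dmnd exists and is non-empty.
check_db() {
    if [ -s "$1.dmnd" ]; then
        echo "ok: $1.dmnd"
    else
        echo "missing or empty: $1.dmnd"
    fi
}

check_db refseq
check_db seed_subsystems
```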
______________________________________________________________________________________________________________
Trimmomatic: go to the Trimmomatic page and, under the `Downloads` section, choose the `binary` link. Make sure you have an up-to-date version of Java.
______________________________________________________________________________________________________________
BBMerge: download the BBTools package, which contains the `bbmerge.sh` script. Place it in your PATH.
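Once everything is installed, a short check like this one can confirm that the command-line tools mentioned in these notes are reachable from PATH. It is only a sketch: `check_tools` is a hypothetical helper, and `java` is checked in place of Trimmomatic because Trimmomatic is run through Java rather than as its own command.

```shell
# Hypothetical helper: print found/missing for each tool name given.
# Returns the number of missing tools (0 means everything was found).
check_tools() {
    n_missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            n_missing=$((n_missing + 1))
        fi
    done
    return "$n_missing"
}

check_tools diamond indexdb_rna bbmerge.sh java || echo "Some tools are not on PATH yet."
```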
______________________________________________________________________________________________________________