Locate assembled contigs that belong to the target virus genome
Using BLAST, we will find assembled contigs that might belong to the target virus from Fig. 2.
i) BLAST all metagenomic contigs against the virus consensus and pick the matching contigs.
We will only pick contigs whose BLAST match to the consensus is at least 150 bp long. Copy and paste the following shell script into a file named Phishing.sh
Make it executable with: chmod +x Phishing.sh
Then run it with: ./Phishing.sh Consensus.fasta Assembled_query.fasta (replace these with your filenames)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#!/bin/bash
#Blasts the de novo assembly (Assembled_query.fasta) against the virus genome (Consensus.fasta), then picks all contigs with a match >=150 bp as fasta records (E-value cut-off 1e-5).
#Usage ./Phishing.sh <Consensus.fasta> <Assembled_query.fasta>
makeblastdb -dbtype nucl -in "$1"
blastn -query "$2" -db "$1" -evalue 0.00001 -num_threads 4 -outfmt 6 -out "${2}_blastn_${1}"
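#Generate results list for matches (column 4 of the tabular output is the alignment length)
awk ' $4>=150 {print $1"@"}' "${2}_blastn_${1}" >"${2}_blastn_${1}_phishing_list"
#Pick matching fasta entries from Assembled_query.fasta
#Note: a contig is only picked if its name is the last word on its header line
tr '\n' '@' <"$2" | sed 's/>/\n>/g'| grep -F -f "${2}_blastn_${1}_phishing_list"| tr '@' '\n' >"${2}_blastn_${1}_phishes"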
#Remove ambiguous bases (N's) from the sequence lines of matching contigs as they hinder future alignment (header lines are left untouched)
sed '/^>/!s/N//g' "${2}_blastn_${1}_phishes" >"${2}_blastn_${1}_phishes_NoN.fasta"
#*phishes_NoN.fasta is the file of matching fasta records
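---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The length filter and the picking step can be tried on toy data without running BLAST. The sketch below is only an illustration and not part of the protocol: the file names demo_blastn.tsv, demo_contigs.fasta and demo_list, the contig names and every value in the table are made up. It assumes the default column order of -outfmt 6, in which column 1 is the query (contig) name and column 4 is the alignment length, and it picks contigs by exact name with awk, which is a different technique from the tr/sed/grep pipeline used in the script.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)

# Toy BLAST table in -outfmt 6 layout (all values are made up for illustration).
# Columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
printf 'contig_1\tvirus\t98.1\t420\t8\t0\t1\t420\t100\t519\t1e-150\t700\n' >"$workdir/demo_blastn.tsv"
printf 'contig_2\tvirus\t95.0\t80\t4\t0\t1\t80\t900\t979\t1e-20\t120\n' >>"$workdir/demo_blastn.tsv"
printf 'contig_10\tvirus\t97.3\t150\t4\t0\t5\t154\t2000\t2149\t1e-60\t250\n' >>"$workdir/demo_blastn.tsv"

# Toy contig file (hypothetical names and sequences).
printf '>contig_1 len=8\nACGTNACG\n>contig_2 len=4\nTTTT\n>contig_10 len=6\nGGNNCC\n' >"$workdir/demo_contigs.fasta"

# Keep the name of every contig with an alignment of at least 150 bp, once each.
awk -F'\t' '$4 >= 150 {print $1}' "$workdir/demo_blastn.tsv" | sort -u >"$workdir/demo_list"

# Print each fasta record whose name (first word after ">") is in the list.
picked=$(awk 'NR == FNR {keep[$1]; next}
              /^>/ {name = substr($1, 2); show = (name in keep)}
              show' "$workdir/demo_list" "$workdir/demo_contigs.fasta")
printf '%s\n' "$picked"
```

Here contig_2 is dropped because its only match is 80 bp long, while contig_1 and contig_10 are kept. Because the name is compared as a whole word, contig_1 does not also pull in contig_10, and header lines that carry extra text after the name are still matched.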