Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses

Bonnie Hurwitz; Ken Youens-Clark

Feb 03, 2016


DOI

https://dx.doi.org/10.17504/protocols.io.efgbbjw

Bonnie Hurwitz¹,
Ken Youens-Clark¹

¹University of Arizona

VERVE Net
Hurwitz Lab

Ken Youens-Clark

University of Arizona


External link: http://www.pnas.org/content/111/29/10714.full

Protocol Citation: Bonnie Hurwitz, Ken Youens-Clark 2016. Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses. protocols.io https://dx.doi.org/10.17504/protocols.io.efgbbjw

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Protocol status: Working

Created: January 13, 2016

Last Modified: March 27, 2018

Protocol Integer ID: 2248

Keywords: ecological drivers in marine viral community, marine viral community, questions in marine viral ecology, marine viral ecology, occurring viral diversity, viral metagenome, viral community, pacific ocean virome, comparative metagenomic, using comparative metagenomic, metagenomic, free strategy for comparative metagenomic, viromes from diverse site, visualization of complex sample network, metagenome, ecology, community structure, complex sample network, social network analysis, network analysis, virome, fundamental ecological question, modeling ecological driver, diverse site, ecological factor, driving community structure, pacific ocean

Abstract

Long-standing questions in marine viral ecology center on understanding how viral assemblages change along gradients in space and time. However, investigating these fundamental ecological questions has been challenging because single-gene or morphology-based studies capture only part of naturally occurring viral diversity, and up to 90% of reads in viral metagenomes (viromes) cannot be identified. In this protocol, I describe how to use an annotation- and assembly-free strategy for comparative metagenomics that combines shared k-mer analysis with social network analysis (regression modeling). This robust statistical framework enables visualization of complex sample networks and identification of the ecological factors driving community structure. This tutorial reproduces work on the Pacific Ocean Virome (POV), which comprises 32 viromes from diverse sites in the Pacific Ocean.

"Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses." PNAS, vol. 111, no. 29, pp. 10714-10719, July 22, 2014 (first published July 7, 2014). doi: 10.1073/pnas.1319778111

Code is freely available on GitHub.
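The shared k-mer idea can be illustrated with a small Python sketch. This is not the Fizkin code on GitHub. It assumes (our assumption, not a statement of Fizkin's internals) that a read from sample A counts as shared with sample B when the most frequent count of its k-mers in B's k-mer index is at least mode_min, a hypothetical stand-in for the app's "Mode minimum" option, and that the similarity of A to B is the fraction of A's reads that are shared. All function names are hypothetical.

```python
from collections import Counter
from statistics import multimode


def kmers(seq, k):
    """Yield every overlapping k-mer in a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reads, k):
    """Count how often each k-mer occurs across all reads of one sample."""
    index = Counter()
    for read in reads:
        index.update(kmers(read, k))
    return index


def read_is_shared(read, index, k, mode_min=1):
    """Assumed rule: a read is shared if the mode of its k-mer counts in the
    other sample's index is >= mode_min (ties resolved to the smallest mode)."""
    counts = [index.get(km, 0) for km in kmers(read, k)]
    if not counts:  # read is shorter than k
        return False
    return min(multimode(counts)) >= mode_min


def shared_fraction(reads_a, reads_b, k=20, mode_min=1):
    """Fraction of reads in sample A that are shared with sample B."""
    if not reads_a:
        return 0.0
    index_b = build_index(reads_b, k)
    shared = sum(read_is_shared(r, index_b, k, mode_min) for r in reads_a)
    return shared / len(reads_a)
```

For example, with a = ["ACGTACGTAC", "TTTTGGGGCC"] and b = ["ACGTACGTAC", "AAAACCCCGG"], shared_fraction(a, b, k=4) returns 0.5: every 4-mer of the first read occurs in b, while none of the second read's do. Raising mode_min to 3 makes the rule stricter and the result drops to 0.0. Because the measure is directional, comparing A to B and B to A can give different values.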


Log in to the CyVerse (formerly iPlant Collaborative) Discovery Environment (http://www.cyverse.org/, http://de.iplantcollaborative.org).

Upload FASTA-formatted sequence files and a tab-delimited metadata file. Example data can be found in the Data Store at "/iplant/home/shared/imicrobe/fizkin/pov." To view the example data in the Discovery Environment:
Click on the "Data" button in the DE
Go to the "Community Data -> imicrobe -> fizkin -> pov -> fasta" directory
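Before uploading, it can help to know how many sequences each FASTA file holds, since samples above the "Max. num. sequences" limit are randomly subsampled. The sketch below uses only the Python standard library; the accepted file extensions and the function names are assumptions for illustration.

```python
from pathlib import Path

# Assumed FASTA file extensions; adjust to match your files.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def count_records(lines):
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_records(path):
    """Count the records in one FASTA file."""
    with open(path) as handle:
        return count_records(handle)


def summarize_directory(directory):
    """Map each FASTA file name in a directory to its record count."""
    return {
        p.name: count_fasta_records(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
    }
```

Calling summarize_directory("fasta") on a local copy of your sequence directory (the directory name is hypothetical) returns a dictionary of file names and sequence counts.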


A sample metadata file is also included ("meta.tab"). The metadata file must have a 'name' column identifying each sequence file; the remaining field names should end in '.d' for discrete values (e.g., 'Male' or 'Female'), '.c' for continuous data (e.g., numbers in a range), or '.ll' for latitude/longitude data. Field names should not include underscores, with the exception of 'lat_long.ll'.
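The header rules above can be checked locally before launching a job. This is a minimal sketch, not part of Fizkin; it assumes the identifier column is literally named "name" and that every other column must follow the suffix and underscore rules. The function names are hypothetical.

```python
import csv

VALID_SUFFIXES = (".d", ".c", ".ll")


def check_metadata_header(fields):
    """Return a list of problems found in a metadata header row."""
    problems = []
    if "name" not in fields:
        problems.append("missing 'name' column")
    for field in fields:
        if field == "name":
            continue
        if not field.endswith(VALID_SUFFIXES):
            problems.append(f"{field!r}: must end in .d, .c, or .ll")
        if "_" in field and field != "lat_long.ll":
            problems.append(f"{field!r}: underscores are only allowed in 'lat_long.ll'")
    return problems


def check_metadata_file(path):
    """Read the header row of a tab-delimited metadata file and check it."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return check_metadata_header(header)
```

For a header such as name, depth.c, season.d, lat_long.ll, the check returns an empty list; a column named water_temp.c would be reported because of its underscore.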

Here is an example table:

Select the "Apps" button on the left, then look under "Public Apps -> Experimental -> iMicrobe -> Fizkin."  Open the "Required Args" section and select your FASTA directory as the "Input directory."  You can leave "Output directory" alone or change it if you wish.  Use the file selector to find your "Metadata file" described in step 2.



Optional args:
K-mer size: Default is 20.  Values between 16 and 31 are best.
Mode minimum: Default is 1.  Increase to require more stringent matching.
Max. num. sequences: Default is 300K. Use a lower value to reduce runtime or a higher value to get deeper coverage. Samples containing more sequences than this value will be randomly subsampled.
Max. num. samples: Default is 15. Keep in mind that Fizkin runs a pairwise analysis, so runtime is O(n^2) in the number of samples. If you provide more samples than this value, a random subset of this size will be selected.
Files list: The subset of files you wish to run, one file name per line.
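The two size limits and the quadratic cost can be sketched as follows. This only mirrors the descriptions above and is not Fizkin's code; in particular, enumerating ordered (query, target) pairs is our assumption, made because a shared-read measure can differ by direction. The function names are hypothetical.

```python
import random


def choose_samples(sample_names, max_samples=15, seed=None):
    """Keep all samples if there are at most max_samples;
    otherwise pick a random subset of that size."""
    names = sorted(sample_names)
    if len(names) <= max_samples:
        return names
    return sorted(random.Random(seed).sample(names, max_samples))


def subsample_reads(reads, max_seqs=300_000, seed=None):
    """Randomly keep at most max_seqs reads from one sample."""
    reads = list(reads)
    if len(reads) <= max_seqs:
        return reads
    return random.Random(seed).sample(reads, max_seqs)


def pairwise_comparisons(samples):
    """Assumed ordered (query, target) pairs: each sample is compared with
    every other sample, giving n * (n - 1) comparisons, i.e. O(n^2)."""
    return [(a, b) for a in samples for b in samples if a != b]
```

Under this assumption, the default of 15 samples gives 15 x 14 = 210 comparisons, while all 32 POV viromes would give 32 x 31 = 992, which is why lowering "Max. num. samples" shortens runtime so sharply.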

Press "Launch analysis" and wait for notification of the completion of your job.

A common failure is an error like the following from R (GBME):

Error in summary(fit1)$cov.unscaled[(2 * n):length(fit1$coef), (2 * n):length(fit1$coef)] : 
 subscript out of bounds
Calls: gbme -> gbme.glmstart -> as.matrix
Execution halted

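This is usually because the metadata are too homogeneous (e.g., a field with the same value for every sample) or entirely heterogeneous (e.g., a field with a different value for every sample). Remove the offending metadata fields and try again.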
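Such fields can be screened for before resubmitting. The sketch below is a heuristic of our own, not a check Fizkin performs: it flags any field whose values are identical for all samples, and any discrete ('.d') field whose values are all different. Continuous and latitude/longitude fields are exempt from the all-different test because unique numeric values are expected there. The function names are hypothetical.

```python
import csv


def flag_uninformative_fields(rows):
    """Given metadata rows as dicts, return {field: reason} for fields that
    look too homogeneous or entirely heterogeneous."""
    flagged = {}
    if not rows:
        return flagged
    for field in rows[0]:
        if field == "name":
            continue
        values = [row[field] for row in rows]
        distinct = len(set(values))
        if distinct == 1:
            flagged[field] = "all samples share one value"
        elif field.endswith(".d") and distinct == len(values):
            flagged[field] = "every sample has a different value"
    return flagged


def flag_metadata_file(path):
    """Read a tab-delimited metadata file and flag suspect fields."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    return flag_uninformative_fields(rows)
```

Fields reported by this check are candidates for removal; whether a given field actually triggers the GBME error is not guaranteed.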

The ultimate result should be a social network graph showing the grouping of samples similar to this:
