Sep 21, 2020

nf-100GMX-variant-summarizer

  • Instituto Nacional de Medicina Genómica (INMEGEN)
  • Whole genome variation in 27 Mexican indigenous populations, demographic and biomedical insights
Protocol Citation: Israel Aguilar Ordoñez 2020. nf-100GMX-variant-summarizer. protocols.io https://dx.doi.org/10.17504/protocols.io.bkv6kw9e
Manuscript citation:
Aguilar-Ordoñez I, Pérez-Villatoro F, García-Ortiz H, Barajas-Olmos F, Ballesteros-Villascán J, González-Buenfil R, Fresno C, Garcíarrubio A, Fernández-López JC, Tovar H, Hernández-Lemus E, Orozco L, Soberón X, Morett E (2021) Whole genome variation in 27 Mexican indigenous populations, demographic and biomedical insights. PLoS ONE 16(4): e0249773. doi: 10.1371/journal.pone.0249773
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: September 05, 2020
Last Modified: September 21, 2020
Protocol Integer ID: 41630
Abstract
Nextflow pipeline used to count variants for the 100GMX project
'nf-100GMX-variant-summarizer' is a pipeline tool that counts variants in a VCF file annotated with VEPextended.
This pipeline generates 3 outputs:
1) a TSV file with the total number of SNV and indels
2) a TSV file with per sample counts for variants of type SNV, indel, novel, worldwide singletons, clinvar, gwascat and pharmgkb
3) a PDF file with the number of discernible variants in sample groups of interest.

Important note: the input file must first be annotated with nf-VEPextended (https://github.com/Iaguilaror/nf-VEPextended).
Guidelines

Installation

Download nf-100GMX-variant-summarizer from the GitHub repository:
git clone https://github.com/Iaguilaror/nf-100GMX-variant-summarizer

Compatible OS*:

* nf-100GMX-variant-summarizer may run on other UNIX-based operating systems and versions, but testing is required.

Software Requirements:
  • bcftools
  • htslib
  • filter_vep
  • Nextflow
  • Plan9
  • R

Materials

Pipeline Inputs

  • A VCF file previously annotated with nf-VEPextended.

Example line(s):
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
chr21 5101724 . G A . PASS AC=1;AF=0.00641;AN=152;DP=903;ANN=A|intron_variant|MODIFIER|GATD3B|ENSG00000280071|Transcript|ENST00000624810.3|protein_coding||4/5|ENST00000624810.3:c.357+19987C>T|||||||||-1|cds_start_NF&cds_end_NF|SNV|HGNC|HGNC:53816||5|||ENSP00000485439||A0A096LP73|UPI0004F23660|||||||chr21:g.5101724G>A||||||||||||||||||||||||||||2.079|0.034663||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
chr21 5102165 rs1373489291 G T . PASS AC=1;AF=0.00641;AN=140;DP=853;ANN=T|intron_variant|MODIFIER|GATD3B|ENSG00000280071|Transcript|ENST00000624810.3|protein_coding||4/5|ENST00000624810.3:c.357+19546C>A|||||||rs1373489291||-1|cds_start_NF&cds_end_NF|SNV|HGNC|HGNC:53816||5|||ENSP00000485439||A0A096LP73|UPI0004F23660|||||||chr21:g.5102165G>T||||||||||||||||||||||||||||5.009|0.275409||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
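
To make the record layout above concrete, here is a minimal Python sketch that splits one VCF data line into its eight fixed columns and turns the INFO string into a dictionary. It assumes tab-separated columns, as the VCF specification requires, and uses a shortened copy of the first example record with the ANN field left out. The function names `parse_info` and `parse_record` are illustrative and are not part of the pipeline.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_record(line):
    """Split one VCF data line into its eight fixed columns."""
    columns = line.rstrip("\n").split("\t")[:8]
    chrom, pos, vid, ref, alt, qual, flt, info = columns
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt,
        "qual": qual,
        "filter": flt,
        "info": parse_info(info),
    }


# Shortened version of the first example record (ANN omitted for brevity).
line = "chr21\t5101724\t.\tG\tA\t.\tPASS\tAC=1;AF=0.00641;AN=152;DP=903"
record = parse_record(line)
print(record["id"], record["info"]["AC"], record["info"]["AF"])
```

Values are kept as strings on purpose, since INFO keys can hold comma-separated lists for multi-allelic sites.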
  • *.tsv: A metadata file, relating every sample ID (as registered in the VCF file) and a sample group in column format.

Example line(s):
sample group
SM-3MG5L Chinanteco
SM-3MG5F Chocholteco
SM-3MG46 Kanjobal
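
As an illustration of how this metadata file can be consumed, the following Python sketch reads it into a sample-to-group mapping. It assumes the columns are tab-separated (the file is described as a TSV) and that the header names are exactly `sample` and `group`, as in the example lines. `read_metadata` is an illustrative name, not a pipeline function.

```python
import csv
import io


def read_metadata(handle):
    """Return a {sample ID: group} dict from a tab-separated metadata file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: row["group"] for row in reader}


# In-memory stand-in for the metadata file, using the example rows above.
example = io.StringIO(
    "sample\tgroup\n"
    "SM-3MG5L\tChinanteco\n"
    "SM-3MG5F\tChocholteco\n"
    "SM-3MG46\tKanjobal\n"
)
sample_to_group = read_metadata(example)
print(sample_to_group["SM-3MG46"])
```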

Before start

Test

To test nf-100GMX-variant-summarizer's execution using test data, run:
./runtest.sh

Your console should print the Nextflow log for the run. Once every process has been submitted, the following message will appear:
======
VCF summarizer: Basic pipeline TEST SUCCESSFUL
======

nf-100GMX-variant-summarizer results for the test data should be at the following path:
nf-100GMX-variant-summarizer/test/results/VCFsummarizer-results


Usage

To run nf-100GMX-variant-summarizer, go to the pipeline directory and execute:
nextflow run summarize-vcf.nf --vcffile <path to input 1> --metadata <path to input 2> --nsamples <integer> --group_minaf <numeric> --outgroup_maxaf <numeric> [--output_dir <path to results>]

For information about options and parameters, run:
nextflow run summarize-vcf.nf --help

Branch A
Project Counts
Note
a) Count samples and raw stats for all samples.
b) Report the total counts.
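
The SNV versus indel totals mentioned above can be illustrated with a small Python sketch. This is not the pipeline's implementation, only one common way to classify alleles by comparing REF and ALT lengths. It assumes multi-allelic ALT values are comma-separated and places symbolic or spanning-deletion alleles in an "other" bucket. The third and fourth input records are hypothetical.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair as 'snv', 'indel' or 'other'."""
    if alt == "*" or alt.startswith("<"):
        return "other"  # spanning deletion or symbolic allele
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_types(records):
    """Count variant types over (REF, ALT) pairs, one count per ALT allele."""
    counts = Counter()
    for ref, alts in records:
        for alt in alts.split(","):
            counts[classify(ref, alt)] += 1
    return counts


# First two pairs come from the example VCF lines; the rest are hypothetical.
totals = count_types([("G", "A"), ("G", "T"), ("AT", "A"), ("C", "CTT,G")])
print(dict(totals))
```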

Branch B
No filter counts
Note
a) Filter variants that are not in ClinVar, GWAS Catalog or PharmGKB.
b) Report the total counts.

Dependencies:
  • final-counter.R
  • bcftools

Novel counts
Note
a) Filter variants without an rsID.
b) Report the total counts.

Dependencies:
  • final-counter.R
  • bcftools
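
As a rough illustration of the novel-variant criterion above, the Python sketch below treats a variant as novel when its VCF ID column holds no identifier starting with `rs`. The pipeline itself relies on bcftools for this step; the function name `is_novel` and the handling of semicolon-separated IDs are assumptions made for the example.

```python
def is_novel(variant_id):
    """True when the VCF ID column carries no rsID."""
    ids = [part for part in variant_id.split(";") if part and part != "."]
    return not any(part.startswith("rs") for part in ids)


# IDs taken from the two example VCF records.
print(is_novel("."), is_novel("rs1373489291"))
```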

Worldwide singletons counts
Note
a) Filter variants that are singletons.
b) Keep variants that have no reported frequency in any other population.

Dependencies:
  • final-counter.R
  • bcftools
  • filter_vep
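
The two conditions for a worldwide singleton can be sketched in Python as follows. This is only an illustration under stated assumptions: a singleton is taken to mean an allele count (AC) of 1, and "no frequency in another population" is taken to mean that every external allele-frequency annotation is empty. The population keys shown (`gnomAD_AF`, `other_AF`) are hypothetical placeholders, not the annotation names used by nf-VEPextended.

```python
def is_worldwide_singleton(allele_count, external_afs):
    """True for AC == 1 variants with no frequency reported elsewhere."""
    seen_elsewhere = any(
        value not in ("", ".", None) for value in external_afs.values()
    )
    return allele_count == 1 and not seen_elsewhere


# Hypothetical annotations: empty strings mean "no frequency reported".
print(is_worldwide_singleton(1, {"gnomAD_AF": "", "other_AF": ""}))
print(is_worldwide_singleton(1, {"gnomAD_AF": "0.0003", "other_AF": ""}))
```

Whether a reported frequency of exactly 0 should count as "seen elsewhere" is a design choice; this sketch counts any non-empty value.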

ClinVar counts
Note
a) Filter variants harbored in ClinVar.
b) Report the total counts.

Dependencies:
  • final-counter.R
  • filter_vep
  • bcftools

GWASCatalog counts
Note
a) Filter variants harbored in GWAS Catalog.
b) Report the total counts.

Dependencies:
  • final-counter.R
  • bcftools
  • filter_vep

PGKB counts
Note
a) Filter variants harbored in PharmGKB.
b) Report the total counts.

Dependencies:
  • final-counter.R
  • bcftools
  • filter_vep
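
The ClinVar, GWAS Catalog and PharmGKB steps above share one pattern: keep a variant when a database-specific annotation field is filled in. The pipeline does this with filter_vep; the Python sketch below only illustrates the idea on a pipe-delimited ANN entry. The field list in `header` and the field name `example_db_id` are hypothetical, since the real field order is defined in the annotated VCF header.

```python
def annotation_fields(ann, header):
    """Map the first ANN entry (one transcript) onto the given field names."""
    first_entry = ann.split(",")[0]
    return dict(zip(header, first_entry.split("|")))


def is_in_database(fields, key):
    """True when the annotation field for a database is non-empty."""
    return fields.get(key, "") != ""


# Hypothetical four-field layout; real ANN entries carry many more fields.
header = ["Allele", "Consequence", "IMPACT", "example_db_id"]
absent = annotation_fields("A|intron_variant|MODIFIER|", header)
present = annotation_fields("T|intron_variant|MODIFIER|12345", header)
print(is_in_database(absent, "example_db_id"))
print(is_in_database(present, "example_db_id"))
```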

Merge tables
Merge tables of counted data.

Dependencies:
  • merger.R
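
The merge step above is performed by merger.R. As a language-neutral illustration of what such a merge involves, the Python sketch below joins several per-sample count tables on the sample ID and fills missing entries with 0. The zero-fill behaviour, the column names and the counts are assumptions made for the example, not values from the pipeline.

```python
def merge_counts(tables):
    """Join {column: {sample: count}} tables into one row per sample."""
    samples = sorted({s for counts in tables.values() for s in counts})
    return [
        {"sample": s, **{name: counts.get(s, 0) for name, counts in tables.items()}}
        for s in samples
    ]


# Hypothetical counts for two of the example samples.
tables = {
    "novel": {"SM-3MG5L": 3, "SM-3MG5F": 1},
    "clinvar": {"SM-3MG5L": 2},
}
merged = merge_counts(tables)
for row in merged:
    print(row)
```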
Branch C
Define groups
Note
a) Count samples per group.
b) Join the samples of each group into a file.

Dependencies:
NONE
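
A minimal Python sketch of the two steps above, assuming the sample-to-group mapping has already been read from the metadata file. It collects the samples of each group and derives the per-group sample count. The function names are illustrative, and the sample `SM-HYPO1` is a hypothetical addition so that one group has more than one member.

```python
from collections import defaultdict


def samples_by_group(sample_to_group):
    """Collect the sample IDs that belong to each group."""
    groups = defaultdict(list)
    for sample, group in sample_to_group.items():
        groups[group].append(sample)
    return dict(groups)


def group_sizes(groups):
    """Number of samples in each group."""
    return {group: len(samples) for group, samples in groups.items()}


sample_to_group = {
    "SM-3MG5L": "Chinanteco",
    "SM-3MG5F": "Chocholteco",
    "SM-3MG46": "Kanjobal",
    "SM-HYPO1": "Chinanteco",  # hypothetical sample, not in the source data
}
groups = samples_by_group(sample_to_group)
print(group_sizes(groups))
```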
Select world rare
Note
a) Filter variants according to their allele frequency (AF) in gnomAD populations and NatMex.

Dependencies:
  • filter_vep
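
One way to picture an AF-based rarity filter like the one above is to compare the largest allele frequency across reference populations against a cutoff. The Python sketch below does that under explicit assumptions: missing frequencies count as 0, the cutoff `max_af` is a free parameter, and the population keys are hypothetical placeholders. The actual selection in the pipeline is done with filter_vep and may use different rules.

```python
def max_population_af(pop_afs):
    """Largest AF across reference populations; missing values count as 0."""
    values = [float(v) for v in pop_afs.values() if v not in ("", ".", None)]
    return max(values, default=0.0)


def is_world_rare(pop_afs, max_af):
    """True when no reference population exceeds the AF cutoff."""
    return max_population_af(pop_afs) <= max_af


# Hypothetical population annotations and cutoff.
afs = {"gnomAD_AFR_AF": "0.002", "gnomAD_EAS_AF": "", "NatMex_AF": "0.004"}
print(is_world_rare(afs, 0.01))
```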

Extract discernible VCF
Note
a) Extract variants in other samples.
b) Extract variants in local samples.
c) Extract exclusive variants.

Dependencies:
  • bcftools
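
The three extraction steps above amount to a set difference: variants seen in the local group minus variants seen in any other sample. The Python sketch below shows that logic on a mapping from variant to the set of samples carrying it. It ignores the `--group_minaf` and `--outgroup_maxaf` thresholds, which the pipeline may apply at this stage, and the carrier assignments are hypothetical.

```python
def exclusive_variants(carriers, local_samples):
    """Variants carried by local samples and by no sample outside the group."""
    local = set(local_samples)
    # b) variants present in at least one local sample
    in_local = {v for v, samples in carriers.items() if samples & local}
    # a) variants present in at least one other sample
    in_others = {v for v, samples in carriers.items() if samples - local}
    # c) variants exclusive to the local group
    return in_local - in_others


# Variant keys follow the example VCF records; carriers are hypothetical.
carriers = {
    ("chr21", 5101724, "G", "A"): {"SM-3MG5L"},
    ("chr21", 5102165, "G", "T"): {"SM-3MG5L", "SM-3MG46"},
}
exclusive = exclusive_variants(carriers, ["SM-3MG5L"])
print(exclusive)
```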

Count and Plot
Note
a) Count the number of discernible variants per group and type in a TSV file.
b) Plot the number of discernible variants per group and type.

Dependencies:
  • bcftools
  • plotter.R
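
To illustrate the counting half of this step, the Python sketch below tallies (group, type) pairs and writes them as tab-separated text. The column names `group`, `type` and `n` and the input pairs are hypothetical. In the pipeline the counts come from bcftools and the plot from plotter.R, neither of which is reproduced here.

```python
import csv
import io
from collections import Counter


def counts_tsv(variants):
    """Render counts of (group, type) pairs as tab-separated text."""
    counts = Counter(variants)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["group", "type", "n"])
    for (group, vtype), n in sorted(counts.items()):
        writer.writerow([group, vtype, n])
    return out.getvalue()


# Hypothetical discernible variants, one (group, type) pair per variant.
table = counts_tsv(
    [("Chinanteco", "snv"), ("Chinanteco", "snv"), ("Kanjobal", "indel")]
)
print(table, end="")
```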