Aguilar-Ordoñez I, Pérez-Villatoro F, García-Ortiz H, Barajas-Olmos F, Ballesteros-Villascán J, González-Buenfil R, Fresno C, Garcíarrubio A, Fernández-López JC, Tovar H, Hernández-Lemus E, Orozco L, Soberón X, Morett E (2021) Whole genome variation in 27 Mexican indigenous populations, demographic and biomedical insights. PLoS ONE 16(4): e0249773. doi: 10.1371/journal.pone.0249773
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: August 30, 2020
Last Modified: September 21, 2020
Protocol Integer ID: 41247
Abstract
Nextflow pipeline used to build the novel variants dataset for the 100GMX project.
'nf-vcf-novel-dataset-builder' is a pipeline tool that builds a VCF file containing only variants that are novel according to dbSNP and VEP, starting from a VCF file with extended VEP annotation. Singletons and private variants are excluded from this novel set. The main output is in VCF format. Additional outputs include the dataset in TSV format and the gnomAD sequence coverage at these sites.
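The selection rule described above can be sketched as a simple record filter. This is a minimal illustration under stated assumptions, not the pipeline's own code: it treats a site as already known when the VCF ID column holds an identifier (such as a dbSNP rs number), treats AC=1 as a singleton, assumes biallelic sites, and does not implement private-variant filtering, which needs per-sample or per-population genotypes. The function names are hypothetical.

```python
from typing import Iterable, List


def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict; flags map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def is_candidate_novel(record: str) -> bool:
    """Return True for a data line with no ID and an alternate allele count > 1.

    Assumptions (hypothetical, not from the pipeline): ID == '.' means not in
    dbSNP; AC == 1 means singleton; only the first AC value is read, so
    multiallelic sites are not handled. Private variants are NOT filtered here.
    """
    cols = record.rstrip("\n").split("\t")
    if len(cols) < 8:
        return False  # malformed line: fewer than the 8 fixed VCF columns
    variant_id, info = cols[2], parse_info(cols[7])
    if variant_id != ".":
        return False  # already catalogued, e.g. carries a dbSNP rs ID
    ac = info.get("AC", "")
    if not ac or ac.split(",")[0] == ".":
        return False  # no usable alternate allele count
    return int(ac.split(",")[0]) > 1  # drop singletons (AC == 1)


def select_novel(lines: Iterable[str]) -> List[str]:
    """Skip header lines ('#...') and keep records passing is_candidate_novel."""
    return [ln for ln in lines if not ln.startswith("#") and is_candidate_novel(ln)]
```

Under these assumptions, both example records that follow would be rejected: the first has AC=1 (a singleton) and the second carries the dbSNP ID rs1373489291.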
##fileformat=VCFv4.2
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr21	5101724	.	G	A	.	PASS	AC=1;AF=0.00641;AN=152;DP=903;ANN=A|intron_variant|MODIFIER|GATD3B|ENSG00000280071|Transcript|ENST00000624810.3|protein_coding||4/5|ENST00000624810.3:c.357+19987C>T|||||||||-1|cds_start_NF&cds_end_NF|SNV|HGNC|HGNC:53816||5|||ENSP00000485439||A0A096LP73|UPI0004F23660|||||||chr21:g.5101724G>A||||||||||||||||||||||||||||2.079|0.034663||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
chr21	5102165	rs1373489291	G	T	.	PASS	AC=1;AF=0.00641;AN=140;DP=853;ANN=T|intron_variant|MODIFIER|GATD3B|ENSG00000280071|Transcript|ENST00000624810.3|protein_coding||4/5|ENST00000624810.3:c.357+19546C>A|||||||rs1373489291||-1|cds_start_NF&cds_end_NF|SNV|HGNC|HGNC:53816||5|||ENSP00000485439||A0A096LP73|UPI0004F23660|||||||chr21:g.5102165G>T||||||||||||||||||||||||||||5.009|0.275409||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
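The ANN value in these records is a pipe-separated VEP annotation, with one comma-separated entry per transcript and allele. The sketch below pulls out the leading fields. It assumes VEP's usual leading field order (Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE), which matches the example records; the authoritative order is the "Format:" list in the ##INFO=<ID=ANN,...> header line and should be read from there in real use. The function and constant names are hypothetical.

```python
from typing import Dict, List

# Assumed leading VEP field order; verify against the VCF header's ANN "Format:" list.
LEADING_FIELDS = ["Allele", "Consequence", "IMPACT", "SYMBOL",
                  "Gene", "Feature_type", "Feature", "BIOTYPE"]


def parse_ann(info: str) -> List[Dict[str, str]]:
    """Return one dict per comma-separated ANN entry found in an INFO column."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "ANN":
            entries = []
            for entry in value.split(","):
                parts = entry.split("|")
                # zip stops at the shorter list, so only the leading fields are kept
                entries.append(dict(zip(LEADING_FIELDS, parts)))
            return entries
    return []  # this INFO column has no ANN key
```

A Consequence value can join several terms with "&" (for example cds_start_NF&cds_end_NF style flags elsewhere, or 3_prime_UTR_variant&NMD_transcript_variant), so downstream code typically splits it on "&".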
Before start
Test
To test nf-vcf-novel-dataset-builder execution using test data, run:
./runtest.sh
Your console should print the Nextflow log for the run. Once every process has been submitted, the following message will appear:
======
nf-vcf-novel-dataset-builder: Basic pipeline TEST SUCCESSFUL
======
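To check the test run from a script rather than by eye, the captured log can be searched for the success banner. This is a small sketch under assumptions: runtest.sh is in the current directory and prints the banner to standard output; the banner text is copied from the message above, and the helper names are hypothetical.

```python
import subprocess

SUCCESS_BANNER = "nf-vcf-novel-dataset-builder: Basic pipeline TEST SUCCESSFUL"


def log_reports_success(log_text: str) -> bool:
    """True if any line of the captured log is exactly the success banner."""
    return any(line.strip() == SUCCESS_BANNER for line in log_text.splitlines())


def run_test(script: str = "./runtest.sh") -> bool:
    """Run the test script, capture its stdout, and look for the banner."""
    result = subprocess.run([script], capture_output=True, text=True, check=False)
    return log_reports_success(result.stdout)
```

Calling run_test() from the repository root returns True only when the banner appears in the captured output.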
The nf-vcf-novel-dataset-builder results for the test data should be in the following file:
cataloguer.R is a tool for cataloging the consequences of novel variants.
Dependencies:
cataloguer.R
Final Output:
Expected result
A compressed TSV file summarizing the variants by category, and an SVG file.
Example lines of the TSV:
Consequence	number_of_variants	Type	General_category	First_specific_consequence
3_prime_UTR_variant	2	noncoding	UTR	3 prime UTR
3_prime_UTR_variant&NMD_transcript_variant	NA	noncoding	UTR	3 prime UTR
...
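The cataloguing step can be sketched as counting variants per consequence string and writing a gzip-compressed TSV with the columns of the example. This is not cataloguer.R: the input is assumed to be one consequence string per novel variant, and the CATEGORY map is a hypothetical stand-in that only covers the UTR terms, with every other first term reported as NA.

```python
import csv
import gzip
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical map: first consequence term -> (Type, General_category, First_specific_consequence)
CATEGORY: Dict[str, Tuple[str, str, str]] = {
    "3_prime_UTR_variant": ("noncoding", "UTR", "3 prime UTR"),
    "5_prime_UTR_variant": ("noncoding", "UTR", "5 prime UTR"),
}
HEADER = ["Consequence", "number_of_variants", "Type",
          "General_category", "First_specific_consequence"]


def catalogue(consequences: Iterable[str]) -> List[List[str]]:
    """Count each distinct consequence string and attach its (assumed) categories."""
    rows = []
    for consequence, n in sorted(Counter(consequences).items()):
        first_term = consequence.split("&")[0]  # categorize by the first term only
        rows.append([consequence, str(n), *CATEGORY.get(first_term, ("NA", "NA", "NA"))])
    return rows


def write_tsv_gz(rows: List[List[str]], path: str) -> None:
    """Write the header plus rows as a tab-separated, gzip-compressed file."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(rows)
```

Keying the categories on the first "&"-separated term is one simple choice that makes combined consequences such as 3_prime_UTR_variant&NMD_transcript_variant land in the same category as 3_prime_UTR_variant, as in the example.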
Coverage gnomAD
Plot gnomAD coverage.
coverage-analyzer.R is a tool for plotting sequence coverage from the gnomAD project.
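Before plotting, the coverage values at the novel sites have to be looked up and binned. The sketch below shows that preparation with the standard library only; it is not coverage-analyzer.R. It assumes a tab-separated coverage table with 'chrom', 'pos' and 'mean' columns (hypothetical names; check the header of the gnomAD coverage release actually used), and the bin edges are arbitrary.

```python
import csv
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Site = Tuple[str, int]


def read_coverage(path: str) -> Iterator[Dict[str, str]]:
    """Yield rows of a tab-separated coverage table as dicts keyed by header."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def coverage_at_sites(rows: Iterable[Dict[str, str]], sites: Set[Site]) -> Dict[Site, float]:
    """Keep the mean coverage of rows whose (chrom, pos) is one of the sites."""
    found = {}
    for row in rows:
        key = (row["chrom"], int(row["pos"]))
        if key in sites:
            found[key] = float(row["mean"])
    return found


def bin_coverage(values: Iterable[float], edges: List[float]) -> List[int]:
    """Count values in [edges[i], edges[i+1]); the last bin is open-ended.

    Values below edges[0] are not counted.
    """
    counts = [0] * len(edges)
    for v in values:
        for i in range(len(edges) - 1, -1, -1):
            if v >= edges[i]:
                counts[i] += 1
                break
    return counts
```

The resulting counts per bin can then be passed to any plotting tool to draw a coverage histogram.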