Protocol Citation: Maria Balkey, Julie Haendiges, Ruth Timme, Candace Hope Bias 2022. Querying the NCBI database for GenomeTrakr data. protocols.io https://dx.doi.org/10.17504/protocols.io.bznup5ew
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it’s working.
Created: November 01, 2021
Last Modified: February 17, 2022
Protocol Integer ID: 54708
Disclaimer
Please note that this protocol is public domain, which supersedes the CC-BY license default used by protocols.io.
Abstract
This protocol describes methods to query GenomeTrakr sequencing records and metadata across multiple NCBI resources: the BioSample, BioProject, Sequence Read Archive, Pathogen Detection and Assembly databases.
NCBI Resources
Whole genome sequencing (WGS) data submitted to NCBI are processed in multiple databases. If your laboratory or a collaborator has submitted WGS data for foodborne pathogens to NCBI, you can locate the data in the following NCBI resources: BioProject, BioSample, Sequence Read Archive, Pathogen Detection and Assembly.
BioProject is a collection of biological data related to a surveillance or research effort. Umbrella BioProjects contain several data-level projects. PRJNA593772 (https://www.ncbi.nlm.nih.gov/bioproject/593772) comprises a set of umbrella BioProjects, each established for a pathogen being sequenced by the GenomeTrakr network. To find the BioProject for a specific organism processed by your lab, click on the corresponding organism's umbrella BioProject.
Figure 2: GenomeTrakr Umbrella BioProjects
If you are interested in a specific type of data, click on Browse by Project attributes and narrow your search using filters such as Project, Data Type, Scope, Property, Kingdom, Group and Subgroup.
BioSample
BioSample (https://www.ncbi.nlm.nih.gov/biosample/) is the database for isolate and sample metadata. You can access BioSample records by typing laboratory identifiers (strain, isolate name alias, FDA_Lab_ID, BioProject) or specific attributes in the search box, separating multiple terms with " OR ", e.g. "CFSAN0001 OR CFSAN0002 OR CFSAN0003" or "Salmonella enterica".
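The same OR-joined search can also be expressed as a request to NCBI's E-utilities ESearch service, which is convenient when the identifier list is long. The sketch below is a minimal illustration that only builds the query string and the ESearch URL; it does not send the request. The helper names (`build_or_query`, `esearch_url`) and the `retmax` value are assumptions chosen for this example, and the CFSAN identifiers are the placeholders used in the text above.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_or_query(identifiers):
    """Join laboratory identifiers with ' OR ', as typed in the search box."""
    cleaned = [i.strip() for i in identifiers if i.strip()]
    return " OR ".join(cleaned)


def esearch_url(db, identifiers, retmax=500):
    """Return an ESearch URL for an Entrez database such as 'biosample' or 'sra'.

    Hypothetical helper: retmax=500 is an arbitrary illustrative limit.
    """
    params = {"db": db, "term": build_or_query(identifiers), "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    ids = ["CFSAN0001", "CFSAN0002", "CFSAN0003"]
    print(build_or_query(ids))
    print(esearch_url("biosample", ids))
```

Passing `"sra"` instead of `"biosample"` as the database would build the equivalent search for the Sequence Read Archive. The resulting URL can be opened in a browser or fetched with any HTTP client.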
Figure 3: Searching records in BioSample
BioSample records can be downloaded by clicking the Send to icon and choosing the destination and format of the file (Summary, Full text, Full XML, BioSample ID list or Accession list).
Figure 4: Downloading records from NCBI BioSample
Links to the SRA and Nucleotide records are available from a BioSample record if sequencing and assembly data were submitted to NCBI.
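The same BioSample-to-SRA and BioSample-to-Nucleotide links can be followed programmatically with the E-utilities ELink service. The sketch below only builds the ELink URL and assumes you already have numeric BioSample UIDs (for example from an ESearch result); ELink works on these UIDs, not on accessions. The helper name `elink_url` and the UIDs `123` and `456` are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ELink service.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(uids, target_db):
    """Build an ELink URL from BioSample UIDs to a linked database.

    target_db is an Entrez database name, e.g. 'sra' for reads or
    'nuccore' for Nucleotide (assembly) records.
    """
    params = {
        "dbfrom": "biosample",
        "db": target_db,
        # Comma-separated UIDs are returned as a single combined link set.
        "id": ",".join(str(u) for u in uids),
    }
    return ELINK + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical BioSample UIDs, for illustration only.
    print(elink_url([123, 456], "sra"))
    print(elink_url([123, 456], "nuccore"))
```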
Figure 5: SRA and Nucleotide (assembly) access from BioSample.
Sequence Read Archive
The Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) is the primary repository of raw whole genome sequencing data. You can access SRA records by typing laboratory identifiers (strain, isolate name alias, FDA_Lab_ID, BioProject) or specific attributes in the search box. Multiple identifiers may need to be separated by " OR ", e.g. CFSAN0001 OR CFSAN0002 OR CFSAN0003.
You can export SRA accessions by clicking the Send to button and choosing File and the Accession List format.
Figure 6: Downloading SRA accessions from NCBI SRA.
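Once you have a list of accessions, the SRA Toolkit's `prefetch` and `fasterq-dump` commands can retrieve the reads. The sketch below is a minimal example that reads an accession list and prints one `prefetch` and one `fasterq-dump` command per run; it does not execute them. It assumes a plain-text list with one run accession (SRR, ERR or DRR) per line and skips anything else, such as experiment accessions. The helper names and the `fastq` output directory are assumptions, and the SRR numbers in the demo are hypothetical placeholders.

```python
import re
import shlex

# Run accessions start with SRR, ERR or DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_run_accessions(lines):
    """Return the run accessions found in an accession list, one per line."""
    accessions = []
    for line in lines:
        acc = line.strip()
        if RUN_ACCESSION.match(acc):
            accessions.append(acc)
    return accessions


def toolkit_commands(accessions, outdir="fastq"):
    """Build SRA Toolkit commands: prefetch each run, then convert it to FASTQ."""
    commands = []
    for acc in accessions:
        commands.append(["prefetch", acc])
        commands.append(["fasterq-dump", "--split-files", "--outdir", outdir, acc])
    return commands


if __name__ == "__main__":
    # Hypothetical list contents; in practice, read the exported file instead.
    example = ["SRR0000001", "", "not-an-accession", "SRR0000002"]
    for cmd in toolkit_commands(read_run_accessions(example)):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` later, once the SRA Toolkit is installed and configured.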
You can download sequencing data files from NCBI using the SRA Toolkit, the Run Browser, or the cloud.
Enter the accessions in the SRA search box and click Search; the results will include all matching records.
Click Send to at the top of the SRA page, select the Run Selector radio button, and click Go.
If necessary, refine your results by using various filters provided by the Run Selector's interface.
Click the Metadata button. This will generate a tabular file with metadata available for each Run.
Figure 7: Sending data to SRA Run Selector.
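The metadata table produced by the Run Selector can be loaded with standard CSV tools, for example to check which runs belong to which isolate. The sketch below assumes the downloaded file is comma-separated with a header row containing `Run` and `BioSample` columns; check the header of your own file and adjust the column names if they differ. The sample rows are hypothetical placeholders, not real records.

```python
import csv
import io
from collections import defaultdict


def runs_by_biosample(text, run_col="Run", sample_col="BioSample"):
    """Map each BioSample accession to the list of its run accessions."""
    reader = csv.DictReader(io.StringIO(text))
    grouped = defaultdict(list)
    for row in reader:
        grouped[row[sample_col]].append(row[run_col])
    return dict(grouped)


# Hypothetical table contents, for illustration only.
SAMPLE_TABLE = (
    "Run,BioSample,Organism\n"
    "SRR0000001,SAMN00000001,Salmonella enterica\n"
    "SRR0000002,SAMN00000001,Salmonella enterica\n"
    "SRR0000003,SAMN00000002,Listeria monocytogenes\n"
)

if __name__ == "__main__":
    # With a real download, read the file instead, e.g.:
    # with open(path, newline="") as handle:
    #     text = handle.read()
    for sample, runs in runs_by_biosample(SAMPLE_TABLE).items():
        print(sample, runs)
```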
Pathogen Detection
Visit Pathogen Detection (https://www.ncbi.nlm.nih.gov/pathogens/) to access real-time analyses of isolates obtained from ongoing pathogen surveillance activities.
If you need to download multiple assembled genomes, use the NCBI Assembly resource (https://www.ncbi.nlm.nih.gov/assembly/). Enter the identifiers in the search box and click the "Download Assemblies" button. On the left side of this interface, you can refine your search by applying multiple filters. For details on downloading genomes from NCBI programmatically, visit https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/.
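For programmatic downloads, NCBI's genomes FTP area lists each assembly's directory in the `ftp_path` column of the `assembly_summary.txt` files, and the genomic FASTA inside that directory is named after the directory with a `_genomic.fna.gz` suffix. The sketch below derives that file URL from an `ftp_path` value. It assumes this naming convention and rewrites `ftp://` paths to `https://`; the helper name and the example path are hypothetical, so confirm the layout against the FTP FAQ before relying on it.

```python
import posixpath


def genomic_fasta_url(ftp_path):
    """Derive the genomic FASTA URL from an assembly_summary.txt ftp_path.

    Returns None when the path is missing ('na'), which can occur
    for assemblies without a public directory.
    """
    path = ftp_path.strip().rstrip("/")
    if not path or path == "na":
        return None
    if path.startswith("ftp://"):
        path = "https://" + path[len("ftp://"):]
    # The last path component is the assembly directory name,
    # e.g. <accession>_<assembly name>.
    name = posixpath.basename(path)
    return f"{path}/{name}_genomic.fna.gz"


if __name__ == "__main__":
    # Hypothetical ftp_path, for illustration only.
    example = "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/000/000/GCA_000000000.1_EXAMPLE"
    print(genomic_fasta_url(example))
```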
Figure 8: Downloading assemblies from NCBI Assembly
NCBI Insights
If you want to keep up with NCBI news, sign up for NCBI Insights updates. The NCBI Insights blog offers guidance on the latest NCBI resources.