A | B | C | |
Data Extractor Name: | <First Name> <Last Name> | version 0.1 | |
Date Range of Extraction: | <YYYY-MM-DD> to <YYYY-MM-DD> | ||
General: | This is your personal worksheet for the scoping review full-text metadata extraction. All headers are imported from a master reference sheet that will be updated as necessary to ensure everyone has the most up-to-date and comprehensive information available. It is recommended that you familiarize yourself with and frequently reference the "Legend" tabs, as they are your guide to filling in the tables. Every data extraction table has a “Legend” that describes its headers and provides example values and descriptions of what each field is for. Tables import their reference information from a reference sheet, so when the reference sheet is updated the changes appear on individual data collection sheets. This keeps data collectors in sync on the table “Legends”, headers, etc. These imported cells are protected so that only the project lead (<First & Last Name>) or the data extract worksheet manager (<First & Last Name>) can edit them, which prevents users from accidentally breaking a formula. For data extraction examples, we recommend you review the information provided on your personal scoping review full-text metadata extraction pilot worksheet. All data extraction is to be performed manually as described in the protocol. | ||
Manuscript Location: | You will find the link to the manuscript you've been assigned under the "Nextcloud (Data Storage Location)" column. Make sure you double-check that the title/date in the file matches the information in the overview table. If the file is missing, corrupted, or doesn't appear to be a manuscript, log in to Covidence and search for your article using the "Covidence #" with the "#" sign in front. | ||
Overview Table: | The "Overview Table" is a summary table that provides data extractors with relevant manuscript information and the overarching variables of interest that the scoping review team and metadata advisors have advised capturing. Your copy of the table will only contain the manuscripts you have been assigned for data extraction. | ||
Metadata Table: | The "Metadata Table" is where more granular metadata evidence is captured to indicate the presence or absence of a metadata type/field. A “Metadata Type (Field Name)” is deemed “TRUE” if a required subfield was identified or an example subfield value was found; indicate this by entering the quoted “value” that supports the conclusion. Otherwise, indicate "FALSE" to confirm you tried but could not find evidence of a data item. | ||
Formatting/Customization & Troubleshooting: | To make the data extraction worksheet more comfortable for yourself, you are welcome to customize the formatting of cells, provided you format the entire cell and not just a selection of text within it. Formatting a smaller range of text will break the import function and result in a "#REF!" error. If you receive this error, you can resolve it by undoing the formatting you applied. You should also not rearrange any columns or rows, so that the data stay aligned when your tables are merged into the masters. |
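The alignment requirement above can be checked mechanically before a merge. The following is only a minimal sketch, assuming each sheet's header row has already been read into a list of strings; the function name `find_misaligned_headers` is hypothetical and is not part of the worksheet or the protocol.

```python
from itertools import zip_longest


def find_misaligned_headers(master_headers, worksheet_headers):
    """Return (column position, expected, found) for every header that differs.

    Positions are 1-based so they can be matched against spreadsheet columns.
    Missing columns on either side are reported with an empty string.
    """
    mismatches = []
    pairs = zip_longest(master_headers, worksheet_headers, fillvalue="")
    for position, (expected, found) in enumerate(pairs, start=1):
        if expected.strip() != found.strip():
            mismatches.append((position, expected, found))
    return mismatches


# Illustrative use with headers taken from the Overview Table legend.
master = ["Assignee", "Study/Covidence #", "Study", "Title"]
personal = ["Assignee", "Study", "Study/Covidence #", "Title"]
problems = find_misaligned_headers(master, personal)
```

An empty result suggests the columns are still in master order. Any entries point to the positions that were moved, renamed, or dropped.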
A | B | C | D | E | F | |
Header | Description | Data Type | Values | Value Description | Version 0.1 | |
A | B | C | D | E | F | |
Header | Description | Values | Value Description | Data Type | Version 0.1 | |
Assignee | ||||||
Study/Covidence # | ||||||
Study | ||||||
Title | ||||||
Authors | ||||||
DOI | ||||||
Data Storage Location | ||||||
Variables of Interest |
A | B | C | D | E | F | |
Header | Description | Values | Value Description | Data Type | Version 0.1 | |
Assignee | The team member assigned to extract metadata from the associated article. | |||||
Study/Covidence # | The Covidence document reference number (#). | |||||
Study | Shorthand study label indicating <1st author> . | |||||
Title | Manuscript/Document title. | |||||
Authors | Manuscript/Document author list. | |||||
DOI | DOI reference ID or PURL. | |||||
Data Storage Location | URL link to the data storage folder on the Centre for Infectious Disease Genomics and One Health (CIDGOH) Nextcloud server. This is where manuscripts can be found/uploaded, and where to upload additional data such as supplementary files and sequence data (when applicable). | |||||
Variables of Interest | Overarching variables we've chosen to extract for analysis. |
A | B | C | D | E | F | |
Header | Description | Values | Value Description | Data Type | Version 0.1 | |
[------removed for brevity------] | ||||||
Variables of Interest | Overarching variables we've chosen to extract for analysis. | |||||
Sequence Data Accessible | TRUE/FALSE | boolean | ||||
Metadata Table Accessible | - Are we thinking supplementary tables or in-text too? | TRUE/FALSE | boolean | |||
Open Access | TRUE/FALSE | boolean | ||||
Organism | List the name(s) of the infectious pathogens under study. | string | ||||
Organism Type | Virus, Bacteria | enum (use dropdown list) | ||||
A | B | C | D | E | F | |
Header | Description | Values | Value Description | Data Type | Version 0.2 | |
[------removed for brevity------] | ||||||
Variables of Interest | Overarching variables we've chosen to extract for analysis. | |||||
Sequence Data Accessible | We want this to help us evaluate sequence data sharing and accessibility. Need to double check it is actually shared and not just claimed to be shared. | Database Name, Not Applicable | string | |||
Sequence Data Recorded | Discusses saving sequence information on the lab server in space dedicated for this purpose. Store in a folder labelled with the Covidence number (requested by lab supervisor). | boolean? | ||||
Metadata Table Accessible | - Are we thinking supplementary tables or in-text too? | TRUE/FALSE | boolean | |||
Open Access | If the document was available via open access (i.e. not behind a paywall). If not indicated on the journal/document provided, this can be checked by (a) checking the DOI link in incognito mode on your browser, and (b) searching the university library catalogue to see if the journal is listed as OPEN (and where)... | TRUE/FALSE | boolean | |||
Organism Type | Capture this so we can see what the overview of organism types is. | Virus, Bacteria, Fungi, Protozoa | enum | |||
Organism | - Supervisor noted this would be of value to the lab for people interested in doing a similar but more pathogen specific version of this scoping review / contextual metadata collection. - Don't need to list at the variant level, probably not even the subspecies level. NCBITaxon could be used for rounding up to more general classes. | List the name(s) of the infectious pathogens under study. | string | |||
A | B | C | D | E | F | |
Header | Description | Values | Value Description | Data Type | Version 0.5 | |
[------removed for brevity------] | ||||||
Variables of Interest | Overarching variables we've chosen to extract for analysis. | |||||
Type | Whether the document type is surveillance, outbreak investigation, or both. Do your best to pick a primary study type of either "surveillance" or "outbreak investigation" (e.g. a surveillance study that references historical outbreaks would be "surveillance" but not "outbreak investigation", and vice versa for an "outbreak investigation" that happens to reference "surveillance" data). Only use "Both" in cases of extreme certainty or uncertainty. Note: Please refer to the "Glossary" for more study type descriptions and other agreed-upon terminology. | Surveillance | We are working with the understanding that surveillance is the ongoing systematic collection, analysis, and interpretation of health data that are essential to the planning, implementation, and evaluation of public health practice. This means a surveillance study can be deduced based on whether isolates/data are collected over a span of time (such as with contact tracing); otherwise err on the side of not surveillance if you’re not 100% sure whether the study is conducting surveillance (i.e. if the study is using surveillance isolates but not conducting actual surveillance, etc.). If a study describes itself as surveillance then we defer to the study. (source: https://www.sciencedirect.com/science/article/abs/pii/B9780127640518500421) | enum | ||
Outbreak Investigation | A study is an outbreak investigation if the authors describe investigating an outbreak. Note that a cluster of cases is not necessarily an outbreak. Generally, the aim of outbreak epidemiology is to study an epidemic in order to gain control over it and to prevent further spread of the disease. Generally outbreak means a 'sudden occurrence,' while in the epidemiological sense an outbreak is defined as a sudden increase in the disease frequency, related to time, place, and observed population. (source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7187955/) | enum | ||||
Both | A surveillance context in which activities are specifically triggered by the detection of [a disease] outbreak. (source: http://purl.obolibrary.org/obo/HSO_0000371) | enum | ||||
Sequence Data Accessible (Database Name) | If the sequence data is publicly available and accessible through information provided by the article, indicate which database it is located in. | [Database Name] | The name of the database where the sequence data was located. Note: Captured as a string but can be converted (>) to a "TRUE" boolean for analysis. | string > boolean | ||
FALSE | Database information could not be identified. | boolean | ||||
Sequence Data Saved (TRUE/FALSE) | Please save sequence information on the lab server space dedicated for this purpose. Store in a folder labelled with the manuscript/document Covidence number. ONLY SAVE THE DATA if stored in a non-mainstream repository. Mainstream repositories are deemed more likely to be accessible for future research and thus not necessary to retrieve data from. You may note down the and/or or leave it to researchers to identify this information from the manuscript. | TRUE | Sequence data or information about it has been stored on the CIDGOH server. | boolean | ||
FALSE | Sequence data or information about it has not been stored on the CIDGOH server. | boolean | ||||
Open Access (TRUE/FALSE) | If the document was available via open access (i.e. not behind a paywall). If not indicated on the journal/document provided, this can be checked by (a) checking the DOI link in incognito mode on your browser, and (b) searching the institutional library catalogue to see if the journal is listed as OPEN (and where). It is best to perform both checks, because a closed DOI link provided by Covidence doesn't mean the document isn't openly accessible through another journal. | TRUE | An open access version of the manuscript/document was identified. | boolean | ||
FALSE | An open access version of the manuscript/document was not identified. | boolean | ||||
Organism Type (Virus| Bacteria| Fungi| Protozoa) | Infectious pathogens that undergo whole genome or near complete genome sequencing. Not including worms or prions. Some studies include more than one type, in which case indicate all applicable organisms, separated by a "|" (vertical bar/pipe). | Virus | An infectious agent which consists of two parts, genetic material and a protein coat. These organisms lack independent metabolism, and they must infect the cells of other types of organisms to reproduce. Most viruses are capable of passing through fine filters that retain bacteria, and are not visible through a light microscope. (source: http://purl.obolibrary.org/obo/NCIT_C14283) | enum | ||
Bacteria | Unicellular, prokaryotic organisms that reproduce by cell division and usually have cell walls; can be shaped like spheres, rods or spirals and can be found in virtually any environment. (source: http://purl.obolibrary.org/obo/NCIT_C14187) | enum | ||||
Fungi | A kingdom of eukaryotic, heterotrophic organisms that live as saprobes or parasites, including mushrooms, yeasts, smuts, molds, etc. They reproduce either sexually or asexually, and have life cycles that range from simple to complex. Filamentous fungi refer to those that grow as multicellular colonies (mushrooms and molds). (source: http://purl.obolibrary.org/obo/NCIT_C14209) | enum | ||||
Protozoa | Unicellular heterotrophic eukaryote in the kingdom protista. (source: http://purl.obolibrary.org/obo/NCIT_C77916) | enum | ||||
Organism (separator = |) | List the name(s) of the infectious pathogens studied. Indicate all applicable organisms, separated by a "|" (vertical bar/pipe). Note: You do not need to list at the variant level. | [Organism Name] | Latin scientific name for the organism of study. | string | ||
Metadata Accessible | If a metadata table is available or not. | |||||
In-Text (TRUE/FALSE) | Metadata available from the main body text of the manuscript/document. This value does not indicate the presence of a metadata table, but is still a variable of interest. | TRUE | Metadata available within text. | boolean | ||
FALSE | Metadata not available within text. | boolean | ||||
Manuscript Embedded Table (Summary Table/Line List/FALSE, separator = |) | Metadata available as an embedded table within the manuscript/document. The presence of both can be indicated by listing them using the "|" separator (vertical bar/pipe). | Summary Table | Metadata available within an embedded table as aggregated data (e.g. listing out counts/date for a specific demographic and associated symptoms). Note: Captured as a string but can be converted (>) to a "TRUE" boolean for analysis. | string > boolean | ||
Line List | Metadata available within an embedded table as individual data points (i.e. individual subjects and their metadata values). Note: Captured as a string but can be converted (>) to a "TRUE" boolean for analysis. | string > boolean | ||||
FALSE | Metadata not available within embedded table. | boolean | ||||
[---see manuscript dataset for more examples---] | ||||||
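Two conventions in the legend above lend themselves to a short illustration: multi-value cells separated by "|" and the "string > boolean" conversion used for analysis. The following is a minimal sketch, not the team's actual analysis code. The function names are hypothetical, and treating an empty cell as an error is an assumption rather than something the legend specifies.

```python
# Allowed values for "Organism Type", as listed in the legend.
ORGANISM_TYPES = {"Virus", "Bacteria", "Fungi", "Protozoa"}


def split_values(cell):
    """Split a '|'-separated cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split("|") if part.strip()]


def parse_organism_types(cell):
    """Split an Organism Type cell and reject values outside the enum."""
    values = split_values(cell)
    unknown = [value for value in values if value not in ORGANISM_TYPES]
    if unknown:
        raise ValueError(f"Unexpected organism type(s): {unknown}")
    return values


def to_boolean(cell):
    """Collapse a 'string > boolean' cell for analysis.

    FALSE maps to False; any other non-empty entry (a database name,
    "Summary Table", "Line List", ...) maps to True.
    Assumption: an empty cell is an error, not an implicit FALSE.
    """
    value = cell.strip()
    if not value:
        raise ValueError("Empty cell: expected a value or FALSE")
    return value.upper() != "FALSE"


# Illustrative cells, written the way the column headers show them.
types = parse_organism_types("Virus| Bacteria")
has_table = to_boolean("Summary Table|Line List")
```

Keeping the original string alongside the derived boolean preserves the detail (which database, which table type) while still allowing simple TRUE/FALSE tallies.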
A | B | C | |
Header | Description / Definition | version 0.1 | |
Metadata Type (Field Name) | Metadata fields that are usually required when present in a CIDGOH specification. | ||
Definition | Field or subfield definitions, some pulled directly from specification vocabulary while others are more broadly described categories. | ||
Values | Quote from the text the information used as evidence to conclude the presence of a metadata type. | ||
Motivation | A positive or negative motivation for the inclusion of a metadata type. A positive motivation would be along the lines of “this metadata allowed us to do this”, while a negative motivation could be “a limitation of that study was we couldn’t study this because we didn’t have that”. |
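Because a "Values" cell holds either quoted evidence or an explicit FALSE, a blank cell means the field has not been reviewed yet. The sketch below shows one way to separate those cases; it assumes the Metadata Table has been read into a mapping of field name to "Values" cell. The function name, field names, and quoted text are all hypothetical placeholders.

```python
def review_metadata_rows(rows):
    """Return (presence by field, fields whose Values cell is still blank).

    A field counts as present when its Values cell holds quoted evidence,
    absent when it holds FALSE, and unreviewed when it is blank.
    """
    presence = {}
    blank = []
    for field_name, values_cell in rows.items():
        text = values_cell.strip()
        if not text:
            blank.append(field_name)
        else:
            presence[field_name] = text.upper() != "FALSE"
    return presence, blank


# Hypothetical rows; the field names and quote are placeholders only.
rows = {
    "example field A": '"quoted sentence from the manuscript"',
    "example field B": "FALSE",
    "example field C": "",
}
presence, blank = review_metadata_rows(rows)
```

Reporting blank cells separately keeps "not yet checked" distinct from "checked and not found", which is the reason the worksheet asks for an explicit FALSE.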