DNA extraction from human stool samples and sequencing

Veronika Pettersen

Feb 27, 2026

DOI

https://dx.doi.org/10.17504/protocols.io.8epv55o46v1b/v1

Veronika Pettersen¹

¹UiT The Arctic University of Norway

Protocol Citation: Veronika Pettersen 2026. DNA extraction from human stool samples and sequencing. protocols.io https://dx.doi.org/10.17504/protocols.io.8epv55o46v1b/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: February 27, 2026

Last Modified: February 27, 2026

Protocol Integer ID: 244153

Keywords: gut microbiome, ESBL-E, metagenomics, shotgun sequencing, fecal DNA extraction, antimicrobial resistance genes (ARGs), mobile genetic elements, long-read sequencing, whole-genome sequencing, microbial genomics, infant gut by bifidobacteria, dna extraction from human stool sample, infant gut microbiome, antimicrobial resistance dynamics in infant gut microbiome, human stool sample, producing enterobacterale, based probiotic, metagenomic read, antibiotic resistance gene, fecal sample, metagenomic sequencing, dna extraction, infant gut, bifidobacteria, based dna extraction, genome sequencing, screening for esbl, antimicrobial resistance dynamic, standardized dna input

Funders Acknowledgements:

Tromsø Research Foundation

Grant ID: TFS18_CANS_AS-HVF

Disclaimer

DISCLAIMER – FOR INFORMATIONAL PURPOSES ONLY; USE AT YOUR OWN RISK

This protocol is provided for research use only. Users are responsible for ensuring compliance with applicable institutional, ethical, and safety regulations. The authors assume no liability for misuse of the procedures described.

Abstract

This protocol describes the DNA extraction, metagenomic sequencing, and bioinformatic analysis workflow used in the study “Metabolic reprogramming of the infant gut by bifidobacteria-based probiotics drives exclusion of antibiotic-resistant pathobionts” by Bargheet et al. 2026. Fecal samples were mechanically lysed with bead beating following the addition of an internal spike-in control and processed using automated magnetic bead–based DNA extraction. Libraries were prepared from standardized DNA input, quality-controlled, pooled, circularized, and sequenced on the DNBSEQ-T7 platform using 150 bp paired-end reads. Parallel culture-based screening for ESBL-producing Enterobacterales was performed using selective chromogenic media and phenotypic confirmation.
Metagenomic reads were quality-filtered, host DNA was removed, and downstream analyses were conducted on the non-human reads. Taxonomic profiling was performed using MetaPhlAn4, while functional pathway analysis was conducted using HUMAnN3. Antibiotic resistance genes were identified by aligning sequences to the CARD database using stringent filtering and were normalized to RPKM. Mobile genetic elements were quantified using ShortBRED with MobileOG-derived markers. Strain-level analysis and replication rate estimation were performed using StrainGE and CoPTR, respectively. For a subset of matched ESBL-E isolates, long-read whole-genome sequencing enabled high-resolution validation and host attribution of resistance genes. Together, this workflow integrates metagenomic, resistome, mobilome, and isolate genomics to characterize antimicrobial resistance dynamics in infant gut microbiomes.

Materials

- Fecal samples
- Liquefaction reagent (DNA Genotek, Canada)
- Vibrio campbellii culture
- ZR BashingBead lysis tubes (Zymo Research, USA)
- ZymoBIOMICS Microbial Community Standard (Zymo Research, USA)
- FastPrep-24 5G instrument (MP Biomedicals, USA)
- Tecan Fluent liquid-handling platform (Tecan, Switzerland)
- ZymoBIOMICS 96 MagBead DNA protocol (Zymo Research)
- EB buffer (Qiagen, Germany)
- QuantIT High Sensitivity dsDNA Assay
- Tecan Spark plate reader (Tecan, Switzerland)
- Alicyclobacillus acidophilus DNA
- MGI FS Library Prep Set (MGI Tech, China)
- Agilent TapeStation D1000 kit (Agilent Technologies, USA)
- MGI Easy Circularization Kit (MGI Tech, China)
- DNBSEQ-T7 platform (MGI Tech, China)
- ChromID ESBL (BioMérieux, France)
- MALDI-TOF MS (Bruker Daltonik)
- Double disk approximation test (Liofilchem, Italy)
- eSwab®
- FastQC v.0.11.9
- Trim Galore v.0.6.10
- MetaPhlAn4 v.4.0.6
- ChocoPhlAn database
- Comprehensive Antibiotic Resistance Database (CARD)
- Bowtie2 v.2.4.4
- SAMtools v.1.12
- ShortBRED v.0.9.5
- MobileOG v.1.6
- StrainGST v.1.3.3
- MEGAHIT v.1.2.9
- MetaQUAST from QUAST v.5.2.0
- HUMAnN v3.0.1
- Oxford Nanopore Technologies (ONT) GridION platform
- GenFind V3 reagent kit (Beckman Coulter Life Sciences, Indianapolis, USA)
- Dorado v7.6.7
- Filtlong v0.2.1
- Autocycler v0.4.0
- fast_count
- QUAST v5.2.0
- BioProject PRJNA1370689
- Abricate v1.0.1
- CARD database v4.0.1
- CARD_AMR_clustered.csv
- Speciator v4.0.0
- Prodigal v2.6.3
- DIAMOND v.2.1.6

DNA extraction from human stool samples and sequencing

Fecal samples were thawed and mixed with liquefaction reagent (DNA Genotek, Canada). To each sample, 10 µL of a Vibrio campbellii culture (10⁴ cells/µL) was added as an internal spike-in control. A total of 800 µL of the mixture was transferred to ZR BashingBead lysis tubes containing a combination of 0.1- and 0.5-mm beads (Zymo Research, USA) for mechanical lysis. Each extraction batch included a positive and a negative control: the positive control consisted of 75 µL of the ZymoBIOMICS Microbial Community Standard (Zymo Research, USA) in 725 µL liquefaction reagent, while the negative control consisted of 700 µL liquefaction reagent and 100 µL of Vibrio campbellii culture (10⁴ cells/µL).

Samples were mechanically lysed using the FastPrep-24 5G instrument (MP Biomedicals, USA) and subsequently centrifuged at 10,000 × g for 1 minute. A volume of 200 µL of the resulting supernatant was transferred to a deep-well plate for DNA extraction. The extraction was performed using the Tecan Fluent liquid-handling platform (Tecan, Switzerland) according to the ZymoBIOMICS 96 MagBead DNA protocol (Zymo Research). DNA was eluted in 50 µL of EB buffer (Qiagen, Germany) and stored at –20 °C.

DNA concentrations were quantified using the QuantIT High Sensitivity dsDNA Assay on a Tecan Spark plate reader (Tecan, Switzerland). Samples with DNA concentrations above 1.1 ng/µL were spiked with 1% Alicyclobacillus acidophilus DNA (relative to total input), while samples with concentrations below 1.1 ng/µL were supplemented with Alicyclobacillus DNA to reach a total of 50 ng per reaction. The negative sequencing control was also spiked with Alicyclobacillus DNA to monitor potential background signal.
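The spike-in rule above can be sketched as a small calculation. This is only an illustration: the 1.1 ng/µL cut-off, the 1% fraction, and the 50 ng total come from the text, whereas the maximum sample volume per reaction (`MAX_SAMPLE_UL`, 45 µL) and the function and class names are assumptions introduced here, not values stated in the protocol.

```python
from dataclasses import dataclass

THRESHOLD_NG_PER_UL = 1.1  # concentration cut-off (from the protocol)
TOTAL_INPUT_NG = 50.0      # DNA per library-prep reaction (from the protocol)
SPIKE_FRACTION = 0.01      # 1% Alicyclobacillus DNA of total input (from the protocol)
MAX_SAMPLE_UL = 45.0       # ASSUMPTION: largest sample volume that fits in one reaction


@dataclass
class SpikePlan:
    sample_ul: float  # volume of sample DNA to pipette
    sample_ng: float  # mass of sample DNA in the reaction
    spike_ng: float   # mass of Alicyclobacillus DNA to add


def plan_spike_in(conc_ng_per_ul: float) -> SpikePlan:
    """Return sample volume and spike-in mass for one library-prep reaction."""
    if conc_ng_per_ul < 0:
        raise ValueError("concentration cannot be negative")
    if conc_ng_per_ul > THRESHOLD_NG_PER_UL:
        # Enough DNA: the spike-in makes up 1% of the 50 ng total input.
        spike_ng = TOTAL_INPUT_NG * SPIKE_FRACTION
        sample_ng = TOTAL_INPUT_NG - spike_ng
        sample_ul = sample_ng / conc_ng_per_ul
    else:
        # Low-yield sample: use the full volume and top up to 50 ng with spike-in DNA.
        sample_ul = MAX_SAMPLE_UL
        sample_ng = conc_ng_per_ul * MAX_SAMPLE_UL
        spike_ng = TOTAL_INPUT_NG - sample_ng
    return SpikePlan(sample_ul, sample_ng, spike_ng)


if __name__ == "__main__":
    for conc in (10.0, 0.5, 0.0):  # hypothetical concentrations in ng/µL
        print(conc, plan_spike_in(conc))
```

With these assumed settings, a concentration of 0 ng/µL (for example the negative sequencing control) receives the full 50 ng as spike-in DNA.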

Library preparation was performed using 50 ng of input DNA and the MGI FS Library Prep Set (MGI Tech, China). Library quality was assessed using the Agilent TapeStation D1000 kit (Agilent Technologies, USA), and concentrations were confirmed using the same QuantIT assay. Equimolar amounts of the resulting libraries were pooled to a final concentration of 100 pM, circularized using the MGI Easy Circularization Kit (MGI Tech, China), and sequenced on the DNBSEQ-T7 platform (MGI Tech, China) using 150 bp paired-end reads.
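Equimolar pooling requires converting each library's mass concentration into molarity. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names, the example concentrations, and the fragment lengths are hypothetical, and the protocol does not state how the conversion was actually done.

```python
from typing import Dict, Tuple

AVG_BP_MASS_G_PER_MOL = 660.0  # common approximation for one dsDNA base pair


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA concentration (ng/µL) and mean fragment size (bp) to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def equimolar_volumes(
    libraries: Dict[str, Tuple[float, float]], fmol_per_library: float
) -> Dict[str, float]:
    """Volume (µL) of each library that contributes the same number of fmol.

    `libraries` maps a library name to (concentration in ng/µL, mean size in bp).
    Because 1 nM equals 1 fmol/µL, volume = fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nM(conc, size_bp)
        for name, (conc, size_bp) in libraries.items()
    }


if __name__ == "__main__":
    # Hypothetical QuantIT concentrations and TapeStation fragment sizes.
    libs = {"lib_A": (6.6, 500.0), "lib_B": (13.2, 500.0)}
    for name, volume in equimolar_volumes(libs, fmol_per_library=40.0).items():
        print(f"{name}: {volume:.2f} µL")
```

The pooled libraries would then be diluted to the target concentration (100 pM in this protocol) before circularization.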

Bioinformatics preprocessing of metagenomic data

Paired-end reads were checked for quality using FastQC v.0.11.9 (Andrews 2010). Adapter sequences and low-quality reads were removed using Trim Galore v.0.6.10 (Krueger 2012) with default parameters. Human DNA contamination was removed from all samples by discarding reads that mapped to the human reference genome (GRCh38, downloaded from NCBI GenBank in 2022) using Bowtie2 v.2.4.4 with the --very-sensitive --end-to-end parameters. Read pairs in which neither mate mapped to the human genome were extracted using SAMtools v.1.12 (Li et al. 2009) with -f 12 -F 256 and used in subsequent analyses.
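The SAMtools filter `-f 12 -F 256` operates on the bitwise FLAG field of each alignment record: `-f 12` requires both the "read unmapped" (4) and "mate unmapped" (8) bits to be set, and `-F 256` rejects secondary alignments. A minimal sketch of the same test, with a hypothetical function name, is:

```python
READ_UNMAPPED = 0x4   # SAM flag bit: this read did not align
MATE_UNMAPPED = 0x8   # SAM flag bit: the mate did not align
SECONDARY = 0x100     # SAM flag bit: secondary alignment


def keep_as_non_human(flag: int) -> bool:
    """Mimic `samtools view -f 12 -F 256` for a single SAM FLAG value."""
    required = READ_UNMAPPED | MATE_UNMAPPED  # 12: all of these bits must be set
    excluded = SECONDARY                      # 256: none of these bits may be set
    return (flag & required) == required and (flag & excluded) == 0


if __name__ == "__main__":
    # 77 and 141 are the two mates of a pair in which neither read mapped;
    # 99 is a properly paired mapped read; 73 has a mapped read with an unmapped mate.
    for flag in (77, 141, 99, 73):
        print(flag, keep_as_non_human(flag))
```

Pairs in which only one mate aligns to the human genome (for example flag 73) are therefore discarded together with fully mapped pairs.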

Metagenome, resistome, and mobilome profiling

Taxonomic profiles were generated using MetaPhlAn4 v.4.0.6 (Blanco-Míguez et al. 2023) against the ChocoPhlAn database. Resistome annotation was performed by mapping the metagenomic reads against the nucleotide_fasta_protein_homolog_model of the Comprehensive Antibiotic Resistance Database (CARD v.3.2.9; Jia et al. 2017) using Bowtie2 v.2.4.4 (Langmead and Salzberg 2012) with the --very-sensitive-local parameter.

For ARG annotation, a coverage threshold of 80% was used. Using SAMtools v.1.12 (Li et al. 2009), the mapped reads were separated from the unmapped reads, sorted, and indexed, and the number of reads mapped to each ARG was counted. For each sample, the counts were then normalized to gene length and sequencing depth by calculating reads per kilobase of reference per million mapped reads (RPKM). The clinical relevance of individual ARGs was based on the AMR gene families previously reported as clinically relevant by Diebold et al. (2023) and in the National Database of Antibiotic Resistant Organisms (NDARO). Specifically, we excluded ARGs encoding efflux pumps and other metabolic functions because distinguishing their physiological roles from antibiotic resistance is difficult. Although this filtering enriches the dataset for clinically relevant, horizontally acquired ARGs, metagenomic annotation cannot always fully distinguish intrinsic from acquired resistance determinants; therefore, some degree of overlap may remain.
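The coverage filter and RPKM normalization can be illustrated as follows. This is a sketch under stated assumptions: "coverage" is interpreted as the fraction of reference bases covered by at least one read, the denominator is the total number of mapped reads in the sample, and all function names and example numbers are hypothetical.

```python
from typing import Dict, Tuple


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of reference per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def arg_rpkm_table(
    genes: Dict[str, Tuple[int, int, int]],
    total_mapped_reads: int,
    min_coverage: float = 0.80,  # 80% coverage threshold from the protocol
) -> Dict[str, float]:
    """Keep ARGs passing the coverage threshold and report their RPKM.

    `genes` maps an ARG name to (read count, gene length in bp, covered bases).
    ASSUMPTION: coverage = covered bases / gene length.
    """
    table = {}
    for name, (reads, length_bp, covered_bp) in genes.items():
        if covered_bp / length_bp >= min_coverage:
            table[name] = rpkm(reads, length_bp, total_mapped_reads)
    return table


if __name__ == "__main__":
    # Hypothetical counts for two genes in one sample.
    sample = {"gene_A": (200, 876, 876), "gene_B": (50, 1920, 960)}
    print(arg_rpkm_table(sample, total_mapped_reads=2_000_000))
```

In this example `gene_B` is dropped because only half of its reference length is covered.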

The relative abundance of MGEs was quantified using ShortBRED v.0.9.5 (Kaminski et al. 2015). Specifically, MGE sequences from MobileOG v.1.6 (Brown et al. 2022) served as the proteins of interest for identifying marker families using ‘shortbred_identify.py’ with the ‘--clustid 0.95’ option. MGE read counts were normalized to RPKM using ‘shortbred_quantify.py’. We also used CoPTR (compute peak-to-trough ratio) v.1.1.4 (Joseph et al. 2022) to estimate the impact of probiotic supplementation on the replication rates of the two most important ESBL-E bacteria, E. coli and K. pneumoniae. Strain analysis was performed using StrainGST from StrainGE v.1.3.3 (van Dijk et al. 2022). Reference genomes for B. longum were downloaded from NCBI on 20 October 2025 (n = 102 genomes). The database was built with StrainGE and is available on Zenodo (DOI: 10.5281/zenodo.17671286). The quality-filtered short reads were assembled into contigs using MEGAHIT v.1.2.9 (Li et al. 2015) with default parameters.

For assembly assessment, MetaQUAST from QUAST v.5.2.0 (Gurevich et al. 2013b) was used with the -m 1000 option (Table S16). Functional pathway profiling was performed using HUMAnN v3.0.1 (Beghini et al. 2021), generating both community-level and species-resolved MetaCyc pathway abundances from the quality-filtered metagenomic reads. The per-sample pathway abundance tables were merged with humann_join_tables and normalized to relative abundances using humann_renorm_table. The normalized pathway tables were then tested for differential abundance using the Mann–Whitney U test with Benjamini–Hochberg FDR correction. Analyses were conducted separately for unstratified (community-level) and stratified (species-level) outputs, focusing on the High and Low Bifidobacterium clusters.
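The Benjamini–Hochberg step can be sketched in plain Python. The function below takes one p-value per pathway (for example from a Mann–Whitney U test comparing the two clusters) and returns FDR-adjusted values in the original order. The function name and the example p-values are hypothetical, and this is not necessarily the implementation used in the study.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-pathway p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  q={q:.3f}")
```

Pathways whose adjusted value falls below the chosen FDR level would be reported as differentially abundant; the protocol does not state that level.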

Linking ARGs in metagenomes to ESBL-E isolates

To validate the hosts of the detected ARGs, we linked ESBL-E isolate genomes to their corresponding metagenomic samples. Among the 250 samples, 47 had a corresponding cultured ESBL-E isolate. Low-quality or unsequenced isolates were removed (n=10), and to ensure unambiguous metagenome-isolate linkage, samples with more than one recovered ESBL-E isolate were excluded (n=7).

The ESBL-E isolates from the resulting 32 samples were whole-genome sequenced on the Oxford Nanopore Technologies (ONT) GridION platform using R10.4.1 flow cells and the rapid SQK-RBK114-96 barcoding kit (ONT, Oxford, UK). DNA was extracted using the GenFind V3 reagent kit with the Bacteria protocol (Beckman Coulter Life Sciences, Indianapolis, USA). Base calling was performed with Dorado v7.6.7 using the super accurate model ([email protected]). The reads were filtered with Filtlong v0.2.1, genome assemblies were generated with Autocycler v0.4.0, and all genomes were fully circularized. Assembly quality was assessed using fast_count and QUAST v5.2.0. The genome sequences have been deposited under BioProject PRJNA1370689; the accessions are listed in Table S12.

ARGs in both the isolate and metagenome assemblies were identified using Abricate v1.0.1 with the CARD database v4.0.1, applying a 99% sequence identity and length threshold. ESBL-encoding genes were assigned based on annotation in CARD_AMR_clustered.csv. Species were identified with Speciator v4.0.0.

ESBL-E screening

Fecal samples collected with eSwab® were cultured using chromogenic agars selective for ESBL-E (ChromID ESBL, BioMérieux, Marcy l’Étoile, France). All samples were also plated on non-selective blood agar for growth control. Agar plates were incubated at 35 °C under normal conditions for 18-24 hours before inspection. When no growth was observed on blood agar, the sample was considered invalid.
Samples that showed no growth on the chromogenic agars after 18-24 h of incubation were reported as negative for ESBL-E. ESBL-E suspect colonies were identified at the species level using MALDI-TOF MS (Bruker Daltonik). An ESBL-E phenotype was confirmed using the double disk approximation test (Liofilchem, Roseto degli Abruzzi, Italy). 
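The reporting logic of the screening can be summarized as a small decision function. This is an illustrative sketch only: the names are hypothetical, and the handling of a suspect colony that fails phenotypic confirmation (reported here as negative) is an assumption, since the protocol does not state it.

```python
from enum import Enum
from typing import Optional


class ScreenResult(Enum):
    INVALID = "invalid"
    NEGATIVE = "negative"
    SUSPECT = "suspect, confirmation pending"
    POSITIVE = "ESBL-E positive"


def classify_sample(
    blood_agar_growth: bool,
    chromogenic_growth: bool,
    esbl_confirmed: Optional[bool] = None,
) -> ScreenResult:
    """Classify one sample after 18-24 h of incubation.

    `esbl_confirmed` is None until species identification (MALDI-TOF MS) and
    the double disk approximation test have been completed.
    """
    if not blood_agar_growth:
        return ScreenResult.INVALID   # growth control failed
    if not chromogenic_growth:
        return ScreenResult.NEGATIVE  # no growth on ESBL-selective agar
    if esbl_confirmed is None:
        return ScreenResult.SUSPECT
    # ASSUMPTION: an unconfirmed phenotype is reported as negative.
    return ScreenResult.POSITIVE if esbl_confirmed else ScreenResult.NEGATIVE


if __name__ == "__main__":
    print(classify_sample(True, True, True).value)
    print(classify_sample(False, True).value)
```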

Analysis of ESBL genes in isolates and corresponding metagenomes

The 32 ESBL-E isolates and 33 ESBL genes were selected for resistome comparison. Abricate v1.0.1 with the CARD database v4.0.1 was used with a 99.0% sequence identity and length threshold to identify ARGs for the isolate-metagenome comparison. To assign taxonomy, open reading frames (ORFs) were predicted with Prodigal v.2.6.3 (default parameters) from contigs longer than 1000 bp that harboured ARGs identified by Abricate against CARD. The predicted amino acid sequences were then compared against the NCBI non-redundant database using DIAMOND v.2.1.6 with the following options: `--evalue 0.00001`, `--id 95`, and `--query-cover 95`. For contigs assigned to multiple species, we retained only those where at least 75% of ORFs were assigned to a single species.
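The 75% majority rule for contig-level taxonomy can be sketched as below. The function name is hypothetical, and the sketch assumes that the input list contains one species label per ORF with a DIAMOND hit passing the thresholds above; how ORFs without a hit were counted is not stated in the protocol.

```python
from collections import Counter
from typing import Optional, Sequence


def assign_contig_species(
    orf_species: Sequence[str], min_fraction: float = 0.75
) -> Optional[str]:
    """Return the species supported by at least `min_fraction` of ORFs, else None."""
    if not orf_species:
        return None  # no annotated ORFs on this contig
    species, count = Counter(orf_species).most_common(1)[0]
    if count / len(orf_species) >= min_fraction:
        return species
    return None  # ambiguous contig: discarded


if __name__ == "__main__":
    # Hypothetical ORF-level assignments for two ARG-carrying contigs.
    contig_1 = ["Escherichia coli"] * 3 + ["Klebsiella pneumoniae"]
    contig_2 = ["Escherichia coli"] * 2 + ["Klebsiella pneumoniae"]
    print(assign_contig_species(contig_1))
    print(assign_contig_species(contig_2))
```

Here the first contig is kept (3 of 4 ORFs agree) and the second is discarded (2 of 3 ORFs agree).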

Protocol references

Andrews, Simon. 2010. 'FastQC: a quality control tool for high throughput sequence data'. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc

Beghini, Francesco, Lauren J. McIver, Aitor Blanco-Míguez, Leonard Dubois, Francesco Asnicar, Sagun Maharjan, Ana Mailyan, Paolo Manghi, Matthias Scholz, Andrew Maltez Thomas, Mireia Valles-Colomer, George Weingart, Yancong Zhang, Moreno Zolfo, Curtis Huttenhower, Eric A. Franzosa, and Nicola Segata. 2021. 'Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3', eLife, 10: e65088.

Blanco-Míguez, Aitor, Francesco Beghini, Fabio Cumbo, Lauren J McIver, Kelsey N Thompson, Moreno Zolfo, Paolo Manghi, Leonard Dubois, Kun D Huang, and Andrew Maltez Thomas. 2023. 'Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4', Nature Biotechnology, 41: 1633–44.

Brown, Connor L, James Mullet, Fadi Hindi, James E Stoll, Suraj Gupta, Minyoung Choi, Ishi Keenum, Peter Vikesland, Amy Pruden, and Liqing Zhang. 2022. 'mobileOG-db: a manually curated database of protein families mediating the life cycle of bacterial mobile genetic elements', Applied and Environmental Microbiology, 88: e00991–22.

Buchfink, Benjamin, Chao Xie, and Daniel H Huson. 2015. 'Fast and sensitive protein alignment using DIAMOND', Nature Methods, 12: 59–60.

Diebold, Peter J., Matthew W. Rhee, Qiaojuan Shi, Nguyen Vinh Trung, Fayaz Umrani, Sheraz Ahmed, Vandana Kulkarni, Prasad Deshpande, Mallika Alexander, Ngo Thi Hoa, Nicholas A. Christakis, Najeeha Talat Iqbal, Syed Asad Ali, Jyoti S. Mathad, and Ilana L. Brito. 2023. 'Clinically relevant antibiotic resistance genes are linked to a limited set of taxa within gut microbiome worldwide', Nature Communications, 14: 7366.

Gurevich, Alexey, Vladislav Saveliev, Nikolay Vyahhi, and Glenn Tesler. 2013b. 'QUAST: quality assessment tool for genome assemblies', Bioinformatics, 29: 1072–75.

Hyatt, Doug, Gwo-Liang Chen, Philip F LoCascio, Miriam L Land, Frank W Larimer, and Loren J Hauser. 2010. 'Prodigal: prokaryotic gene recognition and translation initiation site identification', BMC Bioinformatics, 11: 1–11.

Jia, B., A. R. Raphenya, B. Alcock, N. Waglechner, P. Guo, K. K. Tsang, B. A. Lago, B. M. Dave, S. Pereira, A. N. Sharma, S. Doshi, M. Courtot, R. Lo, L. E. Williams, J. G. Frye, T. Elsayegh, D. Sardar, E. L. Westman, A. C. Pawlowski, T. A. Johnson, F. S. Brinkman, G. D. Wright, and A. G. McArthur. 2017. 'CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database', Nucleic Acids Res, 45: D566–D573.

Joseph, Tyler A, Philippe Chlenski, Aviya Litman, Tal Korem, and Itsik Pe'er. 2022. 'Accurate and robust inference of microbial growth dynamics from metagenomic sequencing reveals personalized growth rates', Genome Research, 32: 558–68. 

Kaminski, James, Molly K Gibson, Eric A Franzosa, Nicola Segata, Gautam Dantas, and Curtis Huttenhower. 2015. 'High-specificity targeted functional profiling in microbial communities with ShortBRED', PLOS Computational Biology, 11: e1004557.

Krueger, Felix. 2012. 'Trim Galore'. Available online at: https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/

Langmead, Ben, and Steven L. Salzberg. 2012. 'Fast gapped-read alignment with Bowtie 2', Nature Methods, 9: 357–59.

Li, Dinghua, Chi-Man Liu, Ruibang Luo, Kunihiko Sadakane, and Tak-Wah Lam. 2015. 'MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph', Bioinformatics, 31: 1674–76. 

Li, Heng, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, and Richard Durbin. 2009. 'The sequence alignment/map format and SAMtools', Bioinformatics, 25: 2078–79.

van Dijk, Lucas R, Bruce J Walker, Timothy J Straub, Colin J Worby, Alexandra Grote, Henry L Schreiber IV, Christine Anyansi, Amy J Pickering, Scott J Hultgren, and Abigail L Manson. 2022. 'StrainGE: a toolkit to track and characterize low-abundance strains in complex microbial communities', Genome Biology, 23: 74.

Wick, Ryan R., Benjamin P. Howden, and Timothy P. Stinear. 2025. 'Autocycler: long-read consensus assembly for bacterial genomes', Bioinformatics, 41: btaf474.