Mar 18, 2026

Public workspaceMapping Community Meta-transcriptomics to gapseq-Annotated Metabolic Reactions and Transporters

  • Zhelun Zhang1
  • 1MEME lab microbial ecology
Icon indicating open access to content
QR code linking to this content
Protocol CitationZhelun Zhang 2026. Mapping Community Meta-transcriptomics to gapseq-Annotated Metabolic Reactions and Transporters. protocols.io https://dx.doi.org/10.17504/protocols.io.8epv5yoynl1b/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License,  which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: In development
We are still developing and optimizing this protocol
Created: March 16, 2026
Last Modified: March 18, 2026
Protocol Integer ID: 313305
Keywords: transcriptomic reads against functional annotation, transcriptomics to gapseq, transcriptomic read, annotated metabolic reaction, transcriptomic, scale metabolic model, mapping community meta, functional annotation bridging, specific gtf files from functional table, genome, functional annotation, rrna, read alignment, workflow for mapping, depletion of rrna
Abstract
This protocol details a workflow for mapping and counting meta-transcriptomic reads against functional annotations derived from genome-scale metabolic models (gapseq).
The pipeline consists of four primary modules:
Pre-processing: Quality filtering and depletion of rRNA.
Read Alignment: High-efficiency mapping using Bowtie2 and Samtools.
Functional Annotation Bridging: Automated generation of reaction-specific GTF files from functional tables (.tbl). Summarization: HTSeq-count.
Materials
module load python/3.10.8 module load samtools module load bowtie2 module load bedtools2/2.30.0
Troubleshooting
Safety warnings
## give HTSeq-count at least 60 gb memory, 100 gb is good ## 3 hours might be required for each organism across three samples
Before start
For custom scripts used in this pipeline please refer to the following GitHub repository https://github.com/Zhelunnn/Metatranscriptomic_for_gapseq

### Please let me know if there is any problem
Mapping Community Metatranscriptomics to gapseq-Annotated Metabolic Reactions and Transporters
module load python/3.10.8
module load samtools
module load bowtie2
module load bedtools2/2.30.0
# Please run this pipeline on the university supercomputer
trimmomatic PE -threads 12 -phred33 ${reads}.fastq HEADCROP:33 SLIDINGWINDOW:4:20 MINLEN:50
## remove rRNA
export PATH=sortmerna-4.3.6-Linux/bin:$PATH

sortmerna --workdir ${sample_name}/P1_sortmerna --ref rRNA_DB/CC_5_16_23s_rna_db.fa --reads ${sample_name}_paired_R1.fastq -other ${sample_name}_paired_R1 -fastx fastq
# Build Bowtie2 index from the reference genome FASTA file
# This step prepares the genome for efficient read alignment

bowtie2-build ${genome}.fa ${genome}
bowtie2 -x ${genome} -1 ${reads}_paired_R1.fq -2 ${reads}_paired_R2.fq -S ${genome}.sam -p 12
# Convert SAM (text format) to BAM (binary format) to reduce file size and enable downstream processing
samtools view -bS ${genome}.sam -o ${genome}.bam
# Sort BAM file by genomic coordinates
# Required for indexing and most downstream analyses
samtools sort ${genome}.bam -o ${genome}_sorted.bam
# Create index file (.bai) for the sorted BAM
# Enables rapid random access to alignment data
samtools index ${genome}_sorted.bam
rm ${genome}.sam
rm ${genome}.bam
######################################################################
################## for transporter transcripts ################################
# Convert the gapseq annotation to GTF

python GTF_generate_20230815.py -T ${genome}-Transporter.tbl -o ${genome}_transporters_raw.gtf
htseq-count -r pos -t CDS -m intersection-nonempty --nonunique all -f bam ${genome}_sorted.bam ${genome}_transporters.gtf -c transporters_readscount_HTSeq.csv

--nonunique all: Reads (or read pairs) assigned to multiple features are counted as ambiguous and are also included in the counts for each assigned feature. Additionally, reads that align to more than one location are labeled as alignment_not_unique and are also counted separately for each alignment location.

-m intersection-nonempty
This option counts reads that are ambiguously assigned to multiple features.
check the figure in the site:https://htseq.readthedocs.io/en/release_0.11.1/count.html?highlight=ambiguous#counting-reads-in-features-with-htseq-count.

## allocate HTSeq-count at least 60 gb memory
## 3 hours might be required for each sample
######################################################################
################## for reaction transcripts ################################
# Convert the gapseq annotation to GTF

python GTF_generate_20230815.py -G ${genome}-all-Reactions.tbl -o ${genome}_reactions_raw.gtf
htseq-count -r pos -t CDS -m intersection-nonempty --nonunique all -f bam ${genome}_sorted.bam ${genome}_reactions.gtf -c reactions_readscount_HTSeq.csv


sed -i 's/\t/,/g' transporters_readscount_${transreads_version}_HTSeq.csv
### sort according to number of mapped reads
sort -n -k2 -r -t, transporters_readscount__HTSeq.csv > transporters_readscount__HTSeq_sorted.csv
######### Normalize and summarize the metatranscriptomic reads mapped to each bacterial genome and produce a summary file ########
#### Parameters a, b and c represent three replicates ##########
# Rscript summary_transporter_reaction_0911.R -a transporters_readscount_${transreads_version_1}_HTSeq_sorted.csv -b transporters_readscount_${transreads_version_2}_HTSeq_sorted.csv -c transporters_readscount_${transreads_version_3}_HTSeq_sorted.csv -n ${genome} -g ${genome}-Transporter.tbl -o ${genome}_${reads_date}_transporter_reads_count.tsv