Mapping Community Meta-transcriptomics to gapseq-Annotated Metabolic Reactions and Transporters

Zhelun Zhang

Mar 18, 2026

Mapping Community Meta-transcriptomics to gapseq-Annotated Metabolic Reactions and Transporters

DOI

https://dx.doi.org/10.17504/protocols.io.8epv5yoynl1b/v1

Zhelun Zhang¹

¹MEME lab microbial ecology

Zhelun Zhang

DOI: https://dx.doi.org/10.17504/protocols.io.8epv5yoynl1b/v1

Protocol Citation: Zhelun Zhang 2026. Mapping Community Meta-transcriptomics to gapseq-Annotated Metabolic Reactions and Transporters. protocols.io https://dx.doi.org/10.17504/protocols.io.8epv5yoynl1b/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: In development

We are still developing and optimizing this protocol

Created: March 15, 2026

Last Modified: March 18, 2026

Protocol Integer ID: 313305

Keywords: transcriptomic reads against functional annotation, transcriptomics to gapseq, transcriptomic read, annotated metabolic reaction, transcriptomic, scale metabolic model, mapping community meta, functional annotation bridging, specific gtf files from functional table, genome, functional annotation, rrna, read alignment, workflow for mapping, depletion of rrna

Abstract

This protocol details a workflow for mapping and counting meta-transcriptomic reads against functional annotations derived from genome-scale metabolic models (gapseq). 
The pipeline consists of four primary modules: 
Pre-processing: Quality filtering and depletion of rRNA. 
Read Alignment: High-efficiency mapping using Bowtie2 and Samtools. 
Functional Annotation Bridging: Automated generation of reaction-specific GTF files from functional tables (.tbl). Summarization: HTSeq-count.

Materials

module load python/3.10.8 module load samtools module load bowtie2 module load bedtools2/2.30.0

Safety warnings

## give HTSeq-count at least 60 gb memory, 100 gb is good ## 3 hours might be required for each organism across three samples

Before start

For custom scripts used in this pipeline please refer to the following GitHub repository https://github.com/Zhelunnn/Metatranscriptomic_for_gapseq

### Please let me know if there is any problem

Mapping Community Metatranscriptomics to gapseq-Annotated Metabolic Reactions and Transporters

module load python/3.10.8
module load samtools      
module load bowtie2       
module load bedtools2/2.30.0
# Please run this pipeline on the university supercomputer

trimmomatic PE -threads 12 -phred33 ${reads}.fastq HEADCROP:33 SLIDINGWINDOW:4:20 MINLEN:50

## remove rRNA 
export PATH=sortmerna-4.3.6-Linux/bin:$PATH

sortmerna --workdir ${sample_name}/P1_sortmerna --ref rRNA_DB/CC_5_16_23s_rna_db.fa --reads ${sample_name}_paired_R1.fastq -other ${sample_name}_paired_R1 -fastx fastq

# Build Bowtie2 index from the reference genome FASTA file
# This step prepares the genome for efficient read alignment

bowtie2-build ${genome}.fa ${genome}
bowtie2 -x ${genome} -1 ${reads}_paired_R1.fq -2 ${reads}_paired_R2.fq -S ${genome}.sam -p 12

# Convert SAM (text format) to BAM (binary format) to reduce file size and enable downstream processing
samtools view -bS ${genome}.sam -o ${genome}.bam

# Sort BAM file by genomic coordinates
# Required for indexing and most downstream analyses
samtools sort ${genome}.bam -o ${genome}_sorted.bam

# Create index file (.bai) for the sorted BAM
# Enables rapid random access to alignment data
samtools index ${genome}_sorted.bam

rm ${genome}.sam
rm ${genome}.bam

######################################################################
################## for transporter transcripts ################################
# Convert the gapseq annotation to GTF

python GTF_generate_20230815.py -T ${genome}-Transporter.tbl -o ${genome}_transporters_raw.gtf

htseq-count -r pos -t CDS -m intersection-nonempty --nonunique all -f bam ${genome}_sorted.bam ${genome}_transporters.gtf -c transporters_readscount_HTSeq.csv

--nonunique all: Reads (or read pairs) assigned to multiple features are counted as ambiguous and are also included in the counts for each assigned feature. Additionally, reads that align to more than one location are labeled as alignment_not_unique and are also counted separately for each alignment location.

-m intersection-nonempty 
This option counts reads that are ambiguously assigned to multiple features.
check the figure in the site:https://htseq.readthedocs.io/en/release_0.11.1/count.html?highlight=ambiguous#counting-reads-in-features-with-htseq-count. 

## allocate HTSeq-count at least 60 gb memory
## 3 hours might be required for each sample

######################################################################
################## for reaction transcripts ################################
# Convert the gapseq annotation to GTF

python GTF_generate_20230815.py -G ${genome}-all-Reactions.tbl -o ${genome}_reactions_raw.gtf

htseq-count -r pos -t CDS -m intersection-nonempty --nonunique all -f bam ${genome}_sorted.bam ${genome}_reactions.gtf -c reactions_readscount_HTSeq.csv

sed -i 's/\t/,/g' transporters_readscount_${transreads_version}_HTSeq.csv

### sort according to number of mapped reads
sort -n -k2 -r -t, transporters_readscount__HTSeq.csv > transporters_readscount__HTSeq_sorted.csv

######### Normalize and summarize the metatranscriptomic reads mapped to each bacterial genome and produce a summary file ########
#### Parameters a, b and c represent three replicates ##########
#    Rscript summary_transporter_reaction_0911.R -a transporters_readscount_${transreads_version_1}_HTSeq_sorted.csv -b transporters_readscount_${transreads_version_2}_HTSeq_sorted.csv -c transporters_readscount_${transreads_version_3}_HTSeq_sorted.csv -n ${genome} -g ${genome}-Transporter.tbl -o ${genome}_${reads_date}_transporter_reads_count.tsv