Jun 15, 2026

ExpressPath: RNA-seq Time Course Analysis Protocol

  • Bruno Pavletić1
  • 1Ruđer Bošković Institute
  • AdenoTeam
Protocol Citation: Bruno Pavletić 2026. ExpressPath: RNA-seq Time Course Analysis Protocol. protocols.io https://dx.doi.org/10.17504/protocols.io.q26g7qr9klwz/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: In development
We are still developing and optimizing this protocol
Created: June 11, 2026
Last Modified: June 15, 2026
Protocol Integer ID: 318907
Keywords: seq time course analysis protocol expresspath, complete differential expression analysis with time course, complete differential expression analysis, pathway enrichment, transcription factor analysis, expresspath, rna, raw rna, lists of gene, classification of gene, gene, biological process
Abstract
ExpressPath is a pipeline that takes your raw RNA-seq count data and
produces a complete differential expression analysis with time course and
cross-cell-line comparisons. It runs from a single command and gives you:

- Lists of genes that respond to treatment (and when)
- Classification of genes into temporal patterns (transient, sustained, etc.)
- Pathway enrichment (which biological processes are affected)
- Transcription factor analysis (which regulators drive the response)
- An interactive HTML report you can open in any browser

Guidelines
RESULTS GUIDE: Understanding Your Output

The results folder contains many files. Here is what to look at first.

FIRST — Check data quality (results/YYYYMMDD_HHMMSS/qc/):

pca_plot.pdf
Each dot = one sample. Dots of the same condition should cluster
together. Different conditions should separate. If samples cluster
by batch instead of condition, there may be a batch effect.

sample_distance_heatmap.pdf
Same-condition replicates should form dark blocks (high similarity).
If a replicate looks very different from the others, investigate.

SECOND — Which genes respond (results/YYYYMMDD_HHMMSS/tables/):

signif_lrt.tsv
Every gene that changed significantly across the experiment. Open
in Excel.

Key columns:
lrt_padj — overall: did this gene change? (< 0.05 = yes)
*_log2FC — fold change for each comparison (log2 scale)
*_padj — confidence for each comparison

combined_results.tsv
Same but with ALL genes (not just significant ones).
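
If you prefer a script to Excel, the key columns of signif_lrt.tsv can be used to shortlist strong responders. The sketch below is only an illustration: it runs on a tiny made-up table, and the column name "A549_1h_vs_mock_log2FC" is a hypothetical example of the *_log2FC pattern, so substitute the names found in your own file. It also assumes the table carries a gene_name column, as the input does.

```python
import csv
import io

# Tiny made-up stand-in for signif_lrt.tsv; the real file has more columns.
# "A549_1h_vs_mock_log2FC" is a hypothetical name matching the *_log2FC pattern.
SAMPLE_TSV = (
    "gene_name\tlrt_padj\tA549_1h_vs_mock_log2FC\n"
    "GENE_A\t0.001\t2.3\n"
    "GENE_B\t0.20\t3.1\n"
    "GENE_C\t0.01\t-0.4\n"
    "GENE_D\tNA\t1.8\n"
)


def strong_responders(handle, fc_column, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Return gene names with lrt_padj below the cutoff and a large fold change."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            padj = float(row["lrt_padj"])
            log2fc = float(row[fc_column])
        except ValueError:
            continue  # skip NA or otherwise non-numeric entries
        if padj < padj_cutoff and abs(log2fc) >= min_abs_log2fc:
            hits.append(row["gene_name"])
    return hits


hits = strong_responders(io.StringIO(SAMPLE_TSV), "A549_1h_vs_mock_log2FC")
print(hits)
```

To use it on real output, replace io.StringIO(SAMPLE_TSV) with an opened file, for example open("signif_lrt.tsv", newline="").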

THIRD — Temporal patterns (results/YYYYMMDD_HHMMSS/cross_temporal/):

gene_activity.tsv
When each gene goes up or down. One row per gene per cell line.

Categories:
Transient — responds early, then returns to baseline
Sustained — responds early and stays changed
Partially_Sustained — starts but fades before the last timepoint
Secondary_Deferred — responds only at the last timepoint
Complex — irregular pattern

persistence_classes.tsv
Summary: gene -> temporal category.
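
A quick way to get an overview of the temporal categories is to count how many genes fall into each one. The following is a minimal sketch on made-up rows. The column names "gene_name" and "category" are assumptions, since the protocol does not list the header of persistence_classes.tsv, so check the first line of your file and adjust.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for persistence_classes.tsv.
# The column names "gene_name" and "category" are assumptions.
SAMPLE_TSV = (
    "gene_name\tcategory\n"
    "GENE_A\tTransient\n"
    "GENE_B\tSustained\n"
    "GENE_C\tTransient\n"
    "GENE_D\tComplex\n"
)


def count_categories(handle, column="category"):
    """Count how many genes fall into each temporal category."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


counts = count_categories(io.StringIO(SAMPLE_TSV))
for category, n in counts.most_common():
    print(f"{category}\t{n}")
```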

FOURTH — Affected pathways (results/YYYYMMDD_HHMMSS/pathway/):

gsea_kegg_signif.tsv
Enriched KEGG pathways. Open in Excel.
Key columns: NES (positive = up, negative = down), padj, contrast.

pathview_output/
KEGG pathway maps. Red = upregulated, blue = downregulated.

FIFTH — Compare cell lines (results/YYYYMMDD_HHMMSS/cross_temporal/):

cross_cellline_shared.tsv
Genes differentially expressed (DE) in both cell lines, and whether the direction of change is concordant or discordant.

cross_cellline_specific.tsv
Genes DE only in one cell line.
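
For shared genes, "concordant" is usually read as "changes in the same direction in both cell lines". The protocol does not spell out the exact rule the pipeline applies, so the sketch below is an assumption that compares only the signs of the two log2FC values. The function name is hypothetical.

```python
def direction_agreement(log2fc_first, log2fc_second):
    """Label a shared gene by whether both cell lines change in the same direction."""
    if log2fc_first == 0 or log2fc_second == 0:
        return "undetermined"  # no direction to compare
    same_sign = (log2fc_first > 0) == (log2fc_second > 0)
    return "concordant" if same_sign else "discordant"


print(direction_agreement(1.5, 0.8))    # up in both cell lines
print(direction_agreement(1.5, -0.8))   # up in one, down in the other
```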

SIXTH — Temporal divergence (results/YYYYMMDD_HHMMSS/cross_cellline/):

cross_temporal_persistence.tsv
Between-cell-line difference over time. Categories:
Constitutive — cell lines always differ
Baseline_Only — differ before treatment, converge after
Emergent_Early — differ only at first treatment timepoint
Emergent_Sustained — treatment-induced, persistent difference
Convergent — differ at baseline, become similar later

SEVENTH — Transcription factors (results/YYYYMMDD_HHMMSS/tf/):

tf_enrichment_results.tsv
Which transcription factors are predicted to regulate the differentially expressed genes (DEGs).

tf_regulatory_network_*.html
Interactive TF -> target gene networks.

How to read statistics:

log2FC = how much expression changed. +1 = 2x higher, +2 = 4x,
-1 = half. Scale is logarithmic.

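padj = p-value adjusted for multiple testing. Smaller values mean stronger evidence that the change is real. With a cutoff of padj < 0.05, about 5% of the genes called significant are expected to be false positives.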

NES = pathway enrichment score. Positive = upregulated pathway,
negative = downregulated.
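
Because log2FC is on a log2 scale, converting it back to a plain fold change is just a power of two. The short sketch below reproduces the log2FC examples given above. The helper name fold_change is only for illustration.

```python
def fold_change(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


for value in (1, 2, -1):
    print(f"log2FC {value:+d} -> {fold_change(value)}x")
```

A log2FC of 30 therefore corresponds to a ratio of about a billion, which only occurs when a gene has essentially no reads in one of the two conditions.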
Materials
Hardware:

- Any computer running Linux, macOS, or Windows with WSL (Windows
Subsystem for Linux). A machine with at least 8 GB RAM is
recommended for typical datasets (e.g., 2 cell lines x 3 timepoints
x 3 replicates).

Software:

Windows users: install WSL first

a) Open PowerShell as Administrator (Start -> search PowerShell, right-click
-> "Run as Administrator").
b) Type: wsl --install
c) Restart your computer when prompted.
d) After restart, a Linux terminal opens. Create a username and password
(write it down — it's separate from your Windows password).
e) You now have Ubuntu running inside Windows. Use this terminal for all
steps in this protocol.
f) To find it later: open the Start menu and type "Ubuntu".

conda (package manager) — install once from

a) Open your terminal.

b) Download the Miniconda installer:
(macOS Intel: replace Linux-x86_64 with MacOSX-x86_64. Apple Silicon: replace Linux-x86_64 with MacOSX-arm64.)

c) Run the installer:
bash miniconda.sh -b -p $HOME/miniconda3

d) Add conda to your terminal:
$HOME/miniconda3/bin/conda init bash
(On macOS, where zsh is the default shell, use "conda init zsh" instead.)

e) Close and reopen your terminal. Verify: conda --version

snakemake — install once:
conda install -c conda-forge -c bioconda snakemake
(Type this in your terminal and press Enter. Type "y" if asked to confirm.)

Verify: snakemake --version

Input data:

A tab-separated file (.tsv) with gene expression counts.

Required columns:
- First column: Gene ID (e.g., ENSG00000000003). No header needed.
- gene_name: Human-readable gene symbol (e.g., TSPAN6).
- Count columns: One per sample. Any column names work.
- Annotation columns (GO terms, KEGG pathways, protein families, etc.)
are carried through to output if present (common with eggNOG-mapper).

Example TSV (columns are shown separated by commas for readability; the actual file must use tabs):

,gene_name,A549_mock_1,A549_mock_2,A549_1h_1,...
ENSG00000000003,TSPAN6,150,162,145,...
ENSG00000000005,TNMD,0,0,0,...

How to get this file:
- featureCounts output is already in this format.
- STAR + RSEM: use rsem-generate-data-matrix.
- Salmon/Kallisto: use tximport in R.
- If in doubt: rows = genes, columns = samples, values = integer counts.
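
Before running the pipeline, it can save time to check that your table matches this layout. The sketch below is an illustration, not part of ExpressPath: it checks for a gene_name column and for non-negative integer counts on a small in-memory example. The function name and the annotation_columns parameter are hypothetical. List any annotation columns there so they are not checked as counts.

```python
import csv
import io

# In-memory example with real tabs, mirroring the layout described above.
SAMPLE_TSV = (
    "\tgene_name\tA549_mock_1\tA549_mock_2\n"
    "ENSG00000000003\tTSPAN6\t150\t162\n"
    "ENSG00000000005\tTNMD\t0\t0\n"
)


def check_counts_table(handle, annotation_columns=()):
    """Return a list of problems found in a tab-separated count table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if "gene_name" not in header:
        return ["missing gene_name column"]
    # Column 0 is the gene ID; gene_name and annotation columns are not counts.
    skip = {0, header.index("gene_name")}
    skip |= {header.index(c) for c in annotation_columns if c in header}
    count_columns = [i for i in range(len(header)) if i not in skip]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, found {len(row)}")
            continue
        for i in count_columns:
            if not row[i].isdigit():
                problems.append(f"line {line_no}: '{row[i]}' in column '{header[i]}' is not a non-negative integer")
    return problems


problems = check_counts_table(io.StringIO(SAMPLE_TSV))
print(problems if problems else "table looks fine")
```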
Safety warnings
"wsl: command not found" or no Ubuntu terminal
Open PowerShell as Administrator and run: wsl --install

"conda: command not found"
bash miniconda.sh -b -p $HOME/miniconda3
$HOME/miniconda3/bin/conda init bash
Close and reopen terminal. Verify: conda --version
(Apple Silicon: replace Linux-x86_64 with MacOSX-arm64.)

"snakemake: command not found"
conda install -c conda-forge -c bioconda snakemake

"ERROR: data/design.yaml not found"
Create your experiment design (Step 3) or run ./run.sh for guided setup.

"ERROR: source_tsv field missing" or "TSV file not found"
The filename in design.yaml doesn't match the file in data/.
Either rename your file or edit design.yaml's "source_tsv" field.

"Permission denied" when running ./run.sh
chmod +x run.sh

Pipeline stops with a red error
- Internet lost during install -> re-run
- Package install failed -> snakemake --use-conda -j1 --forceall
- Out of memory -> reduce cores (-j2 instead of -j4)

"I changed config.yaml but nothing changed"
cd pipeline && snakemake --use-conda -j1 --forceall

"Some genes have log2FC = 30"
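This is normal: the gene went from essentially "off" to "on", so the ratio is extreme. Check baseMean (the mean normalized count across samples) to judge how strongly the gene is expressed.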

"Heatmap looks white except a few genes"
The color scale is clamped to prevent extreme outliers from washing
out the rest. White genes still have real changes — just smaller
magnitude.

Windows/WSL: can't find my files
Your Windows drives are under /mnt/:
C:\Users\YourName\Desktop -> /mnt/c/Users/YourName/Desktop
D:\data\counts.tsv -> /mnt/d/data/counts.tsv
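
The mapping above follows a simple rule: the drive letter becomes a lowercase folder under /mnt/, and backslashes become forward slashes. The sketch below illustrates that rule. The function name is made up for this example, and inside WSL the bundled wslpath command should give you the same conversion.

```python
import re


def windows_to_wsl(path):
    """Translate an absolute Windows path (C:\\...) into its /mnt/<drive>/ form."""
    match = re.match(r"^([A-Za-z]):[\\/](.*)$", path)
    if match is None:
        raise ValueError(f"not an absolute Windows drive path: {path!r}")
    drive, rest = match.groups()
    return f"/mnt/{drive.lower()}/" + rest.replace("\\", "/")


print(windows_to_wsl(r"C:\Users\YourName\Desktop"))
print(windows_to_wsl(r"D:\data\counts.tsv"))
```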

Windows/WSL: "git: command not found"
sudo apt update && sudo apt install git -y

Windows/WSL: "curl: command not found"
sudo apt update && sudo apt install curl -y

Where to get help:
Documentation: pipeline/README_explanation.md
Before start
Checklist:

[ ] Windows users: WSL is installed (type "ubuntu" in Start menu)
[ ] conda is installed (open terminal, type "conda --version")
[ ] snakemake is installed (type "snakemake --version")
[ ] Your count TSV is ready with a gene_name column
[ ] You know your experimental design:
- Cell line names (e.g., A549, E6)
- Time points (e.g., mock, 1h, 3h)
- Treatment name (e.g., Ad26)
- Number of replicates per condition
- Batch labels (if samples were processed in batches; use "1"
for all if there is no batch effect)

If any software is missing, install it before proceeding.
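
The software items on the checklist can also be checked in one go. The sketch below is a small convenience script, not part of ExpressPath. It only looks for the commands on your PATH and does not check versions. The function name find_tools is hypothetical.

```python
import shutil


def find_tools(names=("conda", "snakemake", "git")):
    """Map each command name to its location on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


for name, location in find_tools().items():
    status = location if location else "NOT FOUND (install before proceeding)"
    print(f"{name:10s} {status}")
```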
Get the ExpressPath code
Open your terminal (Terminal on macOS/Linux, or WSL on Windows) and run:


This downloads the pipeline into a folder called "ExpressPath" and moves you into it.

You should see files like "run.sh", "README.md", and folders named "pipeline/", "data/".
Place your count data in the right folder
Copy your count TSV file into the "data" folder inside ExpressPath:

cp /path/to/your/counts.tsv data/

Replace "/path/to/your/counts.tsv" with the actual path to your file.

The filename can be anything (e.g., "my_experiment_counts.tsv").
Just remember this name - you will need it in the next step.
Create your experiment design
The experiment design tells the pipeline:
- What your cell lines are called
- What your time points are
- Which TSV column belongs to which sample
- Which comparisons to run

Two options — choose one:

OPTION A — Use the browser-based setup tool

a) Open the file "pipeline/setup_design.html" in your browser.
- Double-click it in your file manager, or
- Right-click -> "Open with" -> your browser, or
- In terminal (Linux; on macOS use "open" instead of "xdg-open"):

xdg-open pipeline/setup_design.html

b) A form appears. Fill in:

Experiment Info:
- TSV File: Click "Browse" and select your counts TSV
- Treatment Name: e.g., Ad26
- Time Points: click "+" to add: mock, 1h, 3h
(The first time point is the reference/baseline.)

Cell Lines:
- Click "Add Cell Line" for each one.
- Enter the name (e.g., A549).
- Set the number of replicates.
- For each replicate, pick the matching column from your TSV.
- (Optional) Set a batch label if samples were processed
separately.

Comparison Flags: Leave all four switched ON (default).
You can turn off specific types later by editing
"data/design.yaml" by hand.

c) Click "Download design.yaml".
Save it into the "data/" folder inside ExpressPath.
Make sure the filename is exactly "design.yaml".

OPTION B — Edit the template by hand

Copy the example and open it in any text editor:

cp data/design.example.yaml data/design.yaml
nano data/design.yaml
(Or use gedit, vim, Notepad, etc. instead of nano.)

Edit the following:
- source_tsv: "your_counts.tsv"
- Under "cell_lines": change "A549" and "E6" to your lines
- Under "timepoints": change list to yours
- Under "column_map": replace each sample column name with yours from the TSV.
Format:
"YourSampleColumnName": {cell_line: "A549", time: "mock",
replicate: 1, batch: "1"}

Save and close.
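
With many samples, typing the column_map by hand is error-prone. If your column names follow a regular scheme, the entries can be generated. The sketch below assumes names of the form cellline_time_replicate (such as A549_mock_1, as in the example TSV) and prints lines in the format shown above. The function name and the naming scheme are assumptions, so check the output against data/design.example.yaml before pasting it in.

```python
import re

# Assumed naming scheme: <cell line>_<time point>_<replicate number>
COLUMN_PATTERN = re.compile(r"^(?P<cell_line>[^_]+)_(?P<time>[^_]+)_(?P<replicate>\d+)$")


def build_column_map(columns, batch="1"):
    """Return one column_map line per sample column."""
    lines = []
    for col in columns:
        match = COLUMN_PATTERN.match(col)
        if match is None:
            raise ValueError(f"column name does not fit the assumed scheme: {col!r}")
        cell_line = match.group("cell_line")
        time = match.group("time")
        replicate = int(match.group("replicate"))
        lines.append(
            f'"{col}": {{cell_line: "{cell_line}", time: "{time}", '
            f'replicate: {replicate}, batch: "{batch}"}}'
        )
    return lines


for line in build_column_map(["A549_mock_1", "A549_mock_2", "A549_1h_1"]):
    print(line)
```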
(Optional) Adjust pipeline settings
If needed, edit "pipeline/config.yaml" in a text editor:

nano pipeline/config.yaml

Settings you might change:

pvalue_threshold: 0.05 <- significance cutoff
count_filter_threshold: 10 <- minimum reads to keep a gene
temporal_n_clusters: 6 <- number of expression patterns
gsea_min_set_size: 15 <- smallest pathway to test

The defaults work well for most experiments. You do NOT need to change anything for your first run.
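
To get a feel for what count_filter_threshold does, the sketch below applies a low-count filter to a few made-up genes. The protocol does not say exactly how the threshold is applied, so this example assumes a gene is kept when its reads, summed over all samples, reach the threshold. The pipeline's actual rule may differ, and LOWGENE is an invented name.

```python
def filter_low_counts(counts_by_gene, threshold=10):
    """Keep genes whose reads, summed over all samples, reach the threshold.

    Assumption: the sum over samples is compared to the threshold.
    """
    return {
        gene: counts
        for gene, counts in counts_by_gene.items()
        if sum(counts) >= threshold
    }


example = {
    "TSPAN6": [150, 162, 145],
    "TNMD": [0, 0, 0],
    "LOWGENE": [2, 3, 1],  # invented gene with very few reads
}
kept = filter_low_counts(example)
print(sorted(kept))
```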
Run the pipeline
In the terminal, make sure you are inside the ExpressPath folder.
Then run:

./run.sh

What happens:
- The script checks that conda, snakemake, and your design are ready.
- If design.yaml is missing, it offers to help you create one.
- Snakemake starts processing: the first run installs R packages
automatically (this takes 10-20 minutes and only happens once).
- You will see progress messages as each analysis step completes.

Options:

./run.sh -n     # Dry-run: shows what would run without running it.
./run.sh -j8    # Use 8 CPU cores (faster; adjust to your machine).

Runtime: ~15-30 minutes for a typical 2x3x3 design on a modern laptop. The first run is slower because conda installs packages.
View your results
When the pipeline finishes, the terminal prints the path to your
report. It looks something like:

Report: results/20260611_143052/pathway/interactive_report.html
Output: results/20260611_143052/

The timestamp (20260611_143052) is generated automatically from the
date and time of the run.

Open the report:
- Double-click the HTML file in your file manager, or
- Type: xdg-open results/20260611_143052/pathway/interactive_report.html
(on macOS, use "open" in place of "xdg-open")

Expected runtime:
First run: 30-60 minutes (installs R packages automatically)
Later runs: 15-30 minutes (packages already installed)

Disk space: output is typically 50-200 MB.

Re-running: each run creates a new timestamped folder in results/.
Old results are never overwritten.
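
After several runs, the results/ folder holds one timestamped folder per run. Because the names follow the YYYYMMDD_HHMMSS pattern shown above, the most recent run can be picked out automatically. The sketch below works on a list of folder names. The function name is hypothetical, and anything that does not parse as a timestamp is ignored.

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d_%H%M%S"  # e.g. 20260611_143052


def latest_run(folder_names):
    """Return the newest timestamped folder name, or None if there is none."""
    stamped = []
    for name in folder_names:
        try:
            stamped.append((datetime.strptime(name, STAMP_FORMAT), name))
        except ValueError:
            continue  # not a run folder
    return max(stamped)[1] if stamped else None


print(latest_run(["20260611_143052", "20260612_090000", "notes"]))
```

In practice you could pass os.listdir("results") from inside the ExpressPath folder instead of the hard-coded list.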

Batch effects: if your samples were processed in different batches,
assign batch labels in the experiment design (Step 3). The pipeline
includes batch as a covariate in the DESeq2 model. Use the PCA batch
plot (qc/pca_batch_plot.pdf) to verify correction.