ExpressPath: RNA-seq Time Course Analysis Protocol

Bruno Pavletić

Jun 15, 2026

DOI

https://dx.doi.org/10.17504/protocols.io.q26g7qr9klwz/v1

Bruno Pavletić¹

¹Ruđer Bošković Institute

AdenoTeam

External link: https://github.com/bruPav/ExpressPath

Protocol Citation: Bruno Pavletić 2026. ExpressPath: RNA-seq Time Course Analysis Protocol. protocols.io https://dx.doi.org/10.17504/protocols.io.q26g7qr9klwz/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Protocol status: In development

We are still developing and optimizing this protocol

Created: June 11, 2026

Last Modified: June 15, 2026

Protocol Integer ID: 318907

Keywords: seq time course analysis protocol expresspath, complete differential expression analysis with time course, complete differential expression analysis, pathway enrichment, transcription factor analysis, expresspath, rna, raw rna, lists of gene, classification of gene, gene, biological process

Abstract

ExpressPath is a pipeline that takes your raw RNA-seq count data and
produces a complete differential expression analysis with time course and
cross-cell-line comparisons. It runs from a single command and gives you:

  - Lists of genes that respond to treatment (and when)
  - Classification of genes into temporal patterns (transient, sustained, etc.)
  - Pathway enrichment (which biological processes are affected)
  - Transcription factor analysis (which regulators drive the response)
  - An interactive HTML report you can open in any browser

GitHub: https://github.com/bruPav/ExpressPath

Guidelines

RESULTS GUIDE: Understanding Your Output

The results folder contains many files. Here is what to look at first.

FIRST — Check data quality (results/<timestamp>/qc/):

  pca_plot.pdf
    Each dot = one sample. Dots of the same condition should cluster
    together. Different conditions should separate. If samples cluster
    by batch instead of condition, there may be a batch effect.

  sample_distance_heatmap.pdf
    Same-condition replicates should form dark blocks (high similarity).
    If a replicate looks very different from the others, investigate.

SECOND — Which genes respond (results/<timestamp>/tables/):

  signif_lrt.tsv
    Every gene that changed significantly across the experiment. Open
    in Excel.

    Key columns:
      lrt_padj    — overall: did this gene change? (< 0.05 = yes)
      *_log2FC    — fold change for each comparison (log2 scale)
      *_padj      — adjusted p-value for each comparison (< 0.05 = significant)

  combined_results.tsv
    Same but with ALL genes (not just significant ones).
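
If you prefer the terminal to Excel, the lrt_padj column can be filtered directly. The following is a minimal sketch, assuming only what is described above: a tab-separated file whose header contains a column named lrt_padj. The function name filter_by_padj is hypothetical and is not part of ExpressPath.

```shell
# filter_by_padj FILE [CUTOFF]
# Prints the header plus every row whose lrt_padj is below CUTOFF (default 0.05).
# Assumption: FILE is tab-separated and has a header column named "lrt_padj".
filter_by_padj() {
    awk -F '\t' -v cutoff="${2:-0.05}" '
        NR == 1 {
            # Find the lrt_padj column by name, not by position.
            for (i = 1; i <= NF; i++) if ($i == "lrt_padj") col = i
            if (!col) { print "lrt_padj column not found" > "/dev/stderr"; exit 1 }
            print
            next
        }
        # Skip missing values (NA or empty), then compare as numbers.
        $col != "NA" && $col != "" && ($col + 0) < (cutoff + 0)
    ' "$1"
}
```

Example use, with the run folder name taken from your own results:
filter_by_padj results/20260611_143052/tables/signif_lrt.tsv 0.01 > strict_genes.tsv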

THIRD — Temporal patterns (results/<timestamp>/cross_temporal/):

  gene_activity.tsv
    When each gene goes up or down. One row per gene per cell line.

    Categories:
      Transient         — responds early, then returns to baseline
      Sustained         — responds early and stays changed
      Partially_Sustained — starts but fades before the last timepoint
      Secondary_Deferred — responds only at the last timepoint
      Complex           — irregular pattern

  persistence_classes.tsv
    Summary: gene -> temporal category.

FOURTH — Affected pathways (results/<timestamp>/pathway/):

  gsea_kegg_signif.tsv
    Enriched KEGG pathways. Open in Excel.
    Key columns: NES (positive = up, negative = down), padj, contrast.

  pathview_output/
    KEGG pathway maps. Red = upregulated, blue = downregulated.

FIFTH — Compare cell lines (results/<timestamp>/cross_temporal/):

  cross_cellline_shared.tsv
    Genes DE in both cell lines. Concordant or discordant?

  cross_cellline_specific.tsv
    Genes DE only in one cell line.

SIXTH — Temporal divergence (results/<timestamp>/cross_cellline/):

  cross_temporal_persistence.tsv
    Between-cell-line difference over time. Categories:
      Constitutive       — cell lines always differ
      Baseline_Only      — differ before treatment, converge after
      Emergent_Early     — differ only at first treatment timepoint
      Emergent_Sustained — treatment-induced, persistent difference
      Convergent         — differ at baseline, become similar later

SEVENTH — Transcription factors (results/<timestamp>/tf/):

  tf_enrichment_results.tsv
    Which transcription factors are predicted to regulate DEGs.

  tf_regulatory_network_*.html
    Interactive TF -> target gene networks.

How to read statistics:

  log2FC = how much expression changed. +1 = 2x higher, +2 = 4x,
           -1 = half. Scale is logarithmic.

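  padj = p-value adjusted for multiple testing. Smaller = stronger
         evidence that the change is real. At padj < 0.05, roughly 5%
         or fewer of the genes called significant are expected to be
         false positives.

  NES = normalized enrichment score. Positive = upregulated pathway,
        negative = downregulated.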
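
To make the log2 scale concrete, a log2FC value can be converted back to a plain fold change by raising 2 to that value. The sketch below does this with awk; the function name log2fc_to_fold is hypothetical and only illustrates the arithmetic.

```shell
# log2fc_to_fold VALUE
# Converts a log2 fold change into a linear fold change (2 raised to VALUE).
log2fc_to_fold() {
    awk -v x="$1" 'BEGIN { printf "%.2f\n", 2 ^ x }'
}

log2fc_to_fold 1     # 2.00 (twice as high)
log2fc_to_fold 2     # 4.00
log2fc_to_fold -1    # 0.50 (half)
```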

Materials

Hardware:

  - Any computer running Linux, macOS, or Windows with WSL (Windows
    Subsystem for Linux). A machine with at least 8 GB RAM is
    recommended for typical datasets (e.g., 2 cell lines x 3 timepoints
    x 3 replicates).

Software:

Windows users: install WSL first

    a) Open PowerShell as Administrator (Start -> search PowerShell, right-click
       -> "Run as Administrator").
    b) Type: wsl --install
    c) Restart your computer when prompted.
    d) After restart, a Linux terminal opens. Create a username and password
       (write it down — it's separate from your Windows password).
    e) You now have Ubuntu running inside Windows. Use this terminal for all
       steps in this protocol.
    f) To find it later: open the Start menu and type "Ubuntu".

conda (package manager) — install once from
    https://docs.conda.io/en/latest/miniconda.html

    a) Open your terminal.

    b) Download the Miniconda installer:
         curl -L -o miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
       (macOS Intel: replace Linux-x86_64 with MacOSX-x86_64. Apple Silicon: replace it with MacOSX-arm64.)

    c) Run the installer:
         bash miniconda.sh -b -p $HOME/miniconda3

    d) Add conda to your terminal:
         $HOME/miniconda3/bin/conda init bash
       (macOS uses zsh by default; there, run: $HOME/miniconda3/bin/conda init zsh)

    e) Close and reopen your terminal. Verify: conda --version

snakemake — install once:
    conda install -c conda-forge -c bioconda snakemake
    (Type this in your terminal and press Enter. Type "y" if asked to confirm.)

    Verify: snakemake --version

Input data:

A tab-separated file (.tsv) with gene expression counts.

Required columns:
    - First column:  Gene ID (e.g., ENSG00000000003). No header needed.
    - gene_name:     Human-readable gene symbol (e.g., TSPAN6).
    - Count columns: One per sample. Any column names work.
    - Annotation columns (GO terms, KEGG pathways, protein families, etc.)
      are carried through to output if present (common with eggNOG-mapper).

Example TSV (columns are shown comma-separated for readability; the real file must use tabs):

        ,gene_name,A549_mock_1,A549_mock_2,A549_1h_1,...
        ENSG00000000003,TSPAN6,150,162,145,...
        ENSG00000000005,TNMD,0,0,0,...

How to get this file:
    - featureCounts output is already in this format.
    - STAR + RSEM: use rsem-generate-data-matrix.
    - Salmon/Kallisto: use tximport in R.
    - If in doubt: rows = genes, columns = samples, values = integer counts.
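
Before running the pipeline, you can check the basic shape of your count file yourself. This is a minimal sketch, not ExpressPath's own validation: it only tests the two requirements stated above (tab-separated, with a gene_name column) and that every row has as many fields as the header. The function name check_counts_tsv is hypothetical.

```shell
# check_counts_tsv FILE
# Exits non-zero if the header has no "gene_name" column, or if any row has a
# different number of tab-separated fields than the header.
check_counts_tsv() {
    awk -F '\t' '
        NR == 1 {
            ncol = NF
            for (i = 1; i <= NF; i++) if ($i == "gene_name") found = 1
            if (!found) { print "header has no gene_name column"; bad = 1; exit }
            next
        }
        NF != ncol { printf "line %d: %d fields, expected %d\n", NR, NF, ncol; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}
```

Example use: check_counts_tsv data/my_experiment_counts.tsv && echo "looks fine"
A comma-separated file fails the first check, because without tabs the whole header is read as a single column.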

Safety warnings

"wsl: command not found" or no Ubuntu terminal
  Open PowerShell as Administrator and run: wsl --install
  If that fails, see: https://learn.microsoft.com/en-us/windows/wsl/install

"conda: command not found"
  curl -L -o miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  bash miniconda.sh -b -p $HOME/miniconda3
  $HOME/miniconda3/bin/conda init bash
  Close and reopen terminal. Verify: conda --version
  (macOS Intel: replace Linux-x86_64 with MacOSX-x86_64. Apple Silicon: use MacOSX-arm64.)

"snakemake: command not found"
  conda install -c conda-forge -c bioconda snakemake

"ERROR: data/design.yaml not found"
  Create your experiment design (Step 3) or run ./run.sh for guided setup.

"ERROR: source_tsv field missing" or "TSV file not found"
  The filename in design.yaml doesn't match the file in data/.
  Either rename your file or edit design.yaml's "source_tsv" field.
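
A quick way to spot this mismatch is to read the filename out of design.yaml and test whether that file exists. The sketch below assumes the entry is written on one line as source_tsv: "your_counts.tsv" (quotes optional), as in the hand-edited template. The function name check_source_tsv is hypothetical and not part of ExpressPath.

```shell
# check_source_tsv [DATA_DIR]
# Reads the source_tsv entry from DATA_DIR/design.yaml (default: data) and
# checks that the named file exists in DATA_DIR.
# Assumption: the entry is a single line such as   source_tsv: "your_counts.tsv"
check_source_tsv() {
    dir="${1:-data}"
    name=$(sed -n 's/^[[:space:]]*source_tsv:[[:space:]]*//p' "$dir/design.yaml" \
           | head -n 1 | tr -d "\"'" | sed 's/[[:space:]]*$//')
    if [ -z "$name" ]; then
        echo "source_tsv field missing in $dir/design.yaml"; return 1
    fi
    if [ ! -f "$dir/$name" ]; then
        echo "not found: $dir/$name"; return 1
    fi
    echo "OK: $dir/$name"
}
```

Run it from inside the ExpressPath folder as: check_source_tsv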

"Permission denied" when running ./run.sh
  chmod +x run.sh

Pipeline stops with a red error
  - Internet lost during install -> re-run
  - Package install failed -> snakemake --use-conda -j1 --forceall
  - Out of memory -> reduce cores (-j2 instead of -j4)

"I changed config.yaml but nothing changed"
  cd pipeline && snakemake --use-conda -j1 --forceall

"Some genes have log2FC = 30"
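  Expected when a gene has zero counts in one condition and is
  expressed in the other ("off" to "on"). Check baseMean (the mean
  normalized count) to judge how strongly the gene is expressed.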

"Heatmap looks white except a few genes"
  The color scale is clamped to prevent extreme outliers from washing
  out the rest. White genes still have real changes — just smaller
  magnitude.

Windows/WSL: can't find my files
  Your Windows drives are under /mnt/:
    C:\Users\YourName\Desktop  ->  /mnt/c/Users/YourName/Desktop
    D:\data\counts.tsv         ->  /mnt/d/data/counts.tsv
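
The mapping rule is mechanical: lower-case the drive letter, put it under /mnt/, and turn backslashes into forward slashes. The sketch below spells this out; the function name win_to_wsl is hypothetical. Inside WSL, the built-in wslpath command performs the same conversion.

```shell
# win_to_wsl 'C:\path\to\file'
# Rewrites a Windows path into the matching /mnt/<drive>/ path used inside WSL.
win_to_wsl() {
    drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')   # drive letter, lower-cased
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')      # drop "C:" and flip the slashes
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl 'C:\Users\YourName\Desktop'   # /mnt/c/Users/YourName/Desktop
win_to_wsl 'D:\data\counts.tsv'          # /mnt/d/data/counts.tsv
```

Keep the single quotes around the Windows path so the shell leaves the backslashes untouched.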

Windows/WSL: "git: command not found"
  sudo apt update && sudo apt install git -y

Windows/WSL: "curl: command not found"
  sudo apt update && sudo apt install curl -y

Where to get help:
  Documentation: pipeline/README_explanation.md
  Issues: https://github.com/bruPav/ExpressPath/issues

Before start

Checklist:

  [ ] Windows users: WSL is installed (type "ubuntu" in Start menu)
  [ ] conda is installed (open terminal, type "conda --version")
  [ ] snakemake is installed (type "snakemake --version")
  [ ] Your count TSV is ready with a gene_name column
  [ ] You know your experimental design:
        - Cell line names (e.g., A549, E6)
        - Time points (e.g., mock, 1h, 3h)
        - Treatment name (e.g., Ad26)
        - Number of replicates per condition
        - Batch labels (if samples were processed in batches; use "1"
          for all if there is no batch effect)

  If any software is missing, install it before proceeding.
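
The software items on the checklist can be tested in one go. This is a small sketch using the standard "command -v" lookup; the function name check_tools is hypothetical.

```shell
# check_tools NAME...
# Reports which of the given commands are available in this terminal.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Example use: check_tools git conda snakemake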

Get the ExpressPath code

Open your terminal (Terminal on macOS/Linux, or WSL on Windows) and run:

git clone https://github.com/bruPav/ExpressPath.git
cd ExpressPath

This downloads the pipeline into a folder called "ExpressPath" and moves you into it. 

You should see files like "run.sh", "README.md", and folders named "pipeline/", "data/".

Place your count data in the right folder

Copy your count TSV file into the "data" folder inside ExpressPath:

cp /path/to/your/counts.tsv data/

Replace "/path/to/your/counts.tsv" with the actual path to your file.

The filename can be anything (e.g., "my_experiment_counts.tsv").
Just remember this name - you will need it in the next step.

Create your experiment design

The experiment design tells the pipeline:
  - What your cell lines are called
  - What your time points are
  - Which TSV column belongs to which sample
  - Which comparisons to run

Two options — choose one:

OPTION A — Use the browser-based setup tool

  a) Open the file "pipeline/setup_design.html" in your browser.
     - Double-click it in your file manager, or
     - Right-click -> "Open with" -> your browser, or
     - In the terminal (Linux; on macOS use "open" instead of "xdg-open"):

xdg-open pipeline/setup_design.html

  b) A form appears. Fill in:

     Experiment Info:
       - TSV File: Click "Browse" and select your counts TSV
       - Treatment Name: e.g., Ad26
       - Time Points: click "+" to add: mock, 1h, 3h
         (The first time point is the reference/baseline.)

     Cell Lines:
       - Click "Add Cell Line" for each one.
       - Enter the name (e.g., A549).
       - Set the number of replicates.
       - For each replicate, pick the matching column from your TSV.
       - (Optional) Set a batch label if samples were processed
         separately.

     Comparison Flags: Leave all four switched ON (default).
       You can turn off specific types later by editing
       "data/design.yaml" by hand.

  c) Click "Download design.yaml".
     Save it into the "data/" folder inside ExpressPath.
     Make sure the filename is exactly "design.yaml".

OPTION B — Edit the template by hand

  Copy the example and open it in any text editor:

    cp data/design.example.yaml data/design.yaml
    nano data/design.yaml   (or use gedit, vim, Notepad, etc.)

  Edit the following:
    - source_tsv: "your_counts.tsv"
    - Under "cell_lines": change "A549" and "E6" to your cell lines
    - Under "timepoints": change the list to your time points
    - Under "column_map": replace each sample column name with the matching column name from your TSV.
       Format:
        "YourSampleColumnName": {cell_line: "A549", time: "mock",
                                  replicate: 1, batch: "1"}

  Save and close.
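
When filling in "column_map" by hand, it helps to see the exact column names in your TSV so you can copy them without typos. The sketch below prints the header of a tab-separated file, one numbered column per line; the function name list_columns is hypothetical.

```shell
# list_columns FILE
# Prints each header column of a tab-separated file on its own line, numbered.
list_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}
```

Example use: list_columns data/my_experiment_counts.tsv
The first line is empty if the gene ID column has no header.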

(Optional) Adjust pipeline settings

If needed, edit "pipeline/config.yaml" in a text editor:

nano pipeline/config.yaml

Settings you might change:

  pvalue_threshold: 0.05         <- significance cutoff
  count_filter_threshold: 10     <- minimum reads to keep a gene
  temporal_n_clusters: 6         <- number of expression patterns
  gsea_min_set_size: 15          <- smallest pathway to test

The defaults work well for most experiments. You do NOT need to change anything for your first run.

Run the pipeline

In the terminal, make sure you are inside the ExpressPath folder.
Then run:

./run.sh

What happens:
  - The script checks that conda, snakemake, and your design are ready.
  - If design.yaml is missing, it offers to help you create one.
  - Snakemake starts processing: the first run installs R packages 
     automatically (this takes 10-20 minutes and only happens once).
  - You will see progress messages as each analysis step completes.

Options:

./run.sh -n       #Dry-run — shows what would run without running it.
./run.sh -j8      #Use 8 CPU cores (faster; adjust to your machine).

Runtime: ~15-30 minutes for a typical 2x3x3 design on a modern laptop. The first run is slower because conda installs packages.

View your results

When the pipeline finishes, the terminal prints the path to your
report. It looks something like:

  Report: results/20260611_143052/pathway/interactive_report.html
  Output: results/20260611_143052/

The timestamp (20260611_143052) is generated automatically from the
date and time of the run.

Open the report:
  - Double-click the HTML file in your file manager, or
  - Type: xdg-open results/20260611_143052/pathway/interactive_report.html
    (on macOS, use "open" instead of "xdg-open")

Expected runtime:
  First run:  30-60 minutes (installs R packages automatically)
  Later runs: 15-30 minutes (packages already installed)

Disk space: output is typically 50-200 MB.

Re-running: each run creates a new timestamped folder in results/.
Old results are never overwritten.
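
Because the folder names follow the date-and-time pattern shown above, sorting them alphabetically also sorts them by time. The sketch below uses this to find the newest run. It assumes results/ contains only run folders, and the function name latest_run is hypothetical.

```shell
# latest_run [RESULTS_DIR]
# Prints the most recent run folder (default: results), with a trailing slash.
# Folder names look like 20260611_143052, so alphabetical order is also
# chronological order.
latest_run() {
    ls -d "${1:-results}"/*/ 2>/dev/null | sort | tail -n 1
}
```

Example use: xdg-open "$(latest_run)pathway/interactive_report.html"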

Batch effects: if your samples were processed in different batches,
assign batch labels in the experiment design (Step 3). The pipeline
includes batch as a covariate in the DESeq2 model. Use the PCA batch
plot (qc/pca_batch_plot.pdf) to verify correction.