Jun 09, 2026

Data Processing Protocol: Metabolomic Analysis and Compound Annotation

  • Alice M.S. RODRIGUES1,
  • Clémence ROHÉE1,
  • Yoan FERANDIN1,
  • Emeline HOUEL1,
  • Didier STIEN1,
  • Didier Stien2
  • 1Sorbonne Université, CNRS, Laboratoire de Biodiversité et Biotechnologies Microbiennes, LBBM, Observatoire Océanologique, 66650 Banyuls‑sur‑mer, France;
  • 2CNRS
  • EUREMAP PROTOCOLS COMMUNITY
  • EUREMAP_SU_LBBM
Protocol Citation: Alice M.S. RODRIGUES, Clémence ROHÉE, Yoan FERANDIN, Emeline HOUEL, Didier STIEN, Didier Stien 2026. Data Processing Protocol: Metabolomic Analysis and Compound Annotation. protocols.io https://dx.doi.org/10.17504/protocols.io.4r3l2x59xv1y/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: June 09, 2026
Last Modified: June 09, 2026
Protocol Integer ID: 318743
Keywords: targeted metabolomics context, metabolomics context, metabolomic analysis, compound annotation from uhplc, ms analysis, compound annotation description of the workflow, compound annotation description, compound annotation, data processing, m
Funders Acknowledgements:
EUROPEAN UNION
Grant ID: INFRA-2023-DEV-01-04
Abstract
Description of the workflow used for data processing, molecular networking, and compound annotation from UHPLC-MS/MS analyses in a non-targeted metabolomics context.
Materials
1. Preprocessing software: MZmine 4 (v4.8.30), used to preprocess UHPLC-MS/MS raw data.

2. Platform: Processed data from MZmine were uploaded to the GNPS platform (Global Natural Products Social Molecular Networking, https://gnps.ucsd.edu).

3. Networking algorithm: Feature-Based Molecular Networking (FBMN) is used to construct molecular networks, treating each feature (an ion resolved by m/z and retention time) as an independent node (Nothias et al., 2020).

4. Network visualization: Molecular networks are visualized using Cytoscape (open-source software from the Institute for Systems Biology, https://cytoscape.org).

5. Dereplicator (GNPS): Spectral annotation tool within the FBMN workflow.

6. SIRIUS® (Jena Bioinformatics): Predicts molecular formulas, structural candidates, or chemical classes based on MS/MS fragmentation data.

7. MoNA (MassBank of North America): Manual spectral comparison tool.

8. Verification databases: Proposed formulas and structures are cross-checked against:
- SciFinder®
- PubChem
- COCONUT / LOTUS (Natural Products Online databases)
- LIPID MAPS® (LMSD)

9. Spectral analysis software: FreeStyle® (Thermo Fisher Scientific), used to analyze MS1 and MS2 spectra for adduct identification and formula validation.
Raw Data Preprocessing
Software used: MZmine 4 (v4.8.30) was used to preprocess UHPLC-MS/MS raw data.
Preprocessing output:
Ion alignment table
Fragment list (MS/MS spectra linked to parent ions)
Molecular Networking
Platform: Processed data from MZmine were uploaded to the GNPS platform (Global Natural Products Social Molecular Networking, https://gnps.ucsd.edu).
Networking algorithm:
Feature-Based Molecular Networking (FBMN) is used to construct molecular networks, treating each feature (an ion resolved by m/z and retention time) as an independent node (Nothias et al., 2020).
Network visualization: Molecular networks are visualized using Cytoscape (open-source software from the Institute for Systems Biology, https://cytoscape.org).
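As a rough illustration of the spectral-similarity idea behind molecular networking, the sketch below scores pairs of MS/MS spectra with a plain cosine similarity and keeps pairs above a threshold as network edges. It is a simplified stand-in, not the GNPS implementation: FBMN on GNPS relies on a modified cosine that also aligns peaks shifted by the precursor mass difference. The function names, the 0.02 m/z tolerance, the 0.7 threshold, and the toy spectra are assumptions made for illustration only.

```python
import math
from itertools import combinations


def cosine_score(spec_a, spec_b, tol=0.02):
    """Plain cosine similarity between two centroided MS/MS spectra.

    Each spectrum is a list of (mz, intensity) tuples. Peaks are paired
    greedily (largest intensity product first) when their m/z values
    differ by at most `tol`; each peak is used at most once.
    """
    candidates = [
        (int_a * int_b, i, j)
        for i, (mz_a, int_a) in enumerate(spec_a)
        for j, (mz_b, int_b) in enumerate(spec_b)
        if abs(mz_a - mz_b) <= tol
    ]
    candidates.sort(reverse=True)

    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product

    norm_a = math.sqrt(sum(inten ** 2 for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten ** 2 for _, inten in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_edges(spectra, min_score=0.7, tol=0.02):
    """Return (id_a, id_b, score) for every feature pair scoring >= min_score."""
    edges = []
    for (id_a, spec_a), (id_b, spec_b) in combinations(sorted(spectra.items()), 2):
        score = cosine_score(spec_a, spec_b, tol)
        if score >= min_score:
            edges.append((id_a, id_b, score))
    return edges


# Toy spectra keyed by hypothetical feature IDs (not real data).
spectra = {
    "F1": [(85.03, 20.0), (127.04, 100.0), (145.05, 60.0)],
    "F2": [(85.03, 25.0), (127.04, 90.0), (145.05, 70.0)],
    "F3": [(91.05, 100.0), (119.09, 40.0)],
}
edges = build_edges(spectra)
```

In this toy set, F1 and F2 share all three fragment m/z values and are linked by an edge, while F3 shares no fragment with either and stays unconnected.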
Compound Annotation Strategy
Two complementary strategies were applied for compound annotation:
Automatic and Semi-Automatic Annotation
Dereplicator (GNPS):
Spectral annotation tool within the FBMN workflow.
Matches experimental MS/MS spectra with theoretical spectra generated in silico from known natural product structures.
Annotates a compound when a match is found (Allard et al., 2016; Mohimani et al., 2017).
SIRIUS® (Jena Bioinformatics):
Predicts molecular formulas, structural candidates, or chemical classes based on MS/MS fragmentation data.
MoNA (MassBank of North America):
Manual spectral comparison tool.
Allows individual MS/MS spectra to be submitted and matched against public experimental spectral libraries.
Verification databases:
Proposed formulas and structures are cross-checked against:
SciFinder®
PubChem
COCONUT / LOTUS (Natural Products Online databases)
LIPID MAPS® (LMSD)
Manual Annotation and Confirmation
Software used:
MS1 and MS2 spectra are analyzed in FreeStyle® (Thermo Fisher Scientific) for adduct identification and formula validation.
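The arithmetic behind adduct identification and formula validation can be sketched as follows: compute the monoisotopic mass of a proposed formula, add the mass shift of each candidate adduct, and compare the result with the observed m/z as a ppm error. This is an illustrative calculation only, not a FreeStyle® feature or API. The function names, the 5 ppm tolerance, the short adduct list (positive mode, singly charged), and the caffeine example are assumptions that do not come from the protocol.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Mass added to the neutral molecule M for common singly charged
# positive-mode adducts (Da); the electron mass is already accounted for.
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def matching_adducts(observed_mz, formula, tol_ppm=5.0):
    """List (adduct, ppm error) pairs that explain observed_mz within tol_ppm."""
    neutral = monoisotopic_mass(formula)
    hits = []
    for adduct, shift in ADDUCT_SHIFT.items():
        error = ppm_error(observed_mz, neutral + shift)
        if abs(error) <= tol_ppm:
            hits.append((adduct, round(error, 2)))
    return hits


# Illustrative example (assumption): caffeine, C8H10N4O2.
caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
print(matching_adducts(195.0877, caffeine))  # consistent with [M+H]+
print(matching_adducts(217.0696, caffeine))  # consistent with [M+Na]+
```

Two ions in the same MS1 scan whose m/z values map back to the same neutral mass through different adducts support both the adduct assignment and the proposed formula.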
Manual MS2 interpretation:
Fragmentation patterns of each ion are manually interpreted to:
Confirm or refute automatic annotations
Suggest new structural hypotheses
Propagation in molecular networks:
Neighboring ions in the network are examined.
Based on structural similarity and fragmentation logic, molecular modifications are inferred and propagated to propose annotations for related compounds.
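The propagation step can be pictured as matching the precursor m/z difference between an annotated node and its unannotated neighbor against a table of known modification masses. The sketch below is a minimal one-hop version under stated assumptions: both ions are singly charged and carry the same adduct, the modification table is a short illustrative list, and the node IDs, m/z values, tolerance, and function names are hypothetical. Any proposal it produces is only a hypothesis that still has to be checked against the fragmentation pattern.

```python
# Mass shifts (Da) for a few common structural modifications; the list is
# illustrative and would be extended for a real study.
MODIFICATIONS = {
    "H2": 2.01565,
    "CH2": 14.01565,
    "O": 15.99491,
    "H2O": 18.01056,
    "hexose (C6H10O5)": 162.05282,
}


def infer_modifications(annotated_mz, neighbor_mz, tol=0.005):
    """Explain the m/z gap between two nodes as a gain (+) or loss (-) of a group.

    Assumes both ions are singly charged and carry the same adduct.
    """
    delta = neighbor_mz - annotated_mz
    sign = "+" if delta >= 0 else "-"
    return [
        sign + name
        for name, shift in MODIFICATIONS.items()
        if abs(abs(delta) - shift) <= tol
    ]


def propagate(annotations, precursor_mz, edges, tol=0.005):
    """One-hop propagation from annotated nodes to unannotated neighbors."""
    proposals = {}
    for node_a, node_b in edges:
        for known, unknown in ((node_a, node_b), (node_b, node_a)):
            if known in annotations and unknown not in annotations:
                mods = infer_modifications(precursor_mz[known], precursor_mz[unknown], tol)
                for mod in mods:
                    proposals.setdefault(unknown, []).append(f"{annotations[known]} {mod}")
    return proposals


# Hypothetical network: F1 is annotated, its neighbors are not.
annotations = {"F1": "compound A"}
precursor_mz = {"F1": 301.1071, "F2": 315.1227, "F3": 463.1599, "F4": 350.2000}
edges = [("F1", "F2"), ("F1", "F3"), ("F1", "F4")]
proposals = propagate(annotations, precursor_mz, edges)
```

Here F2 is proposed as a CH2 homolog and F3 as a hexoside of the annotated compound, while F4 receives no proposal because its mass difference matches nothing in the table.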
Protocol references
Nothias et al., 2020; Allard et al., 2016; Mohimani et al., 2017.