Large Targeted Metabolomics - Data Treatment - MetaboHUB-Tours

Staff Members of PMAC

Jun 29, 2026

DOI

https://dx.doi.org/10.17504/protocols.io.kqdg3r6beg25/v1

Staff Members of PMAC^1,2

¹Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France;
²MetaboHUB-Tours, Tours, France

MetaboHUB-Tours

Jérémy Monteiro

External link: https://metabolomique.med.univ-tours.fr/

Protocol Citation: Staff Members of PMAC 2026. Large Targeted Metabolomics - Data Treatment - MetaboHUB-Tours. protocols.io https://dx.doi.org/10.17504/protocols.io.kqdg3r6beg25/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: June 29, 2026

Last Modified: June 29, 2026

Protocol Integer ID: 319992

Keywords: metabolomics, LC-MS, targeted, IROA, Q-Exactive, metabolomic coverage, large targeted metabolomics-lc, analysis of metabolite, metabolite, sample preparation, present in biological sample, cv of qc sample, methods after sample preparation, biological sample, qc sample, protein precipitation, data treatment data, large targeted metabolomic, metabohub, ms parameter, data treatment

Funders Acknowledgements:

Agence Nationale de la Recherche

Grant ID: ANR-11-INBS-0010 MetaboHUB

Agence Nationale de la recherche au titre de France 2030

Grant ID: ANR-21-ESRE-0035

Disclaimer

N/A

Abstract

I. PURPOSE
Identification and relative quantification (based on signal intensity) of the metabolites present in biological samples, to achieve broad metabolomic coverage.

II. MATERIAL & METHODS
After sample preparation by protein precipitation, samples are analyzed by ES-LC-HRMS (Q-Orbitrap) in three modalities: C18 in positive and negative modes, and HILIC in positive mode.

III. DATA TREATMENT
Data are processed with the open-source Workflow4Metabolomics software, which detects all features. Features are then identified against the IROA database, based on retention time and m/z.

IV. QUALITY CONTROL
- System suitability check
- CV of QC samples (one QC injected every 10 injections) < 30 %

Image Attribution

Laurent Galineau (Plateforme Imagerie Préclinique, US-61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France ; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)
Camille Dupuy (Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France; MetaboHUB-Tours, Tours, France; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)

Guidelines

Metabolomics is the “omics” science that studies metabolites, and it can meet the need for deeper insight into biological data. It offers a real-time snapshot of a metabolic state and thus reflects immediate physiological changes.
By revealing metabolic profiles, metabolomics allows for precise diagnosis, tailored treatments, and proactive health management, ensuring highly personalized and effective interventions.
(Nordström, A., Lewensohn, R. Metabolomics: Moving to the Clinic. J Neuroimmune Pharmacol 5, 4–17 (2010). https://doi.org/10.1007/s11481-009-9156-4 )

Large targeted metabolomics analyses were performed on a UPLC Ultimate WPS-3000 system (Dionex, Germany) coupled to a Q-Exactive mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a HESI source (heated electrospray ionisation). The system was controlled by Xcalibur 2.2 software (Thermo Fisher Scientific, Bremen, Germany).

For exploratory analyses, features were retained for further analysis only if they were detected in all injections and their CV was below 30 % after signal normalization. Metabolites were identified by comparing MS² spectra and retention times with an in-house database of > 600 injected compounds from IROA (MSMLS, FAMLS, BACSMLS & OAMLS from Merck).

Materials

LC-MS System
An LC-MS system designed for high-resolution analysis
Columns : reversed phase and HILIC

Reagents
Formic acid >99% for LC-MS VWR International (Avantor)
Water Milli-Q Merck MilliporeSigma (Sigma-Aldrich)
Acetonitrile HiPerSolv CHROMANORM VWR International (Avantor)
Methanol HiPerSolv CHROMANORM VWR International (Avantor)
Calibration Solution Pierce ESI Negative Ion (Thermo Fisher)
Calibration Solution Pierce LTQ Velos ESI Positive Ion (Thermo Fisher)
IROA MSMLS, OAMLS, BACSMLS, FAMLS Merck MilliporeSigma (Sigma-Aldrich)
Ammonium formate VWR International (Avantor)
Indoxyl-3-sulfate Merck MilliporeSigma (Sigma-Aldrich)
Kynurenic acid Merck MilliporeSigma (Sigma-Aldrich)
L-tryptophan Merck MilliporeSigma (Sigma-Aldrich)
Indole-3-lactic acid Merck MilliporeSigma (Sigma-Aldrich)

Safety warnings

All sample handling must be carried out under a fume hood while wearing standard PPE: lab coat, goggles, and gloves.

Data Analysis

2d 4h

".raw" files are converted into ".mzML" files on MSconvert, with the filter "PeakPicking : MSlevels 1-2"

mzML files are uploaded on W4M, as a collection

Note
Keep the "mzML" extension in the file names when creating the collection

MSnbase readMSData

xcms get a sampleMetadata file
Download the tsv file and modify it to add the following columns:
sampleType : pool, blank, sample
injectionOrder
mode : pos, neg
batch
Upload the updated sampleMetadata file to W4M
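
The column additions described above can be scripted rather than done by hand. The following is a minimal sketch using only the Python standard library. The input column names ("sample_name", "class"), the file-name prefixes used to guess sampleType, and the helper names are assumptions made for illustration; adapt them to the actual sampleMetadata file exported by W4M.

```python
import csv
import io

# Hypothetical sampleMetadata content (tab-separated), standing in for the
# file downloaded from W4M. Column names are assumed for this example.
raw_tsv = "sample_name\tclass\nblank_01\t.\nQC_01\t.\nS_001\t.\n"


def infer_sample_type(name):
    """Guess sampleType from the file name (assumed naming convention)."""
    lowered = name.lower()
    if lowered.startswith("blank"):
        return "blank"
    if lowered.startswith("qc"):
        return "pool"
    return "sample"


def add_columns(tsv_text, mode, batch):
    """Return the table with sampleType, injectionOrder, mode and batch added."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for order, row in enumerate(reader, start=1):
        row["sampleType"] = infer_sample_type(row["sample_name"])
        # Assumes the rows are already listed in injection order.
        row["injectionOrder"] = str(order)
        row["mode"] = mode
        row["batch"] = batch
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


updated = add_columns(raw_tsv, mode="pos", batch="B1")
print(updated)
```

The injection order is taken from the row order here only for simplicity; if the file is sorted differently, the true acquisition sequence should be used instead.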

xcms plot chromatogram

Expected result
PDF files displaying the TIC by sampleType and by sample

xcms findChromPeaks (xcmsSet)
--> Spectra Filters
Filter on Acquisition Numbers*,# : NA
Filter on Retention Time (s)* : 30,600
Filter on Mz# : 60,850
--> Extraction method for peaks detection
CentWave – chromatographic peak detection using centWave method
Max tolerated ppm m/z deviation in consecutive scans in ppm# : 3
Min,Max peak width in seconds* : 4,20
--> Advanced Options
Signal to Noise ratio cutoff : 3
Prefilter step for the first analysis step (ROI detection) £,#: 3,20000
Name of the function to calculate the m/z center of the chromatographic peak: mean of the peaks’ m/z values
Integration method :peak limits are found through descent on the Mexican hat filtered data
Minimum difference in m/z peaks with overlapping retention times : -0.001
Fitgauss : No
Noise filter: 10000
Verbose Columns : No
Get a list of found chromatographic peaks : No
List of regions-of-interest (ROI): NA
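
To give a sense of scale for the 3 ppm tolerance set above, the short sketch below converts a ppm tolerance into an absolute m/z window. It only illustrates the unit conversion, not the centWave algorithm itself; the function names and the example m/z value are chosen for illustration.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a tolerance expressed in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(mz_ref, mz_obs, ppm):
    """True if mz_obs deviates from mz_ref by no more than `ppm`."""
    return abs(mz_obs - mz_ref) <= mz_ref * ppm / 1e6


# Example m/z value, used here only to show the order of magnitude.
low, high = ppm_window(205.0972, 3)
print(f"3 ppm at m/z 205.0972 -> [{low:.4f}, {high:.4f}]")
```

At m/z 205, a 3 ppm tolerance corresponds to roughly ±0.0006 m/z units, which is why such a setting assumes high-resolution data.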

xcms findChromPeaks Merger

xcms groupChromPeaks (group)
--> Method to use for grouping
PeakDensity – peak grouping based on time dimension peak density
Bandwidth(s): 5
Minimum fraction of samples: 0.25
Minimum number of samples : 5
Width of overlapping m/z slices : 0.01
--> Advanced Options
Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List 
Yes
Convert retention time (seconds) into minutes : no
Number of decimal places for mass values reported in ions “identifiers”: 5
Number of decimal places for retention time reported in ions “identifiers” : 2
Reported intensity values : maxo
If NA values remain replace them by 0 in the dataMatrix: No

xcms adjustRtime (retcor)
--> Method to use for retention time correction
PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
Smooth method: loess & Advanced options : NA

xcms groupChromPeaks (group)
--> Method to use for grouping
PeakDensity – peak grouping based on time dimension peak density
Bandwidth(s): 3
Minimum fraction of samples: 0.25
Minimum number of samples : 5
Width of overlapping m/z slices : 0.01
--> Advanced Options
Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List 
Yes
Convert retention time (seconds) into minutes : no
Number of decimal places for mass values reported in ions “identifiers”: 5
Number of decimal places for retention time reported in ions “identifiers” : 2
Reported intensity values : maxo
If NA values remain replace them by 0 in the dataMatrix: No

xcms adjustRtime (retcor)
--> Method to use for retention time correction
PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
Smooth method: loess & Advanced options : NA

xcms groupChromPeaks (group)
--> Method to use for grouping
PeakDensity – peak grouping based on time dimension peak density
Bandwidth(s): 1
Minimum fraction of samples: 0.25
Minimum number of samples : 5
Width of overlapping m/z slices : 0.01
--> Advanced Options
Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List 
Yes
Convert retention time (seconds) into minutes : no
Number of decimal places for mass values reported in ions “identifiers”: 5
Number of decimal places for retention time reported in ions “identifiers” : 2
Reported intensity values : maxo
If NA values remain replace them by 0 in the dataMatrix: No

xcms fillChromPeaks (fillPeaks)
--> Advanced Options: NA
--> Peak List
Convert retention time (seconds) in minutes : Yes
Number of decimal places for mass values reported in ions “identifiers” : 5
Number of decimal places for retention time reported in ions “identifiers” : 2 
Reported intensity values : maxo
--> If NA values remain, replace them by 0 in the dataMatrix : No

CAMERA groupFWHM (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> The multiplier of the standard deviation: 6
--> Percentage of the width of the FWHM: 0.6
--> Polarity : Positive or Negative
--> Advanced parameters
FALSE
--> Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values : 4
--> Number of digits for RT values : 4

CAMERA findIsotopes (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> Max. ion charge : 2
--> Max. number of expected isotopes: 2
--> ppm error for the search : 5
--> Allowed variance for the search : 0.01
--> Choose intensity values for C12/C13 check. Allowed values are into, maxo, intb: maxo
--> The percentage number of samples, which satisfy the C12/C13 rule for isotope annotation : 0.7
--> Should C12/C13 filter be applied : No
--> Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values (namecustom): 4
--> Number of digits for RT values (namecustom): 4

CAMERA findAdducts (Galaxy Version 0.1.0+camera1.48.0-galaxy1)*
--> General ppm error : 5
--> General absolute error in m/z : 0.015
--> Which polarity mode was used for measuring of the MS sample : negative or positive
--> Use a personal ruleset file : FALSE
--> Choose intensity values : maxo
--> Advanced parameters
False
--> Convert seconds to minutes when exporting tsv : Yes
--> Number of digits for MZ values (namecustom): 4
--> Number of digits for RT values (namecustom) : 4

Several "Intensity Check" and "Generic_filter" steps are applied to filter features on missing values, on the fold change between blanks and the other sampleType classes, and on the third-quartile (Q3) values of ions in pool samples
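
As an illustration of this kind of filtering, the sketch below applies a missing-value filter and a pool-versus-blank fold-change filter to a toy data matrix. The protocol does not state the thresholds, so both values below are hypothetical placeholders, and the Q3 criterion is left out. This is not the W4M implementation, only a sketch of the logic.

```python
from statistics import mean

# Hypothetical thresholds: the protocol does not give the values used.
MAX_MISSING_FRACTION = 0.5
MIN_POOL_TO_BLANK_FOLD = 3.0

# Toy sampleMetadata and dataMatrix (None stands for NA).
sample_type = {
    "blank_1": "blank", "QC_1": "pool", "QC_2": "pool",
    "S_1": "sample", "S_2": "sample",
}
data_matrix = {
    "M205T120": {"blank_1": 1000.0, "QC_1": 90000.0, "QC_2": 95000.0,
                 "S_1": 80000.0, "S_2": None},
    "M150T300": {"blank_1": 50000.0, "QC_1": 60000.0, "QC_2": 55000.0,
                 "S_1": 58000.0, "S_2": 61000.0},
    "M300T200": {"blank_1": None, "QC_1": None, "QC_2": 20000.0,
                 "S_1": None, "S_2": None},
}


def observed(row, wanted_type):
    """Non-missing intensities of one feature for a given sampleType."""
    return [v for s, v in row.items()
            if sample_type[s] == wanted_type and v is not None]


def keep_feature(row):
    """Apply the missing-value filter, then the pool/blank fold-change filter."""
    non_blank = [v for s, v in row.items() if sample_type[s] != "blank"]
    missing_fraction = sum(v is None for v in non_blank) / len(non_blank)
    if missing_fraction > MAX_MISSING_FRACTION:
        return False
    pools, blanks = observed(row, "pool"), observed(row, "blank")
    if not pools:
        return False
    if blanks and mean(blanks) > 0:
        return mean(pools) / mean(blanks) >= MIN_POOL_TO_BLANK_FOLD
    return True  # nothing measured in blanks: no contamination evidence


kept = [feature for feature, row in data_matrix.items() if keep_feature(row)]
print(kept)
```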

Then, in the dataMatrix, NA values are replaced by 0

Batch correction (Galaxy Version 3.0.0)
--> Sample metadata file coding parameter
Batch column name : batch
Injection order column name : injectionOrder
Sample type column name: sampleType
Set the name used to tag samples as pool in the sample type column : pool
Set the name used to tag samples as blank in the sample type column : blank
Set the name used to tag samples as real sample in the sample type column : sample
--> Type of regression model
Loess
Span : not modified
Unconsistant values : Prevent it
Factor of interest : batch
Level of details for plots : basic, standard or complete

Note
This step “normalizes” the analytical signal. However, if the biological samples differ in amount, for example if pellets contain more or fewer cells, an additional “biological” normalization step must be performed.

Quality Metrics
--> Coefficient of Variation
Yes
Which type of CV calculation should be done : only pool CV
Threshold : 0.3
--> Advanced parameters: default

Generic filter: features with CV(pool) > 0.3 are removed
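
The pool CV criterion can be sketched as follows. The feature names and intensities are invented for the example, and the CV is computed here as the sample standard deviation divided by the mean, which is an assumption about the exact formula used by the W4M tool.

```python
from statistics import mean, stdev

CV_THRESHOLD = 0.3  # threshold given in the protocol

# Toy intensities of two features across three pool (QC) injections.
pool_intensities = {
    "M205T120": [100.0, 110.0, 90.0],
    "M150T300": [100.0, 200.0, 20.0],
}


def cv(values):
    """Coefficient of variation: sample standard deviation over mean."""
    return stdev(values) / mean(values)


# Features whose pool CV exceeds the threshold are removed.
kept = [f for f, values in pool_intensities.items() if cv(values) <= CV_THRESHOLD]
print(kept)
```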

Between-table Correlation (Galaxy Version 1.0.0) : supply the dataMatrix twice and indicate that samples are in columns for both tables

Analytical correlation filtration (Galaxy Version 2019-06-20)
--> Correlation threshold : 0.85
--> Do you want to take into account mass differences between 2 ions ?
Yes
Do you have your own list of mass differences or do you want to use a default list ? No
Mass difference range : 0.005
--> Do you want to take into account retention time differences between 2 ions ?
Yes
Retention time difference threshold : 0.1
Which representative ion do you want to select for each group : highest intensity
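
The idea behind this filtration can be illustrated with a simplified sketch: ions that are strongly correlated across samples and elute at nearly the same retention time are grouped, and one representative per group is kept. The sketch below deliberately omits the mass-difference criterion and is not the W4M tool's algorithm; the ion names, values, and helper functions are invented for illustration.

```python
from math import sqrt

CORR_THRESHOLD = 0.85  # correlation threshold from the protocol
RT_THRESHOLD = 0.1     # retention time difference threshold from the protocol

# Toy ions: retention time and intensities across four samples.
ions = {
    "A": {"rt": 2.00, "intensities": [10.0, 20.0, 30.0, 40.0]},
    "B": {"rt": 2.05, "intensities": [1.0, 2.1, 2.9, 4.2]},
    "C": {"rt": 5.00, "intensities": [40.0, 10.0, 30.0, 20.0]},
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_ions(ions):
    """Group ions linked by high correlation and close RT (union-find)."""
    parent = {name: name for name in ions}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    names = list(ions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            close_rt = abs(ions[a]["rt"] - ions[b]["rt"]) <= RT_THRESHOLD
            correlated = pearson(ions[a]["intensities"],
                                 ions[b]["intensities"]) >= CORR_THRESHOLD
            if close_rt and correlated:
                parent[find(b)] = find(a)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def representative(group):
    """Keep the ion with the highest summed intensity in the group."""
    return max(group, key=lambda n: sum(ions[n]["intensities"]))


representatives = [representative(g) for g in group_ions(ions)]
print(representatives)
```

Ions A and B co-elute and co-vary, so only A (the more intense one) is kept; C stays on its own.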

Generic_filter (Galaxy Version 1.3.0)

Filter on ACorF_groups to remove "0" and "-"

Expected result
A .tsv file where every line corresponds to a unique feature FOR EACH MODALITY

Metabolites Identification - Level 2
Identifications are based on matching experimental m/z and RT values against those of an in-house chemical library (standards from MSMLS, OAMLS, BACSMLS and FAMLS from IROA Technologies), using Xcalibur software (Thermo Scientific)

Expected result
3 ".pmd" files (HILIC POS, C18 POS & C18 NEG), each with a list of study-calibrated RT and m/z values for each compound
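
The matching principle (m/z and RT of a feature compared with those of a library standard) can be sketched in a few lines. In the protocol this step is done in Xcalibur; the code below is only an illustration of the logic. The tolerances are hypothetical because the protocol does not state them, and the library retention times and feature values are invented.

```python
# Hypothetical tolerances: the protocol does not state the values used.
MZ_TOL_PPM = 5.0
RT_TOL_MIN = 0.2

# Toy in-house library (retention times are invented for the example).
library = [
    {"name": "L-tryptophan", "mz": 205.0972, "rt": 3.10},
    {"name": "Kynurenic acid", "mz": 190.0499, "rt": 3.45},
]

# Toy features from one modality.
features = [
    {"id": "M205T186", "mz": 205.0970, "rt": 3.12},
    {"id": "M190T300", "mz": 190.0499, "rt": 5.00},
]


def annotate(features, library):
    """Map feature id -> first library compound matching on both m/z and RT."""
    annotations = {}
    for feat in features:
        for std in library:
            ppm_error = abs(feat["mz"] - std["mz"]) / std["mz"] * 1e6
            rt_error = abs(feat["rt"] - std["rt"])
            if ppm_error <= MZ_TOL_PPM and rt_error <= RT_TOL_MIN:
                annotations[feat["id"]] = std["name"]
                break
    return annotations


annotations = annotate(features, library)
print(annotations)
```

The second feature matches kynurenic acid on m/z but not on RT, so it is left unannotated; both criteria must be met.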

Metabolites Identification - Level 1
Experimental MS2 spectra are compared with an in-house chemical library (standards from MSMLS, OAMLS, BACSMLS and FAMLS from IROA Technologies) using FlashEntropy.
Metabolites identified in the previous step whose MS2 match in this step meets the “identity_search>0.7” score criterion are upgraded to Level 1
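
The level assignment rule can be expressed as a small function. The metabolite names and scores below are invented, and the score dictionary stands in for whatever FlashEntropy reports as the identity search score; only the 0.7 threshold comes from the protocol.

```python
IDENTITY_SCORE_THRESHOLD = 0.7  # "identity_search>0.7" in the protocol

# Toy inputs: Level 2 hits (m/z + RT match) and MS2 identity search scores.
level2_hits = {"L-tryptophan", "Kynurenic acid", "Indoxyl-3-sulfate"}
ms2_scores = {"L-tryptophan": 0.92, "Kynurenic acid": 0.55, "Citric acid": 0.88}


def assign_levels(level2_hits, ms2_scores):
    """Return {metabolite: level}; only Level 2 hits can be upgraded to 1."""
    levels = {}
    for name in sorted(level2_hits):
        score = ms2_scores.get(name)
        upgraded = score is not None and score > IDENTITY_SCORE_THRESHOLD
        levels[name] = 1 if upgraded else 2
    return levels


levels = assign_levels(level2_hits, ms2_scores)
print(levels)
```

A compound with a good MS2 score but no m/z and RT match in the previous step (here "Citric acid") is not reported, since the upgrade applies only to metabolites present in both steps.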


Note
All spectra from the database have been concatenated into one msp file with FragHub

Data Curation = Signal Normalization & Metabolites Exclusion :
         - Normalization of metabolite areas to the total area of detected metabolites
         - Calculation of the coefficient of variation (CV) for each metabolite in the QCs (analytical variability) and in the samples (biological variability)
         - Removal of metabolites with analytical variability > biological variability
         - Removal of metabolites with analytical variability > 30%

Note
Signal Normalization and Metabolites Exclusion are repeated until no further metabolite is excluded
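
The iterative curation described above can be sketched as a loop: normalize to the total area, compute the CVs, exclude, and start again until nothing more is excluded. The data and function names below are invented, and the CV is assumed to be the sample standard deviation over the mean.

```python
from statistics import mean, stdev

MAX_QC_CV = 0.30  # 30 % limit from the protocol


def cv(values):
    return stdev(values) / mean(values)


def normalize(areas):
    """Divide each area by the total area of kept metabolites per injection."""
    injections = list(next(iter(areas.values())))
    totals = {inj: sum(areas[m][inj] for m in areas) for inj in injections}
    return {m: {inj: areas[m][inj] / totals[inj] for inj in injections}
            for m in areas}


def curate(areas, qc_ids, sample_ids):
    """Repeat normalization and exclusion until no metabolite is excluded."""
    kept = dict(areas)
    while kept:
        norm = normalize(kept)
        excluded = []
        for m, row in norm.items():
            cv_qc = cv([row[i] for i in qc_ids])       # analytical variability
            cv_bio = cv([row[i] for i in sample_ids])  # biological variability
            if cv_qc > MAX_QC_CV or cv_qc > cv_bio:
                excluded.append(m)
        if not excluded:
            return norm
        for m in excluded:
            del kept[m]
    return {}


qc_ids = ["QC1", "QC2", "QC3"]
sample_ids = ["S1", "S2", "S3"]
areas = {
    "stable": {"QC1": 100.0, "QC2": 100.0, "QC3": 100.0,
               "S1": 50.0, "S2": 100.0, "S3": 150.0},
    "noisy": {"QC1": 10.0, "QC2": 100.0, "QC3": 50.0,
              "S1": 50.0, "S2": 50.0, "S3": 50.0},
    "other": {"QC1": 200.0, "QC2": 200.0, "QC3": 200.0,
              "S1": 100.0, "S2": 300.0, "S3": 200.0},
}
curated = curate(areas, qc_ids, sample_ids)
print(sorted(curated))
```

Re-normalizing after each exclusion matters because removing a metabolite changes the total area, and therefore the normalized values and CVs of all the others.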

Analysis Validation: the analyses were validated by observing the distribution of QCs among the samples using Principal Component Analysis (PCA). For all PCAs, the data were log-transformed and UV-scaled (unit variance).
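
The pre-processing applied before PCA can be sketched as below. The logarithm base is not stated in the protocol, so log10 is an assumption, and the table layout and function name are invented. The PCA itself (performed with dedicated software in the protocol) is not reproduced here.

```python
from math import log10
from statistics import mean, stdev


def log_uv_scale(table):
    """Log-transform, mean-centre and scale each variable to unit variance.

    table: {variable: [one value per observation]}.
    Assumes strictly positive, non-constant values for every variable.
    """
    scaled = {}
    for variable, values in table.items():
        logged = [log10(v) for v in values]  # log base is an assumption
        centre, spread = mean(logged), stdev(logged)
        scaled[variable] = [(x - centre) / spread for x in logged]
    return scaled


# Toy example: one metabolite measured in three injections.
scaled = log_uv_scale({"met_a": [10.0, 100.0, 1000.0]})
print(scaled["met_a"])
```

UV scaling gives every metabolite the same weight in the PCA regardless of its absolute intensity, which is why it is commonly paired with a log transform for metabolomics data.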

Fusion of modalities = sorting redundancies:
Only the best modality is kept for each metabolite, based on RT > Dead Volume (0-1 min) and/or the lower CV(metabolite) on QCs
"Analysis Validation" is then performed again
Assignment of ChEBI identifiers to the metabolites referenced in the ChEBI database
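
One possible reading of the modality selection rule is sketched below: candidates eluting after the dead volume are preferred, and among those the one with the lowest QC CV is kept. The protocol's "and/or" leaves the exact priority open, so this ordering is an assumption, and the candidate values are invented.

```python
DEAD_VOLUME_MAX_RT = 1.0  # minutes; "Dead Volume (0-1 min)" in the protocol

# Toy candidates: the same metabolite detected in several modalities.
candidates = [
    {"metabolite": "L-tryptophan", "modality": "C18 POS", "rt": 3.1, "cv_qc": 0.08},
    {"metabolite": "L-tryptophan", "modality": "HILIC POS", "rt": 4.2, "cv_qc": 0.05},
    {"metabolite": "L-tryptophan", "modality": "C18 NEG", "rt": 3.1, "cv_qc": 0.12},
    {"metabolite": "Citric acid", "modality": "C18 NEG", "rt": 0.8, "cv_qc": 0.04},
    {"metabolite": "Citric acid", "modality": "HILIC POS", "rt": 6.0, "cv_qc": 0.10},
]


def pick_best(candidates):
    """Return {metabolite: modality}, one modality per metabolite."""
    best = {}
    for cand in candidates:
        # Sort key: retained compounds (RT beyond the dead volume) come
        # first, then the lowest CV on QCs (assumed priority order).
        key = (cand["rt"] <= DEAD_VOLUME_MAX_RT, cand["cv_qc"])
        current = best.get(cand["metabolite"])
        if current is None or key < current[0]:
            best[cand["metabolite"]] = (key, cand)
    return {name: entry[1]["modality"] for name, entry in best.items()}


chosen = pick_best(candidates)
print(chosen)
```

In this example citric acid is kept from HILIC POS even though its CV is higher there, because its C18 NEG signal elutes within the dead volume.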


Expected result
A list of unique metabolites identified at Level 1 or 2, with their intensity normalized to the reference signal in each sample and in each QC

Software and References:

Software
MSconvert (ProteoWizard)
NAME
Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J.,
DEVELOPER
https://proteowizard.sourceforge.io/publications.html
SOURCE LINK



Software
Workflow4Metabolomics
NAME
http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813
DEVELOPER
https://workflow4metabolomics.usegalaxy.fr/
SOURCE LINK


Software
FragHub
NAME
https://pubs.acs.org/doi/full/10.1021/acs.analchem.4c02219
DEVELOPER
https://github.com/eMetaboHUB/FragHub
SOURCE LINK


Software
Flash Entropy Search
NAME
Yuanyue Li
DEVELOPER
GitHub
REPOSITORY
https://github.com/YuanyueLi/FlashEntropySearch
SOURCE LINK


Software
SIMCA
NAME
Sartorius
DEVELOPER
https://www.sartorius.com/en/products/process-analytical-technology/data-analytics-software/mvda-software/simca
SOURCE LINK

OpenScience

"Open science has the potential of making the scientific process more transparent, inclusive and democratic.  Open science:
increases scientific collaborations and sharing of information for the benefits of science and society;
makes multilingual scientific knowledge openly available, accessible and reusable for everyone; and
opens the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community.
Our interconnected world needs open science to help solve complex social, environmental, and economic challenges and achieve the Sustainable Development Goals."
https://www.unesco.org/en/open-science/about

To draft your protocols regarding sampling, analyses (other than metabolomics and/or lipidomics), and statistical analyses, you may use:

Software
protocols.io
NAME
https://www.protocols.io/welcome
SOURCE LINK

Data Submission

Software
MetaboLights
NAME
Ozgur Yurekten, Thomas Payne, Noemi Tejera, Felix Xavier Amaladoss, Callum Martin, Mark Williams, Claire O’Donovan. MetaboLights
DEVELOPER
https://www.ebi.ac.uk/metabolights/login
SOURCE LINK