Jun 29, 2026

Large Targeted Metabolomics - Data Treatment - MetaboHUB-Tours

  • Staff Members of PMAC (1, 2)
  • (1) Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France
  • (2) MetaboHUB-Tours, Tours, France
  • MetaboHUB-Tours
Protocol Citation: Staff Members of PMAC 2026. Large Targeted Metabolomics - Data Treatment - MetaboHUB-Tours. protocols.io https://dx.doi.org/10.17504/protocols.io.kqdg3r6beg25/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: June 29, 2026
Last Modified: June 29, 2026
Protocol  Integer ID: 319992
Keywords: metabolomics, LC-MS, targeted, IROA, Q-Exactive, metabolomic coverage, large targeted metabolomics-lc, analysis of metabolite, metabolite, sample preparation, present in biological sample, cv of qc sample, methods after sample preparation, biological sample, qc sample, protein precipitation, data treatment data, large targeted metabolomic, metabohub, ms parameter, data treatment
Funders Acknowledgements:
Agence Nationale de la Recherche
Grant ID: ANR-11-INBS-0010 MetaboHUB
Agence Nationale de la recherche au titre de France 2030
Grant ID: ANR-21-ESRE-0035
Disclaimer
N/A
Abstract
I. PURPOSE
Identification and relative quantification (based on intensity) of metabolites present in biological samples, for metabolomic coverage.

II. MATERIAL & METHODS
After sample preparation, which consisted of protein precipitation, samples are analyzed by ES-LC-HRMS (Q-Orbitrap) in 3 modalities: C18 in both positive and negative modes, and HILIC in positive mode.

III. DATA TREATMENT
Data are processed with the open-source Workflow4Metabolomics (W4M) platform, which detects all features. Features are then identified against the IROA database, based on retention time and m/z.

IV. QUALITY CONTROL
- System suitability check
- CV of QC samples (one QC injected every 10 injections) < 30 %
Image Attribution
Laurent Galineau (Plateforme Imagerie Préclinique, US-61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France ; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)
Camille Dupuy (Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France; MetaboHUB-Tours, Tours, France; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)
Guidelines
Metabolomics is the “omics” science studying metabolites and can meet the need for a deeper insight into biological data. It offers a real-time snapshot of a metabolic state and thus reflects immediate physiological changes. By revealing metabolic profiles, metabolomics allows for precise diagnosis, tailored treatments, and proactive health management, ensuring highly personalized and effective interventions.
(Nordström, A., Lewensohn, R. Metabolomics: Moving to the Clinic. J Neuroimmune Pharmacol 5, 4–17 (2010). https://doi.org/10.1007/s11481-009-9156-4 )
Large targeted metabolomics analyses were performed on a UPLC Ultimate WPS-3000 system (Dionex, Germany) coupled to a Q-Exactive mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a HESI (heated electrospray ionisation) source. The system was controlled by Xcalibur 2.2 software (Thermo Fisher Scientific, Bremen, Germany).

For exploratory analyses, features were retained for further analysis only if they were detected in all injections and their CV was below 30 % after signal normalization. Metabolites were identified by comparing MS² spectra and retention times with an in-house database of > 600 injected compounds from IROA (MSMLS, FAMLS, BACSMLS & OAMLS from Merck).
Materials
LC-MS System
  • An LC-MS system, designed for high-resolution analysis
  • Columns : reversed phase and HILIC

Reagents
  • Formic acid >99% for LC-MS VWR International (Avantor)
  • Water Milli-Q Merck MilliporeSigma (Sigma-Aldrich)
  • Acetonitrile HiPerSolv CHROMANORM VWR International (Avantor)
  • Methanol HiPerSolv CHROMANORM VWR International (Avantor)
  • Calibration Solution Pierce ESI Negative Ion (Thermo Fisher)
  • Calibration Solution Pierce LTQ Velos ESI Positive Ion (Thermo Fisher)
  • IROA MSMLS, OAMLS, BACSMLS, FAMLS Merck MilliporeSigma (Sigma-Aldrich)
  • Ammonium formate VWR International (Avantor)
  • Indoxyl-3-sulfate Merck MilliporeSigma (Sigma-Aldrich)
  • Kynurenic acid Merck MilliporeSigma (Sigma-Aldrich)
  • L-tryptophan Merck MilliporeSigma (Sigma-Aldrich)
  • Indole-3-Lactic Acid Merck MilliporeSigma (Sigma-Aldrich)
Safety warnings
All sample handling must be carried out under a fume hood, with the operator wearing standard PPE: lab coat, goggles, and gloves.
Data Analysis
2d 4h
".raw" files are converted into ".mzML" files on MSconvert, with the filter "PeakPicking : MSlevels 1-2"
mzML files are uploaded on W4M, as a collection

Note
Keep the "mzML" extension in the file names when creating the collection
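
The .raw to .mzML conversion described above can be scripted when many files are involved. The following is a minimal sketch, assuming ProteoWizard's msconvert command-line tool is installed and on the PATH. The function name build_msconvert_command, the file name and the output directory are hypothetical, and the "vendor" peak picker is an assumption: the protocol only states that peak picking is applied to MS levels 1-2.

```python
from pathlib import Path


def build_msconvert_command(raw_file, out_dir="mzML"):
    """Return the msconvert argument list for one .raw file (nothing is executed)."""
    return [
        "msconvert",
        str(Path(raw_file)),
        "--mzML",                                    # write mzML output
        "--filter", "peakPicking vendor msLevel=1-2",  # assumed picker: vendor
        "-o", str(out_dir),                          # hypothetical output folder
    ]


command = build_msconvert_command("QC_01.raw")
print(" ".join(command))
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where msconvert is available.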

MSnbase readMSData
xcms get a sampleMetadata file
Download the tsv file and modify it to add the following columns:
  • sampleType : pool, blank, sample
  • injectionOrder
  • mode : pos, neg
  • batch
Upload the updated sampleMetadata file on W4M
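
Editing the sampleMetadata file by hand is error-prone for long sequences. A minimal sketch of adding the four columns programmatically is given here, using only the Python standard library. The function name, the toy file content (including the "sample_name" and "class" headers) and the annotation values are illustrative assumptions, not part of the protocol.

```python
import csv
import io


def annotate_sample_metadata(tsv_text, annotations):
    """Add sampleType, injectionOrder, mode and batch columns to a sampleMetadata TSV.

    `annotations` maps each sample name (first column) to a dict holding the
    four new values; an unknown sample raises KeyError so nothing is skipped silently.
    """
    new_columns = ["sampleType", "injectionOrder", "mode", "batch"]
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    sample_column = reader.fieldnames[0]
    fieldnames = list(reader.fieldnames) + [
        c for c in new_columns if c not in reader.fieldnames
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        info = annotations[row[sample_column]]
        for column in new_columns:
            row[column] = info[column]
        writer.writerow(row)
    return out.getvalue()


# Toy input standing in for the file downloaded from W4M (hypothetical content).
tsv = "sample_name\tclass\nQC_01\t.\nS_01\t.\n"
annotations = {
    "QC_01": {"sampleType": "pool", "injectionOrder": 1, "mode": "pos", "batch": "B1"},
    "S_01": {"sampleType": "sample", "injectionOrder": 2, "mode": "pos", "batch": "B1"},
}
updated = annotate_sample_metadata(tsv, annotations)
print(updated)
```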
xcms plot chromatogram

Expected result
pdf files with TIC by sampleType and by sample displayed

xcms findChromPeaks (xcmsSet)
--> Spectra Filters
  • Filter on Acquisition Numbers*,# : NA
  • Filter on Retention Time (s)* : 30,600
  • Filter on Mz# : 60,850
--> Extraction method for peaks detection
  • CentWave – chromatographic peak detection using centWave method
  • Max tolerated ppm m/z deviation in consecutive scans in ppm# : 3
  • Min,Max peak width in seconds* : 4,20
--> Advanced Options
  • Signal to Noise ratio cutoff : 3
  • Prefilter step for the first analysis step (ROI detection) £,#: 3,20000
  • Name of the function to calculate the m/z center of the chromatographic peak: mean of the peaks’ m/z values
  • Integration method :peak limits are found through descent on the Mexican hat filtered data
  • Minimum difference in m/z peaks with overlapping retention times : -0.001
  • Fitgauss : No
  • Noise filter: 10000
  • Verbose Columns : No
  • Get a list of found chromatographic peaks : No
  • List of regions-of-interest (ROI): NA
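
To give a feel for the 3 ppm tolerance set above, the short sketch below converts a ppm value into an absolute m/z window. It only illustrates the arithmetic and is not the centWave algorithm; the function name and the example m/z are hypothetical.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds lying within +/- ppm of mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


# At m/z 200, a 3 ppm tolerance corresponds to +/- 0.0006 m/z units.
low, high = ppm_window(200.0, 3)
print(f"{low:.4f} - {high:.4f}")
```

Because the window scales with m/z, the same 3 ppm setting is roughly four times wider in absolute terms at m/z 800 than at m/z 200.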
xcms findChromPeaks Merger
xcms groupChromPeaks (group) --> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 5
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms adjustRtime (retcor)
--> Method to use for retention time correction
  • PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
  • Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
  • Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
  • Smooth method: loess & Advanced options : NA
xcms groupChromPeaks (group) --> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 3
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms adjustRtime (retcor)
--> Method to use for retention time correction
  • PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
  • Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
  • Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
  • Smooth method: loess & Advanced options : NA
xcms groupChromPeaks (group) --> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 1
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms fillChromPeaks (fillPeaks)
--> Advanced Options: NA
--> Peak List
  • Convert retention time (seconds) in minutes : Yes
  • Number of decimal places for mass values reported in ions “identifiers” : 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
--> If NA values remain, replace them by 0 in the dataMatrix : No
CAMERA groupFWHM (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> The multiplier of the standard deviation: 6
--> Percentage of the width of the FWHM: 0.6
--> Polarity : Positive or Negative
--> Advanced parameters
  • FALSE
--> Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values : 4
--> Number of digits for RT values : 4
CAMERA findIsotopes (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> Max. ion charge : 2
--> Max. number of expected isotopes: 2
--> ppm error for the search : 5
--> Allowed variance for the search : 0.01
--> Choose intensity values for C12/C13 check. Allowed values are into, maxo, intb: maxo
--> The percentage number of samples, which satisfy the C12/C13 rule for isotope annotation : 0.7
--> Should C12/C13 filter be applied : No
-->Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values (namecustom): 4
--> Number of digits for RT values (namecustom): 4
CAMERA findAdducts (Galaxy Version 0.1.0+camera1.48.0-galaxy1)*
--> General ppm error : 5
--> General absolute error in m/z : 0.015
--> Which polarity mode was used for measuring of the MS sample : negative or positive
--> Use a personal ruleset file : FALSE
--> Choose intensity values : maxo
--> Advanced parameters
  • False
-->Convert seconds to minutes when exporting tsv : Yes
-->Number of digits for MZ values (namecustom): 4
-->Number of digits for RT values (namecustom) : 4
Several "Intensity Check" and "Generic_filter" steps are applied to filter features on missing values, on the fold change between blank and the other sampleType levels, and on the third quartile (Q3) of ion intensities in pool samples.
Then, NA values in the dataMatrix are replaced by 0.
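
The blank fold-change criterion can be pictured with the sketch below. It is an illustration only: the protocol does not state the thresholds used, so the minimum fold change of 3, the comparison of pool mean against blank mean, and the function name are all assumptions.

```python
from statistics import mean


def passes_blank_filter(intensities, sample_types, min_fold_change=3.0):
    """Keep a feature when its mean pool intensity is at least
    min_fold_change times its mean blank intensity.

    Missing values (None) are ignored; a feature absent from all blanks
    is kept as long as it is detected in the pools.
    """
    def group_mean(group):
        values = [v for v, t in zip(intensities, sample_types)
                  if t == group and v is not None]
        return mean(values) if values else 0.0

    blank_mean = group_mean("blank")
    pool_mean = group_mean("pool")
    if blank_mean == 0.0:
        return pool_mean > 0.0
    return pool_mean / blank_mean >= min_fold_change


sample_types = ["blank", "pool", "pool", "sample"]
clean_feature = passes_blank_filter([1000, 50000, 52000, 48000], sample_types)
contaminant = passes_blank_filter([20000, 30000, None, 25000], sample_types)
print(clean_feature, contaminant)
```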
Batch correction (Galaxy Version 3.0.0)
--> Sample metadata file coding parameter
  • Batch column name : batch
  • Injection order column name : injectionOrder
  • Sample type column name: sampleType
  • Set the name used to tag samples as pool in the sample type column : pool
  • Set the name used to tag samples as blank in the sample type column : blank
  • Set the name used to tag samples as real sample in the sample type column : sample
--> Type of regression model
  • Loess
  • Span : not modified
  • Unconsistant values : Prevent it
  • Factor of interest : batch
  • Level of details for plots : basic, standard or complete

Note
This step “normalizes” the analytical signal. However, if the biological samples differ in amount (for example, more or fewer cells in pellets), an additional “biological” normalization step must be performed.

Quality Metrics
--> Coefficient of Variation
  • Yes
  • Which type of CV calculation should be done : only pool CV
  • Threshold : 0.3
--> Advanced parameters: default
Generic filter: remove features with CV(pool) > 0.3

Between-table Correlation (Galaxy Version 1.0.0) : provide the dataMatrix twice and indicate that samples are in columns for both tables
Analytical correlation filtration (Galaxy Version 2019-06-20)
--> Correlation threshold : 0.85
-->Do you want to take into account mass differences between 2 ions ?
  • Yes
  • Do you have your own list of mass differences or do you want to use a default list ? No
  • Mass difference range : 0.005
--> Do you want to take into account retention time differences between 2 ions ?
  • Yes
  • Retention time difference threshold : 0.1
  • Which representative ion do you want to select for each group : highest intensity
Generic_filter (Galaxy Version 1.3.0)

Filter on ACorF_groups to remove "0" and "-"

Expected result
A .tsv file in which every line corresponds to a unique feature, FOR EACH MODALITY

Metabolites Identification - Level 2
Identifications are based on matching experimental m/z and RT against those of an in-house chemical library (standards from MSMLS, OAMLS, BACSMLS and FAMLS from IROA Technologies), using Xcalibur software (Thermo Scientific).

Expected result
3 ".pmd" files (HILIC POS, C18 POS & C18 NEG), each listing the study-calibrated RT and m/z of every compound
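
The principle of a Level 2 match (m/z and RT both within tolerance of a library standard) can be sketched as follows. This is a generic illustration, not what Xcalibur does internally: the 5 ppm and 0.2 min tolerances, the retention times, the feature identifiers and the function name are hypothetical. The m/z values are the calculated [M+H]+ masses of the two standards.

```python
def match_features(features, library, ppm_tol=5.0, rt_tol=0.2):
    """Return (feature_id, compound_name) pairs whose m/z and RT both match.

    features: list of dicts with 'id', 'mz', 'rt' (minutes)
    library:  list of dicts with 'name', 'mz', 'rt' (minutes)
    """
    matches = []
    for feature in features:
        for compound in library:
            ppm_error = abs(feature["mz"] - compound["mz"]) / compound["mz"] * 1e6
            rt_error = abs(feature["rt"] - compound["rt"])
            if ppm_error <= ppm_tol and rt_error <= rt_tol:
                matches.append((feature["id"], compound["name"]))
    return matches


library = [
    {"name": "L-tryptophan", "mz": 205.0972, "rt": 3.50},    # RT is hypothetical
    {"name": "Kynurenic acid", "mz": 190.0499, "rt": 4.10},  # RT is hypothetical
]
features = [
    {"id": "F1", "mz": 205.0975, "rt": 3.52},  # about 1.5 ppm and 0.02 min away
    {"id": "F2", "mz": 190.0530, "rt": 2.00},  # too far in both m/z and RT
]
level2 = match_features(features, library)
print(level2)
```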

Metabolites Identification - Level 1
Experimental MS2 spectra are compared with an in-house chemical library (standards from MSMLS, OAMLS, BACSMLS and FAMLS from IROA Technologies) using Flash Entropy Search.
Metabolites identified in the previous step whose spectra also match in this step (“identity_search > 0.7” score) are upgraded to level 1.
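
The upgrade rule can be written down in a few lines. The sketch below assumes the identity-search scores have already been computed and are available as a name-to-score mapping. The function name and the example scores are hypothetical; only the 0.7 threshold comes from the protocol.

```python
def assign_levels(level2_hits, identity_scores, threshold=0.7):
    """Map each metabolite to identification level 1 or 2.

    level2_hits:     names matched on m/z and RT in the previous step
    identity_scores: name -> MS2 identity-search similarity (0 to 1);
                     metabolites without an MS2 match are simply absent
    """
    levels = {}
    for name in level2_hits:
        score = identity_scores.get(name)
        levels[name] = 1 if score is not None and score > threshold else 2
    return levels


levels = assign_levels(
    ["L-tryptophan", "Kynurenic acid", "Indoxyl-3-sulfate"],
    {"L-tryptophan": 0.92, "Kynurenic acid": 0.55},
)
print(levels)
```

A metabolite with no MS2 spectrum, or with a score at or below the threshold, stays at level 2.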


Note
All spectra from the database have been concatenated into one msp file with FragHub





Data Curation = Signal Normalization & Metabolites Exclusion :
- Normalization of metabolite areas to the total area of detected metabolites
- Calculation of the coefficient of variation (CV) for each metabolite in the QCs (analytical variability) and in the samples (biological variability)
- Removal of metabolites with analytical variability > biological variability
- Removal of metabolites with analytical variability > 30%

Note
Signal Normalization and Metabolites Exclusion are repeated until no further metabolite is excluded
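
The iterative curation loop can be sketched as below, assuming areas are held in a metabolite-to-list mapping with at least two QC injections and two sample injections. The function names and the toy data are hypothetical, and the CV is computed with the sample standard deviation, which is an assumption.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def normalize_to_total(table):
    """Divide each metabolite area by the total area of its injection.

    table: metabolite -> list of areas, one per injection (same order everywhere).
    """
    n = len(next(iter(table.values())))
    totals = [sum(areas[i] for areas in table.values()) for i in range(n)]
    return {name: [a / t for a, t in zip(areas, totals)]
            for name, areas in table.items()}


def curate(table, is_qc, max_cv=0.30):
    """Repeat normalization and exclusion until no metabolite is removed."""
    kept = dict(table)
    while kept:
        norm = normalize_to_total(kept)
        rejected = []
        for name, values in norm.items():
            qc_cv = cv([v for v, q in zip(values, is_qc) if q])
            sample_cv = cv([v for v, q in zip(values, is_qc) if not q])
            # Exclude when analytical variability exceeds biological
            # variability, or exceeds the 30 % limit.
            if qc_cv > sample_cv or qc_cv > max_cv:
                rejected.append(name)
        if not rejected:
            return norm
        for name in rejected:
            del kept[name]
    return {}


is_qc = [True, True, True, False, False, False]
areas = {
    "A": [100, 100, 100, 50, 100, 150],
    "B": [100, 100, 100, 150, 100, 50],
    "C": [50, 100, 150, 100, 100, 100],   # unstable in QCs, flat in samples
}
curated = curate(areas, is_qc)
print(sorted(curated))
```

In this toy data set, metabolite C is removed in the first pass, and the second pass renormalizes A and B to the new total before confirming that nothing else needs to be excluded.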

Analysis Validation: the analyses were validated by observing the distribution of QCs among the samples using Principal Component Analysis (PCA). For all PCAs, the data were log-transformed and scaled to unit variance (UV scaling).
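
The pre-processing applied before PCA can be illustrated for a single metabolite as follows. This is a minimal sketch: the base-10 logarithm is an assumption (the protocol does not specify the base), intensities must be strictly positive, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def log_uv_scale(values):
    """Log10-transform one metabolite's intensities, then center them
    and scale them to unit variance (UV scaling)."""
    logged = [math.log10(v) for v in values]
    center, spread = mean(logged), stdev(logged)
    return [(x - center) / spread for x in logged]


scaled = log_uv_scale([10.0, 100.0, 1000.0])
print(scaled)
```

After this step every metabolite has mean 0 and standard deviation 1, so abundant and low-abundance metabolites contribute equally to the PCA.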
Fusion of modalities = sorting redundancies:
  • Only the best modality is kept for each metabolite, based on RT > Dead Volume (0-1 min) and/or lower CV(metabolite) on QCs
  • "Analysis Validation" is performed again
  • Assignment of ChEBI identifiers to the metabolites referenced in the ChEBI database
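
One possible reading of the modality selection rule is sketched below. The precedence is an assumption: entries eluting after the dead volume are preferred, and the lowest QC CV decides among the remaining candidates. The function name and the example values are hypothetical.

```python
def best_modality(candidates, dead_volume_end=1.0):
    """Pick one modality for a metabolite.

    candidates: list of dicts with 'modality', 'rt' (minutes) and 'qc_cv'.
    Entries eluting in the dead volume (rt <= dead_volume_end) are used only
    when no retained alternative exists; the lowest QC CV then wins.
    """
    retained = [c for c in candidates if c["rt"] > dead_volume_end]
    pool = retained or candidates
    return min(pool, key=lambda c: c["qc_cv"])


candidates = [
    {"modality": "C18 POS", "rt": 0.8, "qc_cv": 0.05},
    {"modality": "HILIC POS", "rt": 4.2, "qc_cv": 0.12},
    {"modality": "C18 NEG", "rt": 0.9, "qc_cv": 0.08},
]
chosen = best_modality(candidates)
print(chosen["modality"])
```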


Expected result
A list of unique metabolites identified at Level 1 or 2, with their intensity normalized to the reference signal in each sample and in each QC


Software and References:

Software
Name: MSconvert (ProteoWizard)
Developer: Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J.,



Software
Name: Workflow4Metabolomics
Developer: http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813


Software
Name: FragHub
Developer: https://pubs.acs.org/doi/full/10.1021/acs.analchem.4c02219


Software
Name: Flash Entropy Search
Developer: Yuanyue Li



OpenScience
"Open science has the potential of making the scientific process more transparent, inclusive and democratic.  Open science:
  • increases scientific collaborations and sharing of information for the benefits of science and society;
  • makes multilingual scientific knowledge openly available, accessible and reusable for everyone; and
  • opens the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community.
Our interconnected world needs open science to help solve complex social, environmental, and economic challenges and achieve the Sustainable Development Goals."
To draft your protocols regarding sampling, analyses (other than metabolomics and/or lipidomics), and statistical analyses, you may use:

Software
Name: protocols.io

Data Submission

Software
Name: MetaboLights
Developer: Ozgur Yurekten, Thomas Payne, Noemi Tejera, Felix Xavier Amaladoss, Callum Martin, Mark Williams, Claire O’Donovan. MetaboLights