Jun 26, 2026

Full Workflow for both Targeted and Untargeted Metabolomics on water samples from the river of Mayenne - MetaboHUB Tours

  • Jérémy Monteiro1,2,
  • Staff Members of PMAC1,2
  • 1Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France;
  • 2MetaboHUB-Tours, Tours, France
  • MetaboHUB-Tours
Protocol CitationJérémy Monteiro, Staff Members of PMAC 2026. Full Workflow for both Targeted and Untargeted Metabolomics on water samples from the river of Mayenne - MetaboHUB Tours. protocols.io https://dx.doi.org/10.17504/protocols.io.8epv5wznnv1b/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: June 26, 2026
Last Modified: June 26, 2026
Protocol Integer ID: 319905
Keywords: untargeted, features, quality metrics, untargeted metabolomic, spectral database, annotation of feature, annotation, untargeted metabolomics on water sample, untargeted data, water sample, identification of compound, data, present in the data, sample, feature, river, house database of standard
Funders Acknowledgements:
Agence Nationale de la Recherche
Grant ID: ANR-11-INBS-0010 MetaboHUB
Agence Nationale de la recherche au titre de France 2030
Grant ID: ANR-21-ESRE-0035
Abstract
The purpose of this workflow is to describe how untargeted data is processed using two approaches:
1) detection and annotation of features present in the data
2) identification of compounds in our files using an in-house database of standards

Image Attribution
Laurent Galineau (Plateforme Imagerie Préclinique, US-61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France ; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)
Camille Dupuy (Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France; MetaboHUB-Tours, Tours, France; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)
Guidelines
You need to know exactly what kind of data you are working with. This workflow may not suit your data, depending on your hardware and software.

Workflow4Metabolomics is actively maintained, so the versions of the tools it uses may change. Older versions remain available. For any questions or support about W4M: community.france-bioinformatique.fr/c/workflow4metabolomics/
Materials

Software
MSconvert (ProteoWizard)
NAME
Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J.,
DEVELOPER



Software
Workflow4Metabolomics
NAME
http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813
DEVELOPER





Troubleshooting
Problem
MSconvert can create artefacts and false fragments during conversion
Solution
For raw files from Waters, DataConvert is recommended. Otherwise, be aware of the issue and use biological context and, when necessary, raw data and manufacturer software to "clarify" annotation.
Safety warnings
All sample handling must be carried out under a fume hood, equipped with traditional PPE: lab coat, goggles, and gloves.
Ethics statement
No specific ethics statement required.
1. Extraction of water sample
3d 10h 30m 30s
50 mL samples were stored at -80 °C

Samples were lyophilized for 72:00:00

3d
Add 5000 µL of methanol to each dried sample

Agitate vigorously for 00:00:30

30s
Let it settle for 02:00:00 at -20 °C


2h 15m
1- Transfer the liquid phase to a new 50 mL Falcon tube
2- From this tube, dispense:
  • 2000 µL into an Eppendorf tube for C18+/- analysis
  • 2000 µL into an Eppendorf tube for HILIC+ analysis
  • 500 µL into a Falcon tube for Quality Control (QC)

Let the QC settle for 00:30:00 at -20 °C
30m
Prepare 2 mL aliquots of QC for C18+/- and HILIC+, in Eppendorf tubes
Centrifugation:
15000 x g, 4°C, 00:15:00
Transfer 1.5 mL from each Eppendorf tube into a 1.5 mL glass vial
1- Reversed-Phase Analysis (C18+/-)
Evaporate : 00:30:00 under N2 stream at 40 °C

30m
Resuspend in 100 µL of water/acetonitrile (8:2)

Agitate vigorously for 00:00:30



15m
Centrifugation
15000 x g, 4°C, 00:15:00
15m
Transfer 85 µL of each sample into an insert for a 1.5 mL glass vial

Seal with an injection-friendly cap
2- HILIC Analysis
Evaporate : 00:30:00 under N2 stream at 40 °C

30m
Resuspend in 100 µL of acetonitrile/water (8:2)

Agitate vigorously for 00:00:30



15m
Centrifugation
15000 x g, 4°C, 00:15:00
Transfer 85 µL of each sample into an insert for a 1.5 mL glass vial

Pipette and mix 5 µL of each sample to create a QC sample, then divide it into several aliquots for injection quality control

Seal with an injection-friendly cap
2. Data Acquisition
Data were acquired on a Q-Exactive. For more details, please refer to the dedicated protocol, except for the data analysis part, which is described below (a specific data analysis was developed during this project): https://dx.doi.org/10.17504/protocols.io.eq2lyoq9qgx9/v2

Equipment
Q-Exactive
NAME
Mass Spectrometer
TYPE
ThermoScientific
BRAND


3. TARGETED Metabolomics: Data PreProcessing & Data Processing
".raw" files are converted into ".mzML" files on MSconvert, with the filter "PeakPicking : MSlevels 1-1"
mzML files are uploaded on W4M, as a collection

XCMS findChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Spectra Filters
  • Filter on mz: 60-850
--> Extraction Method for Peak Detection
  • CentWave - chromatographic peak detection using the centWave method
  • Max tolerated ppm m/z deviation in consecutive scans in ppm: 10
  • Min,Max peak width in seconds: 4,40
--> Advanced options
  • Signal to Noise ratio cutoff: 10
  • Prefilter step for the first analysis step (ROI detection): 4,50 000
  • Name of the function to calculate the m/z center of the chromatographic peak: Use the m/z at the peak apex
  • Integration method: Mexican hat
  • Minimum difference in m/z for peaks with overlapping retention times: 0.008
  • fitgauss: No
  • Noise filter: 20 000
  • verbose columns: No
  • Get a list of found chromatographic peaks: No
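
To give a sense of scale for the ppm tolerance listed above, the short Python sketch below converts a ppm value into an absolute m/z window. It is only an illustration of the arithmetic: the function name `ppm_window` is hypothetical and is not part of XCMS or W4M.

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a tolerance expressed in ppm."""
    delta = mz * ppm / 1e6  # absolute tolerance in m/z units
    return mz - delta, mz + delta


# Window for an ion at m/z 200 with the 10 ppm tolerance used in this step
low, high = ppm_window(200.0, 10.0)
print(f"{low:.4f} to {high:.4f}")
```

At the upper bound of the filtered range (m/z 850), 10 ppm corresponds to about ± 0.0085 in m/z; at the lower bound (m/z 60), about ± 0.0006.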

XCMS findChromPeaks merger (Galaxy Version 3.12.0+galaxy3)
XCMS groupChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Method to use for grouping
  • Peak density
  • Bandwidth: 5
  • Minimum fraction of samples: 0.2
  • Minimum number of samples: 5
  • Width of overlapping m/z slices: 0.008
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slices: 50
--> Get the Peak List
  • No
XCMSadjustRtime (Galaxy Version 3.12.0+galaxy3)
--> Method to use for retention time correction
  • Peakgroups - retention time correction based on alignment of features (peak groups) present in most/all samples
  • Minimum required fraction of samples in which peaks for the group were identified: 0.5
  • Maximal number of additional peaks for all samples to be assigned to a peak group for retention time correction: 1
--> Smooth method
  • loess - non-linear alignment
XCMS groupChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Method to use for grouping
  • Peak density
  • Bandwidth: 2
  • Minimum fraction of samples: 0.2
  • Minimum number of samples: 5
  • Width of overlapping m/z slices: 0.008
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slices: 50
--> Get the Peak List
  • No
XCMS fillChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Advanced options: nothing modified
--> Peak List
  • Convert retention time (seconds) into minutes: Yes
  • Number of decimal places for mass values reported in ions "identifiers": 5
  • Number of decimal places for retention time reported in ions "identifiers":2
  • Reported intensity values: into
  • If NA values remain, replace them by 0 in the dataMatrix: Yes

CAMERA.annotate
--> Group co-eluted peaks based on RT [groupFWHM]
  • Multiplier of the standard deviation: 6
  • Percentage of FWHM width: 0.6
--> Annotation general options
  • General ppm error: 5
  • General absolute error in m/z: 0.015
--> Annotate Isotopes [findIsotopes]
  • Max. ion charge: 3
  • Max. number of expected isotopes: 4
  • The percentage number of samples, which must satisfy the C12/C13 rule for isotope annotation: 0.5
--> Mode
  • only groupFWHM and findIsotopes functions
--> Statistics and results export: [diffreport]
  • Two or more conditions
  • Number of the most significantly different analytes to create EICs for: 50000
  • Width (in seconds) of EICs produced: 200
  • Intensity values to be used for the diffreport: into
  • Numerical variable for the height of the eic and boxplots that are printed out: 480
  • Numeric variable for the width of the eic and boxplots print out made: 640
  • Number of decimal places of title m/z values in the eic plot: 4
  • logical indicating whether the reports should be sorted by p-value: No
  • Export the Diffreport files in: Zip
  • Export the EIC and boxplots in: Zip
Export options
  • Convert retention time (seconds) into minutes: No
  • Number of decimal places for mass values reported in ions' identifiers: 4
  • Number of decimal places for retention time values reported in ions' identifiers: 0
  • General used intensity value: into

Expected result
Two output files:
  • one zip containing a table with the area of each feature in each sample
  • one zip with png files of the corresponding ion chromatograms

Metabolite identification is based on the m/z and RT of an in-house chemical library (standards from MSMLS, OAMLS, BACSMLS and FAMLS from IROA Technologies), using Xcalibur (ThermoScientific software)
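
In this protocol the matching is done in Xcalibur. The Python sketch below only illustrates the underlying logic of matching a detected ion to library standards on both m/z and RT. The tolerances (5 ppm, 0.2 min), the `Standard` class, the `match_feature` function and the two placeholder standards are all assumptions made for the example; they are not values or names from this protocol.

```python
from dataclasses import dataclass


@dataclass
class Standard:
    name: str
    mz: float      # theoretical or calibrated m/z of the standard
    rt_min: float  # study-calibrated retention time, in minutes


def match_feature(mz, rt_min, library, ppm_tol=5.0, rt_tol_min=0.2):
    """Return the names of standards matching a feature on m/z AND RT."""
    hits = []
    for std in library:
        ppm_error = (mz - std.mz) / std.mz * 1e6
        if abs(ppm_error) <= ppm_tol and abs(rt_min - std.rt_min) <= rt_tol_min:
            hits.append(std.name)
    return hits


# Placeholder library entries (hypothetical values, not real standards)
library = [
    Standard("standard_A", 150.0000, 2.50),
    Standard("standard_B", 150.0100, 2.50),
]

print(match_feature(150.0003, 2.45, library))
```

Requiring both criteria is what separates standard_A from standard_B here: the two share a retention time but differ by about 65 ppm in m/z.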

Expected result
Three ".pmd" files (HILIC POS, C18 POS and C18 NEG), each with a list of the study-calibrated RT and m/z of each compound

Data Curation = Signal Normalization & Metabolites Exclusion :
- Normalization of metabolite areas to the total area of detected metabolites
- Calculation of the coefficient of variation (CV) for each metabolite in the QCs (analytical variability) and in the samples (biological variability)
- Removal of metabolites with analytical variability > biological variability
- Removal of metabolites with analytical variability > 30%

Note
Signal Normalization and Metabolites Exclusion are repeated until there is no further exclusion
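
The curation loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used on the platform: the data layout (one list of areas per metabolite, with a parallel list of QC flags), the function names, and the choice to re-normalize the raw areas of the remaining metabolites at each pass are assumptions. The example areas are invented for the demonstration.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation (standard deviation divided by mean)."""
    m = mean(values)
    return stdev(values) / m if m else float("inf")


def normalize(areas):
    """Divide each area by the total area of all metabolites in that injection."""
    names = list(areas)
    n = len(next(iter(areas.values())))
    totals = [sum(areas[m][i] for m in names) for i in range(n)]
    return {m: [areas[m][i] / totals[i] for i in range(n)] for m in names}


def curate(areas, is_qc, max_cv=0.30):
    """Repeat normalization and exclusion until no metabolite is removed."""
    kept = dict(areas)
    while kept:
        norm = normalize(kept)
        drop = []
        for name, vals in norm.items():
            cv_qc = cv([v for v, q in zip(vals, is_qc) if q])           # analytical
            cv_samples = cv([v for v, q in zip(vals, is_qc) if not q])  # biological
            if cv_qc > cv_samples or cv_qc > max_cv:
                drop.append(name)
        if not drop:
            return norm
        for name in drop:
            del kept[name]
    return {}


# Invented example: three QC injections followed by three samples
is_qc = [True, True, True, False, False, False]
areas = {
    "A": [100, 100, 100, 50, 100, 150],
    "B": [200, 200, 200, 300, 200, 100],
    "C": [10, 50, 90, 50, 50, 50],  # unstable in the QCs
}
curated = curate(areas, is_qc)
print(sorted(curated))
```

In this example metabolite C is removed at the first pass because its CV in the QCs exceeds 30%, and the second pass removes nothing, so the loop stops.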

Analysis Validation: the analyses were validated by observing the distribution of QCs among the samples using Principal Component Analysis (PCA). For all PCAs, the data were log-transformed and underwent UV scaling normalization.
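
As an illustration of the pre-treatment applied before PCA, the Python sketch below log-transforms a small matrix and applies UV (unit-variance) scaling column by column. The log base is not stated in this protocol, so base 10 is an assumption, as are the function name `log_uv_scale` and the rows-as-injections layout. The PCA itself is not shown; the scaled matrix is what would be passed to it.

```python
import math
from statistics import mean, stdev


def log_uv_scale(matrix):
    """Log10-transform, then center each column and divide by its standard deviation.

    matrix: list of rows (injections), each row a list of strictly positive
    values (one per metabolite). Columns must not be constant.
    """
    logged = [[math.log10(v) for v in row] for row in matrix]
    columns = list(zip(*logged))
    centers = [mean(col) for col in columns]
    scales = [stdev(col) for col in columns]
    return [
        [(v - c) / s for v, c, s in zip(row, centers, scales)]
        for row in logged
    ]


# Invented example: 3 injections x 2 metabolites
scaled = log_uv_scale([[10.0, 1.0], [100.0, 10.0], [1000.0, 100.0]])
for row in scaled:
    print([round(v, 3) for v in row])
```

After UV scaling every metabolite has a mean of 0 and a standard deviation of 1, so abundant and low-abundance metabolites contribute equally to the PCA.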
Fusion of modalities = sorting redundancies:
  • Only the best modality is kept for each metabolite, based on RT > Dead Volume (0-1 min) and/or the lower CV(metabolite) on QCs
  • "Analysis Validation" is then carried out
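
One possible reading of the selection rule above is sketched below in Python: modalities where the metabolite elutes after the dead volume are preferred, and among those the one with the lowest CV on QCs is kept. The function name, the dictionary keys, the fallback used when every candidate elutes in the dead volume, and the example values are assumptions made for illustration.

```python
def best_modality(candidates, dead_time_min=1.0):
    """Pick one modality for a metabolite detected in several modalities.

    candidates: list of dicts with keys 'modality', 'rt_min' and 'cv_qc'.
    """
    # Prefer modalities where the metabolite is retained past the dead volume
    retained = [c for c in candidates if c["rt_min"] > dead_time_min]
    # Assumption: if none is retained, fall back to all candidates
    pool = retained or candidates
    return min(pool, key=lambda c: c["cv_qc"])


# Invented example for a single metabolite seen in the three modalities
candidates = [
    {"modality": "HILIC POS", "rt_min": 0.8, "cv_qc": 0.05},
    {"modality": "C18 POS", "rt_min": 3.2, "cv_qc": 0.12},
    {"modality": "C18 NEG", "rt_min": 3.1, "cv_qc": 0.20},
]
print(best_modality(candidates)["modality"])
```

Here HILIC POS has the lowest CV but elutes in the dead volume, so C18 POS is retained.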
Software required:


Software
Workflow4Metabolomics
NAME
http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813
DEVELOPER



4. UNTARGETED Metabolomics: Data PreProcessing & Data Processing
".raw" files are converted into ".mzML" files on MSconvert, with the filter "PeakPicking : MSlevels 1-2"
mzML files are uploaded on W4M, as a collection

Note
Keep the "mzML" extension in the file names when creating the collection

MSnbase readMSData
xcms get a sampleMetadata file
Download the tsv file and modify it to add some columns:
  • sampleType : pool, blank, sample
  • injectionOrder
  • mode : pos, neg
  • batch
Upload this updated sampleMetadata file to W4M
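
The file can be edited in a spreadsheet, but the columns can also be added with a script. The Python sketch below is one way to do it, assuming the first column of the downloaded file identifies each sample. The function name `add_columns`, the example header (`sample_name`, `class`) and the example rows are assumptions; check them against the file actually produced by W4M.

```python
import csv
import io

NEW_FIELDS = ["sampleType", "injectionOrder", "mode", "batch"]


def add_columns(tsv_text, annotations):
    """Append the four extra columns to a tab-separated sampleMetadata table.

    annotations maps the value of the first column (sample identifier)
    to a dict holding sampleType, injectionOrder, mode and batch.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + NEW_FIELDS
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    key = reader.fieldnames[0]
    for row in reader:
        row.update(annotations[row[key]])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical file content and annotations
original = "sample_name\tclass\nQC_01\t.\nS_01\t.\n"
annotations = {
    "QC_01": {"sampleType": "pool", "injectionOrder": 1, "mode": "pos", "batch": "B1"},
    "S_01": {"sampleType": "sample", "injectionOrder": 2, "mode": "pos", "batch": "B1"},
}
updated = add_columns(original, annotations)
print(updated)
```

To work on a real file, read it with `open(path, newline="")` and write the returned text back to a new tsv file before uploading it.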
xcms plot chromatogram

Expected result
pdf files with TIC by sampleType and by sample displayed

xcms findChromPeaks (xcmsSet)
--> Spectra Filters
  • Filter on Acquisition Numbers*,# : NA
  • Filter on Retention Time (s)* : 30,600
  • Filter on Mz# : 60,850
--> Extraction method for peaks detection
  • CentWave – chromatographic peak detection using centWave method
  • Max tolerated ppm m/z deviation in consecutive scans in ppm# : 3
  • Min,Max peak width in seconds* : 4,20
--> Advanced Options
  • Signal to Noise ratio cutoff : 3
  • Prefilter step for the first analysis step (ROI detection) £,#: 3,20000
  • Name of the function to calculate the m/z center of the chromatographic peak: mean of the peaks’ m/z values
  • Integration method :peak limits are found through descent on the Mexican hat filtered data
  • Minimum difference in m/z peaks with overlapping retention times : -0.001
  • Fitgauss : No
  • Noise filter: 10000
  • Verbose Columns : No
  • Get a list of found chromatographic peaks : No
  • List of regions-of-interest (ROI): NA
xcms findChromPeaks Merger
xcms groupChromPeaks (group)
--> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 5
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms adjustRtime (retcor)
--> Method to use for retention time correction
  • PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
  • Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
  • Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
  • Smooth method: loess & Advanced options : NA
xcms groupChromPeaks (group)
--> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 3
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms adjustRtime (retcor)
--> Method to use for retention time correction
  • PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
  • Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
  • Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
  • Smooth method: loess & Advanced options : NA
xcms groupChromPeaks (group)
--> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 1
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms fillChromPeaks (fillPeaks)
--> Advanced Options: NA
--> Peak List
  • Convert retention time (seconds) in minutes : Yes
  • Number of decimal places for mass values reported in ions “identifiers” : 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
--> If NA values remain, replace them by 0 in the dataMatrix : No
CAMERA groupFWHM (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> The multiplier of the standard deviation: 6
--> Percentage of the width of the FWHM: 0.6
--> Polarity : Positive or Negative
--> Advanced parameters
  • FALSE
--> Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values : 4
--> Number of digits for RT values : 4
CAMERA findIsotopes (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> Max. ion charge : 2
--> Max. number of expected isotopes: 2
--> ppm error for the search : 5
--> Allowed variance for the search : 0.01
--> Choose intensity values for C12/C13 check. Allowed values are into, maxo, intb: maxo
--> The percentage number of samples, which satisfy the C12/C13 rule for isotope annotation : 0.7
--> Should C12/C13 filter be applied : No
-->Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values (namecustom): 4
--> Number of digits for RT values (namecustom): 4
CAMERA findAdducts (Galaxy Version 0.1.0+camera1.48.0-galaxy1)*
--> General ppm error : 5
--> General absolute error in m/z : 0.015
--> Which polarity mode was used for measuring of the MS sample : negative or positive
--> Use a personal ruleset file : FALSE
--> Choose intensity values : maxo
--> Advanced parameters
  • False
-->Convert seconds to minutes when exporting tsv : Yes
-->Number of digits for MZ values (namecustom): 4
-->Number of digits for RT values (namecustom) : 4
Several "Intensity Check" and "Generic_filter" steps are performed on missing values, the fold_change between blank and other sampleType values, and the Q3 quartile values of ions in pool samples
Then, in the dataMatrix, NA values are replaced by 0
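
The thresholds used for these filters are not given in this protocol. The Python sketch below only illustrates the kind of logic involved: a filter on the fraction of missing values, a pool/blank fold-change filter, and the final replacement of NA by 0. The thresholds (`min_fold=3.0`, `max_na_fraction=0.5`) and function names are hypothetical, missing values are represented by `None`, and the Q3 filter is not shown.

```python
from statistics import mean


def mean_ignoring_na(values):
    """Mean of the non-missing values (0.0 if everything is missing)."""
    present = [v for v in values if v is not None]
    return mean(present) if present else 0.0


def keep_feature(pool, blank, min_fold=3.0, max_na_fraction=0.5):
    """Keep an ion if it is rarely missing in pools and well above the blanks."""
    na_fraction = sum(v is None for v in pool) / len(pool)
    if na_fraction > max_na_fraction:
        return False
    blank_mean = mean_ignoring_na(blank)
    if blank_mean == 0:
        return True  # nothing detected in blanks
    return mean_ignoring_na(pool) / blank_mean >= min_fold


def replace_na(values):
    """Replace missing values by 0, as done in the dataMatrix."""
    return [0.0 if v is None else v for v in values]


# Invented intensities for two ions
print(keep_feature(pool=[900.0, None, 1100.0], blank=[100.0, 100.0]))
print(keep_feature(pool=[150.0, 150.0, 150.0], blank=[100.0, 100.0]))
print(replace_na([1.0, None]))
```

Filtering before replacing NA by 0 matters: once missing values are zeros, they can no longer be told apart from true low signals.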
Batch correction (Galaxy Version 3.0.0)
--> Sample metadata file coding parameter
  • Batch column name : batch
  • Injection order column name : injectionOrder
  • Sample type column name: sampleType
  • Set the name used to tag samples as pool in the sample type column : pool
  • Set the name used to tag samples as blank in the sample type column : blank
  • Set the name used to tag samples as real sample in the sample type column : sample
--> Type of regression model
  • Loess
  • Span : not modified
  • Unconsistant values : Prevent it
  • Factor of interest : batch
  • Level of details for plots : basic, standard or complete

Note
This step involves “normalizing” the signal. However, if biological samples differ, for example if there are more or fewer cells in pellets, “biological” normalization is an additional step that must be performed.

Quality Metrics
--> Coefficient of Variation
  • Yes
  • Which type of CV calculation should be done : only pool CV
  • Threshold : 0.3
--> Advanced parameters: default
Generic filter on CV(pool) >0.3

Between-table Correlation (Galaxy Version 1.0.0) : put twice the dataMatrix and indicate that samples are in column for both tables
Analytical correlation filtration (Galaxy Version 2019-06-20)
--> Correlation threshold : 0.85
-->Do you want to take into account mass differences between 2 ions ?
  • Yes
  • Do you have your own list of mass differences or do you want to use a default list ? No
  • Mass difference range : 0.005
--> Do you want to take into account retention time differences between 2 ions ?
  • Yes
  • Retention time difference threshold : 0.1
  • Which representative ion do you want to select for each group : highest intensity
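
The idea behind this step can be sketched in Python: ions that are strongly correlated across samples and co-elute are grouped, and one representative per group is kept. This is a simplified illustration, not the algorithm of the W4M tool. The mass-difference criterion is left out, the single-linkage grouping (union-find) and the use of summed intensity to pick the representative are assumptions, and the example ions are invented. The retention time threshold is applied in whatever unit the retention times are expressed in.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def group_ions(ions, min_corr=0.85, max_rt_diff=0.1):
    """Group correlated, co-eluting ions; return {representative: members}."""
    names = list(ions)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            close = abs(ions[a]["rt"] - ions[b]["rt"]) <= max_rt_diff
            if close and pearson(ions[a]["int"], ions[b]["int"]) >= min_corr:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Representative = member with the highest summed intensity (assumption)
    return {max(m, key=lambda n: sum(ions[n]["int"])): m for m in groups.values()}


# Invented ions: M2 follows M1, M3 co-elutes but is uncorrelated, M4 elutes later
ions = {
    "M1": {"rt": 5.00, "int": [100, 200, 300, 400]},
    "M2": {"rt": 5.05, "int": [10, 21, 29, 41]},
    "M3": {"rt": 5.02, "int": [400, 100, 300, 200]},
    "M4": {"rt": 8.00, "int": [100, 200, 300, 400]},
}
groups = group_ions(ions)
print(groups)
```

M4 is perfectly correlated with M1 but elutes three units later, so it stays in its own group; only M2 is merged with M1, which is kept as the representative.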
Generic_filter (Galaxy Version 1.3.0)

Filter on ACorF_groups to remove "0" and "-"

Expected result
A .tsv file where every line corresponds to a unique feature FOR EACH MODALITY

Feature identification is based on matching m/z between experimental data and a public spectral database (MSquery on W4M) with the tool "bank in-house"


Expected result
A .tsv file where every line corresponds to a unique feature, some of which have been identified, FOR EACH MODALITY


Safety information
As features (unidentified ions) and metabolites (features identified based on a public library) are only putatively annotated, it is not possible to merge and sort redundancies between the 3 files

Software and References:

Software
MSconvert (ProteoWizard)
NAME
Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J.,
DEVELOPER



Software
Workflow4Metabolomics
NAME
http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813
DEVELOPER







Acknowledgements
This protocol was developed for a collaboration between MetaboHUB-Tours, Islem Mokrane and Nathalie Gassama from UR 6293 - GéoHydrosystèmes continentaux - GéHCO (University of Tours, 37, France)