Jun 27, 2026

Full Workflow for both Targeted and Untargeted Metabolomics on Extracellular Vesicles - MetaboHUB Tours V.2

  • Jérémy Monteiro1,2,
  • Staff Members of PMAC1,2
  • 1Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France;
  • 2MetaboHUB-Tours, Tours, France
  • MetaboHUB-Tours
Protocol Citation: Jérémy Monteiro, Staff Members of PMAC 2026. Full Workflow for both Targeted and Untargeted Metabolomics on Extracellular Vesicles - MetaboHUB Tours. protocols.io https://dx.doi.org/10.17504/protocols.io.5qpvoej9zl4o/v2
Version created by Jérémy Monteiro
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: June 26, 2026
Last Modified: June 27, 2026
Protocol Integer ID: 319891
Keywords: untargeted, features, detection, quality metrics, full workflow for untargeted metabolomic, untargeted metabolomic, unstructured data, extracellular vesicle, spectral database, annotation of feature, identification of compound, identification table as output, annotation, metabohub, present in the data, identification table, data, database, untargeted metabolomics on extracellular vesicle
Funders Acknowledgements:
Agence Nationale de la Recherche
Grant ID: ANR-11-INBS-0010 MetaboHUB
Agence Nationale de la recherche au titre de France 2030
Grant ID: ANR-21-ESRE-0035
Abstract
The purpose of this workflow is to describe how unstructured data is processed using two approaches:
1) detection and annotation of features present in the data
2) identification of compounds in our files using FlashEntropy and a spectral database
Each process will generate an identification table as output, which can be cross-referenced to annotate certain features (note the confidence level of the identification).
Image Attribution
Laurent Galineau (Plateforme Imagerie Préclinique, US-61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France ; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)
Camille Dupuy (Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France; MetaboHUB-Tours, Tours, France; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)
Guidelines
If needed, all the software used has a GitHub repository or a dedicated website with training datasets and detailed tutorials.
Materials

Software
Name: MSconvert (ProteoWizard)
Developer: Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J.,

Software
Name: BatMass: mass spectrometry data visualization
Developer: Dmitry Avtonomov

Software
Name: Workflow4Metabolomics
Developer: http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813

Software
Name: FragHub
Developer: https://pubs.acs.org/doi/full/10.1021/acs.analchem.4c02219

Software
Name: Flash Entropy Search
Developer: Yuanyue Li

Troubleshooting
Problem
MSconvert can create artefacts and false fragments during conversion.
Solution
For raw files from Waters, DataConvert is recommended. Otherwise, be aware of the issue and use the biological context and, when necessary, the raw data and the manufacturer's software to confirm annotations.
Safety warnings
All sample handling must be carried out under a fume hood, wearing standard PPE: lab coat, goggles, and gloves.
Ethics statement
No specific ethics statement is required.
Before start
You need to know exactly what kind of data you are working with. This workflow may not suit your data, depending on your hardware and software.

Workflow4Metabolomics is actively maintained, so the versions of the tools it uses may change. Older versions remain available. For any questions or support about W4M: community.france-bioinformatique.fr/c/workflow4metabolomics/
1. Extraction of EVs
2h 0m 30s
Add 800 µL of methanol to each dried pellet sample

Agitate vigorously for 00:00:30

30s
Centrifugation
15000 x g, 4°C, 00:15:00

15m
1- Transfer 700 µL into a first 96-well plate --> for reversed-phase analysis
2- From the 1st plate, transfer 350 µL into a second plate --> for HILIC analysis

1- Reversed-Phase Analysis
Evaporate : 00:30:00 under N2 stream at 40 °C

30m
Resuspend in 100 µL of water/acetonitrile (8:2)

Planar agitation : 00:15:00 , strength 7/10

Note
Cover the plate with aluminium foil

15m
Centrifugation
3000 rpm, 4°C, 00:15:00
15m
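This centrifugation is specified in rpm, whereas the first one is specified in x g. The force reached at a given rpm depends on the rotor radius, which the protocol does not state. A minimal sketch of the standard conversion, assuming a hypothetical 10 cm radius for illustration only:

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) from rotor speed and radius in cm."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


# The 10 cm radius is an assumption, not a value from the protocol.
example_rcf = rcf_from_rpm(3000, radius_cm=10.0)
```

Replace the radius with the one of the plate rotor actually used before comparing runs across centrifuges.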
Transfer 85 µL of each sample into a new 96-well plate

Pipette and mix 5 µL of each sample to create a QC sample, then divide it into several aliquots for injection quality control

Cover with an injection-friendly foil
2- HILIC Analysis
Evaporate : 00:30:00 under N2 stream at 40 °C

30m
Resuspend in 100 µL of acetonitrile/water (8:2)

Planar agitation : 00:15:00 , strength 7/10

Note
Cover the plate with aluminium foil

15m
Centrifugation
3000 rpm, 4°C, 00:15:00
Transfer 85 µL of each sample into an insert for a 1.5 mL glass vial

Pipette and mix 5 µL of each sample to create a QC sample, then divide it into several aliquots for injection quality control

Seal with an injection-friendly cap
2. Data Acquisition
Data were acquired on a Q-Exactive. For more details, please refer to the dedicated protocol: https://dx.doi.org/10.17504/protocols.io.eq2lyoq9qgx9/v1

Equipment
Name: Q-Exactive
Type: Mass Spectrometer
Brand: ThermoScientific


3. TARGETED Metabolomics: Data PreProcessing & Data Processing
".raw" files are converted into ".mzML" files with MSconvert, using the filter "PeakPicking : MSlevels 1-1"
mzML files are uploaded on W4M, as a collection
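The protocol does not say whether the MSconvert GUI or the command line was used. For batch conversion, the equivalent command-line call could be assembled as in the sketch below. The file and folder names are hypothetical, and the "vendor" picker type is an assumption, since the protocol only gives the MS levels.

```python
def msconvert_command(raw_file: str, out_dir: str, ms_levels: str = "1-1") -> list[str]:
    """Build an msconvert argument list with peak picking on the given MS levels."""
    return [
        "msconvert",
        raw_file,
        "--mzML",  # write mzML output
        "--filter", f"peakPicking vendor msLevel={ms_levels}",  # picker type assumed
        "-o", out_dir,  # output directory
    ]


# Hypothetical file names; pass the list to subprocess.run() on a machine
# where ProteoWizard is installed.
command = msconvert_command("sample_01.raw", "mzml_out")
```

For the untargeted workflow, the same helper would be called with ms_levels="1-2".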

XCMS findChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Spectra Filters
  • Filter on mz: 60-850
--> Extraction Method for Peak Detection
  • CentWave - chromatographic peak detection using the centWave method
  • Max tolerated ppm m/z deviation in consecutive scans in ppm: 10
  • Min,Max peak width in seconds: 4,40
--> Advanced options
  • Signal to Noise ratio cutoff: 10
  • Prefilter step for the first analysis step (ROI detection): 4,50000
  • Name of the function to calculate the m/z center of the chromatographic peak: Use the m/z at the peak apex
  • Integration method: Mexican hat
  • Minimum difference in m/z for peaks with overlapping retention times: 0.008
  • fitgauss: No
  • Noise filter: 20000
  • verbose columns: No
  • Get a list of found chromatographic peaks: No
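To get a feel for what the 10 ppm tolerance means in absolute terms across the 60-850 m/z range, the window can be computed directly. This is only an illustrative helper, not part of the W4M tool:

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds corresponding to +/- ppm around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


# Absolute half-width of the 10 ppm tolerance at both ends of the m/z filter.
half_width_low = 60.0 * 10 / 1e6    # about 0.0006 at m/z 60
half_width_high = 850.0 * 10 / 1e6  # about 0.0085 at m/z 850
```

At the top of the range the half-width is of the same order as the 0.008 minimum m/z difference set above, which is worth keeping in mind when adapting these values to another instrument.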

XCMS findChromPeaks merger (Galaxy Version 3.12.0+galaxy3)
XCMS groupChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Method to use for grouping
  • Peak density
  • Bandwidth: 5
  • Minimum fraction of samples: 0.2
  • Minimum number of samples: 5
  • Width of overlapping m/z slices: 0.008
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice: 50
--> Get the Peak List
  • No
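The two thresholds "Minimum fraction of samples" and "Minimum number of samples" act together. The sketch below is a simplified reading of the peak density criteria, in which a feature is kept when at least one sample group satisfies both thresholds. It is an assumption to check against the xcms documentation, and the group names and counts are hypothetical.

```python
def keep_feature(
    peaks_per_group: dict[str, int],
    samples_per_group: dict[str, int],
    min_fraction: float = 0.2,
    min_samples: int = 5,
) -> bool:
    """True if any sample group meets both the fraction and the count threshold."""
    return any(
        n_found >= min_fraction * samples_per_group[group] and n_found >= min_samples
        for group, n_found in peaks_per_group.items()
    )


group_sizes = {"pool": 10, "sample": 40}  # hypothetical study design
kept = keep_feature({"pool": 6, "sample": 2}, group_sizes)
dropped = keep_feature({"pool": 4, "sample": 4}, group_sizes)
```

In the second call no group reaches 5 samples, so the feature would be discarded even though 4 of 10 pools exceed the 0.2 fraction.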
XCMS adjustRtime (Galaxy Version 3.12.0+galaxy3)
--> Method to use for retention time correction
  • Peakgroups - retention time correction based on alignment of features (peak groups) present in most/all samples
  • Minimum required fraction of samples in which peaks for the group were identified: 0.5
  • Maximal number of additional peaks for all samples to be assigned to a peak group for retention time correction: 1
--> Smooth method
  • loess - non-linear alignment
XCMS groupChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Method to use for grouping
  • Peak density
  • Bandwidth: 2
  • Minimum fraction of samples: 0.2
  • Minimum number of samples: 5
  • Width of overlapping m/z slices: 0.008
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice: 50
--> Get the Peak List
  • No
XCMS fillChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Advanced options: nothing modified
--> Peak List
  • Convert retention time (seconds) into minutes: Yes
  • Number of decimal places for mass values reported in ions "identifiers": 5
  • Number of decimal places for retention time reported in ions "identifiers": 2
  • Reported intensity values: into
  • If NA values remain, replace them by 0 in the dataMatrix: Yes

CAMERA.annotate
--> Group co-eluted peaks based on RT [groupFWHM]
  • Multiplier of the standard deviation: 6
  • Percentage of FWHM width: 0.6
--> Annotation general options
  • General ppm error: 5
  • General absolute error in m/z: 0.015
--> Annotate Isotopes [findIsotopes]
  • Max. ion charge: 3
  • Max. number of expected isotopes: 4
  • The percentage number of samples, which must satisfy the C12/C13 rule for isotope annotation: 0.5
--> Mode
  • only groupFWHM and findIsotopes functions
--> Statistics and results export: [diffreport]
  • Two or more conditions
  • Number of the most significantly different analytes to create EICs for: 50000
  • Width (in seconds) of EICs produced: 200
  • Intensity values to be used for the diffreport: into
  • Numerical variable for the height of the eic and boxplots that are printed out: 480
  • Numeric variable for the width of the eic and boxplots print out made: 640
  • Number of decimal places of title m/z values in the eic plot: 4
  • logical indicating whether the reports should be sorted by p-value: No
  • Export the Diffreport files in: Zip
  • Export the EIC and boxplots in: Zip
Export options
  • Convert retention time (seconds) into minutes: No
  • Number of decimal places for mass values reported in ions' identifiers: 4
  • Number of decimal places for retention time values reported in ions' identifiers: 0
  • General used intensity value: into

Expected result
Two output files:
  • one zip containing a table with the area of each feature in the samples
  • one zip with png files of the corresponding ion chromatograms

Metabolite identification is based on the m/z and RT of an in-house chemical library (standards from MSMLS, OAMLS, BACSMLS and FAMLS from IROA Technologies), using Xcalibur, software from ThermoScientific
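In this protocol the matching is done in Xcalibur. Purely to illustrate the principle of an m/z and RT match against a library, here is a minimal sketch. The tolerances (5 ppm, 0.2 min) and the library entries are hypothetical placeholders, not values from the in-house library.

```python
from typing import NamedTuple, Optional


class LibraryEntry(NamedTuple):
    name: str
    mz: float
    rt_min: float


def match_feature(
    mz: float,
    rt_min: float,
    library: list[LibraryEntry],
    ppm_tol: float = 5.0,      # assumed tolerance
    rt_tol_min: float = 0.2,   # assumed tolerance
) -> Optional[LibraryEntry]:
    """Return the library entry within both tolerances with the smallest ppm error."""
    best, best_err = None, None
    for entry in library:
        ppm_err = abs(mz - entry.mz) / entry.mz * 1e6
        if ppm_err <= ppm_tol and abs(rt_min - entry.rt_min) <= rt_tol_min:
            if best is None or ppm_err < best_err:
                best, best_err = entry, ppm_err
    return best


# Placeholder entries: two isobaric standards separated only by RT.
library = [
    LibraryEntry("compound_A", 150.0000, 2.0),
    LibraryEntry("compound_B", 150.0000, 5.0),
]
hit = match_feature(150.0003, 2.1, library)
```

The example shows why RT matters: both placeholder entries share the same m/z, and only the retention time separates them.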

Expected result
3 ".pmd" files (HILIC POS, C18 POS & C18 NEG), each with a list of study-calibrated RT and m/z values for each compound

Data Curation = Signal Normalization & Metabolites Exclusion :
- Normalization of metabolite areas to the total area of detected metabolites
- Calculation of the coefficient of variation (CV) for each metabolite in the QCs (analytical variability) and in the samples (biological variability)
- Removal of metabolites with analytical variability > biological variability
- Removal of metabolites with analytical variability > 30%

Note
Signal Normalization and Metabolites Exclusion are repeated until there is no further exclusion
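The curation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual script: the table layout (one list of areas per metabolite, one flag per injection) is hypothetical, each group needs at least two injections, and every injection must have a non-zero total area.

```python
from statistics import mean, stdev


def cv(values: list[float]) -> float:
    """Coefficient of variation (sample standard deviation / mean)."""
    m = mean(values)
    return stdev(values) / m if m else float("inf")


def normalize_to_total(areas: dict[str, list[float]]) -> dict[str, list[float]]:
    """Divide each area by the summed area of all retained metabolites in that injection."""
    n = len(next(iter(areas.values())))
    totals = [sum(vals[i] for vals in areas.values()) for i in range(n)]
    return {m: [vals[i] / totals[i] for i in range(n)] for m, vals in areas.items()}


def curate(areas: dict[str, list[float]], is_qc: list[bool], max_qc_cv: float = 0.30):
    """Repeat normalization and exclusion until no metabolite is removed."""
    kept = dict(areas)
    while kept:
        norm = normalize_to_total(kept)
        drop = set()
        for name, vals in norm.items():
            cv_qc = cv([v for v, q in zip(vals, is_qc) if q])       # analytical variability
            cv_bio = cv([v for v, q in zip(vals, is_qc) if not q])  # biological variability
            if cv_qc > cv_bio or cv_qc > max_qc_cv:
                drop.add(name)
        if not drop:
            return norm
        kept = {m: v for m, v in kept.items() if m not in drop}
    return {}


# Toy table: 3 QC injections followed by 3 samples (values are made up).
is_qc = [True, True, True, False, False, False]
areas = {
    "a": [100, 100, 100, 50, 100, 150],
    "b": [200, 200, 200, 300, 200, 100],
    "c": [10, 50, 90, 50, 50, 50],  # unstable in QCs
}
curated = curate(areas, is_qc)
```

Normalization is recomputed from the raw areas at every pass, so removing an unstable metabolite changes the totals used for the remaining ones.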

Analysis Validation: the analyses were validated by observing the distribution of QCs among the samples using Principal Component Analysis (PCA). For all PCAs, the data were log-transformed and scaled to unit variance (UV scaling).
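The pre-treatment applied before PCA can be sketched per metabolite as below. The protocol does not state the base of the logarithm; base 10 is assumed here, and all values must be strictly positive. The PCA itself would then be run with any standard implementation.

```python
import math
from statistics import mean, stdev


def log_uv_scale(column: list[float]) -> list[float]:
    """Log10-transform one variable, then center it and scale it to unit variance."""
    logged = [math.log10(v) for v in column]
    m, s = mean(logged), stdev(logged)
    return [(v - m) / s for v in logged]


scaled = log_uv_scale([10.0, 100.0, 1000.0])
```

After this step every metabolite has mean 0 and standard deviation 1, so abundant and scarce metabolites contribute equally to the principal components.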
Fusion of modalities = sorting redundancies:
  • For each metabolite, only the best modality is kept, based on an RT beyond the dead volume (0-1 min) and/or the lower CV of the metabolite in the QCs
  • The "Analysis Validation" step is carried out
  • CHEBI identifiers are assigned to the metabolites referenced in the CHEBI database
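One possible way to encode the modality choice is shown below. The protocol leaves the exact combination of the two criteria open ("and/or"), so the rule used here is an assumption: prefer modalities where the metabolite elutes after the dead volume, then take the lowest QC CV. The values are made up.

```python
def best_modality(candidates: list[dict], dead_time_min: float = 1.0) -> dict:
    """Pick one modality for a metabolite detected in several modalities."""
    retained = [c for c in candidates if c["rt_min"] > dead_time_min]
    pool = retained or candidates  # fall back to all if none elutes after the dead volume
    return min(pool, key=lambda c: c["qc_cv"])


# Made-up example for a single metabolite seen in the three modalities.
candidates = [
    {"modality": "HILIC POS", "rt_min": 0.8, "qc_cv": 0.05},
    {"modality": "C18 POS", "rt_min": 3.2, "qc_cv": 0.12},
    {"modality": "C18 NEG", "rt_min": 3.1, "qc_cv": 0.20},
]
chosen = best_modality(candidates)
```

Here the HILIC POS signal has the lowest CV but elutes in the dead volume, so the C18 POS signal is retained under this rule.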
Software required:


Software
Name: Workflow4Metabolomics
Developer: http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813

Software
Name: CHEBI
Developer: Malik, A., Arsalan, M., Moreno, C., Mosquera, J., Félix, E., Kizilören, T., Muthukrishnan, V., Zdrazil, B., Leach, A. R., and O'

4. UNTARGETED Metabolomics: Data PreProcessing & Data Processing
".raw" files are converted into ".mzML" files with MSconvert, using the filter "PeakPicking : MSlevels 1-2"
mzML files are uploaded on W4M, as a collection

Note
Keep the "mzML" extension in the file name when creating the collection

MSnbase readMSData
xcms get a sampleMetadata file
Download the tsv file and modify it to add some columns:
  • sampleType : pool, blank, sample
  • injectionOrder
  • mode : pos, neg
  • batch
Upload this updated sampleMetadata to W4M
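Editing the sampleMetadata file by hand works for small studies. For larger ones, the four columns can be added with a short script such as the sketch below. It assumes the first column of the downloaded file holds the sample name, and the sample names and annotations shown are hypothetical.

```python
import csv
import io

NEW_COLUMNS = ["sampleType", "injectionOrder", "mode", "batch"]


def add_metadata_columns(tsv_text: str, annotations: dict[str, dict]) -> str:
    """Append the four metadata columns to a tab-separated sampleMetadata table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    original = list(reader.fieldnames)
    fieldnames = original + [c for c in NEW_COLUMNS if c not in original]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        sample = row[original[0]]  # first column assumed to be the sample name
        row.update(annotations[sample])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-line file and annotations.
downloaded = "sample_name\tclass\nQC_01\t.\nS_01\t.\n"
annotations = {
    "QC_01": {"sampleType": "pool", "injectionOrder": 1, "mode": "pos", "batch": "B1"},
    "S_01": {"sampleType": "sample", "injectionOrder": 2, "mode": "pos", "batch": "B1"},
}
updated = add_metadata_columns(downloaded, annotations)
```

The column names match those expected later by the Batch correction step (batch, injectionOrder, sampleType).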
xcms plot chromatogram

Expected result
PDF files displaying the TIC by sampleType and by sample

xcms findChromPeaks (xcmsSet)
--> Spectra Filters
  • Filter on Acquisition Numbers*,# : NA
  • Filter on Retention Time (s)* : 30,600
  • Filter on Mz# : 60,850
--> Extraction method for peaks detection
  • CentWave – chromatographic peak detection using centWave method
  • Max tolerated ppm m/z deviation in consecutive scans in ppm# : 3
  • Min,Max peak width in seconds* : 4,20
--> Advanced Options
  • Signal to Noise ratio cutoff : 3
  • Prefilter step for the first analysis step (ROI detection) £,#: 3,20000
  • Name of the function to calculate the m/z center of the chromatographic peak: mean of the peaks’ m/z values
  • Integration method : peak limits are found through descent on the Mexican hat filtered data
  • Minimum difference in m/z for peaks with overlapping retention times : -0.001
  • Fitgauss : No
  • Noise filter: 10000
  • Verbose Columns : No
  • Get a list of found chromatographic peaks : No
  • List of regions-of-interest (ROI): NA
xcms findChromPeaks Merger
xcms groupChromPeaks (group)
--> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 5
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms adjustRtime (retcor)
--> Method to use for retention time correction
  • PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
  • Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
  • Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
  • Smooth method: loess & Advanced options : NA
xcms groupChromPeaks (group)
--> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 3
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms adjustRtime (retcor)
--> Method to use for retention time correction
  • PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
  • Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
  • Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
  • Smooth method: loess & Advanced options : NA
xcms groupChromPeaks (group)
--> Method to use for grouping
  • PeakDensity – peak grouping based on time dimension peak density
  • Bandwidth(s): 1
  • Minimum fraction of samples: 0.25
  • Minimum number of samples : 5
  • Width of overlapping m/z slices : 0.01
--> Advanced Options
  • Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List
  • Yes
  • Convert retention time (seconds) into minutes : no
  • Number of decimal places for mass values reported in ions “identifiers”: 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
  • If NA values remain replace them by 0 in the dataMatrix: No
xcms fillChromPeaks (fillPeaks)
--> Advanced Options: NA
--> Peak List
  • Convert retention time (seconds) in minutes : Yes
  • Number of decimal places for mass values reported in ions “identifiers” : 5
  • Number of decimal places for retention time reported in ions “identifiers” : 2
  • Reported intensity values : maxo
--> If NA values remain, replace them by 0 in the dataMatrix : No
CAMERA groupFWHM (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> The multiplier of the standard deviation: 6
--> Percentage of the width of the FWHM: 0.6
--> Polarity : Positive or Negative
--> Advanced parameters
  • FALSE
--> Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values : 4
--> Number of digits for RT values : 4
CAMERA findIsotopes (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> Max. ion charge : 2
--> Max. number of expected isotopes: 2
--> ppm error for the search : 5
--> Allowed variance for the search : 0.01
--> Choose intensity values for C12/C13 check. Allowed values are into, maxo, intb: maxo
--> The percentage number of samples, which satisfy the C12/C13 rule for isotope annotation : 0.7
--> Should C12/C13 filter be applied : No
--> Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values (namecustom): 4
--> Number of digits for RT values (namecustom): 4
CAMERA findAdducts (Galaxy Version 0.1.0+camera1.48.0-galaxy1)*
--> General ppm error : 5
--> General absolute error in m/z : 0.015
--> Which polarity mode was used for measuring of the MS sample : negative or positive
--> Use a personal ruleset file : FALSE
--> Choose intensity values : maxo
--> Advanced parameters
  • False
--> Convert seconds to minutes when exporting tsv : Yes
--> Number of digits for MZ values (namecustom): 4
--> Number of digits for RT values (namecustom) : 4
Several "Intensity Check" and "Generic_filter" steps are performed on missing values, on the fold_change between blank and the other sampleType values, and on the third-quartile (Q3) values of ions in pool samples
Then, in the dataMatrix, NA values are replaced by 0
Batch correction (Galaxy Version 3.0.0)
--> Sample metadata file coding parameter
  • Batch column name : batch
  • Injection order column name : injectionOrder
  • Sample type column name: sampleType
  • Set the name used to tag samples as pool in the sample type column : pool
  • Set the name used to tag samples as blank in the sample type column : blank
  • Set the name used to tag samples as real sample in the sample type column : sample
--> Type of regression model
  • Loess
  • Span : not modified
  • Unconsistant values : Prevent it
  • Factor of interest : batch
  • Level of details for plots : basic, standard or complete

Note
This step “normalizes” the signal. However, if the biological samples differ, for example if pellets contain more or fewer cells, an additional “biological” normalization step must be performed.

Quality Metrics
--> Coefficient of Variation
  • Yes
  • Which type of CV calculation should be done : only pool CV
  • Threshold : 0.3
--> Advanced parameters: default
Generic filter on CV(pool) >0.3

Between-table Correlation (Galaxy Version 1.0.0) : supply the dataMatrix twice and indicate that samples are in columns for both tables
Analytical correlation filtration (Galaxy Version 2019-06-20)
--> Correlation threshold : 0.85
--> Do you want to take into account mass differences between 2 ions ?
  • Yes
  • Do you have your own list of mass differences or do you want to use a default list ? No
  • Mass difference range : 0.005
--> Do you want to take into account retention time differences between 2 ions ?
  • Yes
  • Retention time difference threshold : 0.1
  • Which representative ion do you want to select for each group : highest intensity
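The idea behind this filtration can be sketched as follows: ions that are strongly correlated across samples and elute together are treated as one group, and only the most intense ion of each group is kept. This is a simplified illustration, not the W4M implementation; it ignores the mass-difference list, and the grouping strategy (connected components) and the toy values are assumptions.

```python
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two intensity profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def representative_ions(ions: dict[str, dict], corr_min: float = 0.85, rt_tol: float = 0.1) -> list[str]:
    """Group ions that correlate and co-elute, then keep the most intense ion per group."""
    names = list(ions)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            close_rt = abs(ions[a]["rt"] - ions[b]["rt"]) <= rt_tol
            if close_rt and pearson(ions[a]["intensities"], ions[b]["intensities"]) >= corr_min:
                parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(max(g, key=lambda n: mean(ions[n]["intensities"])) for g in groups.values())


# Toy data: M2 tracks M1 at the same RT; M4 tracks M1 but elutes much later.
ions = {
    "M1": {"rt": 2.00, "intensities": [10, 20, 30, 40]},
    "M2": {"rt": 2.05, "intensities": [1, 2, 3, 4]},
    "M3": {"rt": 2.02, "intensities": [40, 10, 30, 20]},
    "M4": {"rt": 5.00, "intensities": [10, 20, 30, 40]},
}
representatives = representative_ions(ions)
```

M2 is removed as redundant with M1, while M3 (uncorrelated) and M4 (different RT) are kept as separate features.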
Generic_filter (Galaxy Version 1.3.0)

Filter on ACorF_groups to remove "0" and "-"

Expected result
A .tsv file where every line corresponds to a unique feature FOR EACH MODALITY

Feature identification is based on m/z and MS spectra, comparing the experimental data with a public spectral database (FragHub public database, available on Zenodo at https://doi.org/10.5281/zenodo.10837522) using FlashEntropy
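The score underlying this search is the spectral entropy similarity. The sketch below implements its unweighted form for two centroided spectra, to show what is being compared. It is not the Flash Entropy Search code: the actual tool applies additional spectrum preprocessing and an indexed search that are not reproduced here, and the 0.02 Da tolerance and example peaks are assumptions.

```python
import math


def _normalize(peaks: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Scale intensities so that they sum to 1."""
    total = sum(i for _, i in peaks)
    return [(mz, i / total) for mz, i in peaks]


def _entropy(intensities: list[float]) -> float:
    """Shannon entropy (natural log) of a normalized intensity distribution."""
    return -sum(p * math.log(p) for p in intensities if p > 0)


def entropy_similarity(query, reference, tol_da: float = 0.02) -> float:
    """Unweighted entropy similarity: 1 - (2*H(mix) - H(query) - H(reference)) / ln(4)."""
    q, r = _normalize(query), _normalize(reference)
    merged = [[mz, i / 2] for mz, i in q]  # 1:1 mixture starts with half-weighted query peaks
    for mz, i in r:
        matches = [k for k in range(len(q)) if abs(q[k][0] - mz) <= tol_da]
        if matches:
            k = min(matches, key=lambda k: abs(q[k][0] - mz))
            merged[k][1] += i / 2  # matched peak: intensities add up
        else:
            merged.append([mz, i / 2])  # unmatched peak stays separate
    h_mix = _entropy([i for _, i in merged])
    h_q = _entropy([i for _, i in q])
    h_r = _entropy([i for _, i in r])
    return 1 - (2 * h_mix - h_q - h_r) / math.log(4)


# Made-up spectra: (m/z, intensity) pairs.
spectrum_a = [(100.0, 50.0), (150.0, 30.0), (200.0, 20.0)]
spectrum_b = [(300.0, 1.0), (400.0, 1.0)]
same = entropy_similarity(spectrum_a, spectrum_a)
different = entropy_similarity(spectrum_a, spectrum_b)
```

Identical spectra score 1 and spectra with no shared peak score 0; the sketch assumes peaks within one spectrum are further apart than the tolerance.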


Expected result
A .tsv file where every line corresponds to a unique feature, some of which have been identified, FOR EACH MODALITY


Safety information
Because features (unidentified ions) and metabolites (features identified based on a public library) are only putatively annotated, it is not possible to merge the 3 files and sort out redundancies between them

Software and References:

Software
Name: MSconvert (ProteoWizard)
Developer: Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J.,

Software
Name: Workflow4Metabolomics
Developer: http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813

Software
Name: FragHub
Developer: https://pubs.acs.org/doi/full/10.1021/acs.analchem.4c02219

Software
Name: Flash Entropy Search
Developer: Yuanyue Li

Acknowledgements
This protocol was developed for a collaboration between MetaboHUB-Tours and Virginija Cvirkaite-Krupovic and Victoria Sevillia, both from the Archaeal Virology Unit, Institut Pasteur, Paris (75), France.