Full Workflow for both Targeted and Untargeted Metabolomics on Extracellular Vesicles - MetaboHUB Tours

Jérémy Monteiro; Staff Members of PMAC

Jun 27, 2026

Version 2

DOI

https://dx.doi.org/10.17504/protocols.io.5qpvoej9zl4o/v2

Jérémy Monteiro^1,2,
Staff Members of PMAC^1,2

¹Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France;
²MetaboHUB-Tours, Tours, France

MetaboHUB-Tours

Protocol Citation: Jérémy Monteiro, Staff Members of PMAC 2026. Full Workflow for both Targeted and Untargeted Metabolomics on Extracellular Vesicles - MetaboHUB Tours. protocols.io https://dx.doi.org/10.17504/protocols.io.5qpvoej9zl4o/v2
Version created by Jérémy Monteiro

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: June 26, 2026

Last Modified: June 27, 2026

Protocol Integer ID: 319891

Keywords: untargeted, features, detection, quality metrics, full workflow for untargeted metabolomic, untargeted metabolomic, unstructured data, extracellular vesicle, spectral database, annotation of feature, identification of compound, identification table as output, annotation, metabohub, present in the data, identification table, data, database, untargeted metabolomics on extracellular vesicle

Funders Acknowledgements:

Agence Nationale de la Recherche

Grant ID: ANR-11-INBS-0010 MetaboHUB

Agence Nationale de la recherche au titre de France 2030

Grant ID: ANR-21-ESRE-0035

Abstract

The purpose of this workflow is to describe how unstructured data is processed using two approaches:
1) detection and annotation of features present in the data
2) identification of compounds in our files using FlashEntropy and a spectral database


Each process will generate an identification table as output, which can be cross-referenced to annotate certain features (note the confidence level of the identification).

Image Attribution

Laurent Galineau (Plateforme Imagerie Préclinique, US-61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France ; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)
Camille Dupuy (Plateforme de Métabolomique et d'Analyses Chimiques, US61 ASB, Université de Tours, CHRU Tours, Inserm, Tours, France; MetaboHUB-Tours, Tours, France; Université de Tours, INSERM, Imaging Brain & Neuropsychiatry iBraiN U1253, 37032, Tours, France)

Guidelines

If needed, all the software used has a GitHub repository or a dedicated website with training datasets and detailed tutorials.

Materials

Software
MSconvert (ProteoWizard)
NAME
Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J.,
DEVELOPER
https://proteowizard.sourceforge.io/publications.html
SOURCE LINK

Software
BatMass: mass spectrometry data visualization
NAME
Dmitry Avtonomov
DEVELOPER
https://batmass.org/
SOURCE LINK

Software
Workflow4Metabolomics
NAME
http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813
DEVELOPER
https://workflow4metabolomics.usegalaxy.fr/
SOURCE LINK

Software
FragHub
NAME
https://pubs.acs.org/doi/full/10.1021/acs.analchem.4c02219
DEVELOPER
https://github.com/eMetaboHUB/FragHub
SOURCE LINK

Software
Flash Entropy Search
NAME
Yuanyue Li
DEVELOPER
GitHub
REPOSITORY
https://github.com/YuanyueLi/FlashEntropySearch
SOURCE LINK

Troubleshooting

Problem

MSconvert can create artefacts and false fragments during conversion

Solution

For raw files from Waters, DataConvert is recommended. Otherwise, be aware of the issue and use the biological context and, when necessary, the raw data and the manufacturer's software to confirm annotations.

Safety warnings

All sample handling must be carried out under a fume hood, equipped with traditional PPE: lab coat, goggles, and gloves.

Ethics statement

No specific ethics statement required.

Before start

You need to know exactly what kind of data you are working with. This workflow may not suit your data, depending on your hardware and software.

Workflow4Metabolomics is actively maintained, so the versions of the tools it uses may change. Older versions remain available. For any questions or support about W4M: community.france-bioinformatique.fr/c/workflow4metabolomics/

1. Extraction of EVs

2h 0m 30s

Add 800 µL of methanol to each dried pellet sample

Agitate vigorously for 00:00:30

30s

Centrifugation
15000 x g, 4°C, 00:15:00  

15m

1- Transfer 700 µL into a first 96-well plate --> for reversed-phase analysis
2- From the first plate, transfer 350 µL into a second plate --> for HILIC analysis
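As a quick check of the split above, the following sketch tracks the extract volumes. It assumes that the 350 µL for HILIC is taken out of the 700 µL already in the first plate, so that each plate ends up with 350 µL; the function name and keys are illustrative only.

```python
def split_extract(added_ul=800.0, to_plate1_ul=700.0, to_plate2_ul=350.0):
    """Return the volume (µL) left in the tube, in plate 1 and in plate 2."""
    if to_plate1_ul > added_ul or to_plate2_ul > to_plate1_ul:
        raise ValueError("cannot transfer more than is available")
    return {
        "left_in_tube": added_ul - to_plate1_ul,
        # Assumption: the second transfer is taken from the first plate.
        "plate1_reversed_phase": to_plate1_ul - to_plate2_ul,
        "plate2_hilic": to_plate2_ul,
    }


volumes = split_extract()
print(volumes)
```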

1- Reversed-Phase Analysis

Evaporate for 00:30:00 under an N2 stream at 40 °C

30m

Resuspend in 100 µL of water/acetonitrile (8:2)

Planar agitation: 00:15:00, strength 7/10

Note
Cover the plate with aluminium foil

15m

Centrifugation
3000 rpm, 4°C, 00:15:00  

15m

Transfer 85 µL of each sample into a new 96-well plate

Pipette and mix 5 µL of each sample to create a QC sample, then divide it into several aliquots for injection quality control

Cover with an injection-friendly foil

2- HILIC Analysis

Evaporate for 00:30:00 under an N2 stream at 40 °C

30m

Resuspend in 100 µL of acetonitrile/water (8:2)

Planar agitation: 00:15:00, strength 7/10

Note
Cover the plate with aluminium foil

15m

Centrifugation
3000 rpm, 4°C, 00:15:00  

Transfer 85 µL of each sample into an insert for a 1.5 mL glass vial

Pipette and mix 5 µL of each sample to create a QC sample, then divide it into several aliquots for injection quality control

Seal with an injection-friendly cap

2. Data Acquisition

Data were acquired on a Q-Exactive. For more details, please refer to the dedicated protocol: https://dx.doi.org/10.17504/protocols.io.eq2lyoq9qgx9/v1

Equipment
Q-Exactive
NAME
Mass Spectrometer
TYPE
ThermoScientific
BRAND

3. TARGETED Metabolomics: Data PreProcessing & Data Processing

".raw" files are converted into ".mzML" files on MSconvert, with the filter "PeakPicking : MSlevels 1-1"

mzML files are uploaded on W4M, as a collection
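The conversion step above is done in the MSconvert interface. For batches of files it can also be scripted; the sketch below only builds a command line for the msconvert executable and does not run it. The "vendor" peak picker, the file names and the output folder are assumptions and are not stated in this protocol.

```python
import shlex


def build_msconvert_command(raw_files, out_dir, ms_levels="1-1"):
    """Build an msconvert command line; nothing is executed here."""
    return [
        "msconvert",
        *raw_files,
        "--mzML",
        # Assumption: vendor peak picking restricted to the given MS levels.
        "--filter", f"peakPicking vendor msLevel={ms_levels}",
        "-o", out_dir,
    ]


# Hypothetical file names, for illustration only.
cmd = build_msconvert_command(["sample_01.raw", "sample_02.raw"], "mzML_out")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a machine where msconvert is installed; ms_levels="1-2" would correspond to a conversion that keeps both MS levels.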

XCMS findChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Spectra Filters
Filter on mz: 60-850
--> Extraction Method for Peak Detection
CentWave - chromatographic peak detection using the centWave method
Max tolerated ppm m/z deviation in consecutive scans in ppm: 10
Min,Max peak width in seconds: 4,40
--> Advanced options
Signal to Noise ratio cutoff: 10
Prefilter step for the first analysis step (ROI detection): 4,50000
Name of the function to calculate the m/z center of the chromatographic peak: Use the m/z at the peak apex
Integration method: Mexican hat
Minimum difference in m/z for peaks with overlapping retention times: 0.008
fitgauss: No
Noise filter: 20000
verbose columns: No
Get a list of found chromatographic peaks: No
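To illustrate the prefilter setting listed above (4,50000): in the xcms documentation, a mass trace is only retained for ROI detection if it contains at least k data points with an intensity of at least I. The sketch below is a minimal illustration of that rule, not the xcms implementation; the traces are made-up values.

```python
def passes_prefilter(intensities, k=4, min_intensity=50_000):
    """True if the mass trace has at least k points with intensity >= min_intensity."""
    return sum(1 for value in intensities if value >= min_intensity) >= k


# Hypothetical mass traces (intensity per scan).
trace_kept = [12_000, 60_000, 75_000, 90_000, 55_000, 8_000]
trace_dropped = [12_000, 60_000, 75_000, 30_000, 8_000]
print(passes_prefilter(trace_kept), passes_prefilter(trace_dropped))
```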

XCMS findChromPeaks merger (Galaxy Version 3.12.0+galaxy3)

XCMS groupChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Method to use for grouping
Peak density
Bandwidth: 5
Minimum fraction of samples: 0.2
Minimum number of samples: 5
Width of overlapping m/z slices: 0.008
--> Advanced Options
Maximum number of groups to identify in a single m/z slices: 50
--> Get the Peak List
No
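The two sample-count settings of the peak density grouping above (minimum fraction 0.2, minimum number 5) act together. As described in the xcms documentation, a feature is kept when, in at least one sample group, the number of samples with a peak reaches both limits. The sketch below is a simplified illustration; the group names and sizes are hypothetical.

```python
def keeps_feature(peaks_per_group, group_sizes, min_fraction=0.2, min_samples=5):
    """True if at least one sample group satisfies both the fraction and the count limit."""
    return any(
        n_with_peak >= min_fraction * group_sizes[group] and n_with_peak >= min_samples
        for group, n_with_peak in peaks_per_group.items()
    )


# Hypothetical design: two groups of 20 injections each.
group_sizes = {"control": 20, "treated": 20}
kept = keeps_feature({"control": 6, "treated": 1}, group_sizes)
dropped = keeps_feature({"control": 4, "treated": 4}, group_sizes)
print(kept, dropped)
```

In the second case each group reaches the 20% fraction (4 of 20) but not the minimum of 5 samples, so the feature is not kept.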

XCMS adjustRtime (Galaxy Version 3.12.0+galaxy3)
--> Method to use for retention time correction
Peakgroups - retention time correction based on alignment of features (peak groups) present in most/all samples
Minimum required fraction of samples in which peaks for the group were identified: 0.5
Maximal number of additional peaks for all samples to be assigned to a peak group for retention time correction: 1
--> Smooth method
loess - non-linear alignment

XCMS groupChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Method to use for grouping
Peak density
Bandwidth: 2
Minimum fraction of samples: 0.2
Minimum number of samples: 5
Width of overlapping m/z slices: 0.008
--> Advanced Options
Maximum number of groups to identify in a single m/z slices: 50
--> Get the Peak List
No

XCMS fillChromPeaks (Galaxy Version 3.12.0+galaxy3)
--> Advanced options: nothing modified
--> Peak List
Convert retention time (seconds) into minutes: Yes
Number of decimal places for mass values reported in ions "identifiers": 5
Number of decimal places for retention time reported in ions "identifiers":2
Reported intensity values: into
If NA values remain, replace them by 0 in the dataMatrix: Yes

CAMERA.annotate
--> Group co-eluted peaks based on RT [groupFWHM]
Multiplier of the standard deviation: 6
Percentage of FWHM width: 0.6
--> Annotation general options
General ppm error: 5
General absolute error in m/z: 0.015
--> Annotate Isotopes [findIsotopes]
Max. ion charge: 3
Max. number of expected isotopes: 4
The percentage number of samples, which must satisfy the C12/C13 rule for isotope annotation: 0.5
--> Mode
only groupFWHM and findIsotopes functions
--> Statistics and results export: [diffreport]
Two or more conditions
Number of the most significantly different analytes to create EICs for: 50000
Width (in seconds) of EICs produced: 200
Intensity values to be used for the diffreport: into
Numerical variable for the height of the eic and boxplots that are printed out: 480
Numeric variable for the width of the eic and boxplots print out made: 640
Number of decimal places of title m/z values in the eic plot: 4
logical indicating whether the reports should be sorted by p-value: No
Export the Diffreport files in: Zip
Export the EIC and boxplots in: Zip
Export options
Convert retention time (seconds) into minutes: No
Number of decimal places for mass values reported in ions' identifiers: 4
Number of decimal places for retention time values reported in ions' identifiers: 0
General used intensity value: into

Expected result
Two output files:
one zip containing a table with the area of each feature in each sample
one zip containing png files of the corresponding ion chromatograms

Metabolite identification is based on the m/z and RT of an in-house chemical library (standards from MSMLS, OAMLS, BACSMLS and FAMLS from IROA Technologies), using Xcalibur software from ThermoScientific

Expected result
3 ".pmd" files (HILIC POS, C18 POS & C18 NEG), each with a list of study-calibrated RT and m/z for each compound

Data Curation = Signal Normalization & Metabolites Exclusion :
         - Normalization of metabolite areas to the total area of detected metabolites
         - Calculation of the coefficient of variation (CV) for each metabolite in the QCs (analytical variability) and in the samples (biological variability)
         - Removal of metabolites with analytical variability > biological variability
         - Removal of metabolites with analytical variability > 30%

Note
Signal Normalization and Metabolites Exclusion are repeated until there is no further exclusion
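The curation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual script: the CV is computed with the sample standard deviation, the data layout (a dictionary of areas per metabolite and per injection) is hypothetical, and the example values are made up.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation (sample standard deviation / mean)."""
    return stdev(values) / mean(values)


def curate(areas, qc_ids, sample_ids, max_qc_cv=0.30):
    """areas: {metabolite: {injection_id: area}}. Returns the retained metabolites."""
    kept = set(areas)
    while True:
        # 1) Normalize each injection to the total area of the metabolites still kept.
        totals = {i: sum(areas[m][i] for m in kept) for i in qc_ids + sample_ids}
        norm = {m: {i: areas[m][i] / totals[i] for i in totals} for m in kept}
        # 2) Compare analytical (QC) and biological (sample) variability.
        excluded = set()
        for m in kept:
            cv_qc = cv([norm[m][i] for i in qc_ids])
            cv_bio = cv([norm[m][i] for i in sample_ids])
            if cv_qc > cv_bio or cv_qc > max_qc_cv:
                excluded.add(m)
        # 3) Stop when a full pass excludes nothing; otherwise normalize again.
        if not excluded:
            return sorted(kept)
        kept -= excluded


# Made-up areas for three QC injections and three samples.
qc_ids, sample_ids = ["QC1", "QC2", "QC3"], ["S1", "S2", "S3"]
areas = {
    "met_A": dict(zip(qc_ids + sample_ids, [100, 100, 100, 60, 100, 140])),
    "met_B": dict(zip(qc_ids + sample_ids, [100, 100, 100, 140, 100, 60])),
    "met_noisy": dict(zip(qc_ids + sample_ids, [5, 20, 35, 20, 20, 20])),
}
retained = curate(areas, qc_ids, sample_ids)
print(retained)
```

Because removing a metabolite changes the total area, the normalization is recomputed after every exclusion pass, which is why the two steps are iterated.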

Analysis Validation: the analyses were validated by observing the distribution of QCs among the samples using Principal Component Analysis (PCA). For all PCAs, the data were log-transformed and underwent UV scaling normalization.
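For readers who want to reproduce the pre-treatment applied before PCA outside SIMCA, the sketch below log-transforms a data table and applies UV (unit variance) scaling, that is, mean-centering each variable and dividing it by its standard deviation. The base of the logarithm is not stated in this protocol; log10 is an assumption, as is the example table.

```python
from math import log10
from statistics import mean, stdev


def log_uv_scale(matrix):
    """matrix: rows = samples, columns = metabolites, strictly positive values."""
    logged = [[log10(v) for v in row] for row in matrix]
    columns = list(zip(*logged))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    # Center each column on 0 and scale it to unit variance.
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in logged]


# Made-up normalized areas: 3 samples x 2 metabolites.
scaled = log_uv_scale([[10, 1000], [100, 100], [1000, 10]])
print(scaled)
```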

Fusion of modalities = sorting redundancies:
Only the best modality is kept for each metabolite, based on RT > dead volume (0-1 min) and/or the lower CV of the metabolite in the QCs
"Analysis Validation" is then carried out
CHEBI identifiers are assigned to the metabolites referenced in the CHEBI database
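The choice of the best modality for a metabolite detected in several files can be sketched as below. The order in which the two criteria are applied (first discard modalities eluting in the dead volume, then take the lowest QC CV) and the fallback when every modality elutes in the dead volume are assumptions; the protocol only states "and/or". The example values are hypothetical.

```python
def best_modality(candidates, dead_time_min=1.0):
    """candidates: list of dicts with keys 'modality', 'rt_min', 'qc_cv' for ONE metabolite."""
    retained = [c for c in candidates if c["rt_min"] > dead_time_min]
    # Assumption: if every modality elutes in the dead volume, compare them all.
    pool = retained or candidates
    return min(pool, key=lambda c: c["qc_cv"])


# Hypothetical metabolite detected in the three modalities.
candidates = [
    {"modality": "HILIC POS", "rt_min": 5.2, "qc_cv": 0.12},
    {"modality": "C18 POS", "rt_min": 0.8, "qc_cv": 0.05},
    {"modality": "C18 NEG", "rt_min": 1.4, "qc_cv": 0.20},
]
choice = best_modality(candidates)
print(choice["modality"])
```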

Software required:

Software
SIMCA
NAME
Sartorius
DEVELOPER
https://www.sartorius.com/en/products/process-analytical-technology/data-analytics-software/mvda-software/simca
SOURCE LINK

Software
Workflow4Metabolomics
NAME
http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813
DEVELOPER
https://workflow4metabolomics.usegalaxy.fr/
SOURCE LINK

Software
CHEBI
NAME
Malik, A., Arsalan, M., Moreno, C., Mosquera, J., Félix, E., Kizilören, T., Muthukrishnan, V., Zdrazil, B., Leach, A. R., and O'
DEVELOPER
https://www.ebi.ac.uk/chebi/about
SOURCE LINK

4. UNTARGETED Metabolomics: Data PreProcessing & Data Processing

".raw" files are converted into ".mzML" files on MSconvert, with the filter "PeakPicking : MSlevels 1-2"

mzML files are uploaded on W4M, as a collection

Note
Keep the "mzML" extension in the file names when creating the collection

MSnbase readMSData

xcms get a sampleMetadata file
Download the tsv file and modify it to add the following columns:
sampleType : pool, blank, sample
injectionOrder
mode : pos, neg
batch
Upload the updated sampleMetadata file to W4M
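The sampleMetadata file can be edited in a spreadsheet, or with a short script such as the sketch below, which appends the four columns listed above to a tab-separated table. The header of the input file (sample_name, class) and the sample names are assumptions used for illustration; check the file actually produced by W4M.

```python
import csv
import io


def add_metadata_columns(tsv_text, annotations):
    """Append sampleType, injectionOrder, mode and batch columns to a sampleMetadata TSV."""
    extra = ["sampleType", "injectionOrder", "mode", "batch"]
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    sample_column = reader.fieldnames[0]  # assumption: first column holds the sample name
    # Do not duplicate a column that the file already contains.
    fieldnames = list(reader.fieldnames) + [c for c in extra if c not in reader.fieldnames]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row.update(annotations[row[sample_column]])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical input file and annotations.
original = "sample_name\tclass\nQC_01\t.\nS_01\t.\n"
annotations = {
    "QC_01": {"sampleType": "pool", "injectionOrder": 1, "mode": "pos", "batch": 1},
    "S_01": {"sampleType": "sample", "injectionOrder": 2, "mode": "pos", "batch": 1},
}
updated = add_metadata_columns(original, annotations)
print(updated)
```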

xcms plot chromatogram

Expected result
pdf files with TIC by sampleType and by sample displayed

xcms findChromPeaks (xcmsSet)
--> Spectra Filters
Filter on Acquisition Numbers*,# : NA
Filter on Retention Time (s)* : 30,600
Filter on Mz# : 60,850
--> Extraction method for peaks detection
CentWave – chromatographic peak detection using centWave method
Max tolerated ppm m/z deviation in consecutive scans in ppm# : 3
Min,Max peak width in seconds* : 4,20
--> Advanced Options
Signal to Noise ratio cutoff : 3
Prefilter step for the first analysis step (ROI detection) £,#: 3,20000
Name of the function to calculate the m/z center of the chromatographic peak: mean of the peaks’ m/z values
Integration method : peak limits are found through descent on the Mexican hat filtered data
Minimum difference in m/z for peaks with overlapping retention times : -0.001
Fitgauss : No
Noise filter: 10000
Verbose Columns : No
Get a list of found chromatographic peaks : No
List of regions-of-interest (ROI): NA

xcms findChromPeaks Merger

xcms groupChromPeaks (group)
--> Method to use for grouping
PeakDensity – peak grouping based on time dimension peak density
Bandwidth(s): 5
Minimum fraction of samples: 0.25
Minimum number of samples : 5
Width of overlapping m/z slices : 0.01
--> Advanced Options
Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List 
Yes
Convert retention time (seconds) into minutes : no
Number of decimal places for mass values reported in ions “identifiers”: 5
Number of decimal places for retention time reported in ions “identifiers” : 2
Reported intensity values : maxo
If NA values remain replace them by 0 in the dataMatrix: No

xcms adjustRtime (retcor)
--> Method to use for retention time correction
PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
Smooth method: loess & Advanced options : NA

xcms groupChromPeaks (group)
--> Method to use for grouping
PeakDensity – peak grouping based on time dimension peak density
Bandwidth(s): 3
Minimum fraction of samples: 0.25
Minimum number of samples : 5
Width of overlapping m/z slices : 0.01
--> Advanced Options
Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List 
Yes
Convert retention time (seconds) into minutes : no
Number of decimal places for mass values reported in ions “identifiers”: 5
Number of decimal places for retention time reported in ions “identifiers” : 2
Reported intensity values : maxo
If NA values remain replace them by 0 in the dataMatrix: No

xcms adjustRtime (retcor)
--> Method to use for retention time correction
PeakGroups – retention time correction based on alignment of features (peak groups) present in most/all samples
Minimum required fraction of samples in which peaks for the peak group were identified: 0.85-0.9
Maximum number of additional peaks for all samples to be assigned to a peak group for retention time correction : 1
Smooth method: loess & Advanced options : NA

xcms groupChromPeaks (group)
--> Method to use for grouping
PeakDensity – peak grouping based on time dimension peak density
Bandwidth(s): 1
Minimum fraction of samples: 0.25
Minimum number of samples : 5
Width of overlapping m/z slices : 0.01
--> Advanced Options
Maximum number of groups to identify in a single m/z slice : 50
--> Get the Peak List 
Yes
Convert retention time (seconds) into minutes : no
Number of decimal places for mass values reported in ions “identifiers”: 5
Number of decimal places for retention time reported in ions “identifiers” : 2
Reported intensity values : maxo
If NA values remain replace them by 0 in the dataMatrix: No

xcms fillChromPeaks (fillPeaks)
--> Advanced Options: NA
--> Peak List
Convert retention time (seconds) in minutes : Yes
Number of decimal places for mass values reported in ions “identifiers” : 5
Number of decimal places for retention time reported in ions “identifiers” : 2 
Reported intensity values : maxo
--> If NA values remain, replace them by 0 in the dataMatrix : No

CAMERA groupFWHM (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> The multiplier of the standard deviation: 6
--> Percentage of the width of the FWHM: 0.6
--> Polarity : Positive or Negative
--> Advanced parameters
FALSE
--> Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values : 4
--> Number of digits for RT values : 4

CAMERA findIsotopes (Galaxy Version 0.1.0+camera1.48.0-galaxy1)
--> Max. ion charge : 2
--> Max. number of expected isotopes: 2
--> ppm error for the search : 5
--> Allowed variance for the search : 0.01
--> Choose intensity values for C12/C13 check. Allowed values are into, maxo, intb: maxo
--> The percentage number of samples, which satisfy the C12/C13 rule for isotope annotation : 0.7
--> Should C12/C13 filter be applied : No
-->Convert seconds to minutes when exporting tsv : No
--> Number of digits for MZ values (namecustom): 4
--> Number of digits for RT values (namecustom): 4

CAMERA findAdducts (Galaxy Version 0.1.0+camera1.48.0-galaxy1)*
--> General ppm error : 5
--> General absolute error in m/z : 0.015
--> Which polarity mode was used for measuring of the MS sample : negative or positive
--> Use a personal ruleset file : FALSE
--> Choose intensity values : maxo
--> Advanced parameters
False
-->Convert seconds to minutes when exporting tsv : Yes
-->Number of digits for MZ values (namecustom): 4
-->Number of digits for RT values (namecustom) : 4

Several "Intensity Check" and "Generic_filter" steps are applied to missing values, to the fold change between blanks and the other sampleTypes, and to the Q3 quartile values of ions in pool samples

Then, in dataMatrix, NA are replaced by 0

Batch correction (Galaxy Version 3.0.0)
--> Sample metadata file coding parameter
Batch column name : batch
Injection order column name : injectionOrder
Sample type column name: sampleType
Set the name used to tag samples as pool in the sample type column : pool
Set the name used to tag samples as blank in the sample type column : blank
Set the name used to tag samples as real sample in the sample type column : sample
--> Type of regression model
Loess
Span : not modified
Unconsistant values : Prevent it
 Factor of interest : batch
Level of details for plots : basic, standard or complete

Note
This step “normalizes” the signal. However, if the biological samples differ, for example if the pellets contain more or fewer cells, an additional “biological” normalization step must be performed.

Quality Metrics
--> Coefficient of Variation
Yes
Which type of CV calculation should be done : only pool CV
Threshold : 0.3
--> Advanced parameters: default

Generic filter on CV(pool) >0.3
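The CV(pool) filter above removes features whose coefficient of variation in the pool injections exceeds 0.3. The sketch below illustrates the rule on a small table; it is not the W4M tool itself, the use of the sample standard deviation is an assumption, and the feature and injection names are hypothetical.

```python
from statistics import mean, stdev


def pool_cv_filter(data_matrix, pool_columns, threshold=0.3):
    """data_matrix: {feature: {injection: intensity}}. Keep features with CV(pool) <= threshold."""
    kept = {}
    for feature, intensities in data_matrix.items():
        pool_values = [intensities[name] for name in pool_columns]
        cv_pool = stdev(pool_values) / mean(pool_values)
        if cv_pool <= threshold:
            kept[feature] = intensities
    return kept


# Hypothetical dataMatrix restricted to three pool injections.
data_matrix = {
    "M100T60": {"pool_1": 100, "pool_2": 110, "pool_3": 90},
    "M200T120": {"pool_1": 100, "pool_2": 300, "pool_3": 20},
}
filtered = pool_cv_filter(data_matrix, ["pool_1", "pool_2", "pool_3"])
print(sorted(filtered))
```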

Between-table Correlation (Galaxy Version 1.0.0) : supply the dataMatrix twice and indicate that samples are in columns for both tables

Analytical correlation filtration (Galaxy Version 2019-06-20)
--> Correlation threshold : 0.85
--> Do you want to take into account mass differences between 2 ions ?
Yes
Do you have your own list of mass differences or do you want to use a default list ? No
Mass difference range : 0.005
--> Do you want to take into account retention time differences between 2 ions ?
Yes
Retention time difference threshold : 0.1
Which representative ion do you want to select for each group : highest intensity
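The idea behind the correlation filtration settings above can be sketched as follows: ions whose intensity profiles are strongly correlated (at least 0.85) and whose retention times are close (difference of at most 0.1) are grouped, and the most intense ion of each group is kept as representative. This is a simplified illustration, not the W4M tool: the mass difference list is ignored, groups are built by simple linkage, and the ion names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long intensity profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def representative_ions(ions, min_corr=0.85, max_rt_diff=0.1):
    """ions: {name: {'rt': float, 'intensities': [...]}}. Returns one ion name per group."""
    names = list(ions)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            close_rt = abs(ions[a]["rt"] - ions[b]["rt"]) <= max_rt_diff
            if close_rt and pearson(ions[a]["intensities"], ions[b]["intensities"]) >= min_corr:
                parent[find(b)] = find(a)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Keep the ion with the highest summed intensity in each group.
    return sorted(max(g, key=lambda n: sum(ions[n]["intensities"])) for g in groups.values())


# Hypothetical ions: the first two co-elute and are perfectly correlated.
ions = {
    "M181T300": {"rt": 5.00, "intensities": [100, 200, 300, 400]},
    "M182T300": {"rt": 5.01, "intensities": [10, 20, 30, 40]},
    "M250T420": {"rt": 7.00, "intensities": [50, 40, 60, 30]},
}
representatives = representative_ions(ions)
print(representatives)
```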

Generic_filter (Galaxy Version 1.3.0)

Filter on ACorF_groups to remove "0" and "-"

Expected result
A .tsv file in which every line corresponds to a unique feature, FOR EACH MODALITY

Feature identification is based on matching m/z and MS spectra between the experimental data and a public spectral database (FragHub public database, available on Zenodo at https://doi.org/10.5281/zenodo.10837522), using FlashEntropy
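To give an idea of the score behind this spectral matching, the sketch below computes an unweighted spectral entropy similarity between two spectra, 1 - (2*S_AB - S_A - S_B) / ln(4), where S is the Shannon entropy of the normalized intensities and AB is the 1:1 merged spectrum. It is a simplified illustration and not the Flash Entropy Search implementation: peaks are matched by rounding m/z instead of using a tolerance window, and no intensity weighting or noise removal is applied. The spectra are made up.

```python
from math import log


def _entropy(intensities):
    """Shannon entropy of a set of normalized intensities."""
    return -sum(p * log(p) for p in intensities if p > 0)


def entropy_similarity(spec_a, spec_b, decimals=2):
    """spec_*: list of (mz, intensity). Peaks are matched by rounding m/z (a simplification)."""
    def normalize(spec):
        total = sum(i for _, i in spec)
        binned = {}
        for mz, i in spec:
            key = round(mz, decimals)
            binned[key] = binned.get(key, 0.0) + i / total
        return binned

    a, b = normalize(spec_a), normalize(spec_b)
    # 1:1 mixture of the two normalized spectra.
    merged = {mz: 0.5 * (a.get(mz, 0.0) + b.get(mz, 0.0)) for mz in set(a) | set(b)}
    s_a, s_b, s_ab = _entropy(a.values()), _entropy(b.values()), _entropy(merged.values())
    return 1.0 - (2.0 * s_ab - s_a - s_b) / log(4.0)


# Made-up fragment spectra.
query = [(69.03, 10.0), (95.05, 40.0), (123.04, 50.0)]
unrelated = [(55.02, 30.0), (81.07, 70.0)]
same_score = entropy_similarity(query, query)
different_score = entropy_similarity(query, unrelated)
print(same_score, different_score)
```

Identical spectra score close to 1 and spectra without any shared peak score close to 0; real searches also restrict candidates by precursor m/z, which this sketch leaves out.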


Expected result
A .tsv file in which every line corresponds to a unique feature, some of which have been identified, FOR EACH MODALITY

Safety information
As features (unidentified ions) and metabolites (features identified based on public library) are putatively annotated, it is not possible to merge and sort redundancies between the 3 files

Software and references:

Software
MSconvert (ProteoWizard)
NAME
Chambers, M.C., MacLean, B., Burke, R., Amode, D., Ruderman, D.L., Neumann, S., Gatto, L., Fischer, B., Pratt, B., Egertson, J.,
DEVELOPER
https://proteowizard.sourceforge.io/publications.html
SOURCE LINK



Software
Workflow4Metabolomics
NAME
http://bioinformatics.oxfordjournals.org/content/early/2014/12/18/bioinformatics.btu813
DEVELOPER
https://workflow4metabolomics.usegalaxy.fr/
SOURCE LINK


Software
FragHub
NAME
https://pubs.acs.org/doi/full/10.1021/acs.analchem.4c02219
DEVELOPER
https://github.com/eMetaboHUB/FragHub
SOURCE LINK


Software
Flash Entropy Search
NAME
Yuanyue Li
DEVELOPER
GitHub
REPOSITORY
https://github.com/YuanyueLi/FlashEntropySearch
SOURCE LINK

Acknowledgements

This protocol was developed for a collaboration between MetaboHUB-Tours and Virginija Cvirkaite-Krupovic and Victoria Sevillia, both from the Archaeal Virology Unit, Institut Pasteur, Paris (75), France.