License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: May 04, 2026
Last Modified: May 05, 2026
Protocol Integer ID: 316293
Keywords: speech biomarkers in huntington, speech digital biomarkers from video, speech digital biomarker, speech biomarker, speech feature extraction from audio recording, facial feature extraction from video recording, digital biomarker analysis, speech feature extraction, reproducibility of digital biomarker analysis, facial feature extraction, feature screening within the training data, disease huntington, speech impairment, facial feature, feature screening, training data, group comparison between huntington, audio recordings of individual, facial dyskinesia, multimodal machine, transparent computational framework for future validation study, future validation study, huntington, audio recording, digital assessment protocol, video recording, learning classification, behavioral feature, associated behavioral feature, based digital assessment protocol
Funders Acknowledgements:
Yue Huang
Grant ID: NSFC, T2488101, 82071417, Y.H.
Abstract
Huntington’s disease is characterized by motor, cognitive, and behavioral abnormalities, among which facial dyskinesia and speech impairment are clinically prominent but difficult to quantify objectively in routine assessment. This protocol describes a standardized Python-based workflow for extracting, processing, statistically analyzing, and modeling facial and speech digital biomarkers from video and audio recordings of individuals with Huntington’s disease and healthy controls.
The workflow includes facial feature extraction from video recordings, speech feature extraction from audio recordings, group comparison between Huntington’s disease and healthy controls, receiver operating characteristic analysis, and multimodal machine-learning classification. Facial features are extracted from predefined facial regions and movement-related metrics, while speech features cover domains including fundamental frequency, loudness regulation, phonation instability, and temporal dynamics. Statistical analyses are performed to identify disease-associated behavioral features, and machine-learning models are trained using a leakage-controlled framework with stratified train-test splitting, feature screening within the training data, and independent test-set evaluation.
This protocol is intended to improve reproducibility of digital biomarker analysis in Huntington’s disease and to provide a transparent computational framework for future validation studies.
Guidelines
This protocol describes a computational workflow for extracting and analyzing facial and speech digital biomarkers from video and audio recordings of human participants. Because the protocol involves the collection, processing, storage, and analysis of identifiable or potentially identifiable human video and audio data, all users must ensure that the study has received prior approval from their Institutional Ethics Board, Institutional Review Board, or equivalent ethics committee before any participant data are collected.
All participants should provide written informed consent before video or audio recording. The consent process should clearly describe the purpose of the study, the types of recordings collected, how the data will be processed, how privacy and confidentiality will be protected, whether data will be shared, and whether de-identified data may be used for future research.
Users should follow all applicable institutional, national, and international regulations concerning human participant research, data protection, privacy, and secure handling of sensitive health-related information. Raw video, audio, clinical, and neuroimaging data should be stored securely, access should be restricted to authorized research personnel, and identifiers should be removed or coded whenever possible before analysis.
This protocol is intended for research use only. The extracted digital biomarkers and machine-learning outputs should not be used as stand-alone diagnostic tools or as replacements for clinical assessment by qualified healthcare professionals.
Safety warnings
This protocol involves video and audio recordings of human participants. Such recordings may be identifiable even after removal of names or study IDs. Prior approval from an Institutional Ethics Board, Institutional Review Board, or equivalent ethics committee is required before collecting or analyzing participant recordings.
Do not collect, upload, share, or publicly release raw video, raw audio, clinical records, or neuroimaging data unless this is explicitly permitted by the approved ethics protocol and participant consent. De-identification of video and audio data may be incomplete because facial appearance and voice can reveal participant identity.
Store all raw and processed data in secure, access-controlled locations. Avoid using personal computers, unsecured cloud storage, or public repositories for identifiable participant data unless approved by the responsible ethics committee and institution.
This protocol is intended for research purposes only. The extracted facial and speech features, ROC results, and machine-learning predictions should not be used for clinical diagnosis, treatment decisions, or patient management without appropriate clinical validation, regulatory approval, and professional oversight.
Extra caution is required when working with participants who may have cognitive impairment, motor disability, psychiatric symptoms, or reduced capacity to consent. In such cases, follow local ethical requirements for consent, assent, and involvement of legally authorized representatives.
Before start
Before starting this protocol, obtain approval from the relevant Institutional Ethics Board, Institutional Review Board, or equivalent ethics committee for the collection and analysis of human video and audio recordings. Confirm that the approved study protocol specifically covers facial video recording, speech audio recording, clinical data collection, data storage, data processing, and any planned data sharing.
Before collecting data, obtain written informed consent from all participants or their legally authorized representatives. Participants should be informed that video and audio recordings may contain identifiable personal information and that these data will be used for computational analysis of facial and speech features.
Prepare a secure data management plan before recording begins. This should include participant ID coding, file naming rules, secure storage locations, access permissions, backup procedures, and procedures for handling withdrawal of consent. Ensure that all study staff are trained in human participant research ethics, privacy protection, and safe handling of sensitive data.
Confirm that the recording environment, camera, microphone, and analysis scripts are ready before participant assessment. Test the full workflow using non-participant or pilot data before applying it to study participants.
Section 1. Project setup and file organization
Step 1. Create the project folder
Create a main project folder for the digital biomarker analysis.
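A minimal sketch of one possible folder layout, assuming the project root is named HD_digital_biomarker_analysis (the name used in the cd commands later in this protocol) and contains the subfolders referenced in later steps. The function name create_project_tree is hypothetical and not part of the protocol's scripts.

```python
from pathlib import Path

# Subfolders referenced elsewhere in this protocol.
SUBFOLDERS = [
    "scripts",
    "raw_video",
    "raw_audio",
    "intermediate_results",
    "figures/ROC_curves",
]


def create_project_tree(root):
    """Create the project root and its subfolders; existing folders are kept."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```

Calling create_project_tree("HD_digital_biomarker_analysis") from the parent directory would create the layout without overwriting anything that already exists.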
Step 2. Prepare the analysis scripts
Place all Python scripts into the scripts/ folder.
Step 3. Prepare the integrated feature table
Place the integrated spreadsheet containing facial and speech features into the project folder.
Section 2. Software environment preparation
Step 4. Install Python
Install Python version 3.8 or later.
Step 5. Install required Python packages
Install the required packages for data processing, statistics, machine learning, and visualization.
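The protocol does not list the exact packages, so the sketch below only checks whether a set of assumed import names can be found in the current environment. The list is a guess based on the analyses described later (XGBoost and LightGBM are named in Step 24; the others are assumptions) and should be adapted to the actual scripts.

```python
import importlib.util

# Import names are assumptions; the protocol does not list specific packages.
ASSUMED_PACKAGES = [
    "numpy", "pandas", "scipy", "statsmodels", "sklearn",
    "xgboost", "lightgbm", "matplotlib", "openpyxl",
]


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing packages:", missing_packages(ASSUMED_PACKAGES))
```

Note that the import name can differ from the name used for installation; for example, scikit-learn is imported as sklearn.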
Section 3. Facial feature extraction
Step 6. Prepare video recordings
Place all participant video files into the raw_video/ folder.
Step 7. Run facial feature extraction
Run the facial feature extraction script.
Step 8. Check facial feature quality
Inspect the extracted facial feature table.
Sub-step 8.1
Confirm that participant IDs are correctly extracted.
Sub-step 8.2
Check missing values for each facial feature.
Sub-step 8.3
Identify extreme outliers that may result from failed landmark detection.
Sub-step 8.4
Document excluded participants and exclusion reasons.
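The quality checks in Sub-steps 8.2 and 8.3 could be sketched as follows. The protocol does not define what counts as an extreme outlier; the interquartile-range rule with a multiplier of 3 used here is an assumption, as are the function names.

```python
import pandas as pd


def missing_per_feature(features: pd.DataFrame) -> pd.Series:
    """Count missing values in each feature column."""
    return features.isna().sum()


def flag_extreme_outliers(features: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Flag values more than k * IQR below Q1 or above Q3 (k = 3 is an assumption)."""
    q1 = features.quantile(0.25)
    q3 = features.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Missing values compare as False, so they are never flagged as outliers.
    return features.lt(lower, axis="columns") | features.gt(upper, axis="columns")
```

Flagged values are candidates for manual review of the corresponding video, not automatic exclusions; the table passed in should contain numeric feature columns only.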
Section 4. Speech feature extraction
Step 9. Prepare audio recordings
Place all participant audio files into the raw_audio/ folder.
Sub-step 9.1
Ensure that each audio file is named using the participant ID.
Example:
sub001_audio.wav
sub002_audio.wav
sub003_audio.wav
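A small sketch of how a participant ID might be recovered from such file names. The pattern is inferred from the three example names above and is an assumption; the actual extraction script may parse names differently.

```python
import re
from pathlib import Path

# Pattern inferred from the example names above (e.g. sub001_audio.wav).
AUDIO_NAME = re.compile(r"^(sub\d+)_audio\.wav$")


def participant_id(filename):
    """Return the participant ID, or None if the name does not match the pattern."""
    match = AUDIO_NAME.match(Path(filename).name)
    return match.group(1) if match else None
```

Files for which participant_id returns None can be listed and renamed before feature extraction is run.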
Sub-step 9.2
Use a consistent audio format when possible.
Recommended format:
.wav
Sub-step 9.3
Exclude audio files with severe background noise, incomplete recording, or failed segmentation.
Step 10. Run speech feature extraction
Run the speech feature extraction script.
Sub-step 10.1
Open the terminal and enter the project folder.
cd HD_digital_biomarker_analysis
Sub-step 10.2
Run:
python "scripts/speech feature extraction.py"
Sub-step 10.3
Save the extracted speech features into the intermediate_results/ folder.
Recommended output file name:
speech_features.csv
Step 11. Check speech feature quality
Inspect the extracted speech feature table.
Sub-step 11.1
Confirm that participant IDs are correctly extracted.
Sub-step 11.2
Check the number of successfully processed audio files.
Sub-step 11.3
Check missing values for each speech feature.
Sub-step 11.4
Inspect the table for biologically implausible acoustic values.
Sub-step 11.5
Document excluded recordings and exclusion reasons.
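The plausibility check in Sub-step 11.4 could be sketched as a simple range test. The column name f0_mean_hz and the bounds below are placeholders chosen for illustration only; they are not taken from the protocol and should be replaced with the actual feature names and limits agreed for the study.

```python
import pandas as pd

# Illustrative bounds only; column names and limits are placeholders.
PLAUSIBLE_RANGES = {"f0_mean_hz": (50.0, 600.0)}


def flag_implausible(features, ranges):
    """Return a boolean table: True where a non-missing value lies outside its range."""
    return pd.DataFrame({
        column: features[column].notna() & ~features[column].between(low, high)
        for column, (low, high) in ranges.items()
    })
```

Missing values are deliberately not flagged here, since they are counted separately in Sub-step 11.3.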
Section 5. Data integration
Step 12. Merge facial, speech, demographic, and clinical data
Merge the facial feature table, the speech feature table, and the demographic and clinical information by participant ID.
Sub-step 12.1
Use participant ID as the key variable.
Sub-step 12.2
Confirm that diagnostic group labels are consistent.
Example group labels:
HD
HC
Sub-step 12.3
Confirm that age and sex are complete for all included participants.
Sub-step 12.4
Save the merged table as:
final_merged_feature_table.xlsx
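The merge in Step 12 might look like the sketch below. The key column name participant_id and the use of an inner join are assumptions; the protocol only states that participant ID is the key variable.

```python
import pandas as pd


def merge_tables(facial, speech, clinical, key="participant_id"):
    """Inner-join the three tables on the participant ID column."""
    # validate="one_to_one" raises an error if an ID is duplicated in either table.
    merged = facial.merge(speech, on=key, how="inner", validate="one_to_one")
    merged = merged.merge(clinical, on=key, how="inner", validate="one_to_one")
    return merged


def unmatched_ids(merged, *tables, key="participant_id"):
    """IDs present in at least one input table but absent from the merged table."""
    all_ids = set().union(*(set(table[key]) for table in tables))
    return sorted(all_ids - set(merged[key]))
```

The merged table could then be written with merged.to_excel("final_merged_feature_table.xlsx", index=False), which requires an Excel writer such as openpyxl to be installed. Reviewing the output of unmatched_ids helps document which participants were dropped by the join.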
Step 13. Define feature categories
Classify features into facial and speech domains.
Sub-step 13.1
Facial features may include facial movement variability, regional facial movement metrics, and blink-related features.
Sub-step 13.2
Speech features may include fundamental frequency, loudness regulation, phonation instability, and temporal features.
Sub-step 13.3
Save the feature list for reproducibility.
Recommended output file name:
feature_category_list.xlsx
Section 6. Group comparison between HD and HC
Step 14. Run group comparison analysis
Run the statistical comparison script.
Sub-step 14.1
Open the terminal and enter the project folder.
cd HD_digital_biomarker_analysis
Sub-step 14.2
Run:
python "scripts/HC and HD comparism.py"
Sub-step 14.3
Compare each facial and speech feature between the HD and HC groups.
Sub-step 14.4
Use age and sex as covariates when appropriate.
Recommended model:
Feature ~ Group + Age + Sex
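One way to fit the recommended model is an ordinary least squares regression with the statsmodels formula interface. This is a sketch under assumptions: the protocol does not state which model family its script uses, and the column names Group, Age, and Sex and the function name group_effect are hypothetical.

```python
import statsmodels.formula.api as smf


def group_effect(df, feature):
    """Fit feature ~ Group + Age + Sex by OLS; return (coefficient, p value) for HD vs HC."""
    # Q("...") lets the formula handle feature names containing spaces or symbols.
    formula = f'Q("{feature}") ~ C(Group, Treatment(reference="HC")) + Age + C(Sex)'
    fit = smf.ols(formula, data=df).fit()
    # The group term is the coefficient for HD relative to the HC reference level.
    term = [name for name in fit.params.index if name.endswith("[T.HD]")][0]
    return fit.params[term], fit.pvalues[term]
```

Looping this function over all facial and speech feature columns yields the raw p values that are corrected in Step 15.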
Step 15. Correct for multiple comparisons
Apply false discovery rate correction to feature-level statistical tests.
Sub-step 15.1
Collect raw p values from all tested facial and speech features.
Sub-step 15.2
Apply Benjamini-Hochberg FDR correction.
Sub-step 15.3
Report both raw p values and FDR-adjusted p values.
Sub-step 15.4
Save the statistical results as:
group_comparison_results.xlsx
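The Benjamini-Hochberg adjustment in Sub-step 15.2 can be written in a few lines of NumPy. This is a generic sketch of the standard procedure, not the code from the protocol's script; the function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg FDR-adjusted p values in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted
```

The adjustment should be applied once to the full set of raw p values from all tested features, so that the number of tests reflects the whole feature set.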
Step 16. Generate group comparison plots
Generate visualizations for disease-associated features.
Sub-step 16.1
Plot significant facial features.
Sub-step 16.2
Plot significant speech features.
Sub-step 16.3
Use boxplots, violin plots, or dot plots to show group differences.
Sub-step 16.4
Save figures into the figures/ folder.
Section 7. ROC analysis
Step 17. Run ROC analysis
Run the ROC analysis script.
Sub-step 17.1
Open the terminal and enter the project folder.
cd HD_digital_biomarker_analysis
Sub-step 17.2
Run:
python "scripts/ROC calculation.py"
Sub-step 17.3
Calculate ROC curves for individual facial and speech features.
Sub-step 17.4
Calculate the area under the curve for each feature.
Step 18. Summarize ROC performance
Summarize the discriminatory performance of individual digital biomarkers.
Sub-step 18.1
For each feature, report:
AUC
sensitivity
specificity
optimal cutoff, if applicable
Sub-step 18.2
Rank features according to AUC.
Sub-step 18.3
Save the ROC result table as:
ROC_results.xlsx
Sub-step 18.4
Save ROC curve figures into:
figures/ROC_curves/
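The per-feature summary in Step 18 could be computed with scikit-learn as sketched below. The protocol does not say how the optimal cutoff is chosen; Youden's J statistic is used here as an assumption. Labels are assumed to be coded HD = 1 and HC = 0.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def roc_summary(y_true, scores):
    """AUC plus sensitivity and specificity at the Youden-optimal cutoff."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    best = int(np.argmax(tpr - fpr))  # Youden's J = sensitivity + specificity - 1
    return {
        "auc": float(roc_auc_score(y_true, scores)),
        "cutoff": float(thresholds[best]),
        "sensitivity": float(tpr[best]),
        "specificity": float(1.0 - fpr[best]),
    }
```

For a feature that is lower in HD than in HC, the AUC from this function falls below 0.5; passing the negated feature values gives the equivalent AUC above 0.5.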
Section 8. Machine-learning classification
Step 19. Prepare the machine-learning dataset
Prepare the feature matrix for classification.
Sub-step 19.1
Include only participants with complete group label, age, sex, and behavioral feature data.
Sub-step 19.2
Define the outcome variable as diagnostic group.
Example:
HD = 1
HC = 0
Sub-step 19.3
Define predictors as facial features, speech features, age, and sex.
Step 20. Split the dataset into training and test sets
Use a stratified 70:30 train-test split.
Sub-step 20.1
Stratify the split by diagnostic group.
Sub-step 20.2
Assign 70% of participants to the training set.
Sub-step 20.3
Assign 30% of participants to the independent test set.
Sub-step 20.4
Hold out the test set throughout model development.
Sub-step 20.5
Do not use the test set for feature screening, preprocessing, cross-validation, or model selection.
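A stratified 70:30 split can be sketched with scikit-learn's train_test_split. The random seed and the function name are assumptions; fixing some seed keeps the split reproducible.

```python
from sklearn.model_selection import train_test_split


def split_70_30(X, y, seed=42):
    """Stratified 70:30 split; the seed value is an arbitrary assumption."""
    return train_test_split(X, y, test_size=0.30, stratify=y, random_state=seed)
```

The function returns X_train, X_test, y_train, y_test in that order. After this call, X_test and y_test should be set aside and not touched again until the final evaluation.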
Step 21. Perform cross-validation within the training set
Use stratified five-fold cross-validation within the training set.
Sub-step 21.1
Split the training set into five stratified folds.
Sub-step 21.2
In each iteration, use four folds for model training.
Sub-step 21.3
Use the remaining fold for validation.
Sub-step 21.4
Repeat until each fold has been used once as the validation fold.
Step 22. Perform feature screening within each training fold
Perform feature screening only within the fold-specific training subset.
Sub-step 22.1
Use univariate logistic regression to screen behavioral features.
Sub-step 22.2
Do not use the validation fold for feature screening.
Sub-step 22.3
Do not use the independent test set for feature screening.
Sub-step 22.4
Keep age and sex as fixed covariates regardless of feature-screening results.
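The screening step could be sketched as below. The protocol does not state the selection criterion, so this example assumes a likelihood-ratio test of each one-feature logistic model against an intercept-only model, with a threshold of 0.05. It also assumes labels coded HD = 1 and HC = 0, non-constant feature columns, and approximates an unpenalized fit by using a very large C in scikit-learn.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression


def univariate_logit_pvalues(X_train, y_train):
    """Likelihood-ratio p value of a one-feature logistic model, per column."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    n, prev = len(y_train), y_train.mean()
    # Log-likelihood of the intercept-only model.
    ll_null = n * (prev * np.log(prev) + (1 - prev) * np.log(1 - prev))
    pvalues = []
    for j in range(X_train.shape[1]):
        x = X_train[:, [j]]
        x = (x - x.mean()) / x.std()  # rescale so the solver converges reliably
        model = LogisticRegression(C=1e6, max_iter=1000).fit(x, y_train)  # large C ~ no penalty
        prob = np.clip(model.predict_proba(x)[:, 1], 1e-12, 1 - 1e-12)
        ll_full = np.sum(y_train * np.log(prob) + (1 - y_train) * np.log(1 - prob))
        pvalues.append(chi2.sf(2 * (ll_full - ll_null), df=1))
    return np.array(pvalues)


def screen_features(X_train, y_train, alpha=0.05):
    """Indices of columns passing the screen; alpha is an assumed threshold."""
    return np.flatnonzero(univariate_logit_pvalues(X_train, y_train) < alpha)
```

Only the fold-specific training subset should be passed to screen_features. Age and sex columns would be kept out of the screened matrix and appended to the selected features afterwards, so that they are retained regardless of the screening result.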
Step 23. Standardize features within the training pipeline
Standardize features using training data only.
Sub-step 23.1
Fit the scaler on the fold-specific training subset.
Sub-step 23.2
Apply the fitted scaler to the corresponding validation fold.
Sub-step 23.3
For final test-set evaluation, fit the scaler on the full training set only.
Sub-step 23.4
Apply the fitted scaler to the independent test set.
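The fold-wise standardization described in Sub-steps 23.1 and 23.2 could be sketched as follows, assuming z-score scaling with scikit-learn's StandardScaler (the protocol does not name the scaler). The function name and seed are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler


def scaled_folds(X_train, y_train, n_splits=5, seed=42):
    """Yield (X_fit_scaled, y_fit, X_val_scaled, y_val) for each stratified fold."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, val_idx in folds.split(X_train, y_train):
        # The scaler sees the fold-specific training subset only.
        scaler = StandardScaler().fit(X_train[fit_idx])
        yield (scaler.transform(X_train[fit_idx]), y_train[fit_idx],
               scaler.transform(X_train[val_idx]), y_train[val_idx])
```

For the final evaluation the same pattern applies one level up: StandardScaler().fit(X_train) on the full training set, then transform applied to the independent test set.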
Step 24. Train classification models
Train machine-learning classifiers using the training data.
Sub-step 24.1
Evaluate logistic regression.
Sub-step 24.2
Evaluate random forest.
Sub-step 24.3
Evaluate XGBoost.
Sub-step 24.4
Evaluate LightGBM.
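The four model families could be instantiated as sketched below. Hyperparameters are not given in the protocol, so library defaults are used here as an assumption, apart from a raised iteration limit for logistic regression. XGBoost and LightGBM are treated as optional imports so the sketch still runs when they are not installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def candidate_models(seed=42):
    """Return the classifier families named above that can be imported."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=seed),
    }
    try:  # optional dependency; skipped if not installed
        from xgboost import XGBClassifier
        models["xgboost"] = XGBClassifier(random_state=seed)
    except ImportError:
        pass
    try:  # optional dependency; skipped if not installed
        from lightgbm import LGBMClassifier
        models["lightgbm"] = LGBMClassifier(random_state=seed)
    except ImportError:
        pass
    return models
```

Each model in the returned dictionary exposes the same fit and predict_proba interface, so all four can be passed through an identical cross-validation procedure.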
Step 25. Evaluate cross-validation performance
Calculate model performance within the training-set cross-validation procedure.
Sub-step 25.1
Report accuracy.
Sub-step 25.2
Report precision.
Sub-step 25.3
Report recall.
Sub-step 25.4
Report F1 score.
Sub-step 25.5
Report ROC AUC.
Sub-step 25.6
Report average precision.
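The six metrics could be collected in one pass with scikit-learn's cross_validate, as in the sketch below. It wraps the model in a pipeline so that the scaler is refit inside every fold; a feature-screening step would need to be added to the same pipeline to keep it leakage-free. The function name and seed are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SCORING = ["accuracy", "precision", "recall", "f1", "roc_auc", "average_precision"]


def cv_metrics(model, X_train, y_train, seed=42):
    """Mean of each metric over stratified five-fold CV on the training set."""
    pipeline = make_pipeline(StandardScaler(), model)  # scaler is refit inside every fold
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_validate(pipeline, X_train, y_train, cv=folds, scoring=SCORING)
    return {name: float(np.mean(scores[f"test_{name}"])) for name in SCORING}
```

Precision, recall, and F1 are computed for the positive class, which is HD when labels are coded HD = 1 and HC = 0.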
Step 26. Evaluate final model performance on the independent test set
Retrain the selected model on the full training set and evaluate it on the independent test set.
Sub-step 26.1
Use only the training set for final model fitting.