Bibliometric Review Protocol: The Road to Autonomy – AI in Autonomous Vehicles

Camelia Delcea

Jul 17, 2025

DOI: https://dx.doi.org/10.17504/protocols.io.e6nvw49m2lmk/v1

Camelia Delcea¹

¹Bucharest University of Economic Studies

Protocol Citation: Camelia Delcea 2025. Bibliometric Review Protocol: The Road to Autonomy – AI in Autonomous Vehicles. protocols.io https://dx.doi.org/10.17504/protocols.io.e6nvw49m2lmk/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Protocol status: Working

We use this protocol and it's working

Created: July 17, 2025

Last Modified: July 17, 2025

Protocol Integer ID: 222662

Keywords: autonomo*_electric_vehicle*, machine_learning, deep_learning, artificial_intelligence, bibliometric review protocol, bibliometric review, ai in autonomous vehicle, ai, autonomous vehicle, quality publication, road to autonomy, quality publications from the web, artificial intelligence, intersection of artificial intelligence, deep learning, autonomy, machine learning

Abstract

This protocol outlines the methodological framework for a bibliometric review focused on the intersection of artificial intelligence (AI), machine learning (ML), deep learning (DL), and autonomous vehicles (AV). The aim is to map academic production, collaboration patterns, thematic trends, and emerging technologies by analyzing high-quality publications from the Web of Science (WoS) Core Collection.

Guidelines

Eligibility Criteria

Inclusion:
  • Language: English only
  • Document type: Article
  • Publication year: 1995–2024
  • Content: Must include all four domains (AV, AI, ML, DL)

Exclusion (number of records removed in parentheses):
  • Non-English papers (29)
  • Non-article document types (1,769)
  • Articles published in 2025 (63)
  • Manually excluded irrelevant topics (37)

Final dataset: 2,228 articles.

Screening Process
  • PRISMA 2020 methodology was followed.
  • Articles were screened manually after keyword filtering.
  • Duplicate and off-topic articles were removed.
  • Figure 1 in the paper presents the PRISMA flow diagram.

Tools and Software
  • Biblioshiny 5.0 in RStudio 4.3.2 (for bibliometric analyses)
  • Gensim (Python) for Latent Dirichlet Allocation (LDA)
  • BERTopic (Python) using HDBSCAN clustering

Data Preprocessing for NLP
  • Text was lowercased
  • Punctuation removed
  • Token normalization applied (e.g., “AI” → “artificial intelligence”)
  • Domain-specific stopwords removed
  • Grid search applied for LDA alpha/eta tuning
  • BERTopic clustering tuned via minimum cluster/sample sizes

Key Outcomes Measured
  • Publication volume by year and country
  • Most cited articles and normalized total citations (NTC)
  • Bradford's Law core journals and H-index analysis
  • Thematic evolution and keyword co-occurrence networks
  • Topic modeling with LDA and BERTopic
  • Collaboration maps for authors and countries

Materials

Web of Science (WoS) Core Collection institutional subscription; Biblioshiny 5.0 in RStudio 4.3.2; Gensim (Python); BERTopic (Python); computer with internet access.

Before start

Special search symbols:
  • _ was used to capture exact multi-word phrases.
  • * wildcard operator was used for plural/singular and extended term variations.

Databases and Indexes Used

Extract all articles from the Web of Science (WoS) Core Collection using an institutional subscription.
Include the following indexes in the extraction:
  • Science Citation Index Expanded (SCIE)
  • Social Sciences Citation Index (SSCI)
  • Arts and Humanities Citation Index (A&HCI)
  • Emerging Sources Citation Index (ESCI)
  • Conference Proceedings Citation Index – Science (CPCI-S)
  • Conference Proceedings Citation Index – Social Sciences and Humanities (CPCI-SSH)
  • Book Citation Index – Science (BKCI-S)
  • Book Citation Index – Social Sciences and Humanities (BKCI-SSH)
  • Index Chemicus (IC)
  • Current Chemical Reactions (CCR-Expanded)

Search Strategy

Apply the following keyword queries in the Title (TI), Abstract (AB), and Author Keywords (AK) fields:
  • "autonomo*_electric_vehicle*"
  • "machine_learning"
  • "deep_learning"
  • "artificial_intelligence"

Use special search symbols:
  • _ to capture exact multi-word phrases.
  • * as a wildcard operator for plural/singular and extended term variations.
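
The search terms and field restrictions above can be assembled into a single query string. The following is a minimal sketch, not the query actually submitted: it assumes that the protocol's underscore notation corresponds to a quoted phrase in the WoS advanced search, that TI, AB, and AK are used as field tags, and that the four terms are combined with AND (suggested by the requirement that records cover all four domains). The helper names are hypothetical.

```python
# Sketch only: field tags, AND-joining and quoting are assumptions,
# not details confirmed by the protocol.
FIELDS = ("TI", "AB", "AK")  # Title, Abstract, Author Keywords

TERMS = [
    "autonomo*_electric_vehicle*",
    "machine_learning",
    "deep_learning",
    "artificial_intelligence",
]


def to_phrase(term: str) -> str:
    """Turn the underscore notation into a quoted multi-word phrase."""
    return '"' + term.replace("_", " ") + '"'


def build_query(terms, fields=FIELDS, joiner="AND") -> str:
    """Search every term in each field, then join the terms with `joiner`."""
    clauses = []
    for term in terms:
        phrase = to_phrase(term)
        per_field = " OR ".join(f"{field}=({phrase})" for field in fields)
        clauses.append(f"({per_field})")
    return f" {joiner} ".join(clauses)


query = build_query(TERMS)
print(query)
```

Passing joiner="OR" gives the broader variant if the terms were meant as alternatives rather than joint requirements.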

Eligibility Criteria

Apply the following inclusion criteria:
  • Language: English only
  • Document type: Article
  • Publication year: 1995–2024
  • Content: Must include all four domains (AV, AI, ML, DL)

Apply the following exclusion criteria (number of records removed in parentheses):
  • Non-English papers (29)
  • Non-article document types (1,769)
  • Articles published in 2025 (63)
  • Manually excluded irrelevant topics (37)

Confirm that the final dataset contains 2,228 articles after applying the inclusion and exclusion criteria.
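
The exclusion counts can be checked against the final dataset size with simple arithmetic. The sketch below assumes that the four exclusion groups are disjoint and were applied one after another. Under that assumption the number of records retrieved before screening would be 4,126, a figure the protocol itself does not state.

```python
# Sketch only: assumes disjoint exclusion groups applied sequentially.
FINAL_DATASET = 2228

EXCLUSIONS = {
    "non-English papers": 29,
    "non-article document types": 1769,
    "articles published in 2025": 63,
    "manually excluded irrelevant topics": 37,
}


def records_before_screening(final: int, exclusions: dict) -> int:
    """Back-calculate the starting count implied by the reported numbers."""
    return final + sum(exclusions.values())


def remaining_after_each_step(start: int, exclusions: dict) -> list:
    """Return (reason, records left) after each exclusion is applied."""
    remaining = start
    trail = []
    for reason, removed in exclusions.items():
        remaining -= removed
        trail.append((reason, remaining))
    return trail


initial = records_before_screening(FINAL_DATASET, EXCLUSIONS)
trail = remaining_after_each_step(initial, EXCLUSIONS)

print(f"implied records before screening: {initial}")
for reason, left in trail:
    print(f"after removing {reason}: {left}")
```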

Screening Process

Follow PRISMA 2020 methodology for screening.

Screen articles manually after keyword filtering.

Remove duplicate and off-topic articles from the dataset.

Refer to Figure 1 in the paper for the PRISMA flow diagram.

Tools and Software

Use Biblioshiny 5.0 in RStudio 4.3.2 for bibliometric analyses.

Use Gensim (Python) for Latent Dirichlet Allocation (LDA) topic modeling.

Use BERTopic (Python) with HDBSCAN clustering for topic modeling.

Data Preprocessing for NLP

Lowercase all text data.

Remove punctuation from text data.

Apply token normalization (e.g., convert “AI” to “artificial intelligence”).

Remove domain-specific stopwords from the text.
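
The four cleaning steps above can be chained in one function. This is a minimal sketch using only the Python standard library. Only the "AI" mapping comes from the protocol; the other normalization entries and the stopword list are hypothetical placeholders. Because text is lowercased first, the normalization table is keyed on lowercase tokens.

```python
import string

# Only "ai" is taken from the protocol; "ml" and "dl" are hypothetical.
NORMALIZATION = {
    "ai": "artificial intelligence",
    "ml": "machine learning",
    "dl": "deep learning",
}

# Hypothetical domain-specific stopwords; the protocol does not list them.
DOMAIN_STOPWORDS = {"paper", "study", "results", "proposed"}

# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> list:
    """Lowercase, strip punctuation, normalize tokens, drop stopwords."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    tokens = []
    for token in text.split():
        # An abbreviation may expand into several tokens.
        tokens.extend(NORMALIZATION.get(token, token).split())
    return [token for token in tokens if token not in DOMAIN_STOPWORDS]


tokens = preprocess("This paper applies AI to self-driving cars.")
print(tokens)
```

Replacing punctuation with spaces rather than deleting it keeps hyphenated words such as "self-driving" as two tokens instead of fusing them; deleting punctuation outright is an equally plausible reading of the step.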

Apply grid search for LDA alpha/eta parameter tuning.
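
A grid search over alpha and eta evaluates every combination of candidate values and keeps the best-scoring pair. The sketch below shows only the search loop. The candidate values are hypothetical, and the scoring function is a toy stand-in so the example runs without a corpus; the protocol does not state which values or which selection criterion were used.

```python
from itertools import product

# Hypothetical candidate values; the protocol does not list the grid.
ALPHAS = [0.01, 0.1, 0.5, 1.0]
ETAS = [0.01, 0.1, 0.5]


def grid_search(alphas, etas, score_fn):
    """Return (alpha, eta, score) for the highest-scoring combination."""
    best = None
    for alpha, eta in product(alphas, etas):
        score = score_fn(alpha, eta)
        if best is None or score > best[2]:
            best = (alpha, eta, score)
    return best


def toy_score(alpha: float, eta: float) -> float:
    """Stand-in for a model-quality score; peaks at alpha=0.1, eta=0.01."""
    return -((alpha - 0.1) ** 2 + (eta - 0.01) ** 2)


best_alpha, best_eta, best_score = grid_search(ALPHAS, ETAS, toy_score)
print(best_alpha, best_eta)
```

With Gensim, the scoring function would typically train a `gensim.models.LdaModel` with the given `alpha` and `eta` and return a topic-coherence value from `gensim.models.CoherenceModel`; coherence is assumed here as the criterion.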

Tune BERTopic clustering using minimum cluster and sample sizes.

Key Outcomes Measured

Measure publication volume by year and country.

Identify most cited articles and calculate normalized total citations (NTC).
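
Normalized total citations put articles from different years on a common scale. The sketch below assumes the definition commonly used in bibliometric software: an article's total citations divided by the mean total citations of all articles in the dataset published in the same year. The sample records are invented for illustration.

```python
from collections import defaultdict


def normalized_total_citations(records):
    """records: iterable of (article_id, year, total_citations).

    Returns {article_id: TC / mean TC of that publication year}.
    """
    records = list(records)
    citations_by_year = defaultdict(list)
    for _, year, tc in records:
        citations_by_year[year].append(tc)

    mean_by_year = {
        year: sum(values) / len(values)
        for year, values in citations_by_year.items()
    }

    ntc = {}
    for article_id, year, tc in records:
        mean = mean_by_year[year]
        # A year in which no article was cited yields 0 rather than a division error.
        ntc[article_id] = tc / mean if mean else 0.0
    return ntc


# Invented sample data, not taken from the dataset.
SAMPLE = [("A", 2020, 30), ("B", 2020, 10), ("C", 2023, 4), ("D", 2023, 4)]
ntc = normalized_total_citations(SAMPLE)
print(ntc)
```

An NTC above 1 means the article is cited more than the average article of its year within the dataset.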

Identify core journals using Bradford's Law and compute the H-index.
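
Both measures can be illustrated in a few lines. The sketch below computes an H-index from a list of citation counts and assigns journals to three Bradford zones, each covering roughly one third of the articles, with zone 1 as the core. The zone boundary rule is a simplifying assumption and may differ from the one implemented in Biblioshiny; the sample data are invented.

```python
def h_index(citations) -> int:
    """Largest h such that h items have at least h citations each."""
    h = 0
    for rank, cited in enumerate(sorted(citations, reverse=True), start=1):
        if cited >= rank:
            h = rank
        else:
            break
    return h


def bradford_zones(articles_per_journal: dict) -> dict:
    """Assign each journal to zone 1 (core), 2 or 3.

    Journals are ranked by article count; a journal's zone depends on
    the cumulative share of articles covered before it is added.
    """
    total = sum(articles_per_journal.values())
    ranked = sorted(articles_per_journal.items(), key=lambda kv: kv[1], reverse=True)
    zones = {}
    cumulative = 0
    for journal, count in ranked:
        # Integer comparisons avoid floating-point issues at 1/3 and 2/3.
        if 3 * cumulative < total:
            zones[journal] = 1
        elif 3 * cumulative < 2 * total:
            zones[journal] = 2
        else:
            zones[journal] = 3
        cumulative += count
    return zones


# Invented sample data, not taken from the dataset.
h = h_index([10, 8, 5, 4, 3])
zones = bradford_zones({"J1": 30, "J2": 15, "J3": 15, "J4": 10, "J5": 10, "J6": 10})
print(h, zones)
```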

Map thematic evolution and keyword co-occurrence networks.

Conduct topic modeling using LDA and BERTopic.

Generate collaboration maps for authors and countries.

Acknowledgements

Retrospective Registration Note: This protocol was registered after the start of data extraction, in line with transparency practices recommended by PRISMA 2020.