Apr 28, 2025

Open Science 24/25 V.1

  • Alberto Ciarrocca1,
  • Anna Nicoletti2,
  • Ahmadreza Nazari3
  • 1University of Bologna;
  • 2Unibo;
  • 3Student
  • Open Science
Protocol Citation: Alberto Ciarrocca, Anna Nicoletti, Ahmadreza Nazari 2025. Open Science 24/25. protocols.io https://dx.doi.org/10.17504/protocols.io.4r3l264ppv1y/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: In development
We are still developing and optimizing this protocol
Created: April 10, 2025
Last Modified: April 28, 2025
Protocol Integer ID: 126519
Abstract
Purpose
This study aims to evaluate the representation and dissemination of diverse research outputs - such as software, databases, exhibitions, audio-visual materials, and others - produced by the University of Bologna's (UNIBO) researchers across various repositories. It seeks to assess the extent of overlap among these repositories, analyze citation dynamics involving these research objects, and determine their integration within UNIBO's [Current Research Information System (IRIS)](https://cris.unibo.it/).

Methodology
The project team has systematically collected and analyzed metadata from selected institutional, disciplinary, and generalist repositories, including [AMS Acta](https://amsacta.unibo.it/), [Software Heritage](https://www.softwareheritage.org/), and [Zenodo](https://zenodo.org/). Relevant data and metadata were extracted using APIs and web scraping techniques. The analysis identified cross-repository depositions to assess overlaps and employed citation analysis tools, such as [OpenCitations](https://opencitations.net/), to examine citation patterns. Additionally, the collected data were cross-referenced with IRIS to evaluate its coverage of these research outputs.

Findings
The analysis reveals that the coverage of the diverse research outputs produced by the personnel of the University of Bologna varies significantly across the selected repositories. AMS Acta primarily hosts institutional outputs, while Software Heritage focuses on software-related contributions, and Zenodo encompasses a broader range of interdisciplinary research objects. There is a notable overlap among these repositories with certain identical research objects being deposited across multiple repositories. Citation analysis indicates that while these research objects generate both incoming and outgoing citations, their citation networks are unevenly distributed, with software and databases receiving more frequent references. Moreover, a substantial portion of these research objects remains unmapped in IRIS, highlighting gaps in institutional tracking and integration. These findings emphasize the need for a more cohesive strategy to enhance visibility, discoverability, and interoperability across repositories.

Value
This study sheds light on the dissemination patterns and citation impact of UNIBO's diverse research outputs, providing insights to enhance their visibility and accessibility. The findings can inform strategies to optimize repository usage and improve the integration of these research objects within IRIS, ultimately strengthening open science practices at the University of Bologna and promoting a more cohesive approach for maximizing the impact of its research outputs.
Data Gathering
From Zenodo, use the OAI-PMH protocol to harvest metadata in XML format.
From AMS Acta, use the OAI-PMH protocol to harvest metadata in XML format.
From Software Heritage, use the API to identify and extract metadata related to code/software authored by UNIBO researchers.
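The two OAI-PMH harvesting steps can be sketched as follows. This is a minimal illustration, assuming the repository exposes the standard `oai_dc` (Dublin Core) metadata format; the function names, the output field names, and the choice of fields are hypothetical, and the base URL of each repository has to be taken from its own documentation. Only URL building and response parsing are shown, so the network call itself is left to the harvesting script. The Software Heritage API is not covered here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request URL (hypothetical helper).

    When a resumption token is given it replaces every other argument,
    as the OAI-PMH specification requires.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one page of a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [e.text or "" for e in rec.iterfind(".//dc:creator", NS)],
            "date": rec.findtext(".//dc:date", default="", namespaces=NS),
            "type": rec.findtext(".//dc:type", default="", namespaces=NS),
        })
    # An absent or empty token means this was the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)
```

A harvesting loop would request `list_records_url(base_url)`, parse the page, and keep requesting with the returned token until it is `None`.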
Convert and normalize data
Write scripts to convert the XML to tabular format (CSV) for each repository.
Using Pandas or OpenRefine, normalize metadata fields across all sources, including title, authors, affiliations, publication year, DOI, and repository.
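As an illustration of the conversion and normalization step, the sketch below maps one harvested record onto the unified columns listed above and serializes rows to CSV. It uses only the Python standard library rather than Pandas or OpenRefine, and the input keys (`title`, `creators`, `affiliations`, `date`, `doi`) and function names are assumptions about how the harvested records are represented.

```python
import csv
import io
import re

# Unified column set, taken from the field list in the step above.
FIELDS = ["title", "authors", "affiliations", "publication_year", "doi", "repository"]


def normalize_record(raw, repository):
    """Map one harvested record (a dict with assumed keys) onto the unified columns."""
    doi = (raw.get("doi") or "").strip().lower()
    # Strip common resolver prefixes so the same DOI compares equal across sources.
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    year = re.search(r"\d{4}", raw.get("date") or "")
    return {
        "title": " ".join((raw.get("title") or "").split()),  # collapse whitespace
        "authors": "; ".join(raw.get("creators") or []),
        "affiliations": "; ".join(raw.get("affiliations") or []),
        "publication_year": year.group(0) if year else "",
        "doi": doi,
        "repository": repository,
    }


def to_csv(rows):
    """Serialize normalized rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Lowercasing DOIs and removing resolver prefixes at this stage keeps later duplicate detection a plain string comparison.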

Filter data relevant to the project
Filter by author affiliation (e.g., “University of Bologna”) where possible.
Identify research object types relevant to the project (e.g., software, audio-visual materials).
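A possible shape for this filter is sketched below. The affiliation markers and the type labels are hypothetical placeholders, since the actual values depend on each repository's vocabulary. Because affiliation is only available "where possible", records with a relevant type but no affiliation are set aside for manual review instead of being dropped.

```python
# Hypothetical markers and type labels; adjust to the vocabularies actually harvested.
AFFILIATION_MARKERS = ("university of bologna", "università di bologna", "unibo")
RELEVANT_TYPES = {"software", "dataset", "audiovisual"}


def is_unibo(row):
    """True when the affiliation text mentions one of the markers."""
    text = row.get("affiliations", "").lower()
    return any(marker in text for marker in AFFILIATION_MARKERS)


def is_relevant_type(row):
    """True when the record type is one of the object types of interest."""
    return row.get("type", "").strip().lower() in RELEVANT_TYPES


def split_rows(rows):
    """Return (kept, review): matching rows, and rows lacking any affiliation."""
    kept, review = [], []
    for row in rows:
        if not is_relevant_type(row):
            continue
        if not row.get("affiliations", "").strip():
            review.append(row)
        elif is_unibo(row):
            kept.append(row)
    return kept, review
```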
Integrate datasets
Create a single CSV file combining all normalized and filtered metadata
Include metadata flags indicating source repository
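The integration step could look like the following sketch, which assumes each repository's filtered rows are held as a list of dicts. The function names and the `repository` flag column are illustrative choices, not part of the protocol.

```python
import csv


def combine(sources):
    """Merge per-repository rows into one list, flagging each row with its source.

    `sources` maps a repository name to a list of normalized row dicts.
    """
    combined = []
    for repository, rows in sources.items():
        for row in rows:
            combined.append({**row, "repository": repository})
    return combined


def write_csv(rows, path):
    """Write the combined rows to a single CSV file at `path`."""
    # Collect columns in first-seen order so differing sources still share one header.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
```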
Expected result: one CSV file with unified metadata from the three repositories.

Data Analysis
Assess the repository coverage using simple descriptive statistics (Answers Research Question 1)

Count the number of research objects per repository
Determine distribution by object type and publication year
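These descriptive counts need very little code. The sketch below assumes the unified rows carry `repository`, `type`, and `publication_year` keys, which are illustrative names.

```python
from collections import Counter


def objects_per_repository(rows):
    """Count research objects per source repository."""
    return Counter(row["repository"] for row in rows)


def distribution(rows):
    """Count research objects per (object type, publication year) pair."""
    return Counter((row["type"], row["publication_year"]) for row in rows)
```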
Detect overlap across repositories (Answers Research Question 2)
Identify duplicates or matches using identifiers (e.g., DOI, title similarity)
Double-check matches found by title similarity and decide whether each pair is a true duplicate
Tag entries present in multiple repositories
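One way to implement the matching is sketched below. It treats an identical DOI in two different repositories as a confirmed match and reports title-similar pairs separately, so they can be checked by hand. The use of `difflib.SequenceMatcher`, the 0.9 threshold, and the column names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def title_similarity(a, b):
    """Case-insensitive similarity ratio between two titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def find_matches(rows, threshold=0.9):
    """Return (confirmed, candidates) as lists of row-index pairs.

    confirmed: same non-empty DOI in two different repositories.
    candidates: similar titles only; these still need a manual check.
    """
    confirmed, candidates = [], []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if a["repository"] == b["repository"]:
            continue
        if a["doi"] and a["doi"] == b["doi"]:
            confirmed.append((i, j))
        elif title_similarity(a["title"], b["title"]) >= threshold:
            candidates.append((i, j))
    return confirmed, candidates


def tag_multi_repository(rows, pairs):
    """Add an `in_multiple_repositories` flag to every row that appears in `pairs`."""
    flagged = {index for pair in pairs for index in pair}
    return [
        {**row, "in_multiple_repositories": index in flagged}
        for index, row in enumerate(rows)
    ]
```

The pairwise comparison is quadratic in the number of rows, which is acceptable for a few thousand records but would need blocking (for example by publication year) on larger sets.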
Compare the results with IRIS metadata (Answers Research Question 4)
Cross-check the unified dataset against the IRIS bibliographic dump (https://doi.org/10.6092/unibo/amsacta/7736)
Mark objects that are also registered in IRIS
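The cross-check can be sketched as a DOI lookup, assuming the DOIs have already been extracted from the IRIS dump into a list; how the dump is structured is not covered here. The function names and the `in_iris` column are hypothetical. Objects without a DOI are marked as not found, so they would need a separate title-based check.

```python
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def normalize_doi(value):
    """Lowercase a DOI and strip resolver prefixes so comparisons are reliable."""
    doi = (value or "").strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def mark_in_iris(rows, iris_dois):
    """Add an `in_iris` flag to each row based on DOI membership in the IRIS list."""
    known = {normalize_doi(d) for d in iris_dois} - {""}
    return [{**row, "in_iris": normalize_doi(row.get("doi")) in known} for row in rows]


def iris_coverage(marked_rows):
    """Percentage of rows flagged as registered in IRIS."""
    if not marked_rows:
        return 0.0
    return 100 * sum(row["in_iris"] for row in marked_rows) / len(marked_rows)
```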
Collect citation data (Answers Research Question 3)
Use OpenCitations to gather incoming and outgoing citations for entries with a DOI
Record citation counts for each object
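A hedged sketch of the citation lookup follows. The base URL, the `citation-count` and `reference-count` endpoint paths, and the `[{"count": "..."}]` response shape are assumptions about the OpenCitations Index API and should be verified against its current documentation before use. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base URL of the OpenCitations Index API; verify before use.
API_BASE = "https://opencitations.net/index/api/v2"


def count_url(doi, direction="citation"):
    """Build a count URL: 'citation' for incoming, 'reference' for outgoing."""
    return f"{API_BASE}/{direction}-count/doi:{quote(doi, safe='/')}"


def parse_count(payload):
    """Extract the integer count from the assumed JSON response shape."""
    data = json.loads(payload)
    return int(data[0]["count"]) if data else 0


def fetch_count(doi, direction="citation", timeout=30):
    """Request one count from the API (performs a network call)."""
    request = Request(count_url(doi, direction), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `fetch_count(doi)` and `fetch_count(doi, "reference")` for each entry with a DOI would give the two counts to record per object.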
Expected result: a Jupyter Notebook file with the related CSV files.

Data Visualisation
Visualise key findings
Bar chart showing number of objects per repository (Research Question 1)
Venn diagram showing repository overlap (Research Question 2)

Chart showing the citation distribution (Research Question 3)
Bar chart showing the percentage of mapped entries in IRIS (Research Question 4)
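The numbers behind the Venn diagram can be computed before any plotting. The sketch below assumes each repository is represented by a set of object keys (for example normalized DOIs) and counts how many objects fall in exactly each combination of repositories. The function name is hypothetical, and the drawing itself is left to whichever plotting library the notebook uses.

```python
def venn_regions(sets):
    """Count objects per exclusive Venn region.

    `sets` maps a repository name to a set of object keys.
    Returns a dict mapping a frozenset of repository names to the number
    of objects found in exactly those repositories and no others.
    """
    membership = {}
    for name, keys in sets.items():
        for key in keys:
            membership.setdefault(key, set()).add(name)
    regions = {}
    for names in membership.values():
        region = frozenset(names)
        regions[region] = regions.get(region, 0) + 1
    return regions
```

Summing the region counts gives the number of distinct objects, which is a quick consistency check against the unified CSV.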

Expected result: a Jupyter Notebook presenting the results with visualisations.