Mar 11, 2026

SANRA Quality Self-Assessment Protocol

  • Edison Carrasco-Jiménez¹
  ¹ Independent researcher
Protocol Citation: Edison Carrasco-Jiménez (2026). SANRA Quality Self-Assessment Protocol. protocols.io. https://dx.doi.org/10.17504/protocols.io.rm7vzeddrvx1/v1
Manuscript citation:
Artificial Intelligence in Nursing Documentation: Efficiency Gains and the Redistribution of Cognitive Work
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: March 04, 2026
Last Modified: March 11, 2026
Protocol Integer ID: 246426
Keywords: narrative review quality, SANRA scale, quality assessment, peer review, research integrity, manuscript development, editorial decision-making, self-assessment protocol
Abstract
This document reports the authors' self-assessment of the manuscript using the SANRA scale — the Scale for the Quality Assessment of Narrative Review Articles (Baethge, Goldbeck-Wood, & Mertens, 2019; Research Integrity and Peer Review, 4(1), 5. https://doi.org/10.1186/s41073-019-0064-8). SANRA was developed to provide a brief, standardised instrument for evaluating the quality of narrative review articles across six domains. It is used in editorial decision-making, peer review, and research on narrative review quality. Authors' self-assessment using SANRA serves three purposes in this submission. First, it demonstrates methodological transparency: the scoring process requires the authors to make explicit the evidence basis for each quality claim rather than asserting quality narratively. Second, it identifies specific limitations honestly and in advance of peer review, enabling reviewers to assess how the authors have handled known weaknesses rather than discovering them as unacknowledged gaps. Third, it documents how SANRA findings informed revisions during manuscript development, providing a traceable record of methodological decision-making.
Guidelines
The following sections provide the full evidence base and limitation analysis for each SANRA item. For each item, four components are reported: (1) the item definition and scoring anchors as per Baethge et al. (2019); (2) the manuscript evidence supporting the assigned score, with specific section and page references; (3) acknowledged limitations that qualify or constrain the assigned score; and (4) the score rationale explaining why the assigned score was selected over the alternatives.
DETAILED ITEM ASSESSMENTS
Item 5: Scientific Reasoning
Definition: Is the review's argument logically structured, are its conclusions supported by evidence, and are the evidence-reasoning relationships transparent?
Scoring Anchors: 0 — Reasoning unclear or circular; conclusions not derivable from evidence presented. 1 — Reasoning generally coherent but with notable logical gaps, unsupported claims, or conclusions that exceed evidence. 2 — Reasoning explicit and traceable; conclusions proportionate to evidence; uncertainty acknowledged where appropriate.
Manuscript Evidence The review advances a single central argument — the distinction between temporal efficiency and cognitive transformation — developed sequentially across Theoretical Foundations (CLT framework and automation research), Results (evidence synthesis across five thematic axes), and Discussion (clinical, educational, and policy implications). Each claim is attributed to specific studies with explicit epistemic hedging: quantitative figures from Zhao et al. (2026) are presented with their I² = 75% limitation; perceived burden findings are distinguished from validated cognitive load measurement; deskilling evidence is qualified by the absence of longitudinal nursing-specific data. The conceptual contribution (Section 2.3) is positioned as a framework for interpretation rather than an empirical finding, which is appropriate for the review type. No conclusions are presented as established facts without supporting evidence.
Acknowledged Limitations The review's argument rests on a conceptual distinction (temporal efficiency vs. cognitive transformation) that, while theoretically grounded, has not been operationally validated in nursing research. The review proposes this distinction as analytically useful; it does not demonstrate empirically that the two constructs are independent or separable in nursing practice contexts. This represents an appropriate limitation of a theoretical synthesis rather than a flaw in reasoning, but it should be recognised.
Score Rationale Score 2: Argument is logically structured, evidence-reasoning links are explicit, uncertainty is consistently acknowledged, and conclusions are calibrated to evidence strength. The conceptual contribution is appropriately framed as theoretical rather than empirical.
Item 6: Presentation of Relevant Endpoint Data
Definition: Are key quantitative findings reported with sufficient precision (e.g., effect sizes, confidence intervals, p-values) rather than solely as narrative summaries?
Scoring Anchors: 0 — No quantitative data presented; findings reported only as directional narratives ('improved', 'reduced'). 1 — Some quantitative data presented but inconsistently; confidence intervals or precision estimates frequently absent. 2 — Key findings presented with effect sizes, precision estimates, and sample sizes; limitations of reported statistics acknowledged.
Manuscript Evidence Key quantitative findings are reported with precision throughout: Pelletier et al. (2025) — ITSA 20.9% time reduction (95% CI 17.2–24.6%), NASA-TLX all p < 0.001, burnout 54.9%→33.3% (p=0.01); Zhao et al. (2026) — SMD −0.72, OR 0.28, I² = 75% with explicit heterogeneity caveat; Lyell et al. (2017) — omission errors +28.7%, commission errors +56.9%, n=120; Kraft van Ermel et al. (2024) — 15.6% hallucination rate; Abdelhadi et al. (2022) — n=285 RNs, significant correlation; Git et al. (2024) — n=32; Yen et al. (2018) — 23% CL reduction. Section 3.3 explicitly states that quantitative data are drawn from primary studies and Zhao et al.'s meta-analysis (random-effects model, REML), and that this review conducts no independent pooling.
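Symmetric Wald-type intervals such as those quoted above can be sanity-checked mechanically: a 95% CI should be centred on the point estimate, and its half-width divided by 1.96 recovers the implied standard error. A minimal sketch using only the Pelletier et al. figures quoted in this paragraph (the helper names are ours, not from any cited study):

```python
def implied_se(lo: float, hi: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric two-sided 95% CI."""
    return (hi - lo) / (2 * z)

def is_symmetric(point: float, lo: float, hi: float, tol: float = 0.05) -> bool:
    """True if the point estimate sits at the midpoint of the interval."""
    return abs((lo + hi) / 2 - point) <= tol

# Pelletier et al. (2025): 20.9% time reduction, 95% CI 17.2-24.6%
point, lo, hi = 20.9, 17.2, 24.6
print(is_symmetric(point, lo, hi))   # True: (17.2 + 24.6) / 2 == 20.9
print(round(implied_se(lo, hi), 3))  # implied SE ≈ 1.888 percentage points
```

An interval that fails the midpoint check is not necessarily wrong (it may be log-scale or bootstrap-based), but the asymmetry is worth flagging in review.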
Acknowledged Limitations Not all included studies reported quantitative data with equal precision — several cited studies are implementation evaluations relying on self-report without confidence intervals. For these studies, findings are reported descriptively ('consistently reported', 'generally observed') rather than with false precision, which is methodologically appropriate but means the quantitative reporting is uneven across the corpus. Studies described as 'in press' without full journal details cannot be verified by readers, which is a transparency limitation.
Score Rationale Score 2: Core quantitative findings reported with effect sizes, sample sizes, confidence intervals, and precision estimates. Statistical limitations (I² = 75%, random-effects model source) explicitly acknowledged. Appropriate descriptive language used for studies without precise quantitative data.
OVERALL ASSESSMENT AND IMPLICATIONS FOR THIS SUBMISSION
The manuscript received a total SANRA score of 11/12. Five of six items were scored at the maximum (2/2); item 3 (literature search description) was scored 1/2. This score profile indicates a manuscript of high overall quality for a narrative review, with one specific and documented methodological limitation.
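The arithmetic behind the 11/12 profile can be made explicit. A minimal sketch, with item labels paraphrased from Baethge et al. (2019) and the scores as assigned in this self-assessment:

```python
# Six SANRA items, each scored 0-2; the total is out of 12.
# Labels are paraphrased from Baethge et al. (2019).
sanra_scores = {
    "1. Justification of the review's importance": 2,
    "2. Statement of concrete aims or questions": 2,
    "3. Description of the literature search": 1,
    "4. Referencing": 2,
    "5. Scientific reasoning": 2,
    "6. Presentation of endpoint data": 2,
}

total = sum(sanra_scores.values())
print(f"{total}/{2 * len(sanra_scores)}")          # 11/12
print(sum(s == 2 for s in sanra_scores.values()))  # 5 items at the maximum
```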
The item 3 limitation — absence of a PRISMA flow diagram, non-verbatim search query reporting, and single-reviewer screening — is inherent to the narrative review methodology chosen and cannot be resolved without fundamentally changing the review type. A systematic review with dual independent screening and PRISMA flow would require a registered protocol, exhaustive search strategy, and quantitative pooling infrastructure that would be methodologically inappropriate given the high heterogeneity of the literature (I² = 75%; Zhao et al., 2026).
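For readers unfamiliar with the statistic, the I² = 75% figure cited here is Higgins' I²: the proportion of total variability across studies attributable to between-study heterogeneity rather than sampling error. A minimal sketch of its definition (the Q value below is invented for illustration and is not from Zhao et al.):

```python
def i_squared(q: float, k: int) -> float:
    """Higgins' I^2 (as a percentage) from Cochran's Q over k studies."""
    if q <= 0:
        return 0.0
    df = k - 1
    return max(0.0, (q - df) / q) * 100

# Hypothetical: Q = 36 over k = 10 studies gives I^2 = 75%,
# a level conventionally read as 'high' heterogeneity.
print(i_squared(36, 10))  # 75.0
```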
The methodological choice to conduct a narrative review rather than a systematic review is explicitly justified in Section 3.1 of the manuscript on three grounds: the theoretical rather than effect-size objective, the heterogeneity of available evidence, and the nascent state of nursing-specific research. The item 3 score of 1/2 thus reflects an honest acknowledgement that narrative reviews, even well-conducted ones, cannot satisfy the reproducibility standards of systematic reviews — not a correctable error in execution.
The high scores on items 5 (scientific reasoning) and 6 (endpoint data presentation) reflect deliberate manuscript choices: explicit epistemic hedging of all quantitative claims, consistent attribution of uncertainty to specific methodological sources (I² = 75%, pre-post designs, single-site samples), and calibration of conclusions to evidence strength rather than to the direction of findings. These features directly address the most common quality failures in narrative reviews identified by Baethge et al. (2019): overconfident conclusions, insufficient data presentation, and absence of transparent reasoning.
The authors invite editors and reviewers to use this document as a structured reference during review and to identify any items where their independent assessment differs from the authors' scoring. Disagreements in SANRA ratings are legitimate targets for reviewer comment and revision, and the authors commit to responding to specific SANRA-based critique in any revision process.
HOW SANRA ASSESSMENT INFORMED MANUSCRIPT DEVELOPMENT
This protocol was not completed retrospectively as a post-hoc quality certificate. Rather, the SANRA framework was applied iteratively during manuscript development, and specific revisions were made in response to scoring gaps identified during assessment. The following changes resulted directly from SANRA application:
Item 1 (importance): The original manuscript stated the review's importance implicitly. SANRA item 1 prompted explicit articulation of four specific gaps in the literature (methodological, empirical, conceptual, and measurement-related) in the Introduction and Section 2.3.
Item 2 (aims): The original abstract stated aims broadly. SANRA item 2 prompted revision to include explicit specification of target population (nurses), clinical domain, theoretical framework (CLT), and practical scope in the abstract.
Item 3 (search): SANRA item 3 prompted addition of explicit eligibility criteria (four inclusion, three exclusion categories), rationale for temporal boundaries, and the statement of SANRA-aligned reporting in Section 3.3. A score of 2/2 was not achievable without converting the review to a systematic review, which was methodologically unjustified.
Item 5 (reasoning): SANRA item 5 prompted systematic review of all conclusions to ensure proportionality to evidence. Specific revisions included the three-stage caveat on OR=0.28 (Section 4.1), the explicit separation of extraneous load reduction from cognitive transformation claims (Section 5.1), and the novice-expert differentiation in Section 5.3.
Item 6 (data): SANRA item 6 prompted addition of the statistical model transparency paragraph (Section 3.3), specifying that quantitative data are drawn from primary studies without independent pooling, and that Zhao et al. (2026) employed a random-effects model with REML estimation.
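To make the distinction concrete: Zhao et al.'s pooled estimates come from a random-effects model fitted by REML, which this review reports but does not recompute. The sketch below illustrates what random-effects pooling does, using the simpler closed-form DerSimonian-Laird estimator rather than REML (REML is iterative and typically delegated to a library such as R's metafor); all study-level inputs are invented:

```python
from math import sqrt

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate and SE via DerSimonian-Laird tau^2."""
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled, sqrt(1 / sum(w_re))

# Invented standardized mean differences and within-study variances:
pooled, se = dersimonian_laird([-1.2, -0.3, -0.9], [0.04, 0.05, 0.06])
print(round(pooled, 2), round(se, 2))  # -0.8 0.27
```

The between-study variance tau² widens the pooled standard error relative to a fixed-effect analysis, which is why heterogeneous literatures (such as the I² = 75% corpus here) yield more cautious intervals under random-effects models.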
Protocol references
Baethge, C., Goldbeck-Wood, S., & Mertens, S. (2019). SANRA — a scale for the quality assessment of narrative review articles. Research Integrity and Peer Review, 4(1), 5. https://doi.org/10.1186/s41073-019-0064-8
Item 3: Description of the Literature Search
Acknowledged Limitations Several elements reduce reproducibility to below full standard. First, no PRISMA flow diagram is provided; the number of records identified, screened, and excluded at each stage is not reported. Second, the specific Boolean search strings as entered in each database are not reproduced verbatim; semantic domains are described, but exact queries are not. Third, a single reviewer conducted screening and selection; dual independent screening with inter-rater reliability assessment was not performed. Fourth, grey literature and non-English publications were not systematically addressed. These omissions are consistent with the narrative review methodology chosen, but they preclude full reproducibility of the search.
Item 4: Referencing
Acknowledged Limitations A proportion of included studies are described as 'in press' without complete journal citations, which reduces verifiability for reviewers. Industry-affiliated or vendor-sponsored studies are cited without systematic assessment of funding source as a risk-of-bias dimension — a limitation acknowledged in Section 6 of the manuscript. Some earlier references (Blair & Smith, 2012; Wang et al., 2011) have been superseded by more recent evidence and serve primarily as historical context rather than current evidentiary support.
Score Rationale Score 2: References are substantively balanced, span study types and perspectives, and include both supportive and critical evidence. The limitations (in-press citations, funding source) are formally acknowledged.