Jun 17, 2025

Extended-data for the manuscript: Preclinical Evaluation of Large Language Model-Generated Instructions to Complement Digital Prescriptions in Primary Health Care

  • Zilma Silveira Nogueira Reis¹,
  • Elisa Tuler Albergaria²,
  • Adriana Silvina Pagano¹,
  • Eura Martins Lage¹,
  • Flávia Ribeiro de Oliveira¹,
  • Cristiane dos Santos Dias¹,
  • Juliana Almeida Oliveira¹,
  • Gláucia Miranda Varella Pereira¹,
  • Isaias Jose Ramos de Oliveira¹,
  • Érico Franco Mineiro¹,
  • Davi dos Reis de Jesus²,
  • Antônio Pereira de Souza Júnior²,
  • Igor de Carvalho Gomes³,
  • Rodrigo André Cuevas Gaete³,
  • Ricardo João Cruz-Correia⁴,
  • Leonardo Chaves Dutra da Rocha²
  • ¹ Universidade Federal de Minas Gerais, Brazil;
  • ² Universidade Federal de São João del Rei, Brazil;
  • ³ Ministry of Health, Brazil;
  • ⁴ Universidade do Porto, Portugal
Protocol Citation: Zilma Silveira Nogueira Reis, Elisa Tuler Albergaria, Adriana Silvina Pagano, Eura Martins Lage, Flávia Ribeiro de Oliveira, Cristiane dos Santos Dias, Juliana Almeida Oliveira, Gláucia Miranda Varella Pereira, Isaias Jose Ramos de Oliveira, Érico Franco Mineiro, Davi dos Reis de Jesus, Antônio Pereira de Souza Júnior, Igor de Carvalho Gomes, Rodrigo André Cuevas Gaete, Ricardo João Cruz-Correia, Leonardo Chaves Dutra da Rocha 2025. Extended-data for the manuscript: Preclinical Evaluation of Large Language Model-Generated Instructions to Complement Digital Prescriptions in Primary Health Care. protocols.io https://dx.doi.org/10.17504/protocols.io.dm6gpq751lzp/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol, and it is working, to evaluate the performance of Large Language Models (LLMs) in generating instructions to complement drug prescriptions in Primary Health Care (PHC).
Created: June 16, 2025
Last Modified: June 17, 2025
Protocol Integer ID: 220361
Keywords: generated instructions to complement digital prescription, medication use instruction, using drug package insert, complement digital prescription, utilizing prescription, drug package insert, preclinical evaluation of large language model, error rates in the instruction, layout similar to the electronic health record, generated instruction, electronic health record, retrieval augmented generation, large language model, preclinical evaluation, inducing scenario, evaluation
Funders Acknowledgements:
Ministry of Health, Brazil
Grant ID: DECIT/CNPq nº 00758/2024-5
Gates Foundation
Grant ID: INV-081397
Disclaimer
The published extended data have not yet been peer-reviewed
Abstract
This is a randomized, blinded experimental study in which prescription-inducing scenarios were assigned to 62 healthcare professionals to validate instructions generated by Large Language Models (LLMs). A simulation environment was developed with a layout similar to the electronic health record (EHR) of the Unified Health System in Brazil. Medication use instructions were generated by three models: ChatGPT-4.0, Llama3.1-8B, and Llama3.1-8B enhanced with Retrieval Augmented Generation (RAG) using drug package inserts (Llama3.1-8B-RAG). Performance metrics, including appropriateness, completeness, clarity, personalization, utility, and error rates in the instructions, were analyzed both globally and for each dimension.
Supplementary Table 1. Data collection form on healthcare professionals invited to the model validation process




Questions | Response
Municipality where you work (and State) | List
Profession | Primary Health Care Nurse; Primary Health Care Dentist; Primary Health Care Doctor
Year of graduation in the above profession | Year
Sex/gender | Free text
Since when have you been using e-SUS APS (year you started using it)? | Year
What are your concerns about the introduction of artificial intelligence in healthcare (check all that apply)? | Patient safety; Discrimination against minorities or disadvantaged people; Impact on communication between humans; None; Other.
In your opinion, what are the potential disadvantages or risks of using artificial intelligence to support communication between the prescribing professional and the health system user? | Free text
In your opinion, what are the potential advantages of using artificial intelligence to support communication between the prescribing professional and the health system user? | Free text
PHC: Primary Health Care.


Supplementary Table 2a. Simulated prescription-inducing scenarios used in the study validation environment



Profile of the targeted prescriber | Context to stimulate prescribing in the test environment | Age (years) | Gender | Health condition | Preparation | Continuous use
Dentist | You see a 43-year-old man at a Dental Specialty Center. You identify a gum abscess and begin treatment. Prescription task: Prescribe an analgesic medication to relieve pain, with clear instructions on how the person should use the medication. | 43 | M | Gum abscess | Tablet | No
Dentist | You are treating a 29-year-old woman presenting with pain and swelling following endodontic treatment. She is already using common analgesics. Prescription task: Prescribe an anti-inflammatory medication with clear instructions for use. | 29 | F | Endodontic edema | Tablet | No
Dentist | You see a 62-year-old man. The diagnosis is acute periapical abscess. There is no history of allergies or comorbidities. Prescription task: Prescribe an antibiotic, with clear instructions for use. | 62 | M | Periapical abscess | Tablet | No
Primary Health Care Nurse | A young woman complained of discharge and a fishy vaginal odor. Prescription task: Following the vaginal discharge treatment protocol, prescribe metronidazole, with clear instructions on how the person should use the medication. | | F | Vaginal discharge | Tablet | No
Primary Health Care Nurse | A 6-month-old baby is brought to the childcare clinic with iron deficiency anemia. Prescription task: Prescribe treatment according to the Primary Health Care protocol, with clear instructions for use. | 0.5 | | Anemia | Liquid | No
Primary Health Care Nurse | A 42-year-old man seeks consultation at the Primary Health Care Center with suspected dengue fever (fever, headache, retroocular pain and myalgia). Prescription task: Prescribe an antipyretic in accordance with Primary Health Care protocols, providing clear instructions for use of the medication. | 42 | M | Dengue | Tablet | No
Primary Health Care Doctor | 25-year-old male with mild asthma. Prescription task: Prescribe a bronchodilator, with clear instructions on how to use the medication. | 25 | M | Asthma | Inhalers | Yes
Primary Health Care Doctor | Male, 28 years old, diabetes mellitus. Prescription task: Prescribe NPH insulin, with clear instructions for use. | 28 | M | Diabetes Mellitus | Injection | Yes
Primary Health Care Doctor | Female, 18 years old, with mild atopic dermatitis. Prescription task: Prescribe a topical corticosteroid, with clear instructions for use. | 18 | F | Atopic dermatitis | Ointments | No
Primary Health Care Doctor | Male, 25 years old, with bacterial conjunctivitis. Prescription task: Prescribe antibiotic eye drops, with clear instructions for use. | 25 | M | Conjunctivitis | Drops | No
Primary Health Care Doctor | Male, 26 years old, with otitis externa. Prescription task: Prescribe a topical antibiotic, with clear instructions on how the person should use the medication. | 26 | M | Otitis | Orifice drops | No
Primary Health Care Doctor | Child, 10 years old, with head lice. Prescription task: Prescribe topical treatment, with clear instructions on how to use the medication. | 10 | | Pediculosis | Lotion | No
Primary Health Care Doctor | Pregnant woman, 27 years old, gestational age 28 weeks. Complains of vulvar itching and white, lumpy vaginal discharge. Prescription task: Prescribe vaginal cream, with clear instructions on how to use the medication. | 27 | F | Vaginal discharge | Vaginal cream | No
Primary Health Care Doctor | Female, 40 years old, with altered laboratory findings of glycated hemoglobin (HbA1c < 7.5%) and fasting blood glucose, with a diagnosis of uncomplicated type 2 diabetes mellitus. She reports difficulty improving lifestyle habits and has made several attempts in the past. Prescription task: Prescribe an oral hypoglycemic agent with clear instructions for use. | 40 | F | Diabetes Mellitus | Tablet | Yes
Primary Health Care Doctor | Man, 50 years old, with systemic arterial hypertension and low cardiovascular risk. Prescription task: Prescribe antihypertensive medication, with clear instructions on how the person should use the medication. | 50 | M | Arterial hypertension | Tablet | Yes
Primary Health Care Doctor | Female, 40 years old, diagnosed with anxiety disorder. Prescription task: Prescribe an anxiolytic, with clear instructions for use. | 40 | F | Anxiety disorder | Tablet | Yes
F: Female; M: Male

Supplementary Table 2b. Simulated scenarios prompting prescription, used in the study validation setting
Medication according to the e-SUS PHC system | Option 1 | Option 2 | Option 3
Dipyrone sodium 1 g, tablet | paracetamol 500 mg, tablet | ibuprofen 600 mg, tablet |
Nimesulide 100 mg, tablet | ibuprofen 400 mg, tablet | ketoprofen 150 mg, capsule |
Amoxicillin + potassium clavulanate 500 mg + 125 mg, tablet | amoxicillin 500 mg, capsule | benzathine penicillin 1,200,000 IU, powder for suspension for injection |
Metronidazole 400 mg, tablet | clindamycin phosphate 20 mg/g, cream | metronidazole 100 mg/g, gel |
Ferripolimaltose 50 mg/mL, oral solution | ferrous sulfate (25 mg/mL elemental iron) 125 mg/mL, oral solution | |
Paracetamol 500 mg, tablet | paracetamol 750 mg, tablet | dipyrone sodium 500 mg, tablet | dipyrone sodium 1 g, tablet
Salbutamol sulfate 100 mcg/dose, aerosol | fenoterol hydrobromide 100 mcg/dose, aerosol | |
Human insulin NPH 100 IU/mL, solution for injection | regular human insulin 100 IU/mL, solution for injection | insulin detemir 100 IU/mL, solution for injection | insulin aspart 100 IU/mL, solution for injection
Ciprofloxacin + hydrocortisone 2 + 10 mg/mL, vial | betamethasone dipropionate 0.5 mg/g, cream | clobetasol propionate 0.5 mg/g, cream |
Gentamicin sulfate 5 mg/mL, ophthalmic solution | tobramycin 0.3%, ophthalmic solution | |
Polymyxin B + neomycin + fluocinolone + lidocaine 10,000 IU + 3.5 + 0.25 + 20 mg/mL, otological solution | ciprofloxacin + dexamethasone 3.5 + 1 mg/g, ointment | |
Permethrin 10 mg/mL, shampoo | permethrin 10 mg/mL, lotion | ivermectin 6 mg, tablet |
Miconazole nitrate 2%, vaginal cream | clotrimazole 10 mg/g, cream | nystatin 25,000 IU/g, vaginal cream |
Metformin hydrochloride 500 mg, controlled release tablet | glibenclamide 5 mg, tablet | empagliflozin 25 mg, extended release tablet | glimepiride 2 mg, tablet
Enalapril maleate 20 mg, tablet | amlodipine besylate 5 mg, tablet | losartan potassium 50 mg, tablet |
Clonazepam 2.5 mg/mL, oral solution | clonazepam 2 mg, tablet | diazepam 5 mg, tablet | lorazepam 2 mg, tablet
PHC: Primary Health Care.

Supplementary Table 3. Characterization of prescription-prompting scenarios by variables

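Variable | Category | Statistic
Age (years), median (range) | | 28.5 (16.5)
Gender, n (%) | Female | 6 (37.5)
Gender, n (%) | Male | 8 (50.0)
Gender, n (%) | Not specified for scenario | 2 (12.5)
Indications for treatment (CIAP-2), n | | 16
Active ingredient, n | | 16
Frequency of use, n (%) | Continuous use | 5 (31.3)
Frequency of use, n (%) | Temporary use | 11 (68.7)
Preparation, n (%) | Aerosol | 1 (6.25)
Preparation, n (%) | Tablet | 8 (50.0)
Preparation, n (%) | Cream | 1 (6.25)
Preparation, n (%) | Injectable solution | 1 (6.25)
Preparation, n (%) | Ointment | 1 (6.25)
Preparation, n (%) | Ophthalmic solution | 1 (6.25)
Preparation, n (%) | Oral solution | 1 (6.25)
Preparation, n (%) | Oral suspension | 1 (6.25)
Preparation, n (%) | Otological suspension | 1 (6.25)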
CIAP-2: Classificação Internacional de Atenção Primária, versão 2 (International Classification of Primary Care-2) [https://www.sbmfc.org.br/ciap-2/].

Supplementary Table 4. Resources used to generate instructions for use of the prescribed medication

Input data
The process of automatically generating instructions for medication use begins with the collection of relevant data on the patient and the prescribed medication, following the standards adopted in the e-SUS APS electronic medical record system:
  • First name of the patient
  • Active ingredient of the medicine, concentration
  • Presentation form
  • Dose
  • Route of administration
  • Posology
  • Total prescribed quantity
Requirements for instructions on use of medicines
Based on input and consensus from healthcare experts, nine essential requirements for clear and personalized instructions were defined. They range from the structure and clarity of the instructions to specific aspects such as storage and administration times, and are available on protocols.io: dx.doi.org/10.17504/protocols.io.3byl49k92go5/v1.
Base prompt used in models 1, 2 and 3
1. Specify the name of the medication and the form of presentation using numerical values for doses and concentrations.
2. Use imperative for commands.
3. Provide instructions based on route of administration.
4. Follow the chronological order of the actions the patient must perform.
5. Mention any preparation steps before explaining how to administer the medication.
6. Indicate special storage requirements, if applicable.
7. Specify the frequency, time of day, and whether the medication should be taken with meals or daily events.
8. For long-term treatment, advise the patient to seek medical consultation before completing the prescribed treatment.
9. Instruct the patient to store the medication in a safe place, out of the reach of children, in the original packaging, and not to share it with other people.
Automation procedures used in models 1, 2 and 3
  • To avoid ambiguity in the instructions for use given to the patient, the generated text must be written in the imperative mood.
  • The instructions should reflect the specified route of administration and be arranged in chronological order.
  • To mitigate limitations of LLMs, such as difficulty with simple arithmetic, a table with a suggested medication administration schedule was incorporated to avoid conflicting instructions.
  • The need to renew the prescription was mentioned only when the e-SUS APS electronic medical record indicated an indefinite period of use.
  • A list of medications that interact with alcoholic beverages was prepared for internal lookup by the model; for these medications, a standard warning phrase was included.
  • Requirements common to all prescriptions, such as storage guidelines, were inserted directly into the final model output, reducing the computational burden on the LLM.
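The idea of computing the administration schedule outside the LLM can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the "HH:MM" input format, and the restriction to dosing intervals that divide 24 hours evenly are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def administration_schedule(first_dose: str, interval_hours: int) -> list[str]:
    """Return the clock times of every dose within one 24-hour day.

    Hypothetical helper: `first_dose` is an "HH:MM" string and
    `interval_hours` is assumed to divide 24 evenly (e.g. 6, 8, 12, 24).
    """
    if interval_hours <= 0 or 24 % interval_hours != 0:
        raise ValueError("interval_hours must be a positive divisor of 24")
    start = datetime.strptime(first_dose, "%H:%M")
    doses_per_day = 24 // interval_hours
    return [
        (start + timedelta(hours=i * interval_hours)).strftime("%H:%M")
        for i in range(doses_per_day)
    ]


def schedule_table(first_dose: str, interval_hours: int) -> str:
    """Format the schedule as a small text table to attach to the instructions."""
    lines = ["Dose | Suggested time"]
    for number, time in enumerate(administration_schedule(first_dose, interval_hours), start=1):
        lines.append(f"{number} | {time}")
    return "\n".join(lines)


print(schedule_table("06:00", 8))
```

Because the times are derived deterministically, the generated text never depends on the model adding hours correctly.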
Retrieval-Augmented Generation used in Model 2 (Llama3.1-8b-RAG)
  • The RAG technique allowed the model to access up-to-date information from the drug leaflets published by Anvisa, enriching the context and improving the quality of the generated instructions.
  • To ensure fast access to external content, an up-to-date database was built containing medication leaflets extracted from the Anvisa website using web crawling techniques.
  • To minimize the high computational cost of unrestricted use of full-text drug leaflets from Anvisa and to reduce the likelihood of including irrelevant or overly technical information, an on-demand RAG approach was adopted. This method retrieves only specific excerpts from the leaflet corresponding to the prescribed medication.
  • To achieve adequate performance and avoid information overload in the model, the selection of relevant content was optimized using the cosine distance. The leaflet was divided into vectorized blocks, and the block closest to the phrase "6. How should I use this medicine?" was identified. This phrase appears in a standardized form in most leaflets and introduces Section 6, which generally contains the instructions for use or dosage.
  • To simplify the technical language of drug leaflets, the extracted information was summarized by the model via a specific prompt (i.e., "Extract and summarize dosage"). The output of this step is added to the prompt that requests the generation of instructions.
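The block-selection step described above can be sketched as follows. This is a simplified illustration, not the study's pipeline: the source does not say which embedding model was used, so a plain bag-of-words vector stands in for it, blocks are assumed to be separated by blank lines, and the leaflet text is a made-up English example.

```python
import math
import re
from collections import Counter

ANCHOR_PHRASE = "6. How should I use this medicine?"


def vectorize(text: str) -> Counter:
    """Stand-in vectorizer (assumption): lowercase bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Cosine distance = 1 - cosine similarity between two count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def closest_block(leaflet: str, phrase: str = ANCHOR_PHRASE) -> str:
    """Split the leaflet into blocks and return the one nearest to the phrase."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", leaflet.strip()) if b.strip()]
    target = vectorize(phrase)
    return min(blocks, key=lambda block: cosine_distance(vectorize(block), target))


# Hypothetical leaflet excerpt, for illustration only.
leaflet = """1. What is this medicine used for?
This medicine is indicated for the relief of mild pain and fever.

6. How should I use this medicine?
Swallow one tablet with water every 8 hours, after meals.

7. What should I do if I forget to use this medicine?
Take the missed dose as soon as you remember."""

print(closest_block(leaflet))
```

In a setup like the one described, the selected block would then be passed to the summarization prompt before being added to the instruction-generation prompt.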

Supplementary Table 5. Assessment form used in the model validation process
1. Evaluator confidence
I feel confident, with the necessary knowledge to act as an evaluator of this prescription simulation.
0 -------------------------------------------------------- 100
2. In each prescription with instructions for use generated by the model
For the statements below, choose the option that best represents your opinion:
Questions | Strongly disagree (1 point) | Somewhat disagree (2 points) | Neither agree nor disagree (3 points) | Somewhat agree (4 points) | Strongly agree (5 points)
1. The instructions are accurate and consistent with widely accepted practices in the healthcare field.
2. The instructions contain harmful or incorrect information about the use of medications.
3. The instructions cover the aspects relevant to the correct use of the medication.
4. The instructions do not include sufficient information for the health system user to use the medication correctly.
5. The instructions are clear enough for the health system user to take/use the medication correctly.
6. The instructions present poorly organized information and are difficult for the person (user) to understand.
7. The instructions are written in an accessible and understandable way for the health system user.
8. The instructions provide the person (user) with excessively technical information.
9. The instructions are useful to complement the prescription text I have written.
10. My prescription did not improve with the AI-generated instructions.
Order of questions in the form: 1, 5, 9, 2, 7, 3, 10, 6, 8, 4

3. If you found error(s), mark the type(s) of error detected:


Error types
Instructions may lead to incorrect use of this medicine
Usage instructions are contradictory or vague
There are factual (non-medical) errors
There is information that is not related to the prescription or is completely meaningless (model hallucination)


4. Comments (free text)


Comments


Supplementary Table 6. Frequency of errors found in the instructions for use of medications generated by the models

Error types | ChatGPT-4.0 (n=15) | Llama3.1-8B (n=16) | Llama3.1-8B-RAG (n=15)
Error type 1: Instructions leading to incorrect use of the medication, n, % (95% CI) | 3, 20% (4.3 to 48.1%) | 4, 25% (0.2 to 22.4%) | 2, 13.3% (1.7 to 40.5%)
Error type 2: Contradictory or vague instructions for use, n, % (95% CI) | 0 | 5, 31.3% (11.0 to 58.7%) | 1, 6.7% (0.2 to 32%)
Error type 3: Factual errors, n, % (95% CI) | 0 | 1, 6.3% (0.2 to 30.2%) | 1, 6.7% (0.2 to 32%)
Error type 4: There is information that is not related to the prescription or that is completely meaningless (model hallucination), n, % (95% CI) | 2, 13.3% (1.7 to 40.5%) | 1, 6.3% (0.2 to 30.2%) | 0
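
The source does not state how the 95% confidence intervals were computed. As a hedged illustration, the sketch below computes an exact (Clopper-Pearson) binomial interval using only the standard library; the choice of method and the function names are assumptions. Several tabulated intervals, such as the one for 3 errors in 15 instructions, appear consistent with this method, which is an observation rather than a statement of the authors' procedure.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_root(f, iterations: int = 60) -> float:
    """Find the root of an increasing function f on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval for k events in n trials (assumed method)."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect_root(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


low, high = exact_ci(3, 15)
print(f"3/15: {100 * low:.1f}% to {100 * high:.1f}%")
```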

Supplementary Table 7. Coordinates of the ROC curve with values of the global score achieved by all models, to discriminate errors
Positive if greater than or equal to (a) | Sensitivity | 1 - Specificity
17.800 | 1.000 | 1.000
25.250 | 1.000 | 0.929
35.500 | 1.000 | 0.857
41.100 | 1.000 | 0.786
47.100 | 1.000 | 0.643
56.000 | 1.000 | 0.571
62.950 | 1.000 | 0.500
65.650 | 1.000 | 0.429
68.100 | 0.969 | 0.429
71.000 | 0.969 | 0.357
72.550 | 0.969 | 0.286
74.100 | 0.938 | 0.286
75.450 | 0.906 | 0.214
76.350 | 0.875 | 0.214
77.250 | 0.844 | 0.214
79.900 | 0.813 | 0.143
83.000 | 0.781 | 0.143
84.350 | 0.750 | 0.143
85.050 | 0.719 | 0.071
85.500 | 0.688 | 0.071
86.600 | 0.656 | 0.071
88.400 | 0.625 | 0.071
90.200 | 0.594 | 0.000
91.550 | 0.500 | 0.000
93.300 | 0.406 | 0.000
94.850 | 0.344 | 0.000
95.300 | 0.313 | 0.000
95.750 | 0.281 | 0.000
96.650 | 0.250 | 0.000
97.750 | 0.188 | 0.000
98.450 | 0.156 | 0.000
99.350 | 0.094 | 0.000
101.000 | 0.000 | 0.000
The test result variable (SUS_Weighted) has at least one tie between the positive actual state group and the negative actual state group.
(a) The lowest cutoff value is the minimum observed test value minus 1, and the highest cutoff value is the maximum observed test value plus 1. All other cutoff values are the means of two consecutive ordered observed test values.
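
The cutoff rule in footnote (a) can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the software used in the study: the function name is hypothetical, the scores and labels are toy values, and "consecutive ordered observed test values" is read as consecutive distinct values.

```python
def roc_coordinates(scores: list[float], positive: list[bool]) -> list[tuple[float, float, float]]:
    """Return (cutoff, sensitivity, 1 - specificity) rows.

    A case is classified positive when its score is >= the cutoff.
    Cutoffs: minimum - 1, midpoints of consecutive distinct values, maximum + 1.
    """
    values = sorted(set(scores))
    cutoffs = (
        [values[0] - 1]
        + [(a + b) / 2 for a, b in zip(values, values[1:])]
        + [values[-1] + 1]
    )
    n_pos = sum(positive)
    n_neg = len(positive) - n_pos
    rows = []
    for cutoff in cutoffs:
        true_pos = sum(1 for s, y in zip(scores, positive) if y and s >= cutoff)
        false_pos = sum(1 for s, y in zip(scores, positive) if not y and s >= cutoff)
        rows.append((cutoff, true_pos / n_pos, false_pos / n_neg))
    return rows


# Toy data (hypothetical): four global scores and their actual state.
toy_scores = [20, 40, 60, 80]
toy_state = [False, False, True, True]
for cutoff, sensitivity, fpr in roc_coordinates(toy_scores, toy_state):
    print(f"{cutoff:7.3f} | {sensitivity:.3f} | {fpr:.3f}")
```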

Supplementary Table 8. Assessment of the adequacy, completeness, clarity, personalization and usefulness of the instructions generated by the models, from the perspective of health professionals (n=192)

Dimension | ChatGPT 4.0, median (IQR) | Llama3.1 8B, median (IQR) | Llama3.1 8B-RAG, median (IQR) | Comparisons between models*
Adequacy (weight 2) | 16 (2) | 15 (7.8) | 15 (3) | 0.129
Completeness (weight 1) | 7 (2) | 6 (4.1) | 7 (3) | 0.014**
Clarity (weight 1.5) | 10.5 (1.5) | 10.5 (7.5) | 10.5 (3) | 0.070
Personalization (weight 1) | 8 (1) | 6 (3.4) | 6.5 (3) | 0.126
Usefulness (weight 1.5) | 10.5 (1.5) | 8.6 (5.8) | 9 (3) | 0.177
*Friedman paired test. IQR: Interquartile range.
**Post-hoc test: In the comparison Llama3.1 8B-RAG vs. Llama3.1 8B, p=0.036. In the comparison ChatGPT 4.0 vs. Llama3.1 8B, p=0.011.
RAG: Retrieval Augmented Generation.


Note: No statistically significant differences between models were observed for the dimensions of Adequacy, Clarity, Personalization, and Usefulness. A significant difference was observed in the Completeness dimension, with Llama3.1 8B-RAG outperforming Llama3.1 8B (p = 0.036). ChatGPT 4.0 performed significantly better than Llama3.1 8B (p = 0.011) and showed comparable performance to Llama3.1 8B-RAG.
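
For readers unfamiliar with the Friedman paired test named in the table footnote, the following Python sketch shows how its statistic is formed for three related groups. It is a simplified illustration, not the analysis performed in the study: it uses average ranks without the tie correction that statistical packages usually apply, relies on the asymptotic chi-square approximation (with 2 degrees of freedom the survival function is exp(-Q/2)), and the scores are made up.

```python
from math import exp


def average_ranks(row: list[float]) -> list[float]:
    """Rank the values in one row from 1 (lowest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared_rank
        i = j + 1
    return ranks


def friedman_three_groups(rows: list[list[float]]) -> tuple[float, float]:
    """Friedman statistic Q and approximate p-value for k = 3 related groups.

    No tie correction is applied (simplifying assumption).
    """
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    p_value = exp(-q / 2)  # chi-square survival function for 2 degrees of freedom
    return q, p_value


# Hypothetical scores: each row is one scenario rated under the three models.
toy_rows = [[16, 15, 14], [7, 6, 5], [10, 9, 8], [8, 7, 6]]
q, p = friedman_three_groups(toy_rows)
print(f"Q = {q:.2f}, p = {p:.4f}")
```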