This collection of files contains the raw data and R script for the paper entitled 'The UK Research Excellence Framework and the Matthew Effect: Insights from machine learning.'

Data:

The file balbuena_REF_2014.dta is a Stata 12 file that contains the data for the paper. It is also saved in CSV format as Balbuena_REF_2014.csv.

The file balbuena_ref_syntax.R contains the syntax used to run the analysis.

The file balbuena_REF.RData contains the data objects in RData format.

See the instructions below if you wish to replicate the analysis. To view the data in Excel without re-running the analysis, open the Excel file “REF_2014_schools.xls”, which lists the universities in the training set (n = 79) and testing set (n = 30).
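
For readers who only want to inspect the data in R, the following is a minimal sketch of one way to load the CSV copy and the saved workspace. It is not taken from balbuena_ref_syntax.R. The helper name load_ref_csv and its data_dir argument are hypothetical, and the sketch assumes the files sit in the current working directory under the names listed above.

```r
# Hypothetical helper (not part of the original scripts): read the CSV copy
# of the data from `data_dir`, or return NULL if the file is not there.
load_ref_csv <- function(data_dir = ".") {
  csv_path <- file.path(data_dir, "Balbuena_REF_2014.csv")
  if (!file.exists(csv_path)) {
    message("File not found: ", csv_path)
    return(NULL)
  }
  read.csv(csv_path, stringsAsFactors = FALSE)
}

# Assumption: the files are in the current working directory.
ref_data <- load_ref_csv()
if (!is.null(ref_data)) {
  str(ref_data)  # inspect column names and types
}

# Alternatively, restore the saved data objects; load() returns the names
# of the objects it restores into the current environment.
rdata_path <- "balbuena_REF.RData"
if (file.exists(rdata_path)) {
  restored <- load(rdata_path)
  print(restored)
}
```

Both steps are guarded by file.exists(), so the sketch does nothing harmful if a file is missing or named with different letter case on a case-sensitive file system.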