Six gastric cancer microarray datasets (GSE66229, GSE65801, GSE54129, GSE51575, GSE19826 and GSE13911) were downloaded from the NCBI Gene Expression Omnibus (GEO) database. GSE66229, GSE65801 and GSE54129 were defined as the training cohort, while GSE51575, GSE19826 and GSE13911 served as the independent validation cohort. All probe-level expression data were converted to gene-level expression values using the corresponding GPL platform annotation files: values from multiple probes mapping to the same gene were averaged, and unmapped probes were discarded. Raw expression values were log2-transformed, and cross-sample quantile normalization was performed using the limma package. Systematic batch effects among datasets and platforms were corrected using the SVA and ComBat algorithms. After integration of the genes common to all cohorts, the final unified normalized expression matrix was generated and saved as S2 Normalized_Expression_Matrix. Detailed clinical and grouping annotation for the training and validation cohorts was organized and summarized in S10 experimental cohorts and S11 validation cohorts, respectively.
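The probe-to-gene collapsing, log2 transformation and quantile normalization described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which relied on the limma package in R. The function names, the +1 offset inside the logarithm and the toy data are assumptions made for illustration only. Tied values are not given special treatment, and batch correction is not covered.

```python
import math
from collections import defaultdict


def collapse_probes(probe_expr, probe_to_gene):
    """Average probes that map to the same gene; discard unmapped probes."""
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if not gene:  # probe has no gene annotation: drop it
            continue
        grouped[gene].append(values)
    # Column-wise mean over all probes of a gene (one value per sample).
    return {
        gene: [sum(col) / len(col) for col in zip(*rows)]
        for gene, rows in grouped.items()
    }


def log2_transform(gene_expr, offset=1.0):
    """log2(x + offset); the offset is an assumption to avoid log2(0)."""
    return {
        gene: [math.log2(v + offset) for v in values]
        for gene, values in gene_expr.items()
    }


def quantile_normalize(gene_expr):
    """Give every sample the same distribution (mean of the sorted columns).

    Simplified: ties are broken by position instead of being averaged.
    """
    genes = list(gene_expr)
    n_samples = len(gene_expr[genes[0]])
    columns = [[gene_expr[g][j] for g in genes] for j in range(n_samples)]
    # Mean of the k-th smallest value across samples, for each rank k.
    rank_means = [sum(vals) / n_samples
                  for vals in zip(*(sorted(c) for c in columns))]
    normalized = {g: [0.0] * n_samples for g in genes}
    for j, column in enumerate(columns):
        order = sorted(range(len(genes)), key=lambda i: column[i])
        for rank, i in enumerate(order):
            normalized[genes[i]][j] = rank_means[rank]
    return normalized


# Toy data: four hypothetical probes measured in two samples.
probe_expr = {
    "probe_1": [3.0, 7.0],
    "probe_2": [5.0, 9.0],
    "probe_3": [1.0, 1.0],    # unmapped, will be discarded
    "probe_4": [15.0, 31.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_4": "GENE_B"}

collapsed = collapse_probes(probe_expr, probe_to_gene)
normalized = quantile_normalize(log2_transform(collapsed))
```

After normalization every sample column holds the same set of values, which is the defining property of quantile normalization; only the assignment of those values to genes can differ between samples.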