Data

About our Dataset

Where our data comes from

The dataset was collected at two secondary schools in Portugal, Gabriel Pereira and Mousinho da Silveira, using a combination of school records and questionnaires. Students and their parents provided demographic and social information through the questionnaires, while school records documented student grades. The dataset covers 649 students and includes both numeric variables (such as grades and absences) and categorical variables (such as gender, school, and parental status). To protect privacy, student names and other personally identifying details were excluded. The data focus on student performance in Mathematics and Portuguese, two core subjects in Portugal. The dataset was created by Paulo Cortez, a professor at the University of Minho. No specific funding source is mentioned, so it remains unclear whether the research was institutionally or governmentally supported.
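For readers who want to see the split between numeric and categorical variables in practice, here is a minimal sketch in Python. It makes several assumptions that are not stated on this page: that the data is stored as semicolon-separated text, and that the columns are named `school`, `sex`, `Pstatus`, `absences`, and `G3`. The three rows are made up for illustration and are not actual student records, and `split_columns` is a hypothetical helper, not part of any published tooling.

```python
import csv
import io

# Hypothetical rows for illustration only; NOT actual records from the dataset.
# Assumed: semicolon-separated text and these column names.
SAMPLE_CSV = """school;sex;Pstatus;absences;G3
GP;F;T;4;11
GP;M;A;0;14
MS;F;T;2;9
"""


def split_columns(csv_text, delimiter=";"):
    """Return (numeric, categorical) lists of column names from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=delimiter))
    numeric, categorical = [], []
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        # A column counts as numeric only if every value parses as an integer.
        if all(v.lstrip("-").isdigit() for v in values):
            numeric.append(name)
        else:
            categorical.append(name)
    return numeric, categorical


numeric_cols, categorical_cols = split_columns(SAMPLE_CSV)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

One caveat about this approach: a column stored as integers is not necessarily a true quantity, since survey answers are often coded as numbers, so a split like this is only a starting point for reading the data.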

Image: Boy writing on a piece of paper next to a girl. Photo courtesy of Unsplash.
Image: Three-dimensional bar graph against a light blue background. Photo courtesy of Unsplash.

What our data reveals

Data Critique

While the dataset provides valuable insights into student performance, it has several limitations that affect the depth and generalizability of its findings.

Some variables rely on vague categories such as “other,” which obscure what a response actually represents, and gender is recorded only as a binary, limiting representation. The “famsup” variable, which indicates family educational support, does not capture the quality or extent of that support. The dataset also offers only a narrow view of students’ socio-cultural backgrounds and omits key contextual factors, such as whether the schools are publicly or privately funded, student-to-teacher ratios, class sizes, and the number of subjects contributing to final grades. Without this context, findings from these two schools may not generalize to other educational or geographical settings. Finally, the dataset reflects an ideological bias toward quantifying student success through numeric indicators, which reinforces a grade-centric approach. By prioritizing structured data, it overlooks important qualitative factors such as student motivation, mental health, teacher quality, and learning styles.

Despite these shortcomings, the dataset remains valuable for understanding how specific external factors correlate with academic outcomes and can serve as a foundation for further research into student success.

View our data processing methodology