Database Normalization and Performance Prediction Using Linear Regression on a Student Dataset
DOI:
https://doi.org/10.63671/ijsssr.v3i4.542
Keywords:
Database Normalization, Linear Regression, Feature Engineering, Educational Data Mining, Student Performance Prediction
Abstract
This paper presents a practical integration of database management systems (DBMS) and machine learning to forecast students' academic performance. The Student Performance Dataset, which comprises 1000 records, is first structured and normalized in SQL up to Third Normal Form (3NF) to remove redundancy and guarantee referential integrity. After normalization, the tables are merged and feature engineering is applied, including one-hot encoding of categorical attributes and generation of the target variable (mean score). The engineered features are then used in a linear regression model to predict the students' average scores. The dataset is split into training and test sets (80:20 ratio), and model performance is evaluated using MAE, RMSE, and R². The findings demonstrate a moderate degree of predictive power, suggesting that academic performance is influenced by variables such as exam preparation courses, parents' educational attainment, and lunch type. This research shows how structured database design can efficiently support machine learning workflows in a real-world setting.
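The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the table and column names (`prep_course`, `student`, `prep_completed`, `mean_score`) are hypothetical, SQLite stands in for whichever DBMS was actually used, and a single binary indicator replaces the full one-hot encoded feature set so that ordinary least squares can be fitted with the standard library alone.

```python
import math
import random
import sqlite3

random.seed(0)  # reproducible SYNTHETIC data; not the paper's dataset

# Hypothetical normalized schema: a lookup table plus a fact table that
# references it through a foreign key (referential integrity).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE prep_course (
    prep_id INTEGER PRIMARY KEY,
    name    TEXT UNIQUE NOT NULL
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    prep_id    INTEGER NOT NULL REFERENCES prep_course(prep_id),
    math REAL, reading REAL, writing REAL
);
""")
conn.executemany("INSERT INTO prep_course VALUES (?, ?)",
                 [(1, "none"), (2, "completed")])

rows = []
for sid in range(1, 1001):  # 1000 synthetic students
    prep_id = random.choice([1, 2])
    base = 65 + (5 if prep_id == 2 else 0)  # assumed effect, for illustration
    scores = [min(100.0, max(0.0, random.gauss(base, 10))) for _ in range(3)]
    rows.append((sid, prep_id, *scores))
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", rows)

# Merge the tables, build a 0/1 indicator, and generate the target (mean score).
data = conn.execute("""
SELECT (p.name = 'completed')                 AS prep_completed,
       (s.math + s.reading + s.writing) / 3.0 AS mean_score
FROM student AS s JOIN prep_course AS p USING (prep_id)
""").fetchall()

# 80:20 train/test split.
random.shuffle(data)
cut = len(data) * 8 // 10
train, test = data[:cut], data[cut:]

# Ordinary least squares with one feature (closed form).
n = len(train)
x_bar = sum(x for x, _ in train) / n
y_bar = sum(y for _, y in train) / n
sxx = sum((x - x_bar) ** 2 for x, _ in train)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Evaluate on the held-out test set.
errors = [y - (intercept + slope * x) for x, y in test]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
test_mean = sum(y for _, y in test) / len(test)
ss_tot = sum((y - test_mean) ** 2 for _, y in test)
r2 = 1 - sum(e * e for e in errors) / ss_tot

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```

With several categorical attributes, each category would get its own indicator column and the model would become a multiple regression, which is more conveniently fitted with a numerical library than by hand.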
License
Copyright (c) 2026 International Journal of Science and Social Science Research

This work is licensed under a Creative Commons Attribution 4.0 International License.
