
1.1       Background to the Study

In any form of higher education it is necessary to predict students’ academic performance, for two reasons: to identify the students who are likely to do well in end-of-semester examinations so that they can be awarded scholarships and, more importantly, to identify the students who may fail those examinations so that some form of remediation can be offered to them (Acharya and Sinha, 2014). Students’ academic performance depends on several factors, including previous academic performance, gender, economic status, family background and performance in mid-semester examinations. Based on these factors, classification models built with machine learning algorithms (MLAs) can be used to predict student results. Researchers have used various classes of MLAs for this purpose, including Decision Trees (DT), Bayesian Networks (BN), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Accordingly, several studies have predicted students’ academic performance in various examinations (Romero and Ventura, 2007), covering higher secondary, undergraduate, postgraduate, engineering and medical courses, in both traditional university and distance learning settings.
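To make the idea of a classification model concrete, the following is a minimal sketch, assuming two numeric attributes on the same 0-100 scale (a previous examination score and a mid-semester score) and a pass/fail label. It uses k-nearest neighbours, one of the techniques this study examines, written in plain Python rather than any particular library. The data values and the names `TRAINING` and `knn_predict` are hypothetical and are not taken from the study.

```python
from collections import Counter
import math

# Hypothetical toy training data (NOT study data):
# (previous exam score %, mid-semester score %) -> final result
TRAINING = [
    ((78, 82), "pass"),
    ((65, 70), "pass"),
    ((88, 90), "pass"),
    ((40, 35), "fail"),
    ((52, 45), "fail"),
    ((35, 50), "fail"),
]


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of the k closest examples and return the most common one
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAINING, (70, 75)))  # pass
print(knn_predict(TRAINING, (45, 40)))  # fail
```

An odd `k` avoids tied votes in a two-class problem. Because Euclidean distance is sensitive to scale, attributes measured in different units would normally need to be standardized before being combined in this way.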

Machine learning emerged in an environment of abundant digital data. Computers can analyze digital data to find patterns and rules in ways that are too complex for a human to carry out. The basic idea of machine learning is that a computer can automatically learn from experience (Mitchell, 1997). Although machine learning applications vary, their general function is similar: the computer analyzes a large amount of data and finds the patterns and rules hidden in it. These patterns and rules are mathematical in nature, so they can be precisely defined and processed by a computer, which can then use them to characterize new data meaningfully. The rules are derived from the data automatically, and they can be refined as new data becomes available.
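The idea of a computer deriving a mathematical rule from data and then applying it to new cases can be sketched with a one-feature threshold rule (a decision stump). This is a minimal illustration under stated assumptions: the scores and labels are hypothetical toy values, and `learn_threshold` and `apply_rule` are illustrative names, not part of any library or of the study's method.

```python
# Hypothetical toy data (NOT study data): mid-semester score (%) and final result
scores = [30, 42, 48, 55, 61, 70, 83]
labels = ["fail", "fail", "fail", "pass", "pass", "pass", "pass"]


def learn_threshold(xs, ys):
    """Return the cut-off t for the rule 'pass if score >= t' with the fewest training errors."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        # Count the examples the candidate rule would misclassify
        errors = sum((x >= t) != (y == "pass") for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def apply_rule(t, score):
    """Use the learned cut-off to characterize a new, unseen score."""
    return "pass" if score >= t else "fail"


t = learn_threshold(scores, labels)
print(t)                  # 55
print(apply_rule(t, 58))  # pass

# A new labelled example arrives; re-learning refines the rule
scores.append(52)
labels.append("pass")
print(learn_threshold(scores, labels))  # 52
```

Re-running `learn_threshold` after adding data moves the cut-off, which mirrors how a learned rule can be refined with new data. The search only considers observed scores as candidate thresholds, a simplification a fuller implementation might relax.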

Applications of machine learning cover a wide range of areas. Search engines use machine learning to better relate search phrases to web pages: by analyzing the content of websites, they can determine which words and phrases best characterize a given page and use this information to return the most relevant results for a search phrase (Witten et al., 2016). Image recognition technologies also use machine learning to identify particular objects in an image, such as faces (Alpaydin, 2004). The algorithm first analyzes images that contain a certain object; given enough such images, it learns to determine whether a new image contains that object (Watt et al., 2016). Machine learning can also be used to infer which products a customer might be interested in: by analyzing the products a user has bought in the past, the computer can suggest new products the customer might want to buy (Witten et al., 2016). All these examples share the same basic principle: the computer processes data, learns to recognize patterns in it, and then uses this knowledge to make decisions about future data. The growth in available data has made these applications more effective, and therefore more widely used.

Depending on the type of input data, machine learning algorithms can be divided into supervised and unsupervised learning. In supervised learning, the input data comes with a known class structure (Mohri et al., 2012); this input data is known as training data. The algorithm is usually tasked with building a model that predicts one property of the data (the class label) from the other properties. Once the model is built, it is used to process new data that has the same structure as the training data. In unsupervised learning, the input data does not have a known class structure, and the task of the algorithm is to reveal structure in the data (Sugiyama, 2015).
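To contrast with supervised learning, the sketch below applies k-means clustering, one common unsupervised technique (the text does not name a specific algorithm), to hypothetical unlabelled examination scores. The function `kmeans_1d` and the data are illustrative assumptions, not the study's method.

```python
# Hypothetical unlabelled data (NOT study data): exam scores with no pass/fail labels
scores = [32, 38, 41, 45, 68, 72, 75, 81]


def kmeans_1d(xs, k=2, iterations=10):
    """Group 1-D values into k clusters using Lloyd's k-means algorithm."""
    # Deterministic initialisation: spread the starting centroids across the sorted data
    ordered = sorted(xs)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = kmeans_1d(scores)
print(centroids)  # [39.0, 74.0]
print(clusters)   # [[32, 38, 41, 45], [68, 72, 75, 81]]
```

The algorithm separates the scores into two groups without ever seeing a pass/fail label. Deciding what each group means (for example, weaker versus stronger students) is left to the analyst, whereas a supervised model is trained on known labels from the outset.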

1.2       Statement of the Problem

In recent years, Educational Data Mining (EDM) has emerged as a new field of research, owing to the development of several statistical approaches for exploring data in educational contexts. One application of EDM is the early prediction of student results. This is necessary in higher education to identify “weak” students so that some form of remediation can be organized for them. In this work, a set of attributes is first defined for a group of undergraduate students majoring in Law.

1.3       Aim and Objectives of the Study

The aim of this study is to predict students’ academic performance using machine learning techniques. The specific objectives are to:

  1. Predict the performance of UTME (Unified Tertiary Matriculation Examination) and Direct Entry (DE) students in the Faculty of Law, Olabisi Onabanjo University.
  2. Compare the performance of Law students based on gender.

1.4       Scope of the Study

The scope of this research is to investigate which machine learning technique is the most effective at predicting the final grade of undergraduate students.
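Deciding which technique is most effective requires measuring each one on data it was not trained on. The following minimal sketch, assuming classification accuracy as the comparison measure, uses leave-one-out evaluation to compare two deliberately simple stand-in classifiers (a majority-class baseline and a 1-nearest-neighbour rule) on hypothetical toy data. It does not implement the Random Forest or K-Nearest Neighbours models used in the study, and all names and values are illustrative.

```python
from collections import Counter

# Hypothetical toy data (NOT study data): (mid-semester score %, final result)
DATA = [(30, "fail"), (42, "fail"), (48, "pass"), (55, "pass"),
        (61, "pass"), (70, "pass"), (38, "fail"), (83, "pass")]


def majority_classifier(train, x):
    """Baseline: always predict the most common label in the training set."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def nearest_neighbour_classifier(train, x):
    """1-NN: predict the label of the closest training score."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def leave_one_out_accuracy(classify, data):
    """Hold out each example in turn, train on the rest, and score the prediction."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += classify(train, x) == y
    return correct / len(data)


for name, clf in [("majority", majority_classifier),
                  ("1-NN", nearest_neighbour_classifier)]:
    print(name, leave_one_out_accuracy(clf, DATA))
```

On this toy data the 1-NN rule scores 0.875 and the baseline 0.625. With real data, the same procedure (or k-fold cross-validation on larger data sets) would be applied to each candidate model, and the model with the higher held-out accuracy would be preferred.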


1.5       Organisation of the Study

The study is divided into five chapters. Chapter one is the introduction, which covers the background to the study, the statement of the problem, the aim and objectives of the study, the scope of the study and the organisation of the study. Chapter two reviews the methods used by various authors and discusses related works in detail. Chapter three describes the methodology of the study and explains the Random Forest and K-Nearest Neighbours algorithms. Chapter four presents the implementation and results and interprets the tables of results. Chapter five presents the discussion and conclusion.
