Unbalanced data classification Using genetic programming

Kumar, Arvind

Please use this identifier to cite or link to this item: http://lrcdrs.bennett.edu.in:80/handle/123456789/2016

Title:	Unbalanced data classification Using genetic programming
Authors:	Kumar, Arvind
Keywords:	Computer Science; Computer Science, Software Engineering
Issue Date:	Jun-2022
Publisher:	Bennett University
Abstract:	In many real-world classification applications, such as medical diagnosis, fraud detection, bioinformatics, or fault diagnostics, it is common for one class (the minority class) to have only a limited number of training instances, while the other class (the majority class) accounts for the rest. Such data sets are called unbalanced. In data classification, machine learning (ML) methods can suffer from a performance bias when the data set is unbalanced: the trained classifiers may have good accuracy on the majority class but lower accuracy on the minority class. Genetic Programming (GP) is a promising machine learning method, based on the Darwinian theory of evolution, that automatically evolves computer programs to solve problems without any domain-specific knowledge. Although GP has shown much success in developing reliable and precise classifiers for typical classification tasks, GP, like many other ML algorithms, can produce biased classifiers when the data is unbalanced. This bias arises because traditional training criteria, such as the overall success rate used in the GP fitness function, can be dominated by the larger number of instances from the majority class. This research focuses on algorithmic methods, assuming that the whole training data is important and valuable and that no data sample should be removed from the training process. The second consideration in this work is that the proposed methods should be problem-independent and should not require any a priori domain-specific or expert knowledge. Thus, this research focuses on developing GP-based approaches for unbalanced data set classification that are based on internal cost adjustment in the GP fitness function and that allow the unbalanced data set to be used “as is” in the training process.
This research work demonstrates that, by designing various methods in GP, we can evolve classifiers with good classification performance on both the majority and the minority classes. The developed methods are evaluated on publicly available, UCI-based binary benchmark classification problems with varying levels of class imbalance.
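The bias described in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the function names, the 95/5 class split, and the equal class weighting are assumptions chosen for illustration only. It compares the overall success rate with a fitness that averages per-class accuracy, for a degenerate classifier that always predicts the majority class.

```python
from typing import Sequence


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all instances classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Fraction of the instances of `label` that are classified correctly.

    Assumes at least one instance of `label` is present in y_true.
    """
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


def weighted_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                     minority_weight: float = 0.5) -> float:
    """Hypothetical fitness: weighted mean of minority and majority accuracy.

    Label 1 is taken to be the minority class and label 0 the majority class
    (an assumption of this sketch, not a convention from the thesis).
    """
    minority_acc = per_class_accuracy(y_true, y_pred, 1)
    majority_acc = per_class_accuracy(y_true, y_pred, 0)
    return minority_weight * minority_acc + (1.0 - minority_weight) * majority_acc


# Illustrative data: 95 majority instances, 5 minority instances.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that ignores the minority class entirely.
always_majority = [0] * 100

biased_score = overall_accuracy(y_true, always_majority)
balanced_score = weighted_fitness(y_true, always_majority)
print(biased_score, balanced_score)
```

Under these assumptions the overall success rate rewards the degenerate classifier with a high score, while the class-averaged fitness does not, and all training instances are still used unchanged.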
URI:	http://lrcdrs.bennett.edu.in:80/handle/123456789/2016
Appears in Collections:	School of Computer Science Engineering and Technology (SCSET)

Files in This Item:

File	Description	Size	Format
PhD-Thesis-E18SOE822-Arvind-Final-signed.pdf		1.8 MB	Adobe PDF
