A New Data Balancing Method for Classifying Multi-Class Imbalanced Genomic Data

Title: A New Data Balancing Method for Classifying Multi-Class Imbalanced Genomic Data
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Farid, DMd.
Editor: Nowé, A
Tertiary Authors: Manderick, B
Conference Name: 25th Belgian-Dutch Conference on Machine Learning (Benelearn)
Date Published: 09/2016
Conference Location: Kortrijk, Belgium
Abstract

Classification of multi-class imbalanced genomic data is a difficult task: genomic data are noisy and high dimensional with small sample sizes, which leads to overfitting and overlapping classes. Traditional machine learning algorithms are considerably more successful at classifying majority-class instances than minority-class instances. Conventional data balancing methods alter the original data distribution, so they may cause overfitting or discard potentially useful information. In this paper, we propose a new method for dealing with multi-class imbalanced data, based on clustering and selecting the most informative instances from the majority classes.
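
The abstract does not specify the clustering algorithm or how "most informative" is measured, so the following is only an illustrative sketch of the general idea of cluster-based undersampling, not the paper's method. It assumes plain k-means for clustering and treats the instances closest to each cluster centroid as the informative ones; the function names (`balance_by_clustering`, `select_representatives`) and the parameter `k` are hypothetical and chosen for this example.

```python
import random
from collections import Counter, defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centroids):
    """Group points by their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (ASSUMPTION: the paper may use a different clusterer)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Keep the previous centroid when a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, assign(points, centroids)


def select_representatives(rows, target, k, seed):
    """Pick `target` rows, taking those nearest each centroid first.

    ASSUMPTION: closeness to a centroid stands in for "informative".
    """
    k = min(k, target, len(rows))
    centroids, clusters = kmeans(rows, k, seed=seed)
    ranked = [sorted(c, key=lambda p, ctr=ctr: squared_distance(p, ctr))
              for ctr, c in zip(centroids, clusters)]
    chosen, rank = [], 0
    while len(chosen) < target:
        # Round-robin over clusters so every cluster is represented.
        for cluster in ranked:
            if rank < len(cluster) and len(chosen) < target:
                chosen.append(cluster[rank])
        rank += 1
    return chosen


def balance_by_clustering(X, y, k=3, seed=0):
    """Undersample every larger class down to the smallest class size."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(list(features))
    target = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        if len(rows) > target:
            rows = select_representatives(rows, target, k, seed)
        X_bal.extend(rows)
        y_bal.extend([label] * len(rows))
    return X_bal, y_bal


if __name__ == "__main__":
    # Toy data: class 0 has 6 instances, class 1 has 2, class 2 has 3.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0],
         [5.0, 5.1], [10.0, 0.0], [10.1, 0.0], [0.0, 10.0], [0.1, 10.0],
         [0.0, 10.1]]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2]
    X_bal, y_bal = balance_by_clustering(X, y, k=2)
    print(Counter(y_bal))  # every class is reduced to 2 instances
```

Because only existing instances are kept and none are synthesized, the retained rows are always real observations. Whether the centroid-distance criterion matches the selection rule used in the paper is not something the abstract allows one to confirm.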