A New Data Balancing Method for Classifying Multi-Class Imbalanced Genomic Data
|Title||A New Data Balancing Method for Classifying Multi-Class Imbalanced Genomic Data|
|Publication Type||Conference Paper|
|Year of Publication||2016|
|Tertiary Authors||Manderick, B|
|Conference Name||25th Belgian-Dutch Conference on Machine Learning (Benelearn)|
|Conference Location||Kortrijk, Belgium|
Classification of multi-class imbalanced genomic data is a difficult task: genomic data are noisy, high-dimensional, and have small sample sizes, which results in overfitting and overlapping classes. Traditional machine learning algorithms classify majority-class instances much more successfully than minority-class instances. Conventional data-balancing methods alter the original data distribution, so they may overfit or discard potentially useful information. In this paper, we propose a new method for handling multi-class imbalanced data that clusters the majority classes and selects their most informative instances.
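The abstract does not name the clustering algorithm or define "most informative", so the following is only a minimal sketch of the general idea of cluster-based undersampling, not the authors' method. It assumes k-means clustering within each majority class, with as many clusters as the target class size. It also assumes that the real sample nearest each centroid is that cluster's most informative (most representative) instance, and that every class is balanced down to the size of the smallest class. The function names `kmeans` and `cluster_undersample` and the `target` parameter are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns k centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct samples from X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def cluster_undersample(X, y, target=None, seed=0):
    """Reduce every class larger than `target` to `target` representatives.

    Assumption: the "most informative" instance of a cluster is taken to be
    the real sample closest to its k-means centroid.
    """
    classes, counts = np.unique(y, return_counts=True)
    if target is None:
        target = counts.min()  # assumption: balance down to the smallest class
    keep = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        if n <= target:
            keep.extend(idx)  # classes at or below target are kept untouched
            continue
        Xc = X[idx]
        centroids = kmeans(Xc, target, seed=seed)
        chosen = set()
        for centre in centroids:
            # Pick the nearest sample to this centroid not already chosen,
            # so exactly `target` distinct real samples are retained.
            order = np.argsort(((Xc - centre) ** 2).sum(axis=1))
            pick = next(i for i in order if i not in chosen)
            chosen.add(pick)
        keep.extend(idx[sorted(chosen)])
    keep = np.array(sorted(keep))
    return X[keep], y[keep]


# Usage on synthetic data: three classes of sizes 100, 40 and 10.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 5)),
               rng.normal(3, 1, (40, 5)),
               rng.normal(-3, 1, (10, 5))])
y = np.array([0] * 100 + [1] * 40 + [2] * 10)
Xb, yb = cluster_undersample(X, y)
print(np.bincount(yb))  # → [10 10 10]
```

Keeping the real sample nearest each centroid, rather than the centroid itself, means no synthetic points are introduced. Because each retained sample stands in for one cluster, the reduced class roughly still covers the regions its original samples occupied, unlike random undersampling.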