Ensemble of trees for classifying high-dimensional imbalanced genomic data

Title: Ensemble of trees for classifying high-dimensional imbalanced genomic data
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Farid, DMd.
Editor: Nowe, A
Tertiary Authors: Manderick, B
Conference Name: IEEE SAI Intelligent Systems Conference (IntelliSys 2016)
Date Published: 09/2016
Conference Location: London, UK

In bioinformatics, machine learning for data mining aims to extract new knowledge that supports an improved and more effective diagnosis process for patients. In this paper, we introduce an adaptive ensemble learning method for classifying high-dimensional, multi-class, imbalanced genomic data. The aim is to design and develop an optimal ensemble method for information discovery on genomic data that improves the prediction accuracy of DNA variant classification. The proposed method is based on an ensemble of decision trees combined with data pre-processing, feature selection and feature grouping. It converts an imbalanced genomic dataset into multiple balanced datasets and then builds a number of decision trees on these datasets, each using a specific feature group. The outputs of these trees are combined by majority voting to classify new instances. In this empirical study, ensemble predictive modelling techniques such as Random Forest, Boosting and Bagging were compared with the proposed ensemble method. The experimental results on genomic data (148 exome datasets) of Brugada syndrome from the Centre of Medical Genetics, VUB UZ Brussel show that the proposed method is usually superior to the conventional ensemble learning algorithms when classifying high-dimensional, multi-class, imbalanced genomic data.
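The pipeline described in the abstract (balanced datasets, feature groups, one tree per dataset and group, majority vote) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how the balanced datasets or feature groups are formed, so the sketch assumes random undersampling of every class to the minority-class size, a random partition of the features into groups, one small Gini-based tree for every pairing of subset and group, and synthetic data. All function names and parameters below are hypothetical.

```python
import random
from collections import Counter

def balanced_subsets(X, y, n_subsets, rng):
    """Assumption: undersample every class to the minority-class size."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    m = min(len(idx) for idx in by_class.values())
    return [[i for label in sorted(by_class)
             for i in rng.sample(by_class[label], m)]
            for _ in range(n_subsets)]

def feature_groups(n_features, n_groups, rng):
    """Assumption: a random partition of feature indices into groups."""
    feats = list(range(n_features))
    rng.shuffle(feats)
    return [sorted(feats[g::n_groups]) for g in range(n_groups)]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rows, feats, depth=3):
    """Depth-limited tree; a leaf is a label, a node is (feature, threshold, left, right)."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in feats:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree(X, y, left, feats, depth - 1),
            build_tree(X, y, right, feats, depth - 1))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_ensemble(X, y, n_subsets=5, n_groups=3, seed=0):
    """Assumption: one tree per (balanced subset, feature group) pair."""
    rng = random.Random(seed)
    groups = feature_groups(len(X[0]), n_groups, rng)
    return [build_tree(X, y, rows, feats)
            for rows in balanced_subsets(X, y, n_subsets, rng)
            for feats in groups]

def predict(trees, x):
    """Combine the trees' outputs by majority vote."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]

# Synthetic, imbalanced three-class data (40 / 10 / 5 instances, 6 features).
data_rng = random.Random(42)
y = [0] * 40 + [1] * 10 + [2] * 5
X = [[data_rng.gauss(3.0 * label, 0.5) for _ in range(6)] for label in y]

trees = fit_ensemble(X, y)
print(len(trees), [predict(trees, [c] * 6) for c in (0.0, 3.0, 6.0)])
```

Because every tree is trained on a class-balanced subset, the minority classes carry the same weight as the majority class in each vote. Real genomic data would additionally need the pre-processing and feature selection steps mentioned in the abstract, which this sketch omits.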