A Feature Grouping Method for Ensemble Clustering of High-Dimensional Genomic Big Data

Title: A Feature Grouping Method for Ensemble Clustering of High-Dimensional Genomic Big Data
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Farid, DMd.
Editor: Nowé, A
Tertiary Authors: Manderick, B
Conference Name: IEEE Future Technologies Conference
Date Published: 12/2016
Conference Location: San Francisco, United States
Abstract

High-dimensional genomic big data with hundreds of features present a major challenge for cluster analysis. Genomic data are usually noisy, their features are often correlated, and different subspaces exist within them. This paper presents a feature selection and grouping method for ensemble clustering of high-dimensional genomic data. Two of the most popular clustering methods, k-means and similarity-based clustering, are used to build the ensemble. Ensemble clustering is more effective at clustering high-dimensional complex data than traditional clustering algorithms. In this paper, we cluster unlabeled genomic data (148 exome data sets) of Brugada syndrome from the Centre of Medical Genetics, VUB UZ Brussel, using the SimpleKMeans, XMeans, DBScan, and MakeDensityBasedCluster algorithms, and compare the clustering results with those of the proposed ensemble clustering method. Furthermore, we apply a biclustering algorithm ($\delta$-Biclustering), which clusters instances and features simultaneously, to each cluster to find sub-matrices in the genomic data.
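
As an illustration of the $\delta$-biclustering criterion mentioned above, the following minimal sketch computes the mean squared residue of a sub-matrix, the score that the standard Cheng and Church formulation compares against the threshold $\delta$. It is not the authors' implementation: the function names `mean_squared_residue` and `is_delta_bicluster`, the toy matrix, and the threshold value are all assumptions made for this example, and the abstract does not state which variant or parameters the paper uses.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the sub-matrix selected by rows and cols.

    Each residue is a_ij - rowMean_i - colMean_j + overallMean, computed
    inside the selected sub-matrix only.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


def is_delta_bicluster(matrix, rows, cols, delta):
    """Hypothetical helper: True when the sub-matrix scores at most delta."""
    return mean_squared_residue(matrix, rows, cols) <= delta


# Toy data (an assumption, not from the paper): rows 0-2 and columns 0-2
# follow an additive pattern, while the remaining entries act as noise.
toy = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [9.0, 1.0, 5.0, 2.0],
]

if __name__ == "__main__":
    coherent = mean_squared_residue(toy, [0, 1, 2], [0, 1, 2])
    whole = mean_squared_residue(toy, [0, 1, 2, 3], [0, 1, 2, 3])
    print("coherent sub-matrix:", coherent)
    print("whole matrix:", whole)
```

A sub-matrix whose rows differ only by a constant offset scores close to zero, while the full noisy matrix scores much higher, so a small $\delta$ accepts the former and rejects the latter.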