A Feature Grouping Method for Ensemble Clustering of High-Dimensional Genomic Big Data
| Field | Value |
|---|---|
| Title | A Feature Grouping Method for Ensemble Clustering of High-Dimensional Genomic Big Data |
| Publication Type | Conference Paper |
| Year of Publication | 2016 |
| Tertiary Authors | Manderick, B |
| Conference Name | IEEE Future Technologies Conference |
| Conference Location | San Francisco, United States |
High-dimensional genomic big data with hundreds of features present a major challenge for cluster analysis. Genomic data are usually noisy and exhibit correlations among features. Moreover, high-dimensional genomic data contain different subspaces. This paper presents a feature selection and grouping method for ensemble clustering of high-dimensional genomic data. Two of the most popular clustering methods, k-means and similarity-based clustering, are used for ensemble clustering. Ensemble clustering is more effective than traditional clustering algorithms at clustering high-dimensional, complex data. We cluster unlabeled genomic data (148 exome data sets) of Brugada syndrome from the Centre of Medical Genetics, VUB UZ Brussel, using the SimpleKMeans, XMeans, DBScan, and MakeDensityBasedCluster algorithms, and compare their results with those of the proposed ensemble clustering method. Furthermore, we apply a biclustering algorithm ($\delta$-Biclustering) to each cluster to find sub-matrices in the genomic data; biclustering groups instances and features simultaneously.
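The abstract does not detail the $\delta$-Biclustering step. Assuming it refers to the $\delta$-biclustering of Cheng and Church, a submatrix with row set $I$ and column set $J$ is accepted as a $\delta$-bicluster when its mean squared residue $H(I,J) = \frac{1}{|I||J|}\sum_{i \in I, j \in J}(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2$ does not exceed $\delta$, where $a_{iJ}$, $a_{Ij}$ and $a_{IJ}$ are the row, column and overall means of the submatrix. The minimal Python sketch below computes only that score. The search over candidate sub-matrices (node deletion and addition) is omitted, and the function names and toy matrix are hypothetical, not taken from the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mean_squared_residue(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Cheng-Church mean squared residue H(I, J) of data restricted to rows x cols.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not rows or not cols:
        raise ValueError("rows and cols must be non-empty")
    # Extract the submatrix a_ij for i in I, j in J.
    sub = [[data[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    # Row means a_iJ, column means a_Ij, and overall mean a_IJ.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    # Average squared residue over all cells of the submatrix.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


def is_delta_bicluster(data: Matrix, rows: Sequence[int], cols: Sequence[int], delta: float) -> bool:
    """A submatrix is a delta-bicluster when H(I, J) <= delta."""
    return mean_squared_residue(data, rows, cols) <= delta


if __name__ == "__main__":
    # Toy instances-by-features matrix (hypothetical values).
    data = [
        [1, 2, 3, 9],
        [4, 5, 6, 0],
        [7, 1, 8, 2],
    ]
    print(mean_squared_residue(data, [0, 1], [0, 1, 2]))     # 0.0
    print(mean_squared_residue(data, [0, 1], [0, 1, 2, 3]))  # 6.75
    print(is_delta_bicluster(data, [0, 1], [0, 1, 2], delta=0.1))  # True
```

A score of 0 means the submatrix follows a purely additive row-plus-column pattern: on columns 0 to 2, row 1 equals row 0 shifted by a constant 3. Adding column 3, which breaks that pattern, raises the residue to 6.75, so that larger submatrix fails a threshold of, say, $\delta = 1$.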