A new-fangled FES-k-means clustering algorithm for disease discovery and visual analytics

Tonny Oyana

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

The central purpose of this study is to further evaluate the performance of a new algorithm. The study provides additional evidence on this algorithm, the Fast, Efficient, and Scalable k-means algorithm (FES-k-means), which was designed to increase the overall efficiency of the original k-means clustering technique. The FES-k-means algorithm uses a hybrid approach that combines the k-d tree data structure, which speeds up the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and once on the actual untrained data, in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, which clustering methods alone do not provide. The benefits of this method are that it produces clusters similar to those of the original k-means method at a much faster rate, as shown by runtime comparison data, and that it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
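
The hybrid idea described in the abstract, a k-d tree answering nearest neighbor queries inside the k-means loop, can be illustrated with a minimal Python sketch. This is not the author's FES-k-means implementation: the function names (`build_kdtree`, `nearest`, `kmeans_kdtree`), the choice to index the centroids rather than the data points, and the random initialization are all assumptions made for illustration. Mashor's adaptation rate is deliberately left out because the abstract does not give its form, so the centroid update below is the plain k-means mean, and the MIL-SOM training step is not shown.

```python
import random
from math import dist


def build_kdtree(items, depth=0):
    """Build a k-d tree from (index, point) pairs as nested tuples."""
    if not items:
        return None
    axis = depth % len(items[0][1])          # cycle through the coordinates
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2                    # median splits the set in two
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Return (distance, index) of the stored point closest to query."""
    if node is None:
        return best
    (idx, point), axis, left, right = node
    d = dist(point, query)
    if best is None or d < best[0]:
        best = (d, idx)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match found so far; otherwise that subtree cannot hold a winner.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def kmeans_kdtree(data, k, iters=50, seed=0):
    """Plain k-means whose assignment step queries a k-d tree of centroids.

    Assumption: standard mean update, no adaptation rate.
    """
    rng = random.Random(seed)
    centroids = rng.sample(data, k)          # hypothetical initialization
    labels = []
    for _ in range(iters):
        tree = build_kdtree(list(enumerate(centroids)))
        new_labels = [nearest(tree, p)[1] for p in data]
        if new_labels == labels:             # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:                      # keep old centroid if empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels
```

Calling `kmeans_kdtree(points, k=2)` on a list of coordinate tuples returns the final centroids and one cluster label per point. Rebuilding the tree over the centroids each iteration is cheap when k is small relative to the number of points; whether the published algorithm indexes centroids or data points is not stated in the abstract.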

Original language: English (US)
Article number: 746021
Journal: Eurasip Journal on Bioinformatics and Systems Biology
Volume: 2010
DOI: 10.1155/2010/746021
State: Published - Aug 12 2010

Fingerprint

Visual Analytics
K-means Algorithm
K-means Clustering
Clustering algorithms
Clustering Algorithm
Cluster Analysis
Rate Adaptation
Evaluate
Large Data
K-means
Knowledge Discovery
Hybrid Approach
Tree Structure
Data mining
Clustering Methods
Blood
Nearest Neighbor
Data Structures
Data Mining
Clustering

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology (all)
  • Computer Science Applications
  • Computational Mathematics

Cite this

A new-fangled FES-k-means clustering algorithm for disease discovery and visual analytics. / Oyana, Tonny.

In: Eurasip Journal on Bioinformatics and Systems Biology, Vol. 2010, 746021, 12.08.2010.

@article{8ce07b9ac1eb4d7bb2d0f5f892deabd2,
title = "A new-fangled {FES}-k-means clustering algorithm for disease discovery and visual analytics",
abstract = "The central purpose of this study is to further evaluate the performance of a new algorithm. The study provides additional evidence on this algorithm, the Fast, Efficient, and Scalable k-means algorithm (FES-k-means), which was designed to increase the overall efficiency of the original k-means clustering technique. The FES-k-means algorithm uses a hybrid approach that combines the k-d tree data structure, which speeds up the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and once on the actual untrained data, in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, which clustering methods alone do not provide. The benefits of this method are that it produces clusters similar to those of the original k-means method at a much faster rate, as shown by runtime comparison data, and that it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.",
author = "Tonny Oyana",
year = "2010",
month = "8",
day = "12",
doi = "10.1155/2010/746021",
language = "English (US)",
volume = "2010",
journal = "Eurasip Journal on Bioinformatics and Systems Biology",
issn = "1687-4145",
publisher = "Springer Publishing Company",

}

TY - JOUR

T1 - A new-fangled FES-k-means clustering algorithm for disease discovery and visual analytics

AU - Oyana, Tonny

PY - 2010/8/12

Y1 - 2010/8/12

N2 - The central purpose of this study is to further evaluate the performance of a new algorithm. The study provides additional evidence on this algorithm, the Fast, Efficient, and Scalable k-means algorithm (FES-k-means), which was designed to increase the overall efficiency of the original k-means clustering technique. The FES-k-means algorithm uses a hybrid approach that combines the k-d tree data structure, which speeds up the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and once on the actual untrained data, in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, which clustering methods alone do not provide. The benefits of this method are that it produces clusters similar to those of the original k-means method at a much faster rate, as shown by runtime comparison data, and that it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.

UR - http://www.scopus.com/inward/record.url?scp=77955324518&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955324518&partnerID=8YFLogxK

U2 - 10.1155/2010/746021

DO - 10.1155/2010/746021

M3 - Article

VL - 2010

JO - Eurasip Journal on Bioinformatics and Systems Biology

JF - Eurasip Journal on Bioinformatics and Systems Biology

SN - 1687-4145

M1 - 746021

ER -