Optimizing Clustering Performance: A Novel Integration of Whale Optimization Algorithm and K-NN Validation in Data Mining Analytics
DOI: https://doi.org/10.36596/jitu.v9i1.1808

Keywords: Whale Optimization Algorithm, K-Means, K-Nearest Neighbors, Davies-Bouldin Index, Accuracy

Abstract
The massive data volumes of the digital era demand effective clustering, a machine learning technique that groups data by similarity. Clustering large, complex datasets faces challenges such as volume, dimensionality, and variability, which hinder algorithms like K-Means. A key weakness of K-Means is its sensitivity to the choice of initial centroids, which can strongly affect the final result. This research aims to optimize clustering performance by integrating the Whale Optimization Algorithm (WOA) to determine better initial centroids for K-Means, and K-Nearest Neighbors (K-NN) to validate the resulting cluster quality through classification accuracy. Evaluation on the iris, wine, heart, lung, and liver datasets using the Davies-Bouldin Index (DBI) showed that WOA-KMeans consistently yielded lower DBI values than standard K-Means, indicating superior clustering. Notably, the DBI for the lung dataset dropped from 2.38016 to 0.65395. Furthermore, K-NN classification using the generated cluster labels achieved high accuracy (98-99% across datasets), confirming that the clusters are well separated and internally homogeneous. This demonstrates WOA's effectiveness in guiding K-Means toward better solutions and K-NN's utility in validating cluster distinctiveness. The combination of WOA-KMeans with K-NN validation thus offers a more accurate and robust clustering method, and the consistent performance improvements across diverse datasets highlight its potential for data exploration and pattern discovery in complex data mining tasks.
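The pipeline the abstract describes can be sketched in code. The following is a minimal illustration, not the authors' implementation: a plain WOA searches centroid space for a seed that minimizes DBI, K-Means refines that seed, and a cross-validated K-NN classifier trained on the resulting cluster labels serves as the validation step. The dataset (iris), population size, iteration count, and neighbor count are illustrative assumptions, and scikit-learn and NumPy are assumed available.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = load_iris().data
k, dim = 3, X.shape[1]

def fitness(flat):
    """DBI of the nearest-centroid assignment (lower is better)."""
    centroids = flat.reshape(k, dim)
    labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    if len(np.unique(labels)) < k:                  # penalize empty clusters
        return np.inf
    return davies_bouldin_score(X, labels)

def woa_minimize(f, lower, upper, n_whales=15, n_iter=40):
    """Basic Whale Optimization Algorithm (Mirjalili & Lewis, 2016)."""
    pop = rng.uniform(lower, upper, size=(n_whales, lower.size))
    best = min(pop, key=f).copy()
    best_score = f(best)
    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                      # decreases linearly 2 -> 0
        for i in range(n_whales):
            r = rng.random(lower.size)
            A, C = 2 * a * r - a, 2 * rng.random(lower.size)
            if rng.random() < 0.5:
                # encircle the best whale, or a random one for exploration
                target = best if np.all(np.abs(A) < 1) else pop[rng.integers(n_whales)]
                pop[i] = target - A * np.abs(C * target - pop[i])
            else:                                   # spiral bubble-net move
                l = rng.uniform(-1, 1)
                pop[i] = np.abs(best - pop[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            pop[i] = np.clip(pop[i], lower, upper)
            s = f(pop[i])
            if s < best_score:
                best, best_score = pop[i].copy(), s
    return best

# WOA searches centroid space; K-Means then refines the seed it found.
lower, upper = np.tile(X.min(0), k), np.tile(X.max(0), k)
seed = woa_minimize(fitness, lower, upper).reshape(k, dim)
km_woa = KMeans(n_clusters=k, init=seed, n_init=1).fit(X)
km_std = KMeans(n_clusters=k, init="random", n_init=1, random_state=0).fit(X)
dbi_woa = davies_bouldin_score(X, km_woa.labels_)
dbi_std = davies_bouldin_score(X, km_std.labels_)

# K-NN validation: high cross-validated accuracy on the cluster labels
# indicates well-separated, internally homogeneous clusters.
acc = cross_val_score(KNeighborsClassifier(5), X, km_woa.labels_, cv=5).mean()
print(f"DBI standard={dbi_std:.4f}  DBI WOA={dbi_woa:.4f}  K-NN acc={acc:.3f}")
```

Optimizing DBI directly in the WOA fitness ties the search to the same criterion used for evaluation; the paper's exact fitness function may differ.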
Copyright (c) 2025 JITU : Journal Informatic Technology And Communication

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.