Optimizing Clustering Performance: A Novel Integration of Whale Optimization Algorithm and K-NN Validation in Data Mining Analytics
DOI: https://doi.org/10.36596/jitu.v9i1.1808

Keywords: Whale Optimization Algorithm, K-Means, K-Nearest Neighbors, Davies-Bouldin Index, Accuracy

Abstract
The massive data volumes of the digital era demand effective clustering, a machine learning technique that groups data by similarity. Clustering large, complex datasets faces challenges such as volume, dimensionality, and variability, which hinder algorithms like K-Means. A key weakness of K-Means is its sensitivity to the choice of initial centroids, which can strongly affect the final result. This research aims to optimize clustering performance by integrating the Whale Optimization Algorithm (WOA) to determine better initial centroids for K-Means, and K-Nearest Neighbors (K-NN) to validate the resulting cluster quality through classification accuracy. Evaluation on the iris, wine, heart, lung, and liver datasets using the Davies-Bouldin Index (DBI) showed that WOA-KMeans consistently yielded lower DBI values than standard K-Means, indicating superior clustering. Notably, the DBI for the lung dataset dropped from 2.38016 to 0.65395. Furthermore, K-NN classification using the generated cluster labels achieved high accuracy (98-99% across datasets), confirming that the clusters are well separated and internally homogeneous. This demonstrates WOA's effectiveness in guiding K-Means toward better solutions and K-NN's utility in validating cluster distinctiveness. The combination of WOA-KMeans with K-NN validation thus offers a more accurate and robust clustering method, and the consistent performance improvements across diverse datasets highlight its potential for data exploration and pattern discovery in complex data mining tasks.
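The pipeline the abstract describes can be sketched in code. The following is a minimal illustration, not the authors' implementation: a plain WOA searches centroid space for a seed that minimizes DBI, K-Means refines that seed, and a cross-validated K-NN classifier trained on the resulting cluster labels serves as the validation step. The dataset (iris), population size, iteration count, and neighbor count are illustrative assumptions, and scikit-learn and NumPy are assumed available.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = load_iris().data
k, dim = 3, X.shape[1]

def fitness(flat):
    """DBI of the nearest-centroid assignment (lower is better)."""
    centroids = flat.reshape(k, dim)
    labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    if len(np.unique(labels)) < k:                  # penalize empty clusters
        return np.inf
    return davies_bouldin_score(X, labels)

def woa_minimize(f, lower, upper, n_whales=15, n_iter=40):
    """Basic Whale Optimization Algorithm (Mirjalili & Lewis, 2016)."""
    pop = rng.uniform(lower, upper, size=(n_whales, lower.size))
    best = min(pop, key=f).copy()
    best_score = f(best)
    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                      # decreases linearly 2 -> 0
        for i in range(n_whales):
            r = rng.random(lower.size)
            A, C = 2 * a * r - a, 2 * rng.random(lower.size)
            if rng.random() < 0.5:
                # encircle the best whale, or a random one for exploration
                target = best if np.all(np.abs(A) < 1) else pop[rng.integers(n_whales)]
                pop[i] = target - A * np.abs(C * target - pop[i])
            else:                                   # spiral bubble-net move
                l = rng.uniform(-1, 1)
                pop[i] = np.abs(best - pop[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            pop[i] = np.clip(pop[i], lower, upper)
            s = f(pop[i])
            if s < best_score:
                best, best_score = pop[i].copy(), s
    return best

# WOA searches centroid space; K-Means then refines the seed it found.
lower, upper = np.tile(X.min(0), k), np.tile(X.max(0), k)
seed = woa_minimize(fitness, lower, upper).reshape(k, dim)
km_woa = KMeans(n_clusters=k, init=seed, n_init=1).fit(X)
km_std = KMeans(n_clusters=k, init="random", n_init=1, random_state=0).fit(X)
dbi_woa = davies_bouldin_score(X, km_woa.labels_)
dbi_std = davies_bouldin_score(X, km_std.labels_)

# K-NN validation: high cross-validated accuracy on the cluster labels
# indicates well-separated, internally homogeneous clusters.
acc = cross_val_score(KNeighborsClassifier(5), X, km_woa.labels_, cv=5).mean()
print(f"DBI standard={dbi_std:.4f}  DBI WOA={dbi_woa:.4f}  K-NN acc={acc:.3f}")
```

Optimizing DBI directly in the WOA fitness ties the search to the same criterion used for evaluation; the paper's exact fitness function may differ.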
Copyright (c) 2025 JITU : Journal Informatic Technology And Communication

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.