Improving the Energy Efficiency of Edge Devices through Model Pruning and Quantization Optimization
DOI: https://doi.org/10.36596/jitu.v10i1.2324
Keywords: edge computing, green AI, model compression, pruning, quantization
Abstract
Edge computing devices are increasingly tasked with performing artificial intelligence inference under strict constraints on processing capacity and power consumption. This study evaluates magnitude-based weight pruning and dynamic quantization as practical model compression techniques for energy-efficient edge AI deployment. MobileNetV2, pretrained on ImageNet, was adapted to the CIFAR-10 classification task and compressed under three configurations: 40% L1 unstructured pruning followed by recovery fine-tuning (Prune40), dynamic INT8 post-training quantization (QuantINT8), and a sequential combination of both (Prune+Quant). All experiments were executed on a physical Intel N150 mini PC with a thermal design power of 6 watts, using PyTorch 2.1 in CPU-only inference mode. Results show that Prune40 reduced inference latency by 17.9% while simultaneously improving classification accuracy by 1.04 percentage points, attributed to the implicit regularisation effect of sparse weight removal and recovery fine-tuning. QuantINT8 yielded moderate latency savings (6.6%) with negligible accuracy loss. The combined pipeline achieved the lowest absolute latency at a marginal energy overhead. These findings establish magnitude pruning with recovery training as the most effective single-step compression strategy for low-power x86 edge platforms.
References
E. Masanet, A. Shehabi, N. Lei, S. Smith, and J. Koomey, "Recalibrating Global Data Center Energy-Use Estimates," Science, vol. 367, no. 6481, pp. 984-986, 2020. DOI: 10.1126/science.aba3758
D. Patterson et al., "The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink," IEEE Computer, vol. 55, no. 7, pp. 18-28, 2022. DOI: 10.1109/MC.2022.3148714
E. Strubell, A. Ganesh, and A. McCallum, "Energy and Policy Considerations for Deep Learning in NLP," Proc. 57th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 3645-3650, 2019.
R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni, "Green AI," Communications of the ACM, vol. 63, no. 12, pp. 54-63, 2020.
T. Choudhary, V. Mishra, A. Goswami, and J. Sarangapani, "A Comprehensive Survey on Model Compression and Acceleration," Artificial Intelligence Review, vol. 53, no. 7, pp. 5113-5155, 2020. DOI: 10.1007/s10462-020-09816-7
X. Xu et al., "Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models," ACM Computing Surveys, vol. 57, no. 8, pp. 1-42, 2025. DOI: 10.1145/3724420
T. Liang, J. Glossner, L. Wang, S. Shi, and X. Zhang, "Pruning and Quantization for Deep Neural Network Acceleration: A Survey," Neurocomputing, vol. 461, pp. 370-403, 2021. DOI: 10.1016/j.neucom.2021.07.045
S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," Proc. International Conference on Learning Representations (ICLR), 2016.
S. Anwar, K. Hwang, and W. Sung, "Structured Pruning of Deep Convolutional Neural Networks," ACM Journal on Emerging Technologies in Computing Systems, vol. 13, no. 3, art. 32, pp. 1-18, 2017. DOI: 10.1145/3005348
E. Frantar and D. Alistarh, "SparseGPT: Massive Language Models Can be Accurately Pruned in One Shot," Proc. International Conference on Machine Learning (ICML), 2023.
Y. Cheng, D. Wang, P. Zhou, and T. Zhang, "A Survey of Model Compression and Acceleration for Deep Neural Networks," IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 126-136, 2018. DOI: 10.1109/MSP.2017.2765695
B. Jacob et al., "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference," Proc. IEEE CVPR, pp. 2704-2713, 2018.
A. Tschand, A. T. R. Rajan, S. Idgunji, et al., "MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from MicroWatts to MWatts for Sustainable AI," Proc. IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1201-1216, 2025. DOI: 10.1109/HPCA61900.2025.00092
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510-4520, 2018.
A. G. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint, arXiv:1704.04861, 2017.
M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," Proc. International Conference on Machine Learning (ICML), pp. 6105-6114, 2019.
E. J. Husom et al., "Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency," ACM Transactions on Internet of Things, vol. 6, no. 4, art. 28, 2025. DOI: 10.1145/3767742
J. Huang and S. Gopal, "Green AI: A Multidisciplinary Approach to Sustainability," Environmental Science and Ecotechnology, vol. 23, art. 100536, 2025. DOI: 10.1016/j.ese.2025.100536
A. Torralba, R. Fergus, and W. T. Freeman, "80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1958-1970, 2008. DOI: 10.1109/TPAMI.2008.128
J. Deng et al., "ImageNet: A Large-Scale Hierarchical Image Database," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, 2009.
P. Henderson et al., "Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning," Journal of Machine Learning Research (JMLR), vol. 21, no. 248, pp. 1-43, 2020.
S. Laskaridis, T. Kouris, and N. D. Lane, "Melting Point: Mobile Inference of Large Language Models," Proc. ACM MobiSys, pp. 178-191, 2024. DOI: 10.1145/3643832.3661873
T. Dettmers, M. Lewis, Y. Belkada, and L. Zettlemoyer, "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale," Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 30318-30332, 2022.
License
Copyright (c) 2026 JITU : Journal Informatic Technology And Communication

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.