Perfecting a Video Game with Game Metrics
Vol 19, No 5: October 2021

Large scale data analysis using MLlib

Ahmed Hussein Ali (Informatics Institute for Postgraduate Studies)
Maan Nawaf Abbod (Imam Aadham University College)
Mohammed Khamees Khaleel (AL-Iraqia University)
Mostafa Abdulghafoor Mohammed (Imam Aadham University College)
Tole Sutikno (Universitas Ahmad Dahlan)



Article Info

Publish Date
01 Oct 2021

Abstract

Recent advancements in the internet, social media, and internet of things (IoT) devices have significantly increased the amount of data generated in a variety of formats. The data must be converted into formats that is easily handled by the data analysis techniques. It is mathematically and physically expensive to apply machine learning algorithms to big and complicated data sets. It is a resource-intensive process that necessitates a huge amount of logical and physical resources. Machine learning is a sophisticated data analytics technology that has gained in importance as a result of the massive amount of data generated daily that needs to be examined. Apache Spark machine learning library (MLlib) is one of the big data analysis platforms that provides a variety of outstanding functions for various machine learning tasks, spanning from classification to regression and dimension reduction. From a computational standpoint, this research investigated Apache Spark MLlib 2.0 as an open source, autonomous, scalable, and distributed learning library. Several real-world machine learning experiments are carried out in order to evaluate the properties of the platform on a qualitative and quantitative level. Some of the fundamental concepts and approaches for developing a scalable data model in a distributed environment are also discussed.

Copyrights © 2021






Journal Info

Abbrev

TELKOMNIKA

Publisher

Subject

Computer Science & IT

Description

Submitted papers are evaluated by anonymous referees by single blind peer review for contribution, originality, relevance, and presentation. The Editor shall inform you of the results of the review as soon as possible, hopefully in 10 weeks. Please notice that because of the great number of ...