Sistemasi: Jurnal Sistem Informasi
Vol 11, No 1 (2022): Sistemasi: Jurnal Sistem Informasi

Big Data Infrastructure Design Optimizes Using Hadoop Technologies Based on Application Performance Analysis

Shafiyah Shafiyah (Electronic Engineering Polytechnic Institute of Surabaya)
Ahmad Syauqi Ahsan (Politeknik Elektronika Negeri Surabaya)
Rengga Asmara (Politeknik Elektronika Negeri Surabaya)



Article Info

Publish Date
09 Jan 2022

Abstract

Big data infrastructure is the technology that provides the ability to store, process, analyze, and visualize large volumes of data. Choosing the tools and applications is one of the challenges of building such an infrastructure. In this study, we propose a new strategy for optimizing big data infrastructure design, an essential part of big data processing, by analyzing the performance of the applications used at each stage of processing. The process starts with collecting data from online news sources through web crawling with Scrapy and Apache Nutch. Hadoop technologies are then implemented to support distributed big data storage and computation. The NoSQL databases MongoDB and HBase make the data easier to query, after which search engines are built using Elasticsearch and Apache Solr. The stored data is then analyzed using Apache Hive, Pig, and Spark. The analyzed data is displayed on a website using Zeppelin, Metabase, Kibana, and Tableau. The test scenarios vary the number of servers and the number of files used. The test parameters include processing speed, memory usage, CPU usage, throughput, and others. The performance results of each application are compared and analyzed to identify its strengths and weaknesses, as a reference for building an optimal infrastructure design that meets user needs. This research produced two alternative big data infrastructure designs. The proposed infrastructure has been implemented on computer nodes in the PENS big data environment to process big data from online media and has been shown to run well.
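The kind of measurement named in the abstract (processing speed, CPU usage, memory usage, throughput) can be illustrated with a minimal Python sketch. It is not the authors' test harness: the `benchmark` helper, the `BenchmarkResult` type, and the two stand-in tasks are hypothetical names introduced here, and the sketch only measures in-process Python callables, whereas tools such as Hadoop or Elasticsearch would need system-level monitoring.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkResult:
    """Hypothetical container for the metrics listed in the abstract."""
    name: str
    records: int
    elapsed_s: float   # wall-clock processing time
    cpu_s: float       # CPU time consumed by this process
    peak_bytes: int    # peak Python-level allocations during the run

    @property
    def throughput(self) -> float:
        """Records processed per second of wall-clock time."""
        return self.records / self.elapsed_s if self.elapsed_s > 0 else float("inf")


def benchmark(name: str, task: Callable[[List[str]], int], records: List[str]) -> BenchmarkResult:
    """Run one candidate task over the same input and collect its metrics."""
    tracemalloc.start()  # tracks Python allocations only, not external processes
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    processed = task(records)
    elapsed_s = time.perf_counter() - wall_start
    cpu_s = time.process_time() - cpu_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return BenchmarkResult(name, processed, elapsed_s, cpu_s, peak_bytes)


# Two stand-in "applications" doing the same job in different ways (assumptions).
def tokenize_into_list(records: List[str]) -> int:
    tokens = [word for line in records for word in line.split()]
    assert tokens  # keep the work from being trivially skipped
    return len(records)


def tokenize_streaming(records: List[str]) -> int:
    count = sum(1 for line in records for _ in line.split())
    assert count
    return len(records)


sample = ["big data infrastructure with hadoop"] * 10_000
results = [
    benchmark("list-based", tokenize_into_list, sample),
    benchmark("streaming", tokenize_streaming, sample),
]
for r in results:
    print(f"{r.name}: {r.elapsed_s:.4f} s wall, {r.cpu_s:.4f} s CPU, "
          f"{r.peak_bytes} peak bytes, {r.throughput:.0f} records/s")
```

Running each candidate against the same input and the same metrics is what makes the results comparable; the study applies the same idea across different numbers of servers and files.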

Copyrights © 2022






Journal Info

Abbrev

stmsi

Publisher

Subject

Computer Science & IT Electrical & Electronics Engineering

Description

Sistemasi is the name of a scientific journal in the field of computer science, published by the Information Systems study program of Universitas Islam Indragiri, Tembilahan, Riau. Sistemasi is published three times a year, in January, May, and September. The general focus and scope of Sistemasi is the field of Information Systems, ...