Stream processing is becoming an essential part of the IoT stack, increasing the value and benefits that data brings to organizations. As streaming data grows, a centralized architecture causes large service delays. To solve this problem, additional resources or machines must be added to the cluster to maintain processing performance. This research proposes a distributed IoT data stream processing platform built on Spark Streaming. Throughput and latency reach 321.4 records/s and 13.3 seconds when the amount of data processed is less than the available resources, while in the system capability scenario a batch interval of up to 12 seconds yields the best throughput and latency: 2112.4 records/s and 5.93 seconds. Furthermore, the fault-tolerance scenario shows that six nodes process faster, completing in 60 seconds. Spark Streaming also uses distributed resources efficiently: CPU and memory usage show no significant differences across nodes, with average per-node differences in memory and CPU usage of 3.6% and 2.04%, respectively.
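The batch-interval trade-off reported above follows from Spark Streaming's micro-batch model: longer intervals amortize fixed per-batch overhead (raising throughput) but make each record wait longer before its batch runs (raising latency). The following stdlib-only Python sketch illustrates that relationship under simple assumptions; the function, parameters, and numbers are illustrative, not the paper's implementation or measurements.

```python
def micro_batch_stats(arrival_rate, batch_interval,
                      per_batch_overhead, per_record_cost):
    """Estimate throughput and latency for one micro-batch cycle.

    arrival_rate       -- records arriving per second (assumed constant)
    batch_interval     -- seconds of data collected into each micro-batch
    per_batch_overhead -- fixed scheduling/launch cost per batch (seconds)
    per_record_cost    -- processing cost per record (seconds)
    """
    batch_size = arrival_rate * batch_interval
    processing_time = per_batch_overhead + batch_size * per_record_cost
    # Records completed per second of processing time.
    throughput = batch_size / processing_time
    # A record waits on average half an interval before its batch starts,
    # then the whole batch must finish processing.
    avg_latency = batch_interval / 2 + processing_time
    return throughput, avg_latency

# Comparing a short and a long batch interval at the same arrival rate:
# the longer interval amortizes the fixed overhead (higher throughput)
# at the cost of higher end-to-end latency.
short = micro_batch_stats(arrival_rate=100, batch_interval=1,
                          per_batch_overhead=0.5, per_record_cost=0.002)
long = micro_batch_stats(arrival_rate=100, batch_interval=12,
                         per_batch_overhead=0.5, per_record_cost=0.002)
```

A stable system additionally requires the processing time of each batch to stay below the batch interval; otherwise batches queue up and latency grows without bound.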
Copyright © 2020