In this study, web scraping is used to retrieve URLs from similar sites and to store the URL data in daily, weekly, monthly, and annual databases, so that valid URLs can be separated from invalid ones. Filtering is performed to make it easier for subsequent processes to move the data into the database. The next process distinguishes URLs based on the available content data, namely the title, tags, and keywords used for SEO. The output of each step is stored in a data warehouse to build a URL data center, which is expected to serve as a stage for collecting big data. The scope of the problem is limited to designing a web crawler that searches similar sites and stores the results in the database. From the database, the data is directed to the data warehouse; once in the data warehouse, the data is processed and presented to the user through an interface, divided by classification.
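As a concrete illustration of the crawling and filtering step, the sketch below fetches a single seed page, extracts the hyperlinks it contains, and keeps only well-formed http(s) URLs. This is a minimal sketch using only the Python standard library; the seed URL and the validity rule (a URL is kept if it has an http or https scheme and a host) are assumptions made for illustration, not details taken from this study.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags on a fetched page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_valid_url(url: str) -> bool:
    # Assumed validity rule: an http(s) scheme plus a network location.
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def crawl_urls(seed_url: str) -> list[str]:
    """Fetch one page, resolve relative links, and keep only valid URLs."""
    with urlopen(seed_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative hrefs against the seed page, then filter.
    absolute = [urljoin(seed_url, href) for href in parser.links]
    return [url for url in absolute if is_valid_url(url)]


if __name__ == "__main__":
    # "https://example.com" is a placeholder seed site, not from the paper.
    for url in crawl_urls("https://example.com"):
        print(url)
```

In practice, the valid URLs returned here would be inserted into the daily, weekly, monthly, and annual databases described above before any further processing.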
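The content-based distinction step can be sketched along the same lines. Assuming classification relies on the SEO signals named above (title, tags, keywords), the sketch below pulls a page's <title> text and <meta name="keywords"> values and matches them against a category map; the category names and marker keywords here are hypothetical, since the study does not list its classes.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Pulls the page title and SEO keywords out of an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name", "").lower() == "keywords":
            content = attrs.get("content", "")
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical category map; the study does not specify its classes.
CATEGORIES = {
    "news": {"news", "headline"},
    "commerce": {"shop", "price", "product"},
}


def classify(html: str) -> str:
    """Assign a page to the first category whose markers overlap its SEO terms."""
    parser = MetaExtractor()
    parser.feed(html)
    terms = set(parser.keywords) | set(parser.title.lower().split())
    for category, markers in CATEGORIES.items():
        if terms & markers:
            return category
    return "uncategorized"


if __name__ == "__main__":
    sample = ("<html><head><title>Daily News</title>"
              "<meta name='keywords' content='news, headline'></head></html>")
    print(classify(sample))  # -> "news"
```

The resulting label is what the data warehouse would record, so that the user-facing interface can divide the URL data by classification.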