J-SAKTI (Jurnal Sains Komputer dan Informatika)
Vol 7, No 2 (2023): EDISI SEPTEMBER

Web Scraping for Summarization of Freelance Job Website using Vector Space Model

Andi Nurkholis (Universitas Teknokrat Indonesia, Indonesia)
Yusra Fernando (Universitas Teknokrat Indonesia, Indonesia)
Faris Arkan Ans (Universitas Teknokrat Indonesia, Indonesia)



Article Info

Publish Date
30 Sep 2023

Abstract

In the current era of digitalization, the internet is central to nearly every area of community activity, including work. Many platforms now advertise job vacancies, especially for freelancers. To find suitable vacancies, users usually have to open several websites one by one. Web scraping offers a solution to this problem. Based on previous research, the BeautifulSoup and Selenium libraries are used to collect the data. To search the collected data, the vector space model method is used to measure the similarity between the query and each document. In the search evaluation, the average recall is a perfect 100%, while the average precision is 56%. Precision is lower because the search uses three parameters: a document that contains a word from the user's query is more likely to be retrieved even when its context does not match, which increases the number of irrelevant results. The Streamlit framework in Python is used to display the data processing results and to guide users through web scraping, data processing, and data search. This study aims to implement the web scraping method to retrieve data from the freelance websites Freelance, Project, and Sribulancer. By applying the vector space model method, users can search data from several websites without opening each freelance website one by one. Through data visualization in a web application built with the Streamlit framework, the web scraping results can also be presented in a more helpful form, saving users time.
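The vector space model search and the recall/precision evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain TF-IDF weighting (raw term frequency multiplied by log(N/df)), a simple lowercase tokenizer with no stemming or stop-word removal, and cosine similarity as the ranking score. The study's actual weighting scheme, preprocessing, and three search parameters are not described on this page, so the function names and the sample job titles below are hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens only, no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(docs: list[list[str]]) -> dict[str, float]:
    # idf(t) = log(N / df(t)), where df(t) is the number of documents containing t.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term frequency times idf; query terms unseen in the corpus get weight 0.
    tf = Counter(tokens)
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine of the angle between two sparse term-weight vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query: str, documents: list[str]) -> list[tuple[int, float]]:
    # Rank documents by similarity to the query; only scores above 0 count as retrieved.
    doc_tokens = [tokenize(d) for d in documents]
    idf = build_idf(doc_tokens)
    doc_vecs = [tfidf(toks, idf) for toks in doc_tokens]
    q_vec = tfidf(tokenize(query), idf)
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: s[1], reverse=True)

def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    # precision = |retrieved & relevant| / |retrieved|; recall = |retrieved & relevant| / |relevant|
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical scraped job titles (not data from the study).
SAMPLE_JOBS = [
    "Python web scraping developer",
    "Logo design for coffee shop",
    "Data entry in Excel",
    "Website developer using Laravel",
]

if __name__ == "__main__":
    ranking = search("python web developer", SAMPLE_JOBS)
    print(ranking)  # document 0 ranks first; document 3 follows because it shares "developer"
    retrieved = {i for i, _ in ranking}
    print(precision_recall(retrieved, relevant={0}))  # (0.5, 1.0)
```

In this toy example the only relevant job is retrieved (recall 1.0), but a second job that merely shares the word "developer" is retrieved as well (precision 0.5). This mirrors, on hypothetical data, the effect the abstract gives for its perfect recall and lower precision.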

Copyrights © 2023






Journal Info

Abbrev

jsakti

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Energy

Description

J-SAKTI is a journal published by LPPM STIKOM Tunas Bangsa that focuses on the field of Informatics Management. Article submission is free of charge, and accepted articles are published online and can be accessed free of charge. The topics of J-SAKTI are as follows (however ...