INTI Nusa Mandiri
Vol 18 No 1 (2023): INTI Periode Agustus 2023

VECTOR SPACE MODEL METHOD FOR WEB SCRAPING ON FREELANCE WEBSITES

Andi Nurkholis (Universitas Teknokrat Indonesia)
Yusra Fernando (Universitas Teknokrat Indonesia)
Faris Arkans Ans (Universitas Teknokrat Indonesia)

Article Info

Publish Date
02 Aug 2023

Abstract

In the digitalization era, the internet is at the center of every area of community activity, including the field of work. Many platforms now provide job vacancies, especially for freelancers. To obtain this information, users usually have to open several websites to find suitable vacancies. Web scraping offers a solution to this problem. In this research, the BeautifulSoup and Selenium libraries are used to collect the data. To search the data, the vector space model method is used to measure the similarity between the query and each document. In the data search evaluation, the average recall is near-perfect at 98%, while the average precision is 56%. Precision is lower because the search uses three parameters, so irrelevant data is more likely to be retrieved whenever a document contains a word from the user's query even though the context does not match. The Streamlit framework in Python is used to display the data processing results and to help users navigate the web scraping, data processing, and data search steps. This study aims to implement web scraping to retrieve data from the freelance websites Freelance, Project, and Sribulancer. By applying the vector space model method, users can search data from several websites without opening each freelance website one by one. Through data visualization in a web application built with the Streamlit framework, the web scraping results can also be presented in a more helpful form that saves the user's time.
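The abstract does not specify the term-weighting scheme, so what follows is only a minimal sketch of vector space model retrieval under stated assumptions. Documents and the query are represented as TF-IDF vectors (with a smoothed IDF, log(N/df) + 1). Documents are ranked by cosine similarity, and every document with a non-zero score is returned. The job listings, the helper names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `search`, `precision_recall`), and the relevance judgement are hypothetical, and the study's three search parameters are not modeled.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF (an assumption): idf(t) = log(N / df(t)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Rank documents by cosine similarity, keeping every score > 0."""
    tokenized = [tokenize(d) for d in documents]
    idf = build_idf(tokenized)
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine(q_vec, tfidf_vector(doc, idf)))
              for i, doc in enumerate(tokenized)]
    ranked = [(i, s) for i, s in scores if s > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical job listings standing in for scraped records.
    jobs = [
        "Python developer for web scraping project",
        "Logo design for coffee shop",
        "Web developer to build company profile website",
        "Data entry from scanned documents",
    ]
    results = search("python web scraping", jobs)
    for idx, score in results:
        print(f"{score:.3f}  {jobs[idx]}")
    retrieved = {idx for idx, _ in results}
    relevant = {0}  # hypothetical relevance judgement
    p, r = precision_recall(retrieved, relevant)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy run only the first listing is relevant. The third listing still shares the word "web" with the query, so it gets a small non-zero score and is retrieved. Recall is therefore 1.0 while precision drops to 0.5. This mirrors the abstract's explanation: returning every document that matches any query word keeps recall high and pulls precision down.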

Copyright © 2023

Journal Info

Abbrev

inti

Publisher

Subject

Computer Science & IT

Description

The INTI Nusa Mandiri Journal is intended as a medium for scientific studies of research results, thought, and critical analysis on issues in Computer Science, Information Systems, and Information Technology, both nationally and internationally. The scientific article in question is ...