Query Processing and Ranking of News Titles Related to the Governor of West Java Using TF-IDF and Cosine Similarity

Keywords:
Cosine Similarity, Document Ranking, News Title Search, TF-IDF, Web Scraping

Abstract
Efficient and relevant retrieval of news information is a pressing need in the digital era. This study develops a system that ranks news titles against a keyword query by combining Term Frequency-Inverse Document Frequency (TF-IDF) weighting with cosine similarity. The data set consists of 2,507 news titles collected over the past year from four of the most popular news sites in Indonesia: Kompas.com, Detik.com, CNNIndonesia.com, and Tempo.com. The pipeline comprises web scraping, pre-processing (case folding, tokenizing, stopword removal, and stemming), term weighting with TF-IDF, similarity calculation with cosine similarity, and performance evaluation using accuracy, precision, recall, and F1-score. Tests on three different queries yielded an average accuracy of 99.75%, precision of 96.67%, recall of 100%, and F1-score of 98.33%. These results indicate that combining TF-IDF with cosine similarity is effective for retrieving news titles relevant to the entered query.
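
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample titles, the tiny stopword list, and the function names (preprocess, build_idf, rank_titles, precision_recall_f1) are hypothetical, stemming is omitted to keep the example dependency-free, and the IDF formula is an assumed smoothed variant (the same one scikit-learn's TfidfVectorizer uses by default), since the abstract does not state which variant the study used.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; not the list used in the study).
STOPWORDS = {"di", "ke", "dan", "yang", "untuk"}

def preprocess(text):
    """Case folding, tokenizing, and stopword removal (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_idf(docs_tokens):
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1 (assumed variant)."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_titles(query, titles):
    """Return (score, title) pairs with score > 0, highest score first."""
    docs = [preprocess(t) for t in titles]
    idf = build_idf(docs)
    q_vec = tfidf_vector(preprocess(query), idf)
    scored = [(cosine(q_vec, tfidf_vector(d, idf)), t) for d, t in zip(docs, titles)]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)

def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query."""
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical sample titles, invented for illustration only.
titles = [
    "Gubernur Jawa Barat resmikan jalan tol baru",
    "Harga beras naik di pasar Jawa Timur",
    "Timnas Indonesia menang di laga persahabatan",
]

for score, title in rank_titles("gubernur jawa barat", titles):
    print(f"{score:.3f}  {title}")
```

In this sketch the title sharing all three query terms ranks first, the title sharing only "jawa" ranks second, and the title with no query terms is excluded because its similarity is zero. A real system would add an Indonesian stemmer and a fuller stopword list before weighting.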
License
Copyright (c) 2025 Chandra Saputra, Wilcent Wilcent, Hafiz Irsyad, Abdul Rahman

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.