Query Processing and Ranking of News Titles Related to the Governor of West Java Using TF-IDF and Cosine Similarity

Keywords:
Cosine Similarity, Document Ranking, News Title Search, TF-IDF, Web Scraping

Abstract
Efficient and relevant retrieval of news information is a pressing need in the digital era.
This study develops a system that ranks news titles against keyword queries by combining
Term Frequency-Inverse Document Frequency (TF-IDF) weighting with cosine similarity. The data
comprise 2,507 news titles collected over the past year from four of the most popular news
sites in Indonesia: Kompas.com, Detik.com, CNNIndonesia.com, and Tempo.com. The pipeline
consists of web scraping, pre-processing (case folding, tokenizing, stopword removal, and
stemming), term weighting with TF-IDF, similarity calculation with cosine similarity, and
performance evaluation using accuracy, precision, recall, and F1-score. Tests on three
different queries yielded an average accuracy of 99.75%, precision of 96.67%, recall of 100%,
and F1-score of 98.33%. These results indicate that the combination of TF-IDF and cosine
similarity is effective for retrieving news titles relevant to the entered query.
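
The ranking stages named in the abstract (pre-processing, TF-IDF weighting, cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation. The stopword list, the sample titles, the smoothed IDF variant, and the function names `preprocess`, `tfidf_vectors`, `cosine`, and `rank` are all assumptions made for illustration, and stemming is left out to keep the example dependency-free.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a full Indonesian list.
STOPWORDS = {"di", "ke", "dan", "yang", "untuk"}


def preprocess(text):
    """Case folding, tokenizing, and stopword removal (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs_tokens):
    """Return one sparse TF-IDF dict per document, plus the IDF table."""
    n = len(docs_tokens)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [
        {t: c * idf[t] for t, c in Counter(tokens).items()}
        for tokens in docs_tokens
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, titles):
    """Score every title against the query and sort by descending similarity."""
    vectors, idf = tfidf_vectors([preprocess(t) for t in titles])
    # Query terms that never occur in the corpus carry no weight.
    q_counts = Counter(t for t in preprocess(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_counts.items()}
    scored = [(cosine(q_vec, v), title) for v, title in zip(vectors, titles)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical titles, not taken from the study's dataset.
titles = [
    "Gubernur Jawa Barat Resmikan Jalan Tol Baru",
    "Harga Beras Naik di Pasar Tradisional",
    "Program Gubernur untuk Pendidikan di Jawa Barat",
]
ranked = rank("gubernur jawa barat", titles)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

Titles that share no term with the query receive a score of zero and fall to the bottom of the list. A relevance threshold on the score would then turn the ranking into the relevant/non-relevant decisions needed to compute accuracy, precision, recall, and F1-score.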
License
Copyright (c) 2025 Chandra Saputra, Wilcent Wilcent, Hafiz Irsyad, Abdul Rahman

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.