BOOK REVIEW SEARCH SYSTEM USING THE
COSINE SIMILARITY ALGORITHM
Final Project Proposal
Submitted in Partial Fulfillment
of the Requirements for the Bachelor's Degree (Strata 1)
in Informatics Engineering, Universitas Muhammadiyah Malang
ASTNA AISYANA
(201310370311151)
DEPARTMENT OF INFORMATICS ENGINEERING
FACULTY OF ENGINEERING
UNIVERSITAS MUHAMMADIYAH MALANG
2017
FOREWORD
Praise and gratitude are due to Allah subhanahu wa ta’ala, who has bestowed His love and mercy upon us, enabling the author to complete this thesis on time. The thesis is entitled:
“Book Review Search System Using the Cosine Similarity Algorithm”
This thesis was prepared to fulfil one of the requirements for sitting the undergraduate examination in the Informatics Engineering Study Program, Faculty of Engineering, Universitas Muhammadiyah Malang.
The author is aware that this final project still has many shortcomings and limitations. The author therefore welcomes constructive suggestions and criticism so that this work may contribute to the advancement of knowledge.
Malang, 1 October 2017
The author,
Astna Aisyana
TABLE OF CONTENTS
APPROVAL SHEET
VALIDATION SHEET
STATEMENT OF AUTHENTICITY
FOREWORD
ABSTRACT (INDONESIAN)
ABSTRACT (ENGLISH)
DEDICATION
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF FORMULAS
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Scope and Limitations
1.4 Research Objectives
1.5 Research Methodology
1.5.1 Literature Study
1.5.2 Data Collection
1.5.3 System Design
1.5.4 Implementation
1.5.5 System Testing
1.5.6 Report Writing
1.6 Thesis Outline
CHAPTER 2 THEORETICAL BACKGROUND
2.1 Literature Review
2.2 Information Retrieval
2.3 Preprocessing
2.4 Indexing
2.5 Cosine Similarity
2.6 WordNet
2.6.1 WordNet Bahasa
2.7 Generalized Vector Space Model (GVSM)
2.8 Semantic Relatedness
2.9 Evaluation of Ranked Retrieval
2.9.1 Recall and Precision
CHAPTER 3 SYSTEM ANALYSIS AND DESIGN
3.1 Problem Analysis
3.2 System Design
3.3 Research Data
3.4 Data Preprocessing
3.4.1 Case Folding
3.4.2 Tokenizing
3.4.3 Filtering
3.4.4 Stemming
3.5 Indexing
3.6 Term Weighting
3.7 Generalized Vector Space Model (GVSM)
3.8 Cosine Similarity Computation
3.9 WordNet Bahasa
3.10 Interface Design
3.10.1 Book Review Upload Page
3.10.2 Book Preprocessing Page
3.10.3 Book Review Indexing Page
3.10.4 Book Review Search Page
3.11 Test Design
3.11.1 Precision and Recall Evaluation
CHAPTER 4 IMPLEMENTATION AND TESTING
4.1 Implementation and Testing
4.1.1 Interface Implementation
4.1.2 Data Testing Method
4.2 Preprocessing Errors
4.3 Evaluation and Analysis of Results
CHAPTER 5 CLOSING
5.1 Conclusions
5.2 Suggestions
LIST OF TABLES
Table 2.1 Example Query Terms
Table 2.2 Example Index Terms
Table 2.3 Example Sense IDs of the First Query Term and Index Term
Table 3.1 Sample Book Review Data
Table 3.2 Sample Stop Word List
Table 3.3 Sample Root Words
Table 3.4 Sample Index File Structure
Table 3.5 Sample df File Structure
Table 3.6 WordNet Bahasa Data
Table 4.1 Test Queries
Table 4.2 Precision and Recall Test Results
Table 4.3 Precision and Recall Test Results
LIST OF FIGURES
Figure 2.1 Illustration of Cosine Similarity
Figure 2.2 Fragment of is-a Relations in WordNet
Figure 2.3 Semantic Relatedness Computation
Figure 3.1 Design of the Book Review Search System
Figure 3.2 Data Preprocessing Scheme
Figure 3.3 Case Folding Scheme
Figure 3.4 Tokenizing Scheme
Figure 3.5 Filtering Scheme
Figure 3.6 Stemming Scheme
Figure 3.7 Term Weighting Process Scheme
Figure 3.8 Generalized Vector Space Model Scheme
Figure 3.9 Cosine Similarity Process Scheme
Figure 3.10 WordNet Bahasa Scheme
Figure 3.11 Book Review Menu Page
Figure 3.12 Case Folding, Tokenizing, and Stopword Removal Process Page
Figure 3.13 Stopword Removal Results and Stemming Process Page
Figure 3.14 Stemming Results Page
Figure 3.15 Book Review Indexing Page
Figure 3.16 Book Review Search Page
Figure 4.1 Book Review File Upload Page
Figure 4.2 Case Folding, Tokenizing, and Stopword Removal Page
Figure 4.3 Stemming Page
Figure 4.4 Stemming Results Page
Figure 4.5 Indexing Page
Figure 4.6 Search Engine Page
Figure 4.7 Search Results
Figure 4.8 Preprocessing Errors
Figure 4.9 Line Chart of Precision and Recall