

IMPLEMENTATION OF THE RABIN-KARP ALGORITHM

FOR INDONESIAN NEWS TITLE RECOMMENDATION

FINAL PROJECT

Submitted as a Requirement for Obtaining the Bachelor's (Strata 1) Degree in Informatics Engineering

Universitas Muhammadiyah Malang

By:

Adika Ridlo Taqwin

NIM. 201210370311068

DEPARTMENT OF INFORMATICS ENGINEERING

FACULTY OF ENGINEERING

UNIVERSITAS MUHAMMADIYAH MALANG

2016

TABLE OF CONTENTS

CHAPTER I

INTRODUCTION

1.1 BACKGROUND

1.1.1 PROBLEM STATEMENT
1.1.2 RESEARCH OBJECTIVES
1.1.3 SCOPE AND LIMITATIONS
1.1.4 METHODOLOGY
1.1.5 WRITING STRUCTURE

CHAPTER II

THEORETICAL BASIS

2.1 NEWS

2.2 DIFFERENCES BETWEEN ELECTRONIC MEDIA NEWS AND PRINT MEDIA NEWS

2.3 INFORMATION

2.4 INFORMATION SYSTEMS

2.5 SIMILARITY

2.5.1 DISTANCE-BASED SIMILARITY MEASURE
2.5.2 FEATURE-BASED SIMILARITY MEASURE
2.5.3 PROBABILISTIC-BASED SIMILARITY MEASURE

2.6 MEASURING THE SIMILARITY VALUE

2.7 KAPPA STATISTIC

2.8 PRECISION

2.9 TEXT MINING

2.10 TEXT PROCESSING

2.11 RABIN-KARP ALGORITHM

2.12 HASHING PROCESS

2.13 RECOMMENDER SYSTEMS

2.13.1 RULE-BASED FILTERING (RULE-BASED RECOMMENDATION)
2.13.2 CONTENT-BASED FILTERING (CONTENT-BASED RECOMMENDATION)
2.13.3 COLLABORATIVE FILTERING (CF)
2.13.4 HYBRID FILTERING (HYBRID FILTERING-BASED RECOMMENDATION)

2.14 LOCAL CITATION RECOMMENDATION

2.15 GLOBAL CITATION RECOMMENDATION

2.16 WEB SCRAPING

2.17 SOCIAL MEDIA

CHAPTER III

SYSTEM ANALYSIS AND DESIGN

3.1 SYSTEM FLOWCHART

3.2 SYSTEM REQUIREMENTS DESIGN

3.2.1 USE CASES AND SYSTEM SCENARIOS
3.2.2 ACTIVITY DIAGRAM

3.3 ANALYSIS-STAGE DESIGN

3.3.1 ROBUSTNESS DIAGRAM
3.3.2 ANALYSIS-LEVEL SEQUENCE DIAGRAM
3.3.3 CLASS DIAGRAM

3.4 SEARCH STAGES OF RABBIT SEARCH

3.5 PREPROCESSING STAGES

3.5.1 CASE FOLDING
3.5.2 WORD NORMALIZATION
3.5.3 TOKENIZING
3.5.4 STEMMING
3.5.5 STOPWORD REMOVAL

3.6 PROCESSING STAGES

3.6.1 K-GRAM FORMATION
3.6.2 HASHING PROCESS
3.6.3 REMOVING DUPLICATE HASH VALUES
3.6.4 COMPUTING THE CLOSENESS VALUE

3.7 INTERFACE DESIGN

3.7.1 MAIN PAGE DESIGN
3.7.2 CATEGORY PAGE DESIGN
3.7.3 SEARCH RESULTS PAGE DESIGN
3.7.4 PROCESS DETAIL PAGE DESIGN

CHAPTER IV

SYSTEM IMPLEMENTATION AND TESTING

4.1 HARDWARE AND SOFTWARE REQUIREMENTS SPECIFICATION

4.1.1 HARDWARE REQUIREMENTS
4.1.2 SOFTWARE REQUIREMENTS

4.2 SYSTEM IMPLEMENTATION

4.2.1 DATABASE CREATION
4.2.1.1 News Table
4.2.1.2 Preprocessing Key Table
4.2.1.3 Comparison Text Preprocessing Table
4.2.1.4 Stopword Table
4.2.1.5 Root Word Table
4.2.1.6 Stopword Removal Table
4.2.1.7 Rabin Table
4.2.1.8 Temporary Table

4.2.2 PROGRAM CODE CREATION
4.2.2.1 Configuration in the CI Framework
4.2.2.2 Program Code for the Berita Class
4.2.2.2.1 Index Function
4.2.2.2.2 Home Function
4.2.2.2.3 Scrapping Sub Function
4.2.2.2.4 Scrapping All Function
4.2.2.2.5 Cari (Search) Function
4.2.2.2.6 Multi Explode Function
4.2.2.2.7 Hapus Imbuhan (Remove Affixes) Function
4.2.2.2.8 Ambil Judul (Get Title) Function
4.2.2.2.9 Preprocessing Judul (Title Preprocessing) Function
4.2.2.2.10 Stopword Removal Function
4.2.2.3 Program Code for the Site Class
4.2.2.3.1 Kategori (Category) Function
4.2.2.3.2 Hasil Kategori (Category Results) Function
4.2.2.3.3 Rabin Fix Function
4.2.2.3.4 Hasil (Results) Function

4.3 TESTING

4.3.1 INTERFACE TESTING
4.3.1.1 Main Page
4.3.1.2 Category Page
4.3.1.3 Search Results Page
4.3.1.4 Process Detail Page
4.3.2 TESTING WITH THE KAPPA STATISTIC
4.3.4 PRECISION TESTING
4.3.5 TESTING THE SEARCH PROCESS STAGES

CHAPTER V

CONCLUSIONS AND RECOMMENDATIONS

5.1 CONCLUSIONS

5.2 RECOMMENDATIONS

REFERENCES

List of Figures

Figure 3.1 System Flowchart

Figure 3.2 System Use Case

Figure 3.3 Data Scraping Activity Diagram

Figure 3.4 Select Category Activity Diagram

Figure 3.5 Search News Activity Diagram

Figure 3.6 Process Detail Activity Diagram

Figure 3.7 Robustness Diagram

Figure 3.8 Search News Title Sequence Diagram

Figure 3.9 Select Category Sequence Diagram

Figure 3.10 Data Scraping Sequence Diagram

Figure 3.11 Process Detail Sequence Diagram

Figure 3.12 Class Diagram

Figure 3.13 Search Stages Chart

Figure 3.14 Case Folding Example

Figure 3.15 Normalization Example

Figure 3.16 Tokenizing Example

Figure 3.17 Stemming Process Flow

Figure 3.18 Process Flow for Removing Unimportant Words

Figure 3.19 Example of the K-gram Formation Process

Figure 3.20 Main Page Design

Figure 3.21 Category Page Design

Figure 3.22 Search Results Page Design

Figure 3.23 Process Detail Page Design

Figure 4.1 Program Code 1

Figure 4.2 Program Code 2

Figure 4.3 Example Using htaccess

Figure 4.4 Example Without htaccess

Figure 4.5 Autoload Configuration

Figure 4.6 Routes Configuration

Figure 4.7 Database Connection Configuration

Figure 4.8 Index Function

Figure 4.9 Home Function

Figure 4.10 Scrapping Sub Function

Figure 4.11 Scrapping All Function

Figure 4.12 Cari (Search) Function

Figure 4.13 Multi Explode Function

Figure 4.14 Hapus Imbuhan (Remove Affixes) Function

Figure 4.15 Hapus Imbuhan (Remove Affixes) Function

Figure 4.16 Program Code for Case Folding

Figure 4.17 Program Code for Tokenizing

Figure 4.18 Program Code for Normalization

Figure 4.19 Program Code for Tokenizing

Figure 4.20 Stopword Removal Function

Figure 4.21 Kategori (Category) Function

Figure 4.22 Hasil Kategori (Category Results) Function

Figure 4.23 Rabin Fix Function

Figure 4.24 Hashing Process Function

Figure 4.25 Hashing Process Code

Figure 4.26 Code for Computing the Closeness Value

Figure 4.27 Storing the Similarity Value

Figure 4.28 Hasil (Results) Function

Figure 4.29 Main Page

Figure 4.30 Main Page for Smartphones and Tablets

Figure 4.31 Category Page

Figure 4.32 Category Page for Smartphones and Tablets

Figure 4.33 Search Results Page

Figure 4.34 Search Results Page for Smartphones and Tablets

Figure 4.35 Process Detail Page

Figure 4.36 Process Detail Page for Smartphones and Tablets

List of Tables

Table 1. Differences Between Electronic Media News and Print Media News

Table 2. Removal of Words in the Comparison Text That Have No Similarity to the Input Text

Table 3. Example K-gram Formation Results

Table 4. Hash Formation Results

Table 5. Example of Removing Duplicate Hash Values

Table 6. Example Hashing Results

Table 7. News

Table 8. Preprocessing Key

Table 9. Comparison Text Preprocessing

Table 10. Stopword Table

Table 11. Root Words

Table 12. Stopword Removal Table

Table 13. Rabin Table

Table 14. Temporary

Table 15. Kappa Test Result Scenario Table

Table 16. Kappa Test Result Scenario Table

Table 17. Kappa Test Results Table

Table 18. Precision Test Scenario Table

Table 19. Precision Test Results

Table 20. Test Results for the Search Process Stages

List of Equations

Equation 2.1 Computing the Similarity Value

Equation 2.2 Computing the Value of P(A)

Equation 2.3 Computing the Value of P(E)

Equation 2.4 Computing the Kappa Value

Equation 2.5 Computing the Precision Value

Equation 2.6 Computing the Hash Value
