sequential pattern

25
Sequential Pattern Mining Sequential Pattern 1

Upload: yanuardi-firmansyah

Post on 08-Dec-2015

253 views

Category:

Documents


7 download

DESCRIPTION

SQ

TRANSCRIPT

Sequential Pattern Mining

Sequential Pattern 1

Bahasan

1. Pendahuluan2. Sequence3. Sequential Pattern Mining

Sequential Pattern 2

Sequence• Sebuah sequence adalah urutan dari elemen-elemen (transaksi)

s = < e1 e2 e3 … >

Setiap elemen terdiri dari kumpulan kejadian-kejadian (item)ei = {i1, i2, …, ik}

Setiap elemen merupakan atribut yang dihubungkan dengan suatu lokasi atau waktu tertentu (spesifik)

• Panjang Sequence, |s|, adalah banyaknya unsur-unsur sequence yang diberikan.

• A k-sequence adalah sebuah sequence yang terdiri dari k kejadian (item)

Sequential Pattern 3

SequenceDatabaseSequence

Sequence Elemen (Transaksi) Kejadian(Item)

Customer Transaksi-transaksi penjualan yang dilakukan oleh konsumen tertentu

Item – item yang dibeli konsumen dalam waktu t.

Buku, diary Produk, CD, dll.

Web Data Aktifitas browsing pada pengunjung web tertentu

Sekumpulan File-file yang dilihat pengunjung web setelah melakukan proses single mouse click

Home page, index page, contact info, dll

Event data Kejadian – kejadian yang dihasilkan oleh sensor tertentu

Kejadian-kejadi yang timbul dari sensor saat waktu t

Jenis-jenis tanda(alarm) yang dihasilkan oleh sensor

Genome sequences

DNA sequence dari spesies tertentu

Elemen dari DNA sequence Bases A,T,G,C

Sequential Pattern 4

Sequence Web sequence:

< {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping} >

Sequence kejadian kecelakaan yang disebabkan oleh ledakan nuklir pada 3-mile Island:(http://stellar-one.com/nuclear/staff_reports/summary_SOE_the_initiating_event.htm) < {clogged resin} {outlet valve closure} {loss of feedwater}

{condenser polisher outlet valve shut} {booster pumps trip} {main waterpump trips} {main turbine trips} {reactor pressure increases}>

Sequence buku checked out pada perpustakaan: <{Fellowship of the Ring} {The Two Towers} {Return of the King}>

Sequential Pattern 5

CS632 - Data Mining 64/3/01

Mining Sequential Patterns (1)

• Sequential patterns are ordered list of itemsets.

• Market basket example:– Customers typically rent “star wars” then “empire

strikes back” then “return of the Jedi”– “Fitted sheets and pillow cases” then “comforter”

then “drapes and ruffles”

CS632 - Data Mining 74/3/01

Mining Sequential Patterns (2)

• Looks at sequences of transactions as opposed to a single transaction.

• Groups transactions based on customer ID.– Customer sequence.

SequenceDefinisi Subsequent

Sequential Pattern 8

Sebuah sequence <a1 a2 … an> terdapat dalam sequence lain <b1 b2 … bm> (m ≥ n) jika terdapat integer i1 < i2 < … < in maka a1 bi1 , a2 bi1, …, an bin

Support subsequence w didefinisikan sebagai bagian dari data sequence yang berisi w

Sequential pattern adalah subsequence yang sering muncul (yaitu, support subsequence ≥ minsup)

Data sequence Subsequence Contain?

< {2,4} {3,5,6} {8} > < {2} {3,5} > Yes

< {1,2} {3,4} > < {1} {2} > No

< {2,4} {2,4} {2,5} > < {2} {4} > Yes

Sequential Pattern MiningDefinisi

Sequential Pattern 9

Terdapat:

Database sequence

Minimum menetapkan user yang

mendukung(support), minsup

Task:

Menemukan semua subsequence dengan user

yang mendukung ≥ minsup

Sequential Pattern Mining

• Algoritma Sequential Pattern Mining

Sequential Pattern 10

1. Sort Phase

2. Large Itemset Phase

3. Transformation Phase

4. Sequence Phase

5. Maximal Phase

Sequential Pattern Mining• Contoh Kasus

Sequential Pattern 11

customerID itemBought

1111

10,50203040

2222

10304030,50

3333

10203040

444

103050

55

4050

Sequential Pattern Mining

Sequential Pattern 12

customerID itemBought

1111

10,50203040

2222

10304030,50

3333

10203040

444

103050

55

4050

customerID Customer Sequence

1 <(10,50) (20) (30) (40)>

2 <(10) (30) (40) (30,50)>

3 <(10) (20) (30) (40)>

4 <(10) (30) (50)>

5 <(40) (50)>

1.Sort PhaseMengurutkan berdasarkan customerID sebagai major key

Sequential Pattern Mining2. Large Itemset Phase

Sequential Pattern 13

• Menentukan Large Itemset

• Memetakan ItemsetcustomerID Customer Sequence

1 <(10,50) (20) (30) (40)>

2 <(10) (30) (40) (30,50)>

3 <(10) (20) (30) (40)>

4 <(10) (30) (50)>

5 <(40) (50)>

Itemset Support

(10) 4

(20) 2

(30) 4

(40) 4

(50) 4

(10,50) 1

(30,50) 1

min_sup = 40%

40% x 5 = 2 customer sequence

Large Itemset Dipetakan ke-

(10) 1

(20) 2

(30) 3

(40) 4

(50) 5

Catatan:Sehingga apabila terdapat itermset (10,50), Memenuhi minimum support akan dikodekan 6, dst

Sequential Pattern Mining3. Transformation Phase

Sequential Pattern 14

Customer ID Original Sequence Transformed CustomerSequence

Setelah Pemetaan

1 <(10,50) (20) (30) (40)> ⟨ {(10) (50)} {(20)} {(30)} {(40)} ⟩ ⟨ {1, 5} {2} {3} {4} ⟩

2 <(10) (30) (40) (30,50)> ⟨ {(10)} {(30)} {(40)} {(30) (50)}⟩ ⟨ {1} {3} {4} {3, 5}⟩

3 <(10) (20) (30) (40)> ⟨ {(10)} {(20)} {(30)} {(40)} ⟩ ⟨ {1} {2} {3} {4} ⟩

4 <(10) (30) (50)> ⟨ {(10)} {(30)} {(50)}⟩ ⟨ {1} {3} {5} ⟩

5 <(40) (50)> ⟨ {(40)} {(50)}⟩ ⟨ {4} {5} ⟩

• Menghapus non-Large Itemset • Memetakan Large Itemset ke suatu integer

CS632 - Data Mining 154/3/01

Algorithm: Transformation (contoh)

Sequential Pattern Mining4. Sequential Phase

Sequential Pattern 16

Menggunakan set Large Itemset, untuk mencari hasil sequence tertentu

Dua jenis algoritma Count-All

1. Algoritma AprioriAll Count-Some

1. Algoritma AprioriSome2. Algoritma DynamicSome

Sequential Pattern Mining

⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩

⟨ {1} {3} {4} ⟩ ⟨ {4} {5} ⟩

Sequential Pattern 17

Customer Sequence

Sequential Pattern Mining

⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩

⟨ {1} {3} {5} ⟩ ⟨ {4} {5} ⟩

Sequential Pattern 18

Large 1-Sequence

Sequence Support<1> 4<2> 2<3> 4<4> 4<5> 4

Sequential Pattern Mining

⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩

⟨ {1} {3} {5} ⟩ ⟨ {4} {5} ⟩

Sequential Pattern 19

Large 2-Sequence

Sequence Support<1 2> 2<1 3> 4<1 4> 3<1 5> 2<2 3> 2<2 4> 2<3 4> 3<3 5> 2<4 5> 2

Sequential Pattern Mining

⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩

⟨ {1} {3} {5} ⟩ ⟨ {4} {5} ⟩

Sequential Pattern 20

Large 3-Sequence

Sequence Support<1 2 3> 2<1 2 4> 2<1 3 4> 3<1 3 5> 2<2 3 4> 2

Sequential Pattern Mining

⟨ {1, 5} {2} {3} {4} ⟩ ⟨ {1} {3} {4} {3, 5}⟩ ⟨ {1} {2} {3} {4} ⟩

⟨ {1} {3} {5} ⟩ ⟨ {4} {5} ⟩

Sequential Pattern 21

Large 4-Sequence

Sequence Support<1 2 3 4> 2<1 3 4 5> 1

Sequential Pattern Mining5. Maximum Phase

Sequential Pattern 22

S, set seluruh Large Itemset n, merupakan jarak terpanjang

sequence

Sequential Pattern Mining

Sequential Pattern 23

Sequence Support<1 2 3 4> 2

Sequence Support<1 2 3> 2<1 2 4> 2<1 3 4> 3<1 3 5> 2<2 3 4> 2

Sequence Support<1 2> 2<1 3> 4<1 4> 3<1 5> 3<2 3> 2<2 4> 2<3 4> 3<3 5> 2<4 5> 2

Sequence Support<1> 4<2> 2<3> 4<4> 4<5> 4

Terima kasih

Sequential Pattern

Latihan soal• Berikut daftar transaksi pembelian dari customer. Maka

carilah pola urutan pembelian item yang dilakukan customer.

Virtual Memory 25

Customer ID Transaction Time

Items Bought

11

June 25 93June 30 93

3090

222

June 10 93June 15 93June 20 93

10, 203040, 60, 70

3 June 25 93 30, 50, 70

444

June 25 93June 30 93July 25 93

3040, 7090

5 June 12 93 90

Minimum support: 2 item