Cluster Analysis

Topics

1. Basic Concepts

2. Statistics in Cluster Analysis

3. Steps in Cluster Analysis

a. Formulate the problem

b. Select a distance or similarity measure

c. Select a clustering procedure

d. Decide on the number of clusters

e. Interpret and profile the clusters

f. Assess reliability and validity

Basic Concepts

Cluster analysis is a technique for grouping objects or cases into relatively homogeneous groups called clusters.

Cluster analysis is also called:

• Classification analysis

• Numerical taxonomy

Clustering in practice often differs from the ideal clustering.

Differences between discriminant analysis and cluster analysis:

• Discriminant analysis requires prior knowledge of the group membership of each object.

• Cluster analysis uses no prior information about the group membership of the objects.

[Figure: An ideal clustering situation. Scatter plot with Variable 1 on the vertical axis and Variable 2 on the horizontal axis.]

[Figure: A clustering situation in practice. Scatter plot with Variable 1 on the vertical axis and Variable 2 on the horizontal axis.]

Uses of Cluster Analysis

Examples:

• Segmenting the market

• Understanding buyer behavior

• Identifying new product opportunities

• Selecting test markets

• Reducing data

Statistics in Cluster Analysis

• Agglomeration schedule

• Cluster centroid

• Cluster centers

• Cluster membership

• Dendrogram

• Distances between cluster centers

• Icicle diagram

Steps in Cluster Analysis

1. Formulate the problem

2. Select a distance or similarity measure

3. Select a clustering procedure

4. Decide on the number of clusters

5. Interpret and profile the clusters

6. Assess reliability and validity

Formulate the Problem

Example:

Group consumers according to their attitudes toward shopping. Based on previous research, six attitudinal variables were identified. Consumers were asked to state their degree of agreement with the following statements on a seven-point scale:

V1 = Shopping is fun.
V2 = Shopping is bad for your budget.
V3 = I combine shopping with eating out.
V4 = I try to get the best buys while shopping.
V5 = I don't care about shopping.
V6 = You can save a lot of money by comparing prices.

The data obtained from 20 respondents are as follows:

Raw Data

Case No.   V1   V2   V3   V4   V5   V6
 1          6    4    7    3    2    3
 2          2    3    1    4    5    4
 3          7    2    6    4    1    3
 4          4    6    4    5    3    6
 5          1    3    2    2    6    4
 6          6    4    6    3    3    4
 7          5    3    6    3    3    4
 8          7    3    7    4    1    4
 9          2    4    3    3    6    3
10          3    5    3    6    4    6
11          1    3    2    3    5    3
12          5    4    5    4    2    4
13          2    2    1    5    4    4
14          4    6    4    6    4    7
15          6    5    4    2    1    4
16          3    5    4    6    4    7
17          4    4    7    2    2    5
18          3    7    2    6    4    3
19          4    6    3    7    2    7
20          2    3    2    4    7    2

Select a Distance or Similarity Measure

Because the objective of clustering is to group similar objects together, some measure is needed to assess how different or similar the objects are.

Commonly used measures are:

• Euclidean distance: the square root of the sum of the squared differences in values for each variable.

• City-block (Manhattan) distance: the sum of the absolute differences in values for each variable.

• Chebychev distance: the maximum absolute difference in values on any variable.
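The three measures can be illustrated with a minimal Python sketch. It compares case 1 and case 2 from the raw data; the function names are illustrative choices, not part of the original material.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared differences on each variable.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    # Sum of absolute differences on each variable.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    # Largest absolute difference on any single variable.
    return max(abs(x - y) for x, y in zip(a, b))

case_1 = [6, 4, 7, 3, 2, 3]  # V1..V6 for respondent 1
case_2 = [2, 3, 1, 4, 5, 4]  # V1..V6 for respondent 2

print(euclidean(case_1, case_2))   # 8.0
print(city_block(case_1, case_2))  # 16
print(chebychev(case_1, case_2))   # 6
```

The differences between the two cases are 4, 1, 6, 1, 3, and 1 in absolute value, so the squared differences sum to 64 (Euclidean distance 8), the absolute differences sum to 16, and the largest is 6.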

Classification of Clustering Procedures

Clustering procedures
  Hierarchical
    Agglomerative
      Linkage methods: single, complete, average
      Variance methods: Ward's method
      Centroid methods
    Divisive
  Nonhierarchical
    Sequential threshold
    Parallel threshold
    Optimizing partitioning

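Linkage Methods

• Single linkage: the distance between Cluster 1 and Cluster 2 is the minimum distance between a member of one and a member of the other.

• Complete linkage: the distance between Cluster 1 and Cluster 2 is the maximum distance between a member of one and a member of the other.

• Average linkage: the distance between Cluster 1 and Cluster 2 is the average distance over all pairs of members, one from each cluster.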
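As a rough sketch of how the three linkage rules differ, the Python snippet below computes the single, complete, and average linkage distances between two small clusters. The two clusters (cases 6 and 7, and cases 14 and 16, from the raw data) and the use of Euclidean distance are assumptions made only for illustration.

```python
import math
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise(cluster_a, cluster_b):
    # All distances between a member of cluster_a and a member of cluster_b.
    return [euclidean(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    distances = pairwise(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Illustrative clusters taken from the raw data (an assumption for this sketch).
cluster_1 = [[6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4]]  # cases 6 and 7
cluster_2 = [[4, 6, 4, 6, 4, 7], [3, 5, 4, 6, 4, 7]]  # cases 14 and 16

print("single:  ", single_linkage(cluster_1, cluster_2))
print("average: ", average_linkage(cluster_1, cluster_2))
print("complete:", complete_linkage(cluster_1, cluster_2))
```

For any pair of clusters the single linkage distance is never larger than the average, and the average is never larger than the complete linkage distance.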

Other Agglomerative Clustering Methods

• Ward's procedure

• Centroid method

Hierarchical Clustering Output

Agglomeration Schedule

         Cluster Combined                         Stage Cluster First Appears
Stage    Cluster 1   Cluster 2   Coefficients     Cluster 1   Cluster 2     Next Stage
 1       14          16             1.000         0           0              6
 2        6           7             2.000         0           0              7
 3        2          13             3.500         0           0             15
 4        5          11             5.000         0           0             11
 5        3           8             6.500         0           0             16
 6       10          14             8.167         0           1              9
 7        6          12            10.500         2           0             10
 8        9          20            13.000         0           0             11
 9        4          10            15.583         0           6             12
10        1           6            18.500         0           7             13
11        5           9            23.000         4           8             15
12        4          19            27.750         9           0             17
13        1          17            33.100        10           0             14
14        1          15            41.333        13           0             16
15        2           5            51.833         3          11             18
16        1           3            64.500        14           5             19
17        4          18            79.667        12           0             18
18        2           4           172.667        15          17             19
19        1           2           328.600        16          18              0
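A schedule of this kind can be approximated with a short pure-Python sketch of Ward's method. It assumes the schedule was produced by Ward's method on squared Euclidean distances (the dendrogram title suggests this), and it treats the coefficient as the cumulative within-cluster sum of squares. The function name `ward_schedule` is hypothetical, and equally cheap merges may come out in a different order than in the SPSS table.

```python
DATA = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3],
    [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4],
    [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4], [2, 4, 3, 3, 6, 3],
    [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4],
    [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3],
    [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]  # cases 1..20, variables V1..V6

def ward_schedule(points):
    # Each cluster is keyed by its lowest case number: (size, centroid).
    clusters = {i + 1: (1, [float(v) for v in p]) for i, p in enumerate(points)}
    cumulative = 0.0
    schedule = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                na, ca = clusters[a]
                nb, cb = clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                increase = na * nb / (na + nb) * d2
                if best is None or increase < best[0]:
                    best = (increase, a, b)
        increase, a, b = best
        na, ca = clusters[a]
        nb, cb = clusters.pop(b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a] = (na + nb, centroid)
        cumulative += increase
        schedule.append((a, b, cumulative))
    return schedule

schedule = ward_schedule(DATA)
for stage, (a, b, coeff) in enumerate(schedule, start=1):
    print(f"{stage:2d}  {a:2d}  {b:2d}  {coeff:8.3f}")
```

The last coefficient equals the total sum of squares of the data around the overall mean, which for the raw data above is 328.6, in line with the final stage of the schedule.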

[Figure: Vertical icicle plot. Columns: case number. Rows: number of clusters, from 1 to 19.]

[Figure: Dendrogram using Ward's method. Horizontal axis: rescaled distance at which clusters combine, from 0 to 25. Rows: the 20 cases, by case label and sequence number.]

Cluster Membership

Case    4 Clusters   3 Clusters   2 Clusters
 1      1            1            1
 2      2            2            2
 3      1            1            1
 4      3            3            2
 5      2            2            2
 6      1            1            1
 7      1            1            1
 8      1            1            1
 9      2            2            2
10      3            3            2
11      2            2            2
12      1            1            1
13      2            2            2
14      3            3            2
15      1            1            1
16      3            3            2
17      1            1            1
18      4            3            2
19      3            3            2
20      2            2            2

Number of Members per Cluster

Cluster   4 clusters   3 clusters   2 clusters
1         8            8            8
2         6            6            12
3         5            6
4         1

Decide on the Number of Clusters

Guidelines for deciding on the number of clusters:

• Theoretical, conceptual, or practical considerations may suggest a certain number of clusters.

• In hierarchical clustering, the distances at which clusters are combined can be used as a criterion. This information can be obtained from the agglomeration schedule or from the dendrogram.

• In nonhierarchical clustering, the ratio of within-group variance to between-group variance can be plotted against the number of clusters. The point at which an elbow or a sharp bend occurs indicates an appropriate number of clusters.

• The relative sizes of the clusters should be meaningful. A simple frequency count of the Cluster Membership table shows that a three-cluster solution results in clusters with eight, six, and six elements. In a four-cluster solution, however, the cluster sizes are eight, six, five, and one. It is not meaningful to have a cluster with only one case.
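The frequency count described in the last guideline can be sketched in a few lines of Python. The three lists copy the columns of the Cluster Membership table for cases 1 to 20; the variable names and the rule of flagging any single-case cluster are illustrative assumptions.

```python
from collections import Counter

# Cluster membership of cases 1..20, copied from the Cluster Membership table.
membership = {
    4: [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 4, 3, 2],
    3: [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2],
    2: [1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2],
}

sizes = {k: Counter(labels) for k, labels in membership.items()}

for k in sorted(sizes, reverse=True):
    counts = [sizes[k][c] for c in sorted(sizes[k])]
    # A cluster holding a single case is hard to interpret as a segment.
    note = " (contains a single-case cluster)" if min(counts) == 1 else ""
    print(f"{k} clusters: sizes {counts}{note}")
```

With these columns, the four-cluster solution has sizes 8, 6, 5, and 1, the three-cluster solution 8, 6, and 6, and the two-cluster solution 8 and 12.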

Cluster Centroids

Means of Variables

Cluster No.   V1      V2      V3      V4      V5      V6
1             5.750   3.625   6.000   3.125   1.750   3.875
2             1.667   3.000   1.833   3.500   5.500   3.333
3             3.500   5.833   3.333   6.000   3.500   6.000

The cluster centroid values can be obtained from the K-Means Cluster procedure (see Final Cluster Centers).

Computing Cluster Centroids with MS Excel

Resp. No.            V1     V2     V3     V4     V5     V6     Cluster membership
 1                   6      4      7      3      2      3      1
 3                   7      2      6      4      1      3      1
 6                   6      4      6      3      3      4      1
 7                   5      3      6      3      3      4      1
 8                   7      3      7      4      1      4      1
12                   5      4      5      4      2      4      1
15                   6      5      4      2      1      4      1
17                   4      4      7      2      2      5      1
Cluster 1 centroid   5.75   3.63   6.00   3.13   1.88   3.88

 2                   2      3      1      4      5      4      2
 5                   1      3      2      2      6      4      2
 9                   2      4      3      3      6      3      2
11                   1      3      2      3      5      3      2
13                   2      2      1      5      4      4      2
20                   2      3      2      4      7      2      2
Cluster 2 centroid   1.67   3.00   1.83   3.50   5.50   3.33

 4                   4      6      4      5      3      6      3
10                   3      5      3      6      4      6      3
14                   4      6      4      6      4      7      3
16                   3      5      4      6      4      7      3
18                   3      7      2      6      4      3      3
19                   4      6      3      7      2      7      3
Cluster 3 centroid   3.50   5.83   3.33   6.00   3.50   6.00
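The same spreadsheet calculation can be sketched in Python: group the cases by cluster membership and average each variable within the group. The data and the three-cluster membership are copied from the tables above; the names `DATA`, `MEMBERSHIP`, and `centroids` are illustrative.

```python
DATA = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3],
    [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4],
    [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4], [2, 4, 3, 3, 6, 3],
    [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4],
    [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3],
    [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]  # cases 1..20, variables V1..V6

# Three-cluster membership of cases 1..20.
MEMBERSHIP = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2]

centroids = {}
for cluster in sorted(set(MEMBERSHIP)):
    rows = [row for row, label in zip(DATA, MEMBERSHIP) if label == cluster]
    # zip(*rows) yields one column per variable; average each column.
    centroids[cluster] = [sum(col) / len(rows) for col in zip(*rows)]

for cluster, center in centroids.items():
    print(cluster, [round(v, 2) for v in center])
```

With the raw data as listed, the V5 mean for cluster 1 comes out as 1.875, which agrees with the 1.88 in the spreadsheet table.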

Interpret and Profile the Clusters

From the Cluster Centroids table:

• In Cluster 1, V1 (Shopping is fun) and V3 (I combine shopping with eating out) have relatively high values, so this cluster can be labeled "fun-loving and concerned shoppers".

• In Cluster 2, V5 (I don't care about shopping) has a relatively high value, so this cluster can be labeled "apathetic shoppers".

• In Cluster 3, V2 (Shopping is bad for my budget), V4 (I try to get the best buys while shopping), and V6 (You can save a lot of money by comparing prices) have relatively high values, so this cluster can be labeled "economical shoppers".
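One possible way to make "relatively high" concrete is to compare each centroid value with the overall mean of that variable. The sketch below uses the values from the Cluster Centroids table and the cluster sizes 8, 6, and 6. The cutoff of 1.0 scale point is an assumption chosen only for this illustration, not a rule from the original material.

```python
NAMES = ["V1", "V2", "V3", "V4", "V5", "V6"]

# Values from the Cluster Centroids table.
centroids = {
    1: [5.750, 3.625, 6.000, 3.125, 1.750, 3.875],
    2: [1.667, 3.000, 1.833, 3.500, 5.500, 3.333],
    3: [3.500, 5.833, 3.333, 6.000, 3.500, 6.000],
}
sizes = {1: 8, 2: 6, 3: 6}

THRESHOLD = 1.0  # assumed cutoff: at least one scale point above the overall mean

n = sum(sizes.values())
# Overall mean of each variable as the size-weighted average of the centroids.
overall = [
    sum(sizes[c] * centroids[c][j] for c in centroids) / n
    for j in range(len(NAMES))
]

high = {
    c: [NAMES[j] for j in range(len(NAMES))
        if centroids[c][j] - overall[j] > THRESHOLD]
    for c in centroids
}

for c in sorted(high):
    print(f"Cluster {c}: relatively high on {', '.join(high[c])}")
```

Under this assumed cutoff, the variables flagged for each cluster are the same ones used to name the clusters above: V1 and V3 for Cluster 1, V5 for Cluster 2, and V2, V4, and V6 for Cluster 3.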

Assess Reliability and Validity

Formal procedures for assessing the reliability and validity of clustering results are complex. The following procedures provide adequate checks on the quality of the clustering results:

1. Perform cluster analysis on the same data using different distance measures. Compare the results across measures to determine the stability of the solutions.

2. Use different methods of clustering and compare the results.

3. Split the data randomly into halves. Perform clustering separately on each half. Compare cluster centroids across the two subsamples.

4. Delete variables randomly. Perform clustering based on the reduced set of variables. Compare the results with those obtained by clustering based on the entire set of variables.
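Check 2 can be sketched by cross-tabulating the memberships from two methods. The snippet below compares the three-cluster hierarchical solution with the nonhierarchical (K-means) case listing reported in this deck. Cluster numbers are arbitrary labels, so agreement means that each cluster of one solution falls entirely within one cluster of the other. The variable names are illustrative.

```python
from collections import Counter

# Membership of cases 1..20 in the three-cluster hierarchical solution.
hierarchical = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2]
# Membership of cases 1..20 from the nonhierarchical case listing.
nonhierarchical = [3, 2, 3, 1, 2, 3, 3, 3, 2, 1, 2, 3, 2, 1, 3, 1, 3, 1, 1, 2]

# Count how many cases share each (hierarchical, nonhierarchical) label pair.
crosstab = Counter(zip(hierarchical, nonhierarchical))
for (h, k), count in sorted(crosstab.items()):
    print(f"hierarchical {h} / nonhierarchical {k}: {count} cases")

# The solutions agree when every hierarchical cluster maps to exactly one
# nonhierarchical cluster, that is, when the cross-tabulation has as many
# non-empty cells as there are clusters.
same_partition = len(crosstab) == len(set(hierarchical)) == len(set(nonhierarchical))
print("Same partition:", same_partition)
```

For these two solutions the cross-tabulation has only three non-empty cells (8, 6, and 6 cases), so both methods produce the same grouping under different cluster numbers.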

Results of Nonhierarchical Clustering

Initial Cluster Centers

Cluster   V1       V2       V3       V4       V5       V6
1         4.0000   6.0000   3.0000   7.0000   2.0000   7.0000
2         2.0000   3.0000   2.0000   4.0000   7.0000   2.0000
3         7.0000   2.0000   6.0000   4.0000   1.0000   3.0000

Classification Cluster Centers

Cluster   V1       V2       V3       V4       V5       V6
1         3.8135   5.8992   3.2522   6.4891   2.5149   6.6957
2         1.8507   3.0234   1.8327   3.7864   6.4436   2.5056
3         6.3558   2.8356   6.1576   3.6736   1.3047   3.2010

Case Listing of Cluster Membership

Case ID   Cluster   Distance      Case ID   Cluster   Distance
 1        3         1.780          2        2         2.254
 3        3         1.174          4        1         1.882
 5        2         2.525          6        3         2.340
 7        3         1.862          8        3         1.410
 9        2         1.843         10        1         2.112
11        2         1.923         12        3         2.400
13        2         3.382         14        1         1.772
15        3         3.605         16        1         2.137
17        3         3.760         18        1         4.421
19        1         0.853         20        2         0.813

Final Cluster Centers

Cluster   V1       V2       V3       V4       V5       V6
1         3.5000   5.8333   3.3333   6.0000   3.5000   6.0000
2         1.6667   3.0000   1.8333   3.5000   5.5000   3.3333
3         5.7500   3.6250   6.0000   3.1250   1.7500   3.8750

Distances between Final Cluster Centers

Cluster   1        2        3
1         0.0000
2         5.5678   0.0000
3         5.7353   6.9944   0.0000

Analysis of Variance

Variable   Cluster MS   df   Error MS   df   F         p
V1         29.1083      2    0.6078     17   47.8879   .000
V2         13.5458      2    0.6299     17   21.5047   .000
V3         31.3917      2    0.8333     17   37.6700   .000
V4         15.7125      2    0.7279     17   21.5848   .000
V5         24.1500      2    0.7353     17   32.8440   .000
V6         12.1708      2    1.0711     17   11.3632   .001

Number of Cases in Each Cluster

Cluster   Unweighted Cases   Weighted Cases
1         6                  6
2         6                  6
3         8                  8
Missing   0
Total     20                 20
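A nonhierarchical run of this kind can be approximated with a plain K-means sketch in Python. It starts from the initial cluster centers listed in this deck and alternates between assigning each case to its nearest center and recomputing the centers. This is a basic batch version written for illustration, not the SPSS procedure itself, and the function names are hypothetical.

```python
import math

DATA = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3],
    [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4],
    [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4], [2, 4, 3, 3, 6, 3],
    [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4],
    [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3],
    [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]  # cases 1..20, variables V1..V6

INITIAL_CENTERS = [
    [4, 6, 3, 7, 2, 7],  # cluster 1
    [2, 3, 2, 4, 7, 2],  # cluster 2
    [7, 2, 6, 4, 1, 3],  # cluster 3
]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centers, max_iter=20):
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every case.
        labels = [min(range(len(centers)), key=lambda k: sqdist(p, centers[k]))
                  for p in points]
        # Update step: each center becomes the mean of its assigned cases.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[k])
        if new_centers == centers:  # no change, so the solution is stable
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans(DATA, INITIAL_CENTERS)
cluster_sizes = [labels.count(k) for k in range(len(centers))]
print("Cluster sizes:", cluster_sizes)
for k, center in enumerate(centers, start=1):
    print(k, [round(v, 4) for v in center])
print("Distance between centers 1 and 2:",
      round(math.sqrt(sqdist(centers[0], centers[1])), 4))
```

On the raw data as listed, this sketch yields clusters of 6, 6, and 8 cases. The V5 mean of the third cluster comes out as 1.875 rather than the 1.7500 shown in the Final Cluster Centers table, so the reported output may have been produced from slightly different data.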
