Introduction to Data Mining

Data Mining: Concepts and Techniques

Chapter 1

Syahril Efendi, S.Si., MIT

Department of Mathematics & Department of Computer Science

FMIPA USU

October 10, 2012

Chapter 1. Introduction

Why Data Mining?

What Is Data Mining?

A Multi-Dimensional View of Data Mining

What Kinds of Data Can Be Mined?

What Kinds of Patterns Can Be Mined?

What Technologies Are Used?

What Kinds of Applications Are Targeted?

Major Issues in Data Mining

A Brief History of Data Mining and the Data Mining Society

Summary

Why Data Mining?

The explosive growth of data: from terabytes to petabytes

Data collection and data availability

Automated data collection tools, database systems, the Web, a computerized society

Major sources of abundant data

Business: Web, e-commerce, transactions, stocks, …

Science: remote sensing, bioinformatics, scientific simulation, …

Society: news, digital cameras, YouTube

We are drowning in data but starving for knowledge

"Necessity is the mother of invention"

Data mining: automated analysis of massive data sets

Evolution of Sciences

Before 1600: empirical science

1600-1950: theoretical science

Each discipline has grown a theoretical component. Theoretical models are often motivated by experiments and generalize our understanding.

1950-1990: computational science

Over the last 50 years, several disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).

Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.

1990-present: data science

The flood of data from new scientific instruments and simulations

The ability to economically store and manage petabytes of data online

The Internet and computing networks that make all these archives universally accessible

Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!

Evolution of Database Technology

1960s:

Data collection, database creation, IMS and network DBMS

1970s:

Relational data model, relational DBMS implementation

1980s:

RDBMS, advanced data models (extended-relational, OO, deductive, etc.)

1990s:

Data mining, data warehousing, multimedia databases, and Web databases

2000s:

Stream data management and mining

Data mining and its applications

Web technology (XML, data integration) and global information systems

What Is Data Mining?

Data mining (knowledge discovery from data)

Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data

Data mining: a misnomer?

Alternative names

Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Watch out: not everything is "data mining". These are not:

Simple search and query processing

(Deductive) expert systems

Knowledge Discovery (KDD) Process

This is a view from the typical database systems and data warehousing communities

Data mining plays an essential role in the knowledge discovery process

[Figure: the KDD process. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

Example: A Web Mining Framework

Web mining usually involves

Data cleaning

Data integration from multiple sources

Warehousing the data

Data cube construction

Data selection for data mining

Data mining

Presentation of the mining results

Patterns and knowledge to be used or stored into a knowledge base
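The steps listed above can be chained into a small pipeline. The following is only a minimal sketch: the record layout, the function names, and the use of simple counting as the "mining" step are all assumptions made for illustration, not part of the original slides.

```python
from collections import Counter

# Hypothetical raw click records from two sources; None marks a missing value.
source_a = [{"user": "u1", "page": "/home"}, {"user": "u2", "page": None}]
source_b = [{"user": "u1", "page": "/cart"}, {"user": "u3", "page": "/home"}]


def clean(records):
    """Data cleaning: drop records that have a missing field."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: merge records from several sources into one list."""
    return [r for src in sources for r in src]


def select(records, field):
    """Data selection: keep only the task-relevant field."""
    return [r[field] for r in records]


def mine(values):
    """Data mining (a trivial stand-in): count how often each value occurs."""
    return Counter(values)


def present(pattern_counts, min_count):
    """Presentation: keep patterns seen at least min_count times, most frequent first."""
    kept = [(p, c) for p, c in pattern_counts.items() if c >= min_count]
    return sorted(kept, key=lambda pc: (-pc[1], pc[0]))


warehouse = integrate(clean(source_a), clean(source_b))
patterns = present(mine(select(warehouse, "page")), min_count=2)
print(patterns)
```

Each stage takes the previous stage's output, so any single stage can be swapped for a real implementation without touching the others.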

Data Mining in Business Intelligence

[Figure: a pyramid of layers; the potential to support business decisions increases toward the top. Layers from top to bottom:]

Decision Making

Data Presentation: Visualization Techniques

Data Mining: Information Discovery

Data Exploration: Statistical Summary, Querying, and Reporting

Data Preprocessing/Integration, Data Warehouses

Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems

Roles shown beside the pyramid, from top to bottom: End User, Business Analyst, Data Analyst, DBA

Example: Mining vs. Data Exploration

Business intelligence view

Warehouse, data cube, reporting, but not much mining

Business objects vs. data mining tools

Supply chain example: tools

Data presentation

Exploration

KDD Process: A Typical View from ML and Statistics

[Figure: Input Data → Data Pre-Processing → Data Mining → Post-Processing]

This is a view from the typical machine learning and statistics communities

Data pre-processing: data integration, normalization, feature selection, dimension reduction

Data mining: pattern discovery, association & correlation, classification, clustering, outlier analysis, …

Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
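Two of the pre-processing steps named above, normalization and feature selection, can be illustrated in a few lines. This is a sketch under stated assumptions: min-max scaling and a variance threshold are just one common choice for each step, and the table and threshold value are made up.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def variance(column):
    """Population variance of a list of numbers."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)


def select_features(table, min_variance):
    """Keep the names of columns whose normalized variance is at least min_variance."""
    return [name for name, col in table.items()
            if variance(min_max_normalize(col)) >= min_variance]


# Hypothetical table: column name -> values
table = {
    "age": [20, 30, 40, 50],
    "income": [1000, 1000, 1000, 1000],   # constant, carries no information
}
normalized_age = min_max_normalize(table["age"])
selected = select_features(table, min_variance=0.01)
print(normalized_age, selected)
```

Normalizing before measuring variance keeps columns with large units from dominating the selection.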

Example: Medical Data Mining

Health care and medical data mining often adopts this view from statistics and machine learning

Pre-processing of the data (including feature extraction and dimension reduction)

Classification and/or clustering processes

Post-processing for presentation

Multi-Dimensional View of Data Mining

Data to be mined

Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media, graphs & social and information networks

Knowledge to be mined (or: data mining functions)

Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.

Descriptive vs. predictive data mining

Multiple/integrated functions and mining at multiple levels

Techniques utilized

Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc.

Applications

Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

Data Mining: On What Kinds of Data?

Database-oriented data sets and applications

Relational database, data warehouse, transactional database

Advanced data sets and advanced applications

Data streams and sensor data

Time-series data, temporal data, sequence data (incl. bio-sequences)

Structure data, graphs, social networks and multi-linked data

Object-relational databases

Heterogeneous databases and legacy databases

Spatial data and spatiotemporal data

Multimedia database

Text databases

The World-Wide Web

Data Mining Function: (1) Generalization

Information integration and data warehouse construction

Data cleaning, transformation, integration, and multidimensional data model

Data cube technology

Multidimensional aggregates

OLAP (online analytical processing)

Multidimensional concept description: characterization and discrimination

Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
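To make the idea of multidimensional aggregates concrete, here is a minimal sketch that sums one measure over every subset of the dimensions (every cuboid of a small data cube). The fact table, the dimension names, and the function `build_cube` are hypothetical, and a real OLAP engine would not materialize a cube this naively.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (region, season, rainfall_mm)
facts = [
    ("north", "dry", 10),
    ("north", "wet", 90),
    ("south", "dry", 20),
    ("south", "wet", 80),
]
DIMENSIONS = ("region", "season")


def build_cube(rows):
    """Sum the measure for every subset of dimensions (every cuboid)."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), k):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)   # group-by key for this cuboid
                cuboid[key] += row[-1]              # last field is the measure
            names = tuple(DIMENSIONS[d] for d in dims)
            cube[names] = dict(cuboid)
    return cube


cube = build_cube(facts)
print(cube[()])             # grand total (apex cuboid)
print(cube[("season",)])    # roll-up over region: dry vs. wet totals
```

Looking up `cube[("season",)]` contrasts dry and wet totals directly, which is the kind of summarized comparison the slide describes.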

Data Mining Function: (2) Association and Correlation Analysis

Frequent patterns (or frequent itemsets)

What items are frequently purchased together in a store?

Association, correlation vs. causality

A typical association rule is annotated with two measures (support, confidence)

Are strongly associated items also strongly correlated?

How to mine such patterns and rules efficiently in large data sets?

How to use such patterns for classification, clustering, and other applications?
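Support and confidence, and the difference between association and correlation, are easiest to see on a toy example. The sketch below uses made-up transactions; `lift` is included as one common correlation measure, which is an assumption since the slide does not name a specific one.

```python
# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support; 1.0 means independent."""
    return confidence(antecedent, consequent) / support(consequent)


rule = ({"diapers"}, {"beer"})
print(support(rule[0] | rule[1]), confidence(*rule), lift(*rule))
```

A rule can have high confidence simply because its consequent is common; a lift close to 1.0 signals that the items are associated but not meaningfully correlated.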

Data Mining Function: (3) Classification

Classification and label prediction

Construct models (functions) based on some training examples

Describe and distinguish classes or concepts for future prediction

E.g., classify countries based on climate, or classify cars based on fuel mileage

Predict some unknown class labels

Typical methods

Decision trees, Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, …

Typical applications

Credit card fraud detection, direct marketing, classifying stars, diseases, web pages, …
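As a rough illustration of building a model from training examples and then predicting unknown labels, the sketch below trains a decision stump, that is, a decision tree with a single split. The data, the class names, and the functions `train_stump` and `predict` are hypothetical; the stump assumes one numeric feature with at least two distinct values.

```python
def train_stump(values, labels):
    """Pick the threshold on one numeric feature that minimizes training errors.

    The stump predicts high_label when value > threshold, low_label otherwise.
    Assumes values contains at least two distinct numbers.
    """
    classes = sorted(set(labels))
    points = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in candidates:
        for low_label in classes:
            for high_label in classes:
                errors = sum(
                    (high_label if v > threshold else low_label) != y
                    for v, y in zip(values, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, low_label, high_label)
    return best[1:]


def predict(stump, value):
    """Apply a trained stump to one new value."""
    threshold, low_label, high_label = stump
    return high_label if value > threshold else low_label


# Hypothetical training data: fuel use in litres per 100 km -> car class
fuel_use = [4.5, 5.0, 5.5, 9.0, 10.5, 12.0]
car_class = ["economy", "economy", "economy", "thirsty", "thirsty", "thirsty"]

stump = train_stump(fuel_use, car_class)
print(stump, predict(stump, 6.0), predict(stump, 11.0))
```

Full decision trees apply the same split search recursively to each side of the threshold.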

Data Mining Function: (4) Cluster Analysis

Unsupervised learning (i.e., the class label is unknown)

Group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns

Principle: maximize intra-class similarity & minimize interclass similarity

Many methods and applications
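One of the many clustering methods is k-means, which follows the principle above by repeatedly assigning each point to its nearest centroid. The slides do not single out k-means, so treat this as an illustrative choice; the house coordinates are made up, and seeding with the first k points is a simplification chosen to keep the run deterministic.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        assignment = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignment


# Hypothetical house locations forming two obvious groups
houses = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, labels = kmeans(houses, k=2)
print(centroids, labels)
```

No class labels are supplied anywhere; the cluster numbers in `labels` are produced by the algorithm itself.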

Data Mining Function: (5) Outlier Analysis

Outlier analysis

Outlier: a data object that does not comply with the general behavior of the data

Noise or exception? One person's garbage could be another person's treasure

Useful in fraud detection and rare events analysis
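A simple way to flag objects that do not follow the general behavior of the data is a robust score based on the median. The sketch below is one possible approach, not the method from the slides: the constant 0.6745 and the cutoff 3.5 follow a commonly used convention for the modified z-score, and the payment amounts are invented.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds threshold in magnitude."""
    center = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect
    mad = median([abs(v - center) for v in values])
    if mad == 0:          # at least half the values equal the median; no usable scale
        return []
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]


# Hypothetical card payments; one amount is far from the rest
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = mad_outliers(amounts)
print(outliers)
```

Whether a flagged value is noise to discard or an exception worth investigating, such as a fraudulent payment, is a judgment the score alone cannot make.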

Time and Ordering: Sequential Pattern, Trend and Evolution Analysis

Sequence, trend and evolution analysis

Trend, time-series, and deviation analysis: e.g., regression and value prediction

Sequential pattern mining

E.g., first buy a digital camera, …

Periodicity analysis

Motifs and biological sequence analysis

Approximate and consecutive motifs

Similarity-based analysis

Mining data streams

Ordered, time-varying, potentially infinite data streams
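Regression and value prediction on a time series can be sketched with an ordinary least-squares line fit. The monthly figures below are made up and perfectly linear on purpose, so the fitted trend is easy to check by hand; the function name `fit_line` is an assumption for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales figures
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept   # predicted value for month 6
print(slope, intercept, forecast)
```

Deviation analysis can reuse the same fit: observations that fall far from the fitted line are candidates for closer inspection.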

Structure and Network Analysis

Graph mining

Finding frequent subgraphs (e.g., chemical compounds), trees (XML), substructures (web fragments)

Information network analysis

Social networks: actors (objects, nodes) and relationships (edges)

E.g., author networks in CS, terrorist networks

Multiple heterogeneous networks

A person could belong to multiple information networks: friends, family, classmates, …

Links carry a lot of semantic information: link mining

Web mining

The Web is a big information network: from PageRank to Google

Analysis of Web information networks

Web community discovery, opinion mining, usage mining, …
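The idea behind PageRank, ranking pages by the links that point to them, can be sketched with a short power iteration. This is a simplified illustration and not Google's production algorithm: the four-page graph is hypothetical, the damping factor 0.85 is the commonly quoted default, and every link target is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web graph
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
top_page = max(ranks, key=ranks.get)
print(top_page, ranks)
```

Page "c" collects links from three other pages and ends up on top, while "d", which nothing links to, keeps only the baseline share.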

Evaluation of Knowledge

Is all mined knowledge interesting?

One can mine a tremendous amount of patterns and knowledge

Some may fit only certain dimension space (time, location, …)

Some may not be representative, may be transient, …

Evaluation of mined knowledge → directly mine only interesting knowledge?

Descriptive vs. predictive

Coverage

Typicality vs. novelty

Accuracy

Timeliness

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by Machine Learning, Statistics, Pattern Recognition, Applications, Algorithm, High-Performance Computing, Visualization, and Database Technology]

Why Confluence of Multiple Disciplines?

Tremendous amount of data

Algorithms must be highly scalable to handle data at the scale of terabytes

High-dimensionality of data

Micro-array may have tens of thousands of dimensions

High complexity of data

Data streams and sensor data

Time-series data, temporal data, sequence data

Structure data, graphs, social networks and multi-linked data

Heterogeneous databases and legacy databases

Spatial, spatiotemporal, multimedia, text and Web data

Software programs, scientific simulations

New and sophisticated applications

Applications of Data Mining

Web page analysis: from web page classification, clustering to PageRank & HITS algorithms

Collaborative analysis & recommender systems

Basket data analysis to targeted marketing

Biological and medical data analysis: classification, cluster analysis (microarray data analysis), biological sequence analysis, biological network analysis

Data mining and software engineering (e.g., IEEE Computer, Aug. 2009 issue)

From major dedicated data mining systems/tools (e.g., SAS, MS SQL-Server Analysis Manager, Oracle Data Mining Tools) to invisible data mining

Major Issues in Data Mining (1)

Mining Methodology

Mining various and new kinds of knowledge

Mining knowledge in multi-dimensional space

Data mining: An interdisciplinary effort

 


Handling noise, uncertainty, and incompleteness of data

Pattern evaluation and pattern- or constraint-guided mining

User Interaction

Interactive mining

Incorporation of background knowledge

Presentation and visualization of data mining results

Major Issues in Data Mining (2)

Efficiency and Scalability

Efficiency and scalability of data mining algorithms

Parallel, distributed, stream, and incremental mining methods

Diversity of data types 


Mining dynamic, networked, and global data repositories

Data mining and society

Social impacts of data mining

Privacy-preserving data mining

Invisible data mining

A Brief History of Data Mining Society

1989 IJCAI Workshop on Knowledge Discovery in Databases

Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)

1991-1994 Workshops on Knowledge Discovery in Databases

Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)

1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD’95-98)

Journal of Data Mining and Knowledge Discovery (1997)

ACM SIGKDD conferences since 1998 and SIGKDD Explorations

More conferences on data mining

PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.

ACM Transactions on KDD starting in 2007

Conferences and Journals on Data Mining

KDD Conferences

ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)

SIAM Data Mining Conf. (SDM)

(IEEE) Int. Conf. on Data Mining (ICDM)

European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)

Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)

Int. Conf. on Web Search and Data Mining (WSDM)

Other related conferences

DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT, …

Web and IR conferences: WWW, SIGIR, WSDM

ML conferences: ICML, NIPS

PR conferences: CVPR, …

Journals

Data Mining and Knowledge Discovery (DAMI or DMKD)

IEEE Trans. on Knowledge and Data Eng. (TKDE)

KDD Explorations

ACM Trans. on KDD

Where to Find References? DBLP, CiteSeer, Google

Data mining and KDD (SIGKDD: CDROM)

Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.

Journals: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD

Database systems (SIGMOD: ACM SIGMOD Anthology, CD ROM)

Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA

Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.

AI & Machine Learning

Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.

Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.

Web and IR

Conferences: SIGIR, WWW, CIKM, etc.

Journals: WWW: Internet and Web Information Systems

Statistics

Conferences: Joint Stat. Meeting, etc.

Journals: Annals of Statistics, etc.

Visualization

Conference proceedings: CHI, ACM-SIGGraph, etc.

Journals: IEEE Trans. Visualization and Computer Graphics, etc.

Recommended Reference Books

S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002

R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed. Wiley-Interscience, 2000

T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003

U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996

U. Fayyad, G. Grinstein, and A. Wierse. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001

J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2nd ed. (3rd ed. 2011)

D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer-Verlag, 2009

B. Liu. Web Data Mining. Springer, 2006

T. M. Mitchell. Machine Learning. McGraw Hill, 1997

G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991

P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Wiley, 2005

S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, 1998

I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed. Morgan Kaufmann, 2005

Summary

Data mining: discovering interesting patterns and knowledge from massive amounts of data

A natural evolution of database technology, in great demand, with wide applications

A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation

Mining can be performed on a variety of data

Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

Data mining technologies and applications

Major issues in data mining