Lecture 5: Estimation Part 2

DESCRIPTION

Copyright by Ahmad Hidayatno, ST, MT. Teknik Elektro, Undip Semarang

TRANSCRIPT

  • Estimation Part II

    Problems of Dimensionality
    Computational Complexity
    Component Analysis and Discriminants
    Hidden Markov Models
    Hidden Markov Model: Extension of Markov Chains
  • Component Analysis and Discriminants

    Combine features in order to reduce the dimension of the feature space
    Linear combinations are simple to compute and tractable
    Project high-dimensional data onto a lower-dimensional space
    Two classical approaches for finding an optimal linear transformation:
    PCA (Principal Component Analysis): the projection that best represents the data in a least-squares sense (see the sketch below)
    MDA (Multiple Discriminant Analysis): the projection that best separates the data in a least-squares sense
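  • As a rough illustration of the PCA idea above, here is a minimal sketch in plain NumPy (the data and the function name are made up for illustration): it projects centered data onto the k eigenvectors of the sample covariance matrix with the largest eigenvalues, i.e. the directions that best represent the data in a least-squares sense.

    import numpy as np

    def pca_project(X, k):
        """Project rows of X onto the k principal directions."""
        Xc = X - X.mean(axis=0)            # center the data
        cov = np.cov(Xc, rowvar=False)     # d x d sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]  # sort by decreasing eigenvalue
        W = eigvecs[:, order[:k]]          # d x k projection matrix
        return Xc @ W                      # n x k reduced representation

    # Hypothetical example: 100 points in 5 dimensions reduced to 2
    X = np.random.randn(100, 5)
    print(pca_project(X, k=2).shape)       # (100, 2)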

  • Hidden Markov Models: Markov Chains

    Goal: make a sequence of decisions
    Processes that unfold in time: the state at time t is influenced by the state at time t-1
    Applications: speech recognition, gesture recognition, part-of-speech tagging, DNA sequencing, and any temporal process without memory

    ω^T = {ω(1), ω(2), ω(3), ..., ω(T)}: a sequence of T states

    For example, we might have ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4}

    The system can revisit a state at different steps, and not every state needs to be visited


  • First-order Markov models

    The production of any sequence is described by the transition probabilities:

    P(ω_j(t + 1) | ω_i(t)) = a_ij


  • Denote the model (the set of transition probabilities a_ij) by θ. For the example sequence ω^6 above:

    P(ω^6 | θ) = a_14 · a_42 · a_22 · a_21 · a_14 · P(ω(1) = ω_1)

    Example: speech recognition

    Production of spoken words

    Production of the word "pattern", represented by phonemes:

    /p/ /a/ /tt/ /er/ /n/ followed by a final silent state

    Transitions from /p/ to /a/, /a/ to /tt/, /tt/ to /er/, /er/ to /n/, and /n/ to the silent state
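  • To make the formula above concrete, here is a minimal sketch (a made-up 4-state transition matrix and uniform initial probabilities; state indices are 0-based) that evaluates P(ω^6 | θ) = a_14 · a_42 · a_22 · a_21 · a_14 · P(ω(1) = ω_1) for the example sequence ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4}.

    import numpy as np

    # Hypothetical first-order Markov chain with 4 states.
    # A[i, j] = a_ij = P(state j at t+1 | state i at t); each row sums to 1.
    A = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.2, 0.2, 0.1],
                  [0.3, 0.3, 0.2, 0.2],
                  [0.2, 0.4, 0.3, 0.1]])
    pi = np.array([0.25, 0.25, 0.25, 0.25])   # P(omega(1) = omega_i)

    def sequence_probability(states, A, pi):
        """P(omega^T | theta): initial probability times the chain of a_ij."""
        p = pi[states[0]]
        for i, j in zip(states[:-1], states[1:]):
            p *= A[i, j]
        return p

    # omega^6 = {w1, w4, w2, w2, w1, w4} as 0-based indices
    print(sequence_probability([0, 3, 1, 1, 0, 3], A, pi))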

  • Hidden Markov Model (HMM)

    Interaction of the visible states with the hidden states:

    Σ_k b_jk = 1 for all j, where b_jk = P(v_k(t) | ω_j(t))

    Three problems are associated with this model:
    The evaluation problem
    The decoding problem
    The learning problem
  • The evaluation problem

    It is the probability that the model produces a sequence V^T of visible states:

    P(V^T) = Σ_{r=1}^{r_max} P(V^T | ω_r^T) · P(ω_r^T)

    where each r indexes a particular sequence ω_r^T = {ω(1), ω(2), ..., ω(T)} of T hidden states, and

    P(V^T | ω_r^T) = Π_{t=1}^{T} P(v(t) | ω(t))   (1)

    P(ω_r^T) = Π_{t=1}^{T} P(ω(t) | ω(t-1))   (2)
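  • To make the sum over hidden-state sequences concrete, here is a minimal sketch (a hypothetical 3-state, 3-symbol model; all numbers are made up) that evaluates P(V^T) by enumerating every possible hidden-state sequence, exactly as in the definition above; for a three-step observation this is the 3^3 = 27-term sum discussed below.

    import itertools
    import numpy as np

    def evaluate_bruteforce(obs, pi, A, B):
        """P(V^T): sum of P(V^T | omega_r^T) * P(omega_r^T) over all sequences r."""
        n, total = len(pi), 0.0
        for seq in itertools.product(range(n), repeat=len(obs)):
            p = pi[seq[0]] * B[seq[0], obs[0]]
            for t in range(1, len(obs)):
                p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
            total += p
        return total

    # Hypothetical model: 3 hidden states, 3 visible symbols
    pi = np.array([0.5, 0.3, 0.2])
    A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])
    B = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]])
    print(evaluate_bruteforce([0, 1, 2], pi, A, B))   # P({v1, v2, v3})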

  • Using equations (1) and (2), we can write:

    P(V^T) = Σ_{r=1}^{r_max} Π_{t=1}^{T} P(v(t) | ω(t)) · P(ω(t) | ω(t-1))   (3)

    Interpretation: the probability that we observe the particular sequence of T visible states V^T is equal to the sum, over all r_max possible sequences of hidden states, of the conditional probability that the system has made a particular transition multiplied by the probability that it then emitted the visible symbol in our target sequence.

    Example: let ω1, ω2, ω3 be the hidden states and v1, v2, v3 the visible states, with V^3 = {v1, v2, v3} the observed sequence of visible states.

    P({v1, v2, v3}) = P(ω1) · P(v1 | ω1) · P(ω2 | ω1) · P(v2 | ω2) · P(ω3 | ω2) · P(v3 | ω3) + ...

    (the sum contains all 3^3 = 27 possible hidden-state sequences!)

  • First possibility: ω1 (t = 1), ω2 (t = 2), ω3 (t = 3), emitting v1, v2, v3

    Second possibility: ω2 (t = 1), ω3 (t = 2), ω1 (t = 3), emitting v1, v2, v3

    P({v1, v2, v3}) = P(ω2) · P(v1 | ω2) · P(ω3 | ω2) · P(v2 | ω3) · P(ω1 | ω3) · P(v3 | ω1) + ...

    Therefore:

    P({v1, v2, v3}) = Σ over all 3^3 possible sequences of hidden states of Π_{t=1}^{3} P(v(t) | ω(t)) · P(ω(t) | ω(t-1))

    [Figure: trellis diagrams of the two paths above, showing the hidden states at t = 1, 2, 3 and the visible symbols they emit]
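  • The enumeration above has n^T terms (27 already for this tiny example), so in practice the evaluation problem is solved with the forward algorithm, which accumulates the same probability left to right in O(n^2 T) time. A minimal sketch, reusing the same made-up model as in the brute-force example (scaling for numerical stability is omitted for clarity):

    import numpy as np

    def forward(obs, pi, A, B):
        """P(V^T | lambda) via the forward recursion."""
        alpha = pi * B[:, obs[0]]            # alpha_1(j) = pi_j * b_j(v(1))
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]    # sum_i alpha_{t-1}(i) a_ij, times b_j(v(t))
        return alpha.sum()

    pi = np.array([0.5, 0.3, 0.2])
    A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])
    B = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]])
    print(forward([0, 1, 2], pi, A, B))      # matches the brute-force sum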

  • The decoding problem (optimal state sequence)

    Given a sequence of visible states V^T, the decoding problem is to find the most probable sequence of hidden states.

    This problem can be expressed mathematically as: find the single best sequence of hidden states {ω(1), ω(2), ..., ω(T)} such that

    {ω(1), ω(2), ..., ω(T)} = arg max P(ω(1), ..., ω(T), v(1), v(2), ..., v(T) | λ)

    Note that the summation disappears, since we want only the single best case.

  • where λ = [π, A, B]

    π_i = P(ω(1) = ω_i)   (initial state probability)

    A = {a_ij}, a_ij = P(ω(t+1) = ω_j | ω(t) = ω_i)   (transition probabilities)

    B = {b_jk}, b_jk = P(v(t) = v_k | ω(t) = ω_j)   (emission probabilities)

    In the preceding example, this computation corresponds to selecting the best path amongst:

    {ω1(t = 1), ω2(t = 2), ω3(t = 3)}, {ω2(t = 1), ω3(t = 2), ω1(t = 3)}
    {ω3(t = 1), ω1(t = 2), ω2(t = 3)}, {ω3(t = 1), ω2(t = 2), ω1(t = 3)}
    {ω2(t = 1), ω1(t = 2), ω3(t = 3)}

    (a Viterbi sketch follows below)
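  • In practice the decoding problem is solved with the Viterbi dynamic-programming algorithm rather than by enumerating all candidate paths. A minimal sketch follows; the slides only state the arg-max criterion, so the model values below are made up for illustration.

    import numpy as np

    def viterbi(obs, pi, A, B):
        """Most probable hidden-state sequence and its joint probability."""
        T, n = len(obs), len(pi)
        delta = np.zeros((T, n))            # best path probability ending in each state
        psi = np.zeros((T, n), dtype=int)   # back-pointers
        delta[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            for j in range(n):
                scores = delta[t - 1] * A[:, j]
                psi[t, j] = np.argmax(scores)
                delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
        path = [int(np.argmax(delta[-1]))]  # backtrack from the best final state
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t, path[-1]]))
        return list(reversed(path)), float(delta[-1].max())

    pi = np.array([0.5, 0.3, 0.2])
    A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])
    B = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]])
    print(viterbi([0, 1, 2], pi, A, B))     # best path and its probability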

  • The learning problem (parameter estimation)

    This third problem consists of determining a method to adjust the model parameters λ = [π, A, B] to satisfy a certain optimization criterion. We need to find the best model,

    such that the probability of the observation sequence is maximized:

    max_λ P(V^T | λ)

    We use an iterative procedure such as Baum-Welch (an expectation-maximization scheme) or gradient-based optimization to find this local optimum.
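  • The slides name Baum-Welch without spelling it out; below is a minimal sketch of one re-estimation (EM) step for a discrete HMM using the forward-backward quantities. Repeating the step until the likelihood stops improving gives the local optimum referred to above. The sketch omits the numerical scaling a real implementation needs, and all names and values are illustrative assumptions, not the lecture's own code.

    import numpy as np

    def forward_backward(obs, pi, A, B):
        """Forward (alpha) and backward (beta) tables for a discrete HMM."""
        T, n = len(obs), len(pi)
        alpha, beta = np.zeros((T, n)), np.zeros((T, n))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return alpha, beta

    def baum_welch_step(obs, pi, A, B):
        """One EM re-estimation of (pi, A, B); iterating climbs P(V^T | lambda)."""
        obs = np.asarray(obs)
        T, n = len(obs), len(pi)
        alpha, beta = forward_backward(obs, pi, A, B)
        likelihood = alpha[-1].sum()                 # P(V^T | lambda)
        gamma = alpha * beta / likelihood            # P(state at t | V^T)
        xi = np.zeros((T - 1, n, n))                 # P(i at t, j at t+1 | V^T)
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * A
                     * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / likelihood
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for k in range(B.shape[1]):
            new_B[:, k] = gamma[obs == k].sum(axis=0)
        new_B /= gamma.sum(axis=0)[:, None]
        return new_pi, new_A, new_B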

  • Formulas for the three problems

    Evaluation:
    ω_r^T = {ω(1), ω(2), ..., ω(T)}
    P(V^T) = Σ_{r=1}^{r_max} P(V^T | ω_r^T) · P(ω_r^T)
    P(V^T | ω_r^T) = Π_{t=1}^{T} P(v(t) | ω(t))   (1)
    P(ω_r^T) = Π_{t=1}^{T} P(ω(t) | ω(t-1))   (2)
    P(V^T) = Σ_{r=1}^{r_max} Π_{t=1}^{T} P(v(t) | ω(t)) · P(ω(t) | ω(t-1))   (3)
    P({v1, v2, v3}) = Σ over all 3^3 possible sequences of hidden states of Π_{t=1}^{3} P(v(t) | ω(t)) · P(ω(t) | ω(t-1))

    Decoding:
    find {ω(1), ω(2), ..., ω(T)} = arg max P(ω(1), ..., ω(T), v(1), v(2), ..., v(T) | λ)

    Learning:
    max_λ P(V^T | λ), with λ = [π, A, B]