Lecture 5: Estimation Part 2

DESCRIPTION

Copyright by Ahmad Hidayatno, ST, MT. Teknik Elektro, Undip Semarang

TRANSCRIPT

  • Estimation Part II

    Problems of Dimensionality
    Computational Complexity
    Component Analysis and Discriminants
    Hidden Markov Models
    Hidden Markov Model: Extension of Markov Chains
  • Component Analysis and Discriminants

    Combine features in order to reduce the dimension of the feature space
    Linear combinations are simple to compute and tractable
    Project high-dimensional data onto a lower-dimensional space
    Two classical approaches for finding an optimal linear transformation:
    PCA (Principal Component Analysis): the projection that best represents the data in a least-squares sense (see the sketch below)
    MDA (Multiple Discriminant Analysis): the projection that best separates the data in a least-squares sense
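  • As a rough illustration of the PCA idea above, here is a minimal sketch in plain NumPy (the data and the function name are made up for illustration): it projects centered data onto the k eigenvectors of the sample covariance matrix with the largest eigenvalues, i.e. the directions that best represent the data in a least-squares sense.

    import numpy as np

    def pca_project(X, k):
        """Project rows of X onto the k principal directions."""
        Xc = X - X.mean(axis=0)            # center the data
        cov = np.cov(Xc, rowvar=False)     # d x d sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]  # sort by decreasing eigenvalue
        W = eigvecs[:, order[:k]]          # d x k projection matrix
        return Xc @ W                      # n x k reduced representation

    # Hypothetical example: 100 points in 5 dimensions reduced to 2
    X = np.random.randn(100, 5)
    print(pca_project(X, k=2).shape)       # (100, 2)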

  • Hidden Markov Models: Markov Chains

    Goal: make a sequence of decisions
    Processes that unfold in time: the state at time t is influenced by the state at time t-1
    Applications: speech recognition, gesture recognition, part-of-speech tagging, DNA sequencing, and any temporal process without memory

    ω^T = {ω(1), ω(2), ω(3), ..., ω(T)}: a sequence of T states

    For example, we might have ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4}

    The system can revisit a state at different steps, and not every state needs to be visited


  • First-order Markov models

    The production of any sequence is described by the transition probabilities:

    P(ω_j(t + 1) | ω_i(t)) = a_ij


  • Denote the model (the set of transition probabilities a_ij) by θ. For the example sequence ω^6 above:

    P(ω^6 | θ) = a_14 · a_42 · a_22 · a_21 · a_14 · P(ω(1) = ω_1)

    Example: speech recognition

    Production of spoken words

    Production of the word "pattern", represented by phonemes:

    /p/ /a/ /tt/ /er/ /n/ followed by a final silent state

    Transitions from /p/ to /a/, /a/ to /tt/, /tt/ to /er/, /er/ to /n/, and /n/ to the silent state
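  • To make the formula above concrete, here is a minimal sketch (a made-up 4-state transition matrix and uniform initial probabilities; state indices are 0-based) that evaluates P(ω^6 | θ) = a_14 · a_42 · a_22 · a_21 · a_14 · P(ω(1) = ω_1) for the example sequence ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4}.

    import numpy as np

    # Hypothetical first-order Markov chain with 4 states.
    # A[i, j] = a_ij = P(state j at t+1 | state i at t); each row sums to 1.
    A = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.2, 0.2, 0.1],
                  [0.3, 0.3, 0.2, 0.2],
                  [0.2, 0.4, 0.3, 0.1]])
    pi = np.array([0.25, 0.25, 0.25, 0.25])   # P(omega(1) = omega_i)

    def sequence_probability(states, A, pi):
        """P(omega^T | theta): initial probability times the chain of a_ij."""
        p = pi[states[0]]
        for i, j in zip(states[:-1], states[1:]):
            p *= A[i, j]
        return p

    # omega^6 = {w1, w4, w2, w2, w1, w4} as 0-based indices
    print(sequence_probability([0, 3, 1, 1, 0, 3], A, pi))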

  • Hidden Markov Model (HMM)

    Interaction of the visible states with the hidden states:

    Σ_k b_jk = 1 for all j, where b_jk = P(v_k(t) | ω_j(t))

    Three problems are associated with this model:
    The evaluation problem
    The decoding problem
    The learning problem
  • The evaluation problem

    It is the probability that the model produces a sequence V^T of visible states:

    P(V^T) = Σ_{r=1}^{r_max} P(V^T | ω_r^T) · P(ω_r^T)

    where each r indexes a particular sequence ω_r^T = {ω(1), ω(2), ..., ω(T)} of T hidden states, and

    P(V^T | ω_r^T) = Π_{t=1}^{T} P(v(t) | ω(t))   (1)

    P(ω_r^T) = Π_{t=1}^{T} P(ω(t) | ω(t-1))   (2)
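  • To make the sum over hidden-state sequences concrete, here is a minimal sketch (a hypothetical 3-state, 3-symbol model; all numbers are made up) that evaluates P(V^T) by enumerating every possible hidden-state sequence, exactly as in the definition above; for a three-step observation this is the 3^3 = 27-term sum discussed below.

    import itertools
    import numpy as np

    def evaluate_bruteforce(obs, pi, A, B):
        """P(V^T): sum of P(V^T | omega_r^T) * P(omega_r^T) over all sequences r."""
        n, total = len(pi), 0.0
        for seq in itertools.product(range(n), repeat=len(obs)):
            p = pi[seq[0]] * B[seq[0], obs[0]]
            for t in range(1, len(obs)):
                p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
            total += p
        return total

    # Hypothetical model: 3 hidden states, 3 visible symbols
    pi = np.array([0.5, 0.3, 0.2])
    A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])
    B = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]])
    print(evaluate_bruteforce([0, 1, 2], pi, A, B))   # P({v1, v2, v3})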

  • Using equations (1) and (2), we can write:

    P(V^T) = Σ_{r=1}^{r_max} Π_{t=1}^{T} P(v(t) | ω(t)) · P(ω(t) | ω(t-1))   (3)

    Interpretation: the probability that we observe the particular sequence of T visible states V^T is equal to the sum, over all r_max possible sequences of hidden states, of the conditional probability that the system has made a particular transition multiplied by the probability that it then emitted the visible symbol in our target sequence.

    Example: let ω1, ω2, ω3 be the hidden states and v1, v2, v3 the visible states, with V^3 = {v1, v2, v3} the observed sequence of visible states.

    P({v1, v2, v3}) = P(ω1) · P(v1 | ω1) · P(ω2 | ω1) · P(v2 | ω2) · P(ω3 | ω2) · P(v3 | ω3) + ...

    (the sum contains all 3^3 = 27 possible hidden-state sequences!)

  • First possibility: ω1 (t = 1), ω2 (t = 2), ω3 (t = 3), emitting v1, v2, v3

    Second possibility: ω2 (t = 1), ω3 (t = 2), ω1 (t = 3), emitting v1, v2, v3

    P({v1, v2, v3}) = P(ω2) · P(v1 | ω2) · P(ω3 | ω2) · P(v2 | ω3) · P(ω1 | ω3) · P(v3 | ω1) + ...

    Therefore:

    P({v1, v2, v3}) = Σ over all 3^3 possible sequences of hidden states of Π_{t=1}^{3} P(v(t) | ω(t)) · P(ω(t) | ω(t-1))

    [Figure: trellis diagrams of the two paths above, showing the hidden states at t = 1, 2, 3 and the visible symbols they emit]
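  • The enumeration above has n^T terms (27 already for this tiny example), so in practice the evaluation problem is solved with the forward algorithm, which accumulates the same probability left to right in O(n^2 T) time. A minimal sketch, reusing the same made-up model as in the brute-force example (scaling for numerical stability is omitted for clarity):

    import numpy as np

    def forward(obs, pi, A, B):
        """P(V^T | lambda) via the forward recursion."""
        alpha = pi * B[:, obs[0]]            # alpha_1(j) = pi_j * b_j(v(1))
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]    # sum_i alpha_{t-1}(i) a_ij, times b_j(v(t))
        return alpha.sum()

    pi = np.array([0.5, 0.3, 0.2])
    A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])
    B = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]])
    print(forward([0, 1, 2], pi, A, B))      # matches the brute-force sum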

  • The decoding problem (optimal state sequence)

    Given a sequence of visible states V^T, the decoding problem is to find the most probable sequence of hidden states.

    This problem can be expressed mathematically as: find the single best sequence of hidden states {ω(1), ω(2), ..., ω(T)} such that

    {ω(1), ω(2), ..., ω(T)} = arg max P(ω(1), ..., ω(T), v(1), v(2), ..., v(T) | λ)

    Note that the summation disappears, since we want only the single best case.

  • where λ = [π, A, B]

    π_i = P(ω(1) = ω_i)   (initial state probability)

    A = {a_ij}, a_ij = P(ω(t+1) = ω_j | ω(t) = ω_i)   (transition probabilities)

    B = {b_jk}, b_jk = P(v(t) = v_k | ω(t) = ω_j)   (emission probabilities)

    In the preceding example, this computation corresponds to selecting the best path amongst:

    {ω1(t = 1), ω2(t = 2), ω3(t = 3)}, {ω2(t = 1), ω3(t = 2), ω1(t = 3)}
    {ω3(t = 1), ω1(t = 2), ω2(t = 3)}, {ω3(t = 1), ω2(t = 2), ω1(t = 3)}
    {ω2(t = 1), ω1(t = 2), ω3(t = 3)}

    (a Viterbi sketch follows below)
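  • In practice the decoding problem is solved with the Viterbi dynamic-programming algorithm rather than by enumerating all candidate paths. A minimal sketch follows; the slides only state the arg-max criterion, so the model values below are made up for illustration.

    import numpy as np

    def viterbi(obs, pi, A, B):
        """Most probable hidden-state sequence and its joint probability."""
        T, n = len(obs), len(pi)
        delta = np.zeros((T, n))            # best path probability ending in each state
        psi = np.zeros((T, n), dtype=int)   # back-pointers
        delta[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            for j in range(n):
                scores = delta[t - 1] * A[:, j]
                psi[t, j] = np.argmax(scores)
                delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
        path = [int(np.argmax(delta[-1]))]  # backtrack from the best final state
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t, path[-1]]))
        return list(reversed(path)), float(delta[-1].max())

    pi = np.array([0.5, 0.3, 0.2])
    A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])
    B = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]])
    print(viterbi([0, 1, 2], pi, A, B))     # best path and its probability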

  • The learning problem (parameter estimation)

    This third problem consists of determining a method to adjust the model parameters λ = [π, A, B] to satisfy a certain optimization criterion. We need to find the best model,

    such that the probability of the observation sequence is maximized:

    max_λ P(V^T | λ)

    We use an iterative procedure such as Baum-Welch (an expectation-maximization scheme) or gradient-based optimization to find this local optimum.
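  • The slides name Baum-Welch without spelling it out; below is a minimal sketch of one re-estimation (EM) step for a discrete HMM using the forward-backward quantities. Repeating the step until the likelihood stops improving gives the local optimum referred to above. The sketch omits the numerical scaling a real implementation needs, and all names and values are illustrative assumptions, not the lecture's own code.

    import numpy as np

    def forward_backward(obs, pi, A, B):
        """Forward (alpha) and backward (beta) tables for a discrete HMM."""
        T, n = len(obs), len(pi)
        alpha, beta = np.zeros((T, n)), np.zeros((T, n))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return alpha, beta

    def baum_welch_step(obs, pi, A, B):
        """One EM re-estimation of (pi, A, B); iterating climbs P(V^T | lambda)."""
        obs = np.asarray(obs)
        T, n = len(obs), len(pi)
        alpha, beta = forward_backward(obs, pi, A, B)
        likelihood = alpha[-1].sum()                 # P(V^T | lambda)
        gamma = alpha * beta / likelihood            # P(state at t | V^T)
        xi = np.zeros((T - 1, n, n))                 # P(i at t, j at t+1 | V^T)
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * A
                     * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / likelihood
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for k in range(B.shape[1]):
            new_B[:, k] = gamma[obs == k].sum(axis=0)
        new_B /= gamma.sum(axis=0)[:, None]
        return new_pi, new_A, new_B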

  • Formulas for the three problems

    Evaluation:
    ω_r^T = {ω(1), ω(2), ..., ω(T)}
    P(V^T) = Σ_{r=1}^{r_max} P(V^T | ω_r^T) · P(ω_r^T)
    P(V^T | ω_r^T) = Π_{t=1}^{T} P(v(t) | ω(t))   (1)
    P(ω_r^T) = Π_{t=1}^{T} P(ω(t) | ω(t-1))   (2)
    P(V^T) = Σ_{r=1}^{r_max} Π_{t=1}^{T} P(v(t) | ω(t)) · P(ω(t) | ω(t-1))   (3)
    P({v1, v2, v3}) = Σ over all 3^3 possible sequences of hidden states of Π_{t=1}^{3} P(v(t) | ω(t)) · P(ω(t) | ω(t-1))

    Decoding:
    find {ω(1), ω(2), ..., ω(T)} = arg max P(ω(1), ..., ω(T), v(1), v(2), ..., v(T) | λ)

    Learning:
    max_λ P(V^T | λ), with λ = [π, A, B]