(20%) Consider the following toy example: Training data: <s> I am Sam </s> <s> Sam I am </s> <s> Sam I like </s> <s> Sam
- Site Admin
- Posts: 899603
- Joined: Mon Aug 02, 2021 8:13 am
(20%) Consider the following toy example:

Training data:
<s> I am Sam </s>
<s> Sam I am </s>
<s> Sam I like </s>
<s> Sam I do like </s>
<s> do I like Sam </s>

Assume that we use a bigram language model with Laplace smoothing based on the above training data.

a) Give the following bigram probabilities estimated by this model:
P(do | <s>)
P(do | Sam)
P(Sam | <s>)
P(Sam | do)
P(I | Sam)
P(I | do)
P(like | I)

Note that for each word w_{n-1}, we count an additional bigram for each possible continuation w_n. Consequently, we have to take the words into consideration and also the symbol <s>.

b) Calculate the probabilities of the following sequences according to this model:
(1) <s> do Sam I like
(2) <s> Sam do I like

Which of the two sequences is more probable according to our LM?
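The counting behind parts a) and b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an official solution: the add-one estimate is taken to be P(w_n | w_{n-1}) = (C(w_{n-1} w_n) + 1) / (C(w_{n-1}) + V), where V counts the possible continuations, assumed here to be the five words plus the end marker </s> (so V = 6), and the last probability in a) is read as P(like | I). The names `train`, `laplace_prob`, and `sequence_prob` are made up for this sketch.

```python
from collections import Counter
from fractions import Fraction

TRAINING = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> Sam I like </s>",
    "<s> Sam I do like </s>",
    "<s> do I like Sam </s>",
]


def train(sentences):
    """Count bigrams and how often each token occurs as a bigram context."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


bigram_counts, context_counts = train(TRAINING)

# Assumption: the continuation vocabulary is every token that follows
# something in the training data, i.e. the five words plus </s>.
VOCAB = sorted({cur for (_, cur) in bigram_counts})


def laplace_prob(word, prev, vocab_size=len(VOCAB)):
    """Add-one estimate of P(word | prev), kept as an exact fraction."""
    return Fraction(bigram_counts[(prev, word)] + 1,
                    context_counts[prev] + vocab_size)


def sequence_prob(sequence):
    """Product of the smoothed bigram probabilities along the sequence."""
    tokens = sequence.split()
    prob = Fraction(1)
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= laplace_prob(cur, prev)
    return prob


for word, prev in [("do", "<s>"), ("do", "Sam"), ("Sam", "<s>"),
                   ("Sam", "do"), ("I", "Sam"), ("I", "do"), ("like", "I")]:
    print(f"P({word} | {prev}) = {laplace_prob(word, prev)}")

for seq in ["<s> do Sam I like", "<s> Sam do I like"]:
    print(f"P({seq}) = {sequence_prob(seq)}")
```

Under these assumptions both sequences come out to the same value, 3/1331, because each product uses the same numerators and denominators in a different order. If the vocabulary is counted differently (for example, also including <s>), pass another `vocab_size` to `laplace_prob`; the individual probabilities change, but the comparison between the two sequences can be rechecked the same way.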