Python: Computing Conditional Unigram Probabilities
Posted: Sat Feb 19, 2022 3:22 pm
Build the model giving us the conditional probabilities of the
next token given the N-1 previous tokens. Implementing the
required method extract_conditional_probabilities() amounts to
building and storing a probability distribution over all possible
following tokens for each (N-1)-gram that occurred in the corpus.
The lookup structure will be stored in the cond_prob instance
variable. (BTW: the notation (token,) is necessary in the bigram
case to enforce that a one-element tuple is looked up, because
(token) == token in Python.) The recommended logical structure of
your implementation is as follows:
for each N-gram contained as a key in prob:
    split the N-gram into the first N-1 tokens (the "mgram") and the final unigram
    if the mgram is not yet a key in cond_prob, store a new dictionary under that key
    set the value for the unigram in cond_prob[mgram] to the probability of the N-gram
for every dictionary in the values of cond_prob:
    add up the values assigned to all unigram keys
    divide the value under each unigram by the sum of values
Code:
def extract_conditional_probabilities(self):
    """Compute the distribution over next tokens for each (n-1)-gram."""
    self.cond_prob = {}
    for ngram, p in self.prob.items():
        mgram, unigram = ngram[:-1], ngram[-1]  # (N-1)-gram prefix + final token
        self.cond_prob.setdefault(mgram, {})[unigram] = p
    for dist in self.cond_prob.values():  # normalize each distribution to sum to 1
        total = sum(dist.values())
        for unigram in dist:
            dist[unigram] /= total