Python: 1. Generating Random Tokens In Context: we have the probability distribution that can be used to sample plausibl

Post by **answerhappygod** » Sat Feb 19, 2022 3:22 pm

Python: 1. Generating Random Tokens in Context: We have a
probability distribution that can be used to sample plausible next
tokens given the previous N − 1 tokens. Implement this
functionality in a new class method generate_random_token(mgram),
which takes an (N − 1)-gram in the tuple encoding and randomly
generates a plausible next token by sampling from the probability
distribution you stored in cond_prob[mgram]. In Python 3.6 or
higher (the version you should be using), this can easily be done
with the choices() function from the random module. All you
really need to do for this task is prepare two lists (the candidate
tokens and their weights) and pass them to choices() as arguments.
2. Generating Random Sentences:
The last step is to implement the method generate_random_sentence(),
which generates a random sentence with the help of the
generate_random_token(mgram) method. The key idea is to
initialize the sentence as a list of N − 1 instances of "BOS", then
keep appending random words based on the last N − 1 tokens of the
current list until "EOS" is generated for the first time. Removing
the "BOS" and "EOS" dummy tokens from the resulting list gives you
the final sentence to return. You can use the provided helper function
list2str(sentence) to format the generated sentence list as a
string.
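The sampling step in task 1 can be sketched as follows. This is only an illustration, not the assignment's reference solution: it is written as a standalone function rather than a class method, and it assumes that cond_prob maps each (N − 1)-gram tuple to a dict of {next_token: probability}. The post does not show how cond_prob is actually stored, so the toy data below is hypothetical.

```python
import random

# Assumption: cond_prob maps each (N-1)-gram tuple to a dict of
# {next_token: probability}. The real class may store this differently.
cond_prob = {
    ("BOS",): {"the": 0.5, "a": 0.5},
    ("the",): {"cat": 1.0},
}


def generate_random_token(cond_prob, mgram):
    """Sample one plausible next token for the given (N-1)-gram."""
    distribution = cond_prob[mgram]
    # The two lists that choices() needs: candidates and their weights.
    tokens = list(distribution.keys())
    weights = list(distribution.values())
    # choices() returns a list of k samples (k=1 by default),
    # so take the single element.
    return random.choices(tokens, weights=weights)[0]


print(generate_random_token(cond_prob, ("BOS",)))  # "the" or "a"
```

Inside the class, the same body would read from self.cond_prob (or whatever attribute holds the distribution) instead of taking it as a parameter.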
Write your code here:
# 1
def generate_random_token(self, mgram):
    """
    Generate a random next token based on an n-1 gram,
    taking into account the probability distribution over the
    possible next tokens for that n-1-gram.

    :param mgram: the n-1 gram to generate the next token for.
    :type mgram: a tuple (of length n-1) of strings.
    :return: a random next token for the n-1-gram.
    :rtype: str
    """
    pass

# 2
def generate_random_sentence(self):
    """
    Generate a random sentence.

    :return: a random sentence
    :rtype: list[str]
    """
    pass

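
The sentence loop from task 2 might look roughly like the sketch below. It is a standalone function under stated assumptions, not the expected class method: the parameters cond_prob and n are hypothetical stand-ins for whatever the class stores, cond_prob is assumed to map each (N − 1)-gram tuple to a dict of {next_token: probability}, and the toy bigram model is made up so that the result is deterministic.

```python
import random


def generate_random_sentence(cond_prob, n):
    """Generate a sentence as a list of tokens, without BOS/EOS."""
    # Start with N-1 BOS markers so the first context is well defined.
    sentence = ["BOS"] * (n - 1)
    while True:
        # The context is the last N-1 tokens, as a tuple key.
        mgram = tuple(sentence[len(sentence) - (n - 1):])
        distribution = cond_prob[mgram]
        token = random.choices(
            list(distribution.keys()),
            weights=list(distribution.values()),
        )[0]
        if token == "EOS":
            # Stop at the first EOS; it is never appended.
            break
        sentence.append(token)
    # Drop the leading BOS markers.
    return sentence[n - 1:]


# Hypothetical bigram model (n = 2) in which every step has one outcome.
toy_model = {
    ("BOS",): {"hello": 1.0},
    ("hello",): {"world": 1.0},
    ("world",): {"EOS": 1.0},
}
print(generate_random_sentence(toy_model, 2))  # ['hello', 'world']
```

The slice is written with len(sentence) - (n - 1) rather than a negative index so that the unigram case (n = 1) yields an empty context instead of the whole list.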
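import re  # needed by tokenize_smart and list2str


def tokenize_smart(sentence):
    """
    Tokenize the sentence into tokens (words, punctuation).

    :param sentence: the sentence to be tokenized
    :type sentence: str
    :return: list of tokens in the sentence
    :rtype: list[str]
    """
    tokens = []
    # Collapse runs of spaces, then split on whitespace.
    for word in re.sub(r" +", " ", sentence).split():
        # Strip quotation marks, backticks and parentheses.
        word = re.sub(r"[\"„”“»«`\(\)]", "", word)
        if word != "":
            if word[-1] in ".,!?;:":
                if len(word) == 1:
                    # The word is a lone punctuation mark.
                    tokens += [word]
                else:
                    # Split trailing punctuation into its own token.
                    tokens += [word[:-1], word[-1]]
            else:
                tokens.append(word)
    return tokens
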
def list2str(sentence):
    """
    Convert a sentence given as a list of strings to the
    sentence as a string separated by whitespace.

    :param sentence: the string list to be joined
    :type sentence: list[str]
    :return: sentence as a string, separated by whitespace
    :rtype: str
    """
    sentence = " ".join(sentence)
    sentence = re.sub(r" ([\.,!\?;:])", r"\1", sentence)
    return sentence