Problem set 3 Q1 - Working with lines (20 points) The code below reads in the text of "Alice in Wonderland" from Project G

Post by **answerhappygod** » Tue Jul 12, 2022 8:12 am

Problem set 3

Q1 - Working with lines (20 points)

The code below reads in the text of "Alice in Wonderland" from Project Gutenberg into a variable called book, as a list of lines. The first bunch of lines are just generic Project Gutenberg preamble, which we want to discard. The last line of the preamble begins with *** START OF THE PROJECT GUTENBERG EBOOK, and the actual book starts on the following line. Please replace book with a list that omits all the preamble (i.e., so that the lines up to and including the one that starts with START OF THE PROJECT GUTENBERG EBOOK are omitted).

(Note: your final result should be a variable called book, so overwrite the existing book variable instead of creating a new variable with a different name.)

In [ ]:
with open('alice-in-wonderland.txt', 'r') as fp:
    book = fp.readlines()

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

In [ ]:
assert isinstance(book, list)
assert len(book) == 3737
assert book[6] == "Alice's Adventures in Wonderland\n"
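One possible way to approach Q1 is sketched below. It is not the official solution: the helper name strip_preamble and the sample_lines list are hypothetical stand-ins so the snippet runs without the text file, and the exact number of asterisks before START is an assumption, which is why the check strips leading asterisks and spaces before comparing.

```python
# Hypothetical stand-in for the list returned by fp.readlines().
sample_lines = [
    "The Project Gutenberg eBook of Alice's Adventures in Wonderland\n",
    "\n",
    "*** START OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***\n",
    "\n",
    "Alice's Adventures in Wonderland\n",
]

MARKER = "START OF THE PROJECT GUTENBERG EBOOK"


def strip_preamble(lines):
    """Return only the lines that come after the first marker line."""
    for i, line in enumerate(lines):
        # Ignore leading asterisks/spaces so the asterisk count does not matter.
        if line.lstrip("* ").startswith(MARKER):
            return lines[i + 1:]
    raise ValueError("start marker not found")


# Overwrite book, as the question asks, instead of creating a new name.
book = strip_preamble(sample_lines)
```

With the real file, the same call would be applied to the list read by fp.readlines(); whether it satisfies the length assertion above depends on the exact copy of the text.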
Q2 - Lines into a string (20 points)

Take the resulting list of lines in book and convert it into one long string (with the lines separated by spaces) called book_string.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

In [ ]:
assert isinstance(book_string, str)
assert len(book_string) == 106933
assert book_string[1000:1026] == 'd (as well as she could, f'

Q3 - Clean the string (20 points)

Clean book_string so that it contains only lowercase letters and spaces. Convert uppercase letters to lowercase letters, and remove anything that is not an English letter or a space.

FYI: An easy way to get a list of all the unique items in a sequence (such as characters in a string) is by converting that sequence into a set with set(). When you are done with your string cleaning, there should only be 27 unique characters in book_string: space, and the lowercase English letters.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

In [ ]:
assert len(set(book_string)) == 27  # only contains 26 English letters and space
assert book_string.count('a') == 9802  # expected number of 'a' characters
assert set(book_string) == set(' abcdefghijklmnopqrstuvwxyz')
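A minimal sketch of Q2 and Q3 together, assuming a tiny hypothetical book list in place of the real one. It joins the lines with single spaces, lowercases the result, and keeps only characters from a-z plus the space; newlines and punctuation are dropped rather than replaced, which is one reading of "remove anything that is not an English letter or a space".

```python
import string

# Hypothetical stand-in for the preamble-free list of lines from Q1.
book = ["Alice was beginning\n", "to get very tired;\n", "'Oh dear!'\n"]

# Q2: one long string, with the lines separated by spaces.
book_string = " ".join(book)

# Q3: lowercase everything, then keep only a-z and the space character.
allowed = set(string.ascii_lowercase + " ")
book_string = "".join(ch for ch in book_string.lower() if ch in allowed)

# set() gives the unique characters that survived the cleaning.
unique_chars = set(book_string)
```

On this sample the cleaned string is "alice was beginning to get very tired oh dear". The sample does not contain every letter, so only the real text would be expected to reach 27 unique characters.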
Q4 - Count the words (20 points)

book_string should now be a clean string of words, separated by one or more spaces. Count the number of occurrences of each unique word, and store that in a dictionary called word_count.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

In [ ]:
assert all([w.isalpha() for w in word_count.keys()])  # all keys in word_count are composed only of alphabetic characters
assert word_count['a'] == 690
assert word_count['alice'] == 385

Q5 - Get the most frequent words (20 points)

Find the top 10 most frequent words counted in word_count, and save them (in order from most to least frequent) in a list called top_10.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

In [ ]:
assert len(top_10) == 10
assert top_10[0] == 'the'
assert top_10[-1] == 'in'
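Q4 and Q5 could be sketched as follows, again on a small hypothetical string, not the real book. The helper names count_words and most_frequent are illustrative only. str.split() with no arguments splits on runs of whitespace, so words separated by several spaces produce no empty entries.

```python
# Hypothetical cleaned text; note the double space between "and" and "the".
book_string = "the cat and  the hat and the bat"


def count_words(text):
    """Map each unique word to the number of times it occurs."""
    counts = {}
    for word in text.split():  # split() collapses runs of spaces
        counts[word] = counts.get(word, 0) + 1
    return counts


def most_frequent(counts, n=10):
    """Return the n most frequent words, most frequent first."""
    return sorted(counts, key=counts.get, reverse=True)[:n]


word_count = count_words(book_string)
top_2 = most_frequent(word_count, n=2)  # the assignment would use n=10
```

Here top_2 is ['the', 'and']. For the assignment, top_10 would be most_frequent(word_count, 10). The standard library's collections.Counter and its most_common method offer a similar result, though Counter is a dict subclass rather than a plain dictionary.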