Problem set 3 Q1-Working with lines (20 points) The code below reads in the text of "Alice in wonderland" from Project G
Q2 - Lines into a string (20 points)

Take the resulting list of lines in book and convert it into one long string (separated by spaces) called book_string.

In [ ]:
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
    assert isinstance(book_string, str)
    assert len(book_string) == 106933
    assert book_string[1000:1026] == 'd (as well as she could, f'

Q3 - Clean the string (20 points)

Clean book_string so that it contains only lowercase letters and spaces. Convert uppercase letters to lowercase letters, and remove anything that is not an English letter or a space.

FYI: An easy way to get a list of all the unique items in a sequence (such as characters in a string) is to convert that sequence into a set with set(). When you are done with your string cleaning, there should be only 27 unique characters in book_string: the space and the 26 lowercase English letters.

In [ ]:
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
    assert len(set(book_string)) == 27  # only contains 26 English letters and space
    assert book_string.count('a') == 9802  # expected number of 'a' characters
    assert set(book_string) == set(' abcdefghijklmnopqrstuvwxyz')
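One way to approach Q2 and Q3 is sketched below. This is a minimal illustration, not the graded solution: the two-line `book` list is a hypothetical stand-in for the list produced in Q1, and the choice to delete disallowed characters outright (rather than replace them with spaces) is an assumption that may or may not reproduce the exact counts in the asserts above.

```python
import string

# Hypothetical stand-in for the list of lines produced in Q1.
book = [
    "Alice was beginning to get very tired,",
    "of sitting by her sister on the bank.",
]

# Q2: join the lines into one long string, separated by single spaces.
book_string = " ".join(book)

# Q3: lowercase everything, then keep only the letters a-z and the space.
allowed = set(string.ascii_lowercase + " ")
book_string = "".join(ch for ch in book_string.lower() if ch in allowed)

print(book_string)
print(sorted(set(book_string)))  # unique characters that remain
```

Checking `set(book_string) <= allowed` after cleaning is a quick way to confirm that nothing outside the 27 permitted characters survived.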
Q4 - Count the words (20 points)

book_string should now be a clean string of words, separated by one or more spaces. Count the number of occurrences of each unique word, and store the counts in a dictionary called word_count.

In [ ]:
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
    assert all([w.isalpha() for w in word_count.keys()])  # all keys in word_count are composed only of alphabetic characters
    assert word_count['a'] == 690
    assert word_count['alice'] == 385

Q5 - Get the most frequent words (20 points)

Find the top 10 most frequent words counted in word_count, and save them (in order from most to least frequent) in a list called top_10.

In [ ]:
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
    assert len(top_10) == 10
    assert top_10[0] == 'the'
    assert top_10[-1] == 'in'
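Q4 and Q5 can be sketched along the following lines. The short `book_string` here is a hypothetical stand-in for the cleaned text from Q3, so the counts it produces have nothing to do with the values in the asserts above, and the list has fewer than 10 entries because the sample has only six unique words.

```python
# Hypothetical cleaned string standing in for the result of Q3.
# Note the double space: words may be separated by more than one space.
book_string = "the cat and the hat  and the bat in the hat"

# Q4: split() with no argument splits on runs of whitespace,
# so repeated spaces do not produce empty "words".
word_count = {}
for word in book_string.split():
    word_count[word] = word_count.get(word, 0) + 1

# Q5: sort the words by their count, most frequent first, and keep at most 10.
top_10 = sorted(word_count, key=word_count.get, reverse=True)[:10]

print(word_count)
print(top_10)
```

The standard library's `collections.Counter` offers the same counting with a `most_common(10)` method; the plain dictionary is used here only because the question asks for a dictionary called word_count.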