Question 1 (7 points): Purpose: Practice combining dictionaries with file input, practice with data visualization Degree
Question 1 (7 points):
Purpose: Practice combining dictionaries with file input, practice with data visualization
Degree of Difficulty: Easy

Wordclouds are a visualization tool that shows how often a particular word occurs in a collection of text: the bigger the word, the more often it shows up. For this question, your job is to write a Processing program that reads a collection of words from a text file and displays the matching wordcloud. To keep things simple, we won't actually use a "cloud" shape: instead we'll just show all the words in a column, but with a different font size for each word depending on the word's frequency.

Input files
We are providing you with an input file, tweets.txt. This file is a collection of all of the words from 60 selected tweets from a certain celebrity twitter account. You will write code for your program to read these words from file (described further below); note that this file has been formatted so that there is only one word on each line of the file.
In addition, when analyzing text data, certain words like "the" and "is" occur frequently but are almost always uninteresting. Therefore, we have also included a list of such words to exclude when creating your wordcloud. You can simply copy/paste the list statement from ignored.txt into your Processing program.

Program Behaviour
For this question, your program is not interactive, so you do not need to use interactive functions like setup() or draw() (unless you really want to). Your program should display on the canvas all of the words that occur at least 5 times in the input file (excluding the ones from the ignored list), with the textSize for each word proportional to the frequency of that word. It should look approximately like the figure below. To do this, we can split the work into two parts as follows.

Part 1: Load the Data
For part 1 of this problem, you should create a dictionary that summarizes the data in the input file.
The keys of your dictionary should be words (recall that the input file has 1 word per line) and the value associated with each word is the number of times that word occurs in the input file. As mentioned above, do not bother storing the "boring" words from the list of ignored words in the dictionary at all.

Part 2: Display the Words
Once you have prepared your dictionary of word-counts from Part 1, have your program display the words in a single column in the center of the canvas. Display only those words that occurred at least 5 times, and for each word, use a text size of 3 times the count for that word (example: if the word "pizza" showed up 6 times in the input file, you would set the text size to 18 to display that word).

[Figure 1: Word cloud for a celebrity twitter account. Can you guess who it is? Words shown: cookies, think, ice, dis, nom, cookie, happy, cream, chocolate, eat, dat, birthday, boy, today, oh]

• Hint 1: You may find the statement textAlign(CENTER) useful in order to get a nice layout.
• Hint 2: Since the text size of each word will be different, you may find it slightly tricky to correctly space the words apart. You can deal with this by taking the text size of each word into account when updating the y-coordinate of each word.
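The two parts can be sketched in plain Python as follows. This is only one possible approach, not the official solution. The function names count_words and layout_column, their parameters, and the choice to order words by descending count are all assumptions made for illustration; the assignment itself does not specify them. The code is kept free of Processing calls so that the counting and spacing logic can be checked on its own.

```python
# Hypothetical helpers (names are not from the assignment).
# Pure Python, so the logic can be run outside Processing.

def count_words(lines, ignored):
    """Part 1: build a word -> count dictionary from one-word-per-line input."""
    ignored_set = set(ignored)  # set membership tests are fast
    counts = {}
    for line in lines:
        word = line.strip()  # drop the trailing newline and stray spaces
        if word == "" or word in ignored_set:
            continue  # skip blank lines and "boring" words entirely
        counts[word] = counts.get(word, 0) + 1
    return counts


def layout_column(counts, min_count=5, scale=3, top=0):
    """Part 2: return (word, text_size, y) rows for the frequent words."""
    rows = []
    y = top
    # Assumption: most frequent words first, ties broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for word, count in ordered:
        if count < min_count:
            continue  # only words seen at least min_count times are shown
        size = scale * count
        y += size  # advance y by this word's own text size (Hint 2)
        rows.append((word, size, y))
    return rows


# Small made-up input standing in for tweets.txt and the ignored list.
sample_lines = ["pizza\n"] * 6 + ["the\n"] * 9 + ["cat\n"] * 5 + ["dog\n"] * 2
sample_counts = count_words(sample_lines, ["the"])
sample_rows = layout_column(sample_counts)
```

In Processing's Python mode, each row could then be drawn by calling textSize(size) followed by text(word, width / 2, y), after a single textAlign(CENTER) call as suggested in Hint 1. The sketch does not change letter case, so "Pizza" and "pizza" would be counted separately.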