

Posted: Tue Jul 12, 2022 8:21 am
by answerhappygod
Word clouds considered harmful?
Please find the article below for more details to answer the questions that follow.
In his 2003 novel Pattern Recognition, William Gibson created a character named Cayce Pollard with an unusual psychosomatic affliction: She was allergic to brands. Even the logos on clothing were enough to make her skin crawl, but her worst reactions were triggered by the Michelin Tire mascot, Bibendum.
Although it’s mildly satirical, I can relate to this condition, since I have a similar visceral reaction to word clouds, especially those produced as data visualization for stories.
If you are fortunate enough to have no idea what a word cloud is, here is some background. A word cloud represents word usage in a document by resizing individual words in said document proportionally to how frequently they are used, and then jumbling them into some vaguely artistic arrangement. This technique originated online in the 1990s as tag clouds (famously described as “the mullets of the Internet”), which were used to display the popularity of keywords in bookmarks.
More recently, a site named Wordle has made it radically simpler to generate such word clouds, ensuring their accelerated use as filler visualization, much to my personal pain.
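As a rough illustration of the sizing rule described above (word size proportional to word frequency), here is a minimal Python sketch. It is not how Wordle or any particular tool works; the function name word_sizes, the point-size range, and the sample text are all assumptions made up for this example, and the "vaguely artistic arrangement" step is left out entirely.

```python
import re
from collections import Counter

def word_sizes(text, min_pt=10, max_pt=48):
    """Map each word to a font size that grows linearly with its count.

    min_pt and max_pt are hypothetical display limits, not values taken
    from any real word cloud tool.
    """
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_pt; rarer words shrink toward min_pt.
    return {w: min_pt + (max_pt - min_pt) * c / top for w, c in counts.items()}

# Invented sample text: "bomb" appears 3 times, "car" twice, "blast" once.
sizes = word_sizes("bomb car bomb blast car bomb")
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.1f}pt")
```

Note that the only input to the sizing is a raw count per word. Word order, phrases, and context never enter the calculation, which is the limitation the rest of the article turns on.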
So what’s so wrong with word clouds, anyway? To understand that, it helps to understand the principles we strive for in data journalism. At The New York Times, we strongly believe that visualization is reporting, with many of the same elements that would make a traditional story effective: a narrative that pares away extraneous information to find a story in the data; context to help the reader understand the basics of the subject; interviewing the data to find its flaws and be sure of our conclusions. Prettiness is a bonus; if it obliterates the ability to read the story of the visualization, it’s not worth adding some wild new visualization style or strange interface.
Of course, word clouds throw all these principles out the window. Here’s an example to illustrate. About six months ago, I had the privilege of giving a talk about how we visualized civilian deaths in the WikiLeaks War Logs at a meeting of the New York City Hacks/Hackers. I wanted my talk to be more than “look what I did!” but also to touch on some key principles of good data journalism. What better way to illustrate these principles than with a foil, a Goofus to my Gallant?
And I found one: the word cloud. Please compare these two visualizations, both derived from the same data set, and the differences should be apparent:
Mapping a Deadly Day in Baghdad, from The New York Times
Word cloud of titles in the Iraq war logs, from Fast Company
I’m sorry to harp on Fast Company in particular here, since I’ve seen this pattern across many news organizations: reporters sidestepping their limited knowledge of the subject material by peering for patterns in a word cloud, like reading tea leaves at the bottom of a cup. What you’re left with is a shoddy visualization that fails all the principles I hold dear.
For starters, word clouds support only the crudest sorts of textual analysis, much like figuring out a protein by getting a count only of its amino acids. This can be wildly misleading; I created a word cloud of Tea Party feelings about Obama, and the two largest words were implausibly “like” and “policy,” mainly because the important word “don’t” was automatically excluded. (Fair enough: Such stopwords would otherwise dominate the word clouds.) A phrase or thematic analysis would reach more accurate conclusions. When looking at the word cloud of the War Logs, does the equal sizing of the words “car” and “blast” indicate a large number of reports about car bombs or just many reports about cars or explosions? How do I compare the relative frequency of lesser-used words? Also, doesn’t focusing on the occurrence of specific words instead of concepts or themes miss the fact that different reports about truck bombs might use the words “truck,” “vehicle,” or even “bongo” (since the Kia Bongo is very popular in Iraq)?
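The stopword problem and the case for phrase analysis can be made concrete with a small Python sketch. Everything in it is hypothetical: the three sample responses, the stopword list, and the function names are invented for illustration and are not the author's data or method. It compares single-word counts after stopword removal (what a word cloud sees) with counts of adjacent word pairs, a simple stand-in for phrase analysis.

```python
import re
from collections import Counter

# Hypothetical stopword list and sample responses, invented for illustration.
STOPWORDS = {"i", "his", "the", "don't", "do", "not"}
responses = [
    "I don't like his policy",
    "I don't like the policy",
    "I like the policy",
]

def tokens(text):
    """Lowercase a response and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def unigram_counts(docs):
    """Count single words after dropping stopwords, as a word cloud would."""
    return Counter(w for d in docs for w in tokens(d) if w not in STOPWORDS)

def bigram_counts(docs):
    """Count adjacent word pairs, keeping stopwords so negation survives."""
    counts = Counter()
    for d in docs:
        t = tokens(d)
        counts.update(zip(t, t[1:]))
    return counts

unigrams = unigram_counts(responses)
bigrams = bigram_counts(responses)

print(unigrams.most_common(2))       # single words left after filtering
print(bigrams[("don't", "like")])    # responses saying "don't like"
print(bigrams[("i", "like")])        # responses saying "I like"
```

In this toy sample the single-word view is dominated by “like” and “policy,” while the pair counts show that most responses actually said “don’t like.” Grouping “truck,” “vehicle,” and “bongo” under one concept would need a further mapping step that this sketch does not attempt.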
Of course, the biggest problem with word clouds is that they are often applied to situations where textual analysis is not appropriate. One could argue that word clouds make sense when the point is to specifically analyze word usage (though I’d still suggest alternatives), but it’s ludicrous to try to make sense of a complex topic like the Iraq War by looking only at the words used to describe the events. Don’t confuse signifiers with what they signify.
And what about the readers? Word clouds leave them to figure out the context of the data by themselves. How is the reader to know from this word cloud that LN is a “Local National” or COP is “Combat Outpost” (and not a police officer)? Most interesting data requires some form of translation or explanation to bring the reader quickly up to speed; word clouds provide nothing in that regard.
Furthermore, where is the narrative? For our visualization, we chose to focus on one narrative out of the many within the Iraq War Logs, and we displayed the data to make that clear. Word clouds, on the other hand, require the reader to squint at them like stereograms until a narrative pops into place. In this case, you can figure out that the Iraq occupation involved a lot of IEDs and explosions. Which is likely news to nobody.
As an example of how this might lead the reader astray, we initially thought we saw a surprising and dramatic rise in sectarian violence after the Surge, because the word “sect” was appearing in many more reports. We soon figured out that what we were seeing had less to do with violence levels and more to do with bureaucracy: the adoption of new Army requirements for reporting the sect of detainees. Of course, the horrific violence we visualized in Baghdad was sectarian, but this was not something indicated in the text of the reports at the time. If we had visualized the violence in Baghdad as a series of word clouds for each year, we might have thought that the violence was not sectarian at all.
In conclusion: Every time I see a word cloud presented as insight, I die a little inside. Hopefully, by now, you can understand why. But if you are still sadistically inclined enough to make a word cloud of this piece, don’t worry. I’ve got you covered.
Question a: Do you agree with the author? Are there benefits and drawbacks that are not discussed by the author? Can you argue the same way or the other way around? When and how would you want to use word clouds?
Question b: Can you find other examples of text visualization that effectively use more than word frequencies?