Posted: Tue Jul 12, 2022 8:21 am
When you visualize text data, I think some of the first questions you want to think about are the following: what are the reasonable units of the text visualization? What are the simplifying assumptions you make?
The question of visualization is closely connected to the question of analysis, because what we visualize is usually the results of analysis. For text analysis, the most common unit of analysis is words. At the level of a character, you have only a few units; at the level of a sentence, you just have too many possibilities.
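To make the granularity point concrete, here is a rough Python sketch (the example sentence is my own, purely illustrative) showing how many distinct units you end up with at each level:

text = "Paris is the city of light."

# Character level: only a handful of distinct units.
chars = sorted(set(text.lower()) - {" ", "."})

# Word level: the usual middle ground for counting and comparison.
words = text.lower().rstrip(".").split()

# Sentence level: almost every sentence is unique, so raw counts are rarely useful.
sentences = [text]

print(len(chars), "distinct characters,", len(words), "words,", len(sentences), "sentence")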
This kind of chart is called a "word cloud" or "tag cloud" and is one of the most common ways to visualize text data. You simply count the occurrences of each word (often combined with stemming and stop-word removal) and then scale each word according to its frequency. The words are then arranged compactly using some optimization algorithm.
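If it helps to see the counting-and-scaling step concretely, here is a minimal Python sketch. The stop-word list, the regex tokenizer, and the linear font-size scaling are my own illustrative choices, not the behavior of any particular word-cloud library, and I skip stemming and the compact layout step entirely.

import re
from collections import Counter

# A tiny, illustrative stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def word_frequencies(text):
    """Lowercase, tokenize on letters, drop stop words, and count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def font_sizes(freqs, min_size=10, max_size=60):
    """Scale each word's font size linearly with its frequency."""
    top = max(freqs.values())
    return {w: min_size + (max_size - min_size) * c / top for w, c in freqs.items()}

if __name__ == "__main__":
    sample = "Paris is the city of light and the city of landmarks."
    freqs = word_frequencies(sample)
    for word, size in sorted(font_sizes(freqs).items(), key=lambda x: -x[1]):
        print(f"{word}: {size:.0f}pt")

The part this sketch leaves out, arranging the scaled words so they pack tightly without overlapping, is the optimization step that word-cloud tools handle for you; the counting and scaling above is the analysis part.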
Actually, it is quite old. Stanley Milgram (yes, the same Milgram who did the small-world experiments and the obedience experiments) asked Parisians about the landmarks of Paris and drew a map scaled by the number of votes. That was in 1976, way before we had personal computers.
Question: Ok, now can you please tell me what is the important simplifying assumption made here? When you count the word frequency and visualize it (or when you perform many other word-based analyses), what do you ignore? And when do they become important?