Mutable Ideas

Notes and ideas about Java, Scala, Big Data, NoSQL, Quality and Software Deploy

Creating a Beautiful Tagcloud From Hashtags

Although the tagcloud may seem a somewhat outdated and much-criticized visualization format, I have no doubt it can be useful sometimes. And if you can create one with only a few keystrokes, that is pretty sweet. Below I'll show the technique for extracting Twitter #hashtags, but you can apply it to virtually any text source.

[Image: tagcloud]

Running the command below on your Twitter data extracts the 100 most frequent hashtags. Afterwards, edit the file by hand to remove irrelevant or overly frequent hashtags.

$ cat tweets.json | \
  jq -r '.entities.hashtags[].text' | tr 'A-Z' 'a-z' | \
  sort | uniq -c | sort -nr | \
  head -100 | awk '{print $2 ":" $1}' \
  > hashtags.txt

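You may see error messages such as jq: error: Cannot iterate over null. They appear because some tweets don't contain any hashtags, so jq throws an error when it tries to iterate over the missing list to extract the text field. jq still processes the remaining tweets, and writing the filter as .entities.hashtags[]?.text suppresses the messages. More about jq in this post.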
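If you would rather sidestep those messages entirely, the same counting step can be sketched in Python. This is only a minimal sketch under a few assumptions: tweets.json holds one JSON-encoded tweet per line, and the function names (count_hashtags, format_top, write_top_hashtags) are hypothetical, chosen here for illustration. Unlike tr 'A-Z' 'a-z', str.lower() also lowercases non-ASCII letters, so counts may differ slightly for such hashtags.

```python
import json
from collections import Counter


def count_hashtags(lines):
    """Count lowercased hashtags across an iterable of JSON-encoded tweets."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # A tweet without hashtags may have a missing or null field;
        # treat both cases as an empty list instead of raising.
        hashtags = (tweet.get("entities") or {}).get("hashtags") or []
        for tag in hashtags:
            counts[tag["text"].lower()] += 1
    return counts


def format_top(counts, n=100):
    """Return 'hashtag:count' lines for the n most frequent hashtags."""
    return [f"{tag}:{count}" for tag, count in counts.most_common(n)]


def write_top_hashtags(src_path="tweets.json", dst_path="hashtags.txt", n=100):
    """Read tweets from src_path (assumed: one tweet per line) and write the top n."""
    with open(src_path, encoding="utf-8") as src:
        lines = format_top(count_hashtags(src), n)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(lines) + "\n")
```

Calling write_top_hashtags() with its defaults should produce a hashtags.txt in the same hashtag:count format as the shell pipeline.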

The hashtags.txt file will look like this:

go:82263
r:76387
javascript:66695
c:60863
php:43428
java:29608
css:28545
python:22974
html5:22013
ruby:21729
...
...
...
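The manual cleanup of irrelevant or overly frequent hashtags can also be scripted. The following is a rough sketch, assuming the hashtag:count format shown above; drop_hashtags and the unwanted list are hypothetical names introduced for illustration, not part of any tool used in this post.

```python
def drop_hashtags(lines, unwanted):
    """Keep the 'hashtag:count' lines whose hashtag is not in unwanted."""
    unwanted = {tag.lower() for tag in unwanted}
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last colon: the count is always the final field.
        tag, _, _count = line.rpartition(":")
        if tag not in unwanted:
            kept.append(line)
    return kept
```

For example, drop_hashtags(open("hashtags.txt"), ["r", "c"]) would return every line except the ones for the hashtags r and c, which you can then write back to a file before pasting.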

Now go to Wordle Advanced and paste the contents of this file. Save the result as PNG and you're done!

If you prefer a more Pythonic way, I found an excellent tutorial: A Wordcloud in Python
