Mutable Ideas

Notes and ideas about Java, Scala, Big Data, NoSQL, Quality and Software Deploy

Create Your Own Dataset Consuming Twitter API

Many tutorials assume you already have a data set. Often that is not the case, and you can’t take advantage of the tutorial because you have no data to play along with. To comply with social networks’ Terms and Conditions you can’t publish your data sets, but you can create your own! Just follow these few commands.


OAuth

Arguably, OAuth (version 1.0a in the case of the Twitter endpoints used here) requires a lot of heavy lifting to authenticate compared with other methods. To bypass the boilerplate we can use Curlicue, a small wrapper script that invokes curl with the necessary headers for OAuth.
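To give an idea of the boilerplate being skipped, here is a minimal sketch of the HMAC-SHA1 signing step that OAuth 1.0a (RFC 5849) requires for every request. It is a general illustration, not Curlicue's actual implementation. The function names, the secrets and the parameter values are placeholders I made up for the example.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value: str) -> str:
    # RFC 3986 unreserved characters stay as-is; everything else becomes %XX.
    return quote(value, safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret):
    """Return the base64 HMAC-SHA1 signature for one request.

    `params` must hold every query parameter plus the oauth_* parameters
    (consumer key, nonce, timestamp, token, ...), excluding the signature.
    """
    # 1. Percent-encode keys and values, sort them, join as k=v pairs.
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    # 2. Signature base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )
    # 3. Signing key: encoded consumer secret & encoded token secret.
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder values for illustration only; real ones come from your app.
params = {
    "screen_name": "arjones",
    "oauth_consumer_key": "my-consumer-key",
    "oauth_nonce": "abc123",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1400000000",
    "oauth_token": "my-access-token",
    "oauth_version": "1.0",
}
signature = sign_request(
    "GET",
    "https://api.twitter.com/1.1/statuses/user_timeline.json",
    params,
    "my-consumer-secret",
    "my-token-secret",
)
print(signature)
```

The signature then goes into the Authorization header together with the other oauth_* parameters, and this is the kind of work a wrapper script saves you from doing by hand on each call.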

Just download Curlicue and install it on your system. On my Mac I cloned the git repo and ran install:

git clone git@github.com:decklin/curlicue.git
cd curlicue
install curlicue curl-encode curlicue-setup contrib/twitpull /usr/local/bin

Setup

You need credentials for a Twitter application. Create an application on Twitter’s developer site to get them, then run the setup:

curlicue-setup \
  'https://api.twitter.com/oauth/request_token' \
  'https://api.twitter.com/oauth/authorize?oauth_token=$oauth_token' \
  'https://api.twitter.com/oauth/access_token' \
  ~/.credentials

A Few Examples

Below are a few examples of how to query Twitter:

Tweets Sent by Me

twitpull -f ~/.credentials \
  statuses/user_timeline \
  screen_name=arjones \
  count=200 \
  include_rts=1 > arjones.json
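Once the file is on disk you will probably want to load it. Here is a small sketch in Python. I am assuming the file contains one or more JSON values back to back (a single array, several arrays from several pages, or one tweet per line), since the exact layout depends on how twitpull writes its output. The `text` and `id_str` fields are the ones found in Twitter API v1.1 tweet objects; the helper names and the sample data are made up for the example.

```python
import json
from pathlib import Path


def iter_tweets(raw: str):
    """Yield tweet dicts from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(raw) and raw[pos].isspace():
            pos += 1
        if pos >= len(raw):
            return
        value, pos = decoder.raw_decode(raw, pos)
        if isinstance(value, list):
            # A page of results: flatten it into individual tweets.
            yield from value
        else:
            yield value


def load_tweets(path):
    """Read a file such as arjones.json and return a list of tweets."""
    return list(iter_tweets(Path(path).read_text(encoding="utf-8")))


# Made-up sample standing in for two pages of results.
sample = '[{"id_str": "1", "text": "hello"}]\n[{"id_str": "2", "text": "world"}]'
for tweet in iter_tweets(sample):
    print(tweet["id_str"], tweet["text"])

# With a real file: tweets = load_tweets("arjones.json")
```

Parsing value by value keeps the loader working whether the output turns out to be one big array or many small ones.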

Tweets Mentioning Me

twitpull -f ~/.credentials \
  statuses/mentions_timeline \
  count=200 \
  contributor_details=true \
  include_rts=1 > arjones-mentions.json
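A first thing to try with the mentions is to see who talks to you the most. The sketch below assumes the tweets are already parsed into a list of dicts and that each one carries the `user.screen_name` field of a Twitter API v1.1 tweet object. The function name and the sample accounts are invented for the example.

```python
from collections import Counter


def top_mentioners(tweets, n=5):
    """Return the n most frequent authors as (screen_name, count) pairs."""
    names = (t["user"]["screen_name"] for t in tweets if "user" in t)
    return Counter(names).most_common(n)


# Made-up sample data in the shape of parsed mention tweets.
mentions = [
    {"text": "@arjones nice post", "user": {"screen_name": "alice"}},
    {"text": "@arjones thanks!", "user": {"screen_name": "bob"}},
    {"text": "@arjones one more question", "user": {"screen_name": "alice"}},
]
for name, count in top_mentioners(mentions):
    print(name, count)
```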

Searching Hashtags

twitpull -f ~/.credentials \
  search/tweets \
  'q=#scala OR #data' \
  count=100 \
  include_entities=true > my-dataset.json
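Note that search results have a different shape from timelines: in API v1.1 the tweets come wrapped in an object under the `statuses` key, and with `include_entities=true` each tweet lists its hashtags under `entities.hashtags`. Here is a sketch that counts hashtags in one such response. I am assuming a single response object already parsed into a dict; if the file holds several pages you would call it once per page and add up the counters. The function name and the sample are made up.

```python
from collections import Counter


def hashtag_counts(search_response):
    """Count hashtags (case-insensitively) in one search/tweets response."""
    counts = Counter()
    for status in search_response.get("statuses", []):
        for tag in status.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts


# Made-up sample in the shape of a search/tweets response.
response = {
    "statuses": [
        {"text": "Loving #Scala", "entities": {"hashtags": [{"text": "Scala"}]}},
        {
            "text": "#scala meets #data",
            "entities": {"hashtags": [{"text": "scala"}, {"text": "data"}]},
        },
        {"text": "no tags here", "entities": {"hashtags": []}},
    ],
    "search_metadata": {"count": 100},
}
for tag, count in hashtag_counts(response).most_common():
    print(tag, count)
```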

Consuming Streaming API

If you want to consume Twitter’s Streaming API, there is an official client: hbc (Hosebird Client), available on GitHub.
