Create your own dataset consuming Twitter API

2015-07-30

https://blog.arjon.es/2015/create-your-own-dataset-consuming-twitter-api/

#bash
#data

Many tutorials assume you already have a data set. Often that is not the case, and you can’t take advantage of the tutorial because you have no data to play along with. Social networks’ Terms and Conditions generally forbid publishing the data sets you collect, but you can create your own! Just follow these few commands.

## OAuth

Arguably, OAuth needs a lot of heavy lifting to authenticate compared with other methods; the flow used here is OAuth 1.0a, which requires signing every request. To skip the boilerplate we can use Curlicue, a small wrapper script that invokes curl with the necessary OAuth headers.

Just download Curlicue and install it on your system. On my Mac I cloned the git repo and ran install:

git clone git@github.com:decklin/curlicue.git
cd curlicue
install curlicue curl-encode curlicue-setup contrib/twitpull /usr/local/bin

## Setup

You need credentials for a Twitter application. Create your own application on Twitter’s developer site, then run the setup, which stores the resulting tokens in `~/.credentials`:

curlicue-setup \
  'https://api.twitter.com/oauth/request_token' \
  'https://api.twitter.com/oauth/authorize?oauth_token=$oauth_token' \
  'https://api.twitter.com/oauth/access_token' \
  ~/.credentials
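The credentials file holds secrets, so it is worth confirming that it exists and is readable only by you before going further. A minimal sketch, assuming the file path used above; `check_credentials` is a hypothetical helper and not part of Curlicue:

```shell
# check_credentials: hypothetical helper, not part of Curlicue.
# Verifies the file exists and is non-empty, then restricts it to the owner.
check_credentials() {
  file=$1
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  chmod 600 "$file"   # owner read/write only
  echo "ok: $file"
}

# Usage: check_credentials "$HOME/.credentials"
```

If the check fails, running `curlicue-setup` again is probably the simplest fix.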

## A Few Examples

Here are a few examples of how to query Twitter:

## Tweets Sent by Me

twitpull -f ~/.credentials \
  statuses/user_timeline \
  screen_name=arjones \
  count=200 \
  include_rts=1 > arjones.json
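To build a data set covering several accounts, the same call can be wrapped in a loop. This is only a sketch: `pull_timelines` and the `TWITPULL` override are hypothetical names introduced here, and the screen names in the usage line are placeholders.

```shell
# pull_timelines: hypothetical helper that repeats the call above
# for each screen name and writes one <name>.json file per account.
# TWITPULL can be set to another command (for example echo) for a dry run.
pull_timelines() {
  for name in "$@"; do
    "${TWITPULL:-twitpull}" -f "$HOME/.credentials" \
      statuses/user_timeline \
      "screen_name=$name" \
      count=200 \
      include_rts=1 > "$name.json"
  done
}

# Usage: pull_timelines arjones another_account
```

Setting `TWITPULL=echo` writes the command arguments into each file instead of calling Twitter, which is a cheap way to check the loop before spending API calls.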

## Tweets Mentioning Me

twitpull -f ~/.credentials \
  statuses/mentions_timeline \
  count=200 \
  contributor_details=true \
  include_rts=1 > arjones-mentions.json
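Once the file is on disk you can start exploring it. The sketch below counts who mentions you most often. It relies on `jq`, which this post does not otherwise use, and it assumes the file contains a single JSON array of tweet objects, each with a `user.screen_name` field; `top_mentioners` is a hypothetical name.

```shell
# top_mentioners: hypothetical helper; requires jq to be installed.
# Assumes the file holds one JSON array of tweet objects.
# Prints a count and a screen name per line, most frequent first.
top_mentioners() {
  jq -r '.[].user.screen_name' "$1" | sort | uniq -c | sort -rn
}

# Usage: top_mentioners arjones-mentions.json
```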

## Searching Hashtags

twitpull -f ~/.credentials \
  search/tweets \
  'q=#scala OR #data' \
  count=100 \
  include_entities=true > my-dataset.json
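The `#` and space characters in the query have special meaning in URLs, so they are typically percent-encoded when the request goes over the wire. To preview what the encoded query looks like, here is a small sketch in plain shell. `urlencode` is a hypothetical helper written for this illustration, not something Curlicue provides, and it is only meant for ASCII input.

```shell
# urlencode: hypothetical helper, intended for ASCII strings only.
# Letters, digits and - . _ ~ pass through; everything else becomes %XX.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # the string without its first character
    c=${s%"$rest"}       # the first character on its own
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

urlencode '#scala OR #data'   # prints %23scala%20OR%20%23data
```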

## Consuming Streaming API

If you want to consume Twitter’s Streaming API, there is an official client: hbc, available on GitHub.

Mutable Ideas