Vagrant + Spark + Zeppelin a toolbox to the Data Analyst (or Data Scientist)

Recently I built an environment to help me to teach Apache Spark, my initial thoughts were to use Docker but I found some issues specially when using older machines, so to avoid more blockers I decided to build a Vagrant image and also complement the package with Apache Zeppelin as UI. This Vagrant will build on Debian Jessie, with Oracle Java, Apache Spark 1.4.1 and Zeppelin (from the master branch).

Continue Reading »

Create your own dataset consuming Twitter API

Several tutorials have an assumption you own a data set. Often that is not the case and you just can’t take advantage of the tutorial because you don’t have data to play along. To comply with social networks Terms and Conditions you can’t publish your data sets, but you can create your own! Follow through these few commands.

Continue Reading »

How to create interactive tweets heatmaps

This posts shows how to create heatmaps of conversations taking place on Twitter, this is a proof of concept technic to learn more about our current datasets, this knowledge would be latter applied to the product development cycle. My objective here is to share a simple way to create a quick visualization and be able to make an internal demo.

Continue Reading »