How-To: Accessing Socialmetrix Quantum API to Create custom Social Dashboards

This is the first post in a series on how to use Socialmetrix Quantum as a data source, so you can create custom dashboards or ingest social data into your internal systems to support your big data initiatives.

To get started, we will log in to your Quantum account and get the API authentication token. Don’t have an account yet? Get a free trial!

2015-10-13

https://blog.arjon.es/2015/how-to-accessing-socialmetrix-quantum-api-to-create-custom-social-dashboards/

Making Hadoop 2.6 + Spark-Cassandra driver play nice together

We have been running Spark in Standalone deploy mode for more than a year, but recently I tried Azure’s HDInsight, which runs on Hadoop 2.6 (YARN deploy).

After provisioning the servers, all the small tests worked fine: I was able to run Spark-Shell and to read from and write to Blob Storage. Then I tried to write to a DataStax Cassandra cluster, which consistently returned an error message: Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {10.0.1.4}:9042

2015-10-12

https://blog.arjon.es/2015/making-hadoop-2.6--spark-cassandra-driver-play-nice-together/

Reading compressed data with Spark using unknown file extensions

This post could also be called Reading .gz.tmp files with Spark. At Socialmetrix we have several pipelines writing logs to AWS S3. Sometimes Apache Flume fails in the last phase, when it renames the final archive from .gz.tmp to .gz, so those files cannot be read through the SparkContext.textFile API. This post presents our workaround to process those files.

2015-10-02

https://blog.arjon.es/2015/reading-compressed-data-with-spark-using-unknown-file-extensions/
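
For the .gz.tmp files described above, the underlying issue is that the decompression codec is picked from the file extension, so an unrecognized suffix is treated as plain text. The following is a minimal sketch of one possible approach, not necessarily the workaround used in the post: decompress based on the bytes themselves and ignore the file name. The helper name `read_gzip_lines` is hypothetical, and the sketch assumes UTF-8 logs in a complete (non-truncated) gzip stream.

```python
import gzip
import io
from typing import List


def read_gzip_lines(raw: bytes) -> List[str]:
    """Decompress gzip bytes and return the text lines.

    Hypothetical helper: it looks only at the content, so the
    original file extension (.gz, .gz.tmp, ...) does not matter.
    Assumes UTF-8 text and a complete gzip stream.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as fh:
        return fh.read().decode("utf-8").splitlines()


if __name__ == "__main__":
    # Simulate the content of a file that kept its .gz.tmp name.
    sample = gzip.compress(b"first log line\nsecond log line\n")
    for line in read_gzip_lines(sample):
        print(line)
```

In PySpark, a function like this could probably be applied to the `(path, bytes)` pairs returned by `SparkContext.binaryFiles`, for example with `flatMap(lambda kv: read_gzip_lines(kv[1]))`. Note that this loads each file whole into memory, so it suits small log archives better than very large ones.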

Workshop Summary: Introduction to Application Development for Big Data

During August, Juan Pampliega and I were invited to put together a Big Data workshop at Espacio Fundación Telefonica as a complement to the “Big Bang Data” exhibition. This post is a summary of the event, with reading references for those who did not have the chance to attend.

2015-09-04

https://blog.arjon.es/2015/resumen-del-taller-introduccion-al-desarrollo-de-aplicaciones-para-big-data/

Vagrant + Spark + Zeppelin a toolbox to the Data Analyst (or Data Scientist)

Recently I built an environment to help me teach Apache Spark. My initial thought was to use Docker, but I ran into some issues, especially on older machines, so to avoid more blockers I decided to build a Vagrant image and complement the package with Apache Zeppelin as the UI. This Vagrant box is built on Debian Jessie, with Oracle Java, Apache Spark 1.4.1 and Zeppelin (from the master branch).

2015-08-23

https://blog.arjon.es/2015/vagrant--spark--zeppelin-a-toolbox-to-the-data-analyst-or-data-scientist/

Create your own dataset consuming Twitter API

Several tutorials assume you already have a data set. Often that is not the case, and you can’t take advantage of the tutorial because you don’t have data to play along with. To comply with social networks’ Terms and Conditions you can’t publish your data sets, but you can create your own! Follow along with these few commands.

2015-07-30

https://blog.arjon.es/2015/create-your-own-dataset-consuming-twitter-api/

#bash
#data

Why switch from Apache Hadoop to Apache Spark?

This talk describes Socialmetrix’s experience after more than a year of using Apache Spark in production, the reasons that led us to switch from Hadoop+Hive to Spark, and the facts we took into account to support this decision.

2015-07-01

https://blog.arjon.es/2015/por-que-cambiar-de-apache-hadoop-a-apache-spark/

How to create interactive tweets heatmaps

This post shows how to create heatmaps of conversations taking place on Twitter. It is a proof-of-concept technique for learning more about our current datasets; this knowledge will later be applied to the product development cycle. My objective here is to share a simple way to create a quick visualization and be able to make an internal demo.

2015-06-20

https://blog.arjon.es/2015/how-to-create-interactive-tweets-heatmaps/

Material from the talk “Creating an Architecture for Big Data Analytics” at ArqConf 2015

This talk was presented on April 30 at the Conferencia de Arquitectura IT (ArqConf 2015). In it I presented the desired characteristics, the challenges, and a proposed architecture for Big Data Analytics that enables real-time computation.

2015-05-10

https://blog.arjon.es/2015/material-de-la-charla-creando-una-arquitectura-para-big-data-analytics-en-arqconf-2015/