Mutable Ideas

Notes and ideas about Java, Scala, Big Data, NoSQL, Quality and Software Deploy

Creating a Beautiful Tagcloud From Hashtags

Although the tagcloud seems a somewhat outdated and criticized visualization format, I have no doubt it can be useful sometimes. And if you can create one with only a few keystrokes, it is pretty sweet. Below I’ll show the technique for extracting Twitter #hashtags, but you can apply this technique to virtually any text source.
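
As a rough illustration of the extraction step, here is a minimal Python sketch that pulls hashtags out of lines of text and counts them. It is not the method used in the post itself; the function name `hashtag_counts`, the regular expression, and the sample lines are all assumptions made for this example, and a hashtag is assumed to be a `#` followed by word characters.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(lines):
    """Count hashtags case-insensitively across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # findall returns only the captured group, i.e. the tag without '#'
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(line))
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; replace with lines read from any text source.
    sample = [
        "Learning #Spark and #Scala today",
        "#spark streaming is fun",
        "No tags here",
    ]
    # Print "count<TAB>tag", most frequent first.
    for tag, n in hashtag_counts(sample).most_common():
        print(f"{n}\t{tag}")
```

The resulting frequency table is the kind of input most tagcloud generators expect, with the count driving the size of each word.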

Tips & Tricks to Migrate MySQL Between Datacenters

Most of our data is stored in MySQL and Cassandra; MySQL was the primary data store when we started the company. Currently our MySQL workload runs on AWS RDS and we would like to give Microsoft Azure a try. This post documents a few tricks we learned to reduce the total time of dump, transfer and restore. Hope it can help you too.

Interview on 90.5FM Tribunales - Aldea Global - Elections

On November 11 I was invited to take part in the program Aldea Global on radio FM Tribunales 90.5, where we talked about the use of social networks as a tool for understanding public opinion.

On this occasion I was able to describe the work we do at Socialmetrix to measure the candidates and to understand public sentiment and conversation topics, in order to help the parties understand their audience and its wishes or complaints.

Making Hadoop 2.6 + Spark-Cassandra Driver Play Nice Together

We have been using the Spark Standalone deploy mode for more than a year now, but recently I tried Azure’s HDInsight, which runs on Hadoop 2.6 (YARN deploy).

After provisioning the servers, all small tests worked fine: I was able to run Spark-Shell and to read and write to Blob Storage. Then I tried to write to a Datastax Cassandra cluster, which constantly returned an error message: Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {10.0.1.4}:9042

Vagrant + Spark + Zeppelin: a Toolbox for the Data Analyst (or Data Scientist)

Recently I built an environment to help me teach Apache Spark. My initial thought was to use Docker, but I ran into some issues, especially on older machines, so to avoid more blockers I decided to build a Vagrant image and complement the package with Apache Zeppelin as the UI. This Vagrant image builds on Debian Jessie, with Oracle Java, Apache Spark 1.4.1 and Zeppelin (from the master branch).

Create Your Own Dataset Consuming Twitter API

Several tutorials assume you own a data set. Often that is not the case, and you can’t take advantage of the tutorial because you don’t have data to play along with. To comply with social networks’ Terms and Conditions you can’t publish your data sets, but you can create your own! Follow along with these few commands.

Why Switch From Apache Hadoop to Apache Spark?

This talk describes Socialmetrix’s experience of more than a year using Apache Spark in production, the reasons that led us to switch from Hadoop+Hive to Spark, and the facts we took into account to support this decision.