Mutable Ideas

Notes and ideas about Java, Scala, Big Data, NoSQL, Quality and Software Deploy

Reasons to Fall in Love for Postgres

I’ve been working on analytics/big data field for 10+ years, during this time I’ve been working mostly with MySQL, MongoDB, Redis and Cassandra. Just a couple of years ago I started to really pay attention to Postgres, and my regret is not getting into it earlier… On this post I try to enumerate a few features I’m using and why I think you should try it too, before jumping into the architectural and operational complexity of multiple NoSQL.

Which Is the Best Tool for Copying a Large Directory Tree Locally?

Recently we had to move a full Cassandra backup to another cluster of machines (another Datacenter on Cassandra’s jargon). Although it can be achieved using DC replication we opted for a more conservative approach and not change production configurations neither increase its load due data streaming. This post is quick comparison to find out which tool would perform better for copying a large directory tree locally.

Accessing Campaign Metrics on Socialmetrix Quantum

This is the second post about Socialmetrix Quantum API, at this time we’ll use the API to show summary statistics about campaigns. Please refer to the first post in order to get your API token and basic API usage instructions.

We walk you through the process of creating a campaign and assinging posts to it through the web ui, once the information is loaded, we’ll extract this metrics using Quantum API.

Where to Find Datasets to Learn Big Data & Data Science

Sometimes you just need data to learn how a algorithm works, to run a stress test or just to have a excuse to spin up several machines in a cluster and see how it crush the data. More often than not, it is incredibly hard to obtain data, and a few colleagues I’ve talked about had similar problem, so this post is a collection of links and references for datasets I know have been open source. Please contribute =)

Creating a Beautiful Tagcloud From Hashtags

Although tagcloud seems a little bit outdated and criticized visualization format, I have no doubt it can be useful sometimes. And if you can create one with only a few key strokes it is pretty sweet. Below I’ll show the technic of extracting Twitter #hashtags but you can use this technic to virtually any text source.