If you have never heard of Jupyter Notebook, I highly recommend checking it out. It has been my primary platform for building reports and data-driven case studies. In this post I’d like to show how I create a simple, isolated environment to run JupyterLab with a Bash script and Docker.
I’ve been working in the analytics/big data field for 10+ years, and during this time I’ve worked mostly with MySQL, MongoDB, Redis and Cassandra. Only a couple of years ago did I start to really pay attention to Postgres, and my regret is not getting into it earlier… In this post I enumerate a few features I’m using and explain why I think you should try it too, before jumping into the architectural and operational complexity of running multiple NoSQL databases.
Recently we had to move a full Cassandra backup to another cluster of machines (another datacenter, in Cassandra’s jargon). Although this can be achieved using DC replication, we opted for a more conservative approach: we did not want to change production configurations or increase production load with data streaming. This post is a quick comparison to find out which tool performs better for copying a large directory tree locally.
Last week I had the opportunity to share Socialmetrix’s experience installing and configuring Datastax Analytics clusters on Azure. Datastax offers a commercial solution as a bundle that integrates Cassandra, Spark and Solr. The talks were given at the Argentina Big Data Meetup, hosted by Jampp, and at the Nardoz Meetup, hosted by Medallia.
We run several processes that may take hours to complete, and it is nice to be notified on a Slack channel when those processes finish successfully. Using Slack’s Incoming Webhooks API, a small Bash script and a couple of tricks, it is really simple!
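As a rough idea of what such a notifier can look like, here is a minimal sketch. It assumes the webhook URL lives in an environment variable I’m calling `SLACK_WEBHOOK_URL`, and the helper names `build_payload` and `notify_slack` are made up for illustration; the script in the full post may differ.

```shell
#!/usr/bin/env bash
# Sketch: notify a Slack channel through an Incoming Webhook.
# SLACK_WEBHOOK_URL is an assumed environment variable holding your webhook URL.

# Build the JSON payload, escaping backslashes and double quotes in the message.
# (Newlines and other control characters are not handled in this sketch.)
build_payload() {
  local msg=$1
  msg=${msg//\\/\\\\}
  msg=${msg//\"/\\\"}
  printf '{"text":"%s"}' "$msg"
}

# POST the payload to the webhook; -f makes curl exit non-zero on HTTP errors.
notify_slack() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(build_payload "$1")" \
    "${SLACK_WEBHOOK_URL:?set SLACK_WEBHOOK_URL first}"
}
```

After sourcing the file, something like `./long_job.sh && notify_slack "long_job finished OK"` (where `long_job.sh` is a placeholder for your own process) sends the message only when the job exits successfully.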
This is the second post about the Socialmetrix Quantum API; this time we’ll use the API to show summary statistics about campaigns. Please refer to the first post to get your API token and basic API usage instructions.
We walk you through the process of creating a campaign and assigning posts to it through the web UI. Once the information is loaded, we’ll extract these metrics using the Quantum API.
Sometimes you just need data to learn how an algorithm works, to run a stress test, or just to have an excuse to spin up several machines in a cluster and see how they crunch the data. More often than not, it is incredibly hard to obtain data, and a few colleagues I’ve talked to had a similar problem, so this post is a collection of links and references to datasets I know have been open sourced. Please contribute =)
An interview La Nacion did with us about social networks and politics.