Mutable Ideas

Notes and ideas about Java, Scala, Big Data, NoSQL, Quality and Software Deploy

Reasons to Fall in Love for Postgres

I’ve been working on analytics/big data field for 10+ years, during this time I’ve been working mostly with MySQL, MongoDB, Redis and Cassandra. Just a couple of years ago I started to really pay attention to Postgres, and my regret is not getting into it earlier… On this post I try to enumerate a few features I’m using and why I think you should try it too, before jumping into the architectural and operational complexity of multiple NoSQL.

Document, Search, Time series, GiS? You can have it all

Through its rich extensions model, Postgres can handle several use-cases:

Document (JSON) support

I ingest complex nested JSONs from social networks, ie: Twitter Status (Tweet) and I need to access nested fields, enter jsonb type. You can do pretty advanced things:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CREATE TABLE "statuses_raw" (
  "status_id" character varying(64) NOT NULL,
  "raw" jsonb DEFAULT '{}'::jsonb,
  PRIMARY KEY ("status_id")
);

-- 10 tweets mentioning #postgres ordered by followers
SELECT
  raw->>'created_at' AS created_at,
  raw->>'text' AS tweet_text,
  raw->'user'->>'name' AS user_name,
  raw->'entities'->'media'->0->>'media_url_https' AS photo,
  raw->'user'->>'followers_count' AS followers
FROM statuses_raw
WHERE raw @> '{"entities": {"hashtags": [{"text": "Postgres"}] }}'
ORDER BY (raw->'user'->>'followers_count')::bigint DESC
LIMIT 10;

Did I mentioned the JSON can be fully indexed? :D

Full Text Search support

Although we are not talking about a full-bloom search engine, indeed Postgres full-text search is Good Enough!. It is possible to create a CTE to support faceted search, if you don’t mind a hacky solution. The only downside of using this FTE engine is the query syntax, I would love to use a more standard Lucene syntax for queries.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
CREATE EXTENSION unaccent;

CREATE TEXT SEARCH CONFIGURATION pg_catalog.es ( COPY = spanish );

ALTER TEXT SEARCH CONFIGURATION pg_catalog.es ALTER MAPPING
FOR hword, hword_part, word WITH unaccent, spanish_stem;

ALTER DATABASE dashboard SET default_text_search_config = 'pg_catalog.es';
SET default_text_search_config = 'pg_catalog.es';

-- Search for the word 'bueno' using 'spanish' config
SELECT *
FROM tweets
WHERE doc_search @@ plainto_tsquery('es', 'buen día')
limit 10;

Time-series

One of the downsides of RDBMS for time-series are the access pattern hot-cold and the fact it becomes slower as data pile up, even if you don’t need to access it. Extensions like Timescale solved this problem by managing table partition and maintenance automatically - by the time of writing limited to a single node.

CitusData have an extension which distributes your data and your queries across multiple nodes, so you can have horizontal scalability!

Window Functions provide complex and yet common operations for time series, Postgres has a rich set of functions - have to say, how I missed them when I had to manually implement rank(), over() because Cassandra doesn’t support not even group by

Geographic Information System (GiS)

PostGIS is a whole new dimension by itself, it is a lot of fun playing around with coordinates and matching elements into polygons with ST_Contains, ST_DWithin, etc

Everything on one box

Not a small thing to consider is having all those features on the same box results in simpler architecture, less updates/synchronization among systems, less drivers, less languages and protocols to learn and maintain.

Other goodies

It is always possible to find a useful extension to help you achieve more, a few I like:

  • pg_cron: you can schedule tasks cron-style. A common use-case to me is periodically refresh materialize views.
  • PostgresHLL: a set-like structure used for distinct value counting with tunable precision … Example, in 1280 bytes hll can estimate the count of tens of billions of distinct values with only a few percent error.
  • CMS TopN: Count-Min Sketch and The Top-N implementation

What to dislike

Poor Client UI (for OSX)

When comparing most of the UIs with Sequel Pro (for MySQL only), they are not complete and/or are weird to use. During my research - totally subjective and not comprehensive - the best FOSS solution was pgAdmin. Currently I’m “almost” happy with Postico.

What’s next

Don’t take only my word for it, Salvatore Sanfilippo, creator of Redis, wrote this tweet:

If you got as excited about Postgres as I’m, my suggestion is to read PostgreSQL: Up and Running, 3rd Edition - by Leo S. Hsu, Regina O. Obe it is a very comprehensive book of basics and advanced features.

What do you think?

Architectures most of time doesn’t have one right answer, how do you felt about these ideas? How would you design your system?

Comments