I’ve been working on analytics/big data field for 10+ years, during this time I’ve been working mostly with MySQL, MongoDB, Redis and Cassandra. Just a couple of years ago I started to really pay attention to Postgres, and my regret is not getting into it earlier… On this post I try to enumerate a few features I’m using and why I think you should try it too, before jumping into the architectural and operational complexity of multiple NoSQL.
Document, Search, Time series, GiS? You can have it all
Through its rich extensions model, Postgres can handle several use-cases:
Document (JSON) support
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Did I mentioned the JSON can be fully indexed? :D
Full Text Search support
Although we are not talking about a full-bloom search engine, indeed Postgres full-text search is Good Enough!. It is possible to create a CTE to support faceted search, if you don’t mind a hacky solution. The only downside of using this FTE engine is the query syntax, I would love to use a more standard Lucene syntax for queries.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
One of the downsides of RDBMS for time-series are the access pattern hot-cold and the fact it becomes slower as data pile up, even if you don’t need to access it. Extensions like Timescale solved this problem by managing table partition and maintenance automatically - by the time of writing limited to a single node.
CitusData have an extension which distributes your data and your queries across multiple nodes, so you can have horizontal scalability!
Window Functions provide complex and yet common operations for time series, Postgres has a rich set of functions - have to say, how I missed them when I had to manually implement
over() because Cassandra doesn’t support not even
Geographic Information System (GiS)
Everything on one box
Not a small thing to consider is having all those features on the same box results in simpler architecture, less updates/synchronization among systems, less drivers, less languages and protocols to learn and maintain.
It is always possible to find a useful extension to help you achieve more, a few I like:
- pg_cron: you can schedule tasks cron-style. A common use-case to me is periodically refresh materialize views.
- PostgresHLL: a set-like structure used for distinct value counting with tunable precision … Example, in 1280 bytes hll can estimate the count of tens of billions of distinct values with only a few percent error.
- CMS TopN: Count-Min Sketch and The Top-N implementation
What to dislike
Poor Client UI (for OSX)
When comparing most of the UIs with Sequel Pro (for MySQL only), they are not complete and/or are weird to use. During my research - totally subjective and not comprehensive - the best FOSS solution was pgAdmin. Currently I’m “almost” happy with Postico.
Don’t take only my word for it, Salvatore Sanfilippo, creator of Redis, wrote this tweet:
Some NoSQL databases came to earth and died for us, to teach us how to use Postgres.— Salvatore Sanfilippo (@antirez) September 15, 2017
If you got as excited about Postgres as I’m, my suggestion is to read PostgreSQL: Up and Running, 3rd Edition - by Leo S. Hsu, Regina O. Obe it is a very comprehensive book of basics and advanced features.
What do you think?
Architectures most of time doesn’t have one right answer, how do you felt about these ideas? How would you design your system?