Interesting Data-Related Blogs and Articles – Week of August 4, 2019


Announcing PartiQL: One query language for all your data

AWS has open-sourced a new query language based on SQL (“SQL-compatible”, as the blog post puts it). Essentially, it is a superset of SQL, and its reference implementation requires the JRE. This project bears watching, but it will only succeed outside AWS if data store query engines add support for PartiQL. To further that aim, AWS is releasing the specification and a reference implementation (written in Kotlin). The AWS services that support PartiQL thus far are:

  • Amazon S3 Select
  • Amazon Glacier Select
  • Amazon Redshift Spectrum
  • Amazon Quantum Ledger Database

Build highly available MySQL applications using Amazon Aurora Multi-Master

Multi-Master is not yet available for the PostgreSQL-compatible version of Amazon Aurora. Let’s hope it will be.

EBS default volume type updated to GP2

Per the announcement: “GP2 volumes offer lower latency and higher throughput than Standard volumes.”

Techtalk: Best Practices for Running Spark Applications Using Spot Instances on EMR

August 28. This is a 300-level session.


PostgreSQL: Regular expressions and pattern matching

The author intends this to be the first in a series of posts on using regular expressions (RE) with PostgreSQL. This post gives an overview of four RE operators.
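As a rough illustration, here is a minimal Python sketch of how four such operators behave. It assumes the four in question are PostgreSQL's POSIX match operators (~, ~*, !~ and !~*), which the summary above does not confirm. The function name pg_regex_match is hypothetical, and Python's re module only approximates PostgreSQL's regex dialect, so treat this as a model of the semantics for simple patterns rather than a faithful emulation.

```python
import re

# Assumed operator set: PostgreSQL's POSIX regular expression match operators.
OPERATORS = {"~", "~*", "!~", "!~*"}


def pg_regex_match(text, pattern, operator):
    """Approximate the result of `text <operator> pattern` in PostgreSQL."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    # A trailing * means the match is case-insensitive.
    flags = re.IGNORECASE if operator.endswith("*") else 0
    # The pattern may match anywhere in the string unless it is anchored.
    found = re.search(pattern, text, flags) is not None
    # A leading ! negates the result.
    return not found if operator.startswith("!") else found


if __name__ == "__main__":
    for op in ("~", "~*", "!~", "!~*"):
        print(op, pg_regex_match("PostgreSQL", "sql", op))
```

The case-sensitive operators find no lowercase "sql" in "PostgreSQL", while the case-insensitive ones do, and the negated forms simply invert the answer.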


Python in Visual Studio Code – August 2019 Release

Visual Studio Code has been coming on strong as a popular cross-platform and cross-language IDE. This release furthers the support for Python and Jupyter notebooks.


Transitioning into the tidyverse (part 2)

A pair of posts presenting the ecosystem of packages for R (my cat Cookie’s personal favorite being purrr) that promotes a methodology for data analysis.

Software Updates

DBeaver 6.1.4 (2019-08-04)

Among the fixes are these:

  • PostgreSQL:
    • Array data type handler was fixed
    • Indexes metadata reading was fixed
    • Execution plan visualization was fixed for CTE nodes
  • SQL Server:
    • Support of identity columns creation was added
    • Session manager was fixed
  • Oracle:
    • Scheduled jobs metadata reading was fixed
    • Session management was fixed for RAC mode

General Data-Related

DFLib – a lightweight, pure Java implementation of DataFrame

Dataframes (think database tables, spreadsheets) are foundational data structures in Python, R, and Spark. This library provides similar functionality when writing Java.

Liquibase Improving Community Support

In a sign of the continued strength of Liquibase as a community project, this post announces the hiring of an open source community manager.

Traversing the Land of Graph Computing and Databases

Based on the author’s talk at PyCon X, this is a high-level overview of the resurgence of interest in graph-based technology and graph databases.

Upcoming Conferences of Interest

Strata Data Conference – New York, September 23-26

This is one of a series of O’Reilly-sponsored conferences on big data and data science. There are two coming up next year: Strata Data Conference – San Jose (March 15-18, 2020) and Strata Data Conference – London (April 20-23, 2020).

Classic Paper or Reference of the Week

Since I linked to a couple of blog posts above on the R “tidyverse”, I thought some folks might be interested in the paper by Hadley Wickham that started it all: Tidy Data, as published in the Journal of Statistical Software.

Data Technology of the Week


AgensGraph is an Apache 2.0-licensed project that supports a property graph data model on top of PostgreSQL (version 10.3). It supports both ANSI SQL and openCypher for querying. There’s an enterprise version from Bitnine.

Metadata Standard of the Week

Friend of a Friend, or “FOAF”, is an ontology, expressed in RDF and OWL, for describing persons, their activities, and their relations to other people and objects. An example use of FOAF is to describe a social network.
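To give a feel for what a FOAF description of a small social network looks like, here is a minimal Python sketch that emits Turtle (one of the RDF text formats) using only the standard library. The people, the example.org identifiers, and the function name foaf_turtle are all made up for illustration. The foaf:Person, foaf:name, and foaf:knows terms come from the FOAF vocabulary. The sketch does no escaping of names, so it is not a general-purpose serializer.

```python
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def foaf_turtle(people, friendships):
    """Render people and 'knows' relations as FOAF statements in Turtle.

    people: dict mapping a person's URI to a display name
    friendships: iterable of (uri_a, uri_b) pairs meaning "a knows b"
    """
    lines = [f"@prefix foaf: <{FOAF_NS}> .", ""]
    for uri, name in people.items():
        # Each person is typed as foaf:Person and given a foaf:name.
        lines.append(f"<{uri}> a foaf:Person ;")
        lines.append(f'    foaf:name "{name}" .')
    for uri_a, uri_b in friendships:
        # foaf:knows is the relation that links people into a network.
        lines.append(f"<{uri_a}> foaf:knows <{uri_b}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical people and identifiers.
    people = {
        "http://example.org/people/alice": "Alice",
        "http://example.org/people/bob": "Bob",
    }
    friendships = [
        ("http://example.org/people/alice", "http://example.org/people/bob"),
    ]
    print(foaf_turtle(people, friendships))
```

In practice you would more likely reach for an RDF library to produce and validate this, but the plain-text version shows how little is needed to say "Alice knows Bob".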

Why Compile MySQL?

I’ve got an upcoming post that discusses the steps for compiling MySQL from source.  Before I get to that topic, though, I thought a preliminary post on the reasons for doing so would be helpful.

Almost every open-source database project provides an installer for various platforms.  Why then would you bother to compile MySQL (or any other database)?  There are a number of good reasons.

  • You have an opportunity to read the code

If you have the time and inclination, grabbing and compiling the source code is the perfect excuse to spend some time reading the code, particularly the parts of the code that interest you.  How exactly does PostgreSQL implement k-nearest-neighbor indexing (new in version 9.1)?

Sometimes the source code is installed along with binary versions when you use an installer, but this is often not the case.  If you download the source to compile it, you know you’ll have it available to read at your leisure.

  • Some options are only available by compiling the source

You might want to try out an atypical storage engine, but you can only do that by mixing in support for that engine when you compile the code itself.  For example, you want to use the FEDERATED engine in MySQL, but that doesn’t come as part of the installer-produced installation.  You really have no choice but to compile MySQL and include support for the FEDERATED engine as part of the configure-and-compile process.

  • You can more easily set some defaults

You might not care for the default database character set.  Sometimes DBAs or developers forget to set a character set when creating a table.  For those times, you may want to enforce a standard that all database tables use the UTF-8 character set (unless specifically overridden).  When configuring MySQL for compilation, pass the -DDEFAULT_CHARSET=utf8 command line option to CMake.

  • You want to get an up-to-date version, including the latest patches

Want the latest and greatest version of the server, including the most recently committed changes for bugs and security holes?  No problem!  Just download and compile the most current source, and you will automatically get the latest bug fixes and security patches.
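To make the compile-time options from the second and third points above a little more concrete, here is a minimal Python sketch that assembles a CMake configure command. -DDEFAULT_CHARSET is the option named above. -DWITH_FEDERATED_STORAGE_ENGINE=1 is the switch I would expect to use for the FEDERATED engine on CMake-based MySQL releases, but check the build documentation for your version. The function name and the source directory are hypothetical, and the sketch only builds the command; it does not run it.

```python
import shlex


def mysql_cmake_command(source_dir, default_charset="utf8", with_federated=True):
    """Return the argument list for configuring a MySQL source tree with CMake."""
    args = ["cmake", source_dir, f"-DDEFAULT_CHARSET={default_charset}"]
    if with_federated:
        # Assumed option name for compiling in the FEDERATED storage engine.
        args.append("-DWITH_FEDERATED_STORAGE_ENGINE=1")
    return args


if __name__ == "__main__":
    # "../mysql-src" is a placeholder path to an unpacked source tree.
    command = mysql_cmake_command("../mysql-src")
    # shlex.join (Python 3.8+) quotes arguments so the line is safe to paste.
    print(shlex.join(command))
```

Keeping the options in a small script like this makes the build repeatable, which matters once you start compiling every new release.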

In future posts, I’ll discuss the steps for compiling MySQL and PostgreSQL from source code.