About Point-in-Time 02

Mangle "Point-in-Time" is information about Mangle as a language design and development project, organized as an mdbook.

As the name suggests, "Point-in-Time" is a snapshot of information. If you are looking for more up-to-date information, check out the Mangle docs at the Mangle GitHub repo.

"Point-in-Time" is written in markdown format using mdbook and mdbook-presentation-preprocessor and published on the web. Therefore, it is possible to submit pull requests with corrections, comments, links to more updated information.

Author and Disclaimer

Burak Emir homepage.

All content, views, opinions are those of the author and not his employer.

What is in this issue 02

Today is 2024-04-12; it has been almost a year since the last issue. In this issue, I am gathering my thoughts on the next steps for Mangle.

Porting Mangle to Rust

I have started work on porting Mangle to Rust. The reason is the same as why Mangle was started in Go in the first place: when looking for pragmatic solutions, the environment determines which language is the best fit.

Mangle was started in an environment that required Go. Now I want to access Mangle functionality in places that use C++ and Rust. And I want to find out whether a particular approach to manipulating symbolic data can be made to work well in Rust (more on this below).

But ... why?

For pure single-machine usage, it is very feasible to just do Rust-Go interop using IPC or gRPC. There is a section "Server" that talks about this, with demo code.

For a project like this, one can come up with a few reasonable questions:

  • why port instead of, e.g., working on features or documentation?
  • after the port is done, is it not hard to maintain two versions?
  • why have another datalog implementation when there are so many good ones?

These are all good questions, and I pondered them before embarking. The tentative answers and mitigations:

  • while the surface language is far from perfect, the few additions that are necessary for syntax seem doable in two implementations.
  • the type-checker is more work and unfinished, but that could be left for later.
  • Mangle is supposed to be not just "an implementation" but a specification; that is only convincing when there are at least two implementations.

Other Rust datalog implementations

There are indeed a lot of datalog implementations in Rust. Here are a few:

  • Datafrog - very limited and hard to read
  • Ascent - embed any type as relation, BYODS
  • Crepe - seminaive and stratified negation, Eric K. Zhang's thesis as accompanying reading material
  • DDlog - I don't know much about this one
  • asdi - Another Simplistic Datalog Implementation

These are all datalog implementations, and, like Mangle, some add extensions to Datalog. If we look beyond Rust, there are of course many more implementations.

As discussed elsewhere (pardon the approximate self-quoting - this is just my opinion):

I think it is helpful to see datalog as a formal, conceptual kernel (or "toy programming language" in the famous Alice Book "Foundations of Databases"). When we look at functional programming languages, we do not usually see them as a dozen incompatible implementations of lambda calculus.

The compilation technique of Ascent and Crepe is certainly something that would make sense for Mangle.

I should study these Rust datalog implementations more, but here is a clear difference: already today, Mangle has support for deferred computation that adds back some of the PROLOG top-down evaluation possibilities, without adding problems like order-sensitivity that come from PROLOG's non-declarative execution model. Tomorrow, I may want to add equality saturation or convenient syntax to encode algebraic datatypes - this could only happen in Mangle.

Compilation left for later

One issue with Mangle is that datalog evaluation is essentially interpreted. This may be helpful for evaluating queries at runtime, and when data fits in memory then the lightweight computation does not really matter. However, there are scenarios where it is not possible to use Mangle now:

  • working with large data sources that do not fit in memory.
  • when there is more computation to do than applying the immediate-consequence operator.
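To make "interpreted" and "immediate-consequence operator" concrete, here is a minimal sketch in Python of naive bottom-up evaluation. It is not how Mangle is implemented: the tuple representation of atoms, the convention that uppercase strings are variables, and the names `match`, `immediate_consequences` and `evaluate` are all assumptions made up for this illustration.

```python
# Hypothetical mini-interpreter, not Mangle's data structures or API.
# A fact is a tuple like ("edge", "a", "b"). In rules, a term that starts
# with an uppercase letter is a variable. A rule is (head_atom, [body_atoms]).

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom equals fact, or return None if impossible."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            # Bind the variable, or check it against an earlier binding.
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def immediate_consequences(rules, facts):
    """One application of the immediate-consequence operator."""
    derived = set()
    for head, body in rules:
        substs = [{}]
        for atom in body:
            extended = []
            for s in substs:
                for fact in facts:
                    s2 = match(atom, fact, s)
                    if s2 is not None:
                        extended.append(s2)
            substs = extended
        for s in substs:
            derived.add((head[0],) + tuple(s.get(t, t) for t in head[1:]))
    return derived

def evaluate(rules, facts):
    """Naive evaluation: apply the operator until no new facts appear."""
    facts = set(facts)
    while True:
        new = immediate_consequences(rules, facts) - facts
        if not new:
            return facts
        facts |= new

rules = [
    (("reachable", "X", "Y"), [("edge", "X", "Y")]),
    (("reachable", "X", "Z"), [("edge", "X", "Y"), ("reachable", "Y", "Z")]),
]
result = evaluate(rules, {("edge", "a", "b"), ("edge", "b", "c")})
```

Every rule is re-interpreted against every fact in each round, which is exactly the overhead that a compiled or semi-naive approach would try to remove.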

For now, the Rust port of Mangle will stick as close as possible to the golang implementation.

The challenge is to avoid the creep of features that make compilation impossible. This might already have happened, but since Mangle is still far from 1.0, there is still room to maneuver.

Finally, if it turns out that changes to Mangle-golang are not possible, then it will still be possible to evolve Mangle-rust to the compilation model while keeping the golang interpreter intact.

Persistence

An in-memory database is nice, but persistent storage is important for two use cases:

  • in order to make updates and keep them
  • in order to deal with larger volumes of data

There can be different strategies for these.

A model for updates

Let's start with making updates.

  • if the size of that data still comfortably fits in memory, then updates can be made in memory and the data saved.
  • write-ahead log: if writing everything is slow, update operations can be logged before they affect the db state, with regular checkpointing that applies the updates on disk.
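The write-ahead-log idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `FactStore`, the file names and the JSON encoding are hypothetical and have nothing to do with Mangle's code, and the sketch does not handle a torn last log line after a crash.

```python
import json
import os

class FactStore:
    """Hypothetical in-memory fact set with a write-ahead log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "wal.jsonl")
        self.facts = set()
        # Recovery: load the last checkpoint, then replay the logged updates.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.facts = {tuple(fact) for fact in json.load(f)}
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, entry):
        fact = tuple(entry["fact"])
        if entry["op"] == "add":
            self.facts.add(fact)
        else:
            self.facts.discard(fact)

    def update(self, op, fact):
        entry = {"op": op, "fact": list(fact)}
        # Log first and force it to disk, only then change the in-memory state.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(entry)

    def checkpoint(self):
        # Write the full state to a temporary file, swap it in atomically,
        # then empty the log. Replaying an old log on top of a newer snapshot
        # is harmless because "add" and "remove" are idempotent.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.facts), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        open(self.log_path, "w").close()
```

Each `update` costs one small append instead of rewriting everything, and `checkpoint` can run at whatever interval keeps the log short.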

This is all work, but it is predictable work. Instead of adding these to Mangle, it makes a lot more sense to connect Mangle to an existing LSM-based DB like RocksDB or Cloud Spanner.

Large Volumes

Instead of serving from memory, we can read from persistent storage for every query. This is what most databases do. If latency does not play a role, then this persistent storage could be files from block storage.
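As a rough illustration of reading from persistent storage for every query, the following Python sketch keeps `edge` facts in a SQLite file and answers a reachability query with indexed lookups, so only the frontier and the result live in memory. SQLite, the table layout and the function names are my assumptions for this example; they are not part of Mangle.

```python
import sqlite3

def open_store(path):
    """Open (or create) a hypothetical on-disk store of edge facts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edge "
        "(src TEXT, dst TEXT, PRIMARY KEY (src, dst))"
    )
    return conn

def add_edges(conn, edges):
    # The "with" block commits the transaction on success.
    with conn:
        conn.executemany("INSERT OR IGNORE INTO edge VALUES (?, ?)", edges)

def reachable_from(conn, start):
    """Nodes reachable from start, looked up on disk one node at a time."""
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        rows = conn.execute("SELECT dst FROM edge WHERE src = ?", (node,))
        for (dst,) in rows:
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen
```

Every step of the query goes back to storage, which trades latency for the ability to handle data that does not fit in memory.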

Again, this is mostly integration work. Connecting Mangle to these systems seems to come with many particular things that distract from the goal of having a simple-to-understand spec and implementation.

Server

I built a small demo server for Mangle to show how such an integration into a wider system could look.

The choice of tech is protobuf and gRPC. These are straightforward (enough) to use, due to the availability of the protobuf compiler and supporting libraries.

Find the code here: https://github.com/burakemir/mangle-service

There are many ways in which this cannot be called a "real" database - but there are also many ways in which it beats a real database: serving from memory, supporting expressive, recursive queries...

As far as proofs-of-concept go, this little demo code also shows that the interfaces to the Mangle library are clumsy and worth improving.