vmx

the blllog.

Introducing Noise

2017-09-19 22:35

I've meant to write this blog post for quite some time. It's my view on Noise, the new project I'm working on. I've been working on it full-time together with Damien Katz for about a year now. Damien has already blogged a bit about the incarnation of Noise.

I can't recall when Damien first told me about the idea, but I clearly remember one meeting we had at Couchbase, where plenty of developers were packed into a small room in the Couchbase Mountain View office. Damien was presenting his idea of how flexible JSON indexing should work. It was based on an idea that came up a long time ago at IBM (see Damien's blog post for more information).

Then the years passed without this project actually happening. I heard about it again when I visited Damien while I was in the Bay Area. He told me about his plan to actually do this for real. If I joined early, I would become a founder of the project. It wasn't a decision I took lightly, but I eventually decided to leave Couchbase to work full-time on Noise.

Originally Damien created a prototype in C++. But as I was really convinced that Rust is the future for systems programming and databases, I started to port it to Rust before I visited him in the US. Although Damien was skeptical at first, he at least wanted to give it a try and during my stay I convinced him that Rust is the way to go.

Damien did the hard parts on the core of Noise and the Node.js bindings. I mostly spent my time getting an R-tree working on top of RocksDB. It took several attempts, but I think I finally found a good solution. Currently it's a special-purpose implementation for Noise, but it could easily be made more generic or adapted to other specific use cases. If you have such needs, please let me know. At this year's Global FOSS4G conference I presented Noise and its spatial capabilities to a wider audience. I'm happy with the feedback I got. People especially seem to enjoy the query language we came up with.

So now we have a working version which does indexing and has many query features. You can try out Noise online. There's also basic support for geospatial bounding box queries, which I'll blog more about once I've cleaned up the coded-in-a-rush-for-a-conference mess and merged it into the master branch.

There are exciting times ahead, as now it's time to get some funding for the project. Damien and I don't want to do the venture-capital-based startup kind of thing, but would rather try to find funding through other channels. This will also define the next steps. Noise is a library, so it can be the basis for a scaled-up distributed system, and/or be scaled down into a nice small analytics system that you can run on your local hardware when you don't have access to the cloud.

So in case you've read this, tried it out and think that this is exactly what you've been looking for, please tell me about your use case. Perhaps you even want to help fund this project.

Categories: en, Noise, RocksDB, Rust, geo

An R-tree implementation for RocksDB

2017-02-14 22:35

It's long been my plan to implement an R-tree on top of RocksDB. Now there is a first version of it.

Getting started

Check out the source code from my RocksDB rtree-table fork on GitHub, then build RocksDB and the R-tree example.

git clone https://github.com/vmx/rocksdb.git
cd rocksdb
make static_lib
cd examples
make rtree_example

If you run the example it should output augsburg:

$ ./rtree_example
augsburg

For more information about how to use the R-tree, see the Readme file of the project.

Implementation

The nice thing about LSM-trees is that the index data structures can be bulk loaded. For now, my R-tree uses a simple bottom-up build with a fixed node size (4 KiB by default). The data is pre-sorted by the low value of the first dimension. This gives the data a total order, so query results also come back sorted by the first dimension. The idea is based on the paper On Support of Ordering in Multidimensional Data Structures by Filip Křižka, Michal Krátký and Radim Bača.
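To illustrate the bulk-loading idea, here is a minimal sketch in Rust. It is not the code from the fork. It assumes two dimensions, and instead of a 4 KiB node size it uses a hypothetical `fanout` parameter (maximum entries per node). The names `Rect`, `Node`, `bulk_load` and `search`, as well as the city coordinates, are made up for this example.

```rust
/// Axis-aligned bounding box over two dimensions of doubles.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    min: [f64; 2],
    max: [f64; 2],
}

impl Rect {
    fn point(x: f64, y: f64) -> Rect {
        Rect { min: [x, y], max: [x, y] }
    }
    fn union(&self, o: &Rect) -> Rect {
        Rect {
            min: [self.min[0].min(o.min[0]), self.min[1].min(o.min[1])],
            max: [self.max[0].max(o.max[0]), self.max[1].max(o.max[1])],
        }
    }
    fn intersects(&self, o: &Rect) -> bool {
        (0..2).all(|d| self.min[d] <= o.max[d] && self.max[d] >= o.min[d])
    }
}

#[derive(Debug)]
enum Node {
    Leaf { mbr: Rect, entries: Vec<(Rect, String)> },
    Inner { mbr: Rect, children: Vec<Node> },
}

impl Node {
    fn mbr(&self) -> Rect {
        match self {
            Node::Leaf { mbr, .. } | Node::Inner { mbr, .. } => *mbr,
        }
    }
}

/// Smallest rectangle covering all given rectangles (needs at least one).
fn cover(mut rects: impl Iterator<Item = Rect>) -> Rect {
    let first = rects.next().expect("at least one rectangle");
    rects.fold(first, |acc, r| acc.union(&r))
}

/// Bottom-up build: sort by the low value of the first dimension,
/// pack leaves of at most `fanout` entries, then pack parents level by level.
fn bulk_load(mut items: Vec<(Rect, String)>, fanout: usize) -> Option<Node> {
    assert!(fanout >= 2, "fanout must be at least 2");
    items.sort_by(|a, b| a.0.min[0].total_cmp(&b.0.min[0]));
    let mut level: Vec<Node> = items
        .chunks(fanout)
        .map(|chunk| Node::Leaf {
            mbr: cover(chunk.iter().map(|(r, _)| *r)),
            entries: chunk.to_vec(),
        })
        .collect();
    while level.len() > 1 {
        let mut next = Vec::new();
        let mut iter = level.into_iter();
        loop {
            let children: Vec<Node> = iter.by_ref().take(fanout).collect();
            if children.is_empty() {
                break;
            }
            let mbr = cover(children.iter().map(|c| c.mbr()));
            next.push(Node::Inner { mbr, children });
        }
        level = next;
    }
    level.pop() // None if there were no items
}

/// Collect keys whose rectangle intersects `query`. Nodes are visited left to
/// right, so hits come out ordered by the low value of the first dimension.
fn search<'a>(node: &'a Node, query: &Rect, out: &mut Vec<&'a str>) {
    match node {
        Node::Leaf { entries, .. } => {
            for (rect, key) in entries {
                if rect.intersects(query) {
                    out.push(key.as_str());
                }
            }
        }
        Node::Inner { children, .. } => {
            for child in children {
                if child.mbr().intersects(query) {
                    search(child, query, out);
                }
            }
        }
    }
}

fn main() {
    // Approximate (longitude, latitude) pairs, for illustration only.
    let items = vec![
        (Rect::point(13.40, 52.52), "berlin".to_string()),
        (Rect::point(10.90, 48.37), "augsburg".to_string()),
        (Rect::point(6.96, 50.94), "cologne".to_string()),
        (Rect::point(11.58, 48.14), "munich".to_string()),
        (Rect::point(9.99, 53.55), "hamburg".to_string()),
    ];
    let root = bulk_load(items, 2).expect("non-empty input");
    let query = Rect { min: [10.0, 48.0], max: [12.0, 49.0] };
    let mut hits = Vec::new();
    search(&root, &query, &mut hits);
    println!("{:?}", hits); // prints ["augsburg", "munich"]
}
```

Because the entries are sorted once up front and the nodes are only ever packed left to right, no node splitting or rebalancing is needed during the build.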

The tree is far from optimal, but it is a good starting point. Currently only doubles are supported. In the future I'd like to support integers, fixed size decimals and also strings.

If you have a look at the source code and cringe because of the coding style, feel free to submit pull requests (my current C++ skills are surely limited).

Next steps

Currently it's a fork of RocksDB, which surely isn't ideal. As I already mentioned in last year's FOSS4G talk about the R-tree in RocksDB (warning: autoplay), there are several possibilities:

  • Best (unlikely): Upstream merge
  • Good: Add-on without additional patches
  • Still OK: Be an easy to maintain fork
  • Worst case: Stay a fork

I hope to work together with the RocksDB folks to find a way to make such extensions easily possible with no (or minimal) code changes, perhaps through stable interfaces or classes that can easily be overridden.

Categories: en, RocksDB, geo

By Volker Mische
