the blllog.

Introduction to Noise’s Node.js API

2017-12-21 11:40

In the previous blog post about Noise we imported data with the help of some already prepared scripts. This time it's an introduction to how to use Noise's Promise-based Node.js API directly yourself.

The dataset we use is not a single ready-to-use file, but one that consists of several files. The data is the "Realized Cost Savings and Avoidance" for US government agencies. I'm really excited that such data gets openly published as JSON. I wish Germany were that advanced in this regard. If you want to know more about the structure of the data, there's documentation about the JSON Schema; they even have an "OFCIO JSON User Guide for Realized Cost Savings" on how to produce the data out of Excel.

I've prepared a repository containing the final code and the data. But feel free to follow along with this tutorial yourself and just point the script at the data directory of that repository when running it.

Let’s start with the boilerplate code for reading in those files and parsing them as JSON. But first create a new package:

mkdir noise-cost-savings
cd noise-cost-savings
npm init --force

You can use --force here as you probably won't publish this package anyway. Put the boilerplate code below into a file called index.js. Please note that the code is kept as simple as possible; for a real-world application you'll surely want better error handling.

#!/usr/bin/env node
'use strict';

const fs = require('fs');
const path = require('path');

// The only command line argument is the directory where the data files are
const inputDir = process.argv[2];
console.log(`Loading data from ${inputDir}`);

fs.readdir(inputDir, (_err, files) => {
  files.forEach(file => {
    fs.readFile(path.join(inputDir, file), (_err, data) => {
      console.log(file);
      const json = JSON.parse(data);
      processFile(json);
    });
  });
});

const processFile = (data) => {
  // This is where our actual code goes
};

This code should already run. Check out my repository with the data into some directory first:

git clone https://github.com/vmx/blog-introduction-to-noises-nodejs-api

Now run the script from above as:

node index.js <path-to-directory-from-my-repo-mentioned-above>/data

Before we take a closer look at the data, let’s install the Noise module. Please note that you need to have Rust installed (easiest is probably through rustup) before you can install Noise.

npm install noise-search

This will take a while. So let's get back to the code in the meantime. Load the noise-search module by adding:

const noise = require('noise-search');

A Noise index needs to be opened and closed properly, else your script will hang and not terminate. Opening a new Noise index is easy. Just put this before reading the files:

const index = noise.open('costsavings', true);

This means: open an index called costsavings and create it if it doesn't exist yet (that's the boolean true). Closing the index is more difficult due to the asynchronous nature of the code: we can close it only after all the processing is done. Hence we wrap the fs.readFile(…) calls in Promises. The new code looks like this:

fs.readdir(inputDir, (_err, files) => {
  const promises = files.map(file => {
    return new Promise((resolve, reject) => {
      fs.readFile(path.join(inputDir, file), (err, data) => {
        if (err) {
          throw err;
        }
        console.log(file);
        const json = JSON.parse(data);
        processFile(json);
        resolve();
      });
    });
  });
  Promise.all(promises).then(() => {
    index.close();
    console.log('Done.');
  });
});

If you run the script now, it should print the file names as before and terminate with a Done.. A directory called costsavings got created when you ran the script; this is where the Noise index is stored.

Now let's have a look at the data files. If you open e.g. the cost savings file from the Department of Commerce (or the JSON Schema), you'll see that it has a single field called "strategies", which contains an array with all strategies. We are free to pre-process the data as much as we want before we insert it into Noise. So let's create a separate document for every strategy. Our processFile() function now looks like this:

const processFile = (data) => {
  data.strategies.forEach(async strategy => {
    // Use auto-generated Ids for the documents
    await index.add(strategy);
  });
};

Now all the strategies get inserted. Make sure you delete the index (the costsavings directory) if you re-run the script, else you'll end up with duplicated entries, as different Ids are generated on every run.

To query the index you could use the Noise indexserve script that I also used in the last blog post about Noise. Or we can just add a small query at the end of the script, after the loading is done. Our query function will do the query and print the results:

const queryNoise = async (query) => {
  const results = await index.query(query);
  for (const result of results) {
    console.log(result);
  }
};

There's not much to say, except that it's again a Promise-based API. Now hook up this function after the loading and before the index is closed. For that, replace the Promise.all(…) call with:

Promise.all(promises).then(async () => {
  await queryNoise('find {} return count()');
  index.close();
  console.log('Done.');
});

It's a really simple query; it just returns the number of documents that are in the index (644). After all this hard work, it's time for a more complicated query on this dataset to show that it was worth the effort. Let's return the total net savings of all agencies in 2017. Replace the query find {} return count() with:

find {fy2017: {netOrGross: == "Net"}} return sum(.fy2017.amount)

That’s $845m savings. Not bad at all!

You can learn more about the Noise Node.js API from the README at the corresponding repository. If you want to learn more about possible queries, have a look at the Noise Query Language reference.

Happy cost saving!

Categories: en, Noise, Node, JavaScript, Rust

Exploring data with Noise

2017-12-12 13:11

This is a quick introduction on how to explore some JSON data with Noise. We won't do any pre-processing, but just load the data into Noise and see what we can do with it. Sometimes the JSON you get needs some tweaking before further analysis makes sense. For example, you may want to rename fields, or numbers may be stored as strings. This exploration phase can be used to get a feeling for the data and for which parts might need some adjustments.
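As a tiny illustration of such tweaking (the field names here are made up for the example, not taken from the actual dataset): rename a field and turn a numeric string into an actual number before loading the document:

```javascript
// Hypothetical raw record: a numeric value stored as a string and an
// inconsistently capitalised field name.
const record = { Amount: '12.5', Agency: 'Commerce' };

// Pre-processed version: consistent lower-case field names and a real
// number, so that range queries and sums work as expected.
const cleaned = {
  amount: parseFloat(record.Amount),
  agency: record.Agency,
};
```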

Finding decent ready-to-use data that contains some nicely structured JSON was harder than I thought. Most datasets are either GeoJSON or CSV masqueraded as JSON. But I was lucky and found a JSON dump of the CVE database provided by CIRCL. So we'll dig into the CVE (Common Vulnerabilities and Exposures) database to find out more about all those security vulnerabilities.

Noise has a Node.js binding to get started easily. I won't dig into the API for now. Instead I've prepared two scripts: one to load the data from a file containing newline-separated JSON, and another one for serving the Noise index over HTTP, so that we can explore the data via curl.


As we use the Node.js binding for Noise, you need to have Node.js, npm and Rust (easiest is probably through rustup) installed.

I've created a repository with the two scripts mentioned above plus a subset of the CIRCL CVE dataset. Feel free to download the full dataset from the CIRCL Open Data page (1.2G unpacked) and load it into Noise. Please note that Noise isn't performance-optimised at all yet, so the import takes some time, as the hard work of all the indexing is done at insertion time.

git clone https://github.com/vmx/blog-exploring-data-with-noise
cd blog-exploring-data-with-noise
npm install

Now everything we need should be installed. Let's load the data into Noise and run a query to verify it's set up properly.

Loading the data and verifying the installation

Loading the data is as easy as:

npx dataload circl-cve.json

For every inserted record one dot will be printed.

To spin up the simple HTTP server, just run:

npx indexserve circl-cve

To verify it does actually respond to queries, try:

curl -X POST -d 'find {} return count()'

If all documents got inserted correctly, it should return the number of documents in the index.


Everything is set up properly; now it's time to actually explore the data.

Exploring the data

We don't have a clue yet what the data looks like, so let's start with looking at a single document:

curl -X POST -d 'find {} return . limit 1'
[{
  "Modified": "2017-01-02 17:59:00.147000",
  "Published": "2017-01-02 17:59:00.133000",
  "_id": "34de83b0d3c547c089635c3a8b4960f2",
  "cvss": null,
  "cwe": "Unknown",
  "id": "CVE-2017-5005",
  "last-modified": {
    "$date": 1483379940147
  },
  "references": […],
  "summary": "Stack-based buffer overflow in Quick Heal Internet Security and earlier, Total Security and earlier, and AntiVirus Pro and earlier on OS X allows remote attackers to execute arbitrary code via a crafted LC_UNIXTHREAD.cmdsize field in a Mach-O file that is mishandled during a Security Scan (aka Custom Scan) operation.",
  "vulnerable_configuration": [],
  "vulnerable_configuration_cpe_2_2": []
}]

The query above means: "Find all documents without restrictions and return their full contents. Limit it to a single result."

You don't always want to return all documents, but filter based on certain conditions. Let's start with the word match operator ~=. It matches documents which contain the given words in a specific field, in our case "summary". As "buffer overflow" is a common attack vector, let's search for all documents that contain it in the summary.

curl -X POST -d 'find {summary: ~= "buffer overflow"}'

That's quite a long list of random characters. Noise assigns an Id to every inserted document that doesn't contain an "_id" field. By default Noise returns the Ids of the matching documents, so no return value is equivalent to return ._id. Let's return the CVE numbers of the matching vulnerabilities instead. That field is called "id":

curl -X POST -d 'find {summary: ~= "buffer overflow"} return .id'

If you want to know how many there are, just append a return count() to the query:

curl -X POST -d 'find {summary: ~= "buffer overflow"} return count()'

Or we can of course return the full documents to see if there are further interesting things to look at:

curl -X POST -d 'find {summary: ~= "buffer overflow"} return .'

I won't post the output here; it's way too much. If you scroll through the output, you'll see that some documents contain a field named "capec", which is probably about the Common Attack Pattern Enumeration and Classification. Let's have a closer look at one of those, e.g. from "CVE-2015-8388":

curl -X POST -d 'find {id: == "CVE-2015-8388"} return .capec'
[[
  {
    "id": "15",
    "name": "Command Delimiters",
    "prerequisites": …,
    "related_weakness": […],
    "solutions": …,
    "summary": …
  },
  …
]]

This time we've used the exact match operator ==. As the CVEs have a unique Id, it only returned a single document. It's again a lot of data; we might only care about the CAPEC names, so let's return those:

curl -X POST -d 'find {id: == "CVE-2015-8388"} return .capec[].name'
[[
  "Command Delimiters",
  "Flash Parameter Injection",
  "Argument Injection",
  "Using Slashes in Alternate Encoding"
]]

Note that it is an array of an array. The reason is that in this case we only return the CAPEC names of a single document, but our filter condition could of course match more documents, like the word match operator did when we were searching for "buffer overflow".

Let's find all CVEs that have a CAPEC named "Command Delimiters".

curl -X POST -d 'find {capec: [{name: == "Command Delimiters"}]} return .id'

The CAPEC data also contains references to related weaknesses as we’ve seen before. Let’s return the related_weakness of all CVEs that have the CAPEC name “Command Delimiters”.

curl -X POST -d 'find {capec: [{name: == "Command Delimiters"}]} return {cve: .id, related: .capec[].related_weakness}'
[{
  "cve": "CVE-2015-8389",
  "related": […]
},
{
  "cve": "CVE-2015-8388",
  "related": […]
}]

That's not really what we were after: this returns the related weaknesses of all CAPECs, and not just of the one named "Command Delimiters". The solution is a so-called bind variable. You can store an array element that matches a condition in a variable, which can then be re-used in the return value.

Just prefix the array condition with a variable name separated by two colons:

find {capec: commdelim::[{name: == "Command Delimiters"}]}

And use it in the return value like any other path:

return {cve: .id, related: commdelim.related_weakness}

So the full query is:

curl -X POST -d 'find {capec: commdelim::[{name: == "Command Delimiters"}]} return {cve: .id, related: commdelim.related_weakness}'
[{
  "cve": "CVE-2015-8389",
  "related": […]
},
{
  "cve": "CVE-2015-8388",
  "related": […]
}]

The result isn't that exciting, as it's the same related weaknesses for all CVEs, but of course they could be completely arbitrary. There's no limitation on the schema.

So far we haven't done any range queries yet. So let's have a look at all CVEs that were last modified on December 28th 2016 and have a "High" severity rating according to the Common Vulnerability Scoring System. First we need to determine the correct timestamps:

date --utc --date="2016-12-28" "+%s"
date --utc --date="2016-12-29" "+%s"

Please note that the "last-modified" field has timestamps with 13 characters (ours have 10), which means that they are in milliseconds, so we just append three zeros and we're good. The severity rating is stored in the field "cvss"; "High" severity means a value from 7.0–8.9. We need to put the field name last-modified in quotes as it contains a dash (just as you'd do it in JavaScript). The final query is:

curl -X POST -d 'find {"last-modified": {$date: >= 1482883200000, $date: < 1482969600000}, cvss: >= 7.0, cvss: <= 8.9} return .id'
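If you don't have the date utility at hand, the same millisecond timestamps can be computed in Node.js as a cross-check of the values used above:

```javascript
// Millisecond timestamps for the range [2016-12-28, 2016-12-29) in UTC,
// matching the `date` commands from above with three zeros appended.
const from = Date.parse('2016-12-28T00:00:00Z');
const to = Date.parse('2016-12-29T00:00:00Z');
console.log(from, to);  // 1482883200000 1482969600000
```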

This was an introduction into basic querying of Noise. If you want to know about further capabilities you can have a look at the Noise Query Language reference or stay tuned for further blog posts.

Happy exploration!

Categories: en, Noise, Node, JavaScript, Rust

Printing panics in Rust

2017-12-05 16:19

This blog post is not about dealing with normal runtime errors; you should really use the Result type for that. It's about the case where some component might panic, but that shouldn't bring the whole system to a halt.

I was debugging some issue in the Node.js binding for Noise. It uses the noise_search crate, which might panic if there's an unrecoverable error. The Node.js binding should of course not crash, but handle such a case in a more graceful way. Hence it catches the panics.

The existing code was only printing that there was some panic, but it didn’t contain the actual cause. I wanted to improve that.

I thought it would be easy and I could just print the debug version of the panic. So I changed the println!() to:

println!("panic happened: {:?}", result)

But that resulted only in a:

panic happened: Err(Any)

Which isn't really that meaningful either. In the documentation about catch_unwind I read:

…and will return Err(cause) if the closure panics. The cause returned is the object with which panic was originally invoked.

I didn't really understand what this meant. Is the object that invokes the panic the function where the panic happens? I wanted the text I had put into the panic!() call.

Thanks to rkruppe on IRC I learnt that panic!() can take any object, not just strings. Now the documentation made sense. He also mentioned that I can downcast the Any if I know the type. As I only ever use strings for panics, that was easy:

if let Err(panic) = result {
    match panic.downcast::<String>() {
        Ok(panic_msg) => {
            println!("panic happened: {}", panic_msg);
        },
        Err(_) => {
            println!("panic happened: unknown type.");
        }
    }
}
If you want to play a bit around with it, I’ve created a minimal example for the Rust Playground. Happy panicking!

Categories: en, Noise, Rust

Possible future direction for Noise

2017-10-06 16:42

I've applied for a grant from the Prototypefund to get some funding for Noise. It was a great opportunity to put some thought into which direction I might go with Noise. I've already posted my application in German, but I figured it might also be interesting for a bigger audience. Hence here's the translated version of it.

On which open source projects have you worked before?

What's the relation to the main focus of the third round?

Note: The third round is about diversity.

Noise enables people that aren't computer experts to do data analysis. In my experience such analysis so far has been the privilege of a small group of people – developers – that know how to deal with raw data. Shouldn't the analysis of data be opened to a broader community? For example to people that have basic coding skills, but that don't have a deeper understanding how databases work, or how to administrate them. For those it should be easily possible to put the data into the environment they know and to get immediately started with the analysis.

Which social issues do you want to fix with your project?

Thanks to the open data movement, a democratisation of the data world is happening. This has huge potential for freer formation of opinions and more self-determination. Statements and facts can get reproduced and verified. This potential must be exploited more broadly. Having the data available is not enough; the challenge is creating software solutions that make such data analysis more accessible.

How do you want to implement your project?

Noise is a library written in Rust for searching and analysing JSON data. There's already a first working version. On the lowest level it's using Facebook's key-value store RocksDB, which was modified to support spatial queries.

There will be a C-API to integrate with other programming/scripting languages. Then it would also be possible to use it as a backend/driver for projects like GDAL or R. Integrating with programming/scripting languages doesn't stop with the API. Most languages have a full ecosystem including a package manager. Therefore it's important that Noise can be installed through those native mechanisms. This lowers the bar to getting started. It already works for Node.js via "npm install noise-search".

Which similar existing solutions are there and how is your project better?

Apache Lucene is a library for full text search. As it's pretty low-level it mostly isn't used directly, but together with Elasticsearch/Apache Solr. Noise is on a higher level than Apache Lucene and works with JSON. The processing/analysis is done with a simple query language.

Who is the target audience and how will your tool get a hold of them?

The target audience are people with basic programming knowledge. This could be scientists that want to do analysis for their empiric studies. Or it could be citizens from the civil society that want to do some fact-finding. With the integration into several programming/scripting languages, Noise is just another dependency/library and can easily be found and installed with the corresponding package manager.

Have you already worked on this idea? If yes, describe the current state and the future advances

The first version already supports basic full text search, and it's also possible to query for numeric ranges and run spatial queries on geodata (GeoJSON). The next steps are making the system more robust and adding additional interfaces. There could e.g. be a Python API in addition to the already existing Node.js one. Also there should be small projects doing some analysis to demonstrate the possibilities of Noise. Those can then be documented as tutorials to lower the bar for getting started even further.

Do a quick sketch of the most important milestones that you want to achieve during the period of funding

Note: The period of funding is 6 months.

  • C-API: Change the current Node.js API, which is using Rust directly, to a clean C-API
  • Python API: A deep integration like the Node.js one, to get an easy installation through the package manager
  • More examples/documentation: Do small demo projects which are documented as tutorials to make the concepts of Noise more accessible
  • Internal improvements: The tightly coupled query parser needs to be refactored, i.a. for better error messages
  • Benchmarks: Benchmarks should prevent regressions and make Noise comparable to other systems

Categories: en, Noise, funding

Introducing Noise

2017-09-19 11:00

I've been meaning to write this blog post for quite some time. It's my view on the new project I'm working on, called Noise. I've been working on it full-time together with Damien Katz for about a year now. Damien has already blogged a bit about the incarnation of Noise.

I can't recall when Damien first told me about the idea, but I surely remember one meeting we had at Couchbase, where plenty of developers were packed in a small room in the Couchbase Mountain View office. Damien was presenting his idea of how flexible JSON indexing should work. It was based on an idea that came up a long time ago at IBM (see Damien's blog post for more information).

Then the years passed without this project actually happening. I heard about it again when I was visiting Damien while I was in the Bay Area. He told me about his plan to actually do this for real. If I joined early, I would become a founder of the project. It wasn't a light-hearted decision, but I eventually decided to leave Couchbase to work full-time on Noise.

Originally Damien created a prototype in C++. But as I was really convinced that Rust is the future for systems programming and databases, I started to port it to Rust before I visited him in the US. Although Damien was skeptical at first, he at least wanted to give it a try and during my stay I convinced him that Rust is the way to go.

Damien did the hard parts on the core of Noise and the Node.js bindings. I mostly spent my time getting an R-tree working on top of RocksDB. It took several attempts, but I think I finally found a good solution. Currently it's a special-purpose implementation for Noise, but it could easily be made more generic, or adapted to other specific use cases. If you have such needs, please let me know. At this year's Global FOSS4G conference I presented Noise and its spatial capabilities to a wider audience. I'm happy with the feedback I got. People especially seem to enjoy the query language we came up with.

So now we have a working version which does indexing and has many query features. You can try out Noise online. There's also basic geospatial bounding box query support, which I'll blog more about once I've cleaned up the coded-in-a-rush-for-a-conference mess and have merged it into the master branch.

There are exciting times ahead as now it's time to get some funding for the project. Damien and I don't want to do the venture capital based startup kind of thing, but rather try to find funding through other channels. This will also define the next steps. Noise is a library so it can be the basis for a scaled up distributed system, and/or to scale down into a nice small analytics system that you can run on your local hardware when you don't have access to the cloud.

So in case you read this, tried it out and think that this is exactly what you've been looking for, please tell me about your use case and perhaps you even want to help funding this project.

Categories: en, Noise, RocksDB, Rust, geo

Distributed systems class with Aphyr

2017-09-14 14:15

I was in luck. Nuno Job organized a distributed systems class for his YLD crowd and invited friends over to join. The class was taught by Kyle Kingsbury (Aphyr), who is probably best known for Jepsen. I obviously couldn't say no to such an offer.

I really enjoyed it. Although I've been working on a distributed system for most of my career, it was good to get a general overview of distributed systems from the ground up. There is obviously more to it than just databases or what you learn at university.

We touched on many topics, and real-world stories were told. Kyle did a great job, leading seamlessly from one topic to the next. He really knows what he's talking about, is funny and makes it an overall great experience. You can find the contents of the class on Github, but you really want to have Kyle teach it to you.

New things I've learned about

So there was something to learn for everyone. Here are a few things that I need to dig into more deeply:

Categories: en

FOSS4G 2017

2017-09-01 22:56

The Global FOSS4G 2017 conference was a great experience, as it is every year. It was great to meet all those people I've known for years, but also those I had so far only met virtually.

The talks

The program committee did a great job with the selection. Especially since there were so many to select from. Here are the most memorable talks:

  • “Optimizing Spatiotemporal Analysis Using Multidimensional Indexing with GeoWave” by Richard Fecher: The talk also touched on the technical details of how they solve building a multidimensional index on top of distributed key-value stores. Currently they support Apache Accumulo, Apache HBase and Google’s Bigtable, but in theory they could also support any distributed key-value store, hence also [Apache CouchDB](http://couchdb.apache.org/) or Couchbase. I really enjoyed the technical depth and that it is based on solid research and evaluations.

  • “DIY mapping with drones and open source in a humanitarian context” by Dan Joseph: It was really nice to see that not everyone is using quadcopters for drone mapping, but that there are also fixed-wing drones (they look like planes). The talk gave good details about failures and successes. I wish them good luck with future models and the mapping itself.

  • “GPUs & Their Role in Geovisualization” by Todd Mostak: GPUs are now so powerful that you can do your multidimensional queries on points by just doing table scans. That’s quite impressive. It’s also good to see that the core of MapD got open sourced under the Apache License 2.0.

Sadly I missed two talks I wanted to see. One was [Steven Ottens](https://twitter.com/stvno_) speaking about “D3.js in postgres with plv8 V8/JavaScript”. It sounds daunting at first, but if you think about the numerous JavaScript libraries for geo processing that are out there, it makes sense for rapid prototyping.

The other one was Steven Feldman’s talk on “Fake Maps”. I always enjoy Steven’s talks, as he digs into maps as much as I’d love to, but sadly don’t take the time to. He said that once the recording of the talk is out, I should grab a beer and enjoy watching it. I’m looking forward to doing so.

My own talk went really well. Originally I thought being in the last slot on Friday, the last conference day, was bad, as people don’t have much time to approach you after the talk. But in the end it was actually good, as I had several days to promote it to people who are interested. I loved that I was in the smallest room of the venue, hence it was packed. I’ll write more about the talk once I’ve cleaned up the code and pushed it to the master branch, so that you can all play with the spatial features yourself.

The keynotes

This year there were 5 keynotes, which I think is a good number. You always need to keep in mind that, depending on their length, you might kick out 10–20 normal speaker slots. I enjoyed all of them, although in my opinion for most of them 30 mins (instead of 45 mins) would’ve been sufficient. But I have to admit that I could probably watch Paul Ramsey talk for hours and it would still be great.

Of course one keynote, the one from Richard Stallman, stood out. It surely led to lively discussions within the community, which is really a great thing. I share the opinion of Jeff McKenna in that I really respect what Stallman did and is doing and how much he is into it. Though it became clear to me that I am an Open Source developer who cares about openness and transparency.

The venue

The venue was a typical conference center, which had the benefit that the rooms were close together. This made switching rooms, even within slots, easily possible.

One thing I didn’t like was the air conditioning. Some rooms were cooled down way too much. Did anyone measure? I know it’s a cultural thing and not the fault of the organizers. Though I wonder how much energy and money could’ve been saved if the temperature had only been lowered to an acceptable level.

Sometimes there are discussions about the location of the OSGeo booth within the exhibition area. I think this year it was in a good spot. It wasn’t at the most prestigious place, that’s for Diamond sponsors, but at a spot where people actually gather/hang out, which is a way better fit in my opinion.

The social events

The social events were nice, and I was happy that I was able to bring a well-known and well-liked former community member to the icebreaker event. The icebreaker reminded me a bit of last year’s one, where it was possible to bring along anyone who wanted to go. I think the attendees had some vouchers, but I can’t really recall the details. Anyway, I think it’s a good idea to have one social event where you can bring in people who are in the area, but don’t attend the conference.

The code sprint

The code sprint was hosted at the District Hall, which is an innovation/startup/co-working place. We had the whole space, which was really nice. The different tribes (Java, MapServer, Postgres and Fancy Shit) assembled at different spots and put up signs, so it was easy to find your way to the right group.


I also need to mention that the day before the FOSS4G there was the JS.Geo at the same place as the code sprint. It was a really nice event, and if I ever organize an English single-track geo conference, I’ll get Brian Timoney as a moderator. He was so entertaining and really contributed to the great vibe this conference had.


This year there wasn’t a printed program brochure. It was all just available online at the (certainly cool) https://foss4g.guide/ or as an app. On my FirefoxOS phone I was using the website. I think it could’ve been easier to navigate, but it was OK and I didn’t really miss the brochure. The web-based guide was OK when you were on your phone and on-site, to see which talks were up next. I don’t think it worked well if you tried to do some ahead-of-time planning.

The FOSS4G t-shirts look great, but I’m a bit sad that they were grey (a nice one though) while the Local Team had t-shirts in my favourite orange color.

Notes for future years

It might really make sense to not produce a printed program brochure anymore, as probably all attendees have a smartphone anyway (though this needs to be checked by the Local Team depending on the area). If you decide to go Web-only, you should make sure it works offline, and perhaps spend the time you would’ve spent on the printed one on the usability of the web one instead.

Categories: en, conference, geo

An R-tree implementation for RocksDB

2017-02-14 16:54

It's long been my plan to implement an R-tree on top of RocksDB. Now there is a first version of it.

Getting started

Check out the source code from my RocksDB rtree-table fork on Github, then build RocksDB and the R-tree example:

git clone https://github.com/vmx/rocksdb.git
cd rocksdb
make static_lib
cd examples
make rtree_example

If you run the example it should output augsburg:

$ ./rtree_example
augsburg

For more information about how to use the R-tree, see the Readme file of the project.


The nice thing about LSM-trees is that the index data structures can be bulk loaded. For now my R-tree just does a simple bottom-up build with a fixed node size (default is 4KiB). The data is pre-sorted by the low value of the first dimension. This means that the data has a total order, hence the results are also sorted by the first dimension. The idea is based on the paper On Support of Ordering in Multidimensional Data Structures by Filip Křižka, Michal Krátký and Radim Bača.
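To illustrate the idea (this is just a sketch in Rust with made-up types, not the actual C++ implementation): sort the entries by the low value of the first dimension and pack them into fixed-size leaf nodes:

```rust
// An index entry: the bounding box of one item (two dimensions here).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    low: [f64; 2],
    high: [f64; 2],
}

// Bottom-up bulk loading sketch: pre-sort by the low value of the first
// dimension (giving the total order described above), then chunk the
// sorted entries into leaf nodes of a fixed size.
fn bulk_load_leaves(mut entries: Vec<Entry>, node_size: usize) -> Vec<Vec<Entry>> {
    entries.sort_by(|a, b| a.low[0].partial_cmp(&b.low[0]).unwrap());
    entries
        .chunks(node_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

In the real implementation the "nodes" are of course a fixed byte size rather than a fixed entry count, and inner levels are built on top of the leaves in the same bottom-up fashion.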

The tree is far from optimal, but it is a good starting point. Currently only doubles are supported. In the future I'd like to support integers, fixed size decimals and also strings.

If you have a look at the source code and cringe because of the coding style, feel free to submit pull requests (my current C++ skills are surely limited).

Next steps

Currently it's a fork of RocksDB, which surely isn't ideal. As I already mentioned in last year's FOSS4G talk about the R-tree in RocksDB (warning: autoplay), there are several possibilities:

  • Best (unlikely): Upstream merge
  • Good: Add-on without additional patches
  • Still OK: Be an easy to maintain fork
  • Worst case: Stay a fork

I hope to work together with the RocksDB folks to find a way to make such extensions possible with no (or minimal) code changes, perhaps by having stable interfaces or classes that can easily be overridden.

Categories: en, RocksDB, geo

Goodbye Couchbase

2016-07-04 08:57

I'm no longer with Couchbase. That's a pretty significant change for me. I've been with Couchbase (Couchio, CouchOne) for over 6 years; it was on May 12th 2010 when I got my Couchio email address. I was even still studying back then. It was a great time, I learnt a lot and it was a pleasure to work with so many skilled people.

Though there are exciting times ahead. I'm working with Damien Katz on a fancy new project which includes the technologies I really like to work with. It's built with Rust and the storage back-end is RocksDB. More on that as we go.

Surely some may wonder what will happen to GeoCouch. I don't know about its future when it comes to Couchbase, that's beyond my control. Though I do control GeoCouch in regards to Apache CouchDB. I surely want to get it working with the upcoming Apache CouchDB 2.0 release. GeoCouch might look pretty different from what it is now. I'd like to base the back-end on RocksDB. That's the reason why I'm implementing an R-tree on top of RocksDB. This also means that GeoCouch is again a free-time project of mine, though that isn't necessarily a bad thing.

Categories: en, Couchbase, GeoCouch

Berlinale 2015

2015-02-18 00:38

The great Berlinale International Film Festival 2015 is over. Time for a short wrap-up of the films I've watched. I'll include the one-sentence review that I always tweeted right after each film.

This overview is sorted by ranking, so you can easily spot the must watch films.

Must watch films

The films that were outstanding got a rating of 5/5. Those are the ones that I feel everyone should watch and will probably enjoy.

Feature films


Berlinale program page

Unforeseeable, amusing trip that paints a diverse picture of today's Iran

The Golden Bear is really well deserved. Given that Jafar Panahi is not allowed to work as a film maker, this movie is even more astonishing.


Berlinale program page

Classic tragic suspenseful love story

Although the film is originally from 1925, it shows that stories about love are timeless. The top rating is mostly due to the atmosphere the live music created. I'm not sure if it is that impressive when watched with audio through speakers.

Vergine giurata (Sworn Virgin)

Berlinale program page

Touching film about family and finding one's gender identity

This drama doesn't need many words, it evolves a lot through the pictures. It's one of those films that move slowly without being lengthy.

Mariposa (Butterfly)

Berlinale program page

Uncommon, clever and complex film about love and relationships

It took me a while to really understand what's going on, but it's exactly that cluelessness that makes the film so great.


Tell Spring Not to Come This Year

Berlinale program page

Extremely disturbing/moving documentary about Afghan troops after ISAF

There's not much news about how things are in Afghanistan with the ISAF troops leaving, it got quite silent. One of the film makers served in Afghanistan with the British troops himself, which made such a great inside view on how the war against the Taliban is still ongoing possible. The Panorama Audience Award is well deserved.

Danielův Svět (Daniel’s World)

Berlinale program page

Stunning must-watch documentary about coming-out as pedophile

This documentary touches a topic I've never thought of before. How can you live with being a pedophile? I really admire the protagonist. He speaks openly about his life, desires and problems. It shows well that pedophiles shouldn't be lumped together with child abusers. The vast majority of child abusers are not even pedophiles; children are often just an easier target for them.

Great films

These are the films that I really liked (a rating of 4/5). Though I guess some of them depend a lot on my personal taste.

Feature films

Nadie quiere la noche (Nobody Wants the Night)

Berlinale program page

Tragic criticism of humanity about love and friendship

It was the first film I watched at the Berlinale, I didn't know what to expect and how many great movies I was going to see. Looking back I think it's only a 3/5.

Nasty Baby

Berlinale program page

Starts slow, gets disturbing: a film about different moral concepts and anger management

It sometimes has a great use of pop music.

Hedi Schneider steckt fest (Hedi Schneider is Stuck)

Berlinale program page

Anxiety states and the effect on the family

I mostly decided to watch this one because one of my favourite films, Farland, stars the same lead actress, Laura Tonke. I wasn't disappointed. It was also interesting to hear that a lot of scenes from the original script were left out during the editing process. And this is really what the film feels like: it's reduced to the story it wants to tell.

El Club (The Club)

Berlinale program page

Weird moral concepts under the cover of the Catholic Church

This is really a dark and sometimes bizarre movie. The Silver Bear is well deserved.


Berlinale program page

Gripping film about Martin Luther King Jr's fight for the voting rights of Black Americans

I had the luck to be at the gala event of the film. It luckily doesn't feel as lengthy as the running time suggests.

Knight of Cups

Berlinale program page

In search of life fulfilment

I'm sure many people dislike the movie. If you don't know the feeling of being empty inside although you have everything, and don't know how it feels to search for something without finding it, or even knowing what exactly you're looking for, then don't watch it.

Virgin Mountain

Berlinale program page

A film full of emotions about unconditional love

I needed to watch this movie as I generally enjoy films from Iceland, and additionally the director Dagur Kári also directed one of my favourite films, Nói albínói. This one didn't touch me as much as Nói albínói did back in the day, but it's still a pleasant film to watch.

De ce eu (Why me?)

Berlinale program page

Solid film portraying a young prosecutor fighting the system in Romania

It wasn't that easy to follow the film, given that it was a complex matter and I had to rely on subtitles. I rated the film higher after I learned that it is heavily based on a real person (even the affair is part of that).

Stories of Our Lives

Berlinale program page

Pleasant collection of episodes about homosexuality in Kenya

It's a collection of short films, all in the same style. Some are funny, some are tragic.

Ten no chasuke (Chasuke’s Journey)

Berlinale program page

Funny Japanese-style film about prevision

The idea of having someone writing our lives is a good one and leads to very funny situations. It reminded me a bit of Bruce Almighty and The Truman Show.

Madare ghalb atomi (Atom Heart Mother)

Berlinale program page

Getting pulled into something without much choice

I changed that tweet three times: first the rating (up to 4/5), then the contents. Thanks to someone nice sitting next to me in the next film, I was able to hear what others thought about it, as it wasn't really clear to me. During the next film I then suddenly felt like I understood what it was about and how great it was.

In case you also don't feel like getting the point, feel free to get in touch with me and we can discuss it.


Berlinale program page

"Anger without opponent" [quotation of the director]

It's not a film about a shiny world where everything is fine. It shows well that things go wrong and that something needs to be done in our society. It manages to be drastic without getting unrealistic.


Flotel Europa

Berlinale program page

Interesting retrospective of Yugoslavian refugees from the view of a child

It was interesting to see a film whose footage was shot many years ago and is solely based on VHS tapes.

Me’kivun ha’yaar (Out of the Forest)

Berlinale program page

Documentary showing personal perspectives on the WWII mass murder in Ponary

If war crimes happen, people often pretend they didn't know, don't remember, or just don't talk about it. This documentary shows that it can also be a relief for some to finally be able to speak about what happened.

Average films

These are films that I enjoyed but are not really special (rating of 3/5).

Feature films

Mr. Holmes

Berlinale program page

Solid film about hindsight making a change even at old age

I consider this a harmless mainstream movie that is good to watch.

Seeds of time

Berlinale program page

Interesting documentary about Cary Fowler's mission to preserve biodiversity

The topic it covers is really interesting. I didn't rate it higher as the documentary itself isn't done that well. Sometimes I felt like I'd heard the same information before; it could have been condensed. It was also not really pushing forward a story as other documentaries did. I wonder if a 40-minute version could say the same.

Petting Zoo

Berlinale program page

Beautiful pictures about a girl at the end of high school dealing with pregnancy

It's a very atmospheric film. I enjoyed watching it. Though it's sadly nothing special.

Elser (13 Minutes)

Berlinale program page

Being fed up with the system and the urge to take action on your own

It's a solid mainstream movie set in Nazi Germany. It's interesting, well told and well played. It reminded me of a TV production you'd watch with your family on a Friday evening on public TV.

Cha và con và (Big Father, Small Father and Other Stories)

Berlinale program page

Living as a young gay man in today's Vietnam

I really liked the pictures, but the story is a bit shallow.

Superwelt (Superworld)

Berlinale program page

Mysterious film about self-doubts and self-discovery

It almost got a rating of 4/5, but somehow the overall movie didn't cut it for me.


Freie Zeiten (After Work)

Berlinale program page

Entertaining documentary with interesting views on recreational activities

It was OK to watch, with some funny situations, though overall just too average.

Sergio Herman, Fucking Perfect

Berlinale program page

A workaholic trying to get out of it

Well made documentary, but sadly too much of a portrait of Sergio Herman. I found documentaries about people that change or try to change the world more interesting.

Cobain: Montage of Heck

Berlinale program page

Solid, but average documentary about Kurt Cobain

I don't know what I expected, but somehow I expected more of the documentary. It was interesting and well made, but just wasn't as good as other documentaries I've seen.

Other films

The following are films with no rating or a rating of 2/5 or less. All of them were feature films.


Berlinale program page

Coping with sorrow

I might have been too tired when I watched that, but somehow I couldn't connect to it, hence it only got a 2/5.

Journal d'une femme de chambre (Diary of a Chambermaid)

Berlinale program page

Waiting for the grand finale to no avail

I was always waiting for the twist or the scene that adds another perspective, but it just didn't happen, hence it's only a rating of 1/5.

Pod electricheskimi oblakami (Under Electric Clouds)

Berlinale program page

Desolate future

The film is divided into seven episodes spanning over two hours. After each episode some people left the cinema. I also felt like one episode would've been enough. Rating: 1/5.


Berlinale program page

Historic narration of the situation in Walachia in the 19th century

As my interest is not in historic movies about Walachia, it's a 1/5.

Berlinale Shorts I

No notable short film

The shorts were really disappointing. I would've expected more of a short movie selection at the Berlinale. The Filmtage Augsburg do a better job on the selection.


Berlinale program page


I don't really understand what the intention of this film was. It wasn't much clearer even after the Q&A session. It's supposed to be about the tension in the main character's relationships between his girlfriend and his mother. I also didn't understand the end. As it had nice pictures and someone might get the point, I decided not to give a rating.

Black President

Berlinale program page


I accidentally bought a ticket for this one. I thought it would be short films, but it was a movie in the Forum Expanded section. It seemed like a well made documentary, but I felt I knew too little about art and the culture in South Africa, hence I refrained from rating it.

Categories: en, film, festival

By Volker Mische

Powered by Kukkaisvoima version 7