vmx

the blllog.

Printing panics in Rust

2017-12-05 16:19

This blog post is not about dealing with normal runtime errors; you should really use the Result type for that. This is about the case where some component might panic, but that shouldn’t bring the whole system to a halt.

I was debugging an issue in the Node.js binding for Noise. It uses the noise_search crate, which might panic if there’s an unrecoverable error. The Node.js binding should of course not crash, but handle such a panic more gracefully. Hence it catches the panics.

The existing code was only printing that there was some panic, but it didn’t contain the actual cause. I wanted to improve that.

I thought it would be easy and I could just print the debug version of the panic. So I changed the println!() to:

println!("panic happened: {:?}", result);

But that resulted only in a:

panic happened: Err(Any)

Which isn’t really that meaningful either. In the documentation for catch_unwind I read:

…and will return Err(cause) if the closure panics. The cause returned is the object with which panic was originally invoked.

I didn’t really understand what this meant. Is the object that invokes the panic the function where the panic happens? I wanted the text I was putting into the panic!() call.

Thanks to rkruppe on IRC I learnt that panic!() can take any object, not just strings. Now the documentation made sense. He also mentioned that I can downcast the Any if I know the type. As I only ever use strings for panics, that was easy:

if let Err(panic) = result {
    match panic.downcast::<String>() {
        Ok(panic_msg) => {
            println!("panic happened: {}", panic_msg);
        }
        Err(_) => {
            println!("panic happened: unknown type.");
        }
    }
}
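
A slightly more complete sketch, not taken from the Noise bindings: a panic!() called with a plain string literal carries a &'static str rather than a String, so it can help to try both downcasts before giving up. The names panic_message and run_guarded are made up for this illustration.

```rust
use std::any::Any;
use std::panic::{self, UnwindSafe};

/// Turn a panic payload into a printable message.
fn panic_message(payload: Box<dyn Any + Send>) -> String {
    // panic!("...: {}", x) with format arguments carries a String ...
    match payload.downcast::<String>() {
        Ok(msg) => *msg,
        // ... while panic!("literal") carries a &'static str.
        Err(payload) => match payload.downcast::<&'static str>() {
            Ok(msg) => (*msg).to_string(),
            // Anything else (e.g. from panic_any) stays opaque.
            Err(_) => "unknown type".to_string(),
        },
    }
}

/// Hypothetical helper: run a closure, catch a panic and
/// return its message as the error value.
fn run_guarded<F, R>(f: F) -> Result<R, String>
where
    F: FnOnce() -> R + UnwindSafe,
{
    panic::catch_unwind(f).map_err(panic_message)
}
```

A caller can then print the Err value of run_guarded directly. Note that the default panic hook still writes its own message to stderr before the panic is caught.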

If you want to play around with it a bit, I’ve created a minimal example for the Rust Playground. Happy panicking!

Categories: en, Noise, Rust

Possible future direction for Noise

2017-10-06 16:42

I've applied for a grant from the Prototypefund to get some funding for Noise. It was a great opportunity to think about which direction I might take Noise in. I've already posted my application in German, but I figured it might also be interesting to a wider audience. Hence here's the translated version of it.

Which open source projects have you worked on before?

How does your project relate to the main focus of the third round?

Note: The third round is about diversity.

Noise enables people who aren't computer experts to do data analysis. In my experience such analysis has so far been the privilege of a small group of people, developers, who know how to deal with raw data. Shouldn't the analysis of data be opened up to a broader community? For example to people who have basic coding skills, but who don't have a deeper understanding of how databases work or how to administer them. For them it should be easy to put the data into the environment they know and to get started with the analysis immediately.

Which social issues do you want to fix with your project?

Thanks to the open data movement, a democratisation of the data world is happening. This has huge potential for freer formation of opinions and more self-determination. Statements and facts can be reproduced and verified. This potential must be tapped more fully. Having the data available is not enough. The challenge is to create software solutions that make such data analysis more accessible.

How do you want to implement your project?

Noise is a library written in Rust for searching and analysing JSON data. There's already a first working version. On the lowest level it's using Facebook's key-value store RocksDB, which was modified to support spatial queries.

There will be a C-API to integrate with other programming/scripting languages. Then it would also be possible to use it as a backend/driver for projects like GDAL or R. Integrating with programming/scripting languages doesn't stop with the API. Most languages have a full ecosystem including a package manager. Therefore it's important that Noise can be installed through those native mechanisms. This lowers the bar to get started. It already works for Node.js via “npm install noise-search”.

Which similar existing solutions are there and how is your project better?

Apache Lucene is a library for full text search. As it's pretty low-level it mostly isn't used directly, but together with Elasticsearch/Apache Solr. Noise is on a higher level than Apache Lucene and works with JSON. The processing/analysis is done with a simple query language.

Who is the target audience and how will your tool reach them?

The target audience is people with basic programming knowledge. This could be scientists who want to do analysis for their empirical studies. Or it could be members of civil society who want to do some fact-finding. With the integration into several programming/scripting languages, Noise is just another dependency/library and can easily be found and installed with the corresponding package manager.

Have you already worked on this idea? If so, describe the current state and the planned advances

The first version already supports basic full text search, and it's also possible to run numeric range queries and spatial queries on geodata (GeoJSON). The next steps are to make the system more robust and to add further interfaces. There could e.g. be a Python API in addition to the already existing Node.js one. There should also be small projects doing some analysis to demonstrate the possibilities of Noise. Those can then be documented as tutorials to lower the bar for getting started even further.

Briefly sketch the most important milestones that you want to achieve during the period of funding

Note: The period of funding is 6 months.

  • C-API: Change the current Node.js API, which uses Rust directly, to a clean C-API
  • Python API: Deep integration like the Node.js one, to allow easy installation through the package manager
  • More examples/documentation: Do small demo projects which are documented as tutorials to make the concepts of Noise more accessible
  • Internal improvements: The tightly coupled query parser needs to be refactored, among other things for better error messages
  • Benchmarks: Benchmarks should prevent regressions and make it possible to compare Noise with other systems

Categories: en, Noise, funding

Application to the Prototypefund

2017-10-02 16:49

Update 2017-10-06: There's also an English translation of this blog post now.

I have applied for the third round of the Prototypefund with Noise (many thanks to everyone who proofread). After Jon published his application with Transforlabs, I want to follow his example and put mine online as well. I'm also a fan of transparency and of course curious about what others have written. And on top of that, it gives an idea of where the journey with Noise could go.

Which open source projects have you worked on so far?

How does your project relate to the main focus of the third round?

Noise gives people who aren't computer experts the ability to carry out data analysis themselves. In my experience this has so far mostly been reserved for a small group, developers, who know how to deal with raw data. Shouldn't the analysis of these data treasures be opened up to a larger group of users? For example, people who have acquired basic programming skills but lack deeper knowledge of how databases work or how to administer them. For all of them it should be possible to simply load the data within their familiar environment and then start with the analysis right away.

Which societal problem do you want to solve with your project?

The open data movement is bringing about a democratisation of the data world. This offers great potential for freer formation of opinions and more self-determination. Statements and facts can be directly reproduced and verified. However, this potential still needs to be tapped better. The mere availability of the data is not enough. A central challenge is to create software solutions that make analysing the data more accessible.

How do you want to implement your project technically?

Noise is a library written in Rust for searching and analysing data in JSON format. There is already a first working version. The lowest level of the system is Facebook's key-value store RocksDB, which was adapted to support spatial queries. A C-API forms the basis for integration with other programming/scripting languages. With it, it would also be conceivable to use Noise as a backend/driver for projects like GDAL or R. The interplay with programming/scripting languages doesn't stop at the API, though. By now most languages have a whole ecosystem including a package manager. It is therefore important that Noise can be installed through the native installation mechanisms of the respective environment. This also makes it easier to get started. For Node.js this already works via "npm install noise-search".

Which similar solutions already exist, and what will your project do differently or better?

Apache Lucene is a library for full text search. As it is very low-level, it is mostly not used directly, but in combination with Elasticsearch/Apache Solr. In contrast to Apache Lucene, Noise sits on a higher level and works with data in JSON format, which is processed/analysed with the help of a simple query language.

Who is the target audience, and how is your tool supposed to reach them?

The target audience is people with basic programming knowledge. These can be scientists who do analyses for their empirical studies, but also members of civil society who want to take a closer look at some matter. Through the integration into various programming/scripting languages, Noise is just another dependency/library there and thus very easy to find and install via the respective package manager.

Have you already worked on the idea? If so, briefly describe the current state and explain what is new.

The first version already supports basic full text search; in addition, numeric range queries and spatial queries on geodata (GeoJSON) can be made. The next steps are to make the system more robust and to create further interfaces. The already existing Node.js API could, for example, be followed by one for Python. Small analysis projects are also to be carried out in order to demonstrate the capabilities of Noise. These can then be written up as tutorials and thereby make getting started easier still.

Briefly sketch the most important milestones you want to achieve during the funding period.

  • C-API: Convert the current Node.js API, which builds on Rust, into a clean C-API.
  • Python API: Deep integration like the Node.js API, to allow easy installation via the package manager.
  • More examples/documentation: Small example projects that are documented as tutorials to make the concepts of Noise more accessible.
  • Internal improvements: The query parser, which is very tightly coupled to the rest of the system, is to be untangled, among other things for better error messages.
  • Benchmarks: Benchmarks are meant to prevent regressions and offer a way to compare Noise with other systems.

Categories: de, Noise, funding

Introducing Noise

2017-09-19 11:00

I've been meaning to write this blog post for quite some time. It's my view on the new project I'm working on, called Noise. I've been working on it full-time together with Damien Katz for about a year now. Damien already blogged a bit about the incarnation of Noise.

I can't recall when Damien first told me about the idea, but I surely remember one meeting we had at Couchbase, where plenty of developers were packed into a small room in the Couchbase Mountain View office. Damien was presenting his idea on how flexible JSON indexing should work. It was based on an idea that came up a long time ago at IBM (see Damien's blog post for more information).

Then the years passed without this project actually happening. I heard about it again when I was visiting Damien while I was in the Bay Area. He told me about his plan to actually do this for real. If I joined early, I would become a founder of the project. It wasn't a light-hearted decision, but I eventually decided to leave Couchbase to work full-time on Noise.

Originally Damien created a prototype in C++. But as I was really convinced that Rust is the future for systems programming and databases, I started to port it to Rust before I visited him in the US. Although Damien was skeptical at first, he at least wanted to give it a try and during my stay I convinced him that Rust is the way to go.

Damien did the hard parts on the core of Noise and the Node.js bindings. I mostly spent my time getting an R-tree working on top of RocksDB. It took several attempts, but I think I finally found a good solution. Currently it's a special purpose implementation for Noise, but it could easily be made more generic, or adapted to other specific use cases. If you have such needs, please let me know. At this year's Global FOSS4G conference I presented Noise and its spatial capabilities to a wider audience. I'm happy with the feedback I got. People especially seem to enjoy the query language we came up with.

So now we have a working version which does indexing and has many query features. You can try out Noise online. There's also basic geospatial bounding box query support, which I'll blog more about once I've cleaned up the coded-in-a-rush-for-a-conference mess and merged it into the master branch.

There are exciting times ahead, as now it's time to get some funding for the project. Damien and I don't want to do the venture capital based startup kind of thing, but rather try to find funding through other channels. This will also define the next steps. Noise is a library, so it can be the basis for a scaled-up distributed system and/or be scaled down into a nice small analytics system that you can run on your local hardware when you don't have access to the cloud.

So in case you read this, tried it out and think that this is exactly what you've been looking for, please tell me about your use case, and perhaps you even want to help fund this project.

Categories: en, Noise, RocksDB, Rust, geo

By Volker Mische
