vmx

the blllog.

Possible future direction for Noise

2017-10-06 16:42

I've applied for a grant from the Prototypefund to get some funding for Noise. It was a great opportunity to think about which direction I might take Noise in. I've already posted my application in German, but I figured it might also be interesting for a bigger audience, so here's the translated version.

On which open source projects have you worked before?

What's the relation to the main focus of the third round?

Note: The third round is about diversity.

Noise enables people who aren't computer experts to do data analysis. In my experience, such analysis has so far been the privilege of a small group of people: developers who know how to deal with raw data. Shouldn't data analysis be opened up to a broader community? For example, to people who have basic coding skills but lack a deeper understanding of how databases work or how to administer them. They should be able to load the data into an environment they already know and get started with the analysis immediately.

Which social issues do you want to fix with your project?

Thanks to the open data movement, a democratisation of the data world is happening. This has huge potential for freer formation of opinion and more self-determination: statements and facts can be reproduced and verified. This potential must be tapped more broadly. Having the data available is not enough; the challenge is to create software that makes such data analysis more accessible.

How do you want to implement your project?

Noise is a library written in Rust for searching and analysing JSON data. A first working version already exists. At the lowest level it uses Facebook's key-value store RocksDB, which was modified to support spatial queries.
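To illustrate how a JSON library can sit on top of a plain key-value store, here is a minimal sketch of one common approach: flattening each document into (key path, value) pairs. It assumes a tiny hand-written `Json` enum instead of a real parser and uses `BTreeMap` as an in-memory stand-in for an ordered store like RocksDB. The `doc/field/index` key layout and the names `flatten` and `scan_prefix` are hypothetical and not Noise's actual on-disk format.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a parsed JSON value (a real program would use a JSON parser).
#[derive(Debug)]
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Recursively turns a document into (key path, leaf value) pairs.
/// The "doc/field/index" key layout is a hypothetical illustration only.
fn flatten(path: &str, value: &Json, out: &mut BTreeMap<String, String>) {
    match value {
        Json::Object(fields) => {
            for (name, child) in fields {
                flatten(&format!("{}/{}", path, name), child, out);
            }
        }
        Json::Array(items) => {
            for (i, child) in items.iter().enumerate() {
                flatten(&format!("{}/{}", path, i), child, out);
            }
        }
        Json::Null => {
            out.insert(path.to_string(), "null".to_string());
        }
        Json::Bool(b) => {
            out.insert(path.to_string(), b.to_string());
        }
        Json::Number(n) => {
            out.insert(path.to_string(), n.to_string());
        }
        Json::Str(s) => {
            out.insert(path.to_string(), s.clone());
        }
    }
}

/// Keys are kept sorted, so everything under one path prefix is contiguous
/// and can be read with a single range scan.
fn scan_prefix<'a>(
    store: &'a BTreeMap<String, String>,
    prefix: &'a str,
) -> impl Iterator<Item = (&'a String, &'a String)> + 'a {
    store
        .range(prefix.to_string()..)
        .take_while(move |(key, _)| key.starts_with(prefix))
}

fn main() {
    let doc = Json::Object(vec![
        ("title".to_string(), Json::Str("Noise".to_string())),
        (
            "tags".to_string(),
            Json::Array(vec![Json::Str("rust".to_string()), Json::Str("search".to_string())]),
        ),
        ("stars".to_string(), Json::Number(42.0)),
    ]);
    let mut store = BTreeMap::new();
    flatten("doc1", &doc, &mut store);
    // Prints "doc1/tags/0 => rust" and "doc1/tags/1 => search"
    for (key, value) in scan_prefix(&store, "doc1/tags/") {
        println!("{} => {}", key, value);
    }
}
```

Because the keys are sorted, all values below one path (here the `tags` array) are adjacent, so looking them up is a single range scan rather than a scan over the whole store.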

There will be a C-API for integrating with other programming/scripting languages. This would also make it possible to use Noise as a backend/driver for projects like GDAL or R. Integration with programming/scripting languages doesn't stop at the API: most languages have a full ecosystem including a package manager, so it's important that Noise can be installed through those native mechanisms. This lowers the bar for getting started. It already works for Node.js via “npm install noise-search”.
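A C-API for a Rust library typically exposes an opaque handle plus `extern "C"` functions that create, use and free it. The sketch below shows that general pattern under loudly stated assumptions: `NoiseIndex` and all `noise_index_*` functions are hypothetical names, the "index" merely stores document strings, and none of this is the API Noise actually ships. `#[unsafe(no_mangle)]` needs Rust 1.82 or later; older toolchains spell it `#[no_mangle]`.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

/// Opaque handle: C callers only ever hold a pointer to it.
/// Hypothetical stand-in; a real index would wrap RocksDB.
pub struct NoiseIndex {
    docs: Vec<String>,
}

/// Creates an empty index. Release it with `noise_index_free`.
#[unsafe(no_mangle)] // Rust 1.82+; older toolchains write #[no_mangle]
pub extern "C" fn noise_index_new() -> *mut NoiseIndex {
    Box::into_raw(Box::new(NoiseIndex { docs: Vec::new() }))
}

/// Stores a JSON document given as a NUL-terminated UTF-8 string.
/// Returns 0 on success, -1 for a null pointer or invalid UTF-8.
///
/// # Safety
/// `index` must be null or come from `noise_index_new`;
/// `json` must be null or point to a valid C string.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn noise_index_add(index: *mut NoiseIndex, json: *const c_char) -> c_int {
    if index.is_null() || json.is_null() {
        return -1;
    }
    let (index, json) = unsafe { (&mut *index, CStr::from_ptr(json)) };
    match json.to_str() {
        Ok(doc) => {
            index.docs.push(doc.to_owned());
            0
        }
        Err(_) => -1,
    }
}

/// Returns the number of stored documents (0 for a null handle).
///
/// # Safety
/// `index` must be null or come from `noise_index_new`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn noise_index_len(index: *const NoiseIndex) -> usize {
    if index.is_null() {
        0
    } else {
        unsafe { (*index).docs.len() }
    }
}

/// Frees the index; passing null is a no-op.
///
/// # Safety
/// `index` must come from `noise_index_new` and must not be used afterwards.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn noise_index_free(index: *mut NoiseIndex) {
    if !index.is_null() {
        drop(unsafe { Box::from_raw(index) });
    }
}

fn main() {
    // Simulates the calls a C caller would make through the header.
    let doc = CString::new(r#"{"title": "Noise"}"#).unwrap();
    unsafe {
        let index = noise_index_new();
        assert_eq!(noise_index_add(index, doc.as_ptr()), 0);
        println!("documents stored: {}", noise_index_len(index));
        noise_index_free(index);
    }
}
```

Keeping the struct opaque means the C header only needs a forward declaration (`typedef struct NoiseIndex NoiseIndex;`), so the Rust internals can change without breaking language bindings built on top. Tools such as cbindgen can generate that header from the Rust source.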

Which similar existing solutions are there and how is your project better?

Apache Lucene is a library for full-text search. As it's pretty low-level, it's mostly not used directly but through Elasticsearch or Apache Solr. Noise operates at a higher level than Apache Lucene and works with JSON. Processing and analysis are done with a simple query language.

Who is the target audience and how will your tool get a hold of them?

The target audience is people with basic programming knowledge. These could be scientists who want to do analysis for their empirical studies, or members of civil society who want to do some fact-finding. Thanks to the integration into several programming/scripting languages, Noise is just another dependency/library that can easily be found and installed with the corresponding package manager.

Have you already worked on this idea? If yes, describe the current state and the future advances

The first version already supports basic full-text search, queries on numeric ranges, and spatial queries on geodata (GeoJSON). The next steps are making the system more robust and adding further interfaces, e.g. a Python API in addition to the existing Node.js one. There should also be small projects that do some analysis to demonstrate what Noise can do. These can then be documented as tutorials to lower the bar for getting started even further.
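Numeric range queries map naturally onto an ordered key-value store if numbers are encoded so that byte order matches numeric order. The sketch below shows one well-known encoding for `f64` (set the sign bit of non-negative values, invert all bits of negative ones, store big-endian) and answers a range query with a single ordered scan. It illustrates the general technique, not necessarily how Noise encodes numbers: `encode_f64`, `insert` and `range_query` are hypothetical names, `BTreeMap` stands in for RocksDB, and NaN is not handled.

```rust
use std::collections::BTreeMap;

/// Encodes an f64 so that comparing the big-endian bytes gives numeric order.
/// Non-negative values: set the sign bit so they sort after all negatives.
/// Negative values: invert every bit so larger magnitudes sort first.
/// NaN is not handled in this sketch.
fn encode_f64(x: f64) -> [u8; 8] {
    let bits = x.to_bits();
    let ordered = if bits >> 63 == 0 { bits | (1 << 63) } else { !bits };
    ordered.to_be_bytes()
}

/// Hypothetical secondary index: encoded number -> IDs of documents with that value.
fn insert(index: &mut BTreeMap<[u8; 8], Vec<String>>, value: f64, doc_id: &str) {
    index.entry(encode_f64(value)).or_default().push(doc_id.to_string());
}

/// Returns the IDs of all documents whose value lies in [lo, hi], in ascending value order.
/// Requires lo <= hi (BTreeMap::range panics on an inverted range).
fn range_query(index: &BTreeMap<[u8; 8], Vec<String>>, lo: f64, hi: f64) -> Vec<String> {
    index
        .range(encode_f64(lo)..=encode_f64(hi))
        .flat_map(|(_, ids)| ids.iter().cloned())
        .collect()
}

fn main() {
    let mut index = BTreeMap::new();
    insert(&mut index, 12.5, "doc1");
    insert(&mut index, -3.0, "doc2");
    insert(&mut index, 100.0, "doc3");
    insert(&mut index, 0.0, "doc4");
    // One ordered scan; prints ["doc2", "doc4", "doc1"]
    println!("{:?}", range_query(&index, -5.0, 20.0));
}
```

The same idea carries over to RocksDB, whose default comparator orders keys bytewise, so a range query becomes a seek to the encoded lower bound followed by iteration up to the encoded upper bound.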

Do a quick sketch of the most important milestones that you want to achieve during the period of funding

Note: The period of funding is 6 months.

  • C-API: Replace the current Node.js API, which uses Rust directly, with a clean C-API
  • Python API: A deep integration like the Node.js one, so that Noise can easily be installed through the package manager
  • More examples/documentation: Small demo projects, documented as tutorials, that make the concepts of Noise more accessible
  • Internal improvements: The tightly coupled query parser needs to be refactored, among other things to provide better error messages
  • Benchmarks: Benchmarks should prevent regressions and make it possible to compare Noise with other systems

Categories: en, Noise, funding

By Volker Mische