The State of GeoCouch

Welcome to my talk “The State of GeoCouch”. I've chosen the title as I always wanted to be as cool as the GeoServer or PostGIS people. Though the talk will also introduce you to CouchDB, in case you don't know much about it yet.

I love open source and contribute to projects like CouchDB, MapQuery and OpenLayers. I work as a Geospatial Software Engineer at Couchbase. I came in touch with the free and open source geospatial world in 2008 when I spent one year in Australia. Around the same time I found out about CouchDB.

What is GeoCouch? It's a spatial index for CouchDB. You could regard its relation to CouchDB as that of PostGIS to PostgreSQL. As most of the cool features come from CouchDB, I'd like to introduce it first.

So what is CouchDB? It's one of those new non-relational databases out there. There are key-value stores, graph databases and document-oriented databases like CouchDB.

So one record in the database, called a document, is like a row in a relational database. It's the smallest item you can store. Take the following example.

It’s a weather station which measures rainfall and temperature once a day. Don’t worry about the attributes with a leading underscore at the top; these are CouchDB internals. The actual data is below. It’s quite self-explanatory.

As you can see, the data is encoded as JSON. You can have strings, numbers, arrays and even nested objects.
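A document of the kind described above could look roughly like the following sketch. The field names and all values are made up for illustration; only the underscore-prefixed `_id` and `_rev` attributes are actual CouchDB internals (and the revision value here is invented too).

```python
import json

# Hypothetical weather station document; field names and values are
# illustrative only. `_id` and `_rev` stand in for the CouchDB internals.
station = {
    "_id": "station-0001",
    "_rev": "1-abc123",
    "name": "Example Station",
    "location": {"type": "Point", "coordinates": [144.96, -37.81]},
    "measurements": [
        {"date": "2011-09-12", "rainfall_mm": 2.4, "temperature_c": 14.5},
        {"date": "2011-09-13", "rainfall_mm": 0.0, "temperature_c": 17.1},
    ],
}

# Serialise to the JSON text that would be stored in the database.
encoded = json.dumps(station, indent=2)
print(encoded)

# Everything except the underscore-prefixed attributes is the actual data.
payload = {key: value for key, value in station.items() if not key.startswith("_")}
```

The sketch uses strings, numbers, an array and nested objects, which covers the JSON building blocks mentioned above.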

Let's go back to GeoCouch. How did it all start?

In 2008 I spent one year in Australia at an open source geospatial company. We had a customer that produced reports about water quality and some other measurements as PDF. They decided that they wanted to go for a fancy web mapping application to visualise their data. For a prototype they gave us XML, which we converted to JSON.

Then the day of the final version came closer. They wanted to go for a traditional 3-tier architecture with a database, GeoServer and OpenLayers on top. It took them weeks to come up with a database schema. Once we had it, our database expert had a quick look and said “You would need to do a cross join over all tables for our kind of queries”.

This was the point when I thought that it would be so nice to use CouchDB for it. We could have just stored the JSON in the database and would have been ready to go.

But it wouldn't have been possible for several reasons. One was that CouchDB was still new, so there was little chance of convincing a customer to use it. The other was that CouchDB didn't have any spatial indexing at that point to make queries like bounding box searches. Hence I created GeoCouch.

A nice example of a web mapping application that uses GeoCouch is poetrybox.info. In Portland, Oregon there are poles with small wooden boxes mounted on them. Those boxes contain poems. They are scattered all around the city.

This application displays them on a map, so you know where to find them. A click on a marker shows information about the box and a picture of it on the right side.

This example shows one of GeoCouch's goals quite clearly.

It's about publishing data fast. Especially with the open government data movement, people start to get access to actual data and they want to do something with it quickly. GeoCouch should make this as easy as possible.

Speaking of new movements, I see two big groups of people in the current geospatial world. There are the traditional GIS people and the neogeographers.

For me traditional GIS means:

projections
transformations
weird incompatible ancient data formats
axis order confusion
OGC, WMS, WFS, WPS, WPVS, OWS
ISO 19115
huge UML diagrams
SOAP, XML, Java
SQL
Government agencies
Complex systems that do almost everything

This is in contrast to what I think about neogeography:

the web
people of the web, coders, designers
maps as a tool
open data
crowdsourcing
DIY
JSON, JavaScript
NIH-Syndrome, small groups that do fancy stuff
solve problems locally
Keep it stupid simple
Cutting edge technologies

I see GeoCouch in the middle of those two movements. The reason is that…

…CouchDB is more than a database. It has many nice features that GeoCouch can leverage.

Let's start with the RESTful HTTP API. CouchDB is heavily built upon HTTP, JavaScript and JSON. This makes it easy to use for web developers, but not only for those.

You don't need any special database drivers to access the database; you just use HTTP. Almost every programming or scripting language comes with an HTTP client, and JSON is also easy to use and parse.
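To give an idea of what "just use HTTP" means, here is a minimal sketch that uses nothing but Python's standard library. The address, the database name `weather` and the document contents are assumptions made for illustration. The request is only built, not sent, so the sketch works without a running CouchDB.

```python
import json
import urllib.request

# Assumed local CouchDB address and database name, for illustration only.
COUCHDB_URL = "http://localhost:5984"
DATABASE = "weather"


def build_put_request(doc_id, doc):
    """Build (but do not send) the HTTP request that stores a document."""
    return urllib.request.Request(
        url=f"{COUCHDB_URL}/{DATABASE}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_put_request("station-0001", {"name": "Example Station"})
print(request.get_method(), request.full_url)
# With a CouchDB instance running at COUCHDB_URL: print(send(request))
```

The same few lines could be written with the HTTP client of pretty much any other language, which is the whole point.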

It's also nice for operations. Your admins know HTTP inside out. You can use all the existing tools like reverse proxies or load balancers with CouchDB.

One of the hottest features, and one of the major reasons why CouchDB came into existence, is replication. True master-master replication is supported. But let's explain it with an example. I call it the offline use case.

You are in the middle of nowhere and would like to tweet about it, but there's no connection. You are offline. Or…

…there was a connection, but there's no longer any. Let's further expand on this disaster relief case.

You are out there, with your phone. You have CouchDB installed on it (iOS and Android are supported by Couchbase). You keep on collecting data and store it directly on the phone.

Then there's some nearby local machine where all the relief workers store their data.

You can simply replicate to it. If you have set up a local Wi-Fi network, you can replicate wirelessly. If you haven't, just plug your phone in and start the replication. That's not all.

There might be some central instance further away that collects the data of the whole disaster area.

You are not able to connect to it directly, but you drive there by car every day to replicate the data. Such a workflow doesn't necessarily need replication. But it makes things easier once the connection to the Internet (and hence the central server) is working again.

You can simply replicate directly to it. That's still not all.

Let's concentrate on the replication itself. I previously mentioned the real master-master replication. This means that you can also replicate the other way round.

You can use replication in the opposite direction for two things:

Push data down to the phones that comes from the outside, for example OpenStreetMap data that got updated. It will get to the phones whenever they replicate their data anyway. So the people on the ground can work (offline) on current maps.
People from the outside, who access the central server through the Internet, can clean up the data, for example by removing typos or duplicates. Thanks to the master-master replication the updated records can be replicated back to the phones.
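Replication in both directions can be sketched as two requests against CouchDB's `_replicate` endpoint, each with a JSON body naming a source and a target. The host names and the database name `fielddata` below are hypothetical. The requests are only built, not sent.

```python
import json
import urllib.request

# Hypothetical addresses: CouchDB on the phone and the local relief machine.
PHONE = "http://localhost:5984"
LOCAL_MACHINE = "http://relief-box.local:5984"


def replication_request(couchdb_url, source, target):
    """Build (but do not send) a POST to the `_replicate` endpoint."""
    body = {"source": source, "target": target}
    return urllib.request.Request(
        url=f"{couchdb_url}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Push the collected data from the phone up to the local machine ...
push = replication_request(PHONE, "fielddata", f"{LOCAL_MACHINE}/fielddata")
# ... and pull cleaned-up records and updated map data back down.
pull = replication_request(PHONE, f"{LOCAL_MACHINE}/fielddata", "fielddata")

for request in (push, pull):
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Note that the two directions differ only in which side is the source and which is the target. The same pattern would apply between the local machine and the central server.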

Another nice thing about CouchDB is the concept called CouchApps. As CouchDB has a RESTful HTTP API, it is also a web server. It can serve up your static files.

So CouchDB will be your server. It will serve up your HTML5 application, consisting of HTML, JavaScript and CSS, to the client.

That's all. You can build your application with this 2-tier architecture.
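As a rough sketch of the idea: the static files of such an application can be stored as attachments of a design document, and CouchDB then serves them over HTTP. The file contents, the design document name `mapapp` and the database name `maps` are made up for this example. Real CouchApps are usually pushed with a dedicated tool rather than assembled by hand like this.

```python
import base64
import json

# Hypothetical static files of a tiny HTML5 application.
files = {
    "index.html": ("text/html", b"<!DOCTYPE html><title>Map</title>"),
    "app.js": ("application/javascript", b"console.log('hello');"),
    "style.css": ("text/css", b"body { margin: 0; }"),
}

# A design document that carries the files as inline (base64) attachments.
design_doc = {
    "_id": "_design/mapapp",
    "_attachments": {
        name: {
            "content_type": content_type,
            "data": base64.b64encode(content).decode("ascii"),
        }
        for name, (content_type, content) in files.items()
    },
}

print(json.dumps(design_doc, indent=2))


def attachment_url(couchdb_url, database, name):
    """URL under which CouchDB would serve one of the attached files."""
    return f"{couchdb_url}/{database}/{design_doc['_id']}/{name}"


print(attachment_url("http://localhost:5984", "maps", "index.html"))
```

Because the application is just another document, it travels with the rest of the database when you replicate.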

You remember the example from the beginning with the poetry boxes? This is actually such a CouchApp. It is purely HTML5 based and is served straight from CouchDB/GeoCouch.

Think about the possibilities. For CouchDB such an app is only data, so you can replicate it just like any other data. The poetry boxes app could simply be replicated to your phone and used from there. You could even include the data. It would then work offline as well. Imagine that you build a web mapping application and create a mobile version of it by just changing a few bits of CSS.

Even the application that was the reason for the creation of GeoCouch could now be built as a CouchApp instead of using the typical 3-tier architecture.

There's even more. Couch is a full ecosystem. There are two companies built around CouchDB. One is Cloudant, which does BigCouch, a way of scaling CouchDB up that is based on Amazon's Dynamo paper. The other one is Couchbase, which I work for. We do not only the scaling up, but also the scaling down to the phone.

There is:

Couchbase Server, which puts CouchDB in a clustered environment which is based on Membase.
Couchbase Single Server, which is basically a CouchDB with GeoCouch and some smaller improvements.
Couchbase Mobile, which is a Couchbase Single Server that runs on your phone, iOS and Android.

Back to GeoCouch.

Of course standards are important and it's kind of a commitment to the traditional GIS world. Hence GeoCouch implements…

…OpenSearch Geo. Most people don't know OpenSearch, although almost everyone is using it. The little box at the top right corner of your browser, which can be used to search on search engines or other things like dictionaries or movie databases, is built on OpenSearch. The standard makes sure that applications can discover how a website can be searched.

Some people created a geo extension to it. This means that you can make simple geo queries via a simple URL.

The OpenSearch Geo specification is currently in the OGC Fast Track process and will eventually become an OGC standard.

A URL to query GeoCouch looks like this. The first part is the host it runs on. Then there's a long part which is specific to GeoCouch; I won't bother you with it for now and just put three dots. The final part is OpenSearch Geo. This is an example of a bounding box request.
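The final, OpenSearch Geo part of such a URL can be put together as in the sketch below. The host is an assumption, the GeoCouch-specific middle part stays as three dots, and the `bbox` parameter name with its west, south, east, north order is what I assume here for the bounding box.

```python
from urllib.parse import urlencode


def bbox_query(base_url, west, south, east, north):
    """Append a bounding box query to the URL of a spatial index.

    `base_url` is supplied by the caller; the GeoCouch-specific path in it
    is deliberately not spelled out in this sketch. The `bbox` parameter
    name and the west, south, east, north order are assumptions.
    """
    bbox = ",".join(str(value) for value in (west, south, east, north))
    # Keep the commas readable instead of percent-encoding them.
    return f"{base_url}?{urlencode({'bbox': bbox}, safe=',')}"


# Hypothetical host; the three dots stand for the GeoCouch-specific part.
print(bbox_query("http://localhost:5984/...", 0, 0, 180, 90))
```

Everything a client needs to know for a simple spatial query is those four numbers.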

What can you use GeoCouch for? For example as a TileCache. I created an experimental extension for CouchDB a long time ago. There are also some experiments for MapProxy done by Oliver Tonnhofer.

Or to serve up vector data as in the example at the beginning of the talk. There you would just use a plain GeoCouch together with some web mapping client. Now a shameless plug: use MapQuery for it. It combines OpenLayers with jQuery. I'm a committer on the MapQuery project and will make sure that it works well with GeoCouch.

And of course you can also use it within your traditional 3-tier architecture. There are already several backends available:

GDAL 1.9.0 will contain a CouchDB driver
GeoTools has a feature store in a very early stage
Deegree has an experimental implementation for their blob storage, which was created at the OSGeo Code Sprint Bolsena 2011

As all projects are in early stages, I'm sure they'd love to see some funding. You can either contact me or the projects directly.

What will the future hold? For one thing, GeoCouch needs to support more complex queries like polygon or radius search. This will be achieved by using geometry libraries. There are two potential candidates.

GEOS. The problem with it is that it is licensed under the LGPL, which might be a problem on iOS. I'd love to see a switch to a BSD-like license.
The other candidate is Boost.Geometry. The problem with it is that the disjoint function doesn't yet support all geometry types (the documentation is wrong on that).

For another thing I'd like to move to multidimensional indexing. There are many index structures out there like: BUB-tree, SH-tree, X-tree, DBM-tree. I haven't decided on one yet. If you have created a new one, or know a good one that supports bulk updates, please let me know.

Now finally to a topic everyone is interested in, benchmarks. Last year I presented a comparison to SpatiaLite and PostGIS. Only two hours before my talk in 2010 Pirmin Kalberer pointed out that I did the SpatiaLite benchmarking wrong. Hence this year all I say is…

…it's not always about the performance. You've listened to this talk for 20 minutes now, so you should have gotten a good impression of what CouchDB, GeoCouch and Couchbase are capable of. If your application needs one of those features, you should probably go for it.

If you don't need any of the outlined features, you should probably use something else.

So the important thing is not the performance, but to…

…use the right tool for the right job.

Thanks for your attention!

Website: http://vmx.cx/
IRC: vmx @ freenode
Email: volker@couchbase.com
Jabber: volker@vmx.cx
Twitter: @vmische
GeoCouch:
- Source: https://github.com/couchbase/geocouch/
- Binaries: http://www.couchbase.org/