vmx

the blllog.

How I met CouchDB

2010-07-14 22:35

It was a Saturday in late April 2008. I was sitting at my laptop in my 5 m² room down under, chatting with some German people I had been chatting with for about 8 years by then. Suddenly I discovered that Jan was there, whom I hadn't talked to for years. When I wondered why he was there, he replied that he wanted to brag about his apache.org email address. This is how I found out about CouchDB.

After several long discussions with Jan I finally wrapped my head around the document-oriented concept. I was blown away. It was exactly what I would have liked to use on so many occasions during my one-year internship at a geospatial company. CouchDB wasn't ready for that, though: I needed spatial indexing. One week later I had a first idea of what such an extension might look like.

And only 2 years later I'm really involved in CouchDB and people are actually starting to use GeoCouch :) I'd like to use this blog post to thank the developers and the whole community. It's been a great time and the IRC channel just kicks ass. You all helped to make CouchDB 1.0 possible!

Categories: en, CouchDB

GeoCouch talk in Augsburg

2010-07-07 22:35

As part of the diploma candidates' colloquium of the Chair of Human Geography and Geoinformatics, I'll give a talk about GeoCouch at the University of Augsburg on 2010-07-19 at 17:30 (room 2125). The exact title is:

GeoCouch: Eine Erweiterung für CouchDB zur Abfrage räumlicher Daten (GeoCouch: An extension for CouchDB for querying spatial data)

It is aimed at geographers, so it won't go too deep into the details of the implementation. No prior knowledge of CouchDB is needed either. So anyone who wants to learn more about CouchDB and GeoCouch is cordially invited. Afterwards I'll of course be available for questions.

I have no idea how big the CouchDB community in the Augsburg area is, but if anyone accepts this invitation, nothing speaks against a small CouchDB/GeoCouch/NoSQL meetup afterwards. It's best to drop me a mail, because if a few people are sure to come, others will certainly consider it as well.

Sorry Planet CouchDB, this post was originally written in German, as it is about a talk in German.

Categories: de, CouchDB, GeoCouch, Erlang, geo

Bolsena hacking event

2010-06-11 22:35

The OSGeo hacking event in Bolsena/Italy was great. Many interesting people sitting the whole day in front of their laptops, surrounded by beautiful scenery and nice warm sunny weather. It gets even better when you get meat for lunch and dinner.

I had the chance to tell people a bit more about CouchDB and Couchapps.

One project I hadn't heard that much of before was deegree. They build the whole stack of OGC services you could imagine. For me it was of interest that they have a blob storage in their upcoming 3.0 release. The data isn't flattened into SQL tables but stored as blobs. This sounds like a good use for a CouchDB backend in the future.

I was working with Simon Pigot on a GeoNetwork re-implementation based on CouchDB using Couchapp. We got the basic stuff working: putting an XML document into the database, editing it and returning the new document, as well as full-text indexing with couchdb-lucene. Next steps are improving the JSON to XML mapping and integrating spatial search based on GeoCouch.

The event was really enjoyable. Thanks to Couchio for sponsoring the trip, thanks to Jeroen for organizing it, and thanks to all the other hackers who made it such an awesome event. Hope to see you next year!

Categories: en, CouchDB, JavaScript, geo

FOSS4G 2010: I'm speaking

2010-05-21 22:35

I did it! I'll speak at the FOSS4G Conference 2010 (Free and Open Source Software for Geospatial Conference), 6th–9th September in Barcelona about “GeoCouch: A spatial index for CouchDB”. As soon as the abstract is available online I'll link to it. Hope to see you there!

Categories: en, GeoCouch, CouchDB, Erlang, geo

Non-validating WKT parser for Erlang

2010-05-14 22:35

The upcoming OpenSearch Geo specification will add support for querying with WKT (Well-Known Text). As I plan to support this specification in GeoCouch, I needed a WKT parser written in Erlang. I tried several ways to write this parser, but ended up writing it by hand, based on the ideas of the fabulous MochiWeb JSON2 parser.

The parser is meant for fast parsing, so it is non-validating. This means that it parses not only valid WKT, but also strings that merely look valid. The grammar is simplified to (in EBNF as used for the XML spec):

wkt ::= item | string '(' space* item (comma item)* ')'
item ::= string (geom | list | nested_list | item | 'EMPTY')
nested_list ::= space* '(' list (comma list)* ')' | '(' nested_list+ ')'
list ::= '(' geom (comma geom)* ')'
geom ::= space* '(' coord (comma coord)* ')'
coord ::= space* number (space+ number)*
number ::= integer | float
integer ::= ('-' | '+')? [0-9]+
float ::= ('-' | '+')? [0-9]+ '.' [0-9]+ exponent?
exponent ::= 'E' ('-' | '+')? [0-9]+
string ::= [a-zA-Z]+ (space* [a-zA-Z])*
space ::= #x20
comma ::= ',' space*

I hope I got the grammar right; leave a comment if not. It also means that strings like this(is(10 20), a test EMPTY) would be parsed to:

{this,[{is,[{10,20}]},{'a test',[]}]}

A validating parser would be much slower as it would also need to perform checks on the geometry, e.g. for polygons whether interiors are really within the exterior ring or not.

The general rule is: a coordinate is transformed to a tuple, a list of coordinates to a list. The geometry name will be an atom. Here's an example for a polygon:

wkt:parse("POLYGON ((102 103, 204 205, 306 107, 102 103),
                    (12 13, 24 25, 36 17, 12 13),
                    (62 63, 74 75, 86 67, 62 63))").
{polygon,[[{102,103},{204,205},{306,107},{102,103}],
          [{12,13},{24,25},{36,17},{12,13}],
          [{62,63},{74,75},{86,67},{62,63}]]}
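The same mapping rule can be sketched in Python. This is not a port of the Erlang parser, just a minimal illustration under a few assumptions: tuples stand in for Erlang tuples, a lowercased string stands in for the atom, and the names tokenize, parse_group and parse_wkt are made up for this example. Like the original it is non-validating, and it only handles a single-word geometry name followed by nested parentheses or EMPTY.

```python
import re

# Tokens: parentheses, commas, numbers and words (geometry names, EMPTY).
TOKEN = re.compile(
    r"\s*([(),]|[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?|[A-Za-z]+)")


def tokenize(text):
    """Split a WKT string into tokens, skipping whitespace."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_number(token):
    """Integers stay integers, everything else becomes a float."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_group(tokens, pos):
    """Parse '(' ... ')' starting at tokens[pos]; return (items, next_pos)."""
    assert tokens[pos] == "("
    pos += 1
    items = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            # A nested list, e.g. one ring of a polygon.
            value, pos = parse_group(tokens, pos)
            items.append(value)
        elif tokens[pos] == ",":
            pos += 1
        else:
            # A coordinate: all numbers up to the next ',' or ')'.
            coord = []
            while tokens[pos] not in (",", ")"):
                coord.append(to_number(tokens[pos]))
                pos += 1
            items.append(tuple(coord))
    return items, pos + 1


def parse_wkt(text):
    """Parse 'NAME (...)' or 'NAME EMPTY' into a (name, data) tuple."""
    tokens = tokenize(text)
    name = tokens[0].lower()
    if tokens[1].upper() == "EMPTY":
        return (name, [])
    data, _ = parse_group(tokens, 1)
    return (name, data)
```

Calling parse_wkt with the polygon string from above gives a tuple whose first element is "polygon" and whose second element is a list of rings, each ring being a list of coordinate tuples.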

In case you're getting excited now, the source is available at Github, released under the MIT License.

If someone plans to write a validating WKT parser for Erlang (please let me know), I propose using neotoma. It's a really nice "packrat parser-generator for Erlang for Parsing Expression Grammars (PEGs)".

Categories: en, GeoCouch, Erlang, geo

GeoCouch: The future is now

2010-05-03 22:35

Update: This blog entry is outdated and kept for historical reasons. Please always check for newer blog posts. The up-to-date information on how to install and use GeoCouch can be found in its README.

An idea has become reality. Exactly two years after the blog post with the initial vision, a new version of GeoCouch is finished. It's a huge step forward. For the first time, the dependencies are narrowed down to CouchDB itself. No Python, no SpatiaLite any longer; it's pure Erlang. GeoCouch is tightly integrated with CouchDB, so you'll get all the nice features you love about CouchDB.

Current implementation

Thanks to the feedback after FOSS4G 2009 and the "GeoCouch: The future" blog entry, it was clear that people prefer a simple, yet powerful and tightly integrated approach, rather than having too many external dependencies (which was a showstopper for quite a few people).

I implemented an R-tree (I call it vtree, as the implementation is subject to change a lot) from scratch. I haven't used the already existing R-tree implementation available at Github for three reasons: I needed something to learn Erlang with, it doesn't contain tests or examples, and it is always a good idea to implement a data structure yourself to understand the details/problems. My implementation is far from perfect, but it works well enough for now. The vtree is implemented as an append-only data structure, just as CouchDB's B-trees are. Currently it doesn't support bulk insertion.
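For readers who haven't met R-trees before, here is a minimal Python sketch of the basic idea. It is not the vtree code: the dictionary layout, the function names and the sample points are made up for illustration, and splitting, balancing and the append-only storage are left out. Every node stores the minimum bounding rectangle (MBR) of everything below it, so a search can skip whole subtrees whose MBR doesn't intersect the query box.

```python
def mbr_union(boxes):
    """Smallest box (x_min, y_min, x_max, y_max) enclosing all boxes."""
    x_mins, y_mins, x_maxs, y_maxs = zip(*boxes)
    return (min(x_mins), min(y_mins), max(x_maxs), max(y_maxs))


def intersects(a, b):
    """True if the two boxes overlap or touch."""
    return (a[0] <= b[2] and b[0] <= a[2] and
            a[1] <= b[3] and b[1] <= a[3])


def make_leaf(entries):
    """A leaf holds (doc_id, (x, y)) entries plus their MBR."""
    boxes = [(x, y, x, y) for _, (x, y) in entries]
    return {"mbr": mbr_union(boxes), "entries": entries}


def make_inner(children):
    """An inner node holds child nodes plus the MBR of all children."""
    return {"mbr": mbr_union([child["mbr"] for child in children]),
            "children": children}


def search(node, bbox):
    """Return the IDs of all points within bbox."""
    if not intersects(node["mbr"], bbox):
        # Nothing below this node can match, skip the whole subtree.
        return []
    if "entries" in node:
        return [doc_id for doc_id, (x, y) in node["entries"]
                if intersects((x, y, x, y), bbox)]
    found = []
    for child in node["children"]:
        found.extend(search(child, bbox))
    return found


# Hypothetical sample data: two leaves below one root node.
root = make_inner([
    make_leaf([("a", (1, 1)), ("b", (2, 3))]),
    make_leaf([("c", (10, 10)), ("d", (12, 8))]),
])
```

A call like search(root, (9, 9, 11, 11)) never looks at the entries of the first leaf, because its MBR lies completely outside the query box.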

If you want to know details on how to create your own indexer, have a look at my Indexer tutorial.

Feature set

Following the "Release early, release often" philosophy, currently only points can be inserted and the only supported query is a bounding box search. Other geometries should follow soon, though.

Using GeoCouch

GeoCouch is now hosted at Github. Giving GeoCouch a go is easy:

git clone http://github.com/vmx/couchdb.git
cd couchdb
./bootstrap
./configure
make dev
./utils/run

Trying the spatial features once it's up and running is easy as well. Just add a spatial property with a named function to your Design Document, as you would do for show or list functions:

function(doc) {
    if (doc.loc) {
        emit(doc._id, {
            type: "Point",
            coordinates: [doc.loc[0], doc.loc[1]]
        });
    }
};

All you need to do is emit GeoJSON as the value (remember that Point is the only supported geometry at the moment); the key is currently ignored.

curl -X PUT http://127.0.0.1:5984/places
curl -X PUT -d '{"spatial":{"points":"function(doc) {\n    if (doc.loc) {\n        emit(doc._id, {\n            type: \"Point\",\n            coordinates: [doc.loc[0], doc.loc[1]]\n        });\n    }};"}}' http://127.0.0.1:5984/places/_design/main

Before a bounding box query can return anything, you need to insert Documents that contain a location.

curl -X PUT -d '{"loc": [-122.270833, 37.804444]}' http://127.0.0.1:5984/places/oakland
curl -X PUT -d '{"loc": [10.898333, 48.371667]}' http://127.0.0.1:5984/places/augsburg

And finally you can make a bounding box request:

curl -X GET 'http://localhost:5984/places/_design/main/_spatial/points/%5B0,0,180,90%5D'

This one should return only augsburg:

{"query1":[{"id":"augsburg","loc":[10.898333,48.371667]}]}
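To see why only augsburg comes back, here is a small Python sketch of the filtering such a query does conceptually. It is a plain linear scan rather than the R-tree GeoCouch uses, and it assumes that the four numbers of the bounding box are ordered west, south, east, north. The function names are made up for this example.

```python
def in_bbox(point, bbox):
    """True if the point (x, y) lies within bbox, edges included."""
    x, y = point
    # Assumed order of the bounding box: west, south, east, north.
    west, south, east, north = bbox
    return west <= x <= east and south <= y <= north


def bbox_search(docs, bbox):
    """Return the sorted IDs of all documents located within bbox."""
    return sorted(doc_id for doc_id, loc in docs.items()
                  if in_bbox(loc, bbox))


# The two documents inserted above.
places = {
    "oakland": [-122.270833, 37.804444],
    "augsburg": [10.898333, 48.371667],
}

matches = bbox_search(places, [0, 0, 180, 90])
```

Oakland has a negative longitude, so it falls outside the box that starts at 0, while Augsburg lies inside it.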

Next steps

The development of GeoCouch was quite slow in the past, but it is picking up speed, as my diploma thesis (comparable to a master's thesis) will be about GeoCouch. Additionally, Couchio kindly supports the development.

The next steps are (in no particular order):

  • Better R-tree (better splitting algorithm, bulk operations)
  • Supporting more geometries
  • Polygon search
  • Improving CouchDB's plugin capabilities

Thanks

I'd like to thank all the people who kept me motivated over the past two years with their tremendous feedback. Special thanks go to Jan Lehnardt for getting me onto the Couch, Cameron Shorter for introducing me to the geospatial open source business and all the people from Couchio for the great two weeks in Oakland.

Categories: en, CouchDB, Python, Erlang, geo

Processing PDF files: Auto advance

2010-02-23 22:35

Sometimes you need a PDF file that auto-advances (auto flip, slide show) pages after a certain number of seconds, for example when presenting a lightning talk the Ignite way. There are several ways to achieve this. Today I spent hours finding the best one.

You could just hope that your favourite PDF viewer supports changing slides automatically at a certain interval (Evince doesn't). But you never know which viewer will be used when you rely on other people's computers. The next step is obvious: try to get the PDF file itself to auto-advance. It is possible, as Adobe Acrobat supports such a setting (it seems that even Acrobat Reader does, though I can't find that option in mine under Linux). I just needed to find out how.

After some further research I found out that LaTeX's hyperref package supports it as well (no, I don't speak Czech). So I made a minimal LaTeX Beamer presentation to give it a try. The important note that \hypersetup{pdfpageduration=n} must be the first item within a frame (right after \begin{frame}) came from some presentation guidelines. Guess what? It even works with Evince (tex file, PDF file).

I'm getting closer. My problem, though, is that I create my slides with Inkscape (or rather Inkscape Slide), so I can't really use LaTeX Beamer for them. But the previously mentioned presentation guidelines also mention the /Dur entry in the PDF page object, so it should be easy to add it manually. And it really is. A quick search through the PDF file generated by LaTeX shows that /Dur occurs close to /MediaBox. After adding /Dur 2 right after /MediaBox in my original presentation PDF file, it auto-flipped every 2 seconds.

I could have written a simple script that adds it to the PDF at the right place, but that sounds pretty fragile. A better approach would be to use a PDF library that is meant for manipulating PDF files. As my favourite programming language is Python at the moment, I came across pyPdf. A quick look at the internals showed that it contains everything I need.

Here's my final solution for the problem of creating auto advancing PDF slides. A small script that does exactly what I need (and not more). I've used the Python 3 version of pyPdf, but the script should look similar for Python 2.x.

#!/usr/bin/env python3.1
# Copyright (c) 2010 Volker Mische (http://vmx.cx/)
# Licensed under MIT.

import sys
from pyPdf import PdfFileWriter, PdfFileReader
from pyPdf.generic import NameObject, NumberObject

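def main(argv=None):
    if argv is None:
        argv = sys.argv

    if len(argv) != 4:
        print('Usage: setduration.py <duration-in-seconds> <input-pdf>',
              '<output-pdf>')
        # Signal the usage error to the calling shell.
        return 1

    # Keep the input file open until the output has been written, as
    # pyPdf still reads from it while writing.
    with open(argv[2], "rb") as infile, open(argv[3], "wb") as outfile:
        pdfin = PdfFileReader(infile)
        pdfout = PdfFileWriter()

        for page in pdfin.pages:
            # /Dur is the display duration of the page in seconds.
            page[NameObject('/Dur')] = NumberObject(argv[1])
            pdfout.addPage(page)

        pdfout.write(outfile)

    return 0

if __name__ == '__main__':
    sys.exit(main())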

Categories: en, Python

GeoCouch: The future

2009-12-20 22:35

GeoCouch started as a proof of concept and was heavily rewritten for the 0.10 release. As more and more people got interested, I got feedback that showed what people really want/need. And now it's time to determine the future of GeoCouch. It's your chance to shape the future. In this blog entry I'll explain my ideas for the future, but I'm more than happy to get further ideas/complaints from you. So please check if my ideas match your use-cases for GeoCouch.

Stripping it down

GeoCouch needs an external spatial index. At the moment I use SpatiaLite for it, but a PostGIS backend would easily be possible. My initial idea was that it is better to use the existing power of spatial databases, rather than reinventing the wheel. I thought I could use all the power they have, even for complex analytics, but I can't. As I only store the geometries, I need to “ask” CouchDB for the attributes (no, I don't want to store attributes in my spatial index).

If I don't use the full power of the spatial databases, but only a small fraction, there might be a better solution. Therefore I propose that GeoCouch will use a simple spatial index for storing the geometries, not a full-blown spatial database. I haven't decided yet which one it'll be, but I'm seriously thinking about moving this part to Erlang (I know that quite a few people would love that move).

You will lose functionality like reprojection. The spatial index won't know anything about projections. So GeoCouch won't be projection aware anymore, but your application still can be. For example, if you want to return your data in a different projection than it was stored in, you do the transformation after you've queried GeoCouch.
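As an illustration of transforming after the query, here is a minimal Python sketch. It assumes the points are stored as longitude/latitude in degrees and that the application wants spherical Web Mercator coordinates in metres. Both projections are my choice for this example, and the function names and the shape of the result rows are made up.

```python
import math

# Radius of the sphere used by spherical Web Mercator, in metres.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat) / 2))
    return (x, y)


def reproject_results(rows):
    """Reproject (doc_id, [lon, lat]) rows as returned by a query."""
    return [(doc_id, lonlat_to_web_mercator(lon, lat))
            for doc_id, (lon, lat) in rows]


# Hypothetical query result, reprojected on the application side.
projected = reproject_results([("augsburg", [10.898333, 48.371667])])
```

The spatial index never needs to know about the target projection. Only the application does.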

You would also lose fancy things for geometries, like boolean operations on them. But this is something I'd call complex analytics, and not simple querying.

GeoCouch would only support three simple queries: bounding box search, polygon search and radius/distance search. If the search should be within a union of polygons, let's say all countries of the European Union, you would simply perform the union operation before you query GeoCouch.

Complex analytics

What I call “complex analytics” are things like: “return all apple trees that are located within a 10 km range around buildings that are over 100 m high, but only in countries with a population of over 50 million people”. This is not possible with GeoCouch alone, as you would need the attribute values as well. Those are stored in CouchDB, so you would need to request them. The only thing GeoCouch supports is a simple: give me all IDs within a bounding box/polygon/radius.
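The split between the two systems can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: two dictionaries play the roles of the spatial index (IDs and points only) and of CouchDB (the attributes), and the tree documents are invented sample data.

```python
# Hypothetical stand-in for the spatial index: IDs and points only.
SPATIAL_INDEX = {
    "tree1": (10.9, 48.4),
    "tree2": (11.0, 48.3),
    "tree3": (-122.3, 37.8),
}

# Hypothetical stand-in for CouchDB: the attributes of each document.
DOCUMENTS = {
    "tree1": {"species": "apple"},
    "tree2": {"species": "pear"},
    "tree3": {"species": "apple"},
}


def ids_in_bbox(bbox):
    """Step 1: the spatial index only answers with IDs."""
    west, south, east, north = bbox
    return sorted(doc_id for doc_id, (x, y) in SPATIAL_INDEX.items()
                  if west <= x <= east and south <= y <= north)


def apple_trees_in_bbox(bbox):
    """Step 2: the attributes are looked up in the document store."""
    return [doc_id for doc_id in ids_in_bbox(bbox)
            if DOCUMENTS[doc_id]["species"] == "apple"]
```

The attribute filter has to run on the application side or in a layer on top, because the spatial index never sees the species.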

Conclusion

Simple requests are needed for everyday use, thus they should be incredibly fast. Complex analytics don't necessarily need to handle thousands of requests per second; in most cases they don't even need to be processed in real time. I'd like to see some layer built on top of GeoCouch, so CouchDB can even be used for analytics (which is a thing I wanted to have right from the start).

This means that GeoCouch will mainly be for high-performance and massive projects that need some simple spatial bits, which is what I think the majority of users need.

If you think you really need only those simple queries, but want them to be fast, or if you think this is wrong and you need dynamic reprojection, I can only invite you to leave a comment below or drop a mail to volker.mische@gmail.com. Thanks.

Categories: en, CouchDB, Python, geo

FOSS4G 2009: “Geodata and CouchDB” presentation is online

2009-11-17 22:35

As the final wrap-up of FOSS4G 2009, my presentation on “Geodata and CouchDB” is available online in several formats. It should also be of interest to people who are new to CouchDB, as huge parts of the talk are an introduction to CouchDB.

Categories: en, CouchDB, Python, geo

Drag as long as you want

2009-11-11 22:35

It was a long-standing bug (officially it was a missing feature) in OpenLayers that had annoyed me ever since I first used OpenLayers. I’m talking about ticket #39: “Allow pan-dragging while outside map until mouseup”.

Normally when you drag the map in OpenLayers it will stop dragging as soon as you hit the edge of the map viewport (the div that contains the map). Whenever you have a small map, but a huge window and a loooong way to drag, it can get quite annoying, as the maximum distance you can drag at once is the size of that viewport.

But yesterday it finally happened. A patch to fix it landed in trunk. A first rough cut was made at the OpenLayers code sprint at FOSS4G. Andreas Hocevar reviewed the code and made a less obtrusive version of it (thanks again).

Try these two examples to see the difference. Click on the map and drag it a long way to the right and back to the left again (you might need to zoom in a bit to see the full effect):

As it is a new feature, it isn’t enabled by default (and only available on current SVN trunk, it will be available in OpenLayers 2.9). To enable it on your map, just use the following code to add the documentDrag parameter to the DragPan control (you obviously need a recent SVN checkout).

Update (2009-11-18): It got even easier with r9805:

// Use default controls but with documentDrag enabled.
var controls = [
    new OpenLayers.Control.Navigation({documentDrag: true}),
    new OpenLayers.Control.PanZoom(),
    new OpenLayers.Control.ArgParser(),
    new OpenLayers.Control.Attribution()];
map = new OpenLayers.Map('map', {controls: controls});

For a full working version have a look at the source of the documentDrag example.

Categories: en, OpenLayers, JavaScript, geo

By Volker Mische
