vmx - the blllog.

About me

My name is Volker Mische and I'm an open source enthusiast and hacker. You can reach me via email, on Twitter (@vmx) or IRC (as vmx). Find me also on GitHub.

Australia: Getting home

2009-11-02 22:35

This one goes out to all the people that want some news from me. I’m finally back in Germany, but getting home wasn’t as easy as expected. Now I know why you really should be at the airport 2-3 hours before your departure.

I have to admit I wasn’t too early at the airport, probably 2.5 hours prior to my departure, as everything on my previous flights went smoothly every time. But this time there was something different.

Originally I wanted to stay only for a week, but I decided to extend my stay to two weeks. This was about 7 weeks ago. Everything seemed to be alright, I got my new flight details via email (as I did for other flights as well).

Houston, we have a problem

“Sir, have you changed your booking recently?”

“Hmm, well, no, errr, no, sorry I changed it about 7 weeks ago.”

The conversation went on for a bit, with the conclusion that I neither had an e-ticket number nor was listed on the flight. There was only a booking under my name for last week’s date. Excellent.

So I couldn’t check in and the lady at the counter couldn’t do anything. I should contact my travel agency. Aaaaalright, not a problem, it’s only 5:30 am in Berlin (where my travel agency is). If I couldn’t reach them, I should go to the service point.

Dialing from Australia

I’ve always wondered why international phone numbers start with a “+” when I always need to dial “00” instead of the plus anyway. Why not insert it automatically?

I couldn’t call out from a call box. Whenever I dialed 0049 I got an error message. But luckily the lady at the service point told me that Australia has a different dial-out code (international call prefix). It’s 0011.

So I was finally able to call the agency. Still plenty of time, 70 minutes to departure. And there was actually someone there. After another call and $20 spent, it was clear that it was the fault of another company that does the actual booking and that Qantas can’t access their stock of tickets.

A glimmer of hope

“You are already the third one today where the changes to the booking weren’t done properly”, the guy at the Qantas service point said. “We need to get you on the flight quickly, check-in closes soon”. Still 40 mins to go. So I got a new e-ticket and checked in. That’s it for me, the Qantas people will try to find out what went wrong.

Getting through all the airport stuff seems to take longer when you are in a hurry, but I was right on time for the boarding. And obviously, I made it.

Lessons learned

Get your e-ticket number
Be early at the airport
If someone screws it up, you are likely to make it, though
Different countries have different international call prefixes

No comments

Categories: en, life

FOSS4G 2009: It was great

2009-10-25 22:35

The FOSS4G 2009 (Free and Open Source Software for Geospatial Conference) is over now, and it was great. I've finally met many people that I had previously only chatted with or had discussions with on mailing lists.

Organisation and venue

The Sydney Convention & Exhibition Centre Darling Harbour really is an amazing venue and Arinex did a great job as well. We had good food, the technicians were keeping everything up and running, even the wireless internet didn't break down and performed well.

The Organising Committee did an excellent job (especially Mark), too. I exclude myself a bit, I was more the code monkey before the conference, rather than keeping that conference running smoothly. But because of that I had the chance to visit quite a few presentations.

Presentations

Probably the most popular presentation was Paul Ramsey's keynote speech. It was just incredibly insightful and entertaining (watch it on YouTube).

There were two other excellent presentations as well. First the Mapping interviews with open source technologies by Chris McDowall. He is using a video projector and a Wii remote control in order to map locations people are pointing at during an interview (just watch this video to get a better idea).

And second the Visualising animal movements in ‘near’ real time by Ben Madin. It was about a project where they try to track the movements of cows in Southeast Asia. The idea is to place GSM transmitters in one of the cows' stomachs to track their position. But they are facing problems like "How to get a GSM signal through 40cm of meat". Really interesting.

Geodata and CouchDB

So how did my talk go? I'm very happy with it. I hadn't expected so much positive feedback and so many good conversations about CouchDB and GeoCouch afterwards and during the next days.

After show parties

After the talks it's time to socialise while having a few beers. It was again great, every single night.

One outstanding event was the Ignite Spatial on Wednesday. 10 high paced talks with 20 slides displayed 15 secs each. My favourite one was the Pie charts are evil talk by Glen Bell. Another result of the night is that I'll always think about short green skirts whenever someone is mentioning Google Wave.

The code sprint

I was code sprinting on OpenLayers. It was well organised and we got some cool new stuff in. Sadly, I haven't reached my goal of fixing Ticket 39, but hopefully soon (or next year in Barcelona). But I discussed the implementation details of the abstraction of the UI in OpenLayers with Roald de Wit and Andreas Hocevar (that idea was discussed in the OpenLayers BOF).

Final words

Yes, it really was great. I hope to see you all again in Barcelona at the FOSS4G 2010.

No comments

Categories: en, geo

Benchmarking is not easy

2009-09-23 22:35

There are so many ways to have a play with CouchDB. This time I thought about using CouchDB as a TileCache storage. Sounds easy, and so it was.

What is a tilecache

Everyone knows Google Maps and its small images, called tiles. Rendering those tiles for the whole world for every zoom level can be quite time consuming, therefore you can render them on demand and cache them once they are rendered. This is the business of a tilecache.

You can use the tilecache as a proxy to a remote tile server as well, that's what I did for this benchmark.
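The render-on-demand idea can be sketched in a few lines of Python. This is only an illustration and not TileCache's actual API: `TileStore`, `fetch_tile`, `fake_remote` and the `(zoom, x, y)` key are names made up for the example, and a plain dict stands in for the storage backend.

```python
class TileStore:
    """Hypothetical in-memory tile cache (a dict stands in for the backend)."""

    def __init__(self, fetch_tile):
        # fetch_tile is any callable that renders a tile or fetches it
        # from a remote tile server.
        self.fetch_tile = fetch_tile
        self.tiles = {}
        self.hits = 0
        self.misses = 0

    def get(self, zoom, x, y):
        key = (zoom, x, y)
        if key in self.tiles:
            self.hits += 1
        else:
            # Cache miss: do the expensive work once and remember the result.
            self.misses += 1
            self.tiles[key] = self.fetch_tile(zoom, x, y)
        return self.tiles[key]


def fake_remote(zoom, x, y):
    # Stand-in for the remote tile server used in the benchmark.
    return "tile-%d-%d-%d" % (zoom, x, y)


store = TileStore(fake_remote)
store.get(3, 1, 2)  # miss, fetched from the "remote" server
store.get(3, 1, 2)  # hit, served from the cache
```

Once every tile has been requested at least once, the remote server is no longer needed, which is the situation the benchmark measures.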

Coding

The implementation looks quite similar to the memcache one. I haven't implemented locking as I was just after something working, not a full-fledged backend.

When I finished coding, it was time to find out how it performs. That should be easy, as there's a tilecache_seeding script bundled with TileCache to fill the cache. So you fill the cache, then you switch the remote server off and test how long it takes if all requests are hits without any misses (i.e. all tiles are in your cache and don't need to be requested from a remote server).

The two contestants for the benchmark are the CouchDB backend and the one that stores the tiles directly on the filesystem.

Everyone loves numbers

We keep it simple and measure the time for seeding with time. How long will it take to request 780 tiles? The first number is the average (in seconds), the one in brackets the standard deviation.

Filesystem:

real 0.35 (0.04)
user 0.16 (0.02)
sys  0.05 (0.01)

CouchDB:

real 3.03 (0.18)
user 0.96 (0.05)
sys  0.21 (0.03)
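Averages and standard deviations like the ones above could be gathered with a small helper along these lines. It is a sketch under assumptions: it only measures wall-clock ("real") time, it uses the sample standard deviation (the post does not say which variant was used), and `time_command` and `summarize` are names invented for this example.

```python
import statistics
import subprocess
import time


def time_command(command, runs=5):
    """Run a command several times and return the wall-clock times in seconds."""
    samples = []
    for _ in range(runs):
        started = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - started)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation), rounded to two decimals."""
    return round(statistics.mean(samples), 2), round(statistics.stdev(samples), 2)


# Made-up sample times, only to show the output format used in this post.
print("real %.2f (%.2f)" % summarize([0.31, 0.35, 0.39]))
```

To time a real run, pass the seeding command as a list of arguments to `time_command` and feed the result to `summarize`.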

Let's say CouchDB is 10 times slower than the file system based cache. Wow, CouchDB really sucks! Why would you use it as tile storage? Although you could:

easily store metadata with every tile, like a date when it should expire.
keep a history of tiles and show them as "travel through time layers" in your mapping application
easily replicate the tiles to other servers

You just don't want such a slow hog. And those CouchDB people try to tell me that CouchDB would be fast. Pha!

Really??

You might already wonder, where the details are, the software version numbers, the specification of the system and all that stuff? These things are missing with a good reason. This benchmark just isn't right, even if I would add these details. The problem lies some layers deeper.

This benchmark is way too far away from real-life usage. You would request many more tiles and not the same 780 ones with every run. When I was benchmarking the filesystem cache, all tiles were already in the system's cache, therefore it was that fast.

Simple solution: clear the system cache and run the tests again. Here are the results after running echo 3 > /proc/sys/vm/drop_caches


Filesystem:

real 8.36 (0.71)
user 0.29 (0.04)
sys  0.18 (0.03)

CouchDB:

real 6.64 (0.15)
user 1.13 (0.07)
sys  0.29 (0.06)


Wow, the CouchDB cache is faster than the filesystem cache. Too nice to be true. The reason is easy: loading the CouchDB database file, thus one file access on the disk, is way faster than 780 accesses.


Does it really matter?

Let's take the first benchmark and assume CouchDB really is that much slower. Isn't it perhaps fast enough anyway? Even with those numbers (ten times slower than the filesystem cache) your cache could serve about 250 requests per second (780 tiles in 3.03 seconds). If a user requests 9 tiles per second, that would be about 25 users at the same time. With every user staying 2 minutes on the map, that would mean 18 000 users per day. Not bad.
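The back-of-the-envelope numbers can be checked with a few lines of Python. The function names are made up for this sketch; the inputs are the measured 780 tiles in 3.03 seconds and the assumptions from the paragraph above. Note that the post rounds down to 250 requests per second and 25 users, so the exact figures come out slightly higher.

```python
def requests_per_second(tiles, seconds):
    return tiles / seconds


def users_per_day(concurrent_users, minutes_per_visit):
    # Each "slot" of a concurrent user is reused once per visit length.
    visits_per_slot = 24 * 60 / minutes_per_visit
    return int(concurrent_users * visits_per_slot)


rate = requests_per_second(780, 3.03)  # measured CouchDB seeding run
concurrent = rate / 9                  # 9 tiles per second per user
print(round(rate), int(concurrent))    # unrounded rate and user count
print(users_per_day(25, 2))            # the rounded 25 users, 2 minutes each
```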

Additionally you gain some nice things you won't have with other
caches (as outlined above). And if you really need more performance you could
always dump the tiles to the filesystem with a cron job.


Conclusion

  Benchmarking is not easy, but easy to get wrong.
  Slow might be fast enough.
  Read more about benchmarking on Jan's blog.

No comments

Categories: en, CouchDB, Python, TileCache, geo



GeoCouch: New release (0.10.0)


2009-09-19 22:35


Notice: This blog post is outdated, please move on :)


It has been way too long since the initial release, but it’s finally here: a new release of GeoCouch. For all first-time visitors, GeoCouch is an extension for CouchDB to support geospatial queries like bounding box or polygon searches.

I’ll keep this blog entry relatively short and only outline the highlights and requirements for the new release, as GeoCouch finally has a real home at http://gitorious.org/geocouch/. Feel free to contribute to the wiki or fork the source.


Highlights

  Many geometries are supported: points, lines, polygons (using Shapely).
  Queries are largely along the lines of the OpenSearch-Geo extension draft. Currently supported are bounding box and polygon searches.
  Adding new backends (in addition to SpatiaLite) is easily possible.


Requirements

  Linux 2.6.26
  CouchDB 0.10.0
  Python 2.6.0
  couchdb-python 0.6.x (0.6.0 doesn't work)
  Shapely 1.0.12
  APSW - Another Python SQLite Wrapper 3.5.9-r2
  SpatiaLite 2.3.1

Other versions might work.

Download

If you don’t like Git, you can download GeoCouch 0.10.0 here.

9 Comments

Categories: en, CouchDB, Python, geo




CouchDB: Returning all design documents with Python


2009-08-21 22:35

I just wanted to get all design documents of a CouchDB database with couchdb-python. I couldn’t find any hints on how to do it, and it took longer to find out than expected. Hence this blog entry; perhaps it saves someone a few minutes of research.


  
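from couchdb.client import Server

couch_server = Server('http://localhost:5984/')
# All design document IDs start with "_design/". "0" is the character right
# after "/" in ASCII, so this key range covers all of them.
for designdoc in couch_server['yourdatabase']\
        .view('_all_docs', startkey='_design', endkey='_design0'):
    print('designdoc: %s' % designdoc)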


Update: even simpler with slicing:

  
from couchdb.client import Server

couch_server = Server('http://localhost:5984/')
# Slicing the view result is shorthand for startkey/endkey.
for designdoc in couch_server['yourdatabase']\
        .view('_all_docs')['_design':'_design0']:
    print('designdoc: %s' % designdoc)
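Why does the key range from "_design" to "_design0" work? A rough way to see it, assuming document IDs are ordered by plain character codes (CouchDB's exact collation rules may differ in details), is to replay the comparison on a few IDs. The IDs and the helper `is_in_range` are made up for this sketch.

```python
def is_in_range(doc_id, startkey="_design", endkey="_design0"):
    """True if doc_id sorts between startkey and endkey (inclusive)."""
    return startkey <= doc_id <= endkey


# "0" directly follows "/" in ASCII, so "_design/<anything>" sorts
# before "_design0".
assert chr(ord("/") + 1) == "0"

doc_ids = ["0815", "_design/geo", "_design/tiles", "_designer", "augsburg"]
design_ids = [doc_id for doc_id in doc_ids if is_in_range(doc_id)]
print(design_ids)
```

Only the two IDs starting with "_design/" survive the filter.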


No comments
Categories:
en, 
CouchDB, 
Python




FOSS4G 2009: I'm speaking


2009-07-21 22:35


  
    
  


I did it! I’ll speak at the FOSS4G 2009 conference (Free and Open Source Software for Geospatial Conference), 20th–23rd October in Sydney, about “CouchDB and Geodata”. More information is available at the official website.

No comments

Categories: en, CouchDB, geo




Poor man’s bounding box queries with CouchDB


2009-07-19 22:35


Several people store geographical points within CouchDB and would like to make a bounding box query on them. This isn’t possible with plain CouchDB _views. But there’s light at the end of the tunnel. One solution will be GeoCouch (which can do a lot more than simple bounding box queries), once there’s a new release. The other one is already there: you can use the list/show API (Warning: the current wiki page (as of 2009-07-19) applies to CouchDB 0.9, I use the new 0.10 API).

You can either add a _list function as described in the documentation or use my futon-list branch which includes an interface for easier _list function creation/editing.


Your data

The _list function needs to match your data, thus I expect documents with
a field named location which contains an array with the
coordinates. Here’s a simple example document:


  

{
   "_id": "00001aef7b72e90b991975ef2a7e1fa7",
   "_rev": "1-4063357886",
   "name": "Augsburg",
   "location": [
       10.898333,
       48.371667
   ],
   "some extra data": "Zirbelnuss"
}




The _list function

We aim at creating a _list function that returns the same response as a normal _view would return, but filtered with a bounding box. Let’s start with a _list function which returns the same results as a plain _view (no bounding box filtering, yet). Only the whitespace of the output differs slightly.


  
function(head, req) {
    var row, sep = '\n';

    // Send the same Content-Type as CouchDB would
    if (req.headers.Accept.indexOf('application/json')!=-1)
      start({"headers":{"Content-Type" : "application/json"}});
    else
      start({"headers":{"Content-Type" : "text/plain"}});

    send('{"total_rows":' + head.total_rows +
         ',"offset":'+head.offset+',"rows":[');
    while (row = getRow()) {
        send(sep + toJSON(row));
        sep = ',\n';
    }
    return "\n]}";
};



The _list API allows you to add arbitrary query parameters to the URL. In our case that will be bbox=west,south,east,north (adapted from the OpenSearch Geo Extension). Parsing the bounding box is really easy. The query parameters of the request are stored in the property req.query as key/value pairs. Get the bounding box, split it into separate values and compare them with the location of every row.


  
var row, location, bbox = req.query.bbox.split(',');
while (row = getRow()) {
    location = row.value.location;
    if (location[0]>bbox[0] && location[0]<bbox[2] &&
            location[1]>bbox[1] && location[1]<bbox[3]) {
        send(sep + toJSON(row));
        sep = ',\n';
    }
}

And finally we make sure that no error message is thrown when the
bbox query parameter is omitted. Here’s the final result:


  
function(head, req) {
    var row, bbox, location, sep = '\n';

    // Send the same Content-Type as CouchDB would
    if (req.headers.Accept.indexOf('application/json')!=-1)
      start({"headers":{"Content-Type" : "application/json"}});
    else
      start({"headers":{"Content-Type" : "text/plain"}});

    if (req.query.bbox)
        bbox = req.query.bbox.split(',');

    send('{"total_rows":' + head.total_rows +
         ',"offset":'+head.offset+',"rows":[');
    while (row = getRow()) {
        location = row.value.location;
        if (!bbox || (location[0]>bbox[0] && location[0]<bbox[2] &&
                      location[1]>bbox[1] && location[1]<bbox[3])) {
            send(sep + toJSON(row));
            sep = ',\n';
        }
    }
    return "\n]}";
};
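To check the filtering logic outside of CouchDB, the same comparison can be replayed in Python. This is a sketch, not part of the _list API: the helper names are invented, the rows only mimic the shape used above (`row.value.location` as `[longitude, latitude]`), and the second sample row is made up.

```python
def parse_bbox(param):
    """Turn "west,south,east,north" into a tuple of four floats."""
    west, south, east, north = (float(value) for value in param.split(","))
    return west, south, east, north


def in_bbox(location, bbox):
    """Strict comparison as in the _list function: edge points are excluded."""
    lon, lat = location
    west, south, east, north = bbox
    return west < lon < east and south < lat < north


def filter_rows(rows, bbox_param=None):
    """Yield all rows if no bbox is given, otherwise only those inside it."""
    bbox = parse_bbox(bbox_param) if bbox_param else None
    for row in rows:
        if bbox is None or in_bbox(row["value"]["location"], bbox):
            yield row


rows = [
    {"id": "a", "value": {"name": "Augsburg", "location": [10.898333, 48.371667]}},
    {"id": "s", "value": {"name": "Sydney", "location": [151.2, -33.87]}},
]
inside = [row["value"]["name"] for row in filter_rows(rows, "10,0,120,90")]
print(inside)
```

With the bounding box 10,0,120,90 only the Augsburg row passes; without a bbox parameter every row is returned, just like in the final _list function.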

An example of how to access your _list function would be:
http://localhost:5984/geodata/_design/designdoc/_list/bbox/viewname?bbox=10,0,120,90&limit=10000
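If you build such URLs from a script, something like the following might do. It is a minimal sketch that uses only Python's standard library; `list_url` is a made-up helper, and the database, design document, list and view names are the placeholders from the example URL.

```python
from urllib.parse import urlencode


def list_url(base, db, designdoc, listname, viewname, **query):
    path = "/".join([base.rstrip("/"), db, "_design", designdoc,
                     "_list", listname, viewname])
    # safe="," keeps the commas of the bbox value readable in the URL.
    return path + "?" + urlencode(query, safe=",")


url = list_url("http://localhost:5984", "geodata", "designdoc", "bbox",
               "viewname", bbox="10,0,120,90", limit=10000)
print(url)
```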

Now you should be able to filter any of your point clouds with a bounding box. The performance should be alright for a reasonable number of points. A usual use case would be something like displaying a few points on a map, where you don’t want to see zillions of them anyway.

Stay tuned for a follow-up posting about displaying points with
OpenLayers.

4 Comments

Categories: en, CouchDB, JavaScript, geo




List function editing in Futon


2009-07-19 22:35


Futon is the graphical administration interface for CouchDB. It’s nice and slick for browsing and editing views, but there is one feature missing: you can’t edit _list functions in a similar fashion. You need to edit them as JSON strings.

As I wanted to play a bit with _list, I’ve created a branch which implements
such an interface. Its usage should be quite self-explanatory. Just select a
_view, from there you can switch to the "List" tab to create or edit a _list
function.

You can get my futon-list branch at GitHub. Instead of using git, you can also download the share/www directory (click on the download button within the ‘share/www’ directory) and unpack it over your current source.

In case you wonder why your _list function doesn’t work: the API has changed for CouchDB 0.10.



  
  Screenshot of the _list interface in Futon

No comments

Categories: en, CouchDB, JavaScript




Paul van Dyk at Street Parade 2009


2009-05-21 22:35

Every now and then I check whether Paul van Dyk is playing somewhere near me again. And what did I discover this time, to my astonishment? He will be at the Street Parade 2009 this year, both with a truck and on the main stage. Even if it is only half as good as in 2007, you definitely have to go.

No comments

Categories: de, Musik, Festival




CouchDB _mix branch: Intersection of _view and _external


2009-04-21 22:35

In CouchDB it’s possible to query an
external service
(I’ll call it _external from now on) which returns an HTTP response directly to
the client that made the request. Although this is already quite nice, it
wasn’t possible to combine such _external requests with a classical
_view.


The need for an intersection of _view and _external

Sometimes you’d like to exclude documents in a more dynamic fashion than a CouchDB _view supports. Examples would be geospatial queries, a simple search like “exclude all documents that don’t contain a certain string in the title” or even fulltext searching. Therefore I’ve created a new handler called “_mix”.


The problem

As _external has already existed for quite a long time, it was clear that I would reuse the available functionality. The basic idea is simple: take all documents from a _view and all from _external, intersect them and finally output the result.

The problem is that CouchDB can be used for huge data sets, where you don’t want to keep a complete _view in memory to perform an intersection. The goals were:


  The output needs to be streamable
  Don’t keep all documents in memory
  Use the existing functionality


The implementation
Over the past few months I had lengthy discussions with
Paul Davis to find a suitable solution for the problem. We were
going through all our ideas over and over again. The way I’ve implemented it
now works for me so far, but it is definitely not the ultimate one and only
solution, it’s just some solution.

As most of the functionality already exists, the current API of _view and
_external is used. The difference is that it is POSTed as JSON to the mix handler instead of a GET request. Here’s an example with
curl:

curl -d '{"design": "designdoc", "view": {"name": "viewname", "query": {"limit": "11"}}, "external": {"name": "minimal", "query": {"bbox": "[23,42,46,89]"}, "include_docs": false}}' http://localhost:5984/yourdb/_mix
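The same request could be prepared from Python with the standard library, roughly as below. This is a sketch: the payload simply mirrors the curl example, and the `_mix` handler only exists in a CouchDB built from the _mix branch, so the request is constructed here but not sent.

```python
import json
from urllib.request import Request

payload = {
    "design": "designdoc",
    "view": {"name": "viewname", "query": {"limit": "11"}},
    "external": {
        "name": "minimal",
        "query": {"bbox": "[23,42,46,89]"},
        "include_docs": False,
    },
}

request = Request(
    "http://localhost:5984/yourdb/_mix",
    data=json.dumps(payload).encode("utf-8"),
    method="POST",
)
# To actually send it (needs a CouchDB built from the _mix branch):
# from urllib.request import urlopen
# print(urlopen(request).read())
```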


At the moment most of the code is just copied and pasted from couch_httpd_view.erl and couch_httpd_external_* with some additional parsing of the POSTed JSON. The only new thing is that there’s an _external request before every document of a _view is output. This request contains either the document ID or the whole document (if “include_docs” is set to “true”) and needs to return “true” if the document should be output (or “false” if not).
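Conceptually, the per-document check behaves like a streaming filter. The following Python generator is only an illustration of that idea (the real handler is written in Erlang); `mix`, `external_check`, the row layout and the sample data are all assumptions made for this sketch.

```python
def mix(view_rows, external_check, include_docs=False):
    """Yield only the view rows that the external check accepts.

    view_rows can be any iterable (including a generator), so rows are
    handled one at a time and never collected in memory.
    """
    for row in view_rows:
        # Ask the external service about the ID, or about the whole document.
        subject = row["doc"] if include_docs else row["id"]
        if external_check(subject):
            yield row


def rows():
    # Stand-in for a streamed _view result.
    for number in range(1, 7):
        yield {"id": "doc%d" % number, "key": number}


def even_ids_only(doc_id):
    # Stand-in for the true/false answer of an _external script.
    return int(doc_id[3:]) % 2 == 0


kept = [row["id"] for row in mix(rows(), even_ids_only)]
print(kept)
```

Because both the rows and the filter are consumed lazily, the output stays streamable and no complete _view needs to be held in memory, which matches the goals listed earlier.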

I’ve included a sample _external script which excludes documents randomly (it can be found at src/contrib/minimal_external.py). To have a play
with it, you just need to enable _external and add that script. How to do that
can be found in the
CouchDB Wiki.


Get it

All you need to do to have some fun with it is check out my _mix branch at GitHub.


Final words

And finally I’d like to thank Paul Davis for taking the time to discuss the issues with the intersection of _view and _external. Another “thank you” goes out to Adam Groves, who discovered a lot of annoyances with the parsing of the queries.

No comments

Categories: en, CouchDB







By Volker Mische
Powered by Kukkaisvoima version 7
