GeoCouch: The future
2009-12-20 22:35
GeoCouch started as a proof of concept and was heavily rewritten for the 0.10 release. As more and more people became interested, I received feedback that showed me what people really want and need. Now it's time to determine the future of GeoCouch, and it's your chance to shape it. In this blog entry I'll explain my ideas for the future, but I'm more than happy to get further ideas or complaints from you. So please check whether my ideas match your use cases for GeoCouch.
Stripping it down
GeoCouch needs an external spatial index. At the moment I use SpatiaLite for it, but a PostGIS backend would easily be possible. My initial idea was that it is better to use the existing power of spatial databases rather than reinventing the wheel. I thought I could use all the power they have, even for complex analytics, but I can't. As I only store the geometries, I need to “ask” CouchDB for the attributes (no, I don't want to store attributes in my spatial index).
If I don't use the full power of the spatial databases, but only a small fraction, there might be a better solution. Therefore I propose that GeoCouch will use a simple spatial index for storing the geometries, not a full-blown spatial database. I haven't decided yet which one it'll be, but I'm seriously thinking about moving this part to Erlang (I know that quite a few people would love that move).
You will lose functionality like reprojection. The spatial index won't know anything about projections. So GeoCouch won't be projection-aware anymore, but your application still can be. For example, if you want to return your data in a different projection than the one it was stored in, you do the transformation after you've queried GeoCouch.
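Reprojecting in the application could look roughly like the following Python sketch. It assumes the geometries are stored as WGS84 longitude/latitude and that the application wants spherical Web Mercator coordinates; the function name, the document IDs and the shape of the query result are made up for illustration and are not part of GeoCouch.

```python
import math

# Radius used by spherical Web Mercator (the WGS84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0

def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert lon/lat in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Hypothetical query result: (document ID, lon, lat) as stored in the index.
query_result = [("doc-a", 10.8978, 48.3705), ("doc-b", 0.0, 0.0)]

# The transformation happens after the query, entirely in the application.
reprojected = [
    (doc_id,) + wgs84_to_web_mercator(lon, lat)
    for doc_id, lon, lat in query_result
]
```

The index itself never sees the target projection; it only has to agree with the application on the coordinates the data was stored in.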
You would also lose fancy things for geometries, like boolean operations on them. But this is something I'd call complex analytics, not simple querying.
GeoCouch would only support three simple queries: bounding box search, polygon search and radius/distance search. If the search is within a union of polygons, let's say all countries of the European Union, you would simply perform the union operation before you query GeoCouch.
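To make the three query types concrete, here is a minimal Python sketch over a handful of points. It is a plain linear scan, not a spatial index, and it says nothing about how GeoCouch would implement or expose these queries; all names and the sample data are assumptions for illustration. The polygon test uses ray casting and the radius search uses the haversine formula on a spherical Earth.

```python
import math

# Hypothetical sample data: document ID -> (lon, lat) in degrees.
POINTS = {
    "berlin": (13.405, 52.52),
    "munich": (11.582, 48.135),
    "paris": (2.352, 48.857),
}

def bbox_search(points, west, south, east, north):
    """Return the IDs of all points inside the bounding box."""
    return sorted(
        doc_id for doc_id, (lon, lat) in points.items()
        if west <= lon <= east and south <= lat <= north
    )

def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how often a ray going east crosses the polygon edges."""
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def polygon_search(points, polygon):
    """Return the IDs of all points inside the polygon (list of (lon, lat))."""
    return sorted(
        doc_id for doc_id, (lon, lat) in points.items()
        if point_in_polygon(lon, lat, polygon)
    )

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def radius_search(points, lon, lat, radius_km):
    """Return the IDs of all points within radius_km of (lon, lat)."""
    return sorted(
        doc_id for doc_id, (plon, plat) in points.items()
        if haversine_km(lon, lat, plon, plat) <= radius_km
    )
```

All three functions return only document IDs, which matches the idea that the attributes stay in CouchDB.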
Complex analytics
What I call “complex analytics” are things like: “return all apple trees that are located within a 10 km range of buildings that are over 100 m high, but only in countries with a population of over 50 million people”. This is not possible with GeoCouch alone, as you would need the attribute values as well. Those are stored in CouchDB, so you would need to request them. GeoCouch only supports a simple: give me all IDs within a bounding box/polygon/radius.
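The split between the two roles can be sketched in Python as a two-step lookup: the spatial part returns nothing but IDs, and the attribute filter runs afterwards against the documents. In this sketch a plain dictionary stands in for CouchDB and a linear scan stands in for the spatial index; the document layout, field names and function names are assumptions, not GeoCouch's actual interface.

```python
# Stand-in for CouchDB: document ID -> document with its attributes.
DOCS = {
    "tree-1": {"species": "apple", "loc": (10.90, 48.37)},
    "tree-2": {"species": "pear", "loc": (10.91, 48.37)},
    "tree-3": {"species": "apple", "loc": (13.40, 52.52)},
}

def spatial_ids_in_bbox(docs, west, south, east, north):
    """Stand-in for the spatial index: looks only at locations, returns only IDs."""
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if west <= doc["loc"][0] <= east and south <= doc["loc"][1] <= north
    )

def apple_trees_in_bbox(docs, bbox):
    """Step 1: ask the index for IDs. Step 2: filter on attributes from the documents."""
    candidate_ids = spatial_ids_in_bbox(docs, *bbox)
    return [doc_id for doc_id in candidate_ids if docs[doc_id]["species"] == "apple"]

apple_ids = apple_trees_in_bbox(DOCS, (10.0, 48.0, 11.0, 49.0))
```

With a real CouchDB, the second step might be a bulk fetch of the candidate documents, for example via `_all_docs` with `keys` and `include_docs=true`, followed by the same kind of filter.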
Conclusion
Simple requests are needed for everyday use, thus they should be incredibly fast. Complex analytics don't necessarily need to handle thousands of requests per second; in most cases they don't even need to be processed in real time. I'd like to see some layer built on top of GeoCouch, so that CouchDB can even be used for analytics (which is something I wanted right from the start).
This means that GeoCouch will mainly be for high-performance, massive projects that need some simple spatial bits, which is what I think the majority of users need.
Whether you think you really need only those simple queries, but want them to be fast, or you think this is wrong and that you need dynamic reprojection, I invite you to leave a comment below or drop a mail to volker.mische@gmail.com. Thanks.
Comments
2009-12-20 19:00:01
I think you are on the right track. I have done something similar (used SQL Server 2008 for spatial indexing) and share your views.
Good luck with the Erlang Indexer!
2009-12-20 19:27:31
I agree with your point of view! I'm waiting for the Erlang spatial indexer!
Good luck.
2009-12-21 10:40:11
Yes, this is the way to go!
And using Erlang will make a tight integration with couchdb possible, nice.
2009-12-21 13:39:33
I showed CouchDB, and potential for GIS extensions, to a bunch of academic GIS people at workshops last month. I'd intended to demonstrate GeoCouch, but stumbled over SpatiaLite and APSW build quirks. IMO, it would be better to avoid them as dependencies.
We're thinking similarly about the indexing and analytic roles. I blogged Friday about separating and layering them in the context of Google's maps data API, but it's also applicable to GeoCouch.
2009-12-21 23:05:02
@Mike, @ReLuc, @Søren: Thanks for your motivating "Go for it!" comments
@Sean: Sadly, a few people had problems with APSW and SpatiaLite. Your blog from Friday really fits in nicely (http://sgillies.net/blog/972/simple-and-reusable-spatial-queries/)
2009-12-26 03:36:48
I am the author of APSW and am dismayed to hear that people have had problems with it. Can anyone explain what the problems are so I can fix them? Is there anything I can do to make it easier?
Note that the recommended build instructions should work and work well. They amount to 'python setup.py fetch --all build --enable-all-extensions install test' which will get the latest SQLite, build, install and test it as one shot.
2009-12-26 10:48:54
Roger, I should have been more specific about my requirements. I was experimenting with GeoCouch in the context of virtualenv and zc.buildout and got hung up on the lack of an sdist. I was faced with a) writing an apsw-specific zc.buildout recipe (to execute a "fetch") or b) making my own sqlite-included sdist and solving the missing MANIFEST.in error that cropped up (setuptools fault?). In that context, the lack of an sdist seemed to be a quirk.
But yes, your recommended installation statement works fine in a virtualenv.
2009-12-26 17:46:56
Thanks for the APSW clarification. The funny thing is that APSW does include a very capable sdist and is how I generate the zip file that I ship. However you can't then use it again from that shipped 'sdist' due to missing MANIFEST.in.
I'll fix this for the next release.
2009-12-28 22:24:30
What do you think about an "hB-pi* tree" instead of an "Rtree"? Or, even better, your particular couch could choose its index type.
2009-12-30 19:15:07
@Mike: I hadn't heard of the "hB-pi* tree" before today. I'll have a look. I also like the idea of different underlying data structures for CouchDB, but there's a long way to go :)