<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Kukkaisvoima version 7" -->
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<channel>
<title>vmx: CouchDB</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi</link>
<description>Blog of Volker Mische</description>
<pubDate>Sun, 20 Dec 2009 16:37:21 +0200</pubDate>
<lastBuildDate>Sun, 20 Dec 2009 16:37:21 +0200</lastBuildDate>
<generator>http://23.fi/kukkaisvoima/</generator>
<language>en</language>
<item>
<title>GeoCouch: The future
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-the-future%3A2009-12-20%3Aen%2CCouchDB%2CPython%2Cgeo</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-the-future%3A2009-12-20%3Aen%2CCouchDB%2CPython%2Cgeo#comments</comments>
<pubDate>Sun, 20 Dec 2009 16:37:21 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>Python</category>
<category>geo</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-the-future%3A2009-12-20%3Aen%2CCouchDB%2CPython%2Cgeo/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p><a href="http://gitorious.org/geocouch/">GeoCouch</a> started as a <a href="/cgi-bin/blog/index.cgi/geocouch-geospatial-queries-with-couchdb:2008-10-26:en,CouchDB,Python,geo">proof of concept</a> and was heavily rewritten for the <a href="/cgi-bin/blog/index.cgi/geocouch-new-release-0.10.0:2009-09-19:en,CouchDB,Python,geo">0.10 release</a>. As more and more people got interested, I received feedback about what people really want and need. Now it's time to determine the future of GeoCouch, and it's your chance to shape it. In this blog entry I'll explain my ideas for the future, but I'm more than happy to get further ideas or complaints from you. So please check whether my ideas match your use-cases for GeoCouch.
</p>
<h3>Stripping it down</h3>
<p>GeoCouch needs an external spatial index. At the moment I use <a href="http://www.gaia-gis.it/spatialite/">SpatiaLite</a> for it, but a <a href="http://postgis.refractions.net/">PostGIS</a> backend would be easy to add. My initial idea was that it is better to use the existing power of spatial databases rather than reinventing the wheel. I thought I could use all of their power, even for complex analytics, but I can't. As I only store the geometries, I need to “ask” CouchDB for the attributes (and no, I don't want to store attributes in my spatial index).
<!--This would be possible, but I'll explain the “analytics use-case” later.-->
</p>
<p>If I don't use the full power of a spatial database, but only a small fraction of it, there might be a better solution. Therefore I propose that GeoCouch will use a simple spatial index for storing the geometries, not a full-blown spatial database. I haven't decided yet which one it'll be, but I'm seriously considering moving this part to Erlang (I know that quite a few people would love that move).
</p>
<p>You will lose functionality like reprojection. The spatial index won't know anything about projections, so GeoCouch won't be projection-aware anymore, but your application still can be. For example, if you want to return your data in a different projection than the one it was stored in, you do the transformation after you've queried GeoCouch.
</p>
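<p>As an illustration of reprojecting in the application, here is a minimal Python sketch (not part of GeoCouch) that converts WGS84 longitude/latitude results to spherical “Web” Mercator (EPSG:3857) using only the standard library. The target projection and the sample coordinates are assumptions for the example; a real application would more likely use a projection library such as PROJ.</p>

```python
import math

# Sphere radius used by spherical ("Web") Mercator, EPSG:3857.
EARTH_RADIUS = 6378137.0

def to_web_mercator(lon, lat):
    """Project a WGS84 longitude/latitude pair (degrees) to Web Mercator metres.

    This runs in the application *after* the spatial query has returned,
    so the spatial index itself never needs to know about projections.
    """
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# Hypothetical query result: points stored as [lon, lat] in WGS84.
results = [[10.898333, 48.371667], [0.0, 0.0]]
projected = [to_web_mercator(lon, lat) for lon, lat in results]
print(projected)
```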
<p>You would also lose fancy operations on geometries, like boolean operations (union, intersection, difference). But this is something I'd call complex analytics, not simple querying.
</p>
<p>GeoCouch would only support three simple queries: bounding box search, polygon search and radius/distance search. If you want to search within a union of polygons, let's say all countries of the European Union, you would simply perform the union operation before you query GeoCouch.
</p>
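<p>To make the three query types concrete, here is a minimal Python sketch of the predicates involved, assuming points are stored as <code>(lon, lat)</code> pairs. The function names are hypothetical and this is not GeoCouch’s API; a real spatial index would avoid testing every single point. It also shows why a union of polygons can be handled up front: for a point, being inside the union is the same as being inside any of the polygons.</p>

```python
import math

def in_bbox(point, bbox):
    """Bounding box search; bbox is (west, south, east, north)."""
    lon, lat = point
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

def in_polygon(point, polygon):
    """Polygon search via ray casting; polygon is a list of (lon, lat) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def within_radius(point, center, radius_km):
    """Radius search using the haversine great-circle distance (spherical Earth)."""
    lon1, lat1 = map(math.radians, point)
    lon2, lat2 = map(math.radians, center)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    distance_km = 2 * 6371.0 * math.asin(math.sqrt(a))
    return distance_km <= radius_km

def in_union(point, polygons):
    """Inside a union of polygons == inside at least one of them."""
    return any(in_polygon(point, polygon) for polygon in polygons)

augsburg = (10.898333, 48.371667)
print(in_bbox(augsburg, (10, 0, 120, 90)))  # True
```

<p>The bounding box test ignores boxes that cross the antimeridian, and the radius test assumes a sphere with a radius of 6371 km.</p>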

<h3>Complex analytics</h3>
<p>What I call “complex analytics” are queries like “return all apple trees that are located within a 10km range of buildings that are over 100m high, but only in countries with a population over 50 million people”. Such a query is not possible with GeoCouch, as you would need the attribute values as well. Those are stored in CouchDB, so you would need to request them. All GeoCouch supports is a simple “give me all IDs within a bounding box/polygon/radius”.
</p>
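<p>A minimal Python sketch of such a two-step query, as one possible shape for a layer above GeoCouch: first take the IDs from a simple spatial query, then fetch the documents and filter on their attributes in the application. All names here are hypothetical; <code>fetch_docs</code> stands in for a CouchDB lookup, for example a POST of the IDs as <code>{"keys": [...]}</code> to <code>_all_docs?include_docs=true</code>.</p>

```python
from typing import Callable, Dict, Iterable, List

def analytics_query(
    spatial_ids: Iterable[str],
    fetch_docs: Callable[[List[str]], Dict[str, dict]],
    predicate: Callable[[dict], bool],
) -> List[str]:
    """Combine a simple spatial query with attribute filtering.

    spatial_ids: IDs returned by a (hypothetical) bbox/polygon/radius query.
    fetch_docs:  looks up documents by ID (the attributes live in CouchDB).
    predicate:   the attribute condition, evaluated in the application layer.
    """
    ids = list(spatial_ids)
    docs = fetch_docs(ids)
    return [doc_id for doc_id in ids if doc_id in docs and predicate(docs[doc_id])]

# Usage with an in-memory dict standing in for CouchDB.
fake_couch = {"tower": {"height": 120}, "house": {"height": 8}}

def fetch(ids):
    return {i: fake_couch[i] for i in ids if i in fake_couch}

tall = analytics_query(["tower", "house", "unknown"], fetch, lambda d: d["height"] > 100)
print(tall)  # ['tower']
```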

<h3>Conclusion</h3>
<p>Simple requests are needed for everyday use, thus they should be incredibly fast. Complex analytics don't necessarily need to handle thousands of requests per second; in most cases they don't even need to be processed in real-time. I'd like to see some layer built on top of GeoCouch, so CouchDB can even be used for analytics (which is something I wanted to have right from the start).
</p>
<p>This means that GeoCouch will mainly target high-performance and large-scale projects that need some simple spatial bits, which I think is what the majority of users need.
</p>
<p>Whether you think you really only need those simple queries but want them to be fast, or you think this is wrong because you need dynamic reprojection, I can only invite you to leave a comment below or drop a mail to <a href="mailto:volker.mische@gmail.com">volker.mische@gmail.com</a>. Thanks.
</p>
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-the-future%3A2009-12-20%3Aen%2CCouchDB%2CPython%2Cgeo/feed/</wfw:commentRss>
</item>
<item>
<title>FOSS4G 2009: “Geodata and CouchDB” presentation is online
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/foss4g-2009-presentation-is-online%3A2009-11-17%3Aen%2CCouchDB%2CPython%2Cgeo</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/foss4g-2009-presentation-is-online%3A2009-11-17%3Aen%2CCouchDB%2CPython%2Cgeo#comments</comments>
<pubDate>Tue, 17 Nov 2009 11:48:43 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>Python</category>
<category>geo</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/foss4g-2009-presentation-is-online%3A2009-11-17%3Aen%2CCouchDB%2CPython%2Cgeo/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p>As a final wrap-up of <a href="http://2009.foss4g.org/">FOSS4G 2009</a>,
<a href="http://2009.foss4g.org/presentations/#presentation_78">my presentation
on “Geodata and CouchDB”</a> is now available online in several formats. It should
also be of interest to people who are new to CouchDB, as large parts of the
talk are an introduction to CouchDB.
</p>
<ul>
  <li>The raw slides
<a href="/blog/2009-11-17/geodata-and-couchdb.pdf">as PDF</a> (licensed under
<a href="http://creativecommons.org/licenses/by/3.0/de/">CC-BY-3.0-de</a>).</li>
  <li>The slides with comments
<a href="/blog/2009-11-17/geodata-and-couchdb.htm">as HTML</a> (licensed under
<a href="http://creativecommons.org/licenses/by/3.0/de/">CC-BY-3.0-de</a>).</li>
  <li>The <a href="http://www.fosslc.org/drupal/node/595">slides with audio</a>
(<a href="http://blip.tv/file/2795979">or at blip.tv</a>). It’s the
recording of the actual talk at the conference (licensed under
<a href="http://creativecommons.org/licenses/by-sa/3.0/">CC-BY-SA-3.0</a>). Thanks
<a href="http://georaffe.org/">Alex</a> and
<a href="http://www.fosslc.org/">FOSSLC</a> for recording it.</li>
</ul>
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/foss4g-2009-presentation-is-online%3A2009-11-17%3Aen%2CCouchDB%2CPython%2Cgeo/feed/</wfw:commentRss>
</item>
<item>
<title>Benchmarking is not easy
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy%3A2009-09-23%3Aen%2CCouchDB%2CPython%2CTileCache%2Cgeo</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy%3A2009-09-23%3Aen%2CCouchDB%2CPython%2CTileCache%2Cgeo#comments</comments>
<pubDate>Wed, 23 Sep 2009 17:39:06 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>Python</category>
<category>TileCache</category>
<category>geo</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy%3A2009-09-23%3Aen%2CCouchDB%2CPython%2CTileCache%2Cgeo/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p>There are so many ways to have a play with
<a href="http://couchdb.apache.org">CouchDB</a>. This time I thought about using
CouchDB as a <a href="http://tilecache.org/">TileCache</a> storage backend.
Sounds easy, and it was.
</p>

<h3>What is a tilecache?</h3>
<p>Everyone knows <a href="http://maps.google.com/">Google Maps</a> and its
small images, called <em>tiles</em>. Rendering those tiles for the whole world
for every zoom level can be quite time consuming, therefore you can render
them on demand and cache them once they are rendered. This is the business of
a tilecache.
</p>
<p>You can also use a tilecache as a proxy to a remote tile server, which is
what I did for this benchmark.</p>
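<p>The render-on-demand idea boils down to a read-through cache. Here is a minimal Python sketch, with a plain dict standing in for the storage backend (filesystem, memcache or CouchDB) and a callable standing in for the renderer or remote tile server. It is not TileCache’s actual API; the class name and the key layout are hypothetical.</p>

```python
from typing import Callable, Dict, Tuple

TileKey = Tuple[str, int, int, int]  # hypothetical layout: (layer, z, x, y)

class ReadThroughTileCache:
    """Serve tiles from storage; on a miss, fetch (or render) and store them."""

    def __init__(self, fetch_remote: Callable[[TileKey], bytes]):
        self.fetch_remote = fetch_remote
        self.storage: Dict[TileKey, bytes] = {}  # stand-in for disk/memcache/CouchDB
        self.hits = 0
        self.misses = 0

    def get(self, key: TileKey) -> bytes:
        if key in self.storage:
            self.hits += 1
            return self.storage[key]
        self.misses += 1
        tile = self.fetch_remote(key)  # render or proxy on demand...
        self.storage[key] = tile       # ...and keep it for the next request
        return tile

cache = ReadThroughTileCache(lambda key: ("tile %s/%d/%d/%d" % key).encode())
cache.get(("basic", 1, 0, 0))  # miss: fetched from the "remote server"
cache.get(("basic", 1, 0, 0))  # hit: served from storage
print(cache.hits, cache.misses)
```

<p>Seeding then simply means calling <code>get()</code> for every tile in advance, so that later requests are all hits.</p>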

<h3>Coding</h3>
<p><a href="/blog/2009-09-23/Couchdb.py">The implementation</a> looks quite
similar to the
<a href="http://svn.tilecache.org/trunk/tilecache/TileCache/Caches/Memcached.py">memcache
one</a>. I haven't implemented locking as I was just after something working,
not a full-fledged backend.
</p>
<p>When I finished coding, it was time to find out how it performs. That should
be easy, as there's a tilecache_seeding script bundled with TileCache to fill
the cache. So you fill the cache, then you switch the remote server off and
test how long it takes when all requests are cache hits without any misses (i.e. all
tiles are in your cache and don't need to be requested from a remote server).
</p>
<p>The two contestants for the benchmark are the CouchDB backend and the one
that stores the tiles directly on the filesystem.</p>

<h3>Everyone loves numbers</h3>
<p>We keep it simple and measure the time for seeding with
<a href="http://www.gnu.org/software/time/">time</a>. How long will it take to
request 780 tiles? The first number is the average (in seconds), the one in
brackets the standard deviation.
</p>
<ul>
  <li><p>Filesystem:</p>
<pre>
real 0.35 (0.04)
user 0.16 (0.02)
sys  0.05 (0.01)
</pre>
  </li>
  <li><p>CouchDB:</p>
<pre>
real 3.03 (0.18)
user 0.96 (0.05)
sys  0.21 (0.03)
</pre>
  </li>
</ul>
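<p>The post doesn’t say how many runs were made or which kind of standard deviation was used. Here is a minimal Python sketch of collecting such numbers, assuming several runs of the same command and the sample standard deviation; the command is a placeholder for the actual seeding run. Unlike <code>time</code>, it only measures wall-clock (“real”) time, not user and sys time.</p>

```python
import statistics
import subprocess
import sys
import time

def time_runs(cmd, runs=5):
    """Run `cmd` `runs` times; return (mean, sample stdev) of wall-clock seconds."""
    if runs < 2:
        raise ValueError("need at least two runs for a standard deviation")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)

if __name__ == "__main__":
    # Placeholder command; replace it with the actual seeding run.
    mean, stdev = time_runs([sys.executable, "-c", "pass"])
    print(f"real {mean:.2f} ({stdev:.2f})")
```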
<p>Let's say CouchDB is 10 times slower than the filesystem-based cache. Wow,
CouchDB really sucks! Why would you use it as tile storage? Even though you could:
</p>
<ul>
  <li>easily store metadata with every tile, such as an expiry date.</li>
  <li>keep a history of tiles and show them as "travel through time layers"
in your mapping application.</li>
  <li>easily replicate tiles to other servers.</li>
</ul>
<p>You just don't want such a slow hog. And those
<a href="http://wiki.apache.org/couchdb/People_on_the_Couch">CouchDB
people</a> try to tell me that CouchDB would be fast. Pha!</p>

<h3>Really??</h3>
<p>You might already be wondering where the details are: the software version
numbers, the specification of the system and all that stuff. These things are
missing for a good reason. This benchmark just isn't right, even if I
added these details. The problem lies some layers deeper.
</p>
<p>This benchmark is way too far from real-life usage. You would request
many more tiles, and not the same 780 ones with every run. When I was
benchmarking the filesystem cache, all tiles were already in the operating
system's page cache, therefore it was <em>that</em> fast.
</p>
<p>Simple solution: clear the system cache and run the tests again. Here are
the results after running <code>echo 3 &gt; /proc/sys/vm/drop_caches</code> (as root):</p>
<ul>
  <li><p>Filesystem:</p>
<pre>
real 8.36 (0.71)
user 0.29 (0.04)
sys  0.18 (0.03)
</pre>
  </li>
  <li><p>CouchDB:</p>
<pre>
real 6.64 (0.15)
user 1.13 (0.07)
sys  0.29 (0.06)
</pre>
  </li>
</ul>
<p>Wow, the CouchDB cache is faster than the filesystem cache. Too good to be
true? The reason is simple: reading the CouchDB database, a single file on
disk, is way faster than 780 separate file accesses.
</p>

<h3>Does it really matter?</h3>
<p>Let's go back to the first benchmark. Even if CouchDB really were that much
slower, isn't it perhaps <em>fast enough</em>? Even with those numbers (ten times
slower than the filesystem cache), your cache could serve about 250
requests per second. If a user requests 9 tiles per second, that's
about 25 users at the same time. With every user staying 2 minutes on the map,
that would mean 18&#160;000 users per day. Not bad.
</p>
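<p>The back-of-the-envelope estimate above, written out in Python with the numbers from the first benchmark. This is only a sketch; the 9 tiles per second and 2 minutes per visit are the assumptions from the text, and the text rounds the results down to 250 requests per second and 25 users.</p>

```python
TILES = 780     # tiles requested per seeding run
SECONDS = 3.03  # average wall-clock time of the CouchDB run (warm cache)

def requests_per_second(tiles, seconds):
    """Throughput of a single seeding run."""
    return tiles / seconds

def concurrent_users(rps, tiles_per_user_per_second=9):
    """How many users can be served at the same moment."""
    return rps / tiles_per_user_per_second

def users_per_day(concurrent, minutes_per_visit=2):
    """Users per day if every user stays `minutes_per_visit` minutes on the map."""
    slots_per_day = 24 * 60 // minutes_per_visit  # 720 two-minute slots
    return concurrent * slots_per_day

rps = requests_per_second(TILES, SECONDS)
print(f"{rps:.0f} requests/s")                          # 257 requests/s
print(f"{concurrent_users(rps):.1f} concurrent users")  # 28.6 concurrent users
print(users_per_day(25))                                # 18000
```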
<p>Additionally you gain some nice things you won't have with other
caches (as outlined above). And if you really need more performance you could
always dump the tiles to the filesystem with a cron job.
</p>

<h3>Conclusion</h3>
<ol>
  <li>Benchmarking is not easy, but it is easy to get wrong.</li>
  <li>Slow might be fast enough.</li>
  <li>Read more about benchmarking on
<a href="http://jan.prima.de/plok/archives/176-Caveats-of-Evaluating-Databases.html">Jan's
blog</a>.</li>
</ol>
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy%3A2009-09-23%3Aen%2CCouchDB%2CPython%2CTileCache%2Cgeo/feed/</wfw:commentRss>
</item>
<item>
<title>GeoCouch: New release (0.10.0)
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-new-release-0.10.0%3A2009-09-19%3Aen%2CCouchDB%2CPython%2Cgeo</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-new-release-0.10.0%3A2009-09-19%3Aen%2CCouchDB%2CPython%2Cgeo#comments</comments>
<pubDate>Sat, 19 Sep 2009 14:26:45 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>Python</category>
<category>geo</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-new-release-0.10.0%3A2009-09-19%3Aen%2CCouchDB%2CPython%2Cgeo/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p>It has been way too long since the initial release, but it’s finally here:
a new release of GeoCouch. For all first-time visitors, GeoCouch is an
extension for <a href="http://couchdb.apache.org/">CouchDB</a> to support
geospatial queries like bounding box or polygon searches.
</p>
<p>I’ll keep this blog entry relatively short and only outline the highlights and
requirements of the new release, as GeoCouch finally has a real home at
<a href="http://gitorious.org/geocouch/">http://gitorious.org/geocouch/</a>.
Feel free to contribute to the wiki or fork the source.
</p>

<h3>Highlights</h3>
<ul>
  <li>Many geometries
<a href="http://gitorious.org/geocouch/pages/GeometryDefinition">are
supported</a>: points, lines, polygons (using Shapely).</li>
  <li>Queries are largely along the lines of the
<a href="http://www.opensearch.org/Specifications/OpenSearch/Extensions/Geo/1.0/Draft_1">OpenSearch-Geo
extension draft</a>. Currently
<a href="http://gitorious.org/geocouch/pages/Queries">supported</a> are
bounding box and polygon searches.</li>
  <li>Adding new backends (in addition to SpatiaLite) is easily possible.</li>
</ul>

<h3>Requirements</h3>
<ul>
  <li><a href="http://www.kernel.org/">Linux 2.6.26</a></li>
  <li><a href="http://couchdb.apache.org/">CouchDB 0.10.0</a></li>
  <li><a href="http://www.python.org/">Python 2.6.0</a></li>
  <li><a href="http://code.google.com/p/couchdb-python/">couchdb-python 0.6.x (0.6.0 doesn't work)</a></li>
  <li><a href="http://trac.gispython.org/lab/wiki/Shapely">Shapely 1.0.12</a></li>
  <li><a href="http://code.google.com/p/apsw/">APSW - Another Python SQLite Wrapper 3.5.9-r2</a></li>
  <li><a href="http://www.gaia-gis.it/spatialite/">SpatiaLite 2.3.1</a></li>
</ul>
<p>Other versions might work.</p>

<h3>Download</h3>
<p>If you don’t like Git, you can
<a href="/geocouch/downloads/geocouch-0.10.0.tar.bz2">download GeoCouch 0.10.0
here</a>.
</p>
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-new-release-0.10.0%3A2009-09-19%3Aen%2CCouchDB%2CPython%2Cgeo/feed/</wfw:commentRss>
</item>
<item>
<title>CouchDB: Returning all design documents with Python
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-all-design-docs%3A2009-08-21%3Aen%2CCouchDB%2CPython</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-all-design-docs%3A2009-08-21%3Aen%2CCouchDB%2CPython#comments</comments>
<pubDate>Fri, 21 Aug 2009 20:57:16 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>Python</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-all-design-docs%3A2009-08-21%3Aen%2CCouchDB%2CPython/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p>I just wanted to get all design documents of a
<a href="http://couchdb.apache.org/">CouchDB</a> database with
<a href="http://code.google.com/p/couchdb-python/">couchdb-python</a>. I
couldn’t find any hints on how to do it, so it took longer to figure out than
expected. Hence this blog entry; perhaps it saves someone a few minutes of research.
</p>
<p>
  <pre>
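<code>from couchdb.client import Server
couch_server = Server('http://localhost:5984/')
# Design document IDs look like "_design/foo"; in the raw (ASCII) order of
# _all_docs they sort between "_design" and "_design0" ('/' sorts before '0').
for designdoc in couch_server['yourdatabase']\
        .view('_all_docs', startkey='_design', endkey='_design0'):
    print('designdoc: %s' % designdoc)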
</code></pre>
</p>
<p><strong>Update:</strong> even simpler with slicing:</p>
<p>
  <pre>
<code>from couchdb.client import Server
couch_server = Server('http://localhost:5984/')
# Slicing the view result sets startkey and endkey for you.
for designdoc in couch_server['yourdatabase']\
        .view('_all_docs')['_design':'_design0']:
    print('designdoc: %s' % designdoc)
</code></pre>
</p>
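<p>Why the <code>'_design'</code> to <code>'_design0'</code> range works: design document IDs look like <code>_design/name</code>, and <code>/</code> sorts directly before <code>0</code> in ASCII, so every such ID falls between the two keys. This minimal pure-Python sketch (no CouchDB needed; the IDs are made up) mimics an inclusive key range on sorted IDs with <code>bisect</code>, assuming <code>_all_docs</code> orders IDs by plain byte order.</p>

```python
import bisect

def id_range(sorted_ids, startkey, endkey):
    """Return the IDs with startkey <= id <= endkey (an inclusive key range)."""
    lo = bisect.bisect_left(sorted_ids, startkey)
    hi = bisect.bisect_right(sorted_ids, endkey)
    return sorted_ids[lo:hi]

# Made-up document IDs; Python compares str by code point, i.e. ASCII order here.
ids = sorted(["00001aef7b72", "Augsburg", "_design/geo", "_design/app", "apple"])
print(id_range(ids, "_design", "_design0"))  # ['_design/app', '_design/geo']
```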
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-all-design-docs%3A2009-08-21%3Aen%2CCouchDB%2CPython/feed/</wfw:commentRss>
</item>
<item>
<title>FOSS4G 2009: I'm speaking
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/foss4g-2009-im-speaking%3A2009-07-21%3Aen%2CCouchDB%2Cgeo</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/foss4g-2009-im-speaking%3A2009-07-21%3Aen%2CCouchDB%2Cgeo#comments</comments>
<pubDate>Tue, 21 Jul 2009 19:05:34 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>geo</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/foss4g-2009-im-speaking%3A2009-07-21%3Aen%2CCouchDB%2Cgeo/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<div class="figure">
  <a href="http://2009.foss4g.org/presentations/#presentation_78">
    <img src="/blog/2009-07-21/logo_145x90_speaking.png" alt="FOSS4G 2009 - I'm speaking!" width="145" height="90" />
  </a>
</div>

<p>I did it! I'll speak at the <a href="http://2009.foss4g.org/">FOSS4G
Conference 2009</a> (Free and Open Source Software for Geospatial Conference),
20th–23rd October in Sydney about “CouchDB and Geodata”. More information
is available at the
<a href="http://2009.foss4g.org/presentations/#presentation_78">official
website</a>.
</p>
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/foss4g-2009-im-speaking%3A2009-07-21%3Aen%2CCouchDB%2Cgeo/feed/</wfw:commentRss>
</item>
<item>
<title>Poor man’s bounding box queries with CouchDB
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/poor-mans-bounding-box-queries-with-couchdb%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript%2Cgeo</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/poor-mans-bounding-box-queries-with-couchdb%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript%2Cgeo#comments</comments>
<pubDate>Sun, 19 Jul 2009 23:55:29 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>JavaScript</category>
<category>geo</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/poor-mans-bounding-box-queries-with-couchdb%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript%2Cgeo/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p>
<a href="http://mail-archives.apache.org/mod_mbox/couchdb-user/200809.mbox/<gbg36q$n8g$1@ger.gmane.org>">Several</a>
<a href="http://mail-archives.apache.org/mod_mbox/couchdb-user/200903.mbox/<20090304111938.GC16406@banot.net>">people</a>
<a href="http://mail-archives.apache.org/mod_mbox/couchdb-user/200906.mbox/<2C1A5D65-F929-42B8-93CE-A9BB68C8D1DA@mac.com>">store</a>
geographical points within <a href="http://couchdb.apache.org/">CouchDB</a> and would like to make a
<a href="http://en.wikipedia.org/wiki/Minimum_bounding_rectangle">bounding box
query</a> on them. This isn’t possible with plain CouchDB
<a href="http://wiki.apache.org/couchdb/HTTP_view_API">_views</a>. But there’s
light at the end of the tunnel. One solution will be
<a href="geocouch-geospatial-queries-with-couchdb%3A2008-10-26%3Aen%2CCouchDB%2CPython%2Cgeo">GeoCouch</a>
(which can do a lot more than simple bounding box queries) once there’s a new
release; the other one is already available: you can use the
<a href="http://wiki.apache.org/couchdb/Formatting_with_Show_and_List">list/show
API</a> (<strong>Warning</strong>: the current wiki page (as of 2009-07-19) applies to CouchDB 0.9; I use the new 0.10 API).
</p>
<p>You can either add a _list function as described in the
<a href="http://wiki.apache.org/couchdb/Formatting_with_Show_and_List">documentation</a> or use my
<a href="http://vmx.cx/cgi-bin/blog/index.cgi/list-function-editing-in-futon%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript">futon-list
branch</a>, which includes an interface for easier creation and editing of _list functions.
</p>

<h3>Your data</h3>

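<p>The _list function needs to match your data, so I expect documents with
a field named <code>location</code> which contains an array with the
coordinates (longitude first, then latitude). The _view the _list function runs
on has to emit that document (or at least its <code>location</code>) as the row
value, for example with a map function like
<code>function(doc) { if (doc.location) emit(doc._id, doc); }</code>.
Here’s a simple example document: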
</p>
<p>
  <pre>
<code>
{
   "_id": "00001aef7b72e90b991975ef2a7e1fa7",
   "_rev": "1-4063357886",
   "name": "Augsburg",
   "location": [
       10.898333,
       48.371667
   ],
   "some extra data": "Zirbelnuss"
}
</code></pre>
</p>


<h3>The _list function</h3>

<p>We aim at creating a _list function that returns the same response as a
normal _view would, but filtered with a bounding box. Let’s start
with a _list function which returns the same results as a plain _view (no
bounding box filtering yet). Only the whitespace of the output differs slightly.
</p>
<p>
  <pre>
<code>function(head, req) {
    var row, sep = '\n';

    // Send the same Content-Type as CouchDB would
    if (req.headers.Accept.indexOf('application/json')!=-1)
      start({"headers":{"Content-Type" : "application/json"}});
    else
      start({"headers":{"Content-Type" : "text/plain"}});

    send('{"total_rows":' + head.total_rows +
         ',"offset":'+head.offset+',"rows":[');
    while (row = getRow()) {
        send(sep + toJSON(row));
        sep = ',\n';
    }
    return "\n]}";
};
</code></pre>
</p>

<p>The _list API allows you to add arbitrary query parameters to the URL. In
our case that will be <code>bbox=west,south,east,north</code> (adapted from the
<a href="http://www.opensearch.org/Specifications/OpenSearch/Extensions/Geo/1.0/Draft_1">OpenSearch
Geo Extension</a>). Parsing the bounding box is really easy. The query
parameters of the request are stored in the property <code>req.query</code> as
key/value pairs. Get the bounding box, split it into separate values and
compare it with the values of every row.
</p>
<p>
  <pre>
<code>// bbox=west,south,east,north
var row, location, bbox = req.query.bbox.split(',').map(Number);
while (row = getRow()) {
    location = row.value.location;
    if (location[0]&gt;bbox[0] && location[0]&lt;bbox[2] &&
            location[1]&gt;bbox[1] && location[1]&lt;bbox[3]) {
        send(sep + toJSON(row));
        sep = ',\n';
    }
}</code></pre>
</p>
<p>And finally we make sure that no error message is thrown when the
<code>bbox</code> query parameter is omitted. Here’s the final result:
</p>
<p>
  <pre>
<code>function(head, req) {
    var row, bbox, location, sep = '\n';

    // Send the same Content-Type as CouchDB would
    if (req.headers.Accept.indexOf('application/json')!=-1)
      start({"headers":{"Content-Type" : "application/json"}});
    else
      start({"headers":{"Content-Type" : "text/plain"}});

    if (req.query.bbox)
        bbox = req.query.bbox.split(',').map(Number);  // west,south,east,north

    send('{"total_rows":' + head.total_rows +
         ',"offset":'+head.offset+',"rows":[');
    while (row = getRow()) {
        location = row.value.location;
        if (!bbox || (location[0]&gt;bbox[0] && location[0]&lt;bbox[2] &&
                      location[1]&gt;bbox[1] && location[1]&lt;bbox[3])) {
            send(sep + toJSON(row));
            sep = ',\n';
        }
    }
    return "\n]}";
};</code></pre>
</p>
<p>An example of how to access your _list function:
<code>http://localhost:5984/geodata/_design/designdoc/_list/bbox/viewname?bbox=10,0,120,90&limit=10000</code>
</p>
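<p>To check the bounding box logic outside of CouchDB, here is a minimal Python port of the filter (a sketch for testing, not something CouchDB runs; the function names are hypothetical). It parses the <code>west,south,east,north</code> parameter into numbers and, like the JavaScript version, uses strict comparisons, so points exactly on the border are excluded. The latitude is compared with <code>north</code>, the fourth value.</p>

```python
def parse_bbox(value):
    """Parse 'west,south,east,north' (OpenSearch-Geo order) into floats."""
    west, south, east, north = (float(part) for part in value.split(","))
    return west, south, east, north

def filter_rows(rows, bbox_param=None):
    """Keep the rows whose value.location ([lon, lat]) lies inside the bbox.

    Without a bbox parameter every row is returned, like the _list function.
    """
    if not bbox_param:
        return list(rows)
    west, south, east, north = parse_bbox(bbox_param)
    result = []
    for row in rows:
        lon, lat = row["value"]["location"]
        if west < lon < east and south < lat < north:
            result.append(row)
    return result

augsburg = {"id": "00001aef7b72e90b991975ef2a7e1fa7",
            "value": {"location": [10.898333, 48.371667]}}
print(len(filter_rows([augsburg], "10,0,120,90")))  # 1
```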
<p>Now you should be able to filter any of your point clouds with a bounding
box. The performance should be alright for a reasonable number of points. A
usual use-case would something like displaying a few points on a map, where you
don’t want to see zillions of them anyway.
</p>
<p>Stay tuned for a follow-up posting about displaying points with
<a href="http://openlayers.org/">OpenLayers</a>.
</p>
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/poor-mans-bounding-box-queries-with-couchdb%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript%2Cgeo/feed/</wfw:commentRss>
</item>
<item>
<title>List function editing in Futon
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/list-function-editing-in-futon%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/list-function-editing-in-futon%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript#comments</comments>
<pubDate>Sun, 19 Jul 2009 17:54:46 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>JavaScript</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/list-function-editing-in-futon%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p>
<a href="http://wiki.apache.org/couchdb/Getting_started_with_Futon">Futon</a>
is the graphical administration interface for
<a href="http://couchdb.apache.org/">CouchDB</a>. It’s nice and slick for
browsing and editing views, but there is one feature missing: you
<a href="https://issues.apache.org/jira/browse/COUCHDB-417">can’t edit _list
functions in similar fashion</a>. You need to edit them as JSON strings.
</p>
<p>As I wanted to play a bit with _list, I’ve created a branch which implements
such an interface. Its usage should be quite self-explanatory: just select a
_view; from there you can switch to the "List" tab to create or edit a _list
function.
</p>
<p>You can get my
<a href="http://github.com/vmx/couchdb/tree/futon-list">futon-list branch at
GitHub</a>. Instead of using git, you can also download the share/www
directory (click on the download button within the
<a href="http://github.com/vmx/couchdb/tree/futon-list/share/www">‘share/www’
directory</a>) and unpack it over your current source.
</p>
<p>In case you wonder why your _list function doesn’t work, the
<a href="http://mail-archives.apache.org/mod_mbox/couchdb-dev/200906.mbox/<e282921e0906032231s464bd8f3g6f7ad98e585114a2@mail.gmail.com>">API
has changed for CouchDB 0.10</a>.
</p>

<div class="figure">
  <a href="/blog/2009-07-19/list-interface_l.png"><img
      src="/blog/2009-07-19/list-interface.png" width="320" height="240"
      alt="Screenshot of the _list interface in Futon" /></a>
  <p class="caption">Screenshot of the _list interface in Futon</p>
</div>
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/list-function-editing-in-futon%3A2009-07-19%3Aen%2CCouchDB%2CJavaScript/feed/</wfw:commentRss>
</item>
<item>
<title>CouchDB _mix branch: Intersection of _view and _external
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-mix-branch-intersection-of-view-and-external%3A2009-04-21%3Aen%2CCouchDB</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-mix-branch-intersection-of-view-and-external%3A2009-04-21%3Aen%2CCouchDB#comments</comments>
<pubDate>Tue, 21 Apr 2009 01:12:05 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-mix-branch-intersection-of-view-and-external%3A2009-04-21%3Aen%2CCouchDB/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p>In CouchDB it’s possible to query an
<a href="http://wiki.apache.org/couchdb/ExternalProcesses">external service</a>
(I’ll call it _external from now on) which returns an HTTP response directly to
the client that made the request. Although this is already quite nice, it
wasn’t possible to combine such _external requests with a classical
<a href="http://wiki.apache.org/couchdb/HTTP_view_API">_view</a>.
</p>

<h3>The need for an intersection of _view and _external</h3>
<p>Sometimes you’d like to exclude documents in a more dynamic fashion than a
CouchDB _view supports. Examples would be
<a href="/cgi-bin/blog/index.cgi/geocouch-geospatial-queries-with-couchdb%3A2008-10-26%3Aen%2CCouchDB%2CPython%2Cgeo">geospatial queries</a>, a simple search like “exclude all
documents that don’t contain a certain string in the title” or even
fulltext searching. Therefore I’ve created a new handler called “_mix”.
</p>

<h3>The problem</h3>
<p>As _external has already existed for quite a long time, it was clear that I would
reuse the available functionality. The basic idea is simple: take all
documents from a _view and all from _external, intersect them and finally output
the result.</p>
<p>The problem is that CouchDB can be used for huge data sets, where you don’t
want to keep a complete _view in memory to perform an intersection. The goals
were:
</p>
<ul>
  <li>The output needs to be streamable</li>
  <li>Don’t keep all documents in memory</li>
  <li>Use the existing functionality</li>
</ul>

<h3>The implementation</h3>
<p>Over the past few months I had lengthy discussions with
Paul Davis to find a suitable solution for the problem. We went
through all our ideas over and over again. The way I’ve implemented it
now works for me so far, but it is definitely not <em>the ultimate one and only
solution</em>, it’s just <em>some</em> solution.
</p>
<p>As most of the functionality already exists, the current API of _view and
_external is used. The difference is that the parameters are POSTed as JSON to the _mix handler instead of being sent with a GET request. Here’s an example with
<a href="http://curl.haxx.se/">curl</a>:
</p>
<p><code>curl -d '{"design": "designdoc", "view": {"name": "viewname", "query": {"limit": "11"}}, "external": {"name": "minimal", "query": {"bbox": "[23,42,46,89]"}, "include_docs": false}}' http://localhost:5984/yourdb/_mix
</code>
</p>
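<p>For scripting, the same request can be built from Python. This minimal sketch uses only the standard library and mirrors the JSON body of the curl example; the helper name is hypothetical, and the JSON Content-Type header is an assumption (curl’s <code>-d</code> sends a form content type by default).</p>

```python
import json
import urllib.request

def build_mix_request(db_url, design, view_name, view_query,
                      external_name, external_query, include_docs=False):
    """Build (but don't send) a POST request for the experimental _mix handler."""
    body = {
        "design": design,
        "view": {"name": view_name, "query": view_query},
        "external": {"name": external_name, "query": external_query,
                     "include_docs": include_docs},
    }
    return urllib.request.Request(
        db_url.rstrip("/") + "/_mix",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_mix_request("http://localhost:5984/yourdb", "designdoc",
                        "viewname", {"limit": "11"},
                        "minimal", {"bbox": "[23,42,46,89]"})
# urllib.request.urlopen(req) would send it to a CouchDB built from the _mix branch.
```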
<p>At the moment most of the code is just copy and pasted from
<code>couch_httpd_view.erl</code> and <code>couch_httpd_external_*</code> with some additional parsing of the POSTed JSON. The only new thing is that there’s an _external request before every document of a _view is outputted. This requests contains either the document ID or the whole document (if “<code>include_docs</code>” is set to “<code>true</code>”) and needs to return “<code>true</code>” if the document should be outputted (or resp. “<code>false</code>” if not).
</p>
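<p>The streaming idea can be sketched in a few lines of Python with a generator. This is an illustration of the approach, not the Erlang code in the branch, and the names are hypothetical: each _view row is checked by the external filter and yielded immediately, so memory use doesn’t grow with the size of the _view.</p>

```python
from typing import Callable, Iterable, Iterator

def mix(view_rows: Iterable[dict],
        external_accepts: Callable[[dict], bool],
        include_docs: bool = False) -> Iterator[dict]:
    """Stream the intersection of a _view and an _external filter.

    Rows are consumed and yielded one at a time, so the complete _view is
    never held in memory; the external check runs once per row before output.
    """
    for row in view_rows:
        # Hand over either just the ID or the whole document (include_docs).
        payload = row["doc"] if include_docs else {"id": row["id"]}
        if external_accepts(payload):
            yield row

# Usage: a lazy row source and a filter that keeps even IDs.
rows = ({"id": str(i)} for i in range(6))
print([r["id"] for r in mix(rows, lambda p: int(p["id"]) % 2 == 0)])  # ['0', '2', '4']
```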
<p>I’ve included a sample _external script which excludes documents randomly (it can be found at <code>src/contrib/minimal_external.py</code>). To have a play
with it, you just need to enable _external and add that script. How to do that
can be found in the
<a href="http://wiki.apache.org/couchdb/ExternalProcesses">CouchDB Wiki</a>.
</p>
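<p>For orientation, here is a minimal sketch of what a line-based _external script in Python might look like, in the spirit of the random sample script. It assumes the usual _external protocol of one JSON request per line on stdin and one JSON response per line on stdout, and further assumes that the _mix branch reads the decision from the <code>json</code> field of that response; check <code>src/contrib/minimal_external.py</code> for the exact format the branch expects.</p>

```python
import json
import random
import sys

def decide(request):
    """Hypothetical filter: keep roughly half of the documents at random."""
    return random.random() < 0.5

def handle(line):
    """Turn one JSON request line into one JSON response line."""
    request = json.loads(line)
    # Assumption: the boolean is returned as the JSON body of the response.
    return json.dumps({"code": 200, "json": decide(request)})

def main(stdin=sys.stdin, stdout=sys.stdout):
    """Answer requests until CouchDB closes the pipe."""
    for line in stdin:
        if not line.strip():
            continue
        stdout.write(handle(line) + "\n")
        stdout.flush()  # CouchDB waits for each reply, so don't buffer it

# When CouchDB starts this script as an _external process, it would call main().
```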

<h3>Get it</h3>
<p>All you need to do to have some fun with it is to check out my
<a href="http://github.com/vmx/couchdb/tree/mix">_mix branch at github</a>.
</p>

<h3>Final words</h3>
<p>And finally I’d like to thank
<a href="http://www.davispj.com/">Paul Davis</a> for taking the time to discuss the
issues with the intersection of _view and _external. Another “thank you” goes
out to <a href="http://addywaddy.posterous.com/">Adam Groves</a>, who discovered
a lot of annoyances with the parsing of the queries.
</p>
]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/couchdb-mix-branch-intersection-of-view-and-external%3A2009-04-21%3Aen%2CCouchDB/feed/</wfw:commentRss>
</item>
<item>
<title>GeoCouch: Geospatial queries with CouchDB
</title>
<link>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-geospatial-queries-with-couchdb%3A2008-10-26%3Aen%2CCouchDB%2CPython%2Cgeo</link>
<comments>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-geospatial-queries-with-couchdb%3A2008-10-26%3Aen%2CCouchDB%2CPython%2Cgeo#comments</comments>
<pubDate>Sun, 26 Oct 2008 20:59:14 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>Python</category>
<category>geo</category>
<guid isPermaLink="false">http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-geospatial-queries-with-couchdb%3A2008-10-26%3Aen%2CCouchDB%2CPython%2Cgeo/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p><strong>Update</strong> (2009-09-19): There's a new GeoCouch release. More information at <a href="/cgi-bin/blog/index.cgi/geocouch-new-release-0.10.0:2009-09-19:en,CouchDB,Python,geo">GeoCouch: New release 0.10.0</a>.</p>

<p>After almost six months of silence I finally managed to get a prototype done
(thanks <a href="http://jan.prima.de/~jan/">Jan</a> for keeping me motivated).
</p>

<h3>What do you get?</h3>
<p>You get some code to play around with, to get a rough idea of what such a
geospatial extension for <a href="http://couchdb.org/">CouchDB</a> could look
like. The code base isn’t polished yet, but it’s good enough to get it out of
the door. The current version only supports one geometry type
(<code>POINT</code>), and one operation (a bounding box search).
</p>
<p>As CouchDB doesn’t allow an intersection of results gathered from an
external service, the result of the bounding box search is plain text:
document IDs and their coordinates.
</p>

<h3>How does it work?</h3>
<p>GeoCouch consists of two parts, the indexer and the query processor.
Both are connected to CouchDB through stdin/stdout.</p>

<h4>Indexer (geostore)</h4>
<p>In order to make the indexer understand which fields in the document contain
geometries, a special design document is needed. As soon as a database has such
a document, the database is <em>geo-enabled</em> and the indexer will store the
geometries in a spatial index, which is a
<a href="http://www.gaia-gis.it/spatialite/">SpatiaLite</a> database at the
moment.
</p>
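<p>What the spatial index does for a bounding box search can be sketched with Python’s built-in <code>sqlite3</code> module. Note that this is a plain SQLite table filtered with <code>BETWEEN</code>, standing in for the SpatiaLite spatial index GeoCouch actually uses, and it assumes the bbox is ordered as <code>[minx, miny, maxx, maxy]</code>. The function names are hypothetical; the output lines mimic the plain-text format of the bounding box search.</p>

```python
import sqlite3


def build_index(points):
    """Store (doc_id, x, y) tuples in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (id TEXT PRIMARY KEY, x REAL, y REAL)")
    conn.execute("CREATE INDEX points_xy ON points (x, y)")
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", points)
    return conn


def bbox_search(conn, bbox):
    """Return 'id x y' lines for points inside [minx, miny, maxx, maxy]."""
    minx, miny, maxx, maxy = bbox
    rows = conn.execute(
        "SELECT id, x, y FROM points"
        " WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY id",
        (minx, maxx, miny, maxy),
    )
    return ["%s %s %s" % row for row in rows]
```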
<p>Every time a database in CouchDB is altered (create, delete, update), the
indexer gets notified and acts accordingly to keep the spatial index
in sync with CouchDB.
</p>
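<p>The notification side can be sketched in Python as well. This assumes CouchDB’s update notification protocol, where the notified process reads JSON lines such as <code>{"type": "updated", "db": "geodata"}</code> from its stdin. The <code>reindex</code> and <code>drop</code> callbacks and the set of geo-enabled databases are hypothetical placeholders for the real SpatiaLite bookkeeping.</p>

```python
import json
import sys


def parse_notification(line):
    """Return (event type, database name) from one notification line."""
    msg = json.loads(line)
    return msg.get("type"), msg.get("db")


def handle_notification(line, geo_enabled, reindex, drop):
    """Dispatch one notification to the (hypothetical) index callbacks."""
    kind, db = parse_notification(line)
    if db not in geo_enabled:
        return  # only geo-enabled databases have a spatial index
    if kind == "updated":
        reindex(db)  # fetch changed documents, update the spatial index
    elif kind == "deleted":
        drop(db)  # the database is gone, so drop its spatial index


def main(geo_enabled, reindex, drop):
    # Read notifications line by line for as long as CouchDB runs.
    for line in sys.stdin:
        if line.strip():
            handle_notification(line, geo_enabled, reindex, drop)
```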

<h4>Query processor (geoquery)</h4>
<p>Processing queries with an external service is possible thanks to
<a href="http://www.davispj.com/">Paul Joseph Davis’</a> excellent
<a href="http://github.com/davisp/couchdb/tree/external2">
external2 CouchDB branch</a>. With it, queries to CouchDB can be passed along to an
external service.
</p>
<p>At the moment the result is the output of this service, which is plain text in
our case. In the future the external service will only return document IDs,
which will be passed back to the view. The result will be the intersection of
the document IDs of the view and the document IDs the external service returned.
</p>
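<p>The planned intersection boils down to a few lines of Python. This is only a sketch of the intended semantics, assuming the order of the view results should be preserved; <code>intersect_ids</code> is a hypothetical name.</p>

```python
def intersect_ids(view_ids, external_ids):
    """Keep the view's document IDs (in view order) that the external
    service also returned."""
    allowed = set(external_ids)  # set lookup keeps this linear overall
    return [doc_id for doc_id in view_ids if doc_id in allowed]
```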

<h3>How do I use it?</h3>
<p>When everything is installed correctly it’s quite easy to get started.</p>

<h4>Setting things up</h4>
<ul>
  <li>Create a new database named <code>geodata</code> (could be anything).</li>
  <li>Add a document named <code>myhome</code>, in which you store all the information
about your home, including the coordinates. As we are only interested in a bounding
box search, it’s enough to have a location:
      <pre>
<code>{
  "_id": "myhome",
  "_rev": "3358484250",
  "location": [ 151.208333, -33.869444 ]
}</code></pre>
  </li>
  <li>Add as many other documents like this as you like, and make sure all of them have a field
called <code>location</code> with the coordinates as an array. As with the database,
the name of the field could be anything, but it has to be the same in all
documents.
  </li>
  <li>Now we come to the interesting part: the special design document that
<em>geo-enables</em> the database. The document has to be named
“<code>_design/_geocouch</code>”. It also needs some special fields and
looks like this:
    <p>
      <pre>
<code>{
  "_id": "_design/_geocouch",
  "_rev": "610069068",
  "srid": 4326,
  "loc": {
    "type": "POINT",
    "x": "location[0]",
    "y": "location[1]"
  }
}</code></pre>
    </p>
    <p>The coordinate system that should be used is specified by an
<a href="http://en.wikipedia.org/wiki/SRID">SRID</a>. If you don’t know which
value to use for <code>srid</code>, use <code>4326</code>. It’s assumed that
all geometries in your documents belong to the same coordinate system.
    </p>
    <p>The other field tells GeoCouch where to find the geometry in the
documents. The name you choose will be used for the bounding box queries;
I’ve chosen <code>loc</code>. It defines the type (<code>POINT</code>) and
where to find the x/y coordinates (this will probably be changed to lat/lon in
the future).
    </p>
    <p>The way to specify where to find the field is comparable to XPath, but
much simpler. As JSON consists of nested dictionaries and arrays, you can get an
element within an array by its index (e.g. <code>location[0]</code> is the
first element in an array called <code>location</code>). Within a dictionary,
you separate the property names with a dot (e.g. <code>location.x</code> is a property
named <code>x</code> within another one called <code>location</code>). Paths can
of course be nested much deeper; they always start at the root of the
document (e.g. <code>bike.stolen.found[0]</code>).
    </p>
  </li>
</ul>
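<p>To make the path syntax concrete, here is a small Python sketch of a resolver for paths like <code>location[0]</code>, <code>location.x</code> and <code>bike.stolen.found[0]</code>, plus a helper that applies the <code>loc</code> definition from the design document to a document. It is an illustration, not GeoCouch’s own parser; <code>resolve</code> and <code>extract_point</code> are hypothetical names.</p>

```python
import re

# One path step: either a property name or a numeric [index].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(doc, path):
    """Follow a path like 'bike.stolen.found[0]' from the document root."""
    value = doc
    pos = 0
    while pos < len(path):
        if path[pos] == ".":  # dots only separate property names
            pos += 1
            continue
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError("invalid path: %r" % path)
        name, index = match.groups()
        value = value[int(index)] if index is not None else value[name]
        pos = match.end()
    return value


def extract_point(design, doc, field="loc"):
    """Return the (x, y) tuple described by design[field] for a document."""
    spec = design[field]
    if spec["type"] != "POINT":
        raise ValueError("only POINT is handled in this sketch")
    return resolve(doc, spec["x"]), resolve(doc, spec["y"])
```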

<h4>Bounding box search</h4>
<p>And finally you can make a bounding box search. Simply open a URL like
this one in your browser (this bounding box encloses the whole world):
</p>
<p>
  <pre>
<code>http://localhost:5984/geodata/_external/geo?q={"geom":"loc","bbox":[-180,-90,180,90]}
</code></pre>
</p>
<p>The expected result is:</p>
<p>
  <pre>
<code>myhome 151.208333 -33.869444</code></pre>
</p>
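<p>Browsers usually encode the query string for you, but when the URL is built in a script, the JSON in the <code>q</code> parameter should be URL-encoded. A minimal Python sketch, where <code>bbox_url</code> is a hypothetical helper:</p>

```python
import json
from urllib.parse import quote


def bbox_url(base, db, geom, bbox):
    """Build a bounding box query URL with a URL-encoded q parameter."""
    # Compact separators reproduce the JSON exactly as in the example URL.
    q = json.dumps({"geom": geom, "bbox": list(bbox)}, separators=(",", ":"))
    return "%s/%s/_external/geo?q=%s" % (base.rstrip("/"), db, quote(q, safe=""))
```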

<h3>Requirements</h3>
<p>You’d like to give it a try? Here is a list of the software and the versions
I used to get it working on my system, though others might work as well. GeoCouch
includes installation/configuration instructions.</p>
<ul>
  <li><a href="http://www.kernel.org/">Linux 2.6.26</a></li>
  <li><a href="http://www.python.org/">Python 2.5.2</a></li>
  <li><a href="http://code.google.com/p/apsw/">APSW - Another Python SQLite Wrapper 3.5.9-r2</a></li>
  <li><a href="http://www.gaia-gis.it/spatialite/">SpatiaLite 2.2</a></li>
  <li><a href="http://github.com/davisp/couchdb/tree/external2">
davisp’s external2 branch of CouchDB</a>
  </li>
</ul>

<h3>Download GeoCouch</h3>
<p>Get GeoCouch now! It’s new, it’s free
(<a href="http://www.opensource.org/licenses/mit-license.php">MIT</a>
licensed).</p>
<ul>
  <li><a href="/blog/2008-10-26/geocouch-0.0.1.tar.bz2">
GeoCouch 0.0.1</a></li>
</ul>

<h3>What’s next?</h3>
<p>The current version is meant to be played with; many things are not possible yet,
and many things need to be improved. But with the power of SpatiaLite (and the
underlying libraries) it shouldn’t be too hard.
</p>
<p>Therefore I hope this will only be the start and will lead to a discussion
on what should be done and what other things might be possible. I’d love to
hear your use cases for a geospatially enabled CouchDB.</p>

]]></content:encoded>
<wfw:commentRss>http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-geospatial-queries-with-couchdb%3A2008-10-26%3Aen%2CCouchDB%2CPython%2Cgeo/feed/</wfw:commentRss>
</item>
</channel>
</rss>
