<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Kukkaisvoima version 7" -->
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<channel>
<title>vmx: TileCache</title>
<link>https://vmx.cx/cgi-bin/blog/index.cgi</link>
<description>Blog of Volker Mische</description>
<pubDate>Wed, 23 Sep 2009 22:35:25 +0200</pubDate>
<lastBuildDate>Wed, 23 Sep 2009 22:35:25 +0200</lastBuildDate>
<generator>http://23.fi/kukkaisvoima/</generator>
<language>en</language>
<item>
<title>Benchmarking is not easy
</title>
<link>https://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy%3A2009-09-23%3Aen%2CCouchDB%2CPython%2CTileCache%2Cgeo</link>
<comments>https://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy%3A2009-09-23%3Aen%2CCouchDB%2CPython%2CTileCache%2Cgeo#comments</comments>
<pubDate>Wed, 23 Sep 2009 22:35:25 +0200</pubDate>
<dc:creator>Volker Mische</dc:creator>
<category>en</category>
<category>CouchDB</category>
<category>Python</category>
<category>TileCache</category>
<category>geo</category>
<guid isPermaLink="false">https://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy%3A2009-09-23%3Aen%2CCouchDB%2CPython%2CTileCache%2Cgeo/</guid>
<description><![CDATA[ 
 [...]]]></description>
<content:encoded><![CDATA[

<p>There are so many ways to have a play with
<a href="http://couchdb.apache.org">CouchDB</a>. This time I thought about using
CouchDB as a <a href="http://tilecache.org/">TileCache</a> storage backend.
It sounded easy, and it was.
</p>

<h3>What is a tilecache?</h3>
<p>Everyone knows <a href="http://maps.google.com/">Google Maps</a> and its
small images, called <em>tiles</em>. Rendering those tiles for the whole world
at every zoom level can be quite time-consuming, so instead you can render
them on demand and cache them once they are rendered. That is the job of
a tilecache.
</p>
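<p>To get a feeling for why pre-rendering everything is expensive, here is a small sketch that counts the tiles of a world map. It assumes the Google-Maps-style quadtree scheme, where zoom level 0 is a single tile and every further level splits each tile into four (so level z has 2**z by 2**z tiles). TileCache layers can be configured with other resolutions, so treat the numbers as illustrative.</p>

```python
def tiles_at_zoom(z: int) -> int:
    """Number of tiles covering the whole world at zoom level z.

    Assumes a quadtree scheme: 2**z columns times 2**z rows.
    """
    return (2 ** z) * (2 ** z)


def tiles_up_to_zoom(max_zoom: int) -> int:
    """Total number of tiles for all zoom levels 0..max_zoom (inclusive)."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


if __name__ == "__main__":
    for max_zoom in (5, 10, 15, 18):
        print(f"zoom 0..{max_zoom}: {tiles_up_to_zoom(max_zoom):,} tiles")
```

<p>Going all the way to zoom level 18 already means more than 91 billion tiles, most of which nobody will ever look at, which is why rendering on demand pays off.</p>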
<p>You can also use a tilecache as a proxy to a remote tile server; that's
what I did for this benchmark.</p>
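<p>In proxy mode a tilecache boils down to a get-or-fetch lookup. The sketch below is not TileCache code: <code>ProxyTileCache</code> is a hypothetical class with an in-memory dict as storage, and the caller-supplied <code>fetch_remote</code> function stands in for the request to the remote tile server.</p>

```python
from typing import Callable, Dict, Tuple

TileKey = Tuple[str, int, int, int]  # (layer, z, x, y) -- hypothetical key layout


class ProxyTileCache:
    """Minimal read-through cache in front of a remote tile server (sketch)."""

    def __init__(self, fetch_remote: Callable[[TileKey], bytes]):
        self.fetch_remote = fetch_remote       # only called on a cache miss
        self.store: Dict[TileKey, bytes] = {}  # stand-in for a real storage backend
        self.hits = 0
        self.misses = 0

    def get(self, key: TileKey) -> bytes:
        data = self.store.get(key)
        if data is not None:
            self.hits += 1
            return data
        # Miss: ask the remote server once and keep the result for next time.
        self.misses += 1
        data = self.fetch_remote(key)
        self.store[key] = data
        return data


if __name__ == "__main__":
    cache = ProxyTileCache(lambda key: b"fake png for %r" % (key,))
    cache.get(("osm", 3, 4, 2))  # miss: goes to the "remote server"
    cache.get(("osm", 3, 4, 2))  # hit: served from the store
    print(cache.hits, cache.misses)
```

<p>Once every tile has been fetched once, the remote server is no longer needed for those tiles, which is exactly the all-hits situation the benchmark measures.</p>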

<h3>Coding</h3>
<p><a href="/blog/2009-09-23/Couchdb.py">The implementation</a> is quite
similar to the
<a href="http://svn.tilecache.org/trunk/tilecache/TileCache/Caches/Memcached.py">Memcached
backend</a>. I haven't implemented locking, as I was just after something that works,
not a full-fledged backend.
</p>
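<p>For readers who haven't used CouchDB's HTTP API: a backend like this mainly has to map a tile to a document ID and store the image with it. The following sketch is not the linked Couchdb.py. It assumes a database URL, a hypothetical <code>layer/z/x/y</code> document ID scheme, and stores the PNG as an inline, base64-encoded attachment named <code>tile.png</code>. It only builds the <code>PUT</code> request, so it runs without a CouchDB server.</p>

```python
import base64
import json
import urllib.parse
import urllib.request


def tile_doc_id(layer: str, z: int, x: int, y: int) -> str:
    """Hypothetical document ID scheme, similar in spirit to a memcache key."""
    return f"{layer}/{z}/{x}/{y}"


def build_put_request(db_url: str, layer: str, z: int, x: int, y: int,
                      png: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT that stores one tile as a CouchDB document."""
    doc = {
        "layer": layer, "z": z, "x": x, "y": y,
        # Inline attachments are sent base64-encoded inside the JSON body.
        "_attachments": {
            "tile.png": {
                "content_type": "image/png",
                "data": base64.b64encode(png).decode("ascii"),
            }
        },
    }
    # Slashes inside a document ID must be percent-encoded in the URL path.
    doc_id = urllib.parse.quote(tile_doc_id(layer, z, x, y), safe="")
    return urllib.request.Request(
        f"{db_url.rstrip('/')}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

<p>Sending it is a matter of <code>urllib.request.urlopen(request)</code>. Overwriting an existing tile would also require the document's current <code>_rev</code>; otherwise CouchDB answers with a 409 conflict.</p>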
<p>When I had finished coding, it was time to find out how it performs. That
should be easy, as TileCache comes bundled with a tilecache_seeding script to
fill the cache. So you fill the cache, then switch the remote server off and
measure how long it takes when every request is a cache hit and there are no
misses (i.e. all tiles are already in your cache and don't need to be requested
from a remote server).
</p>
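<p>Seeding means requesting every tile in an area for a range of zoom levels. As a rough sketch of what such a run iterates over, here is how the tiles covering a bounding box can be enumerated. It uses the common Google/OpenStreetMap "slippy map" formula (Web Mercator, y counted from the top); TileCache's own layers may use a different grid and origin, and both function names are hypothetical.</p>

```python
import math
from typing import Iterable, Iterator, Tuple


def lonlat_to_tile(lon: float, lat: float, z: int) -> Tuple[int, int]:
    """Column and row of the tile containing (lon, lat) at zoom z.

    Uses the Google/OpenStreetMap XYZ scheme: Web Mercator, origin at the
    top-left, y growing southwards.
    """
    n = 2 ** z
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp: lon=180 or latitudes beyond about +/-85.05 fall outside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(min_lon: float, min_lat: float, max_lon: float,
                  max_lat: float, zooms: Iterable[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (z, x, y) for every tile touching the bounding box."""
    for z in zooms:
        x0, y0 = lonlat_to_tile(min_lon, max_lat, z)  # north-west corner
        x1, y1 = lonlat_to_tile(max_lon, min_lat, z)  # south-east corner
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                yield z, x, y
```

<p>Bounding boxes crossing the antimeridian are not handled; for a benchmark like this one, a small box over a few zoom levels is all you need.</p>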
<p>The two contestants for the benchmark are the CouchDB backend and the one
that stores the tiles directly on the filesystem.</p>

<h3>Everyone loves numbers</h3>
<p>We keep it simple and measure the seeding time with
<a href="http://www.gnu.org/software/time/">time</a>. How long does it take to
request 780 tiles? The first number is the average (in seconds), the number in
brackets the standard deviation.
</p>
<ul>
  <li><p>Filesystem:</p>
<pre>
real 0.35 (0.04)
user 0.16 (0.02)
sys  0.05 (0.01)
</pre>
  </li>
  <li><p>CouchDB:</p>
<pre>
real 3.03 (0.18)
user 0.96 (0.05)
sys  0.21 (0.03)
</pre>
  </li>
</ul>
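<p>The post doesn't state how many runs each number is based on or which standard deviation was used. As a sketch, assuming the sample standard deviation of the wall-clock ("real") times of repeated runs, the measurement loop and the summary could look like this. <code>time_runs</code> is a hypothetical helper and only measures wall-clock time, not the user and sys CPU times that GNU time also reports.</p>

```python
import statistics
import subprocess
import time
from typing import List, Sequence, Tuple


def time_runs(cmd: Sequence[str], runs: int) -> List[float]:
    """Wall-clock duration in seconds of `runs` executions of an external command."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command fails
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, printed like 'real 0.35 (0.04)'."""
    return statistics.mean(durations), statistics.stdev(durations)
```

<p>Usage would be <code>summarize(time_runs(seed_command, runs=10))</code>, where <code>seed_command</code> is a placeholder for whatever seeding invocation you benchmark. <code>statistics.stdev</code> needs at least two runs.</p>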
<p>Let's say CouchDB is about 10 times slower than the filesystem-based cache
(3.03 vs. 0.35 seconds). Wow, CouchDB really sucks! Why would you use it as
tile storage? Sure, you could:
</p>
<ul>
  <li>easily store metadata with every tile, like the date when it should
expire</li>
  <li>keep a history of tiles and show them as "travel through time" layers
in your mapping application</li>
  <li>easily replicate tiles to other servers</li>
</ul>
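<p>To illustrate the first point: with a document store, expiry information can simply live next to the image. A small sketch, assuming a hypothetical <code>expires</code> field holding an ISO 8601 UTC timestamp in each tile document (nothing in the post defines such a field):</p>

```python
from datetime import datetime, timedelta, timezone


def with_expiry(doc: dict, ttl: timedelta, now: datetime) -> dict:
    """Return a copy of a tile document with a hypothetical 'expires' field set."""
    updated = dict(doc)
    updated["expires"] = (now + ttl).isoformat()
    return updated


def is_expired(doc: dict, now: datetime) -> bool:
    """True if the tile should be re-fetched; tiles without 'expires' never expire."""
    expires = doc.get("expires")
    if expires is None:
        return False
    return now >= datetime.fromisoformat(expires)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    doc = with_expiry({"layer": "osm", "z": 3, "x": 4, "y": 2}, timedelta(days=7), now)
    print(doc["expires"], is_expired(doc, now))
```

<p>Both timestamps must be timezone-aware; mixing naive and aware datetimes raises a <code>TypeError</code> on comparison.</p>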
<p>You just don't want such a slow hog. And those
<a href="http://wiki.apache.org/couchdb/People_on_the_Couch">CouchDB
people</a> keep telling me that CouchDB is fast. Pah!</p>

<h3>Really??</h3>
<p>You might already be wondering where the details are: the software version
numbers, the system specifications and all that stuff. They are missing for a
good reason: this benchmark just isn't right, even if I added those details.
The problem lies a few layers deeper.
</p>
<p>This benchmark is way too far from real-life usage. In practice you would
request many more tiles, not the same 780 on every run. When I was
benchmarking the filesystem cache, all tiles were already in the operating
system's cache, which is why it was <em>that</em> fast.
</p>
<p>Simple solution: clear the system cache and run the tests again. Here are
the results after running <code>echo 3 &gt; /proc/sys/vm/drop_caches</code>:</p>
<ul>
  <li><p>Filesystem:</p>
<pre>
real 8.36 (0.71)
user 0.29 (0.04)
sys  0.18 (0.03)
</pre>
  </li>
  <li><p>CouchDB:</p>
<pre>
real 6.64 (0.15)
user 1.13 (0.07)
sys  0.29 (0.06)
</pre>
  </li>
</ul>
<p>Wow, the CouchDB cache is faster than the filesystem cache. Too good to be
true. The reason is simple: loading the CouchDB database file, a single file on
disk, is much faster than 780 separate file accesses.
</p>
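<p>For repeating the cold-cache measurements: on Linux, writing to <code>/proc/sys/vm/drop_caches</code> requires root, and the kernel only drops clean pages, so it is common to flush dirty data with <code>sync</code> first. A small helper to call before each run could look like this; the function name and the overridable path (handy for testing without root) are my own assumptions.</p>

```python
import os


def drop_caches(level: int = 3, path: str = "/proc/sys/vm/drop_caches") -> None:
    """Flush dirty pages, then ask the Linux kernel to drop its caches.

    level 1 = page cache, 2 = dentries and inodes, 3 = both.
    Writing to the real /proc file requires root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # dirty pages are not dropped, so write them out first
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

<p>This is the same as <code>sync; echo 3 &gt; /proc/sys/vm/drop_caches</code> in a root shell. It only affects the operating system's caches; CouchDB's own process state is untouched unless you restart it as well.</p>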

<h3>Does it really matter?</h3>
<p>Let's go back to the first benchmark and assume CouchDB really were that much
slower. Isn't it perhaps still <em>fast enough</em>? Even with those numbers
(about ten times slower than the filesystem cache), 780 tiles in 3.03 seconds
means your cache can serve roughly 250 requests per second. If a user requests
9 tiles per second, that is enough for about 25 simultaneous users (250 / 9 is
about 28; let's be conservative). If every user stays on the map for 2 minutes,
a day has 720 such slots, which makes 25 &#215; 720 = 18&#160;000 users per day.
Not bad.
</p>
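<p>The back-of-the-envelope estimate above, written out so you can plug in your own numbers. The assumptions are the same as in the text: 9 tiles per second per user, 2-minute visits, and a constant load around the clock, which real traffic never has.</p>

```python
import math


def concurrent_users(requests_per_second: float, tiles_per_user_per_second: float) -> int:
    """How many users the cache can serve at the same time (rounded down)."""
    return math.floor(requests_per_second / tiles_per_user_per_second)


def users_per_day(concurrent: int, minutes_per_visit: float) -> int:
    """Users per day if every slot is always occupied by one visit of fixed length."""
    slots_per_day = 24 * 60 / minutes_per_visit
    return int(concurrent * slots_per_day)


if __name__ == "__main__":
    rps = 780 / 3.03  # tiles per second in the first (warm-cache) CouchDB run
    print(f"{rps:.0f} requests/s")
    print(concurrent_users(rps, 9), "concurrent users")
    print(users_per_day(25, 2), "users per day with the conservative 25")
```

<p>Without rounding, 780 tiles in 3.03 seconds are about 257 requests per second and 28 concurrent users; 25 users give the 18&#160;000 users per day from the text.</p>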
<p>Additionally, you gain some nice features you won't get with other
caches (as outlined above). And if you really need more performance, you could
always dump the tiles to the filesystem with a cron job.
</p>
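<p>Such a cron job mostly has to write every tile to a path a filesystem cache can serve. The sketch below assumes tiles arrive as <code>(layer, z, x, y, data)</code> tuples (for example read from CouchDB) and uses a hypothetical <code>layer/z/x/y.png</code> layout; TileCache's own disk cache uses its own directory layout, so the paths would have to be adapted.</p>

```python
import os
from typing import Iterable, Tuple

Tile = Tuple[str, int, int, int, bytes]  # (layer, z, x, y, png data) -- assumed shape


def dump_tiles(tiles: Iterable[Tile], root: str) -> int:
    """Write every tile to root/layer/z/x/y.png; return the number of files written."""
    count = 0
    for layer, z, x, y, data in tiles:
        directory = os.path.join(root, layer, str(z), str(x))
        os.makedirs(directory, exist_ok=True)
        final_path = os.path.join(directory, f"{y}.png")
        # Write to a temporary name first, then rename, so readers never see half a tile.
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(data)
        os.replace(tmp_path, final_path)
        count += 1
    return count
```

<p>The write-then-rename step matters because the web server may be serving tiles while the cron job runs; <code>os.replace</code> swaps the file in one step on POSIX systems.</p>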

<h3>Conclusion</h3>
<ol>
  <li>Benchmarking is not easy, but easy to get wrong.</li>
  <li>Slow might be fast enough.</li>
  <li>Read more about benchmarking on
<a href="http://jan.prima.de/plok/archives/176-Caveats-of-Evaluating-Databases.html">Jan's
blog</a>.</li>
</ol>
]]></content:encoded>
<wfw:commentRss>https://vmx.cx/cgi-bin/blog/index.cgi/benchmarking-is-not-easy%3A2009-09-23%3Aen%2CCouchDB%2CPython%2CTileCache%2Cgeo/feed/</wfw:commentRss>
</item>
</channel>
</rss>
