the blllog.

CouchDB _mix branch: Intersection of _view and _external

2009-04-21 22:35

In CouchDB it’s possible to query an external service (I’ll call it _external from now on) which returns an HTTP response directly to the client that made the request. Although this is already quite nice, it wasn’t possible to combine such _external requests with a classical _view.

The need for an intersection of _view and _external

Sometimes you’d like to exclude documents in a more dynamic fashion than a CouchDB _view supports. Examples would be geospatial queries, a simple search like “exclude all documents that don’t contain a certain string in the title”, or even full-text searching. Therefore I’ve created a new handler called “_mix”.

The problem

As _external has already existed for quite a long time, it was clear that I would reuse the available functionality. The basic idea is simple: take all documents from a _view and all from _external, intersect them and finally output the result.

The problem is that CouchDB can be used for huge data sets, where you don’t want to keep a complete _view in memory to perform an intersection. The goals were:

  • The output needs to be streamable
  • Don’t keep all documents in memory
  • Use the existing functionality
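To make the streaming goal concrete, here is a minimal Python sketch of the idea rather than the actual Erlang code: rows are pulled from the view one at a time, a predicate decides whether each one is kept, and accepted rows are yielded immediately, so the full view never sits in memory. The names mix, fake_view and even_only are made up for this illustration.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def mix(view_rows: Iterable[Row], keep: Callable[[Row], bool]) -> Iterator[Row]:
    """Yield view rows one at a time, asking `keep` about each row first.

    Only the current row is held in memory, so the result can be
    streamed to the client while the view is still being read.
    """
    for row in view_rows:
        if keep(row):
            yield row


def fake_view() -> Iterator[Row]:
    """Hypothetical stand-in for a _view: produces rows lazily."""
    for i in range(5):
        yield {"id": f"doc{i}", "key": i, "value": None}


def even_only(row: Row) -> bool:
    """Hypothetical stand-in for the _external decision."""
    return row["key"] % 2 == 0


kept_ids = [row["id"] for row in mix(fake_view(), even_only)]
print(kept_ids)
```

Because mix is a generator, nothing is read from the view until the consumer asks for the next row.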

The implementation

Over the past few months I had lengthy discussions with Paul Davis to find a suitable solution for the problem. We went through all our ideas over and over again. The way I’ve implemented it now works for me so far, but it is definitely not the one and only solution; it’s just one possible solution.

As most of the functionality already exists, the current API of _view and _external is used. The difference is that the query is POSTed as JSON to the _mix handler instead of being sent as a GET request. Here’s an example with curl:

curl -d '{"design": "designdoc", "view": {"name": "viewname", "query": {"limit": "11"}}, "external": {"name": "minimal", "query": {"bbox": "[23,42,46,89]"}, "include_docs": false}}' http://localhost:5984/yourdb/_mix
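If you’d rather build that request from a script, the following Python sketch assembles the same JSON body as the curl call. The helper build_mix_request is hypothetical and not part of the branch. The query values are kept as strings, exactly as in the curl example, and no Content-Type header is set because the curl call doesn’t set one either.

```python
import json
import urllib.request


def build_mix_request(base_url, db, design, view, view_query,
                      external, external_query, include_docs=False):
    """Build (but do not send) a POST request for the _mix handler.

    The body layout mirrors the curl example; this helper is only
    an illustration.
    """
    body = {
        "design": design,
        "view": {"name": view, "query": view_query},
        "external": {
            "name": external,
            "query": external_query,
            "include_docs": include_docs,
        },
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(f"{base_url}/{db}/_mix", data=data, method="POST")


request = build_mix_request(
    "http://localhost:5984", "yourdb", "designdoc",
    "viewname", {"limit": "11"},
    "minimal", {"bbox": "[23,42,46,89]"},
)
payload = json.loads(request.data.decode("utf-8"))
print(request.full_url)
print(json.dumps(payload, indent=2))

# To actually send it to a server running the _mix branch:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```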

At the moment most of the code is just copied and pasted from couch_httpd_view.erl and couch_httpd_external_* with some additional parsing of the POSTed JSON. The only new thing is that there’s an _external request before every document of a _view is output. This request contains either the document ID or the whole document (if “include_docs” is set to “true”) and needs to return “true” if the document should be output, or “false” if not.
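As an illustration of the decision such a script has to make, here is a small Python sketch of the “string in the title” filter mentioned earlier, for the case where “include_docs” is “true”. The wire format is an assumption made for this sketch: it reads one JSON document per line and answers with one line containing true or false. The real branch wraps requests and responses in CouchDB’s _external envelope, which isn’t reproduced here, and the names title_contains and serve are made up.

```python
import io
import json


def title_contains(doc, needle):
    """Return True if the document's title contains `needle` (case-insensitive)."""
    title = doc.get("title", "")
    return isinstance(title, str) and needle.lower() in title.lower()


def serve(lines, out, needle):
    """Answer each incoming document with a JSON true/false line.

    ASSUMPTION: one JSON document per input line, one JSON boolean
    per output line. This is a simplification, not the branch's protocol.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        out.write(json.dumps(title_contains(doc, needle)) + "\n")
        out.flush()


# In-memory streams stand in for stdin/stdout so the sketch runs anywhere.
incoming = io.StringIO(
    '{"_id": "a", "title": "CouchDB views"}\n'
    '{"_id": "b", "title": "Something else"}\n'
    '{"_id": "c"}\n'
)
outgoing = io.StringIO()
serve(incoming, outgoing, "couchdb")
answers = outgoing.getvalue().split()
print(answers)
```

In a real script you would pass sys.stdin and sys.stdout instead of the in-memory streams.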

I’ve included a sample _external script which excludes documents randomly (it can be found at src/contrib/minimal_external.py). To have a play with it, you just need to enable _external and add that script. How to do that can be found in the CouchDB Wiki.

Get it

All you need to do to have some fun with it is to check out my _mix branch on GitHub.

Final words

And finally I’d like to thank Paul Davis for taking the time to discuss the issues with the intersection of _view and _external. Another “thank you” goes out to Adam Groves, who discovered a lot of annoyances with the parsing of the queries.

Categories: en, CouchDB


By Volker Mische
