the blllog.

Non-validating WKT parser for Erlang

2010-05-14 22:27

The upcoming OpenSearch Geo specification will add support for querying with WKT (Well-Known Text). As I plan to support this specification in GeoCouch, I needed a WKT parser written in Erlang. I tried several ways to write this parser, but I ended up writing it by hand, based on the ideas of the fabulous MochiWeb JSON2 parser.

The parser is meant for fast parsing, so it is non-validating. This means that it parses valid WKT, but also accepts other strings that only look like valid WKT. The grammar is simplified to (in EBNF as used for the XML spec):

wkt ::= item | string '(' space* item (comma item)* ')'
item ::= string (geom | list | nested_list | item | 'EMPTY')
nested_list ::= space* '(' list (comma list)* ')' | '(' nested_list+ ')'
list ::= '(' geom (comma geom)* ')'
geom ::= space* '(' coord (comma coord)* ')'
coord ::= space* number (space+ number)*
number ::= integer | float
integer ::= ('-' | '+')? [0-9]+
float ::= ('-' | '+')? [0-9]+ '.' [0-9]+ exponent?
exponent ::= 'E' ('-' | '+')? [0-9]+
string ::= [a-zA-Z]+ (space* [a-zA-Z])*
space ::= #x20
comma ::= ',' space*
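
To make the coord and number rules more concrete, here is a minimal sketch of how they could be parsed by hand in Erlang. It is not the code from the actual parser; the module name wkt_coord_sketch and all function names are made up for illustration. Like the real parser it does not validate: malformed input simply crashes with badarg, and it is a bit more lenient than the grammar (it does not insist on a space between two numbers).

```erlang
-module(wkt_coord_sketch).
-export([parse_coord/1]).

%% Hypothetical sketch, not the real GeoCouch parser.
%% Parses one `coord` from the start of a string and returns
%% {CoordTuple, RestOfString}.
parse_coord(Str) ->
    parse_coord(skip_spaces(Str), []).

parse_coord(Str, Acc) ->
    {Num, Rest} = parse_number(Str),
    case skip_spaces(Rest) of
        %% Another number follows: keep collecting.
        [C | _] = Next when C =:= $-; C =:= $+; C >= $0, C =< $9 ->
            parse_coord(Next, [Num | Acc]);
        %% Anything else (comma, closing paren, end of input) ends the coord.
        Next ->
            {list_to_tuple(lists:reverse([Num | Acc])), Next}
    end.

%% number ::= integer | float
parse_number(Str) ->
    {Sign, Rest0} = take_sign(Str),
    {Int, Rest1} = take_digits(Rest0),
    case Rest1 of
        [$., D | _] when D >= $0, D =< $9 ->
            {Frac, Rest2} = take_digits(tl(Rest1)),
            {Exp, Rest3} = take_exponent(Rest2),
            {Sign * list_to_float(Int ++ "." ++ Frac ++ Exp), Rest3};
        _ ->
            {Sign * list_to_integer(Int), Rest1}
    end.

take_sign([$- | Rest]) -> {-1, Rest};
take_sign([$+ | Rest]) -> {1, Rest};
take_sign(Rest) -> {1, Rest}.

%% Splits off the leading run of decimal digits.
take_digits(Str) ->
    lists:splitwith(fun(C) -> C >= $0 andalso C =< $9 end, Str).

%% exponent ::= 'E' ('-' | '+')? [0-9]+
%% Returned in the lowercase form that list_to_float/1 understands.
take_exponent([$E | Rest0]) ->
    {ExpSign, Rest1} = case Rest0 of
        [$- | R] -> {"-", R};
        [$+ | R] -> {"", R};
        R -> {"", R}
    end,
    {Digits, Rest2} = take_digits(Rest1),
    {"e" ++ ExpSign ++ Digits, Rest2};
take_exponent(Rest) ->
    {"", Rest}.

skip_spaces([$\s | Rest]) -> skip_spaces(Rest);
skip_spaces(Rest) -> Rest.
```

Calling wkt_coord_sketch:parse_coord(" 10 20, 30 40") consumes the first coordinate and hands back the unparsed remainder, so the rules for geom and list can be layered on top in the same style.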

I hope I got the grammar right; leave a comment if not. This also means that strings like this(is(10 20), a test EMPTY) would be parsed to:

{this,[{is,[{10,20}]},{'a test',[]}]}

A validating parser would be much slower as it would also need to perform checks on the geometry, e.g. for polygons whether interiors are really within the exterior ring or not.

The general rule is: a coordinate is transformed to a tuple, a list of coordinates to a list. The geometry name will be an atom. Here's an example for a polygon:

wkt:parse("POLYGON ((102 103, 204 205, 306 107, 102 103),
                    (12 13, 24 25, 36 17, 12 13),
                    (62 63, 74 75, 86 67, 62 63))").
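
Because the parser does not validate, any checks on the geometry have to happen afterwards on the parsed term. As a rough sketch, assuming the rings of a polygon come out as lists of coordinate tuples (as the rule above suggests), the simplest such check is whether every ring is closed. The module name wkt_ring_sketch and its functions are made up for illustration and are not part of the parser. This is a much weaker check than testing whether the interiors lie within the exterior ring.

```erlang
-module(wkt_ring_sketch).
-export([rings_closed/1]).

%% Hypothetical sketch, not part of the parser.
%% Rings is assumed to be a list of rings, each ring a list of
%% coordinate tuples such as {102,103}.
%% Returns true if every ring starts and ends on the same coordinate.
rings_closed(Rings) ->
    lists:all(fun is_closed/1, Rings).

%% Exact comparison: 1 and 1.0 count as different values here.
is_closed([First | _] = Ring) -> First =:= lists:last(Ring);
is_closed([]) -> false.
```

For the three rings in the polygon above this check would pass, since each ring repeats its first coordinate at the end.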

In case you're getting excited now, the source is available at GitHub, released under the MIT License.

If someone plans to write a validating WKT parser for Erlang (please let me know), I suggest using neotoma; it's a really nice "packrat parser-generator for Erlang for Parsing Expression Grammars (PEGs)".

Categories: en, GeoCouch, Erlang, geo

By Volker Mische
