Matadisco | vmx - the blllog.

About me

My name is Volker Mische and I'm an open source enthusiast and hacker. You can reach me via email, or Bluesky (@vmx.cx). Find me also on GitHub.

Atmospheric data portals reply

2026-04-02 23:02

This is a reply to David Gasquez’ blog post Atmospheric Data Portals. As there’s so much in it and much of it overlaps with future plans, I thought it makes sense to write a proper public reply instead of following up in a private conversation.

First of all, read his blog post and follow the many links, there is so much to discover.

One re-occurring thing in the documents linked from the “issues on the earlier stages of the Open Data pipeline” section is that for most portals a static site should be sufficient. I fully agree with that. When it’s done properly, an automated rebuild of some parts when new data is added should work well. These days even powerful client-sided search is possible.

It’s a bit off-topic, but David’s Barefoot Data Platforms page links to Maggie Appleton’s Home Cooked Software and Barefoot Developers talk linked. I highly recommend watching it, it was one of my favourite talks at the Local-first Conference 2024. I always wanted to blog about it, but never found the time.

But now to the concrete points David mentions. If anyone has ideas on how to make those things happen with Matadisco, please open issues on the main Matadisco repo.

Take inspiration from existing flexible standards like Data Package, Croissant, and GEO ones for the core fields. Start with the smallest shared lexicon while leaving room for specialized extensions (sidecars?).

I don’t think Matadisco should go into too much detail on specifying what the metadata should look like. Making one metadata standard to rule them all is destined to fail from my experience (ISO 19115/19139 anyone?). Though there might be a lowest common denominator, similar to what Standard.site is doing for long-form publishing. In order to find out what that looks like, I propose that individual communities start by specifying Lexicons for their own needs. This could be done through tags, which I’ve outlined in the Matadisco issue “Introducing tags for filtering and extension point”.

Split datasets from “snapshots”. Say, io.datonic.dataset holds long-term properties like description and points to io.datonic.dataset.release or io.datonic.dataset.snapshot, which point to the actual resources.

Some kind of hierarchical relationship would be useful. FROST, which Matadisco drew a lot of inspiration from, is centred around IceChunk, which also has the concept of snapshots. But I don’t think we should stop at the concept of snapshots. In my original demo, I scrape a STAC catalogue for Sentinel-2 imagery. Every new image is a new record. They are all part of the same STAC collection, so we could use a similar concept in Matadisco as well.

Add an optional DASL-CID field for resources so we “pin” the bytes.

Yes, that’s something @mosh is keen to have. It’s not only useful for pinning things to a specific version, but also to make it possible to verify that the data you received is the one you expected. It sounds trivial, but the problem would be where to put it. Do you only hash the metadata record it points to? Do you hash the data container (if there’s one)? Or each resource a metadata record points to?

Core lexicon should be as agnostic as possible!

As mentioned above, it might be out of scope for Matadisco and for now it’s left to the individual communities.

Bootstrap the catalog. There are many open indexes and organizations. Crawl them!

Indeed! My first two Matadisco producers are sentinel-to-atproto crawling Element 84’s Earth Search STAC catalogue and gdi-de-csw-to-atproto crawling the GeoNetwork instance of the official German geo metadata catalogue.

Integrate with external repositories. E.g., a service that creates JSON-LD files from the datasets it sees appearing on the Atmosphere so Google Datasets picks them up. The same cron job could push data into Hugging Face or any other tool that people are already using in their fields.

At first this would need to happen for each individual type of record, see the tags proposal above.

Convince and work with high quality organizations doing something like this! I’d definitely collaborate with source.coop for example.

That surely is the goal!

No comments

Categories: en, Matadisco, ATProto, geo

FOSSGIS 2026

2026-04-01 13:24

This is a short write-up on the FOSSGIS 2026 conference. It’s a German speaking conference on free and open source geographic information systems and OpenStreetMap. So maybe a blog post in English spreads the word even wider.

While being the biggest edition ever (1000 registrations on-site, 300 online) it was well run and organized as every year. It didn’t even feel larger than usual. The CCC video team streamed live and published the cut videos the same day in outstanding quality as always.

I split this post into two sections, one about interesting talks for the geo world in general and then follow up discussions on my Matadisco talk and ATProto in general.

Talks

I’ve spent most of my time in hallway chatting with people as this is what matters most to me when I’m attending a conference in person. Nonetheless I’ve still managed to see some excellent talks.

Panel discussion on digital sovereignty in the cloud

The conference started with a high-class panel discussion on digital sovereignty in the cloud. The public discussion on that topic is often centered around where servers are located. Though that doesn’t actually matter. US companies can be forced by their government to give access to the data independent of their physical location.

Other topics touched were best practices on switching from proprietary to open source systems.

Barrier-free travelling thanks to paid mappers

Public transport in Germany must be accessible to disabled individuals (reality is far away from that). For routing, you need the data basis for it. This talk gets into the details on how Baden-Württemberg, a federal state in south Germany, works on enabling barrier-free travelling. They decided to add that information of all their 1100 train stations directly to OpenStreetMap. In order to achieve the required high quality they’ve hired through a third party company several experienced mappers from the community.

I really like the idea that OpenStreetMap can now be used as source of truth for that data set. I hope other federal states follow this lead.

Routing talks

I’ve seen two talks about routing. The one about Valhalla routing engine with MapLibre Native was interesting because it was about a special case, where you want to re-route bus lines in case of construction. Although the resulting system is not open source, they’ve contributed upstream to Valhalla, to make it work well with MapLibre Native. Those contributions can be more valuable than a one time source code dump of forked repositories, just to call it open source.

Another one was about Real-time mobility analytics for disaster relief operations, which was interesting to see how routing is used in such cases. The limitations and how such systems really help on the ground.

Matadisco and ATProto

My talk on Matadisco was about the current status of metadata catalogues, the problems and how ATProto can make things better. What I should have made clearer is what Matadisco actually is. I didn’t make it clear that it’s just a schema/convention people would use to announce their data on ATProto. It could’ve been mistaken as a piece of software or a service. You would use Matadisco in order to implement something for your pipeline.

Nonetheless people got the idea and I had good conversations afterwards. I talked with Olivia Guyot about the possible ways on how to integrate Matadisco record publishing into GeoNetwork. With Christian Willmes about creating a portal for combining paleoenvironmental and archaeological data.

While chatting about ATProto at one of the social events Klaus Stein talked about how he would like a social network to be. Users would just put static files somewhere. I agree that having static webspace somewhere without any server component is not only cheap, but also the easiest to get. He is not bothered about other components being operated by other parties, e.g. for indexing. That kept me thinking how far ATProto is away from that. I’d like to build a prototype that is like a static site generator for ATProto records. It won’t be able to act as a full PDS, you would need a WebSocket connection to get the data to a relay. But there could be a minimal service operated by a third party that polls those static PDS for updates and forwards them to a relay.

No comments

Categories: en, Matadisco, ATProto, conference, geo

By Volker Mische

vmx