URLs for Data: The Key to Scalable Data Marketplaces

Organisations looking to create data products and data marketplaces would do well to consider the rules that apply to all marketplaces. All functioning markets require standards. In fact, it's the agreement on shared standards that allows markets to operate in a coordinated way.

When it comes to data marketplaces, one community has been thinking about decentralised data standards for a very long time. We’ve gone by several names: first the Semantic Web, then Linked Data, and today Knowledge Graphs.

People used to say the Semantic Web was complex - but that’s not quite right. The real issue is that decentralised data integration is complex, and the Linked Data approach is actually quite simple. It boils down to two key aspects: a common vocabulary and universal identifiers.

Today, I’d like to focus on the second: universal identifiers.

All markets need a way to identify the products they trade. Think about how this works elsewhere: we have ISINs for financial instruments, ISBNs for books, license plates for cars, and barcodes in supermarkets. Data is no different - we need a unique way to identify each data item.

The Semantic Web takes an elegant approach. It simply reuses the same mechanism we already use for identification on the web. Every document on the web gets a unique URL, and we apply that same principle to data. Each data item or node in the graph gets its own URL.

I’m deliberately using the term URL here, for those who distinguish between URLs, URIs, and IRIs, because when done properly, a URL is resolvable. That means I can take the URL of a data item, paste it into a web browser, or `curl` it from a machine, and - assuming I have the right permissions - it will resolve and return data.
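As a rough sketch of what "resolvable" means from a machine's point of view, the Python below does the equivalent of that `curl` call. The URL, the bearer-token style of authentication, and the choice of Turtle (`text/turtle`) as the requested format are all assumptions made for illustration; a real service may offer different formats and a different permission model.

```python
from typing import Optional
from urllib.request import Request, urlopen


def build_request(url: str, token: Optional[str] = None) -> Request:
    """Prepare an HTTP GET for a data item's URL.

    The Accept header asks for RDF in Turtle syntax rather than an HTML page.
    The bearer token is a hypothetical stand-in for "the right permissions".
    """
    headers = {"Accept": "text/turtle"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers)


def resolve(url: str, token: Optional[str] = None, timeout: float = 10.0) -> str:
    """Dereference the URL and return the response body as text."""
    with urlopen(build_request(url, token), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical usage; data.example.com is a placeholder, not a real service:
# print(resolve("https://data.example.com/crm/customer/1042", token="..."))
```

The same URL serves as both the identifier and the access path, which is the property that distinguishes it from an opaque ID.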

So, think about it - there really aren’t many other practical ways to assign a unique identifier to every data item in your organisation. You could try setting up a central GUID service, and that might work well in a brand-new organisation. But for any organisation with an existing, complex data estate, remapping IDs across the board would be a massive and unnecessary undertaking.

That’s where URLs shine. By assigning each data item a URL, you can delegate responsibility in a scalable way. Your existing data estate can remain exactly as it is - you simply lay a thin web layer over the top. Each application or domain can manage its own web address - its own namespace - and define identifiers beneath that.
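One way to picture this delegation is a small registry that maps each namespace to the team responsible for it. The sketch below is illustrative only: the namespaces, team names, and the `owner_of` helper are hypothetical, and in practice the mapping might live in DNS, a gateway configuration, or a data catalogue.

```python
from typing import Optional

# Hypothetical namespaces; each application or domain owns one prefix
# and is free to define identifiers beneath it.
NAMESPACES = {
    "https://data.example.com/crm/": "CRM team",
    "https://data.example.com/billing/": "Billing team",
}


def owner_of(url: str) -> Optional[str]:
    """Return the owner of the longest namespace that prefixes the URL."""
    matches = [ns for ns in NAMESPACES if url.startswith(ns)]
    if not matches:
        return None
    return NAMESPACES[max(matches, key=len)]
```

No central service has to issue the identifiers themselves; it only needs to know who owns which prefix.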

The structure is flexible: the URL path can reference the table where the data resides and, ultimately, include the existing primary key of that table. There’s no need to discard what’s already working - just layer a web-friendly, resolvable structure on top.
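A minimal sketch of that structure, assuming a namespace / table / primary key layout: the `mint_url` and `parse_url` helpers and the example namespace are hypothetical names chosen for illustration, not part of any standard.

```python
from typing import Tuple
from urllib.parse import quote, unquote


def mint_url(namespace: str, table: str, primary_key: object) -> str:
    """Build a URL for one row: <namespace>/<table>/<primary key>.

    Table name and key are percent-encoded so that existing keys
    containing slashes or spaces still yield a valid, unambiguous path.
    """
    base = namespace.rstrip("/")
    return f"{base}/{quote(table, safe='')}/{quote(str(primary_key), safe='')}"


def parse_url(url: str, namespace: str) -> Tuple[str, str]:
    """Recover (table, primary key) from a URL minted under the namespace."""
    base = namespace.rstrip("/") + "/"
    if not url.startswith(base):
        raise ValueError(f"{url} is not in namespace {namespace}")
    table, _, key = url[len(base):].partition("/")
    if not table or not key:
        raise ValueError(f"{url} does not name a table and a key")
    return unquote(table), unquote(key)


# Hypothetical namespace for a CRM application:
CRM = "https://data.example.com/crm"
print(mint_url(CRM, "customer", 1042))
# prints https://data.example.com/crm/customer/1042
```

Because the mapping is reversible, the web layer can translate an incoming URL straight back into a lookup against the existing table and key.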

It’s a pragmatic, elegant solution - and one that scales with both your data and your organisation.
