Ontology as Factorisation
When we develop ontologies, we’re carefully crafting taxonomies, relationships and hierarchies. This is knowledge engineering. But at a deeper level, as we start to blend ontologies into AI, we’re also doing something mathematically elegant: we’re projecting high-dimensional data into a lower-dimensional conceptual space, much like dimensionality-reduction techniques in linear algebra. We’re factorising data.
🔵 What Do I Mean by Factorisation?
In linear algebra or machine learning, factorisation is the process of breaking down a complex system into a set of simpler, lower-dimensional components. It’s how we go from messy, high-dimensional data to something more structured and usable - for instance, latent features in a matrix factorisation model.
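To make that concrete, here's a minimal sketch of matrix factorisation using a truncated SVD in plain NumPy. The ratings matrix and the choice of two latent dimensions are purely illustrative:

```python
import numpy as np

# A small, illustrative user-by-item matrix (e.g. ratings).
X = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

# Truncated SVD: keep only the top-k singular values/vectors.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each row of `latent` describes a user with k latent features
# instead of the original 4 raw columns.
latent = U_k * s_k

# The rank-k product approximates X from far fewer numbers.
X_hat = latent @ Vt_k
print(np.round(X_hat, 2))
```

Four raw columns become two latent features per user - the same compression move, done numerically.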
Ontology achieves a similar compression, but through abstraction and discretisation rather than matrix algebra. The first step is deciding what matters. What are the meaningful concepts we care about? What should we be paying attention to? This act of naming - of defining ontological classes - is not just descriptive. It's selective. It's a cognitive filter.
Once you’ve made those choices, you’re effectively projecting the chaotic surface of your data onto a smaller, more meaningful subspace - a conceptual lens. This is your factorised view of the world.
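In code, that projection step might look something like this - the record, the field names and the mapping are all invented for illustration, but the shape of the move is the point:

```python
# A raw record: lots of incidental fields, high "dimensionality".
raw_record = {
    "row_id": 88213, "src_file": "export_07.csv",
    "ts": "2024-03-02T11:14:00", "amt": 86.5, "cur": "GBP",
    "desc": "Taxi to client site", "loaded_by": "etl_user",
    "checksum": "9f1c2ab0", "col_17": None,
}

# The ontology is the choice of axes: which fields carry meaning.
CONCEPTUAL_LENS = {
    "amount": "amt",
    "currency": "cur",
    "description": "desc",
    "occurred_at": "ts",
}

def project(record: dict) -> dict:
    """Project a messy record onto the chosen conceptual subspace."""
    return {concept: record.get(field)
            for concept, field in CONCEPTUAL_LENS.items()}

print(project(raw_record))
# {'amount': 86.5, 'currency': 'GBP',
#  'description': 'Taxi to client site',
#  'occurred_at': '2024-03-02T11:14:00'}
```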
🔵 Ontological Classes as Features
Let’s say you’re working in tax law, healthcare, or finance. The raw data is sprawling - case notes, transaction logs, guidance manuals, APIs, spreadsheets. But once you define your ontological classes - Travel Expense, Employee, Business Purpose, or Diagnosis - you begin to compress that data into a smaller set of dimensions. These aren’t just labels. They’re axes of interpretation.
Your AI models now have something to hook into. Your data pipelines know what to extract, link, store and serve. You’ve constrained the entropy of your system, not by discarding information, but by organising it around meaning.
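As a sketch (the class names are borrowed from the examples above; their fields are my own assumptions), those axes of interpretation can be pinned down as plain typed structures:

```python
from dataclasses import dataclass
from enum import Enum

# Axes of interpretation: the names come from the domain,
# the fields here are illustrative assumptions.
class BusinessPurpose(Enum):
    CLIENT_MEETING = "client_meeting"
    TRAINING = "training"
    CONFERENCE = "conference"

@dataclass(frozen=True)
class Employee:
    employee_id: str
    department: str

@dataclass(frozen=True)
class TravelExpense:
    claimant: Employee
    amount_gbp: float
    purpose: BusinessPurpose

# A sprawling expense email reduces to one typed, linkable object -
# the pipeline now knows exactly what to extract, store and serve.
expense = TravelExpense(
    claimant=Employee("E-1042", "Audit"),
    amount_gbp=86.50,
    purpose=BusinessPurpose.CLIENT_MEETING,
)
print(expense)
```

Once the classes exist, the pipeline's job is well-defined: anything that can't be expressed along these axes is, by design, out of scope.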
🔵 Why This Matters for AI
LLMs are famously good at handling unstructured data. But their real potential shines when they’re coupled with structure, especially when that structure reflects your domain’s core distinctions.
A well-designed ontology acts as a kind of “feature engineering” for knowledge-centric AI. You’ve defined priors for your latent variables. You’ve chosen which concepts should anchor your interpretation, and you’ve factorised your data accordingly.
The result? Faster iteration, more explainable results, and a far more coherent internal representation of your domain.
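One way to picture the coupling: the ontology doubles as an extraction schema for the model. The sketch below assumes a generic call_llm function - a stand-in for whatever model client you use, not a real API:

```python
import json

# The ontology, flattened into an extraction schema (illustrative).
EXPENSE_SCHEMA = {
    "claimant_id": "string",
    "amount_gbp": "number",
    "purpose": ["client_meeting", "training", "conference"],
}

def extract_expense(note: str, call_llm) -> dict:
    """Ask the model to project free text onto the ontology's axes.

    `call_llm` is a hypothetical stand-in: any callable that takes a
    prompt string and returns the model's text completion.
    """
    prompt = (
        "Extract a JSON object matching this schema from the note.\n"
        f"Schema: {json.dumps(EXPENSE_SCHEMA)}\n"
        f"Note: {note}\n"
        "Return only the JSON object."
    )
    return json.loads(call_llm(prompt))

# The ontology constrains the model's output space: instead of free
# prose, you get values on the dimensions you chose up front.
```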
🔵 The Takeaway
Ontology isn’t just a documentation exercise or a knowledge management tool. It’s a strategic, high-leverage move in the data pipeline. When done well, it’s a way of compressing meaning, factorising chaos, and bringing clarity to your AI efforts.
If you’re serious about data-driven systems - especially those that aim to be intelligent - then ontology is not optional. It’s your starting point.