Meaning
The meaning of every Data Point must be directly resolvable to a machine-readable definition in verifiable formal logic.
A Data Point combines an object, identified by its Enterprise Knowledge Graph IRI (EKG/IRI), with the value of one of its properties in a given context. Data is therefore expressed at its most granular level, both for data-at-rest and for data-in-motion.
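As a minimal sketch, a Data Point can be modelled as an immutable record of an object IRI, a predicate IRI, a value and a context. The `DataPoint` class, the `explode` helper and the `https://ekg.example.org/...` IRIs are hypothetical illustrations, not part of any EKG specification. The sketch shows how one wide record breaks down into granular Data Points.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Hypothetical namespace, used only for illustration.
EX = "https://ekg.example.org/"


@dataclass(frozen=True)
class DataPoint:
    """One object, one property, one value, in one context (hypothetical model)."""
    subject: str    # EKG/IRI of the object
    predicate: str  # predicate IRI that carries the meaning
    value: Any
    context: str    # e.g. the IRI of the source dataset or named graph


def explode(subject: str, record: dict[str, Any], context: str) -> list[DataPoint]:
    """Break a wide record into one Data Point per (property, value) pair.

    Assumes the record's keys are already predicate IRIs; mapping raw
    column names to IRIs is the hard part and is not shown here.
    """
    return [DataPoint(subject, predicate, value, context)
            for predicate, value in record.items()
            if value is not None]  # an absent value yields no Data Point


row = {
    EX + "ontology/legalName": "Acme Ltd",
    EX + "ontology/countryOfIncorporation": EX + "country/GB",
    EX + "ontology/employeeCount": None,
}
for dp in explode(EX + "party/42", row, EX + "source/crm"):
    print(dp.predicate.rsplit("/", 1)[-1], "=", dp.value)
```

Because each Data Point stands alone, records from different sources can be merged simply by taking the union of their Data Points.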
The property itself is always an IRI, often called the predicate IRI, that refers to an object[1] representing "the meaning" of the given Data Point. This object has its own identity and is defined through further properties, based on formal logic, that allow information to be rigorously combined, queried and inferred. The statements that define properties in this way are called "axioms". Their vocabulary is standardized by the World Wide Web Consortium (W3C) in the Web Ontology Language (OWL), and they are grouped into "OWL ontologies" for management purposes.
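Because the predicate IRI is itself an object with its own definition, a program can look that definition up and apply its axioms. The sketch below is a deliberate simplification. It stores a few hypothetical property definitions in a plain dictionary and applies only one axiom type, `rdfs:subPropertyOf`, whose rule is that if P is a subproperty of Q, every (s, P, o) also implies (s, Q, o). The property names and the dictionary layout are assumptions. A real EKG would rely on an OWL reasoner or triple store rather than hand-written code.

```python
from __future__ import annotations

# Hypothetical ontology namespace; only the subPropertyOf rule follows RDFS/OWL.
ONT = "https://ekg.example.org/ontology/"

# Each property is itself an object, looked up by its predicate IRI.
PROPERTIES = {
    ONT + "hasCEO": {"label": "has CEO", "subPropertyOf": [ONT + "hasOfficer"]},
    ONT + "hasOfficer": {"label": "has officer", "subPropertyOf": [ONT + "relatedParty"]},
    ONT + "relatedParty": {"label": "related party", "subPropertyOf": []},
}


def super_properties(predicate: str) -> set[str]:
    """All properties implied by `predicate` (transitive closure of subPropertyOf)."""
    seen: set[str] = set()
    stack = list(PROPERTIES.get(predicate, {}).get("subPropertyOf", []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # the seen-set also guards against cycles
            seen.add(parent)
            stack.extend(PROPERTIES.get(parent, {}).get("subPropertyOf", []))
    return seen


def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Return the input triples plus every triple entailed by subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        for q in super_properties(p):
            inferred.add((s, q, o))
    return inferred


facts = {("party/42", ONT + "hasCEO", "person/7")}
for s, p, o in sorted(infer(facts)):
    print(s, PROPERTIES[p]["label"], o)
```

A query for "related party" now finds person/7 even though no source ever stated that relationship directly.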
Rationale
Expressing data at a granular level gives maximum flexibility to slice, dice, combine and aggregate it. Property definitions built on logic further strengthen the ability to combine and infer information. Because the properties are themselves objects that can be looked up, all data is self-defining and carries its meaning with it. As a result, there is no fixed schema for the EKG as a whole, and it can incorporate additional knowledge without disrupting what is already there.
Implications
Further discovery with subject matter experts and the creators of source systems is often needed to truly understand what a given set of data means and what can be inferred from it. In other words, you cannot rely on the name of a column in a spreadsheet. A deceptively simple column name such as "number of European customers" leaves open the meaning of "European", of "customer" and of timing (when does one start and stop being a customer?). Different sources may also interpret that same name differently. The payoff for this effort is consistency, accuracy and the ability to make sound business decisions.
Advanced
At higher levels of EKG/Platform maturity (levels 4 and 5), the term "Data Point" may refer to a more complex data structure, used "on the wire", that represents the Data Point more holistically and thereby supports Multiple Versions of the Truth (MVOT).
Because an EKG supports many datasets with overlapping information from multiple sources, there can be:
- multiple EKG identifiers (EKG/IRIs) for the same object
  - One object can have multiple identifiers that are linked together[2] and can therefore rightfully be addressed by any of them.
- multiple definitions of meaning
- multiple equal or different values coming from multiple sources
- multiple versions over time of those values (temporality)
Along each of these four axes (identity, meaning, source and temporality) there may be multiple options to choose from, even though, from a user's perspective, it is logically the same Data Point.
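For the identity axis, identifiers linked by `owl:sameAs` form equivalence classes, because `owl:sameAs` is reflexive, symmetric and transitive. A minimal sketch, assuming the links are already available as pairs of IRIs (all IRIs and the `SameAsIndex` class are hypothetical), groups them with a union-find structure so that any identifier resolves to one representative of its class. Choosing the lexicographically smallest IRI as the representative is an arbitrary illustrative choice; an EKG might instead prefer a designated master identifier.

```python
from __future__ import annotations


class SameAsIndex:
    """Union-find over IRIs linked by owl:sameAs (illustrative sketch only)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}  # IRI -> parent IRI in its tree

    def find(self, iri: str) -> str:
        """Return the representative IRI of iri's equivalence class."""
        self.parent.setdefault(iri, iri)  # unseen IRIs start as singletons
        root = iri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every IRI on the path directly at the root.
        while self.parent[iri] != root:
            next_iri = self.parent[iri]
            self.parent[iri] = root
            iri = next_iri
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record `a owl:sameAs b`; the smaller root IRI becomes the representative."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            low, high = sorted((ra, rb))
            self.parent[high] = low

    def aliases(self, iri: str) -> set[str]:
        """All known identifiers that denote the same object as iri."""
        root = self.find(iri)
        return {x for x in list(self.parent) if self.find(x) == root}


index = SameAsIndex()
index.same_as("https://sales.example.com/customer/981", "https://ekg.example.org/party/42")
index.same_as("https://erp.example.com/bp/0042", "https://sales.example.com/customer/981")
print(index.find("https://erp.example.com/bp/0042"))
```

Data Points recorded against any of the three identifiers can then be grouped under the same representative before values are compared.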
Advanced client applications, services or AIs can use these Data Points to perform last-minute computations "at the edge" that find the right value, from the right source and point in time, with the right quality for the given context.
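One way such an edge computation could look, as a sketch under stated assumptions: each candidate value carries its source, the date from which it is valid and a quality score. The `Candidate` fields, the `select_value` function, the trusted-source filter and the quality threshold are all hypothetical. The policy shown, where a newer version from any trusted source supersedes older ones, is only one possible choice; real selection rules would come from the context in which the data is consumed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class Candidate:
    """One of several values for the same logical Data Point (hypothetical shape)."""
    value: Any
    source: str        # IRI or CURIE of the source system or dataset
    valid_from: date   # start of the period in which the value holds
    quality: float     # assumed scale: 0.0 (unknown) to 1.0 (verified)


def select_value(candidates: list[Candidate], as_of: date,
                 trusted_sources: set[str], min_quality: float) -> Optional[Candidate]:
    """Pick the latest value valid at `as_of` from a trusted, good-enough source.

    Ties on valid_from are broken by the higher quality score.
    Returns None when no candidate satisfies the context.
    """
    eligible = [c for c in candidates
                if c.valid_from <= as_of
                and c.source in trusted_sources
                and c.quality >= min_quality]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.valid_from, c.quality))


candidates = [
    Candidate(120, "src:crm", date(2023, 1, 1), 0.9),
    Candidate(135, "src:crm", date(2024, 1, 1), 0.9),
    Candidate(140, "src:spreadsheet", date(2024, 6, 1), 0.4),
]
best = select_value(candidates, as_of=date(2024, 3, 31),
                    trusted_sources={"src:crm", "src:spreadsheet"}, min_quality=0.5)
print(best.value if best else "no value")
```

Changing only the context (the `as_of` date, the trusted sources or the quality threshold) yields a different but equally valid answer for the same logical Data Point, which is what supporting Multiple Versions of the Truth requires.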
[1] The official term in the W3C RDF standard for this object is "resource".
[2] For instance via owl:sameAs, i.e. "Individual Equality".
[3] See Equivalent Object Properties or Equivalent Data Properties.
[4] See Data Subproperties.