Self-describing
An EKG is composed of a set of self-describing datasets (SDDs) that provide information about lineage, provenance, pedigree, maturity, quality, licensing and governance.
As specified in principle 2, the properties in each Data Point are linked to their definitions, so their meaning is not in doubt. A dataset definition supplements this with management information such as the dataset's pedigree (how and when was it derived or sourced?) and its provenance (where and from whom did it come?). This applies whether the information is maintained in the EKG itself, accessed or loaded from existing enterprise systems (data at rest), or received as data streams or messages (data in motion).
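As a rough illustration only, the management information carried by a dataset definition could be modelled as a simple record. The sketch below is not part of the EKG principles: the class `DatasetDescription`, its field names, the helper `missing_metadata`, and the example identifier `urn:example:dataset:trades` are all hypothetical, chosen to mirror the facets listed above. A real EKG would more likely hold this information as graph data than as Python objects.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetDescription:
    """Hypothetical descriptor for one self-describing dataset (SDD)."""
    dataset_id: str
    lineage: Optional[str] = None
    provenance: Optional[str] = None   # where / who did it come from?
    pedigree: Optional[str] = None     # how / when was it derived or sourced?
    maturity: Optional[str] = None
    quality: Optional[str] = None
    licensing: Optional[str] = None
    governance: Optional[str] = None


def missing_metadata(desc: DatasetDescription) -> List[str]:
    """Return the names of the facets that have not been filled in yet."""
    return [f.name for f in fields(desc) if getattr(desc, f.name) is None]


# Example values are invented for illustration.
trades = DatasetDescription(
    dataset_id="urn:example:dataset:trades",
    provenance="Trading desk, example order management system",
    pedigree="Extracted nightly by batch export",
    licensing="Internal use only",
)

# Keyed by identifier, the descriptions double as a small catalog.
catalog = {trades.dataset_id: trades}

print(missing_metadata(trades))
```

Listing the unfilled facets this way is one simple means of spotting datasets, particularly externally sourced ones, whose descriptions still need to be completed.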
Rationale
This information is essential for selecting data, enforcing policy, and managing the ecosystem as a whole. Taken together, the dataset definitions also form a knowledge catalog of the EKG's content.
Implications
The information needs to be maintained and made available on an ongoing basis. It also needs to be sought out for externally sourced data, whether that data is accessed in place or loaded into an EKG platform.