EDP Capability Architecture
An enterprise data platform (EDP) is not just a storage and transformation stack. It is a governed capability system with explicit operational, control, and service responsibilities.
Executive Summary
- An EDP must deliver 7 distinct capability groups spanning acquisition, management, processing, storage, access, governance, and observability.
- Capabilities are durable design concerns. Components (tools, products, services) are replaceable implementations of those capabilities.
- Most platform failures are not technology failures. They are capability gaps -- missing lineage, absent quality controls, no reprocessing path, no cost visibility.
- This model works as a gap assessment, vendor evaluation framework, team structure blueprint, and maturity scorecard.
- If you cannot name the component that delivers a capability, you have a gap. If you have a component but no one owns it, you have a risk.
graph TB
subgraph "Data Acquisition"
A1[Batch] --- A2[CDC] --- A3[Events] --- A4[Files] --- A5[3rd Party]
end
subgraph "Data Management"
M1[Schema] --- M2[Metadata] --- M3[Lineage] --- M4[Quality] --- M5[Historization]
end
subgraph "Core Processing"
P1[Transform] --- P2[Orchestrate] --- P3[Execute] --- P4[Isolate] --- P5[Reprocess]
end
subgraph "Storage and Modeling"
S1[Raw Zone] --- S2[Curated Zone] --- S3[Semantic Layer] --- S4[History] --- S5[Serving]
end
subgraph "Access and Consumption"
C1[BI] --- C2[Self-Service] --- C3[APIs] --- C4[ML Features] --- C5[Export]
end
subgraph "Governance and Control"
G1[Access] --- G2[Masking] --- G3[Audit] --- G4[Contracts] --- G5[Policy]
end
subgraph "Observability"
O1[Freshness] --- O2[Pipelines] --- O3[Cost] --- O4[SLAs] --- O5[Incidents]
end
A1 -.-> M1
M1 -.-> P1
P1 -.-> S1
S1 -.-> C1
C1 -.-> G1
G1 -.-> O1

The Capability Model
Seven capability groups. Each group contains sub-capabilities with a single responsibility. Every sub-capability must have an owner, an implementation, and a measurable outcome.
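The owner, implementation, and outcome requirement can be made checkable. The following is a minimal sketch, not a prescribed schema: the `SubCapability` class, its field names, and the sample entries (team names, tools, targets) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubCapability:
    """One sub-capability and the three things the model requires of it."""
    group: str
    name: str
    owner: Optional[str] = None           # accountable team or person
    implementation: Optional[str] = None  # component that delivers it
    outcome: Optional[str] = None         # measurable outcome, e.g. an SLO

    def missing(self) -> list[str]:
        """Return the required attributes that are still unset."""
        required = {
            "owner": self.owner,
            "implementation": self.implementation,
            "outcome": self.outcome,
        }
        return [field for field, value in required.items() if not value]


# Illustrative entries only; team names, tools, and targets are made up.
registry = [
    SubCapability("Data Management", "Lineage tracking",
                  owner="platform-team", implementation="OpenLineage",
                  outcome="95% of curated tables have upstream lineage"),
    SubCapability("Observability", "Cost observability",
                  implementation="billing export dashboard"),
]

for cap in registry:
    gaps = cap.missing()
    status = "complete" if not gaps else "missing " + ", ".join(gaps)
    print(f"{cap.group} / {cap.name}: {status}")
```

An entry with an implementation but no owner, like the second one here, is the "risk" case from the summary: something delivers the capability, but nobody is accountable for it.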
1. Data Acquisition
| Sub-Capability | Description |
|---|---|
| Batch ingestion | Scheduled extraction from source systems (files, databases, APIs) |
| Change data capture (CDC) | Real-time or near-real-time capture of source system changes |
| Event ingestion | Streaming event consumption from message backbones |
| File onboarding | Structured and semi-structured file processing (CSV, JSON, Parquet, XML) |
| Third-party data onboarding | External data vendor integration with quality validation |
2. Data Management
| Sub-Capability | Description |
|---|---|
| Schema evolution | Handling schema changes without breaking downstream consumers |
| Metadata capture | Automated collection of technical, operational, and business metadata |
| Lineage tracking | End-to-end traceability from source to consumption |
| Quality controls | Automated validation at each refinement stage (completeness, accuracy, timeliness) |
| Retention management | Policy-driven data lifecycle (hot/warm/cold/archive/delete) |
| Historization | Time-variant data preservation (SCD Type 2, append-only, bitemporal) |
| Reconciliation | Cross-source and cross-layer data consistency verification |
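To make historization concrete, here is a small in-memory sketch of the SCD Type 2 pattern named in the table: a change closes the current version and appends a new one, so earlier states stay queryable. The `Version` class, the half-open validity interval, and the sample customer data are assumptions for illustration; a real platform would implement this in its warehouse or table format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Version:
    key: str
    attributes: dict
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: list[Version], key: str, attributes: dict,
                 effective: date) -> None:
    """Close the current version of `key` and append a new one if attributes changed."""
    current = next((v for v in history if v.key == key and v.valid_to is None), None)
    if current is not None:
        if current.attributes == attributes:
            return  # no change, nothing to historize
        current.valid_to = effective  # close the old version; never overwrite it
    history.append(Version(key, dict(attributes), effective))


def as_of(history: list[Version], key: str, when: date) -> Optional[dict]:
    """Return the attributes valid for `key` on `when` (interval is [valid_from, valid_to))."""
    for v in history:
        if v.key == key and v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v.attributes
    return None


history: list[Version] = []
apply_change(history, "cust-1", {"tier": "silver"}, date(2024, 1, 1))
apply_change(history, "cust-1", {"tier": "gold"}, date(2024, 6, 1))
apply_change(history, "cust-1", {"tier": "gold"}, date(2024, 7, 1))  # ignored: unchanged

print(as_of(history, "cust-1", date(2024, 3, 15)))
```

The point-in-time lookup is what separates historization from simple overwrite: the March query still sees the value that was true in March.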
3. Core Platform Processing
| Sub-Capability | Description |
|---|---|
| Transformation | Data cleaning, conforming, aggregation, and business logic execution |
| Orchestration | Workflow scheduling, dependency management, and retry logic |
| Pipeline execution | Reliable, scalable compute for batch and micro-batch workloads |
| Workload isolation | Preventing one workload from degrading another |
| Reprocessing and recovery | Ability to replay and rebuild any layer from source |
| Rules execution | Business rule application for derived calculations and classifications |
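The orchestration row combines two mechanisms, dependency ordering and retry. The sketch below shows both using the standard-library `graphlib` module. The `run_pipeline` function and the task names are hypothetical, it assumes every dependency is itself a key in `tasks`, and it omits the scheduling, backoff, and parallelism a real orchestrator provides.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(tasks: dict[str, Callable[[], None]],
                 depends_on: dict[str, set[str]],
                 max_attempts: int = 3) -> list[str]:
    """Run tasks in dependency order, retrying each failed task up to max_attempts."""
    # Tasks with no declared dependencies still need to be in the graph.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    completed = []
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; downstream tasks never start
    return completed


calls = {"load": 0}


def flaky_load() -> None:
    """Fails on the first call to simulate a transient error."""
    calls["load"] += 1
    if calls["load"] < 2:
        raise RuntimeError("transient failure")


order = run_pipeline(
    tasks={"extract": lambda: None, "load": flaky_load, "publish": lambda: None},
    depends_on={"load": {"extract"}, "publish": {"load"}},
)
print(order, calls)
```

Because a task that exhausts its retries re-raises, nothing downstream of it runs, which is the behavior dependency management is meant to guarantee.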
4. Storage and Modeling
| Sub-Capability | Description |
|---|---|
| Raw/landing zone | Immutable capture of source data in native format |
| Curated/integrated zone | Cleansed, conformed, deduplicated datasets |
| Semantic/business layer | Business-ready models, metrics, and governed views |
| Time-variant history | Full change history for audit, regulatory, and analytical use |
| Analytical serving structures | Pre-aggregated, denormalized datasets optimized for query patterns |
5. Access and Consumption
| Sub-Capability | Description |
|---|---|
| BI/reporting access | Governed access for dashboards and reporting tools |
| Self-service query | Ad-hoc SQL access for analysts and data scientists |
| Governed APIs and data sharing | Controlled data exposure to external consumers |
| ML feature access | Integration with feature stores for model training and inference |
| Downstream publish/export | Reverse ETL and operational sync to downstream systems |
6. Governance and Control
| Sub-Capability | Description |
|---|---|
| Access control | Role-based and attribute-based access at column and row level |
| Data masking and tokenization | PII protection for non-production and limited-access use |
| Auditability | Comprehensive logging of data access, transformations, and changes |
| Data contracts | Formal schema, quality, and SLA agreements between producers and consumers |
| Stewardship hooks | Integration points for human review, approval, and escalation |
| Policy enforcement | Automated application of governance rules across all layers |
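A data contract bundles the three kinds of agreement listed in the table: schema, quality, and SLA. As a rough illustration, the sketch below checks one delivery against all three. The `DataContract` fields, the `violations` function, and the sample order data are hypothetical; dedicated contract tooling would carry far more detail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    columns: dict[str, type]    # schema: column name -> expected Python type
    required: frozenset[str]    # quality: columns that may not be null
    max_staleness: timedelta    # SLA: maximum age of a delivery


def violations(contract: DataContract, rows: list[dict],
               delivered_at: datetime, now: datetime) -> list[str]:
    """Return human-readable contract violations for one delivery."""
    problems = []
    for i, row in enumerate(rows):
        for col, expected in contract.columns.items():
            if col not in row:
                problems.append(f"row {i}: missing column '{col}'")
                continue
            value = row[col]
            if value is None:
                if col in contract.required:
                    problems.append(f"row {i}: '{col}' is null")
            elif not isinstance(value, expected):
                problems.append(f"row {i}: '{col}' is not {expected.__name__}")
    if now - delivered_at > contract.max_staleness:
        problems.append("delivery is older than the agreed SLA")
    return problems


contract = DataContract(
    columns={"order_id": int, "amount": float, "coupon": str},
    required=frozenset({"order_id", "amount"}),
    max_staleness=timedelta(hours=24),
)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "amount": 9.5, "coupon": None},       # valid: coupon is optional
    {"order_id": 2, "amount": None, "coupon": "SPRING"},  # quality violation
    {"order_id": "3", "amount": 4.0, "coupon": None},     # schema violation
]
issues = violations(contract, rows, delivered_at=now - timedelta(hours=30), now=now)
for issue in issues:
    print(issue)
```

Returning a list rather than raising on the first failure lets the producer see every violation in a delivery at once.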
7. Observability and Platform Operations
| Sub-Capability | Description |
|---|---|
| Freshness monitoring | Tracking data age against SLA thresholds |
| Pipeline monitoring | Job success, failure, duration, and resource utilization |
| Cost observability | Real-time and historical cost tracking by workload, domain, and data product |
| SLA/SLO monitoring | Automated tracking of platform service level commitments |
| Incident management | Detection, classification, routing, and resolution of platform issues |
| Platform telemetry | Usage metrics, adoption tracking, and capacity signals |
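Freshness monitoring reduces to comparing each dataset's age with its SLA threshold. A minimal sketch follows, assuming load timestamps are already available from pipeline metadata; the `freshness_report` function, the dataset names, and the thresholds are illustrative.

```python
from datetime import datetime, timedelta, timezone


def freshness_report(last_loaded: dict[str, datetime],
                     sla: dict[str, timedelta],
                     now: datetime) -> dict[str, str]:
    """Classify each dataset as 'fresh', 'stale', or 'no_sla' by comparing age to its threshold."""
    report = {}
    for dataset, loaded_at in last_loaded.items():
        threshold = sla.get(dataset)
        if threshold is None:
            report[dataset] = "no_sla"  # untracked datasets are surfaced, not ignored
        elif now - loaded_at > threshold:
            report[dataset] = "stale"
        else:
            report[dataset] = "fresh"
    return report


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
report = freshness_report(
    last_loaded={
        "orders": now - timedelta(hours=2),
        "customers": now - timedelta(hours=30),
        "clickstream": now - timedelta(minutes=5),
    },
    sla={"orders": timedelta(hours=6), "customers": timedelta(hours=24)},
    now=now,
)
print(report)
```

Reporting `no_sla` separately matters: a dataset with no threshold is a missing commitment, which this model treats as a gap and not as a pass.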
Capabilities vs Components
A capability is what the platform must do. A component is one way to do it. Confusing the two is how platforms end up locked to a vendor with no understanding of what they actually need.
| Capability | What the platform must do | Example components (possible realizations) |
|---|---|---|
| Lineage | End-to-end data traceability | Dataplex, Unity Catalog, OpenLineage |
| Historization | Time-variant data preservation | SCD Type 2 tables, Data Vault satellites |
| Policy enforcement | Automated governance rules | Column-level security policies, tag-based masking |
| Orchestration | Workflow scheduling and dependency | Airflow, Cloud Composer, Dagster |
| Quality controls | Automated data validation | Great Expectations, dbt tests, Soda |
| Transformation | Data cleaning and business logic | dbt, Spark, Dataform |
| Freshness monitoring | Data age tracking | Custom dashboards, Monte Carlo, Bigeye |
Capabilities are durable. Components change. Design for the capability, select the component.
How to Use This Model
Gap assessment. Walk through all 7 groups and the 40 sub-capabilities listed above. For each one, name the component that delivers it. If you cannot, you have a gap. Gaps do not fix themselves.
Vendor evaluation. When evaluating a platform vendor or tool, map their feature set to these capabilities. Most vendors cover 3-4 groups well and leave the rest to you. Know what you are buying and what you still need to build or integrate.
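The mapping step can be sketched as a set comparison per capability group. This example uses only two of the seven groups for brevity; the `coverage` function and the vendor's claimed feature set are hypothetical.

```python
# Two groups from the model above; a real assessment would list all seven.
MODEL: dict[str, set[str]] = {
    "Data Acquisition": {
        "Batch ingestion", "Change data capture (CDC)", "Event ingestion",
        "File onboarding", "Third-party data onboarding",
    },
    "Governance and Control": {
        "Access control", "Data masking and tokenization", "Auditability",
        "Data contracts", "Stewardship hooks", "Policy enforcement",
    },
}


def coverage(model: dict[str, set[str]],
             vendor_covers: set[str]) -> dict[str, tuple[float, list[str]]]:
    """For each group, return (fraction covered, sub-capabilities left to build or integrate)."""
    report = {}
    for group, subs in model.items():
        covered = subs & vendor_covers
        report[group] = (len(covered) / len(subs), sorted(subs - covered))
    return report


# Hypothetical vendor: strong on ingestion, partial on governance.
vendor_covers = {
    "Batch ingestion", "Change data capture (CDC)", "Event ingestion",
    "File onboarding", "Third-party data onboarding",
    "Access control", "Auditability", "Policy enforcement",
}

report = coverage(MODEL, vendor_covers)
for group, (fraction, missing) in report.items():
    print(f"{group}: {fraction:.0%} covered; still needed: {missing or 'nothing'}")
```

The list of uncovered sub-capabilities is the useful output: it is the build-or-integrate backlog that comes with the purchase.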
Team structure. Each capability group suggests a team ownership boundary. Data acquisition is not the same team as governance. Observability is not the same team as transformation. Align teams to capability groups, not to tools.
Maturity assessment. For each sub-capability, score your current state: (0) absent, (1) manual/ad-hoc, (2) partially automated, (3) fully automated with monitoring. The distribution tells you where to invest. A platform that scores 3 on transformation but 0 on lineage is not mature -- it is blind.
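The scorecard can be summarized in a few lines. The sketch below computes the score distribution, a per-group mean, and the list of absent sub-capabilities. The `summarize` function and the sample scores are made up for illustration, and the per-group mean is one possible aggregation, not something the model prescribes.

```python
from collections import Counter
from statistics import mean

LEVELS = {
    0: "absent",
    1: "manual/ad-hoc",
    2: "partially automated",
    3: "fully automated with monitoring",
}


def summarize(scores: dict[tuple[str, str], int]):
    """Return (score distribution, mean score per group, sub-capabilities scored 0)."""
    for key, score in scores.items():
        if score not in LEVELS:
            raise ValueError(f"{key}: score must be 0-3, got {score}")
    distribution = Counter(scores.values())
    by_group: dict[str, list[int]] = {}
    for (group, _name), score in scores.items():
        by_group.setdefault(group, []).append(score)
    group_mean = {group: mean(values) for group, values in by_group.items()}
    absent = sorted(name for (_group, name), score in scores.items() if score == 0)
    return distribution, group_mean, absent


# Hypothetical scores keyed by (group, sub-capability).
scores = {
    ("Core Platform Processing", "Transformation"): 3,
    ("Core Platform Processing", "Orchestration"): 2,
    ("Data Management", "Lineage tracking"): 0,
    ("Data Management", "Quality controls"): 1,
}

distribution, group_mean, absent = summarize(scores)
for level, label in LEVELS.items():
    print(f"{level} ({label}): {distribution[level]}")
print(group_mean)
print("absent:", absent)
```

Listing the zero-scored items separately keeps a strong average from hiding them, which is the failure mode the transformation-versus-lineage example describes.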