EDP Capability Architecture¶

An enterprise data platform is not just a storage and transformation stack. It is a governed capability system with explicit operational, control, and service responsibilities.

Executive Summary¶

An EDP must deliver 7 distinct capability groups spanning acquisition, management, processing, storage, access, governance, and observability
Capabilities are durable design concerns. Components (tools, products, services) are replaceable implementations of those capabilities.
Most platform failures are not technology failures. They are capability gaps -- missing lineage, absent quality controls, no reprocessing path, no cost visibility.
This model works as a gap assessment, vendor evaluation framework, team structure blueprint, and maturity scorecard.
If you cannot name the component that delivers a capability, you have a gap. If you have a component but no one owns it, you have a risk.

graph TB
    subgraph "Data Acquisition"
        A1[Batch] --- A2[CDC] --- A3[Events] --- A4[Files] --- A5[3rd Party]
    end
    subgraph "Data Management"
        M1[Schema] --- M2[Metadata] --- M3[Lineage] --- M4[Quality] --- M5[Historization]
    end
    subgraph "Core Processing"
        P1[Transform] --- P2[Orchestrate] --- P3[Execute] --- P4[Isolate] --- P5[Reprocess]
    end
    subgraph "Storage and Modeling"
        S1[Raw Zone] --- S2[Curated Zone] --- S3[Semantic Layer] --- S4[History] --- S5[Serving]
    end
    subgraph "Access and Consumption"
        C1[BI] --- C2[Self-Service] --- C3[APIs] --- C4[ML Features] --- C5[Export]
    end
    subgraph "Governance and Control"
        G1[Access] --- G2[Masking] --- G3[Audit] --- G4[Contracts] --- G5[Policy]
    end
    subgraph "Observability"
        O1[Freshness] --- O2[Pipelines] --- O3[Cost] --- O4[SLAs] --- O5[Incidents]
    end

    A1 -.-> M1
    M1 -.-> P1
    P1 -.-> S1
    S1 -.-> C1
    C1 -.-> G1
    G1 -.-> O1

The Capability Model¶

Seven capability groups. Each group contains sub-capabilities with a single responsibility. Every sub-capability must have an owner, an implementation, and a measurable outcome.

1. Data Acquisition¶

Sub-Capability	Description
Batch ingestion	Scheduled extraction from source systems (files, databases, APIs)
Change data capture (CDC)	Real-time or near-real-time capture of source system changes
Event ingestion	Streaming event consumption from message backbones
File onboarding	Structured and semi-structured file processing (CSV, JSON, Parquet, XML)
Third-party data onboarding	External data vendor integration with quality validation

2. Data Management¶

Sub-Capability	Description
Schema evolution	Handling schema changes without breaking downstream consumers
Metadata capture	Automated collection of technical, operational, and business metadata
Lineage tracking	End-to-end traceability from source to consumption
Quality controls	Automated validation at each refinement stage (completeness, accuracy, timeliness)
Retention management	Policy-driven data lifecycle (hot/warm/cold/archive/delete)
Historization	Time-variant data preservation (SCD Type 2, append-only, bitemporal)
Reconciliation	Cross-source and cross-layer data consistency verification

3. Core Platform Processing¶

Sub-Capability	Description
Transformation	Data cleaning, conforming, aggregation, and business logic execution
Orchestration	Workflow scheduling, dependency management, and retry logic
Pipeline execution	Reliable, scalable compute for batch and micro-batch workloads
Workload isolation	Preventing one workload from degrading another
Reprocessing and recovery	Ability to replay and rebuild any layer from source
Rules execution	Business rule application for derived calculations and classifications

4. Storage and Modeling¶

Sub-Capability	Description
Raw/landing zone	Immutable capture of source data in native format
Curated/integrated zone	Cleansed, conformed, deduplicated datasets
Semantic/business layer	Business-ready models, metrics, and governed views
Time-variant history	Full change history for audit, regulatory, and analytical use
Analytical serving structures	Pre-aggregated, denormalized datasets optimized for query patterns

5. Access and Consumption¶

Sub-Capability	Description
BI/reporting access	Governed access for dashboards and reporting tools
Self-service query	Ad-hoc SQL access for analysts and data scientists
Governed APIs and data sharing	Controlled data exposure to external consumers
ML feature access	Integration with feature stores for model training and inference
Downstream publish/export	Reverse ETL and operational sync to downstream systems

6. Governance and Control¶

Sub-Capability	Description
Access control	Role-based and attribute-based access at column and row level
Data masking and tokenization	PII protection for non-production and limited-access use
Auditability	Comprehensive logging of data access, transformations, and changes
Data contracts	Formal schema, quality, and SLA agreements between producers and consumers
Stewardship hooks	Integration points for human review, approval, and escalation
Policy enforcement	Automated application of governance rules across all layers

7. Observability and Platform Operations¶

Sub-Capability	Description
Freshness monitoring	Tracking data age against SLA thresholds
Pipeline monitoring	Job success, failure, duration, and resource utilization
Cost observability	Real-time and historical cost tracking by workload, domain, and data product
SLA/SLO monitoring	Automated tracking of platform service level commitments
Incident management	Detection, classification, routing, and resolution of platform issues
Platform telemetry	Usage metrics, adoption tracking, and capacity signals

Capabilities vs Components¶

A capability is what the platform must do. A component is one way to do it. Confusing the two is how platforms end up locked to a vendor with no understanding of what they actually need.

Concept	Capability	Component (one realization)
Lineage	End-to-end data traceability	Dataplex, Unity Catalog, OpenLineage
Historization	Time-variant data preservation	SCD Type 2 tables, Data Vault satellites
Policy enforcement	Automated governance rules	Column-level security policies, tag-based masking
Orchestration	Workflow scheduling and dependency	Airflow, Cloud Composer, Dagster
Quality controls	Automated data validation	Great Expectations, dbt tests, Soda
Transformation	Data cleaning and business logic	dbt, Spark, Dataform
Freshness monitoring	Data age tracking	Custom dashboards, Monte Carlo, Bigeye

Capabilities are durable. Components change. Design for the capability, select the component.

How to Use This Model¶

Gap assessment. Walk through all 7 groups and 35+ sub-capabilities. For each one, name the component that delivers it. If you cannot, you have a gap. Gaps do not fix themselves.

Vendor evaluation. When evaluating a platform vendor or tool, map their feature set to these capabilities. Most vendors cover 3-4 groups well and leave the rest to you. Know what you are buying and what you still need to build or integrate.

Team structure. Each capability group suggests a team ownership boundary. Data acquisition is not the same team as governance. Observability is not the same team as transformation. Align teams to capability groups, not to tools.

Maturity assessment. For each sub-capability, score your current state: (0) absent, (1) manual/ad-hoc, (2) partially automated, (3) fully automated with monitoring. The distribution tells you where to invest. A platform that scores 3 on transformation but 0 on lineage is not mature -- it is blind.