Enterprise Data Foundation for a Tier-1 Bank
Multi-year program
Context. A global bank needed to replace fragmented data pipelines with a unified, auditable data platform capable of supporting regulatory reporting, analytics, and AI use cases. The existing estate was a mix of legacy ETL, ad hoc scripts, and disconnected data stores with no consistent lineage.
Approach. Designed a three-layer architecture: ingestion (Dataflow + Pub/Sub), orchestration (Cloud Composer), and modeling (dbt with Data Vault 2.0 on BigQuery). Data Vault was chosen for its insert-only pattern and mandatory load metadata (a load timestamp and record source on every row). Together these enable the point-in-time reconstruction and full source traceability required by BCBS 239 and DORA.
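The following is a minimal in-memory sketch of how insert-only storage plus load metadata enables point-in-time reconstruction. It is not the program's dbt models. `SatRow`, `Satellite`, `hash_key`, and the example keys are hypothetical. The MD5 hash of a normalized business key follows a common Data Vault 2.0 convention and is an assumption here, since the source does not say how keys were built.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class SatRow:
    hub_key: str        # hash of the business key (hypothetical convention)
    load_ts: datetime   # mandatory metadata: when the row was loaded
    record_source: str  # mandatory metadata: originating system
    attrs: tuple        # descriptive attributes, e.g. (("status", "open"),)


def hash_key(business_key: str) -> str:
    # Normalize before hashing so " acc-001 " and "ACC-001" map to one key
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


class Satellite:
    def __init__(self) -> None:
        self._rows: List[SatRow] = []

    def insert(self, row: SatRow) -> None:
        # Insert-only: history is appended, never updated or deleted
        self._rows.append(row)

    def as_of(self, hub_key: str, ts: datetime) -> Optional[SatRow]:
        # Point-in-time view: latest row for the key loaded at or before ts
        candidates = [r for r in self._rows if r.hub_key == hub_key and r.load_ts <= ts]
        return max(candidates, key=lambda r: r.load_ts, default=None)


sat = Satellite()
k = hash_key("ACC-001")
sat.insert(SatRow(k, datetime(2024, 1, 1), "core_banking", (("status", "open"),)))
sat.insert(SatRow(k, datetime(2024, 3, 1), "core_banking", (("status", "frozen"),)))
print(sat.as_of(k, datetime(2024, 2, 1)).attrs)  # state as it stood in February
```

Because no row is ever overwritten, the February view remains reproducible after the March change, and each row still names the system it came from.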
Key decisions. CMEK encryption for all data at rest. Private IP networking with no public endpoints. Column-level security via BigQuery policy tags. Terraform modules for reproducible infrastructure. Incremental loading with a merge strategy, so that rerunning a load produces the same result rather than duplicate rows.
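The idempotency property can be illustrated with a small sketch of a keyed merge restricted to its insert branch, as an insert-only Data Vault table would use it. It models the semantics of a `MERGE ... WHEN NOT MATCHED THEN INSERT` keyed on a unique key. The function name, the `("hub_key", "hashdiff")` key, and the row shapes are hypothetical, and the program's actual dbt configuration is not described in the source.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, str]


def merge_insert_only(
    target: Dict[Tuple[str, ...], Row],
    batch: Iterable[Row],
    unique_key: Tuple[str, ...] = ("hub_key", "hashdiff"),  # hypothetical key
) -> int:
    """Insert rows whose unique key is absent from target; return the insert count."""
    inserted = 0
    for row in batch:
        key = tuple(row[col] for col in unique_key)
        if key not in target:  # WHEN NOT MATCHED THEN INSERT
            target[key] = dict(row)
            inserted += 1
        # WHEN MATCHED: no action, so replaying a batch leaves target unchanged
    return inserted


target: Dict[Tuple[str, ...], Row] = {}
batch = [
    {"hub_key": "a", "hashdiff": "h1", "status": "open"},
    {"hub_key": "b", "hashdiff": "h2", "status": "open"},
]
first = merge_insert_only(target, batch)
replay = merge_insert_only(target, batch)  # simulates a retried orchestration run
print(first, replay, len(target))
```

A retried orchestration task can therefore rerun the same batch safely, because the second run inserts nothing.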
Outcome. Unified data platform serving multiple business domains, with full lineage from source to report and regulatory examination readiness built into the architecture rather than bolted on.