For the ETL engineer¶
Audience — the engineer who owns the projection from upstream source systems into the two shared base tables at Your Institution.
What you do today¶
You run a load — nightly batch, hourly micro-batch, or streaming near-real-time, depending on the upstream system — and the data lands somewhere downstream that operators / executives / compliance look at. When something goes wrong, the symptom is rarely "ETL failed" (that's the easy case, which alerts catch); it's usually "the dashboard showed wrong numbers" or "we have drift on this account" or "this transfer is missing its counterpart leg."
The hard case is silent: the load ran, the row counts look right, but a metadata key got left blank, an external counterparty's leg didn't make it across, or a force-posted transfer wasn't tagged as such. The downstream sheets render as exceptions; the operators escalate; you reverse-engineer which load step was responsible.
What this tool does differently¶
Every dashboard reads off two tables: transactions (one row per
posting leg) and daily_balances (one row per account-day). Every
sheet — every L1 invariant, every L2 hygiene check, every
Investigation question — projects off those two. Your ETL contract
is a stable, narrow column shape (Schema v6);
your debug surface is the same Today's Exceptions / Daily
Statement / Transactions chain the operators use.
When a load goes silently wrong, the L1 dashboard surfaces it as a specific row on a specific sheet. The check_type names the class of failure — and tells you which load step to audit:
- Drift
- Overdraft
- Limit Breach
- Stuck Pending
- Stuck Unbundled
- Supersession
The first time you ship a load fix and watch the corresponding row disappear from Today's Exceptions on tomorrow's run — that's the proof your ETL is observable, not just running.
What we are not asking you to learn¶
- Not a new schema. The two base tables are the contract; if you can write to them in the right shape, every dashboard works.
- Not the dashboard internals. You don't author visuals or filters or drills. The dashboards are generated from the L2 YAML by the integrator role; you're upstream of that.
- Not every L2 primitive. You need to know which metadata keys your loads are responsible for setting. The L2 YAML names them; the integrator can tell you which ones are load-time vs. set-by-downstream-process.
How to start¶
- Read the Data Integration handbook. It covers the two-table contract, the metadata keys, the matview refresh sequence, and the idempotency / supersession rules that let you safely re-run a load.
- Read Schema v6 — Data Feed Contract. It's the column-by-column reference for the two base tables. Treat it as the authoritative source for any "what type does this field need to be?" question.
- Walk the recipes in Walkthroughs → ETL in this order:
- Bookmark What to do when demo passes but prod fails?. It's the canonical first-stop for "the loader works locally, but doesn't on real data" debug arcs.
Your daily routine¶
- Verify the previous business day's load landed: open the Info sheet on any deployed dashboard. It carries a deploy stamp + matview row counts. Counts at zero or unchanged from yesterday means your load didn't run (or ran but landed in the wrong schema).
- Spot-check a known busy account on Daily Statement. The per-(account, day) walk should show your loaded postings — opening + flow + closing should all reconcile.
- Watch Today's Exceptions for new violations that look ETL-shaped (drift on a stable account, missing counterparty leg, force-posted-without- internal-catchup, supersession trail).
- After any schema or metadata change, refresh all
spec_example_*matviews. The L1 invariant views don't auto- refresh; stale matviews are a frequent cause of "the dashboard shows yesterday's state."
The matview-refresh contract¶
Per the
Data Integration handbook, the refresh
sequence is dependency-ordered: base tables first, then
spec_example_drift / spec_example_overdraft / spec_example_limit_breach
/ spec_example_stuck_pending / spec_example_stuck_unbundled, then the
Investigation matviews
(spec_example_inv_pair_rolling_anomalies,
spec_example_inv_money_trail_edges), then the daily-statement
rollups. The CLI's refresh_matviews_sql(l2_instance) helper
emits the right ordering for any L2 instance — call it from your
load orchestrator after every transactions / daily_balances
write.
The concepts you'll want grounded¶
- Double-entry posting — the conservation invariant your loads must preserve. Any leg you drop, double-load, or sign-flip surfaces as drift.
- Sweep / net / settle — the daily cycle behind aggregating rails. Impacts how you batch your loads (you can't "load Monday's sweep on Wednesday" without confusing the matviews).
- Open vs. closed loop — the system-boundary question. Closed-loop legs you load both sides; open-loop you load the internal side and wait for the external counterparty's confirmation.
- Eventual consistency — why a stuck-pending row on a fresh load is not necessarily a bug; aging-bucket bands tell you when it becomes one.
What "good" looks like¶
After a few weeks of running the load against the dashboard:
- You're catching load anomalies from the L1 surface within a day of them happening, not from end-of-month reconciliation.
- New metadata keys you add show up immediately on the relevant L2 Flow Tracing sheets.
- When operators ask "is the load OK?" you answer from the Info sheet's matview row counts in 10 seconds.
- Load reruns + supersession traces produce zero net change on the L1 dashboard (the supersession audit catches what changed, the conservation check confirms net delta = 0).
That's the acceptance bar. The ETL works when the institution's downstream surfaces stay in-sync with the upstream feed without manual data-team intervention.