How do I populate spec_example_transactions from my core banking system?

Engineering walkthrough — Data Integration Team. Foundational.

The story

You've got an upstream core banking system with a gl_postings table (or its local equivalent — general_ledger.entry, accounting.posting_detail, etc.). It carries one row per posting leg already, which is the natural granularity of our spec_example_transactions table. You need to write the ETL job that lands it in our two-table schema by the morning cut so the four L2-fed dashboards (L1 Reconciliation, L2 Flow Tracing, Investigation, Executives) work.

The good news: it's mostly a column-rename. The contract is small (11 mandatory columns + a handful of conditional ones — see Schema_v6.md → ETL contract / minimum viable feed). The bad news: skip the wrong column and a downstream check goes silent. So this walkthrough covers the canonical projection plus the per-column failure modes.

The question

"For my core banking system's gl_postings table, what's the canonical projection that maps it to spec_example_transactions? What columns must I populate, and what columns can wait until v2?"

Where to look

Two reference points:

  • docs/Schema_v6.md — column-level contract and per-column failure modes ("If you skip this, what dashboard breaks?").
  • common/l2/schema.py::emit_schema(l2_instance) — the source of truth for the prefixed DDL. Every base table, view, and matview your dashboards read is emitted here, all under the L2 instance prefix. Call it from Python (or apply directly via quicksight-gen schema apply --execute) to see the rendered output for your L2 instance.

The spec_example prefix in this walkthrough's SQL is a placeholder for your L2 instance name; your ETL substitutes it once when wiring the projection.

What you'll see in the demo

Run (substitute your own L2 path for the bundled fixture below):

from quicksight_gen.common.l2.loader import load_instance
from quicksight_gen.common.l2.schema import emit_schema

l2 = load_instance("tests/l2/spec_example.yaml")  # bundled fixture; point at your own L2 YAML
print(emit_schema(l2)[:4000])                     # head of the emitted, prefix-substituted DDL

The first CREATE TABLE block is spec_example_transactions itself — the column list, types, and constraints your projection has to satisfy. The second is spec_example_daily_balances. Read both end-to-end before writing the projection; they're the contract.
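
For orientation, the transactions block has roughly this shape — an illustrative sketch only (column names are taken from the projection below; types, ordering, and constraints are assumptions), with the emit_schema output as the actual contract:

CREATE TABLE spec_example_transactions (
    transaction_id      text,          -- one row per posting leg (your PK)
    transfer_id         text,          -- groups the legs of one transfer
    transfer_type       text,
    origin              text,          -- e.g., InternalInitiated, ExternalForcePosted
    transfer_parent_id  text,          -- chained transfers only
    account_id          text,
    account_scope       text,          -- Internal / External
    amount_money        numeric,       -- signed: + into the account
    amount              numeric,       -- ABS(amount_money)
    status              text,          -- Posted / Pending / Failed
    posting             timestamptz,
    business_day_start  date,          -- denormalized from posting
    metadata            jsonb          -- extras container; 'source' key first
    -- ...plus account_name, account_role, memo, and the other conditional columns
);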

For an end-to-end mapping from core_banking.gl_postings → spec_example_transactions, see Example 1 in docs/Schema_v6.md (the SQL block under "Populating customer DDA postings from core banking"). It's a real INSERT INTO ... SELECT FROM against a hypothetical core-banking source schema.

What it means

For every row your ETL writes, you're committing to a contract:

  1. The 11 mandatory columns (per Schema_v6.md → minimum viable feed) are what get the row visible on the dashboards at all.
  2. transfer_parent_id is populated only for chained transfers (e.g., external_txn → payment → settlement → sale, or any reversal chain your L2 declares). Skip it and pipeline-traversal walkthroughs (Money Trail, Account Network) silently return nothing for the affected rows — see the traversal sketch below.
  3. origin = 'ExternalForcePosted' is set on Fed / processor force-posts. Skip it and the L1 drift split between bank-initiated activity and force-posted activity collapses (rows look like normal operator postings, and drift checks under-fire).
  4. metadata JSON — the universal extras container. Skip it on day 1 if your downstream consumer doesn't need it; populate it in priority order (source first, then per-transfer_type keys per the catalog). The catalog tables in Schema_v6 list the keys and what each one drives.

Everything else (memo, external_system, account_parent_role) is conditional — populate when the downstream consumer demands it.
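
To make item 2 concrete, here's the traversal shape a pipeline walkthrough relies on, written as a recursive CTE — a sketch only, not the exact query the dashboards run, and :root_transfer_id is a placeholder bind parameter:

WITH RECURSIVE chain AS (
    -- Seed: the transfer you start tracing from.
    SELECT transfer_id, transfer_parent_id
    FROM spec_example_transactions
    WHERE transfer_id = :root_transfer_id
    UNION
    -- Step: any transfer whose transfer_parent_id points at one already in the chain.
    SELECT t.transfer_id, t.transfer_parent_id
    FROM spec_example_transactions t
    JOIN chain c ON t.transfer_parent_id = c.transfer_id
)
SELECT DISTINCT transfer_id FROM chain;

If transfer_parent_id is left NULL on rows that are in fact chained, the recursive step has nothing to join against and the walk ends at the seed — exactly the "silently return nothing" failure mode above.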

Drilling in

The mapping pattern looks like this for a customer-DDA posting (from Schema_v6 Example 1, abbreviated):

INSERT INTO spec_example_transactions (
    transaction_id, transfer_id, transfer_type, origin,
    account_id, account_name, account_parent_role, account_role,
    account_scope, amount_money, amount, status,
    posting, business_day_start, memo, metadata
)
SELECT
    p.posting_id,                                      -- your PK
    p.transfer_id,                                     -- your transfer grouping
    p.transfer_type,                                   -- map your enum to ours
    'InternalInitiated'                  AS origin,    -- or ExternalForcePosted for Fed
    p.account_number                     AS account_id,
    a.account_name,
    a.parent_role                        AS account_parent_role,
    a.account_role,
    a.account_scope,                                   -- Internal / External
    p.signed_amount                      AS amount_money,
    ABS(p.signed_amount)                 AS amount,
    CASE WHEN p.posting_status = 'P' THEN 'Posted' ELSE 'Failed' END,  -- add a Pending branch if your source has in-flight legs
    p.posting_timestamp                  AS posting,
    p.posting_timestamp::date            AS business_day_start,
    p.memo,
    JSON_OBJECT('source' VALUE 'core_banking')
FROM core_banking.gl_postings p
JOIN core_banking.accounts a ON a.account_number = p.account_number
WHERE p.posting_date >= CURRENT_DATE - INTERVAL '7 days';

A few things to note about this projection:

  • business_day_start is deliberately denormalized from posting — fast date filters in the dashboard datasets need a column they can range-scan without an expression cast. It's redundancy for query speed; the cost is that your ETL writes one extra column.
  • status maps from your status enum to ours. Anything that's not Posted MUST be Pending or Failed (no fourth state) — the drift check and net-zero check both filter WHERE status = 'Posted' to exclude in-flight or rejected legs.
  • amount_money is + for money flowing INTO the account (a debit, in the bank's bookkeeping terms) and − for money flowing OUT (a credit). spec_example_daily_balances.money for any account-day equals SUM(amount_money) up to that day, so getting this sign right is what makes the drift check honest. If your upstream uses the opposite sign convention, flip it here, not later in a view. Every check assumes our sign convention — the sketch after this list shows both checks.
  • metadata carries source on every row from this projection (driven by the JSON_OBJECT(... VALUE 'core_banking') literal). That single key is enough to satisfy the Investigation provenance walks on day 1.
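
To make the last two bullets concrete, here's the shape the two checks take — a sketch under our sign convention, not the production check definitions (those live in the emitted views):

-- Net-zero: the Posted legs of every transfer should sum to zero.
SELECT transfer_id, SUM(amount_money) AS net
FROM spec_example_transactions
WHERE status = 'Posted'
GROUP BY transfer_id
HAVING SUM(amount_money) <> 0;   -- any row here is a broken or sign-flipped transfer

-- Drift: recompute an account's balance as the running sum of Posted legs;
-- compare the result to spec_example_daily_balances.money for the same account-day.
SELECT account_id, SUM(amount_money) AS recomputed_balance
FROM spec_example_transactions
WHERE status = 'Posted'
  AND business_day_start <= CURRENT_DATE - 1   -- plain range scan, thanks to the denormalized column
GROUP BY account_id;

A flipped upstream sign shows up in both places at once: transfers stop netting to zero and every recomputed balance drifts from the balances table.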

Next step

Once your projection is wired up:

  1. Populate a small slice — one day, one source system (a filter sketch follows this list). Don't try to backfill 90 days on the first run.
  2. Run the validation walkthrough (How do I prove my ETL is working before going live?) — it walks you through the net-zero, drift-recompute, and parent-chain integrity checks you should run before declaring the load complete.
  3. Open the L1 Reconciliation Dashboard's Today's Exceptions sheet — if the KPI reads 0 with no detail rows, your feed landed and the contract holds. If KPIs spike unexpectedly, see What do I do when the demo passes but my prod data fails? for the symptom-organized debug recipes.
  4. Iterate on metadata — once the minimum feed is stable, layer in transfer_parent_id and the per-transfer_type metadata keys per the Metadata JSON columns contract.
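
For step 1, the slice is the same projection with a narrower predicate — a sketch only; p.source_system is a hypothetical column, so scope by whatever your source table actually carries:

-- Same projection as above; only the predicate narrows.
INSERT INTO spec_example_transactions (/* same column list as the projection above */)
SELECT /* same select list as above */
FROM core_banking.gl_postings p
JOIN core_banking.accounts a ON a.account_number = p.account_number
WHERE p.posting_date = CURRENT_DATE - 1          -- one business day, not a 90-day backfill
  AND p.source_system = 'core_banking';          -- hypothetical column: one source system at a time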

If your upstream source isn't a gl_postings table — say it's a processor report, a Fed statement file, or a sweep-engine log — the same projection shape applies, but the inbound columns differ. Schema_v6.md's examples cover Fed-statement and processor-feed ingest; the same INSERT INTO spec_example_transactions pattern applies regardless of source.