How do I add an app-specific metadata key?¶
Customization walkthrough — Developer / Product Owner. Reskinning + extending.
The story¶
Your ETL team added a new attribute to
spec_example_transactions.metadata — say originating_branch on
sales, or risk_score on external transfers. The key is landing
in the JSON; you can see it in the database. Now you need to
surface it on the dashboards: as a column in a table, a filter in
a sheet control, a grouping dimension in a pivot, or a category
axis on a bar chart.
The dashboard side of the metadata-key contract is shorter than
the ETL side and uses no schema migration: JSON_VALUE(metadata,
'$.your_key') in a dataset SQL, an entry in the matching
DatasetContract, and the visual reference picks it up. Because
every dataset is direct-query (not SPICE), a new column appears
the moment your next deploy completes — no refresh step.
This walkthrough covers the dashboard-side read pattern and the
load-bearing decision: when to surface a metadata key as a
column vs a filter vs a grouping dimension. The ETL-side write
pattern (what your ETL team does to get the key into metadata
in the first place) is in the
Data Integration Handbook.
The question¶
"My team's ETL is now writing originating_branch into the
metadata JSON on sale rows. How do I expose it on the
dashboards so end users can filter and group by branch?"
Where to look¶
Three reference points:
- Data Integration Handbook → How do I add a metadata key? — the producer side. Read this first if you're not yet certain the key is being written. The dashboard read pattern below assumes the data is already there.
src/quicksight_gen/apps/<app>/datasets.py— the dataset SQL files. Every existingJSON_VALUEextraction is a model for the one you're adding. Grep forJSON_VALUE(metadatato see them.- Schema_v6.md → metadata text column contract —
the cataloged keys, their per-
transfer_typeplacement, and the forbidden-syntax list (->>,->,@>,?are all out — onlyJSON_VALUE/JSON_QUERY/JSON_EXISTS).
What you'll see in the demo¶
An existing dataset reads sale-row metadata. Grep an L1 dataset
SQL for JSON_VALUE(metadata:
SELECT
transaction_id,
transfer_id,
posting,
amount_money,
JSON_VALUE(metadata, '$.merchant_name') AS merchant_name,
JSON_VALUE(metadata, '$.card_brand') AS card_brand,
JSON_VALUE(metadata, '$.cashier') AS cashier,
JSON_VALUE(metadata, '$.payment_method') AS payment_method
FROM spec_example_transactions
WHERE transfer_type = 'sale'
The matching DatasetContract declares each extracted column:
SALES_CONTRACT = DatasetContract(columns=[
ColumnSpec("transaction_id", "STRING"),
ColumnSpec("transfer_id", "STRING"),
ColumnSpec("posting", "DATETIME"),
ColumnSpec("amount_money", "DECIMAL"),
ColumnSpec("merchant_name", "STRING"),
ColumnSpec("card_brand", "STRING"),
ColumnSpec("cashier", "STRING"),
ColumnSpec("payment_method", "STRING"),
])
That's the pattern. Adding a column is a SQL line + a
ColumnSpec line. The visual layer picks it up via the dataset
reference; no schema migration, no rebuild step.
What it means¶
The dashboard-side metadata-key contract has three parts:
Part 1 — Read it in dataset SQL via JSON_VALUE¶
Add the extraction to the relevant build_*_dataset() function:
SELECT
-- existing columns ...
JSON_VALUE(metadata, '$.originating_branch') AS originating_branch
FROM spec_example_transactions
WHERE transfer_type = 'sale';
JSON_VALUE returns NULL for rows that don't carry the key.
Decide upfront whether NULL is acceptable in your visual:
- Yes (most cases): leave the SQL as-is. Filter / group visuals show NULL as a "(blank)" bucket; KPIs / counts ignore NULL rows.
- No (key is mandatory for the visual's question): add
WHERE JSON_EXISTS(metadata, '$.originating_branch')to filter out the un-keyed rows. - No, but show a sentinel: wrap with
COALESCE(JSON_VALUE(...), 'unknown') AS originating_branch. The cell rendersunknowninstead of blank.
Part 2 — Add the column to the DatasetContract¶
SALES_CONTRACT = DatasetContract(columns=[
# existing entries ...
ColumnSpec("originating_branch", "STRING"),
])
Run tests/test_dataset_contract.py — the contract test verifies
your SQL projection emits exactly the contract columns in the
declared order. Green = the column shape is consistent. See
How do I swap dataset SQL? for the
contract / projection relationship in detail.
Part 3 — Decide how the visual surfaces the new column¶
This is the load-bearing UX decision. The metadata key is the data; the visual treatment is what end users actually experience. Three patterns from the existing dashboards:
- As a table column. Drag
originating_branchinto a Table visual's Field Wells → Group By. Shows up as a sortable, filterable cell in every row. Right answer when the user's question is "show me one row per sale, including the branch." - As a sheet-level filter (dropdown / multi-select). Add a ParameterControl + linked Filter group. Shows up as a sheet header control that scopes every visual on the sheet. Right answer when the user's question is "show me everything for the West Hills branch."
- As a grouping / category dimension. Drag into a BarChart's Group By or a Pivot's Row dimension. Visualizes the metric across branches. Right answer when the user's question is "compare sales volume across branches."
The same metadata key can appear in all three forms on the same sheet — they're not mutually exclusive. But add them in priority order: usually the table column first (lowest cost, highest clarity), then the filter, then the grouping visual.
Drilling in¶
A few patterns to know once the basic addition works:
When to promote a metadata key to a first-class column¶
Metadata is the right home for new keys and per-transfer-type
keys. But if a key is read in many WHERE / GROUP BY clauses
across many datasets, the constant JSON_VALUE(metadata, '$.key')
becomes friction:
- More keystrokes per dataset.
- Slower at scale (
JSON_VALUEis fine for direct query but costs more than a native column on big tables). - Easier to typo (
'$.cardbrand'silently returns NULL forever).
The promotion path is a schema migration: add the column to
spec_example_transactions (or spec_example_daily_balances), update
the ETL projection to write the new column directly, and update
dataset SQL to reference the column instead of JSON_VALUE. The
DatasetContract doesn't change — same column name, same type,
just a different upstream source. Don't pre-promote: keep keys
in metadata until the friction is real.
When to surface a metadata key as a filter vs a column¶
Rule of thumb from the existing dashboards:
- Cardinality < 10 (card_brand, payment_method): filter works well. End user picks one or two values from a dropdown. Drives the whole sheet.
- Cardinality 10-100 (cashier, originating_branch): filter is OK but the dropdown gets long. A table column + click-to-filter (left-click drill on the cell) is often better — the user sees the available values inline rather than picking from a list.
- Cardinality > 100 (merchant_account_id, statement_line_id): table column only. Filtering by a high-cardinality value is a search problem, not a dropdown problem; QuickSight's search-in-filter is functional but not great UX. Surface as a column and let the user use the table's Find function.
Why the demo data isn't the constraint¶
The demo seeds these keys for every applicable row, so the
demo always shows non-blank cells. Production data may not.
If your ETL adds originating_branch going forward but didn't
backfill historical rows, every dataset reading the key
returns NULL for old rows. Three options, in order of
preference:
- Backfill in the warehouse. A one-shot UPDATE to add the key to old rows with a derived value. Cleanest if the value is recoverable.
- Coalesce in the projection.
COALESCE(JSON_VALUE(...), 'pre-2026') AS originating_branch. Old rows render aspre-2026; new rows show their actual branch. Honest about the data shape change. - Filter in the visual. Add a sheet-level filter defaulting to the date range where the key exists. Hides the issue but makes the dashboard less useful for historical analysis.
Next step¶
Once the new column is reading and rendering:
- Run the contract test.
.venv/bin/pytest tests/test_dataset_contract.py -k <dataset_name>. Green = projection and contract agree on the column shape. Red = something drifted; usually a typo in the column alias (originating_branchvsoriginating_brnach). - Deploy and verify the visual.
quicksight-gen json apply -c config.yaml -o out/ --execute. Open the dashboard, confirm the new column / filter / dimension renders with non-blank values. - Decide whether to document. If the key is bank-specific
(your own
originating_branchschema), document in your internal customization README; if it's a candidate for upstream contribution, add a row to Schema_v6.md → metadata catalog and propose it as part of the canonical key set.
Related walkthroughs¶
- Data Integration Handbook → How do I add a metadata key? —
the producer side of the same pattern. Read this if the
key isn't yet being written into
metadataupstream. - How do I swap the SQL behind a dataset? — the contract / projection relationship. Adding a metadata column is a contract change; the swap walkthrough explains why that's not free and what the test catches.
- Schema_v6 → metadata text column contract —
the canonical per-
transfer_typemetadata key inventory and the forbidden-syntax list (no->>, no JSONB, no Postgres-specific operators).