How do I run the test suite against my customized dataset SQL?¶
Customization walkthrough — Developer / Product Owner. Testing.
The story¶
You've customized — swapped a dataset's SQL to read from your
warehouse view, added an originating_branch metadata key,
extended transfer_type with repo. Each customization is a
small mutation to a small surface (one SQL function, one
ColumnSpec, one L2 declaration). The shipped test suite covers
the contract layer: do the dataset SQL projections still emit
the columns the visuals expect? But it doesn't (and can't) cover
your SQL's semantic correctness — whether your warehouse view
returns the right numbers.
This walkthrough covers the testing layout: which shipped tests catch which classes of breakage, and where to add your own tests for customization-specific concerns.
The question¶
"I changed a dataset's SQL. Which tests are likely to fail now, and where do I add a test for the change I just made?"
Where to look¶
Three reference points:
tests/— the shipped pytest suite. Two layers: unit / integration (fast, no AWS) and e2e (gated onQS_GEN_E2E=1, hits a real AWS account).tests/test_dataset_contract.py— the contract test. For every dataset, asserts the SQL projection's column shape matches the declaredDatasetContract. This is the test that catches contract drift after a SQL swap.run_e2e.sh— the one-shot runner that regenerates JSON, re-deploys to AWS, and runs the full e2e suite. Use this before declaring a customization production-ready.
What you'll see in the demo¶
The default test invocation:
.venv/bin/pytest
Runs every test under tests/ except tests/e2e/ (those
are gated). Total wall time on a fresh laptop: ~5-10 seconds
for ~200 tests. The contract tests alone run in well under a
second.
The e2e invocation:
./run_e2e.sh
Regenerates JSON, deploys to the AWS account in
run/config.yaml, then runs the e2e suite. Total wall time:
~10-15 minutes (the deploy alone is ~5 minutes; the browser
tests run with pytest-xdist at -n 4 by default).
For a single test:
.venv/bin/pytest tests/test_dataset_contract.py -k overdraft -v
The -k filter matches on test ID. The contract test IDs are
the first column of each contract (e.g., account_id for
OVERDRAFT_CONTRACT). Use this to narrow to one customization
at a time during iteration.
What it means¶
The shipped tests divide into four layers, each catching a different class of customization breakage:
Layer 1 — Unit tests (tests/test_*.py)¶
Fast. No AWS. No database. Pure-Python assertions about the generator's output.
test_dataset_contract.py— the SQL projection matches the declaredDatasetContract. The single most important test for customization. Fails if your SQL swap forgot a column or got the order wrong.test_generate.py— the full generate pipeline produces valid analysis + dashboard JSON. Catches cross-references that broke (a visual referencing a dataset that no longer exists, a filter referencing a column that's gone).test_<app>.pyper app (test_executives.py,test_investigation.py, etc.) — per-app visual + filter wiring assertions. Catches "the visual now references a column the contract dropped."test_theme_presets.py— theme preset registry validity. Add a test here when registering a new preset for your bank.
Run this layer on every customization commit:
.venv/bin/pytest tests/ --ignore=tests/e2e -x
-x stops on first failure — fastest feedback when iterating.
Layer 2 — Demo data tests (tests/test_demo_data.py)¶
Asserts the demo seed generator's output. Contains the
SHA256 hash lock that catches any byte-level shift in seed
output, plus per-scenario coverage assertions
(TestScenarioCoverage).
If you customized the demo generator (added a new
transfer_type value's seed branch, planted a new exception
scenario), the hash test fails — that's the prompt to re-lock
the hash by pasting the new value into the assertion. See
CLAUDE.md "Demo Data Conventions" for the re-lock pattern.
If you customized the L2 instance to add a new
transfer_type value, the demo seed should also be updated
to plant ≥1 row of the new type, so the e2e tests have
something to render. The TestScenarioCoverage pattern
makes this a one-line assertion.
Layer 3 — L2 schema + seed contract tests (tests/test_l2_seed_contract.py)¶
Asserts the per-prefix DDL emitted by common.l2.schema.emit_schema(l2_instance)
and the seed bytes emitted by common.l2.seed.emit_seed(l2_instance, scenario).
Catches:
- Schema migrations that don't round-trip (DROP without matching CREATE, missing index).
- Per-prefix view emission that drifts from the L2 instance vocabulary.
Customizations that touch common/l2/schema.py (a new view, a
new index) are most likely to fail tests here. The fix is
usually to update the matching test expectation alongside the
schema change.
Layer 4 — End-to-end (tests/e2e/*, gated on QS_GEN_E2E=1)¶
The expensive layer. Two sub-layers:
- API tests (
@pytest.mark.api) — boto3 calls against the deployed AWS resources. Asserts dataset row counts, dashboard structure, sheet inventory, drill-action wiring. Catches "the dataset deployed but returns zero rows for the customer's data" — the failure mode the contract test can't catch. - Browser tests (
@pytest.mark.browser) — Playwright WebKit headless. Loads the deployed dashboard, clicks through tabs, asserts visual rendering and filter interactions. Catches "the dashboard deployed but the visual layer is broken because of a column the dataset no longer emits."
Run the full e2e once before declaring a customization production-ready:
./run_e2e.sh
Or skip the deploy if you're iterating on tests against an existing deployment:
./run_e2e.sh --skip-deploy
Drilling in¶
A few patterns to know once the basic test layout makes sense:
Add a unit test for a custom dataset's SQL semantics¶
The shipped contract test asserts column shape, not column correctness. Your custom SQL needs its own correctness test. Pattern:
# tests/test_my_overdraft_customization.py
import pytest
import psycopg2
from quicksight_gen.common.config import load_config
from quicksight_gen.apps.l1_dashboard.datasets import build_overdraft_dataset
@pytest.mark.skipif(
not os.environ.get("QS_GEN_TEST_DB_URL"),
reason="set QS_GEN_TEST_DB_URL to a fixture-loaded warehouse",
)
def test_overdraft_returns_known_overdrawn_account():
cfg = load_config("config.yaml")
ds = build_overdraft_dataset(cfg)
sql = next(iter(ds.PhysicalTableMap.values())).CustomSql.SqlQuery
conn = psycopg2.connect(os.environ["QS_GEN_TEST_DB_URL"])
rows = conn.cursor().execute(sql).fetchall()
# Your bank's known-overdrawn-yesterday account fixture
assert any(r[0] == "fixture-overdrawn-account-1" for r in rows)
The pattern: extract the SQL from the built DataSet, run it against a test warehouse with known fixture data, assert on specific row presence. Skip the test when the fixture warehouse isn't available (CI gates this).
Add an e2e test for a customization-specific scenario¶
If your customization adds a new exception check (or extends
an existing one to fire on a new transfer_type), add an e2e
test that verifies the visual layer surfaces it. Pattern:
# tests/e2e/test_repo_exception_check.py
import pytest
@pytest.mark.api
def test_repo_transfers_appear_in_transfer_type_filter(qs_client, dashboard_id):
# After your seed includes 'repo' transfers, the Transfer Type
# filter dropdown should auto-populate the new value.
distinct_types = qs_client.get_dashboard_filter_values(
dashboard_id, filter_id="filter-l1-transfer-type",
)
assert "repo" in distinct_types
The e2e suite runs against the deployed dashboard. New tests
follow the existing test patterns
(tests/e2e/test_l1_*.py is the canonical reference).
When to add a test vs trust the contract test¶
The contract test catches:
- Column added / removed / renamed.
- Column type changed.
- Column order changed.
The contract test does not catch:
- The SQL returns wrong numbers (semantic bug).
- The SQL returns zero rows (your warehouse view is empty).
- The SQL has a typo that returns NULL where data should be
(e.g.,
JSON_VALUE(metadata, '$.cardbrand')vs'$.card_brand'). - Performance regressions (your warehouse view is now 30s instead of 200ms).
Add unit tests for the second category (semantic correctness), add e2e tests for the third (visual rendering), and add a manual / monitoring check for the fourth (the dashboard's direct-query latency degrades visibly under bad SQL — but no automated test catches it before deploy).
CI integration¶
The shipped GitHub Actions workflow (.github/workflows/ci.yml)
runs the unit + integration layers (Layer 1-3) on every push.
It does not run e2e — that requires AWS credentials and a
real account. For your fork:
- Layer 1-3 in CI on every push (free, fast feedback).
- Layer 4 in a separate workflow, manually triggered or on a schedule, against a sandbox AWS account.
Keep the test gating clear: a unit-test failure should block a PR; an e2e failure on a sandbox account should warn but not block (the failure may be infrastructure flakiness, not a code regression).
Next step¶
Once you have a test plan in place:
- Run the shipped suite first.
.venv/bin/pytest tests/ --ignore=tests/e2e -x. This catches the most common customization breakage class (contract drift) in seconds. - Add at least one customization-specific test per
customization commit. A custom dataset SQL gets a row-count
or fixture-row assertion. A new
transfer_typegets aTestScenarioCoverageassertion in the demo data tests. A new metadata key gets aJSON_EXISTSassertion in the relevant dataset's column projection. - Run e2e before the first production deploy.
./run_e2e.shagainst a sandbox or staging AWS account. The browser tests are the catch-all for "the dashboard renders" — a green run here is the last gate before a production deploy.
Related walkthroughs¶
- How do I swap the SQL behind a dataset? — the contract test (Layer 1) is what enforces the swap-without-breaking-visuals guarantee.
- How do I run my first deploy? —
the deploy is part of the e2e (Layer 4) loop. Re-running
json apply --executebetween iterations also re-runs the contract test implicitly via the build. - How do I extend canonical values? —
paired with this walkthrough's
TestScenarioCoveragerecommendation. Adding a new value without a coverage assertion means the value lands in production without ever having been e2e-tested.