Tree — Data

Dataset is the tree node for one dataset registration on the App; Column is a typed reference to a column on a Dataset. The chained factories (ds["col"].dim() / .sum() / .date()) build typed Dim / Measure slots that visuals consume directly.

CalcField is an analysis-level calculated field — bound to one Dataset, available across visuals via the same typed-Dim shape.

datasets

Dataset tree nodes (L.1.7) + typed Column refs (L.1.17).

Dataset is a first-class tree concept: visuals and filters reference a Dataset instance by object ref instead of by string identifier, and the App walks the tree to derive the precise dependency graph — which Sheet / Visual / FilterGroup uses which Dataset.

The dependency graph drives:

  • Selective deploy (only re-create datasets that downstream changes touch).
  • Matview REFRESH ordering (REFRESH only the matviews backing Datasets that an updated deploy surface depends on).
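As a rough illustration of the dependency walk, the sketch below collects the Datasets referenced under each sheet into a set. The Sheet and Visual stand-ins, the collect_dataset_deps helper, the "ar-transactions-ds" identifier, and the ARN strings are all hypothetical and exist only for this example; the real tree nodes and the App walker are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Stand-in for the tree node: frozen, so instances are hashable."""
    identifier: str
    arn: str


@dataclass(eq=False)
class Visual:
    """Hypothetical visual holding object refs to the Datasets it reads."""
    title: str
    datasets: list[Dataset] = field(default_factory=list)


@dataclass(eq=False)
class Sheet:
    """Hypothetical sheet holding visuals."""
    name: str
    visuals: list[Visual] = field(default_factory=list)


def collect_dataset_deps(sheets: list[Sheet]) -> dict[str, set[Dataset]]:
    """Map each sheet name to the set of Datasets its visuals reference."""
    return {
        sheet.name: {ds for visual in sheet.visuals for ds in visual.datasets}
        for sheet in sheets
    }


# Placeholder identifiers and ARNs, not real resources.
network = Dataset("inv-account-network-ds", "arn:placeholder:network")
transactions = Dataset("ar-transactions-ds", "arn:placeholder:transactions")
sheets = [
    Sheet("Overview", [Visual("Edges", [network]),
                       Visual("Totals", [network, transactions])]),
    Sheet("Detail", [Visual("Rows", [transactions])]),
]
deps = collect_dataset_deps(sheets)
```

Because the references are object refs rather than strings, a deploy step could compare these sets to decide which datasets a change touches.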

Construction-time check (in App.emit_analysis): every Dataset referenced from the tree must be registered on the App via app.add_dataset(). This catches "visual references undeclared dataset" at emit time, whereas the existing string-keyed pattern lets the mismatch flow through to deploy.

Typed Column refs (L.1.17 — fragility fix). Bare-string column names in Dim(ds, "column_name") were silently typo-able. The new path:

  • ds["column_name"] validates column_name against the dataset's registered DatasetContract (raises KeyError at the wiring site on typos) and returns a typed Column wrapper.
  • Column chains into the field-well factories: ds["col"].dim(), ds["col"].sum(), ds["col"].distinct_count(), etc. The chained form is the preferred new style — single source of truth for the (dataset, column) pair, validated.
  • Bare strings still work as the escape hatch for cases where no contract is registered (test fixtures, kitchen-sink) — the resolver treats string and Column refs uniformly at emit.
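The validation rule in the list above can be sketched as follows. This is a minimal stand-in, not the library's code: the CONTRACTS dict is a hypothetical substitute for the DatasetContract registry, and the identifiers and ARNs are placeholders.

```python
from dataclasses import dataclass

# Hypothetical contract registry: dataset identifier -> declared column names.
CONTRACTS: dict[str, frozenset[str]] = {}


@dataclass(frozen=True)
class Column:
    dataset: "Dataset"
    name: str


@dataclass(frozen=True)
class Dataset:
    identifier: str
    arn: str

    def __getitem__(self, name: str) -> Column:
        declared = CONTRACTS.get(self.identifier)
        # Validate only when a contract exists; otherwise skip the check.
        if declared is not None and name not in declared:
            raise KeyError(
                f"{name!r} is not a declared column of {self.identifier!r}"
            )
        return Column(self, name)


CONTRACTS["inv-account-network-ds"] = frozenset({"amount", "recipient_id"})
checked = Dataset("inv-account-network-ds", "arn:placeholder:network")
unchecked = Dataset("fixture-ds", "arn:placeholder:fixture")  # no contract
```

With this shape, checked["amuont"] fails at the line that wires the column, while unchecked["anything"] passes through unvalidated, mirroring the escape hatch described for fixtures.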

Dataset dataclass

Tree node for one dataset registration on the App.

identifier is the logical identifier visuals/filters reference (the existing per-app DS_INV_ACCOUNT_NETWORK / DS_AR_TRANSACTIONS strings — values like "inv-account-network-ds"). arn is the AWS QuickSight DataSetArn the deployed analysis points at.

Frozen because Dataset acts as the dependency-graph KEY: it must be hashable so visuals/filters that reference it can be collected into set[Dataset] for the dependency walk.

ds["column_name"] returns a typed Column ref (validated against the dataset's registered DatasetContract if one exists) — see Column docstring for the chained factory pattern.

identifier instance-attribute

identifier: str

arn instance-attribute

arn: str

__init__

__init__(identifier: str, arn: str) -> None

__getitem__

__getitem__(name: str) -> Column

Return a typed Column ref for name.

Validates name against the registered DatasetContract when one exists. Raises KeyError at the wiring site on typos — that turns a silent "broken visual at deploy" into a loud "broken column at construction".

When no contract is registered (early test fixtures or the kitchen-sink, which doesn't carry a contract), validation is skipped — the Column ref is built without checking, same as the bare-string escape hatch.

emit_declaration

emit_declaration() -> DataSetIdentifierDeclaration

Column dataclass

Typed column reference — dataset object ref + column name.

Authors construct via ds["col_name"] (which validates against the contract). Pass to Dim/Measure constructors directly, or use the chained factories below for the most concise wiring:

ds["amount"].sum()                 # Measure.sum
ds["recipient_id"].dim()           # categorical Dim
ds["window_end"].date()            # date Dim
ds["depth"].numerical()            # numerical Dim
ds["recipient_id"].distinct_count()

Frozen + hashable so a Column can be reused across visual slots (the chain ds["col"] returns a value-equal Column each time; ds["col"] == ds["col"] is True, useful for set membership in column-coverage tests).
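To show why value equality helps in column-coverage tests, here is a small sketch using stand-in frozen dataclasses. The classes only mimic the documented fields, and the ARN is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    identifier: str
    arn: str


@dataclass(frozen=True)
class Column:
    dataset: Dataset
    name: str


ds = Dataset("inv-account-network-ds", "arn:placeholder:network")

# Two separately built refs to the same (dataset, column) pair compare equal
# and hash the same, so the set keeps only one of them.
used = {Column(ds, "amount"), Column(ds, "amount"), Column(ds, "depth")}

# Coverage-style check: which expected columns were never wired?
expected = {Column(ds, name) for name in ("amount", "depth", "window_end")}
missing = expected - used
```

Set difference works here only because the refs are compared by value; identity-keyed objects would never match across separate constructions.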

Imports are lazy inside the factory methods to break the Dataset → Column → Dim/Measure → Dataset circular import.

dataset instance-attribute

dataset: Dataset

name instance-attribute

name: str

human_name property

human_name: str

Plain-English header label for this column (v8.5.0).

Looks up the column on the dataset's registered contract and returns the contract's human_name (override or auto-derived title-case). Returns the title-cased column name as a fallback if the dataset has no contract — keeps the test fixtures (which construct Datasets directly without going through build_dataset) usable without forcing a registry round-trip.

__init__

__init__(dataset: Dataset, name: str) -> None

dim

dim(*, kind: DimKind = 'categorical', field_id: str | AutoResolved = AUTO) -> Dim

date

date(*, date_granularity: TimeGranularity | None = 'DAY', field_id: str | AutoResolved = AUTO) -> Dim

numerical

numerical(*, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Dim

sum

sum(*, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure

max

max(*, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure

min

min(*, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure

average

average(*, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure

count

count(*, field_id: str | AutoResolved = AUTO) -> Measure

distinct_count

distinct_count(*, field_id: str | AutoResolved = AUTO) -> Measure

fields

Field-well leaf nodes — Dim + Measure typed wrappers.

Every visual's field wells contain a mix of DimensionField and MeasureField entries (source / target columns, group-by fields, aggregated values). These tree nodes wrap them with typed factories (Dim.date(...), Measure.sum(...)) so construction-time typing drives what the visual gets, rather than hand-wiring the underlying models every time.

Auto field_id (L.1.16): both Dim and Measure accept an optional field_id keyword. When omitted, the App walker assigns f-{visual_kind}-s{sheet_idx}-v{visual_idx}-{role}{slot_idx} at emit time. Authors typically pass Dim(ds, "column_name") without a field_id and reference the leaf via a Python variable for sort / drill plumbing (both accept Dim / Measure object refs in addition to bare field-id strings).
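The auto-ID pattern can be illustrated with a small formatter. The auto_field_id helper is hypothetical, and the sample visual kind, role name, and zero-based indices are assumptions chosen for the example.

```python
def auto_field_id(
    visual_kind: str,
    sheet_idx: int,
    visual_idx: int,
    role: str,
    slot_idx: int,
) -> str:
    """Format an ID following the documented f-...-s..-v..-{role}{slot} pattern."""
    return f"f-{visual_kind}-s{sheet_idx}-v{visual_idx}-{role}{slot_idx}"


example_id = auto_field_id("table", 0, 2, "value", 1)
```

Because the ID is derived from tree position, it changes when visuals are reordered, which is why an explicit field_id is suggested for external consumers that need a stable value.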

DimKind module-attribute

DimKind = Literal['categorical', 'date', 'numerical']

MeasureKind module-attribute

MeasureKind = Literal['sum', 'max', 'min', 'average', 'count', 'distinct_count']

FieldRef module-attribute

FieldRef = Dim | Measure | str

Dim dataclass

One dimension field-well entry — typed wrapper that emits a DimensionField of the appropriate kind.

dataset is a Dataset object ref — the locked L.1.7 hard switch. The dataset must be registered on the parent App (via app.add_dataset()) for the analysis to emit.

column accepts a bare str (a real column on the dataset), a typed Column ref, or a CalcField object ref (an analysis-level calculated field), matching the ColumnRef alias. The CalcField ref carries the calc-field identity through the type checker, and the App's emit-time validation catches references to unregistered calc fields.

Default kind is categorical (the most common); use the date() / numerical() classmethods for the other variants.

field_id is keyword-only and Optional (L.1.16 auto-ID). When omitted, the App walker assigns one based on the leaf's tree position. Pass an explicit field_id="..." only when external consumers (browser e2e selectors, etc.) need a stable id — cross-reference plumbing (sort_by, drill writes) accepts the leaf object directly.

Identity-keyed (eq=False) so the auto-id resolver can mutate the field_id at emit time. Dim leaves stay hashable via the default object identity hash, which lets the dependency graph set-membership check work.

dataset instance-attribute

dataset: Dataset

column instance-attribute

column: ColumnRef

kind class-attribute instance-attribute

kind: DimKind = 'categorical'

date_granularity class-attribute instance-attribute

date_granularity: TimeGranularity | None = field(default=None, kw_only=True)

field_id class-attribute instance-attribute

field_id: str | AutoResolved = field(default=AUTO, kw_only=True)

currency class-attribute instance-attribute

currency: bool = field(default=False, kw_only=True)

__init__

__init__(dataset: Dataset, column: ColumnRef, kind: DimKind = 'categorical', *, date_granularity: TimeGranularity | None = None, field_id: str | AutoResolved = AUTO, currency: bool = False) -> None

date classmethod

date(dataset: Dataset, column: ColumnRef, *, date_granularity: TimeGranularity | None = 'DAY', field_id: str | AutoResolved = AUTO) -> Dim

Date dimension. date_granularity defaults to "DAY" — QuickSight's most common bucket for daily series. Pass None to omit the granularity (the renderer falls back to its default, which can shift bucketing on day-vs-month dashboards).

numerical classmethod

numerical(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Dim

calc_field

calc_field() -> CalcField | None

The CalcField this Dim references, or None if it points at a real dataset column. Used by the dependency-graph walk.

emit

emit() -> DimensionField

emit_unaggregated_field

emit_unaggregated_field() -> dict[str, object]

Emit the raw UnaggregatedField dict shape used inside TableUnaggregatedFieldWells.Values. The model layer types that field as list[dict[str, Any]] rather than a typed union, so the tree emits it as a dict directly.

Q.1.a.7: when currency=True is set on a numerical Dim, the same USD FormatConfiguration that emit() wires onto a NumericalDimensionField is also folded into the unaggregated field shape, so table cells render with "$" + thousands separator + 2 decimals. Without this, currency=True only took effect when the Dim was used as a chart axis or KPI value, not when it was used as a table column (by far the most common case).

Measure dataclass

One value field-well entry — typed wrapper that emits a MeasureField with the appropriate aggregation shape.

dataset is a Dataset object ref (L.1.7 hard switch). The dataset must be registered on the parent App for the analysis to emit.

field_id is keyword-only and Optional (L.1.16 auto-ID). When omitted, the App walker assigns one based on the leaf's tree position.

Use the classmethod factories for ergonomic construction: Measure.sum(...), Measure.distinct_count(...), etc. Aggregation kind determines which underlying model class is emitted (numerical aggregations on numeric columns, categorical on count-style aggregations).
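The kind-to-model split described above can be sketched as a simple lookup. The measure_family helper and its string return values are hypothetical; the real code chooses between model classes rather than labels.

```python
from typing import Literal

MeasureKind = Literal["sum", "max", "min", "average", "count", "distinct_count"]

# Count-style aggregations take the categorical path; the rest are numerical.
COUNT_STYLE = frozenset({"count", "distinct_count"})


def measure_family(kind: MeasureKind) -> str:
    """Return which family of measure model a given aggregation kind maps to."""
    return "categorical" if kind in COUNT_STYLE else "numerical"
```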

dataset instance-attribute

dataset: Dataset

column instance-attribute

column: ColumnRef

kind instance-attribute

kind: MeasureKind

field_id class-attribute instance-attribute

field_id: str | AutoResolved = field(default=AUTO, kw_only=True)

currency class-attribute instance-attribute

currency: bool = field(default=False, kw_only=True)

__init__

__init__(dataset: Dataset, column: ColumnRef, kind: MeasureKind, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> None

sum classmethod

sum(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure

max classmethod

max(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure

min classmethod

min(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure

average classmethod

average(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure

count classmethod

count(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO) -> Measure

distinct_count classmethod

distinct_count(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO) -> Measure

calc_field

calc_field() -> CalcField | None

The CalcField this Measure references, or None if it points at a real dataset column.

emit

emit() -> MeasureField

resolve_field_id

resolve_field_id(ref: FieldRef) -> str

Read the resolved field_id off a Dim / Measure / bare string.

calc_fields

Typed analysis-level calculated fields (L.1.8).

A CalcField is the typed wrapper around the existing per-app CalculatedField dict ({Name, DataSetIdentifier, Expression}). Visuals and filters reference calc fields the same way they reference real dataset columns — by passing the column to Dim / Measure / CategoryFilter / NumericRangeFilter. The column slot accepts either a bare str (a real column or a calc-field name) OR a CalcField object reference; the typed ref carries the validated calc-field identity through the type checker.

Validation (L.1.8):

  • Analysis.add_calc_field rejects duplicate calc-field names within an analysis.
  • App._validate_calc_field_references (added in L.1.8) raises if any tree-referenced CalcField isn't registered on the Analysis. Catches "filter references calc field that doesn't exist" and "calc field declared but never used".
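The two checks in the list above could be sketched like this. The classes are stripped-down stand-ins, validate_calc_field_references is a hypothetical free function, and the use of ValueError is an assumption since the source does not name the exception type.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class CalcField:
    name: str
    expression: str


@dataclass
class Analysis:
    calc_fields: list[CalcField] = field(default_factory=list)

    def add_calc_field(self, calc: CalcField) -> CalcField:
        # Reject a second calc field with the same name.
        if any(existing.name == calc.name for existing in self.calc_fields):
            raise ValueError(f"duplicate calc field name: {calc.name!r}")
        self.calc_fields.append(calc)
        return calc


def validate_calc_field_references(
    analysis: Analysis, referenced: list[CalcField]
) -> None:
    """Raise if any referenced CalcField was never registered."""
    registered = set(analysis.calc_fields)  # identity-hashed members
    for calc in referenced:
        if calc not in registered:
            raise ValueError(f"calc field {calc.name!r} is not registered")


analysis = Analysis()
anchor = analysis.add_calc_field(
    CalcField("is_anchor_edge", "ifelse({source} = ${pAnchor}, 'yes', 'no')")
)
stray = CalcField("stray", "{amount}")  # built but never registered
```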

Dependency graph (L.1.7 + L.1.8):

  • Each CalcField carries a Dataset ref. The CalcField's dataset participates in App.dataset_dependencies() so declaring a calc field on dataset D establishes D as a dep even when no visual directly references D's columns.

Auto-name (L.2.6 follow-up): name is Optional. When omitted, the App walker assigns calc-{idx} at emit time based on the calc field's index in analysis.calc_fields. Pass an explicit name= when the calc field's column header text matters to analysts (the name becomes the underlying ColumnName in the data model — analyst-facing unless a visual's label options override it).

ColumnRef module-attribute

ColumnRef = str | CalcField | Column

CalcField dataclass

Tree node for one analysis-level calculated field.

name is the column-style identifier visuals/filters reference (e.g. "is_anchor_edge"). Optional — auto-derived as calc-{idx} at emit time when not specified.

dataset is the Dataset object ref the expression evaluates against. expression is the QuickSight calc expression (e.g. "ifelse({source} = ${pAnchor}, 'yes', 'no')").

shape is Optional and only matters for drill sources: when a drill action reads this calc field's value (via a Dim / Measure object ref in the drill's writes), the tree needs a ColumnShape to type-check the drill parameter binding. Tag here once rather than re-passing the shape at every drill site.

Identity-keyed (eq=False) so the auto-name resolver can mutate the name field at emit time. CalcFields stay hashable via the default object identity hash, which is what the dependency-graph set membership needs anyway.

Emits a plain dict that drops straight into AnalysisDefinition.CalculatedFields — same shape the existing builders write today.
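The emitted shape can be sketched from the {Name, DataSetIdentifier, Expression} keys named earlier in this section. The classes below are stand-ins with an already-resolved name, and the ARN is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    identifier: str
    arn: str


@dataclass(eq=False)
class CalcField:
    dataset: Dataset
    expression: str
    name: str

    def emit(self) -> dict[str, str]:
        """Build the plain dict that goes into the CalculatedFields list."""
        return {
            "Name": self.name,
            "DataSetIdentifier": self.dataset.identifier,
            "Expression": self.expression,
        }


ds = Dataset("inv-account-network-ds", "arn:placeholder:network")
calc = CalcField(
    ds, "ifelse({source} = ${pAnchor}, 'yes', 'no')", "is_anchor_edge"
)
emitted = calc.emit()
```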

dataset instance-attribute

dataset: Dataset

expression instance-attribute

expression: str

name class-attribute instance-attribute

name: str | AutoResolved = AUTO

shape class-attribute instance-attribute

shape: ColumnShape | None = None

__init__

__init__(dataset: Dataset, expression: str, name: str | AutoResolved = AUTO, shape: ColumnShape | None = None) -> None

emit

emit() -> dict[str, str]

resolve_column

resolve_column(column: ColumnRef) -> str

Read the column-name string off a ColumnRef.

For a CalcField, the name is set by App._resolve_auto_ids(); callers asserting the resolver ran can rely on this returning str.

calc_field_in

calc_field_in(column: ColumnRef) -> CalcField | None

Return the CalcField if column is one, else None.

Used by the dependency-graph walk to harvest CalcField refs from Dim / Measure / Filter column slots. Column refs return None (they reference a real dataset column, not a calc field).