Tree — Data
Dataset is the tree node for one dataset registration on the App;
Column is a typed reference to a column on a Dataset. The chained
factories (ds["col"].dim() / .sum() / .date()) build typed
Dim / Measure slots that visuals consume directly.
CalcField is an analysis-level calculated field — bound to one
Dataset, available across visuals via the same typed-Dim shape.
datasets
Dataset tree nodes (L.1.7) + typed Column refs (L.1.17).
Dataset is a first-class tree concept: visuals and filters reference a
Dataset instance by object ref instead of by string identifier,
and the App walks the tree to derive the precise dependency
graph — which Sheet / Visual / FilterGroup uses which Dataset.
The dependency graph drives:
- Selective deploy (only re-create datasets that downstream changes touch).
- Matview REFRESH ordering (REFRESH only the matviews backing Datasets that an updated deploy surface depends on).
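A minimal sketch of why a frozen, hashable Dataset makes the dependency walk cheap: each sheet's visuals are flattened into a set[Dataset], and a change to some sheets selects only the datasets behind them. The Visual and Sheet classes, the dataset_dependencies and datasets_to_refresh helpers, and the placeholder ARNs are hypothetical; the sketch selects the touched datasets but does not model REFRESH ordering.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    # Trimmed stand-in for the real Dataset node: frozen, so it is hashable.
    identifier: str
    arn: str


@dataclass
class Visual:
    # Hypothetical: a visual just lists the Dataset refs its slots point at.
    datasets: list[Dataset] = field(default_factory=list)


@dataclass
class Sheet:
    name: str
    visuals: list[Visual] = field(default_factory=list)


def dataset_dependencies(sheets: list[Sheet]) -> dict[str, set[Dataset]]:
    """Map each sheet name to the set of Datasets its visuals reference."""
    deps: dict[str, set[Dataset]] = {}
    for sheet in sheets:
        used: set[Dataset] = set()
        for visual in sheet.visuals:
            used.update(visual.datasets)  # repeated refs collapse by value
        deps[sheet.name] = used
    return deps


def datasets_to_refresh(deps: dict[str, set[Dataset]], changed: set[str]) -> set[Dataset]:
    """Union of the datasets behind the changed sheets (the deploy surface)."""
    touched: set[Dataset] = set()
    for name in changed:
        touched |= deps.get(name, set())
    return touched
```

Because the Dataset is the key, two visuals that share a dataset contribute one entry, and no string identifiers need to be compared.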
Construction-time check (in App.emit_analysis): every Dataset
referenced from the tree must be registered on the App via
app.add_dataset(). Catches "visual references undeclared dataset"
at emit time, whereas the existing string-keyed pattern lets the
mismatch flow through to deploy.
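The construction-time check can be sketched as a set difference between referenced and registered datasets, raised before anything reaches deploy. This App class is a hypothetical stand-in: the reference() method and the choice of ValueError are assumptions for illustration, not the real App's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    identifier: str


class App:
    """Hypothetical stand-in for App that models only the registration check."""

    def __init__(self) -> None:
        self._registered: set[Dataset] = set()
        self._referenced: list[Dataset] = []

    def add_dataset(self, ds: Dataset) -> Dataset:
        self._registered.add(ds)
        return ds

    def reference(self, ds: Dataset) -> None:
        # Stands in for a visual or filter holding an object ref to ds.
        self._referenced.append(ds)

    def emit_analysis(self) -> list[str]:
        missing = sorted({d.identifier for d in self._referenced if d not in self._registered})
        if missing:
            # Fail at emit time instead of letting the mismatch reach deploy.
            raise ValueError(f"visual references undeclared dataset(s): {missing}")
        return sorted(d.identifier for d in self._registered)
```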
Typed Column refs (L.1.17 — fragility fix). Bare-string column
names in Dim(ds, "column_name") were silently typo-able. The
new path:
- ds["column_name"] validates column_name against the dataset's registered DatasetContract (raising KeyError at the wiring site on typos) and returns a typed Column wrapper.
- Column chains into the field-well factories: ds["col"].dim(), ds["col"].sum(), ds["col"].distinct_count(), etc. The chained form is the preferred new style: a single, validated source of truth for the (dataset, column) pair.
- Bare strings still work as the escape hatch for cases where no contract is registered (test fixtures, kitchen-sink); the resolver treats string and Column refs uniformly at emit.
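"Uniformly at emit" can be pictured as both ref styles collapsing to the same column identifier. In this sketch the column_identifier helper and the ColumnLike alias are hypothetical; the output dict mirrors QuickSight's ColumnIdentifier shape (DataSetIdentifier, ColumnName), and the sketch assumes the caller passes the Dataset the Column was built from.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Dataset:
    identifier: str


@dataclass(frozen=True)
class Column:
    dataset: Dataset
    name: str


# Hypothetical alias; the real ColumnRef also admits CalcField.
ColumnLike = Union[str, Column]


def column_identifier(dataset: Dataset, column: ColumnLike) -> dict[str, str]:
    """Resolve a bare string or a typed Column to one identifier shape."""
    # A bare string is the unvalidated escape hatch; a Column was validated
    # when ds["col"] built it. Both end up as the same plain column name.
    name = column.name if isinstance(column, Column) else column
    return {"DataSetIdentifier": dataset.identifier, "ColumnName": name}
```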
Dataset
dataclass
Tree node for one dataset registration on the App.
identifier is the logical identifier visuals/filters reference
(the existing per-app DS_INV_ACCOUNT_NETWORK / DS_AR_TRANSACTIONS
strings — values like "inv-account-network-ds"). arn is
the AWS QuickSight DataSetArn the deployed analysis points at.
Frozen because Dataset acts as the dependency-graph KEY: it must
be hashable so visuals/filters that reference it can be collected
into set[Dataset] for the dependency walk.
ds["column_name"] returns a typed Column ref (validated
against the dataset's registered DatasetContract if one exists)
— see Column docstring for the chained factory pattern.
__getitem__
__getitem__(name: str) -> Column
Return a typed Column ref for name.
Validates name against the registered DatasetContract
when one exists. Raises KeyError at the wiring site on
typos — that turns a silent "broken visual at deploy" into a
loud "broken column at construction".
When no contract is registered (early test fixtures or the kitchen-sink, which doesn't carry a contract), validation is skipped — the Column ref is built without checking, same as the bare-string escape hatch.
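A minimal sketch of the validate-when-possible behavior of __getitem__. The CONTRACTS registry (identifier to a frozenset of column names) is a hypothetical simplification of DatasetContract; the point is that a typo fails with KeyError at the ds["..."] call, while a dataset with no contract builds the Column unchecked.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: dataset identifier -> contract column names.
CONTRACTS: dict[str, frozenset[str]] = {
    "ar-transactions-ds": frozenset({"amount", "recipient_id", "window_end"}),
}


@dataclass(frozen=True)
class Column:
    dataset: "Dataset"
    name: str


@dataclass(frozen=True)
class Dataset:
    identifier: str

    def __getitem__(self, name: str) -> Column:
        contract: Optional[frozenset[str]] = CONTRACTS.get(self.identifier)
        if contract is not None and name not in contract:
            # Loud failure at the wiring site instead of a broken visual at deploy.
            raise KeyError(f"{self.identifier!r} has no column {name!r}")
        # No contract (fixtures, kitchen-sink): skip validation entirely.
        return Column(self, name)
```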
Column
dataclass
Typed column reference — dataset object ref + column name.
Authors construct via ds["col_name"] (which validates against
the contract). Pass to Dim/Measure constructors directly, or use
the chained factories below for the most concise wiring:
ds["amount"].sum() # Measure.sum
ds["recipient_id"].dim() # categorical Dim
ds["window_end"].date() # date Dim
ds["depth"].numerical() # numerical Dim
ds["recipient_id"].distinct_count()
Frozen + hashable so a Column can be reused across visual slots
(the chain ds["col"] returns a value-equal Column each time;
ds["col"] == ds["col"] is True, useful for set membership in
column-coverage tests).
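The value-equality claim and the chained-factory style can both be checked with a small frozen dataclass. MeasureSpec is a hypothetical stand-in for Measure that only records what the chain asked for; the real factories return Dim / Measure objects via lazy imports.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    identifier: str


@dataclass
class MeasureSpec:
    # Hypothetical stand-in for Measure: records what the chain asked for.
    dataset: Dataset
    column: str
    kind: str


@dataclass(frozen=True)
class Column:
    dataset: Dataset
    name: str

    def sum(self) -> MeasureSpec:
        # The real factory imports Measure lazily; this sketch needs no import.
        return MeasureSpec(self.dataset, self.name, "sum")

    def distinct_count(self) -> MeasureSpec:
        return MeasureSpec(self.dataset, self.name, "distinct_count")
```

Since frozen dataclasses compare and hash by field values, two independently built Column refs for the same (dataset, name) pair are interchangeable in sets.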
Imports are lazy inside the factory methods to break the Dataset → Column → Dim/Measure → Dataset circular import.
human_name
property
human_name: str
Plain-English header label for this column (v8.5.0).
Looks up the column on the dataset's registered contract and
returns the contract's human_name (override or auto-derived
title-case). Returns the title-cased column name as a fallback
if the dataset has no contract — keeps the test fixtures (which
construct Datasets directly without going through
build_dataset) usable without forcing a registry round-trip.
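The lookup order (contract override, then auto-derived title case, then the no-contract fallback) can be sketched as below. The exact title-casing rule is an assumption: this version splits on underscores and capitalizes each word, so "recipient_id" becomes "Recipient Id"; the real derivation may special-case tokens such as "id".

```python
from __future__ import annotations

from typing import Optional


def title_case(column: str) -> str:
    """Assumed derivation: split on underscores and capitalize each word."""
    return " ".join(word.capitalize() for word in column.split("_") if word)


def human_name(column: str, contract: Optional[dict[str, Optional[str]]]) -> str:
    """Contract override if present, else the title-cased column name."""
    if contract is None:
        # No contract registered (test fixtures): fall back without a registry lookup.
        return title_case(column)
    override = contract.get(column)
    return override if override else title_case(column)
```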
fields
Field-well leaf nodes — Dim + Measure typed wrappers.
Every visual's field wells contain a mix of DimensionField and
MeasureField entries (source / target columns, group-by fields,
aggregated values). These tree nodes wrap them with typed factories
(Dim.date(...), Measure.sum(...)) so construction-time typing
drives what the visual gets, rather than hand-wiring the underlying
models every time.
Auto field_id (L.1.16): both Dim and Measure accept an
optional field_id keyword. When omitted, the App walker assigns
f-{visual_kind}-s{sheet_idx}-v{visual_idx}-{role}{slot_idx} at
emit time. Authors typically pass Dim(ds, "column_name") and
reference the leaf via a Python variable for sort / drill plumbing
(both accept Dim / Measure object refs in addition to bare
field-id strings).
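A sketch of an emit-time walker that fills in the documented f-{visual_kind}-s{sheet_idx}-v{visual_idx}-{role}{slot_idx} shape. The Leaf class, the nested list/dict tree layout, and role names such as "dim" and "measure" are hypothetical; only the id format comes from the text above. Leaves with an explicit field_id are left untouched.

```python
from __future__ import annotations


class Leaf:
    """Hypothetical Dim/Measure stand-in; field_id stays None until assigned."""

    def __init__(self, field_id: str | None = None) -> None:
        self.field_id = field_id


def auto_field_id(visual_kind: str, sheet_idx: int, visual_idx: int, role: str, slot_idx: int) -> str:
    # Documented shape: f-{visual_kind}-s{sheet_idx}-v{visual_idx}-{role}{slot_idx}
    return f"f-{visual_kind}-s{sheet_idx}-v{visual_idx}-{role}{slot_idx}"


def assign_field_ids(sheets: list[list[tuple[str, dict[str, list[Leaf]]]]]) -> None:
    """Each sheet is a list of (visual_kind, {role: [leaves]}) pairs in this sketch."""
    for sheet_idx, visuals in enumerate(sheets):
        for visual_idx, (kind, wells) in enumerate(visuals):
            for role, leaves in wells.items():
                for slot_idx, leaf in enumerate(leaves):
                    if leaf.field_id is None:  # explicit ids are left alone
                        leaf.field_id = auto_field_id(kind, sheet_idx, visual_idx, role, slot_idx)
```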
MeasureKind
module-attribute
MeasureKind = Literal['sum', 'max', 'min', 'average', 'count', 'distinct_count']
Dim
dataclass
One dimension field-well entry — typed wrapper that emits a
DimensionField of the appropriate kind.
dataset is a Dataset object ref — the locked L.1.7 hard
switch. The dataset must be registered on the parent App (via
app.add_dataset()) for the analysis to emit.
column accepts a bare str (a real column on the
dataset), a typed Column ref (built via ds["col"]), or a CalcField
object ref (an analysis-level calculated field). The CalcField ref carries the calc-field
identity through the type checker — the App's emit-time
validation catches references to unregistered calc fields.
Default kind is categorical (the most common); use the
date() / numerical() classmethods for the other variants.
field_id is keyword-only and Optional (L.1.16 auto-ID). When
omitted, the App walker assigns one based on the leaf's tree
position. Pass an explicit field_id="..." only when external
consumers (browser e2e selectors, etc.) need a stable id —
cross-reference plumbing (sort_by, drill writes) accepts the
leaf object directly.
Identity-keyed (eq=False) so the auto-id resolver can mutate
the field_id at emit time. Dim leaves stay hashable via the
default object identity hash, which lets the dependency graph
set-membership check work.
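The identity-keyed choice can be demonstrated with a trimmed dataclass. This Dim keeps only three fields and uses None in place of the real AUTO sentinel (both simplifications). With eq=False the class inherits object's identity-based __eq__ and __hash__, so mutating field_id after the leaf is in a set does not break membership.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)
class Dim:
    # Trimmed, hypothetical Dim: eq=False keeps object-identity __eq__/__hash__.
    dataset: str
    column: str
    field_id: str | None = None
```

By contrast, a default @dataclass (eq=True, not frozen) sets __hash__ to None, so its instances could not go into a set at all; frozen=True would restore hashing but forbid the emit-time mutation.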
date_granularity
class-attribute
instance-attribute
date_granularity: TimeGranularity | None = field(default=None, kw_only=True)
field_id
class-attribute
instance-attribute
field_id: str | AutoResolved = field(default=AUTO, kw_only=True)
__init__
__init__(dataset: Dataset, column: ColumnRef, kind: DimKind = 'categorical', *, date_granularity: TimeGranularity | None = None, field_id: str | AutoResolved = AUTO, currency: bool = False) -> None
date
classmethod
date(dataset: Dataset, column: ColumnRef, *, date_granularity: TimeGranularity | None = 'DAY', field_id: str | AutoResolved = AUTO) -> Dim
Date dimension. date_granularity defaults to "DAY" —
QuickSight's most common bucket for daily series. Pass None
to omit the granularity (the renderer falls back to its default,
which can shift bucketing on day-vs-month dashboards).
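A sketch of the default-versus-None behavior: with the "DAY" default the DateGranularity key is written, and with None it is omitted so QuickSight applies its own default. The date_dimension_field helper is hypothetical; the keys (FieldId, Column, DateGranularity) follow QuickSight's DateDimensionField shape.

```python
from __future__ import annotations

from typing import Optional


def date_dimension_field(dataset_identifier: str, column: str, field_id: str,
                         date_granularity: Optional[str] = "DAY") -> dict[str, object]:
    body: dict[str, object] = {
        "FieldId": field_id,
        "Column": {"DataSetIdentifier": dataset_identifier, "ColumnName": column},
    }
    if date_granularity is not None:
        # Only emit the key when set; omitting it defers to the renderer default.
        body["DateGranularity"] = date_granularity
    return {"DateDimensionField": body}
```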
numerical
classmethod
numerical(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Dim
calc_field
calc_field() -> CalcField | None
The CalcField this Dim references, or None if it points at a real dataset column. Used by the dependency-graph walk.
emit_unaggregated_field
emit_unaggregated_field() -> dict[str, object]
Emit the raw UnaggregatedField dict shape used inside
TableUnaggregatedFieldWells.Values. The model layer types
that field as list[dict[str, Any]] rather than a typed
union, so the tree emits it as a dict directly.
Q.1.a.7 — When currency=True is set on a numerical Dim, the
same USD FormatConfiguration that emit() wires onto a
NumericalDimensionField is also folded into the unaggregated
field shape so table cells render with "$" + thousands
separator + 2 decimals. Without this, currency=True only took
effect when the Dim was used as a chart axis or KPI value, not
when it was used as a table column (by far the most common case).
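A sketch of sharing one USD format object between the axis/KPI path and the table-cell path. USD_NUMBER_FORMAT is an approximation of a QuickSight NumberFormatConfiguration with a CurrencyDisplayFormatConfiguration, not the module's actual constant; the two helper functions are hypothetical. The key detail is that a NumericalDimensionField takes the number format directly, while an UnaggregatedField takes it one level deeper under NumberFormatConfiguration.

```python
from __future__ import annotations

# Approximate USD NumberFormatConfiguration; the real module's constant may differ.
USD_NUMBER_FORMAT: dict[str, object] = {
    "FormatConfiguration": {
        "CurrencyDisplayFormatConfiguration": {
            "Symbol": "USD",
            "DecimalPlacesConfiguration": {"DecimalPlaces": 2},
            "SeparatorConfiguration": {
                "ThousandsSeparator": {"Symbol": "COMMA", "Visibility": "VISIBLE"},
            },
        },
    },
}


def _column(dataset_identifier: str, column: str) -> dict[str, str]:
    return {"DataSetIdentifier": dataset_identifier, "ColumnName": column}


def numerical_dimension_field(dataset_identifier: str, column: str, field_id: str,
                              currency: bool = False) -> dict[str, object]:
    body: dict[str, object] = {"FieldId": field_id, "Column": _column(dataset_identifier, column)}
    if currency:
        body["FormatConfiguration"] = USD_NUMBER_FORMAT  # number format goes in directly
    return {"NumericalDimensionField": body}


def unaggregated_field(dataset_identifier: str, column: str, field_id: str,
                       currency: bool = False) -> dict[str, object]:
    out: dict[str, object] = {"FieldId": field_id, "Column": _column(dataset_identifier, column)}
    if currency:
        # Same format object, wrapped one level deeper for the table-cell shape.
        out["FormatConfiguration"] = {"NumberFormatConfiguration": USD_NUMBER_FORMAT}
    return out
```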
Measure
dataclass
One value field-well entry — typed wrapper that emits a
MeasureField with the appropriate aggregation shape.
dataset is a Dataset object ref (L.1.7 hard switch). The
dataset must be registered on the parent App for the analysis
to emit.
field_id is keyword-only and Optional (L.1.16 auto-ID). When
omitted, the App walker assigns one based on the leaf's tree
position.
Use the classmethod factories for ergonomic construction:
Measure.sum(...), Measure.distinct_count(...), etc.
Aggregation kind determines which underlying model class is
emitted (numerical aggregations on numeric columns,
categorical on count-style aggregations).
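The kind-to-model split can be sketched as a lookup: sum / max / min / average become a NumericalMeasureField with a SimpleNumericalAggregation, while count / distinct_count become a CategoricalMeasureField. The measure_field helper is hypothetical; the field and enum names follow the QuickSight MeasureField shapes.

```python
from __future__ import annotations

from typing import Literal

MeasureKind = Literal["sum", "max", "min", "average", "count", "distinct_count"]

_SIMPLE_NUMERICAL = {"sum": "SUM", "max": "MAX", "min": "MIN", "average": "AVERAGE"}
_CATEGORICAL = {"count": "COUNT", "distinct_count": "DISTINCT_COUNT"}


def measure_field(dataset_identifier: str, column: str, field_id: str,
                  kind: MeasureKind) -> dict[str, object]:
    column_id = {"DataSetIdentifier": dataset_identifier, "ColumnName": column}
    if kind in _SIMPLE_NUMERICAL:
        # Numeric aggregations wrap the enum in an AggregationFunction object.
        return {"NumericalMeasureField": {
            "FieldId": field_id,
            "Column": column_id,
            "AggregationFunction": {"SimpleNumericalAggregation": _SIMPLE_NUMERICAL[kind]},
        }}
    if kind in _CATEGORICAL:
        # Count-style aggregations take the enum string directly.
        return {"CategoricalMeasureField": {
            "FieldId": field_id,
            "Column": column_id,
            "AggregationFunction": _CATEGORICAL[kind],
        }}
    raise ValueError(f"unknown measure kind: {kind!r}")
```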
field_id
class-attribute
instance-attribute
field_id: str | AutoResolved = field(default=AUTO, kw_only=True)
__init__
__init__(dataset: Dataset, column: ColumnRef, kind: MeasureKind, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> None
sum
classmethod
sum(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure
max
classmethod
max(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure
min
classmethod
min(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure
average
classmethod
average(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO, currency: bool = False) -> Measure
count
classmethod
distinct_count
classmethod
distinct_count(dataset: Dataset, column: ColumnRef, *, field_id: str | AutoResolved = AUTO) -> Measure
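The classmethod factories are thin wrappers that fix the kind argument. This trimmed Measure is a sketch: datasets are plain strings and None replaces the real AUTO sentinel. Its distinct_count mirrors the documented signature, which takes no currency flag.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

MeasureKind = Literal["sum", "max", "min", "average", "count", "distinct_count"]


@dataclass(eq=False)
class Measure:
    # Trimmed, hypothetical Measure; None stands in for the AUTO sentinel.
    dataset: str
    column: str
    kind: MeasureKind
    field_id: str | None = None
    currency: bool = False

    @classmethod
    def sum(cls, dataset: str, column: str, *, field_id: str | None = None,
            currency: bool = False) -> Measure:
        return cls(dataset, column, "sum", field_id=field_id, currency=currency)

    @classmethod
    def distinct_count(cls, dataset: str, column: str, *,
                       field_id: str | None = None) -> Measure:
        # Mirrors the documented signature: no currency flag on a count.
        return cls(dataset, column, "distinct_count", field_id=field_id)
```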
calc_fields
Typed analysis-level calculated fields (L.1.8).
A CalcField is the typed wrapper around the existing per-app
CalculatedField dict ({Name, DataSetIdentifier, Expression}).
Visuals and filters reference calc fields the same way they reference
real dataset columns — by passing the column to Dim / Measure
/ CategoryFilter / NumericRangeFilter. The column slot accepts
either a bare str (a real column or a calc-field name) OR a
CalcField object reference; the typed ref carries the validated
calc-field identity through the type checker.
Validation (L.1.8):
- Analysis.add_calc_field rejects duplicate calc-field names within an analysis.
- App._validate_calc_field_references (added in L.1.8) raises if any tree-referenced CalcField isn't registered on the Analysis. Catches "filter references calc field that doesn't exist" and "calc field declared but never used".
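The duplicate-name rejection and the unregistered-reference check can be sketched as follows. The Analysis class and validate_calc_field_references function are hypothetical simplifications (names are required strings here and ValueError is an assumed error type), and the sketch does not model the "declared but never used" case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class CalcField:
    # Identity-keyed, so registered fields can be checked with set membership.
    name: str
    expression: str


@dataclass
class Analysis:
    calc_fields: list[CalcField] = field(default_factory=list)

    def add_calc_field(self, cf: CalcField) -> CalcField:
        if any(existing.name == cf.name for existing in self.calc_fields):
            raise ValueError(f"duplicate calc field name: {cf.name!r}")
        self.calc_fields.append(cf)
        return cf


def validate_calc_field_references(analysis: Analysis, referenced: list[CalcField]) -> None:
    """Raise if any CalcField used in the tree was never registered."""
    registered = set(analysis.calc_fields)
    missing = [cf.name for cf in referenced if cf not in registered]
    if missing:
        raise ValueError(f"unregistered calc field(s) referenced: {missing}")
```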
Dependency graph (L.1.7 + L.1.8):
- Each CalcField carries a Dataset ref. The CalcField's dataset participates in App.dataset_dependencies(), so declaring a calc field on dataset D establishes D as a dep even when no visual directly references D's columns.
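A sketch of folding calc-field datasets into the dependency set: the result is the union of datasets visuals reference directly and the datasets every declared CalcField evaluates against. The two-argument dataset_dependencies function is a hypothetical simplification of App.dataset_dependencies().

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    identifier: str


@dataclass(eq=False)
class CalcField:
    dataset: Dataset
    expression: str


def dataset_dependencies(visual_datasets: set[Dataset], calc_fields: list[CalcField]) -> set[Dataset]:
    deps = set(visual_datasets)
    # A declared calc field makes its dataset a dependency on its own.
    deps.update(cf.dataset for cf in calc_fields)
    return deps
```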
Auto-name (L.2.6 follow-up): name is Optional. When omitted, the
App walker assigns calc-{idx} at emit time based on the calc
field's index in analysis.calc_fields. Pass an explicit name=
when the calc field's column header text matters to analysts (the name
becomes the underlying ColumnName in the data model — analyst-facing
unless a visual's label options override it).
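The auto-name pass can be sketched as one loop over analysis.calc_fields: any field still holding the sentinel gets calc-{idx}, where idx is its position in the list, so explicitly named fields still occupy an index. The _Auto sentinel class and resolve_calc_field_names are hypothetical stand-ins for AUTO and the App walker.

```python
from __future__ import annotations

from dataclasses import dataclass


class _Auto:
    """Hypothetical sentinel standing in for the real AUTO / AutoResolved."""

    def __repr__(self) -> str:
        return "AUTO"


AUTO = _Auto()


@dataclass(eq=False)
class CalcField:
    expression: str
    name: str | _Auto = AUTO


def resolve_calc_field_names(calc_fields: list[CalcField]) -> None:
    for idx, cf in enumerate(calc_fields):
        if isinstance(cf.name, _Auto):
            # Mutation is safe: eq=False keeps hashing identity-based.
            cf.name = f"calc-{idx}"
```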
CalcField
dataclass
Tree node for one analysis-level calculated field.
name is the column-style identifier visuals/filters reference
(e.g. "is_anchor_edge"). Optional — auto-derived as
calc-{idx} at emit time when not specified.
dataset is the Dataset object ref the expression evaluates
against. expression is the QuickSight calc expression
(e.g. "ifelse({source} = ${pAnchor}, 'yes', 'no')").
shape is Optional and only matters for drill sources: when a
drill action reads this calc field's value (via a Dim /
Measure object ref in the drill's writes), the tree needs
a ColumnShape to type-check the drill parameter binding. Tag
here once rather than re-passing the shape at every drill site.
Identity-keyed (eq=False) so the auto-name resolver can mutate
the name field at emit time. CalcFields stay hashable via the
default object identity hash, which is what the dependency-graph
set membership needs anyway.
Emits a plain dict that drops straight into
AnalysisDefinition.CalculatedFields — same shape the existing
builders write today.
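A sketch of the emitted shape: a plain dict with the Name, DataSetIdentifier, and Expression keys of a QuickSight CalculatedField. The emit method and the RuntimeError guard for an unresolved name are assumptions; here None stands in for "not yet auto-named".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    identifier: str


@dataclass(eq=False)
class CalcField:
    dataset: Dataset
    expression: str
    name: Optional[str] = None  # None: auto-name pass has not run yet

    def emit(self) -> dict[str, str]:
        if self.name is None:
            raise RuntimeError("calc field name not resolved; run the auto-name pass first")
        return {
            "Name": self.name,
            "DataSetIdentifier": self.dataset.identifier,
            "Expression": self.expression,
        }
```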
__init__
__init__(dataset: Dataset, expression: str, name: str | AutoResolved = AUTO, shape: ColumnShape | None = None) -> None
resolve_column
resolve_column(column: ColumnRef) -> str
Read the column-name string off a ColumnRef.
For a CalcField, the name is set by App._resolve_auto_ids();
callers asserting the resolver ran can rely on this returning str.
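A sketch of resolve_column over the three ref kinds. Column and CalcField are trimmed, hypothetical stand-ins (Column drops its dataset), and raising RuntimeError for an unresolved calc-field name is an assumption about how a caller's "resolver ran" expectation might be enforced.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(eq=False)
class CalcField:
    expression: str
    name: str | None = None  # hypothetical: None until auto-naming runs


ColumnRef = Union[str, Column, CalcField]


def resolve_column(column: ColumnRef) -> str:
    if isinstance(column, str):
        return column  # bare string escape hatch
    if isinstance(column, CalcField):
        if column.name is None:
            raise RuntimeError("calc field name not resolved yet")
        return column.name
    return column.name  # typed Column ref
```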
calc_field_in
Return the CalcField if column is one, else None.
Used by the dependency-graph walk to harvest CalcField refs from
Dim / Measure / Filter column slots. Column refs return None
(they reference a real dataset column, not a calc field).
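A sketch of calc_field_in plus a hypothetical harvest_calc_fields helper showing how the dependency walk might collect CalcField refs from a mix of column slots. Column and CalcField are trimmed stand-ins; because CalcField is identity-keyed, the harvested list holds the very objects that were passed in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(eq=False)
class CalcField:
    expression: str


ColumnRef = Union[str, Column, CalcField]


def calc_field_in(column: ColumnRef) -> Optional[CalcField]:
    # Strings and Column refs point at real dataset columns, so they yield None.
    return column if isinstance(column, CalcField) else None


def harvest_calc_fields(columns: Iterable[ColumnRef]) -> list[CalcField]:
    return [cf for cf in map(calc_field_in, columns) if cf is not None]
```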