Appendix to Section 6.2
Partition-Based Clustering (K-Means) – EU NUTS-3 Population Structure

B.1 – Round 1: Quality Exploration (K = 3 … 50)

The analyst begins Phase 1 of the SI workflow by sweeping the number of clusters from K = 3 to K = 50 with a fixed random seed. The data comprise demographic indicators for ~1,500 European NUTS-3 regions: the percentage of female population and the population shares of four age groups (0–14, 15–24, 25–44, 45–64). pct_male and pct_65_plus are excluded to avoid linear redundancy, since each is fully determined by the retained attributes (pct_male = 100 − pct_female, and the five age shares sum to 100).

Parameter | Value
K (number of clusters) | 3 … 50, step 1
Random seed | 1 (fixed)
Iterations | 48
Input attributes | pct_female, pct_0_14, pct_15_24, pct_25_44, pct_45_64

The IteraScope metrics chart reveals a clear silhouette peak at K = 6 with a secondary plateau from K ≈ 18 to K ≈ 24. The SSE curve shows the expected monotone decrease with no sharp elbow. The HDBSCAN archetype detection marks K = 24 as the first complete iteration (all archetypes present); however, K = 20 already achieves a silhouette score close to the plateau maximum and contains all but one archetype.

K-Means Quality Exploration – K = 3 to 50
Figure B.1: IteraScope display for K-Means quality exploration (K = 3 … 50). The metrics chart (top) shows silhouette peaking at K = 7 with a secondary plateau around K = 18–24. The first complete iteration (dark gridline) appears at K = 24. The Sankey view (centre) shows progressive cluster splitting as K increases. The 2D embedding (bottom-right) shows group archetypes colour-coded by HDBSCAN cluster membership.
(This view is shown as Fig. 5 in the paper.)
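
The sweep described above can be approximated outside IteraScope with a short script. The following is a minimal sketch, assuming scikit-learn is available; it runs on synthetic blobs rather than the NUTS-3 indicators, the K range is shortened to keep it fast, and the helper name sweep_k and the n_init setting are illustrative choices that the appendix does not specify.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in with five features; NOT the NUTS-3 data.
X, _ = make_blobs(n_samples=300, n_features=5, centers=6, random_state=0)


def sweep_k(X, k_values, seed=1):
    """Run K-Means once per K with a fixed seed; collect SSE and silhouette."""
    rows = []
    for k in k_values:
        # n_init=10 is an assumption; the appendix only states the seed.
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        rows.append({
            "k": k,
            "sse": float(model.inertia_),  # within-cluster sum of squares
            "silhouette": float(silhouette_score(X, model.labels_)),
        })
    return rows


# The appendix sweeps K = 3 … 50; a shorter range is used here.
results = sweep_k(X, range(3, 13))
best = max(results, key=lambda r: r["silhouette"])

for r in results:
    print(f"K={r['k']:2d}  SSE={r['sse']:10.2f}  silhouette={r['silhouette']:.3f}")
print("silhouette peak at K =", best["k"])
```

Plotting the sse and silhouette columns against k gives a rough equivalent of the metrics chart; the archetype detection and Sankey views are not reproduced by this sketch.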

Transition Analysis: Comparing K = 20 and K = 24 in IteraScope

To compare the two candidate configurations in detail, the analyst examines the Sankey transitions between K = 20 and K = 24. The first view below shows the full transition pattern: most groups at K = 20 persist as single groups at K = 24, with a few splitting into two or more sub-clusters. The violet cluster (20.10) is of particular interest—it maintains strong band continuity toward cluster 24.7.

IteraScope – Sankey transition K=20 to K=24
Figure B.2: IteraScope showing Sankey transitions between K = 20 and K = 24. Most bands are nearly horizontal, indicating that core clusters persist. A few bands bifurcate, representing clusters that split when K increases from 20 to 24. The violet cluster (20.10) maintains a dominant band toward 24.7.

The analyst then focuses on cluster 20.10 specifically, using the "highlight transitions from" interaction to visualise all outgoing flows from this group to K = 24. The Sankey view confirms that the vast majority of members persist in cluster 24.7, with only thin bands splitting off to adjacent clusters (24.3, 24.11, etc.).

IteraScope – Transition from cluster 20.10 to all K=24 clusters
Figure B.3: IteraScope showing all transitions from cluster 20.10 (K = 20) to clusters at K = 24. The dominant band flows to 24.7 (the violet cluster's successor); thin peripheral bands flow to adjacent clusters—confirming that the additional clusters at K = 24 merely shave off boundary members.
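
The band widths in a Sankey transition view correspond to a cross-tabulation of two labelings of the same regions. The sketch below illustrates that idea in plain Python; the function names and the ten toy labels are hypothetical and are not taken from IteraScope or from the NUTS-3 results.

```python
from collections import Counter


def transition_flows(labels_a, labels_b):
    """Count members moving from each group in A to each group in B."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same regions")
    return Counter(zip(labels_a, labels_b))


def outgoing_shares(flows, source):
    """Share of `source` members per destination, largest band first."""
    out = {dst: n for (src, dst), n in flows.items() if src == source}
    total = sum(out.values())
    ranked = sorted(out.items(), key=lambda kv: -kv[1])
    return {dst: n / total for dst, n in ranked}


# Toy labels for ten regions (hypothetical values).
k20 = ["20.10"] * 7 + ["20.4"] * 3
k24 = ["24.7"] * 5 + ["24.3", "24.11"] + ["24.3"] * 3

flows = transition_flows(k20, k24)
shares = outgoing_shares(flows, "20.10")
for dst, share in shares.items():
    print(f"20.10 -> {dst}: {share:.0%}")
```

A single dominant share with several small ones is the numeric counterpart of one wide band plus thin peripheral bands.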

B.2 – Round 1: Domain Validation at K = 20

The analyst sends the K = 20 grouping to the linked map and parallel coordinates views to assess geographic coherence and demographic interpretability. The map reveals clearly delineated regional typologies: Nordic, Mediterranean, Central-European, and Eastern-European patterns are visible as spatially contiguous clusters. The parallel coordinates plot shows well-separated multivariate profiles for each cluster.

EU NUTS3 Map – K = 20
Figure B.4: Choropleth map for K = 20. The violet cluster (20.10) forms a spatially contiguous block covering Scandinavia, the Baltic states, and parts of northern Germany. Other clusters show similarly coherent geographic patterns: Mediterranean (warm colours), Central-European (greens), Eastern-European (blues).
(Corresponds to Fig. 6, left, in the paper.)
Parallel coordinates – K = 20
Figure B.5: Parallel coordinates for K = 20 with the violet cluster highlighted. Its profile shows: comparatively lower share of children (pct_0_14) and young adults (pct_15_24, pct_25_44), combined with elevated proportions in the 45–64 band—consistent with ageing, sparsely populated northern districts.
(Corresponds to Fig. 6, right, in the paper.)
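
The cluster profiles read off the parallel coordinates plot can also be summarised numerically as per-cluster means and their deviation from the overall mean. The sketch below is one simple way to do this, not the tool's own computation; the four rows, the cluster names "A" and "B", and both function names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

ATTRS = ["pct_female", "pct_0_14", "pct_15_24", "pct_25_44", "pct_45_64"]


def cluster_profiles(rows, labels):
    """Mean of each indicator per cluster (one polyline per cluster)."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {a: mean(r[a] for r in members) for a in ATTRS}
        for label, members in grouped.items()
    }


def deviation_from_overall(profiles, rows, label):
    """How far a cluster's mean profile sits from the all-region mean."""
    overall = {a: mean(r[a] for r in rows) for a in ATTRS}
    return {a: profiles[label][a] - overall[a] for a in ATTRS}


# Invented toy values, NOT real NUTS-3 figures.
rows = [
    {"pct_female": 50.0, "pct_0_14": 13.0, "pct_15_24": 10.0, "pct_25_44": 23.0, "pct_45_64": 30.0},
    {"pct_female": 51.0, "pct_0_14": 14.0, "pct_15_24": 10.0, "pct_25_44": 24.0, "pct_45_64": 29.0},
    {"pct_female": 51.0, "pct_0_14": 17.0, "pct_15_24": 12.0, "pct_25_44": 28.0, "pct_45_64": 26.0},
    {"pct_female": 52.0, "pct_0_14": 16.0, "pct_15_24": 13.0, "pct_25_44": 27.0, "pct_45_64": 25.0},
]
labels = ["A", "A", "B", "B"]

profiles = cluster_profiles(rows, labels)
dev_a = deviation_from_overall(profiles, rows, "A")
print(dev_a)
```

Negative deviations on the younger age bands together with a positive deviation on pct_45_64 would match the kind of profile described for the violet cluster.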

B.3 – Round 1: Domain Validation at K = 24

For comparison, the analyst also sends the K = 24 grouping to the same linked views. The map shows similar macro-level patterns but appears visibly more scattered: several spatially coherent regions from K = 20 are now fragmented without clear interpretive gain. The parallel coordinates plot confirms that the four additional clusters do not introduce qualitatively new demographic profiles; they subdivide existing ones along cluster boundaries.

EU NUTS3 Map – K = 24
Figure B.6: Choropleth map for K = 24. The same macro-patterns are visible but more fragmented. The violet cluster (now 24.7) has a reduced spatial footprint—some border districts have been reassigned to adjacent clusters, breaking geographic contiguity.
Parallel coordinates – K = 24
Figure B.7: Parallel coordinates for K = 24. The additional clusters (compared to K = 20) create more overlapping bands in indicator space, suggesting that the extra groups subdivide existing demographic profiles rather than revealing new ones.

B.4 – Round 1: Transition Analysis (20.10 → 24.7)

The analyst highlights the specific transition from cluster 20.10 to cluster 24.7 and propagates the resulting class attribute to the map and parallel coordinates. This isolates the members that persist in the violet cluster when K increases from 20 to 24. The map confirms that the retained members form the spatial core of the cluster—a contiguous mass across Scandinavia and the Baltic coast. The parallel coordinates plot shows that the demographic profile of this core is even more sharply defined than the full cluster at K = 20, since borderline members with intermediate profiles have been excluded.

EU NUTS3 Map – Transition 20.10 to 24.7
Figure B.8: Map showing members that transition from 20.10 to 24.7 (i.e., persist in the violet cluster). These form the contiguous spatial core in Scandinavia. The geographic coherence is even stronger than the full cluster at K = 20.
(Corresponds to Fig. 7a, left, in the paper.)
Parallel coordinates – Transition 20.10 to 24.7
Figure B.9: Parallel coordinates for the 20.10 → 24.7 transition. The retained members show a tighter, more distinctive demographic profile: very low pct_0_14 and pct_15_24, elevated pct_45_64—the "purest" ageing-northern profile within the original violet cluster.
(Corresponds to Fig. 7a, right, in the paper.)

B.5 – Round 1: Transition Analysis (20.10 → 24.*)

Next, the analyst highlights the transition from cluster 20.10 to all clusters at K = 24, creating a class attribute whose values encode each destination ("20.10 → 24.7", "20.10 → 24.3", "20.10 → 24.11", etc.). This decomposition reveals where the "lost" members go. The map shows that members reassigned away from the violet cluster are scattered along its geographic boundary—they are absorbed into geographically adjacent clusters rather than forming a single coherent new group. The parallel coordinates confirm that these boundary members have intermediate demographic profiles: they share characteristics of both the violet cluster and their new host clusters.

EU NUTS3 Map – Transition 20.10 to all K=24 clusters
Figure B.10: Map showing the full decomposition of cluster 20.10 into K = 24 destinations. The violet core (24.7) is surrounded by scattered coloured regions representing members lost to adjacent clusters. These losses are peripheral and geographically dispersed—not concentrated in any single area that would warrant a separate cluster.
(Corresponds to Fig. 7b, left, in the paper.)
Parallel coordinates – Transition 20.10 to all K=24 clusters
Figure B.11: Parallel coordinates showing all destination classes from 20.10. The dominant violet band (→ 24.7) retains the distinctive ageing profile; the thinner bands (other destinations) show intermediate values, confirming that the additional clusters at K = 24 merely capture boundary regions without introducing new demographic archetypes.
(Corresponds to Fig. 7b, right, in the paper.)
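
The class attributes used in B.4 and B.5 can be thought of as a derived column computed from the two labelings. The following sketch shows one way to build such a column; the region identifiers R1 … R6, the "other" placeholder for non-members, and both function names are assumptions made for illustration, not IteraScope's actual encoding.

```python
def transition_classes(labels_a, labels_b, source, other="other"):
    """Label each member of `source` by its destination; all others get `other`."""
    return [
        f"{a} → {b}" if a == source else other
        for a, b in zip(labels_a, labels_b)
    ]


def persisting_members(ids, labels_a, labels_b, source, target):
    """Ids of regions that are in `source` at A and in `target` at B."""
    return [
        i for i, a, b in zip(ids, labels_a, labels_b)
        if a == source and b == target
    ]


# Hypothetical region ids and labels.
ids = ["R1", "R2", "R3", "R4", "R5", "R6"]
k20 = ["20.10", "20.10", "20.10", "20.10", "20.4", "20.4"]
k24 = ["24.7", "24.7", "24.3", "24.11", "24.3", "24.3"]

classes = transition_classes(k20, k24, "20.10")    # B.5: 20.10 → 24.*
core = persisting_members(ids, k20, k24, "20.10", "24.7")  # B.4: 20.10 → 24.7

for region, cls in zip(ids, classes):
    print(region, cls)
print("core members:", core)
```

Colouring a map or a parallel coordinates plot by the classes column separates the retained core from the boundary members that move to other clusters.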

B.6 – Round 2: Seed Stability (K = 20, seeds 1 … 30)

To verify that the K = 20 solution is not an artefact of a particular random initialisation, the analyst reruns K-Means with K = 20 fixed and sweeps the random seed from 1 to 30. This round addresses Phase 5 of the SI workflow: archetype verification and robustness assessment.

Parameter | Value
K (number of clusters) | 20 (fixed)
Random seed | 1 … 30, step 1
Iterations | 30

The IteraScope display shows highly consistent results across seeds. The Sankey bands between consecutive seeds are almost perfectly horizontal, indicating virtually no splits or merges. Silhouette fluctuates by less than 0.01 across all 30 seeds. The 2D embedding shows tight archetype clusters with negligible seed-to-seed drift, confirming that the cluster structure at K = 20 is determined by the data geometry rather than by the random initialisation.
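
A seed sweep of this kind can be checked numerically as well as visually. The sketch below assumes scikit-learn and synthetic data, and it substitutes the adjusted Rand index (ARI) for the visual Sankey comparison: ARI ignores label permutations, so a value near 1 between two seeds corresponds to horizontal bands. The function name seed_sweep, the smaller K, the shorter seed range, and n_init=1 are illustrative assumptions.

```python
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic stand-in with five features; NOT the NUTS-3 data.
X, _ = make_blobs(n_samples=300, n_features=5, centers=4,
                  cluster_std=0.5, random_state=0)


def seed_sweep(X, k, seeds):
    """Fit K-Means with fixed K once per seed; keep labels and silhouette."""
    runs = {}
    for seed in seeds:
        # n_init=1 so that each seed corresponds to one initialisation.
        labels = KMeans(n_clusters=k, n_init=1, random_state=seed).fit_predict(X)
        runs[seed] = (labels, float(silhouette_score(X, labels)))
    return runs


# The appendix uses K = 20 and seeds 1 … 30; smaller values are used here.
runs = seed_sweep(X, k=4, seeds=range(1, 11))

sils = [s for _, s in runs.values()]
sil_range = max(sils) - min(sils)
min_ari = min(
    adjusted_rand_score(runs[a][0], runs[b][0])
    for a, b in combinations(runs, 2)
)
print(f"silhouette range across seeds: {sil_range:.4f}")
print(f"lowest pairwise ARI:           {min_ari:.4f}")
```

A small silhouette range combined with a lowest pairwise ARI close to 1 would support the same conclusion as the near-horizontal bands.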

Seed 1 is selected as the reference solution because: (i) none of its 20 clusters falls into HDBSCAN noise—indicating that every cluster possesses sufficient density and coherence; and (ii) the iteration is marked as complete, confirming that all recurrent archetypes are represented.

Seed stability – K = 20, seeds 1 to 30
Figure B.12: IteraScope display for the seed-stability sweep (K = 20, seeds 1–30). The metrics chart (top) shows negligible silhouette variation (< 0.01). The Sankey view (centre) shows near-horizontal bands across all 30 seeds—confirming that the cluster structure is fully robust to random initialisation. The 2D embedding (bottom-right) shows tight, overlapping archetype clusters with no seed-dependent outliers. Seed 1 is complete (dark gridline) and has no groups in HDBSCAN noise.
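
The two selection criteria for the reference seed can be expressed as a simple check over per-iteration archetype assignments. The sketch below is an interpretation of the appendix text, not IteraScope's implementation: it assumes that each group in each iteration carries an archetype id, that noise is marked with -1 (the usual HDBSCAN convention), and that an iteration is complete when it contains every non-noise archetype seen in the round. The input dictionary is a toy example.

```python
NOISE = -1  # HDBSCAN's conventional label for unclustered items


def iteration_status(archetypes_by_iteration):
    """Map each iteration to its completeness, noise count, and gaps.

    archetypes_by_iteration: {iteration: [archetype id of each group]}
    """
    all_archetypes = {
        a for ids in archetypes_by_iteration.values() for a in ids if a != NOISE
    }
    status = {}
    for iteration, ids in archetypes_by_iteration.items():
        present = {a for a in ids if a != NOISE}
        status[iteration] = {
            "complete": present == all_archetypes,
            "noise_groups": sum(1 for a in ids if a == NOISE),
            "missing": sorted(all_archetypes - present),
        }
    return status


# Toy example: three seeds with four groups each (hypothetical ids).
toy = {
    1: [0, 1, 2, 3],      # all archetypes, no noise
    2: [0, 1, 2, NOISE],  # one group in noise, archetype 3 missing
    3: [0, 1, 1, 3],      # archetype 2 missing
}

status = iteration_status(toy)
candidates = [
    it for it, s in status.items()
    if s["complete"] and s["noise_groups"] == 0
]
print(status)
print("reference candidates:", candidates)
```

Under these assumptions, a seed qualifies as a reference solution only when it appears in candidates, which mirrors criteria (i) and (ii) above.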

Summary of the Analytical Process

The table below summarises the two rounds of the partition-based clustering workflow, showing how each round addressed specific analytical questions within the SI workflow framework.

Round | Parameters | Workflow Phase | Key Finding
1 | K = 3 … 50, seed = 1 | Phases 1–4 | Silhouette plateau at K = 18–24; K = 20 balances metric quality with spatial coherence and demographic interpretability; K = 24 adds fragmentation without new archetypes; violet cluster (20.10) is the most geographically coherent group
2 | K = 20 (fixed), seeds 1 … 30 | Phase 5 | Cluster structure fully robust to random initialisation; silhouette variation < 0.01; seed 1 is complete and has no groups in HDBSCAN noise

Final selected parameters: K = 20, seed = 1. This configuration produces 20 clusters capturing distinct European demographic typologies—including an ageing northern-European profile (violet cluster), young Mediterranean and Eastern-European profiles, and urban/metropolitan profiles—with high membership confidence, strong geographic coherence, near-complete archetype coverage, and full robustness to random initialisation.


End of Appendix to Section 6.2