The attached files are published research papers containing descriptions of visual analytics workflows. Your task is to use them as a workflow library for recommending workflows that address new problems. Analyse the library and extract reusable patterns.
Reusable Visual Analytics Workflow Patterns
After analysing all 17 papers, I extracted the following reusable workflow patterns organized by type:
1. Iterative Model Building & Validation
| Aspect | Description |
|---|---|
| Sources | Partition-Based Framework, EMA, RfX, VA_4_ST |
| Core Loop | Build model → Visualize errors/residuals → Identify weaknesses → Refine → Repeat |
| Key Idea | Use derived quantities (residuals, error distributions) as the target of the next visual exploration cycle to expose effects not yet captured by the model. |
Generic Steps:
- Identify candidate features (ranked by relevance)
- Build initial model on top-ranked feature(s)
- Visualize model residuals over remaining features
- Identify next best feature or interaction to add
- Refit model; evaluate improvement (e.g., RMSE)
- Stop when gains are negligible or overfitting appears
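The loop above can be sketched in Python as a greedy, residual-driven forward selection. As a simplifying assumption, the model is additive and piecewise constant: each selected feature is split into equal-width partitions, and its contribution is the mean residual of the partition a row falls into, which is what a residual-per-partition plot would show the analyst. The names `fit_additive_model` and `min_gain` are illustrative. The stopping rule uses training RMSE only; detecting the overfitting mentioned in the last step would need a held-out set.

```python
import math
from statistics import mean


def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width partitions of [lo, hi]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


def rmse(residuals):
    return math.sqrt(mean(r * r for r in residuals))


def fit_additive_model(rows, target, n_bins=4, min_gain=1e-3):
    """Greedy forward selection: repeatedly add the feature whose per-partition
    mean residual reduces the training RMSE the most."""
    base = mean(target)
    residuals = [y - base for y in target]
    selected = []  # (feature, lo, hi, {partition: offset})
    while True:
        best = None
        for f in rows[0]:
            if any(s[0] == f for s in selected):
                continue
            vals = [r[f] for r in rows]
            lo, hi = min(vals), max(vals)
            parts = [bin_index(v, lo, hi, n_bins) for v in vals]
            # Mean residual per partition: what a residual plot over f would show.
            offsets = {p: mean(r for r, q in zip(residuals, parts) if q == p)
                       for p in set(parts)}
            new_res = [r - offsets[p] for r, p in zip(residuals, parts)]
            gain = rmse(residuals) - rmse(new_res)
            if best is None or gain > best[0]:
                best = (gain, f, lo, hi, offsets, new_res)
        # Stop when every feature is used or the best improvement is negligible.
        if best is None or best[0] < min_gain:
            return base, selected, rmse(residuals)
        _, f, lo, hi, offsets, residuals = best
        selected.append((f, lo, hi, offsets))


# Toy data: a morning peak that depends on the hour only.
rows = [{"hour": h, "noise": (h * 7) % 5} for h in range(24)]
target = [10 if 7 <= h <= 9 else 2 for h in range(24)]
base, selected, err = fit_additive_model(rows, target, min_gain=0.5)
print([s[0] for s in selected], round(err, 3))  # ['hour'] 2.0
```

In the toy data, the peak depends only on `hour`, so `noise` is rejected by the `min_gain` threshold, which was chosen for this example. On training data every extra partitioned feature lowers the RMSE a little, which is why a validation set is the safer stopping criterion.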
2. Progressive Abstraction / Simplification
| Aspect | Description |
|---|---|
| Sources | Episodes & Topics, Event Sequence Simplification, Movement→Events→Places, MobilityGraphs, Reducing Snapshots to Points |
| Core Idea | Transform raw data through successive abstraction levels until interpretable patterns emerge. |
Generic Steps:
- Level 0 – Raw data: Time-stamped records, trajectories, event logs
- Level 1 – Episodes/Events: Segment data into meaningful units (sliding window, event extraction, discretisation)
- Level 2 – Patterns: Symbolise/cluster episodes into recognisable patterns (SAX encoding, topic modelling, spatial clustering)
- Level 3 – Distribution of Patterns: Visualise when/where/how patterns occur across the full dataset
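Levels 1 and 2 can be illustrated with a minimal SAX (Symbolic Aggregate approXimation) sketch in Python. The series is cut into fixed-length episodes, and each episode is z-normalised and reduced by Piecewise Aggregate Approximation (PAA) to a few segment means. Each mean is then mapped to a letter using breakpoints that split the standard normal distribution into equiprobable regions. Non-overlapping episodes and a four-letter alphabet are assumptions made for brevity.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(values, n_segments, alphabet="abcd"):
    """z-normalise, reduce with PAA, then map each segment mean to a symbol."""
    mu, sigma = mean(values), pstdev(values)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    # PAA: mean of equal-length segments (assumes len(values) % n_segments == 0).
    size = len(z) // n_segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(n_segments)]
    # Breakpoints divide N(0, 1) into len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[sum(p > b for b in breakpoints)] for p in paa)


def episodes_to_words(series, episode_len, n_segments, alphabet="abcd"):
    """Level 1: cut the series into episodes; Level 2: symbolise each episode."""
    return [sax_word(series[i:i + episode_len], n_segments, alphabet)
            for i in range(0, len(series) - episode_len + 1, episode_len)]


day = [0] * 6 + [1] * 6 + [2] * 6 + [3] * 6  # a rising 24-value daily profile
series = day + day[::-1]                     # the second day mirrors the first
print(episodes_to_words(series, episode_len=24, n_segments=4))  # ['abcd', 'dcba']
```

Because each episode is normalised separately, the words capture shape rather than level. If absolute levels matter (for example, how close a station is to empty), symbolise the raw values with fixed bins instead.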
3. Cluster-then-Explore
| Aspect | Description |
|---|---|
| Sources | Cluster & Calendar, MobilityGraphs, Reducing Snapshots to Points, Human-in-the-Loop |
| Core Idea | Compute features → Cluster → Visualize clusters on a familiar layout → Drill into individual clusters. |
Generic Steps:
- Define a feature vector for each data item
- Compute pairwise similarities/distances
- Apply clustering (density-based, k-means, hierarchical)
- Map clusters onto a meaningful layout (calendar, 2D projection, map)
- Interactively select/expand clusters to inspect members
- Label or characterise clusters semantically
4. Human-in-the-Loop ML
| Aspect | Description |
|---|---|
| Sources | Human-in-the-Loop, UTOPIAN, RfX, explainMLVis, What-If Tool |
| Core Idea | Domain experts steer or validate an ML model through interactive visual tools without needing ML expertise. |
Generic Steps:
- Train an initial (possibly imperfect) model
- Visualise model decisions/outputs at multiple granularities
- Expert inspects and identifies problems (misclassifications, implausible rules)
- Expert provides corrections (relabel, merge topics, adjust thresholds, remove features)
- System updates model incorporating expert feedback
- Repeat until model is trusted / sufficiently accurate
5. Similarity-based Recommendation
| Aspect | Description |
|---|---|
| Sources | EventAction, What-If Tool, EMA |
| Core Idea | Given a current case, find similar archived cases, explore their outcomes, and recommend actions. |
Generic Steps:
- Represent the current record as a feature vector
- Retrieve similar archived records (distance/similarity metric)
- Visualise outcome distribution of similar cases
- Identify features/actions correlated with desired outcome
- Let the user specify an action plan
- Provide feedback on estimated outcome probability
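A minimal Python sketch of these steps, under simplifying assumptions: each archived record is a dict with a numeric feature tuple, the action that was taken, and a binary outcome (1 = desired outcome). Similarity is plain Euclidean distance over the k nearest records. Tools such as EventAction work with temporal event sequences and richer similarity measures; the record layout and function names here are hypothetical.

```python
import math
from collections import defaultdict


def nearest(current, archive, k):
    """The k archived records closest to `current` (Euclidean distance)."""
    return sorted(archive, key=lambda rec: math.dist(current, rec["features"]))[:k]


def recommend(current, archive, k=5):
    """Estimate, per action, the share of similar cases with the desired outcome."""
    outcomes = defaultdict(list)
    for rec in nearest(current, archive, k):
        outcomes[rec["action"]].append(rec["outcome"])
    rates = {action: sum(o) / len(o) for action, o in outcomes.items()}
    return max(rates, key=rates.get), rates


archive = [
    {"features": (0, 0), "action": "A", "outcome": 1},
    {"features": (0, 1), "action": "A", "outcome": 1},
    {"features": (1, 0), "action": "B", "outcome": 0},
    {"features": (1, 1), "action": "B", "outcome": 1},
    {"features": (9, 9), "action": "B", "outcome": 1},
]
best, rates = recommend((0.2, 0.2), archive, k=4)
print(best, rates)  # A {'A': 1.0, 'B': 0.5}
```

The per-action rates stand in for the outcome-distribution view. With a small k they are noisy, so showing the number of neighbours behind each rate helps the user judge them.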
6. Multi-level Drill-down (Overview → Explanation → Instance)
| Aspect | Description |
|---|---|
| Sources | explainMLVis, RfX, Partition-Based Framework, What-If Tool |
| Core Idea | Move from aggregate statistics to feature-level explanations to raw instances for root-cause analysis. |
Generic Steps:
- Overview: Show aggregate performance (confusion matrix, ROC, accuracy)
- Feature/Explanation Level: Show which features drive decisions; group instances by explanation
- Instance Level: Inspect individual data items and their feature values
- Insight: Formulate hypothesis about model/data improvement
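The overview and drill-down levels can be sketched in Python. The per-instance `explanations` (here, the single feature that contributed most to each prediction) are assumed to come from some model-agnostic explanation method and are simply passed in. The labels and function names are illustrative.

```python
from collections import Counter, defaultdict


def confusion_matrix(y_true, y_pred):
    """Overview level: count of every (actual, predicted) label pair."""
    return Counter(zip(y_true, y_pred))


def drill_down(instances, y_true, y_pred, explanations, actual, predicted):
    """Explanation and instance levels: the instances in one confusion-matrix
    cell, grouped by the feature that explains their prediction."""
    groups = defaultdict(list)
    for inst, t, p, top_feature in zip(instances, y_true, y_pred, explanations):
        if (t, p) == (actual, predicted):
            groups[top_feature].append(inst)
    return dict(groups)


instances = ["r1", "r2", "r3", "r4", "r5"]
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "neg"]
explanations = ["f1", "f2", "f1", "f1", "f2"]

cm = confusion_matrix(y_true, y_pred)
print(cm[("pos", "neg")])  # 2 (false negatives)
print(drill_down(instances, y_true, y_pred, explanations, "pos", "neg"))  # {'f2': ['r2', 'r5']}
```

When all instances in one cell share the same explaining feature, as in this toy case, that feature is a natural starting point for the hypothesis in the last step.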
7. Spatio-temporal Event Aggregation Pipeline
| Aspect | Description |
|---|---|
| Sources | From Movement→Events→Places, MobilityGraphs, Visually-Driven Analysis |
| Core Idea | Extract events from trajectories, cluster them into places, aggregate by space×time, then analyse temporal profiles of places. |
Generic Steps:
- Define event criteria (speed < threshold, direction change, stops)
- Extract events from trajectories
- Cluster events spatially (+ optionally by direction/time) → define "places"
- Aggregate events/trajectories by places and time intervals
- Visualise resulting time series on maps (temporal diagrams, flow maps)
- Compare temporal profiles across places
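The first two steps (defining an event criterion and extracting events) can be sketched in Python for one simple criterion: stops, meaning maximal runs of low speed that last at least a minimum duration. Planar coordinates in metres, time in seconds and the default thresholds are assumptions. The resulting events are what the spatial clustering step in the list above would then group into places.

```python
import math


def extract_stops(track, max_speed=0.5, min_duration=300):
    """Stop events of one trajectory.

    track: list of (t_seconds, x_m, y_m) sorted by time. A stop is a maximal run
    of consecutive segments slower than max_speed (m/s) lasting >= min_duration s.
    """
    events, run = [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        speed = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        if speed < max_speed:
            if not run:
                run.append((t0, x0, y0))
            run.append((t1, x1, y1))
        else:
            events += _to_event(run, min_duration)
            run = []
    events += _to_event(run, min_duration)
    return events


def _to_event(run, min_duration):
    """A run of slow points becomes an event (start, end, mean position) if long enough."""
    if not run or run[-1][0] - run[0][0] < min_duration:
        return []
    n = len(run)
    return [{"start": run[0][0], "end": run[-1][0],
             "x": sum(p[1] for p in run) / n, "y": sum(p[2] for p in run) / n}]


track = [(0, 0, 0), (60, 600, 0), (120, 610, 0), (420, 612, 0), (480, 1200, 0)]
print(extract_stops(track))  # one stop from t=60 s to t=420 s, near x=607 m
```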
8. Feature Engineering + Visual Assessment
| Aspect | Description |
|---|---|
| Sources | Human-in-the-Loop, Partition-Based Framework, RfX |
| Core Idea | Derive interval-based or partition-based features, visually assess their discriminative power, iterate until features effectively separate classes. |
Generic Steps:
- Derive candidate features (summary statistics, domain-specific transformations)
- Visualise feature distributions (histograms, projections)
- Assess separability of classes in feature space
- Select best feature combination
- Generate labelled examples (via clustering + inspection)
- Validate with simple classifier (e.g., kNN)
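The final validation step can be sketched in Python as the leave-one-out accuracy of a k-nearest-neighbour classifier. Each item is classified by a majority vote of its k nearest other items, so a feature set that separates the classes well scores high. The toy feature sets are invented to contrast a separating and a non-separating representation, and k = 3 is an arbitrary choice.

```python
import math
from collections import Counter


def loo_knn_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbour classifier."""
    correct = 0
    for i, p in enumerate(points):
        others = [(math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i]
        others.sort(key=lambda pair: pair[0])
        votes = Counter(label for _, label in others[:k])
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(points)


labels = ["A", "A", "A", "B", "B", "B"]
good = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # classes far apart
bad = [(0,), (2,), (4,), (1,), (3,), (5,)]                      # classes interleaved
print(loo_knn_accuracy(good, labels), loo_knn_accuracy(bad, labels))  # 1.0 0.0
```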
9. Temporal Pattern Discovery via Clustering + Temporal Layout
| Aspect | Description |
|---|---|
| Sources | Cluster & Calendar, MobilityGraphs, Episodes & Topics |
| Core Idea | Cluster time-indexed items by behavioural similarity, then display cluster membership on a calendar/timeline to reveal periodic patterns. |
Generic Steps:
- Segment time series into comparable units (days, episodes, snapshots)
- Define distance metric between units
- Cluster units by similarity
- Display cluster assignments on a temporal layout (calendar, matrix, timeline)
- Inspect average pattern per cluster
- Identify weekly/daily/seasonal recurrences and anomalies
10. Exploratory Model Analysis (EMA)
| Aspect | Description |
|---|---|
| Sources | EuroVis2019-EMA |
| Core Idea | When neither the modelling task nor model type is known a priori, the workflow supports discovering what models can be built from a dataset. |
Generic Steps:
- Data Exploration: Visualise data attributes and relationships
- Problem Exploration: System proposes possible modelling problems (classification, regression, forecasting)
- Problem Specification: User selects/refines a modelling problem
- Model Generation: AutoML generates diverse candidate models
- Model Exploration: Compare models via post-hoc predictions on holdout data
- Model Selection & Export: Choose and export preferred model(s)
Cross-cutting Design Principles
| Principle | Description | Papers |
|---|---|---|
| Linked multi-view coordination | Changes in one view filter/highlight in others | All papers |
| Ranking + small multiples | Rank items by computed relevance, show top-ranked in coordinated small multiples | Partition-Based, RfX, EMA |
| Derived quantities as exploration targets | Use residuals, differences, variances as new targets to expose subtle effects | Partition-Based, MobilityGraphs |
| Adjustable level of detail | Let users control granularity (partitioning depth, number of clusters, time window) | Partition-Based, MobilityGraphs, Reducing Snapshots |
| Colour-coded categorical membership | Assign distinct colours to clusters/topics/classes for rapid associative perception | Episodes & Topics, MobilityGraphs, Reducing Snapshots |
| Semantic interaction / direct manipulation | Let users split/merge/relabel/drag to steer computation | UTOPIAN, Human-in-the-Loop, What-If Tool |
| Provenance / undo | Track analysis history so users can backtrack | Event Sequence Simplification, Partition-Based |
| Model-agnostic explanations | Treat model as black box; explain via input–output perturbation | explainMLVis, What-If Tool |
These patterns can be composed and adapted to address new analysis problems by selecting the pattern(s) whose input data type, analytical goal, and user expertise level match the new problem context.
I have data about the use of public bikes in a city. The records represent trips: bike ID, origin place and time, destination place and time. The origin and destination places are docking stations distributed over the city. The coordinates and capacities of the docking stations are given. In many popular places, there are multiple docking stations clustered close to each other. I need to analyse the spatio-temporal patterns of taking and returning bikes. I am especially interested in revealing patterns of stations running out of available bikes and stations overcrowded with returned bikes. The patterns are expected to be related to the diurnal and weekly cycle as well as seasons and holidays. I think the analysis should be done not at the level of individual stations but at the level of "places" containing groups of spatially close stations. Based on my analysis, I want to develop a model that will tell me for any given date and time of the day how bikes should be allocated/relocated in order to satisfy the demands and avoid overfilling of stations.
Analytical Approach for Public Bike-Sharing Data
Below is a structured methodology that covers your entire pipeline — from spatial grouping of stations, through spatio-temporal pattern analysis, to building a predictive/prescriptive model for bike reallocation.
1. Defining "Places" by Grouping Spatially Close Stations
Goal: Aggregate individual docking stations into meaningful spatial units ("places").
Method:
- Apply density-based spatial clustering (e.g., DBSCAN or OPTICS) to station coordinates, using geographic distance thresholds informed by walking distance (e.g., 200–400 m). This follows the approach described in From Movement Tracks through Events to Places (Andrienko et al., 2011), where density-based clustering delineates relevant places with arbitrary shapes and sizes.
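- Incorporate station capacity as a weighting factor, e.g. when computing a place's representative location (capacity-weighted centroid) or averaging station-level indicators, so that stations with higher capacity dominate the place-level statistics.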
- Validate clusters visually on a map. Adjust thresholds interactively until the resulting places correspond to recognizable urban zones (commercial centres, transit hubs, residential areas, parks, etc.).
Output: A set of places, each containing one or more stations, with an aggregate capacity (sum of individual station capacities).
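A minimal Python sketch of this step, assuming each station is a dict with `id`, `lat`, `lon` and `capacity` (the field names are hypothetical). With `min_samples = 1`, DBSCAN reduces to linking every pair of stations closer than `eps_m` and taking connected components, which is what `group_stations` does. Such single-linkage chaining can merge long strings of stations, so the visual check on the map remains necessary. The 300 m default sits in the walking-distance range mentioned above.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def group_stations(stations, eps_m=300):
    """Place index per station: stations linked by a chain of neighbours within
    eps_m metres share a place (DBSCAN with min_samples=1)."""
    place_of = [None] * len(stations)
    place = 0
    for seed in range(len(stations)):
        if place_of[seed] is not None:
            continue
        place_of[seed] = place
        stack = [seed]
        while stack:  # expand the place through eps-neighbours
            i = stack.pop()
            for j, other in enumerate(stations):
                if place_of[j] is None and haversine_m(
                        stations[i]["lat"], stations[i]["lon"],
                        other["lat"], other["lon"]) <= eps_m:
                    place_of[j] = place
                    stack.append(j)
        place += 1
    return place_of


def summarise_places(stations, place_of):
    """Total capacity, capacity-weighted centroid and member ids of every place."""
    places = {}
    for st, p in zip(stations, place_of):
        agg = places.setdefault(p, {"capacity": 0, "lat": 0.0, "lon": 0.0, "stations": []})
        agg["capacity"] += st["capacity"]
        agg["lat"] += st["lat"] * st["capacity"]
        agg["lon"] += st["lon"] * st["capacity"]
        agg["stations"].append(st["id"])
    for agg in places.values():  # assumes every place has positive capacity
        agg["lat"] /= agg["capacity"]
        agg["lon"] /= agg["capacity"]
    return places


stations = [
    {"id": "S1", "lat": 52.5200, "lon": 13.4050, "capacity": 20},
    {"id": "S2", "lat": 52.5210, "lon": 13.4050, "capacity": 10},  # about 110 m from S1
    {"id": "S3", "lat": 52.5300, "lon": 13.4050, "capacity": 15},  # about 1.1 km from S1
]
place_of = group_stations(stations, eps_m=300)
places = summarise_places(stations, place_of)
print(place_of)                                      # [0, 0, 1]
print(places[0]["capacity"], places[0]["stations"])  # 30 ['S1', 'S2']
```

The pairwise loop is quadratic in the number of stations, which is acceptable for a few thousand stations.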
2. Deriving Time Series for Each Place
For each place, compute time series of:
| Metric | Description |
|---|---|
| Departures(t) | Number of bikes taken from the place in time interval t |
| Arrivals(t) | Number of bikes returned to the place in time interval t |
| Net flow(t) | Arrivals − Departures |
| Cumulative net flow | Running sum of net flow from a reference start |
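| Occupancy estimate(t) | Estimated number of bikes present: initial stock + cumulative net flow (requires a known or inferable initial state; drifts whenever the operator relocates bikes, since relocations are not trips and are therefore absent from the trip records) |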
| Fullness ratio(t) | Occupancy / total capacity of the place |
Choose a temporal granularity of 30 min or 1 hour — fine enough to capture morning/evening rush effects but coarse enough to smooth noise.
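The departure, arrival and net-flow series can be derived with a short Python sketch. It assumes trips are dicts with `origin`/`dest` station ids and `start`/`end` datetimes, plus a `place_of_station` mapping produced by the grouping step (all names hypothetical). The occupancy estimate needs an initial stock and is only reliable over periods without operator rebalancing, because relocations do not appear among the trips.

```python
from collections import Counter
from datetime import datetime


def time_bin(ts, minutes=60):
    """Floor a datetime to its interval start (minutes must divide 60, e.g. 30 or 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def place_series(trips, place_of_station, minutes=60):
    """Departures, arrivals and net flow (arrivals - departures) per (place, interval)."""
    dep, arr = Counter(), Counter()
    for trip in trips:
        dep[place_of_station[trip["origin"]], time_bin(trip["start"], minutes)] += 1
        arr[place_of_station[trip["dest"]], time_bin(trip["end"], minutes)] += 1
    return {key: {"departures": dep[key], "arrivals": arr[key], "net": arr[key] - dep[key]}
            for key in sorted(dep.keys() | arr.keys())}


def occupancy(series, place, initial_stock, capacity):
    """(interval, occupancy, fullness ratio) from initial stock plus cumulative net flow."""
    stock, out = initial_stock, []
    for (p, t), row in series.items():  # keys are already in (place, time) order
        if p == place:
            stock += row["net"]
            out.append((t, stock, stock / capacity))
    return out


place_of_station = {"S1": 0, "S2": 0, "S3": 1}
trips = [
    {"origin": "S3", "start": datetime(2024, 5, 6, 8, 5), "dest": "S1", "end": datetime(2024, 5, 6, 8, 20)},
    {"origin": "S3", "start": datetime(2024, 5, 6, 8, 40), "dest": "S2", "end": datetime(2024, 5, 6, 9, 2)},
    {"origin": "S1", "start": datetime(2024, 5, 6, 17, 30), "dest": "S3", "end": datetime(2024, 5, 6, 17, 50)},
]
series = place_series(trips, place_of_station)
for t, stock, ratio in occupancy(series, place=0, initial_stock=10, capacity=30):
    print(t.strftime("%H:%M"), stock, round(ratio, 2))
# 08:00 11 0.37
# 09:00 12 0.4
# 17:00 11 0.37
```

Intervals without any trip are simply absent from `series`; for a regular time series, fill them with zero departures and arrivals before computing occupancy.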
3. Detecting Critical Events (Running Out / Overcrowding)
Define events as threshold crossings:
- Empty event: fullness ratio drops below a low threshold (e.g., < 10% of capacity)
- Full event: fullness ratio exceeds a high threshold (e.g., > 90% of capacity)
Extract these events with their spatial location (place), time, duration, and severity. This event-based view aligns with the methodology in From Movement Tracks through Events to Places.
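Event extraction for one place can be sketched in Python as finding maximal runs of intervals below or above the thresholds. The sketch assumes a gap-free, regularly sampled fullness series given as (interval index, ratio) pairs and measures duration in intervals. As an illustrative choice, severity is the largest distance past the threshold within the run.

```python
def critical_events(fullness, low=0.1, high=0.9):
    """Maximal runs of 'empty' (ratio < low) or 'full' (ratio > high) intervals.

    fullness: list of (interval_index, fullness_ratio), gap-free and in time order.
    """
    events, current = [], None
    for idx, ratio in fullness:
        kind = "empty" if ratio < low else "full" if ratio > high else None
        if current and current["kind"] != kind:  # the run has ended
            events.append(current)
            current = None
        if kind and current is None:  # a new run starts
            current = {"kind": kind, "start": idx, "duration": 0, "severity": 0.0}
        if current:
            current["duration"] += 1
            # Severity: the largest distance past the threshold within the run.
            excess = low - ratio if kind == "empty" else ratio - high
            current["severity"] = max(current["severity"], excess)
    if current:
        events.append(current)
    return events


ratios = [0.5, 0.08, 0.02, 0.3, 0.95, 0.97, 0.92, 0.6]
for e in critical_events(list(enumerate(ratios))):
    print(e["kind"], e["start"], e["duration"], round(e["severity"], 2))
# empty 1 2 0.08
# full 4 3 0.07
```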
4. Analysing Spatio-Temporal Patterns
4.1 Temporal pattern analysis (diurnal, weekly, seasonal)
Apply the Cluster and Calendar-based Visualization approach (van Wijk & van Selow, 1999):
- For each place, treat daily profiles (e.g., 24 or 48 values per day) as patterns.
- Cluster similar daily profiles using hierarchical agglomerative clustering with a distance measure such as RMSE.
- Display the cluster assignments on a calendar view — this immediately reveals which days share similar patterns and exposes the roles of weekdays, weekends, holidays, and seasons.
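A minimal Python sketch of the clustering and calendar steps, assuming `profiles` maps each date to an equal-length list of values for one place (e.g. hourly net flow). It uses naive agglomerative clustering with average linkage over RMSE distances (the linkage rule is an assumption here) and prints a text calendar with one row per ISO week. For a full year of data, `scipy.cluster.hierarchy.linkage` with `method='average'` on Euclidean distances would be much faster and gives the same hierarchy up to tie-breaking, since for equal-length profiles RMSE is the Euclidean distance divided by a constant.

```python
import math
from datetime import date, timedelta


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def cluster_days(profiles, n_clusters):
    """Merge the two closest clusters (average RMSE between members) until
    n_clusters remain; return {date: cluster number}."""
    clusters = [[d] for d in sorted(profiles)]

    def linkage(c1, c2):
        return sum(rmse(profiles[a], profiles[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return {d: k for k, members in enumerate(clusters) for d in members}


def calendar_rows(assignment):
    """Text calendar: one row per ISO week, one cluster number per weekday (Mon..Sun)."""
    rows = {}
    for d, k in assignment.items():
        year, week, weekday = d.isocalendar()
        rows.setdefault((year, week), ["."] * 7)[weekday - 1] = str(k)
    return ["".join(row) for _, row in sorted(rows.items())]


# Synthetic two weeks: working days have a commuting profile; weekends and one
# hypothetical holiday (Thursday 2024-05-09) have a flat leisure profile.
profiles = {}
for offset in range(14):
    d = date(2024, 5, 6) + timedelta(days=offset)
    leisure = d.weekday() >= 5 or d == date(2024, 5, 9)
    profiles[d] = [1, 2, 3, 1] if leisure else [0, 8, 2, 6]
print("\n".join(calendar_rows(cluster_days(profiles, n_clusters=2))))
# 0001011
# 0000011
```

In the output, the holiday falls into the weekend cluster, which is the kind of observation the calendar view is meant to make obvious.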
Expected findings:
- Working-day patterns with morning outflows from residential places and inflows to business/university places (reversed in the evening)
- Weekend patterns with different spatial foci (parks, leisure areas)
- Holiday patterns resembling weekends
- Seasonal variation in overall usage intensity
4.2 Multi-attribute pattern analysis
Since you have multiple places with concurrent time series, consider the Episodes and Topics approach (Andrienko et al., 2023):
- Divide the timeline into episodes (e.g., 1-day or half-day windows).
- For each place and episode, compute SAX-like symbolic patterns of the fullness ratio.
- Apply topic modelling (NMF or LDA) to discover recurring co-occurrence patterns across places — e.g., "morning rush topic" where central places fill up while suburban places empty simultaneously.
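A compact Python sketch of the topic-modelling step, under stated assumptions. Each episode becomes a "document" whose terms are `place:word` tokens (one SAX word per place and episode), and topics are extracted by non-negative matrix factorisation using the classic multiplicative updates of Lee and Seung. The token scheme and names are illustrative and not necessarily how the Episodes and Topics work defines its terms. For real data, a library implementation such as `sklearn.decomposition.NMF` would be the practical choice.

```python
import random
from collections import Counter


def episode_documents(words_by_place):
    """One bag of tokens per episode; words_by_place maps place -> SAX word per episode."""
    n_episodes = len(next(iter(words_by_place.values())))
    return [Counter(f"{p}:{words[e]}" for p, words in words_by_place.items())
            for e in range(n_episodes)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """V (episodes x terms) ~ W (episodes x topics) H (topics x terms), fitted with
    multiplicative updates that keep every entry non-negative."""
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(k)] for _ in v]
    h = [[rng.random() for _ in v[0]] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(len(w))]
    return w, h


words_by_place = {"centre": ["abdd", "abdd", "bbbb", "abdd"],
                  "suburb": ["ddba", "ddba", "bbbb", "ddba"]}
docs = episode_documents(words_by_place)
vocab = sorted(set().union(*docs))
v = [[doc[t] for t in vocab] for doc in docs]
w, h = nmf(v, k=2)
for topic in h:  # the two highest-weighted tokens per topic
    print([vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -topic[j])[:2]])
```

With this toy input, one topic should collect the commuting words of both places and the other the flat `bbbb` words; rows of `w` then say how strongly each episode expresses each topic.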
4.3 Flow pattern analysis
Use the MobilityGraphs methodology (von Landesberger et al., 2016):
- Represent flows between places as a dynamic weighted directed graph.
- Apply temporal clustering (e.g., k-means on aggregated flow vectors) to group time intervals with similar flow situations.
- Visualize representative flow maps for each temporal cluster on a geographic display.
This can reveal, for instance, a strongly centripetal flow pattern on weekday mornings and a centrifugal one in the evenings.
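The flow-clustering step can be sketched in Python. Trips are counted per time interval and per origin-destination place pair, each interval becomes one flow vector, and the vectors are grouped by k-means. The trip fields (`start`, `origin_place`, `dest_place`), the hourly intervals and the deterministic farthest-point initialisation are assumptions made for the sketch; MobilityGraphs itself may construct and cluster its graphs differently.

```python
import math
from collections import Counter
from datetime import datetime


def flow_vectors(trips, places, interval_of):
    """One vector per time interval with a trip count for every (origin, destination) pair."""
    pairs = [(o, d) for o in places for d in places]
    counts = Counter((interval_of(t["start"]), t["origin_place"], t["dest_place"]) for t in trips)
    intervals = sorted({key[0] for key in counts})
    return intervals, [[counts[i, o, d] for o, d in pairs] for i in intervals]


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centres = [list(vectors[0])]
    while len(centres) < k:
        centres.append(list(max(vectors, key=lambda v: min(math.dist(v, c) for c in centres))))
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(v, centres[c])) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the previous centre if a cluster is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def trip(hour, origin, dest):  # hypothetical trip record
    return {"start": datetime(2024, 5, 6, hour, 15), "origin_place": origin, "dest_place": dest}


trips = ([trip(7, "suburb", "centre")] * 3 + [trip(8, "suburb", "centre")] * 2
         + [trip(17, "centre", "suburb")] * 3 + [trip(18, "centre", "suburb")] * 2)
intervals, vectors = flow_vectors(trips, ["centre", "suburb"], interval_of=lambda dt: dt.hour)
labels, centres = kmeans(vectors, k=2)
print(dict(zip(intervals, labels)))  # {7: 0, 8: 0, 17: 1, 18: 1}
```

Clustering raw counts groups intervals by volume as well as direction; dividing each vector by its total first would group them by flow structure alone.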
5. Building the Predictive/Prescriptive Model
5.1 Feature engineering
For each place and target time point, construct features:
| Category | Examples |
|---|---|
| Temporal | Hour of day, day of week, month, is_holiday, is_school_vacation |
| Recent state | Current occupancy, net flow in last 1/2/3 hours |
| Historical | Average occupancy at same hour/day-of-week over past weeks |
| Weather (if available) | Temperature, precipitation, wind |
| Event context | Special events, university calendar |
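Assembling the temporal, recent-state and historical features for one place and target time could look like the following Python sketch. It assumes hourly dictionaries keyed by (place, hour start), such as those derived from the per-place time series, and a set of holiday dates. The feature names mirror the table; weather and event context are omitted because they would be joined from external sources. All names are hypothetical.

```python
from datetime import date, datetime, timedelta


def make_features(place, t, net_flow, occupancy, holidays):
    """Features for one place and target time t.

    net_flow, occupancy: dict (place, hour-start datetime) -> value.
    """
    hour = t.replace(minute=0, second=0, microsecond=0)
    feats = {
        "hour": t.hour,
        "day_of_week": t.weekday(),  # Monday = 0
        "month": t.month,
        "is_holiday": t.date() in holidays,
        "occupancy_now": occupancy.get((place, hour)),
    }
    # Recent state: net flow summed over the 1, 2 and 3 hours preceding the current hour.
    for n in (1, 2, 3):
        feats[f"net_flow_last_{n}h"] = sum(
            net_flow.get((place, hour - timedelta(hours=i)), 0) for i in range(1, n + 1))
    # Historical: mean occupancy at the same hour and weekday over the last 4 weeks.
    past = [occupancy.get((place, hour - timedelta(weeks=w))) for w in range(1, 5)]
    past = [x for x in past if x is not None]
    feats["hist_mean_occupancy"] = sum(past) / len(past) if past else None
    return feats


net_flow = {(0, datetime(2024, 5, 20, h)): f for h, f in [(5, 2), (6, -1), (7, -3)]}
occupancy = {(0, datetime(2024, 5, 20, 8)): 12,
             (0, datetime(2024, 5, 13, 8)): 10,
             (0, datetime(2024, 5, 6, 8)): 14}
feats = make_features(0, datetime(2024, 5, 20, 8, 30), net_flow, occupancy,
                      holidays={date(2024, 5, 20)})  # hypothetical holiday set
print(feats)
```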
5.2 Model type
Following the Partition-Based Framework for Regression (Mühlbacher & Piringer, 2013) and the Exploratory Model Analysis workflow (Cashman et al., 2019):
- Start with exploratory feature ranking — partition-based visualizations to identify which features best explain occupancy variation.
- Build regression models (e.g., gradient-boosted trees or random forests) that predict departures and arrivals, and hence the expected occupancy, of each place for any given future date/time.
- Validate using residual analysis with partition-based visualizations to detect local biases (e.g., model underestimates demand on rainy days).
5.3 From prediction to prescription (reallocation)
Once you have a model predicting expected demand (departures) and supply (arrivals) for each place and time slot:
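- Compute the projected stock = current stock + predicted arrivals − predicted departures. A place has a deficit if the projected stock falls below a minimum buffer of bikes, and a surplus if it exceeds its capacity minus a minimum buffer of free docks (the overfilling case).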
- Formulate a redistribution plan: for a given date/time, identify places expected to become empty or full and calculate how many bikes should be moved, from where, and to where.
- Optimise redistribution routes considering truck capacity and travel time between places.
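The first two steps can be sketched in Python. Each place's projected stock is compared with a minimum number of bikes and with its capacity minus a minimum number of free docks, and a greedy rule then moves bikes from the largest surpluses to the largest deficits. The bounds and names are illustrative assumptions. The greedy pairing ignores travel distance and truck capacity, which the route optimisation in the last item would handle (typically as a vehicle-routing problem).

```python
def balance(stock, arrivals, departures, capacity, min_bikes=3, min_free_docks=3):
    """Bikes to add (+) or remove (-) so that the projected stock stays within bounds.
    A negative projected stock means predicted demand exceeds the bikes available."""
    projected = stock + arrivals - departures
    if projected < min_bikes:  # risk of running out of bikes
        return min_bikes - projected
    if projected > capacity - min_free_docks:  # risk of overcrowding
        return capacity - min_free_docks - projected
    return 0


def greedy_plan(balances):
    """Match surpluses to deficits, largest first; returns (from, to, n_bikes) moves."""
    needs = sorted([[n, p] for p, n in balances.items() if n > 0], reverse=True)
    spares = sorted([[-n, p] for p, n in balances.items() if n < 0], reverse=True)
    moves, i, j = [], 0, 0
    while i < len(needs) and j < len(spares):
        n = min(needs[i][0], spares[j][0])
        moves.append((spares[j][1], needs[i][1], n))
        needs[i][0] -= n
        spares[j][0] -= n
        if needs[i][0] == 0:
            i += 1
        if spares[j][0] == 0:
            j += 1
    return moves


forecast = {  # place: (stock, predicted arrivals, predicted departures, capacity)
    "centre": (25, 30, 5, 40),
    "suburb_a": (10, 2, 15, 20),
    "suburb_b": (5, 1, 8, 20),
    "park": (8, 4, 4, 20),
}
balances = {p: balance(*f) for p, f in forecast.items()}
print(balances)               # {'centre': -13, 'suburb_a': 6, 'suburb_b': 5, 'park': 0}
print(greedy_plan(balances))  # [('centre', 'suburb_a', 6), ('centre', 'suburb_b', 5)]
```

Here the centre still holds 2 surplus bikes after the plan; in practice these would go to a depot or to places with spare docks, which this greedy sketch does not consider.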
5.4 Interactive model exploration
Use a What-If Tool style interface (Wexler et al., 2019) to allow operators to:
- Select a date/time and see predicted occupancy across all places on a map.
- Adjust parameters (e.g., "what if there is a concert at place X tonight?") and see updated predictions.
- Compare the model's suggestion against current allocation.
6. Summary of the Workflow
Stations → [Spatial Clustering] → Places
Trips → [Temporal Aggregation] → Time Series per Place
Time Series → [Calendar Clustering + Topic Modelling] → Spatio-Temporal Patterns
Patterns → [Feature Engineering] → Predictive Model
Model + Current State → [Optimization] → Reallocation Plan
This pipeline leverages visual analytics at every step: maps for spatial validation, calendar views for temporal patterns, flow maps for mobility structure, and partition-based displays for model building and validation. The human remains in the loop for interpreting patterns, validating model decisions, and incorporating domain knowledge (e.g., planned events, infrastructure changes).