ATWL Session — Bikes Recommendation from ATWL

User #1

The attached document defines a language called ATWL, designed for representing visual analytics workflows. You need to understand the concepts and semantics the language is meant to express, as well as the syntax used to express them. In the next step, you will be given a "library" of VA workflows extracted from published papers. Your task will be to help a user design a workflow that achieves the user's goals by finding relevant examples in the "library" and suggesting how they can be adapted to the user's case.

Assistant #1

Reasoning

The user has provided me with a detailed specification document for ATWL (Artifact–Transform Workflow Language), which is a declarative language for representing visual analytics workflows. Let me summarize my understanding of the key concepts:

Core Building Blocks:
1. Artifacts - identifiable analytical objects (8 types)
2. Transforms - how artifacts are derived/modified
3. Control structures - sequencing, branching, iteration

Artifact Types:
1. Entities - collections of identifiable analytical objects (with internal structure, embedment, and features)
2. Feature - explicit descriptors of properties of entities
3. Arrangement - how entities are organized within a reference structure
4. Visualisation - external visual representation for human perception
5. Pattern - abstracted regularities or structures
6. Model - formal/computational representation for prediction/simulation/explanation
7. Knowledge - explicitly formulated knowledge (derived or injected)
8. Specification - parameters, settings, constraints, or method choices

Now I'm looking at the transform intents, which define the operations available in the language. So far I can see define-unit for creating entities, characterise for computing features, contextualise for placing entities into arrangements, and it looks like the list continues with visual operations. The workflow structure uses a header with optional templates that summarize analytic stages using this intent vocabulary, and can include loop notation for repetitive processes. I'm ready to work with the user's library of VA workflows once they provide it.

I've thoroughly reviewed the ATWL (Artifact–Transform Workflow Language) specification. Here's a summary of my understanding:

Key Concepts

8 Artifact Types:

Type	Role
entities	Collections of analytical objects (with internal structure, embedment, features)
feature	Descriptors of entity properties or relationships
arrangement	Positioning of entities within a reference context
visualisation	Visual representations for human perception
pattern	Abstracted regularities/structures
model	Formal/computational representations (prediction, simulation, explanation)
knowledge	Explicit statements, rules, insights (derived or injected)
specification	Parameters, settings, constraints controlling transforms

8 Transform Intents:
define-unit → characterise → contextualise → visualise → abstract → build-model → generate-knowledge → assess

Control Structures: loops (with qualitative stopping conditions), conditionals, assignments

Workflow Structure: header with optional template summarizing analytic stages using intent vocabulary

I'm ready for the next step — please provide the library of VA workflows and/or the user's goals, and I'll help find relevant examples and suggest adaptations.

User #2

I have attached an additional file with the workflow library. You can perform an initial analysis to reveal reusable and adaptable parts.

Assistant #2

Reasoning

The user has provided a comprehensive library of 17 ATWL-represented visual analytics workflows. Let me analyze this library to identify reusable and adaptable parts, common patterns, and structural similarities that would help me assist users in designing new workflows. Let me organize my analysis:

1. Workflow Categories by Domain

Temporal/Time Series Analysis:
- 1.1 Cluster-Calendar (time series clustering + calendar visualization)
- 1.4 EventFlow (temporal event sequence simplification)
- 1.5 EventAction (temporal event sequence recommendation)
- 1.9 Progressive Abstraction of Multivariate Temporal Data

Movement/Spatial Analysis:
- 1.3 MobilityGraphs (mass mobility flow analysis)
- 1.6 Extracting Significant Places from Trajectories
- 1.7 Progressive Clustering of Trajectories
- 1.12 Data Engineering for Movement Pattern Classification

Network Analysis:
- 1.2 Dynamic Network Analysis
- 1.16 Exploring Deep Learning Models in TensorFlow (dataflow graph)

Topic/Text Analysis:
- 1.8 Human-Steered Topic Modelling

Model Building & Evaluation:
- 1.10 Partition-based Regression Modelling
- 1.11 Spatio-temporal Analysis and Modelling
- 1.13 Exploratory Model Analysis
- 1.14 Binary Classifier Diagnostics
- 1.15 Interactive Exploration of Random Forest Classifier
- 1.17 What-If Probing of ML Models

I'm noticing several recurring patterns across these analyses. There's an iterative clustering approach that shows up in multiple projects where I define clusters, visualize them, assess the results, then either adjust parameters or move forward. I'm also seeing a feature engineering loop that involves computing features, visualizing, assessing, and refining as needed.

Initial Analysis of the Workflow Library

I've analyzed all 17 workflows and identified reusable patterns, common modules, and structural building blocks that can be adapted to new tasks.

Workflow Categories by Domain

Category	Workflows	Key Characteristics
Time Series / Temporal	1.1, 1.4, 1.5, 1.9	Episode partitioning, temporal patterns, calendar views
Movement / Spatial	1.3, 1.6, 1.7, 1.12	Trajectory analysis, spatial clustering, place extraction
Network / Graph	1.2, 1.16	Snapshot projection, hierarchical simplification
Topic / Text	1.8	Matrix decomposition, interactive refinement
Model Building	1.10, 1.11	Iterative feature selection, residual analysis
Model Evaluation / Diagnostics	1.13, 1.14, 1.15, 1.17	Black-box probing, explanation, fairness

Reusable Modular Patterns

I identified 9 recurring structural modules that appear across multiple workflows:

Module A: Iterative Clustering with Visual Assessment

Found in: 1.1, 1.3, 1.6, 1.7, 1.11, 1.12

loop:
  define-unit (cluster by similarity) →
  visualise (clusters on map/graph/projection) →
  assess (quality & interpretability) →
  [adjust parameters | exit]

Module B: Projection-based Exploration

Found in: 1.2, 1.8, 1.12

characterise (vectorize) →
contextualise (dimensionality reduction to 2D) →
visualise (scatterplot with encoding) →
abstract (identify groupings/patterns)

Module C: Model Building with Residual Feedback

Found in: 1.10, 1.11

loop:
  generate-knowledge (specify model config) →
  build-model (fit) →
  characterise (compute residuals) →
  visualise (residual distributions) →
  assess (random vs. systematic) →
  [refine model | exit]

Module D: Progressive Simplification

Found in: 1.4, 1.7, 1.16

loop:
  define-unit (simplify/filter/aggregate) →
  visualise (simplified view) →
  assess (interpretability vs. information loss) →
  [simplify further | exit]

Module E: Feature Engineering Loop

Found in: 1.9, 1.12

loop:
  characterise (compute/refine features) →
  visualise (feature-based views) →
  assess (discriminative power & coherence) →
  [engineer new features | exit]

Module F: Similarity-based Cohort Selection

Found in: 1.5, 1.13

characterise (compute similarity/relevance) →
visualise (distribution with controls) →
define-unit (select subset by threshold) →
characterise (analyze subset properties)

Module G: Temporal Partitioning → Profiling → Grouping

Found in: 1.1, 1.2, 1.3, 1.9

define-unit (partition into temporal units) →
characterise (extract profiles/vectors) →
define-unit (cluster by profile similarity) →
visualise (calendar/timeline + representatives)

Module H: Model Diagnostics Loop

Found in: 1.13, 1.14, 1.15, 1.17

loop:
  characterise (compute predictions/explanations) →
  visualise (multi-level diagnostic views) →
  abstract (identify decision patterns/errors) →
  assess (sufficiency of understanding) →
  [refine investigation focus | exit]

Module I: Plan/Specification Tuning with Feedback

Found in: 1.5, 1.10, 1.11

loop:
  generate-knowledge (specify plan/config) →
  characterise (estimate outcome) →
  visualise (estimated vs. actual) →
  assess (satisfactory?) →
  [refine specification | exit]

Common Artifact Templates

Artifact Role	Typical Structure	Example Workflows
Input entities with temporal embedment	`entities, internal structure: sequence/episode, embedment: {set, time}`	1.1, 1.4, 1.5, 1.6, 1.7, 1.12
Feature vectors for clustering/projection	`feature(entities), value structure: vector, value type: numeric`	All
Specification controlling a loop	`specification, representation form: "parameter settings"`	1.1, 1.3, 1.6, 1.9, 1.10, 1.11
Arrangement via projection	`arrangement(entities), context: projection_space, principle: "DR"`	1.2, 1.8, 1.12
Cluster entities	`entities, internal structure: group/cluster, embedment: set`	1.1, 1.3, 1.6, 1.7, 1.11, 1.12
Pattern (human interpretation)	`pattern(entities), representation form: "labeled descriptions"`	All
Knowledge (final output)	`knowledge(patterns), representation form: "statements and explanations"`	All

Cross-Workflow Adaptability Map

If the user wants to…	Primary template	Supporting modules from
Find patterns in time series	1.1, 1.9	Modules G + A
Analyze movement/trajectories	1.6, 1.7, 1.12	Modules A + B + E
Build a predictive model iteratively	1.10, 1.13	Modules C + F
Diagnose/explain an existing model	1.14, 1.15, 1.17	Module H
Analyze network evolution	1.2, 1.3	Modules B + G
Discover topics/themes in data	1.8, 1.9	Modules B + I
Recommend actions from historical data	1.5	Modules F + I
Classify behavior patterns	1.12	Modules E + A + B
Simplify complex event data	1.4	Module D
Build spatio-temporal models	1.11	Modules A + C

I'm ready to help you design a workflow. Please describe your analytical goals, data characteristics, and desired outcomes, and I'll find the most relevant examples and suggest adaptations.

User #3

I have data about the use of public bikes in a city. The records represent trips: bike ID, origin place and time, destination place and time. The origin and destination places are docking stations distributed over the city. The coordinates and capacities of the docking stations are given. In many popular places, there are multiple docking stations clustered close to each other. I need to analyse the spatio-temporal patterns of taking and returning bikes. I am especially interested in revealing patterns of stations running out of available bikes and stations overcrowded with returned bikes. The patterns are expected to be related to the diurnal and weekly cycles as well as to seasons and holidays. I think the analysis should be done not at the level of individual stations but at the level of "places" containing groups of spatially close stations. Based on my analysis, I want to develop a model that will tell me, for any given date and time of day, how bikes should be allocated/relocated in order to satisfy the demand and avoid overfilling stations.

Assistant #3

Reasoning

The user wants to analyze public bike-sharing data with these characteristics:

Data:
- Trip records: bike ID, origin (station + time), destination (station + time)
- Docking stations: coordinates, capacities
- Stations are spatially clustered in popular places

Goals:
1. Analyze spatio-temporal patterns of bike taking/returning
2. Reveal patterns of stations running out of bikes (empty) and stations overcrowded (full)
3. Patterns related to diurnal, weekly, seasonal, and holiday cycles
4. Analysis at "place" level (groups of spatially close stations)
5. Develop a model for bike allocation/relocation given any date/time

Now I'm identifying which analytical approaches would work best here. Time series clustering methods would help uncover the diurnal and weekly rhythms in bike movement, while calendar-based pattern analysis is perfect for capturing seasonal and holiday effects. For the spatial side, I need clustering techniques to group nearby stations into places, and movement classification to understand the behavior patterns. Spatio-temporal modeling will tie everything together for the predictive allocation model.

Now I'm mapping out the full workflow structure: first I'll group stations into places using spatial clustering, then compute their aggregated characteristics like capacity and usage trends over time. Next comes temporal pattern analysis at the place level to identify how bike availability fluctuates, followed by identifying critical events like empty and full states. I'll discover recurring patterns tied to temporal cycles, build a predictive model with feedback on residuals, and finally optimize the allocation recommendations.

Workflow Design: Bike-Sharing Spatio-Temporal Pattern Analysis & Allocation Optimization

Strategy Overview

Your task naturally decomposes into 5 analytical phases, each mappable to library patterns:

Phase	Goal	Primary Library Source	Module
1. Define Places	Cluster stations spatially	1.6, 1.7	A (Iterative Clustering)
2. Temporal Profiling	Characterize usage patterns per place over time	1.1, 1.9	G (Temporal Partition→Profile→Group)
3. Critical State Detection	Find empty/full events & their patterns	1.9, 1.12	E (Feature Engineering Loop)
4. Pattern Discovery	Reveal diurnal/weekly/seasonal structures	1.1, 1.9, 1.3	G (Temporal Partition→Profile→Group) + A (Iterative Clustering)
5. Allocation Model	Predictive + prescriptive model	1.11, 1.5	C (Model+Residual) + I (Plan Tuning)

Proposed ATWL Workflow


workflow:
  name: "Spatio-temporal analysis of bike-sharing usage patterns and allocation optimization"
  description: "Analysing trip data and station capacities to reveal patterns of bike
    shortages and overcrowding related to temporal cycles, and building an allocation
    model for proactive bike redistribution."

  template: "define-unit (group stations into places) → characterise (compute temporal
    usage profiles and occupancy states per place) → contextualise (relate to diurnal,
    weekly, seasonal, and holiday cycles) → visualise (map + calendar + time-series views)
    → abstract (identify critical shortage/overcrowding patterns) → build-model (predict
    demand and critical states) → generate-knowledge (derive allocation/relocation
    recommendations) → assess (validate model against observed critical events)"

  body:

    # ════════════════════════════════════════════════════════════════
    # PHASE 1: DEFINE PLACES (adapted from Workflow 1.6 / Module A)
    # ════════════════════════════════════════════════════════════════

    artifact:
      id: stations
      type: entities
      description: "Docking stations distributed over the city"
      internal structure: unstructured set
      embedment:
        space: "geographic coordinates (lat, lon)"
      features:
        - "capacity (number of docks)"
        - "station ID"

    artifact:
      id: trips
      type: entities
      description: "Individual bike trip records"
      internal structure: "origin-destination pair"
      embedment:
        space: "origin station, destination station"
        time: "start timestamp, end timestamp"
      features:
        - "bike ID"
        - "origin station"
        - "destination station"
        - "start time"
        - "end time"

    artifact:
      id: clustering_spec
      type: specification
      description: "Parameters for spatial clustering of stations"
      representation form: "distance threshold, minimum stations per place"

    loop:
      description: "Iteratively define places by adjusting spatial clustering parameters"
      condition: "until places are geographically coherent and meaningful"

      transform:
        intent: define-unit
        operation: "Spatial clustering of stations based on proximity (e.g., DBSCAN
          or hierarchical clustering with geographic distance)"
        input: [stations, clustering_spec]
        output: places

      artifact:
        id: places
        type: entities
        description: "Groups of spatially close stations representing a 'place'"
        internal structure: "group/cluster of stations"
        embedment:
          space: "geographic (centroid + extent)"
        features:
          - "aggregate capacity (sum of dock capacities)"
          - "number of stations"
          - "spatial extent"

      transform:
        intent: characterise
        operation: "Compute aggregate capacity and geographic centroid for each place"
        input: places
        output: place_features

      transform:
        intent: visualise
        operation: "Display places on a city map, sized by capacity, with station
          membership shown"
        input: [places, place_features]
        output: place_map_vis

      transform:
        intent: assess
        operation: "Judge whether places correspond to meaningful city areas; check
          no isolated high-capacity stations are merged incorrectly"
        input: [place_map_vis, places]
        output: clustering_spec

    # end loop Phase 1

    # ════════════════════════════════════════════════════════════════
    # PHASE 2: TEMPORAL PROFILING (adapted from Workflows 1.1, 1.9 / Module G)
    # ════════════════════════════════════════════════════════════════

    transform:
      intent: define-unit
      operation: "Aggregate trip origins and destinations from station level to place
        level; partition time into uniform intervals (e.g., 1-hour slots)"
      input: [trips, places]
      output: place_time_series

    transform:
      intent: characterise
      operation: "Estimate occupancy level at each place and time slot by accumulating
        net flows from a known or estimated initial state"
      input: [place_time_series, place_features]
      output: occupancy_series

    transform:
      intent: contextualise
      operation: "Annotate each time slot with temporal context: hour of day, day of
        week, month, season, holiday/non-holiday, special event flag"
      input: [place_time_series, occupancy_series]
      output: annotated_time_series

    # ════════════════════════════════════════════════════════════════
    # PHASE 3: CRITICAL STATE DETECTION (adapted from 1.12 / Module E)
    # ════════════════════════════════════════════════════════════════

    artifact:
      id: threshold_spec
      type: specification
      description: "Thresholds for critical states"
      representation form: "'empty' = fill ratio < 0.1; 'full' = fill ratio > 0.9"

    loop:
      description: "Iteratively refine thresholds for critical states"
      condition: "until critical events correspond to real service disruptions"

      transform:
        intent: characterise
        operation: "Identify critical events: time slots where a place is near-empty
          or near-full. Compute duration and severity."
        input: [annotated_time_series, threshold_spec]
        output: critical_events

      transform:
        intent: characterise
        operation: "Compute frequency and regularity features of critical events per
          place: how often, at which times, on which days, in which seasons"
        input: [critical_events, annotated_time_series]
        output: critical_profiles

      transform:
        intent: visualise
        operation: "Calendar heatmap of critical events; map colored by frequency;
          timeline of events for selected places"
        input: [critical_events, critical_profiles, places]
        output: critical_vis

      transform:
        intent: assess
        operation: "Evaluate whether thresholds capture meaningful disruptions;
          adjust thresholds"
        input: [critical_vis, critical_events]
        output: threshold_spec

    # end loop Phase 3

    # ════════════════════════════════════════════════════════════════
    # PHASE 4: PATTERN DISCOVERY (adapted from 1.1, 1.9, 1.3 / Modules G, A)
    # ════════════════════════════════════════════════════════════════

    transform:
      intent: define-unit
      operation: "Partition each place's time series into daily episodes (24h profiles)"
      input: annotated_time_series
      output: daily_profiles

    transform:
      intent: characterise
      operation: "Represent each daily profile as a feature vector"
      input: daily_profiles
      output: profile_vectors

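    artifact:
      id: pattern_clustering_spec
      type: specification
      description: "Parameters for clustering daily profiles"
      representation form: "number of clusters, similarity metric"

    loop: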
      description: "Cluster daily profiles to reveal typical usage patterns"
      condition: "until clusters are distinct and interpretable"

      transform:
        intent: define-unit
        operation: "Cluster daily profile vectors to find typical day-types"
        input: [profile_vectors, pattern_clustering_spec]
        output: profile_clusters

      transform:
        intent: visualise
        operation: "Cluster centroids as line charts; calendar coloring each day
          by cluster; map showing which places exhibit which patterns"
        input: [profile_clusters, daily_profiles, places]
        output: pattern_vis

      transform:
        intent: abstract
        operation: "Interpret clusters: label as 'workday morning-sink/evening-source',
          'weekend leisure', 'holiday pattern', etc."
        input: [pattern_vis, profile_clusters]
        output: usage_patterns

      transform:
        intent: assess
        operation: "Check whether patterns align with known temporal cycles; verify
          critical events concentrate in specific patterns; refine clustering"
        input: [usage_patterns, critical_profiles, pattern_vis]
        output: pattern_clustering_spec

    # end loop Phase 4

    # Cross-place flow analysis
    transform:
      intent: characterise
      operation: "Compute OD flow matrices between places for each typical time period"
      input: [trips, places, usage_patterns]
      output: flow_matrices

    transform:
      intent: visualise
      operation: "Flow maps showing dominant bike movements; highlight imbalance flows"
      input: [flow_matrices, places, usage_patterns]
      output: flow_vis

    transform:
      intent: generate-knowledge
      operation: "Synthesize: which places are sources vs. sinks at which times;
        which temporal cycles drive imbalances"
      input: [usage_patterns, critical_profiles, flow_matrices, flow_vis]
      output: pattern_knowledge

    # ════════════════════════════════════════════════════════════════
    # PHASE 5: ALLOCATION MODEL (adapted from 1.11, 1.5 / Modules C, I)
    # ════════════════════════════════════════════════════════════════

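    # Phase 5a: demand prediction
    artifact:
      id: model_spec
      type: specification
      description: "Configuration of the predictive model"
      representation form: "feature set, model type"

    loop: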
      description: "Iteratively build and refine a predictive model"
      condition: "until residuals show no systematic patterns"

      transform:
        intent: generate-knowledge
        operation: "Specify model configuration"
        input: [pattern_knowledge, model_spec]
        output: model_spec

      transform:
        intent: build-model
        operation: "Train predictive model on historical data. Target: predicted
          demand/fill-level per place per time slot."
        input: [annotated_time_series, occupancy_series, critical_events, model_spec]
        output: demand_model

      transform:
        intent: characterise
        operation: "Compute predictions on validation set; calculate residuals"
        input: [demand_model, annotated_time_series]
        output: model_residuals

      transform:
        intent: visualise
        operation: "Residual maps and calendars: show where/when model errs"
        input: [model_residuals, places]
        output: residual_vis

      transform:
        intent: assess
        operation: "Evaluate: Are residuals random or systematic? → refine features
          or model type"
        input: [residual_vis, model_residuals, pattern_knowledge]
        output: model_spec

    # end loop Phase 5a

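    # Phase 5b: allocation optimization
    artifact:
      id: allocation_spec
      type: specification
      description: "Settings controlling the allocation plan"
      representation form: "target fill ratios, relocation cost limits"

    loop: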
      description: "Develop and refine allocation/relocation recommendations"
      condition: "until allocation plan satisfactorily reduces predicted critical events"

      transform:
        intent: generate-knowledge
        operation: "Predict demand at all places; formulate allocation plan"
        input: [demand_model, places, place_features, allocation_spec]
        output: allocation_plan

      transform:
        intent: characterise
        operation: "Simulate the effect of the allocation plan"
        input: [allocation_plan, demand_model, occupancy_series]
        output: simulated_outcome

      transform:
        intent: visualise
        operation: "Before/after comparison maps; timeline of planned actions;
          cost-benefit summary"
        input: [simulated_outcome, allocation_plan, places]
        output: allocation_vis

      transform:
        intent: assess
        operation: "Evaluate: Does the plan reduce critical events at feasible cost?"
        input: [allocation_vis, simulated_outcome]
        output: allocation_spec

    # end loop Phase 5b

    # Final output
    transform:
      intent: generate-knowledge
      operation: "Consolidate: document patterns, model, and operational guidelines"
      input: [pattern_knowledge, demand_model, allocation_plan, usage_patterns]
      output: final_knowledge

How Library Workflows Were Adapted

Phase	Source	What was reused	Key adaptation
1	1.6 (place extraction)	Spatial clustering loop with visual assessment	Changed from trajectory stops to station locations; added capacity aggregation
2	1.1 (temporal patterns)	Time partitioning → profile characterization	Changed from multivariate sensor data to departure/arrival/occupancy counts
3	1.12 (behavior classification)	Feature engineering loop with threshold tuning	Reframed for critical state detection (empty/full) with severity metrics
4	1.1 (calendar patterns) + 1.9	Daily profile clustering → calendar visualization → pattern labeling	Combined with OD flow analysis (from 1.3) for inter-place imbalance
5a	1.11 (spatio-temporal model)	Iterative model building with residual diagnostics	Target changed to fill-level prediction + critical state probability
5b	1.5 (recommendation)	Plan specification → simulate outcome → assess → refine	Adapted to vehicle relocation optimization with cost-benefit simulation

Key Design Decisions & Recommendations

Place granularity: Start with a DBSCAN distance threshold of roughly 200–300 m; verify that major transit hubs, business districts, and residential areas form coherent places.
Occupancy estimation: Since you don't have real-time occupancy snapshots, you'll need to reconstruct occupancy from cumulative net flows. Consider periodic "resets" (e.g., early morning counts) if available.
Pattern hierarchy: Expect to find 4–7 typical daily profiles (e.g., workday-residential-source, workday-CBD-sink, weekend-park, holiday-flat, event-spike).
Model features: Beyond time-of-day/week/season, consider adding weather data and local event calendars; these factors often explain the residuals that purely temporal models leave behind.
Allocation model: The relocation optimization can range from simple rule-based (rebalance to target fill-ratio before morning rush) to full optimization (vehicle routing with capacity constraints). Start simple, then refine based on Phase 5b assessment.
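As a rough illustration of the simple rule-based end of that range, the sketch below moves every place toward a target fill ratio by greedily pairing places that have a surplus with places that have a deficit. The function name `rebalance_plan`, the 0.5 target, and the example numbers are assumptions made for illustration only; travel distance, vehicle capacity, and timing are deliberately ignored.

```python
def rebalance_plan(places, target_ratio=0.5):
    """Return a list of (from_place, to_place, n_bikes) moves.

    places: dict name -> (bikes_present, capacity). Hypothetical input format.
    """
    surplus, deficit = [], []
    for name, (bikes, capacity) in sorted(places.items()):
        delta = bikes - round(target_ratio * capacity)
        if delta > 0:
            surplus.append([name, delta])
        elif delta < 0:
            deficit.append([name, -delta])

    moves, i, j = [], 0, 0
    while i < len(surplus) and j < len(deficit):
        n = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], n))
        surplus[i][1] -= n
        deficit[j][1] -= n
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return moves

# Hypothetical state before the morning rush.
state = {"A": (28, 30), "B": (2, 20), "C": (5, 20)}
plan = rebalance_plan(state)
```

A full optimization would replace the greedy pairing with a routing formulation; the sketch only shows the kind of output Phase 5b would assess.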

Would you like me to elaborate on any specific phase, suggest concrete visualization designs, or discuss particular algorithmic choices for any step?

User #4

Describe this recommended workflow in natural language.

Assistant #4

Reasoning

The user wants me to describe the workflow I designed in natural language, without the ATWL code. I should explain the analytical process in a clear, readable narrative form that describes what happens at each phase, what decisions are made, and how the phases connect.

Bike-Sharing Spatio-Temporal Analysis: Workflow Description

Overall Goal

The workflow aims to understand when and where bike shortages and overcrowding occur in a city's bike-sharing system, discover the temporal regularities behind these problems, and ultimately build a decision-support tool that recommends how to redistribute bikes proactively to prevent service disruptions.

The analysis proceeds through five phases, each building on the results of the previous one.

Phase 1: Defining Places from Individual Stations

The analysis begins by grouping individual docking stations into meaningful "places." Since many popular locations have multiple stations clustered nearby, it makes sense to treat them as a single analytical unit with combined capacity.

The analyst uses a spatial clustering algorithm (such as DBSCAN with a geographic distance threshold) to group stations that are close to each other. The result is displayed on a city map, where each place appears as a cluster with its total docking capacity indicated by size. The analyst inspects this map and judges whether the resulting places correspond to recognizable city areas — transit hubs, business districts, university campuses, parks, residential neighborhoods. If some clusters seem too large (merging unrelated areas) or too fragmented (splitting a coherent location), the analyst adjusts the distance threshold or manually corrects the grouping, and the clustering is repeated. This continues until the set of places is geographically coherent and operationally meaningful.
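The grouping step can be sketched with the standard library alone. The code below is not DBSCAN itself: it links any two stations closer than a distance threshold and takes the connected groups as places, which is a simpler stand-in for illustration. The station IDs, coordinates, and the 250 m threshold are hypothetical.

```python
import math
from collections import defaultdict

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def group_stations(stations, threshold_m):
    """Group stations linked by chains of pairwise distances <= threshold_m.

    stations: dict station_id -> (lat, lon, capacity)
    Returns a list of places with member IDs and summed capacity.
    """
    ids = sorted(stations)
    parent = {s: s for s in ids}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if haversine_m(stations[a][:2], stations[b][:2]) <= threshold_m:
                parent[find(a)] = find(b)

    members = defaultdict(list)
    for s in ids:
        members[find(s)].append(s)
    return [
        {"stations": m, "capacity": sum(stations[s][2] for s in m)}
        for m in members.values()
    ]

# Hypothetical stations: S1 and S2 are about 110 m apart, S3 is about 5 km away.
demo = {
    "S1": (51.5000, -0.1000, 20),
    "S2": (51.5010, -0.1000, 15),
    "S3": (51.5450, -0.1000, 30),
}
places = group_stations(demo, threshold_m=250)
```

Changing `threshold_m` and re-running corresponds to one pass of the Phase 1 loop; the visual check on the map remains the analyst's job.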

Phase 2: Building Temporal Usage Profiles

Once places are defined, all trip records are aggregated from the station level to the place level. Time is divided into regular slots (for example, one-hour intervals), and for each place and each time slot, the system counts how many bikes were taken (departures) and how many were returned (arrivals). The net flow, arrivals minus departures, tells whether a place is gaining or losing bikes during that interval.

From these flows, the system estimates the occupancy level at each place over time, that is, how many bikes are present relative to the total capacity, by accumulating net flows from a known or estimated initial state. This produces a fill-ratio time series for every place across the entire observation period.
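A minimal sketch of this aggregation, assuming trips arrive as tuples and that a station-to-place mapping already exists from Phase 1. The function names, the tuple layout, the initial bike count, and the clamping of occupancy to the range from zero to capacity are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime

def hourly_net_flow(trips, station_to_place):
    """Return {place: {hour_slot: arrivals - departures}}.

    trips: iterable of (origin_station, start_time, dest_station, end_time)
    """
    net = defaultdict(lambda: defaultdict(int))
    for origin, start, dest, end in trips:
        start_slot = start.replace(minute=0, second=0, microsecond=0)
        end_slot = end.replace(minute=0, second=0, microsecond=0)
        net[station_to_place[origin]][start_slot] -= 1  # bike taken
        net[station_to_place[dest]][end_slot] += 1      # bike returned
    return net

def occupancy_series(net_for_place, slots, initial_bikes, capacity):
    """Accumulate net flow into (slot, fill_ratio) pairs, clamped to [0, capacity]."""
    bikes, series = initial_bikes, []
    for slot in slots:
        bikes = min(capacity, max(0, bikes + net_for_place.get(slot, 0)))
        series.append((slot, bikes / capacity))
    return series

# Hypothetical trips on one morning.
trips = [
    ("S1", datetime(2024, 5, 6, 8, 5), "S3", datetime(2024, 5, 6, 8, 20)),
    ("S2", datetime(2024, 5, 6, 8, 40), "S3", datetime(2024, 5, 6, 9, 10)),
]
station_to_place = {"S1": "P1", "S2": "P1", "S3": "P2"}
slots = [datetime(2024, 5, 6, 8), datetime(2024, 5, 6, 9)]

net = hourly_net_flow(trips, station_to_place)
p2 = occupancy_series(net["P2"], slots, initial_bikes=29, capacity=30)
```

Because the initial state is only estimated, errors accumulate over time, which is why periodic resets from known counts are worth using when they exist.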

Each time slot is then annotated with its temporal context: the hour of day, day of week, whether it is a workday or weekend, the month and season, and whether it falls on a public holiday. This contextual annotation is essential for later discovering which temporal cycles drive the usage patterns.
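The annotation can be sketched as a small lookup per time slot. The season mapping below assumes northern-hemisphere meteorological seasons, and the holiday set is a hypothetical placeholder for a real calendar of public holidays.

```python
from datetime import date, datetime

# Meteorological seasons for the northern hemisphere (an assumption).
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def temporal_context(slot, holidays):
    """Describe a time slot by the cycles it belongs to."""
    is_holiday = slot.date() in holidays
    return {
        "hour": slot.hour,
        "weekday": slot.weekday(),  # Monday = 0
        "is_workday": slot.weekday() < 5 and not is_holiday,
        "month": slot.month,
        "season": SEASONS[slot.month],
        "is_holiday": is_holiday,
    }

holidays = {date(2024, 12, 25)}  # hypothetical holiday calendar
ctx = temporal_context(datetime(2024, 12, 25, 8), holidays)
```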

Phase 3: Detecting Critical States

With occupancy time series in hand, the analyst defines what constitutes a "critical state." A shortage occurs when a place's fill ratio drops so low that arriving users are unlikely to find an available bike. Overcrowding occurs when the fill ratio is so high that returning users cannot find an empty dock.

The system scans the occupancy series and flags all episodes where a place enters a critical state, recording the type (shortage or overcrowding), duration, and severity. It then computes a profile for each place summarizing how frequently critical events occur at different hours, on different days of the week, and in different seasons.

These findings are visualized in multiple coordinated views: a calendar heatmap showing which days had critical events, a city map colored by critical event frequency, and timelines showing the progression of individual episodes. The analyst examines these displays to assess whether the chosen thresholds capture genuine service disruptions. If the thresholds are too sensitive (flagging trivially brief or mild events) or too lenient (missing real problems), they are adjusted and the detection is rerun. This iterates until the critical events reliably represent meaningful service failures.
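One way to sketch the detection step is a single scan over a place's fill-ratio series that merges consecutive critical slots into episodes. The 0.1 and 0.9 defaults follow the thresholds in the ATWL specification; the function name and the definition of severity as the largest excursion beyond the threshold are assumptions.

```python
def critical_episodes(fill_ratios, low=0.1, high=0.9):
    """Return runs of consecutive slots spent in a critical state.

    fill_ratios: fill ratios for one place, one per time slot, in time order.
    Each episode records kind, start index, duration in slots, and severity.
    """
    episodes, current = [], None
    for i, r in enumerate(fill_ratios):
        kind = "shortage" if r < low else "overcrowding" if r > high else None
        excess = (low - r) if kind == "shortage" else (r - high) if kind else 0.0
        if current and current["kind"] == kind:
            # Same critical state continues: extend the running episode.
            current["duration"] += 1
            current["severity"] = max(current["severity"], excess)
            continue
        if current:
            episodes.append(current)
        current = ({"kind": kind, "start": i, "duration": 1, "severity": excess}
                   if kind else None)
    if current:
        episodes.append(current)
    return episodes

# Hypothetical hourly fill ratios for one place.
series = [0.5, 0.05, 0.02, 0.4, 0.95, 0.97, 0.99, 0.5]
found = critical_episodes(series)
```

Re-running with different `low` and `high` values corresponds to one iteration of the threshold refinement loop.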

Phase 4: Discovering Spatio-Temporal Patterns

To reveal the underlying structure of usage behavior, the system extracts daily profiles for each place — the characteristic shape of how departures, arrivals, and fill level evolve from midnight to midnight on a given day. Each daily profile is represented as a numerical vector (for instance, 24 values of hourly net flow).
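A minimal sketch of the profile extraction, assuming the hourly values for one place are held in a dictionary keyed by the start of each hour. Filling missing hours with zero is an assumption; a real analysis might interpolate or discard incomplete days instead.

```python
from collections import defaultdict
from datetime import date, datetime

def daily_profiles(hourly_values):
    """Turn {hour_slot: value} for one place into {date: [24 hourly values]}."""
    profiles = defaultdict(lambda: [0.0] * 24)
    for slot, value in hourly_values.items():
        profiles[slot.date()][slot.hour] = value
    return dict(profiles)

# Hypothetical hourly net flows for one place.
hourly = {
    datetime(2024, 5, 6, 8): -3.0,
    datetime(2024, 5, 6, 18): 4.0,
    datetime(2024, 5, 7, 8): -2.0,
}
profiles = daily_profiles(hourly)
```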

These vectors are then clustered to find groups of similar daily profiles, where each profile belongs to one place on one day. The expectation is that a small number of typical patterns will emerge: perhaps a "workday residential" pattern where bikes leave in the morning and return in the evening, a "workday business district" pattern with the opposite shape, a "weekend leisure" pattern with midday peaks, a "holiday" pattern that is flat or has different timing, and so on.
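
The workflow does not name a clustering algorithm. Purely as an illustration, the sketch below uses k-means with squared Euclidean distance and a deterministic farthest-point initialisation; the choice of algorithm, metric, and all names are assumptions.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: List[Vector], k: int,
           iterations: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Cluster daily profile vectors; returns (labels, centroids)."""
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Deterministic start: first profile, then repeatedly the profile
    # farthest from all centroids chosen so far.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles,
                       key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [-1] * len(profiles)
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Rerunning with a different `k` or a different distance function corresponds to the adjustment loop described for this phase.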

The results are visualized as representative daily curves for each cluster, a calendar where each day is colored by its cluster membership (revealing weekly and seasonal regularities), and a map showing which places exhibit which patterns. The analyst interprets and labels the clusters, checks whether they align with known temporal cycles, and verifies that critical events concentrate within specific patterns. If the clustering is too coarse or too fine, the number of clusters or the similarity metric is adjusted and the process repeats.

Additionally, the system computes origin-destination flow matrices between places for each typical time period. These are displayed as flow maps showing the dominant movements of bikes — for example, a strong morning flow from residential periphery toward the city center on workdays. The analyst identifies which directional flows cause the imbalances that lead to critical states.
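
A rough sketch of the flow aggregation follows, assuming each trip record already carries a label for its typical time period (for example, the cluster label from the previous step). The record layout and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Trip = Tuple[str, str, str]  # (origin place, destination place, period label)


def od_matrices(trips: Iterable[Trip]) -> Dict[str, Counter]:
    """Count trips per (origin, destination) pair, separately per period."""
    flows: Dict[str, Counter] = {}
    for origin, dest, period in trips:
        flows.setdefault(period, Counter())[(origin, dest)] += 1
    return flows


def net_balance(flow_counts: Counter) -> Dict[str, int]:
    """Net bike gain per place: negative means a source, positive a sink."""
    balance: Counter = Counter()
    for (origin, dest), n in flow_counts.items():
        balance[origin] -= n
        balance[dest] += n
    return dict(balance)
```

Places with a strongly negative balance in a period lose bikes during it, and places with a strongly positive balance accumulate them, which is the source/sink distinction used in the synthesis.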

The phase concludes with a synthesis: a consolidated description of which places are sources versus sinks at which times, which temporal cycles drive the imbalances, and which places are chronically problematic.

Phase 5: Building the Allocation Model

Part A: Demand Prediction

Armed with the pattern knowledge, the analyst builds a predictive model. Given a place, a date, and a time of day, the model should predict the expected demand (departures and arrivals), the resulting fill level, and the probability of entering a critical state.

The model is trained on historical data using temporal context features (hour, day of week, season, holiday flag) and place characteristics (capacity, typical pattern type) as inputs. After training, the model's predictions are compared against actual observations. Residuals — the differences between predicted and actual values — are examined through diagnostic visualizations: heatmaps showing where and when the model errs, distributions of error magnitudes, and maps highlighting places with systematic mispredictions.
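
The residual diagnostics can be sketched as a grouping of errors by place and hour, which is the kind of table a "where and when the model errs" heatmap would draw from. The record layout is hypothetical, and the residual is taken as actual minus predicted, so a positive mean indicates underprediction.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int, float, float]  # (place, hour, actual, predicted)


def residual_summary(records: Iterable[Record]) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Summarise residuals (actual - predicted) per (place, hour) cell."""
    groups: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for place, hour, actual, predicted in records:
        groups[(place, hour)].append(actual - predicted)
    return {
        key: {
            "mean_error": mean(vals),                       # systematic bias
            "mean_abs_error": mean([abs(v) for v in vals]),  # error magnitude
            "n": len(vals),
        }
        for key, vals in groups.items()
    }
```

A cell whose mean error is far from zero relative to its mean absolute error points to a systematic misprediction rather than random noise.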

If residuals reveal systematic patterns (for example, the model consistently underpredicts demand on the first warm day of spring, or fails during local events), the analyst refines the model by adding features (such as weather data or event calendars), changing the model architecture, or resampling the training data. This cycle continues until the residuals appear random and the model reliably anticipates critical states.

Part B: Allocation Optimization

The predictive model becomes the foundation for an operational decision-support tool. For any given target date and time, the system uses the model to forecast demand at all places and identify which ones are heading toward critical states. It then formulates an allocation plan: a set of recommended relocation actions specifying how many bikes should be moved, from which places to which others, and in what time window — before the problems materialize.
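
The workflow does not specify how the plan is computed. As one simple possibility, the sketch below uses a greedy heuristic that moves bikes from places forecast above a target band to places forecast below it. It is not an optimizer, it ignores distances and time windows, and the 20%/80% band and all names are assumptions.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str, int]  # (from place, to place, number of bikes)


def plan_relocations(predicted: Dict[str, int], capacity: Dict[str, int],
                     low_pct: int = 20, high_pct: int = 80) -> List[Move]:
    """Greedily match forecast surpluses to forecast deficits."""
    surplus: Dict[str, int] = {}
    deficit: Dict[str, int] = {}
    for place, bikes in predicted.items():
        lo = -(-low_pct * capacity[place] // 100)   # ceiling division
        hi = high_pct * capacity[place] // 100      # floor division
        if bikes > hi:
            surplus[place] = bikes - hi
        elif bikes < lo:
            deficit[place] = lo - bikes
    moves: List[Move] = []
    # Serve the largest deficits first, drawing from the largest surpluses.
    for receiver in sorted(deficit, key=deficit.get, reverse=True):
        for donor in sorted(surplus, key=surplus.get, reverse=True):
            if deficit[receiver] == 0:
                break
            n = min(deficit[receiver], surplus[donor])
            if n > 0:
                moves.append((donor, receiver, n))
                deficit[receiver] -= n
                surplus[donor] -= n
    return moves
```

A production planner would more likely pose this as a transportation or vehicle-routing problem with distance costs; the greedy version only illustrates the inputs and outputs of the step.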

The effect of the proposed plan is simulated: the system estimates what the fill levels would be after the relocations are carried out, and counts how many critical events would be prevented. This is visualized as a before-and-after comparison — maps showing predicted critical states without intervention versus with the proposed redistribution, along with a cost summary (total distance traveled, number of vehicle trips required).
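
The before-and-after comparison can be sketched as follows, assuming forecast bike counts per place, a list of relocation moves, and a lookup of pairwise distances. The 10%/90% critical thresholds, the one-vehicle-trip-per-move cost model, and all names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str, int]  # (from place, to place, number of bikes)


def apply_plan(predicted: Dict[str, int], moves: List[Move]) -> Dict[str, int]:
    """Return forecast bike counts after the relocations are carried out."""
    after = dict(predicted)
    for donor, receiver, n in moves:
        after[donor] -= n
        after[receiver] += n
    return after


def count_critical(bikes: Dict[str, int], capacity: Dict[str, int],
                   low_pct: int = 10, high_pct: int = 90) -> int:
    """Count places whose fill ratio is at or beyond a critical threshold."""
    return sum(
        1 for place, b in bikes.items()
        if b * 100 <= low_pct * capacity[place]
        or b * 100 >= high_pct * capacity[place]
    )


def evaluate_plan(predicted: Dict[str, int], capacity: Dict[str, int],
                  moves: List[Move],
                  distance: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Compare critical-state counts with and without the plan, plus cost."""
    before = count_critical(predicted, capacity)
    after = count_critical(apply_plan(predicted, moves), capacity)
    return {
        "critical_before": before,
        "critical_after": after,
        "prevented": before - after,
        "total_distance": sum(distance[(d, r)] for d, r, _ in moves),
        "trips": len(moves),  # assumption: one vehicle trip per move
    }
```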

The analyst evaluates whether the plan achieves an acceptable reduction in critical events at feasible cost. If not — perhaps the plan is too expensive, or it handles average days well but fails on holidays — the optimization parameters are adjusted (for example, relaxing service targets for low-priority places, or increasing the planning horizon to allow earlier, cheaper moves). The process iterates until a satisfactory balance between service quality and operational cost is achieved.

Final Output

The workflow produces three interconnected deliverables:

A catalogue of spatio-temporal usage patterns: named, characterized typical behaviors of each place across different temporal contexts, with explicit identification of critical imbalance patterns and their causes.
A predictive model that forecasts, for any given date and time, the expected demand and critical-state risk at every place in the system.
An allocation recommendation engine that uses the predictions to generate actionable redistribution plans, which proactively prevent shortages and overcrowding and expose transparent cost-benefit tradeoffs to operational decision-makers.