Problem: Discover major research topics in a corpus of IEEE VIS publications (1990–2024) using NMF topic modelling and analyse how topic prominence evolves over time, emphasising long-term trends rather than minor fluctuations.
Source: Recommended by an LLM agent equipped with the ATWL workflow library (Formal condition, Problem B).
Workflow Header & Template
workflow TopicEvolutionAnalysis
template:
characterise (vectorise) → build-model (topic model) →
characterise (assign) → contextualise (projection) → visualise →
loop(assess → generate-knowledge (refine) →
build-model → characterise → contextualise → visualise) →
characterise (aggregate) → characterise (smooth) →
contextualise (timeline) → visualise →
loop(assess → generate-knowledge (adjust smoothing) →
characterise (smooth) → contextualise → visualise) →
abstract → define-unit (eras) → visualise →
generate-knowledge
description: "Discover major research topics in a corpus of IEEE VIS
publications (1990-2024) using NMF topic modelling and analyse how
topic prominence evolves over time, emphasising long-term trends
rather than minor fluctuations"
Input: Publication Corpus
# ===============================================
# INPUT: Publication corpus
# ===============================================
artifact papers : entities
origin: given
internal structure: elementary
embedment: set
features:
- id: year
value structure: atomic
value type: temporal
description: "Publication year"
- id: title
value structure: atomic
value type: text
description: "Paper title"
- id: abstract_text
value structure: atomic
value type: text
description: "Paper abstract"
description: "IEEE VIS publications from 1990 to 2024, including
IEEE TVCG and IEEE CG&A articles presented at IEEE VIS,
each described by publication year, title, and abstract"
Phase 1: Topic Discovery
# ===============================================
# PHASE 1: TOPIC DISCOVERY
# ===============================================
# -----------------------------------------------
# T1: Vectorise text using TF-IDF
# -----------------------------------------------
artifact S_vectorisation : specification
origin: given
representation form: "parameter settings"
description: "Parameters controlling text vectorisation: maximum
and minimum document frequency thresholds, n-gram range,
vocabulary size limit, and stop word list"
transform T_vectorise :
intent: characterise
manner: "bag-of-words encoding with TF-IDF weighting"
input: papers, S_vectorisation
output: tfidf_matrix
actor: machine
description: "Represent each paper by a weighted term-frequency
vector derived from concatenated title and abstract text"
artifact tfidf_matrix : feature(papers)
value structure: matrix
value type: numeric
description: "Document-term matrix with TF-IDF weighting;
rows correspond to papers, columns to vocabulary terms"
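The vectorisation step can be illustrated with a minimal sketch, assuming plain lowercase word tokens and a smoothed IDF of the form log((1 + N) / (1 + df)) + 1. The function name `tfidf`, the `min_df` argument (standing in for one threshold in S_vectorisation), and the toy corpus are hypothetical; stop words, n-grams, and the vocabulary size limit are left out.

```python
import math
import re
from collections import Counter


def tfidf(docs, min_df=1):
    """Return (vocabulary, matrix) with one L2-normalised row per document."""
    tokenised = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenised for term in set(toks))
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)
    rows = []
    for toks in tokenised:
        counts = Counter(t for t in toks if t in index)
        row = [0.0] * len(vocab)
        for term, tf in counts.items():
            # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            row[index[term]] = tf * idf
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


# Hypothetical toy corpus: each string stands for a concatenated title and abstract.
docs = [
    "volume rendering of flow data",
    "graph layout for network data",
    "volume rendering on the gpu",
]
vocab, tfidf_matrix = tfidf(docs)
```

Rows are papers and columns are vocabulary terms, so a term that occurs in fewer papers (here "graph") receives a higher weight than one shared across papers ("data").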
# ------------------------------------------------
# T2: Build NMF topic model
# ------------------------------------------------
artifact S_topic_model : specification
origin: given
representation form: "parameter settings"
description: "Parameters controlling NMF topic modelling: number
of topics, initialisation method, maximum iterations, and
number of top words per topic for representation"
transform T_build_topic_model :
intent: build-model
manner: "non-negative matrix factorisation"
input: tfidf_matrix, S_topic_model
output: topic_model
actor: machine
description: "Decompose the document-term matrix into a
document-topic weight matrix and a topic-term weight matrix
using NMF with specified parameters"
artifact topic_model : model(tfidf_matrix, S_topic_model)
model type: "topic model"
representation form: "matrix factorisation (W, H)"
description: "NMF topic model consisting of a document-topic
weight matrix (W) and a topic-term weight matrix (H);
each topic is represented as a weighted combination of
vocabulary terms"
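The factorisation into W and H can be sketched with multiplicative updates for the Frobenius objective. This is one common NMF solver, chosen here as an assumption because the workflow does not name one. The helper names, the random initialisation, and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, n_topics, n_iter=300, seed=0, eps=1e-9):
    """Factorise non-negative V (docs x terms) into W (docs x topics), H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_topics)]
             for i in range(n_docs)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra)) ** 0.5


# Hypothetical toy matrix with two blocks of co-occurring terms.
V = [
    [1.0, 0.9, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.0, 0.0, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]
W, H = nmf(V, n_topics=2)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what allows each topic to be read as an additive combination of terms.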
# ----------------------------------------------
# T3: Assign topics and extract representations
# ----------------------------------------------
transform T_assign_topics :
intent: characterise
manner: "assign primary topic based on maximum weight"
input: papers, topic_model
output: topic_assignments, topic_keywords
actor: machine
description: "Assign each paper to its dominant topic based on
the maximum weight in the document-topic matrix; extract
top-weighted terms per topic as keyword representations"
artifact topic_assignments : feature(papers)
value structure: atomic
value type: categorical
description: "Primary topic assignment for each paper, determined
by the topic with highest weight in the document-topic matrix"
artifact topic_keywords : feature(topic_model)
value structure: vector
value type: text
description: "Top-weighted terms characterising each topic,
serving as human-readable topic representations"
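As a minimal sketch of this step, the dominant topic is the arg-max over each row of W, and the keyword representation is the set of highest-weighted columns in each row of H. The function names and the small factor matrices are hypothetical.

```python
def assign_topics(w):
    """Index of the highest-weighted topic for each document row of W."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


def top_keywords(h, vocab, n_top=3):
    """The n_top highest-weighted vocabulary terms for each topic row of H."""
    return [
        [vocab[j] for j in sorted(range(len(row)), key=row.__getitem__, reverse=True)[:n_top]]
        for row in h
    ]


# Hypothetical factor matrices: 3 papers, 2 topics, 4 vocabulary terms.
W = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.3]]
H = [[0.8, 0.6, 0.0, 0.1], [0.0, 0.1, 0.9, 0.5]]
vocab = ["volume", "rendering", "graph", "layout"]

topic_assignments = assign_topics(W)
topic_keywords = top_keywords(H, vocab, n_top=2)
```

Hard assignment discards the secondary weights (the third paper is 0.6 / 0.3), which is why the later temporal aggregation works from W directly rather than from these labels.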
# ------------------------------------------
# T4: Project documents to 2D
# ------------------------------------------
transform T_project :
intent: contextualise
manner: "projection by dimensionality reduction"
input: papers, tfidf_matrix, topic_assignments
output: projection_space, document_arrangement
actor: machine
description: "Project TF-IDF vectors to 2D using UMAP,
preserving similarity relationships between papers to
enable visual assessment of topic coherence"
artifact projection_space : entities
internal structure: elementary
features:
- id: dimensionality
value structure: atomic
value type: numeric
description: "Number of spatial dimensions (2)"
description: "Two-dimensional projection space created by
dimensionality reduction, serving as reference frame for
document arrangement"
artifact document_arrangement : arrangement(papers)
context: projection_space
principle: "dimensionality reduction preserving cosine
similarity in TF-IDF space"
description: "2D positioning of papers where proximity
indicates textual similarity; topic clusters appear as
spatially coherent groups"
# -----------------------------------------------
# T5: Visualise topic overview
# -----------------------------------------------
transform T_visualise_topics :
intent: visualise
manner: "scatterplot with topic colouring and keyword summaries"
input: document_arrangement, topic_assignments, topic_keywords
output: topic_overview_vis
actor: machine
description: "Display papers as coloured points in the 2D
projection with topic keyword summaries and a bar chart
showing topic sizes"
artifact topic_overview_vis : visualisation(document_arrangement,
topic_assignments, topic_keywords)
layout: "coordinated views: 2D scatterplot and topic size
bar chart"
form: "coloured points with topic labels; horizontal bars"
encoding: "scatterplot position from document_arrangement;
point colour by topic_assignments; bar chart shows paper
count per topic; topic annotations show top keywords from
topic_keywords"
description: "Overview of discovered topics showing document
distribution in projected space and topic sizes with
keyword characterisations"
Loop L1: Topic Refinement
# =========================================
# LOOP L1: Topic refinement
# =========================================
loop L_topic_refinement:
purpose: "Iteratively refine the topic model until topics are
coherent, interpretable, and at appropriate granularity
for characterising research themes"
until: "Topics are semantically coherent, well-separated in the
projection, and at a granularity suitable for tracking
research evolution over 35 years"
body:
transform T_assess_topics :
intent: assess
manner: "evaluate semantic coherence and granularity"
input: topic_overview_vis, topic_keywords, topic_model
output: topic_assessment
actor: human
description: "Assess topic quality: semantic coherence of
keyword sets, spatial separation in projection,
appropriate number of topics for the analytical goal,
and absence of overly broad or redundant topics"
artifact topic_assessment : knowledge(topic_model)
representation form: "quality judgment"
description: "Assessment of topic quality identifying whether
topics are coherent, well-separated, and at appropriate
granularity, or whether parameter adjustment is needed"
if topic_assessment indicates refinement needed:
then:
transform T_adjust_topic_params :
intent: generate-knowledge
manner: "formulate revised topic model parameters"
input: topic_assessment, topic_overview_vis, S_topic_model
output: S_topic_model'
actor: human
description: "Adjust topic modelling parameters based
on assessment: change the number of topics, the
initialisation method, the maximum iterations, or the
number of top words per topic to improve topic quality"
artifact S_topic_model' : specification
representation form: "parameter settings"
description: "Updated NMF topic modelling parameters
after analyst refinement"
transform T_rebuild_topic_model :
intent: build-model
manner: "non-negative matrix factorisation"
input: tfidf_matrix, S_topic_model'
output: topic_model'
actor: machine
description: "Recompute NMF topic model with revised
parameters"
artifact topic_model' : model(tfidf_matrix, S_topic_model')
model type: "topic model"
representation form: "matrix factorisation (W, H)"
description: "Refined NMF topic model with updated parameters"
transform T_reassign_topics :
intent: characterise
manner: "assign primary topic based on updated weights"
input: papers, topic_model'
output: topic_assignments', topic_keywords'
actor: machine
description: "Recompute topic assignments and keyword
representations from the refined model"
artifact topic_assignments' : feature(papers)
value structure: atomic
value type: categorical
description: "Updated primary topic assignments from
refined model"
artifact topic_keywords' : feature(topic_model')
value structure: vector
value type: text
description: "Updated top-weighted terms per topic
from refined model"
transform T_reproject :
intent: contextualise
manner: "projection by dimensionality reduction"
input: papers, tfidf_matrix, topic_assignments'
output: document_arrangement'
actor: machine
description: "Update 2D projection to reflect revised
topic structure"
artifact document_arrangement' : arrangement(papers)
context: projection_space
principle: "dimensionality reduction preserving cosine
similarity"
description: "Updated 2D positioning reflecting refined
topic assignments"
transform T_revisualise_topics :
intent: visualise
manner: "scatterplot with topic colouring and keyword summaries"
input: document_arrangement', topic_assignments', topic_keywords'
output: topic_overview_vis'
actor: machine
description: "Update topic overview visualisation with
refined model results"
artifact topic_overview_vis' : visualisation(document_arrangement',
topic_assignments', topic_keywords')
layout: "coordinated views: 2D scatterplot and topic
size bar chart"
form: "coloured points with topic labels; horizontal bars"
encoding: "scatterplot position from document_arrangement';
point colour by topic_assignments'; bar chart shows
paper count per topic; annotations show top keywords"
description: "Updated topic overview reflecting refined model"
assign:
S_topic_model := S_topic_model'
topic_model := topic_model'
topic_assignments := topic_assignments'
topic_keywords := topic_keywords'
document_arrangement := document_arrangement'
topic_overview_vis := topic_overview_vis'
else:
exit loop L_topic_refinement
end loop L_topic_refinement
Phase 2: Temporal Topic Profiling
# ==============================================
# PHASE 2: TEMPORAL TOPIC PROFILING
# ==============================================
# ----------------------------------------------
# T8: Aggregate topic membership per year
# ----------------------------------------------
transform T_aggregate_temporal :
intent: characterise
manner: "aggregate topic weights by time intervals"
input: papers, topic_model
output: topic_temporal_profiles
actor: machine
description: "For each year, sum the NMF document-topic weights
across all papers published in that year and normalise to
obtain topic proportions per year; soft assignment provides
smoother temporal profiles than hard topic counts"
artifact topic_temporal_profiles : feature(topic_model)
value structure: matrix
value type: numeric
representation form: "time series per topic"
description: "Matrix of topic proportions over years (topics as
rows, years as columns); each cell contains the normalised
sum of document-topic weights for that topic in that year"
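The soft aggregation can be sketched as follows, assuming the document-topic weights are available as one row per paper alongside a parallel list of publication years. The function name is hypothetical, and the result is keyed by year (one proportion vector per year), which is the transpose of the topics-by-years layout described above.

```python
from collections import defaultdict


def topic_proportions_by_year(years, w):
    """Sum document-topic weights per year and normalise each year to sum to 1."""
    n_topics = len(w[0])
    totals = defaultdict(lambda: [0.0] * n_topics)
    for year, row in zip(years, w):
        for k, weight in enumerate(row):
            totals[year][k] += weight
    profiles = {}
    for year in sorted(totals):
        s = sum(totals[year])
        # A year whose papers carry no topic weight gets zeros rather than a division error.
        profiles[year] = [v / s if s > 0 else 0.0 for v in totals[year]]
    return profiles


# Hypothetical data: two papers from 1990 and one from 1991, two topics.
years = [1990, 1990, 1991]
W = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.6]]
profiles = topic_proportions_by_year(years, W)
```

Note that rows of W are not normalised per paper in this sketch, so papers with larger overall weights contribute more to their year's totals; normalising each row first would be an alternative design choice.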
# ----------------------------------------
# T9: Apply temporal smoothing
# ----------------------------------------
artifact S_smoothing : specification
origin: given
representation form: "parameter settings"
description: "Parameters controlling temporal smoothing: method
(LOESS, moving average, or Gaussian kernel), smoothing
fraction or window size, and whether to display proportions
or absolute counts"
transform T_smooth_trends :
intent: characterise
manner: "temporal smoothing to extract trends"
input: topic_temporal_profiles, S_smoothing
output: topic_trends
actor: machine
description: "Apply temporal smoothing to each topic's yearly
proportion series to suppress minor fluctuations and reveal
long-term trends spanning multiple years"
artifact topic_trends : feature(topic_model)
value structure: matrix
value type: numeric
representation form: "smoothed time series per topic"
description: "Smoothed topic proportion time series where
short-term fluctuations are suppressed and multi-year
trends are emphasised"
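One of the smoothing options listed in S_smoothing, the Gaussian kernel, can be sketched as a normalised kernel-weighted average over years. The `bandwidth` parameter (in years) and the example series are hypothetical; LOESS or a moving average would replace the weighting scheme but follow the same pattern.

```python
import math


def gaussian_smooth(values, bandwidth=2.0):
    """Kernel-weighted average of a yearly series; a larger bandwidth gives a smoother curve."""
    smoothed = []
    for i in range(len(values)):
        weights = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(len(values))]
        total = sum(weights)
        smoothed.append(sum(wt * v for wt, v in zip(weights, values)) / total)
    return smoothed


# Hypothetical yearly proportions for one topic: a rising trend with alternating noise.
raw = [0.10, 0.14, 0.11, 0.17, 0.14, 0.20, 0.17, 0.23, 0.20, 0.26]
trend = gaussian_smooth(raw, bandwidth=2.0)
```

Because the weights for a given year are the same for every topic and sum to one, smoothing each topic's proportions separately keeps the per-year proportions summing to one, so the smoothed series remain valid input for a stacked chart.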
# -------------------------------------------
# T10: Arrange topics on shared timeline
# -------------------------------------------
artifact timeline : entities
origin: given
internal structure: elementary
embedment: time
features:
- id: year_value
value structure: atomic
value type: temporal
description: "Calendar year from 1990 to 2024"
description: "Temporal reference structure providing the shared
time axis for positioning topic trends"
transform T_arrange_temporal :
intent: contextualise
manner: "temporal alignment on shared time axis"
input: topic_trends, timeline
output: temporal_arrangement
actor: machine
description: "Align all topic trend curves on a common timeline
from 1990 to 2024 for comparative temporal visualisation"
artifact temporal_arrangement : arrangement(topic_trends)
context: timeline
principle: "calendar year mapping to horizontal position"
description: "Arrangement of topic trend curves along the shared
1990-2024 time axis enabling visual comparison of temporal
evolution"
# ----------------------------------------
# T11: Visualise temporal evolution
# ----------------------------------------
transform T_visualise_evolution :
intent: visualise
manner: "stacked area chart, small-multiple line charts, and heatmap"
input: temporal_arrangement, topic_trends, topic_keywords
output: evolution_vis
actor: machine
description: "Display topic evolution through coordinated temporal
views: a stacked area chart showing compositional change,
small-multiple line charts showing individual topic
trajectories, and a heatmap showing topic intensity over time"
artifact evolution_vis : visualisation(temporal_arrangement,
topic_trends, topic_keywords)
layout: "coordinated temporal views: stacked area chart,
small-multiple line charts, and heatmap"
form: "stacked coloured areas; individual trend lines;
colour-coded cells"
encoding: "horizontal position from temporal_arrangement;
vertical extent in stacked chart from topic proportions;
line height in small multiples from topic_trends; cell
colour intensity in heatmap from topic_trends; topic
identity encoded by colour consistent with topic_overview_vis"
description: "Multi-view temporal visualisation showing both
overall compositional evolution and individual topic
trajectories from 1990 to 2024"
Loop L2: Smoothing Refinement
# ==============================================
# LOOP L2: Smoothing refinement
# ==============================================
loop L_smoothing_refinement:
purpose: "Iteratively adjust smoothing parameters until temporal
visualisation reveals genuine long-term trends without
excessive loss of meaningful temporal structure"
until: "Trend curves are smooth enough to show multi-year patterns
without retaining year-to-year noise, and important
transitions remain visible"
body:
transform T_assess_smoothing :
intent: assess
manner: "evaluate smoothing adequacy for trend visibility"
input: evolution_vis, topic_trends
output: smoothing_assessment
actor: human
description: "Assess whether the smoothing level is
appropriate: curves should reveal long-term trends
without retaining minor fluctuations; important
transitions and inflection points should remain visible"
artifact smoothing_assessment : knowledge(topic_trends)
representation form: "quality judgment"
description: "Assessment of smoothing adequacy: whether trends
are visible, whether noise is suppressed, and whether
meaningful temporal structure is preserved"
if smoothing_assessment indicates adjustment needed:
then:
transform T_adjust_smoothing :
intent: generate-knowledge
manner: "adjust smoothing parameters"
input: smoothing_assessment, evolution_vis, S_smoothing
output: S_smoothing'
actor: human
description: "Adjust smoothing parameters based on
assessment: increase smoothing fraction for smoother
trends, decrease for more temporal detail, or switch
smoothing method"
artifact S_smoothing' : specification
representation form: "parameter settings"
description: "Updated smoothing parameters after analyst
adjustment"
transform T_resmooth :
intent: characterise
manner: "temporal smoothing to extract trends"
input: topic_temporal_profiles, S_smoothing'
output: topic_trends'
actor: machine
description: "Recompute smoothed trends with adjusted
parameters"
artifact topic_trends' : feature(topic_model)
value structure: matrix
value type: numeric
representation form: "smoothed time series per topic"
description: "Recomputed smoothed topic trends with
adjusted smoothing parameters"
transform T_rearrange_temporal :
intent: contextualise
manner: "temporal alignment on shared time axis"
input: topic_trends', timeline
output: temporal_arrangement'
actor: machine
description: "Realign smoothed topic trends on the
shared timeline"
artifact temporal_arrangement' : arrangement(topic_trends')
context: timeline
principle: "calendar year mapping to horizontal position"
description: "Updated temporal arrangement with
resmoothed trends"
transform T_revisualise_evolution :
intent: visualise
manner: "stacked area chart, small-multiple line charts, and heatmap"
input: temporal_arrangement', topic_trends', topic_keywords
output: evolution_vis'
actor: machine
description: "Update temporal evolution visualisation
with resmoothed trends"
artifact evolution_vis' : visualisation(temporal_arrangement',
topic_trends', topic_keywords)
layout: "coordinated temporal views: stacked area chart,
small-multiple line charts, and heatmap"
form: "stacked coloured areas; individual trend lines;
colour-coded cells"
encoding: "same encoding as evolution_vis with updated
smoothed values"
description: "Updated temporal evolution visualisation
reflecting adjusted smoothing"
assign:
S_smoothing := S_smoothing'
topic_trends := topic_trends'
temporal_arrangement := temporal_arrangement'
evolution_vis := evolution_vis'
else:
exit loop L_smoothing_refinement
end loop L_smoothing_refinement
Phase 3: Trend Interpretation & Knowledge Generation
# ===========================================
# PHASE 3: TREND INTERPRETATION AND KNOWLEDGE GENERATION
# ===========================================
# -------------------------------------------
# T14: Identify trend patterns
# -------------------------------------------
transform T_identify_trends :
intent: abstract
manner: "classify topic trajectories and identify inflection points"
input: evolution_vis, topic_overview_vis, topic_trends
output: trend_patterns
actor: human
description: "Identify qualitative trend patterns from the temporal
visualisation: classify each topic as rising, declining,
stable, or peaked; identify major inflection points and
dominant research eras"
artifact trend_patterns : pattern(topic_trends, topic_model)
representation form: "categorised trend types with temporal characteristics"
description: "Identified trend patterns including rising topics,
declining topics, stable topics, peaked topics with
approximate peak years, and major inflection points in the
overall research landscape"
# --------------------------------------------
# T15: Segment timeline into research eras
# --------------------------------------------
transform T_segment_eras :
intent: define-unit
manner: "segment timeline by compositional change points"
input: topic_trends, trend_patterns
output: research_eras, era_labels
actor: hybrid
description: "Segment the 1990-2024 timeline into distinct
research eras based on compositional change-point detection
and analyst interpretation; each era is a contiguous period
with a relatively stable topic mix"
artifact research_eras : entities
internal structure: episode
embedment: time
features:
- id: time_span
value structure: atomic
value type: temporal
description: "Start and end years of the era"
- id: dominant_topics
value structure: list
value type: reference
description: "Topics with highest average proportion
during this era"
description: "Distinct research eras representing contiguous
periods with coherent topic composition, separated by
years of notable compositional shift"
artifact era_labels : feature(research_eras)
value structure: atomic
value type: categorical
description: "Descriptive labels characterising each research era
based on its dominant topics and temporal position"
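The machine half of this hybrid step could look like the sketch below, which is an assumption rather than the workflow's prescribed method: it places an era boundary wherever the total variation distance between consecutive years' topic mixes exceeds a threshold, then reports the topics with the highest average proportion per era. The function names, the `threshold` value, and the data are hypothetical, and the trends are laid out with one row per year.

```python
def segment_eras(years, compositions, threshold=0.2):
    """Split the timeline wherever the topic mix shifts by more than `threshold`.

    The shift between consecutive years is the total variation distance
    (half the L1 distance) between their topic proportion vectors.
    """
    eras, start = [], 0
    for i in range(1, len(years)):
        shift = 0.5 * sum(abs(a - b) for a, b in zip(compositions[i - 1], compositions[i]))
        if shift > threshold:
            eras.append((years[start], years[i - 1]))
            start = i
    eras.append((years[start], years[-1]))
    return eras


def dominant_topics(years, compositions, era, n_top=1):
    """Topics with the highest average proportion within one (start, end) era."""
    rows = [c for y, c in zip(years, compositions) if era[0] <= y <= era[1]]
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return sorted(range(len(means)), key=means.__getitem__, reverse=True)[:n_top]


# Hypothetical smoothed proportions for two topics over six years.
years = [1990, 1991, 1992, 1993, 1994, 1995]
topic_trends = [[0.80, 0.20], [0.78, 0.22], [0.75, 0.25],
                [0.40, 0.60], [0.35, 0.65], [0.30, 0.70]]
research_eras = segment_eras(years, topic_trends, threshold=0.2)
```

On heavily smoothed trends an abrupt shift is spread over several years, so a year-to-year threshold may miss it; comparing windows of several years on each side, or reviewing candidate boundaries by eye, is where the analyst's interpretation comes in.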
# ----------------------------------------------
# T16: Visualise annotated timeline
# ----------------------------------------------
transform T_visualise_annotated :
intent: visualise
manner: "annotated stacked area chart with era boundaries"
input: temporal_arrangement, topic_trends, research_eras,
era_labels, topic_keywords
output: annotated_evolution_vis
actor: machine
description: "Overlay era boundaries and labels on the stacked
area chart to show how the research landscape is structured
into distinct periods"
artifact annotated_evolution_vis : visualisation(temporal_arrangement,
topic_trends, research_eras, era_labels)
layout: "stacked area chart with era boundary overlays"
form: "stacked coloured areas with vertical boundary lines and
era label annotations"
encoding: "horizontal position from temporal_arrangement; stacked
areas from topic_trends; vertical dashed lines at era
boundaries; text annotations from era_labels positioned at
era midpoints"
description: "Annotated stacked area chart showing topic evolution
with research era boundaries and labels, providing a structured
narrative of the field's development"
# ---------------------------------------------
# T17: Generate knowledge
# ---------------------------------------------
transform T_generate_knowledge :
intent: generate-knowledge
manner: "synthesise findings into structured narrative"
input: trend_patterns, annotated_evolution_vis, research_eras,
topic_keywords, topic_assessment
output: evolution_knowledge
actor: human
description: "Synthesise all findings into a structured narrative
summary: document major research themes, their temporal
trajectories, key transitions, emerging and fading topics,
and the overall evolution of the IEEE VIS research community
from 1990 to 2024"
artifact evolution_knowledge : knowledge(trend_patterns,
research_eras, topic_model)
representation form: "statements and explanations"
description: "Comprehensive understanding of research topic
evolution in the IEEE VIS community: identified major
themes with keyword characterisations, classification of
each topic's temporal trajectory, delineation of distinct
research eras with their dominant concerns, identification
of key transition points, and narrative synthesis of how
the field has developed over 35 years"