ATWL Representation of the Recommended Workflow for Problem B

Research Topic Evolution in IEEE VIS Publications (1990–2024)

Problem: Discover major research topics in a corpus of IEEE VIS publications (1990–2024) using NMF topic modelling and analyse how topic prominence evolves over time, emphasising long-term trends rather than minor fluctuations.

Source: Recommended by an LLM agent equipped with the ATWL workflow library (Formal condition, Problem B).

Workflow Header & Template

workflow TopicEvolutionAnalysis
  template:
    characterise (vectorise)
    → build-model (topic model)
    → characterise (assign)
    → contextualise (projection)
    → visualise
    → loop(assess → generate-knowledge (refine) → build-model → characterise → contextualise → visualise)
    → characterise (aggregate)
    → characterise (smooth)
    → contextualise (timeline)
    → visualise
    → loop(assess → generate-knowledge (adjust smoothing) → characterise (smooth) → contextualise → visualise)
    → abstract
    → define-unit (eras)
    → visualise
    → generate-knowledge
  description: "Discover major research topics in a corpus of IEEE VIS publications (1990-2024) using NMF topic modelling and analyse how topic prominence evolves over time, emphasising long-term trends rather than minor fluctuations"

Input: Publication Corpus

# ===============================================
# INPUT: Publication corpus
# ===============================================
artifact papers : entities
  origin: given
  internal structure: elementary
  embedment: set
  features:
    - id: year
      value structure: atomic
      value type: temporal
      description: "Publication year"
    - id: title
      value structure: atomic
      value type: text
      description: "Paper title"
    - id: abstract_text
      value structure: atomic
      value type: text
      description: "Paper abstract"
  description: "IEEE VIS publications from 1990 to 2024 including IEEE TVCG and IEEE CG&A articles published at IEEE VIS, each described by publication year, title, and abstract"
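As a rough illustration of the `papers` artifact, the sketch below models one entity with its three features. The class name `Paper`, the `text()` helper, and the year check are assumptions introduced for this example; the workflow itself only fixes the feature ids and the 1990 to 2024 span.

```python
from dataclasses import dataclass

# Corpus span stated in the problem description.
FIRST_YEAR, LAST_YEAR = 1990, 2024


@dataclass(frozen=True)
class Paper:
    """One entity of the `papers` artifact (hypothetical Python counterpart)."""
    year: int            # feature `year` (temporal)
    title: str           # feature `title` (text)
    abstract_text: str   # feature `abstract_text` (text)

    def __post_init__(self):
        # Reject records outside the corpus span.
        if not FIRST_YEAR <= self.year <= LAST_YEAR:
            raise ValueError(f"year {self.year} outside {FIRST_YEAR}-{LAST_YEAR}")

    def text(self) -> str:
        """Title and abstract concatenated, the form later steps vectorise."""
        return f"{self.title}. {self.abstract_text}"
```

Joining title and abstract in one place keeps later vectorisation code independent of how the corpus was loaded.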

Phase 1: Topic Discovery

# ===============================================
# PHASE 1: TOPIC DISCOVERY
# ===============================================

# -----------------------------------------------
# T1: Vectorise text using TF-IDF
# -----------------------------------------------
artifact S_vectorisation : specification
  origin: given
  representation form: "parameter settings"
  description: "Parameters controlling text vectorisation: maximum and minimum document frequency thresholds, n-gram range, vocabulary size limit, and stop word list"

transform T_vectorise :
  intent: characterise
  manner: "bag-of-words encoding with TF-IDF weighting"
  input: papers, S_vectorisation
  output: tfidf_matrix
  actor: machine
  description: "Represent each paper by a weighted term-frequency vector derived from concatenated title and abstract text"

artifact tfidf_matrix : feature(papers)
  value structure: matrix
  value type: numeric
  description: "Document-term matrix with TF-IDF weighting; rows correspond to papers, columns to vocabulary terms"

# -----------------------------------------------
# T2: Build NMF topic model
# -----------------------------------------------
artifact S_topic_model : specification
  origin: given
  representation form: "parameter settings"
  description: "Parameters controlling NMF topic modelling: number of topics, initialisation method, maximum iterations, and number of top words per topic for representation"

transform T_build_topic_model :
  intent: build-model
  manner: "non-negative matrix factorisation"
  input: tfidf_matrix, S_topic_model
  output: topic_model
  actor: machine
  description: "Decompose the document-term matrix into a document-topic weight matrix and a topic-term weight matrix using NMF with specified parameters"

artifact topic_model : model(tfidf_matrix, S_topic_model)
  model type: "topic model"
  representation form: "matrix factorisation (W, H)"
  description: "NMF topic model consisting of a document-topic weight matrix (W) and a topic-term weight matrix (H); each topic is represented as a weighted combination of vocabulary terms"

# -----------------------------------------------
# T3: Assign topics and extract representations
# -----------------------------------------------
transform T_assign_topics :
  intent: characterise
  manner: "assign primary topic based on maximum weight"
  input: papers, topic_model
  output: topic_assignments, topic_keywords
  actor: machine
  description: "Assign each paper to its dominant topic based on the maximum weight in the document-topic matrix; extract top-weighted terms per topic as keyword representations"

artifact topic_assignments : feature(papers)
  value structure: atomic
  value type: categorical
  description: "Primary topic assignment for each paper, determined by the topic with highest weight in the document-topic matrix"

artifact topic_keywords : feature(topic_model)
  value structure: vector
  value type: text
  description: "Top-weighted terms characterising each topic, serving as human-readable topic representations"

# -----------------------------------------------
# T4: Project documents to 2D
# -----------------------------------------------
transform T_project :
  intent: contextualise
  manner: "projection by dimensionality reduction"
  input: papers, tfidf_matrix, topic_assignments
  output: projection_space, document_arrangement
  actor: machine
  description: "Project TF-IDF vectors to 2D using UMAP, preserving similarity relationships between papers to enable visual assessment of topic coherence"

artifact projection_space : entities
  internal structure: elementary
  features:
    - id: dimensionality
      value structure: atomic
      value type: numeric
      description: "Number of spatial dimensions (2)"
  description: "Two-dimensional projection space created by dimensionality reduction, serving as reference frame for document arrangement"

artifact document_arrangement : arrangement(papers)
  context: projection_space
  principle: "dimensionality reduction preserving cosine similarity in TF-IDF space"
  description: "2D positioning of papers where proximity indicates textual similarity; topic clusters appear as spatially coherent groups"

# -----------------------------------------------
# T5: Visualise topic overview
# -----------------------------------------------
transform T_visualise_topics :
  intent: visualise
  manner: "scatterplot with topic colouring and keyword summaries"
  input: document_arrangement, topic_assignments, topic_keywords
  output: topic_overview_vis
  actor: machine
  description: "Display papers as coloured points in the 2D projection with topic keyword summaries and a bar chart showing topic sizes"

artifact topic_overview_vis : visualisation(document_arrangement, topic_assignments, topic_keywords)
  layout: "coordinated views: 2D scatterplot and topic size bar chart"
  form: "coloured points with topic labels; horizontal bars"
  encoding: "scatterplot position from document_arrangement; point colour by topic_assignments; bar chart shows paper count per topic; topic annotations show top keywords from topic_keywords"
  description: "Overview of discovered topics showing document distribution in projected space and topic sizes with keyword characterisations"
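Steps T1 to T3 can be sketched in a few lines of Python. The example below is a minimal illustration, not the workflow's prescribed implementation: it uses whitespace tokenisation, a smoothed IDF with L2 row normalisation, and plain multiplicative-update NMF, all of which are assumptions of this sketch. In practice an established library (for instance scikit-learn's `TfidfVectorizer` and `NMF`) would be the more likely choice. The function names `tfidf`, `nmf`, and `assign_topics` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf(docs):
    """Return (matrix, vocabulary); rows are documents, columns are terms."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    df = Counter(tok for toks in tokenised for tok in set(toks))
    n_docs = len(docs)
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenised):
        for term, count in Counter(toks).items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed IDF
            X[i, index[term]] = count * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms), vocab  # L2-normalised rows


def nmf(V, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factorise V ~ W @ H with multiplicative updates (Frobenius objective)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_topics)) + eps  # document-topic weights
    H = rng.random((n_topics, V.shape[1])) + eps  # topic-term weights
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def assign_topics(W, H, vocab, n_top_words=3):
    """Dominant topic per document and top-weighted terms per topic."""
    assignments = W.argmax(axis=1)
    keywords = [[vocab[j] for j in np.argsort(row)[::-1][:n_top_words]] for row in H]
    return assignments, keywords


# Toy corpus standing in for concatenated titles and abstracts.
docs = [
    "volume rendering of scalar fields",
    "direct volume rendering with transfer functions",
    "graph layout for network visualisation",
    "network graph drawing and layout",
]
X, vocab = tfidf(docs)
W, H = nmf(X, n_topics=2)
assignments, keywords = assign_topics(W, H, vocab)
```

Here `W` plays the role of the document-topic matrix and `H` the topic-term matrix of the `topic_model` artifact; `assignments` and `keywords` correspond to `topic_assignments` and `topic_keywords`.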

Loop L1: Topic Refinement

# =========================================
# LOOP L1: Topic refinement
# =========================================
loop L_topic_refinement:
  purpose: "Iteratively refine the topic model until topics are coherent, interpretable, and at appropriate granularity for characterising research themes"
  until: "Topics are semantically coherent, well-separated in the projection, and at a granularity suitable for tracking research evolution over 35 years"
  body:
    transform T_assess_topics :
      intent: assess
      manner: "evaluate semantic coherence and granularity"
      input: topic_overview_vis, topic_keywords, topic_model
      output: topic_assessment
      actor: human
      description: "Assess topic quality: semantic coherence of keyword sets, spatial separation in projection, appropriate number of topics for the analytical goal, and absence of overly broad or redundant topics"

    artifact topic_assessment : knowledge(topic_model)
      representation form: "quality judgment"
      description: "Assessment of topic quality identifying whether topics are coherent, well-separated, and at appropriate granularity, or whether parameter adjustment is needed"

    if topic_assessment indicates refinement needed:
      then:
        transform T_adjust_topic_params :
          intent: generate-knowledge
          manner: "formulate revised topic model parameters"
          input: topic_assessment, topic_overview_vis, S_topic_model
          output: S_topic_model'
          actor: human
          description: "Adjust topic modelling parameters based on assessment: change number of topics, modify vocabulary constraints, or adjust stop word list to improve topic quality"

        artifact S_topic_model' : specification
          representation form: "parameter settings"
          description: "Updated NMF topic modelling parameters after analyst refinement"

        transform T_rebuild_topic_model :
          intent: build-model
          manner: "non-negative matrix factorisation"
          input: tfidf_matrix, S_topic_model'
          output: topic_model'
          actor: machine
          description: "Recompute NMF topic model with revised parameters"

        artifact topic_model' : model(tfidf_matrix, S_topic_model')
          model type: "topic model"
          representation form: "matrix factorisation (W, H)"
          description: "Refined NMF topic model with updated parameters"

        transform T_reassign_topics :
          intent: characterise
          manner: "assign primary topic based on updated weights"
          input: papers, topic_model'
          output: topic_assignments', topic_keywords'
          actor: machine
          description: "Recompute topic assignments and keyword representations from the refined model"

        artifact topic_assignments' : feature(papers)
          value structure: atomic
          value type: categorical
          description: "Updated primary topic assignments from refined model"

        artifact topic_keywords' : feature(topic_model')
          value structure: vector
          value type: text
          description: "Updated top-weighted terms per topic from refined model"

        transform T_reproject :
          intent: contextualise
          manner: "projection by dimensionality reduction"
          input: papers, tfidf_matrix, topic_assignments'
          output: document_arrangement'
          actor: machine
          description: "Update 2D projection to reflect revised topic structure"

        artifact document_arrangement' : arrangement(papers)
          context: projection_space
          principle: "dimensionality reduction preserving cosine similarity"
          description: "Updated 2D positioning reflecting refined topic assignments"

        transform T_revisualise_topics :
          intent: visualise
          manner: "scatterplot with topic colouring and keyword summaries"
          input: document_arrangement', topic_assignments', topic_keywords'
          output: topic_overview_vis'
          actor: machine
          description: "Update topic overview visualisation with refined model results"

        artifact topic_overview_vis' : visualisation(document_arrangement', topic_assignments', topic_keywords')
          layout: "coordinated views: 2D scatterplot and topic size bar chart"
          form: "coloured points with topic labels; horizontal bars"
          encoding: "scatterplot position from document_arrangement'; point colour by topic_assignments'; bar chart shows paper count per topic; annotations show top keywords"
          description: "Updated topic overview reflecting refined model"

        assign:
          S_topic_model := S_topic_model'
          topic_model := topic_model'
          topic_assignments := topic_assignments'
          topic_keywords := topic_keywords'
          document_arrangement := document_arrangement'
          topic_overview_vis := topic_overview_vis'
      else:
        exit loop L_topic_refinement
end loop L_topic_refinement
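The control flow of this loop (assess, adjust, rebuild, then reassign the primed artifacts) can be sketched as a small generic driver. Everything named here is hypothetical: `refine`, its callback parameters, and especially `max_rounds`, a safety cap that the workflow does not specify. The human assessment is stood in for by a callable.

```python
from typing import Callable, Tuple, TypeVar

Params = TypeVar("Params")
Result = TypeVar("Result")


def refine(
    params: Params,
    build: Callable[[Params], Result],
    needs_refinement: Callable[[Result], bool],
    adjust: Callable[[Params, Result], Params],
    max_rounds: int = 10,  # assumption: the workflow itself has no round limit
) -> Tuple[Params, Result, int]:
    """Rebuild until the assessment accepts the result or rounds run out."""
    result = build(params)
    rounds = 0
    while rounds < max_rounds and needs_refinement(result):
        params = adjust(params, result)  # S_topic_model := S_topic_model'
        result = build(params)           # topic_model := topic_model'
        rounds += 1
    return params, result, rounds


# Toy stand-ins: the "model" is just its topic count, and the "analyst"
# asks for more topics until there are at least eight.
final_params, final_model, n_rounds = refine(
    params={"n_topics": 5},
    build=lambda p: p["n_topics"],
    needs_refinement=lambda model: model < 8,
    adjust=lambda p, model: {"n_topics": p["n_topics"] + 1},
)
```

In a real session `needs_refinement` and `adjust` would be interactive steps performed by the analyst, and `build` would wrap the rebuild, reassign, reproject, and revisualise transforms.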

Phase 2: Temporal Topic Profiling

# ==============================================
# PHASE 2: TEMPORAL TOPIC PROFILING
# ==============================================

# ----------------------------------------------
# T8: Aggregate topic membership per year
# ----------------------------------------------
transform T_aggregate_temporal :
  intent: characterise
  manner: "aggregate topic weights by time intervals"
  input: papers, topic_model
  output: topic_temporal_profiles
  actor: machine
  description: "For each year, sum the NMF document-topic weights across all papers published in that year and normalise to obtain topic proportions per year; soft assignment provides smoother temporal profiles than hard topic counts"

artifact topic_temporal_profiles : feature(topic_model)
  value structure: matrix
  value type: numeric
  representation form: "time series per topic"
  description: "Matrix of topic proportions over years (topics as rows, years as columns); each cell contains the normalised sum of document-topic weights for that topic in that year"

# ----------------------------------------------
# T9: Apply temporal smoothing
# ----------------------------------------------
artifact S_smoothing : specification
  origin: given
  representation form: "parameter settings"
  description: "Parameters controlling temporal smoothing: method (LOESS, moving average, or Gaussian kernel), smoothing fraction or window size, and whether to display proportions or absolute counts"

transform T_smooth_trends :
  intent: characterise
  manner: "temporal smoothing to extract trends"
  input: topic_temporal_profiles, S_smoothing
  output: topic_trends
  actor: machine
  description: "Apply temporal smoothing to each topic's yearly proportion series to suppress minor fluctuations and reveal long-term trends spanning multiple years"

artifact topic_trends : feature(topic_model)
  value structure: matrix
  value type: numeric
  representation form: "smoothed time series per topic"
  description: "Smoothed topic proportion time series where short-term fluctuations are suppressed and multi-year trends are emphasised"

# ----------------------------------------------
# T10: Arrange topics on shared timeline
# ----------------------------------------------
artifact timeline : entities
  origin: given
  internal structure: elementary
  embedment: time
  features:
    - id: year_value
      value structure: atomic
      value type: temporal
      description: "Calendar year from 1990 to 2024"
  description: "Temporal reference structure providing the shared time axis for positioning topic trends"

transform T_arrange_temporal :
  intent: contextualise
  manner: "temporal alignment on shared time axis"
  input: topic_trends, timeline
  output: temporal_arrangement
  actor: machine
  description: "Align all topic trend curves on a common timeline from 1990 to 2024 for comparative temporal visualisation"

artifact temporal_arrangement : arrangement(topic_trends)
  context: timeline
  principle: "calendar year mapping to horizontal position"
  description: "Arrangement of topic trend curves along the shared 1990-2024 time axis enabling visual comparison of temporal evolution"

# ----------------------------------------------
# T11: Visualise temporal evolution
# ----------------------------------------------
transform T_visualise_evolution :
  intent: visualise
  manner: "stacked area chart, small-multiple line charts, and heatmap"
  input: temporal_arrangement, topic_trends, topic_keywords
  output: evolution_vis
  actor: machine
  description: "Display topic evolution through coordinated temporal views: a stacked area chart showing compositional change, small-multiple line charts showing individual topic trajectories, and a heatmap showing topic intensity over time"

artifact evolution_vis : visualisation(temporal_arrangement, topic_trends, topic_keywords)
  layout: "coordinated temporal views: stacked area chart, small-multiple line charts, and heatmap"
  form: "stacked coloured areas; individual trend lines with shaded confidence regions; colour-coded cells"
  encoding: "horizontal position from temporal_arrangement; vertical extent in stacked chart from topic proportions; line height in small multiples from topic_trends; cell colour intensity in heatmap from topic_trends; topic identity encoded by colour consistent with topic_overview_vis"
  description: "Multi-view temporal visualisation showing both overall compositional evolution and individual topic trajectories from 1990 to 2024"
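The aggregation in T8 and the smoothing in T9 can be illustrated with a short pure-Python sketch. It assumes the document-topic weights are available as one list per paper and picks a centred moving average as the smoother, one of the three methods `S_smoothing` allows; the function names and the toy numbers are made up for the example.

```python
from collections import defaultdict


def yearly_topic_proportions(years, doc_topic_weights):
    """Sum document-topic weights per year and normalise each year to 1."""
    n_topics = len(doc_topic_weights[0])
    sums = defaultdict(lambda: [0.0] * n_topics)
    for year, weights in zip(years, doc_topic_weights):
        for k, w in enumerate(weights):
            sums[year][k] += w
    profiles = {}
    for year in sorted(sums):
        total = sum(sums[year])
        profiles[year] = [w / total if total else 0.0 for w in sums[year]]
    return profiles


def moving_average(series, window=3):
    """Centred moving average; the window shrinks at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Three toy papers with weights over two topics (soft assignment).
profiles = yearly_topic_proportions(
    years=[1990, 1990, 1991],
    doc_topic_weights=[[0.2, 0.6], [0.2, 0.0], [0.5, 0.5]],
)
# profiles[1990] is approximately [0.4, 0.6]; profiles[1991] is [0.5, 0.5]

trend = moving_average([0.0, 3.0, 0.0, 3.0, 0.0], window=3)
# trend == [1.5, 1.0, 2.0, 1.0, 1.5]
```

Note that this sketch keys the profiles by year (years as rows), whereas `topic_temporal_profiles` is described with topics as rows; transposing is left out for brevity. Years without any papers simply do not appear in the result.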

Loop L2: Smoothing Refinement

# ==============================================
# LOOP L2: Smoothing refinement
# ==============================================
loop L_smoothing_refinement:
  purpose: "Iteratively adjust smoothing parameters until temporal visualisation reveals genuine long-term trends without excessive loss of meaningful temporal structure"
  until: "Trend curves are smooth enough to show multi-year patterns without retaining year-to-year noise, and important transitions remain visible"
  body:
    transform T_assess_smoothing :
      intent: assess
      manner: "evaluate smoothing adequacy for trend visibility"
      input: evolution_vis, topic_trends
      output: smoothing_assessment
      actor: human
      description: "Assess whether the smoothing level is appropriate: curves should reveal long-term trends without retaining minor fluctuations; important transitions and inflection points should remain visible"

    artifact smoothing_assessment : knowledge(topic_trends)
      representation form: "quality judgment"
      description: "Assessment of smoothing adequacy: whether trends are visible, whether noise is suppressed, and whether meaningful temporal structure is preserved"

    if smoothing_assessment indicates adjustment needed:
      then:
        transform T_adjust_smoothing :
          intent: generate-knowledge
          manner: "adjust smoothing parameters"
          input: smoothing_assessment, evolution_vis, S_smoothing
          output: S_smoothing'
          actor: human
          description: "Adjust smoothing parameters based on assessment: increase smoothing fraction for smoother trends, decrease for more temporal detail, or switch smoothing method"

        artifact S_smoothing' : specification
          representation form: "parameter settings"
          description: "Updated smoothing parameters after analyst adjustment"

        transform T_resmooth :
          intent: characterise
          manner: "temporal smoothing to extract trends"
          input: topic_temporal_profiles, S_smoothing'
          output: topic_trends'
          actor: machine
          description: "Recompute smoothed trends with adjusted parameters"

        artifact topic_trends' : feature(topic_model)
          value structure: matrix
          value type: numeric
          representation form: "smoothed time series per topic"
          description: "Recomputed smoothed topic trends with adjusted smoothing parameters"

        transform T_rearrange_temporal :
          intent: contextualise
          manner: "temporal alignment on shared time axis"
          input: topic_trends', timeline
          output: temporal_arrangement'
          actor: machine
          description: "Realign smoothed topic trends on the shared timeline"

        artifact temporal_arrangement' : arrangement(topic_trends')
          context: timeline
          principle: "calendar year mapping to horizontal position"
          description: "Updated temporal arrangement with resmoothed trends"

        transform T_revisualise_evolution :
          intent: visualise
          manner: "stacked area chart, small-multiple line charts, and heatmap"
          input: temporal_arrangement', topic_trends', topic_keywords
          output: evolution_vis'
          actor: machine
          description: "Update temporal evolution visualisation with resmoothed trends"

        artifact evolution_vis' : visualisation(temporal_arrangement', topic_trends', topic_keywords)
          layout: "coordinated temporal views: stacked area chart, small-multiple line charts, and heatmap"
          form: "stacked coloured areas; individual trend lines; colour-coded cells"
          encoding: "same encoding as evolution_vis with updated smoothed values"
          description: "Updated temporal evolution visualisation reflecting adjusted smoothing"

        assign:
          S_smoothing := S_smoothing'
          topic_trends := topic_trends'
          temporal_arrangement := temporal_arrangement'
          evolution_vis := evolution_vis'
      else:
        exit loop L_smoothing_refinement
end loop L_smoothing_refinement
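To give a feel for what adjusting the smoothing parameters does, the sketch below applies a Gaussian kernel smoother at several bandwidths and reports a simple roughness score. Both the roughness measure (sum of squared second differences) and the bandwidth values are assumptions of this example; the workflow leaves the judgment to the analyst looking at the visualisation, and no numeric criterion is implied.

```python
import math


def gaussian_smooth(series, bandwidth):
    """Kernel-weighted average over year indices (bandwidth in years)."""
    smoothed = []
    for i in range(len(series)):
        weights = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2)
                   for j in range(len(series))]
        smoothed.append(sum(w * x for w, x in zip(weights, series)) / sum(weights))
    return smoothed


def roughness(series):
    """Sum of squared second differences; lower means a smoother curve."""
    return sum((series[i - 1] - 2 * series[i] + series[i + 1]) ** 2
               for i in range(1, len(series) - 1))


# Made-up yearly proportions: a rising trend with year-to-year jitter.
noisy = [0.10, 0.14, 0.09, 0.15, 0.12, 0.18, 0.14, 0.21, 0.17, 0.24]

for bandwidth in (0.5, 1.0, 2.0, 4.0):
    curve = gaussian_smooth(noisy, bandwidth)
    print(f"bandwidth={bandwidth}: roughness={roughness(curve):.5f}")
```

A score like this can only support the human assessment: a very large bandwidth drives roughness towards zero while also flattening the transitions the loop is meant to keep visible.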

Phase 3: Trend Interpretation & Knowledge Generation

# ===========================================
# PHASE 3: TREND INTERPRETATION AND KNOWLEDGE GENERATION
# ===========================================

# -------------------------------------------
# T14: Identify trend patterns
# -------------------------------------------
transform T_identify_trends :
  intent: abstract
  manner: "classify topic trajectories and identify inflection points"
  input: evolution_vis, topic_overview_vis, topic_trends
  output: trend_patterns
  actor: human
  description: "Identify qualitative trend patterns from the temporal visualisation: classify each topic as rising, declining, stable, or peaked; identify major inflection points and dominant research eras"

artifact trend_patterns : pattern(topic_trends, topic_model)
  representation form: "categorised trend types with temporal characteristics"
  description: "Identified trend patterns including rising topics, declining topics, stable topics, peaked topics with approximate peak years, and major inflection points in the overall research landscape"

# -------------------------------------------
# T15: Segment timeline into research eras
# -------------------------------------------
transform T_segment_eras :
  intent: define-unit
  manner: "segment timeline by compositional change points"
  input: topic_trends, trend_patterns
  output: research_eras, era_labels
  actor: hybrid
  description: "Segment the 1990-2024 timeline into distinct research eras based on compositional change-point detection and analyst interpretation; each era is a contiguous period with a relatively stable topic mix"

artifact research_eras : entities
  internal structure: episode
  embedment: time
  features:
    - id: time_span
      value structure: atomic
      value type: temporal
      description: "Start and end years of the era"
    - id: dominant_topics
      value structure: list
      value type: reference
      description: "Topics with highest average proportion during this era"
  description: "Distinct research eras representing contiguous periods with coherent topic composition, separated by years of notable compositional shift"

artifact era_labels : feature(research_eras)
  value structure: atomic
  value type: categorical
  description: "Descriptive labels characterising each research era based on its dominant topics and temporal position"

# -------------------------------------------
# T16: Visualise annotated timeline
# -------------------------------------------
transform T_visualise_annotated :
  intent: visualise
  manner: "annotated stacked area chart with era boundaries"
  input: temporal_arrangement, topic_trends, research_eras, era_labels, topic_keywords
  output: annotated_evolution_vis
  actor: machine
  description: "Overlay era boundaries and labels on the stacked area chart to show how the research landscape is structured into distinct periods"

artifact annotated_evolution_vis : visualisation(temporal_arrangement, topic_trends, research_eras, era_labels)
  layout: "stacked area chart with era boundary overlays"
  form: "stacked coloured areas with vertical boundary lines and era label annotations"
  encoding: "horizontal position from temporal_arrangement; stacked areas from topic_trends; vertical dashed lines at era boundaries; text annotations from era_labels positioned at era midpoints"
  description: "Annotated stacked area chart showing topic evolution with research era boundaries and labels, providing a structured narrative of the field's development"

# -------------------------------------------
# T17: Generate knowledge
# -------------------------------------------
transform T_generate_knowledge :
  intent: generate-knowledge
  manner: "synthesise findings into structured narrative"
  input: trend_patterns, annotated_evolution_vis, research_eras, topic_keywords, topic_assessment
  output: evolution_knowledge
  actor: human
  description: "Synthesise all findings into a structured narrative summary: document major research themes, their temporal trajectories, key transitions, emerging and fading topics, and the overall evolution of the IEEE VIS research community from 1990 to 2024"

artifact evolution_knowledge : knowledge(trend_patterns, research_eras, topic_model)
  representation form: "statements and explanations"
  description: "Comprehensive understanding of research topic evolution in the IEEE VIS community: identified major themes with keyword characterisations, classification of each topic's temporal trajectory, delineation of distinct research eras with their dominant concerns, identification of key transition points, and narrative synthesis of how the field has developed over 35 years"
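The machine part of T15 (segmenting the timeline at compositional change points) could look roughly like the heuristic below. It is only one possible reading: the L1 distance between windowed mean topic mixes, the `window` and `threshold` defaults, and the function names are all assumptions of this sketch, and the workflow expects the analyst to review and relabel whatever boundaries come out. The input is one row of topic proportions per year.

```python
def composition_shift(profiles, year_index, window):
    """L1 distance between mean topic mixes before and after a year index."""
    def mean_mix(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    before = mean_mix(profiles[year_index - window:year_index])
    after = mean_mix(profiles[year_index:year_index + window])
    return sum(abs(a - b) for a, b in zip(before, after))


def segment_eras(years, profiles, window=3, threshold=0.3):
    """Split the timeline at local maxima of the shift score above threshold."""
    scores = {i: composition_shift(profiles, i, window)
              for i in range(window, len(years) - window + 1)}
    boundaries = [i for i, s in scores.items()
                  if s >= threshold
                  and s >= scores.get(i - 1, 0.0)
                  and s > scores.get(i + 1, 0.0)]
    starts = [0] + boundaries
    ends = boundaries + [len(years)]
    return [(years[a], years[b - 1]) for a, b in zip(starts, ends)]


# Toy timeline: topic 0 dominates for five years, then topic 1 takes over.
years = list(range(2000, 2010))
profiles = [[0.8, 0.2]] * 5 + [[0.2, 0.8]] * 5
eras = segment_eras(years, profiles)
# eras == [(2000, 2004), (2005, 2009)]
```

Each returned pair corresponds to the `time_span` feature of a `research_eras` entity; the dominant topics and the era labels would still have to be derived and named by the analyst.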