Show Me the Infographic I Imagine: Intent-Aware Infographic Retrieval for Authoring Support
A new AI retrieval system helps non-designers find infographic templates by understanding fuzzy creative intent — not just keywords.

The Thesis
Most people who need to make an infographic aren't designers. They know roughly what they want — 'something clean with a timeline feel' — but translating that into a search query is hard, and keyword search usually fails them. This paper builds a retrieval system that interprets vague creative language, maps it onto a structured taxonomy of content and visual-design intent, then finds matching exemplars from a large infographic library. The catch is that infographics are genuinely weird objects for AI: they mix charts, icons, text, and layout in ways that models trained on ordinary photographs handle poorly. The system also includes a lightweight agent that helps users adapt a retrieved template to their own data, making this a modest end-to-end authoring assist rather than just a search box.
Catalyst
Large vision-language models (systems trained to link images and text, like CLIP) have become good enough to serve as a retrieval backbone, but they were trained on natural photos, not data-dense visual documents. Recent work on multimodal document understanding — combined with larger infographic corpora becoming available — made it practical to fine-tune and augment these models for the infographic domain specifically. The rise of no-code design tools (Canva, Adobe Express) also created a clear commercial pressure to solve exactly this problem.
What's New
Prior retrieval systems for design assets relied on keyword tags or general-purpose vision-language embeddings (vector representations that encode meaning) such as CLIP, which were trained on captioned photographs. Those models struggle with infographics because the meaning of an infographic is distributed across layout, color, iconography, and embedded text — not concentrated in a single natural scene. This paper instead runs a formative user study to derive a structured intent taxonomy, then uses that taxonomy to automatically enrich and disambiguate free-form queries before retrieval, improving match quality over vanilla CLIP-style baselines according to both automated metrics and a user study.
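The pipeline's shape — expand a vague query with taxonomy cues, then rank exemplars by embedding similarity — can be sketched as below. This is a minimal illustration, not the authors' implementation: the facet rules are invented stand-ins for the paper's formative-study taxonomy, and a toy bag-of-words embedding stands in for a CLIP-style encoder.

```python
import math
import re

# Hypothetical intent taxonomy: facet -> {trigger word: enrichment cue}.
# The paper derives its taxonomy from a formative study; these rules are
# invented stand-ins that only show the pipeline's shape.
TAXONOMY = {
    "layout":  {"timeline": "sequential horizontal timeline layout",
                "list": "vertical list layout"},
    "style":   {"clean": "minimal white-space-heavy style",
                "playful": "bright illustrated style"},
    "content": {"steps": "process with numbered stages"},
}

def enrich_query(query: str) -> str:
    """Expand a vague query with intent-specific cues from the taxonomy."""
    cues = [cue
            for facets in TAXONOMY.values()
            for trigger, cue in facets.items()
            if trigger in query.lower()]
    return query + " | " + "; ".join(cues) if cues else query

def embed(text: str) -> dict:
    """Toy bag-of-words vector standing in for a CLIP-style encoder."""
    vec = {}
    for tok in re.findall(r"[a-z]+", text.lower()):
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: dict, k: int = 2) -> list:
    """Rank exemplar descriptions by similarity to the enriched query."""
    q = embed(enrich_query(query))
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(corpus[d])),
                    reverse=True)
    return ranked[:k]

corpus = {
    "t1": "horizontal timeline infographic, minimal white-space-heavy style",
    "t2": "pie chart dashboard, bright illustrated style",
    "t3": "vertical list layout with icons",
}
print(retrieve("something clean with a timeline feel", corpus))
# → ['t1', 't3']
```

The enrichment step is what keeps 'clean with a timeline feel' from degenerating into a keyword match on 'clean': the taxonomy translates it into layout and style cues that the embedding space can actually discriminate on.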
The Counter
The user study reported in the paper is small — typical for HCI (human-computer interaction) papers — and it's hard to know whether the retrieval gains hold at the scale of a real product with millions of diverse queries. The intent taxonomy was derived from a formative study of how people describe infographics, but that population may not represent the full range of users or languages a commercial product would face. The underlying retrieval models still rely on CLIP-family embeddings that the authors themselves acknowledge are poorly suited to infographics; the query-enrichment step is a workaround, not a fix. Canva already has substantial internal investment in template recommendation, and it has far more behavioral data than any academic corpus. Finally, the 'interactive agent' for adapting designs to new data is described but not rigorously benchmarked against real-world editing tasks, so the end-to-end authoring claim rests on thin evidence.
Longs
- ADBE (Adobe) — owns Adobe Express and a large stock-asset library directly threatened by or positioned to absorb this capability
- DAY (Dayforce, formerly Ceridian under ticker CDAY; design-adjacent SaaS, minor) — illustrative of HR/comms teams as end users
- GOOGL — Google Slides and Google's Workspace suite are natural integration surfaces for template retrieval
- SSTK (Shutterstock) — stock asset and template libraries are the corpus this system needs; could license or build this
Shorts
- Keyword-tag-based stock asset search (Getty Images, iStock) — their moat is metadata tagging, which this approach sidesteps entirely
- Generic CLIP-based multimodal search startups that haven't adapted to document-heavy visual formats
Enablers (Picks & Shovels)
- CLIP and its successors (OpenAI, open-source variants) — the vision-language embedding backbone the system builds on
- Large infographic corpora such as the Pew Research infographic dataset and academic collections used for training and evaluation
- LLM-based query expansion APIs (OpenAI, Anthropic) — used to rewrite and enrich vague user queries before retrieval
Private Watchlist
- Canva (private) — largest no-code design platform; template discovery is a core UX problem they are actively working on
- Gamma (private) — AI-native presentation tool that would benefit from intent-aware template retrieval
- Beautiful.ai (private) — slide and infographic authoring startup with template recommendation features
Resources
The Paper
While infographics have become a powerful medium for communicating data-driven stories, authoring them from scratch remains challenging, especially for novice users. Retrieving relevant exemplars from a large corpus can provide design inspiration and promote reuse, substantially lowering the barrier to infographic authoring. However, effective retrieval is difficult because users often express design intent in ambiguous natural language, while infographics embody rich and multi-faceted visual designs. As a result, keyword-based search often fails to capture design intent, and general-purpose vision-language retrieval models trained on natural images are ill-suited to the text-heavy, multi-component nature of infographics. To address these challenges, we develop an intent-aware infographic retrieval framework that better aligns user queries with infographic designs. We first conduct a formative study of how people describe infographics and derive an intent taxonomy spanning content and visual design facets. This taxonomy is then leveraged to enrich and refine free-form user queries, guiding the retrieval process with intent-specific cues. Building on the retrieved exemplars, users can adapt the designs to their own data with high-level edit intents, supported by an interactive agent that performs low-level adaptation. Both quantitative evaluations and user studies are conducted to demonstrate that our method improves retrieval quality over baseline methods while better supporting intent satisfaction and efficient infographic authoring.
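The abstract's split between high-level edit intents and low-level adaptation can be pictured as an intent-to-operations planner. The sketch below is purely illustrative: the op names, fields, and dispatch rules are invented, and the paper's interactive agent performs this expansion itself rather than via hard-coded rules.

```python
from dataclasses import dataclass

@dataclass
class EditOp:
    """A low-level adaptation step applied to a retrieved template."""
    target: str   # element id in the template
    action: str   # e.g. "rebind_data", "recolor" (hypothetical op names)
    value: object

def plan_edits(intent: str, template: dict, user_data: dict) -> list:
    """Expand one high-level edit intent into concrete low-level ops."""
    ops = []
    if intent == "swap in my data":
        # Rebind each data-bound element to the user's series, in order.
        bound = [e for e in template["elements"] if e["kind"] == "data"]
        for elem, (label, value) in zip(bound, user_data.items()):
            ops.append(EditOp(elem["id"], "rebind_data", (label, value)))
    elif intent == "match my brand color":
        for elem in template["elements"]:
            if elem["kind"] == "shape":
                ops.append(EditOp(elem["id"], "recolor", user_data["brand"]))
    return ops

template = {"elements": [
    {"id": "bar1", "kind": "data"},
    {"id": "bar2", "kind": "data"},
    {"id": "bg",   "kind": "shape"},
]}
ops = plan_edits("swap in my data", template, {"Q1": 12, "Q2": 30})
print([(o.target, o.action) for o in ops])
# → [('bar1', 'rebind_data'), ('bar2', 'rebind_data')]
```

The point of the two-level design is that users only ever state the intent ('swap in my data'); the agent owns the tedious element-by-element rebinding, which is where novices typically get stuck.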