
Agentic Workgraph: Python Workflows with Their Guts Showing


## n8n for a World After Drag-and-Drop

Most workflow tools assume a human is still standing in front of the machine, dragging boxes around, clicking connectors, and gently convincing a JSON document to behave.

That is not the world we live in.

We live in a world where agents write code, run code, inspect code, retry code, and route work through systems that are too dynamic to fit cleanly inside a polite little YAML schema. We still want graphs. We still want history. We still want to see what happened. But we do not want to give up the things code is good at: abstraction, reuse, composition, loops, branching, and ordinary Python logic.

That is why we built agentic-workgraph.

The short version:

n8n for a world where humans no longer interact with code directly.

The slightly less short version:

It feels like ComfyUI, n8n, PyTorch Dynamo, and IDA Pro had a baby. It lets us write workflows as ordinary Python functions, trace those functions into a graph, execute them with automatic item-level concurrency, and then inspect the resulting runs in a debugger-like UI that shows the machinery rather than hiding it under a marketing layer.

## Not JSON. Not YAML. Real Python.

The core idea is simple:

  • @node marks a function as a workflow step
  • @workflow marks a function that wires those nodes together
  • tracing turns those calls into a graph representation
  • execution turns that graph into a real run with events, artifacts, errors, and history

So instead of writing this:

steps:
  - fetch
  - summarize
  - publish

you write this:

@workflow(name="research-pipeline")
def research_pipeline():
    urls = fetch_urls(query="agentic frameworks")
    pages = scrape_pages(urls=urls)
    summaries = summarize(pages=pages)
    return publish_report(summaries=summaries)
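
For context, the nodes underneath are ordinary decorated functions. A minimal sketch of one of them — the body and the search_web helper are hypothetical, but the @node shape matches the real examples later in this post:

@node(id="fetch_urls")
async def fetch_urls(ctx, query: str) -> list[str]:
    # Hypothetical body: call whatever search backend you like.
    return await search_web(query)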

That may look ordinary, and that is exactly the point. You can read it. An agent can edit it. You can factor pieces into helper functions. You can import models, constants, validators, and other workflows. You can put the whole thing in Git and treat it like software instead of like configuration pretending not to be software.

The graph is generated from the code by tracing execution, not by forcing you to maintain a second representation by hand.

There is a subtle difference there, and it matters.

Most workflow tools start from the graph and ask you to encode your logic inside it.

agentic-workgraph starts from the code and lets the graph emerge from the traced calls. The code is the primary artifact. The graph is the rendered structure of that code in motion.

That sounds philosophical until you try to maintain a real system for six months. Then it becomes practical very quickly.

## Why We Use It

At Xuthal Labs, our pipeline is not one straight line. It is a nest of preparations, training runs, evaluations, render batches, asset selection, release bundles, campaign drafts, and posting flows. A graph system helps, but only if it can survive contact with reality.

Reality looks like this:

  • a dataset can be built from ComfyUI generations, OpenAI generations, or source images
  • one workflow may need to launch another workflow and keep that child run inspectable
  • some steps fan out over dozens of items
  • some steps converge back into a single artifact
  • some workflows need artifact folders as the stable handoff contract
  • every run needs to be tied back to the actual code version that produced it

That last point matters more than most people think. History in agentic-workgraph is versioned, so runs are associated with the workflow version they were created from. If the workflow changes, the new runs are not ambiguously mixed with the old ones. We can tell which code shaped which result.

That makes debugging less like divination and more like engineering.

## Automatic Fan-Out and Parallel Execution

One of the nicest parts of the system is the list propagation model.

At the workflow level, values are carried as lists of items. At the node level, you write functions as though they process one thing:

@node(id="summarize_page", concurrency=8)
async def summarize_page(ctx, page: Page):
    return summarize_text(page.body)

And then the workflow passes a list:

pages = scrape_pages(urls=urls)
summaries = summarize_page(page=pages)

If pages contains 47 items, the executor sees 47 inputs and maps the node across them automatically. No explicit parallel() wrapper. No separate map primitive. No second language for concurrency. The fan-out is implicit in the list length, and the node concurrency setting controls how aggressively that work runs.
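
To make those semantics concrete, here is the shape of the mapping model in plain asyncio. This is a sketch of the behavior, not the library's actual executor:

import asyncio

async def map_node(fn, items, concurrency=8):
    # Mirror of the node-level concurrency setting: at most
    # `concurrency` items are in flight at once.
    sem = asyncio.Semaphore(concurrency)

    async def run_one(item):
        async with sem:
            return await fn(item)

    # Implicit fan-out: one call per item, results in input order.
    return await asyncio.gather(*(run_one(item) for item in items))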

This is one of the few places where the machinery feels genuinely elegant. The code stays small. The behavior stays powerful.

### A Concrete Example: Research Fan-Out

This is the kind of thing that would be mildly annoying in many visual workflow tools and utterly ordinary here:

@workflow(name="research-pipeline")
def research_pipeline():
    urls = fetch_urls(query="agentic frameworks")
    pages = scrape_pages(urls=urls)
    summaries = summarize_page(page=pages)
    return synthesize_report(summary=summaries)

The behavior looks like this:

  • fetch_urls returns one list of URLs
  • scrape_pages expands that into page objects
  • summarize_page fans out over every page
  • synthesize_report pulls that set back into one report

You did not write a special “map” node. You did not manually wire a fan-out operator. You just passed a list into a node whose single-item signature made sense.
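
For the fan-in end, one plausible shape — this assumes a list-typed parameter is how a node opts into receiving the whole set at once, which is an inference from the example rather than documented behavior:

@node(id="synthesize_report")
def synthesize_report(ctx, summary: list[str]) -> str:
    # Receives every summary from the fan-out and collapses
    # them into a single report.
    return "\n\n".join(summary)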

That turns out to be a very pleasant way to think.

## Supergraphs and Subgraphs

We recently added one of the features we wanted most: subgraphs.

That means one workflow can run another workflow as a node:

report = run_subgraph(
    workflow=subgraph_child,
    id="run_child_subgraph",
    kwargs={"claims": claims},
)

This is not fake nesting. The child workflow becomes a real run.
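
The child side is nothing exotic. It is an ordinary workflow. A sketch, with hypothetical node names:

@workflow(name="subgraph-child")
def subgraph_child(claims):
    # Fans out over the claims passed in through kwargs,
    # then converges into one summary.
    verdicts = check_claim(claim=claims)
    return write_summary(verdicts=verdicts)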

That means:

  • the child run shows up in history
  • it has its own timeline
  • it has its own debugger view
  • it has its own artifacts and errors
  • the parent graph still shows the subgraph call as one node in the larger stage

This is how we are building stage wrappers now. dataset-prep can orchestrate concept intake, seed set creation, and dataset assembly as child workflows. training-testing can orchestrate proofing, evaluation, multibase training setup, and validation the same way. The parent workflow acts as a conductor. The child workflows remain the real debugging surface.

That is the correct split. The stage should summarize. The child run should confess.

### A Concrete Example: Supergraph over Real Child Runs

This is the pattern we use in the Thalis pipeline:

@workflow(name="thalis-dataset-prep")
def thalis_dataset_prep():
    concept_packet = run_subgraph(
        workflow=thalis_concept_intake_to_packet,
        id="run_concept_intake",
        kwargs={"prompt_text": "A world made of glass cathedrals"},
    )
    seed_set = run_subgraph(
        workflow=thalis_concept_to_seed_set,
        id="run_concept_to_seed_set",
        kwargs={"selector": concept_packet},
    )
    dataset = run_subgraph(
        workflow=thalis_seed_set_to_dataset,
        id="run_seed_set_to_dataset",
        kwargs={"selector": seed_set},
    )
    return write_stage_artifact(dataset=dataset)

What matters is not just that the parent can call the children.

What matters is that:

  • each child becomes a real run
  • each child can fail independently and visibly
  • each child has its own artifacts
  • the parent keeps a high-level stage view without swallowing the internals

That makes supergraphs actually usable instead of merely decorative.

## The Debugger Part

A lot of workflow tools want to look simple. They hide the internals. They show a pretty pipeline diagram and hope you do not ask where the state actually lives.

We wanted the opposite.

The UI in agentic-workgraph is read-only except for launching runs. It is not trying to be a visual editor. It is trying to be an instrument panel.

You can inspect:

  • node state
  • item-level execution
  • streamed events
  • errors
  • artifacts
  • traces
  • subgraph runs
  • workflow history across versions

That is where the IDA Pro / debugger analogy comes in. We do not want a graph that merely flatters the code. We want a graph that shows the inner workings of the workflow while it is alive, and a run history that lets us autopsy it cleanly when it fails.

### History That Remembers the Code Version

One of the features we wanted early was versioned history.

If a workflow changes, the graph hash changes. Runs are associated with the version that produced them. That means:

  • you can see which historical run belongs to which workflow definition
  • you do not silently mix pre-refactor and post-refactor runs together
  • a debugger trace remains tied to the code that actually produced it

That sounds obvious, but a lot of workflow systems act as though all runs of “the same workflow name” are naturally comparable forever. They are not. Software changes. Behavior changes. The history should admit that.
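
To illustrate the idea (this is a toy, not the library's actual hashing scheme): derive a stable hash from a canonical serialization of the traced graph, so any structural change produces a new version.

import hashlib
import json

def graph_hash(nodes: list[str], edges: list[tuple[str, str]]) -> str:
    # Canonicalize so that incidental ordering does not change the hash.
    canonical = json.dumps({"nodes": sorted(nodes), "edges": sorted(edges)})
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]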

## Tracing, Loops, and the Honest Answer

Because the graph is built by tracing Python rather than by constraining the language into a rigid DAG builder, it can handle things many workflow tools avoid:

  • ordinary control flow
  • loops
  • branching
  • subgraphs
  • dynamic item fan-out
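
For instance, a loop is just a loop. A sketch with hypothetical nodes, where each pass through the loop adds another traced call:

@workflow(name="iterative-refine")
def iterative_refine():
    draft = write_draft(topic="glass cathedrals")
    # Ordinary Python control flow: three revision rounds.
    for round_number in range(3):
        draft = revise_draft(draft=draft, round_number=round_number)
    return finalize(draft=draft)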

There are tradeoffs.

If a workflow is purely acyclic, ordering is easy. If you introduce loops or random dispatch, there is no single perfect canonical ordering for all possible executions. What you get is an observed execution shape, not a god’s-eye theorem about every path the program might take.

That is acceptable for our use. These are operational workflows, not proofs. We care about what happened, what is likely to happen, and where the machinery bent. Tracing gives us that. For the pathological cases, you can always build more advanced representations later: sampled traces, collapsed loop regions, strongly connected components, and other tricks from compiler land.

But the baseline model is already useful:

  • write normal Python
  • trace it into a graph
  • run it
  • inspect the result

Useful first. Perfect later.

### A Concrete Example: The Weird Case

We even tested the strangest edge case we could think of: a workflow that keeps a list of node functions, shuffles them, and calls them in that shuffled order.

That is not a normal production workflow. It is a stress test for the model.
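
The shape of that test, with hypothetical nodes:

import random

@workflow(name="shuffled-stress-test")
def shuffled_stress_test():
    steps = [stamp_alpha, stamp_beta, stamp_gamma]  # all @node functions
    random.shuffle(steps)
    value = seed_value()
    for step in steps:
        value = step(value=value)
    return value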

The result was instructive:

  • execution works
  • tracing captures the encountered order for that run
  • the resulting graph is useful as an observed execution graph
  • it is not a canonical statement about every possible path the program might have taken

That is the honest answer. And an honest answer is more valuable than a fake promise of universal static order.

## How We Actually Use It

Right now, we use agentic-workgraph to drive the backbone of the Thalis workflow system:

  • concept intake and normalization
  • seed-set generation
  • dataset assembly
  • proofing job creation
  • proofing evaluation
  • multibase training setup
  • model validation
  • prompt library generation
  • render batching
  • asset bucketing
  • Reel preparation

The next layer is already taking shape:

  • release bundle preparation
  • campaign dispatch bundle preparation
  • stage-level supergraphs that wrap existing workflows without erasing them

In practice, one of our stage boundaries now looks like this:

  • concept-intake-to-packet
  • concept-to-seed-set
  • seed-set-to-dataset
  • wrapped by dataset-prep

And another looks like this:

  • dataset-to-proofing-job
  • proofing-job-to-eval
  • proofed-concept-to-multibase-batch
  • multibase-run-to-model-validation
  • wrapped by training-testing

That is exactly the kind of layering the system is meant to support: small concrete workflows below, larger stage orchestration above, and real artifacts at every handoff.

The important design choice is that our workflow artifacts are usually artifact folders on disk. One graph writes a folder with a manifest and related files; the next graph accepts a selector pointing at that folder. That keeps the contracts concrete. Agents can inspect them. Humans can inspect them. Git and the filesystem remain first-class tools rather than embarrassing implementation details.
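
A sketch of that handoff shape — the manifest fields here are illustrative, not a documented schema:

import json
from pathlib import Path

def write_artifact_folder(folder: Path, files: dict[str, bytes]) -> Path:
    # One graph ends by writing a folder: the files plus a manifest.
    folder.mkdir(parents=True, exist_ok=True)
    for name, data in files.items():
        (folder / name).write_bytes(data)
    manifest = {"kind": "dataset", "files": sorted(files)}
    (folder / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return folder

def resolve_selector(folder: Path) -> dict:
    # The next graph starts by reading the manifest the selector points at.
    return json.loads((folder / "manifest.json").read_text())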

## Why This Matters

There is a category of software that still thinks “no-code” means “we have replaced code.”

What it usually means is:

  • the code still exists
  • you just are not allowed to touch it directly
  • and eventually someone has to tunnel underneath the UI with scripts anyway

We are not interested in pretending code went away.

We are interested in building systems for a world where:

  • humans increasingly supervise rather than hand-author every step
  • agents increasingly write and modify workflow logic
  • observability matters more than toy visual editing
  • versioned execution history matters more than polished brochureware

And, just as importantly:

  • agents need to be able to read and rewrite the workflow source directly
  • the visual graph should help explain the code, not replace it
  • artifacts should live in normal files and folders that other tools can inspect
  • the whole system should still feel like software, not like a trapped wizard inside a GUI

In that world, Python is still the right medium. The graph is not a replacement for the code. It is a rendered, inspectable shadow of the code as it moves.

That is what agentic-workgraph gives us.

## Try It

The project is open source on GitHub:

thalismind/agentic-workgraph

If you want a workflow system that behaves more like a debugger, a profiler, and a runtime than like a form builder, it may be useful to you too.

The lotus-eaters will keep wiring YAML together and calling it orchestration.

We prefer code that leaves tracks.