
The Missing Agent Stack: Identity + Durable Execution with LDP and JamJet

Three agents. One question. The router has to pick. Every multi-agent framework solves this the same way: match skill labels. The agent says “I do research,” the router says “good enough.”

Here’s what happens when you add identity — metadata about who each agent is, not just what it claims to do:

What is the capital of France?
  0.936  quick-lookup      [fast, cost=low]
  0.618  domain-specialist [domain-expert, cost=medium]
  0.442  deep-analyst      [analytical, cost=high]

Analyze the impact of QE on emerging market debt sustainability
  1.000  domain-specialist [domain-expert, cost=medium]
  0.933  deep-analyst      [analytical, cost=high]
  0.606  quick-lookup      [fast, cost=low]

What are the tax implications of Roth IRA conversions?
  0.800  domain-specialist [domain-expert, cost=medium]
  0.668  quick-lookup      [fast, cost=low]
  0.668  deep-analyst      [analytical, cost=high]

Easy question → fast, cheap agent wins. Hard question → analytical agent wins. Domain question → domain expertise breaks the tie. All decided in ~2ms, no LLM call for routing. Without identity cards, the router sees three identical “research” agents and picks randomly.

This is actual output from a runnable example using LDP for identity and JamJet for execution. Let’s walk through how it works.

The Stack

Layer        Question                      Filled by     Elsewhere
Identity     WHO is this agent?            LDP           missing everywhere
Governance   SHOULD it do this?            LDP           missing everywhere
Routing      WHERE to send it?             LDP + JamJet  skill-match only
Execution    WILL it complete reliably?    JamJet        some frameworks
Audit        WHAT happened?                Provenance    rarely tracked

LangGraph gives you execution. MCP gives you tool communication. A2A gives you agent-to-agent messaging. None of them tell you who the agent is, whether it should handle a given task, or what happened after it did. LDP fills identity and governance. JamJet fills execution with durability — crash recovery, checkpointing, cost caps enforced at the Rust layer. Together they cover the full stack.

Identity Cards

In A2A, an agent exposes a name and a skill list. In LDP, an agent carries an identity card — structured metadata about the model behind it. Here’s the deep-analyst agent:

AgentCandidate(
    uri="jamjet://research/deep-analyst",
    agent_card={
        "name": "Deep Analyst",
        "labels": {
            "ldp.delegate_id": "ldp:delegate:deep-analyst",
            "ldp.model_family": "Claude",
            "ldp.reasoning_profile": "analytical",
            "ldp.cost_profile": "high",
            "ldp.latency_profile": "p50:8000ms",
            "ldp.quality_score": "0.92",
        },
    },
    skills=["deep-analysis", "reasoning", "multi-step"],
    trust_domain="research",
)

A reasoning_profile of “analytical” means the model handles multi-step reasoning but is slow. A quality_score of 0.92 means it’s excellent but expensive. These fields are protocol-level — declared once, available to any router, no LLM call to access them. The quick-lookup and domain-specialist agents follow the same pattern with different profiles. Full definitions →
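Because these labels are plain strings on the card, any router can consume them with ordinary parsing rather than an LLM call. A small hypothetical helper (not part of LDP or JamJet) for the latency field, using the p50:8000ms format from the card above:

```python
def parse_p50_ms(latency_profile: str) -> int:
    # "p50:8000ms" -> 8000; raises ValueError on any other shape.
    if not (latency_profile.startswith("p50:") and latency_profile.endswith("ms")):
        raise ValueError(f"unexpected latency profile: {latency_profile!r}")
    return int(latency_profile[len("p50:"):-len("ms")])

labels = {"ldp.latency_profile": "p50:8000ms", "ldp.quality_score": "0.92"}
p50 = parse_p50_ms(labels["ldp.latency_profile"])   # 8000
quality = float(labels["ldp.quality_score"])        # 0.92
```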

Routing Strategy

JamJet’s Coordinator supports pluggable strategies. We subclass DefaultCoordinatorStrategy to score agents using their LDP identity metadata. The core idea: classify task difficulty with heuristics (~2ms), then weight five dimensions differently based on difficulty.

class LdpCoordinatorStrategy(DefaultCoordinatorStrategy):

    async def score(self, task, candidates, weights, context):
        difficulty = classify_difficulty(task)  # keyword heuristics, ~2ms

        for c in candidates:
            labels = c.agent_card.get("labels", {})
            reasoning = labels.get("ldp.reasoning_profile", "")
            quality = float(labels.get("ldp.quality_score", "0.5"))

            # Hard tasks want analytical profiles + high quality
            # Easy tasks want fast profiles + low cost
            reasoning_fit = 1.0 if (
                (difficulty == "hard" and reasoning == "analytical") or
                (difficulty == "easy" and reasoning == "fast")
            ) else 0.5

            # Dynamic weights shift by difficulty
            ldp_weights = {
                "capability_fit": 2.0,
                "cost_fit": 1.5 if difficulty == "easy" else 0.5,
                "historical_performance":
                    2.5 if difficulty == "hard" else 1.0,
            }
            # ... compute composite score

Easy tasks heavily weight cost (1.5) and barely weight quality (1.0). Hard tasks flip it: quality gets 2.5, cost drops to 0.5. Domain matches add a bonus. The result: a fast, cheap agent scores 0.94 on “capital of France” while the expensive analyst scores 0.44 — exactly what you want.
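classify_difficulty itself isn't shown above. A minimal sketch of the kind of keyword heuristic it could use — the cue lists here are assumptions for illustration, not the example's actual rules:

```python
# Hypothetical cue lists -- the real classifier's keywords are not shown in this post.
HARD_CUES = ("analyze", "impact", "trade-off", "compare", "evaluate")
EASY_CUES = ("what is", "who is", "capital of", "define")

def classify_difficulty(task: str) -> str:
    text = task.lower()
    if any(cue in text for cue in HARD_CUES):
        return "hard"
    # Short factual lookups count as easy; everything else is medium.
    if any(cue in text for cue in EASY_CUES) and len(text.split()) <= 8:
        return "easy"
    return "medium"
```

String scans like this run in microseconds, which is what keeps the whole routing decision at ~2ms.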

Full strategy implementation →

                   Easy                   Hard + Domain        Medium + Domain
                   "Capital of France?"   "QE on EM debt"      "Roth IRA tax"

quick-lookup       0.94                   0.61                 0.67
domain-specialist  0.62                   1.00                 0.80
deep-analyst       0.44                   0.93                 0.67

Easy → fast/cheap agent wins. Hard → analytical/domain agent wins. Domain expertise breaks ties. All routing decisions in ~2ms, no LLM call. In the LDP paper, this achieved ~12x lower latency on easy tasks vs A2A skill-matching.

Provenance-Weighted Synthesis

Routing picks the best agent, but sometimes you want all perspectives. The example fans out to all three agents in parallel, then weights their answers using LDP provenance — metadata attached to every result about who produced it:

provenance = {
    "produced_by": "ldp:delegate:deep-analyst",
    "reasoning_profile": "analytical",
    "quality_score": 0.92,
    "confidence": 0.88,   # self-reported for this specific answer
}
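The fan-out itself is ordinary parallel execution over the three agents. A minimal asyncio sketch — execute_agent here is a stand-in for the example's mock agent call, and the result shape is an assumption:

```python
import asyncio

AGENTS = ["quick-lookup", "domain-specialist", "deep-analyst"]

async def execute_agent(name: str, task: str) -> dict:
    # Stand-in for the example's mock agent call: returns an answer
    # tagged with LDP provenance identifying who produced it.
    await asyncio.sleep(0)
    return {
        "agent": name,
        "answer": f"{name}: mock answer for {task!r}",
        "provenance": {"produced_by": f"ldp:delegate:{name}"},
    }

async def fan_out(task: str) -> list:
    # Run all three agents concurrently; gather preserves AGENTS order.
    return await asyncio.gather(*(execute_agent(a, task) for a in AGENTS))

results = asyncio.run(fan_out("Analyze the impact of QE on EM debt"))
```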

Three signals determine how much each answer influences the synthesis: its rank from the routing score, the declared quality_score, and the self-reported confidence:

rank_weight = 1.0 / (1 + rank)          # positional decay
combined = (rank_weight * 0.4
            + quality_score * 0.35
            + confidence * 0.25)

For the QE analysis question, the domain-specialist gets weight 0.86 (rank #1, domain match). The deep-analyst gets 0.74 (rank #2, but quality 0.92 and confidence 0.88). The quick-lookup gets 0.41 (rank #3, confidence 0.45 — even the agent knows it’s out of its depth).
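Plugging the deep-analyst's provenance into this formula — rank #2, so zero-based rank 1, with quality 0.92 and confidence 0.88 — reproduces its 0.74 weight:

```python
def synthesis_weight(rank: int, quality_score: float, confidence: float) -> float:
    # Same blend as above: positional decay, declared quality, self-reported confidence.
    rank_weight = 1.0 / (1 + rank)
    return rank_weight * 0.4 + quality_score * 0.35 + confidence * 0.25

# deep-analyst: rank #2 (zero-based rank 1), quality 0.92, confidence 0.88
w = synthesis_weight(rank=1, quality_score=0.92, confidence=0.88)
# 0.5*0.4 + 0.92*0.35 + 0.88*0.25 = 0.20 + 0.322 + 0.22 = 0.742, i.e. ~0.74
```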

This implements the LDP paper’s RQ3 finding: structured provenance improves synthesis when signals are accurate. When provenance is noisy — inflated confidence, unverified claims — it actually degrades quality below the no-provenance baseline. Honest metadata beats detailed metadata.

Durable Execution

Identity-aware routing is a protocol concern. But once you’ve made a smart routing decision and are making real API calls, you need the execution to be reliable.

JamJet checkpoints every step automatically. If the deep-analyst crashes mid-analysis, the runtime resumes from the last checkpoint — it doesn’t re-route, re-execute the other agents, or lose their provenance. The synthesis step has everything it needs after recovery.
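JamJet's actual checkpoint API isn't shown in this post, but the resume behavior can be illustrated with a generic memoizing step runner — all names here are hypothetical, not JamJet's interface:

```python
class CheckpointStore:
    """In-memory stand-in for a durable checkpoint log (illustrative only)."""

    def __init__(self):
        self.completed = {}  # step name -> checkpointed result

    def run_step(self, name, fn):
        # On resume, steps whose results were already checkpointed are
        # replayed, not re-executed -- completed agents' work and
        # provenance survive a crash.
        if name in self.completed:
            return self.completed[name]
        result = fn()
        self.completed[name] = result  # checkpoint before moving on
        return result


store = CheckpointStore()
calls = []

def run_analyst():
    calls.append("deep-analyst")
    return "analysis result"

first = store.run_step("deep-analyst", run_analyst)    # executes and checkpoints
resumed = store.run_step("deep-analyst", run_analyst)  # replayed; fn not called again
```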

Cost caps are the other critical piece. LDP identity cards declare cost profiles, but declaration is not enforcement. JamJet enforces budget limits at the Rust layer — not a prompt that says “please stay under $2,” but an actual runtime constraint that terminates execution when the budget is hit.

Agent(
    "deep-analyst",
    model="claude-sonnet-4-6",
    max_cost_usd=2.0,     # hard stop, enforced at Rust layer
    max_iterations=15,
)

The identity card says what the agent should cost. The runtime ensures it actually costs that.

Try It

git clone https://github.com/jamjet-labs/jamjet
cd jamjet
python examples/ldp-identity-routing/main.py

Runs with mock responses, no API key needed. The output shows routing scores, each agent’s response with provenance, and the weighted synthesis.

Using real LLMs instead of mocks →

Set ANTHROPIC_API_KEY, uncomment the REAL_AGENTS block in main.py, and replace the mock execute_agent() with agent.run() calls. Same routing logic, same provenance, real model responses. See the README for details.

Where This Fits

LDP and JamJet aren’t competitors to A2A or MCP — they’re complementary layers. A2A handles agent-to-agent messaging. MCP handles tool access. LDP tells you who the agent is and whether it should handle a task. JamJet makes sure it executes reliably and within budget.

The example is deliberately simple — three agents, three questions. But the architecture scales: more agents makes identity-aware routing more valuable (the cost of misrouting increases), real API calls make durable execution essential (crashes happen), and cross-organizational agents make trust domains the security boundary that prevents untrusted models from handling sensitive tasks.

The protocol layer is where these properties should live. Not in application code. Not in prompts. In the infrastructure.

Related


LDP paper: arXiv:2603.08852. JamJet: jamjet-labs/jamjet. LDP reference implementation: ldp-protocol.

Want the visual version? See the interactive LDP paper with animated diagrams and experiment charts.