Why Multi-Agent AI Systems Need Identity-Aware Routing
Multi-agent systems are the next frontier of production AI. The idea is straightforward: instead of one monolithic model doing everything, you have specialized agents that delegate tasks to each other. A coding agent hands off a math problem to a reasoning agent. A triage agent routes customer queries to the right specialist. The promise is efficiency, modularity, and better results.
The problem is that the protocols these agents use to communicate — Google's A2A, Anthropic's MCP — treat every model as a black box. They expose skill names and descriptions. They don't expose what actually matters for delegation: what kind of model is behind the agent, how fast it is, how much it costs, whether it reasons well or just pattern-matches quickly.
This is the gap that motivated LDP (LLM Delegate Protocol) — an identity-aware protocol designed for delegation between LLM-based agents. The core thesis: if you're routing tasks between models, you need to know more about those models than their skill labels.
The Routing Problem
Consider a system with three delegates: an 8B parameter reasoning model, a 7B coding specialist, and a lightweight 3B classifier. A user submits a straightforward sentiment classification task. Under A2A, the router sees three agents advertising overlapping skills. It picks one based on skill-name matching — and might route a trivial classification to the 8B reasoning model. The task completes in 35 seconds instead of 3.
This isn't hypothetical. In our experiments, identity-aware routing achieved roughly 12x lower latency on easy tasks by sending them to the lightweight model — because the protocol knew it was lightweight, fast, and good enough for classification. A2A's skill-matching couldn't make that distinction.
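To make the routing decision concrete, here is a minimal sketch of identity-aware selection under stated assumptions: the delegate records, field names (`quality`, `latency_hint_ms`), and threshold are illustrative, not LDP's actual wire format.

```python
# Illustrative sketch of identity-aware routing. Delegate records and
# field names are hypothetical, not the LDP schema.

DELEGATES = [
    {"name": "reasoner-8b",   "quality": {"classification": 0.95}, "latency_hint_ms": 35000},
    {"name": "coder-7b",      "quality": {"classification": 0.70}, "latency_hint_ms": 20000},
    {"name": "classifier-3b", "quality": {"classification": 0.85}, "latency_hint_ms": 3000},
]

def route(task_capability, min_quality=0.8):
    """Pick the fastest delegate whose quality hint clears the bar."""
    good_enough = [d for d in DELEGATES
                   if d["quality"].get(task_capability, 0.0) >= min_quality]
    return min(good_enough, key=lambda d: d["latency_hint_ms"])

chosen = route("classification")  # the lightweight 3B model wins
```

A skill-matching router sees three delegates that all advertise "classification"; the quality and latency hints are what break the tie in the lightweight model's favor.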
The latency difference sounds like an optimization problem, but at scale it becomes an economics problem. Every unnecessary second on a heavy model is wasted compute, wasted tokens, wasted cost. Multiply that across thousands of delegations per hour and the protocol's inability to distinguish models becomes a material expense.
What LDP Adds
LDP introduces five mechanisms that existing protocols lack. Each addresses a specific limitation we observed in production-style multi-agent setups.
1. Delegate Identity Cards
A2A's Agent Card has 7 fields: name, description, version, URL, skills, authentication, and capabilities. LDP's Delegate Identity Card has 20+ fields organized into core identity, trust and security, capabilities, and behavioral profiles.
The critical additions are quality hints (a continuous 0–1 score per capability), reasoning profiles (qualitative: "deep-analytical" vs. "fast-practical"), cost profiles, and latency hints. These are the fields that enable a router to make intelligent delegation decisions rather than guessing from skill names.
A2A's Agent Card vs. LDP's Delegate Identity Card
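A sketch of what a Delegate Identity Card might carry, grouped as the text describes. The field names are illustrative; only the four groupings and the 0–1 quality-hint convention come from the protocol description.

```python
# Sketch of a Delegate Identity Card: core identity, trust/security,
# capabilities, behavioral profiles. Field names are illustrative,
# not the normative LDP schema.

identity_card = {
    "core": {
        "name": "reasoner-8b",
        "parameters": "8B",
        "version": "1.2.0",
    },
    "trust": {
        "trust_domain": "internal.research",
        "signing_key_id": "key-2024-07",   # hypothetical field
    },
    "capabilities": {
        # continuous 0-1 quality hint per capability
        "reasoning": {"quality_hint": 0.92},
        "classification": {"quality_hint": 0.95},
    },
    "behavior": {
        "reasoning_profile": "deep-analytical",
        "cost_profile": "high",
        "latency_hint_ms": 35000,
    },
}
```

The behavioral block is what an A2A Agent Card has no place for, and it is exactly the block a router needs.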
2. Progressive Payload Modes
Not all communication between agents needs to be verbose natural language. LDP defines six payload modes of increasing efficiency, from plain text (Mode 0) to semantic frames (Mode 1, structured JSON with typed fields) to more compact representations.
In practice, semantic frames reduced token consumption by 37% compared to plain text — a statistically significant result (p=0.031) — without any quality degradation. The structured format helps models focus. A2A's JSON wrapping, by contrast, saves only about 7% because it wraps verbose text in a JSON envelope rather than restructuring the communication itself.
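To illustrate the difference, here is a plain-text request next to the same request as a semantic frame. The frame schema shown is an assumption for illustration, not LDP's actual Mode 1 format.

```python
# Mode 0: plain text -- the instruction is spelled out in prose.
mode0 = ("Please classify the sentiment of the following review as positive, "
         "negative, or neutral, and reply with just the label: "
         "'The battery life is outstanding.'")

# Mode 1: semantic frame -- structured JSON with typed fields.
# Schema is illustrative; LDP's actual frame format may differ.
mode1 = {
    "frame": "classify",
    "input": "The battery life is outstanding.",
    "labels": ["positive", "negative", "neutral"],
    "output": {"type": "label"},
}
```

The frame drops the prose scaffolding ("Please…", "reply with just the label") and leaves only typed fields, which is where the token savings come from.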
Delegates negotiate the richest mutually supported mode during session establishment. If a higher mode fails mid-exchange — schema validation error, codec incompatibility — the protocol falls back automatically, from Mode N to Mode N−1 and ultimately to Mode 0. Every delegate must support plain text, so communication never fails entirely. In our simulated failure tests, LDP achieved 100% task completion across all failure types, compared to 35% for A2A.
Six progressive payload modes with automatic fallback. Mode 1 (semantic frames) was empirically validated.
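The fallback chain can be sketched as a loop that steps down one mode at a time. The codec stand-in below is hypothetical; only the fallback order and the Mode 0 guarantee come from the protocol description.

```python
# Sketch of progressive fallback: try the richest negotiated mode, step
# down one mode per failure, and never fail entirely because Mode 0
# (plain text) is mandatory. The encoder is a stand-in.

class ModeError(Exception):
    """Schema validation error or codec incompatibility at a given mode."""

def encode(payload, mode):
    # Stand-in for real per-mode codecs; here only Modes 0 and 1 "work".
    if mode > 1:
        raise ModeError(f"mode {mode} codec unavailable")
    return {"mode": mode, "body": payload}

def send_with_fallback(payload, negotiated_mode):
    for mode in range(negotiated_mode, -1, -1):
        try:
            return encode(payload, mode)
        except ModeError:
            continue  # fall back to the next lower mode
    raise RuntimeError("unreachable: Mode 0 must always succeed")

msg = send_with_fallback("classify this", negotiated_mode=5)
```

Because the loop bottoms out at Mode 0, a delegation can degrade in efficiency but never in availability.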
3. Governed Sessions
A2A is stateless by design. Each request is independent. This means that in a 10-round conversation between agents, the entire context must be re-transmitted with every message. By round 10, 39% of the tokens are pure overhead — context the receiving agent has already seen.
LDP introduces persistent sessions with server-side context. A five-step handshake (HELLO, CAPABILITY_MANIFEST, SESSION_PROPOSE, SESSION_ACCEPT, then task exchange) establishes the session once. After that, context is maintained server-side, eliminating the re-transmission tax.
LDP governed session: setup once, then exchange tasks without re-transmitting context
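The five-step establishment can be sketched as a message sequence. Only the step names come from the protocol description; the payloads and session-ID convention are illustrative.

```python
# Sketch of LDP's five-step session establishment. Message payloads are
# illustrative; only the step names come from the protocol description.

def handshake(client_modes, server_modes):
    """Simulate session setup and return the transcript and chosen mode."""
    transcript = [("HELLO", {"protocol": "ldp"})]
    transcript.append(("CAPABILITY_MANIFEST", {"payload_modes": server_modes}))
    # Propose the richest mutually supported payload mode.
    mode = max(set(client_modes) & set(server_modes))
    transcript.append(("SESSION_PROPOSE", {"payload_mode": mode}))
    transcript.append(("SESSION_ACCEPT", {"session_id": "sess-001"}))
    # From here on, tasks flow without re-transmitting context.
    transcript.append(("TASK", {"payload": "first task", "mode": mode}))
    return transcript, mode

transcript, mode = handshake(client_modes=[0, 1, 2], server_modes=[0, 1])
```

Note that mode negotiation rides on the handshake for free: the capability manifest is exchanged once, so the session's payload mode is settled before the first task is sent.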
The savings are modest at 3 rounds (about 10%) but grow linearly. At 10 rounds, LDP used 12,990 tokens versus A2A's 16,010. For long-running agent collaborations — research tasks, multi-step planning, iterative code review — this overhead compounds fast.
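The shape of the re-transmission tax can be seen in a toy cost model. The token counts below are made up for illustration and do not reproduce the paper's measurements; only the comparison's structure matters.

```python
# Illustrative model of the stateless re-transmission tax.
# Per-round and handshake token counts are invented, not measured.

def stateless_tokens(rounds, per_round=400):
    # Each round re-sends all prior context plus the new message.
    return sum(r * per_round for r in range(1, rounds + 1))

def session_tokens(rounds, per_round=400, handshake=800):
    # Pay the handshake once; then send only the new content each round.
    return handshake + rounds * per_round

three = (stateless_tokens(3), session_tokens(3))    # close at few rounds
ten = (stateless_tokens(10), session_tokens(10))    # gap widens with rounds
```

At few rounds the handshake cost nearly cancels the savings; as rounds accumulate, the stateless side keeps paying for context the session side sent once.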
4. Structured Provenance
When a downstream agent receives results from three upstream delegates and must synthesize a final answer, it helps to know which delegate produced what, with what confidence, and whether that confidence was verified.
This led to one of the more surprising findings in our research: the provenance paradox. Accurate provenance didn't significantly improve synthesis quality over no provenance at all (p=0.47). But noisy provenance — unverified self-reported confidence — actively harmed quality, doubling the variance of output scores. When one delegate's confidence was artificially inflated to 0.99 and marked as verified, the synthesizer over-weighted its output and produced worse decisions.
The design implication is clear: a protocol that exposes confidence without verification may be worse than one that exposes no confidence at all. This is why LDP's provenance structure includes explicit verification.performed and verification.status fields. A2A provides no provenance beyond task completion status.
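A provenance record with explicit verification might look like the sketch below. Only `verification.performed` and `verification.status` are named by the protocol description; the surrounding structure and the `method` field are illustrative.

```python
# Sketch of a provenance record with explicit verification fields.
# Structure is illustrative; only verification.performed and
# verification.status are named in the protocol description.

provenance = {
    "delegate": "classifier-3b",
    "task_id": "task-042",
    "confidence": 0.88,
    "verification": {
        "performed": True,
        "status": "passed",            # e.g. passed | failed | skipped
        "method": "held-out-check",    # hypothetical field
    },
}

def usable_confidence(record):
    """Only trust confidence that was actually verified."""
    v = record["verification"]
    if v["performed"] and v["status"] == "passed":
        return record["confidence"]
    return None  # unverified confidence is treated as absent
```

Discarding unverified confidence, as `usable_confidence` does, is the synthesizer-side defense the provenance-paradox result argues for: better to treat a self-reported 0.99 as absent than to over-weight it.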
5. Trust Domains
A2A relies on transport-level authentication — bearer tokens. This covers "is this request authenticated?" but cannot answer "is this agent allowed to perform this specific action?" or "has this exact message been seen before?"
LDP introduces trust domains — security boundaries within which identity, policy, and transport guarantees are enforced at three levels: per-message signatures with replay protection, session-level trust domain compatibility checks, and a policy engine that validates each task against configurable rules.
In simulated security analysis, LDP detected 96% of attack attempts (untrusted domain joins, capability escalation, replay attacks, cross-domain access) compared to 6% for bearer token authentication. This is a protocol-design evaluation, not an empirical penetration test — the detection rates follow from the presence or absence of the relevant protocol fields. But that's precisely the point: A2A's protocol design doesn't have the fields needed to detect these attack categories.
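The per-message layer can be sketched with an HMAC signature plus a nonce cache for replay rejection. This is an assumed construction for illustration; LDP's actual signature algorithm and key management are not specified here.

```python
import hashlib
import hmac

# Sketch of per-message signing with replay protection: sign each message
# with a shared trust-domain key and reject any nonce seen before.
# The HMAC construction is an illustrative assumption.

KEY = b"shared-trust-domain-key"
seen_nonces = set()

def sign(body: bytes, nonce: str) -> str:
    return hmac.new(KEY, nonce.encode() + body, hashlib.sha256).hexdigest()

def verify(body: bytes, nonce: str, signature: str) -> bool:
    if nonce in seen_nonces:
        return False  # replay: this exact nonce was seen before
    if not hmac.compare_digest(sign(body, nonce), signature):
        return False  # tampered body or wrong trust-domain key
    seen_nonces.add(nonce)
    return True

sig = sign(b"run task", "nonce-1")
first = verify(b"run task", "nonce-1", sig)   # accepted
second = verify(b"run task", "nonce-1", sig)  # replay rejected
```

A bearer token cannot express either check: it authenticates the channel, not the message, which is why replayed requests sail through under A2A.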
What Didn't Work
The honest finding: identity-aware routing did not improve aggregate quality over skill-matching. Across 30 tasks at three difficulty levels, A2A's skill-matching scored 7.43 versus LDP's 6.80. The difference wasn't statistically significant (p=0.56), but the direction was opposite to our hypothesis.
Why? Partly because our delegate pool was small — three models. With only three options, random selection gives you a 1-in-3 chance of picking the optimal delegate. The routing advantage of knowing model properties is expected to emerge with larger, more heterogeneous pools where the cost of misrouting increases.
Partly because the quality benefits of identity-enriched prompts — injecting delegate metadata into the system prompt — showed only modest, difficulty-dependent effects. On hard tasks, identity prompts scored 4.81 versus 3.80 for generic prompts, but the difference didn't reach significance at n=10. The sample sizes were too small to detect what may be real but moderate effects.
We report these null results because they inform where LDP's value actually lies. It's not in making individual responses better. It's in making the system faster, cheaper, and more governable — routing efficiency, token reduction, session management, and security boundaries.
Practical Adoption
LDP doesn't require all-or-nothing adoption. We propose three interoperability profiles:
Profile A (Basic): Identity cards + text payloads + signed messages. This captures the routing benefit — 12x latency reduction on easy tasks — with minimal integration overhead. Any system that can attach metadata to agent descriptions can implement this.
Profile B (Enterprise): Adds provenance tracking with verification fields and policy enforcement. This is for regulated environments where you need to know which model produced which output, and whether those confidence scores were verified.
Profile C (High-Performance): Payload mode negotiation and governed sessions. This captures the 37% token reduction and eliminates session overhead. Worth the complexity for high-volume systems where token costs are a line item.
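The three profiles can be summarized as feature sets. The feature labels are illustrative shorthand for the mechanisms described above, and the exact composition of each profile is a sketch.

```python
# Sketch of the three interoperability profiles as feature sets.
# Feature names are illustrative labels, not normative identifiers.

PROFILES = {
    "A": {"identity_cards", "text_payloads", "signed_messages"},
    "B": {"identity_cards", "text_payloads", "signed_messages",
          "provenance_verification", "policy_enforcement"},
    "C": {"identity_cards", "signed_messages",
          "payload_mode_negotiation", "governed_sessions"},
}

def supports(profile: str, feature: str) -> bool:
    return feature in PROFILES[profile]
```

A deployment can check a single predicate like `supports("A", "governed_sessions")` to decide whether a peer's profile covers a feature before relying on it.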
A natural question is whether A2A could simply be extended with custom metadata fields. In principle, yes — you could add model properties as custom fields. But without negotiation semantics, fallback mechanisms, session lifecycle, or policy enforcement built into the protocol, those extensions remain fragile. A custom field that nobody validates, nobody negotiates, and nobody falls back from is a comment, not a protocol primitive.
Where This Goes
This is initial evidence from a controlled setting — three local models, 30 tasks per condition, a single LLM judge. The results that reached significance (payload efficiency, session overhead) are robust. The results that didn't (routing quality, provenance value) suggest real effects that need larger-scale validation.
The open questions are practical: Should identity fields be self-declared by model providers, measured by benchmarks, or certified by external parties? How do the routing benefits scale at 50 or 500 delegates instead of 3? Do the higher payload modes (embedding hints, latent capsules) deliver on their theoretical promise?
What we can say with confidence is that treating all agents as interchangeable black boxes — the current default — leaves efficiency and governance on the table. The protocol layer is where those properties should live.
The full paper is available at arXiv:2603.08852. LDP is implemented as a plugin for the JamJet agent runtime. The protocol specification, implementation, and experiment code are open-source: ldp-protocol and ldp-research.