Research
Published work on multi-agent AI systems — identity-aware protocols, structured collective reasoning, and the infrastructure that makes delegation work.
Papers
Trust infrastructure for multi-agent AI systems: identity, delegation, provenance, and collective reasoning.
AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A
Identity layer. A scan of ~2,000 MCP servers found that all of them lacked authentication. MCP has since added OAuth 2.1, but it covers only single-hop authentication: when an orchestrator delegates to a specialist that calls a tool, the delegation chain disappears. AIP introduces Invocation-Bound Capability Tokens (IBCTs) in two formats: compact (JWT + EdDSA) for single-hop calls and chained (Biscuit + Datalog) for multi-hop delegation with cryptographic scope attenuation.
Key results. 0.049ms verification in Rust, 0.189ms in Python. 0.086% overhead in real LLM multi-agent chains. 100% rejection rate across 600 adversarial attacks in 6 categories, including two attack classes (delegation depth violation, audit evasion) that plain JWT deployments cannot detect.
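The attenuation invariant behind chained IBCTs can be illustrated in a few lines. This is a pure-Python sketch, not the Biscuit + Datalog implementation: `DelegationLink`, `verify_chain`, and the `max_depth` bound are hypothetical names standing in for two checks the paper names (scope may only shrink hop to hop, and chains past a depth bound are rejected).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DelegationLink:
    """One hop in a delegation chain: who acts, and with what scope."""
    delegate: str
    scopes: frozenset  # capabilities granted at this hop

def verify_chain(chain, max_depth=3):
    """Reject chains that widen scope at any hop or exceed the depth bound.

    Informally mirrors two of the adversarial classes above:
    scope widening and delegation depth violation.
    """
    if len(chain) > max_depth:
        return False, "delegation depth violation"
    for parent, child in zip(chain, chain[1:]):
        if not child.scopes <= parent.scopes:  # attenuation: subset only
            return False, f"scope widening at {child.delegate}"
    return True, "ok"

orchestrator = DelegationLink("orchestrator", frozenset({"read", "search", "write"}))
specialist   = DelegationLink("specialist",   frozenset({"read", "search"}))
tool         = DelegationLink("tool",         frozenset({"read", "write"}))  # widens!

ok, reason = verify_chain([orchestrator, specialist, tool])
```

In the real protocol the subset check is enforced cryptographically (a Biscuit block can only attenuate its parent), so a hop cannot mint scopes it was never granted.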
The Provenance Paradox in Multi-Agent LLM Routing: Delegation Contracts and Attested Identity in LDP
When delegates can inflate self-reported quality scores, quality-based routing produces a provenance paradox: it systematically selects the worst delegates, performing worse than random. This paper extends LDP with delegation contracts, a claimed-vs-attested identity model, and typed failure semantics.
Key results. Self-claimed routing performs worse than random selection (simulated: 0.55 vs. 0.68; real models: 8.90 vs. 9.30). Attested routing achieves near-optimal performance (d=9.51, p<0.001). Sensitivity analysis across 36 configurations confirms the paradox emerges reliably. Sub-microsecond validation overhead, fully backward-compatible.
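The mechanism behind the paradox can be reproduced in a toy simulation. Everything below is an illustrative assumption, not the paper's experimental setup: delegates report truthfully with probability equal to their true quality, so the weakest inflate most, and routing on claimed scores lands below random choice while attested routing stays near-optimal.

```python
import random

def simulate(n_delegates=10, trials=2000, seed=0):
    """Toy model of the provenance paradox (illustrative only).

    Assumption doing the work: dishonesty anti-correlates with quality.
    A delegate reports truthfully with probability equal to its true
    quality; otherwise it claims a near-perfect score.
    """
    rng = random.Random(seed)
    totals = {"claimed": 0.0, "attested": 0.0, "random": 0.0}
    for _ in range(trials):
        qualities = [rng.random() for _ in range(n_delegates)]
        claims = [q if rng.random() < q else rng.uniform(0.9, 1.0)
                  for q in qualities]
        # Route on self-claimed score vs. verified (attested) score vs. chance.
        totals["claimed"]  += qualities[claims.index(max(claims))]
        totals["attested"] += max(qualities)
        totals["random"]   += rng.choice(qualities)
    return {k: round(v / trials, 2) for k, v in totals.items()}

results = simulate()
# Claimed routing chases inflated scores from low-quality delegates, so
# it averages below random choice; attested routing is near-optimal.
```

The driver is selection bias: whoever lies loudest wins the route, and the delegates with the most to gain from lying are exactly the worst ones.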
DCI: From Debate to Deliberation — Structured Collective Reasoning with Typed Epistemic Acts
Multi-agent debate is the dominant approach for collective LLM reasoning, but it discards disagreements, lacks convergence guarantees, and scales poorly. Deliberative Collective Intelligence (DCI) introduces typed reasoning moves (assert, challenge, refine, synthesize), preserved disagreements as first-class objects, and a convergence algorithm that guarantees termination.
Key results. +0.95 quality gain over debate on non-routine tasks. 9.56 hidden-profile score (best in study). Guaranteed convergence in bounded rounds. Honest null results: debate wins on routine tasks, where DCI's 62x token cost isn't justified.
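A minimal sketch of typed moves and preserved disagreements, with hypothetical names (`Act`, `Move`, `Deliberation`); the actual DCI algorithm and its termination guarantee are in the paper. In this sketch a session converges once every recorded challenge has a later refine or synthesize reply, rather than the disagreement being discarded as in debate.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Act(Enum):
    ASSERT = "assert"
    CHALLENGE = "challenge"
    REFINE = "refine"
    SYNTHESIZE = "synthesize"

@dataclass
class Move:
    agent: str
    act: Act
    content: str
    target: Optional[int] = None  # index of the move being answered

@dataclass
class Deliberation:
    moves: list = field(default_factory=list)
    disagreements: list = field(default_factory=list)  # first-class objects

    def post(self, move):
        self.moves.append(move)
        if move.act is Act.CHALLENGE and move.target is not None:
            # Unlike debate, the disagreement is recorded, not overwritten.
            self.disagreements.append((move.target, len(self.moves) - 1))

    def converged(self):
        """Every challenge has a later refine/synthesize reply.
        (A round cap, not shown, would bound termination.)"""
        return all(
            any(m.act in (Act.REFINE, Act.SYNTHESIZE) and m.target == r
                for m in self.moves[r + 1:])
            for _, r in self.disagreements)

d = Deliberation()
d.post(Move("a1", Act.ASSERT, "answer is X"))                    # move 0
d.post(Move("a2", Act.CHALLENGE, "X ignores case Y", target=0))  # move 1
open_before = d.converged()  # False: challenge 1 has no reply yet
d.post(Move("a1", Act.REFINE, "X, adjusted for Y", target=1))
```

After the refine, the session has converged, but the (assert, challenge) pair remains queryable in `disagreements` instead of vanishing into a transcript.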
LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems
Current agent protocols (A2A, MCP) treat AI agents as opaque services — exposing only a name and skill list. LDP extends service-oriented protocols with rich delegate identity cards, progressive payload modes, governed sessions, structured provenance, and trust domains. This enables metadata-aware routing: send easy tasks to fast models, hard tasks to capable ones.
Key results. ~12x lower latency on easy tasks through delegate specialization. 37% token reduction via semantic frames (p=0.031). 96% vs. 6% attack detection with trust domains. Noisy provenance degrades quality below the no-provenance baseline — verification matters.
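Metadata-aware routing reduces to a small selection rule once identity cards expose capability and latency fields. The sketch below assumes illustrative card fields (`capability`, `latency_ms` are not LDP's actual schema): pick the fastest delegate whose capability clears the task's difficulty, falling back to the most capable one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DelegateCard:
    """A slice of an LDP-style identity card (fields are illustrative)."""
    name: str
    capability: float  # rough quality tier, 0..1
    latency_ms: float  # typical response latency

def route(task_difficulty, cards):
    """Cheapest delegate that can handle the task; else the most capable."""
    able = [c for c in cards if c.capability >= task_difficulty]
    if able:
        return min(able, key=lambda c: c.latency_ms)
    return max(cards, key=lambda c: c.capability)

fleet = [
    DelegateCard("fast-small", capability=0.40, latency_ms=120),
    DelegateCard("mid",        capability=0.70, latency_ms=900),
    DelegateCard("frontier",   capability=0.95, latency_ms=4000),
]
# Easy tasks go to the fast model; hard tasks escalate to the capable one.
```

An opaque-service protocol cannot make this choice at all: with only a name and skill list, every task is routed blind.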
Open Source
Research-adjacent code — protocol implementations, experiment harnesses, and reference architectures.
aip
Agent Identity Protocol. Verifiable, delegable identity for AI agents across MCP and A2A. Rust + Python reference implementations. Apache 2.0.
ldp-protocol
LLM Delegate Protocol — Python SDK and Rust reference implementation. Identity-aware routing, provenance tracking, trust domains. pip install ldp-protocol
ldp-research
Experiment code and data for the LDP paper. Six research questions, A2A baselines, ablation conditions, LLM-as-judge evaluation.
enterprise-rag-bench
RAG patterns benchmarked for enterprise. Five chunking strategies, five retrieval patterns, evaluation harness, guardrails.
applied-nlp-research
Production NLP from pre-LLM to post-LLM era. Capsule networks, BiLSTM-CRF for NER, transformer fine-tuning, PyTorch.