Research
Published work on multi-agent AI systems — identity-aware protocols, structured collective reasoning, and the infrastructure that makes delegation work.
Papers
Trust infrastructure for multi-agent AI systems: identity, delegation, provenance, and collective reasoning.
AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A
Identity layer. A scan of ~2,000 MCP servers found that all of them lacked authentication. MCP has since added OAuth 2.1, but it covers only single-hop authentication: when an orchestrator delegates to a specialist that calls a tool, the delegation chain disappears. AIP introduces Invocation-Bound Capability Tokens (IBCTs) in two formats: compact (JWT + EdDSA) for single-hop calls and chained (Biscuit + Datalog) for multi-hop delegation with cryptographic scope attenuation.
Key results. 0.049ms verification in Rust, 0.189ms in Python. 0.086% overhead in real LLM multi-agent chains. 100% rejection rate across 600 adversarial attacks in 6 categories, including two attack classes (delegation depth violation, audit evasion) that plain JWT deployments cannot detect.
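The attenuation invariant behind chained IBCTs can be illustrated in a few lines. This is a pure-Python sketch, not the Biscuit + Datalog implementation: `DelegationLink`, `verify_chain`, and the `max_depth` bound are hypothetical names standing in for two checks the paper names (scope may only shrink hop to hop, and chains past a depth bound are rejected).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DelegationLink:
    """One hop in a delegation chain: who acts, and with what scope."""
    delegate: str
    scopes: frozenset  # capabilities granted at this hop

def verify_chain(chain, max_depth=3):
    """Reject chains that widen scope at any hop or exceed the depth bound.

    Informally mirrors two of the adversarial classes above:
    scope widening and delegation depth violation.
    """
    if len(chain) > max_depth:
        return False, "delegation depth violation"
    for parent, child in zip(chain, chain[1:]):
        if not child.scopes <= parent.scopes:  # attenuation: subset only
            return False, f"scope widening at {child.delegate}"
    return True, "ok"

orchestrator = DelegationLink("orchestrator", frozenset({"read", "search", "write"}))
specialist   = DelegationLink("specialist",   frozenset({"read", "search"}))
tool         = DelegationLink("tool",         frozenset({"read", "write"}))  # widens!

ok, reason = verify_chain([orchestrator, specialist, tool])
```

In the real protocol the subset check is enforced cryptographically (a Biscuit block can only attenuate its parent), so a hop cannot mint scopes it was never granted.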
The Provenance Paradox in Multi-Agent LLM Routing: Delegation Contracts and Attested Identity in LDP
When delegates can inflate self-reported quality scores, quality-based routing produces a provenance paradox: it systematically selects the worst delegates, performing worse than random. This paper extends LDP with delegation contracts, a claimed-vs-attested identity model, and typed failure semantics.
Key results. Self-claimed routing performs worse than random selection (simulated: 0.55 vs. 0.68; real models: 8.90 vs. 9.30). Attested routing achieves near-optimal performance (d=9.51, p<0.001). Sensitivity analysis across 36 configurations confirms the paradox emerges reliably. Sub-microsecond validation overhead, fully backward-compatible.
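The mechanism behind the paradox can be reproduced in a toy simulation. Everything below is an illustrative assumption, not the paper's experimental setup: delegates report truthfully with probability equal to their true quality, so the weakest inflate most, and routing on claimed scores lands below random choice while attested routing stays near-optimal.

```python
import random

def simulate(n_delegates=10, trials=2000, seed=0):
    """Toy model of the provenance paradox (illustrative only).

    Assumption doing the work: dishonesty anti-correlates with quality.
    A delegate reports truthfully with probability equal to its true
    quality; otherwise it claims a near-perfect score.
    """
    rng = random.Random(seed)
    totals = {"claimed": 0.0, "attested": 0.0, "random": 0.0}
    for _ in range(trials):
        qualities = [rng.random() for _ in range(n_delegates)]
        claims = [q if rng.random() < q else rng.uniform(0.9, 1.0)
                  for q in qualities]
        # Route on self-claimed score vs. verified (attested) score vs. chance.
        totals["claimed"]  += qualities[claims.index(max(claims))]
        totals["attested"] += max(qualities)
        totals["random"]   += rng.choice(qualities)
    return {k: round(v / trials, 2) for k, v in totals.items()}

results = simulate()
# Claimed routing chases inflated scores from low-quality delegates, so
# it averages below random choice; attested routing is near-optimal.
```

The driver is selection bias: whoever lies loudest wins the route, and the delegates with the most to gain from lying are exactly the worst ones.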
DCI: From Debate to Deliberation — Structured Collective Reasoning with Typed Epistemic Acts
Multi-agent debate is the dominant approach for collective LLM reasoning, but it discards disagreements, lacks convergence guarantees, and scales poorly. Deliberative Collective Intelligence (DCI) introduces typed reasoning moves (assert, challenge, refine, synthesize), preserved disagreements as first-class objects, and a convergence algorithm that guarantees termination.
Key results. +0.95 quality gain over debate on non-routine tasks. 9.56 hidden-profile score (best in study). Guaranteed convergence in bounded rounds. Honest null results: debate wins on routine tasks, where DCI's 62x token cost isn't justified.
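A minimal sketch of typed moves and preserved disagreements, with hypothetical names (`Act`, `Move`, `Deliberation`); the actual DCI algorithm and its termination guarantee are in the paper. In this sketch a session converges once every recorded challenge has a later refine or synthesize reply, rather than the disagreement being discarded as in debate.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Act(Enum):
    ASSERT = "assert"
    CHALLENGE = "challenge"
    REFINE = "refine"
    SYNTHESIZE = "synthesize"

@dataclass
class Move:
    agent: str
    act: Act
    content: str
    target: Optional[int] = None  # index of the move being answered

@dataclass
class Deliberation:
    moves: list = field(default_factory=list)
    disagreements: list = field(default_factory=list)  # first-class objects

    def post(self, move):
        self.moves.append(move)
        if move.act is Act.CHALLENGE and move.target is not None:
            # Unlike debate, the disagreement is recorded, not overwritten.
            self.disagreements.append((move.target, len(self.moves) - 1))

    def converged(self):
        """Every challenge has a later refine/synthesize reply.
        (A round cap, not shown, would bound termination.)"""
        return all(
            any(m.act in (Act.REFINE, Act.SYNTHESIZE) and m.target == r
                for m in self.moves[r + 1:])
            for _, r in self.disagreements)

d = Deliberation()
d.post(Move("a1", Act.ASSERT, "answer is X"))                    # move 0
d.post(Move("a2", Act.CHALLENGE, "X ignores case Y", target=0))  # move 1
open_before = d.converged()  # False: challenge 1 has no reply yet
d.post(Move("a1", Act.REFINE, "X, adjusted for Y", target=1))
```

After the refine, the session has converged, but the (assert, challenge) pair remains queryable in `disagreements` instead of vanishing into a transcript.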
LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems
Current agent protocols (A2A, MCP) treat AI agents as opaque services — exposing only a name and skill list. LDP extends service-oriented protocols with rich delegate identity cards, progressive payload modes, governed sessions, structured provenance, and trust domains. This enables metadata-aware routing: send easy tasks to fast models, hard tasks to capable ones.
Key results. ~12x lower latency on easy tasks through delegate specialization. 37% token reduction via semantic frames (p=0.031). 96% vs. 6% attack detection with trust domains. Noisy provenance degrades quality below the no-provenance baseline — verification matters.
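Metadata-aware routing reduces to a small selection rule once identity cards expose capability and latency fields. The sketch below assumes illustrative card fields (`capability`, `latency_ms` are not LDP's actual schema): pick the fastest delegate whose capability clears the task's difficulty, falling back to the most capable one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DelegateCard:
    """A slice of an LDP-style identity card (fields are illustrative)."""
    name: str
    capability: float  # rough quality tier, 0..1
    latency_ms: float  # typical response latency

def route(task_difficulty, cards):
    """Cheapest delegate that can handle the task; else the most capable."""
    able = [c for c in cards if c.capability >= task_difficulty]
    if able:
        return min(able, key=lambda c: c.latency_ms)
    return max(cards, key=lambda c: c.capability)

fleet = [
    DelegateCard("fast-small", capability=0.40, latency_ms=120),
    DelegateCard("mid",        capability=0.70, latency_ms=900),
    DelegateCard("frontier",   capability=0.95, latency_ms=4000),
]
# Easy tasks go to the fast model; hard tasks escalate to the capable one.
```

An opaque-service protocol cannot make this choice at all: with only a name and skill list, every task is routed blind.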
Open Source
Research-adjacent code — protocol implementations, experiment harnesses, and reference architectures.
aip
Agent Identity Protocol. Verifiable, delegable identity for AI agents across MCP and A2A. Rust + Python reference implementations. Apache 2.0.
ldp-protocol
LLM Delegate Protocol — Python SDK and Rust reference implementation. Identity-aware routing, provenance tracking, trust domains. pip install ldp-protocol
ldp-research
Experiment code and data for the LDP paper. Six research questions, A2A baselines, ablation conditions, LLM-as-judge evaluation.
enterprise-rag-bench
RAG patterns benchmarked for enterprise. Five chunking strategies, five retrieval patterns, evaluation harness, guardrails.
applied-nlp-research
Production NLP from pre-LLM to post-LLM era. Capsule networks, BiLSTM-CRF for NER, transformer fine-tuning, PyTorch.