new RFD: Create websocket-transport.md #476
steve02081504 wants to merge 2 commits into agentclientprotocol:main
Conversation
We've got an ACP CLI that uses WebSockets to run AI agents in the cloud instead of locally. If you're into adopting WebSockets as a standard way to communicate between Agent and Client, here are some findings.

Authentication/Authorization

Communicating Results Back to the Client

The main difference when running agents in the cloud is that the client and agent don't share the same filesystem. So, they need a different way to communicate the location of source files when:

We've added custom extensions to the ACP protocol to describe the git-remote in the

Connection Reliability

Stdio is a form of IPC, so it's a reliable communication channel. WebSockets are not, so we need to figure out what the client should do when there's a transport error. There are two options: (a) propagate the error to the end user, or (b) reestablish the connection and restore the ACP session state.

Right now, we're leaning towards (b) (WIP). We're leaving dangling sessions behind and relying on the cloud infrastructure to clean them up eventually. There's an alternative solution that hasn't been explored: tunneling the ACP protocol over a message-oriented channel with delivery guarantees. This can be done by enumerating all messages and implementing a simple ACK protocol for two-way at-least-once delivery. This is how VS Code IPC/RPC works (the protocol between the window and extension host processes).
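The ACK scheme mentioned above can be sketched in a few lines. This is an illustrative model, not code from the CLI or the SDK; the class and method names (`ReliableSender`, `ReliableReceiver`, `retransmit`) are hypothetical. The idea: every message carries a sequence number, the receiver ACKs each one and ignores duplicates, so after a reconnect the sender can safely resend anything not yet acknowledged.

```typescript
// At-least-once delivery sketch over an unreliable channel (all names
// hypothetical). Sender buffers unacked messages; receiver ACKs and dedupes,
// turning at-least-once retransmission into exactly-once delivery.
type Envelope = { seq: number; payload: string };

class ReliableSender {
  private nextSeq = 0;
  private unacked = new Map<number, Envelope>();
  constructor(private transmit: (e: Envelope) => void) {}

  send(payload: string): void {
    const env = { seq: this.nextSeq++, payload };
    this.unacked.set(env.seq, env);
    this.transmit(env);
  }

  onAck(seq: number): void {
    this.unacked.delete(seq);
  }

  // Called after reestablishing the connection: resend everything unacked.
  retransmit(): void {
    for (const env of this.unacked.values()) this.transmit(env);
  }
}

class ReliableReceiver {
  private seen = new Set<number>();
  constructor(
    private deliver: (payload: string) => void,
    private sendAck: (seq: number) => void,
  ) {}

  onEnvelope(env: Envelope): void {
    this.sendAck(env.seq); // ACK duplicates too; the earlier ACK may have been lost
    if (this.seen.has(env.seq)) return; // dedupe repeated retransmissions
    this.seen.add(env.seq);
    this.deliver(env.payload);
  }
}
```

The `seen` set grows without bound in this sketch; a real implementation would prune it with a cumulative-ACK watermark, as VS Code's IPC layer does.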
Elevator pitch
While the Agent Client Protocol (ACP) supports multiple concurrent sessions, the current SDK exclusively uses stdio for transport. This creates a tight 1:1 coupling between the client and the agent process, making it difficult to expose a single agent instance to multiple clients without complex bridge processes. This proposal introduces native WebSocket transport support, unlocking the protocol's native multi-session capabilities over the network and enabling scalable, service-oriented agent architectures.
Status quo
Currently, the ACP SDK (`@agentclientprotocol/sdk`) is designed around stdio-based communication. While the protocol allows a single agent to handle multiple sessions, stdio streams are inherently single-connection pipes.

To connect multiple clients (e.g., multiple IDE windows or remote users) to a single agent backend, developers are currently forced to use a "Bridge Architecture":
(`ws://localhost:8931/ws/acp`).

Example from fount framework:
Problems with this approach:
Current Workarounds:
`fount_ide_agent.mjs`).

What we propose to do about it
We propose extending the ACP SDK to natively support WebSocket as a transport layer, alongside the existing stdio transport.
Key Design Principles:
Proposed Client API:
Proposed Server API:
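The proposed API snippets did not survive extraction, but the RFD's own names (`Transport`, `StdioTransport`, `WebSocketTransport`) suggest a shape like the following. This is a hedged guess at the surface, not the actual SDK API; the `Connection` class and `transportPair` helper here are illustrative.

```typescript
// Hypothetical transport abstraction: decouple ACP's JSON-RPC framing from
// the wire. The same connection class would accept a StdioTransport or a
// WebSocketTransport (names from this RFD; method shapes are guesses).
interface Transport {
  send(message: string): void;                        // one JSON-RPC message per call
  onMessage(handler: (message: string) => void): void;
  close(): void;
}

// In-memory linked pair: useful for tests and as a model for real transports.
function transportPair(): [Transport, Transport] {
  const handlers: Array<(m: string) => void> = [() => {}, () => {}];
  const make = (self: number, peer: number): Transport => ({
    send: (m) => handlers[peer](m),
    onMessage: (h) => { handlers[self] = h; },
    close: () => {},
  });
  return [make(0, 1), make(1, 0)];
}

// Transport-agnostic connection: protocol logic never touches the socket.
class Connection {
  constructor(private transport: Transport) {}
  request(method: string, params: unknown): void {
    this.transport.send(JSON.stringify({ jsonrpc: "2.0", method, params }));
  }
  onRequest(handler: (method: string, params: unknown) => void): void {
    this.transport.onMessage((raw) => {
      const msg = JSON.parse(raw);
      handler(msg.method, msg.params);
    });
  }
}
```

With this split, `new Connection(new StdioTransport(...))` and `new Connection(new WebSocketTransport(url))` would differ only in the transport argument, which is exactly the backward-compatibility story Phase 1 describes.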
Shiny future
In the shiny future, ACP agents can fully utilize their multi-session capabilities:
Client <-> Agent, reducing latency and points of failure.

Implementation details and plan
Phase 1: Core Transport Abstraction (v1.0)
Goal: Separate protocol logic from transport implementation.
Refactor Existing Code:
- Extract a `StdioTransport` class.
- `AgentClientConnection` and `AgentSideConnection` accept a `Transport` parameter.

Backward Compatibility:
- `StdioTransport`.

Phase 2: WebSocket Transport (v1.1)
Goal: Implement WebSocket transport with feature parity to stdio.
Client-Side:
- `WebSocketTransport` class.

Server-Side:
- `webSocketStream(ws)` helper for common WebSocket libraries (`ws`, `uWebSockets.js`).

Testing:
(`ws`, Deno native).

Phase 3: Documentation & Ecosystem (v1.2)
Reference Implementations:
Migration Guide:
Best Practices:
Phase 4 (Optional): Additional Transports (v2.0)
Frequently asked questions
Since ACP supports multiple sessions, why does stdio limit us to one process per client?
While an ACP agent's logic can handle an arbitrary number of sessions, the stdio transport is a physical 1:1 pipe. You cannot easily multiplex multiple distinct clients (e.g., a desktop IDE and a web dashboard) into the same `stdin` stream without writing a complex custom multiplexer.

Currently, to share one agent, clients spawn "bridge" processes. This proposal removes the need for those bridges, allowing the agent to handle connection multiplexing natively via the WebSocket server implementation, which is the standard way to handle concurrency in network services.
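Native multiplexing amounts to bookkeeping: one in-process agent, a map of connections, and sessions keyed to the connection that owns them. A minimal sketch (the `MultiClientAgent` class and its methods are hypothetical illustrations, not SDK API):

```typescript
// One agent process, many WebSocket clients: each connection is just a key in
// a map, and sessions remember which connection owns them. All names here are
// illustrative; the real SDK surface may differ.
class MultiClientAgent {
  private sessions = new Map<string, { connectionId: string; history: string[] }>();
  private nextSession = 0;

  newSession(connectionId: string): string {
    const id = `sess-${this.nextSession++}`;
    this.sessions.set(id, { connectionId, history: [] });
    return id;
  }

  prompt(sessionId: string, text: string): string {
    const s = this.sessions.get(sessionId);
    if (!s) throw new Error(`unknown session ${sessionId}`);
    s.history.push(text);
    return `echo(${text})`; // stand-in for real agent work
  }

  // On a WebSocket close, only that client's sessions are torn down; other
  // clients are unaffected -- the property a 1:1 stdio pipe cannot provide.
  dropConnection(connectionId: string): void {
    for (const [id, s] of this.sessions) {
      if (s.connectionId === connectionId) this.sessions.delete(id);
    }
  }

  sessionCount(): number {
    return this.sessions.size;
  }
}
```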
What alternative approaches did you consider, and why did you settle on this one?
WebSocket was chosen because:
How does this affect existing stdio-based agents?
Zero impact. The current stdio-based API remains the default:
The WebSocket transport is opt-in, activated only when explicitly configured.
What about authentication and security?
ACP over WebSocket supports multiple authentication strategies:
- WebSocket subprotocol tokens (`new WebSocket(url, ['api-key-token'])`).
- `Authorization` header during the WebSocket handshake.
- Query parameters (`ws://host/acp?token=...`).

The SDK will document best practices for each scenario. TLS (`wss://`) should be used in production to encrypt all communication.
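Server-side, all three strategies reduce to extracting a token from the HTTP upgrade request. A sketch of that extraction, assuming a `ws`-style request object with a `url` and lowercase `headers` (the `extractToken` function itself is illustrative, not SDK API):

```typescript
// Pull a bearer token out of any of the three handshake channels. Header and
// subprotocol shapes mirror what libraries like `ws` expose on the upgrade
// request; this helper is a hypothetical sketch.
function extractToken(req: {
  url: string; // e.g. "/acp?token=abc"
  headers: Record<string, string | undefined>;
}): string | undefined {
  // 1. Authorization header: "Bearer <token>" (not available from browsers).
  const auth = req.headers["authorization"];
  if (auth?.startsWith("Bearer ")) return auth.slice("Bearer ".length);

  // 2. Subprotocol smuggling: Sec-WebSocket-Protocol: acp, <token>
  const proto = req.headers["sec-websocket-protocol"];
  if (proto) {
    const token = proto.split(",").map((p) => p.trim()).find((p) => p !== "acp");
    if (token) return token;
  }

  // 3. Query string: ws://host/acp?token=... (beware tokens landing in logs).
  const url = new URL(req.url, "ws://placeholder");
  return url.searchParams.get("token") ?? undefined;
}
```

Browsers cannot set arbitrary headers on a WebSocket handshake, which is why the subprotocol and query-parameter variants exist at all; the header variant suits server-to-server clients.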
Won't this make the SDK more complex?
The complexity is encapsulated within transport implementations. The core protocol logic (`initialize`, `newSession`, `prompt`) remains unchanged. Developers using the SDK only see:

For advanced use cases, the transport abstraction provides clear extension points without coupling to SDK internals.
How does this compare to Language Server Protocol (LSP)?
LSP faced a similar challenge and now supports:
ACP should follow LSP's proven approach: protocol-transport independence enables the same agent implementation to work across desktop IDEs, web IDEs, mobile apps, and CLI tools.
What about connection lifecycle and error handling?
The SDK will handle common WebSocket scenarios:
- `disconnect` events; optionally auto-reconnect with exponential backoff.

Reconnection behavior will be configurable:
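A pure-function sketch of one plausible backoff schedule follows; the option names (`initialDelayMs`, `maxDelayMs`, `jitter`) are hypothetical, not the SDK's actual configuration surface.

```typescript
// Exponential backoff for auto-reconnect: delay doubles per attempt, capped,
// with optional full jitter so many clients don't reconnect in lockstep.
// All names are illustrative sketches, not SDK API.
interface BackoffOptions {
  initialDelayMs: number;  // e.g. 500
  maxDelayMs: number;      // e.g. 30_000
  jitter?: () => number;   // returns 0..1; inject Math.random in production
}

function reconnectDelay(attempt: number, opts: BackoffOptions): number {
  const base = Math.min(opts.initialDelayMs * 2 ** attempt, opts.maxDelayMs);
  return opts.jitter ? Math.floor(base * opts.jitter()) : base;
}
```

Keeping the schedule a pure function of the attempt number makes it trivial to unit-test and to swap out per deployment, while the transport layer owns the actual timer and reconnect loop.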
Can I use this with serverless platforms?
Yes, with caveats:
The SDK will document these platform-specific patterns.