FCE-2750 Add support for agent image capture #67
Pull request overview
This PR adds support for agent image capture, enabling agents to request and receive images from video tracks in real time. The changes include new protobuf message types for image capture requests and responses, and a multimodal example demonstrating integration with Google's Gemini Live API.
Changes:
- Added `capture_image` method to `AgentSession` for requesting images from video tracks
- Introduced `IncomingTrackImage` message type for receiving captured images
- Created a comprehensive multimodal example showing audio+video interaction with Gemini Live API
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pyproject.toml | Added multimodal example to workspace members |
| protos | Updated submodule to include agent image capture protocol definitions |
| fishjam/integrations/gemini.py | Added metadata to output audio track settings |
| `fishjam/events/_protos/fishjam/__init__.py` | Added protobuf message types for image capture and channel management |
| fishjam/agent/agent.py | Implemented capture_image method and IncomingTrackImage message handling |
| `fishjam/agent/__init__.py` | Exported IncomingTrackImage type |
| examples/multimodal/pyproject.toml | Configuration for new multimodal example project |
| examples/multimodal/multimodal/worker.py | Background task worker for managing async operations |
| examples/multimodal/multimodal/session.py | Gemini Live API session management for audio and image streaming |
| examples/multimodal/multimodal/room.py | Fishjam room service managing room and agent lifecycle |
| examples/multimodal/multimodal/notifier.py | Event handlers for peer and track lifecycle notifications |
| examples/multimodal/multimodal/config.py | Configuration for Gemini model and capture settings |
| examples/multimodal/multimodal/agent.py | Multimodal agent coordinating video capture and Gemini interaction |
| examples/multimodal/main.py | FastAPI application entry point |
| examples/multimodal/README.md | Documentation for the multimodal example |
| compile_proto.sh | Updated protobuf compilation to use uv run |
| .gitmodules | Updated protos submodule to use agent-capture-image branch |
Force-pushed from 7b296bf to bc9545d.
Force-pushed from aa5f13d to de4e01d.
This reverts commit 75ffe1e.
Description
Describe your changes.
Motivation and Context
Why is this change required? What problem does it solve? If it fixes an open
issue, please link to the issue here.
Documentation impact
Types of changes