This project connects your Unity editor directly to the Gemini 2.0 Multimodal Live API, acting as a real-time pair-programming and debugging assistant that can see your screen and talk with you.
- Backend Middleware (`/backend`): A minimal Python FastAPI service on Google Cloud Run that bridges basic WebSockets to Google's official `google-genai` async SDK.
- Unity Client (`/unity`): Four lightweight, standalone C# scripts (standard library only) that you drop into your Unity project to handle screen grabbing, WebSocket streaming, and mic/audio streaming.
You will need the Google Cloud CLI installed and billing enabled on a GCP project. The Cloud Run service authenticates with Vertex AI automatically through its default Compute Engine service account, so no API keys are required.
- Navigate to the `backend` folder in a terminal.
- Run the deployment script:
./deploy.sh
- The script will output a Service URL like `https://gemini-live-assistant-xxxxx-uc.a.run.app`. Convert this to a WebSocket (`wss`) URL: `wss://gemini-live-assistant-xxxxx-uc.a.run.app/ws`
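The URL conversion in the last step is a scheme swap plus a path. A minimal Python sketch of it follows, in case you want to script the step. The function name `to_websocket_url` is hypothetical and not part of this project; the `/ws` path comes from the example URL above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(service_url: str, ws_path: str = "/ws") -> str:
    """Turn a Cloud Run Service URL (https://...) into a wss:// URL ending in ws_path.

    Hypothetical helper for illustration; it is not shipped with this project.
    """
    parts = urlsplit(service_url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL with a host, got: {service_url!r}")
    # https maps to wss (TLS); plain http maps to ws (useful for local testing only).
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Any existing path, query, or fragment is replaced by the WebSocket path.
    return urlunsplit((scheme, parts.netloc, ws_path, "", ""))


if __name__ == "__main__":
    print(to_websocket_url("https://gemini-live-assistant-xxxxx-uc.a.run.app"))
```

Cloud Run serves HTTPS, so the result should normally start with `wss://`. The `ws://` branch is there only for a locally running server.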
- Copy the four `.cs` files from the `unity` directory into your Unity project's `Assets/Scripts` folder.
- Create an Empty GameObject in your current scene and name it `Gemini Assistant`.
- Add the `GeminiLiveClient` script to it.
  - Paste the `wss://...` URL into the Server URL property.
- Add the `ScreenStreamer` script to it.
  - You can leave settings as default. Make sure it targets your Main Camera (or drag a specific Camera object onto it).
- Add the `AudioStreamer` script to it.
  - Leave the Mic device empty to use your default system microphone.
- Add the `AudioReceiver` script to it.
  - Unity will automatically attach an `AudioSource` component as well.
  - Wait! Ensure the `AudioSource` has Play On Awake checked and its Volume is turned up.
Hit the Play button in Unity.
The system automatically captures your microphone and screen (at 1 FPS), routes both through the Cloud Run proxy to the Gemini API, and streams Gemini's replies back through the `AudioReceiver`'s `AudioSource` in real time.
Talk normally as you script or debug in the editor!
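The two ideas in the flow above, capping screen frames at 1 FPS and sending media chunks over a WebSocket, can be sketched in Python. This README does not document the actual wire format between the Unity client and the proxy, so everything here is an assumption for illustration: the JSON envelope, its `mime_type` and `data` fields, and the names `FrameThrottle`, `encode_chunk`, and `decode_chunk` are all hypothetical.

```python
import base64
import json
from typing import Optional, Tuple


class FrameThrottle:
    """Accept at most `fps` frames per second; frames arriving sooner are skipped."""

    def __init__(self, fps: float = 1.0) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        self.min_interval = 1.0 / fps
        self._last_sent: Optional[float] = None  # time of the last accepted frame

    def should_send(self, now: float) -> bool:
        """Return True (and record `now`) if enough time has passed since the last frame."""
        if self._last_sent is None or now - self._last_sent >= self.min_interval:
            self._last_sent = now
            return True
        return False


def encode_chunk(mime_type: str, payload: bytes) -> str:
    """Wrap raw media bytes in a JSON text message (hypothetical envelope)."""
    return json.dumps({
        "mime_type": mime_type,  # e.g. "image/jpeg" for a frame (assumed value)
        "data": base64.b64encode(payload).decode("ascii"),
    })


def decode_chunk(message: str) -> Tuple[str, bytes]:
    """Inverse of encode_chunk: recover the MIME type and the raw bytes."""
    obj = json.loads(message)
    return obj["mime_type"], base64.b64decode(obj["data"])
```

In a capture loop you would call `should_send(time.monotonic())` before encoding a frame. Base64 inside JSON keeps every message a text frame, at the cost of roughly a third more bandwidth than sending raw binary frames.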