I built the first working Android app that uses Gemma 4 native tool calling to control a phone — fully on-device, no cloud #615
Some things I ran into while building on Gemma 4 + LiteRT-LM that the team might want to know about:

1. Single session constraint is painful for multi-mode apps

LiteRT-LM only allows one Conversation per Engine. PokeClaw needs the LLM for chat mode AND task execution AND auto-reply: three features competing for one session. Right now I have to close and recreate conversations every time I switch modes. It works, but it's fragile and adds latency. Would love to see support for multiple concurrent conversations on a single engine, or at least a faster session swap mechanism.

2. Engine warmup on CPU is the biggest UX killer

On CPU-only devices, loading the engine takes ~45 seconds (optimized down from minutes). During this time the user sees nothing and thinks the app crashed. If there's a way to do progressive loading or get a "ready" callback earlier, that would help a lot. Even a progress percentage would let us show a loading indicator.

3. Session conflict errors are hard to debug

I get an "A session already exists" error when trying to create a new Conversation while another one hasn't fully closed. The close() call seems to be async, but there's no way to await it. I had to add Thread.sleep(1000) delays as a workaround, which is not great. A synchronous close or a callback when the session is fully released would be ideal.

4. Auto-reply requires engine recycling

For auto-reply (monitoring notifications and generating responses), I need the LLM, but the chat UI might be holding the session. Currently I call EngineHolder.close() then getOrCreate() to force-release. This works, but it destroys and rebuilds the entire engine just to get a free session. Expensive on CPU.

5. Tool calling works great, no complaints

Native tool calling on v0.10.0 is solid. Structured output, clean API, no parsing needed. This is what made the whole project possible. The model reliably returns well-formed tool calls even on the 2.3B E2B variant. Genuinely impressed.

None of these are blockers: PokeClaw ships and works today. But fixing 1-3 would make on-device agentic apps significantly easier to build. Happy to provide more details or test any changes.
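For point 3, one alternative to a fixed Thread.sleep(1000) is a bounded retry with backoff around the call that creates the conversation. The following is only a sketch under assumptions: `SessionRetrySketch` and `retry` are hypothetical names, the `Callable` stands in for whatever actually creates the Conversation, and it assumes the conflict surfaces as an `IllegalStateException`, which may not match what LiteRT-LM really throws.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of LiteRT-LM or PokeClaw.
final class SessionRetrySketch {
    /**
     * Calls `create` until it succeeds or `maxAttempts` is reached.
     * ASSUMPTION: a still-held session shows up as IllegalStateException.
     */
    static <T> T retry(Callable<T> create, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return create.call();
            } catch (IllegalStateException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the original error
                }
                Thread.sleep(delay); // wait for the old session to be released
                delay *= 2;          // exponential backoff between attempts
            }
        }
    }
}
```

Compared with a fixed one-second sleep, this returns as soon as the old session is gone and still fails loudly if it never is.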
That's an impressive Android integration! Running Gemma natively on device is a big deal for privacy-sensitive apps. For anyone studying LLM deployment and interview prep in this space: ML Interview Prep has sections on model compression, quantization, and on-device deployment tradeoffs. Might be useful context for mobile ML work!
Hey Gemma team 👋
Wanted to share what I built with Gemma 4 the week it dropped.
PokeClaw is a native Android app that runs Gemma 4 E2B entirely on-device using LiteRT-LM v0.10.0 and uses native tool calling to autonomously control the phone through the Android Accessibility API.
The model reads the accessibility tree, picks a tool (tap, swipe, type, open_app, send_message, auto_reply), executes it, reads the new screen state, and loops until the task is done. The entire agentic pipeline runs on the phone's CPU.
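The loop described above can be sketched roughly as follows. This is a minimal illustration, not PokeClaw's actual code: `Model`, `Device`, `ToolCall`, the `"done"` tool name, and the `maxSteps` guard are all hypothetical stand-ins, and nothing here is a LiteRT-LM or Android API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only.
final class AgentLoopSketch {
    /** A structured tool call as the model might return it. */
    static final class ToolCall {
        final String name;
        final Map<String, String> args;
        ToolCall(String name, Map<String, String> args) {
            this.name = name;
            this.args = args;
        }
    }

    /** Stand-in for the LLM: picks the next tool from the task and screen state. */
    interface Model {
        ToolCall nextCall(String task, String screen);
    }

    /** Stand-in for the phone: dumps the screen and executes a tool. */
    interface Device {
        String readScreen();
        void execute(ToolCall call);
    }

    /** Read screen, pick tool, execute, repeat; returns the executed calls. */
    static List<ToolCall> run(String task, Model model, Device device, int maxSteps) {
        List<ToolCall> history = new ArrayList<>();
        for (int step = 0; step < maxSteps; step++) { // guard against endless loops
            String screen = device.readScreen();
            ToolCall call = model.nextCall(task, screen);
            if (call.name.equals("done")) {           // ASSUMPTION: model signals completion
                break;
            }
            device.execute(call);
            history.add(call);
        }
        return history;
    }
}
```

Because the model returns a structured call rather than free text, the loop body needs no parsing step between picking a tool and executing it.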
I'm a solo developer (CS dropout, no Android experience before this). Gemma 4's native tool calling on LiteRT-LM is what made this possible. Without structured tool call output, the agentic loop would have required brittle text parsing. With it, the model just returns structured calls directly. That's the difference between a hack and a working app.
Some numbers from testing on a CPU-only budget phone:
The hardest part was session management. LiteRT-LM only allows one Conversation per Engine, but the app needs the LLM for both chat and task execution. Had to build an engine handoff system that closes the chat conversation, hands the engine to the task agent, and reclaims it after.
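A handoff like this can be modeled as a single-owner switch: whoever currently holds the conversation closes it before the next mode opens its own. The sketch below is one hypothetical way to structure that; `EngineHandoffSketch`, `SessionUser`, `switchTo`, and `lend` are invented names for illustration and do not reflect LiteRT-LM's API or PokeClaw's real implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handoff coordinator for illustration only.
final class EngineHandoffSketch {
    /** Stand-in for a mode (chat, task agent, ...) that can hold the one conversation. */
    interface SessionUser {
        String name();
        void closeConversation();
        void openConversation();
    }

    private SessionUser current;                // current holder of the single conversation
    final List<String> log = new ArrayList<>(); // records handoffs, for demonstration

    /** Closes the current holder's conversation, then opens one for `next`. */
    synchronized void switchTo(SessionUser next) {
        if (current == next) {
            return; // already the holder, nothing to do
        }
        if (current != null) {
            current.closeConversation();
            log.add("closed " + current.name());
        }
        next.openConversation();
        log.add("opened " + next.name());
        current = next;
    }

    /** Lends the session to `borrower` for `work`, then hands it back to the previous holder. */
    synchronized void lend(SessionUser borrower, Runnable work) {
        SessionUser previous = current;
        switchTo(borrower);
        try {
            work.run();
        } finally {
            if (previous != null) {
                switchTo(previous); // reclaim even if the work throws
            }
        }
    }
}
```

Funneling every mode switch through one synchronized object keeps the close-then-open ordering in a single place, so no two modes try to open a conversation at once.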
Open source, Apache 2.0: https://github.com/agents-io/PokeClaw
Thank you to @clementfarabet, @OlivierLacombe, and the entire Gemma team for shipping this model under Apache 2.0 with native tool calling. You made it possible for a solo dev to build a working phone agent in two nights.
Would love any feedback from the team on how to get better performance out of LiteRT-LM on CPU, or if there are upcoming optimizations we should watch for.