I built the first working Android app that uses Gemma 4 native tool calling to control a phone — fully on-device, no cloud #615

ithiria894 · 2026-04-06T11:18:55Z

Hey Gemma team 👋

Wanted to share what I built with Gemma 4 the week it dropped.

PokeClaw is a native Android app that runs Gemma 4 E2B entirely on-device using LiteRT-LM v0.10.0 and uses native tool calling to autonomously control the phone through the Android Accessibility API.

The model reads the accessibility tree, picks a tool (tap, swipe, type, open_app, send_message, auto_reply), executes it, reads the new screen state, and loops until the task is done. The entire agentic pipeline runs on the phone's CPU.
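
For readers who want the shape of that loop in code, here is a minimal sketch in plain Java. It is not PokeClaw's source and does not use the LiteRT-LM API: `ToolCall`, `Screen`, `Model`, `Tool`, the `"done"` sentinel, and the step budget are all hypothetical names and assumptions chosen for illustration.

```java
import java.util.Map;

// All names here are hypothetical stand-ins, not LiteRT-LM or PokeClaw APIs.
final class ToolCall {
    final String name;               // e.g. "tap", "swipe", "type"
    final Map<String, String> args;  // structured arguments returned by the model
    ToolCall(String name, Map<String, String> args) { this.name = name; this.args = args; }
}

interface Screen { String describe(); }                        // serialized accessibility tree
interface Model { ToolCall next(String task, String screen); } // one structured call per turn
interface Tool { void run(Map<String, String> args); }

final class AgentLoop {
    private final Model model;
    private final Screen screen;
    private final Map<String, Tool> tools;
    private final int maxSteps;  // assumed safety budget so a confused model cannot loop forever

    AgentLoop(Model model, Screen screen, Map<String, Tool> tools, int maxSteps) {
        this.model = model;
        this.screen = screen;
        this.tools = tools;
        this.maxSteps = maxSteps;
    }

    /** Returns how many tools ran before the model emitted the assumed "done" call. */
    int run(String task) {
        for (int step = 0; step < maxSteps; step++) {
            // Re-read the screen on every turn so the model sees the effect of its last action.
            ToolCall call = model.next(task, screen.describe());
            if ("done".equals(call.name)) return step;
            Tool tool = tools.get(call.name);
            if (tool == null) throw new IllegalArgumentException("Unknown tool: " + call.name);
            tool.run(call.args);
        }
        throw new IllegalStateException("Step budget exhausted after " + maxSteps + " steps");
    }
}
```

Because the model hands back a tool name plus arguments, dispatch is a map lookup rather than text parsing.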

I'm a solo developer (CS dropout, no Android experience before this). Gemma 4's native tool calling on LiteRT-LM is what made this possible. Without structured tool call output, the agentic loop would have required brittle text parsing. With it, the model just returns structured calls directly. That's the difference between a hack and a working app.

Some numbers from testing on a CPU-only budget phone:

- ~45 seconds of warmup (optimized down from several minutes through changes to engine initialization and the session handoff architecture)
- After warmup, the agentic loop is responsive
- Phones with Tensor G3/G4, Snapdragon 8 Gen 2/3, or Dimensity 9200+ are significantly faster

The hardest part was session management. LiteRT-LM only allows one Conversation per Engine, but the app needs the LLM for both chat and task execution. Had to build an engine handoff system that closes the chat conversation, hands the engine to the task agent, and reclaims it after.
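
A rough sketch of what such a handoff could look like, assuming a session type whose `close()` releases the engine's single slot. `Session` and `EngineHandoff` are hypothetical names for illustration, not LiteRT-LM types and not PokeClaw's actual implementation.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for an engine-bound conversation; not the LiteRT-LM API.
interface Session extends AutoCloseable {
    @Override void close();  // assumed to free the engine's single conversation slot
}

final class EngineHandoff {
    private final Supplier<Session> openChat;  // hypothetical factory for the chat session
    private Session chat;

    EngineHandoff(Supplier<Session> openChat) {
        this.openChat = openChat;
        this.chat = openChat.get();
    }

    /** Closes chat, lends the slot to the task, then reopens chat even if the task throws. */
    synchronized <T> T runTask(Supplier<Session> openTask, Function<Session, T> task) {
        chat.close();
        chat = null;
        try (Session s = openTask.get()) {
            return task.apply(s);
        } finally {
            // try-with-resources has already closed the task session by this point.
            chat = openChat.get();
        }
    }
}
```

The `synchronized` method keeps two callers from racing for the slot; it does nothing about a `close()` that returns before the slot is actually free.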

Open source, Apache 2.0: https://github.com/agents-io/PokeClaw

Thank you to @clementfarabet, @OlivierLacombe, and the entire Gemma team for shipping this model under Apache 2.0 with native tool calling. You made it possible for a solo dev to build a working phone agent in two nights.

Would love any feedback from the team on how to get better performance out of LiteRT-LM on CPU, or if there are upcoming optimizations we should watch for.

ithiria894 · 2026-04-06T11:21:05Z

Some things I ran into while building on Gemma 4 + LiteRT-LM that the team might want to know about:

1. Single session constraint is painful for multi-mode apps

LiteRT-LM only allows one Conversation per Engine. PokeClaw needs the LLM for chat mode AND task execution AND auto-reply — three features competing for one session. Right now I have to close and recreate conversations every time I switch modes. It works, but it's fragile and adds latency. Would love to see support for multiple concurrent conversations on a single engine, or at least a faster session swap mechanism.

2. Engine warmup on CPU is the biggest UX killer

On CPU-only devices, loading the engine takes ~45 seconds (optimized down from minutes). During this time the user sees nothing and thinks the app crashed. If there's a way to do progressive loading or get a "ready" callback earlier, that would help a lot. Even a progress percentage would let us show a loading indicator.
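
Until finer-grained progress is available, one app-side mitigation is to load off the calling thread and flip an indeterminate loading state around the call. The sketch below uses only `java.util.concurrent`; `Warmup`, its `State` enum, and the `slowLoad` supplier are hypothetical names, and the actual engine-loading call is left to the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class Warmup {
    enum State { LOADING, READY, FAILED }

    /**
     * Runs the slow load on the given executor and reports coarse state changes.
     * slowLoad stands in for whatever call actually builds the engine.
     */
    static <E> CompletableFuture<E> start(Supplier<E> slowLoad, Executor background, Consumer<State> ui) {
        ui.accept(State.LOADING);  // show a spinner before any work starts
        return CompletableFuture.supplyAsync(slowLoad, background)
            .whenComplete((engine, error) ->
                ui.accept(error == null ? State.READY : State.FAILED));
    }
}
```

On Android the `ui` callback would need to post to the main thread before touching views. This only gives a spinner, not a percentage, since the load itself is still one opaque call.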

3. Session conflict errors are hard to debug

Creating a new Conversation while the previous one hasn't fully closed fails with an "A session already exists" error. The close() call seems to be asynchronous, but there's no way to await it. I had to add Thread.sleep(1000) delays as a workaround, which is slow and still not guaranteed to be long enough. A synchronous close, or a callback when the session is fully released, would be ideal.

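As a stopgap, a bounded retry with a short, growing delay may behave better than a fixed one-second sleep: it returns as soon as the slot is free and fails loudly if it never is. This is a sketch under assumptions. `SessionRetry` is a hypothetical helper, and `IllegalStateException` is only a guess at how the "A session already exists" error surfaces; substitute whatever the library actually throws.

```java
import java.util.function.Supplier;

final class SessionRetry {
    /**
     * Calls open until it succeeds or attempts run out, sleeping delayMs,
     * then 2 * delayMs, and so on between tries.
     */
    static <T> T openWithRetry(Supplier<T> open, int maxAttempts, long delayMs) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        IllegalStateException last = null;
        long wait = delayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return open.get();
            } catch (IllegalStateException e) {  // assumed type for the session-conflict error
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(wait);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();  // preserve the interrupt for callers
                    throw new IllegalStateException("Interrupted while waiting for session", ie);
                }
                wait *= 2;
            }
        }
        throw last;  // every attempt failed; surface the most recent error
    }
}
```

Like the sleep, this should run off the main thread, and it remains a workaround rather than a substitute for an awaitable close.
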
4. Auto-reply requires engine recycling

For auto-reply (monitoring notifications and generating responses), I need the LLM, but the chat UI might be holding the session. Currently I call EngineHolder.close() and then getOrCreate() to force-release it. This works, but it destroys and rebuilds the entire engine just to free a session, which is expensive on CPU.
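
To make the cost concrete, here is a hypothetical reconstruction of a lazy holder with the same two operations. `EngineHolderSketch`, its `factory` and `release` parameters, and `buildCount()` are invented for illustration; the real EngineHolder in PokeClaw may be structured differently.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical reconstruction for illustration; not PokeClaw's EngineHolder.
final class EngineHolderSketch<E> {
    private final Supplier<E> factory;  // the expensive engine load
    private final Consumer<E> release;  // whatever frees the engine's resources
    private E engine;
    private int builds;

    EngineHolderSketch(Supplier<E> factory, Consumer<E> release) {
        this.factory = factory;
        this.release = release;
    }

    /** Returns the cached engine, building it only if none exists. */
    synchronized E getOrCreate() {
        if (engine == null) {
            engine = factory.get();
            builds++;
        }
        return engine;
    }

    /** Drops the engine, so the next getOrCreate() pays the full load cost again. */
    synchronized void close() {
        if (engine != null) {
            release.accept(engine);
            engine = null;
        }
    }

    synchronized int buildCount() { return builds; }
}
```

In this shape, every force-release adds one full engine build, which is the warmup cost described in item 2.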

5. Tool calling works great — no complaints

Native tool calling on v0.10.0 is solid. Structured output, clean API, no parsing needed. This is what made the whole project possible. The model reliably returns well-formed tool calls even on the 2.3B E2B variant. Genuinely impressed.

None of these are blockers — PokeClaw ships and works today. But fixing 1-3 would make on-device agentic apps significantly easier to build. Happy to provide more details or test any changes.

aasimansari1 · 2026-05-20T02:37:51Z

That's an impressive Android integration! Running Gemma natively on device is a big deal for privacy-sensitive apps.

For anyone studying LLM deployment and interview prep in this space: ML Interview Prep has sections on model compression, quantization, and on-device deployment tradeoffs. Might be useful context for mobile ML work!
