Submission for the Gemma 4 Good Hackathon · Kaggle × Google DeepMind

Voice-first, vision-capable clinical AI.
On Gemma 4. Entirely on-device.

ChartLite captures any clinical encounter by voice or photo and runs the entire loop on the phone itself — speech-to-text, a structured medical note, and a safety check that cross-references every drug, diagnosis, and symptom against a real medical database — using Gemma 4 on Android, fully offline. We benchmarked it against 12 models on 6 independent datasets before we shipped.

Try the app

The debug-signed APK (330 MB) runs on any Android 8+ device. First launch downloads the Gemma 4 weights from Hugging Face (~2.8 GB for E4B, ~1.5 GB for E2B). The demo was recorded on a Galaxy Fold 7 running Gemma 4 E2B; both E2B and E4B ship, and tier routing picks between them based on device RAM.

Download ChartLite.apk →

Read the technical brief

One-page engineering companion to the 3-minute video: hardware-aware Gemma 4 routing, MediaPipe LiteRT integration, function calling against BODHI, and the honest audit of BODHI's safety lift.

Open the technical brief →

See the benchmark

12 models × 6 datasets · ~54,000 model-question evaluations · every number traces to raw JSON. Methodology, headline findings, and the honest BODHI audit on one readable page.

Open the benchmark →

Read the writeup

Full submission writeup: architecture, why Gemma 4, the multi-model benchmark, the honest BODHI audit, multimodal capture, and how to verify every number.

Open the writeup →

Read the code

Apache 2.0. App + 8,000-line reproducible benchmark suite. Disprove any number on the dashboard with the raw data.

github.com/prismindanalytics/chartlite →

Auto-fits the phone

ChartLite picks the right Gemma 4 size automatically: the larger E4B on 6 GB+ phones, the smaller E2B on 4 GB phones, and a tiny fallback on 3 GB phones. Same app, every device.
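The RAM thresholds above can be sketched as a small routing function. This is a minimal, language-neutral illustration in Python, not the app's actual routing code (which is linked below). The names `ModelTier` and `pick_tier` are hypothetical, and the handling of in-between values such as a 5 GB phone, or of phones under 3 GB, is an assumption, since the text above only names the 6 GB+, 4 GB, and 3 GB cases.

```python
from enum import Enum


class ModelTier(Enum):
    """Hypothetical labels for the three tiers named in the text."""
    E4B = "E4B"            # larger Gemma 4 model
    E2B = "E2B"            # smaller Gemma 4 model
    FALLBACK = "fallback"  # the unnamed "tiny fallback"


def pick_tier(ram_gb: float) -> ModelTier:
    """Map nominal device RAM (in GB) to a model tier.

    Thresholds follow the text: E4B at 6 GB and up, E2B at 4 GB,
    fallback at 3 GB. Assumption: anything between 4 and 6 GB gets
    E2B, and anything below 4 GB gets the fallback.
    """
    if ram_gb >= 6:
        return ModelTier.E4B
    if ram_gb >= 4:
        return ModelTier.E2B
    return ModelTier.FALLBACK


if __name__ == "__main__":
    for ram in (3, 4, 6, 12):
        print(f"{ram} GB -> {pick_tier(ram).value}")
```

Note that a device marketed as "6 GB" usually reports somewhat less usable memory to the OS, so a real implementation would likely compare against slightly lower cutoffs; the exact margin is not stated in the source.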

View the routing code →

The numbers we ship against

82.74

Gemma 4 E4B on the peer-reviewed clinical-note benchmark (ACI-Bench)

Ties Claude Haiku 4.5 (82.72) · 5 splits

82.6

Gemma 4 E4B on 156 real clinician-annotated transcripts (Eka Care)

Within 5 points of Claude Haiku 4.5

Clinical artifact types one button can read

Gemma 4 chooses which safety checks to run

Per-visit cloud cost when running on-device

vs $0.24–4.50 per 100 visits on cloud tiers

An honest note on the BODHI lift

BODHI lifts gross safety detection by 26–63 percentage points across the 12 models we tested. We publish the decomposition so you can verify it:

~40% of the lift is genuine clinical catch — the LLM never raised the danger.
~60% is substring-match scoring artifact — the LLM said the equivalent in different words and our scorer didn't credit it.
Separately, ~20% of BODHI alerts are false positives (the drug was actually indicated, or a fuzzy match produced referral noise).

Net of artifact and noise, the lift is most pronounced on the small models that can actually be deployed on-device: Qwen 3.5 0.8B goes from 3% to 66%, and Gemma 4 E4B from 30% to 57%. It is least pronounced on frontier models that are already near saturation. Full audit in the technical brief.
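To make the substring-match artifact concrete, here is a minimal sketch of how a scorer of that kind can miss a correct answer that is phrased differently. The function `substring_credit` and the example sentences are invented for illustration; they are not taken from the benchmark suite or its data.

```python
def substring_credit(model_output: str, expected_phrase: str) -> bool:
    """Credit the model only if the expected phrase appears verbatim
    (case-insensitive) in its output. Paraphrases get no credit."""
    return expected_phrase.lower() in model_output.lower()


# Hypothetical expected phrase and two hypothetical model outputs.
expected = "bleeding risk"
verbatim = "Warfarin plus aspirin increases bleeding risk."
paraphrase = "Combining warfarin and aspirin raises the risk of hemorrhage."

if __name__ == "__main__":
    print(substring_credit(verbatim, expected))    # True
    print(substring_credit(paraphrase, expected))  # False
```

Both outputs raise the same danger, but only the first is credited. A lift measured against the second kind of miss reflects the scorer, not a clinical catch, which is the distinction the decomposition above draws.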
