Gemma 4 Good Hackathon · Deep-dive companion

Technical brief

One-page engineering companion to the 3-minute submission video — every claim traces to source code in the repo or to raw data on the benchmark dashboard.

Read this if you want the engineering depth without watching the second video. The raw data behind every number is on the dashboard at https://benchmark.chartlite.health.


1 — Two Gemma 4 sizes, one codebase

ChartLite picks the right on-device LLM for each phone automatically via a single call:

// app/.../extraction/LlmModelManager.kt
fun recommendedTierForRam(ramGb: Double): ModelTier {
    return when {
        ramGb >= 6.0 -> ModelTier.GEMMA_4_E4B   // MediaPipe LiteRT
        ramGb >= 4.0 -> ModelTier.GEMMA_4_E2B   // MediaPipe LiteRT
        else         -> ModelTier.QWEN_3_5_0_8B // MNN-LLM fallback
    }
}

No configuration, no compromise — flagship phones get the better model, mid-tier phones still get Gemma 4, ultra-low-end devices fall back gracefully.
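To make the thresholds concrete, here is a self-contained sketch of the same selection logic. The `ModelTier` enum is reconstructed from the three tier names used above, and `bytesToGb` is a hypothetical helper that assumes RAM arrives in bytes and is converted with decimal gigabytes; neither is taken from the repo.

```kotlin
// Sketch only: the enum is reconstructed from the tier names used above.
enum class ModelTier { GEMMA_4_E4B, GEMMA_4_E2B, QWEN_3_5_0_8B }

// Same thresholds as the recommendedTierForRam snippet above.
fun recommendedTierForRam(ramGb: Double): ModelTier = when {
    ramGb >= 6.0 -> ModelTier.GEMMA_4_E4B
    ramGb >= 4.0 -> ModelTier.GEMMA_4_E2B
    else -> ModelTier.QWEN_3_5_0_8B
}

// Hypothetical helper: decimal GB (1 GB = 1e9 bytes) is an assumption,
// not something the brief specifies.
fun bytesToGb(totalBytes: Long): Double = totalBytes / 1_000_000_000.0
```

Under these assumptions, `recommendedTierForRam(bytesToGb(8_000_000_000L))` selects `GEMMA_4_E4B`, while a device reporting 5.9 GB lands on `GEMMA_4_E2B`. Because the comparison is `>=`, a phone whose reported memory reads slightly under its marketed size drops one tier, so it is worth checking which unit the real caller passes in.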

2 — Gemma 4 via MediaPipe LiteRT

Gemma 4 ships as INT4-quantized .task bundles via Google AI Edge LiteRT. GemmaBridge.kt wraps the API directly:

val opts = LlmInference.LlmInferenceOptions.builder()
    .setModelPath(taskFile.absolutePath)
    .setMaxTokens(4096)
    .build()
llm = LlmInference.createFromOptions(context, opts)

Models are pulled from huggingface.co/litert-community/gemma-4-E{2,4}B-it-litert-lm on first launch. This path provides native chat-template handling, deterministic seeded sampling, and NPU acceleration when present.
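The first-launch download can be thought of as an idempotent "fetch if missing" step. The sketch below is an assumption about how such a step might look, not the code in GemmaBridge.kt: `ensureModelFile` and the `download` callback are hypothetical names, and the actual transport is left to the caller.

```kotlin
import java.io.File

// Hypothetical helper: returns the model file, downloading it only when absent.
fun ensureModelFile(target: File, download: (File) -> Unit): File {
    // Reuse a non-empty file from an earlier launch.
    if (target.isFile && target.length() > 0L) return target

    // Write to a sibling ".part" file so a crash never leaves a half-written model.
    val partial = File(target.absoluteFile.parentFile, target.name + ".part")
    partial.delete() // discard leftovers from an interrupted attempt
    download(partial)
    check(partial.isFile && partial.length() > 0L) { "download produced no data" }

    target.delete() // remove an empty placeholder, if any
    check(partial.renameTo(target)) { "could not move ${partial.name} into place" }
    return target
}
```

The returned file's `absolutePath` is what would be handed to `setModelPath` in the snippet above. Writing to a temporary name and renaming is a design choice in this sketch so that a later launch never mistakes a truncated download for a usable model.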

3 — Function calling, on-device-flavour

The cloud Gemma family has a native function-calling API; the on-device variant via MediaPipe does not, so we adapt. CdssToolRegistry.kt asks Gemma 4 to emit a JSON array of {name, args} tool calls, and the dispatcher parses them and executes each one deterministically against the existing StaticCDSS layer. Four tools are registered against the BODHI knowledge graph.

Function calling is reliable on E4B and reasonable on E2B. The clinical-encounter beat in the demo shows the model choosing which two tools to invoke after seeing a prescription photo and patient context. BODHI's triage table then sees a 4-year-old with pneumonia, a respiratory rate of 40 (alarming for that age), and an oxygen saturation of 94% (low), and escalates to EMERGENCY. The language model on its own missed the case because it never combined the three numbers.
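The dispatch step described in this section can be sketched as follows. This is a minimal illustration, not the contents of CdssToolRegistry.kt: `ToolCall`, `ToolResult`, and `ToolDispatcher` are hypothetical names, and the sketch assumes the model's JSON array has already been parsed into `ToolCall` values by whatever JSON library the app uses.

```kotlin
// One {name, args} entry from the model's JSON array, already parsed.
data class ToolCall(val name: String, val args: Map<String, Any?>)

sealed interface ToolResult {
    data class Ok(val tool: String, val output: String) : ToolResult
    data class Rejected(val tool: String, val reason: String) : ToolResult
}

// Hypothetical dispatcher: only tools present in the registry can run, so a
// hallucinated tool name is rejected instead of executed.
class ToolDispatcher(
    private val handlers: Map<String, (Map<String, Any?>) -> String>
) {
    fun dispatch(calls: List<ToolCall>): List<ToolResult> = calls.map { call ->
        val handler = handlers[call.name]
            ?: return@map ToolResult.Rejected(call.name, "unknown tool")
        try {
            ToolResult.Ok(call.name, handler(call.args))
        } catch (e: IllegalArgumentException) {
            // Handlers signal bad or missing arguments with IllegalArgumentException.
            ToolResult.Rejected(call.name, e.message ?: "invalid arguments")
        }
    }
}
```

The point of this shape is that the model only proposes calls; every handler is ordinary deterministic code, and anything outside the registry or with malformed arguments comes back as a `Rejected` result rather than an exception or a silent guess.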

4 — BODHI honest audit

The dashboard's three-arm safety design (LLM-alone / production rules / rules + BODHI) shows a gross +26–63 pp lift in safety detection across 12 models when BODHI is wired in. We audited that lift using the 35 dangers that GPT-5.5 missed and Arm 3 caught.

Net of artifact and noise, BODHI makes a real clinical-safety contribution, largest where it matters most: Qwen 3.5 0.8B goes 3 → 66 %, Gemma 4 E4B 30 → 57 %, Opus 4.7 38 → 80 %. Sonnet 4.6 gains the least: top cloud models already score near the ceiling, so BODHI has less to add.
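The lifts are percentage-point differences, not relative percentages. The short sketch below recomputes them from the three before/after pairs quoted above; `SafetyScore` and `liftsByModel` are illustrative names, and only the figures stated in this brief are used.

```kotlin
// Before/after safety-detection scores in percent, as quoted in the text.
data class SafetyScore(val model: String, val beforePct: Int, val afterPct: Int) {
    // Lift in percentage points (pp): a plain difference of the two scores.
    val liftPp: Int get() = afterPct - beforePct
}

val quotedScores = listOf(
    SafetyScore("Qwen 3.5 0.8B", 3, 66),
    SafetyScore("Gemma 4 E4B", 30, 57),
    SafetyScore("Opus 4.7", 38, 80),
)

fun liftsByModel(scores: List<SafetyScore>): Map<String, Int> =
    scores.associate { it.model to it.liftPp }
```

`liftsByModel(quotedScores)` gives 63 pp for Qwen 3.5 0.8B, 27 pp for Gemma 4 E4B, and 42 pp for Opus 4.7. The 63 pp figure matches the upper end of the +26–63 pp range quoted for the dashboard.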

The implication is the opposite of the marketing default: BODHI is not a substitute for a generative model; it is a deterministic safety net under clinician judgement. ChartLite renders alerts with severity tiers and an audit trail in the standard medical-terminology system (SNOMED); the clinician decides.

5 — Multilingual by default

Three layers cover the language ladder:

Layer | Coverage | Source
Gemma 4 reasoning | 140+ languages | Google model card
Parakeet TDT v3 speech-to-text | 26 (English + 25 EU) at 1.69 % word-error rate | NVIDIA, on-device via Sherpa-ONNX
Omnilingual ASR | 1,600+ languages | Meta, on-device via ONNX Runtime

ChartLite's ModelDownloader.rankTiersForDevice(language) picks the right speech model for each (language × device RAM) pair. The Eka Calculator dataset is Hindi-English code-switched clinical prose (the way Indian clinicians actually talk in the consult room), which makes it a real test of multilingual ability; Gemma 4 handles it natively without translation.
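As a rough illustration of the language half of that choice, the sketch below prefers the narrower speech model when it covers the requested language and falls back to the broad one otherwise. It is not the logic of `ModelDownloader.rankTiersForDevice`: the function name, the enum, and the four-language subset are all assumptions, and the device-RAM dimension is left out entirely.

```kotlin
enum class SpeechModel { PARAKEET_TDT_V3, OMNILINGUAL_ASR }

// Illustrative subset only; the real Parakeet language list is longer.
val parakeetLanguagesSubset = setOf("en", "de", "fr", "es")

// Hypothetical ranking: most-preferred model first, broad-coverage fallback last.
fun rankSpeechModelsSketch(languageCode: String): List<SpeechModel> =
    if (languageCode.lowercase() in parakeetLanguagesSubset)
        listOf(SpeechModel.PARAKEET_TDT_V3, SpeechModel.OMNILINGUAL_ASR)
    else
        listOf(SpeechModel.OMNILINGUAL_ASR)
```

With this subset, `rankSpeechModelsSketch("en")` lists Parakeet first, while `rankSpeechModelsSketch("hi")` returns only the Omnilingual model. A fuller version would presumably also filter each candidate by the RAM it needs.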

6 — Reproduce every number

Every one of the ~54 K model-question evaluations on the dashboard (12 models × 6 benchmarks) traces to a per-(model, case) JSON file preserved in scripts/{benchmark}_raw/. A judge can:

git clone https://github.com/prismindanalytics/chartlite            # app + integration
git clone https://github.com/prismindanalytics/clinical-edge-bench  # the benchmark suite
pip install -r scripts/requirements.txt
python3 scripts/benchmark_pharmacology_mcqa.py --models gemma4-e4b --limit 50

…and pull any number off the dashboard with curl-able raw JSON. If a single number doesn't reproduce, that's a bug — please open an issue.

What's deliberately not on the dashboard

Where to look next


Apache 2.0, except where BODHI's CC BY-NC 4.0 applies. Built on Google's Gemma 4, MediaPipe LLM Inference, NVIDIA Parakeet TDT v3, Meta Omnilingual ASR, Sherpa-ONNX, llama.cpp, and Eka Care's BODHI clinical knowledge graph.