Technical brief — ChartLite

Read this if you want the engineering depth without watching the second video. Every claim here traces to source code in the repo or to raw data on the dashboard at https://benchmark.chartlite.health.

1 — Two Gemma 4 sizes, one codebase

ChartLite picks the right on-device LLM for each phone automatically via a single call:

// app/.../extraction/LlmModelManager.kt
fun recommendedTierForRam(ramGb: Double): ModelTier {
    return when {
        ramGb >= 6.0 -> ModelTier.GEMMA_4_E4B   // MediaPipe LiteRT
        ramGb >= 4.0 -> ModelTier.GEMMA_4_E2B   // MediaPipe LiteRT
        else         -> ModelTier.QWEN_3_5_0_8B // MNN-LLM fallback
    }
}

No configuration is needed. Phones with at least 6 GB of RAM get the larger E4B model, phones with 4 to 6 GB still get Gemma 4 (E2B), and devices below 4 GB fall back to Qwen 3.5 0.8B.
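
The boundary behaviour of the selector can be checked with a self-contained sketch. Only `recommendedTierForRam` mirrors the source; the `ModelTier` enum body and the `bytesToGb` helper are assumptions, because the brief does not show how `ramGb` is obtained. If it comes from Android's `ActivityManager.MemoryInfo.totalMem`, the reported figure is usually somewhat below the marketed capacity, so a phone sold as "6 GB" may land in the E2B tier.

```kotlin
// Self-contained sketch; only recommendedTierForRam mirrors the source.
enum class ModelTier { GEMMA_4_E4B, GEMMA_4_E2B, QWEN_3_5_0_8B }

fun recommendedTierForRam(ramGb: Double): ModelTier = when {
    ramGb >= 6.0 -> ModelTier.GEMMA_4_E4B
    ramGb >= 4.0 -> ModelTier.GEMMA_4_E2B
    else -> ModelTier.QWEN_3_5_0_8B
}

// Hypothetical helper: converts a byte count (e.g. MemoryInfo.totalMem) to GiB.
fun bytesToGb(bytes: Long): Double = bytes / (1024.0 * 1024.0 * 1024.0)

fun main() {
    for (ram in listOf(3.0, 4.0, 5.9, 6.0, 8.0)) {
        println("$ram GB -> ${recommendedTierForRam(ram)}")
    }
    // A device reporting 5_800_000_000 bytes is about 5.4 GiB, so it gets E2B.
    println(recommendedTierForRam(bytesToGb(5_800_000_000L)))
}
```

If the marketed capacity is what should drive the choice, rounding the measured value up before the comparison is one option; the brief does not say which behaviour ChartLite uses.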

2 — Gemma 4 via MediaPipe LiteRT

Gemma 4 ships as INT4-quantized .task bundles via Google AI Edge LiteRT. GemmaBridge.kt wraps the API directly:

val opts = LlmInference.LlmInferenceOptions.builder()
    .setModelPath(taskFile.absolutePath)
    .setMaxTokens(4096)
    .build()
llm = LlmInference.createFromOptions(context, opts)

Models are pulled from huggingface.co/litert-community/gemma-4-E{2,4}B-it-litert-lm on first launch. The bridge gets native chat-template handling, deterministic seeded sampling, and NPU acceleration when present.
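
The "download on first launch" step can be sketched as a fetch-if-absent helper. This is a minimal illustration, not ChartLite's code: `ensureModelFile` and the `download` callback are hypothetical names, and the callback stands in for whatever HTTP client the app actually uses.

```kotlin
import java.io.File

// Hypothetical helper: returns the local model file, fetching it only if absent.
// `download` is a stand-in for the app's real downloader, not a ChartLite API.
fun ensureModelFile(modelDir: File, fileName: String, download: (File) -> Unit): File {
    val target = File(modelDir, fileName)
    if (target.exists() && target.length() > 0L) return target

    modelDir.mkdirs()
    // Write to a temporary file first so an interrupted download never
    // leaves a truncated bundle at the final path.
    val partial = File(modelDir, "$fileName.part")
    download(partial)
    target.delete() // clear any zero-length leftover before the rename
    check(partial.renameTo(target)) { "Could not move $partial to $target" }
    return target
}

fun main() {
    val dir = File(System.getProperty("java.io.tmpdir"), "chartlite-models-demo")
    var downloads = 0
    val fakeDownload: (File) -> Unit = { f -> downloads++; f.writeText("stub model bytes") }

    ensureModelFile(dir, "model.task", fakeDownload) // first launch: downloads
    ensureModelFile(dir, "model.task", fakeDownload) // later launches: reuses the file
    println("downloads = $downloads")
    dir.deleteRecursively()
}
```

The returned file's `absolutePath` is what a call such as `setModelPath` would then receive.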

3 — Function calling, on-device-flavour

The cloud Gemma family has a native function-calling API. The on-device variant via MediaPipe does not, so we adapt: CdssToolRegistry.kt asks Gemma 4 to emit a JSON array of {name, args} tool calls, and the dispatcher parses them and executes each one deterministically against the existing StaticCDSS layer. Four tools are registered against the BODHI knowledge graph:

check_drug_drug_interactions(meds: string[])
check_drug_allergy(meds: string[], allergies: string[])
check_drug_condition(meds: string[], diagnoses: string[])
check_triage_urgency(diagnoses: string[])

Tool selection is reliable on E4B and reasonable on E2B. In the demo's clinical-encounter segment, the model chooses which two tools to invoke after seeing a prescription photo plus patient context. BODHI's triage table then sees a 4-year-old with pneumonia, a respiratory rate of 40 (alarming for that age) and an oxygen saturation of 94% (low), and escalates to EMERGENCY. The language model on its own missed the case because it never combined the three numbers.
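
A dispatcher of the kind described in this section can be sketched as a lookup from tool name to handler. The sketch assumes the JSON array has already been parsed into `ToolCall` values (for example with `org.json`, which ships with Android). `ToolCall`, `ToolResult`, `ToolSpec`, `stub` and the handler bodies are hypothetical; only the four tool names and their argument names come from the list above.

```kotlin
// Sketch of a name -> handler dispatcher. All types and handler bodies are
// hypothetical; real handlers would query StaticCDSS / BODHI.
data class ToolCall(val name: String, val args: Map<String, List<String>>)
data class ToolResult(val tool: String, val ok: Boolean, val detail: String)
class ToolSpec(val requiredArgs: List<String>, val run: (Map<String, List<String>>) -> String)

// Placeholder handler: reports what it received instead of consulting a knowledge graph.
fun stub(label: String): (Map<String, List<String>>) -> String =
    { args -> "$label on ${args.values.sumOf { it.size }} item(s)" }

val registry: Map<String, ToolSpec> = mapOf(
    "check_drug_drug_interactions" to ToolSpec(listOf("meds"), stub("interaction check")),
    "check_drug_allergy" to ToolSpec(listOf("meds", "allergies"), stub("allergy check")),
    "check_drug_condition" to ToolSpec(listOf("meds", "diagnoses"), stub("condition check")),
    "check_triage_urgency" to ToolSpec(listOf("diagnoses"), stub("triage check")),
)

fun dispatchOne(call: ToolCall): ToolResult {
    // Reject names the model invented rather than guessing a nearby tool.
    val spec = registry[call.name]
        ?: return ToolResult(call.name, ok = false, detail = "unknown tool")
    val missing = spec.requiredArgs.filterNot { it in call.args }
    if (missing.isNotEmpty()) {
        return ToolResult(call.name, ok = false, detail = "missing args: $missing")
    }
    return ToolResult(call.name, ok = true, detail = spec.run(call.args))
}

fun dispatch(calls: List<ToolCall>): List<ToolResult> = calls.map { dispatchOne(it) }

fun main() {
    val calls = listOf(
        ToolCall("check_drug_allergy", mapOf("meds" to listOf("amoxicillin"), "allergies" to listOf("penicillin"))),
        ToolCall("check_triage_urgency", mapOf("diagnoses" to listOf("pneumonia"))),
        ToolCall("order_mri", emptyMap()), // a tool name that is not registered
    )
    dispatch(calls).forEach { println(it) }
}
```

Keeping validation in the dispatcher means a malformed or invented call from the model is rejected with a reason instead of reaching the rules layer.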

4 — BODHI honest audit

The dashboard's three-arm safety design (LLM-alone / production rules / rules + BODHI) shows a gross +26–63 pp lift in safety detection across 12 models when BODHI is wired in. We audited that lift on the 35 dangers that GPT-5.5 missed and Arm 3 caught:

~40 % genuine clinical catch: the LLM never raised the danger (e.g. combining a pneumonia diagnosis with abnormal vital signs into a single EMERGENCY triage; the model had each fact but didn't act on the combination).
~60 % substring-match scoring artefact: the LLM said the equivalent in different words, and the substring scorer didn't credit it.
Separately, ~19 % of BODHI alerts (25 / 134) are false positives: a drug flagged as unindicated when it actually was indicated (9 / 134), or a fuzzy referral match firing the wrong rule (16 / 134, e.g. "Chikungunya" suggested on any febrile case).
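
The substring-match artefact is easy to reproduce in miniature. The scorer and both responses below are invented for illustration and are not taken from the benchmark; they only show how a verbatim match can miss a correct paraphrase.

```kotlin
// Hypothetical substring scorer: credits a response only if it contains
// the expected phrase verbatim (case-insensitive).
fun substringCredit(response: String, expected: String): Boolean =
    response.contains(expected, ignoreCase = true)

fun main() {
    val expected = "drug-drug interaction"
    val verbatim = "Flag: drug-drug interaction between warfarin and aspirin."
    val paraphrase = "Warfarin combined with aspirin raises bleeding risk; avoid co-prescribing."

    println(substringCredit(verbatim, expected))   // true
    println(substringCredit(paraphrase, expected)) // false, although the danger was raised
}
```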

Net of artefact and noise, BODHI makes a real clinical-safety contribution, largest where it matters most: Qwen 3.5 0.8B goes 3 → 66 %, Gemma 4 E4B 30 → 57 %, Opus 4.7 38 → 80 %. Sonnet 4.6 gains the least: top cloud models already score near the ceiling, so BODHI has less to add.
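
The quoted figures can be cross-checked with a few lines of arithmetic. The sketch assumes the left-hand number of each pair is the score without BODHI and the right-hand number the score with it; the `Score` type and `falsePositivePct` are illustrative names, and the inputs are the values quoted in this section.

```kotlin
// Before/after detection rates (%) as quoted in the text.
data class Score(val model: String, val before: Int, val after: Int) {
    val liftPp: Int get() = after - before
}

val quoted = listOf(
    Score("Qwen 3.5 0.8B", 3, 66),
    Score("Gemma 4 E4B", 30, 57),
    Score("Opus 4.7", 38, 80),
)

// Share of alerts that were false positives, in percent.
fun falsePositivePct(unindicated: Int, fuzzyReferral: Int, totalAlerts: Int): Double =
    (unindicated + fuzzyReferral) * 100.0 / totalAlerts

fun main() {
    quoted.forEach { println("${it.model}: +${it.liftPp} pp") }   // +63, +27, +42
    println(Math.round(falsePositivePct(9, 16, 134) * 10) / 10.0) // 18.7
}
```

All three lifts fall inside the +26–63 pp range reported for the 12 models, with Qwen 3.5 0.8B at the upper bound.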

The implication is the opposite of the marketing default: BODHI is not a substitute for a generative model but a deterministic safety net under clinician judgement. ChartLite renders alerts with severity tiers and an audit trail in the standard medical-terminology system (SNOMED); the clinician decides.

5 — Multilingual by default

Three layers cover the language ladder:

Layer	Coverage	Source
Gemma 4 reasoning	140+ languages	Google model card
Parakeet TDT v3 speech-to-text	26 (English + 25 EU) at 1.69 % word-error rate	NVIDIA, on-device via Sherpa-ONNX
Omnilingual ASR	1,600+ languages	Meta, on-device via ONNX Runtime

ChartLite's ModelDownloader.rankTiersForDevice(language) picks the right speech model per (language × device RAM). The Eka Calculator dataset is Hindi-English code-switched clinical prose (the way Indian clinicians actually talk in the consult room) — a real test of multilingual ability, and Gemma 4 handles it natively without translation.
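
A per-language, per-device ranking of this kind might look like the sketch below. It is not the body of `rankTiersForDevice`: the `SpeechModel` enum, the `rankSpeechModels` function, the preference for Parakeet when it covers the language, and the `fitsInRam` predicate are all assumptions, and the language set is a small illustrative subset rather than Parakeet's full list.

```kotlin
enum class SpeechModel { PARAKEET_TDT_V3, OMNILINGUAL_ASR }

// Illustrative subset only; the full Parakeet language list is not reproduced here.
val parakeetLanguages = setOf("en", "de", "fr", "es", "it")

// Hypothetical ranking: prefer Parakeet when it covers the language, keep
// Omnilingual ASR as the broad-coverage option, then drop anything that does
// not fit on the device. The RAM check is passed in as a predicate because
// the brief gives no memory thresholds for the speech models.
fun rankSpeechModels(language: String, fitsInRam: (SpeechModel) -> Boolean): List<SpeechModel> {
    val byCoverage =
        if (language.lowercase() in parakeetLanguages)
            listOf(SpeechModel.PARAKEET_TDT_V3, SpeechModel.OMNILINGUAL_ASR)
        else
            listOf(SpeechModel.OMNILINGUAL_ASR)
    return byCoverage.filter(fitsInRam)
}

fun main() {
    println(rankSpeechModels("de") { true }) // both models, Parakeet first
    println(rankSpeechModels("hi") { true }) // Omnilingual ASR only
}
```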

6 — Reproduce every number

Each of the ~54 K model-question evaluations on the dashboard (12 models × 6 benchmarks) traces to a per-(model, case) JSON file preserved in scripts/{benchmark}_raw/. A judge can:

git clone https://github.com/prismindanalytics/chartlite             # app + integration
git clone https://github.com/prismindanalytics/clinical-edge-bench   # the benchmark suite
cd clinical-edge-bench   # assumption: scripts/ lives in the benchmark repo
pip install -r scripts/requirements.txt
python3 scripts/benchmark_pharmacology_mcqa.py --models gemma4-e4b --limit 50

…and pull any number off the dashboard with curl-able raw JSON. If a single number doesn't reproduce, that's a bug — please open an issue.

What's deliberately not on the dashboard

We don't claim ChartLite is deployed in any production clinic. It is production-ready and awaiting a clinical pilot. We have country configurations (medical codes, formulary, language packs) for South Africa, Ethiopia, Malawi, Kenya, Nigeria, US, UK, and India; these are configurations, not deployments.
The 100-encounter synthetic safety benchmark is a directional research signal, not a clinical certification. The "ground truth" answers it grades against were themselves generated by an LLM panel (Opus 4.7 + GPT-5.4 + Gemini 3.1 Pro), two of which also appear among the models we test. We declare this openly in the Methodology tab.
One narrow scoring quirk: BODHI groups lab tests into categories (e.g. "liver-function tests"), so a model that recommends a more specific lab by name sometimes won't get credit. Affects ~1–2 % of total dangers.

Where to look next

Live demo + APK: https://chartlite.health/hackathon
Source code: https://github.com/prismindanalytics/chartlite (Apache 2.0)
Benchmark dashboard: https://benchmark.chartlite.health — every number above is reachable here.
Submission video: see Kaggle Media Gallery.

Apache 2.0, except where BODHI's CC BY-NC 4.0 applies. Built on Google's Gemma 4, MediaPipe LLM Inference, NVIDIA Parakeet TDT v3, Meta Omnilingual ASR, Sherpa-ONNX, llama.cpp, and Eka Care's BODHI clinical knowledge graph.