The full submission writeup

Repo: github.com/prismindanalytics/chartlite (Apache 2.0)
Independent benchmark: benchmark.chartlite.health
APK: v1.0.0-hackathon release
Live demo: chartlite.health/hackathon
3-minute demo: youtu.be/zBWnh5FDVkw

Tracks entered: Main · Impact (Health & Sciences primary; Safety & Trust also fits) · Special Technology (LiteRT primary: we ship Gemma 4 through Google AI Edge’s MediaPipe LLM Inference API on LiteRT. Cactus also fits: hardware-aware tier routing, plus Gemma function calls that pick BODHI tools by artifact type).

ChartLite is an Android app. A clinician records a visit by voice or photo and walks away with a structured clinical note, a billing claim, and a real-time safety check on every drug and diagnosis — running entirely on the phone, no internet.

The 3-minute demo: a mother brings her 4-year-old daughter to a rural clinic with fever and cough. The doctor dictates the encounter and prescribes amoxicillin syrup. The patient is allergic to penicillin. Gemma 4 — running on the phone — catches the error before the prescription leaves the room. Then the same alert fires from a different input: a handwritten prescription captured on camera. Same model, two modalities, one safety net.

1. The problem

Frontline clinical workers spend a third of their day on paperwork. The places that need clinical AI most have the least connectivity, and most “clinical AI” tools assume a cloud call per visit — which fails on rural 2G.

ChartLite runs the entire visit on a mid-range Android phone with the data plan off: voice transcription, structured note, safety check against a real medical knowledge graph, encrypted SMS relay.

2. We benchmarked before we shipped

A 3% hallucination rate on dosages is a coroner’s inquest, not a rounding error. So we built the benchmark first.

We tested 12 models (6 cloud: Claude Opus / Sonnet / Haiku 4.x and GPT-5.5 / 5.4 / 4.1; 6 on-device: Qwen 3.5 0.8B / 2B / 9B, Gemma 4 e2b / e4b, and MedGemma 1.5) across six clinical evals:

100 synthetic visits.
156 real doctor-patient transcripts annotated by clinicians at Eka Care (India’s largest digital-health platform, 80M+ patient records).
ACI-Bench: a peer-reviewed benchmark for turning doctor-patient dialogue into the SOAP note clinicians write after every visit (Subjective, Objective, Assessment, Plan); 207 transcripts across 5 splits.
CRESCENDDI: a public dataset of dangerous drug-drug interactions.
NFI Pharmacology MCQA: a 925-question multiple-choice pharmacology exam from India’s National Formulary.
Medical Calculator Eval: 1,066 clinical-math vignettes across 26 specialties.

Per-model, per-case JSON, the scoring code, and the methodology are all open at benchmark.chartlite.health.

Two findings shaped what we shipped:

Gemma 4 e4b is the first on-device model good enough to ship for clinical note generation. On ACI-Bench (scored 0–100) it lands at 82.74, tied with Claude Haiku 4.5 (82.72) and 6 points behind the GPT-5.5 leader (88.48). On 156 real Eka transcripts it scores 82.6 — within 5 points of Haiku 4.5. Zero API cost, runs entirely on the phone. Where it lags — pharmacology MCQ and drug-dose math — ChartLite delegates to BODHI instead of asking the model.

BODHI is a medical knowledge graph from Eka Care (779 diagnoses, 1,186 drugs, 812 lab tests, 10,352 symptom-to-diagnosis links), published as open data for non-commercial use. It is a deterministic lookup: if a prescribed drug matches an allergy on the patient record, the rule fires. When we ran every model with BODHI as an additional safety check, raw alert detection rose by 26–63 percentage points. An honest audit of that lift: ~40% is a real catch the model would otherwise miss, and ~60% is scoring artifact (the model said the right thing in different words and our text-match scorer didn’t credit it). Separately, ~20% of BODHI’s own alerts are false positives (e.g. a fuzzy match firing the wrong rule). Even after discounting artifact and noise, the value lands where it matters most: the small on-device models that frontline clinics actually run. In raw alert-detection terms, Qwen 3.5 0.8B goes from 3% to 66% and Gemma 4 e4b from 30% to 57%. Top cloud models, already near the ceiling, gain least. BODHI is a safety net under clinician judgement, not a substitute.
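The deterministic lookup described above can be sketched as follows. This is a minimal illustration, not BODHI’s actual schema or ChartLite’s code: the `DRUG_CLASSES` table, the function names, and the exact-match normalisation are all hypothetical. It reproduces the demo case, where amoxicillin is caught against a penicillin allergy through its drug class.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical, hand-written slice of a drug knowledge graph (drug -> classes).
# BODHI's real schema is not described in the writeup; these entries are illustrative.
DRUG_CLASSES: dict[str, set[str]] = {
    "amoxicillin": {"penicillin", "beta-lactam"},
    "azithromycin": {"macrolide"},
    "paracetamol": {"analgesic"},
}


@dataclass(frozen=True)
class Alert:
    drug: str
    allergy: str
    reason: str


def _norm(term: str) -> str:
    """Exact-match normalisation: lowercase and trim whitespace, no fuzzy matching."""
    return term.strip().lower()


def check_allergies(prescribed: list[str], allergies: list[str]) -> list[Alert]:
    """Return one alert per (drug, allergy) hit, by drug name or by drug class."""
    allergy_set = {_norm(a) for a in allergies}
    alerts: list[Alert] = []
    for drug in prescribed:
        name = _norm(drug)
        if name in allergy_set:
            alerts.append(Alert(name, name, "direct match"))
        # Class-level rule: an allergy to a class covers every drug in that class.
        for cls in sorted(DRUG_CLASSES.get(name, set())):
            if cls in allergy_set:
                alerts.append(Alert(name, cls, f"{name} is a {cls}"))
    return alerts


print(check_allergies(["Amoxicillin"], ["Penicillin"]))
# prints [Alert(drug='amoxicillin', allergy='penicillin', reason='amoxicillin is a penicillin')]
```

The sketch deliberately matches exactly, so "Amoxicillin syrup" would not fire until it is normalised to a known drug name. That trades recall for precision, the opposite end of the fuzzy-match false positives described above.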

3. How ChartLite works

Voice → text: NVIDIA Parakeet TDT v3 (1.69% word-error rate on English) on 4 GB+ phones; Meta Omnilingual ASR (1,600+ languages) on smaller devices.
Note generation: Gemma 4 e2b or e4b on-device, or Claude Sonnet over the network — same downstream code.
Storage: records kept in HL7 FHIR (the international medical-records interoperability standard) in an on-device database encrypted with AES-256-GCM, the same algorithm online banking uses.
Safety checks: drug allergies, drug-drug interactions, dose-by-weight, vitals out of range — plus BODHI for diagnosis-level checks.
Encrypted SMS relay: a whole visit packed into one 160-character SMS, for clinics on 2G.
Billing: ICD-10 diagnosis codes plus CPT / HCPCS procedure and supply codes for insurance claims.
Phone-to-phone sync over Bluetooth or local WiFi when offline.
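Two of the rule-based checks in the list above, dose-by-weight and vitals out of range, reduce to simple arithmetic and range tests. The sketch below uses hypothetical names (`dose_alert`, `vital_alerts`) and placeholder limits (`drug_x`, the SpO2 and temperature bounds) that are not clinical reference values.

```python
from __future__ import annotations

# Placeholder limits for illustration only. These are NOT clinical reference
# values; a real deployment would load them from a vetted formulary.
MAX_MG_PER_KG_PER_DAY: dict[str, float] = {"drug_x": 50.0}
VITAL_RANGES: dict[str, tuple[float, float]] = {
    "spo2_pct": (94.0, 100.0),
    "temp_c": (36.0, 38.0),
}


def dose_alert(drug: str, dose_mg: float, doses_per_day: int, weight_kg: float) -> str | None:
    """Flag a total daily dose above the drug's per-kg ceiling.

    Returns None when the dose is within the limit or the drug has no rule.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    limit = MAX_MG_PER_KG_PER_DAY.get(drug)
    if limit is None:
        return None
    daily_mg_per_kg = dose_mg * doses_per_day / weight_kg
    if daily_mg_per_kg > limit:
        return f"{drug}: {daily_mg_per_kg:.1f} mg/kg/day exceeds {limit} mg/kg/day"
    return None


def vital_alerts(vitals: dict[str, float]) -> list[str]:
    """Return one message per vital sign that falls outside its (low, high) range."""
    out: list[str] = []
    for name, value in vitals.items():
        bounds = VITAL_RANGES.get(name)
        if bounds is not None and not (bounds[0] <= value <= bounds[1]):
            out.append(f"{name}={value} outside {bounds[0]}-{bounds[1]}")
    return out
```

For a 16 kg child, 400 mg three times a day is 1,200 / 16 = 75 mg/kg/day, which trips the placeholder 50 mg/kg/day ceiling; 250 mg three times a day (46.9 mg/kg/day) does not.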

4. Multimodal capture: 8 artifacts, one button

Clinicians work with paper. One “📷 Capture clinical artifact” button sends a photo to Gemma 4’s vision input and pipes the extracted fields into the BODHI safety check:

Artifact	What gets extracted
Pill bottle / medication package	Drug, dose, route, frequency, expiry, manufacturer
Lab report	Test, value, unit, reference range, flag
Rapid diagnostic test (malaria / HIV / COVID)	Test type, result, visible bands
Vital-signs device (BP cuff, pulse oximeter)	Vital, value, unit
Referral letter	Sending facility, diagnosis, reason, urgency
Vaccine card / Yellow Card	Vaccine, date, dose number, batch, route
Handwritten prescription	Drug, dose, route, frequency, duration
Discharge summary	Diagnosis, meds, follow-up, alerts
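A structured extraction is only useful if it is complete. One way to enforce the table above is to ask the vision model for JSON and validate it before the safety check runs. The sketch assumes JSON output and hypothetical snake_case field names; the writeup does not specify ChartLite’s actual prompt or schema.

```python
from __future__ import annotations

import json

# Required fields per artifact type, transcribed from the table above.
# The snake_case keys are hypothetical; the writeup does not publish its schema.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "medication_package": ("drug", "dose", "route", "frequency", "expiry", "manufacturer"),
    "lab_report": ("test", "value", "unit", "reference_range", "flag"),
    "rapid_diagnostic_test": ("test_type", "result", "visible_bands"),
    "vital_signs_device": ("vital", "value", "unit"),
    "referral_letter": ("sending_facility", "diagnosis", "reason", "urgency"),
    "vaccine_card": ("vaccine", "date", "dose_number", "batch", "route"),
    "handwritten_prescription": ("drug", "dose", "route", "frequency", "duration"),
    "discharge_summary": ("diagnosis", "meds", "follow_up", "alerts"),
}


def parse_extraction(artifact_type: str, model_output: str) -> dict:
    """Parse the vision model's JSON, keeping only the required fields.

    Raises KeyError for an unknown artifact type and ValueError (which
    json.JSONDecodeError subclasses) for malformed or incomplete output, so
    the caller can re-prompt or fall back to manual entry.
    """
    fields = REQUIRED_FIELDS[artifact_type]
    data = json.loads(model_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in fields if f not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return {f: data[f] for f in fields}
```

Extra keys are dropped and missing ones raise, so a half-read label never reaches the allergy check as if it were complete.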

Tap → photo → Gemma 4 reads the image → Gemma 4 picks which safety checks to run (drug-drug? drug-allergy? wrong drug for condition? emergency triage?) → the result renders with structured fields, the tool calls Gemma chose, and any alerts.
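The tool-selection step above can be sketched as a registry-gated dispatcher: the model proposes calls, only registered safety checks with matching arguments actually run, and every decision is kept in a trace for the UI. The JSON call format, the tool names, and the stub checks are assumptions for illustration, not Gemma’s or ChartLite’s actual function-calling format.

```python
from __future__ import annotations

import inspect
import json
from typing import Callable

# Placeholder lookup so the allergy stub can fire; not real clinical data.
ALLERGY_CLASS: dict[str, str] = {"amoxicillin": "penicillin"}


def check_drug_allergy(drugs: list[str], allergies: list[str]) -> list[str]:
    return [f"{d}: allergy to {ALLERGY_CLASS[d]}" for d in drugs
            if d in ALLERGY_CLASS and ALLERGY_CLASS[d] in allergies]


# The other three checks are stubs here; each would query the knowledge graph.
def check_drug_drug(drugs: list[str]) -> list[str]:
    return []


def check_drug_condition(drugs: list[str], diagnosis: str) -> list[str]:
    return []


def check_emergency_triage(findings: list[str]) -> list[str]:
    return []


# Only these four names can ever execute, whatever the model asks for.
TOOLS: dict[str, Callable[..., list[str]]] = {
    "check_drug_allergy": check_drug_allergy,
    "check_drug_drug": check_drug_drug,
    "check_drug_condition": check_drug_condition,
    "check_emergency_triage": check_emergency_triage,
}


def run_tool_calls(model_output: str) -> tuple[list[dict], list[str]]:
    """Run model-proposed calls of the form [{"name": ..., "arguments": {...}}].

    Returns (trace, alerts); the trace records every proposed call and whether it ran.
    """
    trace: list[dict] = []
    alerts: list[str] = []
    for call in json.loads(model_output):
        name = call.get("name")
        fn = TOOLS.get(name)
        if fn is None:
            trace.append({"name": name, "status": "rejected: unknown tool"})
            continue
        try:
            # Check argument names against the tool's signature before calling it.
            bound = inspect.signature(fn).bind(**call.get("arguments", {}))
        except TypeError as exc:
            trace.append({"name": name, "status": f"rejected: {exc}"})
            continue
        result = fn(*bound.args, **bound.kwargs)
        trace.append({"name": name, "status": "ok", "alerts": len(result)})
        alerts.extend(result)
    return trace, alerts
```

Rejecting unknown names instead of guessing keeps the model from invoking anything outside the four checks, and the trace is what a result screen could render as the tool calls the model chose.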

This is how a few-billion-parameter on-device model becomes safe enough for a clinical setting: multimodal perception, structured tool calls, and deterministic medical lookups.

5. Deployment readiness

Privacy. Voice and model run on the phone. Nothing leaves the device unless the clinician syncs. The SMS relay encrypts each visit with AES-256-GCM (the same algorithm online banking uses).
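Fitting an encrypted visit into one SMS is mostly a byte budget. Assuming the ciphertext travels as base64 in a single 160-character GSM-7 text message, with a 12-byte GCM nonce and a 16-byte tag attached, the plaintext can be at most 92 bytes. The sketch below checks that budget. The record layout in `pack_visit` is hypothetical, and the code does not encrypt anything itself (Python’s standard library has no AES); it relies on the fact that GCM ciphertext is the same length as its plaintext.

```python
from __future__ import annotations

import struct

NONCE_BYTES = 12  # standard 96-bit GCM nonce
TAG_BYTES = 16    # full 128-bit GCM authentication tag
SMS_CHARS = 160   # one GSM-7 text SMS (the base64 alphabet is all GSM-7)


def base64_len(n_bytes: int) -> int:
    """Length of padded base64 output: 4 characters per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


def sealed_sms_len(plaintext_len: int) -> int:
    """Characters needed for nonce + ciphertext (same length as plaintext) + tag."""
    return base64_len(NONCE_BYTES + plaintext_len + TAG_BYTES)


def pack_visit(patient_id: int, unix_ts: int, weight_dkg: int,
               temp_dc: int, spo2_pct: int, codes: list[int]) -> bytes:
    """Hypothetical compact layout: a 13-byte big-endian header
    (id, timestamp, weight in 0.1 kg, temperature in 0.1 °C, SpO2 %)
    followed by 16-bit diagnosis/drug codes."""
    header = struct.pack(">IIHHB", patient_id, unix_ts, weight_dkg, temp_dc, spo2_pct)
    return header + struct.pack(f">{len(codes)}H", *codes)


visit = pack_visit(123456, 1_700_000_000, 162, 385, 97, [1, 2, 3])
print(len(visit), sealed_sms_len(len(visit)))  # prints 19 64
```

Under these assumptions the 13-byte header leaves room for 39 two-byte codes (91 bytes, sealing to exactly 160 characters). Sending raw bytes in an 8-bit binary SMS (140 octets) instead of base64 text would raise the plaintext ceiling to 112 bytes.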

Footprint. Gemma 4 e2b in 4-bit quantization is ~1.6 GB and runs on 2 GB-RAM phones; the e4b variant is ~3.1 GB and needs 4 GB. The app ships both and picks one by device RAM at install. The submission demo runs e2b on a Galaxy Z Fold7, where vision extraction plus the safety-tool trace takes 30–50 s end-to-end.
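Install-time model selection can be sketched as a threshold function over total RAM, combining the thresholds given here with the ASR split from section 3 (Parakeet on 4 GB+ phones, Omnilingual ASR below that). The model ID strings and the refusal below 2 GB are assumptions; the writeup does not say what happens on smaller devices. On Android, the input would typically come from `ActivityManager.MemoryInfo.totalMem`.

```python
from __future__ import annotations

from dataclasses import dataclass

GB = 1024 ** 3


@dataclass(frozen=True)
class Tier:
    asr: str  # hypothetical model IDs, for illustration only
    llm: str


def pick_tier(total_ram_bytes: int) -> Tier:
    """Choose on-device models from total RAM using the writeup's thresholds.

    4 GB and up: Parakeet TDT v3 + Gemma 4 e4b (~3.1 GB, needs 4 GB).
    2 GB up to 4 GB: Omnilingual ASR + Gemma 4 e2b (~1.6 GB, runs on 2 GB).
    """
    if total_ram_bytes >= 4 * GB:
        return Tier(asr="parakeet-tdt-v3", llm="gemma-4-e4b")
    if total_ram_bytes >= 2 * GB:
        return Tier(asr="omnilingual-asr", llm="gemma-4-e2b")
    raise ValueError("below the assumed 2 GB minimum for on-device inference")


print(pick_tier(3 * GB))  # prints Tier(asr='omnilingual-asr', llm='gemma-4-e2b')
```

Phones sold as 4 GB often report somewhat less than 4 GiB of total memory because the kernel and firmware reserve part of it, so a production cutoff would likely need a margin below the nominal figure.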

Governance. BODHI is © Eka Care, CC BY-NC 4.0, so commercial deployment needs an Eka license. Everything else (benchmark, methodology, app) is Apache 2.0. ChartLite is production-ready and awaiting a clinical pilot.

6. What we built for this hackathon

Gemma 4 vision wired into the model dispatcher; one session handles text and image.
Three new artifact prompts (vaccine card, handwritten Rx, discharge summary) on top of five existing.
Function-calling refactor so Gemma 4 picks which of four safety checks to run per artifact.
Universal capture button on the encounter screen.
Default on-device model flipped to Gemma 4 based on the benchmark.

Every dashboard number links to its raw JSON. The Methodology tab declares the synthetic-data and self-bias soft spots and records exact API model IDs. If a judge wants to disprove a single number, the raw data is there.

ChartLite is independent work. BODHI © Eka Care, CC BY-NC 4.0. Built on MediaPipe LLM Inference and Google’s Gemma 4. Apache 2.0 except where the BODHI license applies.