Intelligence that fits in your pocket.
A complete clinical AI pipeline — speech recognition, structured extraction, and decision support — running entirely on a $100 phone.
End-to-end pipeline
From the clinician's voice to a structured clinical encounter in six stages, all on-device. The structured output feeds directly into the billing engine for automated ICD-10 to CPT/HCPCS claim generation and SOAP note production.
Omnilingual speech recognition
Meta Omnilingual ASR — 1600+ languages, CTC architecture, fully offline via ONNX Runtime. Supports all ChartLite target languages (Zulu, Xhosa, Amharic, Chichewa, English, and more).
| Tier | Model | Quantization | Size | Target |
|---|---|---|---|---|
| LITE | Omnilingual ASR 300M | INT8 | 365 MB | Galaxy A03/A04 (<4 GB RAM) |
| STANDARD | Omnilingual ASR 1B | INT8 | 1.03 GB | Mid-range+ (4+ GB RAM) |
Dual-mode ASR: Omnilingual ONNX on-device when offline, Google Speech-to-Text when connected. Automatic fallback ensures voice capture always works. Apache 2.0 licensed.
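The fallback rule above can be sketched as a tiny backend selector. This is illustrative only — `AsrBackend`, `select_asr_backend`, and the health-check flag are hypothetical names, not the app's actual API:

```python
from enum import Enum

class AsrBackend(Enum):
    CLOUD_STT = "google_speech_to_text"    # online path
    ONNX_OMNILINGUAL = "omnilingual_onnx"  # offline path

def select_asr_backend(network_available: bool, cloud_healthy: bool) -> AsrBackend:
    """Prefer cloud Speech-to-Text when connected; otherwise fall back to
    the on-device Omnilingual ONNX model so voice capture always works."""
    if network_available and cloud_healthy:
        return AsrBackend.CLOUD_STT
    return AsrBackend.ONNX_OMNILINGUAL
```

Because the on-device branch is the default when either check fails, a dropped connection mid-consultation degrades gracefully instead of blocking capture.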
Retrieval-augmented extraction
Instead of stuffing 815 reference entries into every prompt, we retrieve only what's relevant.
Index
At app startup, a TF-IDF vector store indexes 300 ICD-10 codes + 515 formulary drugs (~20–50 ms).
Retrieve
Per transcript, cosine similarity finds the 10–15 most relevant codes and drugs.
Prompt
Compact prompt: instructions + retrieved references + transcript.
Generate
Qwen 3.5 runs extraction with roughly 80% more of the context window free for the transcript and generated output.
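The Index and Retrieve stages can be sketched with a pure-Python TF-IDF and cosine similarity. This is a minimal sketch of the technique, not the app's vector store — `tokenize`, `tf_idf_vectors`, `retrieve`, and the sample entries are illustrative:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf_vectors(docs):
    """Index stage: build TF-IDF vectors over the reference entries."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for counts in tokenized:
        df.update(counts.keys())
    n = len(docs)
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in cs.items()} for cs in tokenized], idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(transcript, docs, k=15):
    """Retrieve stage: rank reference entries against the transcript."""
    vecs, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(transcript)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]
```

Only the top-k entries are then pasted into the compact prompt, which is what frees up context-window budget for the transcript and the generation.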
| Component | Before (static prompt) | After (RAG pipeline) |
|---|---|---|
| Reference data | ~6,000 tokens (815 entries) | ~400–800 tokens (15–25 entries) |
| Available for transcript | ~700 tokens | ~5,000+ tokens |
| Available for generation | ~1,000 tokens | ~2,000+ tokens |
| Disambiguation quality | Low (no keywords/aliases) | High (retrieved entries include keywords + local terms) |
Unified JSON extraction format
A single JSON schema shared by both cloud (Claude) and on-device (Qwen 3.5) extractors — consistent output regardless of inference path.
```json
{
  "diagnoses": [
    {
      "icd10Code": "J06.9",
      "description": "Upper resp. infection",
      "isPrimary": true,
      "confidence": 0.9
    }
  ],
  "medications": [
    {
      "formularyCode": "0097",
      "name": "Paracetamol",
      "dose": 500,
      "unit": "mg",
      "frequency": "TDS"
    }
  ]
}
```
One schema, every model. Hallucination guards and field validation run identically on cloud and on-device results. The structured JSON output feeds directly into the billing module for automated insurance claim generation (ICD-10 to CPT/HCPCS mapping, E/M level coding) and SOAP note production.
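A minimal sketch of how hallucination guards and field validation can run identically on both paths: anything whose code isn't in the local reference data is dropped. The function name and the known-code sets are assumptions for illustration, not the app's actual implementation:

```python
def validate_extraction(result, known_icd10, known_formulary):
    """Hallucination guard: keep only diagnoses and medications whose
    codes exist in the local reference data, with basic field checks.
    Runs on cloud (Claude) and on-device (Qwen 3.5) output alike."""
    diagnoses = [
        d for d in result.get("diagnoses", [])
        if d.get("icd10Code") in known_icd10
        and 0.0 <= d.get("confidence", 0.0) <= 1.0
    ]
    medications = [
        m for m in result.get("medications", [])
        if m.get("formularyCode") in known_formulary
        and isinstance(m.get("dose"), (int, float))
    ]
    return {"diagnoses": diagnoses, "medications": medications}
```

Because both extractors emit the same schema, this guard needs no model-specific branches — a fabricated ICD-10 code is filtered the same way whichever model produced it.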
Dictation-first for on-device
Short structured snippets instead of full conversation recording — optimized for small language models.
Clinician presses mic
Dictates "BP 168 over 98, pulse 92" (~5–30 seconds)
On-device ASR transcribes instantly
Meta Omnilingual ASR (1600+ languages) runs in real-time on device via ONNX
Regex extraction provides immediate preview
No model load required — instant structured feedback
Snippets accumulate throughout consultation
Each dictation adds to the encounter transcript
Single LLM pass at finalization
All snippets processed together for a coherent structured encounter
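The instant regex preview in step 3 can be sketched like this for the vitals example above. The patterns and field names are illustrative, assuming short structured dictation phrases like "BP 168 over 98, pulse 92":

```python
import re

VITALS_PATTERNS = {
    # matches "BP 168 over 98" or "BP 168/98"
    "bp": re.compile(r"\bbp\s+(\d{2,3})\s*(?:over|/)\s*(\d{2,3})", re.I),
    "pulse": re.compile(r"\bpulse\s+(\d{2,3})", re.I),
    "temp": re.compile(r"\btemp(?:erature)?\s+(\d{2}(?:\.\d)?)", re.I),
}

def preview_vitals(snippet):
    """Instant structured preview of a dictated snippet.
    No model load required; the LLM pass confirms at finalization."""
    out = {}
    if m := VITALS_PATTERNS["bp"].search(snippet):
        out["systolic"], out["diastolic"] = int(m.group(1)), int(m.group(2))
    if m := VITALS_PATTERNS["pulse"].search(snippet):
        out["pulse"] = int(m.group(1))
    if m := VITALS_PATTERNS["temp"].search(snippet):
        out["temperature"] = float(m.group(1))
    return out
```

The preview is deliberately shallow — it gives the clinician immediate feedback on each snippet, while the single LLM pass at finalization resolves everything the regexes can't.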
Why dictation mode?
- 0.8B models can't reliably parse full patient-doctor conversations
- Short structured phrases produce clean, extractable text
- Regex preview gives instant feedback — LLM confirms at the end
- Single model load saves battery (2–3W sustained during inference)
Battery-conscious processing
Model loads once for N patients instead of N times.
| Trigger | Behavior |
|---|---|
| Manual | Clinician taps "Process Queue" during a break |
| Urgent | Immediate single extraction for referral/emergency |
| End of session | Process remaining queue before closing |
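The three triggers can be sketched as a queue that pays the model-load cost once per drain instead of once per patient. `ExtractionQueue` and its callbacks are hypothetical names for illustration:

```python
from collections import deque

class ExtractionQueue:
    """Battery-conscious batching: load the LLM once for N encounters."""

    def __init__(self, load_model, run_extraction):
        self._load_model = load_model  # expensive: done once per drain
        self._run = run_extraction
        self._pending = deque()

    def enqueue(self, encounter, urgent=False):
        if urgent:
            # Urgent trigger (referral/emergency): immediate single extraction.
            model = self._load_model()
            return self._run(model, encounter)
        self._pending.append(encounter)

    def drain(self):
        """Manual ("Process Queue") or end-of-session trigger:
        one model load, then every queued encounter in a single pass."""
        if not self._pending:
            return []
        model = self._load_model()
        results = [self._run(model, e) for e in self._pending]
        self._pending.clear()
        return results
```

The urgent path trades battery for latency; the two batch triggers do the opposite, which is why the normal flow queues encounters rather than extracting immediately.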
Quantized for the edge
| Tier | Model | Quantization | Size | Context Window |
|---|---|---|---|---|
| SMALL | Qwen 3.5 0.8B | Q4_K_M | 560 MB | 32,768 tokens |
| LARGE | Qwen 3.5 2B | Q4_K_M | 1.5 GB | 32,768 tokens |
Hardware-aware selection: 0.8B for 2GB devices, 2B for 4GB+. Both run via llama.cpp built from source.
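The hardware-aware selection reduces to a single RAM threshold, per the tier table above. A minimal sketch — the function name and dict shape are illustrative, and the 4 GB cutoff is taken from the text:

```python
def select_llm_tier(ram_gb: float) -> dict:
    """Pick the quantized model tier from device RAM:
    0.8B on ~2 GB devices, 2B on 4 GB and above."""
    if ram_gb >= 4:
        return {"tier": "LARGE", "model": "Qwen 3.5 2B",
                "quant": "Q4_K_M", "size_mb": 1500}
    return {"tier": "SMALL", "model": "Qwen 3.5 0.8B",
            "quant": "Q4_K_M", "size_mb": 560}
```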
Fits on a budget phone
| Device Class | Example | RAM | ASR | LLM | Total Footprint |
|---|---|---|---|---|---|
| Budget | Galaxy A03 | 2 GB | LITE (365 MB) | Qwen 0.8B (560 MB) | ~950 MB |
| Mid-range | Galaxy A14 | 4 GB | STANDARD (1.03 GB) | Qwen 2B (1.5 GB) | ~2.6 GB |
| High-end | Galaxy A54 | 6+ GB | STANDARD (1.03 GB) | Qwen 2B (1.5 GB) | ~2.6 GB |