When we started building Glance, one of the earliest decisions we made — and one we’ve held to ever since — was to keep large language models out of the clinical data transformation pipeline.
This is a deliberate architectural choice. It’s also increasingly a contrarian one. AI-assisted ETL is everywhere now, and there’s genuine appeal: LLMs are good at pattern matching, they can handle messy inputs, and they reduce the manual effort of schema mapping. We understand the temptation.
But clinical data is different.
When a Glance ETL pipeline maps a source field to an OMOP concept, a regulator can ask: “Why did that value get mapped to LOINC 4548-4 instead of LOINC 17856-6?” The answer has to be a deterministic rule — one we can point to in code, explain to a clinician, and reproduce on demand.
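A rule like that can be as small as a pure function. Here is a minimal illustration; the LOINC distinction is real (4548-4 is HbA1c with no method specified, 17856-6 is HbA1c by HPLC), but the function and its signature are ours, not Glance's production code:

```python
from typing import Optional

# Hypothetical rule, for illustration only (not Glance's actual code).
# LOINC 4548-4  = Hemoglobin A1c/Hemoglobin.total in Blood (method unspecified)
# LOINC 17856-6 = the same analyte, measured by HPLC
def map_hba1c_loinc(method: Optional[str]) -> str:
    """Pure function: identical inputs always return the identical code."""
    if method is not None and method.strip().upper() == "HPLC":
        return "17856-6"
    return "4548-4"
```

Because the rule is a pure function with no model call inside it, the answer to the auditor's question is a line of code, not a prompt transcript.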
An LLM gives a probabilistic answer. A different prompt, a different model version, or a different sampling temperature might give a different result. The output isn’t guaranteed to be the same next week. That’s not acceptable when the downstream use is HCC risk scoring, quality measure computation, or anything that touches a payer contract.
OMOP CDM ships with the Athena vocabulary: 2.4 million standardized concepts covering SNOMED CT, RxNorm, LOINC, ICD-10, HCPCS, CPT, and dozens of other standards. The mapping problem — turning a messy source value into a standard concept — is not an AI problem. It’s a lookup problem with well-defined rules for when there’s no exact match.
We use deterministic concept matching against the Athena vocabulary, with confidence scoring and a stem table architecture. The same input always produces the same OMOP output. Every transformation is traceable to a specific rule. Your data pipeline behaves the same at 2am as it does at 2pm.
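A sketch of what deterministic matching with confidence scoring can look like, under stated assumptions: the dict below stands in for the full Athena concept table, the concept IDs are illustrative, and the rule names and `StemRecord` shape are ours, not Glance's internal schema. The one real convention used is OMOP's `concept_id = 0` for "no matching concept":

```python
from dataclasses import dataclass

# Toy vocabulary standing in for the Athena concept table.
# The concept_id values here are illustrative, not verified Athena IDs.
VOCAB = {
    "hemoglobin a1c": 3004410,
    "body weight": 3025315,
}

@dataclass(frozen=True)
class StemRecord:
    source_value: str
    concept_id: int    # 0 = "no matching concept", per OMOP convention
    confidence: float
    rule: str          # which rule fired, preserved for audit trails

def normalize(s: str) -> str:
    """Lowercase and collapse whitespace; no fuzzy or semantic matching."""
    return " ".join(s.lower().split())

def match_concept(source_value: str) -> StemRecord:
    """Deterministic: the same source_value always yields the same record."""
    if source_value in VOCAB:
        return StemRecord(source_value, VOCAB[source_value], 1.0, "exact")
    norm = normalize(source_value)
    if norm in VOCAB:
        return StemRecord(source_value, VOCAB[norm], 0.9, "normalized")
    return StemRecord(source_value, 0, 0.0, "unmapped")
```

Unmapped values land in the stem record with concept 0 rather than being silently guessed at, so "we don't know" is itself an auditable, reproducible outcome.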
We’re not anti-AI. Glance uses Bayesian modeling for risk stratification and population segmentation. We’ve published on interpretable treatment effect estimation for discharge planning. Predicato, our open-source knowledge graph framework, uses local ML models for entity extraction and hybrid search.
The difference is task fitness. For extracting structured knowledge from unstructured documents, local ML models are appropriate. For transforming structured clinical data between standardized formats, deterministic logic is appropriate. Mixing those up doesn’t make you more sophisticated — it makes your pipeline less reliable.
Every Glance ETL job is:

- Deterministic: the same input always produces the same OMOP output.
- Traceable: every transformation points to a specific rule in code.
- Reproducible: any mapping decision can be rerun and explained on demand.
That’s what clinical data infrastructure needs to be. Not because regulators require it (they do), but because healthcare decisions downstream depend on it.
If you’re evaluating health data platforms and the ETL layer is a black box, ask how the vendor explains a specific mapping decision to a CMS auditor. The answer will tell you a lot.