Semantic Integrity — Case Study Library
3 Levels of Semantic Validation in Patent Translation
Neural Machine Translation optimizes for statistical fluency, not legal accuracy. These case studies document the systematic failure modes that alter patent scope — and the alignment protocols that correct them.
Term-Level Precision
Domain-Specific Terminology & Hallucination Prevention
Generic NMT models optimize for statistical probability, not technical accuracy. They hallucinate plausible-sounding but factually wrong translations by choosing high-frequency terms over domain-correct ones. In polysemous contexts — "current" as water vs. electricity, "soft" as texture vs. mechanical compliance — models default to the statistically dominant meaning, creating semantic errors that alter patent scope.
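One alignment protocol for this failure mode is a post-translation glossary check. A minimal sketch, assuming a hand-curated domain glossary and a simple substring test; the `DOMAIN_GLOSSARY` entries and the `check_terms` helper are illustrative, not a real API:

```python
# Hypothetical domain glossary: English source term -> approved French target
# in an electrical-engineering context. Entries are illustrative only.
DOMAIN_GLOSSARY = {
    "current": "courant",  # electricity, not a water current
    "soft": "souple",      # mechanical compliance, not texture
}

def check_terms(source_terms, translation, glossary):
    """Return (source term, approved target) pairs whose approved target
    is missing from the translation."""
    violations = []
    for term in source_terms:
        approved = glossary.get(term)
        if approved and approved not in translation.lower():
            violations.append((term, approved))
    return violations

violations = check_terms(
    ["current", "soft"],
    "Le courant traverse la couche molle.",
    DOMAIN_GLOSSARY,
)
# "courant" is present; "souple" is not, so "soft" is flagged
```

A production validator would match on lemmas and morphological variants rather than raw substrings, but the principle is the same: the statistically dominant rendering is rejected unless it matches the domain-approved one.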
View Case Studies →
Phrase-Level Accuracy
Multi-Word Technical Expressions
Technical terms often function as indivisible semantic units that lose meaning under word-by-word translation. Generic models break compound expressions into components and translate them independently, destroying the technical relationship that defines the concept. A "gate oxide layer" is a unified semiconductor concept, not three words that can be recombined arbitrarily.
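Compound terms can be protected the same way: treat each multi-word expression as an atomic unit and verify its unified target phrase survives intact. A minimal sketch under that assumption; the `COMPOUND_TERMS` table and `check_compounds` helper are hypothetical names for illustration:

```python
# Hypothetical multi-word terminology table: each compound source term maps
# to one indivisible target phrase. Entries are illustrative only.
COMPOUND_TERMS = {
    "gate oxide layer": "couche d'oxyde de grille",  # one semiconductor concept
}

def check_compounds(source, translation, table):
    """Flag compound terms present in the source whose unified target
    phrase does not appear intact in the translation."""
    broken = []
    for src_phrase, tgt_phrase in table.items():
        if src_phrase in source.lower() and tgt_phrase not in translation.lower():
            broken.append(src_phrase)
    return broken

broken = check_compounds(
    "The gate oxide layer is deposited first.",
    # A word-by-word recombination that scrambles the concept:
    "La grille de la couche d'oxyde est déposée en premier.",
    COMPOUND_TERMS,
)
# -> ["gate oxide layer"]
```

The check passes only when the compound arrives as one unit; any recombination of its components, however fluent, is flagged for review.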
View Case Studies →
In-Context Consistency
Document-Wide Terminological Stability
NMT models have no long-term memory — each sentence is translated semi-independently. In multi-claim patent documents, this creates catastrophic term variation where a single component accumulates three different French translations across 15 claims. Patent examiners interpret term variation as intentional claim differentiation, triggering indefiniteness rejections under 35 U.S.C. § 112(b).
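Document-wide variation of this kind is detectable mechanically: collect every (source term, target rendering, claim) triple and flag any source term with more than one rendering. A minimal sketch, assuming aligned term pairs are already available (in practice they would come from a word aligner; the hard-coded triples below are illustrative):

```python
from collections import defaultdict

# Hypothetical aligned triples: (source term, target rendering, claim number).
ALIGNED_TERMS = [
    ("fastener", "attache", 1),
    ("fastener", "fixation", 7),
    ("fastener", "élément de fixation", 15),
    ("housing", "boîtier", 2),
    ("housing", "boîtier", 9),
]

def find_inconsistencies(aligned):
    """Group target renderings per source term; return terms with more
    than one rendering, mapped to {rendering: [claim numbers]}."""
    renderings = defaultdict(dict)
    for src, tgt, claim in aligned:
        renderings[src].setdefault(tgt, []).append(claim)
    return {src: tgts for src, tgts in renderings.items() if len(tgts) > 1}

inconsistent = find_inconsistencies(ALIGNED_TERMS)
# "fastener" accumulates three French renderings across claims 1, 7, and 15;
# "housing" is rendered consistently and is not flagged
```

Because the report keeps the claim numbers per rendering, a reviewer can normalize every occurrence to one approved term before an examiner reads the variation as intentional claim differentiation.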
View Case Studies →