The Core Mission
The Problem: The "Fluency Mask"
Neural Machine Translation (NMT) models are excellent at mimicry. They can
produce sentences that sound perfect but are factually wrong.
In technical domains, this "Fluency Mask" is dangerous because a fluent error
is harder for a reviewer to spot than a clumsy one.
Example: A generic NMT model translates a semiconductor patent claim:
Source (English):
"...wherein the current flows through the substrate layer..."
❌ NMT Output (Fluent but Wrong):
"...le courant circulant à travers la couche de substrat..."
Problem: "current" → "courant" is a valid dictionary match, but in this
context the sentence reads as electrical current, while the patent actually
describes water flow in a cooling system.
The model resolved "current" in the electrical sense because electricity
appears more frequently in its training data than fluidic flow contexts.
It optimized for statistical probability, not technical accuracy.
The Solution: Data-Driven Calibration
Semantic Integrity is the systematic process of aligning the
model's vector space with the physical reality of the invention, so that the
sense chosen for each term matches what the patent actually describes. This is
not a one-time fix: it is an ongoing curation process fed by expert
Human-in-the-Loop (HITL) validation.
The Goal: A model that doesn't just predict likely words,
but understands domain-specific constraints that override statistical frequency.
The Three Levels of Semantic Fidelity
We align the model across three critical layers, each addressing a distinct
failure mode:
Level 1: Anti-Hallucination (The Safety Layer)
Objective: Eliminate "Critical Failures": instances
where the model invents concepts not present in the source text.
The Challenge: NMT models generate translations token-by-token
based on probability distributions. When faced with ambiguous or rare terms,
they "hallucinate" by selecting high-frequency alternatives that fit
grammatically but distort meaning.
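The selection behavior described above can be sketched as a toy re-scoring step. This is a minimal illustration, not the production method: the function name `pick_rendering`, the probabilities, and the boost factor are all hypothetical, and a real NMT decoder scores tokens rather than whole terms.

```python
import math

def pick_rendering(candidates, domain_boost=None):
    """Return the candidate rendering with the highest log score.

    candidates:   dict mapping a rendering to its (hypothetical) model probability.
    domain_boost: dict mapping a rendering to a multiplicative weight;
                  a weight above 1.0 favors that rendering.
    """
    domain_boost = domain_boost or {}
    scores = {
        term: math.log(prob) + math.log(domain_boost.get(term, 1.0))
        for term, prob in candidates.items()
    }
    return max(scores, key=scores.get)

# Invented probabilities for rendering "leadframe"; for illustration only.
candidates = {"cadre de plomb": 0.6, "cadre de connexion": 0.3}

# Without any domain signal, the statistically likelier rendering wins.
generic = pick_rendering(candidates)

# A domain constraint (assumed weight of 5.0) overrides raw frequency.
aligned = pick_rendering(candidates, domain_boost={"cadre de connexion": 5.0})

print(generic)
print(aligned)
```

The point of the sketch is only that a domain signal has to be strong enough to outweigh the frequency prior; how that signal is learned or curated is outside its scope.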
Common Hallucination Patterns in Patent Translation
Pattern 1: Polysemy Misresolution
Source: "The chip performs a U-turn during signal routing..."
❌ Generic NMT:
"La puce effectue un demi-tour pendant le routage du signal..."
(Treats "U-turn" as a literal vehicular maneuver)
✓ Domain-Aligned Model:
"La puce effectue un retour en U pendant le routage du signal..."
(Recognizes "U-turn" as chip routing terminology)
Pattern 2: Context-Free Substitution
Source: "...the package comprises a leadframe..."
❌ Generic NMT:
"...l'emballage comprend un cadre de plomb..."
("leadframe" → "cadre de plomb", literally a "frame made of lead",
suggesting material composition rather than a semiconductor component)
✓ Domain-Aligned Model:
"...le boîtier comprend un cadre de connexion..."
(Recognizes "leadframe" as a standard semiconductor packaging term)
Why It Matters: Hallucinations can broaden or narrow the scope of protection
in ways that contradict the inventor's intent. A "lead frame" made of lead is
not the same invention as a "leadframe" (a connection structure in a
semiconductor package).
→ View Level 1 Case Catalog
Level 2: Terminological Precision (The Accuracy Layer)
Objective: Override generic synonyms with specific,
client-approved nomenclature.
The Challenge: NMT models tend to treat "plastic," "resin," "polymer,"
and "thermoplastic" as interchangeable because the terms sit close together
in the model's semantic vector space. But in patent prosecution,
these terms are legally distinct.
Source: "The housing is formed from a thermoplastic resin..."
❌ Generic NMT (Random Synonym Selection):
Translation 1: "Le boîtier est formé d'une résine thermoplastique..."
Translation 2: "Le boîtier est formé d'un plastique thermoplastique..."
Translation 3: "Le boîtier est formé d'un polymère thermoplastique..."
Problem: All three are linguistically correct, but only ONE matches
the client's approved terminology and prior art landscape.
Legal Impact: If a competitor's patent uses "polymer" and
your client's patent uses "thermoplastic resin," the terminological
distinction may be the basis for establishing product differentiation.
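A glossary check of the kind this layer implies can be sketched as follows. The function `check_terminology` and the glossary entry are hypothetical; in particular, which French term the client has approved is an assumption made for the example, and plain substring matching ignores inflection and word order.

```python
import re

def check_terminology(source, translation, glossary):
    """Return (source_term, approved_target) pairs that the translation violates.

    A pair is reported when the source contains the glossary term but the
    translation does not contain the approved target term.
    """
    violations = []
    for src_term, approved in glossary.items():
        in_source = re.search(re.escape(src_term), source, flags=re.IGNORECASE)
        in_target = re.search(re.escape(approved), translation, flags=re.IGNORECASE)
        if in_source and not in_target:
            violations.append((src_term, approved))
    return violations

# Assumed client glossary; the approved term is illustrative only.
glossary = {"thermoplastic resin": "résine thermoplastique"}

source = "The housing is formed from a thermoplastic resin..."
translation_1 = "Le boîtier est formé d'une résine thermoplastique..."
translation_3 = "Le boîtier est formé d'un polymère thermoplastique..."

ok = check_terminology(source, translation_1, glossary)
bad = check_terminology(source, translation_3, glossary)

print(ok)
print(bad)
```

A check like this only detects drift after the fact; it says nothing about how the model is steered toward the approved term in the first place.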
→ View Level 2 Case Catalog
Level 3: In-Context Consistency (The Coherence Layer)
Objective: Ensure that terminology remains stable across
the entire document.
The Challenge: Generic NMT models have
little or no long-term memory. Each sentence is translated
semi-independently, so the same source term can receive different renderings
across a multi-claim patent document.
The Consistency Failure Pattern
Claim 1: "A device comprising a guide member..."
→ "Dispositif comprenant un élément de guidage..." ✓
Claim 5: "The device of claim 1, wherein the guide member..."
→ ❌ "Le dispositif selon la revendication 1, le guide..."
(Switches from "élément de guidage" → "guide")
Claim 9: "The device of claim 1, wherein the guide member..."
→ ❌ "Le dispositif selon la revendication 1, le membre directeur..."
(Switches to entirely different term "membre directeur")
Why It Matters: Patent examiners and courts may interpret
term variation as intentional claim differentiation. If
"guide member" becomes three different French terms, the claims may be
rejected for indefiniteness.
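The failure pattern above can be detected mechanically once the candidate renderings are known. The sketch below is an assumption-laden illustration: the helper names `find_rendering` and `consistency_report` are hypothetical, the list of renderings is supplied by hand, and the claim texts are the truncated examples from this section.

```python
import re
from collections import defaultdict

def find_rendering(translation, renderings):
    """Return the first known rendering found in the translation, longest first."""
    for rendering in sorted(renderings, key=len, reverse=True):
        pattern = r"\b" + re.escape(rendering) + r"\b"
        if re.search(pattern, translation, flags=re.IGNORECASE):
            return rendering
    return None

def consistency_report(claims, source_term, renderings):
    """Map each rendering used for source_term to the claim numbers using it."""
    usage = defaultdict(list)
    for number, source, translation in claims:
        if source_term.lower() in source.lower():
            usage[find_rendering(translation, renderings)].append(number)
    return dict(usage)

claims = [
    (1, "A device comprising a guide member...",
        "Dispositif comprenant un élément de guidage..."),
    (5, "The device of claim 1, wherein the guide member...",
        "Le dispositif selon la revendication 1, le guide..."),
    (9, "The device of claim 1, wherein the guide member...",
        "Le dispositif selon la revendication 1, le membre directeur..."),
]
renderings = ["élément de guidage", "guide", "membre directeur"]

report = consistency_report(claims, "guide member", renderings)
inconsistent = len(report) > 1  # more than one rendering for one source term

print(report)
print(inconsistent)
```

Any source term whose report contains more than one key is a candidate for review; deciding which rendering should win is left to the glossary or the human validator.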
→ View Level 3 Case Catalog