Level 1: Term-Level Precision
Domain-Specific Terminology & Hallucination Prevention
Generic NMT models optimize for statistical probability, not technical accuracy. They hallucinate plausible-sounding but factually wrong translations by choosing high-frequency terms over domain-correct ones. In polysemous contexts, models default to the statistically dominant meaning, creating semantic errors that alter patent scope — often invisibly, behind fluent output.
| ID | Error Type | Case Summary | PDF |
|---|---|---|---|
| L1-001 | Hallucination | Semantic Integrity & Domain Disambiguation — "1-Hot" Hallucination. In digital logic, "1-hot encoding" refers to a code in which exactly one bit is set at a time. Generic NMT models, biased by general corpus data, read "hot" as a temperature rather than a logical value and produced a thermal rendering (1-chaud). The result was a "Thermal Hallucination" in which a circuit was described as having a high temperature. The alignment protocol uses NER tagging to force the correct "Digital Logic" domain interpretation (1 parmi N). | View PDF |
| L1-002 | Polysemy | Polysemy Resolution — "U-Turn" (Traffic vs. Optics). The term "U-turn" appears billions of times in traffic contexts but has a specific meaning in photonics (a waveguide shape). Generic NMT defaulted to the high-frequency traffic meaning (demi-tour), creating nonsensical claims about chips performing road maneuvers. This case tracks a "Drift Sequence" where the model produced three different wrong translations. Alignment was achieved by enforcing domain-specific context markers ("chip", "waveguide") to override the statistical bias. | View PDF |
| L1-003 | Hallucination | Semantic Hallucination — The "Phantom Component". A LiDAR patent claim specified a 2x2 coupler (signal splitter). The generic model hallucinated a completely different component: lentille de divergence (diverging lens), likely associating "2x2" and "optical" with beam spreading. Substituting a lens for the claimed coupler would render the patent invalid for Lack of Enablement. The solution involved a "Term-Level Drift Detection" workflow to penalize creative substitutions in hardware components. | View PDF |
| L1-004 | Synonym Drift | In-Context Synonym Drift — The "Elasticity" Oscillation. In medical balloon technology, "Expandable" (volume increase / inflation) and "Extensible" (elastic stretching) represent distinct physical properties. The generic NMT model exhibited "Goldfish Memory," oscillating between expansible and extensible within the same claim. This inconsistency technically describes a device that shifts physical properties mid-sentence, risking rejection for Indefiniteness. The alignment protocol implements "Global Term-Locking" to enforce the canonical term expansible and strictly reject synonyms once the domain is set. | View PDF |
| L1-005 | Hallucination | Token Prediction Failure — The "Guilty" Device. Technical neologisms like couplable (connectable) are rare in general training data. Generic NMT models, interpreting the rare token as a typo, probabilistically "auto-corrected" it to the high-frequency word coupable (Guilty). This hallucination resulted in a patent claim legally asserting that an electronic reader was "guilty with the mobile device," rendering the claim nonsensical. The alignment protocol implements "Levenshtein Distance Protection" to strictly preserve rare technical morphology against statistical smoothing. | View PDF |
| L1-006 | Morphological | Morphological Calque — The "Electro-Sans" Neologism. The term "Electroless" refers to a specific autocatalytic chemical process. Generic NMT models, treating the word as separate components, tokenized it as [Electro] + [Less] and translated the suffix literally to create Électro-sans (Electro-Without). This hallucination invents a nonsensical French word that fails to describe the actual chemical mechanism. The alignment protocol implements "Morphological Parsing" to correctly map technical suffixes to their conceptual equivalents rather than translating them literally. | View PDF |
| L1-007 | Polysemy | Domain Polysemy — The "Cuddly" Exosuit ("Soft" vs. "Douce"). In the domain of Soft Robotics, "Soft" refers to mechanical compliance (Souple). The generic NMT model selected the high-frequency textural meaning Douce (Soft to the touch / Gentle), resulting in a patent claiming a "Gentle Exosuit" rather than a "Flexible / Non-Rigid Exosuit." The model also oscillated between Douce and Souple across claims. The alignment protocol enforces "Domain-Specific Disambiguation" to map [Soft + Robotics] → [Souple]. | View PDF |
| L1-008 | Register Failure | Register Failure — The "Medical Mud" Incident. The term "Ice Slurry" refers to a sterile suspension of ice crystals used in medical thermodynamics. The generic NMT model selected the high-frequency construction / mining definition: Boue (Mud / Sludge). This created a catastrophic register mismatch where a sterile surgical product was described as Boue médicale (Medical Mud), rendering the claim physically impossible and commercially repulsive. The alignment protocol enforces "Negative Terminology Constraints" to block industrial terms in sterile contexts. | View PDF |
| L1-009 | Polysemy | Polysemy & Oscillation — The "Tiny Bus Driver" (Conducteur vs. Entraîneur). In a mechanical inhaler patent, the term "Driver" referred to a gear or actuator. In Claim 1, the generic NMT model translated it as Conducteur (Vehicle Driver / Electrical Conductor), implying the device contained an electrical wire or a human operator. In Claim 3, it switched to the correct term Entraîneur, violating the patent rule of "Same Term = Same Feature." The alignment protocol enforces "Variable-Based Domain Locking" to map [Mechanical] + [Driver] to Entraîneur and propagate this choice to every occurrence in the document. | View PDF |
| L1-010 | Register Failure | Terminology Register Hallucination — The "Windows & Doors" Trap. In an orthopaedic surgical tool patent, the generic NMT model translated "Strike Assembly" as Ensemble gâche. The rendering is valid in architecture, but a gâche in French is specifically a door or window strike plate, which is a static component. The translation imported residential hardware terminology into a high-precision medical power tool, creating a nonsensical and legally indefensible claim. The alignment protocol enforces "Domain-Aware Disambiguation" to maintain the mechanical register (Ensemble de frappe). | View PDF |
| L1-011 | Hallucination | Semantic Hallucination & Hybrid Drift — The "Coupled In" Trap. In a waveguide display patent, the specific active process of "in-coupling" light was mistranslated. The generic NMT model hallucinated a passive state (élément optique couplé) and produced a non-existent linguistic hybrid: lumière couplée in. This failure masks the active functional mechanism of the invention (light injection) and introduces untranslated remnants into the legal claim. The alignment protocol enforces "Directional Functional Mapping" to ensure [In-Coupling] maps to active injection terms like couplage d'entrée. | View PDF |
| L1-012 | Register Failure | Terminology Register Failure — The "Weaving" Shuttle. In a neuroprosthetic brain electrode patent, the term "shuttle element" refers to a mechanical component used to deploy electrode shanks into brain tissue. The generic NMT model translated this as élément de navette. In French, navette is heavily weighted toward the textile industry (weaving shuttle), importing archaic industrial terminology into a sterile surgical context. The alignment protocol enforces "Medical-Mechanical Register Locking" to ensure functional equivalents like élément de transfert are used. | View PDF |
| L1-013 | Register Failure | Register Hallucination — The "Office Building" Lens. In a liquid crystal lens patent, the term "communication space" (a microscopic gap between grooves) was translated as espace de communication. While literally correct, this term in French is a standard HR / architectural phrase for office meeting areas, importing a "Corporate / Social" register into a micro-optical device. The alignment protocol enforces "Fluid Dynamics Register Locking" to ensure functional terms like espace de mise en communication are used. | View PDF |