Translation memories - Pixwel Platform

This document describes how Translation Memories (TM) are indexed (written) and fetched (read). A translation memory is a per-project store of previously translated lines, so that a source line encountered again can be auto-filled with its prior translation. Phrases are stored in the phrases collection (Phrases model). Each phrase records a single source line (ov) and its translation (text), scoped by project and language.

Data Model

Phrases (api/models/Phrases.php)

Field	Type	Notes
`_id`	id
`project`	id	The project the phrase belongs to. Match scope.
`asset`	id	The asset it was indexed from. Recorded but not used when fetching.
`translation`	id	The translation document it was indexed from. Used to wipe a doc’s phrases on re-index.
`language`	string	Target language code. Match scope.
`ov`	string	The tag/newline-stripped source (OV) line. Match key.
`text`	string	The translated line returned to the editor.
`print`	boolean	`true` if generated from a print asset’s `printLines`. Written on index and used to scope the print variant’s deletes, but never read when fetching.

Validation: project and asset must exist; language, ov, and text must be non-empty. A blank cleaned-OV or blank translation silently fails to save.
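
The validation rules above can be sketched as follows. This is a minimal illustration, not the Phrases model itself: `validatePhrase` and `savePhrase` are hypothetical names, the existence check on project and asset is reduced to a presence check, and whether whitespace-only strings count as empty is not stated in this document.

```javascript
// Hypothetical validator mirroring the documented rules; not the real Phrases model.
function validatePhrase(phrase) {
  const errors = [];
  // project and asset must be present (the real model also checks that they exist).
  for (const field of ['project', 'asset']) {
    if (!phrase[field]) errors.push(`${field} is required`);
  }
  // language, ov, and text must be non-empty strings.
  for (const field of ['language', 'ov', 'text']) {
    if (typeof phrase[field] !== 'string' || phrase[field].length === 0) {
      errors.push(`${field} must be non-empty`);
    }
  }
  return errors;
}

// Saving is modeled as "store only when valid": an invalid phrase is dropped
// without raising, which is the silent failure described above.
function savePhrase(store, phrase) {
  if (validatePhrase(phrase).length > 0) return false;
  store.push(phrase);
  return true;
}
```

A phrase with a blank `ov` returns `false` from `savePhrase` and leaves the store unchanged.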

Indexing (writing phrases)

Trigger

Indexing runs through the GeneratePhrases save filter, wired into the Translations model save chain (api/models/Translations.php:41, implemented in api/models/translations/save/GeneratePhrases.php). It fires on every translation save, but indexes only if all of the following hold:

status === 'submitted' — a for-review save does not index. The later approval save (which flips status to submitted) is what indexes a reviewed translation.
multiTranslation is empty — dual-language / multiTranslation submissions never index.
translator !== "Pixwel" — transcriptions (OV authoring) are excluded.
The asset has an OV transcription of the matching type. The type is resolved from the work request:
- graphics translation → autogfx (if auto) or graphics
- dialogue translation → dialogue (graphics asset) or _id
- otherwise → _id

For image assets, indexing runs twice: indexByOV and indexByPrintOV.
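
The gate conditions can be read as a single predicate. The sketch below is an assumption-laden illustration, not the GeneratePhrases filter: `shouldIndex` and `isEmptyValue` are hypothetical names, "empty" is interpreted loosely (falsy, empty array, or empty object), and the OV-type resolution is left to the caller as the `hasMatchingOv` flag because its details are not fully specified here.

```javascript
// Loose emptiness check (assumption: mirrors a PHP-style empty()).
function isEmptyValue(value) {
  if (!value) return true; // null, undefined, '', 0, false
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === 'object') return Object.keys(value).length === 0;
  return false;
}

// Hypothetical gate: returns true only when every documented condition holds.
function shouldIndex(translation, hasMatchingOv) {
  return (
    translation.status === 'submitted' &&        // for-review saves do not index
    isEmptyValue(translation.multiTranslation) && // dual-language saves never index
    translation.translator !== 'Pixwel' &&        // transcriptions are excluded
    hasMatchingOv === true                        // OV transcription of the matching type exists
  );
}
```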

Per-line rules (`indexByOV` / `indexByPrintOV`)

Implemented in api/models/Translations.php:394 (indexByOV) and :456 (indexByPrintOV).

Bail entirely if count(translation lines) !== count(OV lines), or the translation doesn’t exist. Matching is strictly by line position (k), not by content.
Wipe all existing phrases for this translation first (Phrases::remove(['translation' => _id]), plus print: true for the print variant).
Per line, skip if text is empty.
Per line, skip if custom === false && machine === false. Only lines that were user-edited (custom) or machine-translated (machine) are indexed. Lines taken verbatim from a TM match (auto) or left as untranslated OV are skipped. (See Line source flags.)
Compute the match key: cleanLine(OV_line[k].text) — strips HTML tags, \r, and \n.
Remove any prior phrase with the same {ov, project, language} (+ print: true for print), so a {project, language, ov} triple holds at most one phrase per variant (print or non-print).
Create the new Phrases document.
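
The per-line rules above, for the non-print variant, can be sketched against an in-memory array. This is an illustration under stated assumptions, not the PHP implementation: `indexByOvSketch` is a hypothetical name, the `phrases` array stands in for the collection, the translation is assumed to carry `project`, `asset`, and `language` directly, and the regex in `cleanLine` is a guess at "strips HTML tags, \r, and \n".

```javascript
// Strips HTML tags, \r, and \n. The regex is an assumption about cleanLine().
function cleanLine(text) {
  return String(text).replace(/<[^>]*>/g, '').replace(/[\r\n]/g, '');
}

// `phrases` is a plain array standing in for the phrases collection (hypothetical).
function indexByOvSketch(phrases, translation, ovLines) {
  // Bail entirely when the translation is missing or the line counts differ.
  if (!translation || translation.lines.length !== ovLines.length) return phrases;

  // Wipe everything previously indexed from this translation document.
  let next = phrases.filter((p) => p.translation !== translation._id);

  translation.lines.forEach((line, k) => {
    if (!line.text) return; // empty translation
    if (line.custom === false && line.machine === false) return; // TM-filled or untranslated
    const ov = cleanLine(ovLines[k].text); // paired by position k, not by content
    if (ov === '') return; // a blank key would fail validation
    // Drop any older phrase for the same {ov, project, language}.
    next = next.filter(
      (p) => !(p.ov === ov && p.project === translation.project && p.language === translation.language)
    );
    next.push({
      project: translation.project,
      asset: translation.asset,
      translation: translation._id,
      language: translation.language,
      ov,
      text: line.text,
      print: false,
    });
  });
  return next;
}
```

The function returns a new array rather than mutating the input, which keeps the wipe-then-rewrite order easy to follow.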

Line source flags

The subtitler maps the editor’s translationFrom value onto the stored flags when saving (ui/3x/modules/services/subtitle-service.js:307-309):

`translationFrom`	`auto`	`custom`	`machine`	Indexed?
`'custom'` (user-edited)	false	true	false	✅
`'mt'` (machine translation)	false	false	true	✅
`'tm'` (filled from a TM match)	true	false	false	❌ skipped
`'ov'` (untranslated / fell back to original)	false	false	false	❌ skipped
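
The table above can be expressed as a lookup plus the indexing test from the per-line rules. This is a sketch, not the code in subtitle-service.js: `SOURCE_FLAGS`, `toStoredFlags`, and `isIndexedLine` are hypothetical names, and falling back to the `'ov'` flags for an unknown `translationFrom` value is an assumption.

```javascript
// Flag combinations taken from the table; the object name is hypothetical.
const SOURCE_FLAGS = {
  custom: { auto: false, custom: true, machine: false },
  mt: { auto: false, custom: false, machine: true },
  tm: { auto: true, custom: false, machine: false },
  ov: { auto: false, custom: false, machine: false },
};

// Maps the editor's translationFrom value onto the stored flags.
// Assumption: unknown values are treated like 'ov'.
function toStoredFlags(translationFrom) {
  return { ...(SOURCE_FLAGS[translationFrom] ?? SOURCE_FLAGS.ov) };
}

// A line is indexed unless both custom and machine are false.
function isIndexedLine(flags) {
  return !(flags.custom === false && flags.machine === false);
}
```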

Bulk reindex

Assets::indexTranslations() (api/models/Assets.php:580) wipes all phrases for an asset and re-runs indexByOV for every submitted, non-OV translation on it. Only invoked by the RegeneratePhrases migration (api/migrations/RegeneratePhrases.php) — never from the UI.

Fetching (reading phrases)

Endpoint

GET /translations/translate?id=…&language=… (api/controllers/Translations.php:18). The route resolves to Translations::translate() (api/models/Translations.php:217), which calls getMemoryTranslation() (:239) for a TM lookup. Adding machine=true routes the request to AWS Translate instead of TM.
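
As an illustration of the query shape, the helper below builds the path for a single id. It is a sketch only: `buildTranslatePath` is a hypothetical name, the example values are made up, and how an array of ids (the `script+gfx` case) is serialized is not documented here, so that case is left out.

```javascript
// Builds the path and query string for a TM fetch, or an MT fetch when machine is true.
// The parameter object is hypothetical; only id, language, and machine come from this document.
function buildTranslatePath({ id, language, machine = false }) {
  const params = new URLSearchParams({ id, language });
  if (machine) params.set('machine', 'true');
  return `/translations/translate?${params.toString()}`;
}
```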

Frontend input

getTranslationId(workRequest) (ui/3x/utils/orders.js:175) supplies the id. It is the target translation’s _id (dialogue, graphics, or both, depending on order mode):

Order mode	`id` returned
`print`, `script`	`workRequest.translation._id`
`gfx`, `autogfx`	`workRequest.graphicsTranslation._id`
`script+gfx`	`[translation._id, graphicsTranslation._id]`
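
The mapping in the table can be sketched as a switch. This is not the code in ui/3x/utils/orders.js: `resolveTranslationId` is a hypothetical name, reading the order mode from `workRequest.mode` is an assumption, and the behavior for unlisted modes is not documented.

```javascript
// Returns the id (or pair of ids) to send to the translate endpoint.
// Assumption: the order mode is available as workRequest.mode.
function resolveTranslationId(workRequest) {
  const dialogueId = workRequest.translation?._id;
  const graphicsId = workRequest.graphicsTranslation?._id;
  switch (workRequest.mode) {
    case 'print':
    case 'script':
      return dialogueId;
    case 'gfx':
    case 'autogfx':
      return graphicsId;
    case 'script+gfx':
      return [dialogueId, graphicsId];
    default:
      return undefined; // unlisted modes: behavior not documented
  }
}
```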

For dual-language orders (e.g. GER-PFR), the frontend splits the language, makes one call per language, and merges results positionally with a <span></span> separator to match the dual-language editor rendering (ui/3x/modules/services/translation-service.js getTranslationMemories).
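
A minimal sketch of the split-and-merge step follows. It is not the body of getTranslationMemories: `splitDualLanguage` and `mergeDualLanguage` are hypothetical names, splitting on `-` is inferred from the GER-PFR example, and treating a line as unmatched unless every language has a match is an assumption, since the handling of partial matches is not described here.

```javascript
const DUAL_SEPARATOR = '<span></span>';

// 'GER-PFR' -> ['GER', 'PFR'] (assumption: languages are joined with '-').
function splitDualLanguage(language) {
  return language.split('-');
}

// Merges one result array per language into a single array, line by line.
function mergeDualLanguage(perLanguageResults) {
  const length = Math.max(0, ...perLanguageResults.map((r) => r.length));
  return Array.from({ length }, (_, i) => {
    const parts = perLanguageResults.map((r) => r[i] ?? null);
    // Assumption: a partial match (one language missing) counts as no match.
    return parts.every((p) => p !== null) ? parts.join(DUAL_SEPARATOR) : null;
  });
}
```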

Server lookup rules (`getMemoryTranslation`)

Load the document by the passed _id. Use printLines for document/image media types, otherwise lines.
For each line, look up a Phrases match on exactly { project, ov: cleanLine(line.text), language }. Matching is by exact tag/newline-stripped source-text equality, scoped to project + language — not asset. This is what enables reuse of a translation across different assets in the same project.
order: ['_id' => 'desc'] — on multiple matches, the most recently created phrase wins.
Return one entry per line: the phrase’s text, or null when there is no match.
The lookup does not filter on print.
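
The lookup rules can be sketched against an in-memory phrase list. This is an illustration, not getMemoryTranslation(): `getMemoryTranslationSketch` is a hypothetical name, the media type is passed in as a parameter, the cleaning regex is an assumption, and numeric `_id` values stand in for the creation-ordered ids that `_id desc` relies on.

```javascript
// Returns one entry per line: the matched phrase text, or null.
function getMemoryTranslationSketch(phrases, doc, language, mediaType) {
  // Same normalization as on write (assumed regex): strip tags, \r, \n.
  const clean = (text) => String(text).replace(/<[^>]*>/g, '').replace(/[\r\n]/g, '');
  const usePrint = mediaType === 'document' || mediaType === 'image';
  const lines = usePrint ? doc.printLines : doc.lines;

  return lines.map((line) => {
    const ov = clean(line.text);
    // Scope is project + language + exact cleaned text; asset and print are ignored.
    const matches = phrases.filter(
      (p) => p.project === doc.project && p.language === language && p.ov === ov
    );
    matches.sort((a, b) => b._id - a._id); // newest first, emulating _id desc
    return matches.length > 0 ? matches[0].text : null;
  });
}
```

Because `print` is not part of the filter, a print phrase can answer a non-print fetch when it is the newest match.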

Subtitler Editor — auto-translation & provenance

This section covers how the subtitler applies TM and MT to the editor and how each line’s source is shown. (Distinct from indexing/fetching above, which is the API side.)

Toggles and layering

The subtitler has two independent toggles, Translation Memories and Machine Translations, that can both be on at once. TranslationService.autoTranslate() resolves each line to the highest-priority source that has data. Precedence: Custom > TM > MT > OV.

Custom — a manual edit. Always wins and is never overwritten by a toggle; autoTranslate returns custom lines untouched.
TM — a translation-memory match (when the TM toggle is on).
MT — a machine translation (when the MT toggle is on). MT is applied first, then TM overwrites per line where a match exists, so with both toggles on TM takes precedence and MT backfills the rest.
OV — the fallback when no higher source applies (and the line isn’t a custom edit).

Turning a toggle off recomputes non-custom lines (reverting them to OV when nothing else applies); custom edits remain.
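
The precedence can be sketched per line as follows. This is not TranslationService.autoTranslate(): `resolveLine` is a hypothetical name, and the line shape (`ov`, `text`, `tm`, `mt`, `translationFrom`) and the toggle flags (`tmOn`, `mtOn`) are assumptions made for illustration.

```javascript
// Resolves one editor line to its highest-priority source: Custom > TM > MT > OV.
function resolveLine(line, { tmOn, mtOn }) {
  // Custom edits always win and are returned untouched.
  if (line.translationFrom === 'custom') return line;

  // Start from the OV fallback.
  let text = line.ov;
  let source = 'ov';

  // MT is applied first...
  if (mtOn && line.mt != null) {
    text = line.mt;
    source = 'mt';
  }
  // ...then TM overwrites where a match exists.
  if (tmOn && line.tm != null) {
    text = line.tm;
    source = 'tm';
  }
  return { ...line, text, translationFrom: source };
}
```

Calling `resolveLine` again with a toggle switched off recomputes the line from scratch, so a non-custom line falls back to MT or OV as appropriate.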

History

Layering and custom-edit preservation are the original behavior. PR #3187 (6ad8027b6, “allow user to enable only one of the two toggles…”, PLATFORM-3916, May 2026) made the toggles mutually exclusive and changed autoTranslate to overwrite custom edits (tracking a customText field to restore them on toggle-off). That PR has been reverted — the toggles are independent again and custom edits are preserved by precedence, so the customText machinery is gone. Every reverted behavior (mutual exclusion, custom overwrite, customText, the related tests) traces solely to #3187.

Provenance colors

Each line’s source is shown by a colored left accent bar on the translation field, using fixed semantic colors:

Source	Color	Hex
Custom	green	`#2f9e44`
TM	violet	`#9d4edd`
MT	orange	`#f08c00`
OV	none (neutral)	n/a

The three colors were chosen for strong separation in both hue and lightness (green = dark, violet = medium, orange = light) so that the sources remain distinguishable for colorblind users and in grayscale; this was verified against deuteranopia, protanopia, and tritanopia simulations.

These colors live in ui/3x/constants/provenance.js as PROVENANCE_COLORS and are intentionally not part of the themeable palette (~/theme): provenance is a semantic status signal that must stay stable across themes and white-labeling. The same colors are reused for:

A color key (ProvenanceKey, data-testid="sub-provenance-key") in the actions bar — Custom / Memories / Machine swatches + labels, the always-visible legend for the accent-bar colors.
The TM / MT toggles — each toggle’s checked state takes its source color (accent prop on TranslationToggle), so a toggle visually matches the lines it produces.
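
For orientation, a constant of this kind might look like the sketch below. The hex values come from the table above; the object name, its keys, and the `accentColorFor` helper are hypothetical and may not match the actual shape of PROVENANCE_COLORS.

```javascript
// Fixed semantic colors, deliberately kept out of the theme palette.
// Key names are assumptions; only the hex values come from this document.
const PROVENANCE_COLORS_SKETCH = Object.freeze({
  custom: '#2f9e44', // green
  tm: '#9d4edd',     // violet
  mt: '#f08c00',     // orange
});

// OV (and any unknown source) has no accent color.
function accentColorFor(source) {
  return PROVENANCE_COLORS_SKETCH[source] ?? null;
}
```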

The left accent is declared before the :focus / .is-editing rules so the blue active-cell border still wins while editing.

Notes:

There is no per-source icon. An earlier version colored a per-line icon (circleCheck / translationMemories / machineTranslations); it became redundant once the accent bar + color key carried the signal, and was removed. Split/merge icons remain (yellow).
Non-color fallback: the field carries a title tooltip with the source label (Custom/Memories/Machine/OV), and the edit menu shows the same label as text for the active row.
Icon resolves Theme[color] || color, so it accepts both theme keys and raw hex (kept for the split/merge icons and future use).

Key Rules

Match scope is {project, language, ov-text} — never asset. TM is reused project-wide.
Match key is the cleaned source line — HTML tags and line breaks are stripped on both write and read, so matching is exact on the visible source text only.
Only user-edited (custom) or machine (machine) lines are indexed. Untouched OV lines and lines accepted verbatim from a TM suggestion are not written back.
submitted status indexes; for-review does not. Reviewed translations index when they are later approved to submitted.
multiTranslations never index — but the fetch path fully supports reading dual-language TM.
Most recent phrase wins on duplicate matches (_id desc).
Image assets index both subtitle and print phrases.
Line count must match the OV or the whole translation is skipped during indexing.

Known Asymmetries

These are mismatches between the write and read paths, relevant to ongoing TM work:

print is written but never read. indexByPrintOV tags phrases print: true, but getMemoryTranslation never filters on it. A print fetch can therefore return a non-print phrase (and vice versa) — whichever is newest.
multiTranslations are read but never written. Dual-language submissions contribute nothing to the memory, even though the fetch path does elaborate per-language merging to read them.
Index records asset; fetch ignores it. Reuse is project-wide by design — confirm that is the intended boundary for any given workflow.
Index keys on the OV transcription’s line text, paired with the translation by line position; fetch keys on the line text of the passed translation document. A fetch therefore matches only where a line of the passed document, once cleaned, equals an OV line that was indexed.

Code References

GeneratePhrases::filter() — api/models/translations/save/GeneratePhrases.php — indexing trigger and gate conditions
Translations::indexByOV() / indexByPrintOV() — api/models/Translations.php:394 / :456 — per-line indexing
Translations::translate() / getMemoryTranslation() — api/models/Translations.php:217 / :239 — fetch logic
Translations::cleanLine() — api/models/Translations.php:294 — match-key normalization
Assets::indexTranslations() — api/models/Assets.php:580 — bulk reindex (migration only)
Phrases — api/models/Phrases.php — phrase schema and validation
translate route — api/controllers/Translations.php:18 — endpoint binding
TranslationService.getTranslationMemories() — ui/3x/modules/services/translation-service.js — frontend fetch + dual-language merge
getTranslationId() — ui/3x/utils/orders.js:175 — resolves which translation id to fetch
fetchTranslationMemories() — ui/3x/modules/hooks/use-subtitler-queries.js:789 — assembles TM into the editor
SubtitleService.to2xTranslation() — ui/3x/modules/services/subtitle-service.js:298 — maps editor source onto custom/machine/auto flags
TranslationService.autoTranslate() — ui/3x/modules/services/translation-service.js — applies TM/MT to the editor (Custom > TM > MT > OV)
PROVENANCE_COLORS — ui/3x/constants/provenance.js — fixed semantic source colors (not themed)
Provenance rendering — ui/3x/modules/components/subtitler/segment/index.js (source-* classes + title on the field) and segment.css.js (left accent bar)
Toggles & color key — ui/3x/pages/subtitler/index.js and subtitler.css.js (TranslationToggle accent prop, ProvenanceKey)
