Difference between revisions of "The WFC Translation memory format Wordfast Classic"

From Wordfast Wiki
Revision as of 20:09, 5 November 2017

A Wordfast translation memory is a tab-delimited text file. It is the simplest of all formats: it can be opened with a text editor such as Notepad, with a Unicode-compliant word processor, or with Excel. Wordfast TMs can be regular ANSI (8-bit) text or Unicode UTF-16 (either little-endian or big-endian).
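As a rough illustration of handling the two encodings, the sketch below picks a codec from the first bytes of a TM file. It assumes that UTF-16 TMs start with a byte-order mark and that "ANSI" means Windows-1252; neither assumption is stated on this page, and the function names are hypothetical.

```python
import codecs


def detect_tm_encoding(raw: bytes, ansi_codec: str = "cp1252") -> str:
    """Guess the codec of a TM file from its first bytes.

    Assumption: UTF-16 files carry a byte-order mark (BOM).
    Assumption: "ANSI" is Windows-1252; pass another codec if needed.
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # Python's "utf-16" codec reads the BOM and picks the endianness.
        return "utf-16"
    return ansi_codec


def read_tm_text(raw: bytes) -> str:
    """Decode the raw bytes of a TM file into text."""
    return raw.decode(detect_tm_encoding(raw))
```

A file without a BOM falls through to the 8-bit codec, so a BOM-less UTF-16 file would be misread by this sketch.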

A Translation Memory (TM) is a set of lines (paragraphs) of text. In a pure text file where the display does not wrap, lines are paragraphs. The very first line is a header, and all other lines are TUs (Translation Units), sometimes called "entries". Lines/Entries/TUs are sets of fields, a field being any text (even no text at all, which denotes an empty field) followed by a tabulator. In other words, the Wordfast TM format is Tab-delimited Text, which is arguably one of the oldest, most robust, most open, and easiest to manipulate data formats ever. In the header (the very first line in a TM), each field begins with a % (per cent) mark.
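The layout described above (one header line, then one TU per line, fields separated by tabulators) can be sketched as follows. This is a minimal illustration rather than Wordfast's own code; `split_tm` is a hypothetical name, and skipping blank lines is an assumption.

```python
from typing import List, Tuple


def split_tm(text: str) -> Tuple[List[str], List[str]]:
    """Split decoded TM text into (header fields, TU lines).

    Header fields are returned without their leading "%" mark.
    TU lines are returned unparsed, one string per line.
    """
    # Split on newlines only, so that tabs inside lines are preserved.
    lines = [ln for ln in text.replace("\r\n", "\n").split("\n") if ln]
    if not lines:
        raise ValueError("empty TM: no header line found")
    header = [f[1:] if f.startswith("%") else f for f in lines[0].split("\t")]
    return header, lines[1:]
```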

Fields making up a TU:

Field | Example | Format | Remark
Date | 20041231~165410 | yyyymmdd~hhmmss. The example here means 31 December 2004, at 16:54:10, local time. See the note on the tilde (~) character further below. | Optional field: can be empty.
User ID (Attribute #1) | YAC | Initials of the TU's creator. | Optional field: can be empty.
Counter | 5 | A number between 0 and 9999 that records how many times this TU was proposed as a 100% match and accepted, that is, re-used as it is. | Optional field: can be empty.
Source language | EN-US | TMX-compliant language code (but case-insensitive with WFC). It is made of a two-letter ISO language code and, optionally, a dash followed by a two-letter local variant. | Optional field: can be empty. Rule: field cannot be longer than 5 characters.
Source segment | Red Riding Hood was walking in the woods. | The source segment. Maximum size: 8000 Unicode characters. | Should contain at least one character.
Target language | FR-FR | Language code, TMX-compliant. | Optional field: can be empty. Rule: field cannot be longer than 5 characters.
Target segment | Le Petit Chaperon Rouge se promenait dans les bois. | The target segment. Maximum size: 8000 Unicode characters. | Optional field: can be empty.
Attribute #2 (optional) | EL | A mnemonic (maximum length: 64 characters; no space allowed) for user-defined attribute #1. See Wordfast's "Sample" attributes. | Optional field: can be empty, with its tabulator omitted.
Attribute #3 (optional) | PS | | Optional field: can be empty, with its tabulator omitted.
Attribute #4 (optional) | | | Optional field: can be empty, with its tabulator omitted.
Attribute #5 (optional) | | | Optional field: can be empty, with its tabulator omitted.
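To make the field order in the table concrete, here is a small sketch that splits one TU line into named fields. The class and function names are hypothetical, all values are kept as raw strings, and requiring at least seven fields (six tabulators) is taken from the validity rule given further below.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranslationUnit:
    date: str
    user_id: str
    counter: str
    source_lang: str
    source: str
    target_lang: str
    target: str
    # Up to four optional attributes (#2 to #5); may be fewer or none.
    attributes: List[str] = field(default_factory=list)


def parse_tu(line: str) -> TranslationUnit:
    """Split a tab-delimited TU line into its fields, in table order."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 7:
        raise ValueError("a TU needs at least six tabulators")
    return TranslationUnit(*fields[:7], attributes=fields[7:11])
```

For example, `parse_tu("20041231~165410\tYAC\t5\tEN-US\tHello.\tFR-FR\tBonjour.\tEL\tPS").attributes` gives `["EL", "PS"]`.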

Here are the first two paragraphs (the TM's header and first Translation Unit) of a TM where the TU is defined as in the table above. Paragraphs are long, so they may wrap in your display - but there are only two paragraphs:

%20041231~160445	%YAC, Yves A. Champollion	%TU=00000000	%EN-US	%Wordfast TM v5.0	%FR-FR	%87412764
20041231~165410	YAC	5	EN-US	Red Riding Hood was walking in the woods.	FR-FR	Le Petit Chaperon Rouge se promenait dans les bois.	EL	PS

The header (first line in the TM) in the example above defines two attributes named Domain and Client. The first TU contains two attribute values: EL and PS. Both attribute names (unique per TM) and attribute values (multiple: one per TU) can be up to 64 characters long. Acronyms are used in the example above (EL for Electronics and PS for a client), but longer descriptors can be used. Question and exclamation marks ( ! ¡ ? ¿ ) are forbidden in attribute names and values.
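The two constraints stated above (at most 64 characters, none of the four forbidden marks) can be checked with a few lines of code. This is only a sketch; `is_valid_attribute` is a hypothetical helper, not a Wordfast function.

```python
FORBIDDEN_MARKS = set("!¡?¿")
MAX_ATTRIBUTE_LENGTH = 64


def is_valid_attribute(text: str) -> bool:
    """Return True if text can serve as an attribute name or value."""
    if len(text) > MAX_ATTRIBUTE_LENGTH:
        return False
    # Reject any of the four forbidden question/exclamation marks.
    return not (FORBIDDEN_MARKS & set(text))
```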

When reading a TU, Wordfast errs on the side of optimism if the TU does not look correct or canonical. When, in a TU:

  • the date is missing: if Wordfast is executing a loop that parses TUs, it takes the previous TU's date and increments it by one second; otherwise, it takes the local machine's current date and time;
  • the user ID is empty: Wordfast assumes the TM header's user ID. If that is missing, Wordfast uses the user's identity as defined in MS Word. If that is also missing, Wordfast uses XX;
  • a language code is missing or incorrect (but shorter than 6 characters): Wordfast uses the language code from the current TM's header (the first line of the TM).
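The fallback rules in the list above can be sketched as three small functions. This is an illustration under assumptions, not Wordfast's implementation: the function names are hypothetical, the previous date is assumed to use the tilde separator, and "incorrect" is interpreted here as "not two letters optionally followed by a dash and two letters".

```python
import re
from datetime import datetime, timedelta
from typing import Optional

DATE_FORMAT = "%Y%m%d~%H%M%S"  # yyyymmdd~hhmmss
# Assumed shape of a correct language code, e.g. EN or EN-US.
LANG_RE = re.compile(r"[A-Za-z]{2}(-[A-Za-z]{2})?")


def default_date(previous_date: Optional[str], now: datetime) -> str:
    """Previous TU's date plus one second, or the current local time."""
    if previous_date:
        stamp = datetime.strptime(previous_date, DATE_FORMAT) + timedelta(seconds=1)
    else:
        stamp = now
    return stamp.strftime(DATE_FORMAT)


def default_user_id(header_user: str = "", word_user: str = "") -> str:
    """TM header's user ID, then the MS Word identity, then XX."""
    return header_user or word_user or "XX"


def resolve_language(code: str, header_code: str) -> str:
    """Replace a missing or incorrect short code with the header's code."""
    if len(code) > 5:
        # Too long: left untouched here, since that case is a fault.
        return code
    return code if LANG_RE.fullmatch(code) else header_code
```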

Fault detection (Wordfast considering that a TU is a bad one) is based on counting how many tabulators are in a line of text. A line of text with fewer than 6 tabulators cannot form a valid TU. Another fault-detection method used by Wordfast is that language codes should be no longer than 5 characters. When a language code of more than 5 characters is encountered during a TM reorganisation, it indicates that something is amiss with that particular TU, and the TU is assumed to be faulty.
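The two checks described above can be expressed as a short predicate. This is a minimal sketch with a hypothetical function name; it covers only the tabulator count and the language-code length, and assumes the language codes sit in the fourth and sixth fields as in the field table.

```python
def is_faulty_tu(line: str) -> bool:
    """Return True if the line fails either fault-detection check."""
    line = line.rstrip("\r\n")
    # Check 1: fewer than six tabulators cannot form a valid TU.
    if line.count("\t") < 6:
        return True
    fields = line.split("\t")
    source_lang, target_lang = fields[3], fields[5]
    # Check 2: a language code longer than five characters is suspect.
    return len(source_lang) > 5 or len(target_lang) > 5
```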

Remarks:

  1. The date does not necessarily have a tilde (~) separating date and time. Any printable character can be used there, except a digit. Wordfast uses the tilde (~) and the equal (=) sign. The equal sign, in the Wordfast editor, means the TU was "marked" (flagged). This has no consequence at all on the TU's status: it remains fully valid. Although Wordfast always records the date and time when writing a TU, the date and time are optional and can be empty (or even an invalid date), in which case Wordfast simply assumes the current date and time. All dates and times are "local", taken from the local computer's clock.
  2. If any optional field is left empty, its trailing tabulator should still be present. For a TU to be valid, there must be at least six tabulators, with the fifth field (the source segment, located between the fourth and the fifth tabulator) made of at least one printable character.
  3. The date's first character (a digit from 0 to 9, usually a 2 if the TU was created in the current millennium) can appear as "x". This means that the TU is no longer valid. The first full reorganisation of the TM by Wordfast will erase this TU. Do not remove the "x", or replace it with a digit, unless you know what you are doing.
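Remarks 1 and 3 can be illustrated by a sketch that reads the flags carried by a well-formed date field. The names below are hypothetical, and the sketch assumes the fixed 15-character layout shown in the field table (8 characters, a separator, 6 characters); empty or malformed dates, which Wordfast tolerates, are simply rejected here.

```python
from typing import NamedTuple


class DateField(NamedTuple):
    deleted: bool   # first character is "x": TU is no longer valid
    marked: bool    # separator is "=": TU was marked (flagged)
    date_part: str  # first 8 characters, including any leading "x"
    time_part: str  # last 6 characters (hhmmss)


def read_date_field(value: str) -> DateField:
    """Read the flags of a 15-character date field."""
    if len(value) != 15:
        raise ValueError("expected 8 characters, a separator, 6 characters")
    separator = value[8]
    if separator.isdigit():
        raise ValueError("the separator cannot be a digit")
    return DateField(value[0] == "x", separator == "=", value[:8], value[9:])
```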