Tags in a WFC TM Wordfast Classic

From Wordfast Wiki
Revision as of 09:18, 28 September 2017 by Samar

When dealing with so-called tagged documents, a Wordfast TM records placeholders for tags rather than the tags themselves. Those placeholders have a &tX; format, where X reflects the order in which each distinct tag first appears in the source segment. X is written as a single character: A (ANSI decimal 65), B, C, etc., up to ANSI decimal code 165. Thus, there can be no more than about 100 distinct tags in a Wordfast segment.
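
The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not Wordfast's own code: the function name `placeholder` and the constants are hypothetical, and Python's `chr()` is assumed to be a good enough stand-in for the ANSI code table.

```python
FIRST_CODE = 65    # 'A', the first placeholder character
LAST_CODE = 165    # upper bound quoted in the text above


def placeholder(index: int) -> str:
    """Return the &tX; placeholder for the index-th distinct tag (0-based).

    Assumption: chr() maps the decimal code to the placeholder character.
    """
    code = FIRST_CODE + index
    if index < 0 or code > LAST_CODE:
        raise ValueError("tag index outside the placeholder range")
    return f"&t{chr(code)};"


print(placeholder(0))  # first tag in the segment
print(placeholder(1))  # second tag in the segment
```

Note that once the letters A to Z are used up, the placeholder characters continue with whatever symbols follow Z in the code table.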

For example, the following tagged source segment:

<FONT FACE="Helvetica">This is some text.</FONT>

would appear in a Wordfast TM as:

&tA;This is some text.&tB;
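
A minimal sketch of that conversion, assuming that tags are simple angle-bracket tags and that a tag appearing twice reuses its letter. The helper name `encode_source` and the regular expression are illustrative only and are not part of Wordfast.

```python
import re

# Simplifying assumption: a tag is anything between "<" and ">".
TAG = re.compile(r"<[^<>]+>")


def encode_source(segment):
    """Replace each tag with a &tX; placeholder.

    Letters are assigned in order of first appearance; a repeated tag
    gets the same letter again. Returns the encoded segment and the
    tag -> letter mapping.
    """
    letters = {}

    def to_placeholder(match):
        tag = match.group(0)
        if tag not in letters:
            letters[tag] = chr(65 + len(letters))  # 65 is 'A'
        return f"&t{letters[tag]};"

    return TAG.sub(to_placeholder, segment), letters


encoded, mapping = encode_source('<FONT FACE="Helvetica">This is some text.</FONT>')
print(encoded)
print(mapping)
```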

At translation time, when Wordfast pulls a translation unit (TU) from the TM and is about to propose the TU's target segment as a translation candidate, it uses a substitution algorithm to dress the proposed target segment with the full "real" tags. Those tags are taken from the document's source segment, not from the TM, using a triangulation method:

Document's source segment <—> TM's source segment <—> TM's target segment
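
The three legs of this triangle can be sketched as follows. This is a simplified illustration, not Wordfast's actual algorithm: the function name `propose_target` is hypothetical, only angle-bracket tags are handled, the TM source must match the encoded document source exactly (no fuzzy matching), and the French target used in the example is invented for the demonstration.

```python
import re

TAG = re.compile(r"<[^<>]+>")          # assumption: simple angle-bracket tags
PLACEHOLDER = re.compile(r"&t(.);")    # a &tX; placeholder, X captured


def propose_target(doc_source, tm_source, tm_target):
    """Dress a TM target segment with the document's real tags.

    Returns None when the triangulation cannot be completed.
    """
    letters = {}    # real tag -> letter
    real_tags = {}  # letter -> real tag

    def to_placeholder(match):
        tag = match.group(0)
        if tag not in letters:
            letter = chr(65 + len(letters))
            letters[tag] = letter
            real_tags[letter] = tag
        return f"&t{letters[tag]};"

    # Leg 1: document source -> placeholder form.
    encoded = TAG.sub(to_placeholder, doc_source)

    # Leg 2: the placeholder form must line up with the TM source.
    if encoded != tm_source:
        return None

    # Leg 3: every target placeholder needs a "parent" in the source.
    if any(letter not in real_tags for letter in PLACEHOLDER.findall(tm_target)):
        return None
    return PLACEHOLDER.sub(lambda m: real_tags[m.group(1)], tm_target)


print(propose_target(
    '<FONT FACE="Arial">This is some text.</FONT>',  # new document segment
    "&tA;This is some text.&tB;",                    # TM source
    "&tA;Voici du texte.&tB;",                       # TM target (invented)
))
```

The document in this example uses Arial where the TM entry may originally have come from a Helvetica document. Because the TM stores only placeholders, the proposed target picks up the new document's formatting.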

The triangulation can succeed only if every target tag has a "parent" tag in the source segment. This is because, at translation time, only the new source segment is available; the target has to be worked out by the machine. In other words, it is not a problem if the TM's source segment contains tags that do not appear in the TM's target segment. The reverse is a problem, however. If the TM's target segment has tags that do not appear in the TM's source segment (orphaned tags), Wordfast records the full syntax of these orphaned tags at TU creation time, so that they can be restored properly at translation time, when the target segment must be proposed with the correct formatting. If we have, at TU creation time:

In source segment: <FONT FACE="Arial">This is some text:
In target segment: <FONT FACE="Arial">Voici du texte&nbsp;:

then the target segment would be recorded in the TM as:

&tA;Voici du texte&t=;&nbsp;&t=;:

where &t=; opens and closes the original tag syntax (&nbsp; in our example).
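
As a hedged sketch of what happens at TU creation time, the function below encodes a source/target pair and wraps orphaned target tags in &t=; markers. The name `encode_pair` is hypothetical, and treating character entities such as &nbsp; as tags via a regular expression is an assumption based on the example above.

```python
import re

# Assumption: a "tag" is either <...> or a character entity such as &nbsp;
TAG = re.compile(r"<[^<>]+>|&[A-Za-z]+;")


def encode_pair(source, target):
    """Return (TM source, TM target) for a raw source/target pair."""
    letters = {}  # tag found in the source -> placeholder letter

    def encode_source_tag(match):
        tag = match.group(0)
        if tag not in letters:
            letters[tag] = chr(65 + len(letters))
        return f"&t{letters[tag]};"

    def encode_target_tag(match):
        tag = match.group(0)
        if tag in letters:
            return f"&t{letters[tag]};"   # tag has a parent in the source
        return f"&t=;{tag}&t=;"           # orphaned tag: keep full syntax

    tm_source = TAG.sub(encode_source_tag, source)
    tm_target = TAG.sub(encode_target_tag, target)
    return tm_source, tm_target


tm_source, tm_target = encode_pair(
    '<FONT FACE="Arial">This is some text:',
    '<FONT FACE="Arial">Voici du texte&nbsp;:',
)
print(tm_source)
print(tm_target)
```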

Other examples of segments:

In source segment: <FT>This is some text<AR> here<FT>.
In target segment: <AR>Voici du texte<FT> ici.
In TM TU source: &tA;This is some text&tB; here&tA;.

In source segment: <FT>This is some text<AR> here.
In target segment: <AR>Voici du<AR> texte<X;X> ici<FT>.
In TM TU source: &tA;This is some text&tB; here.
In TM TU target: &tB;Voici du&tB; texte&t=;<X;X>&t=; ici&tA;.
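
Going the other way, a stored target that mixes ordinary placeholders and orphaned tags could be restored roughly as follows. This is an illustrative sketch only: `restore_target` is a hypothetical name, the `<B>` and `<I>` tags stand in for whatever tags a new document might contain, and the matching of the document source against the TM source is assumed to have succeeded already.

```python
import re

TAG = re.compile(r"<[^<>]+>")  # assumption: simple angle-bracket tags
# Either an orphaned tag wrapped in &t=; ... &t=;, or a &tX; placeholder.
TOKEN = re.compile(r"&t=;(.*?)&t=;|&t([^=]);")


def restore_target(doc_source, tm_target):
    """Rebuild a target segment from the new document's tags."""
    real_tags = {}  # letter -> real tag, in order of first appearance
    for tag in TAG.findall(doc_source):
        if tag not in real_tags.values():
            real_tags[chr(65 + len(real_tags))] = tag

    def restore(match):
        if match.group(1) is not None:
            return match.group(1)          # orphan: stored verbatim in the TM
        return real_tags[match.group(2)]   # placeholder: take the document's tag

    return TOKEN.sub(restore, tm_target)


print(restore_target(
    "<B>This is some text<I> here.",
    "&tB;Voici du&tB; texte&t=;<X;X>&t=; ici&tA;.",
))
```

The orphaned `<X;X>` tag comes back exactly as it was stored, while the lettered placeholders take on the new document's tags.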

In most translation memory systems, TMs are overloaded with tags that do not belong there. A TM becomes significant when its content is put to (re)use, that is, when its past translations are leveraged for a new translation project. TM content is only reused in the presence of a new document to be translated. In other words, at use time, we can triangulate between a new document's source segment, which contains the new formatting, and an existing TM source/target pair, which contains formatting placeholders.

Only orphaned (unknown) target-side tags need to store the complete tag syntax. Those are target-side tags that have no equivalent in the source segment. Everything else is unnecessary, purely redundant information.
