Translation Memory Rules Wordfast Classic


Latest revision as of 01:56, 29 October 2017

Translation Memory (TM) rules are used to fine-tune WFC's TM engine. When a segment is opened, the TM engine's task is to find the most suitable match for the source segment you are translating. In many instances there is no "perfect match", that is, no objective identity between the source segment in your document and the closest candidate in the TM. In this situation, the TM engine has to make a subjective call, through a process that uses artificial intelligence to "figure out" whether the degree of fuzziness still makes the candidate translation unit (TU) a good choice. In some cases, WFC uses a substitution algorithm to update the proposed segment and bring it closer to an exact match. The elements that are updated or substituted are typically untranslatable items (such as numbers, fields, or tags), also called placeables. The goal is to relieve the translator of the chore of spotting and updating placeables.

This is obvious when numbers are involved. WFC will consider the following two sentences to be "exact" matches:

The net weight is 1,000 Kg.
The net weight is 2,000 Kg.

because WFC can easily detect numbers and carry out a substitution. In this situation, numbers like 1,000 or 2,000 are considered placeables by the TM engine, and they are updated to reflect the document's reality rather than the TM.
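
The number substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not WFC's actual algorithm: the `NUMBER` pattern, the function name `substitute_numbers`, and the sample target sentence are all hypothetical, and the target is assumed to use the same number formatting as the source.

```python
import re

# Hypothetical pattern: digits with optional thousands/decimal separators.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def substitute_numbers(doc_source, tm_source, tm_target):
    """Return tm_target with the TM's numbers replaced by the document's.

    Returns None when the two source segments differ in anything other
    than their numbers, or when the number mapping is ambiguous.
    """
    # Compare the text around the numbers by masking every number.
    if NUMBER.sub("#", doc_source) != NUMBER.sub("#", tm_source):
        return None
    doc_nums = NUMBER.findall(doc_source)
    tm_nums = NUMBER.findall(tm_source)
    if len(doc_nums) != len(tm_nums):
        return None
    mapping = {}
    for old, new in zip(tm_nums, doc_nums):
        # The same TM number must always map to the same document number.
        if mapping.setdefault(old, new) != new:
            return None
    # Replace every number found in the target in a single pass.
    return NUMBER.sub(lambda m: mapping.get(m.group(), m.group()), tm_target)

updated = substitute_numbers(
    "The net weight is 2,000 Kg.",    # segment in the document
    "The net weight is 1,000 Kg.",    # source segment stored in the TM
    "Le poids net est de 1,000 kg.",  # hypothetical stored target
)
```

Replacing in a single pass avoids chained substitutions, where a number that has just been inserted would be replaced again by a later rule.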


The method is a great help and time-saver in most situations. Those situations are so predominant that, by default, most translation tools are set to automatically substitute placeables such as numbers or fields.

This method can fail when the placeable substitution requires a grammatical or syntactical update of the target segment - a task which WFC cannot perform. In the following example:

The process takes 2 years to complete.
The process takes 8 years to complete.

the substitution process (replacing 2 with 8) would work flawlessly with most languages, but would produce a grammatically incorrect sentence in a few languages, like Russian, where the form of the noun depends on the number that precedes it.
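
To make the Russian case concrete, here is a small sketch. The agreement rule for "year" is standard Russian grammar; the target sentence is a simplified, hypothetical translation, and the function name is an assumption made for illustration.

```python
def russian_year_form(n):
    """Return the Russian form of "year" required after the integer n."""
    if n % 10 == 1 and n % 100 != 11:
        return "год"    # 1, 21, 31, ...
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "года"   # 2-4, 22-24, ...
    return "лет"        # 5-20, 25-30, ...

# Hypothetical target segment stored in the TM for "... takes 2 years ...".
tm_target = "Процесс занимает 2 года."

# Plain placeable substitution changes only the digit.
naive = tm_target.replace("2", "8")

# A correct update would also have to change the noun form.
correct = "Процесс занимает 8 " + russian_year_form(8) + "."
```

The naive result keeps the noun form that belongs with 2, which is why a penalty on number differences can be useful for such languages.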

The TM rules tab offers a high level of customization in this respect.


Some penalties apply only to exact (so-called 100%) matches; others apply to both exact and fuzzy matches.

! Note: The three penalties below (on TM, BTM, and Remote TM) are made visible to the translator, and constitute a temporary penalty. The match rate (the small purple number between the two segments) will appear bold and red to warn the translator that a temporary penalty has been applied (as is the case with attribute-based penalties). Unlike the other penalties further below, these three penalties do not turn a 100% match into a "real" fuzzy match, which means that if a penalized 100% proposal is accepted as is by the translator, the translation unit is not written into the TM or VLTM.

  • Penalty on TM: (100% and fuzzies) this penalty is applied when a proposed match is drawn from the TM.
  • Penalty on BTM: (100% and fuzzies) this penalty is applied when a proposed match is drawn from the BTM.
  • Penalty on remote TM: (100% and fuzzies) this penalty is applied when a proposed match is drawn from a remote TM, either through Wordfast Anywhere or through Wordfast Server.
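
The behaviour of these three temporary penalties can be modelled as follows. This is a sketch only: the origin labels, the penalty values, and the function names are assumptions chosen for illustration, not WFC settings.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    score: int   # raw match rate computed by the engine, 0-100
    origin: str  # "TM", "BTM" or "REMOTE" (illustrative labels)

# Hypothetical penalty settings, in points.
SOURCE_PENALTY = {"TM": 0, "BTM": 2, "REMOTE": 3}

def displayed_score(p):
    """Return (score shown to the translator, flagged_as_temporary)."""
    penalty = SOURCE_PENALTY.get(p.origin, 0)
    return max(p.score - penalty, 0), penalty > 0

def accepted_as_is_creates_new_tu(p):
    """A temporary penalty never changes the raw score, so a penalised
    100% match accepted as is does not produce a new TU."""
    return p.score < 100

remote_exact = Proposal(score=100, origin="REMOTE")
shown, flagged = displayed_score(remote_exact)
```

Keeping the raw score separate from the displayed score is what makes the penalty temporary: the translator sees the warning, but the storage decision is based on the unpenalised value.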


! Note: In all cases below, a penalty of 1 point or more turns the match into a so-called fuzzy match. If the translator accepts the translation as is, WFC will write the (now new) translation unit into the TM, thereby adding another version of the existing TU that differs only in, for example, case. Note that although penalties make the TM engine stricter, they tend to populate TMs with more translation units.


  • Penalty for case difference: (100% only) this penalty is applied when an exact match is found in the TM, but case is the only difference. Example:
Meet us at the ATA!
MEET US AT THE ATA!
  • Penalty for different numbers: (100% only) this penalty is applied when different numbers are found in a segment. Example:
The process takes 2 years to complete.
The process takes 8 years to complete.
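
The two penalties above can be pictured with the following sketch. The number pattern, the default penalty values, and the decision to add both penalties when both differences occur are assumptions made for illustration.

```python
import re

# Hypothetical pattern: digits with optional thousands/decimal separators.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def penalised_score(doc_source, tm_source, case_penalty=1, number_penalty=1):
    """Score a candidate that matches apart from case and/or numbers.

    Returns None when other differences remain (a real fuzzy match,
    which this sketch does not try to score).
    """
    doc_skeleton = NUMBER.sub("#", doc_source)
    tm_skeleton = NUMBER.sub("#", tm_source)
    if doc_skeleton.casefold() != tm_skeleton.casefold():
        return None
    score = 100
    if doc_skeleton != tm_skeleton:
        score -= case_penalty      # same text, different case
    if NUMBER.findall(doc_source) != NUMBER.findall(tm_source):
        score -= number_penalty    # same text, different numbers
    return score

case_score = penalised_score("Meet us at the ATA!", "MEET US AT THE ATA!")
number_score = penalised_score(
    "The process takes 8 years to complete.",
    "The process takes 2 years to complete.",
)
```

Setting a penalty to 0 in this model reproduces the default behaviour, in which the difference is ignored and the candidate is still offered as an exact match.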

The last two items (editing and re-using an existing TU, described at the end of this page) apply when an existing TU is re-used, or edited, after WFC has proposed it as a 100% match. A TU is re-used if you validate a proposed 100% (green) TU without editing (modifying) the target segment (the translation). A TU is edited if you edit (modify) the target segment. The following rules apply immediately after you validate such "100% match" TUs, to control the way they are stored in the TM.

In-Context Matches: This feature enables In-Context Matches (ICM). ICMs are matches where the previous and the following segments also match at 100%. The idea is that if a segment is embedded in a series of three exact matches, the trustworthiness of that segment greatly increases. ICMs have a score of 101, so they are picked first when other 100% matches compete. Remember that, in TMs, match scores carry little linguistic meaning.

If your TM had no previous ICM detection, you can reprocess it to enable ICM matches:

Enable a level of ICM support in WFC's TM rules tab. If the TM has been previously sorted or shuffled, segments may not be in their original sequence any more. If that is the case, use WFC's Data editor (one icon before the last in the WFC toolbar), click "Tools", then sort the TM on date. That will restore a decent level of historical sequence, which is important for ICMs.

Back in the WFC setup dialog box, in the Translation Memory pane, click the "Reorganise" button. Wordfast will create the necessary indexes for ICMs.
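
The in-context rule can be sketched as follows, assuming the TM is available as a list of source segments in their original sequence (which is why the historical order matters). The function name and the data layout are hypothetical; WFC's own indexes are not documented here.

```python
def score_with_context(doc_segments, i, tm_sequence):
    """Return 101 for an in-context match, 100 for a plain exact match,
    and 0 when doc_segments[i] has no exact match in tm_sequence."""
    segment = doc_segments[i]
    best = 0
    for j, tm_source in enumerate(tm_sequence):
        if tm_source != segment:
            continue
        best = max(best, 100)
        # The previous and the following segments must match as well.
        prev_ok = i > 0 and j > 0 and doc_segments[i - 1] == tm_sequence[j - 1]
        next_ok = (
            i + 1 < len(doc_segments)
            and j + 1 < len(tm_sequence)
            and doc_segments[i + 1] == tm_sequence[j + 1]
        )
        if prev_ok and next_ok:
            return 101
    return best

tm_sequence = ["Warning", "Open the lid.", "Note", "Remove the cap.", "Open the lid.", "Close the lid."]
document = ["Remove the cap.", "Open the lid.", "Close the lid."]
middle = score_with_context(document, 1, tm_sequence)
```

In this example "Open the lid." occurs twice in the TM, but only the second occurrence has the same neighbours as in the document, so it is the one that earns the score of 101.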


  • Penalty for whitespace difference: (100% only) this penalty is applied when an exact match is found in the TM, but the only difference is in spaces found at either beginning or end of the segment, or where there is a different number of repeated spaces within the segment. Example:
Meet us at the ATA!
Meet us at the  ATA! (two spaces before "ATA")
  • Penalty for different quotes/apostrophes/dashes: (100% only) this penalty is applied when an exact match is found in the TM, but the types of Quotes, Apostrophes, or Dashes (QADs), are different.
Different " quotes are: " « » “ ” „ , ‛ ‘ ’
Different ' apostrophes are: ' ` ’
Different - dashes are: - – —

Note that ’ is sometimes used as a closing quote, sometimes as an apostrophe. WFC assumes ’ is a closing quote when the same segment contains ‘ before ’.

WFC is blind to QADs when a 100% match is found, and when, in the TM's segment, the only difference is made of different QADs which WFC can substitute without any ambiguity, as in:

This is a "quoted sentence".
This is a “quoted sentence”.

This penalty will force WFC's TM engine to consider the two segments above as not being 100% matches.
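
The whitespace and QAD penalties can be modelled by normalising both segments before comparing them. The sketch below is an approximation under stated assumptions: it groups all single-quote-like characters into one class instead of resolving the quote/apostrophe ambiguity described above, and the function names and penalty values are hypothetical.

```python
import re

DOUBLE_QUOTES = '"«»“”„'
SINGLE_QUOTES = "'`‘’‛"   # simplification: quotes and apostrophes together
DASHES = "-–—"

# Map every QAD variant to one representative of its class.
QAD_TABLE = str.maketrans({
    **{c: '"' for c in DOUBLE_QUOTES},
    **{c: "'" for c in SINGLE_QUOTES},
    **{c: "-" for c in DASHES},
})

def normalise_space(text):
    """Drop leading/trailing spaces and collapse repeated spaces."""
    return re.sub(r" {2,}", " ", text.strip(" "))

def normalise_qads(text):
    return text.translate(QAD_TABLE)

def penalised_score(doc, tm, space_penalty=1, qad_penalty=1):
    """Return a score for segments that differ only in spaces or QADs,
    or None when other differences remain."""
    if normalise_qads(normalise_space(doc)) != normalise_qads(normalise_space(tm)):
        return None
    score = 100
    if normalise_qads(doc) != normalise_qads(tm):
        score -= space_penalty   # still different once QADs are ignored
    if normalise_space(doc) != normalise_space(tm):
        score -= qad_penalty     # still different once spaces are ignored
    return score

space_score = penalised_score("Meet us at the ATA!", "Meet us at the  ATA!")
qad_score = penalised_score('This is a "quoted sentence".', "This is a “quoted sentence”.")
```

With both penalties set to 0, the two example pairs score 100, which corresponds to the engine being "blind" to these differences.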


Editing an existing TU

This feature offers 4 choices:

  • Add to TM by overwriting the existing TU: the existing TU will be deleted and the edited TU added to the TM, i.e., the edited TU replaces the existing TU;
  • Add to TM; overwrite existing TU if attributes are identical: the edited TU is added to the TM, but the existing TU will be deleted only if all its attribute values (like User ID, Client, Subject, etc.) are identical to those of the newly created TU;
  • Add to TM; do not overwrite existing TU: the edited TU will be added to the TM and the existing one will not be deleted from the TM, even if attributes are identical. Normally, this option should not be used, except in very specific projects, because it generates real redundancies.
  • Do not add to TM: the edited TU will not be added to the TM at all, and the existing TU will not be deleted.
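
The four choices can be summarised as a small decision function. This is a sketch only: the TM is modelled as a plain Python list, and the `TU` class, the `EditPolicy` names, and `store_edited` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class EditPolicy(Enum):
    OVERWRITE = 1
    OVERWRITE_IF_SAME_ATTRIBUTES = 2
    ADD_WITHOUT_OVERWRITE = 3
    DO_NOT_ADD = 4

@dataclass
class TU:
    source: str
    target: str
    attributes: dict = field(default_factory=dict)

def store_edited(tm, existing, edited, policy):
    """Return the TM content after the translator has edited a 100% match."""
    if policy is EditPolicy.DO_NOT_ADD:
        return list(tm)
    drop_existing = policy is EditPolicy.OVERWRITE or (
        policy is EditPolicy.OVERWRITE_IF_SAME_ATTRIBUTES
        and existing.attributes == edited.attributes
    )
    kept = [tu for tu in tm if not (drop_existing and tu is existing)]
    return kept + [edited]

old = TU("Open the lid.", "Ouvrez le couvercle.", {"Client": "A"})
new = TU("Open the lid.", "Ouvrir le couvercle.", {"Client": "B"})
result = store_edited([old], old, new, EditPolicy.OVERWRITE_IF_SAME_ATTRIBUTES)
```

Because the two units carry different attribute values in this example, the second policy keeps both of them in the TM.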

When WFC finds more than one possible translation for a source segment, the Alt+Right shortcut will show the other possible translations: the proposed translations are displayed one after another in the target segment. Alt+Right/Left will let you cycle through all proposed translations.

If the TM contains several identical translation units, the first match proposed by WFC should be the most recent one, based on its date stamp.

When re-using an existing TU, update if attributes are different: if the currently active attributes (as set in WFC > Translation Memory > Attributes) are different from the candidate TU's own attributes (as found in the TM), you may choose to update the TU in the TM with the new set of attributes. To do so, check the "Update existing TU if attributes are different" checkbox. The TU will be rewritten "as is": the usage counter will be incremented and the current set of attributes will replace the TU's existing attributes, while source and target text remain the same.
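
The re-use rule can be sketched as below. The `TU` fields and the `reuse` function are hypothetical, and the sketch only covers the case described above; what happens to the usage counter when the attributes are already identical is not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class TU:
    source: str
    target: str
    attributes: dict = field(default_factory=dict)
    usage_count: int = 0

def reuse(tu, active_attributes, update_if_different=True):
    """Apply the re-use rule in place; return True if the TU was rewritten."""
    if not update_if_different or tu.attributes == active_attributes:
        return False
    # Source and target stay untouched; only the metadata changes.
    tu.attributes = dict(active_attributes)
    tu.usage_count += 1
    return True

unit = TU("Open the lid.", "Ouvrez le couvercle.", {"Client": "A"})
rewritten = reuse(unit, {"Client": "B"})
```

With the checkbox cleared (`update_if_different=False` in this model), the TU keeps its original attributes.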

