Building a Rules File for Tagged Text Translation in Wordfast Pro

The following instructions are for building a rules (*.properties) file for tagged text translation in Wordfast Pro 3 (WFP3) or Wordfast Pro (WFP). A rules file tells either application where to look for translatable text in a tagged text file so that it can extract the text as source segments.

Create a new rules (*.properties) file using a text editor

  1. Open a preferred text editor (e.g., Notepad in Windows)
  2. Add the following two extraction rules to a new blank text file:
    paragraphPrefix.1=
    paragraphSuffix.1=
  3. Name and save the text file to a simple location (e.g., the desktop), note that location for future reference, and close the file
  4. Go to that file's location now and change its file extension from .txt to .properties
    NOTE: If you don't see the file's .txt extension, go to this wikiHow tutorial and follow Method 2 or Method 3 for Windows, or Method 4 for Mac.
  5. Reopen the text file (now a rules, or .properties, file) in the preferred text editor in preparation for the next section below
    NOTE: If the rules file is not recognized by the operating system, right-click the file, go to Open With and choose the preferred text editor.
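
As an alternative to renaming the file by hand, the empty rule pair from step 2 can be written straight to a file that already has the .properties extension. This is only a minimal sketch, assuming Python is available; the helper name write_skeleton_rules and the file name my_rules.properties are hypothetical.

```python
from pathlib import Path

def write_skeleton_rules(path):
    """Write an empty prefix/suffix rule pair (pair 1) to the given path."""
    path = Path(path)
    # Plain ASCII content, so the choice of encoding does not matter here.
    path.write_text("paragraphPrefix.1=\nparagraphSuffix.1=\n", encoding="utf-8")
    return path

# Usage (hypothetical file name):
# write_skeleton_rules("my_rules.properties")
```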

Determine tag patterns for translatable text

  1. Open the desired source tagged text file in the preferred text editor
    NOTE: If the source tagged text file is not recognized by the operating system, right-click the file, go to Open With and choose the preferred text editor.
  2. Within the opened source tagged text file, look for recurring tags (or other markup) at the beginning and end of any translatable text and note these recurring tags/markup
    • EXAMPLE 1: Below, note that <string> opens translatable text and </string> closes translatable text.
    <string>Text to translate</string>
    <string>More text to translate</string>
    <string>Final text to translate</string>
    • EXAMPLE 2: Below, note that =· (equals symbol followed by a space) opens translatable text (everything before is not relevant) and ¶ (end of line) closes translatable text.
    String=·Text to translate¶
    String=·More text to translate¶
    String=·Final text to translate¶
    NOTE: Both · (space formatting mark) and ¶ (end of line formatting mark) in all examples are used for visual convenience and will not necessarily be visible in your text editor.
    • EXAMPLE 3: Below, note that in some cases <string> opens translatable text and </string> closes translatable text, and in other cases =· (equals symbol followed by a space) opens translatable text (everything before is not relevant) and ¶ (end of line) closes translatable text.
    <string>Text to translate</string>
    String=·Text to translate¶
    <string>More text to translate</string>
    String=·More text to translate¶
  3. Leave the source tagged text file open in the preferred text editor in preparation for the next section below
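
For large files, spotting recurring tags by eye can be slow. The sketch below counts how often each opening tag name occurs so that the most frequent candidates stand out. It is an illustration only: the function name count_opening_tags is hypothetical, it looks only at angle-bracket tags (not markup such as the equals symbol in EXAMPLE 2), and it does not tell you whether a tag actually encloses translatable text.

```python
import re
from collections import Counter

def count_opening_tags(text):
    """Count how often each opening tag name occurs in the text."""
    # Matches <name ...>; closing tags (</name>) and <?xml ...?> are skipped
    # because the character after "<" must be a letter.
    names = re.findall(r"<([A-Za-z][\w.-]*)[^>]*>", text)
    return Counter(names)

sample = (
    "<string>Text to translate</string>\n"
    "<string>More text to translate</string>\n"
    "<string>Final text to translate</string>\n"
)
print(count_opening_tags(sample).most_common())
```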

Add extraction rules to the new rules file

  1. Switch to the open rules file (re-open it if it is not still open) and add each set of opening/closing tags (or other markup) as a prefix-suffix extraction rule pair
    • Using EXAMPLE 1 from above, <string> is prefix 1 and </string> is suffix 1
    paragraphPrefix.1=<string>
    paragraphSuffix.1=</string>
    • Using EXAMPLE 2 from above, =· (equals symbol followed by a space) is prefix 1 and ¶ (end of line) is suffix 1.
    NOTE: (?m)$ must be used to represent ¶ (end of line) in extraction rules.[1]
    paragraphPrefix.1==·
    paragraphSuffix.1=(?m)$
    • Using EXAMPLE 3 from above, <string> is prefix 1 and </string> is suffix 1, and =· (equals symbol followed by a space) is prefix 2 and ¶ (end of line) is suffix 2.
    NOTE: (?m)$ must be used to represent ¶ (end of line) in extraction rules.
    paragraphPrefix.1=<string>
    paragraphSuffix.1=</string>
    paragraphPrefix.2==·
    paragraphSuffix.2=(?m)$
    NOTE: In the case above, where multiple prefix/suffix extraction rule pairs are required, each additional pair must use the next highest number (e.g., a third pair would be paragraphPrefix.3= and paragraphSuffix.3=), and rule pairs are processed in the order given.
    • OPTION: Some translatable text may include HTML tags within it. In this case, a prefix-suffix extraction rule pair can optionally include the extraction rule paragraphFormat.1=html-included, which ensures HTML tags are converted to internal WFP tags and are thus neither translatable nor included in word counts. If EXAMPLE 1 above contained translatable text with HTML tags, the optional rule would be a third line added to the prefix-suffix pair and written as follows:
    paragraphPrefix.1=<string>
    paragraphSuffix.1=</string>
    paragraphFormat.1=html-included
  2. Save the updated rules file, leave it open in the preferred text editor, and proceed to either the WFP or WFP3 section below according to the tool in use
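
Before importing the file, it can help to preview roughly what a set of prefix-suffix pairs would capture. The sketch below is an approximation written with Python's re module, not WFP's actual extraction logic; the function name extract_segments is hypothetical, and the sample assumes EXAMPLE 2 style lines contain an equals symbol followed by a space. Each prefix and suffix is treated as a regular expression, and (?m)$ is used for end of line as described above.

```python
import re

def extract_segments(text, rules):
    """Collect the text between each (prefix, suffix) regex pair, in document order."""
    found = []
    for prefix, suffix in rules:
        prefix_re, suffix_re = re.compile(prefix), re.compile(suffix)
        pos = 0
        while pos <= len(text):
            start = prefix_re.search(text, pos)
            if start is None:
                break
            end = suffix_re.search(text, start.end())
            if end is None:
                break
            found.append((start.end(), text[start.end():end.start()]))
            # Always move forward, even when the suffix match is zero-width.
            pos = max(end.end(), start.start() + 1)
    return [segment for _, segment in sorted(found)]

# Rule pairs from EXAMPLE 3; "= " is an equals symbol followed by a space.
rules = [("<string>", "</string>"), ("= ", "(?m)$")]
sample = (
    "<string>Text to translate</string>\n"
    "String= More text to translate\n"
)
print(extract_segments(sample, rules))
```

If the preview misses text or includes too much, the same adjustment will most likely be needed in the rules file, although WFP remains the final check.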

Add a file format filter to Wordfast Pro and import source tagged text file

NOTE: Skip this section and go to the next if you are using WFP3.
  1. Switch to the opened source tagged text file (re-open it if it is not still open) and determine its encoding
    • METHOD 1: Check the first line of the file for an XML declaration, which states the encoding in its encoding attribute (UTF-8 in the line below). If there is no XML declaration, proceed to METHOD 2.
    <?xml version="1.0" encoding="UTF-8"?>
    • METHOD 2: Check the text editor for the encoding of the open file (refer to the text editor's documentation for help). See below for how to do this in Windows Notepad.
    Go to File > Save As and locate the Encoding drop-down at the bottom of the Save As dialog box. Once the encoding has been determined, cancel the Save As dialog.
  2. Open Wordfast Pro
  3. Choose Create Project and set up the project as usual, but DO NOT add the source tagged text file yet or complete the wizard
  4. Still in the wizard, choose Create Filter
  5. From the Choose file format drop-down, select Text based filter (*.xml)
  6. In the Filter Name field, type a simple, appropriate name for the new filter
  7. From the Encoding drop-down, select the one that matches what you noted in step 1 of this section
  8. From the Target Encoding drop-down, select the same encoding as the source tagged text file
  9. Next to Extraction Rules, choose the Browse... button, then locate and open your rules (.properties) file
  10. In the Extension field, type the source file extension (without preceding period) and click OK
  11. Choose Add File, locate the folder containing the source tagged text file, select it, and choose Open (or simply drag and drop the source tagged text file into the files pane)
  12. In the Type column, choose the drop-down next to the source tagged text file and select your new filter's name
  13. Select Create Project to complete the wizard as usual
NOTE: Skip the next section and go to the "Make any rules file adjustments/corrections" section after it if you are using WFP.
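
METHOD 1 from step 1 (reading the encoding from the XML declaration) can also be checked with a short script. This is a minimal sketch under stated assumptions: the function name declared_encoding is hypothetical, and it only handles declarations stored in an ASCII-compatible encoding such as UTF-8. When it returns None, fall back to METHOD 2.

```python
import re

# Looks for encoding="..." inside an XML declaration at the very start of the data.
_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def declared_encoding(head):
    """Return the encoding named in an XML declaration, or None if there is none."""
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        head = head[3:]
    match = _DECL.match(head)
    return match.group(1).decode("ascii") if match else None

# Usage (hypothetical file name):
# with open("source_file.xml", "rb") as handle:
#     print(declared_encoding(handle.read(200)))
```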

Add a file format filter to Wordfast Pro 3 and import source tagged text file

NOTE: Skip this section and go to the next if you are using WFP.
  1. Switch to the opened source tagged text file (re-open it if it is not still open) and determine its encoding
    • METHOD 1: Check the first line of the file for an XML declaration, which states the encoding in its encoding attribute (UTF-8 in the line below). If there is no XML declaration, proceed to METHOD 2.
    <?xml version="1.0" encoding="UTF-8"?>
    • METHOD 2: Check the text editor for the encoding of the open file (refer to the text editor's documentation for help). See below for how to do this in Windows Notepad.
    Go to File > Save As and locate the Encoding drop-down at the bottom of the Save As dialog box. Once the encoding has been determined, cancel the Save As dialog.
  2. Open Wordfast Pro 3
  3. Choose Edit then Preferences
  4. In the Preferences list on the left, go to Translations > Filters > Formats
  5. To the right of Available Formats, choose New
  6. In the New Format dialog, scroll down and choose Text Based Filter
  7. In the Filter Name field, type a simple, appropriate name for the new filter
  8. From the Source Encoding drop-down, select the one that matches what you noted in step 1 of this section
  9. Select the Target Encoding option, then select the same encoding as the source tagged text file from the drop-down
  10. Next to Conversion rules, choose the ... button, then locate and open your rules (.properties) file
  11. In the Extension field, type *. followed by the source file extension (e.g., *.txt) and click OK
  12. If necessary, create a project from File > Create Project
  13. With the appropriate project active, choose File > Open File, locate the folder containing the source tagged text file, select it, and choose Open (or simply drag and drop the source tagged text file into WFP3's Editor Perspective)
  14. In the Choose Format dialog that appears, select your new filter's name and then OK

Make any rules file adjustments/corrections

NOTE: If WFP extracted translatable segments from the source tagged text file as expected, skip this section and proceed to translation. Otherwise, continue below.
  1. ISSUE: File opens in WFP, but unintended text is included or some translatable text is missing.
    • SOLUTION: This usually means that the prefix-suffix extraction rule pair is too general (unintended text is included) or too specific (translatable text is missing). Start by reopening the rules file AND the source tagged text file in the preferred text editor. Switch to the imported source tagged text file already open in WFP, note a segment with unintended text or a location where translatable text is missing, then return to the source tagged text file in the preferred text editor and locate that item. Once located, it should help to determine how to adjust the rules file. After each rules file adjustment, start a new project with the source tagged text file and updated rules file to see if the adjustment fixed the issue.
    • EXAMPLE: In the three lines below, <string> generally starts translatable text and </string> generally ends translatable text. However, the second and third lines are problematic: the opening tag varies in the second line, and the third line contains text that must not be translated.
    1. <string>Text to translate</string> │ EXTRACTION = Text to translate
    2. <string "length=4">More text to translate</string> │ EXTRACTION = (nothing)
      • The opening tag variation (<string "length=4">) prevents text extraction. A regular expression (regex) is needed to capture any variation of the opening tag, such as the [^>]* part of the example below.
        paragraphPrefix.1=<string[^>]*>
    3. <string>Final text to translate<reserved>DO NOT TRANSLATE</reserved></string> │ EXTRACTION = Final text to translate<reserved>DO NOT TRANSLATE</reserved>
      • The <reserved> tag pair and the untranslatable text it encases fall between the prefix-suffix <string> tag pair, so they are included in the translatable text. To completely exclude the <reserved> tag pair and any text inside it, add an externalTag.N= line after the prefix and suffix lines, add the <reserved> tag pair, and use a regex between the tags to capture the text (see number one below; the regex is .*?). To include this content for context but make it untranslatable as WFP tags, add an internalTag.N= line after the prefix and suffix lines, add the <reserved> tag pair, and use a regex between the tags to capture the text (see number two below; the regex is .*?).
      1. paragraphPrefix.1=<string>
        paragraphSuffix.1=</string>
        externalTag.1=<reserved>.*?</reserved>
      2. paragraphPrefix.1=<string>
        paragraphSuffix.1=</string>
        internalTag.1=<reserved>.*?</reserved>
  2. ISSUE: Source tagged text file does not open and an error message is displayed indicating there is no translatable text.
    • SOLUTION: This usually means there is an issue with the prefix or suffix text due to a typo, misspelling, or prefix/suffix choice. Start by reopening the rules file AND the source tagged text file in the preferred text editor. Go to the rules file and check for typos or misspellings in the prefix and suffix text. Correct any issues that are found and save the rules file. If none are found, go to the source tagged text file in the preferred text editor and double-check that translatable text is indeed located between the prefix and suffix text that you specified in the rules file. Again, correct any issues that are found and save the rules file. After each rules file adjustment, start a new project with the source tagged text file and updated rules file to see if the adjustment fixed the issue. WFP3 NOTE: If the source tagged text file has an associated TXML file next to it, delete it before importing the source tagged text file again. Otherwise, WFP3 will simply reopen the previously created and problematic TXML instead of generating a new one from the updated rules file.
    • EXAMPLE: The two items below illustrate the issues described above if the sample source tagged text is <oops></oops><string>Text to translate</string>.
    1. paragraphPrefix.1=<strin>
      paragraphSuffix.1=</string>
      • The typo (i.e., the missing "g") in the prefix prevents any translatable text from being extracted.
    2. paragraphPrefix.1=<oops>
      paragraphSuffix.1=</oops>
      • Mistakenly choosing the <oops> tag, which does not encase any translatable text in this example, prevents anything from being extracted. Switching it to the correct <string> tag fixes the issue.
  3. ISSUE: Source tagged text file does not open, or project creation action appears to do nothing. An error message may or may not be displayed.
    • SOLUTION: This usually means there is an issue with a regular expression (regex) in the rules file. Regexes can be used, for example, to specify variations in prefixes, suffixes, external tags, and/or internal tags, or to capture chunks of general text (see ISSUE 1's second and third EXAMPLE items above for instances of how this is implemented). Go to the rules file and check any regexes for typos, incorrect syntax, or incomplete sets/groups (e.g., items that require parentheses or square brackets). Correct any issues that are found and save the rules file. After each rules file adjustment, start a new project with the source tagged text file and updated rules file to see if the adjustment fixed the issue. WFP3 NOTE: If the source tagged text file has an associated TXML file next to it, delete it before importing the source tagged text file again. Otherwise, WFP3 will simply reopen the previously created and problematic TXML instead of generating a new one from the updated rules file.
    • EXAMPLE: The two items below illustrate the issues described above if the sample source tagged text is <string "length=3">\"Text to\n translate\"</string>.
    1. paragraphPrefix.1=<string[^>*>
      • The stated prefix above includes a regex (to capture any variation of the opening tag) whose negated character set [^...] is missing its closing square bracket between > and *. The fix needed for WFP to understand the regex is then <string[^>]*>.
    2. internalTag.1=\[^\s]
      • The regex above is intended to locate within translatable text a literal backslash character followed by any single character that is NOT white space ([^] = any character NOT specified within the square brackets; \s = any white space character, such as a space or tab). Anything the expression locates is then marked up as an untranslatable WFP tag, so \" from the sample source tagged text above would be marked up. However, there are two problems. First, a rules file read by Java, the programming language of WFP, must escape every backslash by preceding it with another backslash. Thus, [^\s] from internalTag.1= above becomes [^\\s] in a rules file. Second, the engine that interprets regular expressions has its own requirement that a literal backslash character be escaped with a preceding backslash, so the first \ from internalTag.1= above becomes \\. For Java to read this correctly, both of those backslashes must in turn be escaped with a backslash, yielding \\\\ in a rules file. The final fix for WFP to understand the entire regex is then \\\\[^\\s].
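
The regex checks described in ISSUE 3 can be partly automated before a new project is started. The sketch below is illustrative only and rests on two assumptions: that the rules file follows the usual Java properties escaping (a backslash is dropped and the following character is kept), and that Python's re module is close enough to Java's regex engine for these simple patterns. The function names unescape_properties_value and check_rules are hypothetical.

```python
import re

def unescape_properties_value(value):
    """Simplified properties-style unescaping: drop a backslash, keep the next character."""
    # Real loaders also translate \t, \n, \r, \f and \uXXXX; omitted in this sketch.
    return re.sub(r"\\(.)", r"\1", value)

def check_rules(lines):
    """Return (key, error) pairs for rule values that fail to compile as regexes."""
    problems = []
    for line in lines:
        if "=" not in line:
            continue
        key, raw = line.split("=", 1)
        try:
            re.compile(unescape_properties_value(raw))
        except re.error as exc:
            problems.append((key, str(exc)))
    return problems

rules = [
    r"paragraphPrefix.1=<string[^>*>",   # missing ] in the character set
    r"paragraphPrefix.2=<string[^>]*>",  # corrected
    r"internalTag.1=\\\\[^\\s]",         # corrected, fully escaped
]
print(check_rules(rules))

# The fully escaped rule reaches the regex engine as \\[^\s] and finds
# every backslash that is followed by a non-white-space character.
sample = r'<string "length=3">\"Text to\n translate\"</string>'
tag_regex = unescape_properties_value(r"\\\\[^\\s]")
print(re.findall(tag_regex, sample))
```

Note that the under-escaped rule \[^\s] does not fail in this sketch: after unescaping it becomes [^s], which compiles but matches any character other than the letter s. A check like this therefore catches broken syntax only, not a regex that is valid but means something unintended.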

References

  1. By adding (?m) at the beginning of the line, you turn on the multiline regex flag, which makes ^ and $ work as the beginning/end of line.