CJK (Chinese, Japanese, Korean) Wordfast Classic

From Wordfast Wiki

The following discussion concerns data generated by WFC, such as translation memories and glossaries. It does not concern documents. Ms-Word documents always support Unicode and do not lose their encoding. Problems in a document are either font (rendering) issues or caused by foreign material copy-pasted into Ms-Word.

Use Unicode translation memories and glossaries whenever the source or target language is CJK. All versions of WFC released after 2007 use Unicode TMs and glossaries only, so this should not be a concern.

For TMs, glossaries, INI files, and the Ms-Word Startup path (as displayed in Word/Options/Default file paths/Startup), use path names and file names containing only Latin, non-accented (English) letters. Keep TM and glossary file names under 32 characters, preferably without spaces. WFC may not support folder and file names containing Unicode characters.

If WFC malfunctions, the cause may be an Ms-Word Startup path that contains Unicode characters. In that case, create a folder such as C:\Startup (or MyMac:Startup) and copy Wordfast.dot into it. Start Ms-Word and, in the Tools/Options (or Office button/Options, or Preferences) > Default folders dialog box, change Ms-Word's Startup folder to the one you just created. Close and restart Ms-Word.
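The naming rules above can be checked before you create a TM or glossary. The following Python sketch is a hypothetical helper (`path_warnings` is not part of WFC); it assumes Windows-style paths and treats "under 32 characters" as applying to the whole file name, extension included.

```python
from pathlib import PureWindowsPath


def path_warnings(path: str, max_name_len: int = 32) -> list[str]:
    """Return reasons why a TM/glossary path may cause trouble in WFC (heuristic)."""
    warnings = []
    # Only plain ASCII is considered safe: accented Latin letters and CJK are not.
    if not path.isascii():
        warnings.append("path contains non-ASCII characters")
    # Final path component, e.g. "en-ja_tech.txt" (assumption: extension included).
    name = PureWindowsPath(path).name
    if len(name) >= max_name_len:
        warnings.append(f"file name has {len(name)} characters (keep under {max_name_len})")
    if " " in name:
        warnings.append("file name contains spaces")
    return warnings


print(path_warnings(r"C:\TMs\en-ja_tech.txt"))  # []
print(path_warnings(r"C:\Übersetzung\my glossary.txt"))
# ['path contains non-ASCII characters', 'file name contains spaces']
```

An empty list only means the path follows the advice on this page; it does not guarantee that WFC will accept it.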

If you are offered a choice of Unicode flavour when saving a TM or glossary, select plain "Unicode" (labelled either Unicode or UTF-16), not a language-specific encoding.
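To see which flavour a TM or glossary file actually uses, you can inspect its first bytes. On Windows, the plain "Unicode" option usually produces UTF-16 little-endian text that starts with a byte-order mark (BOM). This Python sketch guesses the flavour from the BOM alone; `unicode_flavour` is a hypothetical helper, and a file without a BOM is not necessarily 8-bit, so treat the result as a hint.

```python
import codecs


def unicode_flavour(data: bytes) -> str:
    """Guess a text file's Unicode flavour from its byte-order mark (BOM) only."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 BE"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    # No BOM: could be 8-bit ANSI, or Unicode saved without a BOM.
    return "no BOM (possibly 8-bit ANSI)"


tm_bytes = codecs.BOM_UTF16_LE + "日本語\tJapanese\r\n".encode("utf-16-le")
print(unicode_flavour(tm_bytes))                     # UTF-16 LE
print(unicode_flavour("Japanese".encode("cp1252")))  # no BOM (possibly 8-bit ANSI)
```

For a real file, pass the first few bytes, for example `Path("MyTM.txt").read_bytes()[:4]` (the file name is illustrative).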

If you use Ms-Word XP (Ms-Word 2002), note that a notorious Ms-Word 2002 glitch prevents it from saving documents as Unicode (unless you specifically added that feature at installation time). In this case, convert the TM to Unicode: start the TM/Glossary editor, click "Tools", and run the "Rewrite as Unicode" special filter. Another workaround is to open an existing Unicode document, delete all its contents, paste your data into it, save it, and then rename it directly on disk.
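If a TM is still stored in a legacy code page and neither workaround is available, the same kind of conversion can be sketched outside WFC. This Python example is an illustration, not WFC's "Rewrite as Unicode" filter: it assumes you know the file's original encoding (for example cp932 for Japanese Shift-JIS or cp936 for Simplified Chinese), and both function names and the "_unicode" file suffix are hypothetical. It writes UTF-16 LE with a BOM.

```python
import codecs
from pathlib import Path


def reencode_as_utf16(data: bytes, source_encoding: str) -> bytes:
    """Decode legacy-encoded text and return it as UTF-16 LE with a BOM."""
    # A wrong source_encoding may raise UnicodeDecodeError or silently give garbage.
    text = data.decode(source_encoding)
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def rewrite_file_as_utf16(path: Path, source_encoding: str) -> Path:
    """Write a UTF-16 copy next to the original (hypothetical "_unicode" suffix)."""
    out = path.with_name(path.stem + "_unicode" + path.suffix)
    out.write_bytes(reencode_as_utf16(path.read_bytes(), source_encoding))
    return out


# Shift-JIS bytes, as an 8-bit Japanese TM line might hold them.
legacy = "日本語\tJapanese\r\n".encode("cp932")
unicode_bytes = reencode_as_utf16(legacy, "cp932")
print(unicode_bytes[:2] == codecs.BOM_UTF16_LE)  # True
```

This only helps while the file still holds the original characters in some code page; it cannot restore text that has already been turned into question marks.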

In WFC's main window, the label (CJK) should appear next to the translation memory path and name. It appears if the source language code begins with ZH-, JA-, or KO-. The label shows that WFC has switched to the mode compatible with Chinese, Japanese, or Korean, which is essential for working with these languages.
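The rule for the (CJK) label amounts to a prefix test on the source language code. A minimal Python sketch; `is_cjk_source` is a hypothetical name, and case-insensitive matching is an assumption, since this page writes codes both as ZH- and as zh-CN.

```python
CJK_PREFIXES = ("ZH-", "JA-", "KO-")


def is_cjk_source(lang_code: str) -> bool:
    """True if the source language code starts with ZH-, JA- or KO- (any case)."""
    return lang_code.upper().startswith(CJK_PREFIXES)


for code in ("ja-JP", "ZH-CN", "ko-KR", "EN-US"):
    print(code, is_cjk_source(code))
```

Only the source language is tested, matching the text above; a TM with a CJK target but a non-CJK source would not pass this check.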

Notes:

  • For Japanese, Chinese, and Korean, make sure the full-width (double-width) punctuation marks (such as 。！？) appear in the WFC/Setup/General "End-of-segment punctuation" setting. They should be added there automatically when you create a translation memory with JA, KO, or ZH as the source language (for example, ja-JP or zh-CN). If the Japanese or Chinese full stop, question mark, or exclamation mark is missing, select these characters in a document and copy them (Ctrl+C). Open WFC, go to the WFC/Setup/General "End-of-segment punctuation" setting, press Enter to edit the value, and paste your punctuation before the existing punctuation marks (I advise against deleting the existing Latin punctuation).
  • For Japanese and Chinese, enable at least the "An ESP without a trailing space ends a segment" rule in WFC/Setup/Seg, so that end-of-sentence punctuation not followed by a space is still recognised as ending a sentence. WFC normally does this automatically too when the TM is CJK.
  • To apply a specific font (one that can display CJK characters) to all target segments, specify it in the WFC/Setup/General "Target font" setting. This is not necessary if your platform automatically adapts fonts to languages.
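The first two notes work together: the full-width marks must be in the end-of-segment punctuation (ESP) list, and the "without a trailing space" rule lets them end a segment, since CJK text has no spaces between sentences. The Python sketch below illustrates that splitting logic with a regular expression; it is an assumption-based model, not WFC's segmentation engine, and `ESP` and `split_segments` are hypothetical names.

```python
import re

# Assumed ESP list: full-width CJK marks pasted before the existing Latin ones.
ESP = "。！？.!?"


def split_segments(text: str, esp: str = ESP, require_space: bool = True) -> list[str]:
    """Split text into segments after end-of-segment punctuation (ESP).

    require_space=True: an ESP ends a segment only if whitespace follows.
    require_space=False: models "An ESP without a trailing space ends a segment",
    so an ESP ends a segment even when the next character follows directly.
    """
    ws = r"\s+" if require_space else r"\s*"
    pattern = rf"(?<=[{re.escape(esp)}]){ws}"
    return [s for s in re.split(pattern, text) if s]


ja = "今日は晴れです。明日は雨ですか？はい。"
print(split_segments(ja))                       # ['今日は晴れです。明日は雨ですか？はい。']
print(split_segments(ja, require_space=False))  # ['今日は晴れです。', '明日は雨ですか？', 'はい。']
```

Without the rule, the whole Japanese paragraph stays one segment. In this simplified model the rule would also split a decimal such as 3.14 after the dot; a real segmenter needs exceptions for numbers and abbreviations.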

To display both Concordance search results and glossaries in a specific font, go to WFC/Setup/Pandora's box and add the parameter TermFont="MyFont", replacing MyFont with the name of the required font.

If you open a glossary or a translation memory in Ms-Word and cannot read the text, select all the text, then apply a font that can display your language (a language-specific font or a generic Unicode font). If the text still does not display properly and all you see are question marks (????), the file may have been saved (rewritten) at some stage as plain text, text-only, or 8-bit ANSI rather than Unicode. The lost characters cannot be recovered, so make sure Unicode files remain Unicode at all times. This applies to the text format used for translation memories and glossaries, not to Ms-Word documents: Unicode encoding is not an issue with the DOC file format.
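The question-mark symptom, and why it cannot be undone, can be reproduced in a few lines of Python: encoding CJK text into an 8-bit Western code page (cp1252 here, an assumption) replaces each unencodable character with a literal "?", so the original characters are gone. The `lossy_lines` helper is a hypothetical heuristic for spotting damaged lines in a TM or glossary; it will also flag genuine runs of question marks.

```python
import re


def lossy_lines(text: str, min_run: int = 2) -> list[int]:
    """Return 1-based numbers of lines containing runs of '?' (heuristic only)."""
    run = re.compile(r"\?{%d,}" % min_run)
    return [n for n, line in enumerate(text.splitlines(), start=1) if run.search(line)]


original = "日本語\tJapanese"
# Saving as 8-bit ANSI (assumed cp1252): each CJK character becomes a literal "?".
damaged = original.encode("cp1252", errors="replace").decode("cp1252")
print(repr(damaged))  # '???\tJapanese'
# The "?" bytes carry no trace of the original characters, so nothing can be recovered.
print(lossy_lines("ok\tfine\n" + damaged))  # [2]
```

A single "?" and the full-width "？" are not flagged, which keeps ordinary questions out of the report.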

If an Ms-Word document does not display your language properly, the cause is a font problem. Target segments must receive the proper font; the "Target font" setting described above applies a font automatically to all target segments.
