+Tools
+Tools (say "PlusTools") has two functions:
Click here for the installer (Windows 10/11 recommended)
Most CAT tools offer few, if any, utilities to manage Translation Memories and glossaries.
Some tools provide a simplistic lookup function, and little else.
+Tools' aim is to bring a familiar, spreadsheet-like interface for browsing and searching, and to support TMs that far exceed spreadsheet capacity (millions of lines).
The other aim is to provide utilities designed and optimized for linguistic databases.
Powerful utilities are available:
The list of features is constantly growing. Terminology extraction is in preparation.
Note: +Tools is currently in beta stage, and free. Use at your own risk.
Overall, the browser paradigm is used, so you don't need to learn yet another User Interface.
Data is presented as a spreadsheet so, again, you are in a familiar UI. The F2 key toggles between source/target view (2-column) and all fields.
Click a + symbol to add a tab, an x symbol to close a tab.
+Tools can open up to 5 tabs.
TM stands for Translation Memory.
Glossary is frequently used instead of the more glorious "Terminology Database" for the sake of brevity.
TMX is the acronym for Translation Memory eXchange, a TM standard widely used in the translation/localization industry, originally developed by LISA (Localization Industry Standards Association).
TBX is the acronym for TermBase eXchange, a terminology standard widely used in the translation/localization industry, now published as ISO 30042.
WFS means Wordfast Server, a TM and terminology (glossary) server marketed by Wordfast LLC. Note that WFS is free for personal use (up to three simultaneously connected users). It's one of the best-kept secrets in the industry, bringing immense power to users with a technical mind.
WFC means Wordfast Classic, a widely used translation tool which first appeared in 1999, and whose TM and glossary formats (unchanged since 1999) are probably the most user-friendly.
+Tools is programmed by Yves Champollion.
Who
Suggestions, bug reports, and spanking can be directed to yves@champollion.net
Why
Maintaining CAT tool data (TMs and Glossaries) is a problem for power users, project managers, even freelance translators with years of activity, who have accumulated precious assets.
One free and open-source utility is Olifant, part of the Okapi Framework,
with Yves Savourel as the driving force behind the project.
Olifant supports TMX and the WFC format. But it's not meant for glossaries, and can only handle TMs of modest sizes.
Here are issues power users deal with every day:
To select/unselect individual lines, create a filter first with Ctrl+L, or with the ☰ Menu > Filter.
Without a filter | |
Delete one line | Delete key |
Add one line | Insert key |
Edit a cell | Enter. After editing, Enter confirms changes, Escape cancels them. |
Edit an entire line | Shift+Ctrl+Enter. Note that with an XML file, this is raw editing: you must pay attention to marked-up content, tags, XML-forbidden characters, etc. Do not edit XML files if you are not familiar with XML, TMX, TBX. |
Find | Ctrl+F (Ctrl+Up/Down to find next/previous) |
Find-replace | Ctrl+H |
Filters | Ctrl+L (Filter, Concordance, Suspicious TUs, Redundant TUs) |
Columns | F2 Toggles all-column, or 2-column view. Right-click column header to reset column widths |
To end of file | Ctrl+End |
To start of file | Ctrl+Home |
Paste all lines from the clipboard | Ctrl+V |
With a filter | |
Select all | Ctrl+A |
Unselect all | Ctrl+B |
Reverse-select all | Ctrl+R |
Select current line | Space bar (It's a toggle to Select/Unselect) |
Copy | Ctrl+C Copies all selected lines to the clipboard, and overwrites it. |
Delete lines | Ctrl+Delete |
The clipboard is just another file named clipboard.tmp, and survives closing/opening +Tools. The clipboard can be opened as any other file. As with most clipboards, every Copy action rewrites it.
The approach here is to statistically spot and remove garbage. On close inspection, you may fault the software for false positives. You can manually unselect "garbage" that is legitimate. However, when the TM holds over 10,000 units, let alone 100,000 or a million, that is not feasible. The idea with very large TMs is this: if spotting garbage is 90% reliable, and garbage is frequent, then a TM rid of 90% of its garbage (at the risk of deleting a few legitimate units) is still better than a dirty monster. Everything comes at a price, even cleanliness.
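The statistics +Tools actually uses are not described here. As an illustration only, here is a minimal Python sketch of one common heuristic for spotting suspicious TUs. A translation is usually roughly proportional in length to its source, so pairs whose log length ratio lies far from the TM-wide average are flagged. The function name `suspicious_tus` and the `threshold` value are assumptions.

```python
import statistics
from math import log


def suspicious_tus(pairs, threshold=2.5):
    """Return indices of (source, target) pairs whose length ratio is an outlier.

    Hypothetical heuristic: compute log((len(target) + 1) / (len(source) + 1))
    for every pair, then flag pairs more than `threshold` standard deviations
    away from the mean.
    """
    ratios = [log((len(t) + 1) / (len(s) + 1)) for s, t in pairs]
    if len(ratios) < 2:
        return []
    mean = statistics.fmean(ratios)
    spread = statistics.pstdev(ratios)
    if spread == 0:  # every pair has the same ratio: nothing stands out
        return []
    return [i for i, r in enumerate(ratios) if abs(r - mean) / spread > threshold]


# Usage: 20 ordinary pairs and one obviously broken pair.
tm = [("Hello world", "Bonjour le monde")] * 20 + [("OK", "x" * 500)]
print(suspicious_tus(tm))  # [20]
```

Like the approach described above, a heuristic of this kind accepts a few false positives in exchange for handling TMs far too large to check by hand.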
Note:
- +Tools lets you specify whether the unselected line (which will not be deleted with Ctrl+Delete) is the youngest, or oldest, in the series of siblings. Note that a custom filter can also be used to define that one "surviving" (unselected) line. In the Custom filter, two options (Is in List, or Is not in List) are specially geared toward that task. For example, you can opt for one particular User ID, or a list of those IDs, to be preferred, or avoided, in determining the one surviving line. Simply enter a list of User IDs, separated with commas, as argument.
- When you press Ctrl+Delete, all selected lines are deleted (except the one, per group of siblings, which is not selected).
- Lines are presented in a sorted display, so that siblings (groups of redundant lines) are listed together. Note that the TM or glossary is not actually sorted, this is only a visual thing.
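The sibling logic described above can be sketched in Python as follows. This is a hypothetical illustration, not +Tools' code. The `TU` fields, the function names, and the assumption that dates are sortable strings (such as TMX-style `20240131T120000Z`) are all assumptions. Siblings are taken to be TUs with identical source and target.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class TU:
    source: str
    target: str
    date: str  # assumed sortable, e.g. TMX-style "20240131T120000Z"
    user: str


def parse_user_list(text):
    """'ANN, BOB' -> {'ANN', 'BOB'}: a comma-separated list of user IDs."""
    return {u.strip() for u in text.split(",") if u.strip()}


def lines_to_delete(tus, keep="youngest", preferred_users=""):
    """Return indices of redundant TUs to select, leaving one survivor per sibling group."""
    preferred = parse_user_list(preferred_users)

    def key(i):
        return (tus[i].source, tus[i].target)

    # Sorted *view* of indices: siblings become adjacent, the TM itself is not reordered.
    view = sorted(range(len(tus)), key=key)
    selected = []
    for _, group in groupby(view, key=key):
        group = list(group)
        if len(group) < 2:
            continue  # unique line, not redundant
        # Similar to "Is in List": prefer a survivor whose user ID is listed, if any.
        pool = [i for i in group if tus[i].user in preferred] or group
        pick = max if keep == "youngest" else min
        survivor = pick(pool, key=lambda i: tus[i].date)
        selected.extend(i for i in group if i != survivor)
    return sorted(selected)
```

The selected indices correspond to the lines that Ctrl+Delete would remove, while the unselected survivor of each group stays.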
Evaluating the aligner. If you see misalignments, try aligning the same content with other aligners, even those that come at a price, and let me know the overall result.
I may not respond to uncompared, undocumented "This/that was wrongly aligned".
See my note on the subject: Testing vs Using: Academical vs Professional.
Note:
- +Tools can use a list of names to be replaced with anonymous placeholders.
In the list, if a name is followed by an equal sign (=), whatever comes after the sign replaces that name; otherwise, a random placeholder is used. Placeholders are chosen to make linguistic sense, so that, where applicable, Machine Translation recognizes obfuscated items as names rather than things. Note that for a name to be replaced, it must begin with a capital letter in the TM, or be all in capitals.
- Email addresses and URLs can also be obfuscated.
- Figures can be obfuscated. If so, +Tools handles figures intelligently. For example, dates are modified so that they remain valid dates after obfuscation. This is important for MT training.
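Here is a minimal Python sketch of the name and date rules above. It works under several assumptions: list entries are either `Name` or `Name=Replacement`, dates are in ISO `YYYY-MM-DD` form, and each date is shifted by a random number of days so it remains a real calendar date. The placeholder selection and date handling in +Tools itself are not documented here, and the function names are hypothetical.

```python
import random
import re
from datetime import date, timedelta


def parse_name_list(lines, placeholders, rng=random):
    """Map each listed name to its replacement.

    'Name=Replacement' uses the text after '='; a bare 'Name' gets a random
    placeholder taken from `placeholders` (a hypothetical list of plausible names).
    """
    pool = list(placeholders)
    rng.shuffle(pool)
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if "=" in line:
            name, _, replacement = line.partition("=")
            mapping[name.strip()] = replacement.strip()
        else:
            mapping[line] = pool.pop()  # IndexError if placeholders run out
    return mapping


def obfuscate_names(text, mapping):
    """Replace whole-word names only when capitalized or all in capitals."""
    for name, new in mapping.items():
        capitalized = name[:1].upper() + name[1:]
        for form, repl in ((capitalized, new), (name.upper(), new.upper())):
            text = re.sub(r"\b" + re.escape(form) + r"\b", lambda m, r=repl: r, text)
    return text


ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def obfuscate_dates(text, max_shift_days=365, rng=random):
    """Shift ISO dates by a random number of days, so they stay valid calendar dates."""
    def shift(m):
        try:
            d = date(int(m[1]), int(m[2]), int(m[3]))
            d += timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        except (ValueError, OverflowError):
            return m[0]  # not a real date (e.g. 2024-02-30) or out of range: keep as is
        return d.isoformat()
    return ISO_DATE.sub(shift, text)


names = parse_name_list(["Smith=Jones", "Dupont"], ["Martin"])
print(obfuscate_names("Mr Smith met SMITH and smith, then Dupont.", names))
# Mr Jones met JONES and smith, then Martin.
```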
Note: the clipboard is just another file, and can be viewed as either a TM, or a glossary.
Note: Paste/Append between an XML-based format (TMX, TBX) and Wordfast Classic TXT:
This is possible from TXT to XML.
From XML to TXT, only the donor XML units with two language codes that match those of the recipient TXT file will be copied.
If you work across TMs and glossaries whose language codes are not compatible, be aware that copying data must respect language codes.
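The language-code rule for pasting from XML to TXT can be sketched as shown below. The sketch assumes that each donor unit is a dictionary mapping language codes to text, and that codes must match exactly while ignoring case (language tags are case-insensitive). The text does not say whether +Tools also matches `en` against `en-US`, so this sketch does not. The function name is hypothetical.

```python
def units_pasteable_to_txt(xml_units, txt_langs):
    """Keep only donor units that contain both language codes of the recipient TXT file.

    xml_units: list of dicts {language code: text}, e.g. parsed TMX/TBX units.
    txt_langs: (source code, target code) of the recipient Wordfast Classic TXT file.
    Codes are compared exactly, ignoring case.
    """
    src, tgt = (code.lower() for code in txt_langs)
    kept = []
    for unit in xml_units:
        by_lang = {code.lower(): text for code, text in unit.items()}
        if src in by_lang and tgt in by_lang:
            kept.append((by_lang[src], by_lang[tgt]))
    return kept
```

Units carrying extra languages are still accepted when both required codes are present. Only the matching pair is copied.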
+Tools connects to WFS TMs and glossaries in HTTP/HTTPS mode.
Make sure your version of WFS is 1.14.742.274, or higher.
If that is not the case, just download WFS from https://www.wordfast.net/zip/WfServer.zip.
Note that WFS is free for up to three simultaneous connections. If you see the red "demo" mode flashing in WFS, but you have fewer than four connections, WFS is actually working in full mode.
Recent versions of Wordfast Server are compatible with the standard REST interface model used by +Tools.
Under Setup > Network, check the Port HTTP Active checkbox. Make sure the computer where WfServer runs has port 81 open (or any other port you set up).
In the "Accounts" tab in WFS, you must check the "Allow raw calls" checkbox. The account and its assigned TM/Glossary should not restrict access, and not use encryption.
When the above is set and secured, +Tools can "talk" to WFS.
A connection string is the same as with Wordfast Pro or Wordfast Anywhere, except that the wf:// part (the native WFS protocol) is replaced with http://, as follows:
http://accountName:passWord@11.12.13.14:81
For WFS running on the same machine as +Tools, the IP address is 127.0.0.1, or "localhost" if you use a host name.
Note that if WFS is "localhost" (located on your computer, IP 127.0.0.1), or in your company's intranet, HTTPS is not necessary; just use HTTP. If HTTPS is used, the workstation (a physical server) must be equipped with a valid SSL certificate for the IP or domain name.
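A small Python helper can build such a connection string, or convert an existing `wf://` string. This is only a sketch, and the function names are hypothetical. Percent-encoding credentials that contain characters such as `@` or `:` is standard URL practice, but it is an assumption, not a confirmed WFS requirement.

```python
from urllib.parse import quote


def wfs_http_url(account, password, host="127.0.0.1", port=81, https=False):
    """Build http(s)://account:password@host:port for a WFS account (hypothetical helper)."""
    scheme = "https" if https else "http"
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    user = quote(account, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


def from_native(wf_url):
    """Turn a native wf:// connection string into the http:// form."""
    prefix = "wf://"
    if not wf_url.startswith(prefix):
        raise ValueError("expected a wf:// connection string")
    return "http://" + wf_url[len(prefix):]


print(wfs_http_url("accountName", "passWord", "11.12.13.14"))
# http://accountName:passWord@11.12.13.14:81
```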
The TM and the account, as defined under the TMs and Accounts tabs, should neither restrict nor encrypt access.
In case of connection difficulty, make sure the "Allow raw calls" checkbox is checked (see above). For more complex issues, check the Troubleshooting section in the WFS manual. Note, however, that your protocol is http:// (not the default wf:// protocol used in the manual).
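When a connection fails, a quick first check is whether the HTTP port can be reached at all from the +Tools workstation. This generic Python sketch is not part of +Tools or WFS. It attempts a TCP connection to the given host and port. The values `127.0.0.1` and port `81` are simply the defaults discussed above.

```python
import socket


def port_is_open(host="127.0.0.1", port=81, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


print(port_is_open("127.0.0.1", 81))
```

If the port is unreachable, the problem is the network or firewall setup rather than the account, the password, or the "Allow raw calls" option.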
Press Ctrl+F for a quick-and-easy "full file" search mode. Ctrl+F searches the entire database: source and target segments, dates, user names, all meta data.
Note that Concordance (☰ Hamburger menu > Filter, or Ctrl+L) is different: it works as a filter, only displaying TUs that contain all, or any, of the keywords being searched.
With Wordfast Server, Concordance uses the TM's index, so it's much faster than the "brute force" regular search. TUs that pass the Concordance search in WFS are sorted, with those containing multiple keywords (the more relevant ones) on top. Concordance only retrieves up to one megabyte of text, which is still a few thousand TUs. If you search for very common words like "and" or "the", you may reach that maximum. But Concordance is typically used to spot real words, usually rare or peculiar ones, not frequent "stopwords".
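To make the difference with Ctrl+F concrete, here is a brute-force Python sketch of a Concordance-style filter. It keeps TUs containing all (or any) of the keywords, ranks TUs that match more keywords first, and stops at about one megabyte of text. WFS uses its TM index instead, so this sketch reproduces only the observable behavior. The names and the tokenization (case-insensitive `\w+` words) are assumptions.

```python
import re


def concordance(tus, query, mode="all", max_bytes=1_000_000):
    """Brute-force Concordance-style filter over (source, target) TUs.

    mode="all": every keyword must occur; mode="any": at least one must.
    TUs matching more keywords come first; output stops at about max_bytes of text.
    """
    keywords = list(dict.fromkeys(w.lower() for w in query.split()))
    hits = []
    for tu in tus:
        words = set(re.findall(r"\w+", " ".join(tu).lower()))
        n = sum(k in words for k in keywords)
        if n and (mode == "any" or n == len(keywords)):
            hits.append((n, tu))
    hits.sort(key=lambda hit: hit[0], reverse=True)  # stable: ties keep file order
    result, size = [], 0
    for _, tu in hits:
        size += len(" ".join(tu).encode("utf-8"))
        if size > max_bytes:
            break  # very common keywords hit this cap first
        result.append(tu)
    return result
```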
The clipboard.tmp file contains any data you wrote into the clipboard with Ctrl+C. It can be deleted.
It can also be opened by +Tools like any other file. This means that you can first apply a filter,
press Ctrl+C to copy the filtered lines into the clipboard, open the clipboard,
and repeat the operation to drill down into a file with successive filters.
This shows the versatility in +Tools: you can apply successive layers of custom and/or preset filters to virtually extract whatever you want from a file.
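The drill-down loop (filter, Ctrl+C, open the clipboard, filter again) amounts to composing filters. Here is a minimal Python model of that loop. The TU tuples and the predicates are hypothetical.

```python
from functools import reduce


def drill_down(tus, *filters):
    """Apply filters in sequence, each one to the result of the previous one."""
    return reduce(lambda rows, keep: [row for row in rows if keep(row)], filters, list(tus))


# Hypothetical TUs as (source, target, user ID) tuples.
tm = [
    ("Hello", "Bonjour", "ANN"),
    ("Hello world", "Bonjour le monde", "BOB"),
    ("Cat", "Chat", "BOB"),
]
# First layer: one translator's units; second layer: units containing "Hello".
print(drill_down(tm, lambda tu: tu[2] == "BOB", lambda tu: "Hello" in tu[0]))
# [('Hello world', 'Bonjour le monde', 'BOB')]
```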
A file that ends with EXPORT.txt contains data exported with a custom export. It can be discarded after use.
A FLT file (a file with a .flt extension) is a temporary file that contains the data found when applying a filter.
FLT files can be deleted; they will be recreated by +Tools when needed.
A BAK file is a backup of a file before a complete rewrite. A search-replace operation, or a reorganization, entirely rewrites a file. BAK files can be deleted.
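The backup-then-rewrite pattern can be sketched in Python as follows. This shows the general technique, not +Tools' code. The `.bak` naming and the function name are assumptions. The new content is first written to a temporary file in the same folder and then swapped in with `os.replace`, so an interrupted rewrite does not leave a half-written TM in place.

```python
import os
import shutil
import tempfile


def rewrite_with_backup(path, new_lines, encoding="utf-8"):
    """Back up `path` to `path + '.bak'`, then replace its content with new_lines."""
    shutil.copy2(path, path + ".bak")  # hypothetical naming for the BAK file
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as f:
            f.write("".join(line + "\n" for line in new_lines))
        os.replace(tmp_path, path)  # swap in the new file in one step
    except BaseException:
        os.unlink(tmp_path)
        raise
```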
This is at beta stage, but used by some in production.
+Tools is a desktop application designed as a possible successor to Wordfast Classic, in case Microsoft ends support for Office VBA, on which Wordfast Classic runs.
In beta stage, +Tools runs on Microsoft Windows.
Mac support: There are plans to release +Tools for the Mac.
Audience: +Tools is meant for solo freelance translators working directly for clients, with Microsoft DOCX, or PDF files (see the note on translating PDF). If jobs are delivered by a Project Manager (PM), please use the tool that your PM recommends.
I know Wordfast Classic (WFC). What are the differences between WFC and +Tools?
+Tools does not require Microsoft Word.
Because documents are not edited inside Word, their formatting is represented by tags. This is why you will see those red critters, so-called <1>tags<2>, inside opened segments.
+Tools tries to minimize the quantity of tags. With simple document formatting, tags should be rare.
At this early stage, +Tools implements a strict one-to-one tag verification, source to target.
Flexibility will be introduced for advanced use, allowing custom tags.
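A strict one-to-one tag check can be sketched in Python as shown below, assuming tags appear as `<1>`, `<2>`, and so on, as in the example above. The function name is hypothetical. It reports tags missing from the target and extra tags added to it.

```python
import re
from collections import Counter

TAG = re.compile(r"<(\d+)>")


def tag_mismatch(source, target):
    """Return (missing, extra): tag numbers absent from the target, or added to it.

    Strict one-to-one: each <n> must occur in the target exactly as often as in the source.
    """
    src = Counter(TAG.findall(source))
    tgt = Counter(TAG.findall(target))
    missing = sorted((src - tgt).elements(), key=int)
    extra = sorted((tgt - src).elements(), key=int)
    return missing, extra


print(tag_mismatch("Click <1>here<2> now", "Cliquez ici<2>"))  # (['1'], [])
```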
Other than that, you will find the simplicity that made WFC a success from the year 1999 onward.
The open-source, user-friendly WFC TXT format for TMs and glossaries is used.
If +Tools finds a pre-existing Wordfast Classic setup like wordfast.ini, it uses it.
What is the main difference between +Tools and other CAT tools?
+Tools has been patiently coded from scratch, instruction after instruction, to avoid an assemblage of bulky third-party libraries.
Wordfast Classic users still run TMs and glossaries created 25 years ago, and never had to upgrade or convert them.
I don't want to waste my time. What is +Tools missing compared to other CAT tools?
The intended audience, as described earlier, rarely, if ever, uses that level of sophistication. They expect a simple and efficient production tool.
The focus with +Tools, besides a blend of simplicity and efficiency, is modern Machine Translation. This is where +Tools shines, and outperforms the competition.
Does +Tools need the internet; is it a cloud application?
No. We have a cloud solution, another tool called Wordfast Anywhere. +Tools does not need the internet to run, unless you set it up to query Machine Translation from a web source - but that is optional. And even if you need web-based MT, +Tools offers a one-click offline mode: pre-MT the entire document in a minute, and the MT results are locally stored. From that point on, you can work offline for days, and still have MT support.
Can +Tools be integrated in an existing TMS or workflow?
It's coming, once we get past the proof-of-concept stage. +Tools' code is ready for that. If your TMS can bundle files into a zipped project package (document, TM, glossary, tool setup), you will be covered. The TM format, the glossary format, and the tool's setup are all open source, and text-based. They are easy to create or manipulate with simple scripts. The underlying document format for +Tools is XLIFF, which is universally used in the translation industry. The DOCX format can be used too. Unlike other workflows, you can add the entire application, +Tools, into the project file, or include a link to download and install it in seconds.
It looks like +Tools is based on XLIFF. Can I translate XLIFF files from other tools, like Wordfast Pro, memoQ, Trados, etc.?
That is intended. However, as long as +Tools is in beta stage, 100% compatibility cannot be guaranteed. TXLF, which is Wordfast Pro's XLIFF, is currently supported with good results.
If you translate XLIFF, chances are, you are not directly working for a client, you are subcontracting for a translation agency. Use their recommended translation tool.
Let's get this out of the way: no translation tool translates PDF directly. You will always convert PDF to an editable format, like DOCX. After translation, you either
Now for the two variants of PDF.
A. The PDF contains text.
Double-click the PDF to open it. Try to select just a few words with the mouse. If you can do that, and copy-paste the selected text, your PDF contains text. Otherwise, the PDF contains images of text (scans, screenshots, whatever they're called). Jump to B. The PDF contains images.
At that point, there are various ways to proceed:
- You have Microsoft Word installed.
- Open the PDF in Microsoft Word.
- Use File / Save as, select "DOCX" as save format.
- You have a Google account (you have one if you use Gmail, or any other Google service).
- Move your PDF to your Google Drive.
- Start Google Docs. It's one of the many apps Google offers: https://docs.google.com
- Open that PDF (which is now in Google Drive) in Google Docs for editing.
- Use File / Download, then select "DOCX" as download format.
B. The PDF contains images.
- You have a Wordfast Anywhere (WFA) account.
If you have a valid license to a paid Wordfast application, you have a WFA account.
Otherwise, a $1 WFA trial account will do fine. Note that the $1 trial account does not expire for PDF conversion.
Add the PDF to the documents in your current WFA project, using "add file", the small + icon.
After the conversion process, download the resulting DOCX.
- Using www.zamzar.com, convert your PDF to DOCX. It's free for occasional and personal use.
This discussion touches the philosophy of coding.
There are two relevant categories of code to this discussion:
In category 1, it is acceptable for the software, when faced with a malformed container, to attempt to salvage what can be salvaged,
informing the user that it does so, or prompting the user for action.
In category 2, where the code is a step in a chain of actions, it must refuse malformed XML, unless it has a User Interface (UI) to prompt users for a decision. "Libraries" rarely have a UI, they simply return an error.
A case in point is the most widely used application of all: the internet browser.
Browsers do not reject malformed HTML.
Browsers accept a certain degree of recoverable errors, and have mechanisms to work around them.
The purpose is to display what can be displayed, which usually is most of the document.
+Tools falls into category 1, with exceptions.
If a TMX / TBX / TXT file is malformed, +Tools attempts to step around malformed units, and display what can be displayed.
At that point, using "Reorganize", the file can be rewritten without the faulty units it contains.
That is done within the limits of what's possible.
+Tools uses other fault-tolerance techniques to avoid leaving users stranded with the dreaded "File rejected: error at line X column Y".
Although geeks can fix and survive those errors, most people can't and won't, and wish that the rest of the file (probably 99% of it) could be saved.
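The unit-by-unit salvage described above can be sketched in Python for TMX. Each `<tu>` block is parsed on its own, and blocks that are not well-formed are counted and skipped instead of causing the whole file to be rejected. This illustration is not +Tools' parser. It assumes that TMX variants sit in `<tuv>` elements carrying `xml:lang` (or the older `lang` attribute) and that their text sits in `<seg>`. One limitation is that a unit missing its closing `</tu>` swallows the next unit, and both are then skipped.

```python
import re
import xml.etree.ElementTree as ET

TU_BLOCK = re.compile(r"<tu[\s>].*?</tu>", re.S)
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_tolerantly(text):
    """Return (units, skipped): well-formed <tu> blocks as {lang: text}, plus a count of bad ones."""
    units, skipped = [], 0
    for block in TU_BLOCK.findall(text):
        try:
            tu = ET.fromstring(block)
        except ET.ParseError:
            skipped += 1  # malformed unit: step around it
            continue
        variants = {}
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                variants[lang] = "".join(seg.itertext())
        units.append(variants)
    return units, skipped
```

Writing the returned units back out would correspond to the "Reorganize" step mentioned above, which rewrites the file without the faulty units.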
Exception: With XLIFF, a malformed container is more difficult to salvage. XLIFF is part of a long chain, where various tools intervene: document text extraction, segmentation, translation, reconstruction of the final document. Fixing an error in XLIFF does not guarantee success in reconstructing the final document.
That is why malformed XLIFF files are rejected.
As of 2024, this is a hot topic, and it can be viewed from different angles, such as the human side, like the practice of translation, deontology, or the financial aspect of translation, or technology. The angle here is technology.
In the past three decades, a few disruptive technologies have forced CAT tools to deeply modify their behaviors. The latest one is AI. AI may bring drastic improvements in translation in two major areas that will impact the daily lives of translators: Machine Translation and Dictation. It may also improve areas that are less apparent to translators, such as tag handling and segmentation. At this time (2024) +Tools focuses on the MT side of AI. You will find +Tools pre-equipped with connectors to www.openai.com.
AI operators other than www.openai.com can be leveraged as well, using "Custom MT".
AI can propose different styles of speech. Those different styles are achieved by fine-tuning so-called prompts.
A "prompt" is how we, humans, tell the blind machine what we expect. It's a game of concision, clarity, rectitude.
+Tools demonstrates the power and versatility of AI by offering 5 different types of speech in its default MT setup. They are unchecked by default, so you have to activate them.
www.openai.com's AI-driven translation comes at a price, and requires an API key. However, in 2024, +Tools offers that for free.
If those options (A1, A2, A3, A4, A5) are checked under "Machine Translation", once a segment is opened (press F6 to re-MT an opened segment if needed),
you will see small buttons pop up.
Every button demonstrates a speech style. Click a button to see what happens. Click "All" to see all styles.
Feel free to fine-tune prompts, there's no limit. To do so, proceed to the general Setup, "Machine Translation", click one MT definition line, click "Edit".
You can add more speech styles; the rule is that their ID must contain a number (A1, A2, etc.). To add a custom MT, select "Custom", click the line, then "Edit".
Here is a screenshot of PlusTools running two AI speech styles in addition to the default "Neutral" mode: DEI (genderless / inclusive), and casual. The source text was chosen to showcase AI's capacity to offer different speech types in translation:
+Tools beta is a desktop application - not a cloud application. By default, +Tools does not connect to the internet for any purpose:
+Tools allows connections to remote online Machine Translation services such as Microsoft Translator, Google Translate, DeepL, etc. Those are opt-ins. By default, no remote Machine Translation services are connected.
+Tools never uses the internet without users' express consent.
Obtaining the software in beta format, and using it, does not require users to provide any personal information.
Yves Champollion does not collect, store, share, recycle, or disclose any personal data he may become privy to.
Contact person for privacy matters: Yves Champollion
Yves Champollion adamantly opposes practices known as spamming, data collection, cookie collection, telemetry, analytics, etc.
Right of objection and redress:
All users have the right to obtain a copy of any data they believe is held by Yves Champollion, and the right to correct or delete that information.
Please direct all privacy matters to:
Yves Champollion
44 rue Danton
94270 Le Kremlin-Bicetre
France
Or contact yves@champollion.net
All trademarks noted™ are the property of their respective owners.
Ms-Word™, Excel™, Access™, PowerPoint™ are trademarks of Microsoft Corp.