Useful regular expressions (Regex)

From Wordfast Wiki
Version: Wordfast Pro 5.8
Operating System: macOS
John, 17 May 2019

Latest revision as of 18:55, 2 June 2021

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern.[1] Regexes are very helpful in the Editor View, both in the filter bar (used to show or hide specific segments) and in the Find/Replace dialog. Below are a few useful examples. To use them in Wordfast Pro, make sure the corresponding regex option (Regex in the filter bar, Use Regex in Find/Replace) is enabled.

Filter Bar: Showing/Hiding Certain Segments

Hide number-only segments

Use the following regex in the segment filtering bar:

^(([0-9][^\n]*[^0-9])|([^0-9][^\n]*[0-9])|([^0-9]?[^\n]*[^0-9]))$

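This regex also hides numbers with punctuation (decimals, etc.). It keeps visible only segments that begin or end with a non-digit character, so it hides every segment that both begins and ends with a digit, including mixed text such as "3 apples and 4". Number-only segments that begin or end with a symbol, such as -5 or 50%, stay visible.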
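To preview what this filter keeps before using it on a real file, the regex can be tried outside Wordfast Pro. The Python sketch below assumes the filter bar shows the segments the regex matches and hides the rest. `stays_visible` is a hypothetical helper, and Python's `re` module is only a stand-in for the Wordfast Pro engine; this pattern uses basic syntax that `re` accepts unchanged.

```python
import re

# Regex copied from this page; Python's re accepts it as written.
VISIBLE_IF_MATCHED = re.compile(
    r"^(([0-9][^\n]*[^0-9])|([^0-9][^\n]*[0-9])|([^0-9]?[^\n]*[^0-9]))$"
)


def stays_visible(segment: str) -> bool:
    """Assumes the filter bar keeps a segment visible when the regex matches it."""
    return VISIBLE_IF_MATCHED.search(segment) is not None


for text in ["Hello world", "-5", "2021", "8,675,309.00", "3 apples and 4"]:
    print(f"{text!r}: {'shown' if stays_visible(text) else 'hidden'}")
```

"3 apples and 4" is hidden because it starts and ends with a digit, while "-5" stays visible because it starts with a minus sign.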

Show only number-only segments

Use the following regex in the segment filtering bar:

^(?:(?:-|–|(?:(?:\$|€|£)(?:\h)?))?(?:\d{1,3})(?:\h|,|\.|(?:(?:\h)?(?:%|\$|€|£)))?)+$
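The show-only regex can be checked the same way. Python's `re` module does not support `\h`, so this sketch swaps it for an explicit character class taken from Java's documented definition of `\h` (horizontal whitespace); treating that as the meaning of `\h` in Wordfast Pro is an assumption. `is_number_only` is a hypothetical helper name.

```python
import re

# Java documents \h as [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000];
# Python's re has no \h, so that class is substituted in.
H = r"[ \t\u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]"

# Regex copied from this page.
PATTERN_FROM_PAGE = r"^(?:(?:-|–|(?:(?:\$|€|£)(?:\h)?))?(?:\d{1,3})(?:\h|,|\.|(?:(?:\h)?(?:%|\$|€|£)))?)+$"
SHOW_NUMBER_ONLY = re.compile(PATTERN_FROM_PAGE.replace(r"\h", H))


def is_number_only(segment: str) -> bool:
    """True if the segment consists only of numbers, separators, signs and currency/percent symbols."""
    return SHOW_NUMBER_ONLY.search(segment) is not None


for text in ["8,675,309.00", "€ 12", "50 %", "Total: 50"]:
    print(f"{text!r}: {is_number_only(text)}")
```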

If you have numbers like 8,675,309.00 that need to be converted to 8.675.309,00, copy all sources to target with the filter applied, then run a 3-step find and replace on the target text:

  1. Find . and replace with DUMMY (if Use Regex is enabled, search for \. instead, because an unescaped . matches any character)
  2. Find , and replace with .
  3. Find DUMMY and replace with ,
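The three passes can be replayed in Python to see why the placeholder is needed: converting commas to dots directly would make the original dots indistinguishable from the new ones. This is a sketch of the steps above, not Wordfast Pro code; each `re.sub` call stands for one Replace All, and `swap_separators` is a hypothetical name.

```python
import re


def swap_separators(segment: str) -> str:
    """Mirror the three Replace All passes: . -> DUMMY, , -> ., DUMMY -> ,"""
    segment = re.sub(r"\.", "DUMMY", segment)  # \. because a bare . matches any character
    segment = re.sub(r",", ".", segment)
    return re.sub(r"DUMMY", ",", segment)


print(swap_separators("8,675,309.00"))  # 8.675.309,00
# The pitfall the escape avoids: an unescaped "." replaces every character.
print(re.sub(r".", "DUMMY", "1.5"))  # DUMMYDUMMYDUMMY
```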

Show only thousands, ten thousands or hundred thousands surrounded by parentheses

If you have negative monetary figures in a financial report, they are generally surrounded by parentheses, like (1,234), (10,123) or (100,123).

Use the following regex in the segment filtering bar to show only these segments when the divider is a non-breaking space:

\(\d{1,3}∘\d{3}\)

Use this one if the divider is a comma:

\(\d{1,3},\d{3}\)

With the filter in place, copy all sources to target, then find ∘ and replace it with a comma, or vice versa (∘ represents a non-breaking space).
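Here is a Python sketch of the two filters and the follow-up replacement, assuming ∘ stands for U+00A0 NO-BREAK SPACE as stated above. The helper and constant names are hypothetical. Because the replacement is a plain find/replace, it changes every no-break space in the filtered segments, not only the ones inside parenthesized figures.

```python
import re

NBSP = "\u00A0"  # the character this page writes as ∘

# The two filter regexes from this page, with ∘ written as the real character.
PAREN_NBSP = re.compile(r"\(\d{1,3}" + NBSP + r"\d{3}\)")
PAREN_COMMA = re.compile(r"\(\d{1,3},\d{3}\)")


def nbsp_to_comma(segment: str) -> str:
    # Plain replacement, like "find ∘, replace with ,": every NBSP in the segment changes.
    return segment.replace(NBSP, ",")


sample = "Net loss (10" + NBSP + "123)"
if PAREN_NBSP.search(sample):
    print(nbsp_to_comma(sample))
```

Both patterns only cover four to six digits, so a figure like (1,234,567) is not matched.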

Show only segments containing numbers in a specific format

Numbers that need to be converted to a different format can appear inside segments that also contain other text. Depending on the format of the source numbers, use the regex for option A, B, or C below to show only the segments containing numbers in that format, so you can convert them later.

Option A: thousands separator space*, decimal separator comma (example: 1 000,00)
    (?:\d{1,3})(?:(?:(?:(?<!,\d{3})( |\u00A0|\u2009|\u202F)(?=\d{3}[^\d])))|(?:(?<!\.\d{3}),(?=\d(?!\d+(,\d|\.\d)))))

Option B: thousands separator dot, decimal separator comma (example: 1.000,00)
    (?:\d{1,3})(?:(?:(?<!,\d{3})\.(?=\d{3}[^\d]))|(?:(?<!\d( |\u00A0|\u2009|\u202F)\d{3}),(?! |\d{3}(,[^ ]|\.[^ ]))))

Option C: thousands separator comma, decimal separator dot (example: 1,000.00)
    (?:\d{1,3})(?:(?:(?<!([\d]( |\u00A0|\u2009|\u202F)|(\.))\d{3}),(?=\d{3}[^\d]))|(?:(?<!\d( |\u00A0|\u2009|\u202F)\d{3})\.(?!$| |\d{3}(,[^ ]|\.[^ ]))))

* Space covers the regular space as well as no-break space (U+00A0), thin space (U+2009), and narrow no-break space (U+202F).
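Options A and B can be sanity-checked in Python against one made-up sample of each format. Option C is included only to show a portability limit: its first look-behind has alternatives of different widths, which Python's `re` rejects, so it cannot be tested this way. That says nothing about whether it works in Wordfast Pro.

```python
import re

# Options A and B copied from above; Python's re compiles them as written.
OPTION_A = re.compile(
    r"(?:\d{1,3})(?:(?:(?:(?<!,\d{3})( |\u00A0|\u2009|\u202F)(?=\d{3}[^\d])))|(?:(?<!\.\d{3}),(?=\d(?!\d+(,\d|\.\d)))))"
)
OPTION_B = re.compile(
    r"(?:\d{1,3})(?:(?:(?<!,\d{3})\.(?=\d{3}[^\d]))|(?:(?<!\d( |\u00A0|\u2009|\u202F)\d{3}),(?! |\d{3}(,[^ ]|\.[^ ]))))"
)
OPTION_C_SOURCE = r"(?:\d{1,3})(?:(?:(?<!([\d]( |\u00A0|\u2009|\u202F)|(\.))\d{3}),(?=\d{3}[^\d]))|(?:(?<!\d( |\u00A0|\u2009|\u202F)\d{3})\.(?!$| |\d{3}(,[^ ]|\.[^ ]))))"

try:
    re.compile(OPTION_C_SOURCE)
    OPTION_C_COMPILES = True
except re.error:
    # The look-behind mixes a 2-character and a 1-character alternative;
    # Python's re only accepts fixed-width look-behinds.
    OPTION_C_COMPILES = False

samples = ["Total: 1 000,00 EUR", "Total: 1.000,00 EUR", "Total: 1,000.00 USD"]
for s in samples:
    print(s, "A" if OPTION_A.search(s) else "-", "B" if OPTION_B.search(s) else "-")
print("Option C compiles in Python:", OPTION_C_COMPILES)
```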

Filter Bar: Combine with Find/Replace

Replace numbers in a specific format

If you want to convert numbers between the formats listed above, find the conversion you need below and follow its steps.

(A) 1 000,00 to (C) 1,000.00

  1. Copy the option (A) regex given above
    NOTE: The remaining steps are in WFP
  2. With a fresh file (target segments all empty) open in Editor View, enable the Regex option in the segment filter bar
  3. Paste the option (A) regex into the filter field next to the Regex option and press Enter
  4. After the filter is applied, go to the Translation tab and click Copy All Sources
  5. Press Ctrl+H to show the Find/Replace dialog
  6. Enable the Use Regex and Search Target options and disable the Search Source option
  7. Paste the following expression into the Find Next: field
    (\d{1,3})(?: |\u00A0|\u2009|\u202F)(\d)
  8. Paste the following expression into the Replace with: field
    $1DUMMY$2
  9. Click Replace All
  10. Paste the following expression into the Find Next: field
    (\d),(\d)
  11. Paste the following expression into the Replace with: field
    $1.$2
  12. Click Replace All
  13. Paste the following expression into the Find Next: field
    DUMMY(?=.*\d{3}\.)
  14. Paste the following comma into the Replace with: field
    ,
  15. Click Replace All
    NOTE: Stop here unless the source numbers also had spaces in the decimal part (digit groups after the thousandths place); if they did, also do steps 16–18.
  16. Paste the following expression into the Find Next: field
    DUMMY
  17. Clear the Replace with: field and leave it empty
  18. Click Replace All

(B) 1.000,00 to (C) 1,000.00

  1. Copy the option (B) regex given above
    NOTE: The remaining steps are in WFP
  2. With a fresh file (target segments all empty) open in Editor View, enable the Regex option in the segment filter bar
  3. Paste the option (B) regex into the filter field next to the Regex option and press Enter
  4. After the filter is applied, go to the Translation tab and click Copy All Sources
  5. Press Ctrl+H to show the Find/Replace dialog
  6. Enable the Use Regex and Search Target options and disable the Search Source option
  7. Paste the following expression into the Find Next: field
    (\d{1,3})\.(\d)
  8. Paste the following expression into the Replace with: field
    $1DUMMY$2
  9. Click Replace All
  10. Paste the following expression into the Find Next: field
    (\d),(\d)
  11. Paste the following expression into the Replace with: field
    $1.$2
  12. Click Replace All
  13. Paste the following expression into the Find Next: field
    DUMMY
  14. Paste the following comma into the Replace with: field
    ,
  15. Click Replace All

(C) 1,000.00 to (A) 1 000,00

  1. Copy the option (C) regex given above
    NOTE: The remaining steps are in WFP
  2. With a fresh file (target segments all empty) open in Editor View, enable the Regex option in the segment filter bar
  3. Paste the option (C) regex into the filter field next to the Regex option and press Enter
  4. After the filter is applied, go to the Translation tab and click Copy All Sources
  5. Press Ctrl+H to show the Find/Replace dialog
  6. Enable the Use Regex and Search Target options and disable the Search Source option
  7. Paste the following expression into the Find Next: field
    (\d{1,3}),(\d)
  8. Paste the following expression into the Replace with: field
    $1DUMMY$2
  9. Click Replace All
  10. Paste the following expression into the Find Next: field
    (\d)\.(\d)
  11. Paste the following expression into the Replace with: field
    $1,$2
  12. Click Replace All
  13. Paste the following expression into the Find Next: field
    DUMMY
  14. Paste a narrow no-break space character (U+202F) into the Replace with: field
  15. Click Replace All

(C) 1,000.00 to (B) 1.000,00

  1. Copy the option (C) regex given above
    NOTE: The remaining steps are in WFP
  2. With a fresh file (target segments all empty) open in Editor View, enable the Regex option in the segment filter bar
  3. Paste the option (C) regex into the filter field next to the Regex option and press Enter
  4. After the filter is applied, go to the Translation tab and click Copy All Sources
  5. Press Ctrl+H to show the Find/Replace dialog
  6. Enable the Use Regex and Search Target options and disable the Search Source option
  7. Paste the following expression into the Find Next: field
    (\d{1,3}),(\d)
  8. Paste the following expression into the Replace with: field
    $1DUMMY$2
  9. Click Replace All
  10. Paste the following expression into the Find Next: field
    (\d)\.(\d)
  11. Paste the following expression into the Replace with: field
    $1,$2
  12. Click Replace All
  13. Paste the following expression into the Find Next: field
    DUMMY
  14. Paste the following dot into the Replace with: field
    .
  15. Click Replace All
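All four procedures rely on the same trick: one separator is parked under a temporary placeholder (DUMMY) so it cannot collide with the other separator being converted. The Python sketch below replays each procedure's Find/Replace pairs in order, one `re.sub` per Replace All. Assumptions: the `$1` syntax in the Replace with: field is rewritten as Python's `\g<1>`, the filter step is skipped (each sample already contains a matching number), and `CONVERSIONS` and `convert` are hypothetical names.

```python
import re

# Find/Replace pairs from the procedures, in order. WFP's $1 is written as
# Python's \g<1>; "DUMMY" is the placeholder the page uses.
SPACES = r"(?: |\u00A0|\u2009|\u202F)"
CONVERSIONS = {
    "A->C": [
        (r"(\d{1,3})" + SPACES + r"(\d)", r"\g<1>DUMMY\g<2>"),  # steps 7-9
        (r"(\d),(\d)", r"\g<1>.\g<2>"),                        # steps 10-12
        (r"DUMMY(?=.*\d{3}\.)", ","),                          # steps 13-15
        (r"DUMMY", ""),                                        # optional steps 16-18
    ],
    "B->C": [
        (r"(\d{1,3})\.(\d)", r"\g<1>DUMMY\g<2>"),
        (r"(\d),(\d)", r"\g<1>.\g<2>"),
        (r"DUMMY", ","),
    ],
    "C->A": [
        (r"(\d{1,3}),(\d)", r"\g<1>DUMMY\g<2>"),
        (r"(\d)\.(\d)", r"\g<1>,\g<2>"),
        (r"DUMMY", "\u202F"),  # narrow no-break space
    ],
    "C->B": [
        (r"(\d{1,3}),(\d)", r"\g<1>DUMMY\g<2>"),
        (r"(\d)\.(\d)", r"\g<1>,\g<2>"),
        (r"DUMMY", "."),
    ],
}


def convert(segment: str, direction: str) -> str:
    """Apply each pair as one Replace All over the target text."""
    for find, replace in CONVERSIONS[direction]:
        segment = re.sub(find, replace, segment)
    return segment


print(convert("Total: 1 000,00 EUR", "A->C"))
print(convert("1.234.567,89", "B->C"))
print(convert("1,234.56", "C->B"))
```

One thing the sketch makes visible: for A to C, a number with no decimal part, such as 1 000 EUR, keeps its DUMMY after step 15 because the look-ahead needs a dot after the digits. Steps 16-18 then drop the placeholder, giving 1000 EUR, so such segments may need a manual check.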

Find/Replace

Find thousands separated by a non-breaking space and replace NBSP with a comma

Find:

(\d{1,3})∘(\d{3})

Replace with:

$1,$2

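Make sure to search target segments only and test on a few segments before replacing all. Run the operation twice to handle numbers in the millions: each match consumes the three digits after the space, so in 1∘234∘567 the first pass only converts 1∘234, and the second pass converts 234∘567. (∘ represents a non-breaking space.)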
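The need for a second pass can be reproduced in Python: after a match, the search resumes after the three digits it consumed, so in a number with two no-break spaces the second group is skipped on the first pass. The sketch assumes ∘ is U+00A0 and writes `$1,$2` as Python's `\g<1>,\g<2>`; `one_pass` is a hypothetical helper.

```python
import re

# (\d{1,3})∘(\d{3}) from this page, with ∘ written as U+00A0.
NBSP_THOUSANDS = re.compile(r"(\d{1,3})\u00A0(\d{3})")


def one_pass(segment: str) -> str:
    # Equivalent of one Replace All with $1,$2.
    return NBSP_THOUSANDS.sub(r"\g<1>,\g<2>", segment)


millions = "1\u00A0234\u00A0567"
first = one_pass(millions)   # second NBSP still present
second = one_pass(first)     # fully converted
print(repr(first), repr(second))
```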

Invert currency symbols

Say you have a lot of monetary values like 103,50€ in your document and you want to globally change them to €103.50. How would you do this?

Open the Find/Replace function and be sure to tick the Use Regex box.

Type the following regex in the Find what field:

(^[^,]+?)(,)([^€]+?)(€)

Type the following expression in the Replace with field:

\€$1\.$3

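NOTE: This only works for values up to 999. Values in the thousands need an additional regex operation to convert the thousands separator (space or dot) as well. The regex also assumes the value is at the start of the segment: ^[^,]+? captures everything before the first comma, so Price: 103,50€ would become €Price: 103.50.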
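Here is a Python sketch of the same find/replace. The replacement is rewritten in Python's syntax: `\g<1>` instead of `$1`, with no backslashes before € and the dot, which the page's replacement escapes. `invert_currency` is a hypothetical helper. The last two test inputs below show the limitations: a thousands separator is carried over unchanged, and any text before the value ends up after the € sign.

```python
import re

# Find expression copied from this page.
FIND = r"(^[^,]+?)(,)([^€]+?)(€)"
# The page's \€$1\.$3 in Python replacement syntax.
REPLACE = r"€\g<1>.\g<3>"


def invert_currency(segment: str) -> str:
    # ^ anchors the match to the start of the segment, so only one value is converted.
    return re.sub(FIND, REPLACE, segment)


print(invert_currency("103,50€"))
```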

References

  1. See the Wikipedia article "Regular expression" (https://en.wikipedia.org/wiki/Regular_expression#Basic_concepts) for a more detailed explanation of the history of regular expressions and how they work.