From Text to Doc: a smarter approach Wordfast Classic

From Wordfast Wiki

Latest revision as of 07:55, 6 November 2017

The following macro attempts to rebuild a DOC-like document from a TXT document where all lines unconditionally end with a paragraph mark.

Text copied from the Internet or from PDF files often suffers from this problem.

Note that there is no sure-fire way of "guessing" how paragraphs should be rebuilt. The following macro uses a few methods that usually give good results, rebuilding most paragraphs correctly. But the final result must be visually checked before professional use.


Sub TextToDoc()
    Dim S As Selection, D1 As Range, D2 As Range, IsPara As Boolean, T As String
    If Windows.Count = 0 Then MsgBox "Sorry, no document open": Exit Sub
    Set S = ActiveWindow.Selection: Set D1 = S.Range: Set D2 = S.Range
    ' Turn off screen refresh for better speed (once, before the loop)
    Application.ScreenUpdating = False
    ' Collapse the selection to the start of the document
    S.End = 0
    Do While S.Start < S.StoryLength - 1
        IsPara = False
        ' Store the last character of the line in the string T
        S.MoveEndUntil vbCr: T = Trim(S.Text): T = Right(T, 1)
        ' A first attempt to determine whether this is an end of paragraph:
        ' the line ends with end-of-sentence punctuation
        If InStr(".!?", T) > 0 Then IsPara = True
        If S.End < S.StoryLength - 3 Then
            ' D1 is the first character of the next line; D2 is a character at the end of the current line
            D1.SetRange S.End + 1, S.End + 2
            If IsPara Then D2.SetRange S.End - 1, S.End Else D2.SetRange S.End - 2, S.End - 1
            ' If the last character of the line is lowercase and the first character
            ' of the next line is uppercase, assume a real paragraph.
            ' Disable this test for languages that capitalize a lot, such as German.
            If D2.Characters(1).Case = wdLowerCase And D1.Characters(1).Case = wdUpperCase Then IsPara = True
            ' If the font name or size changes from the current line to the next,
            ' also assume a new paragraph. This is very often the case with text
            ' copied from PDF; it is not relevant with TXT files.
            If S.Font.Name <> D1.Font.Name Then IsPara = True
            If S.Font.Size <> D1.Font.Size Then IsPara = True
        End If
        ' If this is not a paragraph end, join the two lines into one and move on
        If Not IsPara Then
            S.Start = S.End: S.Delete: S.InsertAfter " "
        Else
            S.InsertParagraphAfter: S.MoveStart wdParagraph, 1: S.MoveStart wdParagraph, 1
        End If
    Loop
    S.End = 0
    ' Restore screen refresh before reporting
    Application.ScreenUpdating = True
    MsgBox "Text to Doc conversion finished. Please check the document."
End Sub
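
To see how the two text-only heuristics behave before running the macro on a real document, they can be tried on plain strings. The following is only a minimal sketch, not part of the original macro: the names LooksLikeParagraphEnd, IsLowerChar, IsUpperChar and the UseCaseTest parameter are hypothetical, and the helper simplifies matters by always testing the last character of the trimmed line. It ignores the font-based test, which needs a Word document.

```vba
' Hypothetical helper (not in the original macro): applies the two
' text-only heuristics to a pair of adjacent lines given as strings.
Function LooksLikeParagraphEnd(ByVal CurLine As String, ByVal NextLine As String, _
        Optional ByVal UseCaseTest As Boolean = True) As Boolean
    Dim LastCh As String, FirstCh As String
    CurLine = Trim(CurLine): NextLine = Trim(NextLine)
    ' Assumption: an empty line on either side counts as a paragraph boundary
    If Len(CurLine) = 0 Or Len(NextLine) = 0 Then
        LooksLikeParagraphEnd = True
        Exit Function
    End If
    LastCh = Right(CurLine, 1): FirstCh = Left(NextLine, 1)
    ' Heuristic 1: the line ends with end-of-sentence punctuation
    If InStr(".!?", LastCh) > 0 Then
        LooksLikeParagraphEnd = True
        Exit Function
    End If
    ' Heuristic 2: lowercase at the end of the line, uppercase at the start of the next.
    ' Pass UseCaseTest:=False for languages that capitalize a lot, such as German.
    If UseCaseTest Then
        If IsLowerChar(LastCh) And IsUpperChar(FirstCh) Then LooksLikeParagraphEnd = True
    End If
End Function

' True if Ch is a letter that is already in lowercase
Private Function IsLowerChar(ByVal Ch As String) As Boolean
    IsLowerChar = (StrComp(LCase(Ch), Ch, vbBinaryCompare) = 0) _
        And (StrComp(UCase(Ch), Ch, vbBinaryCompare) <> 0)
End Function

' True if Ch is a letter that is already in uppercase
Private Function IsUpperChar(ByVal Ch As String) As Boolean
    IsUpperChar = (StrComp(UCase(Ch), Ch, vbBinaryCompare) = 0) _
        And (StrComp(LCase(Ch), Ch, vbBinaryCompare) <> 0)
End Function
```

Placed in a standard module, the helper can be called from the Immediate window of the VBA editor, for example ? LooksLikeParagraphEnd("a line that continues", "on the next line") for a line that should be joined, or ? LooksLikeParagraphEnd("End of a sentence.", "Next line") for a line that ends a paragraph.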
