Extract text content from a cell (in bold, italic, etc.)

I am trying to extract text content from Excel using a macro. This is my code:

Dim i As Integer, j As Integer
Dim v1 As Variant
Dim Txt As String

v1 = Range("A2:C15")
For i = 1 To UBound(v1)
    For j = 1 To UBound(v1, 2)
        Txt = Txt & v1(i, j)
    Next j
    Txt = Txt & vbCrLf
Next i

MsgBox Txt

      

But it only shows raw characters in the sense that it does not display formatting information such as bold, italic, underline, etc.

I want to extract text along with formatting information.

Example: This is sample text

Expected Output: This is sample text

Actual output: This is sample text

Can someone explain what is wrong with the code and tell me what is wrong?

+3


source to share


3 answers


Ok, let's make the algorithm from @stucharo a bit easier to extend.

Public Function getHTMLFormattedString(r As Range) As String

 isBold = False
 isItalic = False
 isUnderlined = False
 s = ""
 cCount = 0
 On Error Resume Next
 cCount = r.Characters.Count
 On Error GoTo 0

 If cCount > 0 Then

  For i = 1 To cCount

   Set c = r.Characters(i, 1)

   If isUnderlined And c.Font.Underline = xlUnderlineStyleNone Then
    isUnderlined = False
    s = s & "</u>"
   End If

   If isItalic And Not c.Font.Italic Then
    isItalic = False
    s = s & "</i>"
   End If

   If isBold And Not c.Font.Bold Then
    isBold = False
    s = s & "</b>"
   End If


   If c.Font.Bold And Not isBold Then
    isBold = True
    s = s + "<b>"
   End If

   If c.Font.Italic And Not isItalic Then
    isItalic = True
    s = s + "<i>"
   End If

   If Not (c.Font.Underline = xlUnderlineStyleNone) And Not isUnderlined Then
    isUnderlined = True
    s = s + "<u>"
   End If

   s = s & c.Text

   If i = cCount Then
    If isUnderlined Then s = s & "</u>"
    If isItalic Then s = s & "</i>"
    If isBold Then s = s & "</b>"
   End If

  Next i

 Else
  s = r.Text
  If r.Font.Bold Then s = "<b>" & s & "</b>"
  If r.Font.Italic Then s = "<i>" & s & "</i>"
  If Not (r.Font.Underline = xlUnderlineStyleNone) Then s = "<u>" & s & "</u>"
 End If

 getHTMLFormattedString = s
End Function

      

To be clear, this function only works with a range that contains one cell. But it should be easy to call this function for every cell in the larger range and concatenate the returned rows into one.



Edit with OP:

I named the function with the following code:

Sub ReplaceFormattingTags()

Dim i As Integer, j As Integer
Dim rng As Range
Dim Txt As String

Set rng = Range("A2:C15")
For i = 1 To rng.Rows.Count
    For j = 1 To rng.Columns.Count
        Txt = Txt & getHTMLFormattedString(rng(i, j)) & " "
    Next j
    Txt = Txt & vbCrLf
Next i

Debug.Print Txt

End Sub

      

+1


source


Formatting is not allowed in the messagebox without changing the system defaults, which is not a first-class approach. If you want to display formatted text in an invitation, you are probably easiest to create a custom form and format your shortcut accordingly.

For example, you can determine if a cell has bold fomatting using:

Dim isBold As Boolean
isBold = v1(i, j).Font.Bold

      

And apply this to the font of the custom shape label using:

label.Font.Bold = isBold

      

If you want to output to a text file (i.e. a file), then this cannot store formatting information. The best you could hopefully achieve is to create a markup style output where:

If isBold Then
    txt = "<b >mytext< /b>" 'Ignore the spaces
Else
    txt = "mytext"
End If

      



The property range.Font.Bold

has three return parameters:

v1(i, j).Font.Bold = True  'if the entire cell IS bold
v1(i, j).Font.Bold = False 'if the entire cell IS NOT bold
v1(i, j).Font.Bold = Null  'if the cell is PARTIALLY bold

      

The call IsNull(v1(i, j).Font.Bold)

will tell you if you have partial fomatting in the cell. Unfortunately, you must then evaluate each character in the string individually to identify bold characters. This function should detect where bold formatting is turned on or off in the line contained in the passed Range

object and add the appropriate markup tag:

Function markup(rng As Range) As String

Dim chr As Integer
Dim isCharBold As Boolean
Dim str As String
Dim tempChar As Characters

isCharBold = False
str = ""

If IsNull(rng.Font.Bold) Then
    For chr = 1 To rng.Characters.Count
        Set tempChar = rng.Characters(chr, 1)
        If isCharBold Then
            If tempChar.Font.Bold Then
                str = str + tempChar.Text
            Else
                isCharBold = False
                str = str & "</b>" & tempChar.Text
            End If
        Else
            If tempChar.Font.Bold Then
                isCharBold = True
                str = str + "<b>" & tempChar.Text
            Else
                str = str & tempChar.Text
            End If
        End If
    Next chr
Else
    str = rng.Value
End If

markup = str

End Function

      

Note that case Else

just returns the default string values. You can change this approach to work for any of the properties .Font

, eg. strikethrough, underline, italic ...

The structure in the OP assumes that you are assigning the contents of a range of cells to an array of type Variant

. This essentially leaves you with an unformatted character string at each index of the array. In this case, you cannot extract formatting from the array strings. To access a property Characters().Font.Bold

you have to work with an object Range

, so your best bet is to iterate over each cell in Range("A2:C15")

directly. This can be achieved by modifying the source code as such, so it now calls the markup function:

Sub OutputText()

Dim i As Integer, j As Integer
Dim rng As Range
Dim Txt As String

Set rng = Range("A2:C15")
For i = 1 To rng.Rows.Count
    For j = 1 To rng.Columns.Count
        Txt = Txt & markup(rng(i, j)) & " "
    Next j
    Txt = Txt & vbCrLf
Next i

Debug.Print Txt

End Sub

      

+2


source


VBA string does not support formatting. It will cleanly take a string out of the range. No formatting at all. If you want to format a string, you won't be able to see it through msgbox.

The only way to do this is to store it in a cell and then format the cell. But then it doesn't give you the output as a formatted string message.

If you plan to then place the string in a formatted cell, you will need to save the formatting somewhere, or copy it from the cell you received the text from. And then apply formatting to the cell

+1


source







All Articles