Extract text content from a cell (in bold, italic, etc.)
I am trying to extract text content from Excel using a macro. This is my code:
Dim i As Integer, j As Integer
Dim v1 As Variant
Dim Txt As String
v1 = Range("A2:C15")
For i = 1 To UBound(v1)
For j = 1 To UBound(v1, 2)
Txt = Txt & v1(i, j)
Next j
Txt = Txt & vbCrLf
Next i
MsgBox Txt
But it only shows raw characters in the sense that it does not display formatting information such as bold, italic, underline, etc.
I want to extract text along with formatting information.
Example: This is sample text
Expected Output: This is sample text
Actual output: This is sample text
Can someone explain what is wrong with the code and tell me what is wrong?
source to share
Ok, let's make the algorithm from @stucharo a bit easier to extend.
Public Function getHTMLFormattedString(r As Range) As String
isBold = False
isItalic = False
isUnderlined = False
s = ""
cCount = 0
On Error Resume Next
cCount = r.Characters.Count
On Error GoTo 0
If cCount > 0 Then
For i = 1 To cCount
Set c = r.Characters(i, 1)
If isUnderlined And c.Font.Underline = xlUnderlineStyleNone Then
isUnderlined = False
s = s & "</u>"
End If
If isItalic And Not c.Font.Italic Then
isItalic = False
s = s & "</i>"
End If
If isBold And Not c.Font.Bold Then
isBold = False
s = s & "</b>"
End If
If c.Font.Bold And Not isBold Then
isBold = True
s = s + "<b>"
End If
If c.Font.Italic And Not isItalic Then
isItalic = True
s = s + "<i>"
End If
If Not (c.Font.Underline = xlUnderlineStyleNone) And Not isUnderlined Then
isUnderlined = True
s = s + "<u>"
End If
s = s & c.Text
If i = cCount Then
If isUnderlined Then s = s & "</u>"
If isItalic Then s = s & "</i>"
If isBold Then s = s & "</b>"
End If
Next i
Else
s = r.Text
If r.Font.Bold Then s = "<b>" & s & "</b>"
If r.Font.Italic Then s = "<i>" & s & "</i>"
If Not (r.Font.Underline = xlUnderlineStyleNone) Then s = "<u>" & s & "</u>"
End If
getHTMLFormattedString = s
End Function
To be clear, this function only works with a range that contains one cell. But it should be easy to call this function for every cell in the larger range and concatenate the returned rows into one.
Edit with OP:
I named the function with the following code:
Sub ReplaceFormattingTags()
Dim i As Integer, j As Integer
Dim rng As Range
Dim Txt As String
Set rng = Range("A2:C15")
For i = 1 To rng.Rows.Count
For j = 1 To rng.Columns.Count
Txt = Txt & getHTMLFormattedString(rng(i, j)) & " "
Next j
Txt = Txt & vbCrLf
Next i
Debug.Print Txt
End Sub
source to share
Formatting is not allowed in the messagebox without changing the system defaults, which is not a first-class approach. If you want to display formatted text in an invitation, you are probably easiest to create a custom form and format your shortcut accordingly.
For example, you can determine if a cell has bold fomatting using:
Dim isBold As Boolean isBold = v1(i, j).Font.Bold
And apply this to the font of the custom shape label using:
label.Font.Bold = isBold
If you want to output to a text file (i.e. a file), then this cannot store formatting information. The best you could hopefully achieve is to create a markup style output where:
If isBold Then
txt = "<b >mytext< /b>" 'Ignore the spaces
Else
txt = "mytext"
End If
The property range.Font.Bold
has three return parameters:
v1(i, j).Font.Bold = True 'if the entire cell IS bold
v1(i, j).Font.Bold = False 'if the entire cell IS NOT bold
v1(i, j).Font.Bold = Null 'if the cell is PARTIALLY bold
The call IsNull(v1(i, j).Font.Bold)
will tell you if you have partial fomatting in the cell. Unfortunately, you must then evaluate each character in the string individually to identify bold characters. This function should detect where bold formatting is turned on or off in the line contained in the passed Range
object and add the appropriate markup tag:
Function markup(rng As Range) As String
Dim chr As Integer
Dim isCharBold As Boolean
Dim str As String
Dim tempChar As Characters
isCharBold = False
str = ""
If IsNull(rng.Font.Bold) Then
For chr = 1 To rng.Characters.Count
Set tempChar = rng.Characters(chr, 1)
If isCharBold Then
If tempChar.Font.Bold Then
str = str + tempChar.Text
Else
isCharBold = False
str = str & "</b>" & tempChar.Text
End If
Else
If tempChar.Font.Bold Then
isCharBold = True
str = str + "<b>" & tempChar.Text
Else
str = str & tempChar.Text
End If
End If
Next chr
Else
str = rng.Value
End If
markup = str
End Function
Note that case Else
just returns the default string values. You can change this approach to work for any of the properties .Font
, eg. strikethrough, underline, italic ...
The structure in the OP assumes that you are assigning the contents of a range of cells to an array of type Variant
. This essentially leaves you with an unformatted character string at each index of the array. In this case, you cannot extract formatting from the array strings. To access a property Characters().Font.Bold
you have to work with an object Range
, so your best bet is to iterate over each cell in Range("A2:C15")
directly. This can be achieved by modifying the source code as such, so it now calls the markup function:
Sub OutputText()
Dim i As Integer, j As Integer
Dim rng As Range
Dim Txt As String
Set rng = Range("A2:C15")
For i = 1 To rng.Rows.Count
For j = 1 To rng.Columns.Count
Txt = Txt & markup(rng(i, j)) & " "
Next j
Txt = Txt & vbCrLf
Next i
Debug.Print Txt
End Sub
source to share
VBA string does not support formatting. It will cleanly take a string out of the range. No formatting at all. If you want to format a string, you won't be able to see it through msgbox.
The only way to do this is to store it in a cell and then format the cell. But then it doesn't give you the output as a formatted string message.
If you plan to then place the string in a formatted cell, you will need to save the formatting somewhere, or copy it from the cell you received the text from. And then apply formatting to the cell
source to share