Is it possible to skip nodes in XML file in Excel VBA?

Question

Is it possible to skip nodes in XML file in Excel VBA?

I am completely new to all this XML and VBA. I honestly do everything based on my knowledge of oop, but the current project I am working on needs to be done in VBA.
I have searched everywhere but I cannot find and cannot think of any practical way to do what I have to do right now, so I ask all knowledge here to help if possible.

So basically I have an XML file that looks something like this:

<a name="something" >
   <b name="something">
      <c>
         <d>number1</d>
         <e>number2</e>
         <f>
            <g>number3</g>
            <h>number4</h>
         </f>
       </c>
    </b>
</a>

My problem is I need to generate XML content in an Excel sheet. I know how to do this, however what I end up with is something like:

something
something
number1 number2 number3
number4
number1
number2
number3
number4 number3 number4

I want to somehow get rid of the italicized lines.

I use recursion in subroutines because it's necessary. However, when I check the child nodes and the node value to be printed, I get all the child nodes and their node values, which themselves do not have a specific value.

I know I can skip nodes manually using baseName, but that's not what I'm looking for. If you could do your own function or something more general that could be applied to any case of this event, it would be nice.

Thanks for the help!

EDIT: (The code I have)

Sub Main()
    Dim XDoc As MSXML2.DOMDocument
    Set XDoc = New MSXML2.DOMDocument

    Set mainWorkBook = ActiveWorkbook
    mainWorkBook.Sheets("Sheet1").Clear

    Dim point As IXMLDOMSelection

    Filename = ThisWorkbook.Worksheets("Sheet1").Range("A1").Value
    XDoc.Load (Filename)

    Set point = XDoc.SelectNodes("/*")
    Call ProcessChildNodes(point(0))
End Sub


Sub PrintNodeValue(Node As IXMLDOMNode)
    If (Node.Attributes.Length = 0) Then
        Row = Row + 1
        mainWorkBook.Sheets("Sheet1").Cells(Row, 1).Value = Node.Text
    End If
End Sub


Sub PrintAttributesValue(Node As IXMLDOMNode)
    If (Node.Attributes.Length <> 0) Then
        Row = Row + 1
        For j = Node.Attributes.Length - 1 To 0 Step -1
            strng = Node.Attributes.Length
            mainWorkBook.Sheets("Sheet1").Cells(Row, strng - j).Value = Node.Attributes(j).Text
        Next
    End If
End Sub


Sub ProcessChildNodes(Node As IXMLDOMNode) 
    If (Node.HasChildNodes) Then
        For m = 0 To Node.ChildNodes.Length - 1
            If Node.ChildNodes(m).NodeType <> NODE_TEXT Then
                Call PrintNodeValue(Node.ChildNodes(m))
                Call PrintAttributesValue(Node.ChildNodes(m))
                Call ProcessChildNodes(Node.ChildNodes(m))
            End If
        Next
    Else
    End If
End Sub

+3

xml vba excel-vba xml-parsing excel

Smile Jul 27. 17 at 6:03

source to share

1 answer

Pᴇʜ · Accepted Answer · 2017-07-27T08:20:41+0000

If you understood correctly that you want to ignore nodes with

no text input
and attribute

in your example <c>

and <f>

.

The only trick I have found is to process the XML directly. So we first look for a unique parameter that only has a node <c>

and <f>

has no other node. So I found the following rules:

node must not have an attribute
node must be executed directly by another node (no text)

This means that after <c>

and <f>

there is only another tag starting with <

.

Theory

If we look at the XML node code <c>

we get with it Node.xml

, it looks like this:

<c>
    <d>number1</d>
    <e>number2</e>
    <f>
        <g>number3</g>
        <h>number4</h>
    </f>
</c>

to make handling easier, we flatten out line breaks, tabs and spaces, so we end up here:

<c><d>number1</d><e>number2</e><f><g>number3</g><h>number4</h></f></c>

Now we just need to check if the first node tag is <c>

directly followed by another tag starting with <

. So we find the first one >

and see what follows <

. If this is true, then node can be omitted since it has no attribute and does not have one.

Let the coding start

We do everything in a procedure PrintNodeValue

and start by declaring a variable and getting the raw XML of the actually processed node:

Dim xml as String
xml = Node.xml

We flatten this xml

and remove all line breaks, tabs and spaces

xml = Replace(xml, vbCrLf, vbNullString)
xml = Replace(xml, vbTab, vbNullString)
xml = Replace(xml, " ", vbNullString)

xml

Now <c><d>number1</d><e>number2</e><f><g>number3</g><h>number4</h></f></c>

We will remove the first tag

xml = Right(xml, Len(xml) - InStr(1, xml, ">"))

and see if our 2 rules apply (no attribute and another tag)

If (Node.Attributes.Length = 0) And Left(xml, 1) <> "<" Then

So we're done with ...

Sub PrintNodeValue(Node As IXMLDOMNode)
    Dim xml As String
    xml = Node.xml                                  'get raw xml
    xml = Replace(xml, vbCrLf, vbNullString)        'strip off line breaks
    xml = Replace(xml, vbTab, vbNullString)         'strip off tabs
    xml = Replace(xml, " ", vbNullString)           'strip off spaces
    xml = Right(xml, Len(xml) - InStr(1, xml, ">")) 'strip off first tag

    If (Node.Attributes.Length = 0) And Left(xml, 1) <> "<" Then 'check our 2 rules
        iRow = iRow + 1
        mainWorkBook.Sheets("Sheet1").Cells(iRow, 1).Value = Node.Text
    End If
End Sub

which produces ...

something
something
number1
number2
number3
number4

Please note that you may need to change vbCrLf

to vbCr

or vbLf

depending on which system, for example. Windows, Linux or Mac, originally an XML file (they use different line breaks). To be on the safe side, you can also delete all 3.

xml = Replace(xml, vbCrLf, vbNullString)
xml = Replace(xml, vbCr, vbNullString)
xml = Replace(xml, vbLf, vbNullString)

Is it possible to skip nodes in XML file in Excel VBA?

Theory

Let the coding start

So we're done with ...

More articles: