Is it possible to skip nodes in XML file in Excel VBA?

I am completely new to all this XML and VBA. I honestly do everything based on my knowledge of oop, but the current project I am working on needs to be done in VBA.
I have searched everywhere but I cannot find and cannot think of any practical way to do what I have to do right now, so I ask all knowledge here to help if possible.

So basically I have an XML file that looks something like this:

<a name="something" >
   <b name="something">
      <c>
         <d>number1</d>
         <e>number2</e>
         <f>
            <g>number3</g>
            <h>number4</h>
         </f>
       </c>
    </b>
</a>

      

My problem is I need to generate XML content in an Excel sheet. I know how to do this, however what I end up with is something like:

something
something
number1 number2 number3
number4
number1
number2
number3
number4 number3 number4

I want to somehow get rid of the italicized lines.

I use recursion in subroutines because it's necessary. However, when I check the child nodes and the node value to be printed, I get all the child nodes and their node values, which themselves do not have a specific value.

I know I can skip nodes manually using baseName, but that's not what I'm looking for. If you could do your own function or something more general that could be applied to any case of this event, it would be nice.

Thanks for the help!

EDIT: (The code I have)

Sub Main()
    Dim XDoc As MSXML2.DOMDocument
    Set XDoc = New MSXML2.DOMDocument

    Set mainWorkBook = ActiveWorkbook
    mainWorkBook.Sheets("Sheet1").Clear

    Dim point As IXMLDOMSelection

    Filename = ThisWorkbook.Worksheets("Sheet1").Range("A1").Value
    XDoc.Load (Filename)

    Set point = XDoc.SelectNodes("/*")
    Call ProcessChildNodes(point(0))
End Sub


Sub PrintNodeValue(Node As IXMLDOMNode)
    If (Node.Attributes.Length = 0) Then
        Row = Row + 1
        mainWorkBook.Sheets("Sheet1").Cells(Row, 1).Value = Node.Text
    End If
End Sub


Sub PrintAttributesValue(Node As IXMLDOMNode)
    If (Node.Attributes.Length <> 0) Then
        Row = Row + 1
        For j = Node.Attributes.Length - 1 To 0 Step -1
            strng = Node.Attributes.Length
            mainWorkBook.Sheets("Sheet1").Cells(Row, strng - j).Value = Node.Attributes(j).Text
        Next
    End If
End Sub


Sub ProcessChildNodes(Node As IXMLDOMNode) 
    If (Node.HasChildNodes) Then
        For m = 0 To Node.ChildNodes.Length - 1
            If Node.ChildNodes(m).NodeType <> NODE_TEXT Then
                Call PrintNodeValue(Node.ChildNodes(m))
                Call PrintAttributesValue(Node.ChildNodes(m))
                Call ProcessChildNodes(Node.ChildNodes(m))
            End If
        Next
    Else
    End If
End Sub

      

+3


source to share


1 answer


If you understood correctly that you want to ignore nodes with

  • no text input
  • and attribute

in your example <c>

and <f>

.


The only trick I have found is to process the XML directly. So we first look for a unique parameter that only has a node <c>

and <f>

has no other node. So I found the following rules:

  • node must not have an attribute
  • node must be executed directly by another node (no text)

This means that after <c>

and <f>

there is only another tag starting with <

.

Theory

  • If we look at the XML node code <c>

    we get with it Node.xml

    , it looks like this:

    <c>
        <d>number1</d>
        <e>number2</e>
        <f>
            <g>number3</g>
            <h>number4</h>
        </f>
    </c>
    
          

  • to make handling easier, we flatten out line breaks, tabs and spaces, so we end up here:

    <c><d>number1</d><e>number2</e><f><g>number3</g><h>number4</h></f></c>
    
          

  • Now we just need to check if the first node tag is <c>

    directly followed by another tag starting with <

    . So we find the first one >

    and see what follows <

    . If this is true, then node can be omitted since it has no attribute and does not have one.



Let the coding start

  • We do everything in a procedure PrintNodeValue

    and start by declaring a variable and getting the raw XML of the actually processed node:

    Dim xml as String
    xml = Node.xml
    
          

  • We flatten this xml

    and remove all line breaks, tabs and spaces

    xml = Replace(xml, vbCrLf, vbNullString)
    xml = Replace(xml, vbTab, vbNullString)
    xml = Replace(xml, " ", vbNullString)
    
          

    xml

    Now <c><d>number1</d><e>number2</e><f><g>number3</g><h>number4</h></f></c>

  • We will remove the first tag

    xml = Right(xml, Len(xml) - InStr(1, xml, ">"))
    
          

    and see if our 2 rules apply (no attribute and another tag)

    If (Node.Attributes.Length = 0) And Left(xml, 1) <> "<" Then
    
          

So we're done with ...

Sub PrintNodeValue(Node As IXMLDOMNode)
    Dim xml As String
    xml = Node.xml                                  'get raw xml
    xml = Replace(xml, vbCrLf, vbNullString)        'strip off line breaks
    xml = Replace(xml, vbTab, vbNullString)         'strip off tabs
    xml = Replace(xml, " ", vbNullString)           'strip off spaces
    xml = Right(xml, Len(xml) - InStr(1, xml, ">")) 'strip off first tag

    If (Node.Attributes.Length = 0) And Left(xml, 1) <> "<" Then 'check our 2 rules
        iRow = iRow + 1
        mainWorkBook.Sheets("Sheet1").Cells(iRow, 1).Value = Node.Text
    End If
End Sub

      

which produces ...

something
something
number1
number2
number3
number4

Please note that you may need to change vbCrLf

to vbCr

or vbLf

depending on which system, for example. Windows, Linux or Mac, originally an XML file (they use different line breaks). To be on the safe side, you can also delete all 3.

xml = Replace(xml, vbCrLf, vbNullString)
xml = Replace(xml, vbCr, vbNullString)
xml = Replace(xml, vbLf, vbNullString)

      

+1


source







All Articles