Is it possible to skip nodes in XML file in Excel VBA?
I am completely new to all this XML and VBA. I honestly do everything based on my knowledge of oop, but the current project I am working on needs to be done in VBA.
I have searched everywhere but I cannot find and cannot think of any practical way to do what I have to do right now, so I ask all knowledge here to help if possible.
So basically I have an XML file that looks something like this:
<a name="something" >
<b name="something">
<c>
<d>number1</d>
<e>number2</e>
<f>
<g>number3</g>
<h>number4</h>
</f>
</c>
</b>
</a>
My problem is I need to generate XML content in an Excel sheet. I know how to do this, however what I end up with is something like:
something
something
number1 number2 number3
number4
number1
number2
number3
number4 number3 number4
I want to somehow get rid of the italicized lines.
I use recursion in subroutines because it's necessary. However, when I check the child nodes and the node value to be printed, I get all the child nodes and their node values, which themselves do not have a specific value.
I know I can skip nodes manually using baseName, but that's not what I'm looking for. If you could do your own function or something more general that could be applied to any case of this event, it would be nice.
Thanks for the help!
EDIT: (The code I have)
Sub Main() Dim XDoc As MSXML2.DOMDocument Set XDoc = New MSXML2.DOMDocument Set mainWorkBook = ActiveWorkbook mainWorkBook.Sheets("Sheet1").Clear Dim point As IXMLDOMSelection Filename = ThisWorkbook.Worksheets("Sheet1").Range("A1").Value XDoc.Load (Filename) Set point = XDoc.SelectNodes("/*") Call ProcessChildNodes(point(0)) End Sub Sub PrintNodeValue(Node As IXMLDOMNode) If (Node.Attributes.Length = 0) Then Row = Row + 1 mainWorkBook.Sheets("Sheet1").Cells(Row, 1).Value = Node.Text End If End Sub Sub PrintAttributesValue(Node As IXMLDOMNode) If (Node.Attributes.Length <> 0) Then Row = Row + 1 For j = Node.Attributes.Length - 1 To 0 Step -1 strng = Node.Attributes.Length mainWorkBook.Sheets("Sheet1").Cells(Row, strng - j).Value = Node.Attributes(j).Text Next End If End Sub Sub ProcessChildNodes(Node As IXMLDOMNode) If (Node.HasChildNodes) Then For m = 0 To Node.ChildNodes.Length - 1 If Node.ChildNodes(m).NodeType <> NODE_TEXT Then Call PrintNodeValue(Node.ChildNodes(m)) Call PrintAttributesValue(Node.ChildNodes(m)) Call ProcessChildNodes(Node.ChildNodes(m)) End If Next Else End If End Sub
source to share
If you understood correctly that you want to ignore nodes with
- no text input
- and attribute
in your example <c>
and <f>
.
The only trick I have found is to process the XML directly. So we first look for a unique parameter that only has a node <c>
and <f>
has no other node. So I found the following rules:
- node must not have an attribute
- node must be executed directly by another node (no text)
This means that after <c>
and <f>
there is only another tag starting with <
.
Theory
-
If we look at the XML node code
<c>
we get with itNode.xml
, it looks like this:<c> <d>number1</d> <e>number2</e> <f> <g>number3</g> <h>number4</h> </f> </c>
-
to make handling easier, we flatten out line breaks, tabs and spaces, so we end up here:
<c><d>number1</d><e>number2</e><f><g>number3</g><h>number4</h></f></c>
-
Now we just need to check if the first node tag is
<c>
directly followed by another tag starting with<
. So we find the first one>
and see what follows<
. If this is true, then node can be omitted since it has no attribute and does not have one.
Let the coding start
-
We do everything in a procedure
PrintNodeValue
and start by declaring a variable and getting the raw XML of the actually processed node:Dim xml as String xml = Node.xml
-
We flatten this
xml
and remove all line breaks, tabs and spacesxml = Replace(xml, vbCrLf, vbNullString) xml = Replace(xml, vbTab, vbNullString) xml = Replace(xml, " ", vbNullString)
xml
Now<c><d>number1</d><e>number2</e><f><g>number3</g><h>number4</h></f></c>
-
We will remove the first tag
xml = Right(xml, Len(xml) - InStr(1, xml, ">"))
and see if our 2 rules apply (no attribute and another tag)
If (Node.Attributes.Length = 0) And Left(xml, 1) <> "<" Then
So we're done with ...
Sub PrintNodeValue(Node As IXMLDOMNode) Dim xml As String xml = Node.xml 'get raw xml xml = Replace(xml, vbCrLf, vbNullString) 'strip off line breaks xml = Replace(xml, vbTab, vbNullString) 'strip off tabs xml = Replace(xml, " ", vbNullString) 'strip off spaces xml = Right(xml, Len(xml) - InStr(1, xml, ">")) 'strip off first tag If (Node.Attributes.Length = 0) And Left(xml, 1) <> "<" Then 'check our 2 rules iRow = iRow + 1 mainWorkBook.Sheets("Sheet1").Cells(iRow, 1).Value = Node.Text End If End Sub
which produces ...
something
something
number1
number2
number3
number4
Please note that you may need to change vbCrLf
to vbCr
or vbLf
depending on which system, for example. Windows, Linux or Mac, originally an XML file (they use different line breaks). To be on the safe side, you can also delete all 3.
xml = Replace(xml, vbCrLf, vbNullString) xml = Replace(xml, vbCr, vbNullString) xml = Replace(xml, vbLf, vbNullString)
source to share