Talend tXMLMap gets all shards of an element without loop
I am looking into Talend and I need XML to CSV mapping which I cannot do right.
I have a document like this (really simplified):
<Report>
<Payment>
<Reference Type="1" Value="wathever11"/>
<Reference Type="2" Value="wathever12"/>
<Amount value="222"/>
</Payment>
<Payment>
<Reference Type="1" Value="wathever22"/>
<Reference Type="3" Value="wathever23"/>
<Amount value="123242"/>
</Payment>
<Payment>
<Reference Type="3" Value="wathever33"/>
<Reference Type="2" Value="wathever32"/>
<Amount value="12976"/>
</Payment>
<Payment>
<Reference Type="1" Value="wathever41"/>
<Reference Type="4" Value="wathever44"/>
<Amount value="1456"/>
</Payment>
<Payment>
<Amount value="12976"/>
</Payment>
</Report>
So I can't have any of the multiple links that can contain up to 4 different types.
I need to convert this to CSV like this:
ReferenceType1, ReferenceType2, ReferenceType3, ReferenceType4, Amount
whatever11 , whatever12 , , , 222
, whatever22 , whatever23 , , 123242
, whatever32 , whatever33 , , 12976
whatever41 , , , whatever44 , 1456
, , , , 12976
To achieve this, I created this simple structure:
In XMLFileInput, this is the config:
Where payment is a document
In tXMLMap ive set the document structure like this:
Report
βββPayment (loop) First case, see below
βββ Reference (loop) Second case, see below
| βββ @Type
| βββ @Value
βββ Amount
βββ @Value
In the output table, I create a column for each link with an expression like this:
[row1.Payment:/Payment/Reference/@Type].equals("1")? [row1.Payment:/Payment/Reference/@Value]: ""
But I cannot match all links in one line. If I set the cycle at the pay level, I get this:
ReferenceType1, ReferenceType2, ReferenceType3, ReferenceType4, Amount
whatever11 , , , , 222
, , , , 123242
, , , , 12976
whatever41 , , , , 1456
, , , , 12976
If the loop is set to basic, I get a new line for each link, like this:
ReferenceType1, ReferenceType2, ReferenceType3, ReferenceType4, Amount
whatever11 , , , , 222
, whatever12 , , , 222
, whatever22 , , , 123242
, , whatever23 , , 123242
, , whatever33 , , 12976
, whatever32 , , , 12976
whatever41 , , , , 1456
, , , whatever44 , 1456
And there is no line for an unreferenced element.
There must be a way to achieve the expected result. I tried to create an intermediate step by mapping each reference to another table along with a sequential number to further combine the columns into one row.
Although I am not finished, I think this might be the way to go. But the link is not the only element I need it for. I have a lot of elements where this happens in real xml and the number of intermediate tables as it would be would be large.
So I prefer to ask experts to try to find an easier way to do this.
Any ideas?
source to share
Good news: this is possible with tExtractXMLField. Check out my suggested solution:
My output was based on your input XML file, but it is different from the expected output, please double check carefully:
.-------------------+-------------------+-------------------+-------------------+------.
| tLogRow_1 |
|=------------------+-------------------+-------------------+-------------------+-----=|
|referenceTypeValue1|referenceTypeValue2|referenceTypeValue3|referenceTypeValue4|amount|
|=------------------+-------------------+-------------------+-------------------+-----=|
|wathever11 |wathever12 |null |null |222 |
|wathever22 |null |wathever23 |null |123242|
|null |wathever32 |wathever33 |null |12976 |
|wathever41 |null |null |wathever44 |1456 |
|null |null |null |null |12976 |
'-------------------+-------------------+-------------------+-------------------+------'
Bad news: This is not possible with tXMLMap in Talend 5.3.1 for me. I tried to put this in the expression field in tXMLMap: [row2.doc: / Report / Payment / Reference [@Type = 1] / @ Value] But then my work won't be generated anymore due to extra brackets
source to share