XML parsing in VB.Net fails due to special character

I have VB.Net code that parses an XML string.

The XML string comes from a third-party TCP stream, so we have to take the data as we receive it and process it. The problem is that one of the elements can sometimes contain special characters, for example &, $ or <, and when XMLDoc.LoadXml(XML) is executed it fails. Note that XMLDoc is declared as Dim XMLDoc As XmlDocument = New XmlDocument().

I have tried searching Google for answers, but I am really struggling to find a solution. I had a look at Regex, but realized that it has some limitations; or I just don't understand it well enough.

If it helps, here is an example of the XML that would be sent to us (for information, the Message tag comes from an SMS). The only part that will ever contain an error, and all I have to check, is the Message element, e.g. <Message>O&N</Message>; in this case the message arrived containing &.

<IncomingMessage><DeviceSendTime>19/02/2013 14:00:50</DeviceSendTime>
 <Sender>0000111111</Sender>
 <Status>New</Status>
 <Transport>Sms</Transport>
 <Id>-1</Id>
 <Message>O&N</Message>
 <Timestamp>19/02/2013 14:00:50</Timestamp>
 <ReadTimestamp>19/02/2013 14:00:50</ReadTimestamp>
</IncomingMessage>

      



2 answers


If we look specifically inside Message elements, and assume that there are no nested elements inside the Message element:

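Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions
Imports System.Xml.Linq

Dim url = "put url here"
Dim s As String

' XML-special characters and the entities that replace them
Dim characterMappings = New Dictionary(Of String, String) From {
    {"&", "&amp;"},
    {"<", "&lt;"},
    {">", "&gt;"},
    {"""", "&quot;"}
}

Using client As New WebClient
    s = client.DownloadString(url)
End Using

' Match only the text between the Message tags, then escape every
' special character inside it; the tags themselves stay untouched.
s = Regex.Replace(s,
    "(?<=<Message>).*?(?=</Message>)",
    Function(message As Match) Regex.Replace(message.Value,
        String.Join("|", characterMappings.Keys),
        Function(m As Match) characterMappings(m.Value)),
    RegexOptions.Singleline
)
Dim x = XDocument.Parse(s)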

      

$ shouldn't be a problem in XML, but if it is, you can add it to the dictionary.

The use of WebClient comes from here.



Update

Since $ has special meaning in regular expressions, it cannot simply be added to the dictionary; it must be escaped with \ in the regular expression pattern. The easiest way to do this is to write the pattern by hand, instead of building it from the dictionary keys:

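' Assumes characterMappings also has an entry for "$"
s = Regex.Replace(s,
    "(?<=<Message>).*?(?=</Message>)",
    Function(message As Match) Regex.Replace(message.Value,
        "&|<|>|\$",
        Function(m As Match) characterMappings(m.Value)),
    RegexOptions.Singleline
)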

      

Also, I recommend Expresso for working with regular expressions.
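As a rough sketch of how the escaping step might be wrapped up and fed into the XmlDocument.LoadXml call from the question, the module below escapes only the text between the Message tags. The names MessageSanitizer, EscapeChar and EscapeMessageBody are hypothetical, and the sketch assumes that Message never contains nested elements and that an & which already starts an entity (such as &amp;) should be left alone.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports System.Xml

Module MessageSanitizer
    ' Text between <Message> and </Message> (assumes no nested elements).
    Private ReadOnly MessageBody As New Regex(
        "(?<=<Message>).*?(?=</Message>)", RegexOptions.Singleline)

    ' "<", ">", or an "&" that does not already begin an entity such as &amp; or &#38;.
    Private ReadOnly Unescaped As New Regex(
        "&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)|<|>")

    ' Maps one matched character to its XML entity.
    Function EscapeChar(m As Match) As String
        Select Case m.Value
            Case "&" : Return "&amp;"
            Case "<" : Return "&lt;"
            Case Else : Return "&gt;"
        End Select
    End Function

    ' Hypothetical helper: escapes special characters inside Message only.
    Function EscapeMessageBody(xml As String) As String
        Return MessageBody.Replace(xml,
            Function(body As Match) Unescaped.Replace(body.Value, AddressOf EscapeChar))
    End Function

    Sub Main()
        Dim raw = "<IncomingMessage><Sender>0000111111</Sender><Message>O&N</Message></IncomingMessage>"
        Dim XMLDoc As XmlDocument = New XmlDocument()
        XMLDoc.LoadXml(EscapeMessageBody(raw))
        ' Prints O&N
        Console.WriteLine(XMLDoc.SelectSingleNode("/IncomingMessage/Message").InnerText)
    End Sub
End Module
```

Skipping an & that already starts an entity is a design choice in this sketch: it avoids turning &amp; into &amp;amp; if the sender escapes some messages correctly, at the cost of misreading a literal SMS text such as "&amp;".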



Your XML is invalid and hence not XML. Either fix the code that generates the XML (the correct approach), or treat it as a text file and accept all the trouble of parsing unstructured text.



As you noted in the question, <Message>O&N</Message> is not valid XML. The most likely cause of this "XML" is that string concatenation was used to construct it instead of proper XML manipulation techniques. Unless you are using some obscure language, practically every language has built-in support or a library for creating XML, so creating the XML correctly should not be difficult.
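For illustration, here is a minimal sketch of what building the message with an XML API could look like on the sending side, using LINQ to XML. The module name BuildMessage and the function BuildIncomingMessage are hypothetical, and only two of the elements from the question are included; the point is that XElement escapes the text content itself.

```vb
Imports System
Imports System.Xml.Linq

Module BuildMessage
    ' Hypothetical sender-side helper: builds the element tree
    ' instead of concatenating strings, so text is escaped automatically.
    Function BuildIncomingMessage(sender As String, text As String) As String
        Dim doc = New XElement("IncomingMessage",
            New XElement("Sender", sender),
            New XElement("Message", text))
        Return doc.ToString(SaveOptions.DisableFormatting)
    End Function

    Sub Main()
        ' Prints <IncomingMessage><Sender>0000111111</Sender><Message>O&amp;N</Message></IncomingMessage>
        Console.WriteLine(BuildIncomingMessage("0000111111", "O&N"))
    End Sub
End Module
```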
