Bullet points are not replaced - CSV to XML

I am reading a CSV file and converting it to XML. The problem is bullet points, hyphens, etc. I am trying to replace "•" along with other characters that are not considered valid. When XML is generated, the bullet point is a square, in fact anything that is not recognized is a square. When I copy the "square" from the generated XML, all the "special" characters are "visible" as a black diamond with a question mark inside. It is represented as "" in the XML output. I tried:

int i = (int)'•';
Console.WriteLine(i);

      

and I see the value 8226. So I tried to replace \u8226 with the "HTML for bullet" so that it displays correctly, but it doesn't work.

I read the original CSV this way:

string[] csvfile = File.ReadAllLines(inputFile).Skip(1).ToArray();

      

The files I'm reading won't be huge, so I'm reading into an array.

Then I split on "," to get the columns to convert to XML elements. If I open the file in Excel and do the replacements manually there, there is no problem and I get the expected XML output. I would like to do this programmatically. I have no problem replacing regular text inside an XML element, like this:

new XElement("elementName", columns[14].ToLower().Replace("yes", "1"))

      

If I try:

new XElement("elementName", columns[14].ToLower().Replace("•", "htmlReplacement"))

      

Nothing changes.

Any insight would be great!

Here is the code I'm using:

// regex patterns above to replace below - this works

        string inputFile = @"pathTo.csv";

        string[] csvfile = File.ReadAllLines(inputFile).Skip(1).ToArray();

        XNamespace xsi = XNamespace.Get("http://www.w3.org/2001/XMLSchema-instance");
        XNamespace xsiNsl = XNamespace.Get("something.xsd");

        XElement jobs = new XElement("Root",
            new XAttribute(XNamespace.Xmlns + "xsi", xsi.NamespaceName),
            new XAttribute(xsi + "noNamespaceSchemaLocation", xsiNsl),

            from line in csvfile
            //let columns = line.Replace(", ", ", ").Replace(",0", ",0").Split(',')

            let columns = Regex.Replace(Regex.Replace(Regex.Replace(Regex.Replace(line, dPat, rdPat), dPat2, rdPat2), dPat3, rdPat3), dPat4, rdPat4).Split(',')

            select new XElement("item",
                new XElement("column1", columns[0]),
                new XElement("Column2", columns[1]),
                new XElement("Column3", new XCData(columns[2].Replace("–", "-").Replace("•", "•").Replace("®", "®").Replace("©", "©"))),
                new XElement("Column4", new XCData(columns[3].Replace("–", "-").Replace("•", "•").Replace("®", "®").Replace("©", "©"))),
                new XElement("Column5", new XCData(columns[4].Replace("–", "-").Replace("\x0095", "• ").Replace("®", "®").Replace("©", "©").Replace("\n\n", "").Replace("\"", ""))),
                new XElement("column6", columns[5]),
                new XElement("column7", columns[6].Replace("/", "-")),
                new XElement("column8", columns[7]),
                new XElement("column9", columns[8].Replace("$", "").Replace(" ", "").Replace(".00", "")),
                new XElement("column10", columns[9]),
                new XElement("column11", columns[10].Replace("/", "-")),
                new XElement("column12", columns[11].Replace("/", "-")),
                new XElement("column13", columns[12].ToLower().Replace("yes", "1").Replace("no", "0")),
                new XElement("column14", columns[13].ToLower().Replace("yes", "1").Replace("no", "0")),
                new XElement("column15", columns[14].ToLower().Replace("yes", "1").Replace("no", "0")),
                new XElement("column16", columns[15].ToLower().Replace("yes", "1").Replace("no", "0")),
                new XElement("column17", columns[16].ToLower().Replace("yes", "1").Replace(" ", "0")),
                new XElement("column18", columns[17]),
                new XElement("column19", columns[18]),
                new XElement("column20", columns[19])));

        jobs.Save(@"outputPathFor.xml");

      

The generated XML is as expected, except that the unrecognized characters are not replaced. I also tried using hex escapes, but that didn't replace them either.

Thanks!

1 answer


You might want a more general way of escaping Unicode characters from the input (instead of repeated calls to string.Replace), like below:

// Requires: using System.Collections.Generic; using System.Text;
public static IEnumerable<string> UnicodeXmlEscape(IEnumerable<string> input)
{
    var sb = new StringBuilder();
    foreach (var line in input)
    {
        // Loop through each character in the line to see if it
        // needs escaping.
        for (int i = 0; i < line.Length; i++)
        {
            if (char.IsSurrogatePair(line, i))
                // Escape as an 8-digit hex character reference, e.g. "&#x0001abcd;"
                sb.AppendFormat(@"&#x{0:x8};", char.ConvertToUtf32(line, i++)); // i++ to skip next one.
            else
            {
                int ci = char.ConvertToUtf32(line, i);
                if (ci > 127)
                    // Escape as a 4-digit hex character reference, e.g. "&#xab12;"
                    sb.AppendFormat(@"&#x{0:x4};", ci);
                else // regular ASCII
                    sb.Append(line[i]);
            }
        }
        yield return sb.ToString();
        sb.Clear();
    }
}

So this:

var escaped = UnicodeXmlEscape(new [] { 
    @"I'm trying to replace • along with other characters that are not being" 
});
foreach (var line in escaped)
    Console.WriteLine(line);

Will output the result below:

I'm trying to replace &#x2022; along with other characters that are not being
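
The 2022 in the escaped output is hexadecimal, while the 8226 printed in the question is the same code point in decimal. That matters because the C# \u escape takes hex digits, so \u8226 names a different character than the bullet. A minimal sketch of the arithmetic follows; the class name BulletCodePoint and the helper ToHex are made up for illustration and are not part of the original code.

```csharp
using System;

public static class BulletCodePoint
{
    // Hypothetical helper: formats a code point as 4 lowercase hex digits.
    public static string ToHex(int codePoint)
    {
        return codePoint.ToString("x4");
    }

    public static void Main()
    {
        int bullet = '\u2022';                    // U+2022 BULLET
        Console.WriteLine(bullet);                // 8226 (decimal)
        Console.WriteLine(ToHex(bullet));         // 2022 (hexadecimal)

        // '\u8226' is code point 0x8226, not decimal 8226, so it is
        // not the bullet character.
        Console.WriteLine('\u8226' == '\u2022');  // False
    }
}
```

So if you do keep a string.Replace call for the bullet, the character to look for is "\u2022".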

      



Note that some Unicode characters are not legal in XML (http://www.w3.org/TR/unicode-xml/). The code above does not check for them.
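
If you want to guard against those characters as well, one option is to drop them before escaping. The sketch below is only an illustration, assuming that silently removing illegal characters is acceptable for your data: XmlCharFilter and StripIllegalXmlChars are hypothetical names, and the check relies on XmlConvert.IsXmlChar from System.Xml (available since .NET Framework 4.0).

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlCharFilter
{
    // Hypothetical helper (not from the answer's code): removes every
    // character that XML 1.0 does not allow and keeps everything else.
    public static string StripIllegalXmlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (char.IsSurrogatePair(input, i))
            {
                // A well-formed pair encodes a code point at or above
                // U+10000, which XML 1.0 allows; copy both halves.
                sb.Append(c).Append(input[++i]);
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            // Anything else (most control characters, lone surrogates,
            // U+FFFE and U+FFFF) is dropped.
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The bell character (U+0007) is removed; the bullet is kept.
        Console.WriteLine(StripIllegalXmlChars("bullet \u2022 and bell \u0007 removed"));
    }
}
```

You could run each line through this filter first and pass the result to UnicodeXmlEscape. Whether dropping characters is better than replacing them with a placeholder depends on what the consumer of the XML expects.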

How to use this in your code

In your code, you can use it like this to XML-escape every line read from the input file:

var csvfile = UnicodeXmlEscape(File.ReadLines(inputFile).Skip(1)).ToArray();

      

This gives you correctly escaped strings that you can then split into columns, with no need to call string.Replace later.
