Is there a better way to replace non-ASCII characters in C#?

I have C# code that strips non-ASCII characters from an input text file and writes the result to a .NonAsciiChars text file. Because the incoming file is in XML format and its line endings can be either LF only or CRLF, I am not doing the replacement line by line (I am using StreamReader.ReadToEnd()).

The problem is that the incoming file is huge (about 2 GB), and I am getting the error below. Is there a better way to remove non-ASCII characters in my case? Incoming files will also reach about 4 GB, and I'm afraid the read itself will then throw an OutOfMemoryException as well.

Many thanks.

DateTime:2014-08-04 12:55:26,035 Thread ID:[1] Log Level:ERROR Logger Property:OS_fileParser.Program property:[(null)] - Message:System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
   at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
   at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
   at System.IO.StreamReader.ReadToEnd()
   at OS_fileParser.MyProgram.FormatXmlFile(String inFile) in D:\Test\myProgram.cs:line 530
   at OS_fileParser.MyProgram.Run() in D:\Test\myProgram.cs:line 336

      

myProgram.cs line 530: content = Regex.Replace(content, pattern, "");

myProgram.cs line 336: this is the call to the following method:

                const string pattern = @"[^\x20-\x7E]";

                string content;
                using (var reader = new StreamReader(inFile))
                {
                    content = reader.ReadToEnd();
                    reader.Close();
                }

                content = Regex.Replace(content, pattern, "");

                using (var writer = new StreamWriter(inFile + ".NonAsciiChars"))
                {
                    writer.Write(content);
                    writer.Close();
                }

                using (var myXmlReader = XmlReader.Create(inFile + ".NonAsciiChars", myXmlReaderSettings))
                {
                    try
                    {
                        while (myXmlReader.Read())
                        {
                        }
                    }
                    catch (XmlException ex)
                    {
                        Logger.Error("Validation error: " + ex);
                    }
                }

      



1 answer


You are getting an OutOfMemoryException because ReadToEnd() loads the entire file into a single string. To save memory, process the file in chunks instead: either line by line, or through a fixed-size buffer (reading one byte at a time is slow).

In the simplest case it looks like this:



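string line;
using (var reader = new StreamReader(inFile))
using (var writer = new StreamWriter(inFile + ".NonAsciiChars"))
{
    while ((line = reader.ReadLine()) != null)
    {
        // Process the line, e.g. with the pattern from the question.
        line = Regex.Replace(line, pattern, "");
        // No line terminator is written: the pattern removes \r and \n anyway.
        writer.Write(line);
    }
}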
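
If the XML happens to arrive as one very long line, ReadLine() can still pull most of the file into memory. A buffer-based variant avoids that. The following is only a minimal sketch, assuming the goal is exactly what the pattern [^\x20-\x7E] does (keep printable ASCII, drop everything else, including tabs and line breaks). The class name AsciiFilter, the method names, and the 64 KB buffer size are made up for illustration.

```csharp
using System;
using System.IO;

public static class AsciiFilter
{
    // Copies input to output, keeping only chars in the range ' ' (0x20) to '~' (0x7E).
    // The filter looks at one char at a time, so chunk boundaries cannot change the result.
    public static void StripNonPrintableAscii(TextReader input, TextWriter output, int bufferSize = 64 * 1024)
    {
        var buffer = new char[bufferSize];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (c >= ' ' && c <= '~')
                {
                    buffer[kept++] = c; // compact the kept chars to the front of the buffer
                }
            }
            output.Write(buffer, 0, kept);
        }
    }

    // Hypothetical convenience wrapper using the file naming from the question.
    public static void StripFile(string inFile)
    {
        using (var reader = new StreamReader(inFile))
        using (var writer = new StreamWriter(inFile + ".NonAsciiChars"))
        {
            StripNonPrintableAscii(reader, writer);
        }
    }
}
```

Calling AsciiFilter.StripFile(inFile) would replace the ReadToEnd / Regex.Replace / Write sequence from the question, and the XmlReader validation can then run on the .NonAsciiChars file as before. Memory use stays at roughly one buffer regardless of file size.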

      
