What's the most efficient way to implement ReadLine() for a binary stream?

Please feel free to correct me if I am wrong at any point ...

I am trying to read a CSV (comma-separated values) file using the .NET I/O classes. The problem is that some fields in this file may contain soft carriage returns (i.e. a lone \r or \n, not the standard \r\n used in text files for line termination). In standard text mode, StreamReader does not honor this convention: it treats soft carriage returns the same as hard ones, which compromises the integrity of the CSV file.

Using the BinaryReader class seems to be the only option, but BinaryReader does not have a ReadLine() function, so I need to implement ReadLine() myself.

My current approach reads one character at a time from the stream and appends it to a StringBuilder until \r\n is encountered (treating every other character, including a lone \r or \n, as ordinary data), and then returns the contents of the StringBuilder (using ToString()).

But I'm wondering: is this the most efficient way to implement ReadLine()? Please enlighten me.

+1


7 replies


Probably. In terms of complexity, it visits each char once, so it is O(n) (where n is the length of the stream), so that is not a problem. For reading one character at a time, BinaryReader is a good choice.

What I would do is make a class:

using System;
using System.IO;
using System.Text;

public class LineReader : IDisposable
{
    private readonly BinaryReader reader;

    public LineReader(Stream stream) { reader = new BinaryReader(stream); }

    public string ReadLine()
    {
        StringBuilder result = new StringBuilder();
        // An EndOfStreamException here propagates to the caller.
        char lastChar = reader.ReadChar();

        try
        {
            // Keep reading until a "\r\n" pair is seen; a lone '\r' or '\n'
            // is kept as part of the line.
            while (true)
            {
                char newChar = reader.ReadChar();
                if (lastChar == '\r' && newChar == '\n')
                    return result.ToString();

                result.Append(lastChar);
                lastChar = newChar;
            }
        }
        catch (EndOfStreamException)
        {
            // The stream ended without a final "\r\n": return what was read.
            result.Append(lastChar);
            return result.ToString();
        }
    }

    public void Dispose()
    {
        reader.Close();
    }
}

      



Or something like that.

(WARNING: the code has not been tested and is provided AS IS without warranty of any kind, either expressed or implied. If this software proves to be defective or destroys the planet, you assume the cost of all necessary maintenance, repair or rectification.)

+6


You may want to use an ODBC/OleDB connection for this. If you point the OleDB connection's data source at the directory containing the CSV files, you can query it as if each CSV file were a table.
See connectionstrings.com (http://www.connectionstrings.com/?carrier=textfile) for the correct connection string.



+1


Here's an extension method for the BinaryReader class:

using System.IO;
using System.Text;

public static class BinaryReaderExtension
{
    public static string ReadLine(this BinaryReader reader)
    {
        if (reader.IsEndOfStream())
            return null;

        StringBuilder result = new StringBuilder();
        char character;
        // Note: a line ends at any '\n' and every '\r' is dropped, so a bare
        // '\n' inside a field still ends the line.
        while (!reader.IsEndOfStream() && (character = reader.ReadChar()) != '\n')
            if (character != '\r')
                result.Append(character);

        return result.ToString();
    }

    public static bool IsEndOfStream(this BinaryReader reader)
    {
        return reader.BaseStream.Position == reader.BaseStream.Length; 
    }
}

      

I haven't tested in all conditions, but this code worked for me.

+1


How about simply preprocessing the file?

Replace the soft carriage returns with something unique.

Incidentally, putting line breaks inside the data of a CSV file is bad design.
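A minimal sketch of that preprocessing step, assuming the whole file fits in memory as a string. The class name, the method names, and the choice of U+001F as the placeholder are hypothetical and only for illustration; any marker that cannot occur in the data would do.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SoftBreakPreprocessor
{
    // Hypothetical placeholder: pick any marker that cannot occur in the data.
    public const string Placeholder = "\u001F";

    // Matches a '\r' not followed by '\n', or a '\n' not preceded by '\r'.
    private static readonly Regex SoftBreak =
        new Regex(@"\r(?!\n)|(?<!\r)\n", RegexOptions.Compiled);

    // Replaces every lone '\r' or '\n' and leaves "\r\n" pairs untouched.
    public static string MaskSoftBreaks(string text)
    {
        return SoftBreak.Replace(text, Placeholder);
    }

    // Puts a line break back into a field after the CSV has been parsed.
    public static string RestoreSoftBreaks(string field, string replacement = "\n")
    {
        return field.Replace(Placeholder, replacement);
    }
}
```

After masking, the only line breaks left in the text are \r\n pairs, so an ordinary StringReader or StreamReader ReadLine loop splits it only at hard line ends.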

0


You can read a larger chunk at a time, decode it to a string using Encoding.GetString, and then split it into lines using string.Split(new[] { "\r\n" }, StringSplitOptions.None), or pick off the first line using string.Substring(0, string.IndexOf("\r\n")) and leave the rest for processing the next line. Remember to prepend the incomplete last line from your previous read to the chunk you read next.
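The chunked approach could be sketched roughly as follows. This is an illustration under stated assumptions, not a tuned implementation: the class and method names are hypothetical, and it uses a Decoder instead of Encoding.GetString so that a multi-byte character split across two chunks is still decoded correctly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkedLineReader
{
    // Yields lines terminated by "\r\n" only; a lone '\r' or '\n' stays in the line.
    public static IEnumerable<string> ReadCrLfLines(Stream stream, Encoding encoding, int chunkSize = 4096)
    {
        Decoder decoder = encoding.GetDecoder();   // keeps partial characters between chunks
        byte[] bytes = new byte[chunkSize];
        char[] chars = new char[encoding.GetMaxCharCount(chunkSize)];
        StringBuilder pending = new StringBuilder(); // incomplete line carried over

        int bytesRead;
        while ((bytesRead = stream.Read(bytes, 0, bytes.Length)) > 0)
        {
            int charCount = decoder.GetChars(bytes, 0, bytesRead, chars, 0);
            pending.Append(chars, 0, charCount);

            string text = pending.ToString();
            int start = 0;
            int end;
            while ((end = text.IndexOf("\r\n", start, StringComparison.Ordinal)) >= 0)
            {
                yield return text.Substring(start, end - start);
                start = end + 2; // skip the "\r\n" pair
            }

            // Whatever follows the last "\r\n" is prepended to the next chunk.
            pending.Clear();
            pending.Append(text, start, text.Length - start);
        }

        if (pending.Length > 0)
            yield return pending.ToString(); // final line without a terminator
    }
}
```

Because the leftover text is carried into the next pass, a \r\n pair that straddles a chunk boundary is still recognized. Rebuilding the pending string on every chunk keeps the sketch short, but it gets slow for lines that span many chunks.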

0


Your approach sounds fine. One way to make your method more efficient might be to build each line in a regular string (i.e. not in a StringBuilder) and then append the entire line to your StringBuilder. See this article for further explanation: StringBuilder is not automatically the best choice here.

It probably won't matter much.

0


Here's a faster alternative with encoding support. It extends BinaryReader, so you can use it both for reading binary chunks and for StreamReader-like ReadLine calls directly on a binary stream.

using System.Diagnostics;
using System.IO;
using System.Text;

public class LineReader : BinaryReader
{
    private Encoding _encoding;
    private Decoder _decoder;

    const int bufferSize = 1024;
    private char[] _LineBuffer = new char[bufferSize];

    public LineReader(Stream stream, int bufferSize, Encoding encoding)
        : base(stream, encoding)
    {
        this._encoding = encoding;
        this._decoder = encoding.GetDecoder();
    }

    public string ReadLine()
    {
        int pos = 0;

        char[] buf = new char[2];

        StringBuilder stringBuffer = null;
        bool lineEndFound = false;

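        int count;
        while ((count = base.Read(buf, 0, 2)) > 0)
        {
            if (count == 1)
            {
                // Only one char was available (normally at the end of the
                // stream): keep it and stop, so the stale buf[1] from the
                // previous pass is never inspected.
                this._LineBuffer[pos++] = buf[0];
                break;
            }
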
            if (buf[1] == '\r')
            {
                // grab buf[0]
                this._LineBuffer[pos++] = buf[0];
                // get the '\n'
                char ch = base.ReadChar();
                Debug.Assert(ch == '\n');

                lineEndFound = true;
            }
            else if (buf[0] == '\r')
            {
                lineEndFound = true;
            }                    
            else
            {
                this._LineBuffer[pos] = buf[0];
                this._LineBuffer[pos+1] = buf[1];
                pos += 2;

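                if (pos >= bufferSize)
                {
                    // Create the builder only once; re-creating it on every
                    // overflow would discard the text collected so far.
                    if (stringBuffer == null)
                        stringBuffer = new StringBuilder(bufferSize + 80);
                    stringBuffer.Append(this._LineBuffer, 0, bufferSize);
                    pos = 0;
                }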
            }

            if (lineEndFound)
            {
                if (stringBuffer == null)
                {
                    if (pos > 0)
                        return new string(this._LineBuffer, 0, pos);
                    else
                        return string.Empty;
                }
                else
                {
                    if (pos > 0)
                        stringBuffer.Append(this._LineBuffer, 0, pos);
                    return stringBuffer.ToString();
                }
            }
        }

        if (stringBuffer != null)
        {
            if (pos > 0)
                stringBuffer.Append(this._LineBuffer, 0, pos);
            return stringBuffer.ToString();
        }
        else
        {
            if (pos > 0)
                return new string(this._LineBuffer, 0, pos);
            else
                return null;
        }
    }

}

      

0

