Cannot recognize char in text file (eg ² μ)

Question


I have a text file with content:

A B C D Ä 1 4 0 $ % & € / [ ) = ß ² µ §

If you ask me about the encoding, I have no idea. If I open the file with Notepad++, the Encoding menu shows ANSI.

I would like to read this file and recognize each character correctly. This is the code I have:

//open and locking the file
using (FileStream fs = File.Open(@"C:\testfile.txt", FileMode.Open, FileAccess.Read, FileShare.None))
{
    using (TextReader reader = new StreamReader(fs))
    {
        string line;
        //reading and printing each line
        while ((line = reader.ReadLine()) != null)
        {
            System.Console.WriteLine(line);
        }
    }
}

In the output, each of Ä € ß ² µ § appears as a ?.

I thought this was because of the console, so I changed its output encoding to UTF8, hoping for a better result. But that doesn't help.

System.Console.OutputEncoding = System.Text.Encoding.UTF8;


This is why I think there is something wrong while reading the file. I should probably change the encoding of the StreamReader. But there are not many options. I tried UTF8, ASCII but it didn't help. Any ideas?

Edit: Thanks Matthew for suggesting adding System.Text.Encoding.Default to the StreamReader. Now the only character not recognized is €. I don't understand: is there something "special" about it?

Edit2: OK, € was a problem only because the console is buggy (?). If I look at the line in debug mode, € is fine too.

So now I have a working solution:

1.) Using the reader with default encoding:

using (TextReader reader = new StreamReader(fs, System.Text.Encoding.Default))

and

2.) Not using the console for output, just inspecting the line in debug mode

+3

c# .net encoding

sabisabi 05 Feb 13 at 13:34


2 answers

You can use Mozilla's universal charset detector, for which a .NET port is available, to determine the encoding of a file quite reliably. This then lets you open most files with the correct encoding with minimal effort on your part.
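If pulling in a detector library is too much, a much cruder heuristic sometimes suffices. The sketch below is not Mozilla's detector; it only tries a strict UTF-8 decode and falls back to a caller-supplied encoding when the bytes are not valid UTF-8. The class and method names (EncodingGuess, DecodeUtf8OrFallback) are hypothetical, and a byte order mark, if present, is not stripped.

```csharp
using System;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: decode as UTF-8 if the bytes are valid UTF-8,
    // otherwise decode with the fallback encoding chosen by the caller.
    public static string DecodeUtf8OrFallback(byte[] bytes, Encoding fallback)
    {
        // throwOnInvalidBytes: true makes invalid sequences raise an
        // exception instead of silently turning into U+FFFD.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so assume the single-byte fallback encoding.
            return fallback.GetString(bytes);
        }
    }
}
```

A caller would pass the file's bytes (for example from File.ReadAllBytes) together with the code page it expects for legacy files. Pure ASCII input is valid UTF-8 and decodes identically either way, so the heuristic only matters for bytes above 0x7F.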

+1

Matt Whitfield 05 Feb 13 at 13:37


Matthew Watson · Accepted Answer · 2013-02-05T13:44:59+0000

If the file is ANSI, you can read it like this:

using (TextReader reader = new StreamReader(fs, System.Text.Encoding.Default))

However, this will only work if your system's current code page matches the one the file was written with. It probably does, but for complete portability you have to determine the actual code page of the file and pass it explicitly:

using (TextReader reader = new StreamReader(fs, System.Text.Encoding.GetEncoding(codePageNumber)))

where codePageNumber is the code page of the text file.
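As a fuller sketch of the explicit code page approach: the helper below reads all lines of a file with a given code page. The names AnsiFileReader and ReadLines are made up for illustration, and the code page to pass is an assumption about the file; "ANSI" on a Western European Windows machine usually means 1252, but the question does not confirm this. The RegisterProvider call assumes .NET Core or .NET 5+, where Windows code pages are not available by default; on .NET Framework it can be dropped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class AnsiFileReader
{
    // Hypothetical helper: read every line of a file using an explicit code page.
    public static string[] ReadLines(string path, int codePage)
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages such as
        // 1252 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding encoding = Encoding.GetEncoding(codePage);

        var lines = new List<string>();
        // Open and lock the file, as in the question.
        using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None))
        using (TextReader reader = new StreamReader(fs, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines.ToArray();
    }
}
```

Calling AnsiFileReader.ReadLines(@"C:\testfile.txt", 1252) would then decode the byte 0x80 as € and 0xC4 as Ä. Whether the console can display those characters is a separate matter that depends on its font and output encoding, which matches the observation in the question that € looked fine in the debugger.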
