Reading a resource file with Unicode characters

I have a file in the /res/raw folder (R.raw.test) with the following content:

This is Tésêt

I want to read it line by line. My current code:

public static String readRawTextFile(Context ctx, int resId) {
    InputStream inputStream = ctx.getResources().openRawResource(resId);

    InputStreamReader inputreader;
    try {
        inputreader = new InputStreamReader(inputStream, "UTF-8");
    } catch (UnsupportedEncodingException e1) {
        e1.printStackTrace();
        return null;
    }
    BufferedReader buffreader = new BufferedReader(inputreader);
    String line;
    StringBuilder text = new StringBuilder();

    try {
        while ((line = buffreader.readLine()) != null) {
            text.append(line);
            text.append('\n');
        }
    } catch (IOException e) {
        return null;
    }
    return text.toString();
}

      

But the returned string is:

This is T s t

How can I solve this? Thanks.



4 answers


Your code seems to be OK. The returned string looks like that if you view it in a non-UTF-8 console. I ran your code from groovyConsole, which supports Unicode, and it displays the UTF-8 string fine.
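One way to check this without relying on what the console or log viewer can render is to print the code points of the returned string. Here is a minimal sketch; the class name `CodePointDump` and the method `describe` are made up for illustration and are not part of the original code. It needs Java 8 or later for `codePoints()`.

```java
import java.nio.charset.StandardCharsets;

class CodePointDump {
    // Returns the code points of s as space-separated "U+XXXX" tokens, so the
    // result is pure ASCII and prints the same on any console.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The word from the question's file as UTF-8 bytes, spelled out so the
        // example does not depend on the encoding of this source file.
        byte[] utf8 = {'T', (byte) 0xC3, (byte) 0xA9, 's', (byte) 0xC3, (byte) 0xAA, 't'};
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        // prints: U+0054 U+00E9 U+0073 U+00EA U+0074
        System.out.println(describe(decoded));
    }
}
```

If U+00E9 and U+00EA show up in the dump, the string was decoded correctly and only the display is at fault. If U+FFFD shows up instead, the bytes were not valid for the charset passed to the reader.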





First of all, you need to determine the encoding of the file in /res/raw.

On UNIX you can run the following command:

file /res/raw

      



Then put the correct encoding in:

inputreader = new InputStreamReader(inputStream, "UTF-8");
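If the `file` utility is not at hand, a rough check can also be done in Java: decode the raw bytes with a strict decoder and see whether the charset rejects them. The sketch below assumes the whole file fits in memory; `EncodingCheck` and `decodesCleanly` are hypothetical names. A clean decode does not prove the encoding is right (ISO-8859-1 accepts any byte sequence), but a failure does rule a charset out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    // Returns true if data decodes in the given charset without any malformed
    // or unmappable input. InputStreamReader silently substitutes U+FFFD
    // instead of failing, so a strict decoder is configured here.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The question's word saved as ISO-8859-1: the two accented letters
        // are the single bytes E9 and EA.
        byte[] latin1 = {'T', (byte) 0xE9, 's', (byte) 0xEA, 't'};
        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));      // false
        System.out.println(decodesCleanly(latin1, StandardCharsets.ISO_8859_1)); // true
    }
}
```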

      



Hi, I would try something like this:

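// Read the whole file as UTF-8, one character at a time.
StringBuilder str = new StringBuilder();
File file = new File("c:\\some_file.txt");
// try-with-resources closes the reader (and the underlying stream) when done
try (Reader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
    int ch;
    // read() returns -1 at end of stream
    while ((ch = reader.read()) >= 0) {
        str.append((char) ch);
    }
}
String myString = str.toString();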

      

If you want to write, just use OutputStreamWriter with a FileOutputStream and set the correct encoding; it works like a charm.

Hope this helps :-)
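For the writing side, here is a small round-trip sketch: it writes a line through an `OutputStreamWriter` with an explicit charset and reads it back the same way. `Utf8RoundTrip`, `write`, and `readFirstLine` are names invented for this example, and the temp file stands in for whatever file you actually use.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Writes text to file as UTF-8, replacing any existing content.
    static void write(File file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Reads the first line of file, decoding it as UTF-8.
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("utf8-demo", ".txt");
        tmp.deleteOnExit();
        // Unicode escapes (e-acute, e-circumflex) avoid depending on the
        // encoding of this source file.
        String original = "This is T\u00e9s\u00eat";
        write(tmp, original + "\n");
        System.out.println(original.equals(readFirstLine(tmp))); // true
    }
}
```

The point is that the same charset is named on both sides; relying on the platform default on either side is where mismatches usually creep in.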



I had a file that also gave me a result similar to "This is T s t", and for me setting the charsetName to UTF-16 did the trick.
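A file that only decodes properly as UTF-16 often starts with a byte order mark (BOM), so peeking at the first two bytes can hint at which charset to pass. This is only a rough sketch: `BomSniffer` and `charsetFor` are hypothetical names, and falling back to UTF-8 when no UTF-16 BOM is found is an assumption, not real detection.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // Picks a charset by looking for a UTF-16 byte order mark at the start
    // of the data.
    static Charset charsetFor(byte[] head) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            // FE FF (big-endian) or FF FE (little-endian): Java's "UTF-16"
            // decoder reads the BOM itself to choose the byte order.
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return StandardCharsets.UTF_16;
            }
        }
        // Assumed default when there is no UTF-16 BOM.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // "T" followed by e-acute, as little-endian UTF-16 with a BOM.
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'T', 0, (byte) 0xE9, 0};
        Charset cs = charsetFor(utf16le);
        String decoded = new String(utf16le, cs);
        System.out.println(cs + " " + decoded.equals("T\u00e9")); // UTF-16 true
    }
}
```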







