How to decode ubyte[] to a specified encoding?
Are you trying to convert a text file to UTF-8? If so, Phobos has a function for this in std.utf: @trusted string toUTF8(in char[] s). See http://dlang.org/phobos/std_utf.html for details.
Sorry if this is not what you want.
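As a rough sketch of what converting raw bytes to a UTF-8 string can look like in Phobos: if the bytes are already UTF-8, it may be enough to reinterpret and validate them, and if they are in another encoding such as Latin-1, std.encoding.transcode can convert them. The helper names fromUtf8Bytes and fromLatin1Bytes are made up for illustration, and Latin-1 is only an assumed example of a source encoding.

```d
import std.encoding : Latin1String, transcode;
import std.utf : validate;

// Hypothetical helper: treat the raw bytes as UTF-8 and verify them.
string fromUtf8Bytes(const(ubyte)[] raw)
{
    auto text = cast(string) raw.idup; // reinterpret an immutable copy, no conversion
    validate(text);                    // throws UTFException if the bytes are malformed
    return text;
}

// Hypothetical helper: treat the raw bytes as Latin-1 and convert them to UTF-8.
string fromLatin1Bytes(const(ubyte)[] raw)
{
    auto source = cast(Latin1String) raw.idup;
    string result;
    transcode(source, result);
    return result;
}
```

The first helper never changes the bytes, so it only suits input that is known (or expected) to be UTF-8 already.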
I found a way, though using std.algorithm.reduce might be better:
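import std.encoding;
import std.stdio;

void main(string[] args)
{
    auto f = File("pathToAfFile.txt", "r");
    auto e = EncodingScheme.create("utf-8");
    const(ubyte)[] pending; // undecoded bytes carried over between chunks
    foreach (ubyte[] buffer; f.byChunk(4096))
    {
        pending ~= buffer; // copies, because byChunk reuses its buffer
        while (pending.length > 0)
        {
            const(ubyte)[] rest = pending;
            dchar c = e.safeDecode(rest); // advances rest past one sequence
            // A failure within the last few bytes may just be a sequence
            // split across two chunks, so wait for more data.
            if (c == INVALID_SEQUENCE && pending.length < 4)
                break;
            write(c == INVALID_SEQUENCE ? '\uFFFD' : c);
            pending = rest;
        }
    }
    // Bytes still pending at end of file form a truncated sequence.
    if (pending.length > 0) write('\uFFFD');
}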
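The same EncodingScheme approach can be sketched for data that is already in memory, with the encoding chosen by name at run time. The function name decodeAll is hypothetical, and replacing malformed input with U+FFFD is an assumption made for this sketch, not something the answer above prescribes.

```d
import std.encoding : EncodingScheme, INVALID_SEQUENCE;

// Hypothetical helper: decode a whole byte array using an encoding
// selected by name, e.g. "utf-8" or "iso-8859-1".
dstring decodeAll(string encodingName, const(ubyte)[] data)
{
    // create throws an EncodingException for unknown encoding names.
    auto scheme = EncodingScheme.create(encodingName);
    dchar[] result;
    while (data.length > 0)
    {
        // safeDecode consumes one sequence from the front of data.
        dchar c = scheme.safeDecode(data);
        result ~= (c == INVALID_SEQUENCE) ? '\uFFFD' : c;
    }
    return result.idup;
}
```

Because the whole input is available at once, there is no chunk boundary to worry about here.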
File.byChunk returns a range whose front is a ubyte[].
A quick Google search showed that UTF-8 uses 1 to 4 bytes per code point (the original design allowed up to 6, so buffering 6 bytes is more than enough). Just make sure you always have that many bytes of data buffered, and you can use the std.encoding decoder to convert them to a dchar. Then you can use toUTF8 from std.utf to convert the result to a normal string instead of a dstring.
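To illustrate why a small fixed amount of buffering is enough, here is a minimal sketch that reads the sequence length off the lead byte. The function name utf8SequenceLength is made up for this example, and it assumes the modern 4-byte limit on UTF-8 sequences.

```d
// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with the given lead byte, or 0 if the byte cannot start a sequence.
size_t utf8SequenceLength(ubyte lead)
{
    if (lead < 0x80) return 1;           // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // continuation byte or invalid lead
}
```

This only inspects the bit pattern of the lead byte; it does not check that the following bytes are valid continuation bytes.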
The convert function below turns any range of ubyte arrays (such as the one byChunk returns) into a string.
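import std.encoding, std.range, std.stdio, std.utf;

void main()
{
    File input = File("test.txt");
    string data = convert(input.byChunk(512));
    writeln("Data: ", data);
}

// Accepts any input range whose elements are arrays of ubyte.
string convert(R)(R chunkRange)
if (isInputRange!R && is(ElementType!R : const(ubyte)[]))
{
    const(char)[] inbuffer;
    dchar[] outbuffer;
    while (inbuffer.length > 0 || !chunkRange.empty)
    {
        // A UTF-8 sequence is at most 4 bytes long, so 4 buffered bytes
        // always hold at least one complete sequence.
        while (inbuffer.length < 4 && !chunkRange.empty)
        {
            inbuffer ~= cast(const(char)[]) chunkRange.front;
            chunkRange.popFront();
        }
        if (inbuffer.length == 0) break; // only empty chunks were left
        // safeDecode consumes one sequence from the front of inbuffer and
        // returns INVALID_SEQUENCE instead of failing on malformed input.
        dchar c = safeDecode(inbuffer);
        outbuffer ~= (c == INVALID_SEQUENCE) ? '\uFFFD' : c;
    }
    return toUTF8(outbuffer); // Convert to string instead of dstring
}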