How do I perform streaming character conversion?
I have data stored on disk in files that are too large to be stored in main memory.
I want to pass this data from disk to the data processing pipeline via iconv
for example:
zcat myfile | iconv -f L1 -t UTF-8 | # rest of the pipeline goes here
Unfortunately, iconv appears to buffer the entire file in memory and only produces output once its input is exhausted. That means this one blocking stage consumes all of my main memory in a pipeline that is otherwise very light on memory.
I've tried calling iconv like this:
stdbuf -o 0 iconv -f L1 -t UTF-8
But it looks like iconv does its own internal buffering - the problem has nothing to do with the stdio buffering that stdbuf controls.
I see this with the iconv binary packaged with glibc 2.6 and 2.7 on Arch Linux, and I have observed it with glibc 2.5 on Debian as well.
Is there some way around this? I know streaming character conversion is not trivial, but I would expect such a widely used Unix tool to work on streams; it's not uncommon to work with files that won't fit in main memory. Should I roll my own binary against libiconv?
Consider calling iconv(3) together with iconv_open(3): write a small C program that wires these two calls together, reading from stdin and writing to stdout in fixed-size chunks. See this example:
http://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html
That example is designed to handle exactly the situation you describe: it converts the input chunk by chunk, so it avoids "stateful" buffering of the whole file before emitting output.