How do I perform streaming character conversion?
I have data stored on disk in files that are too large to be stored in main memory.
I want to pass this data from disk to the data processing pipeline via iconv
for example:
zcat myfile | iconv -f L1 -t UTF-8 | # rest of the pipeline goes here
Unfortunately, iconv appears to buffer the entire file in memory and only produces output once its input is exhausted. That means this one blocking stage consumes all of my main memory in a pipeline that is otherwise very light on memory.
I've tried calling iconv like this:
stdbuf -o 0 iconv -f L1 -t UTF-8
But it looks like iconv does its own internal buffering - the problem has nothing to do with the stdio buffering that stdbuf controls.
I see this with the iconv binary packaged with glibc 2.6 and 2.7 on Arch Linux, and I have observed it with glibc 2.5 on Debian as well.
Is there some way around this? I know streaming character conversion is not trivial, but I would expect such a widely used Unix tool to work on streams; it's not uncommon to work with files that won't fit in main memory. Should I roll my own binary against libiconv?
Consider calling iconv(3) together with iconv_open(3): write a small C program that wires these two calls together, reading from stdin and writing to stdout in fixed-size chunks. See this example:
http://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html
That example is designed to handle exactly the situation you describe: it converts the input chunk by chunk, so it avoids "stateful" buffering of the whole file before emitting output.