However, the standard iconv program slurps the entire file into memory, which doesn't work for large data sets (such as database exports). You'll see errors like:
iconv: unable to allocate buffer for input: Cannot allocate memory
iconv: cannot open input file `database.txt': File too large
This script is a simple wrapper that processes the input file in manageable chunks and writes the converted output to standard output: iconv-chunks
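The core idea looks roughly like this (a minimal sketch for illustration only, not the actual script; the chunk size, argument handling, and line-based splitting are assumptions):

#!/usr/bin/perl
# Sketch of the chunking approach: accumulate lines into a buffer, and
# whenever the buffer reaches the chunk size, write it to a temp file,
# convert it with iconv, and send the result to stdout. Splitting on
# line boundaries keeps multibyte sequences intact (this assumes an
# ASCII-compatible source encoding such as UTF-8 or Latin-1).
use strict;
use warnings;
use File::Temp qw(tempfile);

my $CHUNK_SIZE = 10 * 1024 * 1024;   # assumed chunk size
my $file       = shift @ARGV or die "usage: $0 FILE [iconv options]\n";
my @iconv_args = @ARGV;              # passed straight through, e.g. -f latin1 -t utf-8

open my $in, '<', $file or die "can't open $file: $!";

my $buf = '';
while (my $line = <$in>) {
    $buf .= $line;
    flush_chunk(\$buf) if length($buf) >= $CHUNK_SIZE;
}
flush_chunk(\$buf) if length $buf;   # convert the leftover tail
close $in;

sub flush_chunk {
    my ($bufref) = @_;
    my ($fh, $tmp) = tempfile();
    print {$fh} $$bufref;
    close $fh or die "close $tmp: $!";
    system('iconv', @iconv_args, $tmp) == 0
        or die "command 'iconv @iconv_args $tmp' failed: $?";
    unlink $tmp;                     # don't leave chunks behind in /tmp
    $$bufref = '';
}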
11 comments:
iconv-chunks fails with "Bad file descriptor" at line 62 of the script, which is where the external command is called, around line 57893 of the input (roughly 110MB into a 285MB database dump that is otherwise clean).
The command I am running is ./iconv-chunks datafile -f utf-8 -t utf-8 > dataonly_cleaned
Any ideas?
Hmmm. What OS are you using?
The iconv program runs fine for you on smaller files?
Line 62 is the external call to iconv, so it looks like that call is failing.
You probably need to add the -c option to skip characters that are not convertible.
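For example (assuming, as the usage above suggests, that the script passes its options straight through to iconv):

./iconv-chunks datafile -c -f utf-8 -t utf-8 > dataonly_cleaned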
The file was 4.2GB in size.
Here is my solution:
uconv -f UTF-16LE -t UTF-8 < data.csv > data_utf8.csv
Maybe it will work with iconv too.
Also, uconv supports callbacks for invalid characters; see the man page.
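For example, something like this tells uconv to skip unconvertible input instead of stopping (the available callback names vary by ICU version; check your uconv man page):

uconv -f UTF-16LE -t UTF-8 --from-callback skip < data.csv > data_utf8.csv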
How can I get this script, and from where?
The script is available here:
http://maurice.aubrey.googlepages.com/iconv-chunks.txt
With my Perl, this script leaves a temp file in /tmp; unfortunately, I discovered this when /tmp filled up completely.
I appended this line to fix it:
unlink $tmp;
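A more robust variant, assuming the script creates its temp file with File::Temp (a sketch, not the actual patch):

use File::Temp qw(tempfile);
# UNLINK => 1 tells File::Temp to delete the file automatically when the
# program exits, even if it dies before any explicit cleanup runs.
my ($fh, $tmp) = tempfile(UNLINK => 1);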
Fixed the removal of the temp file and moved the script to github: https://github.com/mla/iconv-chunks
[root@db1 scripts]# ./iconv-chunks /root/scripts/hist1.dmp -f utf8 -t utf32 > /root/scripts/hist2.dmp
iconv: illegal input sequence at position 535322
command 'iconv -f utf8 -t utf32 /tmp/44RA8vXwBe' failed: Inappropriate ioctl for device at ./iconv-chunks line 63, <> line 3295.
Can you help?
Are you sure hist1.dmp is encoded as UTF-8? Maybe it's really Latin-1? Try "-f latin1" instead.
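On Linux you can also ask file(1) for a guess, though its charset detection is only a heuristic:

file -i /root/scripts/hist1.dmp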