Monday, August 20, 2007

iconv: file too large

The iconv utility is used to convert file encodings. I'm using it to convert a PostgreSQL database from LATIN1 to UTF8.

However, the standard iconv program slurps the entire file into memory, which doesn't work for large data sets (such as database exports). You'll see errors like:

iconv: unable to allocate buffer for input: Cannot allocate memory
iconv: cannot open input file `database.txt': File too large

This script is just a wrapper that processes the input file in manageable chunks and writes the result to standard output: iconv-chunks
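The idea behind the wrapper can be sketched in plain shell (a minimal illustration, not the actual iconv-chunks script; the file names and chunk size here are made up):

```shell
# Minimal sketch of the chunking idea (not the real iconv-chunks script).
# Assumption: the input is line-oriented (like a pg_dump), so splitting on
# line boundaries never cuts a multibyte character in half.
printf 'caf\xe9\n' > database.txt      # sample input: "café" in LATIN1

split -l 100000 database.txt chunk_    # break the input into line-based chunks
for f in chunk_*; do
  iconv -f LATIN1 -t UTF-8 "$f"        # convert each small chunk separately
done > database.utf8.txt               # concatenate the converted output

rm -f chunk_*                          # clean up the temporary chunks
```

Because each chunk fits comfortably in memory, iconv never has to slurp the whole dump at once.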


compass2k said...

iconv-chunks fails with "Bad file descriptor" at line 62 of the code, around line 57893 of a 110 MB (or 285 MB) clean db dump.
That is where subroutines are called in the code.
The command I am running is ./iconv-chunks datafile -f utf-8 -t utf-8 > dataonly_cleaned
Any ideas?

mla said...

Hmmm. What OS are you using?

The iconv program runs fine for you on smaller files?

Line 62 looks like the external call to iconv is failing.

Unknown said...

You probably need to add the -c option to skip characters that are not convertible.
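For the record, here is what -c does, with a tiny made-up example (0xFF is never a valid byte in UTF-8):

```shell
# -c tells iconv to silently discard bytes that cannot be converted,
# instead of aborting with "illegal input sequence".
printf 'ok \xff end\n' > bad.txt               # contains one invalid byte
iconv -f UTF-8 -t UTF-8 -c bad.txt > clean.txt # invalid byte is dropped
```

Note that -c silently loses data, so it's a last resort; if the "invalid" bytes are really Latin1, fixing the -f option is the better cure.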

nicola said...

My file was 4.2 GB in size.
Here is my solution:

uconv -f UTF-16LE -t UTF-8 < data.csv > data_utf8.csv

Maybe it will work with iconv too.

nicola said...

And uconv supports callbacks for invalid characters. See the man page.

R.M said...

How can I get this script?
From where?

mla said...

The script is available here:

Anonymous said...

With my perl, this script leaves a tmp file in /tmp. Unfortunately, I discovered this when /tmp filled up completely.

I appended this line to fix it:

unlink $tmp;
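On the shell side, the same guarantee can be sketched with a trap (a hypothetical demo, not part of iconv-chunks): the temp file gets removed however the work ends, so a crash partway through can't leave junk behind in /tmp.

```shell
# Demo: an EXIT trap set inside a subshell removes the temp file when
# the subshell exits, whether the work in between succeeded or died.
tmp_path=$(
  t=$(mktemp)
  trap 'rm -f "$t"' EXIT           # fires when this subshell exits
  printf 'converted data\n' > "$t" # stand-in for the real conversion work
  echo "$t"                        # report the path so cleanup is checkable
)
# By the time the subshell returns, the trap has already deleted the file.
```

The Perl equivalent is to let File::Temp create the handle so the file is unlinked automatically, rather than relying on reaching an explicit unlink at the end.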

mla said...

Fixed the removal of the temp file and moved the script to github:

Артем said...

[root@db1 scripts]# ./iconv-chunks /root/scripts/hist1.dmp -f utf8 -t utf32 > /root/scripts/hist2.dmp
iconv: illegal input sequence at position 535322
command 'iconv -f utf8 -t utf32 /tmp/44RA8vXwBe' failed: Inappropriate ioctl for device at ./iconv-chunks line 63, <> line 3295.

Can you help?

mla said...

Are you sure hist1.dmp is encoded as utf8? Maybe it's really Latin1? Try "-f latin1" instead.
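A quick way to test that guess: a strict round-trip through iconv fails at the first invalid sequence, so a successful utf8-to-utf8 pass means the file really is UTF-8 (sample file name made up here):

```shell
# 0xE9 is "é" in LATIN1 but an invalid sequence in UTF-8.
printf 'caf\xe9\n' > hist_sample.txt

# Strict validation pass: fails if the file is not well-formed UTF-8.
if ! iconv -f UTF-8 -t UTF-8 hist_sample.txt >/dev/null 2>&1; then
  echo "not valid UTF-8; try -f latin1"
fi

# Reading it as Latin1 instead converts cleanly.
iconv -f LATIN1 -t UTF-8 hist_sample.txt
```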