5.7.120. contrib::recode

5.7.120.1. NAME

recode.pl - Converts a database from one encoding (or multiple encodings) to UTF-8.

5.7.120.2. SYNOPSIS

contrib/recode.pl [--dry-run] [--guess [--show-failures]] [--charset=iso-8859-2]
                  [--overrides=file_name]

 --dry-run        Don't modify the database.

 --charset        The primary charset your data is currently in. This can be
                  omitted if you use --guess.

 --guess          Try to guess the charset of the data.

 --show-failures  If we fail to guess, show where we failed.

 --overrides      Specify a file containing overrides. See --help
                  for more info.

 --help           Display detailed help.

If you aren't sure what to do, try:

  contrib/recode.pl --guess --charset=cp1252

5.7.120.3. OPTIONS

--dry-run

Don't modify the database, just print out what the conversions will be.

recode.pl will print out a Key for each item. You can use this in the overrides file, described below.
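A cautious first run could look like the sketch below. It uses only the switches documented here and keeps a copy of the preview for later reference. The file name recode-preview.txt is a hypothetical choice, and the sketch assumes it is started from the top of the Bugzilla directory.

```shell
# Assumption: the current directory is the top of a Bugzilla installation.
RECODE=contrib/recode.pl
PREVIEW=recode-preview.txt   # hypothetical file name for the saved preview

if [ -x "$RECODE" ]; then
    # Preview only: --dry-run leaves the database untouched.
    # tee shows the output on screen and also keeps a copy in $PREVIEW.
    "$RECODE" --dry-run --guess --show-failures --charset=cp1252 2>&1 | tee "$PREVIEW"
else
    echo "Cannot find $RECODE; run this from the Bugzilla directory." >&2
fi
```

Once the preview looks right, the same command without --dry-run performs the actual conversion.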

--guess

If your database is in multiple different encodings, specify this switch and recode.pl will do its best to determine the original charset of the data. The detection is usually very reliable.

If recode.pl cannot guess the charset, it will leave the data alone, unless you've specified --charset.

--charset=charset-name

If you do not specify --guess, your database is converted from this character set to UTF-8.

If you have specified --guess, recode.pl will use this charset as a fallback: when it cannot guess the charset of a particular piece of data, it will assume that the data is in this charset and convert it from this charset to UTF-8.

charset-name must be a charset that is known to Perl's Encode module. To see a list of available charsets, run:

perl -MEncode -e 'print join("\n", Encode->encodings(":all"))'

--show-failures

If --guess fails to guess a charset, print out the data it failed on.

--overrides=file_name

This is a way of specifying encodings that override the ones chosen by --guess. The file is a series of lines. Each line should start with the Key from --dry-run, followed by a space, and then the encoding you'd like to use.
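As an illustration of the file layout described above, the sketch below writes a two-line overrides file and checks that every line has at least a key and an encoding. EXAMPLE_KEY_1 and EXAMPLE_KEY_2 are placeholders, not real keys; the real ones must be copied from the --dry-run output. The file name recode-overrides.txt is likewise a hypothetical choice.

```shell
# Hypothetical file name; pass whatever path you choose to --overrides.
OVERRIDES=recode-overrides.txt

# One override per line: the Key printed by --dry-run, a space, the encoding.
# EXAMPLE_KEY_1 and EXAMPLE_KEY_2 are placeholders, not real keys.
cat > "$OVERRIDES" <<'EOF'
EXAMPLE_KEY_1 iso-8859-2
EXAMPLE_KEY_2 cp1252
EOF

# Sanity check: collect any line with fewer than two fields
# (a line needs at least a key and an encoding).
bad_lines=$(awk 'NF < 2 { print NR ": " $0 }' "$OVERRIDES")
if [ -n "$bad_lines" ]; then
    echo "Malformed override lines:" >&2
    echo "$bad_lines" >&2
fi
```

The file would then be passed as --overrides=recode-overrides.txt, ideally together with --dry-run first to confirm that the overrides take effect.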

This documentation undoubtedly has bugs; if you find some, please file them here.