Configuring Locales on Linux for PostgreSQL

16.2k Views Asked by At

I'm having trouble getting a particular database set up and running. I'm trying to restore a postgreSQL dump I got from somebody else. I've tried a few methods to no avail.

  1. Straight from pg_restore

    pg_restore -C -d postgres --exit-on-error maggie_prod_20111221.dump.sql

  2. Creating the database and tablespace first

    createdb -T template0 maggieprod -E LATIN1

    SQL: CREATE TABLESPACE magdat OWNER maggie LOCATION '/somewhere/magdat';

    pg_restore -v -d template1 maggie_prod_20110121.dump.sql

Using the first method I get the following:

pg_restore: [archiver (db)] Error while PROCESSING TOC:

pg_restore: [archiver (db)] Error from TOC entry 2308; 1262 16386 DATABASE maggieprod postgres

pg_restore: [archiver (db)] could not execute query: ERROR: encoding LATIN1 does not match locale en_CA.utf8

DETAIL: The chosen LC_CTYPE setting requires encoding UTF8. Command was: CREATE DATABASE maggieprod WITH TEMPLATE = template0 ENCODING = 'LATIN1' TABLESPACE = magdat;

And using the second, when I try and create the database I get:

createdb: database creation failed: ERROR: encoding LATIN1 does not match locale en_CA.utf8

DETAIL: The chosen LC_CTYPE setting requires encoding UTF8.

So it seems to be that I cannot create a LATIN1 encoding database? Why is that? I am new to locales and encoding and don't know very much about them. I just know that the dump was made off of a LATIN1 database.

The output of locale is:

LANG=en_CA.utf8 LC_CTYPE="en_CA.utf8" LC_NUMERIC="en_CA.utf8" LC_TIME="en_CA.utf8" LC_COLLATE="en_CA.utf8" LC_MONETARY="en_CA.utf8" LC_MESSAGES="en_CA.utf8" LC_PAPER="en_CA.utf8" LC_NAME="en_CA.utf8" LC_ADDRESS="en_CA.utf8" LC_TELEPHONE="en_CA.utf8" LC_MEASUREMENT="en_CA.utf8" LC_IDENTIFICATION="en_CA.utf8" LC_ALL=

And the output of locale -a is:

C en_AG en_AG.utf8 en_AU.utf8 en_BW.utf8 en_CA.utf8 en_DK.utf8 en_GB.utf8 en_HK.utf8 en_IE.utf8 en_IN en_IN.utf8 en_NG en_NG.utf8 en_NZ.utf8 en_PH.utf8 en_SG.utf8 en_US.utf8 en_ZA.utf8 en_ZW.utf8 POSIX

I don't see LATIN1 in the second command, should I? If so, how would I go about adding it? Is it correct for me to assume that I need to change the locale on my computer? If so, is there a way to do that only for postgreSQL? Also, when I try and open the dump I see a lot of garbage characters, I am assuming this is because of the encoding, how would I look at it properly?

Thanks for any help.

3

There are 3 best solutions below

1
On

You need to create the database with a locale that matches the encoding, e.g.,

createdb -T template0 maggieprod -E LATIN1 --locale=en_CA

Since you don't have all locales installed, I guess you are using Debian or Ubuntu. In that case, call dpkg-reconfigure locales or install the locales-all package.

Alternatively, create the database with encoding UTF8. As long as all your clients set the client encoding correctly, it shouldn't make a difference.

0
On

If you are still interested using the recode command will transform your database dump to the character set of your choice before you import it into your new database. See this link - http://blog.e-shell.org/134

0
On

I had trouble using the createdb syntax from The_Denominater, so I did it the following way:

CREATE DATABASE maggieprod WITH ENCODING = 'LATIN1'
  LC_CTYPE = 'en_CA' LC_COLLATE = 'en_CA'
  TEMPLATE template0;