I'm facing this problem with PostgreSQL 15.4 but not 14. It happens when all 3 conditions are satisfied:
TEXT @@ to_tsquery()is executed.- The
TEXTcolumn contains a multi-byte UTF-8 character. - The statement is run after a simple
plperlfunction call in the same session.
The error is:
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.
Reproduction:
$ createdb -U postgres test
$ psql -U postgres test
psql (15.4)
Type "help" for help.
test=# \l test
List of databases
Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges
------+----------+----------+-------------+-------------+------------+-----------------+-------------------
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
(1 row)
test=# CREATE EXTENSION IF NOT EXISTS plperl;
CREATE EXTENSION
test=# CREATE FUNCTION public.simple_quoter(text) RETURNS TEXT
LANGUAGE plperl
AS $_X$
return "'$_[0]'"
$_X$;
CREATE FUNCTION
test=# CREATE TABLE t (c TEXT);
CREATE TABLE
test=# INSERT INTO t VALUES ('Queensrÿche');
INSERT 0 1
test=# SELECT * FROM t WHERE to_tsquery('english','Queen:*') @@ to_tsvector('english',c);
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.
Now that the plperl extension and simple_quote() function are created, the simple repro.sql file can be created:
SELECT simple_quoter('hi');
SELECT c @@ to_tsquery('english','Queen:*') FROM t;
which also reproduces the error:
$ psql -U postgres test < repro.sql
simple_quoter
---------------
'hi'
(1 row)
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.
When run in its own session, it is fine:
test=# SELECT c @@ to_tsquery('english','Queen:*') FROM t;
?column?
----------
t
(1 row)
When initdb was run, no special parameters were given for locale:
$ initdb -D /var/lib/postgres/data
The locale settings were the same as they are now:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
test=# SHOW CLIENT_ENCODING;
client_encoding
-----------------
UTF8
(1 row)
test=# SHOW SERVER_ENCODING;
server_encoding
-----------------
UTF8
(1 row)
Questions
Is this a bug with PostgreSQL 15 or am I overlooking something?
Is there a workaround (without having to not use plperl in the same session)?
Additional Details
OS: Current version of arch linux (2023-Sept-3)
PostgreSQL build: https://gitlab.archlinux.org/archlinux/packaging/packages/postgresql/-/blob/main/PKGBUILD pkgver=15.4 pkgrel=2