7.3. Single-byte character set recoding

You can enable this feature with the --enable-recode option to configure. This option was formerly described as "Cyrillic recode support", which understates its capabilities: it can be used for any single-byte character set recoding.

This method uses a charset.conf file located in the database directory (PGDATA). It is a typical text configuration file in which spaces and newlines separate items and records, and # introduces a comment. Three keywords are recognized, with the following syntax:

BaseCharset      server_charset
RecodeTable      from_charset to_charset file_name
HostCharset      host_spec    host_charset

BaseCharset defines the encoding of the database server. Character set names are used only for mapping within charset.conf, so you are free to choose names that are easy to type.

RecodeTable records specify translation tables between server and client. The file name is relative to the PGDATA directory. The table file format is very simple. There are no keywords; each line holds a pair of decimal or hexadecimal (0x-prefixed) values, a character code followed by its translation:

char_value   translated_char_value
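
To make the table format concrete, here is a minimal Python sketch of how such a file could be read and applied. It is an illustration, not the server's implementation. The function names parse_recode_table and recode are hypothetical, and the sketch assumes that bytes not listed in the table map to themselves and that # starts a comment, as it does in charset.conf.

```python
def parse_recode_table(text):
    """Build a 256-entry byte translation table from recode-table text."""
    # Assumption: any byte value not listed in the file is left unchanged.
    table = bytearray(range(256))
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment, as in charset.conf.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        src, dst = line.split()[:2]
        # Base 0 accepts decimal ("225") and 0x-prefixed hex ("0xE1").
        # It rejects decimals with leading zeros such as "010".
        table[int(src, 0)] = int(dst, 0)
    return bytes(table)


def recode(data, table):
    """Translate a byte string one byte at a time through the table."""
    return data.translate(table)
```

A table containing the lines "0xE1 0xC1" and "226 194" would turn the bytes E1 E2 into C1 C2 and leave every other byte untouched.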

HostCharset records define the client character set by IP address. You can use a single IP address, an IP mask range starting from the given address, or an IP interval (e.g., 127.0.0.1, 192.168.1.100/24, 192.168.1.20-192.168.1.40).
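
The three host_spec forms can be sketched in Python with the standard ipaddress module. The function name host_matches is hypothetical. The sketch assumes that the /24 suffix is a CIDR-style prefix length covering the whole enclosing block, which may differ from how the server interprets a mask range "starting from the given address".

```python
import ipaddress


def host_matches(host_spec, client_ip):
    """Return True if client_ip falls under the given host_spec."""
    addr = ipaddress.ip_address(client_ip)
    if "-" in host_spec:
        # IP interval, e.g. 192.168.1.20-192.168.1.40 (both ends inclusive).
        low, high = (ipaddress.ip_address(part.strip())
                     for part in host_spec.split("-", 1))
        return low <= addr <= high
    if "/" in host_spec:
        # Assumption: the suffix is a CIDR prefix length. strict=False
        # accepts an address with host bits set, such as 192.168.1.100/24.
        return addr in ipaddress.ip_network(host_spec, strict=False)
    # Single IP address.
    return addr == ipaddress.ip_address(host_spec)
```

Under these assumptions, host_matches("192.168.1.20-192.168.1.40", "192.168.1.30") is true, while the same spec does not match 192.168.1.41.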

The charset.conf file is always processed to the end, so a later record can override an earlier one. This makes it easy to specify exceptions to previous rules. In the src/data/ directory you will find an example charset.conf and a few recoding tables.
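
The effect of processing the whole file can be sketched in Python as a scan in which the last matching HostCharset record wins. Everything here is illustrative: the sample file contents, the charset names, the table file names, and the function client_charset are hypothetical. The sketch also assumes one record per line, handles only single addresses and prefix ranges, and falls back to the BaseCharset name when no host record matches.

```python
import ipaddress

# Hypothetical charset.conf; charset and table file names are made up.
SAMPLE_CONF = """
BaseCharset   koi
RecodeTable   koi win koi-win.tab
RecodeTable   win koi win-koi.tab
HostCharset   192.168.1.0/24  win    # general rule for the subnet
HostCharset   192.168.1.7     koi    # exception: this later record wins
"""


def client_charset(conf_text, client_ip):
    """Return the charset name chosen for client_ip (last match wins)."""
    addr = ipaddress.ip_address(client_ip)
    base = matched = None
    for raw_line in conf_text.splitlines():
        # Drop comments, then split the record on whitespace.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "BaseCharset":
            base = fields[1]
        elif len(fields) >= 3 and fields[0] == "HostCharset":
            # A single address parses as a one-address network.
            if addr in ipaddress.ip_network(fields[1], strict=False):
                matched = fields[2]  # keep scanning; later matches override
    # Assumption: with no matching host record, the base charset applies.
    return matched or base
```

With this sample, the host 192.168.1.7 resolves to koi even though its subnet is assigned win, because the exception appears later in the file.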

Because this solution maps character sets by client IP address, it has some restrictions. You cannot use different encodings on the same host at the same time. It is also inconvenient when you boot your client hosts into multiple operating systems. Nevertheless, if these restrictions are acceptable and you do not need multibyte characters, it is a simple and effective solution.