FAQ

[PostgreSQL] invalid byte sequence for encoding "UTF8": 0xf1612220

AI Rumman
May 11, 2011 at 7:16 am
I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3 and
getting the following error:

pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE DATA
originaldata postgres
pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for
encoding "UTF8": 0xf1612220
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY wi_originaldata, line 3592

I took a dump from 8.2 server and then tried to restore at 8.3.

Both the client_encoding and server_encoding are UTF8 at both the servers.

Table "public.data"
Column | Type
Modifiers
---------------------+------------------------+------------------------------------------------------------------------
orgid | integer |
id | integer | not null default
nextval(('"data"'::text)::regclass)
datatypecode | character varying(15) |
batchname | character varying(60) |
filename | character varying(60) |
encoding | character varying(20) |
errormessage | character varying(255) |
originaldata_backup | bytea |
processeddata | bytea |
validatedflag | smallint |
processedflag | smallint |
createddate | date |
createdtime | time without time zone |
modifieddate | date |
modifiedtime | time without time zone |
processeddate | date |
processedtime | time without time zone |
deletedflag | smallint |
originaldata | text |
Indexes:
"data_pkey" PRIMARY KEY, btree (id)

Any help will be appreciable.
reply

Search Discussions

2 responses

  • Craig Ringer at May 12, 2011 at 2:52 am

    On 05/11/2011 03:16 PM, AI Rumman wrote:
    I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3
    and getting the following error:

    pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE
    DATA originaldata postgres
    pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence
    for encoding "UTF8": 0xf1612220
    HINT: This error can also happen if the byte sequence does not match
    the encoding expected by the server, which is controlled by
    "client_encoding".
    CONTEXT: COPY wi_originaldata, line 3592

    I took a dump from 8.2 server and then tried to restore at 8.3.

    Both the client_encoding and server_encoding are UTF8 at both the servers.
    Newer versions of Pg got better at caching bad unicode. While this helps
    prevent bad data getting into the database, it's a right pain if you're
    moving data over from an older version with less strict checks.

    I don't know of any way to relax the checks for the purpose of importing
    dumps. You'll need to fix your dump files before loading them (by
    finding the faulty text and fixing it) or fix it in the origin database
    before migrating the data. Neither approach is nice or easy, but nobody
    has yet stepped up to write a unicode verifier tool that checks old
    databases' text fields against stricter rules...

    --
    Craig Ringer
  • Cédric Villemain at May 12, 2011 at 8:33 am

    2011/5/12 Craig Ringer <craig@postnewspapers.com.au>:
    On 05/11/2011 03:16 PM, AI Rumman wrote:

    I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3
    and getting the following error:

    pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE
    DATA originaldata postgres
    pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
    for encoding "UTF8": 0xf1612220
    HINT:  This error can also happen if the byte sequence does not match
    the encoding expected by the server, which is controlled by
    "client_encoding".
    CONTEXT:  COPY wi_originaldata, line 3592

    I took a dump from 8.2 server and then tried to restore at 8.3.

    Both the client_encoding and server_encoding are UTF8 at both the servers.
    Newer versions of Pg got better at caching bad unicode. While this helps
    prevent bad data getting into the database, it's a right pain if you're
    moving data over from an older version with less strict checks.

    I don't know of any way to relax the checks for the purpose of importing
    dumps. You'll need to fix your dump files before loading them (by finding
    the faulty text and fixing it) or fix it in the origin database before
    migrating the data. Neither approach is nice or easy, but nobody has yet
    stepped up to write a unicode verifier tool that checks old databases' text
    fields against stricter rules...

    The 2 following articles have SQL functions and documentation you may
    find useful:

    http://tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_1.html
    http://tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_2.html

    --
    Craig Ringer

    --
    Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-general


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Related Discussions

Discussion Navigation
viewthread | post