FAQ
Hi all,

I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.

My first attempt was something like this:

select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.

Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.

This is a Unicode database, so, I also tried:

select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

but, again, this seems to return every row.

So, can someone offer me a clue here?

Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.

I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.

AdvThanksance,

-Mark

Search Discussions

  • Anonymous at Nov 9, 2010 at 1:03 pm
    Could you try using Oracle's csscan utility and pick a US7ASCII database as the target for the migration? This should identify any rows with data outside the normal US7ASCII character set...

    Bill

    From: oracle-l-bounce_at_freelists.org On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:56 AM
    To: oracle-l@freelists.org
    Subject: Determining which rows have characters outside of the standard ASCII (0-127)

    Hi all,

    I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.

    My first attempt was something like this:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

    But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.

    Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.

    This is a Unicode database, so, I also tried:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

    but, again, this seems to return every row.

    So, can someone offer me a clue here?

    Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.

    I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.

    AdvThanksance,

    -Mark
  • Bobak, Mark at Nov 9, 2010 at 1:43 pm
    Hi Bill,

    I tried your suggestion, but, it doesn't seem to work as expected:

    pqrac101:[pqprd1]:(/home/oracle):$csscan

    Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9 08:16:45 2010

    Copyright (c) 1982, 2005, Oracle. All rights reserved.

    Username: adds_at_prd1

    Password:

    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters and Data Mining options

    (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

    Current database character set is WE8ISO8859P1.

    Enter new database character set name: > US7ASCII

    Enter array fetch buffer size: 1024000 >

    Enter number of scan processes to utilize(1..32): 1 >

    Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

    Enter column name to scan: >

    Enumerating tables to scan...

    table(s) contain no character type columns

    Scanner terminated successfully.

    Not sure what "contain no character type columns" actually means.....

    -Mark

    From: Johnson, William L (TEIS)
    Sent: Tuesday, November 09, 2010 8:04 AM
    To: Bobak, Mark; oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Could you try using Oracle's csscan utility and pick a US7ASCII database as the target for the migration? This should identify any rows with data outside the normal US7ASCII character set...

    Bill

    From: oracle-l-bounce_at_freelists.org On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:56 AM
    To: oracle-l@freelists.org
    Subject: Determining which rows have characters outside of the standard ASCII (0-127)

    Hi all,

    I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.

    My first attempt was something like this:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

    But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.

    Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.

    This is a Unicode database, so, I also tried:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

    but, again, this seems to return every row.

    So, can someone offer me a clue here?

    Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.

    I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.

    AdvThanksance,

    -Mark
  • Guillermo Alan Bort at Nov 9, 2010 at 2:00 pm
    can you describe the table authors for us?
    Alan.-
    On Tue, Nov 9, 2010 at 10:43 AM, Bobak, Mark wrote:

    Hi Bill,



    I tried your suggestion, but, it doesn�t seem to work as expected:



    pqrac101:[pqprd1]:(/home/oracle):$csscan





    Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9
    08:16:45 2010



    Copyright (c) 1982, 2005, Oracle. All rights reserved.





    Username: adds_at_prd1



    Password:



    Connected to:

    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit
    Production

    With the Partitioning, Real Application Clusters and Data Mining options



    (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4



    Current database character set is WE8ISO8859P1.



    Enter new database character set name: > US7ASCII



    Enter array fetch buffer size: 1024000 >



    Enter number of scan processes to utilize(1..32): 1 >



    Enter column name to scan: > DOCUMENTS.DOC_AUTHORS



    Enter column name to scan: >



    Enumerating tables to scan...



    table(s) contain no character type columns



    Scanner terminated successfully.



    Not sure what �contain no character type columns� actually means�..



    -Mark





    *From:* Johnson, William L (TEIS)
    *Sent:* Tuesday, November 09, 2010 8:04 AM
    *To:* Bobak, Mark; oracle-l@freelists.org
    *Subject:* RE: Determining which rows have characters outside of the
    standard ASCII (0-127)



    Could you try using Oracle�s csscan utility and pick a US7ASCII database as
    the target for the migration? This should identify any rows with data
    outside the normal US7ASCII character set�



    Bill


    ------------------------------

    *From:* oracle-l-bounce_at_freelists.org [mailto:
    oracle-l-bounce@freelists.org] *On Behalf Of *Bobak, Mark
    *Sent:* Tuesday, November 09, 2010 7:56 AM
    *To:* oracle-l@freelists.org
    *Subject:* Determining which rows have characters outside of the standard
    ASCII (0-127)


    Hi all,



    I�m trying to (efficiently) determine which rows have column values with
    characters outside of the range of 0-127.



    My first attempt was something like this:



    select doc_id,doc_authors from documents where
    regexp_instr(doc_authors,�[0x80-0xFF]�) > 0;



    But, that seems to select every row in the documents table, not just the
    ones containing characters with values in the range 128-255.



    Looking at one of the rows returned from the query above, with
    dump(doc_authors) confirms that rows being returned don�t have characters
    in the range 128-255.



    This is a Unicode database, so, I also tried:



    select doc_id,doc_authors from documents where
    regexp_instr(doc_authors,�[0c0080-0cFFFF]�) > 0;



    but, again, this seems to return every row.



    So, can someone offer me a clue here?



    Honestly, this is the first time I�ve tried using any of Oracle�s REGEXP
    functions.



    I�m sure I�m just doing something stupid, but I don�t have a clue what it
    is, and the examples I�ve run across in the manuals and on the web, don�t
    have anything similar to what I�m trying to do.





    AdvThanksance,



    -Mark



    --
    http://www.freelists.org/webpage/oracle-l
  • Guillermo Alan Bort at Nov 9, 2010 at 2:00 pm
    that was the table documents, sorry.
    Alan.-

    On Tue, Nov 9, 2010 at 11:00 AM, Guillermo Alan Bort
    wrote:
    can you describe the table authors for us?
    Alan.-


    On Tue, Nov 9, 2010 at 10:43 AM, Bobak, Mark wrote:

    Hi Bill,



    I tried your suggestion, but, it doesn�t seem to work as expected:



    pqrac101:[pqprd1]:(/home/oracle):$csscan





    Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9
    08:16:45 2010



    Copyright (c) 1982, 2005, Oracle. All rights reserved.





    Username: adds_at_prd1



    Password:



    Connected to:

    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit
    Production

    With the Partitioning, Real Application Clusters and Data Mining options



    (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4



    Current database character set is WE8ISO8859P1.



    Enter new database character set name: > US7ASCII



    Enter array fetch buffer size: 1024000 >



    Enter number of scan processes to utilize(1..32): 1 >



    Enter column name to scan: > DOCUMENTS.DOC_AUTHORS



    Enter column name to scan: >



    Enumerating tables to scan...



    table(s) contain no character type columns



    Scanner terminated successfully.



    Not sure what �contain no character type columns� actually means�..



    -Mark





    *From:* Johnson, William L (TEIS)
    *Sent:* Tuesday, November 09, 2010 8:04 AM
    *To:* Bobak, Mark; oracle-l@freelists.org
    *Subject:* RE: Determining which rows have characters outside of the
    standard ASCII (0-127)



    Could you try using Oracle�s csscan utility and pick a US7ASCII database
    as the target for the migration? This should identify any rows with data
    outside the normal US7ASCII character set�



    Bill


    ------------------------------

    *From:* oracle-l-bounce_at_freelists.org [mailto:
    oracle-l-bounce@freelists.org] *On Behalf Of *Bobak, Mark
    *Sent:* Tuesday, November 09, 2010 7:56 AM
    *To:* oracle-l@freelists.org
    *Subject:* Determining which rows have characters outside of the standard
    ASCII (0-127)


    Hi all,



    I�m trying to (efficiently) determine which rows have column values with
    characters outside of the range of 0-127.



    My first attempt was something like this:



    select doc_id,doc_authors from documents where
    regexp_instr(doc_authors,�[0x80-0xFF]�) > 0;



    But, that seems to select every row in the documents table, not just the
    ones containing characters with values in the range 128-255.



    Looking at one of the rows returned from the query above, with
    dump(doc_authors) confirms that rows being returned don�t have characters
    in the range 128-255.



    This is a Unicode database, so, I also tried:



    select doc_id,doc_authors from documents where
    regexp_instr(doc_authors,�[0c0080-0cFFFF]�) > 0;



    but, again, this seems to return every row.



    So, can someone offer me a clue here?



    Honestly, this is the first time I�ve tried using any of Oracle�s REGEXP
    functions.



    I�m sure I�m just doing something stupid, but I don�t have a clue what it
    is, and the examples I�ve run across in the manuals and on the web, don�t
    have anything similar to what I�m trying to do.





    AdvThanksance,



    -Mark



    --
    http://www.freelists.org/webpage/oracle-l
  • Herring Dave - dherri at Nov 9, 2010 at 2:09 pm
    Mark,

    Could something like the following work for you?

    select doc_id, doc_authors
    from documents
    where doc_authors != convert(doc_authors, 'us7ascii');

    Dave Herring  | DBA
    Acxiom Global Technology Solutions

    630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
    1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
    Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com

    From: oracle-l-bounce_at_freelists.org On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:44 AM
    To: Johnson, William L (TEIS); oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Hi Bill,

    I tried your suggestion, but, it doesn't seem to work as expected:

    pqrac101:[pqprd1]:(/home/oracle):$csscan

    Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9 08:16:45 2010

    Copyright (c) 1982, 2005, Oracle.  All rights reserved.

    Username: adds_at_prd1

    Password:

    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters and Data Mining options

    (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

    Current database character set is WE8ISO8859P1.

    Enter new database character set name: > US7ASCII

    Enter array fetch buffer size: 1024000 >

    Enter number of scan processes to utilize(1..32): 1 >

    Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

    Enter column name to scan: >

    Enumerating tables to scan...

    table(s) contain no character type columns

    Scanner terminated successfully.

    Not sure what "contain no character type columns" actually means.....

    -Mark

    From: Johnson, William L (TEIS)
    Sent: Tuesday, November 09, 2010 8:04 AM
    To: Bobak, Mark; oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Could you try using Oracle's csscan utility and pick a US7ASCII database as the target for the migration?  This should identify any rows with data outside the normal US7ASCII character set...

    Bill

    From: oracle-l-bounce_at_freelists.org On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:56 AM
    To: oracle-l@freelists.org
    Subject: Determining which rows have characters outside of the standard ASCII (0-127)

    Hi all,

    I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.

    My first attempt was something like this:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

    But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.

    Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.

    This is a Unicode database, so, I also tried:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

    but, again, this seems to return every row.

    So, can someone offer me a clue here?

    Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.

    I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.

    AdvThanksance,

    -Mark

    The information contained in this communication is confidential, is
    intended only for the use of the recipient named above, and may be legally
    privileged.

    If the reader of this message is not the intended recipient, you are
    hereby notified that any dissemination, distribution or copying of this
    communication is strictly prohibited.

    If you have received this communication in error, please resend this
    communication to the sender and delete the original message or any copy
    of it from your computer system.

    Thank You.
  • Anonymous at Nov 9, 2010 at 2:11 pm
    Is this a partitioned table?
    Bug 9488049: CSSCAN DOES NOT SCAN PARTITIONED IOT TABLE.

    -----Original Message-----
    From: Herring Dave - dherri
    Sent: Tuesday, November 09, 2010 9:10 AM
    To: Mark.Bobak_at_proquest.com; Johnson, William L (TEIS); oracle-l_at_freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Mark,

    Could something like the following work for you?

    select doc_id, doc_authors
    from documents
    where doc_authors != convert(doc_authors, 'us7ascii');

    Dave Herring  | DBA
    Acxiom Global Technology Solutions

    630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
    1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
    Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com

    From: oracle-l-bounce_at_freelists.org On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:44 AM
    To: Johnson, William L (TEIS); oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Hi Bill,

    I tried your suggestion, but, it doesn't seem to work as expected:

    pqrac101:[pqprd1]:(/home/oracle):$csscan

    Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9 08:16:45 2010

    Copyright (c) 1982, 2005, Oracle.  All rights reserved.

    Username: adds_at_prd1

    Password:

    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters and Data Mining options

    (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

    Current database character set is WE8ISO8859P1.

    Enter new database character set name: > US7ASCII

    Enter array fetch buffer size: 1024000 >

    Enter number of scan processes to utilize(1..32): 1 >

    Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

    Enter column name to scan: >

    Enumerating tables to scan...

    table(s) contain no character type columns

    Scanner terminated successfully.

    Not sure what "contain no character type columns" actually means.....

    -Mark

    From: Johnson, William L (TEIS)
    Sent: Tuesday, November 09, 2010 8:04 AM
    To: Bobak, Mark; oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Could you try using Oracle's csscan utility and pick a US7ASCII database as the target for the migration?  This should identify any rows with data outside the normal US7ASCII character set...

    Bill

    From: oracle-l-bounce_at_freelists.org On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:56 AM
    To: oracle-l@freelists.org
    Subject: Determining which rows have characters outside of the standard ASCII (0-127)

    Hi all,

    I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.

    My first attempt was something like this:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

    But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.

    Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.

    This is a Unicode database, so, I also tried:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

    but, again, this seems to return every row.

    So, can someone offer me a clue here?

    Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.

    I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.

    AdvThanksance,

    -Mark

    The information contained in this communication is confidential, is
    intended only for the use of the recipient named above, and may be legally
    privileged.

    If the reader of this message is not the intended recipient, you are
    hereby notified that any dissemination, distribution or copying of this
    communication is strictly prohibited.

    If you have received this communication in error, please resend this
    communication to the sender and delete the original message or any copy
    of it from your computer system.

    Thank You.

    --
    http://www.freelists.org/webpage/oracle-l
  • Bobak, Mark at Nov 9, 2010 at 2:12 pm
    Nope.

    -----Original Message-----
    From: Johnson, William L (TEIS)
    Sent: Tuesday, November 09, 2010 9:12 AM
    To: Herring Dave - dherri; Bobak, Mark; oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Is this a partitioned table?
    Bug 9488049: CSSCAN DOES NOT SCAN PARTITIONED IOT TABLE.

    -----Original Message-----
    From: Herring Dave - dherri
    Sent: Tuesday, November 09, 2010 9:10 AM
    To: Mark.Bobak_at_proquest.com; Johnson, William L (TEIS); oracle-l_at_freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Mark,

    Could something like the following work for you?

    select doc_id, doc_authors
    from documents
    where doc_authors != convert(doc_authors, 'us7ascii');

    Dave Herring  | DBA
    Acxiom Global Technology Solutions

    630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
    1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
    Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com

    From: oracle-l-bounce_at_freelists.org On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:44 AM
    To: Johnson, William L (TEIS); oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Hi Bill,

    I tried your suggestion, but, it doesn't seem to work as expected:

    pqrac101:[pqprd1]:(/home/oracle):$csscan

    Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9 08:16:45 2010

    Copyright (c) 1982, 2005, Oracle.  All rights reserved.

    Username: adds_at_prd1

    Password:

    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
    With the Partitioning, Real Application Clusters and Data Mining options

    (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

    Current database character set is WE8ISO8859P1.

    Enter new database character set name: > US7ASCII

    Enter array fetch buffer size: 1024000 >

    Enter number of scan processes to utilize(1..32): 1 >

    Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

    Enter column name to scan: >

    Enumerating tables to scan...

    table(s) contain no character type columns

    Scanner terminated successfully.

    Not sure what "contain no character type columns" actually means.....

    -Mark

    From: Johnson, William L (TEIS)
    Sent: Tuesday, November 09, 2010 8:04 AM
    To: Bobak, Mark; oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127)

    Could you try using Oracle's csscan utility and pick a US7ASCII database as the target for the migration?  This should identify any rows with data outside the normal US7ASCII character set...

    Bill

    From: oracle-l-bounce_at_freelists.org On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:56 AM
    To: oracle-l@freelists.org
    Subject: Determining which rows have characters outside of the standard ASCII (0-127)

    Hi all,

    I'm trying to (efficiently) determine which rows have column values with characters outside of the range of 0-127.

    My first attempt was something like this:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

    But, that seems to select every row in the documents table, not just the ones containing characters with values in the range 128-255.

    Looking at one of the rows returned from the query above, with dump(doc_authors) confirms that rows being returned don't have characters in the range 128-255.

    This is a Unicode database, so, I also tried:

    select doc_id,doc_authors from documents where regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

    but, again, this seems to return every row.

    So, can someone offer me a clue here?

    Honestly, this is the first time I've tried using any of Oracle's REGEXP functions.

    I'm sure I'm just doing something stupid, but I don't have a clue what it is, and the examples I've run across in the manuals and on the web, don't have anything similar to what I'm trying to do.

    AdvThanksance,

    -Mark

    The information contained in this communication is confidential, is
    intended only for the use of the recipient named above, and may be legally
    privileged.

    If the reader of this message is not the intended recipient, you are
    hereby notified that any dissemination, distribution or copying of this
    communication is strictly prohibited.

    If you have received this communication in error, please resend this
    communication to the sender and delete the original message or any copy
    of it from your computer system.

    Thank You.

    --
    http://www.freelists.org/webpage/oracle-l
  • Ian Cary at Nov 9, 2010 at 2:39 pm
    Hi Mark,

    Would regexp_instr(doc_authors,'[[:alnum:][:blank:][:cntrl:][:punct:]]')
    help as I believe this covers ascii(0-127)

    Cheers,

    Ian
    ---------+----------------------------->
    Mark.Bobak_at_proques|
    t.com |
    Sent by: |
    oracle-l-bounce_at_fr|
    eelists.org |
    09/11/2010 14:12 |
    Please respond to |
    Mark.Bobak |
    ---------+----------------------------->
    --------------------------------------------------------------------------------------------------------------------------------------------------|
    To: WLJohnson_at_tycoelectronics.com, Dave.Herring_at_acxiom.com, oracle-l_at_freelists.org |
    cc: |
    Subject: RE: Determining which rows have characters outside of the standard ASCII (0-127) |
    --------------------------------------------------------------------------------------------------------------------------------------------------|
    Nope.

    -----Original Message-----
    From: Johnson, William L (TEIS)
    Sent: Tuesday, November 09, 2010 9:12 AM
    To: Herring Dave - dherri; Bobak, Mark; oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard
    ASCII (0-127)

    Is this a partitioned table?
    Bug 9488049: CSSCAN DOES NOT SCAN PARTITIONED IOT TABLE.

    -----Original Message-----
    From: Herring Dave - dherri
    Sent: Tuesday, November 09, 2010 9:10 AM
    To: Mark.Bobak_at_proquest.com; Johnson, William L (TEIS);
    oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard
    ASCII (0-127)

    Mark,

    Could something like the following work for you?

    select doc_id, doc_authors
    from documents
    where doc_authors != convert(doc_authors, 'us7ascii');

    Dave Herring  | DBA
    Acxiom Global Technology Solutions

    630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax
    1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com
    Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com

    From: oracle-l-bounce_at_freelists.org
    On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:44 AM
    To: Johnson, William L (TEIS); oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard
    ASCII (0-127)

    Hi Bill,

    I tried your suggestion, but, it doesn't seem to work as expected:

    pqrac101:[pqprd1]:(/home/oracle):$csscan

    Character Set Scanner v2.1 : Release 10.2.0.3.0 - Production on Tue Nov 9
    08:16:45 2010

    Copyright (c) 1982, 2005, Oracle.  All rights reserved.

    Username: adds_at_prd1

    Password:

    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit
    Production
    With the Partitioning, Real Application Clusters and Data Mining options

    (1)Full database, (2)User, (3)Table, (4)Column: 1 > 4

    Current database character set is WE8ISO8859P1.

    Enter new database character set name: > US7ASCII

    Enter array fetch buffer size: 1024000 >

    Enter number of scan processes to utilize(1..32): 1 >

    Enter column name to scan: > DOCUMENTS.DOC_AUTHORS

    Enter column name to scan: >

    Enumerating tables to scan...

    table(s) contain no character type columns

    Scanner terminated successfully.

    Not sure what "contain no character type columns" actually means.....

    -Mark

    From: Johnson, William L (TEIS)
    Sent: Tuesday, November 09, 2010 8:04 AM
    To: Bobak, Mark; oracle-l@freelists.org
    Subject: RE: Determining which rows have characters outside of the standard
    ASCII (0-127)

    Could you try using Oracle's csscan utility and pick a US7ASCII database as
    the target for the migration?  This should identify any rows with data
    outside the normal US7ASCII character set...

    Bill

    From: oracle-l-bounce_at_freelists.org
    On Behalf Of Bobak, Mark
    Sent: Tuesday, November 09, 2010 7:56 AM
    To: oracle-l@freelists.org
    Subject: Determining which rows have characters outside of the standard
    ASCII (0-127)

    Hi all,

    I'm trying to (efficiently) determine which rows have column values with
    characters outside of the range of 0-127.

    My first attempt was something like this:

    select doc_id,doc_authors from documents where
    regexp_instr(doc_authors,'[0x80-0xFF]') > 0;

    But, that seems to select every row in the documents table, not just the
    ones containing characters with values in the range 128-255.

    Looking at one of the rows returned from the query above, with
    dump(doc_authors) confirms that rows being returned don't have characters
    in the range 128-255.

    This is a Unicode database, so, I also tried:

    select doc_id,doc_authors from documents where
    regexp_instr(doc_authors,'[0c0080-0cFFFF]') > 0;

    but, again, this seems to return every row.

    So, can someone offer me a clue here?

    Honestly, this is the first time I've tried using any of Oracle's REGEXP
    functions.

    I'm sure I'm just doing something stupid, but I don't have a clue what it
    is, and the examples I've run across in the manuals and on the web, don't
    have anything similar to what I'm trying to do.

    AdvThanksance,

    -Mark

    The information contained in this communication is confidential, is
    intended only for the use of the recipient named above, and may be legally
    privileged.

    If the reader of this message is not the intended recipient, you are
    hereby notified that any dissemination, distribution or copying of this
    communication is strictly prohibited.

    If you have received this communication in error, please resend this
    communication to the sender and delete the original message or any copy
    of it from your computer system.

    Thank You.

    --
    http://www.freelists.org/webpage/oracle-l

    For the latest data on the economy and society consult National Statistics at http://www.ons.gov.uk

    *********************************************************************************

    Please Note: Incoming and outgoing email messages are routinely monitored for compliance with our policy on the use of electronic communications
    *********************************************************************************

    Legal Disclaimer : Any views expressed by the sender of this message are not necessarily those of the Office for National Statistics
    *********************************************************************************

    The original of this email was scanned for viruses by the Government Secure Intranet virus scanning service supplied by Cable&Wireless Worldwide in partnership with MessageLabs. (CCTM Certificate Number 2009/09/0052.) On leaving the GSi this email was certified virus free.
    Communications via the GSi may be automatically logged, monitored and/or recorded for legal purposes.
    --
    http://www.freelists.org/webpage/oracle-l

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouporacle-l @
categoriesoracle
postedNov 9, '10 at 12:55p
activeNov 9, '10 at 2:39p
posts9
users5
websiteoracle.com

People

Translate

site design / logo © 2022 Grokbase