I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

As I try to pull in the line and process it, python changes the "\10"
to a "\x08". This is before I can do anything with it. Is there a way
to specify that incoming lines (say, when using .readlines() ) should
be treated as raw strings?

TIA


  • MRAB at Nov 24, 2009 at 8:27 pm

    utabintarbo wrote:
    I have a log file with full Windows paths on a line. eg:
    K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

    As I try to pull in the line and process it, python changes the "\10"
    to a "\x08". This is before I can do anything with it. Is there a way
    to specify that incoming lines (say, when using .readlines() ) should
    be treated as raw strings?

    .readlines() doesn't change the "\10" in a file to "\x08" in the string
    it returns.

    Could you provide some code which shows your problem?
  • Carsten Haese at Nov 24, 2009 at 8:28 pm

    utabintarbo wrote:
    I have a log file with full Windows paths on a line. eg:
    K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

    As I try to pull in the line and process it, python changes the "\10"
    to a "\x08".

    Python does no such thing. When Python reads bytes from a file, it
    doesn't interpret or change those bytes in any way. Either there is
    something else going on here that you're not telling us, or the file
    doesn't contain what you think it contains. Please show us the exact
    code you're using to process this file, and show us the exact contents
    of the file you're processing.
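A minimal sketch of the point above, assuming a made-up sample line modelled on the log format in the question (the `roundtrip` helper and the temporary file are illustrative, not part of the original code). It writes a line containing `\10` to a file and reads it back with `.readlines()`; the backslash and the digits come back unchanged.

```python
import os
import tempfile

# Hypothetical sample line, modelled on the log format quoted in this thread.
SAMPLE = r"K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx/somerandomfilename.ext" + "\n"


def roundtrip(text):
    """Write text to a temporary file and return what readlines() gives back."""
    fd, path = tempfile.mkstemp(suffix=".log")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            handle.write(text)
        with open(path, "r", newline="") as handle:
            return handle.readlines()
    finally:
        os.remove(path)


lines = roundtrip(SAMPLE)
print(repr(lines[0]))  # the repr shows doubled backslashes, i.e. real backslash characters
```

The doubled backslashes in the printed repr are only how Python displays a single backslash; no `\x08` appears in the data.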
  • Utabintarbo at Nov 24, 2009 at 9:20 pm

    On Nov 24, 3:27 pm, MRAB wrote:
    .readlines() doesn't change the "\10" in a file to "\x08" in the string
    it returns.

    Could you provide some code which shows your problem?

    Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
            if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
                shutil.rmtree(os.path.join(DIR2,f))

    I am trying to find dirs with the basename of the initial path less
    the extension in both DIR1 and DIR2

    A minimally obfuscated line from the log file:
    K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602

    What I get from the debugger/python shell:
    'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602'

    TIA
  • Jon Clements at Nov 24, 2009 at 9:50 pm

    On Nov 24, 9:20 pm, utabintarbo wrote:
    On Nov 24, 3:27 pm, MRAB wrote:


    .readlines() doesn't change the "\10" in a file to "\x08" in the string
    it returns.
    Could you provide some code which shows your problem?

    Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
            if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
                shutil.rmtree(os.path.join(DIR2,f))

    I am trying to find dirs with the basename of the initial path less
    the extension in both DIR1 and DIR2

    A minimally obfuscated line from the log file:
    K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602

    What I get from the debugger/python shell:
    'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602'

    TIA

    jon@jon-desktop:~/pytest$ cat log.txt
    K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602
    >>> log = open('/home/jon/pytest/log.txt', 'r').readlines()
    >>> log
    ['K:\\sm\\SMI\\des\\RS\\Pat\\10DJ\\121.D5-30\\1215B-B-D5-BSHOE-MM.smz->/arch_m1/\n',
    'smi/des/RS/Pat/10DJ/121.D5-30\\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;\n',
    '11/24/2009 08:16:42 ; 1259068602\n']

    See -- it's not doing anything :)

    Although, "Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" and "Pat
    \x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" seem to be fairly different -- are
    you sure you're posting the correct output!?

    Jon.
  • Jon Clements at Nov 24, 2009 at 9:54 pm

    On Nov 24, 9:50 pm, Jon Clements wrote:
    On Nov 24, 9:20 pm, utabintarbo wrote: [snip]
    Although, "Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" and "Pat
    \x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" seem to be fairly different -- are
    you sure you're posting the correct output!?

    Ugh... let's try that...

    Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz
    Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz

    Jon.
  • Rhodri James at Nov 25, 2009 at 1:11 am

    On Tue, 24 Nov 2009 21:20:25 -0000, utabintarbo wrote:
    On Nov 24, 3:27 pm, MRAB wrote:

    .readlines() doesn't change the "\10" in a file to "\x08" in the string
    it returns.

    Could you provide some code which shows your problem?
    Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
            if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
                shutil.rmtree(os.path.join(DIR2,f))

    Ahem. This doesn't run. os.path.split() returns a tuple, and calling
    os.path.splitext() doesn't work. Given that replacing the entire loop
    contents with "print l" readily disproves your assertion, I suggest you
    cut and paste actual code if you want an answer. Otherwise we're just
    going to keep saying "No, it doesn't", because no, it doesn't.

    A minimally obfuscated line from the log file:
    K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602

    What I get from the debugger/python shell:
    'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602'

    When you do what, exactly?

    --
    Rhodri James *-* Wildebeest Herder to the Masses
  • Rhodri James at Nov 25, 2009 at 1:16 am

    On Wed, 25 Nov 2009 01:11:29 -0000, Rhodri James wrote:
    On Tue, 24 Nov 2009 21:20:25 -0000, utabintarbo wrote:
    On Nov 24, 3:27 pm, MRAB wrote:

    .readlines() doesn't change the "\10" in a file to "\x08" in the string
    it returns.

    Could you provide some code which shows your problem?
    Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
            if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
                shutil.rmtree(os.path.join(DIR2,f))

    Ahem. This doesn't run. os.path.split() returns a tuple, and calling
    os.path.splitext() doesn't work.

    I meant, "doesn't work on a tuple". Sigh. It's been one of those days.

    --
    Rhodri James *-* Wildebeest Herder to the Masses
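A sketch of one way to get the directory name the original loop was after, given that `os.path.split()` returns a `(head, tail)` tuple and so cannot be passed straight to `os.path.splitext()`. The helper name `stem_from_log_line` is hypothetical, and `ntpath` is used here on the assumption that the log holds Windows-style paths that may be processed on a non-Windows machine.

```python
import ntpath  # Windows path rules, usable on any platform


def stem_from_log_line(line):
    """Return the source file's basename without its extension.

    stem_from_log_line is a hypothetical helper name, not from the thread.
    """
    source_path = line.split("->")[0]
    filename = ntpath.basename(source_path)  # a string, not a (head, tail) tuple
    stem, _extension = ntpath.splitext(filename)
    return stem


# The log line from the thread, joined onto one logical line.
LINE = (r"K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz"
        "->/arch_m1/smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz"
        " ; t9480rc ; 11/24/2009 08:16:42 ; 1259068602")

print(stem_from_log_line(LINE))  # 1215B-B-D5-BSHOE-MM
```

Note the `r` prefix on the first literal: without it, typing this path into source code would trigger exactly the `\10` to `\x08` conversion the thread is about.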
  • Grant Edwards at Nov 25, 2009 at 3:31 am

    On 2009-11-25, Rhodri James wrote:
    On Tue, 24 Nov 2009 21:20:25 -0000, utabintarbo wrote:
    On Nov 24, 3:27 pm, MRAB wrote:

    .readlines() doesn't change the "\10" in a file to "\x08" in the string
    it returns.

    Could you provide some code which shows your problem?
    Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
            if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
                shutil.rmtree(os.path.join(DIR2,f))

    Ahem. This doesn't run. os.path.split() returns a tuple, and calling
    os.path.splitext() doesn't work. Given that replacing the entire loop
    contents with "print l" readily disproves your assertion, I suggest you
    cut and paste actual code if you want an answer. Otherwise we're just
    going to keep saying "No, it doesn't", because no, it doesn't.

    It's, um, rewarding to see my recent set of instructions being
    followed.

    A minimally obfuscated line from the log file:
    K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602

    What I get from the debugger/python shell:
    'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602'

    When you do what, exactly?

    ;)

    --
    Grant
  • Jon Clements at Nov 25, 2009 at 12:58 pm

    On Nov 25, 3:31 am, Grant Edwards wrote:
    On 2009-11-25, Rhodri James wrote:


    On Tue, 24 Nov 2009 21:20:25 -0000, utabintarbo <utabinta... at gmail.com> wrote:
    On Nov 24, 3:27 pm, MRAB wrote:

    .readlines() doesn't change the "\10" in a file to "\x08" in the string
    it returns.
    Could you provide some code which shows your problem?
    Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
            if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
                shutil.rmtree(os.path.join(DIR2,f))

    Ahem. This doesn't run. os.path.split() returns a tuple, and calling
    os.path.splitext() doesn't work. Given that replacing the entire loop
    contents with "print l" readily disproves your assertion, I suggest you
    cut and paste actual code if you want an answer. Otherwise we're just
    going to keep saying "No, it doesn't", because no, it doesn't.

    It's, um, rewarding to see my recent set of instructions being
    followed.

    A minimally obfuscated line from the log file:
    K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602
    What I get from the debugger/python shell:
    'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602'
    When you do what, exactly?
    ;)

    --
    Grant

    Can't remember if this thread counts as "Edwards' Law 5[b|c]" :)

    I'm sure I pinned it up on my wall somewhere, right next to
    http://imgs.xkcd.com/comics/tech_support_cheat_sheet.png

    Jon.
  • Terry Reedy at Nov 24, 2009 at 11:06 pm

    utabintarbo wrote:
    I have a log file with full Windows paths on a line. eg:
    K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

    As I try to pull in the line and process it, python changes the "\10"
    to a "\x08".

    This should only happen if you paste the text into your .py file as a
    string literal.

    This is before I can do anything with it. Is there a way
    to specify that incoming lines (say, when using .readlines() ) should
    be treated as raw strings?

    Or if you use execfile or compile and ask Python to interpret the input
    as code.

    There are no raw strings, only raw string code literals marked with an
    'r' prefix for raw processing of the quoted text.
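To illustrate the distinction, here is a small sketch using a fragment of the path from the log line quoted earlier in the thread (the variable names are illustrative). Typed as an ordinary string literal, the fragment has its `\10` and `\121` sequences processed as octal escapes; with the `r` prefix, the backslashes stay as typed.

```python
# The same characters typed two ways in source code.
normal = 'Pat\10DJ\121.D5-30\1215B'   # octal escapes are processed by the compiler
raw = r'Pat\10DJ\121.D5-30\1215B'     # raw literal: backslashes are kept as typed

print(repr(normal))  # 'Pat\x08DJQ.D5-30Q5B'
print(repr(raw))     # 'Pat\\10DJ\\121.D5-30\\1215B'
```

The first result has the same shape as the mangled string reported from the debugger, which suggests the path reached Python as a literal in code rather than through `.readlines()`.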
  • Rzed at Dec 2, 2009 at 1:22 am
    utabintarbo <utabintarbo at gmail.com> wrote in
    news:adc6c455-5616-471a-8b39-d7fdad2179e4 at m33g2000vbi.googlegroups.com:
    I have a log file with full Windows paths on a line. eg:
    K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ;
    1259006416

    As I try to pull in the line and process it, python changes the
    "\10" to a "\x08". This is before I can do anything with it. Is
    there a way to specify that incoming lines (say, when using
    .readlines() ) should be treated as raw strings?

    TIA

    Despite all the ragging you're getting, it is a pretty flakey thing
    that Python does in this context:
    (from a python shell)
    >>> x = '\1'
    >>> x
    '\x01'
    >>> x = '\10'
    >>> x
    '\x08'

    If you are pasting your string as a literal, then maybe it does the
    same. It still seems weird to me. I can accept that '\1' means x01,
    but \10 seems to be expanded to \010 and then translated from octal
    to get to x08. That's just strange. I'm sure it's documented
    somewhere, but it's not easy to search for.

    Oh, and this:
    >>> '\7'
    '\x07'
    >>> '\70'
    '8'
    ... is really odd.

    --
    rzed
  • Dave Angel at Dec 2, 2009 at 5:39 am

    rzed wrote:
    utabintarbo <utabintarbo at gmail.com> wrote in
    news:adc6c455-5616-471a-8b39-d7fdad2179e4 at m33g2000vbi.googlegroups.com:

    I have a log file with full Windows paths on a line. eg:
    K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ;
    1259006416

    As I try to pull in the line and process it, python changes the
    "\10" to a "\x08". This is before I can do anything with it. Is
    there a way to specify that incoming lines (say, when using
    .readlines() ) should be treated as raw strings?

    TIA
    Despite all the ragging you're getting, it is a pretty flakey thing

    When the OP specified readlines(), which does *not* behave this way, he
    probably deserved what you call "ragging." The backslash escaping is
    for string literals, which are in code, not in data files.

    In any case, there's a big difference between surprising (to you), and
    flakey.

    that Python does in this context:
    (from a python shell)
    >>> x = '\1'
    >>> x
    '\x01'
    >>> x = '\10'
    >>> x
    '\x08'

    If you are pasting your string as a literal, then maybe it does the
    same. It still seems weird to me. I can accept that '\1' means x01,
    but \10 seems to be expanded to \010 and then translated from octal
    to get to x08. That's just strange. I'm sure it's documented
    somewhere, but it's not easy to search for.

    Check in the help for "escape Strings". It's documented (in vers. 2.6,
    anyway) in a nice chart that a backslash followed by up to three octal
    digits is interpreted as an octal character code. I don't like it much
    either, but it's inherited from C, which has worked that way for 30+ years.

    Online, see
    http://www.python.org/doc/2.6.4/reference/lexical_analysis.html, and
    look in section 2.4.1 for the chart.

    Oh, and this:
    >>> '\7'
    '\x07'
    >>> '\70'
    '8'
    ... is really odd.

    Octal 70 is hex 38 (or decimal 56), which is the character '8'.

    DaveA
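The arithmetic above can be checked with a short sketch. `octal_escape` is a hypothetical helper written for illustration; it mirrors the rule that a backslash followed by one to three octal digits names the character with that octal code.

```python
def octal_escape(digits):
    """Return the character a '\\ooo' escape with these octal digits denotes.

    octal_escape is a hypothetical helper, used only for illustration.
    """
    if not 1 <= len(digits) <= 3 or any(d not in "01234567" for d in digits):
        raise ValueError("expected one to three octal digits")
    return chr(int(digits, 8))


for digits in ("1", "7", "10", "70", "121"):
    print(digits, "->", repr(octal_escape(digits)))
# 1 -> '\x01'
# 7 -> '\x07'
# 10 -> '\x08'
# 70 -> '8'
# 121 -> 'Q'
```

So `'\70'` is not a special case: octal 70 is decimal 56, which happens to be the printable character `'8'`, while octal 10 is decimal 8, the non-printable backspace shown as `'\x08'`.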

Discussion Overview
group: python-list
category: python
posted: Nov 24, '09 at 7:52 pm
active: Dec 2, '09 at 5:39 am
posts: 13
users: 9
website: python.org
