Hi,

I have a script that reads and writes Linux paths in a file. I save the
path (as unicode) with 2 other variables. I save them separated by "," and
the "packets" by newlines. So my file looks like this:
path1, var1A, var1B
path2, var2A, var2B
path3, var3A, var3B
....

This works for "normal" paths, but as soon as I have a path that does
include a "," it breaks. The problem now is that (AFAIK) Linux allows
every char (aside from "/" and null) to be used in filenames. The only
solution I can think of is using null as a separator, but there has to be a
cleaner way?

Thanks for any help

Biene_Maja


  • Grant Edwards at Aug 31, 2010 at 3:22 pm

    On 2010-08-31, AmFreak at web.de wrote:
    Hi,

    I have a script that reads and writes Linux paths in a file. I save the
    path (as unicode) with 2 other variables. I save them separated by "," and
    the "packets" by newlines. So my file looks like this:
    path1, var1A, var1B
    path2, var2A, var2B
    path3, var3A, var3B
    ....

    This works for "normal" paths, but as soon as I have a path that does
    include a "," it breaks. The problem now is that (AFAIK) Linux allows
    every char (aside from "/" and null) to be used in filenames. The only
    solution I can think of is using null as a separator, but there has to be a
    cleaner way?
    The normal thing to do is to escape the delimiter when it appears in
    data. There are plenty of escaping standards to choose from,
    and some of them (e.g. the one used for URLs) are already present
    in various bits of Python's standard library.
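A minimal sketch of the URL-style escaping mentioned above, assuming Python 3, where the helpers live in `urllib.parse` (Python 2 had them as `urllib.quote` and `urllib.unquote`). `encode_record` and `decode_record` are hypothetical names; every field comes back as a string.

```python
from urllib.parse import quote, unquote


def encode_record(path, var_a, var_b):
    """Join three fields with "," after percent-encoding each one.

    quote() escapes ",", "%" and "\\n" (among others), so none of them can
    appear raw in the output; safe="/" keeps the slashes readable.
    """
    return ",".join(quote(str(field), safe="/") for field in (path, var_a, var_b))


def decode_record(line):
    """Split on "," and undo the percent-encoding of each field."""
    return [unquote(field) for field in line.rstrip("\n").split(",")]


line = encode_record("/tmp/a,b\nc", 12, "abc")
print(line)                 # /tmp/a%2Cb%0Ac,12,abc
print(decode_record(line))  # ['/tmp/a,b\nc', '12', 'abc']
```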

    --
    Grant Edwards   grant.b.edwards at gmail.com
    Yow! ... the HIGHWAY is made out of LIME JELLO and my HONDA is a
    barbequeued OYSTER! Yum!
  • AmFreak at Aug 31, 2010 at 9:45 pm
    Thanks for all the nice answers!
    The normal thing to do is to escape the delimiter when it appears in
    data. There are plenty of escaping standards to choose from,
    and some of them (e.g. the one used for URLs) are already present
    in various bits of Python's standard library.
    The CSV module has something like that, but I'm using unicode and it
    doesn't work with that.


    Why is it your impression that the null character is "dirty"?
    E.g. that's how find|xargs etc. usually work.
    Another alternative would be, if you guarantee that your var values don't
    contain commas, to put the path last. But that doesn't account for
    filenames containing newlines.
    Another alternative would be to wrap with some kind of serialization
    library. But really, what's so dirty about null?
    I think I just prefer a little formatted file instead of one loooong row :)



    A simple solution would be to save each line of data using JSON with the
    json module:
    >>> import json
    >>> path = "x,y,z"
    >>> varA = 12
    >>> varB = "abc"
    >>> line = json.dumps([path, varA, varB])
    >>> print(line)
    ["x,y,z", 12, "abc"]
    >>> loadpathA, loadvarA, loadvarB = json.loads(line)
    >>> print(loadpathA)
    x,y,z
    >>> print(loadvarA)
    12
    >>> print(loadvarB)
    abc

    Thanks, just tried it - so simple, but it seems to work like a charm. Really
    appreciated :D.
  • MRAB at Aug 31, 2010 at 4:19 pm

    On 31/08/2010 15:49, AmFreak at web.de wrote:
    Hi,

    I have a script that reads and writes Linux paths in a file. I save the
    path (as unicode) with 2 other variables. I save them separated by ","
    and the "packets" by newlines. So my file looks like this:
    path1, var1A, var1B
    path2, var2A, var2B
    path3, var3A, var3B
    ....

    This works for "normal" paths, but as soon as I have a path that does
    include a "," it breaks. The problem now is that (AFAIK) Linux allows
    every char (aside from "/" and null) to be used in filenames. The only
    solution I can think of is using null as a separator, but there has to
    be a cleaner way?
    You could use a tab character '\t' instead.
  • Grant Edwards at Aug 31, 2010 at 4:58 pm

    On 2010-08-31, MRAB wrote:
    You could use a tab character '\t' instead.
    That just breaks with a different set of filenames.

    --
    Grant Edwards   grant.b.edwards at gmail.com
    Yow! ! Everybody out of the GENETIC POOL!
  • MRAB at Aug 31, 2010 at 5:13 pm

    On 31/08/2010 17:58, Grant Edwards wrote:
    On 2010-08-31, MRAB wrote:
    You could use a tab character '\t' instead.
    That just breaks with a different set of filenames.
    How many filenames contain control characters? Surely that's a bad idea.
  • Nobody at Aug 31, 2010 at 6:33 pm

    On Tue, 31 Aug 2010 18:13:44 +0100, MRAB wrote:

    This works for "normal" paths, but as soon as I have a path that does
    include a "," it breaks. The problem now is that (AFAIK) Linux allows
    every char (aside from "/" and null) to be used in filenames. The only
    solution I can think of is using null as a separator, but there has to
    be a cleaner way?
    You could use a tab character '\t' instead.
    That just breaks with a different set of filenames.
    How many filenames contain control characters? Surely that's a bad idea.
    It may be a bad idea, but it's permitted by the OS. If you're writing a
    general-purpose tool, having it flake out whenever it encounters an
    "unusual" filename is also a bad idea.

    FWIW, my usual solution is URL-encoding (i.e. replacing any "awkward"
    character by a "%" followed by two hex digits representing the byte's
    value). It has the advantage that you can extend the set of bytes which
    need encoding as needed without having to change the code (e.g. you can
    provide a command-line argument or configuration file setting which
    specifies which bytes need to be encoded).
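A sketch of the configurable variant described above, assuming Python 3 and that paths are handled as bytes (e.g. obtained via `os.fsencode`), since a Linux filename is a byte string. `encode_bytes` and its `awkward` parameter are hypothetical; decoding reuses the standard `urllib.parse.unquote_to_bytes`.

```python
from urllib.parse import unquote_to_bytes


def encode_bytes(data, awkward=b",\n"):
    """Percent-encode every byte listed in `awkward`, plus "%" itself.

    `awkward` could come from a config file or command-line option;
    "%" is always escaped so that decoding stays unambiguous.
    """
    escape = set(awkward) | {ord("%")}
    out = bytearray()
    for b in data:  # iterating bytes yields ints
        if b in escape:
            out += b"%%%02X" % b  # e.g. 0x2C -> b"%2C"
        else:
            out.append(b)
    return bytes(out)


# Decoding needs no knowledge of the awkward set: every %XX is turned back.
encoded = encode_bytes(b"a,b%c\nd")
print(encoded)                    # b'a%2Cb%25c%0Ad'
print(unquote_to_bytes(encoded))  # b'a,b%c\nd'
```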
  • MRAB at Aug 31, 2010 at 6:57 pm

    On 31/08/2010 19:33, Nobody wrote:
    On Tue, 31 Aug 2010 18:13:44 +0100, MRAB wrote:

    This works for "normal" paths, but as soon as I have a path that does
    include a "," it breaks. The problem now is that (AFAIK) Linux allows
    every char (aside from "/" and null) to be used in filenames. The only
    solution I can think of is using null as a separator, but there has to
    be a cleaner way?
    You could use a tab character '\t' instead.
    That just breaks with a different set of filenames.
    How many filenames contain control characters? Surely that's a bad idea.
    It may be a bad idea, but it's permitted by the OS.
    [snip]
    So are viruses. :-)
  • Alan Meyer at Sep 1, 2010 at 12:20 am
    On 8/31/2010 2:33 PM, Nobody wrote:

    ...
    FWIW, my usual solution is URL-encoding (i.e. replacing any "awkward"
    character by a "%" followed by two hex digits representing the byte's
    value). It has the advantage that you can extend the set of bytes which
    need encoding as needed without having to change the code (e.g. you can
    provide a command-line argument or configuration file setting which
    specifies which bytes need to be encoded).
    I like that one.

    A similar solution is to use an escape character such as backslash, e.g.,
    "This is a backslash\\ and this is a comma\,."

    However, because the comma won't appear at all in the URL-encoded
    version, it has the virtue of still allowing you to split on commas.
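To make that contrast concrete: with backslash escapes a plain `split(",")` no longer works, so reading needs a small scanner. This is a sketch assuming Python 3; `escape_field` and `split_escaped` are hypothetical names, and escaping a newline as `\n` is an added assumption so that each record stays on one line.

```python
def escape_field(text):
    """Backslash-escape backslash, comma and newline in one field."""
    # Escape the backslash first so the escapes added below are not doubled.
    return text.replace("\\", "\\\\").replace(",", "\\,").replace("\n", "\\n")


def split_escaped(line):
    """Split a line on unescaped commas and undo escape_field()."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # the character after the backslash
            current.append("\n" if nxt == "n" else nxt)
        elif ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


record = ["/tmp/a\\b,c\nd", "1", "x"]
line = ",".join(escape_field(f) for f in record)
print(line)                 # /tmp/a\\b\,c\nd,1,x
print(split_escaped(line) == record)  # True
```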

    You must of course also URL-encode the '%' as %25, e.g.,
    "Here is a comma (%2C) and this (%25) is a percent sign."

    Alan
  • Grant Edwards at Aug 31, 2010 at 6:49 pm

    On 2010-08-31, MRAB wrote:
    On 31/08/2010 17:58, Grant Edwards wrote:
    On 2010-08-31, MRAB wrote:
    You could use a tab character '\t' instead.
    That just breaks with a different set of filenames.
    How many filenames contain control characters?
    How many filenames contain ","? Not many, but the OP wants his
    program to be bulletproof. Can't fault him for that.

    If I had a nickel for every Unix program or shell script that failed
    when a filename had a space in it....
    Surely that's a bad idea.
    Of course it's a bad idea. That doesn't stop people from doing it.

    --
    Grant Edwards   grant.b.edwards at gmail.com
    Yow! ! Now I understand advanced MICROBIOLOGY and th' new TAX REFORM
    laws!!
  • Stefan Schwarzer at Aug 31, 2010 at 9:35 pm
    Hi Grant,
    On 2010-08-31 20:49, Grant Edwards wrote:
    How many filenames contain ","?
    CVS repository files end with ",v". However, let's just agree
    that nobody uses CVS anymore. :-)
    Not many, but the OP wants his
    program to be bulletproof. Can't fault him for that.
    What about using the csv (not CVS) module?
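A minimal sketch of the csv route, assuming Python 3, where the csv module reads and writes `str` (Unicode) directly; the Python 2 csv module did not support Unicode input, which matches AmFreak's experience. The helper names are hypothetical, and every field comes back as a string.

```python
import csv
import io


def write_records(fp, records):
    # csv.writer quotes any field containing the delimiter, the quote
    # character or a line break, so "," and "\n" in a path are safe.
    csv.writer(fp).writerows(records)


def read_records(fp):
    # csv.reader undoes the quoting; every field comes back as a str.
    return [tuple(row) for row in csv.reader(fp)]


# With a real file, open it with newline="" as the csv docs recommend.
buf = io.StringIO(newline="")
write_records(buf, [("/tmp/a,b\nc", "1", "x"), ("/plain", "2", "y")])
buf.seek(0)
print(read_records(buf))
```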

    Stefan
  • Nobody at Sep 1, 2010 at 12:25 am

    On Tue, 31 Aug 2010 18:49:33 +0000, Grant Edwards wrote:

    How many filenames contain control characters?
    How many filenames contain ","? Not many,
    Unless you only ever deal with "Unix folk", it's not /that/ uncommon to
    encounter filenames which are essentially complete sentences, punctuation
    included.

    FWIW, I've found that a significant proportion of "why can't I burn this
    file to a CD" queries are because the Joliet extension to ISO-9660 "only"
    allows 64 characters in a filename.
  • Albert van der Horst at Sep 1, 2010 at 1:46 pm
    In article <i5jirs$4ae$1 at reader1.panix.com>,
    Grant Edwards wrote:
    On 2010-08-31, MRAB wrote:
    On 31/08/2010 17:58, Grant Edwards wrote:
    On 2010-08-31, MRAB wrote:
    You could use a tab character '\t' instead.
    That just breaks with a different set of filenames.
    How many filenames contain control characters?
    How many filenames contain ","? Not many, but the OP wants his
    program to be bulletproof. Can't fault him for that.
    As appending ",v" is the convention for RCS/CVS archives, I would
    say: a lot. Enough to guarantee that all my backup tarballs contain at
    least a few.
    If I had a nickel for every Unix program or shell script that failed
    when a filename had a space in it....
    I'd rather have it fail for spaces than for commas.
    Surely that's a bad idea.
    Of course it's a bad idea. That doesn't stop people from doing it.

    --
    Albert van der Horst, UTRECHT,THE NETHERLANDS
    Economic growth -- being exponential -- ultimately falters.
    albert at spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
  • Jeremy Sanders at Aug 31, 2010 at 4:58 pm

    AmFreak at web.de wrote:

    I have a script that reads and writes Linux paths in a file. I save the
    path (as unicode) with 2 other variables. I save them separated by "," and
    the "packets" by newlines. So my file looks like this:
    path1, var1A, var1B
    path2, var2A, var2B
    path3, var3A, var3B
    If you're generating the file and it is safe to do so (you're not getting
    the data from the internet), you could use repr((path1, v1, v2)) to save the
    line to the file and eval to interpret back the tuple.
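A sketch of the repr approach with one deliberate swap: `ast.literal_eval` from the standard library instead of plain `eval`, since it only accepts Python literals and cannot call functions. It assumes Python 3 and that the variables are plain strings and numbers; `dump_repr` and `load_repr` are hypothetical names.

```python
import ast


def dump_repr(path, var_a, var_b):
    # repr() of a tuple of str/int values is a one-line Python literal:
    # "," stays inside the quotes and a newline is written as the escape \n.
    return repr((path, var_a, var_b)) + "\n"


def load_repr(line):
    # ast.literal_eval() accepts only literals (strings, numbers, tuples...),
    # so a tampered file cannot call functions the way eval() could.
    return ast.literal_eval(line.rstrip("\n"))


line = dump_repr("/tmp/a,b\nc", 12, "abc")
print(line, end="")      # ('/tmp/a,b\nc', 12, 'abc')
print(load_repr(line))   # ('/tmp/a,b\nc', 12, 'abc')
```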

    Alternatively you could use // as a separator, making sure that you replace
    multiple slashes in the path with a single slash (they are equivalent).

    Jeremy
  • Albert Hopkins at Aug 31, 2010 at 6:10 pm

    On Tue, 2010-08-31 at 16:49 +0200, AmFreak at web.de wrote:
    I have a script that reads and writes Linux paths in a file. I save the
    path (as unicode) with 2 other variables. I save them separated by "," and
    the "packets" by newlines. So my file looks like this:
    path1, var1A, var1B
    path2, var2A, var2B
    path3, var3A, var3B
    ....

    This works for "normal" paths, but as soon as I have a path that does
    include a "," it breaks. The problem now is that (AFAIK) Linux allows
    every char (aside from "/" and null) to be used in filenames. The only
    solution I can think of is using null as a separator, but there has to
    be a cleaner way?
    Why is it your impression that the null character is "dirty"?

    E.g. that's how find|xargs etc. usually work.
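A minimal sketch of NUL-terminated fields in the style of `find -print0 | xargs -0`, assuming Python 3 and that the two extra variables never contain NUL either. `dump_nul` and `load_nul` are hypothetical names, and every record is assumed to have exactly three fields.

```python
def dump_nul(records):
    # Terminate every field with NUL, the one byte a Linux path cannot
    # contain; "," and "\n" inside a path need no escaping at all.
    return "".join(field + "\0" for record in records for field in record)


def load_nul(data, width=3):
    # Split on NUL, drop the empty piece after the last terminator, and
    # regroup the flat list of fields into records of `width` fields.
    fields = data.split("\0")[:-1]
    return [tuple(fields[i:i + width]) for i in range(0, len(fields), width)]


records = [("/tmp/a,b\nc", "1", "x"), ("/plain", "2", "y")]
print(load_nul(dump_nul(records)) == records)  # True
```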

    Another alternative would be, if you guarantee that your var values don't
    contain commas, to put the path last. But that doesn't account for
    filenames containing newlines.
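A sketch of the path-last layout, assuming Python 3 and that `var_a` and `var_b` never contain a comma. Splitting with a maximum of two splits leaves any commas in the path intact; as noted above, a newline in a path would still break the line-based file. The function names are hypothetical.

```python
def dump_path_last(path, var_a, var_b):
    # Assumes var_a and var_b never contain ","; the path may.
    return "%s,%s,%s\n" % (var_a, var_b, path)


def load_path_last(line):
    if line.endswith("\n"):
        line = line[:-1]  # remove only the record terminator
    # At most two splits: everything after the second comma is the path.
    var_a, var_b, path = line.split(",", 2)
    return path, var_a, var_b


print(load_path_last(dump_path_last("/tmp/a,b,c", "1", "x")))
```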

    Another alternative would be to wrap with some kind of serialization
    library. But really, what's so dirty about null?
  • Arnaud Delobelle at Aug 31, 2010 at 7:12 pm

    AmFreak at web.de writes:

    Hi,

    I have a script that reads and writes Linux paths in a file. I save
    the path (as unicode) with 2 other variables. I save them separated by
    "," and the "packets" by newlines. So my file looks like this:
    path1, var1A, var1B
    path2, var2A, var2B
    path3, var3A, var3B
    ....

    This works for "normal" paths, but as soon as I have a path that does
    include a "," it breaks. The problem now is that (AFAIK) Linux allows
    every char (aside from "/" and null) to be used in filenames. The only
    solution I can think of is using null as a separator, but there has
    to be a cleaner way?

    Thanks for any help

    Biene_Maja
    A simple solution would be to save each line of data using JSON with the json
    module:
    >>> import json
    >>> path = "x,y,z"
    >>> varA = 12
    >>> varB = "abc"
    >>> line = json.dumps([path, varA, varB])
    >>> print(line)
    ["x,y,z", 12, "abc"]
    >>> loadpathA, loadvarA, loadvarB = json.loads(line)
    >>> print(loadpathA)
    x,y,z
    >>> print(loadvarA)
    12
    >>> print(loadvarB)
    abc
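The same idea extended to a whole file, one JSON array per line, as a sketch assuming Python 3 (`write_json_lines` and `read_json_lines` are hypothetical names). Because JSON writes a newline inside a string as the escape `\n`, this also survives filenames containing newlines, and numbers come back as numbers.

```python
import io
import json


def write_json_lines(fp, records):
    for record in records:
        # json.dumps keeps "," inside the quoted string and escapes a
        # newline in a path, so each record occupies exactly one line.
        fp.write(json.dumps(list(record)) + "\n")


def read_json_lines(fp):
    # Skip blank lines; each remaining line is one JSON array.
    return [tuple(json.loads(line)) for line in fp if line.strip()]


buf = io.StringIO()
write_json_lines(buf, [("/tmp/a,b\nc", 12, "abc"), ("/plain", 3, "d")])
buf.seek(0)
print(read_json_lines(buf))
```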

    HTH

    --
    Arnaud

Discussion Overview
group: python-list @
categories: python
posted: Aug 31, '10 at 2:49p
active: Sep 1, '10 at 1:46p
posts: 16
users: 10
website: python.org


site design / logo © 2022 Grokbase