Hi all,
I have a text processing script that works with one file but fails on
another file that has the same content.
When I compared the two files, I found that the one that fails has a "^M" at
the end of each line. What is this? Is this what made it not work?
By the way, I'm on Unix.
Thanks....

-Remy

  • Jeff Peng at May 21, 2008 at 8:55 am

    On Wed, May 21, 2008 at 4:45 PM, Remy Guo wrote:

    as i compared the 2 files, i found the file that cannot work has a "^M" at
    the end of each line. what is this?
    Run "dos2unix filename.txt" to convert it to Unix format.
    You may have got the file from Windows.

    --
    Jeff Peng - Peng.Kyo@Gmail.com
    Professional Squid supports in China
    http://www.ChinaSquid.com/
  • Rob Coops at May 21, 2008 at 9:21 am
    That ^M is a line feed, or well the windows version of a line feed.

    There are several different ways in which to write a line feed and of course
    to make our lives better *nix, DOS/Windows and Mac all have their own way of
    writing them.

    So Jeff's suggestion relies on a little application that simply finds all
    the DOS/Windows ways of doing things, like line feeds, and replaces them
    with the Unix version of the same.

    If you feel like doing this in Perl, a simple regex will do the trick, at
    least for the line feeds. But there is more Windows fun to be had, like the
    way MS Word replaces certain characters like " ' - and even ... with a
    special character because they are aesthetically more pleasing to the reader
    of the document. I am sure there are more examples, though as far as I know
    dos2unix is mainly about the line endings, so do not count on it for those.

    So if it is a single file and only a one-off, then dos2unix is the easiest
    way. If you want to do it in Perl, then you will most likely have to use a
    regex, because not all machines will have the dos2unix application
    available.

    Regards,

    Rob Coops
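
The "simple regex" mentioned above could look roughly like the sketch below. The helper name `crlf_to_lf` is made up for illustration and is not from the thread; it assumes the whole text is already in a string and only removes a CR that is directly followed by an LF.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): convert CR LF line
# endings to plain LF in a string.
sub crlf_to_lf {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;    # a CR is dropped only when an LF follows it
    return $text;
}

my $dos  = "first line\r\nsecond line\r\n";
my $unix = crlf_to_lf($dos);
print $unix;
```

For a one-off fix from the command line, the common idiom `perl -pi -e 's/\r$//' filename.txt` should do much the same job in place, so try it on a copy of the file first.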
  • Remy Guo at May 21, 2008 at 2:20 pm
    It's really interesting... then how can I match that ^M using a regex?
    I've tried "chomp" when reading each line, but it doesn't work...

  • Xavier Noria at May 21, 2008 at 2:29 pm

    On Wed, May 21, 2008 at 4:20 PM, Remy Guo wrote:

    it's really interesting... then how can i match that ^M using regex?
    i've tried "chomp" when reading each line but it doesn't work...
    That's "\r" everywhere except on Macs before Mac OS X. Some programs
    display "\r" as "^M", but that's just a way to show it; there's really
    just one character.

    If you'd like to get your ideas in order about how newlines work, have a
    look at this article:

    http://www.onlamp.com/pub/a/onlamp/2006/08/17/understanding-newlines.html

    -- fxn
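
To make the point about chomp concrete, here is a minimal sketch. It assumes the default `$/` of "\n", as on Unix, and the helper name `strip_eol` is hypothetical. It shows that chomp leaves the "\r" behind, while a regex anchored at the end of the string removes either kind of line ending.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip one trailing line
# ending, whether it is LF or CR LF.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

my $line = "some text\r\n";

my $chomped = $line;
chomp $chomped;    # with the default $/ of "\n", only the LF is removed
printf "after chomp:     %d chars, ends in CR: %s\n",
    length $chomped, ($chomped =~ /\r\z/ ? "yes" : "no");

my $clean = strip_eol($line);
printf "after strip_eol: %d chars, ends in CR: %s\n",
    length $clean, ($clean =~ /\r\z/ ? "yes" : "no");
```

On Perl 5.10 or later, `s/\R\z//` is another option, since `\R` matches a generic line break.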
  • Bob McConnell at May 21, 2008 at 2:38 pm
    From: Rob Coops
    That ^M is a line feed, or well the windows version of a line feed.
    Actually, it is an ASCII CR or carriage return. Microsoft uses CR/LF for
    end of line, where Unixen use just LF. Classic Mac OS used a lone CR, and
    Apple moved to LF when they switched to OS X. I used tr to clean it up,
    much like dos2unix does. I think the command was:

    $ tr -d '\r' < badfile > goodfile

    Bob McConnell
  • Remy Guo at May 21, 2008 at 2:43 pm
    It's done! Great~ :) \r can match the ^M.
    Thanks all~ Microsoft cost me several hours.... -_-

  • Sivasakthi at May 21, 2008 at 10:00 am
    The 'script' utility output normally has ^M and other control characters
    embedded in the output. To have all these control characters removed,

    try: $ col -b < script.txt > newfile.txt

    Regards,
    Siva
  • Andrew Curry at May 21, 2008 at 2:24 pm
    If you're trying to do this on a Unix-based system:

    ^M is the \r in the \r\n line ending, so you can just get rid of the \r,
    I believe.

    -----Original Message-----
    From: Remy Guo
    Sent: 21 May 2008 15:20
    To: Rob Coops
    Cc: Perl Beginners
    Subject: Re: what is ^M at the end of a line?

    it's really interesting... then how can i match that ^M using regex?
    i've tried "chomp" when reading each line but it doesn't work...

  • Rob Dixon at May 21, 2008 at 2:48 pm

    Remy Guo wrote:
    hi all,
    i have a text processing script that can work with a file but cannot work
    with another file that has the same content.
    as i compared the 2 files, i found the file that cannot work has a "^M" at
    the end of each line. what is this? is this what made it not work?
    by the way, i'm under unix.
    thanks....
    That will be your editor's representation of control-M, ASCII carriage return or
    "\x0D". Windows files use a CR LF pair as the record terminator.

    Because a regex pattern of /\s/ matches both CR and LF (and FF) as well as tab
    and space, a loop like this

    while (<DATA>) {
      s/\s+$//;    # strip trailing whitespace, including CR and LF
      # ... process $_ here
    }

    will do the same as a chomp, but will also remove trailing tabs and spaces,
    which is usually what is wanted.

    More properly, and especially if trailing whitespace is significant, you could
    change the input record separator by writing

    {
      local $/ = "\x0D\x0A";    # input record separator is now CR LF

      while (<DATA>) {
        chomp;    # removes the CR LF pair
        # ... process $_ here
      }
    }

    HTH,

    Rob
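
The record separator approach above can be tried without a real file. The sketch below is only an illustration: the helper name `read_crlf_records` and the sample data are made up, and it reads from an in-memory filehandle instead of `DATA`. Note how trailing spaces survive, which is the case where this approach beats `s/\s+$//`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split CR LF terminated
# text into records, keeping any other trailing whitespace.
sub read_crlf_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "\x0D\x0A";    # CR LF is the record separator inside this sub
    my @records;
    while (my $record = <$fh>) {
        chomp $record;        # chomp removes the current value of $/
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @records = read_crlf_records("alpha  \x0D\x0Abeta\x0D\x0A");
print "[$_]\n" for @records;
```

One caveat with this design: if the input turns out to have plain LF endings, no CR LF is ever found and the whole input comes back as a single record, so it only suits files known to be in Windows format.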

Discussion Overview
group: beginners @
categories: perl
posted: May 21, '08 at 8:45a
active: May 21, '08 at 2:48p
posts: 10
users: 8
website: perl.org
