I have the following data:

a 100
a 102
c 100
a 102
b 111
c 100
c 102
c 100
c 100
a 102
...

I would like to have a list (either an array or a hash) of the unique lines.
Any help would be appreciated.
Thanks.

  • Mathew Snyder at Jan 10, 2007 at 5:51 am

    beast wrote:
    I have the following data:

    a 100
    a 102
    c 100
    a 102
    b 111
    c 100
    c 102
    c 100
    c 100
    a 102
    ...

    I would like to have a list (either an array or a hash) of the unique lines.
    Any help would be appreciated.
    Thanks.

    The way I would do it is to place the initial data instance into a hash if it
    doesn't already exist.

    Just do a next if it does.

    Mathew
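Mathew's suggestion, applied to whole lines rather than to a key, can be sketched in core Perl as follows. This is a minimal sketch, not code from the thread: it assumes the entire line is what must be unique, hard-codes the sample data from the question in place of reading `<>`, and the names `@lines`, `%seen` and `@unique` are illustrative.

```perl
use strict;
use warnings;

# Sample input from the question; in a real script these would come from <>.
my @lines = ("a 100\n", "a 102\n", "c 100\n", "a 102\n", "b 111\n",
             "c 100\n", "c 102\n", "c 100\n", "c 100\n", "a 102\n");

my %seen;      # whole line => number of times seen so far
my @unique;    # unique lines, in order of first appearance

for my $line (@lines) {
    chomp(my $copy = $line);      # compare lines without their newline
    next if $seen{$copy}++;       # already seen: just do a next
    push @unique, $copy;          # first occurrence: keep it
}

print "$_\n" for @unique;
```

Because `$seen{$copy}++` returns the old count, the `next` fires only from the second occurrence onward, and `@unique` keeps the lines in the order they first appear.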
  • Beast at Jan 10, 2007 at 6:03 am

    Mathew Snyder wrote:
    The way I would do it is to place the initial data instance into a hash if it
    doesn't already exist.

    Just do a next if it does.
    That will remove duplicate keys, not duplicate lines.

    my %hash = ();

    while (<>) {
        chomp;
        my ($key, $val) = split /,/;
        $hash{$key} = $val;    # a later line overwrites an earlier one with the same key
    }

    while ( my ($key, $value) = each(%hash) ) {
        print "$key => $value\n";
    }


    Thanks.
  • Owen Cook at Jan 10, 2007 at 6:05 am

    On Wed, Jan 10, 2007 at 12:47:25PM +0700, beast wrote:
    I have the following data:

    a 100
    a 102
    c 100
    a 102
    b 111
    c 100
    c 102
    c 100
    c 100
    a 102
    ...

    I would like to have a list (either an array or a hash) of the unique lines.
    Any help would be appreciated.
    Thanks.

    1. You need to read the data
    2. You need to split it into its components
    3. Then create a hash from the components

    This is untested, off the top of my head:

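    my %data_hash;

    while (<DATA>) {    # reads the lines after a __DATA__ marker at the end of the script
        chomp;          # Get rid of line feeds
        my @bits = split;    # split $_ on whitespace
        $data_hash{$bits[0]} = $bits[1];    # a repeated first field overwrites the earlier entry
    }

    foreach my $data (keys %data_hash) { print "$data $data_hash{$data}\n" }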



    Owen
  • Beast at Jan 10, 2007 at 7:35 am

    Owen Cook wrote:
    my %data_hash;

    while(<DATA>){
    chomp; # Get rid of line feeds
    my @bits = split;
    $data_hash{$bits[0]} = $bits[1];
    }
    That only removes duplicate keys.

    Is this acceptable in Perl? It's very ugly =(

    my %hash;

    while (<>) {
        chomp;
        my ($k, $v) = split /,/;
        my $tmp_key = $k . "_" . $v;    # composite key: one entry per (k, v) pair
        $hash{$tmp_key} = $_;
    }

    foreach my $key (sort(keys %hash)) {
        my ($k, $v) = split /_/, $key;    # assumes neither part contains "_"
        print "$k => $v\n";
    }
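Perl has a built-in form of this composite-key trick: a hash subscript containing a comma-separated list, such as `$pairs{$k, $v}`, is stored under `join($;, $k, $v)`, where the subscript separator `$;` defaults to `"\034"`. That avoids the problem of underscores appearing in the data. A minimal sketch, assuming whitespace-separated fields; `%pairs` and the sample lines are illustrative.

```perl
use strict;
use warnings;

my @lines = ("a 100", "a 102", "c 100", "a 102", "b 111", "c 100");

my %pairs;
for my $line (@lines) {
    my ($k, $v) = split ' ', $line;   # split on whitespace
    $pairs{$k, $v} = 1;               # stored under join($;, $k, $v)
}

for my $key (sort keys %pairs) {
    my ($k, $v) = split /$;/, $key;   # recover the two parts
    print "$k => $v\n";
}
```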
  • Mumia W. at Jan 10, 2007 at 12:49 pm

    On 01/10/2007 01:35 AM, beast wrote:
    [...]
    That only removes duplicate keys.

    Is this acceptable in Perl? It's very ugly =(
    [...]
    This would remove duplicate lines:

    use List::MoreUtils qw(uniq);
    use File::Slurp;
    my @list = uniq read_file 'rm_dup_lines.txt';
    print $_ for @list;

    I'm a little confused about what you're trying to do, because you split on
    commas even though your data has no commas in it. Oh well, the code
    above removes duplicate lines.
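The same result is possible with core Perl alone: the `grep { !$seen{$_}++ }` idiom keeps the first occurrence of each element, in order, much as `uniq` does. A minimal sketch; the sample list stands in for the lines `read_file` would return, and `%seen` and `@input` are illustrative names.

```perl
use strict;
use warnings;

# Stand-in for the lines read_file would return (newlines included).
my @input = ("a 100\n", "a 102\n", "c 100\n", "a 102\n", "b 111\n", "c 100\n");

my %seen;
# !$seen{$_}++ is true only the first time a given line is encountered.
my @list = grep { !$seen{$_}++ } @input;

print $_ for @list;
```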
  • Beast at Jan 10, 2007 at 12:59 pm

    Mumia W. wrote:
    On 01/10/2007 01:35 AM, beast wrote:
    [...]
    That only removes duplicate keys.

    Is this acceptable in Perl? It's very ugly =(
    [...]
    This would remove duplicate lines:

    use List::MoreUtils qw(uniq);
    use File::Slurp;
    my @list = uniq read_file 'rm_dup_lines.txt';
    print $_ for @list;
    Is it not possible without using any external modules? :)
    I'm a little confused about what you're trying to do, because you split
    on commas even though your data has no commas in it. Oh well, the
    code above removes duplicate lines.
    The data is dummy data; sorry for the confusion.
    The actual data comes from a log file, and its fields are separated by spaces.

    while (<>) {
    my ($username, $ipaddr, @rest) = split /\s+/;
    #get the unique combinations of $username and $ipaddr
    }

    And BTW, since the data comes from a log, there can be several thousand lines,
    but there should be fewer than 100 _unique lines_.

    --beast
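For the log-file case, a two-level hash keyed on username and then IP address collects the unique combinations, and with fewer than 100 of them its memory use should stay small however many lines are read. A minimal sketch, assuming whitespace-separated fields with the username first and the IP address second, as in Beast's loop; the sample log lines are invented for illustration.

```perl
use strict;
use warnings;

# Invented sample log lines: username, IP address, then other fields.
my @log = (
    "alice 10.0.0.1 GET /index.html",
    "bob 10.0.0.2 GET /",
    "alice 10.0.0.1 GET /about.html",
    "alice 10.0.0.3 POST /login",
);

my %seen;    # $seen{$username}{$ipaddr} = number of lines with that pair
for (@log) {
    my ($username, $ipaddr) = split ' ';   # split ' ' also skips leading whitespace
    $seen{$username}{$ipaddr}++;
}

for my $user (sort keys %seen) {
    for my $ip (sort keys %{ $seen{$user} }) {
        print "$user $ip ($seen{$user}{$ip} lines)\n";
    }
}
```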
  • Mathew at Jan 10, 2007 at 1:25 pm

    beast wrote:
    The data is dummy data; sorry for the confusion.
    The actual data comes from a log file, and its fields are separated by spaces.

    while (<>) {
    my ($username, $ipaddr, @rest) = split /\s+/;
    #get the unique combinations of $username and $ipaddr
    }

    And BTW, since the data comes from a log, there can be several thousand lines,
    but there should be fewer than 100 _unique lines_.

    --beast


    I still think a hash would suffice. If you use the IP address variable
    as the key (because it SHOULD be unique), you can build your hash even
    with undef values. The first time you put an IP into the hash, write
    the line to a new file. The IP from each subsequent line can then be
    looked up in the hash: if it already exists, skip immediately to the
    next line; if it doesn't, add it and write the line out.
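Mathew's description translates into a loop that writes a line out only the first time its IP address is seen, using a hash whose values are all undef. A minimal sketch, assuming the IP address is the second whitespace-separated field; an in-memory string stands in for the new output file, and all names are illustrative. Keying on the IP alone also drops later lines from other users at the same address, which differs from Beast's goal of unique username/IP combinations.

```perl
use strict;
use warnings;

my @log = (
    "alice 10.0.0.1 GET /",
    "bob 10.0.0.2 GET /",
    "carol 10.0.0.1 GET /",      # same IP as alice: skipped
    "bob 10.0.0.2 POST /login",
);

# Write to an in-memory "file" held in $output.
my $output = '';
open my $out, '>', \$output or die "Cannot open in-memory file: $!";

my %ip_seen;    # IP address => undef; only existence matters
for my $line (@log) {
    my (undef, $ip) = split ' ', $line;
    next if exists $ip_seen{$ip};    # already written: go to the next line
    $ip_seen{$ip} = undef;           # remember this IP
    print {$out} "$line\n";          # write the first line for this IP
}
close $out;

print $output;
```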
  • Dr.Ruud at Jan 11, 2007 at 2:32 am

    beast wrote:

    a 100
    a 102
    c 100
    a 102
    b 111
    c 100
    c 102
    c 100
    c 100
    a 102
    ...

    I would like to have a list (either an array or a hash) of the unique lines.
    perl -ne'$_{$_}||=print' datafile

    or

    perl -pe'$_ x=!$$_++' datafile

    --
    Affijn, Ruud

    "Gewoon is een tijger."
  • Oryann9 at Jan 11, 2007 at 3:11 pm

    "Dr.Ruud" wrote:
    beast wrote:

    a 100
    a 102
    c 100
    a 102
    b 111
    c 100
    c 102
    c 100
    c 100
    a 102
    ...

    I would like to have a list (either an array or a hash) of the unique lines.
    perl -ne'$_{$_}||=print' datafile

    or

    perl -pe'$_ x=!$$_++' datafile

    --
    Affijn, Ruud

    What exactly are these doing, in plain English?
    The first one is not printing and the second one is, but I get confused at the ||= in the first one and at the !$ in the second one.

    thank you
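Ruud's reply below explains only the first one-liner. In the second, `perl -pe'$_ x=!$$_++'`, the `-p` switch prints `$_` after each line; `$$_++` treats the line as the name of a package variable and post-increments it, so it is false the first time a line is seen and true afterwards; `!` turns that into 1 or an empty string (which counts as 0); and `x=` repeats `$_` that many times, so a repeated line becomes empty and nothing is printed for it. A minimal sketch of the same logic, using a lexical hash `%count` in place of the symbolic reference `$$_` so that it runs under `use strict`; the sample lines and names are illustrative.

```perl
use strict;
use warnings;

my @lines = ("a 100\n", "a 102\n", "c 100\n", "a 102\n", "b 111\n");
my %count;
my $printed = '';

for my $line (@lines) {
    local $_ = $line;
    # First time: $count{$_}++ returns 0, so ! gives 1 and $_ is kept.
    # Later: it returns a true count, ! gives "" (0), and $_ becomes "".
    $_ x= !$count{$_}++;
    $printed .= $_;    # what -p would print for this line
}

print $printed;
```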

  • Dr.Ruud at Jan 11, 2007 at 10:02 pm

    oryann9 wrote:
    "Dr.Ruud" wrote:
    beast wrote:
    beast:
    a 100
    a 102
    c 100
    a 102
    b 111
    c 100
    c 102
    c 100
    c 100
    a 102
    ...

    I would like to have a list (either an array or a hash) of the unique lines.
    perl -ne'$_{$_}||=print' datafile
    What exactly are these doing, in plain English?
    The first one is not printing and the second one is, but I get confused
    at the ||= in the first one and at the !$ in the second one

    First see the output of

    perl -MO=Deparse -ne'$_{$_}||=print' datafile

    which returns:

    LINE: while (defined($_ = <ARGV>)) {
    $_{$_} ||= print($_);
    }


    See `perldoc perlrun` for the meaning of the -n option.


    Let me simplify the code to this equivalent:

    while ( <ARGV> )
    {
    $d{$_} ||= print($_);
    }

    For every unique line of the input (the datafile), an entry is added to
    the hash %d, with the whole line as the key. The corresponding value is
    set to the return value of print, which is 1 (if the print succeeded).

    But if the key already exists, all of this is skipped because of the ||=
    operator (see `perldoc perlop`). This operator assigns the rvalue to the
    lvalue only if the lvalue isn't already true; otherwise the right-hand
    side, here the print, is not evaluated at all. So the loop is similar to

    while ( <ARGV> )
    {
    next if $d{$_};
    print;
    $d{$_} = 1;
    }

    And that was my explanation of perl -ne'$_{$_}||=print'.

    --
    Affijn, Ruud
  • Jason Roth at Jan 11, 2007 at 10:09 pm
    If we're just going for confusingly concise one-liners, then I would use

    perl -ne '$$_||=print'

    You save three whole characters by using the symbol table instead of a hash :)
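Here `$$_` dereferences the line's text as a symbolic reference, so each distinct line becomes the name of a package variable in the symbol table (for the main package, the `%main::` stash). That only works because `-e` code runs without `use strict`; under strict refs the same dereference dies at run time, which is one reason to prefer a lexical hash in a real script. A small sketch demonstrating both behaviours; the line text is illustrative.

```perl
use strict;
use warnings;

my $line = "a 100\n";

{
    no strict 'refs';      # what a bare one-liner effectively runs with
    local $_ = $line;
    $$_ ||= 1;             # sets the package variable named "a 100\n"
}
print "the symbol table now has an entry for the line\n"
    if exists $main::{$line};

# With strict refs in force, the same dereference dies at run time.
my $ok  = eval { local $_ = $line; my $value = $$_; 1 };
my $err = $ok ? '' : $@;
print "under strict: $err";
```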
    On 1/11/07, Dr.Ruud wrote:
    perl -ne'$_{$_}||=print' datafile

  • Oryann9 at Jan 12, 2007 at 12:15 am

    "Dr.Ruud" wrote:
    And that was my explanation of perl -ne'$_{$_}||=print'.

    cool! thanks, now I understand.



  • Beast at Jan 12, 2007 at 6:11 am

    Dr.Ruud wrote:
    while ( <ARGV> )
    {
    next if $d{$_};
    print;
    $d{$_} = 1;
    }
    Got the idea. Many thanks!
  • Dr.Ruud at Jan 12, 2007 at 12:19 am

    "Dr.Ruud" wrote:
    beast:
    a 100
    a 102
    ...
    c 100
    a 102
    ...

    I would like to have a list (either an array or a hash) of the unique lines.
    perl -ne'$_{$_}||=print' datafile
    Jason Roth's version:

    perl -ne'$$_||=print' datafile
    or

    perl -pe'$_ x=!$$_++' datafile
    --
    Affijn, Ruud

Discussion Overview
group: beginners @
categories: perl
posted: Jan 10, '07 at 5:47a
active: Jan 12, '07 at 6:11a
posts: 15
users: 7
website: perl.org


site design / logo © 2022 Grokbase