Grokbase Groups Perl ai August 2005
Hello perl-ai!

I've been playing with AI::Categorizer for a week or two now, and am
having difficulties creating a collection object using the InMemory
module. I'm new to Perl, OOP, and programming for that matter, but
I've managed to get the functionality I'm looking for from
AI::Categorizer using Collection::Files. However, it would be much
more useful and efficient if I could create the collection from memory.
It seems that the collection is created, and I can load it into a
KnowledgeSet. I can even train NaiveBayes on the knowledge set and
categorize documents (although I'm not sure that it's doing so
properly). It seems that it's not acknowledging all of the categories
that are included in the collection's documents; it seems to recognize
only one document's category set as the set for the whole collection.
The main error I get is when I try to generate a stats_table using:

my $mem_experiment = $l_mem->categorize_collection( collection => $c_mem_test );
print $mem_experiment->stats_table;

Can't take log of 0 at /usr/local/share/perl/5.8.4/Statistics/Contingency.pm line 183.

Can anyone tell me where I'm going wrong? I very much appreciate help
from anyone who has gotten this working. And thanks to Ken for creating
this great tool.

-Bill


---------code snippet--------
my %doc;
my %dochash;

my $cars = AI::Categorizer::Category->by_name(name => "cars");
my $trucks = AI::Categorizer::Category->by_name(name => "trucks");
my $baseball = AI::Categorizer::Category->by_name(name => "baseball");
my $seattle = AI::Categorizer::Category->by_name(name => "seattle");

push(my @seahawks_categories,$cars,$trucks);
push(my @seattle_categories,$seattle,$baseball);


$doc{name} = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching them, and going to Seattle to see them";
$doc{categories} = \@seahawks_categories;
$dochash{SeahawksDocTitle} = \%doc;

$doc{name} = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and stuff";
$doc{categories} = \@seattle_categories;
$dochash{SeattleDocTitle} = \%doc;


my $collection = AI::Categorizer::Collection::InMemory->new( data => \%dochash );

return($collection);


  • Steffen Schwigon at Aug 5, 2005 at 8:09 am

    "Bill W." <bill-list@wtwhitman.com> writes:
    > Hello perl-ai!
    >
    > I've been playing with AI::Categorizer for a week or two now, and am
    > having difficulties creating a collection object using the InMemory
    > module.

    Disclaimer: I'm only a non-expert user of AI::Categorizer. We used
    it a while ago in a project and I don't really remember the hard
    details, so if in doubt you'd better wait for the experts.

    Anyway, my experiences might be interesting for your first steps:

    AFAIR we had no luck using Collections. We simply filled an
    AI::Categorizer::KnowledgeSet with documents and categories and put
    as much RAM into our machines as possible for in-memory learning.

    The important thing was that every document had its associated
    categories set and every category had its associated documents set,
    which felt a bit redundant but seemed to be important.

    After briefly experimenting with Collections we built our own
    document collection framework anyway, because it also fit our
    application's needs better (we were trying to do hierarchical
    categorization).
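
The two-way bookkeeping described above (documents know their categories, categories know their documents) can be illustrated with plain hashes. This is only a sketch of the data-structure idea in core Perl: it does not use the AI::Categorizer API, and `%categories_of`, `%documents_in` and `link_doc_to_category` are hypothetical names invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical plain-hash stand-ins, not AI::Categorizer objects.
my (%categories_of, %documents_in);

# Record the link in both directions in one place, so the two
# views cannot drift apart.
sub link_doc_to_category {
    my ($doc, $cat) = @_;
    $categories_of{$doc}{$cat} = 1;
    $documents_in{$cat}{$doc}  = 1;
}

link_doc_to_category("Seahawks", $_) for qw(cars trucks);
link_doc_to_category("Seattle",  $_) for qw(seattle baseball);

# Document -> categories view
for my $doc (sort keys %categories_of) {
    print "$doc => [", join(", ", sort keys %{ $categories_of{$doc} }), "]\n";
}

# Category -> documents view
for my $cat (sort keys %documents_in) {
    print "$cat <= [", join(", ", sort keys %{ $documents_in{$cat} }), "]\n";
}
```

Funnelling every link through a single helper is one way to keep the redundant information consistent; whether AI::Categorizer requires both directions to be set by hand is as recalled above, not something this sketch verifies.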


    GreetinX
    Steffen
    --
    Steffen Schwigon <schwigon@webit.de>
    Dresden Perl Mongers <http://dresden-pm.org/>
  • Ken Williams at Aug 5, 2005 at 11:29 pm
    Hi Bill,

    The problem in your example is actually in how you're creating
    %dochash. You're re-using the %doc hash for both documents, which
    means that under the surface you don't have what you think you have.
    Witness this condensed version of your example:

    =======================================================
    my %doc;
    my %dochash;

    $doc{name} = "Seahawks";
    $doc{content} = "The Seahawks are a pretty good team. I enjoy watching them.";
    $dochash{SeahawksDocTitle} = \%doc;

    $doc{name} = "Seattle";
    $doc{content} = "I like to go to seattle and watch the mariners and stuff";
    $dochash{SeattleDocTitle} = \%doc;

    use Data::Dumper;
    print Dumper \%dochash;
    =======================================================
    $VAR1 = {
      'SeattleDocTitle' => {
        'content' => 'I like to go to seattle and watch the mariners and stuff',
        'name' => 'Seattle'
      },
      'SeahawksDocTitle' => $VAR1->{'SeattleDocTitle'}
    };
    =======================================================
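
The `$VAR1->{'SeattleDocTitle'}` line in the dump means both keys point at the very same hash. A minimal sketch of how one might confirm that directly, assuming only the core Scalar::Util module (the contents are trimmed to the `name` field for brevity):

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my (%doc, %dochash);

$doc{name} = "Seahawks";
$dochash{SeahawksDocTitle} = \%doc;   # reference to %doc

$doc{name} = "Seattle";               # overwrites the one and only %doc
$dochash{SeattleDocTitle} = \%doc;    # the same reference again

# Two references are aliases if they share a memory address.
my $same = refaddr( $dochash{SeahawksDocTitle} )
        == refaddr( $dochash{SeattleDocTitle} );

print "same hash: ", ( $same ? "yes" : "no" ), "\n";                  # same hash: yes
print "Seahawks entry is named: $dochash{SeahawksDocTitle}{name}\n";  # Seattle
```

Because `\%doc` always refers to the single `%doc` variable, the second round of assignments silently replaces the first document's data.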


    There are several ways to create the data structure you intend - one
    way would be something like this:


    =======================================================
    my %dochash;

    # Each document gets its own anonymous hash, so nothing is shared.
    $dochash{SeahawksDocTitle} = {
        name    => "Seahawks",
        content => "The Seahawks are a pretty good team. I enjoy watching them.",
    };

    $dochash{SeattleDocTitle} = {
        name    => "Seattle",
        content => "I like to go to seattle and watch the mariners and stuff",
    };

    use Data::Dumper;
    print Dumper \%dochash;
    =======================================================
    $VAR1 = {
      'SeattleDocTitle' => {
        'name' => 'Seattle',
        'content' => 'I like to go to seattle and watch the mariners and stuff'
      },
      'SeahawksDocTitle' => {
        'name' => 'Seahawks',
        'content' => 'The Seahawks are a pretty good team. I enjoy watching them.'
      }
    };
    =======================================================
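
As another of the several possible ways, the original fill-then-store style can be kept by storing a copy of `%doc` instead of a reference to it. A minimal sketch in core Perl; note that `{ %doc }` is a shallow copy, which is sufficient here only because each document is given its own values:

```perl
use strict;
use warnings;

my (%doc, %dochash);

$doc{name}    = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching them.";
$dochash{SeahawksDocTitle} = { %doc };   # new anonymous hash holding a copy

$doc{name}    = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and stuff";
$dochash{SeattleDocTitle} = { %doc };    # a second, independent copy

# Each entry keeps its own data.
print join( ", ", map { $dochash{$_}{name} } sort keys %dochash ), "\n";
# prints: Seahawks, Seattle
```

If a value inside `%doc` were itself a reference (for example a shared categories array), both copies would still point at that one array, so a fresh array would be needed per document.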


    Then the following display code shows that the Collection is created
    properly:

    =======================================================
    print "Number of docs: ", $collection->count_documents, "\n";
    while (my $doc = $collection->next) {
        print $doc->name, " => [", join( ", ", map $_->name, $doc->categories ), "]\n";
    }
    =======================================================
    Number of docs: 2
    Seahawks => [trucks, cars]
    Seattle => [seattle, baseball]
    =======================================================


    -Ken


Posted: Aug 5, '05 at 12:26a
Active: Aug 5, '05 at 11:29p
Posts: 3
Users: 3