Grokbase Groups Perl ai February 2005
Aha, yes.

AI::Categorizer lets you customize the tokenization behavior to be
however you want, by subclassing the Document class and overriding the
tokenize() method. You could do something like this:

  package My::Documents;
  @ISA = qw(AI::Categorizer::Document::Text);

  # Split on whitespace only, so tokens such as '318i' or '14-160' stay whole.
  sub tokenize {
    return [split ' ', $_[1]];
  }

  my $c = new AI::Categorizer(
    document_class => 'My::Documents',
    ...
  );
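
To see what the override changes without installing anything, here is a minimal standalone sketch that compares two tokenizers on the strings from your message. The helper names alpha_tokens() and whitespace_tokens() are hypothetical, and alpha_tokens() is only an assumed stand-in for a letters-only tokenizer that reproduces the symptoms you described. It is not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a letters-only tokenizer: keeps runs of ASCII
# letters and lowercases them. An assumption for illustration, not the
# real AI::Categorizer implementation.
sub alpha_tokens {
    my ($text) = @_;
    return [ map { lc($_) } ($text =~ /([A-Za-z]+)/g) ];
}

# Whitespace tokenizer: the same split expression used in the
# tokenize() override.
sub whitespace_tokens {
    my ($text) = @_;
    return [ split ' ', $text ];
}

# Print both tokenizations side by side for each sample string.
for my $text ('WARRIOR 14-160 14-160', 'BMW 318i', 'A/T') {
    printf "%-22s alpha: [%s]  whitespace: [%s]\n",
        $text,
        join(', ', @{ alpha_tokens($text) }),
        join(', ', @{ whitespace_tokens($text) });
}
```

For 'A/T' the letters-only version yields the two features 'a' and 't', while the whitespace version keeps the single feature 'A/T'. Note that split ' ' preserves case, so wrap each token in lc() if you want 'A/T' and 'a/t' to count as the same feature.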


On Feb 5, 2005, at 11:23 AM, Jason Armstrong wrote:

Thanks for all the good feedback, I'll certainly be following up on it.

I did find one reason why I wasn't getting good matches ... when I
looked more carefully at the perl data structure, I found that the
'features' hash only contained alphabetic characters. So, for example,
in the string 'WARRIOR 14-160 14-160', only the warrior part was being
used. Also, with 'BMW 318i' and 'BMW 525i', the numbers were being
ignored, and with something like 'A/T', two separate features 'a' and
't' were there.

So my further question is how to get NaiveBayes to use white space
separated words as features ('318i', 'a/t') and not just the individual
alphabetic characters. Is it a simple option when calling
new AI::Categorizer?

Jason Armstrong
