I seem to have problems with umlauts, for example in words like
Präsentation
When a document is added with
    return AI::Categorizer::Document->new(name    => $filename,
                                          content => $content);
to the collection, then loaded and finished, the feature vector
contains only fragments of these words, for example:
    pr => 1
    sentation => 1
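A minimal sketch that reproduces the fragments, assuming the content is UTF-8 encoded and that the tokenizer splits text on runs of \w. Both are assumptions: tokenize() below is a hypothetical stand-in, not the code in Document.pm. Without use locale, \w on a plain byte string matches only ASCII word characters, so the two bytes of "ä" break the word apart. Decoding the content to a Perl character string first lets \w match "ä" under Unicode rules. That is an alternative to patching in use locale, not what the module itself does.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical tokenizer (assumption): lowercase every run of \w characters.
sub tokenize {
    my ($text) = @_;
    return map { lc } $text =~ /(\w+)/g;
}

# UTF-8 encoded bytes, as read from a file without a decoding layer.
my $bytes = encode('UTF-8', "Pr\x{e4}sentation");

# Byte string: the bytes of "ä" are not \w here, so the word is split.
my @raw = tokenize($bytes);

# Character string: \w matches "ä" under Unicode rules.
my @decoded = tokenize(decode('UTF-8', $bytes));

print join(', ', @raw), "\n";    # pr, sentation
print scalar(@decoded), "\n";    # 1
```

In real use, the decoding would happen before the content is passed to the document constructor, e.g. by opening the file with an :encoding(UTF-8) layer.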
Setting the locale in the shell or in Perl with
    use locale;
has no effect, not even when de_AT is selected explicitly.
--
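Aaaaaah, lib/AI/Categorizer/Document.pm does NOT use locale, and use locale
is lexically scoped, i.e. very, uhm, local %-), so enabling it in my own
script does not reach the tokenizer inside the module.
Patching Document.pm to use locale does not seem to break the test cases.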
\rho