Grokbase Groups OpenNLP dev May 2012
Guys, I need to fix the CLI for SimpleTokenizer. Otherwise I have no objections.

On Thu, May 3, 2012 at 1:34 PM, jim.foobar wrote:
On 03/05/12 12:16, Jörn Kottmann wrote:
On 05/03/2012 10:58 AM, Jim - FooBar(); wrote:

I can also provide the "AggregateNameFinder" class, which takes any
number of name-finders and merges their results in order to get better
evaluation statistics. Internally, it uses the "NameFinderME.dropOverlappingSpans()"
method to get rid of nested spans. That method, however, does the simplistic thing
of keeping the earliest span (ignoring the type of the span completely). I
think being able to merge results from several name-finders is a killer
feature that a lot of people will appreciate, even if I don't think keeping
the earliest span is sensible when trying to evaluate several finders on
multiple entity types...
+1 to implement it based on NameFinderME.dropOverlappingSpans.
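
For readers following the thread, the "merge everything, then keep the earliest span" baseline discussed above could look roughly like the sketch below. It is only an illustration under stated assumptions: `Span`, `Finder`, and `findAll` are hypothetical stand-ins written for this example, not the OpenNLP classes or the actual AggregateNameFinder from the private build, and the overlap-dropping step is reimplemented here rather than calling NameFinderME.dropOverlappingSpans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class AggregateSketch {

    /** Hypothetical stand-in for a name span: token range [start, end) plus an entity type. */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        /** True if the two token ranges share at least one token. */
        boolean intersects(Span other) {
            return start < other.end && other.start < end;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /** Hypothetical finder interface, assumed for this sketch only. */
    interface Finder {
        List<Span> find(String[] tokens);
    }

    /** Collects the spans of all finders and drops every span that overlaps an earlier one. */
    static List<Span> findAll(List<Finder> finders, String[] tokens) {
        List<Span> all = new ArrayList<>();
        for (Finder finder : finders) {
            all.addAll(finder.find(tokens));
        }
        // Earliest start first; on equal starts, the longer span comes first.
        all.sort(Comparator.comparingInt((Span s) -> s.start)
                .thenComparingInt(s -> -s.end));

        List<Span> kept = new ArrayList<>();
        Span last = null;
        for (Span span : all) {
            // The entity type is ignored here, which is the weakness noted in the thread.
            if (last == null || !last.intersects(span)) {
                kept.add(span);
                last = span;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "Smith", "York", "visited", "Paris"};
        Finder dictionary = t -> Arrays.asList(new Span(0, 2, "person"));
        Finder model = t -> Arrays.asList(new Span(1, 3, "location"), new Span(4, 5, "location"));
        System.out.println(findAll(Arrays.asList(dictionary, model), tokens));
    }
}
```

Because the type is never consulted, a span from one finder can silently suppress an overlapping span of a different entity type from another finder, which is the concern raised in the quoted message.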

In my opinion that is still a good baseline. We can come up with more
specialized and sophisticated approaches, e.g. based on probabilities
and limited for statistical name

Yes, I agree it is not a bad baseline, but pretty soon we'll have to
either look at the probabilities (if someone is trying to merge several
models) or at the actual class of the name-finder that gave a particular
prediction and reason on that... For example, if a prediction came from a
dictionary, there is really no point in doubting it. It must be
correct! Anyway, I'd love to see this feature in 1.5.3, and a couple of
weeks (what William needs) is not that long...
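
One way the refinement suggested above might be sketched: when spans overlap, prefer the one that came from a dictionary, and otherwise prefer the higher probability. Everything here is an assumption made for illustration. The `Scored` class, its `fromDictionary` flag, and the `merge` method are hypothetical and do not correspond to any existing OpenNLP API, and treating dictionary hits as always correct is the rule proposed in the message, not an established fact.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PriorityMergeSketch {

    /** Hypothetical span carrying its probability and whether a dictionary produced it. */
    static final class Scored {
        final int start;
        final int end;
        final String type;
        final double prob;
        final boolean fromDictionary;

        Scored(int start, int end, String type, double prob, boolean fromDictionary) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.prob = prob;
            this.fromDictionary = fromDictionary;
        }

        /** True if the two token ranges [start, end) share at least one token. */
        boolean intersects(Scored other) {
            return start < other.end && other.start < end;
        }
    }

    /** Greedy merge: the most trusted spans are accepted first, overlapping ones are dropped. */
    static List<Scored> merge(List<Scored> candidates) {
        List<Scored> ranked = new ArrayList<>(candidates);
        ranked.sort((a, b) -> {
            // Dictionary hits outrank statistical predictions (assumption from the thread).
            if (a.fromDictionary != b.fromDictionary) {
                return a.fromDictionary ? -1 : 1;
            }
            // Otherwise the higher probability wins.
            return Double.compare(b.prob, a.prob);
        });

        List<Scored> accepted = new ArrayList<>();
        for (Scored candidate : ranked) {
            boolean clashes = false;
            for (Scored kept : accepted) {
                if (kept.intersects(candidate)) {
                    clashes = true;
                    break;
                }
            }
            if (!clashes) {
                accepted.add(candidate);
            }
        }
        // Return the surviving spans in document order.
        accepted.sort(Comparator.comparingInt((Scored s) -> s.start));
        return accepted;
    }
}
```

Unlike the earliest-span baseline, this variant can keep a later span over an earlier one, so the outcome depends on how much each source is trusted rather than on position alone.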


PS: btw, I've actually been using the aggregate name-finder in my private
build for almost 3 weeks now... I'm passing it 2 dictionary finders of
different types and a maxent model that can also predict 2 types.
Everything works just fine! :)

Discussion Overview
group: dev @
posted: May 3, '12 at 2:50a
active: May 7, '12 at 7:35a


