PayloadAttribute behavior change between Lucene 2.9/3.0 and the trunk
Hello,
I have a Tokenizer that generates a Payload, and a TokenFilter that uses it.
These work well with Solr 1.4.0 (therefore Lucene 2.9.1?), but when
I switched to the trunk version (I rebuilt the Tokenizer and TokenFilter
against the Lucene jar from the trunk and ran it), I encountered
this error:

java.lang.IllegalArgumentException: This AttributeSource does not have the attribute 'org.apache.lucene.analysis.tokenattributes.PayloadAttribute'.

This exception comes from this line in the TokenFilter code:

payloadAtt = (PayloadAttribute) getAttribute(PayloadAttribute.class);

The payload is created in the Tokenizer's constructor like this:
payloadAtt = (PayloadAttribute) addAttribute(PayloadAttribute.class);

When I use only the Tokenizer, the Solr admin shows the payloads, so
I am pretty sure the payloads are there, but the TokenFilter is
having trouble getting them.
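(For readers following along: the exception can be reproduced with a bare
AttributeSource, independent of any particular Tokenizer or TokenFilter.
Everything below is standard Lucene API except the illustrative class name.)

import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.AttributeSource;

public class GetAttributeDemo {
  public static void main(String[] args) {
    AttributeSource source = new AttributeSource();

    // getAttribute() throws if the attribute has not been registered yet:
    try {
      source.getAttribute(PayloadAttribute.class);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // "This AttributeSource does not have the attribute ..."
    }

    // addAttribute() registers the attribute (or returns the existing
    // instance), so a subsequent getAttribute() succeeds:
    source.addAttribute(PayloadAttribute.class);
    PayloadAttribute payloadAtt = source.getAttribute(PayloadAttribute.class);
    System.out.println(payloadAtt);
  }
}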

I think there is a very subtle change between the trunk version
and Lucene 2.9.1 (or more likely 3.0.1, guessing from other indirect
evidence) that is causing this. Has anyone encountered
a similar problem?

----
Teruhiko "Kuro" Kurosaka
RLP + Lucene & Solr = powerful search for global contents



  • Robert Muir at Dec 4, 2010 at 3:31 am

    On Fri, Dec 3, 2010 at 10:15 PM, Teruhiko Kurosaka wrote:

    java.lang.IllegalArgumentException: This AttributeSource does not have the attribute 'org.apache.lucene.analysis.tokenattributes.PayloadAttribute'.

    This exception comes from this line in the TokenFilter code:

    payloadAtt = (PayloadAttribute) getAttribute(PayloadAttribute.class);
I recommend you use addAttribute instead. It's buggy to use
getAttribute this way because you cannot rely on a previous
tokenstream having added the attribute [1]. I think your code only
worked before because TokenWrapperAttributeFactory (the
backwards-compatibility layer in 2.9 for the old Token API) was
present.

    [1] http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/util/AttributeSource.html#getAttribute(java.lang.Class)
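A sketch of the suggested change as a complete filter (the class name is made
up; the addAttribute() call in the constructor is the only part taken from the
advice above):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;

// Hypothetical filter: it asks for the PayloadAttribute with addAttribute(),
// which returns the instance the upstream Tokenizer already registered, or
// creates a new (empty) one if nothing upstream did.
public final class MyPayloadConsumingFilter extends TokenFilter {
  private final PayloadAttribute payloadAtt;

  public MyPayloadConsumingFilter(TokenStream input) {
    super(input);
    payloadAtt = addAttribute(PayloadAttribute.class); // never throws, unlike getAttribute()
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // payloadAtt.getPayload() now holds whatever the upstream stream set (or null).
    return true;
  }
}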

  • Teruhiko Kurosaka at Dec 4, 2010 at 11:06 pm
Thank you, Robert, replacing getAttribute with addAttribute worked!

But I don't understand why. Could you help me understand the mechanics?

    In my setting,
    hasAttribute(PayloadAttribute.class) returns false.

So I thought addAttribute(PayloadAttribute.class) would just
create a new PayloadAttribute object. That would remedy the
exception, but it wouldn't do any good for accessing the payload
generated upstream.

But the newly created PayloadAttribute is actually receiving
the payload that was generated upstream (by my Tokenizer).
How is this possible?

    ----
    T. "Kuro" Kurosaka, 415-227-9600x122, 617-386-7122(direct)




  • Uwe Schindler at Dec 4, 2010 at 11:19 pm
All tokenizers/tokenfilters in a chain share the same attributes. When you
add an attribute to one of them, they all share the same instance. That's
the whole trick behind the analysis API. getAttribute() should only be used
in conditional code that checks with hasAttribute() first. In general,
code that produces or consumes attributes should always use addAttribute();
then you don't need to care whether the attribute is already there or not.
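A minimal way to see that sharing, using AttributeSource directly (the variable
names are illustrative; this is the same mechanism a TokenFilter uses when it
passes its input stream to super()):

import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.AttributeSource;

public class SharedAttributeDemo {
  public static void main(String[] args) {
    AttributeSource upstream = new AttributeSource();
    PayloadAttribute addedUpstream = upstream.addAttribute(PayloadAttribute.class);

    // An AttributeSource built from another one shares its attribute
    // instances, which is what a TokenFilter does with its input stream.
    AttributeSource downstream = new AttributeSource(upstream);
    PayloadAttribute seenDownstream = downstream.addAttribute(PayloadAttribute.class);

    System.out.println(addedUpstream == seenDownstream); // true: one shared instance
  }
}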

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
  • Robert Muir at Dec 4, 2010 at 11:23 pm

    On Sat, Dec 4, 2010 at 6:05 PM, Teruhiko Kurosaka wrote:
But the newly created PayloadAttribute is actually receiving
the payload that was generated upstream (by my Tokenizer).
    How is this possible?
    Attributes are shared for the entire analysis chain.
    It is best to think of getAttribute as "get a reference to an
    already-added attribute".

    And to think of addAttribute as "if the attribute already exists,
    return a reference to it, otherwise add it to the chain and return a
    reference to that".

    In other words, in the entire Analyzer, there can only be one
    PayloadAttribute. Because it is shared, it does not matter who calls
    addAttribute.

So it's best to always use addAttribute in your constructor.

The simplest way to see why this is good: imagine someone were to
use your TokenFilter with, say, a WhitespaceTokenizer that does not add
PayloadAttribute. Then your filter would not produce any error; the
PayloadAttribute would just be empty, as you would expect.
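As a rough sketch of that scenario (assuming the hypothetical
MyPayloadConsumingFilter from the earlier sketch in this thread, and the
3.x-style WhitespaceTokenizer(Reader) constructor; trunk builds may require a
Version argument):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;

public class NoPayloadUpstreamDemo {
  public static void main(String[] args) throws IOException {
    // WhitespaceTokenizer never adds a PayloadAttribute, but because the
    // filter uses addAttribute() the chain still works; payloads are simply null.
    TokenStream ts = new MyPayloadConsumingFilter(
        new WhitespaceTokenizer(new StringReader("no payloads here")));
    PayloadAttribute payloadAtt = ts.addAttribute(PayloadAttribute.class);
    while (ts.incrementToken()) {
      System.out.println(payloadAtt.getPayload()); // null for every token, no exception
    }
  }
}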

The reason your code worked with getAttribute in Lucene 2.9 is the
backwards-compatibility layer for the old Token API: the six attributes
from Token were always added automatically: TermAttribute,
OffsetAttribute, PositionIncrementAttribute, PayloadAttribute,
TypeAttribute, FlagsAttribute. You can see this by looking at
TokenStream.initTokenWrapper:
    http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenStream.java

