Hi there,

I'm trying to work out how to determine the type (string/number/date/etc.) of a term. I haven't seen an off-the-shelf way to do it, so I'm trying to store a payload against each term that records the type.

I'm having a little trouble retrieving a payload I'd stored onto the term. At the moment I'm using the TypeAsPayloadTokenFilter, however I'll change this soon as it's not really what I'm after.

I'm then getting a TermEnum from either reader.terms() or using a PrefixTermEnum.

For each term whose payload I want, I run this function:

private String getPayload(Term term) {
    String payload = null;
    try {
        TermPositions termPositions = reader.termPositions(term);
        termPositions.next();
        if (termPositions.isPayloadAvailable()) {
            byte[] payloadBytes = new byte[termPositions.getPayloadLength()];
            payloadBytes = termPositions.getPayload(payloadBytes, 0);
            payload = new String(payloadBytes);
            LOG.debug(payload);
        }
    }
    catch (IOException e) {
...

This never returns anything though.

Am I missing something here? Any help would be greatly appreciated.

Cheers,
Derek

--------------------------------------------------------------------------
NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or views contained herein are not intended to be, and do not constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and Consumer Protection Act. If you have received this communication in error, please destroy all electronic and paper copies and notify the sender immediately. Mistransmission is not intended to waive confidentiality or privilege. Morgan Stanley reserves the right, to the extent permitted under applicable law, to monitor electronic communications. This message is subject to terms available at the following link: http://www.morganstanley.com/disclaimers. If you cannot access these links, please notify us by reply message and we will send the contents to you. By messaging with Morgan Stanley you consent to the foregoing.


  • Grant Ingersoll at Oct 14, 2010 at 2:22 pm

    On Oct 13, 2010, at 11:37 AM, Sykes, Derek wrote:

    > private String getPayload(Term term) {
    >     String payload = null;
    >     try {
    >         TermPositions termPositions = reader.termPositions(term);
    >         termPositions.next();

    next() returns a boolean indicating whether there is a valid entry, so you need to check it. You may not actually have a match for that term.

    >         if (termPositions.isPayloadAvailable()) {
    >             byte[] payloadBytes = new byte[termPositions.getPayloadLength()];
    >             payloadBytes = termPositions.getPayload(payloadBytes, 0);
    >             payload = new String(payloadBytes);
    >             LOG.debug(payload);
    >         }
    >     }
    >     catch (IOException e) {
    > ...


    What does your Analysis process look like? Many of Lucene's analysis pieces don't bother setting type. Have you looked at the index with Luke? That should show you the payloads. Also, have a look at the SpanTermQuery. You can use the Spans object to step directly through the position matches and get the payloads, if they are available.



    --------------------------
    Grant Ingersoll
    http://www.lucidimagination.com


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Sykes, Derek at Oct 14, 2010 at 3:57 pm
    Hey Grant,

    Fair point on the next(). In this case I'm iterating through the terms returned from a PrefixTermEnum so I know they're in the index.

    The analyser I'm using looks like this:

    public class TypeSavingAnalyzer extends StandardAnalyzer {

        public TypeSavingAnalyzer(Version version) {
            super(version);
        }

        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result = super.tokenStream(fieldName, reader);
            return new TypeAsPayloadTokenFilter(result);
        }

    }

    I'm creating the field with Field.TermVector.WITH_POSITIONS and can see the little "p" indicator in Luke. However, I see no indication of a payload being present (though I'm not 100% sure where to look), and I still get false from isPayloadAvailable().

    Basically, I'm trying to store some metadata in the payload so I know what type each term is; at the moment, when I get a list of terms back from the reader, some are numeric. NumericField doesn't seem to allow payloads, so I may have to find an alternative method. Any suggestions would be greatly appreciated!

    Regards,
    Derek
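
A small aside on the bytes themselves: the getPayload() function quoted in this thread decodes with new String(payloadBytes), which uses the platform default charset. The sketch below shows the same round trip with an explicit charset. It assumes the payload holds a type label written as UTF-8 text; the class name TypePayloadCodec and the label "<NUM>" are made up for illustration, and whether TypeAsPayloadTokenFilter really writes UTF-8 should be checked against its Javadoc.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Lucene: type label <-> payload bytes. */
final class TypePayloadCodec {

    private TypePayloadCodec() {}

    /** Encodes a type label (assumed example: "<NUM>") as UTF-8 bytes. */
    static byte[] encode(String typeLabel) {
        return typeLabel.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Decodes a label from a slice of a buffer. The offset and length
     * matter when the buffer is reused or larger than the payload.
     */
    static String decode(byte[] buffer, int offset, int length) {
        return new String(buffer, offset, length, StandardCharsets.UTF_8);
    }
}
```

Passing the offset and length explicitly avoids decoding stale trailing bytes if the same buffer is reused across calls.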

  • David Causse at Oct 15, 2010 at 8:34 am

    On Wed, Oct 13, 2010 at 04:37:37PM +0100, Sykes, Derek wrote:
    > This never returns anything though.

    Hi,

    My guess is that you need to call nextPosition(). Why not something like this:

    // go to the doc with skipTo(int internalId) or next()
    // iterate over positions
    for (int i = 0; i < currentTermPos.freq(); i++) {
        int p = currentTermPos.nextPosition();
        payloadBuffer = currentTermPos.getPayload(payloadBuffer, 0);
        ...
    }

    --
    David Causse
    Spotter
    http://www.spotter.com/

  • Sykes, Derek at Oct 15, 2010 at 2:29 pm
    Hi David,

    nextPosition() was indeed the missing link. Thanks very much!

    Cheers,
    Derek
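
To summarize the call order the thread arrived at (check next(), then call nextPosition() before asking for a payload), here is a minimal, self-contained sketch. It does not use Lucene: FakeTermPositions is a hypothetical in-memory stand-in that borrows the method names quoted above, and its rule that no payload is available until nextPosition() has been called is modeled on what this thread reports, not taken from Lucene's source.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Illustrative only; none of these types are Lucene classes. */
final class PayloadOrderSketch {

    /** Postings of one term in one document: one payload (or null) per position. */
    static final class Doc {
        final List<byte[]> payloads;
        Doc(List<byte[]> payloads) { this.payloads = payloads; }
    }

    /** Hypothetical stand-in for a positions enumerator. */
    static final class FakeTermPositions {
        private final List<Doc> docs;
        private int docIndex = -1;
        private int posIndex = -1;

        FakeTermPositions(List<Doc> docs) { this.docs = docs; }

        /** Advances to the next document; false when there are no more. */
        boolean next() {
            docIndex++;
            posIndex = -1; // no position selected yet in the new document
            return docIndex < docs.size();
        }

        /** Number of positions of the term in the current document. */
        int freq() { return docs.get(docIndex).payloads.size(); }

        /** Advances to the next position (simplified: returns its index). */
        int nextPosition() { return ++posIndex; }

        /** False until nextPosition() has selected a position with a payload. */
        boolean isPayloadAvailable() {
            return posIndex >= 0 && docs.get(docIndex).payloads.get(posIndex) != null;
        }

        byte[] getPayload() { return docs.get(docIndex).payloads.get(posIndex); }
    }

    /** Returns the first payload found, decoded as UTF-8, or null if none. */
    static String firstPayload(FakeTermPositions tp) {
        while (tp.next()) {                      // check the boolean result
            for (int i = 0; i < tp.freq(); i++) {
                tp.nextPosition();               // select a position first
                if (tp.isPayloadAvailable()) {
                    return new String(tp.getPayload(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }
}
```

With real Lucene classes the shape of the loop would be the same, but exact signatures (for example getPayload(byte[], int)) should be taken from the Javadoc of the version in use.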


Discussion Overview
group: java-user
categories: lucene
posted: Oct 13, '10 at 3:38p
active: Oct 15, '10 at 2:29p
posts: 5
users: 3
website: lucene.apache.org
