FAQ
For my work, I have read an article stating that "Answer type can be
automatically constructed by Indexing Different Questions and Answer
types. Later, when an unseen question appears, answer type for this
question will be found with the help of 'similarity function'
computation"

I am clear on the argument above. My problems are:
1. How can I index individual questions and answer types as is (not tokenized)?
2. How can I calculate the similarity between indexed questions and
unseen questions (questions of any type that may be asked later)?

To make things clear, the scenario is:
1. Who is the president of UN
Answer type <Person>
2. When will the presidency of Meles Zenawi hold?
Answer type <Date>
These two will be indexed, and later an unseen question like
"who is the president of Kenya"
should match the first question and so will have the answer
type <Person>.

I appreciate any help.

Seid M

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Vasudevan Comandur at Mar 5, 2009 at 5:50 pm
    Hi,

    Since you are trying to answer factoid questions to start with, it is
    better to use OpenNLP components to identify named entities (NER,
    Named Entity Recognition) in the documents and use those tags as part
    of your indexing process.

    Regards
    Vasu

  • Grant Ingersoll at Mar 5, 2009 at 8:05 pm
    Hi Seid,

    Do you have a reference for the article? I've done some QA in my day,
    but don't recall reading that one.

    At any rate, I do think it is possible to do what you are after. See
    below.

    On Mar 5, 2009, at 9:49 AM, Seid Mohammed wrote:

    > 1. how can I index individual questions and Answer types as is (not
    > tokenized)

    I'm not sure you want this, but when constructing your Field, just use
    the NOT_ANALYZED option.

    > 2. how can I calculate the similarity between indexed questions and
    > unseen questions (question of any type that can be asked later)

    In line with #1, I think you might be better off actually tokenizing
    the question as one field, and storing the answer type as a second field.
    Then, you can let Lucene calculate similarity via its normal query
    mechanisms. In this case, I would try experimenting with things
    like: exact match, phrase queries with slop, etc. That way, not only
    can you match "Who is the president of UN" but you might also match on
    things that are a bit fuzzier. To do this, you might need to have
    several fields per document with variations. I could also see using
    Lucene's payload mechanism as well.

    But, as Vasu said, you will likely need other parts too, like OpenNLP.

    HTH,
    Grant
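
As a rough illustration of the difference Grant describes between an untokenized (exact) match and a tokenized match scored by similarity, here is a minimal plain-Java sketch. It does not use Lucene at all: the class `AnswerTypeLookup`, its methods, and the Jaccard token-overlap score are assumptions made for illustration, standing in for Lucene's analyzers and scoring, which work differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory stand-in for an index of (question, answer type) pairs.
public class AnswerTypeLookup {

    // One indexed example: the raw question, its tokens, and its known answer type.
    private static final class Example {
        final String question;
        final Set<String> tokens;
        final String answerType;

        Example(String question, String answerType) {
            this.question = question;
            this.tokens = tokenize(question);
            this.answerType = answerType;
        }
    }

    private final List<Example> examples = new ArrayList<>();

    // Lowercase and split on non-alphanumerics: a crude stand-in for an analyzer.
    public static Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public void add(String question, String answerType) {
        examples.add(new Example(question, answerType));
    }

    // Untokenized lookup: only a character-for-character identical question matches.
    public Optional<String> exactMatch(String question) {
        for (Example e : examples) {
            if (e.question.equals(question)) {
                return Optional.of(e.answerType);
            }
        }
        return Optional.empty();
    }

    // Jaccard similarity: shared tokens divided by all distinct tokens.
    public static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / union.size();
    }

    // Tokenized lookup: return the answer type of the most similar indexed question.
    public Optional<String> bestAnswerType(String question) {
        Set<String> queryTokens = tokenize(question);
        Example best = null;
        double bestScore = 0.0;
        for (Example e : examples) {
            double score = jaccard(queryTokens, e.tokens);
            if (score > bestScore) {
                bestScore = score;
                best = e;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.answerType);
    }

    public static void main(String[] args) {
        AnswerTypeLookup index = new AnswerTypeLookup();
        index.add("Who is the president of UN", "Person");
        index.add("When will the presidency of Meles Zenawi hold?", "Date");

        String unseen = "who is the president of Kenya";
        System.out.println(index.exactMatch(unseen));     // prints Optional.empty
        System.out.println(index.bestAnswerType(unseen)); // prints Optional[Person]
    }
}
```

The exact lookup fails on the unseen question because the strings differ, while the tokenized lookup finds five shared tokens with the first indexed question and returns its answer type. In a real setup, Lucene's own query scoring would replace the Jaccard score used here.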

  • Patrick o'leary at Mar 5, 2009 at 8:21 pm
    Sounds like your most difficult part will be the question parser using
    POS (part-of-speech) tagging.

    This is kind of old school, but you could use something like the AliceBot
    AIML library
    http://en.wikipedia.org/wiki/AIML
    where the subject terms can be extracted from the questions and indexed
    separately.

    Or, as Grant and others suggest, use OpenNLP (which rocks) or LingPipe
    (the LingPipe license is a little bit of a pain) for entity extraction.

    An interesting way to look at the data would be to construct 3 fields:
    Original_Question, Question_base, and Subject.

    Doc:
    Original_Question: Who is the president of the UN
    Question_base: Who is the president of
    Question_base: Who is
    Subject: the president of the UN
    Subject: the president
    Subject: the UN
    /Doc

    Similarity can then be somewhat easier to calculate by comparing
    question bases, subjects, etc.

    P
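
One way the three-field layout from Patrick's example might be produced is sketched below in plain Java. The splitting rule is a naive assumption made only for illustration: the first two words are treated as the question stem, and the first "of" after the stem divides the rest. The class `QuestionFields` and its `split` method are hypothetical names; a real system would derive these pieces from POS tagging or entity extraction, as suggested in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that builds the multi-valued fields from Patrick's example.
public class QuestionFields {

    // Join tokens[from, to) with single spaces.
    private static String join(String[] tokens, int from, int to) {
        return String.join(" ", Arrays.copyOfRange(tokens, from, to));
    }

    public static Map<String, List<String>> split(String question) {
        String cleaned = question.trim().replaceAll("[?.!]+$", "");
        String[] tokens = cleaned.split("\\s+");

        // Assumption: the stem is the interrogative plus its verb ("Who is").
        int stemEnd = Math.min(2, tokens.length);

        // Assumption: the first "of" after the stem separates role from entity.
        int ofIndex = -1;
        for (int i = stemEnd; i < tokens.length; i++) {
            if (tokens[i].equalsIgnoreCase("of")) {
                ofIndex = i;
                break;
            }
        }

        List<String> bases = new ArrayList<>();
        if (ofIndex >= 0) {
            bases.add(join(tokens, 0, ofIndex + 1)); // "Who is the president of"
        }
        bases.add(join(tokens, 0, stemEnd));         // "Who is"

        List<String> subjects = new ArrayList<>();
        if (stemEnd < tokens.length) {
            subjects.add(join(tokens, stemEnd, tokens.length)); // full subject
        }
        if (ofIndex > stemEnd) {
            subjects.add(join(tokens, stemEnd, ofIndex));       // part before "of"
        }
        if (ofIndex >= 0 && ofIndex + 1 < tokens.length) {
            subjects.add(join(tokens, ofIndex + 1, tokens.length)); // part after "of"
        }

        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("Original_Question", Collections.singletonList(cleaned));
        doc.put("Question_base", bases);
        doc.put("Subject", subjects);
        return doc;
    }

    public static void main(String[] args) {
        // Print each field value on its own line, mirroring the Doc layout above.
        Map<String, List<String>> doc = split("Who is the president of the UN");
        for (Map.Entry<String, List<String>> field : doc.entrySet()) {
            for (String value : field.getValue()) {
                System.out.println(field.getKey() + ": " + value);
            }
        }
    }
}
```

For the example question this yields the same field values as the Doc listing above. Two questions can then be compared field by field: "Who is the president of Kenya" shares the question base "Who is the president of" with the indexed question even though the subjects differ.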


Discussion Overview
group: java-user
category: lucene
posted: Mar 5, '09 at 2:49p
active: Mar 5, '09 at 8:21p
posts: 4
users: 4
website: lucene.apache.org