Hi,

I am using RegexQuery to search a set of records, each of which is a phrase
of several words. My aim is to find any phrase that contains a given
group of letters (e.g. "in"). For that case, I am building the query with
the regular expression ".in.", so it should return all phrases that contain
"in", but the search only matches against the first word of the phrase.

For example, if my records are "Knowing yourself" and "Old clinic", a
correct search would return 2 matches, but it only matches "Knowing
yourself".

How could I fix this?
--
View this message in context: http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23445235.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erick Erickson at May 8, 2009 at 1:14 pm
    I don't understand your regex at all. Isn't it looking for "in" with any
    *single* character in front and back? Given your example, I don't
    see how you're getting anything back at all. Is this code you're
    actually executing or just an example?

    What does toString and/or Explain show? Think about getting
    a copy of Luke and using that tool to show you what actually
    happens.

    Best
    Erick
  • Ian Lea at May 8, 2009 at 1:14 pm
    I'm surprised that it matches either - don't you need ".*in" where .*
    means match any character zero or more times? See the javadoc for
    java.util.regex.Pattern, or for Jakarta Regexp if you are using that
    package.

    Unless you're an expert in regexps it is probably worth playing with
    them outside your lucene code to start with e.g. with simple
    String.matches(regexp) calls. They can take some getting used to.
    And try to avoid anything with backslashes if you can!


    --
    Ian.
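A minimal sketch of the kind of stand-alone experiment suggested above, using only `String.matches()` from the JDK. The class and method names are made up for illustration. Note that `String.matches()` requires the regex to match the whole string, which is stricter than the prefix matching Steven describes later in the thread, so treat this as a way to get a feel for the patterns rather than a prediction of what RegexQuery returns.

```java
class MatchesExperiment {
    // String.matches() succeeds only if the regex matches the ENTIRE string.
    static boolean wholeStringMatch(String text, String regex) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        String[] records = { "Knowing yourself", "Old clinic" };
        String[] regexes = { ".in.", ".*in", ".*in.*" };
        for (String regex : regexes) {
            for (String record : records) {
                // Print each regex/record combination and whether it matched.
                System.out.printf("%-8s vs \"%s\" -> %b%n",
                        regex, record, wholeStringMatch(record, regex));
            }
        }
    }
}
```

With whole-string matching, ".in." can only match a four-character string such as "find", and ".*in" only matches strings that end in "in"; of the three patterns, only ".*in.*" matches both records.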


  • Steven A Rowe at May 8, 2009 at 3:32 pm

    The java.util.regex.Pattern implementation (the default RegexQuery implementation) actually uses Matcher.lookingAt(), which is equivalent to prepending a "^" anchor to the beginning of the pattern, so if Huntsman84 is using the default implementation, then I agree with Ian: I'm surprised it matches either.

    However, the Jakarta Regexp implementation uses RE.match(), which does *not* require a beginning-of-string match.

    Huntsman84, are you using the Jakarta Regexp implementation? If so, then like you, I'm surprised it's not matching both :).

    It would be useful to see some real code, including how you index your records.

    Steve
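The difference between the two matching modes described above can be sketched with the JDK alone. `Matcher.lookingAt()` is the call named in the post. `Matcher.find()` is used here only as a stand-in for an unanchored match such as Jakarta Regexp's `RE.match()`, which is an assumption based on the description above and not a test of Jakarta Regexp itself. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AnchoringDemo {
    // Prefix match: the regex must match starting at index 0 of the text.
    static boolean prefixMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    // Unanchored search: the regex may match anywhere in the text.
    static boolean matchAnywhere(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String regex = ".in.";
        for (String record : new String[] { "Knowing yourself", "Old clinic" }) {
            System.out.printf("\"%s\": lookingAt=%b, find=%b, find with ^=%b%n",
                    record,
                    prefixMatch(regex, record),
                    matchAnywhere(regex, record),
                    matchAnywhere("^" + regex, record));
        }
    }
}
```

For ".in.", the prefix match fails on both records because neither has "in" as its second and third characters, while the unanchored search succeeds on both ("wing" and "lini"). Prepending "^" to the unanchored search gives the same result as `lookingAt()`.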
  • Huntsman84 at May 11, 2009 at 8:01 am
    This is the code for searching:

    String index = "index";
    String field = "contents";
    IndexReader reader = IndexReader.open(index);
    Searcher searcher = new IndexSearcher(reader);

    System.out.println("Enter query: ");
    String line = ".IN."; // in jakarta regexp this is like * IN *
    RegexQuery rxquery = new RegexQuery(new Term(field, line));
    Hits hits = searcher.search(rxquery);

    if (hits != null) {
        for (int k = 0; k < 100 && k < hits.length(); k++) {
            if (hits.doc(k) != null)
                System.out.println(hits.doc(k).getField("contents").stringValue());
        }
    }



    And this is the part of creating the index:


    File directory = new File("index");
    IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
            true, IndexWriter.MaxFieldLength.LIMITED);
    // returns a list of record values from database, all of them are phrases
    List<String> records = getRecords();
    Iterator<String> i = records.iterator();
    while (i.hasNext()) {
        Document doc = new Document();
        doc.add(new Field(field, i.next(), Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();



    This code works as I want, except that it only matches against the first
    word of the phrase. I think the problem is in how the index is built, but
    I don't know how to fix it...

    Any ideas?

    Thank you so much!!



  • Ian Lea at May 11, 2009 at 9:05 am
    The default regex package is java.util.regex and I can't see anywhere
    that you tell it to use the Jakarta regexp package. So I don't think
    that ".in" will match. Also, you are storing your contents field as
    NOT_ANALYZED so you will need to be wary of case sensitivity. Maybe
    this is what you want, but maybe not.


    --
    Ian.
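The case-sensitivity point can be illustrated outside Lucene. Because a NOT_ANALYZED field keeps the original casing, a lowercase pattern will not match an uppercase record unless the regex itself ignores case. The sketch below uses `Pattern.CASE_INSENSITIVE` from java.util.regex; whether and how RegexQuery lets you pass such a flag is not covered in this thread, so this only shows the regex behaviour. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CaseSensitivityDemo {
    // Prefix match, as with Matcher.lookingAt() discussed earlier in the thread.
    static boolean prefixMatch(Pattern pattern, String text) {
        return pattern.matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        Pattern sensitive = Pattern.compile(".*in");
        Pattern insensitive = Pattern.compile(".*in", Pattern.CASE_INSENSITIVE);

        for (String record : new String[] { "Old clinic", "Not INSIDE" }) {
            System.out.printf("\"%s\": case-sensitive=%b, case-insensitive=%b%n",
                    record,
                    prefixMatch(sensitive, record),
                    prefixMatch(insensitive, record));
        }
    }
}
```

The case-sensitive pattern matches "Old clinic" but not "Not INSIDE"; the case-insensitive one matches both.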

  • Huntsman84 at May 11, 2009 at 12:39 pm
    The RegexQuery class uses that package, and for that reason the expression
    matches.

    If my records contained only one word each, this code would work, but I need
    to apply that regular expression to a phrase...


  • Ian Lea at May 11, 2009 at 2:30 pm
    The little self-contained program below runs regex queries for a few
    regexps against a few phrases for both the java.util and jakarta
    regexp packages.

    Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is

    Added Knowing yourself
    Added Old clinic
    Added INSIDE
    Added Not INSIDE

    Default RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

    org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
    0 hits for text:.in
    2 hits for text:.*in
    0 hits for text:.IN
    2 hits for text:.*IN
    org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
    2 hits for text:.in
    2 hits for text:.*in
    1 hits for text:.IN
    2 hits for text:.*IN

    Hope that helps.

    --
    Ian.


    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.search.regex.*;

    public class luctest {

        public static void main(String[] _args) throws Exception {
            RAMDirectory rdir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(), true);
            String[] docterms = { "Knowing yourself",
                                  "Old clinic",
                                  "INSIDE",
                                  "Not INSIDE" };

            // Index each phrase as a single, un-analyzed term.
            for (String s : docterms) {
                Document d = new Document();
                d.add(new Field("text",
                                s,
                                Field.Store.YES,
                                Field.Index.NOT_ANALYZED));
                writer.addDocument(d);
                System.out.printf("Added %s\n", s);
            }
            writer.close();

            IndexSearcher searcher = new IndexSearcher(rdir);
            String[] queries = { ".in", ".*in", ".IN", ".*IN" };
            RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
                                          new JakartaRegexpCapabilities() };
            RegexQuery qx = new RegexQuery(new Term("x", "x"));
            System.out.printf("\nDefault RegexCapabilities=%s\n\n",
                              qx.getRegexImplementation());

            // Run every regex against both regex implementations.
            for (RegexCapabilities rcap : rcaps) {
                System.out.println(rcap);
                for (String s : queries) {
                    Term t = new Term("text", s);
                    RegexQuery q = new RegexQuery(t);
                    q.setRegexImplementation(rcap);
                    Hits h = searcher.search(q);
                    System.out.printf("%s hits for %s\n",
                                      h.length(),
                                      q.toString());
                }
            }
        }
    }
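The hit counts reported above are consistent with the anchoring behaviour described earlier in the thread, and they can be reproduced without Lucene or Jakarta Regexp on the classpath. The sketch below is an approximation under stated assumptions: `Matcher.lookingAt()` models the java.util.regex implementation, and `Matcher.find()` stands in for the unanchored Jakarta Regexp match. The class name and the `count` helper are hypothetical.

```java
import java.util.regex.Pattern;

class HitCountSketch {
    // The same four phrases indexed by the test program above.
    static final String[] DOCS = { "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE" };

    // Counts the phrases matched by the regex, either anchored at the start
    // (lookingAt) or allowed to match anywhere in the phrase (find).
    static int count(String regex, boolean anchoredAtStart) {
        Pattern pattern = Pattern.compile(regex);
        int hits = 0;
        for (String doc : DOCS) {
            boolean matched = anchoredAtStart
                    ? pattern.matcher(doc).lookingAt()
                    : pattern.matcher(doc).find();
            if (matched) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] queries = { ".in", ".*in", ".IN", ".*IN" };
        for (boolean anchored : new boolean[] { true, false }) {
            System.out.println(anchored ? "anchored (lookingAt)" : "unanchored (find)");
            for (String q : queries) {
                System.out.printf("%d hits for %s%n", count(q, anchored), q);
            }
        }
    }
}
```

The anchored counts come out as 0, 2, 0, 2 and the unanchored counts as 2, 2, 1, 2, matching the two blocks of output listed above.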

  • Huntsman84 at May 11, 2009 at 4:06 pm
    That's it!!!

    The problem was with the regular expression; the one I need is ".*IN"!!

    Thank you so much, I was going mad... =)


    Ian Lea wrote:
    The little self-contained program below runs regex queries for a few
    regexps against a few phrases for both the java.util and jakarta
    regexp packages.

    Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is

    Added Knowing yourself
    Added Old clinic
    Added INSIDE
    Added Not INSIDE

    Default
    RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

    org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
    0 hits for text:.in
    2 hits for text:.*in
    0 hits for text:.IN
    2 hits for text:.*IN
    org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
    2 hits for text:.in
    2 hits for text:.*in
    1 hits for text:.IN
    2 hits for text:.*IN

    Hope that helps.

    --
    Ian.


    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.search.regex.*;

    public class luctest {

    public static void main(String[] _args) throws Exception {
    RAMDirectory rdir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(), true);
    String[] docterms = { "Knowing yourself",
    "Old clinic",
    "INSIDE",
    "Not INSIDE" };

    for (String s : docterms) {
    Document d = new Document();
    d.add(new Field("text",
    s,
    Field.Store.YES,
    Field.Index.NOT_ANALYZED));
    writer.addDocument(d);
    System.out.printf("Added %s\n", s);
    }
    writer.close();

    IndexSearcher searcher = new IndexSearcher(rdir);
    String[] queries = { ".in", ".*in", ".IN", ".*IN" };
    RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
    new JakartaRegexpCapabilities() };
    RegexQuery qx = new RegexQuery(new Term("x", "x"));
    System.out.printf("\nDefault RegexCapabilities=%s\n\n",
    qx.getRegexImplementation());
    for (RegexCapabilities rcap : rcaps) {
    System.out.println(rcap);
    for (String s : queries) {
    Term t = new Term("text", s);
    RegexQuery q = new RegexQuery(t);
    q.setRegexImplementation(rcap);
    Hits h = searcher.search(q);
    System.out.printf("%s hits for %s\n",
    h.length(),
    q.toString());
    }
    }
    }
    }

    On Mon, May 11, 2009 at 1:39 PM, Huntsman84 wrote:

    The RegexQuery class uses that package, and for that reason the
    expression
    matches.

    If my records contained only one word each, this code would work, but I
    need
    to apply that regular expression to a phrase...


    Ian Lea wrote:
    The default regex package is java.util.regex and I can't see anywhere
    that you tell it to use the Jakarta regexp package.  So I don't think
    that ".in" will match.  Also, you are storing your contents field as
    NOT_ANALYZED so you will need to be wary of case sensitivity.  Maybe
    this is what you want, but maybe not.


    --
    Ian.


    On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarcia84@gmail.com>
    wrote:
    This is the code for searching:

    String index = "index";
    String field = "contents";
    IndexReader reader = IndexReader.open(index);
    Searcher searcher = new IndexSearcher(reader);

    System.out.println("Enter query: ");
    String line = ".IN.";//in jakarta regexp this is like * IN *
    RegexQuery rxquery = new RegexQuery(new Term(field,line));
    Hits hits = searcher.search(rxquery);

    if(hits!=null){
    for(int k = 0; k<100 && k<hits.length(); k++){
    if(hits.doc(k)!=null)

    System.out.println(hits.doc(k).getField("contents").stringValue());
    }
    }



    And this is the part of creating the index:


    File directory = new File("index");
    IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
            true, IndexWriter.MaxFieldLength.LIMITED);
    // returns a list of record values from database, all of them are phrases
    List<String> records = getRecords();
    Iterator<String> i = records.iterator();
    while (i.hasNext()) {
        Document doc = new Document();
        doc.add(new Field(field, i.next(), Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();



    This code works as I want, but it only matches the first word of the
    phrase. I think the problem is in how the index is built, but I don't
    know how to fix it...

    Any ideas?

    Thank you so much!!



    Steven A Rowe wrote:
    On 5/8/2009 at 9:13 AM, Ian Lea wrote:
    I'm surprised that it matches either - don't you need ".*in" where .*
    means match any character zero or more times?  See the javadoc for
    java.util.regex.Pattern, or for Jakarta Regexp if you are using that
    package.

    Unless you're an expert in regexps it is probably worth playing with
    them outside your lucene code to start with e.g. with simple
    String.matches(regexp) calls.  They can take some getting used to.
    And try to avoid anything with backslashes if you can!

    The java.util.regex.Pattern implementation (the default RegexQuery
    implementation) actually uses Matcher.lookingAt(), which is equivalent
    to prepending a "^" anchor to the beginning of the pattern, so if
    Huntsman84 is using the default implementation, then I agree with Ian:
    I'm surprised it matches either.

    However, the Jakarta Regexp implementation uses RE.match(), which does
    *not* require a beginning-of-string match.

    Huntsman84, are you using the Jakarta Regexp implementation?  If so,
    then like you, I'm surprised it's not matching both :).

    It would be useful to see some real code, including how you index your
    records.

    Steve
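The anchoring difference Steve describes can be reproduced with nothing but the JDK. In the sketch below, Matcher.lookingAt() stands for the default java.util.regex behaviour, and Matcher.find() is used as a stand-in for an unanchored match such as the one he attributes to Jakarta Regexp's RE.match(). That substitution is an assumption made for illustration, and the class and method names are invented for this example. The four phrases and four patterns are the ones used in Ian's test program.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: counts how many whole phrases a pattern hits when
// the match must start at index 0 (lookingAt) versus anywhere (find).
public class AnchoringDemo {

    static final String[] PHRASES = {
        "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE"
    };

    // Anchored at the start of the phrase, like the default implementation.
    public static int countPrefixMatches(String regex) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String phrase : PHRASES) {
            if (p.matcher(phrase).lookingAt()) {
                hits++;
            }
        }
        return hits;
    }

    // Unanchored: the pattern may match anywhere inside the phrase.
    public static int countUnanchoredMatches(String regex) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String phrase : PHRASES) {
            if (p.matcher(phrase).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String regex : new String[] { ".in", ".*in", ".IN", ".*IN" }) {
            System.out.printf("%-5s lookingAt=%d find=%d%n",
                              regex,
                              countPrefixMatches(regex),
                              countUnanchoredMatches(regex));
        }
    }
}
```

With lookingAt() the counts are 0, 2, 0, 2 and with find() they are 2, 2, 1, 2, which agrees with the hit counts Ian reported for the two RegexCapabilities implementations. Because the field is NOT_ANALYZED, each whole phrase is a single term, so under the default implementation a leading ".*" is what lets the pattern reach text in later words.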
  • Seid Mohammed at May 12, 2009 at 2:25 pm
    I need similar functionality, but when I run the above code it
    fails after printing the following output:
    ========================================================================
    Added Knowing yourself
    Added Old clinic
    Added INSIDE
    Added Not INSIDE

    Default RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

    org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
    0 hits for text:.in
    2 hits for text:.*in
    0 hits for text:.IN
    2 hits for text:.*IN
    org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/regexp/RE
    at org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
    at org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
    at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
    at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
    at org.apache.lucene.search.Query.weight(Query.java:94)
    at org.apache.lucene.search.Hits.<init>(Hits.java:76)
    at org.apache.lucene.search.Searcher.search(Searcher.java:50)
    at org.apache.lucene.search.Searcher.search(Searcher.java:40)
    at Regex2.main(Regex2.java:43)
    Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    ... 10 more
    ===================================================================

    thanks a lot
    On 5/11/09, Huntsman84 wrote:

    That's it!!!

    The problem was with the regular expression, the one I need is ".*IN"!!

    Thank you so much, I was turning mad... =)


    Ian Lea wrote:
    The little self-contained program below runs regex queries for a few
    regexps against a few phrases for both the java.util and jakarta
    regexp packages.

    Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is

    Added Knowing yourself
    Added Old clinic
    Added INSIDE
    Added Not INSIDE

    Default RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

    org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
    0 hits for text:.in
    2 hits for text:.*in
    0 hits for text:.IN
    2 hits for text:.*IN
    org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
    2 hits for text:.in
    2 hits for text:.*in
    1 hits for text:.IN
    2 hits for text:.*IN

    Hope that helps.

    --
    Ian.


    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.search.regex.*;

    public class luctest {

        public static void main(String[] _args) throws Exception {
            RAMDirectory rdir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(), true);
            String[] docterms = { "Knowing yourself",
                                  "Old clinic",
                                  "INSIDE",
                                  "Not INSIDE" };

            for (String s : docterms) {
                Document d = new Document();
                d.add(new Field("text",
                                s,
                                Field.Store.YES,
                                Field.Index.NOT_ANALYZED));
                writer.addDocument(d);
                System.out.printf("Added %s\n", s);
            }
            writer.close();

            IndexSearcher searcher = new IndexSearcher(rdir);
            String[] queries = { ".in", ".*in", ".IN", ".*IN" };
            RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
                                          new JakartaRegexpCapabilities() };
            RegexQuery qx = new RegexQuery(new Term("x", "x"));
            System.out.printf("\nDefault RegexCapabilities=%s\n\n",
                              qx.getRegexImplementation());
            for (RegexCapabilities rcap : rcaps) {
                System.out.println(rcap);
                for (String s : queries) {
                    Term t = new Term("text", s);
                    RegexQuery q = new RegexQuery(t);
                    q.setRegexImplementation(rcap);
                    Hits h = searcher.search(q);
                    System.out.printf("%s hits for %s\n",
                                      h.length(),
                                      q.toString());
                }
            }
        }
    }

    --
    "RABI ZIDNI ILMA"

  • Ian Lea at May 12, 2009 at 2:35 pm
    The Jakarta regexp package is a separate download, and its jar needs
    to be on your classpath: http://jakarta.apache.org/regexp/index.html

    --
    Ian.

  • Mark Miller at May 12, 2009 at 2:35 pm
    Use JavaUtilRegexCapabilities or put the Jakarta Regexp jar on your
    classpath: http://jakarta.apache.org/regexp/index.html

    --
    - Mark

    http://www.lucidimagination.com
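A quick way to confirm which situation applies is to ask the class loader for the class named in the stack trace, org.apache.regexp.RE, before constructing JakartaRegexpCapabilities. The following is a minimal sketch using only the JDK; the class and method names are made up for this example and are not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: reports whether a class can be
// loaded from the current classpath without initializing it.
public class ClasspathCheck {

    public static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate and load the class, but do not run its
            // static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String jakartaRe = "org.apache.regexp.RE";
        if (isOnClasspath(jakartaRe)) {
            System.out.println(jakartaRe + " found: JakartaRegexpCapabilities should load.");
        } else {
            System.out.println(jakartaRe + " missing: add the Jakarta Regexp jar"
                    + " to the classpath or use JavaUtilRegexCapabilities.");
        }
    }
}
```

Run it with the same -cp setting as the failing program; if it reports the class as missing, the Jakarta Regexp jar is not on that classpath.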



    Seid Mohammed wrote:
    I need it similar functionality, but while running the above code it
    breaks after outputing the following
    ========================================================================
    Added Knowing yourself
    Added Old clinic
    Added INSIDE
    Added Not INSIDE

    Default RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

    org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
    0 hits for text:.in
    2 hits for text:.*in
    0 hits for text:.IN
    2 hits for text:.*IN
    org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/regexp/RE
    at org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
    at org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
    at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
    at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
    at org.apache.lucene.search.Query.weight(Query.java:94)
    at org.apache.lucene.search.Hits.<init>(Hits.java:76)
    at org.apache.lucene.search.Searcher.search(Searcher.java:50)
    at org.apache.lucene.search.Searcher.search(Searcher.java:40)
    at Regex2.main(Regex2.java:43)
    Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    ... 10 more
    ===================================================================

    thanks a lot
    On 5/11/09, Huntsman84 wrote:

    That's it!!!

    The problem was with the regular expression, the one I need is ".*IN"!!

    Thank you so much, I was turning mad... =)


    Ian Lea wrote:
    The little self-contained program below runs regex queries for a few
    regexps against a few phrases for both the java.util and jakarta
    regexp packages.

    Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is

    Added Knowing yourself
    Added Old clinic
    Added INSIDE
    Added Not INSIDE

    Default
    RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

    org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
    0 hits for text:.in
    2 hits for text:.*in
    0 hits for text:.IN
    2 hits for text:.*IN
    org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
    2 hits for text:.in
    2 hits for text:.*in
    1 hits for text:.IN
    2 hits for text:.*IN

    Hope that helps.

    --
    Ian.


    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.search.regex.*;

    public class luctest {

    public static void main(String[] _args) throws Exception {
    RAMDirectory rdir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(), true);
    String[] docterms = { "Knowing yourself",
    "Old clinic",
    "INSIDE",
    "Not INSIDE" };

    for (String s : docterms) {
    Document d = new Document();
    d.add(new Field("text",
    s,
    Field.Store.YES,
    Field.Index.NOT_ANALYZED));
    writer.addDocument(d);
    System.out.printf("Added %s\n", s);
    }
    writer.close();

    IndexSearcher searcher = new IndexSearcher(rdir);
    String[] queries = { ".in", ".*in", ".IN", ".*IN" };
    RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
    new JakartaRegexpCapabilities() };
    RegexQuery qx = new RegexQuery(new Term("x", "x"));
    System.out.printf("\nDefault RegexCapabilities=%s\n\n",
    qx.getRegexImplementation());
    for (RegexCapabilities rcap : rcaps) {
    System.out.println(rcap);
    for (String s : queries) {
    Term t = new Term("text", s);
    RegexQuery q = new RegexQuery(t);
    q.setRegexImplementation(rcap);
    Hits h = searcher.search(q);
    System.out.printf("%s hits for %s\n",
    h.length(),
    q.toString());
    }
    }
    }
    }

    On Mon, May 11, 2009 at 1:39 PM, Huntsman84 wrote:

    The RegexQuery class uses that package, and for that reason the
    expression
    matches.

    If my records contained only one word each, this code would work, but I
    need
    to apply that regular expression to a phrase...


    Ian Lea wrote:
    The default regex package is java.util.regex and I can't see anywhere
    that you tell it to use the Jakarta regexp package. So I don't think
    that ".in" will match. Also, you are storing your contents field as
    NOT_ANALYZED so you will need to be wary of case sensitivity. Maybe
    this is what you want, but maybe not.


    --
    Ian.


    On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarcia84@gmail.com> wrote:
    This is the code for searching:

    String index = "index";
    String field = "contents";
    IndexReader reader = IndexReader.open(index);
    Searcher searcher = new IndexSearcher(reader);

    System.out.println("Enter query: ");
    String line = ".IN."; // in jakarta regexp this is like * IN *
    RegexQuery rxquery = new RegexQuery(new Term(field, line));
    Hits hits = searcher.search(rxquery);

    if (hits != null) {
        for (int k = 0; k < 100 && k < hits.length(); k++) {
            if (hits.doc(k) != null)
                System.out.println(hits.doc(k).getField("contents").stringValue());
        }
    }



    And this is the part of creating the index:


    File directory = new File("index");
    IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
            true, IndexWriter.MaxFieldLength.LIMITED);
    // returns a list of record values from database, all of them are phrases
    List<String> records = getRecords();
    Iterator<String> i = records.iterator();
    while (i.hasNext()) {
        Document doc = new Document();
        doc.add(new Field(field, i.next(), Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();



    This code works as I want, but it only matches the first word of the
    phrase. I think the problem is in how the index is built, but I don't
    know how to fix it...

    Any ideas?

    Thank you so much!!



    Steven A Rowe wrote:
    On 5/8/2009 at 9:13 AM, Ian Lea wrote:

    I'm surprised that it matches either - don't you need ".*in" where .*
    means match any character zero or more times? See the javadoc for
    java.util.regex.Pattern, or for Jakarta Regexp if you are using that
    package.

    Unless you're an expert in regexps it is probably worth playing with
    them outside your lucene code to start with e.g. with simple
    String.matches(regexp) calls. They can take some getting used to.
    And try to avoid anything with backslashes if you can!

    The java.util.regex.Pattern implementation (the default RegexQuery
    implementation) actually uses Matcher.lookingAt(), which is equivalent
    to prepending a "^" anchor to the beginning of the pattern, so if
    Huntsman84 is using the default implementation, then I agree with Ian:
    I'm surprised it matches either.

    However, the Jakarta Regexp implementation uses RE.match(), which does
    *not* require a beginning-of-string match.

    Huntsman84, are you using the Jakarta Regexp implementation? If so,
    then like you, I'm surprised it's not matching both :).

    It would be useful to see some real code, including how you index your
    records.

    Steve
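
The difference described above can be reproduced without Lucene. This is a minimal sketch using only java.util.regex: lookingAt() anchors at the start of the string, matches() must consume the whole string, and find() searches anywhere. Here find() is only a stand-in for the unanchored matching attributed to Jakarta Regexp above; it is not taken from the Lucene source, and the class and helper names are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares anchored and unanchored matching.
class RegexAnchoringDemo {

    // Anchored at the start only (the behaviour described for the default implementation).
    static boolean lookingAt(String regex, String s) {
        return Pattern.compile(regex).matcher(s).lookingAt();
    }

    // Must match the entire string (same as String.matches).
    static boolean matches(String regex, String s) {
        return s.matches(regex);
    }

    // Unanchored: succeeds if the regex matches anywhere in the string.
    static boolean find(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String[] terms = { "Knowing yourself", "Old clinic" };
        String[] regexes = { ".in.", ".*in", ".*in.*" };
        for (String regex : regexes) {
            for (String term : terms) {
                System.out.printf("%-7s %-18s lookingAt=%-5b matches=%-5b find=%b%n",
                        regex, term, lookingAt(regex, term),
                        matches(regex, term), find(regex, term));
            }
        }
    }
}
```

With these two phrases, ".in." succeeds only under find(), ".*in" succeeds under lookingAt() but not matches(), and ".*in.*" succeeds under all three.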

  • Seid Mohammed at May 12, 2009 at 2:47 pm
    Thanks
    it works

    On 5/12/09, Mark Miller wrote:
    Use JavaUtilRegexCapabilities or put the Jakarta Regexp jar on your
    classpath: http://jakarta.apache.org/regexp/index.html

    --
    - Mark

    http://www.lucidimagination.com



    Seid Mohammed wrote:
    I need similar functionality, but when I run the above code it
    breaks after outputting the following:
    ========================================================================
    Added Knowing yourself
    Added Old clinic
    Added INSIDE
    Added Not INSIDE

    Default RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

    org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
    0 hits for text:.in
    2 hits for text:.*in
    0 hits for text:.IN
    2 hits for text:.*IN
    org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/regexp/RE
    at org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
    at org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
    at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
    at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
    at org.apache.lucene.search.Query.weight(Query.java:94)
    at org.apache.lucene.search.Hits.<init>(Hits.java:76)
    at org.apache.lucene.search.Searcher.search(Searcher.java:50)
    at org.apache.lucene.search.Searcher.search(Searcher.java:40)
    at Regex2.main(Regex2.java:43)
    Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    ... 10 more
    ===================================================================

    thanks a lot
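
The NoClassDefFoundError in the trace above names org/apache/regexp/RE, a class from the Jakarta Regexp library that was not on the classpath at run time. One way to guard against this, sketched below with plain JDK calls, is to probe for the class before choosing an implementation. The class name and the isClassAvailable helper are invented for this example.

```java
// Hypothetical helper: checks whether an optional dependency is on the classpath.
class RegexpClasspathCheck {

    // Returns true if the named class can be located, without initializing it.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, RegexpClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.apache.regexp.RE is the class named in the stack trace.
        if (isClassAvailable("org.apache.regexp.RE")) {
            System.out.println("Jakarta Regexp found: JakartaRegexpCapabilities should be usable.");
        } else {
            System.out.println("Jakarta Regexp missing: stay with JavaUtilRegexCapabilities "
                    + "or add the jar to the classpath.");
        }
    }
}
```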
    On 5/11/09, Huntsman84 wrote:

    That's it!!!

    The problem was with the regular expression, the one I need is ".*IN"!!

    Thank you so much, I was turning mad... =)


    Ian Lea wrote:
    The little self-contained program below runs regex queries for a few
    regexps against a few phrases for both the java.util and jakarta
    regexp packages.

    Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is

    Added Knowing yourself
    Added Old clinic
    Added INSIDE
    Added Not INSIDE

    Default RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

    org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
    0 hits for text:.in
    2 hits for text:.*in
    0 hits for text:.IN
    2 hits for text:.*IN
    org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
    2 hits for text:.in
    2 hits for text:.*in
    1 hits for text:.IN
    2 hits for text:.*IN

    Hope that helps.

    --
    Ian.


    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.search.regex.*;

    public class luctest {

        public static void main(String[] _args) throws Exception {
            // Build a small in-memory index, one untokenized term per phrase.
            RAMDirectory rdir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(), true);
            String[] docterms = { "Knowing yourself",
                                  "Old clinic",
                                  "INSIDE",
                                  "Not INSIDE" };

            for (String s : docterms) {
                Document d = new Document();
                d.add(new Field("text",
                                s,
                                Field.Store.YES,
                                Field.Index.NOT_ANALYZED));
                writer.addDocument(d);
                System.out.printf("Added %s\n", s);
            }
            writer.close();

            IndexSearcher searcher = new IndexSearcher(rdir);
            String[] queries = { ".in", ".*in", ".IN", ".*IN" };
            RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
                                          new JakartaRegexpCapabilities() };

            // Report which implementation RegexQuery uses by default.
            RegexQuery qx = new RegexQuery(new Term("x", "x"));
            System.out.printf("\nDefault RegexCapabilities=%s\n\n",
                              qx.getRegexImplementation());

            // Run every regex against the index with each implementation.
            for (RegexCapabilities rcap : rcaps) {
                System.out.println(rcap);
                for (String s : queries) {
                    Term t = new Term("text", s);
                    RegexQuery q = new RegexQuery(t);
                    q.setRegexImplementation(rcap);
                    Hits h = searcher.search(q);
                    System.out.printf("%s hits for %s\n",
                                      h.length(),
                                      q.toString());
                }
            }
        }
    }



    --
    "RABI ZIDNI ILMA"


Discussion Overview
group: java-user @
categories: lucene
posted: May 8, '09 at 12:43p
active: May 12, '09 at 2:47p
posts: 13
users: 6
website: lucene.apache.org
