128 discussions - 684 posts

  • Hi, I am currently indexing uploaded documents (PDF, MS Word, etc.). These documents can be searched, and what the search returns to the user are summaries of the documents. Currently the ...
    Amin Mohammed-Coleman
    Mar 7, 2009 at 9:39 am
    Mar 13, 2009 at 2:20 pm
  • I've been building a large index (hundreds of millions) with mainly structured data which consists of several fields with mostly unique values. I've been hitting out of memory issues when doing ...
    Mark harwood
    Mar 9, 2009 at 10:45 am
    Mar 14, 2009 at 1:52 pm
  • Hi, when is Lucene 2.9 due? I am eagerly waiting for the new Lucene release. Comparing Lucene with Minion, I think Minion offers very rich capabilities, such as easier range queries. Is there any ...
    Allahbaksh Mohammedali Asadullah
    Mar 9, 2009 at 9:42 am
    Mar 17, 2009 at 9:14 am
  • Hi guys: the IndexWriter.deleteDocuments(Query query) API does not really make sense to me. Wouldn't IndexWriter.deleteDocuments(DocIdSet set) be better? Since we don't really care about scoring for this ...
    John Wang
    Mar 31, 2009 at 7:42 pm
    Apr 2, 2009 at 7:36 pm
  • I'm overriding MergePolicy, which is public; however, SegmentInfos is package-protected, which means the MergePolicy subclass must be in the org.apache.lucene.index package. Can we make SegmentInfos ...
    Jason Rutherglen
    Mar 24, 2009 at 7:08 pm
    Mar 27, 2009 at 8:48 pm
  • I am running a simple search, and after profiling my application using NetBeans I see constant heap consumption and eventually a server (Tomcat) crash due to an "out of memory" error. The ...
    Chetan Shah
    Mar 23, 2009 at 5:19 pm
    Mar 26, 2009 at 4:21 pm
  • So, I have a (small) Lucene index, all fine; I use it a bit, and then (on app shutdown) want to delete its files and the containing directory (the index is intended as a temp object). At some earlier ...
    Mar 6, 2009 at 2:20 am
    Mar 8, 2009 at 2:07 pm
  • Hi, I have scores between words; for example, dog and animal have a score of 0.5 (and not 0), dog and cat have a score of 0.2, etc. These scores are stored in an index: Doc1: field words: dog animal ...
    Liat oren
    Mar 8, 2009 at 9:49 am
    Mar 27, 2009 at 10:57 am
  • Hello, I need the number of pages that contain two terms. I only need the number of hits; I don't care about retrieving the pages. Right now I am using the following code to get it: Term first, ...
    Adrian Dimulescu
    Mar 16, 2009 at 10:26 am
    Mar 18, 2009 at 12:55 am
  • While indexing using contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker, an assertion error is thrown from TermsHashPerField.comparePostings(RawPostingList p1, RawPostingList p2). A Payload is ...
    Jason Rutherglen
    Mar 24, 2009 at 6:02 pm
    Mar 29, 2009 at 2:05 pm
  • Hi, Lucene query results are sorted by tf*idf. What can I do to rank the results only by the number of matching words? Thanks.
    Mar 9, 2009 at 2:53 am
    Mar 11, 2009 at 7:25 pm
  • Hi, I am trying to run performance tests against Lucene, and am surprised by the results. I have a test that creates a queue of queries and a number of threads. The threads run concurrently ...
    Paul Taylor
    Mar 27, 2009 at 11:07 am
    Apr 1, 2009 at 10:04 am
  • Hi, I would like to do a search that will return documents that contain a given word. For example, I created the following index: IndexWriter writer = new IndexWriter("C:/TryIndex", new ...
    Liat oren
    Mar 5, 2009 at 1:40 pm
    Mar 9, 2009 at 12:50 pm
  • Hi, what is the optimal way to find all the documents that contain a particular field? For example, I want to find all the documents in which the field text is not null.
    Allahbaksh Mohammedali Asadullah
    Mar 4, 2009 at 5:43 am
    Mar 5, 2009 at 1:26 am
  • First of all, I'm new to Lucene. I'm currently experimenting with it in combination with Hibernate Search. What I'm wondering is if I can index numbers related to i18n. E.g. I have a Book entity ...
    Marcel Overdijk
    Mar 26, 2009 at 9:32 pm
    Mar 27, 2009 at 2:16 pm
  • Hi all, has anybody tried to use LSI (latent semantic indexing) for indexing in Lucene?
    Nitin gopi
    Mar 18, 2009 at 6:09 am
    Apr 27, 2009 at 11:33 am
  • The DefaultSimilarity class defines sloppyFreq as: public float sloppyFreq(int distance) { return 1.0f / (distance + 1); } For a 'SpanNearQuery', this reduces the effect of the term frequency on the ...
    Peter Keegan
    Mar 3, 2009 at 7:43 pm
    Apr 3, 2009 at 9:37 pm
  • Hi guys, I'm using a SinkTokenizer to collect some terms of the documents while doing the main document indexing. I attached it to a specific field (tokenized, indexed). writer = new ...
    Raymond Balmès
    Mar 28, 2009 at 5:36 pm
    Mar 31, 2009 at 1:43 pm
  • I'm new to Lucene and just beginning my project of adding it to our web app. We are indexing data from an MS SQL 2000 database and building full-text search on it. Everything I have read says that ...
    Matt Schraeder
    Mar 26, 2009 at 9:27 pm
    Mar 27, 2009 at 4:17 pm
  • I'm using Lucene 2.4.1 and I'm still getting an AlreadyClosedException when trying to reopen an IndexReader. Here's the code I'm using, in case I'm doing something wrong. There isn't an error if I ...
    Chris Salem
    Mar 19, 2009 at 5:50 pm
    Mar 20, 2009 at 3:42 pm
  • Hi, I'm trying to implement pagination for my search project. I've been googling for a solution, so far with no luck. I've seen implementations of HitCollector which look promising, however ...
    Amin Mohammed-Coleman
    Mar 15, 2009 at 6:13 am
    Mar 19, 2009 at 6:01 pm
  • We are having a problem running searches on an index after upgrading to 2.4 and using the new Field.setOmitTf() function. The index size has been dramatically reduced and even the search performance ...
    Siraj Haider
    Mar 11, 2009 at 2:20 pm
    Mar 13, 2009 at 10:40 am
  • I am trying to evaluate whether Lucene is the right candidate for the problem at hand. Say I have 3 indexes: Index 1 has street names. Index 2 has business names. Index 3 has area names. All ...
    Srinivas Bharghav
    Mar 6, 2009 at 6:25 am
    Mar 9, 2009 at 12:54 pm
  • Hi all, we have used Lucene as our search engine, and all our applications are deployed on Tomcat, running with a thread pool size of 200. Java version: 1.6.0-rc; Lucene version: 2.3.2; Tomcat ...
    Mar 3, 2009 at 1:14 pm
    Mar 6, 2009 at 2:20 pm
  • Hi all, the range query only works on fields (using a string compare). Is there any reason why it is not possible on the words of the document? The following query [stringa TO stringb] would just ...
    Raymond Balmès
    Mar 3, 2009 at 5:04 pm
    Mar 4, 2009 at 9:14 pm
  • Hi, I'm fairly new to Lucene. I'd like to know how we can index synonyms for multiple words. This is the scenario: Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG. Now assume the two words ...
    Mar 2, 2009 at 2:25 pm
    Mar 3, 2009 at 3:41 pm
  • Hi, I've built some file-based indexes from data in a database, and it took quite some time. I am interested in trying RAM-based indexes instead of file-based indexes to compare search ...
    Paul Taylor
    Mar 24, 2009 at 8:43 am
    Mar 25, 2009 at 6:50 am
  • Hello all, 1) Which is better to use, the Snowball analyzer or the Lucene contrib analyzer? Is there no built-in stop word list for the Snowball analyzer? 2) Are Analyzer and QueryParser thread-safe? They ...
    Mar 6, 2009 at 6:45 am
    Mar 11, 2009 at 3:49 pm
  • Hi, are you aware of any free software for language detection (given some text, determine whether it is French or Japanese, say)? I saw Bob Carpenter's previous mail, which explained the principle nicely, but ...
    Zhang, Lisheng
    Mar 27, 2009 at 4:55 pm
    Jul 7, 2009 at 12:38 am
  • I am working on a project that is already using Lucene (through Hibernate Search) to perform full text queries and have since come across several sites with information about LocalLucene/Lucene ...
    Jamie Johnson
    Mar 18, 2009 at 4:06 pm
    Apr 20, 2009 at 2:19 pm
  • Just ran into this. I'm using Lucene 2.4 in the following manner: 1. Open IndexWriter 2. Add documents 3. Delete documents 4. Close IndexWriter I haven't touched the out-of-the-box settings WRT ...
    Jeremy Volkman
    Mar 26, 2009 at 1:22 am
    Mar 26, 2009 at 8:06 pm
  • Hello, I'm using Lucene 2.3.2 and had no problems until now. But now I have a corrupt index: when searching, a java.lang.OutOfMemoryError is thrown. I've written the following test program: private ...
    René Zöpnek
    Mar 23, 2009 at 10:43 am
    Mar 24, 2009 at 9:00 pm
  • Hi all, apologies if this question is off-topic, but I was wondering if there is a way of leveraging Lucene (or another mechanism) to store information about connections and recommend People you ...
    Aaron Schon
    Mar 17, 2009 at 1:33 pm
    Mar 24, 2009 at 2:47 pm
  • Hi, my code receives a search query from the web. There are 5 different searches that can be searched on, and each index is searched with a single IndexSearcher referenced in a map. It parses then ...
    Paul Taylor
    Mar 20, 2009 at 5:02 pm
    Mar 20, 2009 at 6:12 pm
  • I want to index Person_Name and the associated phone number. Example: Abebe === +2519112332. Later, when I search for Abebe, it should display +2519112332. Any hints?
    Seid Mohammed
    Mar 15, 2009 at 10:56 am
    Mar 15, 2009 at 1:38 pm
  • Hi Lucene professionals! This may sound like a dumb beginner's question, but anyways: Can Lucene run out of memory during indexing? Should I use IndexWriter.flush() or .commit(), and if so, how ...
    Niels Ott
    Mar 11, 2009 at 5:47 pm
    Mar 12, 2009 at 5:10 pm
  • Hi, I've been trying to find a way to execute a query that contains both tokenized and untokenized fields against Lucene's index without having to parse the query. I've been able to execute a ...
    Mar 9, 2009 at 3:02 pm
    Mar 12, 2009 at 3:33 am
  • I'm not getting anything when I go to http://www.getopt.org/luke/, or http://www.getopt.org. Does anyone know how long the site is expected to be down and is there an alternate download location for ...
    Ruslan Sivak
    Mar 4, 2009 at 5:08 pm
    Mar 4, 2009 at 8:01 pm
  • Can we have an API that exposes index information, e.g. the number of segments? (Or simply make SegmentInfo(s) public classes.) We currently do this by working around the package-level protection by ...
    John Wang
    Mar 31, 2009 at 7:44 pm
    Apr 1, 2009 at 2:23 am
  • Hello Lucene users, we have all our XML documents stored in a MarkLogic content management system. Is there a recommended approach to indexing these documents with Lucene?
    Shah, Yagnesh
    Mar 30, 2009 at 2:46 pm
    Mar 30, 2009 at 4:00 pm
  • I have an application clustered on two servers. Is it best practice to have two Lucene indexes, one on each server for the app, or is it best to have one index (on one physical path) which can be ...
    Mar 25, 2009 at 4:19 pm
    Mar 26, 2009 at 4:24 pm
  • Hi all, I want Lucene to index documents while giving some terms a higher boost value. So, if I index the document "The quick fox jumps over the lazy dog", I want the terms fox and dog to have ...
    Seid Mohammed
    Mar 24, 2009 at 12:01 pm
    Mar 25, 2009 at 9:11 am
  • Is there an "elegant" approach to partitioning a large Lucene index (~1TB) into smaller sub-indexes other than the obvious method of re-indexing into partitions? Any ideas? Thanks, Shashi
    Shashi Kant
    Mar 21, 2009 at 8:30 pm
    Mar 25, 2009 at 7:06 am
  • Hi, I am using Lucene 2.4 in our project, with FSDirectory to store the index. Whenever the index is updated, the first search is very slow. I am using the combination of CustomScoreQuery and ...
    Mar 23, 2009 at 4:53 pm
    Mar 23, 2009 at 6:23 pm
  • Hello all, I have a search application that uses lucene-2.3.0, running in a banking domain. I am indexing some banking URLs as input and searching for some keywords. What my doubt ...
    Mar 19, 2009 at 12:54 pm
    Mar 19, 2009 at 2:50 pm
  • Hi, I edited Luke's code so it also uses my classes (I added the jar to the classpath and put it in the lib folder). When I run it from Java it works well. Now I am trying to build it and invoke Luke's jar ...
    Liat oren
    Mar 17, 2009 at 10:21 am
    Mar 17, 2009 at 11:19 am
  • Hello Niels, you cannot use the trie package with the current stable Lucene release. To compile, you must also apply LUCENE-1478 to the core. Another option is to check out trie and remove the SortField and ...
    Uwe Schindler
    Mar 14, 2009 at 4:05 pm
    Mar 14, 2009 at 10:59 pm
  • Hi all, I'm working on my prototype system and it turns out that RangeQueries are quite slow. In a first test I have about 80,000 documents in my index and I combine two range queries with a normal ...
    Niels Ott
    Mar 14, 2009 at 12:38 pm
    Mar 14, 2009 at 3:58 pm
  • I wrote a really basic read-only Directory implementation for indices contained in zip files. It's read-only because that's what Java's API supports, and it has no documentation or anything else ...
    Mar 6, 2009 at 10:03 pm
    Mar 9, 2009 at 5:57 pm
  • Hi all, I'm not able to see what's wrong in the following sample code. I'm indexing a document with 5 fields, using five different indexing strategies. I'm fine with the results for 4 of them, but ...
    John Marks
    Mar 5, 2009 at 3:18 pm
    Mar 6, 2009 at 11:52 am
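The DefaultSimilarity.sloppyFreq definition quoted in the Peter Keegan thread above is simple enough to evaluate by hand. As a standalone sketch (plain Java, not Lucene code; the class name SloppyFreqDemo is hypothetical), the snippet below reproduces the same arithmetic to show how quickly a sloppy match's weight decays as its distance grows:

```java
/**
 * Standalone illustration (not Lucene code) of the sloppyFreq formula
 * quoted from DefaultSimilarity: 1.0f / (distance + 1).
 * The class name SloppyFreqDemo is hypothetical.
 */
public class SloppyFreqDemo {

    // Same arithmetic as the quoted DefaultSimilarity.sloppyFreq(int distance).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // A match at distance 0 counts fully; each extra position of slop
        // shrinks its contribution to the sloppy term frequency.
        for (int distance = 0; distance <= 4; distance++) {
            System.out.printf("distance=%d -> sloppyFreq=%.4f%n",
                    distance, sloppyFreq(distance));
        }
    }
}
```

With this formula a match at distance 1 contributes half as much as a match at distance 0, and a match at distance 3 contributes a quarter, which is the damping effect on term frequency that the thread discusses for SpanNearQuery.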
Group Overview
Group: java-user@
Period: Mar 2009

135 users for March 2009

Most active posters:
  • Michael McCandless: 106 posts
  • Amin Mohammed-Coleman: 49 posts
  • Erick Erickson: 41 posts
  • Grant Ingersoll: 28 posts
  • Uwe Schindler: 22 posts
  • Liat oren: 19 posts
  • Ian Lea: 16 posts
  • Mark harwood: 16 posts
  • Raymond Balmès: 16 posts
  • Chris Hostetter: 14 posts
  • Yonik Seeley: 14 posts
  • Chetan Shah: 11 posts
  • Ganesh: 10 posts
  • Paul Libbrecht: 10 posts
  • Allahbaksh Mohammedali Asadullah: 8 posts
  • Marvin Humphrey: 8 posts
  • Shashi Kant: 8 posts
  • Chris Lu: 7 posts
  • Erik Hatcher: 7 posts
  • Jason Rutherglen: 7 posts