86 discussions - 424 posts

  • I occasionally get a FileNotFoundException like: Exception in thread "Thread-44" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /Stuff/Caches/ ...
    Paul J. Lucas
    May 30, 2008 at 12:06 am
    Jul 6, 2008 at 7:35 pm
  • Any recent changes that would expose index corruption? I am getting two new errors when trying to search: nullpointer fieldsreaders line 260 indexoutofbounds on fieldinfo line 185 I am kind of ...
    Mark Miller
    May 5, 2008 at 7:34 pm
    May 6, 2008 at 9:30 am
  • Hi all, I have an index of size 85MB. My query looks as follows: +(t:boss* d:boss* dd:boss* tg:boss*) +st:act +ntid:0 +cid:1 +dr:[20080410 TO 20081010] +rT:[002 TO 005] All the fields used in the query ...
    Rakesh Shete
    May 22, 2008 at 5:16 pm
    May 30, 2008 at 9:19 pm
  • Hi Lucene experts: I am working on upgrading the Lucene-Oracle integration project to the latest Lucene 2.3.1 code. After correcting a minor issue on OJVMDirectory file implementation I have the integration ...
    Marcelo Ochoa
    May 6, 2008 at 9:37 pm
    May 10, 2008 at 9:13 am
  • Hi: We are experiencing a memory leak when calling IndexReader.reopen(). From eyeballing the Lucene source code, I am seeing normCache is not cleared. Anyone else experiencing this? Thanks -John
    John Wang
    May 28, 2008 at 6:25 am
    Jun 1, 2008 at 2:07 pm
  • Hi, I need to find a reliable way to extract content out of Word, Excel and PowerPoint formats prior to indexing and I am not sure if POI is the best way to go. Can anybody share experience with ...
    Lukas Vlcek
    May 12, 2008 at 2:04 pm
    May 13, 2008 at 3:50 pm
  • Hi all, I have a problem: how to "combine" two scores to sort the search result documents. For example, I have 10 million pages in a Lucene index, and I know their PageRank scores. I give a query ...
    May 28, 2008 at 10:03 am
    Jun 2, 2008 at 5:24 am
  • "Don't iterate over more hits than needed. Iterating over all hits is slow for two reasons. Firstly, the search() method that returns a Hits object re-executes the search internally when you need ...
    Stephane Nicoll
    May 10, 2008 at 1:36 pm
    May 24, 2008 at 8:59 am
  • Hello all, I have been doing some evaluation of Lucene on a TREC collection and get a rather disappointing mean average precision (MAP) of 11%. Other sources seem to report a MAP of about 20%. So I ...
    May 4, 2008 at 6:14 pm
    May 15, 2008 at 11:20 am
  • It would appear that to see all results (including low scoring) I need to pass a different Filter to Searcher.search[1]. If filter is null, only the highest-scoring results are returned. How do I ...
    Hasan Diwan
    May 16, 2008 at 3:55 am
    May 17, 2008 at 1:05 am
  • Hi, I have some issue with boolean queries. I am using Lucene-core-2.3.1. I have done test on boolean query with 3 terms (data, store, variable) in my TTL field. The TTL field is indexed and searched ...
    Sonu Sudhakar
    May 28, 2008 at 11:44 am
    Jun 3, 2008 at 5:31 am
  • Hi: What is the current status on the distributed lucene project proposed at: http://www.mail-archive.com/general@lucene.apache.org/msg00338.html Thanks -John
    John Wang
    May 15, 2008 at 4:37 am
    Jun 3, 2008 at 4:46 am
  • Hi, other than the in memory terms (.tii), and the few kilobytes of opened file buffer, where are some other sources of significant memory consumption when searching on a large index ? ( 100GB). The ...
    May 29, 2008 at 10:18 pm
    May 29, 2008 at 11:16 pm
  • Hi all, I've got a bit of a niggling problem with how one of my searches is working as opposed to how my users would like it to work. We're indexing on UK postcodes, which are in the format of a 3 or ...
    Chris Mannion
    May 6, 2008 at 4:29 pm
    May 24, 2008 at 9:16 am
  • Hi, I am looking for a way to filter a SpanQuery according to some other query (on another field from the one used for the SpanQuery). I need to get access to the spans themselves of course. I don't ...
    Eran Sevi
    May 6, 2008 at 8:15 am
    May 12, 2008 at 10:17 pm
  • Hi there, We're using Lucene with Hibernate Search and we're very happy so far with the performance and the usability of Lucene. We have, however, specific use cases that prevent us from using only ...
    Stephane Nicoll
    May 1, 2008 at 8:01 am
    May 2, 2008 at 4:59 pm
  • I get the following error trace - java.io.FileNotFoundException: no segments* file found in ...
    May 29, 2008 at 3:04 am
    Jan 5, 2011 at 7:44 am
  • Hi, I haven't been able to find the answer to this question easily so any help would be appreciated. Thanks, Tom ...
    Tom Conlon
    May 24, 2008 at 10:42 am
    May 26, 2008 at 1:55 pm
  • Dear Fellow Java/Lucene developers: I am trying to use the Highlighter class to return the keywords that the user is searching for in bold. However, instead of returning a fragment of the block of ...
    May 21, 2008 at 1:34 am
    May 24, 2008 at 4:25 am
  • Hi, I've got an application which stores ratings for content in a Lucene index. It works a treat for the most part, apart from the use-case I have for being able to filter out ratings that have less ...
    Dan Hardiker
    May 12, 2008 at 5:40 pm
    May 14, 2008 at 2:43 pm
  • Hi, I am a newbie to Lucene. I have a question about making a query that associates 2 index files: - One index has the content index for a list of documents and a key to the document. That means the ...
    Michael Siu
    May 6, 2008 at 4:14 pm
    May 7, 2008 at 12:02 am
  • Bravo Grant! Rajesh, I believe the following will work: - delete your small index - optimize your big index (needed? Not 100% sure, but I think it is) - loop through the docs in your "big" index - ...
    Otis Gospodnetic
    May 1, 2008 at 1:20 am
    May 2, 2008 at 1:41 am
  • Hello everybody, sorry for posting to the list but I’m kinda helpless. I’m trying to unsubscribe from the mailing list but my unsubscribe email is treated as spam :) SMTP error from remote server ...
    Daniel Freudenberger
    May 30, 2008 at 11:12 am
    Jun 1, 2008 at 8:45 pm
  • Hello out there, We have implemented some open source desktop searching app based on Lucene http://sourceforge.net/projects/dynaq Development always goes further, and currently we make experiments ...
    Christian Reuschling
    May 27, 2008 at 4:37 pm
    May 28, 2008 at 11:33 am
  • hi, I have a ValueSourceQuery that makes use of a stored field. The field contains roughly 27.27 million untokenized terms. The average length of each term is 8 digits. The first search always takes ...
    May 19, 2008 at 6:57 pm
    May 20, 2008 at 6:12 pm
  • Hi All, I am dealing with a situation where a document could possibly have multiple attachments to it, and they are all added to the index under a document-id (not lucene doc-id). Now if one of the ...
    Dino Korah
    May 19, 2008 at 8:17 am
    May 19, 2008 at 6:36 pm
  • As far as I know Lucene only handles single word synonyms at index time. My life would be much simpler if it was possible to add synonyms that spanned over multiple tokens, such as "lucene in ...
    Karl Wettin
    May 17, 2008 at 6:29 pm
    May 18, 2008 at 6:00 pm
  • Dear all, I'd like to do document clustering using full-text with Lucene. In other words, I would like to group similar documents in their respective groups. I searched the mailing list and found ...
    Supheakmungkol SARIN
    May 15, 2008 at 3:24 am
    May 18, 2008 at 1:53 am
  • Hi, I have a field in the index which has been indexed using StandardAnalyzer and as TOKENIZED. Now I would like to write a query which returns the hit if there is an exact match on the field value. Say, ...
    Gauri Shankar
    May 13, 2008 at 1:44 pm
    May 15, 2008 at 5:52 pm
  • I have an index with several million documents that each contains between a few hundred terms and up to about a million terms. To me it feels like there would be a rather big difference between the ...
    Karl Wettin
    May 14, 2008 at 11:41 pm
    May 15, 2008 at 3:55 pm
  • Hello All, Any suggestions for extracting text from PDF? I have tried PDFBox, and it works nicely; however, if the PDF is structured, it won't provide good results. For example consider the pdf: P1 Lorem ...
    Cam Bazz
    May 14, 2008 at 9:32 am
    May 15, 2008 at 3:49 pm
  • For some reason it seems that either Lucene or Snowball has a problem with the color purple. According to the Snowball experts the problem is with Lucene. Can anyone shed any light? Thanks, Steve ...
    Stephen Cresswell
    May 9, 2008 at 11:27 pm
    May 10, 2008 at 8:22 am
  • -- OS: Linux lg99 2.6.5-7.276-smp #1 SMP Fri Sep 28 20:33:22 AKDT 2007 x86_64 x86_64 x86_64 GNU/Linux -- Lucene: 2.3.2 (tried 2.2.0 as well, since the index was built around 2.2.0, jdk1.6.0_01 ) -- ...
    May 6, 2008 at 6:03 pm
    May 7, 2008 at 3:33 pm
  • Hi - trying to execute a search in Lucene and getting results I don't understand :( The index contains fields search_text and type - both indexed tokenized. I'm attempting to execute the query: ...
    Casey Dement
    May 22, 2008 at 10:21 pm
    Jun 2, 2008 at 1:22 pm
  • Hi All, I am trying to figure out a quick way to find the top N documents sorted by frequency of a term. I found: IndexReader.termDocs() which provides an enumeration of doc() and freq() but it returns ...
    Hider, Sandy
    May 28, 2008 at 2:49 pm
    Jun 1, 2008 at 8:30 pm
  • Hi, I'm running a SpanQuery and get the Spans result which tell me the documents and positions of what I searched for. I would now like to get the payloads in those documents and positions without ...
    Eran Sevi
    May 22, 2008 at 8:03 am
    May 26, 2008 at 7:27 am
  • We have a requirement to inform users on a regular basis of new material on which they have expressed interest. How are we to know what is "new" from the point of view of a particular user? Our idea ...
    Lucene user
    May 22, 2008 at 9:45 am
    May 22, 2008 at 8:23 pm
  • After upgrading to version 2.3.x from 2.2.0, we started experiencing issues with our index searches. Some searches produced false positives, while others produce no hits for terms known to be in ...
    Dan Rugg
    May 16, 2008 at 5:49 pm
    May 20, 2008 at 6:07 pm
  • Hi, I have an application where I need to issue queries with a large number of or-terms with individual boosts. Currently I just construct a BooleanQuery with a large number (often 1000) of ...
    John Jensen
    May 18, 2008 at 12:26 am
    May 18, 2008 at 7:20 pm
  • Greetings, I'm searching against a data set using lucene that contains searches such as the following: *ache* *aChe* etc and so forth, sadly this part of the dataset is imported via an external ...
    Matthew Hall
    May 15, 2008 at 4:35 pm
    May 16, 2008 at 6:39 pm
  • Hello there! We are starting with Lucene, and in order to prove its usage one of the benefits is performance. I do know that Lucene (as other full text search engines) provides many more benefits ...
    Vinicius Carvalho
    May 16, 2008 at 5:20 pm
    May 16, 2008 at 5:41 pm
  • Hello, We have been using Lucene for a while, and we are happy with it. Now we want to optimize some space. We are parsing versions of files and we want to keep track of history and also know which one is ...
    Jean-Claude Antonio
    May 15, 2008 at 5:16 pm
    May 16, 2008 at 12:02 am
  • Hi, I have a TokenStream that inserts synonym tokens into the stream when matched. One thing I am wondering about is what is the effect of the startOffset and endOffset. I have something like this: ...
    Brendan Grainger
    May 12, 2008 at 4:06 pm
    May 12, 2008 at 9:12 pm
  • What is the limit of Lucene: # of docs per index? If RangeFilter.Bits(), for example, it initializes a bitset to the size of maxDoc from the indexReader. I wonder what happens if the # of docs is ...
    Michael Siu
    May 8, 2008 at 5:23 pm
    May 8, 2008 at 6:29 pm
  • I'm new to lucene and have a question on how to create a query for the following example... Say I have two fields, Title and Description, with the following data Item 1 Title: The greatest hits ...
    Kelvin Foo Chuan Lyi
    May 6, 2008 at 4:07 pm
    May 6, 2008 at 4:41 pm
  • : Hi Lucene-user and Lucene-dev, Please do not cross post -- java-user is the suitable place for your question. : Obviously there is something wrong with the above approach (as to get the : correct ...
    Chris Hostetter
    May 2, 2008 at 3:57 pm
    May 4, 2008 at 3:14 pm
  • I am using Hibernate Search in my application. The first time I attempt to index records from the database it works, and the second time I attempt to add records I notice that it does not work ...
    May 2, 2008 at 11:24 pm
    May 3, 2008 at 3:31 am
  • I am new to web services. This is the situation: We have a document/corpus indexed by Lucene and say it resides on C:\Lucene\Index We are hosting Lucene as a web service (following the instructions ...
    May 30, 2008 at 6:42 pm
    Jun 3, 2008 at 5:48 am
  • Hi, Folks: What are some average search and retrieval times for Lucene queries in real production use? Would people include relevant stuff like the number of documents in your index, etc.? Thanks for ...
    Lucene user
    May 31, 2008 at 12:25 pm
    Jun 1, 2008 at 8:17 pm
  • I have a couple of quick questions about how Lucene indexes metadata: - Does it do anything special with metadata or treat it as a supplement to the words in the document? I have a feeling that the ...
    May 20, 2008 at 5:36 pm
    May 21, 2008 at 10:59 am
Group Overview
Group: java-user

116 users for May 2008

Michael McCandless: 30 posts; Mark Miller: 29 posts; Otis Gospodnetic: 24 posts; Erick Erickson: 18 posts; Karl Wettin: 17 posts; Grant Ingersoll: 16 posts; Paul Elschot: 10 posts; Chris Hostetter: 9 posts; Stéphane Nicoll: 9 posts; Steven A Rowe: 9 posts; Esra: 8 posts; John Wang: 8 posts; Jason Rutherglen: 7 posts; Michael Busch: 7 posts; Ian Lea: 6 posts; Jamie: 6 posts; Mark harwood: 6 posts; Yonik Seeley: 6 posts; Alex: 5 posts; Eran Sevi: 5 posts