73 discussions - 295 posts

  • Hi all, I have to index about 4.5 million txt files. When I run my indexing application through Eclipse, I get this error: "Exception in thread "main" java.lang.OutOfMemoryError: Java heap space" ...
    Sahin Buyrukbilen
    Oct 20, 2010 at 4:11 am
    Oct 22, 2010 at 5:02 pm
  • Hi, I'm facing some problems in using Lucene. The index I am using is constructed like this: try { Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English"); Directory dir = ...
    Oct 14, 2010 at 9:07 am
    Oct 15, 2010 at 9:03 am
  • Hello all, I would like to ask how to add new documents to an existing Lucene index. I mean, which class should I use to achieve this goal? Thanks -- http://jacobian.web.id ...
    Oct 27, 2010 at 12:05 pm
    Oct 29, 2010 at 7:21 am
  • Hello I am trying to use a TermFreqVector to get a count of all words in a Document as follows: // Search. int hitsPerPage = 10; IndexSearcher searcher = new IndexSearcher(index, true); ...
    Martin O'Shea
    Oct 20, 2010 at 6:23 pm
    Oct 22, 2010 at 2:09 pm
  • We are working with a large read-only Lucene index (single segment) with a large number of fields and documents and are running into memory usage problems. We found that when using a ...
    Cabansag, Ronald-Alvin R
    Oct 29, 2010 at 1:27 pm
    Oct 31, 2010 at 3:26 am
  • Hi All, Can anyone help with this issue? I have about 2000 PDF files from which I extract text using PDFBox, then index them in a for loop. The indexing stopped after the fdt file reached 7,061 KB ...
    Oct 13, 2010 at 2:39 am
    Oct 14, 2010 at 5:33 am
  • Hello All: Can anyone suggest the best way to implement both sentence-specific and non-sentence-specific phrase search? The user is going to have a check box for phrase search on the screen that ...
    Sirish Vadala
    Oct 6, 2010 at 6:33 pm
    Oct 8, 2010 at 2:28 am
  • I'm stepping through an RDF file (the Project Gutenberg catalog) and sending data to a Lucene index to allow searches of titles, authors and such. However, the Gutenberg RDF is a little bit "special". It ...
    Paulo Levi
    Oct 31, 2010 at 5:20 pm
    Nov 1, 2010 at 8:54 am
  • Hello, I have been looking at the SearcherManager example provided in the "Lucene In Action 2nd Edition" book. It seems like a great way to manage IndexReaders but I had a few questions about the ...
    Pulkit Singhal
    Oct 27, 2010 at 4:58 pm
    Oct 27, 2010 at 5:47 pm
  • Hi Group, I have an issue when using MultiFieldQueryParser. I would like to use one query against a number of fields; however, I get a java.lang.IllegalArgumentException: queries.length != ...
    Lev Bronshtein
    Oct 14, 2010 at 1:05 am
    Oct 25, 2010 at 10:58 am
  • Dear All, Currently, I'm using PHP/Java Bridge to have Lucene in my PHP web application, and also using the Java extension for PHP. FYI, I had set up Lucene on my PC several months ago and my code below ...
    Dian puma
    Oct 23, 2010 at 4:01 pm
    Oct 25, 2010 at 6:13 am
  • Hello I have a StandardAnalyzer working which retrieves words and frequencies from a single document using a TermVectorMapper which is populating a HashMap. But if I use the following text as a field ...
    Martin O'Shea
    Oct 24, 2010 at 7:59 pm
    Oct 25, 2010 at 2:31 am
  • Well, actually I am doing a kind of thesis on information retrieval, and my tutor wants me to create a program that first indexes a document in memory using RAMDirectory and then ...
    Oct 12, 2010 at 1:37 am
    Oct 16, 2010 at 4:48 pm
  • Hi, is there a way to store additional metadata with fields? My problem is as follows: I'm extracting extended HTML with Tika. This extended HTML contains references to pages, x,y values of the text ...
    Christoph Hermann
    Oct 14, 2010 at 10:18 am
    Oct 15, 2010 at 7:23 pm
  • Hi, I am curious. Do you know why the book Lucene in Action, Second Edition is not available for sale (as new) on Amazon UK? http://www.amazon.co.uk/Lucene-Action-Michael-McCandless/dp/1933988177 Do ...
    Paolo Castagna
    Oct 12, 2010 at 5:25 am
    Oct 12, 2010 at 8:27 am
  • Hi all, I'm having some issues with Numeric Range queries not working as expected. My underlying storage medium is the Lucandra index reader and writer, so I'm not sure if this is an issue within ...
    Todd Nine
    Oct 4, 2010 at 4:14 am
    Oct 6, 2010 at 7:15 pm
  • Hi all, I need to retrieve the score of a term in a document. I don't want to play with different scoring schemes. I just checked my index with Luke and it shows me a score for each term in each document ...
    Sahin Buyrukbilen
    Oct 1, 2010 at 3:33 pm
    Oct 2, 2010 at 6:49 pm
  • I'd like to provide myself with a searchable index of email. I'm familiar with the Javamail library, so will use this to fetch the mail. Anyone out there done any indexing of email? On Sourceforge, ...
    Hasan Diwan
    Oct 27, 2010 at 9:58 pm
    Oct 28, 2010 at 6:46 pm
  • I've written a blog post about a workaround for updating an index in Lucene using a parallel reader. It's explained with results and pictures. It would be great if you could have a look at it. The link: ...
    Nilesh Vijaywargiay
    Oct 20, 2010 at 6:58 pm
    Oct 28, 2010 at 5:53 am
  • Hello, I've been running into a problem during a merge. Would appreciate knowing what to look for since the exception doesn't seem too explanatory. I get: -- --- Nested Exception --- ...
    Cristian Vat
    Oct 20, 2010 at 6:46 pm
    Oct 20, 2010 at 8:19 pm
  • Hello I would like to store data retrieved hourly from RSS feeds in a database or in Lucene so that the text can be easily indexed for word frequencies. I need to get the text from the title and ...
    Oct 14, 2010 at 2:18 pm
    Oct 15, 2010 at 6:16 pm
  • Hi there, I'm currently trying to work out how I can determine the type (string/number/date/etc.) of a term. I've not seen any off-the-shelf way to do it, so I am trying to store a payload against each ...
    Sykes, Derek
    Oct 13, 2010 at 3:38 pm
    Oct 15, 2010 at 2:29 pm
  • I have two indexes, A and B. Can two documents, doc1 (in index A) and doc2 (in index B), have a common field? doc1 and doc2 have the same document IDs.
    Nilesh Vijaywargiay
    Oct 14, 2010 at 3:43 pm
    Oct 15, 2010 at 1:43 am
  • Hi all, I only want to index the latest week's data; older data can be deleted. So I'd like to know about Lucene's delete performance and whether it will have an impact on the search ...
    Jeff Zhang
    Oct 13, 2010 at 1:38 pm
    Oct 13, 2010 at 8:55 pm
  • Hi Group, I understand that the process of updating a document in a Lucene index is to delete the document and add it again. But I do not want to delete the document. I was thinking of an approach where ...
    Nilesh Vijaywargiay
    Oct 12, 2010 at 6:06 am
    Oct 12, 2010 at 6:50 pm
  • When running the application on a Windows XP 32-bit machine, the search time is 0.5 seconds. The JVM is IBM Java 5 for 32-bit. But when running the same application on a much more powerful Windows Server 2007 64 ...
    Oct 6, 2010 at 10:22 am
    Oct 7, 2010 at 5:09 am
  • Having upgraded a live system from 2.4 to 2.9.3 the client is reporting a change in merge behaviour that is causing some issues with their update monitoring logic. The suggestion is that any merge ...
    Mark Harwood
    Oct 5, 2010 at 10:27 pm
    Oct 6, 2010 at 3:25 pm
  • Hello, I'd like to know which field got hit in each doc in the hit results. To implement it, I thought I could use Scorer.freq(), which was introduced in 3.1/4.0: ...
    Koji Sekiguchi
    Oct 4, 2010 at 1:59 am
    Oct 4, 2010 at 5:17 pm
  • Hi guys, I am trying to get some information on what enterprise hardware folks use out there. We are using Lucene extensively. Our total catalog size is roughly 50GB between roughly 8 various ...
    Kovnatsky, Eugene
    Oct 26, 2010 at 12:17 am
    Oct 27, 2010 at 10:02 am
  • Hey, is there an API that returns the number of terms indexed? I found the API that returns the number of documents indexed (IndexWriter.docCount) but can't find an API for the number of terms in the index. ...
    Oct 16, 2010 at 5:53 am
    Oct 16, 2010 at 10:16 am
  • Hi, my original problem is to index a large number of documents which contain 360 integers in the range 0-90K. Searching is a little bit complicated - I need to find the most similar documents where ...
    Zaharije Pasalic
    Oct 15, 2010 at 8:31 am
    Oct 15, 2010 at 1:27 pm
  • Hi, I am keeping a ConcurrentMap of o.a.l.index.IndexReader which I use in my system. These readers are retrieved by multiple threads and I have no knowledge when these readers are actively used and ...
    Mindaugas Žakšauskas
    Oct 5, 2010 at 10:14 am
    Oct 5, 2010 at 4:01 pm
  • Hi all, The JavaDocs do not appear to mention that only stored fields persist IndexWriter.updateDocument. When opening new readers, from either IndexWriter.getReader or IndexReader.open, neither ...
    Oct 4, 2010 at 6:03 pm
    Oct 4, 2010 at 6:20 pm
  • I need to auto-categorize a large number of documents. They are basically news articles from major news sources (nytimes, npr, abcnews, etc). I'd like to categorize them automatically. Any ...
    Maria Vazquez
    Oct 27, 2010 at 8:12 pm
    Oct 28, 2010 at 2:13 am
  • I am about to implement a custom query that is a sort of mash-up of Facets, Highlighting, and SpanQuery, but thought I'd see if anyone has done anything similar. In simple words, I need to facet on the ...
    Oct 26, 2010 at 12:32 pm
    Oct 26, 2010 at 12:53 pm
  • I got stuck on a problem using NumericFields with Lucene 2.9.3. I add values to the document with doc.add(new NumericField("minprice").setDoubleValue(net_price)); If I want to search with a sorter ...
    Uwe Goetzke
    Oct 26, 2010 at 6:33 am
    Oct 26, 2010 at 7:47 am
  • I'm currently working on building a Geocoder. The purpose of a Geocoder is to find the coordinates belonging to any given input address. I have a rather simple version based on Lucene working, ...
    Jasper de Barbanson
    Oct 20, 2010 at 6:48 am
    Oct 20, 2010 at 9:05 am
  • I have many fields in my document and want to parse my query including each of them: QueryParser parser = new QueryParser(Version.LUCENE_29, "Field2", new StandardAnalyzer(Version.LUCENE_29)); Should I ...
    Nilesh Vijaywargiay
    Oct 18, 2010 at 5:55 pm
    Oct 18, 2010 at 10:24 pm
  • Is there interest in having a Meetup at ApacheCon? Who's going? Would anyone like to present? We could do something less formal, too, and just have drinks and Q&A/networking. Thoughts? -Grant ...
    Grant Ingersoll
    Oct 18, 2010 at 6:57 pm
    Oct 18, 2010 at 10:05 pm
  • Hi, I would like to change the IDF value of the Lucene similarity computation to "inverse document frequency inside category". Not the complete collection should be considered, but only the documents ...
    Max Jakob
    Oct 18, 2010 at 11:27 am
    Oct 18, 2010 at 2:33 pm
  • Hello, We're currently evaluating Lucene to index a large English corpus and we are optimizing for space. We're basically concerned that the size of the postings lists will become ...
    Mahmoud Abdelkader
    Oct 17, 2010 at 7:17 am
    Oct 17, 2010 at 8:31 pm
  • Hello, how can I copy the Payload from the current token to the following token in a TokenFilter? I have implemented a TokenFilter and thought that I could use input.incrementToken() to advance the ...
    Christoph Hermann
    Oct 16, 2010 at 6:32 pm
    Oct 17, 2010 at 6:03 pm
  • Hi all, I am having issues building Lucene and Solr from svn checkout. I had this problem earlier but I was able to figure out the combination of ant and maven-ant-tasks that worked. Last few months ...
    Pradeep Singh
    Oct 11, 2010 at 3:54 am
    Oct 11, 2010 at 3:51 pm
  • Hi Guys, Is there a way to detect the org.apache.lucene.util.Version of an index given an IndexReader or just an FSDirectory? I know I can open the segments file and read the proper bytes according to rules of ...
    Ivan Vasilev
    Oct 8, 2010 at 12:35 pm
    Oct 8, 2010 at 1:00 pm
  • Hi Everyone, Recently we migrated from Lucene 2.2 to Lucene 2.9.3. We are having some issues in search. Under load, searchers are getting hung up. When we took a process stack, we found ...
    Shailendra Mudgal
    Oct 7, 2010 at 5:13 am
    Oct 7, 2010 at 3:46 pm
  • Hi, I was indexing some documents, but my program crashed after several days of work. If I reopen this index it is empty. I guess the reason is that auto-commit was not set and I never performed a ...
    Philippe Thomas
    Oct 6, 2010 at 10:02 am
    Oct 6, 2010 at 10:30 am
  • In Lucene 3, is there an equivalent to obtaining a BitSet of documents from an Index as there was in version 2.x? I'm trying to put together an upgrade path. Thanks! ...
    Jordon Saardchit
    Oct 4, 2010 at 7:10 pm
    Oct 5, 2010 at 1:47 pm
  • Let's say the segment infos file is missing. I'm aware of CheckIndex; however, is there a tool to recreate a segment infos file? ...
    Jason Rutherglen
    Oct 4, 2010 at 7:25 pm
    Oct 5, 2010 at 12:39 pm
  • How do I use MemoryIndex or RAMDirectory, but score using term statistics from a corpus given during preprocessing? Let's say I want to use a MemoryIndex or RAMDirectory to store a *single* document, ...
    Joseph Turian
    Oct 29, 2010 at 12:07 am
    Nov 1, 2010 at 8:40 pm
  • Dear All, I was setting up a web search with a query language that uses (, !, ), ^, *, ?, {, } and < in its syntax. For example: hot dog: Looks for documents with hot and dog in close vicinity. (hot ...
    Jan Burse
    Oct 28, 2010 at 7:05 pm
    Oct 28, 2010 at 11:36 pm
Group Overview
group: java-user @

102 users for October 2010

Michael McCandless: 19 posts
Erick Erickson: 17 posts
Ian Lea: 16 posts
Sahin Buyrukbilen: 12 posts
Uwe Schindler: 12 posts
Nilesh Vijaywargiay: 11 posts
Christoph Hermann: 10 posts
Martin O'Shea: 9 posts
Yakob: 8 posts
Grant Ingersoll: 7 posts
Steven A Rowe: 7 posts
Anshum: 6 posts
Pulkit Singhal: 6 posts
Mark Harwood: 5 posts
Pradeep Singh: 5 posts
Toke Eskildsen: 5 posts
Ching: 4 posts
Sirish Vadala: 4 posts
Cabansag, Ronald-Alvin R: 3 posts
Cristian Vat: 3 posts