65 discussions - 295 posts

  • Hi, I am developing an application which uses Lucene for indexing and searching 1 billion documents. (The document size is very small, though. Each document has a single field of 5-10 words, so I believe ...
    Aug 10, 2010 at 6:55 am
    Aug 27, 2010 at 5:13 pm
  • Hi all, I need to know if there is a Lucene plug-in or a Lucene-based API for calculating the term co-occurrence matrix for a given text corpus. Thanks! -- Ahmed
    Ahmed algohary
    Aug 19, 2010 at 3:40 pm
    Aug 26, 2010 at 9:04 am
  • Hi, we are building an application using Lucene and we have HUGE data sets (our index contains millions and millions and millions of documents), which obviously cause us very important problems when ...
    Michel Nadeau
    Aug 16, 2010 at 7:09 pm
    Aug 18, 2010 at 5:04 pm
  • Lucene developers, We’ve been working on an undergraduate project for college about changing Apache Nutch (which uses Lucene to index its web pages) to include a category filter, and we are having ...
    Luan Cestari
    Aug 8, 2010 at 11:16 pm
    Aug 13, 2010 at 7:13 pm
  • hi all, I am new to Lucene. I am trying to use Lucene to generate data for a document classifier. I need to generate wordno, lineno, pageno for each term/phrase. I was able to use ...
    Arun r
    Aug 3, 2010 at 2:58 pm
    Aug 10, 2010 at 3:03 am
  • Hi, I have a Lucene index that contains a numeric field along with certain other fields. The order of incoming documents is random and unpredictable. As a result, while creating an index, I end up ...
    Aug 18, 2010 at 11:42 am
    Aug 26, 2010 at 3:19 am
  • In an attempt to avoid doubling disk usage when adding new fields to all existing documents, I added a call to IndexWriter::expungeDeletes. Then my colleague pointed out that Lucene will rewrite the ...
    Aug 23, 2010 at 11:08 pm
    Aug 24, 2010 at 1:56 pm
  • Hi all, We're observing search threads slowing down during directory copies performed during updates to the index. The thread dump shows search threads blocked on a ...
    Aug 23, 2010 at 7:43 am
    Aug 24, 2010 at 1:15 pm
  • Hi, my multithreaded code was always creating a new IndexSearcher for every search, but I changed over to the recommendation of creating just one IndexSearcher and keeping it between searches. Now I ...
    Paul Taylor
    Aug 25, 2010 at 8:22 pm
    Sep 7, 2010 at 4:29 pm
  • Hi all, We are currently evaluating potential search frameworks (such as Hibernate Search) which might be suitable to use in our project (using Spring, JPA with Hibernate) ... I am sending this ...
    Schreiner Wolfgang
    Aug 25, 2010 at 1:21 pm
    Aug 31, 2010 at 12:05 pm
  • Hi, let's say that I am indexing large book documents broken into chapters: a typical book that you buy at Amazon. What would be the approximate limit to the number of books that can be indexed slowly ...
    Aug 14, 2010 at 1:25 am
    Aug 16, 2010 at 4:00 pm
  • Hi all, I have an index directory that is growing pretty fast, and is now at 138GB. A while ago, this index got corrupted. It was rebuilt, but the engineer cannot remember whether he deleted the ...
    Andrew Bruno
    Aug 14, 2010 at 10:12 am
    Aug 16, 2010 at 9:31 am
  • We are building a search system on top of Lucene, and we are now looking for a facet feature. So is there an easy way to do this? BTW, we do not want to switch to Solr just for this! ...
    Fulin tang
    Aug 30, 2010 at 9:58 am
    Sep 9, 2010 at 5:20 am
  • Hi, we are currently considering switching from Lucene + Cassandra to *Lucandra*, mainly for the following reasons: * Ability to have many threads writing in the same index at the same time; * Live ...
    Michel Nadeau
    Aug 23, 2010 at 7:22 pm
    Sep 3, 2010 at 4:56 pm
  • I have about 70k documents; the total indexed size is about 15MB (the original text files' size). dir=new RAMDirectory(); IndexWriter writer=new IndexWriter(dir,...; for(loop){ writer.addDocument(doc); } ...
    Li Li
    Aug 26, 2010 at 7:24 am
    Aug 27, 2010 at 8:48 am
  • Hi, I've built a local index of the German Wikipedia (works fine so far). Now when I'm searching this index with Luke (or my own code) using a query like "title:(-Datei*) avl" I still get results ...
    Christoph Hermann
    Aug 16, 2010 at 1:32 pm
    Aug 16, 2010 at 7:39 pm
  • Hello, I am working with ConstantScoreQuery and ConstantScoreRangeQuery. Both should, according to the description, return the value of their boost as the score for all matching documents. However I always ...
    Šplíchal Jiří
    Aug 6, 2010 at 10:08 am
    Aug 6, 2010 at 1:20 pm
  • I am trying to have multi-word synonyms work in Lucene using Solr's *SynonymFilter*. I need to match synonyms at index time, since many of the synonym lists are huge. Actually they are really not ...
    Arun Rangarajan
    Aug 17, 2010 at 11:45 pm
    Aug 27, 2010 at 1:01 am
  • Hi - Using Lucene 2.9.3, I'm indexing the metadata in image files. For each image ("document" in Lucene), I have 2 additional special fields: "FILE-PATH" (containing the full path of the file) and ...
    Paul J. Lucas
    Aug 22, 2010 at 7:25 pm
    Aug 22, 2010 at 10:14 pm
  • Hello all, you may remember me as the one who asked about how to understand Lucene in the previous email, but I have now been able to create a sample application of Lucene. I read the book and was able to ...
    Aug 19, 2010 at 6:43 am
    Aug 20, 2010 at 11:49 am
  • Hi, I am researching the possibility of using Lucene for discovering clusters of documents and since I am new to Lucene I decided to ask the community for advice before I poke the APIs and the ...
    Nik Kolev
    Aug 15, 2010 at 9:04 pm
    Aug 19, 2010 at 6:28 am
  • About a RAMDirectory-based B/S platform problem: hello, I just started to use Lucene and became confused about RAMDirectory-based Lucene index establishment. The problem is one user uses this RAM to ...
    xiaoyan Zheng
    Aug 17, 2010 at 2:47 am
    Aug 17, 2010 at 7:21 am
  • Hello all, I am getting the following exception for one of my customers. I think the database is corrupted but want to know the exact cause. Exception: read past EOF ...
    Aug 11, 2010 at 4:31 am
    Aug 11, 2010 at 12:56 pm
  • Hi all, we analyzed the system calls of Lucene and found that the fdx file is always read when we get field values. In my application the fdt is about 50GB and the fdx is about 120MB. I think it may be of benefit ...
    Li Li
    Aug 5, 2010 at 7:38 am
    Aug 5, 2010 at 10:20 am
  • Hi, I am currently building an application whereby there is a remote index server (yes, it probably does sound like Solr :)) and users use my API to send documents to the indexing server for indexing. ...
    Amin Mohammed-Coleman
    Aug 1, 2010 at 7:01 pm
    Aug 4, 2010 at 5:25 am
  • Hi everyone, I'm trying to figure out the effects on search performance of using the non-CFS format and spreading the various underlying files to different disks/media types. For example, I'm ...
    Stefan Nikolic
    Aug 26, 2010 at 9:37 pm
    Aug 27, 2010 at 3:27 pm
  • Hi, I have a list of batch tasks that need to be executed. Each batch contains 1000 documents and basically I use a RAMDirectory-based index writer, and at the end of adding 1000 documents to the ...
    Amin Mohammed-Coleman
    Aug 26, 2010 at 6:22 pm
    Aug 26, 2010 at 7:08 pm
  • Is there any way to walk the terms in reverse order? I have a query that needs to find the last matching term -- if it could start checking from the end, it would avoid a lot of work. Thanks Ryan ...
    Ryan McKinley
    Aug 23, 2010 at 2:58 pm
    Aug 23, 2010 at 3:08 pm
  • Hello Everyone, Can anyone point me to a publicly available question answering system built using Lucene on TREC or non-TREC data? Regards, Ramneek
    Ramneek Maan Singh
    Aug 14, 2010 at 10:55 am
    Aug 17, 2010 at 8:18 am
  • Hey, I am new to this mailing list thing; I wonder how to post a question or a message? I just sent a question to the FAQ mail address, but I received a letter with none available. Have I sent the wrong ...
    xiaoyan Zheng
    Aug 17, 2010 at 2:29 am
    Aug 17, 2010 at 3:07 am
  • Hi, I want to be able to regenerate index from time to time. I'm using IndexSearcher for search and want to be able to release the current index file so that I can replace it with the new one. But ...
    Mylnikov Sergey
    Aug 16, 2010 at 1:22 pm
    Aug 16, 2010 at 1:42 pm
  • Hey all - apologies for the quick cross post - just to let you know, Andrzej is giving a free webinar this Wed. His presentations are always fantastic, so check it out: Lucid Imagination Presents a ...
    Mark Miller
    Aug 9, 2010 at 7:17 pm
    Aug 13, 2010 at 12:17 pm
  • Hello, Is there any point during a merge operation where the index cannot be searched or is unstable? We want to create a bunch of smaller indexes in parallel and then merge them into a single index ...
    Aug 10, 2010 at 3:33 pm
    Aug 10, 2010 at 6:20 pm
  • Hello All, Is there a way to count the number of times a query matched in a particular document? For example, say we created a document that had the string "cheese cheese cheese cheese" in the field ...
    Ryan McV
    Aug 8, 2010 at 4:45 am
    Aug 8, 2010 at 8:10 pm
  • Hello Guys, I am trying to understand how the Lucene score is calculated, so I'm using the searcher.explain() function. But the output it gives is really confusing for me. Below are the details of the query ...
    Soby Thomas
    Aug 7, 2010 at 3:03 pm
    Aug 7, 2010 at 6:39 pm
  • I have a situation where I'm using a Boost on documents to bump them up in the search results when a search has multiple documents with the same hits in the search query. However, it looks like if ...
    Brian Pontarelli
    Aug 4, 2010 at 9:46 pm
    Aug 5, 2010 at 2:03 pm
  • Hi all, Is there any relevance engine that is built in Lucene and which can be customized? Regards, Anirban De Yahoo: anirbande Skype: anirbande Gtalk : ade.sxc
    d' Ani
    Aug 3, 2010 at 12:09 pm
    Aug 4, 2010 at 6:19 am
  • I'm curious about what the largest Lucene installations are, in terms of: - Greatest number of documents (i.e. X billion docs) - Largest data size (i.e. Y terabytes of indexes) - Most machines (i.e. ...
    Aug 26, 2010 at 1:05 pm
    Sep 9, 2010 at 5:24 am
  • Does anyone on the list know about the nature of this research or collaboration:- A search engine based on '25 years of cutting edge research from the Indian Institutes of Technology, the University ...
    Aug 1, 2010 at 3:29 pm
    Aug 31, 2010 at 3:37 am
  • Hello, I am familiar with the SpanQuery construct and set an upper Slop limit. 1. But when I get the hit results, is there any way I can access the actual slop and the span text itself in that ...
    Shashi Kant
    Aug 26, 2010 at 4:48 pm
    Aug 30, 2010 at 1:27 pm
  • Here is my index structure. for each document: Field articleTitle (only one value) Field majorHeading (multiple values) Field minorHeading (multiple values) I use heading (can be both majorHeadings ...
    Qi Li
    Aug 28, 2010 at 2:12 am
    Aug 28, 2010 at 6:46 am
  • Hi all, I want to eliminate stop words from a surround query. How can I do that, as I am new to QueryParser languages (JavaCC)? Any ideas or suggestions? Thanks, Jagdish
    Jagdish Vasani IN
    Aug 23, 2010 at 2:03 pm
    Aug 27, 2010 at 2:58 pm
  • Hi, can anyone guide me on how I can add a default operator (W/1) in a surround query? Thanks, Jagdish
    Jagdish Vasani IN
    Aug 27, 2010 at 12:54 pm
    Aug 27, 2010 at 12:59 pm
  • Hi, 1. Will any search query scan all documents in the Lucene indexes? 2. I want search queries to be faster, so I thought that if I could reduce the number of docs Lucene needs to search for ...
    Suman Holani
    Aug 26, 2010 at 8:02 am
    Aug 26, 2010 at 3:38 pm
  • Hi all, Unicorn: just provide a URI and push the button. It will call a series of validation services and report the results. I have already downloaded and installed Unicorn. To Download the ...
    Aug 20, 2010 at 11:45 am
    Aug 20, 2010 at 1:19 pm
  • Hi, In my Lucene index, I want to search on a field, but the score or order of returned documents is not important. What is important is which documents are returned. As I do not need score or ...
    Aug 18, 2010 at 11:47 am
    Aug 18, 2010 at 1:41 pm
  • I was setting up a new instance of my program on a new computer. I got this error: 2010-08-14 10:05:21,951 ERROR Thread LuceneThread: java.lang.IndexOutOfBoundsException: Not a valid hit number: 0 ...
    Herbert Roitblat
    Aug 14, 2010 at 2:34 pm
    Aug 15, 2010 at 8:58 pm
  • Hi, Can anyone explain to me how exactly the Tokenizers and TokenAttributes interact with each other? Or perhaps point me to a link which has the interaction/sequence diagram for the same? I want ...
    Devshree Sane
    Aug 14, 2010 at 2:19 pm
    Aug 14, 2010 at 8:23 pm
  • I found that in the system calls made by Java when reading a file, the buffer size is always 1024. Can I modify this value to reduce system calls? ...
    Li Li
    Aug 5, 2010 at 6:17 am
    Aug 5, 2010 at 9:50 am
  • Hi, I’m currently involved in a project of migrating from Lucene 2.9.1 to Solr 1.4.0. During stress testing, I encountered this performance problem: While actual search times in our shards (which are ...
    Ophir Adiv
    Aug 3, 2010 at 6:29 pm
    Aug 3, 2010 at 11:47 pm
Group Overview
group: java-user @

101 users for August 2010

Erick Erickson: 23 posts
Shelly_Singh: 18 posts
Michael McCandless: 14 posts
Anshum Gupta: 12 posts
Uwe Schindler: 9 posts
Michel Nadeau: 8 posts
Grant Ingersoll: 7 posts
Ian Lea: 7 posts
Li Li: 7 posts
Yakob: 7 posts
Ivan Provalov: 6 posts
Otis Gospodnetic: 6 posts
Amin Mohammed-Coleman: 5 posts
Danil ŢORIN: 5 posts
xiaoyan Zheng: 5 posts
Ahmed algohary: 4 posts
Anuj Shah: 4 posts
Arun r: 4 posts
Beard, Brian: 4 posts
Findbestopensource: 4 posts