130 discussions - 568 posts

  • Hi all, I was wondering if it is possible to do boosting by search terms' position in the document. for example: search terms appear in the first 100 words, or first 10% words, or in first two ...
    Cedric Ho
    Aug 1, 2007 at 4:13 am
    Dec 18, 2007 at 5:19 pm
  • Is it possible to rename fields in an existing index without having to re-index all documents? thx -- Antoine Baudoux Development Manager ab@taktik.be Tél.: +32 2 333 58 44 GSM: +32 499 534 538 Fax.: ...
    Antoine Baudoux
    Aug 22, 2007 at 7:55 am
    Feb 1, 2008 at 7:30 am
  • Hello, I have an application with a 2GB index. A lot of documents (up to 10.000 per day) are added/deleted to this index. My customer would like to have a Maximum of 7 minutes delay between a media ...
    Antoine Baudoux
    Aug 29, 2007 at 8:04 pm
    Aug 31, 2007 at 4:08 pm
  • A few questions on custom score queries: [1] I need to rank matches by some combination of keyword match, popularity and recency of the doc. I read the docs about CustomScoreQuery and seems to be a ...
    Aug 13, 2007 at 11:35 pm
    Aug 20, 2007 at 2:00 pm
  • Hi, I would like to keep user search history data and I am looking for some ideas/advice/recommendations. In general I would like to talk about methods of storing such data, its structure and how to ...
    Lukas Vlcek
    Aug 10, 2007 at 7:28 am
    Aug 14, 2007 at 8:53 pm
  • Hi, I indexed a large number of large documents, but I did not index the document themselves. Now I am interested in getting the vector (i.e.: the terms indexed and the frequency) of that indexed but ...
    Aug 3, 2007 at 9:18 am
    Nov 6, 2007 at 11:36 am
  • Using Lucene 2.2.0, I still sporadically got doc out of order error. I indexed all of my stuff in one thread. Do you have any idea why it happens? Thanks! -- View this message in context: ...
    Aug 15, 2007 at 11:05 pm
    Aug 16, 2007 at 7:50 pm
  • Hi all, Can I get just a list of document Ids given a search criteria ? To elaborate here is my situation: I store 20000 contracts in the file system index each with some parameterName and Value. ...
    Aug 2, 2007 at 8:56 am
    Aug 4, 2007 at 3:14 am
  • Hi, I'm new to this list. So first of all Hello to everyone! So right now I have a little issue I would like to discuss with you. Suppose that you are in a really big application where the data in ...
    Jonathan Ariel
    Aug 22, 2007 at 1:15 pm
    Aug 26, 2007 at 10:46 am
  • Hello, while calling IndexReader.deletedoc(int) I am getting an NPE. java.lang.NullPointerException at org.apache.lucene.index.IndexReader.acquireWriteLock(IndexReader.java:658) at ...
    Eric Louvard
    Aug 21, 2007 at 8:10 am
    Aug 22, 2007 at 10:59 am
  • I know how to do english text with POI and PDFBox and so on. Now, I want to start indexing non-english language such as french and spanish. Which extraction libs are available for me? I want to do: ...
    Michael Prichard
    Aug 1, 2007 at 5:45 am
    Sep 12, 2007 at 2:02 pm
  • Hi! I have an index containing the following fields "id" (not to be confused with the internal Lucene id) "version" "date" The combination of "id" and "version" is unique, i.e. there may be several ...
    Per Lindberg
    Aug 28, 2007 at 3:49 pm
    Aug 29, 2007 at 3:53 pm
  • Hi all, My problem is as follows: Our documents each comes from a different publication. And we currently have 5000 different publication sources. Our clients can choose arbitrarily a subset of the ...
    Cedric Ho
    Aug 13, 2007 at 4:18 am
    Aug 15, 2007 at 5:52 am
  • I have a searchable index of documents which contain french and spanish diacritics (è, é, À) etc. I would like to make the content searchable so that when a user searches for a word such as ...
    Aug 27, 2007 at 2:03 pm
    Aug 28, 2007 at 11:55 am
  • Hi I am using WhitespaceAnalyzer and the query is " icdCode:H* " but there is no result however I know that there are many documents with this field value such as H20, H20.5 etc. this field is ...
    Mohammad Norouzi
    Aug 15, 2007 at 5:18 am
    Aug 19, 2007 at 3:15 pm
  • Hi, Folks - Two quick questions - need to size a server to run our new index. If I have an index with 111k articles and 90 million words indexed, how much RAM should I have to get really fast access ...
    Lucene user
    Aug 12, 2007 at 7:04 am
    Aug 13, 2007 at 10:33 pm
  • Hi, I have indexed 5 fields and stored 2 of them(field Length is around 10000). My index is growing in nature and it is in GB. I need to get search result based on docID only. Scoring, additional ...
    SK R
    Aug 6, 2007 at 11:39 am
    Aug 7, 2007 at 12:25 pm
  • I'm creating a tokenized "content" Field from a plain text file using an InputStreamReader and new Field("content", in); The text file is large, 20 MB, and contains zillions of lines, each with the ...
    Per Lindberg
    Aug 31, 2007 at 2:17 pm
    Sep 11, 2007 at 1:29 pm
  • Hello, I would like to implement a suggest feature (like Google Suggest) using Lucene. I actually tried using Lucene and it was successful, but I got stuck at one point, which is returning a ...
    Heba Farouk
    Aug 21, 2007 at 10:00 am
    Aug 22, 2007 at 12:38 pm
  • Hi, Is there a way to delete the results from a query or a filter and not documents specified by Term. I have seen some explanations here but i do not know how to do it: ...
    Abu Abdulla alhanbali
    Aug 18, 2007 at 6:40 am
    Aug 20, 2007 at 6:44 pm
  • I'm working on refining my stopwords by looking at the highest scoring document returned for each search, and using the highlighter to show which terms were significant in choosing that document. ...
    Donna L Gresh
    Aug 15, 2007 at 5:22 pm
    Aug 16, 2007 at 4:17 pm
  • Hi, could u pl. tell me how to update boost factor of already indexed document using setBoost. Thanks & regards, Rohit -- VANDE - MATRAM
    Rohit saini
    Aug 10, 2007 at 3:59 am
    Aug 14, 2007 at 10:45 am
  • Hello, I need to do a search that is capable to also match on substrings, for example: *oo bar the qu* should find a document that contains 'foo bar the quux' and 'foo bar the qux'. Now, should I ...
    Ard Schrijvers
    Aug 8, 2007 at 8:28 am
    Aug 12, 2007 at 3:55 am
  • Hi There! I've been working for a while on the implementation of a website oriented to contents that would contain millions of entries, most of them indexable (such as descriptions, texts, names, ...
    Antonello Provenzano
    Aug 10, 2007 at 9:09 am
    Aug 11, 2007 at 1:21 pm
  • Hi again, everyone. First of all, I want to thank everyone for their extremely helpful replies so far. Also, I just started reading the book "Lucene in Action" last night. So far it's an awesome ...
    Joe Attardi
    Aug 1, 2007 at 3:32 pm
    Aug 1, 2007 at 9:31 pm
  • Hi Lucene gurus, I am a newbie and I have a question on transferring index directories across multiple machines. Whenever I update/add any new documents to the existing index, then it is generating new ...
    Varma d
    Aug 26, 2007 at 1:52 am
    Jun 9, 2009 at 9:19 am
  • I'm invoking Luke like this: java -jar lukeall-0.7.1.jar I run this query: content:Nyarubuye When I use the StandardAnalyzer I get results but when I use the KeywordAnalyzer I don't get results. Can ...
    Kai_testing Middleton
    Aug 7, 2007 at 11:22 pm
    Dec 2, 2008 at 6:24 pm
  • Hi, I have fields which have high multiplicity; for example I have a topic with 1000 names, 500 of which are "USA" and 200 are "United States of America". Previously I was indexing "USA USA .(500x).. ...
    Tim Sturge
    Aug 28, 2007 at 7:29 pm
    Aug 29, 2007 at 7:59 pm
  • Hi All, I have the following setup: a) Indexed set of docs. b) Ran 1st query and got top docs c) Fetched the IDs from that and stored in a data structure. d) Ran 2nd query, got top docs, fetched ...
    Aug 16, 2007 at 6:20 pm
    Aug 20, 2007 at 3:43 pm
  • Hello, I have an index with an 'actor' field, for each actor there exists an single field value entry, e.g. stored/compressed,indexed,tokenized,termVector,termVectorOffsets,termVectorPosition ...
    Aug 16, 2007 at 9:50 am
    Aug 20, 2007 at 2:18 pm
  • Hi John, I think you are spending too much time on I/O; using a RAMDirectory first would be better. See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed kai -----Original Message----- From: Erick Erickson Sent: ...
    Kai Hu
    Aug 13, 2007 at 7:02 am
    Aug 16, 2007 at 2:01 pm
  • Hi all, Lucene query parser syntax page (http://lucene.apache.org/java/docs/queryparsersyntax.html) provides the following two examples of range query: mod_date:[20020101 TO 20030101] and title:{Aida ...
    Nilesh Bansal
    Aug 11, 2007 at 8:27 pm
    Aug 13, 2007 at 1:20 pm
  • Hi there, I have my 25 indexes of 1.8GB each read with MultiReader. I try to get the document frequency of all the terms in specific documents and it takes quite a long time - a document with 1000 ...
    Aug 5, 2007 at 11:41 pm
    Aug 7, 2007 at 12:57 am
  • Hi, I got unexpected behavior while testing lucene. To shortly address the problem: Using IndexWriter I add docs with fields named ID with a consecutive order (1,2,3,4, etc) then close that index. I ...
    Ridwan Habbal
    Aug 1, 2007 at 3:49 pm
    Aug 2, 2007 at 1:02 pm
  • I want to set documents in my IndexReader as deleted, but I will never commit these deletions. Sort of a filter on a reader rather than on a searcher, and no write-locks. Can I do that out of the ...
    Karl wettin
    Aug 20, 2007 at 2:46 am
    Sep 4, 2007 at 3:50 pm
  • I have been fine with my database (discussion forum) to lucene. I am taking the simplest approach, e.g., I have a discussion forum which are just text messages, I take those out of the database and then ...
    Aug 31, 2007 at 8:15 pm
    Sep 3, 2007 at 8:14 pm
  • Hi everyone, I have the following need and I wonder what my options are, or if anyone has run into it and has a solution / suggestion. I'm indexing a SQL database. Each table is a Lucene index. Now, in ...
    George Aroush
    Aug 30, 2007 at 2:03 am
    Aug 30, 2007 at 9:55 am
  • I've searched the mailing list archives, the web, read the FAQ, etc and I don't see anything relevant so here it goes… I'm trying to implement a radius based searching based on zip/postal codes. (The ...
    Aug 29, 2007 at 3:36 pm
    Aug 29, 2007 at 5:04 pm
  • i'm looking at doing some statistical work with lucene searches and the function queries look like a nice starting point. i found the DocValues.getMin/Max/Avg functions already however there doesn't ...
    Will Johnson
    Aug 24, 2007 at 9:04 pm
    Aug 27, 2007 at 1:43 pm
  • Hello, I'm indexing 2,5 millions docs. I already have added 1,2 millions docs to the index and the indexing speed becomes quite slow. my index directory is 1GB . Is there a limit to the indexing ...
    Antoine Baudoux
    Aug 27, 2007 at 6:27 am
    Aug 27, 2007 at 12:58 pm
  • Hi, I need your help in formalizing this query: (field1:query1 AND field2:query2) OR (field1:query3 AND field2:query4) OR (field1:query5 AND field2:query6) OR (field1:query7 AND field2:query8) ... ...
    Abu Abdulla alhanbali
    Aug 10, 2007 at 4:21 am
    Aug 18, 2007 at 4:30 am
  • I've been experimenting with using SpanQuery to perform what is essentially a limited type of database 'join'. Each document in the index contains 1 or more 'rows' of meta data from another 'table'. ...
    Peter Keegan
    Aug 13, 2007 at 6:34 pm
    Aug 15, 2007 at 12:02 am
  • Here's a scenario I just ran into, though I don't know how to make Lucene do it (or even if it can). I have two lists; to keep things simple, let's assume (A B C D E F G) and (X Y). I want to form a ...
    Walt Stoneburner
    Aug 13, 2007 at 6:20 pm
    Aug 14, 2007 at 8:17 pm
  • Antonello, You are right, I think the Lucene IndexSearcher will search the old information if the IndexWriter was not closed (I think Lucene releases the lock here), so I only add a few documents every time from ...
    Kai Hu
    Aug 10, 2007 at 10:17 am
    Aug 11, 2007 at 5:16 pm
  • I was wondering if there is a "search based" method to find the top-k frequent phrases in a set of documents.( I do not have a particular phrase in mind so PhraseQuery can probably be ruled out). I ...
    Akanksha Baid
    Aug 9, 2007 at 7:35 am
    Aug 10, 2007 at 4:41 pm
  • Is there a good way to handle the following scenario: I have certain terms with embedded periods for which I want to leave them intact (not split at the periods). For example in my application a ...
    Donna L Gresh
    Aug 9, 2007 at 2:37 pm
    Aug 9, 2007 at 5:29 pm
  • Hi, I got stuck with a complex proximity clause - and would be grateful to get your help. Does Lucene allow, and if yes: what is the syntax? * Proximity between two phrases, for instance a within n1 ...
    Aug 5, 2007 at 12:27 am
    Aug 5, 2007 at 12:09 pm
  • Hi, We're considering to use the new IndexWriter.deleteDocuments call rather than the IndexReader.delete call. Are there any performance improvements that this may provide, other than the benefit of ...
    Andreas Knecht
    Aug 3, 2007 at 4:27 am
    Aug 3, 2007 at 7:47 pm
  • Hello, I've been asked to devise some way to discover and correct data in Lucene indexes that have been "corrupted." The word "corrupt", in this case, has a few different meanings, some of which ...
    Joe R
    Aug 2, 2007 at 3:24 pm
    Aug 3, 2007 at 1:20 pm
  • I understand that only documents that have been indexed can be searched. I have already managed to index the document and also search the content of the document. The problem is, why is that there are ...
    Aug 1, 2007 at 4:32 am
    Aug 1, 2007 at 9:19 am
Group Overview
group: java-user @

138 users for August 2007

  • Erick Erickson: 41 posts
  • Karl Wettin: 30 posts
  • Grant Ingersoll: 21 posts
  • Testn: 21 posts
  • Chris Hostetter: 19 posts
  • Tom: 16 posts
  • Mark Miller: 16 posts
  • Antoine Baudoux: 14 posts
  • Kai Hu: 13 posts
  • Lukas Vlcek: 13 posts
  • Michael McCandless: 12 posts
  • Ard Schrijvers: 11 posts
  • Tierecke: 11 posts
  • Cedric Ho: 9 posts
  • Chris Lu: 9 posts
  • Mike Klaas: 9 posts
  • Mohammad Norouzi: 9 posts
  • Donna L Gresh: 7 posts
  • Michael Busch: 7 posts
  • Paul Elschot: 7 posts