# Lucene > java-user > December 2012

## 62 discussions - 246 posts

• 12

#### Which token filter can combine 2 terms into 1?

Hi, I am looking for a token filter that can combine 2 terms into 1. E.g. the input has been tokenized by white space: t1 t2 t2a t3. I want a filter that outputs: t1 t2t2a t3. I know it is a very ...
 Xi Shen | Dec 21, 2012 at 7:50 am | Dec 29, 2012 at 3:29 am
• 8

#### how to implement a TokenFilter?

Hi, I need a guide to implement my own TokenFilter. I checked the wiki, but I could not find any useful guide :( -- Regards, David Shen http://about.me/davidshen https://twitter.com/#!/davidshen84
 Xi Shen | Dec 22, 2012 at 2:26 pm | Dec 26, 2012 at 11:29 pm
• 8

#### Beginning with Lucene

Hello list, I am entirely new to Lucene and was trying to get myself familiar with it with the help of the tutorial presented here: http://www.lucenetutorial.com/lucene-in-5-minutes.html I was trying to ...
 Mohammad Tariq | Dec 4, 2012 at 3:07 pm | Dec 5, 2012 at 9:45 am
• 7

#### Lock Errors within JBoss Environment

Hello, I have been getting the following lock error when attempting to open an index writer to add new documents to an index. org.apache.lucene.store.LockObtainFailedException Lock obtain timed out ...
 Bowden Wise | Dec 18, 2012 at 6:59 pm | Dec 19, 2012 at 12:24 pm
• 7

#### Separating the document dataset and the index dataset

Greetings, We are using Lucene in our log analysis tool. We get around 35Gb of data a day, and we have this practice of zipping week-old indices and then unzipping them when the need arises. Though the compression ...
 Ramprakash Ramamoorthy | Dec 7, 2012 at 7:33 am | Dec 11, 2012 at 1:51 pm
• 6

#### Semi-structured queries

I've been trying to do semi-structured queries & query parsing. In other words, you could have XML snippets mixed in with plain terms, e.g. a query like: christmas tree <store loc="abc" ...
 Wu, Stephen T., Ph.D. | Dec 7, 2012 at 9:47 pm | Dec 14, 2012 at 2:41 pm
• 6

#### CheckIndex ArrayIndexOutOfBounds error for merged index

Hello, I'm trying to merge 12 indexes into one big index using the Lucene IndexMergeTool (command line used appended below). The merge seemed to finish successfully, but when I ran CheckIndex on the ...
 Tom Burton-West | Dec 5, 2012 at 9:31 pm | Dec 13, 2012 at 10:22 pm
• 5

#### Retrieving granular scores back from Lucene/SOLR

Hi, I am looking to get a bit more information back from SOLR/Lucene about the query/document pair scores. This would include field level scores, overall text relevance score, Boost value, BF value ...
 Vishwas Goel | Dec 26, 2012 at 4:31 am | Dec 26, 2012 at 2:22 pm
• 5

#### Boolean and SpanQuery: different results

Hi, I'm following Grant's advice on how to combine BooleanQuery and SpanQuery ...
 Carsten Schnober | Dec 13, 2012 at 3:49 pm | Dec 19, 2012 at 4:32 pm
• 5

I have just downloaded and set up Lucene 4.0.0 to implement a search facility for a web app I'm developing. Creating the index seems to be successful - the files created contain the text that I'm ...
 Ramon Casha | Dec 18, 2012 at 2:15 pm | Dec 18, 2012 at 3:06 pm
• 5

#### Maven 4.1-SNAPSHOTS not up-to-date

Hi all, I wanted to use the 4.1 version to access some of the latest improvements. I was hoping to just connect to the Maven snapshot repository, but it seems that they are not being updated as ...
 Neil Ireson | Dec 11, 2012 at 1:49 pm | Dec 11, 2012 at 9:29 pm
• 4

#### Lucene 4.0 scalability and performance.

Hi all, We are starting to evaluate Lucene 4.0 for use in a production environment. This means that we need to index millions of documents with terabytes of content and search in it. For now we want to ...
 Vitaly_artemov | Dec 23, 2012 at 11:12 am | Dec 24, 2012 at 1:30 pm
• 4

#### Using GeohashPrefixTree for map clustering

Hi, I would like to be able to display up to multiple millions of lat/lng points on a map. To make this possible, my intention is to plot fewer than 1000 clusters of points by dividing the world into a ...
 Neil Ireson | Dec 13, 2012 at 7:43 pm | Dec 19, 2012 at 2:14 am
• 4

#### Explicit setting of NIOFSDirectory not respected

Hi all, I run my code on a cluster where I have to preset resource limits and therefore the processes have limited virtual memory causing OOME when using MMapDirectory on large indexes. This means I ...
 Neil Ireson | Dec 12, 2012 at 10:32 am | Dec 12, 2012 at 3:14 pm
• 4

#### Deciding how to use reader

Hi! I'm using lucene.net, but I'm sure this question is not platform specific. :) I've created an index for a website which uses a central database server and three front-end servers. For now I've ...
 Lars-Erik Aabech | Dec 10, 2012 at 10:33 am | Dec 10, 2012 at 12:22 pm
• 4

#### Lucene (4.0), junit, failed to delete _0_nrm.cfs

I am (also) running lucene unit tests. In the teardown-method(@After) I (try to) delete the complete directory-folder. Unfortunately this does not always work. If not, the file _0_nrm.cfs (or _0.fdx) ...
 Clemens Wyss DEV | Dec 9, 2012 at 4:46 pm | Dec 9, 2012 at 8:57 pm
• 4

#### Alternative for WildcardQuery with leading *

In order to provide suggestions our query also includes a "WildcardQuery with a leading *", which, of course, has a HUGE performance impact :-( E.g. Say we have indexed "vacancyplan", then if a user ...
 Clemens Wyss DEV | Dec 7, 2012 at 9:16 am | Dec 7, 2012 at 12:14 pm
• 4

#### Lucene 4.0, Serialization

I need to send a class containing Lucene elements such as Query over the network using EJB, and of course this class needs to be serialized. I marked my class as Serializable but it does not seem ...
 BIAGINI Nathan | Dec 4, 2012 at 9:34 am | Dec 4, 2012 at 1:27 pm
• 3

#### TokenFilter state question

Hello, I'm still trying to figure out some of the nuances of Lucene and I have run into a small issue. I have created my own custom analyzer which uses the WhitespaceTokenizer and chains together the ...
 Jeremy Long | Dec 26, 2012 at 2:09 pm | Dec 27, 2012 at 12:13 am
• 3

#### Russiam stemmer?

Hello, I am looking for a Russian stemmer. Does Lucene have one, as well as documentation on how to use it? Please let me know where I can find the Russian stemmer. Thanks! Dima
 Dokondr | Dec 17, 2012 at 11:19 pm | Dec 18, 2012 at 12:03 am
• 3

#### How to consume DocValues

I'm using trunk to try out DocValues. Directory directory = new RAMDirectory(); IndexWriterConfig iwConfig = new IndexWriterConfig( Version.LUCENE_41, new StandardAnalyzer(Version.LUCENE_41)) ...
 Varun Thacker | Dec 12, 2012 at 5:52 pm | Dec 12, 2012 at 6:19 pm
• 3

#### Lucene 4.0.0 - find term position.

Hi all, I am new to Lucene. I am trying to understand how I can find the term position. I use the following code to index documents: ... IndexWriter writer = new IndexWriter(mIndexDir, mIwc); FileInputStream ...
 Vitaly_artemov | Dec 6, 2012 at 10:30 am | Dec 7, 2012 at 4:34 pm
• 2

#### WordDelimiterFilter Question (lucene 4.0)

Hello, I'm having an issue creating a custom analyzer utilizing the WordDelimiterFilter. I'm attempting to create an index of information gleaned from JAR manifest files. So if I have ...
 Jeremy Long | Dec 23, 2012 at 4:57 pm | Dec 26, 2012 at 2:15 pm
• 2

#### TokenStream: How to get token text?

Hello, Please, help. I am lost in TokenStream / Token / Analyzer API. I am trying to figure out how to get _token_itself_ or token text while looking at "Invoking the Analyzer" example (see example ...
 Dokondr | Dec 25, 2012 at 6:18 pm | Dec 25, 2012 at 8:17 pm
• 2

#### how to forcemerge a index library with many segmens to another dir?

Now, I have an index library with 100 segments. Using the forceMerge function, I can merge all segments into one segment. But I also want the newly generated index library which is written in another disk ...
 Hu Jing | Dec 20, 2012 at 1:24 am | Dec 21, 2012 at 12:52 am
• 2

#### NGramPhraseQuery with missing terms

Hi. I am trying to make an NGramPhrase query that could tolerate missing terms, so even if one of the NGrams doesn't match it still gets picked up by search. I know I could use the combination of ...
 김한규 | Dec 19, 2012 at 7:36 am | Dec 20, 2012 at 10:01 am
• 2

#### Looking for example code: Tokenizer + Analyzer for Russian stemming

Hello, I am looking for an example of using Tokenizer + Analyzer (in particular org.apache.lucene.analysis.ru.RussianAnalyzer) for standalone stemming. Can't find such an example here ...
 Dokondr | Dec 18, 2012 at 6:16 pm | Dec 19, 2012 at 2:22 pm
• 2

#### Lucene-analyzer 3.3.0 and Lucene snowball 3.0.1

Hello all, I am beginning work on an application, and nobody knows why Lucene-analyzer 3.3.0.jar and Lucene snowball 3.0.1.jar are both included. Do they do the same thing? How can I be sure that excluding ...
 Adrien RUFFIE | Dec 17, 2012 at 5:41 pm | Dec 18, 2012 at 7:01 am
• 2

#### how to get term docs in lucene 4.0

In Lucene 3.0, I can get term docs by using IndexReader.termDocs. How do I implement this in Lucene 4.0?
 Hu Jing | Dec 18, 2012 at 12:19 am | Dec 18, 2012 at 5:16 am
• 2

Hi, guys: Does a query plugin implementation impact caching? I have implemented a new query parser which just takes the input query string and returns my own query object. But the problem is, when I ...
 Lukai | Dec 17, 2012 at 5:59 am | Dec 17, 2012 at 7:06 am
• 2

#### precisionStep for days in TrieDate

If I specify a precisionStep of 26 for a TrieDate field, what rough impact should this have on both performance and index size? The input data has time in it, but the milliseconds per day is not ...
 Jack Krupansky | Dec 14, 2012 at 10:48 pm | Dec 14, 2012 at 11:11 pm
• 2

#### Long query optimisation: using some terms for scoring only

Hi all, I'm currently benchmarking Lucene to get an understanding of what optimisations are available for long queries, and wanted to check what the recommended approach is. Unsurprisingly a naive ...
 Matthew Willson | Dec 11, 2012 at 2:20 pm | Dec 11, 2012 at 6:14 pm
• 2

#### Stemming and Wildcard - or fire and water

Hello there, my colleague and I ran into an example which didn't return the result size which we were expecting. We discovered that there is a mismatch in handling terms while indexing and ...
 Bayer Dennis | Dec 11, 2012 at 9:52 am | Dec 11, 2012 at 10:55 am
• 2

#### Delete documents base on more than one condition?

Hi, is it possible to delete a set of documents where they match certain conditions? I would like to delete a set of articles that belong to a given user within a category. Thanks, ...
 Rajashekar | Dec 6, 2012 at 8:36 am | Dec 6, 2012 at 9:29 am
• 2

#### Is Analyzer used when calling IndexWriter.addIndexesNoOptimize()?

Lucene version: 3.0.3 Does IndexWriter use the analyzer when adding indexes via addIndexesNoOptimize()? What about for optimize()? I am examining some existing code and trying to determine what ...
 Earl Hood | Dec 5, 2012 at 7:33 am | Dec 5, 2012 at 10:58 pm
• 2

#### Using alternative scoring mechanism.

Can one replace the basic scoring algorithm (TF/IDF) for a specific field, to use a different one? I need to compute similarity for NAME field. The regular TF/IDF is not good enough, and I want to ...
 Eyal Ben Meir | Dec 1, 2012 at 7:16 pm | Dec 2, 2012 at 4:36 pm
• 1

#### tries and spatial search

I just found out about the blocktree implementation and how it is used to increase the speed of prefix search. Have you tried to use it for spatial search? I will explain to you how I think it will ...
 Apostolis Xekoukoulotakis | Dec 22, 2012 at 3:56 pm | Jan 18, 2013 at 3:46 pm
• 1

#### Highlighter throws ClassCastException TextFragment cannot be cast to DocumentFragment

We have a section of code that does the text highlighting. It was running fine under Lucene 1.9.1, but we are getting the following error after upgrading to 3.6.1. It does not seem to be anything that we ...
 Bin Lan | Dec 27, 2012 at 8:40 pm | Dec 27, 2012 at 9:25 pm
• 1

#### Negotiating string and byteArray in a migrated index

Dear all, We are moving from Lucene 2.3 to 4.1. For the migration, we use the IndexUpgrader class in org.apache.lucene.index of lucene 3.6. And then migrate it to 4.0 using the 4.0 IndexUpgrader ...
 Ramprakash Ramamoorthy | Dec 26, 2012 at 2:52 pm | Dec 26, 2012 at 2:56 pm
• 1

#### all the documents are not getting indexed in solr 3.6.1

I am trying to index some documents in 3.6.1 using text_ja field and copying it to string field so that I can do an exact match on copy field, but it seems that all the documents are not getting ...
 Khem_99 | Dec 23, 2012 at 2:46 am | Dec 26, 2012 at 2:11 pm
• 1

#### Passing information from the token level to the document level

Hi, I want to be able to count the number of times a certain character appears in the page, and then add that number as a Field to the document itself. I've been unable to find a way to do this ...
 Itai Peleg | Dec 22, 2012 at 11:55 am | Dec 22, 2012 at 11:55 am
• 1

#### Lucene Indexing on NFS

Hello, I have been getting the following lock error when attempting to open an index writer to add new documents to an index. org.apache.lucene.store.LockObtainFailedException Lock obtain timed out ...
 Bowden Wise | Dec 19, 2012 at 8:15 pm | Dec 19, 2012 at 8:49 pm
• 1

#### Some fixes to typos in documentation

Hi, Found some typos. In package description: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html?is-external=true#package_description In the example code below ...
 Dokondr | Dec 18, 2012 at 7:32 pm | Dec 18, 2012 at 10:40 pm
• 1

#### Lucene 4.0.0 - find offsets for phrase queries

Hi all, I use Lucene 4.0. I am trying to find offsets for phrase queries. My code works when I search for one word, but when I call it for a phrase I don't get offsets. termsEnum.seekExact returns false ...
 Vitaly_artemov | Dec 17, 2012 at 4:48 pm | Dec 18, 2012 at 3:15 pm
• 1

#### How to update one field(not stored) of an document in lucene 4.0 ?

Hi all, I don't know how to update one field, which is not stored, of a document in Lucene 4.0. Can anybody tell me? Thanks! Cheers, --- Bob ...
 Bo Zhang | Dec 18, 2012 at 9:15 am | Dec 18, 2012 at 10:06 am
• 1

#### java-user-subscribe

java-user-subscribe
 Dokondr | Dec 17, 2012 at 10:48 pm | Dec 17, 2012 at 10:50 pm
• 1

#### Lucene 4.1 tentative release

Hello, Any 'tentative' release date for 4.1 would help. I know it is difficult to pin down a date, but I still couldn't resist asking, so that we could plan accordingly. Thanks in advance. -- With Thanks and ...
 Ramprakash Ramamoorthy | Dec 12, 2012 at 11:51 am | Dec 12, 2012 at 5:22 pm
• 1

#### Opposite of SpanFirstQuery - Searching for documents by last term in a field

Hi, I wonder if there is a way to use a SpanQuery to find documents with fields that end with a certain term. Kind of the opposite of SpanFirstQuery, i.e. "SpanLastQuery", if you want. What I would ...
 Hasenberger, Josef | Dec 11, 2012 at 3:21 pm | Dec 11, 2012 at 4:05 pm
• 1

#### porting a cutsom Analyzer from 3.6 -> 4.0

I have a CustomAnalyzer which overrides "public final TokenStream tokenStream ( String fieldName, Reader reader )": @Override public final TokenStream tokenStream ( String fieldName, Reader reader ) ...
 Clemens Wyss DEV | Dec 9, 2012 at 1:16 pm | Dec 10, 2012 at 10:03 am
• 1

#### LMDirichletSimilarity for multiple fields

Hi all, I'm implementing an approach of mixture of language models in Lucene 4.0.0. Here is a little math to be precise: The ranking score for query q with t terms: p(q | \theta) = \prod_{t \in q} ...
 Nikita Zhiltsov | Dec 10, 2012 at 5:36 am | Dec 10, 2012 at 5:51 am