Hi, thanks for your reply.
After going through the site you pointed me to, I understood that
StandardAnalyzer is enough to handle these special characters.
I'm attaching a class called AnalysisDemo.java. Running that class
lets me confirm the statement above (i.e. StandardAnalyzer is enough).
Here is the output when I ran that Java file.
Analzying "Vedr. : Amtsgården Århus, Lyseng Allé 1, 8270
Højbjerg"
org.apache.lucene.analysis.WhitespaceAnalyzer:
[Vedr.] [:] [Amtsgården] [Århus,] [Lyseng] [Allé] [1,] [8270] [Højbjerg]
org.apache.lucene.analysis.SimpleAnalyzer:
[vedr] [amtsgården] [århus] [lyseng] [allé] [højbjerg]
org.apache.lucene.analysis.StopAnalyzer:
[vedr] [amtsgården] [århus] [lyseng] [allé] [højbjerg]
org.apache.lucene.analysis.standard.StandardAnalyzer:
[vedr] [amtsgården] [århus] [lyseng] [allé] [1] [8270] [højbjerg]
org.apache.lucene.analysis.snowball.SnowballAnalyzer:
[vedr] [amtsgården] [århus] [lyseng] [allé] [1] [8270] [højbjerg]
From the output above we can say that StandardAnalyzer is enough to handle
the Danish characters (they are kept intact in the tokens).
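
As a rough illustration of why the letter-based analyzers above keep 'å', 'ø' and 'é' inside tokens, here is a minimal sketch in plain Java. It is not Lucene's code; the class and method names are made up for this example. It only assumes that tokens are runs of characters for which Character.isLetter is true, lowercased. The Danish characters are written as \u escapes so the result does not depend on the encoding of the source file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits text into lowercased runs of letters.
final class LetterTokenizerSketch {

    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Danish letters such as U+00E5 and U+00F8 count as letters here.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter (space, digit, punctuation) ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Vedr. : Amtsg\u00e5rden \u00c5rhus, Lyseng All\u00e9 1, 8270 H\u00f8jbjerg";
        System.out.println(letterTokens(text));
    }
}
```

With this rule the digits are dropped and the accented letters survive, which is the same shape as the SimpleAnalyzer line in the output above.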
The only problem is that when I search for any term that includes
Danish characters (like højbjerg...),
it is unable to find it.
I even checked with Luke. I gave it my sample text, which contains the
Danish characters, and selected StandardAnalyzer as the analyzer. When I
click Analyze, it clearly tokenizes the Danish words correctly.
I also tried loading my index directory into Luke. After loading my index
I searched for a word that contains a Danish character, but this time it
failed. It showed nothing (i.e. 0 results).
As I see it, the problem might be either in building the index or in
searching it.
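
One thing worth ruling out on the indexing side is the character encoding used when the files are read. The sketch below is plain Java with no Lucene involved, and the class and method names are invented for illustration. It assumes the XML files are stored as UTF-8, which the thread does not confirm. It shows the same bytes decoded with an explicit, correct charset and with a wrong one. A ByteArrayInputStream stands in for the FileInputStream that real indexing code would open.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a whole stream using the charset the caller names.
final class ExplicitCharsetRead {

    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the document on disk is UTF-8 encoded.
        byte[] onDisk = "SES r\u00e5dgiver".getBytes(StandardCharsets.UTF_8);

        // Decoded with the charset the file was written in: text is intact.
        System.out.println(readAll(new ByteArrayInputStream(onDisk), StandardCharsets.UTF_8));

        // Decoded with a different charset: the two bytes of U+00E5 become two wrong characters.
        System.out.println(readAll(new ByteArrayInputStream(onDisk), StandardCharsets.ISO_8859_1));
    }
}
```

If the indexing code relies on the platform default charset (for example by using FileReader), naming the charset explicitly in an InputStreamReader is one way to make the index independent of the machine it was built on.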
I went through the site you gave me; that is how I was able to do this
research.
Please help me with this.
Erick Erickson wrote:
OK, this is a much different problem than you were originally
asking about, effectively "how to index/search mixed language
documents".
This topic has been discussed multiple times on the user list, I
think your first step should be to search the archive. I *was*
going to find the old searchable mail archive, but those clever folks
at Lucid Imagination have something new, see:
http://www.lucidimagination.com/search/p:lucene?q=multiple+languages
Once you've had a chance to look that over I think you'll be off and
running.
Best
Erick
On Thu, Apr 23, 2009 at 1:43 AM, uday kumar maddigatla
wrote:
Hi,
Here are the details about my goals.
1. I want to use Lucene for mixed languages.
2. I want to index documents that are in either English or Danish, etc.
I'm attaching my IndexFiles.java file.
When searching, I give the index path location as well as the documents
folder.
If I pass StandardAnalyzer as an argument to IndexWriter, it is able to
search the English characters.
How can I use DutchAnalyzer so that IndexFiles.java indexes the Danish
characters?
In the code I attached you can see 'C:\test3'. This is the location
where I want to store my index.
I give the documents folder location as a command-line argument.
In my document the content looks like this:
<com:Note><![CDATA[Kreditnota til udligning af faktura nr. 13927 pga skal
opsplittes
hhv. byggeplads og skat
Vedr. : Amtsgården Århus, Lyseng Allé 1, 8270 Højbjerg
Bygning B
SES Journal nr. : 42895-0001
SES Navision nr.: Navision 9800124
SES Ansvarlig : Martin Krøldrup Nielsen
SES rådgiver : Friis & Moltke A/S
Hermed fremsendes faktura på ekstra tømrerarbejde.
Byggeplads Amtsgården B-4
jvf. vedlagte specifikation - aftaleseddel nr. 12.]]></com:Note>
I'm searching for a word like rådgiver. When I look at the result, it is
clearly searching for "r dgiver". It is dropping the Danish character.
Please help me with this.
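
A query that turns "rådgiver" into "r dgiver" is what one would expect if the term's bytes are decoded with the wrong charset before analysis. The sketch below is plain Java, not Lucene, with hypothetical names. It assumes the term arrives as ISO-8859-1 bytes but is decoded as UTF-8; this is only one possible cause, not a confirmed diagnosis. The single byte for 'å' is not valid UTF-8, so it decodes to a non-letter replacement character, and a letter-based tokenizer then splits the word in two.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: decodes bytes with an assumed charset, then splits on non-letters.
final class GarbledTermSketch {

    static String[] letterRuns(byte[] raw, Charset assumed) {
        String decoded = new String(raw, assumed);
        // \p{L} matches any Unicode letter; everything else acts as a separator.
        return decoded.split("[^\\p{L}]+");
    }

    public static void main(String[] args) {
        // Assumption: the bytes were produced in ISO-8859-1 (one byte per character).
        byte[] raw = "r\u00e5dgiver".getBytes(StandardCharsets.ISO_8859_1);

        // Correct charset: one token containing the Danish letter.
        System.out.println(Arrays.toString(letterRuns(raw, StandardCharsets.ISO_8859_1)));

        // Wrong charset: the word falls apart into "r" and "dgiver".
        System.out.println(Arrays.toString(letterRuns(raw, StandardCharsets.UTF_8)));
    }
}
```

If this is what happens, the analyzer is not at fault; the text is already damaged before the analyzer sees it, either when the files are read or when the query string is read.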
Erick Erickson wrote:
Are you *also* using the DutchAnalyzer for your *query*?
Please show us the index and search code (simplified as much
as possible), then we'll be able to provide better suggestions.
Also, tell us a bit more about your goals here. Is this an
index entirely of Dutch documents? Or is it a mixed-language
index?
Think about getting a copy of Luke and
1> examining your index to see what's *really* there
2> examining the effects of using different parsers on
your *query*.
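
The point about using the same analysis for the index and for the query can be shown with a toy inverted index. This is a sketch in plain Java under stated assumptions: the class, the two tokenizing functions and the index layout are all invented for illustration and do not reproduce Lucene's analyzers or index format. The documents are indexed with a lowercasing tokenizer, and the same query is then run once with that tokenizer and once with a whitespace-only one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical toy index: term -> ids of the documents containing it.
final class MatchingAnalysisSketch {

    // Lowercases, then splits on anything that is not a letter or digit.
    static List<String> lowercaseTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Splits on whitespace only and keeps case and punctuation.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static Map<String, Set<Integer>> index(List<String> docs,
                                           Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                idx.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return idx;
    }

    // Returns the ids of documents that contain every analyzed query term.
    static Set<Integer> search(Map<String, Set<Integer>> idx, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> result = null;
        for (String term : analyzer.apply(query)) {
            Set<Integer> postings = idx.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("Lyseng All\u00e9 1, 8270 H\u00f8jbjerg");
        docs.add("Bygning B");
        Map<String, Set<Integer>> idx = index(docs, MatchingAnalysisSketch::lowercaseTokens);

        // Same analysis on both sides: the query term matches the indexed term.
        System.out.println(search(idx, "H\u00f8jbjerg", MatchingAnalysisSketch::lowercaseTokens));
        // Different analysis at query time: the capitalized term is not in the index.
        System.out.println(search(idx, "H\u00f8jbjerg", MatchingAnalysisSketch::whitespaceTokens));
    }
}
```

The second search comes back empty even though the document is in the index, which is the kind of mismatch Luke can reveal when you compare the stored terms with the parsed query.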
Best
Erick
On Wed, Apr 22, 2009 at 2:57 AM, uday kumar maddigatla
wrote:
Hi
Thanks for your reply.
I'm able to see the DutchAnalyzer.
When indexing my documents I gave an instance of DutchAnalyzer as an
argument to the IndexWriter class.
After this, when I search for a term that contains the Danish characters,
it is still not able to find it.
Please tell me how to use DutchAnalyzer in my application. A sample
example or a series of steps would help me.
I have also attached my indexing code (.java file):
http://www.nabble.com/file/p23170710/IndexFiles.java IndexFiles.java
Please help me with this. Please.
Erick Erickson wrote:
Take a look at DutchAnalyzer. The problem you'll have is if you're
indexing this document along with a bunch of documents from other
languages.
You could search the mail archive for extensive discussions of indexing/
searching documents from several languages.
Best
Erick
On Tue, Apr 21, 2009 at 2:40 AM, Uday Kumar Maddigatla
wrote:
Hi,
I'm new to Lucene. I downloaded Lucene 2.4.1.
I have an XML file which contains a few special characters like 'å',
'ø', '°', etc. (these are Danish language elements).
How can I search for these?
Uday Kumar Reddy Maddigatla
Software Engineer(Progrator|gatetrade)
MACH India(Operations)
Mobile: + 91-9963000377
www.ness.com
--
View this message in context:
http://www.nabble.com/How-to-search-special-characters-in-LUcene-tp23150039p23170710.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
http://www.nabble.com/file/p23190583/IndexFiles.java IndexFiles.java
http://www.nabble.com/file/p23190583/SearchFiles.java SearchFiles.java
http://www.nabble.com/file/p23211629/AnalysisDemo.java AnalysisDemo.java