Hi,
I am using Lucene for indexing and searching documents.
It's working fine for supported documents. Now I want to index documents with
unsupported MIME types.
Right now I am using LIUS, which is built on top of Lucene, for indexing the
documents.

Is there any tool I can use for indexing the unsupported MIME types?
Thanks in advance.
-Gaurav


View this message in context: http://www.nabble.com/indexing-unsupported-mime-types-using-Lucene-tp17983491p17983491.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Steven A Rowe at Jun 19, 2008 at 10:30 am
    Hi Gaurav,

    To which mime types are you referring?

    I can't think of a tool designed for this, but one thing you might try is checking whether the input is compressed/packed, and if so first decompressing/unpacking it, and then using the "strings" program (available on Linux and Cygwin) to extract string data.

    Steve
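
For readers without the "strings" program at hand, here is a minimal Java sketch of the same idea: scan the raw bytes and keep runs of printable ASCII characters. The class name PrintableStrings, the method extract, and the minimum run length of 4 are illustrative assumptions, not part of Lucene or LIUS. The extracted text could then be fed into an index field like any other document body.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or LIUS.
final class PrintableStrings {

    /** Collects runs of at least minLength printable ASCII bytes from the stream. */
    static List<String> extract(InputStream in, int minLength) throws IOException {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b >= 0x20 && b <= 0x7E) {
                // Printable ASCII: extend the current run.
                current.append((char) b);
            } else {
                // Any other byte ends the run; keep it only if it is long enough.
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Flush a run that reaches the end of the stream.
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {0x00, 0x01, 'L', 'u', 'c', 'e', 'n', 'e', 0x00, 'a', 'b', 0x02, 'i', 'n', 'd', 'e', 'x'};
        // "ab" is shorter than the minimum run length and is dropped.
        System.out.println(extract(new ByteArrayInputStream(sample), 4)); // prints [Lucene, index]
    }
}
```

This only recovers single-byte ASCII text; compressed or packed input would still need to be unpacked first, as described above.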
  • Otis Gospodnetic at Jun 19, 2008 at 8:13 pm
    Gaurav, have you tried Tika? (sub-project of Apache Lucene)


    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

  • Gaurav Sharma at Jun 20, 2008 at 6:43 am
    Hi Otis,

    I haven't tried Tika.
    Is it open source?

    Had you heard of LIUS before, or is it talked about in the industry?
    And what about Solr? It seems you have worked on Solr and Nutch.

  • Otis Gospodnetic at Jun 21, 2008 at 6:10 am
    Gaurav,

    If you go to http://lucene.apache.org/ you will see a Tika tab there. It's OSS. LIUS is either a part of Tika or is about to become a part of it.

    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

  • Gaurav Sharma at Jul 4, 2008 at 6:24 am
    Hi,

    I am stuck with one more exception.
    When I use a wildcard query such as a*, I get a "too many clauses"
    exception. It says the maximum clause count is set to 1024. Is there any way
    to increase this count?
    Can you please help me overcome this?

    Thanks in advance.
    -Gaurav



  • Chris Lu at Jul 4, 2008 at 6:28 am
    This is easy, use:
    BooleanQuery.setMaxClauseCount(4096);
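
Some background on why the limit is hit: in Lucene releases of this period, as far as I know, a wildcard or prefix query such as a* is rewritten into a BooleanQuery with one clause per matching indexed term, so a short prefix over a large index can easily exceed 1024 clauses. The toy sketch below imitates that expansion with plain Java collections. It is not Lucene code; the class PrefixExpansionSketch, the method expandPrefix, and the use of IllegalStateException are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of prefix expansion; hypothetical names, not the Lucene API.
final class PrefixExpansionSketch {

    /** Returns one "clause" per indexed term that starts with prefix, up to maxClauses. */
    static List<String> expandPrefix(SortedSet<String> indexedTerms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix itself.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxClauses) {
                throw new IllegalStateException("too many clauses: limit is " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList("apple", "arrow", "axis", "bar"));
        System.out.println(expandPrefix(terms, "a", 1024)); // prints [apple, arrow, axis]
        try {
            expandPrefix(terms, "a", 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints too many clauses: limit is 2
        }
    }
}
```

Raising the limit lets larger expansions through at the cost of more memory and query time, so requiring a longer prefix before the wildcard is worth considering as well.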

    --
    Chris Lu
    -------------------------
    Instant Scalable Full-Text Search On Any Database/Application
    site: http://www.dbsight.net
    demo: http://search.dbsight.com
    Lucene Database Search in 3 minutes:
    http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
    DBSight customer, a shopping comparison site, (anonymous per request) got
    2.6 Million Euro funding!
  • Gaurav Sharma at Jul 4, 2008 at 6:26 am
    Hi,

    I am stuck with an exception in Lucene (too many clauses).
    When I use a wildcard query such as a*, I get a "too many clauses"
    exception. It says the maximum clause count is set to 1024. Is there any way
    to increase this count?
    Can you please help me overcome this?

    Thanks in advance.
    -Gaurav

  • Daniel Naber at Jul 4, 2008 at 5:53 pm

    Please see
    http://wiki.apache.org/lucene-java/LuceneFAQ#head-06fafb5d19e786a50fb3dfb8821a6af9f37aa831

    --
    http://www.danielnaber.de


Discussion Overview
group: java-user
categories: lucene
posted: Jun 18, '08 at 2:08p
active: Jul 4, '08 at 5:53p
posts: 9
users: 5
website: lucene.apache.org
