Has anyone deployed Lucene to index log files? I have seen some articles
about how RackSpace used Lucene and Hadoop for log processing, but I have
not seen any details on the implementation.

To get the analytics I need, I think I would have to treat each line of
the Apache log files as a document, and I thought I would treat each field
as a keyword to minimize processing.

Assuming you have clusters operating on independent datasets (so I guess it
would scale linearly) and you want to process terabytes of logs per day,
is such a solution even feasible?

Thank you,

Jeff Capone


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
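
The document model described in the question (one log line per document, each field kept as an exact keyword) could look roughly like the sketch below. It is written in Python for brevity and only covers the parsing step. The regular expression assumes the Apache common/combined log format, which the question does not specify, and `LOG_PATTERN` and `line_to_document` are hypothetical names. In a Lucene indexer, each key/value pair would then presumably be added to a document as a field indexed without tokenization.

```python
import re

# Assumed layout: Apache common log format, optionally followed by the
# referer and user-agent fields of the combined format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def line_to_document(line):
    """Map one log line to a dict of field name -> exact, untokenized value.

    Returns None for lines that do not match, so the caller can skip
    or count malformed entries instead of failing.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # Leave out fields that are absent (referer/agent in common log format).
    return {name: value
            for name, value in match.groupdict().items()
            if value is not None}


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
doc = line_to_document(sample)
```

Because every value is kept whole, no text analysis is needed at index time, which matches the goal of minimizing processing per line.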


  • Otis Gospodnetic at Nov 12, 2008 at 5:19 am
    Yes, I think it is. The only catch I can see is the log timestamps: how fine-grained you really need them to be and, if you want them very fine, what happens when you run range queries on them. If you have a pile of log files lying around, it should be pretty easy to get them indexed. You don't even have to write a client to search the resulting index; just point something like Luke at it, or even Solr.


    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
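
The timestamp concern in the reply above can be illustrated with a small sketch: the coarser the resolution at which a timestamp is indexed, the fewer distinct terms a range query has to cover. The Python below is only an illustration under assumptions. The function name `timestamp_term`, the `RESOLUTION_FORMATS` table, and the choice to normalize to UTC are hypothetical and not from the thread. Lucene's own `DateTools` class offers a comparable choice of resolution.

```python
from datetime import datetime, timezone

# Output pattern for each resolution (hypothetical names). The strings are
# fixed-width and zero-padded, so lexicographic order equals time order.
RESOLUTION_FORMATS = {
    "second": "%Y%m%d%H%M%S",
    "minute": "%Y%m%d%H%M",
    "hour": "%Y%m%d%H",
    "day": "%Y%m%d",
}


def timestamp_term(apache_timestamp, resolution="minute"):
    """Convert an Apache log timestamp to a sortable term, truncated
    to the requested resolution and normalized to UTC."""
    parsed = datetime.strptime(apache_timestamp, "%d/%b/%Y:%H:%M:%S %z")
    utc = parsed.astimezone(timezone.utc)
    return utc.strftime(RESOLUTION_FORMATS[resolution])


minute_term = timestamp_term("10/Oct/2000:13:55:36 -0700")
day_term = timestamp_term("10/Oct/2000:13:55:36 -0700", "day")
```

At minute resolution a day of logs yields at most 1,440 distinct timestamp terms, against up to 86,400 at second resolution, so the trade-off is between query precision and the number of terms a range has to span.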




    ________________________________
    From: Jeff Capone <jeff_capone@leafnetworks.net>
    To: java-user@lucene.apache.org
    Sent: Monday, November 10, 2008 6:51:20 PM
    Subject: Feasibility question


Discussion Overview
group: java-user
category: lucene
posted: Nov 10, '08 at 11:52p
active: Nov 12, '08 at 5:19a
posts: 2
users: 2 (Jeff Capone: 1 post, Otis Gospodnetic: 1 post)
website: lucene.apache.org
