Grokbase Groups Lucene dev June 2016
Hi Mike. I'm writing code for the Altera OpenCL SDK, and I have a code base
that produces a non-Lucene-format index. What kind of data do you collect in
your benchmark? Do you collect all the position and frequency data? I'm also
curious what you see as the biggest bottleneck in creating an index: is it
building the index from the data, merging the indexes, or something else? Do
you feel the algorithm is CPU, memory, or disk bound? And finally, do you
think there is a market for accelerated indexing? Say I could quadruple the
price/performance while still producing 100% Lucene-compatible indexes; would
people pay for that?

Thanks

Steve


  • Michael McCandless at Jun 18, 2016 at 9:24 am
    Hi Steve,

    Lucene on OpenCL sounds neat!

    In Lucene's nightly indexing benchmarks (
    http://home.apache.org/~mikemccand/lucenebench/indexing.html) I index an
    export of Wikipedia's English content, including terms, docIDs, term
    frequencies, and positions, as well as points, doc values, and stored
    fields. The full (messy!) source code is in this repository:
    https://github.com/mikemccand/luceneutil.
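
For readers unfamiliar with the terms above, here is a minimal sketch of what "docIDs, term frequencies, positions" mean in an inverted index. It is a toy illustration only: it is not taken from luceneutil, it does not use the Lucene API or on-disk format, and the name `build_postings`, the whitespace tokenization, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict

def build_postings(docs):
    """Map each term to {doc_id: [positions]}.

    The term frequency of a term in a document is len(positions).
    Hypothetical helper for illustration; not Lucene's implementation.
    """
    postings = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            postings[term].setdefault(doc_id, []).append(position)
    return postings

docs = ["Lucene indexes text", "text search with Lucene and more text"]
postings = build_postings(docs)

# Print docID, term frequency, and positions for the term "text".
for doc_id, positions in sorted(postings["text"].items()):
    print(doc_id, len(positions), positions)
```

Points, doc values, and stored fields are separate per-document structures and are not modeled in this sketch.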

    Both initial indexing and merging are CPU/IO intensive, but they are very
    amenable to soaking up the hardware's concurrency.
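
As a rough sketch of why both phases can use concurrency: independent chunks of documents can be indexed into separate segments in parallel, and a merge then combines the segments by renumbering docIDs. This is a simplified illustration under stated assumptions, not Lucene's actual indexing or merge code; `index_segment`, `merge_segments`, and the sample chunks are hypothetical names and data. Python threads are used here only to show the structure, since the interpreter lock limits their CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def index_segment(docs):
    """Build one independent segment: term -> {local_doc_id: term_frequency}."""
    segment = {}
    for local_id, text in enumerate(docs):
        for term in text.lower().split():
            per_doc = segment.setdefault(term, {})
            per_doc[local_id] = per_doc.get(local_id, 0) + 1
    return segment, len(docs)

def merge_segments(segments):
    """Merge segments by shifting each one's local docIDs by a running base."""
    merged, base = {}, 0
    for segment, doc_count in segments:
        for term, per_doc in segment.items():
            target = merged.setdefault(term, {})
            for local_id, freq in per_doc.items():
                target[base + local_id] = freq
        base += doc_count
    return merged

# Two chunks of documents, indexed independently (assumed sample data).
chunks = [["a b a", "b c"], ["c c a"]]
with ThreadPoolExecutor(max_workers=2) as pool:
    # pool.map returns results in the same order as the input chunks.
    segments = list(pool.map(index_segment, chunks))

merged = merge_segments(segments)
print(merged)
```

Because each segment is built without touching shared state, the only coordination point in this sketch is the final merge.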

    On whether there's a market, that's beyond my pay grade ;) I just work on
    the bits! Different users care about different things.

    Mike McCandless

    http://blog.mikemccandless.com

Discussion Overview
group: dev @
categories: lucene
posted: Jun 17, '16 at 10:53p
active: Jun 18, '16 at 9:24a
posts: 2
users: 2
website: lucene.apache.org
