Thinking out loud,

An SSD is pretty close to RAM when it comes to seeking. Wouldn't that
mean that a bitset stored on an SSD would be more or less as fast as a
bitset in RAM? So how about storing all permutations of the filters one
uses on an SSD, perhaps loading them into RAM when they are frequently
used? To me it sounds like a great idea.
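
A minimal sketch of that idea, assuming one file per filter in a
directory on the SSD and a small LRU map for the frequently used ones.
The class and method names here are hypothetical and not part of Lucene,
and java.util.BitSet stands in for whichever bitset class ends up being
used:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: filter bitsets live on disk, hot ones stay in RAM. Not thread-safe. */
final class DiskBackedFilterCache {
    private final Path dir;                 // directory holding one file per filter
    private final Map<String, BitSet> hot;  // access-ordered map acting as an LRU cache

    DiskBackedFilterCache(Path dir, final int maxInRam) {
        this.dir = dir;
        this.hot = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > maxInRam;   // evict the least recently used bitset
            }
        };
    }

    /** Persists a filter's bitset; the key is assumed to be a safe file name. */
    void store(String key, BitSet bits) throws IOException {
        Files.write(dir.resolve(key + ".bits"), bits.toByteArray());
        hot.remove(key);                    // drop any stale in-RAM copy
    }

    /** Returns the bitset for key, reading it from disk on a cache miss. */
    BitSet load(String key) throws IOException {
        BitSet bits = hot.get(key);
        if (bits == null) {
            bits = BitSet.valueOf(Files.readAllBytes(dir.resolve(key + ".bits")));
            hot.put(key, bits);
        }
        return bits;
    }

    /** Number of bitsets currently held in RAM. */
    int inRamCount() {
        return hot.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("filters");
        DiskBackedFilterCache cache = new DiskBackedFilterCache(dir, 2);
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(42);
        cache.store("category_books", bits);
        System.out.println(cache.load("category_books"));
    }
}
```

Whether this beats simply letting the OS cache the files would need the
real tests mentioned below.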

Not sure if one should focus on OpenBitSet or a fixed-size BitSet; I'd
really need to do some real tests to tell. Still, I'm rather convinced
that SSD gives quite a bit more bang for the buck than RAM, provided
that IO throughput (compare an index in RAM vs on SSD vs on HDD)
isn't an issue.

The only real issue I can think of is the lack of
DocSetIterator#close().



karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Michael McCandless at Jan 9, 2009 at 7:39 pm
    While SSDs are delightfully fast compared to mechanical drives, I
    think they are still quite a bit slower than RAM for truly random
    access.

    E.g., Intel's X25-E (apparently the leader at the moment) lists a 75 us
    read latency, whereas RAM latency is maybe 50-100 ns, so roughly a
    thousand times lower.

    Though since Lucene accesses the filter via sequential scan, the read
    latency may not matter (unless the file is heavily fragmented, in
    which case lots of "seeks" are being done as you scan through it).
    And if you have enough RAM, the OS will cache the files in its IO
    cache anyway.
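
    To illustrate the sequential-scan point, here is a minimal sketch that
    walks a bitset file front to back through a memory-mapped buffer, so
    that repeated scans can be served from the OS IO cache. It is not how
    Lucene reads filters; the class name, the file layout (the bytes of
    java.util.BitSet#toByteArray) and the collector callback are all
    assumptions made for the example:

    ```java
    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.BitSet;
    import java.util.function.IntConsumer;

    /** Hypothetical sketch: sequentially scan an on-disk bitset via mmap. */
    final class MappedBitsetScan {

        /** Writes the bitset in BitSet's own little-endian byte layout. */
        static void write(Path file, BitSet bits) throws IOException {
            Files.write(file, bits.toByteArray());
        }

        /**
         * Scans the file once from start to end and reports every set bit
         * (document id) in increasing order. Assumes the file is small enough
         * that doc ids fit in an int.
         */
        static void scan(Path file, IntConsumer collector) throws IOException {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                for (int byteIndex = 0; buf.hasRemaining(); byteIndex++) {
                    int b = buf.get() & 0xFF;
                    while (b != 0) {
                        int bit = Integer.numberOfTrailingZeros(b);
                        collector.accept(byteIndex * 8 + bit);
                        b &= b - 1; // clear the lowest set bit
                    }
                }
            }
        }

        public static void main(String[] args) throws IOException {
            Path file = Files.createTempFile("filter", ".bits");
            BitSet bits = new BitSet();
            bits.set(1);
            bits.set(9);
            bits.set(64);
            write(file, bits);
            scan(file, docId -> System.out.println("match: " + docId));
        }
    }
    ```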

    I do think that as "we" switch over to SSDs, it will change how we
    optimize Lucene. E.g., I think the CPU cost of searching will suddenly
    matter much more, the ability to make use of concurrency while
    searching will be important, etc.

    Mike

  • Marvin Humphrey at Jan 9, 2009 at 9:30 pm

    On Fri, Jan 09, 2009 at 08:11:31PM +0100, Karl Wettin wrote:

    > SSD is pretty close to RAM when it comes to seeking. Wouldn't that
    > mean that a bitset stored on an SSD would be more or less as fast as a
    > bitset in RAM?

    Provided that your index can fit in the system i/o cache and stay there, you
    get the speed of RAM regardless of the underlying permanent storage type.
    There's no reason to wait on SSDs before implementing such a feature.

    One thing we've contemplated in Lucy/KS is a FilterWriter, which would write
    out cached bitsets at index time. Adding that on would look something like
    this:

    public class MyArchitecture extends Architecture {
        public ArrayList<SegDataWriter> segDataWriters(InvIndex invindex,
                                                       Segment segment) {
            ArrayList<SegDataWriter> writers
                = super.segDataWriters(invindex, segment);
            writers.add(new FilterWriter(invindex, segment));
            return writers;
        }
    }

    public class MySchema extends Schema {
        public Architecture architecture() { return new MyArchitecture(); }
        public MySchema() {
            TextField textFieldSpec = new TextField(new PolyAnalyzer("en"));
            specField("title", textFieldSpec);
            specField("content", textFieldSpec);
        }
    }

    IndexWriter writer = new IndexWriter(new MySchema().open("/path/to/index"));

    This isn't quite the same thing, because I believe you're talking about
    adaptively caching filters on the fly at search time. However, I expect this
    to work quite well when a finite set of filters is known in advance, e.g. for
    faceting categories.
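
    For the case of a known, finite set of facet categories, the index-time
    half could be sketched roughly as follows. This is not the Lucy/KS
    FilterWriter, which is only contemplated above; the class, its methods
    and the ".flt" file suffix are invented for illustration, and
    java.util.BitSet stands in for the real bitset type:

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.BitSet;
    import java.util.Map;
    import java.util.TreeMap;

    /** Hypothetical index-time writer: one bitset per facet category. */
    final class FacetFilterWriterSketch {
        private final Map<String, BitSet> filters = new TreeMap<String, BitSet>();

        /** Called once per document while the segment is being written. */
        void addDoc(int docId, String category) {
            filters.computeIfAbsent(category, k -> new BitSet()).set(docId);
        }

        /** Called when the segment is finished: writes one file per category. */
        void finish(Path segmentDir) throws IOException {
            for (Map.Entry<String, BitSet> e : filters.entrySet()) {
                Files.write(segmentDir.resolve(e.getKey() + ".flt"),
                            e.getValue().toByteArray());
            }
        }

        /** Search-time counterpart: reads a precomputed filter back. */
        static BitSet read(Path segmentDir, String category) throws IOException {
            return BitSet.valueOf(
                Files.readAllBytes(segmentDir.resolve(category + ".flt")));
        }

        public static void main(String[] args) throws IOException {
            Path segmentDir = Files.createTempDirectory("seg_1");
            FacetFilterWriterSketch writer = new FacetFilterWriterSketch();
            writer.addDoc(0, "books");
            writer.addDoc(1, "music");
            writer.addDoc(2, "books");
            writer.finish(segmentDir);
            System.out.println("books: " + read(segmentDir, "books"));
        }
    }
    ```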

    Marvin Humphrey



  • Robert engels at Jan 9, 2009 at 9:43 pm
    If your index can fit in the IO cache, you should be using a completely
    different implementation...

    You should be writing a sequential transaction log for add/update/
    delete operations, and storing the entire index in memory
    (RAMDirectory) - with periodic background flushes of the log.
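
    As a toy illustration of that design (an in-memory map rather than a
    real RAMDirectory, with invented class and method names, and assuming
    ids and bodies contain no tabs or newlines), the log-plus-memory
    approach might be sketched like this:

    ```java
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    /** Hypothetical sketch: in-memory store plus an append-only transaction log. */
    final class LoggedMemoryIndex implements AutoCloseable {
        private final Map<String, String> docs = new ConcurrentHashMap<String, String>();
        private final BufferedWriter log;

        /** Replays an existing log into memory, then opens it for appending. */
        LoggedMemoryIndex(Path logFile) throws IOException {
            if (Files.exists(logFile)) {
                for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
                    apply(line);
                }
            }
            log = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        private void apply(String line) {
            String[] parts = line.split("\t", 3);
            if (parts[0].equals("PUT") && parts.length == 3) {
                docs.put(parts[1], parts[2]);
            } else if (parts[0].equals("DEL") && parts.length >= 2) {
                docs.remove(parts[1]);
            }
        }

        /** Add or update: append to the log, then change the in-memory state. */
        synchronized void put(String id, String body) throws IOException {
            log.write("PUT\t" + id + "\t" + body + "\n");
            docs.put(id, body);
        }

        synchronized void delete(String id) throws IOException {
            log.write("DEL\t" + id + "\n");
            docs.remove(id);
        }

        /** Pushes buffered log entries to the file; meant to be called periodically. */
        synchronized void flush() throws IOException {
            log.flush();
        }

        String get(String id) {
            return docs.get(id);
        }

        @Override
        public synchronized void close() throws IOException {
            log.close();
        }

        public static void main(String[] args) throws Exception {
            Path logFile = Files.createTempFile("index", ".log");
            try (LoggedMemoryIndex index = new LoggedMemoryIndex(logFile)) {
                // Periodic background flush of the log, once per second.
                ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
                flusher.scheduleAtFixedRate(() -> {
                    try {
                        index.flush();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }, 1, 1, TimeUnit.SECONDS);
                index.put("1", "hello lucene");
                index.delete("1");
                flusher.shutdown();
            }
        }
    }
    ```

    Anything written since the last flush is lost on a crash, which is the
    trade-off that comes with flushing in the background.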

    If you are running multiple processes (in KS), who is invoking them
    (inetd or similar)? If not, and users are on the system, you can't
    control what will happen with the IO cache...

    If you want performance, use a server-based implementation.

    If you don't care about performance, then performance is not an
    issue, so use the simplest approach (which is probably the current
    implementation).

    Trying to make the current implementation "better" (and more complex)
    to accommodate a poor design is just a waste of time and resources.

  • Marvin Humphrey at Jan 9, 2009 at 11:02 pm

    On Fri, Jan 09, 2009 at 03:42:35PM -0600, robert engels wrote:

    > If your index can fit in the IO cache, you should be using a completely
    > different implementation...
    >
    > You should be writing a sequential transaction log for add/update/
    > delete operations, and storing the entire index in memory
    > (RAMDirectory) - with periodic background flushes of the log.

    That'll work too.

    > If you are running multiple processes (in KS), who is invoking them
    > (inetd or similar?), if not, and users are on the system, you can't
    > control what will happen with the IO cache...

    See LUCENE-1458.

    Marvin Humphrey


  • Robert engels at Jan 9, 2009 at 11:24 pm
    I have better things to do than read a 10,000 word incident that
    discusses about 100 different topics under the generic heading
    "Further steps towards flexible indexing" in order to answer a simple
    question.

    You are a moron. And I don't mean that in an offensive way - I am
    using the secondary definition.

    Main Entry: mo·ron
    Pronunciation: \ˈmȯr-ˌän\
    Function: noun
    Etymology: irregular from Greek mōros foolish, stupid
    Date: 1910
    1 usually offensive : a mildly mentally retarded person
    2 : a very stupid person
  • Doug Cutting at Jan 9, 2009 at 11:56 pm

    robert engels wrote:

    > You are a moron. And I don't mean that in an offensive way - I am using
    > the secondary definition.
    >
    > 2: a very stupid person

    That's still offensive and totally unacceptable here. Please refrain
    from making ad hominem remarks and stick to discussing the issues.

    Thanks,

    Doug

  • Robert engels at Jan 10, 2009 at 12:05 am
    Can something be offensive if it's a statement of fact? If you
    believe it is (under definition #3), then his remarks to me were just
    as offensive - as they caused me much displeasure and resentment. So
    please dress him down as well.

    Main Entry: 1of·fen·sive
    Pronunciation: \ə-ˈfen(t)-siv, especially for 1 ˈä-ˌfen(t)-, ˈȯ-\
    Function: adjective
    Date: circa 1564
    1 a: making attack : aggressive b: of, relating to, or designed for
    attack <offensive weapons> c: of or relating to an attempt to score
    in a game or contest ; also : of or relating to a team in possession
    of the ball or puck
    2: giving painful or unpleasant sensations : nauseous , obnoxious <an
    offensive odor>
    3: causing displeasure or resentment <offensive remarks>
    — of·fen·sive·ly adverb
    — of·fen·sive·ness noun
  • Doug Cutting at Jan 10, 2009 at 12:35 am

    robert engels wrote:

    > Can something be offensive if it's a statement of fact? If you believe
    > it is (under definition #3), then his remarks to me were just as
    > offensive - as they caused me much displeasure and resentment. So please
    > dress him down as well.

    His comments were on-topic. The topic of this list is Lucene, not its
    contributors or users. As I said before, ad hominem remarks are not
    acceptable. We do not demand that everyone equally understand every
    issue, otherwise there would be little to discuss.

    Some folks may find some things offensive. There's often little we can
    do about that. It's certainly no excuse to start making ad hominem
    statements. An important technique here is to respond to things you
    find offensive in a polite manner, or not at all.

    Thanks,

    Doug

  • Robert engels at Jan 10, 2009 at 1:19 am
    It was not ad hominem. It was an indirect critique of the value of the
    answer provided.

    Ad hominem would be if I called him ugly.
  • Ian Holsman at Jan 10, 2009 at 12:14 am
    Robert.

    * No one is forcing you to be on this mailing list.

    * Next time you look for a job and your prospective employer 'googles'
    you, they are going to find this anti-social behavior. "Playing well
    with others" is usually a key employment criterion people look for (as
    well as being super-smart like you are).

    Other people have asked you politely to tone it down, but you persist in
    showing how stupid most of the people on this list are.

    Why not leave the stupid people alone (they have already shown they
    don't understand your finer points), and go somewhere where you are
    more appreciated, or better yet, prove them wrong and build it your
    way. I'm sure you can convince your peers, who are forcing you to use
    such a feeble-minded project, that your approach would work better for
    them.

    I've even created a space on Google Code for you to show them:
    http://code.google.com/p/roberts-search/

    Sincerely
    Ian.


  • Robert engels at Jan 10, 2009 at 1:07 am
    You're exactly right. Playing well with others has trumped actual
    production and quality. You can see the mess that's gotten us into in
    all sorts of areas.

    Luckily there are entrepreneurs and other managers/owners that value
    quality first, and let feelings get repaired over beers or not at all.

    Your approach is exactly what destroyed the Columbia and their lives
    - politics over substance.

    All of the half-baked "engineering" that is discussed on this list is
    a joke. This list is a place people go to have their egos stroked -
    regardless of whether what they're saying actually makes any sense.

    You should work in Germany sometime - they would laugh at you for
    your attitude!
  • Grant Ingersoll at Jan 10, 2009 at 1:54 pm

    On Jan 9, 2009, at 8:06 PM, robert engels wrote:

    > Luckily there are entrepreneurs and other managers/owners that value
    > quality first, and let feelings get repaired over beers or not at all.

    Sure, but let me ask you, do you like working with those people who
    are jerks all the time? AFAICT, Lucene has done a pretty good job of
    creating something of quality AND doing it in a civil, respectful way,
    otherwise I just don't think we would still all be here. Why can't we
    have both? You talk like the only way it is possible to achieve
    something is by stepping on others.

    I would suggest everyone involved remove feelings from all of this and
    just focus on discussing the pros and cons of the subject at hand
    without the need to insult each other anytime you disagree. In the
    end, all involved would likely do better simply by showing their ideas
    out in code, i.e. _real_ patches.

    > Your approach is exactly what destroyed the Columbia and their lives
    > - politics over substance.

    Please don't compare what we do in a non-mission-critical open source
    project for searching files, databases and other minutiae with the
    massive tragedy that was Columbia. They are not even close to being
    on the same level and it is so completely disrespectful to those that
    lost their lives in that tragedy.


    -Grant

  • Robert engels at Jan 10, 2009 at 5:58 pm
    You are completely off-base in regards to my Columbia reference.

    It is sorrowful when anyone dies (others would dispute this,
    executions of murderers, etc.), but people die all the time - it
    doesn't make it a tragedy.

    What makes the Columbia truly a tragedy is that they died due to
    politics, through no fault of their own. They entrusted their lives to
    the engineers, and were deceived.

    The same can be said when politics is the overriding factor in our
    intelligence agencies. You can also see where that got us, and the
    lives that have been lost.

    Illustrating the serious problems that can occur when politics wins
    over substance is not disrespectful - it is prudent, and you are a
    disrespectful ass for even suggesting it was. Either that, or you do
    not have enough command of the English language to understand I was
    not equating the two, but using one as an extreme example of the
    result of the underlying problem.

    That Lucene isn't used in mission-critical applications doesn't make
    the offense of politics over substance any less problematic.

    You sound like some unrealistic PC idiot - maybe in your next message
    you'll somehow play the race card in order to "quiet me". Please.

    And personally, I would rather work with a jerk that made my work
    life/day easier because his work product removed a burden on me, or
    made the company so successful that I earned a better compensation,
    than work with a guy for whom I had to either explain everything 3
    times, or redo his crap all the time. In fact, I would work hard to
    see that the latter did not work there very long.
  • Robert engels at Jan 10, 2009 at 6:05 pm
    Also, ideally my coworker would be both. But given that people are of
    differing ability levels, my coworker has a problem. If he is smarter
    than me, wasting his time explaining things over and over to me does
    little good - unless I take the time to learn from it - and that is
    not always possible - different people have different capacities.

    And if the shoe is on the other foot, it wastes my time, unless the
    person being talked to has demonstrated a willingness and ability to
    learn.

    You can't get blood from a turnip.
  • Grant Ingersoll at Jan 10, 2009 at 9:07 pm
    Robert, if you wish to continue on this list I suggest you stop.
    Either contribute peacefully and positively to this list or you will
    be removed. We've all had it with your name calling and constant
    derision backed up by a complete lack of substance in actually doing
    any of the work, all while clearly benefiting from what Lucene has to
    offer. The fact that the community has tolerated your drivel for so
    long is a tremendous credit to this community.

    On Jan 10, 2009, at 1:04 PM, robert engels wrote:

    Also, ideally my coworker would be both. But given that people are
    of differing ability levels, my coworker has a problem. If he is
    smarter than me, wasting his time explaining things over and over to
    me does little good - unless I take the time to learn from it - and
    that is not always possible - different people have different
    capacities.

    And if the shoe is on the other foot, it wastes my time, unless the
    person being talked to has demonstrated a willingness and ability to
    learn.

    You can't get blood from a turnip.
    On Jan 10, 2009, at 11:57 AM, robert engels wrote:

    You are completely off-base in regards to my Columbia reference.

    It is sorrowful when anyone dies (others would dispute this,
    executions of murderers, etc.), but people die all the time - it
    doesn't make it a tragedy.

    What makes the Columbia truly a tragedy is that they died due to
    politics, of no fault of their own. They entrusted their lives to
    the engineers, and were deceived.

    The same can be said when politics is the overriding factor in our
    intelligence agencies. You can also see where that got us, and the
    lives that have been lost.

    Illustrating the serious problems that can occur when politics wins
    over substance is not disrespectful - it is prudent, and you are a
    disrespectful ass for even suggesting it was. Either that, or you
    do not have enough command of the English language to understand
    I was not equating the two, but using one as an extreme example of
    the result of the underlying problem.

    That Lucene isn't used in mission-critical applications doesn't
    make the offense of politics over substance any less problematic.

    You sound like some unrealistic PC idiot - maybe in your next
    message you'll somehow play the race card in order to "quiet me".
    Please.

    And personally, I would rather work with a jerk that made my work
    life/day easier because his work product removed a burden on me, or
    made the company so successful that I earned a better compensation,
    than work with a guy that I had to either explain everything 3
    times, or redo his crap all the time. In fact, I would work hard to
    see that the latter did not work there very long.
    On Jan 10, 2009, at 7:53 AM, Grant Ingersoll wrote:


    On Jan 9, 2009, at 8:06 PM, robert engels wrote:


    Luckily there are entrepreneurs and other managers/owners that
    value quality first, and let feelings get repaired over beers or
    not at all.
    Sure, but let me ask you, do you like working with those people
    who are jerks all the time? AFAICT, Lucene has done a pretty good
    job of creating something of quality AND doing it in a civil,
    respectful way, otherwise I just don't think we would still all be
    here. Why can't we have both? You talk like the only way it is
    possible to achieve something is by stepping on others.

    I would suggest everyone involved remove feelings from all of this
    and just focus on discussing the pros and cons of the subject at
    hand without the need to insult each other anytime you disagree.
    In the end, all involved would likely do better simply by showing
    their ideas out in code, i.e. _real_ patches.

    Your approach is exactly what destroyed the Columbia and their
    lives - politics over substance.
    Please don't compare what we do in a non-mission-critical open
    source project for searching files, databases and other minutiae
    with the massive tragedy that was Columbia. They are not even
    close to being on the same level and it is so completely
    disrespectful to those that lost their lives in that tragedy.


    -Grant

  • Robert engels at Jan 10, 2009 at 9:18 pm
    How is your calling me an ignorant, insensitive, disrespectful person
    any different?

    Sorry, I will always stand up for myself.

    Your complaint regarding the Columbia reference was completely out of
    line, and it shows your true nature, and lack of understanding.

    Now shut the fuck up.
    On Jan 10, 2009, at 3:06 PM, Grant Ingersoll wrote:

    Robert, if you wish to continue on this list I suggest you stop.
    Either contribute peacefully and positively to this list or you
    will be removed. We've all had it with your name calling and
    constant derision backed up by a complete lack of substance in
    actually doing any of the work, all while clearly benefiting from
    what Lucene has to offer. The fact that the community has
    tolerated your drivel for so long is a tremendous credit to this
    community.

    On Jan 10, 2009, at 1:04 PM, robert engels wrote:

    Also, ideally my coworker would be both. But given that people are
    of differing ability levels, my coworker has a problem. If he is
    smarter than me, wasting his time explaining things over and over
    to me does little good - unless I take the time to learn from it -
    and that is not always possible - different people have different
    capacities.

    And if the shoe is on the other foot, it wastes my time, unless
    the person being talked to has demonstrated a willingness and
    ability to learn.

    You can't get blood from a turnip.
    On Jan 10, 2009, at 11:57 AM, robert engels wrote:

    You are completely off-base in regards to my Columbia reference.

    It is sorrowful when anyone dies (others would dispute this,
    executions of murderers, etc.), but people die all the time - it
    doesn't make it a tragedy.

    What makes the Columbia truly a tragedy is that they died due to
    politics, of no fault of their own. They entrusted their lives
    to the engineers, and were deceived.

    The same can be said when politics is the overriding factor in
    our intelligence agencies. You can also see where that got us,
    and the lives that have been lost.

    Illustrating the serious problems that can occur when politics
    wins over substance is not disrespectful - it is prudent, and you
    are a disrespectful ass for even suggesting it was. Either that,
    or you do not have enough command of the English language to
    understand I was not equating the two, but using one as an
    extreme example of the result of the underlying problem.

    That Lucene isn't used in mission-critical applications doesn't
    make the offense of politics over substance any less problematic.

    You sound like some unrealistic PC idiot - maybe in your next
    message you'll somehow play the race card in order to "quiet
    me". Please.

    And personally, I would rather work with a jerk that made my work
    life/day easier because his work product removed a burden on me,
    or made the company so successful that I earned a better
    compensation, than work with a guy that I had to either explain
    everything 3 times, or redo his crap all the time. In fact, I
    would work hard to see that the latter did not work there very long.
    On Jan 10, 2009, at 7:53 AM, Grant Ingersoll wrote:


    On Jan 9, 2009, at 8:06 PM, robert engels wrote:


    Luckily there are entrepreneurs and other managers/owners that
    value quality first, and let feelings get repaired over beers
    or not at all.
    Sure, but let me ask you, do you like working with those people
    who are jerks all the time? AFAICT, Lucene has done a pretty
    good job of creating something of quality AND doing it in a
    civil, respectful way, otherwise I just don't think we would
    still all be here. Why can't we have both? You talk like the
    only way it is possible to achieve something is by stepping on
    others.

    I would suggest everyone involved remove feelings from all of
    this and just focus on discussing the pros and cons of the
    subject at hand without the need to insult each other anytime
    you disagree. In the end, all involved would likely do better
    simply by showing their ideas out in code, i.e. _real_ patches.

    Your approach is exactly what destroyed the Columbia and their
    lives - politics over substance.
    Please don't compare what we do in a non-mission-critical open
    source project for searching files, databases and other minutiae
    with the massive tragedy that was Columbia. They are not even
    close to being on the same level and it is so completely
    disrespectful to those that lost their lives in that tragedy.


    -Grant

  • Yonik Seeley at Jan 10, 2009 at 9:20 pm
    Can we please let this thread die.
    -Yonik

  • Paul Elschot at Jan 18, 2009 at 10:52 pm

    On Friday 09 January 2009 22:30:14 Marvin Humphrey wrote:
    On Fri, Jan 09, 2009 at 08:11:31PM +0100, Karl Wettin wrote:

    SSD is pretty close to RAM when it comes to seeking. Wouldn't that
    mean that a bitset stored on an SSD would be more or less as fast as a
    bitset in RAM?

    Provided that your index can fit in the system i/o cache and stay there, you
    get the speed of RAM regardless of the underlying permanent storage type.
    There's no reason to wait on SSDs before implementing such a feature.

    Since this started by thinking out loud, I'd like to continue doing that.

    I've been thinking about how to add a decent skipTo() to something that
    compresses better than an (Open)BitSet, and this turns out to be an
    integer set implemented as a B plus tree (all leafs on the same level) of
    only integers with key/data compression by a frame of reference for
    every node (see LUCENE-1410).
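
    As a rough illustration of the frame-of-reference idea, here is a minimal
    Java sketch of a single node. It stores the smallest value once and packs
    every other value as a small offset from it, then answers a skipTo()-style
    lookup by binary search. The class name ForBlock and its methods are
    hypothetical; they are not taken from LUCENE-1410 or from any Lucene API.
    The sketch assumes a non-empty block of sorted, distinct, non-negative ids.

```java
/** Hypothetical sketch: one node of sorted ints stored as a frame of reference. */
final class ForBlock {
    private final int base;       // smallest value in the block (the "frame")
    private final int bits;       // bits used for each offset from base
    private final int size;       // number of values in the block
    private final long[] packed;  // offsets packed back to back

    /** Assumes {@code sorted} is non-empty, ascending and non-negative. */
    ForBlock(int[] sorted) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("empty block");
        }
        size = sorted.length;
        base = sorted[0];
        int maxOffset = sorted[size - 1] - base;
        // Width of the largest offset; at least one bit so indexing stays simple.
        bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOffset));
        packed = new long[(int) (((long) size * bits + 63) / 64)];
        for (int i = 0; i < size; i++) {
            put(i, sorted[i] - base);
        }
    }

    private void put(int index, long offset) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        packed[word] |= offset << shift;
        if (shift + bits > 64) {
            // The value straddles two words: store the high part in the next one.
            packed[word + 1] |= offset >>> (64 - shift);
        }
    }

    /** Decodes the value at {@code index}. */
    int get(int index) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = packed[word] >>> shift;
        if (shift + bits > 64) {
            value |= packed[word + 1] << (64 - shift);
        }
        return base + (int) (value & ((1L << bits) - 1));
    }

    /** Index of the first value >= target, or size() if there is none. */
    int firstIndexAtOrAfter(int target) {
        int lo = 0;
        int hi = size;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (get(mid) < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    int size() { return size; }

    int bitsPerValue() { return bits; }
}
```

    With the ids 1000, 1003, 1010 and 1027 the largest offset is 27, so each
    entry takes 5 bits instead of 32. How much this saves in practice depends
    on how tightly the ids in a node cluster, which is an open question here.
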

    I found a Java implementation of a B plus tree on SourceForge: BplusDotNet
    in the BplusJ package, see http://bplusdotnet.sourceforge.net/ .
    This has nice transaction semantics on a file system and it has a BSD licence,
    so it could be used as a starting point, but:
    - it only has strings as index values, so it will need quite some simplification
    to work on integers as keys and data, and
    - it has no built in compression as far as I could see on first inspection.

    The questions:

    Would someone know of a better starting point for a B plus tree of integers
    with node compression?

    For example, how close is the current lucene code base to implementing
    a b plus tree for the doc ids of a single term?

    How valuable are transaction semantics for such an integer set? It is
    tempting to try and implement such an integer set by starting from the
    ground up, but I don't have any practical programming experience with
    transaction semantics, so it may be better to start from something that
    has transactions right from the start.

    Regards,
    Paul Elschot
  • Eks dev at Jan 19, 2009 at 8:14 am
    Hi Paul,
    not really an answer to your questions, but I thought you might find it
    useful as confirmation that packing integers into a (B or some other)
    tree is a good idea.

    I have seen integer set distributions that can profit hugely from the tree organization on top.

    Have a look at: http://www.iis.uni-stuttgart.de/intset/
    It is not meant for on-disk storage, but the idea is quite similar.

    cheers,
    eks






    ________________________________
    From: Paul Elschot <paul.elschot@xs4all.nl>
    To: java-dev@lucene.apache.org
    Sent: Sunday, 18 January, 2009 23:51:36
    Subject: Re: Filesystem based bitset

    On Friday 09 January 2009 22:30:14 Marvin Humphrey wrote:
    On Fri, Jan 09, 2009 at 08:11:31PM +0100, Karl Wettin wrote:

    SSD is pretty close to RAM when it comes to seeking. Wouldn't that
    mean that a bitset stored on an SSD would be more or less as fast as a
    bitset in RAM?
    Provided that your index can fit in the system i/o cache and stay there, you
    get the speed of RAM regardless of the underlying permanent storage type.
    There's no reason to wait on SSDs before implementing such a feature.
    Since this started by thinking out loud, I'd like to continue doing that.
    I've been thinking about how to add a decent skipTo() to something that
    compresses better than an (Open)BitSet, and this turns out to be an
    integer set implemented as a B plus tree (all leafs on the same level) of
    only integers with key/data compression by a frame of reference for
    every node (see LUCENE-1410).
    I found a Java implementation of a B plus tree on SourceForge: BplusDotNet
    in the BplusJ package, see http://bplusdotnet.sourceforge.net/ .
    This has nice transaction semantics on a file system and it has a BSD licence,
    so it could be used as a starting point, but:
    - it only has strings as index values, so it will need quite some simplification
    to work on integers as keys and data, and
    - it has no built in compression as far as I could see on first inspection.
    The questions:
    Would someone know of a better starting point for a B plus tree of integers
    with node compression?
    For example, how close is the current lucene code base to implementing
    a b plus tree for the doc ids of a single term?
    How valuable are transaction semantics for such an integer set? It is
    tempting to try and implement such an integer set by starting from the
    ground up, but I don't have any practical programming experience with
    transaction semantics, so it may be better to start from something that
    has transactions right from the start.
    Regards,
    Paul Elschot
  • Michael McCandless at Jan 19, 2009 at 10:38 am

    Paul Elschot wrote:

    Since this started by thinking out loud, I'd like to continue doing
    that.
    I've been thinking about how to add a decent skipTo() to something
    that compresses better than an (Open)BitSet, and this turns out to
    be an integer set implemented as a B plus tree (all leafs on the
    same level) of only integers with key/data compression by a frame
    of reference for every node (see LUCENE-1410).

    Sounds great! With flexible indexing (LUCENE-1458, which I'm needing
    to get back to & finish...) you could experiment with these sorts of
    changes to the postings format by implementing your own codec.

    For example, how close is the current lucene code base to implementing
    a b plus tree for the doc ids of a single term?

    I'm not sure this is a good fit -- B+ trees are great at
    insertion/deletion of entries, but we never do that with our postings
    (they are write once). Though if the set operations are substantially
    faster (??) than the doc-at-a-time iteration Lucene does today, then
    maybe it is compelling? But we'd have to change up how AND/OR queries
    work to translate into these set operations.

    How valuable are transaction semantics for such an integer set? It is
    tempting to try and implement such an integer set by starting from the
    ground up, but I don't have any practical programming experience with
    transaction semantics, so it may be better to start from something
    that has transactions right from the start.

    If we use this to store/access deleted docs in RAM, then transactions
    are very important for realtime search. With LUCENE-1314
    (IndexReader.clone) a cloned reader carries over the deletes from the
    original reader but must "copy on write" as soon as a new deletion is
    made. With BitVector for deleted docs, this operation is very costly.
    But if we used B+ tree (or something similar) in RAM to hold the
    deleted docs, and that lets us incrementally copy-on-write only the
    nodes/blocks affected by the changes, that would be very useful.
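
    To make the incremental copy-on-write idea concrete, here is a minimal
    Java sketch. It is not Lucene's BitVector and not a B+ tree: it uses a flat
    array of fixed-size blocks, and the class name CowBitSet, the snapshot()
    method and the 1024-bit block size are all assumptions made for
    illustration. A snapshot shares every block with its source, and a block is
    copied only when one side first writes to it.

```java
import java.util.Arrays;

/** Hypothetical sketch: a bit set whose snapshots copy blocks only on write. */
final class CowBitSet {
    private static final int BLOCK_BITS = 1024;        // assumed block size
    private static final int WORDS = BLOCK_BITS / 64;  // longs per block

    private final long[][] blocks;  // block i covers bits [i*1024, (i+1)*1024)
    private final boolean[] owned;  // true if this instance may write in place

    CowBitSet(int numBits) {
        int n = (numBits + BLOCK_BITS - 1) / BLOCK_BITS;
        blocks = new long[n][WORDS];
        owned = new boolean[n];
        Arrays.fill(owned, true);
    }

    private CowBitSet(long[][] shared) {
        blocks = shared.clone();             // shallow copy: block arrays are shared
        owned = new boolean[shared.length];  // all false: nothing is owned yet
    }

    /** Returns a cheap logical copy; both sides now copy a block before writing. */
    CowBitSet snapshot() {
        Arrays.fill(owned, false);
        return new CowBitSet(blocks);
    }

    boolean get(int bit) {
        long word = blocks[bit / BLOCK_BITS][(bit % BLOCK_BITS) >>> 6];
        return (word & (1L << (bit & 63))) != 0;
    }

    /** Marks a bit (for example a deleted doc), copying only the affected block. */
    void set(int bit) {
        int b = bit / BLOCK_BITS;
        if (!owned[b]) {
            blocks[b] = blocks[b].clone();
            owned[b] = true;
        }
        blocks[b][(bit % BLOCK_BITS) >>> 6] |= 1L << (bit & 63);
    }

    /** True if both sets still point at the same block for this bit. */
    boolean sharesBlockWith(CowBitSet other, int bit) {
        return blocks[bit / BLOCK_BITS] == other.blocks[bit / BLOCK_BITS];
    }
}
```

    In this sketch a new deletion after snapshot() copies 128 bytes rather than
    the whole bit set. A tree of such blocks, as discussed above, would
    additionally avoid copying the top-level array, which this flat version
    still duplicates on every snapshot.
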

    It could also be useful for storing deleted docs in the index, ie,
    this is an alternative to tombstones, in which case its transactional
    support would be good, to avoid writing an entire BitVector when only
    a few additional docs became deleted, during commit. This would fit
    nicely with Lucene's already transactional index storage, ie rather
    than storing the "deletion generation" (an int) that we store today,
    we'd store some reference into the B+ tree indicating the
    "commit point" to use for deletions.

    But I think this usage (changing how deletions are stored on disk) is
    less compelling than changing how deletions are stored/used in RAM.

    Mike

  • Paul Elschot at Jan 19, 2009 at 5:33 pm

    On Monday 19 January 2009 11:32:17 Michael McCandless wrote:

    Paul Elschot wrote:
    Since this started by thinking out loud, I'd like to continue doing
    that.
    I've been thinking about how to add a decent skipTo() to something
    that compresses better than an (Open)BitSet, and this turns out to
    be an integer set implemented as a B plus tree (all leafs on the
    same level) of only integers with key/data compression by a frame
    of reference for every node (see LUCENE-1410).

    Sounds great! With flexible indexing (LUCENE-1458, which I'm needing
    to get back to & finish...) you could experiment with these sorts of
    changes to the postings format by implementing your own codec.

    I'll take a look there.

    For example, how close is the current lucene code base to implementing
    a b plus tree for the doc ids of a single term?

    I'm not sure this is a good fit -- B+ trees are great at
    insertion/deletion of entries, but we never do that with our postings
    (they are write once).

    Though if the set operations are substantially
    faster (??) than the doc-at-a-time iteration Lucene does today, then
    maybe it is compelling? But we'd have to change up how AND/OR queries
    work to translate into these set operations.

    The idea is to implement a DocIdSetIterator on top of this, with the
    usual next() and skipTo(), so it should fit in the current lucene framework.
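
    A minimal Java sketch of what such an iterator could look like follows. It
    does not extend Lucene's DocIdSetIterator; it only mirrors the
    next()/doc()/skipTo() shape, with skipTo() behaving like repeated next()
    calls until doc() >= target. The class name LeafedIntSetIterator is
    hypothetical, and a two-level layout (an array of leaf maxima over
    uncompressed sorted leaves) stands in for a real B plus tree. Leaves are
    assumed to be non-empty, sorted and in ascending order overall.

```java
import java.util.Arrays;

/** Hypothetical sketch: next()/skipTo() over sorted leaves with one index level. */
final class LeafedIntSetIterator {
    private final int[][] leaves;  // each leaf is sorted; leaves are in ascending order
    private final int[] leafMax;   // largest value of each leaf, acting as inner-node keys
    private int leaf = 0;          // current leaf
    private int pos = -1;          // position inside the current leaf
    private int doc = -1;          // current value, valid after a successful next()/skipTo()

    LeafedIntSetIterator(int[][] leaves) {
        this.leaves = leaves;
        leafMax = new int[leaves.length];
        for (int i = 0; i < leaves.length; i++) {
            leafMax[i] = leaves[i][leaves[i].length - 1];
        }
    }

    int doc() { return doc; }

    /** Moves to the next value; returns false when the set is exhausted. */
    boolean next() {
        if (leaf >= leaves.length) {
            return false;
        }
        pos++;
        if (pos >= leaves[leaf].length) {
            leaf++;
            pos = 0;
        }
        if (leaf >= leaves.length) {
            return false;
        }
        doc = leaves[leaf][pos];
        return true;
    }

    /** Moves forward to the first value >= target that lies beyond the current one. */
    boolean skipTo(int target) {
        if (!next()) {
            return false;
        }
        if (doc >= target) {
            return true;
        }
        // Pick the first leaf whose maximum is >= target, skipping whole leaves.
        int l = Arrays.binarySearch(leafMax, leaf, leafMax.length, target);
        if (l < 0) {
            l = -l - 1;
        }
        if (l >= leaves.length) {
            leaf = leaves.length;  // exhausted
            return false;
        }
        // Then search inside that leaf, starting after the current position if unchanged.
        int from = (l == leaf) ? pos + 1 : 0;
        int p = Arrays.binarySearch(leaves[l], from, leaves[l].length, target);
        if (p < 0) {
            p = -p - 1;
        }
        leaf = l;
        pos = p;
        doc = leaves[l][p];
        return true;
    }
}
```

    The point of the index level is that skipTo() never decodes leaves it jumps
    over, which is what would make per-node compression affordable.
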
    How valuable are transaction semantics for such an integer set? It is
    tempting to try and implement such an integer set by starting from the
    ground up, but I don't have any practical programming experience with
    transaction semantics, so it may be better to start from something
    that has transactions right from the start.

    If we use this to store/access deleted docs in RAM, then transactions
    are very important for realtime search. With LUCENE-1314
    (IndexReader.clone) a cloned reader carries over the deletes from the
    original reader but must "copy on write" as soon as a new deletion is
    made. With BitVector for deleted docs, this operation is very costly.
    But if we used B+ tree (or something similar) in RAM to hold the
    deleted docs, and that lets us incrementally copy-on-write only the
    nodes/blocks affected by the changes, that would be very useful.

    The one referenced by Eks Dev would be a good starting point for that;
    it's basically a binary tree of BitSets of at most 1024 bits at the leafs.

    It could also be useful for storing deleted docs in the index, ie,
    this is an alternative to tombstones, in which case its transactional
    support would be good, to avoid writing an entire BitVector when only
    a few additional docs became deleted, during commit. This would fit
    nicely with Lucene's already transactional index storage, ie rather
    than storing the "deletion generation" (an int) that we store today,
    we'd store some reference into the B+ tree indicating the
    "commit point" to use for deletions.

    But I think this usage (changing how deletions are stored on disk) is
    less compelling than changing how deletions are stored/used in RAM.

    Thanks,
    Paul Elschot

Discussion Overview
group: java-dev
categories: lucene
posted: Jan 9, '09 at 7:12p
active: Jan 19, '09 at 5:33p
posts: 22
users: 10
website: lucene.apache.org
