Ah, thanks for correcting my wrong answer!
The only time I had to deal with timestamps, I had to go through the Thrift API,
so I never noticed setTimeRange in the Java Scan API :)
So now I'm curious: if I use this and it can't skip HFiles, is there any performance gain over filtering on the client side?
Or is it basically the same amount of work, i.e. a full scan that checks timestamps and skips whatever falls outside the range?
From: Carson Hoffacker <email@example.com>
To: firstname.lastname@example.org; Stuart Smith <email@example.com>
Sent: Wednesday, December 14, 2011 10:29 AM
Subject: Re: Questions on timestamps, insights on how timerange/timestamp filter are processed?
The timerange scan is able to leverage metadata in each of the HFiles. Each
HFile should store information about the timerange associated with the data
within the HFile. If the timerange associated with the HFile does not
overlap the timerange you are interested in, that HFile will be
skipped completely. This can significantly increase scan performance.
However, when these files get compacted and the data is merged into a
smaller number of files, the time range associated with each file
increases. I don't think it works this way out of the box, but I believe
you can be smart about how you manage compactions over time to get the
behavior that you want. You could have compactions compact all the data
from January 2011 into a single file, and then compact all the data from
February 2011 into a different file.
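
The skipping idea above, and the way compaction weakens it, can be modeled in a few lines of plain Java. This is only a toy sketch, not HBase code: the StoreFile class, its minTs/maxTs fields, and the helper methods are hypothetical names invented for illustration, the timestamps are made-up small numbers, and the half-open query range [from, to) is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: StoreFile and its fields are hypothetical, not HBase classes.
final class TimeRangeSkipSketch {

    /** A store file reduced to the two timestamps its metadata would record. */
    static final class StoreFile {
        final String name;
        final long minTs; // oldest cell timestamp in the file (inclusive)
        final long maxTs; // newest cell timestamp in the file (inclusive)

        StoreFile(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** True if the file may hold cells in the half-open query range [from, to). */
    static boolean mayContain(StoreFile f, long from, long to) {
        return f.minTs < to && f.maxTs >= from;
    }

    /** Names of the files a scan over [from, to) would still have to read. */
    static List<String> filesToRead(List<StoreFile> files, long from, long to) {
        List<String> result = new ArrayList<>();
        for (StoreFile f : files) {
            if (mayContain(f, from, to)) {
                result.add(f.name);
            }
        }
        return result;
    }

    /** Merging files, as a compaction does, yields the union of their time ranges. */
    static StoreFile compact(String name, List<StoreFile> inputs) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (StoreFile f : inputs) {
            min = Math.min(min, f.minTs);
            max = Math.max(max, f.maxTs);
        }
        return new StoreFile(name, min, max);
    }

    public static void main(String[] args) {
        // Made-up timestamps: think of them as day-of-year numbers.
        StoreFile jan = new StoreFile("jan", 1, 31);
        StoreFile feb = new StoreFile("feb", 32, 59);
        List<StoreFile> separate = Arrays.asList(jan, feb);

        // Query "February" only, i.e. the range [32, 60).
        System.out.println(filesToRead(separate, 32, 60)); // prints [feb]

        // After merging both months, the single file can no longer be skipped.
        StoreFile merged = compact("jan+feb", separate);
        System.out.println(filesToRead(Arrays.asList(merged), 32, 60)); // prints [jan+feb]
    }
}
```

With one file per month, the February query skips the January file entirely. Once the two are merged, the combined range covers the query and the whole file has to be read, which is why keeping compactions aligned to time periods would preserve the benefit.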
On Wed, Dec 14, 2011 at 9:39 AM, Stuart Smith wrote:
Someone here could probably provide more help, but to start you off,
the only way I've filtered timestamps is to do a scan, and just filter out
rows one by one. This definitely sounds like something coprocessors could
help with, but I don't really understand those yet, so someone else will
have to step up.. or you can really dig into the documentation about them
(AFAIK, it's a little bit of custom code that runs on the regionservers
that can pre-process your gets.. but don't quote me on that!).
But I can say that a major compaction should not affect them - I've never
seen it happen, and if it does, I believe that's a bug.
From: Steinmaurer Thomas <Thomas.Steinmaurer@scch.at>
Sent: Wednesday, December 14, 2011 12:38 AM
Subject: Questions on timestamps, insights on how timerange/timestamp
filter are processed?
Can anybody share some insights on how timerange/timestamp filters are
processed?
Basically we intend to use timerange/timestamp filters to process
relatively new data from an insertion timestamp POV.
- How does the process of skipping records and/or regions work, if one
uses timerange filters?
- I also wonder, do timestamps change when e.g. running a major
compaction?
- If data grows over the years, is there any chance that regions with
"older" rows stay "stable" in such a way that they can be skipped very
quickly when querying data with a timerange filter of e.g. the last