Hi,

I am working on a new feature, `Buffer Cache Hibernation', which enables
postgres to keep a higher cache hit ratio even right after startup.

Postgres usually starts with ZERO buffer cache. By saving the buffer
cache data structure into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the saved
buffer cache in the same condition as just before the last shutdown.

Here is the patch for 9.0.3 (also tested on 8.4.7)
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

The patch includes the following:
- At shutdown, the buffer cache data structures (such as BufferDescriptors,
BufferBlocks and StrategyControl) are saved into hibernation files.
- At startup, the buffer cache data structures are loaded from the hibernation
files and the buffer lookup hashtable is set up based on the buffer descriptors.
- The above functions are enabled by specifying `enable_buffer_cache_hibernation=on'
in postgresql.conf.
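Below is a minimal Python sketch of this save/load cycle. It assumes a hypothetical file layout: one fixed-size tag record per buffer slot in one file, and the raw 8 kB page images in slot order in another. It also uses a simplified BufferTag. This is not the patch's real format, and it leaves out flags, usage counts and StrategyControl state. At startup the sketch reloads the slots in order and rebuilds the tag-to-buffer lookup table, which plays the role of the buffer lookup hashtable.

```python
import io
import struct
from dataclasses import dataclass

BLCKSZ = 8192             # PostgreSQL's default block size
TAG_FMT = "<IiI"          # hypothetical record layout: rel_node, fork, block
TAG_SIZE = struct.calcsize(TAG_FMT)

@dataclass(frozen=True)
class BufferTag:
    """Simplified tag; PostgreSQL's real one also carries tablespace and database OIDs."""
    rel_node: int
    fork: int
    block: int

def save_hibernation(tags, pages, desc_file, block_file):
    """At shutdown: write one tag record and one raw page image per buffer slot."""
    for tag, page in zip(tags, pages):
        if len(page) != BLCKSZ:
            raise ValueError("every page image must be exactly BLCKSZ bytes")
        desc_file.write(struct.pack(TAG_FMT, tag.rel_node, tag.fork, tag.block))
        block_file.write(page)

def load_hibernation(desc_file, block_file):
    """At startup: reload slots in order and rebuild the tag -> buffer id lookup table."""
    tags, pages, lookup = [], [], {}
    while True:
        rec = desc_file.read(TAG_SIZE)
        if len(rec) < TAG_SIZE:          # end of the descriptor file
            break
        tag = BufferTag(*struct.unpack(TAG_FMT, rec))
        lookup[tag] = len(tags)          # buffer id = slot position
        tags.append(tag)
        pages.append(block_file.read(BLCKSZ))
    return tags, pages, lookup

if __name__ == "__main__":
    desc, blocks = io.BytesIO(), io.BytesIO()
    saved = [BufferTag(24598, 0, 0), BufferTag(24598, 0, 3)]
    save_hibernation(saved, [b"\x01" * BLCKSZ, b"\x02" * BLCKSZ], desc, blocks)
    desc.seek(0)
    blocks.seek(0)
    _, _, table = load_hibernation(desc, blocks)
    print(table[BufferTag(24598, 0, 3)])  # 1
```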

Any comments are welcome, and I would very much appreciate having the
patch merged into the source tree.

Have fun and thanks!


  • Andrew Dunstan at May 4, 2011 at 3:26 pm

    On 05/04/2011 10:10 AM, Mitsuru IWASAKI wrote:
    That sounds cool.

    Please a) make sure your patch is up to date against the latest source
    in git and b) submit it to the next commitfest at
    <https://commitfest.postgresql.org/action/commitfest_view?id=10>

    We don't backport features, and 9.1 is closed for features now, so the
    earliest release this could be used in is 9.2.

    cheers

    andrew
  • Greg Stark at May 4, 2011 at 3:39 pm

    On Wed, May 4, 2011 at 3:10 PM, Mitsuru IWASAKI wrote:
    Postgres usually starts with ZERO buffer cache.  By saving the buffer
    cache data structure into hibernation files just before shutdown, and
    loading them at startup, postgres can start operations with the saved
    buffer cache in the same condition as just before the last shutdown.
    Offhand this seems pretty handy for benchmarks where it would help get
    reproducible results.


    --
    greg
  • Dickson S. Guedes at May 4, 2011 at 4:07 pm

    2011/5/4 Greg Stark <gsstark@mit.edu>:
    On Wed, May 4, 2011 at 3:10 PM, Mitsuru IWASAKI wrote:
    Postgres usually starts with ZERO buffer cache.  By saving the buffer
    cache data structure into hibernation files just before shutdown, and
    loading them at startup, postgres can start operations with the saved
    buffer cache in the same condition as just before the last shutdown.
    Offhand this seems pretty handy for benchmarks where it would help get
    reproducible results.
    It could have an option to force it or not at postgres startup. This
    could help in benchmark scenarios.

    --
    Dickson S. Guedes
    mail/xmpp: guedes@guedesoft.net - skype: guediz
    http://guedesoft.net - http://www.postgresql.org.br
  • Tom Lane at May 4, 2011 at 3:44 pm

    Mitsuru IWASAKI writes:
    Postgres usually starts with ZERO buffer cache. By saving the buffer
    cache data structure into hibernation files just before shutdown, and
    loading them at startup, postgres can start operations with the saved
    buffer cache in the same condition as just before the last shutdown.
    This seems like a lot of complication for rather dubious gain. What
    happens when the DBA changes the shared_buffers setting, for instance?
    How do you protect against the cached buffers getting out-of-sync with
    the actual disk files (especially during recovery scenarios)? What
    about crash-induced corruption in the cache file itself (consider the
    not-unlikely possibility that init will kill the database before it's
    had time to dump all the buffers during a system shutdown)? Do you have
    any proof that writing out a few GB of buffers and then reading them
    back in is actually much cheaper than letting the database re-read the
    data from the disk files?

    regards, tom lane
  • Alvaro Herrera at May 4, 2011 at 3:57 pm

    Excerpts from Tom Lane's message of Wed May 04 12:44:36 -0300 2011:

    This seems like a lot of complication for rather dubious gain. What
    happens when the DBA changes the shared_buffers setting, for instance?
    How do you protect against the cached buffers getting out-of-sync with
    the actual disk files (especially during recovery scenarios)? What
    about crash-induced corruption in the cache file itself (consider the
    not-unlikely possibility that init will kill the database before it's
    had time to dump all the buffers during a system shutdown)? Do you have
    any proof that writing out a few GB of buffers and then reading them
    back in is actually much cheaper than letting the database re-read the
    data from the disk files?
    I thought the idea wasn't to copy the entire buffer but only a
    descriptor, so that the buffer would be loaded from the original page.

    If shared_buffers changes, there's no problem. If the new setting is
    smaller, then the last pages would just not be copied, and would have
    to be read from disk the first time they are accessed. If the new
    setting is larger, then the last few buffers would remain unused until
    requested.
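Alvaro's resizing argument can be sketched in Python. The sketch assumes that only the buffer tags are saved, in slot order, and that page contents are re-read from the original data files. `plan_restore` and the simplified `BufferTag` are hypothetical names used for illustration; they are not part of the patch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BufferTag:
    """Simplified buffer tag: which block of which relation fork a buffer holds."""
    rel_node: int
    fork: int
    block: int

def plan_restore(saved_tags, nbuffers):
    """Assign saved tags (in original slot order) to a pool of nbuffers slots.

    Returns (slots, free_slots, not_restored):
      slots        -- buffer id -> tag whose page is re-read from its data file
      free_slots   -- ids left empty because the pool grew; filled on demand
      not_restored -- tags that no longer fit because the pool shrank;
                      those pages are read on first access instead
    """
    slots = dict(enumerate(saved_tags[:nbuffers]))
    free_slots = list(range(len(slots), nbuffers))
    not_restored = list(saved_tags[nbuffers:])
    return slots, free_slots, not_restored

if __name__ == "__main__":
    saved = [BufferTag(24598, 0, blk) for blk in range(4)]
    print(plan_restore(saved, 2)[2])  # the last two tags are not restored
    print(plan_restore(saved, 6)[1])  # [4, 5]
```

Truncation here simply keeps the earliest slots, as described above. One possible alternative would be to save the tags ordered by usage, so that the most valuable pages survive a shrink.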

    As for gain, I have heard of test setups requiring hours of runtime in
    order to prime the buffer cache.

    Crash safety would have to be researched, sure. Maybe only do it in
    clean shutdown.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Greg Smith at May 4, 2011 at 4:46 pm

    Alvaro Herrera wrote:
    As for gain, I have heard of test setups requiring hours of runtime in
    order to prime the buffer cache.
    And production ones too. I have multiple customers where a server
    restart is almost a planned multi-hour downtime. The system may be back
    up, but for a couple of hours performance is so terrible it's barely
    usable. You can watch the MB/s ramp up as the more random data fills in
    over time; getting that taken care of in a larger block more amenable to
    elevator sorting would be a huge help.
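As a rough illustration of the elevator-sorting point, here is a Python sketch. It assumes the saved cache state is a list of (relfilenode, fork, block) tuples, which simplifies PostgreSQL's buffer tag. `read_runs` is a hypothetical helper that sorts the blocks and merges neighbours into runs, so a restore could issue fewer, larger, file-ordered reads.

```python
from itertools import groupby

def read_runs(tags):
    """Turn saved (rel_node, fork, block) tuples into sorted, coalesced read runs.

    Each run is (rel_node, fork, first_block, nblocks). Reading runs in this
    order visits each file front to back, and adjacent blocks can be fetched
    with one larger read instead of many random 8 kB reads.
    """
    runs = []
    ordered = sorted(set(tags))  # drop duplicates, then sort by file and block
    for (rel, fork), group in groupby(ordered, key=lambda t: (t[0], t[1])):
        start = prev = None
        for _, _, blk in group:
            if start is None:
                start = prev = blk
            elif blk == prev + 1:
                prev = blk            # extend the current contiguous run
            else:
                runs.append((rel, fork, start, prev - start + 1))
                start = prev = blk    # a gap: begin a new run
        runs.append((rel, fork, start, prev - start + 1))
    return runs

if __name__ == "__main__":
    saved = [(24599, 0, 5), (24598, 0, 3), (24598, 0, 1), (24598, 0, 2), (24599, 0, 6)]
    print(read_runs(saved))  # [(24598, 0, 1, 3), (24599, 0, 5, 2)]
```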

    I never bothered with this particular idea though because shared_buffers
    is only a portion of the important data. Cedric's pgfincore code digs
    into the OS cache, too, which can then save enough to be really useful
    here. And that's already got a snapshot/restore feature. The slides at
    http://www.pgcon.org/2010/schedule/events/261.en.html have a useful intro
    to that; pages 30 through 34 are the neat ones. That provides some
    other neat APIs for preloading popular data into cache too. I'd rather
    work on getting something like that into core than adding
    something that targets only shared_buffers.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
    "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
  • Josh Berkus at May 4, 2011 at 5:05 pm
    All,

    I thought that Dimitri had already implemented this using Fincore. It's
    linux-only, but that should work well enough to test the general concept.

    --
    Josh Berkus
    PostgreSQL Experts Inc.
    http://pgexperts.com
  • Dimitri Fontaine at May 5, 2011 at 7:54 am

    Josh Berkus writes:
    I thought that Dimitri had already implemented this using Fincore. It's
    linux-only, but that should work well enough to test the general concept.
    Actually, Cédric did, and I have a clone of his repository where I did
    some debian packaging of it.

    http://villemain.org/projects/pgfincore
    http://git.postgresql.org/gitweb?p=pgfincore.git;a=summary
    http://git.postgresql.org/gitweb?p=pgfincore.git;a=tree

    Regards,
    --
    Dimitri Fontaine
    http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
  • Cédric Villemain at May 5, 2011 at 8:00 am

    2011/5/4 Josh Berkus <josh@agliodbs.com>:
    All,

    I thought that Dimitri had already implemented this using Fincore.  It's
    linux-only, but that should work well enough to test the general concept.
    Harald gave me some pointers at PGDay in Stuttgart for making it work
    on Windows, but... hmm, I don't have Windows and wasn't motivated
    enough to make it work there if no one needs it.

    I haven't looked at the different kernels recently, but any kernel
    supporting mincore and posix_fadvise should work (so probably the
    same set of kernels that support our 'effective_io_concurrency').

    Still waiting for (Free)BSD support.....


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Greg Stark at May 4, 2011 at 4:23 pm

    On Wed, May 4, 2011 at 4:44 PM, Tom Lane wrote:
    Do you have
    any proof that writing out a few GB of buffers and then reading them
    back in is actually much cheaper than letting the database re-read the
    data from the disk files?
    I believe he's just writing out the metadata, i.e. which blocks to
    re-read from the disk files.

    --
    greg
  • Mitsuru IWASAKI at May 5, 2011 at 10:10 am
    Hi, thanks for the good suggestions.
    Postgres usually starts with ZERO buffer cache. By saving the buffer
    cache data structure into hibernation files just before shutdown, and
    loading them at startup, postgres can start operations with the saved
    buffer cache in the same condition as just before the last shutdown.
    This seems like a lot of complication for rather dubious gain. What
    happens when the DBA changes the shared_buffers setting, for instance?
    That was actually my first concern. The current implementation stops
    reading the hibernation file when it detects a size mismatch between
    shared_buffers and the hibernation file. I think that is the safe way.
    As Alvaro Herrera mentioned, it would be possible to adjust the copying
    of buffer blocks, but I don't think the shared_buffers setting is
    changed very often.
    How do you protect against the cached buffers getting out-of-sync with
    the actual disk files (especially during recovery scenarios)? What
    Saving the DB buffer cache is done at shutdown after the bgwriter has
    finished its final checkpoint, so there should be no dirty buffers,
    I believe.
    For recovery scenarios, I need to research it though...
    Could you describe what needs to be considered?
    about crash-induced corruption in the cache file itself (consider the
    not-unlikely possibility that init will kill the database before it's
    had time to dump all the buffers during a system shutdown)? Do you have
    I think this is an important point. I'll implement a validation function
    for the hibernation file.
    any proof that writing out a few GB of buffers and then reading them
    back in is actually much cheaper than letting the database re-read the
    data from the disk files?
    I think this comes down to sequential reads vs. scattered reads.
    The largest hibernation file holds the buffer blocks, and reading it
    sequentially would be much faster than scattered reads from the database
    files via smgrread(), block by block.
    As Greg Stark suggested, re-reading from the database files based on the
    buffer descriptors was one of the implementation candidates (it would
    reduce the storage needed for hibernation), but I chose to write a raw
    image file of the buffer blocks and read it back, for performance.


    Thanks
  • Mitsuru IWASAKI at May 6, 2011 at 1:08 pm
    Hi,

    I revised the patch against HEAD; it's available at:
    http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110506.patch

    Implemented hibernation file validations:
    - comparison with pg_control
      At shutdown:
        pg_control state should be DB_SHUTDOWNED.
      At startup:
        pg_control state should be DB_SHUTDOWNED.
        Hibernation files should be newer than pg_control.

    - CRC check
      At shutdown:
        Compute CRC values for the hibernation files and store them in a file.
      At startup:
        CRC values for the hibernation files should match the values read
        from the file created at shutdown.

    - file size
      At startup:
        The size of each hibernation file should match the size calculated
        from shared_buffers.

    - buffer descriptors validation
      At startup:
        The descriptor flags should not include BM_DIRTY, BM_IO_IN_PROGRESS,
        BM_IO_ERROR, BM_JUST_DIRTIED or BM_PIN_COUNT_WAITER.
        Sanity checks for usage_count should be done.
        (wait_backend_pid is zeroed because the process has already terminated.)

    - system call error checking
      At shutdown and startup:
        Return values of system calls (e.g. open(), read(), write(), etc.)
        should be checked.
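The startup checks in that list might look roughly like the following Python. This is a sketch under stated assumptions, not the patch's code. `zlib.crc32` stands in for PostgreSQL's own CRC routines, pg_control is reduced to a state string plus a modification time, and the flag bit values are illustrative rather than copied from buf_internals.h. The function collects every failed check, so a caller could log them and then ignore the hibernation files.

```python
import zlib
from dataclasses import dataclass

# Illustrative flag bits only; PostgreSQL defines the real ones in buf_internals.h.
BM_DIRTY = 1 << 0
BM_IO_IN_PROGRESS = 1 << 3
BM_IO_ERROR = 1 << 4
BM_JUST_DIRTIED = 1 << 5
BM_PIN_COUNT_WAITER = 1 << 6
FORBIDDEN_FLAGS = (BM_DIRTY | BM_IO_IN_PROGRESS | BM_IO_ERROR
                   | BM_JUST_DIRTIED | BM_PIN_COUNT_WAITER)
BM_MAX_USAGE_COUNT = 5  # cap on usage_count in PostgreSQL's clock sweep

@dataclass
class SavedDesc:
    flags: int
    usage_count: int

def validate_hibernation(control_state, control_mtime, file_mtime,
                         data, stored_crc, expected_size, descs):
    """Return reasons to ignore the hibernation data; an empty list means it passed."""
    problems = []
    if control_state != "DB_SHUTDOWNED":        # only trust a clean shutdown
        problems.append("pg_control does not record a clean shutdown")
    if file_mtime <= control_mtime:             # file must postdate pg_control
        problems.append("hibernation file is not newer than pg_control")
    if len(data) != expected_size:              # size implied by shared_buffers
        problems.append("size does not match shared_buffers")
    if zlib.crc32(data) != stored_crc:          # CRC recorded at shutdown
        problems.append("CRC mismatch")
    for i, desc in enumerate(descs):
        if desc.flags & FORBIDDEN_FLAGS:
            problems.append(f"buffer {i}: unexpected flags {desc.flags:#x}")
        if not 0 <= desc.usage_count <= BM_MAX_USAGE_COUNT:
            problems.append(f"buffer {i}: usage_count {desc.usage_count} out of range")
    return problems

if __name__ == "__main__":
    blob = b"page images" * 4
    print(validate_hibernation("DB_SHUTDOWNED", 100, 200, blob,
                               zlib.crc32(blob), len(blob), [SavedDesc(0, 3)]))  # []
```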
    How do you protect against the cached buffers getting out-of-sync with
    the actual disk files (especially during recovery scenarios)? What
    Saving the DB buffer cache is done at shutdown after the bgwriter has
    finished its final checkpoint, so there should be no dirty buffers,
    I believe.
    For recovery scenarios, I need to research it though...
    Could you describe what needs to be considered?
    I think hibernation should be allowed only when the system was shut down
    normally, which can be checked via the pg_control state.
    And once an abnormal shutdown is detected, the hibernation files
    should be ignored.
    The latest patch includes this.
    # modifications to xlog.c:ReadControlFile() were required though...
    about crash-induced corruption in the cache file itself (consider the
    not-unlikely possibility that init will kill the database before it's
    had time to dump all the buffers during a system shutdown)? Do you have
    I think this is an important point. I'll implement a validation function
    for the hibernation file.
    The added validations seem sufficient to me.
    # because my understanding of postgres is not enough ;)
    If any other considerations are required, please point them out.

    Thanks
  • Jeff Janes at May 4, 2011 at 4:55 pm

    On Wed, May 4, 2011 at 7:10 AM, Mitsuru IWASAKI wrote:
    It applies and builds against head with offsets and some fuzz. It
    fails make check, but apparently only because
    src/test/regress/expected/rangefuncs.out needs to be updated to
    include the new setting. (Although all the other "enable%" settings
    are for the planner, so making a new setting with that prefix that
    does something else might be undesirable)

    I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
    provides similar functionality. Are you familiar with that? If so,
    could you contrast your approach with that one?

    Cheers,

    Jeff
  • Mitsuru IWASAKI at May 5, 2011 at 9:07 am
    Hi,
    I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
    provides similar functionality. Are you familiar with that? If so,
    could you contrast your approach with that one?
    I'm sorry, I'm not familiar with PgFincore at all, but I got the source
    code and documents and read through them just now.
    # and I'm actually a novice on postgres...
    Both aim to reduce physical I/O, but their approaches and
    gains are different.
    My understanding is like this:

    +---------------------+     +---------------------+
    | Postgres(backend)   |     | Postgres            |
    | +-----------------+ |     |                     |
    | | DB Buffer Cache | |     |                     |
    | | (shared buffers)| |     |                     |
    | |*my target       | |     |                     |
    | +-----------------+ |     |                     |
    |   ^      ^          |     |                     |
    |   v      v          |     |                     |
    | +-----------------+ |     | +-----------------+ |
    | | buffer manager  | |     | |    pgfincore    | |
    | +-----------------+ |     | +-----------------+ |
    +---^------^----------+     +----------^----------+
        |      |smgrread()                 |posix_fadvise()
        |read()|                           |                 userland
    ==================================================================
        |      |                           |                 kernel
        |      +-------------+-------------+
        |                    v
        |       +------------------------+
        |       | File System            |
        |       |   +-----------------+  |
        +------>|   | FS Buffer Cache |  |
                |   |*PgFincore target|  |
                |   +-----------------+  |
                |    ^       ^           |
                +----|-------|-----------+
                     |       |
    ==================================================================
                     |       |                               hardware
           +---------|-------|----------------+
           |         |       v  Physical Disk |
           |         |   +------------------+ |
           |         |   | base/16384/24598 | |
           |         v   +------------------+ |
           | +------------------------------+ |
           | |Buffer Cache Hibernation Files| |
           | +------------------------------+ |
           +----------------------------------+

    In summary, PgFincore's target is File System Buffer Cache, Buffer
    Cache Hibernation's target is DB Buffer Cache(shared buffers).

    PgFincore tries to preload database files into the File System Buffer
    Cache with posix_fadvise(), not into the DB Buffer Cache (shared buffers).
    On query execution, the buffer manager gets DB buffer blocks from the file
    system via smgrread() unless the necessary blocks already exist in the DB
    Buffer Cache. At this point, physical reads may not happen because part
    (or all) of the database file is already loaded into the FS Buffer Cache.

    The gain depends on the file system, especially the size of the File
    System Buffer Cache.
    In short, preloading a database file is equivalent to the following command:
    $ cat base/16384/24598 > /dev/null

    I think PgFincore is good for data warehouse applications.


    Buffer Cache Hibernation, my approach, is simpler and more straightforward.
    It tries to save/load the contents of the DB Buffer Cache (shared buffers)
    using regular files (called Buffer Cache Hibernation Files).
    At startup, the buffer manager loads DB buffer blocks into the DB Buffer
    Cache from the Buffer Cache Hibernation Files saved at the last shutdown.
    Note that the database files are not read, so they are not cached in the
    File System Buffer Cache at all. Only the contents of the DB Buffer Cache
    are filled. Therefore, the penalty for a DB buffer cache miss would be
    larger than with PgFincore.

    The gain depends on the size of shared buffers, and how often similar
    queries are executed before and after restarting.

    Buffer Cache Hibernation is good for OLTP applications.


    I think that PgFincore and Buffer Cache Hibernation are not mutually
    exclusive; they can work together at different caching levels.



    Sorry for my poor English, but I'm doing my best :)

    Thanks
  • Cédric Villemain at May 5, 2011 at 11:36 am

    2011/5/5 Mitsuru IWASAKI <iwasaki@jp.freebsd.org>:
    Hi,
    I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
    provides similar functionality.  Are you familiar with that?  If so,
    could you contrast your approach with that one?
    I'm sorry, I'm not familiar with PgFincore at all, but I got the source
    code and documents and read through them just now.
    # and I'm actually a novice on postgres...
    Both aim to reduce physical I/O, but their approaches and
    gains are different.
    Little detail: pgfincore stores its data per relation in a file, like you do.
    I rewrote that part a bit, and it will store its data directly in
    PostgreSQL tables; it will also be able to restore the cache
    from a raw bitstring.
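Here is a toy Python version of the bitstring idea for readers unfamiliar with it. It uses one character per block of a relation, where `1` means the block was cached when the snapshot was taken. This only illustrates the encoding. pgfincore's actual storage format is not spelled out in this thread, and both function names are hypothetical.

```python
def to_bitstring(cached_blocks, nblocks):
    """Encode a relation's cached blocks as one '0'/'1' character per block."""
    cached = set(cached_blocks)
    return "".join("1" if blk in cached else "0" for blk in range(nblocks))

def from_bitstring(bits):
    """Decode a bitstring back into the block numbers to reload."""
    return [blk for blk, bit in enumerate(bits) if bit == "1"]

if __name__ == "__main__":
    bits = to_bitstring([0, 2, 3], 5)
    print(bits)                  # 10110
    print(from_bitstring(bits))  # [0, 2, 3]
```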
    In summary, PgFincore's target is File System Buffer Cache, Buffer
    Cache Hibernation's target is DB Buffer Cache(shared buffers).
    Correct. (BTW, I am very happy about your idea and that you found the time to do it.)
    PgFincore tries to preload database files into the File System Buffer
    Cache with posix_fadvise(), not into the DB Buffer Cache (shared buffers).
    On query execution, the buffer manager gets DB buffer blocks from the file
    system via smgrread() unless the necessary blocks already exist in the DB
    Buffer Cache.  At this point, physical reads may not happen because part
    (or all) of the database file is already loaded into the FS Buffer Cache.

    The gain depends on the file system, especially the size of the File
    System Buffer Cache.
    In short, preloading a database file is equivalent to the following command:
    $ cat base/16384/24598 > /dev/null
    Not exactly.

    There are 2 calls:

    * pgfadv_WILLNEED
    * pgfadv_WILLNEED_snapshot

    The former asks to load each segment of a relation, *but* the kernel can
    decide not to do that, or to load only part of each segment (so it is not
    as brutal as cat file > /dev/null).
    The latter reads *exactly* the blocks required in each segment, not all
    blocks, unless all of them were in cache when the snapshot was taken (this
    one is part of the snapshot/restore combo).
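The difference between the two calls can be sketched with Python's `os.posix_fadvise`, which is available on Linux and some other Unix systems. `advise_whole_file` and `advise_snapshot` are hypothetical helpers that only loosely mirror pgfincore's functions. The bitmap is assumed to use one character per 8 kB block. In both cases the call is only advice: the kernel may read ahead all, some, or none of the requested range.

```python
import os
import tempfile

BLCKSZ = 8192  # assumed block size, one bitmap character per block

def advise_whole_file(path):
    """Roughly like pgfadv_WILLNEED: hint that the whole file will be needed."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)  # len 0 = to end of file
    finally:
        os.close(fd)

def advise_snapshot(path, bits):
    """Roughly like pgfadv_WILLNEED_snapshot: hint only the blocks marked '1'.

    Consecutive marked blocks are merged into one advice call.
    Returns the (first_block, nblocks) ranges that were advised.
    """
    calls = []
    fd = os.open(path, os.O_RDONLY)
    try:
        blk = 0
        while blk < len(bits):
            if bits[blk] != "1":
                blk += 1
                continue
            start = blk
            while blk < len(bits) and bits[blk] == "1":
                blk += 1
            os.posix_fadvise(fd, start * BLCKSZ, (blk - start) * BLCKSZ,
                             os.POSIX_FADV_WILLNEED)
            calls.append((start, blk - start))
    finally:
        os.close(fd)
    return calls

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * (5 * BLCKSZ))
        f.flush()
        advise_whole_file(f.name)
        print(advise_snapshot(f.name, "11011"))  # [(0, 2), (3, 2)]
```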
    I think PgFincore is good for data warehouse applications.
    Pgfincore with bitstring storage in a table allows streaming to
    Hot Standbys and gives a better response in case of switchover/failover,
    by doing some housekeeping on the Hot Standby to keep it really hot
    ;)

    Even web applications have large databases today...

    (There is more, but that is not the subject here.)

    Buffer Cache Hibernation, my approach, is simpler and more straightforward.
    It tries to save/load the contents of the DB Buffer Cache (shared buffers)
    using regular files (called Buffer Cache Hibernation Files).
    At startup, the buffer manager loads DB buffer blocks into the DB Buffer
    Cache from the Buffer Cache Hibernation Files saved at the last shutdown.
    Note that the database files are not read, so they are not cached in the
    File System Buffer Cache at all.  Only the contents of the DB Buffer Cache
    are filled.  Therefore, the penalty for a DB buffer cache miss would be
    larger than with PgFincore.

    The gain depends on the size of shared buffers, and how often similar
    queries are executed before and after restarting.

    Buffer Cache Hibernation is good for OLTP applications.
    It is also very helpful for debugging and analysis purposes, IIUC.
    I may prefer the per-relation approach (so you can snapshot and
    restore only the interesting tables/indexes). Given what I read in your
    patch, it looks easy to do, doesn't it?

    I also prefer the idea of keeping a map of the Buffer Cache (yes, like
    what I do with pgfincore) over storing the data directly and reading
    it back directly. This latter part seems a bit dangerous to me, even if it
    looks sane for a normal PostgreSQL stop/start process.

    I think that PgFincore and Buffer Cache Hibernation are not mutually
    exclusive; they can work together at different caching levels.
    Yes.


    Sorry for my poor English, but I'm doing my best :)
    Better than mine, and anyway your patch remains very easy to read in any case.
    Thanks



    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Mitsuru IWASAKI at May 6, 2011 at 5:22 pm
    Hi, thanks for your comments!
    I'm glad to discuss this topic.
    * pgfadv_WILLNEED
    * pgfadv_WILLNEED_snapshot

    The former asks to load each segment of a relation, *but* the kernel can
    decide not to do that, or to load only part of each segment (so it is not
    as brutal as cat file > /dev/null).
    The latter reads *exactly* the blocks required in each segment, not all
    blocks, unless all of them were in cache when the snapshot was taken (this
    one is part of the snapshot/restore combo).
    Sorry about that, I'm not so familiar with posix_fadvise().
    I'll check posix_fadvise() later.
    Actually, I used to run a 'cat database_file > /dev/null' script on
    other DBMSs before starting.
    # or 'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load
    # index blocks
    I may prefer the per-relation approach (so you can snapshot and
    restore only the interesting tables/indexes). Given what I read in your
    patch, it looks easy to do, doesn't it?
    I would like to keep my patch as simple as possible, because
    it is just a hibernation function, not complicated buffer management.
    But I want to try improving buffer management during my next vacation.
    # Currently I'm on an 11-day vacation until Sunday.

    My rough idea for improving buffer management is like this:
    SQL> alter table table_name buffer pin priority 7;
    SQL> alter index index_name buffer pin priority 10;

    This DDL sets a 'buffer pin priority' property on the table/index and
    also on the buffer descriptors related to that table/index.
    Optionally, preloading database files into the FS cache and relation
    blocks into the DB cache would be possible.

    When a new buffer is required, the buffer manager refers to the priority
    of each buffer and selects a victim buffer.

    I think it helps batch jobs run with a better buffer cache condition
    by giving hints for buffer management.
    For example, job-A reads table_A and index_A and writes only table_B:
    SQL> alter table table_A buffer pin priority 7;
    SQL> alter index index_A buffer pin priority 10;
    SQL> alter table table_B buffer pin priority 1;
    This keeps the buffers of index_A and table_A (table_B's buffers will soon be victims).

    Buffer pin priority can be reset like this:
    SQL> alter system buffer pin priority 5;

    Next, job-B reads and writes table_C, and reads index_C with preloading:
    SQL> alter table table_C buffer pin priority 5;
    SQL> alter index index_C buffer pin priority 10 with preloading 50%;
    Something like this.
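The proposed `buffer pin priority` DDL does not exist in PostgreSQL, so the following is only a toy Python model of the victim-selection rule described above. It assumes a higher number means "keep longer". It simply evicts the lowest-priority unpinned buffer, breaking ties by lower usage_count, rather than modifying PostgreSQL's actual clock-sweep algorithm. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Buf:
    rel: str             # relation the buffered page belongs to
    priority: int        # hypothetical "buffer pin priority"; higher = keep longer
    usage_count: int     # recent-use counter, as in PostgreSQL's clock sweep
    pinned: bool = False

def set_priority(pool, rel, priority):
    """Model of the proposed 'alter table/index ... buffer pin priority N'."""
    for buf in pool:
        if buf.rel == rel:
            buf.priority = priority

def choose_victim(pool):
    """Index of the unpinned buffer with the lowest priority (ties: lowest
    usage_count), or None if every buffer is pinned."""
    candidates = [i for i, buf in enumerate(pool) if not buf.pinned]
    if not candidates:
        return None
    return min(candidates, key=lambda i: (pool[i].priority, pool[i].usage_count))

if __name__ == "__main__":
    # job-A: keep table_A and index_A, let table_B be evicted first
    pool = [Buf("table_A", 5, 3), Buf("index_A", 5, 1), Buf("table_B", 5, 5)]
    set_priority(pool, "table_A", 7)
    set_priority(pool, "index_A", 10)
    set_priority(pool, "table_B", 1)
    print(pool[choose_victim(pool)].rel)  # table_B
```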
    I also prefer the idea of keeping a map of the Buffer Cache (yes, like
    what I do with pgfincore) over storing the data directly and reading
    it back directly. This latter part seems a bit dangerous to me, even if it
    looks sane for a normal PostgreSQL stop/start process.
    Never mind :)
    I added enough validations and will add more.
    Better than mine, and anyway your patch remains very easy to read in any case.
    Thanks a lot! My policy for experimental implementations is to keep them
    easy to read so that people understand my idea quickly.
    That's why my first patch doesn't have enough error checking ;)

    Thanks
  • Greg Smith at May 6, 2011 at 9:29 pm

    On 05/05/2011 05:06 AM, Mitsuru IWASAKI wrote:
    In summary, PgFincore's target is File System Buffer Cache, Buffer
    Cache Hibernation's target is DB Buffer Cache(shared buffers).
    Right. The thing to realize is that shared_buffers is becoming a
    smaller fraction of the total RAM used by the database every year. On
    Windows, useful settings have been stuck below 512MB for a while now.
    And on UNIX systems, around 8GB seems to be the effective upper limit.
    Best case, shared_buffers is only going to be around 25% of total RAM;
    worst case, approximately, you might have a Windows server with 64GB of
    RAM where shared_buffers is less than 1% of total RAM.
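Greg's percentages are simple ratios. Below is a quick Python check of the worst case he gives (512MB of shared_buffers on a 64GB Windows server), plus the machine size at which an 8GB ceiling would still be 25% of RAM. That 32GB figure is derived here; it is not stated in the message.

```python
GB = 1024  # sizes below are in MB

def shared_buffers_fraction(shared_buffers_mb, total_ram_mb):
    """Fraction of physical RAM that shared_buffers covers."""
    return shared_buffers_mb / total_ram_mb

if __name__ == "__main__":
    # Worst case from the text: 512MB of shared_buffers on a 64GB Windows server.
    print(f"{shared_buffers_fraction(512, 64 * GB):.2%}")     # 0.78%
    # An ~8GB ceiling reaches the 25% best case only on a 32GB machine.
    print(f"{shared_buffers_fraction(8 * GB, 32 * GB):.0%}")  # 25%
```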

    There's nothing wrong with the general idea you're suggesting. It's
    just only targeting a small (and shrinking) subset of the real problem
    here. Rebuilding cache state starts with shared_buffers, but that's not
    enough of the problem to be an effective tweak on many systems.

    I think that all the complexity with CRCs etc. is unlikely to lead
    anywhere too, and those two issues are not completely unrelated. The
    simplest, safest thing here is the right way to approach this, not the
    most complicated one, and a simpler format might add some flexibility
    here to reload more cache state too. The bottleneck on reloading the
    cache state is reading everything from disk. Trying to micro-optimize
    any other part of that is moving in the wrong direction to me. I doubt
    you'll ever measure a useful benefit that overcomes the expense of
    maintaining the code. And you seem to be moving to where someone can't
    restore cache state when they change shared_buffers. A simpler
    implementation might still work in that situation; reload until you run
    out of buffers if shared_buffers shrinks, reload until you're done with
    the original size.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Robert Haas at May 6, 2011 at 9:58 pm

    On Fri, May 6, 2011 at 5:31 PM, Greg Smith wrote:
    On 05/05/2011 05:06 AM, Mitsuru IWASAKI wrote:

    In summary, PgFincore's target is File System Buffer Cache, Buffer
    Cache Hibernation's target is DB Buffer Cache (shared buffers).
    Right.  The thing to realize is that shared_buffers is becoming a smaller
    fraction of the total RAM used by the database every year.  On Windows,
    useful settings have been stuck below 512MB for a while now.  And on
    UNIX systems, around 8GB seems to be the effective upper limit.  Best case,
    shared_buffers is only going to be around 25% of total RAM; worst case,
    approximately, you might have a Windows server with 64GB of RAM where
    shared_buffers is less than 1% of total RAM.

    There's nothing wrong with the general idea you're suggesting.  It's just
    targeting only a small (and shrinking) subset of the real problem here.
    Rebuilding cache state starts with shared_buffers, but that's not enough of
    the problem to be an effective tweak on many systems.

    I think that all the complexity with CRCs etc. is unlikely to lead anywhere
    too, and those two issues are not completely unrelated.  The simplest,
    safest thing here is the right way to approach this, not the most
    complicated one, and a simpler format might add some flexibility here to
    reload more cache state too.  The bottleneck on reloading the cache state is
    reading everything from disk.  Trying to micro-optimize any other part of
    that is moving in the wrong direction to me.  I doubt you'll ever measure a
    useful benefit that overcomes the expense of maintaining the code.  And you
    seem to be moving to where someone can't restore cache state when they
    change shared_buffers.  A simpler implementation might still work in that
    situation: if shared_buffers shrinks, reload until you run out of
    buffers; if it grows, reload until you're done with the original size.
    Yeah, I'm pretty well convinced this whole approach is a dead end.
    Priming the OS buffer cache seems way more useful. I also think
    saving the blocks to be read rather than the actual blocks makes a lot
    more sense.
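Priming the OS cache from a saved block list could look roughly like this sketch, which uses POSIX posix_fadvise() with POSIX_FADV_WILLNEED to ask the kernel to start reading the blocks in the background. The 8192-byte block size is PostgreSQL's default BLCKSZ, assumed here; block numbers are assumed to be relative to the single open file passed in (real relations are split into 1GB segment files), and prime_os_cache is a hypothetical name:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_BLCKSZ 8192  /* assumed block size (PostgreSQL's default) */

/*
 * Hint the kernel to prefetch the listed blocks of one open file into the
 * OS page cache.  posix_fadvise() returns an error number directly rather
 * than setting errno; return the first non-zero result, or 0.
 */
int prime_os_cache(int fd, const unsigned int *blocks, size_t nblocks)
{
    size_t i;

    for (i = 0; i < nblocks; i++)
    {
        off_t offset = (off_t) blocks[i] * SKETCH_BLCKSZ;
        int rc = posix_fadvise(fd, offset, SKETCH_BLCKSZ, POSIX_FADV_WILLNEED);

        if (rc != 0)
            return rc;
    }
    return 0;
}
```

Because the hint is asynchronous, the loop itself returns quickly; the server could then start and fill shared_buffers from pages that are, with luck, already in the OS cache.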

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Mitsuru IWASAKI at May 7, 2011 at 7:32 am
    Hi,
    Sorry, I missed these messages because I wasn't subscribed to this list.
    # I've just subscribed temporarily.
    I think that all the complexity with CRCs etc. is unlikely to lead anywhere
    too, and those two issues are not completely unrelated.  The simplest,
    safest thing here is the right way to approach this, not the most
    complicated one, and a simpler format might add some flexibility here to
    reload more cache state too.  The bottleneck on reloading the cache state is
    reading everything from disk.  Trying to micro-optimize any other part of
    that is moving in the wrong direction to me.  I doubt you'll ever measure a
    useful benefit that overcomes the expense of maintaining the code.  And you
    seem to be moving to where someone can't restore cache state when they
    change shared_buffers.  A simpler implementation might still work in that
    situation: if shared_buffers shrinks, reload until you run out of
    buffers; if it grows, reload until you're done with the original size.
    Yeah, I'm pretty well convinced this whole approach is a dead end.
    Priming the OS buffer cache seems way more useful. I also think
    saving the blocks to be read rather than the actual blocks makes a lot
    more sense.
    OK, IIUC there are two suggestions from you here.
    # If not, please correct me.
    1. Restore buffer blocks based on the buffer descriptors, not from the saved file.
    2. Support restoring cache state even if shared_buffers has changed.

    For 1, I've just finished my work. The latest patch is available at:
    http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch

    On my box, shared_buffers can only be set up to 200MB.
    Elapsed time for startup is almost the same, about 3 sec (without
    hibernation it takes about 1 sec).
    Shutdown takes about 10 sec when buffer blocks are written out, and
    about 1 sec otherwise.

    Well, it seems you were right :)
    By restoring buffer blocks based on buffer descriptors, the OS buffer
    cache will be filled too. I believe this can help buffer update
    performance.

    I think saving buffer blocks is still useful for debugging or portability,
    so I would like to keep the supporting code in my patch.


    For 2, I'm not sure how to implement this.
    The problem is that freelist.c:StrategyControl is also restored at
    startup, but I currently have no idea how to adjust StrategyControl
    when shared_buffers has changed.
    StrategyControl holds important data for buffer allocation, so it should
    match shared_buffers, I believe.

    shared_buffers is not changed very often in production environments.
    The current implementation works like this:
    if shared_buffers has changed, restoring is skipped for that startup
    only; the cache is saved with the new shared_buffers at shutdown and
    restored at the next startup.
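The later revision of the patch in this thread drops that one-time abort and instead derives the old buffer count from the size of each per-buffer hibernation file (see ResumeBufferCacheHibernation in the patch below): the size must be a whole number of records, and descriptor records at or beyond the current NBuffers are skipped during validation. A standalone restatement of that arithmetic follows; the function names are hypothetical, and restorable_records assumes the restore step uses the same min(old, new) bound:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Derive how many buffers a per-buffer hibernation file describes from its
 * size.  Returns false if the size is not a whole number of records, which
 * the patch treats as a corrupt file.
 */
bool saved_buffer_count(unsigned long long file_size, size_t record_length,
                        size_t *old_nbuffers)
{
    if (record_length == 0 || file_size % record_length != 0)
        return false;
    *old_nbuffers = (size_t) (file_size / record_length);
    return true;
}

/* Records that fit the current pool: all of them if shared_buffers grew,
 * only the first nbuffers if it shrank. */
size_t restorable_records(size_t old_nbuffers, size_t nbuffers)
{
    return old_nbuffers < nbuffers ? old_nbuffers : nbuffers;
}
```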

    I have one more day to work on this, but I may give up...

    Thanks
  • Robert Haas at May 7, 2011 at 1:56 pm

    On Sat, May 7, 2011 at 3:32 AM, Mitsuru IWASAKI wrote:
    I have one more day to work on this, but I may give up...
    I think this is an interesting line of inquiry, but if you were hoping
    to get something committable in a couple of days, you had unrealistic
    expectations...

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Mitsuru IWASAKI at May 8, 2011 at 4:59 am
    Hi, folks!
    I'll do more testing tomorrow, and hopefully finalize my patch.
    Done! The patch is available at:
    http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110508.patch

    I hope this will be committable and the final version.
    Major changes from the experimental implementation are the following.

    - Add many validations against hibernation file corruption, etc.
    - Restore buffer blocks based on buffer descriptors, not from the saved file.
    - Support restoring cache state even if shared_buffers has changed.
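The descriptor validations in the patch below reject a saved descriptor that is dirty, mid-I/O, errored, or has a pin-count waiter (states that, presumably, a clean shutdown checkpoint should never leave behind), as well as one with an out-of-range usage count or buffer id. A standalone sketch of those three checks follows; the struct and the flag bit values are simplified stand-ins assumed for illustration, not the server's real BufferDesc layout:

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow bufmgr's BM_* flags; the bit values are assumed. */
enum
{
    SK_BM_DIRTY            = 1 << 0,
    SK_BM_VALID            = 1 << 1,
    SK_BM_TAG_VALID        = 1 << 2,
    SK_BM_IO_IN_PROGRESS   = 1 << 3,
    SK_BM_IO_ERROR         = 1 << 4,
    SK_BM_JUST_DIRTIED     = 1 << 5,
    SK_BM_PIN_COUNT_WAITER = 1 << 6
};

#define SK_MAX_USAGE_COUNT 5    /* assumed clock-sweep ceiling */

/* Simplified stand-in for one saved buffer descriptor. */
typedef struct
{
    uint16_t flags;
    uint16_t usage_count;
    int32_t  buf_id;
} SavedDesc;

/* Apply the patch's three sanity checks to one saved descriptor. */
bool saved_desc_is_sane(const SavedDesc *d, int32_t nrecords)
{
    const uint16_t abnormal = SK_BM_DIRTY | SK_BM_IO_IN_PROGRESS |
        SK_BM_IO_ERROR | SK_BM_JUST_DIRTIED | SK_BM_PIN_COUNT_WAITER;

    if (d->flags & abnormal)
        return false;           /* state a clean shutdown should not leave */
    if (d->usage_count > SK_MAX_USAGE_COUNT)
        return false;           /* usage count out of range */
    if (d->buf_id < 0 || d->buf_id >= nrecords)
        return false;           /* buffer id outside the saved array */
    return true;
}
```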

    My vacation ends today and I have to go back to work tomorrow,
    but I will try to find spare time for this.

    Thanks a lot for happy hacking days with you!
  • Greg Smith at May 8, 2011 at 6:42 am

    We can't accept patches just based on a pointer to a web site. Please
    e-mail this to the mailing list so that it can be considered a
    submission under the project's licensing terms.
    I hope this will be committable and the final version.
    PostgreSQL has high standards for code submissions. Extremely few
    submissions are committed without significant revisions to them based on
    code review. So far you've gotten a first round of high-level design
    review; there are several additional steps before something is considered
    for a commit. The whole process is outlined at
    http://wiki.postgresql.org/wiki/Submitting_a_Patch

    From a couple of minutes of reading the patch, the first things that
    pop out as problems are:

    -All of the ControlFile -> controlFile renaming has added a larger
    difference to ReadControlFile than I would consider ideal.
    -Touching StrategyControl is not something this patch should be doing.
    -I don't think your justification ("debugging or portability") for
    keeping around your original code in here is going to be sufficient to
    do so.
    -This should not be named enable_buffer_cache_hibernation. That very
    large diff you ended up with in the regression tests is because all of
    the settings named enable_* are optimizer control settings. Using the
    name "buffer_cache_hibernation" instead would make a better starting point.

    From a bigger picture perspective, this really hasn't addressed any of
    my comments about shared_buffers only being the beginning of the useful
    cache state to worry about here. I'd at least like the solution to the
    buffer cache save/restore to have a plan for how it might address that
    too one day. This project is also picky about only committing code that
    fits into the long-term picture for desired features.

    Having a working example of a server-side feature doing cache storage
    and restoration is helpful though. Don't think your work here is
    unappreciated--it is. Getting this feature added is just a harder
    problem than what you've done so far.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Mitsuru IWASAKI at May 14, 2011 at 6:54 pm
    Hi,
    We can't accept patches just based on a pointer to a web site. Please
    e-mail this to the mailing list so that it can be considered a
    submission under the project's licensing terms.
    I hope this will be committable and the final version.
    PostgreSQL has high standards for code submissions. Extremely few
    submissions are committed without significant revisions to them based on
    code review. So far you've gotten a first round of high-level design
    review; there are several additional steps before something is considered
    for a commit. The whole process is outlined at
    http://wiki.postgresql.org/wiki/Submitting_a_Patch
    OK, I will do so for my next patch.
    From a couple of minutes of reading the patch, the first things that
    pop out as problems are:

    -All of the ControlFile -> controlFile renaming has added a larger
    difference to ReadControlFile than I would consider ideal.
    I think so too; I will reconsider this.
    -Touching StrategyControl is not something this patch should be doing.
    Sorry, I don't follow this. Could you explain it?
    I think StrategyControl needs to be adjusted if the shared_buffers
    setting has changed.
    -I don't think your justification ("debugging or portability") for
    keeping around your original code in here is going to be sufficient to
    do so.
    -This should not be named enable_buffer_cache_hibernation. That very
    large diff you ended up with in the regression tests is because all of
    the settings named enable_* are optimizer control settings. Using the
    name "buffer_cache_hibernation" instead would make a better starting point.
    OK, how about `buffer_cache_hibernation_level'?
    The value would be 0 to disable (the default), 1 to save buffer
    descriptors only, and 2 to save both buffer descriptors and buffer blocks.
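In the patch below, the shutdown loop skips only the buffer-block file when the level is below 2, so level 1 also saves the StrategyControl file alongside the descriptors. A sketch of that mapping follows; the enum mirrors the patch's BUFFER_CACHE_HIBERNATION_TYPE_* ids, but the names here are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical mirror of the patch's hibernation file ids. */
typedef enum
{
    HIB_STRATEGY,
    HIB_DESCRIPTORS,
    HIB_BLOCKS
} HibFileType;

/* Does buffer_cache_hibernation_level `level` save files of this type?
 * 0 disables everything; 1 saves strategy state and descriptors;
 * 2 additionally saves the raw buffer blocks. */
bool hibernation_saves(int level, HibFileType type)
{
    if (level <= 0)
        return false;
    if (type == HIB_BLOCKS)
        return level >= 2;
    return true;
}
```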
    From a bigger picture perspective, this really hasn't addressed any of
    my comments about shared_buffers only being the beginning of the useful
    cache state to worry about here. I'd at least like the solution to the
    buffer cache save/restore to have a plan for how it might address that
    too one day. This project is also picky about only committing code that
    fits into the long-term picture for desired features.
    My motivation for this is simple: `We don't want to restart our DB
    server because the DB buffer cache will be lost and the DB server
    has to start its operations with zero cache. Does any DBMS product
    keep the contents of its DB cache as they are across a restart,
    just like the hibernation feature of a PC?'.
    It's very simple, and I think many DB admins would be happy with this
    feature.

    Thanks
  • Heikki Linnakangas at May 8, 2011 at 10:11 pm

    On 08.05.2011 07:58, Mitsuru IWASAKI wrote:
    I'll do more testing tomorrow, and hopefully finalize my patch.
    Done! The patch is available at:
    http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110508.patch
    I'd suggest doing this as an extension module. All the changes to
    existing server code seem superficial.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Mitsuru IWASAKI at May 14, 2011 at 6:58 pm
    Hi,
    I'd suggest doing this as an extension module. All the changes to
    existing server code seem superficial.
    It sounds interesting. I'll try it later.
    Are there any good examples of extension modules?

    Thanks
  • Greg Smith at May 27, 2011 at 2:13 am

    On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote:
    For 1, I've just finished my work. The latest patch is available at:
    http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch
    Reminder here--we can't accept code based on it being published to a web
    page. You'll need to e-mail it to the pgsql-hackers mailing list to be
    considered for the next PostgreSQL CommitFest, which is starting in a
    few weeks. Code submitted to the mailing list is considered a release
    of it to the project under the PostgreSQL license, which we can't just
    assume for things when given only a URL to them.

    Also, you suggested you were out of time to work on this. If that's the
    case, we'd like to know that so we don't keep cc'ing you about things in
    expectation of an answer. Someone else may pick this up as a project to
    continue working on. But it's going to need a fair amount of revision
    before it matches what people want here, and I'm not sure how much of
    what you've written is going to end up in any commit that may happen
    from this idea.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Mitsuru IWASAKI at Jun 5, 2011 at 1:23 pm
    Hi,
    On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote:
    For 1, I've just finished my work. The latest patch is available at:
    http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch
    Reminder here--we can't accept code based on it being published to a web
    page. You'll need to e-mail it to the pgsql-hackers mailing list to be
    considered for the next PostgreSQL CommitFest, which is starting in a
    few weeks. Code submitted to the mailing list is considered a release
    of it to the project under the PostgreSQL license, which we can't just
    assume for things when given only a URL to them.
    Sorry about that; I had enough time to revise my patches this weekend.
    I have attached the patches to this mail, and will update the CommitFest page soon.
    Also, you suggested you were out of time to work on this. If that's the
    case, we'd like to know that so we don't keep cc'ing you about things in
    expectation of an answer. Someone else may pick this up as a project to
    continue working on. But it's going to need a fair amount of revision
    before it matches what people want here, and I'm not sure how much of
    what you've written is going to end up in any commit that may happen
    from this idea.
    It seems that I don't have enough time to complete this work.
    You don't need to keep cc'ing me, and I would be very happy if postgres
    became the first DBMS to support a buffer cache hibernation feature.

    Thanks!


    diff --git src/backend/access/transam/xlog.c src/backend/access/transam/xlog.c
    index b0e4c41..7a3a207 100644
    --- src/backend/access/transam/xlog.c
    +++ src/backend/access/transam/xlog.c
    @@ -4834,6 +4834,19 @@ ReadControlFile(void)
    #endif
    }

    +bool
    +GetControlFile(ControlFileData *controlFile)
    +{
    + if (ControlFile == NULL)
    + {
    + return false;
    + }
    +
    + memcpy(controlFile, ControlFile, sizeof(ControlFileData));
    +
    + return true;
    +}
    +
    void
    UpdateControlFile(void)
    {
    diff --git src/backend/bootstrap/bootstrap.c src/backend/bootstrap/bootstrap.c
    index fc093cc..7ecf6bb 100644
    --- src/backend/bootstrap/bootstrap.c
    +++ src/backend/bootstrap/bootstrap.c
    @@ -360,6 +360,15 @@ AuxiliaryProcessMain(int argc, char *argv[])
    BaseInit();

    /*
    + * Only StartupProcess can call ResumeBufferCacheHibernation() after
    + * InitFileAccess() and smgrinit().
    + */
    + if (auxType == StartupProcess && BufferCacheHibernationLevel > 0)
    + {
    + ResumeBufferCacheHibernation();
    + }
    +
    + /*
    * When we are an auxiliary process, we aren't going to do the full
    * InitPostgres pushups, but there are a couple of things that need to get
    * lit up even in an auxiliary process.
    diff --git src/backend/storage/buffer/buf_init.c src/backend/storage/buffer/buf_init.c
    index dadb49d..52eb51a 100644
    --- src/backend/storage/buffer/buf_init.c
    +++ src/backend/storage/buffer/buf_init.c
    @@ -127,6 +127,14 @@ InitBufferPool(void)

    /* Init other shared buffer-management stuff */
    StrategyInitialize(!foundDescs);
    +
    + if (BufferCacheHibernationLevel > 0)
    + {
    + ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS,
    + (char *)BufferDescriptors, sizeof(BufferDesc), NBuffers);
    + ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS,
    + (char *)BufferBlocks, BLCKSZ, NBuffers);
    + }
    }

    /*
    diff --git src/backend/storage/buffer/bufmgr.c src/backend/storage/buffer/bufmgr.c
    index f96685d..dba8ebf 100644
    --- src/backend/storage/buffer/bufmgr.c
    +++ src/backend/storage/buffer/bufmgr.c
    @@ -31,6 +31,7 @@
    #include "postgres.h"

    #include <sys/file.h>
    +#include <sys/stat.h>
    #include <unistd.h>

    #include "catalog/catalog.h"
    @@ -61,6 +62,13 @@
    #define BUF_WRITTEN 0x01
    #define BUF_REUSABLE 0x02

    +/*
    + * Buffer Cache Hibernation stuff.
    + */
    +/* enable this to debug buffer cache hibernation. */
    +#if 0
    +#define DEBUG_BUFFER_CACHE_HIBERNATION
    +#endif

    /* GUC variables */
    bool zero_damaged_pages = false;
    @@ -765,6 +773,16 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
    }
    }

    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    + elog(DEBUG5,
    + "alloc [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    + buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    + buf->wait_backend_pid, buf->freeNext,
    + newHash, newTag.rnode.spcNode,
    + newTag.rnode.dbNode, newTag.rnode.relNode,
    + newTag.forkNum, newTag.blockNum);
    +#endif
    +
    return buf;
    }

    @@ -800,6 +818,16 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
    * the old content is no longer relevant. (The usage_count starts out at
    * 1 so that the buffer can survive one clock-sweep pass.)
    */
    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    + elog(DEBUG5,
    + "rename [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    + buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    + buf->wait_backend_pid, buf->freeNext,
    + oldHash, oldTag.rnode.spcNode,
    + oldTag.rnode.dbNode, oldTag.rnode.relNode,
    + oldTag.forkNum, oldTag.blockNum);
    +#endif
    +
    buf->tag = newTag;
    buf->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT);
    if (relpersistence == RELPERSISTENCE_PERMANENT)
    @@ -2772,3 +2800,716 @@ local_buffer_write_error_callback(void *arg)
    pfree(path);
    }
    }
    +
    +/* ----------------------------------------------------------------
    + * Buffer Cache Hibernation support stuff
    + *
    + * Suspend/resume buffer cache data structure using hibernation files
    + * at shutdown/startup.
    + * ----------------------------------------------------------------
    + */
    +
    +int BufferCacheHibernationLevel = 0;
    +
    +#define BUFFER_CACHE_HIBERNATION_FILE_STRATEGY "global/pg_buffer_cache_hibernation_strategy"
    +#define BUFFER_CACHE_HIBERNATION_FILE_DESCRIPTORS "global/pg_buffer_cache_hibernation_descriptors"
    +#define BUFFER_CACHE_HIBERNATION_FILE_BLOCKS "global/pg_buffer_cache_hibernation_blocks"
    +#define BUFFER_CACHE_HIBERNATION_FILE_CRC32 "global/pg_buffer_cache_hibernation_crc32"
    +
    +static struct
    +{
    + char *hibernation_file;
    + char *data_ptr;
    + Size record_length;
    + Size num_records;
    + pg_crc32 crc;
    +} BufferCacheHibernationData[] =
    +{
    + /* BufferStrategyControl */
    + {
    + BUFFER_CACHE_HIBERNATION_FILE_STRATEGY,
    + NULL, 0, 0, 0
    + },
    +
    + /* BufferDescriptors */
    + {
    + BUFFER_CACHE_HIBERNATION_FILE_DESCRIPTORS,
    + NULL, 0, 0, 0
    + },
    +
    + /* BufferBlocks */
    + {
    + BUFFER_CACHE_HIBERNATION_FILE_BLOCKS,
    + NULL, 0, 0, 0
    + },
    +
    + /* End-of-list marker */
    + {
    + NULL,
    + NULL, 0, 0, 0
    + },
    +};
    +
    +static ControlFileData controlFile;
    +static bool controlFileInitialized = false;
    +
    +/*
    + * AtProcExit_BufferCacheHibernation:
    + * store the buffer cache into hibernation files at shutdown.
    + */
    +static void
    +AtProcExit_BufferCacheHibernation(int code, Datum arg)
    +{
    + BufferHibernationFileType id;
    + int i;
    + int fd;
    +
    + if (BufferCacheHibernationLevel == 0)
    + {
    + return;
    + }
    +
    + /*
    + * get the control file to validate the system state.
    + */
    + if (GetControlFile(&controlFile) == false)
    + {
    + elog(WARNING,
    + "could not get control file, "
    + "aborting buffer cache hibernation");
    + return;
    + }
    +
    + if (controlFile.state != DB_SHUTDOWNED)
    + {
    + elog(WARNING,
    + "database system was not shut down normally, "
    + "aborting buffer cache hibernation");
    + return;
    + }
    +
    + /*
    + * suspend buffer cache data structure into hibernation files.
    + */
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + Size record_length;
    + Size num_records;
    + char *ptr;
    + pg_crc32 crc;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + if (BufferCacheHibernationData[id].data_ptr == NULL ||
    + BufferCacheHibernationData[id].record_length == 0 ||
    + BufferCacheHibernationData[id].num_records == 0)
    + {
    + elog(WARNING,
    + "ResisterBufferCacheHibernation() was not called for %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + goto cleanup;
    + }
    +
    + fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    + O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + elog(WARNING,
    + "could not open %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + goto cleanup;
    + }
    +
    + record_length = BufferCacheHibernationData[id].record_length;
    + num_records = BufferCacheHibernationData[id].num_records;
    +
    + elog(NOTICE,
    + "buffer cache hibernate into %s",
    + BufferCacheHibernationData[id].hibernation_file);
    +
    + INIT_CRC32(crc);
    + for (i = 0; i < num_records; i++)
    + {
    + ptr = BufferCacheHibernationData[id].data_ptr + (i * record_length);
    + if (write(fd, (void *)ptr, record_length) != record_length)
    + {
    + elog(WARNING,
    + "could not write %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + COMP_CRC32(crc, ptr, record_length);
    + }
    +
    + FIN_CRC32(crc);
    + close(fd);
    +
    + BufferCacheHibernationData[id].crc = crc;
    + }
    +
    + /*
    + * save the computed crc values for the validations at resuming.
    + */
    + fd = BasicOpenFile(BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    + O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + elog(WARNING,
    + "could not open %s",
    + BUFFER_CACHE_HIBERNATION_FILE_CRC32);
    + goto cleanup;
    + }
    +
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + pg_crc32 crc;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + crc = BufferCacheHibernationData[id].crc;
    + if (write(fd, (void *)&crc, sizeof(pg_crc32)) != sizeof(pg_crc32))
    + {
    + elog(WARNING,
    + "could not write %s for %s",
    + BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    + }
    + close(fd);
    +
    + elog(NOTICE,
    + "buffer cache suspended successfully");
    +
    + return;
    +
    +cleanup:
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + unlink(BufferCacheHibernationData[id].hibernation_file);
    + }
    +
    + return;
    +}
    +
    +/*
    + * ResisterBufferCacheHibernation:
    + * register the buffer cache data structure info.
    + */
    +void
    +ResisterBufferCacheHibernation(BufferHibernationFileType id, char *ptr, Size record_length, Size num_records)
    +{
    + static bool first_time = true;
    +
    + if (BufferCacheHibernationLevel == 0)
    + {
    + return;
    + }
    +
    + if (id != BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY &&
    + id != BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS &&
    + id != BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + return;
    + }
    +
    + if (first_time)
    + {
    + /*
    + * arrange for AtProcExit_BufferCacheHibernation to be called at shutdown.
    + */
    + on_shmem_exit(AtProcExit_BufferCacheHibernation, 0);
    + first_time = false;
    + }
    +
    + /*
    + * get the control file to check the system state and
    + * hibernation file validations.
    + */
    + if (controlFileInitialized == false)
    + {
    + if (GetControlFile(&controlFile) == true)
    + {
    + controlFileInitialized = true;
    + }
    + }
    +
    + BufferCacheHibernationData[id].data_ptr = ptr;
    + BufferCacheHibernationData[id].record_length = record_length;
    + BufferCacheHibernationData[id].num_records = num_records;
    +}
    +
    +/*
    + * ResumeBufferCacheHibernation:
    + * resume the buffer cache from hibernation file at startup.
    + */
    +void
    +ResumeBufferCacheHibernation(void)
    +{
    + BufferHibernationFileType id;
    + int i;
    + int fd;
    + Size num_records;
    + Size record_length;
    + char *buf_common;
    + int oldNBuffers;
    + bool buffer_block_processed;
    +
    + if (BufferCacheHibernationLevel == 0)
    + {
    + return;
    + }
    +
    + buf_common = NULL;
    + buffer_block_processed = false;
    +
    + /*
    + * lock all buffer descriptors to prevent other processes from
    + * updating buffers.
    + */
    + for (i = 0; i < NBuffers; i++)
    + {
    + BufferDesc *buf;
    +
    + buf = &BufferDescriptors[i];
    + LockBufHdr(buf);
    + }
    +
    + /*
    + * get the control file to check the system state and
    + * hibernation file validations.
    + */
    + if (controlFileInitialized == false)
    + {
    + elog(WARNING,
    + "could not get control file, "
    + "aborting buffer cache hibernation");
    + goto cleanup;
    + }
    +
    + if (controlFile.state != DB_SHUTDOWNED)
    + {
    + elog(WARNING,
    + "database system was not shut down normally, "
    + "aborting buffer cache hibernation");
    + goto cleanup;
    + }
    +
    + /*
    + * read the crc values which were computed when the hibernation
    + * files were created.
    + */
    + fd = BasicOpenFile(BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    + O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + elog(WARNING,
    + "could not open %s",
    + BUFFER_CACHE_HIBERNATION_FILE_CRC32);
    + goto cleanup;
    + }
    +
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + pg_crc32 crc;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + if (read(fd, (void *)&crc, sizeof(pg_crc32)) != sizeof(pg_crc32))
    + {
    + if (BufferCacheHibernationLevel == 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + /*
    + * if buffer_cache_hibernation_level changed from 1 to 2,
    + * the crc value of buffer block hibernation file may not exist.
    + * just ignore it here.
    + */
    + continue;
    + }
    +
    + elog(WARNING,
    + "could not read %s for %s",
    + BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    + BufferCacheHibernationData[id].crc = crc;
    + }
    +
    + close(fd);
    +
    + /*
    + * allocate a buffer to read the contents of the hibernation files
    + * for validations.
    + */
    + record_length = 0;
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + if (record_length < BufferCacheHibernationData[id].record_length)
    + {
    + record_length = BufferCacheHibernationData[id].record_length;
    + }
    + }
    +
    + buf_common = malloc(record_length);
    + if (buf_common == NULL)
    + {
    + elog(WARNING,
    + "out of memory, aborting buffer cache hibernation");
    + goto cleanup;
    + }
    +
    + /* assume that the number of buffers has not changed. */
    + oldNBuffers = NBuffers;
    +
    + /*
    + * check if all hibernation files are valid.
    + */
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + struct stat sb;
    + pg_crc32 crc;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + if (BufferCacheHibernationData[id].data_ptr == NULL ||
    + BufferCacheHibernationData[id].record_length == 0 ||
    + BufferCacheHibernationData[id].num_records == 0)
    + {
    + elog(WARNING,
    + "ResisterBufferCacheHibernation() was not called for %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + goto cleanup;
    + }
    +
    + fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    + O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + if (BufferCacheHibernationLevel == 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + /*
    + * if buffer_cache_hibernation_level changed from 1 to 2,
    + * the buffer block hibernation file may not exist.
    + * just ignore it here.
    + */
    + continue;
    + }
    +
    + goto cleanup;
    + }
    +
    + if (fstat(fd, &sb) < 0)
    + {
    + elog(WARNING,
    + "could not get stats of the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + record_length = BufferCacheHibernationData[id].record_length;
    + num_records = BufferCacheHibernationData[id].num_records;
    +
    + if (sb.st_size != (record_length * num_records))
    + {
    + /* The size of StrategyControl should be the same always. */
    + if (id == BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY ||
    + (sb.st_size % record_length) > 0)
    + {
    + elog(WARNING,
    + "size mismatch on the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + /*
    + * The number of records of buffer descriptors and blocks
    + * should be the same.
    + */
    + if (oldNBuffers != NBuffers &&
    + oldNBuffers != (sb.st_size / record_length))
    + {
    + elog(WARNING,
    + "size mismatch on the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + oldNBuffers = sb.st_size / record_length;
    +
    + elog(NOTICE,
    + "shared_buffers have changed from %d to %d: %s",
    + oldNBuffers, NBuffers,
    + BufferCacheHibernationData[id].hibernation_file);
    +
    + /* use the original size to compute CRC of the hibernation file. */
    + num_records = oldNBuffers;
    + }
    +
    + if ((pg_time_t)sb.st_mtime < controlFile.time)
    + {
    + elog(WARNING,
    + "the hibernation file is older than control file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + INIT_CRC32(crc);
    + for (i = 0; i < num_records; i++)
    + {
    + if (read(fd, (void *)buf_common, record_length) != record_length)
    + {
    + elog(WARNING,
    + "could not read the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + COMP_CRC32(crc, buf_common, record_length);
    +
    + /*
    + * validate the buffer descriptors.
    + */
    + if (id == BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS)
    + {
    + BufferDesc *buf;
    + BufFlags abnormal_flags;
    +
    + if (i >= NBuffers)
    + {
    + continue;
    + }
    +
    + abnormal_flags = (BM_DIRTY | BM_IO_IN_PROGRESS | BM_IO_ERROR |
    + BM_JUST_DIRTIED | BM_PIN_COUNT_WAITER);
    +
    + buf = (BufferDesc *)buf_common;
    +
    + if (buf->flags & abnormal_flags)
    + {
    + elog(WARNING,
    + "abnormal flags in buffer descriptors: %d",
    + buf->flags);
    + close(fd);
    + goto cleanup;
    + }
    +
    + if (buf->usage_count > BM_MAX_USAGE_COUNT)
    + {
    + elog(WARNING,
    + "invalid usage count in buffer descriptors: %d",
    + buf->usage_count);
    + close(fd);
    + goto cleanup;
    + }
    +
    + if (buf->buf_id < 0 || buf->buf_id >= num_records)
    + {
    + elog(WARNING,
    + "invalid buffer id in buffer descriptors: %d",
    + buf->buf_id);
    + close(fd);
    + goto cleanup;
    + }
    + }
    + }
    +
    + FIN_CRC32(crc);
    + close(fd);
    +
    + if (!EQ_CRC32(BufferCacheHibernationData[id].crc, crc))
    + {
    + elog(WARNING,
    + "crc mismatch on the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + goto cleanup;
    + }
    + }
    +
    + /*
    + * resume the buffer cache data structure from the hibernation files.
    + */
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + int fd;
    + char *ptr;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + record_length = BufferCacheHibernationData[id].record_length;
    + num_records = BufferCacheHibernationData[id].num_records;
    +
    + if (id != BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY)
    + {
    + /* use the smaller number of buffers. */
    + num_records = (oldNBuffers < NBuffers)? oldNBuffers : NBuffers;
    + }
    +
    + fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    + O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + if (BufferCacheHibernationLevel == 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + /*
    + * If buffer_cache_hibernation_level was changed from 1 to 2,
    + * the buffer block hibernation file may not exist yet;
    + * just ignore it here.
    + */
    + continue;
    + }
    +
    + goto cleanup;
    + }
    +
    + elog(NOTICE,
    + "buffer cache resume from %s (%d bytes * %d records)",
    + BufferCacheHibernationData[id].hibernation_file,
    + (int) record_length, (int) num_records);
    +
    + for (i = 0; i < num_records; i++)
    + {
    + ptr = BufferCacheHibernationData[id].data_ptr + (i * record_length);
    + read(fd, (void *)ptr, record_length);
    +
    + /* Re-lock the buffer descriptor if necessary. */
    + if (id == BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS)
    + {
    + BufferDesc *buf;
    +
    + buf = (BufferDesc *)ptr;
    + if (IsUnlockBufHdr(buf))
    + {
    + LockBufHdr(buf);
    + }
    + }
    + }
    +
    + close(fd);
    +
    + if (id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + buffer_block_processed = true;
    + }
    + }
    +
    + if (buffer_block_processed == false)
    + {
    + /* we didn't use the buffer block hibernation file, so delete it now. */
    + id = BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS;
    + unlink(BufferCacheHibernationData[id].hibernation_file);
    + }
    +
    + /*
    + * set up the remaining data structures (e.g. the lookup hashtable)
    + * based on the buffer descriptors.
    + */
    + num_records = (oldNBuffers < NBuffers)? oldNBuffers : NBuffers;
    + for (i = 0; i < num_records; i++)
    + {
    + BufferDesc *buf;
    + BufferTag newTag;
    + uint32 newHash;
    + int buf_id;
    +
    + buf = &BufferDescriptors[i];
    + if (buf->tag.rnode.spcNode == InvalidOid &&
    + buf->tag.rnode.dbNode == InvalidOid &&
    + buf->tag.rnode.relNode == InvalidOid)
    + {
    + continue;
    + }
    +
    + INIT_BUFFERTAG(newTag, buf->tag.rnode, buf->tag.forkNum, buf->tag.blockNum);
    + newHash = BufTableHashCode(&newTag);
    +
    + if (buffer_block_processed == false)
    + {
    + Block bufBlock;
    + SMgrRelation smgr;
    +
    + /*
    + * re-read buffer block.
    + */
    + bufBlock = BufHdrGetBlock(buf);
    + smgr = smgropen(buf->tag.rnode, InvalidBackendId);
    + smgrread(smgr, newTag.forkNum, newTag.blockNum, (char *) bufBlock);
    + }
    +
    + buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
    + if (buf_id != -1)
    + {
    + /* the entry exists already, return it to the freelist. */
    + buf->refcount = 0;
    + buf->flags = 0;
    + InvalidateBuffer(buf);
    + continue;
    + }
    +
    + /* clear wait_backend_pid because that process has already terminated. */
    + buf->wait_backend_pid = 0;
    +
    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    + elog(DEBUG5,
    + "resume [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    + buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    + buf->wait_backend_pid, buf->freeNext,
    + newHash, newTag.rnode.spcNode,
    + newTag.rnode.dbNode, newTag.rnode.relNode,
    + newTag.forkNum, newTag.blockNum);
    +#endif
    + }
    +
    + /*
    + * adjust StrategyControl based on the change of shared_buffers.
    + */
    + if (oldNBuffers != NBuffers)
    + {
    + AdjustStrategyControl(oldNBuffers);
    + }
    +
    + elog(NOTICE,
    + "buffer cache resumed successfully");
    +
    +cleanup:
    + for (i = 0; i < NBuffers; i++)
    + {
    + BufferDesc *buf;
    +
    + buf = &BufferDescriptors[i];
    + UnlockBufHdr(buf);
    + }
    +
    + if (buf_common != NULL)
    + {
    + free(buf_common);
    + }
    +
    + return;
    +}
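The resume path above checks every hibernation file before copying it into shared memory: the file size must be a whole number of records (a different record count is tolerated only for the descriptor and block files, when shared_buffers has changed), the file must not be older than pg_control, and its CRC must equal the value stored in pg_buffer_cache_hibernation_crc32 at shutdown. The standalone sketch below illustrates only the save-with-CRC step and the size-plus-CRC check, using stdio. It assumes a plain bitwise CRC-32 (reflected polynomial 0xEDB88320) as a stand-in for the pg_crc.h macros, and the names save_records() and validate_records() are hypothetical; the mtime check and the descriptor flag checks are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Bitwise reflected CRC-32 update (stand-in for COMP_CRC32; not pg_crc.h). */
static uint32_t crc32_update(uint32_t crc, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc;
}

/* Write num_records fixed-size records and return their CRC via crc_out. */
static int save_records(const char *path, const void *data, size_t record_length,
                        size_t num_records, uint32_t *crc_out)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;             /* like INIT_CRC32 */
    FILE *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    for (size_t i = 0; i < num_records; i++, p += record_length)
    {
        if (fwrite(p, 1, record_length, f) != record_length)
        {
            fclose(f);
            return -1;
        }
        crc = crc32_update(crc, p, record_length);
    }
    if (fclose(f) != 0)
        return -1;
    *crc_out = crc ^ 0xFFFFFFFFu;           /* like FIN_CRC32 */
    return 0;
}

/*
 * Accept the file only if its size is a whole number of records and the
 * CRC over all records matches expected_crc.  On success, *found_records
 * is the record count actually stored (it may differ from the current one).
 */
static int validate_records(const char *path, size_t record_length,
                            uint32_t expected_crc, size_t *found_records)
{
    FILE *f;
    unsigned char *buf;
    long size = 0;
    size_t n, i;
    uint32_t crc = 0xFFFFFFFFu;

    assert(record_length > 0);
    if ((f = fopen(path, "rb")) == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0 || (size_t) size % record_length != 0)
    {
        fclose(f);                          /* size mismatch: reject early */
        return -1;
    }
    n = (size_t) size / record_length;
    if ((buf = malloc(record_length)) == NULL)
    {
        fclose(f);
        return -1;
    }
    for (i = 0; i < n; i++)
    {
        if (fread(buf, 1, record_length, f) != record_length)
            break;                          /* short read */
        crc = crc32_update(crc, buf, record_length);
    }
    free(buf);
    fclose(f);
    if (i != n || (crc ^ 0xFFFFFFFFu) != expected_crc)
        return -1;
    *found_records = n;
    return 0;
}
```

As in the patch, the whole file is read and checksummed before anything is trusted, so a truncated or stale file is rejected without partially overwriting live state.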
    diff --git src/backend/storage/buffer/freelist.c src/backend/storage/buffer/freelist.c
    index bf9903b..ffc101d 100644
    --- src/backend/storage/buffer/freelist.c
    +++ src/backend/storage/buffer/freelist.c
    @@ -347,6 +347,12 @@ StrategyInitialize(bool init)
    }
    else
    Assert(!init);
    +
    + if (BufferCacheHibernationLevel > 0)
    + {
    + ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY,
    + (char *)StrategyControl, sizeof(BufferStrategyControl), 1);
    + }
    }


    @@ -521,3 +527,47 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, volatile BufferDesc *buf)

    return true;
    }
    +
    +/*
    + * AdjustStrategyControl -- adjust the member variables of StrategyControl
    + *
    + * If the shared_buffers setting has changed, the restored StrategyControl
    + * must be adjusted, whether shared_buffers shrank or grew.
    + * This is called only from bufmgr.c:ResumeBufferCacheHibernation().
    + */
    +void
    +AdjustStrategyControl(int oldNBuffers)
    +{
    + if (oldNBuffers == NBuffers)
    + {
    + return;
    + }
    +
    + /* enlarge or shrink the free buffer list based on the current NBuffers. */
    + StrategyControl->lastFreeBuffer = NBuffers - 1;
    +
    + /* shared_buffers shrunk. */
    + if (oldNBuffers > NBuffers)
    + {
    + if (StrategyControl->nextVictimBuffer >= NBuffers)
    + {
    + /* set the tail of buffers. */
    + StrategyControl->nextVictimBuffer = NBuffers - 1;
    + }
    +
    + if (StrategyControl->firstFreeBuffer >= NBuffers)
    + {
    + /* set FREENEXT_END_OF_LIST(-1). */
    + StrategyControl->firstFreeBuffer = FREENEXT_END_OF_LIST;
    + }
    + }
    + else
    + /* shared_buffers enlarged. */
    + {
    + if (StrategyControl->firstFreeBuffer < 0)
    + {
    + /* point at the first buffer past the end of the old buffers. */
    + StrategyControl->firstFreeBuffer = oldNBuffers;
    + }
    + }
    +}
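AdjustStrategyControl() only clamps three integer fields of the restored StrategyControl. To make the shrink and enlarge cases concrete, here is a standalone sketch operating on a hypothetical StrategySketch struct that holds just those fields (the real BufferStrategyControl has more members and lives in shared memory); FREENEXT_END_OF_LIST is redefined locally as -1, the value given in the patch comment.

```c
#include <assert.h>

#define FREENEXT_END_OF_LIST (-1)   /* local copy of the sentinel value */

/* Hypothetical subset of BufferStrategyControl touched by the patch. */
typedef struct
{
    int nextVictimBuffer;           /* clock-sweep hand */
    int firstFreeBuffer;            /* head of free list, or -1 if empty */
    int lastFreeBuffer;             /* tail of free list */
} StrategySketch;

/* Clamp restored positions so they are valid for n_buffers buffers. */
static void adjust_strategy(StrategySketch *s, int old_n_buffers, int n_buffers)
{
    assert(n_buffers > 0);
    if (old_n_buffers == n_buffers)
        return;

    s->lastFreeBuffer = n_buffers - 1;

    if (old_n_buffers > n_buffers)              /* shared_buffers shrank */
    {
        if (s->nextVictimBuffer >= n_buffers)
            s->nextVictimBuffer = n_buffers - 1;
        if (s->firstFreeBuffer >= n_buffers)
            s->firstFreeBuffer = FREENEXT_END_OF_LIST;
    }
    else if (s->firstFreeBuffer < 0)            /* grew, free list was empty */
        s->firstFreeBuffer = old_n_buffers;     /* first newly added buffer */
}
```

For example, shrinking from 1000 to 500 buffers moves a nextVictimBuffer of 900 to 499 and empties a free list that started at 950, while growing from 1000 to 2000 with an empty free list makes buffer 1000 the new head. Like the patch, the sketch does not splice the added buffers onto a non-empty free list.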
    diff --git src/backend/utils/misc/guc.c src/backend/utils/misc/guc.c
    index 738e215..5affc6e 100644
    --- src/backend/utils/misc/guc.c
    +++ src/backend/utils/misc/guc.c
    @@ -2361,6 +2361,18 @@ static struct config_int ConfigureNamesInt[] =
    NULL, NULL, NULL
    },

    + {
    + {"buffer_cache_hibernation_level", PGC_POSTMASTER, UNGROUPED,
    + gettext_noop("Sets buffer cache hibernation level."),
    + gettext_noop("0 to disable (default), "
    + "1 for saving buffer descriptors only (recommended), "
    + "2 for saving buffer descriptors and buffer blocks (slower at shutdown).")
    + },
    + &BufferCacheHibernationLevel,
    + 0, 0, 2,
    + NULL, NULL, NULL
    + },
    +
    /* End-of-list marker */
    {
    {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
    diff --git src/backend/utils/misc/postgresql.conf.sample src/backend/utils/misc/postgresql.conf.sample
    index b8a1582..44b6ff3 100644
    --- src/backend/utils/misc/postgresql.conf.sample
    +++ src/backend/utils/misc/postgresql.conf.sample
    @@ -119,6 +119,17 @@
    #maintenance_work_mem = 16MB # min 1MB
    #max_stack_depth = 2MB # min 100kB

    +
    +# Buffer Cache Hibernation:
    +# Suspend/resume buffer cache data structure using hibernation files
    +# at shutdown/startup.
    +#buffer_cache_hibernation_level = 0 # Sets buffer cache hibernation level.
    + # 0 to disable (default),
    + # 1 for saving buffer descriptors only
    + # (recommended),
    + # 2 for saving buffer descriptors and
    + # buffer blocks (slower at shutdown).
    +
    # - Kernel Resource Usage -

    #max_files_per_process = 1000 # min 25
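The three levels differ only in which registered structures are written and read back: every save and resume loop in the patch skips the BufferBlocks file when the level is below 2, and the feature is off entirely at 0. When no block file is restored (level 1), ResumeBufferCacheHibernation() re-reads each tagged page from disk with smgrread(); level 2 dumps BufferBlocks as well, which is why it is described as slower at shutdown. A minimal sketch of that selection rule follows; the helper hibernation_uses_file() and the HIB_* enum names are hypothetical and only mirror the order of BufferHibernationFileType.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; same order as BufferHibernationFileType in the patch. */
enum { HIB_STRATEGY, HIB_DESCRIPTORS, HIB_BLOCKS, HIB_NUM_TYPES };

/*
 * Would buffer_cache_hibernation_level `level` save and resume file type `id`?
 * Level 0 disables the feature; the BufferBlocks file is skipped below level 2.
 */
static bool hibernation_uses_file(int level, int id)
{
    assert(id >= 0 && id < HIB_NUM_TYPES);
    if (level <= 0)
        return false;
    if (id == HIB_BLOCKS)
        return level >= 2;
    return true;    /* strategy and descriptor files at levels 1 and 2 */
}
```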
    diff --git src/include/access/xlog.h src/include/access/xlog.h
    index 7056fd6..7a9fb99 100644
    --- src/include/access/xlog.h
    +++ src/include/access/xlog.h
    @@ -13,6 +13,7 @@

    #include "access/rmgr.h"
    #include "access/xlogdefs.h"
    +#include "catalog/pg_control.h"
    #include "lib/stringinfo.h"
    #include "storage/buf.h"
    #include "utils/pg_crc.h"
    @@ -294,6 +295,7 @@ extern bool XLogInsertAllowed(void);
    extern void GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream);
    extern XLogRecPtr GetXLogReplayRecPtr(void);

    +extern bool GetControlFile(ControlFileData *controlFile);
    extern void UpdateControlFile(void);
    extern uint64 GetSystemIdentifier(void);
    extern Size XLOGShmemSize(void);
    diff --git src/include/storage/buf_internals.h src/include/storage/buf_internals.h
    index b7d4ea5..d537ef1 100644
    --- src/include/storage/buf_internals.h
    +++ src/include/storage/buf_internals.h
    @@ -167,6 +167,7 @@ typedef struct sbufdesc
    */
    #define LockBufHdr(bufHdr) SpinLockAcquire(&(bufHdr)->buf_hdr_lock)
    #define UnlockBufHdr(bufHdr) SpinLockRelease(&(bufHdr)->buf_hdr_lock)
    +#define IsUnlockBufHdr(bufHdr) SpinLockFree(&(bufHdr)->buf_hdr_lock)


    /* in buf_init.c */
    @@ -190,6 +191,7 @@ extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
    extern int StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
    extern Size StrategyShmemSize(void);
    extern void StrategyInitialize(bool init);
    +extern void AdjustStrategyControl(int oldNBuffers);

    /* buf_table.c */
    extern Size BufTableShmemSize(int size);
    diff --git src/include/storage/bufmgr.h src/include/storage/bufmgr.h
    index b8fc87e..ddfeb9d 100644
    --- src/include/storage/bufmgr.h
    +++ src/include/storage/bufmgr.h
    @@ -211,6 +211,20 @@ extern void BgBufferSync(void);

    extern void AtProcExit_LocalBuffers(void);

    +/* buffer cache hibernation support stuff */
    +extern int BufferCacheHibernationLevel;
    +
    +typedef enum BufferHibernationFileType
    +{
    + BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY,
    + BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS,
    + BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS
    +} BufferHibernationFileType;
    +
    +extern void ResisterBufferCacheHibernation(BufferHibernationFileType id,
    + char *ptr, Size record_length, Size num_records);
    +extern void ResumeBufferCacheHibernation(void);
    +
    /* in freelist.c */
    extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
    extern void FreeAccessStrategy(BufferAccessStrategy strategy);
  • Greg Smith at Jun 7, 2011 at 7:39 pm

    On 06/05/2011 08:50 AM, Mitsuru IWASAKI wrote:
    It seems that I don't have enough time to complete this work.
    You don't need to keep cc'ing me, and I'd be very happy if postgres
    became the first DBMS to support a buffer cache hibernation feature.
    Thanks for submitting the patch, and we'll see what happens from here.
    I've switched to bcc'ing you here, and we should get you off everyone
    else's cc: list here soon. If this feature ends up getting committed,
    I'll try to remember to drop you a note about it so you can see what
    happened.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Bruce Momjian at Oct 14, 2011 at 12:02 am
    Should this be marked as TODO?

    ---------------------------------------------------------------------------

    Mitsuru IWASAKI wrote:
    Hi,
    On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote:
    For 1, I've just finished my work. The latest patch is available at:
    http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch
    Reminder here--we can't accept code based on it being published to a web
    page. You'll need to e-mail it to the pgsql-hackers mailing list to be
    considered for the next PostgreSQL CommitFest, which is starting in a
    few weeks. Code submitted to the mailing list is considered a release
    of it to the project under the PostgreSQL license, which we can't just
    assume for things when given only a URL to them.
    Sorry about that, but I had enough time to revise my patches this weekend.
    I attached the patches in this mail, and will update CommitFest page soon.
    Also, you suggested you were out of time to work on this. If that's the
    case, we'd like to know that so we don't keep cc'ing you about things in
    expectation of an answer. Someone else may pick this up as a project to
    continue working on. But it's going to need a fair amount of revision
    before it matches what people want here, and I'm not sure how much of
    what you've written is going to end up in any commit that may happen
    from this idea.
    It seems that I don't have enough time to complete this work.
    You don't need to keep cc'ing me, and I'd be very happy if postgres
    became the first DBMS to support a buffer cache hibernation feature.

    Thanks!


    diff --git src/backend/access/transam/xlog.c src/backend/access/transam/xlog.c
    index b0e4c41..7a3a207 100644
    --- src/backend/access/transam/xlog.c
    +++ src/backend/access/transam/xlog.c
    @@ -4834,6 +4834,19 @@ ReadControlFile(void)
    #endif
    }

    +bool
    +GetControlFile(ControlFileData *controlFile)
    +{
    + if (ControlFile == NULL)
    + {
    + return false;
    + }
    +
    + memcpy(controlFile, ControlFile, sizeof(ControlFileData));
    +
    + return true;
    +}
    +
    void
    UpdateControlFile(void)
    {
    diff --git src/backend/bootstrap/bootstrap.c src/backend/bootstrap/bootstrap.c
    index fc093cc..7ecf6bb 100644
    --- src/backend/bootstrap/bootstrap.c
    +++ src/backend/bootstrap/bootstrap.c
    @@ -360,6 +360,15 @@ AuxiliaryProcessMain(int argc, char *argv[])
    BaseInit();

    /*
    + * Only the startup process calls ResumeBufferCacheHibernation(), and it
    + * must do so after InitFileAccess() and smgrinit().
    + */
    + if (auxType == StartupProcess && BufferCacheHibernationLevel > 0)
    + {
    + ResumeBufferCacheHibernation();
    + }
    +
    + /*
    * When we are an auxiliary process, we aren't going to do the full
    * InitPostgres pushups, but there are a couple of things that need to get
    * lit up even in an auxiliary process.
    diff --git src/backend/storage/buffer/buf_init.c src/backend/storage/buffer/buf_init.c
    index dadb49d..52eb51a 100644
    --- src/backend/storage/buffer/buf_init.c
    +++ src/backend/storage/buffer/buf_init.c
    @@ -127,6 +127,14 @@ InitBufferPool(void)

    /* Init other shared buffer-management stuff */
    StrategyInitialize(!foundDescs);
    +
    + if (BufferCacheHibernationLevel > 0)
    + {
    + ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS,
    + (char *)BufferDescriptors, sizeof(BufferDesc), NBuffers);
    + ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS,
    + (char *)BufferBlocks, BLCKSZ, NBuffers);
    + }
    }

    /*
    diff --git src/backend/storage/buffer/bufmgr.c src/backend/storage/buffer/bufmgr.c
    index f96685d..dba8ebf 100644
    --- src/backend/storage/buffer/bufmgr.c
    +++ src/backend/storage/buffer/bufmgr.c
    @@ -31,6 +31,7 @@
    #include "postgres.h"

    #include <sys/file.h>
    +#include <sys/stat.h>
    #include <unistd.h>

    #include "catalog/catalog.h"
    @@ -61,6 +62,13 @@
    #define BUF_WRITTEN 0x01
    #define BUF_REUSABLE 0x02

    +/*
    + * Buffer Cache Hibernation stuff.
    + */
    +/* enable this to debug buffer cache hibernation. */
    +#if 0
    +#define DEBUG_BUFFER_CACHE_HIBERNATION
    +#endif

    /* GUC variables */
    bool zero_damaged_pages = false;
    @@ -765,6 +773,16 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
    }
    }

    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    + elog(DEBUG5,
    + "alloc [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    + buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    + buf->wait_backend_pid, buf->freeNext,
    + newHash, newTag.rnode.spcNode,
    + newTag.rnode.dbNode, newTag.rnode.relNode,
    + newTag.forkNum, newTag.blockNum);
    +#endif
    +
    return buf;
    }

    @@ -800,6 +818,16 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
    * the old content is no longer relevant. (The usage_count starts out at
    * 1 so that the buffer can survive one clock-sweep pass.)
    */
    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    + elog(DEBUG5,
    + "rename [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    + buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    + buf->wait_backend_pid, buf->freeNext,
    + oldHash, oldTag.rnode.spcNode,
    + oldTag.rnode.dbNode, oldTag.rnode.relNode,
    + oldTag.forkNum, oldTag.blockNum);
    +#endif
    +
    buf->tag = newTag;
    buf->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT);
    if (relpersistence == RELPERSISTENCE_PERMANENT)
    @@ -2772,3 +2800,716 @@ local_buffer_write_error_callback(void *arg)
    pfree(path);
    }
    }
    +
    +/* ----------------------------------------------------------------
    + * Buffer Cache Hibernation support stuff
    + *
    + * Suspend/resume buffer cache data structure using hibernation files
    + * at shutdown/startup.
    + * ----------------------------------------------------------------
    + */
    +
    +int BufferCacheHibernationLevel = 0;
    +
    +#define BUFFER_CACHE_HIBERNATION_FILE_STRATEGY "global/pg_buffer_cache_hibernation_strategy"
    +#define BUFFER_CACHE_HIBERNATION_FILE_DESCRIPTORS "global/pg_buffer_cache_hibernation_descriptors"
    +#define BUFFER_CACHE_HIBERNATION_FILE_BLOCKS "global/pg_buffer_cache_hibernation_blocks"
    +#define BUFFER_CACHE_HIBERNATION_FILE_CRC32 "global/pg_buffer_cache_hibernation_crc32"
    +
    +static struct
    +{
    + char *hibernation_file;
    + char *data_ptr;
    + Size record_length;
    + Size num_records;
    + pg_crc32 crc;
    +} BufferCacheHibernationData[] =
    +{
    + /* BufferStrategyControl */
    + {
    + BUFFER_CACHE_HIBERNATION_FILE_STRATEGY,
    + NULL, 0, 0, 0
    + },
    +
    + /* BufferDescriptors */
    + {
    + BUFFER_CACHE_HIBERNATION_FILE_DESCRIPTORS,
    + NULL, 0, 0, 0
    + },
    +
    + /* BufferBlocks */
    + {
    + BUFFER_CACHE_HIBERNATION_FILE_BLOCKS,
    + NULL, 0, 0, 0
    + },
    +
    + /* End-of-list marker */
    + {
    + NULL,
    + NULL, 0, 0, 0
    + },
    +};
    +
    +static ControlFileData controlFile;
    +static bool controlFileInitialized = false;
    +
    +/*
    + * AtProcExit_BufferCacheHibernation:
    + * store the buffer cache into hibernation files at shutdown.
    + */
    +static void
    +AtProcExit_BufferCacheHibernation(int code, Datum arg)
    +{
    + BufferHibernationFileType id;
    + int i;
    + int fd;
    +
    + if (BufferCacheHibernationLevel == 0)
    + {
    + return;
    + }
    +
    + /*
    + * get the control file to validate the system state.
    + */
    + if (GetControlFile(&controlFile) == false)
    + {
    + elog(WARNING,
    + "could not get control file, "
    + "aborting buffer cache hibernation");
    + return;
    + }
    +
    + if (controlFile.state != DB_SHUTDOWNED)
    + {
    + elog(WARNING,
    + "database system was not shut down normally, "
    + "aborting buffer cache hibernation");
    + return;
    + }
    +
    + /*
    + * suspend buffer cache data structure into hibernation files.
    + */
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + Size record_length;
    + Size num_records;
    + char *ptr;
    + pg_crc32 crc;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + if (BufferCacheHibernationData[id].data_ptr == NULL ||
    + BufferCacheHibernationData[id].record_length == 0 ||
    + BufferCacheHibernationData[id].num_records == 0)
    + {
    + elog(WARNING,
    + "ResisterBufferCacheHibernation() was not called for %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + goto cleanup;
    + }
    +
    + fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    + O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + elog(WARNING,
    + "could not open %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + goto cleanup;
    + }
    +
    + record_length = BufferCacheHibernationData[id].record_length;
    + num_records = BufferCacheHibernationData[id].num_records;
    +
    + elog(NOTICE,
    + "buffer cache hibernate into %s",
    + BufferCacheHibernationData[id].hibernation_file);
    +
    + INIT_CRC32(crc);
    + for (i = 0; i < num_records; i++)
    + {
    + ptr = BufferCacheHibernationData[id].data_ptr + (i * record_length);
    + if (write(fd, (void *)ptr, record_length) != record_length)
    + {
    + elog(WARNING,
    + "could not write %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + COMP_CRC32(crc, ptr, record_length);
    + }
    +
    + FIN_CRC32(crc);
    + close(fd);
    +
    + BufferCacheHibernationData[id].crc = crc;
    + }
    +
    + /*
    + * save the computed crc values for the validations at resuming.
    + */
    + fd = BasicOpenFile(BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    + O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + elog(WARNING,
    + "could not open %s",
    + BUFFER_CACHE_HIBERNATION_FILE_CRC32);
    + goto cleanup;
    + }
    +
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + pg_crc32 crc;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + crc = BufferCacheHibernationData[id].crc;
    + if (write(fd, (void *)&crc, sizeof(pg_crc32)) != sizeof(pg_crc32))
    + {
    + elog(WARNING,
    + "could not write %s for %s",
    + BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    + }
    + close(fd);
    +
    + elog(NOTICE,
    + "buffer cache suspended successfully");
    +
    + return;
    +
    +cleanup:
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + unlink(BufferCacheHibernationData[id].hibernation_file);
    + }
    +
    + return;
    +}
    +
    +/*
    + * ResisterBufferCacheHibernation:
    + * register the buffer cache data structure info.
    + */
    +void
    +ResisterBufferCacheHibernation(BufferHibernationFileType id, char *ptr, Size record_length, Size num_records)
    +{
    + static bool first_time = true;
    +
    + if (BufferCacheHibernationLevel == 0)
    + {
    + return;
    + }
    +
    + if (id != BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY &&
    + id != BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS &&
    + id != BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + return;
    + }
    +
    + if (first_time)
    + {
    + /*
    + * arrange for AtProcExit_BufferCacheHibernation() to be called at shutdown.
    + */
    + on_shmem_exit(AtProcExit_BufferCacheHibernation, 0);
    + first_time = false;
    + }
    +
    + /*
    + * get the control file to check the system state and
    + * hibernation file validations.
    + */
    + if (controlFileInitialized == false)
    + {
    + if (GetControlFile(&controlFile) == true)
    + {
    + controlFileInitialized = true;
    + }
    + }
    +
    + BufferCacheHibernationData[id].data_ptr = ptr;
    + BufferCacheHibernationData[id].record_length = record_length;
    + BufferCacheHibernationData[id].num_records = num_records;
    +}
    +
    +/*
    + * ResumeBufferCacheHibernation:
    + * resume the buffer cache from the hibernation files at startup.
    + */
    +void
    +ResumeBufferCacheHibernation(void)
    +{
    + BufferHibernationFileType id;
    + int i;
    + int fd;
    + Size num_records;
    + Size record_length;
    + char *buf_common;
    + int oldNBuffers;
    + bool buffer_block_processed;
    +
    + if (BufferCacheHibernationLevel == 0)
    + {
    + return;
    + }
    +
    + buf_common = NULL;
    + buffer_block_processed = false;
    +
    + /*
    + * lock all buffer descriptors to prevent other processes from
    + * updating buffers.
    + */
    + for (i = 0; i < NBuffers; i++)
    + {
    + BufferDesc *buf;
    +
    + buf = &BufferDescriptors[i];
    + LockBufHdr(buf);
    + }
    +
    + /*
    + * get the control file to check the system state and
    + * hibernation file validations.
    + */
    + if (controlFileInitialized == false)
    + {
    + elog(WARNING,
    + "could not get control file, "
    + "aborting buffer cache hibernation");
    + goto cleanup;
    + }
    +
    + if (controlFile.state != DB_SHUTDOWNED)
    + {
    + elog(WARNING,
    + "database system was not shut down normally, "
    + "aborting buffer cache hibernation");
    + goto cleanup;
    + }
    +
    + /*
    + * read the crc values which were computed when the hibernation
    + * files were created.
    + */
    + fd = BasicOpenFile(BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    + O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + elog(WARNING,
    + "could not open %s",
    + BUFFER_CACHE_HIBERNATION_FILE_CRC32);
    + goto cleanup;
    + }
    +
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + pg_crc32 crc;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + if (read(fd, (void *)&crc, sizeof(pg_crc32)) != sizeof(pg_crc32))
    + {
    + if (BufferCacheHibernationLevel == 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + /*
    + * If buffer_cache_hibernation_level was changed from 1 to 2,
    + * the crc value for the buffer block hibernation file may not
    + * exist yet; just ignore it here.
    + */
    + continue;
    + }
    +
    + elog(WARNING,
    + "could not read %s for %s",
    + BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    + BufferCacheHibernationData[id].crc = crc;
    + }
    +
    + close(fd);
    +
    + /*
    + * allocate a buffer to read the contents of the hibernation files
    + * for validation.
    + */
    + record_length = 0;
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + if (record_length < BufferCacheHibernationData[id].record_length)
    + {
    + record_length = BufferCacheHibernationData[id].record_length;
    + }
    + }
    +
    + buf_common = malloc(record_length);
    + if (buf_common == NULL)
    + {
    + elog(WARNING,
    + "out of memory, aborting buffer cache hibernation");
    + goto cleanup;
    + }
    +
    + /* assume that the number of buffers has not changed. */
    + oldNBuffers = NBuffers;
    +
    + /*
    + * check if all hibernation files are valid.
    + */
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + struct stat sb;
    + pg_crc32 crc;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + if (BufferCacheHibernationData[id].data_ptr == NULL ||
    + BufferCacheHibernationData[id].record_length == 0 ||
    + BufferCacheHibernationData[id].num_records == 0)
    + {
    + elog(WARNING,
    + "ResisterBufferCacheHibernation() was not called for %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + goto cleanup;
    + }
    +
    + fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    + O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + if (BufferCacheHibernationLevel == 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + /*
    + * If buffer_cache_hibernation_level was changed from 1 to 2,
    + * the buffer block hibernation file may not exist yet;
    + * just ignore it here.
    + */
    + continue;
    + }
    +
    + goto cleanup;
    + }
    +
    + if (fstat(fd, &sb) < 0)
    + {
    + elog(WARNING,
    + "could not get stats of the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + record_length = BufferCacheHibernationData[id].record_length;
    + num_records = BufferCacheHibernationData[id].num_records;
    +
    + if (sb.st_size != (record_length * num_records))
    + {
    + /* The size of StrategyControl should always be the same. */
    + if (id == BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY ||
    + (sb.st_size % record_length) > 0)
    + {
    + elog(WARNING,
    + "size mismatch on the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + /*
    + * The number of records of buffer descriptors and blocks
    + * should be the same.
    + */
    + if (oldNBuffers != NBuffers &&
    + oldNBuffers != (sb.st_size / record_length))
    + {
    + elog(WARNING,
    + "size mismatch on the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + oldNBuffers = sb.st_size / record_length;
    +
    + elog(NOTICE,
    + "shared_buffers have changed from %d to %d: %s",
    + oldNBuffers, NBuffers,
    + BufferCacheHibernationData[id].hibernation_file);
    +
    + /* use the original size to compute CRC of the hibernation file. */
    + num_records = oldNBuffers;
    + }
    +
    + if ((pg_time_t)sb.st_mtime < controlFile.time)
    + {
    + elog(WARNING,
    + "the hibernation file is older than control file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + INIT_CRC32(crc);
    + for (i = 0; i < num_records; i++)
    + {
    + if (read(fd, (void *)buf_common, record_length) != record_length)
    + {
    + elog(WARNING,
    + "could not read the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    +
    + COMP_CRC32(crc, buf_common, record_length);
    +
    + /*
    + * buffer descriptors validations.
    + */
    + if (id == BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS)
    + {
    + BufferDesc *buf;
    + BufFlags abnormal_flags;
    +
    + if (i >= NBuffers)
    + {
    + continue;
    + }
    +
    + abnormal_flags = (BM_DIRTY | BM_IO_IN_PROGRESS | BM_IO_ERROR |
    + BM_JUST_DIRTIED | BM_PIN_COUNT_WAITER);
    +
    + buf = (BufferDesc *)buf_common;
    +
    + if (buf->flags & abnormal_flags)
    + {
    + elog(WARNING,
    + "abnormal flags in buffer descriptors: %d",
    + buf->flags);
    + close(fd);
    + goto cleanup;
    + }
    +
    + if (buf->usage_count > BM_MAX_USAGE_COUNT)
    + {
    + elog(WARNING,
    + "invalid usage count in buffer descriptors: %d",
    + buf->usage_count);
    + close(fd);
    + goto cleanup;
    + }
    +
    + if (buf->buf_id < 0 || buf->buf_id >= num_records)
    + {
    + elog(WARNING,
    + "invalid buffer id in buffer descriptors: %d",
    + buf->buf_id);
    + close(fd);
    + goto cleanup;
    + }
    + }
    + }
    +
    + FIN_CRC32(crc);
    + close(fd);
    +
    + if (!EQ_CRC32(BufferCacheHibernationData[id].crc, crc))
    + {
    + elog(WARNING,
    + "crc mismatch on the buffer cache hibernation file: %s",
    + BufferCacheHibernationData[id].hibernation_file);
    + close(fd);
    + goto cleanup;
    + }
    + }
    +
    + /*
    + * resume the buffer cache data structure from the hibernation files.
    + */
    + for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    + {
    + int fd;
    + char *ptr;
    +
    + if (BufferCacheHibernationLevel < 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + continue;
    + }
    +
    + record_length = BufferCacheHibernationData[id].record_length;
    + num_records = BufferCacheHibernationData[id].num_records;
    +
    + if (id != BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY)
    + {
    + /* use the smaller number of buffers. */
    + num_records = (oldNBuffers < NBuffers)? oldNBuffers : NBuffers;
    + }
    +
    + fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    + O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    + if (fd < 0)
    + {
    + if (BufferCacheHibernationLevel == 2 &&
    + id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + /*
    + * if buffer_cache_hibernation_level changes from 1 to 2,
    + * the buffer block hibernation file may not exist.
    + * just ignore it here.
    + */
    + continue;
    + }
    +
    + goto cleanup;
    + }
    +
    + elog(NOTICE,
    + "buffer cache resume from %s(%d bytes * %d records)",
    + BufferCacheHibernationData[id].hibernation_file,
    + record_length, num_records);
    +
    + for (i = 0; i < num_records; i++)
    + {
    + ptr = BufferCacheHibernationData[id].data_ptr + (i * record_length);
    + read(fd, (void *)ptr, record_length);
    +
    + /* Re-lock the buffer descriptor if necessary. */
    + if (id == BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS)
    + {
    + BufferDesc *buf;
    +
    + buf = (BufferDesc *)ptr;
    + if (IsUnlockBufHdr(buf))
    + {
    + LockBufHdr(buf);
    + }
    + }
    + }
    +
    + close(fd);
    +
    + if (id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    + {
    + buffer_block_processed = true;
    + }
    + }
    +
    + if (buffer_block_processed == false)
    + {
    + /* we didn't use the buffer block hibernation file, so delete it now. */
    + id = BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS;
    + unlink(BufferCacheHibernationData[id].hibernation_file);
    + }
    +
    + /*
    + * set up the remaining data structures (e.g. the lookup hashtable)
    + * based on the buffer descriptors.
    + */
    + num_records = (oldNBuffers < NBuffers)? oldNBuffers : NBuffers;
    + for (i = 0; i < num_records; i++)
    + {
    + BufferDesc *buf;
    + BufferTag newTag;
    + uint32 newHash;
    + int buf_id;
    +
    + buf = &BufferDescriptors[i];
    + if (buf->tag.rnode.spcNode == InvalidOid &&
    + buf->tag.rnode.dbNode == InvalidOid &&
    + buf->tag.rnode.relNode == InvalidOid)
    + {
    + continue;
    + }
    +
    + INIT_BUFFERTAG(newTag, buf->tag.rnode, buf->tag.forkNum, buf->tag.blockNum);
    + newHash = BufTableHashCode(&newTag);
    +
    + if (buffer_block_processed == false)
    + {
    + Block bufBlock;
    + SMgrRelation smgr;
    +
    + /*
    + * re-read buffer block.
    + */
    + bufBlock = BufHdrGetBlock(buf);
    + smgr = smgropen(buf->tag.rnode, InvalidBackendId);
    + smgrread(smgr, newTag.forkNum, newTag.blockNum, (char *) bufBlock);
    + }
    +
    + buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
    + if (buf_id != -1)
    + {
    + /* the entry exists already, return it to the freelist. */
    + buf->refcount = 0;
    + buf->flags = 0;
    + InvalidateBuffer(buf);
    + continue;
    + }
    +
    + /* clear wait_backend_pid because the process was terminated already. */
    + buf->wait_backend_pid = 0;
    +
    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    + elog(DEBUG5,
    + "resume [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    + buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    + buf->wait_backend_pid, buf->freeNext,
    + newHash, newTag.rnode.spcNode,
    + newTag.rnode.dbNode, newTag.rnode.relNode,
    + newTag.forkNum, newTag.blockNum);
    +#endif
    + }
    +
    + /*
    + * adjust StrategyControl based on the change of shared_buffers.
    + */
    + if (oldNBuffers != NBuffers)
    + {
    + AdjustStrategyControl(oldNBuffers);
    + }
    +
    + elog(NOTICE,
    + "buffer cache resumed successfully");
    +
    +cleanup:
    + for (i = 0; i < NBuffers; i++)
    + {
    + BufferDesc *buf;
    +
    + buf = &BufferDescriptors[i];
    + UnlockBufHdr(buf);
    + }
    +
    + if (buf_common != NULL)
    + {
    + free(buf_common);
    + }
    +
    + return;
    +}
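The validation pass in ResumeBufferCacheHibernation() accepts a descriptors or blocks file whose size differs from record_length * num_records only if it is a whole number of records, and the resume loop then restores the smaller of the old and new buffer counts. A minimal standalone sketch of that arithmetic follows; the helper name `hibernation_records_to_resume()` is hypothetical (not part of the patch), and plain `long` stands in for PostgreSQL's `Size` and `off_t`.

```c
/*
 * Hypothetical helper (not in the patch): given a hibernation file's size,
 * its record length and the current NBuffers, return how many records can
 * be resumed, or -1 if the file would be rejected.
 *
 * Mirrors the checks in ResumeBufferCacheHibernation():
 *   - the StrategyControl file must hold exactly one record;
 *   - any other file must be a whole number of records;
 *   - the smaller of the old and new buffer counts is resumed.
 */
long
hibernation_records_to_resume(long file_size, long record_length,
                              long nbuffers, int is_strategy)
{
    long old_nbuffers;

    if (record_length <= 0 || file_size < 0)
        return -1;

    if (is_strategy)
        return (file_size == record_length) ? 1 : -1;

    /* a partial trailing record means the file is damaged */
    if (file_size % record_length != 0)
        return -1;

    /* the buffer count in effect when the file was written */
    old_nbuffers = file_size / record_length;

    /* use the smaller number of buffers, as the resume loop does */
    return (old_nbuffers < nbuffers) ? old_nbuffers : nbuffers;
}
```

For example, a blocks file holding 100 records of 8192 bytes resumes only 50 records after shared_buffers is reduced to 50 buffers, while a file one byte too long is rejected outright.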
    diff --git src/backend/storage/buffer/freelist.c src/backend/storage/buffer/freelist.c
    index bf9903b..ffc101d 100644
    --- src/backend/storage/buffer/freelist.c
    +++ src/backend/storage/buffer/freelist.c
    @@ -347,6 +347,12 @@ StrategyInitialize(bool init)
    }
    else
    Assert(!init);
    +
    + if (BufferCacheHibernationLevel > 0)
    + {
    + ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY,
    + (char *)StrategyControl, sizeof(BufferStrategyControl), 1);
    + }
    }


    @@ -521,3 +527,47 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, volatile BufferDesc *buf)

    return true;
    }
    +
    +/*
    + * AdjustStrategyControl -- adjust the member variables of StrategyControl
    + *
    + * If the shared_buffers setting has changed, the restored StrategyControl
    + * needs to be adjusted for both the shrinking and the enlarging case.
    + * This is called only from bufmgr.c:ResumeBufferCacheHibernation().
    + */
    +void
    +AdjustStrategyControl(int oldNBuffers)
    +{
    + if (oldNBuffers == NBuffers)
    + {
    + return;
    + }
    +
    + /* enlarge or shrink the free buffer list based on the current NBuffers. */
    + StrategyControl->lastFreeBuffer = NBuffers - 1;
    +
    + /* shared_buffers shrunk. */
    + if (oldNBuffers > NBuffers)
    + {
    + if (StrategyControl->nextVictimBuffer >= NBuffers)
    + {
    + /* set the tail of buffers. */
    + StrategyControl->nextVictimBuffer = NBuffers - 1;
    + }
    +
    + if (StrategyControl->firstFreeBuffer >= NBuffers)
    + {
    + /* set FREENEXT_END_OF_LIST(-1). */
    + StrategyControl->firstFreeBuffer = FREENEXT_END_OF_LIST;
    + }
    + }
    + else
    + /* shared_buffers enlarged. */
    + {
    + if (StrategyControl->firstFreeBuffer < 0)
    + {
    + /* set the next entry of the tail of old buffers. */
    + StrategyControl->firstFreeBuffer = oldNBuffers;
    + }
    + }
    +}
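AdjustStrategyControl() only touches three StrategyControl fields, so its effect can be sketched in isolation. The struct `StrategyModel`, the function `adjust_strategy_model()` and the constant `MODEL_FREENEXT_END_OF_LIST` below are hypothetical stand-ins (the constant takes the value -1 from the patch's comment); the branch logic follows the patch.

```c
/* stand-in for FREENEXT_END_OF_LIST, which the patch comments as -1 */
#define MODEL_FREENEXT_END_OF_LIST (-1)

/* hypothetical stand-in for the StrategyControl fields the patch adjusts */
typedef struct
{
    int nextVictimBuffer;   /* clock-sweep hand */
    int firstFreeBuffer;    /* head of the free list, or -1 if empty */
    int lastFreeBuffer;     /* tail of the free list */
} StrategyModel;

void
adjust_strategy_model(StrategyModel *sc, int old_nbuffers, int nbuffers)
{
    if (old_nbuffers == nbuffers)
        return;

    /* the free list tail always becomes the last current buffer */
    sc->lastFreeBuffer = nbuffers - 1;

    if (old_nbuffers > nbuffers)
    {
        /* shrunk: pull indexes that now point past the end back in range */
        if (sc->nextVictimBuffer >= nbuffers)
            sc->nextVictimBuffer = nbuffers - 1;
        if (sc->firstFreeBuffer >= nbuffers)
            sc->firstFreeBuffer = MODEL_FREENEXT_END_OF_LIST;
    }
    else
    {
        /* enlarged: an empty free list restarts at the first new buffer */
        if (sc->firstFreeBuffer < 0)
            sc->firstFreeBuffer = old_nbuffers;
    }
}
```

When shrinking from 1000 to 500 buffers, a clock hand at 900 is clamped to 499. When growing from 100 to 200 buffers with an empty free list, the list restarts at buffer 100, the first buffer that was not restored from the hibernation file.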
    diff --git src/backend/utils/misc/guc.c src/backend/utils/misc/guc.c
    index 738e215..5affc6e 100644
    --- src/backend/utils/misc/guc.c
    +++ src/backend/utils/misc/guc.c
    @@ -2361,6 +2361,18 @@ static struct config_int ConfigureNamesInt[] =
    NULL, NULL, NULL
    },

    + {
    + {"buffer_cache_hibernation_level", PGC_POSTMASTER, UNGROUPED,
    + gettext_noop("Sets buffer cache hibernation level."),
    + gettext_noop("0 to disable(default), "
    + "1 for saving buffer descriptors only(recommended), "
    + "2 for saving buffer descriptors and buffer blocks(slower at shutdown).")
    + },
    + &BufferCacheHibernationLevel,
    + 0, 0, 2,
    + NULL, NULL, NULL
    + },
    +
    /* End-of-list marker */
    {
    {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
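The three levels of buffer_cache_hibernation_level map onto the hibernation files through the repeated `BufferCacheHibernationLevel < 2 && id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS` tests in the patch. A small sketch of that mapping follows; the enum `HibFileType` and the function `hibernation_level_uses()` are hypothetical names that mirror the patch's `BufferHibernationFileType` ordering.

```c
#include <stdbool.h>

/* same order as the patch's BufferHibernationFileType enum */
typedef enum
{
    HIB_STRATEGY,
    HIB_DESCRIPTORS,
    HIB_BLOCKS
} HibFileType;

/*
 * Hypothetical helper: is file `type` written at shutdown and read at
 * startup under buffer_cache_hibernation_level `level`?
 */
bool
hibernation_level_uses(int level, HibFileType type)
{
    if (level <= 0)
        return false;           /* 0: feature disabled */
    if (type == HIB_BLOCKS)
        return level >= 2;      /* block contents only at level 2 */
    return true;                /* strategy and descriptors at 1 and 2 */
}
```

At level 1 the block contents are not saved; the resume code instead re-reads every tagged page from disk with smgrread(). Level 2 trades a slower shutdown for skipping those reads.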
    diff --git src/backend/utils/misc/postgresql.conf.sample src/backend/utils/misc/postgresql.conf.sample
    index b8a1582..44b6ff3 100644
    --- src/backend/utils/misc/postgresql.conf.sample
    +++ src/backend/utils/misc/postgresql.conf.sample
    @@ -119,6 +119,17 @@
    #maintenance_work_mem = 16MB # min 1MB
    #max_stack_depth = 2MB # min 100kB

    +
    +# Buffer Cache Hibernation:
    +# Suspend/resume buffer cache data structure using hibernation files
    +# at shutdown/startup.
    +#buffer_cache_hibernation_level = 0 # Sets buffer cache hibernation level.
    + # 0 to disable(default),
    + # 1 for saving buffer descriptors only
    + # (recommended),
    + # 2 for saving buffer descriptors and
    + # buffer blocks(slower at shutdown).
    +
    # - Kernel Resource Usage -

    #max_files_per_process = 1000 # min 25
    diff --git src/include/access/xlog.h src/include/access/xlog.h
    index 7056fd6..7a9fb99 100644
    --- src/include/access/xlog.h
    +++ src/include/access/xlog.h
    @@ -13,6 +13,7 @@

    #include "access/rmgr.h"
    #include "access/xlogdefs.h"
    +#include "catalog/pg_control.h"
    #include "lib/stringinfo.h"
    #include "storage/buf.h"
    #include "utils/pg_crc.h"
    @@ -294,6 +295,7 @@ extern bool XLogInsertAllowed(void);
    extern void GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream);
    extern XLogRecPtr GetXLogReplayRecPtr(void);

    +extern bool GetControlFile(ControlFileData *controlFile);
    extern void UpdateControlFile(void);
    extern uint64 GetSystemIdentifier(void);
    extern Size XLOGShmemSize(void);
    diff --git src/include/storage/buf_internals.h src/include/storage/buf_internals.h
    index b7d4ea5..d537ef1 100644
    --- src/include/storage/buf_internals.h
    +++ src/include/storage/buf_internals.h
    @@ -167,6 +167,7 @@ typedef struct sbufdesc
    */
    #define LockBufHdr(bufHdr) SpinLockAcquire(&(bufHdr)->buf_hdr_lock)
    #define UnlockBufHdr(bufHdr) SpinLockRelease(&(bufHdr)->buf_hdr_lock)
    +#define IsUnlockBufHdr(bufHdr) SpinLockFree(&(bufHdr)->buf_hdr_lock)


    /* in buf_init.c */
    @@ -190,6 +191,7 @@ extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
    extern int StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
    extern Size StrategyShmemSize(void);
    extern void StrategyInitialize(bool init);
    +extern void AdjustStrategyControl(int oldNBuffers);

    /* buf_table.c */
    extern Size BufTableShmemSize(int size);
    diff --git src/include/storage/bufmgr.h src/include/storage/bufmgr.h
    index b8fc87e..ddfeb9d 100644
    --- src/include/storage/bufmgr.h
    +++ src/include/storage/bufmgr.h
    @@ -211,6 +211,20 @@ extern void BgBufferSync(void);

    extern void AtProcExit_LocalBuffers(void);

    +/* buffer cache hibernation support stuff */
    +extern int BufferCacheHibernationLevel;
    +
    +typedef enum BufferHibernationFileType
    +{
    + BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY,
    + BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS,
    + BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS
    +} BufferHibernationFileType;
    +
    +extern void ResisterBufferCacheHibernation(BufferHibernationFileType id,
    + char *ptr, Size record_length, Size num_records);
    +extern void ResumeBufferCacheHibernation(void);
    +
    /* in freelist.c */
    extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
    extern void FreeAccessStrategy(BufferAccessStrategy strategy);
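Each hibernation file is checksummed one record at a time with INIT_CRC32 / COMP_CRC32 / FIN_CRC32, and the resume path recomputes the checksum the same way before trusting the file. The sketch below shows why a record-at-a-time update gives the same result as a single pass over the whole buffer. It uses a plain bitwise CRC-32 (reflected polynomial 0xEDB88320) as an illustrative stand-in, not PostgreSQL's pg_crc code, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* start value, playing the role of INIT_CRC32 */
uint32_t crc32_init(void) { return 0xFFFFFFFFu; }

/* fold `len` bytes into the running CRC, playing the role of COMP_CRC32 */
uint32_t
crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc;
}

/* final inversion, playing the role of FIN_CRC32 */
uint32_t crc32_final(uint32_t crc) { return crc ^ 0xFFFFFFFFu; }

/* checksum `num_records` fixed-size records one at a time, as the patch does */
uint32_t
crc32_records(const void *base, size_t record_length, size_t num_records)
{
    const unsigned char *p = base;
    uint32_t crc = crc32_init();

    for (size_t i = 0; i < num_records; i++)
        crc = crc32_update(crc, p + i * record_length, record_length);
    return crc32_final(crc);
}
```

Because the running value carries over between updates, checksumming "123456789" as three 3-byte records yields the same value as one 9-byte pass: the standard CRC-32 check value 0xCBF43926.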

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Cédric Villemain at Oct 14, 2011 at 8:44 am
    2011/10/14 Bruce Momjian <bruce@momjian.us>:
    Should this be marked as TODO?
    I suppose TODO items *are* wanted, aren't they? So working on them should
    remove the pain of convincing people here to accept the feature.
    ---------------------------------------------------------------------------

    Mitsuru IWASAKI wrote:
    Hi,
    On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote:
    For 1, I've just finished my work.  The latest patch is available at:
    http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch
    Reminder here--we can't accept code based on it being published to a web
    page.  You'll need to e-mail it to the pgsql-hackers mailing list to be
    considered for the next PostgreSQL CommitFest, which is starting in a
    few weeks.  Code submitted to the mailing list is considered a release
    of it to the project under the PostgreSQL license, which we can't just
    assume for things when given only a URL to them.
    Sorry about that, but I had enough time to revise my patches this weekend.
    I attached the patches in this mail, and will update CommitFest page soon.
    Also, you suggested you were out of time to work on this.  If that's the
    case, we'd like to know that so we don't keep cc'ing you about things in
    expectation of an answer.  Someone else may pick this up as a project to
    continue working on.  But it's going to need a fair amount of revision
    before it matches what people want here, and I'm not sure how much of
    what you've written is going to end up in any commit that may happen
    from this idea.
    It seems that I don't have enough time to complete this work.
    You don't need to keep cc'ing me, and I would be very happy if postgres
    became the first DBMS to support a buffer cache hibernation feature.

    Thanks!


    diff --git src/backend/access/transam/xlog.c src/backend/access/transam/xlog.c
    index b0e4c41..7a3a207 100644
    --- src/backend/access/transam/xlog.c
    +++ src/backend/access/transam/xlog.c
    @@ -4834,6 +4834,19 @@ ReadControlFile(void)
    #endif
    }

    +bool
    +GetControlFile(ControlFileData *controlFile)
    +{
    +     if (ControlFile == NULL)
    +     {
    +             return false;
    +     }
    +
    +     memcpy(controlFile, ControlFile, sizeof(ControlFileData));
    +
    +     return true;
    +}
    +
    void
    UpdateControlFile(void)
    {
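GetControlFile() exists so that the hibernation code can consult the control file. The patch refuses to use the hibernation files unless the control file was obtained and reports a clean shutdown (DB_SHUTDOWNED). On resume, it also rejects any file whose mtime is older than the control file's time. The predicate below sketches those gates; `ModelDBState`, `ModelControlFile` and `hibernation_file_usable()` are hypothetical stand-ins for the two `ControlFileData` fields the patch reads.

```c
#include <stdbool.h>
#include <time.h>

/* hypothetical stand-ins for the ControlFileData facts the patch checks */
typedef enum { MODEL_DB_SHUTDOWNED, MODEL_DB_IN_PRODUCTION } ModelDBState;

typedef struct
{
    ModelDBState state;     /* like ControlFileData.state */
    time_t       time;      /* like ControlFileData.time */
} ModelControlFile;

/*
 * Hypothetical predicate: may a hibernation file last modified at
 * `file_mtime` be resumed?  `have_control` plays the role of the
 * GetControlFile() result.
 */
bool
hibernation_file_usable(bool have_control, const ModelControlFile *cf,
                        time_t file_mtime)
{
    if (!have_control)
        return false;           /* "could not get control file" */
    if (cf->state != MODEL_DB_SHUTDOWNED)
        return false;           /* "was not shut down normally" */
    if (file_mtime < cf->time)
        return false;           /* "older than control file" */
    return true;
}
```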
    diff --git src/backend/bootstrap/bootstrap.c src/backend/bootstrap/bootstrap.c
    index fc093cc..7ecf6bb 100644
    --- src/backend/bootstrap/bootstrap.c
    +++ src/backend/bootstrap/bootstrap.c
    @@ -360,6 +360,15 @@ AuxiliaryProcessMain(int argc, char *argv[])
    BaseInit();

    /*
    +      * Only StartupProcess can call ResumeBufferCacheHibernation() after
    +      * InitFileAccess() and smgrinit().
    +      */
    +     if (auxType == StartupProcess && BufferCacheHibernationLevel > 0)
    +     {
    +             ResumeBufferCacheHibernation();
    +     }
    +
    +     /*
    * When we are an auxiliary process, we aren't going to do the full
    * InitPostgres pushups, but there are a couple of things that need to get
    * lit up even in an auxiliary process.
    diff --git src/backend/storage/buffer/buf_init.c src/backend/storage/buffer/buf_init.c
    index dadb49d..52eb51a 100644
    --- src/backend/storage/buffer/buf_init.c
    +++ src/backend/storage/buffer/buf_init.c
    @@ -127,6 +127,14 @@ InitBufferPool(void)

    /* Init other shared buffer-management stuff */
    StrategyInitialize(!foundDescs);
    +
    +     if (BufferCacheHibernationLevel > 0)
    +     {
    +             ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS,
    +                     (char *)BufferDescriptors, sizeof(BufferDesc), NBuffers);
    +             ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS,
    +                     (char *)BufferBlocks, BLCKSZ, NBuffers);
    +     }
    }

    /*
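InitBufferPool() registers the descriptors as NBuffers records of sizeof(BufferDesc) and the blocks as NBuffers records of BLCKSZ. The suspend code writes exactly record_length * num_records bytes per file, so the shutdown cost of each level is simple arithmetic. The helper `hibernation_bytes_written()` below is hypothetical, ignores the few bytes of the CRC file, and takes the struct sizes as parameters because sizeof(BufferDesc) and sizeof(BufferStrategyControl) depend on the build.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes written into the hibernation files at
 * shutdown for a given buffer_cache_hibernation_level (CRC file excluded).
 */
uint64_t
hibernation_bytes_written(int level, uint64_t strategy_size,
                          uint64_t desc_size, uint64_t blcksz,
                          uint64_t nbuffers)
{
    uint64_t total = 0;

    if (level <= 0)
        return 0;                       /* feature disabled */

    total += strategy_size;             /* StrategyControl: one record */
    total += desc_size * nbuffers;      /* BufferDescriptors */
    if (level >= 2)
        total += blcksz * nbuffers;     /* BufferBlocks */
    return total;
}
```

With BLCKSZ = 8192 and 16384 buffers (128 MB of shared_buffers), level 2 adds 134217728 bytes of block data on top of the descriptor file. This is presumably the "slower at shutdown" cost mentioned in the GUC description. In the test values, 32 and 64 are placeholder struct sizes, not real ones.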
    diff --git src/backend/storage/buffer/bufmgr.c src/backend/storage/buffer/bufmgr.c
    index f96685d..dba8ebf 100644
    --- src/backend/storage/buffer/bufmgr.c
    +++ src/backend/storage/buffer/bufmgr.c
    @@ -31,6 +31,7 @@
    #include "postgres.h"

    #include <sys/file.h>
    +#include <sys/stat.h>
    #include <unistd.h>

    #include "catalog/catalog.h"
    @@ -61,6 +62,13 @@
    #define BUF_WRITTEN                          0x01
    #define BUF_REUSABLE                 0x02

    +/*
    + * Buffer Cache Hibernation stuff.
    + */
    +/* enable this to debug buffer cache hibernation. */
    +#if 0
    +#define DEBUG_BUFFER_CACHE_HIBERNATION
    +#endif

    /* GUC variables */
    bool         zero_damaged_pages = false;
    @@ -765,6 +773,16 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
    }
    }

    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    +                     elog(DEBUG5,
    +                             "alloc  [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    +                                     buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    +                                     buf->wait_backend_pid, buf->freeNext,
    +                                     newHash, newTag.rnode.spcNode,
    +                                     newTag.rnode.dbNode, newTag.rnode.relNode,
    +                                     newTag.forkNum, newTag.blockNum);
    +#endif
    +
    return buf;
    }

    @@ -800,6 +818,16 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
    * the old content is no longer relevant.  (The usage_count starts out at
    * 1 so that the buffer can survive one clock-sweep pass.)
    */
    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    +     elog(DEBUG5,
    +             "rename [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    +                     buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    +                     buf->wait_backend_pid, buf->freeNext,
    +                     oldHash, oldTag.rnode.spcNode,
    +                     oldTag.rnode.dbNode, oldTag.rnode.relNode,
    +                     oldTag.forkNum, oldTag.blockNum);
    +#endif
    +
    buf->tag = newTag;
    buf->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT);
    if (relpersistence == RELPERSISTENCE_PERMANENT)
    @@ -2772,3 +2800,716 @@ local_buffer_write_error_callback(void *arg)
    pfree(path);
    }
    }
    +
    +/* ----------------------------------------------------------------
    + *           Buffer Cache Hibernation support stuff
    + *
    + * Suspend/resume buffer cache data structure using hibernation files
    + * at shutdown/startup.
    + * ----------------------------------------------------------------
    + */
    +
    +int  BufferCacheHibernationLevel = 0;
    +
    +#define      BUFFER_CACHE_HIBERNATION_FILE_STRATEGY          "global/pg_buffer_cache_hibernation_strategy"
    +#define      BUFFER_CACHE_HIBERNATION_FILE_DESCRIPTORS       "global/pg_buffer_cache_hibernation_descriptors"
    +#define      BUFFER_CACHE_HIBERNATION_FILE_BLOCKS            "global/pg_buffer_cache_hibernation_blocks"
    +#define      BUFFER_CACHE_HIBERNATION_FILE_CRC32                     "global/pg_buffer_cache_hibernation_crc32"
    +
    +static struct
    +{
    +     char            *hibernation_file;
    +     char            *data_ptr;
    +     Size            record_length;
    +     Size            num_records;
    +     pg_crc32        crc;
    +} BufferCacheHibernationData[] =
    +{
    +     /* BufferStrategyControl */
    +     {
    +             BUFFER_CACHE_HIBERNATION_FILE_STRATEGY,
    +             NULL, 0, 0, 0
    +     },
    +
    +     /* BufferDescriptors */
    +     {
    +             BUFFER_CACHE_HIBERNATION_FILE_DESCRIPTORS,
    +             NULL, 0, 0, 0
    +     },
    +
    +     /* BufferBlocks */
    +     {
    +             BUFFER_CACHE_HIBERNATION_FILE_BLOCKS,
    +             NULL, 0, 0, 0
    +     },
    +
    +     /* End-of-list marker */
    +     {
    +             NULL,
    +             NULL, 0, 0, 0
    +     },
    +};
    +
    +static ControlFileData       controlFile;
    +static bool                          controlFileInitialized = false;
    +
    +/*
    + * AtProcExit_BufferCacheHibernation:
    + *           store the buffer cache into hibernation files at shutdown.
    + */
    +static void
    +AtProcExit_BufferCacheHibernation(int code, Datum arg)
    +{
    +     BufferHibernationFileType       id;
    +     int                                                     i;
    +     int                                                     fd;
    +
    +     if (BufferCacheHibernationLevel == 0)
    +     {
    +             return;
    +     }
    +
    +     /*
    +      * get the control file to validate the system state.
    +      */
    +     if (GetControlFile(&controlFile) == false)
    +     {
    +             elog(WARNING,
    +                     "could not get control file, "
    +                     "aborting buffer cache hibernation");
    +             return;
    +     }
    +
    +     if (controlFile.state != DB_SHUTDOWNED)
    +     {
    +             elog(WARNING,
    +                     "database system was not shut down normally, "
    +                     "aborting buffer cache hibernation");
    +             return;
    +     }
    +
    +     /*
    +      * suspend buffer cache data structure into hibernation files.
    +      */
    +     for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    +     {
    +             Size            record_length;
    +             Size            num_records;
    +             char            *ptr;
    +             pg_crc32        crc;
    +
    +             if (BufferCacheHibernationLevel < 2 &&
    +                     id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +             {
    +                     continue;
    +             }
    +
    +             if (BufferCacheHibernationData[id].data_ptr == NULL ||
    +                     BufferCacheHibernationData[id].record_length == 0 ||
    +                     BufferCacheHibernationData[id].num_records == 0)
    +             {
    +                     elog(WARNING,
    +                             "ResisterBufferCacheHibernation() was not called for %s",
    +                             BufferCacheHibernationData[id].hibernation_file);
    +                     goto cleanup;
    +             }
    +
    +             fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    +                             O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, S_IRUSR | S_IWUSR);
    +             if (fd < 0)
    +             {
    +                     elog(WARNING,
    +                             "could not open %s",
    +                             BufferCacheHibernationData[id].hibernation_file);
    +                     goto cleanup;
    +             }
    +
    +             record_length = BufferCacheHibernationData[id].record_length;
    +             num_records = BufferCacheHibernationData[id].num_records;
    +
    +             elog(NOTICE,
    +                     "buffer cache hibernate into %s",
    +                     BufferCacheHibernationData[id].hibernation_file);
    +
    +             INIT_CRC32(crc);
    +             for (i = 0; i < num_records; i++)
    +             {
    +                     ptr = BufferCacheHibernationData[id].data_ptr + (i * record_length);
    +                     if (write(fd, (void *)ptr, record_length) != record_length)
    +                     {
    +                             elog(WARNING,
    +                                     "could not write %s",
    +                                     BufferCacheHibernationData[id].hibernation_file);
    +                             goto cleanup;
    +                     }
    +
    +                     COMP_CRC32(crc, ptr, record_length);
    +             }
    +
    +             FIN_CRC32(crc);
    +             close(fd);
    +
    +             BufferCacheHibernationData[id].crc = crc;
    +     }
    +
    +     /*
    +      * save the computed crc values for the validations at resuming.
    +      */
    +     fd = BasicOpenFile(BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    +                     O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, S_IRUSR | S_IWUSR);
    +     if (fd < 0)
    +     {
    +             elog(WARNING,
    +                     "could not open %s",
    +                     BUFFER_CACHE_HIBERNATION_FILE_CRC32);
    +             goto cleanup;
    +     }
    +
    +     for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    +     {
    +             pg_crc32        crc;
    +
    +             if (BufferCacheHibernationLevel < 2 &&
    +                     id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +             {
    +                     continue;
    +             }
    +
    +             crc = BufferCacheHibernationData[id].crc;
    +             if (write(fd, (void *)&crc, sizeof(pg_crc32)) != sizeof(pg_crc32))
    +             {
    +                     elog(WARNING,
    +                             "could not write %s for %s",
    +                             BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    +                             BufferCacheHibernationData[id].hibernation_file);
    +                     goto cleanup;
    +             }
    +     }
    +     close(fd);
    +
    +     elog(NOTICE,
    +             "buffer cache suspended successfully");
    +
    +     return;
    +
    +cleanup:
    +     for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    +     {
    +             unlink(BufferCacheHibernationData[id].hibernation_file);
    +     }
    +
    +     return;
    +}
    +
    +/*
    + * ResisterBufferCacheHibernation:
    + *           register the buffer cache data structure info.
    + */
    +void
    +ResisterBufferCacheHibernation(BufferHibernationFileType id, char *ptr, Size record_length, Size num_records)
    +{
    +     static bool                                     first_time = true;
    +
    +     if (BufferCacheHibernationLevel == 0)
    +     {
    +             return;
    +     }
    +
    +     if (id != BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY &&
    +             id != BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS &&
    +             id != BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +     {
    +             return;
    +     }
    +
    +     if (first_time)
    +     {
    +             /*
    +              * AtProcExit_BufferCacheHibernation to be called at shutdown.
    +              */
    +             on_shmem_exit(AtProcExit_BufferCacheHibernation, 0);
    +             first_time = false;
    +     }
    +
    +     /*
    +      * get the control file to check the system state and
    +      * hibernation file validations.
    +      */
    +     if (controlFileInitialized == false)
    +     {
    +             if (GetControlFile(&controlFile) == true)
    +             {
    +                     controlFileInitialized = true;
    +             }
    +     }
    +
    +     BufferCacheHibernationData[id].data_ptr = ptr;
    +     BufferCacheHibernationData[id].record_length = record_length;
    +     BufferCacheHibernationData[id].num_records = num_records;
    +}
    +
    +/*
    + * ResumeBufferCacheHibernation:
    + *           resume the buffer cache from hibernation file at startup.
    + */
    +void
    +ResumeBufferCacheHibernation(void)
    +{
    +     BufferHibernationFileType       id;
    +     int                                                     i;
    +     int                                                     fd;
    +     Size                                            num_records;
    +     Size                                            record_length;
    +     char                                            *buf_common;
    +     int                                                     oldNBuffers;
    +     bool                                            buffer_block_processed;
    +
    +     if (BufferCacheHibernationLevel == 0)
    +     {
    +             return;
    +     }
    +
    +     buf_common = NULL;
    +     buffer_block_processed = false;
    +
    +     /*
    +      * lock all buffer descriptors to prevent other processes from
    +      * updating buffers.
    +      */
    +     for (i = 0; i < NBuffers; i++)
    +     {
    +             BufferDesc      *buf;
    +
    +             buf = &BufferDescriptors[i];
    +             LockBufHdr(buf);
    +     }
    +
    +     /*
    +      * get the control file to check the system state and
    +      * hibernation file validations.
    +      */
    +     if (controlFileInitialized == false)
    +     {
    +             elog(WARNING,
    +                     "could not get control file, "
    +                     "aborting buffer cache hibernation");
    +             goto cleanup;
    +     }
    +
    +     if (controlFile.state != DB_SHUTDOWNED)
    +     {
    +             elog(WARNING,
    +                     "database system was not shut down normally, "
    +                     "aborting buffer cache hibernation");
    +             goto cleanup;
    +     }
    +
    +     /*
    +      * read the crc values which were computed when the hibernation
    +      * files were created.
    +      */
    +     fd = BasicOpenFile(BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    +                     O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    +     if (fd < 0)
    +     {
    +             elog(WARNING,
    +                     "could not open %s",
    +                     BUFFER_CACHE_HIBERNATION_FILE_CRC32);
    +             goto cleanup;
    +     }
    +
    +     for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    +     {
    +             pg_crc32        crc;
    +
    +             if (BufferCacheHibernationLevel < 2 &&
    +                     id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +             {
    +                     continue;
    +             }
    +
    +             if (read(fd, (void *)&crc, sizeof(pg_crc32)) != sizeof(pg_crc32))
    +             {
    +                     if (BufferCacheHibernationLevel == 2 &&
    +                             id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +                     {
    +                             /*
    +                              * if buffer_cache_hibernation_level changes from 1 to 2,
    +                              * the crc value of buffer block hibernation file may not exist.
    +                              * just ignore it here.
    +                              */
    +                             continue;
    +                     }
    +
    +                     elog(WARNING,
    +                             "could not read %s for %s",
    +                             BUFFER_CACHE_HIBERNATION_FILE_CRC32,
    +                             BufferCacheHibernationData[id].hibernation_file);
    +                     close(fd);
    +                     goto cleanup;
    +             }
    +             BufferCacheHibernationData[id].crc = crc;
    +     }
    +
    +     close(fd);
    +
    +     /*
    +      * allocate a buffer to read the contents of the hibernation files
    +      * for validation.
    +      */
    +     record_length = 0;
    +     for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    +     {
    +             if (record_length < BufferCacheHibernationData[id].record_length)
    +             {
    +                     record_length = BufferCacheHibernationData[id].record_length;
    +             }
    +     }
    +
    +     buf_common = malloc(record_length);
    +     Assert(buf_common != NULL);
    +
    +     /* assume that the number of buffers has not changed. */
    +     oldNBuffers = NBuffers;
    +
    +     /*
    +      * check if all hibernation files are valid.
    +      */
    +     for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    +     {
    +             struct stat     sb;
    +             pg_crc32        crc;
    +
    +             if (BufferCacheHibernationLevel < 2 &&
    +                     id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +             {
    +                     continue;
    +             }
    +
    +             if (BufferCacheHibernationData[id].data_ptr == NULL ||
    +                     BufferCacheHibernationData[id].record_length == 0 ||
    +                     BufferCacheHibernationData[id].num_records == 0)
    +             {
    +                     elog(WARNING,
    +                             "ResisterBufferCacheHibernation() was not called for %s",
    +                             BufferCacheHibernationData[id].hibernation_file);
    +                     goto cleanup;
    +             }
    +
    +             fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    +                             O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    +             if (fd < 0)
    +             {
    +                     if (BufferCacheHibernationLevel == 2 &&
    +                             id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +                     {
    +                             /*
    +                              * if buffer_cache_hibernation_level changes from 1 to 2,
    +                              * the buffer block hibernation file may not exist.
    +                              * just ignore it here.
    +                              */
    +                             continue;
    +                     }
    +
    +                     goto cleanup;
    +             }
    +
    +             if (fstat(fd, &sb) < 0)
    +             {
    +                     elog(WARNING,
    +                             "could not get stats of the buffer cache hibernation file: %s",
    +                             BufferCacheHibernationData[id].hibernation_file);
    +                     close(fd);
    +                     goto cleanup;
    +             }
    +
    +             record_length = BufferCacheHibernationData[id].record_length;
    +             num_records = BufferCacheHibernationData[id].num_records;
    +
    +             if (sb.st_size != (record_length * num_records))
    +             {
    +                     /* The size of StrategyControl should always be the same. */
    +                     if (id == BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY ||
    +                             (sb.st_size % record_length) > 0)
    +                     {
    +                             elog(WARNING,
    +                                     "size mismatch on the buffer cache hibernation file: %s",
    +                                     BufferCacheHibernationData[id].hibernation_file);
    +                             close(fd);
    +                             goto cleanup;
    +                     }
    +
    +                     /*
    +                      * The number of records of buffer descriptors and blocks
    +                      * should be the same.
    +                      */
    +                     if (oldNBuffers != NBuffers &&
    +                             oldNBuffers != (sb.st_size / record_length))
    +                     {
    +                             elog(WARNING,
    +                                     "size mismatch on the buffer cache hibernation file: %s",
    +                                     BufferCacheHibernationData[id].hibernation_file);
    +                             close(fd);
    +                             goto cleanup;
    +                     }
    +
    +                     oldNBuffers = sb.st_size / record_length;
    +
    +                     elog(NOTICE,
    +                             "shared_buffers has changed from %d to %d: %s",
    +                             oldNBuffers, NBuffers,
    +                             BufferCacheHibernationData[id].hibernation_file);
    +
    +                     /* use the original size to compute the CRC of the hibernation file. */
    +                     num_records = oldNBuffers;
    +             }
    +
    +             if ((pg_time_t)sb.st_mtime < controlFile.time)
    +             {
    +                     elog(WARNING,
    +                             "the hibernation file is older than control file: %s",
    +                             BufferCacheHibernationData[id].hibernation_file);
    +                     close(fd);
    +                     goto cleanup;
    +             }
    +
    +             INIT_CRC32(crc);
    +             for (i = 0; i < num_records; i++)
    +             {
    +                     if (read(fd, (void *)buf_common, record_length) != record_length)
    +                     {
    +                             elog(WARNING,
    +                                     "could not read the buffer cache hibernation file: %s",
    +                                     BufferCacheHibernationData[id].hibernation_file);
    +                             close(fd);
    +                             goto cleanup;
    +                     }
    +
    +                     COMP_CRC32(crc, buf_common, record_length);
    +
    +                     /*
    +                      * validate the buffer descriptors.
    +                      */
    +                     if (id == BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS)
    +                     {
    +                             BufferDesc      *buf;
    +                             BufFlags        abnormal_flags;
    +
    +                             if (i >= NBuffers)
    +                             {
    +                                     continue;
    +                             }
    +
    +                             abnormal_flags = (BM_DIRTY | BM_IO_IN_PROGRESS | BM_IO_ERROR |
    +                                                               BM_JUST_DIRTIED | BM_PIN_COUNT_WAITER);
    +
    +                             buf = (BufferDesc *)buf_common;
    +
    +                             if (buf->flags & abnormal_flags)
    +                             {
    +                                     elog(WARNING,
    +                                             "abnormal flags in buffer descriptors: %d",
    +                                             buf->flags);
    +                                     close(fd);
    +                                     goto cleanup;
    +                             }
    +
    +                             if (buf->usage_count > BM_MAX_USAGE_COUNT)
    +                             {
    +                                     elog(WARNING,
    +                                             "invalid usage count in buffer descriptors: %d",
    +                                             buf->usage_count);
    +                                     close(fd);
    +                                     goto cleanup;
    +                             }
    +
    +                             if (buf->buf_id < 0 || buf->buf_id >= num_records)
    +                             {
    +                                     elog(WARNING,
    +                                             "invalid buffer id in buffer descriptors: %d",
    +                                             buf->buf_id);
    +                                     close(fd);
    +                                     goto cleanup;
    +                             }
    +                     }
    +             }
    +
    +             FIN_CRC32(crc);
    +             close(fd);
    +
    +             if (!EQ_CRC32(BufferCacheHibernationData[id].crc, crc))
    +             {
    +                     elog(WARNING,
    +                             "crc mismatch on the buffer cache hibernation file: %s",
    +                             BufferCacheHibernationData[id].hibernation_file);
    +                     goto cleanup;
    +             }
    +     }
    +
    +     /*
    +      * resume the buffer cache data structure from the hibernation files.
    +      */
    +     for (id = 0; BufferCacheHibernationData[id].hibernation_file != NULL; id++)
    +     {
    +             int                     fd;
    +             char            *ptr;
    +
    +             if (BufferCacheHibernationLevel < 2 &&
    +                     id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +             {
    +                     continue;
    +             }
    +
    +             record_length = BufferCacheHibernationData[id].record_length;
    +             num_records = BufferCacheHibernationData[id].num_records;
    +
    +             if (id != BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY)
    +             {
    +                     /* use the smaller number of buffers. */
    +                     num_records = (oldNBuffers < NBuffers)? oldNBuffers : NBuffers;
    +             }
    +
    +             fd = BasicOpenFile(BufferCacheHibernationData[id].hibernation_file,
    +                             O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
    +             if (fd < 0)
    +             {
    +                     if (BufferCacheHibernationLevel == 2 &&
    +                             id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +                     {
    +                             /*
    +                              * if buffer_cache_hibernation_level changes from 1 to 2,
    +                              * the buffer block hibernation file may not exist.
    +                              * just ignore it here.
    +                              */
    +                             continue;
    +                     }
    +
    +                     goto cleanup;
    +             }
    +
    +             elog(NOTICE,
    +                     "resuming buffer cache from %s (%d bytes * %d records)",
    +                     BufferCacheHibernationData[id].hibernation_file,
    +                     record_length, num_records);
    +
    +             for (i = 0; i < num_records; i++)
    +             {
    +                     ptr = BufferCacheHibernationData[id].data_ptr + (i * record_length);
    +                     read(fd, (void *)ptr, record_length);
    +
    +                     /* Re-lock the buffer descriptor if necessary. */
    +                     if (id == BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS)
    +                     {
    +                             BufferDesc      *buf;
    +
    +                             buf = (BufferDesc *)ptr;
    +                             if (IsUnlockBufHdr(buf))
    +                             {
    +                                     LockBufHdr(buf);
    +                             }
    +                     }
    +             }
    +
    +             close(fd);
    +
    +             if (id == BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS)
    +             {
    +                     buffer_block_processed = true;
    +             }
    +     }
    +
    +     if (buffer_block_processed == false)
    +     {
    +             /* we didn't use the buffer block hibernation file, so delete it now. */
    +             id = BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS;
    +             unlink(BufferCacheHibernationData[id].hibernation_file);
    +     }
    +
    +     /*
    +      * set up the remaining data structures (e.g. the lookup hashtable)
    +      * based on the buffer descriptors.
    +      */
    +     num_records = (oldNBuffers < NBuffers)? oldNBuffers : NBuffers;
    +     for (i = 0; i < num_records; i++)
    +     {
    +             BufferDesc              *buf;
    +             BufferTag               newTag;
    +             uint32                  newHash;
    +             int                             buf_id;
    +
    +             buf = &BufferDescriptors[i];
    +             if (buf->tag.rnode.spcNode      == InvalidOid &&
    +                     buf->tag.rnode.dbNode   == InvalidOid &&
    +                     buf->tag.rnode.relNode  == InvalidOid)
    +             {
    +                     continue;
    +             }
    +
    +             INIT_BUFFERTAG(newTag, buf->tag.rnode, buf->tag.forkNum, buf->tag.blockNum);
    +             newHash = BufTableHashCode(&newTag);
    +
    +             if (buffer_block_processed == false)
    +             {
    +                     Block                   bufBlock;
    +                     SMgrRelation    smgr;
    +
    +                     /*
    +                      * re-read buffer block.
    +                      */
    +                     bufBlock = BufHdrGetBlock(buf);
    +                     smgr = smgropen(buf->tag.rnode, InvalidBackendId);
    +                     smgrread(smgr, newTag.forkNum, newTag.blockNum, (char *) bufBlock);
    +             }
    +
    +             buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
    +             if (buf_id != -1)
    +             {
    +                     /* the entry exists already, return it to the freelist. */
    +                     buf->refcount = 0;
    +                     buf->flags = 0;
    +                     InvalidateBuffer(buf);
    +                     continue;
    +             }
    +
    +             /* clear wait_backend_pid because that process has already terminated. */
    +             buf->wait_backend_pid = 0;
    +
    +#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
    +             elog(DEBUG5,
    +                     "resume [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
    +                             buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
    +                             buf->wait_backend_pid, buf->freeNext,
    +                             newHash, newTag.rnode.spcNode,
    +                             newTag.rnode.dbNode, newTag.rnode.relNode,
    +                             newTag.forkNum, newTag.blockNum);
    +#endif
    +     }
    +
    +     /*
    +      * adjust StrategyControl based on the change of shared_buffers.
    +      */
    +     if (oldNBuffers != NBuffers)
    +     {
    +             AdjustStrategyControl(oldNBuffers);
    +     }
    +
    +     elog(NOTICE,
    +             "buffer cache resumed successfully");
    +
    +cleanup:
    +     for (i = 0; i < NBuffers; i++)
    +     {
    +             BufferDesc      *buf;
    +
    +             buf = &BufferDescriptors[i];
    +             UnlockBufHdr(buf);
    +     }
    +
    +     if (buf_common != NULL)
    +     {
    +             free(buf_common);
    +     }
    +
    +     return;
    +}
    diff --git src/backend/storage/buffer/freelist.c src/backend/storage/buffer/freelist.c
    index bf9903b..ffc101d 100644
    --- src/backend/storage/buffer/freelist.c
    +++ src/backend/storage/buffer/freelist.c
    @@ -347,6 +347,12 @@ StrategyInitialize(bool init)
    }
    else
    Assert(!init);
    +
    +     if (BufferCacheHibernationLevel > 0)
    +     {
    +             ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY,
    +                     (char *)StrategyControl, sizeof(BufferStrategyControl), 1);
    +     }
    }


    @@ -521,3 +527,47 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, volatile BufferDesc *buf)

    return true;
    }
    +
    +/*
    + * AdjustStrategyControl -- adjust the member variables of StrategyControl
    + *
    + * If the shared_buffers setting has changed, the restored StrategyControl
    + * needs to be adjusted, whether shared_buffers shrank or grew.
    + * This is called only from bufmgr.c:ResumeBufferCacheHibernation().
    + */
    +void
    +AdjustStrategyControl(int oldNBuffers)
    +{
    +     if (oldNBuffers == NBuffers)
    +     {
    +             return;
    +     }
    +
    +     /* point the free-list tail at the last buffer under the current NBuffers. */
    +     StrategyControl->lastFreeBuffer = NBuffers - 1;
    +
    +     /* shared_buffers shrunk. */
    +     if (oldNBuffers > NBuffers)
    +     {
    +             if (StrategyControl->nextVictimBuffer >= NBuffers)
    +             {
    +                     /* set the tail of buffers. */
    +                     StrategyControl->nextVictimBuffer = NBuffers - 1;
    +             }
    +
    +             if (StrategyControl->firstFreeBuffer >= NBuffers)
    +             {
    +                     /* set FREENEXT_END_OF_LIST(-1). */
    +                     StrategyControl->firstFreeBuffer = FREENEXT_END_OF_LIST;
    +             }
    +     }
    +     else
    +     /* shared_buffers enlarged. */
    +     {
    +             if (StrategyControl->firstFreeBuffer < 0)
    +             {
    +                     /* start the free list at the first newly added buffer. */
    +                     StrategyControl->firstFreeBuffer = oldNBuffers;
    +             }
    +     }
    +}
    diff --git src/backend/utils/misc/guc.c src/backend/utils/misc/guc.c
    index 738e215..5affc6e 100644
    --- src/backend/utils/misc/guc.c
    +++ src/backend/utils/misc/guc.c
    @@ -2361,6 +2361,18 @@ static struct config_int ConfigureNamesInt[] =
    NULL, NULL, NULL
    },

    +     {
    +             {"buffer_cache_hibernation_level", PGC_POSTMASTER, UNGROUPED,
    +                     gettext_noop("Sets buffer cache hibernation level."),
    +                     gettext_noop("0 to disable (default), "
    +                                              "1 for saving buffer descriptors only (recommended), "
    +                                              "2 for saving buffer descriptors and buffer blocks (slower at shutdown).")
    +             },
    +             &BufferCacheHibernationLevel,
    +             0, 0, 2,
    +             NULL, NULL, NULL
    +     },
    +
    /* End-of-list marker */
    {
    {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
    diff --git src/backend/utils/misc/postgresql.conf.sample src/backend/utils/misc/postgresql.conf.sample
    index b8a1582..44b6ff3 100644
    --- src/backend/utils/misc/postgresql.conf.sample
    +++ src/backend/utils/misc/postgresql.conf.sample
    @@ -119,6 +119,17 @@
    #maintenance_work_mem = 16MB         # min 1MB
    #max_stack_depth = 2MB                       # min 100kB

    +
    +# Buffer Cache Hibernation:
    +#  Suspend/resume buffer cache data structures using hibernation files
    +#  at shutdown/startup.
    +#buffer_cache_hibernation_level = 0  # Sets buffer cache hibernation level.
    +                                     # 0 to disable (default),
    +                                     # 1 for saving buffer descriptors only
    +                                     #   (recommended),
    +                                     # 2 for saving buffer descriptors and
    +                                     #   buffer blocks (slower at shutdown).
    +
    # - Kernel Resource Usage -

    #max_files_per_process = 1000                # min 25
    diff --git src/include/access/xlog.h src/include/access/xlog.h
    index 7056fd6..7a9fb99 100644
    --- src/include/access/xlog.h
    +++ src/include/access/xlog.h
    @@ -13,6 +13,7 @@

    #include "access/rmgr.h"
    #include "access/xlogdefs.h"
    +#include "catalog/pg_control.h"
    #include "lib/stringinfo.h"
    #include "storage/buf.h"
    #include "utils/pg_crc.h"
    @@ -294,6 +295,7 @@ extern bool XLogInsertAllowed(void);
    extern void GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream);
    extern XLogRecPtr GetXLogReplayRecPtr(void);

    +extern bool GetControlFile(ControlFileData *controlFile);
    extern void UpdateControlFile(void);
    extern uint64 GetSystemIdentifier(void);
    extern Size XLOGShmemSize(void);
    diff --git src/include/storage/buf_internals.h src/include/storage/buf_internals.h
    index b7d4ea5..d537ef1 100644
    --- src/include/storage/buf_internals.h
    +++ src/include/storage/buf_internals.h
    @@ -167,6 +167,7 @@ typedef struct sbufdesc
    */
    #define LockBufHdr(bufHdr)           SpinLockAcquire(&(bufHdr)->buf_hdr_lock)
    #define UnlockBufHdr(bufHdr) SpinLockRelease(&(bufHdr)->buf_hdr_lock)
    +#define IsUnlockBufHdr(bufHdr)       SpinLockFree(&(bufHdr)->buf_hdr_lock)


    /* in buf_init.c */
    @@ -190,6 +191,7 @@ extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
    extern int   StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
    extern Size StrategyShmemSize(void);
    extern void StrategyInitialize(bool init);
    +extern void AdjustStrategyControl(int oldNBuffers);

    /* buf_table.c */
    extern Size BufTableShmemSize(int size);
    diff --git src/include/storage/bufmgr.h src/include/storage/bufmgr.h
    index b8fc87e..ddfeb9d 100644
    --- src/include/storage/bufmgr.h
    +++ src/include/storage/bufmgr.h
    @@ -211,6 +211,20 @@ extern void BgBufferSync(void);

    extern void AtProcExit_LocalBuffers(void);

    +/* buffer cache hibernation support stuff */
    +extern int   BufferCacheHibernationLevel;
    +
    +typedef enum BufferHibernationFileType
    +{
    +    BUFFER_CACHE_HIBERNATION_TYPE_STRATEGY,
    +    BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS,
    +    BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS
    +} BufferHibernationFileType;
    +
    +extern void ResisterBufferCacheHibernation(BufferHibernationFileType id,
    +                             char *ptr, Size record_length, Size num_records);
    +extern void ResumeBufferCacheHibernation(void);
    +
    /* in freelist.c */
    extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
    extern void FreeAccessStrategy(BufferAccessStrategy strategy);

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
    --
    Bruce Momjian  <bruce@momjian.us>        http://momjian.us
    EnterpriseDB                             http://enterprisedb.com

    + It's impossible for everything to be true. +



    --
    Cédric Villemain +33 (0)6 20 30 22 52
    http://2ndQuadrant.fr/
    PostgreSQL: Support 24x7 - Développement, Expertise et Formation
  • Heikki Linnakangas at Oct 14, 2011 at 11:31 am

    On 14.10.2011 11:44, Cédric Villemain wrote:
    2011/10/14 Bruce Momjian<bruce@momjian.us>:
    Should this be marked as TODO?
    I suppose TODO items *are* wanted, and so working on them should remove
    the pain of convincing people here to accept the feature, shouldn't it?
    I don't think this is worthwhile to have in the backend. Someone could
    write it as an extension on pgfoundry, but I don't think that belongs on
    the TODO.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Oct 14, 2011 at 2:41 pm

    Cédric Villemain <cedric.villemain.debian@gmail.com> writes:
    2011/10/14 Bruce Momjian <bruce@momjian.us>:
    Should this be marked as TODO?
    I suppose TODO items *are* wanted, and so working on them should remove
    the pain of convincing people here to accept the feature, shouldn't it?
    There is plenty of stuff in the TODO list for which there is no
    consensus.

    regards, tom lane
  • Bruce Momjian at Oct 14, 2011 at 2:56 pm

    Tom Lane wrote:
    Cédric Villemain <cedric.villemain.debian@gmail.com> writes:
    2011/10/14 Bruce Momjian <bruce@momjian.us>:
    Should this be marked as TODO?
    I suppose TODO items *are* wanted, and so working on them should remove
    the pain of convincing people here to accept the feature, shouldn't it?
    There is plenty of stuff in the TODO list for which there is no
    consensus.
    Uh, we should probably remove those then. Can you think of any?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Alvaro Herrera at Oct 14, 2011 at 3:10 pm

    Excerpts from Bruce Momjian's message of Fri Oct 14 11:56:22 -0300 2011:
    Tom Lane wrote:
    Cédric Villemain <cedric.villemain.debian@gmail.com> writes:
    2011/10/14 Bruce Momjian <bruce@momjian.us>:
    Should this be marked as TODO?
    I suppose TODO items *are* wanted, and so working on them should remove
    the pain of convincing people here to accept the feature, shouldn't it?
    There is plenty of stuff in the TODO list for which there is no
    consensus.
    Uh, we should probably remove those then. Can you think of any?
    The guideline, last I checked, was that before getting into coding any
    item from the TODO list, the prospective hacker should check previous
    discussions and initiate a new one on this list to ensure consensus.
    Unless something is blatantly "not wanted", I don't think it should be
    removed from the TODO list. There not being consensus does not mean
    that there cannot ever be.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Bruce Momjian at Oct 14, 2011 at 3:12 pm

    Alvaro Herrera wrote:

    Excerpts from Bruce Momjian's message of Fri Oct 14 11:56:22 -0300 2011:
    Tom Lane wrote:
    Cédric Villemain <cedric.villemain.debian@gmail.com> writes:
    2011/10/14 Bruce Momjian <bruce@momjian.us>:
    Should this be marked as TODO?
    I suppose TODO items *are* wanted, and so working on them should remove
    the pain of convincing people here to accept the feature, shouldn't it?
    There is plenty of stuff in the TODO list for which there is no
    consensus.
    Uh, we should probably remove those then. Can you think of any?
    The guideline, last I checked, was that before getting into coding any
    item from the TODO list, the prospective hacker should check previous
    discussions and initiate a new one on this list to ensure consensus.
    Unless something is blatantly "not wanted", I don't think it should be
    removed from the TODO list. There not being consensus does not mean
    that there cannot ever be.
    OK. But if we are pretty sure we don't want something, e.g. hibernate,
    we shouldn't add it.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Robert Haas at Oct 14, 2011 at 3:18 pm

    On Fri, Oct 14, 2011 at 11:12 AM, Bruce Momjian wrote:
    OK.  But if we are pretty sure we don't want something, e.g. hibernate,
    we shouldn't add it.
    Fair enough, but I'm not even slightly sure that we don't want that.
    I think having prewarming utilities available as contrib modules or on
    PGXN would be useful, but integrating something into the backend would
    allow it to be far more automated.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Oct 14, 2011 at 3:29 pm

    Robert Haas writes:
    On Fri, Oct 14, 2011 at 11:12 AM, Bruce Momjian wrote:
    OK.  But if we are pretty sure we don't want something, e.g. hibernate,
    we shouldn't add it.
    Fair enough, but I'm not even slightly sure that we don't want that.
    I think having prewarming utilities available as contrib modules or on
    PGXN would be useful, but integrating something into the backend would
    allow it to be far more automated.
    Right. I think this one falls into my class #2, ie, we have no idea how
    to implement it usefully. Doesn't (necessarily) mean that the core
    concept is without merit.

    regards, tom lane
  • Greg Stark at Oct 16, 2011 at 5:58 pm

    On Fri, Oct 14, 2011 at 4:29 PM, Tom Lane wrote:
    Right.  I think this one falls into my class #2, ie, we have no idea how
    to implement it usefully.  Doesn't (necessarily) mean that the core
    concept is without merit.
    Hm. Given that we have an implementation, I wouldn't say we have *no*
    clue. But there are certainly some parts we don't have consensus on
    yet. But then working code sometimes trumps a lack of absolute
    consensus.

    But just for the sake of argument I'm not sure that the implementation
    of dumping the current contents of the buffer cache is actually
    optimal. It doesn't handle resizing the buffer cache after a restart
    for example which I think would be a significant case. There could be
    other buffer cache algorithm parameters users might change -- though I
    don't think we really have any currently.

    If we had --to take it to an extreme-- a record of every buffer
    request prior to the shutdown then we could replay that log virtually
    with the new buffer cache size and know what buffers the new buffer
    cache size would have had in it.

    I'm not sure if there's any way to gather that data efficiently; if we
    could, whether there's any way to bound the amount of data we would
    have to retain to anything less than nigh-infinite volumes; and if we
    could, whether there's any way to limit what has to be replayed on restart.
    But my point is that there may be other more general options than
    snapshotting the actual buffer cache of the system shutting down.

    --
    greg
  • Tom Lane at Oct 16, 2011 at 6:12 pm

    Greg Stark writes:
    On Fri, Oct 14, 2011 at 4:29 PM, Tom Lane wrote:
    Right.  I think this one falls into my class #2, ie, we have no idea how
    to implement it usefully.  Doesn't (necessarily) mean that the core
    concept is without merit.
    Hm. Given that we have an implementation, I wouldn't say we have *no*
    clue. But there are certainly some parts we don't have consensus on
    yet. But then working code sometimes trumps a lack of absolute
    consensus.
    In this context "working" means "shows a significant performance
    benefit", and IIRC we don't have a demonstration of that. Anyway this
    was all discussed back in May.

    regards, tom lane
  • Alvaro Herrera at Oct 14, 2011 at 3:29 pm

    Excerpts from Bruce Momjian's message of Fri Oct 14 12:12:22 -0300 2011:

    Alvaro Herrera wrote:
    The guideline, last I checked, was that before getting into coding any
    item from the TODO list, the prospective hacker should check previous
    discussions and initiate a new one on this list to ensure consensus.
    Unless something is blatantly "not wanted", I don't think it should be
    removed from the TODO list. There not being consensus does not mean
    that there cannot ever be.
    OK. But if we are pretty sure we don't want something, e.g. hibernate,
    we shouldn't add it.
    If we're so sure we don't want it, we could add it to the "features we
    do not want" section. But as Robert says downthread, I don't see us
    being so sure that we don't want hibernation.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Bruce Momjian at Oct 14, 2011 at 3:35 pm

    Alvaro Herrera wrote:

    Excerpts from Bruce Momjian's message of Fri Oct 14 12:12:22 -0300 2011:
    Alvaro Herrera wrote:
    The guideline, last I checked, was that before getting into coding any
    item from the TODO list, the prospective hacker should check previous
    discussions and initiate a new one on this list to ensure consensus.
    Unless something is blatantly "not wanted", I don't think it should be
    removed from the TODO list. There not being consensus does not mean
    that there cannot ever be.
    OK. But if we are pretty sure we don't want something, e.g. hibernate,
    we shouldn't add it.
    If we're so sure we don't want it, we could add it to the "features we
    do not want" section.
    Those are for features that people often ask for, and we don't want. I
    am sure there are a lot of things we don't want.
    But as Robert says downthread, I don't see us
    being so sure that we don't want hibernation.
    So, add it?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Tom Lane at Oct 14, 2011 at 3:22 pm

    Alvaro Herrera writes:
    Excerpts from Bruce Momjian's message of Fri Oct 14 11:56:22 -0300 2011:
    Tom Lane wrote:
    There is plenty of stuff in the TODO list for which there is no
    consensus.
    Uh, we should probably remove those then. Can you think of any?
    Unless something is blatantly "not wanted", I don't think it should be
    removed from the TODO list. There not being consensus does not mean
    that there cannot ever be.
    Yeah. The reason why something is on the TODO list (and not already
    done) is typically one of

    1. It's too hard, or too long/boring for the expected value.
    2. There's no consensus about how to implement the feature.
    3. There's no consensus about the user-visible design of the feature.

    Cases where there's debate about whether we want it at all seem to me
    to be a subset of #3. But for anything in #3, someone could do the
    legwork or have the bright idea needed to create consensus about how
    to design the feature.

    My gripe about the TODO list is not that we have some stuff in there
    that's not clearly wanted, it's that some of the entries fail to make
    it clear where the issue stands on this scale. That could lead people
    to waste time trying to code something that there's not consensus for
    the design or implementation of.

    regards, tom lane
  • Tatsuo Ishii at Jun 1, 2011 at 7:03 am

    Yeah, I'm pretty well convinced this whole approach is a dead end.
    Priming the OS buffer cache seems way more useful. I also think
    saving the blocks to be read rather than the actual blocks makes a lot
    more sense.
    Well, his proposal works on any platform PostgreSQL supports. On the
    other hand, PgFincore works on Linux only. Who wants a Linux-only tool
    in core?

    Also I really want to see the performance comparison between these two
    approaches in the real world database.
    --
    Tatsuo Ishii
    SRA OSS, Inc. Japan
    English: http://www.sraoss.co.jp/index_en.php
    Japanese: http://www.sraoss.co.jp
  • Cédric Villemain at Jun 1, 2011 at 8:57 am

    2011/6/1 Tatsuo Ishii <ishii@postgresql.org>:
    Yeah, I'm pretty well convinced this whole approach is a dead end.
    Priming the OS buffer cache seems way more useful.  I also think
    saving the blocks to be read rather than the actual blocks makes a lot
    more sense.
    Well, his proposal works on any platform PostgreSQL supports. On the
    other hand, PgFincore works on Linux only. Who wants a Linux-only tool
    in core?
    I don't want to pit the features against each other here. Just for
    completeness: a PgFincore 'snapshot' is possible on any platform
    supporting mincore() (most do; for Windows, alternatives exist). For
    restoring, it can be a ReadBuffer for the PostgreSQL cache; for the OS
    cache it can be an open(), read(X), read(Y), close() *or* a
    posix_fadvise(), which can be less destructive (I only did it via
    posix_fadvise, but nothing prevents changing that when POSIX support
    is not present).
    And we already have a Linux-only feature in core, fortunately, because
    it is a useful feature, and I would really like to add more
    posix_fadvise calls (*this* will really help read and cache strategy
    more than any hack we can do to try to work around kernel decisions).
    Note that BSD developers can change that and make posix_fadvise work:
    it has been sitting in their TODO list for some years now.
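    A minimal sketch of the two OS-cache restore paths mentioned above, in
    Python for brevity: hint the kernel with POSIX_FADV_WILLNEED where
    posix_fadvise() is available, otherwise read the range and discard it.
    The function name and the 8192-byte block size (PostgreSQL's default
    BLCKSZ) are assumptions. This only warms the OS page cache; getting
    pages into shared_buffers would need ReadBuffer inside the server.

```python
import os

BLCKSZ = 8192  # assumption: PostgreSQL's default block size


def prewarm_range(path: str, first_block: int, nblocks: int) -> str:
    """Ask the OS to cache blocks [first_block, first_block + nblocks) of
    `path`. Returns "fadvise" or "read" depending on the method used."""
    if nblocks <= 0:
        # For posix_fadvise a length of 0 means "to end of file", so refuse it.
        raise ValueError("nblocks must be positive")
    offset = first_block * BLCKSZ
    length = nblocks * BLCKSZ
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Non-blocking hint; the kernel may start readahead in the background.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
            return "fadvise"
        # Fallback (e.g. platforms without posix_fadvise): read and discard,
        # which pulls the range into the OS cache but blocks while doing so.
        os.lseek(fd, offset, os.SEEK_SET)
        remaining = length
        while remaining > 0:
            chunk = os.read(fd, min(remaining, 1 << 20))
            if not chunk:  # reached end of file
                break
            remaining -= len(chunk)
        return "read"
    finally:
        os.close(fd)
```

    The fadvise path returns immediately, which is why it can be less
    disruptive than a synchronous read at startup.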

    Anyway we need this patch on-list to go ahead.
    Also I really want to see the performance comparison between these two
    approaches in the real world database.

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers


    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Greg Smith at Jun 1, 2011 at 10:05 am

    On 06/01/2011 03:03 AM, Tatsuo Ishii wrote:
    Also I really want to see the performance comparison between these two
    approaches in the real world database.
    Well, tell me how big of a performance improvement you want PgFincore to
    win by, and I'll construct a benchmark where it does that. If you pick
    a database size that fits in the OS cache, but is bigger than
    shared_buffers, the difference between the approaches is huge. The
    opposite--trying to find a case where this hibernation approach wins--is
    extremely hard to do.

    Anyway, further discussion of this patch is kind of a waste right now.
    We've never gotten the patch actually sent to the list to establish a
    proper contribution (just pointers to a web page), and no feedback on
    that or other suggestions for redesign (extension repackaging, GUC
    renaming, removing unused code, and a few more). Unless the author
    shows up again in the next two weeks, this is getting bounced back with
    no review as code we can't use.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
  • Robert Haas at May 15, 2011 at 6:19 pm

    On Fri, May 6, 2011 at 5:31 PM, Greg Smith wrote:
    I think that all the complexity with CRCs etc. is unlikely to lead anywhere
    too, and those two issues are not completely unrelated.  The simplest,
    safest thing here is the right way to approach this, not the most
    complicated one, and a simpler format might add some flexibility here to
    reload more cache state too.  The bottleneck on reloading the cache state is
    reading everything from disk.  Trying to micro-optimize any other part of
    that is moving in the wrong direction to me.  I doubt you'll ever measure a
    useful benefit that overcomes the expense of maintaining the code.  And you
    seem to be moving to where someone can't restore cache state when they
    change shared_buffers.  A simpler implementation might still work in that
    situation; reload until you run out of buffers if shared_buffers shrinks,
    reload until you're done with the original size.
    I don't think there's any need for this to get data into
    shared_buffers at all. Getting it into the OS cache oughta be plenty
    sufficient, no?

    ISTM that a very simple approach here would be to save the contents of
    each shared buffer on clean shutdown, and to POSIX_FADV_WILLNEED those
    buffers on startup. We could worry about additional complexity, like
    using fincore to probe the OS cache, in a follow-on patch. While
    reloading only 8GB of maybe 30GB of cached data on restart would not
    be as good as reloading all of it, it would be a lot better than
    reloading none of it, and the gymnastics required seems substantially
    less.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Cédric Villemain at May 15, 2011 at 7:11 pm

    2011/5/15 Robert Haas <robertmhaas@gmail.com>:
    On Fri, May 6, 2011 at 5:31 PM, Greg Smith wrote:
    I think that all the complexity with CRCs etc. is unlikely to lead anywhere
    too, and those two issues are not completely unrelated.  The simplest,
    safest thing here is the right way to approach this, not the most
    complicated one, and a simpler format might add some flexibility here to
    reload more cache state too.  The bottleneck on reloading the cache state is
    reading everything from disk.  Trying to micro-optimize any other part of
    that is moving in the wrong direction to me.  I doubt you'll ever measure a
    useful benefit that overcomes the expense of maintaining the code.  And you
    seem to be moving to where someone can't restore cache state when they
    change shared_buffers.  A simpler implementation might still work in that
    situation; reload until you run out of buffers if shared_buffers shrinks,
    reload until you're done with the original size.
    I don't think there's any need for this to get data into
    shared_buffers at all.  Getting it into the OS cache oughta be plenty
    sufficient, no?

    ISTM that a very simple approach here would be to save the contents of
    each shared buffer on clean shutdown, and to POSIX_FADV_WILLNEED those
    buffers on startup.
    +1
    It is just an evolution of the current process, if I understood the
    explanations of the latest patch correctly.
    We could worry about additional complexity, like
    using fincore to probe the OS cache, in a follow-on patch.  While
    reloading only 8GB of maybe 30GB of cached data on restart would not
    be as good as reloading all of it, it would be a lot better than
    reloading none of it, and the gymnastics required seems substantially
    less.

    --
    Cédric Villemain               2ndQuadrant
    http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
  • Jeff Janes at Jun 1, 2011 at 3:58 pm

    On Sun, May 15, 2011 at 11:19 AM, Robert Haas wrote:

    I don't think there's any need for this to get data into
    shared_buffers at all.  Getting it into the OS cache oughta be plenty
    sufficient, no?

    ISTM that a very simple approach here would be to save the contents of
    each shared buffer on clean shutdown, and to POSIX_FADV_WILLNEED those
    buffers on startup.
    Do you mean to save the contents of the buffer pages themselves into a
    hibernation file, or to save just the identities (relation/fork/block
    number) of the buffers?

    In the first case, getting them into the OS cache would not help
    because the kernel would not recognize that data as being equivalent
    to the block it is a copy of.

    In the latter case, wouldn't we just trigger the same inefficient
    scattered read of the data that normal database operation would
    trigger, taking about the same amount of time to reach cache-warmth?
    Or is POSIX_FADV_WILLNEED going to be clever about reordering and
    coalescing reads?

    Cheers,

    Jeff
  • Robert Haas at Jun 1, 2011 at 5:03 pm

    On Wed, Jun 1, 2011 at 11:58 AM, Jeff Janes wrote:
    On Sun, May 15, 2011 at 11:19 AM, Robert Haas wrote:
    I don't think there's any need for this to get data into
    shared_buffers at all.  Getting it into the OS cache oughta be plenty
    sufficient, no?

    ISTM that a very simple approach here would be to save the contents of
    each shared buffer on clean shutdown, and to POSIX_FADV_WILLNEED those
    buffers on startup.
    Do you mean to save the contents of the buffer pages themselves into a
    hibernation file, or to save just the identities (relation/fork/block
    number) of the buffers?
    The latter.
    In the first case, getting them into the OS cache would not help
    because the kernel would not recognize that data as being equivalent
    to the block it is a copy of.

    In the latter case, wouldn't we just trigger the same inefficient
    scattered read of the data that normal database operation would
    trigger, taking about the same amount of time to reach cache-warmth?
    Or is POSIX_FADV_WILLNEED going to be clever about reordering and
    coalescing reads?
    It would be nice if POSIX_FADV_WILLNEED is clever enough to reorder
    and coalesce, but even if it isn't, we can help it along by doing all
    the reads from any given file one after another and in increasing
    block number order.
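    A minimal sketch of that ordering, in Python for brevity: group the
    saved block identities by file, sort each file's blocks in increasing
    order, and merge consecutive blocks into (first_block, nblocks) runs,
    so each run could be issued as a single POSIX_FADV_WILLNEED or read
    request covering nblocks * BLCKSZ bytes. Identifying a file by a path
    string is a simplification, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

Run = Tuple[int, int]  # (first_block, nblocks)


def coalesce_reads(tags: Iterable[Tuple[str, int]]) -> Dict[str, List[Run]]:
    """Group (file, block) pairs by file, sort blocks ascending, and merge
    consecutive blocks into runs."""
    by_file: DefaultDict[str, Set[int]] = defaultdict(set)
    for path, block in tags:
        by_file[path].add(block)  # a set also drops duplicate entries
    runs: Dict[str, List[Run]] = {}
    for path in sorted(by_file):
        file_runs: List[Run] = []
        for block in sorted(by_file[path]):
            if file_runs and file_runs[-1][0] + file_runs[-1][1] == block:
                start, count = file_runs[-1]
                file_runs[-1] = (start, count + 1)  # block extends the current run
            else:
                file_runs.append((block, 1))  # gap: start a new run
        runs[path] = file_runs
    return runs
```

    For the input [("b", 7), ("a", 3), ("a", 1), ("a", 2), ("a", 9),
    ("b", 7), ("b", 8)] this yields {"a": [(1, 3), (9, 1)], "b": [(7, 2)]}:
    three requests instead of seven. Whether the kernel would have merged
    the adjacent requests on its own is left open, as above.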

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Greg Stark at Jun 1, 2011 at 5:58 pm

    On Wed, Jun 1, 2011 at 8:58 AM, Jeff Janes wrote:
    In the latter case, wouldn't we just trigger the same inefficient
    scattered read of the data that normal database operation would
    trigger, taking about the same amount of time to reach cache-warmth?
    If you have a system where you're bandwidth-constrained and processing
    queries as fast as you can then yes.

    But if you have an OLTP system where queries come in at a fixed rate
    and it's latency that matters then there's a big difference. It might
    take you hours to prime the cache at the rate that queries come in
    organically and for that whole time every query requires multiple
    cache misses and multiple seeks and random access reads. Once it's all
    primed your whole database might actually fit in RAM and require no
    i/o to serve requests. And it's possible that your system is
    architected on the assumption that that's the case and performance is
    inadequate until the whole database is read in.

    Actually in that extreme case you can probably get away with a few dd
    commands or perhaps an SQL SELECT count(*) on startup. I'm not sure in
    practice how wide the use case is in the gap between that extreme case
    and more average cases where the difference isn't so catastrophic.

    I'm sure there will be people who will say it's big but I would like
    to see numbers. And I'm not just talking about the usual knee-jerk
    "let's see the benchmarks" response. I would love to see metrics on a
    live database showing users how much of their response time depends on
    the cache and how that performance varies as the cache gets warmer.
    Right now I think users are kind of in the dark on cache effectiveness
    and latency numbers.
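    One starting point for such cache-effectiveness metrics is the
    shared-buffer hit ratio. PostgreSQL's pg_stat_database view exposes
    cumulative blks_hit and blks_read counters per database; the sketch
    below (Python, hypothetical function names) computes the ratio from
    them and, from two samples taken some time apart, the ratio for just
    that interval, which shows how it changes as the cache warms. It does
    not measure latency, and a block counted in blks_read may still have
    been served from the OS page cache rather than disk.

```python
from typing import Tuple

# Counters as returned by, for example:
#   SELECT blks_hit, blks_read FROM pg_stat_database
#   WHERE datname = current_database();
Sample = Tuple[int, int]  # (blks_hit, blks_read)


def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests satisfied from shared_buffers.

    blks_read counts blocks PostgreSQL had to request from the kernel; some
    of those may have come from the OS page cache, so this is not a
    disk-read ratio.
    """
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0


def interval_hit_ratio(before: Sample, after: Sample) -> float:
    """Hit ratio over the interval between two cumulative counter samples."""
    return cache_hit_ratio(after[0] - before[0], after[1] - before[1])
```

    Sampling the counters periodically after a restart and tracking
    interval_hit_ratio would show how quickly the buffer cache warms up;
    tying that to response time would still need separate measurement.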

    --
    greg
