Hey,

this is my first post to the -hackers lists, so be merciful ;-)

I created a patch which implements MAP_HUGETLB for sysv shared memory segments
(PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I
added error handling, huge page size detection and a GUC variable.

Performance improvements differ from about 1% in the worst case to about 13% in
the best case. Benchmarking results are as follows:

pgbench -i -s 100 test

Patched:
pgbench -n -S -j 64 -c 64 -T 10 -M prepared test
tps avg: 51879.2


Unpatched:
pgbench -n -S -j 64 -c 64 -T 10 -M prepared test
tps avg: 45321.6

tps increase: 6557.6, 12.6%


Patched:
pgbench -n -S -j 64 -c 64 -T 180 -M prepared test (patched)

number of transactions actually processed: 8767510
tps = 48705.159196 (including connections establishing)
tps = 48749.761241 (excluding connections establishing)


Unpatched:
mit pgbench -n -S -j 64 -c 64 -T 120 -M prepared test (unpatched)

number of transactions actually processed: 8295439
tps = 46083.559187 (including connections establishing)
tps = 46097.763939 (excluding connections establishing)

tps diff: 2652, 5%



create table large (a int, b int);
insert into large (a, b) select s, s + 10 from generate_series(1, 10000000) s;

5 times executed, with \timing on:
SELECT sum(a), sum(b) from large;
Time: 1143.880 ms unpatched
Time: 1125.644 ms patched
about 1% difference


The patch ist attached. Any comments?

Greetings,
  CK

Search Discussions

  • Tom Lane at Oct 29, 2012 at 8:33 pm

    Christian Kruse writes:
    I created a patch which implements MAP_HUGETLB for sysv shared memory segments
    (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I
    added error handling, huge page size detection and a GUC variable.
    My recollection is we'd decided not to pursue that because of fears that
    it could make performance horribly worse on some platforms.

        regards, tom lane
  • Andres Freund at Oct 29, 2012 at 8:54 pm

    On Monday, October 29, 2012 09:33:25 PM Tom Lane wrote:
    Christian Kruse <cjk+postgres@defunct.ch> writes:
    I created a patch which implements MAP_HUGETLB for sysv shared memory
    segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and
    Andres Freund, I added error handling, huge page size detection and a
    GUC variable.
    My recollection is we'd decided not to pursue that because of fears that
    it could make performance horribly worse on some platforms.
    Hm. I don't so. Maybe youre remembering our discussion about mlock?

    MAP_HUGETLB is linux specific anyway, there are other OSes doing this, but I
    don't think a standard exists.

    http://archives.postgresql.org/message-
    id/201206292152.40311.andres%402ndquadrant.com and
    http://archives.postgresql.org/message-
    id/CA%2BTgmoZGX0Pi0rw5sDH0Uz%3D03WkQ%3DmnoAW3TXVEfvUpyW%2BfMjw%40mail.gmail.com
    seem to be the most recent discussions arround this.


    The parts I like most about having usable hugepages arround is usable 'ps'
    output and way smaller pagetables...

    Greetings,

    Andres
    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Christian Kruse at Oct 30, 2012 at 7:18 pm
    Hi,
    On 29/10/12 16:33, Tom Lane wrote:
    I created a patch which implements MAP_HUGETLB for sysv shared memory segments
    (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I
    added error handling, huge page size detection and a GUC variable.
    My recollection is we'd decided not to pursue that because of fears that
    it could make performance horribly worse on some platforms.
    This is why I created a GUC for turning this feature off.

    Greetings,
      CK
  • Peter Geoghegan at Oct 29, 2012 at 9:15 pm

    On 29 October 2012 20:14, Christian Kruse wrote:
    created a patch which implements MAP_HUGETLB for sysv shared memory segments
    (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I
    added error handling, huge page size detection and a GUC variable.
    I have a few initial observations on this.

    * I think you should be making the new GUC PGC_INTERNAL on platforms
    where MAP_HUGETLB is not defined or available. See also,
    effective_io_concurrency. This gives sane error handling.

    * Why aren't you failing when HUGE_TLB_ON and the mmap fails? You do
    the same thing as HUGE_TLB_TRY, contrary to your documentation:

      + if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY)

    * I think you should add MAP_HUGETLB to PG_MMAP_FLAGS directly, rather
    than XOR'ing after the fact. We already build the flags that comprise
    PG_MMAP_FLAGS using another set of intermediary flags, based on
    platform considerations, so what you're doing here is just
    inconsistent. Don't wrap the mmap() code in ifdefs, and instead rely
    on the GUC being available on all platforms, with setting the GUC
    causing an error on unsupported platforms, in the style of
    effective_io_concurrency, as established in commit
    3b07182e613a0e63ff662e3dc7f820f010923ec7 . Build a PG_MAP_HUGETLB
    intermediary flag on all platforms.

    * Apparently you're supposed to do this for the benefit of Itanic [1]:

    /* Only ia64 requires this */
    #ifdef __ia64__
    #define ADDR (void *)(0x8000000000000000UL)
    #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
    #else
    #define ADDR (void *)(0x0UL)
    #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
    #endif

    * You aren't following our style guide with respect to error messages [2].

    [1] https://www.kernel.org/doc/Documentation/vm/map_hugetlb.c

    [2] http://www.postgresql.org/docs/devel/static/error-style-guide.html
    --
    Peter Geoghegan http://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Christian Kruse at Oct 30, 2012 at 7:18 pm
    Hi,
    On 29/10/12 21:14, Peter Geoghegan wrote:
    I have a few initial observations on this.
    Thanks for your feedback.
    * I think you should be making the new GUC PGC_INTERNAL on platforms
    where MAP_HUGETLB is not defined or available. See also,
    effective_io_concurrency. This gives sane error handling.
    Ok, sounds good for me.
    * Why aren't you failing when HUGE_TLB_ON and the mmap fails? You do
    the same thing as HUGE_TLB_TRY, contrary to your documentation:

    + if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY)
    No, what I did was mmap()ing the memory with MAP_HUGETLB and falling
    back to mmap() w/o MAP_HUGETLB when mmap() failed. But there was
    another bug, I didn't mmap() at all when huge_tlb_pages == off.
    * I think you should add MAP_HUGETLB to PG_MMAP_FLAGS directly, rather
    than XOR'ing after the fact. We already build the flags that comprise
    PG_MMAP_FLAGS using another set of intermediary flags, based on
    platform considerations, so what you're doing here is just
    inconsistent.
    This would mean that I have to disable the bit when huge_tlb_pages ==
    off. I think it is better to enable it if wanted and to just leave the
    flags as they were when this feature has been turned off, do you
    disagree?
    Don't wrap the mmap() code in ifdefs, and instead rely
    on the GUC being available on all platforms, with setting the GUC
    causing an error on unsupported platforms, in the style of
    effective_io_concurrency, as established in commit
    3b07182e613a0e63ff662e3dc7f820f010923ec7 . Build a PG_MAP_HUGETLB
    intermediary flag on all platforms.
    Ok, this sounds good. It will remove complexity from the code.
    * Apparently you're supposed to do this for the benefit of Itanic [1]:

    /* Only ia64 requires this */
    #ifdef __ia64__
    #define ADDR (void *)(0x8000000000000000UL)
    #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
    #else
    #define ADDR (void *)(0x0UL)
    #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
    #endif
    Hm… is this enough? Or do we have to do more work for ia64?
    * You aren't following our style guide with respect to error messages [2].
    Thanks, I wasn't aware of this. I attached a new version of the patch.

    Greetings,
      CK
  • Christian Kruse at Oct 30, 2012 at 7:20 pm
    Hey,

    Oh man, first I didn't sent the email to the list and now I forgot the
    attachment. I should really get some sleep, sorry for any
    inconveniences :(

    Greetings,
      CK
  • Andres Freund at Oct 30, 2012 at 7:33 pm

    On Tuesday, October 30, 2012 08:20:33 PM Christian Kruse wrote:
    Hey,

    Oh man, first I didn't sent the email to the list and now I forgot the
    attachment. I should really get some sleep, sorry for any
    inconveniences :(
    +#ifdef MAP_HUGETLB
    +# ifdef __ia64__
    +# define PG_HUGETLB_BASE_ADDR (void *)(0x8000000000000000UL)
    +# define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
    +# else

    Not your fault, but that looks rather strange to me. The level of documentation
    around the requirement of using MAP_FIXED is subpar...

    +long
    +InternalGetHugepageSize()
    +{
    +...
    + if (biggest_size < size && InternalGetFreeHugepagesCount(ent-
    d_name) > 0)
    + {
    + biggest_size = size;
    + }

    I dislike automatically using the biggest size here. There are platforms with
    16GB hugepages... I think we should rather always use the smallest size and
    leave it to the kernel to choose whatever pagesize he wants to use.

    +#ifdef MAP_HUGETLB
    + long pagesize = InternalGetHugepageSize();
    +
    + if (pagesize <= 0)
    + pagesize = sysconf(_SC_PAGE_SIZE);
    +#else
        long pagesize = sysconf(_SC_PAGE_SIZE);
    +#endif

    Thats going to log two warnings if the current system doesn't have hugepage
    support compiled in and thus no /sys/kernel/mm/hugepages directory.
    I think you should 1. only test this if TRY or ON is configured, 2. make the
    messages in InternalGetHugepageSize DEBUG1 at least for the TRY case.

    + {
    + {"huge_tlb_pages",
    +#ifdef MAP_HUGETLB
    + PGC_SUSET,
    +#else
    + PGC_INTERNAL,
    +#endif
    + RESOURCES_MEM,
    + gettext_noop("Enable/disable the use of the hugepages feature"),
    + NULL
    + },
    + &huge_tlb_pages,
    + HUGE_TLB_TRY, huge_tlb_options,
    + NULL, NULL, NULL
    + },

    you use HUGE_TLB_TRY with ifdef here.


    Looking forward to having this...

    Andres
    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Christian Kruse at Oct 30, 2012 at 7:54 pm
    Hey,
    On 30/10/12 20:33, Andres Freund wrote:
    +#ifdef MAP_HUGETLB
    +# ifdef __ia64__
    +# define PG_HUGETLB_BASE_ADDR (void *)(0x8000000000000000UL)
    +# define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
    +# else

    Not your fault, but that looks rather strange to me. The level of documentation
    around the requirement of using MAP_FIXED is subpar...
    Yeah, looks strange to me, too. This is why I asked if this is really
    all we have to do.
    I dislike automatically using the biggest size here. There are platforms with
    16GB hugepages... I think we should rather always use the smallest size and
    leave it to the kernel to choose whatever pagesize he wants to use.
    Good point. I will change this to the smallest available page size.
    Thats going to log two warnings if the current system doesn't have hugepage
    support compiled in and thus no /sys/kernel/mm/hugepages directory.
    I think you should 1. only test this if TRY or ON is configured, 2. make the
    messages in InternalGetHugepageSize DEBUG1 at least for the TRY case.
    Ok, point taken.
    […]
    + HUGE_TLB_TRY, huge_tlb_options,
    + NULL, NULL, NULL
    + },

    you use HUGE_TLB_TRY with ifdef here.
    Ah, right. Setting it to HUGE_TLB_OFF when no MAP_HUGETLB is available.

    Greetings,
      CK
  • Christian Kruse at Oct 30, 2012 at 8:16 pm
    Hey,

    ok, I think I implemented all of the changes you requested. All but
    the ia64 dependent, I have to do more research for this one.


    Greetings,
      CK
  • Andres Freund at Nov 13, 2012 at 8:57 am
    Hi CK,
    On 2012-10-30 21:16:07 +0100, Christian Kruse wrote:
    index b4fcbaf..66ed10f 100644
    --- a/doc/src/sgml/config.sgml
    +++ b/doc/src/sgml/config.sgml
    I think a short introduction or at least a reference on how to configure
    hugepages would be a good thing.
    <varlistentry id="guc-temp-buffers" xreflabel="temp_buffers">
    <term><varname>temp_buffers</varname> (<type>integer</type>)</term>
    <indexterm>
    diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
    index df06312..f9de239 100644
    --- a/src/backend/port/sysv_shmem.c
    +++ b/src/backend/port/sysv_shmem.c
    @@ -27,10 +27,14 @@
    #ifdef HAVE_SYS_SHM_H
    #include <sys/shm.h>
    #endif
    +#ifdef MAP_HUGETLB
    +#include <dirent.h>
    +#endif
    I think a central #define for the MAP_HUGETLB capability would be a good
    idea, akin to HAVE_SYS_SHM_H.

    E.g. this:
    --- a/src/backend/utils/misc/guc.c
    +++ b/src/backend/utils/misc/guc.c
    @@ -22,6 +22,7 @@
    #include <limits.h>
    #include <unistd.h>
    #include <sys/stat.h>
    +#include <sys/mman.h>
    #ifdef HAVE_SYSLOG
    #include <syslog.h>
    #endif
    is unlikely to fly on windows.

    +/*
    + * static long InternalGetHugepageSize()
    + *
    + * Attempt to get a valid hugepage size from /sys/kernel/mm/hugepages/ by
    + * reading directory contents
    + * Will fail (return -1) if the directory could not be opened or no valid
    + * page sizes are available. Will return the biggest hugepage size on
    + * success.
    + *
    + */
    The "biggest" remark is out of date.

    +static long
    +InternalGetHugepageSize()
    +{
    ...
    + if ((smallest_size == -1 || size < smallest_size)
    + && InternalGetFreeHugepagesCount(ent->d_name) > 0)
    + {
    + smallest_size = size;
    + }
    ...
    +
    + if (smallest_size == -1)
    + {
    + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
    + (errmsg("Could not find a valid hugepage size"),
    + errhint("This error usually means that either CONFIG_HUGETLB_PAGE "
    + "is not in kernel or that your architecture does not "
    + "support hugepages or you did not configure hugepages")));
    + }
    I think differentiating the error message between no hugepages found and
    InternalGetFreeHugepagesCount(ent->d_name) always beeing zero would be a
    good idea. Failing this way if
    InternalGetFreeHugepagesCount(ent->d_name) < 0 seems fine.


    Greetings,

    Andres Freund
  • Andres Freund at Nov 13, 2012 at 9:03 am
    Oh, one more thing...
    On 2012-10-30 21:16:07 +0100, Christian Kruse wrote:
    ok, I think I implemented all of the changes you requested. All but
    the ia64 dependent, I have to do more research for this one.
    I vote for simply not caring about ia64.

    This is:
    +#ifdef MAP_HUGETLB
    +# ifdef __ia64__
    +# define PG_HUGETLB_BASE_ADDR (void *)(0x8000000000000000UL)
    +# define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
    +# else
    +# define PG_HUGETLB_BASE_ADDR (void *)(0x0UL)
    +# define PG_MAP_HUGETLB MAP_HUGETLB
    +# endif
    +#else
    +# define PG_MAP_HUGETLB 0
    +#endif
    too much underdocumented crazyness for a very minor platform. Should
    somebody with the approprate harware want to submit an additional patch,
    fine....


    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Andres Freund at Dec 21, 2012 at 8:37 pm

    On 2012-10-30 21:16:07 +0100, Christian Kruse wrote:
    +static long
    +InternalGetHugepageSize()
    +{
    + DIR *dir = opendir(HUGE_PAGE_INFO_DIR);
    + long smallest_size = -1, size;
    + char *ptr;
    ...
    + while((ent = readdir(dir)) != NULL)
    + {
    This should be (Allocate|Read)Dir btw.

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Bruce Momjian at Jun 29, 2013 at 3:03 am
    Did we decide against specifying huge pages in Postgres?

    ---------------------------------------------------------------------------
    On Tue, Oct 30, 2012 at 09:16:07PM +0100, Christian Kruse wrote:
    Hey,

    ok, I think I implemented all of the changes you requested. All but
    the ia64 dependent, I have to do more research for this one.


    Greetings,
    CK
    diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
    index b4fcbaf..66ed10f 100644
    --- a/doc/src/sgml/config.sgml
    +++ b/doc/src/sgml/config.sgml
    @@ -1049,6 +1049,37 @@ include 'filename'
    </listitem>
    </varlistentry>

    + <varlistentry id="guc-huge-tlb-pages" xreflabel="huge_tlb_pages">
    + <term><varname>huge_tlb_pages</varname> (<type>enum</type>)</term>
    + <indexterm>
    + <primary><varname>huge_tlb_pages</> configuration parameter</primary>
    + </indexterm>
    + <listitem>
    + <para>
    + Enables/disables the use of huge tlb pages. Valid values are
    + <literal>on</literal>, <literal>off</literal> and <literal>try</literal>.
    + The default value is <literal>try</literal>.
    + </para>
    +
    + <para>
    + With <varname>huge_tlb_pages</varname> set to <literal>on</literal>
    + <symbol>mmap()</symbol> will be called with <symbol>MAP_HUGETLB</symbol>.
    + If the call fails the server will fail fatally.
    + </para>
    +
    + <para>
    + With <varname>huge_tlb_pages</varname> set to <literal>off</literal> we
    + will not use <symbol>MAP_HUGETLB</symbol> at all.
    + </para>
    +
    + <para>
    + With <varname>huge_tlb_pages</varname> set to <literal>try</literal>
    + we will try to use <symbol>MAP_HUGETLB</symbol> and fall back to
    + <symbol>mmap()</symbol> without <symbol>MAP_HUGETLB</symbol>.
    + </para>
    + </listitem>
    + </varlistentry>
    +
    <varlistentry id="guc-temp-buffers" xreflabel="temp_buffers">
    <term><varname>temp_buffers</varname> (<type>integer</type>)</term>
    <indexterm>
    diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
    index df06312..f9de239 100644
    --- a/src/backend/port/sysv_shmem.c
    +++ b/src/backend/port/sysv_shmem.c
    @@ -27,10 +27,14 @@
    #ifdef HAVE_SYS_SHM_H
    #include <sys/shm.h>
    #endif
    +#ifdef MAP_HUGETLB
    +#include <dirent.h>
    +#endif

    #include "miscadmin.h"
    #include "storage/ipc.h"
    #include "storage/pg_shmem.h"
    +#include "utils/guc.h"


    typedef key_t IpcMemoryKey; /* shared memory key passed to shmget(2) */
    @@ -61,6 +65,19 @@ typedef int IpcMemoryId; /* shared memory ID returned by shmget(2) */
    #define MAP_FAILED ((void *) -1)
    #endif

    +#ifdef MAP_HUGETLB
    +# ifdef __ia64__
    +# define PG_HUGETLB_BASE_ADDR (void *)(0x8000000000000000UL)
    +# define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
    +# else
    +# define PG_HUGETLB_BASE_ADDR (void *)(0x0UL)
    +# define PG_MAP_HUGETLB MAP_HUGETLB
    +# endif
    +#else
    +# define PG_MAP_HUGETLB 0
    +#endif
    +
    +

    unsigned long UsedShmemSegID = 0;
    void *UsedShmemSegAddr = NULL;
    @@ -73,7 +90,6 @@ static void IpcMemoryDelete(int status, Datum shmId);
    static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key,
    IpcMemoryId *shmid);

    -
    /*
    * InternalIpcMemoryCreate(memKey, size)
    *
    @@ -342,6 +358,155 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2)
    }


    +#ifdef MAP_HUGETLB
    +#define HUGE_PAGE_INFO_DIR "/sys/kernel/mm/hugepages"
    +
    +/*
    + * static long InternalGetFreeHugepagesCount(const char *name)
    + *
    + * Attempt to read the number of available hugepages from
    + * /sys/kernel/mm/hugepages/hugepages-<size>/free_hugepages
    + * Will fail (return -1) if file could not be opened, 0 if no pages are available
    + * and > 0 if there are free pages
    + *
    + */
    +static long
    +InternalGetFreeHugepagesCount(const char *name)
    +{
    + int fd;
    + char buff[1024];
    + size_t len;
    + long result;
    + char *ptr;
    +
    + len = snprintf(buff, 1024, "%s/%s/free_hugepages", HUGE_PAGE_INFO_DIR, name);
    + if (len == 1024) /* I don't think that this will happen ever */
    + {
    + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
    + (errmsg("Filename %s/%s/free_hugepages is too long", HUGE_PAGE_INFO_DIR, name),
    + errcontext("while checking hugepage size")));
    + return -1;
    + }
    +
    + fd = open(buff, O_RDONLY);
    + if (fd <= 0)
    + {
    + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
    + (errmsg("Could not open file %s: %s", buff, strerror(errno)),
    + errcontext("while checking hugepage size")));
    + return -1;
    + }
    +
    + len = read(fd, buff, 1024);
    + if (len <= 0)
    + {
    + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
    + (errmsg("Error reading from file %s: %s", buff, strerror(errno)),
    + errcontext("while checking hugepage size")));
    + close(fd);
    + return -1;
    + }
    +
    + /*
    + * If the content of free_hugepages is longer than or equal to 1024 bytes
    + * the rest is irrelevant; we simply want to know if there are any
    + * hugepages left
    + */
    + if (len == 1024)
    + {
    + buff[1023] = 0;
    + }
    + else
    + {
    + buff[len] = 0;
    + }
    +
    + close(fd);
    +
    + result = strtol(buff, &ptr, 10);
    +
    + if (ptr == NULL)
    + {
    + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
    + (errmsg("Could not convert contents of file %s/%s/free_hugepages to number", HUGE_PAGE_INFO_DIR, name),
    + errcontext("while checking hugepage size")));
    + return -1;
    + }
    +
    + return result;
    +}
    +
    +/*
    + * static long InternalGetHugepageSize()
    + *
    + * Attempt to get a valid hugepage size from /sys/kernel/mm/hugepages/ by
    + * reading directory contents
    + * Will fail (return -1) if the directory could not be opened or no valid
    + * page sizes are available. Will return the biggest hugepage size on
    + * success.
    + *
    + */
    +static long
    +InternalGetHugepageSize()
    +{
    + struct dirent *ent;
    + DIR *dir = opendir(HUGE_PAGE_INFO_DIR);
    + long smallest_size = -1, size;
    + char *ptr;
    +
    + if (dir == NULL)
    + {
    + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
    + (errmsg("Could not open directory %s: %s", HUGE_PAGE_INFO_DIR, strerror(errno)),
    + errcontext("while checking hugepage size")));
    + return -1;
    + }
    +
    + /*
    + * Linux supports multiple hugepage sizes if the hardware
    + * supports it; for each possible size there will be a
    + * directory in /sys/kernel/mm/hugepages consisting of the
    + * string hugepages- and the size of the page, e.g. on x86_64:
    + * hugepages-2048kB
    + */
    + while((ent = readdir(dir)) != NULL)
    + {
    + if (strncmp(ent->d_name, "hugepages-", 10) == 0)
    + {
    + size = strtol(ent->d_name + 10, &ptr, 10);
    + if (ptr == NULL)
    + {
    + continue;
    + }
    +
    + if (strcmp(ptr, "kB") == 0)
    + {
    + size *= 1024;
    + }
    +
    + if ((smallest_size == -1 || size < smallest_size)
    + && InternalGetFreeHugepagesCount(ent->d_name) > 0)
    + {
    + smallest_size = size;
    + }
    + }
    + }
    +
    + closedir(dir);
    +
    + if (smallest_size == -1)
    + {
    + ereport(huge_tlb_pages == HUGE_TLB_TRY ? DEBUG1 : WARNING,
    + (errmsg("Could not find a valid hugepage size"),
    + errhint("This error usually means that either CONFIG_HUGETLB_PAGE "
    + "is not in kernel or that your architecture does not "
    + "support hugepages or you did not configure hugepages")));
    + }
    +
    + return smallest_size;
    +}
    +#endif
    +
    /*
    * PGSharedMemoryCreate
    *
    @@ -391,7 +556,17 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port)
    */
    #ifndef EXEC_BACKEND
    {
    +#ifdef MAP_HUGETLB
    + long pagesize = 0;
    +
    + if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY)
    + pagesize = InternalGetHugepageSize();
    +
    + if (pagesize <= 0)
    + pagesize = sysconf(_SC_PAGE_SIZE);
    +#else
    long pagesize = sysconf(_SC_PAGE_SIZE);
    +#endif

    /*
    * Ensure request size is a multiple of pagesize.
    @@ -410,8 +585,22 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port)
    * to be false, we might need to add a run-time test here and do this
    * only if the running kernel supports it.
    */
    - AnonymousShmem = mmap(NULL, size, PROT_READ|PROT_WRITE, PG_MMAP_FLAGS,
    - -1, 0);
    +
    + if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY)
    + {
    + AnonymousShmem = mmap(PG_HUGETLB_BASE_ADDR, size, PROT_READ|PROT_WRITE,
    + PG_MMAP_FLAGS|PG_MAP_HUGETLB, -1, 0);
    +
    + elog(DEBUG3, "mmap() tried with MAP_HUGEPAGE: %p", AnonymousShmem);
    + }
    +
    + if ((AnonymousShmem == MAP_FAILED && huge_tlb_pages == HUGE_TLB_TRY)
    + || huge_tlb_pages == HUGE_TLB_OFF)
    + {
    + AnonymousShmem = mmap(NULL, size, PROT_READ|PROT_WRITE, PG_MMAP_FLAGS,
    + -1, 0);
    + }
    +
    if (AnonymousShmem == MAP_FAILED)
    ereport(FATAL,
    (errmsg("could not map anonymous shared memory: %m"),
    diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
    index 745e7be..28b6191 100644
    --- a/src/backend/utils/misc/guc.c
    +++ b/src/backend/utils/misc/guc.c
    @@ -22,6 +22,7 @@
    #include <limits.h>
    #include <unistd.h>
    #include <sys/stat.h>
    +#include <sys/mman.h>
    #ifdef HAVE_SYSLOG
    #include <syslog.h>
    #endif
    @@ -389,6 +390,22 @@ static const struct config_enum_entry synchronous_commit_options[] = {
    };

    /*
    + * huge_tlb_pages may be on|off|try, where try is the default
    + * on: try to mmap() with MAP_HUGETLB and fail when mmap() fails
    + * off: do not try tp mmap() with MAP_HUGETLB
    + * try: try to mmap() with MAP_HUGETLB and fallback to mmap()
    + * w/o MAP_HUGETLB
    + */
    +static const struct config_enum_entry huge_tlb_options[] = {
    +#ifdef MAP_HUGETLB
    + {"on", HUGE_TLB_ON, false},
    + {"try", HUGE_TLB_TRY, false},
    +#endif
    + {"off", HUGE_TLB_OFF, false},
    + {NULL, 0, false}
    +};
    +
    +/*
    * Options for enum values stored in other modules
    */
    extern const struct config_enum_entry wal_level_options[];
    @@ -447,6 +464,12 @@ int tcp_keepalives_idle;
    int tcp_keepalives_interval;
    int tcp_keepalives_count;

    +#ifdef MAP_HUGETLB
    +int huge_tlb_pages = HUGE_TLB_TRY;
    +#else
    +int huge_tlb_pages = HUGE_TLB_OFF;
    +#endif
    +
    /*
    * These variables are all dummies that don't do anything, except in some
    * cases provide the value for SHOW to display. The real state is elsewhere
    @@ -3301,6 +3324,26 @@ static struct config_enum ConfigureNamesEnum[] =
    NULL, NULL, NULL
    },

    + {
    + {"huge_tlb_pages",
    +#ifdef MAP_HUGETLB
    + PGC_SUSET,
    +#else
    + PGC_INTERNAL,
    +#endif
    + RESOURCES_MEM,
    + gettext_noop("Enable/disable the use of the hugepages feature"),
    + NULL
    + },
    + &huge_tlb_pages,
    +#ifdef MAP_HUGETLB
    + HUGE_TLB_TRY,
    +#else
    + HUGE_TLB_OFF,
    +#endif
    + huge_tlb_options,
    + NULL, NULL, NULL
    + },

    /* End-of-list marker */
    {
    diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
    index eeb9b82..e5bafec 100644
    --- a/src/backend/utils/misc/postgresql.conf.sample
    +++ b/src/backend/utils/misc/postgresql.conf.sample
    @@ -113,6 +113,7 @@

    #shared_buffers = 32MB # min 128kB
    # (change requires restart)
    +#huge_tlb_pages = try # try to map memory with MAP_HUGETLB (on, off, try)
    #temp_buffers = 8MB # min 800kB
    #max_prepared_transactions = 0 # zero disables the feature
    # (change requires restart)
    diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
    index 06f797c..17f5870 100644
    --- a/src/include/utils/guc.h
    +++ b/src/include/utils/guc.h
    @@ -230,6 +230,24 @@ extern int tcp_keepalives_idle;
    extern int tcp_keepalives_interval;
    extern int tcp_keepalives_count;

    +
    +/*
    + * Possible values for huge_tlb_pages; default is HUGE_TLB_TRY
    + */
    +typedef enum
    +{
    + HUGE_TLB_OFF,
    + HUGE_TLB_ON,
    + HUGE_TLB_TRY
    +} HugeTlbType;
    +
    +
    +/*
    + * configure the use of huge TLB pages
    + */
    +extern int huge_tlb_pages;
    +
    +
    /*
    * Functions exported by guc.c
    */



    --
       Bruce Momjian <bruce@momjian.us> http://momjian.us
       EnterpriseDB http://enterprisedb.com

       + It's impossible for everything to be true. +
  • Andres Freund at Jun 29, 2013 at 6:24 am
    Hi,
    On 2013-06-28 23:03:22 -0400, Bruce Momjian wrote:
    Did we decide against specifying huge pages in Postgres?
    I don't think so. We need somebody to make some last modifications to
    the patch though and Christian doesn't seem to have the time atm.

    I think the bigger memory (size of the per process page table) and #cpus
    (number of pg backends => number of page tables allocated) the more
    imporant it gets to implement this. We already can spend gigabytes of
    memory on those on big systems.

    Greetings,

    Andres Freund

    --
      Andres Freund http://www.2ndQuadrant.com/
      PostgreSQL Development, 24x7 Support, Training & Services
  • Christian Kruse at Oct 30, 2012 at 10:10 pm
    Hey,
    On Tuesday 30 October 2012 20:33:18 Andres Freund wrote:
    +#ifdef MAP_HUGETLB
    +# ifdef __ia64__
    +# define PG_HUGETLB_BASE_ADDR (void *)(0x8000000000000000UL)
    +# define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED)
    +# else

    Not your fault, but that looks rather strange to me. The level of
    documentation around the requirement of using MAP_FIXED is subpar...
    Ok, after further reading I think with MAP_FIXED one has to „generate“ SHM
    segment addresses hisself. So 0x8000000000000000UL would just be one possible
    location for a hugepage.

    Since this feature is not used in PG for now, the code should work. Could
    someone with access to a ia64 architecture verify it?

    Greetings,
      CK

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedOct 29, '12 at 8:14p
activeJun 29, '13 at 6:24a
posts16
users6
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase