In response to my blog post on lseek contention, someone posted a
comment wherein they proposed using fstat() rather than lseek() to get
file sizes.
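For reference, here is a plain POSIX C sketch of the two ways of asking for a file's size that the proposal contrasts. It is a minimal illustration, not the attached patch. The helper names (size_via_lseek, size_via_fstat, sizes_agree_on_scratch_file) are hypothetical, and PostgreSQL itself reaches lseek() through its own fd.c layer rather than calling it directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size via lseek(): moves the descriptor's offset to EOF and returns the
 * new offset. Side effect: the offset is left at the end of the file. */
static off_t size_via_lseek(int fd)
{
    return lseek(fd, 0, SEEK_END);      /* -1 on error, errno set */
}

/* Size via fstat(): reads st_size and leaves the file offset alone. */
static off_t size_via_fstat(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;                      /* errno set by fstat() */
    return st.st_size;
}

/* Usage: run both probes on a scratch file; returns 1 if they agree. */
static int sizes_agree_on_scratch_file(void)
{
    FILE *f = tmpfile();
    int   ok = 0;

    if (f == NULL)
        return 0;
    if (write(fileno(f), "hello", 5) == 5)
        ok = size_via_fstat(fileno(f)) == 5 && size_via_lseek(fileno(f)) == 5;
    fclose(f);
    return ok;
}
```

The two calls differ in one respect: lseek(SEEK_END) changes the descriptor's file offset as a side effect, while fstat() only reads metadata.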

http://rhaas.blogspot.com/2011/08/linux-and-glibc-scalability.html

I tried that on a RHEL 6.1 machine with 64 cores running
2.6.32-131.6.1.el6.x86_64, and it's pretty clear that the locking
characteristics are completely different. At 1 client, the lseek
method appears to be slightly faster, although it's not beyond belief
that the difference could be in the noise. Above 40 cores, however,
the fstat method thumps the lseek method up one side and down the
other.

Patch and test results are attached. Test runs are 5-minute runs with
scale factor 100 and shared_buffers=8GB.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
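The attached patch and pgbench results are not reproduced here. To look for the same effect outside PostgreSQL, a minimal fork-based microbenchmark can run many processes against one shared file. Each process holds its own descriptor on that file, as separate backends would, and probes the file size in a loop. This is a sketch under stated assumptions: probe_loop and run_probe_bench are hypothetical names, only the system calls are measured, and the timings are not comparable to the pgbench results described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Child body: open a private descriptor on `path` (each backend has its
 * own) and ask for the file size `iters` times. Never returns. */
static void probe_loop(const char *path, int use_fstat, long iters)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        _exit(1);
    for (long i = 0; i < iters; i++)
    {
        int failed = use_fstat ? (fstat(fd, &st) != 0)
                               : (lseek(fd, 0, SEEK_END) < 0);
        if (failed)
            _exit(1);
    }
    close(fd);
    _exit(0);
}

/* Fork `nprocs` children that all probe the same file; return elapsed
 * wall-clock seconds, or -1.0 if any fork or any child failed. */
static double run_probe_bench(const char *path, int nprocs, int use_fstat, long iters)
{
    struct timespec t0, t1;
    int started = 0, failed = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nprocs; started++)
    {
        pid_t pid = fork();

        if (pid < 0)
        {
            failed = 1;
            break;
        }
        if (pid == 0)
            probe_loop(path, use_fstat, iters);
    }
    for (int i = 0; i < started; i++)   /* reap every child we started */
    {
        int status;

        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (failed)
        return -1.0;
    return (double) (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For example, compare run_probe_bench(path, 1, 0, N) against run_probe_bench(path, 64, 0, N), then repeat both with use_fstat set to 1. On a kernel whose SEEK_END path takes a per-inode lock, the lseek variant would be expected to scale worse as nprocs grows. No measurements are claimed here.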

  • Tom Lane at Aug 8, 2011 at 2:45 pm

    Robert Haas writes:
    > In response to my blog post on lseek contention, someone posted a
    > comment wherein they proposed using fstat() rather than lseek() to get
    > file sizes.
    > Patch and test results are attached. Test runs are 5-minute runs with
    > scale factor 100 and shared_buffers=8GB.
    > Thoughts?

    I'm a bit concerned by the fact that you've only tested this on one
    operating system, and thus the performance characteristics could be
    quite different elsewhere. The comment in mdextend also points out
    a way in which this might not be a win --- did you test anything besides
    read-only scenarios?

    regards, tom lane
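Tom does not quote the mdextend comment. This note assumes he means the comment observing that mdnblocks has just done a seek to end-of-file, so the seek in mdextend is usually redundant and fd.c can skip it. That is an assumption about which comment he has in mind. Under it, the relevant difference is a side effect: lseek(fd, 0, SEEK_END) leaves the offset at EOF, while fstat() leaves it wherever it was. Here is a minimal sketch with hypothetical helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Probe with lseek(): the offset ends up at EOF, so a plain write()
 * right afterwards appends without a second seek. */
static ssize_t extend_after_lseek(int fd, const void *buf, size_t len)
{
    if (lseek(fd, 0, SEEK_END) < 0)
        return -1;
    return write(fd, buf, len);
}

/* Probe with fstat(): the offset did not move, so the write has to name
 * the position itself (pwrite) or pay for a separate lseek(). */
static ssize_t extend_after_fstat(int fd, const void *buf, size_t len)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return pwrite(fd, buf, len, st.st_size);
}

/* Usage: append 4 bytes each way to a scratch file whose offset is
 * rewound in between; returns the final size (8 if both appended). */
static off_t demo_extend(void)
{
    FILE *f = tmpfile();
    struct stat st;
    off_t size = -1;

    if (f == NULL)
        return -1;
    if (extend_after_lseek(fileno(f), "abcd", 4) == 4 &&
        lseek(fileno(f), 0, SEEK_SET) == 0 &&      /* offset now at start */
        extend_after_fstat(fileno(f), "efgh", 4) == 4 &&
        fstat(fileno(f), &st) == 0)
        size = st.st_size;
    fclose(f);
    return size;
}
```

In the fstat() variant, a plain write() in place of pwrite() would land at the current offset, which is the start of the file in demo_extend. It would overwrite data instead of appending, so the position has to be supplied explicitly.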
  • Robert Haas at Aug 8, 2011 at 3:33 pm

    On Mon, Aug 8, 2011 at 10:45 AM, Tom Lane wrote:
    > I'm a bit concerned by the fact that you've only tested this on one
    > operating system, and thus the performance characteristics could be
    > quite different elsewhere.  The comment in mdextend also points out
    > a way in which this might not be a win --- did you test anything besides
    > read-only scenarios?

    Nope.

    On Mon, Aug 8, 2011 at 10:49 AM, Andres Freund wrote:
    > I don't think it's a good idea to replace lseek with fstat in the long run. The
    > likelihood that the lockless generic_file_llseek will get included seems rather
    > high to me. In contrast to that, fstat will always be more expensive than that
    > as it's going through a security check and then the fs' getattr implementation
    > (which actually takes a lock on some fs).

    *scratches head* I understand that stat() would need a security
    check, but why would fstat()?

    I think both of you raise good points. I wasn't too enthusiastic
    about this approach either. It's not very appealing to adopt an
    approach where the right performance decision is going to depend on
    operating system, file system, kernel version, core count, and
    workload. We could add a GUC, but it would be pretty annoying to have
    a setting that won't matter for most people at all, except
    occasionally when it makes a huge difference.

    I wasn't aware that there was any current activity around this on the Linux
    side. But Andres' comments made me Google it again, and now I see
    this:

    https://lkml.org/lkml/2011/6/16/800

    Andres, any idea what the status of that patch is? I'm not clear on
    how Linux works in terms of things getting upstreamed.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andres Freund at Aug 8, 2011 at 5:10 pm

    On Monday, August 08, 2011 11:33:29 Robert Haas wrote:
    > On Mon, Aug 8, 2011 at 10:49 AM, Andres Freund wrote:
    >> I don't think it's a good idea to replace lseek with fstat in the long
    >> run. The likelihood that the lockless generic_file_llseek will get
    >> included seems rather high to me. In contrast to that, fstat will always
    >> be more expensive than that as it's going through a security check and
    >> then the fs' getattr implementation (which actually takes a lock on
    >> some fs).
    > *scratches head* I understand that stat() would need a security
    > check, but why would fstat()?

    That I am not totally sure of either. I guess Kaigai might know more about
    that.
    I guess it might be that a forked process is possibly no longer allowed to
    access the information from an inherited file handle? Also, I think a process
    can change its permissions during runtime.

    > I think both of you raise good points. I wasn't too enthusiastic
    > about this approach either. It's not very appealing to adopt an
    > approach where the right performance decision is going to depend on
    > operating system, file system, kernel version, core count, and
    > workload. We could add a GUC, but it would be pretty annoying to have
    > a setting that won't matter for most people at all, except
    > occasionally when it makes a huge difference.
    >
    > I wasn't aware that there was any current activity around this on the Linux
    > side. But Andres' comments made me Google it again, and now I see
    > this:
    >
    > https://lkml.org/lkml/2011/6/16/800
    >
    > Andres, any idea what the status of that patch is? I'm not clear on
    > how Linux works in terms of things getting upstreamed.

    There doesn't seem to have been any activity to include it in 3.1. The merge
    window for 3.1 just ended. The next one will open for about a week after the
    release.
    It's also not yet included in linux-next, which is a "preview" of the release
    after the one currently being worked on. A release takes roughly 3 months.

    For upstreaming, somebody needs to be persistent enough to convince one of the
    maintainers of the particular area to include the code so that Linus can then
    pull it.
    I guess citing your numbers would go a long way in that direction. Naturally,
    it would be even better to include results with the patch applied.
    The largest machine I can reboot often enough to test such a thing has only two
    sockets (4-core E5520). I guess you cannot easily reboot your loaned machine
    with a new kernel?

    Greetings,
    Andres
  • Robert Haas at Aug 8, 2011 at 5:19 pm

    On Mon, Aug 8, 2011 at 1:10 PM, Andres Freund wrote:
    > There doesn't seem to have been any activity to include it in 3.1. The merge
    > window for 3.1 just ended. The next one will open for about a week after the
    > release.
    > It's also not yet included in linux-next, which is a "preview" of the release
    > after the one currently being worked on. A release takes roughly 3 months.

    OK. If it doesn't get into Linux 3.2, we had better start thinking
    hard about a workaround on our side. I am not too concerned about
    people hitting this with PostgreSQL 9.1 or prior, because you'd
    basically need a workload targeted to exercise the problem, which
    workload is not that similar to the way people actually do things in
    real life. However, in PostgreSQL 9.2devel, it's going to be much
    more of a real-world problem, so I'd hate to wait until after our
    feature freeze and then decide we've got a problem we have to fix.

    > For upstreaming, somebody needs to be persistent enough to convince one of the
    > maintainers of the particular area to include the code so that Linus can then
    > pull it.
    > I guess citing your numbers would go a long way in that direction. Naturally,
    > it would be even better to include results with the patch applied.
    > The largest machine I can reboot often enough to test such a thing has only two
    > sockets (4-core E5520). I guess you cannot easily reboot your loaned machine
    > with a new kernel?

    Not really. I do have root access to a 64-core box at the moment, and
    I could probably get permission to reboot it, but if it didn't come
    back on-line that would be awkward.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Tom Lane at Aug 8, 2011 at 5:29 pm

    Robert Haas writes:
    > Not really. I do have root access to a 64-core box at the moment, and
    > I could probably get permission to reboot it, but if it didn't come
    > back on-line that would be awkward.

    Red Hat has some test hardware that I can use (... pokes around ...)
    Hmm, this one looks promising:

    Memory:      64348 MB
    NUMA nodes:  4

    CPU
      Vendor:          GenuineIntel
      Model name:      Intel(R) Xeon(R) CPU E7- 4860 @ 2.27GHz
      Family:          6
      Model:           47
      Stepping:        2
      Speed:           1064.0
      Processors:      80
      Cores:           40
      Sockets:         4
      Hyperthreading:  True

    If you can wrap something up to the point where someone else can
    run it, I'll give it a shot.

    regards, tom lane
  • Andres Freund at Aug 8, 2011 at 5:31 pm

    On Monday, August 08, 2011 13:19:13 Robert Haas wrote:
    > On Mon, Aug 8, 2011 at 1:10 PM, Andres Freund wrote:
    >> There doesn't seem to have been any activity to include it in 3.1. The
    >> merge window for 3.1 just ended. The next one will open for about a
    >> week after the release.
    >> It's also not yet included in linux-next, which is a "preview" of the
    >> release after the one currently being worked on. A release takes
    >> roughly 3 months.
    > OK. If it doesn't get into Linux 3.2, we had better start thinking
    > hard about a workaround on our side.

    If it's OK, I will write a mail to lkml referencing this thread and your
    numbers inline (with attribution, obviously).
    I don't think it will be that hard to convince them. But I constantly
    surprise myself with my naivety, so I may be wrong.

    >> The largest machine I can reboot often enough to test such a thing has only
    >> two sockets (4-core E5520). I guess you cannot easily reboot your loaned
    >> machine with a new kernel?
    > Not really. I do have root access to a 64-core box at the moment, and
    > I could probably get permission to reboot it, but if it didn't come
    > back on-line that would be awkward.

    As I feared. Any chance that the person lending you the machine can give you a
    hand?
    Although I don't see how that could happen after reading the code, it would
    be disappointing to wait for 3.2, with the llseek fixes appearing in
    $distribution, just to notice that fstat is still faster for $unobvious_reason...

    Andres
  • Robert Haas at Aug 8, 2011 at 5:50 pm

    On Mon, Aug 8, 2011 at 1:31 PM, Andres Freund wrote:
    > If it's OK, I will write a mail to lkml referencing this thread and your
    > numbers inline (with attribution, obviously).

    That would be great. Please go ahead.

    > I don't think it will be that hard to convince them. But I constantly
    > surprise myself with my naivety, so I may be wrong.

    Heh, heh, open source is fun.

    >>> The largest machine I can reboot often enough to test such a thing has only
    >>> two sockets (4-core E5520). I guess you cannot easily reboot your loaned
    >>> machine with a new kernel?
    >> Not really.  I do have root access to a 64-core box at the moment, and
    >> I could probably get permission to reboot it, but if it didn't come
    >> back on-line that would be awkward.
    > As I feared. Any chance that the person lending you the machine can give you a
    > hand?

    Uh, maybe, but considering my relative inexperience in compiling the
    Linux kernel, I'd be a little worried about having to iterate too many
    times.

    > Although I don't see how that could happen after reading the code, it would
    > be disappointing to wait for 3.2, with the llseek fixes appearing in
    > $distribution, just to notice that fstat is still faster for $unobvious_reason...

    Well, the good thing here is that we are really only concerned with
    gross effects. It's pretty obvious from the numbers I posted upthread
    that the problem is related to lock contention. If that gets fixed,
    and lseek is still 20% slower under some set of circumstances, it's
    not clear that we're really gonna care. I mean, maybe it would be
    nice to avoid going to the kernel at all here, just so we're immune to
    possible inefficiencies in other operating systems (it would be nice
    if someone could repeat these tests on a big SMP box running Windows
    and/or one of the BSD systems) and to save the overhead of a system call,
    but those effects are pretty tiny. We could spend a lot of time
    optimizing other things before that one percolated up to the top of
    the heap, at least based on what I've seen so far.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
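Robert's aside about avoiding the kernel entirely can be illustrated with a minimal sketch of a per-process file-size cache. Everything here (struct size_cache, cache_nblocks, cache_extend, cache_invalidate, demo_size_cache) is hypothetical and is not PostgreSQL code. BLCKSZ is defined locally as 8192, PostgreSQL's default block size. The hard part of any real design is noticing that another backend extended the file, and the sketch reduces that to a manual cache_invalidate() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192              /* PostgreSQL's default block size */

/* Hypothetical per-process cache of one file's length in blocks. */
struct size_cache
{
    int     fd;
    int64_t nblocks;             /* -1 means "unknown: ask the kernel" */
};

static int64_t cache_nblocks(struct size_cache *c)
{
    if (c->nblocks < 0)
    {
        struct stat st;

        if (fstat(c->fd, &st) != 0)
            return -1;
        c->nblocks = st.st_size / BLCKSZ;   /* trailing partial block ignored */
    }
    return c->nblocks;           /* cache hit: no system call at all */
}

/* Append one block at the cached end and bump the cache. */
static int cache_extend(struct size_cache *c, const char *block)
{
    int64_t n = cache_nblocks(c);

    if (n < 0)
        return -1;
    if (pwrite(c->fd, block, BLCKSZ, (off_t) n * BLCKSZ) != BLCKSZ)
    {
        c->nblocks = -1;         /* outcome uncertain: re-probe next time */
        return -1;
    }
    c->nblocks = n + 1;
    return 0;
}

/* Must be called whenever another process may have extended the file. */
static void cache_invalidate(struct size_cache *c)
{
    c->nblocks = -1;
}

/* Usage: extend a scratch file by two blocks, then re-probe the kernel. */
static int64_t demo_size_cache(void)
{
    static char block[BLCKSZ];   /* zero-filled */
    FILE *f = tmpfile();
    int64_t result = -1;

    if (f == NULL)
        return -1;
    struct size_cache c = { fileno(f), -1 };
    if (cache_extend(&c, block) == 0 && cache_extend(&c, block) == 0)
    {
        cache_invalidate(&c);
        result = cache_nblocks(&c);   /* fstat() again: expect 2 */
    }
    fclose(f);
    return result;
}
```

Each process would hold its own copy of the cache. If another backend extended the same relation, this process's copy would go stale, so a real version would need shared state or an invalidation signal. The sketch deliberately leaves that out.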
  • Andrea Suisani at Sep 16, 2011 at 1:27 pm
    hi
    On 08/08/2011 07:50 PM, Robert Haas wrote:
    > On Mon, Aug 8, 2011 at 1:31 PM, Andres Freund wrote:
    >> If it's OK, I will write a mail to lkml referencing this thread and your
    >> numbers inline (with attribution, obviously).
    > That would be great. Please go ahead.

    I've just stumbled across this thread on lkml [1],
    "Improve lseek scalability v3", and I thought I'd ping
    the pgsql-hackers list just in case; more to the point,
    they're asking "are there any real workloads which care
    [Make generic lseek lockless safe]".

    Maybe I've got it wrong, but it seems somewhat
    related to what has been discussed here and
    also in Robert Haas's "Linux and glibc Scalability"
    blog post [2].

    [cut]

    Andrea

    [1] https://lkml.org/lkml/2011/9/15/399
    [2] http://rhaas.blogspot.com/2011/08/linux-and-glibc-scalability.html
  • Andres Freund at Sep 16, 2011 at 1:30 pm

    On Friday 16 Sep 2011 15:19:07 Andrea Suisani wrote:
    > hi
    > On 08/08/2011 07:50 PM, Robert Haas wrote:
    >> On Mon, Aug 8, 2011 at 1:31 PM, Andres Freund wrote:
    >>> If it's OK, I will write a mail to lkml referencing this thread and your
    >>> numbers inline (with attribution, obviously).
    >> That would be great. Please go ahead.
    > I've just stumbled across this thread on lkml [1],
    > "Improve lseek scalability v3", and I thought I'd ping
    > the pgsql-hackers list just in case; more to the point,
    > they're asking "are there any real workloads which care
    > [Make generic lseek lockless safe]".

    I wrote them a mail some time ago (a few weeks) regarding an earlier version
    of the patch... I can't find it right now, though.

    Andres
  • Andres Freund at Aug 8, 2011 at 2:49 pm

    On Monday, August 08, 2011 10:30:38 Robert Haas wrote:
    > In response to my blog post on lseek contention, someone posted a
    > comment wherein they proposed using fstat() rather than lseek() to get
    > file sizes.
    >
    > Thoughts?

    I don't think it's a good idea to replace lseek with fstat in the long run. The
    likelihood that the lockless generic_file_llseek will get included seems rather
    high to me. In contrast to that, fstat will always be more expensive than that
    as it's going through a security check and then the fs' getattr implementation
    (which actually takes a lock on some fs).
    On the other hand, it's currently lockless for some common filesystems
    (ext3/4, xfs) if the security subsystem is compiled out (i.e. no SELinux et al).

    Andres
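The following toy user-space analogy, written with C11 atomics, makes the "lockless generic_file_llseek" idea concrete. It is not the kernel patch, and every name in it is hypothetical. The older kernel path reportedly took a sleeping per-inode mutex (i_mutex) rather than the spinlock modeled here. The sketch illustrates only a structural point. If answering SEEK_END requires a per-file lock, every concurrent size probe on that file serializes. A probe that simply loads the size never waits.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy model of one shared file's metadata; NOT kernel code. */
struct toy_inode
{
    atomic_flag      lock;   /* stands in for the per-inode lock */
    _Atomic int64_t  size;   /* stands in for i_size */
};

static void toy_lock(struct toy_inode *ino)
{
    while (atomic_flag_test_and_set_explicit(&ino->lock, memory_order_acquire))
        ;                    /* spin: every caller contends on one flag */
}

static void toy_unlock(struct toy_inode *ino)
{
    atomic_flag_clear_explicit(&ino->lock, memory_order_release);
}

/* Old-style SEEK_END: readers serialize even though they only read. */
static int64_t size_locked(struct toy_inode *ino)
{
    toy_lock(ino);
    int64_t sz = atomic_load_explicit(&ino->size, memory_order_relaxed);
    toy_unlock(ino);
    return sz;
}

/* Lockless SEEK_END: a single atomic load; readers never wait. */
static int64_t size_lockless(struct toy_inode *ino)
{
    return atomic_load_explicit(&ino->size, memory_order_acquire);
}

/* Writers (file extension) still take the lock among themselves. */
static void extend_by(struct toy_inode *ino, int64_t nbytes)
{
    toy_lock(ino);
    int64_t cur = atomic_load_explicit(&ino->size, memory_order_relaxed);
    atomic_store_explicit(&ino->size, cur + nbytes, memory_order_release);
    toy_unlock(ino);
}
```

In both versions the returned size can be stale as soon as the call returns, because the lock is released before the caller uses the value. Locking the read therefore buys no extra accuracy.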
  • Andres Freund at Oct 28, 2011 at 7:33 pm
    Hi All,

    The lseek patches just got included in Linus's tree.

    Andres
  • Robert Haas at Oct 28, 2011 at 7:40 pm

    On Fri, Oct 28, 2011 at 3:33 PM, Andres Freund wrote:
    > The lseek patches just got included in Linus's tree.

    Excellent, thanks for the update!

    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=ef3d0fd27e90f67e35da516dafc1482c82939a60

    So I guess this will be in Linux 3.2?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andres Freund at Oct 28, 2011 at 8:22 pm
    Hi,
    On Friday, October 28, 2011 09:40:51 PM Robert Haas wrote:
    > On Fri, Oct 28, 2011 at 3:33 PM, Andres Freund wrote:
    >> The lseek patches just got included in Linus's tree.
    > Excellent, thanks for the update!
    >
    > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=ef3d0fd27e90f67e35da516dafc1482c82939a60
    >
    > So I guess this will be in Linux 3.2?

    Unless they get reverted for some reason, yes.

    Andres

Discussion Overview
group: pgsql-hackers @
categories: postgresql
posted: Aug 8, '11 at 2:30p
active: Oct 28, '11 at 8:22p
posts: 14
users: 4
website: postgresql.org...
irc: #postgresql
