Hello

I would like to use a special memory context for shared data (based on
mmap), and I like the implementation of aset. There is only one difference:
aset is based on malloc, and I would like to use mmap.

malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
procedures would have to be replaced, but the other code and data structures
could be reused. This step could also be useful for the earlier discussion
about more comfortable management of shared memory.

What do you think?

Regards

Pavel Stehule

  • Robert Haas at Sep 7, 2010 at 12:10 pm

    On Tue, Sep 7, 2010 at 4:53 AM, Pavel Stehule wrote:
    I would like to use a special memory context for shared data (based on
    mmap), and I like the implementation of aset. There is only one difference:
    aset is based on malloc, and I would like to use mmap.

    malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
    procedures would have to be replaced, but the other code and data structures
    could be reused. This step could also be useful for the earlier discussion
    about more comfortable management of shared memory.

    What do you think?
    What would this be good for?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Pavel Stehule at Sep 7, 2010 at 1:28 pm

    2010/9/7 Robert Haas <robertmhaas@gmail.com>:
    On Tue, Sep 7, 2010 at 4:53 AM, Pavel Stehule wrote:
    I would like to use a special memory context for shared data (based on
    mmap), and I like the implementation of aset. There is only one difference:
    aset is based on malloc, and I would like to use mmap.

    malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
    procedures would have to be replaced, but the other code and data structures
    could be reused. This step could also be useful for the earlier discussion
    about more comfortable management of shared memory.

    What do you think?
    What would this be good for?
    I am trying to solve performance problems with Czech tsearch. I tried
    serialization and deserialization, but this decreases the load time only to
    100 ms (from 500 ms), which is still too much for us. After some
    experimenting with mmap, I think there is a chance to preallocate mmap
    memory and then use a special memory context based on mmap instead of malloc.
    Theoretically I can copy the aset interface (this module will probably never
    be in core, since the problem is probably local to Czech), but that isn't
    nice. So I am asking.

    Regards

    Pavel Stehule

  • Robert Haas at Sep 7, 2010 at 2:13 pm

    On Tue, Sep 7, 2010 at 9:27 AM, Pavel Stehule wrote:
    2010/9/7 Robert Haas <robertmhaas@gmail.com>:
    On Tue, Sep 7, 2010 at 4:53 AM, Pavel Stehule wrote:
    I would like to use a special memory context for shared data (based on
    mmap), and I like the implementation of aset. There is only one difference:
    aset is based on malloc, and I would like to use mmap.

    malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
    procedures would have to be replaced, but the other code and data structures
    could be reused. This step could also be useful for the earlier discussion
    about more comfortable management of shared memory.

    What do you think?
    What would this be good for?
    I am trying to solve performance problems with Czech tsearch. I tried
    serialization and deserialization, but this decreases the load time only to
    100 ms (from 500 ms), which is still too much for us. After some
    experimenting with mmap, I think there is a chance to preallocate mmap
    memory and then use a special memory context based on mmap instead of malloc.
    Theoretically I can copy the aset interface (this module will probably never
    be in core, since the problem is probably local to Czech), but that isn't
    nice. So I am asking.
    I don't see how you could do anything with this that you can't do with
    the existing implementation. It's not as if you can store pointers
    into an mmap'd block and then count on them being valid the next time
    you map the file... it might not end up at the same offset.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Sep 7, 2010 at 2:56 pm

    Robert Haas writes:
    On Tue, Sep 7, 2010 at 9:27 AM, Pavel Stehule wrote:
    I am trying to solve performance problems with Czech tsearch. I tried
    serialization and deserialization, but this decreases the load time only to
    100 ms (from 500 ms), which is still too much for us. After some
    experimenting with mmap, I think there is a chance to preallocate mmap
    memory and then use a special memory context based on mmap instead of malloc.
    Theoretically I can copy the aset interface (this module will probably never
    be in core, since the problem is probably local to Czech), but that isn't
    nice. So I am asking.
    I don't see how you could do anything with this that you can't do with
    the existing implementation. It's not as if you can store pointers
    into an mmap'd block and then count on them being valid the next time
    you map the file... it might not end up at the same offset.
    More to the point, this entire approach to speeding up dictionary loading
    has already been proposed and rejected, and it'll get rejected again if
    it's submitted.

    The conclusion of the previous discussion was that we should build
    "precompiled" dictionaries, using some pointer-free representation,
    which would be stored in files that could be either mmap'd in or just
    read in if running on a platform lacking mmap. There is no need for
    any shmem allocator in that implementation.

    regards, tom lane
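
A minimal sketch of the "precompiled, pointer-free file" idea described above, for illustration only. The file layout (a small header, a table of byte offsets, then the words), the names `DictHeader`, `dict_write`, `dict_load`, `dict_word` and `dict_demo`, and the sample words are all assumptions made up for this example; none of it is PostgreSQL code or an actual tsearch dictionary format. It assumes a POSIX system with a writable /tmp.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() under strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define DICT_MAGIC 0x44494354u      /* arbitrary tag for this toy format */

/* Hypothetical layout: header, then `count` offsets, then the words.
 * Offsets are bytes from the start of the file, so no pointers are stored. */
typedef struct { uint32_t magic; uint32_t count; } DictHeader;

/* The "pre-compiler": flatten a word list into the file format. */
static int dict_write(const char *path, const char *const *words, uint32_t count)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    DictHeader h = { DICT_MAGIC, count };
    uint32_t off = (uint32_t) (sizeof h + count * sizeof(uint32_t));
    int ok = fwrite(&h, sizeof h, 1, f) == 1;
    for (uint32_t i = 0; ok && i < count; i++) {
        ok = fwrite(&off, sizeof off, 1, f) == 1;
        off += (uint32_t) strlen(words[i]) + 1;
    }
    for (uint32_t i = 0; ok && i < count; i++)
        ok = fwrite(words[i], strlen(words[i]) + 1, 1, f) == 1;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Load the file either by mapping it read-only or, as the fallback for
 * platforms without mmap, by reading it into malloc'd memory. */
static void *dict_load(const char *path, int use_mmap, size_t *len)
{
    struct stat st;
    void *p = NULL;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) == 0 && st.st_size >= (off_t) sizeof(DictHeader)) {
        *len = (size_t) st.st_size;
        if (use_mmap) {
            p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED)
                p = NULL;
        } else if ((p = malloc(*len)) != NULL &&
                   read(fd, p, *len) != (ssize_t) *len) {
            free(p);                /* treat a short read as failure */
            p = NULL;
        }
    }
    close(fd);
    return p;
}

/* Lookup works the same way on either buffer: base address plus offset. */
static const char *dict_word(const void *base, uint32_t i)
{
    const DictHeader *h = base;
    const uint32_t *offs = (const uint32_t *) (h + 1);
    assert(base != NULL);
    if (h->magic != DICT_MAGIC || i >= h->count)
        return NULL;
    return (const char *) base + offs[i];
}

/* Round trip: compile three words, load both ways, compare. 0 on success. */
static int dict_demo(void)
{
    char path[] = "/tmp/dictdemoXXXXXX";
    const char *const words[] = { "kniha", "mesto", "voda" };
    size_t mlen = 0, rlen = 0;
    int rc = -1, fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);
    if (dict_write(path, words, 3) == 0) {
        void *m = dict_load(path, 1, &mlen);
        void *r = dict_load(path, 0, &rlen);
        if (m != NULL && r != NULL && mlen == rlen &&
            strcmp(dict_word(m, 2), "voda") == 0 &&
            strcmp(dict_word(r, 0), "kniha") == 0 && dict_word(r, 3) == NULL)
            rc = 0;
        if (m != NULL)
            munmap(m, mlen);
        free(r);
    }
    unlink(path);
    return rc;
}
```

Because the lookup only ever adds an offset to the current base address, it does not matter where the file is mapped or whether it was mapped at all, which is why no shared-memory allocator is involved.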
  • Alvaro Herrera at Sep 7, 2010 at 3:18 pm

    Excerpts from Robert Haas's message of Tue Sep 07 10:13:12 -0400 2010:

    I am trying to solve performance problems with Czech tsearch. I tried
    serialization and deserialization, but this decreases the load time only to
    100 ms (from 500 ms), which is still too much for us. After some
    experimenting with mmap, I think there is a chance to preallocate mmap
    memory and then use a special memory context based on mmap instead of malloc.
    Theoretically I can copy the aset interface (this module will probably never
    be in core, since the problem is probably local to Czech), but that isn't
    nice. So I am asking.
    I don't see how you could do anything with this that you can't do with
    the existing implementation. It's not as if you can store pointers
    into an mmap'd block and then count on them being valid the next time
    you map the file... it might not end up at the same offset.
    Hmm, surely you could store offsets instead of absolute pointers.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
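
To illustrate the suggestion of storing offsets instead of absolute pointers, here is a small sketch. The structures `BlockHeader` and `Node` and the helper names are hypothetical, invented for this example. A `memcpy` into a second buffer stands in for "the file got mapped at a different address next time"; no real mmap is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: links are byte offsets from the start of the block,
 * not absolute pointers. Offset 0 means "no node", which is safe because
 * offset 0 is always occupied by the block header. */
typedef struct {
    uint32_t first;     /* offset of the first node, 0 if the list is empty */
    uint32_t used;      /* bytes used so far, header included */
} BlockHeader;

typedef struct {
    uint32_t next;      /* offset of the next node, 0 at the end */
    int32_t  value;
} Node;

/* Turn an offset into a pointer relative to wherever the block lives now. */
static Node *node_at(void *base, uint32_t off)
{
    return off == 0 ? NULL : (Node *) ((char *) base + off);
}

static void block_init(void *base)
{
    BlockHeader *h = base;
    h->first = 0;
    h->used = (uint32_t) sizeof(BlockHeader);
}

/* Push a value on the front of the list; 0 on success, -1 if the block is full. */
static int block_push(void *base, size_t size, int32_t value)
{
    BlockHeader *h = base;
    if (h->used + sizeof(Node) > size)
        return -1;
    uint32_t off = h->used;
    Node *n = node_at(base, off);
    n->next = h->first;
    n->value = value;
    h->first = off;
    h->used += (uint32_t) sizeof(Node);
    return 0;
}

static long block_sum(void *base)
{
    long sum = 0;
    BlockHeader *h = base;
    for (Node *n = node_at(base, h->first); n != NULL; n = node_at(base, n->next))
        sum += n->value;
    return sum;
}

/* Build a list in one buffer, copy the raw bytes to a different address,
 * wipe the original, and walk the copy. Returns the sum of the values. */
static long offsets_demo(void)
{
    enum { SIZE = 256 };
    void *a = calloc(1, SIZE);
    void *b = calloc(1, SIZE);
    assert(a != NULL && b != NULL);

    block_init(a);
    for (int32_t v = 1; v <= 4; v++)
        block_push(a, SIZE, v);

    memcpy(b, a, SIZE);
    memset(a, 0xFF, SIZE);      /* the original location is no longer valid */
    long sum = block_sum(b);

    free(a);
    free(b);
    return sum;
}
```

With absolute `Node *` links the copy would still point into the wiped buffer; with offsets the list is intact at its new address.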
  • Robert Haas at Sep 7, 2010 at 4:03 pm

    On Tue, Sep 7, 2010 at 11:18 AM, Alvaro Herrera wrote:
    Excerpts from Robert Haas's message of Tue Sep 07 10:13:12 -0400 2010:
    I am trying to solve performance problems with Czech tsearch. I tried
    serialization and deserialization, but this decreases the load time only to
    100 ms (from 500 ms), which is still too much for us. After some
    experimenting with mmap, I think there is a chance to preallocate mmap
    memory and then use a special memory context based on mmap instead of malloc.
    Theoretically I can copy the aset interface (this module will probably never
    be in core, since the problem is probably local to Czech), but that isn't
    nice. So I am asking.
    I don't see how you could do anything with this that you can't do with
    the existing implementation.  It's not as if you can store pointers
    into an mmap'd block and then count on them being valid the next time
    you map the file...  it might not end up at the same offset.
    Hmm, surely you could store offsets instead of absolute pointers.
    Surely you could. But then where does palloc come in? As Tom said
    upthread, the right thing to do here is to create a pre-compiler that
    outputs a pointer-free representation which you can then mmap().

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Pavel Stehule at Sep 7, 2010 at 5:04 pm

    2010/9/7 Robert Haas <robertmhaas@gmail.com>:
    On Tue, Sep 7, 2010 at 9:27 AM, Pavel Stehule wrote:
    2010/9/7 Robert Haas <robertmhaas@gmail.com>:
    On Tue, Sep 7, 2010 at 4:53 AM, Pavel Stehule wrote:
    I would like to use a special memory context for shared data (based on
    mmap), and I like the implementation of aset. There is only one difference:
    aset is based on malloc, and I would like to use mmap.

    malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
    procedures would have to be replaced, but the other code and data structures
    could be reused. This step could also be useful for the earlier discussion
    about more comfortable management of shared memory.

    What do you think?
    What would this be good for?
    I am trying to solve performance problems with Czech tsearch. I tried
    serialization and deserialization, but this decreases the load time only to
    100 ms (from 500 ms), which is still too much for us. After some
    experimenting with mmap, I think there is a chance to preallocate mmap
    memory and then use a special memory context based on mmap instead of malloc.
    Theoretically I can copy the aset interface (this module will probably never
    be in core, since the problem is probably local to Czech), but that isn't
    nice. So I am asking.
    I don't see how you could do anything with this that you can't do with
    the existing implementation.  It's not as if you can store pointers
    into an mmap'd block and then count on them being valid the next time
    you map the file...  it might not end up at the same offset.
    you can, but you have to do preallocation and you have to use a FIXED flag.

    Pavel

  • Robert Haas at Sep 7, 2010 at 6:31 pm

    On Tue, Sep 7, 2010 at 12:44 PM, Pavel Stehule wrote:
    I don't see how you could do anything with this that you can't do with
    the existing implementation.  It's not as if you can store pointers
    into an mmap'd block and then count on them being valid the next time
    you map the file...  it might not end up at the same offset.
    you can, but you have to do preallocation and you have to use a FIXED flag.
    MAP_FIXED? As TFM says: "Because requiring a fixed address for a
    mapping is less portable, the use of this option is discouraged."

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
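
The address question discussed here can be seen in a few lines. Without MAP_FIXED, the address argument of mmap() is only a hint and the kernel picks the placement; with MAP_FIXED the mapping is placed exactly at the requested address, replacing whatever was mapped there. The sketch below (function name `map_twice_demo` and the temp-file path are made up for this example, and a POSIX system with a writable /tmp is assumed) maps one file twice without MAP_FIXED and checks that the two views hold the same bytes at different addresses.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() under strict -std modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if two live mappings of the same file show identical bytes at
 * different addresses, 0 if not, and -1 on a setup error. */
static int map_twice_demo(void)
{
    char path[] = "/tmp/mapdemoXXXXXX";
    const char msg[] = "dictionary data";
    const size_t len = 4096;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file lives on until fd is closed */

    if (ftruncate(fd, (off_t) len) == 0 &&
        write(fd, msg, sizeof msg) == (ssize_t) sizeof msg) {
        /* No MAP_FIXED: the kernel chooses where each mapping goes. */
        void *p1 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        void *p2 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

        if (p1 != MAP_FAILED && p2 != MAP_FAILED) {
            int same_bytes = memcmp(p1, msg, sizeof msg) == 0 &&
                             memcmp(p2, msg, sizeof msg) == 0;
            int different_addr = (p1 != p2);
            result = (same_bytes && different_addr) ? 1 : 0;
        }
        if (p1 != MAP_FAILED)
            munmap(p1, len);
        if (p2 != MAP_FAILED)
            munmap(p2, len);
    }
    close(fd);
    return result;
}
```

Any absolute pointer stored inside the file could be correct for at most one of the two views, which is the reason a pointer-based structure needs either a fixed address or an offset-based layout.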
  • Pavel Stehule at Sep 7, 2010 at 6:36 pm

    2010/9/7 Robert Haas <robertmhaas@gmail.com>:
    On Tue, Sep 7, 2010 at 12:44 PM, Pavel Stehule wrote:
    I don't see how you could do anything with this that you can't do with
    the existing implementation.  It's not as if you can store pointers
    into an mmap'd block and then count on them being valid the next time
    you map the file...  it might not end up at the same offset.
    you can, but you have to do preallocation and you have to use a FIXED flag.
    MAP_FIXED?  As TFM says: "Because requiring a fixed address for a
    mapping is less portable, the use of this option  is  discouraged."
    Yes, I know. This will be used specifically for the Czech language - 95% of
    PostgreSQL installations are on Linux, 10% on MS Windows (in the Czech
    Republic).

    I don't plan to try to move this module to core. It would be useless
    there anyway: other languages do not have our problems.

    Regards

    Pavel
  • Peter Eisentraut at Sep 16, 2010 at 6:26 pm

    On Tue, 2010-09-07 at 20:35 +0200, Pavel Stehule wrote:
    I don't plan to try to move this module to core. It would be useless
    there anyway: other languages do not have our problems.
    I don't know the details of what you're struggling with, but it's a bit
    hard to believe that there is a problem that is absolutely unique to the
    Czech language.
  • Pavel Stehule at Sep 16, 2010 at 6:44 pm

    2010/9/16 Peter Eisentraut <peter_e@gmx.net>:
    On Tue, 2010-09-07 at 20:35 +0200, Pavel Stehule wrote:
    I don't plan to try to move this module to core. It would be useless
    there anyway: other languages do not have our problems.
    I don't know the details of what you're struggling with, but it's a bit
    hard to believe that there is a problem that is absolutely unique to the
    Czech language.
    I think people use a stemmer dictionary, because an ispell
    dictionary is probably slow for any language. But there is no
    stemmer available for the Czech language. People who need fast processing
    just use a simple dictionary, and there are probably no pg
    hackers from Poland or Slovakia.

    Regards

    Pavel



  • David Fetter at Sep 16, 2010 at 9:52 pm

    On Thu, Sep 16, 2010 at 08:43:37PM +0200, Pavel Stehule wrote:
    2010/9/16 Peter Eisentraut <peter_e@gmx.net>:
    On Tue, 2010-09-07 at 20:35 +0200, Pavel Stehule wrote:
    I don't plan to try to move this module to core. It would be useless
    there anyway: other languages do not have our problems.
    I don't know the details of what you're struggling with, but it's
    a bit hard to believe that there is a problem that is absolutely
    unique to the Czech language.
    I think people use a stemmer dictionary, because an ispell
    dictionary is probably slow for any language. But there is no
    stemmer available for the Czech language. People who need fast
    processing just use a simple dictionary, and there are probably no
    pg hackers from Poland or Slovakia.
    I know of at least one in Poland, and I'd be amazed if there were none
    from Slovakia.

    Cheers,
    David.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Tom Lane at Sep 7, 2010 at 2:40 pm

    Pavel Stehule writes:
    I would like to use a special memory context for shared data (based on
    mmap), and I like the implementation of aset. There is only one difference:
    aset is based on malloc, and I would like to use mmap.
    malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
    procedures would have to be replaced, but the other code and data structures
    could be reused. This step could also be useful for the earlier discussion
    about more comfortable management of shared memory.
    What do you think?
    If you're proposing factoring aset.c into two levels, I don't think so.
    That code is already a tremendous performance hot-spot and introducing
    any more inefficiency into it doesn't seem like a good idea. Especially
    not for shared memory allocation, which is a feature that still has
    no buy-in. Also, you'd need to do more than just replace malloc: you'd
    need to add locking capability. That would make the code even uglier,
    and slower, if it has to support locking or no locking dynamically.

    Use the mcxt.c switch. That's what it's there for.

    regards, tom lane
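
The "mcxt.c switch" refers to the way memory-context operations are dispatched through a per-context table of function pointers, so a new context type can be added without changing aset.c. The toy below only illustrates that dispatch pattern. `ToyContext`, `ToyMethods` and every function in it are hypothetical names invented for this sketch and are not PostgreSQL's definitions; a static buffer stands in for a preallocated (for example mmap'd) region, and the locking that shared memory would need is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyContext ToyContext;

/* Hypothetical method table: the operations each context type must supply. */
typedef struct {
    void *(*alloc)(ToyContext *cx, size_t size);
    void  (*reset)(ToyContext *cx);
} ToyMethods;

/* Header put in front of each malloc'd chunk so reset can find and free it. */
typedef union ChunkHeader {
    union ChunkHeader *next;
    max_align_t        align;   /* keeps the payload suitably aligned */
} ChunkHeader;

struct ToyContext {
    const ToyMethods *methods;
    ChunkHeader *chunks;    /* malloc type: list of live chunks */
    char        *region;    /* arena type: caller-supplied memory */
    size_t       size, used;
};

/* Generic entry points: the only "switch" is the method pointer. */
static void *toy_alloc(ToyContext *cx, size_t size) { return cx->methods->alloc(cx, size); }
static void  toy_reset(ToyContext *cx) { cx->methods->reset(cx); }

static void *malloc_alloc(ToyContext *cx, size_t size)
{
    ChunkHeader *c = malloc(sizeof *c + size);
    if (c == NULL)
        return NULL;
    c->next = cx->chunks;
    cx->chunks = c;
    return c + 1;
}

static void malloc_reset(ToyContext *cx)
{
    while (cx->chunks != NULL) {
        ChunkHeader *next = cx->chunks->next;
        free(cx->chunks);
        cx->chunks = next;
    }
}

/* Bump allocator over a fixed region, which could be an mmap'd block. */
static void *arena_alloc(ToyContext *cx, size_t size)
{
    size_t a = sizeof(max_align_t);
    size_t need = (size + a - 1) / a * a;   /* round up to keep alignment */
    if (need > cx->size - cx->used)
        return NULL;
    void *p = cx->region + cx->used;
    cx->used += need;
    return p;
}

static void arena_reset(ToyContext *cx) { cx->used = 0; }

static const ToyMethods malloc_methods = { malloc_alloc, malloc_reset };
static const ToyMethods arena_methods  = { arena_alloc, arena_reset };

/* The same caller code runs against both context types. Returns 0 on success. */
static int toy_demo(void)
{
    static max_align_t backing[64];
    ToyContext m = { &malloc_methods, NULL, NULL, 0, 0 };
    ToyContext a = { &arena_methods, NULL, (char *) backing, sizeof backing, 0 };
    ToyContext *all[] = { &m, &a };
    int rc = 0;

    for (int i = 0; i < 2; i++) {
        char *s = toy_alloc(all[i], 6);
        if (s == NULL) { rc = -1; continue; }
        memcpy(s, "hello", 6);
        if (strcmp(s, "hello") != 0)
            rc = -1;
    }
    /* The arena allocation must have come out of the supplied region. */
    if (a.used == 0 || a.used > a.size)
        rc = -1;
    toy_reset(&m);
    toy_reset(&a);
    assert(m.chunks == NULL && a.used == 0);
    return rc;
}
```

The point of the pattern is that the malloc-backed code path is untouched: the second context type brings its own functions instead of adding a branch inside the first one.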

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Sep 7, '10 at 8:54a
active: Sep 16, '10 at 9:52p
posts: 14
users: 6
website: postgresql.org...
irc: #postgresql
