
On Wed, Mar 31, 2010 at 7:43 AM, Ask Bjørn Hansen wrote:
The main point here is that we can't use 20 inodes per distribution.
so don't. How much reengineering would be needed to keep CPAN in a
database instead of a file system?


  • David Precious at Apr 1, 2010 at 9:50 am

    On Thursday 01 April 2010 05:39:27 David Nicol wrote:
    On Wed, Mar 31, 2010 at 7:43 AM, Ask Bjørn Hansen wrote:
    The main point here is that we can't use 20 inodes per distribution.
    so don't. How much reengineering would be needed to keep CPAN in a
    database instead of a file system?
    It'd mean each and every mirror operator changing how they sync their mirrors,
    and how access is provided...

    Currently, it's dead simple to sync a copy of CPAN via rsync, offer it up via
    whatever combination of HTTP, FTP and rsync you prefer, and job done - you're
    doing a valuable public service by offering a CPAN mirror.

    Make that process a lot harder (setting up database replication, custom
    scripts, etc etc) and a lot of people just won't do it.

    There's a lot to be said for keeping things simple.

    (FWIW, I run mirrors.uk2.net, and appreciated the fact it was simple and easy
    to get a mirror up and running without investing much time at all.
    Personally, I have no real problem with the current size of CPAN or the
    overhead of updating via rsync, but that's just my opinion.)

    Cheers

    Dave P
  • Eakin, Lee at Apr 1, 2010 at 4:16 pm
    Much of this discussion is beyond my depth but in terms of keeping it
    simple, and trying to limit the stat calls on the upstream servers,
    what about DNS as a replication model? You could break up the tree at
    logical divisions similar to zones and assign them serial numbers
    (say a .serial file) and then still use rsync, but broken up into modules to
    avoid recursion into sub-trees where the serial number is up to date?
    The rsyncd.conf could be published also so replicas use the same
    include/exclude logic.
    -lee
  • Arthur Corliss at Apr 1, 2010 at 5:49 pm
  • Ask Bjørn Hansen at Apr 1, 2010 at 10:58 pm

    On Apr 1, 2010, at 19:49, Arthur Corliss wrote:

    I've made a viable suggestion, and offered some time to work on it. But
    you've made it abundantly clear that it's not welcome.
    Talk = ZzZz.
    Code = Interesting.
    Deployment = Useful.


    - ask
  • Arthur Corliss at Apr 1, 2010 at 11:55 pm
  • Ask Bjørn Hansen at Apr 1, 2010 at 11:13 pm
    On Apr 1, 2010, at 19:49, Arthur Corliss wrote:

    I can't believe I'm doing this, but ...
    The main point here is that we can't use 20 inodes per distribution. It's Just Nuts. Sure, it's only something like 400k files/inodes now - but at the rate it's going it'll be a lot more soon enough.
    That's a problem, but not likely the biggest drag on server I/O you're
    suffering. Might that be <ahem> rsync?
    That reply doesn't even make sense.
    HOWEVER: Right now more of those are wasted on other things (.readme files, symlinks, ...) -- some of which have solutions in progress already.

    I don't think anyone is arguing that we NEED to delete the old distributions; only that they do indeed have a cost to keep around in the main CPAN.
    You're right, I'm not arguing the need for the cruft. I've only pointed out
    the obvious reality that trimming files only postpones the I/O management
    issues that at some time are likely going to have to be addressed, anyway.
    And that you'll get less bang for the buck (or man hour) by treating the
    symptoms, not the disease.

    For the record: if that's what you want to do, have at it. Let's just not
    be disingenuous about the fact that we're abrogating our responsibilities as
    technologists by refusing to address the real problems and weaknesses of the
    platform.
    You are confusing "we", "I" and "you" again.

    ....

    Yes, I (and I'm guessing everyone else who has thought about it for more than, say, 5 seconds) agree that having rsync remember the file tree to save the disk I/O for each sync sounds like an "obvious solution".

    But reality is more complicated. If it were such an obviously good solution, someone would have done it by now. (For starters, consider this question: "What is the kernel cache?")

    Andreas' solution is much more sensible -- and as has been pointed out before, we DO USE THAT; but the problem here is not with clients who are interested enough to do something special and dedicate resources to their CPAN mirroring.


    - ask
  • Arthur Corliss at Apr 1, 2010 at 11:50 pm
  • Ask Bjørn Hansen at Apr 2, 2010 at 2:37 pm

    On Apr 2, 2010, at 1:50, Arthur Corliss wrote:

    And my assertion has been that the excessive stats by the server are a bigger
    impediment to synchronization than the inode count.
    Well, then one of us doesn't understand how file systems etc. work. :-)


    - ask
  • Arthur Corliss at Apr 2, 2010 at 5:10 pm
  • David Nicol at Apr 4, 2010 at 9:11 pm

    It hasn't been done because it's outside the scope of design for rsync.
    It's meant to sync arbitrary filesets in which many, if not all, changes are
    made out of band.  It's decidedly non-trivial to implement in that mode
    unless you're willing to accept a certain window in which your database may
    be out of date.

    But, in a situation like PAUSE, where the avenues by which files can be
    introduced into the file sets are controlled, it does become trivial.  It's
    the gatekeeper, it knows who's been in or out.
    so the requirements for the Solution To The Problem Which Solves A
    More General Problem Than The Immediate Problem And Will Therefore
    Make Whoever Sets It Up A Hero include a replacement for the current
    mirroring technology stack that is tailored to mirroring distributions
    possibly including on-demand caching and expiration and that is
    trivial to install -- something like

    perl -MCPAN -e 'install STTPWSAMGPTTIPAWTMWSIUAH::Mirrorsuite'
    nohup nice nice perl -MSTTPWSAMGPTTIPAWTMWSIUAH::Mirrorsuite -e 'mirror cpan.org .' &
  • Arthur Corliss at Apr 5, 2010 at 4:24 pm

    On Sun, 4 Apr 2010, David Nicol wrote:

    so the requirements for the Solution To The Problem Which Solves A
    More General Problem Than The Immediate Problem And Will Therefore
    Make Whoever Sets It Up A Hero include a replacement for the current
    mirroring technology stack that is tailored to mirroring distributions
    possibly including on-demand caching and expiration and that is
    trivial to install -- something like

    perl -MCPAN -e 'install STTPWSAMGPTTIPAWTMWSIUAH::Mirrorsuite'
    nohup nice nice perl -MSTTPWSAMGPTTIPAWTMWSIUAH::Mirrorsuite -e 'mirror cpan.org .' &
    Gee, kind of looks like your tongue got superglued to your cheek. You're
    mischaracterizing the problem. The immediate problem *is* the I/O load
    caused by synchronizing mirrors with rsync, *not* supporting CPAN clients,
    right? If you have data indicating something different, then please provide
    it so we can all get educated.

    Regardless, it should be that easy to install, but it should also install a
    script into bin/ to make ye ole cron job just as succinct as what's
    currently being used with rsync.

    --Arthur Corliss
    Live Free or Die
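
Lee Eakin's DNS-style suggestion earlier in the thread can be sketched roughly as follows. This is a minimal sketch, not anything CPAN or rsync implements: the `.serial` file comes from the post, but the zone names, the `zones_to_sync` function, and the use of plain integer serials are assumptions made for illustration. A mirror compares the serial it last saw for each subtree with the upstream serial and only recurses into subtrees that are behind.

```python
from typing import Dict, List


def zones_to_sync(upstream: Dict[str, int], local: Dict[str, int]) -> List[str]:
    """Return the subtrees ("zones") whose upstream serial is newer than ours.

    Both arguments map a hypothetical zone name (for example "authors/id/A")
    to the integer read from that zone's .serial file. A zone missing from
    `local` has never been synced, so it is always included.
    """
    stale = []
    for zone, serial in sorted(upstream.items()):
        if local.get(zone, -1) < serial:
            stale.append(zone)
    return stale


# Hypothetical serials, as if read from each zone's .serial file.
upstream = {"authors/id/A": 12, "authors/id/B": 7, "modules": 30}
local = {"authors/id/A": 12, "authors/id/B": 5}

print(zones_to_sync(upstream, local))
```

Only the zones returned here would then be handed to rsync, so a subtree that is already current costs one small file read rather than a stat of every file beneath it.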

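Arthur Corliss's point that PAUSE, as the gatekeeper, already knows which files have come in or out can be sketched as an append-only change journal. This is a sketch under stated assumptions, not a description of how PAUSE works: the `Journal` class, its method names, and the sample paths are all hypothetical. The upstream records each add or delete with an increasing serial, and a mirror asks only for entries newer than the last serial it applied, so neither side has to walk the file tree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Entry = Tuple[int, str, str]  # (serial, operation, path)


@dataclass
class Journal:
    """Append-only log of file changes kept by the upstream gatekeeper."""

    entries: List[Entry] = field(default_factory=list)

    def record(self, operation: str, path: str) -> int:
        """Append one change ("add" or "delete") and return its serial."""
        if operation not in ("add", "delete"):
            raise ValueError(f"unknown operation: {operation}")
        serial = len(self.entries) + 1
        self.entries.append((serial, operation, path))
        return serial

    def since(self, last_seen: int) -> List[Entry]:
        """Return every entry with a serial greater than last_seen."""
        # Serials are 1, 2, 3, ... in list order, so slicing is enough.
        return self.entries[max(last_seen, 0):]


journal = Journal()
journal.record("add", "authors/id/X/XAMPLE/Foo-1.0.tar.gz")
journal.record("add", "authors/id/X/XAMPLE/Foo-1.1.tar.gz")
journal.record("delete", "authors/id/X/XAMPLE/Foo-1.0.tar.gz")

# A mirror that has already applied serial 1 only needs the later entries.
for serial, operation, path in journal.since(1):
    print(serial, operation, path)
```

The trade-off is the one raised in the thread: this only works when every change goes through the gatekeeper, since anything modified out of band never reaches the journal.
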
Discussion Overview
group: cpan-workers @
categories: perl
posted: Apr 1, '10 at 4:39a
active: Apr 5, '10 at 4:24p
posts: 12
users: 5
website: cpan.org
