FAQ
I've cloned the cpanpm repo and played with CPAN::Mirrors to make it a
bit more useful for things outside of CPAN.pm. Most of its use in
CPAN.pm is to support urllist in CPAN::FirstTime, but I want to use it
to also update the urllist as I travel, and create other interesting
applications for it.

I haven't merged anything, but you can look at the firsttime branch:

https://github.com/briandfoy/cpanpm/blob/firsttime/lib/CPAN/Mirrors.pm

I have some other refactorings to do, but I don't want to drop them all
on you at once.

And, who makes the MIRRORED.BY file? I imagine that's something from a
script that Jarkko makes, but how does it get the data? I'd like to see
about exporting it as JSON or something. Also, is it something that
noc.perl.org has to handle as the master CPAN moves off of FUNET?

--
brian d foy <brian.d.foy@gmail.com>


  • Ask Bjørn Hansen at Feb 8, 2011 at 7:31 am

    On Feb 7, 2011, at 22:55, brian d foy wrote:

    And, who makes the MIRRORED.BY file? I imagine that's something from a
    script that Jarkko makes, but how does it get the data?
    Henk Penning (aka the mirror list master for the past few months) maintains a master mirrors.json file that a script on the perl.org servers converts to MIRRORED.BY.
    I'd like to see about exporting it as JSON or something.
    Henk and I were just talking about adding a mirrors.json file a few days ago actually... So yes, coming soon.
    Also, is it something that noc.perl.org has to handle as the master CPAN moves off of FUNET?
    Already done; just not announced yet.


    - ask
  • Adam Kennedy at Feb 8, 2011 at 10:44 am
    I think I may have implemented what you're looking for several years
    ago for JSAN, which has a client that auto-detected appropriate
    mirrors in a few seconds each time it starts.

    http://search.cpan.org/~adamk/Mirror-URI-0.90/lib/Mirror/YAML.pm

    Or at least, something similar to it.

    It autodetects mirrors, can validate them for both speed and
    staleness, and doesn't give a crap about where they or you are
    physically in the world.

    It auto-updates to new master servers, and is resistant to mirror/repo
    hijacking, and it can (with a tweak to support an async library) do it
    in parallel.

    Adam K

  • Brian d foy at Apr 28, 2011 at 4:19 pm

    On Tue, Feb 8, 2011 at 4:44 AM, Adam Kennedy wrote:
    I think I may have implemented what you're looking for several years
    ago for JSAN, which has a client that auto-detected appropriate
    mirrors in a few seconds each time it starts.

    http://search.cpan.org/~adamk/Mirror-URI-0.90/lib/Mirror/YAML.pm
    I was looking at this, but it seems like the idea of downloading a
    small file from several mirrors isn't a good way to figure out which
    mirrors to use, especially with a large number of mirrors.

    I guess you could randomly choose some mirrors and keep checking until
    you find some that are fast enough.
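A minimal Python sketch of that random-probing idea. It assumes a caller-supplied `probe` function (hypothetical, not part of CPAN::Mirrors or Mirror::YAML) that returns how many seconds a small request to a mirror took, or `None` on failure; the mirror URLs and latencies are made up for illustration.

```python
import random
from typing import Callable, Optional, Sequence


def find_fast_enough(
    mirrors: Sequence[str],
    probe: Callable[[str], Optional[float]],
    threshold: float,
    max_tries: int = 10,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Probe mirrors in random order; return the first one that answers
    within `threshold` seconds, or None after `max_tries` attempts."""
    rng = rng or random.Random()
    candidates = list(mirrors)
    rng.shuffle(candidates)  # random order, each mirror tried at most once
    for url in candidates[:max_tries]:
        elapsed = probe(url)
        if elapsed is not None and elapsed <= threshold:
            return url
    return None


# Made-up latencies standing in for real probes; None means the probe failed.
latency = {
    "http://a.example/CPAN/": 2.5,
    "http://b.example/CPAN/": 0.3,
    "http://c.example/CPAN/": None,
}
print(find_fast_enough(list(latency), latency.get, threshold=1.0))
```

With several hundred mirrors, `max_tries` is what keeps this from turning into "contact every mirror".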

    However, shouldn't knowing something about the location let us start that
    search more quickly when there are several hundred mirrors?
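One way to use location first, sketched in Python: rank mirrors as same country, then same continent, then everywhere else, and probe (or simply pick) from the front of that list. This assumes each mirror record carries `cc` and `continent` fields like the mirrors.cpan.org JSON entry quoted later in this thread; the function name and every host except ftp.wa.co.za are hypothetical.

```python
from typing import Dict, List


def nearby_mirrors(mirrors: List[Dict[str, str]], cc: str, continent: str) -> List[str]:
    """Order mirror URLs by rough proximity: same country, same continent, rest.

    Python's sort is stable, so mirrors keep their original order within a group.
    """
    cc = cc.lower()

    def rank(mirror: Dict[str, str]) -> int:
        if (mirror.get("cc") or "").lower() == cc:
            return 0  # same country
        if mirror.get("continent") == continent:
            return 1  # same continent
        return 2  # anywhere else

    return [m["url"] for m in sorted(mirrors, key=rank)]


mirrors = [
    {"url": "http://us.example/CPAN/", "cc": "us", "continent": "North America"},
    {"url": "http://ftp.wa.co.za/pub/CPAN/", "cc": "za", "continent": "Africa"},
    {"url": "http://de.example/CPAN/", "cc": "de", "continent": "Europe"},
    {"url": "http://nl.example/CPAN/", "cc": "nl", "continent": "Europe"},
]
print(nearby_mirrors(mirrors, "NL", "Europe"))
```

Ranking only narrows the starting point; a client could still probe the first few entries to confirm they respond.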

    --
    brian d foy <brian.d.foy@gmail.com>
    http://www.pair.com/~comdog/
  • Graham Barr at Apr 28, 2011 at 4:28 pm

    On Apr 28, 2011, at 11:19, brian d foy wrote:
    However, shouldn't knowing something about the location let us start that
    search more quickly when there are several hundred mirrors?
    No need for everyone to contact all mirrors.

    http://mirrors.cpan.org/ constantly monitors CPAN mirrors for freshness. This data can be obtained in JSON form at

    http://mirrors.cpan.org/cpan-json.txt

    For example, here is one entry:

    {
    "url" : "http://ftp.wa.co.za/pub/CPAN/",
    "city" : "Cape Town",
    "region" : null,
    "country" : "South Africa",
    "continent" : "Africa",
    "cc" : "za",
    "age" : "1303986001",
    "last_status" : "ok",
    "last_ok_probe" : "1304001901"
    },

    It checks each mirror by fetching a file that is constantly updated on the CPAN master site. The age field is the epoch timestamp for that file.
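A small Python sketch of how a client might use that data: keep only mirrors whose last probe was `ok` and whose `age` timestamp is within some tolerance of now. It assumes the file is a JSON array of objects shaped like the entry above (the trailing comma suggests one element of a list); the six-hour tolerance, the second "stale" entry, and the function name are illustrative assumptions. Fetching the URL is left out so the example runs offline.

```python
import json
import time
from typing import List, Optional


def fresh_mirrors(json_text: str, max_lag: int = 6 * 3600,
                  now: Optional[float] = None) -> List[str]:
    """Return URLs of mirrors whose last probe succeeded and whose copy of the
    master timestamp file is at most `max_lag` seconds old."""
    now = time.time() if now is None else now
    fresh = []
    for entry in json.loads(json_text):
        if entry.get("last_status") != "ok":
            continue
        age = entry.get("age")
        if age is None:
            continue
        # "age" is an epoch timestamp stored as a string in the sample entry.
        if now - int(age) <= max_lag:
            fresh.append(entry["url"])
    return fresh


sample = '''[
  {"url": "http://ftp.wa.co.za/pub/CPAN/", "cc": "za", "continent": "Africa",
   "age": "1303986001", "last_status": "ok", "last_ok_probe": "1304001901"},
  {"url": "http://stale.example/CPAN/", "cc": "de", "continent": "Europe",
   "age": "1303800000", "last_status": "ok", "last_ok_probe": "1304001901"}
]'''
print(fresh_mirrors(sample, now=1304001901))
```

At the sample's own `last_ok_probe` time, the Cape Town mirror's copy is about 4.4 hours old and passes, while the hypothetical second mirror is more than two days behind and is dropped.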

    Graham.
  • Brian d foy at Apr 28, 2011 at 4:32 pm

    However, shouldn't knowing something about the location let us start that
    search more quickly when there are several hundred mirrors?
    No need for everyone to contact all mirrors.
    You're talking about something different from Adam's Mirror::YAML,
    which doesn't carry location information. The data we have now,
    which is what you're talking about, seems to work better and is what
    CPAN::Mirrors already uses, although not in JSON form yet.

    --
    brian d foy <brian.d.foy@gmail.com>
    http://www.pair.com/~comdog/
  • Ask Bjørn Hansen at Apr 29, 2011 at 10:58 am

    On Apr 28, 2011, at 9:19, brian d foy wrote:

    I think I may have implemented what you're looking for several years
    ago for JSAN, which has a client that auto-detected appropriate
    mirrors in a few seconds each time it starts.

    http://search.cpan.org/~adamk/Mirror-URI-0.90/lib/Mirror/YAML.pm
    I was looking at this, but it seems like the idea of downloading a
    small file from several mirrors isn't a good way to figure out which
    mirrors to use, especially with a large number of mirrors.
    What's the goal here?

    "Faster" is sorta dumb, really. Few files on CPAN are big enough that checking for a "faster" mirror takes less time than just getting the file from a slower mirror.

    If it's to find a good/up-to-date mirror, then there are a couple of JSON files available (on CPAN and the mirrors.cpan.org server).

    I'll talk to Henk about getting the mirrors.json file - http://www.cpan.org/indices/mirrors.json - to include an "is this mirror good?" flag of sorts.
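The trade-off can be made concrete with a back-of-the-envelope comparison: probing only pays off when the download time it saves exceeds the time the probes themselves take. A Python sketch, with all sizes, bandwidths, and probe costs made up for illustration; it assumes the probes run one after another (Adam notes Mirror::YAML could probe in parallel, which would shrink the cost).

```python
def seconds_saved(total_bytes: float, slow_bps: float, fast_bps: float) -> float:
    """Download time saved by using the fast mirror instead of the slow one."""
    return total_bytes / slow_bps - total_bytes / fast_bps


def probing_pays_off(total_bytes: float, slow_bps: float, fast_bps: float,
                     n_probes: int, probe_seconds: float) -> bool:
    """True if the time saved outweighs the cost of sequential probes."""
    probe_cost = n_probes * probe_seconds
    return seconds_saved(total_bytes, slow_bps, fast_bps) > probe_cost


# Hypothetical numbers: slow mirror 200 kB/s, fast mirror 1 MB/s,
# five sequential probes at half a second each (2.5 s of probing).
tarball = probing_pays_off(50_000, 200_000, 1_000_000, n_probes=5, probe_seconds=0.5)
bulk = probing_pays_off(1_000_000_000, 200_000, 1_000_000, n_probes=5, probe_seconds=0.5)
print(tarball, bulk)  # False True
```

For a single 50 kB tarball the probes cost more than ten times what they save; for a hypothetical 1 GB bulk sync, closer to the minicpan case Adam raises in his reply, they are easily worth it.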


    - ask
  • Adam Kennedy at Apr 29, 2011 at 3:58 pm
    It makes a bigger difference for minicpan than for cpan itself.

    But "fast enough" is important, and "fast enough" is relative to the network.

    Adam K
  • Ask Bjørn Hansen at May 6, 2011 at 6:34 am

    On May 2, 2011, at 12:54, David Golden wrote:

    Doesn't a lot of this problem go away with www.cpan.org resolving to
    tier 1 mirrors? Did I see that Robert/Ask were using some sort of
    GeoIP-aware DNS?
    Yeah, though for now only with mirrors in Los Angeles and one in Europe. Somewhat similar to the search.cpan.org setup.

    (And everyone mirroring from www.cpan.org would sorta defeat the purpose of having a bunch of mirrors -- unless we put more mirrors "behind" the www.cpan.org name.).


    - ask
  • Perl at May 7, 2011 at 6:33 am

    On 05/05/2011 11:34 PM, Ask Bjørn Hansen wrote:
    (And everyone mirroring from www.cpan.org would sorta defeat the purpose of having a bunch of mirrors -- unless we put more mirrors "behind" the www.cpan.org name.).
    Hmm, while reading this I was reminded of the pool.ntp.org concept, is
    it applicable here? :)

    --
    ~Apocalypse ( APOCAL )
  • Ask Bjørn Hansen at May 10, 2011 at 5:00 am

    On May 6, 2011, at 23:33, perl@0ne.us wrote:

    (And everyone mirroring from www.cpan.org would sorta defeat the purpose of having a bunch of mirrors -- unless we put more mirrors "behind" the www.cpan.org name.).
    Hmm, while reading this I was reminded of the pool.ntp.org concept, is
    it applicable here? :)

    Yes, search.cpan.org is using the same software as the NTP Pool for DNS.

    However, http://www.cpan.org/ doesn't get enough requests for bandwidth to be a concern, so it's really just about making it (very slightly) faster or maybe trading complexity to get it a little bit more reliable. The benefits are not really obvious enough to make it to the top of my todo list.

    Optimizing/distributing cpan-rsync.perl.org will probably come first, but that has also slipped down the priority list a bit.

    Just looking at the logs for today, there are about 60 mirrors using rrr and about the same number doing an occasional full rsync (between once a day and every few hours), and the load is basically completely negligible so far.

    'rrr' drastically cuts down the I/O required, and the SSD that's serving the CPAN data can just do a crazy amount of "rsync type I/O".


    - ask

    --
    Ask Bjørn Hansen, http://askask.com/

Discussion Overview
group: cpan-workers @
categories: perl
posted: Feb 8, '11 at 6:55a
active: May 10, '11 at 5:00a
posts: 11
users: 5
website: cpan.org
