On Sat, Feb 7, 2015 at 3:44 PM, Stephen John Smoogen wrote:
On 7 February 2015 at 08:12, Tim Verhoeven wrote:


I've been thinking a bit about this. The best solution IMHO, besides
building your own CDN, which is indeed a bit over the top for this, is
to push these updates instead of working with a pull method. So would
it be possible to find some mirrors that would allow us to push
packages into our repo's on their servers? In case of releases that
need to go out quickly we could use a separate mirrorlist that only
includes our servers and the mirrors that allow us to push to. So we
can move the needed packages out quickly and let users get them fast.
Later, as the other mirrors sync up, we just go back to the normal
mirrorlist.

Stupid idea or not?
I don't think it is "stupid", but it is overly simplified. Just going off of
the EPEL check-ins to the mirrorlist, there are at least 400k-600k active
systems checking hourly for updates, and all of them will come looking for an
emergency update. The set of mirrors willing to accept a push would have to be
large enough to absorb the thundering herd when an update lands and 500k
systems check in at ten past the hour (which seems to be a common time for
boxes that check in hourly), all see there is a new update, and start pulling
it at once.
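One standard way to blunt that herd, much like the random-wait knobs shipped with yum-cron, is a per-host splay before each check-in. A minimal sketch (the hashing scheme and window size here are illustrative, not anything EPEL actually mandates):

```python
import hashlib

def splay_delay(host_id: str, window_seconds: int = 3600) -> int:
    """Pick a stable per-host slot in [0, window_seconds) by hashing the
    host id, so hourly check-ins spread evenly across the hour instead
    of all landing at ten past."""
    digest = hashlib.sha256(host_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % window_seconds

# A client would sleep for its slot before checking for updates, e.g.:
#   time.sleep(splay_delay(socket.getfqdn()))
```

Because the slot is derived from the hostname rather than drawn fresh each hour, every box keeps a consistent check-in time while the fleet as a whole stays spread out.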

There are approaches that could make it more effective. One of them is
an inventory-based update mechanism: a small server-side flag, exposed
to clients, that reports changes in the repository. Clients could poll
that flag cheaply and fetch new files and repodata only when it
changes, which would be far more efficient for many sites.

One of the subtler difficulties, and this is being ignored by the
Fedora migration to dnf, is the cost of the metadata updates. The
repodata alone is over 500 MBytes for CentOS 7. This is *insane* to
keep transmitting for every micro-update or critical update. Scaled
out across a bulky local cluster, simply running "yum check-update"
can saturate your bandwidth, and it has done so for me. That's why I
use local mirrors when possible. But then, hey, my local mirror has to
pull these alerts *all the time*, which puts it in a constant state of
churn for the repository information. It gets out of hand very
quickly.
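For anyone wanting to point clients at a local mirror as described above, a .repo fragment is enough; mirror.example.lan is a placeholder hostname, and the long metadata_expire is one way to cut down how often clients re-fetch the bulky repodata:

```ini
# /etc/yum.repos.d/local-mirror.repo -- hypothetical local mirror setup
[base-local]
name=CentOS-$releasever - Base (local mirror)
baseurl=http://mirror.example.lan/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
metadata_expire=6h
```

The trade-off is exactly the one described: the clients get cheap, fast pulls, while the mirror box absorbs the constant repodata churn on their behalf.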

The underlying solution to the bulky repodata is to *stop using
monolithic repodata*. Switch to much, much lighter-weight repodata,
stop trying to invent new, bulky, confusing features such as
"Recommends", and concentrate on splitting it up much like "apt"
splits its repositories: one package, one small header file. When a
package updates, update *that* header file instead of a monolithic
database.
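The win from that per-package split can be sketched as a simple index diff, where the client refetches only the small header files whose checksums moved (package names and checksums below are made up for illustration):

```python
def changed_packages(local_index: dict, remote_index: dict) -> list:
    """Return package names whose per-package header files need
    refetching: packages that are new or whose checksum changed.
    Transfer then scales with the number of changed packages, not
    with the size of the whole repository's metadata."""
    return sorted(
        name for name, checksum in remote_index.items()
        if local_index.get(name) != checksum
    )
```

For a critical update touching a handful of packages, a client would pull a handful of small headers instead of hundreds of megabytes of monolithic repodata.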

I realize that's not going to happen right now: too much work has been
invested in yum and dnf as they exist. But it's worth keeping in mind
that the current design attaches a half-Gig transmission cost to *any*
repository update of the main OS repositories.

Discussion overview: centos-devel, posted Feb 3, '15 at 1:38p, last active Feb 8, '15 at 11:08a.