Tom and, I believe, also Andrew have expressed some concerns about the
space that will be taken up by having multiple copies of the git
repository on their systems. While most users can probably get by
with a single repository, committers will likely need one for each
back-branch that they work with, and we have quite a few of those.

After playing around with this a bit, I've come to the conclusion that
there are a couple of possible options but they've all got some
drawbacks.

1. Clone the origin. Then, clone the clone n times locally. This
uses hard links, so it saves disk space. But, every time you want to
pull, you first have to pull to the "main" clone, and then to each of
the "slave" clones. And same thing when you want to push. (See the
sketch after this list.)

2. Clone the origin n times. Use more disk space. Live with it. :-)

3. Clone the origin once. Apply patches to multiple branches by
switching branches. Playing around with it, this is probably a
tolerable way to work when you're only going back one or two branches
but it's certainly a big nuisance when you're going back 5-7 branches.

4. Clone the origin. Use that to get at the master branch. Then
clone that clone n-1 times, one for each back-branch. This makes it a
bit easier to push and pull when you're only dealing with the master
branch, but you still have the double push/double pull problem for all
the other branches.

5. Use git clone --shared, git clone --reference, or
git-new-workdir. While I once thought this was the solution, I can't
take very seriously any solution that has a warning in the manual that
says, essentially, git gc may corrupt your repository if you do this.
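
For concreteness, option 1 might look something like this (a minimal
sketch; the directory names are made up):

# initial clone from the project server
git clone git://git.postgresql.org/git/postgresql.git main
# a local clone hard-links the object store, so it is nearly free
git clone main branch-a
git clone main branch-b
# ...but updating means pulling twice: into "main" first, then into
# each "slave" clone
(cd main && git pull)
(cd branch-a && git pull)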

I'm not really sure which of these I'm going to do yet, and I'm not
sure what to recommend to others, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

  • Aidan Van Dyk at Jul 20, 2010 at 5:28 pm

    * Robert Haas [100720 13:04]:

    3. Clone the origin once. Apply patches to multiple branches by
    switching branches. Playing around with it, this is probably a
    tolerable way to work when you're only going back one or two branches
    but it's certainly a big nuisance when you're going back 5-7 branches.
This is what I do when I'm working on a project that has completely
correct build dependencies, where you don't need to re-run configure
every time you switch branches. I use ccache heavily, so configure
takes longer than a complete build with a couple dozen changes ccache
hasn't actually seen before...

But *all* dependencies need to be correct in the build system, or you
end up needing a git-clean-type cleanup between branch switches,
forcing a new configure run too, which takes too much time...

Maybe this will cause make dependencies to be refined in PG ;-)

It has the advantage that if "back patching" (or in reality, forward
patching) all happens in one repository, the git conflict machinery
(rerere) is using a single cache of recorded resolutions, meaning that
if you apply the same patch to two different branches with an
identical code conflict, you don't need to redo the whole conflict
resolution by hand in the second branch.
5. Use git clone --shared, git clone --reference, or
git-new-workdir. While I once thought this was the solution, I can't
take very seriously any solution that has a warning in the manual that
says, essentially, git gc may corrupt your repository if you do this.
This is the type of setup I often use. I keep a "central" set of git
repos that are straight mirror-clones of project repositories, kept
up-to-date via cron. And any time I clone a work repo, I use
--reference.

Since I make sure I don't "remove" anything from the reference repo, I
don't have to worry about losing objects other repositories might be
using from the "cache" repo. In case anyone is wondering, that's:
    git clone --mirror $REPO /data/src/cache/$project.git
    git --git-dir=/data/src/cache/$project.git config gc.auto 0

    And then in crontab:
    git --git-dir=/data/src/cache/$project.git fetch --quiet --all

    With gc.auto disabled, and the only commands ever run being "git fetch",
    no objects are removed, even if a remote rewinds and throws away
    commits.
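
A work-repo clone against that cache then looks something like this
(a sketch; the work-tree path is made up):

# borrow objects from the local cache instead of re-downloading them
git clone --reference /data/src/cache/$project.git \
    git://git.postgresql.org/git/postgresql.git /data/src/work/$project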

But this way the separate repos only share the "past, from the central
repository" history, which means you have to jump through hoops if you
want to use git's handy merging/cherry-picking/conflict tools when
trying to rebase/port patches between branches. You're pretty much
limited to exporting a patch, changing to the new branch-repository,
and applying the patch.

    a.

    --
Aidan Van Dyk                                          Create like a god,
aidan@highrise.ca                                      command like a king,
http://www.highrise.ca/                                work like a slave.
  • Peter Eisentraut at Jul 20, 2010 at 6:24 pm

On Tue, 2010-07-20 at 13:28 -0400, Aidan Van Dyk wrote:
But *all* dependencies need to be correct in the build system, or you
end up needing a git-clean-type cleanup between branch switches,
forcing a new configure run too, which takes too much time...
    This realization, while true, doesn't really help, because we are
    talking about maintaining 5+ year old back branches, where we are not
    going to fiddle with the build system at this time. Also, the switch
    from 9.0 to 9.1 the other day showed everyone who cared to watch that
    the dependencies are currently not correct for major version switches,
    so this method will definitely not work at the moment.
  • Dimitri Fontaine at Jul 21, 2010 at 7:00 pm

    Aidan Van Dyk writes:
    * Robert Haas [100720 13:04]:
    3. Clone the origin once. Apply patches to multiple branches by
    switching branches. Playing around with it, this is probably a
    tolerable way to work when you're only going back one or two branches
    but it's certainly a big nuisance when you're going back 5-7 branches.
This is what I do when I'm working on a project that has completely
correct build dependencies, where you don't need to re-run configure
every time you switch branches. I use ccache heavily, so configure
takes longer than a complete build with a couple dozen changes ccache
hasn't actually seen before...

But *all* dependencies need to be correct in the build system, or you
end up needing a git-clean-type cleanup between branch switches,
forcing a new configure run too, which takes too much time...

Maybe this will cause make dependencies to be refined in PG ;-)
Well, there's also the VPATH possibility, where all your build objects
are stored out of the way of the repo. So you could check out the
branch you're interested in, change to the associated build directory,
and build there. And automate that, of course.
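
Concretely, that's roughly (a sketch; the directory names are made
up):

# run configure from an empty build directory; objects land there,
# not in the source checkout
mkdir -p ~/build/pg && cd ~/build/pg
~/src/postgresql/configure
make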

    Regards,
    --
    dim
  • Alvaro Herrera at Jul 21, 2010 at 7:29 pm

Excerpts from Dimitri Fontaine's message of Wed Jul 21 15:00:48 -0400 2010:

Well, there's also the VPATH possibility, where all your build objects
are stored out of the way of the repo. So you could check out the
branch you're interested in, change to the associated build directory,
and build there. And automate that, of course.
This does not work as cleanly as you suppose, because some "build
objects" are stored in the source tree, configure being one of them.
So if you switch branches, configure is rerun even in a VPATH build,
which is undesirable.
  • Dimitri Fontaine at Jul 21, 2010 at 9:06 pm

    Alvaro Herrera writes:
This does not work as cleanly as you suppose, because some "build
objects" are stored in the source tree, configure being one of them.
So if you switch branches, configure is rerun even in a VPATH build,
which is undesirable.
Ouch. Reading -hackers led me to think this had received a cleanup
effort in the Makefiles, so that any generated file would appear in
the build directory. Sorry to learn that's not (yet?) the case.

    Regards,
    --
    dim
  • Peter Eisentraut at Jul 22, 2010 at 9:34 am

On Wed, 2010-07-21 at 23:06 +0200, Dimitri Fontaine wrote:
    Alvaro Herrera <alvherre@commandprompt.com> writes:
This does not work as cleanly as you suppose, because some "build
objects" are stored in the source tree, configure being one of them.
So if you switch branches, configure is rerun even in a VPATH build,
which is undesirable.
Ouch. Reading -hackers led me to think this had received a cleanup
effort in the Makefiles, so that any generated file would appear in
the build directory. Sorry to learn that's not (yet?) the case.
    It is, but not in the back branches.
  • Kevin Grittner at Jul 20, 2010 at 5:51 pm

    Robert Haas wrote:

    2. Clone the origin n times. Use more disk space. Live with it.
    :-)

    But each copy uses almost 0.36% of the formatted space on my 150GB
    drive!

    -Kevin
  • Peter Eisentraut at Jul 20, 2010 at 7:28 pm

On Tue, 2010-07-20 at 13:04 -0400, Robert Haas wrote:
    2. Clone the origin n times. Use more disk space. Live with it. :-)
Well, I plan to use cp -a to avoid cloning over the network n times,
but other than that, that was my plan. My .git directory currently
takes 283 MB, so I think I can just about live with that.
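
In other words (a sketch; the copy names are made up):

git clone git://git.postgresql.org/git/postgresql.git
# cp -a copies the whole tree, .git included, so each copy is an
# independent repository (no shared, hard-linked objects, unlike a
# local git clone)
cp -a postgresql postgresql-8.4
cp -a postgresql postgresql-8.3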
  • Andrew Dunstan at Jul 20, 2010 at 8:22 pm

    Robert Haas wrote:
    Tom and, I believe, also Andrew have expressed some concerns about the
    space that will be taken up by having multiple copies of the git
    repository on their systems. While most users can probably get by
    with a single repository, committers will likely need one for each
    back-branch that they work with, and we have quite a few of those.

    After playing around with this a bit, I've come to the conclusion that
    there are a couple of possible options but they've all got some
    drawbacks.

    1. Clone the origin. Then, clone the clone n times locally. This
    uses hard links, so it saves disk space. But, every time you want to
    pull, you first have to pull to the "main" clone, and then to each of
    the "slave" clones. And same thing when you want to push.

    You can have a cron job that does the first pull fairly frequently. It
    should be a fairly cheap operation unless the git protocol is dumber
    than I think.

    The second pull is the equivalent of what we do now with "cvs update".

    Given that, you could push commits direct to the authoritative repo and
    wait for the cron job to catch up your local base clone.

    I think that's the pattern I will probably try to follow.
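
A crontab entry along those lines might be (a sketch; the path and
schedule are made up):

# "first pull" every 15 minutes: updates the remote-tracking branches
# and fast-forwards the checked-out branch, assuming no local commits
*/15 * * * * cd $HOME/git/postgresql && git pull --quiet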

    cheers

    andrew
  • Abhijit Menon-Sen at Jul 21, 2010 at 10:42 am

    At 2010-07-20 13:04:12 -0400, robertmhaas@gmail.com wrote:
    1. Clone the origin. Then, clone the clone n times locally. This
    uses hard links, so it saves disk space. But, every time you want to
    pull, you first have to pull to the "main" clone, and then to each of
    the "slave" clones. And same thing when you want to push.
    If your extra clones are for occasionally-touched back branches, then:

    (a) In my experience, it is almost always much easier to work with many
    branches and move patches between them rather than use multiple clones;
    but

    (b) You don't need to do the double-pull and push. Clone your local
    repository as many times as needed, but create new git-remote(1)s in
    each extra clone and pull/push only the branch you care about directly
    from or to the remote. That way, you'll start off with the bulk of the
    storage shared with your main local repository, and "waste" a few KB
    when you make (presumably infrequent) new changes.

    But that brings me to another point:

    In my experience (doing exactly this kind of old-branch-maintenance with
    Archiveopteryx), git doesn't help you much if you want to backport (i.e.
    cherry-pick) changes from a development branch to old release branches.
    It is much more helpful when you make changes to the *oldest* applicable
    branch and bring it *forward* to your development branch (by merging the
    old branch into your master). Cherry-picking can be done, but it becomes
    painful after a while.

    See http://toroid.org/ams/etc/git-merge-vs-p4-integrate for more.
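
That workflow is roughly (a sketch; the branch names are made up):

# commit the fix on the oldest branch that needs it...
git checkout oldest-stable
git commit -am "fix the bug"
# ...then carry it forward by merging into each newer branch
git checkout master
git merge oldest-stable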

    -- ams
  • Robert Haas at Jul 21, 2010 at 10:39 am

    On Wed, Jul 21, 2010 at 6:17 AM, Abhijit Menon-Sen wrote:
    At 2010-07-20 13:04:12 -0400, robertmhaas@gmail.com wrote:

    1. Clone the origin.  Then, clone the clone n times locally.  This
    uses hard links, so it saves disk space.  But, every time you want to
    pull, you first have to pull to the "main" clone, and then to each of
    the "slave" clones.  And same thing when you want to push.
    If your extra clones are for occasionally-touched back branches, then:

    (a) In my experience, it is almost always much easier to work with many
    branches and move patches between them rather than use multiple clones;
    but

    (b) You don't need to do the double-pull and push. Clone your local
    repository as many times as needed, but create new git-remote(1)s in
    each extra clone and pull/push only the branch you care about directly
    from or to the remote. That way, you'll start off with the bulk of the
    storage shared with your main local repository, and "waste" a few KB
    when you make (presumably infrequent) new changes.
    Ah, that is clever. Perhaps we need to write up directions on how to do that.
    But that brings me to another point:

    In my experience (doing exactly this kind of old-branch-maintenance with
    Archiveopteryx), git doesn't help you much if you want to backport (i.e.
    cherry-pick) changes from a development branch to old release branches.
    It is much more helpful when you make changes to the *oldest* applicable
    branch and bring it *forward* to your development branch (by merging the
    old branch into your master). Cherry-picking can be done, but it becomes
    painful after a while.
    Well, per previous discussion, we're not going to change that at this
    point, or maybe ever.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Magnus Hagander at Jul 21, 2010 at 10:42 am

    On Wed, Jul 21, 2010 at 12:39, Robert Haas wrote:
    On Wed, Jul 21, 2010 at 6:17 AM, Abhijit Menon-Sen wrote:
    At 2010-07-20 13:04:12 -0400, robertmhaas@gmail.com wrote:

    1. Clone the origin.  Then, clone the clone n times locally.  This
    uses hard links, so it saves disk space.  But, every time you want to
    pull, you first have to pull to the "main" clone, and then to each of
    the "slave" clones.  And same thing when you want to push.
    If your extra clones are for occasionally-touched back branches, then:

    (a) In my experience, it is almost always much easier to work with many
    branches and move patches between them rather than use multiple clones;
    but

    (b) You don't need to do the double-pull and push. Clone your local
    repository as many times as needed, but create new git-remote(1)s in
    each extra clone and pull/push only the branch you care about directly
    from or to the remote. That way, you'll start off with the bulk of the
    storage shared with your main local repository, and "waste" a few KB
    when you make (presumably infrequent) new changes.
    Ah, that is clever.  Perhaps we need to write up directions on how to do that.
    Yeah, that's the way I work with some projects at least.

    But that brings me to another point:

    In my experience (doing exactly this kind of old-branch-maintenance with
    Archiveopteryx), git doesn't help you much if you want to backport (i.e.
    cherry-pick) changes from a development branch to old release branches.
    It is much more helpful when you make changes to the *oldest* applicable
    branch and bring it *forward* to your development branch (by merging the
    old branch into your master). Cherry-picking can be done, but it becomes
    painful after a while.
    Well, per previous discussion, we're not going to change that at this
    point, or maybe ever.
    Nope, the deal was definitely that we stick to the current workflow.

    Yes, this means we can't use git cherry-pick or similar git-specific
    tools to make life easier. But it shouldn't make life harder than it
    is *now*, with cvs.
  • Abhijit Menon-Sen at Jul 21, 2010 at 10:56 am

    At 2010-07-21 06:39:28 -0400, robertmhaas@gmail.com wrote:
    Perhaps we need to write up directions on how to do that.
    I'll write them if you tell me where to put them. It's trivial.
    Well, per previous discussion, we're not going to change that at this
    point, or maybe ever.
    Sure. I just wanted to mention it, because it's something I learned the
    hard way. It's also true that back-porting changes is a bigger deal for
    Postgres than it was for me (in the sense that it's an exception rather
    than a routine activity), and individual changes are usually backported
    as soon as, or very soon after, they are committed; so it should be less
    painful on the whole.

    Another point, in response to Magnus's followup:
    At 2010-07-21 12:42:03 +0200, magnus@hagander.net wrote:

    Yes, this means we can't use git cherry-pick or similar git-specific
    tools to make life easier.
    No, that's not right. You *can* use cherry-pick; in fact, it's the sane
    way to backport the occasional change. What you can't do is efficiently
    manage a queue of changes to be backported to multiple branches. But as
    I said above, that's not exactly what we want to do for Postgres, so it
    should not matter too much.
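
For the occasional backport, that is simply (a sketch; the branch
name is made up):

# re-apply a single commit from the development branch onto the
# back branch
git checkout old-stable
git cherry-pick <commit>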

    -- ams
  • Robert Haas at Jul 21, 2010 at 10:58 am

    On Wed, Jul 21, 2010 at 6:56 AM, Abhijit Menon-Sen wrote:
    At 2010-07-21 06:39:28 -0400, robertmhaas@gmail.com wrote:

    Perhaps we need to write up directions on how to do that.
    I'll write them if you tell me where to put them. It's trivial.
    Post 'em here or drop them on the wiki and post a link.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Abhijit Menon-Sen at Jul 21, 2010 at 11:53 am

    At 2010-07-21 06:57:53 -0400, robertmhaas@gmail.com wrote:
    Post 'em here or drop them on the wiki and post a link.
    1. Clone the remote repository as usual:

    git clone git://git.postgresql.org/git/postgresql.git

    2. Create as many local clones as you want:

    git clone postgresql foobar

    3. In each clone (supposing you care about branch xyzzy):

3.1. git remote set-url origin ssh://whatever/postgresql.git

3.2. git remote update && git remote prune origin

3.3. git checkout -t origin/xyzzy

3.4. git branch -d master

    3.5. Edit .git/config and set origin.fetch thus:

    [remote "origin"]
    fetch = +refs/heads/xyzzy:refs/remotes/origin/xyzzy

    (You can git config remote.origin.fetch '+refs/...' if you're
    squeamish about editing the config file.)

    3.6. That's it. git pull and git push will work correctly.

    (This will replace the "origin" remote that pointed at your local
    postgresql.git clone with one that points to the real remote; but you
    could also add a remote definition named something other than "origin",
    in which case you'd need to "git push thatname" etc.)

    -- ams
