While attempting to fix the build.golang.org entity corruption (a bunch of
commits with a bogus "Num", mostly 0 I believe, but maybe other low
numbers?), I managed to break it in some way that I don't understand yet.

I probably shouldn't have even touched it because now I'm grumpy. But I
think I understand how it's all supposed to work.

For the record, here's what I did which broke everything:

      https://codereview.appspot.com/136540044

I tried to make it so a builder in commit-watching mode would be able to
fix a broken commit, so I made the commit handler in GET mode treat a
corrupt entry as missing (error). And then I made the commit handler in
POST mode allow overwriting metadata.
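
A minimal sketch of the two handler changes just described, not the code from the linked CL. It assumes an in-memory map in place of the App Engine datastore and assumes that "corrupt" means Num == 0; the names `store`, `get`, and `post` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Commit is a hypothetical, trimmed-down stand-in for the dashboard's Commit entity.
type Commit struct {
	Hash       string
	ParentHash string
	Num        int // assumption: 0 marks a corrupt entry
}

var errNotFound = errors.New("commit not found")

// store is an in-memory stand-in for the datastore, keyed by commit hash.
type store map[string]Commit

// get mirrors the GET-mode change: a corrupt entry is reported as missing,
// so a commit watcher would try to send that commit again.
func (s store) get(hash string) (Commit, error) {
	c, ok := s[hash]
	if !ok || c.Num == 0 {
		return Commit{}, errNotFound
	}
	return c, nil
}

// post mirrors the POST-mode change: an existing entry may be overwritten.
func (s store) post(c Commit) {
	s[c.Hash] = c
}

func main() {
	s := store{"abc": {Hash: "abc", Num: 0}}
	if _, err := s.get("abc"); err != nil {
		fmt.Println("GET abc:", err)
	}
	s.post(Commit{Hash: "abc", ParentHash: "def", Num: 42})
	c, _ := s.get("abc")
	fmt.Println("after POST, Num =", c.Num)
}
```

In this sketch `post` replaces the whole entity, so a caller that omits a field silently clears it. Whether the real handler behaves that way is not established in the thread.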

Then I added a --commitWatcherOnly mode to the builder, and ran it for a
bit.

And then everything disappeared from the dashboard.

So I guess I broke something in the data structure which I don't
understand. Perhaps the ParentHash field went missing or something.

And yes, perhaps I should've experimented on a test instance, but the code
"works" from a fresh initial state. What was broken is the state it got
into (still unknown), and I didn't know how to take an App Engine instance
and clone all its data into a separate version for experimentation. So I
did it live.

More eyeballs welcome, but I need a break.

I'll try to give it another shot tomorrow after sleeping, else Monday.

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Robert Griesemer at Sep 14, 2014 at 3:47 am
    the good news is it looks a bit less red... (perhaps because bleeding is
    internal, now)
  • Dmitry Vyukov at Sep 14, 2014 at 4:17 am
    Just in case: Andrew has a database snapshot from Aug 20 :)
    We snapshotted it before the perf dashboard deployment.

  • Brad Fitzpatrick at Sep 14, 2014 at 1:55 pm
    Good to know. Hopefully that won't be required.

    I've turned off breakage emails in the meantime to reduce some annoyance at
    least.
  • Russ Cox at Sep 14, 2014 at 2:24 pm
    Thanks for looking into this. In response to your earlier mail, we cannot
    use hg's sequence number as Num. That number is dependent on the order that
    the commits arrived on a particular machine. Because there are other
    branches, those branches can get commit numbers mixed into the sequence in
    roughly any interlacing. In fact I think different orders on different
    machines is part of what is causing the occasional missing commits. If just
    one builder were sending commits in, then I think things would be fine. It
    looks like someone changed the builders so that they default to sending
    commits in, instead of having just one. Maybe that is part of the problem.

    Russ
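
To illustrate why an arrival-order number is not stable across machines, here is a small sketch. It assumes a simplified model in which a clone numbers changesets purely in the order it receives them; `localRevs` and the commit names are hypothetical.

```go
package main

import "fmt"

// localRevs assigns each commit a local revision number in arrival order,
// a simplified model of how a clone numbers changesets as it receives them.
func localRevs(arrival []string) map[string]int {
	revs := make(map[string]int, len(arrival))
	for i, hash := range arrival {
		revs[hash] = i
	}
	return revs
}

func main() {
	// Both machines hold the same three commits; "b1" lives on another branch
	// and reaches the two machines at different points in the sequence.
	machineA := localRevs([]string{"c1", "b1", "c2"})
	machineB := localRevs([]string{"c1", "c2", "b1"})
	fmt.Println("c2 on A:", machineA["c2"], "c2 on B:", machineB["c2"])
}
```

The two machines agree on the set of commits and on the total count, but not on the number given to `c2`, which is the interlacing problem described above.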

  • Brad Fitzpatrick at Sep 14, 2014 at 2:36 pm
    I was planning on modifying the dashboard to ignore commit pings from old
    builders.

    Then exactly 1 commit watcher (on the coordinator) will be polling for
    commits and reporting, so the hg-local sequence numbers should be fine?

  • Russ Cox at Sep 14, 2014 at 2:43 pm

    On Sun, Sep 14, 2014 at 10:36 AM, Brad Fitzpatrick wrote:

    > I was planning on modifying the dashboard to ignore commit pings from old
    > builders.
    >
    > Then exactly 1 commit watcher (on the coordinator) will be polling for
    > commits and reporting, so the hg-local sequence numbers should be fine?

    That definitely sounds better. It is possible that if the coordinator is
    blown away and reinitialized it will get a different sequence numbering,
    but it should get the same total number N, so future updates to the
    dashboard would still start at the right place (N+1...). As long as the
    sequence number is only used to decide the dashboard display ordering and
    not used to identify commits internally (for example, ParentHash should
    stay a hash, not become a sequence number) then I think it's fine.

    Russ
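
The separation described in the message above (sequence number for display order only, hash for identity) might look like the following sketch. The `Commit` struct is a hypothetical, trimmed-down stand-in for the dashboard entity, and `index` and `displayOrder` are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

// Commit is a hypothetical, trimmed-down dashboard entity.
type Commit struct {
	Hash       string // identity
	ParentHash string // parent identity: a hash, never a sequence number
	Num        int    // display position only
}

// index maps each commit's hash to the commit.
func index(cs []Commit) map[string]Commit {
	m := make(map[string]Commit, len(cs))
	for _, c := range cs {
		m[c.Hash] = c
	}
	return m
}

// displayOrder returns a copy of cs sorted newest first by Num.
func displayOrder(cs []Commit) []Commit {
	out := append([]Commit(nil), cs...)
	sort.Slice(out, func(i, j int) bool { return out[i].Num > out[j].Num })
	return out
}

func main() {
	cs := []Commit{
		{Hash: "aaa", Num: 1},
		{Hash: "bbb", ParentHash: "aaa", Num: 2},
		{Hash: "ccc", ParentHash: "bbb", Num: 3},
	}
	// Simulate a reinitialized watcher that hands out different numbers.
	for i := range cs {
		cs[i].Num += 100
	}
	idx := index(cs)
	parent := idx[idx["ccc"].ParentHash]
	fmt.Println("parent of ccc:", parent.Hash)
	fmt.Println("top of dashboard:", displayOrder(cs)[0].Hash)
}
```

Because parent lookups go through the hash, renumbering changes only where a commit appears on the page, not which commit is its parent.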

  • Brad Fitzpatrick at Sep 14, 2014 at 3:46 pm

    Okay, I've written a one-off program to fix the datastore for now. It's
    definitely just a bandage, but it's possible things might just keep working
    from here on out too. I modified the App Engine app a bit too to reset its
    NextNum if it sees somebody use an explicit one that's >= its current
    NextNum.

    The dashboard now clearly shows that "runtime: stop scanning stack
    frames/args conservatively" broke 386 builds, so mission success: the
    dashboard is kinda useful again.

    I'm off for a run. Will resume cleaning more tomorrow at work on a nice
    keyboard.

    Andrew, as the dashboard is currently running unsubmitted changes, try not
    to deploy on top of it, though it should be fine. I think it'd just lose
    the ability of my fixer tool to modify existing Commits.
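
A rough sketch of the NextNum adjustment mentioned in the message above. The thread only says that NextNum is reset when a client uses an explicit number that is >= the current value; resetting it to exactly one past the explicit number, and treating a value <= 0 as "no explicit number", are assumptions here, as are the names `counter` and `assign`.

```go
package main

import "fmt"

// counter is a hypothetical stand-in for the dashboard's NextNum state.
type counter struct {
	NextNum int
}

// assign returns the Num to store for an incoming commit.
// Assumption: explicit <= 0 means the client supplied no number.
func (c *counter) assign(explicit int) int {
	if explicit <= 0 {
		n := c.NextNum
		c.NextNum++
		return n
	}
	if explicit >= c.NextNum {
		// Assumption: continue numbering just past the explicit value.
		c.NextNum = explicit + 1
	}
	return explicit
}

func main() {
	c := &counter{NextNum: 5}
	fmt.Println(c.assign(0))  // server-chosen number
	fmt.Println(c.assign(10)) // explicit number ahead of NextNum
	fmt.Println(c.assign(0))  // server-chosen number after the reset
}
```

The point of the reset is that a later server-chosen number can never collide with, or fall below, an explicit number a client has already used.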

  • Dmitry Vyukov at Sep 14, 2014 at 4:17 pm

    On Sun, Sep 14, 2014 at 7:43 AM, Russ Cox wrote:
    > On Sun, Sep 14, 2014 at 10:36 AM, Brad Fitzpatrick wrote:
    >
    >> I was planning on modifying the dashboard to ignore commit pings from old
    >> builders.
    >>
    >> Then exactly 1 commit watcher (on the coordinator) will be polling for
    >> commits and reporting, so the hg-local sequence numbers should be fine?
    >
    > That definitely sounds better. It is possible that if the coordinator is
    > blown away and reinitialized it will get a different sequence numbering, but
    > it should get the same total number N, so future updates to the dashboard
    > would still start at the right place (N+1...). As long as the sequence

    It will get the same total number N iff there were no commits during
    restart. If there were commits, then I believe this scheme will lead
    to a mess. The coordinator will get new commits *and* all commits will
    be reordered at the same time.
    What's wrong with the current scheme? It looks fine to me.

    > number is only used to decide the dashboard display ordering and not used to
    > identify commits internally (for example, ParentHash should stay a hash, not
    > become a sequence number) then I think it's fine.
    >
    > Russ
  • Brad Fitzpatrick at Sep 14, 2014 at 4:21 pm
    The current scheme wasted my time.

    If you find the bug that led to zero Commit.Num values, I might be
    interested in keeping the current scheme.
  • Dmitry Vyukov at Sep 14, 2014 at 4:27 pm
    What commits have 0 nums? Is there anything else strange about them?
    How many are there? What is the first? Do they have NeedsBenchmarking
    set?

  • Brad Fitzpatrick at Sep 14, 2014 at 4:33 pm

    On Sun, Sep 14, 2014 at 12:27 PM, Dmitry Vyukov wrote:

    > What commits have 0 nums?

    All the previously-missing ones.

    > Is there anything else strange about them?

    I didn't look, because I also didn't know what normal was.

    > How many are there? What is the first? Do they have NeedsBenchmarking
    > set?

    No clue.

    One thing I also noticed was that the "hg log --template=" xmlLogTemplate used
    the {parents} variable, which is defined (in hg help templates) like:

        parents       List of strings. The parents of the changeset in "rev:node"
                      format. If the changeset has only one "natural" parent (the
                      predecessor revision) nothing is shown.

    Instead of what it meant to use:

        p1node        String. The identification hash of the changeset's first
                      parent, as a 40 digit hexadecimal string. If the changeset
                      has no parents, all digits are 0.

    So I was also noticing later things with empty "ParentHash" values. But I'm
    not sure how much that matters? Maybe some builders in commit mode are
    running different versions of hg where the template definitions are
    different.
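
As a sketch of how output from a {p1node}-based template could be consumed, the following parses one line of the form "node p1node" and maps the all-zero hash to "no parent". The template string and the `parseLogLine` helper are illustrative only; they are not the builder's xmlLogTemplate or its parsing code.

```go
package main

import (
	"fmt"
	"strings"
)

// logTemplate is an illustrative template, not the builder's xmlLogTemplate.
// It would be passed to hg as: hg log --template '{node} {p1node}\n'
const logTemplate = `{node} {p1node}\n`

// nullHash is what {p1node} prints when a changeset has no parent: 40 zeros.
var nullHash = strings.Repeat("0", 40)

// parseLogLine splits one "node p1node" line into the commit hash and its
// first parent's hash. A parent of all zeros is returned as "".
func parseLogLine(line string) (hash, parent string, err error) {
	f := strings.Fields(line)
	if len(f) != 2 || len(f[0]) != 40 || len(f[1]) != 40 {
		return "", "", fmt.Errorf("malformed log line %q", line)
	}
	if f[1] == nullHash {
		return f[0], "", nil
	}
	return f[0], f[1], nil
}

func main() {
	fmt.Println("template:", logTemplate)
	root := "0123456789abcdef0123456789abcdef01234567"
	hash, parent, err := parseLogLine(root + " " + nullHash)
	fmt.Printf("hash=%s parent=%q err=%v\n", hash, parent, err)
}
```

A line with only one field is rejected rather than stored with an empty parent. With {parents}, a changeset with a single natural parent produces exactly that kind of empty value.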

  • Russ Cox at Sep 14, 2014 at 5:21 pm

    On Sun, Sep 14, 2014 at 12:17 PM, Dmitry Vyukov wrote:

    > What's wrong with the current scheme? It looks fine to me.

    What's wrong is that the dashboard stopped updating Friday morning. Brad's
    the one taking time to fix it, so really anything he wants to do is fine.

    Russ

  • Russ Cox at Sep 14, 2014 at 5:26 pm
    Brad, a bunch of the commits on the front page now have a 0 time value (Jan
    1 00:00). Is that something you know about that should not happen anymore,
    or is it a different corruption?

    Thanks.
    Russ

  • Brad Fitzpatrick at Sep 14, 2014 at 6:18 pm

    On Sun, Sep 14, 2014 at 1:26 PM, Russ Cox wrote:

    > Brad, a bunch of the commits on the front page now have a 0 time value
    > (Jan 1 00:00). Is that something you know about that should not happen
    > anymore, or is it a different corruption?

    My ad-hoc fixer tool probably had a bug. I copied the time parsing +
    formatting from another tool, but guess it didn't work. I can fix.

  • Brad Fitzpatrick at Sep 14, 2014 at 6:35 pm
    On Sun, Sep 14, 2014 at 2:18 PM, Brad Fitzpatrick wrote:
    > On Sun, Sep 14, 2014 at 1:26 PM, Russ Cox wrote:
    >
    >> Brad, a bunch of the commits on the front page now have a 0 time value
    >> (Jan 1 00:00). Is that something you know about that should not happen
    >> anymore, or is it a different corruption?
    >
    > My ad-hoc fixer tool probably had a bug. I copied the time parsing +
    > formatting from another tool, but guess it didn't work. I can fix.

    Done.

    My tool was sending it but the server wasn't letting clients overwrite that
    field, and apparently some of the existing ones were zero already. That's
    two suspicious things now on Commit entities: a zero Time and a zero Num.
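
A small sketch of a check for the two suspicious conditions named above, a zero Num and a zero Time. The `Commit` struct is a hypothetical subset of the real entity, and `suspicious` and `sample` are illustrative names. In Go, the zero `time.Time` is January 1 of year 1 at 00:00 UTC, which is consistent with the "Jan 1 00:00" seen on the front page.

```go
package main

import (
	"fmt"
	"time"
)

// Commit is a hypothetical subset of the dashboard's Commit entity.
type Commit struct {
	Hash string
	Num  int
	Time time.Time
}

// sample is a well-formed commit used for illustration.
var sample = Commit{
	Hash: "abc",
	Num:  7,
	Time: time.Date(2014, time.September, 14, 0, 0, 0, 0, time.UTC),
}

// suspicious returns the names of fields that still hold their zero value.
func suspicious(c Commit) []string {
	var bad []string
	if c.Num == 0 {
		bad = append(bad, "Num")
	}
	if c.Time.IsZero() {
		bad = append(bad, "Time")
	}
	return bad
}

func main() {
	fmt.Println("sample:", suspicious(sample))
	fmt.Println("empty: ", suspicious(Commit{Hash: "def"}))
}
```

A scan like this over existing entities could help count how many commits are affected and whether the two conditions occur together, which the thread leaves open.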


Discussion Overview
group: golang-dev
categories: go
posted: Sep 14, '14 at 3:26a
active: Sep 14, '14 at 6:35p
posts: 16
users: 4
website: golang.org
