This weekend the build dashboard lost all its entries and caused
the builders to rebuild a page full of work, twice.

Does anyone know what is going on with the build dashboard?

Dave

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Andrew Gerrand at Oct 27, 2014 at 5:52 am
    Yes, this was because of some work on the datastore that I did to resurrect
    the perf dashboard.

    None of the result data is actually lost; just the denormalised result data
    in the Commit records was wiped, tricking the builders into doing their
    builds again.

    It won't happen again.
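A minimal sketch of why wiping only the denormalised copy triggers rebuilds, assuming a builder picks the newest commit that has no result recorded for it. The `Commit` layout, the `Results` map and `nextTodo` are hypothetical names for illustration, not the dashboard's actual code.

```go
package main

import "fmt"

// Commit is a hypothetical, trimmed-down record. Results stands in for the
// denormalised per-builder result data kept on the Commit itself; the full
// result records are assumed to live elsewhere and are not modelled here.
type Commit struct {
	Num     int
	Results map[string]bool // builder name -> build passed
}

// nextTodo returns the number of the newest commit that has no result for
// the given builder, and false if every commit already has one.
func nextTodo(commits []Commit, builder string) (int, bool) {
	best, found := 0, false
	for _, c := range commits {
		if _, done := c.Results[builder]; done {
			continue
		}
		if !found || c.Num > best {
			best, found = c.Num, true
		}
	}
	return best, found
}

func main() {
	commits := []Commit{
		{Num: 1, Results: map[string]bool{"linux-amd64": true}},
		{Num: 2, Results: map[string]bool{"linux-amd64": true}},
	}
	_, found := nextTodo(commits, "linux-amd64")
	fmt.Println(found) // false: nothing left to build

	for i := range commits {
		commits[i].Results = nil // the denormalised copy is wiped
	}
	num, found := nextTodo(commits, "linux-amd64")
	fmt.Println(num, found) // 2 true: the builder starts over
}
```

In this model the underlying results could be fully intact and the builder would still redo the work, because it only consults the copy on the Commit.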

  • Dave Cheney at Oct 27, 2014 at 5:55 am
    Thanks. I'm glad to know this isn't just randomly tearing itself apart.
  • Dmitry Vyukov at Oct 27, 2014 at 8:19 am
    We now seem to be in an infinite loop of breaking dashboard with
    manual updates...

  • Russ Cox at Oct 27, 2014 at 5:29 pm

    On Mon, Oct 27, 2014 at 4:18 AM, 'Dmitry Vyukov' via golang-dev wrote:
    > We now seem to be in an infinite loop of breaking dashboard with
    > manual updates...

    You do understand that the manual updates are correcting non-manual
    updates executed by buggy code, right? It's not like we just get up in the
    morning and think, hmm, I'd like to make some changes by hand to the build
    dashboard datastore.

    Russ

  • Dmitry Vyukov at Oct 27, 2014 at 5:35 pm

    On Mon, Oct 27, 2014 at 9:29 PM, Russ Cox wrote:
    > You do understand that the manual updates are correcting non-manual updates
    > executed by buggy code, right? It's not like we just get up in the morning
    > and think, hmm, I'd like to make some changes by hand to the build dashboard
    > datastore.

    No, I don't. What is the buggy code?

  • Brad Fitzpatrick at Oct 27, 2014 at 5:39 pm
    Seriously?? You're complaining this much and you haven't even been paying
    attention?!
  • Dmitry Vyukov at Oct 27, 2014 at 5:50 pm
    It's not that I did not pay attention. But I missed it for some
    reason. I will appreciate if you can point me to the bug.
  • Russ Cox at Oct 27, 2014 at 6:11 pm

    On Mon, Oct 27, 2014 at 1:50 PM, Dmitry Vyukov wrote:
    > It's not that I did not pay attention. But I missed it for some
    > reason. I will appreciate if you can point me to the bug.

    The one my manual updates fixed was that there was a code path through the
    "send a build breakage email" logic that ended up writing a record to the
    datastore with FailNotificationSent=true but all the other fields,
    including Num, zeroed. This made records disappear from the dashboard
    because they lost their Nums. Look through your mail for
    FailNotificationSent if you want details.

    Russ
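The failure mode described in this message can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's code: the `store` map stands in for the datastore, and `markNotifiedBuggy` and `markNotifiedSafe` are hypothetical helpers. Only the field names `Num` and `FailNotificationSent` come from the thread.

```go
package main

import "fmt"

// Commit is a hypothetical, trimmed-down stand-in for the dashboard's
// Commit record.
type Commit struct {
	Hash                 string
	Num                  int
	FailNotificationSent bool
}

// store is an in-memory stand-in for the datastore, keyed by commit hash.
type store map[string]Commit

// markNotifiedBuggy writes back whatever record it was handed. If the caller
// passes a partially populated Commit, its zero fields replace the real ones.
func markNotifiedBuggy(s store, c Commit) {
	c.FailNotificationSent = true
	s[c.Hash] = c
}

// markNotifiedSafe reloads the stored record and changes only the one field.
func markNotifiedSafe(s store, hash string) error {
	c, ok := s[hash]
	if !ok {
		return fmt.Errorf("no commit %q", hash)
	}
	c.FailNotificationSent = true
	s[hash] = c
	return nil
}

func main() {
	s := store{"b": {Hash: "b", Num: 42}}
	markNotifiedBuggy(s, Commit{Hash: "b"}) // incomplete record passed in
	fmt.Println(s["b"].Num)                 // 0: Num was wiped

	s["b"] = Commit{Hash: "b", Num: 42}
	_ = markNotifiedSafe(s, "b")
	fmt.Println(s["b"].Num) // 42: only the flag changed
}
```

Against a real datastore, the safe variant would typically do its read and write inside one transaction so that a concurrent update is not lost between the two.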

  • Dmitry Vyukov at Oct 27, 2014 at 6:22 pm
    If you mean this one:
    https://codereview.appspot.com/154080043/diff/60001/dashboard/app/build/handler.go
    I've looked at it when it was submitted. But it does not fix anything,
    it just bails out on commits with Num=0. And there is no explanation
    how commits with Num=0 appeared in the datastore. I do not see
    anything in the failure notification path that stores empty commits in
    the database.
    I frankly don't see where is _the_ bug in the code. Please explain it to me.
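For readers who have not opened the linked diff, a check of the kind described above might look roughly like this. It is a sketch only: `validate`, `put` and `errZeroNum` are hypothetical names, and it assumes that a valid commit never has Num=0. As the message says, such a check stops further damage and flags that something upstream is wrong, but it does not identify the code path that produced the bad record.

```go
package main

import (
	"errors"
	"fmt"
)

// Commit is a hypothetical, trimmed-down record.
type Commit struct {
	Hash string
	Num  int
}

var errZeroNum = errors.New("refusing to store commit with Num=0")

// validate rejects records whose Num is the zero value. It assumes that
// real commit numbers are never 0.
func validate(c Commit) error {
	if c.Num == 0 {
		return fmt.Errorf("%w (hash %q)", errZeroNum, c.Hash)
	}
	return nil
}

// put stores c in the in-memory stand-in for the datastore, but only if it
// passes validation.
func put(s map[string]Commit, c Commit) error {
	if err := validate(c); err != nil {
		return err
	}
	s[c.Hash] = c
	return nil
}

func main() {
	s := map[string]Commit{}
	fmt.Println(put(s, Commit{Hash: "abc", Num: 7})) // <nil>

	err := put(s, Commit{Hash: "def"}) // Num left at its zero value
	fmt.Println(errors.Is(err, errZeroNum), len(s)) // true 1
}
```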



  • Russ Cox at Oct 27, 2014 at 6:51 pm

    On Mon, Oct 27, 2014 at 2:22 PM, Dmitry Vyukov wrote:
    > If you mean this one:
    > https://codereview.appspot.com/154080043/diff/60001/dashboard/app/build/handler.go
    > I've looked at it when it was submitted. But it does not fix anything,
    > it just bails out on commits with Num=0. And there is no explanation
    > how commits with Num=0 appeared in the datastore. I do not see
    > anything in the failure notification path that stores empty commits in
    > the database.
    > I frankly don't see where is _the_ bug in the code. Please explain it to
    > me.

    The commit proves that the bug exists (when I put the check in, it
    triggered). I didn't need to find the root cause, I just needed to stop it
    and uncorrupt the database so that I could get my work done. Finding the
    actual bug is left as an exercise to the interested reader / owner of the
    code.

    Russ

  • Dmitry Vyukov at Oct 27, 2014 at 6:57 pm

    On Mon, Oct 27, 2014 at 10:51 PM, Russ Cox wrote:
    > The commit proves that the bug exists (when I put the check in, it
    > triggered). I didn't need to find the root cause, I just needed to stop it
    > and uncorrupt the database so that I could get my work done. Finding the
    > actual bug is left as an exercise to the interested reader / owner of the
    > code.

    That commit was made after the database had been manually updated several
    times and was in a corrupted state, so the check could well be detecting
    the results of those updates.
    It's still a mystery to me what the root cause was and what I was not
    paying attention to...

  • Brad Fitzpatrick at Oct 27, 2014 at 7:08 pm

    On Mon, Oct 27, 2014 at 11:57 AM, Dmitry Vyukov wrote:
    > That commit was made after the database had been manually updated several
    > times and was in a corrupted state, so the check could well be detecting
    > the results of those updates.
    > It's still a mystery to me what the root cause was and what I was not
    > paying attention to...

    Let me repeat Russ's point: Do you think we changed the database for fun?

    We only changed it manually because it was already broken and not updating.
    Things were zero that should not be zero.

  • Dmitry Vyukov at Oct 27, 2014 at 7:16 pm

    On Mon, Oct 27, 2014 at 11:08 PM, Brad Fitzpatrick wrote:
    > Let me repeat Russ's point: Do you think we changed the database for fun?

    I don't think so.

    > We only changed it manually because it was already broken and not updating.
    > Things were zero that should not be zero.
  • Russ Cox at Oct 27, 2014 at 9:30 pm

    On Mon, Oct 27, 2014 at 2:57 PM, Dmitry Vyukov wrote:
    > That commit was made after the database had been manually updated several
    > times and was in a corrupted state, so the check could well be detecting
    > the results of those updates.
    > It's still a mystery to me what the root cause was and what I was not
    > paying attention to...

    I think it has to do with the order of the commit logs arriving. In normal
    usage if you have commit A then commit B, where B broke the build, then the
    result for A comes in first, and then the result for B. The insertion of
    the result for B checks that A was okay and since B is not, it sends mail
    about B breaking the build. This works.

    If you commit A and B back to back, then it is fairly likely that B will
    run first, because the builders run newest thing missing first. Then when B
    comes in broken, the code records that fact but doesn't send mail, because
    it doesn't know about A yet. When A comes in working, then it checks
    whether B was broken, finds that it was, and sends mail saying that B is
    broken. It is this send mail operation that I believe is passed an
    incomplete Commit record for B (with Num and many other fields set to their
    zero values). The mail sender updates FailNotificationSent=true in the
    record and writes it back into the datastore, blowing away the real record
    for B.

    This theory matches the failures I observed: the database always got stuck
    when I committed 3 or 4 CLs back to back. I do this fairly often, because I
    send a bunch of CLs in the same client in different directories and then I
    come back to them and fix all the comments and run all.bash and submit them
    together.

    I have not chased down exactly what is wrong in the code, but if you want
    to do so, that's where I would start.

    Russ
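The two arrival orders described in this message can be modelled with a small sketch. It assumes consecutively numbered commits and an in-memory `dashboard` type; every name here is hypothetical and chosen for illustration. The sketch covers only the decision of when mail is sent, not the record corruption that the theory attributes to the second path.

```go
package main

import "fmt"

type status int

const (
	unknown status = iota // no result reported yet
	passed
	failed
)

// dashboard is a hypothetical model: results[n] holds the build result
// reported for commit number n.
type dashboard struct {
	results  map[int]status
	notified []int // commit numbers for which breakage mail was sent
}

func newDashboard() *dashboard {
	return &dashboard{results: map[int]status{}}
}

// record stores a result and sends mail when a passing commit is directly
// followed by a failing one, whichever of the two results arrived last.
func (d *dashboard) record(num int, s status) {
	d.results[num] = s
	switch s {
	case failed:
		if d.results[num-1] == passed {
			d.notify(num)
		}
	case passed:
		if d.results[num+1] == failed {
			d.notify(num + 1)
		}
	}
}

func (d *dashboard) notify(num int) {
	d.notified = append(d.notified, num)
}

func main() {
	inOrder := newDashboard()
	inOrder.record(1, passed) // commit A
	inOrder.record(2, failed) // commit B: mail sent here
	fmt.Println(inOrder.notified) // [2]

	reversed := newDashboard()
	reversed.record(2, failed) // B first: nothing known about A, no mail
	reversed.record(1, passed) // A arrives: mail about B is sent now
	fmt.Println(reversed.notified) // [2]
}
```

In the reversed case the mail for B is sent while A's result is being handled, so the code at that point has to look B up again. That lookup is where a partially filled record for B could plausibly enter the picture.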

  • Andrew Gerrand at Oct 27, 2014 at 9:48 pm
    Thanks for the helpful analysis Russ. I found it. Dmitry and I broke it.
    Sorry for the trouble.

    CL:

    https://codereview.appspot.com/164960043
  • Aram Hăvărneanu at Oct 27, 2014 at 8:46 am

    On Mon, Oct 27, 2014 at 6:52 AM, Andrew Gerrand wrote:
    > None of the result data is actually lost

    Good to know because the dashboard sure indicates this, e.g. on a
    random page: http://build.golang.org/?page=21

    Will it come back to normal (without rebuilding everything)?

    --
    Aram Hăvărneanu

  • Andrew Gerrand at Oct 27, 2014 at 9:48 am
    Yeah, I have a little script that will put things back the way they were.
    Hopefully Dmitry isn't correct about this being an infinite loop of
    datastore updates.

Discussion Overview
group: golang-dev
categories: go
posted: Oct 27, '14 at 5:43a
active: Oct 27, '14 at 9:48p
posts: 18
users: 6
website: golang.org
