Grokbase Groups Hive dev July 2009
Hi all,
Having focused on Hive for several months, here are some wishes of mine after
serious consideration:

1. All auto-generated code for Hive was produced with the Facebook commercial
version of Thrift, which is older than the open source one. This will lead to
lots of compatibility problems and cuts us off from the help of the open source
community. We need to remove that code as soon as possible, but progress on
this issue seems to have stopped.
2. Please give us a clear roadmap. We also have plans for improving Hive,
but our patches would probably be neglected because they are not on
Facebook's schedule. If things go on like this, other commits will bring a lot
of compatibility problems, and we will keep suffering from fixing
conflicts again and again.
3. Please don't commit code so rashly. Code from Ashish can easily be
committed by others without strict examination, which has caused a lot of
problems when using it here: bugs, and poorly structured code that is hard to
read and extend. Perhaps the main reason is that Ashish is the leader of Hive.
Another person, Namit, always commits buggy or ugly code. My suggestion is
simply more discussion and testing with the help of the open source
community. Code quality would improve as a result. (I don't intend any
personal attacks here.)

Regards,
Min
--
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com


  • He Yongqiang at Jul 9, 2009 at 6:43 am

    On 2009-07-09 at 2:14 PM, "Min Zhou" wrote:

    > Hi all,
    > Having focused on Hive for several months, here are some wishes of mine
    > after serious consideration:
    >
    > 1. All auto-generated code for Hive was produced with the Facebook
    > commercial version of Thrift, which is older than the open source one.
    > This will lead to lots of compatibility problems and cuts us off from the
    > help of the open source community. We need to remove that code as soon as
    > possible, but progress on this issue seems to have stopped.

    See HIVE-438. I think it will be committed by this weekend?

    > 2. Please give us a clear roadmap. We also have plans for improving Hive,
    > but our patches would probably be neglected because they are not on
    > Facebook's schedule. If things go on like this, other commits will bring
    > a lot of compatibility problems, and we will keep suffering from fixing
    > conflicts again and again.

    I think the Hive roadmap on the Hive wiki page has just been updated.
    Please send out a request for code review if you think the patch is ready.
    But I think no conflicts can be guaranteed, since conflicts are not raised
    by one patch. If no conflict appears to this patch, then there will be
    conflicts for other patches.

    > 3. Please don't commit code so rashly. Code from Ashish can easily be
    > committed by others without strict examination, which has caused a lot of
    > problems when using it here: bugs, and poorly structured code that is hard
    > to read and extend. Perhaps the main reason is that Ashish is the leader
    > of Hive. Another person, Namit, always commits buggy or ugly code. My
    > suggestion is simply more discussion and testing with the help of the
    > open source community. Code quality would improve as a result. (I don't
    > intend any personal attacks here.)

    Code review is really hard and boring work, and we can only say that the
    code is most likely free of errors. A patch is committed with the work of
    at least two people, the patch submitter and the code reviewer.
    Sometimes errors are really hard to find either by eye or by tests, so
    please be more patient. Bugs can be fixed soon after they are observed.

    And I agree with your suggestion on more discussion, so please comment on
    the JIRA pages for issues you think need more discussion and tests.
    BTW, I think as the Hive community grows, there could be more discussion.
    So the first-priority issue should be how to enlarge the Hive community and
    get more people involved in the discussions on the Hive mailing list or JIRA.
  • He Yongqiang at Jul 9, 2009 at 6:46 am
    "But I think no conflicts can be guaranteed, since conflicts are not raised
    by one patch. If no conflict appears to this patch, then there will be
    conflicts for other patches."

    I mean:

    But I think "no conflicts" can NOT be guaranteed, since conflicts are not
    raised by one patch. If no conflict appears to this patch, then there will
    be conflicts for other patches.


  • Min Zhou at Jul 9, 2009 at 8:37 am
    I have been watching HIVE-438 for a long time. You know it's a critical
    change that impacts almost the whole Hive source tree, so a quick resolution
    is needed. It's understandable that the human resources of Facebook are very
    nervous; developers there always work on several projects at the same time.
    Therefore, we should use the power of the open source community to speed up
    its development. But right now, my feeling is that everyone only cares about
    their own affairs, regardless of what other people do. This is not the
    pattern of the open source community, but we are still immersed in it.




    2009/7/9 He Yongqiang <heyongqiang@software.ict.ac.cn>:

    > "But I think no conflicts can be guaranteed, since conflicts are not raised
    > by one patch. If no conflict appears to this patch, then there will be
    > conflicts for other patches."

    I did not mean that conflicts should never happen; there should be a way to
    reduce their frequency if we work together.

  • Edward Capriolo at Jul 9, 2009 at 3:12 pm
    Min,

    I am also waiting for HIVE-438 to do some work with the Hive
    authentication system. I chose to sideline my work until a solid
    thrift release.

    I personally am very impressed by the open source contributions from
    Facebook: Hadoop-Hive, Thrift, Scribe, Cassandra, and probably more stuff I
    do not know about.

    Also, the Hive crew is very amicable! Do you want to see a horror
    story? Here is a patch that I use (for SSH public keys in LDAP
    support)!

    http://www.nabble.com/IMPORTANT:-change-of-project-hosting-notice-td17271488.html

    "...However, their efforts to
    find a way to get this kind of functionality into OpenSSH have met with
    absolutely no reaction whatsoever from the OpenSSH developers. ..."

    Software has to be developed based on usage. Since Thrift has to be very,
    very interoperable and standardized, just ram-rodding it through with
    developers may not help, since there may need to be numerous cycles
    between users and developers.

    I believe Hive was/is waiting for an official Thrift release before updating.

  • Ashish Thusoo at Jul 9, 2009 at 6:59 pm
    Min,

    Your feedback is well taken. However, in defence of the team at FB - we have tried to keep the communication as open as possible through the JIRA. Most of the thoughts and design issues are documented there. It is also used a lot as a conversational tool around specific issues. If you are critically blocked on an issue, do mention it on the JIRA and you can be assured of a good response.

    It is true that there are many times we work on things that are a priority for FB, but that is not always the case. We have tried to be very community oriented. We have always included contributions from the community in our internal deployments, and we have taken contributions in core aspects of the code from the community as well, even though they may not be important for FB's environment. I would like you to think of us also as part of the Hive community as opposed to FB.

    Regarding code quality, that is always a work in progress. Trunk is a development branch and sometimes things break in trunk. Our code reviews minimize that but cannot eliminate it. Code review is done fairly thoroughly, which you can gauge from the quality of code review comments at large. Having said that, this is a process of continuous improvement for everyone and it is never a perfect process.

    Regarding the roadmap, the current roadmap is published on the wiki. We can be more frequent in our updates, but essentially the roadmap is a way of summarizing what core features or performance improvements everyone is working on in the JIRA. It is on the wiki, it is owned by the Hive community, and it is built bottom up rather than top down. You will get full support from the Hive community in terms of patches being incorporated even if they are not on the roadmap. The roadmap is more of a guidance than a diktat.

    I hope that answers a lot of questions/concerns that you have. I would also encourage a more positive tone in terms of how you can improve things rather than how FB can improve things for you. For example, if you feel a certain code review is not appropriate or has missed things, do jump in and mention your concerns on the JIRA and put your suggestions there. If you feel that some code is badly written, do jump in to fix it or suggest changes. If you feel some issue is taking too much time, chime in and take it on. Most successful movements are bottom up rather than top down and that is what we encourage in Hive as well.

    Cheers,
    Ashish

  • Amr Awadallah at Jul 10, 2009 at 4:29 am
    > It's understandable human resources of facebook are very nervous,

    Very true, they are nervous all the time, just kidding :)

    You are making a good point. That said, the Hive team at Facebook is really one of the better teams out there in terms of progressing the project.

    -- amr



Discussion Overview
group: dev @
categories: hive, hadoop
posted: Jul 9, '09 at 6:14a
active: Jul 10, '09 at 4:29a
posts: 7
users: 5
website: hive.apache.org
