Grokbase Groups Hive dev June 2011
Optimize get_partition_names_ps()
---------------------------------

Key: HIVE-2213
URL: https://issues.apache.org/jira/browse/HIVE-2213
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain


If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.
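
To make the contrast concrete, here is a minimal Java sketch. It is not the code from the attached patch. It assumes partition names have the form key=value/key=value, that an unspecified value in the partial specification is passed as an empty string, and that values contain no regular-expression metacharacters. The class name, the method names, and the partitionName field used in the filter string are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PartialSpecSketch {

    // Turns a partial spec into a name pattern; empty values become wildcards.
    // Assumes values contain no regex metacharacters (real code would escape them).
    static String buildNamePattern(List<String> partKeys, List<String> partialVals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < partKeys.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            String val = i < partialVals.size() ? partialVals.get(i) : "";
            sb.append(partKeys.get(i)).append('=').append(val.isEmpty() ? ".*" : val);
        }
        return sb.toString();
    }

    // Costly approach: every partition name is already in memory before filtering.
    static List<String> filterInMemory(List<String> allNames, String namePattern) {
        Pattern p = Pattern.compile(namePattern);
        List<String> matched = new ArrayList<String>();
        for (String name : allNames) {
            if (p.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }

    // Pushdown approach: hand the pattern to the query layer as a filter, so that
    // only matching names come back. "partitionName" is a hypothetical field name.
    static String pushdownFilter() {
        return "partitionName.matches(namePattern)";
    }

    public static void main(String[] args) {
        String pattern = buildNamePattern(
            Arrays.asList("ds", "hr"), Arrays.asList("2011-06-10", ""));
        List<String> all = Arrays.asList("ds=2011-06-10/hr=00", "ds=2011-06-11/hr=00");
        System.out.println(pattern);
        System.out.println(filterInMemory(all, pattern));
        System.out.println(pushdownFilter());
    }
}
```

The in-memory variant does work proportional to the total number of partitions in the table, while the pushed-down filter lets the datastore return only the names that match the partial specification.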

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • Sohan Jain (JIRA) at Jun 10, 2011 at 7:01 am
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sohan Jain updated HIVE-2213:
    -----------------------------

    Attachment: HIVE-2213.1.patch
  • jiraposter@reviews.apache.org (JIRA) at Jun 10, 2011 at 7:09 am
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047041#comment-13047041 ]

    jiraposter@reviews.apache.org commented on HIVE-2213:
    -----------------------------------------------------


    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/878/
    -----------------------------------------------------------

    Review request for hive and Paul Yang.


    Summary
    -------

    If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.


    This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213


    Diffs
    -----

    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1134205
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1134205
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1134205
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1134205
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1134205
    trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1134205

    Diff: https://reviews.apache.org/r/878/diff


    Testing
    -------

    Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


    Thanks,

    Sohan


  • jiraposter@reviews.apache.org (JIRA) at Jun 10, 2011 at 9:04 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047452#comment-13047452 ]

    jiraposter@reviews.apache.org commented on HIVE-2213:
    -----------------------------------------------------


    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/878/#review804
    -----------------------------------------------------------


    You can do this here or in a separate JIRA, but can you update get_partitions_ps() using a similar technique?


    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
    <https://reviews.apache.org/r/878/#comment1753>

    Can you refactor with the above function since they are similar?



    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
    <https://reviews.apache.org/r/878/#comment1754>

    Same here



    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
    <https://reviews.apache.org/r/878/#comment1755>

    To be consistent with the other method, maybe call this listPartitionNamesPs?



    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
    <https://reviews.apache.org/r/878/#comment1756>

    Combine with above


    - Paul


  • jiraposter@reviews.apache.org (JIRA) at Jun 13, 2011 at 9:12 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048781#comment-13048781 ]

    jiraposter@reviews.apache.org commented on HIVE-2213:
    -----------------------------------------------------


    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/878/
    -----------------------------------------------------------

    (Updated 2011-06-13 21:11:38.325243)


    Review request for hive and Paul Yang.


    Changes
    -------

    -Refactored similar functions
    -Renamed getPartitionNamesPs() to listPartitionNamesPs()
    -Modified get_partitions_ps() and get_partitions_ps_with_auth() for a similar optimization


    Summary
    -------

    If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.


    This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213


    Diffs (updated)
    -----

    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227
    trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227

    Diff: https://reviews.apache.org/r/878/diff


    Testing
    -------

    Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


    Thanks,

    Sohan


  • jiraposter@reviews.apache.org (JIRA) at Jun 16, 2011 at 5:38 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050585#comment-13050585 ]

    jiraposter@reviews.apache.org commented on HIVE-2213:
    -----------------------------------------------------


    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/878/#review853
    -----------------------------------------------------------



    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
    <https://reviews.apache.org/r/878/#comment1862>

    Line exceeds 100 char limit


    - Paul


  • Paul Yang (JIRA) at Jun 16, 2011 at 5:40 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050586#comment-13050586 ]

    Paul Yang commented on HIVE-2213:
    ---------------------------------

    Looks good, but can you do a minor update to fix lines longer than 100 chars?
  • Sohan Jain (JIRA) at Jun 16, 2011 at 11:32 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sohan Jain updated HIVE-2213:
    -----------------------------

    Status: Patch Available (was: Open)
  • jiraposter@reviews.apache.org (JIRA) at Jun 16, 2011 at 11:32 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050803#comment-13050803 ]

    jiraposter@reviews.apache.org commented on HIVE-2213:
    -----------------------------------------------------


    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/878/
    -----------------------------------------------------------

    (Updated 2011-06-16 23:30:02.425588)


    Review request for hive and Paul Yang.


    Changes
    -------

    -Fixed line that exceeded 100 chars


    Summary
    -------

    If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.


    This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213


    Diffs (updated)
    -----

    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1135227
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1135227
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1135227
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1135227
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1135227
    trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1135227

    Diff: https://reviews.apache.org/r/878/diff


    Testing
    -------

    Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


    Thanks,

    Sohan


  • Sohan Jain (JIRA) at Jun 16, 2011 at 11:32 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sohan Jain updated HIVE-2213:
    -----------------------------

    Attachment: HIVE-2213.3.patch

    -Fixed line that exceeded 100 chars
  • jiraposter@reviews.apache.org (JIRA) at Jun 17, 2011 at 1:19 am
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050841#comment-13050841 ]

    jiraposter@reviews.apache.org commented on HIVE-2213:
    -----------------------------------------------------


    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/878/#review858
    -----------------------------------------------------------



    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
    <https://reviews.apache.org/r/878/#comment1877>

    Can we make this method parameterized to reduce the number of casts required? E.g.

    private <T> Collection <T> getPartition...

    We might have to do something like <String>getPartition... when making the call though.


    - Paul
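
The parameterized-method suggestion above can be sketched as follows. This is an illustration under stated assumptions, not the method from the patch: runUntypedQuery() is a hypothetical stand-in for an untyped query call (javax.jdo.Query.execute() returns Object, for example), and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class GenericResultSketch {

    // Hypothetical stand-in for an untyped query result.
    static Object runUntypedQuery() {
        return Arrays.asList("ds=2011-06-10/hr=00", "ds=2011-06-10/hr=01");
    }

    // Parameterized helper: the single unchecked cast lives here, so callers
    // do not need to cast each element themselves.
    @SuppressWarnings("unchecked")
    static <T> Collection<T> queryResults() {
        return (Collection<T>) runUntypedQuery();
    }

    static List<String> listNames() {
        // Explicit type argument at the call site, as the review comment anticipates.
        return new ArrayList<String>(GenericResultSketch.<String>queryResults());
    }

    public static void main(String[] args) {
        System.out.println(listNames());
    }
}
```

The cast is still unchecked, so the caller remains responsible for requesting the element type the query really produces; the benefit is only that the cast appears once instead of at every use.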


  • jiraposter@reviews.apache.org (JIRA) at Jun 17, 2011 at 9:24 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051330#comment-13051330 ]

    jiraposter@reviews.apache.org commented on HIVE-2213:
    -----------------------------------------------------


    -----------------------------------------------------------
    This is an automatically generated e-mail. To reply, visit:
    https://reviews.apache.org/r/878/
    -----------------------------------------------------------

    (Updated 2011-06-17 21:22:00.028428)


    Review request for hive and Paul Yang.


    Changes
    -------

    - made getPartitionPsQueryResults() return a parameterized type to avoid lots of casting


    Summary
    -------

    If a table has a large number of partitions, get_partition_names_ps() may take a long time to execute, because we get all of the partition names from the database. This is not very memory efficient, and the operation can be pushed down to the JDO layer without getting all of the names first.


    This addresses bug HIVE-2213.
    https://issues.apache.org/jira/browse/HIVE-2213


    Diffs (updated)
    -----

    trunk/common/src/java/org/apache/hadoop/hive/common/FileUtils.java 1136751
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1136751
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1136751
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1136751
    trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 1136751
    trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1136751

    Diff: https://reviews.apache.org/r/878/diff


    Testing
    -------

    Passes previous test cases for get_partition_names_ps() in TestHiveMetaStore.


    Thanks,

    Sohan


  • Sohan Jain (JIRA) at Jun 17, 2011 at 9:30 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051333#comment-13051333 ]

    Sohan Jain commented on HIVE-2213:
    ----------------------------------

    I'd also like to point one more thing out. The previous implementation of get_partitions_ps_with_auth() did not actually make use of the inputted user name or group name, nor did it set any auth privileges on the desired partitions.

    This patch adds authentication privileges, which unfortunately slows down get_partitions_ps_with_auth(), since we have to iterate through all of the partitions and set privileges before returning them. What is the desired behavior here?
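
The cost described above can be pictured with a small Java sketch. Everything in it is hypothetical: Part and PrivilegeLookup are invented stand-ins, not Hive's Partition class or its privilege API, and the loop only illustrates why one extra pass over the matching partitions is needed.

```java
import java.util.ArrayList;
import java.util.List;

final class AuthPassSketch {

    // Hypothetical stand-in for a partition object; not Hive's Partition class.
    static final class Part {
        final String name;
        List<String> privileges = new ArrayList<String>();

        Part(String name) {
            this.name = name;
        }
    }

    // Hypothetical privilege lookup for one partition, user, and group list.
    interface PrivilegeLookup {
        List<String> privilegesFor(String partName, String user, List<String> groups);
    }

    // Visits every matching partition once and attaches its privileges, so the
    // extra work grows linearly with the number of partitions returned.
    static int attachPrivileges(List<Part> parts, String user, List<String> groups,
                                PrivilegeLookup lookup) {
        int lookups = 0;
        for (Part p : parts) {
            p.privileges = lookup.privilegesFor(p.name, user, groups);
            lookups++;
        }
        return lookups;
    }
}
```

With a pass like this, the number of privilege lookups equals the number of partitions that match the partial specification.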
  • Paul Yang (JIRA) at Jun 18, 2011 at 12:50 am
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051419#comment-13051419 ]

    Paul Yang commented on HIVE-2213:
    ---------------------------------

    If get_partitions_ps_with_auth() was not correct before, then we should fix the method to produce the correct behavior. Ideally, it should have been done in a separate JIRA, but it should be okay to include in this one.

    +1 looks good though, will test and commit.
  • Paul Yang (JIRA) at Jun 20, 2011 at 11:47 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Paul Yang updated HIVE-2213:
    ----------------------------

    Summary: Optimize partial specification metastore functions (was: Optimize get_partition_names_ps())
  • Paul Yang (JIRA) at Jun 20, 2011 at 11:49 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Paul Yang updated HIVE-2213:
    ----------------------------

    Resolution: Fixed
    Fix Version/s: 0.8.0
    Status: Resolved (was: Patch Available)

    Committed. Thanks Sohan!
  • Hudson (JIRA) at Jun 23, 2011 at 7:49 pm
    [ https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054064#comment-13054064 ]

    Hudson commented on HIVE-2213:
    ------------------------------

    Integrated in Hive-trunk-h0.21 #790 (See [https://builds.apache.org/job/Hive-trunk-h0.21/790/])

