FAQ
Hi Barry,

What version of Impala are you using? Can you also send us the explain plan?

Thanks,
Alan
On Wednesday, March 20, 2013 7:50:26 AM UTC-7, Barry Becker wrote:

I have a query that sometimes gives different results with
no discernible pattern that I can see.
The query is

SELECT auto_discount / revenue as discount_percent FROM sales_fact_ order
by discount_percent desc limit 10

Sometimes the result is

inf
inf
inf
inf
inf
inf
inf
inf
inf


and sometimes it is

inf
inf
inf
inf
inf
inf
inf
inf
inf
-nan

and occasionally it is

-nan
-nan
-nan
-nan
-nan
-nan
-nan
-nan
-nan
-nan

Can you explain why the results may not be consistent for an identical
query issued multiple times in succession?

Search Discussions

  • Barry Becker at Mar 21, 2013 at 2:40 am
    We are using impala version 0.6.
    Below is the explain plan for the above query. Another thing we notice is
    that if we use "limit x" then there will be x-1 infinities or NaN followed
    by a lone 0 or -NaN at the end.

    explain SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Explain query: SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Plan Fragment 0
    UNPARTITIONED
    TOP-N
    ORDER BY: auto_discount / revenue DESC
    LIMIT: 10
    TUPLE IDS: 0
    EXCHANGE (2)
    TUPLE IDS: 0

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    SCAN HDFS table=default.pa_sales_fact #partitions=1 size=1.67GB (0)
    TUPLE IDS: 0

    On Wed, Mar 20, 2013 at 8:59 PM, Alan wrote:

    Hi Barry,

    What version of Impala are you using? Can you also send us the explain
    plan?

    Thanks,
    Alan

    On Wednesday, March 20, 2013 7:50:26 AM UTC-7, Barry Becker wrote:

    I have a query that sometimes gives different results with
    no discernible pattern that I can see.
    The query is

    SELECT auto_discount / revenue as discount_percent FROM sales_fact_
    order by discount_percent desc limit 10

    Sometimes the result is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf


    and sometimes it is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    -nan

    and occasionally it is

    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan

    Can you explain why the results may not be consistent for an identical
    query issued multiple times in succession?

    --
    -Barry
  • Alan Choi at Mar 21, 2013 at 4:57 pm
    Hi Barry,

    Sounds like we've a bug here. Would you mind giving us some sample data
    that can cause this problem? We would like to repro it in house.

    I've created this JIRA (https://issues.cloudera.org/browse/IMPALA-162) to
    track it.

    Thanks,
    Alan

    On Wed, Mar 20, 2013 at 7:40 PM, Barry Becker wrote:

    We are using impala version 0.6.
    Below is the explain plan for the above query. Another thing we notice is
    that if we use "limit x" then there will be x-1 infinities or NaN followed
    by a lone 0 or -NaN at the end.

    explain SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Explain query: SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Plan Fragment 0
    UNPARTITIONED
    TOP-N
    ORDER BY: auto_discount / revenue DESC
    LIMIT: 10
    TUPLE IDS: 0
    EXCHANGE (2)
    TUPLE IDS: 0

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    SCAN HDFS table=default.pa_sales_fact #partitions=1 size=1.67GB (0)
    TUPLE IDS: 0

    On Wed, Mar 20, 2013 at 8:59 PM, Alan wrote:

    Hi Barry,

    What version of Impala are you using? Can you also send us the explain
    plan?

    Thanks,
    Alan

    On Wednesday, March 20, 2013 7:50:26 AM UTC-7, Barry Becker wrote:

    I have a query that sometimes gives different results with
    no discernible pattern that I can see.
    The query is

    SELECT auto_discount / revenue as discount_percent FROM sales_fact_
    order by discount_percent desc limit 10

    Sometimes the result is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf


    and sometimes it is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    -nan

    and occasionally it is

    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan

    Can you explain why the results may not be consistent for an identical
    query issued multiple times in succession?

    --
    -Barry
  • Nong Li at Mar 21, 2013 at 5:00 pm
    Hey Barry,

    Can you run select count(*) from sales_fact_ where revenue = 0?

    Thanks
    Nong
    On Thu, Mar 21, 2013 at 9:57 AM, Alan Choi wrote:

    Hi Barry,

    Sounds like we've a bug here. Would you mind giving us some sample data
    that can cause this problem? We would like to repro it in house.

    I've created this JIRA (https://issues.cloudera.org/browse/IMPALA-162) to
    track it.

    Thanks,
    Alan

    On Wed, Mar 20, 2013 at 7:40 PM, Barry Becker wrote:

    We are using impala version 0.6.
    Below is the explain plan for the above query. Another thing we notice is
    that if we use "limit x" then there will be x-1 infinities or NaN followed
    by a lone 0 or -NaN at the end.

    explain SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Explain query: SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Plan Fragment 0
    UNPARTITIONED
    TOP-N
    ORDER BY: auto_discount / revenue DESC
    LIMIT: 10
    TUPLE IDS: 0
    EXCHANGE (2)
    TUPLE IDS: 0

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    SCAN HDFS table=default.pa_sales_fact #partitions=1 size=1.67GB (0)
    TUPLE IDS: 0

    On Wed, Mar 20, 2013 at 8:59 PM, Alan wrote:

    Hi Barry,

    What version of Impala are you using? Can you also send us the explain
    plan?

    Thanks,
    Alan

    On Wednesday, March 20, 2013 7:50:26 AM UTC-7, Barry Becker wrote:

    I have a query that sometimes gives different results with
    no discernible pattern that I can see.
    The query is

    SELECT auto_discount / revenue as discount_percent FROM sales_fact_
    order by discount_percent desc limit 10

    Sometimes the result is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf


    and sometimes it is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    -nan

    and occasionally it is

    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan

    Can you explain why the results may not be consistent for an identical
    query issued multiple times in succession?

    --
    -Barry
  • Barry Becker at Mar 25, 2013 at 2:53 pm
    I was able to reproduce the same problem on a much smaller dataset.
    The query is now
    select special_promo_cost / cash_payment_discount as DISCOUNT_PERCENT from
    PD_SALES_TRANSACTIONS order by DISCOUNT_PERCENT asc limit 10

    select count(*) from pd_sales_transactions
    gives 543623
    and
    select count(*) from pd_sales_transactions where cash_payment_discount = 0
    gives 407806

    the ordering of values seems completely random in the results. Seems to be
    related to having lots of divisions by 0 present.
    On Thursday, March 21, 2013 12:00:00 PM UTC-5, Nong wrote:

    Hey Barry,

    Can you run select count(*) from sales_fact_ where revenue = 0?

    Thanks
    Nong

    On Thu, Mar 21, 2013 at 9:57 AM, Alan Choi <al...@cloudera.com<javascript:>
    wrote:
    Hi Barry,

    Sounds like we've a bug here. Would you mind giving us some sample data
    that can cause this problem? We would like to repro it in house.

    I've created this JIRA (https://issues.cloudera.org/browse/IMPALA-162)
    to track it.

    Thanks,
    Alan


    On Wed, Mar 20, 2013 at 7:40 PM, Barry Becker <barryb...@gmail.com<javascript:>
    wrote:
    We are using impala version 0.6.
    Below is the explain plan for the above query. Another thing we notice
    is that if we use "limit x" then there will be x-1 infinities or NaN
    followed by a lone 0 or -NaN at the end.

    explain SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Explain query: SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Plan Fragment 0
    UNPARTITIONED
    TOP-N
    ORDER BY: auto_discount / revenue DESC
    LIMIT: 10
    TUPLE IDS: 0
    EXCHANGE (2)
    TUPLE IDS: 0

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    SCAN HDFS table=default.pa_sales_fact #partitions=1 size=1.67GB (0)
    TUPLE IDS: 0


    On Wed, Mar 20, 2013 at 8:59 PM, Alan <al...@cloudera.com <javascript:>>wrote:
    Hi Barry,

    What version of Impala are you using? Can you also send us the explain
    plan?

    Thanks,
    Alan

    On Wednesday, March 20, 2013 7:50:26 AM UTC-7, Barry Becker wrote:

    I have a query that sometimes gives different results with
    no discernible pattern that I can see.
    The query is

    SELECT auto_discount / revenue as discount_percent FROM sales_fact_
    order by discount_percent desc limit 10

    Sometimes the result is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf


    and sometimes it is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    -nan

    and occasionally it is

    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan

    Can you explain why the results may not be consistent for an identical
    query issued multiple times in succession?

    --
    -Barry
  • Barry Becker at Mar 25, 2013 at 3:07 pm
    Yes.
    I just exported the table to a csv file and compressed it. Its 49m in size.
    I can email you a link to a dropbox and give you the create table
    statement. What email should I send the info to?
    On Mon, Mar 25, 2013 at 10:00 AM, Marcel Kornacker wrote:

    Barry, would you be able to share the Create Table statements and the
    data file with us?
    On Mon, Mar 25, 2013 at 7:53 AM, Barry Becker wrote:
    I was able to reproduce the same problem on a much smaller dataset.
    The query is now
    select special_promo_cost / cash_payment_discount as DISCOUNT_PERCENT from
    PD_SALES_TRANSACTIONS order by DISCOUNT_PERCENT asc limit 10

    select count(*) from pd_sales_transactions
    gives 543623
    and
    select count(*) from pd_sales_transactions where cash_payment_discount = 0
    gives 407806

    the ordering of values seems completely random in the results. Seems to be
    related to having lots of divisions by 0 present.
    On Thursday, March 21, 2013 12:00:00 PM UTC-5, Nong wrote:

    Hey Barry,

    Can you run select count(*) from sales_fact_ where revenue = 0?

    Thanks
    Nong
    On Thu, Mar 21, 2013 at 9:57 AM, Alan Choi wrote:

    Hi Barry,

    Sounds like we've a bug here. Would you mind giving us some sample data
    that can cause this problem? We would like to repro it in house.

    I've created this JIRA (https://issues.cloudera.org/browse/IMPALA-162)
    to
    track it.

    Thanks,
    Alan


    On Wed, Mar 20, 2013 at 7:40 PM, Barry Becker <barryb...@gmail.com>
    wrote:
    We are using impala version 0.6.
    Below is the explain plan for the above query. Another thing we notice
    is that if we use "limit x" then there will be x-1 infinities or NaN
    followed by a lone 0 or -NaN at the end.

    explain SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Explain query: SELECT auto_discount / revenue as discount_percent FROM
    pa_sales_fact order by discount_percent desc limit 10
    Plan Fragment 0
    UNPARTITIONED
    TOP-N
    ORDER BY: auto_discount / revenue DESC
    LIMIT: 10
    TUPLE IDS: 0
    EXCHANGE (2)
    TUPLE IDS: 0

    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

    SCAN HDFS table=default.pa_sales_fact #partitions=1 size=1.67GB (0)
    TUPLE IDS: 0

    On Wed, Mar 20, 2013 at 8:59 PM, Alan wrote:

    Hi Barry,

    What version of Impala are you using? Can you also send us the
    explain
    plan?

    Thanks,
    Alan

    On Wednesday, March 20, 2013 7:50:26 AM UTC-7, Barry Becker wrote:

    I have a query that sometimes gives different results with no
    discernible pattern that I can see.
    The query is

    SELECT auto_discount / revenue as discount_percent FROM sales_fact_
    order by discount_percent desc limit 10

    Sometimes the result is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf


    and sometimes it is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    -nan

    and occasionally it is

    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan

    Can you explain why the results may not be consistent for an
    identical
    query issued multiple times in succession?



    --
    -Barry


    --
    -Barry
  • Barry Becker at Mar 25, 2013 at 3:28 pm
    Nong and Alan,

    Regading this issue https://issues.cloudera.org/browse/IMPALA-162

    Here is the data that I can use to reproduce the problem:
    https://dl.dropbox.com/u/6248568/pxqa_sales_transactions.zip

    Here is the query

    select special_promo_cost / cash_payment_discount as DISCOUNT_PERCENT from
    PD_SALES_TRANSACTIONS order by DISCOUNT_PERCENT asc limit 10

    and finally, here is the create table statement

    CREATE TABLE [dbo].[PD_SALES_TRANSACTIONS](

    [TRANSACTION_ID] [varchar](256) NOT NULL,

    [TIME_ID] [varchar](256) NOT NULL,

    [CUSTOMER_ID] [varchar](255) NOT NULL,

    [CUSTOMER_STORE_ID] [varchar](256) NOT NULL,

    [PRODUCT_ID] [varchar](256) NOT NULL,

    [PLANT_ID] [varchar](256) NOT NULL,

    [SALES_ORG_ID] [varchar](256) NOT NULL,

    [SALES_PERSON_ID] [varchar](256) NOT NULL,

    [CURRENCY_ID] [varchar](256) NOT NULL,

    [FD_PROMOTION_CODE] [varchar](256) NOT NULL,

    [FD_DEAL_ID] [varchar](256) NOT NULL,

    [PROMOTION_CODE] [varchar](256) NULL,

    [PG_VALIDVALUE] [varchar](256) NULL,

    [SALES_VOLUME] [float] NULL,

    [LIST_PRICE] [float] NULL,

    [ORDER_SIZE_DISCOUNT] [float] NULL,

    [STD_DISCOUNT] [float] NULL,

    [PROMOTION_COST] [float] NULL,

    [CASH_PAYMENT_DISCOUNT] [float] NULL,

    [CARRYING_COST] [float] NULL,

    [DISTRIBUTION_COST] [float] NULL,

    [SPECIAL_PROMO_COST] [float] NULL,

    [ADVERTISING_COSTS] [float] NULL,

    [PREFERRED_DISCOUNTS] [float] NULL,

    [COGS] [float] NULL,

    [FREIGHT] [float] NULL,

    [WARRANTY] [float] NULL,

    [NET_REVENUE] [float] NULL,

    [POCKET_PRICE] [float] NULL,

    [POCKET_MARGIN] [float] NULL,

    [INVOICE] [float] NULL,

    [MARGIN_PERCENT] [float] NULL,

    [MARKUP_PERCENT] [float] NULL,

    [DISC_PERCENT] [float] NULL,

    [PRICE_INDEX] [float] NULL,

    [EXTRACTION_TIME] [datetime] NULL,

    [LOADID] [int] NULL,

    [NIRAKARSEGMENTATION] [varchar](256) NULL,

    [PROS_DATALOAD_TIME] [datetime] NULL,

    CONSTRAINT [PD_SALES_TRANSACTIONS_PK] PRIMARY KEY CLUSTERED

    (

    [TRANSACTION_ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
    OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]
    On Wednesday, March 20, 2013 9:50:26 AM UTC-5, Barry Becker wrote:

    I have a query that sometimes gives different results with
    no discernible pattern that I can see.
    The query is

    SELECT auto_discount / revenue as discount_percent FROM sales_fact_ order
    by discount_percent desc limit 10

    Sometimes the result is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf


    and sometimes it is

    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    inf
    -nan

    and occasionally it is

    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan
    -nan

    Can you explain why the results may not be consistent for an identical
    query issued multiple times in succession?

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupimpala-user @
categorieshadoop
postedMar 21, '13 at 1:59a
activeMar 25, '13 at 3:28p
posts7
users3
websitecloudera.com
irc#hadoop

People

Translate

site design / logo © 2022 Grokbase