The nice thing about that patch is that it works on the reader side with old (incorrectly written) files.

We will make a new parquet-mr release this week to include this and other fixes & optimizations.
On Dec 9, 2013, at 12:34 PM, Keith Simmons wrote:

Ah, it does look the same, at least on the writer side. Seems I'm not the only one that's run into the issue.

On Mon, Dec 9, 2013 at 12:18 PM, Dmitriy Ryaboy wrote:
Keith,
Is this the same fix?
https://github.com/Parquet/parquet-mr/commit/b297c73c1082728ad9626d17ce0f7abe6abaa36b

On Mon, Dec 9, 2013 at 11:04 AM, Keith Simmons wrote:
We ran into the same error. The parquet-hadoop subproject of parquet-mr has a bug where it writes an incorrect size for dictionary pages (the size covers only the data, not the header). Impala uses these sizes to work out how much to read off disk, so the read comes up short, hence the thrift deserialization error. The parquet-hadoop jar only uses offsets when deserializing, and those are set correctly, so Hive doesn't exhibit this bug.
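
To make the failure mode above concrete, here is a minimal toy sketch. It is not the parquet-mr or Impala code: the 4-byte length header stands in for the real thrift page header, and `write_chunk`, `read_by_size` and `read_by_offset` are hypothetical names invented for illustration. It only shows why a reader that trusts an undercounted size runs out of bytes, while a reader that parses forward from the offset does not.

```python
import struct

# Toy page header: just the payload length (assumption; real Parquet
# page headers are thrift-encoded structs of variable size).
HEADER = struct.Struct("<I")


def write_chunk(pages, count_dict_header):
    """Serialize pages back to back; return (chunk_bytes, recorded_size).

    pages[0] plays the role of the dictionary page. When
    count_dict_header is False, its header is left out of the recorded
    size, mimicking the writer bug described above.
    """
    body = bytearray()
    recorded = 0
    for i, payload in enumerate(pages):
        header = HEADER.pack(len(payload))
        body += header + payload
        if i == 0 and not count_dict_header:
            recorded += len(payload)  # header bytes not counted
        else:
            recorded += len(header) + len(payload)
    return bytes(body), recorded


def parse_pages(buf, n_pages):
    """Parse n_pages from buf, failing if the buffer ends early."""
    pages, pos = [], 0
    for _ in range(n_pages):
        if pos + HEADER.size > len(buf):
            raise EOFError("No more data to read.")
        (length,) = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if pos + length > len(buf):
            raise EOFError("No more data to read.")
        pages.append(buf[pos:pos + length])
        pos += length
    return pages


def read_by_size(file_bytes, offset, recorded_size, n_pages):
    # Size-driven reader: fetch exactly recorded_size bytes, then parse.
    return parse_pages(file_bytes[offset:offset + recorded_size], n_pages)


def read_by_offset(file_bytes, offset, n_pages):
    # Offset-driven reader: start at the offset and parse forward.
    return parse_pages(file_bytes[offset:], n_pages)
```

With a chunk written using `count_dict_header=False`, `read_by_size` raises `EOFError` on the last page because the buffer it fetched is one header short, while `read_by_offset` returns both pages from the same bytes.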

We haven't sent a pull request back to the parquet folks yet, but you can look at the last commit here to see the necessary change if you want to compile your own jar: https://github.com/pulseio/parquet-mr

Keith

On Sun, Dec 8, 2013 at 10:30 PM, Sean O'Brien wrote:
Hi All,

We've been having issues with a bunch of queries that run over some parquet tables we generate from our own MR. These tables have worked since our upgrade to Impala 1.2, so it's not a total breakage with the upgrade... it seems more like a bug or something particular to one or two odd rows.

I can produce the failure on a given partition that I've figured out is bad with a query like:

select max(some_string_filed) from parquet_table where dt='2013-12-07' and hr='12';

the error I get:
ERRORS ENCOUNTERED DURING EXECUTION:
Backend 4:couldn't deserialize thrift msg:
No more data to read.

When I find a node that I believe was the source of the error (I'm still not sure how to determine which 'backend' is which), I see:

I1208 21:17:44.664397 2897 status.cc:44] couldn't deserialize thrift msg:
No more data to read.
@ 0x6c56e0 impala::Status::Status()
@ 0x9ab081 impala::DeserializeThriftMsg<>()
@ 0x9ac177 impala::HdfsParquetScanner::BaseColumnReader::ReadDataPage()
@ 0x9ad46f impala::HdfsParquetScanner::AssembleRows()
@ 0x9b0268 impala::HdfsParquetScanner::ProcessSplit()
@ 0x99229a impala::HdfsScanNode::ScannerThreadHelper()
@ 0x98d7f3 impala::HdfsScanNode::ScannerThread()
@ 0x7dfdfc impala::Thread::SuperviseThread()
@ 0x7e070e boost::detail::thread_data<>::run()
@ 0xa28884 thread_proxy
@ 0x7f3493e37e9a start_thread
@ 0x7f3492adc3fd (unknown)

We've tried matching our parquet-mr version to 1.2.5, since that appeared to be the version cdh4.5 uses. What version of parquet is Impala 1.2.1 using? Perhaps we need to compile our parquet-generating MRs against another version to get things working again.

Also, is there any way to get impalad to give me more information about where in a given file it's failing?

Thanks
-Sean


To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.

Discussion Overview
group: impala-user
categories: hadoop
posted: Dec 9, '13 at 7:04p
active: Dec 9, '13 at 8:43p
posts: 4
users: 2
website: cloudera.com
irc: #hadoop