Perhaps this is a good time to also mention that we're currently working on
compiled Python UDFs in Impala. The idea would be that you could write a
UDF in Python and compile it down to LLVM IR, as if it were written in C++.
(This would NOT fire up a Python interpreter.) You would get the
convenience of working with Python with the performance of a compiled
language, and eventually you might get access to some of Python's
libraries, like numpy, scikit-learn, nltk, etc.
Some caveats: this is still highly speculative, though we see a path where
it could work. At least to begin with, this would also only support the
Impala primitive types.
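For illustration only, here is a scalar function of the kind such a compiler would plausibly target: primitive types in, a primitive type out, no Python objects beyond numbers. This is a hypothetical sketch. The function name, the use of type annotations, and the NULL-as-None convention are all assumptions, and no Impala registration or compilation API is shown because none is described here.

```python
import math
from typing import Optional


def haversine_km(lat1: Optional[float], lon1: Optional[float],
                 lat2: Optional[float], lon2: Optional[float]) -> Optional[float]:
    """Great-circle distance in kilometres between two points given in degrees."""
    # Assumption: a SQL NULL arrives as None, and any NULL input yields NULL.
    if None in (lat1, lon1, lat2, lon2):
        return None
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula; uses only float arithmetic and math-library calls.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

The body sticks to float arithmetic and math-library calls, which is the sort of code that maps naturally onto a compiled representation.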
Uri
On Fri, Feb 7, 2014 at 11:55 AM, Jonathan Schuff wrote:
Just wanted to drop in and convey our interest in non-primitive data type
support. Many of our use cases require the dynamic handling of arrays and
structures, such as performing time series analyses over large datasets
(which is coincidentally the only reason why Hive is still a primary aspect
of our stack). If non-primitive data type support is added, I believe that
Impala would become *much* more useful for a broad range of complex
scientific analyses, especially in conjunction with Hive UDF support.
Thanks for reading!
Jon
On Sunday, July 28, 2013 12:52:58 PM UTC-4, Marcel Kornacker wrote:
Impala does not support non-scalar column types right now, but we are
actively looking into adding support for that in the medium term.
On Sat, Jul 27, 2013 at 3:25 PM, SN wrote:
Bumping up this topic again since it's relevant in my use case. Does Impala
still not support non-primitive data types?
I have a dynamic column family in HBase that I am mapping using a
map<string,string> in a Hive table.
Something of this form:
CREATE EXTERNAL TABLE tsp_raw(row_key string, value map<string,string>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:,")
TBLPROPERTIES ("hbase.table.name" = "tsp");
However, when I try to read it with the JDBC connector (via impalad), I am
unable to parse the value column, which is a map collection.
Any thoughts? When I run the query directly in Hive, I do of course get a
complete dump of the map. Any workarounds for the time being would also be
very helpful.
Thanks.
On Tuesday, January 29, 2013 7:29:27 PM UTC-5, Marcel Kornacker wrote:
We can't promise specific dates yet, but we know this won't arrive in
the GA timeframe (early April). It's high on our list, though, because
it's been requested quite a few times, which means it'll be one of the
first things we'll be working on after GA.
Marcel
On Tue, Jan 29, 2013 at 4:17 PM, Steven Wong wrote:
When will Impala support non-primitive data types, especially map
and
--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447