Hi,

I wonder how Impala supports file systems other than HDFS, for example
Amazon's s3n. Is that support provided by the Hadoop file system layer, or
does Impala have to provide its own native code for them?

Thanks in advance,

Jian
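[For context on the question: in Hadoop itself, file system support is pluggable through the org.apache.hadoop.fs.FileSystem abstraction, and a URI scheme such as s3n:// is mapped to an implementation class plus credentials in the client configuration. Whether Impala can actually use such a FileSystem is exactly what this thread is asking. A minimal core-site.xml sketch, using the Hadoop 1.x / early 2.x property names; the credential values are placeholders:]

```xml
<configuration>
  <!-- Map the s3n:// scheme to Hadoop's JetS3t-backed native S3 client -->
  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
  <!-- Credentials for the bucket (placeholder values) -->
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```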


  • Jian Fang at Feb 20, 2013 at 7:00 pm
    I followed the instructions at
    https://github.com/amplab/shark/wiki/Running-Shark-on-EC2 to create a Hive
    table as follows:

    CREATE EXTERNAL TABLE wiki_small (id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
            ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://spark-data/wikipedia-sample/';

    I can read the data from the hive table wiki_small without any problem.

    hive > select * from wiki_small limit 10;

    But when I did the same for impala-shell, I got the following errors:

    $ impala-shell -r -i localhost:21000
    Connected to localhost:21000
    Welcome to the Impala shell. Press TAB twice to see a list of available
    commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Build version: Impala v0.5 (5a64c63) built on Fri Feb 1 11:21:22 PST 2013)
    Successfully refreshed catalog
    [localhost:21000] > select * from wiki_small limit 10;
    Query: select * from wiki_small limit 10
    ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in select * from wiki_small limit 10)
             at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
             at com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
             at com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
    Caused by: com.cloudera.impala.common.AnalysisException: Failed to load metadata for table: wiki_small
             at com.cloudera.impala.analysis.Analyzer.registerBaseTableRef(Analyzer.java:174)
             at com.cloudera.impala.analysis.BaseTableRef.analyze(BaseTableRef.java:51)
             at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:115)
             at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
             ... 2 more
    Caused by: com.cloudera.impala.catalog.Db$TableLoadingException: Failed to load metadata for table: wiki_small
             at com.cloudera.impala.catalog.HdfsTable.load(HdfsTable.java:582)
             at com.cloudera.impala.catalog.Table.load(Table.java:113)
             at com.cloudera.impala.catalog.Db$LazyTableMap.loadTable(Db.java:191)
             at com.cloudera.impala.catalog.Db$LazyTableMap.access$000(Db.java:112)
             at com.cloudera.impala.catalog.Db$LazyTableMap$1.apply(Db.java:118)
             at com.cloudera.impala.catalog.Db$LazyTableMap$1.apply(Db.java:116)
             at com.google.common.collect.ComputingConcurrentHashMap$ComputingValueReference.compute(ComputingConcurrentHashMap.java:355)
             at com.google.common.collect.ComputingConcurrentHashMap$ComputingSegment.compute(ComputingConcurrentHashMap.java:184)
             at com.google.common.collect.ComputingConcurrentHashMap$ComputingSegment.getOrCompute(ComputingConcurrentHashMap.java:153)
             at com.google.common.collect.ComputingConcurrentHashMap.getOrCompute(ComputingConcurrentHashMap.java:69)
             at com.google.common.collect.ComputingConcurrentHashMap$ComputingMapAdapter.get(ComputingConcurrentHashMap.java:393)
             at com.cloudera.impala.catalog.Db$LazyTableMap.get(Db.java:160)
             at com.cloudera.impala.catalog.Db.getTable(Db.java:254)
             at com.cloudera.impala.analysis.Analyzer.registerBaseTableRef(Analyzer.java:172)
             ... 5 more
    Caused by: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. HEAD '/wikipedia-sample' on Host 'spark-data.s3.amazonaws.com' @ 'Wed, 20 Feb 2013 18:49:49 GMT' -- ResponseCode: 404, ResponseStatus: Not Found, RequestId: DDA22667E335D024, HostId: ....
             at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:245)
             at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:119)
             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
             at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
             at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
             at java.lang.reflect.Method.invoke(Method.java:616)
             at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
             at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
             at org.apache.hadoop.fs.s3native.$Proxy12.retrieveMetadata(Unknown Source)
             at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
             at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1333)
             at com.cloudera.impala.catalog.HdfsTable.addPartition(HdfsTable.java:531)
             at com.cloudera.impala.catalog.HdfsTable.loadPartitions(HdfsTable.java:489)
             at com.cloudera.impala.catalog.HdfsTable.load(HdfsTable.java:578)
             ... 18 more
    Caused by: org.jets3t.service.S3ServiceException: Request Error. HEAD '/wikipedia-sample' on Host 'spark-data.s3.amazonaws.com' @ 'Wed, 20 Feb 2013 18:49:49 GMT' -- ResponseCode: 404, ResponseStatus: Not Found, RequestId: DDA22667E335D024, HostId: ....
             at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:520)
             at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:868)
             at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:2016)
             at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1944)
             at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:3059)
             at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1940)
             at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:111)
             ... 30 more
    Caused by: org.jets3t.service.impl.rest.HttpException
             at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:518)
             ... 36 more

    It seems Impala does not handle the s3n file system properly:
    "/wikipedia-sample" by itself is not an object key. Impala should do what
    Hive does and read the keys under that prefix, rather than treating the
    prefix itself as a key.
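    [The failure mode above can be illustrated without S3 itself. An S3 bucket is a flat map from keys to objects; "directories" exist only as a naming convention inside the keys, so a metadata (HEAD) request for the bare prefix returns 404 unless someone created a placeholder object with exactly that name, while listing by prefix still finds the data files. A minimal sketch of the distinction, using a plain dict to stand in for the bucket; the names are illustrative, not Impala's or Hadoop's code:]

```python
# A bucket is a flat key -> object map; "directories" are only a
# naming convention inside the keys.
bucket = {
    "wikipedia-sample/part-00000": b"...",
    "wikipedia-sample/part-00001": b"...",
}

def head(bucket, key):
    """HEAD-style lookup: succeeds only for an exact key.
    This is the request the stack trace shows failing with 404."""
    return key in bucket

def list_keys(bucket, prefix):
    """LIST-style lookup: returns every key under a prefix.
    This is what effectively works when Hive scans the table location."""
    return sorted(k for k in bucket if k.startswith(prefix))

# The bare prefix is not itself a key, so a HEAD on it fails...
assert not head(bucket, "wikipedia-sample")
# ...but listing by prefix still finds the table's data files.
assert list_keys(bucket, "wikipedia-sample/") == [
    "wikipedia-sample/part-00000",
    "wikipedia-sample/part-00001",
]
```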

    BTW, Impala version is 0.5.

    Thanks,

    Jian


  • Aaron A at May 8, 2013 at 9:37 pm
    Jian,

    I don't think Impala supports S3 directly. It would be the logical thing
    to do, but for some reason it doesn't.

    Aaron
  • Hari Sekhon at May 8, 2013 at 11:13 pm
    Impala is designed for speed; S3 is slow, and even worse, you wouldn't
    get data locality.
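    [The locality point can be made concrete: with HDFS, a scheduler knows which hosts store replicas of each block and can place a scan on one of them, so the read is local; an S3 object has no co-located replicas, so every read goes over the network. A toy sketch of that placement decision, with hypothetical names, not Impala's actual scheduler:]

```python
def assign_host(block_replica_hosts, workers):
    """Prefer a worker that holds a replica of the block (local read);
    otherwise fall back to any worker (remote read)."""
    for w in workers:
        if w in block_replica_hosts:
            return w, "local"
    return workers[0], "remote"

workers = ["node1", "node2", "node3"]
# HDFS reports replica locations, so the scan can be placed locally.
assert assign_host({"node2", "node3"}, workers) == ("node2", "local")
# An S3 object has no co-located replicas: every read is remote.
assert assign_host(set(), workers) == ("node1", "remote")
```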

  • Aaron A at May 13, 2013 at 10:21 pm
    Hari,

    Can you share some statistics showing how slow S3 is compared to local
    storage on EC2 (in general, or specifically with Impala)? I am very
    interested in knowing why you say so.

    Thanks,

    Aaron

Discussion Overview

group: impala-user
categories: hadoop
posted: Feb 20, '13 at 5:51p
active: May 13, '13 at 10:21p
posts: 5
users: 3
website: cloudera.com
irc: #hadoop

3 users in discussion
Aaron A: 2 posts, Jian Fang: 2 posts, Hari Sekhon: 1 post
