Good evening,

I have built an R-tree on HDFS, in order to improve the query performance of high-selectivity spatial queries.
The R-tree is composed of a number of HDFS files (each one created by one Reducer, so the number of files is equal to the number of reducers), where each file is a subtree of the root of the R-tree.
I am investigating how to use the R-tree efficiently, with respect to the locality of each file on HDFS (data placement).


I would like to ask whether it is possible to read a file that is on HDFS from a Java application (not MapReduce).
If this is not possible (as I believe), I should either download the files to the local filesystem (which is not a solution, since the files could be very large) or run the queries using Hadoop.
In order to maximise the gain, I should probably process a batch of queries during each Job, and run each query on a node that is "near" the files involved in handling that query.

Can I find the node where each file (or at least most of its blocks) is located, and run a reducer on that node to handle these queries? Could the function DFSClient.getBlockLocations() help?

Thank you in advance,
Sofia


  • Robert Evans at Jul 25, 2011 at 10:01 pm
    Sofia,

    You can access any HDFS file from a normal Java application as long as your classpath and some configuration are set up correctly. That is all the hadoop jar command does: it is a shell script that sets up the environment for Java to work with Hadoop. Look at the example for the Tool class:

    http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Tool.html

    If you delete the JobConf stuff, you can then just talk to the FileSystem by doing the following:

    // conf is an org.apache.hadoop.conf.Configuration (e.g. from Tool's getConf())
    Path p = new Path("URI OF FILE TO OPEN");
    FileSystem fs = p.getFileSystem(conf);
    InputStream in = fs.open(p); // returns an FSDataInputStream

    Now you can use in to read your data. Just be sure to close it when you are done.
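
    A more complete standalone sketch (the class name, placeholder path, and read loop are illustrative, and it assumes the Hadoop configuration files are on the classpath):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
        public static void main(String[] args) throws IOException {
            // Picks up core-site.xml / hdfs-site.xml if they are on the classpath.
            Configuration conf = new Configuration();

            // Placeholder URI -- replace with the actual HDFS file to open.
            Path p = new Path("hdfs://namenode:9000/path/to/rtree/part-00000");

            FileSystem fs = p.getFileSystem(conf);
            FSDataInputStream in = fs.open(p);
            try {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    // process buf[0..n); application-specific
                }
            } finally {
                in.close(); // close the stream when done
            }
        }
    }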

    --Bobby Evans



  • Joey Echeverria at Jul 25, 2011 at 11:15 pm
    To add to what Bobby said, you can get block locations with
    fs.getFileBlockLocations() if you want to open based on locality.
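
    For example, a minimal sketch (the class name and path are placeholders; this uses the FileStatus-based overload from the 0.20-era API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path p = new Path("hdfs://namenode:9000/path/to/rtree/part-00000"); // placeholder
            FileSystem fs = p.getFileSystem(conf);

            FileStatus status = fs.getFileStatus(p);
            // One BlockLocation per block, covering the whole file length.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                // getHosts() lists the datanodes holding replicas of this block,
                // which is what you would use to schedule work "near" the data.
                for (String host : block.getHosts()) {
                    System.out.println(block.getOffset() + " -> " + host);
                }
            }
        }
    }

    Tallying hosts across a file's blocks this way would let you pick, for each subtree file, the node that holds most of its data.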

    -Joey


    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434

Discussion Overview
group: common-user
categories: hadoop
posted: Jul 25, '11 at 9:41p
active: Jul 25, '11 at 11:15p
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
