Hi,
I am trying to write some output files to HDFS over the network, using
WebHDFS from a Python script. The script fails with "HTTP Response: 404,
Not Found". If I paste the resulting URL into the browser address bar, I
get the JSON error shown after the URL below.
The resulting URL:
http://node02.expressanalytics.net:50075/webhdfs/v1/user/sunita/Linkedin/JobSearch?op=CREATE&user.name=sunita&namenoderpcaddress=node01.expressanalytics.net:8020&overwrite=true&replication=1
{"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"Invalid value for webhdfs parameter \"op\": No enum const class org.apache.hadoop.hdfs.web.resources.GetOpParam$Op.CREATE"}}
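A note on the browser test: an address bar always issues an HTTP GET, while the WebHDFS REST API documents CREATE as a PUT operation. The exception above comes from GetOpParam, so it is likely a side effect of testing with a browser and not necessarily the cause of the 404 seen by the script. A minimal Python sketch of that mapping follows; the table covers only a subset of operations, and the names WEBHDFS_METHODS, method_for and valid_from_browser are illustrative, not part of webhdfs-py.

```python
# HTTP method expected by each WebHDFS operation (subset of the REST API).
WEBHDFS_METHODS = {
    "GET": {"OPEN", "GETFILESTATUS", "LISTSTATUS", "GETCONTENTSUMMARY",
            "GETFILECHECKSUM", "GETHOMEDIRECTORY"},
    "PUT": {"CREATE", "MKDIRS", "RENAME", "SETREPLICATION", "SETOWNER",
            "SETPERMISSION", "SETTIMES"},
    "POST": {"APPEND"},
    "DELETE": {"DELETE"},
}


def method_for(op):
    """Return the HTTP method a WebHDFS op expects, or None if not listed."""
    op = op.upper()
    for method, ops in WEBHDFS_METHODS.items():
        if op in ops:
            return method
    return None


def valid_from_browser(op):
    """A browser address bar can only send GET requests."""
    return method_for(op) == "GET"
```

With this table, method_for("CREATE") returns "PUT", which is why pasting an op=CREATE URL into a browser cannot reproduce what the script does.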
Here is what I am attempting:
1. I am using this Python library:
https://github.com/carlosmarin/webhdfs-py/blob/master/webhdfs/webhdfs.py
2. My code snippet:
# I have saved the library as ForkedWebHDFS.py in my CWD
from ForkedWebHDFS import WebHDFS
webhdfs = WebHDFS("192.168.1.61", 50070, "sunita")  # the name node
make_rest_api_calls()
create_files()
webhdfs.copyfromlocal(respFile, "/user/sunita/Linkedin/JobSearch")
3. The debug messages are as below:
07/22/2013 03:02:09 PM - webhdfs - DEBUG - HTTP Response: 307, TEMPORARY_REDIRECT
07/22/2013 03:02:09 PM - webhdfs - DEBUG - HTTP Response: 307, TEMPORARY_REDIRECT
07/22/2013 03:02:09 PM - webhdfs - DEBUG - =============
07/22/2013 03:02:09 PM - webhdfs - DEBUG - HTTP Location: http://node02.expressanalytics.net:50075/webhdfs/v1/user/sunita/Linkedin/JobSearch?op=CREATE&user.name=sunita&namenoderpcaddress=node01.expressanalytics.net:8020&overwrite=true
07/22/2013 03:02:09 PM - webhdfs - DEBUG - *********************
07/22/2013 03:02:09 PM - webhdfs - DEBUG - Redirect: host: node02.expressanalytics.net, port: 50075, path: webhdfs/v1/user/sunita/Linkedin/JobSearch?op=CREATE&user.name=sunita&namenoderpcaddress=node01.expressanalytics.net:8020&overwrite=true&replication=1
07/22/2013 03:02:10 PM - webhdfs - DEBUG - *********************
07/22/2013 03:02:10 PM - webhdfs - DEBUG - HTTP Response: 404, Not Found
Traceback (most recent call last):
4. Here is the troubleshooting I have done so far:
i. Ensured that the data nodes have the flag below set in
hdfs-site.xml, on all the machines that are being accessed:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
ii. Restarted the cluster after the change. (The hdfs-site.xml file
has a header saying that it is autogenerated, so after the restart I
re-checked that the setting had not been overwritten. It is still
set.)
iii. I am using CDH4.3:
[sunita@node01 etc]$ hadoop version
Hadoop 2.0.0-cdh4.3.0
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.3.0/src/hadoop-common-project/hadoop-common -r 48a9315b342ca16de92fcc5be95ae3650629155a
Compiled by jenkins on Mon May 27 19:45:25 PDT 2013
From source with checksum a4218d77f9b12df4e3e49ef96f9d357d
This command was run using /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.3.0.jar
iv. The output directory /user/sunita/Linkedin/JobSearch exists and I
have write access to it.
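For comparison with what the library does internally, here is a minimal Python 3 sketch of the two-step CREATE flow from the WebHDFS REST API: a PUT with no body to the name node, then a PUT with the file contents to the data node named in the Location header. It uses only the standard library. The function names (create_url_path, redirect_target, put_file) are illustrative and are not webhdfs-py API, and the sketch has not been run against this cluster. It makes two things explicit that may be worth checking against the debug output in item 3: the request target sent to the data node starts with "/" (the logged redirect path starts with "webhdfs/v1/..."), and the HDFS path passed to CREATE names a file, not an existing directory.

```python
import http.client
from urllib.parse import urlencode, urlsplit


def create_url_path(hdfs_path, user, overwrite=True):
    """Build the request target for step 1, sent to the name node."""
    query = urlencode({"op": "CREATE", "user.name": user,
                       "overwrite": str(overwrite).lower()})
    return "/webhdfs/v1/" + hdfs_path.lstrip("/") + "?" + query


def redirect_target(location):
    """Split a Location header into (host, port, request target)."""
    parts = urlsplit(location)
    target = parts.path or "/"
    if not target.startswith("/"):
        target = "/" + target  # an HTTP request target must be absolute
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, parts.port or 80, target


def put_file(namenode, port, user, local_path, hdfs_path):
    """Upload local_path to hdfs_path (a file path); return the final status."""
    # Step 1: PUT without a body to the name node; expect a 307 redirect.
    conn = http.client.HTTPConnection(namenode, port)
    conn.request("PUT", create_url_path(hdfs_path, user))
    resp = conn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    conn.close()
    if resp.status != 307 or not location:
        raise RuntimeError("expected 307 redirect, got %d" % resp.status)

    # Step 2: PUT the file contents to the data node from the redirect.
    host, dn_port, target = redirect_target(location)
    with open(local_path, "rb") as f:
        data = f.read()
    conn = http.client.HTTPConnection(host, dn_port)
    conn.request("PUT", target, body=data)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status  # the REST API documents 201 Created on success
```

A call would look like put_file("192.168.1.61", 50070, "sunita", respFile, "/user/sunita/Linkedin/JobSearch/response.json"), where response.json is a hypothetical file name chosen only for illustration.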
I would appreciate your help.
Regards,
Sunita K
Express Analytics