Hi Hadoop, Hive, and Sqoop users,
For the past year, the Apache Hadoop MapReduce project has played host to
Sqoop, a command-line tool that performs parallel imports and exports
between relational databases and HDFS. We've developed a lot of features and
gotten a lot of great feedback from users. During its time as a contrib project
in Hadoop, Sqoop has steadily improved and grown.
But the contrib directory is a home for new or small projects incubating
underneath Hadoop's umbrella. Sqoop is starting to look less like a small
project these days. In particular, a feature that has been growing in
importance for Sqoop is its ability to integrate with Hive. In order to
facilitate this integration from a compilation and testing standpoint, we've
pulled Sqoop out of contrib and into its own repository hosted on GitHub.
You can download all the relevant bits here:
The code there will run in conjunction with the Apache Hadoop trunk source.
(Compatibility with other distributions/versions is forthcoming.)
Although we've changed hosts, Sqoop will keep the same license -- future
improvements will remain Apache 2.0-licensed. We welcome the
contributions of all in the open source community; there's a lot of exciting
work still to be done! If you'd like to help out but aren't sure where to
start, send me an email and I can recommend a few areas where improvements
would be appreciated.
Want some more information about Sqoop? An introduction is available here:
A ready-to-run release of Sqoop is included with Cloudera's Distribution for
And its reference manual is available for browsing at
If you have any questions about this move, please ask me.
- Aaron Kimball