Hadoop QA (JIRA), Jun 6, 2009 at 10:50 am
[ https://issues.apache.org/jira/browse/HADOOP-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716864#action_12716864 ]
Hadoop QA commented on HADOOP-5967:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12409836/single-mapper.patch
against trunk revision 782083.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/472/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/472/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/472/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/472/console
This message is automatically generated.
Sqoop should only use a single map task
---------------------------------------
Key: HADOOP-5967
URL: https://issues.apache.org/jira/browse/HADOOP-5967
Project: Hadoop Core
Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball
Priority: Minor
Attachments: single-mapper.patch
The current DBInputFormat implementation uses SELECT ... LIMIT ... OFFSET statements to read from a database table, so a job with several mappers issues several queries against the same table at the same time. Most database implementations use a full table scan for each such query, starting at row 1 and scanning down until the OFFSET is reached before emitting data to the client. As a result, we see O(n^2) performance in the size of the table when using a large number of mappers, whereas a single mapper would read through the table in O(n) time in the number of rows.
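The cost argument above can be illustrated with a rough model. This is only a sketch, assuming the database really does scan from row 1 for every OFFSET query and that the table is divided into equal-sized splits; `rows_scanned` is a hypothetical helper written for this illustration and is not part of DBInputFormat or Sqoop.

```python
def rows_scanned(total_rows: int, num_mappers: int) -> int:
    """Estimate rows the database touches when each split is read with
    SELECT ... LIMIT <limit> OFFSET <offset> and every query scans from row 1.

    Assumption: equal-sized splits, one query per mapper.
    """
    split_size = -(-total_rows // num_mappers)  # ceiling division
    scanned = 0
    for i in range(num_mappers):
        offset = i * split_size
        limit = min(split_size, total_rows - offset)
        if limit <= 0:
            break  # no rows left for this mapper
        # The query skips `offset` rows, then emits `limit` rows.
        scanned += offset + limit
    return scanned


if __name__ == "__main__":
    n = 1000
    for m in (1, 10, n):
        print(f"mappers={m:5d}  rows scanned={rows_scanned(n, m)}")
```

Under this model, when the mapper count divides the table size evenly, the total is n(m + 1)/2 rows for n rows and m mappers: a single mapper scans n rows, while a mapper count that grows with the table approaches n^2/2.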
This patch sets the number of map tasks to 1 in the MapReduce job that Sqoop launches.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.