Hello,

Nutch is stalling in the fetch process. I've run it twice now, and it is
stopping on the *same* URL both times. I don't get what's going on!

The last status report was:
060810 145315 status: segment 20060810142649, 7900 pages, 14 errors,
98421231 bytes, 1571224 ms
060810 145315 status: 5.0279274 pages/s, 489.3738 kb/s, 12458.384 bytes/page

Then, exactly 94 documents later with no errors in between, it just stops,
on what appears to be a perfectly normal URL and a perfectly normal page. I
don't get it.

How can I debug this situation further, to see what's going on?

I'm really frustrated since I don't know where to start looking.

Nutch is still running, taking up a lot of CPU. I don't want to kill it
unless it's really stuck. How can I tell?

Ben


  • Benjamin Higgins at Aug 10, 2006 at 10:35 pm
    Further details:

    If I run strace on the process, it looks like this, over and over and over:

    gettimeofday({1155249187, 999952}, NULL) = 0
    gettimeofday({1155249188, 389}, NULL) = 0
    gettimeofday({1155249188, 679}, NULL) = 0
    gettimeofday({1155249188, 955}, NULL) = 0
    clock_gettime(CLOCK_REALTIME, {1155249188, 1235000}) = 0
    futex(0xb1f0185c, FUTEX_WAIT, 7163, {0, 999720000}) = -1 ETIMEDOUT (Connection timed out)
    futex(0x805d250, FUTEX_WAKE, 1) = 0
    futex(0x805c378, FUTEX_WAIT, 2, NULL) = 0
    futex(0x805c378, FUTEX_WAKE, 1) = 0

    I'm afraid I don't know how to go about finding what part of the code might
    be causing this...

    Any ideas?

    Ben
  • Andrzej Bialecki at Aug 11, 2006 at 7:39 am

    Benjamin Higgins wrote:
    > Further details:
    >
    > If I run strace on the process, it looks like this, over and over and
    > over:
    This doesn't really say anything; do a thread dump instead (Ctrl-Break
    on Windows, kill -SIGQUIT on Unix).

    --
    Best regards,
    Andrzej Bialecki <><
    Information Retrieval, Semantic Web
    Embedded Unix, System Integration
    http://www.sigram.com  Contact: info at sigram dot com
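On the "how can I tell whether it is really stuck?" question: the usual approach is to take two thread dumps a few seconds apart and see whether a busy thread is still in the same place. As a rough sketch only, assuming Java 8 or later and that you can run a few lines of code inside the same JVM as the fetcher (for example from a watchdog thread you add yourself; nothing like this ships with Nutch), the same comparison can be approximated in-process. The class StallCheck and its methods are hypothetical names for illustration. It records each thread's state and top stack frame twice and lists threads that are RUNNABLE and have not moved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical watchdog sketch, not part of Nutch: compare two thread
// snapshots taken some seconds apart and report threads that are RUNNABLE
// and sitting on the same top stack frame both times.
public class StallCheck {

    // Thread -> "STATE @ top frame" for every live thread except the caller.
    static Map<Thread, String> snapshot() {
        Map<Thread, String> snap = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t == Thread.currentThread()) {
                continue; // the checking thread itself would always look RUNNABLE
            }
            StackTraceElement[] stack = e.getValue();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            snap.put(t, t.getState() + " @ " + top);
        }
        return snap;
    }

    // Keys present in both snapshots whose entry is RUNNABLE and unchanged.
    static <K> List<K> suspects(Map<K, String> before, Map<K, String> after) {
        List<K> result = new ArrayList<>();
        for (Map.Entry<K, String> e : after.entrySet()) {
            String prev = before.get(e.getKey());
            if (prev != null && prev.equals(e.getValue())
                    && e.getValue().startsWith("RUNNABLE")) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo only: a thread spinning in a loop stands in for a stuck worker.
        Thread spinner = new Thread(() -> {
            long n = 0;
            while (true) {
                n++;
            }
        }, "demo-spinner");
        spinner.setDaemon(true); // lets the JVM exit when main returns
        spinner.start();

        Map<Thread, String> before = snapshot();
        Thread.sleep(2000);
        Map<Thread, String> after = snapshot();
        for (Thread t : suspects(before, after)) {
            System.out.println("possibly stuck: " + t.getName() + " -> " + after.get(t));
        }
    }
}
```

A thread reported here is only a candidate: a thread blocked in a native socket read also shows up as RUNNABLE with an unchanged top frame. Two real dumps from kill -SIGQUIT give the same information with full stacks and need no code changes.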

Discussion Overview
group: nutch-user @
categories: lucene
posted: Aug 10, '06 at 10:04p
active: Aug 11, '06 at 7:39a
posts: 3
users: 2
website: nutch.apache.org
