-
Hello, I have looked a little into nutch code and mailing lists. I think the nutchbase branch (http://issues.apache.org/jira/browse/NUTCH-650) is very interesting, with a good potential to improve ...
Alban Mouton
Dec 5, 2009 at 2:57 pm
Dec 5, 2009 at 2:57 pm -
Retry interval in crawl date is set to 0 ---------------------------------------- Key: NUTCH-774 URL: https://issues.apache.org/jira/browse/NUTCH-774 Project: Nutch Issue Type: Bug Components: ...
Reinhard Schwab (JIRA)
Dec 2, 2009 at 12:06 pm
Dec 2, 2009 at 12:12 pm -
See <http://hudson.zones.apache.org/hudson/job/Nutch-trunk/998/> ------------------------------------------ A timer trigger started this job Building remotely on lucene.zones.apache.org (Solaris 10) ...
Apache Hudson Server
Dec 1, 2009 at 4:03 am
Dec 5, 2009 at 4:54 am -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Automating_Fetches_with_Python" page has been changed by newacct. ...
Apache Wiki
Nov 29, 2009 at 3:19 am
Nov 29, 2009 at 3:19 am -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by Davinder. ...
Apache Wiki
Nov 27, 2009 at 5:49 pm
Nov 27, 2009 at 5:58 pm -
[ https://issues.apache.org/jira/browse/NUTCH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-185. ------------------------------------- ...
Chris A. Mattmann (JIRA)
Nov 26, 2009 at 3:17 am
Nov 26, 2009 at 3:17 am -
some minor bugs in AbstractFetchSchedule.java --------------------------------------------- Key: NUTCH-773 URL: https://issues.apache.org/jira/browse/NUTCH-773 Project: Nutch Issue Type: Bug ...
Reinhard Schwab (JIRA)
Nov 25, 2009 at 2:16 pm
Nov 28, 2009 at 2:09 pm -
Oops. Sorry about that. ab@apache.org wrote:
Dennis Kubes
Nov 25, 2009 at 1:37 pm
Nov 25, 2009 at 4:02 pm -
Upgrade Nutch to use Lucene 2.9.1 --------------------------------- Key: NUTCH-772 URL: https://issues.apache.org/jira/browse/NUTCH-772 Project: Nutch Issue Type: Improvement Affects Versions: 1.1 ...
Andrzej Bialecki (JIRA)
Nov 25, 2009 at 12:37 pm
Nov 28, 2009 at 2:09 pm -
Add WebGraph classes to the bin/nutch script -------------------------------------------- Key: NUTCH-771 URL: https://issues.apache.org/jira/browse/NUTCH-771 Project: Nutch Issue Type: Improvement ...
Dennis Kubes (JIRA)
Nov 24, 2009 at 8:48 pm
Nov 24, 2009 at 9:26 pm -
Hello everybody, I don't know if it is a known issue, but it's been like that since at least a couple of days so I figured I should tell someone. The root url for the nutch wiki ...
Alban Mouton
Nov 24, 2009 at 4:46 pm
Nov 30, 2009 at 4:22 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by DennisKubes. ...
Apache Wiki
Nov 24, 2009 at 4:00 pm
Nov 24, 2009 at 4:00 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "OptimizingCrawls" page has been changed by DennisKubes. The comment on this change ...
Apache Wiki
Nov 24, 2009 at 3:59 pm
Nov 24, 2009 at 3:59 pm -
David Stuart
Nov 24, 2009 at 10:57 am
Nov 24, 2009 at 10:00 pm -
Timebomb for Fetcher -------------------- Key: NUTCH-770 URL: https://issues.apache.org/jira/browse/NUTCH-770 Project: Nutch Issue Type: Improvement Reporter: Julien Nioche This patch provides the ...
Julien Nioche (JIRA)
Nov 23, 2009 at 11:24 am
Dec 5, 2009 at 4:51 pm -
Fetcher to skip queues for URLS getting repeated exceptions ------------------------------------------------------------- Key: NUTCH-769 URL: https://issues.apache.org/jira/browse/NUTCH-769 Project: ...
Julien Nioche (JIRA)
Nov 23, 2009 at 11:06 am
Dec 1, 2009 at 3:16 pm -
Hi, HBase 0.20 now has random-access performance on par with open source relational databases such as MySQL. Can we move the main databases to HBase and make it easy to add extra fields?
Work only
Nov 23, 2009 at 7:24 am
Nov 23, 2009 at 7:24 am -
Upgrade Nutch 1.0 to use Hadoop 0.20 ------------------------------------ Key: NUTCH-768 URL: https://issues.apache.org/jira/browse/NUTCH-768 Project: Nutch Issue Type: Improvement Affects Versions: ...
Dennis Kubes (JIRA)
Nov 21, 2009 at 11:40 pm
Dec 1, 2009 at 2:59 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "NutchHadoopTutorial" page has been changed by ilgiz. ...
Apache Wiki
Nov 18, 2009 at 5:24 pm
Nov 18, 2009 at 5:24 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "NutchHadoopTutorial" page has been changed by ilgiz. The comment on this change is: ...
Apache Wiki
Nov 18, 2009 at 5:24 pm
Nov 18, 2009 at 5:24 pm -
Update version of Tika for the MimeType detection ------------------------------------------------- Key: NUTCH-767 URL: https://issues.apache.org/jira/browse/NUTCH-767 Project: Nutch Issue Type: ...
Julien Nioche (JIRA)
Nov 18, 2009 at 2:59 pm
Dec 5, 2009 at 2:25 pm -
Tika parser ----------- Key: NUTCH-766 URL: https://issues.apache.org/jira/browse/NUTCH-766 Project: Nutch Issue Type: New Feature Reporter: Julien Nioche Tika handles a lot of different formats ...
Julien Nioche (JIRA)
Nov 18, 2009 at 2:51 pm
Nov 18, 2009 at 2:55 pm -
How can I filter out certain pages, like privacy policies, terms and conditions, etc., from crawling? All these pages contain bogus information. I am new to Nutch. Please let me know about this. ...
Sumittyagi
Nov 17, 2009 at 6:49 pm
Nov 17, 2009 at 6:49 pm -
Hi, I came across the classloader issue that you mentioned but got everything to work OK by duplicating the class TikaConfiguration into the package used by my plugin. The lib tika-core goes into the ...
Julien Nioche
Nov 16, 2009 at 7:13 pm
Nov 17, 2009 at 6:03 pm -
David Stuart
Nov 14, 2009 at 3:40 pm
Nov 14, 2009 at 7:54 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "RunNutchInEclipse1.0" page has been changed by AnasElghafari. ...
Apache Wiki
Nov 14, 2009 at 10:28 am
Nov 14, 2009 at 10:28 am -
Hello. If I'm right, Nutch is able to index files from Office 2007, but can't process their content. Is there a way to do that, so I can search and show the words of the content? Thanks. -- View this ...
BrunoWL
Nov 13, 2009 at 4:32 pm
Nov 13, 2009 at 4:32 pm -
Allow Crawl class to call Either Solr or Lucene Indexer ------------------------------------------------------- Key: NUTCH-765 URL: https://issues.apache.org/jira/browse/NUTCH-765 Project: Nutch ...
Dennis Kubes (JIRA)
Nov 12, 2009 at 9:01 pm
Nov 28, 2009 at 2:09 pm -
Hi. I'm a beginner in Nutch. Can anybody tell me how to make Nutch use parsers from Tika? I did all kinds of searches and didn't find an answer. Thanks. -- View this message in context: ...
BrunoWL
Nov 10, 2009 at 6:28 pm
Nov 12, 2009 at 3:06 pm -
I was wondering what the process is for getting patches approved for adding to trunk. Is it a vote-based thing? Is there a big review of patches when a pending release is close? The reason for asking ...
David Stuart
Nov 10, 2009 at 10:54 am
Nov 10, 2009 at 8:03 pm -
Add support for vfsfile:// loading of plugins for JBoss ------------------------------------------------------- Key: NUTCH-764 URL: https://issues.apache.org/jira/browse/NUTCH-764 Project: Nutch ...
tcurran@approachingpi.com (JIRA)
Nov 10, 2009 at 1:25 am
Nov 10, 2009 at 8:23 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by TerrenceCurran. ...
Apache Wiki
Nov 10, 2009 at 1:08 am
Nov 10, 2009 at 1:08 am -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "GettingNutchRunningWithJboss" page has been changed by TerrenceCurran. ...
Apache Wiki
Nov 10, 2009 at 1:06 am
Nov 10, 2009 at 1:06 am -
See <http://hudson.zones.apache.org/hudson/job/Nutch-trunk/985/> ------------------------------------------ A timer trigger started this job Building remotely on lucene.zones.apache.org (Solaris 10) ...
Apache Hudson Server
Nov 7, 2009 at 4:03 am
Nov 8, 2009 at 5:41 am -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Presentations" page has been changed by AndrzejBialecki. ...
Apache Wiki
Nov 6, 2009 at 5:34 pm
Nov 6, 2009 at 5:34 pm -
Dear Wiki user, You have subscribed to a wiki page "Presentations" for change notification. An attachment has been added to that page by AndrzejBialecki. Following detailed information is available: ...
Apache Wiki
Nov 6, 2009 at 5:29 pm
Nov 6, 2009 at 6:42 pm -
Separate configuration files from resources to be included in the job file -------------------------------------------------------------------------- Key: NUTCH-763 URL: ...
Julien Nioche (JIRA)
Nov 5, 2009 at 6:34 pm
Nov 5, 2009 at 6:34 pm -
Hi there, seems i have some serious problems with hadoop during map-reduce for MergeSegments. i am out of ideas on this. Any suggestions will be quite welcome. Here is my set up: RAM: 4G JVM HEAP: 2G ...
Fadzi
Nov 5, 2009 at 1:31 pm
Nov 5, 2009 at 1:31 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ApacheConUs2009MeetUp" page has been changed by AndrzejBialecki. ...
Apache Wiki
Nov 4, 2009 at 10:12 pm
Nov 4, 2009 at 10:12 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ApacheConUs2009MeetUp" page has been changed by KenKrugler. ...
Apache Wiki
Nov 4, 2009 at 10:03 pm
Nov 5, 2009 at 12:09 pm -
Team, For those Lucene fanatics not in Oakland this week for ApacheCon US, don't miss the FREE live video streaming, starting today: http://streaming.linux-magazin.de/en/program-apachecon-us-2009.htm ...
Michael McCandless
Nov 4, 2009 at 1:26 pm
Nov 4, 2009 at 10:47 pm -
Alternative Generator which can generate several segments in one parse of the crawlDB ------------------------------------------------------------------------------------- Key: NUTCH-762 URL: ...
Julien Nioche (JIRA)
Nov 3, 2009 at 3:04 pm
Nov 25, 2009 at 5:39 pm -
Avoid cloningCrawlDatum in CrawlDbReducer ------------------------------------------ Key: NUTCH-761 URL: https://issues.apache.org/jira/browse/NUTCH-761 Project: Nutch Issue Type: Improvement ...
Julien Nioche (JIRA)
Nov 3, 2009 at 2:40 pm
Nov 28, 2009 at 2:10 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "DownloadingNutch" page has been changed by SteveKearns. ...
Apache Wiki
Oct 27, 2009 at 11:26 pm
Oct 27, 2009 at 11:26 pm -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "ApacheConUs2009MeetUp" page has been changed by KenKrugler. ...
Apache Wiki
Oct 27, 2009 at 1:13 pm
Oct 27, 2009 at 1:13 pm -
Hi, I've created a parser and indexer for a specific file type (geo XML meta file, KML). I am trying to crawl a couple of sites and index only files of this type. I don't want to index html or anything ...
Dmitriy Fundak
Oct 26, 2009 at 9:12 am
Oct 26, 2009 at 9:12 am -
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "首页" page has been changed by yongping8204. ...
Apache Wiki
Oct 24, 2009 at 4:38 pm
Oct 24, 2009 at 4:38 pm -
I tried asking this over at the nutch-user alias, but I am seeing very little traction, so I thought I'd ask the developers. I realize this is most likely a configuration problem on my end, but I am ...
Jesse Hires
Oct 20, 2009 at 11:22 pm
Oct 22, 2009 at 4:29 am -
Hi, I just noticed that Niocchi has been released recently. http://www.niocchi.com/ Niocchi is a java asynchronous crawl library implemented with NIO. It is designed to crawl several thousands of ...
Lukáš Vlček
Oct 18, 2009 at 11:14 am
Oct 19, 2009 at 2:54 pm -
If I want to rename nutch files, folders to my own projects name. The project is using lot of plugins, those plugins also having packages... Original-- org.apache.nutch.analysis Updated/renamed -- ...
Fredericoagent
Oct 18, 2009 at 10:06 am
Oct 18, 2009 at 11:34 am
Group Overview
group | nutch-dev |
categories | lucene |
discussions | 2,967 |
posts | 11,566 |
users | 736 |
website | nutch.apache.org |
Archives
- December 2009 (31)
- November 2009 (154)
- October 2009 (88)
- September 2009 (32)
- August 2009 (82)
- July 2009 (77)
- June 2009 (94)
- May 2009 (104)
- April 2009 (85)
- March 2009 (255)
- February 2009 (250)
- January 2009 (197)
- December 2008 (158)
- November 2008 (117)
- October 2008 (84)
- September 2008 (101)
- August 2008 (58)
- July 2008 (32)
- June 2008 (93)
- May 2008 (57)
- April 2008 (78)
- March 2008 (152)
- February 2008 (190)
- January 2008 (155)
- December 2007 (68)
- November 2007 (188)
- October 2007 (179)
- September 2007 (190)
- August 2007 (136)
- July 2007 (283)
- June 2007 (241)
- May 2007 (187)
- April 2007 (145)
- March 2007 (285)
- February 2007 (242)
- January 2007 (266)
- December 2006 (103)
- November 2006 (223)
- October 2006 (188)
- September 2006 (166)
- August 2006 (282)
- July 2006 (180)
- June 2006 (262)
- May 2006 (284)
- April 2006 (246)
- March 2006 (308)
- February 2006 (348)
- January 2006 (556)
- December 2005 (417)
- November 2005 (287)
- October 2005 (315)
- September 2005 (340)
- August 2005 (430)
- July 2005 (233)
- June 2005 (192)
- May 2005 (139)
- April 2005 (517)
- March 2005 (410)
- February 2005 (10)