Has anybody had any experience bypassing ExtractingRequestHandler and
simply managing Tika manually? I want to make a small modification to Tika
to extract and save additional data from my PDFs, but I have been
procrastinating, in no small part because of the unpleasant prospect of
setting up a development environment where I could compile and debug
modifications that might run through PDFBox, Tika, and
ExtractingRequestHandler. It occurs to me that it would be much easier if
extraction and indexing were separate, so that I could have direct control
over Tika and just submit the text to Solr after extraction. Am I going to
regret this approach? I'm not sure what ExtractingRequestHandler really
does for me that Tika doesn't already do.
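A minimal sketch of the "submit the text to Solr after extraction" half of that approach, using only the Python standard library. It assumes the text (and any extra PDF data) has already been produced by your own Tika setup, for example `AutoDetectParser` with a `BodyContentHandler` in Java, and posts it to Solr's JSON update handler. The core name `mycore`, the URL, and the field name `content` are hypothetical; they must match your own core and schema.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical core URL: replace "mycore" with your own core/collection name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def build_update_payload(doc_id, text, extra_fields=None):
    """Build a JSON body for Solr's update handler containing one document.

    Assumes the schema has an "id" unique key and a text field named
    "content" (a hypothetical name; use whatever your schema defines).
    """
    doc = {"id": doc_id, "content": text}
    if extra_fields:
        # Extra data pulled out by a customized Tika parser; each key must
        # be a field (or match a dynamic-field pattern) your schema accepts.
        doc.update(extra_fields)
    # The JSON update format accepts an array of documents to add.
    return json.dumps([doc]).encode("utf-8")


def send_to_solr(payload, url=SOLR_UPDATE_URL, commit=True):
    """POST a prepared payload to Solr and return the parsed JSON response."""
    params = {"wt": "json"}  # ask explicitly for a JSON response
    if commit:
        params["commit"] = "true"
    request = urllib.request.Request(
        url + "?" + urllib.parse.urlencode(params),
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send_to_solr(build_update_payload(doc_id, text))` after each extraction sends one document and commits immediately; for bulk loads you would likely batch documents and commit once at the end. Keeping the two steps separate means a modified Tika can be compiled and debugged on its own, without Solr in the loop.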

Also, I was reading this Stack Overflow entry
<http://stackoverflow.com/questions/33292776/solr-tika-processor-not-crawling-my-pdf-files-prefectly>,
and someone offhandedly mentioned that ExtractingRequestHandler might be
separated in the future anyway. Is there a public roadmap for the project,
or does one have to follow the developers' mailing list and hunt through
JIRA entries to keep up with the pulse of the project?

Thanks,
Justin

Discussion Overview
group: solr-user @
categories: lucene
posted: Jun 10, '16 at 1:20a
active: Jun 13, '16 at 9:05p
posts: 5
users: 4
website: lucene.apache.org...
