Grokbase Groups Lucene dev May 2016
[ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285816#comment-15285816 ]

Tim Allison commented on SOLR-8981:
-----------------------------------

Tika 1.13 is now available.

Upgrade to Tika 1.13 when it is available
-----------------------------------------

Key: SOLR-8981
URL: https://issues.apache.org/jira/browse/SOLR-8981
Project: Solr
Issue Type: Improvement
Reporter: Tim Allison
Priority: Minor

Tika 1.13 should be out within a month. This includes PDFBox 2.0.0 and a number of other upgrades and improvements.
If there are any showstoppers in 1.13 from Solr's side or requests before we roll 1.13, let us know.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Alexandre Rafalovitch (JIRA) at May 18, 2016 at 10:37 am
    [ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288757#comment-15288757 ]

    Alexandre Rafalovitch commented on SOLR-8981:
    ---------------------------------------------

    Is this going to affect the language detection module in Solr? Or is the API unchanged?
  • Tim Allison (JIRA) at May 18, 2016 at 11:12 am
    [ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288803#comment-15288803 ]

    Tim Allison commented on SOLR-8981:
    -----------------------------------

    Thanks to [~grossws], our traditional language detection API _should_ be unchanged in 1.13. #famouslastwords

    We've also added Optimaize and Julia under a new package (tika-langdetect); see TIKA-1696. This new package allows easier integration of other language detection packages such as [Yalder|https://github.com/kkrugler/yalder].

    [~chrismattmann] and [~kkrugler], is the above correct?
  • Chris A. Mattmann (JIRA) at May 18, 2016 at 2:16 pm
    [ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289030#comment-15289030 ]

    Chris A. Mattmann commented on SOLR-8981:
    -----------------------------------------

    Correct, [~tallison@apache.org]; it won't affect it for now.
  • Tim Allison (JIRA) at May 26, 2016 at 4:56 pm
    [ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302427#comment-15302427 ]

    Tim Allison commented on SOLR-8981:
    -----------------------------------

    CVE-2016-4434: Apache Tika XML External Entity vulnerability in versions 0.10-1.12: [announcement|https://mail-archives.apache.org/mod_mbox/tika-dev/201605.mbox/%3C1705136517.1175366.1464278135251.JavaMail.yahoo%40mail.yahoo.com%3E]
  • Lewis John McGibbney (JIRA) at May 26, 2016 at 5:04 pm
    [ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302442#comment-15302442 ]

    Lewis John McGibbney commented on SOLR-8981:
    --------------------------------------------

    [~tallison@mitre.org], I am working on this again and will try to post a patch ASAP. I have the following test failing in Solr:
    https://github.com/apache/lucene-solr/blob/master/solr/contrib/extraction/src/test/org/apache/solr/handler/extraction/ExtractingRequestHandlerTest.java#L505
    I have been debugging the tests with no luck as of yet. I'll post a new PR later today. The new PR is rebased against lucene-solr master and Tika 1.13.
  • Tim Allison (JIRA) at May 31, 2016 at 6:59 pm
    [ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308355#comment-15308355 ]

    Tim Allison commented on SOLR-8981:
    -----------------------------------

    I'm getting a failure on that test too. I can't figure out what's going on. I'm getting exactly the same output with the standalone Tika 1.7 and 1.13 apps on the test file...argh...
  • Tim Allison (JIRA) at May 31, 2016 at 7:28 pm
    [ https://issues.apache.org/jira/browse/SOLR-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308355#comment-15308355 ]

    Tim Allison edited comment on SOLR-8981 at 5/31/16 7:27 PM:
    ------------------------------------------------------------

    I'm getting a failure on that test too. I'm getting exactly the same output with the standalone Tika 1.7 and 1.13 apps on the test file...argh...

    For some reason, it looks like Tika is now emitting 2 bodies. If you double the body step in the XPath expressions of both tests, they now pass:
    {noformat}
    ExtractingParams.XPATH_EXPRESSION, "/xhtml:html/xhtml:body/xhtml:body/xhtml:a/descendant::node()",
    {noformat}
    {noformat}
    "xpath", "/xhtml:html/xhtml:body/xhtml:body/xhtml:div//node()",
    {noformat}
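
A minimal sketch of why doubling the body step makes the expressions match again. It assumes the extracted XHTML really does contain a `body` nested inside another `body`, as the comment above suspects; the two sample documents are hypothetical stand-ins, not actual Tika output. It also uses the JDK's `javax.xml.xpath` engine for illustration rather than the XPath matching that Solr's extraction handler uses internally, so it only demonstrates the path semantics.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NestedBodyXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample documents (assumptions, not real Tika output).
    static final String SINGLE_BODY =
        "<html xmlns=\"" + XHTML_NS + "\"><body><div>text</div></body></html>";
    static final String NESTED_BODY =
        "<html xmlns=\"" + XHTML_NS + "\"><body><body><div>text</div></body></body></html>";

    // The second expression is the one quoted in the comment; the first is the
    // same path with a single body step.
    static final String SINGLE_STEP = "/xhtml:html/xhtml:body/xhtml:div//node()";
    static final String DOUBLED_STEP = "/xhtml:html/xhtml:body/xhtml:body/xhtml:div//node()";

    /** Parses the XML string and returns how many nodes the expression selects. */
    static int countMatches(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the xhtml: prefix to resolve
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "xhtml" prefix used in the expressions to the XHTML namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "xhtml" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("single body, single step:  " + countMatches(SINGLE_BODY, SINGLE_STEP));
        System.out.println("nested body, single step:  " + countMatches(NESTED_BODY, SINGLE_STEP));
        System.out.println("nested body, doubled step: " + countMatches(NESTED_BODY, DOUBLED_STEP));
    }
}
```

Because each `/` step selects only direct children, the single-step path stops matching as soon as an extra `body` level appears, and the doubled path matches only when it does. That is consistent with the tests passing once the step is doubled, but it does not explain why the extra element is emitted in the first place.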


    was (Author: tallison@mitre.org):
    I'm getting a failure on that test too. I can't figure out what's going on. I'm getting exactly the same output with the standalone Tika 1.7 and 1.13 apps on the test file...argh...

Discussion Overview
group: dev @
categories: lucene
posted: May 17, '16 at 1:26a
active: May 31, '16 at 7:28p
posts: 8
users: 1
website: lucene.apache.org