[ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968818#action_12968818 ]

Justinas Jaronis commented on SOLR-2129:
----------------------------------------

I tried your latest patch; however, the compiled project doesn't include the resources (./contrib/uima/src/resources/*), so posting fails:

java.lang.RuntimeException: org.apache.uima.resource.ResourceInitializationException
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:81)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1359)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.uima.resource.ResourceInitializationException
at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:85)
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:115)
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:68)
... 24 more
Caused by: java.lang.NullPointerException
at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:114)
at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:64)
... 26 more

This happens when OverridingParamsAEProvider tries to read /org/apache/uima/desc/OverridingParamsExtServicesAE.xml. Where should this file (and its fellow XMLs) be located?
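
The NullPointerException raised inside XMLInputSource.<init> is consistent with a classpath lookup that returned null because the descriptor was never packaged. A minimal sketch of how such a lookup could be checked up front, using only the JDK; the class and method names (DescriptorLookup, requireResource) are hypothetical and are not part of the patch:

```java
import java.net.URL;

public class DescriptorLookup {

    // Looks up a resource on the classpath and fails with a descriptive
    // message instead of passing a null URL further down the call chain.
    // Hypothetical helper for illustration; not part of the SOLR-2129 patch.
    public static URL requireResource(String path) {
        // Class.getResource returns null when the resource is not found.
        URL url = DescriptorLookup.class.getResource(path);
        if (url == null) {
            throw new IllegalStateException(
                "Descriptor not found on classpath: " + path);
        }
        return url;
    }

    public static void main(String[] args) {
        String path = "/org/apache/uima/desc/OverridingParamsExtServicesAE.xml";
        try {
            System.out.println("Found descriptor at " + requireResource(path));
        } catch (IllegalStateException e) {
            // Reached when the XML was not copied into the JAR/WAR.
            System.out.println(e.getMessage());
        }
    }
}
```

Run against the same classpath as the web application, a check like this would report the missing path directly rather than surfacing as a nested ResourceInitializationException.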


Thanks for the effort. Great project!
Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
-------------------------------------------------------------------------------

Key: SOLR-2129
URL: https://issues.apache.org/jira/browse/SOLR-2129
Project: Solr
Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Robert Muir
Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, SOLR-2129-version2.patch, SOLR-2129.patch


Provide components to enable Apache UIMA automatic metadata extraction to be exploited when indexing documents.
The purpose of this is to get unstructured information "inside" a document and create structured metadata (as fields) to enrich each document.
Basically this can be done with a custom UpdateRequestProcessor which triggers UIMA while indexing documents.
The basic UIMA implementation of UpdateRequestProcessor extracts sentences (with a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (exploiting external services from OpenCalais and AlchemyAPI). Such an implementation can be easily extended by adding or selecting different UIMA analysis engines, either from UIMA repositories on the web or by creating new ones from scratch.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

  • Tommaso Teofili (JIRA) at Dec 7, 2010 at 11:35 pm
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969076#action_12969076 ]

    Tommaso Teofili commented on SOLR-2129:
    ---------------------------------------

    Hi Justinas,
    you should have each needed XML under solr/contrib/uima/src/main/resources/org/apache/uima/desc/.
    Maybe I need to fix the Ant build.xml.
    I'll look into it. Thanks for your feedback :)
  • Justinas Jaronis (JIRA) at Dec 8, 2010 at 10:52 am
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969271#action_12969271 ]

    Justinas Jaronis commented on SOLR-2129:
    ----------------------------------------

    The file is present at that place in the source tree after applying your patch, but it doesn't appear in any of the JARs / WARs (or maybe it doesn't have to appear?), and I can't find any location where I could add it manually. I tried copying the whole directory structure to example/, but no luck. Thank you for the fast response.
  • Tommaso Teofili (JIRA) at Dec 8, 2010 at 2:28 pm
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969325#action_12969325 ]

    Tommaso Teofili commented on SOLR-2129:
    ---------------------------------------

    Thanks Justinas, I've found and fixed the problem; a new patch will come shortly.
  • Justinas Jaronis (JIRA) at Dec 8, 2010 at 6:40 pm
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969415#action_12969415 ]

    Justinas Jaronis commented on SOLR-2129:
    ----------------------------------------

    Woohoo! Works like a charm. One slight note: after trying to index some documents, I added multiValued="true" to the "entity*" field in schema.xml (I believe UIMA handles entities as an array).
    Thanks again, very much :-) I hope I'll also bring some resources into this project.

    Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, SOLR-2129-version2.patch, SOLR-2129-version3.patch, SOLR-2129.patch
  • Tommaso Teofili (JIRA) at Dec 8, 2010 at 7:04 pm
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969425#action_12969425 ]

    Tommaso Teofili commented on SOLR-2129:
    ---------------------------------------

    I'm glad you appreciated it! And thanks for the hint ;)
  • Kamil Machura (JIRA) at Dec 18, 2010 at 12:34 am
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972710#action_12972710 ]

    Kamil Machura commented on SOLR-2129:
    -------------------------------------

    Hi Tommaso,

    I'm really curious to take a look at your work; unfortunately, it doesn't compile after applying the patch:

    BUILD FAILED
    <your Solr trunk checkout dir>/trunk/solr/contrib/uima/build.xml:65: The following error occurred while executing this line:
    <your Solr trunk checkout dir>/trunk/solr/common-build.xml:267: /home/kamil/dev/solr/solr-old/trunk/solr/contrib/uima/lib does not exist.

    Obviously it worked out for Justinas so I am wondering what is wrong. Any idea?

    Great project, by the way!!!

  • Tommaso Teofili (JIRA) at Dec 22, 2010 at 8:56 am
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974114#action_12974114 ]

    Tommaso Teofili commented on SOLR-2129:
    ---------------------------------------

    Hi Kamil,
    can you please take a look at your trunk/solr/contrib/uima? Does the lib folder exist? Can you find the jars in there?
    Let me know, and thanks for your feedback.
  • Tommaso Teofili (JIRA) at Dec 23, 2010 at 2:36 pm
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974630#action_12974630 ]

    Tommaso Teofili commented on SOLR-2129:
    ---------------------------------------

    Maybe a dedicated page on the wiki could help with installing, testing, and extending this patch.
    Any opinions?
  • Lance Norskog (JIRA) at Dec 24, 2010 at 1:08 am
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974809#action_12974809 ]

    Lance Norskog commented on SOLR-2129:
    -------------------------------------

    +1. There is a lot of material behind UIMA, and a wiki page describing it and some sample use cases would go a long way.
  • Kamil Machura (JIRA) at Dec 26, 2010 at 9:06 pm
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975165#action_12975165 ]

    Kamil Machura commented on SOLR-2129:
    -------------------------------------

    Hi Tommaso,

    the trunk/solr/contrib/uima folder doesn't exist so I can't find any jars.
    Basically I follow the steps mentioned here: http://wiki.apache.org/solr/HowToContribute , i.e.
    - checkout trunk
    - apply patch (after that trunk/solr/contrib/uima exists)
    - ant build

    The build fails with the above-mentioned error.
  • Lance Norskog (JIRA) at Dec 27, 2010 at 2:13 am
    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975185#action_12975185 ]

    Lance Norskog commented on SOLR-2129:
    -------------------------------------

    The directory trunk/solr/contrib/uima does not exist because the directory is not in the patch. The patch should include an empty "marker" file in trunk/solr/contrib/uima/lib so that the directory gets made.
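
Until a patch carries such a marker file, the missing directory could be created by hand before running the build. A minimal sketch using java.nio.file, assuming the build only needs the directory to exist (any required jars would still have to be placed there separately); the class name EnsureLibDir and the marker file name ".marker" are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EnsureLibDir {

    // Creates the directory (with any missing parents) and an empty marker
    // file inside it, so tools that skip empty directories still keep it.
    // The marker name is an arbitrary, hypothetical choice.
    public static Path ensureDirWithMarker(Path dir, String markerName) throws IOException {
        Files.createDirectories(dir); // does nothing if the directory already exists
        Path marker = dir.resolve(markerName);
        if (Files.notExists(marker)) {
            Files.createFile(marker); // zero-length file
        }
        return marker;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the thread; pass another path as the first argument.
        Path lib = Paths.get(args.length > 0 ? args[0] : "trunk/solr/contrib/uima/lib");
        System.out.println("Marker at " + ensureDirWithMarker(lib, ".marker"));
    }
}
```

Calling the helper a second time is harmless: the directory and the marker are left untouched if they already exist.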

Discussion Overview
group: dev @
categories: lucene
posted: Dec 7, '10 at 4:53p
active: Dec 27, '10 at 2:13a
posts: 12
users: 1
website: lucene.apache.org