[jira] Updated: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

Tommaso Teofili (JIRA)
Nov 14, 2010 at 9:38 am
[ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommaso Teofili updated SOLR-2129:
----------------------------------

Attachment: SOLR-2129-version2.patch

Major Solr-UIMA refactoring. The module now reads the following settings from the <uimaConfig> tag inside solrconfig.xml:

1. added dynamic field mapping with the following syntax:
<fieldMapping>
  <type name="org.apache.uima.jcas.tcas.Annotation">
    <map feature="coveredText" field="tag"/>
  </type>
  <type name="org.apache.uima.jcas.tcas.AnotherAnnotationType">
    <map feature="featureName" field="anotherField"/>
  </type>
</fieldMapping>

2. added the AnalysisEngine descriptor path (the descriptor must be on the classpath)
<analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>

3. added the list of fields whose values are to be analyzed, optionally merging those values so that UIMA runs only once:
<analyzeFields merge="false">text,title</analyzeFields>
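A minimal sketch of how the merge attribute could be interpreted, for illustration only. It is not taken from the patch: the class and method names are hypothetical, and it assumes that merge="true" joins the listed field values into a single text (one UIMA run) while merge="false" yields one text per field. The space separator is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalyzeFieldsSketch {

    /**
     * Returns the texts that would be handed to the analysis engine.
     * Assumption: merge=true joins the field values into one text (one run),
     * merge=false returns one text per configured field (one run each).
     */
    static List<String> textsToAnalyze(Map<String, String> doc, String analyzeFields, boolean merge) {
        List<String> values = new ArrayList<>();
        for (String name : analyzeFields.split(",")) {
            String value = doc.get(name.trim());
            if (value != null) {
                values.add(value); // fields missing from the document are skipped
            }
        }
        if (!merge) {
            return values;
        }
        List<String> merged = new ArrayList<>();
        if (!values.isEmpty()) {
            merged.add(String.join(" ", values)); // separator is an assumption
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Solr and UIMA");
        doc.put("text", "Metadata extraction while indexing.");

        System.out.println(textsToAnalyze(doc, "text,title", false).size()); // 2
        System.out.println(textsToAnalyze(doc, "text,title", true).size());  // 1
    }
}
```

With merge="false", as in the snippet above, each of the two fields would be analyzed on its own.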

The runtime parameters that override the parameters of the delegate AEs remain the same:
<runtimeParameters>
  <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
  <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
  <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
  <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
  <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
</runtimeParameters>

These changes should make the module much easier to use and more flexible.
Looking forward to your feedback.
Tommaso

Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
-------------------------------------------------------------------------------

Key: SOLR-2129
URL: https://issues.apache.org/jira/browse/SOLR-2129
Project: Solr
Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Robert Muir
Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, SOLR-2129-version2.patch, SOLR-2129.patch


Provide components to enable Apache UIMA automatic metadata extraction to be exploited when indexing documents.
The goal is to take the unstructured information inside a document and create structured metadata (as fields) to enrich that document.
Basically this can be done with a custom UpdateRequestProcessor which triggers UIMA while indexing documents.
The basic UIMA implementation of UpdateRequestProcessor extracts sentences (with a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (exploiting external services from OpenCalais and AlchemyAPI). Such an implementation can easily be extended by adding or selecting different UIMA analysis engines, either taken from UIMA repositories on the web or created from scratch.
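The enrichment step described above can be pictured with a small, self-contained sketch. It does not use the Solr or UIMA APIs: the Annotation class below is a hypothetical stand-in for a UIMA annotation, and the enrich method and the map-based document are assumptions made for illustration. It only shows the idea of copying mapped feature values into document fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldMappingSketch {

    /** Hypothetical stand-in for a UIMA annotation: a type name plus feature values. */
    static final class Annotation {
        final String typeName;
        final Map<String, String> features;

        Annotation(String typeName, Map<String, String> features) {
            this.typeName = typeName;
            this.features = features;
        }
    }

    /**
     * Copies mapped feature values into document fields.
     * mapping: annotation type name -> (feature name -> target field name).
     */
    static Map<String, List<String>> enrich(List<Annotation> annotations,
                                            Map<String, Map<String, String>> mapping) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Annotation a : annotations) {
            Map<String, String> featureToField = mapping.get(a.typeName);
            if (featureToField == null) {
                continue; // unmapped types add nothing to the document
            }
            for (Map.Entry<String, String> e : featureToField.entrySet()) {
                String value = a.features.get(e.getKey());
                if (value != null) {
                    fields.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(value);
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Mirrors a mapping of feature "coveredText" to field "tag".
        Map<String, String> featureToField = new HashMap<>();
        featureToField.put("coveredText", "tag");
        Map<String, Map<String, String>> mapping = new HashMap<>();
        mapping.put("org.apache.uima.jcas.tcas.Annotation", featureToField);

        Map<String, String> features = new HashMap<>();
        features.put("coveredText", "Apache");
        List<Annotation> annotations = new ArrayList<>();
        annotations.add(new Annotation("org.apache.uima.jcas.tcas.Annotation", features));

        System.out.println(enrich(annotations, mapping)); // {tag=[Apache]}
    }
}
```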
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org