Getting Started
---------------
To start using the Solr UIMA Metadata Extraction Library, go through the following configuration steps:

1. copy the generated solr-uima jar and its libs (under contrib/uima/lib) into a Solr libraries directory,
   or set <lib/> tags in solrconfig.xml to point to those jar files:

    <lib dir="../../contrib/uima/lib" />
    <lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />

2. modify your schema.xml, adding the fields you want to hold metadata and specifying proper values for the type, indexed, stored and multiValued options:

   for example you could specify the following:

    <field name="language" type="string" indexed="true" stored="true" required="false"/>
    <field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
    <field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />

3. modify your solrconfig.xml adding the following snippet:

    <updateRequestProcessorChain name="uima">
      <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
        <lst name="uimaConfig">
          <lst name="runtimeParameters">
            <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
            <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
            <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
            <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
            <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
            <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
          </lst>
          <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
          <!-- Set to true if you want to continue indexing even if text processing fails.
               Default is false, in which case Solr throws a RuntimeException and
               none of the documents in your session are indexed. -->
          <bool name="ignoreErrors">true</bool>
          <!-- This is optional. It is used for logging when text processing fails.
               If logField is not specified, uniqueKey will be used as logField.
          <str name="logField">id</str>
          -->
          <lst name="analyzeFields">
            <bool name="merge">false</bool>
            <arr name="fields">
              <str>text</str>
            </arr>
          </lst>
          <lst name="fieldMappings">
            <lst name="type">
              <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
              <lst name="mapping">
                <str name="feature">text</str>
                <str name="field">concept</str>
              </lst>
            </lst>
            <lst name="type">
              <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
              <lst name="mapping">
                <str name="feature">language</str>
                <str name="field">language</str>
              </lst>
            </lst>
            <lst name="type">
              <str name="name">org.apache.uima.SentenceAnnotation</str>
              <lst name="mapping">
                <str name="feature">coveredText</str>
                <str name="field">sentence</str>
              </lst>
            </lst>
          </lst>
        </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

   where VALID_ALCHEMYAPI_KEY is your AlchemyAPI Access Key. You need to register for an AlchemyAPI Access
   Key to use the AlchemyAPI services: http://www.alchemyapi.com/api/register.html

   where VALID_OPENCALAIS_KEY is your Calais Service Key. You need to register for a Calais Service
   Key to use the Calais services: http://www.opencalais.com/apikey

   analysisEngine must point to an AE (analysis engine) descriptor located at the specified path in the classpath.

   analyzeFields must list the input fields that need to be analyzed by UIMA.
   If merge=true, their content is merged and analyzed only once.
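
As a sketch of the merge option, the analyzeFields block below sets merge to true over two input fields, so their content would be merged and passed to UIMA in a single analysis. The title field is a hypothetical second field, not part of the configuration above; substitute fields defined in your own schema.xml.

```xml
<lst name="analyzeFields">
  <!-- merge=true: the listed fields are merged and analyzed only once -->
  <bool name="merge">true</bool>
  <arr name="fields">
    <!-- "title" is a hypothetical field used only for illustration -->
    <str>title</str>
    <str>text</str>
  </arr>
</lst>
```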

   fieldMappings describes which features of which UIMA types are written to which Solr field.

4. in your solrconfig.xml replace the existing default (<requestHandler name="/update"...) or create a new UpdateRequestHandler with the following:

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.processor">uima</str>
      </lst>
    </requestHandler>

Once you're done with the configuration, you can index documents, which will be automatically enriched with the specified fields.
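
For instance, a minimal update message such as the following could be sent to the /update handler configured in step 4. It is only a sketch: it assumes a schema whose uniqueKey is id and which defines the text field listed under analyzeFields, and the id value and sample text are made up for illustration.

```xml
<add>
  <doc>
    <!-- "id" is assumed to be the uniqueKey of your schema -->
    <field name="id">doc-1</field>
    <!-- "text" is the field listed under analyzeFields, so its content is analyzed by UIMA -->
    <field name="text">Apache Solr is an open source search platform. It is built on Apache Lucene.</field>
  </doc>
</add>
```

With the field mappings from step 3, the language, concept and sentence fields would then be filled in by the uima processor chain before the document is indexed.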