public interface ExtractingParams
The various Solr parameter names to use when extracting content.
Field Summary
Modifier and Type | Field | Description |
---|---|---|
static String | BOOST_PREFIX | The param prefix for the boost value of a field. |
static String | CAPTURE_ATTRIBUTES | Capture attributes separately according to the name of the element, instead of just adding them to the string buffer. |
static String | CAPTURE_ELEMENTS | Capture the specified fields (and everything included below them that isn't captured by some other capture field) separately from the default. |
static String | DEFAULT_FIELD | Optional. The field to use when UNKNOWN_FIELD_PREFIX is not specified and a field cannot otherwise be determined. |
static String | EXTRACT_FORMAT | Content output format if extractOnly is true. |
static String | EXTRACT_ONLY | Only extract and return the content; do not index it. |
static String | IGNORE_TIKA_EXCEPTION | If true, ignore a TikaException: give up on extracting the text but still index the metadata. |
static String | LITERALS_PREFIX | Pass in literal values to be added to the document, as in literal.myField=Foo. |
static String | LOWERNAMES | Map all generated attribute names to field names with lowercase and underscores. |
static String | MAP_PREFIX | The param prefix for mapping Tika metadata to Solr fields. |
static String | RESOURCE_NAME | Optional. The name of the file; Tika can use it as a hint when detecting the MIME type. |
static String | STREAM_TYPE | The content type of the stream. |
static String | UNKNOWN_FIELD_PREFIX | Optional. A prefix added to the names of fields that are not defined in the schema. |
static String | XPATH_EXPRESSION | Restrict the extracted parts of a document to be indexed by passing in an XPath expression. |
Field Detail
static final String LOWERNAMES
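LOWERNAMES maps generated attribute names to field names with lowercase and underscores. A minimal sketch of that normalization, assuming every character that is not a letter or digit is replaced by an underscore; LowerNames and toFieldName are hypothetical names, not Solr API:

```java
// Hypothetical helper illustrating the lowernames normalization; the exact
// character rule Solr applies is an assumption here.
class LowerNames {
    // Keeps letters and digits (lowercased); every other character becomes '_'.
    static String toFieldName(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            sb.append(Character.isLetterOrDigit(c) ? Character.toLowerCase(c) : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toFieldName("Content-Type"));  // content_type
        System.out.println(toFieldName("Last-Modified")); // last_modified
    }
}
```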
static final String IGNORE_TIKA_EXCEPTION
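IGNORE_TIKA_EXCEPTION trades the extracted text for robustness. The control flow can be sketched as follows; ExtractionException is a hypothetical stand-in for Tika's TikaException, and TextExtractor, buildDocument and the "content" field name are likewise hypothetical, so this illustrates the idea rather than Solr's handler code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TikaExceptionPolicy {
    // Hypothetical stand-in for org.apache.tika.exception.TikaException.
    static class ExtractionException extends Exception {
        ExtractionException(String message) { super(message); }
    }

    // Hypothetical text extractor; a real one would wrap a Tika parser.
    interface TextExtractor {
        String extractText() throws ExtractionException;
    }

    // Builds the fields to index. When extraction fails and ignoreTikaException
    // is true, the text is dropped but the metadata is still returned.
    static Map<String, String> buildDocument(TextExtractor extractor,
                                             Map<String, String> metadata,
                                             boolean ignoreTikaException)
            throws ExtractionException {
        Map<String, String> doc = new LinkedHashMap<>(metadata);
        try {
            doc.put("content", extractor.extractText()); // "content" is an assumed field name
        } catch (ExtractionException e) {
            if (!ignoreTikaException) {
                throw e; // without the flag, the failure propagates
            }
            // with the flag: give up on the text, keep the metadata
        }
        return doc;
    }

    public static void main(String[] args) throws ExtractionException {
        TextExtractor broken = () -> { throw new ExtractionException("corrupt file"); };
        System.out.println(buildDocument(broken, Map.of("title", "Report"), true)); // {title=Report}
    }
}
```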
static final String MAP_PREFIX
For example, with fmap.title=solr.title the Tika "title" metadata value will be added to a Solr field named "solr.title".
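The renaming can be sketched as a lookup of fmap.<name> for every metadata name. FieldMapping is a hypothetical class; metadata is modeled as single-valued strings, and names without an fmap entry pass through unchanged:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of applying "fmap.<tikaName>=<solrField>" parameters.
class FieldMapping {
    static final String MAP_PREFIX = "fmap."; // prefix used in the example above

    // Renames each metadata key that has a matching fmap.* parameter;
    // unmapped keys are passed through unchanged.
    static Map<String, String> apply(Map<String, String> params, Map<String, String> metadata) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            String target = params.getOrDefault(MAP_PREFIX + e.getKey(), e.getKey());
            out.put(target, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("fmap.title", "solr.title");
        Map<String, String> metadata = Map.of("title", "Annual Report");
        System.out.println(apply(params, metadata)); // {solr.title=Annual Report}
    }
}
```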
static final String BOOST_PREFIX
For example, fmap.title=solr.title boost.solr.title=2.5 will boost the solr.title field for this document by 2.5.
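The boost parameter names the mapped field (solr.title), not the original Tika name. A hedged sketch of collecting such parameters; FieldBoosts and the default of 1.0 for fields without a boost are assumptions made for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: collect "boost.<field>=<value>" parameters.
class FieldBoosts {
    static final String BOOST_PREFIX = "boost.";

    // Returns field name -> boost for every boost.* parameter; other params are ignored.
    static Map<String, Float> parse(Map<String, String> params) {
        Map<String, Float> boosts = new HashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getKey().startsWith(BOOST_PREFIX)) {
                String field = e.getKey().substring(BOOST_PREFIX.length());
                boosts.put(field, Float.parseFloat(e.getValue()));
            }
        }
        return boosts;
    }

    // Fields without an explicit boost default to 1.0 (an assumption of this sketch).
    static float boostFor(Map<String, Float> boosts, String field) {
        return boosts.getOrDefault(field, 1.0f);
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("fmap.title", "solr.title", "boost.solr.title", "2.5");
        Map<String, Float> boosts = parse(params);
        System.out.println(boostFor(boosts, "solr.title")); // 2.5
        System.out.println(boostFor(boosts, "author"));     // 1.0
    }
}
```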
static final String LITERALS_PREFIX
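Pass in literal values to be added to the document. For example, literal.myField=Foo adds the value "Foo" to the field myField.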
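A minimal sketch of turning literal.* parameters into document fields, assuming single-valued fields and that a literal simply overwrites an extracted value of the same name; Literals and addLiterals are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Literals {
    static final String LITERALS_PREFIX = "literal."; // prefix from the example above

    // Copies every literal.<field>=<value> parameter into the document,
    // overwriting an extracted value of the same name (a simplification).
    static Map<String, String> addLiterals(Map<String, String> params, Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        params.forEach((key, value) -> {
            if (key.startsWith(LITERALS_PREFIX)) {
                out.put(key.substring(LITERALS_PREFIX.length()), value);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("literal.myField", "Foo");
        System.out.println(addLiterals(params, Map.of())); // {myField=Foo}
    }
}
```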
static final String XPATH_EXPRESSION
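Restrict the extracted parts of a document to be indexed by passing in an XPath expression. All content that satisfies the XPath expression will be passed to the SolrContentHandler. See Tika's docs for what the extracted document looks like.
See Also: CAPTURE_ELEMENTS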
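Solr applies the expression while Tika streams its extraction events, but what a given expression selects can be previewed with the JDK's own javax.xml.xpath API. This sketch omits XML namespaces for brevity; Tika's actual output is namespaced XHTML, so real expressions generally need a namespace prefix. XPathSelection is a hypothetical name:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathSelection {
    // Returns the string value (all descendant text) of the first node
    // matching the expression, or "" if nothing matches.
    static String select(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>some text here. <div>more text</div></p>"
                + " Some more text</body></html>";
        System.out.println(select(xhtml, "//p")); // some text here. more text
    }
}
```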
static final String EXTRACT_ONLY
static final String EXTRACT_FORMAT
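EXTRACT_ONLY and EXTRACT_FORMAT are typically combined in one request. A sketch that builds such a request URL, assuming the constants resolve to the parameter names extractOnly and extractFormat (with text as one accepted format) and assuming a handler at /update/extract on a hypothetical core named mycore; in code that has Solr on the classpath, referencing ExtractingParams.EXTRACT_ONLY is safer than hard-coding the string:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class ExtractOnlyRequest {
    // Appends URL-encoded parameters to a base URL, in the map's iteration order.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        params.forEach((key, value) -> query.add(
                URLEncoder.encode(key, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("extractOnly", "true");   // assumed value of EXTRACT_ONLY
        params.put("extractFormat", "text"); // assumed value of EXTRACT_FORMAT
        System.out.println(buildUrl("http://localhost:8983/solr/mycore/update/extract", params));
        // http://localhost:8983/solr/mycore/update/extract?extractOnly=true&extractFormat=text
    }
}
```

The document itself would still have to be sent as the request body; that part is left out of the sketch.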
static final String CAPTURE_ATTRIBUTES
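With CAPTURE_ATTRIBUTES enabled, attribute values go into fields named after their element rather than into the text buffer. A SAX-based sketch using only the JDK; CaptureAttributes is hypothetical, it models only the enabled case, and sending an a element's href value to a field named a follows the element-name rule stated in the Field Summary:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CaptureAttributes {
    // Returns attribute values keyed by element name; body text is appended
    // to the supplied buffer instead of being mixed with the attributes.
    static Map<String, List<String>> capture(String xhtml, StringBuilder text) throws Exception {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    fields.computeIfAbsent(qName, k -> new ArrayList<>()).add(atts.getValue(i));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return fields;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder text = new StringBuilder();
        Map<String, List<String>> fields = capture(
                "<body><p>See <a href=\"http://example.com/\">this link</a>.</p></body>", text);
        System.out.println(fields); // {a=[http://example.com/]}
        System.out.println(text);   // See this link.
    }
}
```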
static final String CAPTURE_ELEMENTS
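Capture the specified fields (and everything included below them that isn't captured by some other capture field) separately from the default. The capture is based on the element name (localName) that Tika returns to the SolrContentHandler, not to be confused with the mapped field. The field name can then be mapped into the index schema.
For instance, a Tika document may look like:
<html> ... <body> <p>some text here. <div>more text</div></p> Some more text </body>
By passing in the p tag, you could capture all P tags separately from the rest of the text. Thus, in the example, the capture of the P tag would be: "some text here. more text"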
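A SAX sketch of the capture behavior in this example, using only the JDK. It handles a single captured element name and concatenates the text of all matching elements; CaptureElements is hypothetical, and the interplay with other capture fields and with the default content field is not modeled:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CaptureElements {
    // Collects all text inside the named element, including text in nested
    // elements such as the <div> inside <p> in the example above.
    static String capture(String xhtml, String element) throws Exception {
        StringBuilder captured = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // > 0 while inside a captured element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (depth > 0 || qName.equals(element)) {
                    depth++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (depth > 0) {
                    depth--;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth > 0) {
                    captured.append(ch, start, length);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return captured.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><body><p>some text here. <div>more text</div></p>"
                + " Some more text</body></html>";
        System.out.println(capture(doc, "p")); // some text here. more text
    }
}
```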
static final String STREAM_TYPE
static final String RESOURCE_NAME
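STREAM_TYPE and RESOURCE_NAME both inform how the input is parsed. A sketch of one plausible precedence, assuming an explicit stream type overrides anything inferred from the file name. Tika performs its own, much richer detection, so the JDK's URLConnection.guessContentTypeFromName serves here only as a stand-in; ContentTypeHint is hypothetical:

```java
import java.net.URLConnection;

class ContentTypeHint {
    // Returns the content type used to choose a parser: an explicit stream
    // type wins; otherwise guess from the resource (file) name, if any.
    static String resolve(String streamType, String resourceName) {
        if (streamType != null && !streamType.isEmpty()) {
            return streamType;
        }
        if (resourceName != null) {
            // JDK's extension-based guess; returns null for unknown extensions.
            return URLConnection.guessContentTypeFromName(resourceName);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "report.html"));              // text/html
        System.out.println(resolve("application/pdf", "report.html")); // application/pdf
    }
}
```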
static final String UNKNOWN_FIELD_PREFIX
static final String DEFAULT_FIELD
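UNKNOWN_FIELD_PREFIX and DEFAULT_FIELD both decide where a value goes when its name is not in the schema. A sketch of that fallback order, assuming the prefix takes precedence over the default field; FieldNameResolver, the sample schema and the attr_ prefix (which would need a matching attr_* dynamic field) are hypothetical:

```java
import java.util.Set;

class FieldNameResolver {
    // Picks the field an extracted value is indexed into. Returns null when
    // neither fallback is configured (what Solr then does depends on the schema).
    static String resolve(String name, Set<String> schemaFields,
                          String unknownFieldPrefix, String defaultField) {
        if (schemaFields.contains(name)) {
            return name;                      // known field: use as-is
        }
        if (unknownFieldPrefix != null && !unknownFieldPrefix.isEmpty()) {
            return unknownFieldPrefix + name; // e.g. author -> attr_author
        }
        if (defaultField != null && !defaultField.isEmpty()) {
            return defaultField;              // catch-all field
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("id", "title", "text");
        System.out.println(resolve("title", schema, "attr_", "text"));  // title
        System.out.println(resolve("author", schema, "attr_", "text")); // attr_author
        System.out.println(resolve("author", schema, null, "text"));    // text
    }
}
```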