org.apache.solr.handler.extraction
Interface ExtractingParams

All Known Implementing Classes:
SolrContentHandler

public interface ExtractingParams

The Solr parameter names to use when extracting content.


Field Summary
static String BOOST_PREFIX
          The boost value for the name of the field.
static String CAPTURE_ATTRIBUTES
          Capture attributes separately according to the name of the element, instead of just adding them to the string buffer.
static String CAPTURE_ELEMENTS
          Capture the specified fields (and everything included below them that isn't captured by some other capture field) separately from the default.
static String DEFAULT_FIELD
          Optional.
static String EXTRACT_FORMAT
          Content output format if extractOnly is true.
static String EXTRACT_ONLY
          Only extract and return the content, do not index it.
static String IGNORE_TIKA_EXCEPTION
          If true, ignore TikaException (give up on extracting text, but still index the metadata).
static String LITERALS_PREFIX
          Pass in literal values to be added to the document, as in literal.myField=Foo.
static String LOWERNAMES
          Map all generated attribute names to field names with lowercase and underscores.
static String MAP_PREFIX
          The param prefix for mapping Tika metadata to Solr fields.
static String RESOURCE_NAME
          Optional.
static String STREAM_TYPE
          The type of the stream.
static String UNKNOWN_FIELD_PREFIX
          Optional.
static String XPATH_EXPRESSION
          Restrict the extracted parts of a document to be indexed by passing in an XPath expression.
 

Field Detail

LOWERNAMES

static final String LOWERNAMES
Map all generated attribute names to field names with lowercase and underscores.

See Also:
Constant Field Values
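
As a rough illustration of what a lowercase-and-underscore mapping might look like, the sketch below normalizes a generated name. The class and method names are hypothetical, and the rule that every non-alphanumeric character becomes an underscore is an assumption; the description above only states that names are mapped to lowercase with underscores.

```java
public class LowerNamesSketch {
    // Hypothetical helper: lowercases letters and replaces every character
    // that is not a letter or digit with '_' (an assumed rule, not confirmed
    // by the description above).
    public static String toLowerName(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            sb.append(Character.isLetterOrDigit(ch) ? Character.toLowerCase(ch) : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLowerName("Content-Type")); // prints content_type
    }
}
```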

IGNORE_TIKA_EXCEPTION

static final String IGNORE_TIKA_EXCEPTION
If true, ignore TikaException (give up on extracting text, but still index the metadata).

See Also:
Constant Field Values

MAP_PREFIX

static final String MAP_PREFIX
The param prefix for mapping Tika metadata to Solr fields.

To map a field, add a name like:

fmap.title=solr.title
In this example, the Tika "title" metadata value will be added to a Solr field named "solr.title".

See Also:
Constant Field Values
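
A minimal sketch of how a client might assemble such mapping parameters. The "fmap." prefix is taken from the example above; the class name, the helper method and the use of a plain Map for request parameters are hypothetical choices made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldMappingSketch {
    // Prefix as it appears in the "fmap.title=solr.title" example.
    static final String MAP_PREFIX = "fmap.";

    // Hypothetical helper: turns (Tika metadata name -> Solr field) pairs
    // into request parameters of the form fmap.<tikaName>=<solrField>.
    public static Map<String, String> toParams(Map<String, String> mappings) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            params.put(MAP_PREFIX + e.getKey(), e.getValue());
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("title", "solr.title");
        System.out.println(toParams(mappings)); // prints {fmap.title=solr.title}
    }
}
```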

BOOST_PREFIX

static final String BOOST_PREFIX
The boost value for the name of the field. The boost can be specified by a name mapping.

For example

 fmap.title=solr.title
 boost.solr.title=2.5
 
will boost the solr.title field for this document by 2.5.

See Also:
Constant Field Values
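
The following sketch shows one way a boost could be looked up by Solr field name from a set of request parameters. The "boost." prefix comes from the example above; the class and method names are hypothetical, and falling back to 1.0 when no boost is supplied is an assumption rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

public class BoostLookupSketch {
    // Prefix as it appears in the "boost.solr.title=2.5" example.
    static final String BOOST_PREFIX = "boost.";

    // Hypothetical helper: returns the boost given for a Solr field,
    // or 1.0f when the parameter is absent (assumed default).
    public static float boostFor(Map<String, String> params, String solrField) {
        String raw = params.get(BOOST_PREFIX + solrField);
        return raw == null ? 1.0f : Float.parseFloat(raw);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("boost.solr.title", "2.5");
        System.out.println(boostFor(params, "solr.title"));
        System.out.println(boostFor(params, "solr.author"));
    }
}
```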

LITERALS_PREFIX

static final String LITERALS_PREFIX
Pass in literal values to be added to the document, as in
  literal.myField=Foo 
 

See Also:
Constant Field Values
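
As a sketch of passing a literal from client code, the helper below formats one URL-encoded literal.<field>=<value> pair for a query string. The "literal." prefix comes from the example above; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class LiteralParamSketch {
    // Prefix as it appears in the "literal.myField=Foo" example.
    static final String LITERALS_PREFIX = "literal.";

    // Hypothetical helper: builds one URL-encoded query-string pair.
    public static String literalParam(String field, String value) {
        try {
            return URLEncoder.encode(LITERALS_PREFIX + field, "UTF-8")
                    + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(literalParam("myField", "Foo")); // prints literal.myField=Foo
    }
}
```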

XPATH_EXPRESSION

static final String XPATH_EXPRESSION
Restrict the extracted parts of a document to be indexed by passing in an XPath expression. All content that satisfies the XPath expression will be passed to the SolrContentHandler.

See Tika's docs for what the extracted document looks like.

See Also:
CAPTURE_ELEMENTS, Constant Field Values

EXTRACT_ONLY

static final String EXTRACT_ONLY
Only extract and return the content, do not index it.

See Also:
Constant Field Values

EXTRACT_FORMAT

static final String EXTRACT_FORMAT
Content output format if extractOnly is true. Default is "xml", alternative is "text".

See Also:
Constant Field Values
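
A small sketch of how the documented default and alternative could be resolved from an optional request value. The class and method names are hypothetical, and rejecting any value other than "xml" or "text" is an illustrative choice, not documented behavior.

```java
public class ExtractFormatSketch {
    // Hypothetical helper: returns "xml" when no format is requested,
    // passes through "xml" or "text", and rejects anything else
    // (the rejection is an assumption made for this sketch).
    public static String resolveFormat(String requested) {
        if (requested == null) {
            return "xml";
        }
        if (requested.equals("xml") || requested.equals("text")) {
            return requested;
        }
        throw new IllegalArgumentException("Unsupported extract format: " + requested);
    }

    public static void main(String[] args) {
        System.out.println(resolveFormat(null));   // prints xml
        System.out.println(resolveFormat("text")); // prints text
    }
}
```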

CAPTURE_ATTRIBUTES

static final String CAPTURE_ATTRIBUTES
Capture attributes separately according to the name of the element, instead of just adding them to the string buffer.

See Also:
Constant Field Values

CAPTURE_ELEMENTS

static final String CAPTURE_ELEMENTS
Capture the specified fields (and everything included below them that isn't captured by some other capture field) separately from the default. This is different from the case of passing in an XPath expression.

The capture field is based on the localName returned to the SolrContentHandler by Tika, not to be confused with the mapped field. The field name can then be mapped into the index schema.

For instance, a Tika document may look like:

  <html>
    ...
    <body>
      <p>some text here.  <div>more text</div></p>
      Some more text
    </body>
 
By passing in the p tag, you could capture all P tags separately from the rest of the text. Thus, in the example, the capture of the P tag would be: "some text here. more text"

See Also:
Constant Field Values
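
The capture behavior described above can be approximated with a plain SAX handler. This is only a sketch of the idea, not the SolrContentHandler implementation: the class and method names are hypothetical, it captures a single element name, and it does not model nested capture fields.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CaptureSketch {
    // Hypothetical helper: returns { captured text, remaining text } for
    // well-formed XML, collecting everything inside <captureName> elements
    // (including nested children) separately from the rest.
    public static String[] capture(String xml, final String captureName) {
        final StringBuilder captured = new StringBuilder();
        final StringBuilder rest = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // greater than 0 while inside a captured element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (depth > 0 || qName.equals(captureName)) {
                    depth++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (depth > 0) {
                    depth--;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                (depth > 0 ? captured : rest).append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return new String[] { captured.toString(), rest.toString() };
    }

    public static void main(String[] args) {
        String xml = "<html><body><p>some text here. <div>more text</div></p>"
                + "Some more text</body></html>";
        String[] parts = capture(xml, "p");
        System.out.println(parts[0]); // prints some text here. more text
        System.out.println(parts[1]); // prints Some more text
    }
}
```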

STREAM_TYPE

static final String STREAM_TYPE
The type of the stream. If not specified, Tika will use MIME type detection.

See Also:
Constant Field Values

RESOURCE_NAME

static final String RESOURCE_NAME
Optional. The file name. If specified, Tika can take this into account while guessing the MIME type.

See Also:
Constant Field Values

UNKNOWN_FIELD_PREFIX

static final String UNKNOWN_FIELD_PREFIX
Optional. If specified, the prefix will be prepended to all metadata names, making it possible to set up a dynamic field that captures them automatically.

See Also:
Constant Field Values
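
To illustrate the idea, the sketch below prepends a prefix to a metadata name and checks the result against a dynamic field pattern with a trailing wildcard. The prefix "attr_", the pattern "attr_*" and all class and method names are hypothetical; only trailing-wildcard patterns are handled.

```java
public class UnknownPrefixSketch {
    // Hypothetical helper: prepends the prefix to a metadata name.
    public static String prefixed(String prefix, String metadataName) {
        return prefix + metadataName;
    }

    // Hypothetical helper: matches a field name against a pattern such as
    // "attr_*" (trailing wildcard only) or an exact name.
    public static boolean matchesDynamicField(String pattern, String fieldName) {
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        String field = prefixed("attr_", "producer");
        System.out.println(field);                                // prints attr_producer
        System.out.println(matchesDynamicField("attr_*", field)); // prints true
    }
}
```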

DEFAULT_FIELD

static final String DEFAULT_FIELD
Optional. If specified and the name of a potential field cannot be determined, the specified default field will be used instead.

See Also:
Constant Field Values


Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.