SimplePreAnalyzedParser (Solr 4.0.0-ALPHA API)

org.apache.solr.schema
Class SimplePreAnalyzedParser

java.lang.Object
  org.apache.solr.schema.SimplePreAnalyzedParser

All Implemented Interfaces: PreAnalyzedField.PreAnalyzedParser

public final class SimplePreAnalyzedParser
extends Object
implements PreAnalyzedField.PreAnalyzedParser

Simple plain text format parser for PreAnalyzedField.

Serialization format

The format of the serialization is as follows:

 content ::= version (stored)? tokens
 version ::= digit+ " "
 ; stored field value - any "=" inside must be escaped!
 stored ::= "=" text "="
 tokens ::= (token ((" ") + token)*)*
 token ::= text ("," attrib)*
 attrib ::= name '=' value
 name ::= text
 value ::= text

Special characters in "text" values can be escaped using the escape character \. The following escape sequences are recognized:

 "\ " - literal space character
 "\," - literal , character
 "\=" - literal = character
 "\\" - literal \ character
 "\n" - newline
 "\r" - carriage return
 "\t" - horizontal tab

Please note that Unicode escape sequences (e.g. \u0001) are not supported.
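
The escape rules above can be illustrated with a small standalone sketch. It is not the code used by SimplePreAnalyzedParser; the class name EscapeSketch and both method names are hypothetical. Rejecting unknown or dangling escapes is a choice made for this sketch and may differ from what the real parser does.

```java
// Hypothetical helper, not part of the Solr API: illustrates the escape table above.
final class EscapeSketch {
    private EscapeSketch() {}

    /** Escapes the characters that have a special meaning in "text" values. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case ' ': case ',': case '=': case '\\':
                    out.append('\\').append(c);
                    break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Decodes the escape sequences listed above. */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            // Assumption of this sketch: a trailing backslash is an error.
            if (i + 1 >= text.length()) {
                throw new IllegalArgumentException("dangling escape at index " + i);
            }
            char next = text.charAt(++i);
            switch (next) {
                case ' ': case ',': case '=': case '\\':
                    out.append(next);
                    break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                default:
                    // Assumption of this sketch: anything else (including \\uXXXX) is rejected.
                    throw new IllegalArgumentException("unsupported escape: \\" + next);
            }
        }
        return out.toString();
    }
}
```

For instance, EscapeSketch.escape("a, b=c") yields a\,\ b\=c, and passing that result to EscapeSketch.unescape returns the original string.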

Supported attribute names

The following token attributes are supported, and identified with short symbolic names:

 i - position increment (integer)
 s - token offset, start position (integer)
 e - token offset, end position (integer)
 t - token type (string)
 f - token flags (hexadecimal integer)
 p - payload (bytes in hexadecimal format)

Token positions are tracked and implicitly added to the token stream: the start and end offsets count only the term text and whitespace, and exclude the space taken by token attributes.
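
As a rough illustration of the value types in the attribute list above, the following sketch converts a raw attribute value into a Java value based on its short name. The class TokenAttributeSketch and its methods are hypothetical and are not the parser's own code. Returning null for unknown names and requiring an even number of hex digits in payloads are assumptions of this sketch.

```java
// Hypothetical helper, not part of the Solr API: maps short attribute names to Java values.
final class TokenAttributeSketch {
    private TokenAttributeSketch() {}

    /** Decodes one attribute value; returns null for names outside the supported list. */
    static Object decode(String name, String value) {
        switch (name) {
            case "i": // position increment
            case "s": // start offset
            case "e": // end offset
                return Integer.parseInt(value);
            case "t": // token type
                return value;
            case "f": // token flags, written as a hexadecimal integer
                return Integer.parseInt(value, 16);
            case "p": // payload, written as hexadecimal bytes
                return hexToBytes(value);
            default:
                return null; // assumption: unknown attribute names are simply skipped
        }
    }

    /** Converts a string such as "0a1b" into the bytes {0x0a, 0x1b}. */
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit in: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

With this sketch, decode("i", "22") gives the Integer 22, decode("f", "ff") gives 255, and decode("p", "0a1b") gives a two-byte array.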

Example token streams

 1 one two three
  - version 1
  - stored: 'null'
  - tok: '(term=one,startOffset=0,endOffset=3)'
  - tok: '(term=two,startOffset=4,endOffset=7)'
  - tok: '(term=three,startOffset=8,endOffset=13)'
 1  one  two   three 
  - version 1
  - stored: 'null'
  - tok: '(term=one,startOffset=1,endOffset=4)'
  - tok: '(term=two,startOffset=6,endOffset=9)'
  - tok: '(term=three,startOffset=12,endOffset=17)'
1 one,s=123,e=128,i=22  two three,s=20,e=22
  - version 1
  - stored: 'null'
  - tok: '(term=one,positionIncrement=22,startOffset=123,endOffset=128)'
  - tok: '(term=two,positionIncrement=1,startOffset=5,endOffset=8)'
  - tok: '(term=three,positionIncrement=1,startOffset=20,endOffset=22)'
1 \ one\ \,,i=22,a=\, two\=

  \n,\ =\   \
  - version 1
  - stored: 'null'
  - tok: '(term= one ,,positionIncrement=22,startOffset=0,endOffset=6)'
  - tok: '(term=two=


 ,positionIncrement=1,startOffset=7,endOffset=15)'
  - tok: '(term=\,positionIncrement=1,startOffset=17,endOffset=18)'
1 ,i=22 ,i=33,s=2,e=20 , 
  - version 1
  - stored: 'null'
  - tok: '(term=,positionIncrement=22,startOffset=0,endOffset=0)'
  - tok: '(term=,positionIncrement=33,startOffset=2,endOffset=20)'
  - tok: '(term=,positionIncrement=1,startOffset=2,endOffset=2)'
1 =This is the stored part with \= 
 \n    \t escapes.=one two three 
  - version 1
  - stored: 'This is the stored part with = 
 \n    \t escapes.'
  - tok: '(term=one,startOffset=0,endOffset=3)'
  - tok: '(term=two,startOffset=4,endOffset=7)'
  - tok: '(term=three,startOffset=8,endOffset=13)'
1 ==
  - version 1
  - stored: ''
  - (no tokens)
1 =this is a test.=
  - version 1
  - stored: 'this is a test.'
  - (no tokens)

Constructor Summary
`SimplePreAnalyzedParser()`

Method Summary
`PreAnalyzedField.ParseResult`	`parse(Reader reader, AttributeSource parent)` Parse input.
`String`	`toFormattedString(Field f)` Format a field so that the resulting String is valid for parsing with `PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource)`.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail