Class SimplePreAnalyzedParser

java.lang.Object
  extended by org.apache.solr.schema.SimplePreAnalyzedParser
All Implemented Interfaces:
    PreAnalyzedField.PreAnalyzedParser

public final class SimplePreAnalyzedParser
extends Object
implements PreAnalyzedField.PreAnalyzedParser

Simple plain text format parser for PreAnalyzedField.

Serialization format

The format of the serialization is as follows:

 content ::= version (stored)? tokens
 version ::= digit+ " "
 ; stored field value - any "=" inside must be escaped!
 stored ::= "=" text "="
 tokens ::= (token ((" ") + token)*)*
 token ::= text ("," attrib)*
 attrib ::= name '=' value
 name ::= text
 value ::= text
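The top-level rule `content ::= version (stored)? tokens` can be split in a single left-to-right scan: read the digits up to the first space, then, if the next character is "=", scan for the closing "=" that is not preceded by the escape character. The sketch below shows that step. `HeaderSplit` and `Parts` are hypothetical names, not part of Solr, and the stored part is returned still escaped.

```java
/**
 * Hypothetical helper, not part of Solr: splits "version (stored)? tokens"
 * into its three top-level parts. The stored part is returned still escaped.
 */
public class HeaderSplit {
    /** storedRaw is null when no "=...=" section is present. */
    public record Parts(int version, String storedRaw, String tokens) {}

    public static Parts split(String content) {
        int sp = content.indexOf(' ');
        String digits = sp < 0 ? "" : content.substring(0, sp);
        if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit)) {
            throw new IllegalArgumentException("expected a 'digits ' version prefix");
        }
        int version = Integer.parseInt(digits);
        String rest = content.substring(sp + 1);
        String stored = null;
        if (rest.startsWith("=")) {
            // Scan for the closing '=' that is not preceded by the escape character.
            int i = 1;
            boolean escaped = false;
            for (; i < rest.length(); i++) {
                char c = rest.charAt(i);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '=') {
                    break;
                }
            }
            if (i >= rest.length()) {
                throw new IllegalArgumentException("unterminated stored value");
            }
            stored = rest.substring(1, i);
            rest = rest.substring(i + 1);
        }
        return new Parts(version, stored, rest);
    }
}
```

For example, `HeaderSplit.split("1 ==")` yields version 1, an empty stored value and no token text, which matches the "1 ==" case in the examples further down.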

Special characters in "text" values can be escaped using the backslash escape character (\). The following escape sequences are recognized:

 "\ " - literal space character
 "\," - literal , character
 "\=" - literal = character
 "\\" - literal \ character
 "\n" - newline
 "\r" - carriage return
 "\t" - horizontal tab
Please note that Unicode escape sequences are not supported.
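Decoding the escape sequences listed above is a single pass over the text. The following is a minimal sketch with a hypothetical `Unescape` class, not Solr's code. Rejecting any other escape (including Unicode escapes) and a trailing lone backslash is an assumption of this sketch; the real parser may treat those cases differently.

```java
/** Hypothetical helper, not Solr's implementation: decodes the escapes listed above. */
public class Unescape {
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= s.length()) {
                throw new IllegalArgumentException("dangling escape at end of input");
            }
            char n = s.charAt(++i);
            switch (n) {
                case ' ', ',', '=', '\\' -> out.append(n); // literal characters
                case 'n' -> out.append('\n');
                case 'r' -> out.append('\r');
                case 't' -> out.append('\t');
                // Anything else, including Unicode escapes, is rejected in this sketch.
                default -> throw new IllegalArgumentException("unsupported escape: \\" + n);
            }
        }
        return out.toString();
    }
}
```

With this sketch, the escaped term `\ one\ \,` from the fourth example decodes to " one ,".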

Supported attribute names

The following token attributes are supported, and identified with short symbolic names:
 i - position increment (integer)
 s - token offset, start position (integer)
 e - token offset, end position (integer)
 t - token type (string)
 f - token flags (hexadecimal integer)
 p - payload (bytes in hexadecimal format)
Token positions are tracked and implicitly added to the token stream: the start and end offsets count only the term text and the whitespace between tokens, and exclude the characters taken by token attributes.
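A single token such as `one,s=123,e=128,i=22` can be decoded by splitting on unescaped commas, splitting each attribute on its first unescaped "=", and converting the value according to the attribute table above (decimal for i, s and e, hexadecimal for f, hex-encoded bytes for p). The sketch below is hypothetical (`TokenSpec` is not a Solr class). It skips malformed attributes and unknown names; the latter is an assumption suggested by the fourth example, where `a=\,` does not appear in the output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Solr's implementation: one token with typed attributes. */
public class TokenSpec {
    public final String term;
    public final Map<String, Object> attrs = new LinkedHashMap<>();

    private TokenSpec(String term) { this.term = term; }

    // Splits on sep where it is not escaped; at most `limit` pieces (limit <= 0: unlimited).
    static List<String> splitUnescaped(String s, char sep, int limit) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i)); // keep escape pairs intact
            } else if (c == sep && (limit <= 0 || parts.size() < limit - 1)) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    // Minimal unescape for the sequences listed in the format description.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char n = s.charAt(++i);
                out.append(n == 'n' ? '\n' : n == 'r' ? '\r' : n == 't' ? '\t' : n);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Decodes a payload written as pairs of hex digits.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) throw new IllegalArgumentException("odd-length hex: " + hex);
        byte[] b = new byte[hex.length() / 2];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return b;
    }

    public static TokenSpec parse(String token) {
        List<String> pieces = splitUnescaped(token, ',', 0);
        TokenSpec spec = new TokenSpec(unescape(pieces.get(0)));
        for (String attrib : pieces.subList(1, pieces.size())) {
            List<String> kv = splitUnescaped(attrib, '=', 2);
            if (kv.size() != 2) continue; // assumption: malformed attributes are skipped
            String name = unescape(kv.get(0));
            String value = unescape(kv.get(1));
            switch (name) {
                case "i", "s", "e" -> spec.attrs.put(name, Integer.parseInt(value));
                case "t" -> spec.attrs.put(name, value);
                case "f" -> spec.attrs.put(name, Integer.parseInt(value, 16));
                case "p" -> spec.attrs.put(name, hexToBytes(value));
                default -> { } // assumption: unknown attribute names are ignored
            }
        }
        return spec;
    }
}
```

Splitting while keeping escape pairs intact, and unescaping only afterwards, is what lets a term such as `\ one\ \,` contain a literal comma without being cut at it.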

Example token streams

1 one two three
  - version 1
  - stored: 'null'
  - tok: '(term=one,startOffset=0,endOffset=3)'
  - tok: '(term=two,startOffset=4,endOffset=7)'
  - tok: '(term=three,startOffset=8,endOffset=13)'
1  one  two   three 
  - version 1
  - stored: 'null'
  - tok: '(term=one,startOffset=1,endOffset=4)'
  - tok: '(term=two,startOffset=6,endOffset=9)'
  - tok: '(term=three,startOffset=12,endOffset=17)'
1 one,s=123,e=128,i=22  two three,s=20,e=22
  - version 1
  - stored: 'null'
  - tok: '(term=one,positionIncrement=22,startOffset=123,endOffset=128)'
  - tok: '(term=two,positionIncrement=1,startOffset=5,endOffset=8)'
  - tok: '(term=three,positionIncrement=1,startOffset=20,endOffset=22)'
1 \ one\ \,,i=22,a=\, two\=

  \n,\ =\   \
  - version 1
  - stored: 'null'
  - tok: '(term= one ,,positionIncrement=22,startOffset=0,endOffset=6)'
  - tok: '(term=two=

  - tok: '(term=\,positionIncrement=1,startOffset=17,endOffset=18)'
1 ,i=22 ,i=33,s=2,e=20 , 
  - version 1
  - stored: 'null'
  - tok: '(term=,positionIncrement=22,startOffset=0,endOffset=0)'
  - tok: '(term=,positionIncrement=33,startOffset=2,endOffset=20)'
  - tok: '(term=,positionIncrement=1,startOffset=2,endOffset=2)'
1 =This is the stored part with \= 
 \n    \t escapes.=one two three 
  - version 1
  - stored: 'This is the stored part with = 
 \n    \t escapes.'
  - tok: '(term=one,startOffset=0,endOffset=3)'
  - tok: '(term=two,startOffset=4,endOffset=7)'
  - tok: '(term=three,startOffset=8,endOffset=13)'
1 ==
  - version 1
  - stored: ''
  - (no tokens)
1 =this is a test.=
  - version 1
  - stored: 'this is a test.'
  - (no tokens)
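The offsets in these examples can be recomputed by keeping one running offset that advances over term text and whitespace only. Explicit s and e attributes replace the reported values, but (reading the third example, where "two" still starts at 5 after `e=128`) they do not move the running offset. The sketch below is hypothetical (`OffsetTracking` is not a Solr class) and assumes the token text contains no escape sequences, so every space separates tokens and the first comma starts the attributes. Its input is the text after the version prefix.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: recomputes the offsets shown in the examples.
 * Assumes no escape sequences in the token text.
 */
public class OffsetTracking {
    public record Tok(String term, int posInc, int start, int end) {}

    /** tokens is the part of the line after the version prefix (and stored part). */
    public static List<Tok> offsets(String tokens) {
        List<Tok> out = new ArrayList<>();
        int pos = 0; // running offset: counts term text and whitespace only
        int i = 0;
        while (i < tokens.length()) {
            if (tokens.charAt(i) == ' ') { // whitespace advances the running offset
                pos++;
                i++;
                continue;
            }
            int stop = tokens.indexOf(' ', i);
            if (stop < 0) stop = tokens.length();
            String[] pieces = tokens.substring(i, stop).split(",", -1);
            String term = pieces[0];
            int start = pos, end = pos + term.length(), posInc = 1;
            for (int k = 1; k < pieces.length; k++) {
                int eq = pieces[k].indexOf('=');
                if (eq < 0) continue;
                String name = pieces[k].substring(0, eq);
                String value = pieces[k].substring(eq + 1);
                switch (name) {
                    case "i" -> posInc = Integer.parseInt(value);
                    case "s" -> start = Integer.parseInt(value); // explicit offsets replace
                    case "e" -> end = Integer.parseInt(value);   // the reported values...
                    default -> { } // t, f, p and unknown names do not affect offsets
                }
            }
            out.add(new Tok(term, posInc, start, end));
            pos += term.length(); // ...but the running offset ignores them
            i = stop;
        }
        return out;
    }
}
```

Under these assumptions, `offsets("one,s=123,e=128,i=22  two three,s=20,e=22")` reproduces the three tokens of the third example, and `offsets(",i=22 ,i=33,s=2,e=20 , ")` reproduces the three empty-term tokens of the fifth.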

Constructor Summary
 SimplePreAnalyzedParser()
Method Summary
 PreAnalyzedField.ParseResult parse(Reader reader, AttributeSource parent)
          Parse input.
 String toFormattedString(Field f)
          Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource).
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public SimplePreAnalyzedParser()
Method Detail


public PreAnalyzedField.ParseResult parse(Reader reader,
                                          AttributeSource parent)
                                   throws IOException
Description copied from interface: PreAnalyzedField.PreAnalyzedParser
Parse input.

Specified by:
parse in interface PreAnalyzedField.PreAnalyzedParser
Parameters:
reader - input to read from
parent - parent who will own the resulting states (tokens with attributes)
Returns:
parse result, with possibly null stored and/or states fields.
Throws:
IOException - if a parsing error or IO error occurs


public String toFormattedString(Field f)
                         throws IOException
Description copied from interface: PreAnalyzedField.PreAnalyzedParser
Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource).

Specified by:
toFormattedString in interface PreAnalyzedField.PreAnalyzedParser
Parameters:
f - field instance
Returns:
formatted string
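The inverse direction, producing a string in this format, amounts to escaping the special characters and joining tokens with single spaces. The sketch below works on plain strings rather than a Lucene `Field`, so it is not Solr's `toFormattedString`. `SimpleFormatter` and `Token` are hypothetical names, attribute values are assumed to be already encoded (for example hex for f and p), and the fixed version prefix "1 " is taken from the examples. Inside the stored part only "=", the backslash and control characters are escaped, since spaces and commas are not delimiters there.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch, not Solr's toFormattedString: writes the plain-text format. */
public class SimpleFormatter {
    public record Token(String term, Map<String, String> attrs) {}

    // Escapes special characters; in the stored part, spaces and commas stay literal.
    static String escape(String s, boolean stored) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\' -> out.append("\\\\");
                case '=' -> out.append("\\=");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                case ' ', ',' -> out.append(stored ? String.valueOf(c) : "\\" + c);
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    /** stored may be null, in which case no "=...=" section is written. */
    public static String format(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder("1 "); // version 1, as in the examples
        if (stored != null) {
            sb.append('=').append(escape(stored, true)).append('=');
        }
        StringJoiner joiner = new StringJoiner(" ");
        for (Token t : tokens) {
            StringBuilder tok = new StringBuilder(escape(t.term(), false));
            // Attribute values are assumed to be pre-encoded (e.g. hex for f and p).
            t.attrs().forEach((k, v) -> tok.append(',').append(k).append('=').append(escape(v, false)));
            joiner.add(tok.toString());
        }
        return sb.append(joiner).toString();
    }
}
```

A string produced this way should round-trip through a parser for the grammar above; for instance, a stored value "a=b" is written as `=a\=b=`.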

Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.