org.silverpeas.search.indexEngine.analysis
Class SilverTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.silverpeas.search.indexEngine.analysis.SilverTokenizer
All Implemented Interfaces:
Closeable

public class SilverTokenizer
extends org.apache.lucene.analysis.Tokenizer

A grammar-based tokenizer constructed with JFlex.

This should be a good tokenizer for most European-language documents.

Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
SilverTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input)
          Creates a new SilverTokenizer with a given AttributeSource.AttributeFactory.
SilverTokenizer(org.apache.lucene.util.AttributeSource source, Reader input)
          Creates a new SilverTokenizer with a given AttributeSource.
SilverTokenizer(Reader input)
          Creates a new instance of the SilverTokenizer.
 
Method Summary
 void end()
           
 int getMaxTokenLength()
           
 boolean incrementToken()
           
 boolean isReplaceInvalidAcronym()
          Deprecated. Remove in 3.X and make true the only valid value.
 void reset(Reader reader)
           
 void setMaxTokenLength(int length)
          Set the max allowed token length.
 void setReplaceInvalidAcronym(boolean replaceInvalidAcronym)
          Deprecated. Remove in 3.X and make true the only valid value. See https://issues.apache.org/jira/browse/LUCENE-1068
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
reset
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SilverTokenizer

public SilverTokenizer(Reader input)
Creates a new instance of the SilverTokenizer. Attaches the input to the newly created JFlex scanner.

Parameters:
input - The input reader. See http://issues.apache.org/jira/browse/LUCENE-1068

SilverTokenizer

public SilverTokenizer(org.apache.lucene.util.AttributeSource source,
                       Reader input)
Creates a new SilverTokenizer with a given AttributeSource.


SilverTokenizer

public SilverTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
                       Reader input)
Creates a new SilverTokenizer with a given AttributeSource.AttributeFactory.

Method Detail

setMaxTokenLength

public void setMaxTokenLength(int length)
Set the max allowed token length. Any token longer than this is skipped.
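
The skip rule can be illustrated with a minimal, self-contained sketch. It does not use the SilverTokenizer grammar: the class MaxTokenLengthSketch and its tokenize helper are hypothetical, and plain whitespace splitting stands in for the JFlex scanner. Only the rule stated above is modelled, namely that a token longer than the configured maximum is dropped rather than truncated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of Silverpeas or Lucene.
final class MaxTokenLengthSketch {

    // Splits on whitespace (an assumption replacing the JFlex grammar)
    // and skips every token whose length exceeds maxTokenLength.
    static List<String> tokenize(String text, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String candidate : text.trim().split("\\s+")) {
            if (candidate.isEmpty()) {
                continue; // blank input yields a single empty string
            }
            if (candidate.length() > maxTokenLength) {
                continue; // too long: skipped entirely, not truncated
            }
            kept.add(candidate);
        }
        return kept;
    }

    public static void main(String[] args) {
        // "supercalifragilistic" exceeds 5 characters and is skipped.
        System.out.println(tokenize("a short supercalifragilistic word", 5));
        // prints [a, short, word]
    }
}
```

A token whose length equals the limit is kept in this sketch, since the description only says that longer tokens are skipped.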


getMaxTokenLength

public int getMaxTokenLength()
See Also:
setMaxTokenLength(int)

incrementToken

public final boolean incrementToken()
                             throws IOException
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

end

public final void end()
Overrides:
end in class org.apache.lucene.analysis.TokenStream

reset

public void reset(Reader reader)
           throws IOException
Overrides:
reset in class org.apache.lucene.analysis.Tokenizer
Throws:
IOException

isReplaceInvalidAcronym

@Deprecated
public boolean isReplaceInvalidAcronym()
Deprecated. Remove in 3.X and make true the only valid value.

Prior to https://issues.apache.org/jira/browse/LUCENE-1068, StandardTokenizer mischaracterized tokens like www.abc.com as acronyms when they should have been labeled as hosts.

Returns:
true if the tokenizer now returns these tokens as hosts, otherwise false
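
To make the acronym/host distinction concrete, here is a rough sketch under stated assumptions. The two regular expressions are simplified approximations written for this example, not the rules of the JFlex grammar, and the class TokenTypeSketch, the classify helper and the returned labels are all hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not part of Silverpeas or Lucene.
final class TokenTypeSketch {

    // Assumed shape of an acronym: single letters, each followed by a dot ("U.S.A.").
    private static final Pattern ACRONYM =
            Pattern.compile("(?:\\p{L}\\.){2,}");

    // Assumed shape of a host: dot-separated alphanumeric labels ("www.abc.com").
    private static final Pattern HOST =
            Pattern.compile("[\\p{L}\\p{N}]+(?:\\.[\\p{L}\\p{N}]+)+");

    // Returns an illustrative label for the token.
    static String classify(String token) {
        if (ACRONYM.matcher(token).matches()) {
            return "ACRONYM";
        }
        if (HOST.matcher(token).matches()) {
            return "HOST";
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(classify("U.S.A."));      // prints ACRONYM
        System.out.println(classify("www.abc.com")); // prints HOST
    }
}
```

With patterns of this kind, www.abc.com is labeled as a host rather than an acronym, which is the behavior the flag is described as enabling.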

setReplaceInvalidAcronym

@Deprecated
public void setReplaceInvalidAcronym(boolean replaceInvalidAcronym)
Deprecated. Remove in 3.X and make true the only valid value. See https://issues.apache.org/jira/browse/LUCENE-1068

Parameters:
replaceInvalidAcronym - Set to true to relabel mischaracterized acronyms as HOST.


Copyright © 2016 Silverpeas. All Rights Reserved.