com.topologi.diffx.load.text
Class TextTokeniserNoSpace

java.lang.Object
  extended by com.topologi.diffx.load.text.TextTokeniserNoSpace
All Implemented Interfaces:
TextTokeniser

public final class TextTokeniserNoSpace
extends Object
implements TextTokeniser

A tokeniser for text events that ignores whitespace and does not preserve it.

This tokeniser works at the word level: it essentially prunes all whitespace from the text and returns only the list of successive words.
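
The word-level behaviour described above can be illustrated with a minimal sketch. It does not use the DiffX classes: WordSplitSketch and its words method are hypothetical names introduced only for this example, and the sketch assumes that a "word" is any run of non-whitespace characters, which may differ from how the actual tokeniser treats punctuation.

```java
import java.util.ArrayList;
import java.util.List;

public class WordSplitSketch {

    // Hypothetical helper, not part of DiffX: collects the words of a
    // character sequence and discards every run of whitespace.
    public static List<String> words(CharSequence cs) {
        if (cs == null) {
            throw new NullPointerException("The character sequence is null");
        }
        List<String> out = new ArrayList<>();
        int start = -1; // index where the current word began, -1 when between words
        for (int i = 0; i < cs.length(); i++) {
            if (Character.isWhitespace(cs.charAt(i))) {
                if (start >= 0) {
                    // A whitespace character closes the current word.
                    out.add(cs.subSequence(start, i).toString());
                    start = -1;
                }
            } else if (start < 0) {
                // First non-whitespace character after a gap opens a new word.
                start = i;
            }
        }
        if (start >= 0) {
            // The input ended in the middle of a word.
            out.add(cs.subSequence(start, cs.length()).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [a, quick, brown, fox]
        System.out.println(words("  a quick\tbrown \n fox "));
    }
}
```

Leading, trailing and repeated whitespace all disappear from the result, so only the sequence of words remains available for comparison.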

Version:
23 December 2004
Author:
Christophe Lauret

Constructor Summary
TextTokeniserNoSpace(CharSequence cs)
          Creates a new tokeniser.
 
Method Summary
 int countTokens()
          Calculates the number of times that this tokeniser's nextToken method can be called before it throws an exception.
 TextEvent nextToken()
          Returns the next token.
 void useRepertory(Repertory rep)
          Specifies a repertory to use for this tokeniser.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextTokeniserNoSpace

public TextTokeniserNoSpace(CharSequence cs)
                     throws NullPointerException
Creates a new tokeniser.

Parameters:
cs - The character sequence to tokenise.
Throws:
NullPointerException - If the specified character sequence is null.

Method Detail

countTokens

public int countTokens()
Calculates the number of times that this tokeniser's nextToken method can be called before it throws an exception.

Specified by:
countTokens in interface TextTokeniser
Returns:
The number of tokens.

nextToken

public TextEvent nextToken()
                    throws NoSuchElementException
Returns the next token.

Specified by:
nextToken in interface TextTokeniser
Returns:
The next text event.
Throws:
NoSuchElementException - If the last token has already been returned.
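
The countTokens and nextToken contract documented here matches that of java.util.StringTokenizer, so the calling pattern can be sketched with that standard class as a stand-in. This is an illustration only: TokenLoopSketch and its methods are hypothetical names, and the sketch makes no claim about how TextTokeniserNoSpace is implemented or what a TextEvent contains.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class TokenLoopSketch {

    // Reads exactly as many tokens as countTokens() reports and
    // returns how many were read.
    public static int drain(StringTokenizer tokens) {
        int expected = tokens.countTokens(); // ask once, before consuming anything
        int read = 0;
        for (int i = 0; i < expected; i++) {
            tokens.nextToken();
            read++;
        }
        return read;
    }

    // Returns true if calling nextToken() after the last token
    // throws NoSuchElementException.
    public static boolean throwsWhenExhausted(StringTokenizer tokens) {
        drain(tokens);
        try {
            tokens.nextToken();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Prints 3: the doubled space does not produce an empty token.
        System.out.println(drain(new StringTokenizer("one two  three")));
        // Prints true
        System.out.println(throwsWhenExhausted(new StringTokenizer("a b")));
    }
}
```

Bounding the loop with the count obtained up front avoids relying on the exception for control flow; the exception then only signals a genuine programming error.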

useRepertory

public void useRepertory(Repertory rep)
Specifies a repertory to use for this tokeniser.

Specified by:
useRepertory in interface TextTokeniser
Parameters:
rep - The repertory to use.