Java Doc for Token.java in Net » lucene-connector » org » apache » lucene » analysis



java.lang.Object
   org.apache.lucene.analysis.Token

Token
public class Token implements Cloneable
A Token is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offset of the term in the text of the field, and a type string.

The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display, etc.
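As a rough illustration of how offsets can map a token back to its source text, the sketch below wraps matched spans in brackets. OffsetSpan and OffsetHighlightSketch are hypothetical stand-ins written for this example, not Lucene classes, and the sample offsets are hand-computed rather than produced by an analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the offset part of a Token; not the Lucene class.
final class OffsetSpan {
    final int start; // position of the first character in the source text
    final int end;   // one greater than the position of the last character

    OffsetSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class OffsetHighlightSketch {
    // Wraps each span of the source text in [ ] using only the offsets.
    // Assumption: spans are sorted by start offset and do not overlap.
    static String highlight(String source, List<OffsetSpan> spans) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (OffsetSpan span : spans) {
            out.append(source, last, span.start);   // text before the match
            out.append('[').append(source, span.start, span.end).append(']');
            last = span.end;
        }
        out.append(source, last, source.length());  // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "Quick foxes were running";
        List<OffsetSpan> hits = new ArrayList<>();
        hits.add(new OffsetSpan(6, 11));   // "foxes", even if the indexed term were a stem
        hits.add(new OffsetSpan(17, 24));  // "running"
        System.out.println(highlight(source, hits));
        // prints Quick [foxes] were [running]
    }
}
```

Because the offsets point into the original text, the highlighted fragment shows the word as written even when a filter has changed the term text.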

The type is an interned string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example an end of sentence marker token might be implemented with type "eos". The default token type is "word".

A Token can optionally have metadata (a.k.a. Payload) in the form of a variable length byte array. Use TermPositions.getPayloadLength() and TermPositions.getPayload(byte[], int) to retrieve the payloads from the index.

WARNING: The status of the Payloads feature is experimental. The APIs introduced here might change in the future and will not be supported anymore in such a case.

NOTE: As of 2.3, Token stores the term text internally as a malleable char[] termBuffer instead of String termText. The indexing code and core tokenizers have been changed to re-use a single Token instance, changing its buffer and other fields in-place as the Token is processed. This provides substantially better indexing performance as it saves the GC cost of new'ing a Token and String for every term. The APIs that accept String termText are still available but a warning about the associated performance cost has been added (below). The Token.termText() method has been deprecated.

Tokenizers and filters should try to re-use a Token instance when possible for best performance, by implementing the TokenStream.next(Token) API. Failing that, to create a new Token you should first use one of the constructors that starts with null text. Then you should call either Token.termBuffer() or Token.resizeTermBuffer(int) to retrieve the Token's termBuffer. Fill in the characters of your term into this buffer, and finally call Token.setTermLength(int) to set the length of the term text. See LUCENE-969 for details.
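The reuse pattern described above might look roughly like the following. To keep the example runnable without the Lucene jar, BufferTokenSketch is a hypothetical stand-in that mimics only the buffer-related methods named on this page (termBuffer, resizeTermBuffer, setTermLength and the offset setters); its initial capacity and growth policy are assumptions. ReuseTokenizerSketch is likewise a made-up whitespace tokenizer whose next method only loosely follows the TokenStream.next(Token) idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in that mimics only the buffer-related methods described
// on this page. It is NOT org.apache.lucene.analysis.Token.
final class BufferTokenSketch {
    private char[] termBuffer = new char[10]; // initial capacity is an arbitrary choice
    private int termLength;
    private int startOffset;
    private int endOffset;

    char[] termBuffer() { return termBuffer; }

    // Grows the buffer to at least newSize and returns it.
    char[] resizeTermBuffer(int newSize) {
        if (termBuffer.length < newSize) {
            termBuffer = Arrays.copyOf(termBuffer, newSize);
        }
        return termBuffer;
    }

    void setTermLength(int length) { termLength = length; }
    int termLength() { return termLength; }
    void setStartOffset(int offset) { startOffset = offset; }
    void setEndOffset(int offset) { endOffset = offset; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
}

// Hypothetical whitespace tokenizer that fills one caller-supplied token per call.
final class ReuseTokenizerSketch {
    private final String text;
    private int pos;

    ReuseTokenizerSketch(String text) { this.text = text; }

    // Returns the same instance it was given, or null at end of input.
    BufferTokenSketch next(BufferTokenSketch reusable) {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++; // skip spaces
        if (pos >= text.length()) return null;
        int start = pos;
        while (pos < text.length() && text.charAt(pos) != ' ') pos++; // scan one term
        int length = pos - start;
        char[] buffer = reusable.resizeTermBuffer(length); // make room first
        text.getChars(start, pos, buffer, 0);              // fill the buffer directly
        reusable.setTermLength(length);                    // record the valid length
        reusable.setStartOffset(start);
        reusable.setEndOffset(pos);
        return reusable;
    }

    static List<String> collectTerms(String text) {
        ReuseTokenizerSketch tokenizer = new ReuseTokenizerSketch(text);
        BufferTokenSketch token = new BufferTokenSketch(); // allocated once, reused for every term
        List<String> terms = new ArrayList<>();
        while (tokenizer.next(token) != null) {
            terms.add(new String(token.termBuffer(), 0, token.termLength()));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(collectTerms("reuse one token instance"));
        // prints [reuse, one, token, instance]
    }
}
```

Only collectTerms creates String objects, and it does so purely for display. The tokenizer loop itself allocates nothing per term unless the buffer has to grow.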


See Also:   org.apache.lucene.index.Payload


Field Summary
final public static String DEFAULT_TYPE
int endOffset
Payload payload
int positionIncrement
int startOffset
char[] termBuffer
int termLength
String type
    

Constructor Summary
public  Token()
     Constructs a Token with null text.
public  Token(int start, int end)
     Constructs a Token with null text and start & end offsets.
public  Token(int start, int end, String typ)
     Constructs a Token with null text and start & end offsets plus the Token type.
public  Token(String text, int start, int end)
     Constructs a Token with the given term text, and start & end offsets.
public  Token(String text, int start, int end, String typ)
     Constructs a Token with the given text, start and end offsets, & type.

Method Summary
public void clear()
     Resets the term text, payload, and positionIncrement to default. Other fields such as startOffset, endOffset and the token type are not reset since they are normally overwritten by the tokenizer.
public Object clone()
final public int endOffset()
     Returns this Token's ending offset, one greater than the position of the last character corresponding to this token in the source text.
public Payload getPayload()
     Returns this Token's payload.
public int getPositionIncrement()
     Returns the position increment of this Token.
public char[] resizeTermBuffer(int newSize)
     Grows the termBuffer to at least size newSize.
public void setEndOffset(int offset)
     Set the ending offset.
public void setPayload(Payload payload)
     Sets this Token's payload.
public void setPositionIncrement(int positionIncrement)
     Set the position increment.
public void setStartOffset(int offset)
     Set the starting offset.
final public void setTermBuffer(char[] buffer, int offset, int length)
     Copies the contents of buffer, starting at offset for length characters, into the termBuffer array.
final public void setTermLength(int length)
     Set number of valid characters (length of the term) in the termBuffer array.
public void setTermText(String text)
     Sets the Token's term text.
final public void setType(String type)
     Set the lexical type.
final public int startOffset()
     Returns this Token's starting offset, the position of the first character corresponding to this token in the source text. Note that the difference between endOffset() and startOffset() may not be equal to termText.length(), as the term text may have been altered by a stemmer or some other filter.
final public char[] termBuffer()
     Returns the internal termBuffer character array which you can then directly alter.
final public int termLength()
     Return number of valid characters (length of the term) in the termBuffer array.
final public String termText()
     Returns the Token's term text.
public String toString()
final public String type()
     Returns this Token's lexical type.

Field Detail
DEFAULT_TYPE
final public static String DEFAULT_TYPE

endOffset
int endOffset

payload
Payload payload

positionIncrement
int positionIncrement

startOffset
int startOffset

termBuffer
char[] termBuffer

termLength
int termLength

type
String type




Constructor Detail
Token
public Token()
Constructs a Token with null text.



Token
public Token(int start, int end)
Constructs a Token with null text and start & end offsets.
Parameters:
  start - start offset
  end - end offset



Token
public Token(int start, int end, String typ)
Constructs a Token with null text and start & end offsets plus the Token type.
Parameters:
  start - start offset
  end - end offset
  typ - token type



Token
public Token(String text, int start, int end)
Constructs a Token with the given term text, and start & end offsets. The type defaults to "word". NOTE: for better indexing speed you should instead use the char[] termBuffer methods to set the term text.
Parameters:
  text - term text
  start - start offset
  end - end offset



Token
public Token(String text, int start, int end, String typ)
Constructs a Token with the given text, start and end offsets, & type. NOTE: for better indexing speed you should instead use the char[] termBuffer methods to set the term text.
Parameters:
  text - term text
  start - start offset
  end - end offset
  typ - token type




Method Detail
clear
public void clear()
Resets the term text, payload, and positionIncrement to default. Other fields such as startOffset, endOffset and the token type are not reset since they are normally overwritten by the tokenizer.



clone
public Object clone()



endOffset
final public int endOffset()
Returns this Token's ending offset, one greater than the position of the last character corresponding to this token in the source text.



getPayload
public Payload getPayload()
Returns this Token's payload.



getPositionIncrement
public int getPositionIncrement()
Returns the position increment of this Token.
See Also:   Token.setPositionIncrement



resizeTermBuffer
public char[] resizeTermBuffer(int newSize)
Grows the termBuffer to at least size newSize.
Parameters:
  newSize - minimum size of the new termBuffer
Returns:
  newly created termBuffer with length >= newSize



setEndOffset
public void setEndOffset(int offset)
Set the ending offset.
See Also:   Token.endOffset()



setPayload
public void setPayload(Payload payload)
Sets this Token's payload.



setPositionIncrement
public void setPositionIncrement(int positionIncrement)
Set the position increment. This determines the position of this token relative to the previous Token in a TokenStream, used in phrase searching.

The default value is one.

Some common uses for this are:

  • Set it to zero to put multiple terms in the same position. This is useful if, e.g., a word has multiple stems. Searches for phrases including either stem will match. In this case, all but the first stem's increment should be set to zero: the increment of the first instance should be one. Repeating a token with an increment of zero can also be used to boost the scores of matches on that token.
  • Set it to values greater than one to inhibit exact phrase matches. If, for example, one does not want phrases to match across removed stop words, then one could build a stop word filter that removes stop words and also sets the increment to the number of stop words removed before each non-stop word. Then exact phrase queries will only match when the terms occur with no intervening stop words.

See Also:   org.apache.lucene.index.TermPositions
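To make the two uses above concrete, here is a small sketch that turns a sequence of increments into absolute positions. PositionIncrementSketch is a hypothetical helper, not part of Lucene. The starting value of -1 (so that a first increment of one yields position 0) and the choice of 1 plus the number of removed stop words as the increment are assumptions made for the example.

```java
import java.util.Arrays;

final class PositionIncrementSketch {
    // Converts per-token increments into absolute positions.
    // Assumption: counting starts at -1, so a first increment of one gives position 0.
    static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += increments[i]; // each token is placed relative to the previous one
            result[i] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        // Source text "the quick fox" with the stop word "the" removed.
        // "quick" gets increment 2 (assumed: 1 + one removed stop word),
        // a second form of the same word gets increment 0,
        // and "fox" follows normally with increment 1.
        int[] increments = {2, 0, 1};
        System.out.println(Arrays.toString(positions(increments)));
        // prints [1, 1, 2]
    }
}
```

The first two tokens share position 1, so a phrase query can match through either form, while the skipped position 0 keeps an exact phrase from matching across the removed stop word.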



setStartOffset
public void setStartOffset(int offset)
Set the starting offset.
See Also:   Token.startOffset()



setTermBuffer
final public void setTermBuffer(char[] buffer, int offset, int length)
Copies the contents of buffer, starting at offset for length characters, into the termBuffer array. NOTE: for better indexing speed you should instead retrieve the termBuffer, using Token.termBuffer() or Token.resizeTermBuffer(int), and fill it in directly to set the term text. This saves an extra copy.



setTermLength
final public void setTermLength(int length)
Set number of valid characters (length of the term) in the termBuffer array.



setTermText
public void setTermText(String text)
Sets the Token's term text. NOTE: for better indexing speed you should instead use the char[] termBuffer methods to set the term text.



setType
final public void setType(String type)
Set the lexical type.
See Also:   Token.type()



startOffset
final public int startOffset()
Returns this Token's starting offset, the position of the first character corresponding to this token in the source text. Note that the difference between endOffset() and startOffset() may not be equal to termText.length(), as the term text may have been altered by a stemmer or some other filter.



termBuffer
final public char[] termBuffer()
Returns the internal termBuffer character array which you can then directly alter. If the array is too small for your token, use Token.resizeTermBuffer(int) to increase it. After altering the buffer be sure to call Token.setTermLength to record the number of valid characters that were placed into the termBuffer.
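The sketch below illustrates why the valid length has to be tracked separately from the array: a reused buffer can still hold characters from an earlier, longer term. It works on a plain char[] and an int length rather than on a real Token, and InPlaceBufferSketch is a hypothetical helper name.

```java
final class InPlaceBufferSketch {
    // Lower-cases only the first `length` characters of the buffer.
    // Characters beyond `length` are treated as stale and left untouched.
    static void lowerCaseInPlace(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            buffer[i] = Character.toLowerCase(buffer[i]);
        }
    }

    public static void main(String[] args) {
        // The buffer still holds a previous 9-character term.
        char[] buffer = "TOKENIZER".toCharArray();

        // A new 3-character term overwrites only the start of the buffer.
        "Fox".getChars(0, 3, buffer, 0);
        int length = 3; // the value one would pass to setTermLength

        lowerCaseInPlace(buffer, length);

        System.out.println(new String(buffer, 0, length)); // prints fox
        System.out.println(new String(buffer));            // prints foxENIZER
    }
}
```

Reading the whole array instead of the first length characters would pick up the stale tail, which is the mistake the recorded term length guards against.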



termLength
final public int termLength()
Return number of valid characters (length of the term) in the termBuffer array.



termText
final public String termText()
Returns the Token's term text. Deprecated: use Token.termBuffer() and Token.termLength() instead.



toString
public String toString()



type
final public String type()
Returns this Token's lexical type. Defaults to "word".



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException

www.java2java.com
Copyright 2009 - 12 Demo Source and Support. All rights reserved.