java.lang.Object
   it.unimi.dsi.mg4j.index.DiskBasedIndex

DiskBasedIndex
public class DiskBasedIndex
A static container providing facilities to load an index based on data stored on disk.

This class contains several useful static methods, such as DiskBasedIndex.readOffsets(InputBitStream,int) and DiskBasedIndex.readSizes(InputBitStream,int), and static factory methods, such as DiskBasedIndex.getInstance(CharSequence,boolean,boolean,boolean,EnumMap), that read the properties associated with the index, identify the it.unimi.dsi.mg4j.index.Index implementation that should be used to load it, and load the necessary data into memory.
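Choosing the implementation from the index properties is why the factory methods declare ClassNotFoundException, InstantiationException and IllegalAccessException. The stdlib-only sketch below illustrates that general pattern. The property key `indexclass` and the class and method names are assumptions for illustration, and the real factory passes the loaded offsets, sizes and maps to the chosen implementation rather than calling a no-argument constructor.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical sketch: instantiate the class named in a properties file. */
public class PropertyClassLoaderDemo {
    // Assumed property key; MG4J's actual key name may differ.
    static final String CLASS_KEY = "indexclass";

    /** Parses properties from text (in MG4J they would come from basename + PROPERTIES_EXTENSION). */
    static Properties parse(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }

    /** Resolves the class named under CLASS_KEY and creates an instance with its no-arg constructor. */
    static Object instantiate(Properties p) throws ReflectiveOperationException {
        String className = p.getProperty(CLASS_KEY);
        if (className == null) throw new IllegalArgumentException("Missing property " + CLASS_KEY);
        Class<?> klass = Class.forName(className);           // may throw ClassNotFoundException
        return klass.getDeclaredConstructor().newInstance(); // may throw InstantiationException, IllegalAccessException, ...
    }

    public static void main(String[] args) throws Exception {
        Properties p = parse("indexclass=java.util.ArrayList\nterms=3\n");
        System.out.println(instantiate(p).getClass().getName()); // java.util.ArrayList
    }
}
```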

Optionally, a disk-based index can be loaded into main memory (key: Index.UriKeys.INMEMORY), returning an it.unimi.dsi.mg4j.index.InMemoryIndex / InMemoryHPIndex, or mapped into main memory (key: Index.UriKeys.MAPPED), returning a MemoryMappedIndex / InMemoryHPIndex (the value assigned to either key is irrelevant; only its presence matters). In both cases, limitations of the Java platform prevent using indices larger than two gigabytes (see MemoryMappedIndex for more on this topic).
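The two-gigabyte limit comes from Java addressing arrays and buffers with an int: FileChannel.map rejects regions larger than Integer.MAX_VALUE bytes, so a bigger file cannot be mapped into a single buffer. The sketch below only illustrates that arithmetic with standard NIO calls; the class and method names are hypothetical and it does not describe how MemoryMappedIndex works internally.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of the single-buffer size limit. */
public class MappingLimitDemo {
    /** True if a file of the given length can be mapped into a single buffer. */
    static boolean fitsInOneBuffer(long length) {
        return length <= Integer.MAX_VALUE;
    }

    /** Number of chunks of at most chunkSize bytes needed to cover length bytes. */
    static long chunksNeeded(long length, int chunkSize) {
        return (length + chunkSize - 1) / chunkSize;
    }

    /** Maps a whole file read-only, refusing files that exceed the single-buffer limit. */
    static MappedByteBuffer mapWhole(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (!fitsInOneBuffer(size)) throw new IOException("File too large for one buffer: " + size);
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }
}
```

Splitting a large file into several mapped chunks is one common workaround; it moves the complexity into translating a long position into a (chunk, offset) pair.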

Moreover, by default the term-offset list is accessed using an it.unimi.dsi.mg4j.util.SemiExternalOffsetList with a step of DiskBasedIndex.DEFAULT_OFFSET_STEP. This behaviour can be changed using the URI key Index.UriKeys.OFFSETSTEP.
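The idea behind a sampled offset list is to keep only every step-th absolute offset and recover the others by adding up the gaps between consecutive offsets. The class below is a simplified, fully in-memory analogue with a hypothetical name: it keeps the gaps in an int array, whereas the name SemiExternalOffsetList suggests that part of the data stays on disk. It is a sketch of the technique, not MG4J's implementation.

```java
/** Hypothetical in-memory analogue of a sampled offset list. */
public class SampledOffsetList {
    private final long[] samples; // samples[k] = offset of element k * step
    private final int[] gaps;     // gaps[i] = offset[i + 1] - offset[i]
    private final int step;
    private final int size;

    public SampledOffsetList(long[] offsets, int step) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        this.step = step;
        this.size = offsets.length;
        this.samples = new long[(size + step - 1) / step];
        for (int k = 0; k < samples.length; k++) samples[k] = offsets[k * step];
        this.gaps = new int[Math.max(0, size - 1)];
        // Offsets are assumed nondecreasing with gaps that fit in an int.
        for (int i = 0; i + 1 < size; i++) gaps[i] = Math.toIntExact(offsets[i + 1] - offsets[i]);
    }

    /** Returns offset i: start from the nearest preceding sample, then add at most step - 1 gaps. */
    public long get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException(Integer.toString(i));
        int k = i / step;
        long offset = samples[k];
        for (int j = k * step; j < i; j++) offset += gaps[j];
        return offset;
    }

    public int size() { return size; }
}
```

A larger step keeps fewer absolute values in memory at the price of up to step - 1 additions per lookup, which is the trade-off the OFFSETSTEP key controls.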

Disk-based indices are the workhorse of MG4J. All other indices (clustered, remote, etc.) ultimately rely on disk-based indices to provide results.

Note that not all data produced by it.unimi.dsi.mg4j.tool.Scan and by the other indexing utilities is actually necessary to run a disk-based index. Usually the property file and the index file (plus the positions file, for ) are sufficient. If random access is needed, the offsets file must also be present; if the compression method requires document sizes, or if sizes are requested explicitly, the sizes file must also be present. A StringMap and possibly a PrefixMap will be fetched automatically by DiskBasedIndex.getInstance(CharSequence,boolean,boolean) using the standard extensions.
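The file requirements above amount to a small decision procedure. The helper below is hypothetical and not part of MG4J; the extension strings are assumptions (in real code the DiskBasedIndex *_EXTENSION constants should be used), and whether an index keeps positions in a separate file or needs sizes for its compression method is modelled as plain booleans. Term and prefix maps are optional and therefore not listed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper listing the files a disk-based index needs at load time. */
public class RequiredFiles {
    // Assumed extension values, for illustration only.
    static final String PROPERTIES = ".properties";
    static final String INDEX = ".index";
    static final String POSITIONS = ".positions";
    static final String OFFSETS = ".offsets";
    static final String SIZES = ".sizes";

    static List<String> required(String basename, boolean separatePositions, boolean randomAccess,
                                 boolean compressionNeedsSizes, boolean sizesRequested) {
        List<String> files = new ArrayList<>();
        files.add(basename + PROPERTIES);                           // always needed
        files.add(basename + INDEX);                                // always needed
        if (separatePositions) files.add(basename + POSITIONS);     // only for index types with a positions file
        if (randomAccess) files.add(basename + OFFSETS);            // random access needs term offsets
        if (compressionNeedsSizes || sizesRequested) files.add(basename + SIZES);
        return files;
    }
}
```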

Thread safety

A disk-based index is thread safe as long as its offset list, its size list and its term/prefix map are. The static factory methods provided by this class load offsets and sizes into thread-safe data structures. If you invoke a constructor directly, it is your responsibility to pass thread-safe data structures.
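One simple way to obtain a structure that many threads can read without locking is to make it immutable and publish it through a final field. The sketch below shows such a read-only offset list; the class name is hypothetical and MG4J's own thread-safe structures may be built differently.

```java
import java.util.AbstractList;
import java.util.Arrays;

/** Hypothetical immutable list of offsets: safe to share between threads once constructed. */
public final class ImmutableLongList extends AbstractList<Long> {
    // Final field: the array contents written in the constructor are visible to all threads.
    private final long[] values;

    public ImmutableLongList(long[] values) {
        this.values = Arrays.copyOf(values, values.length); // defensive copy: callers cannot mutate it later
    }

    /** Primitive accessor that avoids boxing. */
    public long getLong(int i) { return values[i]; }

    @Override public Long get(int i) { return values[i]; }

    @Override public int size() { return values.length; }

    // set/add/remove are inherited from AbstractList and throw UnsupportedOperationException.
}
```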
author:
   Sebastiano Vigna
since:
   1.1



Field Summary
final public static int DEFAULT_OFFSET_STEP
     The default value for the query parameter Index.UriKeys.OFFSETSTEP.
final public static String FREQUENCIES_EXTENSION
     Standard extension for the file of frequencies.
final public static String GLOBCOUNTS_EXTENSION
     Standard extension for the file of global counts.
final public static String INDEX_EXTENSION
     Standard extension for the index bitstream.
final public static String OFFSETS_EXTENSION
     Standard extension for the file of offsets.
final public static String POSITIONS_EXTENSION
     Standard extension for the positions bitstream of an .
final public static String PREFIXMAP_EXTENSION
     Standard extension for the prefix map.
final public static String PROPERTIES_EXTENSION
     Standard extension for the index properties.
final public static String SIZES_EXTENSION
     Standard extension for the file of sizes.
final public static String STATS_EXTENSION
     Standard extension for the stats file.
final public static String TERMMAP_EXTENSION
     Standard extension for the term map.
final public static String TERMS_EXTENSION
     Standard extension for the file of terms.
final public static String UNSORTED_TERMS_EXTENSION
     Standard extension for the file of terms, unsorted.


Method Summary
public static BitStreamIndex getInstance(CharSequence basename, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, boolean randomAccess, boolean documentSizes, EnumMap<UriKeys, String> queryProperties)
     Returns a new disk-based index, loading exactly the specified parts and using preloaded Properties.
public static BitStreamIndex getInstance(CharSequence basename, Properties properties, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<UriKeys, String> queryProperties)
     Returns a new disk-based index, using preloaded Properties and possibly guessing reasonable term and prefix maps from the basename.
public static BitStreamIndex getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<UriKeys, String> queryProperties)
     Returns a new disk-based index, possibly guessing reasonable term and prefix maps from the basename.

If there is a term map file (basename stemmed with .termmap), it is used as term map and, in case it implements PrefixMap, also as prefix map.

public static BitStreamIndex getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, boolean maps)
     Returns a new disk-based index, possibly guessing reasonable term and prefix maps from the basename.

If there is a term map file (basename stemmed with .termmap), it is used as term map and, in case it implements PrefixMap, also as prefix map.

public static BitStreamIndex getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes)
     Returns a new disk-based index, guessing reasonable term and prefix maps from the basename.
public static BitStreamIndex getInstance(CharSequence basename, boolean randomAccess)
     Returns a new local index, trying to guess reasonable term and prefix maps from the basename, and loading document sizes only if necessary.
public static BitStreamIndex getInstance(CharSequence basename)
     Returns a new local index, trying to guess reasonable term and prefix maps from the basename, loading offsets, and loading document sizes only if necessary.
public static PrefixMap<? extends CharSequence> loadPrefixMap(String filename)
     Utility static method that loads a prefix map.
Parameters:
  filename - the name of the file containing the prefix map.
public static StringMap<? extends CharSequence> loadStringMap(String filename)
     Utility static method that loads a term map.
Parameters:
  filename - the name of the file containing the term map.
public static LongList readOffsets(InputBitStream in, int T)
     Utility method to load a compressed offset file into a list.
Parameters:
  in - the input bit stream providing the offsets (see BitStreamIndexWriter).
Parameters:
  T - the number of terms indexed.
public static IntList readSizes(InputBitStream in, int N)
     Utility method to load a compressed size file into a list.
Parameters:
  in - the input bit stream providing the sizes (see BitStreamIndexWriter).
Parameters:
  N - the number of documents indexed.

Field Detail
DEFAULT_OFFSET_STEP
final public static int DEFAULT_OFFSET_STEP
The default value for the query parameter Index.UriKeys.OFFSETSTEP.

FREQUENCIES_EXTENSION
final public static String FREQUENCIES_EXTENSION
Standard extension for the file of frequencies.

GLOBCOUNTS_EXTENSION
final public static String GLOBCOUNTS_EXTENSION
Standard extension for the file of global counts.

INDEX_EXTENSION
final public static String INDEX_EXTENSION
Standard extension for the index bitstream.

OFFSETS_EXTENSION
final public static String OFFSETS_EXTENSION
Standard extension for the file of offsets.

POSITIONS_EXTENSION
final public static String POSITIONS_EXTENSION
Standard extension for the positions bitstream of an .

PREFIXMAP_EXTENSION
final public static String PREFIXMAP_EXTENSION
Standard extension for the prefix map.

PROPERTIES_EXTENSION
final public static String PROPERTIES_EXTENSION
Standard extension for the index properties.

SIZES_EXTENSION
final public static String SIZES_EXTENSION
Standard extension for the file of sizes.

STATS_EXTENSION
final public static String STATS_EXTENSION
Standard extension for the stats file.

TERMMAP_EXTENSION
final public static String TERMMAP_EXTENSION
Standard extension for the term map.

TERMS_EXTENSION
final public static String TERMS_EXTENSION
Standard extension for the file of terms.

UNSORTED_TERMS_EXTENSION
final public static String UNSORTED_TERMS_EXTENSION
Standard extension for the file of terms, unsorted.



Method Detail
getInstance
public static BitStreamIndex getInstance(CharSequence basename, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, boolean randomAccess, boolean documentSizes, EnumMap<UriKeys, String> queryProperties) throws ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, loading exactly the specified parts and using preloaded Properties.
Parameters:
  basename - the basename of the index.
Parameters:
  properties - the properties obtained from the given basename.
Parameters:
  termMap - the term map for this index, or null for no term map.
Parameters:
  prefixMap - the prefix map for this index, or null for no prefix map.
Parameters:
  randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(int) on the index readers returned by the index).
Parameters:
  documentSizes - if true, document sizes will be loaded (note that document sizes might sometimes be loaded anyway because the compression method for positions requires them).
Parameters:
  queryProperties - a map containing associations between Index.UriKeys and values, or null.
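The queryProperties argument is an EnumMap keyed by Index.UriKeys. The sketch below uses a hypothetical enum mirroring the three keys named on this page (INMEMORY, MAPPED, OFFSETSTEP) to show how such a map can be built and queried. It is not MG4J's parsing code, and the default step is passed in explicitly because its actual value is not given here.

```java
import java.util.EnumMap;

/** Hypothetical illustration of an EnumMap of query properties. */
public class QueryPropertiesDemo {
    /** Stand-in for the Index.UriKeys constants mentioned in this documentation. */
    enum UriKey { INMEMORY, MAPPED, OFFSETSTEP }

    /** Only the presence of INMEMORY matters; its value is ignored. */
    static boolean inMemory(EnumMap<UriKey, String> q) {
        return q != null && q.containsKey(UriKey.INMEMORY);
    }

    /** Returns the requested offset step, or defaultStep if the map is null or lacks the key. */
    static int offsetStep(EnumMap<UriKey, String> q, int defaultStep) {
        if (q == null || !q.containsKey(UriKey.OFFSETSTEP)) return defaultStep;
        return Integer.parseInt(q.get(UriKey.OFFSETSTEP).trim());
    }

    public static void main(String[] args) {
        EnumMap<UriKey, String> q = new EnumMap<>(UriKey.class);
        q.put(UriKey.INMEMORY, "1");
        q.put(UriKey.OFFSETSTEP, "64");
        int exampleDefault = 100; // arbitrary example value, not MG4J's DEFAULT_OFFSET_STEP
        System.out.println(inMemory(q) + " " + offsetStep(q, exampleDefault)); // true 64
    }
}
```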



getInstance
public static BitStreamIndex getInstance(CharSequence basename, Properties properties, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<UriKeys, String> queryProperties) throws ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, using preloaded Properties and possibly guessing reasonable term and prefix maps from the basename.
Parameters:
  basename - the basename of the index.
Parameters:
  properties - the properties obtained by stemming basename.
Parameters:
  randomAccess - whether the index should be accessible randomly.
Parameters:
  documentSizes - if true, document sizes will be loaded.
Parameters:
  maps - if true, term and prefix maps will be guessed and loaded.
Parameters:
  queryProperties - a map containing associations between Index.UriKeys and values, or null.
throws:
  IllegalAccessException -
throws:
  InstantiationException -
See Also:   DiskBasedIndex.getInstance(CharSequence,Properties,StringMap,PrefixMap,boolean,boolean,EnumMap)



getInstance
public static BitStreamIndex getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, boolean maps, EnumMap<UriKeys, String> queryProperties) throws ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, possibly guessing reasonable term and prefix maps from the basename.

If there is a term map file (basename stemmed with .termmap), it is used as term map and, in case it implements PrefixMap, also as prefix map. Otherwise, we search for a prefix map (basename stemmed with .prefixmap) and, if it implements StringMap and no term map has been found, we also use it as term map.
Parameters:
  basename - the basename of the index.
Parameters:
  randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(int) on the index readers returned by the index).
Parameters:
  documentSizes - if true, document sizes will be loaded (note that document sizes might sometimes be loaded anyway because the compression method for positions requires them).
Parameters:
  maps - if true, term and prefix maps will be guessed and loaded (this feature might not be available with some kinds of index).
Parameters:
  queryProperties - a map containing associations between Index.UriKeys and values, or null.
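Reading the map-guessing rule as "a term map that is also a prefix map fills both roles; otherwise a prefix map is looked up, and it also serves as term map if it is one and no term map was found", the control flow fits in a few lines. In this sketch TermMap and PrefixLookup are hypothetical minimal interfaces standing in for StringMap and PrefixMap, and the already-loaded objects are passed in (null when the .termmap or .prefixmap file is absent) instead of being read from disk.

```java
/** Hypothetical sketch of the term-/prefix-map guessing rule. */
public class MapGuessing {
    interface TermMap { }      // stand-in for StringMap
    interface PrefixLookup { } // stand-in for PrefixMap

    /** Result holder; either field may be null. */
    static final class Maps {
        final TermMap termMap;
        final PrefixLookup prefixMap;
        Maps(TermMap termMap, PrefixLookup prefixMap) { this.termMap = termMap; this.prefixMap = prefixMap; }
    }

    /**
     * @param fromTermMapFile   object loaded from basename + ".termmap", or null if absent
     * @param fromPrefixMapFile object loaded from basename + ".prefixmap", or null if absent
     */
    static Maps guess(Object fromTermMapFile, Object fromPrefixMapFile) {
        TermMap termMap = null;
        PrefixLookup prefixMap = null;
        if (fromTermMapFile instanceof TermMap) {
            termMap = (TermMap) fromTermMapFile;
            // A term map that also answers prefix queries doubles as the prefix map.
            if (fromTermMapFile instanceof PrefixLookup) prefixMap = (PrefixLookup) fromTermMapFile;
        }
        if (prefixMap == null && fromPrefixMapFile instanceof PrefixLookup) {
            prefixMap = (PrefixLookup) fromPrefixMapFile;
            // With no term map found, a prefix map that is also a term map fills both roles.
            if (termMap == null && fromPrefixMapFile instanceof TermMap) termMap = (TermMap) fromPrefixMapFile;
        }
        return new Maps(termMap, prefixMap);
    }
}
```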




getInstance
public static BitStreamIndex getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes, boolean maps) throws ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, possibly guessing reasonable term and prefix maps from the basename.

If there is a term map file (basename stemmed with .termmap), it is used as term map and, in case it implements PrefixMap, also as prefix map. Otherwise, we search for a prefix map (basename stemmed with .prefixmap) and, if it implements StringMap and no term map has been found, we also use it as term map.
Parameters:
  basename - the basename of the index.
Parameters:
  randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(int) on the index readers returned by the index).
Parameters:
  documentSizes - if true, document sizes will be loaded (note that document sizes might sometimes be loaded anyway because the compression method for positions requires them).
Parameters:
  maps - if true, term and prefix maps will be guessed and loaded (this feature might not be available with some kinds of index).
See Also:   DiskBasedIndex.getInstance(CharSequence,boolean,boolean,boolean,EnumMap)




getInstance
public static BitStreamIndex getInstance(CharSequence basename, boolean randomAccess, boolean documentSizes) throws ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new disk-based index, guessing reasonable term and prefix maps from the basename.
Parameters:
  basename - the basename of the index.
Parameters:
  randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(int) on the index readers returned by the index).
Parameters:
  documentSizes - if true, document sizes will be loaded (note that document sizes might sometimes be loaded anyway because the compression method for positions requires them).



getInstance
public static BitStreamIndex getInstance(CharSequence basename, boolean randomAccess) throws ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new local index, trying to guess reasonable term and prefix maps from the basename, and loading document sizes only if necessary.
Parameters:
  basename - the basename of the index.
Parameters:
  randomAccess - whether the index should be accessible randomly (e.g., if it will be possible to call IndexReader.documents(int) on the index readers returned by the index).



getInstance
public static BitStreamIndex getInstance(CharSequence basename) throws ConfigurationException, ClassNotFoundException, IOException, InstantiationException, IllegalAccessException
Returns a new local index, trying to guess reasonable term and prefix maps from the basename, loading offsets, and loading document sizes only if necessary.
Parameters:
  basename - the basename of the index.



loadPrefixMap
public static PrefixMap<? extends CharSequence> loadPrefixMap(String filename) throws IOException
Utility static method that loads a prefix map.
Parameters:
  filename - the name of the file containing the prefix map.
Returns:
  the map, or null if the file did not exist.
throws:
  IOException - if an IOException other than FileNotFoundException occurred.



loadStringMap
public static StringMap<? extends CharSequence> loadStringMap(String filename) throws IOException
Utility static method that loads a term map.
Parameters:
  filename - the name of the file containing the term map.
Returns:
  the map, or null if the file did not exist.
throws:
  IOException - if an IOException other than FileNotFoundException occurred.
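Both loaders follow a "null if missing, exception otherwise" contract. A minimal stdlib sketch of that contract follows, assuming the map was stored with standard Java serialization; MG4J's own loaders may use a different utility, and the class and method names are hypothetical. Note that FileInputStream also throws FileNotFoundException when the path exists but cannot be opened (for instance, a directory), so this sketch returns null in that case too.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.ObjectInputStream;

/** Hypothetical sketch of the load-or-null contract. */
public class LoadOrNull {
    /** Deserializes an object from filename, or returns null if the file cannot be found. */
    static Object loadOrNull(String filename) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(filename)))) {
            return in.readObject();
        } catch (FileNotFoundException e) {
            return null; // a missing file is not an error: the caller simply gets no map
        } catch (ClassNotFoundException e) {
            throw new IOException("Cannot deserialize " + filename, e);
        }
        // Any other IOException (e.g., a corrupted stream) propagates to the caller.
    }
}
```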



readOffsets
public static LongList readOffsets(InputBitStream in, int T) throws IOException
Utility method to load a compressed offset file into a list.
Parameters:
  in - the input bit stream providing the offsets (see BitStreamIndexWriter).
Parameters:
  T - the number of terms indexed.
Returns:
  a list of longs backed by an array; the list has an additional final element of index T that gives the number of bytes of the index file.
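Because the returned list has T + 1 elements, the length of the inverted list of term t is the difference between elements t + 1 and t, and the last element gives the length of the whole index file. The sketch below assumes the file stores the first offset followed by T gaps, already decoded into a long array so that no particular bit-level code is assumed, and rebuilds the cumulative offsets by prefix sums. Class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: rebuilding an offset list of T + 1 elements from decoded gaps. */
public class OffsetLists {
    /**
     * @param decoded T + 1 values, assumed to be the first offset followed by T gaps
     * @return cumulative offsets; element T is the length of the whole index file
     */
    static long[] cumulative(long[] decoded) {
        long[] offset = new long[decoded.length];
        long sum = 0;
        for (int i = 0; i < decoded.length; i++) {
            sum += decoded[i];
            offset[i] = sum;
        }
        return offset;
    }

    /** Length of the inverted list of term t: offset[t + 1] - offset[t]. */
    static long listLength(long[] offset, int t) {
        return offset[t + 1] - offset[t];
    }

    public static void main(String[] args) {
        long[] offset = cumulative(new long[] { 0, 10, 4, 7 }); // T = 3 terms
        System.out.println(Arrays.toString(offset));            // [0, 10, 14, 21]
        System.out.println(listLength(offset, 1));              // 4
    }
}
```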



readSizes
public static IntList readSizes(InputBitStream in, int N) throws IOException
Utility method to load a compressed size file into a list.
Parameters:
  in - the input bit stream providing the sizes (see BitStreamIndexWriter).
Parameters:
  N - the number of documents indexed.
Returns:
  a list of integers backed by an array.
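This page does not say which instantaneous code the size file uses. Assuming, for illustration, the Elias γ code shifted so that 0 is representable (x is written as the γ code of x + 1), a self-contained encoder and decoder look like this. BitWriter and BitReader are hypothetical stand-ins for OutputBitStream and InputBitStream, and readSizes here only mirrors the role of the method above.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of gamma-coded size lists (x >= 0 is written as the gamma code of x + 1). */
public class GammaSizes {
    /** Minimal MSB-first bit writer. */
    static final class BitWriter {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();
        private int current, filled;

        void writeBit(int bit) {
            current = (current << 1) | bit;
            if (++filled == 8) { out.write(current); current = 0; filled = 0; }
        }

        void writeGamma(int x) {
            int y = x + 1;                                // shift so that 0 can be coded
            int n = 31 - Integer.numberOfLeadingZeros(y); // bits after the leading 1
            for (int i = 0; i < n; i++) writeBit(0);      // unary length prefix: n zeros...
            writeBit(1);                                  // ...terminated by a one
            for (int i = n - 1; i >= 0; i--) writeBit((y >>> i) & 1);
        }

        byte[] toByteArray() {
            while (filled != 0) writeBit(0);              // pad the last byte with zeros
            return out.toByteArray();
        }
    }

    /** Minimal MSB-first bit reader. */
    static final class BitReader {
        private final byte[] data;
        private long pos; // current bit position

        BitReader(byte[] data) { this.data = data; }

        int readBit() {
            int b = (data[(int) (pos >>> 3)] >>> (7 - (int) (pos & 7))) & 1;
            pos++;
            return b;
        }

        int readGamma() {
            int n = 0;
            while (readBit() == 0) n++;                   // length of the unary prefix
            int y = 1;
            for (int i = 0; i < n; i++) y = (y << 1) | readBit();
            return y - 1;
        }
    }

    /** Decodes n sizes, in the spirit of readSizes(InputBitStream, int). */
    static int[] readSizes(BitReader in, int n) {
        int[] sizes = new int[n];
        for (int i = 0; i < n; i++) sizes[i] = in.readGamma();
        return sizes;
    }
}
```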



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
