Index.java (MG4J, package it.unimi.dsi.mg4j.index): cross-referenced source code

package it.unimi.dsi.mg4j.index;

/*
 * MG4J: Managing Gigabytes for Java
 *
 * Copyright (C) 2004-2007 Sebastiano Vigna
 *
 *  This library is free software; you can redistribute it and/or modify it
 *  under the terms of the GNU Lesser General Public License as published by the Free
 *  Software Foundation; either version 2.1 of the License, or (at your option)
 *  any later version.
 *
 *  This library is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
 *  for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 */

import it.unimi.dsi.fastutil.ints.IntIterator;
import it.unimi.dsi.fastutil.ints.IntIterators;
import it.unimi.dsi.fastutil.ints.IntList;
import it.unimi.dsi.fastutil.objects.Reference2ReferenceMap;
import it.unimi.dsi.fastutil.objects.ReferenceSet;
import it.unimi.dsi.fastutil.objects.ReferenceSets;
import it.unimi.dsi.lang.ObjectParser;
import it.unimi.dsi.mg4j.document.PropertyBasedDocumentFactory;
import it.unimi.dsi.mg4j.index.cluster.IndexCluster;
import it.unimi.dsi.mg4j.index.payload.Payload;
import it.unimi.dsi.mg4j.index.remote.IndexServer;
import it.unimi.dsi.mg4j.search.DocumentIterator;
import it.unimi.dsi.mg4j.search.IntervalIterator;
import it.unimi.dsi.mg4j.search.visitor.DocumentIteratorVisitor;
import it.unimi.dsi.mg4j.util.MG4JClassParser;
import it.unimi.dsi.Util;
import it.unimi.dsi.util.ImmutableExternalPrefixMap;
import it.unimi.dsi.util.Properties;
import it.unimi.dsi.util.StringMap;
import it.unimi.dsi.util.PrefixMap;
import it.unimi.dsi.util.StringMaps;

import java.io.IOException;
import java.io.Serializable;
import java.lang.reflect.InvocationTargetException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.EnumMap;

import org.apache.commons.configuration.ConfigurationException;
import org.apache.log4j.Logger;

/** An abstract representation of an index.
 *
 * <P>Concrete subclasses of this class represent abstract index access
 * information: for instance, the basename or IP address/port,
 * flags, etc. It makes it easy to build {@linkplain IndexReader index readers} over the index:
 * in turn, index readers provide {@linkplain it.unimi.dsi.mg4j.search.DocumentIterator document iterators}.
 *
 * <P>In principle, this class should just contain method declarations,
 * and attributes for all data that is common to any form of index.
 * Note that we use an abstract class, rather than an interface, because
 * interfaces do not allow declaring attributes.
 *
 * <P>This class provides static factory methods (e.g., {@link #getInstance(CharSequence)})
 * that return an index given a suitable URI string. If the scheme part is <samp>mg4j</samp>, then
 * the URI is assumed to point at a remote index. Otherwise, it is assumed to be the
 * basename of a local index. In both cases, a query part introduced by <samp>?</samp> can
 * specify additional parameters (<samp><var>key</var>=<var>value</var></samp> pairs separated
 * by <samp>;</samp>). For instance, the URI <samp>example?inmemory=1</samp> will load
 * the index with basename <samp>example</samp>, caching its content in core memory.
 * Please have a look at the constants in {@link Index.UriKeys}
 * (and analogous enums in subclasses) for additional parameters.
 *
 * <h2>Thread safety</h2>
 *
 * <p>Indices are a natural candidate for multithreaded access. An instance of this class
 * <strong>must</strong> be thread safe as long as external data structures provided to its
 * constructors are. For instance, the tool {@link it.unimi.dsi.mg4j.tool.IndexBuilder} generates
 * a {@linkplain StringMaps#synchronize(PrefixMap) synchronized} {@link ImmutableExternalPrefixMap}
 * so that by default the resulting index is thread safe.
 *
 * <p>For instance, a {@link it.unimi.dsi.mg4j.index.DiskBasedIndex} requires a list of
 * term offsets, term maps, etc. As long as all these data structures are thread safe, the
 * same is true of the index. Data structures created by static factory methods such as
 * {@link it.unimi.dsi.mg4j.index.DiskBasedIndex#getInstance(CharSequence)} are thread safe.
 *
 * <p>Note that {@link it.unimi.dsi.mg4j.index.IndexReader}s returned by {@link #getReader()}
 * are <em>not</em> thread safe (even if the method {@link #getReader()} is). The logic behind
 * this arrangement is that you create as many readers as you need, and then
 * {@link java.io.Closeable#close()} them. In a multithreaded environment, a pool of index readers
 * can be created, and a custom {@link it.unimi.dsi.mg4j.query.nodes.QueryBuilderVisitor}
 * can be used to build {@link it.unimi.dsi.mg4j.search.DocumentIterator}s using the given pool of readers. In
 * this case readers are not closed, but rather reused.
 *
 * <h2>Read-once load</h2>
 *
 * <p>Implementations of this class are strongly encouraged to offer <em>read-once</em> constructors
 * and factory methods: property files and other data related to the index (but not to an
 * {@link it.unimi.dsi.mg4j.index.IndexReader}) should be read exactly once, and sequentially.
 * This feature is very useful when {@linkplain it.unimi.dsi.mg4j.tool.Combine combining indices}.
 *
 * @author Paolo Boldi
 * @author Sebastiano Vigna
 * @since 0.9
 */

public abstract class Index implements Serializable {
    private static final Logger LOGGER = Util.getLogger(Index.class);
    private static final long serialVersionUID = 0;

    /** Symbolic names for properties of a {@link it.unimi.dsi.mg4j.index.Index}. */
    public static enum PropertyKeys {
        /** The number of documents in the collection. */
        DOCUMENTS,
        /** The number of terms in the collection. */
        TERMS,
        /** The number of occurrences in the collection. */
        OCCURRENCES,
        /** The number of postings (pairs term/document) in the collection. */
        POSTINGS,
        /** The number of batches this index was (or should be) built from. */
        BATCHES,
        /** The maximum count. */
        MAXCOUNT,
        /** The maximum size (in words) of a document. */
        MAXDOCSIZE,
        /** The specification of the term processor used to build the index
         * (see {@link Index#getTermProcessor(Properties)}). */
        TERMPROCESSOR,
        /** A class for the payloads of this index. */
        PAYLOADCLASS,
        /** The specification of a compression flag. This property can be specified
         * as many times as necessary (e.g., <samp>FREQUENCIES:GAMMA</samp>, <samp>POINTERS:GOLOMB</samp>, etc.). */
        CODING,
        /** The name of the {@link Index} class. */
        INDEXCLASS,
        /** The name of the field indexed by this index, if any. */
        FIELD,
        /** The size in bits of the index. */
        SIZE
    }

    /** Keys to be used (downcased) in specifying additional parameters to a MG4J URI. */

    public static enum UriKeys {
        /** When set, forces loading a local index into core memory. */
        INMEMORY,
        /** When set, forces mapping a local index into core memory. */
        MAPPED,
        /** The step used for creating the offset {@link it.unimi.dsi.mg4j.util.SemiExternalOffsetList}. If
         * set to zero, the offset list will be entirely loaded into core memory. If negative, the list
         * will be memory-mapped, and the absolute value will be used as step. */
        OFFSETSTEP,
        /** The name of a sizes file that will be loaded in case of an {@link IndexCluster}. */
        SIZES,
    }

    /** The field indexed by this index, or <code>null</code>. */
    public final String field;
    /** The properties of this index. It is stored here for convenience (for instance,
     * if custom keys are added to the property file), but it may be <code>null</code>. */
    public final Properties properties;
    /** The number of documents of the collection. */
    public final int numberOfDocuments;
    /** The number of terms of the collection. This field might be set to -1 in some cases
     * (for instance, in certain documental clusters). */
    public final int numberOfTerms;
    /** The number of occurrences of the collection. */
    public final long numberOfOccurrences;
    /** The number of postings (pairs term/document) of the collection. */
    public final long numberOfPostings;
    /** The maximum number of positions in a position list, or -1 if it is unknown. */
    public final int maxCount;
    /** The payload for this index, or <code>null</code>. */
    public final Payload payload;
    /** Whether this index contains payloads; if true, {@link #payload} is non-<code>null</code>. */
    public final boolean hasPayloads;
    /** Whether this index contains counts. */
    public final boolean hasCounts;
    /** Whether this index contains positions. */
    public final boolean hasPositions;
    /** The term processor used to build this index. */
    public final TermProcessor termProcessor;
    /** An immutable singleton set containing just {@link #keyIndex}. */
    public ReferenceSet<Index> singletonSet;
    /** The index used as a key to retrieve intervals. Usually equal to <code>this</code>, but it is {@linkplain #keyIndex(Index) settable}. */
    public Index keyIndex;
    /** The size of each document, or <code>null</code> if sizes are not necessary or not loaded in this index. */
    public final IntList sizes;

    /** Creates a new instance, initialising all fields. */
    protected Index(final int numberOfDocuments,
            final int numberOfTerms, final long numberOfPostings,
            final long numberOfOccurrences, final int maxCount,
            final Payload payload, final boolean hasCounts,
            final boolean hasPositions,
            final TermProcessor termProcessor, final String field,
            final IntList sizes, final Properties properties) {
        this.numberOfDocuments = numberOfDocuments;
        this.numberOfTerms = numberOfTerms;
        this.numberOfPostings = numberOfPostings;
        this.numberOfOccurrences = numberOfOccurrences;
        this.maxCount = maxCount;
        this.payload = payload;
        this.hasPayloads = payload != null;
        this.hasCounts = hasCounts;
        this.hasPositions = hasPositions;
        this.termProcessor = termProcessor;
        this.field = field;
        this.properties = properties;
        this.keyIndex = this;
        this.singletonSet = ReferenceSets.singleton(this);
        this.sizes = sizes;
    }

    /** Returns the term processor specified by the {@link PropertyKeys#TERMPROCESSOR} property.
     *
     * @param properties the properties of an index.
     * @return the term processor obtained by parsing the property value.
     * @throws RuntimeException wrapping any failure, including a missing property.
     */
    protected static TermProcessor getTermProcessor(
            final Properties properties) {
        try {
            // Catch old property files
            if (properties
                    .getProperty(Index.PropertyKeys.TERMPROCESSOR) == null)
                throw new IllegalArgumentException(
                        "No term processor has been specified (most likely, because of an obsolete property file)");
            return ObjectParser.fromSpec(properties
                    .getString(Index.PropertyKeys.TERMPROCESSOR),
                    TermProcessor.class, MG4JClassParser.PACKAGE,
                    new String[] { "getInstance" });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

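    /* Illustrative sketch, not part of the MG4J sources. getTermProcessor() asks
       ObjectParser to build an object from a textual specification, trying a static
       factory method named "getInstance". The block below shows only the underlying
       reflective idea with plain java.lang.reflect: load a class by name and call a
       public static no-argument factory. The class StaticFactorySketch and its method
       create() are hypothetical names; ObjectParser itself may do more (for example,
       resolving package prefixes), which is not reproduced here.

```java
import java.lang.reflect.Method;
import java.text.NumberFormat;

final class StaticFactorySketch {
    // Loads className and invokes its public static no-argument factoryMethod,
    // wrapping reflective failures in a RuntimeException as getTermProcessor() does.
    static <T> T create(String className, Class<T> type, String factoryMethod) {
        try {
            Class<?> klass = Class.forName(className);
            Method factory = klass.getMethod(factoryMethod);
            return type.cast(factory.invoke(null));
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // NumberFormat is used only because the JDK ships it with a static getInstance().
        NumberFormat format = create("java.text.NumberFormat", NumberFormat.class, "getInstance");
        System.out.println(format.getClass().getName());
    }
}
```

       Wrapping every failure in an unchecked exception keeps callers free of checked
       reflective exceptions, at the price of a less specific error type. */
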
    /** Returns a new index using the given URI.
     *
     * <p>If <code>uri</code> has scheme <samp>mg4j</samp>, the index is considered to be remote
     * and index creation is delegated to {@link IndexServer#getIndex(String, int, boolean, boolean)}. Otherwise,
     * we delegate to {@link DiskBasedIndex#getInstance(CharSequence, boolean, boolean, boolean, EnumMap)}.
     *
     * @param uri the URI defining the index.
     * @param randomAccess whether the index should be accessible randomly.
     * @param documentSizes if true, document sizes will be loaded (note that sometimes document sizes
     * might be loaded anyway because the compression method for positions requires it).
     * @param maps if true, {@linkplain StringMap term} and {@linkplain PrefixMap prefix} maps will be guessed and loaded (this
     * feature might not be available with some kinds of index).
     */
    public static Index getInstance(final CharSequence uri,
            final boolean randomAccess, final boolean documentSizes,
            final boolean maps) throws IOException,
            ConfigurationException, URISyntaxException,
            ClassNotFoundException, SecurityException,
            InstantiationException, IllegalAccessException,
            InvocationTargetException, NoSuchMethodException {
        /* If the scheme is mg4j, we create a remote index. If it is file, the
         * basename and the query are taken from the parsed URI. Otherwise, the
         * string is a local basename, optionally followed by a query. */

        final String uriString = uri.toString();
        if (uriString.startsWith("mg4j:")) {
            final URI u = new URI(uriString);
            return IndexServer.getIndex(u.getHost(), u.getPort(),
                    randomAccess, documentSizes);
        }

        final String basename, query;

        if (uriString.startsWith("file:")) {
            final URI u = new URI(uriString);
            basename = u.getPath();
            query = u.getQuery();
        } else {
            final int questionMarkPos = uriString.indexOf('?');
            basename = questionMarkPos == -1 ? uriString : uriString
                    .substring(0, questionMarkPos);
            query = questionMarkPos == -1 ? null : uriString
                    .substring(questionMarkPos + 1);
        }

        LOGGER.debug("Searching for an index with basename " + basename
                + "...");
        Properties properties = new Properties(basename
                + DiskBasedIndex.PROPERTIES_EXTENSION);
        LOGGER.debug("Properties: " + properties);

        // We parse the key/value pairs appearing in the query part.
        final EnumMap<UriKeys, String> queryProperties = new EnumMap<UriKeys, String>(
                UriKeys.class);
        if (query != null) {
            String[] keyValue = query.split(";");
            for (int i = 0; i < keyValue.length; i++) {
                String[] piece = keyValue[i].split("=");
                if (piece.length != 2)
                    throw new IllegalArgumentException(
                            "Malformed key/value pair: " + keyValue[i]);
                // Convert to standard keys
                boolean found = false;
                for (UriKeys key : UriKeys.values())
                    // The assignment inside the condition is intentional.
                    if (found = PropertyBasedDocumentFactory.sameKey(
                            key, piece[0])) {
                        queryProperties.put(key, piece[1]);
                        break;
                    }
                if (!found)
                    throw new IllegalArgumentException("Unknown key: "
                            + piece[0]);
            }
        }

        String className = properties.getString(
                Index.PropertyKeys.INDEXCLASS, "(missing index class)");
        // Temporary patch
        if ("it.unimi.dsi.mg4j.index.SkipFileIndex".equals(className))
            className = FileIndex.class.getName();
        Class<?> indexClass = Class.forName(className);

        // It is a cluster
        if (IndexCluster.class.isAssignableFrom(indexClass))
            return IndexCluster.getInstance(basename, randomAccess,
                    documentSizes, queryProperties);

        // Now we dispatch to DiskBasedIndex.getInstance().
        return DiskBasedIndex.getInstance(basename, properties,
                randomAccess, documentSizes, maps, queryProperties);
    }

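    /* Illustrative sketch, not part of the MG4J sources. The local-basename branch of
       getInstance() splits the URI string at the first '?', then splits the query at ';'
       and each pair at '='. The standalone block below mirrors those steps so that the
       accepted syntax can be tried in isolation. IndexUriSketch and its nested UriKey
       enum are hypothetical names; matching keys by upper-casing them is an assumption
       standing in for PropertyBasedDocumentFactory.sameKey(), whose exact rules are not
       shown in this file.

```java
import java.util.EnumMap;
import java.util.Locale;

final class IndexUriSketch {
    // Local copy of the key names declared by Index.UriKeys.
    enum UriKey { INMEMORY, MAPPED, OFFSETSTEP, SIZES }

    // Everything before the first '?', or the whole string if there is no query.
    static String basename(String uri) {
        int questionMarkPos = uri.indexOf('?');
        return questionMarkPos == -1 ? uri : uri.substring(0, questionMarkPos);
    }

    // Parses key=value pairs separated by ';' into an EnumMap.
    static EnumMap<UriKey, String> query(String uri) {
        EnumMap<UriKey, String> result = new EnumMap<>(UriKey.class);
        int questionMarkPos = uri.indexOf('?');
        if (questionMarkPos == -1) return result;
        for (String pair : uri.substring(questionMarkPos + 1).split(";")) {
            String[] piece = pair.split("=");
            if (piece.length != 2)
                throw new IllegalArgumentException("Malformed key/value pair: " + pair);
            UriKey key;
            try {
                // Assumption: keys are matched ignoring case.
                key = UriKey.valueOf(piece[0].toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException("Unknown key: " + piece[0]);
            }
            result.put(key, piece[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String uri = "example?inmemory=1;offsetstep=64";
        System.out.println(basename(uri) + " " + query(uri));
    }
}
```

       As in the original loop, a pair without exactly one '=' is rejected rather than
       silently ignored. */
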
    /** Returns a new index using the given URI, searching dynamically for term and prefix maps.
     *
     * @param uri the URI defining the index.
     * @param randomAccess whether the index should be accessible randomly.
     * @param documentSizes if true, document sizes will be loaded (note that sometimes document sizes
     * might be loaded anyway because the compression method for positions requires it).
     * @see #getInstance(CharSequence, boolean, boolean, boolean)
     */
    public static Index getInstance(final CharSequence uri,
            final boolean randomAccess, final boolean documentSizes)
            throws IOException, ConfigurationException,
            URISyntaxException, ClassNotFoundException,
            SecurityException, InstantiationException,
            IllegalAccessException, InvocationTargetException,
            NoSuchMethodException {
        return getInstance(uri, randomAccess, documentSizes, true);
    }

    /** Returns a new index using the given URI, searching dynamically for term and prefix maps and loading
     * document sizes only if it is necessary.
     *
     * @param uri the URI defining the index.
     * @param randomAccess whether the index should be accessible randomly.
     * @see #getInstance(CharSequence, boolean, boolean)
     */
    public static Index getInstance(final CharSequence uri,
            final boolean randomAccess) throws ConfigurationException,
            IOException, URISyntaxException, ClassNotFoundException,
            SecurityException, InstantiationException,
            IllegalAccessException, InvocationTargetException,
            NoSuchMethodException {
        return getInstance(uri, randomAccess, false);
    }

    /** Returns a new index using the given URI, searching dynamically for term and prefix maps, loading offsets but loading
     * document sizes only if it is necessary.
     *
     * @param uri the URI defining the index.
     * @see #getInstance(CharSequence, boolean)
     */
    public static Index getInstance(final CharSequence uri)
            throws ConfigurationException, IOException,
            URISyntaxException, ClassNotFoundException,
            SecurityException, InstantiationException,
            IllegalAccessException, InvocationTargetException,
            NoSuchMethodException {
        return getInstance(uri, true);
    }

    /** An iterator returning no documents based on this index.
     *
     * <P>Note that {@link #accept(DocumentIteratorVisitor)} does nothing
     * and returns true, whereas {@link #acceptOnTruePaths(DocumentIteratorVisitor)}
     * throws an {@link IllegalStateException}.
     */
    protected class EmptyIndexIterator extends
            IntIterators.EmptyIterator implements IndexIterator,
            Serializable {
        private static final long serialVersionUID = 0;

        public int document() {
            throw new IllegalStateException();
        }

        public ReferenceSet<Index> indices() {
            return Index.this.singletonSet;
        }

        public IntervalIterator intervalIterator() {
            throw new IllegalStateException();
        }

        public Reference2ReferenceMap<Index, IntervalIterator> intervalIterators() {
            throw new IllegalStateException();
        }

        public IntervalIterator intervalIterator(final Index index) {
            throw new IllegalStateException();
        }

        public int nextDocument() {
            return -1;
        }

        public int skipTo(final int n) {
            return Integer.MAX_VALUE;
        }

        public int frequency() {
            return 0;
        }

        public Payload payload() {
            throw new IllegalStateException();
        }

        public int count() {
            throw new IllegalStateException();
        }

        public IntIterator positions() {
            throw new IllegalStateException();
        }

        public int positions(final int[] positions) {
            throw new IllegalStateException();
        }

        public int[] positionArray() {
            throw new IllegalStateException();
        }

        public void dispose() {
        }

        public Index index() {
            return Index.this;
        }

        public boolean accept(DocumentIteratorVisitor visitor) {
            return true;
        }

        public boolean acceptOnTruePaths(DocumentIteratorVisitor visitor) {
            throw new IllegalStateException();
        }

        public String term() {
            return null;
        }

        public void term(final CharSequence term) { /* No-op allowed by contract. */
        }

        public int id() {
            return -1;
        }

        public void id(final int id) { /* No-op allowed by contract. */
        }

        public IntervalIterator iterator() {
            return intervalIterator();
        }

        public int termNumber() {
            return -1;
        }
    }

    /** A singleton for an iterator returning no documents based on this index. */
    public final EmptyIndexIterator emptyIndexIterator = new EmptyIndexIterator();

    /** Creates and returns a new {@link IndexReader} based on this index, using
     * the default buffer size. After that, you can use the reader to read this index.
     *
     * @return a new {@link IndexReader} to read this index.
     */
    public IndexReader getReader() throws IOException {
        return getReader(-1);
    }

    /** Creates and returns a new {@link IndexReader} based on this index. After that, you
     *  can use the reader to read this index.
     *
     * @param bufferSize the size of the buffer to be used accessing the reader, or -1
     * for a default buffer size.
     * @return a new {@link IndexReader} to read this index.
     */
    public abstract IndexReader getReader(final int bufferSize)
            throws IOException;

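    /* Illustrative sketch, not part of the MG4J sources. The class documentation notes
       that readers returned by getReader() are not thread safe and suggests keeping a
       pool of readers that are reused instead of closed. The block below is one minimal
       way to build such a pool on top of java.util.concurrent. ReaderPoolSketch and its
       methods are hypothetical names, and the pool is generic so that the example runs
       without MG4J; in real use the type parameter would be IndexReader and the factory
       would call getReader().

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

final class ReaderPoolSketch<R> {
    // Readers that are currently not borrowed by any thread.
    private final BlockingQueue<R> idle;

    ReaderPoolSketch(int size, Supplier<R> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get());
    }

    // Borrows a reader (blocking while none is idle), runs the task, and returns
    // the reader to the pool even if the task throws.
    <T> T withReader(Function<R, T> task) {
        final R reader;
        try {
            reader = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        try {
            return task.apply(reader);
        } finally {
            idle.add(reader);
        }
    }

    int available() {
        return idle.size();
    }

    public static void main(String[] args) {
        // StringBuilder stands in for a non-thread-safe reader.
        ReaderPoolSketch<StringBuilder> pool = new ReaderPoolSketch<>(2, StringBuilder::new);
        System.out.println(pool.withReader(r -> r.append("x").length()));
    }
}
```

       Each borrowed reader is used by one thread at a time, which is the property the
       class documentation asks for; the readers themselves stay unsynchronized. */
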
    /** Creates a new {@link IndexReader} for this index and uses it to return
     * an index iterator over the documents containing a term.
     *
     * <p>Since the reader is created from scratch, it is essential
     * to {@linkplain it.unimi.dsi.mg4j.search.DocumentIterator#dispose() dispose} the
     * returned iterator after usage. See {@link IndexReader#documents(int)}
     * for a method with the same semantics, but making reader reuse possible.
     *
     * @param term a term.
     * @throws IOException if an exception occurred while accessing the index.
     * @throws UnsupportedOperationException if this index is not accessible by term
     * number.
     * @see IndexReader#documents(int)
     */
    public IndexIterator documents(final int term) throws IOException {
        final IndexReader indexReader = getReader();
        final IndexIterator indexIterator = indexReader.documents(term);
        if (indexIterator == emptyIndexIterator)
            indexReader.close();
        return indexIterator;
    }

    /** Creates a new {@link IndexReader} for this index and uses it to return
     * an index iterator over the documents containing a term; the term is
     *  given explicitly, and the index {@linkplain StringMap term map} is used, if present.
     *
     * <p>Since the reader is created from scratch, it is essential
     * to {@linkplain it.unimi.dsi.mg4j.search.DocumentIterator#dispose() dispose} the
     * returned iterator after usage. See {@link IndexReader#documents(int)}
     * for a method with the same semantics, but making reader reuse possible.
     *
     * <p>Unless the {@linkplain Index#termProcessor term processor} of
     * this index is <code>null</code>, words coming from a query will
     * have to be processed before being used with this method.
     *
     * @param term a term.
     * @throws IOException if an exception occurred while accessing the index.
     * @throws UnsupportedOperationException if the {@linkplain StringMap term map} is not
     * available for this index.
     * @see IndexReader#documents(CharSequence)
     */
    public IndexIterator documents(final CharSequence term)
            throws IOException {
        final IndexReader indexReader = getReader();
        final IndexIterator indexIterator = indexReader.documents(term);
        if (indexIterator == emptyIndexIterator)
            indexReader.close();
        return indexIterator;
    }

    /** Creates a number of instances of {@link IndexReader} for this index and uses them to return
     * a document iterator over the documents containing a set of terms defined
     * by a prefix; the prefix is given explicitly, and unless the index has a
     * {@linkplain PrefixMap prefix map}, an {@link UnsupportedOperationException}
     * will be thrown.
     *
     * <p>This method is not provided by {@link IndexReader} because it requires the
     * creation of several index readers at the same time. These readers must be
     * {@linkplain it.unimi.dsi.mg4j.search.DocumentIterator#dispose() disposed} afterwards.
     *
     * @param prefix a prefix.
     * @param limit a limit on the number of terms that will be used to resolve
     * the prefix query; if the terms starting with <code>prefix</code> are more than
     * <code>limit</code>, a {@link TooManyTermsException} will be thrown.
     * @throws IOException if an exception occurred while accessing the index.
     * @throws UnsupportedOperationException if this index cannot resolve prefixes.
     * @throws TooManyTermsException if there are more than <code>limit</code> terms starting with <code>prefix</code>.
     */
    public abstract IndexIterator documents(CharSequence prefix,
            int limit) throws IOException, TooManyTermsException;

    /** Sets the index used as a key to retrieve intervals from iterators generated from this index.
     *
     * <p>This setter is a compromise between clarity of design and efficiency.
     * Each index iterator is based on an index, and when that index is passed
     * to {@link DocumentIterator#intervalIterator(Index)}, intervals corresponding
     * to the positions of the term in the current document are returned. Analogously,
     * {@link it.unimi.dsi.mg4j.search.DocumentIterator#indices()} returns a singleton
     * set containing the index. However, when composing indices into clusters,
     * iterators generated by a local index must often act as if they really belong
     * to the global index. This method makes it possible to set the index that is used as
     * a key to return intervals, and that is contained in {@link #singletonSet}.
     *
     * <p>Note that setting this value will only influence {@linkplain IndexReader index readers}
     * created afterwards.
     *
     * @param newKeyIndex the new index to be used as a key for interval retrieval.
     */
    public void keyIndex(Index newKeyIndex) {
        keyIndex = newKeyIndex;
        singletonSet = ReferenceSets.singleton(keyIndex);
    }
}
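
The Javadoc above stresses that an iterator obtained from a freshly created reader must be disposed after use. A minimal sketch of that calling pattern follows. It does not use MG4J itself: `Reader`, `Iter`, and `collect` are hypothetical stand-ins invented for illustration, and the assumption that disposing the iterator releases its reader is taken from the comments above, not from the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained sketch; Reader and Iter are hypothetical stand-ins, not MG4J classes. */
public class DisposeSketch {
    /** Stand-in for an index reader that holds a resource. */
    static final class Reader {
        boolean closed;

        void close() {
            closed = true;
        }
    }

    /** Stand-in for an iterator that owns the reader it was created from. */
    static final class Iter {
        private final Reader owner;
        private final int[] docs;
        private int pos;

        Iter(Reader owner, int... docs) {
            this.owner = owner;
            this.docs = docs;
        }

        boolean hasNext() {
            return pos < docs.length;
        }

        int nextDocument() {
            return docs[pos++];
        }

        /** Assumption: disposing the iterator releases the reader behind it. */
        void dispose() {
            owner.close();
        }
    }

    /** Collects all document pointers, disposing the iterator even if iteration fails. */
    static List<Integer> collect(Iter it) {
        final List<Integer> result = new ArrayList<>();
        try {
            while (it.hasNext()) result.add(it.nextDocument());
        } finally {
            it.dispose();
        }
        return result;
    }

    public static void main(String[] args) {
        final Reader reader = new Reader();
        final List<Integer> docs = collect(new Iter(reader, 3, 7, 42));
        System.out.println(docs + " closed=" + reader.closed);
    }
}
```

The `try`/`finally` block is the point of the sketch: the release happens on every path, including when an exception is thrown while iterating.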