it.unimi.dsi.mg4j.query.nodes

Java Source Code / Java Documentation
1. 6.0 JDK Core
2. 6.0 JDK Modules
3. 6.0 JDK Modules com.sun
4. 6.0 JDK Modules com.sun.java
5. 6.0 JDK Modules sun
6. 6.0 JDK Platform
7. Ajax
8. Apache Harmony Java SE
9. Aspect oriented
10. Authentication Authorization
11. Blogger System
12. Build
13. Byte Code
14. Cache
15. Chart
16. Chat
17. Code Analyzer
18. Collaboration
19. Content Management System
20. Database Client
21. Database DBMS
22. Database JDBC Connection Pool
23. Database ORM
24. Development
25. EJB Server geronimo
26. EJB Server GlassFish
27. EJB Server JBoss 4.2.1
28. EJB Server resin 3.1.5
29. ERP CRM Financial
30. ESB
31. Forum
32. GIS
33. Graphic Library
34. Groupware
35. HTML Parser
36. IDE
37. IDE Eclipse
38. IDE Netbeans
39. Installer
40. Internationalization Localization
41. Inversion of Control
42. Issue Tracking
43. J2EE
44. JBoss
45. JMS
46. JMX
47. Library
48. Mail Clients
49. Net
50. Parser
51. PDF
52. Portal
53. Profiler
54. Project Management
55. Report
56. RSS RDF
57. Rule Engine
58. Science
59. Scripting
60. Search Engine
61. Security
62. Sevlet Container
63. Source Control
64. Swing Library
65. Template Engine
66. Test Coverage
67. Testing
68. UML
69. Web Crawler
70. Web Framework
71. Web Mail
72. Web Server
73. Web Services
74. Web Services apache cxf 2.0.1
75. Web Services AXIS2
76. Wiki Engine
77. Workflow Engines
78. XML
79. XML UI
Java
Java Tutorial
Java Open Source
Jar File Download
Java Articles
Java Products
Java by API
Photoshop Tutorials
Maya Tutorials
Flash Tutorials
3ds-Max Tutorials
Illustrator Tutorials
GIMP Tutorials
C# / C Sharp
C# / CSharp Tutorial
C# / CSharp Open Source
ASP.Net
ASP.NET Tutorial
JavaScript DHTML
JavaScript Tutorial
JavaScript Reference
HTML / CSS
HTML CSS Reference
C / ANSI-C
C Tutorial
C++
C++ Tutorial
Ruby
PHP
Python
Python Tutorial
Python Open Source
SQL Server / T-SQL
SQL Server / T-SQL Tutorial
Oracle PL / SQL
Oracle PL/SQL Tutorial
PostgreSQL
SQL / MySQL
MySQL Tutorial
VB.Net
VB.Net Tutorial
Flash / Flex / ActionScript
VBA / Excel / Access / Word
XML
XML Tutorial
Microsoft Office PowerPoint 2007 Tutorial
Microsoft Office Excel 2007 Tutorial
Microsoft Office Word 2007 Tutorial
Java Source Code / Java Documentation » Search Engine » mg4j » it.unimi.dsi.mg4j.query.nodes 
it.unimi.dsi.mg4j.query.nodes
MG4J: Managing Gigabytes for Java

Composite representation for queries

This package contains the classes that represent queries as an abstract syntax tree, or, in design-pattern jargon, as a composite.

Warning: Before reading this overview, the reader is encouraged to consult the first part of the {@link it.unimi.dsi.mg4j.search} package documentation, where the notion of query is briefly presented.

The basic idea is to perform a complete decoupling of the query construction and resolution into three levels:

  • String level: the query is a string, written in a suitable language where, for example, every operator has a certain syntax, uses a certain keyword etc.
  • Tree level: the query is an abstract tree, that contains one internal node for each operator, but that has nothing to do with the actual way the query was written; for example, one may want to give the user the possibility to write the query by using a visual editor, in which case there will be some way to obtain the tree representation of the query without having any string-level representation for it.
  • Iterator level: the document iterator that lazily solves the query over a certain index or set of indices; this is the semantic counterpart of a query, and it is discussed at large in the overview of the {@link it.unimi.dsi.mg4j.search} package.

This package overview shows how to pass from the string to the tree level using the built-in {@link it.unimi.dsi.mg4j.query.parser.QueryParser}, and explains what is needed to implement the tree level, as well as the basic instruments needed to convert the tree level into the iterator level.

In MG4J, a query parser is an implementation of {@link it.unimi.dsi.mg4j.query.parser.QueryParser}: essentially, an object with a method that transforms strings in {@linkplain it.unimi.dsi.mg4j.query.nodes.Query queries}. You can use any parser you like, or build your queries programmatically.

The parser provided with MG4J and whose syntax is described in the manual is currently generated using JavaCC: unfortunately, this tool produces some public classes that somehow clutter the package documentation. All the parsing logic is contained in the {@link it.unimi.dsi.mg4j.query.parser.SimpleParser} class, which is generated by the JavaCC source SimpleParser.jj.

Building a query tree out of a query string

MG4J allows one to perform queries on multiple indices; by saying so, we mean that you may have many indices concerning the same document collection, and you want to perform a query that may search on some of them. Think, for example, of a collection of emails. You might have generated a number of indices, to index their subjects, sender, recipient(s), body etc. All these indices can be thought of as individual entities, their only relation being that they index collections with the same number of documents, and that same-numbered documents conceptually "come" from the same source.

To get a query tree from a string representation, you must first build a {@link it.unimi.dsi.mg4j.query.parser.QueryParser} object. For instance, if you use the built-in {@link it.unimi.dsi.mg4j.query.parser.SimpleParser} you have to provide a map whose keys are index names (called index aliases) and with each index alias mapped to the corresponding {@link it.unimi.dsi.mg4j.index.Index}. Moreover, one of the aliases is taken to be the default index alias used for the queries.

In our example, we will assume that we have three index aliases (say, subject, from and body), and that subject is the default index alias. The parser would then be created as follows (here, we are stipulating that index with alias x has basename mail-x):

		Object2ReferenceMap<String,Index> indexAlias2Index = new Object2ReferenceOpenHashMap<String,Index>();
		Index defaultIndex;
		indexAlias2Index.put( "subject", defaultIndex = Index.getInstance( "mail-subject" ) );
		indexAlias2Index.put( "from", Index.getInstance( "mail-from" ) );
		indexAlias2Index.put( "body", Index.getInstance( "mail-body" ) );
		
		QueryParser parser = new QueryParser( indexAlias2Index.keySet(), "subject" );
		
after which you can use it for example as follows:
			Query query = parser.parse( "meeting AND body:(schedule OR urgent)" );
			DocumentIteratorBuilderVisitor visitor = new DocumentIteratorBuilderVisitor( indexAlias2Index, defaultIndex, 0 );
			DocumentIterator it = query.accept( visitor );
			while ( it.hasNext() ) {
				System.out.println( "Document #: " + it.nextDocument() );
				System.out.print( "\tPositions:" );
				for ( Interval interval: it )
					System.out.print( " " + interval );
				System.out.println();
			}
		

Structure of a {@link it.unimi.dsi.mg4j.query.nodes.Query}

As explained above, in this package a {@link it.unimi.dsi.mg4j.query.nodes.Query} is simply an abstract tree: its leaves will correspond to ground queries, whereas internal nodes correspond to query operators.

Ground queries can be {@linkplain it.unimi.dsi.mg4j.query.nodes.Term term queries}, {@linkplain it.unimi.dsi.mg4j.query.nodes.Prefix prefix queries}, and {@linkplain it.unimi.dsi.mg4j.query.nodes.MultiTerm multiterm queries} (the latter usually generated by some query-expansion mechanism). Any other query is either a {@linkplain it.unimi.dsi.mg4j.query.nodes.Composite composite query} (e.g., a {@linkplain it.unimi.dsi.mg4j.query.nodes.And conjunctive query}, a {@linkplain it.unimi.dsi.mg4j.query.nodes.Or disjunctive query} etc.), that is, a query composed by other subqueries, or it is obtained by some other operator (e.g., a {@linkplain it.unimi.dsi.mg4j.query.nodes.LowPass low-pass query} etc.).

Every query has a method {@link it.unimi.dsi.mg4j.query.nodes.Query#accept(it.unimi.dsi.mg4j.query.nodes.QueryBuilderVisitor)} a {@link it.unimi.dsi.mg4j.query.nodes.QueryBuilderVisitor} object, through which you can visit the query tree. More precisely, when the accept method is invoked on a query, a recursive visit of the query tree is performed: at every node n of type Q (Q being any class implementing {@link it.unimi.dsi.mg4j.query.nodes.Query}), the following steps are performed:

  • the visitor's visitPre(Q n) method is called; if the method returns false, the visit is interrupted; otherwise…
  • the subtrees are recursively visited, obtaining an array of results T[] x; if at a certain point during this visit some call decides that the visit should be interrupted, we stop; otherwise…
  • the visitor's visitPost(Q n, T[] x) method is called, and the result (of type T) is returned.

To make the idea of a visitor easier to understand, consider the following simple example of a visitor:

    	        static class PrinterVisitor extends AbstractQueryBuilderVisitor<String> {
                private static void appendArray( MutableString res, String t[], char sep ) {
                        for ( int i = 0; i < t.length - 1; i++ ) res.append( t[ i ] + sep );
                        res.append( t[ t.length - 1 ] );
                }
                public String[] newArray( int len ) { return new String[ len ]; }
                public String visitPost( OrderedAnd node, String[] t ) {
                        MutableString res = new MutableString();
                        res.append( "OAND(" );
                        appendArray( res, t, ',' );
                        res.append( ")" );
                        return res.toString();
                }
                public String visitPost( Or node, String[] t ) {
                        MutableString res = new MutableString();
                        res.append( "OR(" );
                        appendArray( res, t, ',' );
                        res.append( ")" );
                        return res.toString();
                }
                public String visitPost( And node, String[] t ) {
                        MutableString res = new MutableString();
                        res.append( "AND(" );
                        appendArray( res, t, ',' );
                        res.append( ")" );
                        return res.toString();
                }
                public String visitPost( Consecutive node, String[] t ) {
                        MutableString res = new MutableString();
                        res.append( "\"" );
                        appendArray( res, t, ' ' );
                        res.append( "\"" );
                        return res.toString();
                }
                public String visitPost( Not node, String t ) {
                        return "NOT(" + t + ")";
                }
                public String visitPost( LowPass node, String t ) {
                        return t + "~" + node.k;
                }
                public String visitPost( Select node, String t ) {
                        return node.index + ":" + t;
                }
                public String visit( Term node ) { return node.term.toString(); }
                public String visit( Prefix node ) { return node.prefix + "*"; }
        }
   	

Now, you can pass an instance of this visitor to a query, and as a result get a string (linear) representation of the query itself. (Of course, there is a simple way to get the same result, that is, calling the toString() method directly, but for this example is easy enough for illustrative purposes.)

Here is an example of how the above visitor can be used:

          public static void main( String arg[] ) throws Exception {
                Query qa = new Term( "a" );
                Query qb = new Prefix( "b" );
                Query qc = new Term( "c" );
                Query qd = new Term( "d" );

                Query q = new LowPass( new And( qa, new Select( "another_index", qb ), new Or( qc, qd ) ), 20 );

                System.out.println( q.accept( new PrinterVisitor() ) );
        }
  		
that produces AND(a,another_index:b*,OR(c,d))~20.

Building a document iterator out of a query

The interface {@link it.unimi.dsi.mg4j.query.nodes.Query} and its implementations provide the basic classes for the tree-level representation of a query. Building an iterator out of it can be thought of as a process of recursive instantiation of a query tree into an iterator tree; this is actually a sort of a copy visit, and it is indeed implemented as a special kind of visitor: the {@link it.unimi.dsi.mg4j.search.DocumentIteratorBuilderVisitor}. You can create one such visitor and use it to visit a query tree, obtaining as a result a document iterator that you can use to get the results.

To be more precise, let us recall the example presented in the last part of the {@linkplain it.unimi.dsi.mg4j.search overview of the search package}:

			Index subjectIndex = Index.getInstance( "mail-subject" );
			DocumentIterator it = AndDocumentIterator.getInstance( 
				subjectIndex.documents( "meeting" ), 
				subjectIndex.documents( "schedule" ), 
				subjectIndex.documents( "monday" ) 
			);
			while ( it.hasNext() ) {
					System.out.println( "Document #: " + it.nextDocument() );
					System.out.print( "\tPositions:" );
					for ( Interval interval: it )
						System.out.print( " " + interval );
					System.out.println();
			}
		

Here is an equivalent piece of code that uses the idea of decoupling the tree level from the iterator level.

			Index subjectIndex = Index.getInstance( "mail-subject" );
			Query q = new And( new Term( "meeting" ), new Term( "schedule" ), new Term( "monday" ) );
			DocumentIteratorBuilderVisitor visitor = new DocumentIteratorBuilderVisitor( null, subjectIndex, 0 );
			DocumentIterator it = q.accept( visitor );
			while ( it.hasNext() ) {
				System.out.println( "Document #: " + it.nextDocument() );
				System.out.print( "\tPositions:" );
				for ( Interval interval: it )
					System.out.print( " " + interval );
				System.out.println();
			}
		
Java Source File NameTypeComment
AbstractQueryBuilderVisitor.javaClass A it.unimi.dsi.mg4j.query.nodes.QueryBuilderVisitor that returns true on all visitPre() methods and does nothing on AbstractQueryBuilderVisitor.prepare() .
AbstractTermExpander.javaClass A that just requires implementing a method that (e.g., into disjunctive queries).
And.javaClass A node representing the logical and of the underlying queries.
CheckForSelectQueryVisitor.javaClass A it.unimi.dsi.mg4j.query.nodes.QueryBuilderVisitor that returns null only if the visited query contains a Select node.
Composite.javaClass A abstract composite node containing an array of underlying queries.
Consecutive.javaClass A node representing the consecutive composition of the underlying queries.
Difference.javaClass A node representing a difference of two queries.
LowPass.javaClass A node representing a low-pass filtering of the only underlying query.
MultiTerm.javaClass A node representing a virtual term obtained by merging the occurrences of the given terms.

This node is mainly useful when performing query expansion.

Not.javaClass A node representing the logical not of the underlying query.
Or.javaClass A node representing the logical or of the underlying queries.
OrderedAnd.javaClass A node representing the logical ordered and of the underlying queries.
Prefix.javaClass A node representing a set of terms defined by a common prefix.
Queries.javaClass Static methods and objects related to queries.
Query.javaInterface A node of a composite representing a query.

A query is abstractly represented by a composite made of implementations of this interface.

QueryBuilderVisitor.javaInterface A visitor for a composite query.
QueryBuilderVisitorException.javaClass A wrapper for unchecked exceptions thrown during a visit.

Since the operations of a visiting method are generic, any exception might be thrown by such a method.

QueryTransformer.javaInterface A strategy for transforming queries.
Range.javaClass A node representing a range query on a payload-only index.
Select.javaClass A node representing an index selection.
Term.javaClass A node representing a single term.
www.java2java.com | Contact Us
Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.