org.archive.crawler.settings

Provides classes for the settings framework.

The settings framework is designed to be a flexible way to configure a crawl, with special treatment for subparts of the web, without adding too much performance overhead.

At its core, the settings framework is a way to keep persistent, context-sensitive configuration settings for any class in the crawler.

Every class in the crawler that has configurable settings subclasses {@link org.archive.crawler.settings.ComplexType} or one of its descendants. The {@link org.archive.crawler.settings.ComplexType} implements the {@link javax.management.DynamicMBean} interface. This gives you a way to ask an object which attributes it supports, and standard methods for getting and setting these attributes.

The entry point into the settings framework is the {@link org.archive.crawler.settings.SettingsHandler}. This class is responsible for loading settings from and saving them to persistent storage, and for interconnecting the different parts of the framework.
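
The introspection that {@link javax.management.DynamicMBean} offers can be illustrated with a minimal, self-contained sketch. Only the javax.management types are real; ToyModule, its max-depth attribute and the describe helper are hypothetical names invented for this illustration and are not part of Heritrix.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;

// Hypothetical module, NOT a Heritrix class: it only illustrates the DynamicMBean contract.
class ToyModule implements DynamicMBean {
    private final Map<String, Object> values = new LinkedHashMap<>();

    ToyModule() {
        values.put("max-depth", 20); // assumed example attribute
    }

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        if (!values.containsKey(name)) {
            throw new AttributeNotFoundException(name);
        }
        return values.get(name);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        if (!values.containsKey(attribute.getName())) {
            throw new AttributeNotFoundException(attribute.getName());
        }
        values.put(attribute.getName(), attribute.getValue());
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList result = new AttributeList();
        for (String name : names) {
            if (values.containsKey(name)) {
                result.add(new Attribute(name, values.get(name)));
            }
        }
        return result;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        AttributeList applied = new AttributeList();
        for (Attribute a : attributes.asList()) {
            if (values.containsKey(a.getName())) {
                values.put(a.getName(), a.getValue());
                applied.add(a);
            }
        }
        return applied;
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature) {
        throw new UnsupportedOperationException("no operations in this sketch");
    }

    // Describes every supported attribute so that callers can discover them at runtime.
    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] infos = new MBeanAttributeInfo[values.size()];
        int i = 0;
        for (Map.Entry<String, Object> e : values.entrySet()) {
            infos[i++] = new MBeanAttributeInfo(e.getKey(),
                    e.getValue().getClass().getName(), "toy attribute", true, true, false);
        }
        return new MBeanInfo(ToyModule.class.getName(), "toy module", infos, null, null, null);
    }

    // Lists "name : type" for each attribute the bean reports about itself.
    static String describe(DynamicMBean bean) {
        StringBuilder sb = new StringBuilder();
        for (MBeanAttributeInfo info : bean.getMBeanInfo().getAttributes()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(info.getName()).append(" : ").append(info.getType());
        }
        return sb.toString();
    }
}
```

A caller that knows nothing about ToyModule can still call ToyModule.describe(new ToyModule()) to learn which attributes exist before reading or writing them.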


Figure 1. Schematic view of the Settings Framework

Settings hierarchy

The settings framework supports a hierarchy of settings. This hierarchy is built from {@link org.archive.crawler.settings.CrawlerSettings} objects. At the top there is a settings object representing the global settings (the order). This consists of all the settings that a crawl job needs for running. Beneath this global object there is one "per" settings object for each host/domain that has settings which should override the order for that particular host or domain.

When the settings framework is asked for an attribute for a specific host, it first checks whether the attribute is set for that particular host. If it is, the value is returned. If not, the framework goes up one level at a time until it eventually reaches the order object and returns the global value. If no value is set there either (normally it would be), a hard-coded default value is returned.

Per-domain/host settings objects contain only those settings which are to be overridden for that particular domain/host. The convention is to name the top-level object "global settings" and the objects beneath it "per settings" or "overrides" (although the refinements described next also do overriding).

To further complicate the picture, there are also settings objects called refinements. An object of this type belongs to a global or per settings object and overrides the settings in its owner's object if some criterion is met. The criterion could be that the URI in question matches a regular expression, or that the settings are consulted at a time of day that falls within a given time span.
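
The lookup order described above can be sketched as a simple parent chain. This is an illustration under stated assumptions, not the Heritrix implementation: ToySettings, its methods and the delay-ms attribute are hypothetical stand-ins for {@link org.archive.crawler.settings.CrawlerSettings} and its data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a settings object; NOT the real CrawlerSettings API.
class ToySettings {
    private final ToySettings parent; // null for the global settings
    private final Map<String, Object> local = new HashMap<>();

    ToySettings(ToySettings parent) {
        this.parent = parent;
    }

    // Sets a value at this level only; returns this to allow chaining.
    ToySettings set(String name, Object value) {
        local.put(name, value);
        return this;
    }

    // Walks from this level up to the global settings and returns the first value found;
    // falls back to the hard-coded default when no level sets the attribute.
    Object effective(String name, Object hardCodedDefault) {
        for (ToySettings s = this; s != null; s = s.parent) {
            if (s.local.containsKey(name)) {
                return s.local.get(name);
            }
        }
        return hardCodedDefault;
    }
}
```

With a global object that sets delay-ms, a domain object that sets nothing, and a host object that overrides delay-ms, a lookup on the host returns the host value, a lookup on the domain returns the global value, and a lookup for an attribute set nowhere returns the supplied default.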
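
A refinement with a regular-expression criterion might be modelled as below. This is a hedged sketch: ToyRefinement, ToyOwner and the max-size attribute are hypothetical, and the time-of-day criterion is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the Heritrix refinement classes.
class ToyRefinement {
    final Pattern uriPattern; // criterion: the whole URI must match this expression
    final Map<String, Object> overrides = new HashMap<>();

    ToyRefinement(String regex) {
        this.uriPattern = Pattern.compile(regex);
    }
}

// Hypothetical owner: plays the role of a global or per settings object.
class ToyOwner {
    final Map<String, Object> values = new HashMap<>();
    final List<ToyRefinement> refinements = new ArrayList<>();

    // The first matching refinement that sets the attribute wins over the owner's own value.
    Object valueFor(String uri, String name) {
        for (ToyRefinement r : refinements) {
            if (r.uriPattern.matcher(uri).matches() && r.overrides.containsKey(name)) {
                return r.overrides.get(name);
            }
        }
        return values.get(name);
    }
}
```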

ComplexType hierarchy

All the configurable modules in the crawler subclass {@link org.archive.crawler.settings.ComplexType} or one of its descendants. The {@link org.archive.crawler.settings.ComplexType} is responsible for keeping the definition of the configurable attributes of the module. The actual values are stored in an instance of {@link org.archive.crawler.settings.DataContainer}. The {@link org.archive.crawler.settings.DataContainer} is never accessed directly from user code. Instead, the user accesses the attributes through methods in the {@link org.archive.crawler.settings.ComplexType}. The attributes are accessed in different ways depending on whether the access comes from the user interface or from inside a running crawl.

When an attribute is accessed from the UI (either reading or writing), you want to make sure that you are editing the attribute in the right context. When trying to override an attribute, you don't want the settings framework to traverse up to the effective value for the attribute, but instead want to know that the attribute is not set on this level. To achieve this, there are {@link org.archive.crawler.settings.ComplexType#getLocalAttribute(CrawlerSettings settings, String name)} and {@link org.archive.crawler.settings.ComplexType#setAttribute(CrawlerSettings settings, Attribute attribute)} methods taking a settings object as a parameter. These methods work only on the supplied settings object. In addition, the methods {@link org.archive.crawler.settings.ComplexType#getAttribute(String)} and {@link org.archive.crawler.settings.ComplexType#setAttribute(Attribute attribute)} are there for conformance to the Java JMX specification. The latter two always work on the global settings object.

Getting an attribute within a crawl is different in that you always want to get a value even if it is not set in its context. That means that the settings framework should work its way up the settings hierarchy to find the value in effect for the context. The method {@link org.archive.crawler.settings.ComplexType#getAttribute(String name, CrawlURI uri)} should be used to make sure that the right context is used. Figure 2 shows how the settings framework finds the effective value given a context.
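
The idea of level-local access can be sketched as follows: one module keeps a separate value store per settings level, and local reads never look at other levels. ToyConfigurable, setLocal, getLocal and the level names are hypothetical and only mimic the behaviour described for the methods that take a settings object; they are not the Heritrix signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the Heritrix API: one module, one value store per settings level.
class ToyConfigurable {
    // settings-level name -> (attribute name -> value); this map plays the role
    // that the text assigns to the DataContainer
    private final Map<String, Map<String, Object>> valuesByLevel = new HashMap<>();

    // Writes only into the supplied level; every other level is left untouched.
    void setLocal(String level, String name, Object value) {
        valuesByLevel.computeIfAbsent(level, k -> new HashMap<>()).put(name, value);
    }

    // Reads only from the supplied level; null means "not set here", with no traversal upwards.
    Object getLocal(String level, String name) {
        Map<String, Object> values = valuesByLevel.get(level);
        return values == null ? null : values.get(name);
    }
}
```

An editor for the hypothetical level example.com would see null for an attribute that is set only globally, which tells it that the attribute is inherited rather than overridden at that level.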


Figure 2. Flow of getting an attribute

Each attribute has a type. All allowed types subclass the {@link org.archive.crawler.settings.Type} class. There are three main types:

  1. {@link org.archive.crawler.settings.SimpleType}
  2. {@link org.archive.crawler.settings.ListType}
  3. {@link org.archive.crawler.settings.ComplexType}
Except for the {@link org.archive.crawler.settings.SimpleType}, the actual type used will be a subclass of one of these main types.

SimpleType

The {@link org.archive.crawler.settings.SimpleType} is mainly for representing Java™ wrappers for the Java™ primitive types. In addition it also handles the {@link java.util.Date} type and a special Heritrix {@link org.archive.crawler.settings.TextField} type. Overrides of a {@link org.archive.crawler.settings.SimpleType} must be of the same type as the initial default value for the {@link org.archive.crawler.settings.SimpleType}.
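
The same-type rule for overrides can be pictured with a small check. This is only a sketch of the rule as stated above; ToySimpleType and acceptsOverride are hypothetical names, and how Heritrix enforces the rule internally is not shown here.

```java
// Hypothetical sketch, NOT the Heritrix SimpleType: an override is accepted only
// when its class is exactly the class of the initial default value.
class ToySimpleType {
    private final Object defaultValue;

    ToySimpleType(Object defaultValue) {
        this.defaultValue = defaultValue;
    }

    boolean acceptsOverride(Object candidate) {
        return candidate != null && candidate.getClass() == defaultValue.getClass();
    }
}
```

With an Integer default, another Integer is accepted, while a String or a Long holding the same number is rejected.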

ListType

The {@link org.archive.crawler.settings.ListType} is further subclassed into versions for some of the wrapped Java™ primitive types ({@link org.archive.crawler.settings.DoubleList}, {@link org.archive.crawler.settings.FloatList}, {@link org.archive.crawler.settings.IntegerList}, {@link org.archive.crawler.settings.LongList}, {@link org.archive.crawler.settings.StringList}). A List holds values in the same order as they were added. If an attribute of type {@link org.archive.crawler.settings.ListType} is overridden, then the complete list of values is replaced at the override level.
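
The replace-not-merge behaviour of list overrides is easy to misread, so here is a minimal sketch of it. ToyListLookup is a hypothetical helper, not a Heritrix class, and it uses plain java.util lists instead of {@link org.archive.crawler.settings.ListType}.

```java
import java.util.List;

// Hypothetical sketch, NOT the Heritrix ListType.
class ToyListLookup {
    // An override, when present, replaces the complete list; values are never merged.
    static List<String> effective(List<String> global, List<String> override) {
        return override != null ? override : global;
    }
}
```

A level that wants to keep the inherited values and add one more therefore has to repeat the inherited values in its own list.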

ComplexType

The {@link org.archive.crawler.settings.ComplexType} is a map of name/value pairs. The values can be any {@link org.archive.crawler.settings.Type}, including further {@link org.archive.crawler.settings.ComplexType ComplexTypes}. The {@link org.archive.crawler.settings.ComplexType} is defined abstract, so you should use one of the subclasses {@link org.archive.crawler.settings.MapType} or {@link org.archive.crawler.settings.ModuleType}. The {@link org.archive.crawler.settings.MapType} allows adding new name/value pairs at runtime, while the {@link org.archive.crawler.settings.ModuleType} only allows the name/value pairs that it defines at construction time.

When overriding the {@link org.archive.crawler.settings.MapType}, the options are either to override the value of an already existing attribute or to add a new one. It is not possible in an override to remove an existing attribute. The {@link org.archive.crawler.settings.ModuleType} doesn't allow additions in overrides, but the values of the predefined attributes can be overridden.

Since the {@link org.archive.crawler.settings.ModuleType} is defined at construction time, it is possible to set more restrictions on each attribute than in the {@link org.archive.crawler.settings.MapType}. Another consequence of definition at construction time is that you would normally subclass the {@link org.archive.crawler.settings.ModuleType}, while the {@link org.archive.crawler.settings.MapType} is usable as it is. It is possible to restrict the {@link org.archive.crawler.settings.MapType} to only allow attributes of a certain type. There is also a restriction that {@link org.archive.crawler.settings.MapType MapTypes} cannot contain nested {@link org.archive.crawler.settings.MapType MapTypes}.
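
The override rules for a map (override an existing entry or add a new one, never remove) can be sketched with an insertion-ordered map. ToyMapOverride is a hypothetical helper, not the Heritrix {@link org.archive.crawler.settings.MapType}, and the filter names used with it are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT the Heritrix MapType.
class ToyMapOverride {
    // Combines a base map with an override: existing keys keep their position but take
    // the overriding value; new keys are appended after the inherited ones.
    // There is deliberately no way for the override to remove an inherited key.
    static Map<String, String> effective(Map<String, String> base, Map<String, String> override) {
        Map<String, String> result = new LinkedHashMap<>(base);
        result.putAll(override);
        return result;
    }
}
```

Passing the override as a LinkedHashMap keeps the added entries in the order in which they were defined.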

Java Source File Name | Type | Comment
ComplexType.java | Class | Superclass of all configurable modules. This class is in many ways the heart of the settings framework.
Constraint.java | Class | Superclass for constraints that can be set on attribute definitions. Constraints will be checked against attribute values.
CrawlerSettings.java | Class | Class representing a settings file. More precisely, it represents a collection of settings valid in a particular scope.
CrawlerSettingsTest.java | Class |
CrawlSettingsSAXHandler.java | Class | A SAX element handler that updates a CrawlerSettings object.
CrawlSettingsSAXSource.java | Class | Class that takes a CrawlerSettings object and creates SAX events from it.
DataContainer.java | Class | This class holds the data for a ComplexType for a settings object.
DoubleList.java | Class |
FloatList.java | Class |
IntegerList.java | Class |
LegalValueListConstraint.java | Class | A constraint that checks that an attribute value matches one of the items in the list of legal values.
LegalValueTypeConstraint.java | Class |
ListType.java | Class | Super type for all lists.
LongList.java | Class |
MapType.java | Class | This class represents a container of settings. It is usually used to make it possible to have a dynamic number of ModuleTypes, for instance a list of filters of different types. When this type is overridden on a per-domain basis, the following restrictions apply: added elements are placed after the elements in the map it overrides; you cannot remove elements from the map it overrides.
MapTypeTest.java | Class |
ModuleAttributeInfo.java | Class |
ModuleType.java | Class | Superclass of all modules that should be configurable.
OverrideTest.java | Class | Tests the concept of overrides. As this test is testing a concept, it involves more than one class to be tested.
RegularExpressionConstraint.java | Class | A constraint that checks that a value matches a regular expression.
SettingsCache.java | Class | This class keeps a map of host names to settings objects.
SettingsFrameworkTestCase.java | Class | Sets up a couple of settings to test different functions of the settings framework.
SettingsHandler.java | Class | An instance of this class holds a hierarchy of settings.
SimpleType.java | Class | A type that holds a Java type.
SimpleTypeTest.java | Class |
SoftSettingsHash.java | Class |
StringList.java | Class | List of String values.
TextField.java | Class | Class to hold values for text fields. Objects of this class can be used instead of java.lang.String to hold text strings containing newlines.
Type.java | Class | Interface implemented by all element types.
ValueErrorHandler.java | Interface | If a ValueErrorHandler is registered with a SettingsHandler, only constraints with level java.util.logging.Level.SEVERE will throw a javax.management.InvalidAttributeValueException.
XMLSettingsHandler.java | Class | A SettingsHandler which uses XML files as persistent storage.
XMLSettingsHandlerTest.java | Class | Tests the handling of settings files.