Java Doc for CrawlURI.java (heritrix, org.archive.crawler.datamodel)


java.lang.Object

org.archive.crawler.datamodel.CandidateURI

org.archive.crawler.datamodel.CrawlURI

CrawlURI
public class CrawlURI extends CandidateURI implements FetchStatusCodes
	Represents a candidate URI and the associated state it collects as it is crawled. Core state is held in instance variables, but a flexible attribute list is also available. Use this 'bucket' to carry custom processing-extracted data and state across CrawlURI processing. See CrawlURI.putString(String, String), CrawlURI.getString(String), etc. Author: Gordon Mohr
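
As a rough illustration of the attribute 'bucket' idea, the sketch below models it with a plain map. AttributeBucketSketch and the key name are hypothetical stand-ins, not Heritrix classes; only the method names putString, getString and containsKey are taken from this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a URI object carrying an attribute list.
// This is NOT the Heritrix AList or CrawlURI implementation.
final class AttributeBucketSketch {
    private final Map<String, Object> attributes = new HashMap<>();

    void putString(String key, String value) {
        attributes.put(key, value);
    }

    String getString(String key) {
        return (String) attributes.get(key);
    }

    boolean containsKey(String key) {
        return attributes.containsKey(key);
    }

    public static void main(String[] args) {
        AttributeBucketSketch curi = new AttributeBucketSketch();
        // "example-extractor-note" is a made-up key used only for illustration.
        curi.putString("example-extractor-note", "found-in-meta-tag");
        System.out.println(curi.containsKey("example-extractor-note"));
        System.out.println(curi.getString("example-extractor-note"));
    }
}
```

An earlier processor can store a value under an agreed key and a later processor can read it back, without either one needing a dedicated field on the class.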

Field Summary
final public static int	MAX_OUTLINKS Protection against outlink overflow.
final public static int	UNCALCULATED
transient Object	holder
int	holderCost spot for an integer cost to be placed by external facility (frontier).
transient Object	holderKey
protected long	ordinal Monotonically increasing number within a crawl; useful for tending towards breadth-first ordering.
transient Collection<Object>	outLinks All discovered outbound Links (navlinks, embeds, etc.) Can either contain Link instances or CandidateURI instances, or both. The LinksScoper processor converts Link instances in this collection to CandidateURI instances.

Constructor Summary
public	CrawlURI(UURI uuri) Create a new instance of CrawlURI from a UURI.
public	CrawlURI(CandidateURI caUri, long o)

Method Summary
public void	aboutToLog()
public static void	addAlistPersistentMember(Object key) Add the key of alist items you want to persist across processings.
public void	addAnnotation(String annotation) Add an annotation: an abbreviated indication of something special about this URI that need not be present in every crawl.log line, but should be noted for future reference.
public void	addCredentialAvatar(CredentialAvatar ca) Add an avatar.
public void	addLocalizedError(String processorName, Throwable ex, String message) Make note of a non-fatal error, local to a particular Processor, which should be logged somewhere, but allows processing to continue.
public void	addOutLink(Link link) Add a discovered Link, unless it would exceed the max number to accept.
protected boolean	annotationContains(String str2Find)
public void	clearOutlinks()
public void	createAndAddLink(String url, CharSequence context, char hopType)
public void	createAndAddLinkRelativeToBase(String url, CharSequence context, char hopType)
public void	createAndAddLinkRelativeToVia(String url, CharSequence context, char hopType) Convenience method for creating a Link with the given string and context, relative to this CrawlURI's via UURI if available.
public Link	createLink(String url, CharSequence context, char hopType)
public static String	fetchStatusCodesToString(int code) Takes a status code and converts it into a human readable string.
public static CrawlURI	from(CandidateURI caUri, long ordinal) Make a `CrawlURI` from the passed `CandidateURI`. It's safe to pass a CrawlURI instance.
public String	getAnnotations() Get the annotations set for this uri.
public UURI	getBaseURI() Get the (HTML) Base URI used for derelativizing internal URIs.
protected String	getClassSimpleName(Class c)
public Object	getContentDigest() Return the retained content-digest value, if any.
public String	getContentDigestSchemeString()
public String	getContentDigestString()
public long	getContentLength() For completed HTTP transactions, the length of the content-body.
public long	getContentSize() Get the size in bytes of this URI's recorded content, inclusive of things like protocol headers.
public String	getContentType() Get the content type of this URI. Fetched URIs content type.
public String	getCrawlURIString()
public Set<CredentialAvatar>	getCredentialAvatars() Credential avatars.
public int	getDeferrals() Get the deferral count.
public int	getEmbedHopCount() Get the embedded hop count.
public int	getFetchAttempts() Get the number of attempts at getting the document referenced by this URI.
public int	getFetchStatus() Return the overall/fetch status of this CrawlURI for its current trip through the processing loop.
public Object	getHolder() Return the 'holder' for the convenience of an external facility.
public int	getHolderCost()
public Object	getHolderKey() Return the 'holderKey' for convenience of an external facility (Frontier).
public HttpRecorder	getHttpRecorder() Get the http recorder associated with this uri. Returns the httpRecorder.
public int	getLinkHopCount() Get the link hop count.
public long	getOrdinal() Get the ordinal (serial number) assigned at creation.
public Collection<CandidateURI>	getOutCandidates() Returns discovered candidate URIs.
public Collection<Link>	getOutLinks() Returns discovered links.
public Collection<Object>	getOutObjects() Returns all of the outbound objects.
public AList	getPersistentAList()
public Object	getPrerequisiteUri() Get the prerequisite for this URI.
public long	getRecordedSize()
public int	getThreadNumber() Get the number of the ToeThread responsible for processing this uri.
public String	getUserAgent() Get the user agent to use for crawling this URI.
public boolean	hasBeenLinkExtracted() If true then a link extractor has already claimed this CrawlURI and performed link extraction on the document content.
public boolean	hasCredentialAvatars()
public boolean	hasPrerequisiteUri()
public boolean	hasRfc2617CredentialAvatar()
public void	incrementDeferrals() Increment the deferral count.
public int	incrementFetchAttempts() Increment the number of attempts at getting the document referenced by this URI.
public boolean	is2XXSuccess()
public boolean	isHeaderTruncatedFetch()
public boolean	isHttpTransaction() Return true if this is a http transaction.
public boolean	isLengthTruncatedFetch()
public boolean	isPost() Returns true if this URI should be fetched by sending a HTTP POST request.
public boolean	isPrerequisite() Returns true if this CrawlURI is a prerequisite.
public boolean	isSuccess() Ask this URI if it was a success or not. Only makes sense to call this method after execution of HttpMethod#execute.
public boolean	isTimeTruncatedFetch()
public boolean	isTruncatedFetch() TODO: Implement truncation using booleans rather than as this ugly String parse.
public void	linkExtractorFinished() Note that link extraction has been performed on this CrawlURI.
public void	markAsSeed() Mark this uri as being a seed.
public void	markPrerequisite(String preq, ProcessorChain lastProcessorChain) Do all actions associated with setting a `CrawlURI` as requiring a prerequisite. Parameters: lastProcessorChain - Last processor chain reference.
public Processor	nextProcessor() Get the next processor to process this URI.
public ProcessorChain	nextProcessorChain() Get the processor chain that should be processing this URI after the current chain is finished with it.
public int	outlinksSize()
public void	processingCleanup() Clean up after a run through the processing chain. Called on the end of processing chain by Frontier#finish.
public static boolean	removeAlistPersistentMember(Object key) Parameters: key - Key to remove.
public boolean	removeCredentialAvatar(CredentialAvatar ca) Remove the passed credential avatar from this crawl uri. Parameters: ca - Avatar to remove.
public void	removeCredentialAvatars() Remove all credential avatars from this crawl uri.
public void	replaceOutlinks(Collection<CandidateURI> links) Replace current collection of links w/ passed list.
public void	resetDeferrals() Reset deferrals counter.
public void	resetFetchAttempts() Reset fetchAttempts counter.
public void	setBaseURI(String baseHref) Set the (HTML) Base URI used for derelativizing internal URIs.
public void	setContentDigest(byte[] digestValue) Set the retained content-digest value (usu. SHA1).
public void	setContentDigest(String scheme, byte[] digestValue)
public void	setContentSize(long l) Sets the 'content size' for the URI, which is considered inclusive of all recorded material (such as protocol headers) or even material 'virtually' considered (as in material from a previous fetch confirmed unchanged with a server).
public void	setContentType(String ct) Set a fetched uri's content type. Parameters: ct - Content type.
public void	setFetchStatus(int newstatus) Set the overall/fetch status of this CrawlURI for its current trip through the processing loop.
public void	setHolder(Object obj) Remember a 'holder' to which some enclosing/queueing facility has assigned this CrawlURI.
public void	setHolderCost(int cost)
public void	setHolderKey(Object obj) Remember a 'holderKey' which some enclosing/queueing facility has assigned this CrawlURI.
public void	setHttpRecorder(HttpRecorder httpRecorder) Set the http recorder to be associated with this uri.
public void	setNextProcessor(Processor processor) Set the next processor to process this URI.
public void	setNextProcessorChain(ProcessorChain nextProcessorChain) Set the next processor chain to process this URI.
public void	setPost(boolean b) Set whether this URI should be fetched by sending a HTTP POST request. Else a HTTP GET request will be used. Parameters: b - Set whether this curi is to be POST'd.
public void	setPrerequisite(boolean prerequisite) Set if this CrawlURI is itself a prerequisite URI.
public void	setPrerequisiteUri(Object link) Set a prerequisite for this URI.
public void	setThreadNumber(int i) Set the number of the ToeThread responsible for processing this uri.
public void	setUserAgent(String string) Set the user agent to use when crawling this URI.
public void	skipToProcessor(ProcessorChain processorChain, Processor processor) Set which processor should be the next processor to process this uri instead of using the default next processor.
public void	skipToProcessorChain(ProcessorChain processorChain) Set which processor chain should be processing this uri next.
public void	stripToMinimal() Remove all attributes set on this uri.

Field Detail

MAX_OUTLINKS
final public static int MAX_OUTLINKS
	Protection against outlink overflow. Change the value by setting an alternate maximum in heritrix.properties.

UNCALCULATED
final public static int UNCALCULATED

holder
transient Object holder

holderCost
int holderCost
	Spot for an integer cost to be placed by an external facility (frontier). The cost is truncated to 8 bits at times, so it should not exceed 255.
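
To see why a cost above 255 is a problem, here is a minimal sketch of what 8-bit truncation does to a value. It assumes the truncation keeps the low 8 bits; this page does not say how Heritrix actually performs it, and HolderCostSketch is a hypothetical name.

```java
// Illustrative only: models "truncated to 8 bits" as a low-bit mask,
// which is an assumption rather than the documented Heritrix behavior.
final class HolderCostSketch {
    /** Keeps only the low 8 bits of the cost. */
    static int truncateTo8Bits(int cost) {
        return cost & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(truncateTo8Bits(255)); // 255: still fits in 8 bits
        System.out.println(truncateTo8Bits(300)); // 44: 300 - 256, the high bit is lost
    }
}
```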

holderKey
transient Object holderKey

ordinal
protected long ordinal
	Monotonically increasing number within a crawl; useful for tending towards breadth-first ordering. Will sometimes be truncated to 48 bits, so behavior over 281 trillion instantiated CrawlURIs may be buggy.
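
The '281 trillion' figure is simply 2^48. The short sketch below computes that limit and shows a value past it wrapping around, assuming the truncation keeps the low 48 bits (the mechanism is not described on this page). OrdinalLimitSketch is a hypothetical name.

```java
// Illustrative arithmetic for the 48-bit limit on ordinals.
final class OrdinalLimitSketch {
    /** 2^48: the first ordinal that no longer fits in 48 bits. */
    static final long LIMIT = 1L << 48;

    /** Keeps only the low 48 bits (assumed form of truncation). */
    static long truncateTo48Bits(long ordinal) {
        return ordinal & (LIMIT - 1);
    }

    public static void main(String[] args) {
        System.out.println(LIMIT);                       // 281474976710656
        System.out.println(truncateTo48Bits(LIMIT + 5)); // 5: collides with an early ordinal
    }
}
```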

outLinks
transient Collection<Object> outLinks
	All discovered outbound Links (navlinks, embeds, etc.). Can contain Link instances, CandidateURI instances, or both. The LinksScoper processor converts Link instances in this collection to CandidateURI instances.

Constructor Detail

CrawlURI
public CrawlURI(UURI uuri)
	Create a new instance of CrawlURI from a UURI. Parameters: uuri - the UURI to base this CrawlURI on.

CrawlURI
public CrawlURI(CandidateURI caUri, long o)
	Create a new instance of CrawlURI from a CandidateURI. Parameters: caUri - the CandidateURI to base this CrawlURI on. Parameters: o - Monotonically increasing number within a crawl.

Method Detail

aboutToLog
public void aboutToLog()
	Notify CrawlURI it is about to be logged; opportunity for self-annotation.

addAlistPersistentMember
public static void addAlistPersistentMember(Object key)
	Add the key of alist items you want to persist across processings. Parameters: key - Key to add.
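
One way to picture 'persisting across processings' is a shared set of keys that survive cleanup while everything else is dropped. The sketch below is a guess at that idea, not the Heritrix implementation: PersistentKeysSketch, its map-backed attribute store, and the link to processingCleanup() are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: attributes whose keys are registered as persistent
// survive cleanup; all other attributes are discarded.
final class PersistentKeysSketch {
    private static final Set<Object> persistentKeys = new HashSet<>();
    private final Map<Object, Object> alist = new HashMap<>();

    static void addAlistPersistentMember(Object key) {
        persistentKeys.add(key);
    }

    static boolean removeAlistPersistentMember(Object key) {
        return persistentKeys.remove(key); // true if the key was registered
    }

    void put(Object key, Object value) {
        alist.put(key, value);
    }

    Object get(Object key) {
        return alist.get(key);
    }

    /** Drops every attribute whose key is not registered as persistent. */
    void processingCleanup() {
        alist.keySet().retainAll(persistentKeys);
    }

    public static void main(String[] args) {
        addAlistPersistentMember("keep-me"); // made-up key
        PersistentKeysSketch curi = new PersistentKeysSketch();
        curi.put("keep-me", "survives");
        curi.put("scratch", "dropped");
        curi.processingCleanup();
        System.out.println(curi.get("keep-me"));
        System.out.println(curi.get("scratch"));
    }
}
```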

addAnnotation
public void addAnnotation(String annotation)
	Add an annotation: an abbreviated indication of something special about this URI that need not be present in every crawl.log line, but should be noted for future reference. Parameters: annotation - the annotation to add; should not contain whitespace or a comma.
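
The restriction on whitespace and commas suggests that annotations end up joined into a single delimited string. The sketch below assumes a comma-separated format and adds an explicit validity check; both are assumptions for illustration, and AnnotationSketch is not a Heritrix class.

```java
import java.util.regex.Pattern;

// Hypothetical model of annotations stored as one comma-separated string.
final class AnnotationSketch {
    private static final Pattern FORBIDDEN = Pattern.compile("[\\s,]");
    private String annotations = "";

    void addAnnotation(String annotation) {
        // Reject values that would make the joined string ambiguous to parse.
        if (FORBIDDEN.matcher(annotation).find()) {
            throw new IllegalArgumentException("no whitespace or commas allowed");
        }
        annotations = annotations.isEmpty() ? annotation : annotations + "," + annotation;
    }

    String getAnnotations() {
        return annotations;
    }

    boolean annotationContains(String str2Find) {
        return annotations.contains(str2Find);
    }

    public static void main(String[] args) {
        AnnotationSketch curi = new AnnotationSketch();
        curi.addAnnotation("example-a"); // made-up annotation values
        curi.addAnnotation("example-b");
        System.out.println(curi.getAnnotations());
    }
}
```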

addCredentialAvatar
public void addCredentialAvatar(CredentialAvatar ca)
	Add an avatar. We do lazy instantiation. Parameters: ca - Credential avatar to add to set of avatars.

addLocalizedError
public void addLocalizedError(String processorName, Throwable ex, String message)
	Make note of a non-fatal error, local to a particular Processor, which should be logged somewhere, but allows processing to continue. This is how you add to the local-error log (the 'localized' here means making an error local rather than global, not making a Swiss-French version of the error). Parameters: processorName - Name of processor the exception was thrown in. Parameters: ex - Throwable to log. Parameters: message - Extra message to log beyond exception message.

addOutLink
public void addOutLink(Link link)
	Add a discovered Link, unless it would exceed the max number to accept. (If so, increment discarded link counter.) Parameters: link - the Link to add.
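
The capping behavior described for addOutLink can be sketched as follows. OutLinkCapSketch is a hypothetical stand-in that uses strings for links and a small made-up limit; the real MAX_OUTLINKS value and the Link type are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collection;

// Hypothetical model of a capped outlink collection with a discard counter.
final class OutLinkCapSketch {
    // Made-up limit for illustration; not the Heritrix default.
    static final int MAX_OUTLINKS = 3;

    private final Collection<String> outLinks = new ArrayList<>();
    private int discardedOutlinks = 0;

    /** Adds the link unless the cap is reached; otherwise counts it as discarded. */
    void addOutLink(String link) {
        if (outLinks.size() < MAX_OUTLINKS) {
            outLinks.add(link);
        } else {
            discardedOutlinks++;
        }
    }

    int outlinksSize() {
        return outLinks.size();
    }

    int discardedCount() {
        return discardedOutlinks;
    }

    public static void main(String[] args) {
        OutLinkCapSketch curi = new OutLinkCapSketch();
        for (int i = 0; i < 5; i++) {
            curi.addOutLink("http://example.com/page" + i);
        }
        System.out.println(curi.outlinksSize());   // 3
        System.out.println(curi.discardedCount()); // 2
    }
}
```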

annotationContains
protected boolean annotationContains(String str2Find)

clearOutlinks
public void clearOutlinks()

createAndAddLink
public void createAndAddLink(String url, CharSequence context, char hopType) throws URIException
	Convenience method for creating a Link with the given string and context. Parameters: url - String to use to create Link. Parameters: context - CharSequence context to use. Parameters: hopType - throws: URIException - if Link UURI cannot be constructed.

createAndAddLinkRelativeToBase
public void createAndAddLinkRelativeToBase(String url, CharSequence context, char hopType) throws URIException
	Convenience method for creating a Link with the given string and context, relative to a previously set base HREF if available (or relative to the current CrawlURI if no other base has been set). Parameters: url - String URL to add as destination of link. Parameters: context - String context where link was discovered. Parameters: hopType - char hop-type indicator. throws: URIException -

createAndAddLinkRelativeToVia
public void createAndAddLinkRelativeToVia(String url, CharSequence context, char hopType) throws URIException
	Convenience method for creating a Link with the given string and context, relative to this CrawlURI's via UURI if available. (If a via is not available, falls back to using #createAndAddLinkRelativeToBase.) Parameters: url - String URL to add as destination of link. Parameters: context - String context where link was discovered. Parameters: hopType - char hop-type indicator. throws: URIException -

createLink
public Link createLink(String url, CharSequence context, char hopType) throws URIException
	Convenience method for creating a Link discovered at this URI with the given string and context. Parameters: url - String to use to create Link. Parameters: context - CharSequence context to use. Parameters: hopType - Returns: Link. throws: URIException - if Link UURI cannot be constructed.

fetchStatusCodesToString
public static String fetchStatusCodesToString(int code)
	Takes a status code and converts it into a human readable string. Parameters: code - the status code. Returns: a human readable string declaring what the status code is.

from
public static CrawlURI from(CandidateURI caUri, long ordinal)
	Make a `CrawlURI` from the passed `CandidateURI`. It's safe to pass a CrawlURI instance. In this case we just return it as a result. Otherwise, we create a new CrawlURI instance. Parameters: caUri - Candidate URI. Parameters: ordinal - Returns: A CrawlURI made from the passed CandidateURI.
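
The 'return it if it is already a CrawlURI, otherwise wrap it' behavior of from() can be sketched with two small stand-in classes. FromSketch, Candidate and Crawl are hypothetical names used only to show the pattern; they carry none of the real state.

```java
// Hypothetical parent/child pair modelling CandidateURI and CrawlURI.
final class FromSketch {
    static class Candidate {
        final String uri;

        Candidate(String uri) {
            this.uri = uri;
        }
    }

    static class Crawl extends Candidate {
        final long ordinal;

        Crawl(Candidate ca, long ordinal) {
            super(ca.uri);
            this.ordinal = ordinal;
        }

        /** Returns the argument itself if it is already a Crawl, else a new Crawl. */
        static Crawl from(Candidate ca, long ordinal) {
            if (ca instanceof Crawl) {
                return (Crawl) ca; // ordinal argument is ignored in this case
            }
            return new Crawl(ca, ordinal);
        }
    }

    public static void main(String[] args) {
        Candidate candidate = new Candidate("http://example.com/");
        Crawl first = Crawl.from(candidate, 1L);
        Crawl second = Crawl.from(first, 2L);
        System.out.println(first == second);  // true: same instance returned
        System.out.println(second.ordinal);   // 1: original ordinal kept
    }
}
```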

getAnnotations
public String getAnnotations()
	Get the annotations set for this uri. Returns: the annotations set for this uri.

getBaseURI
public UURI getBaseURI()
	Get the (HTML) Base URI used for derelativizing internal URIs. Returns: UURI base URI previously set.

getClassSimpleName
protected String getClassSimpleName(Class c)

getContentDigest
public Object getContentDigest()
	Return the retained content-digest value, if any. Returns: Digest value.

getContentDigestSchemeString
public String getContentDigestSchemeString()

getContentDigestString
public String getContentDigestString()

getContentLength
public long getContentLength()
	For completed HTTP transactions, the length of the content-body. Returns: For completed HTTP transactions, the length of the content-body.

getContentSize
public long getContentSize()
	Get the size in bytes of this URI's recorded content, inclusive of things like protocol headers. It is the responsibility of the classes which fetch the URI to set this value accordingly -- it is not calculated/verified within CrawlURI. This value is consulted in reporting/logging/writing-decisions. Returns: contentSize. See Also: CrawlURI.setContentSize(long)

getContentType
public String getContentType()
	Get the content type of this URI. Returns: Fetched URI's content type. May be null.

getCrawlURIString
public String getCrawlURIString()
	Returns: This crawl URI as a string wrapped with 'CrawlURI(' + ')'.

getCredentialAvatars
public Set<CredentialAvatar> getCredentialAvatars()
	Returns: Credential avatars. Null if none set.

getDeferrals
public int getDeferrals()
	Get the deferral count. Returns: the deferral count.

getEmbedHopCount
public int getEmbedHopCount()
	Get the embedded hop count. Returns: the embedded hop count.

getFetchAttempts
public int getFetchAttempts()
	Get the number of attempts at getting the document referenced by this URI. Returns: the number of attempts at getting the document referenced by this URI.

getFetchStatus
public int getFetchStatus()
	Return the overall/fetch status of this CrawlURI for its current trip through the processing loop. Returns: a value from FetchStatusCodes.

getHolder
public Object getHolder()
	Return the 'holder' for the convenience of an external facility. Returns: holder.

getHolderCost
public int getHolderCost()
	Return the 'holderCost' for convenience of external facility (frontier). Returns: value of holderCost.

getHolderKey
public Object getHolderKey()
	Return the 'holderKey' for convenience of an external facility (Frontier). Returns: holderKey.

getHttpRecorder
public HttpRecorder getHttpRecorder()
	Get the http recorder associated with this uri. Returns: the httpRecorder. May be null, but it's set early in FetchHttp, so there is an issue if it's null.

getLinkHopCount
public int getLinkHopCount()
	Get the link hop count. Returns: the link hop count.

getOrdinal
public long getOrdinal()
	Get the ordinal (serial number) assigned at creation. Returns: ordinal.

getOutCandidates
public Collection<CandidateURI> getOutCandidates()
	Returns discovered candidate URIs. The returned collection will be empty until something like LinksScoper promotes discovered Links into CandidateURIs. Elements can be removed from the returned collection, but not added. To add a candidate URI, use CrawlURI.replaceOutlinks(Collection) or CrawlURI.getOutObjects(). Returns: Collection of candidate URIs.

getOutLinks
public Collection<Link> getOutLinks()
	Returns discovered links. The returned collection might be empty if no links were discovered, or if something like LinksScoper promoted the links to CandidateURIs. Elements can be removed from the returned collection, but not added. To add a discovered link, use one of the createAndAdd methods or CrawlURI.getOutObjects(). Returns: Collection of all discovered outbound Links.

getOutObjects
public Collection<Object> getOutObjects()
	Returns all of the outbound objects. The returned Collection will contain Link instances, or CandidateURI instances, or both. Returns: the collection of Links and/or CandidateURIs.
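
The relationship between the three accessors above can be pictured as type-based filtering over one mixed collection. The sketch below shows only that filtering idea, using hypothetical stand-in classes. Note that it copies matching elements into a new list, whereas the collections described above allow removal, so treat it as a simplification rather than the actual mechanism.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical model: one mixed collection, two typed selections over it.
final class OutObjectsSketch {
    static final class Link {
        final String destination;

        Link(String destination) {
            this.destination = destination;
        }
    }

    static final class Candidate {
        final String uri;

        Candidate(String uri) {
            this.uri = uri;
        }
    }

    /** Copies every element of the requested type into a new list. */
    static <T> List<T> filterByType(Collection<Object> mixed, Class<T> type) {
        List<T> selected = new ArrayList<>();
        for (Object o : mixed) {
            if (type.isInstance(o)) {
                selected.add(type.cast(o));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Collection<Object> outObjects = new ArrayList<>();
        outObjects.add(new Link("http://example.com/a"));
        outObjects.add(new Candidate("http://example.com/b"));
        outObjects.add(new Link("http://example.com/c"));

        System.out.println(filterByType(outObjects, Link.class).size());      // 2
        System.out.println(filterByType(outObjects, Candidate.class).size()); // 1
    }
}
```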

getPersistentAList
public AList getPersistentAList()

getPrerequisiteUri
public Object getPrerequisiteUri()
	Get the prerequisite for this URI. A prerequisite is a URI that must be crawled before this URI can be crawled. Returns: the prerequisite for this URI, or null if no prerequisite.

getRecordedSize
public long getRecordedSize()
	Get size of data recorded (transferred). Returns: recorded data size.

getThreadNumber
public int getThreadNumber()
	Get the number of the ToeThread responsible for processing this uri. Returns: the ToeThread number.

getUserAgent
public String getUserAgent()
	Get the user agent to use for crawling this URI. If null, the global setting should be used. Returns: user agent or null.

hasBeenLinkExtracted
public boolean hasBeenLinkExtracted()
	If true then a link extractor has already claimed this CrawlURI and performed link extraction on the document content. This does not preclude other link extractors that may have an interest in this CrawlURI from also doing link extraction, but default behavior should be to not run if link extraction has already been done. There is an onus on link extractors to set this flag if they have run. The only extractor of the default Heritrix set that does not respect this flag is org.archive.crawler.extractor.ExtractorHTTP. It runs against HTTP headers, not the document content. Returns: True if a processor has performed link extraction on this CrawlURI. See Also: CrawlURI.linkExtractorFinished()

hasCredentialAvatars
public boolean hasCredentialAvatars()
	Returns: True if there are avatars attached to this instance.

hasPrerequisiteUri
public boolean hasPrerequisiteUri()
	Returns: True if this CrawlURI has a prerequisite.

hasRfc2617CredentialAvatar
public boolean hasRfc2617CredentialAvatar()
	Returns: True if we have an rfc2617 payload.

incrementDeferrals
public void incrementDeferrals()
	Increment the deferral count.

incrementFetchAttempts
public int incrementFetchAttempts()
	Increment the number of attempts at getting the document referenced by this URI. Returns: the number of attempts at getting the document referenced by this URI.

is2XXSuccess
public boolean is2XXSuccess()
	Returns: True if status code is in the 2xx range. See Also: CrawlURI.isSuccess()

isHeaderTruncatedFetch
public boolean isHeaderTruncatedFetch()

isHttpTransaction
public boolean isHttpTransaction()
	Return true if this is a http transaction. TODO: Compound this and CrawlURI.isPost() method so that there is one place to go to find out if get http, post http, ftp, dns. Returns: True if this is a http transaction.

isLengthTruncatedFetch
public boolean isLengthTruncatedFetch()

isPost
public boolean isPost()
	Returns true if this URI should be fetched by sending a HTTP POST request. TODO: Compound this and CrawlURI.isHttpTransaction() method so that there is one place to go to find out if get http, post http, ftp, dns. Returns: True if this CrawlURI instance is to be posted.

isPrerequisite
public boolean isPrerequisite()
	Returns true if this CrawlURI is a prerequisite. Returns: true if this CrawlURI is a prerequisite.

isSuccess
public boolean isSuccess()
	Ask this URI if it was a success or not. Only makes sense to call this method after execution of HttpMethod#execute. Regard any status larger than 0 as success, except for the caveat below regarding 401s. Use CrawlURI.is2XXSuccess() if looking for a status code in the 200 range. 401s caveat: If any rfc2617 credential data is present and we got a 401, assume it got loaded in FetchHTTP on the expectation that we're to go around the processing chain again. Report this condition as a failure so we get another crack at the processing chain, only this time we'll be making use of the loaded credential data. Returns: True if this URI has been successfully processed. See Also: CrawlURI.is2XXSuccess()
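
The success rules described for isSuccess() and is2XXSuccess() can be written down as two small predicates. This is a sketch based only on the description above: SuccessSketch is a hypothetical name, and the status and credential flag are passed as plain arguments rather than read from a real CrawlURI.

```java
// Hypothetical restatement of the documented success rules.
final class SuccessSketch {
    /** True if the status code is in the 2xx range. */
    static boolean is2XXSuccess(int fetchStatus) {
        return fetchStatus >= 200 && fetchStatus < 300;
    }

    /**
     * Any status above 0 counts as success, except a 401 received while
     * rfc2617 credential data is loaded, which is reported as a failure
     * so the URI gets another pass with the credentials applied.
     */
    static boolean isSuccess(int fetchStatus, boolean hasRfc2617Credential) {
        if (fetchStatus == 401 && hasRfc2617Credential) {
            return false;
        }
        return fetchStatus > 0;
    }

    public static void main(String[] args) {
        System.out.println(isSuccess(200, false)); // true
        System.out.println(isSuccess(404, false)); // true: above 0, though not 2xx
        System.out.println(isSuccess(401, true));  // false: retry with credentials
        System.out.println(isSuccess(-1, false));  // false: not above 0
        System.out.println(is2XXSuccess(404));     // false
    }
}
```

Note that under these rules a 404 is a 'success' in the sense that the fetch completed, which is why is2XXSuccess() exists as a separate, stricter check.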

isTimeTruncatedFetch
public boolean isTimeTruncatedFetch()

isTruncatedFetch
public boolean isTruncatedFetch()
	TODO: Implement truncation using booleans rather than as this ugly String parse. Returns: True if fetch was truncated.

linkExtractorFinished
public void linkExtractorFinished()
	Note that link extraction has been performed on this CrawlURI. A processor doing link extraction should invoke this method once it has finished its work. It should invoke it even if no links are extracted. It should only invoke this method if the link extraction was performed on the document body (not the HTTP headers etc.). See Also: CrawlURI.hasBeenLinkExtracted()
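
The contract between hasBeenLinkExtracted() and linkExtractorFinished() amounts to a claim flag that body extractors check before running and set after running. The sketch below shows that handshake with a hypothetical Uri class and extractor method; the actual extraction work is left out.

```java
// Hypothetical model of the link-extraction claim flag.
final class LinkExtractionFlagSketch {
    static final class Uri {
        private boolean linkExtracted = false;

        boolean hasBeenLinkExtracted() {
            return linkExtracted;
        }

        void linkExtractorFinished() {
            linkExtracted = true;
        }
    }

    /**
     * Runs a (pretend) body link extractor.
     * Returns true if it ran, false if it skipped because another
     * extractor had already handled the document body.
     */
    static boolean runBodyExtractor(Uri curi) {
        if (curi.hasBeenLinkExtracted()) {
            return false; // default behavior: do not extract twice
        }
        // ... link extraction on the document body would happen here ...
        curi.linkExtractorFinished(); // set the flag even if no links were found
        return true;
    }

    public static void main(String[] args) {
        Uri curi = new Uri();
        System.out.println(runBodyExtractor(curi)); // true: first extractor runs
        System.out.println(runBodyExtractor(curi)); // false: second one skips
    }
}
```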

markAsSeed
public void markAsSeed()
	Mark this uri as being a seed.

markPrerequisite
public void markPrerequisite(String preq, ProcessorChain lastProcessorChain) throws URIException
	Do all actions associated with setting a `CrawlURI` as requiring a prerequisite. Parameters: lastProcessorChain - Last processor chain reference. This chain is where this `CrawlURI` goes next. Parameters: preq - Object to set a prerequisite. throws: URIException -

nextProcessor
public Processor nextProcessor()
	Get the next processor to process this URI. Returns: the processor that should process this URI next.

nextProcessorChain
public ProcessorChain nextProcessorChain()
	Get the processor chain that should be processing this URI after the current chain is finished with it. Returns: the next processor chain to process this URI.

outlinksSize
public int outlinksSize()
	Returns: Count of outlinks.

processingCleanup
public void processingCleanup()
	Clean up after a run through the processing chain. Called at the end of the processing chain by Frontier#finish. Null out any state gathered during processing.

removeAlistPersistentMember
public static boolean removeAlistPersistentMember(Object key)
	Parameters: key - Key to remove. Returns: True if list contained the element.

removeCredentialAvatar
public boolean removeCredentialAvatar(CredentialAvatar ca)
	Remove the passed credential avatar from this crawl uri. Parameters: ca - Avatar to remove. Returns: True if we removed passed parameter. False if no operation performed.

removeCredentialAvatars
public void removeCredentialAvatars()
	Remove all credential avatars from this crawl uri.

replaceOutlinks
public void replaceOutlinks(Collection<CandidateURI> links)
	Replace current collection of links with the passed list. Used by Scopers adjusting the list of links (removing those not in scope and promoting Links to CandidateURIs). Parameters: links - a collection of CandidateURIs replacing any previously existing outLinks or outCandidates.

resetDeferrals
public void resetDeferrals()
	Reset deferrals counter.

resetFetchAttempts
public void resetFetchAttempts()
	Reset fetchAttempts counter.

setBaseURI
public void setBaseURI(String baseHref) throws URIException
	Set the (HTML) Base URI used for derelativizing internal URIs. Parameters: baseHref - String base href to use. throws: URIException - if supplied string cannot be interpreted as URI.

setContentDigest
public void setContentDigest(byte[] digestValue)
	Set the retained content-digest value (usu. SHA1). Parameters: digestValue - See Also: CrawlURI.setContentDigest(String, byte[])

setContentDigest
public void setContentDigest(String scheme, byte[] digestValue)

setContentSize
public void setContentSize(long l)
	Sets the 'content size' for the URI, which is considered inclusive of all recorded material (such as protocol headers) or even material 'virtually' considered (as in material from a previous fetch confirmed unchanged with a server). (In contrast, content-length matches the HTTP definition, that of the enclosed content-body.) Should be set by a fetcher or other processor as soon as the final size of recorded content is known. Setting to an artificial/incorrect value may affect other reporting/processing. Parameters: l - Content size.

setContentType
public void setContentType(String ct)(Code)
	Set a fetched uri's content type. Parameters: ct - Contenttype. May be null.

setFetchStatus
public void setFetchStatus(int newstatus)(Code)
	Set the overall/fetch status of this CrawlURI for its current trip through the processing loop. Parameters: newstatus - a value from FetchStatusCodes

setHolder
public void setHolder(Object obj)(Code)
	Remember a 'holder' to which some enclosing/queueing facility has assigned this CrawlURI . Parameters: obj -

setHolderCost
public void setHolderCost(int cost)(Code)
	Remember a 'holderCost' which some enclosing/queueing facility has assigned this CrawlURI Parameters: cost - value to remember

setHolderKey
public void setHolderKey(Object obj)(Code)
	Remember a 'holderKey' which some enclosing/queueing facility has assigned this CrawlURI . Parameters: obj -

setHttpRecorder
public void setHttpRecorder(HttpRecorder httpRecorder)(Code)
	Set the http recorder to be associated with this uri. Parameters: httpRecorder - The httpRecorder to set.

setNextProcessor
public void setNextProcessor(Processor processor)(Code)
	Set the next processor to process this URI. Parameters: processor - the next processor to process this URI.

setNextProcessorChain
public void setNextProcessorChain(ProcessorChain nextProcessorChain)(Code)
	Set the next processor chain to process this URI. Parameters: nextProcessorChain - the next processor chain to process this URI.

setPost
public void setPost(boolean b)(Code)
	Set whether this URI should be fetched by sending a HTTP POST request. Else a HTTP GET request will be used. Parameters: b - Set whether this curi is to be POST'd. Else its to be GET'd.

setPrerequisite
public void setPrerequisite(boolean prerequisite)(Code)
	Set if this CrawlURI is itself a prerequisite URI. Parameters: prerequisite - True if this CrawlURI is itself a prerequiste uri.

setPrerequisiteUri
public void setPrerequisiteUri(Object link)(Code)
	Set a prerequisite for this URI. A prerequisite is a URI that must be crawled before this URI can be crawled. Parameters: link - Link to set as prereq.

setThreadNumber
public void setThreadNumber(int i)(Code)
	Set the number of the ToeThread responsible for processing this uri. Parameters: i - the ToeThread number.

setUserAgent
public void setUserAgent(String string)(Code)
	Set the user agent to use when crawling this URI. If not set the global settings should be used. Parameters: string - user agent to use

skipToProcessor
public void skipToProcessor(ProcessorChain processorChain, Processor processor)(Code)
	Set which processor should be the next processor to process this uri instead of using the default next processor. Parameters: processorChain - the processor chain to skip to. Parameters: processor - the processor in the processor chain to skip to.

skipToProcessorChain
public void skipToProcessorChain(ProcessorChain processorChain)(Code)
	Set which processor chain should be processing this uri next. Parameters: processorChain - the processor chain to skip to.
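
	To show the general idea behind skipping ahead in a processing pipeline, the sketch below walks a list of processor names and jumps forward when a skip is requested. It is a simplification under stated assumptions: SkipSketch and run are hypothetical, processors are represented as plain strings rather than Processor or ProcessorChain objects, and the names used in main are made-up labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Heritrix code: skip ahead in an ordered pipeline. */
public class SkipSketch {

    /**
     * Visits processors in order. When the processor named skipFrom has run,
     * processing jumps directly to skipTo instead of the default next one.
     * Only forward skips are honoured, so the loop always terminates.
     */
    public static List<String> run(List<String> processors, String skipFrom, String skipTo) {
        List<String> visited = new ArrayList<>();
        int i = 0;
        while (i < processors.size()) {
            String current = processors.get(i);
            visited.add(current);
            if (current.equals(skipFrom)) {
                int target = processors.indexOf(skipTo);
                if (target > i) {
                    i = target; // override the default next processor
                    continue;
                }
            }
            i++; // default: move on to the next processor in order
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> pipeline = Arrays.asList("preselect", "fetch", "extract", "write");
        System.out.println(run(pipeline, "preselect", "write"));
    }
}
```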

stripToMinimal
public void stripToMinimal()(Code)
	Remove all attributes set on this URI. This method removes the attribute list.
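
	A minimal sketch of the effect, assuming the attributes live in a key/value map. AttributeSketch is a hypothetical stand-in; the method names putString, containsKey and stripToMinimal are borrowed from this page, but the map-backed implementation is an assumption and not the actual AList-based code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not Heritrix code: dropping a whole attribute list at once. */
public class AttributeSketch {
    private Map<String, Object> attributes; // created lazily; null means "no attribute list"

    public void putString(String key, String value) {
        if (attributes == null) {
            attributes = new HashMap<>();
        }
        attributes.put(key, value);
    }

    public boolean containsKey(String key) {
        return attributes != null && attributes.containsKey(key);
    }

    /** Discards the entire attribute list rather than removing keys one by one. */
    public void stripToMinimal() {
        attributes = null;
    }

    public static void main(String[] args) {
        AttributeSketch uri = new AttributeSketch();
        uri.putString("content-type", "text/html");
        System.out.println(uri.containsKey("content-type"));
        uri.stripToMinimal();
        System.out.println(uri.containsKey("content-type"));
    }
}
```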

Fields inherited from org.archive.crawler.datamodel.CandidateURI

final public static int HIGH(Code)(Java Doc)
final public static int HIGHEST(Code)(Java Doc)
final public static int MEDIUM(Code)(Java Doc)
final public static int NORMAL(Code)(Java Doc)

Methods inherited from org.archive.crawler.datamodel.CandidateURI

protected void clearAList()(Code)(Java Doc)
public boolean containsKey(String key)(Code)(Java Doc)
public CandidateURI createCandidateURI(UURI baseUURI, Link link) throws URIException(Code)(Java Doc)
public CandidateURI createCandidateURI(UURI baseUURI, Link link, int scheduling, boolean seed) throws URIException(Code)(Java Doc)
public static CandidateURI createSeedCandidateURI(UURI uuri)(Code)(Java Doc)
public String flattenVia()(Code)(Java Doc)
public boolean forceFetch()(Code)(Java Doc)
public static CandidateURI fromString(String uriHopsViaString) throws URIException(Code)(Java Doc)
public AList getAList()(Code)(Java Doc)
public synchronized String getCandidateURIString()(Code)(Java Doc)
public String getClassKey()(Code)(Java Doc)
public int getInt(String key)(Code)(Java Doc)
public long getLong(String key)(Code)(Java Doc)
public Object getObject(String key)(Code)(Java Doc)
public String getPathFromSeed()(Code)(Java Doc)
public String[] getReports()(Code)(Java Doc)
public int getSchedulingDirective()(Code)(Java Doc)
public String getString(String key)(Code)(Java Doc)
public int getTransHops()(Code)(Java Doc)
public String getURIString()(Code)(Java Doc)
public UURI getUURI()(Code)(Java Doc)
public UURI getVia()(Code)(Java Doc)
public CharSequence getViaContext()(Code)(Java Doc)
protected void inheritFrom(CandidateURI ancestor)(Code)(Java Doc)
public boolean isLocation()(Code)(Java Doc)
public boolean isSeed()(Code)(Java Doc)
public Iterator keys()(Code)(Java Doc)
public void makeHeritable(String key)(Code)(Java Doc)
public void makeNonHeritable(String key)(Code)(Java Doc)
public boolean needsImmediateScheduling()(Code)(Java Doc)
public boolean needsSoonScheduling()(Code)(Java Doc)
public void putInt(String key, int value)(Code)(Java Doc)
public void putLong(String key, long value)(Code)(Java Doc)
public void putObject(String key, Object value)(Code)(Java Doc)
public void putString(String key, String value)(Code)(Java Doc)
protected UURI readUuri(String u)(Code)(Java Doc)
public void remove(String key)(Code)(Java Doc)
public void reportTo(String name, PrintWriter writer)(Code)(Java Doc)
public void reportTo(PrintWriter writer) throws IOException(Code)(Java Doc)
public boolean sameDomainAs(CandidateURI other) throws URIException(Code)(Java Doc)
protected void setAList(AList alist)(Code)(Java Doc)
public void setClassKey(String key)(Code)(Java Doc)
public void setForceFetch(boolean b)(Code)(Java Doc)
public void setIsSeed(boolean b)(Code)(Java Doc)
protected void setPathFromSeed(String string)(Code)(Java Doc)
public void setSchedulingDirective(int schedulingDirective)(Code)(Java Doc)
public void setVia(UURI via)(Code)(Java Doc)
public String singleLineLegend()(Code)(Java Doc)
public String singleLineReport()(Code)(Java Doc)
public void singleLineReportTo(PrintWriter w)(Code)(Java Doc)
public String toString()(Code)(Java Doc)

Methods inherited from java.lang.Object

native protected Object clone() throws CloneNotSupportedException(Code)(Java Doc)
public boolean equals(Object obj)(Code)(Java Doc)
protected void finalize() throws Throwable(Code)(Java Doc)
final native public Class getClass()(Code)(Java Doc)
native public int hashCode()(Code)(Java Doc)
final native public void notify()(Code)(Java Doc)
final native public void notifyAll()(Code)(Java Doc)
public String toString()(Code)(Java Doc)
final native public void wait(long timeout) throws InterruptedException(Code)(Java Doc)
final public void wait(long timeout, int nanos) throws InterruptedException(Code)(Java Doc)
final public void wait() throws InterruptedException(Code)(Java Doc)
