DocumentInfo

DocumentInfo is an object that HTMLComponent passes to the DocumentRequestHandler. The handler uses it to obtain information about the requested document, such as its location, type and encoding. The handler also uses it to pass back to HTMLComponent any attributes it discovers about the document, such as its actual encoding.

Calling setPage on an HTMLComponent triggers a call to the DocumentRequestHandler's resourceRequested method with a populated DocumentInfo object. The same method is called when the user clicks a link and when referenced images and CSS files are needed. A DocumentRequestHandler implementation should be aware of several useful DocumentInfo getters and setters, which are discussed next.
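To make the request flow concrete, here is a minimal, self-contained Java sketch of a handler. `DocInfo` and `RequestHandler` are hypothetical stand-ins that model only what this section discusses (the URL, the encoding, and a resourceRequested method assumed here to return an InputStream); they are not the real library classes. The handler serves pages from an in-memory map instead of the network and hints the encoding back through setEncoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DocumentInfo: only the URL and encoding are modeled.
class DocInfo {
    private final String url;
    private String encoding = "ISO-8859-1"; // the default named in the text

    DocInfo(String url) { this.url = url; }

    String getUrl() { return url; }
    String getEncoding() { return encoding; }
    void setEncoding(String encoding) { this.encoding = encoding; }
}

// Hypothetical stand-in for DocumentRequestHandler.
interface RequestHandler {
    InputStream resourceRequested(DocInfo info);
}

// Serves documents from an in-memory map instead of the network.
class InMemoryHandler implements RequestHandler {
    private final Map<String, String> pages = new HashMap<>();

    void put(String url, String content) { pages.put(url, content); }

    @Override
    public InputStream resourceRequested(DocInfo info) {
        String content = pages.get(info.getUrl());
        if (content == null) {
            return null; // this sketch signals "not found" with null
        }
        // Content is served as UTF-8, so hint that back via the DocInfo.
        info.setEncoding("UTF-8");
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }
}

class HandlerDemo {
    public static void main(String[] args) throws Exception {
        InMemoryHandler handler = new InMemoryHandler();
        handler.put("file:///index.html", "<html><body>Hello</body></html>");
        DocInfo info = new DocInfo("file:///index.html");
        InputStream in = handler.resourceRequested(info);
        // Decode with whatever encoding the handler reported.
        System.out.println(info.getEncoding() + ": " + new String(in.readAllBytes(), info.getEncoding()));
    }
}
```

A real handler would open a file or network connection instead of using the map, but the exchange keeps the same shape: read what HTMLComponent put into the DocumentInfo, fetch the resource, and write back what was learned.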

getUrl

This method returns the absolute URL of the requested document. The absolute URL is calculated automatically, relative to the page on which the link was clicked. From it, implementations can determine the document's protocol (file, http and so on) and domain, and act accordingly. For example, a handler can allow only certain protocols or domains, or support custom protocol strings.
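A protocol and domain allow-list could be sketched as follows. The allowed protocols and the host `example.com` are placeholders, the class name is hypothetical, and the string parsing is deliberately simple (it does not handle user-info such as `user@host`). A handler would apply `isAllowed` to the value returned by getUrl before fetching anything.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UrlFilter {
    // Placeholder policy: which protocols and hosts the handler will fetch.
    private static final Set<String> ALLOWED_PROTOCOLS =
            new HashSet<>(Arrays.asList("http", "https", "file"));
    private static final Set<String> ALLOWED_HOSTS =
            new HashSet<>(Arrays.asList("example.com"));

    // "http://host/path" -> "http"; returns "" if there is no "://".
    static String protocolOf(String url) {
        int i = url.indexOf("://");
        return i < 0 ? "" : url.substring(0, i).toLowerCase(Locale.ROOT);
    }

    // Host between "://" and the first '/', '?', '#' or ':' (port separator).
    static String hostOf(String url) {
        int i = url.indexOf("://");
        if (i < 0) return "";
        int start = i + 3;
        int end = start;
        while (end < url.length() && "/?#:".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        return url.substring(start, end).toLowerCase(Locale.ROOT);
    }

    static boolean isAllowed(String url) {
        String protocol = protocolOf(url);
        if (!ALLOWED_PROTOCOLS.contains(protocol)) return false;
        if (protocol.equals("file")) return true; // local files have no host to check
        return ALLOWED_HOSTS.contains(hostOf(url));
    }

    public static void main(String[] args) {
        for (String url : new String[] {
                "http://example.com/index.html",
                "https://tracker.example.net/pixel.gif",
                "ftp://example.com/file.txt",
                "file:///sdcard/page.html"}) {
            System.out.println(url + " -> " + isAllowed(url));
        }
    }
}
```

A custom protocol string could be supported the same way, by adding it to the allowed set and mapping it to a real resource location inside the handler.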

getEncoding and setEncoding

getEncoding and setEncoding are important when reading documents that may use different encodings, because the encoding of HTML and CSS documents can be specified in several places.

In the request direction, when a form is posted, its FORM tag can have an ENCTYPE attribute that specifies the form's encoding. In this situation the encoding in the provided DocumentInfo differs from the default (ISO-8859-1), so the handler has to query it with getEncoding in order to set the request's encoding headers appropriately.

In the response direction, the encoding can be specified by the response headers (the charset parameter of the Content-Type header). For HTMLComponent to read the document correctly, the handler must then pass that encoding on by calling setEncoding. The encoding can also be indicated in other ways, such as a BOM (byte order mark). It is the DocumentRequestHandler's responsibility to detect the encoding and relay it to HTMLComponent through the DocumentInfo object.
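The response-side detection could look like the following sketch. It checks for a BOM first, then the charset parameter of the Content-Type header, and falls back to ISO-8859-1. The class and method names are hypothetical, only UTF-8 and UTF-16 BOMs are recognized, and the BOM-before-header order is borrowed from the WHATWG HTML encoding-sniffing rules rather than from anything stated about HTMLComponent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class EncodingSniffer {
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Encoding implied by a byte order mark, or null (UTF-8 and UTF-16 only).
    static String charsetFromBom(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    // Charset parameter of a Content-Type value such as "text/html; charset=UTF-8".
    static String charsetFromContentType(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // charset="utf-8"
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // BOM first, then the header, then the ISO-8859-1 default.
    static String detect(String contentType, byte[] head) {
        String cs = charsetFromBom(head);
        if (cs == null) cs = charsetFromContentType(contentType);
        return cs != null ? cs : DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        byte[] page = "<html></html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect("text/html; charset=UTF-8", page)); // prints UTF-8
        System.out.println(detect("text/html", page));                // prints ISO-8859-1
    }
}
```

A handler would pass the result of `detect` to setEncoding before returning the stream. If a BOM was found, the handler may also need to skip those bytes so they are not read as content.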

getParams

getParams returns the request parameters. It can be used, for example, to screen the parameters before they are sent to the server; a matching setter allows the screened parameters to be written back.
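Screening parameters might look like the following sketch. It assumes the parameters arrive as a URL-encoded `name=value&name=value` string, which is an assumption about getParams rather than documented behavior. The class name is hypothetical and `sessionId` is only an illustrative parameter name. The screened string would then be handed back through the matching setter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ParamScreen {
    // Drops every name=value pair whose (still URL-encoded) name is blocked.
    static String removeParams(String params, Set<String> blocked) {
        if (params == null || params.isEmpty()) return params;
        List<String> kept = new ArrayList<>();
        for (String pair : params.split("&")) {
            if (pair.isEmpty()) continue; // tolerate "a=1&&b=2"
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            if (!blocked.contains(name)) kept.add(pair);
        }
        return String.join("&", kept);
    }

    public static void main(String[] args) {
        Set<String> blocked = new HashSet<>(Arrays.asList("sessionId"));
        System.out.println(removeParams("q=lwuit&sessionId=42&page=2", blocked));
        // prints q=lwuit&page=2
    }
}
```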

getExpectedContentType and setExpectedContentType

The expected content type is the kind of resource HTMLComponent expects to receive for the request in question: an HTML document (TYPE_HTML) when a page is set or a link is clicked, an image (TYPE_IMAGE) for image references, and a CSS file (TYPE_CSS) for CSS references. Querying the expected content type can guide processing. For example, a handler may check the encoding only for HTML and CSS documents and skip it for images, or it may cache images but not HTML documents.
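The decisions described above can be expressed as a small policy class. The TYPE_* values here are hypothetical stand-ins for the DocumentInfo constants (whose actual values may differ), and the `loader` function stands in for whatever actually fetches the bytes. The sketch checks encodings only for HTML and CSS and caches only images.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ContentTypePolicy {
    // Hypothetical values standing in for DocumentInfo's TYPE_* constants.
    static final int TYPE_HTML = 0;
    static final int TYPE_IMAGE = 1;
    static final int TYPE_CSS = 2;

    private final Map<String, byte[]> imageCache = new HashMap<>();

    // Text resources (HTML, CSS) need their encoding checked; images do not.
    static boolean needsEncodingCheck(int expectedType) {
        return expectedType == TYPE_HTML || expectedType == TYPE_CSS;
    }

    // In this sketch only images are cached.
    static boolean isCacheable(int expectedType) {
        return expectedType == TYPE_IMAGE;
    }

    // Loads via the loader, reusing cached bytes for images.
    byte[] fetch(String url, int expectedType, Function<String, byte[]> loader) {
        if (!isCacheable(expectedType)) {
            return loader.apply(url);
        }
        return imageCache.computeIfAbsent(url, loader);
    }

    public static void main(String[] args) {
        ContentTypePolicy policy = new ContentTypePolicy();
        Function<String, byte[]> loader = url -> {
            System.out.println("loading " + url);
            return new byte[0];
        };
        policy.fetch("http://example.com/logo.png", TYPE_IMAGE, loader);
        policy.fetch("http://example.com/logo.png", TYPE_IMAGE, loader); // served from cache
        policy.fetch("http://example.com/index.html", TYPE_HTML, loader);
    }
}
```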

getFullUrl and getBaseUrl

Other informational methods include getFullUrl, which returns a string composed of the absolute URL followed by the request parameters (if there are any, and only for a GET request), and getBaseUrl, which returns the document's base URL.
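The relationship between the full URL, the absolute URL and the parameters can be illustrated with this sketch. It is not the library's implementation: the class, the method name and the `isPost` flag are hypothetical, and appending with `&` when the URL already contains a query string is a choice made here.

```java
class FullUrl {
    // Hypothetical: absolute URL plus parameters, but only for GET requests.
    static String fullUrl(String absoluteUrl, String params, boolean isPost) {
        if (isPost || params == null || params.isEmpty()) {
            return absoluteUrl; // POST parameters travel in the body, not the URL
        }
        // Append to an existing query string with '&', otherwise start one with '?'.
        char separator = absoluteUrl.indexOf('?') < 0 ? '?' : '&';
        return absoluteUrl + separator + params;
    }

    public static void main(String[] args) {
        System.out.println(fullUrl("http://example.com/search", "q=lwuit", false));
        // prints http://example.com/search?q=lwuit
    }
}
```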