Request Processing in Apache

Processing HTTP requests is central to most web applications. In this article, we present an overview of request handling in Apache and show how modules can hook into request processing to build custom applications and components.

This article should help developers getting started with Apache modules, and equip them to work comfortably with the API documentation and code examples shipped with Apache itself.

Introduction


The Apache architecture comprises a common core, a platform-dependent layer (the Apache Portable Runtime, or APR), and a number of modules. Any Apache-based application, even one as simple as serving Apache's default "it worked" page, requires several modules. Users of Apache need not be aware of this, but for application developers, understanding modules and Apache's module API is the key to working with Apache.

Most, though by no means all, modules are concerned with some aspect of processing an HTTP request. But there is rarely, if ever, a reason for a module to concern itself with every aspect of HTTP: that is the business of the httpd core. The advantage of a modular approach is that a module can easily focus on a particular task while ignoring aspects of HTTP that are not relevant to it.

In this article, we present the Apache request processing architecture, and show how a module can hook into, and optionally control, different parts of the request cycle.

Content Generation


[Figure 1: the minimal webserver]

The simplest possible formulation of a webserver is a program that listens for HTTP requests and returns a response when it receives one. In Apache, this is fundamentally the business of a content generator, the core of the webserver. Exactly one content generator runs for every HTTP request. Any module may register content generators, normally by defining a function that is selected by a handler name, which can be configured using the SetHandler or AddHandler directives in httpd.conf. Any request for which no module provides a generator is handled by the default generator, which simply returns a file mapped directly from the request to the filesystem. Modules that implement one or more content generators are known as content generator or handler modules.
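
The selection rule can be illustrated with a small, self-contained C model. This is not the Apache API: sim_request, hello_generator, default_generator and run_content_generator are hypothetical names, and OK and DECLINED are defined locally with the same values as in httpd.h. Each generator checks the handler name and declines requests that are not its own; the first one that accepts is the one that runs, and a default generator placed last catches everything else.

```c
#include <stdio.h>
#include <string.h>

/* Return codes with the same values as httpd.h (model only). */
#define OK 0
#define DECLINED -1

/* Hypothetical, much-reduced stand-in for request_rec. */
typedef struct {
    const char *handler;   /* as set by SetHandler/AddHandler, or NULL */
    const char *filename;  /* URL already mapped to the filesystem */
    char body[128];        /* response produced by the generator */
} sim_request;

typedef int (*generator_fn)(sim_request *r);

/* A module's generator: acts only when the handler name is its own. */
static int hello_generator(sim_request *r) {
    if (r->handler == NULL || strcmp(r->handler, "hello-handler") != 0)
        return DECLINED;              /* not ours: let another module try */
    snprintf(r->body, sizeof r->body, "Hello, world");
    return OK;
}

/* Stand-in for the default generator: "serve" the mapped file. */
static int default_generator(sim_request *r) {
    snprintf(r->body, sizeof r->body, "contents of %s", r->filename);
    return OK;
}

/* Offer the request to each generator in turn; the first one that
 * does not decline is the single generator that runs. */
static int run_content_generator(sim_request *r) {
    generator_fn generators[] = { hello_generator, default_generator };
    size_t n = sizeof generators / sizeof generators[0];
    for (size_t i = 0; i < n; i++) {
        int rv = generators[i](r);
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}
```

With handler set to "hello-handler" the module's generator answers; with no handler set, the request falls through to the default generator.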

Request Processing Phases


[Figure 2: Request Processing]

In principle, a content generator can handle all the functions of a webserver: for example, a CGI program gets the request and produces the response, and can take full control of what happens between them. But in common with other webservers, Apache splits request processing into distinct phases. So, for example, it checks whether the user is authorised to do something before the content generator does that thing.

There are several request phases before the content generator. These serve to examine and perhaps manipulate the request headers, and determine what to do with the request. For example:

  • The request URL will be matched against the configuration to determine what content generator should be used.
  • The request URL will normally be mapped to the filesystem. The mapping may be to a static file, a CGI script, or whatever else the content generator may use.
  • If content negotiation is enabled, mod_negotiation will find the version of the resource that best matches the browser's preference. For example, the Apache manual pages are served in the language requested by the browser.
  • Access and authentication modules will enforce the server's access rules, and determine whether the user is permitted to do what has been requested.
  • mod_alias or mod_rewrite may change the effective URL in the request.

In addition, there is a request logging phase, which comes after the content generator has sent a reply to the browser.

A module may hook its own handlers into any of these phases. Modules that concern themselves with the phases before content generation are known as metadata modules. Those that deal with logging are known as logging modules.
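
As a rough, self-contained illustration (plain C, not the Apache API), the phases can be modelled as functions run strictly in order over one shared request structure. The phase names echo Apache's hooks, but sim_request, note and process_request are hypothetical, and the filename mapping is made up for the example. The metadata phases only update the request, the generator relies on what they decided, and the logger runs last and only reads it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for request_rec. */
typedef struct {
    const char *uri;
    const char *filename;  /* filled in by the URL-mapping phase */
    const char *handler;   /* chosen before content generation */
    int status;            /* set by the generator, read by the logger */
    char trace[128];       /* records the order in which phases ran */
} sim_request;

typedef void (*phase_fn)(sim_request *r);

/* Append a phase name to the trace, space-separated. */
static void note(sim_request *r, const char *phase) {
    size_t used = strlen(r->trace);
    snprintf(r->trace + used, sizeof r->trace - used, "%s%s",
             used ? " " : "", phase);
}

/* Metadata phases: examine and update the request, produce no output. */
static void translate_name(sim_request *r) {
    r->filename = "/var/www/html/index.html";  /* illustrative mapping */
    note(r, "translate");
}
static void check_access(sim_request *r) { note(r, "access"); }
static void fixups(sim_request *r) {
    if (r->handler == NULL)
        r->handler = "default";                /* last chance to choose */
    note(r, "fixups");
}

/* Content generation: uses what the metadata phases decided. */
static void generate(sim_request *r) {
    r->status = (r->filename != NULL) ? 200 : 404;
    note(r, "handler");
}

/* Logging: runs after the response and only reads the request. */
static void log_transaction(sim_request *r) {
    char entry[32];
    snprintf(entry, sizeof entry, "log(%s %d)", r->uri, r->status);
    note(r, entry);
}

static void process_request(sim_request *r) {
    const phase_fn phases[] = {
        translate_name, check_access, fixups, generate, log_transaction
    };
    for (size_t i = 0; i < sizeof phases / sizeof phases[0]; i++)
        phases[i](r);              /* strictly in order, one after another */
}
```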

The Data Axis and Filters


[Figure 3: The Data Axis]

What we have described above is essentially the architecture of every general-purpose webserver. There are differences in detail, but the metadata -> generator -> logger sequence of request processing phases is common to them all.

The major innovation in Apache 2 that transforms it from a 'mere' webserver (like Apache 1.3 and others) into a powerful applications platform is the filter chain. This can be represented as a data axis, orthogonal to the request processing axis. The request data may be processed by input filters before reaching the content generator, and the response may be processed by output filters before being sent to the client. Filters enable a far cleaner and more efficient implementation of data processing than was possible in the past, and separate it from content generation. Examples of filters include server-side includes (SSI), XML and XSLT processing, gzip compression, and encryption (SSL).
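
A minimal model of the output side of the data axis, assuming nothing from Apache itself: each filter transforms a chunk of data and passes it to the next filter, and the generator pushes its response one chunk at a time. sim_filter, pass_chunk and the two filters are hypothetical stand-ins for ap_filter_t and ap_pass_brigade, and a plain C string stands in for a bucket brigade.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ap_filter_t; a "brigade" is one text chunk. */
typedef struct sim_filter sim_filter;
typedef int (*filter_fn)(sim_filter *f, char *chunk);

struct sim_filter {
    filter_fn func;
    sim_filter *next;   /* the next filter, towards the client */
    char *out;          /* used only by the final "network" filter */
    size_t out_size;
};

/* Analogue of ap_pass_brigade: hand data to the given filter. */
static int pass_chunk(sim_filter *f, char *chunk) {
    return f->func(f, chunk);
}

/* A content filter: transform the data, then pass it on. */
static int upcase_filter(sim_filter *f, char *chunk) {
    for (char *p = chunk; *p; p++)
        *p = (char)toupper((unsigned char)*p);
    return pass_chunk(f->next, chunk);
}

/* Last filter in the chain: "sends" the data to the client. */
static int network_filter(sim_filter *f, char *chunk) {
    strncat(f->out, chunk, f->out_size - strlen(f->out) - 1);
    return 0;
}

/* The generator pushes its output chunk by chunk; each chunk flows
 * through the whole chain before the next one is produced. */
static int generate(sim_filter *chain) {
    const char *parts[] = { "hello, ", "filter ", "chain" };
    for (size_t i = 0; i < sizeof parts / sizeof parts[0]; i++) {
        char chunk[32];
        snprintf(chunk, sizeof chunk, "%s", parts[i]);
        int rv = pass_chunk(chain, chunk);
        if (rv != 0)
            return rv;      /* stop on error, as a real generator would */
    }
    return 0;
}
```

Because data is pushed through piece by piece, the upper-casing filter and the network filter each run several times, interleaved with the generator, rather than once each in sequence.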

Order of Processing


Before proceeding to discuss how a module hooks itself into the stages of processing a request and its data, let's pause to clear up a matter that often causes confusion amongst new module developers: namely, the order of processing.

The request processing axis is straightforward: the phases happen strictly in order. But confusion arises in the data axis. For maximum efficiency, this is pipelined, so the content generator and filters do not run in a deterministic order. So, for example, you cannot in general set something in an input filter and expect it to apply in the generator or output filters.

The order of processing is in fact centred on the content generator, which is responsible for pulling data from the input filter stack and pushing data to the output filters (where applicable, in both cases). When a generator or filter needs to set something affecting the request as a whole, it must do so before passing any data down the chain (generator and output filters), or before returning data to the caller (input filters). Techniques for this will be discussed in another article.
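
The rule about setting request-wide data before passing anything down the chain can be sketched with another hypothetical model. Here the response headers are treated as committed as soon as the first chunk of body data is sent, so a filter that wants to add a header does so on its first call, tracked in a per-request context that plays the role of f->ctx, before passing any data. All names are illustrative, not Apache API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model: headers are committed with the first body data. */
typedef struct {
    char headers[128];
    int headers_sent;
    char body[128];
} sim_response;

/* Returns 0 on success, -1 if it is too late to change the headers. */
static int set_header(sim_response *resp, const char *line) {
    if (resp->headers_sent)
        return -1;
    strncat(resp->headers, line,
            sizeof resp->headers - strlen(resp->headers) - 1);
    return 0;
}

static void network_send(sim_response *resp, const char *chunk) {
    resp->headers_sent = 1;         /* headers go out with the first data */
    strncat(resp->body, chunk, sizeof resp->body - strlen(resp->body) - 1);
}

/* Per-request filter state, in the spirit of a filter's f->ctx. */
typedef struct { int started; } filter_ctx;

/* Correct pattern: do request-wide work on the first call, before
 * passing any data down the chain. */
static int header_filter(filter_ctx *ctx, sim_response *resp,
                         const char *chunk) {
    if (!ctx->started) {
        ctx->started = 1;
        if (set_header(resp, "X-Filtered: yes\r\n") != 0)
            return -1;
    }
    network_send(resp, chunk);
    return 0;
}
```

Once the first chunk has gone out, a later attempt to set a header fails in this model, which is the situation the paragraph above warns against.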

Processing Hooks


Now that we have an overview of request processing in Apache, we can proceed to show how a module hooks into it to play a part.

The Apache module structure declares several (optional) data and function members:


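module AP_MODULE_DECLARE_DATA my_module = {
  STANDARD20_MODULE_STUFF,
  my_dir_conf,      /* create per-directory configuration */
  my_dir_merge,     /* merge per-directory configurations */
  my_server_conf,   /* create per-server configuration */
  my_server_merge,  /* merge per-server configurations */
  my_cmds,          /* table of configuration directives */
  my_hooks          /* register the module's hooks */
} ;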

The relevant function for the module to create request processing hooks is the final member:


static void my_hooks(apr_pool_t* pool) {
  /* create request processing hooks as required */
}

Which hooks we need to create here depends on what part or parts of the request our module is interested in. For example, a module that implements a content generator (handler) will need a handler hook, looking something like:


  ap_hook_handler(my_handler, NULL, NULL, APR_HOOK_MIDDLE) ;

Now my_handler will be called when a request reaches the content generation phase. Hooks for other request phases are similar; a few commonly used ones are:

ap_hook_post_read_request
First chance to look at the request after accepting it.
ap_hook_fixups
Last chance to look at the request before content generation.
ap_hook_log_transaction
Logging hook.

Between the general post_read_request and fixups hooks are several other hooks designated for specific purposes: for example, access and authentication modules have specific hooks for checking permissions. All these hooks take exactly the same form as the handler hook. For further details, see http_config.h, http_request.h and http_protocol.h.

The prototype for a handler for any of these phases is:


static int my_handler(request_rec* r) {
  /* do something with the request, then return OK, DECLINED,
     or an HTTP status code */
  return DECLINED ;
}

The request_rec is the main Apache data structure representing all aspects of an HTTP request.

The return value of my_handler is one of:

OK
my_handler has handled the request successfully. The handler phase is finished.
DECLINED
my_handler is not interested in this request. Let some other handler deal with it.
Any HTTP response code
An error condition occurred while processing the request. This diverts the request processing path: normal processing is aborted, and the server instead returns an ErrorDocument.
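
How a phase acts on these return values can be modelled in a few lines of self-contained C. The constants match httpd.h (OK is 0, DECLINED is -1), but run_handlers, send_error_document and the two handlers are hypothetical, and turning a request that every handler declines into a 500 is this model's own choice.

```c
#include <stdio.h>
#include <string.h>

#define OK 0                          /* same values as httpd.h */
#define DECLINED -1
#define HTTP_NOT_FOUND 404
#define HTTP_INTERNAL_SERVER_ERROR 500

/* Hypothetical, much-reduced stand-in for request_rec. */
typedef struct {
    const char *uri;
    int status;
    char body[64];
} sim_request;

typedef int (*handler_fn)(sim_request *r);

static int not_mine(sim_request *r) { (void)r; return DECLINED; }

static int static_pages(sim_request *r) {
    if (strcmp(r->uri, "/") != 0)
        return HTTP_NOT_FOUND;        /* error: divert the request */
    snprintf(r->body, sizeof r->body, "home page");
    return OK;
}

/* Stand-in for returning an ErrorDocument. */
static void send_error_document(sim_request *r, int code) {
    r->status = code;
    snprintf(r->body, sizeof r->body, "error %d", code);
}

static void run_handlers(sim_request *r, handler_fn *handlers, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int rv = handlers[i](r);
        if (rv == DECLINED)
            continue;                 /* not interested: try the next one */
        if (rv == OK) {
            r->status = 200;          /* handled: the phase is finished */
            return;
        }
        send_error_document(r, rv);   /* HTTP code: abort normal processing */
        return;
    }
    send_error_document(r, HTTP_INTERNAL_SERVER_ERROR);  /* model's choice */
}
```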

Implementation of the handlers will be discussed in other articles.

Filter Hooks


Filters are also normally registered in the my_hooks function, but the API is rather different:


  ap_register_output_filter("my-output-filter-name", my_output_filter,
	NULL, AP_FTYPE_RESOURCE) ;
  ap_register_input_filter("my-input-filter-name", my_input_filter,
	NULL, AP_FTYPE_RESOURCE) ;

with the filter function prototypes


static apr_status_t my_output_filter(ap_filter_t* f, apr_bucket_brigade* bb) {
  /* read a chunk of data, process it, pass it to the next filter */
  return APR_SUCCESS ;
}
static apr_status_t my_input_filter(ap_filter_t* f, apr_bucket_brigade* bb,
	ap_input_mode_t mode, apr_read_type_e block, apr_off_t nbytes) {
  /* pull a chunk of data from the next filter, process it, return it in bb */
  return APR_SUCCESS ;
}

Filter functions will normally return APR_SUCCESS, either explicitly as above or as the return code from the next filter via an ap_pass_brigade or ap_get_brigade call. Any other return value is an internal server error and should only happen when the request is unrecoverable.
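
The pull model of input filters can be illustrated with a similar self-contained sketch. get_data plays the role of ap_get_brigade: each filter asks the next filter (towards the network) for data, processes it, and returns it to its caller, passing any non-success status back up unchanged. The names, the 4-byte reads and the SIM_EOF status are assumptions of the model; real Apache marks the end of the data with an EOS bucket in the brigade.

```c
#include <stdio.h>
#include <string.h>

#define SIM_SUCCESS 0
#define SIM_EOF 1            /* hypothetical "no more data" status */

typedef struct sim_in_filter sim_in_filter;
typedef int (*in_filter_fn)(sim_in_filter *f, char *buf, size_t len);

struct sim_in_filter {
    in_filter_fn func;
    sim_in_filter *next;     /* towards the network */
    const char *src;         /* used only by the bottom "socket" filter */
    size_t pos;
};

/* Analogue of ap_get_brigade: ask a filter for a chunk of data. */
static int get_data(sim_in_filter *f, char *buf, size_t len) {
    return f->func(f, buf, len);
}

/* Bottom of the chain: read raw bytes (from a string, 4 at a time). */
static int socket_filter(sim_in_filter *f, char *buf, size_t len) {
    size_t avail = strlen(f->src) - f->pos;
    if (avail == 0)
        return SIM_EOF;
    size_t n = avail < len - 1 ? avail : len - 1;
    if (n > 4)
        n = 4;
    memcpy(buf, f->src + f->pos, n);
    buf[n] = '\0';
    f->pos += n;
    return SIM_SUCCESS;
}

/* A processing filter: pull from below, strip '\r', return the result. */
static int strip_cr_filter(sim_in_filter *f, char *buf, size_t len) {
    int rv = get_data(f->next, buf, len);
    if (rv != SIM_SUCCESS)
        return rv;                   /* propagate EOF or errors unchanged */
    char *w = buf;
    for (const char *p = buf; *p; p++)
        if (*p != '\r')
            *w++ = *p;
    *w = '\0';
    return SIM_SUCCESS;
}

/* The consumer (e.g. a content generator) pulls until the data ends. */
static int read_body(sim_in_filter *top, char *out, size_t out_size) {
    char chunk[16];
    int rv;
    out[0] = '\0';
    while ((rv = get_data(top, chunk, sizeof chunk)) == SIM_SUCCESS)
        strncat(out, chunk, out_size - strlen(out) - 1);
    return rv == SIM_EOF ? SIM_SUCCESS : rv;
}
```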

As with handlers, implementation of filters will be discussed in other articles. The API documentation is in util_filter.h.

The request data structure


The central data structure that represents an HTTP request is the request_rec. It is created when Apache accepts the request, and is provided to all request processing functions, as shown in the prototype my_handler above. In a content filter, the request_rec is available as f->r.

The request_rec is a large struct containing, directly or indirectly, all the data fields a handler needs to process the request. A metadata handler works by reading and updating fields in the request_rec; a content generator or filter may do the same but also processes I/O; and a logger gets its information from the request_rec. For full details, see the API header file httpd.h.

We'll conclude this article with a few quick tips about using the request_rec. You'll need to look at the API, or other articles where available, for details of how to use them, but these deal with frequently asked questions.

  • The request pool r->pool is available for all resource allocations having the lifetime of the request.

  • The request and response headers are held in r->headers_in and r->headers_out respectively. These are of type apr_table_t, and can be accessed using apr_table functions such as apr_table_get and apr_table_set.

  • The handler field determines what handler will run. Content generators should check it and return an immediate DECLINED if it isn't for them. Metadata handlers may set this field to select a generator.

  • The input_filters and output_filters fields may be used as I/O descriptors just as in a filter module. Alternatively, the higher-level I/O functions carried over from Apache 1.x, such as ap_rputs and ap_rprintf, are available to content generators, but not to filters.

  • Configuration directives are provided in the per_dir_config field, and may be accessed using ap_get_module_config (also ap_set_module_config, though that would be highly inappropriate during a request).

  • The other core data structures are available as r->connection (the conn_rec), r->server (the server_rec), etc. Their configurations are similarly available to the request.

  • An additional configuration field request_config is re-initialised for every request, and is the place to store request data for the module between request phases.
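
As an illustration of the header tables, the following hypothetical model mimics the get/set behaviour of apr_table_t: header names compare case-insensitively, and a set replaces any existing value. Unlike apr_table_set, which copies its arguments, this sketch simply stores the pointers it is given, and sim_table, table_get, table_set and echo_user_agent are names invented for the example.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical stand-in for apr_table_t with case-insensitive keys. */
typedef struct {
    const char *key[MAX_ENTRIES];
    const char *val[MAX_ENTRIES];
    size_t n;
} sim_table;

/* Case-insensitive comparison, as for HTTP header names. */
static int same_name(const char *a, const char *b) {
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;                  /* both strings must end together */
}

/* Like apr_table_get: NULL if the header is absent. */
static const char *table_get(const sim_table *t, const char *key) {
    for (size_t i = 0; i < t->n; i++)
        if (same_name(t->key[i], key))
            return t->val[i];
    return NULL;
}

/* Like apr_table_set: replace an existing value or add a new entry.
 * Returns -1 if the (fixed-size) model table is full. */
static int table_set(sim_table *t, const char *key, const char *val) {
    for (size_t i = 0; i < t->n; i++)
        if (same_name(t->key[i], key)) {
            t->val[i] = val;
            return 0;
        }
    if (t->n == MAX_ENTRIES)
        return -1;
    t->key[t->n] = key;
    t->val[t->n] = val;
    t->n++;
    return 0;
}

/* Hypothetical request fragment holding the two header tables. */
typedef struct {
    sim_table headers_in;
    sim_table headers_out;
} sim_request;

/* A handler reads a request header and sets a response header. */
static void echo_user_agent(sim_request *r) {
    const char *ua = table_get(&r->headers_in, "user-agent");
    table_set(&r->headers_out, "X-Seen-Agent", ua ? ua : "unknown");
}
```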