URL Rewriting Module
Version 2.4
This module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs on the fly.
It supports an unlimited number of additional rule conditions (which can operate on a lot of variables, including HTTP headers) for granular matching and external database lookups (either via plain text tables, DBM hash files or external processes) for advanced URL substitution.
It operates on the full URLs (including the PATH_INFO part) both in per-server context (httpd.conf) and per-dir context (.htaccess) and can even generate QUERY_STRING parts in the result. The rewritten result can lead to internal sub-processing, external request redirection or to internal proxy throughput.
The latest version can be found on
http://www.engelschall.com/sw/mod_rewrite/
Copyright © 1996 The Apache Group, All rights reserved.
Copyright © 1996 Ralf S. Engelschall, All rights reserved.
Written for The Apache Group by
Ralf S. Engelschall
rse@engelschall.com
http://www.engelschall.com/
Contents
- About this Module
- Installation Procedure
- Configuration Directives
- RewriteEngine
- RewriteOptions
- RewriteLog
- RewriteLogLevel
- RewriteMap
- RewriteBase
- RewriteCond
- RewriteRule
- Special Features
- Practical Examples
- URL Canonicalization
- Homogeneous URL Layout
- Secure CGI Script Integration
- Simplification Of Services
- Backward Compatibility for Obsolete URLs
- The Trailing Slash Problem
- Map External Stuff into Local Namespace
- Hardcore Example: net.sw
- Static HTML, Dynamically Created
- Blocking Some URLs
- Programmed Maps
- Partially Forwarded Homepages
- Frequently Asked Questions
- Where can I find more example configurations?
- Are there any published articles about mod_rewrite?
- Why is mod_rewrite difficult to learn and seems so complicated?
- What can I do if my RewriteRules don't work as expected?
- Some of my URLs don't get prefixed with DocumentRoot?
- Comparison to similar Modules
- Additional Modules
About this Module
Summary of Functionality
The Apache webserver API (application programmer's interface) has a hook (an entry point) for URL-to-filename translation which is primarily used by this module to link itself into the server. For every request, this module is called with the requested URL (or even with a URI in a proxy-context) and has the chance to re-write this request just before the Apache server continues with all following processing steps, like running the hooks of the other modules, etc.
Its rewriting engine is rule-based where a rule primarily consists of a rewriting pattern and a rewriting substitution. The patterns are full regular expressions, as commonly known in the Unix community. They are applied to the current value of the URL.
The substitutions replace the current URL value of the pattern matched and are either:
- fully-qualified URIs
- new URLs
- filepaths, which can contain so-called QUERY_STRING parts.
Additionally these substitutions can have in-lined mapping-function expressions to look up external databases (either plain text, DBM hashfile or even programs which act like a map) and hence can include strings from external sources.
Finally the rule flags can control the behaviour after a rewriting rule has been matched. There is support for rewriting loops, rewriting breaks, chained rules, pseudo if-then-else constructs, forced redirects, forced MIME-types, forced proxy-module throughput etc.
When a URL expands to a non-self-referencing URI by means of a substitution, it will automatically lead to an external request redirection at the end of the rewriting process. If the generated URI is self-referencing (i.e. http://thishost[:thisport]), then this automatically gets stripped down to a URL. This provides support for so-called webclusters with homogeneous URL trees.
Optionally, any rewriting rule can be preceded by any number of additional rewriting conditions. These are pairs of condition strings (which can be constructed out of a rich set of variables) and condition patterns. The condition patterns are again regular expressions and get matched against the runtime-evaluated condition string. Preceding a rule with such conditions means that the rule is only fully applied if the rule pattern matches and all of the preceding rule conditions do so as well. This gives you a flexible way to adjust the situation in which your rules are applied.
This functionality can be used in four contexts:
1. Used by the administrator inside the server config file (httpd.conf) for global per-server context outside of any VirtualHost blocks.
Here the rules are applied to all URLs, i.e. to all URLs of the main server and all URLs of the virtual servers.
2. Used by the administrator inside the server config file (httpd.conf) for global per-server context inside of a VirtualHost block.
Here the rules are applied only to URLs of a specific virtual server.
3. Used by the administrator inside the server config file (httpd.conf) for local per-directory context inside of a Directory block.
Here the rules are applied only to URLs for the specific directory.
4. Used by the users inside their directory config files (.htaccess) for local per-directory context.
Here the rules are applied only to URLs for the specific directory.
This consistent usage is provided for maximum flexibility. But there is a slight difference between situations 1 & 2 and 3 & 4:
- In situations 1 & 2 the context is global, hence the rule patterns get applied to the full URLs and the substitutions have to create full URLs.
- In situations 3 & 4 the context is local, hence the rule patterns get applied only to the remainder of the URL. This remainder is just the URL with the per-directory prefix stripped off. The same thing applies to the substitutions: they should (but need not) create remainders which will automatically be completed to full URLs by adding the previously stripped prefix.
This difference is subtle but very important, because only this way is it possible to do a lot of special rewriting tricks in the per-directory context!
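The difference between the global and the local context can be sketched by writing the same rewrite twice. The names /docs, old.html and new.html are hypothetical, and the sketch assumes that the URL prefix /docs corresponds directly to the physical directory, so that no further prefix handling is needed.

```apache
# Variant A: per-server context (httpd.conf).
# The pattern sees the full URL and the substitution is a full URL.
RewriteEngine on
RewriteRule   ^/docs/old\.html$  /docs/new.html

# Variant B: per-directory context (.htaccess inside the directory
# that is served as /docs). The prefix is stripped before matching,
# so both pattern and substitution are remainders.
RewriteEngine on
RewriteRule   ^old\.html$  new.html
```

The two variants belong to different files; only one of them would be used for a given site.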
New Features
The URL rewriting module adds a lot of new features to the Apache server which are not possible with the aliasing and rewriting mechanism provided by the standard functionality inherited from the NCSA webserver. These are listed here:
- URLs can be re-written in any number of steps instead of just translated once to the final result.
- URLs can be changed in any way instead of just substituted through a simple prefix match.
- The full power of regular expressions can be used for rewriting instead of simple substring substitution.
- The substitutions of a rewriting rule can contain a QUERY_STRING part, i.e. "?key=value&key=value&...". If such a part occurs, the module automatically splits it away from the URL and injects it into the current Apache-internal request information, just as if it had come from the original request.
- You can use rule conditions not only to rewrite according to a pattern which has to match the current URI/URL. You can additionally match (also via regular expression patterns) a lot of server-variables which include most of the HTTP headers. With this you can rewrite according to the browser type or identified user and host, etc.
- URIs (i.e. URLs plus protocol, host and port) can be generated instead of just URLs. Additionally when those occur the module automatically forces a request redirection, so you do not have to differentiate between pure rewritings and rewriting with redirection as in the old Alias and Redirect directives.
- You can force a URI to be put through the proxy module internally. This is a sort of super-enhanced ProxyPass directive, because it enables some special tricks. For instance, to proxy away only some files of a directory, etc.
- Mapping-functions can be used inside the substitution strings to insert looked-up values by means of external databases. These external databases can be either plain text files, DBM hashfiles or even programs which act like dynamic maps.
- Server-variables (the same as in the condition strings of the rule conditions) can be used to insert values into the rule substitution by means of the current internal Apache state of these variables.
- Exhaustive logging is provided for all rewriting actions. This is very useful for the webmaster to see how the rewriting rules are applied and what result is generated by these rules. You can set the amount of verbosity for this logging.
- The rewriting rules can be used both globally inside the server configuration (httpd.conf) and locally in per-directory configuration files (.htaccess).
- Through the use of rewrite rule flags many special URL rewriting features can be configured. There is support for rewriting loops, rewriting breaks, pseudo-if-then-else constructs, chained rules, forced MIME-types, forced proxy-throughputs, etc.
- Two additional new CGI/SSI environment variables named SCRIPT_URL and SCRIPT_URI are provided which contain the original (i.e. previous to any rewritings!) Web-view to the current resource.
Side Effects
You can safely compile mod_rewrite into the Apache server, because this module is triggered only by its server configuration directives.
Only the RewriteRule directive is a runtime one and causes URL rewriting actions to really happen, and then only if the RewriteEngine on directive is used too. In other words: Unless you set "RewriteEngine on" there are no side effects!
Tested Apache Versions
Although this module's history starts with the Apache 1.0 API in February 1996, it was moved to the Apache 1.1 API very early. This means it will NOT run with any Apache server prior to 1.1b0!
It runs fine with any beta release of Apache 1.1 and (of course) with the current official release: Apache 1.1.1. If you want to use this module but still have an older version of the Apache server running, please upgrade first!
From version 2.1 mod_rewrite has support for Apache version 1.2-dev which already has its own POSIX library included.
Installation
You have to carry out the following steps to install this module into your Apache server.
Preconditions:
- The mod_rewrite distribution is unpacked in /tmp/mod_rewrite
- The Apache distribution root is /usr/local/apache/dist
For Apache 1.1.x (which needs the bundled regexp library and the two patches):
    $ cd /tmp/mod_rewrite
    $ cp mod_rewrite*[hc] /usr/local/apache/dist/src/
    $ cp -r misc/for-apache-1.1.x/regexp /usr/local/apache/dist/src/
    $ cp misc/for-apache-1.1.x/util_script.c.diff /usr/local/apache/dist/src/
    $ cp misc/for-apache-1.1.x/mod_negotiation.c.diff /usr/local/apache/dist/src/
    $ cd /usr/local/apache/dist/src/
    $ patch <util_script.c.diff
    $ patch <mod_negotiation.c.diff
    $ vi Configuration
    | :
    | EXTRA_LIBS= regexp/libregexp.a
    | :
    | Module rewrite_module mod_rewrite.o
    | :
    $ ./Configure
    $ (cd regexp; make)
    $ make
For Apache 1.2 (which already includes its own POSIX regular-expression library):
    $ cd /tmp/mod_rewrite
    $ cp mod_rewrite*[hc] /usr/local/apache/dist/src/
    $ cd /usr/local/apache/dist/src/
    $ vi Configuration
    | :
    | Module rewrite_module mod_rewrite.o
    | :
    $ ./Configure
    $ make
NOTICE: 'Module rewrite_module mod_rewrite.o' must come after any other Modules that contain a URL-to-filename hook. Since it is not obvious which these are, the simple solution is to make it the last module in the Configuration list.
Configuration Directives
RewriteEngine
Syntax: RewriteEngine {on,off}
Default: RewriteEngine off
Context: server config, virtual host, per-directory config
The RewriteEngine directive enables or disables the runtime rewriting engine. If it is set to off this module does no runtime processing at all. It does not even update the SCRIPT_URx environment variables.
Use this directive to disable the module instead of commenting out all RewriteRule directives!
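As a minimal sketch of that advice, the rule below (a made-up /old/ to /new/ mapping) stays in the configuration while the engine is switched off, so rewriting can be re-enabled later by changing a single line:

```apache
# Rewriting is disabled: the rule is kept in place but never applied.
RewriteEngine off
RewriteRule   ^/old/(.*)$  /new/$1
```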
RewriteOptions
Syntax: RewriteOptions Option ...
Default: -None-
Context: server config, virtual host, per-directory config
The RewriteOptions directive sets some special options for the current per-server or per-directory configuration. The Option strings can be one of the following:
- 'inherit'
This forces the current configuration to inherit the configuration of the parent. In per-virtual-server context this means that the maps, conditions and rules of the main server get inherited. In per-directory context this means that conditions and rules of the parent directory's .htaccess configuration get inherited.
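A per-directory use of this option might look as follows. This is only a sketch: the directory names /abc and /abc/def and the file names are invented for illustration, and the two parts belong to two different .htaccess files.

```apache
# File 1: /abc/.htaccess (parent directory)
RewriteEngine on
RewriteRule   ^old\.html$  new.html

# File 2: /abc/def/.htaccess (subdirectory)
# No rules of its own; the conditions and rules of the parent
# directory's .htaccess are inherited.
RewriteEngine  on
RewriteOptions inherit
```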
RewriteLog
Syntax: RewriteLog Filename
Default: -None-
Context: server config, virtual host
The RewriteLog directive sets the name of the file to which the server logs any rewriting actions it performs. If the name does not begin with a slash ('/') then it is assumed to be relative to the Server Root. The directive should occur only once per server config.
To disable the logging of rewriting actions it is not recommended to set Filename to /dev/null, because although the rewriting engine does not create output to a logfile it still creates the logfile output internally. This will slow down the server with no advantage to the administrator! To disable logging either remove or comment out the RewriteLog directive or use RewriteLogLevel 0!
Example:
    RewriteLog "/usr/local/var/apache/logs/rewrite.log"
RewriteLogLevel
Syntax: RewriteLogLevel Level
Default: RewriteLogLevel 0
Context: server config, virtual host
The RewriteLogLevel directive sets the verbosity level of the rewriting logfile. The default level 0 means no logging, while 9 or more means that practically all actions are logged.
To disable the logging of rewriting actions simply set Level to 0. This disables all rewrite action logs.
Notice: Using a high value for Level will slow down your Apache server dramatically! Use the rewriting logfile only for debugging, or at least at a Level not greater than 2!
Example:
    RewriteLogLevel 3
RewriteMap
Syntax: RewriteMap Mapname {txt,dbm,prg}:Filename
Default: not used by default
Context: server config, virtual host
The RewriteMap directive defines an external Rewriting Map which can be used inside rule substitution strings by the mapping-functions to insert/substitute fields through a key lookup.
The Mapname is the name of the map and will be used to specify a mapping-function for the substitution strings of a rewriting rule via
    ${Mapname:LookupKey|DefaultValue}
When such a construct occurs the map Mapname is consulted and the key LookupKey is looked up. If the key is found, the map-function construct is substituted by SubstValue. If the key is not found then it is substituted by DefaultValue.
The RewriteMap directive can occur more than once. For each mapping-function use one RewriteMap directive to declare its rewriting mapfile. While you cannot declare a map in per-directory context it is of course possible to use this map in per-directory context.
The Filename must be a valid Unix filepath, containing one of the following formats:
- Plain Text Format
This is an ASCII file which contains either blank lines, comment lines (starting with a '#' character) or MatchingKey SubstValue pairs - one per line. You can create such files either manually, using your favorite editor, or by using the programs mapcollect and mapmerge from the support directory of the mod_rewrite distribution.
To declare such a map, prefix Filename with a txt: string as in the following example:
    #
    # map.real-to-user -- maps realnames to usernames
    #
    Ralf.S.Engelschall  rse   # Bastard Operator From Hell
    Dr.Fred.Klabuster   fred  # Mr. DAU

    RewriteMap real-to-user txt:/path/to/file/map.real-to-user
- DBM Hashfile Format
This is a binary NDBM format file containing the same contents as the Plain Text Format files. You can create such a file with any NDBM tool or with the dbmmanage program from the support directory of the Apache distribution.
To declare such a map, prefix Filename with a dbm: string.
- Program Format
This is a Unix executable, not a lookup file. To create it you can use the language of your choice, but the result has to be a runnable Unix binary (i.e. either object-code or a script with the magic cookie trick '#!/path/to/interpreter' as the first line).
This program gets started once at startup of the Apache servers and then communicates with the rewriting engine over its stdin and stdout filehandles. For each map-function lookup it will receive the key to look up as a newline-terminated string on stdin. It then has to give back the looked-up value as a newline-terminated string on stdout or the four-character string ``NULL'' if it fails (i.e. there is no corresponding value for the given key). A trivial program which will implement a 1:1 map (i.e. key == value) could be:
    #!/usr/bin/perl
    $| = 1;
    while (<STDIN>) {
        # ...here any transformations
        # or lookups should occur...
        print $_;
    }
But be very careful:
- ``Keep the program simple, stupid'' (KISS), because if this program hangs it will lead to a hang of the Apache server when the rule occurs.
- Avoid one common mistake: never do buffered I/O on stdout! This will cause a deadloop! Hence the ``$|=1'' in the above example...
To declare such a map, prefix Filename with a prg: string.
For plain text and DBM format files the looked-up keys are cached in-core until the mtime of the mapfile changes or the server does a restart. This way you can have map-functions in rules which are used for every request. This is no problem, because the external lookup only happens once!
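To show how a declared map is consulted from a rule, here is a small sketch that reuses the plain text map from above. The URL prefixes /people and /u and the default value nobody are assumptions made up for this illustration.

```apache
RewriteEngine on

# Declare the map once in per-server context (httpd.conf).
RewriteMap   real-to-user  txt:/path/to/file/map.real-to-user

# The first pattern group is used as the lookup key, e.g.
# /people/Ralf.S.Engelschall/index.html -> /u/rse/index.html
# Keys missing from the map fall back to the default "nobody".
RewriteRule  ^/people/([^/]+)/(.*)$  /u/${real-to-user:$1|nobody}/$2
```

Using $1 inside the lookup key works because pattern-group back-references are expanded before mapping-function calls.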
RewriteBase
Syntax: RewriteBase BaseURL
Default: default is the physical directory path
Context: per-directory config
The RewriteBase directive explicitly sets the base URL for per-directory rewrites. As you will see below, RewriteRule can be used in per-directory config files (.htaccess). There it will act locally, i.e. the local directory prefix is stripped at this stage of processing and your rewriting rules act only on the remainder. At the end it is automatically added.
When a substitution occurs for a new URL, this module has to re-inject the URL into the server processing. To be able to do this it needs to know what the corresponding URL-prefix or URL-base is. By default this prefix is the corresponding filepath itself. But at most websites URLs are NOT directly related to physical filename paths, so this assumption will usually be wrong! There you have to use the RewriteBase directive to specify the correct URL-prefix.
So, if your webserver's URLs are not directly related to physical file paths, you have to use RewriteBase in every .htaccess file where you want to use RewriteRule directives.
Example:
Assume the following per-directory config file:
    #
    # /abc/def/.htaccess -- per-dir config file for directory /abc/def
    # Remember: /abc/def is the physical path of /xyz, i.e. the server
    # has an 'Alias /xyz /abc/def' directive e.g.
    #

    RewriteEngine On

    # let the server know that we are reached via /xyz and not
    # via the physical path prefix /abc/def
    RewriteBase /xyz

    # now the rewriting rules
    RewriteRule ^oldstuff\.html$ newstuff.html
In the above example, a request to /xyz/oldstuff.html gets correctly rewritten to the physical file /abc/def/newstuff.html.
For the Apache hackers:
The following list gives detailed information about the internal processing steps:
    Request:
      /xyz/oldstuff.html

    Internal Processing:
      /xyz/oldstuff.html     -> /abc/def/oldstuff.html  (per-server Alias)
      /abc/def/oldstuff.html -> /abc/def/newstuff.html  (per-dir    RewriteRule)
      /abc/def/newstuff.html -> /xyz/newstuff.html      (per-dir    RewriteBase)
      /xyz/newstuff.html     -> /abc/def/newstuff.html  (per-server Alias)

    Result:
      /abc/def/newstuff.html
This seems very complicated but is the correct Apache internal processing, because the per-directory rewriting comes too late in the process. So, when it occurs the (rewritten) request has to be re-injected into the Apache kernel! BUT: While this seems like a serious overhead, it really isn't, because this re-injection happens fully internal to the Apache server and the same procedure is used by many other operations inside Apache. So, you can be sure the design and implementation is correct.
RewriteCond
Syntax: RewriteCond TestString CondPattern
Default: -None-
Context: server config, virtual host, per-directory config
The RewriteCond directive defines a rule condition. Precede a RewriteRule directive with one or more RewriteCond directives. The following rewriting rule is only used if its pattern matches the current state of the URI AND if these additional conditions apply, too.
TestString is a string which contains server-variables of the form
    %{NAME_OF_VARIABLE}
where NAME_OF_VARIABLE can be a string of the following list:
HTTP headers:
- HTTP_USER_AGENT
- HTTP_REFERER
- HTTP_COOKIE
- HTTP_FORWARDED
- HTTP_HOST
- HTTP_PROXY_CONNECTION
- HTTP_ACCEPT
connection & request:
- REMOTE_ADDR
- REMOTE_HOST
- REMOTE_USER
- REMOTE_IDENT
- REQUEST_METHOD
- SCRIPT_FILENAME
- PATH_INFO
- QUERY_STRING
- AUTH_TYPE
server internals:
- DOCUMENT_ROOT
- SERVER_ADMIN
- SERVER_NAME
- SERVER_PORT
- SERVER_PROTOCOL
- SERVER_SOFTWARE
- SERVER_VERSION
system stuff:
- TIME_YEAR
- TIME_MON
- TIME_DAY
- TIME_HOUR
- TIME_MIN
- TIME_SEC
- TIME_WDAY
specials:
- API_VERSION
- THE_REQUEST
- REQUEST_URI
- REQUEST_FILENAME
- IS_SUBREQ
These variables all correspond to the similarly named HTTP MIME-headers, C variables of the Apache server or struct tm fields of the Unix system.
Special Notes:
- The variables SCRIPT_FILENAME and REQUEST_FILENAME contain the same value, i.e. the value of the filename field of the internal request_rec structure of the Apache server. The first name is just the commonly known CGI variable name while the second is the consistent counterpart to REQUEST_URI (which contains the value of the uri field of request_rec).
- There is the special format: %{ENV:variable} where variable can be any environment variable. This is looked-up via internal Apache structures and (if not found there) via getenv() from the Apache server process.
- There is the special format: %{HTTP:header} where header can be any HTTP MIME-header name. This is looked-up from the HTTP request. Example: %{HTTP:Proxy-Connection} is the value of the HTTP header ``Proxy-Connection:''.
- There is the special format: %{LA-U:url} for look-aheads like -U. This performs an internal sub-request to look ahead for the final value of url.
- There is the special format: %{LA-F:file} for look-aheads like -F. This performs an internal sub-request to look ahead for the final value of file.
CondPattern is the condition pattern, i.e. a regular expression which gets applied to the current instance of the TestString, i.e. TestString gets evaluated and then matched against CondPattern.
Remember: CondPattern is a standard Extended Regular Expression with some additions:
- You can precede the pattern string with a '!' character (exclamation mark) to specify a non-matching pattern.
- There are some special variants of CondPatterns. Instead of real regular expression strings you can also use one of the following:
- '-d' (is directory)
Treats the TestString as a pathname and tests if it exists and is a directory.
- '-f' (is regular file)
Treats the TestString as a pathname and tests if it exists and is a regular file.
- '-s' (is regular file with size)
Treats the TestString as a pathname and tests if it exists and is a regular file with size greater than zero.
- '-l' (is symbolic link)
Treats the TestString as a pathname and tests if it exists and is a symbolic link.
- '-F' (is existing file via subrequest)
Checks if TestString is a valid file and accessible via all the server's currently-configured access controls for that path. This uses an internal subrequest to determine the check, so use it with care because it decreases your server's performance!
- '-U' (is existing URL via subrequest)
Checks if TestString is a valid URL and accessible via all the server's currently-configured access controls for that path. This uses an internal subrequest to determine the check, so use it with care because it decreases your server's performance!
Notice: All of these tests can also be prefixed by a not ('!') character to negate their meaning.
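The file tests and their negation can be combined as in the following sketch for a per-directory config file. The target notfound.html is a hypothetical file assumed to exist in the same directory, and the L flag used here is described under RewriteRule.

```apache
# .htaccess (per-directory context)
RewriteEngine on

# Apply the rule only when the request maps to neither an existing
# regular file nor an existing directory.
RewriteCond  %{REQUEST_FILENAME}  !-f
RewriteCond  %{REQUEST_FILENAME}  !-d
RewriteRule  ^(.*)$  notfound.html  [L]
```

Because notfound.html itself exists, the first condition fails for it, so the rule is not applied to its own result.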
Additionally you can set special flags for CondPattern by appending
    [flags]
as the third argument to the RewriteCond directive. Flags is a comma-separated list of the following flags:
- 'nocase|NC' (no case)
This makes the condition test case-insensitive, i.e. there is no difference between 'A-Z' and 'a-z' both in the expanded TestString and the CondPattern.
- 'ornext|OR' (or next condition)
Use this to combine rule conditions with a local OR instead of the implicit AND. Typical example:
    RewriteCond %{REMOTE_HOST}  ^host1.*  [OR]
    RewriteCond %{REMOTE_HOST}  ^host2.*  [OR]
    RewriteCond %{REMOTE_HOST}  ^host3.*
    RewriteRule ...some special stuff for any of these hosts...
Without this flag you would have to write down the cond/rule three times.
Example:
To rewrite the Homepage of a site according to the ``User-Agent:'' header of the request, you can use the following:
    RewriteCond  %{HTTP_USER_AGENT}  ^Mozilla.*
    RewriteRule  ^/$                 /homepage.max.html  [L]

    RewriteCond  %{HTTP_USER_AGENT}  ^Lynx.*
    RewriteRule  ^/$                 /homepage.min.html  [L]

    RewriteRule  ^/$                 /homepage.std.html  [L]
Interpretation: If you use Netscape Navigator as your browser (which identifies itself as 'Mozilla'), then you get the max homepage, which includes Frames, etc. If you use the Lynx browser (which is Terminal-based), then you get the min homepage, which contains no images, no tables, etc. If you use any other browser you get the standard homepage.
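Conditions can also be stacked and negated. The following sketch forbids POST requests to one URL area unless they come from a host in one domain. The domain example.com and the prefix /internal/ are invented for illustration; the '-' substitution and the F flag used in the rule are described under RewriteRule.

```apache
RewriteEngine on

# Condition 1: the client host is NOT inside example.com
# (compared case-insensitively because of NC).
RewriteCond  %{REMOTE_HOST}     !^.+\.example\.com$  [NC]
# Condition 2 (implicitly ANDed with the first): the method is POST.
RewriteCond  %{REQUEST_METHOD}  ^POST$
# Both conditions hold: answer with 403 and substitute nothing.
RewriteRule  ^/internal/        -                    [F]
```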
RewriteRule
Syntax: RewriteRule Pattern Substitution
Default: -None-
Context: server config, virtual host, per-directory config
The RewriteRule directive is the real rewriting workhorse. The directive can occur more than once. Each directive then defines one single rewriting rule. The definition order of these rules is important, because this order is used when applying the rules at run-time.
Pattern is a regular expression (System V8 for Apache 1.1.x, POSIX for Apache 1.2.x) which gets applied to the current URL. Here ``current'' means the value of the URL when this rule gets applied. This may not be the original requested URL, because there could be any number of rules before which already matched and made alterations to it.
Some hints about the syntax of regular expressions:
    ^           Start of line
    $           End of line
    .           Any single character
    [chars]     One of chars
    [^chars]    None of chars
    ?           0 or 1 of the preceding char
    *           0 or N of the preceding char
    +           1 or N of the preceding char
    \char       escape that specific char (e.g. for specifying the chars ".[]()" etc.)
    (string)    Grouping of chars (the Nth group can be used on the RHS with $N)
Additionally the NOT character ('!') is a possible pattern prefix. This gives you the ability to negate a pattern; to say, for instance: ``if the current URL does NOT match to this pattern''. This can be used for special cases where it is better to match the negative pattern or as a last default rule.
Notice! When using the NOT character to negate a pattern you cannot have grouped wildcard parts in the pattern. This is impossible because when the pattern does NOT match, there are no contents for the groups. In consequence, if negated patterns are used, you cannot use $N in the substitution string!
Substitution of a rewriting rule is the string which is substituted for (or replaces) the original URL for which Pattern matched. Besides plain text you can use
- pattern-group back-references ($N)
- server-variables as in rule condition test-strings (%{VARNAME})
- mapping-function calls (${mapname:key|default})
Back-references are $N (N=1..9) identifiers which will be replaced by the contents of the Nth group of the matched Pattern. The server-variables are the same as for the TestString of a RewriteCond directive. The mapping-functions come from the RewriteMap directive and are explained there. These three types of variables are expanded in the order of the above list.
As already mentioned above, all the rewriting rules are applied to the Substitution (in the order of definition in the config file). The URL is completely replaced by the Substitution and the rewriting process goes on until there are no more rules (unless explicitly terminated by an L flag - see below).
There is a special substitution string named '-' which means: NO substitution! Sounds silly? No, it is useful to provide rewriting rules which only match some URLs but do no substitution, e.g. in conjunction with the C (chain) flag to be able to have more than one pattern to be applied before a substitution occurs.
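The following sketch illustrates back-references and server-variables inside a Substitution. All URLs are hypothetical; in particular it assumes that /show.cgi and /whoami.cgi are scripts located under the document root.

```apache
RewriteEngine on

# $1 and $2 are replaced by the first and second parenthesised group
# of Pattern; the part after "?" becomes the QUERY_STRING of the
# rewritten request.
RewriteRule  ^/archive/([0-9]+)/([a-z]+)\.html$  /show.cgi?year=$1&name=$2  [L]

# %{REMOTE_HOST} is expanded from the state of the current request.
RewriteRule  ^/whoami$  /whoami.cgi?host=%{REMOTE_HOST}  [L]
```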
Notice: There is a special feature. When you prefix a substitution field with http://thishost[:thisport] then mod_rewrite automatically strips it out. This auto-reduction on implicit external redirect URLs is a useful and important feature when used in combination with a mapping-function which generates the hostname part. Have a look at the first example in the example section below to understand this. Remember: An unconditional external redirect to your own server will not work with the prefix http://thishost because of this feature. To achieve such a self-redirect, you have to use the R-flag (see below).
Additionally you can set special flags for Substitution by appending
    [flags]
as the third argument to the RewriteRule directive. Flags is a comma-separated list of the following flags:
- 'redirect|R[=code]' (force redirect)
Prefix Substitution with http://thishost[:thisport]/ (which makes the new URL a URI) to force an external redirection. If no code is given, an HTTP response of 302 (MOVED TEMPORARILY) is used. If you want to use other response codes in the range 300-400 just specify them as a number or use one of the following symbolic names: temp (default), permanent, seeother. Use it for rules which should canonicalize the URL and give it back to the client, e.g. translate ``/~'' into ``/u/'' or always append a slash to /u/user, etc.
Notice: When you use this flag, make sure that the substitution field is a valid URL! If not, you are redirecting to an invalid location! And remember that this flag itself only prefixes the URL with http://thishost[:thisport]/, but rewriting goes on. Usually you also want to stop and do the redirection immediately. To stop the rewriting you also have to provide the 'L' flag.
- 'forbidden|F' (force URL to be forbidden)
This forces the current URL to be forbidden, i.e. it immediately sends back a HTTP response of 403 (FORBIDDEN). Use this flag in conjunction with appropriate RewriteConds to conditionally block some URLs.
- 'gone|G' (force URL to be gone)
This forces the current URL to be gone, i.e. it immediately sends back a HTTP response of 410 (GONE). Use this flag to mark no longer existing pages as gone.
- 'proxy|P' (force proxy)
This flag forces the substitution part to be internally treated as a proxy request and immediately (i.e. rewriting rule processing stops here) put through the proxy module. You have to make sure that the substitution string is a valid URI (e.g. typically http://) which can be handled by the Apache proxy module. If not you get an error from the proxy module. Use this flag to achieve a more powerful implementation of the mod_proxy directive ProxyPass, to map some remote stuff into the namespace of the local server.
Notice: You really have to put ProxyRequests On into your server configuration to prevent proxy requests from leading to core-dumps inside the Apache kernel. If you have not compiled in the proxy module, then there is no core-dump problem, because mod_rewrite checks for existence of the proxy module and, if it is missing, forbids proxy URLs.
- 'last|L' (last rule)
Stop the rewriting process here and don't apply any more rewriting rules. This corresponds to the Perl last command or the break command from the C language. Use this flag to prevent the currently rewritten URL from being rewritten further by following rules which may be wrong. For example, use it to rewrite the root-path URL ('/') to a real one, e.g. '/e/www/'.
- 'next|N' (next round)
Re-run the rewriting process (starting again with the first rewriting rule). Here the URL to match is again not the original URL but the URL from the last rewriting rule. This corresponds to the Perl next command or the continue command from the C language. Use this flag to restart the rewriting process, i.e. to immediately go to the top of the loop.
But be careful not to create a deadloop!
- 'chain|C' (chained with next rule)
This flag chains the current rule with the next rule (which itself can also be chained with its following rule, etc.). This has the following effect: if a rule matches, then processing continues as usual, i.e. the flag has no effect. If the rule does not match, then all following chained rules are skipped. For instance, use it to remove the ``.www'' part inside a per-directory rule set when you let an external redirect happen (where the ``.www'' part should not occur!).
- 'type|T=mime-type' (force MIME type)
Force the MIME-type of the target file to be mime-type. For instance, this can be used to simulate the old mod_alias directive ScriptAlias which internally forces all files inside the mapped directory to have a MIME type of ``application/x-httpd-cgi''.
- 'nosubreq|NS' (used only if no internal sub-request)
This flag forces the rewriting engine to skip a rewriting rule if the current request is an internal sub-request. For instance, sub-requests occur internally in Apache when mod_include tries to find out information about possible directory default files (index.xxx). On sub-requests it is not always useful, and sometimes even causes a failure, if the complete set of rules is applied. Use this flag to exclude some rules.
Use the following rule for your decision: whenever you prefix some URLs with CGI-scripts to force them to be processed by the CGI-script, the chance is high that you will run into problems (or even overhead) on sub-requests. In these cases, use this flag.
- 'passthrough|PT' (pass through to next handler)
This flag forces the rewriting engine to set the uri field of the internal request_rec structure to the value of the filename field. This flag is just a hack to be able to post-process the output of RewriteRule directives by Alias, ScriptAlias, Redirect, etc. directives from other URI-to-filename translators. A trivial example to show the semantics: If you want to rewrite /abc to /def via the rewriting engine of mod_rewrite and then /def to /ghi with mod_alias:
    RewriteRule ^/abc(.*) /def$1 [PT]
    Alias /def /ghi
If you omit the PT flag then mod_rewrite will do its job fine, i.e. it rewrites uri=/abc/... to filename=/def/... as a full API-compliant URI-to-filename translator should do. Then mod_alias comes and tries to do a URI-to-filename transition which will not work.
Notice: You have to use this flag if you want to intermix directives of different modules which contain URL-to-filename translators. The typical example is the use of mod_alias and mod_rewrite.
For the Apache hackers:
If the current Apache API had a filename-to-filename hook additionally to the URI-to-filename hook then we wouldn't need this flag! But without such a hook this flag is the only solution. The Apache Group has discussed this problem and will add such hooks into Apache version 2.0.
- 'skip|S=num' (skip next rule(s))
This flag forces the rewriting engine to skip the next num rules in sequence when the current rule matches. Use this to make pseudo if-then-else constructs: The last rule of the then-clause becomes a skip=N where N is the number of rules in the else-clause. (This is not the same as the 'chain|C' flag!)
- 'env|E=VAR:VAL' (set environment variable)
This forces an environment variable named VAR to be set to the value VAL, where VAL can contain regexp backreferences $N which will be expanded. You can use this flag more than once to set more than one variable. The variables can later be dereferenced in many situations, but the usual location will be from within XSSI (via <!--#echo var="VAR"-->) or CGI (e.g. $ENV{'VAR'}). Additionally, you can dereference them in a following RewriteCond pattern via %{ENV:VAR}. Use this to strip information from URLs while still remembering it.
Remember: Never forget that Pattern gets applied to a complete URL in per-server configuration files. But in per-directory configuration files, the per-directory prefix (which is always the same for a specific directory!) gets automatically removed for the pattern matching and automatically added after the substitution has been done. This feature is essential for many sorts of rewriting, because without this prefix stripping you would have to match the parent directory, which is not always possible. There is one exception: If a substitution string starts with ``http://'' then the directory prefix will not be added and an external redirect or proxy throughput (if flag P is used!) is forced!
Notice! To enable the rewriting engine for per-directory configuration files you need to set ``RewriteEngine On'' in these files and have ``Options FollowSymLinks'' enabled. If your administrator has disabled override of FollowSymLinks for a user's directory, then you cannot use the rewriting engine. This restriction is needed for security reasons.
Here are all possible substitution combinations and their meanings:
Inside per-server configuration (httpd.conf)
for request ``GET /somepath/pathinfo'':
    Given Rule                                        Resulting Substitution
    ------------------------------------------------  ----------------------------------
    ^/somepath(.*) otherpath$1                        not supported, because invalid!
    ^/somepath(.*) otherpath$1 [R]                    not supported, because invalid!
    ^/somepath(.*) otherpath$1 [P]                    not supported, because invalid!
    ------------------------------------------------  ----------------------------------
    ^/somepath(.*) /otherpath$1                       /otherpath/pathinfo
    ^/somepath(.*) /otherpath$1 [R]                   http://thishost/otherpath/pathinfo via external redirection
    ^/somepath(.*) /otherpath$1 [P]                   not supported, because silly!
    ------------------------------------------------  ----------------------------------
    ^/somepath(.*) http://thishost/otherpath$1        /otherpath/pathinfo
    ^/somepath(.*) http://thishost/otherpath$1 [R]    http://thishost/otherpath/pathinfo via external redirection
    ^/somepath(.*) http://thishost/otherpath$1 [P]    not supported, because silly!
    ------------------------------------------------  ----------------------------------
    ^/somepath(.*) http://otherhost/otherpath$1       http://otherhost/otherpath/pathinfo via external redirection
    ^/somepath(.*) http://otherhost/otherpath$1 [R]   http://otherhost/otherpath/pathinfo via external redirection (the [R] flag is redundant)
    ^/somepath(.*) http://otherhost/otherpath$1 [P]   http://otherhost/otherpath/pathinfo via internal proxy
Inside per-directory configuration for /somepath
(i.e. file .htaccess in dir /physical/path/to/somepath containing RewriteBase /somepath)
for request ``GET /somepath/localpath/pathinfo'':
    Given Rule                                        Resulting Substitution
    ------------------------------------------------  ----------------------------------
    ^localpath(.*) otherpath$1                        /somepath/otherpath/pathinfo
    ^localpath(.*) otherpath$1 [R]                    http://thishost/somepath/otherpath/pathinfo via external redirection
    ^localpath(.*) otherpath$1 [P]                    not supported, because silly!
    ------------------------------------------------  ----------------------------------
    ^localpath(.*) /otherpath$1                       /otherpath/pathinfo
    ^localpath(.*) /otherpath$1 [R]                   http://thishost/otherpath/pathinfo via external redirection
    ^localpath(.*) /otherpath$1 [P]                   not supported, because silly!
    ------------------------------------------------  ----------------------------------
    ^localpath(.*) http://thishost/otherpath$1        /otherpath/pathinfo
    ^localpath(.*) http://thishost/otherpath$1 [R]    http://thishost/otherpath/pathinfo via external redirection
    ^localpath(.*) http://thishost/otherpath$1 [P]    not supported, because silly!
    ------------------------------------------------  ----------------------------------
    ^localpath(.*) http://otherhost/otherpath$1       http://otherhost/otherpath/pathinfo via external redirection
    ^localpath(.*) http://otherhost/otherpath$1 [R]   http://otherhost/otherpath/pathinfo via external redirection (the [R] flag is redundant)
    ^localpath(.*) http://otherhost/otherpath$1 [P]   http://otherhost/otherpath/pathinfo via internal proxy
Example:
We want to rewrite URLs of the form
    /Language/~Realname/.../File
into
    /u/Username/.../File.Language
We take the rewrite mapfile from above and save it under /anywhere/map.real-to-user. Then we only have to add the following lines to the Apache server configuration file:
    RewriteLog /anywhere/rewrite.log
    RewriteMap real-to-user txt:/anywhere/map.real-to-user
    RewriteRule ^/([^/]+)/~([^/]+)/(.*)$ /u/${real-to-user:$2|nobody}/$3.$1
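To make the effect of this rule easier to follow, here is the same rule annotated with two sample requests. The map entry (Jane.Doe mapped to jdoe) and the request paths are hypothetical and serve only as an illustration of how the backreferences and the default value are expanded.

```apache
# Assumed content of the text map (one "key value" pair per line):
#   Jane.Doe    jdoe
#
# GET /english/~Jane.Doe/docs/index   ->  /u/jdoe/docs/index.english
# GET /english/~Unknown/docs/index    ->  /u/nobody/docs/index.english
#
# $1 = language, $2 = real name (the map key), $3 = remaining path
RewriteMap   real-to-user  txt:/anywhere/map.real-to-user
RewriteRule  ^/([^/]+)/~([^/]+)/(.*)$  /u/${real-to-user:$2|nobody}/$3.$1
```

The second request shows the role of the default: a real name without a map entry falls back to the user nobody.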
Special Features
Environment Variables
This module keeps track of two additional (non-standard) CGI/SSI environment variables named SCRIPT_URL and SCRIPT_URI. These contain the logical Web-view to the current resource, while the standard CGI/SSI variables SCRIPT_NAME and SCRIPT_FILENAME contain the physical System-view.
Notice: These variables hold the URI/URL as they were initially requested, i.e. in a state before any rewriting. This is important because the rewriting process is primarily used to rewrite logical URLs to physical pathnames.
Example:
    SCRIPT_NAME=/v/sw/free/lib/apache/global/u/rse/.www/index.html
    SCRIPT_FILENAME=/u/rse/.www/index.html
    SCRIPT_URL=/u/rse/
    SCRIPT_URI=http://en2.en.sdm.de/u/rse/
Two practical uses:
- Creation of generic SSI-footers containing a reference about the URI of the current page:
    :
    <hr>
    <center>
    URI: <!--#echo var="SCRIPT_URI"-->
    </center>
    :
- Creation of absolute references to a relative file, e.g. for use with the Refresh: header where a new file is loaded after a period of time. In this example the new file has to be referenced by an absolute URL (i.e. a URI). Here is a way to do it [with the help of embedded Perl for HTML (ePerl), which can be obtained from http://www.engelschall.com/sw/eperl/]:
    <!DOCTYPE html PUBLIC '-//IETF//DTD HTML 2.0//EN'>
    <META HTTP-EQUIV="Refresh" CONTENT="4; URL=<?
    $URI = $ENV{'SCRIPT_URI'};
    $URI =~ s|[^/]+$||;
    $URI .= "other-rel-doc.html";
    print $URI;
    !>"
    >
    <html>
    :
Practical Examples
The author runs some websites and uses this module for a lot of tasks. Some special solutions are listed below.
There are a lot of other situations where this module can help or, indeed, which can only be solved with the help of this module. For a complete set of practical solutions check the mod_rewrite Solutions Database.
- URL Canonicalization
To rewrite a lot of URL notations into their canonical form, e.g.
    requested URL:                  gets rewritten to:
    ----------------------------    ------------------------------
    /                               /e/www/
    /~user                          /u/user
    /{u,g,e}/{user,group,entity}    /{u,g,e}/{user,group,entity}/
which directly corresponds to the filesystem layout on our machines.
    RewriteEngine On
    :
    # canonicalize the rootdir
    RewriteRule ^/$ /e/en [R,L]
    # canonicalize the Unix shorthand for user dirs
    # (we don't do a 'L'ast command here because below all
    # /[uge] dirs will be redirected, too)
    RewriteRule ^/~([^/]+)/?(.*) /u/$1/$2 [R]
    # always append / to homedirs if the client forgot it
    # (if this matches it is the 'L'ast rule and it does a 'R'edirect)
    RewriteRule ^/([uge])/([^/]+)$ /$1/$2/ [R,L]
    # enable the Robot Exclusion Standard configuration file
    RewriteRule ^/robots.txt /v/sw/free/lib/apache/internal/html/robots.txt [L]
    # disable getting of .wwwacl, .wwwpasswd and .wwwgroups files
    RewriteRule .*/\.wwwacl$ /internal/cgi/errors/nph-404-notfound
    RewriteRule .*/\.wwwpasswd$ /internal/cgi/errors/nph-404-notfound
    RewriteRule .*/\.wwwgroups$ /internal/cgi/errors/nph-404-notfound
- Homogeneous URL Layout
Create a homogeneous and consistent URL layout over all WWW servers on an intranet webcluster, i.e. all URLs (per definition server local and thus server dependent!) become actually server independent! This is obtained by instructing all servers to redirect URLs of the form
    /{u,g,e}/{user,group,entity}/anypath...
to
    http://physical-host/{u,g,e}/{user,group,entity}/anypath...
when /{u,g,e}/{user,group,entity}/ is not locally valid on any one of the servers, i.e. the homepage of the user doesn't reside on the requested machine. This gives our WWW namespace a consistent layout, because no URL has to include the physically correct target server, and because all servers know the physical target host and do an external redirect if needed. The knowledge of the target servers comes from (distributed) external maps which are used by mapping-functions inside the rewriting rules.
    RewriteEngine On
    :
    # the map files:
    RewriteMap user-to-host txt:/v/sw/free/lib/apache/conf/maps/map.user-to-host
    RewriteMap group-to-host txt:/v/sw/free/lib/apache/conf/maps/map.group-to-host
    RewriteMap entity-to-host txt:/v/sw/free/lib/apache/conf/maps/map.entity-to-host
    # and the rules:
    RewriteRule ^/u/([^/]+)/?(.*) http://${user-to-host:$1|en2.en.sdm.de}/u/$1/$2
    RewriteRule ^/g/([^/]+)/?(.*) http://${group-to-host:$1|en2.en.sdm.de}/g/$1/$2
    RewriteRule ^/e/([^/]+)/?(.*) http://${entity-to-host:$1|en2.en.sdm.de}/e/$1/$2
    # we do an explicit expansion of the effective homedirs by
    # manually inserting the "UserDir" (see above) into the path!
    # this gives us the feature of "virtual homedirs", i.e. homedirs
    # which actually don't have a corresponding user (UID, Homedir).
    #
    RewriteRule ^/([uge])/([^/]+)/?$ /$1/$2/.www/
    RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3
- Secure CGI Script Integration
Be able to pipe any script.scgi CGI-program through the popular CGIwrap utility. This will check a CGI-program for security problems. If it passes, the CGI-program runs under the UID/GID of the physical owner. This could not be achieved by a simple Action directive (mod_actions) because the executable cgiwrap requires its PATH_INFO in a special form and not as /u/user/.../script.scgi
    # transform our canonical path into the one CGIwrap wants
    RewriteEngine On
    :
    RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
    ... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3 [NS,T=application/x-httpd-cgi]
- Simplification Of Services
To be able to add some string to a URL to start a service which operates on that particular URL. For example: We have a search-engine query-form, running as a CGI-program, which gets the directory to operate on via the QUERY_STRING variable ``i''. Usually the user had to reference this program directly and supply an ``i=directory'' QUERY_STRING part inside the URL, e.g. to call the search-form for
    /u/foo/abc/def/
a URL reference to
    /internal/cgi/user/swwidx?i=/u/foo/abc/def/
would have been needed. This was really bad, because the user had to know and hard-code the location of our search-form CGI script and the location of its directory. With the help of this rewriting module the user can now just reference the URL
    /u/foo/abc/def/swwidx
and this gets rewritten on-the-fly to the physically needed format. The same technique is used for another tool which extracts the information about the specific URL from the local access.log file.
    RewriteEngine On
    :
    RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/swwidx?i=/$1/$2$3/
    RewriteRule ^/([uge])/([^/]+)(/?.*)/swwidx /internal/cgi/user/swwidx?i=/$1/$2$3/
    RewriteRule ^/([uge])/([^/]+)(/?.*):swwlog /internal/cgi/user/swwlog?f=/$1/$2$3
- Backward Compatibility for Obsolete URLs
Suppose you have just renamed file oldfile.html inside your Homepage structure to newfile.html and want the old URL to be still valid, i.e. a request to oldfile.html should give the contents of newfile.html. You can achieve this with the following rule in the .htaccess file of the local directory where oldfile.html resides:
RewriteRule ^oldfile\.html$ newfile.html
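Two hedged variants of this idea, again meant for the .htaccess file of the directory in question: with the R flag the browser is told the new name through an external redirect, and with the G flag a page that was removed without replacement is reported as gone. The RewriteBase value and the file name removedfile.html are assumptions made for this sketch.

```apache
RewriteEngine On
# URL prefix under which this directory is reached (hypothetical)
RewriteBase /~userfoo

# external redirect: the browser sees and requests newfile.html
RewriteRule ^oldfile\.html$      newfile.html  [R]

# the page no longer exists at all: answer with 410 (GONE)
RewriteRule ^removedfile\.html$  -             [G]
```

The internal rewrite shown above keeps the old URL visible in the browser, whereas the redirect variant lets bookmarks and links migrate to the new name over time.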
- The Trailing Slash Problem
Every webmaster can sing a song about the problem of the trailing slash on URLs referencing directories. If it is missing, the server dumps an error, because if you say /somepath/somedir instead of /somepath/somedir/ then the server searches for a file named somedir. And because this file is a directory it complains.
The solution to this subtle problem is to let the server add the trailing slash automatically. To do this correctly we have to use an external redirect, so the browser correctly requests subsequent images etc. If we only did an internal rewrite, this would only work for the directory page, but would go wrong when any images are included into this page with relative URLs, because the browser would request an in-lined object. For instance, a request for image.gif in /somepath/somedir/index.html would become /somepath/image.gif without the external redirect!
So, to do this trick we write:
    RewriteRule ^/somepath/somedir$ /somepath/somedir/ [R]
The crazy and lazy can do the following in the top-level .htaccess file of their homedir:
    RewriteBase /~userfoo
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^(.+[^/])$ $1/ [R]
- Map External Stuff into Local Namespace
You can use the internal proxy module of the Apache server to map remote stuff into your local namespace. This gives a more powerful implementation of the ProxyPass directive from mod_proxy. It is activated by the P (proxy) flag.
Suppose we want to map the latest mod_rewrite manual into a subdirectory, say /u/rse/manuals/, i.e. if the URL /u/rse/manuals/mod_rewrite.html is requested from our local server it should give out the same stuff as if the user had requested http://www.engelschall.com/sw/mod_rewrite/mod_rewrite.html. To achieve this we set up the following rule in /u/rse/manuals:
    RewriteEngine On
    :
    RewriteRule ^mod_rewrite\.html$ ...
    ... http://www.engelschall.com/sw/mod_rewrite/mod_rewrite.html [P]
Or if we want to map the whole mod_rewrite Homepage we can do:
    RewriteEngine On
    :
    RewriteRule ^mod_rewrite/(.*)$ ...
    ... http://www.engelschall.com/sw/mod_rewrite/$1 [P]
Notice! The proxy feature does not copy the file to your directory, it just looks this way for the user. Instead it is internally retrieved by the Apache proxy module and perhaps internally cached. This feature is very useful because it allows you to have virtual, up-to-date copies of hot stuff available locally.
- Hardcore Example: net.sw
Here is a hardcore example: a killer application which heavily uses per-directory RewriteRules to get a smooth look and feel on the Web while its data structure is never touched or adjusted.
Background:
net.sw is my archive of freely available Unix software packages, which I started to collect in 1992. It is both my hobby and job to do this, because while I'm studying computer science I have also worked for many years as a system and network administrator in my spare time. Every week I need some sort of software so I created a deep hierarchy of directories where I stored the packages:
    drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
    drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
    drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
    drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
    drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
    drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
    drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
    drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
    drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
    drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
    drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
    drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
    drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
    drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
    drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
    drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/
In July 1996 I decided to make this 350 MB archive public to the world via a nice Web interface (http://net.sw.engelschall.com/net.sw/). "Nice" means that I wanted to offer an interface where you can browse directly through the archive hierarchy. And "nice" means that I didn't want to change anything inside this hierarchy - not even by putting some CGI scripts at the top of it. Why? Because the above structure is accessible via FTP as well, and I didn't want my CGI scripts to be there.
Solution:
The solution has two parts: The first is a set of CGI scripts which create all the pages at all directory levels on the fly. I put them under /e/netsw/.www/ as follows:
    -rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
    drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
    -rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
    -rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
    -rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
    -rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
    -rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
    -rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
    drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
    -rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
    -rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
    -rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
    -rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst
The DATA/ subdirectory holds the above directory structure, i.e. the real net.sw stuff and gets automatically updated via rdist from time to time.
The second part of the problem remains: how to link these two structures together into one smooth-looking URL tree? We want to hide the DATA/ directory from the user while running the appropriate CGI scripts for the various URLs. This is the solution: first I put the following into the per-directory configuration file in the Document Root of the server to rewrite the announced URL /net.sw/ to the internal path /e/netsw:
    RewriteRule ^net.sw$ net.sw/ [R]
    RewriteRule ^net.sw/(.*)$ e/netsw/$1
The first rule is for requests which miss the trailing slash! The second rule does the real thing. And here comes the killer configuration which stays in the per-directory config file /e/netsw/.www/.wwwacl:
    Options ExecCGI FollowSymLinks Includes MultiViews
    RewriteEngine on
    # we are reached via /net.sw/ prefix
    RewriteBase /net.sw/
    # first we rewrite the root dir to
    # the handling cgi script
    RewriteRule ^$ netsw-home.cgi [L]
    RewriteRule ^index\.html$ netsw-home.cgi [L]
    # strip out the subdirs when
    # the browser requests us from perdir pages
    RewriteRule ^.+/(netsw-[^/]+/.+)$ $1 [L]
    # and now break the rewriting for local files
    RewriteRule ^netsw-home\.cgi.* - [L]
    RewriteRule ^netsw-changes\.cgi.* - [L]
    RewriteRule ^netsw-search\.cgi.* - [L]
    RewriteRule ^netsw-tree\.cgi$ - [L]
    RewriteRule ^netsw-about\.html$ - [L]
    RewriteRule ^netsw-img/.*$ - [L]
    # anything else is a subdir which gets handled
    # by another cgi script
    RewriteRule !^netsw-lsdir\.cgi.* - [C]
    RewriteRule (.*) netsw-lsdir.cgi/$1
Some hints for interpretation:
- Notice the L (last) flag and no substitution field ('-') in the fourth part
- Notice the ! (not) character and the C (chain) flag at the first rule in the last part
- Notice the catch-all pattern in the last rule
- Static HTML, Dynamically created:
We will now present another tricky example: assume we have a CGI-script which generates HTML on-the-fly. But the generation takes a lot of time and the generated HTML output is not so time-critical that we need it to be generated on every request. It would be enough to have it regenerated from time to time. So, we let the CGI-script, say, page.cgi not only create the HTML code on stdout but also write the data to a file named page.html. Now if a request comes in for page.html, we serve it up if it exists and was generated not too long ago. If not, we want to run the CGI-script in the background to regenerate it. The user should not see any differences. The following config makes it possible:
    RewriteCond %{REQUEST_FILENAME} !-s
    RewriteRule ^page\.html$ page.cgi [L]
- Blocking some URLs
With the help of the "forbidden|F" flag you can block some URLs according to certain conditions. For example, the following config will block all requests for inlined-in-page.gif when the request contains a ``Referer:'' header which does not end in page-with-gif.html. This way, no one can (theoretically!) include your images into their pages. But in practice this is only true if his browser can send a Referer: header.
    RewriteCond %{HTTP_REFERER} !.*/page-with-gif\.html$
    RewriteRule ^inlined-in-page\.gif$ - [F]
Or you can block all access to a security-sensitive page from IP-address 1.2.3.4:
    RewriteCond %{REMOTE_ADDR} ^1\.2\.3\.4$
    RewriteRule ^security-page\.html$ - [F]
Or to get rid of hits from a specific robot for a specific subtree do the following:
    RewriteCond %{HTTP_USER_AGENT} ^HatedFooRobot.*
    RewriteRule ^/somepath.* - [F]
- Programmed Maps
This example shows the way to do very complicated URL rewriting which cannot be done with the basic functionality of mod_rewrite. There is a RewriteMap filetype ``prg'' - for programs. With this you can set up dynamic maps, i.e. a program which acts like a map. The program gets one key per lookup on stdin and has to provide the value as one line on stdout. If it wants to say ``no value found'' it returns the string ``NULL''. Here is a trivial example:
    RewriteMap foopath-map prg:/usr/local/lib/apache/maps/foopath.pl
    RewriteRule ^/foo/(.*)$ /foo/${foopath-map:$1}
This gives us the ability to program the URL-rewriting stuff as an external program. For this example we take a trivial Perl-script named foopath.pl with the following contents:
    #!/usr/local/bin/perl
    $| = 1;
    while (<>) {
        s|bar|quux|;
        print $_;
    }
This will fork foopath.pl once when Apache starts up and then mod_rewrite will communicate through the stdin/stdout filehandles of foopath.pl with this "map". In the example above this will rewrite /foo/bar/test to /foo/quux/test.
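A small variation, sketched under the assumption that the map program answers NULL for keys it does not know and that the default-value syntax ${map:key|default} shown earlier for text maps applies to program maps as well. The fallback directory name unknown is hypothetical.

```apache
RewriteMap   foopath-map  prg:/usr/local/lib/apache/maps/foopath.pl
# if the map program prints NULL for $1, the default "unknown" is used instead
RewriteRule  ^/foo/(.*)$  /foo/${foopath-map:$1|unknown}
```

The foopath.pl script shown above always echoes its (possibly modified) input, so the default would only come into play with a program that actually prints NULL for some keys.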
- Partially Forwarded Homepages
Another common situation is the following: You have two webservers, say www.company.dom and www2.company.dom. Each has its own variant of homepages, i.e. the users (or even just some directories or some files) are spread over the two machines. Now you want to provide all pages through www.company.dom virtually. This can be achieved with the following configuration (assuming all homepages stay under ``/home'' and the UserDir is ``.www''):
    ProxyRequests on
    RewriteEngine on
    RewriteRule ^/~([^/]+)/?(.*) /home/$1/.www/$2
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^/home/([^/]+)/\.www/?(.*) http://www2.company.dom/~$1/$2 [P]
Frequently Asked Questions
Q.01: Where can I find more example configurations?
There is a collection of Practical Solutions at http://www.engelschall.com/sw/mod_rewrite/doc/solutions/ where you can find all typical solutions the author currently knows of. If you have your own RewriteRule configurations which solve particular problems, send them to rse@engelschall.com for inclusion. The other webmasters will thank you for avoiding the reinvention of the wheel.
Q.02: Are there any published articles about mod_rewrite?
Yes, indeed! There was a German six-page article by Ralf S. Engelschall about mod_rewrite in the iX Multiuser Multitasking Magazin issue #12/96. It can also be read online at http://www.heise.de/ix/artikel/9612149/. And there is an English translation which can be found at http://www.heise.de/ix/artikel/E/9612149/.
Q.03: Why is mod_rewrite difficult to learn and seems so complicated?
Hmmm... there are a lot of reasons. First, mod_rewrite itself is a powerful module which can help you in really all aspects of URL rewriting, so it can be no trivial module per definition. To accomplish its hard job it uses software leverage and makes use of a powerful regular expression library by Henry Spencer which has been an integral part of Apache since version 1.2. And regular expressions themselves can be difficult for newbies, while providing the most flexible power to the advanced hacker.
On the other hand mod_rewrite has to work inside the Apache API environment and needs to do some tricks to fit there. For instance the Apache API as of 1.x really was not designed for URL rewriting at the .htaccess level of processing. Another example is the problem of multiple rewrites in sequence, which is also not handled by the API per design. To provide these features mod_rewrite has to do some special (but API compliant!) handling which leads to difficult processing inside the Apache kernel. While the user usually doesn't see anything of this processing, it can be difficult to find problems when some of your RewriteRules seem not to work.
Q.04: What can I do if my RewriteRules don't work as expected?
Use RewriteLog somefile and RewriteLogLevel 9 and have a precise look at the steps the rewriting engine performs. This is really the one and only, and also the best, way to debug your rewriting configuration.
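A minimal logging setup for such a debugging session might look like the sketch below; the log file location is only an example.

```apache
RewriteEngine   On
# write a trace of the rewriting steps to this file (example path)
RewriteLog      /anywhere/rewrite.log
# 9 is the verbosity level recommended above
RewriteLogLevel 9
```

Logging at this level slows the server down noticeably, so it is sensible to lower the level or remove these lines again once the problem has been found.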
Q.05: Some of my URLs don't get prefixed with DocumentRoot?
If the rule starts with /somedir/... make sure that really no /somedir exists on the filesystem if you don't want to lead the URL to match this directory, i.e. there must be no root directory named somedir on the filesystem. Because if there is such a directory, the URL will not get prefixed with DocumentRoot. This behaviour looks ugly, but is really important for some other aspects of URL rewritings.
Q.06: Why can I not match URLs of type "proxy:http://..." when using the mod_proxy?
The problem here is the proxy context itself, not the URL! Because in the provided Apache configuration file ("Configuration") mod_rewrite comes _BEFORE_ mod_proxy, which means that mod_proxy's URL translation handler gets called _before_ mod_rewrite. Now it prefixes all "http://..." etc. URLs with "proxy:" AND RETURNS OK to the Apache kernel which means: "Ok, I handled this URL translation stage". As a result, _NO MORE_ translation handlers (like mod_rewrite or mod_alias) get called. So mod_rewrite never sees any "proxy:..." URLs!! The only way to solve your problem is to put mod_rewrite _AFTER_ the mod_proxy entry in "Configuration". Then all URLs get passed to mod_rewrite first and you can just match the "http://..." URLs. This is no problem, I operate mod_rewrite on all webservers in this order. The Apache Group just decided to put mod_rewrite near the mod_alias module. You can put it at the end of "Configuration" without problems.
Comparison to similar Modules
mod_alias
(core module)
- Syntax to be replaced:
    server config:
    Alias /abc/def /XYZ
    ScriptAlias /abc/def /XYZ
    Redirect /abc/def http://<thishost>/XYZ/
    Redirect /abc/def http://<otherhost>/XYZ/
    per-directory config in /abc/def:
    Redirect /abc/def/oldfile.html http://<thishost>/abc/def/newfile.html
- Replacement syntax (converted straight-forward):
    server config:
    RewriteRule ^/abc/def(.*) /XYZ$1 [L]
    RewriteRule ^/abc/def(.*) /XYZ$1 [T=application/x-httpd-cgi,L] (**)
    RewriteRule ^/abc/def(.*) http://<thishost>/XYZ/$1 [R,L]
    RewriteRule ^/abc/def(.*) http://<otherhost>/XYZ/$1 [R,L]
    per-directory config in /abc/def:
    RewriteRule ^(.+)/oldfile.html$ $1/newfile.html [R]
- Replacement syntax (optimized/minimized):
    server config:
    RewriteRule ^/abc/def(.*) /XYZ$1 [L]
    RewriteRule ^/abc/def(.*) /XYZ$1 [T=application/x-httpd-cgi,L] (**)
    RewriteRule ^/abc/def(.*) /XYZ$1 [R,L]
    RewriteRule ^/abc/def(.*) http://<otherhost>/XYZ$1 [R,L]
    per-directory config in /abc/def:
    RewriteRule (.+)/oldfile.html$ $1/newfile.html [R]
(**) The complete simulation works if and only if the directory /XYZ has the ExecCGI option set, too!
Result: mod_rewrite can fully replace mod_alias.
For an automatic server-internal directive conversion,
please have a look at the mod_rewrite_compat module!
mod_userdir
[request is /~bar/one/two.html]
(core module)
Result: mod_rewrite can fully replace mod_userdir.
- Syntax to be replaced:
    UserDir public_html
    UserDir /usr/web
    UserDir /home/*/www
    UserDir http://x/users
    UserDir http://x/*/y
- Replacement syntax (converted straight-forward):
    RewriteRule ^/~([a-z]+)(.*) ~$1/public_html$2
    RewriteRule ^/~([a-z]+)(.*) /usr/web/$1$2
    RewriteRule ^/~([a-z]+)(.*) /home/$1/www$2
    RewriteRule ^/~([a-z]+)(.*) http://x/users/$1$2
    RewriteRule ^/~([a-z]+)(.*) http://x/$1/y$2
- Replacement syntax (optimized/minimized):
    RewriteRule ^/~([a-z]+)(.*) ~$1/public_html$2
    RewriteRule ^/~(.+) /usr/web/$1
    RewriteRule ^/~([a-z]+)(.*) /home/$1/www$2
    RewriteRule ^/~(.+) http://x/users/$1
    RewriteRule ^/~([a-z]+)(.*) http://x/$1/y$2
For an automatic server-internal directive conversion,
please have a look at the mod_rewrite_compat module!
mod_userdir_virtual
(contributed module)
Result: mod_rewrite can fully replace mod_userdir_virtual.
- Syntax to be replaced:
    VirtualUserDir /usr/web
- Replacement syntax:
    RewriteRule ^/~(.+) /usr/web/$1
mod_userpath
(contributed module)
Result: mod_rewrite can fully replace mod_userpath.
- Syntax to be replaced:
    UserPath /usr/web
- Replacement syntax:
    RewriteRule ^/~(.+) /usr/web/$1
mod_uri_remap
(contributed module)
Result: mod_rewrite can fully replace mod_uri_remap.
- Syntax to be replaced:
    Mother /home_page.html
    Rename /home.htm /home_page.html
    Rename /Home.htm /home_page.html
- Replacement syntax (converted straight-forward):
    RewriteRule ^$           /home_page.html
    RewriteRule ^/$          /home_page.html
    RewriteRule ^/\.$        /home_page.html
    RewriteRule ^/home.htm$  /home_page.html
    RewriteRule ^/Home.htm$  /home_page.html
- Replacement syntax (optimized/minimized):
    RewriteRule ^$              /home_page.html
    RewriteRule ^/\.?$          /home_page.html
    RewriteRule ^/[Hh]ome.htm$  /home_page.html
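One way to gain confidence that the optimized mod_uri_remap rules accept the same URLs as the straight-forward ones is to compare both pattern sets on a few sample URLs. The sketch below does this with Python's re module, assuming its regular-expression semantics agree with mod_rewrite's for these simple patterns. The sample URLs and the matches_any helper are illustrative assumptions, not part of any module.

```python
import re

# Patterns copied from the two replacement variants for mod_uri_remap.
STRAIGHTFORWARD = [r"^$", r"^/$", r"^/\.$", r"^/home.htm$", r"^/Home.htm$"]
OPTIMIZED = [r"^$", r"^/\.?$", r"^/[Hh]ome.htm$"]


def matches_any(patterns, url):
    """Return True if at least one pattern matches the URL."""
    return any(re.search(p, url) is not None for p in patterns)


# Hypothetical sample requests: some should be rewritten, some should not.
SAMPLES = ["", "/", "/.", "/home.htm", "/Home.htm",
           "/HOME.htm", "/home.html", "/other.html"]

# For each URL, record (matched by straight-forward set, matched by optimized set).
agree = {url: (matches_any(STRAIGHTFORWARD, url), matches_any(OPTIMIZED, url))
         for url in SAMPLES}

for url, (plain, optimized) in agree.items():
    status = "same" if plain == optimized else "DIFFERENT"
    print(f"{url!r:15} straight-forward={plain!s:5} optimized={optimized!s:5} {status}")
```

A check like this covers only the listed samples. It does not prove that the two rule sets are equivalent for every possible URL.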
Additional Modules
mod_rewrite_compat
Summary
This module provides backward compatibility for the older and less powerful core modules mod_alias and mod_userdir.
Installation
- Copy the module to the Apache distribution source tree:
    $ cp mod_rewrite_compat.c ApacheDistRoot/src/
- Comment out the two replaced modules, add this module to the Apache configuration file and recompile:
    $ cd ApacheDistRoot/src/
    $ vi Configuration
    | :
    | #Module userdir_module        mod_userdir.o
    | #Module alias_module          mod_alias.o
    | Module  rewrite_compat_module mod_rewrite_compat.o
    | :
    $ ./Configure
    $ make
- Kill the running Apache server, install the new binary, restart the Apache server and you will see a page displaying internal information from the Rewrite Engine.
Usage
Keep using your old mod_alias and mod_userdir directives. They are automatically converted into the corresponding mod_rewrite directives.
mod_rewrite_status
Summary
This module is similar to the core modules mod_status and mod_info. It provides online information about the current internal configuration of mod_rewrite.
Installation
This module is contained in the standard distribution of mod_rewrite.
- Copy the module to the Apache distribution source tree:
    $ cp mod_rewrite_status.c ApacheDistRoot/src/
- Add the module to the Apache configuration file and recompile:
    $ cd ApacheDistRoot/src/
    $ vi Configuration
    | :
    | Module rewrite_status_module mod_rewrite_status.o
    | :
    $ ./Configure
    $ make
- Kill the running Apache server, install the new binary and restart the Apache server. The module then provides a page displaying internal information from the Rewrite Engine.
Usage
Just request the URL /rewritestatus through your favorite Web browser.