Status of this memo
===================

This document is an Internet Draft. Internet Drafts are working documents of the
Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that
other groups may also distribute working documents as Internet Drafts.

Internet Drafts are working documents valid for a maximum of six months. Internet
Drafts may be updated, replaced, or obsoleted by other documents at any time. It is
not appropriate to use Internet Drafts as reference material or to cite them other than
as a "working draft" or "work in progress".

This document is a DRAFT specification of a protocol in use on the internet and to
be proposed as an Internet standard. Discussion of this protocol takes place on the
www-talk@info.cern.ch mailing list -- to subscribe mail to
www-talk-request@info.cern.ch. Distribution of this memo is unlimited. 

Abstract
========

HTTP is a protocol with the lightness and speed necessary for a distributed
collaborative hypermedia information system. It is a generic stateless object-oriented
protocol, which may be used for many similar tasks such as name servers, and
distributed object-oriented systems, by extending the commands, or "methods", used.
A feature if HTTP is the negotiation of data representation, allowing systems to be
built independently of the development of new advanced representations. 

Note: This specification
========================

This HTTP protocol is an upgrade on the original protocol as implemented in the
earliest WWW releases. It is back-compatible with that more limited protocol.

This specification includes the following parts: 

 o The Request 
 o Methods 
 o A list of headers in the request message 
 o The response 
 o Status codes 
 o Metainformation headers on any object transmitted 
 o The content of any object content transmitted 
 o Format negotiation algorithm 
 o The HTTP Registration Authority 
 o Security Considerations 
 o Unresolved points 
 o References 

The following notes form recommended practice not part of the specification:

Servers tolerating clients

Clients tolerating servers 

Purpose
=======

When many sources of networked information are available to a reader, and when a
discipline of reference between different sources exists, it is possible to rapidly
follow references between units of information which are provided at different
remote locations. As response times should ideally be of the order of 100ms in, for
example, a hypertext jump, this requires a fast, stateless, information retrieval
protocol.

Practical information systems require more functionality than simple retrieval,
including search, front-end update and annotation. This protocol allows an
open-ended set of methods to be used. It builds on the discipline of reference
provided by the Universal Resource Identifier (URI) as a name (URN, RFCxxxx) or
address (URL, RFCxxxx) allows the object of the method to be specified.

Reference is made to the Multipurpose Internet Mail Extensions (MIME, RFC1341)
which are used to allow objects to be transmitted in an open variety of
representations. 

Overall operation
=================

On the internet, the communication takes place over a TCP/IP connection. This does
not preclude this protocol being implemented over any other protocol on the internet
or other networks. In these cases, the mapping of the HTTP request and response
structures onto the transport data units of the protocol in question is outside the scope
of this specification. It should not however be at all complicated.

The protocol is basically stateless, a transaction consisting of 

Connection 
   The establishment of a connection by the client to the server - when using
   TCP/IP port 80 is the well-known port, but other non-reserverd ports may be
   specified in the URL; 
Request 
   The sending, by the client, of a request message to the server; 
Response 
   The sending, by the server, of a response to the client; 
Close 
   The closing of the connection by either both parties. 

The format of the request and response parts is defined in this specification. Whilst
header information defined in this specification is sent in ISO Latin-1 character set
in CRLF terminated lines, object transmission in binary is possible. 

Character sets
==============

In all cases in HTTP where RFC822 characters are allowed, these may be extended to
use the full ISO Latin 1 character set. 8-bit transmission is always used.

tableofcontents 
Request
*******

The request is sent with a first line containing the method to be applied to the object
requested, the identifier of the object, and the protocol version in use, followed by
further information encoded in the RFC822 header style. The format of the request
is: 

        Request           =     SimpleRequest | FullRequest

        SimpleRequest     =     GET <uri> CrLf

        FullRequest       =     Method URI ProtocolVersion CrLf
                                [*<HTRQ Header>]
                                [<CrLf> <data>]

        <Method>          =     <InitialAlpha>

        ProtocolVersion   =     HTTP/1.0

        uri               =     <as defined in URL spec>

        <HTRQ Header>     =     <Fieldname> : <Value> <CrLf>

        <data>            =      MIME-conforming-message        
 


The URI is the Uniform Resource Locator (URL) as defined in the specification, or
may be (when it is defined) a Uniform Resource Name (URN) when a specification
for this is settled, for servers which support URN resolution.

Unless the server is being used as a gateway, a partial URL shall be given with the
assuptions of the protocol (HTTP:) and server (the server) being obvious.

The URI should be encoded using the escaping scheme described in the URL
specification to a level such that (at least) spaces and control characters (decimal
0-31 and 128-159) do not appear unesacaped.

Note. The rest of an HTTP url after the host name and optional port number is
completely opaque to the client: The client may make no deductions about the object
from its URL. 

Protocol Version
================

The Protocol/Version field defines the format of the rest of the request.. At the
moment only HTRQ is defined .

If the protocol version is not specified, the server assumes that the browser uses
HTTP version 0.9. 

Uniform Resource Identifier
===========================

This is a string identifying the object. It contains no blanks. It may be a Uniform
Resource Locator [ URL ] defining the address of an object as described in RFCxxxx,
or it may be a representation of the name of an object (URN, Universal Resource
Name) where that object has been registered in some name space. At the time of
writing, no suitable naming system exists, but this protocol will accept such names so
long as they are distinguishable from the existing URL name spaces. 

Methods
=======

Method field indicates the method to be performed on the object identified by the
URL. More details are with the list of method names below . 

Request Headers
===============

These are RFC822 format headers with special field names given in the list below ,
as well as any other HTTP object headers or MIME headers. 

Object Body
===========

The content of an object is sent (depending on the method) with the request and/or the
reply. 
Methods
*******

The Method field in HTTP indicates the method to be performed on the object
identified by the URL. The method GET below is always supported, The list of other
methods acceptable by the object are returned in response to either of these two
requests.

This list may be extended from time to time by a process of registration with the
design authority. Method names are case sensitive. Currently specified methods are
as follows: 

GET 
   means retrieve whatever data is identified by the URI, so where the URI
   refers to a data-producing process, or a script which can be run by such a
   process, it is this data which will be returned, and not the source text of the
   script or process. Also used for searches . 
HEAD 
   is the same as GET but returns only HTTP headers and no document body. 
CHECKOUT 
   Similar to GET but locks the object against update by other people. The lock
   may be broken by a higher authority or on timeout: in this case a future
   CHECKIN will fail. ( Phase out? ) 
SHOWMETHOD 
   Returns a description (perhaps a form) for a given method when applied to the
   given object. The method name is specified in a For-Method: field. (TBS) 
PUT 
   specifies that the data in the body section is to be stored under the supplied
   URL. The URL must already exist. The new contenst of the document are the
   data part of the request. POST and REPLY should be used for creating new
   documents. 
DELETE 
   Requests that the server delete the information corresponding to the given
   URL. After a successfull DELETE method, the URL becomes invalid for any
   future methods. 
POST 
   Creates a new object linked to the specified object. The message-id field of
   the new object may be set by the client or else will be given by the server. A
   URL will be allocated by the server and returned to the client. The new
   document is the data part of the request. It is considered to be subordinate to
   the specified object, in the way that a file is subordinate to a directory
   containing it, or a news article is subordinate to a newsgroup to which it is
   posted. 
LINK 
   Links an existing object to the specified object. 
UNLINK 
   Removes link (or other meta-) information from an object. 
CHECKIN 
   Similar to PUT, but releases the lock set on the object. Fails if no lock has
   been set by CHECKOUT. Suggestion : phase out this (rcs-like) model in
   favor of the PUT (cvs-like, non-locking) model of code management. 
TEXTSEARCH 
   The object may be queried with a text string. The search form of the GET
   method is used to query the object. 
SPACEJUMP 
   The object will accept a query whose terms are the cooridnates of a point
   within the object. The method is implemented using GET with a derived URL
   . 
SEARCH 
   Proposed only. The index (etc) identified by the URL is to be searched for
   something matching in some sense the enclosed message. How does the client
   know what message fromats are acceptable to the server? (Suggestion of Fred
   Williams) 

(Some of these methods require more detailed specification) 
HTTP Request fields
*******************

These header lines are sent by the client in a HTTP protocol transaction. All lines are
RFC822 format headers. The list of headers is terminated by an empty line. 


 o From 
 o Accept 
 o Accept-Encoding 
 o Accept-Language 
 o User-Agent 
 o Referer 
 o Authorization 
 o Charge-To 
 o If-Modified-Since 
 o Pragma 



From:
=====

In Internet mail format, this gives the name of the requesting user. This field may be
used for logging purposes and an insecure form of access protection. The
interpretation of this field is that the request is being performed on behalf of the
person given, who accepts responsability for the method performed.

The Internet mail address in this field does not have to correspond to the internet host
which issued the request. (For example, when a request is passed through a gateway,
then the original issuer's address should be used).

The mail address should, if possible, be a valid mail address, whether or not it is in
fact an internet mail address or the internet mail representation of an address on some
other mail system.



Accept:
=======

This field contains a semicolon-separated list of representation schemes (
Content-Type metainformation values) which will be accepted in the response to this
request.

The set given may of course vary from request to request from the same user.

This field may be wrapped onto several lines according to RCFC822, and also more
than one occurence of the field is allowed with the signifiance being the same as if all
the entries has been in one field. The format of each entry in the list is (/ meaning
"or") 

        <field>  =    Accept: <entry> *[ , <entry> ]
        <entry>  =    <content type> *[ ; <param> ]
        <param>  =    <attr> = <float>
        <attr>   =    q / mxs / mxb
        <float>  =    <ANSI-C floating point text represntation>


See the appendix on the negotiation algorithm as a function and penalty model. Note
that a semicolon has a higher precedence than a comma in this syntax, to conform to
MIME use.

If no Accept: field is present, then it is assumed that text/plain and text/html are
accepted. 

Example
-------

                Accept: text/plain, text/html
                Accept: text/x-dvi; q=.8; mxb=100000; mxt=5.0, text/x-c


Wildcards
+++++++++

In order to save time, and also allow clients to receive content types of which they
may not be aware, an asterisk "*" may be used in place of either the second half of
the content-type value, or both halves. This only applies to the Accept: filed, and not
to the content-type field of course. 

Example
-------

                        Accept:  *.*, q=0.1
                        Accept:  audio/*, q=0.2
                        Accept:  audio/basic q=1


may be interpreted as "if you have basic audio, send it; otherwise send me some other
audio, or failing that, just give me what you've got." 

Type parameters
+++++++++++++++

Parameters on the content type are extremely useful for describing resolutions,
colour depths, etc. They will allow a client to specify in the Accept: field the
resolution of its device. This may allow the server to economise greatly on
transmission time by reducing the resultion of an image, for example, and enable a
more appropriate custom-designed black and white image to be selected rathther
than giving the client a color image to convert into monochrome.

These parameters are to be specified when types are registered.. @@ TBS.

Sugestions include the following. Please feed back any references to existing
improved abreviations for these: 

dpi 
   dots per inch: pixels per inch [cm?!] 
pxmax 
   maximum width in pixels (image or video) 
pymax 
   maximum height in pixels 
bits 
   bits per sample (sound) or pixels (graphics) 
mchrome 
   greyscale or black and white (no value) 
sps 
   samples (sound) or frames (video) per second 
length 
   total size of object in bytes [bits?] 



Accept-Encoding:
================

Similar to Accept, but lists the Content-Encoding types which are acceptable in the
response. 

        <field>  =    Accept-Encoding: <entry> *[ , <entry> ]
        <entry>  =    <content transfer encoding> *[ , <param> ]


Example
+++++++

                Accept-Encoding: x-compress; x-zip





Accept-Language:
================

Similar to Accept, but lists the Language values which are preferable in the response.
A response in an unspecifies language is not illegal. See also: Language .

Language coding TBS. (ISO standard xxxx.) 



User-Agent:
===========

This line if present gives the software program used by the original client. This is for
statistical purposes and the tracing of protocol violations. It should be included. The
first white space delimited word must be the software product name, with an
optional slash and version designator. Other products which form part of the user
agent may be put as separate words. 

        <field>   =   User-Agent: <product>+
        <product> =   <word> [/<version>]
        <version> =   <word>


Example:
++++++++

               UserAgent:  LII-Cello/1.0  libwww/2.5




Referer:
========

This optional header field allows the client to specify, for the server's benefit, the
address ( URI ) of the document (or element within the document) from which the
URI in the request was obtained. 

This allows a server to generate lists of back-links to documents, for interest,
logging, etc. It allows bad links to be traced for maintenance.

If a partial URI is given, then it should be parsed relative to the URI of the object of
the request. 

Example:
++++++++

               Referer: http://info.cern.ch/hypertext/DataSources/Overview.html




Authorization:
==============

If this line is present it contains authorization information. The format is To Be
Specified (TBS). The format of this field is in extensible form. The first word is a 
specification of the authorisation system in use.

Basic
+++++

Specification for current one implemented by AL Sep 1993. 

PGP/PEM Encryption
++++++++++++++++++

People at NCSA are designing a PGP/PEM based protection system. 

User/Password scheme
++++++++++++++++++++

                Authorization:  user  fred:mypassword


The scheme name is "user". The second word is a user name (typically derived from a
USER environment variable or prompted for), with an optional password separated
by a colon (as in the URL syntax for FTP). Without a password, this povides very
low level security. With the password, it provides a low-level security as used by
unmodified FTP, Telnet, etc. 

Kerberos
++++++++

                Authorization:  kerberos  kerberosauthenticationsparameters


The format of the kerberosauthenticationsparameters is to be specified. 


ChargeTo:
=========

This line if present contains account information for the costs of the application of
the method requested. The format is TBS. The format of this field must be in
extensible form. The first word starts with a specification of the namespace in which
the account is . (This is similar to extensible URL definition.) No namespaces are
currently defined. Namespaces will be registered with the registration authority .

The format of the rest of the line is a function of the charging system, but it is
recommended that this include a maximum cost whose payment is authorized by the
client for this transaction, and a cost unit. 



If-Modified-Since: date
=======================

This request header is used with GET method to make it conditional: if the requested
document has not changed since the time specified in this field the document will not
be sent, but instead a Not Modified 304 reply. 

Format of this field is the same as for Date:. 



Pragma:
=======

Syntax is the same as for other multiple-value fields in HTTP, like the Accept: field,
namely, a comma-separated list of entries, for which the optional parameters are
separated by semicolons. 

Pragma directives should be understood by servers to which they are relevant, e.g. a
proxy server; currently only one pragma is defined: 

no-cache 
   When present the proxy should not return a document from the cache even
   though it has not expired, but it should always request the document from the
   actual server. 

Pragmas should be passed through by proxies even though they might have
significance to the proxy itself. This is necessary in cases when the request has to go
through many proxies, and the pragma should affect all of them. 

Last modified: May 3, 1994
Response
********

The response from the server shall start with the following syntax (See also: note on
client tolerance ): 

  <status line>   ::=    <http version>  <status code>  <reason line> <CrLf>
  <http version>  ::=    3*<digit>
  <status code>   ::=    3*<digit>
  <digit>         ::=    0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  <reason line> ::=   * <printable>



<http version> 
   identifies the HyperText Transfer Protocol version being used by the server.
   For the version described by this document version it is "HTTP/1.0" (without
   the quotes). 
< status code > 
   gives the coded results of the attempt to understand and satisfy the request. A
   three digit ASCII decimal number. 
<reason string> 
   gives an explanation for a human reader, except where noted for particular
   status codes. 

Fields on the status line are delimited by a single blank (parsers should accept any
amount of white space). The possible values of the status code are listed below . 

Response headers
================

The headers on returned objects those RFC822 format headers listed as object headers
, as well as any MIME conforming headers, notably the Content-Type field. Note
that this specification doesnot define any headers particular to the response which are
not also apropriate to any transmission of an object with a request. 

Response data
=============

Additional information may follow, in the format of a MIME message body. The
significance of the data depends on the status code. 

The Content-Type used for the data may be any Content-Type which the client has
expressed its ability to accept, or text/plain, or text/html. That is, one can always
assume that the client can handle text/plain and text/html. 
Status codes
************

The values of the numeric status code to HTTP requests are as follows. The data
sections of messages Error, Forward and redirection responses may be used to contain
human-readable diagnostic information. 

Success 2xx
===========

These codes indicate success. The body section if present is the object returned by the
request. It is a MIME format object. It is in MIME format, and may only be in
text/plain, text/html or one fo the formats specified as acceptable in the request. 

OK 200
++++++

The request was fulfilled. 

CREATED 201
+++++++++++

Following a POST command, this indicates success, but the textual part of the
response line indicates the URI by which the newly created document should be
known. 

Accepted 202
++++++++++++

The request has been accepted for processing, but the processing has not been
completed. The request may or may not eventually be acted upon, as it may be
disallowed when processing actually takes place. there is no facility for status returns
from asynchronous operations such as this. 

Partial Information 203
+++++++++++++++++++++++

When received in the response to a GET command, this indicates that the returned
metainformation is not a definitive set of the object from a server with a copy of the
object, but is from a private overlaid web. This may include annotation information
about the object, for example. 

No Response 204
+++++++++++++++

Server has received the request but there is no information to send back, and the
client should stay in the same document view. This is mainly to allow input for
scripts without changing the document at the same time. 

Error 4xx, 5xx
==============

The 4xx codes are intended for cases in which the client seems to have erred, and the
5xx codes for the cases in which the server is aware that the server has erred. It is
impossible to distinguish these cases in general, so the difference is only
informational.

The body section may contain a document describing the error in human readable
form. The document is in MIME format, and may only be in text/plain, text/html or
one for the formats specified as acceptable in the request. 

Bad request 400
+++++++++++++++

The request had bad syntax or was inherently impossible to be satisfied. 

Unauthorized 401
++++++++++++++++

The parameter to this message gives a specification of authorization schemes which
are acceptable. The client should retry the request with a suitable Authorization
header. 

PaymentRequired 402
+++++++++++++++++++

The parameter to this message gives a specification of charging schemes acceptable.
The client may retry the request with a suitable ChargeTo header. 

Forbidden 403
+++++++++++++

The request is for something forbidden. Authorization will not help. 

Not found 404
+++++++++++++

The server has not found anything matching the URI given 

Internal Error 500
++++++++++++++++++

The server encountered an unexpected condition which prevented it from fulfilling
the request. 

Not implemented 501
+++++++++++++++++++

The server does not support the facility required. 

Redirection 3xx
===============

The codes in this section indicate action to be taken (normally automatically) by the
client in order to fulfill the request. 

Moved 301
+++++++++

The data requested has been assigned a new URI, the change is permanent. (N.B. this
is an optimisation, which must, pragmatically, be included in this definition.
Browsers with link editing capabiliy should automatically relink to the new
reference, where possible)

The response contains one or more header lines of the form 

       URI: <url> String CrLf


Which specify alternative addresses for the object in question. The String is an
optional comment field. If the response is to indicate a set of variants which each
correspond to the requested URI, then the multipart/alternative wrapping may be
used to distinguish different sets 

Found 302
+++++++++

The data requested actually resides under a different URL, however, the redirection
may be altered on occasion (when making links to these kinds of document, the
browser should default to using the Udi of the redirection document, but have the
option of linking to the final document) as for "Forward".

The response format is the same as for Moved . 

Method 303
++++++++++

        Method: <method> <url>
        body-section

Note: This status code is to be specified in more detail. For the moment it is for
discussion only. 

Like the found response, this suggests that the client go try another network address.
In this case, a different method may be used too, rather than GET. 

The body-section contains the parameters to be used for the method. This allows a
document to be a pointer to a complex query operation. 

The body may be preceded by the following additional fields as listed. 

Not Modified 304
++++++++++++++++

If the client has done a conditional GET and access is allowed, but the document has
not been modified since the date and time specified in If-Modified-Since field, the
server responds with a 304 status code and does not send the document body to the
client. 

Response headers are as if the client had sent a HEAD request, but limited to only
those headers which make sense in this context. This means only headers that are
relevant to cache managers and which may have changed independently of the
document's Last-Modified date. Examples include Date , Server and Expires . 

The purpose of this feature is to allow efficient updates of local cache information
(including relevant metainformation) without requiring the overhead of multiple
HTTP requests (e.g. a HEAD followed by a GET) and minimizing the transmittal of
information already known by the requesting client (usually a caching proxy). 
Object MetaInformation
**********************

The header fields given with or in relation to objects in HTTP are as follows. All are
optional. These headers specify metainformation: that is, information about the
object, not the information which is contained in the object.

There is no reason for limiting these fields to HTTP use, as any other system which
requires metainformation is encouraged to use them.

The order of header lines withing the HTTP header has no significance. However,
those fields which are not MIME fields should occur before the MIME fields, so that
the MIME fields and following form a valid MIME document. This is not
mandatory.

Any header fields which are not understood should be ignored.

(TBS in more detail.) 



 o Allowed 
 o Public 
 o Content-Length 
 o Content-Type 
 o Content-Transfer-Encoding 
 o Content-Encoding 
 o Date 
 o Expires 
 o Last-Modified 
 o Message-Id 
 o URI 
 o Version 
 o Derived-From 
 o Content-Language 
 o Cost 
 o Link 
 o Title 



Allowed: *Method
================

Lists the set of requests which the requesting user is allowed to issue for this URL. If
this header line is omitted, the default allowed methods are "GET HEAD" 

Example of use:
+++++++++++++++

                Allow: GET HEAD PUT




Public: *method
===============

As "Allow" but lists those requests which anyone may use. If omitted, the default is
"GET" only. 

Example of use:
+++++++++++++++

                Public: GET HEAD TEXTSEARCH




Content-Length: int
===================

Implies that the body is binary and should be read directly from the communications
link, without parsing lines, etc. When the data is part of the request, prevents the
escaping and de-escaping of the termination sequence .

@@@ This should be part of the MIME header, as it applies to any binary encoded
part. Note HTML is the first internet protocol to allow MIME "binary" encoding. In
MIME, the use of Content-Length is currently allowed only for external messages. 



Content-Type:
=============

As defined in MIME , except where noted here. 

Extra non-MIME types
++++++++++++++++++++

It is reasonable to put strict limits on transfer formats for mail, where there is no
guarantee that the receiver will understand a weird format. However, in HTTP one
knows that the receiver will be able to receive it because it will have been sent in the
Accept: field . There is therefore a lot to be gained from a very complete registry of
well-defined types for HTTP which may nevertheless not be recommended for mail.
In this case, the content-type list for HTTP may be a superset of the MIME list.

The x- convention for experimental types is of course still available as well. 

Multipart types
+++++++++++++++

MIME provides for a number of "multipart" types. These are encapsulations of
several body parts in the one message. In HTTP, Multipart types may be returned on
the condition that the client has indicated acceptability (using Accept: )of the
multipart type and also of the content types of each consitutent body part.

The body parts ( unlike in MIME ) MAY contain HTTP metainformation header
fields which ARE significant. 

Multipart/alternative
---------------------

This is normally used in mail to send different content-type variants when the
receiver's capabilities are not known. This is not the case with HTTP.
Multipart/alternate may however be used to provide meta information of many
instances of an object, in the case of a indirection response . This allows, for example,
pointers to be returned by a name server to a set of instances of an object. 

Multipart/related
-----------------

This is the type to be used when the first body part contains references to other parts
which the server wishes to send at the same time. For example, the first part could be
an HTML document, and the included bodyparts could be the inline images
mentioned within the text.

The body parts may have URI: fields if the body parts have URIs, and so they may be
referred to by these URIs in the body-parts. If the body-parts are transient (as in
speech insertions in mail messages) then the [propsed] "cid:" URI type may be used to
refer to them by content-identifier. 

Multipart/mixed
---------------

This may be used to simply transfer an unrelated unstructured set of objects.
Examples are a picture with audio commentary, etc. 

Multipart/parallel
------------------

This may be used as in MIME to indicate simultaneous presentation. [It is the
author's belief that this is a trivial case of a compound presentation which in general
should be described by a script which would be the first bodypart of a
multipart/related document]. 



Content-Transfer-Encoding
=========================

As in MIME. Some profiling and/or extension may be necessary, TBS. (Compression
is not treated as transfer encoding by MIME). 



Content-Encoding
================

Specifies the encoding mechamism used. Currently in use only x-compress and 
x-gzip. 



Date: date
==========

Creation date of object. (or last modified, and separately have a Created: field?)
Format as in RFC850 but GMT MUST BE USED. 



Expires: date
=============

Gives the date after which the information given ceases to be valid and should be
retrieved again. This allows control of caching mechanisms, and also allows for the
periodic refreshing of displays of volatile data. Format as for Date:. This does NOT
imply that the original object will cease to exist. 



Last-Modified: date
===================

Last time object was modified, i.e. the date of this version if the document is a
"living document". Format as for Date:. 



Message-id: uri
===============

A unique identifier for the message. As in RFC850 , except that the unlimited
lifetime of HTTP objects requires that the Message-ID be unique in all time, not just
in two years.

A document may only have one Message-ID.

No two documents, even if different versions of the same live document, may have
the same Message-id. 

Note: Unlike the URI field, this does not fgive a way of accessing the document, so
the Message-Id cannot be used to refer to the document. In the case of NNTP articles,
the message-id may in fact be used within the URI for retrieval using NNTP. 



URI: 1*uri
==========

This gives a URI with which the object may be found. There is no guarantee that the
object can be retrieved using the URI specified. However, it is guaranteed that if an
object is successfully retrieved using that URI it will be to a certain given degree the
same object as this one.

If the URI is used to refer to a set of variants, then the dimensiosn in which the
variants may differ must be given with the "vary" parameter: 

        Syntax          URI: <uri>  [ ; vary = dimension [ , dimension ]* ]
        dimension       content-type | language | version 



If no "vary" parameters are given, then the URI may not return anything other than
the same bit stream as this object.

Multiple occurencies of this field give alternative access names or addresses for the
object. 

Examples
++++++++

                URI:  http://info.cern.ch/pub/www/doc/url6.multi; vary=content-type


This indicates that retrieval given the URI will return the same document, never an
updated version, but optionally in a different rendition. 

                URI:  http://info.cern.ch/pub/www/doc/url.multi;
                    vary=content-type, language, version


This indicates that the URI will return the smae document, possibly in a different
rendition, possibly updated, and without excluding the provision of translations into
different languages.URI: http://info.cern.ch/pub/www/doc/url6.ps
vary=content-typeThis indicates that accessing the URI in question will return
exactly the same bitstream. 



Version:
========

This is a string defining the version of an evolving object. Its format is currently
undefined , and so it should be treated as opaque to the reader, defined by the
informatiuon provider. The version field is used to indicate evolution along a single
path of a partucular work. It is NOT used to indicate derived works (use a link), 
translations , or renditions in different representations .

Note: It would be useful to have sufficient semantics to be able to deduce whether
one version predated or postdated another. However, it may also be useful to be able
to insert a particular local code management system's own version stamp in this
field. Typically, publishers will have quite complex version information containing
hidden local semantics, giving value to the idea of this field being opaque to other
readers ofthe document. 



Derived-From:
=============

When an editied object is resubmitted using PUT for example, this field gives the
value of the Version . This typically allows a server to check for example that two
concurrent modifications by different parties will not be lost, and for example to use
established version management techniques to merge both modifications. 



Content-Language: code
======================

The language code is the ISO code for the language in which the document is written.
If the language is not known, this field should be omitted of course . 

The language code is an ISO 3316 language code with an optional ISO639 country
code to specify a national variant. 

Example
+++++++

                 Content-Language: en_UK


means that the content of the message is in British English, while 

                  Content-Language: en


means that the language is English in one of its forms. (@@ If a document is in more
than one language, for example requires both Greek Latin and French to be
understood, should this be representable?)

See also: Accept-Language. 



Cost: TBS
=========

The cost of retrieving the object is given. This is the cost of access of a copyright
work, with a specification of the payment system accepted. Format to be specified.
Currently refers to an unspecified charging scheme to be agreed out of band between
parties.

Note that ISO 4217 provides a standard for desription of currency codes, but that one
can imagine payment methods which have their own currency (Club Med iI believe
uses cloured beads). 


Link:
=====

Note. It is proposed that any HTML metainformation element (allowed withing the
HEAD as opposed to BODY element of the document) be a valid candidate for an
HTTP object header. LINK is one example, TITLE another. One suggestion was that
the isomorphism should be realized by prepending "WWW-" to the HTML element
name to make the HTTP header name, and the HTML attributes imply identically
named semicolon-separated MIME-style header parameters. It is open to discussion
whether the "WWW-" should be inserted or not.

This is semantically equivalent to the LINK element in an HTML document which
should be consulted for a full explanation. 

Examples
++++++++

        WWW-Link:  href="http://info.cern.ch/a/b/c"; rel="includes"
        WWW-Link:  href="mailto:timbl@info.cern.ch"; rev=made


The first example indicates that this object includes the specified /a/b/c object. The
second indicates that the author of the object is identified by the given mail address. 



Title:
======

The title of the document. Not part of the document. Syntax is as for the Subject field
in RFC822. The Title field differs from the Subject field in that the title is defined
by the author of a document, but the Subject field is defined by the originator.

WWW note: Isomorphic with the <TITLE> element in HTML. 


Last modified: May 3, 1994
Object Contents
***************

The data (if any) sent with an HTTP request or reply is in a format and encoding defined
by the object header fields, the default being "plain/text" type with "8bit" encoding.
Note that while all the other information in the request (just as in the reply) is in ISO
Latin1 with lines delimited by Carriage Return/Line Feed pairs, the data may contain
8-bit binary data. 

Termination
+++++++++++

The delimiting of the message is determined by the Content-Length: field. If this is
present, then the message contains the specified number of bytes.

Failing that, the content-type filed may contain a "bounday" attribute which gives the
boundary string with exacly the same syntax as for a MIME multipart message.

Failing either of the above conditions, the data is terminated by the closing of the
connection by the sending party. Note that this method can not be used for data sent with
the request.

See also: note on server tolerance for back-compatibility, etc. 
HTTP Negotiation algoritm
*************************

This note defines the significance of the q, mxb and mxs values optionally sent in the 
Accept: field of the HTTP protocol request message.

It is assumed that there is a certain value of the presentation of the document,
optimally rendered using all the information available in its original source.

It is further assumed that one can allocate a number between 0 and 1 to represent the
loss of value which occurs when a document is rendered into a representation with
loss of information. Whilst this is a very subjective measurement, and in fact largely
a function of the document in question, the approximation is made that one can
define this "degradation" figure as a function of merely the representation involved.

The next assumption is that the other cost to the user of viewing the document is a
function of the time taken for presentation. We first assume that the cost is linear in
time, and then assume that the time is linear in the size of the message.

The final net value to the user can therefore be written

presented_value = initial_value * total-degradation - a - b * size

for a document in a given incoming representation. Suppose we normalize the initial
value of the document to be 1. The server may judge that the value in a particular
format is less than 1 is a conversion on the server side has lost information. The total
degradation is then the product of any degradation due to conversions internal to the
server, and the degradation "q" sent in the Accept field. If q is not sent, it defaults to
1.

The values of a and b have components from processing time on the server, network
delays, and processing time on the client. These delays are not additive as a good
system will pipeline the processing, and whilst the result may be linear in message
size, calculation of it in advance is not simple. The amount of pipelining and the
loads on machines and network are all difficult to predict, so a very rough
assumption must be made.

We make the client responsible for taking into account network delays. The client
will in fact be in a better position to do this, as the client will after one transaction be
aware of the round-trip time.

We assume that the delays imposed by the server and by the client (including
network) are additive. We assume that the client's delay is proportional to message
size.

The three parameters given by the client to the server are 

q 
   The degradation (quality) factor between 0 and 1. If omitted, 1 is assumed. 
mxb 
   The size of message (in bytes) which even if immediately available from the
   server will cause the value to the reader to become zero 
mxs 
   The delay (in seconds) which, even for a very small message with no
   length-related penalty, will cause the value to the reader to become zero. 

These parameters are chosen in part because they are easy to visualize as the largest
tolerable delay and size. If not sent, they default to infinity.

The server may optimize the presented value for the user when deciding what to
return. The hope is that fine decisions will not have to be made, as in most cases the
results for different formats will be very different, and there will be a clear winner.

A suitable algorithm is that the assumed value v of a document of initial value u
delivered to the network after a delay t whose transfer length on the net is b bytes is

v = u * q - b/mxb - t/mxs

Note that t is the time from the arrival of the request to the first byte being available
on the net. [[See also: Design issues discussions around this point.]] 
Registration Authority
**********************

The HTTP Registration Authority is responsible for maintaining lists of: 

 o Charge account name spaces (see ChargeTo: field above) 
 o Authorization schemes (see Authorization: field above) 
 o Data format names (as MIME Content-Types) 
 o Data encoding names (as MIME Content-Encoding)) 

It is proposed that the Internet Assigned Numbers Authority or their successors take this
role.

Unregistered values may be used for experimental purposes if they are start with "X-". 
Security Considerations
***********************

The HTTP protocol allows requests to communication to a remote server machine,
and all the expetant security considerations for client-server systems apply, including

 o Authentication of requests 
 o Authenticationtion of servers 
 o Privacy of request and response 
 o Abuse of server features 
 o Abuse of servers by exploiting server bugs 
 o Unwitting actions on the net 
 o Abuse of log information 

The bulk of these are well known problems, tackled in part by some featured of this
protocol. Some aspects particular to HTTP are mentioned below. 

See also the Shen Security proposal. 

Unwitting actions on the net
============================

The writers of client software should be aware that the software represents the user
in his interactions over the net, and should be careful to allow the user to be aware of
any actions he may take which may be taken as having an unexpected significance by
others. 

TCP port numbers
++++++++++++++++

Clients should prompt a user before allowing HTTP access to reserved ports other
than the port resrverd for HTTP (port 80). Otherwise, the user may unwittingly cause
a transaction to occur in some other (present or future) protocol. 

Idempotent methods
++++++++++++++++++

The convention should be established that the GET and HEAD methods never have
the significance of taking an action. The link "click here to subscribe" causing the
reading of a special "magic" document is open to abuse by others making a link
"click here to see a pretty picture". These methods should be considered "safe" and
should not have side effects. This allows the client software to represent other,
methods (such as POST) in a special way, so that the use is aware of the fact that an
action is being requested. 

Abuse of log information
========================

A server is in the position to save large amount of personal data about information
requested by different readers. This information is clearly confidential in nature, and
its handling may be constrained by law in certain countries. Server providers shall
ensure that such material is not distributed without the permission of any people or
groups of people mentioned in the results published.

A feature which increases the amount of personal data transferred is the Referer:
field. This allow reading patters to be studied, reverse links drawn, and so is very
useful. Its power can be abused of course if user details are not separated from the
Referee-Referer pairs. Even when the personal information has been removed, the
Referer field may in fact be a secure document's URI, whose revelation itself is
breach of security. A method of suppressing the Referer information in such cases
may be the subject of further study. 
Unresolved points
*****************

Summarising the points about which further work is necessary: 

 o How does the client know what data formats a server can accept? 
 o Format of the Version: field 

Tim BL 
References
**********

RFC 822 
   "Standard for ARPA Internet Text Messages". David H. Crocker, describes
   Internet mail message fromat. 
RFC850 
   "Standard for Interchange of USENET Messages" This RFC uses some field
   names in common with this specification, and is relevant reading. 
RFC977 
   "Network News Transfer Protocol", Kantor and Lampsley. 
RFC 1341 
   Multipurpose Internet Mail Extensions (MIME), Nathaniel Borenstien and
   Ned Freed, Internet RFC 1341, 1992. Now obsoleted by RFC1521 : 
RFC 1521 
   MIME. Not available in hypertext form yet. 
RFC 1522 
   K Moore. "MIME: Part Two: Message Header Extensions for Non-ASCII
   Text". 
RFC1523 
   MIME text/enriched. 
RFC 1524 
   MIME... 
URL 
   Universal Resource Locators. RFCxxx. Currently available by anonymous
   FTP from info.cern.ch as /pub/ietf/url3.{ps,txt}. 
MIME and PEM 
   Internet Draft only 

(end of references) 
Note: Server tolerance of bad clients
*************************************

Whilst it is seen appropriate for testing parsers to check full conformance to this
specification, it is recommended that operational parsers be tolerant of deviations.

In particular, lines should be regarded as terminated by the Line Feed, and the preceeding
Carriage Return character ignored.

Any HTTP Header Field Name which is not recognised should be ignored in operational
parsers.

It is recommended that servers use URIs free of "variant" characters whose
representation differs in some of the national variant character sets, punctuation
characters, and spaces. This will make URIs easier to handle by humans when the need
(such as debugging, or transmission through non hypertext systems) arises. 
Note: Client tolerance of bad servers
*************************************

Servers not implementing the specification as written are not HTTP compiant. Servers
should always be made completely copmpliant. However, clients should also tolerate
deviant servers where possible. 

Back compatibility
==================

In order that clients using the HTTP protocol should be able to communicate with
servers using the protocol originally implemented in the W3 data model, clients should
tolerate responses which do not start with a numeric version number and response codes.

In this case, they should assume that the rest of the response is a document body in type
text/html. 

White space
===========

Clients should be tolerant in parsing response status lines, in particular they should
accept any sequence of white space (SP and TAB) characters between fields.

Lines should be regarded as terminated by the Line Feed, and the preceeding Carriage
Return character ignored. 
