Name-based Virtual Hosting in TCP

Hubert Chao	Brian Kowolowski
Jed Liu	Jeffrey M. Vinocur

Introduction

Our project adds name-based virtual hosting support to the Transmission Control Protocol [TCP]. Hostname information is sent at the beginning of each connection, similar in spirit to virtual hosting on the web [HTTP/1.1] but at a lower level.

Motivation and goals

The modification we describe addresses three problems:

(1) Very few application-level protocols support name-based virtual hosting; there simply was no need for it when most of these protocols were designed. In the absence of protocol support, the only information available to the server about the ``desired'' resource is the IP address and port number the client requested. A server application can certainly make use of the IP address in handling a request; this is IP-based virtual hosting. But IP-based virtual hosting is often a poor solution, for reasons we discuss below.
(2) Although some application-level protocols, notably HTTP, support name-based virtual hosting, for some protocols this is simply impossible. One of these is [HTTPS]: the protocol requires the server to present a ``server certificate'' before any in-band data, including the virtual hosting information, is transmitted. A solution to this has been presented in [TLS-HTTP], but it requires significant changes to the deployed software base and is therefore unlikely to be adopted easily.
(3) The success of ``Outbound'' Network Address Translation [NAT] (using a single globally unique IP address for an entire collection of machines on an internal network) relies on the fact that clients are generally content with only outgoing TCP connections. However, this is sometimes inadequate. For example, the ``Sidecar'' authentication service here at Cornell [TECH-ARCH] requires the server to make an out-of-band query to the client (similar to the traditional Unix Identification Protocol [IDENT]). This is virtually impossible in the presence of traditional NAT.

We have designed the modifications to TCP necessary to ``cure'' all three of these problems and implemented them in the Linux 2.4 kernel. As a proof of concept, we also present the minimal modifications to several applications needed to make use of these changes. We did not investigate the changes to the kernel's routing functionality required to implement point (3) above.

Problems with IP-based virtual hosting

In principle, IP-based virtual hosting might be sufficient for point (1) above; simply have one IP address for each virtual host. But in practice, this is inadequate:

Almost all server applications support binding to a single, specific IP address, so one can simply run multiple copies of the server application, each listening on a different IP address. But this incurs substantial overhead, since each virtual host gets its own process on the server machine.
The above can be overcome within the bounds of standard TCP/IP by writing a server application that listens on multiple IP addresses and uses the local address of each connection to decide how to handle the request. (This is IP-based virtual hosting in the strictest sense, as only one server process is required.) This has been done for a number of application protocols; server applications with this feature exist for FTP, IRC, NNTP, and POP.
But this solution does not scale particularly well either: it still requires an IP address for each virtual host, and the number of virtual hosts desired on a given machine can be quite large. (For example, a high-end web server might host hundreds of thousands of domains [ZEUS].) Even if the site can acquire enough IP addresses, it is hard to imagine an operating system handling that many addresses bound to a single interface.
(In fact, there might be no limit to the number of virtual hosts desired; imagine the administrator of some example.tld wanting requests for any host in the example.tld domain to be handled dynamically, generating content depending on the hostname. This is not even possible using IP-based virtual hosting.)
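The single-process form of IP-based virtual hosting described above can be sketched in C with the standard getsockname(2) call: a server listening on the wildcard address asks the kernel which local address each accepted connection arrived on, and uses that address as its only hint about the desired host. This is a minimal sketch rather than code from any of the servers mentioned; the address table, the documentation-range addresses, and the names vhost_for_addr and vhost_for_conn are hypothetical, and only IPv4 is handled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical table: one local IPv4 address per virtual host. */
struct ip_vhost {
    const char *addr;   /* dotted-quad address the server accepts on */
    const char *name;   /* virtual host served at that address */
};

static const struct ip_vhost vhost_table[] = {
    { "192.0.2.10", "www.example.com" },
    { "192.0.2.11", "www.example.org" },
};

/* Return the virtual host configured for a local address, or NULL. */
const char *vhost_for_addr(const struct in_addr *local)
{
    char buf[INET_ADDRSTRLEN];

    if (inet_ntop(AF_INET, local, buf, sizeof buf) == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof vhost_table / sizeof vhost_table[0]; i++)
        if (strcmp(buf, vhost_table[i].addr) == 0)
            return vhost_table[i].name;
    return NULL;
}

/* For a connection returned by accept(2) on a socket bound to INADDR_ANY,
 * getsockname(2) reports the specific local address the client connected
 * to; that address is all standard TCP/IP tells us about the target host. */
const char *vhost_for_conn(int connfd)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;

    if (getsockname(connfd, (struct sockaddr *)&local, &len) < 0 ||
        local.sin_family != AF_INET)
        return NULL;
    return vhost_for_addr(&local.sin_addr);
}
```

Every entry in such a table consumes one address, which is exactly the scaling limit discussed above.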

Certainly name-based virtual hosting is, in principle, far cleaner and more useful than IP-based virtual hosting. But the domain name used in the request is not available to the server from TCP/IP, so name-based virtual hosting requires protocol support, and it is too late to add such support to every application protocol. The solution is a method that requires only software changes, with no modifications to the application-level protocols. In addition, putting support in TCP reduces the overall duplication of design and code.

Design and Implementation

Design decisions

Ideally, every IP packet could carry the name of the host it is directed to. But in practice, the overhead of that extra data (two hostnames of up to 255 bytes each, or roughly 520 bytes with framing) is far too much to add to every packet. In TCP, however, the name of the remote machine is naturally a property of the connection, so it need be sent only once, when the connection is established. Thus we can place the functionality in TCP, negotiated at the beginning of each connection, and get many of the benefits with virtually no overhead.
Because a pair of hostnames is potentially quite a bit of data, we might consider alternatives that reduce the size of the SYN segment. One reasonable approach would be to include only a hash of each hostname; even a cryptographically strong hash needs only 16 bytes per hostname. But this eliminates a number of the interesting things that can be done with TCP-level name-based virtual hosting; for example, a server cannot recover an arbitrary hostname from its hash, which rules out the dynamic handling of any host in a domain described earlier. And since very few links today have MTU restrictions that would cause problems for a single packet of this size, we consider sending the full hostnames worthwhile.
Because the maximum length of a TCP header is 60 bytes (of which at most 40 are available for options), and hostnames may be up to 255 bytes long, we needed to store the hostnames in the data section. The header carries only a TCP option stating that our data is present and how long it is. Non-compliant TCP implementations can immediately determine that the option is unrecognized and skip it, while compliant implementations use it to extract the data.
We chose to add the data to the initial SYN because the option is directly related to establishing a connection with the remote host.
We also added a checksum for our data. It is not clear whether non-compliant TCP implementations will checksum the entire SYN segment even though they ignore everything except the headers. We therefore need to generate SYN segments whose checksum is correct whether or not the data is included, in order to ensure backward compatibility. We compute the ones-complement sum of the data section and store its complement in the data section itself; the data section as a whole then sums to negative zero (0xFFFF) and contributes nothing to a segment-wide checksum, so the TCP checksum matches in either case. Additionally, this technique protects the integrity of the hostname data: on systems that checksum only the headers of the SYN segment, the HOSTS implementation can verify the checksum internally; on implementations that include the data portion in the TCP checksum, the hostname data is verified automatically (if the data is corrupt, our checksum will not ``cancel'' the data, and the TCP checksum of the entire segment will not match).
Thus we have added our host request to the data portion of the SYN segment while maintaining complete backward compatibility with existing IPv4 implementations.
Although hostnames are, in practice, restricted to a sane character set, we chose to design our extension to treat hostnames as raw data (using explicit lengths instead of null-termination to mark the end of the data). This allows arbitrary binary data to be passed using the HOSTS option. The reason for this comes from the standard:
However, future additions beyond current usage may need to use the full binary octet capabilities in names, so attempts to store domain names in 7-bit ASCII or use of special bytes to terminate labels, etc., should be avoided. [DNS]

We take their advice and avoid null-terminated strings.
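The checksum arrangement described above can be checked with a short C sketch. It assumes the option checksum sits at the very start of the data section (offset zero) and uses the standard ones-complement sum of 16-bit big-endian words, padding an odd final byte with zero as the TCP checksum does; ones_sum and hosts_set_checksum are hypothetical names, not functions from our kernel patch. Storing the complement of the section's sum makes the whole section sum to 0xFFFF (negative zero), so adding it to any nonzero partial sum leaves that sum unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words, added to acc and folded
 * to 16 bits (not complemented). An odd trailing byte is padded with a zero
 * byte, as the TCP checksum pads the end of a segment. */
uint16_t ones_sum(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        acc += (uint32_t)p[0] << 8;
    while (acc >> 16)                       /* end-around carry */
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* Fill in the 2-byte option checksum at the start of a HOSTS data section
 * (offset zero assumed) so that the whole section sums to 0xffff. */
void hosts_set_checksum(uint8_t *data, size_t len)
{
    uint16_t c;

    data[0] = 0;
    data[1] = 0;
    c = (uint16_t)~ones_sum(data, len, 0);
    data[0] = (uint8_t)(c >> 8);            /* network byte order */
    data[1] = (uint8_t)(c & 0xff);
}
```

With the checksum in place, ones_sum over the section alone yields 0xFFFF, and continuing a header's partial sum across the section returns the header's sum unchanged, so header-only and whole-segment checksums agree; a single-bit error anywhere in the section moves its sum away from 0xFFFF.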

Protocol modifications

Endpoint naming semantics

The goal is to pass two DNS hostnames in the SYN segment that initiates a TCP connection. We call them the sender name and receiver name, which in usual TCP fashion correspond to the local endpoint and remote endpoint, respectively. The data is always stored in the segment from the sender's point of view; this means that the receiving TCP stack must swap the fields to get the receiver's point of view.

TCP header changes

The TCP specification includes an extension mechanism, using header options. At the time of writing, option kinds up to 26 decimal had been assigned [IANA]; we unofficially chose 42 decimal for the HOSTS option. Our option is 6 bytes long: the 8-bit TCP kind field, the 8-bit TCP length field, a 16-bit field giving the offset of the HOSTS data (see below) within the data section, and two 8-bit fields giving the lengths of the receiver name and sender name, respectively.

    +--------+--------+---------+--------+--------+--------+
    |00101010|00000110|      offset      | rcvlen | sndlen |
    +--------+--------+---------+--------+--------+--------+
     Kind=42  Length=6

As there are currently no other options which involve putting data in a SYN segment, the offset will likely always be zero. However, implementations should handle the offset in the event it becomes useful.
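A minimal C sketch of encoding the 6-byte HOSTS option and of finding it while walking a SYN's options area. The constant names, struct hosts_opt, and the helper names are hypothetical. Kinds 0 (End of Option List) and 1 (No-Operation) are the standard single-byte options; every other option is skipped using its length byte, which is how a stack that does not recognize kind 42 passes over it.

```c
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_HOSTS   42   /* kind chosen unofficially for HOSTS */
#define TCPOLEN_HOSTS   6   /* kind + length + offset + rcvlen + sndlen */

struct hosts_opt {
    uint16_t offset;   /* start of HOSTS data within the data section */
    uint8_t  rcvlen;   /* receiver hostname length */
    uint8_t  sndlen;   /* sender hostname length */
};

/* Write the option in wire format; offset goes out in network byte order. */
void hosts_opt_encode(uint8_t buf[TCPOLEN_HOSTS], const struct hosts_opt *o)
{
    buf[0] = TCPOPT_HOSTS;
    buf[1] = TCPOLEN_HOSTS;
    buf[2] = (uint8_t)(o->offset >> 8);
    buf[3] = (uint8_t)(o->offset & 0xff);
    buf[4] = o->rcvlen;
    buf[5] = o->sndlen;
}

/* Scan a TCP options area of len bytes. Returns 1 and fills *o if a
 * well-formed HOSTS option is present, 0 otherwise. */
int hosts_opt_find(const uint8_t *opts, size_t len, struct hosts_opt *o)
{
    size_t i = 0;

    while (i < len) {
        uint8_t kind = opts[i];

        if (kind == 0)                  /* End of Option List */
            break;
        if (kind == 1) {                /* No-Operation: single byte */
            i++;
            continue;
        }
        /* Every other option has a length byte covering kind and length. */
        if (i + 1 >= len || opts[i + 1] < 2 || i + opts[i + 1] > len)
            break;                      /* malformed options area */
        if (kind == TCPOPT_HOSTS && opts[i + 1] == TCPOLEN_HOSTS) {
            o->offset = (uint16_t)(opts[i + 2] << 8 | opts[i + 3]);
            o->rcvlen = opts[i + 4];
            o->sndlen = opts[i + 5];
            return 1;
        }
        i += opts[i + 1];               /* skip an option we do not handle */
    }
    return 0;
}
```

For example, a SYN carrying an MSS option, two NOPs, and the HOSTS option fills exactly 12 option bytes, keeping the header a multiple of 4 bytes.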

TCP data changes (SYN segment only)

If the HOSTS option is present in the TCP header of a SYN segment, the data section of the segment should contain at least 2 + offset + rcvlen + sndlen bytes. If there is less data than that, the segment is malformed. An implementation may discard such a segment, but is permitted (and encouraged) instead to accept it and treat it as though the HOSTS option were absent. Beginning offset bytes into the data section, there should be a 16-bit checksum (in the usual TCP ones-complement fashion), followed by the two variable-length fields for the receiver and sender hostnames, respectively (their lengths are found in the TCP header, as described above).

    +--------+--------+---          ---+---          ---+
    | option checksum | ...rcv host... | ...snd host... |
    +--------+--------+---          ---+---          ---+

The checksum is, of course, stored in network byte order.
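Locating the fields in a received data section follows directly from the layout above. In this hypothetical C sketch, hosts_data_parse applies the minimum-length rule and returns pointers into the segment rather than copies; verifying the option checksum is deliberately left out, and a zero return means the caller should treat the HOSTS option as absent.

```c
#include <stddef.h>
#include <stdint.h>

/* Views into a received SYN data section; names are not '\0'-terminated. */
struct hosts_data {
    uint16_t       checksum;   /* option checksum, converted to host order */
    const uint8_t *rcv_host;
    const uint8_t *snd_host;
    uint8_t        rcvlen;
    uint8_t        sndlen;
};

/* offset, rcvlen and sndlen come from the HOSTS option in the header.
 * Returns 1 on success; returns 0 if the data section is shorter than
 * 2 + offset + rcvlen + sndlen bytes, in which case the caller should
 * treat the segment as if it carried no HOSTS option. */
int hosts_data_parse(const uint8_t *data, size_t data_len, uint16_t offset,
                     uint8_t rcvlen, uint8_t sndlen, struct hosts_data *out)
{
    size_t need = 2 + (size_t)offset + rcvlen + sndlen;
    const uint8_t *p;

    if (data_len < need)
        return 0;
    p = data + offset;
    out->checksum = (uint16_t)(p[0] << 8 | p[1]);   /* network byte order */
    out->rcv_host = p + 2;
    out->snd_host = p + 2 + rcvlen;
    out->rcvlen = rcvlen;
    out->sndlen = sndlen;
    return 1;
}
```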

Networking API modifications

The host_info struct

Central to our API modification is the host_info struct:

    struct host_info {
       __u8 rcv_host[TCP_MAX_HOST_LEN + 1];
       __u8 rcv_host_len;
       __u8 snd_host[TCP_MAX_HOST_LEN + 1];
       __u8 snd_host_len;
    };

This defines the basic data structure that an API programmer would use to interface with our TCP option.

setsockopt

The setsockopt(2) API function is used to set the sender and receiver hostnames for a socket. At present, this is how the client indicates, before the call to connect(2), which hostname it used to obtain the server's IP address. It might also be useful in the future on the server, before the call to bind(2), as described under Potential problems and suggestions for future work below. Usage looks like:

    struct host_info hosti;
    socklen_t optlen = sizeof(struct host_info);
    /* initialize fields of "hosti" here, including length fields */
    if (setsockopt(sockfd, SOL_TCP, TCP_HOSTS, &hosti, optlen) < 0) {
        /* an error occurred */
    }

The kernel may also set the sender and receiver hostnames for a socket without the application calling setsockopt(2), for example if an incoming SYN segment includes the HOSTS option.

Calls to setsockopt(2) for the TCP_HOSTS option are not useful after the connection has been initiated. The only possible error (other than the normal setsockopt(2) errors) is EINVAL, indicating that the optlen passed in was not acceptable.
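The setsockopt(2) example leaves the initialization of hosti open. One way to fill it in is the hypothetical helper below, which stores each name with an explicit length. It assumes TCP_MAX_HOST_LEN is 255 (matching the 8-bit length fields) and uses uint8_t in place of the kernel's __u8; neither detail is taken from our header file.

```c
#include <stdint.h>
#include <string.h>

#ifndef TCP_MAX_HOST_LEN
#define TCP_MAX_HOST_LEN 255   /* assumed; the length fields are 8 bits */
#endif

/* Layout of host_info from the text, with uint8_t standing in for __u8. */
struct host_info {
    uint8_t rcv_host[TCP_MAX_HOST_LEN + 1];
    uint8_t rcv_host_len;
    uint8_t snd_host[TCP_MAX_HOST_LEN + 1];
    uint8_t snd_host_len;
};

/* Hypothetical helper: zero hosti, then copy in the receiver name (the name
 * the client resolved) and an optional sender name, recording explicit
 * lengths. Either name may be NULL. Returns 0, or -1 if a name is too long
 * (hosti is left untouched in that case). */
int host_info_init(struct host_info *hosti, const char *rcv, const char *snd)
{
    size_t rlen = rcv ? strlen(rcv) : 0;
    size_t slen = snd ? strlen(snd) : 0;

    if (rlen > TCP_MAX_HOST_LEN || slen > TCP_MAX_HOST_LEN)
        return -1;
    memset(hosti, 0, sizeof *hosti);
    if (rlen > 0)
        memcpy(hosti->rcv_host, rcv, rlen);
    if (slen > 0)
        memcpy(hosti->snd_host, snd, slen);
    hosti->rcv_host_len = (uint8_t)rlen;
    hosti->snd_host_len = (uint8_t)slen;
    return 0;
}
```

A client that only needs to announce the name it looked up, as the modified wget described below does, would call host_info_init(&hosti, name, NULL) before setsockopt(2) and connect(2).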

getsockopt

The getsockopt(2) API function is used to recover the current sender and receiver hostnames for a socket. This is how the server determines what hostnames, if any, the client specified when initiating the connection. Usage looks like:

    struct host_info hosti;
    socklen_t optlen = sizeof(struct host_info);
    if (getsockopt(sockfd, SOL_TCP, TCP_HOSTS, &hosti, &optlen) < 0) {
        /* an error occurred */
    }
    /* use fields of "hosti" here */

The caller should be warned that because of the potential for binary data (see the discussion in Design decisions), the hostname strings are not guaranteed to be terminated by a '\0' character.
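Because the names are length-delimited, a server should compare them using the returned length rather than with string functions that expect a '\0'. A minimal sketch with the hypothetical helper hostname_equals; it ignores case for ASCII letters, as DNS name comparison does.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare a length-delimited hostname, such as rcv_host/rcv_host_len from
 * getsockopt(2), against a configured name. In the default C locale only
 * ASCII letters are case-folded. Reads exactly len bytes of name and never
 * relies on a terminating '\0' there. */
int hostname_equals(const unsigned char *name, size_t len, const char *want)
{
    if (strlen(want) != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower(name[i]) != tolower((unsigned char)want[i]))
            return 0;
    return 1;
}
```

The same discipline applies to output: printf("%.*s\n", (int)hosti.rcv_host_len, (const char *)hosti.rcv_host) prints the name without needing a terminator.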

The length returned in the final argument to getsockopt(2) is the length of the first string in the host_info struct (that is, the value of the rcv_host_len field), provided that the input length was sufficient to store at least that string. This means that if the server is only interested in the receiver hostname field, the following idiom is possible:

    char hostname[TCP_MAX_HOST_LEN];
    socklen_t optlen = sizeof(hostname);
    if (getsockopt(sockfd, SOL_TCP, TCP_HOSTS, hostname, &optlen) < 0) {
        /* an error occurred */
    }
    /* the first "optlen" bytes of "hostname" hold the receiver hostname;
       they are not '\0'-terminated */

The only possible error (other than the normal getsockopt(2) errors) is EOPNOTSUPP, which indicates that no hostname information is currently available for this socket (for example, because the client did not send the HOSTS option). Note that getsockopt(2) fails with EOPNOTSUPP before examining the optval parameter; thus passing NULL for optval lets the caller determine whether hostname information is available without allocating any storage.

host_info.h

To allow applications to be compiled on systems that lack our changes to the system header files, we provide a stub header file containing the constants and struct definition needed to compile on a non-compliant system. Code compiled this way should, in general, run whether or not the system is compliant; calls to setsockopt(2) and getsockopt(2) fail with ENOPROTOOPT if the TCP_HOSTS option is not supported.
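The error conventions above leave an application with a few outcomes to distinguish after calling getsockopt(2). The hypothetical helper hosts_status_from maps the call's return value and errno onto them; it is a sketch of how an application might organize its fallback, not part of the interface we provide.

```c
#include <errno.h>

enum hosts_status {
    HOSTS_PRESENT,      /* hostname data was returned */
    HOSTS_ABSENT,       /* supported, but the peer sent no HOSTS option */
    HOSTS_UNSUPPORTED,  /* this kernel does not know TCP_HOSTS */
    HOSTS_ERROR         /* any other getsockopt(2) failure */
};

/* rc is getsockopt's return value; err is errno read immediately after. */
enum hosts_status hosts_status_from(int rc, int err)
{
    if (rc == 0)
        return HOSTS_PRESENT;
    switch (err) {
    case EOPNOTSUPP:
        return HOSTS_ABSENT;
    case ENOPROTOOPT:
        return HOSTS_UNSUPPORTED;
    default:
        return HOSTS_ERROR;
    }
}
```

In both the HOSTS_ABSENT and HOSTS_UNSUPPORTED cases the application would simply fall back to its ordinary behavior, so one binary serves compliant and non-compliant systems alike.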

Kernel modifications

The modifications necessary to the Linux 2.4 kernel fall into several categories:

Storing hostname data for each socket

Since we only associate hostname data with TCP sockets, the appropriate place to store it is in the tcp_opt struct associated with each socket. This data structure is used to keep track of a variety of TCP features which can be enabled or disabled in certain circumstances.

Generating outgoing SYN segments

There are several modifications necessary to the generation of outgoing SYN segments. The TCP header must include the HOSTS option (if the client application has supplied hostname data), and the SYN segment must carry a data section (a SYN normally carries no data) with the appropriate checksum calculated.

Handling incoming SYN segments

When a SYN segment arrives, the TCP stack parses its options. The parser must be modified to recognize the HOSTS option and, if it is present, to extract the hostnames from the data section.

Handling RST responses to SYN segments

When a normal SYN is sent to a host that is not listening on the specified port, the RST that comes back to the originating machine has an acknowledgment number one higher than the sequence number of the SYN. If the SYN is one of our enhanced SYNs, however, the additional data changes this figure: the RST acknowledges one plus the number of bytes in the data portion of the SYN. By default, the kernel ignores inbound segments that do not appear to belong to a connection, and a RST whose acknowledgment number is 20 or 30 higher than expected does not appear to belong. We therefore relax the check to also accept an incoming RST whose acknowledgment number is higher by exactly the amount of data sent in the SYN. This allows a compliant implementation to detect the reset rather than waiting for the connection attempt to time out.
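The relaxed acceptance rule can be expressed as a small predicate. This is a hypothetical C sketch of the rule as described, not the code from our kernel patch; iss stands for the sequence number carried in the SYN and syn_data_len for the number of data bytes the SYN carried. Unsigned 32-bit arithmetic provides the modulo 2^32 wraparound that TCP sequence numbers require.

```c
#include <stdint.h>

/* Decide whether a RST received while our SYN is outstanding acknowledges
 * that SYN. iss: sequence number of the SYN. syn_data_len: HOSTS bytes
 * (checksum plus names) carried in the SYN's data section. */
int rst_acks_syn(uint32_t ack, uint32_t iss, uint32_t syn_data_len)
{
    /* Ordinary case: the RST acknowledges only the SYN itself. */
    if (ack == iss + 1)
        return 1;
    /* Enhanced SYN: the RST also acknowledges the data we sent. */
    if (syn_data_len != 0 && ack == iss + 1 + syn_data_len)
        return 1;
    return 0;
}
```

Accepting only these two exact values, rather than any acknowledgment in a range, keeps the check as strict as the original one.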

Extending the sockets API

Since we are only adding a TCP option, the only modifications required to the sockets API are to the setsockopt(2) and getsockopt(2) functions, which were extended to handle our new option. See Networking API modifications for details.

Extending the sysctl API

We extended the sysctl interface with a variable, net.ipv4.tcp_hosts, that controls the TCP Hosts functionality. The functionality is enabled by default; the current setting can easily be examined, and the functionality enabled and disabled, with the following commands, respectively:

    % sysctl net.ipv4.tcp_hosts

    # sysctl -w net.ipv4.tcp_hosts=1

    # sysctl -w net.ipv4.tcp_hosts=0
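A program that wants to know whether the feature is switched on can read the same setting through procfs, where a sysctl name maps to a path under /proc/sys by replacing dots with slashes. The path /proc/sys/net/ipv4/tcp_hosts would therefore exist only on a kernel carrying these changes. read_sysctl_int is a hypothetical helper, and treating a missing file as "unsupported" is an assumption of this sketch.

```c
#include <stdio.h>

/* Read an integer sysctl through procfs, e.g. "/proc/sys/net/ipv4/tcp_hosts".
 * Returns the value, or -1 if the file is missing (such as on an unmodified
 * kernel) or does not contain an integer. */
int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```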

Application examples

Trivial telnet-like client and server: tclient and tserver

The first thing we did to test our TCP option was to implement a trivial client and server, tclient and tserver. tclient is a simple telnet-like client: it reads from stdin and sends the input to the server. tserver is a telnet-like server: it waits for a connection and writes any data received from the socket to stdout.

An HTTP server: thttpd

To demonstrate the utility of our option, we modified an HTTP server to use the information carried by the option. We chose the thttpd web server [THTTPD], which supports virtual hosting and is relatively simple; we preferred it to Apache because Apache has almost ten times as many lines of code.

Modifying the code to support our option consisted of about 10 lines of changes. This involved getting the receive host information out of the option, adding a field to pass the hostname along to where it is needed, and then using the receive host information. We first check the option's receive host for a virtual host. If the option is not present, then we fall back on the default thttpd behavior.

An HTTP and FTP client: wget

To test the modified HTTP server, we needed a modified HTTP client. Modifying a full web browser such as Mozilla or Internet Explorer was out of the question, so we opted to patch wget.

The modification involved fewer than 10 lines of changes: initializing the receiver host information in the host_info struct with the hostname of the machine being contacted, and setting the socket option before connecting to the server.

Conclusions

Results

Both endpoints compliant

If both endpoints (application and TCP stack) are compliant with this extension, it works exactly as intended.

To test this, we brought up a thttpd server within a user-mode Linux kernel that supported our extension. This server had virtual hosts bound to the names foobaar, 192.168.20.20, 127.0.0.1, and localhost. Each virtual host had an /index.html file announcing the virtual host on which the file was located.

On a separate machine running a copy of our kernel, we used wget to contact the server at 192.168.20.20. As expected, the page returned indicated that it was being served by the appropriate virtual host. To test the other virtual hosts, we used tclient, which allowed us to specify a value for the rcv_host field in the outgoing connection.

Noncompliant applications and/or TCP stacks

There is no problem, in principle, if an old application is run on a system with an updated TCP stack. Similarly, there is no problem if an updated application is run on a system with a traditional TCP stack. In either case, no outgoing SYN segments contain the HOSTS option, and that option is ignored on incoming SYN segments.

We have tested a variety of interactions. We can connect with ssh from a non-compliant system to a compliant one, and we can ssh from a compliant system to a non-compliant one. We can connect to a compliant system running our modified thttpd server from a non-compliant system, from a compliant system with a non-compliant application, and from a compliant system with a compliant application, such as our modified wget.

Potential problems and suggestions for future work

One enhancement that would be quite useful is extending the semantics of bind(2) to allow each endpoint to be specified as a triple of address, port, and hostname, instead of the current pair of address and port. Servers could then do virtual hosting simply by binding to each hostname they are to serve, which would require only minimal changes to already deployed server applications. It would also allow the kernel to decide to reject a connection without the overhead of switching context to the server process.
Adding routing support based on this TCP option would increase its usefulness: NAT servers could then forward inbound connections to the desired machines behind the NAT. This is a far more graceful approach to address sharing than the current ``port forwarding'' technique. Most NAT implementations can forward a given port to only one machine, so reaching several internal machines requires a different port for each; this does not scale well, and it works only for protocols whose client applications let the user specify a port number.
It is possible that on certain links, with a small MTU and no possibility of IP fragmentation, this modification would prevent a SYN segment from reaching its destination. A reasonable procedure for dealing with this would be first to drop one of the hostnames from the segment (perhaps the server can make use of part of the information), and then to drop the option entirely. This requires making the TCP retransmission mechanism aware of these changes, but is certainly possible.
Though the need is much smaller with IPv6, the option could bring some benefits there as well. Conveniently, the IPv6 extension header mechanism would allow a cleaner modification than is possible for IPv4. There is generally no need for NAT under IPv6, but consider the following scenario. Suppose Amazon.com decided to use host names such as mybook.amazon.com, one per book. Its name server would need to answer queries for any name in its domain, and all of the <book>.amazon.com names would need to resolve to the address of its web server, which would then generate dynamic content based on the book requested. While the HTTP Host header is sufficient to tell the web server what content to generate, our TCP option would be needed for HTTPS, because the SSL certificate must be presented before the headers are sent, and the server does not know which virtual host is to receive the request until it sees the headers.
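The SYN-size fallback suggested above (drop one hostname, then the option) could be driven by the SYN retransmission count, as in this hypothetical sketch. The text does not say which hostname to drop first; the sketch assumes the sender name goes first, since the receiver name is what a virtual-hosting server uses, and the attempt thresholds are likewise assumptions.

```c
#include <stddef.h>

/* What an outgoing SYN carries on a given transmission attempt. */
enum hosts_payload {
    HOSTS_BOTH,       /* option plus receiver and sender names */
    HOSTS_RCV_ONLY,   /* option with sndlen = 0 */
    HOSTS_NONE        /* ordinary SYN: no option, no data */
};

/* Hypothetical policy; attempt 0 is the first transmission. */
enum hosts_payload hosts_payload_for_attempt(unsigned attempt)
{
    if (attempt == 0)
        return HOSTS_BOTH;
    if (attempt == 1)
        return HOSTS_RCV_ONLY;
    return HOSTS_NONE;
}

/* Bytes of SYN data needed for a payload: the 2-byte option checksum plus
 * the hostnames it carries (offset assumed to be zero). */
size_t hosts_data_len(enum hosts_payload p, size_t rcvlen, size_t sndlen)
{
    switch (p) {
    case HOSTS_BOTH:
        return 2 + rcvlen + sndlen;
    case HOSTS_RCV_ONLY:
        return 2 + rcvlen;
    default:
        return 0;
    }
}
```

Whatever policy is used, the RST check must use the data length of the SYN actually transmitted on that attempt.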

References

[DNS]

Mockapetris, P. ``Domain Names - Implementation and Specification'', RFC 1035, November 1987.

[HTTP/1.1]

Fielding, R., et al. ``Hypertext Transfer Protocol -- HTTP/1.1'', RFC 2616, June 1999.

[IANA]

Internet Assigned Numbers Authority. ``TCP Option Numbers'', May 2001.

    http://www.iana.org/assignments/tcp-parameters

[IDENT]

St. Johns, M. ``Identification Protocol'', RFC 1413, February 1993.

[NAT]

Srisuresh, P. and M. Holdrege. ``IP Network Address Translator (NAT) Terminology and Considerations'', RFC 2663, August 1999.

[TCP]

Postel, J. ``Transmission Control Protocol'', RFC 793, September 1981.

[TECH-ARCH]

Cornell Information Technologies. ``Cornell University Technical Architecture'', draft, December 1989.

    http://solutions.cit.cornell.edu/doc/TechnicalArchitecture.pdf

[THTTPD]

Poskanzer, Jef. ``Tiny/Turbo/Throttling HTTP server''

    http://www.acme.com/software/thttpd/

[TLS-HTTP]

Khare, R., and S. Lawrence. ``Upgrading to TLS Within HTTP/1.1'', RFC 2817, May 2000.

[ZEUS]

Zeus Technology. ``Hosting Multiple Web Sites on a Single Server Machine'', 2002.

    http://www.zeus.com/library/articles/hosting.html