httpd,Apache 的 HTTP 服务器

The Apache HTTP Server is a “heavy duty” network server that Subversion can leverage. Via a custom module, httpd makes Subversion repositories available to clients via the WebDAV/DeltaV protocol, which is an extension to HTTP 1.1 (see http://www.webdav.org/ for more information). This protocol takes the ubiquitous HTTP protocol that is the core of the World Wide Web, and adds writing—specifically, versioned writing—capabilities. The result is a standardized, robust system that is conveniently packaged as part of the Apache 2.0 software, is supported by numerous operating systems and third-party products, and doesn't require network administrators to open up yet another custom port. [41] While an Apache-Subversion server has more features than svnserve, it's also a bit more difficult to set up. With flexibility often comes more complexity.

Much of the following discussion includes references to Apache configuration directives. While some examples are given of the use of these directives, describing them in full is outside the scope of this chapter. The Apache team maintains excellent documentation, publicly available on their website at http://httpd.apache.org. For example, a general reference for the configuration directives is located at http://httpd.apache.org/docs-2.0/mod/directives.html.

Also, as you make changes to your Apache setup, it is likely that somewhere along the way a mistake will be made. If you are not already familiar with Apache's logging subsystem, you should become aware of it. In your httpd.conf file are directives that specify the on-disk locations of the access and error logs generated by Apache (the CustomLog and ErrorLog directives, respectively). Subversion's mod_dav_svn uses Apache's error logging interface as well. You can always browse the contents of those files for information that might reveal the source of a problem that is not clearly noticeable otherwise.

先决条件

To network your repository over HTTP, you basically need four components, available in two packages. You'll need Apache httpd 2.0, the mod_dav DAV module that comes with it, Subversion, and the mod_dav_svn filesystem provider module distributed with Subversion. Once you have all of those components, the process of networking your repository is as simple as:

  • 配置好httpd 2.0,并且使用mod_dav启动,

  • 为mod_dav安装mod_dav_svn插件,它会使用Subversion的库访问版本库,并且

  • configuring your httpd.conf file to export (or expose) the repository.

You can accomplish the first two items either by compiling httpd and Subversion from source code, or by installing pre-built binary packages of them on your system. For the most up-to-date information on how to compile Subversion for use with the Apache HTTP Server, as well as how to compile and configure Apache itself for this purpose, see the INSTALL file in the top level of the Subversion source code tree.

基本的 Apache 配置

Once you have all the necessary components installed on your system, all that remains is the configuration of Apache via its httpd.conf file. Instruct Apache to load the mod_dav_svn module using the LoadModule directive. This directive must precede any other Subversion-related configuration items. If your Apache was installed using the default layout, your mod_dav_svn module should have been installed in the modules subdirectory of the Apache install location (often /usr/local/apache2). The LoadModule directive has a simple syntax, mapping a named module to the location of a shared library on disk:

LoadModule dav_svn_module     modules/mod_dav_svn.so

Note that if mod_dav was compiled as a shared object (instead of statically linked directly to the httpd binary), you'll need a similar LoadModule statement for it, too. Be sure that it comes before the mod_dav_svn line:

LoadModule dav_module         modules/mod_dav.so
LoadModule dav_svn_module     modules/mod_dav_svn.so

At a later location in your configuration file, you now need to tell Apache where you keep your Subversion repository (or repositories). The Location directive has an XML-like notation, starting with an opening tag, and ending with a closing tag, with various other configuration directives in the middle. The purpose of the Location directive is to instruct Apache to do something special when handling requests that are directed at a given URL or one of its children. In the case of Subversion, you want Apache to simply hand off support for URLs that point at versioned resources to the DAV layer. You can instruct Apache to delegate the handling of all URLs whose path portions (the part of the URL that follows the server's name and the optional port number) begin with /repos/ to a DAV provider whose repository is located at /var/svn/repository using the following httpd.conf syntax:

<Location /repos>
  DAV svn
  SVNPath /var/svn/repository
</Location>

If you plan to support multiple Subversion repositories that will reside in the same parent directory on your local disk, you can use an alternative directive, the SVNParentPath directive, to indicate that common parent directory. For example, if you know you will be creating multiple Subversion repositories in a directory /var/svn that would be accessed via URLs like http://my.server.com/svn/repos1, http://my.server.com/svn/repos2, and so on, you could use the httpd.conf configuration syntax in the following example:

<Location /svn>
  DAV svn

  # any "/svn/foo" URL will map to a repository /var/svn/foo
  SVNParentPath /var/svn
</Location>

Using the previous syntax, Apache will delegate the handling of all URLs whose path portions begin with /svn/ to the Subversion DAV provider, which will then assume that any items in the directory specified by the SVNParentPath directive are actually Subversion repositories. This is a particularly convenient syntax in that, unlike the use of the SVNPath directive, you don't have to restart Apache in order to create and network new repositories.

Be sure that when you define your new Location, it doesn't overlap with other exported Locations. For example, if your main DocumentRoot is exported to /www, do not export a Subversion repository in <Location /www/repos>. If a request comes in for the URI /www/repos/foo.c, Apache won't know whether to look for a file repos/foo.c in the DocumentRoot, or whether to delegate mod_dav_svn to return foo.c from the Subversion repository. The result is often an error from the server of the form 301 Moved Permanently.

在本阶段,你一定要考虑访问权限问题,如果你已经作为普通的web服务器运行过Apache,你一定有了一些内容—网页、脚本和其他。这些项目已经配置了许多在Apache下可以工作的访问许可,或者更准确一点,允许Apache与这些文件一起工作。Apache当作为Subversion服务器运行时,同样需要正确的访问许可来读写你的Subversion版本库。

You will need to determine a permission system setup that satisfies Subversion's requirements without messing up any previously existing web page or script installations. This might mean changing the permissions on your Subversion repository to match those in use by other things that Apache serves for you, or it could mean using the User and Group directives in httpd.conf to specify that Apache should run as the user and group that owns your Subversion repository. There is no single correct way to set up your permissions, and each administrator will have different reasons for doing things a certain way. Just be aware that permission-related problems are perhaps the most common oversight when configuring a Subversion repository for use with Apache.

认证选项

At this point, if you configured httpd.conf to contain something like

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
</Location>

…then your repository is “anonymously” accessible to the world. Until you configure some authentication and authorization policies, the Subversion repositories you make available via the Location directive will be generally accessible to everyone. In other words,

  • anyone can use their Subversion client to check out a working copy of a repository URL (or any of its subdirectories),

  • 任何人可以在浏览器输入版本库URL交互浏览的方式来查看版本库的最新修订版本,并且

  • 任何人可以提交到版本库。

Of course, you might have already set up a pre-commit hook script to prevent commits (see “实现版本库钩子”一节). But as you read on, you'll see that it's also possible use Apache's built-in methods to restrict access in specific ways.

Setting Up HTTP Authentication

The easiest way to authenticate a client is via the HTTP Basic authentication mechanism, which simply uses a username and password to verify that a user is who she says she is. Apache provides an htpasswd utility for managing the list of acceptable usernames and passwords. Let's grant commit access to Sally and Harry. First, we need to add them to the password file.

$ ### First time: use -c to create the file
$ ### Use -m to use MD5 encryption of the password, which is more secure
$ htpasswd -cm /etc/svn-auth-file harry
New password: *****
Re-type new password: *****
Adding password for user harry
$ htpasswd -m /etc/svn-auth-file sally
New password: *******
Re-type new password: *******
Adding password for user sally
$

Next, you need to add some more httpd.conf directives inside your Location block to tell Apache what to do with your new password file. The AuthType directive specifies the type of authentication system to use. In this case, we want to specify the Basic authentication system. AuthName is an arbitrary name that you give for the authentication domain. Most browsers will display this name in the pop-up dialog box when the browser is querying the user for his name and password. Finally, use the AuthUserFile directive to specify the location of the password file you created using htpasswd.

After adding these three directives, your <Location> block should look something like this:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /etc/svn-auth-file
</Location>

This <Location> block is not yet complete, and will not do anything useful. It's merely telling Apache that whenever authorization is required, Apache should harvest a username and password from the Subversion client. What's missing here, however, are directives that tell Apache which sorts of client requests require authorization. Wherever authorization is required, Apache will demand authentication as well. The simplest thing to do is protect all requests. Adding Require valid-user tells Apache that all requests require an authenticated user:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /etc/svn-auth-file
  Require valid-user
</Location>

Be sure to read the next section (“授权选项”一节) for more detail on the Require directive and other ways to set authorization policies.

One word of warning: HTTP Basic Auth passwords pass in very nearly plain-text over the network, and thus are extremely insecure.

Another option is to not use Basic authentication but “Digest” authentication instead. Digest authentication allows the server to verify the client's identity without passing the plaintext password over the network. Assuming that the client and server both know the user's password, they can verify that the password is the same by using it to apply a hashing function to a one-time bit of information. The server sends a small random-ish string to the client; the client uses the user's password to hash the string; the server then looks to see if the hashed value is what it expected.

Configuring Apache for Digest authentication is also fairly easy, and only a small variation on our prior example. Be sure to consult Apache's documentation for full details.

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  AuthType Digest
  AuthName "Subversion repository"
  AuthDigestDomain /svn/
  AuthUserFile /etc/svn-auth-file
  Require valid-user
</Location>

If you're looking for maximum security, then public-key cryptography is the best solution. It may be best to use some sort of SSL encryption, so that clients authenticate via https:// instead of http://; at a bare minimum, you can configure Apache to use a self-signed server certificate. [42] Consult Apache's documentation (and OpenSSL documentation) about how to do that.

SSL 证书管理

商业应用需要越过公司防火墙的版本库访问,防火墙需要小心的考虑非认证用户“吸取”他们的网络流量的情况,SSL让那种形式的关注更不容易导致敏感数据泄露。

If a Subversion client is compiled to use OpenSSL, then it gains the ability to speak to an Apache server via https:// URLs. The Neon library used by the Subversion client is not only able to verify server certificates, but can also supply client certificates when challenged. When the client and server have exchanged SSL certificates and successfully authenticated one another, all further communication is encrypted via a session key.

怎样产生客户端和服务器端证书以及怎样使用它们已经超出了本书的范围,许多书籍,包括Apache自己的文档,描述这个任务,现在我们可以覆盖的是普通的客户端怎样来管理服务器与客户端证书。

When speaking to Apache via https://, a Subversion client can receive two different types of information:

  • 一个服务器证书

  • 一个客户端证书的要求

如果客户端接收了一个服务器证书,它需要去验证它是可以相信的:这个服务器是它自称的那一个吗?OpenSSL库会去检验服务器证书的签名人或者是核证机构(CA)。如果OpenSSL不可以自动信任这个CA,或者是一些其他的问题(如证书过期或者是主机名不匹配),Subversion命令行客户端会询问你是否愿意仍然信任这个证书:

$ svn list https://host.example.com/repos/project

Error validating server certificate for 'https://host.example.com:443':
 - The certificate is not issued by a trusted authority. Use the
   fingerprint to validate the certificate manually!
Certificate information:
 - Hostname: host.example.com
 - Valid: from Jan 30 19:23:56 2004 GMT until Jan 30 19:23:56 2006 GMT
 - Issuer: CA, example.com, Sometown, California, US
 - Fingerprint: 7d:e1:a9:34:33:39:ba:6a:e9:a5:c4:22:98:7b:76:5c:92:a0:9c:7b

(R)eject, accept (t)emporarily or accept (p)ermanently?

This dialogue should look familiar; it's essentially the same question you've probably seen coming from your web browser (which is just another HTTP client like Subversion). If you choose the (p)ermanent option, the server certificate will be cached in your private run-time auth/ area in just the same way your username and password are cached (see “客户端凭证缓存”一节). If cached, Subversion will automatically trust this certificate in future negotiations.

Your run-time servers file also gives you the ability to make your Subversion client automatically trust specific CAs, either globally or on a per-host basis. Simply set the ssl-authority-files variable to a semicolon-separated list of PEM-encoded CA certificates:

[global]
ssl-authority-files = /path/to/CAcert1.pem;/path/to/CAcert2.pem

Many OpenSSL installations also have a pre-defined set of “default” CAs that are nearly universally trusted. To make the Subversion client automatically trust these standard authorities, set the ssl-trust-default-ca variable to true.

当与Apache通话时,Subversion客户端也会收到一个证书的要求,Apache是询问客户端来证明自己的身份:这个客户端是否是他所说的那一个?如果一切正常,Subversion客户端会发送回一个通过Apache信任的CA签名的私有证书,一个客户端证书通常会以加密方式存放在磁盘,使用本地密码保护,当Subversion收到这个要求,它会询问你证书的路径和保护用的密码:

$ svn list https://host.example.com/repos/project

Authentication realm: https://host.example.com:443
Client certificate filename: /path/to/my/cert.p12
Passphrase for '/path/to/my/cert.p12':  ********
…

注意这个客户端证书是一个“p12”文件,为了让Subversion使用客户端证书,它必须是运输标准的PKCS#12格式,大多数浏览器可以导入和导出这种格式的证书,另一个选择是用OpenSSL命令行工具来转化存在的证书为PKCS#12格式。

Again, the runtime servers file allows you to automate this challenge on a per-host basis. Either or both pieces of information can be described in runtime variables:

[groups]
examplehost = host.example.com

[examplehost]
ssl-client-cert-file = /path/to/my/cert.p12
ssl-client-cert-password = somepassword

Once you've set the ssl-client-cert-file and ssl-client-cert-password variables, the Subversion client can automatically respond to a client certificate challenge without prompting you. [43]

授权选项

此刻,你已经配置了认证,但是没有配置授权,Apache可以要求用户认证并且确定身份,但是并没有说明这个身份的怎样允许和限制,这个部分描述了两种控制访问版本库的策略。

整体访问控制

最简单的访问控制形式是授权特定用户为只读版本库访问或者是读/写访问版本库。

You can restrict access on all repository operations by adding the Require valid-user directive to your <Location> block. Using our previous example, this would mean that only clients that claimed to be either harry or sally, and provided the correct password for their respective username, would be allowed to do anything with the Subversion repository:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file

  # only authenticated users may access the repository
  Require valid-user
</Location>

Sometimes you don't need to run such a tight ship. For example, Subversion's own source code repository at http://svn.collab.net/repos/svn allows anyone in the world to perform read-only repository tasks (like checking out working copies and browsing the repository with a web browser), but restricts all write operations to authenticated users. To do this type of selective restriction, you can use the Limit and LimitExcept configuration directives. Like the Location directive, these blocks have starting and ending tags, and you would nest them inside your <Location> block.

The parameters present on the Limit and LimitExcept directives are HTTP request types that are affected by that block. For example, if you wanted to disallow all access to your repository except the currently supported read-only operations, you would use the LimitExcept directive, passing the GET, PROPFIND, OPTIONS, and REPORT request type parameters. Then the previously mentioned Require valid-user directive would be placed inside the <LimitExcept> block instead of just inside the <Location> block.

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file

  # For any operations other than these, require an authenticated user.
  <LimitExcept GET PROPFIND OPTIONS REPORT>
    Require valid-user
  </LimitExcept>
</Location>

These are only a few simple examples. For more in-depth information about Apache access control and the Require directive, take a look at the Security section of the Apache documentation's tutorials collection at http://httpd.apache.org/docs-2.0/misc/tutorials.html.

每目录访问控制

It's possible to set up finer-grained permissions using a second Apache httpd module, mod_authz_svn. This module grabs the various opaque URLs passing from client to server, asks mod_dav_svn to decode them, and then possibly vetoes requests based on access policies defined in a configuration file.

If you've built Subversion from source code, mod_authz_svn is automatically built and installed alongside mod_dav_svn. Many binary distributions install it automatically as well. To verify that it's installed correctly, make sure it comes right after mod_dav_svn's LoadModule directive in httpd.conf:

LoadModule dav_module         modules/mod_dav.so
LoadModule dav_svn_module     modules/mod_dav_svn.so
LoadModule authz_svn_module   modules/mod_authz_svn.so

To activate this module, you need to configure your Location block to use the AuthzSVNAccessFile directive, which specifies a file containing the permissions policy for paths within your repositories. (In a moment, we'll discuss the format of that file.)

Apache非常的灵活,你可以从三种模式里选择一种来配置你的区块,作为开始,你选择一种基本的配置模式。(下面的例子非常简单;见Apache自己的文档中的认证和授权选项来查看更多的细节。)

最简单的区块是允许任何人可以访问,在这个场景里,Apache决不会发送认证请求,所有的用户作为“匿名”对待。

例 6.1. 匿名访问的配置实例。

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # our access control policy
  AuthzSVNAccessFile /path/to/access/file
</Location>
          

On the opposite end of the paranoia scale, you can configure your block to demand authentication from everyone. All clients must supply credentials to identify themselves. Your block unconditionally requires authentication via the Require valid-user directive, and defines a means to authenticate.

例 6.2. 一个认证访问的配置实例。

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # our access control policy
  AuthzSVNAccessFile /path/to/access/file

  # only authenticated users may access the repository
  Require valid-user

  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file
</Location>
          

A third very popular pattern is to allow a combination of authenticated and anonymous access. For example, many administrators want to allow anonymous users to read certain repository directories, but want only authenticated users to read (or write) more sensitive areas. In this setup, all users start out accessing the repository anonymously. If your access control policy demands a real username at any point, Apache will demand authentication from the client. To do this, you use both the Satisfy Any and Require valid-user directives together.

例 6.3. 一个混合认证/匿名访问的配置实例。

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # our access control policy
  AuthzSVNAccessFile /path/to/access/file

  # try anonymous access first, resort to real
  # authentication if necessary.
  Satisfy Any
  Require valid-user

  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file
</Location>
          

Once you've settled on one of these three basic httpd.conf templates, you need to create your file containing access rules for particular paths within the repository. This is described in “基于路径的授权”一节.

禁用基于路径的检查

The mod_dav_svn module goes through a lot of work to make sure that data you've marked “unreadable” doesn't get accidentally leaked. This means that it needs to closely monitor all of the paths and file-contents returned by commands like svn checkout or svn update commands. If these commands encounter a path that isn't readable according to some authorization policy, then the path is typically omitted altogether. In the case of history or rename tracing—e.g. running a command like svn cat -r OLD foo.c on a file that was renamed long ago—the rename tracking will simply halt if one of the object's former names is determined to be read-restricted.

All of this path-checking can sometimes be quite expensive, especially in the case of svn log. When retrieving a list of revisions, the server looks at every changed path in each revision and checks it for readability. If an unreadable path is discovered, then it's omitted from the list of the revision's changed paths (normally seen with the --verbose option), and the whole log message is suppressed. Needless to say, this can be time-consuming on revisions that affect a large number of files. This is the cost of security: even if you haven't configured a module like mod_authz_svn at all, the mod_dav_svn module is still asking Apache httpd to run authorization checks on every path. The mod_dav_svn module has no idea what authorization modules have been installed, so all it can do is ask Apache to invoke whatever might be present.

On the other hand, there's also an escape-hatch of sorts, one which allows you to trade security features for speed. If you're not enforcing any sort of per-directory authorization (i.e. not using mod_authz_svn or similar module), then you can disable all of this path-checking. In your httpd.conf file, use the SVNPathAuthz directive:

例 6.4. 禁用所有的路径检查

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  SVNPathAuthz off
</Location>
          

The SVNPathAuthz directive is “on” by default. When set “off”, all path-based authorization checking is disabled; mod_dav_svn stops invoking authorization checks on every path it discovers.

额外的糖果

我们已经覆盖了关于认证和授权的Apache和mod_dav_svn的大多数选项,但是Apache还提供了许多很好的特性。

版本库浏览

One of the most useful benefits of an Apache/WebDAV configuration for your Subversion repository is that the youngest revisions of your versioned files and directories are immediately available for viewing via a regular web browser. Since Subversion uses URLs to identify versioned resources, those URLs used for HTTP-based repository access can be typed directly into a Web browser. Your browser will issue an HTTP GET request for that URL, and based on whether that URL represents a versioned directory or file, mod_dav_svn will respond with a directory listing or with file contents.

因为URL不能确定你所希望看到的资源的版本,mod_dav_svn会一直返回最新的版本,这样会有一些美妙的副作用,你可以直接把Subversion的URL传递给文档作为引用,这些URL会一直指向文档最新的材料,当然,你也可以在别的网站作为超链使用这些URL。

正确的文件类型

When browsing a Subversion repository, the web browser gets a clue about how to render a file's contents by looking at the Content-Type: header returned in Apache's response to the HTTP GET request. The value of this header is some sort of MIME type. By default, Apache will tell the web browsers that all repository files are of the “default” MIME type, typically text/plain. This can be frustrating, however, if a user wishes repository files to render as something more meaningful—for example, it might be nice to have a foo.html file in the repository actually render as HTML when browsing.

To make this happen, you only need to make sure that your files have the proper svn:mime-type set. This is discussed in more detail in “文件内容类型”一节, and you can even configure your client to automatically attach proper svn:mime-type properties to files entering the repository for the first time; see “自动设置属性”一节.

So in our example, if one were to set the svn:mime-type property to text/html on file foo.html, then Apache would properly tell your web browser to render the file as HTML. One could also attach proper image/* mime-type properties to images, and by doing this, ultimately get an entire web site to be viewable directly from a repository! There's generally no problem with doing this, as long as the website doesn't contain any dynamically-generated content.

定制外观

You generally will get more use out of URLs to versioned files—after all, that's where the interesting content tends to lie. But you might have occasion to browse a Subversion directory listing, where you'll quickly note that the generated HTML used to display that listing is very basic, and certainly not intended to be aesthetically pleasing (or even interesting). To enable customization of these directory displays, Subversion provides an XML index feature. A single SVNIndexXSLT directive in your repository's Location block of httpd.conf will instruct mod_dav_svn to generate XML output when displaying a directory listing, and to reference the XSLT stylesheet of your choice:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  SVNIndexXSLT "/svnindex.xsl"
  …
</Location>

Using the SVNIndexXSLT directive and a creative XSLT stylesheet, you can make your directory listings match the color schemes and imagery used in other parts of your website. Or, if you'd prefer, you can use the sample stylesheets provided in the Subversion source distribution's tools/xslt/ directory. Keep in mind that the path provided to the SVNIndexXSLT directory is actually a URL path—browsers need to be able to read your stylesheets in order to make use of them!

版本库列表

If you're serving a collection of repositories from a single URL via the SVNParentPath directive, then it's also possible to have Apache display all available repositories to a web browser. Just activate the SVNListParentPath directive:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  SVNListParentPath on
  …
</Location>

If a user now points her web browser to the URL http://host.example.com/svn/, she'll see list of all Subversion repositories sitting in /var/svn. Obviously, this can be a security problem, so this feature is turned off by default.

Apache 日志

Because Apache is an HTTP server at heart, it contains fantastically flexible logging features. It's beyond the scope of this book to discuss all ways logging can be configured, but we should point out that even the most generic httpd.conf file will cause Apache to produce two logs: error_log and access_log. These logs may appear in different places, but are typically created in the logging area of your Apache installation. (On Unix, they often live in /usr/local/apache2/logs/.)

The error_log describes any internal errors that Apache runs into as it works. The access_log file records every incoming HTTP request received by Apache. This makes it easy to see, for example, which IP addresses Subversion clients are coming from, how often particular clients use the server, which users are authenticating properly, and which requests succeed or fail.

Unfortunately, because HTTP is a stateless protocol, even the simplest Subversion client operation generates multiple network requests. It's very difficult to look at the access_log and deduce what the client was doing—most operations look like a series of cryptic PROPPATCH, GET, PUT, and REPORT requests. To make things worse, many client operations send nearly-identical series of requests, so it's even harder to tell them apart.

mod_dav_svn, however, can come to your aid. By activating an “operational logging” feature, you can ask mod_dav_svn to create a separate log file describing what sort of high-level operations your clients are performing.

To do this, you need to make use of Apache's CustomLog directive (which is explained in more detail in Apache's own documentation). Be sure to invoke this directive outside of your Subversion Location block:

<Location /svn>
  DAV svn
  …
</Location>

CustomLog logs/svn_logfile "%t %u %{SVN-ACTION}e" env=SVN-ACTION

In this example, we're asking Apache to create a special logfile svn_logfile in the standard Apache logs directory. The %t and %u variables are replaced by the time and username of the request, respectively. The really important part are the two instances of SVN-ACTION. When Apache sees that variable, it substitutes the value of the SVN-ACTION environment variable, which is automatically set by mod_dav_svn whenever it detects a high-level client action.

So instead of having to interpret a traditional access_log like this:

[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc/!svn/vcc/default HTTP/1.1" 207 398
[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc/!svn/bln/59 HTTP/1.1" 207 449
[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc HTTP/1.1" 207 647
[26/Jan/2007:22:25:29 -0600] "REPORT /svn/calc/!svn/vcc/default HTTP/1.1" 200 607
[26/Jan/2007:22:25:31 -0600] "OPTIONS /svn/calc HTTP/1.1" 200 188
[26/Jan/2007:22:25:31 -0600] "MKACTIVITY /svn/calc/!svn/act/e6035ef7-5df0-4ac0-b811-4be7c823f998 HTTP/1.1" 201 227
…

… you can instead peruse a much more intelligible svn_logfile like this:

[26/Jan/2007:22:24:20 -0600] - list-dir '/'
[26/Jan/2007:22:24:27 -0600] - update '/'
[26/Jan/2007:22:25:29 -0600] - remote-status '/'
[26/Jan/2007:22:25:31 -0600] sally commit r60

For an exhaustive list of all actions logged, see “High Level Logging”一节.

Write-Through Proxying

One of the nice advantages of using Apache as a Subversion server is that it can be set up for simple replication. For example, suppose that your team is distributed across four offices around the globe. The Subversion repository can only exist in one of those offices, and that means the other three offices will not enjoy accessing it—they're likely to experience significantly slower traffic and response times when updating and committing code. A powerful solution is to set up a system consisting of one master Apache server and several slave Apache servers. If you place a slave server in each office, then users can check out a working copy from whichever slave is closest to them. All read requests go to their local slave. Write requests get automatically routed to the single master server. When the commit completes, the master then automatically “pushes” the new revision to each slave server using the svnsync replication tool.

This configuration creates a huge perceptual speed increase for your users, because Subversion client traffic is typically 80-90% read requests. And if those requests are coming from a local server, it's a huge win.

In this section, we'll walk you through a standard setup of this single-master/multiple slave system. However, keep in mind that your servers must be running at least Apache 2.2.0 (with mod_proxy loaded) and Subversion (mod_dav_svn) 1.5.

Configure the Servers

First, configure your master server's httpd.conf file in the usual way. Make the repository available at a certain URI location, and configure authentication and authorization however you'd like. After that's done, configure each of your “slave” servers in the exact same way, but add the special SVNMasterURI directive to the block:

<Location /svn>
  DAV svn
  SVNPath /var/svn/repos
  SVNMasterURI http://master.example.com/svn
  …
</Location>

This new directive tells a slave server to redirect all write requests to the master. (This is done automatically via Apache's mod_proxy module.) Ordinary read requests, however, are still serviced by the slaves. Be sure that your master and slave servers all have matching authentication and authorization configurations; if they fall out of sync, it can lead to big headaches.

Next, we need to deal with the problem of infinite recursion. With the current configuration, imagine what will happen when a Subversion client performs a commit to the master server. After the commit completes, the server uses svnsync to replicate the new revision to each slave. But because svnsync appears to be just another Subversion client performing a commit, the slave will immediately attempt to proxy the incoming write request back to the master! Hilarity ensues.

The solution to this problem is to have the master push revisions to a different <Location> on the slaves. This location is configured to not proxy write requests at all, but accept normal commits from (and only from) the master's IP address:

<Location /svn-proxy-sync>
  DAV svn
  SVNPath /var/svn/repos
  Order deny,allow
  Deny from all
  # Only let the server's IP address access this Location:
  Allow from 10.20.30.40
  …
</Location>
Set up Replication

Now that you've configured your Location blocks on master and slaves, you need to configure the master to replicate to the slaves. This is done the usual way, using svnsync. If you're not familiar with this tool, see “版本库复制”一节 for details.

First, make sure that each slave repository has a pre-revprop-change hook script which allows remote revision property changes. (This is standard procedure for being on the receiving end of svnsync) Then log into the master server and configure each of the slave repository URIs to receive data from the master repository on local disk:

$ svnsync init http://slave1.example.com/svn-proxy-sync file://var/svn/repos
Copied properties for revision 0.
$ svnsync init http://slave2.example.com/svn-proxy-sync file://var/svn/repos
Copied properties for revision 0.
$ svnsync init http://slave3.example.com/svn-proxy-sync file://var/svn/repos
Copied properties for revision 0.

# Perform the initial replication

$ svnsync sync http://slave1.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

$ svnsync sync http://slave2.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

$ svnsync sync http://slave3.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

After this is done, we configure the master server's post-commit hook script to invoke svnsync on each slave server:

#!/bin/sh
# Post-commit script to replicate newly-committed revision to slaves

svnsync sync http://slave1.example.com/svn-proxy-sync > /dev/null 2>&1
svnsync sync http://slave2.example.com/svn-proxy-sync > /dev/null 2>&1
svnsync sync http://slave3.example.com/svn-proxy-sync > /dev/null 2>&1

The extra bits on the end of each line aren't necessary, but they're a sneaky way to allow the sync commands to run in the background, so that the Subversion client isn't left waiting forever for the commit to finish. In addition to this post-commit hook, you'll need a post-revprop-change hook as well, so that when a user, say, modifies a log message, the slave servers get that change as well:

#!/bin/sh
# Post-revprop-change script to replicate revprop-changes to slaves

REV=${2}
svnsync copy-revprops http://slave1.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1
svnsync copy-revprops http://slave2.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1
svnsync copy-revprops http://slave3.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1

The only thing we've left out here is what to do about locks. Because locks are strictly enforced by the master server (the only place where commits happen), we don't technically need to do anything. Many teams don't use Subversion's locking features at all, so it may be a non-issue for you. However, if lock changes aren't replicated from master to slaves, it means that clients won't be able to query the status of locks (e.g. svn status -u will show no information about repository locks.) If this bothers you, you can write post-lock and post-unlock hook scripts which run svn lock and svn unlock on each slave machine, presumably through a remote shell method such as SSH. That's left as an exercise for the reader!

Caveats

Your master/slave replication system should now be ready to use. A couple words of warning are in order, however. Remember that this replication isn't entirely robust in the face of computer or network crashes. For example, if one of the automated svnsync commands fails to complete for some reason, the slaves will begin to fall behind. For example, your remote users will see that they've committed revision 100, but then when they run svn update, their local server will tell them than revision 100 doesn't yet exist! Of course, the problem will be automatically fixed the next time another commit happens and the subsequent svnsync is successful—the sync will replicate all waiting revisions. But still, you may want to set up some sort of out-of-band monitoring to notice synchronization failures and force svnsync to run when things go wrong.

Other Apache Features

Several of the features already provided by Apache in its role as a robust Web server can be leveraged for increased functionality or security in Subversion as well. The Subversion client is able to use SSL, (the Secure Socket Layer, discussed earlier). If your Subversion client is built to support SSL, then it can access your Apache server using https:// and enjoy a high-quality encrypted network session.

Equally useful are other features of the Apache and Subversion relationship, such as the ability to specify a custom port (instead of the default HTTP port 80) or a virtual domain name by which the Subversion repository should be accessed, or the ability to access the repository through an HTTP proxy.

Finally, because mod_dav_svn is speaking a subset of the WebDAV/DeltaV protocol, it's possible to access the repository via third-party DAV clients. Most modern operating systems (Win32, OS X, and Linux) have the built-in ability to mount a DAV server as a standard network “shared folder”. This is a complicated topic, but also wondrous when implemented. For details, read 附录 C, WebDAV 和自动版本.

Note that there are number of other small tweaks one can make to mod_dav_svn that are too obscure to mention in this chapter. For a complete list of all httpd.conf directives that mod_dav_svn responds to, see “指示”一节.



[41] 他们讨厌这样做。

[42] While self-signed server certificates are still vulnerable to a “man in the middle” attack, such an attack is much more difficult for a casual observer to pull off, compared to sniffing unprotected passwords.

[43] More security-conscious folk might not want to store the client certificate password in the runtime servers file.

[44] 之前叫做“ViewCVS”。