dumbbell edited this page May 20, 2011 · 12 revisions

When we started redesigning the Yakaz website architecture, we questioned our choice of Apache as the HTTP server and tested Yaws.

From the Yaws website:

Yaws is a HTTP high perfomance 1.1 webserver particularly well suited for dynamic-content web applications. Two separate modes of operations are supported:
  • Standalone mode where Yaws runs as a regular webserver daemon. This is the default mode.
  • Embedded mode where Yaws runs as an embedded webserver in another Erlang application.

Yaws is entirely written in Erlang, and furthermore it is a multithreaded webserver where one Erlang lightweight process is used to handle each client.

The main advantages of Yaws compared to other Web technologies are performance and elegance. The performance comes from the underlying Erlang system and its ability to handle concurrent processes in an efficient way. Its elegance comes from Erlang as well. Web applications don't have to be written in ugly ad hoc languages.

After some benchmarks, we decided to use it for the Yakaz website as a replacement for Apache, precisely for these reasons (and because we have some Erlang skills).

After 7 months in production, it has proved to scale very well and to be very stable. To meet all our needs, however, we had to patch it to add some missing features and to fix some bugs and unexpected behaviours.

You can find here a fork of Yaws with all our updates. Feel free to use it; feedback is welcome. Each modification was made in a separate, independent branch to make integration easier. The all_patches branch contains all of them and will be kept up to date with klacke's repository (as far as possible).

Branches overview

Appup_src

This branch adds an application upgrade file template used to build the final yaws.appup file.

This file makes live upgrades and downgrades possible.
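
For readers unfamiliar with OTP upgrade files, here is a minimal sketch of what a generated yaws.appup could look like. It follows the standard OTP appup layout; the version strings and the instructions are hypothetical and are not taken from the branch.

```erlang
%% yaws.appup sketch: version strings and instructions are hypothetical.
{"1.1",
 [{"1.0", [{load_module, yaws_api}]}],   % how to upgrade from 1.0
 [{"1.0", [{load_module, yaws_api}]}]}.  % how to downgrade to 1.0
```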

Auth_improvements

This patch has two parts.

The first one improves the implementation of the authentication mechanisms: authentication now works inside all docroots and recurses through subdirectories (if you define an authentication policy for a directory, it is automatically applied to all of its subdirectories, except those with their own authentication policy). It also adds the docroot directive to auth structures (or .yaws_auth files) to restrict a policy to a specific docroot (no docroot configured means all docroots).

The second one adds ACLs (Access Control Lists), like Apache's mod_access, to protect data. You can add directives to auth structures (or .yaws_auth files) to control access to particular parts of the server based on the client IP address. The allow and deny directives specify which clients are or are not allowed to access the server, while the order directive sets the default access state and configures how the allow and deny directives interact with each other.

Here is an example:

  <server www.yakaz.com >
    ...
    <auth>
      docroot = /var/www/yakaz/admin
      deny    = all
      allow   = 127.0.0.1, 192.168.0.0/24
      order   = deny,allow
    </auth>
    ...
  </server>
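
The interaction between order, allow and deny can be sketched in a few lines of Erlang. This is only an illustration, assuming the patch mirrors Apache's mod_access semantics; the module acl_sketch, its functions and the rule representation are hypothetical and are not the patch's code.

```erlang
-module(acl_sketch).
-export([allowed/4, matches/2]).

%% Decide access for a client IP, assuming Apache-style ordering.
%% Order is the atom deny_allow or allow_deny. Allow and Deny are lists
%% of rules: the atom all, an IPv4 tuple, or {IPv4Tuple, PrefixLength}.
allowed(deny_allow, Allow, Deny, IP) ->
    %% Default is "allowed"; an allow match overrides a deny match.
    matches(Allow, IP) orelse not matches(Deny, IP);
allowed(allow_deny, Allow, Deny, IP) ->
    %% Default is "denied"; a deny match overrides an allow match.
    matches(Allow, IP) andalso not matches(Deny, IP).

%% True when IP matches at least one rule in the list.
matches(Rules, IP) ->
    lists:any(fun(Rule) -> match(Rule, IP) end, Rules).

match(all, _IP) ->
    true;
match({Net, Bits}, IP) ->
    %% Compare only the network part of both addresses.
    Shift = 32 - Bits,
    (to_int(Net) bsr Shift) =:= (to_int(IP) bsr Shift);
match(Addr, IP) ->
    Addr =:= IP.

to_int({A, B, C, D}) ->
    (A bsl 24) bor (B bsl 16) bor (C bsl 8) bor D.
```

With the configuration above (deny = all, allow = 127.0.0.1, 192.168.0.0/24, order = deny,allow), acl_sketch:allowed(deny_allow, [{127,0,0,1}, {{192,168,0,0}, 24}], [all], {192,168,0,42}) grants access, while the same call for {10,0,0,1} refuses it.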

Chunked_request

Yaws can be configured to receive large POSTs in chunks to reduce memory usage. This is done with the partial_post_size directive in the server part of the configuration file. It is useful when uploading files to the web server.
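
As a minimal configuration sketch, the directive sits in the server part like this. The value shown is only illustrative, not a recommendation.

```
<server www.yakaz.com >
  ...
  partial_post_size = 10240
  ...
</server>
```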

Currently, however, if a client sends data using the chunked Transfer-Encoding, Yaws reads all of it without taking the value of partial_post_size into account.

This is not a common case, because the chunked Transfer-Encoding is usually used for HTTP responses, but the HTTP/1.1 specification states that:

All HTTP/1.1 applications MUST be able to receive and decode the "chunked" transfer-coding

So, if a client sends a very large POST this way, the Erlang VM can run out of memory: this is a DoS bug. The changes in this branch fix it by reading these requests block by block, like any other HTTP request.
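
To illustrate what decoding such a request involves, here is a minimal sketch of a decoder for the chunked wire format. It is not the code from this branch; the module chunk_sketch and the function next_chunk/1 are hypothetical. It returns one chunk at a time, so a caller can process the body incrementally instead of accumulating it. A real server would also cap how much of a single chunk it buffers.

```erlang
-module(chunk_sketch).
-export([next_chunk/1]).

%% Decode one chunk from the head of Buf.
%% Returns {chunk, Data, Rest}, {last, Rest} for the terminating chunk,
%% or the atom more when Buf does not yet hold a complete chunk.
next_chunk(Buf) ->
    case binary:split(Buf, <<"\r\n">>) of
        [SizeLine, Rest] ->
            %% Drop optional chunk extensions after ';'.
            [Hex | _] = binary:split(SizeLine, <<";">>),
            Size = binary_to_integer(string:trim(Hex), 16),
            case Rest of
                _ when Size =:= 0 ->
                    {last, Rest};
                <<Data:Size/binary, "\r\n", Tail/binary>> ->
                    {chunk, Data, Tail};
                _ ->
                    more
            end;
        [_] ->
            %% The size line itself is not complete yet.
            more
    end.
```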

Compressible_javascript

Adds "application/javascript" to the list of compressible MIME types.

Expires

Like Apache's mod_expires, the expires directive controls the setting of the Expires HTTP header and of the max-age directive of the Cache-Control HTTP header in server responses. The expiration date can be relative either to the time the source file was last modified or to the time of the client access. A virtual server can have multiple expires directives.

Here is an example:

  <server www.yakaz.com >
    ...
    expires = <image/gif, access+2592000> <image/png, access+2592000>
    expires = <image/jpeg, access+2592000> <text/css, access+2592000>
    expires = <application/javascript, modify+2592000>
    ...
  </server>
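
In this example, 2592000 seconds is 30 days. The difference between the access and modify bases can be sketched as follows. This is an illustration of the arithmetic only, not the patch's code; the module expires_sketch and the function expiry/4 are hypothetical, and clamping max-age at zero for already stale files is an assumption.

```erlang
-module(expires_sketch).
-export([expiry/4]).

%% Base is access or modify; TTL is in seconds; MTime (last modification)
%% and Now (time of the request) are timestamps in seconds.
%% Returns {ExpiresAt, MaxAge}.
expiry(access, TTL, _MTime, Now) ->
    %% Relative to the client access: always TTL seconds from now.
    {Now + TTL, TTL};
expiry(modify, TTL, MTime, Now) ->
    %% Relative to the file modification time: the remaining lifetime
    %% shrinks as the file ages (clamped at 0, an assumption here).
    ExpiresAt = MTime + TTL,
    {ExpiresAt, max(0, ExpiresAt - Now)}.
```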

Gserv_counter

In Yaws, each group of virtual servers (same IP address, same port) is represented by an Erlang process that maintains some counters. These counters were not updated correctly; this patch fixes that.

Hard_setconf

With this little patch, we can do a hard reload of the configuration while the server is running, without stopping the application. This happens when the configuration has changed too much (e.g. the IP address of a virtual server).

This is done by stopping all gserv processes and restarting them with the new configuration, so it is pretty brutal.

Ipv6_parsing

Adds support for parsing URLs with literal IPv6 addresses, such as http://[::1]:8080, in the function yaws_api:parse_url.
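
The difficulty is that a literal IPv6 address contains colons itself, so the host and port cannot be separated at the first colon. A minimal sketch of the bracket-aware split is shown below. It is not the code of yaws_api:parse_url; the module ipv6_sketch and the function split_authority/1 are hypothetical.

```erlang
-module(ipv6_sketch).
-export([split_authority/1]).

%% Split the authority part of a URL into {Host, Port}.
%% Port is undefined when the URL does not specify one.
split_authority("[" ++ Rest) ->
    %% Literal IPv6 address: everything up to "]" is the host.
    case string:split(Rest, "]") of
        [Host, ":" ++ Port] -> {Host, list_to_integer(Port)};
        [Host, ""] -> {Host, undefined}
    end;
split_authority(Authority) ->
    %% Hostname or IPv4 address: the first colon separates the port.
    case string:split(Authority, ":") of
        [Host, Port] -> {Host, list_to_integer(Port)};
        [Host] -> {Host, undefined}
    end.
```

The returned host can then be validated with inet:parse_address/1 from OTP, which accepts "::1" once the brackets are removed.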

Logger_mod

Recently, the logger_mod directive was added to the global configuration of Yaws to allow customized access logging.

This patch goes a little further. logger_mod is now defined in the server part and can be used to customize both access and auth messages (the auth_log directive also moves to the server part and is deprecated in the global part). External logging modules must implement the new yaws_logger behaviour.

Multi_listen

Sometimes a virtual server must be bound to several IP addresses. Currently, you must duplicate the virtual server configuration, changing only the address to listen on.

With this patch, multiple listen directives are allowed, without any duplication. For example, if your server must listen on the loopback address over both IPv4 and IPv6, your server part may be:

  <server www.yakaz.com >
    port = 80
    listen = 127.0.0.1
    listen = ::1
    ...
  </server>

Page_options

Scripts can return {page, {Options, Page}} to make Yaws return a different page than the one being requested, where Options is a deep list of options. For now, the only type of option is {header, H}.

This patch adds the {status, Code} option type to set the HTTP status code of the response. It also delays the accumulation of these headers so that they are preserved while the request for Page is handled.
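
As a minimal sketch, a script could use both option types as follows. The module name, the page path "/not_found.html" and the chosen header are hypothetical; the header tuple follows the {header, H} shape used elsewhere on this page.

```erlang
-module(page_options_sketch).
-export([out/1]).

%% Serve a (hypothetical) error page instead of the requested one,
%% setting the status code and one extra header on the response.
out(_Arg) ->
    Options = [{status, 404},
               {header, {cache_control, "no-cache"}}],
    {page, {Options, "/not_found.html"}}.
```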

Page_renew_sconf

When a script returns {page, P}, we renew the #sconf{} record. This is particularly useful to rerun the checks on all docroots.

Parse_auth

The function yaws:parse_auth/1 now always returns a tuple of the form {User, Pass, Orig}, even if it cannot parse the value of the "Authorization:" header. In case of an error, User and Pass are set to undefined. This preserves the original value of the header.
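
The return convention can be illustrated with a standalone sketch that handles only Basic credentials. This is not the implementation of yaws:parse_auth/1; the module parse_auth_sketch is hypothetical and only mimics the {User, Pass, Orig} shape described above.

```erlang
-module(parse_auth_sketch).
-export([parse_auth/1]).

%% Parse the value of an Authorization header (Basic scheme only).
%% Always returns {User, Pass, Orig}; User and Pass are undefined when
%% the value cannot be parsed, and Orig is the untouched header value.
parse_auth(Orig = "Basic " ++ B64) ->
    try string:split(base64:decode_to_string(B64), ":") of
        [User, Pass] -> {User, Pass, Orig};
        _ -> {undefined, undefined, Orig}
    catch
        %% Invalid base64 input.
        _:_ -> {undefined, undefined, Orig}
    end;
parse_auth(Orig) ->
    %% Unknown scheme: keep the original value for the caller.
    {undefined, undefined, Orig}.
```

A caller can then match on {undefined, undefined, Orig} to handle other schemes itself while still having access to the raw header value.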

Php_handler

We replace the phpfcgi directive with php_handler in order to extend it. The new directive supports the following definitions:

  • <cgi, Filename>: The default. Filename is the path to the php executable used to interpret PHP scripts (php_exe_path is deprecated).
  • <fcgi, Host:Port>: Does the same thing as the old phpfcgi directive.
  • <extern, Module:Function | Node:Module:Function>: Uses an external handler, possibly on another node, to interpret .php files.
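
As a sketch, each of the three forms could be written as one of the following lines (only one would be used at a time). The executable path, the address and port, and the module and function names are hypothetical.

```
php_handler = <cgi, /usr/bin/php-cgi>
php_handler = <fcgi, 127.0.0.1:9000>
php_handler = <extern, my_php_mod:handle>
```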

Redirect

The module used to rewrite #arg{} records can now return any HTTP response. This can be used to redirect requests or to return an error. To do so, the module must set the #arg.state element to a #rewrite_response{} record.

The record #rewrite_response{} contains 3 elements:

  • status: any valid HTTP status code
  • headers: a list of {header, H}
  • content: an iolist

For example, to do an unconditional redirect to http://www.yakaz.com, you can use the module simple_redir_mod.erl:

  arg_rewrite(Arg) ->
    L = "http://www.yakaz.com",
    %% Answer every request with a permanent redirect to L.
    RwResp = #rewrite_response{status=301, headers=[{header, {location, L}}]},
    Arg#arg{state=RwResp}.

Soft_stop

When Yaws is stopped, all gserv processes are shut down without gracefully closing open HTTP connections.

With this patch, we first close the listening sockets, so that no new connections are accepted, and then try to shut down all open connections. If some connections are still alive after 60 seconds, we kill them.

Shutdown therefore takes longer, but every request being processed has a chance to receive its response before the connection is closed.
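
The "ask, wait, then kill" sequence can be sketched in plain Erlang as follows. This is not the patch's code: the module soft_stop_sketch, the stop message and the return values are hypothetical, and the connection handlers are represented by arbitrary processes.

```erlang
-module(soft_stop_sketch).
-export([stop_all/2]).

%% Ask every process in Pids to stop, wait up to Timeout milliseconds
%% overall, then kill the ones that are still alive.
%% Returns ok, or {killed, N} when N processes had to be killed.
stop_all(Pids, Timeout) ->
    Monitored = [{erlang:monitor(process, P), P} || P <- Pids],
    lists:foreach(fun(P) -> P ! stop end, Pids),
    Deadline = erlang:monotonic_time(millisecond) + Timeout,
    wait(Monitored, Deadline).

wait([], _Deadline) ->
    ok;
wait(Monitored, Deadline) ->
    Left = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {'DOWN', Ref, process, _Pid, _Reason} ->
            %% One process terminated by itself; keep waiting for the rest.
            wait(lists:keydelete(Ref, 1, Monitored), Deadline)
    after Left ->
        %% Grace period over: kill whatever is left.
        lists:foreach(fun({Ref, P}) ->
                              erlang:demonitor(Ref, [flush]),
                              exit(P, kill)
                      end, Monitored),
        {killed, length(Monitored)}
    end.
```

With the 60 second grace period described above, the call would be soft_stop_sketch:stop_all(Pids, 60000).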

Ssl_listen_opts

Initializes listening SSL sockets with the same options as TCP sockets, in particular the reuseaddr flag.

Yaws_conf_getenv

In Yaws, you can set the path to the configuration file with the --conf command-line argument. Otherwise, Yaws uses the default configuration file.

With this patch, you can use the application parameter config to set the path of the default configuration file.
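
As a sketch, the parameter could be set in a standard OTP sys.config file like this. The parameter name config comes from the description above; the path is hypothetical.

```erlang
%% sys.config sketch: the path is hypothetical.
[{yaws, [{config, "/etc/yaws/yaws.conf"}]}].
```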

Yaws_debug_hrl_include_fixes

This branch just fixes the path to yaws_debug.hrl in two modules. The previous path was "../../yaws/src/yaws_debug.hrl", so it only worked when the root directory was named "yaws".

Yaws_shaper

Inspired by Apache's mod_bwshare module, this patch adds the shaper directive to control access to virtual servers.

Access can be controlled based on the client's IP address. It is also possible to throttle HTTP requests based on the client's download rate. External modules used to shape the traffic must implement the new behaviour yaws_shaper.

Yaws_signature

Adds the server_signature directive to the global part of the configuration to customize the Server HTTP header.

Yaws_status

With this patch, we improve the output of the status command by adding session and connection counters for each group of virtual servers.

  $> yaws --status
  IP          Port  Connections  Sessions  Requests
  127.0.0.1   443   0            1         3
  127.0.0.1   80    0            1         2511
  Uptime: 6 Days, 17 Hours, 6 Minutes