Home
When we started to redesign the Yakaz website architecture, we reconsidered our choice of Apache as the HTTP server and tested Yaws.
From the Yaws website:
Yaws is a HTTP high performance 1.1 webserver particularly well suited for dynamic-content web applications. Two separate modes of operations are supported:
- Standalone mode where Yaws runs as a regular webserver daemon. This is the default mode.
- Embedded mode where Yaws runs as an embedded webserver in another Erlang application.
Yaws is entirely written in Erlang, and furthermore it is a multithreaded webserver where one Erlang lightweight process is used to handle each client.
The main advantages of Yaws compared to other Web technologies are performance and elegance. The performance comes from the underlying Erlang system and its ability to handle concurrent processes in an efficient way. Its elegance comes from Erlang as well. Web applications don't have to be written in ugly ad hoc languages.
After some benchmarking, we decided to use it for the Yakaz website as a replacement for Apache, precisely for these reasons (and because we have some Erlang skills).
After seven months in production, it has proven to scale very well and to be a very stable server. However, to meet all our needs, we had to patch it to add some missing features and to fix some bugs and unexpected behaviours.
Here you can find a fork of Yaws with all our changes. Feel free to use it; feedback is welcome. Each modification was made in a separate, independent branch to make integration easier, but the master branch contains all of them and will be kept up to date with klacke's repository (as far as possible).
We added an application upgrade file template, used to build the final yaws.appup file.
This file lets us perform live upgrades and downgrades.
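For reference, an .appup file is a single Erlang term of the form {Vsn, UpFrom, DownTo}, as described in the OTP appup documentation. The fragment below only illustrates what a generated yaws.appup could look like: the version strings and the single load_module instruction are hypothetical placeholders, not the output of our template.

```erlang
%% yaws.appup: {NewVsn, [{FromVsn, UpgradeInstructions}], [{ToVsn, DowngradeInstructions}]}
%% The versions and the module list below are hypothetical placeholders.
{"1.85",
 %% Upgrading from the previous version: reload the changed module.
 [{"1.84", [{load_module, yaws_server}]}],
 %% Downgrading back to the previous version: reload the old module.
 [{"1.84", [{load_module, yaws_server}]}]}.
```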
This patch has two parts.
The first one improves the implementation of authentication mechanisms, e.g. authentication inside all docroots and recursion through subdirectories (if you define an authentication policy for a directory, it is automatically applied to all its subdirectories, except those with their own authentication policy). It also adds the docroot directive to auth structures (or .yaws_auth files) to restrict a policy to a specific docroot (no docroot configured means all docroots).
The second one adds ACLs (Access Control Lists), like Apache's mod_access, to protect data. You can add directives to auth structures (or .yaws_auth files) to control access to particular parts of the server based on the client IP address. The allow and deny directives specify which clients are or are not allowed to access the server, while the order directive sets the default access state and configures how the allow and deny directives interact with each other.
Here is an example:
<server www.yakaz.com>
    ...
    <auth>
        docroot = /var/www/yakaz/admin
        deny = all
        allow = 127.0.0.1, 192.168.0.0/24
        order = deny,allow
    </auth>
    ...
</server>
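To make the interaction of the three directives concrete, here is a minimal sketch of Apache-style evaluation for IPv4 clients. It assumes the fork follows Apache's mod_access semantics (deny,allow: allowed by default, and a matching allow overrides a deny; allow,deny: denied by default, and a client must match an allow and no deny). The module acl_sketch, its functions and the rule encoding (all or {Network, PrefixLen}) are hypothetical, not the fork's code.

```erlang
-module(acl_sketch).
-export([allowed/4, match/2]).

%% Hypothetical sketch (not the fork's code) of Apache-style allow/deny/order
%% evaluation for IPv4 clients. A rule is `all` or {NetworkTuple, PrefixLen}.

%% match(IP, Rule) -> boolean(): does the client IP fall in the rule's network?
match(_IP, all) ->
    true;
match(IP, {Net, Len}) when Len >= 0, Len =< 32 ->
    prefix(IP, Len) =:= prefix(Net, Len).

%% Keep only the first Len bits of the address.
prefix({A, B, C, D}, Len) ->
    <<Int:32>> = <<A, B, C, D>>,
    Int bsr (32 - Len).

matches_any(IP, Rules) ->
    lists:any(fun(Rule) -> match(IP, Rule) end, Rules).

%% order = deny,allow: allowed by default; deny rules are checked first and
%% a matching allow rule overrides them.
allowed(IP, deny_allow, Allow, Deny) ->
    matches_any(IP, Allow) orelse not matches_any(IP, Deny);
%% order = allow,deny: denied by default; the client must match an allow
%% rule and no deny rule.
allowed(IP, allow_deny, Allow, Deny) ->
    matches_any(IP, Allow) andalso not matches_any(IP, Deny).
```

With the rules from the example above, allowed({192,168,0,42}, deny_allow, [{{127,0,0,1},32}, {{192,168,0,0},24}], [all]) returns true, while any address outside those two networks is refused because of deny = all.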
We can configure Yaws to receive large POSTs in chunks to reduce memory usage. This is done with the partial_post_size directive in the server part of the configuration file, and it is useful for uploading files to the web server.
However, currently, if a client sends a request body using chunked Transfer-Encoding, Yaws reads all of it without taking the value of partial_post_size into account.
This is not a common case, because chunked Transfer-Encoding is usually used for HTTP responses, but the HTTP/1.1 specification (RFC 2616, section 3.6.1) states that:
All HTTP/1.1 applications MUST be able to receive and decode the "chunked" transfer-coding
So a client sending a very large chunked POST can make the Erlang VM run out of memory: this is a DoS bug. The changes in this branch fix it by reading such requests block by block, like any other HTTP request.
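In a chunked body, each chunk is a hexadecimal size line (optionally followed by extensions after ";"), CRLF, the data, and CRLF; a chunk of size 0 ends the body. The minimal sketch below shows the idea of handing each chunk to a callback instead of accumulating the whole body. The module chunked_sketch and its functions are hypothetical, and for brevity the body is a binary already in memory rather than data read from the socket.

```erlang
-module(chunked_sketch).
-export([parse_size/1, fold_chunks/3]).

%% Hypothetical sketch (not the fork's yaws code): decode a chunked request
%% body one chunk at a time and hand each chunk to Fun, so the body never has
%% to be held in memory as a whole. For brevity the input is a binary; a
%% server would read each size line and chunk from the socket instead.

%% Chunk-size line, e.g. <<"1a;name=value">> -> 26 (extensions are ignored).
parse_size(Line) ->
    [Hex | _] = binary:split(Line, <<";">>),
    binary_to_integer(string:trim(Hex), 16).

fold_chunks(Body, Fun, Acc) ->
    [SizeLine, Rest] = binary:split(Body, <<"\r\n">>),
    case parse_size(SizeLine) of
        0 ->
            Acc;  % last chunk; trailer headers are not handled here
        Size ->
            <<Data:Size/binary, "\r\n", Rest2/binary>> = Rest,
            fold_chunks(Rest2, Fun, Fun(Data, Acc))
    end.
```

In a server, Fun would be where each block is passed on to the application, in the same way partial_post_size already delivers non-chunked POSTs in pieces.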
Add "application/javascript" to the compressible MIME types.
Like Apache's mod_expires, the expires directive controls the Expires HTTP header and the max-age directive of the Cache-Control HTTP header in server responses. The expiration date can be relative either to the time the source file was last modified or to the time of the client access. A virtual server can have multiple expires directives.
Here is an example (2592000 seconds is 30 days):
<server www.yakaz.com>
    ...
    expires = <image/gif, access+2592000> <image/png, access+2592000>
    expires = <image/jpeg, access+2592000> <text/css, access+2592000>
    expires = <application/javascript, modify+2592000>
    ...
</server>
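A minimal sketch of how a rule such as <image/gif, access+2592000> or <application/javascript, modify+2592000> could be turned into the two headers, assuming both are derived from the same expiration time and that max-age is clamped to 0 once that time has passed. The module expires_sketch is hypothetical and formats the Expires date itself to stay independent of the local time zone.

```erlang
-module(expires_sketch).
-export([expires/3]).

%% Hypothetical sketch (not the fork's code) of turning an expires rule into
%% the Cache-Control max-age value and the Expires header. Now and MTime are
%% Gregorian seconds in UTC, as produced by
%% calendar:datetime_to_gregorian_seconds/1.
expires({access, Secs}, Now, _MTime) ->
    headers(Now + Secs, Now);
expires({modify, Secs}, Now, MTime) ->
    headers(MTime + Secs, Now).

%% Returns {MaxAge, ExpiresHeader}; a date in the past gives max-age 0.
headers(ExpireAt, Now) ->
    DateTime = calendar:gregorian_seconds_to_datetime(ExpireAt),
    {max(0, ExpireAt - Now), rfc1123(DateTime)}.

%% Format a UTC datetime as an HTTP date, e.g. "Sun, 31 Jan 2010 00:00:00 GMT".
rfc1123({{Y, Mo, D} = Date, {H, Mi, S}}) ->
    Day = element(calendar:day_of_the_week(Date),
                  {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}),
    Mon = element(Mo, {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}),
    lists:flatten(io_lib:format("~s, ~2..0w ~s ~4..0w ~2..0w:~2..0w:~2..0w GMT",
                                [Day, D, Mon, Y, H, Mi, S])).
```

With access+N, max-age is always N; with modify+N, it shrinks as the file ages and reaches 0 once the modification time plus N is in the past.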
In Yaws, each group of virtual servers (same IP, same port) is represented by an Erlang process that maintains some counters. However, these counters were not updated correctly; this patch fixes that.
This small patch lets us do a hard reload of the configuration while the server is running, without stopping the application. This is needed when the configuration has changed too much (e.g. the IP address of a virtual server).
This is done by stopping all gserv processes and restarting them with the new configuration, so it is pretty brutal.
Support parsing URLs containing literal IPv6 addresses, such as http://[::1]:8080, in the yaws_api:parse_url function.
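The brackets matter because an IPv6 address contains colons, which would otherwise be confused with the host/port separator. The minimal sketch below splits an http URL into host, port and path, treating everything between "[" and "]" as the host. The module ipv6_url_sketch is hypothetical and much simpler than the real yaws_api:parse_url (no https, user info or query handling).

```erlang
-module(ipv6_url_sketch).
-export([parse/1]).

%% Hypothetical sketch, not the real yaws_api:parse_url: split an http URL
%% into {Host, Port, Path}, accepting a bracketed IPv6 literal as the host.
parse("http://" ++ Rest) ->
    {Authority, Path} = lists:splitwith(fun(C) -> C =/= $/ end, Rest),
    {Host, Port} = host_port(Authority),
    {Host, Port, case Path of "" -> "/"; _ -> Path end}.

%% "[::1]:8080" -> {"::1", 8080}; the brackets are only URL syntax, so they
%% are stripped and the address inside must be a valid IPv6 address.
host_port("[" ++ Rest) ->
    {Host, "]" ++ AfterBracket} = lists:splitwith(fun(C) -> C =/= $] end, Rest),
    {ok, _Addr} = inet:parse_ipv6strict_address(Host),
    {Host, port(AfterBracket)};
%% "www.yakaz.com:8080" -> {"www.yakaz.com", 8080}
host_port(Authority) ->
    {Host, AfterHost} = lists:splitwith(fun(C) -> C =/= $: end, Authority),
    {Host, port(AfterHost)}.

%% Default HTTP port when none is given.
port("") -> 80;
port(":" ++ Digits) -> list_to_integer(Digits).
```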
Recently, the logger_mod directive was added to the global configuration of Yaws to allow customized access logging.
This patch goes a little further: logger_mod is defined in the server part and can be used to customize both access and auth messages (the auth_log directive also moves to the server part and is deprecated in the global part). External logging modules must implement the new yaws_logger behaviour.
Sometimes a virtual server must be bound to several IP addresses. Currently, you must duplicate the virtual server configuration, changing only the address to listen on.
With this patch, multiple listen directives are allowed without any duplication. For example, if your server must listen on the loopback address over both IPv4 and IPv6, its server part may look like:
<server www.yakaz.com>
    port = 80
    listen = 127.0.0.1
    listen = ::1
    ...
</server>
Scripts can return {page, {Options, Page}} to make Yaws return a different page from the one requested, where Options is a deep list of options. For now, the only type of option is {header, H}.
This patch adds the {status, Code} option to set the HTTP status code of the response. It also delays the accumulation of these headers so that they are preserved while the request for Page is handled.
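A minimal sketch of a script using the new option. The module page_status_sketch, the page /maintenance.yaws and the Retry-After value are hypothetical; only the {page, {Options, Page}} return value and the {status, Code} and {header, H} options come from the description above.

```erlang
-module(page_status_sketch).
-export([out/1]).

%% Hypothetical out/1 callback: instead of the requested page, serve
%% /maintenance.yaws with a 503 status code and a Retry-After header.
%% The {status, Code} option is the one added by this patch.
out(_Arg) ->
    Options = [{status, 503},
               {header, "Retry-After: 120"}],
    {page, {Options, "/maintenance.yaws"}}.
```

Because the headers are only accumulated once Page is being handled, the Retry-After header set here should survive into the final response for /maintenance.yaws.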
When a script returns {page, P}, we renew the #sconf{} record. This is particularly useful to restart the checks on all docroots.
We replaced the phpfcgi directive with php_handler, which extends it. This new directive supports the following definitions:
- <cgi, Filename>: the default; Filename is the path to the PHP executable used to interpret PHP scripts (php_exe_path is deprecated).
- <fcgi, Host:Port>: does the same thing as the old phpfcgi directive.
- <extern, Module:Function | Node:Module:Function>: uses an external handler, possibly on another node, to interpret .php files.
The module used to rewrite #arg{} records can now return any HTTP response, for example to redirect a request or to return an error. To do so, it must set the #arg.state element to a #rewrite_response{} record.
The record #rewrite_response{} contains 3 elements:
- status: any valid HTTP status code
- headers: a list of {header, H}
- content: an iolist
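For example, to permanently redirect every request to the Yakaz home page:
-include_lib("yaws/include/yaws_api.hrl").  % #arg{} and #rewrite_response{} records

%% Answer every request with a 301 redirect instead of serving it.
arg_rewrite(Arg) ->
    L = "http://www.yakaz.com",
    RwResp = #rewrite_response{status = 301,
                               headers = [{header, {location, L}}]},
    Arg#arg{state = RwResp}.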
When Yaws is stopped, all gserv processes are shut down without gracefully closing the open HTTP connections.
With this patch, we first close the listening sockets, so that no new connections are accepted, and then try to shut down all open connections. If some connections are still alive after 60 seconds, we kill them.
Shutdown therefore takes longer, but every request being processed has a chance to receive its response before the connection is closed.
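A minimal sketch of the "ask, wait, then kill" part of this shutdown, assuming the listening sockets are already closed and that each connection process exits when it receives a stop message. The module shutdown_sketch and the stop message are hypothetical, not the fork's gserv code; in the fork the timeout would be the 60 seconds mentioned above (60000 ms).

```erlang
-module(shutdown_sketch).
-export([stop_workers/2]).

%% Hypothetical sketch of the graceful stop (not the fork's gserv code).
%% Each connection process gets a `stop` message and has Timeout ms to
%% finish; processes still alive after that are killed. Returns how many
%% processes had to be killed.
stop_workers(Pids, Timeout) ->
    Monitors = [{erlang:monitor(process, Pid), Pid} || Pid <- Pids],
    lists:foreach(fun(Pid) -> Pid ! stop end, Pids),
    Deadline = erlang:monotonic_time(millisecond) + Timeout,
    Alive = wait(Monitors, Deadline),
    lists:foreach(fun({Ref, Pid}) ->
                          erlang:demonitor(Ref, [flush]),
                          exit(Pid, kill)
                  end, Alive),
    length(Alive).

%% Drop each monitor whose process exits; give up at the deadline and
%% return the monitors of the processes that are still running.
wait([], _Deadline) ->
    [];
wait(Monitors, Deadline) ->
    Left = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {'DOWN', Ref, process, _Pid, _Reason} ->
            wait(lists:keydelete(Ref, 1, Monitors), Deadline)
    after Left ->
        Monitors
    end.
```

Using monitors rather than polling means the function returns as soon as the last connection finishes, and only waits the full timeout when some connection is stuck.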
Initialize listening SSL sockets with the same options as TCP sockets, especially the reuseaddr flag.
In Yaws, you can set the path to the configuration file with the --conf command-line argument. Otherwise, Yaws uses the default configuration file.
With this patch, you can use the config application parameter to set the path of the default configuration file.
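A minimal sketch of the resulting lookup order: --conf wins, then the config parameter of the yaws application, then the built-in default. The module conf_path_sketch and its arguments are hypothetical; application:get_env/2 is the standard OTP call for reading an application parameter.

```erlang
-module(conf_path_sketch).
-export([conf_path/2]).

%% Hypothetical sketch of the lookup order: an explicit --conf argument
%% (CmdLineConf, or undefined when absent) wins, then the `config`
%% parameter of the yaws application, then the built-in Default.
conf_path(undefined, Default) ->
    case application:get_env(yaws, config) of
        {ok, Path} -> Path;
        undefined -> Default
    end;
conf_path(CmdLineConf, _Default) ->
    CmdLineConf.
```

Like any OTP application parameter, config can be set in a sys.config file or on the command line with erl's -ApplName Par Val flag, e.g. erl -yaws config '"/etc/yakaz/yaws.conf"' (the path is only an example).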
Inspired by Apache's mod_bwshare module, this patch adds the shaper directive to control access to virtual servers.
Access can be controlled based on the client's IP address. It is also possible to throttle HTTP requests based on the client's download rate. External modules used to shape the traffic must implement the new yaws_shaper behaviour.
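Throttling on download rate comes down to simple arithmetic: a client allowed R bytes per second that has downloaded B bytes should have spent at least B / R seconds doing so, and if it did it faster, its next request can be delayed by the difference. The sketch below shows only that calculation. The module shaper_sketch and delay_ms/3 are hypothetical and do not reflect the callbacks of the yaws_shaper behaviour.

```erlang
-module(shaper_sketch).
-export([delay_ms/3]).

%% Hypothetical rate check, not the yaws_shaper callbacks: given how many
%% bytes a client downloaded over ElapsedMs and an allowed rate in bytes/s,
%% return how long (in ms) its next request should wait, or 0 if the client
%% is within the limit.
delay_ms(_Bytes, _ElapsedMs, infinity) ->
    0;
delay_ms(Bytes, ElapsedMs, MaxBytesPerSec) when MaxBytesPerSec > 0 ->
    %% Minimum time the download should have taken at the allowed rate.
    MinMs = (Bytes * 1000) div MaxBytesPerSec,
    max(0, MinMs - ElapsedMs).
```

For instance, a client limited to 100000 bytes/s that fetched 1000000 bytes in 4 seconds should have needed 10 seconds, so its next request would wait 6000 ms.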
Add the server_signature directive to the global part of the configuration to customize the Server HTTP header.
With this patch, we improve the output of the status command by adding session and connection counters for each group of virtual servers.
$> yaws --status
IP          Port  Connections  Sessions  Requests
127.0.0.1   443   0            1         3
127.0.0.1   80    0            1         2511

Uptime: 6 Days, 17 Hours, 6 Minutes