[Apache] Don’t use .htaccess files

Apache has a very convenient feature: loading configuration files from webapp directories on the fly. These .htaccess files allow you to tweak the server’s behavior without access to the standard configuration files. That’s a killer feature in a shared hosting environment, but outside that particular case you shouldn’t use it. Why? Because it has serious drawbacks!

Performance hit

When AllowOverride All is set, Apache looks for .htaccess files in every directory on the path of each served file. That’s pretty bad, but it gets even worse: the content of every .htaccess file is read again for each request. As you can imagine, that’s a serious performance hit.
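If you don’t rely on .htaccess files, you can disable the lookups entirely in your main configuration (the directory path here is illustrative):

```apache
<Directory /var/www/mysite>
    # Ignore .htaccess files entirely: no per-request lookups
    AllowOverride None
</Directory>
```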

Security concern

Not all parameters can be overridden in .htaccess files, but there is already a lot you can do with a little imagination. For example, an attacker could use an application vulnerability to modify a .htaccess file in order to add a new file handler, so that .jpeg files are now processed by PHP.
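Such a malicious .htaccess could look like this (a sketch; the exact handler name depends on the PHP setup):

```apache
# Treat uploaded 'images' as PHP scripts
AddHandler application/x-httpd-php .jpeg .jpg
```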

By doing this he can now bypass most webapp upload ‘protection’ mechanisms. You can be sure the next .jpeg upload will not be a picture of a busty girl but something way nastier, like a C99 shell.

Maintainability concern

Sometimes a customer will complain that a particular rewrite rule or a specific PHP setting doesn’t behave as expected. That’s weird, because you are sure you made the appropriate changes. Surprise surprise: the behavior is altered in a .htaccess file, hidden somewhere in a 5-level-deep directory structure.

Integrate .htaccess content into standard configuration files

Many applications come with their own .htaccess file. It generally contains all the rewrite rules required for them to work properly. Adapting its content to a standard Apache configuration file can be very annoying, because the syntax isn’t exactly the same.

Fortunately there is a small trick that will save you a lot of time and trouble: you can just copy the rewrite rules into your main Directory block and add a RewriteBase / line to it.
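For example, with typical front-controller rules moved out of a .htaccess file (the path is illustrative):

```apache
<Directory /var/www/mysite>
    RewriteEngine On
    # Required when the rules are moved out of a .htaccess file
    RewriteBase /
    # Send everything that isn't a real file or directory to index.php
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
</Directory>
```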

[Nginx] 413 Request Entity Too Large

This error appears when trying to upload a file bigger than 1MB on a platform using Nginx as an HTTP server or reverse-proxy/SSL off-loader. That’s because Nginx, by default, limits the size of the client request body.

You can change the limit using the client_max_body_size directive in the Nginx configuration file:

# vi /etc/nginx/nginx.conf
client_max_body_size <new_value>;

Note: if you want to disable this behavior, set the value to 0.
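The directive is valid in the http, server and location contexts, so you can also raise the limit only for a specific upload endpoint (the location path and size here are illustrative):

```nginx
server {
    # 50m applies only to this endpoint; the 1m default stays elsewhere
    location /upload {
        client_max_body_size 50m;
    }
}
```

Don’t forget to reload Nginx afterwards (for example with nginx -s reload).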

[Varnish] Reload configuration

As Varnish doesn’t have any persistent storage, you should avoid restarting it; a configuration reload is enough, except for modifications to health checks, backend definitions or the vcl_hash routine.

To reload the Varnish configuration you can use the distribution init script, or do it by hand:

# varnishadm -S /etc/varnish/secret  -T 127.0.0.1:6082
vcl.load conf_two /etc/varnish/prod.vcl
vcl.use conf_two
quit

Note that you can use these commands to load multiple configurations and switch between them on the fly.

[Varnish] HTTP Authentication

Putting basic HTTP authentication in place on a website behind Varnish can be a little tricky.

If you add the authentication on the backend without any change to the Varnish configuration, then as soon as one authenticated client browses the website, the filtered content will be cached and served to every other user.

To avoid this behavior you can add the following block to your vcl_recv:

if (req.http.Authorization) {
   return (pass);
}

but that means bypassing the cache entirely for the filtered content. If you want to protect the whole website, this setting makes Varnish completely useless. Fortunately, it is possible to implement the HTTP authentication on the Varnish side.

Encode user and password

As Varnish is a reverse-proxy and not an HTTP server, there is nothing surprising in the fact that it has no support for password files. To work around this limitation, we will put the login and password directly inside Varnish’s configuration, encoded as a base64 string, generated like this:

# echo -n "$LOGIN:$PW" | base64
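For example, with the illustrative credentials admin / secret:

```shell
# -n is important: a trailing newline would end up in the encoded
# string and the comparison in the VCL would never match
echo -n "admin:secret" | base64
```

This prints YWRtaW46c2VjcmV0.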

VCL modifications

In vcl_recv, add this block:

if (!(req.http.Authorization ~ "Basic XXXXXXXXXXXX")) {
   error 401 "Restricted";
}

where XXXXXXXXXXXX is your base64 encoded string.

Then add this to vcl_error:

if (obj.status == 401) {
   set obj.http.Content-Type = "text/html; charset=utf-8";
   set obj.http.WWW-Authenticate = "Basic realm=Secured";
   synthetic {"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML>
<HEAD>
<TITLE>Unauthorized</TITLE>
<META HTTP-EQUIV='Content-Type' CONTENT='text/html;'>
</HEAD>
<BODY><H1>401 Unauthorized</H1></BODY>
</HTML>
  "};
  return (deliver);
}

[Varnish] Basic performance tuning

Storage engine

Varnish has two storage engines: file and malloc. Both need you to specify (in the /etc/default/varnish file on Debian) the amount of memory used for caching objects. The difference between the two is how Varnish accesses the cached content.

With the file storage, Varnish creates a file on disk and maps it into virtual memory using the mmap system call. All operations are done through this mapping. This mode is pretty simple, very ‘POSIX-compliant’ and therefore portable, but… not so great for high performance, even if you use very fast drives.

Now that doesn’t mean the file storage is completely useless. It’s a good choice for the particular case where you can’t fit all the needed content in RAM (and adding an SSD-like card such as a Fusion-io isn’t an option), because even then your IO subsystem will still beat content generation. But it shouldn’t be your first pick.

Next we have the malloc storage, where Varnish directly reserves a chunk of memory for each cached object, using the system call of the same name. This is the fastest type of cache, as memory access latency and throughput are a few orders of magnitude better than even the fastest SSD drives. My recommendation is to always pick this storage first, and only switch to file as a last resort.
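On Debian the storage engine is selected with the -s flag in /etc/default/varnish (sizes and paths below are illustrative):

```
# /etc/default/varnish (excerpt)
DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,2G"
# file storage variant:
#            -s file,/var/lib/varnish/varnish_storage.bin,10G
```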

Worker threads

Worker threads are the ones that deliver cached objects to clients. You can tune them through three parameters: thread_pools, thread_pool_min and thread_pool_max.

thread_pools is the number of pools that threads will be grouped into. The default value is 2. You can increase the number of pools, but you shouldn’t exceed the number of CPU cores.

thread_pool_min defines the minimum number of threads that must always be kept alive per pool. As creating threads is a time-consuming operation, it’s better to always keep at least 50 threads alive. In case of a sudden traffic spike, you will have enough threads to handle the first wave of requests while Varnish spawns new ones.

thread_pool_max defines the maximum number of threads per pool. This parameter is trickier, as the ideal value depends on the available resources and the traffic you want to handle. Usually you don’t want to go over the 5000 threads specified in the documentation.
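These parameters can be set at startup with -p flags, or changed at runtime through varnishadm (the values below are illustrative):

```
# at startup, in DAEMON_OPTS:
#   -p thread_pools=2 -p thread_pool_min=50 -p thread_pool_max=1000
# at runtime:
varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082 param.set thread_pool_min 50
```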

Varnish shared memory log

The VSL file is used to log most traffic data. It operates as a circular, non-persistent buffer of 80MB by default. This file is used by tools like varnishlog or varnishtop. On a Debian system you can find it in the /var/lib/varnish directory. To improve performance you can move the whole directory onto a tmpfs.
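A minimal /etc/fstab entry for that could look like this (pick a size bigger than the VSL buffer):

```
tmpfs  /var/lib/varnish  tmpfs  rw,size=128M  0  0
```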

Varnish

Varnish is a reverse-proxy built exclusively for web acceleration. It speeds up websites by caching their content in memory and serving the requested objects (HTML pages, images, CSS, JS, etc.) directly instead of the HTTP server.

It not only reduces user-perceived page load time, but also increases platform load capacity. Varnish is licensed under the two-clause BSD license, but commercial support is also available from Varnish Software.

Architecture / Performance

The big plus of Varnish compared to a classic proxy like Squid used in reverse mode is its ability to store data in virtual memory and let the kernel decide what to do with it (keep the object in RAM, swap it out, etc.). This behavior makes Varnish very memory efficient.

Varnish is also massively multi-threaded, queuing incoming connections only when it reaches its limit or when multiple users ask for the same non-cacheable resource. Also, in order to reduce system calls, Varnish logs requests in shared memory. In a nutshell: Varnish is clearly built for performance.

Configuration

Configuration is done via a specific ‘pseudo-language’ called VCL. VCL is translated into C code and compiled as a shared object when the configuration is loaded. If you know what you’re doing, you can even embed C code in a VCL file to extend Varnish’s functionality.

Speaking of functionality, Varnish supports round-robin load balancing and most of the ESI standard tags, which greatly simplifies the inclusion of dynamic content into ‘static’ cacheable pages.

Pound

Pound is a tiny reverse-proxy, load balancer and SSL offloader. It’s not a caching proxy like Varnish, but its simplicity and light footprint make it a good choice as an HTTPS front-end for a moderate-traffic platform.

Create a PEM file

Pound uses the PEM format. A single PEM file can contain all the needed pieces (public certificate, intermediate certificate, root certificate and private key).

To assemble your certificate files into a single PEM file usable by Pound:

# cat server.key > cert.pem
# cat your.domain.tld.crt >> cert.pem
# cat intermediate.crt >> cert.pem

Disable SSLv3

To improve security you can disable the SSLv3 protocol. You need at least the patched version 2.6 to do that. Add the DisableSSLv3 directive inside your ListenHTTPS block.
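A minimal sketch of such a block (address, port and paths are illustrative):

```
ListenHTTPS
    Address 0.0.0.0
    Port    443
    Cert    "/etc/pound/cert.pem"
    DisableSSLv3
End
```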

Improve cipher selection

To improve security you can also disable old/weak ciphers. Redefine the cipher selection like this:

Ciphers    "EECDH+ECDSA+AESGCM:EECDH+aRSA+AESGCM:EECDH+ECDSA+SHA384:EECDH+ECDSA+SHA256:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH:EDH+aRSA:-RC4:EECDH+aRSA+RC4:EECDH+RC4:EDH+aRSA+RC4:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS:RC4+SHA"

Self-contained maintenance page

Sometimes we need to take our web apps offline temporarily, be it for a major upgrade or in an emergency situation. Either way, you should at least set up a static page to notify regular users that the site is currently unavailable. And it’s even better if you can maintain access for you and your fellow developers in the meantime.

Create a maintenance page

The first step is to create a good maintenance page. Good means clean, simple, visually attractive and, most importantly, self-contained! Every required resource should be embedded inside a single “simple” HTML file.

How to do that?

For CSS, use a style tag to embed the content in the HTML like this:

<style media="screen" type="text/css">
 ... Add CSS data here ...
</style>

For images, encode them in base64 using the base64 command and add the resulting string to the HTML like this:

<img src="data:image/gif;base64,BASE64STRING" alt="my_image" />

You have a good maintenance page? Perfect, now let’s talk about how to serve it.

Serve the maintenance page

The first component that receives the users’ requests should be the one serving the page. Depending on your stack, that means either the:

  • HTTP server should serve the page
  • HTTP reverse-proxy cache (aka. Varnish) should serve the page

In the first case I highly suggest making a dedicated maintenance VirtualHost on a separate port.
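Such a VirtualHost could look like this (port and paths are illustrative); every URL answers with a 503 status and the self-contained page:

```apache
Listen 81
<VirtualHost *:81>
    DocumentRoot /var/www/maintenance
    RewriteEngine On
    # Answer 503 for everything except the maintenance page itself,
    # which ErrorDocument then serves via an internal redirect
    RewriteCond %{REQUEST_URI} !=/index.html
    RewriteRule ^ - [R=503,L]
    ErrorDocument 503 /index.html
</VirtualHost>
```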

In the second case you must modify your vcl_error section with something like this:

if (obj.status == 703) {
    set obj.status = 503;
    set obj.http.Content-Type = "text/html; charset=utf-8";
    synthetic {"
           ... Add HTML content here ...
    "};
    return(deliver);
}

Note the fake HTTP code 703. We will use it later 😉

Maintain access for you, display maintenance page for others

If your maintenance page is served by Varnish, use an ACL for IP filtering like this:

acl me_and_dev {
    "1.2.3.4"/32;
}

sub vcl_recv {
    if (!client.ip ~ me_and_dev) {
            error 703 "Service unavailable";
    }
}

If your maintenance page is served by Apache, use an iptables rule to redirect traffic to the maintenance VirtualHost‘s port, except for your IP:

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 ! -s 1.2.3.4 -j REDIRECT --to-ports 81

Note that if you can’t use IP filtering, other solutions are possible, like using a special maintenance cookie set after requesting a “secret URL”.
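That cookie-based variant could look like this in Varnish (cookie name and secret URL are illustrative, and 751 is another fake status used internally):

```
sub vcl_recv {
    # requesting the secret URL sets the bypass cookie
    if (req.url == "/secret-maintenance-door") {
        error 751 "bypass";
    }
    if (!(req.http.Cookie ~ "maintenance_bypass=1")) {
        error 703 "Service unavailable";
    }
}

sub vcl_error {
    if (obj.status == 751) {
        set obj.status = 302;
        set obj.http.Location = "/";
        set obj.http.Set-Cookie = "maintenance_bypass=1; path=/";
        return (deliver);
    }
}
```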

[Apache] Reload vs restart

People are often confused by the difference between the reload and the restart operations. That leads to a lot of questions like “Can I do a simple reload when changing a parameter in the mod_php php.ini file?” or “Does a reload interrupt the service?”

First of all, it’s important to understand that the reload operation doesn’t really exist, not in a formal sense. When doing a reload, the parent process does reload its configuration, but as it doesn’t do anything by itself except spawning child processes, the new configuration isn’t effective right away. The parent process then sends a graceful signal to each of its children, telling them to exit after finishing their current request (or to exit immediately if they’re not serving anything). As each child dies off, the parent replaces it with a new one.

So a reload operation doesn’t interrupt the service, but its effect isn’t immediate. If you need an immediate effect you must do a restart, which “violently” kills all the child processes. The reload operation also has one big limitation: it doesn’t take new files into account. To enable a new module or after changing an SSL certificate, you must use the restart operation.
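With apachectl, the two operations map to:

```
# reload: graceful, children finish their current request first
apachectl graceful
# restart: immediate, all children are killed right away
apachectl restart
```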