For the last 5 months I’ve been doing a lot of work on Amazon Web Services (AWS) for my new job as a Cloud Architect at Nordcloud Sweden. Learning how to build applications that take full advantage of The Cloud has made me very eager to redo some of my previous projects and rebuild them for AWS. In this blog post I’ll start off with the best way to run a Drupal 7 website on AWS.
While this blog post is written with Drupal 7 as an example, it could easily be adapted for any other PHP based application.
1. The current Drupal server setup
If you are a Drupal builder, you are most likely using a combination of two typical web server setups for your production sites:
- A Shared Hosting server, where multiple websites run on the same server
- A dedicated Virtual Private Server (VPS) per website
How you deploy your code, and whether you use Docker or not, is currently not relevant. The main thing is that you have dedicated (virtual or physical) servers that run 24/7 with the exact same hardware configuration.
On these servers you probably have this software stack installed:
- nginx or apache
- MySQL/MariaDB database
- A local disk where your user content gets uploaded to
- Shell access via ssh so you can run drush and cronjobs
- Maybe an Apache Solr server for search indexing
- Maybe a varnish cache in front of the web server
- Maybe a memcached bin to offload your database
All of this is managed by you, or maybe by a hosting company that does it for you, using some kind of provisioning tool like Chef or Puppet. Making changes to this setup is hard, and keeping the setup in sync with your development stack is probably even harder (even when you use Docker).
If you use a managed hosting provider you already got rid of being responsible for the hardware, but you still run the same kind of static server setup that you would have if you did it yourself.
Problems with this setup
Problem 1: There are many single points of failure in this setup: a lot of non-redundant single-instance services are running on the same server. If any component crashes, your entire site is offline.
Problem 2: The CPU/RAM of this server does not scale up or down automatically depending on the server load. Changing the hardware configuration always requires manual intervention. If you get an unexpected traffic boost, this might cause your server to go down.
Problem 3: The whole setup is constantly running at full power, no matter what the current load is. This is a waste of resources and, even worse, of your money.
2. Moving things to AWS
So let’s see now how we can move this setup to AWS, and while doing so, get rid of the problems from the previous paragraph.
When you move this web server setup to the cloud you can basically do it two ways: the wrong way and the right way.
The wrong way: lift and shift
If you just see AWS as another managed hosting provider you could go for the lift and shift solution. In this scenario you re-create your entire server just like you did in the old setup. You run a single EC2 instance (= the AWS equivalent of a virtual server) with your full stack inside of it.
This works of course, but it does not scale, it’s not redundant and it will probably cost you more than running your old setup. So it doesn’t fix any of the problems we’ve described in the previous chapter.
AWS has a tool to calculate the cost of such a move, called the TCO Calculator. Just keep in mind that if you just compare the cloud cost to your own datacenter cost using the same hardware setup you are not using the cloud the right way and you will pay a lot more than you should.
The right way: build your application for AWS
Before we continue to optimize our setup for AWS I have to explain two AWS concepts that are important to understand: Managed Services and High-Availability.
High-Availability (HA) is a concept that you will see pop up everywhere when using AWS. It’s about not having a single point of failure in your setup by using redundant setups using the tools AWS offers you.
An important part of HA setups is the concept of regions and availability zones (AZ). Each region has several availability zones, which are independent data centers that can communicate with each other as if they were a local network.
- Region: eu-west-1 (Ireland)
- Availability Zones (AZ): eu-west-1a, eu-west-1b, eu-west-1c
Certain things are automatically replicated across the AZs of a region (e.g. all the managed services we’ll see in the next topic), but you’re also required to use them intelligently yourself. For a web server setup using EC2 instances, for example, you would create 2 servers, each in a different AZ, and put an Elastic Load Balancer (which is also HA since it’s a managed service) in front of them. If one of the servers goes down, or even a whole AZ, the load balancer will keep working and only send traffic to the server in the AZ that is still working.
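To make the failover behaviour a bit more concrete, here is a minimal Python simulation of a load balancer in front of two servers in different AZs. It is only a sketch: the `Server` class, the server names and the round-robin strategy are assumptions made for illustration, not how the AWS load balancer is actually implemented.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Server:
    name: str
    az: str               # availability zone the server runs in
    healthy: bool = True


def healthy_targets(servers):
    """Return only the servers the load balancer may send traffic to."""
    return [s for s in servers if s.healthy]


def route(servers, n_requests):
    """Distribute n_requests round-robin over the healthy servers."""
    targets = healthy_targets(servers)
    if not targets:
        raise RuntimeError("no healthy servers left in any AZ")
    rotation = cycle(targets)
    return [next(rotation).name for _ in range(n_requests)]


# Hypothetical servers, one per AZ.
servers = [
    Server("web-a", "eu-west-1a"),
    Server("web-b", "eu-west-1b"),
]

normal = route(servers, 4)       # traffic alternates between both AZs

# Simulate losing the whole eu-west-1a zone.
for s in servers:
    if s.az == "eu-west-1a":
        s.healthy = False

degraded = route(servers, 4)     # every request now lands in eu-west-1b
```

The site stays reachable as long as at least one AZ still has a healthy server, which is exactly why the two servers must not share an AZ.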
In a Lucid Chart diagram this HA setup would look like this:
AWS Managed Services are, simply put, the usual services from your software stack, but managed by Amazon. They are offered as highly available software-as-a-service, where you don’t have to worry about anything other than using them.
For our Drupal 7 setup we’ll be using these AWS Services:
- Web servers: Amazon EC2 (EC2 instances, Elastic Load Balancer, Auto Scaling Groups)
- Database: Amazon RDS (MySQL, MariaDB or even Aurora if you want)
- Configuration files and User uploaded content: Amazon S3
- Key/value caching server: Amazon ElastiCache (memcached)
- Reverse proxy content cache: Amazon CloudFront
Now that we have all the AWS tools explained, let’s go build our Drupal 7 site using them.
3. Building Drupal 7 on AWS
To deal with the problems we had when running on a Shared Hosting or VPS server we have to make sure we cover these two items:
- Our setup needs to have High-Availability: no single point of failure
- It has to have automatic scaling: scale in and out when needed
Scaling up and down means increasing or decreasing the amount of RAM or CPU cores in a system, while scaling out and in means adding more similar servers to a setup or removing some of them. Scaling in and out obviously only works if you have a load balancer that distributes traffic among the available servers.
Look at this Lucid Chart diagram to get an idea of what the final stack will look like:
Database: AWS RDS MySQL
The database is probably the easiest component to configure in our setup: we simply use an Amazon RDS MySQL instance. We connect to it using the something.amazonaws.com hostname and the username and password we supply.
We can make this stack HA by using the Multi-AZ option. This is not a master-master setup, but a standby instance in a different AZ that AWS will fail over to in the event the main one goes down. You do not need to configure anything for this: AWS automatically points the hostname at the standby instance.
Backups of the RDS instance are taken by using daily snapshots, which will be enabled by default for any RDS database you create.
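As a rough sketch, the database settings described above could be collected like this. The dictionary uses parameter names that boto3’s `rds.create_db_instance()` accepts, but nothing is sent to AWS here. The identifier, the credentials and the 7 day backup retention are placeholder assumptions; the instance class and storage size are the ones used in the cost calculation further down.

```python
def rds_instance_params(identifier, username, password):
    """Build keyword arguments in the shape accepted by boto3's
    rds.create_db_instance(). This only builds a dict; no AWS call is made."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.t2.medium",
        "AllocatedStorage": 10,        # size in GB
        "MultiAZ": True,               # standby instance in a second AZ
        "BackupRetentionPeriod": 7,    # days of automated backups (assumed value)
        "MasterUsername": username,
        "MasterUserPassword": password,
    }


# Hypothetical identifier and credentials, for illustration only.
params = rds_instance_params("drupal7-db", "drupal", "change-me")
```

With boto3 installed and credentials configured, these could be passed on as `boto3.client("rds").create_db_instance(**params)`.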
Upload content: Amazon S3
Since our setup will include web servers running on AWS EC2 that will scale in and out depending on the usage, we cannot have any permanent data inside of them. All the content that gets uploaded by Drupal will have to be stored in a central file storage that is accessible by all web servers: Amazon S3.
Drupal cannot use S3 out of the box, but there are modules available (https://drupal.org/download) to achieve this. When writing this blog post I was still experimenting to find out which one was best suited for the task. I’ll update this post later on with my findings.
While S3 has versioning support, it’s not a bad idea to have a second AWS account copy all the files from S3 every day, hour or even when they get created.
Besides the user uploaded content we will store another type of files in S3: configuration files used by instances and load balancers. More about this later.
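To picture what “central file storage” means for Drupal’s files, here is a small sketch that maps a Drupal file URI to a location in S3. The bucket name and the key layout are hypothetical, and this is not how any particular Drupal S3 module does it. The point is only that every web server resolves the same URI to the same object.

```python
def s3_location(uri, bucket="example-drupal-files"):
    """Map a Drupal stream URI such as public://images/logo.png to a
    (bucket, key) pair. Bucket name and key layout are hypothetical."""
    scheme, sep, path = uri.partition("://")
    if not sep or scheme not in ("public", "private"):
        raise ValueError(f"unsupported file URI: {uri!r}")
    # Prefix the key with the scheme so public and private files stay apart.
    key = f"{scheme}/{path.lstrip('/')}"
    return bucket, key


location = s3_location("public://images/logo.png")
```

Because the mapping depends only on the URI, it does not matter which EC2 instance handled the upload or which one serves the file later.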
Memcached: AWS ElastiCache
There’s not much to say about ElastiCache. Simply create a memcached server and configure your instances to use it.
Caching: AWS CloudFront
CloudFront is Amazon’s CDN service with edge locations all over the world. The most important thing to know here is that invalidating cached content is not easy, so you should pretty much rely on your Drupal site setting the correct cache headers for each response it serves. If you need to clear your entire cache, it might be easier (and cheaper) to just create a new CloudFront distribution and delete the old one.
We use CloudFront like you would use any other cache: just put it in front of the web server. In this case it will be put in front of the Elastic Load Balancer (see next topic) with the DNS record for our site pointing to the CloudFront distribution.
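Since the cache in front of the site relies on the cache headers of each response, it helps to think of those headers as one explicit policy. The sketch below is an assumption for illustration, not Drupal’s actual logic: the TTL values, the extension list and the `logged_in` flag are made up.

```python
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".svg", ".woff")


def cache_control(path, logged_in=False, page_ttl=300, asset_ttl=86400):
    """Pick a Cache-Control header value for a response.
    All TTLs (in seconds) are assumed example values."""
    if logged_in:
        # Personalised pages must never be served from a shared cache.
        return "no-cache, must-revalidate"
    path_only = path.split("?", 1)[0]      # ignore any query string
    if path_only.lower().endswith(STATIC_EXTENSIONS):
        return f"public, max-age={asset_ttl}"
    return f"public, max-age={page_ttl}"


asset_header = cache_control("/misc/drupal.js?v=7.5")
page_header = cache_control("/node/1")
```

Short TTLs on pages keep stale content from living long in the CDN, which reduces the need for explicit invalidations.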
Web servers and AutoScaling: Amazon EC2
Now we get to the core of the setup: the actual web servers. We will be using a set of AWS EC2 services to accomplish that task.
Let’s start by pointing out that our Drupal code is in a Docker image, pushed to a (public or private) registry. The EC2 instances can reach the registry and can pull the images without authentication.
Configuring our EC2 instances will be done by something called a Launch Configuration. A Launch Configuration can best be seen as a configuration file that will be used by an Auto Scaling Group to create servers. The Launch Configuration contains the base server image to be used, the type of EC2 instance to be used, some other things I will not go into detail about here, and most important: the user-data script.
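As a sketch of what a Launch Configuration holds, the snippet below collects those pieces in a dictionary, using parameter names that boto3’s `autoscaling.create_launch_configuration()` accepts. Nothing is sent to AWS. The configuration name, the AMI ID, the Docker image name and the exact commands in the user-data script are placeholders I made up for illustration, and the package commands assume an Amazon Linux style image.

```python
# Minimal placeholder user-data script; the image name is hypothetical.
USER_DATA = """#!/bin/bash
# Runs once when the instance boots for the first time.
yum install -y docker
service docker start
docker run -d -p 80:80 example/drupal7:latest
"""


def launch_configuration_params(name, image_id, user_data,
                                instance_type="t2.small"):
    """Build keyword arguments in the shape accepted by boto3's
    autoscaling.create_launch_configuration(). No AWS call is made."""
    if not user_data.startswith("#!"):
        raise ValueError("user-data script must start with a shebang line")
    return {
        "LaunchConfigurationName": name,
        "ImageId": image_id,             # base server image (AMI)
        "InstanceType": instance_type,   # type of EC2 instance
        "UserData": user_data,           # script run at first boot
    }


# Hypothetical name and AMI ID.
params = launch_configuration_params("drupal7-web-v1", "ami-00000000", USER_DATA)
```

Launch Configurations cannot be edited after creation, so putting a version in the name (as in `drupal7-web-v1`) is one convenient way to roll out changes.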
The user-data script is simply a bash shell script that we will use to install the required software on the web servers:
- Install certain OS packages we need (e.g. aws-cli, docker)
- Install extra packages using simple curl commands (e.g. docker-compose)
- Configure rsyslog monitoring (if we don’t use it via docker-compose)
- More things as you like
- And as last step: start the Drupal Docker container.
The user-data script will also handle the creation of a custom settings.php file for Drupal. It will overwrite the default one inside the Drupal Docker container with our values for the database, the memcached server, etc.
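Generating settings.php could look roughly like the sketch below. It is written in Python to keep the examples in this post in one language; a real user-data script might do the same with a few lines of bash. The host names and credentials are placeholders, and the memcache lines assume the Drupal 7 memcache module is installed under `sites/all/modules/memcache`.

```python
from string import Template

# "$$" produces a literal "$" in the rendered PHP file.
SETTINGS_TEMPLATE = Template("""<?php
$$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => '$db_name',
  'username' => '$db_user',
  'password' => '$db_password',
  'host' => '$db_host',
  'port' => '3306',
);
$$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$$conf['cache_default_class'] = 'MemCacheDrupal';
$$conf['memcache_servers'] = array('$memcache_host:11211' => 'default');
""")


def php_escape(value):
    """Escape a value for use inside a single-quoted PHP string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def render_settings(**values):
    """Fill in the template; raises KeyError if a value is missing."""
    escaped = {key: php_escape(val) for key, val in values.items()}
    return SETTINGS_TEMPLATE.substitute(escaped)


# Placeholder values; in practice these come from your RDS and
# ElastiCache endpoints.
settings_php = render_settings(
    db_name="drupal",
    db_user="drupal",
    db_password="change-me",
    db_host="db.example.internal",
    memcache_host="cache.example.internal",
)
```

The rendered string is what would be written over the default settings.php inside the container before Drupal starts.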
This Launch Configuration will now be used by an Auto Scaling Group (ASG) to fire up a set of instances. This ASG can become intelligent if you connect it to Amazon CloudWatch, where it will create or remove instances by monitoring certain metrics (server load, RAM usage, …), but it can also be quite simple and just keep a single web server running in each available AZ all the time.
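The decision such a metric-driven ASG makes can be sketched as a small function. The CPU thresholds and the minimum and maximum sizes are assumptions for illustration; adding or removing one instance per AZ at a time mirrors the pair-wise scaling used in the cost calculation later in this post. This is a simplified model, not the actual AWS scaling algorithm.

```python
def desired_capacity(current, avg_cpu, *, az_count=2, min_size=2, max_size=8,
                     scale_out_above=70.0, scale_in_below=30.0):
    """Return how many instances the group should run.
    Thresholds (in % CPU) and size limits are assumed example values."""
    step = az_count                      # one instance per AZ at a time
    if avg_cpu > scale_out_above:
        target = current + step          # scale out
    elif avg_cpu < scale_in_below:
        target = current - step          # scale in
    else:
        target = current                 # load is fine, change nothing
    # Never go below the minimum or above the maximum group size.
    return max(min_size, min(max_size, target))


busy = desired_capacity(2, avg_cpu=85.0)    # high load on 2 instances
quiet = desired_capacity(4, avg_cpu=10.0)   # low load on 4 instances
```

Setting `min_size` equal to the number of AZs gives the simple variant as well: one web server per AZ at all times, even when there is no load.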
The third component in our web server setup is an Elastic Load Balancer (ELB). The ASG will create servers and the ELB distributes traffic between them and performs the health checks. If a server becomes unhealthy, the ELB will remove it from the rotation and the ASG will terminate it. The ASG will then create a new one, which will be picked up by the ELB again and put into the load balancing rotation.
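The replacement cycle can be modelled in a few lines. This is a simplified sketch of the interplay, with made-up instance names: unhealthy instances leave the rotation and are terminated, and new ones are launched until the desired number is reached again.

```python
import itertools


def reconcile(instances, desired, new_name):
    """One pass of the health check and replacement cycle.

    instances maps an instance name to True (healthy) or False (unhealthy).
    new_name is a callable that returns a fresh instance name.
    Returns (in_rotation, terminated, launched)."""
    in_rotation = {name: ok for name, ok in instances.items() if ok}
    terminated = [name for name, ok in instances.items() if not ok]
    launched = []
    while len(in_rotation) < desired:
        name = new_name()
        in_rotation[name] = True         # new instance joins the rotation
        launched.append(name)
    return in_rotation, terminated, launched


# Hypothetical fleet where web-2 fails its health check.
counter = itertools.count(3)
fleet = {"web-1": True, "web-2": False}
fleet, terminated, launched = reconcile(
    fleet, desired=2, new_name=lambda: f"web-{next(counter)}"
)
```

After one pass the fleet is back at its desired size without any manual intervention.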
Together these 3 services - LC, ASG and ELB - create a setup that scales in and out when needed, exactly what we wanted for our Drupal 7 setup.
If this all sounds a bit difficult to visualize, check the AWS Auto Scaling article for a longer explanation with some examples.
Route 53 is AWS’s DNS service. While you can use any DNS service you want and just point CNAME records to AWS hostnames, I strongly recommend using Route 53. Because AWS internally updates IP addresses all the time, using a CNAME record might give you situations where DNS lookups go to the wrong IP.
To deal with this issue, AWS has created the ALIAS record, which lets you point to an internal AWS resource (ELB, CloudFront distribution, S3 location, …) and which won’t be affected by any downtime when IP addresses change.
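For reference, an alias record that points at a CloudFront distribution could be described like this. The dictionary follows the `ChangeBatch` shape that boto3’s `route53.change_resource_record_sets()` accepts, but no AWS call is made. The domain names are placeholders; `Z2FDTNDATAQYW2` is the fixed hosted zone ID that AWS documents for CloudFront alias targets.

```python
# Fixed hosted zone ID documented by AWS for CloudFront alias targets.
CLOUDFRONT_HOSTED_ZONE_ID = "Z2FDTNDATAQYW2"


def alias_change_batch(record_name, cloudfront_domain):
    """Build a Route 53 ChangeBatch that creates or updates an alias
    A record pointing at a CloudFront distribution."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": CLOUDFRONT_HOSTED_ZONE_ID,
                    "DNSName": cloudfront_domain,
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    }


# Placeholder domain names.
batch = alias_change_batch("www.example.com.", "d111111abcdef8.cloudfront.net.")
```

Note that an alias record has no TTL or IP address of its own: Route 53 answers with whatever addresses the target currently has.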
4. Did we solve all our problems?
Now that we’ve listed all the services we will be using to build our Drupal site, did we actually meet all the requirements we set out to achieve?
Do we have a highly available setup with no single points of failure? Yes. We either use Amazon HA services or we create services in 2 AZs at all times.
Is this a setup that automatically scales in and out without manual interaction? Yes. The combination of Launch Configuration, Auto Scaling Groups and Elastic Load Balancers takes care of that.
We have managed to turn our Drupal stack into a highly available, auto scaling setup, but what will it cost us to run? To get that cost, we use the Simple Monthly Calculator that Amazon provides.
Before we start calculating we have to make some decisions about which instance types and how much data usage we are talking about.
This is a very basic setup for now. I’m not going into detail about instance types, snapshot storage space, CloudWatch monitoring, etc… Adding these will of course increase the total cost of running your site on AWS.
- We use a db.t2.medium 10GB MySQL RDS database, with the Multi-AZ option
- An S3 bucket that contains 500GB of files
- A cache.t2.micro memcached instance, which is more than enough for our setup
- A CloudFront distribution that has worldwide edge location coverage
Our EC2 setup is as follows:
- We use 2 Availability Zones
- In each AZ we create a t2.small (1 CPU core, 2GB RAM) web server with a 30GB EBS root disk
- One Launch Configuration that handles creating the EC2 instances
- One Auto Scaling Group that scales out instances in pairs, one per AZ
- One Elastic Load Balancer, created in both AZs
We expect about 100GB traffic per month to our site.
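The arithmetic behind such an estimate is simple: quantity times unit price per line item. The sketch below only captures the quantities from the setup described above and deliberately contains no prices, since those differ per region and change over time. The item names and the 730 hours per month (8760 hours per year divided by 12) are my own assumptions.

```python
HOURS_PER_MONTH = 730   # average month: 8760 hours / 12

# Monthly quantities taken from the setup described above.
usage = {
    "ec2_t2_small_hours": 2 * HOURS_PER_MONTH,        # one web server per AZ
    "ebs_gb_month": 2 * 30,                           # 30GB root disk each
    "rds_db_t2_medium_multi_az_hours": HOURS_PER_MONTH,
    "rds_storage_gb_month": 10,
    "s3_storage_gb_month": 500,
    "elasticache_t2_micro_hours": HOURS_PER_MONTH,
    "elb_hours": HOURS_PER_MONTH,
    "data_transfer_out_gb": 100,
}


def monthly_cost(usage, prices):
    """Return the cost per line item: quantity * unit price.
    prices must contain a unit price for every item in usage."""
    missing = sorted(set(usage) - set(prices))
    if missing:
        raise KeyError(f"no unit price for: {', '.join(missing)}")
    return {item: round(qty * prices[item], 2) for item, qty in usage.items()}
```

To use it, fill a `prices` dictionary with the current unit prices for your region and sum the values of `monthly_cost(usage, prices)`. Scaling out simply increases the `ec2_t2_small_hours` and `ebs_gb_month` quantities for the hours the extra instances run.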
As you can see, I’m using only small instance types for this calculation. Don’t go too big too fast: our scaling setup can add more capacity on demand than a setup with only one big instance could offer.
For price calculation I’m taking the EU West-1 region (Ireland). Prices vary between different regions, so this is not a complete picture. But still, you should go for the region that is closest to your customers and has all the services you need (e.g. in Europe the Frankfurt region does not have all the services Ireland currently offers).
I hope this blog post was a good example to show you how to optimize your Drupal site for AWS. It can easily be applied to Drupal 8 or any other PHP application, as long as you focus on the important goals of this setup: High Availability and Auto Scaling.
7. Next steps
Even though this is a lengthy blog post, there are still a lot of topics I haven’t covered yet. There are many more AWS services you can use to monitor, scale and build your application. I also haven’t addressed how you should run cron jobs or nightly import/sync tasks in a setup like this. This is all stuff for upcoming blog posts.
Drupal 7 on AWS Part 2: CloudFormation
This blog post focused on the “how?” and “why?” of running Drupal on AWS. Part 2 is an actual example of such a setup, with a complete infrastructure setup provided as a CloudFormation stack. CloudFormation is AWS’s Infrastructure-as-Code tool, something you should definitely be using for any large software stack.