I’m in Las Vegas this week at Amazon Web Services Re:Invent 2014, and I’m tweeting live from the various bootcamps, keynotes, breakout sessions and pub crawls that I attend.
If you want to improve the performance of your WordPress installation, W3 Total Cache is one of the best options out there for optimizing page delivery and script minification. But if you want to take your performance to web scale, you’ll need to consider a commercial-grade Content Delivery Network (CDN). Fortunately, W3 Total Cache offers multiple options, and (in my humble opinion) Amazon Web Services CloudFront running on top of an S3 bucket is the best CDN option available. While the W3 Total Cache FAQ has a decent overview of how to configure a basic CloudFront/S3 CDN, AWS architectures that implement the higher level of security offered by assigning specific IAM accounts to CloudFront distributions and S3 buckets can present a problem.
Even though Frederick Townes deserves a Nobel Prize for creating W3 Total Cache, he didn’t quite nail it when it comes to parsing messages generated by the AWS API when the plugin tries to connect to CloudFront.
Here is the most common error message I ran into when trying to configure W3 Total Cache to work with CloudFront within a PCI 3 compliant AWS architecture:
Error: Unable to list buckets (S3::listBuckets(): [AccessDenied] Access Denied).
W3 Total Cache tries to create a dropdown list of the S3 buckets to which the IAM account has access. While s3:ListBucket is a valid IAM action for listing the contents of an S3 bucket, listBuckets isn’t a valid action name. The correct action to reference in the policy is s3:ListAllMyBuckets.
The correct IAM policy should look like this:
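A minimal sketch of such a policy, assuming the standard IAM policy document format. It is built as a Python dict and printed as JSON so the output can be pasted into the IAM console. The variable names are illustrative, and the statement covers only the bucket-listing permission, not anything else the plugin may need in order to upload files.

```python
import json

# Illustrative policy: allows the IAM user to enumerate bucket names only.
# "s3:ListAllMyBuckets" is the IAM action behind the S3 "list buckets" call.
s3_list_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*",
        }
    ],
}

# Serialize to JSON text suitable for pasting into a policy editor.
policy_json = json.dumps(s3_list_policy, indent=2)
print(policy_json)
```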
But this only covers the policy to list the S3 buckets. If you’re using W3 Total Cache with CloudFront, you also need to be able to list the CloudFront distributions:
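As a rough sketch, a policy granting both listing permissions could combine the two statements as follows. The function name and the Sid labels are my own placeholders; s3:ListAllMyBuckets and cloudfront:ListDistributions are the IAM actions for the two listing calls, and any further permissions the plugin needs are deliberately left out.

```python
import json


def build_listing_policy():
    """Return an illustrative IAM policy allowing bucket and distribution listing."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBuckets",  # placeholder label
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
            {
                "Sid": "ListDistributions",  # placeholder label
                "Effect": "Allow",
                "Action": "cloudfront:ListDistributions",
                "Resource": "*",
            },
        ],
    }


# Print the policy as JSON text.
print(json.dumps(build_listing_policy(), indent=2))
```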
From a security perspective, it’s not ideal that the IAM policy has to be able to list the distributions and buckets. A better solution would be to provide a field in which to manually enter the bucket or distribution name.
First, I’m going to warn you that this isn’t a “How-To” post, or a “Take a look at this nifty new tech” post. This is just me brainstorming on how to choose when to decouple a specific component from where it normally lives inside a “closed-box” web application.
If an application might have the potential to one day move to the cloud, its components should be designed from the start so they can be easily decoupled. The best strategy to support future decoupling is to practice good Object Oriented Programming (OOP) and organize your code into pretty functions and classes, perhaps even adopt a Model-View-Controller (MVC) design pattern. Writing thousands of lines of procedural (spaghetti) code means a hard time down the road, but this mess can usually be split into templates that can be referenced and reused via “includes” (I work a lot in PHP and ColdFusion, and there are a million and one ways to include a code template).
Once the code is separated into templates or class libraries, it’s not too difficult to write web service interfaces into that functionality: building REST APIs that pop out XML and JSON is fun, but sometimes all that is required of a fledgling web service is to dump out a comma-delimited list.
Unless these baby web services are behind some sort of firewall and not publicly accessible to the world, some sort of security mechanism to authorize requests is necessary, whether it’s a full-blown OAuth provider or just a super-secret revolving hashed key (wait, I already said OAuth).
I have a pretty good handle on HOW to decouple application services (I’ve been building these things since 1997), but I’ve never really tried to codify WHEN I should decouple application services.
Thinking about my prototypical Web application, I’m always trying to solve the problem of one or two components dragging the whole system down. They don’t do it all the time, so throwing more resources at the whole system to solve an intermittent problem seems extremely wasteful.
I keep coming back to some measure of latency as the solution: how long does it take the application components to talk to each other within a closed box, versus how much latency do I incur transferring that data over the network?
I hate mathematics, but I’m going to try to formulate how to determine if decoupling makes sense.
Request Latency + Component A Processing Time + Response Latency = Component Latency
Pretty simple, huh?
In the closed-box application, Request Latency and Response Latency should be effectively zero, since components can speak to each other directly. The only ways to improve the performance are to optimize the code or throw more horsepower at the system. Code optimization is always a good idea, but vertical scaling hits hardware limitations pretty quickly, and is often a huge waste when trying to solve variable performance issues for a single component.
In this imaginary distributed application, I’m yanking out that problem component and giving it its own set of resources. The rest of the application can run just as it always did, but instead of throwing massive resources at the whole black box, I can throw modest resources at the distributed component. I’ll definitely take a hit on Request Latency and Response Latency (both of which are tied to network latency, Web server latency, and possibly lunar tidal effects).
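To make the formula concrete, here is a small sketch that applies it to both scenarios. All of the millisecond figures are made-up numbers for illustration, not measurements, and the function name is mine.

```python
def component_latency(request_ms, processing_ms, response_ms):
    """Request Latency + Component Processing Time + Response Latency."""
    return request_ms + processing_ms + response_ms


# Closed box: components talk directly, so request/response latency is ~0.
closed_box_ms = component_latency(0, 900, 0)

# Distributed: hypothetical 40 ms each way over the network, but the
# component runs faster on its own dedicated resources.
distributed_ms = component_latency(40, 300, 40)

print(closed_box_ms, distributed_ms)
```

With these invented numbers the distributed version wins despite the network hit. If the dedicated resources do not cut the processing time by more than the added round trip, it loses.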
Measuring the Request Latency and Response Latency will help point to whether or not it makes sense to decouple. First, some debugging capability is required inside the black-box application component to track how long a component processes from the time it receives a request to the time it sends out the result (I’m sure we all do this for all of our code already, right?). To facilitate debugging, I typically set a timestamp variable at the start of the component, and another at the end of the component, and if debugging is enabled, I pass the difference of the two to whatever I’m using for debugging (often just an HTML comment). This is the time to beat.
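A minimal sketch of that timestamp approach, written in Python rather than PHP or ColdFusion. The DEBUG flag, the slow_component stand-in, and the timed_call helper are all hypothetical names chosen for the example.

```python
import time

DEBUG = True  # hypothetical switch for debug output


def slow_component(n):
    """Stand-in for the problem component: sums the first n squares."""
    return sum(i * i for i in range(n))


def timed_call(func, *args, debug=DEBUG):
    """Run func, returning (result, elapsed_ms, html_comment)."""
    start = time.perf_counter()           # timestamp at the start
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0  # timestamp at the end
    # When debugging is enabled, report the difference as an HTML comment.
    comment = (
        f"<!-- {func.__name__} took {elapsed_ms:.3f} ms -->" if debug else ""
    )
    return result, elapsed_ms, comment


result, elapsed_ms, comment = timed_call(slow_component, 100000)
print(comment)
```

time.perf_counter is used instead of wall-clock time because it is monotonic and intended for measuring short intervals.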
For the distributed component, one way to capture this latency is to spin up a simple “Hello World” application on the newly provisioned distributed application server (although to get a more realistic picture I’d probably send the same volume of data, or a static result dump from the original component, instead of the “Hello World” text). From the primary application server, I can run a cURL request (PHP), or CFHTTP (ColdFusion), or whatever flavor of HTTP request I prefer for that application. Before and after the request, the component should output a timestamp. The difference between the timestamps will give a good idea of the combined Request and Response latency. For even better insight, I could add a start timestamp and stop timestamp to the “Hello World” application and pass those back to the primary application.
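The measurement can be sketched end to end in Python using only the standard library. To keep the example self-contained, the “Hello World” server runs on localhost in a background thread, which is my substitution for a separately provisioned server, so the numbers it prints say nothing about a real network. The X-Processing-Ms header is a name invented for this sketch to carry the server-side timing back to the caller.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for the "Hello World" app on the distributed server."""

    def do_GET(self):
        started = time.perf_counter()
        body = b"Hello World"
        server_ms = (time.perf_counter() - started) * 1000.0
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        # Invented header: reports the server-side processing time.
        self.send_header("X-Processing-Ms", f"{server_ms:.3f}")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def start_hello_server():
    """Start the demo server on a free local port; return the server object."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def measure_round_trip(host, port, path="/"):
    """Return (body, total_ms, latency_ms) for one GET request."""
    start = time.perf_counter()  # timestamp before the request
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        server_ms = float(response.getheader("X-Processing-Ms"))
    finally:
        conn.close()
    total_ms = (time.perf_counter() - start) * 1000.0  # timestamp after
    # Subtracting server time leaves the combined request/response latency.
    return body, total_ms, total_ms - server_ms
```

A typical use would be to call start_hello_server() on the remote machine, call measure_round_trip(host, port) several times from the primary application server, and average the latency figure rather than trusting a single request.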
If the “Hello World” distributed application Request/Response Latency is longer than the black-box Component Processing Latency, it’s time to scale up the “hardware” until there is a significant performance difference. In a cloud environment, this type of experimental scalability is possible, but for anyone using physical servers, this tactic may be cost-prohibitive (and a clear indicator you should start using cloud resources right away).
Once the performance target is identified, it’s time to decide if the cost of the “infrastructure” is worth the gains from reworking code to support a distributed application architecture.
If there is no real difference between the black-box component and the distributed component, then it may be necessary to look for performance bottlenecks elsewhere in the overall application architecture (database performance, client connections to Web servers, under-provisioned network connections, rogue processes, etc.).
So, to boil it all down to a theory, if the Request/Response latency in a distributed application network is lower than the processing time for a black-box component, that component may be a good candidate for repackaging as a distributed application.
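Expressed as code, the rule of thumb is a single comparison. This is only a sketch of the theory as stated; the function name, component names, and timings are all invented for illustration.

```python
def is_decoupling_candidate(round_trip_latency_ms, black_box_processing_ms):
    """True when network round-trip latency is below black-box processing time."""
    return round_trip_latency_ms < black_box_processing_ms


# Hypothetical measurements: (component, round-trip latency, processing time).
measurements = [
    ("report_builder", 80, 900),
    ("session_lookup", 80, 5),
]

# Keep only the components whose processing time exceeds the round trip.
candidates = [
    name
    for name, latency_ms, processing_ms in measurements
    if is_decoupling_candidate(latency_ms, processing_ms)
]
print(candidates)
```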
I have several projects on my plate this month that will let me test out this metric. It may be an oversimplified approach to a complex problem, but when working with a few million lines of spaghetti code, you’ve got to start somewhere.