Hack Away – Musings on When to Decouple Application Components

First, I’m going to warn you that this isn’t a “How-To” post, or a “Take a look at this nifty new tech” post. This is just me brainstorming on how to choose when to decouple a specific component from where it normally lives inside a “closed-box” web application.

If an application might one day move to the cloud, its components should be designed from the start so they can be easily decoupled. The best strategy to support future decoupling is to practice good Object Oriented Programming (OOP) and organize your code into tidy functions and classes, perhaps even adopt a Model-View-Controller (MVC) design pattern. Writing thousands of lines of procedural (spaghetti) code means a hard time down the road, but this mess can usually be split into templates that can be referenced and reused via “includes” (I work a lot in PHP and ColdFusion, and there are a million and one ways to include a code template).

Once the code is separated into templates or class libraries, it’s not too difficult to write web service interfaces into that functionality: building REST APIs that pop out XML and JSON is fun, but sometimes all that’s required of a fledgling web service is to dump out a comma-delimited list.

Unless these baby web services are behind some sort of firewall and not publicly accessible to the world, some sort of security mechanism to authorize requests is necessary, whether it’s a full-blown OAuth provider or just a super-secret revolving hashed key (wait, I already said OAuth).

I have a pretty good handle on HOW to decouple application services (I’ve been building these things since 1997), but I’ve never really tried to codify WHEN I should decouple application services.

Thinking about my prototypical Web application, I’m always trying to solve the problem of one or two components dragging the whole system down. They don’t do it all the time, so throwing more resources at the whole system to solve an intermittent problem seems extremely wasteful.

I keep coming back to some measure of latency as the solution: how long does it take the application components to talk to each other within a closed box, versus how much latency do I have trying to transfer that data over the network?

I hate mathematics, but I’m going to try to formulate how to determine if decoupling makes sense.

Request Latency + Component A Processing Time + Response Latency = Component Latency

Pretty simple, huh?

In the closed-box application, Request Latency and Response Latency should be effectively zero, since components can speak to each other directly. The only ways to improve the performance are to optimize the code or throw more horsepower at the system. Code optimization is always a good idea, but vertical scaling hits hardware limitations pretty quickly, and is often a huge waste when trying to solve variable performance issues for a single component.

In this imaginary distributed application, I’m yanking out that problem component and giving it its own set of resources. The rest of the application can run just as it always did, but instead of throwing massive resources at the whole black box, I can throw modest resources at the distributed component. There’s definitely a hit on Request Latency and Response Latency (both of which are tied to network latency, Web server latency, and possibly lunar tidal effects).

Measuring the Request Latency and Response Latency will help point to whether or not it makes sense to decouple. First, some debugging capability is required inside the black-box application component to track how long a component processes from the time it receives a request to the time it sends out the result (I’m sure we all do this for all of our code already, right?). To facilitate debugging, I typically set a timestamp variable at the start of the component, and another at the end of the component, and if debugging is enabled, I pass the difference of the two to whatever I’m using for debugging (often just an HTML comment). This is the time to beat.
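
Something like this minimal sketch in PHP (the variable names and the stand-in workload are made up for illustration, not pulled from a real component):

<?php
// A rough sketch of the debugging timer described above.
$debugEnabled = true; // illustrative flag; use whatever debug switch the app already has

$componentStart = microtime(true); // timestamp at the start of the component

// ... the component's real work happens here (a stand-in delay for illustration) ...
usleep(50000); // 50 ms

$componentEnd = microtime(true); // timestamp at the end of the component

if ($debugEnabled) {
    // Pass the difference along as an HTML comment, in milliseconds.
    printf("<!-- component processing time: %.2f ms -->", ($componentEnd - $componentStart) * 1000);
}
?>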

For the distributed component, one way to capture this latency is to spin up a simple “Hello World” application on the newly provisioned distributed application server (although to get a more realistic picture I’d probably send the same volume of data, or a static result dump from the original component, instead of the “Hello World” text). From the primary application server, I can run a cURL request (PHP), a CFHTTP call (ColdFusion), or whatever flavor of HTTP request I prefer for that application. Before and after the request, the component should output a timestamp. The difference in the timestamps will give a good idea of the combined Request and Response latency. For even better insight, I could add a start timestamp and stop timestamp to the “Hello World” page and pass those back to the application.
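
As a rough sketch with PHP and the cURL extension (the URL below is a placeholder for the newly provisioned server, not a real endpoint):

<?php
// Time the round trip to the "Hello World" page on the distributed server.
$requestStart = microtime(true);

$ch = curl_init("http://distributed.exampledomain.com/hello.php"); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // capture the response body instead of printing it
$response = curl_exec($ch);
curl_close($ch);

$requestEnd = microtime(true);

// Combined Request + Response latency; subtract whatever processing time the
// "Hello World" page reports back if it passes its own timestamps along.
printf("<!-- round trip: %.2f ms -->", ($requestEnd - $requestStart) * 1000);
?>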

If the “Hello World” distributed application Request/Response Latency is longer than the black box Component Processing Latency, it’s time to scale up the “hardware” until there is a significant performance difference. In a cloud environment, this type of experimental scalability is possible, but for anyone using physical servers, this tactic may be cost prohibitive (and a clear indicator you should start using cloud resources right away).

Once the performance target is identified, it’s time to decide if the cost of the “infrastructure” is worth the gains from reworking code to support a distributed application architecture.

If there is no real difference between the black-box component and the distributed component, then it may be necessary to look for performance bottlenecks elsewhere in the overall application architecture (database performance, client connections to Web servers, under-provisioned network connections, rogue processes, etc.).

So, to boil it all down to a theory, if the Request/Response latency in a distributed application network is lower than the processing time for a black-box component, that component may be a good candidate for repackaging as a distributed application.

I have several projects on my plate this month that will let me test out this metric. It may be an oversimplified approach to a complex problem, but when working with a few million lines of spaghetti code, you’ve got to start somewhere.

Formatting Months With AP Style

The PHP date() function supports a mess-load of different date masks, but developers working with content creators and editors from a journalism background may find date() wanting, particularly when it comes to supporting style and usage guides like the AP Stylebook. In particular, AP Style calls for the longer month names to be abbreviated (Jan., Feb., Aug., Sept., Oct., Nov., Dec.), while the shorter month names (March, April, May, June, July) remain intact.

I tackled a similar AP Style issue related to how str

So, here’s a quick and dirty function to convert “whole” months to AP Style months:

<?php
// Function to reformat dates (months) to comply with AP Style.
// strtr() tries the longest keys first and never re-scans text it has already
// replaced, so "January" becomes "Jan." without the bare "Jan" entry later
// turning it into "Jan.." (and "March" stays "March" instead of "Marchch").
function APdate($datestring){
    $APstyleMonths = array(
        // Full month names (PHP date mask "F")
        "January"   => "Jan.",
        "February"  => "Feb.",
        "March"     => "March",
        "April"     => "April",
        "May"       => "May",
        "June"      => "June",
        "July"      => "July",
        "August"    => "Aug.",
        "September" => "Sept.",
        "October"   => "Oct.",
        "November"  => "Nov.",
        "December"  => "Dec.",
        // Period-style abbreviations that AP Style spells out or respells
        "Mar." => "March",
        "Apr." => "April",
        "Jun." => "June",
        "Jul." => "July",
        "Sep." => "Sept.",
        // Three-letter abbreviations (PHP date mask "M")
        "Jan" => "Jan.",
        "Feb" => "Feb.",
        "Mar" => "March",
        "Apr" => "April",
        "May" => "May",
        "Jun" => "June",
        "Jul" => "July",
        "Aug" => "Aug.",
        "Sep" => "Sept.",
        "Oct" => "Oct.",
        "Nov" => "Nov.",
        "Dec" => "Dec."
    );
    $APstyleDate = strtr($datestring, $APstyleMonths);
    return $APstyleDate;
}
?>
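
For instance, feeding the function the output of date() with the “F” or “M” month masks (the dates below are arbitrary, chosen purely for illustration):

<?php
// Example usage; the timestamps are arbitrary and just for illustration.
echo APdate(date("F j, Y", mktime(0, 0, 0, 9, 5, 2014))); // prints "Sept. 5, 2014"
echo APdate(date("M j, Y", mktime(0, 0, 0, 3, 5, 2014))); // prints "March 5, 2014"
?>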

Preserve Your Data’s International Flavour


When working with MySQL data sources that contain international characters (e.g., accents, Asian characters), it’s important that form data is entered using UTF-8 encoding and stored in a UTF-8 compliant table. It’s easy to overlook that the data also needs to be pulled out of MySQL using UTF-8, since the default database engine encoding may be set to a local standard like ASCII, even if a particular database or table is set to UTF-8 (or another flavor of Unicode).

Preventing international characters stored in MySQL from going all funky on output is actually pretty simple:

In PHP, we would need to initiate the connection to MySQL in the usual way:

$myConnection = mysql_connect("mysql.exampledomain.com:3306", "username", "password");

But then add the following line:

mysql_set_charset("UTF8", $myConnection);

Now, all that lovely data should return with all its international flavour intact.
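
If the application uses the newer mysqli extension instead, the same trick works with mysqli_set_charset() (a minimal sketch; the hostname, credentials, and database name are placeholders):

<?php
// Same idea with the mysqli extension; the connection details are placeholders.
$myConnection = mysqli_connect("mysql.exampledomain.com", "username", "password", "mydatabase");

// Tell MySQL to send and return data as UTF-8.
mysqli_set_charset($myConnection, "utf8");
?>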

There’s more information about this technique over at Stack Overflow.

Simple Way to Get a Domain from an Email Using T-SQL

One of my recent projects required me to create a Microsoft SQL Server view that displayed (among other information) an email address and its underlying domain. Microsoft SQL Server and T-SQL provide a few simple functions that, when used together, make this a fairly easy task.

So here’s the T-SQL example:

SELECT EMAIL,
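-- LEN(EMAIL) - CHARINDEX('@', EMAIL) is the number of characters after the '@';
-- RIGHT() then grabs that many characters from the end of the string, i.e. the domain.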
RIGHT(EMAIL, LEN(EMAIL) - CHARINDEX('@', EMAIL)) AS 'DOMAIN' 
FROM MYTABLE

Parsing text inside a SQL Query can slow things down if you’re dealing with a large dataset, so use this code snippet with caution. If the application allows it, it’s generally a better idea to perform this kind of parsing in the application layer. ColdFusion, PHP and JavaScript provide comparable functions to T-SQL’s RIGHT, LEN and CHARINDEX.
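
For example, a quick PHP equivalent might look like this (the sample address is just a placeholder):

<?php
// Application-layer version of the same parsing; the address is a placeholder.
$email = "someone@exampledomain.com";

// strrchr() returns everything from the last "@" onward; substr() drops the "@".
$domain = substr(strrchr($email, "@"), 1);

echo $domain; // prints "exampledomain.com"
?>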

If you’re feeling brave, you can also try pure RegEx rather than use native functions.