Blog Post
August 24, 2016
Improving Data Driven Website Performance
By Ken VanderAa
Can you significantly improve your website’s performance? Data-driven websites are at the core of enabling a rich user experience, but working with large data sets – especially ones outside your control – can carry a heavy performance penalty for your HTML render. In enterprise web development, improving performance, or maintaining a high standard while implementing new features, is always something to be mindful of. This series of posts is an introduction to achieving that goal with simple caching methods, and is intended for developers unfamiliar with the concept. In this post, I will introduce a couple of real-life scenarios and address the first.
I have seen a pattern emerge over the years I have spent developing for the enterprise web. In my time working at A Hundred Answers, I have worked on a few Web Content Management (WCM) and portal systems – several versions of OpenText TeamSite and LiveSite (previously owned by HP), Sitecore 8.1 Experience Platform, an Adxstudio site backed by Microsoft Dynamics, and two custom .NET portals backed by SAP. This issue has arisen, no matter the technology or product.
Consider the following scenarios:
- A resource on the file system must be read and parsed. It can be of complex structure and indeterminate size, and it is managed by a third party. For example, you have a requirement to read configuration data that is managed from another system. This data allows the business to enable or disable special deals on your website at will, via their internal system. You must allow for updates at runtime without restarting, meaning you need to reprocess the file frequently. The issue? Reading, parsing, and querying against the resource multiple times per request cycle gets expensive quickly.
- A need arises to query another system for data to allow a rich interaction with your user. For example, you have a requirement to display a customized list of products, showing deals on inventory that needs to move, from all locations within a day’s shipping of the visitor’s IP address. The complication? The web service has already been (or will be) built by someone else. And… it’s slow.
In both of these situations, the issue comes down to working with data managed outside of your system. It’s out of your control and yet ultimately your website’s performance depends on it. In both cases, caching can help. If you search the web, you will find many discussions on approaches, techniques, and frameworks of varying complexity to perform caching. However, if we strip all of the complexities and constructs aside, on the most basic level, caching is defined as storing data for future use. Let’s take a look at a simple way we can do just that.
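To make that definition concrete before we get to the request-scoped version, here is a minimal sketch of "storing data for future use" – just a map consulted before doing the expensive work. The class and method names are illustrative, not from any framework:

```java
import java.util.HashMap;
import java.util.Map;

public class SimpleCache {
    // Previously computed results, keyed by input.
    private final Map<String, String> cache = new HashMap<>();

    /**
     * Returns the result for key, computing it only on a cache miss.
     */
    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = expensiveLookup(key); // Do the costly work once...
            cache.put(key, value);        // ...and store it for future use.
        }
        return value;
    }

    // Stand-in for reading, parsing, or querying an external resource.
    private String expensiveLookup(String key) {
        return key.toUpperCase();
    }
}
```

The second call to `get()` with the same key returns the stored value without repeating the work – that trade of memory for time is all caching is, at its core.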
In the first scenario, it is possible to take advantage of a very simple form of in-memory caching applicable to web application development. It’s simple enough that some developers might not even consider it caching – HTTP Request Attribute caching. Simply put, after reading and parsing the resource, we store the processed data in an HTTP request attribute. All future queries against that resource during this request lifecycle can be run against the stored data.
The following example code assumes that it is being executed within request scope. For clarity, assume that Data is a Plain Old Java Object (POJO) that we have defined elsewhere to hold the data we care about.
/**
 * Gets data, preferring the cached copy.
 */
public Data getData(HttpServletRequest request) {
    // Retrieve the data from the request if present.
    Data data = (Data) request.getAttribute("data");
    if (data == null) {
        // Retrieve data from source, process, etc.
        data = loadData();
        // Store the processed data in the request.
        request.setAttribute("data", data);
    }
    return data;
}

/**
 * Loads the data from source.
 */
private Data loadData() {
    // TODO: load data and process
    return null; // Placeholder until loading is implemented.
}
This already helps! No need to process the resource multiple times within the request. Any calls to getData() will prefer to use the cached data. Depending on how heavy your processing cost is and how many queries are being made, this might be enough to boost your performance to within target. However, while this helps improve multiple queries against a resource, each request still has to read and parse that resource. We can improve this solution further, at the cost of introducing a little more complexity.
In this next part, we need to change our code’s scope from the request to the application. Each request only lives for the duration it takes the application to assemble and send the response to the client. After the application sends the response, the request goes out of scope and is destroyed, along with any data we have stored. To store the data between requests, we need to have an object that also continues to live between requests.
Application scope means that an object is instantiated when the application is launched and lives until the application is shut down. This is often accomplished through the use of the Singleton design pattern, initializing the object on application launch, and cleaning up on application shutdown. This can be either custom code or through a framework like Spring.
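As a minimal sketch of the custom-code route (the class name here is illustrative), the initialization-on-demand holder idiom gives you a lazily created, thread-safe singleton without explicit locking:

```java
public class AppCache {
    // The single application-scoped instance. The JVM's class-initialization
    // guarantees make this lazy and thread-safe without synchronization.
    private static class Holder {
        static final AppCache INSTANCE = new AppCache();
    }

    private AppCache() {
        // Initialize resources here; runs exactly once, on first access.
    }

    public static AppCache getInstance() {
        return Holder.INSTANCE;
    }
}
```

Every caller of `AppCache.getInstance()` receives the same object, so anything it holds lives for the life of the application. A framework like Spring achieves the same effect with its default singleton bean scope.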
Using an application scoped object allows us to proactively cache data on application start, before we even need it – meaning even the first query will be significantly faster. However, as the application lives for a much longer time than the request, we need to reload the data occasionally to ensure that it is valid and matches the state of the external resource. Rather than waiting until a request is made to discover that our data is outdated, we will process the resource in a separate thread and make the result available to request threads (i.e. your website visitors) on demand. The data loading thread is launched from our application scoped object on start up, and the external resource is then reloaded on an interval that is relevant to the data.
The following example code assumes that it is being executed within application scope, as a singleton (or Spring bean). init() will be called on application start, and destroy() will be called on application shutdown. A calling class would need a reference to this object, and would call getData().
public class DataService implements Runnable {

    // Class members
    private Data data;
    // volatile: written by destroy() on one thread, read by run() on another.
    private volatile boolean keepReloading = false;
    private Thread reloadThread;

    /**
     * Invoked on startup to start the Thread that refreshes the data.
     */
    public void init() {
        // Force an immediate synchronous load of the data.
        reload();
        // Start the async reload thread.
        keepReloading = true;
        reloadThread = new Thread(this);
        reloadThread.start();
    }

    /**
     * @see java.lang.Runnable#run()
     */
    @Override
    public void run() {
        while (keepReloading) {
            try {
                // Sleep first: init() already performed the initial load.
                Thread.sleep(60000); // 60 seconds
                reload();
            } catch (InterruptedException e) {
                // Interrupted by destroy(); the loop condition will end the thread.
            }
        }
    }

    /**
     * Refreshes the data.
     */
    private void reload() {
        Data fresh = loadData();
        synchronized (this) {
            this.data = fresh;
        }
    }

    /**
     * Loads the data from source.
     */
    private Data loadData() {
        // TODO: load data and process
        return null; // Placeholder until loading is implemented.
    }

    /**
     * Invoked on shutdown to stop the data refreshing thread.
     */
    public void destroy() {
        keepReloading = false;
        if (reloadThread != null) {
            reloadThread.interrupt(); // Wake the thread so it exits promptly.
        }
    }

    /**
     * Provides the current data to a caller.
     */
    public Data getData() {
        synchronized (this) {
            return this.data;
        }
    }
}
This approach allows us to leverage a multi-threaded web server environment to improve performance on the thread(s) that matter the most by offloading the processing to a separate thread and proactively loading data. The only caveat is that as with any multi-threaded code, you need to ensure that you access the data in a thread-safe fashion.
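If you would rather not manage the thread and synchronization by hand, the standard library can carry both concerns for you. The sketch below is an alternative take on the same idea, not the code above: an AtomicReference handles safe publication of the data, and a ScheduledExecutorService drives the periodic reloads. The class name and the String payload are illustrative stand-ins:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ScheduledDataService {
    // AtomicReference publishes the latest snapshot to request threads
    // without explicit synchronized blocks.
    private final AtomicReference<String> data = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void init() {
        data.set(loadData());                 // Synchronous first load.
        scheduler.scheduleWithFixedDelay(     // Background reloads every 60s.
                () -> data.set(loadData()), 60, 60, TimeUnit.SECONDS);
    }

    public void destroy() {
        scheduler.shutdownNow();              // Cancels the reload task.
    }

    public String getData() {
        return data.get();                    // Always the latest snapshot.
    }

    // Stand-in for the real load-and-process step.
    private String loadData() {
        return "processed";
    }
}
```

The behaviour is the same as the hand-rolled version – eager first load, periodic refresh, clean shutdown – but the thread lifecycle and memory-visibility details are delegated to well-tested JDK classes.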
I hope that you found this short introduction on caching insightful, and see how it can be applied to speed up your website or limit your dependence on external resources. In part 2, I will discuss an approach to the second scenario and how to mitigate performance issues for dynamic content. Until then, check out some of our other posts!