A Consistent Approach To Client-Side Cache Invalidation

by Jon Davis 10. August 2013 17:40

Download the source code for this blog entry here: ClientSideCacheInvalidation.zip

TL;DR?

Please scroll down to the bottom of this article to review the summary.

I ran into a problem not long ago where some JSON results from an AJAX call to an ASP.NET MVC JsonResult action were being cached by the browser, quite intentionally by design, but were no longer up-to-date. Without devising a new approach to route manipulation or rethinking any of the other fundamental infrastructural designs for the endpoints (because there were too many), our hands were tied. The caching was being done using the ASP.NET OutputCacheAttribute on the action being invoked in the AJAX call, something like this (not really, but this briefly demonstrates caching):

[OutputCache(Duration = 300)]
public JsonResult GetData()
{
    return Json(new
    {
        LastModified = DateTime.Now.ToString()
    }, JsonRequestBehavior.AllowGet);
}

And here is the view that invokes it via AJAX:

@model dynamic
@{
    ViewBag.Title = "Home";
}
<h2>Home</h2>
<div id="results"></div>
<div><button id="reload">Reload</button></div>
@section scripts {
    <script>
        var $APPROOT = "@Url.Content("~/")";
        $.getJSON($APPROOT + "Home/GetData", function (o) {
            $('#results').text("Last modified: " + o.LastModified);
        });
        $('#reload').on('click', function () {
            window.location.reload();
        });
    </script>
}

Since we were using a generalized approach to output caching (as we should), I knew that any solution to this problem should also be generalized. My first thought rested on the mistaken assumption that the default [OutputCache] behavior was to rely on client-side caching, since client-side caching was what I was observing while using Fiddler. (Mind you, in the above sample this is not the case; it is actually server-side, probably because of the small amount of data being transferred. I'll explain after I describe what I did under my false assumption.)

Microsoft’s default convention for implementing cache invalidation is to rely on “VaryBy” semantics, such as varying by route parameters. That is great, except that the route and parameters were not changing in our implementation.

So, my initial proposal was to force the caching to be done on the server instead of on the client, and to invalidate when appropriate.

public JsonResult DoSomething()
{
    //
    // Do something here that has a side-effect
    // of making the cached data stale
    //
    HttpResponse.RemoveOutputCacheItem(Url.Action("GetData"));
    return Json("OK");
}

[OutputCache(Duration = 300, Location = OutputCacheLocation.Server)]
public JsonResult GetData()
{
    return Json(new
    {
        LastModified = DateTime.Now.ToString()
    }, JsonRequestBehavior.AllowGet);
}

In the view, an Invalidate button is added alongside the Reload button:

<div><button id="invalidate">Invalidate</button></div>

And the script block gets a click handler that POSTs to the invalidating action and then reloads the page:

$('#invalidate').on('click', function () {
    $.post($APPROOT + "Home/DoSomething", null, function (o) {
        window.location.reload();
    }, 'json');
});

[Screenshot: While Reload has no effect on the Last modified value, the Invalidate button causes the date to increment.]

When testing, this actually worked quite well. But concerns were raised about the memory burden on the server. Personally, I think the memory cost of practically any server-side caching is negligible, certainly if the payload is small enough to be transmitted over the wire to a client, so long as it is measured in kilobytes or tens of kilobytes and not megabytes. I think the real concern is the transmission itself: the point of caching is to make the user experience as smooth and seamless as possible with minimal waiting, so if the user is waiting for a (cached) payload, then even though it may arrive much faster than the time taken to recalculate or re-acquire the data, it is still measurably slower than relying on the browser cache.

The default implementation of OutputCacheAttribute is actually OutputCacheLocation.Any, which indicates that the cached item can be cached on the client, on a proxy server, or on the web server. From my tests: for tiny payloads, the behavior seemed to be caching on the server and no caching on the client; for a large payload from GET requests with querystring parameters, caching on the client, but with an HTTP request carrying an “If-Modified-Since” header that resulted in a 304 Not Modified from the server (indicating it was also cached on the server, and the server verified that the client’s cache remained valid); and for a large payload from GET requests with all parameters in the path, caching on the client without any validation check from the client (no If-Modified-Since request at all). Now, to be quite honest, I am only guessing that these were the distinguishing factors behind the behaviors I observed. I saw variations of these behaviors happening all over the place as I tinkered with scenarios; this was merely the initial pattern I felt I was observing.

At any rate, for our purposes we were stuck with relying on “Any” as the location, which in theory would drop the server-side cache entry if the server ran short on RAM (in theory; the truth could probably be researched, but I don’t have time to get into that). The point of all this is: we have client-side caching that we cannot get away from.

So, how do you invalidate the client-side cache? Technically, you really can’t. The browser controls the cache bucket, and no browser provides hooks into the cache to invalidate entries. But we can get smart about this and work around the problem by bypassing the cached data. Cached HTTP results are stored on the basis of the full raw URL of an HTTP GET; they are cached with an expiration (in the above sample’s case, 300 seconds, or 5 minutes); and they are only cached if allowed to be cached in the first place, per the HTTP header directives in the response. So, to bypass the cache, either you don’t cache at all, or you know up front how long the cache should last before it expires (neither of these being acceptable in a dynamic application), or you use POST instead of GET, or you vary the URL.

Microsoft originally got around the caching problem in ASP.NET 1.x by forcing the “normal” development cycle into the lifecycle of <form> tags that always used the POST method over HTTP. Responses to POST requests are never cached. But POSTing is not clean, as it violates the semantics of the verb when nothing is being sent up and data is only being retrieved.

You can also use an ETag in the HTTP headers, which isn’t particularly helpful in a dynamic application, as it is no different from a URL plus an expiration policy.

To summarize, to control cache:

  • Disable caching from the server in the Response header (Pragma: no-cache)
  • Predict the lifetime of the content and use an expiration policy
  • Use POST not GET
  • ETag
  • Vary the URL (case-sensitive)

Given our options, we need to vary the URL. There are a number of approaches to this, but almost all of them involve appending to or modifying the querystring with parameters that the server is expected to ignore.

$.getJSON($APPROOT + "Home/GetData?_=" + Date.now(), function (o) {
    $('#results').text("Last modified: " + o.LastModified);
});

In this sample, the URL is appended with “?_=”+Date.now(), resulting in this URL in the GET:

/Home/GetData?_=1376170287015

This technique is often referred to as cache-busting. (And if you’re reading this blog article, you’re probably rolling your eyes. “Duh.”) jQuery inherently supports cache-busting, but it does not do it on its own from $.getJSON(); it only does it in $.ajax() when the options parameter includes { cache: false }, unless you invoke $.ajaxSetup({ cache: false }); first to disable caching for all requests. Otherwise, for $.getJSON() you would have to do it manually by appending to the URL. (Alright, you can stop rolling your eyes at me now, I’m just trying to be thorough here..)
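
For reference, here is a minimal sketch of what those two jQuery options look like in practice, reusing the sample GetData endpoint from earlier:

// Per-request: { cache: false } makes jQuery append a "_=<timestamp>"
// querystring parameter to bust the browser cache.
$.ajax($APPROOT + "Home/GetData", {
    dataType: "json",
    cache: false,
    success: function (o) {
        $('#results').text("Last modified: " + o.LastModified);
    }
});

// Global: disables caching for all subsequent jQuery AJAX requests,
// including $.getJSON().
$.ajaxSetup({ cache: false });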

This is not our complete solution, though; we still have a couple of problems to solve.

First of all, in a complex client codebase, hacking at the URL from application logic might not be the most appropriate approach. Consider if you’re using Backbone.js with routes that synchronize objects to and from the server. It would be inappropriate to modify the routes themselves just for cache invalidation. A more generalized cache invalidation technique needs to be implemented in the XHR-invoking AJAX function itself. The approach will depend upon the JavaScript libraries you are using, but, for example, if jQuery.getJSON() is being used in application code, then jQuery.getJSON itself could perhaps be replaced with a version that performs the invalidation routine.

var gj = $.getJSON;
$.getJSON = function (url, data, callback) {
    url = invalidateCacheIfAppropriate(url); // todo: implement something like this
    return gj.call(this, url, data, callback);
};

This is unconventional and probably a bad example, since you’re hacking at a third-party library. A better approach might be to wrap the invocation of $.getJSON() with an application function.

var getJSONWrapper = function (url, data, callback) {
    url = invalidateCacheIfAppropriate(url); // todo: implement something like this
    return $.getJSON(url, data, callback);
};

And from this point on, instead of invoking $.getJSON() in application code, you would invoke getJSONWrapper(), in this example.
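
For example, the $.getJSON() call from the first sample would now read:

// same signature as before; the wrapper handles the URL manipulation
getJSONWrapper($APPROOT + "Home/GetData", null, function (o) {
    $('#results').text("Last modified: " + o.LastModified);
});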

The second problem we still need to solve is that invalidation of cached data derived from the server needs to be triggered by the server, because it is the server, not the client, that knows when the client’s cached data is no longer up-to-date. Depending on the application, the client logic might just know by keeping track of which server endpoints it is touching, but it might not! Besides, a server endpoint might have conditional invalidation triggers; the data might be stale given specific conditions that only the server may know, and perhaps only upon some calculation. In other words, invalidation needs to be pushed by the server.

One brute-force, burdensome, and perhaps a little crazy approach might be to use actual “push technology” (formerly “Comet” or “long-polling”, now WebSockets), implemented perhaps with ASP.NET SignalR, where a connection is maintained between the client and the server so that the server has an open socket over which it can push invalidation flags to the client.

We had no need for that level of integration, and you probably don’t either; I just wanted to mention it because it might be food for thought for a related solution. One scenario where this might be useful is when another user of the web application has caused the invalidation, in which case the current user will not be in the request/response cycle to acquire the invalidation flag. Otherwise, it is perhaps a reasonable assumption that invalidation is only needed, and only triggered, in the context of a user’s own session. If not, perhaps it is a “good enough” assumption even if it is sometimes untrue; the expiration policy can be set low enough that a reasonable compromise is struck between the current user’s changes and changes invoked by other systems or other users.

While we may not know in advance which server endpoint might introduce the invalidation of client-cached data, we can assume that invalidation will be triggered by some server endpoint, and build invalidation-trigger logic around the server’s HTTP responses.

To begin implementing some sort of invalidation trigger on the server, I could flag invalidations to the client using an HTTP header.

public JsonResult DoSomething()
{
    //
    // Do something here that has a side-effect
    // of making the cached data stale
    //
    InvalidateCacheItem(Url.Action("GetData"));
    return Json("OK");
}

public void InvalidateCacheItem(string url)
{
    HttpResponse.RemoveOutputCacheItem(url); // invalidate on server
    Response.AddHeader("X-Invalidate-Cache-Item", url); // invalidate on client
}

[OutputCache(Duration = 300)]
public JsonResult GetData()
{
    return Json(new
    {
        LastModified = DateTime.Now.ToString()
    }, JsonRequestBehavior.AllowGet);
}

At this point, the server is emitting a trigger to the HTTP client that says, “as a result of a recent operation, that other URL, the one for GetData, is no longer valid for your current cache, if you have one.” The header alone can be handled by different client implementations (or proxies) in different ways. I didn’t come across any “standard” HTTP response header that does this “officially,” so I’ll come up with a convention here.

[Screenshot: the X-Invalidate-Cache-Item header in the HTTP response]

Now we need to handle this on the client.

First of all, I need to refactor the existing AJAX functionality on the client so that instead of using $.getJSON, I use $.ajax or some other flexible XHR handler, and wrap it all in custom functions such as httpGET()/httpPOST() and handleResponse().

var httpGET = function (url, data, callback) {
    return httpAction(url, data, callback, "GET");
};

var httpPOST = function (url, data, callback) {
    return httpAction(url, data, callback, "POST");
};

var httpAction = function (url, data, callback, method) {
    url = cachebust(url);
    if (typeof (data) === "function") {
        callback = data;
        data = null;
    }
    $.ajax(url, {
        data: data,
        type: method,
        success: function (responsedata, status, xhr) {
            handleResponse(responsedata, status, xhr, callback);
        }
    });
};

var handleResponse = function (data, status, xhr, callback) {
    handleInvalidationFlags(xhr);
    callback.call(this, data, status, xhr);
};

function handleInvalidationFlags(xhr) {
    // not yet implemented
}

function cachebust(url) {
    // not yet implemented
    return url;
}

// application logic
httpGET($APPROOT + "Home/GetData", function (o) {
    $('#results').text("Last modified: " + o.LastModified);
});

$('#reload').on('click', function () {
    window.location.reload();
});

$('#invalidate').on('click', function () {
    httpPOST($APPROOT + "Home/DoSomething", function (o) {
        window.location.reload();
    });
});

At this point we’re not doing anything new yet; we’ve just broken up the HTTP/XHR functionality into wrapper functions that we can now modify to manipulate the request and deal with the invalidation flag in the response. All our remaining work will be in handleInvalidationFlags(), which captures that new header we just emitted from the server, and cachebust(), which hijacks the URLs of future requests.

To deal with the invalidation flag in the response, we need to detect that the header is there and add the flagged item to a data set stored locally in the browser with web storage. The best place to put this data set is sessionStorage, which is supported by all current browsers. Putting it in a session cookie (a cookie with no expiration flag) works, but is less ideal because it adds to the payload of every HTTP request. Putting it in localStorage is also less ideal because we want the invalidation flags to go away when the browser session ends; that is when the original browser cache will expire anyway. There is one caveat to sessionStorage: if a user opens a new tab or window, the browser will drop the sessionStorage in that new tab or window but may reuse the browser cache. The only workarounds I know of at the moment are to use localStorage (permanently retaining the invalidation flags) or a session cookie. In our case, we used a session cookie.
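
For illustration, a minimal sketch of a session-cookie-backed alternative might look like the following; the cookie name and helper names here are hypothetical, not from the project download:

// Read the invalidation flags data set from the "invalidated-cache-items" cookie.
function readInvalidationFlags() {
    var match = document.cookie.match(/(?:^|;\s*)invalidated-cache-items=([^;]*)/);
    return match ? JSON.parse(decodeURIComponent(match[1])) : {};
}

// Write the data set back; omitting "expires"/"max-age" makes this a
// session cookie, so the flags die with the browser session.
function writeInvalidationFlags(flags) {
    document.cookie = "invalidated-cache-items=" +
        encodeURIComponent(JSON.stringify(flags)) + "; path=/";
}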

Note also that IIS is case-insensitive on URI paths, but HTTP itself is not, and therefore browser caches will not be either. We will need to ignore case when matching URLs against cache invalidation flags.

Here is a more or less complete client-side implementation that seems to work in my initial tests for this blog entry.

function handleInvalidationFlags(xhr) {
    // capture HTTP header
    var invalidatedItemsHeader = xhr.getResponseHeader("X-Invalidate-Cache-Item");
    if (!invalidatedItemsHeader) return;
    invalidatedItemsHeader = invalidatedItemsHeader.split(';');
    // get invalidation flags from session storage
    var invalidatedItems = sessionStorage.getItem("invalidated-cache-items");
    invalidatedItems = invalidatedItems ? JSON.parse(invalidatedItems) : {};
    // update invalidation flags data set
    for (var i in invalidatedItemsHeader) {
        invalidatedItems[prepurl(invalidatedItemsHeader[i])] = Date.now();
    }
    // store revised invalidation flags data set back into session storage
    sessionStorage.setItem("invalidated-cache-items", JSON.stringify(invalidatedItems));
}

// since we're using IIS/ASP.NET, which ignores case on the path, we need a
// function to force lower-case on the path (the querystring stays as-is)
function prepurl(u) {
    return u.split('?')[0].toLowerCase() + (u.indexOf("?") > -1 ? "?" + u.split('?')[1] : "");
}

function cachebust(url) {
    // get invalidation flags from session storage
    var invalidatedItems = sessionStorage.getItem("invalidated-cache-items");
    invalidatedItems = invalidatedItems ? JSON.parse(invalidatedItems) : {};
    // if the item matches, return the concatenated URL
    var invalidated = invalidatedItems[prepurl(url)];
    if (invalidated) {
        return url + (url.indexOf("?") > -1 ? "&" : "?") + "_nocache=" + invalidated;
    }
    // no match; return unmodified
    return url;
}

Note that the date/time value of when the invalidation occurred is retained as the concatenated value. This allows the data to be cached again afterward, just updated as of that point in time. If invalidation occurs again, the concatenated value is revised to the new date/time.
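
To illustrate with hypothetical values: if the server had flagged /Home/GetData at timestamp 1376170287015, the stored data set and the cache-busted URL would look like this.

// seed the data set the way handleInvalidationFlags() would
sessionStorage.setItem("invalidated-cache-items",
    JSON.stringify({ "/home/getdata": 1376170287015 }));

console.log(cachebust("/Home/GetData"));
// -> "/Home/GetData?_nocache=1376170287015"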

Running this now: after invalidation is triggered by the server, the subsequent request for the data is appended with a cache-busting querystring field.

[Screenshot: the subsequent GET request showing the _nocache cache-busting querystring parameter]

In Summary, ..

.. a consistent approach to client-side cache invalidation triggered by the server might follow these steps.

  1. Use X-Invalidate-Cache-Item as an HTTP response header to flag potentially cached URLs as expired. You might consider using a semicolon-delimited header value to list multiple items. (Do not URI-encode the semicolon when using it as a URI list delimiter.) The semicolon is a reserved/invalid character in a URI and a valid delimiter in HTTP headers, so this is valid.
  2. Someday, browsers might support this HTTP response header by automatically invalidating browser cache items declared in it, which would be awesome. In the meantime ...
  3. Capture these flags on the client into a data set, and store the data set in session storage in the format:
     {
         "http://url.com/route/action": (date_value_of_invalidation_flag),
         "http://url.com/route/action/2": (date_value_of_invalidation_flag)
     }
  4. Hijack all XHR requests so that the URL is appended with a cache-busting querystring parameter if the URL is found in the invalidation flags data set, i.e. http://url.com/route/action becomes something like http://url.com/route/action?_nocache=(date_value_of_invalidation_flag), being sure to hijack only the XHR request and not any logic that generated the URL in the first place.
  5. Remember that IIS and ASP.NET by default ignore case (“/Route/Action” == “/route/action”) on the path, but the HTTP specification does not, and therefore the browser cache bucket will not ignore case either. Force all URL checks for invalidation flags to be case-insensitive to the left of the querystring (if there is a querystring; otherwise for the entire URL).
  6. Make sure the AJAX requests’ querystring parameters are in a consistent order. Changing the sequential order of parameters may be handled the same on the server but will be cached differently on the client.
  7. These steps cover “pull”-based invalidation flags retrieved from the server via XHR. For “push”-based invalidation triggered by the server, consider using something like a SignalR channel or hub to maintain an open channel of communication using WebSockets or long polling. Server application logic can then invoke this channel or hub to send an invalidation flag to the client or to all clients; a rough client-side sketch follows this list.
  8. On the client side, an invalidation flag “pushed” as in #7 above (for which #1 and #2 would no longer apply) can still utilize #3 through #6.
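
As a rough sketch of the push variation mentioned in #7, assuming SignalR with a hypothetical hub named CacheHub that the server invokes with the stale URL (the hub and method names here are illustrative, not part of the project download), the client side might look something like this:

// the generated SignalR proxy exposes the hub as $.connection.cacheHub
var cacheHub = $.connection.cacheHub;

// server-side hub code would call Clients.All.invalidateCacheItem(url);
// feed the pushed URL into the same flag storage used by the pull model
cacheHub.client.invalidateCacheItem = function (url) {
    var invalidatedItems = sessionStorage.getItem("invalidated-cache-items");
    invalidatedItems = invalidatedItems ? JSON.parse(invalidatedItems) : {};
    invalidatedItems[prepurl(url)] = Date.now();
    sessionStorage.setItem("invalidated-cache-items", JSON.stringify(invalidatedItems));
};

// open the connection after wiring up the client callback
$.connection.hub.start();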

You can download the project I used for this blog entry here: ClientSideCacheInvalidation.zip


Tags:

ASP.NET | C# | Javascript | Techniques | Web Development

CAPTCHA This, Yo! - Early Alpha

by Jon Davis 14. July 2009 01:54

I've posted a mini-subproject:

http://www.CAPTCHAThisYo.com/

The site is self-explanatory. The idea is simple. I want CAPTCHA. I don't want to support CAPTCHA in my apps. I just want to drop in a one-liner snippet somewhere and call it done. I think other people share the same desire. So I now offer CAPTCHA as a standalone CAPTCHA app. I did all the work for myself so that I don't have to do that work. I went through all that trouble so that I don't have to go through the trouble .... Wait, ...

Seriously, it's not typical CAPTCHA, and it's Not Quite Done Yet (TM). It's something that'll evolve. Right now there isn't even any hard-to-read graphic CAPTCHA.

But what I'd like to do is have an ever-growing CAPTCHA questions library and, by default, have it just rotate through them randomly. The questions might range from shape detection to riddles to basic math. I'd really like to have some kind of community uploads thingmajig with ratings, so that people can basically share their own CAPTCHA solutions and they all run inside the same captchathisyo.com CAPTCHA engine. I'm just not sure yet how to pull that off.

Theoretically, I could take the approach Microsoft took when C# was initially released (long, long ago, in a galaxy far, far away): they had a cool insect sandbox game where you could write a .NET class that implemented some interface and then send it up to the server, where it would run as an insect in the sandbox. The objective of the game was to write the biggest killer/eater. I'm not sure how feasible the idea of opening the server up to arbitrary .NET uploads is, but it's something I'm pondering.

Anyway, the concept has been prototyped, and the prototype has been deployed and is sound, but I still need to work out cross-site scripting limitations, so bear with me. I also still need to find a designer to make something beautiful out of it. That said, feel free to use it and give feedback. Stand by.


Tags:

C# | Computers and Internet | Cool Tools | Pet Projects | Techniques | Web Development

Quick Tip: Invoke A URL From SQL Server Agent

by Jon Davis 10. October 2008 23:10

Sometimes for a web application it makes sense to perform batch operations on the web site itself. For example,

  • Your web app has matured with deep-rooted dependencies upon System.Web.HttpContext.
  • You want to reuse your object libraries that are already available in your web site. 
  • You require business logic that is only available in (misplaced) .ASCX controls.
  • You want to use ASP.NET templating to generate your e-mails.

But there's a problem: ASP.NET does not have an infinite loop to spawn batch processes. Either something has to keep pounding on ASP.NET, or else something has to invoke the web site to get it to perform the batch job.

Several years ago (wow, half a decade ago), as a matter of curiosity about this same problem (though having no actual instance of it at the time, hehe), I pondered and published a prototype of the notion of hosting an object in the Application collection with a timer on it that routinely performs a GET operation on a web page, keeping the application running indefinitely while allowing a page to be re-invoked frequently. I abandoned the thought because I wasn't sure what the implications would be, and in fact my little disclaimer/warning on the published prototype is nothing but FUD.

In a previous job, I saw regular use of custom C# console applications invoked with the Windows Task Scheduler, which is one way of doing the job. Having first seen the Task Scheduler in Windows 98, though, I can't help but reminisce about crappy OS times whenever I see the same scheduler today. Perhaps my discomfort lacks merit.

I've also seen plenty of Windows services that are used for this purpose. Sadly, these seem to be as overkill as Task Scheduler seems vulnerable.

But I think this might also be a problem for SQL Server Agent to solve; it exists to be an application-level task scheduler, and it can simply perform an HTTP GET to spawn the batch processes. I initially considered using VBScript to load an ActiveX object that would perform the HTTP GET operation, but then I realized I would need to research which COM objects are available in a Server 2008 context, and COM and the specific ActiveX products from Microsoft are so old. I then considered writing a COM object in .NET with a CCW (COM-callable wrapper) and going through the hassle of registering it. But then I realized, meh, why not just call wget? Wget is a GNU app that does exactly what the name implies: it performs a World Wide Web 'GET' operation, writing the results to a file while displaying the HTTP headers on the console (although you can disable all of that). The following is the configuration I came up with.

  1. Select a URL that's on a different root branch from your user workflow, where you will store your batch jobs. For example, I like ~/batchjobs/mybatchjob/default.aspx (invoked as "/batchjobs/mybatchjob")
  2. Temporarily increase the timeout to suit your process's needs.
  3. In Page_Load(), slap on a Response.ContentType = "text/plain", invoke a RunXXX() method where you do your stuff, and plan on posting progress status information to the client using Response.Write("stuff\r\n"); Response.Flush(); wherever in a console app you'd normally use Console.WriteLine().
  4. Install SQL Server Agent on the database server.
  5. On the same server as SQL Server Agent, install wget for Windows. Add its directory to the %PATH% system environment variable, and grant ACL security rights on wget.exe to the designated SQL Server Agent system user account.
  6. In a new Job, use type CmdExec in a task to invoke: wget http://www.yoursite.com/batchjobs/mybatch1/ --spider -q
  7. The argument --spider drops the saving of the response body, and -q drops the HTTP header rendering on the console. If you want to retain the console output and HTTP response body, you can drop and/or replace the "--spider -q" arguments; use --help for more options.

The web page will then be executed on whatever schedule you define for the Agent job.

I hope this was helpful.



Tags:

Techniques
