High performance dynamic REST web pages with passive access
This question is about a caching framework, either one I would implement or one that already exists, for the REST-style behavior described below.
The goal is that GET and HEAD requests should be handled as efficiently as requests for static pages.
On the technology side, I am thinking of Java Servlets and MySQL to implement the site. (But good reasons could still change my technology choices.)
Web pages must support GET, HEAD, and POST; GET and HEAD are much more common than POST. The content of the page will not change with GET / HEAD, only with POST. So I want to serve GET and HEAD requests directly from the filesystem and only POST requests from the servlet.
- The first (slightly incomplete) idea is for the POST request to pre-calculate the HTML for subsequent GET/HEAD requests and store it on the filesystem. GET/HEAD will then always get the file from there. I believe this can be easily implemented in Apache with conditional URL rewriting.
- A better approach is that GET serves the HTML from the filesystem (and HEAD uses it too) if a precomputed file exists, and otherwise falls back to the servlet engine to generate it on the fly. POST in this case does not generate any HTML, but only updates the database and removes the HTML file from the filesystem, as a flag that it must be regenerated on the next GET/HEAD. The advantage of this second approach is that it handles the "start phase" of the web pages more gracefully, when POST has not yet been called. I believe this lazy generate-and-store approach can be implemented in Apache by providing an error handler that invokes the servlet on "file-not-found-but-should-be-there".
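The second approach can be sketched in a few lines. This is only an illustration in Python, not the Apache plus servlet setup itself: the `database` dict stands in for MySQL, `render()` stands in for the servlet, and all function names are made up for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-ins: a dict plays the role of the MySQL database and
# render() plays the role of the servlet that builds the HTML.
database = {"page1": "first version"}
stats = {"renders": 0}  # counts how often the "servlet" had to run

cache_dir = Path(tempfile.mkdtemp())


def render(page_id: str) -> str:
    stats["renders"] += 1
    return f"<html><body>{database[page_id]}</body></html>"


def cache_path(page_id: str) -> Path:
    return cache_dir / f"{page_id}.html"


def handle_get(page_id: str) -> str:
    """Serve the precomputed file if present, otherwise generate and store it."""
    path = cache_path(page_id)
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = render(page_id)
    # Write to a temporary name first, then rename, so a concurrent reader
    # never sees a half-written file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(html, encoding="utf-8")
    tmp.replace(path)
    return html


def handle_post(page_id: str, new_content: str) -> None:
    """Update the data and drop the cached file; the next GET regenerates it."""
    database[page_id] = new_content
    cache_path(page_id).unlink(missing_ok=True)  # needs Python 3.8+
```

Two GETs in a row run `render()` only once; a POST deletes the file, so the following GET renders again. A real deployment would also have to consider a POST arriving while a GET is still regenerating the file, which this sketch ignores.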
In a later round of refinement, to conserve bandwidth, the cached HTML files should also be available in a gzip-ed version, which is provided when the client understands it. I believe the basic mechanisms should be the same as for uncompressed HTML files.
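As a rough sketch of the gzip refinement, assuming the compressed copy is stored next to the HTML file with a `.gz` suffix (a naming choice made up for this example), variant selection could look like this in Python. The function names are hypothetical, and the `Accept-Encoding` parsing is deliberately simplified: it does not handle the `*` wildcard.

```python
import gzip
import tempfile
from pathlib import Path


def gz_variant(html_path: Path) -> Path:
    # Assumed naming scheme: page1.html -> page1.html.gz
    return html_path.with_name(html_path.name + ".gz")


def store_variants(html_path: Path, html: str) -> None:
    """Store the page both uncompressed and gzip-compressed."""
    data = html.encode("utf-8")
    html_path.write_bytes(data)
    gz_variant(html_path).write_bytes(gzip.compress(data))


def accepts_gzip(accept_encoding: str) -> bool:
    """Simplified check: is 'gzip' listed with a non-zero q value?"""
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        if parts[0].strip().lower() != "gzip":
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def pick_variant(html_path: Path, accept_encoding: str):
    """Return the file to send plus the headers that describe it."""
    gz_path = gz_variant(html_path)
    if accepts_gzip(accept_encoding) and gz_path.exists():
        return gz_path, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    return html_path, {"Vary": "Accept-Encoding"}
```

With this layout, invalidation on POST has to remove both files, otherwise a stale compressed copy could outlive the uncompressed one.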
Since there will be many such REST pages, both approaches may occasionally need some sort of garbage collection mechanism that deletes rarely used HTML files to conserve disk space.
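One possible shape for such a garbage collector, sketched in Python: delete cached files that have not been read for a given time. The function name, the `*.html*` file pattern, and the use of the access time are assumptions for the example. Note that many filesystems are mounted with `noatime` or `relatime`, in which case the access time is not a reliable signal and some other usage record would be needed.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def collect_garbage(cache_dir: Path, max_idle_seconds: float,
                    now: Optional[float] = None) -> List[str]:
    """Delete cached files whose last access is older than max_idle_seconds.

    Returns the sorted names of the removed files.
    """
    if now is None:
        now = time.time()
    removed = []
    # Matches both page.html and a compressed page.html.gz, if present.
    for path in cache_dir.glob("*.html*"):
        idle = now - path.stat().st_atime
        if idle > max_idle_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

With the second approach a deleted file is simply regenerated on the next GET. With the first approach a deleted file would be missing until the next POST, so this kind of cleanup fits the lazy variant better.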
To summarize, I am confident that my optimized GET/HEAD architecture can be implemented cleanly. First of all, I would like opinions on the idea (I think it is good, but I could be wrong). I would also like to know whether anyone has experience with such an architecture, or perhaps even knows of a framework that implements it.
Finally, I would like to point out that client caching is not the solution I am looking for, because many different clients will GET or HEAD the same page. Moreover, I want to avoid the servlet machinery entirely during GET/HEAD requests when the precomputed file exists. The servlet should not even be called to supply cache-related HTTP headers for GET/HEAD requests, nor to copy the file to the output.
- Are there better (standard) mechanisms for achieving the goal stated at the beginning?
- If not, does anyone know of an existing framework like the one I have in mind?
I do not think HTTP caching achieves my goal. As I understand it, an HTTP cache would still have to call the servlet with a HEAD request to find out whether a POST has changed the page. Since page changes arrive at unpredictable times, an expiration-time HTTP header is not good enough.
Use the Expires HTTP header and/or HTTP conditional requests.
The Expires entity-header field gives the date/time after which the response is considered stale. A stale cache entry may not normally be returned by a cache (either a proxy cache or a user agent cache) unless it is first validated with the origin server (or with an intermediate cache that has a fresh copy of the entity). See section 13.2 for further discussion of the expiration model.
Decorate a cacheable response with the Expires, Last-Modified and/or ETag headers. Make requests conditional with the If-Modified-Since, If-None-Match and other If-* headers (see the RFC).
E.g., the latest response headers:
... Expires: Wed, 15 Nov 1995 04:58:08 GMT ...
Don't make a new request for the resource until the expiration date (Expires header), and then issue a conditional request:
... If-Modified-Since: Wed, 15 Nov 1995 04:58:08 GMT ...
If the resource has not been modified, a 304 Not Modified response code is returned and the response has no body. Otherwise, 200 OK and a response with a body are returned.
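The server side of this exchange can be sketched as follows. This is a minimal Python illustration with a hypothetical `respond()` function, not any particular server's API; it only handles If-Modified-Since and treats an unparsable date as if the header were absent. In the question's setup, the timestamp could plausibly come from the cached file's modification time.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond(last_modified: datetime, body: bytes,
            if_modified_since: Optional[str] = None):
    """Return (status, headers, body) for a GET with an optional validator.

    last_modified must be a timezone-aware UTC datetime.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable date: answer as an unconditional GET
        if since is not None:
            if since.tzinfo is None:
                # Dates without a usable zone are treated as UTC here.
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers, b""  # Not Modified, no body
    return 200, headers, body
```

For example, with a resource last modified at `datetime(1995, 11, 15, 4, 58, 8, tzinfo=timezone.utc)`, a request carrying `If-Modified-Since: Wed, 15 Nov 1995 04:58:08 GMT` gets status 304 and an empty body, while a request without the header gets 200 and the full body.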
Note: the HTTP RFC also defines the Cache-Control header.
See Caching in HTTP http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html