Wednesday, October 21, 2009

5.6. Applied Resource-Oriented Architecture





Recently, I built a
resource-oriented system as part of the rearchitecture work my company did for the
Persistent URL (PURL) system. The original PURL[27] implementation was done close to 15 years ago. It was a
forked version of Apache 1.0, written in C and reflecting the state of the art
at the time.[28] It
has been a steady piece of Internet infrastructure since then, but it was
showing its age and needed modernization, particularly to support the W3C TAG's
303 recommendation and higher volumes of use. Most of the data was accessible
through web pages or ad hoc CGI-bin scripts because at the time, the browser
seemed like the only real client to serve. As we started to realize the
applicability of persistent, unambiguous identifiers for use in the Semantic
Web, life sciences, publication, and similar communities, we knew that it was
time to rethink the architecture to be more useful for both people and
software.



[27] http://purl.org



[28] This codebase formed the basis of the very successful
TinyURL (http://tinyurl.com) service.


The PURL system was designed to
mediate the tension between good names and resolvable names. Anyone who has been
publishing content on the Web over time knows that links break when content gets
moved around. A Persistent URL is a good, logical name that maps to a
resolvable location. For example, a PURL could be defined
that points from http://purl.org/people/briansletten to http://bosatsu.net/foaf/brian.rdf and
returns a 303 to indicate a "see also" response. I am not a network-addressable
resource, but my Friend-of-a-Friend (FOAF) file[29] is a place to find more information about me. I could pass
that PURL around to anyone who wants to link to my FOAF file. If I ever move to
some other company, I could update the PURL to point to a new location for my
FOAF file. All existing links will remain valid; they will just 303 to the new
location. This process is described in Figure 5-6. The PURL Server implements the W3C Technical
Architecture Group (TAG) guidance that 303 response codes can be used to provide
more information about non-network addressable resources.
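
The redirect behavior described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class PurlTable, its methods, and the example.org target are hypothetical and are not taken from the PURL server. The sketch shows a PURL answering with a 303 and a Location, and shows that repointing the PURL leaves the PURL itself unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a PURL table; not the PURL server's storage.
class PurlTable {

    // Outcome of resolving a PURL: HTTP status plus the Location header, if any.
    static final class Resolution {
        final int status;
        final String location;

        Resolution(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    private final Map<String, String> seeAlsoTargets = new HashMap<>();

    // Create a PURL or repoint an existing one; the PURL itself never changes.
    void define(String purlPath, String targetUrl) {
        seeAlsoTargets.put(purlPath, targetUrl);
    }

    // A known PURL answers 303 with its current target; an unknown one answers 404.
    Resolution resolve(String purlPath) {
        String target = seeAlsoTargets.get(purlPath);
        return target == null ? new Resolution(404, null) : new Resolution(303, target);
    }

    public static void main(String[] args) {
        PurlTable table = new PurlTable();
        table.define("/people/briansletten", "http://bosatsu.net/foaf/brian.rdf");

        // Moving the FOAF file only requires repointing the PURL
        // (example.org is a placeholder target).
        table.define("/people/briansletten", "http://example.org/foaf/brian.rdf");

        Resolution r = table.resolve("/people/briansletten");
        System.out.println(r.status + " Location: " + r.location);
    }
}
```

Anyone holding the PURL keeps a valid link across the move; only the Location in the 303 response changes.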



[29] http://foaf-project.org



Figure 5-6. PURL "See Also" redirect




In addition to supporting the PURL
redirection, we wanted to treat each major piece of data in the PURL system as
an addressable information resource. Not only does this simplify the interaction
with the user interface, it also allows the data to be reused in ways
we did not originally plan. Manipulation of the resource requires
ownership credentials, but anyone is allowed to fetch the definition of a PURL.
There is the direct resolution process of hitting a PURL such as http://purl.org/employee/briansletten (which will result in the 303 redirect), as well as the
indirect RESTful address of the PURL resource http://purl.org/admin/purl/employee/briansletten, which will return a definition of the PURL that currently
looks something like the following:

<purl status="1">
  <id>/employee/briansletten</id>
  <type>303</type>
  <maintainers>
    <uid>brian</uid>
  </maintainers>
  <seealso>
    <url>http://bosatsu.net/foaf/brian.rdf</url>
  </seealso>
</purl>


Clients of the PURL server can "surf" to
the data definition as a means of finding information about a PURL resource
without actually resolving it. No code needs to be written to retrieve this
information. We can view it in a browser or capture it on the command line with
curl. As such, we can imagine writing shell scripts that use data from our
information resources to check whether a PURL points to something valid and is
returning reasonable results. If not, we could find the owner of the PURL and
fire off a message to the email address associated with the account.
Addressable, accessible data finds its way into all manner of unintended
orchestrations, scripts, applications, and desktop widgets because it is so easy
and useful to do so.
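
As a rough illustration of how little client code such a check needs, the following Java sketch reads a PURL definition and pulls out the fields a monitoring script would want. It is a sketch under assumptions: the class and method names are hypothetical, and the definition is embedded as a string so the example runs offline. A real client would first fetch the document from the administrative URL.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for reading a PURL definition; not part of the PURL server.
class PurlDefinitionReader {

    // Parse a PURL definition document; checked parser errors become unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable PURL definition", e);
        }
    }

    // Text of the first element with this tag name, or null if there is none.
    static String firstText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // The definition shown in the text, embedded so the sketch runs offline.
        String xml = "<purl status=\"1\">"
                + "<id>/employee/briansletten</id>"
                + "<type>303</type>"
                + "<maintainers><uid>brian</uid></maintainers>"
                + "<seealso><url>http://bosatsu.net/foaf/brian.rdf</url></seealso>"
                + "</purl>";

        Document doc = parse(xml);
        System.out.println("type:       " + firstText(doc, "type"));
        System.out.println("target:     " + firstText(doc, "url"));
        System.out.println("maintainer: " + firstText(doc, "uid"));
    }
}
```

With the target and maintainer in hand, a script could test the target URL and notify the maintainer if it no longer responds.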


In the interest of full disclosure,
we failed to support JSON as a request format in the initial release, which
complicated the AJAX user interface. JavaScript XML handling leaves a lot to be
desired. Even though we use the XML form internally, we should have gone to the
trouble of exposing the JSON form for parsing in the browser. You can be sure we
are fixing this oversight soon, but I thought it was important to highlight the
benefits we could have taken advantage of if we had gotten it right in the first
place. You do not need to support all data formats up front, but these days
supporting both XML and JSON is a good start.


As an interesting side note, we could
have chosen several containers and tools to expose this architecture as
expressed so far. Anything that responds to HTTP requests could have acted as
our PURL server. This represents a shallow but useful notion of RESTful
interfaces and resource-oriented architecture, as demonstrated in Figure 5-7. Any web server or application server can act as a shallow
resource-oriented engine. The logical HTTP requests are interpreted as requests
into servlets, Restlets, and similar addressable functionality.



Figure 5-7. Shallow resource-oriented
architectures




We chose to use NetKernel as the foundation for this architecture because
it is the embodiment of resource-oriented architectures and has a dual license,
allowing its use with both open source and commercial projects. The idea of a
logical coupling between layers with different representations is baked into the
software architecture and offers similar benefits of flexibility, scalability,
and simplicity. The linkage between the layers is through asynchronously
resolved, logical names. This deeper notion of resource-oriented architectures
looks something like Figure 5-8.
NetKernel is an interesting software infrastructure because it carries the idea of
logically connected resources inside the application, so that logical HTTP requests
can be turned into other logical requests. This architecture reflects the properties of the
Web in a runtime software environment.



Figure 5-8. Deep resource-oriented architectures




The external URL http://purl.org/employee/briansletten gets mapped through a rewrite to a piece of
functionality called an accessor.[30]
Accessors live in modules that export public URI definitions representing an
address space they will respond to. The convenience here is that it is possible
to radically change the implementation technologies in a newer version of a
module and simply update the rewrite rules to point to the new implementation.
The client needs to be none the wiser as long as we return compatible responses.
We can approximate this flexibility in modern object-oriented languages through
the use of interfaces, but that still constrains us to a "physical" coupling to
the interface definition. With the logical-only binding, we still need to
support expectations from existing clients, but beyond that we are not coupled
to any particular implementation detail. This is the same value we see
communicating through URIs on the Web, but in locally running software!



[30] http://docs.1060.org/docs/3.3.0/book/gettingstarted/doc_intro_code_accessor.html


Internally, we use the Command
Pattern[31]
associated with the method type of the request to implement the accessor. An
HTTP GET method is mapped to a GetResourceCommand that maintains no state. When the request comes in, we
pull the command out of a map and issue the request to it. The REST stateless
style ensures that all information needed to answer the request is contained in
the request, so we do not need to maintain state in the command instance. We can
access that request state through the context instance in the following code.
This code looks relatively straightforward to Java developers. We are calling
methods on Java objects, catching exceptions, the works. An important thing to
note is the use of the IURAspect interface. We are essentially saying that we do
not care what form the resource is in. It could be a DOM instance, a JDOM
instance, a string, or a byte array; for our purposes it does not matter. The
infrastructure will convert it into a bytestream tagged with metadata before
responding to the request. If we had wanted it in a particular form supported by
the infrastructure, we could have simply asked for it in that form. This
declarative, resource-oriented approach helps radically reduce the amount of
code that is necessary to manipulate data and allows us to use the right tool
for the right job:



[31] http://en.wikipedia.org/wiki/Command_pattern

if (resStorage.resourceExists(context, uriResolver)) {
    IURAspect asp = resStorage.getResource(context, uriResolver);

    // Filter the response if we have a filter
    if (filter != null) {
        asp = filter.filter(context, asp);
    }

    // Default response code of 200 is fine
    IURRepresentation rep = NKHelper.setResponseCode(context, asp, 200);
    rep = NKHelper.attachGoldenThread(context, "gt:" + path, rep);
    retValue = context.createResponseFrom(rep);
    retValue.setCacheable();
    retValue.setMimeType(NKHelper.MIME_XML);
} else {
    IURRepresentation rep = NKHelper.setResponseCode(context,
        new StringAspect("No such resource: "
            + uriResolver.getDisplayName(path)), 404);
    retValue = context.createResponseFrom(rep);
    retValue.setMimeType(NKHelper.MIME_TEXT);
}
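
The method-to-command dispatch that surrounds this listing might look roughly like the following. This is a minimal sketch, not the PURL server's code: the Request and ResourceCommand types and the response strings are hypothetical stand-ins, and the NetKernel context is reduced to a plain value object. The point is that the commands hold no fields, so one instance per HTTP method can be kept in a map and shared across all requests.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of stateless commands selected by HTTP method.
class MethodDispatch {

    // Stand-in for the request context: it carries all of the request state.
    static final class Request {
        final String method;
        final String path;

        Request(String method, String path) {
            this.method = method;
            this.path = path;
        }
    }

    // A command keeps no state of its own; everything arrives in the request.
    interface ResourceCommand {
        String execute(Request request);
    }

    private final Map<String, ResourceCommand> commands = new HashMap<>();

    MethodDispatch() {
        // One shared instance per method is safe because commands are stateless.
        commands.put("GET", req -> "200 definition of " + req.path);
        commands.put("DELETE", req -> "200 deleted " + req.path);
    }

    // Pull the command out of the map and hand the request to it.
    String handle(Request request) {
        ResourceCommand command = commands.get(request.method);
        if (command == null) {
            return "405 method not allowed: " + request.method;
        }
        return command.execute(request);
    }

    public static void main(String[] args) {
        MethodDispatch dispatch = new MethodDispatch();
        System.out.println(dispatch.handle(new Request("GET", "/purl/employee/briansletten")));
        System.out.println(dispatch.handle(new Request("PATCH", "/purl/employee/briansletten")));
    }
}
```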


Most of the information resources will
return a 200 when a GET request is issued. Obviously, PURLs override that
behavior to return 302, 303, 307, 404, etc. The interesting resource-oriented
tidbit is revealed when we inspect the PURL-oriented implementation of the
resStorage.getResource() method:


 


INKFRequest req = context.createSubRequest("active:purl-storage-query-purl");
req.addArgument("uri", uri);
IURRepresentation res = context.issueSubRequest(req);
return context.transrept(res, IAspectXDA.class);




In essence, we are issuing a logical request
through the active:purl-storage-query-purl URI
with an argument of ffcpl:/purl/employee/briansletten. Ignore the unusual URI scheme; it is simply used to
represent an internal request in NetKernel. We do not know what code is actually
going to be invoked to retrieve the PURL in the requested form, nor do we
actually care. In a resource-oriented environment, we simply are saying, "The
thing that responds to this URI will generate a response for me." We are now
free to get things going quickly by serving static files to clients of the
module while we design and build something like a Hibernate-based mapping to our
relational database. We can make this transition by rewriting what responds to
the active:purl-storage-query-purl URI. The
client code never needs to know the difference. If we change the PURL resolution
away from a local persistence layer to a remote fetch, the client code still
does not need to care. These are the benefits we have discussed in the larger notion of
resource-oriented Enterprise computing made concrete in a powerful software
environment.
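
The idea of binding client code to a logical name rather than to an implementation can be imitated in plain Java. The sketch below is not the NetKernel API: LogicalResolver and its bind and request methods are hypothetical, and the two implementations simply return labeled strings. It shows the caller issuing the same logical request before and after the implementation behind the URI is swapped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry of logical URIs; an imitation of the idea, not NetKernel.
class LogicalResolver {

    private final Map<String, Function<String, String>> bindings = new HashMap<>();

    // Bind, or rebind, whatever code answers requests to this logical URI.
    void bind(String logicalUri, Function<String, String> implementation) {
        bindings.put(logicalUri, implementation);
    }

    // Callers know only the logical URI and the argument, never the implementation.
    String request(String logicalUri, String argument) {
        Function<String, String> implementation = bindings.get(logicalUri);
        if (implementation == null) {
            throw new IllegalArgumentException("Nothing responds to " + logicalUri);
        }
        return implementation.apply(argument);
    }

    public static void main(String[] args) {
        LogicalResolver resolver = new LogicalResolver();
        String query = "active:purl-storage-query-purl";
        String purl = "ffcpl:/purl/employee/briansletten";

        // Start quickly with a stand-in for static files.
        resolver.bind(query, uri -> "static file for " + uri);
        System.out.println(resolver.request(query, purl));

        // Later, rebind to a stand-in for a database-backed lookup.
        resolver.bind(query, uri -> "database row for " + uri);
        System.out.println(resolver.request(query, purl));
    }
}
```

The two calls to request are identical; only the binding changed between them.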


Not only are our layers loosely coupled
like this, but we get the benefit of an idempotent, stateless request in this
environment as well. The earlier code snippet that fetches the PURL definition
gets flattened internally to an asynchronously scheduled request to the URI
active:purl-storage-query-purl+uri@ffcpl:/purl/employee/briansletten. As we discussed earlier, this becomes a compound
hash key representing the result of querying our persistence layer for the
generated result. Even though we know nothing about the code that gets invoked,
NetKernel is able to cache the result nonetheless. This is the architectural
memoization that I mentioned before. The
actual process is slightly more nuanced, but in spirit, this is what is going
on. If someone else tries to resolve the same PURL either internally or through
the HTTP RESTful interface, we could pull the result from the cache. Though this
may not impress anyone who has built caching into their web pages, it is
actually a far more compelling result when you dig deeper. Any potential URI
request is cacheable in this way, whether we are reading files in from disk,
fetching them via HTTP, transforming an XML document through an XSLT file, or
calculating pi to 10,000 digits. Each of these invocations is done through a
logical, stateless, asynchronous request, and each has the potential to be
cached. This resource-oriented architectural style gives us software that
is efficient, is cacheable, and works through uniform, logical
interfaces. This results in substantially less brittle, more flexible
architectures that scale, just like the Web and for the same reasons.
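
The memoization described in this section can be approximated with a map keyed by the flattened request URI. The following is a simplified sketch under assumptions: MemoizingResolver is a hypothetical class, the backend is any function from argument to result, and cache invalidation and asynchronous scheduling are left out entirely. It shows that a repeated request with the same compound key never reaches the backend a second time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of caching by compound request key; not NetKernel's cache.
class MemoizingResolver {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> backend;
    private final AtomicInteger backendCalls = new AtomicInteger();

    MemoizingResolver(Function<String, String> backend) {
        this.backend = backend;
    }

    // Flatten the logical URI and its argument into a single key,
    // following the form shown in the text.
    static String key(String logicalUri, String argumentUri) {
        return logicalUri + "+uri@" + argumentUri;
    }

    // Answer from the cache when possible; otherwise invoke the backend once.
    String request(String logicalUri, String argumentUri) {
        return cache.computeIfAbsent(key(logicalUri, argumentUri), k -> {
            backendCalls.incrementAndGet();
            return backend.apply(argumentUri);
        });
    }

    int backendCalls() {
        return backendCalls.get();
    }

    public static void main(String[] args) {
        MemoizingResolver resolver =
                new MemoizingResolver(uri -> "definition of " + uri);
        String query = "active:purl-storage-query-purl";
        String purl = "ffcpl:/purl/employee/briansletten";

        resolver.request(query, purl);
        resolver.request(query, purl); // served from the cache
        System.out.println("backend calls: " + resolver.backendCalls());
    }
}
```

The cache knows nothing about what the backend does, which is why the same mechanism could cover file reads, HTTP fetches, and transformations alike.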


 


 

