May 15, 2008 at 1:34PM Quick hacks: Python netstring reader
I’m writing a proof-of-concept server for a less jokey version of SRGP. The protocol is more or less SRGP, but it doesn’t use MIME-style headers: each request is made up of a series of netstrings. The header and body sections are terminated with empty netstrings, i.e., 0:,.
The proof-of-concept server is being written in Python. Here’s the netstring reader generator function, in case it might be of use to somebody else out there:
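class MalformedNetstringError(Exception):
    """Raised when the stream isn't a well-formed sequence of netstrings."""


def netstring_reader(f):
    """Yield the payload of each netstring read from the file object f."""
    while True:
        # Read the length prefix, one character at a time, up to the colon.
        n = ""
        while True:
            c = f.read(1)
            if c == '':
                # End of the stream.
                return
            if c == ':':
                if n == '':
                    # A netstring must start with a length.
                    raise MalformedNetstringError
                break
            if c not in '0123456789' or len(n) > 10:
                raise MalformedNetstringError
            if c == '0' and n == '':
                # We can't allow leading zeros.
                if f.read(1) != ':':
                    raise MalformedNetstringError
                n = c
                break
            n += c
        n = int(n, 10)
        payload = f.read(n)
        if len(payload) < n:
            # The stream ended partway through a payload.
            return
        if f.read(1) != ',':
            raise MalformedNetstringError
        yield payload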
netstring_reader() takes a file object and returns a generator that yields each netstring extracted from the file object. If the stream is malformed, a MalformedNetstringError is raised.
Currently the diagnostics are nowhere near as good as they ought to be. The error should state whether it’s a malformed length or a missing terminating comma, and give the offset into the stream where the error occurred. There are a few places where the reader can be made more robust, but expect that in later releases.
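As a rough sketch of what those better diagnostics might look like, the following reads a single netstring and reports both the reason and the stream offset when something is wrong. The names NetstringError, CountingReader, and read_one are made up for illustration and aren't part of the reader above. Unlike the reader above, this sketch also treats a truncated payload as an error rather than as the end of the stream.

```python
import io


class NetstringError(Exception):
    """Hypothetical error type that records what went wrong and where."""

    def __init__(self, reason, offset):
        Exception.__init__(self, "%s at offset %d" % (reason, offset))
        self.reason = reason
        self.offset = offset


class CountingReader(object):
    """Wraps a file object and counts the characters read so far."""

    def __init__(self, f):
        self.f = f
        self.offset = 0

    def read(self, n):
        data = self.f.read(n)
        self.offset += len(data)
        return data


def read_one(reader):
    """Read one netstring from a CountingReader; return None at end of stream."""
    digits = ""
    while True:
        start = reader.offset
        c = reader.read(1)
        if c == "":
            if digits:
                raise NetstringError("truncated length", start)
            return None
        if c == ":":
            break
        # Reject non-digits, overlong lengths, and anything after a leading zero.
        if c not in "0123456789" or len(digits) >= 10 or digits == "0":
            raise NetstringError("malformed length", start)
        digits += c
    if digits == "":
        raise NetstringError("malformed length", start)
    payload = reader.read(int(digits))
    if len(payload) < int(digits):
        raise NetstringError("truncated payload", reader.offset)
    start = reader.offset
    if reader.read(1) != ",":
        raise NetstringError("missing terminating comma", start)
    return payload


# The second netstring is followed by "X" where the comma should be.
reader = CountingReader(io.StringIO(u"5:hello,3:abcX"))
first = read_one(reader)
try:
    read_one(reader)
except NetstringError as e:
    problem = (e.reason, e.offset)
```

Wrapping the file object keeps the offset bookkeeping out of the parsing logic, which seems like the least intrusive way to add it.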
Here’s an example of it in use.
f = open("sample.dat", "r")
try:
    for i in netstring_reader(f):
        print(i)
finally:
    f.close()
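For completeness, here is a minimal sketch of the writing side, which the post doesn't cover. The function names encode_netstring and encode_section are hypothetical, and the two header fields are placeholder values, since the exact layout of a request isn't spelled out here. It works on bytes because a netstring's length prefix counts bytes.

```python
def encode_netstring(payload):
    """Frame a bytes payload as a netstring: <length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode_section(fields):
    """Frame each field, then close the section with an empty netstring."""
    framed = b"".join(encode_netstring(field) for field in fields)
    return framed + encode_netstring(b"")


# Placeholder fields, purely for illustration.
section = encode_section([b"Content-Type", b"text/xml"])
```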
May 13, 2008 at 1:55PM tkdiff patches: BZR support and improved documentation formatting
Two quick hacks to tkdiff 4.1.4.
The first patch is basic Bazaar NG support. I’ve grown so accustomed to using tkdiff for viewing file diffs with Subversion that I miss it when I’m using Bazaar NG. This patch adds it, but there’s still an edge case when you’re dealing with new files, not that it should matter terribly.
The second patch improves the documentation formatting, prettying it up a bit. Not an important patch, but still it’s worth committing, I think.
May 13, 2008 at 1:40PM Iron and Wine - Boy with a Coin
And on the topic of things iron...
May 13, 2008 at 1:19PM Why you should wait in the cinema to watch the Iron Man credits
Seriously, could this possibly get any more awesome? I don’t think so!
[via Rick O’Shea, via an IM from Elly]
May 12, 2008 at 11:57AM Ideas from odd places: a reasonable RPC envelope protocol, and asynchronous calls
Warning: This post is an extended geeky joke. If you’re not a geek, you
definitely won’t get the point. If you are a geek, you still probably won’t
get the point, but believe me, you’re better for it.
Sometimes something you do as a joke can yield interesting results.
About a year or so ago, I was feeling bored and decided to see how much of a diet I could put XML-RPC on, as you do. Though I managed to slim it down quite a bit and produce a RELAX NG schema for the resulting format, it was all just a diversion and I only did it for fun. I let all of this sit on my hard drive festering away for ages until I stumbled across it again and decided to fiddle with the idea of a HTTP-derived joke protocol designed specifically for tunnelling the likes of XML-RPC.
SRGP, a HTTP pod-person for RPC
I called the protocol SRGP, or Simplified RPC Gateway Protocol, and it was intended to be a pod-person equivalent of HTTP. It looked like HTTP and behaved a lot like HTTP, but there was something wrong about it.
It had a Winer-esque specification that didn’t go to much effort to rigorously outline the workings of the protocol, and made lots of undocumented assumptions, all of which were meant to lead to subtly incompatible client and server implementations. Because it’s a pod-person protocol, it could be implemented on top of a HTTP server with little effort.
It was too late for April Fool’s Day, so I never published the spec, but the process of writing it did lead me down an interesting path.
An overview of SRGP
SRGP looks like HTTP. It’s subtly different, though not quite incompatible. The semantics of GET and POST change slightly. A GET request on an endpoint returns a service document or IDL description for that endpoint describing what it exposes. A POST request is used for sending a payload representing a remote procedure call to the endpoint.
Requests and responses are of the same form as their HTTP equivalents, but
SRGP’s set of headers has been trimmed down. It still understands
Content-Type, Accept-Encoding, Content-Encoding, User-Agent, Server,
Connection, Host, Location, Content-Length, Retry-After, and Date.
These headers mean more or less the same thing as their HTTP equivalents.
It also introduces two header elements of its own: Reference and Token, but
I’ll get to those in a tick.
The response status codes are trimmed down. Their semantics are close to the HTTP originals:
200 OK
202 Accepted
301 Moved Permanently
400 Bad Request
403 Forbidden
404 Bad Endpoint
405 Bad Request Type
500 Internal Server Error
503 Service Unavailable
505 Protocol Version Not Supported
200 implies that the request was successfully dispatched, but not
necessarily that it was successfully processed. After all, it can contain a
fault response. The headers accompanying it are Content-Type and Date, and
if they were provided with the request, Reference and Token will be
returned with the response too. The response might also include the
Content-Encoding, Server, and Connection headers.
202 implies that the request has been accepted but not yet processed, and the
body may contain a plaintext explanation of why. Other than that, it’s just
like a 200 response, but will always be accompanied by a Reference header so
that the request can be retried at a later point. If none accompanied the
request, the server generates its own. The response also contains a
Retry-After header so the client knows the polling interval to use to retry
the request to check if it’s been processed.
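The header rules for 200 and 202 could be sketched roughly as follows. The function name response_headers, the UUID-based generated reference, and the 30 second default for Retry-After are all assumptions made for illustration. The sketch also simply echoes Token back, which glosses over the Token: new case.

```python
import uuid
from email.utils import formatdate


def response_headers(status, request_headers, content_type, retry_after=30):
    """Build the headers for a 200 or 202 response (illustrative only)."""
    headers = {
        "Content-Type": content_type,
        "Date": formatdate(usegmt=True),
    }
    # Reference and Token come back only if the request supplied them.
    for name in ("Reference", "Token"):
        if name in request_headers:
            headers[name] = request_headers[name]
    if status == 202:
        # A 202 always carries a Reference; generate one if none was sent.
        headers.setdefault("Reference", uuid.uuid4().hex)
        # Tell the client how long to wait before checking again.
        headers["Retry-After"] = str(retry_after)
    return headers
```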
The point behind the Reference header is to guarantee call idempotency.
Lack of the header implies that the client doesn’t particularly care if the call
is serviced once or many times, just as long as it gets serviced. The presence of
the Reference header in a request is the client’s way of saying that it wants
the call to be serviced once and once only, and that any number of calls with
the same reference should be treated as a single call. To this extent, it’s a
bit like a HTTP client including an ETag in a POST request. The server should
maintain reference affinity with the client that generated the reference in the
first place, so that two clients can generate the same reference without the
collision causing security and other problems. How affinity might work, I’m not
sure.
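One way to picture the once-and-once-only behaviour is a result cache keyed on the client and the reference together. This is only a sketch: ReferenceCache is a hypothetical name, and client_id stands in for whatever the affinity mechanism turns out to be, which is left open above.

```python
class ReferenceCache(object):
    """Remembers results by (client, reference) so repeats run only once."""

    def __init__(self):
        self._results = {}

    def call(self, client_id, reference, func, *args):
        if reference is None:
            # No Reference header: servicing the call repeatedly is fine.
            return func(*args)
        # Keying on the client too keeps two clients that happen to pick
        # the same reference from seeing each other's results.
        key = (client_id, reference)
        if key not in self._results:
            self._results[key] = func(*args)
        return self._results[key]
```

A real server would also need to expire old entries at some point, which this leaves out.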
Considering I’ve covered Reference, I might as well talk about Token. If
Cookie is the state management equivalent of somebody with half their brain
removed, Token is the equivalent of just leaving behind the brain stem. This
header, when used in a request, can contain one of two things: ‘new’, or
‘v="token"’, where token is a string that acts as a session identifier.
Token: new is the client’s way of saying they want to start a new session.
The server responds with Token: v="token", and the client sends that back
with each request that’s meant to be part of the session. Session state is
unique to individual endpoints, not to the host. A session times out after 30
minutes without a request from the client, but other than that, it’s up to the
server to decide how it manages session state. References are not tied to
sessions, nor the reverse. It’s nicely stupid. Token is a whoopee cushion and
a server or client may not even implement it.
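A server-side sketch of the Token rules might look like this. SessionStore and its handle method are hypothetical names, the token is just a random UUID, and the store is a plain in-memory dict. Sessions are keyed on the endpoint as well as the token, since session state belongs to the endpoint rather than the host.

```python
import time
import uuid

SESSION_TIMEOUT = 30 * 60  # thirty minutes, in seconds


class SessionStore(object):
    def __init__(self, clock=time.time):
        self._clock = clock
        self._sessions = {}  # (endpoint, token) -> (last_seen, state)

    def handle(self, endpoint, token_header):
        """Return (Token header for the response, session state).

        Returns (None, None) if the header is malformed, or if the
        session is unknown or has timed out.
        """
        now = self._clock()
        if token_header == "new":
            token, state = uuid.uuid4().hex, {}
        elif (token_header.startswith('v="') and token_header.endswith('"')
              and len(token_header) > 4):
            token = token_header[3:-1]
            key = (endpoint, token)
            last_seen, state = self._sessions.get(key, (None, None))
            if last_seen is None or now - last_seen > SESSION_TIMEOUT:
                self._sessions.pop(key, None)
                return None, None
        else:
            return None, None
        # Every valid request refreshes the session's timeout.
        self._sessions[(endpoint, token)] = (now, state)
        return 'v="%s"' % token, state
```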
301, 400, 403, 404, 405, 500, 503, and 505 all do what you
might guess from familiarity with HTTP. 503 is notable though, so I think
it’s worth mentioning here. It means that the gateway server or endpoint is
temporarily down, so the client should try again later. The body can contain a
plaintext explanation of why it’s down, and the response will include a
Retry-After header so the client knows what amount of time to retry the
original request after.
The interesting diversion
The existence of SRGP is rather silly, though not entirely impractical. An implementation of it would be simpler than an implementation of HTTP, and it’s somewhat more useful than tunnelling RPC over POST.
It does include at least two interesting and connected ideas: the Reference
header and the repurposing of the 202 response code. The existence of these
makes it possible for a server to process calls asynchronously, which is
something tunnelling RPC over POST doesn’t let you do.
But now that calls can be processed asynchronously by the server, what about
the client? To get a response, it needs to poll the server for the response.
POST can be used, but it’s not a great fit, so we’ll introduce a new verb,
POLL, which is to POST in SRGP as HEAD is to GET in HTTP. The
difference is that POLL is always accompanied by a Reference header, and
never has a body. This allows the client to poll for an asynchronous response
without the overhead of a large or expensive-to-calculate request
body. Other than that, its behaviour is identical to POST.
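From the client’s side, the POST-then-POLL dance could be sketched like this. The send callable is a hypothetical transport that takes a verb, a dict of headers, and a body, and returns a (status, headers, body) tuple. Treating Retry-After as a number of seconds and capping the number of polls are both assumptions.

```python
import time


def call_with_polling(send, body, reference, sleep=time.sleep, max_polls=10):
    """POST a call, then POLL with the same reference while the server says 202."""
    status, headers, response = send("POST", {"Reference": reference}, body)
    polls = 0
    while status == 202 and polls < max_polls:
        # Wait for as long as the server asked before checking again.
        sleep(int(headers.get("Retry-After", 1)))
        # POLL carries the Reference header but never a body.
        status, headers, response = send(
            "POLL", {"Reference": headers["Reference"]}, None)
        polls += 1
    return status, response
```

Taking the reference from the 202 response rather than the request means the loop would still work if the server had generated the reference itself.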
It’s sometimes better for both the server and client to instead have the server
notify the client that the request’s been processed. With that in mind, I
decided to introduce one more verb and a header: NOTIFY and Notify
respectively.
A POST that includes a Notify header must always include a Reference
header. The presence of this header states that if the call is processed
asynchronously, the server will send a NOTIFY request to the endpoint
specified by the Notify header containing the response. The server indicates
its understanding of the Notify header by repeating the Notify header in
the 202 response headers verbatim. Neither clients nor servers can expect
their opposite to implement Notify support and should fall back on polling if
unimplemented.
When the server has processed the asynchronous request, it sends a NOTIFY
request to the endpoint specified in the original request. The request contains
the same headers and content as a polled 200 response would. The only valid
2xx code a notification endpoint can respond with is 200. 503 and 301
responses are processed as you’d expect. A 400 response to a valid request, or
a 403, 404, 405, or 505 response, should be handled by an angry phone call
or email to the entity who made the original request to fix their software. A
500 response should be handled by a polite phone call or email asking them to
fix their software.
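In the spirit of the thing, the server’s handling of a notification endpoint’s reply could be sketched as a simple dispatch on the status code. The function name and the action labels are invented for illustration.

```python
def notify_outcome(status):
    """Map a notification endpoint's status code to the server's next step."""
    if status == 200:
        return "done"
    if status == 301:
        return "resend-to-new-location"
    if status == 503:
        return "retry-later"
    if status == 500:
        return "polite-complaint"
    if status in (400, 403, 404, 405, 505):
        return "angry-complaint"
    # Anything else, including 202, isn't a valid reply to a NOTIFY.
    return "unexpected"
```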
So, what’s interesting about this?
With very little effort, we’ve managed to add a pretty flexible and, more to the point, useful asynchronous call mechanism on top of SRGP.
Background: my XML-RPC reworking
Back to the XML-RPC reworking. It’s been done many times before, but about a year ago I decided to create a schema that mapped almost one-to-one onto that of XML-RPC, but without the bloat. I did this for fun and certainly never expected that anybody would ever seriously consider implementing anything to process it.
Here’s the sample response to a metaWeblog.getPost call:
<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value><struct>
        <member>
          <name>categories</name>
          <value>
            <array>
              <data>
                <value>Michegas</value>
                <value>Mind Bombs</value>
                <value>Rest &amp; Relaxation</value>
                <value>Two-Way-Web</value>
              </data>
            </array>
          </value>
        </member>
        <member>
          <name>dateCreated</name>
          <value>
            <dateTime.iso8601>20030729T10:59:48</dateTime.iso8601>
          </value>
        </member>
        <member>
          <name>description</name>
          <value>Blogger Ed Cone of Greensboro talks about the several
            intersections he overlooks.&amp;nbsp; That is: junctions
            of the public and the personal (which every blogger faces)
            and more particularly the contrasting voices of a
            newspaper columnist and a blogger (he is both) and the
            opportunities for a local conversation in a global medium.</value>
        </member>
        <member>
          <name>enclosure</name>
          <value>
            <struct>
              <member>
                <name>length</name>
                <value>
                  <i4>11421281</i4>
                </value>
              </member>
              <member>
                <name>type</name>
                <value>audio/mpeg</value>
              </member>
              <member>
                <name>url</name>
                <value>http://media.skybuilders.com/lydon/cone.mp3</value>
              </member>
            </struct>
          </value>
        </member>
        <member>
          <name>link</name>
          <value>http://blogs.law.harvard.edu/lydon/2003/07/18#a187</value>
        </member>
        <member>
          <name>permaLink</name>
          <value>http://radio.weblogs.com/0001015/2003/07/29.html#a1829</value>
        </member>
        <member>
          <name>postid</name>
          <value>
            <i4>1829</i4>
          </value>
        </member>
        <member>
          <name>title</name>
          <value>Chris Lydon interview with Ed Cone</value>
        </member>
        <member>
          <name>userid</name>
          <value>
            <i4>1015</i4>
          </value>
        </member>
      </struct></value>
    </param>
  </params>
</methodResponse>
The more I look at an XML-RPC request or response, the more it looks to me like an elaborate practical joke.
Here’s that same response in my trimmed down schema:
<?xml version="1.0"?>
<response>
  <map>
    <array key="categories">
      <string>Michegas</string>
      <string>Mind Bombs</string>
      <string>Rest &amp; Relaxation</string>
      <string>Two-Way-Web</string>
    </array>
    <date key="dateCreated">2003-07-29T10:59:48-05:00</date>
    <string key="description">
      Blogger Ed Cone of Greensboro talks about the several
      intersections he overlooks.&amp;nbsp; That is: junctions
      of the public and the personal (which every blogger faces)
      and more particularly the contrasting voices of a
      newspaper columnist and a blogger (he is both) and the
      opportunities for a local conversation in a global medium.
    </string>
    <map key="enclosure">
      <int key="length">11421281</int>
      <string key="type">audio/mpeg</string>
      <string key="url">
        http://media.skybuilders.com/lydon/cone.mp3
      </string>
    </map>
    <string key="link">
      http://blogs.law.harvard.edu/lydon/2003/07/18#a187
    </string>
    <string key="permaLink">
      http://radio.weblogs.com/0001015/2003/07/29.html#a1829
    </string>
    <string key="postid">1829</string>
    <string key="title">Chris Lydon interview with Ed Cone</string>
    <string key="userid">1015</string>
  </map>
</response>
There’s no way anybody can convince me that’s not better than the original.
As you can see, it didn’t take an awful lot of effort on my part to strip out
all the junk to give something a lot clearer and more compact than what Dave
Winer came up with. First, I removed the redundant <value/>, <params/>,
<param/>, and <data/> tags. Second, I wasn’t afraid of attributes and used
them as they were intended, which allowed me to get rid of the <member/>,
<methodName/>, and <name/> elements, and trim down <fault/> considerably.
Finally, I trimmed down the names: I got rid of the redundant <i4/> and
changed <dateTime.iso8601/> to <date/> because it’s more
obvious and doesn’t include a pointless description of its encoding,
<double/> to <float/> as that’s more familiar to the average person,
<base64/> to <binary/> because that describes what it’s for rather than how
it’s encoded, and <struct/> to <map/> because struct has the implication
of ordering, whereas map doesn’t.
Here’s the RELAX NG compact schema:
default namespace = "http://talideon.com/projects/schemas/xpc/v1/"

start = xpcCall | xpcResponse | xpcFault

xpcMethodName = xsd:string {
    pattern = "[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*"
}

xpcCall = element call {
    attribute method { xpcMethodName },
    xpcType*
}

xpcResponse = element response {
    xpcType?
}

xpcFault = element fault {
    attribute code { xsd:integer },
    text
}

xpcType =
    element nil     { empty } |
    element int     { xsd:integer } |
    element boolean { xsd:boolean } |
    element string  { text } |
    element float   { xsd:double } |
    element date    { xsd:dateTime } |
    element binary  { xsd:base64Binary } |
    element array   { xpcType* } |
    element map     { xpcMType* }

xpcKey = attribute key { text }

xpcMType =
    element nil     { xpcKey, empty } |
    element int     { xpcKey, xsd:integer } |
    element boolean { xpcKey, xsd:boolean } |
    element string  { xpcKey, text } |
    element float   { xpcKey, xsd:double } |
    element date    { xpcKey, xsd:dateTime } |
    element binary  { xpcKey, xsd:base64Binary } |
    element array   { xpcKey, xpcType* } |
    element map     { xpcKey, xpcMType* }
The type system’s extended slightly to include <nil/>, which represents the
lack of a value. This is already a common extension. Method names now have a
canonical form similar to that of method names in all the major languages, though
this is something I’m not terribly attached to. Because it has a schema, it’s
rigorously defined, so a spec for this RPC system would never have required
the Q&A section that accompanies the XML-RPC spec. Oh, and dates now require a
timezone.
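To give a feel for how little code the trimmed-down format needs, here is a rough Python 3 sketch that turns native values into elements following the schema’s type names. The function name to_element is hypothetical, the namespace is left out for brevity, map keys are sorted only to make the output predictable, and the caller is assumed to pass timezone-aware datetimes.

```python
import base64
import datetime
import xml.etree.ElementTree as ET


def to_element(value, key=None):
    """Convert a Python value to an element; key is set for map members."""
    attrs = {} if key is None else {"key": key}
    if value is None:
        return ET.Element("nil", attrs)
    if isinstance(value, dict):
        elem = ET.Element("map", attrs)
        for k in sorted(value):
            elem.append(to_element(value[k], k))
        return elem
    if isinstance(value, (list, tuple)):
        elem = ET.Element("array", attrs)
        for item in value:
            elem.append(to_element(item))
        return elem
    # bool has to be tested before int, since bool is a subclass of int.
    if isinstance(value, bool):
        name, text = "boolean", "true" if value else "false"
    elif isinstance(value, int):
        name, text = "int", str(value)
    elif isinstance(value, float):
        name, text = "float", repr(value)
    elif isinstance(value, datetime.datetime):
        name, text = "date", value.isoformat()
    elif isinstance(value, bytes):
        name, text = "binary", base64.b64encode(value).decode("ascii")
    else:
        name, text = "string", str(value)
    elem = ET.Element(name, attrs)
    elem.text = text
    return elem


enclosure = to_element({"length": 11421281, "type": "audio/mpeg"})
xml_text = ET.tostring(enclosure, encoding="unicode")
```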
Update: For reference’s sake, there’s already a reworking of XML-RPC called XPC, which goes even further by combining all the scalar types together, so the mapping is one way. I don’t mind annotating the types, but I wanted to minimise the wasted markup. I considered batch calls, but just never bothered.
May 6, 2008 at 2:50PM EPP domain nameserver update suckage
One of the things I really dislike about EPP is the demented method its domain mapping uses for updating the nameservers associated with a domain.
Rather than just specifying the nameservers directly, it requires you to specify the ones you want to remove from the current list of nameservers associated, and the ones to add. I imagine that when this ‘feature’ was added, the excuse used was update efficiency. But it requires the caller to have a pretty freshly minted list of the domain’s nameservers, which in turn requires a fairly recent <domain:info/> call to avoid a malformed update, so I really can’t see what possessed them to do things this way. That’s an extra call from a registrar’s EPP backend to the registry’s EPP server, which might fail, or might be slow, and may introduce a race condition where the fetch and update both need to be retried because somebody’s updated the nameservers in the meantime.
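The bookkeeping this forces on the caller amounts to a pair of set differences. A minimal sketch follows, with nameserver_changes as a made-up name and current standing for whatever the latest <domain:info/> call returned. Hostnames are lowercased before comparison because DNS names are case-insensitive.

```python
def nameserver_changes(current, desired):
    """Return (to_add, to_remove) needed to turn current into desired."""
    current = set(ns.lower() for ns in current)
    desired = set(ns.lower() for ns in desired)
    return sorted(desired - current), sorted(current - desired)


to_add, to_remove = nameserver_changes(
    ["ns1.example.com", "ns2.example.com"],
    ["ns2.example.com", "ns3.example.com"],
)
```

If the current list is stale by the time the update is sent, both halves can be wrong, which is the race described above.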
You have to wonder, really.
May 6, 2008 at 1:32PM Oh, Jeebus! My head’s about to explode!
Ignorant of the settings Michele had set up on the new espresso machine in work, I accidentally ended up making myself a quadruple espresso. Tasty and all as it was, I’m hurting now!
