Wednesday, April 21, 2010

Modifying the HTTP header parser in Lusca

I've been slowly working towards a variety of medium-term goals in Lusca. I've resisted committing various bits of partially finished work, partly because they're works in progress but partly because I'm not happy with how the code fits together.

One of these areas is the HTTP header parser and management routines. The main issues I have with the parser and management code are listed below:
  • Each header entry is represented by separate strings in memory;
  • Each header entry has its own small, separately allocated object (HttpHeaderEntry);
  • Parsing the header entries uses various standard C library routines to iterate over characters, and these may be slower (because they handle unicode/wide/UTF/locale support) than what's needed here (7-bit ASCII);
  • There are some sanity checks in the header parser itself (specifically, the check for duplicate Content-Length headers) which are likely better done once the headers have been parsed.
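To make the last two items concrete, here is a minimal sketch in C. It assumes hypothetical names (`ascii_lower`, `name_equals`, `struct hdr_field`, `count_content_length`), none of which are Lusca's actual code. It compares header names using plain 7-bit ASCII arithmetic rather than the locale-aware `<ctype.h>` routines, and it counts Content-Length fields in a separate pass over fields that have already been parsed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lower-case one byte using 7-bit ASCII rules only,
 * with no locale lookup. */
static unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Case-insensitive comparison of a header name (not NUL-terminated)
 * against a lower-case, NUL-terminated literal. Returns 1 on a match. */
static int name_equals(const char *name, size_t len, const char *lit)
{
    size_t i;

    if (strlen(lit) != len)
        return 0;
    for (i = 0; i < len; i++) {
        if (ascii_lower((unsigned char)name[i]) != (unsigned char)lit[i])
            return 0;
    }
    return 1;
}

/* Hypothetical parsed field: pointers into the original header buffer. */
struct hdr_field {
    const char *name;
    size_t name_len;
    const char *value;
    size_t value_len;
};

/* Sanity check run after parsing rather than inside the parser:
 * returns how many Content-Length fields are present. */
static int count_content_length(const struct hdr_field *f, size_t n)
{
    size_t i;
    int count = 0;

    for (i = 0; i < n; i++) {
        if (name_equals(f[i].name, f[i].name_len, "content-length"))
            count++;
    }
    return count;
}
```

A caller would parse everything first and then treat a count greater than one as a duplicate, which keeps that policy decision out of the character-level parsing loop.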
I've been working on the first two items in separate branches. One converts the HttpHeaderEntry items into a single allocated array, which is grown if needed. Another takes the current String API and turns it into fully reference-counted strings. Both of these work fine for me. But shoehorning them into the current HTTP parser code, which expects individually allocated HttpHeaderEntry items that it can destroy on a whim before they're considered part of the HTTP header set, is overly hackish and likely to introduce bugs.
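As a rough illustration of how those two changes could fit together, here is a minimal sketch in C. All of the names (`rc_str`, `hdr_entry`, `hdr_array` and their functions) are hypothetical and are not the types used in the Lusca branches. Entries live in one array that doubles in size when full, and each entry holds references to shared, reference-counted strings instead of owning private copies.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string: one buffer, many holders. */
typedef struct {
    int refs;
    size_t len;
    char buf[];             /* len bytes followed by a NUL */
} rc_str;

static rc_str *rc_str_new(const char *s, size_t len)
{
    rc_str *r = malloc(sizeof(*r) + len + 1);

    if (r == NULL)
        return NULL;
    r->refs = 1;
    r->len = len;
    memcpy(r->buf, s, len);
    r->buf[len] = '\0';
    return r;
}

static rc_str *rc_str_ref(rc_str *r) { r->refs++; return r; }

/* Drop one reference; free the buffer when the last holder lets go. */
static void rc_str_unref(rc_str *r)
{
    if (r != NULL && --r->refs == 0)
        free(r);
}

/* Entries are stored by value in the array, not allocated one by one. */
typedef struct { rc_str *name; rc_str *value; } hdr_entry;
typedef struct { hdr_entry *items; size_t count; size_t capacity; } hdr_array;

/* Append an entry, doubling the array when it is full.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int hdr_array_append(hdr_array *a, rc_str *name, rc_str *value)
{
    if (a->count == a->capacity) {
        size_t ncap = a->capacity ? a->capacity * 2 : 8;
        hdr_entry *n = realloc(a->items, ncap * sizeof(*n));

        if (n == NULL)
            return -1;
        a->items = n;
        a->capacity = ncap;
    }
    a->items[a->count].name = rc_str_ref(name);
    a->items[a->count].value = rc_str_ref(value);
    a->count++;
    return 0;
}

/* Release every string reference, then the array itself. */
static void hdr_array_clear(hdr_array *a)
{
    size_t i;

    for (i = 0; i < a->count; i++) {
        rc_str_unref(a->items[i].name);
        rc_str_unref(a->items[i].value);
    }
    free(a->items);
    a->items = NULL;
    a->count = a->capacity = 0;
}
```

In a layout like this an entry has no independent lifetime: it exists only as a slot in the array. That is exactly where it clashes with parser code that wants to create an entry, inspect it, and throw it away before it joins the header set.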

It's taking me quite a bit of time to change the HTTP parser code so that it's ready for the new management code. Well, it's taken me about 6 months to slowly modify it in a way that doesn't require rewriting everything, potentially changing expected behaviour, or introducing subtle bugs.

The upshot? Things take time, but the code will hopefully be tidier, cleaner and easier to understand. Oh, and it won't include bugs.
