Cixar

XML/RSS

Categories:

/ (117)
  art/ (4)
    tale/ (1)
  bookmark/ (2)
  langlubber/ (4)
  movies/ (2)
  music/ (1)
    garageband/ (2)
  photo/ (1)
  politics/ (1)
  program/ (28)
    cli/ (1)
    javascript/ (13)
      chiron/ (5)
    python/ (6)
    swil/ (2)
    tale/ (22)
  reading/ (4)
  tale/ (24)
  writing/ (2)

Archives:

2008-Oct
2008-Sep
2008-Aug
2008-May
2008-Apr
2008-Mar
2008-Feb
2008-Jan
2007-Jun
2007-May
2007-Apr
2007-Mar
2007-Feb
2007-Jan
2006-Oct
2006-Sep
2006-Aug
2006-Jun
2006-May
2006-Apr
2006-Mar
2006-Feb
2006-Jan
2005-Dec
2005-Nov
2005-Oct
2005-Sep
2005-Aug
2005-Jul
2005-Jun
2005-May
2005-Apr
2005-Mar


The Sourcerer

by Kris Kowal.

Sun, 05 Oct 2008

JSON Basics

JSON is a strict subset of JavaScript, particularly a subset of its object-literal notation, discovered and specified by Doug Crockford. The subset is sufficient to make the creation of parsers and formatters in various languages nearly trivial, while completely trivializing the process of parsing the notation in a web browser.

Object-literals in JavaScript are very similar to their analogs in many other languages, like Perl, Python, and even AppleScript. The notation provisions text strings, numbers, booleans, and the literal null for all elemental (scalar) data. Then the notation provides Arrays for ordered values and Objects for unordered, unique, string-to-anything key-value-pair mappings. With these types, you can express most hierarchical data easily.

10
3.1415
"a"
true
false
null
[1, 2, 3]
{"a": 10, "b": 20}
[{"a": 10}, {"a": 20}]

JSON makes a couple simplifications even on JavaScript's object literal grammar. These make it even easier to write parsers and formatters.

  1. a JSON expression contains no new-line characters.
  2. all strings, including keys in objects, are enquoted between double quotes.

JSON joins the ranks of many other markup languages that we can use to easily transfer data among web servers, but its primary function is as a common data interchange language between web browser clients and web servers hosted in the numerous languages of the web. JSON is well adapted for this space because it can take advantage, through various channels, of every web browser's fast, built-in JavaScript interpreter.

There are two flavors of JSON services with subtle differences for both clients and servers. One is JSON proper, which I will call XHR JSON because it uses an XML HTTP Request. The other is called JSONP or "JSON with Padding". You would use one, the other, or neither based on security and performance concerns.

XHR JSON

XHR JSON uses a feature that exists in one form or another in all modern web-browsers, called an XML HTTP Request, and a function in JavaScript that lets you evaluate arbitrary text strings as JavaScript programs, called eval. With an XML HTTP Request, the client can make an HTTP Request to any URL on the same domain as the hosting page. This constraint is called the "Same Origin Policy". Unfortunately, the policy is in many cases not sufficient to isolate vulnerability to paths of a particular domain, nor to prevent cross site scripts from using the data (more on that later). Whether the security mechanism is effective or not is a longer and later discussion. The point being that this mechanism is the crux of your choice between XHR JSON and JSONP.

With an XHR, you request plain text from a URL on the same domain as your page, then you evaluate it as a JavaScript program. Performing a cross-browser compatible XML HTTP Request isn't trivial in itself, so let's assume that I'm using a library that provides an xhr function that synchronously (blocks) until an HTTP request is complete and returns the content of the HTTP response as a String. There are other variations on xhr for asynchronous requests and grabbing XML.

var text = xhr("/some.json");
var json = eval(text);

In this case, the "/some.json" contains unadulterated JSON. However, because your JSON is likely to be an Object like {"a": 10}, we need to make a special arrangement. A JavaScript program that is merely an Object literal would be mis-parsed initially as a code block. For this reason, we must force the JavaScript parser into expression context instead of statement context. For this reason, we wrap the JSON string in parentheses or assign it to a variable.

var text = xhr("/some.json");
var json = eval("(" + text + ")");

I prefer parentheses because I haven't, in my fallible memory, encountered a browser that supported XHR but wasn't standards compliant enough for the eval function to return the value of the last evaluated expression.

When you use an XHR to grab JSON, you are vulnerable to the host, depending on it to not give you an alternate JavaScript program that insidiously evaluates to the same data. For that reason, a lot of JSON libraries engage in the slower and dubious attempt to validate the JSON before evaluating it with a regular expression. The bottom line is that you're vulnerable to your server when you use XHR and eval.

That being said, it's better to get an exception early than an untraceable error later. For that reason, attempting to validate JSON is a good idea. There's another one: variable laundering. The eval function inherits the calling context's scope chain. This means that the server can read and alter any variables on your scope chain. I use an evalGlobal routine to launder my scope chain. This doesn't give security all by itself, but it could help lead to a secure system down the way and can turn a bunch of silent name resolution errors or data leaks into thrown NameErrors.

(function (evalGlobal) {
	var text = xhr("/some.json");
	validateJson(text);
	var json = evalGlobal("(" + text + ")");
	...
})(function () {
	return eval(arguments[0]);
})

I'll give a detailed explanation of evalGlobal in another article. For now, suffice it to say, this is much more elegant than using a variable to capture the JSON value, although it is possible:

var text = xhr("/some.json");
var json;
eval("json = " + text);

JSONP

JSONP is different than XHR JSON in both the way the server hosts the data and in the way that the client consumes it. On the client side, the user adds a <script> tag to their own page. The source of the script is the URL of a server side program with the name of a callback function that the client places in global scope sent as a query argument.

<script src="http://other.com/some.jsonp?callback=foo">

The client script arranges to receive parsed and evaluated JSON data asynchronously by adding a foo method to the global scope, the window object.

window.foo = function (json) {
	...
	delete window.foo;
};
var script = document.createElement('script');
script.src = "http://other.com/some.jsonp?callback=foo";
document.getElementsByTagName('head').appendChild(script);

When you add a script tag to your document, browsers are kind enough to notice and automatically send an HTTP request to fetch the request JavaScript program. In this case, your page trusts the script on other.com to return a JavaScript program that will call your "foo" function and pass it some JSON data.

foo({"a": 10});

With JSONP, the client is vulnerable to the remote server because it can opt to write an arbitrary JavaScript program before calling foo, if at all. So, you should only use JSONP to request data from domains that you trust.

Also JSONP is slower and less responsive to errors than XHR JSON can be. The only signal that you receive as to the progress or status of a JSONP request is whether your callback gets called in a timely fashion. XHR JSON is preferable in all situations where it is possible and it may even make sense to create a proxy to the other domain from your same domain server. Using a proxy also gives you an opportunity to validate the JSON on the server-side, where computation time is cheap.

Same Origin XHR JSON

There's another trick with XHR JSON, and I'm not sure what scenarios require it. Some people can use the JSONP technique to fetch JSON from a foreign domain. These folks can't send HTTP headers or cookies, and they never get an opportunity to alter the text of the HTTP request before it's evaluated as JavaScript in their browser. I would think that it would be sufficient to prevent an other domain client from intercepting sensitive data for the server to require an authenticated token in the HTTP request, as can be provided by an XML HTTP Request but not a JSONP request. However, some crackers can attempt to get data from your service by using JSON data in a cross site script, much like JSONP. However, instead of using a callback, these crackers arrange to override the Array or Object constructors and thus can monitor and transmit snippets of JSON constructed by simply evaluating JavaScript object literal notation. This technique can be prevented by padding your XHR JSON service with an infinite loop.

while (1);
{"a": 10}

In this case, your client conspires with the server, and since it does get a chance to intercept and modify the text of the JSON, it strips the known number of characters for the while loop before evaluating it.

var text = xhr("/some.json");
var json = evalGlobal("(" + text.substring("while (1);".length) + ")");

What baffles me is that, unless you've secured your JSON with an authentication token, any server capable of making HTTP requests and parsing JSON could do the same thing. Malicious clients are not required to use web browsers. The only situation where this might occur would be if the cracker had compromised your client already, had your authentication token, and needed your browser to make a cross domain JSONP request on its behalf. That could not be the case since the server would have no reason to give an XHR JSON authentication token to a client that could only fetch data with JSONP. That point is even moot since, going back, the cracker has already compromised your client and might as well use XHR JSON with all the same rights as you on even the same origin.

That being said, I noticed that GMail used this technique, so I assume there's substance to it. Perhaps someone will clarify in the comments.

this entry was posted on Sun, 05 Oct 2008 at 00:38 in

Fri, 03 Oct 2008

Designing Django's Object-Relational-Model - The Python Saga - Part 6

Django is a web application framework in the Python language. One of the advantages that Django has over other libraries is that it was written and designed by Python experts. That is to say, they knew about variadic arguments, properties, and metaclasses. Furthermore, they knew how to cleverly use these ideas to sweep a lot of complexity under the hood (or bonnet if you will; some of them appear to be British) so that common developers, or uncommonly good developers who want to think about other things most of the time, can gracefully suspend disbelief that anything complicated is going on when they design their database in pure Python. This article will illustrate how Django uses metaclasses and properties to present an abstraction layer where you can specify a database schema with Python classes. For simplicity, the "database" backend will be plain Python primitive objects—tables will be dictionaries of dictionaries.

In the end, we want to be able to write code that looks a whole lot like it's using Django's Object-Relational-Model:

class Cow(Model):
	id = PrimaryKey()
	name = ModelProperty()

cow = Cow(id = 0, name = 'Moolius')
cow.save()

cow = Cow.objects.get(0)

The easiest part (for the purpose of this exercise) is Django's concept of an "object manager". In Django, every model has an object manager that provides a query API and, depending on the backend, might cache instances of Model objects. Conveniently, a very narrow subset of the object manager API is almost exactly the same as a dictionary. Conceptually, the object manager boils down to a dictionary proxy for the database where you can use the get function to retrieve records from the database. For simplicity, our ObjectManager is just going to be a dictionary.

class ObjectManager(dict):
	pass

Beyond the scope of this article, the ObjectManager should be handy for grabbing lots of objects from the database at once. Django provides a very thorough and relatively well-optimized lazy query system with its object managers. The ObjectManager has get, and filter methods which, instead of simply accepting the primary key, accept keyword arguments that translate to predicate logic rules. In particular, the filter function is lazy, so you can chain filter commands to construct complex queries and Django only goes to the database once.

While it would be super-cool to model all of this with native Python, it actually is a lot of code, so that's a topic for maybe later. We'll just use the built in dict.get.

We'll also need all of the code from Part 5 since models will be another application of the ordered property pattern. This is how Django creates SQL tables with fields in the same order as the Python properties.

from ordered_class import \
	OrderedMetaclass,\
	OrderedClass,\
	OrderedProperty

We use the OrderedMetaclass to make a ModelMetaclass. The model metaclass will have all the same responsibilities as our StructMetaclass, including "dubbing" the properties so that they know their own names. The model metaclass will also create an ObjectManager for the class. This isn't the complete ModelMetaclass; we'll come back to it.

class ModelMetaclass(OrderedMetaclass):
	def __init__(self, name, bases, attys):
		super(ModelMetaclass, self).__init__(name, bases, attys)
		if '_abstract' not in attys:
			self.objects = ObjectManager()
			for name, property in self._ordered_properties:
				property.dub(name, self)

The next step is to create a ModelProperty base class. This class will be an OrderedProperty so it's sortable. It will also need to implement the dub method so it can figure out its name. Other than that, it'll be just like the StructProperty from the previous section: it will get and set its corresponding item in the given object.

class ModelProperty(OrderedProperty):
	def __get__(self, objekt, klass):
		return objekt[self.item_name]
	def __set__(self, objekt, value):
		objekt[self.item_name] = value
	def dub(self, name):
		self.item_name = name
		return self

There is a distinction in the refinement of ModelProperty from StructProperty: ModelProperty objects will eventually need to distinguish the value stored in the dictionary from the value returned when you access an attribute. In the primitive case, they're the same, but for ForeignKey objects, down the road, you'll store the primary key for the foreign model instead of the actual object. This is the same as the behavior in an underlying database backend.

class ModelProperty(OrderedProperty):
	def __get__(self, objekt, klass):
		return objekt[self.item_name]
	def __set__(self, objekt, value):
		objekt[self.item_name] = value
	def dub(self, name):
		self.attr_name = name
		self.item_name = name
		return self

Let's consider a PrimaryKey ModelProperty. The purpose of a PrimaryKey is to designate a property of a model that will be used as the index in its object manager dictionary. In Django, this can be an implicit id field at the beginning of the table. For simplicity in this exercise, we'll require every model to explicitly declare a PrimaryKey. The ModelMetaclass will identify which of its ordered properties is the primary key by observing its type. Other than their type, a primary key's behavior is the same as a normal ModelProperty, so it's a really easy declaration:

class PrimaryKey(ModelProperty):
	pass

Now we can go back to our ModelMetaclass and add the code we need for every class to know the name of its primary key. I create a list of PrimaryKey objects from my _ordered_properties and pop off the last one, leaving error checking as an exercise for a more rigorous implementation. There should be only one primary key.

class ModelMetaclass(OrderedMetaclass):
	def __init__(self, name, bases, attys):
		super(ModelMetaclass, self).__init__(name, bases, attys)
		if '_abstract' not in attys:
			self.objects = ObjectManager()
			for name, property in self._ordered_properties:
				property.dub(name)
			self._pk_name = [
				name
				for name, property in self._ordered_properties
				if isinstance(property, PrimaryKey)
			].pop()

Now all we need is a Model base class. The model base class will just be a dictionary with the model metaclass and a note that it's abstract: that is, it does not have properties so the metaclass better not treat it as a normal model.

class Model(OrderedClass, dict):
	__metaclass__ = ModelMetaclass
	_abstract = True

The model will also have a special pk attribute for accessing the primary key and a save method for committing a model to the ObjectManager.

class Model(OrderedClass, dict):
	__metaclass__ = ModelMetaclass
	_abstract = True

	def save(self):
		self.objects[self.pk] = self

	@property
	def pk(self):
		return getattr(self, self._pk_name)

Now we have all the pieces we need to begin using the API. Let's look at that cow model.

class Cow(Model):
	id = PrimaryKey()
	name = ModelProperty()

cow = Cow(id = 0, name = 'Moolius')
cow.save()

cow = Cow.objects.get(0)

All of this works now. You make a cow model; that invokes the model metaclass that sets up Cow._pk_name to be "id" and tacks on a Cow.objects object manager. Then we make a cow and put it in Cow.objects with the save method. This is analogous to committing it to the database backend. From that point, we can use the object manager to retrieve it again.

We can refine the Model base class to take advantage of the fact that it's not just a dictionary anymore: it's an ordered dictionary. We create a better __init__ method that will let us assign the attributes of our Cow either positionally or with keywords. That makes our cow more like a hybrid of a list and a dictionary. Also, since our model instances aren't merely dictionaries, we create a new __repr__ method that will note that cows are cows and moose are moooose. The new __repr__ method also takes the liberty to write the items in the order in which their properties were declared.

class Model(OrderedClass, dict):

	…

	def __init__(self, *values, **kws):
		super(Model, self).__init__()
		found = set()
		for (name, property), value in zip(
			self._ordered_properties,
			values,
		):
			setattr(self, name, value)
			found.add(name)
		for name, value in kws.items():
			if name in found:
				raise TypeError("Multiple values for argument %s." % repr(name))
			setattr(self, name, value)

	…

	def __repr__(self):
		return '<%s %s>' % (
			self.__class__.__name__,
			" ".join(
				"%s:%s" % (
					property.item_name,
					repr(self[property.item_name])
				)
				for name, property in self._ordered_properties
			)
		)

Now we can make a cow model with positional and keyword arguments, and print it out nice and fancy-like:

>>> Cow(0, name = 'Moolius')
<Cow id:0 name:"Moolius">

The next step is to introduce ForeignKey model properties. These are properties that will refer, via a relation on a primary key, to an object in another model. So, the ForeignKey class will accept a Model for the foreign model. Its dub method will override the item_name (preserving the attr_name) provided by it's super-class's dub method. The new item_name with be the attr_name and the name of the primary key from the foreign table, delimited by an underbar. This will let the foreign key property hide the fact that it does not contain a direct reference to the foreign object; it just keeps the foreign object's primary key. However, if you access the foreign key property on a model instance, it will go off and diligently fetch the corresponding model instance. If you assign to the foreign key property, it'll tolerate either a primary key or an actual instance.

class ForeignKey(ModelProperty):
	def __init__(self, model, *args, **kws):
		super(ForeignKey, self).__init__(*args, **kws)
		self.foreign_model = model
	def __get__(self, objekt, klass):
		return self.foreign_model.objects.get(objekt[self.item_name])
	def __set__(self, objekt, value):
		if isinstance(value, self.foreign_model):
			objekt[self.item_name] = value.pk
		else:
			objekt[self.item_name] = value
	def dub(self, name):
		super(ForeignKey, self).dub(name)
		self.item_name = '%s_%s' % (
			name,
			self.foreign_model._pk_name,
		)

Now we can write code with more than one model using relationships. Let's give our cow a bell.

class Bell(Model):
	id = PrimaryKey()

class Cow(Model):
	id = PrimaryKey()
	name = ModelProperty()
	bell = ForeignKey(Bell)

bell = Bell(0)
bell.save()
cow = Cow(0, 'Moolius', bell)
cow.save()

Note that you must save the bell so that when you construct the cow, it can fetch the bell from Bell.objects.

There's more to Django's ORM, of course. This article doesn't cover parsing and validation, which are both assisted by the ORM. Nor does it cover queries, query sets, the related_name for ForeignKey properties on foreign models, Django's ability to use strings for forward references to models that have not yet been declared, or many of the other really neat features.

What this article does cover though, is that you can create a powerful abstraction of a proxied database with pure-Python in less than 200 lines of code. This means that you could create a light-weight proxy over HTTP to a Django database that exposes itself with a REST API. You could also create an abstraction layer that would allow you to pump Django ORM duck-types back into Django to use pure Python objects in addition to or in stead of a database backend.

But, if this article does nothing else, I hope it communicates that Django is cool. I have read a whole lot of code from every dark corner of the web and I have liked very little of it; people I've worked with will testify that I've regularly "hated on" every library or framework I've ever seen. I've never met Simon Willson and the growing developer community around Django. However, I've read their code and now I can tell you, over the course of several articles, that they're really smart and you should use their code.

this entry was posted on Fri, 03 Oct 2008 at 15:13 in /python

Wed, 01 Oct 2008

Ordered Properties - The Python Saga - Part 5

In C and SQL, structs and records have properties that appear in a particular order. In Python, objects use hash-tables to store their attributes, so the order is not deterministic nor relevant. That's no consolation if you're trying to model C structs or SQL tables in Python though. Reading though the Django code, I discovered that those wily coders had synthesized a technique that combines the virtues of properties and metaclasses that we can generally model the order in which fields of a struct or SQL table appear.

The trick is to provide an API that allows your users to write code like:

from time import gmtime
from cstruct import Struct, IntegerProperty

class EpochTime(Struct):
	seconds = IntegerProperty()
	microseconds = IntegerProperty()

epoch_time = EpochTime()
timestruct = gmtime(epoch_time.seconds)

In this case, the Struct type cares about the size, order, and disposition of the C structure it models so that it can unpack a byte buffer presumably retrieved from a C-type library.

There are two components to the general solution. First, there's an OrderedProperty base type. Ordered properties track the order in which they were initialized. For this purpose, ordered properties use a global counter. The absolute value of a property's creation counter is irrelevant. The only requirement is that they monotonically increase as each property is declared, and that classes declared concurrently do not interfere.

from itertools import count
next_counter = count().next

I learned about itertools from Brett Cannon who was preparing his thesis defense when I was at Cal Poly. In another piece of code, which I would cite if I could recall, I learned the trick of using the itertools.count function to atomize a global counter. It takes advantage of Python's dubious global interpreter lock.

Then the OrderedProperty base type just stores a creation counter upon initialization. Keep in mind that all derived types must trickle their __init__ call—your base types are not always what they seem and there are use cases you can not foresee where you might be required to receive and pass your initialization arguments to an unknown super-type. That's a side-effect of working in a language with mix-ins, and it's a "good thing".

class OrderedProperty(object):
	def __init__(self, *args, **kws):
		self._creation_counter = next_counter()
		super(OrderedProperty, self).__init__(*args, **kws)

The next part of the puzzle is inspecting your property order. We use a metaclass to note when new types are created and we give them an _ordered_properties attribute with the ordered-property items in their attributes. Keep in mind that base types might have properties too. __mro__ is an attribute of all classes that is a linearized list of a classes inheritance hierarchy. It's effectively the result of a dynamic topological sort algorithm for the closure of your base types. We traverse it backwards so that if you create a dictionary via dict(_ordered_properties), name collisions are resolved chronologically.

class OrderedMetaclass(type):
	def __init__(self, name, bases, attys):
		super(OrderedMetaclass, self).__init__(name, bases, attys)
		self._ordered_properties = sorted(
			(
				(name, value)
				for base in reversed(self.__mro__)
				for name, value in base.__dict__.items()
				if isinstance(value, OrderedProperty)
			),
			key = lambda (name, property): property._creation_counter,
		)

Then, we create our OrderedClass that we can inherit to get free _ordered_properties attributes.

class OrderedClass(object):
	__metaclass__ = OrderedMetaclass

Let's see it in action:

class Foo(OrderedClass):
	bar = OrderedProperty()
	baz = OrderedProperty()
Foo._ordered_properties == [
	('bar', <Ordered Property instance somewhere>),
	('baz', <Ordered Property instance somewhere>),
]

With a small modification, we can track ordered inner classes too.

class OrderedMetaclass(type):
	def __init__(self, name, bases, attys):
		super(OrderedMetaclass, self).__init__(name, bases, attys)
		self._creation_counter = next_counter()
		self._ordered_properties = sorted(
			(
				(name, value)
				for base in reversed(self.__mro__)
				for name, value in base.__dict__.items()
				if isinstance(value, OrderedProperty)
				or isinstance(value, OrderedMetaclass)
			),
			key = lambda (name, property): property._creation_counter,
		)

So, we put all of those order classes in our generic ordered properties module. Now we can delve into our particular Struct example.

First, we need field types. We will need a base type and derivative types for all of the field types we want to model from C, like IntegerField. Bear in mind that OrderedProperty is just a mix-in; it doesn't actually define __get__ and its ilk because those are all specific to the derivative implementation. All struct fields are going to store their actual values in a special _value dictionary on their corresponding Struct instance.

class StructProperty(OrderedProperty):
	def __get__(self, objekt, klass):
		return objekt._values[self.name]
	def __set__(self, objekt, value):
		objekt._values[self.name] = value
	def dub(self, name):
		self.name = name
		return self

You'll notice that instances of StructProperty need to know their name, the key in their corresponding instance's _values dictionary. The struct property instances don't get this from their initializer. Instead, each property will get "dubbed", given a name, when the StructMetaclass visits its _ordered_properties.

Let's just look at an integer.

class IntegerProperty(StructProperty):
	_format = 'i'

The only thing special about an integer is that it's format specifier for the pack routine is "i".

We're going to want to use a metaclass again, this time inheriting from our OrderedMetaclass. The purpose of this metaclass will be to analyze the ordered properties, construct an aggregate format specifier for pack and unpack methods, and to dub each of its properties with their name.

class StructMetaclass(OrderedMetaclass):
	def __init__(self, name, bases, attys):
		super(StructMetaclass, self).__init__(name, bases, attys)
		for name, property in self._ordered_properties:
			property.dub(name)
		self._format = "".join(
			property._format
			for name, property in self._ordered_properties
		)

All that remains is to create our API base class, Struct. This class initializes its values and declares its metaclass.

from struct import unpack

class Struct(OrderedClass):
	__metaclass__ = StructMetaclass

	def __init__(self, *args, **kws):
		super(Struct, self).__init__(*args, **kws)
		self._values = {}

	def unpack(self, value, prefix = None):
		if prefix is None: prefix = ""
		for (name, property), value in zip(
			self._ordered_properties,
			unpack(prefix + self._format, value)
		):
			self._values[name] = value

So, now our epoch time example is possible using Struct and IntegerProperty, completely oblivious to the machinations behind the scenes.

from time import gmtime
from datetime import datetime

class EpochTime(Struct):
	seconds = IntegerProperty()
	microseconds = IntegerProperty()
epoch_time = EpochTime()

timestruct = gmtime(epoch_time.seconds)
datestruct = (timestruct[:6] + (epoch_time.microseconds,))
print datetime(*datestruct)
this entry was posted on Wed, 01 Oct 2008 at 20:38 in /python

Tue, 30 Sep 2008

Metaclasses - The Python Saga - Part 4

The original type function, whose behavior is preserved in modern Python 2.5, accepts an object and returns the class, albeit the type, that would emit it. It's like the typeof operator in JavaScript that returns the String name of the primitive type of an object, or the C++ function that returns a pointer to an object's virtual function table. They're all sufficient for comparing apples to oranges, but all of them are also insufficient for the more interesting comparison of apples to the idea of a Fiji apple: the question, "Does your type inherit from this?", that can be accomplished with Python's isinstance, JavaScript's instanceof, or C++'s infernal dynamic_cast. So, type's single argument behavior is effectively retired.

At some transcendental moment, somebody deeply involved in the Python project must have been thinking, "Well, if functions and classes return objects, what returns a class? Could a class, like a property, be syntactic sugar for some deeply metaphysical latent behavior in pure Python?". I figure this is how the type function grew its new wings.

So consider a class declaration:

class Foo(object):
	bar = 10
	def __init__(self, bar = None):
		if bar is not None:
			self.bar = bar

This is what is actually happening behind the curtains:

name = 'Foo'
bases = (object,)
def __init__(self, bar = None):
	if bar is not None:
		self.bar = bar
attys = {'bar': bar, '__init__': __init__}
Foo = type(name, bases, attys)

That is to say, there is no magic in the syntax. Ultimately all of the magic happens when you call type. By "magic" I mean functionality that cannot be replicated in pure Python without the interpreter's intervention.

The type function returns a type: a function that returns new instances. It's also called a "metaclass". type just happens to also be the implied metaclass of object. That is to say, you can create your own metaclasses.

The big question about metaclasses is, "Why on earth would you want to define a metaclass?". David Mertz from IBM wrote that you would simply know when you needed them. Since I read that article, I've wracked my mind for a reason to use metaclasses to no avail. At some point, I was reading Django's ORM code and it occurred to me that the reason you would want to define a metaclass is to provide a class in your API that, when subclassed by unsuspecting users, would invoke certain preparations without their knowledge or consent. Here's how:

Define a metaclass. The best way to define a metaclass is to inherit type and override its __init__ method.

class FooType(type):
	def __init__(self, name, bases, attys):
		super(FooType, self).__init__(name, bases, attys)
		print '%s was declared!' % name

Define a base class for your API. The trick here is that you can override its metaclass. Let's look at this one in an interactive interpreter:

>>> class Foo(object):
...     __metaclass__ = FooType
...
Foo was declared!
>>>

Whoa! You didn't call anything. Not true. Here's what actually happened:

name = 'Foo'
bases = (object,)
attys = {}
attys['__metaclass__'] = FooType
Foo = attys.get('__metaclass__', type)(name, bases, attys)

Python checks your attributes for a metaclass before defaulting to type.

That means that your FooType.__init__ got called. Hot damn. I wonder what happens if you create a subclass.

>>> class Bar(Foo):
...     pass
...
Bar was declared!
>>>

Whoa! I totally inherited a metaclass.

So, the reason for writing a metaclass is that metaclasses give you an opportunity to get and manipulate your derived class objects before anyone instantiates them. You get to do this once, right after the class dictionary is fully populated. You can take this opportunity to monitor class declarations, to prepare additional attributes, or to interpolate additional base types.


Keep in mind that metaclasses are jealous. If you create a metaclass for a type that inherits from base classes in someone else's API, your metaclass must inherit from their metaclass. I suspect that it's best not to assume that your base types use a particular metaclass. Thankfully, you can use an expression for your base type.

class FooType(getattr(Bar, '__metaclass__', type)):
	pass
class Foo(Bar):
	__metaclass__ = FooType

This takes advantage of the Python idiom of accessor methods like dict.get and getattr that accept a default-if-none-exists argument. Unfortunately, Python's object doesn't explicitly state that type is its metaclass. Otherwise, you could safely say:

class FooType(Bar.__metaclass__):
	pass

Such things are to be looked for in Python 3. I find that the Python developers have either, after considerable review and debate, already accepted or rejected most of my ideas before I even consider them, so I'm not even going to check for a PEP on this one.

this entry was posted on Tue, 30 Sep 2008 at 22:45 in /python

Mon, 29 Sep 2008

Properties - The Python Saga - Part 3

Properties come out of a tired programming language genesis. In the beginning, there were structs. The trouble with structs was that an opaque data structure could not programmatically monitor or intercept access and mutation of its member data.

So that's not a big deal; we could solve the problem with classes. The best practice to avoid programming yourself into a corner was to never expose a datum; you would write accessor and mutator functions, whether you needed them at the moment or not. Thus, as your design grew, you could eventually do nice things like validation, observation, or proxying. The trouble with this approach was that you had to write six times as much code on the off chance you'd need to extend it some day. But it was worth it.

The idea of managed properties came along eventually in various languages (Python, C#, some implementations of JavaScript, and recent versions of [C]). The notion is that you would initially write all of your classes like structs with member data camped in public view. You would encourage your API consumers to interact with those members directly. Then, as need arose, you would subvert the member variables with property objects. These objects would intercept accesses and mutations with functions that you could write at any time of your design process.

Lets observe this design shift in Python. Here's a class with unmanaged data:

class Foo(object):
	def __init__(self);
		self.bar = 10

Here's some other fellow's code that uses your class:

foo = Foo()
foo.bar = 20
print foo.bar
del foo.bar

So there you have it. Just to keep on the same page, the idea at this point is to add a feature to Python that permits both of those code samples to work and, in-fact, be perfectly cromulent. However, we also want to eventually add features to Foo such that its bar attribute can be managed, validated, proxied, secured, or outright lied about. Enter property. property is a function that accepts an accessor function and optional mutator and deleter functions. The property must be a class attribute to work. Here's how you would use a property:

class Foo(object):
	def __init__(self):
		self.bar = 10
	def get_bar(self, objekt, klass):
		return self.baz / 2
	def set_bar(self, objekt, value):
		self.baz = value * 2
	def del_bar(self, objekt):
		del self.baz
	bar = property(get_bar, set_bar, del_bar)

Now we have a Foo class that transparently maintains the invariant that "bar" will always be half of "baz".

Sometimes you don't need to have a setter for a property, and you almost never need a deleter. For the common case, you can use the property function as a decorator.

class Foo(object):
	def __init__(self):
		self.baz = 20
	@property
	def bar(self):
		return self.baz / 2

Creating the property function.

So, it's easy to assume that the property function does all the magic behind the scenes, setting up traps in your class's accessor and mutator paths. There's actually another layer of code that can be done entirely in Python. That is, we can implement the property function in pure Python. The trick is that the property function is actually a type or factory method (who cares which) that returns a Python duck-type: a property object. A property object is any object that implements __get__, __set__, or __del__. These are special magic Python functions that intercept access, mutation, and deletion on members. All you have to do is install an object on a class with one of methods defined, with the name of the member you want to manage. The property function just handles the common cases. Let's redefine the property function in Python, as the Property class.

class Property(object):
	def __init__(self, fget):
		self.fget = fget
	def __get__(self, objekt, klass):
		return self.fget(objekt)

This defines enough of the Property object to decorate an accessor function.

class Foo(object):
	def __init__(self):
		self.baz = 20
	@Property
	def bar(self):
		return self.baz / 2

Here's a full implementation of Property. You will note that, in order to exactly emulate the property object, the __init__ method has the same argument names as the internal property so that code that uses keyword arguments will function in perfect ambivalence.

class Property(object):
	def __init__(
		self,
		fget,
		fset = None,
		fdel = None,
		doc = None,
	):
		self.fget = fget
		self.fset = fset
		self.fdel = fdel
		self.__doc__ = doc
	def __get__(self, objekt, klass):
		return self.fget(objekt)
	def __set__(self, objekt, value):
		self.fset(objekt, value)
	def __del__(self, objekt):
		self.fdel(objekt)
this entry was posted on Mon, 29 Sep 2008 at 23:34 in /python

Sun, 28 Sep 2008

Decorators - The Python Saga - Part 2

Python introduced a short-hand for the adapter pattern on functions. You can "decorate" a function with another function. This is a neat tool you can use to factor out some common code from a bunch of functions. You can fiddle with the arguments, return values, or intercept exceptions thrown by any function you decorate.

The canonical example is a memoize decorator. The idea is to generalize the notion of memoization so you can simply subscribe to it in any function you want to memoize.

def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)
factorial = memoize(factorial)

You accomplish this by writing the memoize decorator. A decorator is a function that accepts a function and returns another. Python virtuously provides a shorthand for taking the function, decorating it, and assigning it to a variable with the same name.

@memoize
def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)

In the imagined normal case of decorators, the returned function accepts the same arguments and returns the same kinds of values as the accepted function. However, a decorator does have the liberty of extending or restricting that interface, like accepting additional arguments or raising an exception if the arguments are of the wrong type. It might also perform some common computation on the original arguments and pass the result to the original function as an additional argument. In any case, you can use some closures to create a decorator:

def memoize(function):
	cache = {}
	def decorated(*args):
		if args not in cache:
			cache[args] = function(*args)
		return cache[args]
	return decorated

Of course, that's too simple. A lot of things you put after the "@" symbol are just functions that return decorators so that they can be configured with arguments. For example, you probably want to make a memoize decorator that lets you specify your own cache object. So, you need another layer of deference.

def memoize(cache = None):
	if cache is None: cache = {}
	def decorator(function):
		def decorated(*args):
			if args not in cache:
				cache[args] = function(*args)
			return cache[args]
		return decorated
	return decorator

@memoize({})
def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)

Since, in Python, functions, objects, and types are indistinguishable to the casual observer, you can do the exact same thing with a class, although I shudder to think that you might want to forgo the simplicity and elegance of closures. After the transform, the previous code might look like this:

class memoize(object):
	def __init__(self, cache = None):
		self.cache = cache
	def __call__(self, function):
		return Memoized(function, self.cache)

class Memoized(object):
	def __init__(self, function, cache = None):
		if cache is None: cache = {}
		self.function = function
		self.cache = cache
	def __call__(self, *args):
		if args not in self.cache:
			self.cache[args] = self.function(*args)
		return self.cache[args]

@memoize()
def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)

So now you can use a Least Recently Used Cache, assuming it is a dictionary-like-object (a duck-dict, if you will):

from lru_cache import LruCache
@memoize(LruCache(max_size = 100, cull = .25))
def factorial(n):
	if n == 1: return 1
	return n * factorial(n - 1)

Download decorators.zip.

this entry was posted on Sun, 28 Sep 2008 at 21:42 in /python

Variadic Positional and Keyword Arguments - The Python Saga - Part 1

Python supports "variadic" arguments. Variadic arguments are the man behind the curtain for C's printf function. The idea is that a function can accept a variable number of positional arguments, the values to put in your format string. In C this is accomplished with an ellipsis, ..., and some VA macro-linked-list-stuff that I always have to look up. Python goes a couple steps further with variadic arguments and the results are stunning, orthogonal, and actually useful almost every day. With Python, you get both "positional" arguments, like C, and keyword arguments: those arguments that conceptually map, in any order, to the names of the arguments in your function's declaration. The magic symbols are "*" and "**" for positional and keyword arguments respectively. With one "*", you can declare a function that accepts any number of arguments as the declared list object:

def foo(*args):
	return args
assert foo(1, 2, 3) == [1, 2, 3]

You can also pass an array of positional arguments to a function with very similar syntax:

def foo(a, b, c):
	return [a, b, c]
assert foo(*[1, 2, 3]) == [1, 2, 3]

And you can do the same thing with keyword arguments except you use dictionaries:

def foo(**kwargs):
	return kwargs
assert foo(a = 10, b = 20, c = 30) == {'a': 10, 'b': 20, 'c': 30}

Likewise, you can pass keyword arguments:

def foo(a, b, c):
	return [a, b, c]
assert foo(**{'a': 10, 'b': 20, 'c': 30}) == [10, 20, 30]

You can use them in combination, along with default arguments to provide beautiful, orthogonal, reusable abstractions:

def foo(a, b = None, c = None, d = None):
	return [a, b, c, d]
assert foo(*[1, 2], **{'c': 3}) == [1, 2, 3, None]
def bar(a, b, c, *args, **kws):
	return [a, b, c], args, kws
assert bar(1, 2, 3, 4, 5, f = 6) == ([1, 2, 3], [4], {'f': 5})
this entry was posted on Sun, 28 Sep 2008 at 00:09 in /python

Thu, 04 Sep 2008

JavaScript Module Standard

The purpose of this document is to propose a contract between a collection of JavaScript module loaders and modules. The specification describes the environment that module loader implementations provide, and the environment that modules may depend upon. In particular, compliant module systems:

  • map one file to one module (while leaving room for implementation-specific multi-module bundling for website performance),
  • cache singleton module objects before executing the corresponding module file,
  • execute modules with a feature-rich context,
  • resolve module URLs relative to a module path, and
  • conform to a domain-name-based site-package naming convention and leave a name-space for a centrally managed standard library.

The specification is intended to be suitable for client- and server-side JavaScript module loader implementations.

The specification is intended to provide insights and an easy migration path to future versions of JavaScript.

The specification is intended to narrow the domain in which JavaScript modules can universally depend to maximize portability.

The specification encourages modules to adhere to a strict subset of the JavaScript environment in which they may be loaded. In spirit, this is a theoretical version of the JavaScript language that provides the intersection of behaviors provided by Class-A browsers and server-side run-times including Rhino, plus this system for loading modules.

Module Execution Context

The singleton module object MUST be declared and cached BEFORE the corresponding module file is executed. The module file MUST only be executed ONCE for the duration of a page-view or JavaScript program.

In a module file's execution context, the context object, represented by this, MUST be the module object.

The scope chain, from global to local, of a module file's execution context MUST consist of:

  • builtins
  • moduleScope
  • module
  • <anonymous>

builtins

Rules for module systems:

  • The builtins object MAY be frozen.
  • All objects in the transitive closure of builtins on item selection MAY be frozen.
  • The builtins object MAY contain more values than specified.
  • The builtins object MUST include:
    • String
    • Number
    • Boolean
    • Object
    • Array
    • Date
    • RegExp
    • Error
    • EvalError
    • RangeError
    • ReferenceError
    • SyntaxError
    • TypeError
  • The module loader MAY enforce these invariants at any time. In some environments, verifying these invariants will not be possible or pragmatic.
  • All objects in builtins MUST conform to the JavaScript subset described in the introduction: one that consists of the intersection of behaviors of the respective objects in all Grade-A browsers and server-side JavaScript environments.

Rules for modules:

  • Modules MUST NOT write items to the builtins object.
  • Modules MUST NOT modify any object in the transitive closure through references on the builtins object.
  • Modules MUST NOT access any items in the builtins object not herein specified.
  • Modules MUST NOT use non-standard features provided by builtins.

moduleScope

The moduleScope is a module's private name-space for module loader functions and imported values.

The moduleScope MUST provide:

  • builtins
  • module
  • moduleUrl
  • moduleRootUrl
  • require
  • include
  • foreignModuleBind
  • log

The moduleScope MAY provide:

  • register
  • publish

Modules MAY augment the moduleScope with additional items.

Modules MUST NOT overwrite the items specified here.

Module Loading

require(<moduleUrl>, [<structure>])

The require function returns an object with items from a foreign module. The required module is referenced with a URL. By default, all items from the foreign module are copied into the returned object. The returned object MAY be frozen. Modules MUST NOT modify the returned object. If a structure is provided, a subset of the items from the foreign module will be returned, the result of destructure(<module>, <structure>). If a function in the foreign module was declared with the foreignModuleBind decorator, the corresponding item in the returned object is the result of <foreignModule>.moduleScope.moduleBind(<name>, <value>).

If the URL begins with a dot, ("."), the fully qualified URL for the requested module is resolved relative to the fully qualified URL of the current module. This would be the result of urlJoin(moduleRootUrl, moduleUrl, foreignModuleUrl). Otherwise the fully qualified URL is urlJoin(moduleRootUrl, foreignModuleUrl).

Regarding module file names:

  • Directory components and file names in modules MUST be in camelCase.
  • Modules MUST have a ".js" extension if they are provided by files.
  • Modules MUST not have an extension if they are provided by the module loader but are not backed by real files. This might include a "window" module in a particular browser implementation.
  • The module root is reserved for a cross-browser JavaScript standard library.
  • Modules provided by entities other than the standard library MUST exist in a subdirectory of the module root corresponding to a domain name controlled by the author, or a subdirectory thereof.

Module authors are encouraged to use module relative URLs to increase the mobility of entire directory trees of modules.

include(<moduleUrl>, [<structure>])

The include function defers to require, both its arguments and its return value. However, include also copies all of the items from the object returned by require to the moduleScope object.

foreignModuleBind(<function>)

A function decorator that denotes that the module loader guarantees that, when the decorated function is called, it will receive the module object for the module file in which it was called as its context object: this. foreignModuleBind MAY return the same Function it was passed. The returned function MUST be usable in the module in which it was declared as if it were the function passed to foreingModuleBind. foreignModuleBind MAY modify properties of the given Function.

For example:

this.foo = foreignModuleBind(function () {
	log("foo called from: " + this.moduleScope.moduleUrl);
});

destructure(<object>, <structure>)

For the purpose of this specification, the "destructure" function has the following semantics. If the structure is an Array, the returned Object contains the items from the given <object> corresponding to the keys provided in the given Array structure. If the structure is an Object, the returned object contains items where each key corresponds to a value in the structure, and the value is a value from the object corresponding to the key in the structure. For example:

  • destructure({"a": 10, "b": 20}, ["a"]) == {"a": 10}
  • destructure({"a": 10, "b": 20}, {"a": "A"}) == {"A": 10}

module

The module object is a module's public interface. Adding items to the module object makes them available for export to other module scopes. To that end, the module object is mutable. The module object MUST provide a moduleScope. Modules MAY augment the module object.

moduleRootUrl

moduleRootUrl is the fully qualified URL of the JavaScript site-packages directory: the module root.

moduleUrl

moduleUrl is the relative URL of the current module file relative to moduleRootUrl.

log(<message>, [<label>])

log is a simple console logging function.

The module loader MAY ignore calls to log. The module loader MAY ignore the label argument.

The optional label MAY be any string, and MUST be suitable for use as a CSS class-name (preferably lower-case delimited by hyphens) of which the following MAY be significant:

  • info
  • warn
  • error
  • module
  • help
  • pass
  • fail

Afterword: Browser Implementations

This specification outlines the process of requiring modules from within other modules. However, in a browser's global context, JavaScript execution blocks are not modules. To that end, this specification does not require that the module loader be invoked in any particular fashion. A particular implementation might hook an initial module to be loaded from within a script tag. Another implementation might scan the DOM for script tags with an alternate language attribute and execute them as modules with the current page's URL as their module URL.

Afterword: Future ECMAScript import semantics.

A future version of the ECMAScript standard might specify new syntax and semantics for importing modules. Current discussions about this feature trend toward having new syntax that "desugars" to native JavaScript. To that end, I propose the following syntax and desugaring transformations in the context of this specification:

  • import "<moduleUrl>" as <moduleName>;
  • module.<moduleName> = require("<moduleUrl>");
  • from "<moduleUrl>" import *;
  • include("<moduleUrl>");
  • from "<moduleUrl>" import <a>;
  • include("<moduleUrl>", ["<a>"]);
  • from "<moduleUrl>" import <a> as <a'>;
  • include("<moduleUrl>", {"<a>": "<a'>"});
this entry was posted on Thu, 04 Sep 2008 at 03:28 in /javascript

Wed, 14 May 2008

Got Style?

There's a new JavaScript library from Google on the block. So, I'm of course out to absorb the good bits for Chiron. I've got to wonder some times whether people have started to use ideas from my code.

Compare this bit from Google's JavaScript library with this one in Chiron.

I don't hear much about people using Chiron. Could this be a sign that people have been reading it and absorbing the good bits? There are other parallels. For example, some of the function names and file names match up with Chiron too, like base.js and inherits (instead of Doug Crockford's recommended term, begets).

this entry was posted on Wed, 14 May 2008 at 16:48 in /javascript/chiron

Thu, 24 Apr 2008

Re: Recommendations for the Fourth Edition of the ECMAScript Programming Language Specification

Doug Crockford has posted some recommendations to improve JavaScript.

I have some comments in the context of my experience with live-patching JavaScript issues with libraries.

New Features

Function Reflection

Doug recommends that function objects contain a "name" property. Some versions of JavaScript actually have this already. This could be handy but I haven't needed this and would not be likely to use it. I was surprised that FireFox had this when I used a Function in a with block. I've learned my lesson.

However, Doug also recommends the addition of an arguments member to Function objects. This is a critical feature. The arguments array would contain the names of all of the functions declared argument variables. Function objects already have a length member that corresponds to the length of this hypothetical arguments array. There was some buzz about using this value for method overloading and it permitted me to create an elegant partial application decorator. If we had access to the names of a function's argument declarations, I could implement a decorator for supporting Python-style positional and named arguments. Decorated functions would accept an array/list of positional arguments and an object/dict/map of keyword arguments and the decorator would translate them to an array for a wrapped Function.apply. That decorator or further decorators could support default values and automatic argument length assertions.

typeOf

Doug notes that JavaScript's typeof "function" is broken because it returns the wrong strings for null and arrays. All that and he hints that using camelCase instead of lowercase would have been more in keeping with the language's aesthetic. He recommends that we deprecate typeof and replace it with a fixed version called typeOf. I would go a step further. typeof returns a string. I propose that it ought to return the type object. This would jive well with instanceof, instanceOf, or isInstance. This means there should be a Null type that would construct null and an Undefined type to construct undefined.

Object Methods

Don't Enum

I don' particularly care whether Object has a dontEnum method since I don't augment base types and wrap them when I need reliable collection types, but this would help ameliorate integration woes with prototype so it's a good idea.

Array of Keys

Doug wants a Object.arrayOfKeys method. I want a keys method that returns a Set. I would also like keysIter and, less emphatically, keysArray. I don't want the name arrayOfKeys because it introduces an unnecessary Of and doesn't sort as well as keys*.

Array of Values.

Doug also wants Object.arrayOfValues. I want values and valuesIter. These should return Array and Iter object respectively.

toJsonString

Object.toJSONString(memberName). The first thing that comes to mind to improve this idea is to rename it toJsonString; I prefer to consider acronyms words and I would have also called XMLHttpRequest XmlHttpRequest. This name jives well with toString, but I would have called that string. But, I actually would call this repr and made it a global function. repr would work on arbitrary objects, a subset of which would be serializable JSON. The behavior of repr would be overridable by providing a repr method on an object.

Beget Object

Doug recommends that Prototype inheritance be a little more lucid by providing a beget function that returns a new, empty Object that inherits from the previous. I like this idea. I called it inherit in Chiron.

isEmpty

isEmpty would return whether an object has any properties, other than those inherited. Sounds fine to me, although I probably would have just used number and boolean casting. This is a good idea too though, because in JavaScript, it's hard to distinguish an Object that is being used as the base of the inheritance tree and an Object that is being used as a hash-map. To that end, I recommend the addition of a Map type. Map would have isEmpty, number, and boolean functions and would implicitly cast to Number and Boolean based on their length that would be reflected publically by the getLength function.

Array Methods

Doug recommends the addition of some of Mozilla's JavaScript 1.6 Array methods. I support all of them. However, some nomenclature refinements and specifications:

indexOf should be augmented by a find function that throws a KeyError instead of returning a negative index.

lastIndexOf should be augmented by a findLast function similar to find.

every should be called all. This function should accept an iterator or iterable and short circuit on the first failure.

filter should be called where. Filtering implies the opposite of finding. To filter something is to remove it from a stream if it passes a particular condition. where implies that the outgoing stream should contain only those elements from the original stream that affirm the guard.

forEach is good as is. I would like to specify that it should return this so that forEach calls can be chained. This has implications on iterations that are partially consumed by a forEach call that throws a StopIteration exception in its continuation.

map is good as is. As a member of an array, each should be a synonym. These functions should also be declared in global scope with opposite argument order: map(function, collection, [context]) vs each(collection, function, [context]).

some should be called any. This function should accept an iterator or iterable and short circuit on the first success.

String Methods

trim is good as is. There should also be trimBegin and trimEnd or trimFirst and trimLast.

string.eval is a good idea, but I've got my own ways to launder the scope chain for eval.

Date Methods

Date.toJSONString and Date.toISOString are both great ideas; not having them has been a problem for me in the past. I recommend calling them toJsonString and toIsoString.

Corrections

Reserved Words

Doug argues that identifiers in object literal notation and member dot notation should allow reserved words. I concur. It's a small syntax shortcut that identifiers used in member selection and object literal notation are not required to be enquoted; the language should be equally permissive for both syntax forms.

Doug also argues that the list of reserved words is too long. I have mixed feelings. On one hand, having a long list of reserved identifiers leaves open the door for future advances in the language (some of which include type annotations, which I find dubious). On the other hand, they muddy the name space for current code. On the latter note, it's important that all browsers are equally strict. Safari, at the moment, is much more strict than other browsers, so I've been surprised.

Object Literal Notation

I agree that commas should be more regular throughout the language. They should be permitted after any value, including the last in an Object or Array literal without affecting the length of either.

arguments

arguments should definitely be an instance of an Arguments type that inherits from Array.

Inner Functions and the Context Object (this)

I agree that this should not be the global object (window) in inner or anonymous functions. this should be acquired from the scope chain in such closures. Doug claims that this is not the standard. I recall being corrected on this point, but I also recall having been under the same impression. I leave this as an exercise to the reader to determine the current state of affairs in various browsers.

Tail Recursion

Tail recursion would be nice.

Deprecation

Primitive wrappers should be eliminated. Boxing is almost never necessary. In fact, the use of the new keyword could be completely obviated and code would become much more reusable since there would be no distinction between a factory method and an object constructor.

I actually use the with statement in a couple cases that are really important. I don't recommend using it for its designed purpose, but I really do need it to stay in the language for my module system and my JavaScript shell to keep working. FireBug, in particular, depends on this feature.

Semicolon insertion was silly. Off with its head.

I don't particularly care about arguments.callee.

typeof has to go.

Again, eval is a necessary part of my perverse world. I can do my own laundry. There are, however, irresponsible uses of eval that I do not condone.

this entry was posted on Thu, 24 Apr 2008 at 18:40 in /javascript

Mon, 24 Mar 2008

De Schpugeti

You've heard of syntactic sugar. After a syntax and vocabulary provides all of the normal forms necessary to express all of the programs possible in a language, yummy shortcuts bridge the gap between expressiveness and usability. For example, c++ is a delicious variation of (temp = c, c = c + 1, temp).

You've also heard of CRUD, an acronym for the normal forms for a data programming interface (databases, data structures). If you design an interface library for data and fail to cover all of these bases, you deserve what's coming to you — and I can't promise that it won't involve a big hole in the ground and Hello Kitty.

CRUD stands for create, read, update, and delete. Those functions encompass all of the necessary data interactions; however, they are not sufficient. While CRUD may provide all the nutrients, vitamins, and minerals your data API might need, it certainly won't taste good without sugar.

Enter: Chiron.

del, set, cut, has, put, get. De Schpugeti. All of these names are three letters long, should be familiar by their names, and are sufficient in addition to necessary. These methods are the interface of Chiron's Dict, for manipulating item data: data that have keys and corresponding values. They are also implemented in a consistent manner by List presuming that indices are like keys.

del
Deletes an item for a key.
Returns the mutated collection.
For ordered collections, accepts a beginning key and an ending key, deleting all items in that range of keys excluding the ending key.
Implemented in terms of remove.
set
Adds or overwrites an item for a key.
Returns the mutated collection.
Implemented in terms of insert.
cut
Deletes an item for a key.
Throws a key error of no item has that key.
Returns the corresponding value.
Implemented in terms of get and del.
has
Returns whether key is among items.
Implemented in terms of has.
put
Adds an item for a key.
Throws a key error if there is already an item with that key. returns the mutated collection.
In an ordered collection, reserves the right to change the keys of some other items to make room for the new item.
Implemented in terms of set and has.
get
Returns a value for the given key.
Throws a key error if there is no value for the given key.
Implemented in terms of retrieve and has.

Also, Lists, Sets, and, by extension, Dicts have functions for managing collections of values that do not have keys. Dict is a Set of items instead of mere values that are compared and hashed on their key.

has
Returns whether a collection contains a particular value.
insert
Adds or overwrites a value.
Returns the mutated collection.
Defaults to an implementation in terms of find and set.
retrieve
Returns the contained value that is naturally equivalent to a given value.
Throws a value error if none exists.
Defaults to an implementation in terms of find and get.
remove
Removes a value.
Throws a value error if the value isn't in the collection.
Returns the mutated collection.
Defaults to an implementation in terms of find and del.
discard
Removes a value if it exists.
Returns the mutated collection.
Implemented in terms of has and remove.
find
Returns the key for a given value.
Accepts an optional equivalence relation to override the default of eq.

Now accepting ideas for a good mnemonic.

The fun trick is that List, Dict, and Set implement all of these functions. While it may be more appropriate to use one collection type for one kind of data or one algorithm, you can use any of these types interchangeably.

this entry was posted on Mon, 24 Mar 2008 at 17:51 in /javascript/chiron

Thu, 21 Feb 2008

Unicode

What I've learned today—

In order for a program or library to operate in a Unicode compatible fashion, all strings must be in Unicode. All input strings must be brought into Unicode, and all output strings must be sent out of Unicode at the very last possible moment. This is because, outside of Unicode strings, encoding is not a function of type and type information does not generally cross API boundaries accurately, plus regular expressions don't play well against mixed-length characters.

Django works entirely in Unicode and drops a string to UTF-8 at the last moment. I had need for Python's Textile library to transform text, but it only dealt with byte strings. Anyhow, It turned out to be quick work to change all of its strings to unicode and not bother with encoding and decoding.

this entry was posted on Thu, 21 Feb 2008 at 20:10 in

Tue, 19 Feb 2008

Integrating Simile

Not to name names, but I've been working on integrating code from Simile from MIT into Chiron. Refactoring an existing JavaScript project highlights all the things you get for free in Chiron.

Simile has it's own XHR engine, DOM event wrappers, DOM layout and style functions, PNG transparency solution, and a SortedArray type that provides binary search functions. Here are some of my observations.

  • Simile's layout getSize was better than mine. I will rectify this.
  • Not having a module system makes us reinvent the wheel: frequently.
  • It's hard to write a good XHR engine. There are a lot of XHR modules out there, most of them have some issue or another: missing browser, doesn't report OK status on local files, doesn't unify browser caching inconsistencies, doesn't support timeouts, doesn't expose XML (the X in AJAX) in IE, or so on. If you're going to make a new one, you should use these as references and do some serious research, development, and testing. Otherwise, you should copy or use the best of them (jQuery, in my opinion). Also, it needs to support asynchronous (the A in AJAX) requests, and you need to use them as often as possible.
  • Not having a solid, modular library makes us lazy. The inconvenience of name-spacing makes us lazy. This causes us to write sloppy code. For example, we should always use an enquote function when we're string interpolating HTML attributes and an inoculate function when we're interpolating HTML, or we should use DOM functions or a DOM wrapper API to generate our HTML.
  • As I integrate code from other libraries, a pattern emerges. In my first pass, I collapse the name-spaces. Every module is a name-space, so all the manual creating of hierarchies like Simile.DOM (Simile = {} presumed, then Simile.DOM = {}, then endless repetition of Simile.DOM to augment or use its contents) is unnecessary and undesirable.
  • Referencing URL's of resources, like other scripts and images, relative to the URL of the script you're currently in, is hard. Starting from scratch, this usually means you're going to have a global URL constant. This means domain-coupling. Maybe you make the URL relative to the root. This means domain-coupling. Maybe you provide it as a configuration variable. This means site-coupling. Maybe you scrape the script tags on the page for the URL of your script then resolve the URL relative to your own URL. This means you're going to write a lot of slow code for what you perceive to be little value. In Chiron, you can get a function called resolve from http.js that resolves a URL relative to a base URL. Chiron also provides your modules with a moduleUrl variable that is the URL of the script you're in. resolve also implicitly uses this variable as your base URL if you don't provide a second argument(include('http.js'); resolve('images/blah.gif')).

    Chiron grabs the script tag href of modules.js and removes the script object from the DOM (so other scripts can't sniff it) exactly once, since it needs that URL to resolve other module URL's. From there, Chiron keeps track of where all of your modules are relative to it and provides that information to each module.

  • About SortedArray:
    • A collection type should create an empty instance if you pass no arguments in.
    • A collection type should populate itself from the values of another collection if you pass one in as its first argument. This should always be the first argument, even if you frequently create empty collections with overrides on later arguments. Force your user to pass in a null or undefined.
    • Try to accept null and undefined as equivalent unless the distinction is meaningful.
    • Try to distinguish null and undefined from 0 in all meaningful cases.
    • Invariants like "sorted" are a promise. Guarantee your invariants across all function calls, including construction. If this means an unacceptable performance degradation, permit the user to suppress whatever code you need to verify the invariant if they are willing to provide treated data.
    • If there is a reasonable default, it should always be implicit. I should not have to explicitly send the global compare function into a SortedArray if I want a SortedArray of types supported by compare.
  • Not having a system of base types makes for noisy API's where names from different organizations have different meanings. For example, find functions should always accept the same kinds of arguments and return the same kinds of values. Simile's name choices are very close to mine, to the effect that they could almost be used as partially implemented duck types for mine, but some of the names would have to be realigned. find in Simile accepts a comparator and returns an acceptable index to insert or remove a particular element. find in Chiron returns an index or key at which an item can be inserted or removed, and guarantees that it will be the first occurrence of a given value (not a comparator). It was very easy to refactor SortedArray to subscribe to the strict model. Also, removeAll needed to be clear, length and getCount both needed to be getLength, getIterator needed to be iter, next needed to throw StopIteration once in a while, among others.

I'm looking forward to having a semblance of Simile Timeplot and Timechart in the Chiron family.

this entry was posted on Tue, 19 Feb 2008 at 01:39 in /javascript/chiron

Mon, 18 Feb 2008

Polymorphic repr

I'm not about to debate whether debugging is a valuable exercise in JavaScript, nor whether introspection or reflection tools are useful, nor whether they would be especially helpful in a dynamic language. Ruby has inspect. Python has repr. The Chiron JavaScript library has repr too.

Notionally, repr is the inverse of eval for a reasonable subset of JavaScript. There are a lot of object hierarchies that cannot be reconstructed from a repr serialization, but as a debugging tool, repr is indispensable.

repr is a polymorphic function you can import from base.js. If you pass repr an object that implements a repr member function, repr will defer to your overridable repr. Otherwise, repr returns reasonable defaults for other types. For example, repr provides defaults for Array, Object, String, Number, Boolean, and Date. repr also recursively represents members of arrays and objects, but provides circular reference protection by tracking visited objects in a memo Set.

Chiron's debugger uses repr to convert the value of an expression on the command line to a human-readable string.

j$ 1
1
j$ repr(1)
"1"
j$ "hi"
"hi"
j$ repr("hi")
"\"hi\""
j$ true
true
j$ {a: 10}
{"a": 10}
j$ [1, 2, 3]
[1, 2, 3]
j$ [{a: 10}]
[{"a": 10}]
j$ var a = {}; a.a = a; a
{"a": <cycle>}
j$ type()()
<instance run.html#0 0>
j$ type({'repr': function () {return "x"}})()
x
this entry was posted on Mon, 18 Feb 2008 at 12:19 in /javascript/chiron

Thu, 10 Jan 2008

Some Handy Commands

I've learned a few handy commands this year, so here's the wealth.

Sets have always been a big hole in my command line toolkit. I've gone to such lengths as to implement commands for basic set intersection and union in Perl or Python. Go no farther! While looking over Lasermacaroni's shoulder about a year ago, I noticed that he used the comm command. comm operates on sorted unique streams of lines and produces a three column output of which elements were in column A only, column B only, or both. Of course, three columns are nearly useless, so you can tell comm to suppress the output of certain columns. Yes, this is stupid, but here's your mnemonic: ask not what columns comm can display for you, rather ask what columns you can suppress from comm. Go!

A & B: comm -12 A B
A & !B: comm -23 A B
!A & B: comm -13 A B
A | B: cat A B | sort | uniq

Of course, I like to use comm like grep or grep -v to find the elements from an input stream that either are or are not in a given file. Mind that your file needs to be sorted and uniqued.

also in A:  ... | sort | uniq | comm -12 - A
not in A:   ... | sort | uniq | comm -23 - A

I've known about the seq command in Linux-land for a while. It creates lists of numbers in a given range.

seq last
seq first last
seq first stride last

For examples:

$ seq 3
1
2
3
$ seq 3 5
3
4
5
$ seq 0 2 4
0
2
4

I was quite disappointed not to find seq on Mac OS X. Turns out the BSD folks have a pretty bad case of NIH. Instead of the seq function, you may, having the good fortune of working around brilliant people every day, notice a friend, coworker, or other friendly mammal use the jot command to produce their streams of numbers.

jot [reps [begin [end [stride]]]]

Here are some occlusive examples:

$ jot 3
1
2
3
$ jot 3 5
5
6
7
% jot 3 0 6
0
3
6
% jot 3 1 1
1
1
1
jot -b+ -s- 40
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

fin

this entry was posted on Thu, 10 Jan 2008 at 22:57 in /cli

Thu, 03 Jan 2008

modules.js

We've posted modulesjs.com this evening to proliferate what could be the most important piece of code I've ever written, modules.js. This JavaScript is the core of my Chiron JavaScript library project for the Tale game user interface, but could stand alone as the core of any number of JavaScript library projects. Every popular JavaScript library could benefit from a module system that handles dependencies and isolates name spaces like Python or Ruby. Read the tutorial and look over the code. Find me on an instant messenger or my face in the big blue box or send me an email and tell me what you think.

Thanks go out to Ryan Witt for managing the server and domain, my sister Kathleen Kowal for the graphic design, Ryan Paul, Ryan Ernst, and Mike Stone for proofreading and particularly to Ryan Paul for letting me expound at him nigh daily for the last decade.

this entry was posted on Thu, 03 Jan 2008 at 15:04 in /javascript/chiron

Tue, 01 Jan 2008

New Year's Resolutions

I don't generally make New Years resolutions. But, since I'm a software engineer, and I've got some milestones to plot, I've decided to post a release schedule for my personal software projects. This year, I'm going to release Tale.

This involves continued development on Chiron JavaScript for the front-end, Python on Planes for the back-end, and lots of new musical, graphical, and programmatic content.

I'm of course still looking for volunteers (occasionally paid help, if I must and it's worth it) to develop various parts. Chris Pasillas has been working on composing music for the game; we've got a lot of content and had a meeting [mp3] last week to develop creative direction. However, if someone's equipped to render orchestrated versions of the music from the MIDI's (I've got Jonathan DeKlotz in mind for this, but he's a busy fellow too), that would be a super-awesome contribution. We could also use some help from smart, creative, driven (perhaps by angst, anger, or escapism; whatever works for you) programmers who know or are eager to learn Python to develop the actual game engine logic and "mob procs"; this is a lot of work and the bulk of player experience, but I must admit that I'm only marginally interested in developing this aspect of the game, plus I've got a lot of work ahead of me to re-integrate PoP and Chiron. I've had some help from Scott, and Josh "Jynx" Jenkins to develop the game math, plus some ideas of my own in the works, but someone who can dedicate programming time to make ideas real could have a big impact. There are also a lot of Python and JavaScript mini-projects that I may or may not have time to develop myself with the remaining year of schedule, many of which would be very interesting, like intelligent narrative generation from event description trees, SVG sprite composition and rendering (partially implemented), much more 2d vector art for content, game state logic (making lots of use of Python generators if you want to learn about that), and much more. Most of all, Tale needs volunteers who are willing to give a part of their soul, as I have, to the project. Here's the development schedule.

January 2008
SunMonTueWedThuFriSat
    1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31    
1 Post Chiron 0.1 (done)
February 2008
SunMonTueWedThuFriSat
          1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28    
2 Deploy Chiron Website
May 2008
SunMonTueWedThuFriSat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31
3 Post Python on Planes 0.1
June 2008
SunMonTueWedThuFriSat
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30          
1 Release Chiron 1.0
1 Begin Tale Invitation Beta
December 2008
SunMonTueWedThuFriSat
    1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31    
5 Begin Tale Public Beta
January 2009
SunMonTueWedThuFriSat
        1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31
1 Deploy Tale 1.0

This year will also mark another major step in the collision of my hobby-life and work-life. I've never mentioned this in my personal blog, but I've been working at Apple for the last year and a half, but I'll be moving on to work for a start-up spun off of a networking lab at Caltech called FastSoft where I will be developing the web user-interface for their Aria, a wide-area network performance optimizer. This means I'll be much busier, and I'll be able to work on open-source projects like Chiron as part of my job.

Looking back at all of my personal projects for the last eight years, a pattern emerges. I want my code to help as many people as possible. As for my motivations, as I realized over lunch with Ryan Witt and Christine Ortega over lunch last week, I'm an artist — I'm performing for an audience and am gratified for appreciation of my creations. I'm a fan of Reusability and Repurposability with big R's. This has implications for the nature of the projects I find interesting. I prefer to write libraries and languages: crystalline, orthogonal, refined, applicable, general, powerful, hosting usefulness through emergent patterns: uses unforeseen. Reusability is also best served by proliferation, and what better way to proliferate a thing by making it valuable and free. This isn't economics where value, cost, supply, and demand have interesting but vague relationships; I mean value as usefulness minus cost. Also, what better way to make a thing valuable than to permit your users to exercise and refine it themselves and give them every reason in the world to share their contributions. On a selfish note, I prefer to write software I can use anywhere, even if I'm working on something proprietary. What I'm getting at here is Open Source, with a free, non-viral license. I've chosen other licenses in the past for various purposes, withholding a piece of value for myself for money or the cause of freedom, but for the raw purposes of Reusability, Repurposability, proliferation, and the advancement of the state of the art, I've chosen to use a permissive, free, non-viral license. So, my contributions to Chiron and PoP, at work and at home, will be available with the MIT license.

Happy New Year; let's make.

this entry was posted on Tue, 01 Jan 2008 at 17:35 in

Mon, 18 Jun 2007

Machina ex Deus

Behold!

"Do not be too proud of this technological terror you've constructed."

- Lord Vader

this entry was posted on Mon, 18 Jun 2007 at 00:00 in /tale

Tue, 12 Jun 2007

Cixar Tale Mailing List

If you're interested in Tale development, please join the email discussion in our new Google group.

this entry was posted on Tue, 12 Jun 2007 at 00:00 in /tale

Sun, 13 May 2007

Tale Music

It's also worth noting that I've commissioned Chris Pasillas, a friend from Cal Poly who studied Computer Engineering before getting his degree in Music with piano performance and composition. Chris drafted at least one sketch for each of the faces of the Tale World this week. The sketches are in the Tale Subversion repository. Each sketch is denoted by a number and a name for each face and tension level. Tension levels include "town", "adventure", and "battle". He's also working on a motif for each face. Please send us your comments and suggestions.

this entry was posted on Sun, 13 May 2007 at 00:00 in /tale

Chiron Automated Tests

It's worth nothing that I've posted the current results for automated unit testing for Javascript library.

The automated test system uses a web browser, an HTTP daemon, a client web page, and a batch of javascript unit test files. A script launches the daemon with an argument list containing a manifest of all of the javascript unit test files. The daemon serves all of the Chiron files, a logger, a reporter, and a test file-name iterator. The script then opens a web browser and navigates to the client page. The client page supplants the usual test.js module with its own autoTest.js module (in an aspect oriented fashion) so that the unit test assertions and timers get logged to the server via an HTTP request. Then, the client page iteratively requests a test URL from the server and opens the test in a new window. After all of the unit tests have run, the client opens the test results page.

In the future, I would like to use up a database and leave the test daemon running before and after each Subversion commission so that test results can be gathered from a suite of platforms. Then, there would be a catalog of performance and coverage trends.

this entry was posted on Sun, 13 May 2007 at 00:00 in /javascript

Sat, 12 May 2007

Cixar Javascript Library and Names

The name "Cixar Javascript Library", or CJL, has numerous virtues. It has a three letter acronym, a full name long enough to asphyxiate a cheetah, conjures absolutely no mental image, nor alludes to any works of art or literature.

I'd like to announce that, having rigorously contemplated the issue from a vantage about two feet above my naval (that is, half of a meter for our dear friends huddled safely distant from the sanity interference field), I've decided to name the Cixar Javascript Library. My requirements are succinct:

  • one word,
  • allegorical, preferably ambiguous,
  • eminently brandable,
  • subtly humorous,
  • an alternate spelling in Unicode,
  • If possible, the name must enamour an attractive girl no more than five years older or younger than me before Monday.

I browsed Wikipedia and discovered Ajax's centaur tutor, Chiron. So, there you have it, the library's name is Chiron. I leave it as an exercise for the reader and my fate/doom to discover whether it will suffice. However, I will explicitly state that brandability is fulfilled. Chiron, it turns out (back me up on this, Wikipedia), was really half-cow, the only Centauren to grace the Hellenic Pantheon.


For prospective users of Chiron, I have a question of nomenclature preference. Use of the Chiron library requires the HTML author to include exactly one script tag in their header, modules.js (13KB shrunk but not compressed). From there the author can opt to use whatever modules they will.

Central to the Chiron library is a module currently called javascript.js that contains exceptions (KeyError), the type system (type), common types (Dict, List, Iter), and a wealth of common functions (forEach, stripEnd) (currently 36KB shrunk but not compressed). Since I would like to give the user the option of not loading this module, most modules explicitly include it, even though in a language like Python, these primitives would be implicitly available.

I called the module javascript.js since it would include primitives that in Python would be part of the language. In Python, this module would probably be called __builtins__.py. In any other Javascript library, precedent suggests that that module would be called base.js.

Which of the following should I call this module of language primitives?

  • javascript.js,
  • builtins.js,
  • base.js,
  • your idea here.

Please send me an email and weigh in.

this entry was posted on Sat, 12 May 2007 at 00:00 in /javascript

Javascript and Load Time

I conducted a quick experiment to assess whether it is worth worrying about the size and distribution of Javascript modules. I gathered the following numbers on my Linux box at home (fast) connected to Cixar (fast) over my home Internet connection (average today).

ping cixar.com 30ms
ping google.com 80ms
HTTP overhead 30ms
bandwidth 0.1KB/ms
patience 1500ms
maximum size 150KB
Chiron core 100KB
Chiron core shrunk 50KB
Chiron core shrunk compressed 13KB

I've decided not to worry about the size or distribution of Chiron until its core is five times bigger.

this entry was posted on Sat, 12 May 2007 at 00:00 in /javascript

Fri, 11 May 2007

Higher Order Functions in Javascript

Recently I discovered that a friend of mine isn't using "Higher Order Functions" in their Javascript. I cried. However, I fear, this is common-place, perhaps some artifact of a latent and ubiquitous fear of closures instilled by Internet Explorer and the folklore that arises in the absence of reliable, or believable, information. In any case, I'm resolved to expound the virtues of this lesser known programming paradigm among object oriented language aficionados.

Higher Order Functions are functions that accept or return functions. This variety of insanity is possible in functional languages, or languages that provide closures. It is of course possible to accept and return function pointers in C or its ilk, but these are not sufficient for Higher Order Functions because you cannot create functions that can refer to the local variables of the function that instantiated them. This is the crux.

It is helpful to think of a name space as an Object. Calling a function creates a local name space, rooted in the name space in which the function is declared. Unlike in assembly languages, these name spaces can persist, as Objects do, long past the lifetime of the function call that created them. Like Objects, a name space exists as long as a reference exists referring to it. However, among name spaces, the only way to refer to a name space is to create a name space rooted therein. This means that the local variables of a function call exist as long as another function's name space refers to it.

Permit me to offer a practical example, one that any one can use every day of the week. Sort functions in most languages accept an optional comparator function as an argument. C++'s Boost library for example accepts predicates. This is, of course, an insane hack since C++ isn't designed for functional fu. Python provides sorted and accepts a comparator. You don't have to believe me; the important point is that Javascript follows suit.

This is the default sort behavior.

var items = [3, 1, 2];
items.sort();
assert(items == [1, 2, 3]);

A comparator is a function that accepts two arguments and returns an integer indicating whether the arguments are equivalent, sorted in ascending order or in descending order. The result of a comparison call can be compared to 0 (on the right hand side of an expression) with ==, > and < and the result will be exactly the same as if the operands were replaced with the comparison arguments respectively. So, if compare(a, b) == 0, then a == b, if compare(a, b) < 0 then a < b. The simplest compare function provides these results for numbers.

var compare = function (a, b) {
	return b - a;
}

This example, provides a reverse comparator; a comparator that will sort numbers in descending order.

var compare = function (a, b) {
	return -(b - a);
};
var items = [3, 1, 2];
items.sort(compare);
assert(items == [3, 2, 1]);

Of course, there are all manner of objects that you might want to sort, and not all of them are strings and numbers. Supposing you had an array of people, you might want to sort them by their last name. You can do this with a comparator.

var compare = function (a, b) {
	if (a.lastName b.lastName) {
		return -1;
	} else if (a.lastName == b.lastName) {
		return 0;
	} else {
		return 1;
	}
};
people.sort(compare);

Of course, it would be cumbersome to give highly descriptive names to every block of code you wrote and used once, so it's no more reasonable to have to name all of your comparators. You can just use an anonymous function.

people.sort(function (a, b) {
	if (a.lastName b.lastName) {
		return -1;
	} else if (a.lastName == b.lastName) {
		return 0;
	} else {
		return 1;
	}
});

However, you will find that all of your comparators will have the same shape. They will probably all test equality and lessity. It would be beneficial if you had a function that would create comparators for you just based on the attribute of the object you want to compare. Enter higher order functions. Let's construct one such function.

We want to create a function that accepts two elements and returns a number reflecting their transitive relationship, a comparator, based on some attribute. So, its clear that our function will accept an attribute name, and return a function.

/* the higher order function: */
var by = function (attribute) {
	/* the comparator: */
	return function (a, b) {
	};
};

Lets see the function in full.

/* the higher order function: */
var by = function (attribute) {
	/* the comparator: */
	return function (a, b) {
		if (a[attribute] b[attribute]) {
			return -1;
		} else if (a[attribute] == b[attribute]) {
			return 0;
		} else {
			return 1;
		}
	};
};

The important leap here is that the comparator can refer to the attribute from its enclosing name space, even though that function will have completed and been popped off the "stack" long before we ever call our comparator.

We can use our mighty by function to sort people by last name much more readably and succinctly.

people.sort(by('lastName'));

For good or ill, this will work for rigid arrays of arrays (tables) as well, if you provide an integer instead of an attribute name. I say this with half of a heart because it's a Pyrrhic victory. The price of being able to perform member selection and array indexing with the same, generic syntax is the inability to distinguish member selection for associative array (or hash) indexing, but that is a tale of woe for another day.

var table = [
	['Kris', 'Kowal', 3],
	['Ryan', 'Paul', 2],
	['Ryan', 'Witt', 1]
];
table.sort(by(1));

In any case, this demonstrates the power of higher order functions in your every day life. If you don't like the for (var i = 0; i length; i++) pattern (as you ought be weary of by now), modern Javascript implementations provide a higher order function, Array.forEach. From the pantheon of functions you will also find map, fold or reduce, filter, compose, partial, and others. Happy coding.

this entry was posted on Fri, 11 May 2007 at 00:00 in /javascript

Mon, 16 Apr 2007

Javascript Events

This evening, I finally got around to implementing event base classes for the Cixar Javascript Library for Tale. These events resemble W3C DOM events but add some much needed complexity under the hood.

I've implemented Event, Dispatch, and Observable. Event is a base-class for events that implements the usual stopPropagation and cancelDefault. Dispatch creates a function object for dispatching an Event with optionally overridden event type or event factory and default function. Observable can be mixed into any type to permit users to listen to method invocations. This is the beginning of the fun. Dispatch implements call and listen as expected. Additionally, Dispatch implements listenBefore and listenAfter so you can attach event listeners that will fire before any other listener as yet attached, or after the default function and any other listeners have dispatched. Additionally, all of these listener functions return a new Dispatch object so you can assure that your listeners will fire in order relative to each other. This behavior is recursive, so you can attach a listener before or after any other listener.

Here's the unit test.

this entry was posted on Mon, 16 Apr 2007 at 00:00 in /javascript

Sun, 25 Mar 2007

Don't Repeat Yourself: or Javascript DOM Initializers

I think I did something clever. Once again, it's time to fight the losing battle of explanation. I find that I've been repeating myself a lot in Javascript. Worse, I've had to compromise my ideals pretty frequently. Consider the following Javascript.

<html>
	<head>
		<script src="/javascript/modules.js">
	</head>
	<body>
		<div id="clock_0"></div>
		<script>
			require('clock.js', function (clock) {
				clock.Clock(
					document.getElementById('clock.js')
				);
			});
		</script>
	</body>
</html>

In this example, I've installed the Cixar Javascript module loader. This gives me a global function called require that fetches javascripts and loads them as singleton modules and returns the module object or calls a given function closure with the module object. I then populate my document with HTML and bless the HTML with Javascripts. In this case, I set up a clock. The details of how this clock works are hidden in a Clock type that is in turn hidden within a clock module. This particular setup requires that any blessed DOM element have an identifier so that it can be in turn fetched by name in the Javascript.

It turns out that this pattern of laying out out a DOM element like a div followed by a script to imbue it with DHTML powers is pretty common in my Ish project. This makes for a lot of script blocks but also makes the server side code rather orthogonal and elegant. It's not that bad, and I am proud it's this clean so far.

However, it could be much cleaner. Consider this example that I implemented today.

<html>
	<head>
		<script src="/javascript/modules.js"></script>
		<script>require('init.js')</script>
	</head>
	<body>
		<div require="clock.js#Clock"></div>
	</body>
</html>