A new adventure

posted 11 August 2011, updated 11 August 2011

Last Thursday, I informed my managers at Yahoo! that I will be leaving the company. I have a lot of thoughts about leaving Yahoo!, and I'm going to assemble them into another post later. For now I want to talk about the new gig.

A while back, Jonathan invited me out to dinner. We'd worked together for a year on the dream team that was Yahoo! Widgets before it got mothballed, and he wanted to talk about some ideas he had around entertainment, social media and the Internet.

In a way that is characteristic of him, he started speaking fluently, passionately, and with all the focus of a terminal ADD sufferer about friends of his who are media types who make web content. About how they use blogs, Facebook, Twitter, YouTube, Flickr, and sites like those -- what he collectively termed "social media", a buzzwordy phrase, but usefully short.

Mostly, content creators use social media haphazardly at best. Not because they're dumb, but because there are so many sites for them to use, each with different use-cases and conventions and tools. Knowing about and using more than a fraction of their capability is a full-time job, and most media organizations aren't big enough to dedicate a full person to that role.

He also talked about metrics, and "closing the loop" on social media. At the moment people who pump content into these various sites get only the most basic idea of how successful they're being. They see view counts on YouTube, basic stats on Flickr, and on Twitter they can kinda-sorta track their retweets (except when they can't), or search for links to their content (except when they can't). The stats aren't always there, and even when they are it's hard to get the big picture.

There is, he said, a business opportunity here. I agreed, and said I wish I could help him out -- but, being on an L-1 visa at the time, I knew I couldn't. I also have another post in store about that.

Today Jonathan has taken that idea and gotten a lot further with it: he has founded a company called Snowball Factory (we're going to work on that logo), and with the able assistance of Cloudspace it's launched three products -- the flagship awe.sm, as well as two smaller tools, TweetPo.st and fbShare.me. Collectively, they're helping tame the beast of social media -- making it easier to use, more measurable, and more effective.

This is a hard, hard problem. In fact, it's five or six hard problems. It involves taking enormous amounts of data and boiling them down to simple conclusions, and wrapping complex APIs into simple, usable user interfaces. It involves making websites that scale, and APIs that are powerful but easy to use. And the result is that the web, as a whole, gets better. In short, it's not just building a website; it's developing the web. It's what I'm all about.

Jonathan has a lot more ideas, and I've got more than a few of my own. And starting next week, I will be joining Snowball Factory as employee #1 and technical lead (and co-founder, and janitor, and CTO, and tea boy, and sysadmin -- when you're employee #1, you get a lot of job titles, but I'm sticking with "technical lead" for now).

I'm extremely excited. Joining a startup is what I came to the bay area hoping to do, and after three years at Yahoo!, I'm doing it, and I'm in pretty much as early as you can be. It's going to be hard -- expect a lot of annoyed tweets about technical difficulties -- and the hours will be long, and there will be setbacks as well as triumphs. But it's going to be an awesome (and awe.sm) ride.

6 comments

On leaving Yahoo!

posted 11 August 2011

Today is my last day at Yahoo!. It's been four years -- more than twice as long as I've held any other job.

I remember very clearly, when I was fifteen and had had Internet access for only a few weeks, building my first web page and thinking "wow! This is fun! I wish I could get a job doing this!" Then I tried to think of big web companies I'd really want to work for, and the first one was Yahoo!. "But they've already built their website", I thought to myself, "They don't need another web developer. Plus, I don't know Perl."

So nine years later, when Yahoo! contacted me and offered me a job in the London office, it was a dream come true. I sent excited emails to friends and family, and I printed out a huge "I WORK FOR YAHOO" banner (in a stolen copy of the Yahoo! font) and hung it above my desk at home. I know it sounds terribly cheesy, but I really did.

Joining Yahoo! was amazing. We're so *big*! We have our own fork of Apache, our own version of PHP, dozens and dozens of our own specialized products and plugins (I love yinst!). In my very first week, I was already making changes to websites seen by millions of people (the FIFA world cup site). And the resources we can draw on! The devel-frontend list taught me volumes about CSS and Javascript, as did internal training for YUI.

For a young web developer, there is absolutely no better place to work than here. I got to build not just big sites but great sites, working with people who are absolutely at the top of their game in every department -- design, engineering, ops, and QA. Plus there are hack days -- I love hack days! You get to build the coolest thing you can think of, as fast as you can, and show it off to hundreds of appreciative engineers. There are very few places in the world where you can do that.

Yahoo! has made me a better web developer, a better engineer, and a better teammate. I have learned so much from this company, and for that I am deeply, truly grateful. Being a Yahoo has been a big part of my life -- and I know, from seeing it in others, that you never really stop being a Yahoo, even when you're working somewhere else.

So then, why am I leaving? Because I have grown as much as I can. In my first year I grew as a web developer, in JS and CSS. In my second I grew as an engineer, architecting a whole website from scratch. In my third I grew as a database developer -- I became "the database guy" to some of you, which I still think is funny. I'm a web guy! But in this last year I have mostly grown frustrated. I'm not saying I have nothing more to learn, but I need to go somewhere else to learn it.

5 comments

Seldo.Com is 10

posted 11 August 2011

I registered this domain ten years ago today, sitting in the chair by the window in my brother's apartment in Clapham South -- I had just moved to the UK from Trinidad, and hadn't found my own apartment yet.

The 10th anniversary of this blog is a little further away -- the site was mostly static until March 2001. This morning I grabbed a quick set of screenshots of some of the oldest designs of the site; they are pretty funky.

7 comments

3 years, 3 days

posted 11 August 2011

It was March 18th, 2007 when Barack Obama visited Oakland and I went to see him speak in person for the first time. It was there I first heard him promise universal healthcare by the end of his first term in office. "And I want to be accountable for this," he said. Of his speech that day, I said:

Above all, it was a message of optimism: yes, the system is broken, but it can be fixed, by us, right now. And this funny, sincere, incredibly, hypnotically charismatic man seems like just the right guy to do it

Today, the trust he inspired has been validated, and the promise he made has been -- as far as I am concerned -- kept. Sure, the coverage is not quite universal. And lots of things won't kick in until 2014, after the end of his first term. But that's politics. It's a business of compromise, and incremental advance. But he promised the biggest change to healthcare in a generation, and here it is.

Good going, Barry.

12 comments

Re-Expressed

posted 11 August 2011, updated 11 August 2011

A few weeks ago the Trinidad Express, one of Trinidad and Tobago's major national newspapers, redesigned its website. The result is an unreadable mess of tiny fonts and hundreds of blinking, flashing ads. Literally unable to read it myself, I quickly hacked up a script that reformatted the front page. Some friends liked it, so I expanded it a bit.

So now, after a few weeks of tweaking, I give you Re-Expressed: the Trinidad Express, made readable. I hope you find it useful.

7 comments

Apple's ban on intermediate platforms, and what this means for web apps

posted 11 August 2011, updated 11 August 2011

Dear web developers hoping to build apps for the iPhone: we're fucked. But Apple is shooting itself in the foot.

Some background

There's a big fuss right now because as part of the iPhone OS 4.0 release, Apple has explicitly banned the use of intermediate platforms to create iPhone apps (and hence presumably iPad apps, since they run the same operating system).

Their motivations for doing so are the subject of debate. The supremely well-informed John Gruber of Daring Fireball thinks Apple is doing it to lock in iPhone as the de facto standard for mobile development, in the same way that Microsoft managed to get a lock on the PC market despite the many flaws of Windows -- by attracting a critical mass of developers, and hence apps, and hence users, and hence developers, in a virtuous, monopoly-creating feedback loop.

This interpretation has been tacitly acknowledged by Steve Jobs himself. However, Jobs placed the emphasis on another aspect of the post, saying

intermediate layers between the platform and the developer ultimately produces sub-standard apps and hinders the progress of the platform.

This spins it as a user-friendly decision rather than a ruthless business one, but there's no reason it can't be both, and one imagines it being both would be just fine with Mr. Jobs.

Whether or not Apple is correctly positioned to dominate mobile apps a la Microsoft is a subject for another post. But right now, I think the idea that intermediate platforms are unwelcome on the iPhone raises an important question for web-native developers, such as myself.

Does the web count as an intermediate platform?

When iPhone first launched, Apple announced that apps would be web apps. They were supposed to be first-class citizens and in fact were the only way of producing apps for the phone. There is even still a web apps directory, a neglected, poor man's App Store for web apps.

Since then the real SDK was introduced. It's unclear whether it was planned all along, or if it was a strategy adopted after Apple saw the enthusiasm and creativity going into jailbreaking, which allowed developers to run custom apps on iPhone before that was officially allowed. Meanwhile, the APIs web developers were promised for iPhone never materialized: location is now available, but a hundred others are not, and with the release of iPhone OS 4.0 that list has grown.

And now the official word is that intermediate platforms are not welcome to make apps for iPhone. I can't think of a more obvious and widespread intermediate platform than the browser environment, and whether you believe the motivation is a better user experience or a hard-nosed attempt to monopolize mobile development, web apps lose.

Because there's no denying it: web apps provide worse user experiences than native apps on the iPhone right now. They don't have to -- Apple could expose all the APIs via the web, and add extensions and libraries to Safari that would allow the beautiful, fine-grained UI controls currently available to native apps. In fact, they already built one, called PastryKit. Its non-release, despite being high quality and inclusion-ready, is another indicator that Apple is deliberately ignoring web apps as a platform for the iPhone.

Our one hope as web developers for developing on iPhone with full APIs was being able to build web apps that would get compiled down to native code (or, on clever platforms like Appcelerator Titanium, run as WebKit instances inside a customized, lightweight native app). With this change of the rules, the future of platforms like these looks very uncertain, and the door has been slammed in our faces.

The illustrious ppk thinks all iPhone apps should be web apps. I think the chances are slim, and getting slimmer all the time.

It's a damn shame. And the wrong call.

Everyone knows the largest development platform in the world isn't Windows, or Mac, or desktop or mobile: it's the web, the only platform that runs on all of those, plus nearly everywhere else. Ignoring the giant and ever-growing contingent of web-native developers -- people who grew up writing apps for the web, have never written apps for anything else, and see little reason to start -- is to ignore the tide of history.

The unstoppable march of technology has taught me that what ten years ago seemed like a ludicrously inefficient idea soon becomes standard practice. Running an entire IDE as a Java app, for instance, or installing each major component of my development environment in its own separate virtual machine. Computational efficiency is repeatedly sacrificed for speed of development, because computers are cheap and getting faster all the time, while developers remain expensive and oh-so-slow.

So it doesn't matter if, right now, native mobile apps are faster. That advantage is momentary. It does matter that the experience is better, but that just means there's an opening in the market for a platform that really does treat web apps like first-class citizens. Android or, perhaps, Palm, if they get acquired by somebody more capable of building out a platform.

At no point will web apps be faster than native apps. And the experience might never be quite as good. But one day it will be "good enough". The desktop hit that tipping point more than five years ago -- what's the last really exciting new desktop app you installed? In my case it was Chrome, and that was because it was a better browser. To pretend that won't ever happen on mobile devices is silly.

Once it happens, the web will win again. Attempts to lock web apps to your platform with useful but proprietary extensions will fail, as Microsoft failed with ActiveX. Developers will put up with building simpler apps because they run everywhere, really everywhere. Developers go where the users are, and the users, no matter who made their hardware or wrote their software, are always on the web.

The web will win, eventually. But in the meantime, find somebody who knows Objective-C.

5 comments

Towards a real distributed social network protocol

posted 11 August 2011

Last week Facebook announced its Open Graph protocol. It sounds exciting, but is unfortunately a completely misleading name, being neither open, nor a graph, nor a protocol. Instead it is a Facebook social Data API, but since they already had one of those and it was broken you can see why they felt the need to re-brand. Elsewhere on the web Google and others are working on the OpenSocial APIs, which are at least accurately named. But they are just a standard way of accessing everybody's isolated walled gardens. Neither effort does anything to achieve the inter-operation of social networks that I imagine when I hear the names.

What would an open graph protocol really look like?

The reason the web works is because it is independent, decentralized, and simple. There is no prescribed ideal for the way web pages should fit together. Indexing is independent of representation, and indexing is open to anyone. The web is a graph, a real graph, where no node is more important and any path is possible, and the protocol is a true protocol, defining only the most basic forms of interaction and leaving semantics to the application layer.

So at first glance, it seems a social graph protocol would need the same properties:

  • a permanent, independent representation of entities
  • edges available and explorable by any individual or robot
  • no hidden links or metadata
  • no assumptions as to the shape of the graph

For the first part, I think a lot of people assume that a true social graph would unify identity. The WWW* is itself; it represents only itself and it contains all of its own information. But the social web isn't like that; the online representation of a person is a proxy. And I think for reasons of privacy, security, and practicality, a total representation of our social graph would be undesirable. Such a graph would consist of everyone you've ever met, and for completeness would have to indicate how strong your connection was to them. That in itself is information we keep socially very private. In addition, we keep our social and professional lives to some degree separated. Our friends are not interested in our business contacts and vice versa.

A graph of graphs

A true, complete, open social graph is socially undesirable; it's not what we want. So what do we want? The answer is already emerging: we like our social graphs partitioned by intention: professional networks (LinkedIn), personal networks (Facebook), "social" networks (in the sense of "socializing" -- people we like to talk to or hear from) like Twitter, romantic networks (an infinity of dating sites). Then there are less well-established, niche networks built around personal history (alumni networks, mailing lists) or interests (forums and online groups).

For individuals who exist in several of these circles, we already accept duplication readily. Many recent attempts have been made to find ways to keep all your social networks in sync and related. This is not a particularly complex technical problem (though scaling it would be nontrivial), and yet no-one has succeeded. I think this is not because we've not worked out how to do it. It's because nobody wants it -- nobody except nerds who like graph data, and marketers who dream of the giant rewards to be reaped from owning that data.

This changes things for the designer of the proverbial Open Graph. The shape of the graph we are expecting changes, as does the nature of the nodes. The nodes become facets of personality rather than single true representations of people, and the edges become somewhat simpler: type of connection, and probably directionality, but without the degree of strength that would be so tricky to judge relatively in a unified graph -- is your business partner closer to you than your girlfriend? It's an impossible -- and largely pointless -- question. The graph ceases to be a single unified graph and instead becomes hundreds of graphs, occasionally connected but in ad-hoc and inconsistent ways. This is already sounding much more like reality -- and much more like the web as we know it.

No more honeypots

Furthermore, the openness is still a problem. Professional networks are closely-guarded secrets. Personal networks if open can be exploited for identity theft and social engineering. Privacy is paramount. We trusted Facebook with it and they pulled the rug out from under us, to monetize better. We trusted Google with it and they broke it by accident with Buzz. We never, ever trusted Microsoft with it. A central commercial repository for all our data is clearly the wrong way, and even a central repository for each facet -- one for professional, one for personal, one for romantic -- seems flawed. What's the webby way to do this?

If we don't trust a single company with our data, if any single repository would be too much of an attraction, then we need instead dozens, hundreds of repositories: we need domains and servers, just as we have web sites and web servers, or email addresses and email servers. Each server will hold our social connections -- not a single true representation, but whatever facet of our personality we wish to represent via that identity. In fact, using an ID like name@domain.com -- similar to email addresses -- would not be a bad start.

To free us from the giant honeypots of isolated, centralized social networks, what we need is the protocol that would allow these systems to communicate -- in the same way that we each have an email address on a different server, but all email addresses can contact each other, we need distributed identity that can communicate via a protocol. In the early days of the Internet, services like AOL, Prodigy and Compuserve overcame the lack of a unified protocol by building rich walled gardens, until open protocols let everyone talk to everyone else. In the evolution of the social web, it is time to make that same leap. Social network A must be able to talk to social network B as a peer.

The basics of a true open graph protocol

What are the actions this protocol must allow? The same things networks right now allow:

  • rich identity representation
  • network activity updates
  • private one-to-few messaging
  • in-network searching
  • ad-hoc group communication
  • events (essentially just specialized metadata attached to private or group messaging)

I learned a lot about ActivityStreams at their StreamCamp event last week, and it is an interesting solution to the second problem I listed: standardized, federated, and open, it doesn't care what network an update comes from, it just aggregates them and passes them along. It's the right direction. But we need something grander.

Imagine a set of servers. You can create an account on any server and invent an identity, or even several different identities. Duplication is expected and even encouraged. Now create connections between entities. They can be within the domain, or they can be between domains. For a unidirectional link, only the originating server knows the connection; for a bidirectional link, both servers do. If the originating and destination servers are the same domain, it stores both ends. It doesn't matter; external and internal connections are equal citizens, wrapped around some central standardized metadata that is extensible at will: richer networks can share more, simple networks are not required to do so.
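Here is a minimal sketch of that storage rule, assuming in-memory servers keyed by domain. The `Server` class and the `connect` and `domain_of` helpers are names invented for illustration; nothing here is a real protocol implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """One domain in the sketch; it holds only the links it is entitled to know."""
    domain: str
    # Each stored link is a (source, target) pair of "user@domain" identities.
    links: set = field(default_factory=set)


def domain_of(identity: str) -> str:
    """Return the part of a user@domain identity after the last '@'."""
    return identity.rsplit("@", 1)[1]


def connect(servers: dict, source: str, target: str, bidirectional: bool = False) -> None:
    """Record a link. The originating server always stores it; the
    destination server stores the reverse edge only for bidirectional links."""
    servers[domain_of(source)].links.add((source, target))
    if bidirectional:
        servers[domain_of(target)].links.add((target, source))


servers = {d: Server(d) for d in ("example1.com", "example2.com")}

# One-way, cross-domain: only example1.com knows about it.
connect(servers, "alice@example1.com", "bob@example2.com")
# Two-way, cross-domain: each server stores its own end.
connect(servers, "carol@example1.com", "dave@example2.com", bidirectional=True)
# Two-way inside one domain: the same server stores both ends.
connect(servers, "alice@example1.com", "carol@example1.com", bidirectional=True)
```

Internal and external links go through exactly the same code path, which is the point: the server never needs to care whether the other end lives on its own domain.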

Handling OGP requirements

Rich identity

Each entity has a single unique key, held by its domain: username@domain, possibly with a short, more human-readable label (very short, to avoid spam -- see later). Around that the domain can wrap as much metadata as it likes. Secondary standards will emerge to define larger sets of metadata with suggested keys, which networks can adopt from each other in order to more richly represent entities on external networks. But the protocol itself says nothing.

The protocol should also make no assumptions about the nature of an entity. Some entities will be people, but others might be companies, or groups, or even events -- the difference lying merely in the metadata that might be attached to the entity rather than a fundamental protocol-level difference.
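As a rough sketch, an entity record could be as small as this. The `Entity` class, the `MAX_LABEL_LENGTH` value, and the `kind` metadata key are all assumptions made up for the example; the only things taken from the idea itself are the username@domain key, the short label, and the free-form metadata.

```python
from dataclasses import dataclass, field

MAX_LABEL_LENGTH = 32  # assumed limit; the idea only calls for "very short"


@dataclass
class Entity:
    key: str                                      # "username@domain", the only required field
    label: str = ""                               # short human-readable name
    metadata: dict = field(default_factory=dict)  # free-form, never interpreted here

    def __post_init__(self):
        user, sep, domain = self.key.partition("@")
        if not (user and sep and domain):
            raise ValueError(f"not a username@domain key: {self.key!r}")
        if len(self.label) > MAX_LABEL_LENGTH:
            raise ValueError("label too long")


# A person and a group are the same kind of record; only the metadata differs.
person = Entity("professional.identity@example1.com", "Pat",
                {"kind": "person", "title": "Engineer"})
group = Entity("group.identity@example.com", "Hack day", {"kind": "group"})
```

Nothing in the class knows what a "person" or a "group" is, which matches the requirement that the difference live in metadata rather than at the protocol level.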

Network activity updates

These can be handled via a pub/sub mechanism like PuSH. When an entity performs an action it distributes that action to any subscribers. They can further syndicate within their own network according to their domain-specific rules. ActivityStreams are the way forward here.
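The publish/subscribe flow can be sketched with an in-memory stand-in for a hub. The `Hub` class is hypothetical and does not use any real PuSH library; a real hub would deliver over HTTP, and the activity dictionary only loosely follows the actor/verb/object shape used by ActivityStreams.

```python
from collections import defaultdict


class Hub:
    """In-memory stand-in for a pub/sub hub (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # publisher key -> list of callbacks

    def subscribe(self, publisher: str, callback) -> None:
        self._subscribers[publisher].append(callback)

    def publish(self, publisher: str, activity: dict) -> int:
        """Push one activity to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(publisher, [])
        for callback in callbacks:
            callback(activity)
        return len(callbacks)


hub = Hub()
inbox_b, inbox_c = [], []
hub.subscribe("alice@example1.com", inbox_b.append)
hub.subscribe("alice@example1.com", inbox_c.append)

delivered = hub.publish(
    "alice@example1.com",
    {"actor": "alice@example1.com", "verb": "post", "object": "hello"},
)
```

What each subscriber does next (re-syndicating inside its own network, filtering, dropping) is left to its own domain-specific rules, so the hub stays dumb.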

Private messaging

At the protocol level, the creation of a connection allows messaging to flow backwards from the subscribed party to the originating server; a pair of connections therefore allows bidirectional messaging. The connection is created simply by exchanging keys: when A connects with B it offers a key, signed with the identity of B and a timestamp. If B accepts the connection, it can thereafter use that key as authentication to send messages. A can revoke the key according to its own logic at any time, and re-issue a new key with a new stamp. B is expected but not required to cease communication attempts after its key is rejected. This solves a fundamental problem of email, which is that possession of an address is sufficient for communication; instead, possession of an address is merely sufficient to request communication.
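One possible reading of the key exchange, sketched with an HMAC over the recipient's identity and a timestamp. The `ConnectionKeys` class and its method names are invented for this example, and HMAC-SHA256 is my assumption; the idea above only says the key is signed with B's identity and a timestamp.

```python
import hashlib
import hmac
import time


class ConnectionKeys:
    """Keys that one entity (A) has issued to the parties it connected with."""

    def __init__(self, secret: bytes):
        self._secret = secret  # known only to A's server
        self._issued = {}      # recipient identity -> currently valid key

    def issue(self, recipient: str, timestamp=None) -> str:
        """Sign the recipient's identity plus a timestamp.
        Re-issuing replaces whatever key the recipient held before."""
        stamp = int(time.time()) if timestamp is None else timestamp
        message = f"{recipient}|{stamp}".encode()
        key = hmac.new(self._secret, message, hashlib.sha256).hexdigest()
        self._issued[recipient] = key
        return key

    def revoke(self, recipient: str) -> None:
        self._issued.pop(recipient, None)

    def accept_message(self, sender: str, key: str) -> bool:
        """Knowing the address is not enough: the sender must present a live key."""
        current = self._issued.get(sender)
        return current is not None and hmac.compare_digest(current, key)


alice = ConnectionKeys(secret=b"alice-server-secret")
bobs_key = alice.issue("bob@example2.com", timestamp=1)
```

With this shape, `alice.accept_message("bob@example2.com", bobs_key)` succeeds until Alice calls `revoke`, and a key presented by anyone other than the identity it was issued to is rejected.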

Obviously the mechanism by which the connection is created is the weak link: there must be a very small, tightly restricted set of allowed metadata in the connection request. There could also be an optional "connection password": if the request contains the password (which might be transmitted independently via IM, word of mouth, or attached to a business card) then the metadata accepted as part of the request might be expanded.

Spam is much easier to handle in this model. Communication attempts from entities with no connection would be ignored -- no more AI-level intelligence required to determine whether a message was solicited or not. There would still be connection spam, but the protocol would allow only one or two connection requests -- subsequent attempts would be ignored by default, and blacklisting an entire domain would be simple, possibly even automatic: simple heuristics could blacklist a domain that generated hundreds of rejected or ignored requests. Some nets might even maintain a whitelist of trusted social networks, and only allow unlisted networks to send requests at all if they contained the "connection password".
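These rules are simple enough to sketch directly. The `RequestGate` class and both thresholds are assumptions for illustration: "one or two" requests becomes a cap of 2, and "hundreds" of ignored requests becomes a blacklist threshold of 100.

```python
from collections import Counter

MAX_REQUESTS_PER_SENDER = 2   # assumed cap for "one or two connection requests"
DOMAIN_BLACKLIST_AFTER = 100  # assumed threshold for "hundreds" of ignored requests


class RequestGate:
    """Decides whether a connection request should reach the user at all."""

    def __init__(self, whitelist=None, password=None):
        # whitelist=None means any domain may send requests.
        self.whitelist = None if whitelist is None else set(whitelist)
        self.password = password
        self.blacklist = set()
        self._per_sender = Counter()
        self._ignored_per_domain = Counter()

    def allow(self, sender: str, password=None) -> bool:
        domain = sender.rsplit("@", 1)[1]
        if domain in self.blacklist:
            return False
        trusted = (
            self.whitelist is None
            or domain in self.whitelist
            or (self.password is not None and password == self.password)
        )
        self._per_sender[sender] += 1
        if trusted and self._per_sender[sender] <= MAX_REQUESTS_PER_SENDER:
            return True
        # Ignored request: count it against the domain, blacklist repeat offenders.
        self._ignored_per_domain[domain] += 1
        if self._ignored_per_domain[domain] >= DOMAIN_BLACKLIST_AFTER:
            self.blacklist.add(domain)
        return False


gate = RequestGate(whitelist={"example1.com"}, password="from-my-business-card")
```

Note that no message content is inspected anywhere: the decision uses only who is asking, how often, and whether they know the password.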

In-network searching

A thornier problem, but an interesting solution presents itself: a search within your social network would become, by default, a distributed operation. A search request would be broadcast to all the domains to which you have connections, asynchronously, and they would be permitted a time window in which to respond. The search request would be in an open format related to the identity metadata: domains receiving a search request they do not understand or do not allow would be permitted to ignore the request, either silently or by sending a specific HTTP response to allow the searching server to efficiently skip that request in future.

Thus indexing becomes a simpler problem. Instead of a single global index owned by any one company, each domain is its own index. Simply by being a smaller network, the problem becomes simpler -- the global social database is, in effect, sharded across hundreds of domains; the searching is distributed ("mapped") to hundreds of domains, and the originating server needs only to perform an aggregation operation (a "reduction").

Depending on the rules of the domains, some searches might be forwarded, to allow for 2nd and 3rd-degree searches. This would allow for an even more powerful distributed search; a multi-stage map/reduce as each network rolls up its own results for the next. The network latency issues here are considerable; some degree of caching should probably be permitted on the originating server side. The degree to which searching is effective is entirely dependent on both the user and the domain. Professional networks might allow two degrees of search; dating networks** might allow four or five; strictly personal ones might ignore searches entirely.
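A first-degree version of this search can be sketched with `asyncio`: fan the query out, wait for a fixed window, then aggregate whatever came back. The three domain handlers are fake stand-ins for remote servers, and the names `search_network` and `window` are assumptions; forwarding to second and third degrees and caching are left out.

```python
import asyncio


async def search_network(query, domains, window=0.1):
    """Map the query out to every connected domain, then reduce what came back in time."""
    tasks = [asyncio.ensure_future(search(query)) for search in domains.values()]
    done, pending = await asyncio.wait(tasks, timeout=window)
    for task in pending:
        task.cancel()  # too slow: outside the response window
    results = []
    for task in done:
        if task.exception() is None:  # a domain may refuse or not understand the query
            results.extend(task.result())
    return sorted(results)


async def fast_domain(query):
    people = ["ann@example1.com", "bob@example1.com"]
    return [p for p in people if query in p]


async def refusing_domain(query):
    raise PermissionError("searches not allowed here")


async def slow_domain(query):
    await asyncio.sleep(5)  # would answer, but far too late
    return ["ann@example3.com"]


domains = {
    "example1.com": fast_domain,
    "example2.com": refusing_domain,
    "example3.com": slow_domain,
}
found = asyncio.run(search_network("ann", domains))
```

Only the fast domain contributes to `found`; the refusal and the timeout are both treated as "no results", so one unhelpful domain cannot stall the whole search.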

Ad-hoc group communication

Another pub/sub mechanism. A group would be just another entity: one user would create group.identity@example.com, and other users would provide that entity with a key to join the group, and revoke it when they left.

Events and Invitations

An extension to the metadata of either a private or group message, containing the identity of a new entity. An RSVP simply becomes a connection request; you subscribe to the event just like you subscribe to a group, and leave it by revoking the key.

Next steps

This is the seed of this idea, dreamed up while flying back from Chicago. Clearly there are holes, edge-cases, and more. But this is the right shape of the idea, and it's pretty exciting, I think. Do let me know what you think.

Some areas that need work:

  • Peer identities: if bob@yahoo.com and bob@hotmail.com are the same, just disconnected for historical reasons, a connection type should exist to indicate that they are the same.
  • Search: I'm painting in really broad strokes here. Presumably peer-to-peer file-sharing networks already have this problem solved to some degree.
  • Oh hell, all of it.

* I'm using WWW here not to be old-fashioned, but to distinguish between the web of pages (the true WWW) and the secondary web of entities, people and objects that some of those pages represent.

** Note that the protocol doesn't know if a network is social, professional, or romantic -- that's defined ad-hoc by the entities that make up the graph. By using professional.identity@example1.com and making connections to professional.identity@example2.com, you are creating a de-facto professional network. If you start making connections to your romantic identity at the same time, that's up to you -- or possibly to the rules defined by your domain.

6 comments

A letter from a mother

posted 11 August 2011

I've already posted this letter from the mother of a gay son to her local newspaper in Vermont to delicious, but it's worth putting in as many places as possible. It's really brilliantly written. The phrase in particular that I wish every anti-gay religionist in America was required to read before opening their mouths ever again:

If you want to tout your own morality, you'd best come up with something more substantive than your heterosexuality. You did nothing to earn it; it was given to you.

I want this on a t-shirt. And a billboard. And written in the sky.

6 comments

In defence of SQL

posted 11 August 2011

If this title does not interest you, here are some alternative, linkbait titles:

  • Why ORM is the Dumbest Idea Ever
  • Why NoSQL is a Terrible Idea
  • OMADS: the future of data storage
  • Why SQL Will Eventually Conquer The World

A little history

SQL was invented in the 1970s at the same time that "large-scale" (read: millions of rows) data stores came into existence. It triumphed over other query languages not because it was particularly great (though it was easier to read), but because it was standard. Everybody building a data store could write to the SQL standard without having to re-train all their clients and customers. It reduced friction all round. It was a huge success.

SQL is awkward

There's no escaping that SQL, as we use it day to day, is not pretty.

Keep in mind that what SQL is really designed to express is relational algebra, a type of logic essentially invented by the ridiculously clever E.F. Codd (along with nearly all the other theoretical underpinnings of relational databases). If you're not familiar with it, I find it helps to think about relational algebra as Venn diagrams: it's about sets intersecting with, unioning with, subtracting from, joining with each other. Find all the fruits in set A, with prices in set B, farmed by the farmer in set C. That kind of thing.
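To make the fruit example concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, and rows are all invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE fruits  (fruit TEXT PRIMARY KEY);
    CREATE TABLE prices  (fruit TEXT, price REAL);
    CREATE TABLE farmers (fruit TEXT, farmer TEXT);
    INSERT INTO fruits  VALUES ('mango'), ('guava'), ('lime');
    INSERT INTO prices  VALUES ('mango', 2.5), ('guava', 1.0);
    INSERT INTO farmers VALUES ('mango', 'Ravi'), ('lime', 'Ravi'), ('guava', 'Joan');
""")

# Join: the fruits in set A, with prices in set B, farmed by a farmer in set C.
rows = db.execute("""
    SELECT f.fruit, p.price
    FROM fruits f
    JOIN prices  p ON p.fruit = f.fruit
    JOIN farmers g ON g.fruit = f.fruit
    WHERE g.farmer = ?
""", ("Ravi",)).fetchall()

# Subtraction: fruits that have no price at all.
unpriced = db.execute(
    "SELECT fruit FROM fruits EXCEPT SELECT fruit FROM prices"
).fetchall()
```

Ravi farms both mango and lime, but only mango has a price, so the join keeps just that one row, while the subtraction returns lime. Both queries are pure set operations, which is the part of SQL that reads naturally.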

What it's not really for is collating, aggregating, and most especially filtering of data sets. The reason count(*) is so awkward is because that's not really what the language was designed to do. GROUP BY and ORDER BY clauses look tacked-on because they are (HAVING is an even more grievous hack, UNIQUE is a disaster, and let's not get started on LIMIT). Of course, in regular use of a data set, you nearly always want to do these things, which is why SQL provides them. SQL, loyal workhorse that it is, is nothing if not willing. But it might not be terribly quick.
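For contrast, here is a sketch of the tacked-on clauses at work, again with `sqlite3` and an invented `sales` table. Each clause does something that is not a set operation.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sales (farmer TEXT, fruit TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('Ravi', 'mango', 10), ('Ravi', 'lime', 5),
        ('Joan', 'guava', 7),  ('Joan', 'guava', 9), ('Joan', 'mango', 1),
        ('Sam',  'lime', 2);
""")

top = db.execute("""
    SELECT farmer, COUNT(*) AS n, SUM(amount) AS total
    FROM sales
    GROUP BY farmer          -- collate rows per farmer
    HAVING COUNT(*) >= 2     -- filter on the aggregate, not on the rows
    ORDER BY total DESC      -- impose an order on what is otherwise a set
    LIMIT 1                  -- and keep only the first row
""").fetchall()
```

Sam is dropped by HAVING (one sale), and Joan's total of 17 beats Ravi's 15, so `top` holds the single row for Joan. It works, but notice that four of the six lines of the query are the bolted-on parts.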

So you're right. SQL -- the kind you write every day -- is ugly and awkward. In fact, it looks like hell on legs. And it's often pretty slow. And that's all because you're asking it to do something it, the language, is not really designed to do (whether the engine is designed to do it is another question). But it works, and in the forty years since its invention we have come up with very little in the way of improvements and nothing close to powerful enough to be a replacement.

What about ORM?

I want to be very, very clear about this: ORM is a stupid idea.

The birth of ORM lies in the fact that SQL is ugly and intimidating (because relational algebra is pretty hard, and very different to most other types of programming). Our programs already have an object-oriented model, and we already know one programming language -- why learn a second language, and a second model? Let's just throw an abstraction layer on top of this baby and forget there's even an RDBMS down there.

This is obviously silly. You've stored your data in a way that doesn't match your primary use-case, accessible via a language that you are not willing to learn. Your solution is to keep the store and the language and just wrap them in abstraction? Maybe you'd do that if your data were in a legacy system and you needed to write a new front-end, but people slap ORM on new projects. Why the hell would you do that?

ORM is slower than just using SQL, because abstraction layers always are. But unlike other abstraction layers, which make up for their performance hit with faster development, ORM layers add almost nothing. In fact, often, if you need to do anything more complicated than a SELECT, you end up writing fragments of SQL or pseudo-SQL languages in order to tell the underlying RDBMS what you're really trying to do.

OMADS: data stores that match the application

ORM is dumb, and people noticed. So clever programmers looked at this ridiculous edifice and realized the real problem: the data store and the use-case were mismatched. So they threw away ORM, SQL, and RDBMS, and wrote lovely new key-value stores, or object stores, or document stores, or searchable indexes, or any of a half-dozen other data structures that more closely matched what they were trying to do. And because these data stores all turned up at a time when nearly all data stores were SQL-interfaced RDBMS, they got the name "NoSQL", even though the actual problem was the Relational model, not SQL itself. And because "Obviously More Appropriate Data Stores", or OMADS, is not catchy enough I guess.*

So I love NoSQL stores. My startup would literally be unable to function without memcache. I think Cassandra is nifty even if Twitter found it not worth the trouble of switching from MySQL. I think Redis is cool if a little buggy. MongoDB is awesome, and I'm probably going to be building a production system based on it quite soon. HDFS I use in production every day, and it still blows my tiny little mind. Really, the only thing I dislike about them is the label "NoSQL", which as many people have already pointed out doesn't really say anything about what they are, just what they are not. And also because it makes people unfamiliar with the details of the situation think there's something Wrong, Bad or Old Fashioned about SQL. And programmers hate using anything that is any of those things.

What is the relational data model good for anyway?

So if your data store should always match your application, what application is it that RDBMS are perfect for? The answer is: all, and none.

We take this for granted these days, but the relational model is pretty magical. Set up a model of your entities, pour data into it, and get answers. How many teachers at the university earn over $100k but teach fewer than 20 students? How many customers who bought our newest product had never bought anything before? What were sales like on Tuesdays over the last 30 months? You don't have to know in advance what your questions will be; you don't have to write any special code to examine all the rows, or work out the most efficient strategy for combining the results: you just need to know how the data relate to each other, and then you can ask ad-hoc questions and the database knows the answer. I remember the first time I really grokked that concept, and it filled me with nerdy joy.
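Here is a sketch of the teacher question, again with Python's sqlite3 module. The schema, names and numbers are all hypothetical; the point is only that the question was not planned for when the tables were designed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    CREATE TABLE enrollments (teacher_id INTEGER REFERENCES teachers(id), student TEXT);
""")
conn.executemany("INSERT INTO teachers VALUES (?, ?, ?)", [
    (1, "Ada", 120000), (2, "Brook", 95000), (3, "Cy", 150000), (4, "Dee", 110000),
])
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?)",
    [(1, f"s{i}") for i in range(5)]      # Ada teaches 5 students
    + [(2, f"s{i}") for i in range(3)]    # Brook teaches 3
    + [(3, f"s{i}") for i in range(25)],  # Cy teaches 25; Dee teaches none
)

# Ad-hoc question: how many teachers earn over $100k but teach fewer than 20 students?
query = """
SELECT COUNT(*) FROM (
    SELECT t.id
    FROM teachers t
    LEFT JOIN enrollments e ON e.teacher_id = t.id
    WHERE t.salary > 100000
    GROUP BY t.id
    HAVING COUNT(e.student) < 20
)
"""
(well_paid_small_class,) = conn.execute(query).fetchone()
print(well_paid_small_class)
```

The LEFT JOIN is there so that a teacher with no students at all still counts as teaching fewer than 20.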

If you pick the wrong data structure for your store when you're first writing your application, you can end up -- as happened to a team at my last job -- running crazy, days-long depth-first searches across distributed document stores in order to perform elementary operations like getting a total count of objects. So if you don't know all the questions you might need to ask about your data, the safest thing to do is put them in an RDBMS. And when you first start a project, you almost never know all the questions you're going to need to ask. So my advice: always use an RDBMS. Just don't only use an RDBMS.

Optimize, but be prepared for ad-hoc queries

Is your data really just a giant hash lookup? Then a key-value store is what you want. Do you primarily access your related data via a single key? Then a document store is for you. Do you need full-text searching? Then, dear god, use a text-indexing engine, not an RDBMS. Do you need to answer questions about your data that you can't predict in advance? Then make sure your data also ends up in an RDBMS. Maybe not in real-time, maybe summarized rather than in raw form, but somehow. Then when your co-founder asks "how many Xs happened in Y?" your answer won't be "uh, let me spend half a day writing code to find that out". Just throw down some SQL, and it'll give you an answer -- it'll take 5 minutes to return a single number, but that's a lot faster than half a day.
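As a minimal sketch of "make sure your data also ends up in an RDBMS", assume a plain dict stands in for the key-value store and a batch job copies it into SQLite. The key names, record fields and the mirror_to_rdbms function are all invented for illustration.

```python
import sqlite3

# Stand-in for a key-value store: event id -> record (hypothetical data).
kv_store = {
    "evt:1": {"kind": "signup", "day": "2010-07-01"},
    "evt:2": {"kind": "signup", "day": "2010-07-02"},
    "evt:3": {"kind": "purchase", "day": "2010-07-02"},
    "evt:4": {"kind": "signup", "day": "2010-08-01"},
}


def mirror_to_rdbms(store):
    """Copy every record into an in-memory SQLite table (a batch job, not real-time)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, kind TEXT, day TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(key, rec["kind"], rec["day"]) for key, rec in store.items()],
    )
    return conn


conn = mirror_to_rdbms(kv_store)

# "How many signups happened in July?" becomes one line of SQL.
(july_signups,) = conn.execute(
    "SELECT COUNT(*) FROM events WHERE kind = ? AND day LIKE ?",
    ("signup", "2010-07-%"),
).fetchone()
print(july_signups)
```

The application keeps reading and writing the key-value store; the relational copy exists only to answer the questions nobody thought of in advance.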

Because that's what SQL is for.

Post-SQL

If you scroll back to the top you'll see the description of the circumstances that gave birth to SQL: a whole bunch of new data stores came into existence at once, and the lack of a common language created friction and fragmentation. The same thing is happening again with the NoSQL crowd. If you decide to write your app using Cassandra, you better be sure it's what you want, because if you change stores you have to change all your code. It's the ultimate lock-in, and it's not the plan of an evil monopolist corporation, it's just an unfortunate side-effect.

Pretty soon, the same sort of clever people who noticed that ORM was a ridiculous hack will notice an opening for an actually useful abstraction layer: a single common API that can access all the NoSQL stores. Maybe it will be Thrift or Avro, but I'm not sure. I'd say the chance is about 50-50 that it will be SQL again.

SQL triumphant

And why not? Awkward it may be, but SQL is a lot more succinct and readable than multiple lines of API calls or crazy, math-like relational algebra languages. And there's nothing intrinsically slow about the language itself. If you could run "SELECT * FROM table WHERE ..." on Cassandra, it would be no slower than specifying the same conditions via API calls. In fact, when trying to explain how to use its API, the MongoDB documentation lists the equivalent SQL queries. That's a pretty clear vote for the usability of SQL.

Computer programmers really like new, cool things. So when something like SQL hangs around for nearly 40 years, it either means nobody really cares about it -- I think we're clear that's not the case -- or that there's really nothing else that can do the job quite as well.

So go forth, use your OMADS, keep an RDBMS in your back pocket, and stop being so mean to poor old SQL.


* On the off-chance that anybody starts calling these things OMADS, remember: you heard it here first.

Updated 2010-07-13 to fix link to E.F. Codd; thank you Sordina!

12 comments

Arrington is completely wrong about women in technology

posted 11 August 2011

Michael Arrington's post on TechCrunch today about who to blame for the lack of women in tech was even more offensively wrong than I was expecting from the title, and that's really saying something. It goes off the rails right in the first paragraph:

Success in Silicon Valley, most would agree, is more merit driven than almost any other place in the world. It doesn’t matter how old you are, what sex you are, what politics you support or what color you are. If your idea rocks and you can execute, you can change the world and/or get really, stinking rich.

Wrong, wrong, wrong and wrong. It matters enormously how old you are -- either too young to be taken seriously as an entrepreneur, or too old to be taken seriously talking about new tech. Your color is ridiculously important, because the people with money, who are almost exclusively men and mostly white, are more comfortable talking to other white men, and your nationality even more so, because of visa restrictions. Even your politics are important, because Silicon Valley is hugely liberal, and those who aren't Democrats are libertarians.

And above all your gender matters. Because the ugly truth is that the men of Silicon Valley do not take women in tech seriously by default. I see it every day. If a woman walks into the office, people ask if she's in HR or marketing or legal or product, or frankly anything other than engineering. And distressingly, most of the time they're right, because there aren't many women in tech. And as everyone knows and keeps saying, that's a vicious circle: the expectation that women don't get into tech is what keeps them out of it.

Here's how it happens: if a woman engineer starts talking, men will wait until she says something notably clever before they start taking her seriously. Men on the other hand are taken seriously by default, and only get dismissed if they say something notably dumb. That, multiplied by thousands of conversations every day, is all it takes to enforce huge cultural bias against women in Silicon Valley and tech at large. I know this is true because, even though I try very hard not to, I've done this myself.

So if you're a man in tech, and you want to fix this problem, it's simple. Start with yourself, and your expectations. The next time a woman walks into your office, make no assumptions about her job title. Don't ask if she's somebody's girlfriend. The next time a woman -- at the workplace, at a party, wherever -- makes a point about technology, make sure you're not making any assumptions about her level of expertise that you wouldn't make if she were male. That's the change I'm trying to make in myself, and it's surprisingly hard to do, because snap judgements are so easy to make, especially when they are habitual. I even had to edit this post a little when I realized I'd written it with the assumption that my audience would be male. It's insidious.

Arrington's post concludes with some weasel language in which he does not explicitly state, but instead paraphrases somebody else saying, that women are fundamentally, culturally unsuited to starting companies because they are "nurturing" and "not risk-taking" enough. He even trots out that bullshit about Mars and Venus. And I'm sure there are a lot of people out there who, secretly, think he might have a point.

But he's wrong. The reason there are so few women in tech is because of the men. As a man, I'm trying to do my part to undo that, and if you're a man I suggest you do the same.

4 comments

San Francisco city guide map, for the prospective resident

posted 11 August 2011, updated 24 March 2013

Update 2013-03-24: People continue to find this map very useful, so I've made a minor update to include the up-and-coming La Lengua neighborhood.


A friend of mine is moving to San Francisco, and asked me for advice on where the nice places to live are. This is a sufficiently common question that I decided to do a proper answer, in the form of a custom Google map. I mentioned it on Twitter and it got quite a lot of responses, so here for posterity is my guide to the neighbourhoods of San Francisco:


Where possible I've made comparisons to equivalent areas in London, as that's where my friend is moving from. Comments and suggestions are welcome; Twitter is probably the easiest way.

Update: this map is obviously extremely subjective. Many people love the Richmond, and huge numbers of people think the Marina is great. I make absolutely no claim to objectivity, so don't yell at me.

5 comments

The obligatory I-am-getting-older post

posted 11 August 2011

Few people, when they are fifteen, have any idea what they want to do with their lives. A lot of people, when they're eighteen years old, don't know what it is they want to study at college -- it's a momentous decision that shapes the rest of your life; how could you possibly make it without knowing what it's really about? After college, the same sort of fear paralyzes people: what do I do now? What sort of job do I want? Where do I want to live? At all these stages, overwhelmed, some make bad choices, while others luck out.

I'm not one of those people. Through a fantastically unlikely combination of timing and aptitude, I was born at pretty much exactly the right time to be present for the popularization of the Internet and the birth of the web, the medium by which I am endlessly fascinated and to which I am perfectly suited.

At age fifteen, before I even had Internet access at home, I was building my first web page. After a few days of that I was pretty clear that this was what I wanted to spend the rest of my life doing: combining technical know-how with creative self-expression to create what are (to greatly varying degrees) works of art that function and interact with the user. As a talkative geek, I believe there's no profession better than web development for me.

So there was none of this existential angst for me. Choice of degree at university? Obvious. Choice of first job out of college? Easy. Also second, third, and fourth. Opportunity to move to San Francisco? A no-brainer. Startups? Clearly the way to go. In stark contrast to the utterly typical uncertainty of my private life, my professional path has always been clear and my choices, if not easy, then simple. This is both very unusual and incredibly lucky, facts I'm very aware of and for which I am always grateful.

Today (well, yesterday) I turned twenty-nine, a fact that's pretty startling to me, as I'm pretty sure I only just left Trinidad yesterday. But I'm doing exactly what I want to do, in exactly the right place to be doing it, almost exactly as I pictured a decade and a half ago. And on top of that giant mound of luck I have the company of wonderful friends and the support, even from far away, of a loving family.

Everyone should be so lucky.

6 comments

PHP needs to die. What will replace it?

posted 11 August 2011

It's time for PHP to die. And I say this as a die-hard PHP developer currently converting an existing Ruby on Rails codebase to PHP.

History repeating

The reason I know PHP has to die is because I've seen this before. Roughly a decade ago, PHP killed Perl. Not completely, of course; Perl still clings on in some environments, it has a sizable legion of die-hard fans, and legacy apps will need to be maintained in it for decades to come. But as a language for newcomers, and especially for web developers, it was already dying in 1999 and was mostly dead by sometime around 2005.

As a newcomer to web development around then, I could see both that this would happen and why: Perl was ill-suited for the new application environment. Pages of tedious boilerplate CGI were required in Perl to achieve PHP's basic, default behaviour. The language was full of anachronistic features -- pointers (update: sorry, references), inconvenient hash structures, and a dozen other little language quirks -- that made web development tedious, insecure, or inconvenient. There was no reason you couldn't write a perfect web app in Perl, but in PHP you'd do it faster and easier, despite the flaws in PHP itself which were, even then, already obvious.

The arguments for Perl over PHP in 1999 were many: it was a lot faster, it had far more libraries and driver support, and CPAN was a wonderland of pre-written code that would get you 80% of the way in almost any task. It sounds funny to say it now, but "PHP doesn't scale" was actually an argument back then. But PHP won anyway, because those things are not intrinsic advantages to the language. Interpreters can get faster, libraries always get written, and PEAR and PECL are gigantic these days*, without considering the myriad less-formal libraries available from every vendor who wants people to use their APIs. PHP is the de facto standard language of web development.

Time to move on

Ten years later, I can feel the tide turning again. Developers' expectations of languages have moved on. If the critical thing Perl was lacking was PHP's wonderfully flexible "associative arrays" (aka smart hashes), then what PHP is lacking is lambdas and method chaining. While PHP used to be the language where you could write a web page in twenty lines of code, nowadays it doesn't feel like you're doing it properly unless you've laid down at least a basic MVC framework of some kind. That boilerplate code is the tell: the language now requires modification by a framework to do what you need.
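For anyone unsure what "lambdas and method chaining" buys you, here is a minimal sketch. It is written in Python rather than PHP or Ruby, and the Query class and the sample posts are invented purely for illustration.

```python
class Query:
    """A tiny chainable wrapper around a list (hypothetical, for illustration)."""

    def __init__(self, items):
        self.items = list(items)

    def where(self, predicate):
        # Each method returns a new Query, which is what makes chaining possible.
        return Query(item for item in self.items if predicate(item))

    def sort_by(self, key):
        return Query(sorted(self.items, key=key))

    def map(self, transform):
        return Query(transform(item) for item in self.items)

    def to_list(self):
        return self.items


posts = [
    {"title": "first", "comments": 2},
    {"title": "second", "comments": 40},
    {"title": "third", "comments": 9},
]

# Lambdas supply the behaviour inline; chaining keeps the pipeline on one expression.
popular_titles = (
    Query(posts)
    .where(lambda p: p["comments"] > 5)
    .sort_by(lambda p: -p["comments"])
    .map(lambda p: p["title"].upper())
    .to_list()
)
print(popular_titles)
```

Without these two features the same pipeline needs a named helper function for every step and a temporary variable between each one.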

Back then, I felt the die-hards clinging to Perl for web development were silly. Now, with ten years of PHP experience under my belt, I'm in the same position. I can knock out a good website in an hour in PHP, and an excellent one in a day or two. Its performance characteristics are well-known and understood, so I can make it scale pretty much indefinitely. Every developer we'd want to hire knows it, and every system we'd integrate with has a wrapper library written in it. I am trapped by the convenience of PHP in a language that is losing its suitability for the task.

Forward Ruby on Rails

The most obvious potential successor to PHP is Ruby on Rails**. Ruby is a newer, cleaner language, with modern features and a sparse, elegant syntax (much like Python). Rails takes the common tasks and boilerplate of a best-of-breed web app away and turns what are three or four-line idioms in PHP into first-class language constructs. This sounds exactly like what I need to replace PHP and accelerate my development once again.

But seven months into using Rails every day, hacking on a Rails app constructed by experienced Rails experts who love the framework and the language, I just can't say it's the right choice, for reasons that I find hard to pin down. The purpose of this essay is to try and discover them.

My chief gripe, it must be said, is performance. I've just finished saying this should not be considered a fatal flaw in a language, but rather a temporary problem in its implementation. So I cannot really take it as an argument, though I should note that performance is the primary reason I am porting my current app back to PHP. I can make Rails run just as fast as PHP, but I need between 2 and 4 times as much hardware to do so. In five years' time this is unlikely to be true, and in five years' time maybe I wouldn't be switching to PHP. But right now, it's not cutting it.

Second, I hate Active Record. Active Record is a pattern, not intrinsic to Ruby, and optional in recent versions of Rails, but its use and its patterns are deep in the DNA of Rails. I have previously gone into why I think ORM on RDBMS is a bad idea, so I won't repeat myself except to summarize: the efficiency gained by not having to write CRUD is more than outweighed by the efficiency lost to ActiveRecord doing silly things, to the time spent working out what those things are, and to bending the rules of the framework to stop it doing them.

Third, I am deeply suspicious of code generation. Code that writes your boilerplate for you is helpful and all, but if your language requires a pile of boilerplate to get anything done, then something is already wrong. Code generation encourages "magical thinking", where the coder is not sure whether a particular convenient feature comes from code that was written for them or intrinsically as part of the language environment. Magical thinking is dangerous.

Code generation brings me to what is probably the fundamental problem with Ruby on Rails, which is that it is not a language. Ruby is the language. And Ruby, while solving some of PHP's fundamental problems, does not solve the core problem, which is that modern web applications have an elevated set of expectations: features like routing, model/view separation and drop-in functionality are now par for the course. Rails adds these, but it is the same bandage that MVC frameworks like Zend, Symfony and CodeIgniter add to PHP.

So what's missing?

The language that takes the reins from PHP has to be as much better than PHP as PHP was better than Perl. It has to make assumptions about the primary shape of web applications in the same way that PHP assumed that your code's primary function was always going to be spitting out a web page -- a radical assumption that makes it slightly awkward to do other things, like shell scripts. I want a language that assumes everything I will be building is an MVC web app, and builds that right into the core language, not just a library.

The problem is, there's no such language. For a while it looked like maybe server-side JavaScript would be the next big thing, unifying language on the front-end and the back-end of web applications. But the great minds of JavaScript have wandered off on the tangent that is nodejs: the evented pattern is a radical and powerful way of making high-performance applications that make best use of modern hardware, but it is a way of making server-side applications, not web pages. And there are still an awful lot of web pages that need to be written. Other CommonJS efforts like ejScript begin to attempt replacing PHP but do not resolve the framework problem.

Still waiting

I'm forced to conclude that PHP's replacement is just not here yet. Ruby on Rails is good, but not that much better than a similar MVC framework on top of PHP, and certainly not better enough to justify the performance hit brought on by the double-whammy of inefficiency of Ruby itself and ActiveRecord's ORM shenanigans. Python appears uninterested in being the next web language, and JavaScript's server-side revolution is only just getting started.

I await the Next Big Thing. I want to switch away from PHP, I really do. I don't want to be the Perl dinosaur. But whatever it is, it doesn't seem to be here yet. Am I wrong?

* Another advantage that I strongly feel contributed to PHP's success was its amazing documentation, which was complete, accurate, and full of real-world example code. Many early PHP apps -- mine included -- were written by just surfing the documentation and pasting together the relevant snippets. However, this is not an intrinsic advantage of PHP so I'm going to leave it out of consideration. A final advantage was that PHP runs well and installs easily on Windows; 95% of all computers on earth are still running Windows, so your language needs to run there if you expect widespread adoption by teenaged newbie developers who don't get to pick their own hardware.

** Why not Python? I just feel if Python were going to break out as a web language it would have done so by now. That's not to say nobody is writing web apps in Python -- many people I highly respect do so -- but it's hardly a revolution in progress. Python is a wonderful language: clean, elegant, easy to learn and maintain. But while it is excellent at the Internet and networking in general it has no particular love for the web. Django is capable but not really what developers expect from MVC, and other MVC frameworks for Python have much less traction. Python is not going to take over the web, I think primarily because not even its fans want it to.

73 comments

Three gay teens kill themselves every day

posted 11 August 2011

The twittersphere and blogs have been alight this week with a string of high-profile suicides by gay (or perceived to be gay) teenagers, starting with Billy Lucas, who was 15. Then there was Seth Walsh, a 13-year-old who hanged himself and was taken off life support after ten days. Then Asher Brown, another 13-year-old, shot himself in the head. Then 18-year-old Tyler Clementi jumped off the George Washington Bridge after fellow students broadcast video on the internet of him having sex. And on Wednesday Raymond Chase, a 19-year-old, hanged himself.

Their deaths are unbearably sad, and deserve all the attention they've been getting, and all the hand-wringing about teen suicide, and gay teen suicide in particular. I encourage you to check out Dan Savage's It Gets Better project, and watch Ellen's plea to end bullying. But we need to be clear: this is not a sudden surge in gay teen suicides. This isn't even a complete list of the gay teen suicides that happened in September. For that, we'd need nearly a hundred names.

Between 4,000 and 5,000 teenagers kill themselves every year in America[1],[2]. That's between 11 and 13 teens killing themselves every single day. Gay teenagers are four times more likely to commit suicide than straight teenagers, which means about 3 gay teenagers are killing themselves every day, about 95 every month[3].

So mourn these kids, all the potential they had that we've lost. But remember the scale of the problem we're dealing with: for every one of these, there are eighteen more we didn't hear about, just in September.


[3] About 26% of teen suicides are gay teens. This assumes gay teenagers are about 8% of the population, which is a very hard statistic to nail down.

9 comments

It gets better

posted 11 August 2011

[I tried to make an It gets better video, but it didn't work. If I spoke it sincerely I kept bursting into tears, and speaking it insincerely sounded robotic and terrible. I am much better with the written word, so inspired by Tom, here's my contribution. This is a message to gay kids. You can read it if you're not a gay kid, but you're not the intended audience.]

Hi. So I know this is several paragraphs long and you were born into the age of YouTube so you may not get to the end of this. In which case, here's the summary: it gets better. It's really bad now, it may even get worse, it will become unbearable, but somehow you'll bear it anyway. And then it will get better.

When I was fifteen and sixteen, I thought about suicide quite a lot. Not vague unfocused intentions, but specific plans of where, when, how high up I would start and how hard I would hit the ground.

I was going to do it because I had realized I was gay, and I couldn't face it. My parents were pretty conservative, especially my father, and I lived in Trinidad, a small island in the Caribbean with a whole lot of religions, most of which were pretty clear that being gay was a bad thing.

My school didn't help. It was an all-male, Catholic school run by priests. It was an all-day machismo competition and intensely homophobic. I was already unpopular for being a geek; I was already getting beat up every day, even before I realized I was gay.

So I couldn't come out, I was sure of that. I was sure my parents would disown me, my few friends would reject me, my school would expel me. And I couldn't leave, literally. On an island, there is nowhere to run to.

I already knew that after school I would be leaving the island to go to college. But that was three years away -- three years! An unbearably long time to endure the hell of knowing I was a sick pervert, of hearing friends belittle each other constantly for the slightest hint of less-than-total masculinity. Of hating myself for being unable to change myself to be "normal". I wanted desperately to just be normal.

Most of all, I wanted to kill myself because I couldn't see how things would get better. As far as I could see I had fucked up my life. My plan had been: school, college, job, wife, kids, retire. Now that whole plan was derailed. I felt like I had lost everything I was looking forward to. I didn't know anything about gay people except that a lot of them seemed to get AIDS, and that a lot of people hated them.

And that's why I'm writing this now. Because I was wrong, totally wrong, about all of that stuff. And because what I really needed then was someone who knows the stuff I know now to turn up and tell me that things would be okay, that it would get better. To tell me it was worth hanging on.

I found those people when I got Internet access, via the Youth Lists. The love and support of my friends on that list saved my life. But lots of gay kids don't find those people, and they do terrible, drastic things that break my heart every time. So I am adding my voice to the chorus, hoping you can hear me: it gets better. And here's how.

The first step is to stop judging yourself by what you thought you had. Don't think about the things that your being gay has denied you, don't think about what you've lost. Think about what you have. Your youth, your health, your mind, your body, your potential. So much potential, to do things that are brave and beautiful and smart and funny.

The reason we older gays get so upset every time one of you guys kills yourselves is because we see ourselves in you. We see the same shitty situation, and we get angry that nothing seems to have changed in those schools that made our lives so terrible. But overwhelming our anger is our grief, because we see what might have been. We cry because of all the things you never got a chance to do just because we didn't find you in time, we didn't try hard enough, we didn't say it loudly enough: it gets better.

The next step is to come out. Even if it's just to yourself. You don't have to decide now and lock in your decision forever. You're allowed to change your mind later. But be honest with yourself, about what it is -- who it is -- that you want right now, and who you want to be. There's no right and wrong in recognizing what you want. There's no weird and normal. There's just you, and what makes you happy. There's nothing more normal than just wanting to be happy.

Maybe you can come out to some friends. My friends surprised me, and they were from a crazy homophobic country, and that was fifteen years ago. Your friends grew up watching Will and Grace, and Ellen, and that awesome Justin kid on Ugly Betty, and Kurt on Glee. Even if they might not think about it, they know that it's okay to be gay. They know that only crazy old folk really think it's wrong to be gay, even if they sometimes say otherwise.

And, amazingly, they'll realize that the person you were before you say the words "I'm gay" is the same person you are afterwards. They won't abandon you. I remember when it was so hard to believe that. So tell a friend. And then another. And that's when it will start to get better.

And after that, it keeps getting better. You can go to college, or just get a job, and leave home. That makes telling your parents easier, believe me. Once they realize that you're not around, and them acting like jerks just because you like dick means they might never see you again, they come around. And if they don't, then it's their loss, and their fault, not yours. You're not doing anything to hurt them. You're just telling the truth, like they taught you.

And you can get the hell out of that one-horse town you're in. So there are no gay people where you are? Head for a city. There are lots of us here, enough to make sure nobody tries to fuck with us just because we're different. That makes it get a lot better. You can do what you want, dress how you want, hold hands in the street. You'll discover that all the bad things about being gay are put there by bigots. Just get the bigots out of your life, and suddenly things improve.

Those are the little things. And then there's the big thing that makes it get better: falling in love. Oh, there's sex too, and sex is pretty great. But love -- love is amazing. Love is the real deal. Love is what makes all the shit you're putting up with now worth it.

And it'll happen to you. I know it's hard, so hard to believe that right now, as you're stuck in your bedroom watching this video with the sound down so nobody will hear. To believe that somewhere out there is a boy who will actually like you, your mind, your body, just the way it is. But there is.

It's worth it. Hang in there. Please hang in there. I know how hard it is, and I remember knowing that nobody could possibly understand how hard it was, so all I can do is tell you that I do understand, I do remember, even though I know you won't believe me.

It's hard, unbelievably hard, and totally unfair that you have to put up with all of this shit just to be the way you were born. But it gets better. Please believe me. All of it, all that crap I put up with was worth it, every second of the pain and misery and guilt, just to fall in love, just to feel that way for a moment. That feeling is worth it. So please be here to experience it.

And one day I'll meet the right guy, and get married, and have kids and retire, just like I planned. All that stuff I thought I lost? I was wrong. I can still have it, and so can you -- or whatever else it is you're looking for. Maybe not tomorrow. Maybe it'll take years. But it gets better. Nowadays, my life is so great I can barely understand what it was that had me so worried when I was a teenager. I was just so wrong.

It gets better. It gets so, so much better.


If you're in the US, the Trevor Project is a help line specifically for gay youth. In the UK, the London Gay & Lesbian Switchboard is a great place to start. And if you're in Trinidad, then organized resources are a bit thin on the ground, but you can email me and I can put you in touch with the right people.

6 comments

Why I really, really hate Instagram

posted 11 August 2011, updated 09 January 2012

Update 2012-01-09: OMG you guys, stop linking to this already! The new versions of Instagram save the un-filtered versions by default, and the new filters are in any case a lot more subtle than the first version. Instagram no longer destroys data, so I no longer hate it. Please stop sending me flame emails.

I love data, so I really hate Instagram.

I suppose it would be more accurate to say I really hate the users of Instagram, for what they do to their photos; Instagram is merely the enabler. The behaviour I take issue with isn't even the default behaviour of the app. But I'm uncomfortable applying such a strong word to such a large group of people who are mostly just trying to be cute and aren't considering the larger consequences of their collective action. So instead I hate Instagram, for enabling the senseless destruction of data contained in these photos.

Consider the digital camera on a mobile phone. Even on high-end mobile phones, it is already a pathetically inaccurate instrument. Even a moderate-quality film camera has a resolution of between 12 and 20 megapixels. The iPhone 4 has 5 megapixels on the rear camera, and high-end Android phones like the EVO 4G go as high as 8.

As a society, we have already made a collective sacrifice in photo quality in the name of quantity and convenience. There was a dark period in the late 90s when average photo quality plummeted from the 20 megapixels of film to pathetic VGA, 640x480 (0.3MP) photos. But it climbed, and now your average point-and-shoot has 12MP, roughly film quality. But in the meantime we sacrificed quality again, this time in the name of portability, and we rely mostly on the cameras in our phones.

With these rubbish phone cameras we take terrible photos of some of our most important moments and cherished memories. I am not complaining about composition and lighting here; I'm not a photographer. I am talking about the quantity of meaningful visual data contained in these files. Future historians will decry forever the appalling lack of visual fidelity in the historical record of the last decade.

Enter Instagram

The mobile app Instagram is -- though this is intentionally non-obvious to users -- at heart a social networking play. It's drop-dead simple, it connects you to your friends, and it provides a way of sharing content with your friends that keeps you coming back to the app. It is brilliant. It's slickly built, cleverly marketed, and a masterpiece of usability and clean, get-out-of-the-way design.

The ultimate triumph of its subtly minimalist design is that users get involved without realizing this is what it is. Instead, what most users think of it as is "that way to take cute photos that look like a polaroid from the 1970s". That's what the icon looks like, that's what the name says, and that's the feature it has over all other camera apps: it has a set of filters that can make your otherwise standard cellphone photo look like it was shot with one of a variety of vintage cameras. Again, brilliant.

But also terrible. Think about what these filters are doing: they're taking the already horribly limited amount of visual data contained in a cellphone snapshot and destroying it. If you take a photo with a filter, your original photo -- the one with all the data you originally captured -- is lost. Instead, what is sent to your friends and saved to your photo library is a copy of the photo where a layer of junk has been applied. Colours are washed out, contrast destroyed, borders are cropped, blurs and scratches applied. The meaningful, unique information in those pixels is gone forever, replaced with cloned copies of the bits in the filter file. You are fucking up your photo.

I don't care how terrible your cellphone camera is. I don't care if the shot was already blurry, or badly lit. However bad it was, you have just made it worse. And the worst part is why people do this: because they want that "vintage" look.

Why the fuck do you need your digital photos to look vintage? You are not decorating the set of some 1970s version of Mad Men. You are not fooling anybody into thinking you are taking polaroids and scanning them before uploading them to twitter. People take these fucking faux-vintage shots of current events, things that happened five minutes ago. They add filters to screenshots of their mobile operating system, as if they have some cherished memory from 10 years before they were born of using a mobile operating system that wasn't invented until 40 years after that.

The reason they do this is because of faux-nostalgia. People, and in particular brand marketers, have associated good times with the past, and "vintage" things -- wine, clothes, cars -- with higher quality. So if your photo looks vintage, thanks to the collective marketing efforts of five or six different industries all trying to sell you old shit at 10x its production cost, you feel it looks like a better photo. There are also artificially altered expectations: it may be a terrible cellphone picture but it's great for a polaroid.

I have had enough. Stop fucking up your photos. You know what's going to look actually vintage? The original photo. Go back in your hard drive and look at photos you took with your cellphone 5 years ago. They already look ancient, with their 800x600 resolution. In 5 years your 5-megapixel iPhone 4 shots are going to look just as hokey. You do not need to fuck with these photos to make them look old. You do not need to dip them in artificial 70s-dust to add nostalgic charm to them.

These are already the best photos you'll ever take. They are taken in the moment, of spontaneous laughter and stupid adventures, of your best friends at high school and at college and afterwards. They record good times and solemn occasions, nights out and birthday parties and great meals and first dates. It doesn't matter if they're blurry or dark or out of focus. In forty years' time you're going to look back at these photos and love them no matter what. And you'll wonder why the fuck you thought wrapping a white border and splashing a pink blob over them was a good idea.

It breaks my data-loving heart. So much is going to happen to these photos. Hard drive crashes, virus infections, lost laptops, accidental deletions, misplaced files. It's going to be really hard for them to survive the next three decades. Why destroy them before they even start on their journey?

Stop deliberately destroying your own memories. Stop using the filters on Instagram.


If you like citation-free rants like this one you should follow me on Twitter here.


Update: Multiple people have pointed out that you can set Instagram to save a copy of the original photo. As I point out right at the beginning, this is not a problem with Instagram by itself, but with the users of Instagram, and Instagram as the enabler. (However, saving the original photo untouched is not the default setting.)

240 comments

A few words about Wikileaks

posted 11 August 2011

By now you have probably heard about Wikileaks (currently unavailable via its main domain, apparently due to political pressure, but still available at wikileaks.de, wikileaks.fi, and wikileaks.nl).

Wikileaks is a tough case to take a position on. On the one hand, Julian Assange (it is hard to separate the website from the man, though he obviously has a lot of people assisting him) is clearly a bit of a tinfoil-hat guy, and also a shameless self-promoter (though I do not for one second believe he is a rapist, and since the women involved have both withdrawn their allegations, nor apparently do they*). His claims as to the volume of documents they possess, as well as who does and does not support him, are murky. He is an attention-seeking ideologue.

But on the other hand, Wikileaks has done some things I find it hard to condemn. The Afghanistan war diaries were a worthwhile effort. While not nearly as damning as the hype would have you believe, they genuinely shed light on the hopelessness and frustrations of the effort there that had not been reported, despite continuous media coverage of the war from the start. It was not a clean win: they contained some recent operational information that was probably a genuine security risk. But the more recent information was responsibly held back until the operational risks lessened. It was not a reckless move.

The US diplomatic cables, which the most recent fuss is about, are similarly a pretty admirable move. Diplomacy to some degree requires confidentiality, and it's a genuine concern if this move leads to less inter-diplomatic communication, or an abandonment of electronic communication. However, an overriding priority is the value to the public of this information. The official position of other Arab states on Iran, mafia connections in Russia, secret nuclear weapons in Europe, the state of the drug war in Mexico, China distancing itself from North Korea, and a hundred other instances of countries giving their true assessments of each other: these are real, important, urgent issues where the "public's right to know" is not merely a get-out clause for a tabloid headline.

Moreover, despite claims to the contrary, Wikileaks' and Assange's handling of the release has been remarkably circumspect. The full volume of the cables is only being very slowly released to the general public; only five news organizations (the Guardian, the New York Times, Der Spiegel, Le Monde and El Pais) have been given full access to the complete dataset. It's telling that these long-standing, respected news organizations are not sitting on the data either: they are publishing fast and furious, with fresh revelations every day. Why is it admirable and responsible when they do it, but reckless and damaging when Wikileaks does?

On top of this, the US government's response to these leaks has been heavy-handed and ridiculous. The government of a country that prides itself on freedom of speech is running character assassination on Mr. Assange, and American corporations which have benefited greatly from the open nature of US culture and business are folding left and right under political pressure to shut down Wikileaks' access to hosting, DNS, and other services. It is shameful and embarrassing that the country most self-righteous about freedom of the press should be trying so hard to suppress information which is not damaging so much as embarrassing. The US government should not have immunity from scrutiny, even when it is deeply embarrassing.

So I cheer for Wikileaks, but quietly, because I fear that doing so will put me on some sort of list. And that fear, the fact that that fear is justified, should be the most embarrassing part of this whole affair to the US government. The damage done to the US government's reputation and respect by its response to these cables is far greater than the damage done by the cables themselves.


If you want to know more about Wikileaks, their Wikipedia entry is informative and, I'm pleased to see, remains up to date with its current domain name as it shifts around. Their Twitter feed also has the latest.


* I posted this hastily, and failed to note that the article in question is 4 months old. The charges have since been re-applied and there is a European arrest warrant out for Assange. I in no way wish to belittle the claims of rape victims or imply that I do not think he should be questioned and, if there is evidence, a trial should be had. However, I think the timing and political pressure involved is overwhelmingly suspicious.

6 comments

@linklog and the delicious shutdown

posted 11 August 2011

The linklog which appears to the left on the home page of this blog and also in my tumblr stream, as well as the independent @linklog stream on twitter, is powered by delicious. News leaked today that Yahoo!, in its great wisdom, is shutting down delicious. I'll be migrating my bookmarking to another service -- I've not yet decided which one, but pinboard.in is a strong candidate. You should expect no disruption in your service of wonderfully distracting links :-)

9 comments

The 12 Days Of Christmas, by the numbers

posted 11 August 2011

So I read this tweet by Tim Siedell over the weekend, which made me think: hey, that is a lot of birds, isn't it? I mean, he's bringing her a partridge every day for 12 days, that's 12 partridges. But by the end of the song he's also bringing doves, hens, geese, and more. I started running the numbers. Then I told my friend Ricky, and together we packaged it up into this Christmas-themed infographic. Hope you like it! (Click to make it bigger!)

The 12 days of Christmas, by the numbers
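If you'd rather check the arithmetic than squint at a graphic, here is a minimal Python sketch of the counting. It assumes the usual reading of the song, where the gift introduced on day n shows up again on every later day, and it assumes that only the partridge, doves, hens, calling birds, geese and swans count as birds. The names `GIFTS`, `BIRD_DAYS` and `total_for_day` are mine, made up for illustration.

```python
# Gifts in the order the song introduces them (day 1 first).
GIFTS = [
    "partridge in a pear tree",
    "turtle doves",
    "French hens",
    "calling birds",
    "gold rings",
    "geese a-laying",
    "swans a-swimming",
    "maids a-milking",
    "ladies dancing",
    "lords a-leaping",
    "pipers piping",
    "drummers drumming",
]

# Assumption: these are the days whose gifts are birds.
BIRD_DAYS = {1, 2, 3, 4, 6, 7}


def total_for_day(n, days=12):
    """The day-n gift arrives n at a time, on day n and every day after."""
    return n * (days - n + 1)


totals = {gift: total_for_day(day) for day, gift in enumerate(GIFTS, start=1)}
total_gifts = sum(totals.values())
total_birds = sum(total_for_day(day) for day in BIRD_DAYS)

print(totals["partridge in a pear tree"])  # 12
print(total_birds)                         # 184
print(total_gifts)                         # 364
```

Under those assumptions the geese and swans tie for the most-delivered gift, at 42 each.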
13 comments

I want to expose your children to homosexuality

posted 11 August 2011

Dear Parents of the World -

There is a phrase used often, when talking about portrayals of homosexuals in the media, by people who say that it's okay for people to be gay in the privacy of their own homes but they don't want to "expose their children to homosexuality". No offence, they say. I just don't want to have to explain boys kissing to my 4-year-old. To some people, it seems like a reasonable request, and you often get your way. Like the censorship of the gay kiss in Katy Perry's Firework video in the UK.

On behalf of the gay people of the world, let me say: get over it.

Let's not beat around the bush here. Yes, we want to expose your children to homosexuality. We absolutely do. It's important that we expose your children to homosexuality. But not because it makes us feel better. Not out of some desire to be politically correct, or inclusive. But because it is potentially vital for their psychological well-being.

Your children are already exposed to heterosexuality on a near-constant basis in advertising, in music, in television, in movies, in books. Snow White and the Seven Dwarfs exposes children to heterosexuality. She kisses the prince! That is some full-on heterosexuality going on right there!

But homosexuality is much less well-covered. Sure, homosexuality is not as common as heterosexuality, so, sure, I'd expect straight kisses on TV to outnumber gay kisses. But by hundreds, not by millions. Every romance is a straight romance, every teenager's tale of self-discovery ends in their getting the girl. The princess always marries the prince, not another princess.

The reason there are confused kids who don't realize they're gay until their late teens, causing them much anguish and heartache, is precisely because they are not exposed to homosexuality. Some are just unaware that it is a possibility. Others know there are gay people, but because people like you request that images of loving gay couples not be shown to children, they get the clear impression that there is something Wrong or Bad about homosexuality.

I know lots of you, if asked, would say there's nothing wrong with homosexuality. You may even believe it to be true. But your stance on this issue indicates differently. It's the softest form of bigotry, but precisely because it seems so innocuous it remains unexamined and damaging to the psyche of the 4-8% of kids who, no matter what they see on TV, are going to turn out gay anyway. Not knowing the word "gay" won't stop them being gay, it will just prevent them understanding why they feel so different, and that lack of understanding can be traumatic.

You don't want to explain boys kissing boys to your four-year-old? GET OVER IT. You had to explain boys kissing girls to them last week. Whatever level of explanation you gave then ("they like each other") will do just fine. If they ask questions about how gay sex works, then you can always say "I'll tell you when you're older". You don't have to give them the full blow-by-blow the first time they ask. Whenever you explain how straight sex works is fine. (However, I recommend you first find out how gay sex works before attempting to explain it. Apparently a lot of you still have some pretty weird ideas.)

Even if you are 1000%, completely, totally sure your kid is straight -- his first words were "I love vaginal sex", or something -- do it anyway. Because the second-worst thing to gay kids who think being gay is wrong is straight kids who do, and pass that mistaken belief on to their peers. You don't have to sell it to them. You don't have to convince them that being gay is a thing they want to do. They just need to understand that it is a relatively uncommon but entirely natural way to be, like being left-handed.

And sure, some kids, when they hear that some people are left-handed, try writing with their left hand too. They soon discover it doesn't work for them, and switch back. So maybe your teenagers, hearing that homosexuality is an equally valid state of being, might try it out. But so what? They'll discover it's not what they're into, and they'll move on. Believe me, there's no chance they'll get confused about it. Either you like boys or you don't. And if the thought of your children trying out gay sex scares you more than the thought of them trying out straight sex, you might want to examine your own beliefs for a second.

So go on. Expose your children to homosexuality today. It's for their own good.


Update 2010-12-24 clarified language to avoid giving the false impression that sexuality is a conscious choice. Also corrected a typo.

31 comments

Briefly, on Agile

posted 11 August 2011

When you say "agile", I hear "cargo cult".

Agile is a process for managing software development. If you have a great team of smart people who communicate well and trust each other, they can use agile techniques to release lots of small iterations on a software project very quickly. This pattern of software release is often useful for startups. None of this is in dispute.

The problem is that with its rise in popularity, it has been both misunderstood and over-applied. If you have a good software team you can use agile, but if you use agile you will not automatically get a great team. If your team members communicate well and trust each other they can use agile, but if they communicate well and trust each other they could use any other methodology up to and including no fixed process whatsoever, and be equally successful. Agile changes your release pattern, not your people.

Bottom line: great teams produce great software. Great teams using agile release software every two weeks. Bad teams will produce shitty software. Bad teams using agile will release shitty software every two weeks.

8 comments

iPhoneTracker, Extra Creepy Edition

posted 11 August 2011

You may have heard about Pete Warden's iPhoneTracker, an app that lets you explore the giant trove of geolocation data your iPhone has been collecting since iOS 4.0 (and possibly before).

You may not know that the grid on Pete's released app is the result of his app deliberately aggregating the datapoints to a grid, in order to be a little less creepy:

if you zoom in you’ll see the points are constrained to a grid, so your exact location is not revealed. The underlying database has no such constraints, unfortunately.
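To make "constrained to a grid" concrete, here is a rough Python sketch of the idea. It is not Pete's code (his app is a desktop OS X application, and I am not reproducing its source); the function names and the `cells_per_degree` parameter are hypothetical. The point is only that rounding coordinates to a coarser precision before plotting hides exact locations, and that raising the precision undoes the hiding.

```python
import math
from collections import Counter


def snap_to_grid(lat, lon, cells_per_degree=100):
    """Round a coordinate down to the corner of its grid cell.

    cells_per_degree is a hypothetical knob: higher values mean
    smaller cells, i.e. finer (creepier) granularity.
    """
    return (
        math.floor(lat * cells_per_degree) / cells_per_degree,
        math.floor(lon * cells_per_degree) / cells_per_degree,
    )


def aggregate(points, cells_per_degree=100):
    """Count how many raw points land in each grid cell."""
    return Counter(snap_to_grid(lat, lon, cells_per_degree) for lat, lon in points)


# Made-up sample points: two close together, one further away.
points = [
    (37.7749, -122.4194),
    (37.7751, -122.4199),
    (37.8044, -122.2712),
]

coarse = aggregate(points, cells_per_degree=100)
fine = aggregate(points, cells_per_degree=10000)
print(len(coarse), len(fine))  # 2 3
```

With the coarse grid the two nearby points collapse into a single cell; with the fine grid every point gets its own.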

But hey, why should he decide how much we want to expose our location? Let's get super creepy! Following some instructions from a clever friend, I made the very simple change required to increase the granularity of the data shown on the map. Before:

and after:

Woah! Neat, right? If you want to try it out yourself, you can follow Nicole's instructions on your own downloaded copy of Pete's source from github, or if that's too much trouble and you trust me, you can download your own copy of the extra-creepy version of iPhoneTracker. I'm not very experienced with compiling desktop software, but this works for me, and I use OS X Snow Leopard, so it will probably work for OS X Leopard too.

Important note: "granularity" is not the same as "accuracy". Your iPhone is frequently wrong about where you are, by up to half a mile or so. So your data points will show on average about where you were, but there will be plenty of random outliers -- which is why I appear to spend so much time swimming in San Francisco Bay, for example.

Enjoy!

38 comments

ORM is an anti-pattern

posted 11 August 2011

I tweeted about ORM last week, and since then several people have asked me to clarify what I meant. I have actually previously written about ORM, but it was in the context of a larger discussion about SQL and I shouldn't have confused the two issues. So here I'm going to focus on ORM itself. I'm also going to try to be very brief, since it became very apparent from my SQL article that people tend to stop reading at the first sentence that makes them angry (and then leave a comment about it, whether or not their point is addressed later on).

What's an anti-pattern?

I was pleased to discover that Wikipedia has a comprehensive list of anti-patterns, both from within the world of programming and outside of it. The reason I call ORM an anti-pattern is because it matches the two criteria the author of AntiPatterns used to distinguish anti-patterns from mere bad habits, specifically:

  1. It initially appears to be beneficial, but in the long term has more bad consequences than good ones
  2. An alternative solution exists that is proven and repeatable

It is the first characteristic that has led to ORM's maddening (to me) popularity: it seems like a good idea at first, and by the time the problems become apparent, it's too late to switch away.

What do you mean by ORM?

The chief offender that I'm talking about is ActiveRecord, made famous by Ruby on Rails and ported to half a dozen languages since then. However, the same criticisms largely apply to other ORM layers like Hibernate in Java and Doctrine in PHP.

The benefits of ORM

  • Simplicity: some ORM layers will tell you that they "eliminate the need for SQL". This is a promise I have yet to see delivered. Others will more realistically claim that they reduce the need to write SQL but allow you to use it when you need it. For simple models, and early in a project, this is definitely a benefit: you will get up and running faster with ORM, no doubt about it. However, you will be running in the wrong direction.
  • Code generation: eliminating user-level code from the model through ORM opens the way for code generation, the "scaffolding" pattern which can give you a functional interface to all your tables through a simple description of your schema. Even more magically, you can change your schema description and re-generate the code, eliminating the need to hand-write CRUD code. Again, this definitely works initially.
  • Efficiency is "good enough": none of the ORM layers I've seen claim efficiency gains. They are all fairly explicit that you are making a sacrifice of efficiency for code agility. If things get slow, you can always override your ORM methods with more efficient hand-coded SQL. Right?

The problems with ORM

Inadequate abstraction

The most obvious problem with ORM as an abstraction is that it does not adequately abstract away the implementation details. The documentation of all the major ORM libraries is rife with references to SQL concepts. Some introduce them without indicating their equivalents in SQL, while others treat the library as merely a set of procedural functions for generating SQL.

The whole point of an abstraction is that it is supposed to simplify. An abstraction of SQL that requires you to understand SQL anyway is doubling the amount you need to learn: first you need to learn what the SQL you're trying to run is, then you have to learn the API to get your ORM to write it for you. In Hibernate, to perform complicated SQL you actually have to learn a third language, HQL, which is maddeningly almost-but-not-quite SQL, which then gets translated to SQL for you.

A defender of ORM will say that this is not true of every project, that not everyone needs to do complicated joins, that ORM is an "80/20" solution, where 80% of users need only 20% of the features of SQL, and that ORM can handle those. All I can say is that in my fifteen years of developing database-backed web applications that has not been true for me. Only at the very beginning of a project can you get away with no joins or naive joins. After that, you need to tune and consolidate queries. Even if 80% of users need only 20% of the features of SQL, 100% of users have to break your abstraction to get the job done.

Incorrect abstraction

If your project really does not need any relational data features, then ORM will work perfectly for you, but then you have a different problem: you're using the wrong datastore. The overhead of a relational datastore is enormous; this is a large part of why NoSQL data stores are so much faster. If your data is relational, however, that overhead is worth it: your database does not merely store your data, it represents your data and can answer questions about it on the basis of the relations captured, far more efficiently than you could in procedural code.

But if your data is not relational, then you are adding a huge and unnecessary overhead by using SQL in the first place and then compounding the problem by adding a further abstraction layer on top of that.

On the other hand, if your data is relational, then your object mapping will eventually break down. SQL is about relational algebra: the output of SQL is not an object but an answer to a question. If your object "is" an instance of X and "has" a number of Y, and each of Y "belongs to" a Z, what is the correct representation in memory of your object? Is it merely the properties of X, or should it include all the Ys, and/or all the Zs? If you get only the properties of X, when do you run the query to fetch the Ys? And do you want one or all of them? In reality, it depends: that's what I mean when I say SQL is the answer to a question. The representation of your object in memory depends on what you intend to do with it, and context-sensitive representation is not a feature of OO design. Relations are not objects; objects are not relations.

Death by a thousand queries

This leads naturally to another problem of ORM: inefficiency. When you fetch an object, which of its properties (columns in the table) do you need? ORM can't know, so it gets all of them (or it requires you to say, breaking the abstraction). Initially this is not a problem, but when you are fetching a thousand records at a time, fetching 30 columns when you only need 3 becomes a pernicious source of inefficiency. Many ORM layers are also notably bad at deducing joins, and will fall back to dozens of individual queries for related objects. As I mentioned earlier, many ORM layers explicitly state that efficiency is being sacrificed, and some provide a mechanism to tune troublesome queries. The problem, I have discovered with experience, is that there is seldom a single "magic bullet" query that needs to be optimized: the death of database-backed applications is not the efficiency of any one query, but the number of queries. ORM's lack of context-sensitivity means that it cannot consolidate queries, and must fall back on caching and other mechanisms to attempt to compensate.
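
To make the query-count problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The posts and authors tables, the CountingDB wrapper, and both function names are invented for illustration. The "naive" function only imitates the access pattern I'm describing (fetch every column, then fetch each related row separately); it is not the code of any particular ORM library.

```python
import sqlite3


def make_demo_connection():
    """Build a tiny in-memory database (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT, body TEXT
        );
        INSERT INTO authors VALUES (1, 'Ann', 'long bio'), (2, 'Bob', 'long bio');
        INSERT INTO posts VALUES
            (1, 1, 'First', 'long body'),
            (2, 1, 'Second', 'long body'),
            (3, 2, 'Third', 'long body');
    """)
    return conn


class CountingDB:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def titles_with_authors_naive(db):
    """One query for all posts (every column), then one more query per post."""
    result = []
    for post in db.run("SELECT * FROM posts ORDER BY id"):
        # post = (id, author_id, title, body); the body is fetched but never used
        author = db.run("SELECT * FROM authors WHERE id = ?", (post[1],))[0]
        result.append((post[2], author[1]))
    return result


def titles_with_authors_joined(db):
    """One query that asks for exactly the two columns the caller needs."""
    return db.run(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )
```

With three posts the naive version issues four queries and the joined version issues one; both return the same list of (title, author name) pairs. With a thousand posts the naive version would issue 1,001 queries, none of which looks slow on its own.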

What are the alternatives?

Hopefully by this point I've made some kind of case that ORM has fundamental design flaws. But for ORM to be an anti-pattern, there needs to be an alternative. In fact, there are two:

Use objects

If your data is objects, stop using a relational database. The programming world is currently awash with key-value stores that will allow you to hold elegant, self-contained data structures in huge quantities and access them at lightning speed. There's no law that says Step One of writing any web app is installing MySQL. The massive over-application of relational databases to every data representation problem is one of the reasons SQL has acquired a bad reputation in recent years, when in fact the problem is lazy design.

Use SQL in the Model

It's hugely dangerous to claim there is One True Way™ to do anything in programming. But in my experience, the best way to represent relational data in object-oriented code is still through a model layer: encapsulation of your data representation into a single area of your code is fundamentally a good idea. However, remember that the job of your model layer is not to represent objects but to answer questions. Provide an API that answers the questions your application has, as simply and efficiently as possible. Sometimes these answers will be painfully specific, in a way that seems "wrong" to even a seasoned OO developer, but with experience you will get better at finding points of commonality that allow you to refactor multiple query methods into one.

Likewise, sometimes the output will be a single object X, which is easy to represent. But sometimes the output will be a grid of aggregate data, or a single integer count. Resist the temptation to wrap these in too many layers of abstraction, and deal with the data on its own terms. Above all resist the fallacy of OO, that it can represent anything and everything. OO is itself an abstraction, a beautiful and hugely flexible one, but relational data is one of its boundaries, and pretending objects can do something they can't is the fundamental, root problem in all ORM.
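
As a rough sketch of what such a model layer might look like, here is a small Python class built on the standard sqlite3 module. Everything in it is hypothetical: the BlogModel name, its three methods, and the authors and posts tables are assumptions made up for this example. Each method answers one question and returns whatever shape suits that answer: a single dict, a grid of rows, or a bare integer.

```python
import sqlite3


class BlogModel:
    """Hypothetical model layer: each method answers one application question."""

    def __init__(self, conn):
        self._conn = conn

    def post_headline(self, post_id):
        """A single 'object': title and author name for one post, or None."""
        row = self._conn.execute(
            "SELECT p.title, a.name FROM posts p "
            "JOIN authors a ON a.id = p.author_id WHERE p.id = ?",
            (post_id,),
        ).fetchone()
        return None if row is None else {"title": row[0], "author": row[1]}

    def post_count_by_author(self):
        """A grid of aggregate data: (author name, number of posts) rows."""
        return self._conn.execute(
            "SELECT a.name, COUNT(p.id) FROM authors a "
            "LEFT JOIN posts p ON p.author_id = a.id "
            "GROUP BY a.id, a.name ORDER BY a.name"
        ).fetchall()

    def total_posts(self):
        """A single integer count, returned as a plain int."""
        return self._conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


def make_demo_connection():
    """Build a tiny in-memory database (illustrative schema and data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO posts VALUES (1, 1, 'First'), (2, 1, 'Second'), (3, 2, 'Third');
    """)
    return conn
```

The rest of the application calls BlogModel(conn).post_count_by_author() and never sees SQL, but nothing here pretends that a count or an aggregate grid is an object with an identity.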

In summary (TL;DR)

  • ORM is initially simpler to understand and faster to write than SQL-based model code
  • Its efficiency in the early stages of any project is adequate
  • Unfortunately, these advantages disappear as the project increases in complexity: the abstraction breaks down, forcing the dev to use and understand SQL
  • Entirely anecdotally, I claim that the abstraction of ORM breaks down not for 20% of projects, but close to 100% of them.
  • Objects are not an adequate way of expressing the results of relational queries.
  • The inadequacy of the mapping of queries to objects leads to a fundamental inefficiency in ORM-backed applications that is pervasive, distributed, and therefore not easily fixed without abandoning ORM entirely.
  • Instead of using relational stores and ORM for everything, think more carefully about your design
  • If your data is object-like in nature, then use object stores ("NoSQL"). They'll be much faster than a relational database.
  • If your data is relational in nature, the overhead of a relational database is worth it.
  • Encapsulate your relational queries into a Model layer, but design your API to serve the specific data needs of your application; resist the temptation to generalize too far.
  • OO design cannot represent relational data in an efficient way; this is a fundamental limitation of OO design that ORM cannot fix.

Wanted: statisticians

posted 11 August 2011

The only skills gap bigger than the one for programmers is the one for statisticians.

The whole web industry is accumulating vast quantities of data and storing it, magpie-like, as if it has intrinsic value, aided by ever-falling prices for storage. But the data isn't valuable. It doesn't mean anything until somebody who knows what they're doing looks at it, sifts through it, and produces a tool that lets others use it to draw valid and useful conclusions.

But hardly anybody does this. Instead we apply the most absurdly basic analyses and build whole businesses around them. We are messing around in the shallows, while the ocean of data gets bigger every day.

If you want to find yourself enormously over-employed for the next decade, learn a bunch of statistics. As a bonus, find a way to fit machine learning in there, though we already have far more people who understand machine learning than people who understand what it is we should be teaching the machines.
