Monday, April 28th, 2008
Check out Daniel Tunkelang’s new blog at the noisy channel, where he tries to apply insights from information retrieval and information science to address the practical problems of information access.
I got some more positive feedback on the global IA workshop :)
“I took your workshop and enjoyed it - I was able to apply some of the insights the day I returned to the office in a meeting.”
The global IA workshop at the IA Summit turned out great. There were 17 people, from all over the world, and we had some great discussions. We also had a follow-up round-table which was fun too. I’ve embedded the entire deck below. I’ll see if I can get the content of the handouts out here too, although that might take a while.
The feedback from the workshop was good too. Some quotes from the feedback forms:
“The session was more fun than expected.”
“A much needed, excellent presentation with great documentation.”
“Great pace with great slides.”
A quick promotion for my global IA workshop at the IA Summit 2008 - it’s shaping up to be the most in-depth workshop I’ve ever given, in a field that’s so new that the wheel is being invented again and again all over the place. I’ll post some more info in the coming weeks.
Lou Rosenfeld: “So sure, I can make plenty of recommendations, but, ultimately, what the heck do I really know other than the stuff that’s probably obvious to anyone else with some experience in the field?” Says the master.
Leap seconds. Like most categories, it turns out the second isn’t as stably defined as you’d think.
Categories are cultural,
locales mix up,
structure mostly translates,
global standards have local exceptions.
The “Slums” Of Search: about search results in non-US markets.
winterson.com: episode iii, the backstroke of the west: what happens when you cross-translate.
OK so now THIS is some innovative navigation :) Talking heads for each major section that tell you about it. Really, it’s innovative. You really have to visit the site to get the full effect, with audio turned on.
DonnaM is giving away a free pass to the Ozzie IA conference.
Drupalcon is in Barcelona at the same time as the Euro IA Summit, and I’ll be giving a talk there too :)
If you’re going to the IA Summit, you could come a day early and we could have a beer at Drupalcon.
My long pages work post (a quickie really) got picked up: Dion Almaer over at Ajaxian says: “The iPhone is also showing that scrolling is a nice UI tool”. Sean Kane provides some “it depends” input: “A long page with important messaging below the fold in this case could hurt sign-up rates.” I have to disagree with this example though: long pages can be *great* for converting users to sign up for something.
Craigslist rocks (and for reasons similar to why eBay rocks), but their internationalization approach has a serious flaw, which I think is responsible for their limited success in many international markets: they don’t localize their taxonomies. I’m not gonna write a long post, but here’s a quick analysis I did of their Dubai site:
As you can see, many categories just sit there empty (the red dots), and only a few categories are active (the green ones). For the original Sanfran Craigslist, the screenshot would be full of green dots.
The result of this approach is that the site feels empty and inactive. The solution would be to remove most categories and build locally specific ones. For example, “woman seeking woman” isn’t particularly appropriate in Dubai society. Construction, on the other hand, is booming (foreigners cannot own property in Dubai itself, but, clever businessmen that they are, they have started building multiple artificial islands in the sea where property *can* be owned by foreigners), so one of the few active categories is “real estate for sale”, and there could easily be more real estate categories (like “offshore real estate”) specific to Dubai.
Google maps shows India in English, Thailand in (?) Thai. Curious, for some reason I wouldn’t have expected it to be localized like that.

I’ve been playing around with different diagrams that attempt to explain the various locales on a website. Here is an example:

Does this make sense as a diagram? Does it help you understand?
Language codes: “Some of the most heated discussions on the request for new projects
are about the status of a language; is it a language a dialect and
often the arguments are of a political nature. The inclusion of
languages in the ISO-639 has been political in the past. With ISO-639-3
many of these arguments have an answer with the many new language codes
that have been created.”
There seems to be a theory that the busyness or clean-ness of websites is related to the personal distance (the “bubble”) of the culture that the website is for.
Names in the world: Icelanders prefer to be called by their given name (Björk), or by their
full name (Björk Guðmundsdóttir). Björk wouldn’t normally expect to be
called Ms. Guðmundsdóttir. Telephone directories in Iceland are sorted
by given name.
A database developer talking about public health. (Udell) It’s really incredible how little we know, and how numbers are misused.
“But of course, you have to come up with an estimate of how many people
are dead. So somebody picks a number, and then you hear it on CNN that
night. Fifty thousand, a hundred thousand, a hundred and twenty-five
thousand, none of those estimates were based on any attempt to really
find out.”
I noticed this when journalists used to call me asking “how big the vlogosphere was”. (This was pre-Youtube). Of course, there was no way to know (we didn’t even know how to define “videoblogging”), so I made up a number that I thought was ballpark. And they *always* ran with it, leaving out my qualifiers.
“Imagine if you were the CEO of Toyota, and your CFO said, well, sales are pretty good, we think, but we’re not sure.”
And about standardization: “If you went to a UN organization and said, we want to standardize how
we collect data about child nutrition, the response would be, let’s
have a conference. We’ll have experts get together in Rome, and then in
Paris, and decide what are the key questions for any standard child
nutrition survey. But it’s hard to achieve unanimity, and there’s a
built-in incentive not to because every time you get together it’s a
trip to Rome.”
This working with the social life of information is something that seems mainly ignored in current IA texts and practices.
In Korea, Google features animations on its famously understated homepage! http://www.google.co.kr
(via)
http://www.w3.org/International/planet/ aggregator for i18n posts.
Languages of the world. With a cool language map.
When you think about which locales to choose, and which languages to translate into, I always recommend thinking of locales as markets, not as languages. That way you focus on the right things when choosing locales. But still, I’m trying to improve my understanding of how to select locales. What are the things you consider when choosing which languages to translate your website into? Leave your thoughts in the comments.
Hey, Google translate is like Babelfish (translate text or pages), but it also can translate search results. Cool, Babelfish hadn’t had a competitor for too long.
OK, for your wayfinding presentations, check out this:

(I changed the title because “top 10” posts are indeed sucky. Also: looking for my Colombia travel site?)
By the way, here’s the RSS feed of my blog, in case you’d like to subscribe.
I always love to read scaling discussions, especially about popular web apps, and there are loads of them out there. Here’s my overview of the best. By the way, the best book on scaling apps I’ve ever read is Building Scalable Websites, by Cal Henderson (the Flickr guy).
It’s dog-eared on my desk, and taught me about sharding (which I used extensively for mefeedia). Sharding is when you cut a really big table into pieces, so you can put those on separate servers. It means you have to make changes to your code, and your database isn’t so database-y anymore, but it works. For example, online games use sharding to grow their virtual worlds, because there’s no way they could serve all that information from 1 db cluster.
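To make the sharding idea a bit more concrete, here’s a minimal sketch in Python. It is not how mefeedia or Flickr did it: the shard names, the `shard_for` function and the `ShardedStore` class are all made up for illustration, and plain dictionaries stand in for the separate database servers. The point is only that your code, not the database, decides which piece of the big table a row lives in.

```python
import hashlib

# Hypothetical shard names; in a real setup each would be a separate database server.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id, shards=SHARDS):
    """Hash the key so the same user always maps to the same shard."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


class ShardedStore:
    """One logical 'users' table cut into pieces, one dict per shard."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        # Each dict is a stand-in for the table on one server.
        self.tables = {name: {} for name in self.shards}

    def put(self, user_id, row):
        self.tables[shard_for(user_id, self.shards)][user_id] = row

    def get(self, user_id):
        return self.tables[shard_for(user_id, self.shards)].get(user_id)


store = ShardedStore()
store.put(42, {"name": "example user"})
row = store.get(42)  # looked up on whichever shard user 42 hashes to
```

Note what you lose: a query that spans all users (a join, a global count) now has to visit every shard, which is what “your database isn’t so database-y anymore” means in practice. Simple modulo hashing like this also makes adding a shard painful, since most keys would move.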
Scaling Twitter with Ruby.
Twitter is hot today, and they ran into some serious scaling problems, although the app itself is quite simple. It consists of messages of maximum 140 characters. Lessons are the same as most apps: Memcache like crazy, and optimize the database (the biggest bottleneck most of the time).
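“Memcache like crazy” usually boils down to one pattern: check the cache first, only hit the database on a miss, and throw the cached copy away when the data changes. Here’s a rough sketch of that pattern in Python. It’s an illustration, not Twitter’s code: the `CacheAside` class and the `load_message` function are hypothetical, and a plain dictionary stands in for memcached.

```python
class CacheAside:
    """Read-through cache in front of a slow loader (the 'database')."""

    def __init__(self, load_from_db):
        self._cache = {}           # stand-in for memcached
        self._load = load_from_db  # function that does the expensive query
        self.db_hits = 0           # counts how often we had to go to the database

    def get(self, key):
        if key in self._cache:
            return self._cache[key]    # cache hit: no database work at all
        self.db_hits += 1
        value = self._load(key)        # cache miss: run the real query
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this whenever the underlying row changes.
        self._cache.pop(key, None)


def load_message(message_id):
    # Hypothetical slow query; here it just builds a fake row.
    return {"id": message_id, "text": "message %d" % message_id}


messages = CacheAside(load_message)
messages.get(1)
messages.get(1)  # second read is served from the cache
```

After those two reads the database was only touched once. The hard part in real life is the invalidation: forget it somewhere and users see stale data.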
Also, Ruby on Rails scales pretty much the same way as PHP and other similar languages: shared nothing architecture. Shared nothing means that there is no 1 thing that is shared by all servers, since that would become a bottleneck.
PHP, for example, has shared nothing architecture out of the box, except perhaps for sessions, but that’s easily solved by storing sessions in a db (which then has its own scaling approach) and not in the filesystem. Here’s a talk by Rasmus Lerdorf that explains scaling with PHP5. (Here’s the mp3 audio recorded by Niall Kennedy).
Blaine Cook made this presentation:
Scaling Flickr.
Cal Henderson wrote the above book, and also has a good presentation: Scaling Flickr slides as PDFs.
One of the problems you get into when scaling something like Flickr where you store LOTS of stuff, is that you can’t just store that on a hard drive anymore: it’s not big enough. Apart from just using Amazon’s S3 service (which rocks - I used it for mefeedia and I know lots of startups who use it), there are other solutions. A good presentation of that by Cal is this one:
Cal (he’s a busy dude) also made this presentation about scaling web apps, generally:
John Allspaw (flickr plumbr) also has a good presentation about scaling Flickr:
Scaling LiveJournal.
LiveJournal was one of the first social networks, before that word meant anything, and they’ve partly invented how to scale standard php/mysql/apache apps. They developed memcached, which is now used by almost anyone who wants to scale their site.
Brad Fitzpatrick has a good set of slides on how they evolved the service, here’s a PDF version. And here’s the slideshow embedded:
Kevin Rose mentioned this was “the bible for scaling Digg” - and I think quite a few other web apps are based on this.
Six Apart.
The LiveJournal guys with all their scaling expertise were acquired by Six Apart, and they soon launched Vox. And of course, here’s a presentation on making Vox scalable:
Bloglines.
Bloglines’ scaling problems were slightly different from your average web app, since they are an aggregator of feeds. That means they have billions of blogposts they have to keep and serve to users, and that creates its own scaling problems. The Bloglines approach was to, instead of using a database, just store all that stuff in a special filesystem. Today it’d be easier to do this since there are a few filesystems that do that, or you could just go with S3 again. Mark Fletcher (who also sold Onelist to Yahoo which is now Yahoo Groups) has given a few talks on scaling Onelist and Bloglines: here’s the mp3 audio version, and here’s the PDF of that talk. And a text transcript.
Last.fm
Last.fm is one of the aggregation-type apps: they gather a lot of data about what music you listen to. Similarly to Bloglines, that causes its own scaling problems:
Slideshare.
All the slides in this post are hosted by Slideshare, an incredible service by my fellow information architect Rashmi Sinha and team. When I found out about the project, I emailed her: “brilliant and so obvious once you think of it”. Like many startups, they use S3 to serve their content, and they have the obligatory yet interesting slides to explain how:
I haven’t linked to lots of good thinking about scaling, or to technical resources and stuff. But the presentations should get you going in the world of memcached, perlbal, nothing shared and federation :) Enjoy!
PS: See also How I Unexpectedly Found Myself Doing Consulting For Startups (this is a post on my “professional” site. I haven’t been able to figure out when to post here or there, any tips on that?).
Update: more presentations.
Another great talk in video this time, from the MySQL Bay Area Community Meetup, May 2007:
Finally, Dan Pritchett has a good presentation on scaling eBay (PDF). 26 Billion SQL queries per day! 300+ new features per quarter! 4 architecture versions since 1998 and some pretty crazy scaling of the search.
New: presentation on how Facebook uses PHP APC cache (PDF).
A talk on Youtube scalability: “In the summer of 2006, they grew from 30 million pages per day to 100 million pages per day, in a 4 month period. Thumbnails turn out to be surprisingly hard to serve efficiently. (I ran into this with mefeedia too, luckily Amazon S3 came to the rescue by then.)” Youtube uses Python, Apache, MySQL, Memcached.
NEW: Front end scaling is important too, and often ignored. Here’s a good presentation from the Yahoo guys:
btw, the new Boxes and Arrows design is very very pretty.
I’m playing around with Axure and it’s nice but I can’t seem to use backgrounds like I do in Visio. The masters don’t seem to work like backgrounds, or else I just haven’t figured out yet how to do it.
I do love the ability to add annotations and specific fields for annotations to various things.
http://continuouspartialattention.jot.com/WikiHome: “I believe attention is the most powerful tool of the human spirit. We can enhance or augment our attention with practices like meditation and exercise, diffuse it with technologies like email and Blackberries, or alter it with pharmaceuticals. In the end, though, we are fully responsible for how we choose to use this extraordinary tool.”
Attention is definitely underappreciated in the theory of IA, although it is there.
So again, this is a great talk. “You grew up with Andy Warhol’s 15 minutes of fame; they’re growing up
with being famous amongst 15. They’re collecting friends as a way of
demarcating audience in a world without meaningful signals about who’s
watching. If you’re not in their list of friends or aren’t like the
people in their list of friends, you are not the intended audience.”
Perhaps the breakdown of privacy is creating the biggest cultural shock since the 50s/60s when rock&roll and all that happened.
A lot of people really don’t understand what teenagers do on Myspace (as an example).
Incantations for muggles, a great talk transcript about different age groups and what their priorities are and how that affects the tech we build.
ITS, the Internationalization Tag Set has been published as a W3C Recommendation.
ITS is a set of attributes and elements that are designed to help the internationalization and the localization of XML material.
For example, the same way you can use <p xml:lang="es"> to specify that the content of the <p> element is in Spanish, you can now use <p its:translate="no"> to indicate that the content of <p> should not be translated.
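Here’s a small sketch of what a localization tool could do with that attribute, using Python’s standard `xml.etree.ElementTree`. It only looks at the local `its:translate` attribute and lets children inherit the value from their parent; the global rules that ITS also defines are ignored. The `translatable_text` function and the sample document are made up for the example, and I’m assuming the ITS 1.0 namespace URI `http://www.w3.org/2005/11/its`.

```python
import xml.etree.ElementTree as ET

# Assumed ITS 1.0 namespace URI.
ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = "{%s}translate" % ITS_NS


def translatable_text(element, inherited=True):
    """Collect text that should go to the translator, skipping its:translate="no"."""
    flag = element.get(TRANSLATE)
    # No local attribute: keep whatever the parent decided.
    translate = inherited if flag is None else (flag == "yes")
    found = []
    if translate and element.text and element.text.strip():
        found.append(element.text.strip())
    for child in element:
        found.extend(translatable_text(child, translate))
        # Text after a child belongs to the current element, not the child.
        if translate and child.tail and child.tail.strip():
            found.append(child.tail.strip())
    return found


doc = ET.fromstring(
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    "<p>Click the button.</p>"
    '<p its:translate="no">mefeedia</p>'
    "</doc>"
)
strings = translatable_text(doc)  # only the first paragraph is collected
```

A product name like the one in the second paragraph is the typical case: you want it passed through untouched in every language.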
Customer acquisition is often a kind of forgotten part of building websites. Relying on just “viral” growth isn’t all it’s made out to be. It’s usually hard work.
I’ve had the pleasure to work with some people that are very experienced in this area, and I’ve learnt quite a few things.
One, it’s easy enough to get 10,000 or even 100,000 users for your website. It’s much harder to get 1,000,000 or 10,000,000, and active users mean a lot more than just people who signed up and never came back.
Two, it’s hard to get paying users. The same numbers apply, but divided by about 100. So it’s relatively easy to get 100 paying users or even 1000. It’s a whole different ballgame to get 10,000 or 100,000 paying users. That’s very hard work, and it will cost you probably around $5 to $20 per user.
These numbers of course don’t mean a lot, but they give an idea. If you’re planning for a million paying users for your startup, you better realize it’s gonna take at least a year or two of hard work to get to that point even *if* you’re successful, and cost you millions, perhaps 10s of millions in customer acquisition cost (advertising, rewards, the whole customer acquisition engine).
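The back-of-the-envelope math behind that, as a tiny Python sketch. The per-user cost range is just my rough ballpark from above, used as default values, and the `acquisition_cost` function is only there to show the multiplication.

```python
def acquisition_cost(paying_users, cost_per_user_low=5, cost_per_user_high=20):
    """Return a (low, high) estimate in dollars for acquiring paying users.

    The default per-user costs are rough ballpark figures, not measured data.
    """
    return paying_users * cost_per_user_low, paying_users * cost_per_user_high


low, high = acquisition_cost(1000000)
print("1,000,000 paying users: $%d to $%d" % (low, high))
```

For a million paying users that works out to somewhere between 5 and 20 million dollars, which is where the “millions, perhaps 10s of millions” comes from.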
If you’re going for a few 1000 paying users, that’s something that’s much easier to achieve. Just build a kick-ass useful product. If you can be way profitable with 10,000 paying users, you’re good.
I kind of hesitated to put numbers in this post, because things vary so much, but perhaps this can help some inexperienced entrepreneurs so here we are. Grain o’ salt please!
Would it be fair to say that when corporate blogging fails, it does so most often because of cultural problems? The corporate culture doesn’t allow for the free flow of ideas, hence the “blogging” effort becomes nothing more than a news channel, and the whole point is lost.
When it does work, it’s because it supports an existing culture of openness. True? (ps: I know the comments are broken..)
I’m sure this would have happened anyway, but back in 2000, I did some usability experiments with having a sitemap on every page of the website. Peter Merholz wrote about it (I didn’t have my blog yet back then). I actually measured clickthroughs on that sitemap, and it turned out to be very popular.
Years later, that idea started to get picked up by more and more sites, and these days it seems like everyone is doing it (because it makes sense). So in a sense I could be the father of the sitemap at the bottom of every page pattern. Then again, it’s one of those things that would have happened anyway.
Like the navigation in the main column pattern that Amazon is using these days. They used to have left hand navs, but over time, slowly, undoubtedly with lots of testing, they moved to having almost no left or right-hand navigation on their product pages, they’re just one long page. The navigation is in the main column. I expect that pattern to take off more and more as well, since users quite effectively blind out the classic left hand nav.
Oh, in Peter’s post, in the comments, a beauty: “Putting a site map on every page really riles me, actually. It’s just laziness on the IA’s part. Come up with a navigation that makes sense, and there won’t be a need for it.” - Hahaha.
A 3MB file with supposedly ALL the world’s cities & towns with geolocation.