Archive for January, 2005

Sunday, January 23rd, 2005

Vloggercon 2005 is over. It was brilliant, and historic. Vloggercon 2006 will have 500 people instead of the 80 or so we got this year. But the amazing thing was: the 80 people were all the right people.

Remember, this was an unconference, pulled together in a few weeks by some people. No money involved. The wifi worked. There was coffee and bagels. Nobody seems to have gotten hurt. The discussions were interesting. The vendors didn’t take over. The audience got to speak - and really, it is silly to speak of “audience” because everyone was a presenter. Congratulations to everyone involved, and especially Jay, who really pulled this one together with good vibes and hard work.

Flickr: Photos tagged with vloggercon

Saturday, January 22nd, 2005

Raymond’s first day in NYC - pre-vloggercon (Quicktime, 800K)

Friday, January 21st, 2005

Blackbeltjones/work: Wikipedia: Shirky / (Tufte x Wattenburg) = ?: sparklines for wikipedia!

Friday, January 21st, 2005

Eric Scheid pointed me to Guidelines for Forming Language Equivalents: A Model Based on the Art & Architecture Thesaurus. I will report in more depth on this later.

Thursday, January 20th, 2005

I have a problem with backing up files to my external hard drive (on Win XP).

When I drag a large folder (a few gigs) to the hard drive, it starts copying, but after a while (5 minutes), the hard drive “disappears” - it no longer shows up in my list of drives, and the copy gives an error. I then have to unplug the hard drive and plug it in again to make it show up again in my list of drives.

I have this problem with 2 computers, and with 2 external hard drives, so I don’t think it’s a problem with either the computer or the drive. Small folders are ok, it’s just the larger folders that show this problem.

Is it impossible to back up large folders by just dragging them? Should I use a backup program? If so, which one is good (and cheap or free)? PC Magazine recommends the free SyncBack so I’m trying that out.

Later: damn! I tried but the same problem happened: after copying for 5 minutes, the external drive just “disappeared” from my computer, and the copy program reported an error. So it’s not just with drag and drop, it’s any copying to those drives (remember I have 2) that makes them “disappear” from Windows. Any ideas?

Later: I ran Windows update and all, tried again, still the same problem.

Later: This article seems to describe the problem. In short: Windows XP can have problems with hard drives larger than 138 gigs; upgrade to Service Pack 2 (something I’ve been holding off on) to fix it.

Later: I successfully installed Service Pack 2, after it crashed on me once. I tried again, STILL the same problem. Arg.

Later: Alright, I am downloading more updates from the Windows Update site. The words “critical update”, “strongly recommend” and “security” were plastered all over the page so I figured I’d better say yes.

Later: I installed all the latest Windows updates, double-checked again (no updates), restarted, tried again, and fuck: still the same problem.

I am running out of ideas here…

Thursday, January 20th, 2005

Alex Wright has some follow-up musings on my i18n folksonomy post. Gene Smith also weighs in.

Wednesday, January 19th, 2005

Signal vs. Noise launches a free todo list app. Brilliant move, again.

Wednesday, January 19th, 2005

I and the others guessed right: Six Log: Support for nofollow: Google approached blogging companies to implement support for nofollow.

Tuesday, January 18th, 2005

monkey methods: 5 Reasons Why Feedster and Technorati will Die, with some pretty good comments.

Monday, January 17th, 2005

Steve Arnold pointed out to me that “Languages that are synthetic do not fare well in automatic systems unless the source documents are highly technical.”

Which makes sense, once you understand what a synthetic language is. A synthetic language combines bits into really long words. For example, in Mohawk: Washakotya’tawitsherahetkvhta’se = “He ruined her dress” (strictly, “He made the thing that one puts on one’s body ugly for her”). One word is used for something that other languages need multiple words or a whole sentence for. You can see how that can mess with automated systems.

Languages are not simply synthetic or isolating; they fall on a spectrum, and some languages are just more synthetic than others. Examples of common synthetic languages are German, Russian, Turkish, Finnish, Japanese, Korean, and many more.

Monday, January 17th, 2005

I came across this other interesting paper on multi-lingual search.

Internet Searching and Browsing in a Multilingual World: An Experiment on the Chinese Business Intelligence Portal (CBizPort) (PDF) Journal of the American society for information science and technology —July 2004.

It gives some good practical background on a project to create a business search for Chinese. It also describes an approach to automatically summarizing and categorizing search results. Here is its description of the Chinese search landscape, illustrating some interesting language-locale subtleties.

“Chinese is the primary language used by people in mainland China, Taiwan, and Hong Kong. Language encoding, vocabularies, economies, and societies of the three regions differ significantly. Regional search engines, therefore, have been developed to provide Internet searching.

In mainland China, the major search engines include Sina and Baidu. Baidu currently powers over 80% of Internet search services in China, including ChinaRen, 163.net, etc. The database of Baidu stores over 60 million Web pages collected from mainland China, Hong Kong, Taiwan, and Singapore, and grows at a speed of several hundreds of thousands of Web pages per day. Sina is an Internet portal providing comprehensive services such as Web searching, e-mail, news, business directory, entertainment, weather forecast, etc. From our review of search engines in mainland China, we found that Baidu has better search capabilities than the others, as shown by its content coverage. Sina has a wider scope of functions than Baidu.

In Taiwan, the two major Internet search portals are Openfind and Yam. Openfind, established in 1998, is one of the largest portals in Taiwan. In addition to basic searching, Openfind suggests terms that are highly associated with users’ queries to help them refine their search. It also allows users to find more related items from each search result and highlights the query terms in the results. Established in 1995, Yam provides comprehensive online services. Its four major focuses are content, communication, community, and commerce (4C). Yam’s search engine allows users to search various media: Web sites, Web pages, news, Internet forum messages, and activities (in 18 Taiwan cities or regions). We found that Openfind has better functionality and content coverage, but Yam was better established in the local market (e.g., it powers the search function of the Taiwan government’s Web sites).

In Hong Kong, due to its bilingual culture, people rely on both English and Chinese when accessing and searching the Internet. Major search portals include Yahoo Hong Kong and Timway. Of these, Yahoo Hong Kong is one of the most popular. Yahoo Hong Kong’s search engine returns results in different categories, Web sites, Web pages, and news. Headquartered in Hong Kong, Timway provides services such as Web searching, Web directory, e-mail, news, forums, etc. Its database stores over 30,000 Hong Kong Web sites and over 10 million Web pages.”

In other words, just because Chinese is one language doesn’t mean that one Chinese search engine will be enough for the various users who want to search in Chinese. There are different groups of people who search in Chinese, with different local requirements, and these local requirements have given rise to a number of different search engines for Chinese.

Monday, January 17th, 2005

A Framework for Multilingual Searching and Meta-information Extraction: what is “term isolation”? “Term isolations means extracting the individual terms from the text. This is necessary for languages such as Chinese and Japanese, that do not contain white space between individual terms. Term isolation is not a trivial task, and requires the software to understand the language’s grammar and have a complete dictionary. Term isolation is clearly a language-specific task (with different software modules for different languages).”
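To make the idea concrete for myself, here is a minimal sketch of the simplest dictionary-based approach I know of, greedy longest-match. This is not the method from the paper: the function name `isolate_terms` and the toy dictionary are made up for illustration, and a real system would need grammar knowledge and a far larger dictionary, as the quote says.

```python
def isolate_terms(text, dictionary, max_len=None):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary entry that matches;
    if nothing matches, fall back to a single character.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    terms = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                terms.append(candidate)
                i += size
                break
    return terms


# Toy dictionary, purely hypothetical.
TOY_DICTIONARY = {"東京", "東京都", "に", "住む"}

print(isolate_terms("東京都に住む", TOY_DICTIONARY))
```

Greedy matching picks the wrong split whenever a long dictionary entry happens to overlap the start of the next word, which is one reason the paper calls this a non-trivial, language-specific task.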

Multi-lingual search

Monday, January 17th, 2005

How do you know what language the query a user entered is in, and how do you search languages like Japanese, that don’t use spaces between the words? And how do you identify languages used in an unstructured, multi-lingual document? Multilingual search is a hard problem.

Most of the search players use the same basic technology, provided by Basistech. Check out their customer list: Google, Amazon, MSN, Yahoo, Endeca, PeopleSoft, Overture, you name it, they have them.

The technology, called the Rosette Linguistic Platform, “helps your applications unlock the meaning of unstructured text by determining the language and encoding of a given document, converting the text to Unicode so that it can be processed, identifying the basic linguistic features and structure, and locating key concepts like the names of people and places.”

In other words, it deals with Asian and Arabic language search problems, and does entity extraction (extracting names of people, places and companies and such). It identifies individual words for languages such as Japanese that do not use spaces between words, breaks compound words into their individual components, and identifies parts-of-speech such as verb, adjective, etc.

Once it has done its job making sense of the languages it finds, the search technology of the vendor takes over.

They have a pretty cool demo that explains what the technology does - here’s a screenshot:

rosetta

Google hits comment spammers hard?

Monday, January 17th, 2005

Simon thinks Google will soon announce that they won’t be calculating PageRank for links with a rel="nofollow" attribute. And he’s probably right, it makes complete sense for Google to do this. Dave Winer has a “mysterious” announcement coming up, and if you view source you see he has rel="nofollow" implemented.

There are two ways rel="nofollow" could work: either Google simply doesn’t follow these links, or they don’t attach Pagerank to these links. Unlike Simon, I think it is probably the first. It makes more sense semantically. It’s probably easier for them to implement as well. Also, Pagerank has become less and less important in Google’s algorithms over the years.

This means that you just add this attribute to all the links that are added by users, like links in comments. You don’t add it to the links in your blog posts. People can still follow ALL links, but search engines (only Google for now) would only follow the links in your blogposts. The stuff *you* link to in your blogposts still gets yummy Pagerank goodness - your blog doesn’t lose its Google power.
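Here is a rough sketch of what a blogging tool might do to comment HTML before publishing it. The function name `add_nofollow` is my own invention, and the regex approach is a shortcut I am assuming is good enough for illustration; a real package would more likely use a proper HTML parser. It drops any rel value the commenter supplied, so a spammer can't sneak in their own.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches an existing rel attribute (double-quoted, single-quoted or bare).
REL_ATTR = re.compile(r"""\s+rel\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def add_nofollow(comment_html):
    """Return comment_html with rel="nofollow" forced onto every link."""
    def rewrite(match):
        # Remove whatever rel the commenter wrote, then append our own.
        attrs = REL_ATTR.sub("", match.group(1))
        return '<a%s rel="nofollow">' % attrs
    return A_TAG.sub(rewrite, comment_html)


print(add_nofollow('Nice post! <a href="http://example.com/">cheap pills</a>'))
```

Links in your own posts never go through this function, so they stay normal links.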

And more importantly, Google doesn’t lose its blog power - it can still take advantage of the meaning embedded in the links on blogs, just without much of the pollution. I can even see them implementing a little Pagerank boost for outgoing links on a domain that does have some rel="nofollow" links implemented, since it means that the links that don’t have that attribute are probably somewhat more meaningful.

When you do this, the incentive for spammers to spam you (increased Pagerank) is pretty much taken away. Comment spammers don’t do it in the hope that some human will follow that link. They’re in it for the Pagerank.

The amount of spam you get won’t diminish immediately - spammers use automated tools and don’t really care about whether it works on a particular blog. But if the majority of blogs implement this, then it will become less and less attractive for comment spammers to spend time comment spamming.

This is where hosted services like Blogger (owned by Google) or Typepad really shine. I expect them to support this from the moment of announcement on, making the majority of blogs protected against spam. It wouldn’t make sense for Google to implement this and not let Six Apart (owners of Typepad) know about it - that would be abusing their search engine power a bit.

The most popular blogging packages would support this as well, and as people slowly upgrade to new versions, within a year or so 80 to 90% (I’m making these numbers up) of blogs will be protected. OK, who makes the condom-like logo that says “my blog is spam protected”?

Not everyone is optimistic though.

An open question: as Google loses some of its dominance in the search world, will other search engines start supporting this? If not, the measure may not be as effective as we hope.

Will this stop comment spamming? Not right now. Will it stop the growth of comment spamming? Hopefully.

Then again, this may all be wrong, since Dave supposedly left a comment saying “Pssst. Good work. You’re getting warmer. ;->”

Follow this story on Technorati.

Sunday, January 16th, 2005

Smart Mobs: Cameraphone as Conversational Medium: “Daisuke Okabe has just published Emergent Social Practices, Situations and Relations through Everyday Camera Phone Use, her report on the research she conducted with Mizuko Ito. The continuous sharing of image-streams with social networks seems to be developing as a hybrid of technological artifact and mediated discourse - friends self-surveill and share what they are seeing as they move through the world, through their day.”

From the comments: How and why people use cameraphones (PDF).

We’re seeing some of this happening in the videoblogging group as well.

Sunday, January 16th, 2005

Nick Finch (?) writes about content and locales and the problems with continents, but I can’t seem to contact him.

I linked to another website with this post, that I found via Technorati, and I now suspect that that other website was a fake blog meant to create Pagerank, populated automatically from various feeds. Evil.

Sunday, January 16th, 2005

I fucking HATE that Blogger requires you to open a fucking account just to leave a fucking comment. Sorry Google, but that’s being worse than Microsoft.

Saturday, January 15th, 2005

In the year 2014, the New York Times has ceased to exist.

Emergent i18n effects in folksonomies

Saturday, January 15th, 2005

My series of posts on international information architecture:

  1. Translating taxonomies and categories
  2. Translating categories, translating terms
  3. Translating the Dewey Decimal Classification system
  4. Designing the relationship between content and locales
  5. Emergent i18n effects in folksonomies (this post)
  6. The Maori versus Dewey, and why limiting access can be culturally appropriate.

Folksonomies are taxonomies created by users who add tags to things. Folksonomies are messy and have a lot of problems, but their great merit is that they’re scalable and that, by definition, they use the users’ own terminology. Failing to do so is a serious problem with more classic taxonomies that are created by information architects or librarians.

There is a lot of innovation happening around folksonomies. One interesting area is internationalization. Users enter tags in many languages, but generally, the system does not know what language a tag is in. Have a look at a screenshot of this page from Technorati, showing popular tags:

technorati tags

I drew big red lines around them, so you’ll probably notice some tags in languages other than English. “Algemeen - Algemein - Algemeines” means “general” in Dutch and German. “Entertainment Entretenimento” is English-Spanish for entertainment. Notice that a misspelling of “Entretenimiento” is also used. But we are talking about languages today, not spelling. “Music - Musique - Música” are English, French and Spanish.

So what’s going on here? People are tagging things in many languages. Right now, Technorati displays the various languages mixed together on one page. That’s pretty interesting, especially if you’re interested in languages like me. But it might also be cool to see only tags in your language. Especially if you don’t speak English.

How can we do this? One way is to use a dictionary lookup to figure out what language a certain tag is in. It won’t be perfect, but this approach could be used to display a page of popular tags in mostly German, mostly English, mostly Hindi and so on. This will reduce the number of tags available to users, but make the tags more relevant to them (because they are in their language). Again, seeing a few non-English tags won’t bother you if you speak English, the dominant language, but this is not for English speakers. For all the people who don’t speak English, seeing tags in their language will be invaluable. If you do dictionary lookup only with popular tags, it shouldn’t be too resource-expensive - a tag only has to be checked once against the dictionary, and assigned a probable language.
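Something like the following is what I have in mind for the dictionary lookup. It is only a sketch: the tiny word lists, the language codes and the function names (`guess_language`, `tags_for_language`) are all made up for illustration, and real word lists would be far larger. Note that a tag like “chat” can belong to more than one language, which is why the guess is a list.

```python
# Hypothetical, tiny word lists keyed by language code.
WORD_LISTS = {
    "en": {"music", "general", "entertainment", "chat"},
    "fr": {"musique", "chat"},
    "es": {"música", "entretenimiento"},
    "nl": {"algemeen", "leuk"},
}


def guess_language(tag, word_lists=WORD_LISTS):
    """Return the sorted language codes whose word list contains the tag."""
    needle = tag.strip().lower()
    return sorted(lang for lang, words in word_lists.items() if needle in words)


def tags_for_language(tags, language, word_lists=WORD_LISTS):
    """Keep only the tags that are probably in the given language."""
    cache = {}  # each distinct tag is checked against the dictionary once
    kept = []
    for tag in tags:
        if tag not in cache:
            cache[tag] = guess_language(tag, word_lists)
        if language in cache[tag]:
            kept.append(tag)
    return kept


popular = ["Music", "Musique", "Música", "chat", "leuk"]
print(tags_for_language(popular, "fr"))
```

Tags that appear in no word list are simply dropped here; showing them anyway would be another way to get a “mostly French” page.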

Another way is to look at the language of the source (rss feed, …) and assume the tag is in the same language. Tricky - I’m not sure this will work. There might be other algorithms as well - if I do a Google search for “Música”, it knows this isn’t English, because it asks me if I want to “Search for English results only”, so there is some algorithm going on there I assume (unless they also use the dictionary approach).

Later: I realized something else. Displaying tags in mostly some language, as opposed to exclusively in that language, is not necessarily a bug, it might be a feature. Many user populations around the world incorporate words in multiple languages in their vocabulary. The language namespaces I am talking about might not map perfectly to a specific language, but include words in other languages, and slang and such, and in this way be a much better representation of the real language of a certain user population than if we were to just use one language. So it’s not so much about language namespaces, it’s more about user population namespaces. Language is just a starting point and might be an easy way to group user populations.

As an aside, I think the real innovation with folksonomies will come from creating algorithms. It’s all about scalability. The way Google’s superior search algorithm made them the nr1 search engine, someone will invent superior algorithms in tagging and this may make them the nr1 tagging engine.

Back to languages. The most interesting aspect of the screenshot above isn’t that there are tags in other languages, it’s that the same tags show up in several languages. The tags in French and Spanish and such have their English translations right on the same page. This suggests that people seem to tag things similarly in different languages. Is there a way to create algorithms that take advantage of this fact? Also take into account that different people can tag the same things (pictures, bookmarks, …) in different languages.

An interesting language effect with tags was pointed out by Tanya, on this Flickr page for the tag chat. “Chat” means cat in French. Here’s a screenshot:

flickr tags chat

So different people have used “chat” and “cat” to tag similar items, and as a result Flickr knows that “chat” and “cat” are related tags. I’m not sure what that means for internationalization of folksonomies, but as an emergent i18n effect I think that’s pretty amazing.
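I don’t know how Flickr actually computes related tags. My guess is that it is something like counting how often two tags show up on the same item, so here is a minimal sketch of that assumption. The function name `related_tags` and the sample photos are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_tags(tagged_items):
    """Count, for every tag, how often each other tag appears on the same item."""
    co_occurrence = defaultdict(Counter)
    for tags in tagged_items:
        # Every unordered pair of distinct tags on one item counts once.
        for a, b in combinations(sorted(set(tags)), 2):
            co_occurrence[a][b] += 1
            co_occurrence[b][a] += 1
    return co_occurrence


# Made-up photos: each set holds the tags different people put on one photo.
photos = [
    {"cat", "kitten", "pet"},
    {"chat", "cat", "pet"},
    {"chat", "cat"},
    {"chat", "irc"},
]

co = related_tags(photos)
print(co["chat"].most_common(1))
```

With data like this, “cat” comes out as the tag most related to “chat” without anyone ever telling the system that one is a translation of the other.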

A third and similar i18n effect I found when playing around with this in Flickr is that of language namespaces. If you start following related tags in Flickr in a certain language, you will see many tags in that language. Here’s a screenshot of the tag “leuk”, which means “funny” in Dutch.

leuk on flickr

The related tags are in English, but in the see also tags we see a whole bunch of Dutch words: “ik, konijn, middelharnis, mooi, oudetonge, overflakkee, plankje”. And if you follow those you’ll see more Dutch words in the related and see-also tags, creating a kind of Dutch namespace almost. Again, I’m not sure how to use this exactly, but it’s pretty amazing to me that, in this early stage, there are already interesting i18n effects happening in the tagging space.

Comments welcome!


Saturday, January 15th, 2005

Technorati has started to aggregate and combine tags from feeds (it uses categories), Flickr and Del.icio.us. Here’s an example page: Technorati: Tag: humor. They support the rel=tag attribute. It’s brilliant.

Innovation around folksonomies is going fast, a lot of it driven I think by David Weinberger, who is writing a book about tagging and taxonomies and such, and is blogging about this stuff all the time, and happens to be on the board of Technorati. I’m excited - this is great stuff. I am also starting to experiment with tags on me-tv, a videoblog aggregator project of mine.

Saturday, January 15th, 2005

I’m helping out a bit on the IA of ourmedia.org, and I’m trying to come up with a list of common languages. These will be expanded later. It’s tricky - the list I finally came up with is just a list - WARNING: please don’t use this list for your project if you need common languages - it is fairly arbitrary.

It’s pretty hard to come up with a list of languages for a project like this. We want to allow anyone to upload stuff in any language, but we also wanted to have a short-ish list of languages (20 to 40) instead of just using the really long one (a decision that’s not necessarily the right one, but it was taken).

We didn’t have time to take into account which languages are producing more media that’s likely to be uploaded to ourmedia.org, or any factors like that. So this list is mostly based on the number of speakers these languages have, with some preference for European languages (target audience in terms of computer access right now).

- Top 10 Common Languages by amount of native speakers
- The UN uses these main languages on their website: English, Arabic, Chinese, French, Russian, Spanish.

This is the list I came up with, alphabetically. Users can use this to indicate the language of their media, and other things, or choose another language.

Arabic
Bengali
Chinese
Dutch
English
French
German
Hindi
Italian
Japanese
Javanese
Korean
Marathi
Polish
Portuguese
Russian
Spanish
Tamil
Telugu
Turkish
Urdu
Vietnamese

Again, don’t just use this list if you need common languages. It’s very arbitrary and it took me less than an hour to put this together. I share it here as a starting point only.

Please leave comments!

Thursday, January 13th, 2005

I am in the middle of grading papers for the XTech 2005 Conference, and I am thinking about grading, categorizing and comparing. The important thing with grading is that you get a consistent approach - if I only give A’s to a few people, and someone else gives A’s to almost everyone, it’s not consistent. I think the XTech people average it out by manually comparing all the grades and comments.

When you grade or categorize, it is important to have something to compare to. The first thing I did when I started was to have a look at all the papers. In my head, I averaged them out, and then started grading. Similar effects happen with categorizing: it may be important to see how others have categorized this item, or what other items have been categorized using the categories you assigned to this item.

Excuse my rambling, but this reminds me of a story I read in the New York Times a long time ago, about grading student papers and how the graders got trained. The challenge is to get a consistent grade, and the article made it sound as if they did have a good system to get that. Lots of training was involved, with example papers and grades, and regular refreshing of the training, to make sure graders were still following the standard. The article might have been called “Grading This Article? First, Take Time to Learn the Rules”, but I’m not sure because the NYT doesn’t let you access old articles without paying up.

Then again, while searching for this article I found another one: “Grading Mistakes Caused More Than 4,000 Would-Be Teachers to Fail a Licensing Exam” (the NYT pisses me off with this closed linking policy by the way).

Librarians tend to say categorizing can only be done after a lot of training. And the stats show that, even with trained indexers, indexing terms differ something like 60% between indexers (Bella, correct me on the numbers here). Of course, indexing isn’t the same as categorizing, but that’s a lot of inconsistency for trained people.

Anyway, my point is, I think you can build in the right kinds of feedback to make some kinds of categorizing pretty efficient. And we haven’t explored these kinds of feedback very much yet - they’re specific to a computing environment, in other words, we didn’t have these possibilities 20 years ago. I have more thoughts about this, will report back later!

Wednesday, January 12th, 2005

Entertaining cultural differences in manual.

Wednesday, January 12th, 2005

First episode (Quicktime movie).

Designing the relationship between content and locales

Wednesday, January 12th, 2005

My series of posts on international information architecture:

  1. Translating taxonomies and categories
  2. Translating categories, translating terms
  3. Translating the Dewey Decimal Classification system
  4. Designing the relationship between content and locales (this post)
  5. Emergent i18n effects in folksonomies
  6. The Maori versus Dewey, and why limiting access can be culturally appropriate.

I am trying to gather thoughts on the various structures and patterns that occur when creating multiple versions of a website in multiple locales. I hope to create a model, a way of thinking about these structures that can help when making decisions about things like:

  • What content should be translated?
  • From which website to which website?

We can come up with many models (models schmodels), the question is: is this a useful one? Does it aid your thinking? This is early work, so any feedback (shoot it down!) is greatly appreciated.

Let’s get started.

When you are creating versions of a website in multiple locales, the simplest case is when you just translate everything on the website. One on one translation. You start with a master locale, and just translate it into another locale. This is rare though – very few translation projects are this simple.


One on one translation.

The second, more common case is selective translation: you have a master locale (often English), and other locales are partially translated.


Selective translation.

There are various types of selective translation: you can do a summation, where multiple pages or whole sections of the master website are replaced by just one page in the translated version. Or you can just not translate parts of the website: removal. Most projects do a bit of both.

A third case is when you have a master locale, but also original content in the translated locale: original content.


Original content

The original content in the translated language can be used as a master for translation into yet another language.

For example, your master locale is English-US, the translated locales are English-Canadian and French-Canadian. (French is an official language in Canada, and there are certain legal requirements to provide information in both official languages.) You might have a partial original content translation from English-US to English-Canadian, in other words, you take parts of the English-US content, and create parts of the content for English-Canadian from scratch. Then you might do a one-on-one translation from English-Canadian to French-Canadian.

This example can be described as a grouping of locales. Many countries or regions have legal requirements (and human needs) to provide content in various official languages. If you have an intranet in Canada, you must provide content in French even if you only have 1 employee in Montreal who speaks French. In Belgium, you should provide content in French and Dutch, since half the country speaks French and the other half Dutch. If your locale is South America, you’d better provide content in both Spanish and Portuguese, and maybe a few other languages as well.

When you are creating your content-locale structure, grouping locales often makes sense, in that a certain locale can become the master of all local languages, like in our Canadian example. The content needed for this group of locales is the same or very close.

Finally, sometimes almost no content is directly shared. In this case, we’re just talking about separate websites. It is a valid option, but I won’t discuss it in much depth here.

So, to recap, we have 4 simple ways to connect locales:

  • One-on-one translation
  • Selective translation (summation and removal)
  • Original content
  • Grouping of locales.

These structures can help us design the relationship between content and locales for our websites.

When you start designing the relationship between content and locales, you’ll often find that the structure you come up with is different for different types of content. Technical support information may be translatable directly in a one-on-one translation. You might not sell exactly the same products or services in all markets, so marketing content might require a partial original content translation. And so on.

So here is an example of how you might design the relationship between content and locales for a public product website. We are distinguishing a few different content types, and for each content type we provide different relationships.


Wednesday, January 12th, 2005

Slope One Predictors for Online Rating-Based Collaborative Filtering: an easy to implement way to do “people who liked this also …”. I don’t really understand the paper though, Daniel will hopefully soon post some SQL code.
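For what it’s worth, here is a small sketch of the weighted Slope One scheme as it is commonly described: for every pair of items, store the average difference between their ratings, then predict a missing rating by adding those differences to the ratings the user did give, weighted by how many users rated both items. This is not the paper’s code or Daniel’s SQL; the function names and the sample ratings are made up.

```python
from collections import defaultdict


def train(ratings):
    """ratings maps user -> {item: rating}.

    Returns (diffs, counts): diffs[i][j] is the average of
    (rating of i - rating of j) over users who rated both items,
    and counts[i][j] is the number of such users.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, rating_i in user_ratings.items():
            for j, rating_j in user_ratings.items():
                if i != j:
                    sums[i][j] += rating_i - rating_j
                    counts[i][j] += 1
    diffs = {i: {j: sums[i][j] / counts[i][j] for j in sums[i]} for i in sums}
    return diffs, counts


def predict(user_ratings, item, diffs, counts):
    """Weighted Slope One prediction of one user's rating for item."""
    numerator = 0.0
    denominator = 0
    for j, rating_j in user_ratings.items():
        if j != item and j in diffs.get(item, {}):
            weight = counts[item][j]
            numerator += (rating_j + diffs[item][j]) * weight
            denominator += weight
    return numerator / denominator if denominator else None


# Made-up ratings on a 1 to 5 scale.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
diffs, counts = train(ratings)
print(predict(ratings["lucy"], "A", diffs, counts))
```

In this example, A is rated 0.5 higher than B on average (two users) and 3 higher than C (one user), so Lucy’s predicted rating for A is ((2 + 0.5) * 2 + (5 + 3) * 1) / 3.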

Will the new $500 Apple mini work for my mom?

Tuesday, January 11th, 2005

Tim Bray asks if the new $500 Apple Mini is good for his mom. I asked myself exactly the same question today.

My mom reads email, and she looks at a website or two every now and then. And writes something in Word and prints it out. She also likes pictures we send her by email. Will the mini work for her?

Read emails: no problems there.

Write things: OpenOffice has a Mac version, and will do just fine for her. The printer won’t fit: I don’t see a serial port there, so we’ll have to buy a new one. But it seems that the cheap kind of inkjet printer (where they make money off the cartridges) will work on Macs as well, so that’s fine.

Hardware: I have a USB keyboard lying around, so that’s good. I’ll have to buy a mouse. I’m not sure the old monitor will work, it’s a few years old. If I have to buy a new one that would suck. It’s got that standard blue PC monitor connection, but I don’t see that on the Mac. Will that work?

The learning curve.
Tim was right, however easy the OS is, for my mom it’s just another learning curve. Then again, the biggest problem with Windows is those error dialogs popping up, confusing her. Maybe the Mac really *is* easier and will make her life less stressed. The “it just works” approach is perfect for my mom, if it really does just work. What do you think? Should I get her one?

Monday, January 10th, 2005

Christina pointed me to The ESP Game: Labeling the Web. It is a game format for labeling images on the web. Simply brilliant.

Monday, January 10th, 2005

James Tauber : Translations, Glosses, Tags and Folksonomies. I’ll parse this later and report back.

Sunday, January 9th, 2005

Jay talking about, well, stuff.

*Pixelcharmer: Field Notes: L’iChat

Friday, January 7th, 2005

Tanya points out how even uncontrolled vocabularies can show some nice multilingual effects.

Friday, January 7th, 2005

It turns out clean URLs (using mod_rewrite) really make Google happy (I thought they might have gotten over that), so go Google, index the India guide.

Friday, January 7th, 2005

Wait, Britney Spears is blogging?

Friday, January 7th, 2005

A good article explaining the reasons why just adding a bunch of navigation stuffs to a site isn’t always gonna work: Navigation blindness.

Friday, January 7th, 2005

Why Politicians Need Weblogs, and 10 reasons why politicians should blog.

Friday, January 7th, 2005

(via Joho) Brian Storms suggests Taggle - a search engine for tags that would federate all the tags from services like Flickr and such.

Friday, January 7th, 2005

Lou provides some much-needed perspective in Folksonomies? How about Metadata Ecologies?

Translating the Dewey Decimal Classification system

Friday, January 7th, 2005

My series of posts on international information architecture:

  1. Translating taxonomies and categories
  2. Translating categories, translating terms
  3. Translating the Dewey Decimal Classification system (this post)
  4. Designing the relationship between content and locales
  5. Emergent i18n effects in folksonomies
  6. The Maori versus Dewey, and why limiting access can be culturally appropriate.

The Dewey Decimal Classification system (used in libraries throughout the world to classify books) has been translated into many languages, so I figured I’d ask their Editor in Chief, Joan S. Mitchell, about their experience. I was particularly interested in their approach to categories as “concepts”, and their approach to developing a taxonomy of concepts, regardless of translatability.

Joan S. Mitchell: “Many DDC translations contain more detailed developments in selected areas than found in the English-language standard edition. For example, the province of Rovigo has one number (–4533) in Table 2 in the English-language full edition; there are nine subdivisions listed under this number (representing parts of Rovigo province) in the Italian translation of the full edition.”

In other words, local translations have different requirements - the English version may be content with providing geographical categories for Italy up to the level of a province, but people in Italy may be interested in having categories for subdivisions of provinces.

Here’s a case study describing this as well: Beall, J. 2003. Approaches to expansions: case studies from the German and Vietnamese translations. Presented at the Classification and Indexing - Workshop, World Library and Information Congress: 69th IFLA General Conference and Council. 1-9 August 2003, Berlin. http://www.ifla.org/IV/ifla69/papers/123e-Beall.pdf

I also asked her about terms or categories that are not easily translatable.

Joan S. Mitchell: “In nearly every translation, we have encountered terms that are not easily translated from one language to another. There are a number of strategies we employ to address these issues.

In some cases, the term is left in English, e.g., “land-grant colleges” appears in English in the German translation because it is part of the name of a category and there isn’t a suitable equivalent in German.

Sometimes, we adjust the definition of category itself to accommodate translation issues. For example, the popular German form of bowling is skittles (ninepins); in the US, the most popular form is tenpin bowling. After an inquiry from the German translators, we added a note to indicate that the category of bowling covers both types.

In the case of examples that illustrate categories, we routinely encourage translators to substitute examples that have meaning in the specific language group for those used in the English-language edition. We also encourage the inclusion of index terms that are meaningful to the language group and compatible with the definition of the category. The index to a translation can include different synthesized numbers and accompanying index terms from those found in the English-language standard edition.”

I was curious about their approach to the taxonomy as a “concept tree”. This assumes that all concepts can be expressed in all languages, although (as in the bowling example above), sometimes categories are adjusted with feedback from translators.

In my previous post, “Translating categories, translating terms”, I discussed the problems with translating categories. My conclusion was that sometimes, categories are just not translatable. People think too differently. The DDC has a different philosophy: they assume ALL concepts are translatable.

Joan S. Mitchell: “Translatability is not a criteria for adding new concepts; however, there has to be a certain threshold of published material (“literary warrant”) for the inclusion of a term or the expansion of a category. The literary warrant we use to develop the English-language standard edition is based primarily on the contents of OCLC’s WorldCat database, the world’s largest bibliographic database (over 57 million records).”

For non-library folks: “literary warrant” just means that, if you have a lot of books about a subdivision of a province in Italy, you should probably have that subdivision as a category, so people can find those books. The concept is the same with products you sell, or services, or whatever. Most companies have different products or services in different parts of the world, or have a different amount of information available (maybe not all technical manuals have been translated). Therefore, the categories should be of different granularity in different locales.
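Here is a minimal sketch of that idea in Python. It is my own illustration, not how the DDC editors actually work: the threshold value, the function name `categories_for_locale`, and the two tiny catalogues are all made up. A subdivision only becomes its own category when enough items fall under it; otherwise its items are filed under the parent.

```python
from collections import Counter

# Hypothetical threshold: minimum number of items before a subdivision
# earns its own category. The real DDC threshold is not given in the post.
WARRANT_THRESHOLD = 3


def categories_for_locale(items, threshold=WARRANT_THRESHOLD):
    """Return the category assigned to each (parent, subdivision) item.

    A subdivision becomes its own category only when at least `threshold`
    items fall under it; otherwise the item is filed under the parent.
    """
    counts = Counter(items)
    return [
        f"{parent} > {sub}" if counts[(parent, sub)] >= threshold else parent
        for parent, sub in items
    ]


# Made-up catalogues, for illustration only.
english_catalogue = [("Rovigo", "area A"), ("Rovigo", "area B")]
italian_catalogue = [("Rovigo", "area A")] * 4 + [("Rovigo", "area B")]

print(sorted(set(categories_for_locale(english_catalogue))))
print(sorted(set(categories_for_locale(italian_catalogue))))
```

With these made-up numbers the English catalogue ends up with one category for the whole province, while the Italian one gets an extra category for the subdivision that has enough material behind it.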


Friday, January 7th, 2005

Jay was doing more experiments with videobloggers, iChat and public access tv tonight (about 20 minutes ago). Here’s the movie.

vloggercon

Thursday, January 6th, 2005

The videobloggers have been quietly working in the slipstream of the podcasters, fairly unnoticed, and that has been a great advantage. Now vloggercon, really just a meeting of a bunch of videoblogging geeks, is coming up, and you can hear the spotlights (they squeak) slowly turning towards us. I am confident the videobloggers will survive them, and when the storm is over we’ll continue to work on what we believe is tv 2.0. It’s about the long tail. It’s about conversations. We’re not entirely sure yet what it’s about, exactly, but it’s not about replacing tv. TV is not really relevant to this. Videoblogging is new. It’s also surprisingly different from text blogging, which is why we need quiet time to figure things out, things like language, discussion and voice.