8 Tips for Globalizing Your Software - Part 1

by Shawn O'Hern 22. January 2010 12:00

Today I'm going to give you some tips for globalizing (or internationalizing) your applications. But first, let me define what globalization is and isn't.

Globalization is the process of preparing your application for a global audience.

Globalization is not the same as localization. Localization is the act of tailoring your application to any number of different languages and cultures (or locales). It involves things like translating your user interface into the target language, and adding culture-specific information. When you complete the localization process, you will have a customized copy of your application for each target language/culture. But globalization is not about tailoring your application to any one language or culture—just the opposite, in fact.

Globalization is about removing all assumptions about the target language or culture. You can—and should—always globalize your applications, even if you don't localize them.

Now that we have defined the word, let's look at some globalization best practices you can follow to ensure that your software is truly world-ready. We will look at the first four tips this month; the remainder will be posted next month.

1. Don't use national flags to represent languages

Wrong: [image missing: a language selector made of national flag icons]

A lot of web sites use flag icons as a compact, visually appealing way for users to select the UI language. This is wrong for the simple reason that languages and countries are not interchangeable. A language can be spoken in multiple countries, and a country can have multiple languages spoken within its borders. Consider: India has over 20 officially recognized languages. Suppose your application will support Hindi, Punjabi, and Sanskrit. Are they all going to be represented by the Indian flag? On the other side of the coin, consider all the countries where English is an official or de facto national language. Which flag should represent English? The United States? The United Kingdom? Canada? Australia? New Zealand? South Africa? Liberia? The list goes on.

Solution: Just use the names of the languages themselves (with each written in that language).

Correct: [image missing: a language selector that lists each language by its own name]
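As a rough sketch of the idea (in Python rather than the VB.NET used elsewhere in this post), a language picker can be driven by a list of language names, each written in its own language. The `LANGUAGES` list and the `language_menu` function are hypothetical names made up for this example.

```python
# Hypothetical list of supported UI languages:
# (language tag, name of the language written in that language).
LANGUAGES = [
    ("en", "English"),
    ("hi", "हिन्दी"),
    ("pa", "ਪੰਜਾਬੀ"),
    ("sa", "संस्कृतम्"),
]


def language_menu(languages):
    """Return one menu label per language, using the language's own name."""
    return [f"[{tag}] {name}" for tag, name in languages]


if __name__ == "__main__":
    for label in language_menu(LANGUAGES):
        print(label)
```

No flags are involved, so three languages spoken in India each get their own distinct entry.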

2. Don't build sentences dynamically

An incorrect example of dynamically building a sentence in code by concatenating strings (in VB.NET):

"Log me out after " & minutes.ToString() & " minutes"

This is wrong because different languages have different sentence structures. If you translate those fragments individually and then try to concatenate them back in the same order, the resulting sentence may not make any sense.

Instead, you should keep the whole sentence together in a single string; if you need to fill in values, then you can use placeholders. A better way to implement the previous example is:

String.Format("Log me out after {0} minutes", minutes)

This way, the entire sentence will be translated as a single linguistic unit, and the translator will put the placeholder in the correct position in the translated sentence.
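To make the point concrete, here is a minimal sketch in Python (not the VB.NET above). The `MESSAGES` table and the `logout_message` function are hypothetical, and the German and Japanese strings are illustrative rather than professionally translated. Each translation is a whole sentence, and the placeholder sits wherever that language needs it.

```python
# Hypothetical translation table: every entry is a complete sentence
# with a named placeholder that the translator is free to move.
MESSAGES = {
    "en": "Log me out after {minutes} minutes",
    "de": "Nach {minutes} Minuten abmelden",
    "ja": "{minutes}分後にログアウトする",
}


def logout_message(locale, minutes):
    """Fill the placeholder in the sentence for the given locale."""
    # Fall back to English when the locale has no translation yet.
    template = MESSAGES.get(locale, MESSAGES["en"])
    return template.format(minutes=minutes)


if __name__ == "__main__":
    for locale in ("en", "de", "ja"):
        print(logout_message(locale, 15))
```

Notice that the number lands in the middle of the English sentence, near the start of the German one, and at the very beginning of the Japanese one. Concatenating translated fragments in a fixed order could not produce all three.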

Corollary: Don't build sentences out of UI controls

Wrong: [image missing: a sentence with an input control embedded in the middle of it]

This is wrong for the same reason that dynamically building sentences in code is wrong. If you translate both halves of that sentence individually, your control may end up in the wrong part of the sentence when you're done. The correct way to handle this situation is to take the control out of the sentence completely.

Correct: [two images missing: layouts that keep the control outside the sentence]

3. Don't use ASCII encoding

Hey, 1985 called, and it wants its character encoding back. Seriously, the only excuse you have to be using ASCII in this day and age is if you are writing embedded software for some sort of device that only has 4 kilobytes of memory, so you absolutely cannot spare more than 1 byte per character. For all other purposes, get in the habit of using Unicode for all your string-handling needs. Then you can rest assured that your application won't break when one of your Russian customers enters a name in Cyrillic script. As far as string encodings go, UTF-8 is almost always a good choice. UTF-8 uses a single byte for characters in the ASCII range, and it scales up to as many as four bytes per character to handle any Unicode character.
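A short Python sketch shows the difference. The helper names `utf8_length` and `fits_in_ascii` are made up for this example; the `encode` calls are standard Python string methods.

```python
def utf8_length(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_ascii(text):
    """True only if every character is in the 7-bit ASCII range."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for name in ("Ivan", "Иван"):
        print(name, utf8_length(name), fits_in_ascii(name))
```

The Latin spelling takes four bytes in UTF-8 and fits in ASCII. The Cyrillic spelling takes eight bytes in UTF-8 and cannot be encoded as ASCII at all.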

4. Accept special characters as input

I'm going to tell you a story. I work for the U.S. government, and I use a lot of web applications at work. They are very expensive applications, on the order of tens of millions of dollars each. Now, you might think that applications that cost that much money would be cutting-edge, shining examples of best coding practices. Well, you would be wrong.

My last name has an apostrophe in it. On my first day of work, when I received my work email address, I noticed that IT had decided to include the apostrophe in my address; I personally would have left it out, but I thought little of it at first. But then I tried registering for some of the web apps that I need to do my job. And naturally, three of them wouldn't accept an email address with an apostrophe in it. These apps are fairly central to my work, so not being able to register really cut down my productivity. The first application was fixed almost immediately (which I greatly appreciate). The second took about 10 months to be patched (it also took the complaints of a second employee who had the same problem as I did). But the third application is still broken, and will be for the foreseeable future; when I asked their help desk what I could do, they told me that the only solution was to get a new email address. Um, thanks a lot.

There is no conceivable reason to disallow apostrophes (or any other special characters) in email address fields. Apostrophes are obviously legal in addresses, because I send and receive email with mine every day. I believe there are three possible explanations:

  1. The developer gave special meaning to apostrophes for some reason. Maybe the developer was splitting fields on apostrophes, or perhaps the developer was building an SQL query by concatenating strings instead of escaping or parameterizing the values (if you don't know why that's bad, try researching "SQL injection attack").
  2. The developer made a conscious, arbitrary decision to disallow apostrophes in email addresses. Not for any technical reason, but because "I have never seen an email address like that, so they must not exist."
  3. The developer was lazy or naive when writing the validation function and didn't consider all the possibilities for valid characters. I cringe every time I see [A-Za-z] in a regular expression. There are more than 26 letters in the world, people!
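For the first explanation, the usual fix is to pass values to the database as parameters instead of pasting them into the SQL text. Here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, the function names, and the example address are all hypothetical.

```python
import sqlite3


def register(conn, email):
    """Store an email address; the ? placeholder keeps the value out of the SQL text."""
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def find(conn, email):
    """Look an address up again, also with a parameter."""
    row = conn.execute(
        "SELECT email FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None


def demo():
    # In-memory database with a hypothetical users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
    address = "shawn.o'hern@example.com"  # made-up address containing an apostrophe
    register(conn, address)
    return find(conn, address)


if __name__ == "__main__":
    print(demo())
```

Because the apostrophe never becomes part of the SQL statement itself, it cannot end a string literal early, and there is nothing to escape or reject.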

Please, fellow developers, please learn from this story. When you are writing your software, consider the plight of those who have extended characters in their names. And when you apply this lesson, make sure that you replace "email address" with "all fields", and "apostrophe" with "any character".
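One way to apply this to a name field is sketched below in Python. The rule in `is_plausible_name` (any non-empty text without control characters) is an assumption chosen for illustration, not a standard, and the function name is made up.

```python
import re

# The kind of pattern to avoid: it rejects apostrophes and non-English letters.
ASCII_ONLY = re.compile(r"^[A-Za-z]+$")


def is_plausible_name(value):
    """Accept any non-empty text that contains no control characters."""
    stripped = value.strip()
    return bool(stripped) and all(ch.isprintable() for ch in stripped)


if __name__ == "__main__":
    for name in ("O'Hern", "Zoë", "Иван", "   "):
        print(repr(name), bool(ASCII_ONLY.match(name)), is_plausible_name(name))
```

The ASCII-only pattern rejects the first three names. The permissive check accepts all three and still rejects the blank entry.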

Come back next month for the last four globalization tips!


4 Website Pseudo-Security Techniques That Don't Protect You

by Shawn O'Hern 11. November 2009 12:00

Today I will present four of the top security features that web sites implement to try to keep you safe online.  And then I will tell you why each one is bogus.  While these techniques range from the merely annoying to the moderately dangerous, the one thing that they have in common is that they are so misguided that you should be slightly concerned if you see these on sites that you use.  If these features pass as good ideas in the minds of the developers, then who knows what other untold horrors lurk in the sites' code?

1. The Virtual Keyboard

A virtual keyboard is an on-screen keyboard consisting of buttons that represent keys on your keyboard. The keys are usually placed in a random order, and you are prevented from typing your password normally; you have to enter it with the on-screen keyboard. So-called security "gurus" claim that this prevents keyloggers from stealing your password. Whether or not that is true, it is irrelevant, because software running on your computer can easily get your password anyway. One way is through the accessibility interfaces that many browsers expose. These interfaces allow other running programs to read information on the web pages you are browsing, including the password you just virtually typed in. This isn't a bug, it's a feature: there are a number of legitimate uses for this functionality, and screen readers are but one example. The point is that this type of power can be used by programs for good or for evil. So if you have an evil program on your computer that's reading your keystrokes as you type them, then you need to go to the root of the problem, which is the malicious software itself, instead of hoping that a band-aid solution like a virtual keyboard will protect you.

And we haven't even gotten into how virtual keyboards kill usability and accessibility. Using the mouse to click virtual keys that are randomly placed and are the size of microSD cards is a tedious affair. It's even worse if you don't have a mouse and are forced to use your (real) keyboard by pressing Tab and Shift+Tab dozens, if not hundreds, of times. GRRRRRRRRRR.

2. Blocking Right Clicks

This feature goes beyond pseudo-security and touches upon pseudo-copyright protection also.  Here are just a few of the reasons that developers may use to justify blocking the right-click context menu on web pages:

  • To prevent users from downloading images on the page
  • To prevent users from viewing the page source code
  • To have greater control over the users' experiences

Notice a common theme in that list?  It's all about limiting users and what they can do.  Good software shouldn't limit people, it should enable people.  Furthermore, this technique is a joke because it's not even capable of keeping users from doing any of those things.  It provides no protection whatsoever and it just hinders the majority of visitors to the site who are not image thieves or computer hackers.  So if you ever come up against a context menu that won't open, just use any of these techniques and you'll be a 1337 h4x0r too:

  • Press the Application key (on Windows keyboards, the key with a picture of a menu on it)
  • Press Shift+F10 (Windows only)
  • Disable JavaScript in your browser and right-click away

3. The Security Image

This technique seems to be favored by banks and credit unions.  Security images are meant to protect against phishing attacks.  The theory of operation is that you choose an image when you sign up for an online account, and then on subsequent visits, the server will display your image.  That way, when you see your image, you will know that the site is authentic and not a lookalike phishing site, because the lookalike site doesn't know what your image is.  This technique is actually a fairly good idea, except for the fact that attackers can still very likely trick many users into logging into their fake sites by lying about why their images can't be shown.  All they have to do is say something like "the image server is down," and users will fall for it.

Security images aren't necessary because there is already a superior method in use today for verifying the identity of servers: SSL/TLS certificates (SSL, the Secure Sockets Layer, has since been succeeded by TLS). Certificates are superior because they don't place the burden on you to determine if the site is authentic or not; your browser checks the certificate for you. If a site presents a valid certificate, then you know that a certification authority has verified that the operator controls that domain name and, for extended-validation certificates, has also investigated the company operating the site. So remember, next time you visit a secure site, look for the lock icon in your browser and check that the domain name is the one you expect!
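The check that the browser performs on your behalf can be sketched with Python's standard `ssl` module. The function names `verified_context` and `peer_certificate` are made up for this example, and the host you pass in is up to you.

```python
import socket
import ssl


def verified_context():
    """Default client settings: verify the certificate chain and the hostname."""
    return ssl.create_default_context()


def peer_certificate(host, port=443):
    """Connect to host and return its certificate details.

    Raises ssl.SSLError if the certificate is invalid or does not match host.
    """
    context = verified_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


if __name__ == "__main__":
    ctx = verified_context()
    print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```

A lookalike site on a different domain cannot present a certificate that passes this check for the real domain name, no matter what excuse it shows the user.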

4. The Security (or Challenge) Question

Security questions are those personal questions that sites require you to answer up front, so that you can answer them again in the future if you ever need to prove your identity.  They normally come in sets of three, and they usually require such arcane knowledge as your favorite color or your first pet's name.  The questions are such that you should be able to easily recall the answers from memory, but other people should not be able to guess those answers.  That sounds great in principle, except for the fact that the answers to those questions are usually too easy to come by.  Mother's maiden name?  Place of birth?  That kind of information is available in public records!  And any other information an attacker would need to answer those questions could be obtained via social engineering attacks.  This type of "security" feature is dangerous because while it is meant to make things easier for the users, it really just undermines their security and the security of the entire system.

So if you are ever forced to use security questions, how should you handle them?  Use random text for the answers.  Want to know my favorite movie?  S]ujv1)_EB8D.  And the color of my first car?  KSf$.CPP]uB2vry.  Of course, random answers are much more difficult to remember (which is the whole point), so you may need some sort of system for remembering them—not to mention all of your passwords.
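Random answers like those can be generated with Python's standard `secrets` module. This is a minimal sketch; the `random_answer` function, the character set, and the default length of 16 are arbitrary choices made for this example.

```python
import secrets
import string

# Characters to draw from: ASCII letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_answer(length=16):
    """Return a random string suitable as a throwaway security answer."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print("Favorite movie:", random_answer())
    print("Color of first car:", random_answer())
```

Some sites restrict which characters an answer may contain, so the alphabet may need trimming for them.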


Copyright © 2004-2009 Shawn O'Hern. All rights reserved.
