Localization/Language

From Wikiversity

For the purpose of language planning and defining language sets for new or existing products, a language or language variant should be defined as the outcome of localization work for a product following a certain set of conventions:

  • Engineering conventions: Being enabled in the localization infrastructure (i.e. translation management system, engineering system, build system, etc.) and marked with a unique language identifier (for example, en-US)
  • Translation conventions: Following a common orthography in a certain writing system and, as is common in the translation industry, the rules of a style guide and a terminology set
  • UI conventions: Following a certain set of conventions around user interface adjustment, like mirroring or font sizes

The written language of an application or software product might differ from the language of its spoken components (for example, French (Canada) for spoken content and French (France) for written content).

Common confusions pertaining to language in terms of localization

There are a few sources of confusion when speaking of languages in terms of localization, most prominently:

  • Confusion about writing systems: some languages use two (or more) official writing systems and/or orthographies. For instance, Serbian is written in both the Cyrillic and Latin alphabets. In the context of localization, then, Serbian Cyrillic and Serbian Latin count as two distinct languages.
  • Confusion about “localization infrastructure languages” and written languages: in some cases there is a discrepancy between the “engineering” code of a language and the actual language. For example, a translation tool might enforce the use of a language with the code es-ES, while the released products are in fact supposed to be in an “international” version of Spanish.
  • Confusion about written and spoken languages: in terms of reach, the numbers cited for language use are most often those for the spoken language (for example, from Ethnologue), while the definition of language sets mostly focuses on the written language. For some languages the distinction between written and spoken is highly relevant; prominent examples include Arabic and Chinese.
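
The first two points can be illustrated with a minimal Python sketch. It assumes simple identifiers of the form language[-Script][-REGION] (the style used by en-US above) and is not a full language tag parser. The names parse_tag, released_tag and INFRASTRUCTURE_TO_RELEASED are hypothetical, and using the bare code "es" to stand for “international” Spanish is an assumption made for illustration only.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple language[-Script][-REGION] identifier into its parts.

    Simplified on purpose: only language, script and region are recognized;
    any other subtag is ignored.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()   # four letters: a script, e.g. Cyrl or Latn
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()   # two letters or three digits: a region
    return LanguageTag(language, script, region)


# Hypothetical mapping from the code a tool enforces to the language that
# is actually released. "es" for "international" Spanish is an assumption.
INFRASTRUCTURE_TO_RELEASED = {"es-ES": "es"}


def released_tag(infrastructure_code: str) -> str:
    """Return the released language for an infrastructure code."""
    return INFRASTRUCTURE_TO_RELEASED.get(infrastructure_code, infrastructure_code)


# Serbian Cyrillic and Serbian Latin differ only in script, yet they are
# planned as two distinct languages.
serbian_cyrillic = parse_tag("sr-Cyrl-RS")
serbian_latin = parse_tag("sr-Latn-RS")
print(serbian_cyrillic != serbian_latin)
print(released_tag("es-ES"))
```

Keeping the infrastructure code and the released language as two separate values makes the discrepancy explicit rather than leaving it as tribal knowledge about the tool.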

Language and Content

  • In addition to translation software, human intervention is often needed because machine translation may change the meaning of the content. For example, the "Got Milk?" commercial line translated directly into Spanish comes out as "Are You Lactating?". In order to properly market to Latino families, the commercial was changed to "And You, Have You Given Them Enough Milk Today?"
  • The length of translated text differs between languages. For example, English text grows by 15-20% when translated into French and by up to 80% when translated into Hindi, whereas it shrinks by 30-50% when translated into Chinese. Interface design must therefore account for these different space requirements, and menus, boxes, logos or other graphics with embedded words must be self-sizing.
  • Some technical terms may not exist in some languages. To solve this problem, a terminology base or glossary must be created and maintained.
  • Punctuation marks vary between languages. For example, interrogative sentences in English end with a question mark (?), while in Greek they end with a mark that looks like a semicolon (;).
  • Placement of punctuation marks also varies. For example, in English the percent sign (%) is placed directly after the number it refers to, while in Turkish the percent sign is placed immediately before the number.
  • Content may be culturally loaded, and translating a particular word or sentence should take that into account. For example, "meeting" in Japan implies a public ritual of consensus, with all brainstorming and conflict already worked out in a social setting outside the office. The word carries a different meaning in the U.S. and some European countries.
  • A language can differ depending on the country in which it is used, e.g. "en-GB" and "en-US", because regions that share a language may speak and spell it with nuanced differences. It is advisable to specify both a language code and a country code so that the correct variant is localized.
  • Content should also comply with each country's laws. For example, comparative advertising (brand A is better than brand B) is legal in the U.S. but is more tightly restricted in some other countries, such as Germany. In many countries, the collection of personal data is subject to national law. For example, according to Russia's Personal Data Localization Law of 2015, all personal information provided by Russian citizens through website registration, online purchases or emails is considered personal data and must be stored inside Russia.
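
The space requirements mentioned in the point about text length can be estimated with a short Python sketch. It uses the percentages quoted above as rough worst-case factors. The names EXPANSION_PERCENT, estimated_length and required_capacity are hypothetical, and the sketch counts characters only, so it says nothing about rendered width (Chinese characters, for instance, are usually wider than Latin letters).

```python
# Worst-case length relative to English, in percent, based on the figures
# quoted in the list above. Integer percentages avoid rounding surprises.
EXPANSION_PERCENT = {
    "fr": 120,  # French: +15-20%
    "hi": 180,  # Hindi: up to +80%
    "zh": 70,   # Chinese: -30% to -50%; 70 is the smallest reduction
}


def estimated_length(english_text: str, language: str) -> int:
    """Estimate the translated length in characters, rounded up."""
    percent = EXPANSION_PERCENT.get(language, 100)  # unknown: assume no change
    return (len(english_text) * percent + 99) // 100


def required_capacity(english_text: str, languages: list) -> int:
    """Characters a fixed-size UI element needs to fit every listed language."""
    lengths = [estimated_length(english_text, lang) for lang in languages]
    return max([len(english_text)] + lengths)


label = "Save changes"  # 12 characters in English
for lang in ("fr", "hi", "zh"):
    print(lang, estimated_length(label, lang))
print("capacity:", required_capacity(label, ["fr", "hi", "zh"]))
```

An estimate like this is only useful for flagging labels that are likely to overflow; self-sizing layouts remain the more robust answer.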

When a product is said to be released in a certain language, what does that mean?

Language means:

  1. The user interface language, or the language in which documentation or a website is written
  2. and/or written language for text input by the user
  3. and/or spoken language in language output (for example, in videos)
  4. and/or recognized spoken language for language input via voice recognition
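
One way to keep these four meanings apart in language planning is to record them separately for each release. The following Python sketch is only an illustration under that assumption; the class LanguageRelease and its field names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class LanguageRelease:
    """Language identifiers per modality; None means 'not offered'."""
    ui: Optional[str] = None             # 1. UI, documentation or website text
    text_input: Optional[str] = None     # 2. text typed by the user
    speech_output: Optional[str] = None  # 3. spoken output, e.g. in videos
    speech_input: Optional[str] = None   # 4. voice recognition

    def supported_modalities(self) -> List[str]:
        """Names of the modalities that have a language assigned."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


# Written French (France) combined with spoken French (Canada).
release = LanguageRelease(ui="fr-FR", text_input="fr-FR", speech_output="fr-CA")
print(release.supported_modalities())
```

Recording each modality on its own makes it visible when a product "in French" actually combines different variants, or supports only some of the four.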

Usage means that a user can use a language when:

  1. They can understand the UI and find it appealing enough to use it in the language. For example, a highly academic translation into Hindi might be neither understood nor appealing, so it goes unused and users choose English instead.
  2. They can enter text where text input is required in a spelling system they are familiar with, most often through education, sometimes through the use of printed material.
  3. They can understand spoken language wherever spoken language is used. In most cases this will be a standard variant of the language (if such a standard exists) or one that the user is familiar with through television or radio.
  4. They can use spoken language where spoken language input is required and be understood. Dialects or accents might impede that understanding.