language {tau}    R Documentation

Parse IETF Language Tag

Description

Extract language, script, region and variant subtags from IETF language tags.

Usage

parse_IETF_language_tag(x)

Arguments

x a character vector with IETF language tags.

Details

Internet Engineering Task Force (IETF) language tags are defined by IETF BCP 47, which is currently RFC 4646 (http://tools.ietf.org/html/rfc4646) and RFC 4647 (http://tools.ietf.org/html/rfc4647), and are used in a number of modern standards.

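Each language tag is composed of one or more “subtags” separated by hyphens. In the basic format currently supported, the subtags occur in the following order: a language subtag, an optional script subtag, an optional region subtag, and optional variant subtags.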

Language subtags are mainly derived from ISO 639-1 and ISO 639-2, script subtags from ISO 15924, and region subtags from ISO 3166-1 alpha-2 and UN M.49. (See package ISOcodes for more information about these standards.) Variant subtags are not derived from any standard. The Language Subtag Registry (http://www.iana.org/assignments/language-subtag-registry), maintained by the Internet Assigned Numbers Authority (IANA), lists the current valid public subtags.

See http://en.wikipedia.org/wiki/IETF_language_tag for more information.

Note in particular that so-called grandfathered and private use tags are currently not supported.
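
As an illustration of the basic format described above, the following base R sketch splits a tag at its hyphens and consumes the subtags in the order language, script, region, variant. It is not the implementation used by parse_IETF_language_tag; the helper names sketch_parse_tag and sketch_parse_tags are hypothetical, the subtag patterns are simplified from the RFC 4646 grammar, and two choices are assumptions of this sketch only: several variant subtags are joined with a hyphen, and a tag outside the basic format yields a row of NA.

```r
## Hypothetical helper (not part of tau): parse one tag of the form
## language[-script][-region][-variant...].
sketch_parse_tag <- function(tag) {
  missing_row <- c(Language = NA_character_, Script = NA_character_,
                   Region = NA_character_, Variant = NA_character_)
  parts <- strsplit(tag, "-", fixed = TRUE)[[1L]]
  i <- 1L
  ## TRUE if the i-th subtag exists and matches the pattern.
  is_next <- function(pattern) i <= length(parts) && grepl(pattern, parts[i])

  ## Language subtag: 2 or 3 letters, required.
  if (!is_next("^[A-Za-z]{2,3}$")) return(missing_row)
  out <- missing_row
  out["Language"] <- parts[i]; i <- i + 1L
  ## Script subtag: 4 letters, optional.
  if (is_next("^[A-Za-z]{4}$")) { out["Script"] <- parts[i]; i <- i + 1L }
  ## Region subtag: 2 letters or 3 digits, optional.
  if (is_next("^([A-Za-z]{2}|[0-9]{3})$")) { out["Region"] <- parts[i]; i <- i + 1L }
  ## Variant subtags: 5 to 8 alphanumerics, or a digit plus 3 alphanumerics.
  variants <- character()
  while (is_next("^([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$")) {
    variants <- c(variants, parts[i]); i <- i + 1L
  }
  ## Assumption of this sketch: join multiple variants with a hyphen.
  if (length(variants)) out["Variant"] <- paste(variants, collapse = "-")
  ## Leftover subtags mean the tag is outside the basic format.
  if (i <= length(parts)) return(missing_row)
  out
}

## Hypothetical vectorized wrapper: one row per tag, four named columns.
sketch_parse_tags <- function(x) {
  m <- vapply(x, sketch_parse_tag, character(4L), USE.NAMES = FALSE)
  matrix(m, ncol = 4L, byrow = TRUE,
         dimnames = list(NULL, c("Language", "Script", "Region", "Variant")))
}

sketch_parse_tags(c("de-CH", "sr-Latn-CS", "es-419", "x-private"))
```

Splitting first and matching each subtag in turn keeps the order of the subtags explicit; in this sketch the private use tag in the last call produces a row of NA rather than an error.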

Value

A character matrix with 4 columns named "Language", "Script", "Region", and "Variant", giving the corresponding subtags (or NA if these were missing from the language tag).

Examples

## German as used in Switzerland:
parse_IETF_language_tag("de-CH")
## Serbian written using Latin script as used in Serbia and Montenegro:
parse_IETF_language_tag("sr-Latn-CS")
## Spanish appropriate to the UN Latin American and Caribbean region:
parse_IETF_language_tag("es-419")
## All in one:
parse_IETF_language_tag(c("de-CH", "sr-Latn-CS", "es-419"))

[Package tau version 0.0-6 Index]