language {tau} | R Documentation |
Extract language, script, region and variant subtags from IETF language tags.
parse_IETF_language_tag(x)
x: a character vector with IETF language tags.
Internet Engineering Task Force (IETF) language tags are defined by IETF BCP 47, which is currently RFC 4646 (http://tools.ietf.org/html/rfc4646) and RFC 4647 (http://tools.ietf.org/html/rfc4647), and are used in a number of modern standards.
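Each language tag is composed of one or more “subtags” separated by hyphens. For the basic format currently supported, the subtags occur in the following order: a language subtag, optionally followed by a script subtag, a region subtag, and one or more variant subtags.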
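The subtag order of the basic format can be illustrated with a minimal base R sketch. This is not the implementation used by tau: the helper name split_basic_tag is hypothetical, the subtag shapes (2 to 8 letters for language, 4 letters for script, 2 letters or 3 digits for region, 5 to 8 alphanumerics or a digit plus 3 alphanumerics for variant) follow the RFC 4646 grammar, and joining several variants with a hyphen is an assumption made only for this example.

```r
# Hypothetical helper (not part of tau): split one basic-format tag
# of the form language[-script][-region][-variant...] into its subtags.
split_basic_tag <- function(tag) {
  parts <- strsplit(tag, "-", fixed = TRUE)[[1]]
  out <- c(Language = NA_character_, Script = NA_character_,
           Region = NA_character_, Variant = NA_character_)
  # The language subtag is mandatory: 2 to 8 letters.
  if (length(parts) == 0L || !grepl("^[A-Za-z]{2,8}$", parts[1]))
    stop("not a basic language tag: ", tag)
  out["Language"] <- parts[1]
  rest <- parts[-1]
  # Optional script subtag: exactly 4 letters.
  if (length(rest) && grepl("^[A-Za-z]{4}$", rest[1])) {
    out["Script"] <- rest[1]
    rest <- rest[-1]
  }
  # Optional region subtag: 2 letters or 3 digits.
  if (length(rest) && grepl("^([A-Za-z]{2}|[0-9]{3})$", rest[1])) {
    out["Region"] <- rest[1]
    rest <- rest[-1]
  }
  # Anything left must look like a variant subtag.
  if (length(rest)) {
    ok <- grepl("^([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$", rest)
    if (!all(ok))
      stop("unsupported subtag(s) in: ", tag)
    # Assumption: several variants are joined with a hyphen.
    out["Variant"] <- paste(rest, collapse = "-")
  }
  out
}

split_basic_tag("sr-Latn-CS")
split_basic_tag("de-CH-1996")
```

Because the script, region and variant shapes do not overlap, each subtag can be recognized by its form alone once the leading language subtag has been removed.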
Language subtags are mainly derived from ISO 639-1 and ISO 639-2, script subtags from ISO 15924, and region subtags from ISO 3166-1 alpha-2 and UN M.49. (See package ISOcodes for more information about these standards.) Variant subtags are not derived from any standard. The Language Subtag Registry (http://www.iana.org/assignments/language-subtag-registry), maintained by the Internet Assigned Numbers Authority (IANA), lists the currently valid public subtags.
See http://en.wikipedia.org/wiki/IETF_language_tag for more information.
Note in particular that so-called grandfathered and private-use tags are currently not supported.
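Since such tags are not supported, it may help to screen input before parsing. The following is only a sketch under stated assumptions: is_unsupported_tag and known_grandfathered are hypothetical names, not part of tau, and the grandfathered list is a small illustrative sample rather than the complete set from the registry.

```r
# Hypothetical pre-filter (not part of tau).
# Illustrative sample only; the full list of grandfathered tags is
# in the IANA Language Subtag Registry.
known_grandfathered <- c("art-lojban", "en-gb-oed", "i-default",
                         "i-klingon", "zh-min-nan")

is_unsupported_tag <- function(x) {
  x <- tolower(x)
  # The singleton subtag "x" introduces private use, either as the
  # whole tag ("x-...") or as a trailing sequence ("en-x-...").
  private <- grepl("(^|-)x-", x)
  private | x %in% known_grandfathered
}

tags <- c("de-CH", "x-whatever", "i-klingon", "es-419")
tags[!is_unsupported_tag(tags)]
```

Only the tags that pass this screen would then be handed to parse_IETF_language_tag().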
A character matrix with 4 columns named "Language", "Script", "Region", and "Variant", giving the corresponding subtags (or NA if these were missing from the language tag).
## German as used in Switzerland:
parse_IETF_language_tag("de-CH")
## Serbian written using Latin script as used in Serbia and Montenegro:
parse_IETF_language_tag("sr-Latn-CS")
## Spanish appropriate to the UN Latin American and Caribbean region:
parse_IETF_language_tag("es-419")
## All in one:
parse_IETF_language_tag(c("de-CH", "sr-Latn-CS", "es-419"))