encoding {tau}    R Documentation
Functions for testing and adapting the (declared) encoding of the components of a vector of mode character.
is.utf8(x)
is.ascii(x)
is.locale(x)
translate(x, recursive = FALSE, internal = FALSE)
fixEncoding(x, latin1 = FALSE)
x          a vector (of character).
recursive  logical; option to process list components.
internal   logical; option to use internal translation.
latin1     logical; option to assume "latin1" if the declared encoding is "unknown".
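is.utf8
tests whether the components of a character vector are true UTF-8 strings, i.e. valid UTF-8 strings that contain at least one multi-byte sequence (so pure ASCII strings do not qualify).
is.ascii
tests whether the components of a character vector consist of ASCII characters only.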
is.locale
tests whether the components of a character vector are in the encoding of the current locale.
translate
re-encodes the components of a character vector in the encoding of the current locale. This also applies to the names attribute of vectors of arbitrary mode. If recursive = TRUE, the components of a list are processed as well. If internal = TRUE, multi-byte sequences that are invalid in the encoding of the current locale are replaced by literal hex numbers (see FIXME).
fixEncoding
sets the declared encoding of the components of a character vector to their correct or preferred values. Strings that are true UTF-8 strings are declared to be in "UTF-8" encoding. If latin1 = TRUE, strings that are not valid UTF-8 are declared to be in "latin1".
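For illustration, the checks described above can be approximated in base R. This is a minimal sketch, not the tau implementation: it assumes R >= 3.3.0 for validUTF8(), treats a "true UTF-8" string as valid UTF-8 containing at least one byte above 0x7F, and all helper names (has_non_ascii, is_ascii_sketch, is_true_utf8, fix_encoding_sketch) are hypothetical. As an assumption taken from the description of the latin1 argument, the latin1 step only re-declares strings whose declared encoding is still "unknown".

```r
## Hypothetical base-R helpers approximating is.ascii(), is.utf8() and
## fixEncoding(); these are NOT the functions provided by tau.

## TRUE for each string that contains at least one byte above 0x7F
has_non_ascii <- function(x)
  vapply(x, function(s) any(charToRaw(s) > as.raw(0x7f)), logical(1),
         USE.NAMES = FALSE)

## ASCII only: no byte above 0x7F
is_ascii_sketch <- function(x) !has_non_ascii(x)

## "True" UTF-8: valid UTF-8 containing at least one multi-byte sequence
is_true_utf8 <- function(x) validUTF8(x) & has_non_ascii(x)

## Re-declare encodings: true UTF-8 strings become "UTF-8"; with
## latin1 = TRUE, invalid UTF-8 strings still declared "unknown" become "latin1"
fix_encoding_sketch <- function(x, latin1 = FALSE) {
  Encoding(x)[is_true_utf8(x)] <- "UTF-8"
  if (latin1) {
    unk <- !validUTF8(x) & Encoding(x) == "unknown"
    Encoding(x)[unk] <- "latin1"
  }
  x
}

## Usage: ASCII, a latin1 byte, and the UTF-8 bytes for "a" + a-umlaut
x <- c("aa", "a\xe4", "a\xc3\xa4")
Encoding(x) <- "unknown"                        # start from undeclared strings
is_ascii_sketch(x)                              # TRUE FALSE FALSE
is_true_utf8(x)                                 # FALSE FALSE TRUE
Encoding(fix_encoding_sketch(x, latin1 = TRUE)) # "unknown" "latin1" "UTF-8"
```

Working on raw bytes via charToRaw() keeps the checks independent of the current locale and of any declared encoding.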
translate and fixEncoding return an object of the same type as x, with the (declared) encoding possibly changed. is.utf8, is.ascii and is.locale return logical vectors.
Currently, translate uses iconv and is therefore not guaranteed to work on all platforms.
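A rough base-R analogue of translate may help clarify what it does. The sketch below is hypothetical (translate_sketch is not part of tau) and uses enc2native(), which converts strings with a declared encoding to the native (locale) encoding, instead of calling iconv directly; it does not implement internal = TRUE. The last line shows a related base-R facility, iconv() with sub = "byte", which writes unconvertible or invalid bytes as hex codes such as <e4>; whether this matches the hex notation tau uses for internal = TRUE is an assumption.

```r
## Hypothetical approximation of translate(); NOT the tau implementation.
translate_sketch <- function(x, recursive = FALSE) {
  if (is.list(x)) {
    ## list components are only processed when recursive = TRUE
    if (recursive) x[] <- lapply(x, translate_sketch, recursive = TRUE)
  } else if (is.character(x)) {
    ## re-encode elements in the locale encoding, keeping x's attributes
    x[] <- enc2native(x)
  }
  ## the names attribute is translated for vectors of any mode
  if (!is.null(names(x))) names(x) <- enc2native(names(x))
  x
}

## Usage: a latin1-declared string nested in a list
y <- list(a = c(b = "a\xe4"), 1:3)
Encoding(y$a) <- "latin1"
str(translate_sketch(y, recursive = TRUE))

## Invalid UTF-8 bytes written as hex codes by iconv()
iconv("a\xe4", from = "UTF-8", to = "UTF-8", sub = "byte")  # "a<e4>"
```

Assigning through x[] rather than to x keeps names and other attributes of the input intact.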
Christian Buchta
FIXME PCRE, RFC 3629
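## Note that we assume R runs in a UTF-8 locale
library(tau)
text <- c("aa", "a\xe4")
Encoding(text) <- c("unknown", "latin1")
is.utf8(text)
is.ascii(text)
is.locale(text)
## implicit translation (printing converts to the locale encoding)
text
## iconv() declares non-ASCII results as "UTF-8"
t1 <- iconv(text, from = "latin1", to = "UTF-8")
Encoding(t1)
## oops: with the lower-case "utf-8" the result may not be declared as "UTF-8"
t2 <- iconv(text, from = "latin1", to = "utf-8")
Encoding(t2)
t2
is.locale(t2)
## set the declared encoding to its correct value
t2 <- fixEncoding(t2)
Encoding(t2)
## explicit translation
t3 <- translate(text)
Encoding(t3)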