read.gnumeric.sheet {gnumeric} | R Documentation |
Read data from a sheet of a gnumeric (or other common spreadsheet or database) file to a data.frame.
Requires an external program, ssconvert (normally installed with gnumeric (http://projects.gnome.org/gnumeric/) in PATH.
Calls ssconvert to convert the input to CSV. ssconvert can read several file formats (see Details below).
Note: During conversion to CSV ssconvert also evaluates formulas (e.g. =sum(A1:A3)) in cells, and emits the result instead of the formula.
df<-read.gnumeric.sheet(file, head=FALSE, sheet.name='Sheet1', top.left='A1', bottom.right='IV65536', drop.empty.rows="bottom", drop.empty.columns="right", colnames.as.sheet=FALSE, rownames.as.sheet=colnames.as.sheet, quiet=TRUE, LANG='C', import.encoding=NA, ... )
file |
Name of gnumeric file (or other file type readable by
gnumeric) to read from.
This may also be an URL, i.e. like 'http://example.com/path/file.gnumeric'
|
head |
When TRUE, use first row of requested gnumeric sheet
range as column names in the resulting data.frame |
sheet.name |
Name of sheet as appears in gnumeric. Sheet names containing space character do not work. sheet.name=NA Omits sheet name from the ssconvert command line.For gnumeric files this will read the sheet that was 'current' in gnumeric when the file was saved. |
top.left |
Top left corner of requested gnumeric sheet
range, e.g. 'A1' |
bottom.right |
Bottom right corner of requested gnumeric sheet
range. The default for read.gnumeric.sheet is 'IV65536' : this
causes a lot of unused lines to be printed by ssconvert then
parsed by read.csv , thus you might want to override it to
speed up reading.
Use read.gnumeric.sheet.info to read actual
bottom.right cell name from a gnumeric file (but not other formats).
|
drop.empty.rows |
One of
c('none','top','bottom','both','all') .
'all' drops all empty lines from the requested range, even
those that are between two non-empty rows.
'both' drops empty lines below the last non-empty row and
above the first non-empty.
'top' , 'bottom' and 'none' as you would expect.
|
drop.empty.columns |
One of
c('none','left','right','both','all') Similar to drop.empty.rows , but for columns.
|
colnames.as.sheet |
Rename columns to 'A', 'B', 'C', ... to
have names corresponding to gnumeric column names. |
rownames.as.sheet |
Rename rows to '1', '2', '3', ... to have
names corresponding to gnumeric row indices. Note: this means
df['1',] , not df[1,] in the result (rownames are
strings, not integers). Note: when deciding row names only
top.left and head are accounted for, but not
e.g. skip (which may be passed to read.csv via ... ).
|
quiet |
When TRUE, do not print command executed, and (on unix platforms) also redirect stderr of the external program ssconvert to /dev/null |
LANG |
Under unix, passed to ssconvert in the environment
variable 'LANG'. The default value ('C' ) is intended to avoid
using decimal comma in the emitted CSV file. |
import.encoding |
If not NA, passed to ssconvert as its
--import-encoding parameter.
|
... |
Extra arguments, passed to read.csv |
Data from the gnumeric file is dumped as .csv using the ssconvert program provided with gnumeric.
ssconvert supports several input formats, thus the input file does not have to be a gnumeric file. The formats supported may be listed with
ssconvert --list-importersfrom a shell prompt.
For me this prints (with ssconvert version '1.8.4')
ID | Description Gnumeric_xbase:xbase | Xbase (*.dbf) file format Gnumeric_Excel:excel | MS Excel (tm) (*.xls) Gnumeric_Excel:xlsx | MS Excel (tm) 2007 Gnumeric_html:html | HTML (*.html, *.htm) Gnumeric_oleo:oleo | GNU Oleo (*.oleo) Gnumeric_applix:applix | Applix (*.as) Gnumeric_QPro:qpro | Quattro Pro (*.wb1, *.wb2, *.wb3) Gnumeric_paradox:paradox | Paradox database or | primary index file Gnumeric_sc:sc | SC/xspread Gnumeric_XmlIO:sax | Gnumeric XML (*.gnumeric) Gnumeric_lotus:lotus | Lotus 123 (*.wk1, *.wks, *.123) Gnumeric_XmlIO:dom | Gnumeric XML (*.gnumeric) Old | slow importer Gnumeric_dif:dif | Data Interchange Format (*.dif) Gnumeric_Excel:excel_xml | MS Excel (tm) 2003 SpreadsheetML Gnumeric_OpenCalc:openoffice | Open/Star Calc (*.sxc, *.ods) Gnumeric_plan_perfect:pln | Plan Perfect Format (PLN) import Gnumeric_sylk:sylk | MultiPlan (SYLK) Gnumeric_mps:mps | Linear and integer program (*.mps) | file format Gnumeric_stf:stf_csvtab | Comma or tab separated | values (CSV/TSV) Gnumeric_stf:stf_assistant | Text import (configurable)
But the actual list may depend on which import plugins are installed for gnumeric.
Format | Source | Status |
.gnumeric | gnumeric | works |
.xls | gnumeric | works |
.html | gnumeric [Save as / HTML 4.0] | works |
.html | Openoffice Calc [Save as/HTML Document] | works |
.ods | Openoffice Calc | might work[1] |
Other formats | not tested |
[1] during .ods import ssconvert emits some messages on stdout (corrupts output). Worked around on unix with | grep , on the output. (The problem has been corrected in libgsf on 2009-08-20, thus the workaround is expected to be removable by 2010)
read.gnumeric.range
for a variant with default
arguments more suited for reading an exact cell range of a sheet.
read.gnumeric.sheet.info
to read actual
bottom.right cell name from a gnumeric file (but not other formats).
read.gnumeric.sheets
to read all sheets
from a gnumeric file (but not other formats).
read.xls
for reading Microsoft Excel files
(possibly from a http:// URL)
read.DIF
for reading Data Interchange Format (DIF)
files.
read.dbf
for Xbase (.dbf) files.
## Read all data from 'Sheet1' ## Not run: df <- read.gnumeric.sheet( file="file.gnumeric" ); df <- read.gnumeric.sheet( file="file.gnumeric", sheet.name='Sheet1' ); ## Read from Excel sheet named 'Sheet3' the range C3:D50, ## rename columns to 'C' and 'D', rows to '3' ... '50', ## then drop all empty rows. ## df<-read.gnumeric.sheet( "file.xls", sheet.name='Sheet3', top.left='C3', bottom.right='D50', drop.empty.rows="all", drop.empty.columns="none", colnames.as.sheet=TRUE ) ## Read from "file.gnumeric", 'Sheet1' data in 'A1:E100', ## Use first row (of selected range) as column names. ## Drop empty rows and columns from bottom and right. df<-read.gnumeric.sheet("file.gnumeric", head=TRUE, bottom.right='E100') ## Why does it not work? Set quiet=FALSE to see ## the command executed (and on unix, diagnostic ## messages from ssconvert). df<-read.gnumeric.sheet( "file.ods", quiet=FALSE ) ## End(Not run)