A few weeks ago I needed to export a number of data frames to separate worksheets in an Excel file. Although one could output CSV files from R and then import them into Excel manually or with the help of VBA, I was after a more streamlined solution, as I would need to repeat this process quite regularly in the future.
CRAN has several packages that can create an Excel file; however, several of them provide only very basic functionality. The R-wiki page on exchanging data between R and Windows applications focuses mainly on the data import problem.
My objective was to find an export method that would allow me to easily split a larger dataframe by values of a given variable so that each subset would be exported to its own worksheet in the same Excel file. I tried out the different ways of achieving this and documented my findings below.
Data Preparation
The goal is to split the iris dataset by the unique values of iris$Species, and export it to a new Excel file with worksheets: setosa, versicolor and virginica.
Two ways of storing the information were used for the purposes of this exercise:
List of dataframes
> library(plyr)
> testlist <- dlply(iris, .(Species))
A character vector containing the names of data frames
First create the data frames, and then the character vector with the object names. After this step we have three separate data frames called setosa, versicolor and virginica.
> d_ply(iris, .(Species), function(df) {
+     assign(as.character(df$Species[1]), df, envir = .GlobalEnv)
+ })
> testchar <- as.character(unique(iris$Species))
> testchar1 <- paste(testchar, collapse = ",")
Storing the data in a list of dataframes is clearly more convenient; however, as you will see later, many (or most) of the functions don’t accept lists as input.
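For readers who would rather avoid the plyr dependency, the list form can also be sketched in base R. This is an assumed equivalent rather than what was used in this post: split() returns a plain named list, and the name testlist_base is made up for illustration.

```r
# Base R sketch of the "list of dataframes" preparation step (no plyr needed).
# 'testlist_base' is a hypothetical name used only in this example.
testlist_base <- split(iris, iris$Species)

# The list names are the factor levels, which can later serve as sheet names.
names(testlist_base)

# Each element is an ordinary data frame holding the rows of one species.
sapply(testlist_base, nrow)
```

The element names come straight from the levels of iris$Species, so no separate bookkeeping of sheet names is needed.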
dataframes2xls
Note: Requires Python (>= 2.4)
dataframes2xls saves dataframes to an xls file. Its main function, write.xls, is a wrapper around a utility called xls2csv. xls2csv makes use of the Python module pyExcelerator and the afm submodule of the Python module matplotlib, both of which are included in dataframes2xls.
> library(dataframes2xls)
> dataframes2xls::write.xls(c(setosa, versicolor, virginica), "dataframes2xls.xls")
Appending to an existing file is not possible, which rules out looping over a list of dataframes and writing one sheet at a time. One major shortcoming is that the names of the exported dataframes have to be specified manually. Also, I was not able to pass sheet names to the function when there was more than one sheet.
There is a way to work around the manual specification of data frames, but as you can see it is not very intuitive:
> eval(parse(text = paste("dataframes2xls::write.xls(c(", testchar1, "),", "'dataframes2xls.xls')")))
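The same call can also be assembled from symbols instead of pasted text, which some readers may find easier to follow. The sketch below assumes that write.xls inspects the unevaluated expression it is given (which the c(setosa, versicolor, virginica) usage suggests); show_args is a hypothetical stand-in so that the example runs without dataframes2xls or Python installed.

```r
# Sketch: build the call c(setosa, versicolor, virginica) from a character
# vector without pasting and parsing text.
testchar <- c("setosa", "versicolor", "virginica")

# Turn each name into a symbol, then assemble the call to c().
args_call <- as.call(c(list(as.name("c")), lapply(testchar, as.name)))
print(args_call)

# 'show_args' is a hypothetical stand-in for a function that looks at the
# unevaluated expression passed as its first argument.
show_args <- function(x, file) deparse(substitute(x))

# bquote() splices the constructed call into the outer call before eval().
result <- eval(bquote(show_args(.(args_call), "dataframes2xls.xls")))
result
```

To try this with the real function, show_args would be replaced by dataframes2xls::write.xls inside bquote(); that variant is untested here.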
WriteXLS
Note: Requires Perl with module Text::CSV_XS
WriteXLS is a “Perl based R function to create Excel (XLS) files from one or more data frames. Each data frame will be written to a separate named worksheet in the Excel spreadsheet. The worksheet name will be the name of the data frame it contains or can be specified by the user”.
> library(WriteXLS)
> WriteXLS(testchar, "WriteXLS.xls", perl = "perl")
Like dataframes2xls, WriteXLS does not take lists as input, so each of the data frames needs to be created before calling the function. Appending to a file is also not possible: if the Excel file already exists, it is overwritten.
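Since these functions want object names rather than a list, it can help to move between the two representations. The following is a minimal base R sketch, not part of the original workflow: sheet_env, sheet_names and roundtrip are hypothetical names, and a scratch environment is used so the example leaves the workspace untouched.

```r
# Sketch: move between a named list of data frames and a vector of object names.
testlist_base <- split(iris, iris$Species)  # hypothetical name, base R split

# 'sheet_env' is a hypothetical scratch environment used only for this example.
sheet_env <- new.env()

# list2env() creates one object per list element, named after the element.
list2env(testlist_base, envir = sheet_env)

# The object names are what a name-based export function would be given.
sheet_names <- sort(ls(sheet_env))
sheet_names

# Going the other way: mget() collects the named objects back into a list.
roundtrip <- mget(sheet_names, envir = sheet_env)
```

For the calls shown in this post, the data frames would have to live in the environment where the export function looks them up, so list2env(testlist_base, envir = .GlobalEnv) would be the closer analogue of the d_ply/assign step.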
xlsReadWrite(Pro)
Note: This package currently works only on Windows machines.
xlsReadWrite saves a data frame, matrix or vector as an Excel file in Excel 97-2003 file format.
> library(xlsReadWrite)
> xlsReadWrite::write.xls(iris, "xlsReadWrite.xls")
The dataframe can be written to one sheet only. It is not possible to split the data between separate sheets, because the feature for appending data to existing files is available only in the Pro-version.
The Pro-version has a 30-day trial, so I tried it out.
> library(xlsReadWritePro)
> rfile <- "xlsReadWritePro.xls"
> exc <- xls.new(rfile)
> l_ply(testlist, function(x) {
+     sheet <- as.character(unique(x$Species))
+     xlsReadWritePro::write.xls(x, file = exc, sheet = sheet, colNames = TRUE)
+ })
> xls.close(exc)
Task accomplished.
The Pro-version has several other nice features, such as the ability to save images to an Excel file, or to write Excel formulas.
For reference, the licence costs are as follows.
- Single user license: 75 euros
- Non-commercial single user: 19 euros
- Company/university wide: 570 euros
RODBC
Note: There are ODBC Excel drivers for Windows only.
RODBC's sqlSave function saves the data frame to the specified worksheet via ODBC, after the connection has been opened with the convenience wrapper odbcConnectExcel.
> library(RODBC)
> save2excel <- function(x) sqlSave(xlsFile, x, tablename = as.character(x$Species[1]), rownames = FALSE)
> xlsFile <- odbcConnectExcel("RODBC.xls", readOnly = FALSE)
> l_ply(testlist, save2excel)
> odbcCloseAll()
This worked well. Another good thing to note is that you can append to already existing files.
RDCOMClient
Note: It requires a Windows machine that has Excel running on it.
RDCOMClient allows you to access and control applications such as Excel, Word, PowerPoint and web browsers from within an R session. As an alternative, the rcom package provides similar functionality.
> library(RDCOMClient)
The developer’s website provides some useful functions for exporting/importing dataframes to/from Excel.
> source("http://www.omegahat.org/RDCOMClient/examples/excelUtils3.R")
> xls <- COMCreate("Excel.Application")
> xls[["Visible"]] <- TRUE
> wb = xls[["Workbooks"]]$Add(1)
> rdcomexport <- function(x) {
+     sh = wb[["Worksheets"]]$Add()
+     sh[["Name"]] <- as.character(x$Species[1])
+     exportDataFrame(x, at = sh$Range("A1"))
+ }
> d_ply(iris, .(Species), rdcomexport)
Now I have an Excel file open with an empty “Sheet1”. I delete it and, after specifying the save directory (it defaults to My Documents) and the filename, save the file. I have Office 2007 installed, so its file format (.xlsx) is used by default.
> xls$Sheets("Sheet1")$Delete()
> filename <- paste(getwd(), "RDCOMClient.xlsx", sep = "/")
> filename <- gsub('/', '\\\\', filename)
> wb$SaveAs(filename)
> wb$Close(filename)
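One detail worth watching in the path conversion is how backslashes are escaped in the gsub replacement. The sketch below uses a made-up path and hypothetical variable names to compare two ways of writing it; it only manipulates strings and does not need Excel.

```r
# Sketch: converting forward slashes to backslashes in a path.
# 'path' is a made-up example value.
path <- "C:/Users/me/Documents/RDCOMClient.xlsx"

# With regular-expression matching, a literal backslash in the replacement
# must itself be escaped, which means four backslashes in R source.
win_regex <- gsub("/", "\\\\", path)

# With fixed = TRUE the replacement is taken literally, so two suffice.
win_fixed <- gsub("/", "\\", path, fixed = TRUE)

# cat() shows the string as it is actually stored, with single backslashes.
cat(win_fixed, "\n")

# Both approaches give the same result.
identical(win_regex, win_fixed)
```

Either form should work for the SaveAs call; fixed = TRUE is arguably easier to read because no regular-expression escaping is involved.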
Another example of RDCOMClient in action can be seen in this R-help post.
Conclusion
The Windows-only solutions were (as expected) the most flexible, with the RDCOMClient approach providing the greatest control over the output. However, if formatting is not that important, the simplest way to export data to multiple Excel worksheets is via RODBC (provided you are on a Windows machine).