Togaware DATA MINING
Desktop Survival Guide
by Graham Williams
Google

Using SQLite

SQLite (from www.sqlite.org) is an open source database package that is well supported in . It has the advantage that it requires no setup or administration (other than installing the package) and is an embedded system so that there is less of a connection overhead. You are able to manage very large datasets in SQLite without needing to load all the data into memory, unlike R itself, so that you are able to manipulate the data using SQL then load in just the data you need.

For small dataset SQLite is a good choice, but for very large datasets, MySQL still performs very well.

There is also a project under way as part of the http://code.google.com/soc/gnu/appinfo.html?csaid=7211BC2F22A5A01EGoogle Summer of Code project, that aims to create a package that will store data frames and matrices into sqlite databases, initially called sqlite data frames (sdf). These sdf's will behave like ordinary data frames so that existing R functions will work. This will enable R users to work with very large datasets much more readily, with no user effort.

For now, SQLite allows the easy import and export of data to text files.



library(RSQLite)
con <- dbConnect(SQLite(), "foo3.db")
dbGetQuery(con, "pragma cache_size")
    cache_size
  1 2000
dbGetQuery(con, "pragma cache_size=2500")
   NULL
dbGetQuery(con, "pragma cache_size")
   cache_size 
 1 2500

As an example, first create an empty SQLite database (outside of R) and import a CSV (comma separated value) file, tell sqlite to use commas, not '|':

$ sqlite3 -separator , audit.db
sqlite> create table audit(ID INTEGER, Age INTEGER, Employment TEXT,
                           Education TEXT, Marital TEXT, Occupation TEXT,
                           Income REAL, Sex TEXT, Deductions REAL,
                           Hours INTEGER, Accounts TEXT, 
                           Adjustment REAL, Adjusted INTEGER);
sqlite> .tables
audit
sqlite> .import audit.csv audit
sqlite> select count(*) from audit;
2001
sqlite> delete from audit where ID='"ID"';
sqlite> select count(*) from audit;
2000

sqlite> .quit

Now in R:

library((DBI)
library(RSQLite)
library(Hmisc)
survey <- read.csv("survey.csv")

driver<-dbDriver("SQLite")
connect<-dbConnect(driver, dbname = "survey.db")
dbWriteTable(connect, "surveytbl", survey, overwrite = T, row.names = F)

dbListTables(connect)

query01 <- dbSendQuery(connect, statement = "select * from surveytbl");
data01 <- fetch(query01, n = 10)
contents(data01)

sqliteCloseResult(query01)
sqliteCloseConnection(connect)
sqliteCloseDriver(driver)

Copyright © 2004-2006 [email protected]
Support further development through the purchase of the PDF version of the book.
Brought to you by Togaware.