Togaware DATA MINING
Desktop Survival Guide
by Graham Williams
Google

Database Connection

The basic usage of RODBC will connect to a known ODBC object using the odbcConnect function and query the database for the tables it exports using sqlTables:



> library(RODBC)
> channel <- odbcConnect("DWH")
  # This may pop up a window to enter username and password
> tables <- sqlTables(channel)
> columns <- sqlColumns(channel, "clients")

You can then retrieve the full contents of a table with sqlFetch:

> ds <- sqlFetch(channel, "tablename")

Or else you can send a SQL query to the database:

> ds <- sqlQuery(channel, "SELECT * FROM clients WHERE age > 35")

Some ODBC drivers, such as the Netezza ODBC driver, have a pre-fetch option that interacts poorly with applications connecting through the driver. With a pre-fetch option the driver appears to report fewer rows being available than actually available. It seems that the number of rows reported is in fact the pre-fetch limited number of rows. For the Netezza ODBC driver, for example, the default is 256 rows. This confuses the application connecting to ODBC (in this case, R through the RODBC). The symptom is that we only receive 256 rows from the table. Internally, the application is probably using either the SQLExtendedFetch or SQLFetchScroll ODBC functions.

There are a number of solutions to this issue. One from the applications side is to set believeNRows to FALSE. This will then retrieve all the rows from the table. Another solution is at the driver configuration level. For example, in connecting through the Netezza ODBC driver a configuration option is available where you can change the default Prefetch Count value.

An example of the issue is illustrated below:

> channel <- odbcConnect("netezza")
> orders <- sqlQuery(channel, "SELECT * FROM orders LIMIT 500")
> dim(orders)
[1] 256   9
> orders <- sqlQuery(channel, "select * from orders limit 500",
                     believeNRows=FALSE)
> dim(orders)
[1] 500   9
> odbcCloseAll()

We can reopen the driver and in the resulting GUI configuration set the Prefetch Count to perhaps 10,000. Then:



> channel <- odbcConnect("netezza")
> orders <- sqlQuery(channel, "SELECT * FROM orders LIMIT 500")
> dim(orders)
[1] 500   9
> orders <- sqlQuery(channel, "SELECT * FROM orders LIMIT 50000")
> dim(orders)
[1] 10000     9
> orders <- sqlQuery(channel, "SELECT * FROM orders LIMIT 50000",
                     believeNRows=FALSE)
> dim(orders)
[1] 50000     9
>

Note that we would not want to default beleiveNRows to FALSE since, for example, with a Teradata query this increase the query time by some 3 times!

For an SQLite database, edit .odbc.ini

[audit]
Description=SQLite test audit database for Rattle
Driver=SQLite3
Database=/home/kayon/projects/rattle/audit.db
# optional lock timeout in milliseconds
Timeout=2000

Copyright © 2004-2006 [email protected]
Support further development through the purchase of the PDF version of the book.
Brought to you by Togaware.