Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Garbage Collection

When timing commands it is important to know that garbage collection plays a role. R adjusts its garbage collection triggers according to your usage. When you first start using large objects the trigger levels will grow, and things will generally speed up.

You can use gcinfo to start seeing the adjustments in action:

> gcinfo(TRUE)
[1] FALSE			# The setting was previously FALSE
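
As a minimal sketch of what this reporting looks like in practice, the following turns it on, allocates some memory to provoke collections, and then restores the earlier setting. The allocation sizes and the variable names old and x are illustrative assumptions only.

```r
# Turn on garbage collection reporting, remembering the previous setting.
old <- gcinfo(TRUE)

# Hypothetical workload: allocate a list of numeric vectors so that
# R is likely to collect garbage and report it along the way.
x <- lapply(1:50, function(i) numeric(1e5))

# Drop the objects and force one more collection.
rm(x)
invisible(gc())

# Restore the previous reporting setting.
invisible(gcinfo(old))
```

Each collection that occurs while reporting is on produces a short message, so a run of messages during a command is a hint that garbage collection is contributing to its running time.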

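The system.time function has a gcFirst argument. When it is TRUE (the default), a garbage collection is performed immediately before the timing starts, which tends to give more consistent timings.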
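
A small sketch of timing the same command with and without a prior garbage collection follows. The function work is a hypothetical workload chosen only for illustration, and the timings will vary from run to run and machine to machine.

```r
# Hypothetical workload: sort a large vector of random numbers.
work <- function() sort(runif(1e6))

# Collect garbage first, then time the command.
t_gc <- system.time(work(), gcFirst = TRUE)

# Time the command without collecting garbage first.
t_nogc <- system.time(work(), gcFirst = FALSE)

cat("elapsed with gcFirst=TRUE: ", t_gc["elapsed"], "\n")
cat("elapsed with gcFirst=FALSE:", t_nogc["elapsed"], "\n")
```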

The gc function causes a garbage collection to take place and lists useful information about memory usage (the primary purpose for calling gc). Ncells is the number of so-called cons cells used (each cell is 28 bytes on a 32-bit system or 56 bytes on a 64-bit system, and is used for storing fixed-size objects), and the function converts this to Mb for us. Vcells is the number of vector cells used (each cell is 8 bytes, and is used for storing variable-size objects). The gc trigger columns show the level at which the next garbage collection will be triggered. The final two columns show the maximum amount of memory that has been used since R started or since the last call to gc(reset=TRUE).

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 177949  4.8     407500 10.9   350000  9.4
Vcells  72431  0.6     786432  6.0   332253  2.6
> survey <- read.csv("survey.csv")
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 212685  5.7     741108 19.8   514436 13.8
Vcells 366127  2.8    1398372 10.7  1387692 10.6
> rm(survey)
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 179940  4.9     741108 19.8   514436 13.8
Vcells  72773  0.6    1118697  8.6  1387692 10.6

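Here, after reading the data file survey.csv (which is 4100478 bytes, or about 4MB, as a text file), the Vcells in use grow from 72431 (0.6Mb) to 366127 (2.8Mb), and the Vcells gc trigger rises from 6.0Mb to 10.7Mb. After rm(survey) and a further gc(), usage drops back to 72773 Vcells (0.6Mb), while the max used column still records the 10.6Mb peak.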
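
The Mb figures in the gc() output above can be checked by hand from the cell counts. The sketch below assumes 28 bytes per cons cell (the 32-bit figure given earlier) and assumes the result is rounded up to one decimal place, which agrees with the printed values. The helper cells_to_mb and the object x are hypothetical names used only for illustration. The second half shows one way to use gc(reset=TRUE) to measure how much memory a step needs.

```r
# Convert a cell count to Mb, rounding up to one decimal place
# (an assumption that matches the figures printed by gc() above).
cells_to_mb <- function(cells, bytes_per_cell) {
  ceiling(10 * cells * bytes_per_cell / 1024^2) / 10
}

cells_to_mb(177949, 28)   # Ncells used before reading the file: 4.8
cells_to_mb(366127, 8)    # Vcells used after reading the file: 2.8

# Measure the memory used by one step.
invisible(gc(reset = TRUE))             # reset the "max used" statistics
before <- gc()["Vcells", "used"]
x <- numeric(1e6)                       # hypothetical large object, 1e6 doubles
during <- gc()["Vcells", "used"]
rm(x)
after <- gc()

cat("Vcells added by x:      ", during - before, "\n")
cat("Peak Vcells since reset:", after["Vcells", "max used"], "\n")
```

Because gc() returns a matrix with rows Ncells and Vcells, individual figures can be picked out by name rather than read off the printed table.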

Copyright © 2004-2006 [email protected]