Archive for the ‘R’ Category

Paper describing the weaver package published in Computational Statistics

Saturday, June 14th, 2008

It seems like a lifetime ago that I developed the weaver package for caching code chunks in Sweave documents. The paper that I presented at the DSC 2007 has finally been published in Computational Statistics. The title is Caching Code Chunks in Dynamic Documents: The weaver package. Here’s the abstract:

Authoring dynamic documents can become tedious for authors when a document contains one or more time consuming code chunks and each edit requires reprocessing all of the document. We introduce the weaver package that allows computationally expensive code chunks to be cached in order to speed up the edit/process/review cycle for dynamic documents authored using the Sweave framework.


And here a link to an unofficial pdf and the weaver package.

Technorati Tags: ,

Testing for missing values of numeric (double) vectors in R

Friday, July 13th, 2007

I just came across this bit of R trickiness when trying to improve the error messages for one of our packages. I tried adding a check at the C level for NA_REAL, but to no avail. The test failed even when I was certain that I was passing down a vector containing missing values. Consulting the Writing R Extensions Manual (WREM) led me quickly to ISNA. The WREM says that ISNA only applies to numeric values of type double and that other types can be checked by equality comparison to NA_INTEGER, NA_LOGICAL, etc. What could perhaps be better emphasized is that ISNA is the only appropriate way to test for missingness of REALSXP elements and that equality testing with NA_REAL does not work.

It is an easy mistake to make since one might be lulled into complacency by the repeating patterns of a switch statement. Here’s how NOT to do it:


SEXP hello_hasNA(SEXP v)
{
int i, found_na = 0;

for (i = 0; i < length(v); i++) {
switch (TYPEOF(v)) {
case INTSXP:
if (INTEGER(v)[i] == NA_INTEGER)
found_na = 1;
break;
case LGLSXP:
if (LOGICAL(v)[i] == NA_LOGICAL)
found_na = 1;
break;
case REALSXP:
if (REAL(v)[i] == NA_REAL) /* WRONG, must use ISNA() */
found_na = 1;
break;
case STRSXP:
if (STRING_ELT(v, i) == NA_STRING)
found_na = 1;
break;
default:
error("no support for type");
}
if (found_na)
break;
}
return ScalarLogical(found_na);
}

To fix things, replace the REALSXP case like this:


case REALSXP:
if (ISNA(REAL(v)[i]))
found_na = 1;
break;

Technorati Tags: ,

Yes, it can

Wednesday, August 2nd, 2006

Joel on Software’s latest post is about having first class functions in your programming language. He shows how you can use functions as arguments to functions to reduce code duplication. IReducing code duplication is a nice way to motivate language features since the process of removing duplication almost always results in improved code.

He also links first class functions to Google’s MapReduce. What doesn’t get mentioned is how first-class functions play with closures (lexical scope) and how powerful these notions are when put together.

You can use this in a duplication reducing way to simplify repeated function calls where only a few of the parameters change. For example:

foo(a, b, c, "yes")
foo(a, b, c, "no")
foo(a, b, c, "maybe")

myfoo <- function(a, b, c, ans) {
function(ans) {
foo(a, b, c, ans)
}
}

myfoo("yes")
myfoo("no")
myfoo("maybe")

You can also use lexical scope to maintain state across function calls. This one is perhaps less interesting since in most OO languages, this sort of thing is done via classes.


callCounter <- function() {
count <- 0
function() {
print(paste("I've been called", count, "times"))
count <<- count + 1
}
}

countCalls <- callCounter()
> countCalls()
[1] “I’ve been called 0 times”
> countCalls()
[1] “I’ve been called 1 times”
> countCalls()
[1] “I’ve been called 2 times”
>

Here’s an improved system.time function for R

Sunday, May 14th, 2006

R’s current system.time function doesn’t name the vector of return values. Doing so makes it easier to understand the output. IMO, the current code has two uglies: it sets an on.exit hook and then calls on.exit() explicitly. It also computes the elapsed time twice (once for the stdout message and once for the return value). Another minor shortcoming is that when nested, only the outer call will print anything.

Anyhow, here’s an improved version that uses tryCatch instead of on.exit. One thing I wish was easier (maybe it is and I just don’t know) is reproducing the error message.

timeit <- function (expr, gcFirst = TRUE)
{
    nms <- c("User", "System", "Elapsed", "Sub.User", "Sub.System")
    if (!exists("proc.time")) {
        ans <- rep(as.numeric(NA), 5)
        names(ans) <- nms
        return(ans)
    }
    loc.frame <- parent.frame()
    if (gcFirst)
        gc(FALSE)
    expr <- substitute(expr)
    time <- proc.time()
    show_time <- function() {
        t <- proc.time() - time
        names(t) <- nms
        cat("Timing stopped at:\n")
        print(t)
        t
    }
    tryCatch(eval(expr, envir = loc.frame),
             error=function(e) {
                 msg <- paste("Error in", deparse(conditionCall(e)),
                              ":", conditionMessage(e), "\n")
                 cat(msg)
             })
    show_time()
}

Aggregating Results from Unreliable Functions in R

Friday, May 12th, 2006

I posted this as a response to a question on R-help. I think the idea of a “collect” function could be useful both in the context of unreliable functions that sometimes error out and also in filtering contexts where currently one creates a list containing good elements and some sort of sentinel, usually NULL, which has itself to be filtered out in a separate subsetting operation after the main filtering loop.

Here’s an example:

    d <- runif(20, min=-2, max=8) # test data

    aFunc <- function(x) {  # gives error occasionally
        if (x > 0)
          x
        else
          stop("encountered bad x")
    }

collect <- function(x, FUN, skip_error=TRUE, args_list=NULL)
{
    if (!is.vector(x))
      stop("arg x must be a vector")
    fname <- deparse(substitute(FUN))
    xvar <- deparse(substitute(x))
    i <- 1
    j <- 1
    result <- vector(mode=mode(x), length=length(x))
    while (i <= length(x)) {
        tryCatch({
            args <- list(x[i])
            if (length(args_list))
              args <- c(args, args_list)
            ans <- do.call(FUN, args)
            result[j] <- ans
            j <- j + 1
        }, error=function(e) {
            if (!skip_error) {
                msg <- paste("collect\n",
                             "call to", fname, "failed at",
                             paste(xvar, "[",  i, "]\n", sep=""),
                             "Message:\n", conditionMessage(e))
                stop(msg, call.=FALSE)
            }
            NULL
        },
                 finally={i <- i + 1})
    }
    if (j > 1)
      result[1:(j-1)]
    else
      vector(mode=mode(x), length=0)
}

## Example

collect(d, aFunc, skip_error=FALSE)
Error: collect
 call to aFunc failed at d[2]
 Message:
 encountered bad x

collect(d, aFunc, skip_error=TRUE)
 [1] 7.7380303 0.7554328 1.8352623 0.5136118 4.4231091 2.5368103 1.8656615
 [8] 2.9244200 2.1364120 7.6711189 0.2141325 7.8216620 5.8347576 5.3939892

Content recommendations from Evri