remove leuk dataset download from vignette
hongooi73 committed Sep 10, 2023
1 parent 5b01868 commit 4512b19
Showing 6 changed files with 25 additions and 13 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -170,3 +170,5 @@ pip-log.txt
*.mo
#Mr Developer
.mr.developer.cfg
/doc/
/Meta/
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: glmnetUtils
Type: Package
Version: 1.1.8
Version: 1.1.9
Title: Utilities for 'Glmnet'
Description: Provides a formula interface for the 'glmnet' package for
elasticnet regression, a method for cross-validating the alpha parameter,
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
## glmnetUtils 1.1.9

- Remove vignette dependency on an external download.

## glmnetUtils 1.1.8

- Skip some tests on 32-bit Solaris R-patched due to numerical convergence issues.
Binary file added vignettes/figures/leukModCVA.png
Binary file added vignettes/figures/leukModList.png
30 changes: 18 additions & 12 deletions vignettes/intro.rmd
@@ -163,16 +163,7 @@ As an option, glmnetUtils can also generate a _sparse_ model matrix, using the `

One piece missing from the standard glmnet package is a way of choosing $\alpha$, the elastic net mixing parameter, similar to how `cv.glmnet` chooses $\lambda$, the shrinkage parameter. To fix this, glmnetUtils provides the `cva.glmnet` function, which uses crossvalidation to examine the impact on the model of changing $\alpha$ and $\lambda$. The interface is the same as for the other functions:

```{r, echo = FALSE}
leukFile <- file.path(tempdir(), "Leukemia.rdata")
if(!file.exists(leukFile))
{
download.file("https://web.stanford.edu/~hastie/glmnet/glmnetData/Leukemia.RData", leukFile, mode = "wb")
}
load(leukFile)
```

```{r}
```r
# Leukemia dataset from Trevor Hastie's website:
# https://web.stanford.edu/~hastie/glmnet/glmnetData/Leukemia.RData
leuk <- do.call(data.frame, Leukemia)
@@ -181,6 +172,17 @@ leukMod <- cva.glmnet(y ~ ., data=leuk, family="binomial")
leukMod
```

```
## Call:
## cva.glmnet.formula(formula = y ~ ., data = leuk, family = "binomial")
##
## Model fitting options:
## Sparse model matrix: FALSE
## Use model.frame: FALSE
## Alpha values: 0 0.001 0.008 0.027 0.064 0.125 0.216 0.343 0.512 0.729 1
## Number of crossvalidation folds for lambda: 10
```

`cva.glmnet` uses the algorithm described in the help for `cv.glmnet`, which is to fix the distribution of observations across folds and then call `cv.glmnet` in a loop with different values of $\alpha$. Optionally, you can parallelise this outer loop, by setting the `outerParallel` argument to a non-NULL value. Currently, glmnetUtils supports the following methods of parallelisation:

- Via `parLapply` in the parallel package. To use this, set `outerParallel` to a valid cluster object created by `makeCluster`.
@@ -190,18 +192,22 @@ If the outer loop is run in parallel, `cva.glmnet` can check if the inner loop (

Because crossvalidation is often a statistically noisy procedure, it doesn't try to automatically choose $\alpha$ and $\lambda$ for you. Instead you can plot the output, to see how the results depend on the values of these parameters. Using this information, you can choose appropriate values for your data.

```{r, fig.height=5, fig.width=7}
```r
plot(leukMod)
```

![](figures/leukModCVA.png)

In this case, we see that values of $\alpha$ close to $1$ tend to lead to better accuracy. The curves don't have a well-defined minimum, but they do flatten out for lower values of $\lambda$. As the `cv.glmnet` documentation recommends though, it's a good idea to run `cva.glmnet` multiple times to reduce the impact of noise.

A `cva.glmnet` object contains a list of individual `cv.glmnet` objects, corresponding to the different $\alpha$ values tried. This lets you plot the crossvalidation results easily for a given $\alpha$:

```{r, fig.height=5, fig.width=7}
```r
plot(leukMod$modlist[[10]]) # alpha = 0.729
```

![](figures/leukModList.png)

## Conclusion

The glmnetUtils package is a way to improve quality of life for users of glmnet. As with many R packages, it's always under development; you can get the latest version from my [GitHub repo](https://github.com/hongooi73/glmnetUtils). If you find a bug, or if you want to suggest improvements to the package, please feel free to contact me at [[email protected]](mailto:[email protected]).
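
Reviewer's note on the vignette text in this diff (not part of the commit): the vignette says `cva.glmnet` fixes the assignment of observations to folds and then calls `cv.glmnet` in a loop over $\alpha$. Below is a minimal sketch of that idea using glmnet directly. It is not the package's actual implementation. The simulated data and the names `x`, `y`, `foldid`, `alphas`, `mods` and `cvMin` are assumptions made for illustration, since the Leukemia data is no longer downloaded. The alpha grid is chosen to reproduce the values printed in the vignette output.

```r
library(glmnet)

set.seed(42)  # reproducible toy data and fold assignment

# Simulated binary-response data (an assumption; stands in for the Leukemia data)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# Fix the fold assignment once, so every alpha is evaluated on the same folds
nfolds <- 10
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Alpha grid matching the printed values: 0, 0.001, 0.008, ..., 0.729, 1
alphas <- seq(0, 1, length.out = 11)^3

# One cv.glmnet fit per alpha, all sharing the same foldid
mods <- lapply(alphas, function(a) {
    cv.glmnet(x, y, family = "binomial", alpha = a, foldid = foldid)
})

# Smallest crossvalidated loss reached along the lambda path, per alpha
cvMin <- vapply(mods, function(m) min(m$cvm), numeric(1))
data.frame(alpha = alphas, cvMin = cvMin)
```

Because the folds are shared, differences in `cvMin` across rows reflect the change in $\alpha$ rather than a different random split. The estimates are still noisy, which is why the vignette suggests repeating the procedure.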
