Commit 0355bf7: Update README.md (Alt downloads)
pdwaggoner authored Aug 28, 2024 · 1 parent 3070ba7
Showing 1 changed file with 1 addition and 2 deletions: README.md

# `hdImpute`: Batched high dimensional imputation
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![metacran downloads](https://cranlogs.r-pkg.org/badges/last-week/hdImpute)](https://cran.r-project.org/package=hdImpute)
[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/hdImpute)](https://cranlogs.r-pkg.org/)
[![Downloads](https://cranlogs.r-pkg.org/badges/hdImpute)](https://cran.rstudio.com/web/packages/hdImpute/index.html)
[![Documentation](https://img.shields.io/badge/documentation-hdImpute-orange.svg?colorB=E91E63)](https://www.r-pkg.org/pkg/hdImpute)

`hdImpute` is a correlation-based batch process for addressing high dimensional imputation problems. Relatively few algorithms are designed to handle imputation of missing data in high dimensional contexts quickly and efficiently. Of those that exist, even fewer natively handle mixed-type data; most require a great deal of preprocessing to get the data into the proper shape, and then postprocessing to return it to its original form. These requirements, along with the assumptions many algorithms make about, for example, the data generating process, limit performance, flexibility, and usability. Built on top of a recent set of complementary algorithms for nonparametric imputation via chained random forests, `missForest` and `missRanger`, I offer a batch-based approach: subset the data based on ranked cross-feature correlations, impute each batch separately, and then join the imputed subsets in the final step. The process is extremely fast and accurate after a bit of tuning to find the optimal batch size. As a result, high dimensional imputation is more accessible, and researchers are not forced to choose between speed and accuracy.
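
The batched workflow described above can be sketched in a few lines of R. This is a minimal sketch, not the package's canonical example: the `hdImpute()` one-call wrapper and its `data`/`batch` arguments are taken from the package's documented interface, and the toy tibble is invented for illustration.

```r
# install.packages("hdImpute")  # available on CRAN
library(hdImpute)
library(tibble)

# Toy mixed-type data with scattered missingness
set.seed(123)
d <- tibble(
  x1 = c(1.2, NA, 3.4, 5.6, 2.1, 4.8),
  x2 = factor(c("a", "b", NA, "a", "b", "a")),
  x3 = c(10L, 20L, 30L, NA, 50L, 60L)
)

# One-call version: rank cross-feature correlations, split features
# into batches, impute each batch via chained random forests
# (missRanger under the hood), then rejoin the imputed subsets
imputed <- hdImpute(data = d, batch = 2)
```

Tuning `batch` (the number of batches to split the features into) is the main lever: smaller batches impute faster, while larger batches keep more correlated features together.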