From f26e60f6499edf4f2125148a57a8a7bc8e8affb1 Mon Sep 17 00:00:00 2001
From: Philip Waggoner <31326382+pdwaggoner@users.noreply.github.com>
Date: Wed, 28 Aug 2024 10:02:54 -0600
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9dd2f64..628d1d5 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # `hdImpute`: Batched high dimensional imputation
 [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
-[![Downloads](https://cranlogs.r-pkg.org/badges/hdImpute)](https://cran.rstudio.com/web/packages/hdImpute/index.html)
+[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/hdImpute)](https://cran.r-project.org/package=hdImpute)
 [![Documentation](https://img.shields.io/badge/documentation-hdImpute-orange.svg?colorB=E91E63)](https://www.r-pkg.org/pkg/hdImpute)
 
 `hdImpute` is a correlation-based batch process for addressing high dimensional imputation problems. Relatively few algorithms are designed to handle imputation of missing data in high dimensional contexts in a fast, efficient manner. Of those that exist, even fewer natively handle mixed-type data; most require substantial preprocessing to get the data into proper shape, and then postprocessing to return the data to its original form. These requirements, along with the assumptions many algorithms make about, for example, the data generating process, limit their performance, flexibility, and usability. Built on top of a recent set of complementary algorithms for nonparametric imputation via chained random forests, `missForest` and `missRanger`, I offer a batch-based approach that subsets the data into batches based on ranked cross-feature correlations, imputes each batch separately, and then joins the imputed subsets in a final step. The process is extremely fast and accurate after a bit of tuning to find the optimal batch size. As a result, high dimensional imputation is more accessible, and researchers are no longer forced to trade speed for accuracy.
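To make the batched workflow concrete, here is a minimal sketch in R. It assumes the package's one-stop `hdImpute()` interface and the staged helpers `feature_cor()`, `flatten_mat()`, and `impute_batches()`; the toy data, the missingness injected via `missRanger::generateNA()`, and the batch size of 2 are illustrative choices, not recommendations.

```r
# Minimal sketch of the batched workflow described above. The toy data,
# the injected missingness, and batch = 2 are illustrative assumptions.
library(hdImpute)
library(missRanger)  # used here only to inject missing values for the demo

set.seed(123)

# Small mixed-type data frame standing in for a high dimensional problem
d <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = factor(sample(letters[1:4], 100, replace = TRUE))
)
d_miss <- missRanger::generateNA(d, p = 0.10)  # roughly 10% missing

# One-stop version: batch by ranked cross-feature correlations, impute
# each batch via chained random forests, and join the imputed subsets
imputed <- hdImpute(data = d_miss, batch = 2)

# Staged version, useful when tuning the batch size step by step:
# 1. rank cross-feature correlations, 2. flatten to an ordered feature
# list, 3. impute batch by batch and join the results
all_cor  <- feature_cor(d_miss)
flat     <- flatten_mat(all_cor)
imputed2 <- impute_batches(data = d_miss, features = flat, batch = 2)
```

In practice, the batch size is the main tuning knob: smaller batches speed up each forest fit, while larger batches keep more correlated features together during imputation.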