dProf - A Data Quality Profiler

"Don't waste my time with garbage data sets!" When you are given a new data set, the first thing to do is a quick data quality scan. This package is inspired by Jack Olson's Data Quality: The Accuracy Dimension, as was an initial attempt presented at useR! 2004. This version leverages advances in R and the Hadleyverse. More importantly, the profiling and presentation functionality have been decoupled, giving us the ability to optimize each for particular data sources and needs.

This version initially implements column-level profiling: each column is profiled independently. Upcoming releases will add between-column profiling. Also on the roadmap are a DBMS SQL-optimized profiling module and a compact profile presentation following what was done in 2004.
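To make "each column is profiled independently" concrete, here is a minimal base R sketch of the idea. It does not use or reproduce dProf's API: `profile_column()`, `profile_columns()`, the chosen metrics, and the sample data frame are all hypothetical and exist only for illustration.

```r
# Hypothetical helper (not part of dProf): summarise one column in isolation.
profile_column <- function(x) {
  data.frame(
    type      = class(x)[1],                   # storage class of the column
    n         = length(x),                     # number of values
    n_missing = sum(is.na(x)),                 # count of NA values
    n_unique  = length(unique(x[!is.na(x)])),  # distinct non-missing values
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: apply profile_column() to every column independently
# and stack the one-row results into a single profile table.
profile_columns <- function(df) {
  rows <- lapply(df, profile_column)
  out <- do.call(rbind, rows)
  data.frame(column = names(df), out, row.names = NULL,
             stringsAsFactors = FALSE)
}

# Made-up sample data for the illustration.
df <- data.frame(
  id    = c(1, 2, 3, 4),
  city  = c("Oslo", NA, "Oslo", "Lima"),
  score = c(0.5, 0.7, NA, NA),
  stringsAsFactors = FALSE
)

profile <- profile_columns(df)
print(profile)
```

Because no column's summary depends on any other column, this kind of profile can be computed one column at a time. Between-column checks would need a different structure.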

See the vignette, dProf_Workflow.Rmd, to get started.

Version 0.1.0 - an early beta!
