mltools is an R package with tools for:
- data cleansing
- exploratory data analysis
- evaluating machine-learning models
mltools (published on CRAN in 2016) provides a variety of methods that help practitioners do rapid, meaningful exploratory data analysis, particularly on high-dimensional data.
For example, raw data often contains columns that are masked duplicates of each other (e.g. StateID and StateName) and columns with hierarchical relationships (e.g. StateID and CityID). These relationships are not always obvious, and they can be hard to find in data with hundreds of columns whose values may be encoded or anonymized. mltools can quickly recognize structures like this in a dataset.
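
To make the idea concrete, here is a minimal base-R sketch of how a masked duplicate or a hierarchical relationship between two columns could be detected. It does not use mltools and is not the package's implementation; the helper names `determines()` and `relationship()` and the toy data are hypothetical, chosen only for illustration.

```r
# Toy data: StateID and StateName are masked duplicates,
# and CityID is nested within StateID.
df <- data.frame(
  StateID   = c(1, 1, 2, 2, 3),
  StateName = c("OH", "OH", "TX", "TX", "NY"),
  CityID    = c(10, 11, 20, 21, 30),
  stringsAsFactors = FALSE
)

# TRUE when every distinct value of `a` maps to exactly one value of `b`
determines <- function(a, b) {
  all(tapply(b, a, function(v) length(unique(v))) == 1)
}

# Classify the relationship between two columns (hypothetical helper)
relationship <- function(a, b) {
  ab <- determines(a, b)
  ba <- determines(b, a)
  if (ab && ba) {
    "duplicate"        # one-to-one mapping in both directions
  } else if (ab) {
    "a nested in b"    # each a belongs to a single b
  } else if (ba) {
    "b nested in a"    # each b belongs to a single a
  } else {
    "none"
  }
}

relationship(df$StateID, df$StateName)  # "duplicate"
relationship(df$StateID, df$CityID)     # "b nested in a"
```

Running a check like this over every pair of columns is what becomes tedious by hand on wide datasets.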
Another common phenomenon in data is a small group accounting for a large portion of a dependent variable (e.g. one customer accounts for 5% of sales, or one exposure accounts for 10% of insurance losses). Finding these cases is very important, but the procedure is often tedious and therefore overlooked. mltools simplifies this analysis, which can pay dividends during the modeling phase.
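
The concentration check described above can be sketched in a few lines of base R. This is an illustration of the analysis rather than mltools' own routine; the `sales` data and the `top_n_for()` helper are made up for the example.

```r
# Hypothetical sales records, one row per transaction
sales <- data.frame(
  customer = c("A", "B", "C", "D", "E"),
  amount   = c(500, 200, 150, 100, 50)
)

# Total per customer, sorted from largest to smallest
totals <- sort(tapply(sales$amount, sales$customer, sum), decreasing = TRUE)

# Share of the grand total held by each customer, and the running share
share     <- totals / sum(totals)
cum_share <- cumsum(share)

# Smallest number of customers needed to cover at least `threshold` of the total
top_n_for <- function(cum_share, threshold) {
  unname(which(cum_share >= threshold)[1])
}

share[["A"]]               # 0.5: one customer accounts for half of all sales
top_n_for(cum_share, 0.8)  # 3: three customers cover at least 80% of sales
```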
In addition to its exploratory methods, mltools offers a number of convenient machine-learning functions whose counterparts in other R packages are either unsatisfactory or missing. These include:
- auc_roc(): A fast method for calculating Area Under the ROC Curve
- roc_scores(): A method for ranking and evaluating cross-validated predictions of an ML model
- sparsify(): A helper method that converts a data.table into a sparse matrix
- relative_position(): A helper method for ranking a set of values, scaled between 0 and 1
- exponential_weights(): Generates weights based on exponential decay
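
For readers unfamiliar with the metric behind auc_roc(), the following base-R sketch computes the area under the ROC curve from the rank (Mann-Whitney) formulation. It is one common way to compute AUC and is offered as an assumption-laden illustration, not as the algorithm mltools uses; `auc_rank()` is a hypothetical name, and the function assumes `actuals` is coded as 0/1 with at least one of each class.

```r
# AUC via the Mann-Whitney rank statistic.
# Assumes `actuals` contains only 0 and 1, with both classes present.
auc_rank <- function(preds, actuals) {
  r     <- rank(preds)          # tied scores receive average ranks
  n_pos <- sum(actuals == 1)
  n_neg <- sum(actuals == 0)
  # Sum of positive ranks, minus its minimum possible value,
  # divided by the number of positive/negative pairs
  (sum(r[actuals == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

preds   <- c(0.9, 0.8, 0.4, 0.35, 0.1)
actuals <- c(1,   0,   1,   0,    0)

auc_rank(preds, actuals)  # 5/6: five of the six positive/negative pairs are ordered correctly
```

The result is the fraction of positive/negative pairs in which the positive case receives the higher score, which is why a rank-based computation avoids building the ROC curve explicitly.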