James Weakley committed Jan 11, 2021
1 parent bddc1d5, commit 8bba677
Showing 12 changed files with 266 additions and 4 deletions.
@@ -0,0 +1,28 @@
version: 2

macros:
  - name: label_encoder
    description: |
      Encode target labels with value between 0 and n_classes-1. See scikit-learn's [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder) for full documentation.
      Will append a new column with the name <source column>_encoded.
      Example usage:
      #### **`models\customer_features.sql:`**
      ```
      {{ '{{' }} config(materialized='view') {{ '}}' }}
      {{ '{{' }} dbt_ml_preprocessing.label_encoder( ref('customer') ,'city') {{ '}}' }}
      ```
      Will produce a model named customer_features, with a new column named ```city_encoded``` containing the encoded values.
    arguments:
      - name: source_table
        type: string
        description: Pass in a ref to the table containing the data you want to transform
      - name: source_column
        type: string
        description: The column containing the data you want to transform
      - name: include_columns
        type: string
        description: Other columns from the source table to be included in the model (defaults to '*' and brings all columns across)
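As a point of reference for what `label_encoder` documents, here is a minimal Python sketch of label encoding as scikit-learn describes it: distinct values are sorted and each one is replaced by its position. The function name `label_encode` and the sample city names are hypothetical, and the macro's actual SQL may order or number the classes differently.

```python
def label_encode(values):
    """Map each distinct value to an integer in 0..n_classes-1.

    Assumption: classes are numbered in sorted order, as scikit-learn's
    LabelEncoder does. The dbt macro itself is not shown in this diff.
    """
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values]


# Hypothetical contents of the 'city' column.
cities = ["Sydney", "Melbourne", "Sydney", "Brisbane"]
city_encoded = label_encode(cities)
print(list(zip(cities, city_encoded)))
```

Equal inputs always receive the same code, which is the property the `city_encoded` column relies on.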
@@ -0,0 +1,28 @@
version: 2

macros:
  - name: max_abs_scaler
    description: |
      Scale each feature by its maximum absolute value. See scikit-learn's [MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler) for full documentation.
      Will append a new column with the name <source column>_scaled.
      Example usage:
      #### **`models\customer_features.sql:`**
      ```
      {{ '{{' }} config(materialized='view') {{ '}}' }}
      {{ '{{' }} dbt_ml_preprocessing.max_abs_scaler( ref('customer') ,'age') {{ '}}' }}
      ```
      Will produce a model named customer_features, with a new column named ```age_scaled``` containing the scaled values.
    arguments:
      - name: source_table
        type: string
        description: Pass in a ref to the table containing the data you want to transform
      - name: source_column
        type: string
        description: The column containing the data you want to transform
      - name: include_columns
        type: string
        description: Other columns from the source table to be included in the model (defaults to '*' and brings all columns across)
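The scaling that `max_abs_scaler` describes can be sketched in a few lines of Python. This is an illustration of the arithmetic only, assuming the behaviour documented for scikit-learn's MaxAbsScaler; the name `max_abs_scale` is hypothetical and the macro's SQL is not part of this diff.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the column.

    Results fall in [-1, 1]. An all-zero column is returned as zeros
    rather than dividing by zero.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values]
    return [v / max_abs for v in values]


# Hypothetical column values.
print(max_abs_scale([-4, 2, 1]))
```

Because the data is only divided and never shifted, zeros stay zero and signs are preserved.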
@@ -0,0 +1,28 @@
version: 2

macros:
  - name: min_max_scaler
    description: |
      Transform features by scaling each feature to a given range. See scikit-learn's [MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler) for full documentation.
      Will append a new column with the name <source column>_scaled.
      Example usage:
      #### **`models\customer_features.sql:`**
      ```
      {{ '{{' }} config(materialized='view') {{ '}}' }}
      {{ '{{' }} dbt_ml_preprocessing.min_max_scaler( ref('customer') ,'age') {{ '}}' }}
      ```
      Will produce a model named customer_features, with a new column named ```age_scaled``` containing the scaled values.
    arguments:
      - name: source_table
        type: string
        description: Pass in a ref to the table containing the data you want to transform
      - name: source_column
        type: string
        description: The column containing the data you want to transform
      - name: include_columns
        type: string
        description: Other columns from the source table to be included in the model (defaults to '*' and brings all columns across)
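For readers unfamiliar with min-max scaling, the following Python sketch shows the formula scikit-learn documents for MinMaxScaler. The function name `min_max_scale` and its `feature_range` parameter are assumptions borrowed from scikit-learn; the macro's documented arguments above do not include a range option.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale values so the minimum maps to the low end of
    feature_range and the maximum maps to the high end."""
    lo, hi = min(values), max(values)
    out_lo, out_hi = feature_range
    span = hi - lo
    if span == 0:
        # A constant column has no spread; map everything to the low end.
        return [out_lo for _ in values]
    return [out_lo + (v - lo) / span * (out_hi - out_lo) for v in values]


# Hypothetical ages.
print(min_max_scale([20, 30, 40]))
```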
@@ -0,0 +1,28 @@
version: 2

macros:
  - name: normalizer
    description: |
      Normalize samples individually to unit norm. See scikit-learn's [Normalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer) for full documentation.
      Will append a new column with the name <source column>_normalized.
      Example usage:
      #### **`models\customer_features.sql:`**
      ```
      {{ '{{' }} config(materialized='view') {{ '}}' }}
      {{ '{{' }} dbt_ml_preprocessing.normalizer( ref('customer') ,'age') {{ '}}' }}
      ```
      Will produce a model named customer_features, with a new column named ```age_normalized``` containing the normalized values.
    arguments:
      - name: source_table
        type: string
        description: Pass in a ref to the table containing the data you want to transform
      - name: source_column
        type: string
        description: The column containing the data you want to transform
      - name: include_columns
        type: string
        description: Other columns from the source table to be included in the model (defaults to '*' and brings all columns across)
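"Normalize samples individually to unit norm" means that each row (sample) is divided by its own vector length. The Python sketch below shows that operation for the L2 norm, which is scikit-learn's default for Normalizer. The name `l2_normalize` is hypothetical, and how the macro applies this to the single `source_column` it documents is not stated in this file.

```python
import math


def l2_normalize(row):
    """Scale one sample so that its Euclidean (L2) length is 1.

    An all-zero row is returned unchanged, since it has no direction.
    """
    norm = math.sqrt(sum(x * x for x in row))
    if norm == 0:
        return list(row)
    return [x / norm for x in row]


# Hypothetical two-feature sample.
print(l2_normalize([3, 4]))
```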
@@ -0,0 +1,34 @@
version: 2

macros:
  - name: one_hot_encoder
    description: |
      Encode categorical features as a one-hot numeric array. See scikit-learn's [OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder) for full documentation.
      Will append a new boolean column for every category present in the data with the name <source column>_<category value>.
      Example usage:
      #### **`models\customer_features.sql:`**
      ```
      {{ '{{' }} config(materialized='view') {{ '}}' }}
      {{ '{{' }} dbt_ml_preprocessing.one_hot_encoder( ref('customer') ,'gender') {{ '}}' }}
      ```
      Will produce a model named customer_features, with a new boolean column for each category value found in gender, named ```gender_<category value>```.
    arguments:
      - name: source_table
        type: string
        description: Pass in a ref to the table containing the data you want to transform
      - name: source_column
        type: string
        description: The column containing the data you want to transform
      - name: include_columns
        type: string
        description: Other columns from the source table to be included in the model (defaults to '*' and brings all columns across)
      - name: categories
        type: string
        description: The categories of each feature determined during fitting. Defaults to 'auto', which will encode all values.
      - name: handle_unknown
        type: string
        description: Whether to raise an error or ignore if an unknown categorical feature is present during transform. Only supports the default value of 'ignore' at this time.
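The column naming and the `categories` / `handle_unknown` behaviour described above can be illustrated with a small Python sketch. It assumes rows are plain dictionaries and that 'ignore' means an unlisted value simply gets `False` in every generated column; the function name `one_hot_encode` and the sample rows are hypothetical, not taken from the macro.

```python
def one_hot_encode(rows, source_column, categories="auto"):
    """Append one boolean column per category, named
    <source column>_<category value>.

    With categories="auto", every distinct value in the data is encoded.
    With an explicit list, values outside the list are ignored: they get
    False in all generated columns and no column of their own.
    """
    if categories == "auto":
        categories = sorted({row[source_column] for row in rows})
    encoded = []
    for row in rows:
        new_row = dict(row)  # keep the original columns
        for category in categories:
            new_row[f"{source_column}_{category}"] = row[source_column] == category
        encoded.append(new_row)
    return encoded


# Hypothetical customer rows.
customers = [{"id": 1, "gender": "F"}, {"id": 2, "gender": "M"}]
for row in one_hot_encode(customers, "gender"):
    print(row)
```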
@@ -0,0 +1,37 @@
version: 2

macros:
  - name: quantile_transformer
    description: |
      Transform features using quantiles information. See scikit-learn's [QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer) for full documentation.
      Will append a new column with the name <source column>_transformed.
      Example usage:
      #### **`models\customer_features.sql:`**
      ```
      {{ '{{' }} config(materialized='view') {{ '}}' }}
      {{ '{{' }} dbt_ml_preprocessing.quantile_transformer( ref('customer') ,'age') {{ '}}' }}
      ```
      Will produce a model named customer_features, with a new column named ```age_transformed``` containing the transformed values.
    arguments:
      - name: source_table
        type: string
        description: Pass in a ref to the table containing the data you want to transform
      - name: source_column
        type: string
        description: The column containing the data you want to transform
      - name: include_columns
        type: string
        description: Other columns from the source table to be included in the model (defaults to '*' and brings all columns across)
      - name: n_quantiles
        type: string
        description: Number of quantiles to be computed, defaults to 10.
      - name: output_distribution
        type: string
        description: Marginal distribution for the transformed data. Only supports the default value of 'uniform' at this time.
      - name: subsample
        type: string
        description: Maximum number of samples used to estimate the quantiles for computational efficiency, defaults to 1000.
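To make the role of `n_quantiles` and the 'uniform' output concrete, here is a simplified Python sketch of a quantile transform. It computes `n_quantiles` evenly spaced landmark values from the data, then maps a value to [0, 1] by linear interpolation between the landmarks. This is an approximation under stated assumptions: scikit-learn additionally averages a forward and a backward interpolation to handle ties, subsampling is omitted, and the names `quantile_landmarks` and `quantile_transform` are hypothetical.

```python
def quantile_landmarks(values, n_quantiles=10):
    """Return n_quantiles values evenly spaced through the sorted data,
    using linear interpolation between neighbouring data points."""
    s = sorted(values)
    n = len(s)
    landmarks = []
    for i in range(n_quantiles):
        pos = i * (n - 1) / (n_quantiles - 1)  # fractional index into s
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        landmarks.append(s[lo] + (s[hi] - s[lo]) * (pos - lo))
    return landmarks


def quantile_transform(x, landmarks):
    """Map x to [0, 1] according to where it falls among the landmarks.
    Values outside the observed range are clipped to 0 or 1."""
    n = len(landmarks)
    if x <= landmarks[0]:
        return 0.0
    if x >= landmarks[-1]:
        return 1.0
    for i in range(1, n):
        if x <= landmarks[i]:
            lo_v, hi_v = landmarks[i - 1], landmarks[i]
            frac = (x - lo_v) / (hi_v - lo_v)
            return (i - 1 + frac) / (n - 1)


# Hypothetical ages 0..100, summarised by 11 landmarks.
ages = list(range(101))
marks = quantile_landmarks(ages, n_quantiles=11)
print(quantile_transform(25, marks))
```

More landmarks follow the empirical distribution more closely, at the cost of computing and storing more quantiles.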
@@ -0,0 +1,34 @@
version: 2

macros:
  - name: robust_scaler
    description: |
      Scale features using statistics that are robust to outliers. See scikit-learn's [RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler) for full documentation.
      Will append a new column with the name <source column>_scaled.
      Example usage:
      #### **`models\customer_features.sql:`**
      ```
      {{ '{{' }} config(materialized='view') {{ '}}' }}
      {{ '{{' }} dbt_ml_preprocessing.robust_scaler( ref('customer') ,'age') {{ '}}' }}
      ```
      Will produce a model named customer_features, with a new column named ```age_scaled``` containing the scaled values.
    arguments:
      - name: source_table
        type: string
        description: Pass in a ref to the table containing the data you want to transform
      - name: source_column
        type: string
        description: The column containing the data you want to transform
      - name: include_columns
        type: string
        description: Other columns from the source table to be included in the model (defaults to '*' and brings all columns across)
      - name: with_centering
        type: string
        description: If True, center the data before scaling. Only supports the default value of 'False' at this time.
      - name: quantile_range
        type: string
        description: Quantile range, must be a two-item array containing the first quartile threshold and the third quartile threshold. Defaults to the interquartile range, which is [25,75].
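The following Python sketch illustrates scaling by a quantile range without centering, matching the `with_centering` = 'False' and `quantile_range` = [25,75] defaults documented above. The percentile here uses linear interpolation between data points, which is an assumption; the macro's SQL may rely on a warehouse percentile function with different interpolation. The names `percentile` and `robust_scale` are hypothetical.

```python
def percentile(sorted_values, q):
    """q-th percentile (0-100) of pre-sorted data, linearly interpolated."""
    pos = q / 100 * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def robust_scale(values, quantile_range=(25, 75)):
    """Divide each value by the spread between the two quantiles.

    No median is subtracted, i.e. this is the with_centering=False case.
    """
    s = sorted(values)
    q_low, q_high = (percentile(s, q) for q in quantile_range)
    spread = q_high - q_low
    if spread == 0:
        return [float(v) for v in values]
    return [v / spread for v in values]


# Hypothetical column values.
print(robust_scale([1, 2, 3, 4, 5]))
```

Because the divisor depends only on the middle of the distribution, a single extreme value barely changes how the remaining rows are scaled.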
@@ -0,0 +1,31 @@
version: 2

macros:
  - name: standard_scaler
    description: |
      Standardize features by removing the mean and scaling to unit variance. See scikit-learn's [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) for full documentation.
      Will append a new column with the name <source column>_scaled.
      Example usage:
      #### **`models\customer_features.sql:`**
      ```
      {{ '{{' }} config(materialized='view') {{ '}}' }}
      {{ '{{' }} dbt_ml_preprocessing.standard_scaler( ref('customer') ,'age') {{ '}}' }}
      ```
      Will produce a model named customer_features, with a new column named ```age_scaled``` containing the scaled values.
    arguments:
      - name: source_table
        type: string
        description: Pass in a ref to the table containing the data you want to transform
      - name: source_column
        type: string
        description: The column containing the data you want to transform
      - name: include_columns
        type: string
        description: Other columns from the source table to be included in the model (defaults to '*' and brings all columns across)
      - name: with_mean
        type: string
        description: If True, center the data before scaling. Only supports the default value of 'True' at this time.
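Standardization as described above subtracts the column mean and divides by the standard deviation. The Python sketch below uses the population standard deviation, which is what scikit-learn's StandardScaler uses; whether the macro's SQL uses a population or sample standard deviation is not stated in this file, so treat that choice as an assumption. The name `standard_scale` is hypothetical.

```python
import statistics


def standard_scale(values):
    """Return (value - mean) / population standard deviation per value.

    The result has mean 0 and unit variance. A constant column is
    returned as zeros rather than dividing by zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical column values with mean 5 and population std 2.
print(standard_scale([2, 4, 4, 4, 5, 5, 7, 9]))
```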