style: format code with Autopep8, Black, ClangFormat, dotnet-format, Go fmt, Gofumpt, Google Java Format, isort, Ktlint, PHP CS Fixer, Prettier, RuboCop, Ruff Formatter, Rustfmt, Scalafmt, StandardJS, StandardRB, swift-format and Yapf #155

Merged 2 commits on May 12, 2024
17 changes: 11 additions & 6 deletions ai/risk_assessment/data_preparation.py
@@ -1,14 +1,16 @@
-import pandas as pd
 import numpy as np
+import pandas as pd
 from sklearn.model_selection import train_test_split

+
 def load_data(data_path):
     """
     Loads the data from the given file path.
     """
     data = pd.read_csv(data_path)
     return data

+
 def preprocess_data(data):
     """
     Preprocesses the data by cleaning, transforming, and encoding the features.
@@ -17,18 +19,21 @@ def preprocess_data(data):
     data = data.dropna()

     # Transform the data
-    data['amount'] = np.log(data['amount'])
+    data["amount"] = np.log(data["amount"])

     # Encode the categorical features
-    data = pd.get_dummies(data, columns=['category', 'merchant'])
+    data = pd.get_dummies(data, columns=["category", "merchant"])

     return data

+
 def split_data(data, test_size=0.2):
     """
     Splits the data into training and testing sets.
     """
-    X = data.drop('fraud', axis=1)
-    y = data['fraud']
-    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
+    X = data.drop("fraud", axis=1)
+    y = data["fraud"]
+    X_train, X_test, y_train, y_test = train_test_split(
+        X, y, test_size=test_size, random_state=42
+    )
     return X_train, X_test, y_train, y_test
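
Reviewer note: the quote-style and line-wrapping changes in `split_data` look purely cosmetic. One way to sanity-check that is to parse the old and new versions and compare their syntax trees. The sketch below does this with the standard `ast` module. The helper name `same_ast` and the `OLD`/`NEW` constants are illustrative and not part of this PR. The import reordering at the top of the file is left out of the comparison because it changes statement order, so it would not compare equal this way.

```python
import ast

# Pre-formatting version of split_data, copied from the removed (-) lines.
OLD = '''
def split_data(data, test_size=0.2):
    """
    Splits the data into training and testing sets.
    """
    X = data.drop('fraud', axis=1)
    y = data['fraud']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
    return X_train, X_test, y_train, y_test
'''

# Post-formatting version, copied from the added (+) lines.
NEW = '''
def split_data(data, test_size=0.2):
    """
    Splits the data into training and testing sets.
    """
    X = data.drop("fraud", axis=1)
    y = data["fraud"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    return X_train, X_test, y_train, y_test
'''


def same_ast(a: str, b: str) -> bool:
    """Return True when both sources parse to the same syntax tree.

    ast.dump() omits line and column attributes by default, so quote
    style, wrapping, and blank lines do not affect the comparison.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


if __name__ == "__main__":
    print(same_ast(OLD, NEW))
```

The snippets are only parsed, never executed, so `train_test_split` does not need to be importable for the check to run.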