All code used for my master's thesis.
- Open a terminal and navigate to the directory where the algorithms should be stored.
- Clone this repository:
  `git clone https://github.com/Zepp3/Master-Thesis`
- `DataIngest_Schachtschneider.py` can be downloaded and run locally, for example in PyCharm.
- If `ModuleNotFoundError: No module named 'package'` occurs, install the missing package with `pip3 install package`.
- Return to step 1.
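
To find missing packages before a script fails with `ModuleNotFoundError`, a small check like the following may help. It is only a sketch: the helper name `find_missing` and the module list are placeholders, and the name used in `import` can differ from the name that `pip3 install` expects.

```python
import importlib.util


def find_missing(modules):
    """Return the names from `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Placeholder list: replace it with the top-level imports of the script you want to run.
    wanted = ["json", "csv"]
    missing = find_missing(wanted)
    if missing:
        print("Install with: pip3 install " + " ".join(missing))
    else:
        print("All listed modules are available.")
```

Only top-level module names should be passed in, since looking up a dotted name imports its parent package first.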
- Open a terminal and navigate to the directory in which `add_externals.py` can be found.
- Add external data with
  `python3 add_externals.py`
- If `ModuleNotFoundError: No module named 'package'` occurs, install the missing package with `pip3 install package`.
- Return to step 5.
- Open a terminal and navigate to the directory in which `DGW.py` can be found.
- Open `DGW.py` with `vim DGW.py` and adjust the desired hyperparameters for the optimization in the `objective` function.
- Run the optimization of DGW:
  `python3 DGW.py -data CashierData.csv -n_trials 50 -database_name DGW_default -shift_numbers 0`

  Arguments:
  - `data` (str): Dataset that should be used.
  - `n_trials` (int): Number of optimization trials.
  - `database_name` (str): Name of the project.
  - `shift_numbers` (int): Number of days by which the dataset should be shifted. Can be multiple days and can be positive or negative.

  Returns:
  - DataFrame and .txt file: fake data in a DataFrame and performance measures.
- If `ModuleNotFoundError: No module named 'package'` occurs, install the missing package with `pip3 install package`.
- Return to step 2.
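
For orientation, the command-line flags listed above could be parsed roughly as follows. This is a sketch that assumes the script uses `argparse`; the defaults and the helper name `build_parser` are illustrative and not taken from `DGW.py`.

```python
import argparse


def build_parser():
    """Build a parser for the four documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Sketch of the DGW command line")
    parser.add_argument("-data", type=str, required=True, help="Dataset that should be used")
    parser.add_argument("-n_trials", type=int, default=50, help="Number of optimization trials")
    parser.add_argument("-database_name", type=str, default="DGW_default", help="Name of the project")
    parser.add_argument("-shift_numbers", type=int, default=0, help="Days by which the dataset is shifted")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(
    ["-data", "CashierData.csv", "-n_trials", "50", "-database_name", "DGW_default", "-shift_numbers", "-3"]
)
print(args.data, args.n_trials, args.database_name, args.shift_numbers)
```

A negative shift such as `-3` is read as a value rather than as a flag, because none of the defined options looks like a negative number.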
- Go to the DGW folder.
- Open `test_samples.ipynb` in a Jupyter Notebook.
- Run it:
  `test_samples(data, database_name="default", print_data=True)`

  Arguments:
  - `data` (str): Dataset that should be used.
  - `database_name` (str): Name of the project that you want to evaluate.
  - `print_data` (bool): Whether to print the test and fake datasets.

  Returns:
  - Plots and text in stdout: visual and statistical evaluation of the fake data compared to the real test data.
- If `ModuleNotFoundError: No module named 'package'` occurs, install the missing package with `pip3 install package`.
- Return to step 3.
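
As a rough idea of what a statistical comparison between real and fake data can look like, the sketch below compares the mean and standard deviation per column. It is not the evaluation performed by `test_samples`; the function name `compare_columns`, the column name `sales`, and the values are invented for illustration.

```python
from statistics import mean, stdev


def compare_columns(real, fake):
    """Return per-column differences in mean and standard deviation (fake minus real)."""
    report = {}
    for column in real:
        report[column] = {
            "mean_diff": mean(fake[column]) - mean(real[column]),
            "std_diff": stdev(fake[column]) - stdev(real[column]),
        }
    return report


# Toy values, invented for illustration only.
real = {"sales": [10.0, 12.0, 14.0]}
fake = {"sales": [11.0, 13.0, 15.0]}
report = compare_columns(real, fake)
print(report)
```

Differences close to zero suggest that the fake data matches the real data in these two statistics, which says nothing yet about temporal structure.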
- Open a terminal and navigate to the directory in which `CTGAN.py` can be found.
- Open `CTGAN.py` with `vim CTGAN.py` and adjust the desired hyperparameters for the optimization in the `objective` function.
- Run the optimization of CTGAN with
  `python3 CTGAN.py -data CashierData.csv -num_samples 500 -n_trials 100 -database_name CTGAN_default -shift_numbers 0`

  Arguments:
  - `data` (str): Dataset that should be used.
  - `num_samples` (int): Number of fake samples that should be generated.
  - `n_trials` (int): Number of optimization trials.
  - `database_name` (str): Name of the project.
  - `shift_numbers` (int): Number of days by which the dataset should be shifted. Can be multiple days and can be positive or negative.

  Returns:
  - DataFrame and .txt file: fake data in a DataFrame and performance measures.
- If `ModuleNotFoundError: No module named 'package'` occurs, install the missing package with `pip3 install package`.
- Return to step 2.
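
The hyperparameters are adjusted inside the `objective` function. Assuming the optimization uses Optuna, such a function generally has the shape sketched below. The hyperparameter names, their ranges, and the placeholder `train_and_score` are hypothetical and not copied from `CTGAN.py`.

```python
def train_and_score(epochs, learning_rate):
    """Placeholder standing in for model training and evaluation."""
    return abs(learning_rate - 1e-3) + 1.0 / epochs


def objective(trial):
    """Receive an Optuna trial, sample hyperparameters, return a score to minimize."""
    # Hypothetical search space: edit the names and bounds to change what is optimized.
    epochs = trial.suggest_int("epochs", 100, 500)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    return train_and_score(epochs, learning_rate)


def run_study(n_trials):
    """Run the optimization; requires the third-party package optuna."""
    import optuna

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```

Calling `run_study(10)` would run ten trials and return the best parameters found. Changing a bound in `objective` changes the search space for all subsequent trials.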
Test the similarity of the fake data to the real data of CTGAN and visualize the optuna optimization
- Go to the CTGAN folder.
- Open `test_samples.ipynb` in a Jupyter Notebook.
- Run it:
  `test_samples(data, database_name="default", print_data=True)`

  Arguments:
  - `data` (str): Dataset that should be used.
  - `database_name` (str): Name of the project that you want to evaluate.
  - `print_data` (bool): Whether to print the test and fake datasets.

  Returns:
  - Plots and text in stdout: visual and statistical evaluation of the fake data compared to the real test data.
- If `ModuleNotFoundError: No module named 'package'` occurs, install the missing package with `pip3 install package`.
- Return to step 3.
- Insert the right study name and storage.
- Run the cell.
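
Inserting the study name and storage could look like the sketch below, which loads a stored Optuna study and plots its optimization history. The SQLite file naming in `sqlite_storage_url` is an assumption and may not match how the scripts store their studies. Both helper names are hypothetical, and the plot needs `optuna` and `plotly` to be installed.

```python
def sqlite_storage_url(database_name):
    """Build an SQLite storage URL (the file naming is an assumption)."""
    return f"sqlite:///{database_name}.db"


def show_history(study_name, storage):
    """Load a stored study and display its optimization history."""
    import optuna  # third-party; imported here so the URL helper works without it

    study = optuna.load_study(study_name=study_name, storage=storage)
    fig = optuna.visualization.plot_optimization_history(study)
    fig.show()
    return study


# Example call with hypothetical names:
# show_history("CTGAN_default", sqlite_storage_url("CTGAN_default"))
```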
- Open a terminal and navigate to the directory in which `timeGAN.py` can be found.
- Open `timeGAN.py` with `vim timeGAN.py` and adjust the desired hyperparameters for the optimization in the `objective` function.
- Run the optimization of timeGAN with
  `python3 timeGAN_je.py -data CashierData.csv -num_samples 500 -n_trials 100 -database_name CTGAN_default -shift_numbers 0`

  Arguments:
  - `data` (str): Dataset that should be used.
  - `num_samples` (int): Number of fake samples that should be generated.
  - `n_trials` (int): Number of optimization trials.
  - `database_name` (str): Name of the project.
  - `shift_numbers` (int): Number of days by which the dataset should be shifted. Can be multiple days and can be positive or negative.

  Returns:
  - DataFrame and .txt file: fake data in a DataFrame and performance measures.
- If `ModuleNotFoundError: No module named 'package'` occurs, install the missing package with `pip3 install package`.
- Return to step 2.
Test the similarity of the fake data to the real data of timeGAN and visualize the optuna optimization
- Go to the timeGAN folder.
- Open `test_samples.ipynb` in a Jupyter Notebook.
- Run it:
  `test_samples(data, database_name="default", print_data=True)`

  Arguments:
  - `data` (str): Dataset that should be used.
  - `database_name` (str): Name of the project that you want to evaluate.
  - `print_data` (bool): Whether to print the test and fake datasets.

  Returns:
  - Plots and text in stdout: visual and statistical evaluation of the fake data compared to the real test data.
- If `ModuleNotFoundError: No module named 'package'` occurs, install the missing package with `pip3 install package`.
- Return to step 3.
All code in this repository was written by me, with the exception of the following scripts, which were written by the authors of the corresponding papers:
The scripts are based on scripts of Florian Haslbeck.
- `augmentation.py`, `dtw.py`, `helper.py`
- `timegan.py`, `cut_data(ori_data, seq_len)` in `own_data_loading_cashierdata` and `own_data_loading_schachtschneider`, `utils.py`
CTGAN: Xu, Lei; Skoularidou, Maria; Cuesta-Infante, Alfredo; Veeramachaneni, Kalyan (2019): Modeling Tabular data using Conditional GAN. In H. Wallach, H. Larochelle, A. Beygelzimer, F. Alché-Buc, E. Fox, R. Garnett (Eds.): Advances in Neural Information Processing Systems, vol. 32: Curran Associates, Inc. Available online at https://proceedings.neurips.cc/paper/2019/file/254ed7d2de3b23ab10936522dd547b78-Paper.pdf.
DGW: Iwana, Brian Kenji; Uchida, Seiichi (2020): Time Series Data Augmentation for Neural Networks by Time Warping with a Discriminative Teacher. Available online at http://arxiv.org/pdf/2004.08780v1.
timeGAN: Yoon, Jinsung; Jarrett, Daniel; van der Schaar, Mihaela (2019): Time-series Generative Adversarial Networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. Alché-Buc, E. Fox, R. Garnett (Eds.): Advances in Neural Information Processing Systems, vol. 32: Curran Associates, Inc. Available online at https://proceedings.neurips.cc/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf.