DecisionSnippetFeatures

Code and data accompanying the paper

Pascal Welke, Fouad Alkhoury, Christian Bauckhage, and Stefan Wrobel: Decision Snippet Features. 25th International Conference on Pattern Recognition (ICPR) 2021, Milano, Italy

If you use this code, please cite our paper.

Requirements

To install the necessary requirements, we provide an environment.yml file that can be used with Anaconda. You may of course use any other means of installing the requirements listed in this file.

This code may continue to work with more recent versions of the requirements, but we do not guarantee it.

How to run the code

The file /dsf/program.py is the entry point to the decision snippet pipeline. To keep the code concise, we include neither the timing measurements nor the code that measures the average number of inference steps.

To change the configuration, edit the parameters section in /dsf/program.py:

```python
dataPath = "./data/"
forestsPath = "./tmp/forests/"
snippetsPath = "./tmp/snippets/"
resultsPath = "./tmp/results/"


# current valid options are ['sensorless', 'satlog', 'mnist', 'magic', 'spambase', 'letter', 'bank', 'adult', 'drinking']
dataSet = 'magic'
# dataSet = 'adult'
# dataSet = 'drinking'

# possible forest_types ['RF', 'DT', 'ET']
forest_types = ['RF']
forest_depths = [5, 10, 15, 20]
forest_size = 25

maxPatternSize = 6
minThreshold = 2
maxThreshold = 25

scoring_function = 'accuracy'

# learners that are to be used on top of Decision Snippet Features
# (scikit-learn classes: sklearn.naive_bayes.MultinomialNB, sklearn.svm.LinearSVC,
#  sklearn.linear_model.LogisticRegression)
learners = {'DSF_NB': MultinomialNB,
            'DSF_SVM': LinearSVC,
            'DSF_LR': LogisticRegression}

# specify parameters that are given at initialization
learners_parameters = {'DSF_NB': {},
                       'DSF_SVM': {'max_iter': 10000},
                       'DSF_LR': {'max_iter': 1000}}


# for quick debugging, let the whole thing run once. Afterwards, you may deactivate individual steps
# each step stores its output for the subsequent step(s) to process
run_fit_models = True
run_mining = True
run_training = True
run_eval = True

verbose = True
```
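The learners and learners_parameters dictionaries share their keys, so a learner class and its initialization parameters can be looked up under the same name. The following is a minimal sketch of how such a pair could be turned into estimator instances. It assumes scikit-learn is installed; the helper name `build_learners` is hypothetical and not taken from /dsf/program.py.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# copied from the parameters section above
learners = {'DSF_NB': MultinomialNB,
            'DSF_SVM': LinearSVC,
            'DSF_LR': LogisticRegression}

learners_parameters = {'DSF_NB': {},
                       'DSF_SVM': {'max_iter': 10000},
                       'DSF_LR': {'max_iter': 1000}}


def build_learners(learners, learners_parameters):
    """Hypothetical helper: instantiate every learner with its parameters."""
    return {name: cls(**learners_parameters.get(name, {}))
            for name, cls in learners.items()}


models = build_learners(learners, learners_parameters)
for name, model in models.items():
    # MultinomialNB has no max_iter parameter, so .get() returns None for it
    print(name, type(model).__name__, model.get_params().get('max_iter'))
```

Adding another learner this way would only require one new entry in each dictionary under the same key.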

Datasets are provided in the /data/ folder and can be selected in the parameters section of /dsf/program.py by changing dataSet accordingly. Each call of /dsf/program.py processes a single dataset.

Output will be written to the folders specified by the *Path variables (forestsPath, snippetsPath, resultsPath). For each type of output, a subfolder with the name of the current dataset will be created.
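As an illustration of the resulting layout, the sketch below joins each output path with the dataset name. The helper `dataset_output_dirs` is hypothetical, and the exact way /dsf/program.py builds and creates these folders is an assumption; only the default path values and the dataset name are taken from the parameters section.

```python
import os

# default values from the parameters section
forestsPath = "./tmp/forests/"
snippetsPath = "./tmp/snippets/"
resultsPath = "./tmp/results/"
dataSet = 'magic'


def dataset_output_dirs(dataSet, base_paths):
    """Hypothetical helper: one subfolder per output type, named after the dataset."""
    return [os.path.join(base, dataSet) for base in base_paths]


output_dirs = dataset_output_dirs(dataSet, [forestsPath, snippetsPath, resultsPath])
for path in output_dirs:
    # only prints the paths; nothing is created on disk here
    print(path)
```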

Comments

Please note that running the code will not reproduce the exact numbers reported in the paper. This is due to randomization in the Random Forests, as well as random cross-validation splits when selecting the best Decision Snippet Features for each learner.

Finally, I would like to thank Lukas Pfahler for pointing out an error in an earlier version of our code.
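If repeatable numbers are needed, for example while debugging, scikit-learn estimators and splitters accept a random_state argument. The sketch below is not part of the provided code: it uses synthetic stand-in data and a hypothetical function `cv_accuracy` to show that fixing both seeds makes two runs agree. Whether and where /dsf/program.py would need such seeds is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# synthetic stand-in data; not one of the datasets shipped in /data/
X, y = make_classification(n_samples=200, n_features=10, random_state=0)


def cv_accuracy(seed):
    """Cross-validated accuracy with both sources of randomness tied to `seed`."""
    forest = RandomForestClassifier(n_estimators=25, max_depth=5, random_state=seed)
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(forest, X, y, cv=folds, scoring='accuracy')


first = cv_accuracy(seed=42)
second = cv_accuracy(seed=42)
# compares the per-fold accuracies of the two identically seeded runs
print((first == second).all())
```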
