
Refactor modifier #104

Draft: wants to merge 59 commits into base: python-cli-refactor

Changes shown from 51 of 59 commits.

Commits
99c7b5b
Add query and generator class
ebensonm Jan 18, 2021
deee5fa
Untested seed swap functionality
ebensonm Jan 18, 2021
cdf770a
Minor syntax fixes
ebensonm Jan 22, 2021
082429a
Merge branch 'python-cli-refactor' into refactor-modifier
ebensonm Jan 22, 2021
b323c1f
Unit test, and job maker
ebensonm Jan 26, 2021
fdd2813
functions to execute D3M runtime evaluate via cli
clarkjoe Jan 26, 2021
275cbc6
added data_prep_pipelines
clarkjoe Jan 28, 2021
9b02988
renamed execute --> evaluate. created blank file for new implementati…
clarkjoe Jan 28, 2021
a9f264d
Merge branch 'experimenter-to-d3m-refactor' of https://github.com/byu…
ebensonm Jan 29, 2021
a0c9991
Bug fixes and queueing
ebensonm Jan 29, 2021
d678e2b
Merge branch 'python-cli-refactor' into refactor-modifier
ebensonm Jan 29, 2021
e565325
Merge branch 'python-cli-refactor' into experimenter-to-d3m-refactor
clarkjoe Jan 29, 2021
a4cc817
Syntax fixes and return paths from query
ebensonm Feb 12, 2021
5f6dfb0
Merge branch 'experimenter-to-d3m-refactor' of https://github.com/byu…
ebensonm Feb 12, 2021
991ad7f
setup.py updates
ebensonm Feb 12, 2021
83cceb4
added function to save pipeline_run docs to DB.
clarkjoe Feb 19, 2021
5c4bb38
updated documentation
clarkjoe Feb 19, 2021
2635fdd
setup.py add dependencies
ebensonm Feb 19, 2021
aea187d
New file
ebensonm Feb 19, 2021
ba5089a
fixed condition typo. renamed file
clarkjoe Feb 19, 2021
e3cc010
Merge branch 'experimenter-to-d3m-refactor' of https://github.com/byu…
ebensonm Feb 19, 2021
62d639e
Update job maker in the seed swap functionality
ebensonm Feb 19, 2021
37d94bc
implemented review suggestions
clarkjoe Feb 22, 2021
d308e92
setup.py elasticsearch and working queue/enqueue
Feb 23, 2021
fe3ba99
Merging branches and evaluate updates
ebensonm Feb 23, 2021
6edfb62
D3M configuration variables
ebensonm Feb 23, 2021
a18d859
Fixed merge from python-cli-refactor
ebensonm Mar 2, 2021
dc35a7f
Merge branch 'config-d3m' of https://github.com/byu-dml/d3m-experimen…
ebensonm Mar 2, 2021
4f8cfa1
fix environment variables and pipeline run to dict for saving
ebensonm Mar 2, 2021
92047d0
Working queue with queue refactor
Mar 2, 2021
78b9605
Added job count and worker info tracking for queue status command
ebensonm Mar 3, 2021
43d06c9
Merge branch 'queue-status-update' of https://github.com/byu-dml/d3m-…
ebensonm Mar 3, 2021
c8c7cc4
rq-worker Popen commands updated
Mar 3, 2021
1a56219
Updated paths for saving pipelines and pipeline runs, queue and evalu…
Mar 5, 2021
68303e0
Merge fix and refactor fitting
ebensonm Mar 12, 2021
c9b0b77
experimenter logging updates
Mar 15, 2021
075844c
Experimenter and queue updates
ebensonm Mar 16, 2021
9d06273
Merge from python-cli-refactor
ebensonm Mar 19, 2021
9b3e649
Working queue and pipeline run local
ebensonm Mar 19, 2021
a53478c
Minor typos, need to update queue and query
Mar 19, 2021
d0fe148
Updates beginning datapreparation functionality
ebensonm Mar 19, 2021
f48abcb
Merge branch 'refactor-modifier' of https://github.com/byu-dml/d3m-ex…
ebensonm Mar 19, 2021
8c26b4a
Working with test and when the data preparation is explicitly defined…
ebensonm Mar 22, 2021
614be91
Bug fixes for remote work
Mar 22, 2021
ce3d768
Added data preparation checks for d3m module
ebensonm Mar 22, 2021
9ce7db7
Minor changes to query, still failed pipelines that probably should n…
Mar 22, 2021
6165dbd
More robust data preparation and scoring pipelines from pipeline run
ebensonm Mar 23, 2021
5208a46
Unnecessary commenting in _run_seed_test
ebensonm Mar 23, 2021
c59ad21
query changes for data params and scoring params
Mar 23, 2021
500ea10
Adding data params and scoring params to pipeline run cli works locally
ebensonm Mar 24, 2021
5c90be4
using data and scoring params working remotely
Mar 24, 2021
c3b4b60
Working on merge suggestions
ebensonm Mar 24, 2021
a0bd988
Merge branch 'refactor-modifier' of https://github.com/byu-dml/d3m-ex…
ebensonm Mar 24, 2021
55e8ce7
First round of suggested changes (mostly on runtime.py)
ebensonm Mar 24, 2021
5777cd8
Cleaned query.py and added tests to test_modifier
ebensonm Mar 25, 2021
a235dfb
Test Updates and query cleaning
ebensonm Mar 26, 2021
635b1c0
docker compose update and finish suggested changes
ebensonm Mar 26, 2021
1e5a946
Fix tests and generator part of the modify generator
ebensonm Mar 26, 2021
3714ac1
Add runtime and old execute pipeline files
ebensonm Mar 29, 2021
1 change: 1 addition & 0 deletions .env-example
@@ -13,3 +13,4 @@ RQ_QUEUES="default"

D3M_DB_SUBMITTER=submitter_name
D3M_DB_TOKEN=token
SAVE_TO_D3M=false
14 changes: 13 additions & 1 deletion docker-compose.yml
@@ -7,14 +7,26 @@ services:
volumes:
- type: bind
source: '${DATA_DIR}/${REDIS_DATA_DIR}'
-target: '/data'
+target: /data
networks:
- default

rq_worker:
image: 'd3m-experimenter:latest'
env_file:
- ./.env
volumes:
- type: bind
source: '${DATASETS_DIR}'
target: /datasets
read_only: true
- type: bind
source: '${DATA_DIR}'
target: /data
- type: bind
source: '${EXPERIMENTER_DIR}'
target: /d3m-experimenter
read_only: true
Member:

Why did you need to mount the experimenter code as a volume?

Author: @ebensonm ebensonm Mar 24, 2021

I am actually not sure if the last mount is necessary. The other two are needed to show the worker where to get the data and put the data. I will run some tests with and without the /d3m-experimenter volume mount before resolving this. Can the worker use experimenter functionality without being in the experimenter?

command: 'rq worker --url redis://${REDIS_HOST} ${RQ_QUEUES}'
networks:
- default
Empty file modified experimenter/__init__.py
100755 → 100644
Empty file.
137 changes: 135 additions & 2 deletions experimenter/cli.py
@@ -1,7 +1,9 @@
import argparse
import typing

-from experimenter import exceptions, queue
+from experimenter.modify_generator import ModifyGenerator
+from experimenter import config, exceptions, queue


def main(argv: typing.Sequence) -> None:
@@ -21,13 +23,21 @@ def configure_parser(parser: argparse.ArgumentParser) -> None:
)
configure_queue_parser(queue_parser)

generator_parser = subparsers.add_parser(
'generator',
description='generates new pipelines and queues them to run on available datasets',
)
configure_generator_parser(generator_parser)


def handler(arguments: argparse.Namespace, parser: argparse.ArgumentParser) -> None:
experimenter_command = arguments.experimenter_command
subparser = parser._subparsers._group_actions[0].choices[experimenter_command] # type: ignore

if experimenter_command == 'queue':
queue_handler(arguments, subparser)
elif experimenter_command == 'generator':
generator_handler(arguments, subparser)
else:
raise exceptions.InvalidStateError('Unknown experimenter command: {}'.format(experimenter_command))

@@ -40,6 +50,12 @@ def configure_queue_parser(parser: argparse.ArgumentParser) -> None:

empty_parser = subparsers.add_parser('empty', help='remove all jobs from a queue')
empty_parser.add_argument('-q', '--queue-name', help='the name of the queue to empty')
empty_parser.add_argument('-f', '--failed', help='remove the failed queue', action='store_true')

# parser for saving a failed job's traceback
save_failed_parser = subparsers.add_parser('save-failed', help='save failed job error output')
save_failed_parser.add_argument('-q', '--queue-name', help='the name of the queue to empty')
save_failed_parser.add_argument('-j', '--job-num', type=int, default=0, help='the failed job number')


def queue_handler(arguments: argparse.Namespace, parser: argparse.ArgumentParser) -> None:
@@ -48,6 +64,123 @@ def queue_handler(arguments: argparse.Namespace, parser: argparse.ArgumentParser
if queue_command == 'status':
queue.status()
elif queue_command == 'empty':
-queue.empty(arguments.queue_name)
+queue.empty(arguments.queue_name, arguments.failed)
elif queue_command == 'save-failed':
queue.save_failed_job(arguments.queue_name, arguments.job_num)
else:
raise exceptions.InvalidStateError('Unknown queue command: {}'.format(queue_command))


def configure_generator_parser(parser: argparse.ArgumentParser) -> None:
parser.add_argument('-j', '--max-jobs', type=int, default=None, action='store', help='maximum number of jobs generated')
parser.add_argument('-t', '--job-timeout', type=int, default=None, action='store', help='maximum runtime for a single job in seconds')

subparsers = parser.add_subparsers(dest='generator_command')
subparsers.required = True # type: ignore

search_subparser = subparsers.add_parser(
'search',
help='searches for new pipelines not found in the metalearning database',
)
configure_search_parser(search_subparser)

modify_subparser = subparsers.add_parser(
'modify',
help='modifies existing pipelines in the metalearning database',
)
configure_modify_parser(modify_subparser)

update_subparser = subparsers.add_parser(
'update',
help='updates existing pipeline runs in the metalearning database to use the current versions of datasets and primitives',
)
configure_update_parser(update_subparser)


def generator_handler(arguments: argparse.Namespace, parser: argparse.ArgumentParser) -> None:
generator_command = arguments.generator_command
subparser = parser._subparsers._group_actions[0].choices[generator_command] # type: ignore

if generator_command == 'search':
search_handler(arguments, subparser)
elif generator_command == 'modify':
modify_handler(arguments, subparser)
elif generator_command == 'update':
update_handler(arguments, subparser)
else:
raise exceptions.InvalidStateError('Unknown generator command: {}'.format(generator_command))


def configure_search_parser(parser: argparse.ArgumentParser) -> None:
pass


def search_handler(arguments: argparse.Namespace, parser: argparse.ArgumentParser) -> None:
raise exceptions.NotImplementedError()


def configure_modify_parser(parser: argparse.ArgumentParser) -> None:
#create the subparsers for the different types of modifications

#seed swapper functionality
subparsers = parser.add_subparsers(dest='modify_type')
subparsers.required = True
swap_seed_subparser = subparsers.add_parser(
'random-seed',
description='Uses database data to search pipelines and run functional pipelines on different random seeds',
)
#subparser arguments
swap_seed_subparser.add_argument(
'--pipeline_id',
help='The pipeline id to search for in the query, if none, searches all pipelines',
default=None,
type=str)
swap_seed_subparser.add_argument(
'--submitter',
help='The pipeline submitter to add to the query',
default=None,
type=str)
swap_seed_subparser.add_argument(
'--seed-limit',
help='The amount of random seeds that each ran pipeline will have at the end of the test',
default=2,
type=int)
swap_seed_subparser.add_argument(
'--test',
help='run the test instead of random pipeline generation',
action='store_true')

#Primitive swapper functionality
primitive_swap_subparser = subparsers.add_parser(
'primitive-swap',
description='Searches database for pipeline runs containing a primitive and swaps out primitive for a different given primitive')
#subparser arguments
primitive_swap_subparser.add_argument(
'--primitive_id',
help='The id of the primitive to swap out',
default=None,
type=str)
primitive_swap_subparser.add_argument(
'--limit_indeces',
help='Details for primitive swapping',
default=None)
primitive_swap_subparser.add_argument(
'--swap_primitive_id',
help='The id of the primitive to swap in',
default=None,
type=str)


def modify_handler(arguments: argparse.Namespace, parser: argparse.ArgumentParser) -> None:
modify_type = arguments.modify_type
modify_generator = ModifyGenerator(modify_type, arguments.max_jobs, arguments)
#now run the enqueuer part
queue.enqueue_jobs(jobs=modify_generator, job_timeout=arguments.job_timeout)


def configure_update_parser(parser: argparse.ArgumentParser) -> None:
pass


def update_handler(arguments: argparse.Namespace, parser: argparse.ArgumentParser) -> None:
raise exceptions.NotImplementedError()
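The cli.py additions above wire three levels of subparsers: `experimenter {queue,generator}`, then `generator {search,modify,update}`, then `modify {random-seed,primitive-swap}`, with each handler dispatching on a `dest` attribute. A minimal, self-contained sketch of that nesting pattern (names simplified; this is not the full experimenter CLI):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Top level: experimenter <command>
    parser = argparse.ArgumentParser(prog='experimenter')
    subparsers = parser.add_subparsers(dest='experimenter_command')
    subparsers.required = True

    # Second level: generator <generator_command>, with shared options
    # that must appear between 'generator' and the next subcommand.
    generator_parser = subparsers.add_parser('generator')
    generator_parser.add_argument('-j', '--max-jobs', type=int, default=None)
    generator_parser.add_argument('-t', '--job-timeout', type=int, default=None)
    gen_sub = generator_parser.add_subparsers(dest='generator_command')
    gen_sub.required = True

    # Third level: modify <modify_type>
    modify_parser = gen_sub.add_parser('modify')
    modify_sub = modify_parser.add_subparsers(dest='modify_type')
    modify_sub.required = True
    seed_parser = modify_sub.add_parser('random-seed')
    seed_parser.add_argument('--seed-limit', type=int, default=2)
    return parser


args = build_parser().parse_args(
    ['generator', '-j', '10', 'modify', 'random-seed', '--seed-limit', '5']
)
print(args.experimenter_command, args.modify_type, args.max_jobs, args.seed_limit)
# -> generator random-seed 10 5
```

Note the option placement: because `-j`/`-t` belong to the `generator` parser, they must be given before `modify`, exactly as in the real CLI.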
9 changes: 8 additions & 1 deletion experimenter/config.py
@@ -27,7 +27,7 @@ def validate_data_dir():
def validate_redis_host():
if redis_host is None:
raise exceptions.ConfigError(_ERROR_MESSAGE.format('REDIS_HOST'))


d3m_db_submitter: str = os.environ.get('D3M_DB_SUBMITTER', None)
def validate_d3m_db_submitter():
@@ -39,3 +39,10 @@ def validate_d3m_db_submitter():
def validate_d3m_db_token():
if d3m_db_token is None:
raise exceptions.ConfigError(_ERROR_MESSAGE.format('D3M_DB_TOKEN'))


save_to_d3m: bool = os.environ.get('SAVE_TO_D3M', None) == 'true'
def validate_save():
if save_to_d3m is None:
raise exceptions.ConfigError(_ERROR_MESSAGE.format('SAVE_TO_D3M'))
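One subtlety in the config.py hunk above: `save_to_d3m` is assigned `os.environ.get('SAVE_TO_D3M', None) == 'true'`, which is always `True` or `False`, so `validate_save`'s `is None` check can never raise for a missing variable. A hedged sketch of a variant (illustrative names, not the PR's code) that validates the raw value first, keeping the missing-variable error reachable:

```python
import os
from typing import Mapping, Optional

_ERROR_MESSAGE = 'environment variable {} is not set'


class ConfigError(Exception):
    """Stand-in for experimenter.exceptions.ConfigError."""


def load_save_flag(environ: Mapping[str, str] = os.environ) -> bool:
    # Check the raw value for absence *before* deriving the bool;
    # comparing to 'true' first would collapse "unset" and "false".
    raw: Optional[str] = environ.get('SAVE_TO_D3M')
    if raw is None:
        raise ConfigError(_ERROR_MESSAGE.format('SAVE_TO_D3M'))
    return raw == 'true'


print(load_save_flag({'SAVE_TO_D3M': 'false'}))  # -> False
```

Passing the environment as a mapping argument also makes the flag testable without mutating `os.environ`.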

6 changes: 3 additions & 3 deletions experimenter/databases/d3m_mtl.py
@@ -23,17 +23,17 @@ def __init__(self) -> None:
self._post_url = D3M_MTL_DB_POST_URL
# This env var allows code calling this class to be run during
# unit tests without actually saving to the production DB.
-self.should_save = config.SAVE_TO_D3M
+self.should_save = config.D3MConfig().save_to_d3m
# A reference to a low-level elasticsearch client. This can be
# used to query the D3M DB, or this class's `search` method
# can be used, and is preferred, since its API is more straightforward.
# This low-level client is the only way to accomplish
# certain things though.
self.es = Elasticsearch(hosts=[D3M_MTL_DB_GET_URL], timeout=30)
# Our submitter name.
-self._submitter = config.D3M_DB_SUBMITTER
+self._submitter = config.D3MConfig().d3m_submitter
# The secret verifying us as the submitter we say we are.
-self._x_token = config.D3M_DB_TOKEN
+self._x_token = config.D3MConfig().d3m_token
if self._is_identifying_as_submitter():
logger.info(
f"Documents will be saved under submitter name: '{self._submitter}'"
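The d3m_mtl.py hunk gates production writes behind `should_save` (driven by `SAVE_TO_D3M`) plus submitter credentials, so unit tests and local runs never touch the real database. A hypothetical sketch of that guard pattern — class and method names are invented for illustration; the real class also wraps an `Elasticsearch` client and posts to `D3M_MTL_DB_POST_URL`:

```python
from typing import Optional


class D3MSaverSketch:
    """Illustrative save guard: writing is a no-op unless configuration
    opts in and a submitter name plus verifying token are both present."""

    def __init__(
        self,
        should_save: bool,
        submitter: Optional[str] = None,
        token: Optional[str] = None,
    ) -> None:
        self.should_save = should_save
        self._submitter = submitter
        self._x_token = token

    def _is_identifying_as_submitter(self) -> bool:
        # Mirrors the check referenced in the diff: both the submitter
        # name and its secret token must be set.
        return bool(self._submitter) and bool(self._x_token)

    def save(self, doc: dict) -> bool:
        # Guard first: tests and local runs silently skip the real DB.
        if not self.should_save:
            return False
        if not self._is_identifying_as_submitter():
            return False
        # ...here the real class would POST `doc` to the production DB...
        return True


print(D3MSaverSketch(should_save=False).save({'id': 'x'}))  # -> False
```

Returning early rather than raising keeps callers identical in test and production configurations, which is the point of the `SAVE_TO_D3M` switch.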