Fix for the transformers translation plus some renaming (#162)
* Fix the translation plus some renaming

* Updated the changelog.

* Apply suggestions from code review

Co-authored-by: Christoph Kuhnke <[email protected]>

* Trying to get the translation test working

* Trying to get the translation test working

* Trying to get the translation test working

---------

Co-authored-by: Christoph Kuhnke <[email protected]>
ahsimb and ckunki authored Feb 2, 2024
1 parent 02c18cc commit 4529e68
Showing 12 changed files with 83 additions and 44 deletions.
2 changes: 2 additions & 0 deletions doc/changes/changes_0.2.0.md
Original file line number Diff line number Diff line change
@@ -14,7 +14,9 @@ Version: 0.2.0
## Bug Fixes

* #163: Fixed version number of VM images etc.
* #161: Fixed the bug in the Transformers' Translation notebook.

## Refactoring
* #160: Implemented the PM's recommendations of 2024-01-24.

## Documentation
Original file line number Diff line number Diff line change
@@ -244,7 +244,7 @@
"source": [
"## Poll training status\n",
"\n",
"As it was mentioned above, the model training runs asynchronously. We can monitor its progress by polling the Autopilot job status. Please call this script periodically until you see the status as Completed. "
"As it was mentioned above, the model training runs asynchronously. We can monitor its progress by polling the Autopilot job status. Please call this script periodically until you see the status as Completed. Please note that the model training with AWS Sagemaker may take a considerable time. At the time when this notebook was designed the waiting time was typically in the range of 1-2 hours."
]
},
{
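The polling step described in the notebook text of this hunk can be sketched as a simple loop. This is a minimal sketch, not the notebook's actual code: `get_status` is a hypothetical callable standing in for whatever returns the Autopilot job status, and the `Failed` and `Stopped` terminal states, the polling interval and the timeout are assumptions chosen for illustration.

```python
import time
from typing import Callable


def wait_for_completion(get_status: Callable[[], str],
                        poll_interval_s: float = 60.0,
                        timeout_s: float = 3 * 60 * 60) -> str:
    """Poll a job status until it reaches a terminal state or the timeout expires.

    `get_status` is a hypothetical callable; the set of terminal states is an assumption.
    """
    terminal_states = {'Completed', 'Failed', 'Stopped'}
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Job still in state {status!r} after {timeout_s} seconds')
        # Sleep between polls so the status endpoint is not queried in a tight loop.
        time.sleep(poll_interval_s)
```

Given the 1-2 hour training time mentioned in the new text, a timeout well above that range and a poll interval of about a minute seem like reasonable defaults, but both are guesses.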
Original file line number Diff line number Diff line change
@@ -121,7 +121,8 @@
"id": "f3727075-4680-4f2f-83c2-17e5c22c6a57",
"metadata": {},
"source": [
"We will collect 5 best answers."
"We will collect the 5 best answers.\n",
"We will save the result in the variable `udf_output` to support automatic testing of this notebook."
]
},
{
Original file line number Diff line number Diff line change
@@ -130,7 +130,8 @@
"id": "43f3f6e9-6f40-49b2-bbba-8954c35b5e06",
"metadata": {},
"source": [
"We will collect 5 best answers."
"We will collect 5 best answers.\n",
"We will save the result in the variable `udf_output` to support automatic testing of this notebook."
]
},
{
Original file line number Diff line number Diff line change
@@ -100,7 +100,8 @@
"source": [
"## Use language model\n",
"\n",
"Let's try to classify a single phrase which definitely bears emotions but is also somewhat ambiguous - \"Oh my God!\""
"Let's try to classify a single phrase that definitely bears emotions but is also somewhat ambiguous - \"Oh my God!\".\n",
"We will save the result in the variable `udf_output` to support automatic testing of this notebook."
]
},
{
Original file line number Diff line number Diff line change
@@ -71,7 +71,7 @@
"outputs": [],
"source": [
"%run utils/model_retrieval.ipynb\n",
"load_huggingface_model(MODEL_NAME, method='udf')"
"load_huggingface_model(MODEL_NAME)"
]
},
{
@@ -130,7 +130,7 @@
"id": "27b1dd67-ffed-4bf8-9ee7-a1e003cdbcc6",
"metadata": {},
"source": [
"We will be updating this variable at every call to the model.\n",
"At the start, the `MY_TEXT` variable has an initial context. We will update and print this variable at every call to the model.\n",
"Please run the next cell multiple times to see how the text evolves."
]
},
@@ -139,6 +139,7 @@
"execution_count": null,
"id": "51b0d826-519a-4e03-9ce0-3827d0756f1d",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
Original file line number Diff line number Diff line change
@@ -131,6 +131,14 @@
"MY_TEXT = MY_TEXT.replace(\"'\", \"''\")"
]
},
{
"cell_type": "markdown",
"id": "dff2b9c8-6b6a-4e7a-b065-898d841a34fc",
"metadata": {},
"source": [
"Let's run the token classification model. We will save the result in the variable `udf_output` to support automatic testing of this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
Original file line number Diff line number Diff line change
@@ -91,7 +91,7 @@
"outputs": [],
"source": [
"%run utils/model_retrieval.ipynb\n",
"load_huggingface_model(MODEL_NAME, method='udf')"
"load_huggingface_model(MODEL_NAME)"
]
},
{
@@ -115,7 +115,20 @@
"MY_TEXT = 'We all live in a yellow submarine'\n",
"\n",
"# Make sure our text can be used in an SQL statement.\n",
"MY_TEXT = MY_TEXT.replace(\"'\", \"''\")"
"MY_TEXT = MY_TEXT.replace(\"'\", \"''\")\n",
"\n",
"SOURCE_LANGUAGE = 'English'\n",
"TARGET_LANGUAGE = 'German'\n",
"\n",
"MAX_TRANSLATION_LENGTH = 200"
]
},
{
"cell_type": "markdown",
"id": "cb8641d1-1b84-47a5-8ef0-e1f01cd8a412",
"metadata": {},
"source": [
"Let's run the translation model. We will save the result in the variable `udf_output` to support automatic testing of this notebook."
]
},
{
@@ -133,17 +146,21 @@
"outputs": [],
"source": [
"%%sql --save udf_output\n",
"SELECT TE_TRANSLATION_UDF(\n",
" NULL,\n",
" '{{sb_config.te_bfs_connection}}',\n",
" '{{sb_config.te_hf_connection}}',\n",
" '{{sb_config.te_models_bfs_dir}}',\n",
" '{{MODEL_NAME}}',\n",
" '{{MY_TEXT}}',\n",
" '',\n",
" '',\n",
" 0\n",
")"
"WITH MODEL_OUTPUT AS\n",
"(\n",
" SELECT TE_TRANSLATION_UDF(\n",
" NULL,\n",
" '{{sb_config.te_bfs_connection}}',\n",
" '{{sb_config.te_hf_connection}}',\n",
" '{{sb_config.te_models_bfs_dir}}',\n",
" '{{MODEL_NAME}}',\n",
" '{{MY_TEXT}}',\n",
" '{{SOURCE_LANGUAGE}}',\n",
" '{{TARGET_LANGUAGE}}',\n",
" {{MAX_TRANSLATION_LENGTH}}\n",
" )\n",
")\n",
"SELECT translation_text, error_message FROM MODEL_OUTPUT"
]
},
{
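The translation hunk relies on two things: doubling single quotes so that `MY_TEXT` is safe inside an SQL string literal, and wrapping the UDF call in a `WITH` clause so that only `translation_text` and `error_message` are selected. The sketch below reproduces that shape in plain Python for illustration. The helper names `sql_string_literal` and `build_translation_query` are hypothetical, and the connection, directory and model values in the usage example are placeholders; the notebook itself renders the query through `{{...}}` template variables rather than through code like this.

```python
def sql_string_literal(text: str) -> str:
    """Return `text` as a single-quoted SQL string literal with embedded quotes doubled."""
    return "'" + text.replace("'", "''") + "'"


def build_translation_query(udf_args: list) -> str:
    """Render a query that calls the UDF and selects the two result columns.

    Hypothetical helper: None becomes NULL, strings are quoted, other values are passed as-is.
    """
    rendered = []
    for arg in udf_args:
        if arg is None:
            rendered.append('NULL')
        elif isinstance(arg, str):
            rendered.append(sql_string_literal(arg))
        else:
            rendered.append(str(arg))
    return (
        'WITH MODEL_OUTPUT AS (\n'
        f"    SELECT TE_TRANSLATION_UDF({', '.join(rendered)})\n"
        ')\n'
        'SELECT translation_text, error_message FROM MODEL_OUTPUT'
    )


# Placeholder arguments in the same order as in the notebook cell.
query = build_translation_query(
    [None, 'bfs_conn', 'hf_conn', 'models_dir', 'model-name',
     "Don't panic", 'English', 'German', 200]
)
print(query)
```

Passing the source language, target language and maximum length explicitly mirrors the fix in this commit, where the previous call supplied empty strings and `0` for those three arguments.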
Original file line number Diff line number Diff line change
@@ -131,6 +131,14 @@
"MY_LABELS = 'space & cosmos, scientific discovery, microbiology, robots, archeology'"
]
},
{
"cell_type": "markdown",
"id": "768a97ed-a104-4f89-8c7b-9fdd6558b5ab",
"metadata": {},
"source": [
"Let's run the zero shot text classification model. We will save the result in the variable `udf_output` to support automatic testing of this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
Original file line number Diff line number Diff line change
@@ -44,28 +44,28 @@
"\n",
"class ITDEStatus(Enum):\n",
" \"\"\"\n",
" Display status of the Docker-DB\n",
" Display status of the Exasol Docker-DB\n",
" \"\"\"\n",
" running = 'Docker-DB is RUNNING'\n",
" stopped = 'Docker-DB is STOPPED'\n",
" missing = 'Docker-DB is NOT CREATED'\n",
" running = 'Exasol Docker-DB is RUNNING'\n",
" stopped = 'Exasol Docker-DB is STOPPED'\n",
" missing = 'Exasol Docker-DB is NOT CREATED'\n",
"\n",
"\n",
"def get_db_selection_ui() -> widgets.Widget:\n",
" \"\"\"\n",
" Creates a UI form for choosing between the Docker-DB and a customer database.\n",
" Creates a UI form for choosing between the Exasol Docker-DB and an external Exasol Database.\n",
"\n",
" A global variable `sb_config` should reference a configuration store object.\n",
" \"\"\"\n",
"\n",
" ui_look = get_config_styles()\n",
"\n",
" db_options = ['Docker-DB', 'Custom']\n",
" db_options = ['Exasol Docker-DB', 'External Exasol Database']\n",
" db_choice = 0 if sb_config.get(CKey.use_itde, 'True') == 'True' else 1\n",
" db_selector = widgets.RadioButtons(options=db_options, value=db_options[db_choice], \n",
" layout=ui_look.input_layout, style=ui_look.input_style)\n",
" select_btn = widgets.Button(description='Select', style=ui_look.button_style, layout=ui_look.button_layout)\n",
" header_lbl = widgets.Label(value='Database Choice', style=ui_look.header_style, layout=ui_look.header_layout)\n",
" header_lbl = widgets.Label(value='Exasol Database Choice', style=ui_look.header_style, layout=ui_look.header_layout)\n",
"\n",
"\n",
" def select_database(btn):\n",
@@ -84,9 +84,9 @@
" return ui\n",
"\n",
"\n",
"def get_custom_db_config_ui() -> widgets.Widget:\n",
"def get_external_db_config_ui() -> widgets.Widget:\n",
" \"\"\"\n",
" Creates a UI form for editing a customer database configuration.\n",
" Creates a UI form for editing an external Exasol Database configuration.\n",
" \n",
" A global variable `sb_config` should reference a configuration store object.\n",
" \"\"\"\n",
@@ -127,7 +127,7 @@
"\n",
"def get_docker_db_config_ui() -> widgets.Widget:\n",
" \"\"\"\n",
" Creates a UI form for editing the Docker-DB configuration.\n",
" Creates a UI form for editing the Exasol Docker-DB configuration.\n",
"\n",
" A global variable `sb_config` should reference a configuration store object.\n",
" \"\"\"\n",
@@ -154,23 +154,22 @@
" if sb_config.get(CKey.use_itde, 'True') == 'True':\n",
" return get_docker_db_config_ui()\n",
" else:\n",
" return get_custom_db_config_ui()\n",
" return get_external_db_config_ui()\n",
"\n",
"\n",
"def _get_docker_db_action_buttions(itde_exists: bool, itde_running: bool, \n",
" display_status: widgets.Widget) -> List[widgets.Button]:\n",
" \"\"\"\n",
" Creates one or two action buttons with the correspondent on_click functions for managing the\n",
" Docker-DB. Depending on the current status (itde_exists, itde_running) of the docker container,\n",
" the \"Start\", \"Restart\" or both buttons are created.\n",
" Exasol Docker-DB. Depending on the current status (itde_exists, itde_running) of the docker\n",
" container, the \"Start\", \"Restart\" or both buttons are created.\n",
" When the action is completed successfully, the running status is displayed in the provided\n",
" widget (display_status).\n",
" \"\"\"\n",
"\n",
" def start_docker_db(btn):\n",
" popup_message('Will start the DockerDB')\n",
" try:\n",
" # Need to check if the Docker-DB still exists and not running because\n",
" # Need to check if the Exasol Docker-DB still exists and not running because\n",
" # the situation might have changed while the widgets were hanging around.\n",
" itde_exists_now, itde_running_now = is_itde_running(sb_config)\n",
" if not itde_running_now:\n",
@@ -182,11 +181,11 @@
" display_status.value = ITDEStatus.running.value\n",
" btn.icon = 'check'\n",
" except Exception as e:\n",
" popup_message('Failed to start the Docker-DB:' + str(e))\n",
" popup_message('Failed to start the Exasol Docker-DB:' + str(e))\n",
" \n",
" def restart_docker_db(btn):\n",
" try:\n",
" # Need to check again if the Docker-DB exists or not because\n",
" # Need to check again if the Exasol Docker-DB exists or not because\n",
" # the situation might have changed while the widgets were hanging around.\n",
" itde_exists_now, _ = is_itde_running(sb_config)\n",
" if itde_exists_now:\n",
@@ -196,7 +195,7 @@
" display_status.value = ITDEStatus.running.value\n",
" btn.icon = 'check'\n",
" except Exception as e:\n",
" popup_message('Failed to restart the Docker-DB:' + str(e))\n",
" popup_message('Failed to restart the Exasol Docker-DB:' + str(e))\n",
"\n",
" if itde_running:\n",
" btn_restart = widgets.Button(description='Recreate and Start')\n",
@@ -216,9 +215,9 @@
"\n",
"def get_start_docker_db_ui() -> widgets.Widget:\n",
" \"\"\"\n",
" A UI for starting or restarting the Docker-DB.\n",
" It checks if an instance of the Docker-DB is already running or if it exists. In that case\n",
" a warning is displayed.\n",
" A UI for starting or restarting the Exasol Docker-DB.\n",
" It checks if an instance of the Exasol Docker-DB is already running or if it exists.\n",
" In that case a warning is displayed.\n",
"\n",
" A global variable `sb_config` should reference a configuration store object.\n",
" \"\"\"\n",
@@ -228,7 +227,7 @@
"\n",
" ui_look = get_config_styles()\n",
"\n",
" # Get the current status of the Docker-DB.\n",
" # Get the current status of the Exasol Docker-DB.\n",
" itde_exists, itde_running = is_itde_running(sb_config)\n",
"\n",
" # Display the status.\n",
@@ -241,9 +240,9 @@
" header_lbl.value = ITDEStatus.missing.value\n",
" group_items = [header_lbl]\n",
" \n",
" # Add a warning message about recreating an existing Docker-DB.\n",
" # Add a warning message about recreating an existing Exasol Docker-DB.\n",
" if itde_exists:\n",
" warning_text = 'Please note that recreating the Docker-DB will result in the loss of all data stored in the ' \\\n",
" warning_text = 'Please note that recreating the Exasol Docker-DB will result in the loss of all data stored in the ' \\\n",
" f'{\"running\" if itde_running else \"existing\"} instance of the database!'\n",
" warning_html = widgets.HTML(value= '<style>p.itde_warning{word-wrap: break-word; color: red;}</style>' \\\n",
" '<p class=\"itde_warning\">' + warning_text + ' </p>')\n",
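The renamed status strings and database options in the UI hunks above can be exercised without any widgets. The following is a minimal sketch under stated assumptions: a plain `dict` stands in for the `sb_config` store, the string key `'use_itde'` stands in for `CKey.use_itde`, and the helpers `describe_status` and `initial_db_choice` are hypothetical. The mapping from `(itde_exists, itde_running)` to a status is inferred from the visible fragments of the diff and may not match the notebook exactly.

```python
from enum import Enum


class ITDEStatus(Enum):
    """Display status of the Exasol Docker-DB (values copied from the diff)."""
    running = 'Exasol Docker-DB is RUNNING'
    stopped = 'Exasol Docker-DB is STOPPED'
    missing = 'Exasol Docker-DB is NOT CREATED'


def describe_status(itde_exists: bool, itde_running: bool) -> ITDEStatus:
    """Map the two container flags to a display status (assumed mapping)."""
    if itde_running:
        return ITDEStatus.running
    if itde_exists:
        return ITDEStatus.stopped
    return ITDEStatus.missing


def initial_db_choice(config: dict) -> str:
    """Pick the pre-selected radio option from a string-valued config entry."""
    db_options = ['Exasol Docker-DB', 'External Exasol Database']
    # The store keeps strings, so the flag is compared with 'True' rather than a bool.
    use_itde = config.get('use_itde', 'True') == 'True'
    return db_options[0 if use_itde else 1]


print(describe_status(itde_exists=True, itde_running=False).value)
print(initial_db_choice({'use_itde': 'False'}))
```

Keeping the display text in the enum values means a rename like the one in this commit only touches the enum, not the code that sets `display_status.value`.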
1 change: 1 addition & 0 deletions test/notebooks/nbtest_itde.py
Original file line number Diff line number Diff line change
@@ -5,6 +5,7 @@
)
from exasol.connections import open_pyexasol_connection


def test_itde(tmp_path):
store_path = tmp_path / 'tmp_config.sqlite'
store_password = "password"
2 changes: 1 addition & 1 deletion test/notebooks/nbtest_transformers.py
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
'masked_modelling.ipynb',
'token_classification.ipynb',
'text_generation.ipynb',
pytest.param('translation.ipynb', marks=pytest.mark.xfail(reason='some issue to be investigated')),
'translation.ipynb',
'zero_shot_classification.ipynb'
]
)
