diff --git a/notebooks/auto-cube/main-advanced.ipynb b/notebooks/auto-cube/main-advanced.ipynb new file mode 100644 index 00000000..d4840d7c --- /dev/null +++ b/notebooks/auto-cube/main-advanced.ipynb @@ -0,0 +1,747 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "ba690e2b-72ef-4b43-92f5-2a77e83267bd", + "metadata": {}, + "source": [ + "# Automatic cube creation with atoti - Advanced\n", + "\n", + "[atoti](https://www.atoti.io/) is a free Python BI analytics platform for Quants, Data Analysts, Data Scientists & Business Users to collaborate better, analyze faster and translate their data into business KPIs. \n", + "\n", + "This notebook is an extension of [main.ipynb](main.ipynb), demonstrating how users can customize the data type of each column. This is particularly useful for columns that store arrays of values. We will also explore the atoti session and its attributes in this notebook after the BI application is created (with reference to the [VaR dataset](https://s3.eu-west-3.amazonaws.com/data.atoti.io/notebooks/auto-cube/var_dataset.csv)). \n", + "\n", + "\n", + "\n", + "__NOTE:__\n", + "- This is a simplified use case where there is only a single atoti table (created from the uploaded CSV)\n", + "- The CSV should be of encoding UTF8\n", + "- For best experience, choose a dataset with a fair number of numeric and non-numeric columns, e.g. [Data Science Job Salaries dataset](https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries) from Kaggle: \n", + " - non-numerical columns are translated into hierarchies\n", + " - a SUM and a MEAN measure will be automatically created for numerical columns (non-key columns)\n", + "- When selecting keys for the atoti table, choose the columns that will ensure data uniqueness.\n", + " - When unsure, skip key selection.\n", + " - Non-unique keys will result in a smaller dataset getting loaded. 
Only the last occurrence of the duplicates will be kept.\n", + " \n", + "\n", + "To understand more about multidimensional datacubes, check out the [atoti tutorial](https://docs.atoti.io/latest/getting_started/tutorial/tutorial.html). \n", + "\n", + "
\"Try
" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "c9a014b6-e1ff-4d7f-a5d7-0cc4429de899", + "metadata": {}, + "outputs": [], + "source": [ + "import functools\n", + "import io\n", + "import typing\n", + "import webbrowser\n", + "\n", + "import atoti as tt\n", + "import ipywidgets as widgets\n", + "import numpy as np\n", + "import pandas as pd\n", + "from IPython.display import SVG, Markdown" + ] + }, + { + "cell_type": "markdown", + "id": "1bbb4e75-bb49-4f92-ab29-fb3c27ce926c", + "metadata": {}, + "source": [ + "Since atoti is a Python library, we can use it along with other libraries such as ipywidget and Pandas. \n", + "We used FloatProgress from ipywidget to track the loading progress of web application." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "09c8e4fd-cce2-46dd-88c9-e113ec451b5d", + "metadata": {}, + "outputs": [], + "source": [ + "out = widgets.Output()\n", + "fp = widgets.FloatProgress(min=0, max=6)" + ] + }, + { + "cell_type": "markdown", + "id": "f677673a-4bfa-401d-aaab-e3e708ddf5a0", + "metadata": {}, + "source": [ + "We create some global variables in order to access the atoti cube for exploration in the notebook." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "2f9b99d1-2231-4d4e-a324-75d045f2a5c2", + "metadata": {}, + "outputs": [], + "source": [ + "session: tt.Session\n", + "cube: tt.Cube\n", + "table: tt.Table\n", + "\n", + "# just managing some common data types in this use case\n", + "data_types = [\n", + " ty\n", + " for ty in ([\"Default\"] + list(typing.get_args(tt.type.DataType)))\n", + " if ty\n", + " not in [\n", + " \"boolean\",\n", + " \"Object\",\n", + " \"Object[]\",\n", + " \"ZonedDateTime\",\n", + " \"LocalDateTime\",\n", + " \"LocalTime\",\n", + " ]\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "5072a445-a7bf-4747-9963-34a2464d7886", + "metadata": { + "tags": [] + }, + "source": [ + "## Steps to creating BI analytics platform with atoti\n", + "\n", + "In the following function, the key steps to create an atoti web application are defined:\n", + "- Instantiate atoti session (web application is created upon instantiation)\n", + "- Create atoti table by loading the Pandas DataFrame (atoti also accepts other datasources such as CSV, Parquet, SQL, Spark DataFrame etc.)\n", + "- Create cube with the atoti table\n", + "- Create [single-value measures](https://docs.atoti.io/latest/lib/atoti/atoti.agg.single_value.html#atoti.agg.single_value) for numerical columns \n", + "\n", + "\n", + "\n", + "__It is possible to create and join multiple atoti table.__ However, in our use case, we are only creating one atoti table using the __Pandas connector__. \n", + "We could have used the CSV connector instead to create the atoti table but Pandas allow us to manipulate the data (e.g. select the key columns and set data type) through interaction with ipywidget.\n", + "\n", + "__We can also create multiple cubes within a session and access them from the web application.__ To keep things simpler, we stick with a single cube in this notebook. 
\n", + "\n", + "Finally, we make use of the [webbrowser](https://docs.python.org/3/library/webbrowser.html) api to launch the web application in a new browser tab." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "a26ed19e-7764-47c9-8e8e-a9e810551c5c", + "metadata": {}, + "outputs": [], + "source": [ + "def create_cube(df, keys=None, single_value_cols=None, port=19090):\n", + " global session, cube, table\n", + "\n", + " print(f\"-- Creating session on port {port}\")\n", + " fp.value = 2\n", + " session = tt.Session(port=port, user_content_storage=\"./content\")\n", + "\n", + " print(\"--- Loading data into table\")\n", + " fp.value = 3\n", + " table = session.read_pandas(df, table_name=\"table\", keys=keys)\n", + "\n", + " print(\"---- Creating cube\")\n", + " fp.value = 4\n", + " cube = session.create_cube(table)\n", + "\n", + " fp.value = 5\n", + " if single_value_cols:\n", + " print(\n", + " f\"---- Create single value measures for non-keys numerical columns: {single_value_cols}\"\n", + " )\n", + " for col in single_value_cols:\n", + " cube.measures[f\"{col}.VALUE\"] = tt.agg.single_value(table[col])\n", + "\n", + " fp.value = 6\n", + " print(f\"----- Launching web application: {session._local_url}\")\n", + " webbrowser.open(session._local_url)\n", + "\n", + " print(\"======================================================\")\n", + " print(f\"Number of records loaded: {len(table)}\")\n", + " print(\"Table schema: \")\n", + " display(cube.schema)\n", + "\n", + " print()\n", + " display(Markdown(\"### Access web application\"))\n", + " display(\n", + " Markdown(\n", + " \"__Click on this URL if web application is not automatically launched:__\"\n", + " ),\n", + " session.link(),\n", + " )\n", + " print()\n", + " print(\"======================================================\")" + ] + }, + { + "cell_type": "markdown", + "id": "1cd0ed02-093d-4cd4-864e-4bbe28ebd0b5", + "metadata": {}, + "source": [ + "## Data processing prior to BI platform 
creation\n", + "\n", + "Using iPyWidget, users are able to:\n", + "- interactively select CSV for upload\n", + "- choose keys for table column and set specific data type for columns where necessary\n", + "- monitor progress of creation with the use of `FloatProgress`\n", + "- re-create new cube\n", + "\n", + "We trigger the creation of the cube upon selection of a CSV. \n", + "__Note that we recreate the session whenever a new CSV is selected.__ So the previous dataset will no longer be accessible." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "d5395e0b-a371-4e34-bb35-249876d39dd7", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def disable_widget(w):\n", + " w.disabled = True\n", + "\n", + "\n", + "@out.capture()\n", + "def on_key_change(b, _df, _keys, _datatypes):\n", + " b.disabled = True\n", + " [disable_widget(ck) for ck in (_keys + _datatypes)]\n", + "\n", + " keys = []\n", + " datatypes = {}\n", + " numerical_cols = []\n", + "\n", + " for i in range(0, len(_keys)):\n", + "\n", + " # unless datatype is specified, datatype is inferred by Pandas\n", + " # atoti inherits datatype from pandas dataframe\n", + " if _datatypes[i].value != \"Default\":\n", + " try:\n", + " if _datatypes[i].value in [\"int[]\", \"long[]\"]:\n", + " _df[_keys[i].description] = (\n", + " _df[_keys[i].description]\n", + " .apply(eval)\n", + " .apply(lambda x: np.array(x).astype(int))\n", + " )\n", + " elif _datatypes[i].value in [\"double[]\", \"float[]\"]:\n", + " _df[_keys[i].description] = (\n", + " _df[_keys[i].description]\n", + " .apply(eval)\n", + " .apply(lambda x: np.array(x).astype(float))\n", + " )\n", + " elif _datatypes[i].value in [\"String\"]:\n", + " _df[_keys[i].description] = _df[_keys[i].description].astype(str)\n", + " elif _datatypes[i].value in [\"LocalDate\"]:\n", + " _df[_keys[i].description] = pd.to_datetime(\n", + " _df[_keys[i].description]\n", + " )\n", + " elif _datatypes[i].value in [\"double\", \"float\"]:\n", + " 
_df[_keys[i].description] = _df[_keys[i].description].astype(\n", + " _datatypes[i].value\n", + " )\n", + " elif _datatypes[i].value in [\"int\", \"long\"]:\n", + " _df[_keys[i].description] = _df[_keys[i].description].astype(int)\n", + "\n", + " if _datatypes[i].value not in [\"LocalDate\", \"String\"]:\n", + " numerical_cols = numerical_cols + [_keys[i].description]\n", + "\n", + " except:\n", + " print(\n", + " f\"Error encountered casting {_keys[i].description} to {_datatypes[i].value}. Value remain in default type.\"\n", + " )\n", + "\n", + " if _keys[i].value == True:\n", + " keys = keys + [_keys[i].description]\n", + "\n", + " # we gather the numerical columns in order to create single_value measures\n", + " numerical_cols = (\n", + " numerical_cols + _df.select_dtypes(include=\"number\").columns.to_list()\n", + " )\n", + " # exclude the selected table keys as we will not create measures for them\n", + " if len(keys) > 0:\n", + " numerical_cols = [col for col in numerical_cols if col not in keys]\n", + " print(f\"numerical_cols: {numerical_cols}\")\n", + "\n", + " create_cube(_df, keys, numerical_cols)\n", + " displayFileLoader()" + ] + }, + { + "cell_type": "markdown", + "id": "fd5c8c62-5947-4aa2-b22f-8ec3d69b07b3", + "metadata": {}, + "source": [ + "## Set the stage with ipywidget\n", + "\n", + "Using ipywidget, we can interact with the uploaded data to:\n", + "1. choose keys for the atoti table that we are creating\n", + "2. 
choose datatype for column (to override the default type inferred by Pandas)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8651e559-cc3a-43ff-a1c8-d3bb8f632b47", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "@out.capture()\n", + "def on_upload_change(change):\n", + " out.clear_output()\n", + " display(fp)\n", + " print(\"Starting cube creation for \", list(change[\"new\"].keys()))\n", + "\n", + " fp.value = 0\n", + " print(\"- Reading file\")\n", + " input_file = list(change[\"new\"].values())[0]\n", + " content = input_file[\"content\"]\n", + " content = io.StringIO(content.decode(\"utf-8\"))\n", + " df = pd.read_csv(content)\n", + "\n", + " fp.value = 1\n", + " columns = df.columns.tolist()\n", + "\n", + " # checkboxes for list of columns for users to select table keys\n", + " checkboxes = [widgets.Checkbox(value=False, description=label) for label in columns]\n", + "\n", + " # dropdown list for data type options for each column\n", + " dropdowns = [\n", + " widgets.Dropdown(options=data_types, value=data_types[0]) for label in columns\n", + " ]\n", + "\n", + " button = widgets.Button(\n", + " description=\"Submit\",\n", + " disabled=False,\n", + " button_style=\"\",\n", + " tooltip=\"Submit selected keys\",\n", + " icon=\"check\", # (FontAwesome names without the `fa-` prefix)\n", + " )\n", + "\n", + " instructions = widgets.HTML(\n", + " value=\"\"\"
    \n", + "
  1. Select checkbox to select column as keys.\n", + "
  2. Select data type from drop-down list for specific column. Common types are inferred when creating Pandas DataFrame.\n", + "
\"\"\"\n", + " )\n", + "\n", + " left_box = widgets.VBox(children=checkboxes)\n", + " right_box = widgets.VBox(children=dropdowns)\n", + "\n", + " display(widgets.VBox([instructions, widgets.HBox([left_box, right_box]), button]))\n", + "\n", + " button.on_click(\n", + " functools.partial(on_key_change, _df=df, _keys=checkboxes, _datatypes=dropdowns)\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "ca0577b5-8e09-4448-9ee9-165cc54e8221", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def displayFileLoader():\n", + " uploader = widgets.FileUpload(\n", + " accept=\".csv\",\n", + " multiple=False,\n", + " )\n", + "\n", + " uploader.observe(on_upload_change, \"value\")\n", + " with out:\n", + " display(uploader)" + ] + }, + { + "cell_type": "markdown", + "id": "962edc22-6072-4584-b55f-572182525f15", + "metadata": {}, + "source": [ + "Feel free to re-select a new CSV file to test out different datasets." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "276c673d-78e6-4a4b-ae76-b36bd73760d5", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "13ba1bf7518d4fb89eda34e466d3ce9f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Output()" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "displayFileLoader()\n", + "out" + ] + }, + { + "cell_type": "markdown", + "id": "3e5e730f-65d1-4000-924d-0a7393d6d8b9", + "metadata": { + "tags": [] + }, + "source": [ + "## Technology behind atoti \n", + "\n", + " \n", + "\n", + "### In-memory multidimensional data cube\n", + "\n", + "Behind the scene, we create an in-memory multidimensional data cube following the [snowflake schema](https://en.wikipedia.org/wiki/Snowflake_schema). 
\n", + "Once the cube is formed, user is able to perform multidimensional data analytics from different perspectives:\n", + "- slice and dice\n", + "- drill-down and roll-up\n", + "- drill-through for investigation" + ] + }, + { + "cell_type": "raw", + "id": "469a6c62-f577-424e-87af-7e97aa92b288", + "metadata": {}, + "source": [ + "cube" + ] + }, + { + "cell_type": "markdown", + "id": "bfb4f8a1-4bed-4c35-ac8e-433c5437ed36", + "metadata": {}, + "source": [ + "### JupyterLab for prototyping and Web application for end-user\n", + "\n", + "atoti makes it easy to explore your dataset and construct your data model in __JupyterLab__ during prototyping stage:\n", + "- easily add new data source to the cube\n", + "- create new measures\n", + "- visualize data within notebook" + ] + }, + { + "cell_type": "raw", + "id": "cd824d22-af32-4b76-b1e7-6e6ef7e1dfe4", + "metadata": { + "atoti": { + "widget": { + "columnWidths": { + "[Measures].[pnl_vector.VALUE]": 228.80621337890625, + "[table].[instrument_code].[instrument_code]": 178.390625 + }, + "mapping": { + "columns": [ + "ALL_MEASURES" + ], + "measures": [ + "[Measures].[pnl_vector.VALUE]" + ], + "rows": [ + "[table].[instrument_code].[instrument_code]" + ] + }, + "query": { + "mdx": "SELECT NON EMPTY Hierarchize(Descendants({[table].[instrument_code].[AllMember]}, 1, SELF_AND_BEFORE)) ON ROWS, NON EMPTY {[Measures].[pnl_vector.VALUE]} ON COLUMNS FROM [table] CELL PROPERTIES VALUE, FORMATTED_VALUE, BACK_COLOR, FORE_COLOR, FONT_FLAGS", + "updateMode": "once" + }, + "serverKey": "default", + "widgetKey": "pivot-table" + } + }, + "tags": [] + }, + "source": [ + "session.visualize()" + ] + }, + { + "cell_type": "markdown", + "id": "a70c52bf-ad58-48aa-ac46-4afca21f6b90", + "metadata": {}, + "source": [ + "### Working with cube" + ] + }, + { + "cell_type": "raw", + "id": "318c0a5c-ed1c-4e86-be4c-25d60a27cef4", + "metadata": {}, + "source": [ + "h, l, m = cube.hierarchies, cube.levels, cube.measures" + ] + }, + { + "cell_type": 
"markdown", + "id": "b797fd05-5748-4c30-bb71-655d7866e18a", + "metadata": {}, + "source": [ + "#### Creating measures" + ] + }, + { + "cell_type": "raw", + "id": "f2673684-a4bd-4ebf-9341-1a173d026b75", + "metadata": {}, + "source": [ + "m[\"scaled_pnl_vector\"] = m[\"quantity.SUM\"] * m[\"pnl_vector.VALUE\"]" + ] + }, + { + "cell_type": "raw", + "id": "0300e606-3ad3-44e4-a63d-c4f59373023c", + "metadata": {}, + "source": [ + "m[\"Position vector\"] = tt.agg.sum(m[\"scaled_pnl_vector\"], scope=tt.OriginScope(l[\"instrument_code\"], l[\"book_id\"]))" + ] + }, + { + "cell_type": "raw", + "id": "23f45d39-a876-4695-819f-c1898c084e0e", + "metadata": { + "atoti": { + "height": 487, + "widget": { + "columnWidths": { + "[Measures].[Position vector]": 317, + "[Measures].[scaled_pnl_vector]": 292.22503662109375, + "[confidence_simulation].[confidence_simulation].[90%],[Measures].[Position vector]": 208, + "[confidence_simulation].[confidence_simulation].[90%],[Measures].[scaled_pnl_vector]": 176.77496337890625, + "[confidence_simulation].[confidence_simulation].[95%],[Measures].[Position vector]": 189.22503662109375, + "[confidence_simulation].[confidence_simulation].[95%],[Measures].[scaled_pnl_vector]": 159.22503662109375, + "[table].[instrument_code].[instrument_code]": 209.80621337890625 + }, + "mapping": { + "columns": [ + "ALL_MEASURES" + ], + "measures": [ + "[Measures].[scaled_pnl_vector]", + "[Measures].[Position vector]" + ], + "rows": [ + "[table].[instrument_code].[instrument_code]" + ] + }, + "query": { + "mdx": "SELECT NON EMPTY Hierarchize(Descendants({[table].[instrument_code].[AllMember]}, 1, SELF_AND_BEFORE)) ON ROWS, NON EMPTY {[Measures].[scaled_pnl_vector], [Measures].[Position vector]} ON COLUMNS FROM [table] CELL PROPERTIES VALUE, FORMATTED_VALUE, BACK_COLOR, FORE_COLOR, FONT_FLAGS", + "updateMode": "once" + }, + "serverKey": "default", + "widgetKey": "pivot-table" + } + }, + "tags": [] + }, + "source": [ + "session.visualize()" + ] + }, + { + 
"cell_type": "markdown", + "id": "8210c392-701f-4bae-8745-2428a7baddc2", + "metadata": {}, + "source": [ + "### Running simulations" + ] + }, + { + "cell_type": "raw", + "id": "3cefdeca-f9f1-4234-9035-221399fed514", + "metadata": {}, + "source": [ + "confidence_simulation = cube.create_parameter_simulation(\n", + " \"confidence_simulation\",\n", + " measures={\"Confidence level\": 0.95},\n", + " base_scenario_name=\"95%\"\n", + ")" + ] + }, + { + "cell_type": "raw", + "id": "8c6620fe-1019-4455-8ae3-8dd3dbc7140e", + "metadata": {}, + "source": [ + "cube.query(m[\"Confidence level\"])" + ] + }, + { + "cell_type": "raw", + "id": "8e533b42-b31c-4bdb-abda-a83b0432370b", + "metadata": {}, + "source": [ + "m[\"VaR\"] = tt.array.quantile(m[\"Position vector\"], m[\"Confidence level\"])" + ] + }, + { + "cell_type": "raw", + "id": "cfc51b8f-0d2a-4b79-8673-ec2f1e897f96", + "metadata": { + "atoti": { + "height": 350, + "widget": { + "columnWidths": { + "[Measures].[Position vector]": 317, + "[Measures].[scaled_pnl_vector]": 292.22503662109375, + "[confidence_simulation].[confidence_simulation].[90%],[Measures].[Position vector]": 208, + "[confidence_simulation].[confidence_simulation].[90%],[Measures].[scaled_pnl_vector]": 176.77496337890625, + "[confidence_simulation].[confidence_simulation].[95%],[Measures].[Position vector]": 189.22503662109375, + "[confidence_simulation].[confidence_simulation].[95%],[Measures].[scaled_pnl_vector]": 159.22503662109375, + "[table].[instrument_code].[instrument_code]": 209.80621337890625 + }, + "mapping": { + "columns": [ + "ALL_MEASURES" + ], + "measures": [ + "[Measures].[scaled_pnl_vector]", + "[Measures].[Position vector]", + "[Measures].[VaR]" + ], + "rows": [ + "[table].[instrument_code].[instrument_code]" + ] + }, + "query": { + "mdx": "SELECT NON EMPTY Hierarchize(Descendants({[table].[instrument_code].[AllMember]}, 1, SELF_AND_BEFORE)) ON ROWS, NON EMPTY {[Measures].[scaled_pnl_vector], [Measures].[Position vector], 
[Measures].[VaR]} ON COLUMNS FROM [table] CELL PROPERTIES VALUE, FORMATTED_VALUE, BACK_COLOR, FORE_COLOR, FONT_FLAGS", + "updateMode": "once" + }, + "serverKey": "default", + "widgetKey": "pivot-table" + } + }, + "tags": [] + }, + "source": [ + "session.visualize()" + ] + }, + { + "cell_type": "raw", + "id": "ebc531f4-c764-4f78-b2ff-d1d74736df97", + "metadata": {}, + "source": [ + "confidence_simulation += (\"90%\", 0.90)\n", + "confidence_simulation += (\"98%\", 0.98)" + ] + }, + { + "cell_type": "raw", + "id": "59fda9ab-9ce8-45f4-a056-5a789bc6d95e", + "metadata": { + "atoti": { + "height": 474, + "widget": { + "columnWidths": { + "[table].[instrument_code].[instrument_code]": 168 + }, + "mapping": { + "columns": [ + "ALL_MEASURES", + "[confidence_simulation].[confidence_simulation].[confidence_simulation]" + ], + "measures": [ + "[Measures].[VaR]" + ], + "rows": [ + "[table].[book_id].[book_id]", + "[table].[instrument_code].[instrument_code]" + ] + }, + "query": { + "mdx": "SELECT NON EMPTY Crossjoin({[Measures].[VaR]}, [confidence_simulation].[confidence_simulation].[confidence_simulation].Members) ON COLUMNS, NON EMPTY Crossjoin(Hierarchize(Descendants({[table].[book_id].[AllMember]}, 1, SELF_AND_BEFORE)), Hierarchize(Descendants({[table].[instrument_code].[AllMember]}, 1, SELF_AND_BEFORE))) ON ROWS FROM [table] CELL PROPERTIES VALUE, FORMATTED_VALUE, BACK_COLOR, FORE_COLOR, FONT_FLAGS", + "updateMode": "once" + }, + "serverKey": "default", + "widgetKey": "pivot-table" + } + }, + "tags": [] + }, + "source": [ + "session.visualize()" + ] + }, + { + "cell_type": "markdown", + "id": "147c33c9-3cb2-4dde-a30a-62fd7969853e", + "metadata": {}, + "source": [ + "## Find out more about atoti\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + "
LinkedIn https://www.linkedin.com/company/atoti
Twitter https://twitter.com/atoti_io
YouTube https://www.youtube.com/c/atoti
Medium https://medium.com/atoti
\n", + "\n", + "## More examples\n", + "
Notebook gallery https://github.com/atoti/notebooks
\n", + "\n", + "\n", + "## Reach out to us\n", + "
GitHub Discussion https://github.com/atoti/atoti/discussions
\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "da676c84-e9bf-4ade-8475-90bb7a74677c", + "metadata": {}, + "source": [ + "
\"Try
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/auto-cube/main.ipynb b/notebooks/auto-cube/main.ipynb index cd1d6820..388406a9 100644 --- a/notebooks/auto-cube/main.ipynb +++ b/notebooks/auto-cube/main.ipynb @@ -14,15 +14,16 @@ "\n", "__NOTE:__\n", "- This is a simplified use case where there is only 1 single atoti table (created from the uploaded CSV)\n", - "- CSV should be of encoding UTF8\n", + "- The CSV should be of encoding UTF8\n", "- For best experience, choose a dataset with a fair number of numeric and non-numeric columns, e.g. [Data Science Job Salaries dataset](https://www.kaggle.com/datasets/ruchi798/data-science-job-salaries) from Kaggle: \n", " - non-numerical columns are translated into hierarchies\n", - " - a SUM and a MEAN measure will be automatically created for non-numeric columns (non-key columns)\n", + " - a SUM and a MEAN measure will be automatically created for numerical columns (non-key columns)\n", "- When selecting keys for the atoti table, choose the columns that will ensure data uniqueness.\n", " - When unsure, skip key selection.\n", " - Non-unique keys will result in a smaller dataset getting loaded. Only the last occurrence of the duplicates will be kept.\n", " \n", - "To understand more about multidimensional datacube, check out [atoti tutorial](https://docs.atoti.io/latest/getting_started/tutorial/tutorial.html). \n", + "\n", + "To understand more about multidimensional datacubes, check out the [atoti tutorial](https://docs.atoti.io/latest/getting_started/tutorial/tutorial.html). \n", "\n", "
\"Try
" ] @@ -268,7 +269,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.12" + "version": "3.9.9" } }, "nbformat": 4,