[18.0][ADD] polars_db_schema: introspect databases with dataframe capabilities #316

Closed
wants to merge 4 commits into from
88 changes: 88 additions & 0 deletions polars_db_schema/README.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
================
Polars Db Schema
================

..
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   !! This file is generated by oca-gen-addon-readme !!
   !! changes will be overwritten.                   !!
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   !! source digest: sha256:0e6c26c886f0fc18c585b743314981ec58966dd913b8a469530e62e313327de2
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

.. |badge1| image:: https://img.shields.io/badge/maturity-Alpha-red.png
    :target: https://odoo-community.org/page/development-status
    :alt: Alpha
.. |badge2| image:: https://img.shields.io/badge/licence-AGPL--3-blue.png
    :target: http://www.gnu.org/licenses/agpl-3.0-standalone.html
    :alt: License: AGPL-3
.. |badge3| image:: https://img.shields.io/badge/github-OCA%2Fserver--backend-lightgray.png?logo=github
    :target: https://github.com/OCA/server-backend/tree/18.0/polars_db_schema
    :alt: OCA/server-backend
.. |badge4| image:: https://img.shields.io/badge/weblate-Translate%20me-F47D42.png
    :target: https://translation.odoo-community.org/projects/server-backend-18-0/server-backend-18-0-polars_db_schema
    :alt: Translate me on Weblate
.. |badge5| image:: https://img.shields.io/badge/runboat-Try%20me-875A7B.png
    :target: https://runboat.odoo-community.org/builds?repo=OCA/server-backend&target_branch=18.0
    :alt: Try me on Runboat

|badge1| |badge2| |badge3| |badge4| |badge5|

Introspect external database

.. image:: https://raw.githubusercontent.com/OCA/server-backend/18.0/polars_db_schema/static/description/figure1.png

Use case: you want to discover an external database, extracting data
only from the relevant tables and columns.

.. IMPORTANT::
   This is an alpha version, the data model and design can change at any time without warning.
   Only for development or testing purpose, do not use in production.
   `More details on development status <https://odoo-community.org/page/development-status>`_

**Table of contents**

.. contents::
   :local:

Bug Tracker
===========

Bugs are tracked on `GitHub Issues <https://github.com/OCA/server-backend/issues>`_.
In case of trouble, please check there if your issue has already been reported.
If you spotted it first, help us to smash it by providing a detailed and welcomed
`feedback <https://github.com/OCA/server-backend/issues/new?body=module:%20polars_db_schema%0Aversion:%2018.0%0A%0A**Steps%20to%20reproduce**%0A-%20...%0A%0A**Current%20behavior**%0A%0A**Expected%20behavior**>`_.

Do not contact contributors directly about support or help with technical issues.

Credits
=======

Authors
-------

* Akretion

Contributors
------------

- Akretion

- David BEAL [email protected]

Maintainers
-----------

This module is maintained by the OCA.

.. image:: https://odoo-community.org/logo.png
   :alt: Odoo Community Association
   :target: https://odoo-community.org

OCA, or the Odoo Community Association, is a nonprofit organization whose
mission is to support the collaborative development of Odoo features and
promote its widespread use.

This module is part of the `OCA/server-backend <https://github.com/OCA/server-backend/tree/18.0/polars_db_schema>`_ project on GitHub.

You are welcome to contribute. To learn how please visit https://odoo-community.org/page/Contribute.
1 change: 1 addition & 0 deletions polars_db_schema/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
from . import models
21 changes: 21 additions & 0 deletions polars_db_schema/__manifest__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
{
    "name": "Polars Db Schema",
    "version": "18.0.1.0.0",
    "author": "Akretion, Odoo Community Association (OCA)",
    "development_status": "Alpha",
    "website": "https://github.com/OCA/server-backend",
    "license": "AGPL-3",
    "depends": [
        "polars_db_process",
    ],
    "external_dependencies": {"python": []},
    "data": [
        "security/ir.model.access.xml",
        "views/db_config.xml",
        "views/db_table.xml",
        "views/db_type.xml",
        "data/db_type.xml",
    ],
    "demo": [],
    "installable": True,
}
44 changes: 44 additions & 0 deletions polars_db_schema/data/db_type.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
<odoo>
    <record id="sql_server_db_type" model="db.type">
        <field name="name">SQL Server</field>
        <field name="code">mssql</field>
        <field name="excluded_types">Binary</field>
        <field name="row_count_query">
            -- This query orders user tables by row count
            SELECT sOBJ.name AS 'name', SUM(sPTN.Rows) AS row_count
            FROM
            sys.objects AS sOBJ
            INNER JOIN sys.partitions AS sPTN
            ON sOBJ.object_id = sPTN.object_id
            WHERE
            sOBJ.type = 'U'
            AND sOBJ.is_ms_shipped = 0x0
            AND index_id &lt; 2 -- 0:Heap, 1:Clustered
            GROUP BY
            sOBJ.schema_id, sOBJ.name
            ORDER BY row_count desc
        </field>
    </record>

    <record id="sqlite_db_type" model="db.type">
        <field name="name">Sqlite</field>
        <field name="code">sqlite</field>
        <field name="excluded_types">BLOB</field>
        <field name="row_count_query">
            -- Row counts come from the statistics table filled by ANALYZE
            SELECT tbl, stat FROM sqlite_stat1
        </field>
    </record>

    <record id="postgres_db_type" model="db.type">
        <field name="name">PostgreSQL</field>
        <field name="code">postgresql</field>
        <field name="excluded_types">bytea</field>
        <field name="row_count_query">
            -- This query orders user tables by estimated live row count
            SELECT relname AS name, n_live_tup AS row_count
            FROM pg_stat_user_tables
            ORDER BY row_count DESC;
        </field>
    </record>
</odoo>
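A note on the SQLite record: `sqlite_stat1` only exists once `ANALYZE` has been run on the database, so the row count query may fail or return nothing on a file that was never analyzed. The sketch below illustrates this with the standard-library `sqlite3` module. The `partner` table and `partner_name_idx` index are made up for the example and are not part of this module.

```python
import sqlite3

# Hypothetical in-memory database standing in for the external SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partner (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX partner_name_idx ON partner (name)")
conn.executemany(
    "INSERT INTO partner (name) VALUES (?)", [("Ann",), ("Bob",), ("Cid",)]
)
conn.commit()


def has_stats(connection):
    """Return True when the sqlite_stat1 table exists."""
    row = connection.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'table' AND name = 'sqlite_stat1'"
    ).fetchone()
    return row is not None


stats_before = has_stats(conn)
conn.execute("ANALYZE")  # gathers statistics and fills sqlite_stat1
stats_after = has_stats(conn)

# Same query as the db.type record; 'stat' starts with the row count.
rows = conn.execute("SELECT tbl, stat FROM sqlite_stat1").fetchall()
row_counts = {tbl: int(stat.split(" ")[0]) for tbl, stat in rows}
print(stats_before, stats_after, row_counts)
```

If the external database is read-only, running `ANALYZE` may not be possible; in that case a per-table `SELECT COUNT(*)` would be the fallback, at the cost of one query per table.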
4 changes: 4 additions & 0 deletions polars_db_schema/models/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
from . import db_config
from . import db_type
from . import db_table
from . import db_source
46 changes: 46 additions & 0 deletions polars_db_schema/models/db_config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
import polars as pl

from odoo import fields, models


class DbConfig(models.Model):
    _inherit = "db.config"

    db_type_id = fields.Many2one(comodel_name="db.type")
    db_table_ids = fields.One2many(comodel_name="db.table", inverse_name="db_config_id")
    row_count_query = fields.Text(related="db_type_id.row_count_query")

    def get_db_metadata(self):
        self.ensure_one()
        if self.row_count_query:
            self._read_sql("SELECT 1")
            df = self._read_sql(self.row_count_query)
            if self.db_type_id.code == "sqlite":
                # https://docs.pola.rs/user-guide/expressions/user-defined-functions/#processing-individual-values-with-map_elements
                df = (
                    df.with_columns(
                        pl.col("stat").map_elements(sqlite, return_dtype=pl.Int32)
                    )
                    # rename columns to the db.table field names
                    .rename({"tbl": "name", "stat": "row_count"})
                    # sqlite_stat1 normally holds one row per index, so a table
                    # can appear several times: keep unique rows only
                    .unique(maintain_order=True)
                )
            df = df.filter(pl.col("row_count") > 0).with_columns(
                # add m2o foreign key
                db_config_id=pl.lit(self.id)
            )
            df = self._filter_df(df)
            self.env["db.table"].search([("db_config_id", "=", self.id)]).unlink()
            self.env["db.table"].create(df.to_dicts())

    def _filter_df(self, df):
        "Hook to ignore some tables: override me"
        return df


def sqlite(value):
    "Extract the row count (first integer) from the 'stat' column"
    values = value.split(" ")
    return int(values[0])
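The SQLite branch above keeps the first integer of `stat` and then drops duplicate rows. As a rough illustration of that post-processing without Polars, here is a plain-Python sketch. The function names `parse_row_count` and `table_row_counts` and the sample rows are hypothetical, and the sketch takes the maximum per table rather than relying on identical duplicates, which is an assumption of mine and not what the module does.

```python
def parse_row_count(stat):
    """Return the leading integer of a sqlite_stat1 'stat' value, or 0."""
    parts = (stat or "").split()
    return int(parts[0]) if parts else 0


def table_row_counts(stat_rows):
    """Collapse (tbl, stat) rows, possibly several per table, into one count."""
    counts = {}
    for tbl, stat in stat_rows:
        counts[tbl] = max(counts.get(tbl, 0), parse_row_count(stat))
    # Drop empty tables and list the biggest tables first,
    # as the db.table `_order` would display them.
    kept = [(tbl, count) for tbl, count in counts.items() if count > 0]
    return dict(sorted(kept, key=lambda item: item[1], reverse=True))


# Hypothetical sample: 'partner' has two indexes, 'log' has an empty stat.
sample = [
    ("tag", "7 1"),
    ("partner", "120 1"),
    ("partner", "120 40 1"),
    ("log", ""),
]
result = table_row_counts(sample)
print(result)
```

Taking the maximum also covers the case where two index rows report different first integers, which plain deduplication would leave as two records for the same table.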
17 changes: 17 additions & 0 deletions polars_db_schema/models/db_source.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
from odoo import models


class DfSource(models.Model):
    _inherit = "df.source"

    # def _get_test_file_paths(self):
    #     res = super()._get_test_file_paths()
    #     res.update(
    #         {
    #             "polars_db_schema": {
    #                 "relative_path": "tests/files",
    #                 "xmlid": "migr.contact",
    #             }
    #         }
    #     )
    #     return res
91 changes: 91 additions & 0 deletions polars_db_schema/models/db_table.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
import base64
import io

import connectorx as cx
import polars as pl

from odoo import _, exceptions, fields, models


class DbTable(models.Model):
    _name = "db.table"
    _description = "Access to database tables"
    _order = "row_count DESC"

    name = fields.Char(required=True, help="Name of the table")
    row_count = fields.Integer(required=True, help="Number of rows contained in table")
    xlsx = fields.Binary(string="File", attachment=False, readonly=True)
    db_config_id = fields.Many2one(comodel_name="db.config")
    filename = fields.Char()
    sql = fields.Text(
        string="Significant Columns", help="Columns whose values vary across rows"
    )
    uniques = fields.Text(
        string="Unique Values",
        help="Columns holding the same value in every row.\n"
        "Extracting data from these columns is probably useless,\n"
        "because the application most likely does not use them",
    )

    def get_metadata_info(self):
        self.ensure_one()
        query = f"SELECT * FROM {self.name}"
        connexion = self.db_config_id._get_connexion()
        df = cx.read_sql(connexion, query, return_type="polars")
        # excluded_types is an optional Text field: it is False when empty
        db_type = self.db_config_id.db_type_id
        excluded_types = (db_type.excluded_types or "").splitlines()
        cols = [x[0] for x in df.schema.items() if str(x[1]) not in excluded_types]
        new_cols = []
        uniques = {}
        for col in cols:
            # TODO improve it
            # Some databases have dirty column names: skip them for now
            if any(char in col for char in (" ", "*", "-")):
                continue
            query = f"SELECT DISTINCT {col} FROM self"
            res = df.sql(query)
            if len(res) > 1:
                new_cols.append(col)
            elif len(res) == 1:
                # column has the same value in every row
                uniques[col] = res.to_series()[0]
        self.uniques = f"{uniques}"
        if new_cols:
            self.sql = f"SELECT {', '.join(new_cols)}\nFROM {self.name};\n"

    # WARNING Thread <Thread(odoo.service.http.request.129007460812352,
    # started 129007460812352)> virtual real time limit (151/120s) reached.
    # Dumping stacktrace of limit exceeding threads before reloading

    def get_spreadsheet(self):
        self.ensure_one()
        if not self.sql:
            self.get_metadata_info()
        if not self.sql:
            raise exceptions.ValidationError(
                _(
                    "There is no column with variable data in this table: "
                    "check the Unique Values column"
                )
            )
        df = cx.read_sql(
            self.db_config_id._get_connexion(), self.sql, return_type="polars"
        )
        excel_stream = io.BytesIO()
        vals = {"workbook": excel_stream}
        vals.update(self.get_spreadsheet_settings())
        df.write_excel(**vals)
        excel_stream.seek(0)
        self.filename = f"{self.name}.xlsx"
        self.xlsx = base64.encodebytes(excel_stream.read())

    def get_spreadsheet_settings(self):
        return {
            "position": "A1",
            "table_style": "Table Style Light 16",
            "dtype_formats": {pl.Date: "dd/mm/yyyy"},
            "float_precision": 6,
            "header_format": {"bold": True, "font_color": "#702963"},
            "freeze_panes": "A2",
            "autofit": True,
        }
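The core idea of `get_metadata_info` is to separate columns whose values vary from columns that hold a single value, then build a `SELECT` over the varying ones. The sketch below shows that logic on plain Python rows, with no database or Polars involved. `split_columns`, `quote_ident` and `build_select` are hypothetical helpers, and quoting identifiers with double quotes is an assumption: it follows the SQL standard, but a given backend or driver may expect another style.

```python
def split_columns(rows):
    """Split column names into (significant, constants).

    rows: list of dicts sharing the same keys, one dict per table row.
    """
    significant, constants = [], {}
    if not rows:
        return significant, constants
    for col in rows[0]:
        distinct = {row[col] for row in rows}
        if len(distinct) > 1:
            significant.append(col)
        else:
            # Same value in every row: record it instead of extracting it.
            constants[col] = next(iter(distinct))
    return significant, constants


def quote_ident(name):
    """Quote an identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table, columns):
    """Build the extraction query for the significant columns."""
    cols = ", ".join(quote_ident(col) for col in columns)
    return f"SELECT {cols}\nFROM {quote_ident(table)};\n"


# Hypothetical sample rows.
rows = [
    {"id": 1, "name": "Ann", "active": True},
    {"id": 2, "name": "Bob", "active": True},
]
significant, constants = split_columns(rows)
query = build_select("res partner", significant)
print(query, constants)
```

Quoting identifiers is one possible way to handle the "dirty" column names that the method currently skips; whether it works end to end depends on the target database.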
15 changes: 15 additions & 0 deletions polars_db_schema/models/db_type.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
from odoo import fields, models


class DbType(models.Model):
    _name = "db.type"
    _description = "Constant parameters about database"

    name = fields.Char(required=True)
    code = fields.Char(required=True)
    row_count_query = fields.Text(
        required=True, help="SQL query returning the number of rows in each table"
    )
    excluded_types = fields.Text(
        help="Column types to ignore for better introspection (one type per line)"
    )
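Since `excluded_types` is a free-text field with one type per line, the value needs some care when it is read: an empty Odoo `Text` field is `False`, and lines may carry stray whitespace. A minimal sketch of that parsing follows; `parse_excluded_types` and `keep_columns` are hypothetical names, not part of the module.

```python
def parse_excluded_types(text):
    """Return the non-empty, stripped lines of a multi-line text value."""
    return [line.strip() for line in (text or "").splitlines() if line.strip()]


def keep_columns(schema, excluded_types):
    """Keep the columns whose type name is not excluded.

    schema: mapping of column name to type name, both as strings.
    """
    excluded = set(excluded_types)
    return [col for col, dtype in schema.items() if dtype not in excluded]


# Hypothetical schema, with type names written the way Polars prints them.
schema = {"id": "Int64", "name": "String", "photo": "Binary"}
excluded = parse_excluded_types("Binary\n\n  bytea  \n")
kept = keep_columns(schema, excluded)
print(excluded, kept)
```

Note that `get_metadata_info` compares these values with the string form of the Polars dtype (for example `Binary`), so entries written as database type names such as `BLOB` or `bytea` would presumably never match.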
3 changes: 3 additions & 0 deletions polars_db_schema/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[build-system]
requires = ["whool"]
build-backend = "whool.buildapi"
2 changes: 2 additions & 0 deletions polars_db_schema/readme/CONTRIBUTORS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
- Akretion
- David BEAL <[email protected]>
5 changes: 5 additions & 0 deletions polars_db_schema/readme/DESCRIPTION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Introspect external database

.. image:: ../static/description/figure1.png

Use case: you want to discover an external database, extracting data only from the relevant tables and columns.
20 changes: 20 additions & 0 deletions polars_db_schema/security/ir.model.access.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
<odoo>
    <record id="db_table_all" model="ir.model.access">
        <field name="name">Database Table</field>
        <field name="model_id" ref="model_db_table" />
        <field name="group_id" ref="base.group_user" />
        <field name="perm_read" eval="1" />
        <field name="perm_create" eval="1" />
        <field name="perm_write" eval="1" />
        <field name="perm_unlink" eval="1" />
    </record>
    <record id="db_type_all" model="ir.model.access">
        <field name="name">Database Type</field>
        <field name="model_id" ref="model_db_type" />
        <field name="group_id" ref="base.group_user" />
        <field name="perm_read" eval="1" />
        <field name="perm_create" eval="1" />
        <field name="perm_write" eval="1" />
        <field name="perm_unlink" eval="1" />
    </record>
</odoo>
Binary file added polars_db_schema/static/description/figure1.png