Add sqlite3 as another possible backing store for the dbm module #100414

rhettinger · 2022-12-21T17:49:22Z

Right now we support ndbm and gnu.dbm which might or might not be part of a given build. The fallback is the super slow dumbdbm. Not the sqlite3 is part of the standard build, we can do better.

The module docstring says:

Future versions may change the order in which implementations are
tested for existence, and add interfaces to other dbm-like
implementations.

The future is now. Let's provide a fast, stable, robust, always available alternative.

This can be done will pure python calls to the existing sqlite3 module, or there can be a C extension that calls the Sqlite3 C API directly.

This would automatically be available to the shelve module, giving us a high quality, persistent key-value store.

Linked PRs

The text was updated successfully, but these errors were encountered:

erlend-aasland · 2022-12-21T17:51:41Z

TL;DR

dbm.dumb is not slower than SqliteDict
dbm.dumb is not fail safe, so try to avoid using it until it is fixed
SQLite is a good alternative for persistent dict backend, especially with JSON encoding

In details

The dbm.dumb is not so slow. According to my measurements the existing SqliteDict can be as fast ast dbm.dumb, but only with some tweaks (using WAL mode). The measurements with code are in this pull request: Reorganize the code of benchmark Dobatymo/lmdb-python-dbm#2
If high speed is needed, they can use vedis / unqlite or lmdbm.
Using SQLite backend can be very-very helpful if I would like to view or change the database manually. So if the speed of SqliteDict is enough, I'd prefer use that with JSON encoding (not pickle).
SQLite may be not the best choice storing binary data. Although it could be a good option if we can change shelve module to allow to use json encoding also, and not only have the option to set pickle protocol version.
pysos is also pretty fast and have the ability to review / change the database if the size of the database is small enough to load into an editor. (For example using it as a persistent state backend for small amount of data.)

The biggest problem with dbm.dumb (at least for me) is that it is not fail safe, see the example below. Moreover, it is not mentioned in the documentation. If you kill the Python process, the database will be corrupted every time. (This is because the key file is updated only on adding and deleting keys, and on close.)

import dbm.dumb
import shelve
import itertools

database = dbm.dumb.open("test", "c")
shelf = shelve.Shelf(database)

if "key" in shelf:
    print("The stored value is: " + shelf["key"])
    input("Press ENTER to continue")
    del shelf["key"]

try:
    for index in itertools.count():
        print(f"Persist value {index} to database")
        shelf["key"] = f"value {index} " + ("x" * index)
except:
    print("Exiting gracefully. Try killing the Python process.")

Performance measurements

Sorry. in Windows I do not have dbm.gnu.
Copied from Dobatymo/lmdb-python-dbm#2 (comment)
The numbers are seconds required to do the mentioned amount of work.

write

items	lmdbm	vedis	unqlite	rocksdict	dbm.gnu	semidbm	pysos	dbm.dumb	sqlite-wal	sqlite-autocommit	sqlite-batch	dummypickle	dummyjson
10	0.015	0.000	0.000	0.031	-	0	0	0	0.015	0.015	0	0.015	0
100	0.141	0.000	0.000	0.047	-	0	0	0.031	0.031	0.109	0.094	0.156	0.172
1000	1.609	0.000	0.016	0.062	-	0.016	0.031	0.328	0.328	1.015	0.937	1.781	4.250
10000	15.359	0.062	0.062	0.172	-	0.140	0.250	3.219	3.125	- 17.922	19.453	95.188	-
100000	-	0.719	0.672	1.390	-	1.562	2.687	-	31.172	-	-	-	-
1000000	-	9.234	9.140	16.594	-	17.484	27.390	-	-	-	-	-	-

batch

items	lmdbm	vedis	unqlite	rocksdict	dbm.gnu	semidbm	pysos	dbm.dumb	sqlite-wal	sqlite-autocommit	sqlite-batch	dummypickle	dummyjson
10	0.000	0.000	0.000	-	-	-	-	0	0.016	0	0	0	0
100	0.000	0.000	0.000	-	-	-	-	0.032	0.031	0.078	0	0	0
1000	0.000	0.015	0.000	-	-	-	-	0.312	0.109	0.719	0.109	0	0
10000	0.031	0.047	0.046	-	-	-	-	3.140	1.046	- 16.156	1.047	0	0.047
100000	0.422	0.609	0.594	-	-	-	-	-	10.875	-	10.985	0.375	2.859
1000000	4.188	8.500	8.406	-	-	-	-	-	-	-	-	41.760	-

read

items	lmdbm	vedis	unqlite	rocksdict	dbm.gnu	semidbm	pysos	dbm.dumb	sqlite-wal	sqlite-autocommit	sqlite-batch	dummypickle	dummyjson
10	0.000	0.000	0.000	0.031	-	0	0	0	0	0	0	0	0
100	0.000	0.000	0.000	0.031	-	0	0	0	0.031	0.016	0.031	0	0
1000	0.000	0.000	0.000	0.046	-	0.015	0	0.078	0.218	0.297	0.282	0	0
10000	0.047	0.047	0.047	0.093	-	0.109	0.141	0.765	2.079	2.890	3.013	0.031	0.015
100000	0.610	0.641	0.640	0.735	-	1.422	1.797	-	23.797	-	29.375	0.187	0.218
1000000	6.297	8.609	8.765	20.391	-	15.891	20.485	-	-	-	-	2.141	-

combined

items	lmdbm	vedis	unqlite	rocksdict	dbm.gnu	semidbm	pysos	dbm.dumb	sqlite-wal	sqlite-autocommit	sqlite-batch	dummypickle	dummyjson
10	0.156	0.000	0.015	0.047	-	0	0	0.078	0.235	0.390	0.406	0.156	0.156
100	0.156	0.015	0.000	0.047	-	0.016	0	0.078	0.235	0.390	0.390	0.156	0.203
1000	0.156	0.000	0.000	0.047	-	0.015	0.015	0.094	0.235	0.406	0.390	0.188	0.672
10000	0.156	0.015	0.015	0.047	-	0.015	0.031	0.203	0.234	0.469	0.406	0.594	5.141
100000	0.171	0.000	0.000	0.047	-	0.140	0.250	-	0.266	-	0.484	5.547	49.750
1000000	0.172	0.015	0.016	0.047	-	1.297	2.407	-	-	-	-	90.718	-

rhettinger · 2022-12-31T19:58:07Z

@rhettinger, would you like to continue the discussion here

If you want to run with this, do whatever makes sense for you. The other issue is mostly stale and likely needs a fresh start. A new first draft is likely just an afternoon project. If you don't want to run with it, let me know and I'll work on it.

I don't know that there needs to be much discussion because we have collections.abc.MutableMapping to structure the implementation and because the APIs for shelve and dbm are already established. This is a clearly desirable buildout that addresses the known issues with dumbdbm. With Sqlite3, we get down to a single file in a standard format, we get ACID guarantees, concurrency and compaction, and we get a dbm that will be present in every build.

If you're interested, here's some code that can be used as a starting point:

import sqlite3
from contextlib import suppress
from collections.abc import MutableMapping

BUILD_TABLE = "CREATE TABLE Dict (key TEXT NOT NULL, value BLOB NOT NULL, PRIMARY KEY (key));"
GET_SIZE = "SELECT COUNT (*) FROM Dict;"
LOOKUP_KEY = "SELECT value FROM Dict WHERE key = ?;"
STORE_KV = "REPLACE INTO Dict (key, value) VALUES (?, ?);"
DELETE_KEY = "DELETE FROM Dict WHERE key = ?;"
ITER_KEYS = "SELECT key FROM Dict;"

class SQLiteDict(MutableMapping):

    def __init__(self, dbname=':memory:'):
        self.conn = sqlite3.connect(dbname, isolation_level=None) 
        self.conn.execute('PRAGMA journal_mode = wal') 
        with suppress(sqlite3.OperationalError):
            self.conn.execute(BUILD_TABLE)

    def __len__(self):
        return self.conn.execute(GET_SIZE).fetchone()[0]

    def __getitem__(self, key):
        value_tuple = self.conn.execute(LOOKUP_KEY, (key,)).fetchone()
        if value_tuple is None:
            raise KeyError(key)
        return value_tuple[0]

    def __setitem__(self, key, value):
        self.conn.execute(STORE_KV, (key, value));
        self.conn.commit()

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        self.conn.execute(DELETE_KEY, (key,));
        self.conn.commit()

    def __iter__(self):
        curs = self.conn.execute(ITER_KEYS)
        while (value_tuple := curs.fetchone()):
            yield value_tuple[0]
        curs.close()

    def close(self):
        self.conn.close()
        self.conn = None

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

Other thoughts:

If we wanted, we could implement dict-like tracking of key insertion order but the API doesn't seem to require it.
The idea is to leave the above class as a simple as possible and to let a separate DB_Sqllite class build around it to provide readonly logic and anything else specific to the dbm API. That would nicely decouple the core dictionary logic from the needs of dbm.
To make this support concurrency, the mutating methods need to be pause and wait if the db is locked.

erlend-aasland · 2023-01-02T21:24:08Z

If you want to run with this, do whatever makes sense for you. The other issue is mostly stale and likely needs a fresh start.

I prefer closing the old issue; we definitely need a fresh start here. I'll take care of it.

A new first draft is likely just an afternoon project. If you don't want to run with it, let me know and I'll work on it.

I'll give it a shot; it'll be a nice chance for me to get to know the shelve and dbm modules better.

If you're interested, here's some code that can be used as a starting point:

Thanks!

erlend-aasland · 2024-01-23T12:10:23Z

Finally got back to this; I added @rhettinger's patch, amended it so the generic test_dbm test suite passed, and created PR #114481. Go nuts.

Co-authored-by: Raymond Hettinger <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Mariusz Felisiak <[email protected]>

Co-authored-by: Alex Waygood <[email protected]>

Co-authored-by: Raymond Hettinger <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Mariusz Felisiak <[email protected]>

…hon#115449) Co-authored-by: Alex Waygood <[email protected]>

…15447)

AdamWill · 2024-06-21T18:59:33Z

This seems to have broken the beaker library: bbangert/beaker#242 . Its test suite, which passes fine on Python 3.12, fails on Python 3.13 in dbm-based tests, with errors about sqlite3 thread violations. Isn't it a bit bad if code using dbm which worked fine with the previous default backend fails with the new one?

AdamWill · 2024-06-21T19:37:13Z

of course, I guess it's possible beaker is just not handling threads correctly, but the other backends weren't smart enough to notice, and beaker's tests aren't good enough to catch it either. I'm not sure which is the case.

rhettinger added type-feature A feature request or enhancement 3.12 bugs and security fixes labels Dec 21, 2022

AlexWaygood added the topic-sqlite3 label Dec 21, 2022

erlend-aasland mentioned this issue Dec 29, 2022

dbm.sqlite proof of concept #48033

Closed

erlend-aasland added this to sqlite3 issues Jan 2, 2023

erlend-aasland removed this from sqlite3 issues Jan 2, 2023

erlend-aasland added 3.13 bugs and security fixes and removed 3.12 bugs and security fixes labels Jun 7, 2023

iritkatriel added the stdlib Python modules in the Lib dir label Nov 28, 2023

erlend-aasland removed the 3.13 bugs and security fixes label Jan 22, 2024

erlend-aasland self-assigned this Jan 22, 2024

erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Jan 23, 2024

pythongh-100414: Add SQLite backend to dbm

8c19c81

bedevere-app bot mentioned this issue Jan 23, 2024

gh-100414: Add SQLite backend to dbm #114481

Merged

This comment was marked as resolved.

Sign in to view

erlend-aasland added this to sqlite3 issues Jan 25, 2024

erlend-aasland moved this to In Progress in sqlite3 issues Jan 25, 2024

erlend-aasland added a commit that referenced this issue Feb 14, 2024

gh-100414: Add SQLite backend to dbm (#114481)

dd5e4d9

Co-authored-by: Raymond Hettinger <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Mariusz Felisiak <[email protected]>

erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Feb 14, 2024

pythongh-100414: Make dbm.sqlite3 the preferred dbm backend

ad40cfb

bedevere-app bot mentioned this issue Feb 14, 2024

gh-100414: Make dbm.sqlite3 the preferred dbm backend #115447

Merged

erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Feb 14, 2024

pythongh-100414: Skip test_dbm_sqlite3 if sqlite3 is unavailable

77a2a2e

bedevere-app bot mentioned this issue Feb 14, 2024

gh-100414: Skip test_dbm_sqlite3 if sqlite3 is unavailable #115448

Closed

erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Feb 14, 2024

pythongh-100414: Skip test_dbm_sqlite3 if sqlite3 is unavailable

df23c87

bedevere-app bot mentioned this issue Feb 14, 2024

gh-100414: Skip test_dbm_sqlite3 if sqlite3 is unavailable #115449

Merged

erlend-aasland added a commit that referenced this issue Feb 14, 2024

gh-100414: Skip test_dbm_sqlite3 if sqlite3 is unavailable (#115449)

029ec91

Co-authored-by: Alex Waygood <[email protected]>

erlend-aasland added a commit that referenced this issue Feb 14, 2024

gh-100414: Make dbm.sqlite3 the preferred dbm backend (#115447)

6d9141e

erlend-aasland closed this as completed Feb 14, 2024

github-project-automation bot moved this from In Progress to Done in sqlite3 issues Feb 14, 2024

fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this issue Feb 14, 2024

pythongh-100414: Skip test_dbm_sqlite3 if sqlite3 is unavailable (pyt…

642d5f6

…hon#115449) Co-authored-by: Alex Waygood <[email protected]>

fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this issue Feb 14, 2024

pythongh-100414: Make dbm.sqlite3 the preferred dbm backend (python#1…

d8969ee

…15447)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sqlite3 as another possible backing store for the dbm module #100414

Add sqlite3 as another possible backing store for the dbm module #100414

rhettinger commented Dec 21, 2022 •

edited by bedevere-app bot

Loading

erlend-aasland commented Dec 21, 2022

erlend-aasland commented Dec 21, 2022

erlend-aasland commented Dec 29, 2022

presidento commented Dec 31, 2022 •

edited

Loading

rhettinger commented Dec 31, 2022 •

edited by AlexWaygood

Loading

erlend-aasland commented Jan 2, 2023

erlend-aasland commented Jan 23, 2024

This comment was marked as resolved.

AdamWill commented Jun 21, 2024

AdamWill commented Jun 21, 2024

Add sqlite3 as another possible backing store for the dbm module #100414

Add sqlite3 as another possible backing store for the dbm module #100414

Comments

rhettinger commented Dec 21, 2022 • edited by bedevere-app bot Loading

Linked PRs

erlend-aasland commented Dec 21, 2022

erlend-aasland commented Dec 21, 2022

erlend-aasland commented Dec 29, 2022

presidento commented Dec 31, 2022 • edited Loading

TL;DR

In details

Performance measurements

write

batch

read

combined

rhettinger commented Dec 31, 2022 • edited by AlexWaygood Loading

erlend-aasland commented Jan 2, 2023

erlend-aasland commented Jan 23, 2024

This comment was marked as resolved.

AdamWill commented Jun 21, 2024

AdamWill commented Jun 21, 2024

rhettinger commented Dec 21, 2022 •

edited by bedevere-app bot

Loading

presidento commented Dec 31, 2022 •

edited

Loading

rhettinger commented Dec 31, 2022 •

edited by AlexWaygood

Loading