dbm.sqlite proof of concept #48033

smontanaro · 2008-09-04T23:57:31Z

BPO	3783
Nosy	@rhettinger, @doko42, @jcea, @pitrou, @merwok, @florentx, @erlend-aasland
Files	sq_dict.py: an alternate sqlite dbm-like interface v.5; dbsqlite.py: Fixed len; dbdict.py: Non-synchronous DB based on a dict subclass.

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2008-09-04.23:57:30.571>
labels = ['type-feature', 'library']
title = 'dbm.sqlite proof of concept'
updated_at = <Date 2021-07-28.20:11:08.117>
user = 'https://github.com/smontanaro'

bugs.python.org fields:

activity = <Date 2021-07-28.20:11:08.117>
actor = 'erlendaasland'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2008-09-04.23:57:30.571>
creator = 'skip.montanaro'
dependencies = []
files = ['11602', '12933', '12939']
hgrepos = []
issue_num = 3783
keywords = ['patch']
message_count = 51.0
messages = ['72556', '72565', '72566', '72567', '72568', '72569', '72570', '72587', '72595', '72661', '72677', '72719', '72725', '72729', '73012', '73013', '73018', '73020', '73025', '73026', '73027', '73028', '73032', '73034', '73046', '73053', '73054', '73067', '73068', '73070', '73072', '73074', '73077', '73080', '73775', '73800', '79050', '80802', '80803', '80808', '80817', '81109', '81110', '81113', '81155', '81483', '85717', '95492', '95804', '97473', '248835']
nosy_count = 11.0
nosy_names = ['rhettinger', 'doko', 'jcea', 'ghaering', 'pitrou', 'erno', 'eric.araujo', 'gregburd', 'flox', 'rute', 'erlendaasland']
pr_nums = []
priority = 'low'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue3783'
versions = ['Python 3.4']

smontanaro · 2008-09-04T23:57:28Z

Based on recent discussions about ridding Python of bsddb I decided to
see how hard it would be to implement a barebones dbm.sqlite module.
Turns out, not very hard.

No docs. No test cases. Caveat emptor. But I think it can serve as
at least a proof of concept, maybe as the basis for a new module in
3.1.

smontanaro · 2008-09-05T04:44:26Z

Attaching corrected module.

smontanaro · 2008-09-05T04:45:19Z

Attaching test cases based on dumbdbm tests.

smontanaro · 2008-09-05T04:54:41Z

Another slight revision to the module.

smontanaro · 2008-09-05T04:55:57Z

Trivial doc diffs against 3.0b3 doc.

smontanaro · 2008-09-05T05:00:53Z

Another tweak - add values()

smontanaro · 2008-09-05T05:02:13Z

Updated test cases

pitrou · 2008-09-05T12:01:34Z

It would be more efficient to base keys() on iterkeys() than the
reverse, IMO.
Other than that, why not open a branch or at least upload full-fledged
patch files? :)
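
A minimal sketch of that suggestion, assuming a hypothetical `TinyDB` class and a `kv` table (neither is taken from the attached module): the iterator streams rows from the cursor, and `keys()` simply materializes it.

```python
import sqlite3


class TinyDB:
    """Hypothetical minimal store, only to illustrate keys() built on iterkeys()."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

    def __setitem__(self, key, value):
        self._conn.execute(
            "REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))

    def iterkeys(self):
        # Stream keys from the cursor one row at a time.
        for (key,) in self._conn.execute("SELECT key FROM kv"):
            yield key

    def keys(self):
        # The list version is just the iterator, materialized.
        return list(self.iterkeys())


db = TinyDB()
db["a"] = "1"
db["b"] = "2"
all_keys = sorted(db.keys())
```

Going the other way (building the iterator from a full list) would load every key into memory before yielding the first one.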

smontanaro · 2008-09-05T14:39:37Z

Antoine> It would be more efficient to base keys() on iterkeys() than the
Antoine> reverse, IMO.

True. I was just modifying the dumbdbm implementation.

Antoine> Other than that, why not open a branch or at least upload
Antoine> full-fledged patch files? :)

Well, I only intended to create the initial proof of concept, but then last
night I couldn't sleep so I cobbled the test module and docs from the
existing stuff ... ;-)

When I get a couple minutes free I'll try and condense it into a more
suitable form. Probably not until the weekend though.

Skip

smontanaro · 2008-09-06T02:55:25Z

OK, I made a sandbox project out of it:

svn+ssh://[email protected]/sandbox/trunk/dbm_sqlite

Hack away!

rhettinger · 2008-09-06T18:06:07Z

I would like to see something like this go into 3.0 so that shelves
don't become useless for Windows users.

josiahcarlson · 2008-09-06T22:43:15Z

Here's an alternate version with most of bsddb's interface intact.

gpshead · 2008-09-07T00:54:46Z

sq_dict review:

have sqlite quote/escape self._mtn before using it with a python %s
substitution. or pass it into the sql query function as a positional ?
parameter like you do for keys and values. (avoid sql injection)

raise a TypeError rather than a ValueError when you don't like the key
or value type.

also, to test the type, isinstance(val, str) is better than using type(val).
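
A rough sketch of what these review points could look like in code. The helper names `quote_identifier` and `check_key` are hypothetical, and the quoting shown is the standard SQL rule of doubling embedded double quotes, not something taken from sq_dict.py.

```python
import sqlite3


def quote_identifier(name):
    """Quote an SQL identifier by doubling any embedded double quotes."""
    if not isinstance(name, str):
        raise TypeError("table name must be str, not %s" % type(name).__name__)
    if "\x00" in name:
        raise ValueError("table name may not contain NUL characters")
    return '"' + name.replace('"', '""') + '"'


def check_key(key):
    """Reject unsupported key types with TypeError, accepting subclasses."""
    if not isinstance(key, (str, bytes)):
        raise TypeError("keys must be str or bytes, not %s" % type(key).__name__)
    return key


conn = sqlite3.connect(":memory:")
# A hostile-looking name becomes one harmless quoted identifier.
table = quote_identifier('shelf"; DROP TABLE x; --')
conn.execute("CREATE TABLE %s (key TEXT PRIMARY KEY, value TEXT)" % table)
conn.execute("INSERT INTO %s (key, value) VALUES (?, ?)" % table,
             (check_key("k"), "v"))
rows = conn.execute("SELECT key, value FROM %s" % table).fetchall()
```

Keys and values still travel as `?` parameters; only the table name needs the quoting step.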

josiahcarlson · 2008-09-07T03:34:21Z

I tried passing the db name as a parameter with '?', it doesn't always
work. Also, there shouldn't be any SQL injection issues here unless
someone designed their system wrong (if a third party is allowed to pass
the name of a db table into the open/create function, then they can do
much worse than mangle or hide data in a sqlite database).

With regards to isinstance being better than type; it's only better if
you want to support subclasses. When writing the module, I had no
interest in supporting subclasses (though supporting both str and buffer
in 2.x, and bytes and memoryview in 3.x seems reasonable).
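
For illustration, a small sketch of both points, using a hypothetical `Tag` subclass. In SQLite, `?` placeholders bind values rather than identifiers, so a table name passed that way is rejected when the statement is prepared; and `type()` and `isinstance()` differ only in how they treat subclasses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A "?" placeholder cannot stand in for a table name.
try:
    conn.execute("CREATE TABLE ? (key TEXT, value TEXT)", ("shelf",))
    placeholder_ok = True
except sqlite3.Error:
    placeholder_ok = False


class Tag(str):
    """Hypothetical str subclass used only for the comparison below."""


tag = Tag("x")
exact_match = type(tag) is str          # exact type check rejects subclasses
subclass_match = isinstance(tag, str)   # isinstance accepts subclasses
```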

ghaering · 2008-09-11T10:37:45Z

I like Skip's version better, because it's closer to the dbm
"specification" instead of trying to mimic bsddb (first, last, etc.).
I'd like to keep such things out.

I've made a few changes to the sandbox project which I will check in
later today. The most important change is support for a "fast mode",
which doesn't commit changes until you call the synch() method. synch()
is also called on close().

Perhaps we should do automatic commits every n (like 1000) changes, too?
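
A sketch of how such a "fast mode" might look, under stated assumptions: the class name `FastStore`, the `commit_every` parameter, and the `commits` counter are hypothetical, and the method is spelled `sync()` here. It is not the code from the sandbox.

```python
import sqlite3


class FastStore:
    """Hypothetical store that defers commits until sync() or every N writes."""

    def __init__(self, path=":memory:", commit_every=1000):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
        self._conn.commit()
        self._commit_every = commit_every
        self._pending = 0
        self.commits = 0  # kept only so the behaviour can be observed

    def __setitem__(self, key, value):
        self._conn.execute(
            "REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))
        self._pending += 1
        if self._pending >= self._commit_every:
            self.sync()  # automatic commit every N changes

    def sync(self):
        self._conn.commit()
        self._pending = 0
        self.commits += 1

    def close(self):
        self.sync()  # close() always flushes pending changes
        self._conn.close()


store = FastStore(commit_every=3)
for i in range(7):
    store[str(i)] = "v"
commits_before_close = store.commits
store.close()
commits_after_close = store.commits
```

With `commit_every=3` and seven writes, two automatic commits happen before `close()` and one more on close.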

What's all this ORDER BY in both your implementations about? The dbm
"spec" says nothing about keys being ordered AFAIC. Can we get rid of these?

ghaering · 2008-09-11T10:42:32Z

One question about Josiah's _check_value(). SQLite is less crippled than
other simplistic databases and also supports integers, reals and blobs
in addition to strings.

Shouldn't we make this accessible to users? Or is compatibility with
other dbm implementations more important?
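
To make the question concrete, here is a small sketch using an illustrative `kv` table whose columns have no declared type, so SQLite applies no TEXT conversion and each value keeps its own storage class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No declared column types: values are stored as given.
conn.execute("CREATE TABLE kv (key PRIMARY KEY, value)")

samples = {"int": 42, "real": 2.5, "text": "hello", "blob": b"\x00\x01"}
conn.executemany("INSERT INTO kv (key, value) VALUES (?, ?)",
                 list(samples.items()))

# typeof() reports the storage class SQLite used for each value.
storage_classes = dict(conn.execute("SELECT key, typeof(value) FROM kv"))
round_trip = dict(conn.execute("SELECT key, value FROM kv"))
```

The values come back as `int`, `float`, `str` and `bytes`, which other dbm backends would not preserve.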

smontanaro · 2008-09-11T12:45:46Z

Gerhard> What's all this ORDER BY in both your implementations about?
Gerhard> The dbm "spec" says nothing about keys being ordered AFAIC. Can
Gerhard> we get rid of these?

I'd like to guarantee that zip(db.keys(), db.values()) == db.items().

Skip

pitrou · 2008-09-11T13:10:23Z

On Thursday 11 September 2008 at 12:46 +0000, Skip Montanaro wrote:

> Gerhard> What's all this ORDER BY in both your implementations about?
> Gerhard> The dbm "spec" says nothing about keys being ordered AFAIC. Can
> Gerhard> we get rid of these?
>
> I'd like to guarantee that zip(db.keys(), db.values()) == db.items().

It doesn't sound very useful, and it may hurt performance on big tables.

ghaering · 2008-09-11T13:41:20Z

> I'd like to guarantee that zip(db.keys(), db.values()) == db.items().

Ok. If that isn't guaranteed elsewhere just drop it here?

FWIW that will also work without the ORDER BY, because you're getting
the rows back in the same ORDER. Something cheaper would also be "ORDER
BY ROWID". I still propose to just do without the ORDER BY.

smontanaro · 2008-09-11T13:46:50Z

> I'd like to guarantee that zip(db.keys(), db.values()) == db.items().

Antoine> It doesn't sound very useful, and it may hurt performance on
Antoine> big tables.

Actually, I think Python guarantees (for dicts at least - other mappings
should probably follow suit) that if you call keys() then call values()
without making any changes to the dict that their orders match, e.g., that

zip(d.keys(), d.values()) == d.items()

Skip
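
For reference, the dict behaviour described above can be checked directly. In Python 3, `keys()`, `values()` and `items()` return views and `zip()` is lazy, so the comparison needs explicit lists.

```python
d = {"b": 2, "a": 1, "c": 3}

# keys() and values() iterate in corresponding order when the dict
# is not modified in between, so zipping them reproduces items().
pairs_match = list(zip(d.keys(), d.values())) == list(d.items())
```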

smontanaro · 2008-09-11T13:49:23Z

Gerhard> FWIW that will also work without the ORDER BY, because you're
Gerhard> getting the rows back in the same ORDER. Something cheaper
Gerhard> would also be "ORDER BY ROWID". I still propose to just do
Gerhard> without the ORDER BY.

As long as SQLite guarantees that the ordering is identical, then sure, dump
the ORDER BY clause.

Skip

ghaering · 2008-09-11T13:53:04Z

Skip Montanaro wrote:

> Gerhard> FWIW that will also work without the ORDER BY, because you're
> Gerhard> getting the rows back in the same ORDER. Something cheaper
> Gerhard> would also be "ORDER BY ROWID". I still propose to just do
> Gerhard> without the ORDER BY.
>
> As long as SQLite guarantees that the ordering is identical, then sure, dump
> the ORDER BY clause.

It doesn't guarantee it, but the implementation behaves like this.

-- Gerhard

pitrou · 2008-09-11T14:16:51Z

On Thursday 11 September 2008 at 13:48 +0000, Skip Montanaro wrote:

> Actually, I think Python guarantees (for dicts at least - other mappings
> should probably follow suit) that if you call keys() then call values()
> without making any changes to the dict that their orders match, e.g., that
> zip(d.keys(), d.values()) == d.items()

Perhaps. I've never written any code that relies on this, though, and it
doesn't sound like a useful guarantee since you can just use the
items() method anyway. It probably dates back to an era when list
comprehensions didn't exist, and extracting keys or values from the
items list required several lines of code and costly method calls.

Also, the point is that Python dicts can make that guarantee without
being any slower. It may not be the same for an RDBMS backend. Why?
Because, depending on the backend, index and data can be stored in
separate areas with different storage layouts (e.g. keys are in a B tree
while values are just dumped sequentially). If you only ask for
unordered keys, they will be read in optimal (sequential) index order,
and if you only ask for unordered values, they will be read in optimal
(sequential) data order, which is not the same. This is true for e.g.
MySQL.

(also, IMO this discussion proves that the module shouldn't be included
in Python 3.0. It's too young, its API hasn't even settled down)

pitrou · 2008-09-11T14:33:44Z

I might add that calling keys() then values() is suboptimal, because it
will issue two SQL queries while calling items() will issue just one.
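
One way to observe this, as a sketch: `sqlite3.Connection.set_trace_callback()` records each statement SQLite runs. The `kv` table and the queries below are illustrative and not taken from the attached modules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", [("a", "1"), ("b", "2")])
conn.commit()

statements = []
conn.set_trace_callback(statements.append)  # log every executed statement


def select_count():
    return sum(1 for s in statements if s.startswith("SELECT"))


# keys() followed by values(): two separate queries.
keys = [k for (k,) in conn.execute("SELECT key FROM kv ORDER BY key")]
values = [v for (v,) in conn.execute("SELECT value FROM kv ORDER BY key")]
two_step = select_count()

statements.clear()

# items(): a single query returns both columns.
items = conn.execute("SELECT key, value FROM kv ORDER BY key").fetchall()
one_step = select_count()
```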

josiahcarlson · 2008-09-11T18:25:59Z

> I like Skip's version better, because it's closer to the dbm
> "specification" instead of trying to mimic bsddb (first, last, etc.).
> I'd like to keep such things out.

dbm.sqlite is meant as a potential replacement of dbm.bsddb. Since
people do use the extra methods (.first(), .last(), etc.), not having
them could lead to breakage.

Separating them out into a subclass (regular open doesn't have it, but
btopen does), along with all of the other order guarantees (the ORDER BY
clauses in the SQL statements), could keep it fast for people who don't
care about ordering, and keep it consistent for those who do care about
ordering.
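
A sketch of the proposed split, with hypothetical class names `UnorderedStore` and `OrderedStore`. Raising `KeyError` from `first()` and `last()` on an empty table is an assumption made here, not bsddb's documented behaviour.

```python
import sqlite3


class UnorderedStore:
    """Hypothetical base class: makes no promise about key order."""

    _order_clause = ""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)")

    def __setitem__(self, key, value):
        self._conn.execute(
            "REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))

    def keys(self):
        sql = "SELECT key FROM kv" + self._order_clause
        return [key for (key,) in self._conn.execute(sql)]


class OrderedStore(UnorderedStore):
    """Hypothetical subclass: pays for ORDER BY and adds first()/last()."""

    _order_clause = " ORDER BY key"

    def _edge(self, direction):
        row = self._conn.execute(
            "SELECT key, value FROM kv ORDER BY key %s LIMIT 1" % direction
        ).fetchone()
        if row is None:
            raise KeyError("database is empty")  # assumed error type
        return row

    def first(self):
        return self._edge("ASC")

    def last(self):
        return self._edge("DESC")


db = OrderedStore()
for k in ("b", "a", "c"):
    db[k] = k.upper()
ordered_keys = db.keys()
```

Only users who open the ordered variant pay for the sort.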

Attached you will find an updated version.

smontanaro · 2008-09-11T19:48:04Z

>> As long as SQLite guarantees that the ordering is identical, then
>> sure, dump the ORDER BY clause.

Gerhard> It doesn't guarantee it, but the implementation behaves like
Gerhard> this.

If the behavior isn't guaranteed, I think you need to retain ORDER BY.

Skip

smontanaro · 2008-09-11T20:12:45Z

Antoine> I might add that calling keys() then values() is suboptimal,
Antoine> because it will issue two SQL queries while calling items()
Antoine> will issue just one.

Well, sure, but heaven only knows what an application programmer will do...

S

rhettinger · 2009-02-03T22:36:01Z

Here's an updated patch (it's also in the sandbox):

Added a sync() method to support shelves.
Removed commits on granular sets and gets.
Optimized __len__ and __contains__.

pitrou · 2009-02-03T22:42:21Z

I think issuing 'SELECT MAX(ROWID)' to compute the length of the table
is not correct if some rows get deleted in the table.
I've found a thread about it here:
http://osdir.com/ml/db.sqlite.general/2004-03/msg00329.html
In that thread someone suggested caching the length in another table and
updating it through a trigger each time the main table is modified.
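
The mismatch is easy to reproduce. A minimal sketch with an illustrative `shelf` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shelf (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.executemany("INSERT INTO shelf VALUES (?, ?)",
                 [("a", "1"), ("b", "2"), ("c", "3")])

# Deleting the first row leaves the highest rowid untouched.
conn.execute("DELETE FROM shelf WHERE key = 'a'")

max_rowid = conn.execute("SELECT MAX(ROWID) FROM shelf").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]
```

After the delete, `MAX(ROWID)` still reports the rowid of the last inserted row, while `COUNT(*)` reflects the two rows that remain.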

rhettinger · 2009-02-03T23:11:46Z

That's a bummer. Changing this method to __bool__ and then setting
__len__ back to "count(*)".
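
Roughly, that change could look like the sketch below. The `Shelf` class is hypothetical and not the patch itself. `MAX(ROWID)` is NULL only for an empty table, so it still works as an emptiness test even though it is wrong as a length.

```python
import sqlite3


class Shelf:
    """Hypothetical store showing __len__ via COUNT(*) and __bool__ via MAX(ROWID)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)")

    def __setitem__(self, key, value):
        self._conn.execute(
            "REPLACE INTO shelf (key, value) VALUES (?, ?)", (key, value))

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __len__(self):
        # Exact, but scans the table.
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def __bool__(self):
        # MAX(ROWID) is NULL (None) only when the table has no rows.
        row = self._conn.execute("SELECT MAX(ROWID) FROM shelf").fetchone()
        return row[0] is not None


shelf = Shelf()
empty_truth = bool(shelf)
shelf["a"] = "1"
shelf["b"] = "2"
del shelf["a"]
```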

rhettinger · 2009-02-04T20:18:53Z

FWIW, I put an alternative in the sandbox /dbm_sqlite/alt/dbdict.py and
am attaching a copy here. The idea is to emulate gdbm's fast mode and
delay all writes until closing. That lets us subclass from dict and get
high-speed lookups, sets, and deletions. Freeing ourselves from a DB
also gets us a choice of ultra-portable file formats (json, csv, pickle,
eval).

rhettinger · 2009-02-09T20:12:14Z

Unassigning. The code works but no one seems to be pushing for or
caring about inclusion in Py3.1.

If commits are delayed, then you might as well adopt the dbdict.py
approach instead (reading the file in once at the beginning, operating
directly on a dict subclass, and atomically writing it out at the end).
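
A minimal sketch of that approach, assuming JSON as the file format. The `DictDB` name and its details are hypothetical; this is not the attached dbdict.py.

```python
import json
import os
import tempfile


class DictDB(dict):
    """Hypothetical dict subclass: load once, work in memory, write atomically."""

    def __init__(self, filename):
        super().__init__()
        self.filename = filename
        if os.path.exists(filename):
            with open(filename, encoding="utf-8") as f:
                self.update(json.load(f))

    def sync(self):
        # Write to a temporary file in the same directory, then swap it in,
        # so readers never see a half-written file.
        dirname = os.path.dirname(os.path.abspath(self.filename))
        fd, tmp = tempfile.mkstemp(dir=dirname)
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(self, f)
            os.replace(tmp, self.filename)
        except BaseException:
            os.unlink(tmp)
            raise

    def close(self):
        self.sync()


path = os.path.join(tempfile.mkdtemp(), "demo.json")
db = DictDB(path)
db["spam"] = "eggs"
db.close()

reloaded = DictDB(path)
```

Lookups, sets and deletions are plain dict operations; the only I/O happens in `__init__` and `sync()`.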

doko42 · 2009-04-07T15:36:32Z

is there any chance for inclusion in 3.1?

rute · 2009-11-19T16:56:48Z

By using triggers on inserts and deletes it is possible to
keep track of the size and speed up __len__ by about 10x.

SQL:

CREATE TABLE IF NOT EXISTS info
(key TEXT UNIQUE NOT NULL,
value INTEGER NOT NULL);

INSERT OR IGNORE INTO info (key,value) VALUES ('size',0);

CREATE TABLE IF NOT EXISTS shelf
(key TEXT UNIQUE NOT NULL,
value TEXT NOT NULL);

CREATE TRIGGER IF NOT EXISTS insert_shelf
AFTER INSERT ON shelf
BEGIN
UPDATE info SET value = value + 1 WHERE key = 'size';
END;

CREATE TRIGGER IF NOT EXISTS delete_shelf
AFTER DELETE ON shelf
BEGIN
UPDATE info SET value = value - 1 WHERE key = 'size';
END;

On my laptop this increases the speed of 'len' by about 10x.

I have a slightly modified version of dbsqlite.py for
running on Python 2.5 that uses the triggers to
keep track of the size:

http://dpaste.com/hold/122439/
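
As a quick check of the trigger scheme, the SQL above can be run from Python as in the following sketch. Only the schema comes from the message; the surrounding script is illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS info
    (key TEXT UNIQUE NOT NULL,
     value INTEGER NOT NULL);

INSERT OR IGNORE INTO info (key, value) VALUES ('size', 0);

CREATE TABLE IF NOT EXISTS shelf
    (key TEXT UNIQUE NOT NULL,
     value TEXT NOT NULL);

CREATE TRIGGER IF NOT EXISTS insert_shelf
    AFTER INSERT ON shelf
    BEGIN
        UPDATE info SET value = value + 1 WHERE key = 'size';
    END;

CREATE TRIGGER IF NOT EXISTS delete_shelf
    AFTER DELETE ON shelf
    BEGIN
        UPDATE info SET value = value - 1 WHERE key = 'size';
    END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.executemany("INSERT INTO shelf (key, value) VALUES (?, ?)",
                 [("a", "1"), ("b", "2"), ("c", "3")])
conn.execute("DELETE FROM shelf WHERE key = 'a'")

# The trigger-maintained counter is a single-row lookup ...
tracked = conn.execute(
    "SELECT value FROM info WHERE key = 'size'").fetchone()[0]
# ... and should agree with a full count.
counted = conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]
```

One caveat worth testing before relying on this: rows removed implicitly by `INSERT OR REPLACE` fire delete triggers only when recursive triggers are enabled, so a store that overwrites keys with REPLACE may let the counter drift.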

pitrou · 2009-11-29T12:47:18Z

It would be nice to try to advance this at PyCon, or at another time.

rute · 2010-01-09T21:58:37Z

Multi threading:

According to http://www.sqlite.org/cvstrac/wiki?p=MultiThreading
we need to keep a connection for each thread to support multi threaded
access to the database.

I came across this when deploying an application in a multi-threaded
environment. The solution is to keep the connection in thread-local data.

Also note that an in-memory database is not shared between threads, so
a hairy workaround with a separate worker thread and queues for access
is needed. Perhaps in-memory databases could be disallowed, since dbm is file-only?

import collections.abc
import sqlite3
import threading

class SQLhash(collections.abc.MutableMapping):
    def __init__(self, filename=':memory:', flags='r', mode=None):
        self.__filename = filename
        # Each thread gets its own connection via thread-local storage.
        self.__local = threading.local()

        MAKE_SHELF = 'CREATE TABLE IF NOT EXISTS shelf (key TEXT PRIMARY KEY, value TEXT NOT NULL)'
        self.conn.execute(MAKE_SHELF)
        self.conn.commit()

    @property
    def conn(self):
        # Lazily open one connection per thread on first access.
        try:
            conn = self.__local.conn
        except AttributeError:
            conn = self.__local.conn = sqlite3.connect(self.__filename)
            conn.text_factory = bytes

        return conn
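
To show why a per-thread connection helps, here is a small self-contained sketch with hypothetical helper names. With the default `check_same_thread=True`, Python's sqlite3 module raises `ProgrammingError` when a connection is used from a thread other than the one that created it, while a thread-local connection avoids the problem.

```python
import sqlite3
import threading

shared = sqlite3.connect(":memory:")  # created in the main thread
errors = []
results = []
local = threading.local()


def get_conn():
    # One connection per thread, created on first use.
    try:
        return local.conn
    except AttributeError:
        local.conn = sqlite3.connect(":memory:")
        return local.conn


def misuse_shared_connection():
    try:
        shared.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(exc)


def use_own_connection():
    results.append(get_conn().execute("SELECT 1").fetchone()[0])


for target in (misuse_shared_connection, use_own_connection):
    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
```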

ghaering · 2015-08-19T13:06:06Z

This wiki page is out of date. It appears that SQLite is now threadsafe by default: http://www.sqlite.org/threadsafe.html

erlend-aasland · 2022-04-29T14:27:27Z

I would like to try and resurrect this issue using @rhettinger's last submitted code (on bpo). @ambv almost mentioned shelve and sqlite3 in the same sentence in one of his developer-in-residence blog posts last year, so it may be worth it to try and get this up and running. (If not, it will at least be a fun exercise.)

presidento · 2022-12-05T15:23:50Z

@erlend-aasland do you have any news about this?

erlend-aasland · 2022-12-29T22:24:13Z

> @erlend-aasland do you have any news about this?

Nope, and I've been busy with $work. Raymond has created a duplicate issue, though: see #100414

erlend-aasland · 2023-01-02T21:28:18Z

Raymond and I prefer to close this (stale) issue in favour of #100414; if we're going to do this, I think we can benefit from a fresh start.

Closing this issue as stale and superseded by #100414; please re-open if you disagree.

erlend-aasland · 2024-01-23T22:59:42Z

Just a heads-up: gh-100414 now has a PR based on a recent patch Raymond posted. Feel free to review it:

gh-100414: Add SQLite backend to dbm #114481

smontanaro added the type-feature and stdlib labels Sep 4, 2008

smontanaro self-assigned this Sep 6, 2008

rhettinger self-assigned this Jan 30, 2009

rhettinger removed their assignment Feb 9, 2009

rute mannequin added the type-bug label and removed the type-feature label Jan 9, 2010

pitrou added the type-feature label and removed the type-bug label Mar 17, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022

erlend-aasland added the topic-sqlite3 label May 16, 2022

erlend-aasland added this to sqlite3 issues May 21, 2022

erlend-aasland moved this to Decision needed in sqlite3 issues May 21, 2022

erlend-aasland removed the status in sqlite3 issues May 21, 2022

erlend-aasland mentioned this issue Dec 21, 2022

Add sqlite3 as another possible backing store for the dbm module #100414

Closed

erlend-aasland closed this as not planned Jan 2, 2023

github-project-automation bot moved this to Done in sqlite3 issues Jan 2, 2023

erlend-aasland moved this from Done to Discarded in sqlite3 issues Jan 2, 2023
