Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API question: cursors and databases #154

Open
os12 opened this issue Jan 15, 2018 · 11 comments
Open

API question: cursors and databases #154

os12 opened this issue Jan 15, 2018 · 11 comments

Comments

@os12
Copy link

os12 commented Jan 15, 2018

Hi Dmitry, I've started stated testing Sophia and hit an issue: a cursor-based walk returns keys across databases. Assume two databases, "db1" and "db2". One of them is opened and keys are fetched using a cursor:

    void *env = sp_env();

    sp_setstring(env, "sophia.path", ".store", 0);
    sp_setstring(env, "db", db_name.c_str(), 0);
    if (sp_open(env) == -1)
        return FromStringError(LastSophiaError(env));

    void *db = sp_getobject(env, ("db." + db_name).c_str());
    if (db == nullptr)
        return FromStringError(LastSophiaError(env));

    void *cursor = sp_cursor(env);
    void *doc = sp_document(db);
    sp_setstring(doc, "order", ">=", 0);

    while ((doc = sp_get(cursor, doc))) {
        int size;
        auto key_ptr = static_cast<const char *>(sp_getstring(doc, "key", &size));
        ...
        ...
    }

Am I missing something here? Do cursors go across databases? If so, how do I provide an additional filter to limit iteration to a single DB?

Basically, I can achieve the behavior I want by splitting the databases at a higher level this way:

    sp_setstring(env, "sophia.path", (".store/" + db_name).c_str(), 0);
    sp_setstring(env, "db", "mydb", 0); // hard-coded name

Could you clarify the design behind the aforementioned "path "and "db" settings please?

Thanks a lot!
Oleg.

@pmwkaa
Copy link
Owner

pmwkaa commented Jan 16, 2018

Hi, not quite sure what the issue in here. There is no way records can overlap this way, only if you use different db pointers during sp_document() creation. Are you sure it sets correct name for sp_getobject()?

sophia.path specifies main directory, where all databases will be stored by name. For example, db.mydb will be placed in sophia.path/mydb

@os12
Copy link
Author

os12 commented Jan 16, 2018

Right, both the cursor-based "read" path and the transaction-based "write" path start with the same code (the first fragment). I do see per-database subdirectories created inside .store as well as log (which seems to contain the last few mutations).

I know you have tests that cover multi-db mutations. Yet, do you have a multi-db cursor test?

@pmwkaa
Copy link
Owner

pmwkaa commented Jan 17, 2018

Could you please specify your work case you are trying to achieve? One cursor is designed to iterate only one database at a time. If you need to make a join between databases, you need to open a second cursor, and so on.

@os12
Copy link
Author

os12 commented Jan 17, 2018

Could you please specify your work case you are trying to achieve? One cursor is designed to iterate only one database at a time

Yes, that's what I expected. Yet the cursor returns keys from an adjacent DB, which is very surprising.

@os12
Copy link
Author

os12 commented Jan 19, 2018

Please see #156 for the repro - a cursor-based walk in DB2 should find no keys (yet it fetches everything from the DB1).

@pmwkaa
Copy link
Owner

pmwkaa commented Jan 22, 2018

Now i understand what you meant. This is known issue, which might be quite confusing. Sophia does not have persistent catalog of created databases, they must be predefined each time before sp_open(). This is done to support easier Sophia integration with DBMS's which have its own database catalog.

Open procedure will find and recover every specified database directory by its name. After that it will reply write-ahead log records. To identify database in write-ahead log db.name.id is used: http://sophia.systems/v2.2/conf/db.html. If database is not specified it will be set in sequential order of database definition.

@os12
Copy link
Author

os12 commented Jan 22, 2018

Umm... I don't follow. Is there a way for me do deal with this issue?

  • My application needs to read/write distinct DBs at random times (otherwise, I'd have to impose prefixes onto record names to fake that)
  • The "read" is usually started by opening a DB by name and then listing records using a cursor (they must be ordered).

If this does not work, I will have to create distinct directories to isolate the DBs... Are there downsides?

@pmwkaa
Copy link
Owner

pmwkaa commented Jan 22, 2018

Sophia has a single write-ahead log which is shared between databases to efficient support cross-database transactions. To match a database in the log db.name.id is used.

sp_open() will fail if it will not find a defined database during log reply. In other words, scheme must be exactly the same every time you open environment. Basically, if you define mydb0 and mydb1, next time you need to define them again in the same order (or set db.name.id to distinct them).

Keep in mind, that open operation is very time consuming. Keep a database open, if you need good timing.

@os12
Copy link
Author

os12 commented Jan 22, 2018

OK, I've extended the code to set db.db1.id bit it's failing with this:

sophia/runtime/sr_conf.c:100 db.db1.id is read-only

Could you take a look at my second commit in #156 please? What am I missing?

@os12
Copy link
Author

os12 commented Jan 22, 2018

Also, let me use this comment to collect misc notes/comments:

  1. [minor] The list of options at http://sophia.systems/v2.2/conf/db.html is hard to understand as it mentions things like db.name.id
    • it should really say something like db.$(db_name).id as the other string is meaningless
  2. [major] The across-db nature of the log file makes things very tricky:
    • it should be documented somewhere in BOLD
    • these DBs are not actually independent (as they share the write-ahead log) and act more like namespaces
    • the code is hard to use in environments that create databases on the fly
      • the user must maintain an ordered list of DB names and open the entire set every time
      • a new DB can only be added during Environment initialization and only as a last one
        • sp_open() is expensive and so the DBs should not be closed
        • yet the environment must be torn down and recreated in order to add a new DB

@pmwkaa
Copy link
Owner

pmwkaa commented Jan 24, 2018

My fault, sorry about that. I completely forgot that custom id generation scheme been removed in v2.2 (so recommended scheme was actual only for previous version). Right now, the only way to make this work is to always define same databases in same order. I was planning to completely remake it.

It is also possible to force compaction and wait for its completion, this will guarantee that write-ahead records will not have database references. This will also increase open time dramatically.

Could you describe your use-case in more details? i might have some thoughts how to make it in optimal way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants