-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL schema specification inconsistencies #133
Comments
Thank you very much for such a detailed bug report. And I'm very happy to see you experimenting with codd like in the article too! Also, I still need to reply to your post in Discourse, which I hope to do Sunday or early next week. I wasn't aware of this issue, so thanks for figuring it out. The solution to it seems to be very simple - codd could just I haven't merged and released it yet because I'd like to add a test for this first, and double-check I'm not missing something. But fixing this is my next top priority, and I hope to get it done next week because one or two weeks after that I'm afraid I'll be very busy. In the meantime, I'm attaching a statically compiled version with that change, hoping it might do the trick for you until a proper release comes along. You can also look at the code changes (it includes other non-related stuff for now) at #134. |
Thank you! I had been thinking about the same way to resolve the issue. I was unsure about how the search path is scoped, however. Investigating, the
The term "session" is unfortunately not explicitly defined. A Stack Overflow answer concisely defines it as "synonymous with a TCP connection" with links to various supporting content. We set the search path to the empty string using a The
We are not doing this within a transaction, correct? In this case, the Using a transaction would also be preferable to ensure that the selected data is consistent, IMHO. I completely empathize with having a busy schedule. 😄 I really appreciate your time and efforts! |
Good point. By default, the function that builds the schema from the database runs in a transaction before committing, but perhaps only if all pending migrations are
The only reason is odd parsing rules for
But using |
I had been thinking that
Good idea!
Thank you for the explanation. If I wonder if restoring the previous value is necessary in the actual case? It is if subsequent commands are executed within the same transaction, but perhaps reading the schema state is always done at the end of transactions? I guess restoring the value is a good idea regardless, for safety in case the function is used differently by somebody. One small suggestion is to implement a function |
Yep, that's what I'm thinking, too.
Sounds good. I'll start working on this asap. Still, it might take me some time given there's a lot happening - I'm moving to a different place, which means a couple of weeks of reduced free time in my evenings. But it's only a matter of time. Hopefully the executable I shared in the other comment works for you temporarily? |
Sounds good. No worries about the timing. Good luck with your move! I have not started building Codd yet. The environment where I use it is a Reflex FRP project that is still using GHC 8.6.5 on an old By the way, I did an experiment with PostgreSQL upgrades and wrote in my blog about it: Codd Experiment 2: PostgreSQL Upgrade. It might be worthwhile to add functionality to Codd to help verify such upgrades, but it is also possible to verify them manually. I created an issue regarding the error messages (#136). I will create issues for the usability suggestions, but there is a blog entry that I would like to write first so that I can reference it. That blog entry is also a prerequisite for a possible PostgreSQL upgrade issue, and it relates to managing multiple environments as well. EDIT: I was able to finish that prerequisite blog entry: Codd Design Ideas. |
I'm finally kind of back. Sorry it took so long, but I had more issues than I expected to fix in the new place. Since I'm still dealing with some remnants of moving, I'm still not back full time. In light of that, I think I'll worry about getting a small PR that fixes this issue in first, and work on enforcing being inside a transaction in the call signature in a second PR. Also, I'll make sure to arrange time to read your blog entry on design ideas, and thanks once again for experimenting and writing about your thoughts! |
Hi! No worries! First resolving the path issue and then ensuring that the database state is retrieved within a transaction sounds like a good plan. I had some more design ideas and documented them in a Codd Design Ideas (Part 2) blog entry. In summary, storing hashes of the JSON files ("snapshots") that represent the database state makes it possible to check equality with any recorded state. It is then possible to not commit snapshots ( I had some time last week to start a prototype of these ideas, and I hope to find time to work on it again sometime soon. It is based on snapshots of database state but is otherwise very different from how Codd currently works. You are of course welcome to use any ideas that you want, but I totally understand if you do not want to make such huge design changes. Perhaps implementing in a separate tool is more appropriate. I worked on the snapshot queries a bit, and I now much better appreciate the work that you have put into those. Nice job! I have mentioned elsewhere that I would like to just manage the database schema and not include roles/permissions. I was thinking that it might be better to check database settings, roles, and permissions separately. I have since realized that ownership complicates this idea, and I am still thinking about this. One thing that I do not yet fully understand is inclusion of grantors. It seems to me that the role used to grant permissions is not significant. For example, perhaps development servers use the default superuser |
Good to hear that. I have such a PR ready at #134. I've been working on codd by myself since its inception, but I thought I should ask if you would like to review PRs. Please feel free to say you don't want to, don't have time, or anything (or for no reason at all). In case your answer is "No" to this PR, also feel free if there are specific PRs you would like to review in the future too.
I think we are currently discussing to a greater extent your ideas in another issue, so I won't comment about them here. In short I find them very very interesting and hope we can find some way to fit them into codd, but I think both the current and your proposed workflows have pros and cons (unless I misunderstand yours), so I'd like to think we can keep both without making a mess of codd. Like I said in the other issue, we may want to open an issue to discuss your model and how to fit it into codd.
Thanks :)
There are some DDL statements which can make a difference if you ignore grantors. One example I can think of is if you run: set role grantor_1; revoke select on table employee from some_user;
set role grantor_2; revoke select on table employee from some_user;
set role some_user; select * from employee;
So codd needs to differentiate grantors to make sure the schemas differ in these two cases, or else it wouldn't hold its promise that if you have the same schema, the same query and the same data, you get the same results.
In the shop I work at we decided to recreate the users that exist in our cloud postgres offering locally. They're not exactly the same since postgres cloud offerings are slightly modified from vanilla postgres, but we've had good success with that model. |
Looks good! I am happy to review when I have time, but I cannot guarantee prompt availability. One thing that I have been thinking about is use of the Codd library from other applications. For example, the application that I am working on has a development mode that automatically performs migrations (to a Gargoyle-managed local database) on start. Many assumptions can be made when the library is just used by the project executable ( IMHO, exception management is one of the most tricky things about Haskell, primarily because exceptions are not managed using types. The current implementation of
Some people avoid default names for administrative users, just to provide one more hurdle for attackers. When password authentication is used, attackers may attempt to brute-force the password for the (I am careful about usernames in publicly-accessible services, of course, and I always make sure to set |
Thanks! And no worries with the availability. I'll keep the PR open waiting for your review, but don't hesitate to let me know if you change your mind, and don't feel pressured to be thorough if you want to review; PRs would've been merged without any review before your arrival anyway.
We currently use codd as a library in the shop I work at, but I'm embarrassed to admit I never really put much effort into making the codebase reasonable for that use-case. I'm not sure what you have in mind here. Can you elaborate a bit more? Feel free to create separate issues for this stuff too.
I think it might be a bit more complicated than that to make this safe against exceptions, since async exceptions make things more difficult and postgresql-simple does not handle them very well just yet. See haskellari/postgresql-simple#69. I'm also the author of a PR trying to improve that (and there's a lot of conversation in that PR from people more experienced than me suggesting this is hard), but I think even if that PR were merged, we'd still have to be very careful when handling exceptions here. If e.g. the code throws a Granted, I'm guilty of writing queries in exception handlers carelessly in a few places in codd, and even postgresql-simple is, but maybe this is something to be considered with more care before we implement it?
Seems sensible. I think that can be replicated in development environments too, though. Still, I don't want to shield myself from what setups out there in the real world might do, so it sounds more and more like some custom post-processing of codd's json schema should be allowed, or custom schema comparison knobs like some you've suggested before. |
I have not tried to integrate Codd yet, so I am afraid that I do not yet have anything concrete. Generally, I think it would be nice to have a library API that allows core tasks to be integrated into other software. The project that I am working on currently uses a different migration system, and it automatically performs migrations on server start when in development mode. I currently have a proof-of-concept that uses Codd, and it simply runs a At a previous job, I have worked on "scripted" multi-step migrations. I have no plans to do that again any time soon, but it is another example of when the Codd library could be helpful. A Haskell program could coordinate migrations, software upgrades, tests, error monitoring, rollbacks, notifications, etc.
Thank you very much for the links and explanation! This is very informative, and I now agree that your current implementation (without Thank you! |
Codd seems to have issues with schema specifications in SQL depending on the search path. For example, consider the following:
This code specifies the
public
schema, but doing so is not required when that schema is at the front of the search path. Indeed, the default search path is"$user", public
and specifyingpublic
is not required (when a schema named after the current user does not exist). Note that some people customize the search path to include their schema, so this is not just an issue withpublic
.PostgreSQL dump files created with
pg_dump
set the search path to the empty string in order to avoid potential issues with an unknown (customized) search path, and they consistently specify the schema. Codd can add a dump file without issue, but verification then results in discrepancies due to inconsistent schema specifications when a schema that is in the search path is used.As an example, please see my debugging log below.
(1) Create the test schema
Run a test database:
Create a test project:
Configure settings in
config/codd-dev.sh
:Load settings in the current shell:
Create a migration in
0001-init-codd.sql
that allows Codd to initialize a database without having to create it (reference):Add the migration, initializing the database:
Create a migration in
0002-init-schema.sql
that initializes the schema:Note that specifying
public
here does not cause any problems.Add the migration:
Verify the schema:
Dump the schema to
dump.sql
:(2) Start over using the dumped schema
Stop the database and reset the project schema/migrations:
Start a new (empty) database:
Add the first migration, initializing the database:
Add the dumped schema:
Verifying the schema now results in an error:
Query the expected
email_employee__id_fkey
state:The difference is the schema specification:
Query the expected
employee.id
state:The difference is the schema specification:
(3) Start over using a dumped schema that is patched
Stop the database and reset the project schema/migrations:
The dump file includes the following settings:
This is the setting that clears the search path (on line 13):
Create a patched dump with this setting commented out:
Start a new (empty) database:
Add the first migration, initializing the database:
Add the patched dump:
Verify the schema:
The text was updated successfully, but these errors were encountered: