ActiveRecord Truncation does not work when using multiple schemas #6

abhchand · 2017-12-29T03:48:41Z

Hello - thanks for putting together database_cleaner! I've been using the gem for a while and I know maintenance takes effort, so appreciate those who contribute to this project.

I'm using the following setup -

Rails 5.1.0
ActiveRecord 3.0.0
Postgres 9.6

Like many multi-tenant applications, my app utilizes postgres schemas which store parrallel, independent copies of my database tables.

So if I have two tenants (earth, mars) and two tables (users, companies) I would have the following tables in my DB -

schema_migrations
ar_internal_metadata
public.users
public.companies
earth.users
earth.companies
mars.users
mars.companies

When using DatabaseCleaner.strategy = :truncation, it looks for a list of tables in my DB to truncate (see here).

When I run that query against my local test database -

SELECT schemaname || '.' || tablename
FROM pg_tables
WHERE
  tablename !~ '_prt_' AND
  tablename <> 'schema_migrations' AND tablename <> 'ar_internal_metadata'
  AND schemaname = ANY (current_schemas(false))
;
                 ?column?
-------------------------------------------
 public.users
 public.companies
(2 rows)

As you can see, it only returns the public schemas, because of the schemaname = criteria. This effectively doesn't wipe my non-public schema tables, causing run-over between tests.

I understand that there are options to use a transaction instead, or even the only: [..] parameter to list all my tables. But it seems like this should work as is without having to use one of those as a workaround (and if my DB is quite large, a whitelist can get messy and inefficient across thousands of tests).

Would it make sense to update the query logic to something like

SELECT schemaname || '.' || tablename
FROM pg_tables
WHERE
  tablename !~ '_prt_' AND
  tablename <> 'schema_migrations' AND tablename <> 'ar_internal_metadata'
  AND schemaname NOT IN ('pg_catalog', 'information_schema')
;

Thanks!

Related to: https://github.com/DatabaseCleaner/database_cleaner/issues/225

The text was updated successfully, but these errors were encountered:

fcheung · 2018-01-08T13:04:35Z

I've come across a similar thing. For now doing

  def with_all_schemas
    old = ActiveRecord::Base.connection.schema_search_path
    begin
      ActiveRecord::Base.connection.schema_search_path = [
        your_list_of_schemas_here
      ].join(',')
      yield
    ensure
      ActiveRecord::Base.connection.schema_search_path = old
    end
  end

and then wrapping calls to Databasecleaner.clean with with_all_schemas seems to do the trick

deathwish · 2019-09-30T22:37:51Z

Honestly, this is pretty awful.

But the solution above works and can be improved by using

select schema_name from information_schema.schemata where schema_name not like 'information_schema' and schema_name not like 'pg_%'

to fetch all schemas in the active database for cleaning, which is the behavior I expected database_cleaner to exhibit.

alejandro-falkowski-gelato · 2019-12-18T11:51:00Z

For me as I am using sequel was to use search_path as described here https://github.com/jeremyevans/sequel/blob/master/doc/opening_databases.rdoc#postgres

emad-elsaid · 2020-04-16T16:25:49Z

My workaround was to just truncate the tables I need myself after each test example:

  config.after(:each) do
    DatabaseCleaner.clean
    ['schema.tablename'].each do |table|
      ActiveRecord::Base.connection.execute("TRUNCATE #{table}")
    end
  end

botandrose · 2020-04-16T16:47:58Z

Hi, so I'd love to help get this problem solved, but honestly I'm not familiar with with the concept of multiple schemas in PostgreSQL, so I'm going to need some help understanding. Can someone explain why an app might have more than one schema?

Also, about the current clause that is causing the issue: schemaname = ANY (current_schemas(false)):
Looking in the git history, the commit that added this line has the following to say:

commit 77bca129d6031571b4da1edf63b9f5ae93fc6ee2
Author: Billy Watson <[email protected]>
Date:   Tue Jul 22 12:31:03 2014 -0400

    always return Postgres table names with schema to avoid awfulness
    
    - without the schema, the migrations table gets truncated since the default AR method returns ‘public.schema_migrations’
    - also a possibility is truncation of tables other than desired once since without the schema, Postgres has no way of knowing which table to truncate if two schemas have the same table

Can someone unpack this for me?

Finally, in the OP @abhchand has suggested the possible solution of replacing the line with schemaname NOT IN ('pg_catalog', 'information_schema'). Can I get some opinions on this suggestion? Would it address the problem exhibited here, while still addressing the concerns that Billy Watson was trying to solve in his commit?

Adamantish · 2020-04-16T17:31:44Z

Hiya @botandrose . The reason we're using multiple schemata is to do close integration with a proprietary 3rd party application. There are some good reasons not to do it this way (it's a microservices sin and I feel a little bit ill about it) but amongst the advantages we find are...

We get to have multiple services doing a lot of otherwise complex work with remote data but with the code simplicity as if it were local.
We get a lot of the database efficiency being able to do SQL joins between the public data we fully own and the data we don't.
A lot of rails stuff by default is just unaware of the other schemata. Which is good and the right thing in most cases.

I think a general solution which would ensure truncate strategy works might be to count records in all tables everywhere at the start of the whole test suite and truncate all that had zero records.

erlinis · 2020-04-16T20:08:15Z

There are many reasons why some apps use several schemas, but for sure it is a thing.

What Billy Watson was trying to solve is totally fine, but there is one piece in his solution
that is not working as as he is expecting (current_schemas(false), even when the database has multiple schemas current_schemas(false) is only returning {public} because it depends what is set in the search_path.

select current_schemas(false);

 current_schemas
-----------------
 {public}

While if you execute the query suggested by @deathwish you will indeed get all your schemas (excluding the Postgres' ones of course)

 SELECT schema_name FROM information_schema.schemata WHERE schema_name not like 'information_schema' AND schema_name not like 'pg_%';
 
 schema_name
-------------
 public
 archive

So, combining the query above with the current code, it will solve the issue while still addressing the concern that Billy Watson was trying to solve since it now lists the tables in all schemas and still appending the schema name.

SELECT schemaname || '.' || tablename as table_name
  FROM pg_tables
 WHERE tablename !~ '_prt_'  
   AND tablename <> 'schema_migrations' AND tablename <> 'ar_internal_metadata' 
   AND schemaname = ANY (
     	SELECT schema_name 
         FROM information_schema.schemata 
        WHERE schema_name not like 'information_schema' 
          AND schema_name not like 'pg_%'
       );


                table_name
-----------------------------------------
 public.users
 public.questions
 public.responses
 public.versions
 archive.logs
 archive.versions

erlinis · 2020-04-16T20:33:38Z

Or even simpler as @alejandro-falkowski-gelato suggested, setting up the search_path in the database configuration will in fact lists all the desired schemas when running current_schemas(false)

test:
  adapter: postgresql
  host: localhost
  database: db_test
  ...
  schema_search_path: 'schema1,public'

So it will worth at least mention in the docs that you should set the search_path if you expect DatabaseCleaner also truncate the tables in all schemas rather than only public

botandrose · 2020-04-16T20:35:53Z

Thank you, @Adamantish, for the description of your use-case... I think I have a good handle on this now! And thank you, @erlinis, for your superb detective work and proposed solution, which appears to me to be solid.

Is there any reason why one might not want to clean all tables outside of the public schema? Let's say we were to fold this change into a hypothetical v1.8.5 bugfix release of database_cleaner. Would this, in fact, be a breaking change for some users? In that case, would it be better to release it in the form of a new option to the truncation strategy, like all_schemas: true etc?

erlinis · 2020-04-16T21:00:29Z

Hi @botandrose,
A new option to the truncation strategy sounds great, but instead of all_schemas: true/false,
I think would be better to have something like included_schemas: [ 'public', 'schema1'] and if nothing is specified ['public'] is the default value.

This way current behavior won't be affected and the developer will have more control over what clean.

botandrose · 2020-04-16T21:23:42Z

@erlinis I like it! What do others in this thread think? Are you game for taking a stab at it in a PR?

deathwish · 2020-04-16T21:56:01Z

Like the original submitter's per-tenant schema use case, I'd definitely prefer to be able to clean all schemata rather than saying which ones should be cleaned. That said, I agree that being able to specify which schemata are cleaned is also important.

We are in dynamic-language land here, so why not both? Something like schemata: :all is not incompatible with schemata: ['public', 'schema1'], and allows the later addition of additional functionality like schemata: {except: 'schema2'}.

The only other note I have immediately is that this should interact correctly with the cache: true|false option as documented, which also implies that it should be possible to have a non-fixed list of schemata to be cleaned.

botandrose · 2020-04-16T22:40:03Z

@deathwish All great points! Like you, I think :except can be punted on into its own subsequent PR, and agree that this functionality should provide an :all option, for those who don't want to enumerate every schema. Also agree on respecting the :cache option. Thank you for pointing these things out!

jherdman · 2022-04-08T15:25:48Z

Is this issue still relevant with the latest version of this gem?

timscott · 2022-10-17T20:04:38Z

@jherdman - I am on the latest version (2.0.1) and can confirm it's still an issue. @botandrose, anything in the works?

timscott · 2022-10-17T20:37:13Z

This line seems to be the problem.

@only = all_tables.map { |table| table.split(".").last }

For me all_tables contains the following two entries: public.items and events.items. So database cleaner will execute DELETE FROM items twice.

Is there a reason to remove the schema? Seems like a simple bug fix, change this:

all_tables = cache_tables? ? connection.database_cleaner_table_cache : connection.database_tables
@only = all_tables.map { |table| table.split(".").last }

To this:

@only = cache_tables? ? connection.database_cleaner_table_cache : connection.database_tables

kanmeiban · 2024-04-05T02:19:07Z

@timscott A few lines below is

@except += connection.database_cleaner_view_cache + migration_storage_names

which should also be turned into FQN for your suggestion to work.

botandrose transferred this issue from DatabaseCleaner/database_cleaner Feb 18, 2020

Splines mentioned this issue May 7, 2024

Add unit tests back to pipeline & improve VSCode integration MaMpf-HD/mampf#581

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ActiveRecord Truncation does not work when using multiple schemas #6

ActiveRecord Truncation does not work when using multiple schemas #6

abhchand commented Dec 29, 2017

fcheung commented Jan 8, 2018

deathwish commented Sep 30, 2019

alejandro-falkowski-gelato commented Dec 18, 2019

emad-elsaid commented Apr 16, 2020

botandrose commented Apr 16, 2020

Adamantish commented Apr 16, 2020 •

edited

Loading

erlinis commented Apr 16, 2020

erlinis commented Apr 16, 2020

botandrose commented Apr 16, 2020

erlinis commented Apr 16, 2020

botandrose commented Apr 16, 2020

deathwish commented Apr 16, 2020

botandrose commented Apr 16, 2020

jherdman commented Apr 8, 2022

timscott commented Oct 17, 2022

timscott commented Oct 17, 2022

kanmeiban commented Apr 5, 2024 •

edited

Loading

ActiveRecord Truncation does not work when using multiple schemas #6

ActiveRecord Truncation does not work when using multiple schemas #6

Comments

abhchand commented Dec 29, 2017

fcheung commented Jan 8, 2018

deathwish commented Sep 30, 2019

alejandro-falkowski-gelato commented Dec 18, 2019

emad-elsaid commented Apr 16, 2020

botandrose commented Apr 16, 2020

Adamantish commented Apr 16, 2020 • edited Loading

erlinis commented Apr 16, 2020

erlinis commented Apr 16, 2020

botandrose commented Apr 16, 2020

erlinis commented Apr 16, 2020

botandrose commented Apr 16, 2020

deathwish commented Apr 16, 2020

botandrose commented Apr 16, 2020

jherdman commented Apr 8, 2022

timscott commented Oct 17, 2022

timscott commented Oct 17, 2022

kanmeiban commented Apr 5, 2024 • edited Loading

Adamantish commented Apr 16, 2020 •

edited

Loading

kanmeiban commented Apr 5, 2024 •

edited

Loading