Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's the status of the TF/HKT approach to avoiding excess type variables? #590

Open
tysonzero opened this issue May 6, 2024 · 15 comments

Comments

@tysonzero
Copy link

I have been looking into alternative database libraries to use, as I've solely been using persistent/esqueleto for a while now. I like a lot of the Opaleye design but I have to say that the a b c d e f ... type variables are quite an eyesore, both directly and in the way that it seems like they will affect lots of the surrounding code and error messages and so on. I noticed Opaleye.TypeFamilies exists but is deprecated, and then there is rel8 which builds on opaleye and uses Column f MyType where f is generally some sort of functor. I'm assuming based on the lack of mention for this stuff in the tutorial that the a b c d e f ... approach is the idiomatic way to do things, but I was wondering if there was a way to avoid that.

@tomjaguarpaw
Copy link
Owner

Hi @tysonzero, I'd be happy to help you get started with Opaleye.

I wouldn't say that "a b c d e f ... type variables style", which I call "fully polymorphic product style", is idiomatic. In fact, Opaleye is completely agnostic about how you structure your data types. As long as you can come up with the ProductProfunctors required by the various operations (the simplest method being writing a Default instance) then you can use whatever structure of data types that you want. Fully polymorphic product style is just the "lowest technology" way to achieve that (and comes at the cost of verbosity).

I initial introduced Opaleye.TypeFamilies as a way of getting better type inference, not as a way of reducing the number of type parameters. Once I discovered how to get better inference of Default instances I implemented runSelectI and deprecated runSelectTF. If you think a style like that would still be useful we could resurrect it, but first I think it's best to zoom out and look at what the requirements are.

Suppose we have a table foo with a primary key integer field, and a text field. We might represent it with a Haskell product type

data FooField =
  FooField (Field SqlInt4) (Field SqlText)

We also need to read it into Haskell values, so we also need a product type like

data FooHaskell =
  FooHaskell Int String

When inserting, the primary key is optional (it can be specified as DEFAULT), so we need a way of omitting it. So we need another product type like

data FooFieldWrite =
  FooFieldWrite (Maybe (Field SqlInt4)) (Field SqlText)

but we probably want to specify Haskell values to insert, so we also need

data FooHaskellWrite =
  FooHaskellWrite (Maybe Int) String

And that set up is absolutely fine by Opaleye. As long as you're willing to write the necessary ProductProfunctor values (or Default instances) (probably by hand) then Opaleye will be perfectly happy with this set up. But it's a bit verbose!

The simplest "low technology" way to deal with the verbosity is to use a fully polymorphic product type for each of the four cases

data Foo a b  = Foo a b

type FooField = Foo (Field SqlInt4) (Field SqlText)
type FooHaskell = Foo Int String
type FooFieldWrite = (Maybe (Field SqlInt4)) (Field SqlText)
type FooHaskellWrite = (Maybe Int) String

I wrote Data.Profunctor.Product.TH to make working with this style easier. It writes the Default instance for you as well as the "adaptor" (which would idiomatically be called pFoo in this case).

But this is not the only possibility! You could define a type family to make this job easier. That's why Opaleye.TypeFamilies did. You could then write

data Foo f = Foo (TableRecordField SqlInt4 Int Opt NN) (TableRecordField SqlText String Req NN)

that defines an optional (i.e. DEFAULTable) field of SqlInt4 that maps to Int and a required (i.e. not DEFAULTable) field of SqlText that maps to String. Then the type synonyms seem superfluous, but for completeness, they were something like

type FooField = Foo O
type FooHaskell = Foo H
type FooFieldWrite = Foo W
-- type FooHaskellWrite = ? I'm not sure there was one for this

Importantly, doing this does not require any access to Opaleye internals! It can be done purely by external clients. Again, Opaleye itself is completely agnostic. (This is one of the reasons that rel8 was able to be build on top of Opaleye.)

Of course, it's not necessarily helpful to be agnostic. Sometimes libraries should be opinionated and codify good practice.


So my question to you is what do you really want to achieve here? If you just want to reduce boilerplate then I'm sure you could start with Opaleye.TypeFamilies, play around, and come up with a better version that could be shipped with Opaleye or as a standalone library. It would be helpful to ship it with some TH that generates the Default instances and adaptor.

@tysonzero
Copy link
Author

That all makes sense, thanks so much for the thorough response!

To directly respond to your final question. I guess what I want to achieve is to write something close to what you have there:

data Foo f = Foo
    { id :: TableRecordField SqlUuid UUID Opt NN
    , name :: TableRecordField SqlText Text Req NN
    , created :: TableRecordField SqlTimestamptz UTCTime Opt NN
    , notes :: TableRecordField SqlText Text Req N
    }

... hopefully minimal boilerplate, ideally with a tutorial demonstrating it for me and other new users ...
... if it can be done without TH rel8-style that'd be neat, but TH is not the end of the world ...

fooTable :: Table (Foo W) (Foo O)
fooTable = table "foo" . pFoo $ Foo
    { id = tableField "id"
    , name = tableField "name"
    , created = tableField "created"
    , notes = tableField "notes"
    }

fooSelect :: Select (Foo O)
fooSelect = selectTable fooTable

I'm hesitant to use Opaleye.TypeFamilies due to the deprecation comment on it, but if there is a future for it that could be an option.

Is there existing material to do the above? I definitely wouldn't rule out contributing in this area in the longer term. However in the short term my time/resource constraints and my lack of familiarity with Opaleye make this unviable, so I just want to figure out the best setup I can get right now. I was leaning towards using rel8 which more or less fits with your external lib commentary since it's built on opaleye, however rel8 is a bit too segfault-y at the moment for me to use it.

@tomjaguarpaw
Copy link
Owner

OK, well you can simply do (almost) exactly that! (See code below). There was a tutorial for the type family style, deleted in ad6f7f7. You can read the version before it was deleted.

If you want to keep using this style then

  • I can promise not to delete Opaleye.TypeFamilies
  • I'll even undeprecate it if you actually like it
  • I can add a couple of boilerplate-reduction synonyms (see below)
  • I'll have a think about how we can generate the type signature on the Default instance too. (It's far too annoying at the moment.)

What I need from you, if we're going down this route:

  • Continuing prompt feedback on your experiences with this style, and my efforts to improve the ergonomics.
{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE UndecidableInstances #-}

import Opaleye (Select, SqlText, SqlUuid, SqlTimestamptz, Table,
                selectTable, tableField, table)
import Opaleye.TypeFamilies (Opt, NN, N, O, W,
                             Req, (:<*>), (:<$>))
import qualified Opaleye.TypeFamilies
import Opaleye.Internal.TypeFamilies (Arr, TC)
import Data.Text (Text)
import Prelude hiding (id)
import Data.UUID (UUID)
import Data.Time (UTCTime)
import Data.Profunctor.Product (ProductProfunctor)
import Data.Profunctor.Product.Adaptor (genericAdaptor)
import qualified Data.Profunctor.Product.Default as D
import GHC.Generics (Generic)
import Data.Kind (Type)

-- We should make this change in Opaleye, to restrict the
-- argument type of TableRecordField
type TableRecordField (f ::Arr Type (TC Type) Type) a b c d =
  Opaleye.TypeFamilies.TableRecordField f a b c d

data Foo f = Foo
    { id :: TableRecordField f UUID SqlUuid NN Opt
    , name :: TableRecordField f Text SqlText NN Req
    , created :: TableRecordField f UTCTime SqlTimestamptz NN Opt
    , notes :: TableRecordField f Text SqlText N Req
    }
  deriving Generic

-- I guess we can add this to Opaleye, for boilerplate reduction.
type AdaptorTF f =
  forall p a b.
    ProductProfunctor p
     => Foo (p :<$> a :<*> b)
     -> p (Foo a) (Foo b)

-- No TH needed for the adaptor. Generic works fine.
pFoo :: AdaptorTF Foo
pFoo = genericAdaptor

-- I guess we can add this to Opaleye, for boilerplate reduction.
type FieldTF p h s n o a b =
  D.Default
    p
    (TableRecordField a h s n o)
    (TableRecordField b h s n o)

-- TH still may be needed here to write out the context
instance ( ProductProfunctor p
         , FieldTF p UUID SqlUuid NN Opt a b
         , FieldTF p Text SqlText N Req a b
         , FieldTF p Text SqlText NN Req a b
         , FieldTF p UTCTime SqlTimestamptz NN Opt a b) =>
  D.Default p (Foo a) (Foo b)
  -- Body derived by Generic

fooTable :: Table (Foo W) (Foo O)
fooTable = table "foo" . pFoo $ Foo
    { id = tableField "id"
    , name = tableField "name"
    , created = tableField "created"
    , notes = tableField "notes"
    }

fooSelect :: Select (Foo O)
fooSelect = selectTable fooTable

@tomjaguarpaw
Copy link
Owner

In fact, to avoid the enormous context on the Default instance, it's probably not crazy to just write out all the instances you need by hand. For example

instance D.Default Unpackspec (Foo O) (Foo O)
instance D.Default FromFields (Foo O) (Foo H)

Those are the specific instances you need to write

fooSelect :: Select (Foo O)
fooSelect = selectTable fooTable

fooRunSelect :: Conn -> IO [Foo H]
fooRunSelect conn = runSelectTF conn fooSelect

@tysonzero
Copy link
Author

tysonzero commented May 7, 2024

Thanks so much for the all the info and ideas!

Hmm, the real solution here IMO is if Haskell had proper extensible anonymous records, however given the lack of them I am warming up a bit more to the fully polymorphic approach. Particularly considering you can always immediately define a type family based alias:

-- Somewhere.hs

type family Column m h f

data Haskells
type instance Column Haskells h f = h

data Fields
type instance Column Fields h f = Field f

-- Foo/Tables.hs

data Foo_ a b c = Foo
    { id :: a
    , name :: b
    , created :: c
    } deriving Show

$(makeAdaptorAndInstance "pFoo" ''Foo_)

type Foo m = Foo_
    (Column m UUID SqlUuid)
    (Column m Text SqlText)
    (Column m UTCTime Sql)
 
 type FooFields = Foo Fields
 
 type FooHaskells = Foo Haskells

I skipped differing read/write types and nullability for simplicity, but adding it to the above is pretty trivial. This makes deriving various instances on Foo quite a bit easier, and enables things like type-changing lenses and partial record updates.

I wish you could just bundle up a b c ... into a data FooTypes = { id :: Type, name :: Type ... } and go from there, but we aren't Agda, and besides at that point you're pretty much just trying to approximate a real extensible records solution.

@tysonzero
Copy link
Author

Given the above I still think the Opaleye.TypeFamilies module could still be useful for standardizing common patterns even if the TH and/or Generics stuff for using it directly on the original type may be less important (but perhaps still worth considering).

type Column :: Mode -> Type -> Type -> Nullability -> Generation -> Type
type family Column mode haskell field nullability generation

data Mode = HaskellRead | HaskellWrite | FieldRead | FieldWrite

data Nullability = NotNull | Null

data Generation = NoDefault | HasDefault | Generated

-- all the appropriate instances for all combinations of the above

This seems pretty similar to the existing machinery, just cleaned up a little bit and with the inclusion of Generated for handling postgres's generated column support. You could also define additional type families for simpler cases, similar to the ' suffixed lens variants.

On a related note readOnlyTableField actually seems fine if it's simply renamed to generatedTableField. When you really think about it postgres doesn't actually have a "read only column" in any meaningful sense (other than generated columns), it's simply an application convention/choice not to write to it, just like it could be an application choice to be allowed to write to a column but only higher values than the current value.

@tomjaguarpaw
Copy link
Owner

Thanks so much for the all the info and ideas!

You're welcome!

the real solution here IMO is if Haskell had proper extensible anonymous records

Right. To put it another way: whatever style of record type Haskell supports conveniently, Opaleye also supports conveniently. As soon as there's a convenient Haskell solution for records with possibly-absent, possibly-Maybe fields then that's an Opaleye solution too. That's what it means for Opaleye to be agnostic in this matter.

I am warming up a bit more to the fully polymorphic approach

Yes, I'd suggest just getting started with that because it's the lowest-overhead way of becoming familiar with the library.

Particularly considering you can always immediately define a type family based alias. This makes deriving various instances on Foo quite a bit easier, and enables things like type-changing lenses and partial record updates.

Indeed.

I wish you could just bundle up a b c ... into a data FooTypes = { id :: Type, name :: Type ... }

I think it there's potential in data FooField = Id | Name | ... and then type family FooType (f :: FooField) where FooType Id = Int; FooType Name = String; ... }, but I haven't worked it through yet. If it does work, that would be another Haskell approach that works fine with Opaleye.

I still think the Opaleye.TypeFamilies module could still be useful ... This seems pretty similar to the existing machinery, just cleaned up a little bit

Yes, that looks exactly like the kind of more polished API we'd need if we're going forward with this type families stuff.

On a related note readOnlyTableField actually seems fine if it's simply renamed to generatedTableField

The problem is that if you declare a field, say the primary key, readOnlyTableField, and you try to update and existing row, the primary key will be set to DEFAULT, which will change it. It will get a new default value! You want it to keep its existing value.

@tysonzero
Copy link
Author

Agreed on everything else, and yes I did dabble a little into that style of type family approach, and whilst I didn't quite get it to a nice spot, I may have just not tried hard enough.

The problem is that if you declare a field, say the primary key, readOnlyTableField, and you try to update and existing row, the primary key will be set to DEFAULT, which will change it. It will get a new default value! You want it to keep its existing value.

Yes with the rename to generatedTableField hopefully it'll be clear to users that if they use it for a non-generated column like a primary key that here be demons, but a comment explicitly saying as much could be useful. I just think it currently has the exactly correct semantics for generated columns so I figured I'd pitch for it to not be formally deprecated but just renamed.

This renamed generatedTableField also works for defining view types with the clearest semantics and least fanfare, as you define every single field to be a generatedTableField and then throw in an lmap absurd to make sure that no DEFAULT, DEFAULT, DEFAULT inserts are attempted either. Using tableField will require additional type annotations, and (required/optional)TableField do work but just feel a bit semantically misleading.

@tomjaguarpaw
Copy link
Owner

Agreed on everything else

Great

I figured I'd pitch for it to not be formally deprecated but just renamed.

Ah, fair enough. I might need a bit more explanation of what you're imagining, but there's no rush since I won't deprecate it until a new major version of Opaleye and there's not one of those on the horizon!

@tysonzero
Copy link
Author

Here is an example of generated column usage and the associated opaleye:

CREATE TABLE "email" (
    "id" uuid NOT NULL DEFAULT gen_random_uuid(),
    "localpart" text NOT NULL,
    "domain" citext NOT NULL,
    "address" text GENERATED ALWAYS AS ("localpart" || '@' || "domain") STORED,
     PRIMARY KEY ("id")
);

should have:

data Email a b c d = Email
    { id :: a
    , localpart :: b
    , domain :: c
    , address : d
    }

$(makeAdaptorAndInstance "pEmail" ''Email)

type EmailFieldWrite = Email
    (Maybe (Field SqlUuid))
    (Field SqlText)
    (Field SqlCitext)
    ()

type EmailFieldRead = Email
    (Field SqlUuid)
    (Field SqlText)
    (Field SqlCitext)
    (Field SqlText)

emailTable :: Table EmailFieldWrite EmailFieldRead
emailTable = table "email" . pEmail $ Email
    { id = optionalTableField "id"
    , localpart = requiredTableField "localpart"
    , domain = requiredTableField "domain"
    , address = generatedTableField "address"
    }

Generated columns + multi-column on update cascade foreign keys let you do a wild amount of "denormalization" with pretty much none of the usual downsides, whilst keeping the performance benefits, and in some cases actually enforcing more constraints than a more traditionally "normalized" schema.

@tysonzero
Copy link
Author

Although with the above said, being able to truly "unhook" a field when calling table could be quite useful. Either on just the write side (a true working readOnlyTableField) or even on both sides (keeping a field/type-variable open for storing additional data from follow up queries / other systems):

data Post a b c d e = Post
    { id :: a
    , title :: b
    , description :: c
    , created :: d
    , comments :: e
    }
$(makeAdaptorAndInstance "pPost" ''Post)

data Comment ...
$(makeAdaptorAndInstance "pComment" ''Comment)

type PostWrite = Post
    ()
    (Field SqlText)
    (Field SqlText)
    ()
    ()

type PostRead = Post
    (Field SqlUuid)
    (Field SqlText)
    (Field SqlText)
    (Field SqlTimestamptz)
    ()

postTable :: Table PostWrite PostRead
postTable = table "post" . pPost $ Post
    { id = readOnlyTableField "id"
    , name = requiredTableField "title"
    , description = requiredTableField "body"
    , created = readOnlyTableField "created"
    , comments = missingTableField
    }

type PostObject = Post
    UUID
    Text
    Text
    UTCTime
    [CommentObject]

I know this has gone sort of off topic, but it all more or less relates to the core idea of taking the most advantage of this fully polymorphic product approach and making people less motivated to reach for the type family approach like I initially was.

@tomjaguarpaw
Copy link
Owner

Generated columns + multi-column on update cascade foreign keys let you do a wild amount of "denormalization"

That's very impressive!

it all more or less relates to the core idea of taking the most advantage of this fully polymorphic product approach

Yes, the fully polymorphic product approach affords a lot of flexibility.

@tysonzero
Copy link
Author

Based on the above I'm not sure the original issue really needs to be addressed. Additional documentation going through these trade offs could be worthwhile though to help new people who go through the same thought process as me.

Perhaps Opaleye.TypeFamilies module could include the summary of the discussion and say something along the lines of "if you still want to use TypeFamilies after considering these tradeoffs, consider making a PR to add them here".

I don't think I'll use type families myself at all, as the decreased verbosity isn't that big of a deal and the additional flexibility is nice.

However I do think it's worth creating issues for a properly working "readOnlyTableField" and "missingTableField", and I guess even "writeOnly(Required|Optional|)TableField" to complete the diagram.

If "readOnlyTableField" is fixed then renaming the existing implementation to "generatedTableField" is not necessary.

@tomjaguarpaw
Copy link
Owner

However I do think it's worth creating issues for a properly working "readOnlyTableField" and "missingTableField", and I guess even "writeOnly(Required|Optional|)TableField" to complete the diagram.

Could you file a separate issue about this, describing all the combinations that are useful to support, in more detail? I still don't fully understand (although what you've explained so far convinces me there's potential in this area).

@tysonzero
Copy link
Author

Created the other issue! Ok so this issue in my mind is now just:

"Document reasons for taking fully polymorphic approach over type families approach in Opaleye.TypeFamilies, with note that contributions are welcome for those that disagree and would like a cleaned up and more automated type family approach"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants