All you need is higher kinded types

{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE LiberalTypeSynonyms #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE QuantifiedConstraints #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE UndecidableSuperClasses #-}
{-# OPTIONS_GHC -fprint-explicit-foralls -fprint-explicit-kinds #-}
{-# OPTIONS_GHC -fno-show-valid-hole-fits -fdefer-type-errors #-}
{-# OPTIONS_GHC -Wall -Wcompat -Wredundant-constraints #-}

import Data.Kind (Type)
import Data.Void (Void)
import Data.Functor.Identity (Identity (Identity))
import Unsafe.Coerce (unsafeCoerce)

This is a literate Haskell-file. You can open it in GHCi by doing the following:

curl -O https://las.rs/blog/all-you-need-is-hkt-s.lhs
nix develop --impure --expr \
  'with import <nixpkgs> {}; mkShell {
    buildInputs =
      [(haskell.packages.ghc924.ghcWithPackages
        (p: [ p.markdown-unlit ])
      )];
  }' \
  -c ghci -pgmL markdown-unlit all-you-need-is-hkt-s.lhs

Haskellers (and even worse, dependent type brained people) are prone to making complicated types to maximise abstraction and correctness. However, you might be surprised to know that you (technically) need nothing more than ADTs, HKTs, and rank-N types to do almost all of the magic you need.

Core theorem: (Roughly) any valid Haskell term (or of a similar language) typed with type families and GADTs, can be reformulated as a semantically equivalent term that can be typed without type families and GADTs.

What do I mean with semantically equivalent? When erasing types, the new term has the same structure, modulo technically unnecessary wrappings using constructors that ought to be newtype constructors. (You could in fact aleviate this by adding more language features, but that would ruin the point, unless there is a way of doing it in a minimal way.)

Type families → Data families

Type families in Haskell represent uncurriable functions that transform one type-level term into another type-level term.

type family Not b where
  Not 'True = 'False
  Not 'False = 'True

It is in fact possible to transform the above into a data family, and in fact turn any type family into a data family (or something equivalent to avoid the limitation wrt. overlapping data family instances, see end of this post):

type NotD :: Bool -> (Bool -> Type) -> Type
data family NotD b k
newtype instance NotD 'True k = NotDTrue (k 'False)
newtype instance NotD 'False k = NotDFalse (k 'True)

How can this be true? The intuitive and informal proof is that, at the type-level terms of data kinds, though not types themselves, will always be used to construct a type.

We lose no power (modulo GHC bugs and Haskell misfeatures) compared to the type family version. In addition, since the data family uses newtype instances, the term can be kept representationally equivalent.

data Something (b :: Bool) where
  SomethingInt :: Int -> Something 'True
  SomethingDouble :: Double -> Something 'False

flipSomething :: Something b -> Something (Not b)
flipSomething (SomethingInt n) = SomethingDouble $ fromIntegral n
flipSomething (SomethingDouble n) = SomethingInt $ truncate n

flipSomethingD :: Something b -> NotD b Something
flipSomethingD (SomethingInt n) = NotDTrue $ SomethingDouble $ fromIntegral n
flipSomethingD (SomethingDouble n) = NotDFalse $ SomethingInt $ truncate n

flipSomethingD' :: Something b -> Something (Not b)
flipSomethingD' = unsafeCoerce flipSomethingD

You can try running the above function and confirm it works. Representionally, it should be equivalent to flipSomething modulo GHC optimisations that might be inhibitted due to the use of unsafeCoerce.

Data families → GADTs

-- :k NotD
type NotD :: Bool -> (Bool -> Type) -> Type
-- :t NotDTrue
pattern NotDTrue :: forall (k :: Bool -> Type). k 'False -> NotD 'True k
-- :t NotDFalse
pattern NotDFalse :: forall (k :: Bool -> Type). k 'True -> NotD 'False k

In fact, we can formulate the exact same thing, except unfortunately it turns into a data type, hence unnecessary constructor tags and wrappings are added in the representation:

type NotG :: Bool -> (Bool -> Type) -> Type
data NotG b k where
  NotGTrue :: k 'False -> NotG 'True k
  NotGFalse :: k 'True -> NotG 'False k

flipSomethingG :: Something b -> NotG b Something
flipSomethingG (SomethingInt n) = NotGTrue $ SomethingDouble $ fromIntegral n
flipSomethingG (SomethingDouble n) = NotGFalse $ SomethingInt $ truncate n

You can not do the unsafeCoerce trick on this to turn it back into the original flipSomething, given that NotG is (unfortunately) not a newtype.

One thing to note is that we gain slightly more power by turning it into a GADT, notably, we can match on NotG b k and thus find out what b is.

GADTs → ADTs

type NotE :: Bool -> (Bool -> Type) -> Type
data NotE b k
  = b ~ 'True => NotETrue (k 'False)
  | b ~ 'False => NotEFalse (k 'True)

flipSomethingE :: Something b -> NotE b Something
flipSomethingE (SomethingInt n) = NotETrue $ SomethingDouble $ fromIntegral n
flipSomethingE (SomethingDouble n) = NotEFalse $ SomethingInt $ truncate n

However, equalities in constructor constraints are also somewhat exotic. We can remove them by instead using Data.Type.Equality.:~:, but that is in turn defined using a GADT, which we want to avoid. We turn to Leibniz equality:

type (:~:) :: forall k. k -> k -> Type
newtype (:~:) a b = EqPrf (forall p. p a -> p b)

refl :: a :~: a
refl = EqPrf id

absurd :: 'False :~: 'True -> a
absurd _ = error "absurd"

type NotEL :: Bool -> (Bool -> Type) -> Type
data NotEL b k
  = NotELTrue (b :~: 'True) (k 'False)
  | NotELFalse (b :~: 'False) (k 'True)

flipSomethingEL :: Something b -> NotEL b Something
flipSomethingEL (SomethingInt n) =
  NotELTrue refl $ SomethingDouble $ fromIntegral n
flipSomethingEL (SomethingDouble n) =
  NotELFalse refl $ SomethingInt $ truncate n

ex :: Int
ex = case flipSomethingEL (SomethingDouble 4.2) of
  NotELFalse _ (SomethingInt n) -> n
  NotELTrue prf _ -> absurd prf

Peano arithmetic example

data Nat = Zero | Succ Nat

data Vec (n :: Nat) (a :: Type)
  = VNil (n :~: 'Zero)
  | forall n'. VCons ('Succ n' :~: n) a (Vec n' a)

data Plus (x :: Nat) (y :: Nat) (k :: Nat -> Type)
  = PlusZero (x :~: 'Zero) (k y)
  | forall x'. PlusSucc ('Succ x' :~: x) (Plus x' ('Succ y) k)

newtype Flip (f :: a -> b -> Type) (y :: b) (x :: a)
  = Flip (f x y)

twoBools :: Plus ('Succ 'Zero) ('Succ 'Zero) (Flip Vec Bool)
twoBools = PlusSucc refl $
  PlusZero refl $ Flip $ VCons refl True $ VCons refl True $ VNil refl

Compared to the original approach where you’d simply pass in VCons True $ VCons True $ VNil, what we’re doing here is more akin to passing in an execution trace, i.e. the execution of the “type family” is included in the program, and the type checker checks it.

Doing existential quantification through universal quantification

Existential quantification can be expressed as universal quantification instead.

newtype Some f = Some { runSome :: forall r. (forall a. f a -> r) -> r }

some :: f a -> Some f
some x = Some $ \r -> r x

data VUCons' n a n' = VUCons' ('Succ n' :~: n) a (VecU n' a)

data VecU (n :: Nat) (a :: Type)
  = VUNil (n :~: 'Zero)
  | VUCons (Some (VUCons' n a))

data PlusUSucc' x y k x' = PlusUSucc' ('Succ x' :~: x) (PlusU x' ('Succ y) k)

data PlusU (x :: Nat) (y :: Nat) (k :: Nat -> Type)
  = PlusUZero (x :~: 'Zero) (k y)
  | PlusUSucc (Some (PlusUSucc' x y k))

twoBoolsU :: PlusU ('Succ 'Zero) ('Succ 'Zero) (Flip VecU Bool)
twoBoolsU = PlusUSucc $ some $ PlusUSucc' refl $
  PlusUZero refl $ Flip $ VUCons $ some $ VUCons' refl True $
  VUCons $ some $ VUCons' refl True $ VUNil refl

Can we go further?

The answer is yes. I assume you are familiar with the SKI calculus, if not, it essentially works the following way:

s :: (a -> b -> c) -> (a -> b) -> a -> c
s = (<*>)
k :: a -> b -> a
k = pure
i :: a -> a
i = id

Using these three primitives, we can write any LC term (with the same restrictions wrt. the type system used).

How can we leverage this here? We can use the CPS trick above to define newtypes for the above constructs. You could also define the AST as an ADT and then have the interpreter be another ADT (using the Leibniz or GADT trick), but this seems simpler:

type CPS a = (a -> Type) -> Type

infixr 0 ~>
type a ~> b = a -> CPS b

type UnCPS' :: a -> (b -> Type) -> (a ~> b) -> Type
newtype UnCPS' x k f = UnCPS' (f x k)

type UnCPS :: CPS (a ~> b) -> a ~> b
newtype UnCPS x y k = UnCPS (x (UnCPS' y k))

type S2' :: CPS (b ~> c) -> (c -> Type) -> b -> Type
newtype S2' f k x = S2' (UnCPS f x k)

type S2 :: (a ~> b ~> c) -> (a ~> b) -> a ~> c
newtype S2 f g x k = S2 (g x (S2' (f x) k))

type S1 :: (a ~> b ~> c) -> (a ~> b) ~> a ~> c
newtype S1 f g k = S1 (k (S2 f g))

type S :: (a ~> b ~> c) ~> (a ~> b) ~> a ~> c
newtype S f k = S (k (S1 f))

type (<*>) a b = S :@ a :@ b
infixl 4 <*>

type K1 :: a -> b ~> a
newtype K1 x y k = K1 (k x)

type K :: a ~> b ~> a
newtype K x k = K (k (K1 x))

type I :: a ~> a
newtype I x k = I (k x)

infixl 9 :@
type (:@) :: (a ~> c ~> d) -> a -> c ~> d
newtype (:@) x y z k = App (UnCPS (x y) z k)

We use the same CPS trick as used for turning type families into data families, and thus are able to construct the following:

type S :: (a ~> b ~> c) ~> (a ~> b) ~> a ~> c
type K :: a ~> b ~> a
type I :: a ~> a

where a ~> b represents a type-level computation using CPS, i.e. it’s defined as a -> (b -> Type) -> Type. Using the new application operator, :@, we can now move our point~~free~~less programming to the type-level!

type FlipS :: (a ~> b ~> c) ~> b ~> a ~> c
type FlipS = S :@ (S :@ (K :@ S) :@ (S :@ (K :@ K) :@ S)) :@ (K :@ K)

type ComposeS :: (b ~> c) ~> (a ~> b) ~> a ~> c
type ComposeS = S :@ (K :@ S) :@ K

type (.) a b = ComposeS :@ a :@ b
infixr 8 .

type TwiceS :: a ~> (a ~> a ~> b) ~> b
type TwiceS = S :@ (S :@ (K :@ S) :@ (S :@ (K :@ (S :@ I)) :@ K)) :@ K

What does this gain us? Essentially, type-level lambdas without built-in language support. Quite powerful I’d say.

Can we then possibly go a step further? Could we, replace data kinds, with SKI-Scott-encoded versions?

The answer is very unfortunately “no” in Haskell, because universal quantification isn’t erased at the type-level.

-- In fact Church-encoded, but point stands.
type NatS :: Type
type NatS = forall a. a ~> (a ~> a) ~> a

type ZeroS :: NatS
type ZeroS = K

-- Fails horribly!
{-
/my/file:some:place: error:
    • Couldn't match kind ‘a0 -> CPS ((b0 ~> c0) ~> b0)’ with ‘Nat’
      Expected kind ‘Nat ~> Nat’,
        but ‘ComposeS :@ (S :@ I)’ has kind ‘(a0 ~> ((b0 ~> c0) ~> b0))
                                             -> ((a0 ~> ((b0 ~> c0) ~> c0)) -> Type) -> Type’
    • In the type ‘ComposeS :@ (S :@ I)’
      In the type declaration for ‘SuccS’
   |
79 | type SuccS = ComposeS :@ (S :@ I)

-}
-- type SuccS :: NatS ~> NatS
-- type SuccS = ComposeS :@ (S :@ I)

type SI :: forall b. ((b ~> b) ~> b) ~> (b ~> b) ~> b
type SI = S :@ I

{-
/my/file:some:place: error:
    • Couldn't match kind: a0 -> CPS ((b0 ~> b0) ~> b0)
                     with: forall a. a ~> ((a ~> a) ~> a)
      Expected kind ‘(forall a. a ~> ((a ~> a) ~> a))
                     ~> (forall b. b ~> ((b ~> b) ~> b))’,
        but ‘ComposeS :@ SI’ has kind ‘(a0 ~> ((b0 ~> b0) ~> b0))
                                       -> ((a0 ~> ((b0 ~> b0) ~> b0)) -> Type) -> Type’
    • In the type ‘ComposeS :@ SI’
      In the type declaration for ‘SuccS’
   |
78 | type SuccS = ComposeS :@ SI
   |              ^^^^^^^^^^^^^^

-}
type SuccS :: (forall a. a ~> (a ~> a) ~> a) ~> (forall b. b ~> (b ~> b) ~> b)
type SuccS = ComposeS :@ SI

a0 ~> (b0 ~> b0) ~> b0
~
forall a. a ~> (a ~> a) ~> a

Both typings would be valid for the above type-level term, if the forall had been erased. Because it isn’t, and is more akin to an implicit function, GHC can’t do anything, since we don’t have real type-level lambdas, and the point is to avoid type-level lambdas.

Though we have been defeated here, we can still have some fun, albeit I’ve gone back to using real GADTs to help with type (term?) inference:

data VecImpl (n :: Nat) (a :: Type) where
  VImplNil :: VecImpl 'Zero a
  VImplCons :: a -> VecImpl n a -> VecImpl ('Succ n) a

-- Representation is equal to lists so we steal the instance
instance Show a => Show (VecImpl n a) where
  showsPrec = unsafeCoerce $ showsPrec @[a]
  show = unsafeCoerce $ show @[a]
  showList = unsafeCoerce $ showList @[a]

type VecS1 :: Nat -> Type ~> Type
newtype VecS1 (n :: Nat) (a :: Type) (k :: Type -> Type)
  = VecS1 (k (VecImpl n a))

type VecS :: Nat ~> Type ~> Type
newtype VecS (n :: Nat) (k :: (Type ~> Type) -> Type)
  = VecS (k (VecS1 n))

type PlusImpl :: Nat -> Nat ~> Nat
data PlusImpl (x :: Nat) (y :: Nat) (k :: Nat -> Type) where
  PlusImplZero :: (k y) -> PlusImpl 'Zero y k
  PlusImplSucc :: PlusImpl x ('Succ y) k -> PlusImpl ('Succ x) y k

type PlusS :: Nat ~> Nat ~> Nat
newtype PlusS x k = PlusS (k (PlusImpl x))

-- We use SKI stuff to flip stuff around rather than simply using Flip as before
twoBoolsS ::
  (FlipS :@ VecS :@ Bool . PlusS :@ 'Succ 'Zero)
    ('Succ 'Zero)
    Identity

twoBoolsS is what you expect it to be, however, how do we define it? The answer is that we ask GHC to infer it for us. Insert a typed hole, i.e. twoBoolsS = _, and let GHC guide you forward. Because only one constructor is valid, GHC will essentially infer the term for you. When you reach the case of PlusImpl, you need to try both, and one of them will make GHC err because it’s invalid. What does the end result look like?

twoBoolsS = App
  $ UnCPS $ App $ UnCPS $ App $ UnCPS $ App $ UnCPS $ S $ UnCPS' $ S1
  $ UnCPS' $ S2 $ K $ S2' $ UnCPS $ App $ UnCPS $ K $ UnCPS' $ K1 $ UnCPS'
  $ S $ UnCPS' $ S1 $ UnCPS' $ S2 $ App $ UnCPS $ PlusS $ UnCPS' $ PlusImplSucc
  $ PlusImplZero $ S2' $ UnCPS $ K1 $ UnCPS' $ App $ UnCPS $ App $ UnCPS
  $ App $ UnCPS $ App $ UnCPS $ S $ UnCPS' $ S1 $ UnCPS' $ S2 $ App $ UnCPS
  $ K $ UnCPS' $ K1 $ S2' $ UnCPS $ App $ UnCPS $ App $ UnCPS $ S $ UnCPS'
  $ S1 $ UnCPS' $ S2 $ App $ UnCPS $ App $ UnCPS $ S $ UnCPS' $ S1 $ UnCPS'
  $ S2 $ S $ S2' $ UnCPS $ App $ UnCPS $ K $ UnCPS' $ K1 $ UnCPS' $ K $ S2'
  $ UnCPS $ App $ UnCPS $ K $ UnCPS' $ K1 $ UnCPS' $ S $ UnCPS' $ S1 $ UnCPS'
  $ S2 $ K $ S2' $ UnCPS $ K1 $ UnCPS' $ S1 $ UnCPS' $ S2 $ K1 $ S2' $ UnCPS
  $ VecS $ UnCPS' $ VecS1 $ Identity $ VImplCons True $ VImplCons False $ VImplNil

No one said this would be pr*tty. It took me a painstaking amount of time to do the above with GHCi. I assume it would be faster if you used HLS. You could theoretically also completely automate it upto the uninferrable parts, i.e. the two booleans you choose (True and False in my case).

Conclusion

Maybe the above wasn’t a good idea after all. Maybe there was a reason they added type families (surprisingly). I would (unironically) however recommend using data families instead wherever possible as they are much easier to infer types for (given that they are normal data types mostly).

I’m especially annoyed by the fact that many people create associated type families that end in -> Type, yet don’t use a data family. You have absolutely no good reason most of the time!

There is one notable real-world use case however: Given any complex data type, you can apply the above transformations to turn it into something that you can derive Generic for, if you also turn any instance of universal quantification into explicit uses of a Forall type.

You can also handle recursive types by turning recursion into using explicit fixpoints using Fix.

Aside: Handling type family overlapping instances (for matching types)

type family IsUnit (x :: Type) :: Bool where
  IsUnit () = 'True
  IsUnit _ = 'False

data family IsUnitD (a :: Type) (k :: Bool -> Type)
newtype instance IsUnitD () k = IsUnitDY (k 'True)
newtype instance IsUnitD _ k = IsUnitDN (k 'False)

newtype IsUnitD (a :: Type) (k :: Bool -> Type) = IsUnitD (k (IsUnit a))

If however you want to see how to replicate this in languages with fewer features, you could naively do:

data IsUnitG (a :: Type) (k :: Bool -> Type) where
  IsUnitGY :: (k 'True) -> IsUnitG () k
  IsUnitGN :: (k 'False) -> IsUnitG a k

The issue is this is wrong! In the case of (), both constructors are valid. We need an inequality proof:

data IsUnitE (a :: Type) (k :: Bool -> Type)
  = IsUnitEY (a :~: ()) (k 'True)
  | IsUnitEN (a :~: () -> Void) (k 'False)

data SBool (b :: Bool) where
  STrue :: SBool 'True
  SFalse :: SBool 'False

handleIsUnitEY :: IsUnitE () SBool -> SBool 'True
handleIsUnitEY (IsUnitEY _ b) = b
handleIsUnitEY (IsUnitEN p _) = case p refl of

handleIsUnitEN :: IsUnitE Int SBool -> SBool 'False
handleIsUnitEN (IsUnitEY _ _) = error "impossible"
handleIsUnitEN (IsUnitEN _ b) = b

This is one aspect where using explicit equalities is more powerful than using data families and GADTs. If only the above type could be a newtype!

However, there is an issue: It is not clear to me how you would construct a :~: b -> Void for two different types a and b (i.e., the heads when in WHNF differ). Take for example, Int :~: (), which we get from the IsUnitEY case in handleIsUnitEN: Can we somehow reduce this to False :~: True to make use of the “trusted” function absurd? Naive solutions end up with an open branch with Int :~: () in scope again, thus resulting in a cycle, unless you use type families etc., which ruins the point.