Subset algebraic data type, or type-level set, in Haskell

Question

556 Views Asked by mcmayer At 29 July 2025 at 06:10

Suppose you have a large number of types and a large number of functions that each return "subsets" of these types.

Let's use a small example to make the situation more explicit. Here's a simple algebraic data type:

data T = A | B | C

and there are two functions f, g that return a T

f :: T
g :: T

For the situation at hand, assume it is important that f can only return an A or a B, and g can only return a B or a C.

I would like to encode this in the type system. Here are a few reasons/circumstances why this might be desirable:

- Give the functions f and g a more informative signature than just :: T
- Ensure that implementations of f and g cannot accidentally return a forbidden value that users of the implementation then accidentally rely on
- Allow code reuse, e.g. when helper functions are involved that only operate on subsets of type T
- Avoid boilerplate code (see below)
- Make refactoring (much!) easier

One way to do this is to split up the algebraic datatype and wrap the individual types as needed:

data A = A
data B = B
data C = C

data Retf = RetfA A | RetfB B 
data Retg = RetgB B | RetgC C

f :: Retf
g :: Retg

This works, and is easy to understand, but carries a lot of boilerplate for frequent unwrapping of the return types Retf and Retg.
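To make the boilerplate concrete, here is a minimal sketch of the split-type approach with one helper that only cares about B. The bodies of f and g and the names describeB, bFromRetf and bFromRetg are made up for illustration; they are not part of the original problem.

```haskell
-- One type per former constructor, as in the question.
data A = A deriving (Show, Eq)
data B = B deriving (Show, Eq)
data C = C deriving (Show, Eq)

-- One wrapper sum type per permitted subset.
data Retf = RetfA A | RetfB B deriving (Show, Eq)
data Retg = RetgB B | RetgC C deriving (Show, Eq)

-- Hypothetical implementations, only here so the sketch compiles.
f :: Retf
f = RetfA A

g :: Retg
g = RetgC C

-- A helper that is only defined on B ...
describeB :: B -> String
describeB B = "got a B"

-- ... needs a separate unwrapping function for every wrapper type.
bFromRetf :: Retf -> Maybe B
bFromRetf (RetfB b) = Just b
bFromRetf (RetfA _) = Nothing

bFromRetg :: Retg -> Maybe B
bFromRetg (RetgB b) = Just b
bFromRetg (RetgC _) = Nothing
```

Every additional subset adds another wrapper type, and every helper that works on a shared member needs another projection like bFromRetf, which is where the boilerplate comes from.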

I don't see polymorphism being of any help here.

So, probably, this is a case for dependent types. It's not really a type-level list, rather a type-level set, but I've never seen a type-level set.

The goal, in the end, is to encode the domain knowledge via the types, so that compile-time checks are available, without having excessive boilerplate. (The boilerplate gets really annoying when there are lots of types and lots of functions.)


**danidiaz** · Answer 1

Define an auxiliary sum type (to be used as a data kind) where each branch corresponds to a version of your main type:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Kind
import Data.Void
import GHC.TypeLits

data Version = AllEnabled | SomeDisabled

Then define a type family that maps the version and the constructor name (given as a type-level Symbol) to the type () if that branch is allowed, and to the empty type Void if it's disallowed.

type Enabled :: Version -> Symbol -> Type
type family Enabled v ctor where
    Enabled SomeDisabled "C" = Void
    Enabled _ _ = ()

Then define your type as follows:

type T :: Version -> Type
data T v = A !(Enabled v "A")
         | B !(Enabled v "B")
         | C !(Enabled v "C")

(The strictness annotations are there to help the exhaustivity checker.)

Typeclass instances can be derived, but separately for each version:

deriving instance Show (T AllEnabled)
deriving instance Eq (T AllEnabled)
deriving instance Show (T SomeDisabled)
deriving instance Eq (T SomeDisabled)

Here's an example of use:

noC :: T SomeDisabled
noC = A ()

main :: IO ()
main = print $ case noC of
    A _ -> "A"
    B _ -> "B"
    -- this doesn't give a warning with -Wincomplete-patterns

This solution makes pattern-matching and construction more cumbersome, because those () are always there.

A variation is to have one type family per branch (as in Trees that Grow) instead of a two-parameter type family.
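A rough sketch of that per-branch variation might look like the following. The family names XA, XB and XC and the versions NoC and NoA are assumptions chosen to match f and g from the question; they don't come from Trees that Grow itself.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)
import Data.Void (Void)

-- Hypothetical versions: everything allowed, C forbidden, A forbidden.
data Version = AllEnabled | NoC | NoA

-- One closed type family per constructor. Each one decides on its own
-- for which versions the branch is uninhabited.
type XA :: Version -> Type
type family XA v where
    XA 'NoA = Void
    XA _    = ()

type XB :: Version -> Type
type family XB v where
    XB _ = ()

type XC :: Version -> Type
type family XC v where
    XC 'NoC = Void
    XC _    = ()

type T :: Version -> Type
data T v = A !(XA v)
         | B !(XB v)
         | C !(XC v)

deriving instance Show (T 'NoC)
deriving instance Show (T 'NoA)

-- f may only produce A or B, g may only produce B or C.
f :: T 'NoC
f = A ()

g :: T 'NoA
g = C ()

-- The C branch is omitted because C carries a strict Void for this version.
describeF :: T 'NoC -> String
describeF t = case t of
    A _ -> "A"
    B _ -> "B"
```

Compared with the two-parameter family, this trades the Symbol argument for more declarations, so a typo in a constructor name can't silently fall through to the catch-all equation.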

**chi** · Answer 2

I tried to achieve something like this in the past, but without much success -- I was not too satisfied with my solution.

Still, one can use GADTs to encode this constraint:

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data TagA = IsA | NotA
data TagC = IsC | NotC

data T (ta :: TagA) (tc :: TagC) where
   A :: T 'IsA  'NotC
   B :: T 'NotA 'NotC
   C :: T 'NotA 'IsC

-- existential wrappers
data TnotC where TnotC :: T ta 'NotC -> TnotC
data TnotA where TnotA :: T 'NotA tc -> TnotA

f :: TnotC
g :: TnotA

This however gets boring fast, because of the wrapping/unwrapping of the existentials. Consumer functions are more convenient since we can write

giveMeNotAnA :: T 'NotA tc -> Int

to require anything but an A. Producer functions instead need to use existentials.

In a type with many constructors, it also gets inconvenient since we have to use a GADT with many tags/parameters. Maybe this can be streamlined with some clever typeclass machinery.
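To show how the producer and consumer sides fit together, here is a small self-contained sketch built on the definitions above. The bodies of f, g and giveMeNotAnA, and the helpers nameNotC and scoreG, are invented for illustration only.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

data TagA = IsA | NotA
data TagC = IsC | NotC

data T (ta :: TagA) (tc :: TagC) where
    A :: T 'IsA  'NotC
    B :: T 'NotA 'NotC
    C :: T 'NotA 'IsC

-- Existential wrappers hide the index that the producer leaves open.
data TnotC where TnotC :: T ta 'NotC -> TnotC
data TnotA where TnotA :: T 'NotA tc -> TnotA

-- Hypothetical producers: each has to wrap its result.
f :: TnotC
f = TnotC A

g :: TnotA
g = TnotA C

-- Consumer: no wrapper needed, and an A clause would be rejected
-- because 'IsA does not match 'NotA.
giveMeNotAnA :: T 'NotA tc -> Int
giveMeNotAnA B = 2
giveMeNotAnA C = 3

-- Unwrapping a producer result: only A and B can occur inside TnotC.
nameNotC :: TnotC -> String
nameNotC (TnotC A) = "A"
nameNotC (TnotC B) = "B"

-- Unwrap once, then hand the value to the index-polymorphic consumer.
scoreG :: TnotA -> Int
scoreG (TnotA t) = giveMeNotAnA t
```

Writing f = TnotC C is a type error here, which is the compile-time guarantee the question asks for; the price is the TnotC/TnotA wrapping at every producer.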

**leftaroundabout** · Answer 3

Giving each individual value its own type scales extremely badly, and is quite unnecessarily fine-grained.

What you probably want is just to restrict the types by some property on their values. In Coq, for example, that would be a subset type:

Inductive T: Type :=
     | A
     | B
     | C.

Definition Retf: Type := { x: T | x<>C }.
Definition Retg: Type := { x: T | x<>A }.

Well, Haskell has no way of expressing such value constraints, but that doesn't stop you from creating types that conceptually fulfill them. Just use newtypes:

newtype Retf = Retf { getRetf :: T }
mkRetf :: T -> Maybe Retf
mkRetf C = Nothing
mkRetf x = Just (Retf x)

newtype Retg = Retg { getRetg :: T }
mkRetg :: ...

Then in the implementation of f, you match for the final result of mkRetf and raise an error if it's Nothing. That way, an implementation mistake that makes it give a C will unfortunately not give a compilation error, but at least a runtime error from within the function that's actually at fault, rather than somewhere further down the line.
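As a minimal sketch of that pattern, assuming a made-up rawF that stands in for whatever f actually computes:

```haskell
import Data.Maybe (fromMaybe)

data T = A | B | C deriving (Show, Eq)

newtype Retf = Retf { getRetf :: T } deriving (Show, Eq)

-- Smart constructor: the only sanctioned way to build a Retf.
mkRetf :: T -> Maybe Retf
mkRetf C = Nothing
mkRetf x = Just (Retf x)

-- Hypothetical stand-in for the real computation inside f.
rawF :: T
rawF = A

-- f checks its own result, so a forbidden C fails right here
-- instead of somewhere in the caller.
f :: Retf
f = fromMaybe (error "f: produced a forbidden C") (mkRetf rawF)
```

For the invariant to hold, the module would export mkRetf and getRetf but not the Retf constructor itself, so that callers can't build a Retf around a C directly.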

An alternative that might be ideal for you is Liquid Haskell, which does support subset types. I can't say too much about it, but it's supposedly pretty good (and is supposed to get direct support in newer GHC versions).
