Haskell lacks a serious record system. (The existing mechanism for named fields in data types was always seen as a stop-gap measure.) At the 1999 Haskell Workshop, Mark Jones and Simon Peyton Jones proposed a more fully-fledged record system, closely modelled on the TRex system implemented in Hugs. But Simon never got around to implementing it in GHC. Why not? Mainly because the implementation cost turned out to be relatively high. (We elaborate later.)
In this note we suggest another design for a Haskell record system. It is a little less expressive than the earlier proposal, but (a) it is considerably simpler to implement, and (b) it is rather simpler to explain. In particular, we can translate source programs using records into System Fw, GHC's existing, strongly-typeed intermediate language. This is a major benefit, because it means that the existing transformations and optimisations in GHC's middle end can remain unchanged.
In common with the earlier proposal, this one strictly replaces Haskell's existing named-field design. That is important, because we need to re-use the syntatic forms; although it is also undesirable, beecause existing Haskell 98 programs will not work unchanged.
I'd be very interested to hear from Haskell afficionados about
('a', True) :: (Char, Bool)
("foo", False, 'x') :: (String, Bool, Char)
Here are some record expressions and their types:
{ name = "Fred", married = True } :: { name :: String, married :: Bool }
{ x = 3, y = 4 } :: { x,y :: Int }
Needless, to say, the same field name can occur in many different record types,
thereby avoiding one of the most painful features of Haskell's existing system
which is that record field names cannot be re-used.
Pattern matching works on records, of course:
f1 :: { x,y :: Int } -> Int
f1 { x = x1, y = y1 } = x1 + y1
Unlike Haskell 98, we allow punning:
f2 :: { x,y :: Int } -> Int
f2 { x, y } = x + y
Field selection uses an infix '.', so
f3 :: { x,y :: Int } -> Int
f3 pt = pt.x + pt.y
The '.' is distinguished from function composition in exactly the same way that
Haskell distinguishes the qualified name "M.x" from the composition "M . x"; if you
want function composition, put spaces round the '.'. Record-selection '.' is left-associative, and binds
more tightly even than function application, so that f pt.x means f (pt.x).
Mutter mutter. Is (f 3).y OK? Is the rule "no spaces before the y", or "no spaces on either side of the ."?
Record update uses a "<-" syntax for the update specification:
g :: { x,y :: Int } -> { x,y :: Int }
g pt = pt { x <- pt.y }
An alternative syntax would be this:
g pt = { pt | x <- pt.y }
I think I like this more: the curly braces say "here comes a record",
while the innards say how the record is defined.
Contrast this with Haskell's existing syntax for record update:
g pt = pt { x = y pt } -- The Haskell 98 form
In our new system, the right hand side would be interpreted as the
application of function pt to a singleton record { x = y pt }.
Indeed, the ability to apply a function to a record easily makes the "record as
keyword parameters" idiom very useful:
print :: { width :: Int, mode :: Mode } -> Tree -> String
print { width, mode } t = ....
main = .....(print { mode = LaidOut, width = 80 } my_tree).......
Optionally, like ML, we could regard tuples as records with special
labels '#1', '#2' etc; the only cost is gobbling up more name space.
data Point = Point { x,y :: Int }
Haskell allows strictness annotations in data type declarations, thus:
data Point = Point !{ x,y :: Int }
With this annotation, GHC can lay out the constructor with two fields,
just as if it had been declared like this:
data Point = Point Int IntThe "!" is not especially nice, but it's not necessary for a newtype:
newtype Point = Point { x,y :: Int }
To extract a field from a Point (with either declaration),
you can't just use '.', thus:
f :: Point -> Int f p = p.x + p.ybecause p's type isn't a record type. Instead, we need to get from Point to the record type. Like the earlier proposal, we suggest that every unary constructor C implicitly defines a postfix operator .C which does this. So one could write:
f :: Point -> Int f p = p.Point.x + p.Point.ySimilarly, one could write x.Just to extract the contents of a value (Just v); it would fail if the value was Nothing.
Of course, you could always define your own selector functions, in the style of Haskell 98:
x :: Point -> Int x p = p.Point.x y :: Point -> Int y p = p.Point.yBut if you had many data types, each with an 'x' field, you couldn't do this. And that is one of the things that is broken about Haskell 98 right now.
f3 :: { x,y :: Int } -> Int
f3 pt = pt.x + pt.y
f4 :: (r <: { x,y :: Int }) => r -> Int
f4 pt = pt.x + pt.y
Here, f3 can only be applied to an argument of exactly the type
{x,y::Int}. But f4 can be applied to an argument of
any type r, provided that r is a record type possessing the
specified field. The constraint, or predicate, (r <: {x,y::Int}) expresses
the requirements on r. The lexeme "<:" is a new keyword, in
types anyway. It can be read "has" or "is a subtype of", as you prefer.
The constraint (r <: {x,y::Int}) is equivalent
to the pair of constraints (r <: {x::Int}, r <: {y::Int}).
We can express polymorphic update (a standard awkward case) quite nicely:
swapXY :: (r <: { x,y :: Int }) => r -> r
swapXY pt = pt { x <- pt.y, y <- pt.x }
Our function f4, above, used the '.' operator, but we can use pattern matching too:
f2 :: { x,y :: Int } -> Int
f2 {x,y} = x + y
f5 :: (r <: { x,y :: Int }) => r -> Int
f5 {x,y, ..} = x + y
The ".." indicates the possibility of more fields.
And that is really it! We don't need to talk about row variables or any of that stuff; polymorphism does all what we need.
type Point = { x,y :: Int }
type ColouredPoint = Point + { col :: Colour }
Note the "+" which stands for record union. Just as the type-lambdas that 'type' declarations
introduce are really just macros, which are expanded before the type checker gets
to work, so is "+". That is, "+" is reduced away before the type checker does
anything serious. You could not write
data T a = MkT (a + { x,y :: Int })
because now the "+" could not be reduced away.
Is this a hack?
It should be possible to use these type synonyms in constraints, thus:
move :: (r <: Point) => r -> r
r p = p { x <- p.x + 1, y <- p.y - 1 }
(At this point the subtyping notation (using "<:") begins to look much
more attractive than "has".)
Perhaps we can take a leaf from the derivable-type-classes paper too. Suppose the compiler's built-in support amounted to providing a built-in support for the following instance schemes:
instance Eq (a,(b,c)) => Eq {l::a,m::b,n::c}
instance ShowR (a :*: b :*: One c)) => Show {l::a,m::b,n::c}
-- Calls showR on a suitably-transformed argument
class ShowR r where
showR :: [String] -> r -> ShowS
-- The [String] are the labels
-- Instances for :*:, One, defined by user
Whether the additional programmability is worth it isn't clear.
newtype T m = T { return :: forall a. a -> m a, ...etc... }
So if x::T, then x.T.return must be a polymorphic function. The orthogonal
way to achieve this is to allow records with polymorphic fields (and tuples too,
presumably).
It's not quite clear what this means for inference. I guess that if you see 'x.l', it's only polymorphic if we know the record type for 'x' from the context.
data Decl = DeclType { declName :: String, ... }
| DeclData { declName :: String, ... }
| ...
You can still do this, of course, and you can define a selector too, like this:
declName :: Decl -> String declName (DeclType r) = r.declName declName (DeclData r) = r.declName ...etc...Now the function declName is just like the current Haskell 98 selector. But we'd have lost the convenience of having the compiler generate the code for the selectors. Enter Template Haskell? Or "deriving( declName )"?
f :: (r\size) => { r } -> { r | size :: Int }
f r = { r | size = 3 }
Here f adds an size field to r without knowing
what fields r already has. We cannot do that; we can update existing
fields, but not extend with new fields. Polymorphic extension is what "lacks" predicates
give you.
f :: (r\l) => {r | l::a} -> b -> {r | l::b}
f r x = { r | l = x}
Notice that field "l" in the input to f has a different
type to the same field in the result. (Actually, you can do this in Haskell 98 too.)
You can do a type-changing record update if you know all the fields:
type T a = { l::Int, m::Bool, n::a }
f :: T a -> b -> T b
f r x = { l = r.l, m = r.m, n = x }
One could even imagine some better notation, to avoid the
tiresome repetitition, but it's not clear that it's worth it.
And you could not do type-changing update without knowing all the other fields.
project :: Show (Rec r) => Rec r -> Query (Rel r)Here, 'project' can only be applied to a record type, rather than any Showable type. It's not clear to me whether these extra kind distinctions are worth their price, unless they are needed for other reasons.
We say this is a "major benefit" for two reasons:
data Has r t = H (r -> t) -- Selection
(r -> t -> r) -- Update
get :: Has r t -> r -> t
get (Has g s) = g
set :: Has r t -> r -> t -> r
set (Has g s) = s
f3 :: { x,y :: Int } -> Int
f3 pt = pt.x + pt.y
f4 :: (r <: { x,y :: Int }) => r -> Int
f4 pt = pt.x + pt.y
Here are their translations:
f3 :: (Int,Int) -> Int f3 pt = (fst pt) + (snd pt) f4 :: (Has r Int, Has r Int) -> r -> Int f4 (hx, hy) pt = get hx pt + get hy ptNotice that the monomorphic version, f3 gives more efficient code, because the compiler can bake in the selection operation. In contrast, f4 has to use a selector that is passed in. Uupdate is similar:
swapXY :: (r <: { x,y :: Int }) => r -> r
swapXY pt = pt { x <- pt.y, y <- pt.x }
translates to:
swapXY :: (Has r Int, Has r Int) -> r -> r
swapXY (hx,hy) pt = set hy (set hx pt (get hx pt))
(get hy pt)
The two big points to notice are
What we cannot do is to expose the Int representation. Consider the function
h :: (r <: {a::Int, b::Bool}) => r -> Int
h r = if r.b then a else a+1
We cannot translate it to
h :: (Int, Int) -> r -> Int h (ha, hb) r = if (get hb r) then get ha r else get ha r + 1where ha is the offset of a and hb is the offset of b Reason: because now we have no way to type-check the translation, at least not without getting into dependent types, because we need (get ha r) and (get hb r) to have type Int and Bool respectively, but plainly we can't deduce that from the translate code of h. No, no, no.
Instead, just imagine that Has is an abstract type, known only to the code generator, with primitive operations get and set. The code generator can then generate good (inline) code for applications of get and set. The other thing we need is a way to construct values of type Has. For example:
(h {a=3, b=True})
might translate to:
(let ha = has_1_(Int,Bool)
hb = has_2_(Int,Bool)
in h (ha,hb) (3,True))
Here, has_n_type is an indexed family of constants. So far
as most of GHC is concerned, it's just an opaque constant of the
appropriate tpye, but the code generator knows that it is represented
by the Int index. The type of has_i_(t1, ...tn) is just
Has t1,..tn) ti, of course. GHC already has an indexed family of
constants for foreign calls, so this is not entirely new.
Finally, we need some simple transformations (= rewrite rules):
get has_i_(t1..tn) e
====>
case e of (x1::t1, ..., xn:tn) -> xi
Similarly for set. This is what makes things efficient when
we know the types and there is no polymorphism involved.
(get hx r) + (get hy r)will end up something like
case (get hx r) of { I# x ->
case (get hy r) or { I# y ->
I# (x +# y)
But when the code generator is trying to generate code for the get, it
must evaluate r. Uh oh -- it might do that twice. The simplifier usually
spots such things, but only because evaluation is explicit. So we'd like to make
the primitive get take an evaluated argument. Hmm.