Discussion:
Internal representation of algebraic types
(too old to reply)
luserdroog
2021-06-07 22:53:58 UTC
Permalink
What is the common internal representation for algebraic types?
Say I wanted to add a complex typing system to a simple homebrew
Lisp interpreter, how would I go about doing it? I suppose the contrite
answer is "lists", but does anyone have any more detail or references
to where I can learn about how to build something like this?
Paul Rubin
2021-06-08 01:03:34 UTC
Permalink
Post by luserdroog
What is the common internal representation for algebraic types?
For a product type you just have the components, and for a sum type you
have a cell saying which alternative has been chosen, plus that value.

The standard (though now old) book about FPL implementation is Simon
Peyton Jones "The Implementation of Functional Programming Languages":

https://www.microsoft.com/en-us/research/publication/the-implementation-of-functional-programming-languages/

R. W. Harper's PFPL is newer and (despite having "Practical" in its
title) somewhat more theoretical:

https://www.cs.cmu.edu/~rwh/pfpl/
Luca Saiu
2021-06-20 19:25:16 UTC
Permalink
Post by luserdroog
What is the common internal representation for algebraic types?
Say I wanted to add a complex typing system to a simple homebrew
Lisp interpreter, how would I go about doing it? I suppose the contrite
answer is "lists", but does anyone have any more detail or references
to where I can learn about how to build something like this?
About the general problem of data representation (assuming you care
about efficiency) I like this old survey, which is still the best I
know:
David Gudeman.
``Representing Type Information in Dynamically Typed Languages'',
technical report,
Department of Computer Science, The University of Arizona, 1993.

Nowadays some solutions are more convenient than others; for example
tagging in the low bits is compatible with conservative-pointer-finding
garbage collection, while tagging in the high bits is not. Some people
like NaN-coding.

Algebraic types, as Paul Rubin hints at, are sums of products. The
general complex case will always require boxed values; but the simple
case (equivalent to C enumerates) is important in practice and worth
optimising: it is possible to arrange for a practically unbounded amount
of distinct unique objects. In Lisp you should have already built the
required machinery when implementing Booleans and the empty list, if it
is a different object.

Regards,
--
Luca Saiu http://ageinghacker.net
I support everyone's freedom of mocking any opinion or belief, no
matter how deeply held, with open disrespect and the same unrelented
enthusiasm of a toddler who has just learned the word "poo".
luserdroog
2021-06-26 23:45:53 UTC
Permalink
Post by Luca Saiu
Post by luserdroog
What is the common internal representation for algebraic types?
Say I wanted to add a complex typing system to a simple homebrew
Lisp interpreter, how would I go about doing it? I suppose the contrite
answer is "lists", but does anyone have any more detail or references
to where I can learn about how to build something like this?
About the general problem of data representation (assuming you care
about efficiency) I like this old survey, which is still the best I
David Gudeman.
``Representing Type Information in Dynamically Typed Languages'',
technical report,
Department of Computer Science, The University of Arizona, 1993.
Nowadays some solutions are more convenient than others; for example
tagging in the low bits is compatible with conservative-pointer-finding
garbage collection, while tagging in the high bits is not. Some people
like NaN-coding.
Algebraic types, as Paul Rubin hints at, are sums of products. The
general complex case will always require boxed values; but the simple
case (equivalent to C enumerates) is important in practice and worth
optimising: it is possible to arrange for a practically unbounded amount
of distinct unique objects. In Lisp you should have already built the
required machinery when implementing Booleans and the empty list, if it
is a different object.
Regards,
Thanks a bunch! Paul Rubin's links were closer to what I was asking for,
but this paper is what I've always wanted but didn't know how to ask about!
I've stumbled across, read about, and reinvented many of those techniques
but without any of the careful analysis of performance costs.

I've used large wrappers in my PostScript interpreter (trying to take care
that they're not /too/ large), tagging the low bits in my Lisp interpreter,
and tagging the high bits in my APL interpreter. But my goals have always
been just to be cool or clever, with the metric usually being how compact
the source code could be.

Probably I spent too much time doing code golf.

Loading...