Florian Weimer
2011-02-26 16:11:23 UTC
There is a well-known syntactic ambiguity in pattern matching: it is
unclear where identifier denotes a data constructor, a bound
variable, or a free variable. This is a problem if a datatype is
changed because exhaustiveness checking does not reliably identify all
places which need updating.
I know about several different solutions to this problem.
The SML way: An identifier which is has been bound as a data
constructor cannot be used in a pattern anymore. This is a non-local
property which cannot be inferred without analyzing all dependencies
(particularly due to the "open" construct). No attempt is made at
reducing ambiguities. Some implementations check for redundant
patterns, but this warning does not fire in all relevant cases.
The Erlang way: Erlang hasn't got datatypes, so this aspect does not
arise. However, it is possible to parametrize the pattern by
referencing bound variables. The submatch only succeeds if the
subvalue matches the value of the variable. This is a localized
property which can be statically inferred by looking at one file in
isolation in Erlang because variables do not leak across top-level
function boundaries.
The Haskell and Caml way: Data constructors and variable identifiers
occupy distinct lexical categories, and patterns always introduce new
bindings.
The Ada way: Do not implement pattern matching at all, and provide a
syntactically distinct alternative for the default case.
The shell way: Prefix all variable names with "$".
Creating syntactically distinct categories seems to be the best
approach. However, it does not work if you're introducing pattern
matching to an existing language where variable identifiers cannot be
restricted conveniently.
One approach is absent from the list above: syntax in the pattern (and
only in the pattern) to indicate whether an identifier introduces a
new variable, or refers to an existing data constructor (or perhaps
variable). ML-like languages introduce variables through pattern
matching, so this is not substantially different from the Haskell or
shell approach. But is there a non-ML language where something like
this has been implemented?
unclear where identifier denotes a data constructor, a bound
variable, or a free variable. This is a problem if a datatype is
changed because exhaustiveness checking does not reliably identify all
places which need updating.
I know about several different solutions to this problem.
The SML way: An identifier which is has been bound as a data
constructor cannot be used in a pattern anymore. This is a non-local
property which cannot be inferred without analyzing all dependencies
(particularly due to the "open" construct). No attempt is made at
reducing ambiguities. Some implementations check for redundant
patterns, but this warning does not fire in all relevant cases.
The Erlang way: Erlang hasn't got datatypes, so this aspect does not
arise. However, it is possible to parametrize the pattern by
referencing bound variables. The submatch only succeeds if the
subvalue matches the value of the variable. This is a localized
property which can be statically inferred by looking at one file in
isolation in Erlang because variables do not leak across top-level
function boundaries.
The Haskell and Caml way: Data constructors and variable identifiers
occupy distinct lexical categories, and patterns always introduce new
bindings.
The Ada way: Do not implement pattern matching at all, and provide a
syntactically distinct alternative for the default case.
The shell way: Prefix all variable names with "$".
Creating syntactically distinct categories seems to be the best
approach. However, it does not work if you're introducing pattern
matching to an existing language where variable identifiers cannot be
restricted conveniently.
One approach is absent from the list above: syntax in the pattern (and
only in the pattern) to indicate whether an identifier introduces a
new variable, or refers to an existing data constructor (or perhaps
variable). ML-like languages introduce variables through pattern
matching, so this is not substantially different from the Haskell or
shell approach. But is there a non-ML language where something like
this has been implemented?