What's List Comprehension and Why is it Harmful?

Discussion:

(too old to reply)

Xah Lee

2010-10-18 09:11:07 UTC

here's a article about list comprehension.

• 〈What's List Comprehension and Why is it Harmful?〉
http://xahlee.org/comp/list_comprehension.html

i wrote it from a recent debate.

I hope those of you computer scientists and language designers think
about this and avoid list comprehension in your language.

plain text version follows.

--------------------------------------------------
What's List Comprehension and Why is it Harmful?

Xah Lee, 2010-10-14

This page explains what is List Comprehension (LC), with examples from
several languages, with my opinion on why the jargon and concept of
“list comprehension” are unnecessary, redundant, and harmful to
functional programing.
What is List Comprehension?

Here's a example of LC in python:

S = [2*n for n in range(0,9) if ( (n % 2) == 0)]
print S
# prints [0, 4, 8, 12, 16]

It generates a list from 0 to 9, then remove the odd numbers by 「( (n
% 2) == 0), then multiply each element by 2 in 「2*n」, then returns a
list.

Python's LC syntax has this form:

[myExpression for myVar in myList if myPredicateExpression]

In summary, it is a special syntax for generating a list, and allows
programers to also filter and apply a function to the list, but all
done using expressions.

In functional notation, list comprehension is doing this:

map( f, filter(list, predicate))

Other languages's LC are similiar. Here are some examples from
Wikipedia. In the following, the filter used is 「x^2 > 3」, and the
「2*x」 is applied to the result.
javascript

Number.prototype.__iterator__ = function() { for (let i = 0; i < this;
i++) yield i}
var s = [2*i for (i in 100) if (i*i > 3)]

Haskell

s = [ 2*x | x <- [0..], x^2 > 3 ]

F#

seq { for x in 0..100 do if x*x > 3 then yield 2*x } ;;

OCaml

[? 2 * x | x <- 0 -- max_int ; x * x > 3 ?];;

Clojure

(take 20 (for [x (iterate inc 0) :when (> (* x x) 3)] (* 2 x)))

Common Lisp

(loop for x from 1 to 20 when (> (* x x) 3) collect (* 2 x))

Erlang

S = [2*X || X <- lists:seq(0,100), X*X > 3].

Scala

val s = for (x <- Stream.from(0); if x*x > 3) yield 2*x

Here's how Wikipedia explains List comprehension. Quote:

A list comprehension is a syntactic construct available in some
programming languages for creating a list based on existing lists.

The following features makes up LC:

* (1) A flat list generator, with the ability to do filtering and
applying a function.
* (2) Usually, a special syntax in the language.
* (3) The syntax uses expressions, as opposed to using functions
as parameters.

Why is List Comprehension Harmful?

• List Comprehension is a bad jargon; thus harmful to functional
programing or programing in general. It hampers communication, and
encourage mis-understanding.

• List Comprehension is a redundant concept in programing. What List
Comprehension does is simply 「map(func, filter(list, predicate))」.

• The special syntax of List Comprehension as it exists in many langs,
are not necessary. If a special purpose function for it is preferred,
then it can simply be a plain function, e.g 「LC(function, list,
predicate)」.
Map + Filter = List Comprehension Semantics

The LC's semantics is not necessary. A better way and more in sync
with functional lang spirit, is simply to combine plain functions:

map( f, filter(list, predicate))

Here's the python syntax:

map(lambda x: 2*x , filter( lambda x:x%2==0, range(9) ) )
# result is [0, 4, 8, 12, 16]

In Mathematica, this can be written as:

Map[ #*2 &, Select[***@9, EvenQ]]

In Mathematica, a arithemetic operation can be applied to list
directely without using Map explicitly, so the above can be written
as:

Select[***@9, EvenQ] * 2

in my coding style, i usually write it in the following syntactically
equivalent forms:

(#*2 &) @ (Select[#, EvenQ]&) @ Range @ 9

or

9 // Range // (Select[#, EvenQ]&) // (#*2 &)

In the above, we sequence functions together, as in unix pipe. We
start with 20, then apply “Range” to it to get a list from 1 to 9,
then apply a function that filters out all numbers not greater than 3,
then we apply a function to multiply each number by 2. The “//” sign
is a postfix notation, analogous to bash's “|”, and “@” is a prefix
notation that's the reverse of “|”.

(See: Short Intro of Mathematica For Lisp Programers.)
List Comprehension Function Without Special Syntax

Suppose we want some “list comprehension” feature in a functional
lang. Normally, by default this can be done by

map(func, filter(inputList, Predicate))

but perhaps this usage is so frequent that we want to create a new
function for it, to make it more convenient, and perhaps easier to
make the compiler to optimize more. e.g.

LC(func, inputList, Predicate)

this is about whether a lang should create a new convenient function
that otherwise require 3 function combinations. Common Lisp vs Scheme
Lisp are the typical example of extreme opposites.

Note, there's no new syntax involved.

Suppose, someone argues that

For instance, this is far more convenient:

[x+1 for x in [1,2,3,4,5] if x%2==0]

than this:

map(lambda x:x+1,filter(lambda x:x%2==0,[1,2,3,4,5]))

How about this:

LC(func, inputList, P)

compared to

[func for myVar in inputList if P]

the functional form is:

* Shorter
* Not another idiosyncratic new syntax

Issues and Decisions on Creating a New Function

Suppose we decided that generating list by a filter is so frequently
used that it worth it to create a new func for it.

LC(func, inputList, Predicate)

Now, in functional langs, in general a design principle is that you
want to reduce the number of function unless you really need it.
Because, any combination of list related functions could potentially
be a new function in your lang. So, if we really think LC is useful,
we might want to generalize it. e.g. in

LC(func, inputList, Predicate)

is it worthwhile say to add a 4th param, that says return just the
first n? (here we presume the lang doesn't support list of infinite
elements) e.g.

LC(func, inputList, Predicate, n)

what about partition the list to m sublists?

LC(func, inputList, Predicate, n, m)

what about actually more generalized partition, by m sublist then by
m1 sublist then by m2 sublist?

LC(func, inputList, Predicate, n, list(m,m1,m2,...))

what about sorting? maybe that's always used together when you need a
list?

LC(func, inputList, Predicate, n, list(m,m1,m2,...), sortPredcate)

what if actually frequently we want LC to map parallel to branches?
e.g.

LC(func, inputList, Predicate, n, list(m,m1,m2,...), sortPredcate,
mapBranch:True)

what if ...

you see, each of these or combination of these can be done by default
in the lang by sequencing one or more functions (i.e. composition).
But when we create a new function, we really should think a lot about
its justification, because otherwise the lang becomes a bag of
functions that are non-essential, confusing.

So the question is, is generating a list really that much needed? And,
if so, why should we create a special syntax such as 「[ expr for var
in list if P]」 than 「 LC(func, list, P)」?

Also note, that LC is not capable of generating arbitrary nested list.
For a example of a much powerful list generator that can generate
arbitrary nested tree, see:

* http://reference.wolfram.com/mathematica/ref/Table.html
* Tree Functions: Table

For those who find imperative lang good, then perhaps “list
comprehension” is good, because it adds another idiosyncratic syntax
to the lang, but such is with the tradition of imperative langs. The
ad hoc syntax aids in reading code by various syntactical forms and
hint words such as “[... for ... in ...]”.
Bad Jargon and How To Judge a Jargon

Someone wrote:

The term “list comprehension” is intuitive, it's based on math set
notation.

The jargon “list comprehension” is opaque. It hampers communication
and increases misunderstanding. A better name is simply “list
generator”.

What's your basis in saying that “List Comprehension” is intuitive?
Any statics, survey, research, references?

To put this in context, are you saying that lambda, is also intuitive?
“let” is intuitive? “for” is intuitive? “when” is intuitive? I mean,
give your evaluation of some common computer language terminologies,
and tell us which you think are good and which are bad, so we have
some context to judge your claim.

For example, let us know, in your view, how good are terms: currying,
lisp1 lisp2, tail recursion, closure, subroutine, command, object. Or,
perhaps expound on the comparative merits and meaning on the terms
module vs package vs add-on vs library. I would like to see your view
on this with at least few paragraphs of analysis on each. If you, say,
write a essay that's at least 1k words on this topic, then we all can
make some judgment of your familiarity and understanding in this area.

Also, “being intuitive” is not the only aspect to consider whether a
term is good or bad. For example, emacs's uses the term “frame”. It's
quite intuitive, because frame is a common english word, everyone
understands. You know, door frame, window frame, picture frame, are
all analogous to emacs's “frame” on a computer. However, by some turn
of history, in computer software we call such as “window” now, and by
happenstance the term “window” also has a technical meaning in emacs,
what we call “split window” or “frame” today. So, in emacs, the term
“frame” and “window” is confusing, because emacs's “frame” is what we
call “window”, while emacs's “window” is what we call a frame. So
here, is a example, that even when a term is intuitive, it can still
be bad.

As another example, common understanding by the target group the term
is to be used is also a important aspect. For example, the term
“lambda”, which is a name of greek char, does not convey well what we
use it for. The word's meaning by itself has no connection to the
concept of function. The char happens to be used by a logician as a
shorthand notation in his study of what's called “lambda
calculus” (the “calculus” part is basically 1700's terminology for a
systematic science, especially related to mechanical reasoning).
However, the term “lambda” used in this way in computer science and
programing has been long and wide, around 50 years in recent history
(and more back if we trace origins). So, because of established use,
here it may decrease the level of what we might think of it as a bad
jargon, by the fact that it already become a standard usage or
understanding. Even still, note that just because a term has establish
use, if the term itself is very bad in many other aspects, it may
still warrant a need for change. For one example of a reason, the
argon will be a learning curve problem for all new generations.

You see, when you judge a terminology, you have to consider many
aspects. It is quite involved. When judging a jargon, some question
you might ask are:

• Does the jargon convey its meaning by the word itself? (i.e. whether
the jargon as a word is effective in communication)

• How long has been the jargon in use?

• Do people in the community understand the jargon? (e.g. more
scientifically: what percentage?)

Each of these sample questions can get quite involved. For example, it
calls for expertise in linguistics (many sub-fields are relevant:
pragmatics, history of language, etymology), practical experience in
the field (programing or computer science), educational expertise
(e.g. educators, professors, programing book authors/teachers),
scientific survey, social science of communication...

Also, you may not know, there are bodies of professional scientists
who work on terminologies for publication. It is not something like “O
think it's good, becus it is intuitive to me.”.

Xah ∑ http://xahlee.org/ ☄

Marc Mientki

2010-10-18 09:34:10 UTC