Haskell, anyone?

Discussion:

Haskell, anyone?

(too old to reply)

Paul Rubin

2015-11-28 20:40:52 UTC

[Note to whoever: this continues a discussion about functional
programming that started on comp.lang.ada but got off-topic for that
group so I've migrated it here. Please don't crosspost unless there's
significant Ada content in your post.]

For example, if you wanted to explain the factorial to a student who
has never been exposed to the concept, chances are that your first
approach wouldn't be to write on a blackboard
"fact 1 = 1 -- [order of these two lines switched by me -Paul]
fact n = n * fact (n-1)"

But that's the mathematical definition of the factorial--this isn't
grade school. I assume I'm explaining it to a professional trying to do
high-assurance programming so of course that's the definition I'd use.
Treating the above as Haskell code that purports to compute the
factorial, it's easier to reason about than the imperative loop version:

Let P(n) be the proposition "fact(n)=n!". We want to verify that P(n)
is true for all n. We prove it by induction. First, P(1) is obviously
true. Second, assume P(n) is true. So by substitution, fact(n+1) =
(n+1) * fact(n). And since P(n) is true by the induction hypothesis,
fact(n+1) = (n+1) * n! which is (n+1)!, which means we've proved P(n) =>
P(n+1). Therefore by induction, forall n. P(n), QED, so we've verified
the code, and this was very easy.

I'm out of my depth here, but I think to do the same thing with
imperative code is much messier, you have to put preconditions and
postconditions around every statement to prove the loop invariant that
you're going through factorials, etc. I guess you can do that with
SPARK but I don't know how it's done.

At the level of everyday coding though, going from loops to recursion is
nothing to be bothered about, trust me. It's like when people freak out
by whitespace as syntax when they first see Python, or by the
parentheses when they first see Lisp. You get used to it quickly and it
stops mattering. Or anyway, if it's really an obstacle, FP probably
isn't for you.

In any case, if you are using a machine which can only perform one
binary operation at a time, you will always need an accumulator to
compute the final result.

I think that's an advanced, low level concept though. There's tons of
magic in Javascript: JIT compilation, garbage collection, built-in
objects that make RPC calls, etc. But there's zillions of
not-so-experienced Javascript programmers writing crappy web pages all
over the the world, who never think about that stuff and have never
heard of an accumulator. A programmer with deep knowledge doing
bleeding edge systems has to understand what the machine is doing, but
someone just trying to bang out everyday tasks can usually stay at the
high level. No accumulators.

I would gladly challenge this claim by picking two large groups of 10
years old who have never been exposed to imperative programming yet,

I don't really know about this and it doesn't concern me too much. It
might be easier to start one way or the other, but either way you pick
up more technique as you gain experience. By the time you're evaluating
different approaches to formal software verification (what this thread
was originally about), you're outside the range of what most
10-year-olds are capable of doing, so it's reasonable to assume more
fluency.

I'm writing programs for massively parallel GPU processors, and last
time I checked Blelloch's scan was still one of the most efficient
algorithms

That's outside my area and maybe a niche, but I know some Haskell and FP
work has been done on this stuff (Data Parallel Haskell is inspired by
Blelloch's NESL nested parallel work):

http://conal.net/blog/posts/parallel-tree-scanning-by-composition
http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf
https://wiki.haskell.org/GHC/Data_Parallel_Haskell/References

There is also an array library with GPU support:

http://hackage.haskell.org/package/accelerate

Well, I'm learning OCaml using a pretty nice MOOC which has been set
up by my former university. If you're interested the address is

Ocaml is close enough to what I've been doing already that I think I'm
ok with just the online docs and the irc channel, plus hacking code.
Thanks though. I might look at some online math classes sometime.

hu47121+ (Hannah)

2015-11-29 00:01:04 UTC

Permalink

Hi!

Post by Paul Rubin
[Note to whoever: this continues a discussion about functional
programming that started on comp.lang.ada but got off-topic for that
group so I've migrated it here. Please don't crosspost unless there's
significant Ada content in your post.]

But that's the mathematical definition of the factorial--this isn't
grade school. I assume I'm explaining it to a professional trying to do
high-assurance programming so of course that's the definition I'd use.
Treating the above as Haskell code that purports to compute the
Let P(n) be the proposition "fact(n)=n!". We want to verify that P(n)
is true for all n. We prove it by induction. First, P(1) is obviously
true. Second, assume P(n) is true. So by substitution, fact(n+1) =
(n+1) * fact(n). And since P(n) is true by the induction hypothesis,
fact(n+1) = (n+1) * n! which is (n+1)!, which means we've proved P(n) =>
P(n+1). Therefore by induction, forall n. P(n), QED, so we've verified
the code, and this was very easy.
I'm out of my depth here, but I think to do the same thing with
imperative code is much messier, you have to put preconditions and
postconditions around every statement to prove the loop invariant that
you're going through factorials, etc. I guess you can do that with
SPARK but I don't know how it's done.
At the level of everyday coding though, going from loops to recursion is
nothing to be bothered about, trust me. It's like when people freak out
by whitespace as syntax when they first see Python, or by the
parentheses when they first see Lisp. You get used to it quickly and it
stops mattering. Or anyway, if it's really an obstacle, FP probably
isn't for you.

Ok some things may eventually even be not so difficult as many recursion
patterns can be abstracted away into higher order functions and there
are many such already there. Like:
forM_ [1..10] $ \a -> print a
(monadic "for loop" over a list, bind each item to a, "execute" print
on each a in "sequence" [i.e. connected with monadic >> under the hood,
which *does* mean sequencing in the IO monad]).

Post by Paul Rubin
[...]

Hannah.

Hadrien Grasland

2015-11-29 11:12:22 UTC

Permalink

Post by Paul Rubin

But that's the mathematical definition of the factorial--this isn't
grade school. I assume I'm explaining it to a professional trying to do
high-assurance programming so of course that's the definition I'd use.
Treating the above as Haskell code that purports to compute the
Let P(n) be the proposition "fact(n)=n!". We want to verify that P(n)
is true for all n. We prove it by induction. First, P(1) is obviously
true. Second, assume P(n) is true. So by substitution, fact(n+1) =
(n+1) * fact(n). And since P(n) is true by the induction hypothesis,
fact(n+1) = (n+1) * n! which is (n+1)!, which means we've proved P(n) =>
P(n+1). Therefore by induction, forall n. P(n), QED, so we've verified
the code, and this was very easy.
I'm out of my depth here, but I think to do the same thing with
imperative code is much messier, you have to put preconditions and
postconditions around every statement to prove the loop invariant that
you're going through factorials, etc. I guess you can do that with
SPARK but I don't know how it's done.

I'm not sure I understand. Here, you show an example of functional code verification which is done by hand, then you say that it would be much more difficult to do it automatically with imperative code. That looks like a comparison of apples to orange to me.

I can verify imperative code manually, too. In fact, that's one of my favorite debugging techniques : sitting down, taking a pen and a piece of paper, carefully reading through the code I've just written, manually going through a few iterations with a simplified input, and wondering if it does what I want.

But if you want to go into automated verification, you need to somehow instruct the formal proof system about what you're trying to do, and this is the messy part. For simple functions like factorial, you will effectively end up duplicating your code in the proof system's own language for preconditions/postconditions/invariants.

To the best of my knowledge, this is true no matter whether you're using a functional or imperative programming language. It's a fundamental limitation of formal proof (and possibly a reason why it has failed to become mainstream so far, outside of niches like critical systems).

Post by Paul Rubin
At the level of everyday coding though, going from loops to recursion is
nothing to be bothered about, trust me. It's like when people freak out
by whitespace as syntax when they first see Python, or by the
parentheses when they first see Lisp. You get used to it quickly and it
stops mattering. Or anyway, if it's really an obstacle, FP probably
isn't for you.

As I said on CLA, I agree that recursive code is relatively easy to write. My concern is how easy to *read* it is after a couple months of developper working on another project + code bitrot.

In imperative programming, recursion is just one tool in your toolbox among many others. When you meet a problem which recursion fits best, such as quicksort or tree traversal, you use recursion. When you encounter a problem which loops fit best, such as traversing a data structure while accumulating some intermediary state or mutating a data structure, you use loops. When neither approach has a clear advantage, you use loops because they are easier to understand.

But in functional programming, it seems to me that people will often try to shoehorn recursion into problems which it doesn't fit well at all. This will result in complex code, where recursive functions have parameters which only serve to accumulate state changes, and where multi-step algorithms are implemented using complex nested trees of pattern matching, possibly further obscured by gratuitous use of higher-order functions.

It is this kind of process which I am criticizing : it results in code which is about as easy to read as imperative code where all control flow is done using goto statements, and I am absolutely not convinced that codebases doing this will remain readable once they aren't fresh in a developer's mind anymore.

Hence my opinion that the best attitude is probably that of functional programming languages embracing imperative constructs, like OCaml, or imperative programming languages embracing functional constructs, like C++ or Python. Do not force developers to use the functional approach when it is suboptimal, and trust them to pick the option that fits their problem domain best.

Post by Paul Rubin

In any case, if you are using a machine which can only perform one
binary operation at a time, you will always need an accumulator to
compute the final result.

I'm sorry, I just cannot let Javascript be categorized as a beginner-friendly programming language. When a programming language lets you, without even an implementation warning, do atrocious things at runtime like rewriting the standard library or summing values of different types, it may give the illusion of ease of use, but painful debugging experience will quickly correct this false impression.

Anyway, if you want to write even mildly efficient code, you need to have at least a basic idea of what the implementation is doing. Otherwise, you end up overloading the GC with thousands of throwaway objects, as in beginner-level Java code. Or you end up crashing your program with stack overflows as soon as the datasets become non-trivial, as in beginner-level functional code. Or you end up using plenty of indirection in your data access when you could have used fast tables instead. And so on.

Not caring about the implementation is a luxury that is only available when your program requires far smaller compute capabilities than what the underlying machine+OS can provide. Though I will agree that it is certainly nice when you can do it :)

Post by Paul Rubin

I'm writing programs for massively parallel GPU processors, and last
time I checked Blelloch's scan was still one of the most efficient
algorithms

That's outside my area and maybe a niche, but I know some Haskell and FP
work has been done on this stuff (Data Parallel Haskell is inspired by
http://conal.net/blog/posts/parallel-tree-scanning-by-composition
http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf
https://wiki.haskell.org/GHC/Data_Parallel_Haskell/References
http://hackage.haskell.org/package/accelerate

Here, my point was not that it is impossible to use functional algorithms on GPUs, only that sometimes, mutable data structures and associated algorithms are the most effective way to achieve some goals on shared-memory architectures.

For example, in this post that you mentioned...

http://conal.net/blog/posts/parallel-tree-scanning-by-composition

...it takes the author 10 A4 pages of code (accompanied by a bit of discussion and mathematical concepts) in order to implement an algorithm which I can describe in one schematic, then implement in a few dozen lines of code which anyone familiar with imperative constructs can understand. I hope we can agree that the signal-to-noise ratio is not in favor of the functional approach here.

The reason, I believe, can be found in one core tenet of eXtreme Programming and agile development in general: never use an algorithm which is more general than what you really need. Instead, generalize algorithms only when you find yourself doing something closely related to what you've done before.

It seems to me that more often than not, the functional approach is rather to take the most general abstraction that will possibly fit, then painstakingly go through the difficult process of shoehorning this complex abstraction into a simple problem. It is not without reminding how Java developers will sometimes design extremely complex class hierarchies to solve problems which are adressed by 10 lines of procedural code.

Richard

2015-11-29 17:44:17 UTC

Permalink

Post by Hadrien Grasland
In imperative programming, recursion is just one tool in your
toolbox among many others. When you meet a problem which recursion
fits best, such as quicksort or tree traversal, you use recursion.
When you encounter a problem which loops fit best, such as traversing
a data structure while accumulating some intermediary state or
mutating a data structure, you use loops. When neither approach has a
clear advantage, you use loops because they are easier to
understand.

But couldn't the same point be made when comparing structured control
structures with the goto statement? Some algorithms can be expressed in
a simpler way using labels and goto statements.

For example, the following pseudo code shows how a stream of characters
delimited by a new line character could be processed using structured
control statements:

read (c)
WHILE c # new line DO
process (c)
read (c)
END

Using labels and goto, it could be programmed like this:

loop:
read (c)
IF c = new line THEN GOTO end
process (c)
end:

At first sight, the second version is simpler and easier to understand:
The read procedure is only called at one position, and it is immediately
clear where the value of variable c is modified and used. In the
structured version there are two (redundant) read calls, and variable c
is assigned twice with its value used textually before the second
assignment.

Nevertheless, most developers would agree that the structured version is
better because the control flow is visible immediately and there is no
danger of some external code jumping into the loop with unclear
consequences.

Similiarly, implementing an algorithm with an iteration instead of
recursion could be simpler and clearer at first sight. Nevertheless, it
could be actually better to use the recursive version of a pure
functional programming language because then one can be sure that
(instances of) variables are assigned only once and there can be no
unexpected behavior caused by uninitialized variables and side effects.

Post by Hadrien Grasland
constructs, like C++ or Python. Do not force developers to use the
functional approach when it is suboptimal, and trust them to pick
the option that fits their problem domain best.

In my experience, you can not trust developers to pick the best option:
they will use whatever the programming languages offers.

Richard

Hadrien Grasland

2015-11-30 09:02:15 UTC

Permalink

Post by Richard

I would say that the problem with goto is that it is usually too powerful as an abstraction. Every structured control flow primitive may be implemented on top of "if" and "goto", and many more styles of control flow can be implemented on top of goto (see e.g. how C programmers use goto as a replacement for exception handling). This also means, as a necessary consequence, that code using goto can do just about anything: jump in and out of variable scopes, bypass initialization instructions, etc. Which is, after all, goto's main problem.

So for me, goto is like the C preprocessor: a dangerous and highly imperfect fix for language deficiencies, in this case missing control flow.

For example, in the case of the loop you outlined above, Ada proposes a nice solution in the form of loop + exit when. This allows you to tell a developer reading your code that the loop you are writing uses nonstandard loop control flow. Using this primitive, an idiomatic Ada equivalent to your code would look like this :

loop
. read(c);
. exit when c = new line;
. process(c);
end loop;

With this construct, you avoid code duplication in the first part of the loop, which can in some cases be longer than one instruction. And for cases where you really need to implement even more complex control flow in the form of nested loops, Ada also allows you to name your loops and specify which loop you are exiting.

I wonder why other programming languages did not copy that idea, as it is surely a lot nicer than the mess of breaks, gotos and returns that can plague C-derived programming languages in similar situations.

Anyway, my point is, if your programming language actually took the time to offer a good design for loop early exit, the power of goto is overkill, and its use is thus solely harmful to readability. I am not convinced that the same is true for all imperative constructs.

Post by Richard
Similiarly, implementing an algorithm with an iteration instead of
recursion could be simpler and clearer at first sight. Nevertheless, it
could be actually better to use the recursive version of a pure
functional programming language because then one can be sure that
(instances of) variables are assigned only once and there can be no
unexpected behavior caused by uninitialized variables and side effects.

All popular compilers these days warn about code that reads from uninitialized variables, so one cannot really use an uninitialized variable without asking for it anymore. It's a former problem which has been (mostly) solved by toolchain improvements, like incomplete pattern matching in functional languages.

Conversely, initializing variables whose content is not available yet, as is somewhat suggested by programming textbooks and made mandatory by some programming languages, should be considered harmful IMO. It leads people to put nonsensical content inside of a variable "since it needs to be initialized", and thus bypass this extremely useful toolchain warning. The result is useless memory traffic and obscured code workflow during debugging.

As for side effects, well, they are a necessary evil really: can you describe any useful and complete program (where complete means runnable by an average operating system) which can perform its work without any kind of external input and output?

Post by Richard

they will use whatever the programming languages offers.

That can be true, but I also believe that no programming language or tool can stop careless developers. It has been tried again and again, and it has always failed.

If you suppress global variables, people will replace them by a gigantic "state" record that is passed as a parameter of every function and returned as a result. Same end result, with more useless memory traffic (actually, "pure" OO programming languages like Java could sometimes be said to revolve around this :) ).

If you suppress goto, break, and other forms of tricky control flow, people will misuse exception handling in order to get it back.

If you make it mandatory for a function to declare which exceptions it may be throwing, people will end up either declaring that all functions can throw <insert exception base class here> or, worse, swallowing exceptions silently.

At some point, you always need to rely on developer education and trust your users in order to move forward. Tools can help a careful programmer make less mistakes, and notice them more quickly, but they cannot stop a programmer who is actively trying to cheat the system and do careless things.