How does variable binding work with recursion in Haskell?
I read about Haskell and it stated that once a variable is bound to an expression, it cannot be bounced like,
x = 10
x = 11
Assign.hs:2:1: error:
Multiple declarations of βxβ
Declared at: Assign.hs:1:1
Assign.hs:2:1
So if this is the case ... how does recursion work when the same variable continues to bind to other things? for example
drop n xs = if n <= 0 || null xs
then xs
else drop (n-1) (tail xs)
variable n
... is bound over and over again every time the function recursively. Why is this allowed?
Also, if you guys can tell me a few keywords to find so I can learn more about it, I'd really appreciate it.
In almost any language, identifiers only make sense because they exist in scope; conceptually a kind of implicit mapping from names to the things they stand for.
In modern languages, you usually have (at least) a global scope and every local scope associated with every function / procedure. Example pseudocode:
x = 1
print x
x = 2
print x
function plus_one(a):
b = 1
return a + b
print plus_one(x)
x
are the names in the global scope, a
and b
are the names in the local scope of the function plus_one
.
Imperative languages ββ(and impure declarative languages) are generally understood by treating these names as being mapped to some kind of slot or pigeon hole in which things can be stored as intended, and current connections are referenced by name. It only works because the imperative stepwise way of thinking about these programs gives us a way to understand what βcurrentβ means. 1 The example above shows this; x
first assigned 1
, then printed, then assigned 2
, and then printed again; we expect this to print "1" and then print "2" (and then print "3" from the last line).
Given that understanding variables as slots that can store things, it's easy to fall into the trap of thinking about local variables, which represent function arguments as simply slots that are filled when the function is called. I say trap because it is not a useful way of thinking about arguments and function calls, even in imperative languages. This more or less works as long as each function only ever has one call "in flight" at once, but introducing one of any number of generic programming functions breaks this pattern (recursion, concurrency, laziness, closure, etc.) ). This misunderstanding is at the root of a number of questions I've seen here on SO where the poster was having difficulty understanding recursive calls or wanted to know how to access the local variables of a function from the outside, etc.
Actually you have to think about a function having a separate scope associated with each call to that function 2 . The function itself looks like a pattern for scope, not scope (although the common language usually speaks of "function scope" as shorthand). If you've provided bindings for the parameters of a function, then you can create a scope, but a typical function is called many times with different parameters, so there isn't just one scope for that function.
Consider my pseudocode plus_one
, with an argument a
. You can imagine what a
is the local name for the variable, and the call plus_one(x)
just assigns the content x
to the slot for a
and then starts executing the code plus_one
. But my point is that it is better to think that when you call plus_one
on x
, you create a new scope that has a named variable a
(containing the contents of the global scope x
at that point), but it is not a variable a
.
This is vital to understanding recursion, even in imperative languages:
function drop(n, xs):
if n <= 0 || null(xs):
return xs
else:
return drop(n - 1, tail(xs))
Here we could try to imagine that only one variable exists xs
, and when we make a recursive call drop
, we assign the tail of the original xs
variable xs
and run the function code again. But it crashes as soon as we change it to something like:
function drop(n, xs):
if n <= 0 || null(xs):
return xs
else:
result = drop(n - 1, tail(xs))
print xs
return result
We now use xs
after the recursive call. What this does is difficult to explain if we imagine that there is only one xs
, but trivial if we think that each separate tag xs
is in a separate scope every time we call drop
. When we make a recursive call drop
(by passing it in n - 1
and out tail(xs)
), it creates its own separate one xs
, so it is completely harmless that print xs
this scope can still be accessed xs
.
So is this story different from Haskell? It is true that the nature of variables in Haskell is in contrast to typical recursive languages. Instead of mapping area names to slots where we can place different content at different times, objects are bound directly to values, and there is no notion of time in which we could say that an identifier "was" bound to a single value (or was not whatever) and "now" is tied to a different value. x = 1
globally in Haskell it is not a step that "happens" and changes something, it is just a fact. x
only 1
... Haskell doesn't throw an error in your lines x = 10
and x = 11
because the designers wanted to restrict you to only assigning a variable once, and Haskell has no concept of assignment at all. x = 10
provides a definition for x
in this area, and you cannot have two separate definitions for the same name in the same area.
The local scopes that come from functions in Haskell are the same; each name is simply associated with a specific value 3 . But the actual values ββare parameterized by the values ββthat you call the function on (they are literally a function of those values); each call has its own scope, with a different mapping from names to values. This does not mean that each call n
"changes" to bind to a new argument, just that each call has a different one n
.
Thus, recursion affects variable binding in Haskell in much the same way as it does in all major imperative languages. Moreover, an extremely simple way to describe how recursion affects name binding in all of these languages ββis to say "it is not." So recursion is not special at all. parameter passing, local scopes, etc. works exactly the same way when you call a recursive function as when you call "normal" non-recursive functions, provided you understand local scopes as being associated with each function call, not as one thing associated with a function.
1 Sometimes even the scope mapping itself is mutable, and the names of the entry mappings in slots can be added and removed as program steps. Python, for example, has a mutable global scope (in every module); you can dynamically add and remove module global variables, even with names derived from runtime data. But it uses immutable local scopes: function local variables exist even before a value has been assigned to them (or after that value has been removed with del
).
2 At least since functions / procedures are very common in modern languages. It is not completely universal, but it is generally recognized as a good way of working / procedures.
3 Of course, thanks to laziness, a particular meaning can be the lower meaning, which is the "meaning" we pretend to be β it is the meaning of endless cycles and other forms of inaction, so that we can interpret expressions as always meaningful.
Haskell has a lexical definition. When you declare a variable inside the scope, it hides any variable with the same name from the outside scope. For example, in the following program
x :: Int
x = 5
main :: IO ()
main = do
x <- readLn :: IO Int
print x
If you compile with ghc -Wall
, it will compile correctly, but you will receive the following warnings:
sx-scope.hs:2:1: warning: [-Wunused-top-binds]
Defined but not used: βxβ
sx-scope.hs:6:10: warning: [-Wname-shadowing]
This binding for βxβ shadows the existing binding
defined at sx-scope.hs:2:1
So, x
outside main
and x
in main
are different variables, and in the inner area, it temporarily hides the one in the outer area. (Okay, "temporarily" in this example means until main
it returns, as long as the program is running, but you get the concept.)
Likewise, when you declare drop n xs
or \n xs ->
, n
and xs
are local variables that only exist at the time this function is called. It can be called more than once with different values ββof n
and xs
.
When a function is tail-recursive, that is, it returns a call to itself, the compiler knows that it is going to replace old parameters that were from a scope that no longer exists with updated parameters of the same type. This way, it can reuse the stack frame and store the new parameters in the same places as the previous ones. The resulting code can be as fast as iterating through a procedure language.
From https://en.wikibooks.org/wiki/Haskell/Variables_and_functions ,
Within this scope, a variable in Haskell is defined only once and cannot be changed.
but scope is not just function code. Roughly speaking, this is the function code + its context, i.e. The arguments with which he was called, and something from his outer sphere. This is often referred to as closing. Each time you call the function, the code runs under a new closure, so variables can evaluate different things.
The recursive case is nothing special about this: a function called twice by other code is run with a different closure, and its internal variables can be evaluated differently.