How does groovy distinguish division from strings?
Groovy supports / as the division operator:
groovy> 1 / 2
===> 0.5
It also supports / as a string delimiter, and such slashy strings can even span multiple lines:
groovy> x = /foo/
===> foo
groovy:000> x = /foo
groovy:001> bar/
===> foo
bar
With this in mind, why can't I evaluate the slashy-string literal in groovysh?
groovy:000> /foo/
groovy:001>
Clearly groovysh thinks the input is unterminated for some reason.
How can Groovy avoid confusion between division and strings? What does this code mean:
groovy> f / 2
Is this a function call f(/2 .../), where / starts a multi-line slashy string, or is it f divided by 2?
How does Groovy distinguish division from strings?
I'm not entirely sure how Groovy works, but I'll tell you how I do it, and I'd be very surprised if Groovy didn't work in a similar way.
Most of the parsing algorithms I've heard about (Shunting-yard, Pratt, etc.) recognize two different kinds of tokens:
- Those expected to be preceded by an expression (infix operators, postfix operators, closing parentheses, etc.). If one of them is not preceded by an expression, it is a syntax error.
- Those not expected to be preceded by an expression (prefix operators, open parentheses, identifiers, literals, etc.). If one of them is preceded by an expression, it is a syntax error.
To keep things simple, from now on I will refer to the first kind of token as operators and the second kind as non-operators.
Now, the interesting thing about this distinction is that it is not based on what the token is, but rather on its immediate context, especially the preceding tokens. Because of this, the same token can be interpreted differently depending on where it appears in the code and whether the parser reads it as an operator or a non-operator. For example, the token '-' denotes subtraction in operator position, but the same token in non-operator position is negation. There is no problem deciding whether a '-' is subtraction or not, because you can determine it from its context.
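As a small illustration of the '-' case, here is a minimal Groovy sketch. The variable names are made up for the example; it only shows that the same character plays two roles depending on whether an expression precedes it.

```groovy
def a = 5
def b = 3

// '-' follows an expression (a), so it is in operator position: subtraction.
def difference = a - b

// '-' has no expression before it, so it is in non-operator position: negation.
def negated = -b

// Both roles in one expression: the first '-' subtracts, the second negates.
def mixed = a - -b

assert difference == 2
assert negated == -3
assert mixed == 8
```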
The same is generally true for the '/' character in Groovy. If it is preceded by an expression, it is interpreted as an operator, which means it is division. Otherwise, it is a non-operator, which makes it the start of a string literal. So you can generally tell whether a '/' is division or not by looking at the token that immediately precedes it:
- '/' is division if it follows an identifier, literal, postfix operator, closing parenthesis, or other token that denotes the end of an expression.
- '/' starts a string if it follows a prefix operator, infix operator, open parenthesis, or other such token, or if it begins a line.
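The two rules above can be sketched in Groovy roughly as follows. This is not Groovy's actual lexer: the `TokenKind` enum and the `classifySlash` method are hypothetical names invented for this illustration, and the sketch assumes the previous token has already been classified.

```groovy
// Hypothetical token categories, invented for this sketch.
enum TokenKind {
    IDENTIFIER, LITERAL, CLOSE_PAREN, POSTFIX_OP,   // can end an expression
    PREFIX_OP, INFIX_OP, OPEN_PAREN, LINE_START     // cannot end an expression
}

// Decide how to read a '/' from the kind of token immediately before it.
String classifySlash(TokenKind previous) {
    // Tokens that can end an expression put the next token in operator position.
    def endsExpression = [TokenKind.IDENTIFIER, TokenKind.LITERAL,
                          TokenKind.CLOSE_PAREN, TokenKind.POSTFIX_OP]
    return (previous in endsExpression) ? 'division' : 'string start'
}

assert classifySlash(TokenKind.LITERAL) == 'division'          // 1 / 2
assert classifySlash(TokenKind.INFIX_OP) == 'string start'     // 'foo' + /bar/
assert classifySlash(TokenKind.LINE_START) == 'string start'   // /foo/
```

A real lexer would need more categories (keywords, commas, and so on), but the decision itself stays a lookup on the preceding token.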
Of course, in practice it is not that easy. Groovy is designed to be flexible across different styles and uses, so things like semicolons and parentheses are often optional. This can lead to ambiguous parses. For example, suppose our parser encounters the following line:
println / foo
This is most likely an attempt to print a multi-line string: / foo is the start of the string passed to println as an argument, and the optional parentheses around the argument list are omitted. Of course, to a simple parser, this looks like division. I expect the Groovy parser can tell the difference by reading the following lines to see which interpretation does not give an error, but for something like groovysh that is literally impossible (since, as a REPL, it does not have access to the later lines yet), so it is forced to just guess.
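One way to sidestep this ambiguity in your own code is to make the context explicit. The sketch below assumes a made-up closure `shout` and variable `foo`; it does not show how Groovy itself resolves the `println / foo` case, only two forms where the position of '/' leaves no room for doubt.

```groovy
def foo = 4
def shout = { String s -> s.toUpperCase() }

// Explicit parentheses: '/' follows an open parenthesis, so it can only
// be the start of a slashy string.
def loud = shout(/foo/)

// An operand on the left and no call in sight: '/' follows an identifier
// that ends an expression, so it is division.
def quotient = foo / 2

assert loud == 'FOO'
assert quotient == 2
```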
Why can't I evaluate the slashy-string literal in groovysh?
As before, I don't know the exact reason, but I know that since groovysh is a REPL, it is bound to have more trouble with the more ambiguous rules. That said, a simple one-line slashy string is pretty unambiguous, so I believe there might be something else going on here. Here is the result of playing with various forms in groovysh:
> /foo - unexpected char: '/' @ line 2, column 1.
> /foo/ - awaits further input
> /foo/bar - unexpected char: '/' @ line 2, column 1.
> /foo/bar/ - awaits further input
> /foo/ + 'bar' - unexpected char: '/' @ line 2, column 1.
> 'foo' + /bar/ - evaluates to 'foobar'
>  /foo/ (with a leading space) - evaluates to 'foo'
>  /foo (with a leading space) - awaits further input
>  /foo/bar (with a leading space) - Unknown property: bar
It looks like something weird happens when the '/' character is the first character on the line. The pattern it seems to follow (as far as I can tell) is:
- A slash as the first character of a line starts a strange parsing mode.
- In this mode, each line ending with a slash followed only by whitespace causes the REPL to wait for further lines.
- The first line ending with something other than a slash (or whitespace after a slash) prints the error unexpected char: '/' @ line 2, column 1.
I also noticed a couple of interesting points in this regard:
- In this special mode, forward slashes (/) and backslashes (\) both appear to be treated as completely interchangeable.
- This does not happen at all in groovyConsole or in real Groovy files.
- Putting spaces in front of the opening slash character causes groovysh to interpret it correctly, but only if the opening slash is a forward slash and not a backslash.
So I personally expect this is just a quirk of groovysh: either a bug, or some kind of under-documented feature that I haven't heard of.