How does Groovy distinguish division from strings?

Groovy supports / as the division operator:

groovy> 1 / 2
===> 0.5

      

It also supports / as a string delimiter, which can even be multi-line:

groovy> x = /foo/
===> foo
groovy:000> x = /foo
groovy:001> bar/
===> foo
bar

      

With this in mind, why can't I evaluate the slashy-string literal in groovysh?

groovy:000> /foo/
groovy:001>

      

Clearly groovysh thinks the input is unterminated for some reason.

How does Groovy avoid confusion between division and strings? What does this code mean:

groovy> f / 2

      

Is this a function call f(/2 .../), where / starts a multi-line slashy string, or is it f divided by 2?



1 answer


How does Groovy distinguish division from strings?

I'm not entirely sure how Groovy does it, but I can tell you how I would do it, and I'd be very surprised if Groovy didn't work in a similar way.

Most of the parsing algorithms I've heard about (Shunting-yard, Pratt, etc.) recognize two different kinds of tokens:

  • Those expected to be preceded by an expression (infix operators, postfix operators, closing parentheses, etc.). If one of them is not preceded by an expression, it is a syntax error.
  • Those not expected to be preceded by an expression (prefix operators, opening parentheses, identifiers, literals, etc.). If one of them is preceded by an expression, it is a syntax error.

To keep things simple, from now on I will refer to the former kind of token as an operator and the latter as a non-operator.

Now, the interesting thing about this distinction is that it is not based on what the token is, but rather on its immediate context, especially the preceding tokens. Because of this, the same token can be interpreted differently depending on its position in the code and on whether the parser treats it as an operator or a non-operator. For example, the token '-' denotes subtraction in operator position, but the same token in non-operator position is negation. There is no problem deciding whether a given '-' is subtraction or not, because you can tell from its context.
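To make the operator / non-operator idea concrete, here is a minimal sketch in Python (chosen only for brevity; it is not how Groovy itself is implemented). The names `ENDS_EXPRESSION`, `kind` and `classify_minus` are made up for this illustration, and the set of "expression-ending" tokens is a simplifying assumption.

```python
# Hypothetical token kinds after which an expression has just ended.
# A real parser would know many more; this set is an assumption.
ENDS_EXPRESSION = {"number", "identifier", ")"}


def kind(token):
    """Roughly categorise a token (illustrative only)."""
    if token.isdigit():
        return "number"
    if token.isidentifier():
        return "identifier"
    return token  # punctuation stands for itself


def classify_minus(tokens):
    """Label each '-' as subtraction or negation from the token before it."""
    labels = []
    previous = None  # nothing precedes the first token
    for token in tokens:
        if token == "-":
            in_operator_position = (
                previous is not None and kind(previous) in ENDS_EXPRESSION
            )
            labels.append("subtraction" if in_operator_position else "negation")
        previous = token
    return labels


print(classify_minus(["a", "-", "-", "1"]))            # ['subtraction', 'negation']
print(classify_minus(["-", "(", "x", ")", "-", "2"]))  # ['negation', 'subtraction']
```

The only thing the function looks at is the previous token, which is exactly the kind of context described above.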

The same is generally true for the '/' character in Groovy. If it is preceded by an expression, it is interpreted as an operator, which means it is division. Otherwise it is a non-operator, which makes it the start of a string literal. So you can generally tell whether a '/' is division or not by looking at the token that immediately precedes it:

  • '/' is division if it follows an identifier, literal, postfix operator, closing parenthesis, or other token that denotes the end of an expression.
  • '/' starts a string if it follows a prefix operator, infix operator, opening parenthesis, or other such token, or if it begins a line.
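The two rules above can be sketched as a toy tokenizer. This is a Python illustration under heavy assumptions, not Groovy's real lexer: `tokenize` and `ENDS_EXPRESSION` are hypothetical names, and it only handles single-line slashy strings with no escapes.

```python
import re

# Token kinds after which an expression has just ended, so '/' means division.
# This set is an illustrative assumption, not Groovy's actual grammar.
ENDS_EXPRESSION = {"number", "identifier", "string", ")"}


def tokenize(source):
    """Split source into (kind, text) pairs, resolving '/' by the previous token."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        previous_kind = tokens[-1][0] if tokens else None
        if ch == "/" and previous_kind not in ENDS_EXPRESSION:
            # Non-operator position: read up to the closing slash as a string.
            end = source.find("/", i + 1)
            if end == -1:
                raise SyntaxError("unterminated slashy string")
            tokens.append(("string", source[i + 1:end]))
            i = end + 1
            continue
        # Otherwise read a number, an identifier, or a single character.
        text = re.match(r"\d+|[A-Za-z_]\w*|.", source[i:], re.DOTALL).group()
        if text[0].isdigit():
            kind = "number"
        elif text[0].isalpha() or text[0] == "_":
            kind = "identifier"
        elif text == ")":
            kind = ")"
        else:
            kind = "operator"
        tokens.append((kind, text))
        i += len(text)
    return tokens


print(tokenize("1 / 2"))      # [('number', '1'), ('operator', '/'), ('number', '2')]
print(tokenize("x = /foo/"))  # [('identifier', 'x'), ('operator', '='), ('string', 'foo')]
```

Note that with this simple rule an identifier followed by '/' (as in `f / 2`) is always read as division, because an identifier ends an expression.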

Of course, in practice it is not that easy. Groovy is designed to be flexible across different styles and uses, so things like semicolons and parentheses are often optional. This can make parsing ambiguous. For example, suppose our parser comes across the following line:

println / foo

      



This is most likely an attempt to print a multi-line string: foo is the start of a string passed to println as an argument, with the optional parentheses around the argument list omitted. Of course, to a simple parser this reads as division. I expect the Groovy parser can tell the difference by reading the following lines to see which interpretation does not produce an error, but for something like groovysh that is literally impossible (as a REPL, it does not yet have access to the following lines), so it is forced to just guess.

Why can't I evaluate the slashy-string literal in groovysh?

As before, I don't know the exact reason, but I do know that since groovysh is a REPL, it is bound to have more trouble with the more ambiguous rules. Even so, a simple one-line slashy string is pretty unambiguous, so I believe something else may be going on here. Here is the result of playing with various forms in groovysh:

> /foo             - unexpected char: '/' @ line 2, column 1.
> /foo/            - awaits further input
> /foo/bar         - unexpected char: '/' @ line 2, column 1.
> /foo/bar/        - awaits further input
> /foo/ + 'bar'    - unexpected char: '/' @ line 2, column 1.
> 'foo' + /bar/    - evaluates to 'foobar'
>  /foo/           - evaluates to 'foo'
>  /foo            - awaits further input
>  /foo/bar        - Unknown property: bar

      

It looks like something weird happens when the '/' character is the first character on the line. The pattern it seems to follow (as far as I can tell):

  • A slash as the first character of a line starts some strange parsing mode.
  • In this mode, every line that ends with a slash, followed by nothing but whitespace, causes the REPL to wait for further lines.
  • The first line that ends with something other than a slash (or whitespace after the slash) prints the error unexpected char: '/' @ line 2, column 1.
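For what it's worth, the pattern above can be written down as a tiny Python model and checked against the observations listed earlier. This is only a restatement of my guess; the function name `observed_groovysh_reaction` is made up, and it says nothing about how groovysh is actually implemented.

```python
def observed_groovysh_reaction(line):
    """Return the reaction the pattern above predicts for one input line.

    Models only the observed pattern (a guess), not groovysh's real code.
    """
    if not line.startswith("/"):
        # No leading slash: the strange mode is never entered.
        return "parsed normally"
    if line.rstrip().endswith("/"):
        # Ends with a slash, optionally followed by whitespace.
        return "awaits further input"
    return "unexpected char: '/'"


for sample in ["/foo", "/foo/", "/foo/bar", "/foo/bar/", " /foo/"]:
    print(repr(sample), "->", observed_groovysh_reaction(sample))
```

The model agrees with every row of the table above whose input starts with a slash, which is all it is meant to capture.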

I also noticed a couple of interesting points in this regard:

  • In this special mode, forward slashes (/) and backslashes (\) appear to be treated as completely interchangeable.
  • None of this happens in groovyConsole or in real Groovy files.
  • Putting spaces in front of the opening slash causes groovysh to interpret it correctly, but only if the opening slash is a forward slash and not a backslash.

So I personally expect this is just a quirk of groovysh: either a bug, or some poorly documented feature that I haven't heard of.
