Java regex to split the attribute list of a SQL query into String[] attrs

I currently have the following code:

String select = qry.substring("select ".length(), qry.indexOf(" from "));
String[] attrs = select.split(",");

which works in most cases, but fails for input like this:

qry = "select a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy') from tbl_a";


What I'm looking for is a regex to pass to String.split() that handles this situation, along with any other special cases you can think of that I might have missed.
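To make the failure concrete, here is a minimal sketch that runs the question's substring-and-split logic on that query. The class name `NaiveSplit` and the helper `naiveAttrs` are hypothetical names chosen for illustration; the body is the same two-line approach shown above.

```java
public class NaiveSplit {
    // The question's approach: take the text between "select " and " from "
    // and split it on every comma.
    static String[] naiveAttrs(String qry) {
        String select = qry.substring("select ".length(), qry.indexOf(" from "));
        return select.split(",");
    }

    public static void main(String[] args) {
        String qry = "select a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy') from tbl_a";
        // The comma inside the function call is treated as a separator too,
        // so DATETOSTRING(...) is cut in two.
        // Prints five pieces: [a] [b] [c] [DATETOSTRING(date_attr_name] ['mm/dd/yyyy')]
        for (String piece : naiveAttrs(qry)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```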



4 answers


[^,]+\([^\)]+\)|[^,]+,


This works as long as you always append a trailing ',' to your select list. Applied to

a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy'),f,gg,dr(tt,t,),fff


it will not capture the last attribute, "fff", but applied to

a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy'),f,gg,dr(tt,t,),fff,


it captures it. So a little preprocessing smooths things over.
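A minimal Java sketch of the effect described above, assuming the pattern is applied with `Matcher.find()` (the caveats below explain why `split()` cannot use it). `TrailingCommaDemo` and `matches` are illustrative names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingCommaDemo {
    // The answer's pattern: a "name(...)" call without nested parentheses,
    // or a run of non-comma characters that must be followed by a comma.
    static final Pattern ATTR = Pattern.compile("[^,]+\\([^\\)]+\\)|[^,]+,");

    // Collects every match of ATTR in the select list, in order.
    static List<String> matches(String selectList) {
        List<String> found = new ArrayList<>();
        Matcher m = ATTR.matcher(selectList);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String list = "a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy'),f,gg,dr(tt,t,),fff";
        // Without a trailing comma, the final plain attribute "fff" is never matched.
        System.out.println(matches(list).size() + " matches without trailing comma");
        // Appending "," lets the second alternative capture "fff," as well.
        System.out.println(matches(list + ",").size() + " matches with trailing comma");
    }
}
```

Note that plain attributes come back with their comma attached ("a,", "fff,"), so callers still need to strip it.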



Caveat: this does not handle nested expressions such as

EXP(arg1, EXP2(ARG11,ARG22), ARG2)


Let me know whether this can happen in the queries you have to process.
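To show how the nested case goes wrong, this sketch runs the same pattern on that expression with the trailing comma already appended. `NestedCallDemo` and `matches` are hypothetical names. Because `[^\)]+` stops at the first ')', the outer call is cut right after the inner call closes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedCallDemo {
    // Same pattern as in the answer.
    static final Pattern ATTR = Pattern.compile("[^,]+\\([^\\)]+\\)|[^,]+,");

    // Collects every match of ATTR in the select list, in order.
    static List<String> matches(String selectList) {
        List<String> found = new ArrayList<>();
        Matcher m = ATTR.matcher(selectList);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // One attribute containing a nested call, plus the appended trailing comma.
        String list = "EXP(arg1, EXP2(ARG11,ARG22), ARG2),";
        // Prints two broken pieces: [EXP(arg1, EXP2(ARG11,ARG22)] and [ ARG2),]
        for (String s : matches(list)) {
            System.out.println("[" + s + "]");
        }
    }
}
```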

Second caveat: since this requires a real regex rather than the simple delimiter that split() expects, you should build a Matcher from the pattern [^,]+\([^\)]+\)|[^,]+, and iterate over Matcher.find() to populate the attribute array attrs.

In short, there is no single, simple delimiter you can pass to split() that would do the trick.
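Putting the pieces together, a minimal sketch of the Matcher-based approach might look like this. It assumes lower-case `select ` / ` from ` keywords with single spaces, as in the question, and no nested calls; `SelectListExtractor` and `extractAttrs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectListExtractor {
    // Pattern from the answer: a function call without nested parentheses,
    // or a plain attribute terminated by a comma.
    private static final Pattern ATTR = Pattern.compile("[^,]+\\([^\\)]+\\)|[^,]+,");

    // Hypothetical helper: returns the select-list items of a simple
    // "select ... from ..." query. Nested calls are NOT supported.
    static String[] extractAttrs(String qry) {
        String select = qry.substring("select ".length(), qry.indexOf(" from "));
        // Preprocessing: append a trailing comma so the last plain attribute matches too.
        Matcher m = ATTR.matcher(select + ",");
        List<String> attrs = new ArrayList<>();
        while (m.find()) {
            String item = m.group();
            // Plain attributes are matched together with their separating comma; drop it.
            if (item.endsWith(",")) {
                item = item.substring(0, item.length() - 1);
            }
            attrs.add(item.trim());
        }
        return attrs.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String qry = "select a,b,c,DATETOSTRING(date_attr_name,'mm/dd/yyyy') from tbl_a";
        String[] attrs = extractAttrs(qry);
        for (String attr : attrs) {
            System.out.println(attr);
        }
    }
}
```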



My answer, in the form of a quote:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski



Your regex would have to account for all possible functions, nested function calls, nested strings, etc. The solution is probably not a regex at all, but a lexer + parser.
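For a sense of what even a small step beyond regex looks like, here is a hand-written scanner that splits only on commas outside parentheses and outside single-quoted literals. This is an illustrative sketch, not a real SQL lexer/parser: `SelectListScanner` and `splitTopLevel` are hypothetical names, and it ignores comments, double-quoted identifiers and dialect-specific syntax.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectListScanner {
    // Splits a select list on top-level commas only, tracking parenthesis
    // depth and skipping over single-quoted literals ('' = escaped quote).
    static List<String> splitTopLevel(String selectList) {
        List<String> items = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < selectList.length(); i++) {
            char c = selectList.charAt(i);
            if (inString) {
                current.append(c);
                if (c == '\'') {
                    // '' inside a literal is an escaped quote, not the end of the string.
                    if (i + 1 < selectList.length() && selectList.charAt(i + 1) == '\'') {
                        current.append('\'');
                        i++;
                    } else {
                        inString = false;
                    }
                }
            } else if (c == '\'') {
                inString = true;
                current.append(c);
            } else if (c == '(') {
                depth++;
                current.append(c);
            } else if (c == ')') {
                depth--;
                current.append(c);
            } else if (c == ',' && depth == 0) {
                // A comma at depth 0 ends the current select-list item.
                items.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        items.add(current.toString().trim());
        return items;
    }

    public static void main(String[] args) {
        String list = "a,EXP(arg1, EXP2(ARG11,ARG22), ARG2),'x,y',DATETOSTRING(d,'mm/dd/yyyy')";
        for (String item : splitTopLevel(list)) {
            System.out.println(item);
        }
    }
}
```

The depth counter is what keeps the nested EXP(...) example from the first answer in one piece; the scanner still only splits the list and does not interpret the items.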



You would probably have better luck with a SQL parser.



As others have pointed out, this is really a lexer-and-parser problem, which is much more complex than simple string splitting or a regex. You will also find that, depending on which SQL dialect and which database you are using, the many variations that can arise in your SQL will throw all sorts of spanners into your parser. The last thing you want is to maintain this piece of code in production, because you will keep finding additional edge cases that break it.

I would ask myself the following questions:

  • What are you trying to accomplish with this tokenization? What problem are you trying to solve? There might be a simpler solution that doesn't require parsing the statement at all.

  • Do you need to parse the whole SQL statement, or just the target columns / projection list?
