Splitting a string in JS

I need to split a string into three pieces of information and put them in an array. The array will always have three elements, [first, second, third], and the second and third elements can be empty.

The line will be in the form "First Second, Id". I need to ignore any extra spaces before or after each word.

So, the first and second words are distinguished by a space or spaces between them, and the second word and Id are distinguished by a comma.

Examples of lines to split:

John Doe, 1234

=> result: [John, Doe, 1234]

John [# spaces] Doe,[# spaces] 1234

=> result: [John, Doe, 1234]

[# spaces] John [# spaces] Doe [# spaces] , [# spaces] 1234

=> result: [John, Doe, 1234]

John , 1234

=> result: [John,"",1234]

John

=> result: [John, "", ""]

I tried using the regex line.split(/[\s,]+/), but it only works in case 1.

How do I create a regular expression that includes all of these cases?
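To make the problem concrete, here is a small sketch of what that split call returns for the harder cases above. The `attempts` variable is only there for illustration.

```javascript
// What line.split(/[\s,]+/) returns for cases 3, 4 and 5.
const attempts = [
  "   John    Doe   ,   1234",
  "John , 1234",
  "John",
].map((line) => line.split(/[\s,]+/));

console.log(attempts[0]); // [ '', 'John', 'Doe', '1234' ]  (leading empty string)
console.log(attempts[1]); // [ 'John', '1234' ]  (the empty second word is lost)
console.log(attempts[2]); // [ 'John' ]  (no placeholders for the missing pieces)
```

So split alone cannot tell an empty second word from a missing one, which is why I think I need capture groups instead.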



2 answers


Tested against every case you provided.

Note: according to your examples, there should be a comma after the second capture group to distinguish between two groups and three.

In all examples, use .slice(1) to remove the first element from the returned array. This is because String.prototype.match returns an array whose first element is the full matched text, followed by the capture groups.

Example one: one.match(regex) => ["John Doe, 1234", "John", "Doe", "1234"];

Example two: one.match(regex).slice(1) => ["John", "Doe", "1234"];

You can keep the full match in the array if you like, but to answer your question as precisely as possible, I take a slice from index 1 to the end of the array.

var one = "John Doe, 1234";
var two = "John          Doe,       1234";
var three = "           John       Doe    ,      1234    ";
var four = "John , 1234";
var five = "John";
var six = ""; // additional test.
var seven = "John doe"; // additional test.
var eight = "John Doe,        "; // additional test.

// Here is the regex...
var regex = new RegExp("^\\s*(\\w*)\\s*(\\w*)\\s*,?\\s*(\\w*)");
// regex => /^\s*(\w*)\s*(\w*)\s*,?\s*(\w*)/;

one.match(regex).slice(1);
// result: ["John", "Doe", "1234"];

two.match(regex).slice(1);
// result: ["John", "Doe", "1234"];

three.match(regex).slice(1);
// result: ["John", "Doe", "1234"];

four.match(regex).slice(1);
// result: ["John", "", "1234"];

five.match(regex).slice(1);
// result: ["John", "", ""];

six.match(regex).slice(1);
// result: ["", "", ""];

seven.match(regex).slice(1);
// result: ["John", "doe", ""];

eight.match(regex).slice(1);
// result: ["John", "Doe", ""];
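
If you want this as a reusable function, a minimal sketch could look like the following. The names `splitLine` and `LINE_PATTERN` are just my own choices for illustration; the pattern is the same one as above, written as a regex literal.

```javascript
// Same pattern as above, as a literal (no double backslashes needed).
const LINE_PATTERN = /^\s*(\w*)\s*(\w*)\s*,?\s*(\w*)/;

function splitLine(line) {
  // Every part of the pattern is optional, so match() never returns null here.
  // The leading comma skips index 0, which holds the full matched text.
  const [, first, second, id] = line.match(LINE_PATTERN);
  return [first, second, id];
}

console.log(splitLine("  John   Doe ,  1234 ")); // [ 'John', 'Doe', '1234' ]
console.log(splitLine("John , 1234"));           // [ 'John', '', '1234' ]
console.log(splitLine("John"));                  // [ 'John', '', '' ]
```

One caveat: without the u flag, \w only covers ASCII letters, digits and the underscore, so a name such as "José" or "O'Brien" would be cut short. If your data can contain those, the character class would need widening.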

      

Also, when creating a regular expression object from a string with new RegExp, each backslash must itself be escaped inside the string literal, hence the double "\\".
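
A quick way to see the escaping rule in action (the variable names here are only for illustration):

```javascript
// The constructor form needs "\\" in the string; the literal form does not.
const fromString = new RegExp("^\\s*(\\w*)\\s*(\\w*)\\s*,?\\s*(\\w*)");
const fromLiteral = /^\s*(\w*)\s*(\w*)\s*,?\s*(\w*)/;

console.log(fromString.source === fromLiteral.source); // true

// With single backslashes the string literal drops them: "\s" is just "s".
const broken = new RegExp("^\s*(\w*)");
console.log(broken.source); // ^s*(w*)
```

If the pattern is fixed, as it is here, the literal form is probably the easier one to read.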



My idea was to first remove the extra spaces and commas, and then run a regex that looks for three components: two groups of letters and one group of digits. I tried this in Python.

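import re

def get_name(namestr):
    # Collapse runs of whitespace and turn commas into spaces.
    namestr = re.sub(r"\s\s+|,", " ", namestr.strip())
    # Groups 1, 3 and 5 are the first word, the second word and the numeric id.
    mat = re.match(r"([a-zA-Z]+)(\s+)?([a-zA-Z]+)?(\s+)?([0-9]+)?", namestr)
    if mat:
        return [mat.group(i) or '' for i in (1, 3, 5)]
    return ['', '', '']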

      



You will need to translate this to JavaScript. I tried, but my poor command of the language cost me 20 minutes of my life just trying to remove the extra spaces.

Would love to see a suggested edit with a JS implementation.
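
One possible JavaScript translation, offered as a sketch rather than a tested port: the name `getName` is my own choice, and falling back to three empty strings when nothing matches is an assumption based on the question's requirement of always having three elements.

```javascript
function getName(nameStr) {
  // Collapse runs of whitespace and replace commas with a space.
  const cleaned = nameStr.trim().replace(/\s\s+|,/g, " ");
  // Python's re.match anchors at the start, so the JS pattern needs an explicit ^.
  const mat = cleaned.match(/^([a-zA-Z]+)(\s+)?([a-zA-Z]+)?(\s+)?([0-9]+)?/);
  // Assumption: fall back to three empty strings when nothing matches.
  if (!mat) return ["", "", ""];
  // Unmatched optional groups are undefined in JS, so default them to "".
  return [1, 3, 5].map((i) => mat[i] || "");
}

console.log(getName("   John    Doe  ,   1234  ")); // [ 'John', 'Doe', '1234' ]
console.log(getName("John , 1234"));                // [ 'John', '', '1234' ]
console.log(getName("John"));                       // [ 'John', '', '' ]
```

Unlike the \w-based answer, this version only accepts letters for the two words and digits for the id, so it is stricter about what counts as a valid line.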
