How can I split a C# string when an array member can be multiple words?

I am working on a small project to take a CSV file and insert its contents into an HTML table (I would use a DataGrid with a DataSet or DataTable, but the system I will be talking to does not support ASP.NET uploads for sending newsletters).

Anyway, I will use the File.ReadAllLines method to read the contents of the CSV file into an array of strings.

Then, for each string member of the array, I will use the string.Split function to split the string into an array. The problem is that the CSV content is cars (the CSV file is generated by the system I am talking to, by the way: I fetch data from that system and load data into it). This means that I could have:

Nissan Almera

Nissan Almera 1.4 TDi

VW Golf 1.9 SE

Etc...

How can I ensure that if I have "Almera 1.4 TDi", for example, it ends up as one element in the array? I am splitting each line rather than individual elements.

-1




6 answers


Use the overload of string.Split() that limits the number of values returned.



    string makeModel = csvArray[0]; // or whichever column it is in
    string[] makeAndModel = makeModel.Split( new char[] { ' ' } , 2 );
    string make = makeAndModel[0];
    string model = makeAndModel[1];

      

+3




I don't know much about cars, but could you use the major brand names as separators instead of spaces?

E.g.: Nissan Almera Nissan _X100_Ultra_Model Ford Prefect Toyota Foo Bar Honda Prius



Splitting on the major brands (Nissan, Ford, Toyota, Honda) will produce:

  • Nissan Almera
  • Nissan _X100_Ultra_Model
  • Ford Prefect
  • Toyota Foo Bar
  • Honda Prius
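
A minimal sketch of that idea, assuming a fixed, hand-maintained list of brand names (the class and method names here are hypothetical, and a real list would have to cover every make the source system can emit). It splits just before each brand name with a zero-width lookahead, so the brand stays attached to its model:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class BrandSplitter
{
    // Hypothetical brand list, taken from the example above.
    static readonly string[] Brands = { "Nissan", "Ford", "Toyota", "Honda" };

    public static string[] SplitOnBrands(string text)
    {
        // Lookahead (?=...) matches the position in front of a brand name
        // without consuming it, so the name is kept in the following piece.
        string alternatives = string.Join("|", Brands.Select(b => Regex.Escape(b)));
        string pattern = @"(?=\b(?:" + alternatives + @")\b)";

        return Regex.Split(text, pattern)
                    .Select(part => part.Trim())
                    .Where(part => part.Length > 0)   // drop the empty piece before the first brand
                    .ToArray();
    }

    static void Main()
    {
        string input = "Nissan Almera Nissan _X100_Ultra_Model Ford Prefect Toyota Foo Bar Honda Prius";
        foreach (string car in SplitOnBrands(input))
            Console.WriteLine(car);
    }
}
```

This only works while no model name contains a brand name as a separate word, and any make missing from the list is silently glued onto the previous entry.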
0




You will need to use a regular expression.

I'm not sure you want a regex, but you can solve the problem with one, and then you have two problems.

A five-second Google search for "regex csv" turns up a blog entry with this pattern:

,(?=([^"]*"[^"]*")*(?![^"]*"))

      

At first glance this seems to do the trick: rather than matching the fields, the regex matches the positions of the commas that separate them (that is, the commas that are not inside quotes). So it should be pretty trivial to turn this into something useful, or at least it gives you a starting point.

Bear in mind that it fails if you have an input line like

123,456,"Unbalanced quote

      

in which case it does not match any comma at all.
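
As a rough sketch of how that pattern could be used with Regex.Split (the class and method names are hypothetical): the only change to the pattern is a non-capturing group, (?:...), because Regex.Split otherwise adds the text captured by a group to its result array.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuoteAwareSplit
{
    // The pattern quoted above, with (...) changed to (?:...) so that
    // Regex.Split does not return captured text as extra elements.
    static readonly Regex Comma =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Splits on commas that are followed by an even number of quotes,
    // i.e. commas that are not inside a quoted value.
    public static string[] SplitLine(string line)
    {
        return Comma.Split(line);
    }

    static void Main()
    {
        // The quoted value keeps its embedded comma (and its quotes).
        foreach (string field in SplitLine("Nissan,\"Almera 1.4, TDi\",1999"))
            Console.WriteLine(field);

        // With an unbalanced quote no comma matches, so one element comes back.
        Console.WriteLine(SplitLine("123,456,\"Unbalanced quote").Length);
    }
}
```

Note that the surrounding quotes are left on the field, so they would still need to be trimmed afterwards.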


Step 2. Another Google search, this time for "c# split csv files", turns up:

CSV File Parser and Writer in C# (Part 3) (but check parts 1 and 2 for code)

It looks much more robust and even has test cases.

Since there is no standard CSV format, you have to judge for yourself whether this works for the input files you accept.

0




As I understand it, the problem is:

  • The lines in the file being parsed are not CSV; they are separated by spaces.
  • The value of the first field of each line (make / model) can contain 0 or more actual spaces.
  • Other field values on each line do not contain spaces, so the space separator works fine for them.

Let's say you have four columns and the first column value should be "Nissan Almera 1.4 TDi". Using a normal Split() will result in 7 fields, not 4.

(untested code)

First, split it up:

    int numFields = 4;
    string[] myFields = myLine.Split(' ');

      

Then fix the array:

    // Number of surplus tokens produced by the spaces inside the first field
    int extraSpaces = myFields.Length - numFields;
    if (extraSpaces > 0) {
      // Piece together element 0 in the array by adding the extra elements
      for (int n = 1; n <= extraSpaces; n++) {
        myFields[0] += " " + myFields[n];
      }
      // Move the other values back to elements 1, 2, and 3 of the array
      for (int n = 1; n < numFields; n++) {
        myFields[n] = myFields[n + extraSpaces];
      }
    }

      

Finally, ignore every element of the array outside of the four that you really wanted to parse.
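
The same three steps could also be folded into one helper that returns an array of exactly the expected size. This is only a sketch under the assumptions listed above (space-separated lines, spaces only in the first field); the class name, method name and sample values are made up for illustration.

```csharp
using System;

static class LeadingFieldMerger
{
    public static string[] SplitWithWideFirstField(string line, int numFields)
    {
        string[] tokens = line.Split(' ');
        int extra = tokens.Length - numFields;
        if (extra <= 0) return tokens;   // nothing to merge

        string[] fields = new string[numFields];
        // The first (extra + 1) tokens all belong to the first field.
        fields[0] = string.Join(" ", tokens, 0, extra + 1);
        // The remaining tokens map one-to-one onto fields 1..numFields-1.
        Array.Copy(tokens, extra + 1, fields, 1, numFields - 1);
        return fields;
    }

    static void Main()
    {
        // Hypothetical line: make/model, model year, price, number of wheels.
        foreach (string f in SplitWithWideFirstField("Nissan Almera 1.4 TDi 2004 8995 4", 4))
            Console.WriteLine(f);
    }
}
```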

Another approach would be regular expressions. I think something like this will work:

    // Requires: using System.Text.RegularExpressions;
    // The greedy first group takes everything except the last three space-separated tokens.
    Match m = Regex.Match(myLine, @"^(.*) ([^ ]+) ([^ ]+) ([^ ]+)$");
    string MakeModel = m.Groups[1].Value;
    string ModelYear = m.Groups[2].Value;
    string Price     = m.Groups[3].Value;
    string NumWheels = m.Groups[4].Value;

      

There is no splitting and there are no arrays here, just the groups captured by the Regex.

If there were a built-in String.Reverse() method (there isn't), I might suggest reversing the line, using the VB.NET Replace() function with its Count parameter to replace every space after the first three (assuming four fields) in the reversed string, then reversing it again and splitting it. Something like:

string[] myFields = Microsoft.VisualBasic.Replace(myLine.Reverse(), " ", "_", 0, 3).Reverse().Split(' ');
myFields[0] = myFields[0].Replace("_", " "); //fix the underscores
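
The underlying idea, working from the right-hand end of the line, can also be sketched without reversing anything by peeling fields off with LastIndexOf. This is a different technique from the Replace() trick above, not a translation of it, and the class and method names are hypothetical.

```csharp
using System;

static class RightSplitter
{
    // Takes the last (numFields - 1) space-separated tokens as separate
    // fields and leaves whatever remains on the left as the first field.
    public static string[] SplitFromRight(string line, int numFields)
    {
        string[] fields = new string[numFields];
        string rest = line;
        for (int i = numFields - 1; i > 0; i--)
        {
            int cut = rest.LastIndexOf(' ');
            if (cut < 0) throw new FormatException("Too few fields in: " + line);
            fields[i] = rest.Substring(cut + 1);
            rest = rest.Substring(0, cut);
        }
        fields[0] = rest;   // may still contain spaces
        return fields;
    }

    static void Main()
    {
        // Hypothetical line with four fields; only the first contains spaces.
        foreach (string f in SplitFromRight("VW Golf 1.9 SE 2003 7500 4", 4))
            Console.WriteLine(f);
    }
}
```

Unlike the underscore substitution, this leaves any underscores that are already in the make/model value untouched.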

      

0




As someone else pointed out, string.Split() takes a parameter, so you can pass ',' to split on commas; spaces inside the values then don't matter. However, unless you're really sure you won't have values that contain commas, I don't suggest this. Parsing CSV files is a bit more complicated than it might seem at first glance (handling quotes and values containing commas), and I suggest using one of the existing libraries for this, such as http://www.codeproject.com/KB/database/CsvReader.aspx.
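
For illustration only, here is a sketch that uses the TextFieldParser class from the Microsoft.VisualBasic.FileIO namespace rather than the CodeProject reader linked above, assuming the Microsoft.VisualBasic assembly is available to the project. The wrapper class and method names are hypothetical.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class CsvLineReader
{
    // Parses one CSV line, honouring double-quoted values that contain commas.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    static void Main()
    {
        foreach (string field in ParseLine("Nissan,\"Almera 1.4, TDi\",1999"))
            Console.WriteLine(field);
    }
}
```

For a whole file, the same parser can be constructed on the file path and ReadFields() called in a loop until EndOfData is true.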

0




The Split() method takes a char parameter that can be used to specify the delimiter. Therefore, you can do something like:

    // line is one string from the array returned by File.ReadAllLines
    string[] fields = line.Split(',');

      

Judging from your question, all car brands must be comma separated, so this should work.

-1

