How can I split a C# string when an array element can span multiple words?
I am working on a small project to take a CSV file and then insert it into an HTML table (I would use datagrid and dataset or datatable, but the system I will be talking to does not support ASP.NET uploads for sending newsletters).
Anyway, I will use the File.ReadAllLines method to read the contents of the CSV file into an array of strings.
Then, for each string in the array, I will use string.Split to split it into an array of fields. The problem is that the CSV content is cars (the CSV file is generated by the system I am talking to, by the way: I am fetching data from that system and loading data into it). This means I could have:
Nissan Almera
Nissan Almera 1.4 TDi
VW Golf 1.9 SE
Etc...
How can I ensure that, for example, Almera 1.4 TDi ends up as one element in the array? I am splitting each line, rather than individual elements.
I don't know much about cars, but could you use the main brand names as the separator instead of spaces?
EG: Nissan Almera Nissan _X100_Ultra_Model Ford Prefect Toyota Foo Bar Honda Prius
Parsing on the major brands (Nissan, Ford, Toyota, Honda) would produce:
- Nissan Almera
- Nissan _X100_Ultra_Model
- Ford Prefect
- Toyota Foo Bar
- Honda Prius
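A minimal sketch of the brand-as-separator idea, assuming the list of brands is known in advance and that no model name contains a brand name as a whole word. The class and method names (BrandSplitter, SplitOnBrands) are made up for illustration. It splits on the whitespace that sits immediately before a brand name, so each brand stays attached to its model:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BrandSplitter
{
    // Hypothetical helper: splits 'text' wherever whitespace is followed by one of 'brands'.
    public static string[] SplitOnBrands(string text, string[] brands)
    {
        // Escape each brand so characters such as '.' or '+' are treated literally.
        string alternation = string.Join("|", brands.Select(Regex.Escape));

        // The lookahead (?=...) checks for a brand without consuming it,
        // so the brand remains at the start of the next piece.
        string pattern = @"\s+(?=(?:" + alternation + @")\b)";

        return Regex.Split(text.Trim(), pattern);
    }
}
```

Called with the example line above and the brands Nissan, Ford, Toyota and Honda, this should return the five entries listed. A model whose name includes a brand word would be split in the wrong place, so this only works if the data really follows that pattern.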
You will need to use a regular expression.
I'm not sure you really want a regex, though: you can solve the problem with one, and then you have two problems.
Step 1. A 5-second Google search for regex csv
turns up a blog entry with this pattern:
,(?=([^"]*"[^"]*")*(?![^"]*"))
At first glance this seems to do the trick: the regex matches the position of each comma, but does not match commas inside quoted strings. So I think it would be pretty trivial to turn this into something useful, or at least it gives you a starting point.
Remember, it fails if you have an input string like
123,456,"Unbalanced quote
because then it doesn't match any comma at all.
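As a rough sketch of how that pattern could be used from C#, assuming the standard System.Text.RegularExpressions.Regex class. The wrapper class name is made up, and the inner group has been changed to a non-capturing (?:...) group, because Regex.Split would otherwise add the captured text to the result array:

```csharp
using System.Text.RegularExpressions;

public static class QuotedCommaSplitter
{
    // Matches a comma only when the rest of the line contains an even number
    // of double quotes, i.e. when the comma is not inside a quoted value.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Splits one line of text on commas outside quotes.
    // The surrounding quotes are kept in the returned values.
    public static string[] Split(string line)
    {
        return CommaOutsideQuotes.Split(line);
    }
}
```

For a line such as Nissan,"Almera, 1.4 TDi",2005 this should give three values, with the quoted one still wrapped in its quotes. For the unbalanced example above it returns the whole line as a single value, which is the failure described.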
Step 2. Another Google search, this time for c# split csv files
turns up CSV File Parser and Writer in C# (Part 3) (but check parts 1 and 2 for the code).
It looks much more robust and even has test cases.
Since there is no standard CSV format, you have to be the judge of whether this works for the input files you allow.
As I understand it, the problem is:
- The lines in the file being parsed are not CSV; they are separated by spaces.
- The value of the first field of each line (make / model) can contain 0 or more actual spaces.
- Other field values on each line do not contain spaces, so the space separator works fine for them.
Let's say you have four columns and the first column value should be "Nissan Almera 1.4 TDi". Using a normal Split() will result in 7 fields, not 4.
(untested code)
First, split it up:
int numFields = 4;
string[] myFields = myLine.Split(' ');
Then fix the array:
int extraSpaces = myFields.Length - numFields;
if (extraSpaces > 0) {
    // Piece together element 0 in the array by appending the extra elements
    for (int n = 1; n <= extraSpaces; n++) {
        myFields[0] += " " + myFields[n];
    }
    // Move the other values back to elements 1, 2, and 3 of the array
    for (int n = 1; n < numFields; n++) {
        myFields[n] = myFields[n + extraSpaces];
    }
}
Finally, ignore every element of the array outside of the four that you really wanted to parse.
Another approach would be regular expressions. I think something like this will work:
// requires: using System.Text.RegularExpressions;
Match m = Regex.Match(myLine, @"^(.*) ([^ ]+) ([^ ]+) ([^ ]+)$");
// Group 1 is greedy, so it takes everything except the last three space-separated values
string MakeModel = m.Groups[1].Value;
string ModelYear = m.Groups[2].Value;
string Price = m.Groups[3].Value;
string NumWheels = m.Groups[4].Value;
There is no splitting and no array fix-up here, just groups captured by the regex. Check m.Success before reading the groups if a line might have fewer than four fields.
If there were a built-in String.Reverse() method (there isn't), I might suggest reversing the raw line, using the VB.NET Replace() function with a Count parameter to replace every space after the first three (assuming four fields) with an underscore, then reversing it again and splitting it. Treat this as pseudocode only. Something like:
string[] myFields = Microsoft.VisualBasic.Replace(myLine.Reverse(), " ", "_", 0, 3).Reverse().Split(' ');
myFields[0] = myFields[0].Replace("_", " "); //fix the underscores
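The same right-to-left idea can be sketched without reversing anything, by walking backwards through the line with LastIndexOf. This is only an illustration: the names RightSplit and SplitFromRight are made up, and it assumes fields are separated by exactly one space and that only the first field may contain spaces.

```csharp
using System;

public static class RightSplit
{
    // Cuts 'line' at its last (numFields - 1) spaces, so any earlier
    // spaces remain inside the first field.
    public static string[] SplitFromRight(string line, int numFields)
    {
        var fields = new string[numFields];
        int end = line.Length; // exclusive end of the field currently being cut

        for (int i = numFields - 1; i > 0; i--)
        {
            // Search backwards for the space that precedes field i.
            int space = end > 0 ? line.LastIndexOf(' ', end - 1) : -1;
            if (space < 0)
            {
                throw new FormatException("Too few fields in line: " + line);
            }
            fields[i] = line.Substring(space + 1, end - space - 1);
            end = space;
        }

        // Whatever is left, spaces included, is the first field.
        fields[0] = line.Substring(0, end);
        return fields;
    }
}
```

With a line like "Nissan Almera 1.4 TDi 2005 4500 4" and four fields, the first element should come back as "Nissan Almera 1.4 TDi", so no underscore fix-up is needed afterwards.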
As someone else pointed out, string.Split() takes a separator parameter, so you can pass ',' to split on commas; then it doesn't matter if you have spaces in the values. However, unless you're really sure you won't have values that contain commas, I don't suggest this. Parsing CSV files is a bit more complicated than it might seem at first glance (handling quotes and values containing commas), and I suggest using one of the existing libraries for this, such as http://www.codeproject.com/KB/database/CsvReader.aspx.
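To illustrate what an existing parser buys you, here is a small sketch that uses TextFieldParser from the Microsoft.VisualBasic.FileIO namespace instead of the CodeProject reader linked above (a swapped-in choice, since it ships with .NET). The wrapper class CsvLines and its Parse method are made up for the example, and on older project types it may require a reference to the Microsoft.VisualBasic assembly.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLines
{
    // Parses comma-delimited text and returns one string[] per row.
    // Quoted values may contain commas; the quotes themselves are removed.
    public static List<string[]> Parse(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

A value such as "Nissan Almera 1.4 TDi" needs no special handling here because spaces are not delimiters, and a quoted value such as "4,500" should come back as a single field.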