# User-defined mapping of CSV to POCO

I have a system that reads input from a Serial / UDP / TCP source. The input data is just CSV made up of different data types (e.g. DateTime, double, int, string). An example line could be:

2012/03/23 12:00:00,1.000,23,information,1.234

      

I currently have (untested) code that allows the user to specify which value in a CSV list goes to which POCO property.

So, in the above example, I will have an object like this:

public class InputData
{
    public DateTime Timestamp { get; set; }
    public double Distance { get; set; }
    public int Metres { get; set; }
    public string Description { get; set; }
    public double Height { get; set; }
}

      

Now, this class has a method that parses the CSV string and populates the properties. The method also requires mapping information, as there is no guarantee of the order in which the CSV values will arrive - it is up to the user to determine the correct order.

This is my mapping class:

//This general class handles mapping CSV to objects
public class CSVMapping
{
    //A dictionary holding Property Names (Key) and CSV indexes (Value)
    //0 Based index
    public IDictionary<string, int> Mapping { get; set; }
}

      

Now, my ParseCSV() method:

//use reflection to parse the CSV survey input
public bool ParseCSV(string pCSV, CSVMapping pMapping)
{
    if (pMapping == null) return false;

    Type t = this.GetType();
    IList<PropertyInfo> properties = t.GetProperties();
    //Split the CSV values
    string[] values = pCSV.Split(',');
    //for each mapped property, set its value from the CSV
    foreach (PropertyInfo prop in properties)
    {
        //note: prop.GetType() would return typeof(PropertyInfo);
        //the declared type of the property is prop.PropertyType
        int index;
        if (!pMapping.Mapping.TryGetValue(prop.Name, out index)) continue;
        if (index < 0 || index >= values.Length) continue;

        if (prop.PropertyType == typeof(DateTime))
        {
            DateTime tmp;
            DateTime.TryParse(values[index], out tmp);
            prop.SetValue(this, tmp, null);
        }
        else if (prop.PropertyType == typeof(int))
        {
            //parse as double first so values like "23.0" still convert
            double tmp;
            double.TryParse(values[index], out tmp);
            prop.SetValue(this, Convert.ToInt32(tmp), null);
        }
        else if (prop.PropertyType == typeof(double))
        {
            double tmp;
            double.TryParse(values[index], out tmp);
            prop.SetValue(this, tmp, null);
        }
        else if (prop.PropertyType == typeof(string))
        {
            prop.SetValue(this, values[index], null);
        }
    }
    return true;
}
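
For the sample line above, the user-supplied mapping and the call might look like this (a hypothetical usage sketch; the index values simply follow the example line's order, but any order the user chooses would work):

var mapping = new CSVMapping
{
    Mapping = new Dictionary<string, int>
    {
        { "Timestamp", 0 },   //2012/03/23 12:00:00
        { "Distance", 1 },    //1.000
        { "Metres", 2 },      //23
        { "Description", 3 }, //information
        { "Height", 4 }       //1.234
    }
};

var data = new InputData();
bool ok = data.ParseCSV("2012/03/23 12:00:00,1.000,23,information,1.234", mapping);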

      

Now for my question:

I have potentially several classes that will require this functionality. Would it be worthwhile to implement a generic class or a parsing extension method? Is my method a good way to parse CSV data and populate my object - or is there a better way to do this?

I've read other questions on dynamic CSV parsing, but they all deal with an order that is known before execution, whereas I need the user to determine the order.



2 answers


OleDb is great for parsing CSV data, and you don't need reflection to do it. Here's the basic idea for a mapper built on OleDb (a minimal sketch of the load step follows the list):

  • The user defines the mapping (using a delegate, a fluent interface, or whatever), and it ends up in a dictionary in your Mapper class.
  • The Parser creates a DataTable and inserts the columns from the mapper.
  • The Parser creates an OleDbConnection, adapter, and command, and populates the DataTable from the CSV file with the correct types.
  • The Parser selects IDataRecords from the DataTable, and your Mapper maps each IDataRecord to an object. For a guide to mapping records to objects, I would recommend reading the source of ORM mappers like Dapper.NET, Massive, and PetaPoco.

OleDb CSV parsing: Load csv into oleDB and force all outputted data types to string
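
A minimal sketch of the load step, assuming the CSV text has already been written to a file on disk (the class and method names here are mine; the provider and connection-string syntax follow the linked question, and the Jet text driver is 32-bit/Windows only):

using System.Data;
using System.Data.OleDb;

public static class OleDbCsvLoader
{
    //For the Jet text driver, the directory acts as the "database"
    //and the file name as the "table".
    public static DataTable LoadCsv(string directory, string fileName)
    {
        string connectionString =
            "Provider=Microsoft.Jet.OLEDB.4.0;" +
            "Data Source=" + directory + ";" +
            "Extended Properties='text;HDR=Yes;FMT=Delimited'";

        var table = new DataTable();
        using (var connection = new OleDbConnection(connectionString))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [" + fileName + "]", connection))
        {
            adapter.Fill(table); //the driver infers the column types from the data
        }
        return table;
    }
}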

UPDATE

Since only a single line is involved, a simple and straightforward approach is best. So, taking the questions in turn:



Implementing a generic class: if there is no need to take the parsing any further (no new types in the future, no extra restrictions or features), I would go for a static class that accepts the object, the string, and the mapping information. It would be almost the same as what you have now. Here's a slightly modified version (it may not compile, but it should convey the general idea):

using System;
using System.Linq;
using System.Reflection;

public static class CSVParser
{
    public static void FillPOCO(object poco, string csvData, CSVMapping mapping)
    {
        //note: the filter must compare property *names* against the mapping keys
        PropertyInfo[] relevantProperties = poco.GetType().GetProperties()
            .Where(x => mapping.Mapping.Keys.Contains(x.Name))
            .ToArray();
        string[] dataStrings = csvData.Split(',');

        foreach (PropertyInfo property in relevantProperties)
            SetPropertyValue(poco, property, dataStrings[mapping.Mapping[property.Name]]);
    }

    private static void SetPropertyValue(object poco, PropertyInfo property, string value)
    {
        //convert the string to the property's declared type (see below)
        object typed = Convert.ChangeType(value, property.PropertyType);
        property.SetValue(poco, typed, null);
    }
}
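
Hypothetical usage, with the InputData class and the mapping built earlier in the question:

var data = new InputData();
CSVParser.FillPOCO(data, "2012/03/23 12:00:00,1.000,23,information,1.234", mapping);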

      

As for converting a string to a typed value, there is the Convert.ChangeType method, which handles most of the cases. Booleans are a particular problem (it throws when given "0" instead of "false").
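
A minimal sketch of one way around that edge case, as a drop-in replacement for the conversion inside SetPropertyValue above (treating "1" as true is an assumption about the input format):

private static object ChangeType(string value, Type targetType)
{
    //Boolean.Parse accepts only "true"/"false", so numeric flags like
    //"0"/"1" make Convert.ChangeType throw a FormatException
    if (targetType == typeof(bool))
        return value.Trim() == "1" ||
               value.Trim().Equals("true", StringComparison.OrdinalIgnoreCase);

    return Convert.ChangeType(value, targetType);
}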

As for performance: although reflection is considered slow, for single objects parsed infrequently it should be sufficient, because it is easy and simple. The usual ways to attack the POCO-population problem are:

  • generating a transform method at runtime (reflection is used once to build the method, which is then compiled and called like any other method) - usually implemented with DynamicMethod, expression trees, etc.; there are many topics on this here on SO, and a sketch follows below;
  • using dynamic objects (available since C# 4.0), where you can assign and read members you never declared;
  • using the libraries already on the market (usually from ORM systems, as they rely heavily on converting data to objects).
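
For illustration, a minimal sketch of the expression-tree variant (all names here are mine): the delegate is built once per property and then reused, so the per-value cost avoids reflection entirely.

using System;
using System.Linq.Expressions;
using System.Reflection;

public static class SetterCompiler
{
    //builds a strongly typed setter delegate for one property;
    //compile once, then call per row without further reflection
    public static Action<T, object> Compile<T>(PropertyInfo property)
    {
        ParameterExpression target = Expression.Parameter(typeof(T), "target");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");

        //target.Property = (PropertyType)value;
        Expression body = Expression.Assign(
            Expression.Property(target, property),
            Expression.Convert(value, property.PropertyType));

        return Expression.Lambda<Action<T, object>>(body, target, value).Compile();
    }
}

//usage (hypothetical):
//var setMetres = SetterCompiler.Compile<InputData>(typeof(InputData).GetProperty("Metres"));
//setMetres(data, 23);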

Personally, I would measure whether reflection meets my performance needs, and only move to one of those techniques if it doesn't.



I would 100% agree with @Dimitriy on this, as I've written 5-10 CSV parsers in the last few weeks.

Edit: this requires saving the text to a temporary file using something like Path.GetTempFileName(), but that will give you flexibility.

The argument for using a DataTable is that, if the connection string is correct - with Extended Properties='text;FMT=Delimited;HDR=Yes' - the data will go into the DataTable and the column headers (which will help you in this case) will be preserved.
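
A sketch of that temp-file step (receivedText and the file name are illustrative; the Jet text driver treats the directory as the database and the file name as the table):

using System.IO;

//write the received CSV text into its own temp directory:
//the directory becomes the Data Source, the file name the "table"
string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(dir);
File.WriteAllText(Path.Combine(dir, "input.csv"), receivedText);

string connectionString =
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dir +
    ";Extended Properties='text;FMT=Delimited;HDR=Yes'";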

So you could write a CSV like:

Name,Age,City
Dominic,29,London
Bill,20,Seattle

      

This generates a DataTable with the column headings you specified. Otherwise, stick to using ordinals as before.

To integrate this, add a constructor (or an extension method, which I will get to shortly) that extracts the data when passed a DataRow:



public UserData(DataRow row)
{
    // At this point, the row may be reliable enough for you to
    // reference columns by name. If not, fall back to indexes.
    this.Name = Convert.ToString(row.Table.Columns.Contains("Name") ? row["Name"] : row[0]);
    this.Age = Convert.ToInt32(row.Table.Columns.Contains("Age") ? row["Age"] : row[1]);
    this.City = Convert.ToString(row.Table.Columns.Contains("City") ? row["City"] : row[2]);
}
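
Hypothetical usage, given a DataTable loaded as described above:

var users = new List<UserData>();
foreach (DataRow row in table.Rows)
    users.Add(new UserData(row));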

      

Some would argue that the conversion process is not the responsibility of the UserData class, since it is a POCO. Instead, implement an extension method in a class such as ConverterExtensions.cs:

public static class ConverterExtensions
{
    //extension method: note the "this" modifier on the first parameter
    public static void LoadFromDataRow(this UserData data, DataRow row)
    {
        data.Name = Convert.ToString(row.Table.Columns.Contains("Name") ? row["Name"] : row[0]);
        data.Age = Convert.ToInt32(row.Table.Columns.Contains("Age") ? row["Age"] : row[1]);
        data.City = Convert.ToString(row.Table.Columns.Contains("City") ? row["City"] : row[2]);
    }
}
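
With the extension method in scope, usage is a one-liner (sketch):

var data = new UserData();
data.LoadFromDataRow(row); //fills the POCO from the DataRow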

      

A more architecturally sound technique is to implement an interface that defines the transform. Implement the interface with the conversion process, then hold a reference to the implementation inside the POCO. It will do the conversion for you, keeping the mapping completely separate and keeping your POCO nice and tidy. It also allows "pluggable" mappings.

public interface ILoadFromDataRow<T>
{
    void LoadFromDataRow(T data, DataRow dr);
}

public class UserLoadFromDataRow : ILoadFromDataRow<UserData>
{
    public void LoadFromDataRow(UserData data, DataRow dr)
    {
        data.Name = Convert.ToString(dr.Table.Columns.Contains("Name") ? dr["Name"] : dr[0]);
        data.Age = Convert.ToInt32(dr.Table.Columns.Contains("Age") ? dr["Age"] : dr[1]);
        data.City = Convert.ToString(dr.Table.Columns.Contains("City") ? dr["City"] : dr[2]);
    }
}

public class UserData
{
    private ILoadFromDataRow<UserData> converter;

    //default parameter values must be compile-time constants, so the
    //converter defaults to null and falls back to the standard implementation
    public UserData(DataRow dr = null, ILoadFromDataRow<UserData> converter = null)
    {
        this.converter = converter ?? new UserLoadFromDataRow();

        if (dr != null)
            this.converter.LoadFromDataRow(this, dr);
    }

    // POCO as before
}

      

In your scenario, I would go with the extension methods. The interface approach (a form of interface segregation) was the way to implement this kind of separation before extension methods existed.







