How to deserialize only part of a large XML file into C# classes?

I've already read some posts and articles on how to deserialize XML, but I still haven't figured out how to write the code for my needs, so I apologize for yet another question about deserializing XML.

I have a large (50 MB) XML file that I need to deserialize. I used xsd.exe to get the XSD schema of the document and then autogenerated the C# classes file, which I added to my project. I want to get some (not all) of the data from this XML file and store it in my SQL database.

Here is the hierarchy of the generated classes (simplified; the XSD is very large):

public class yml_catalog 
{
    public yml_catalogShop[] shop { /* implementation */ }
}

public class yml_catalogShop
{
    public yml_catalogShopOffersOffer[][] offers { /* implementation */ }
}

public class yml_catalogShopOffersOffer
{
    // all the data (properties) I want to obtain goes here
}

      

And here is my code:

First approach:

yml_catalogShopOffersOffer catalog;
var serializer = new XmlSerializer(typeof(yml_catalogShopOffersOffer));
var reader = new StreamReader(@"C:\div_kid.xml");
catalog = (yml_catalogShopOffersOffer) serializer.Deserialize(reader); // exception occurs
reader.Close();

      

I am getting InvalidOperationException: There is an error in the XML document (3,2)

Second approach:

XmlSerializer ser = new XmlSerializer(typeof(yml_catalogShopOffersOffer));
yml_catalogShopOffersOffer result;
using (XmlReader reader = XmlReader.Create(@"C:\div_kid.xml"))          
{
    result = (yml_catalogShopOffersOffer)ser.Deserialize(reader); // exception occurs
}

      

InvalidOperationException: There is an error in the XML document (0,0)

Third approach: I tried to deserialize the whole file:

 XmlSerializer ser = new XmlSerializer(typeof(yml_catalog)); // exception occurs
 yml_catalog result;
 using (XmlReader reader = XmlReader.Create(@"C:\div_kid.xml"))          
 {
     result = (yml_catalog)ser.Deserialize(reader);
 }

      

And I get the following:

error CS0030: Cannot convert type 'yml_catalogShopOffersOffer[]' to 'yml_catalogShopOffersOffer'

error CS0029: Cannot implicitly convert type 'yml_catalogShopOffersOffer' to 'yml_catalogShopOffersOffer[]'

      

So how should I fix (or rewrite) my code so that I don't get an exception?

Edit: Also when I write:

XDocument doc = XDocument.Parse(@"C:\div_kid.xml");

      

An XmlException is thrown: Data at the root level is invalid. Line 1, position 1.

Here is the first line of the xml file:

<?xml version="1.0" encoding="windows-1251"?>

      

Edit 2: Example XML file:

<?xml version="1.0" encoding="windows-1251"?>
<!DOCTYPE yml_catalog SYSTEM "shops.dtd">
<yml_catalog date="2012-11-01 23:29">
<shop>
   <name>OZON.ru</name>
   <company>?????? "???????????????? ??????????????"</company>
   <url>http://www.ozon.ru/</url>
   <currencies>
     <currency id="RUR" rate="1" />
   </currencies>
   <categories>
      <category id="1126233">base category</category>
      <category id="1127479" parentId="1126233">bla bla bla</category>
      <!-- all the other categories go here -->
   </categories>
   <offers>
      <offer>
         <price></price>
         <picture></picture>
      </offer>
      <!-- other offers -->
   </offers>
</shop>
</yml_catalog>

      

P.S. I've already accepted the answer (it's perfect). But now I need to find the "base category" for each offer using its categoryId. The data is hierarchical, and the base category is a category that does not have a parentId attribute. So I wrote a recursive method to find the "base category", but it never finishes; the algorithm does not seem to be very fast.
Here is my code (in the Main() method):

var doc = XDocument.Load(@"C:\div_kid.xml");
var offers = doc.Descendants("shop").Elements("offers").Elements("offer");
foreach (var offer in offers.Take(2))
{
    var category = GetCategory(categoryId, doc);
    // here goes other code
}

      

Helper method:

public static string GetCategory(int categoryId, XDocument document)
{
    var tempId = categoryId;
    var categories = document.Descendants("shop").Elements("categories").Elements("category");
    foreach (var category in categories)
    {
        if (category.Attribute("id").ToString() == categoryId.ToString())
        {
            if (category.Attributes().Count() == 1)
            {
                return category.ToString();
            }
            tempId = Convert.ToInt32(category.Attribute("parentId"));
        }
    }
    return GetCategory(tempId, document);
}

      

Can recursion be used in a situation like this? If not, how else can I find the "base category"?


1 answer


Try LINQ to XML:

XElement result = XElement.Load(@"C:\div_kid.xml");
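A note on the XmlException from the question's edit: XDocument.Parse expects the XML markup itself as its argument, whereas XDocument.Load (like XElement.Load) accepts a file path. Passing a path to Parse makes the parser treat "C:\..." as document content, which is why it fails at line 1, position 1. The sketch below illustrates the difference; the class and method names are hypothetical and exist only for this example.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class ParseVersusLoad
{
    // XDocument.Parse takes XML markup (not a path) and builds the tree from it.
    public static string RootNameFromMarkup(string xml) =>
        XDocument.Parse(xml).Root.Name.LocalName;

    // Returns true when Parse rejects the text as XML,
    // which is what happens when a file path is passed instead of markup.
    public static bool ParseRejects(string text)
    {
        try
        {
            XDocument.Parse(text);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

For a file on disk, XDocument.Load(@"C:\div_kid.xml") is the call to use, as in the later snippets of the question.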

Querying with LINQ is brilliant, though admittedly a little odd at first. You select nodes from the document using SQL-like query syntax or lambda expressions, then create anonymous objects (or use existing classes) containing the data of interest.

It's best to see it in action.

Based on your XML sample and code, here's a specific example:

var element = XElement.Load(@"C:\div_kid.xml");
var shopsQuery =
    from shop in element.Descendants("shop")
    select new
    {
        Name = (string) shop.Descendants("name").FirstOrDefault(),
        Company = (string) shop.Descendants("company").FirstOrDefault(),
        Categories = 
            from category in shop.Descendants("category")
            select new {
                Id = category.Attribute("id").Value,
                Parent = (string) category.Attribute("parentId"), // null for a base category, which has no parentId
                Name = category.Value
            },
        Offers =
            from offer in shop.Descendants("offer")
            select new { 
                Price = (string) offer.Descendants("price").FirstOrDefault(),
                Picture = (string) offer.Descendants("picture").FirstOrDefault()
            }

    };

foreach (var shop in shopsQuery){
    Console.WriteLine(shop.Name);
    Console.WriteLine(shop.Company);
    foreach (var category in shop.Categories)
    {
        Console.WriteLine(category.Name);
        Console.WriteLine(category.Id);
    }
    foreach (var offer in shop.Offers)
    {
        Console.WriteLine(offer.Price);
        Console.WriteLine(offer.Picture);
    }
}  
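XElement.Load reads the whole document into memory, which is usually acceptable for a 50 MB file. If memory does become a concern, a commonly used alternative (not part of the answer above) is to stream with XmlReader and materialize only one element at a time via XNode.ReadFrom. The following is a minimal sketch under that assumption; OfferStreaming and StreamElements are hypothetical names, and DtdProcessing.Ignore is chosen here so that the DOCTYPE line in the sample is skipped instead of resolved.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class OfferStreaming
{
    // Lazily yields every element with the given local name, one at a time,
    // so only the current element is held in memory as an XElement.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // skip the DOCTYPE instead of loading shops.dtd
            IgnoreWhitespace = true
        };

        using (var reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() is needed here.
                    yield return (XElement) XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded element can then be queried exactly like the offer elements above, for example (string) offer.Element("price"). For a file on disk, a StreamReader opened on the path can be passed as the input.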

      

As an optional extra, here's how to deserialize a tree of categories from the flat category elements. You need a proper class to hold them, because the list of children must have a named type:

class Category
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public List<Category> Children { get; set; }
    public IEnumerable<Category> Descendants {
        get
        {
            return (from child in Children
                    select child.Descendants).SelectMany(x => x).
                    Concat(new Category[] { this });
        }
    }
}

      

To create a list containing all the individual categories in a document:

var categories = (from category in element.Descendants("category")
                    orderby int.Parse( category.Attribute("id").Value )
                    select new Category()
                    {
                        Id = int.Parse(category.Attribute("id").Value),
                        ParentId = category.Attribute("parentId") == null ?
                            null as int? : int.Parse(category.Attribute("parentId").Value),
                        Children = new List<Category>()
                    }).Distinct().ToList();

      

Then arrange them into a tree (heavily borrowed from an answer on turning a flat list into a hierarchy):

var lookup = categories.ToLookup(cat => cat.ParentId);
foreach (var category in categories)
{
    category.Children = lookup[category.Id].ToList();
}
var rootCategories = lookup[null].ToList();

      

To find the root that contains theCategory:

var root = (from cat in rootCategories
            where cat.Descendants.Contains(theCategory)
            select cat).FirstOrDefault();
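If only the base category of a given id is needed, building the full tree is optional. One alternative sketch is to map each category id to its parent id once and then follow the parent links in a loop until a category without a parent is reached. This replaces the recursion from the question with iteration and avoids rescanning the document on every step. The names CategoryLookup, BuildParentMap, and FindBaseCategoryId are hypothetical, and the code assumes category ids are unique integers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class CategoryLookup
{
    // Maps each category id to its parent id (null for a base category).
    // Assumes ids are unique; ToDictionary throws on duplicates.
    public static Dictionary<int, int?> BuildParentMap(XElement root) =>
        root.Descendants("category").ToDictionary(
            c => (int) c.Attribute("id"),
            c => (int?) c.Attribute("parentId")); // cast yields null when the attribute is absent

    // Follows parent links upward until it reaches a category with no parent.
    public static int FindBaseCategoryId(Dictionary<int, int?> parents, int categoryId)
    {
        var visited = new HashSet<int>();
        var current = categoryId;

        while (true)
        {
            if (!parents.TryGetValue(current, out var parent))
                throw new KeyNotFoundException("Unknown category id: " + current);
            if (parent == null)
                return current; // no parentId: this is the base category
            if (!visited.Add(current))
                throw new InvalidOperationException("Cycle in category hierarchy at id " + current);
            current = parent.Value;
        }
    }
}
```

The map is built once per document, for example var parents = CategoryLookup.BuildParentMap(element), and each lookup then costs one dictionary access per level of the hierarchy.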

      
