C# locking and general thread design

I have a streaming console app that works great, but needs to be improved and I need some feedback.

The program currently loads a list of data and splits it into sections (one chunk for each stream). It then queues a new worker on the ThreadPool and hands ONE segment of the partitioned data to it to work on.

Everything works nicely ... except:

Some of the streams don't work ... due to network issues or fatal exceptions. This is expected behavior, not a bug.

I now need a way, if a thread fails, to recover that segment of the data stream and hand it to another worker thread so it doesn't become orphaned. I'm sure there are ways to do this, e.g. inter-thread data exchange, but I think there may be a better approach.

Instead of pre-segmenting the data and passing a piece to each thread, I could share ONE static collection of that data between all threads. This is more elegant, but it introduces synchronization problems the old approach never had to deal with.

A.) What are your thoughts on this approach compared to the old one?
B.) If this approach is good, how do I go about locking access to a shared static collection?

When a stream is initialized, I can lock the collection and carve off the data segment for that stream only; the static collection then SHRINKS by the amount taken for that thread. After a thread FAILURE, I could reallocate that data segment by locking the shared collection again and putting the data back into it for other threads to try to process.

For example: (untested pseudocode)

void Process(object threadInfo)
{
  // DataItem stands in for whatever the element type really is;
  // Take() needs a using System.Linq directive.
  List<DataItem> segment;

  lock (StaticCollection)
  {
    // snapshot up to 100 items, then remove exactly those items
    segment = StaticCollection.Take(100).ToList();
    StaticCollection.RemoveAll(item => segment.Contains(item));
  }

  bool unrecoverableErrorOccurred = false;

  foreach (var item in segment)
  {
    // do something; set unrecoverableErrorOccurred on a fatal failure
  }

  // on failure, put the segment back so another thread can pick it up
  if (unrecoverableErrorOccurred)
  {
    lock (StaticCollection)
    {
      StaticCollection.AddRange(segment);
    }
  }
}


Am I on the right track with this? My worry is that one thread could be removing items at the same time another thread is putting items back ... or does locking on a STATIC collection mean that no other thread can access that collection at all? In other words, if Thread A holds the lock in the FIRST part of the method, are all other threads blocked from executing the LAST part of the method until Thread A releases it?

+2




2 answers


A few things here ...

First, you are not actually locking the collection. You are acquiring the monitor associated with the object. I personally think it was a mistake for .NET to follow Java in giving every object an associated monitor, but leave that to one side. Personally, I prefer to have dedicated objects purely for locking, separate from the variables they guard - so in my code you would see:

private readonly object padlock = new object();


This ensures that no other code will try to acquire this lock, because it won't know about the object.
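For illustration only (the class and field names below are mine, not from the question), a minimal sketch of that pattern might look like this:

using System.Collections.Generic;

public class WorkItemStore
{
    // dedicated lock object; code outside this class cannot lock on it
    private readonly object padlock = new object();
    private readonly List<string> items = new List<string>();

    public void Add(string item)
    {
        lock (padlock)
        {
            items.Add(item);
        }
    }

    public int Count
    {
        get
        {
            lock (padlock)
            {
                return items.Count;
            }
        }
    }
}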



Second, locks are advisory. That is part of the "you're not locking the collection" point. The collection itself would have to synchronize on the same lock (the non-generic collections expose a Synchronized method for this sort of thing), but basically, unless something explicitly takes out the lock, you will not get any synchronization.
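To make the "advisory" point concrete, here is a small sketch (the names are mine): the unguarded write below compiles and runs, but nothing forces it to take the lock, so it can race with the guarded write.

using System.Collections;
using System.Collections.Generic;

class AdvisoryLockDemo
{
    static readonly object gate = new object();
    static readonly List<int> shared = new List<int>();

    static void GuardedWrite()
    {
        lock (gate)
        {
            shared.Add(1);   // safe only because every writer agrees to take gate
        }
    }

    static void UnguardedWrite()
    {
        shared.Add(2);       // nothing stops this; it can corrupt the list if run concurrently
    }

    static void NonGenericWrapper()
    {
        // the old non-generic collections offer a Synchronized wrapper instead
        ArrayList wrapped = ArrayList.Synchronized(new ArrayList());
        wrapped.Add(3);
    }
}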

Third, yes, the two lock blocks shown in your code use the same lock (assuming the value of StaticCollection doesn't change, of course). If one thread is busy removing items inside the first block, that will stop any other thread from adding items back at the same time, because each thread must acquire the lock first. That is probably what you want.

I personally wouldn't make it a truly static collection, though (or rather, I wouldn't use a StaticCollection variable). I would give each task a reference to the same collection (and a reference to the associated lock; in fact, I would probably encapsulate the collection, the synchronization, the "get me a chunk of work" bit and the "here is a chunk of work to hand back" bit in a separate class). That makes testing easier and is generally more logical. It also means you could have two separate "sets" of threads running against different collections at the same time ... which can be useful if you make that encapsulation generic, so they can perform radically different tasks ...
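A rough sketch of that kind of encapsulation (the class and member names are mine, and it assumes failed items can simply be handed back as-is):

using System.Collections.Generic;

// Hypothetical helper: owns the collection and its lock, hands out batches
// of work and accepts failed batches back for other threads to retry.
public class WorkPool<T>
{
    private readonly object padlock = new object();
    private readonly Queue<T> items;

    public WorkPool(IEnumerable<T> initialItems)
    {
        items = new Queue<T>(initialItems);
    }

    // "get me a chunk of work": hand out up to batchSize items
    public List<T> TakeBatch(int batchSize)
    {
        lock (padlock)
        {
            var batch = new List<T>();
            while (batch.Count < batchSize && items.Count > 0)
            {
                batch.Add(items.Dequeue());
            }
            return batch;
        }
    }

    // "here is a chunk of work to hand back": re-queue a failed batch
    public void Return(IEnumerable<T> batch)
    {
        lock (padlock)
        {
            foreach (var item in batch)
            {
                items.Enqueue(item);
            }
        }
    }
}

Each worker would then call TakeBatch when it starts and Return in its failure path, with no static field involved, and you can create as many independent pools as you like.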

+4




You can use a Queue to store the unprocessed chunks and, as Jon Skeet says, lock on a neutral object, holding the lock only long enough to access the queue. I have used this approach with many threads and it has worked well for me.
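A minimal sketch of that approach, assuming string chunks and a placeholder Process method (both purely illustrative):

using System;
using System.Collections.Generic;

class ChunkWorker
{
    static readonly object queueLock = new object();
    static readonly Queue<string> pending = new Queue<string>();

    public static void Work()
    {
        while (true)
        {
            string chunk;
            lock (queueLock)
            {
                if (pending.Count == 0) return;   // no work left
                chunk = pending.Dequeue();        // lock held only for the dequeue
            }

            try
            {
                Process(chunk);                   // long-running work runs outside the lock
            }
            catch (Exception)
            {
                lock (queueLock)
                {
                    pending.Enqueue(chunk);       // put the chunk back for another thread
                }
            }
        }
    }

    static void Process(string chunk)
    {
        // placeholder for the real streaming work
    }
}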



0








