Is it advisable to use a HashSet as a static global on a web server?

I want to create a "temporary lookup cache" to speed up file checks on my web server.

Right now I have a folder of images, and when a user requests an image I call File.Exists(...) to check whether it exists locally. If it doesn't, I redirect the user to download it from another server instead.

The problem is that with many requests, File.Exists() quickly becomes a bottleneck. I would like to keep a quick-and-dirty HashSet of filenames that are known to exist in the local folder, so that when a user requests a file that is already in the HashSet, I just redirect without calling File.Exists(); if it isn't in the HashSet, I call File.Exists() and then add the filename to the set.

I know the HashSet will be reset if the server ever restarts, but I'm not worried about that, because it will quickly rebuild itself with the most-requested images using the logic above.
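Roughly, what I have in mind is the sketch below (class and method names are made up for illustration, and there is no synchronization yet, which is exactly what I'm unsure about):

    using System.Collections.Generic;
    using System.IO;

    public static class ImageLookup
    {
        // Shared across all requests because it is static.
        private static readonly HashSet<string> KnownLocalFiles = new HashSet<string>();

        public static bool ExistsLocally(string fileName, string localFolder)
        {
            // Fast path: we've already seen this file, skip the disk check.
            if (KnownLocalFiles.Contains(fileName))
                return true;

            // Slow path: hit the file system once, then remember the result.
            if (File.Exists(Path.Combine(localFolder, fileName)))
            {
                KnownLocalFiles.Add(fileName);
                return true;
            }

            return false; // caller redirects to the other server
        }
    }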

The main question is: since this object will be accessed by multiple users, and various requests will add items to the set, could that cause a problem? So far the only static globals I've used on web servers are DB connection strings, email addresses for sending alerts, and the like.

Edit regarding race conditions:
Yes, I was thinking about race conditions. But can a race condition actually occur on a single HashSet? Since it only allows unique values, wouldn't the second attempt to add the same value simply fail? In that case I would just ignore the error and continue.

+2




2 answers


It makes sense. However, make sure you take care of the possible race conditions. You can use the ReaderWriterLockSlim class to control access to the object from different threads.

UPDATE:



You absolutely need to lock the object appropriately, since the Add method is not an atomic operation. You could even leave the object in an inconsistent state by adding two items at the same time.
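For example, a minimal sketch of guarding the set with ReaderWriterLockSlim might look like this (class and member names are illustrative, not taken from your code):

    using System.Collections.Generic;
    using System.IO;
    using System.Threading;

    public static class ImageLookup
    {
        private static readonly HashSet<string> KnownLocalFiles = new HashSet<string>();
        private static readonly ReaderWriterLockSlim CacheLock = new ReaderWriterLockSlim();

        public static bool ExistsLocally(string fileName, string localFolder)
        {
            // Any number of readers may check the set at the same time.
            CacheLock.EnterReadLock();
            try
            {
                if (KnownLocalFiles.Contains(fileName))
                    return true;
            }
            finally
            {
                CacheLock.ExitReadLock();
            }

            if (!File.Exists(Path.Combine(localFolder, fileName)))
                return false;

            // Only one writer at a time may modify the set.
            CacheLock.EnterWriteLock();
            try
            {
                // Add simply returns false if another thread got here first.
                KnownLocalFiles.Add(fileName);
            }
            finally
            {
                CacheLock.ExitWriteLock();
            }
            return true;
        }
    }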

+3




In addition to the race-condition problems other people have mentioned, your plan raises a number of other problems.

First, a cache without an expiration policy has a name: we call caches that grow without bound and never release their memory "memory leaks". They have a way of bringing servers to their knees. Think about how you will know when your cache has grown too big and it is time to throw it away.

Second, caches are worse than useless if they are liars. Your cache purports to be an acceleration of the file system, but exactly what mechanism do you propose to keep the cache and the file system consistent? Suppose a user accesses a file, that fact is cached, and then another operation deletes the file. Now the cache is out of sync with the thing it is supposed to abstract; that sounds like a potential source of bugs. The next time someone asks about that file, the cache will say it exists even though it no longer does.
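Purely as an illustrative sketch of one way to bound both the growth and the staleness (using System.Runtime.Caching.MemoryCache instead of a bare HashSet; the class name and the five-minute lifetime here are arbitrary choices, not part of your design):

    using System;
    using System.IO;
    using System.Runtime.Caching;

    public static class ImageExistenceCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        // Arbitrary lifetime: a deleted file is reported as present for at most this long.
        private static readonly TimeSpan Lifetime = TimeSpan.FromMinutes(5);

        public static bool ExistsLocally(string fileName, string localFolder)
        {
            // MemoryCache is thread-safe, so no explicit locking is needed here.
            if (Cache.Get(fileName) != null)
                return true;

            if (!File.Exists(Path.Combine(localFolder, fileName)))
                return false;

            // Remember the positive result, but only for a bounded time; MemoryCache
            // also trims itself under memory pressure, so growth is bounded too.
            Cache.Add(fileName, true, DateTimeOffset.UtcNow.Add(Lifetime));
            return true;
        }
    }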



These are, of course, solvable problems; the question is not whether they can be solved, but whether solving them pays for the performance gain you're buying.

Or you could ignore these issues and hope for the best, for example if you believe that the cost of a memory leak or a cache inconsistency is simply a cost of doing business that the performance gain pays for. If that is the case, then it would be wise to make the decision to build a fragile, cheap solution deliberately rather than accidentally, and then not be left wondering later why things are breaking down.

+4








