What's better? File system query or database query?

We have about half a million images residing in the file system. Each product has several images, named after its ProductID. For example, product ID 10010 has 3 images: 10010_1.jpg, 10010_2.jpg and 10010_3.jpg. To display these 3 images in a photo gallery, the current classic ASP code queries the file system. The code looks something like this:

Dim objFSO, i
Set objFSO = Server.CreateObject("Scripting.FileSystemObject") 
i=1
While objFSO.FileExists(Server.MapPath(productid & "_" & i & ".jpg"))
    ' draw the image
    i = i + 1
WEnd

      

Wouldn't it be better to store the image names in a database table and run a query to get a list of the images available for a given product? I'm fairly sure reading from the database is better; I'm just reluctant because it requires a large table with half a million records. Which method is more efficient?

My table will look something like this:

ID      ProductID   ImageURL
-------------------------------------------
1       10010       10010_1.jpg
2       10010       10010_2.jpg
3       10010       10010_3.jpg
4       10011       10011_1.jpg
. . .

      

Any suggestions? I am rewriting the website in ASP.NET MVC 3. Is there anything else I need to take care of in this framework?

+3




4 answers


Half a million records is not really that big for a database.



If your table is indexed correctly, you will get much better performance from the database than the filesystem.
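As a minimal sketch of what "indexed correctly" could mean here, the example below builds the table from the question with an index on ProductID and looks up the images for one product. It uses Python with an in-memory SQLite database purely for illustration; the table name `ProductImages`, the index name, and both function names are assumptions, not part of the original site.

```python
import sqlite3


def build_image_table(rows):
    """Create an in-memory table of (ProductID, ImageURL) rows, indexed on ProductID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProductImages ("
        " ID INTEGER PRIMARY KEY,"
        " ProductID INTEGER NOT NULL,"
        " ImageURL TEXT NOT NULL)"
    )
    # The index lets the lookup below avoid scanning the whole table.
    conn.execute(
        "CREATE INDEX IX_ProductImages_ProductID ON ProductImages (ProductID)"
    )
    conn.executemany(
        "INSERT INTO ProductImages (ProductID, ImageURL) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def images_for_product(conn, product_id):
    """Return the image names stored for one product, in insertion order."""
    cur = conn.execute(
        "SELECT ImageURL FROM ProductImages WHERE ProductID = ? ORDER BY ID",
        (product_id,),
    )
    return [row[0] for row in cur.fetchall()]
```

A single query like this replaces the loop of repeated `FileExists` calls; the same schema and index idea should carry over to whichever database the site actually uses.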

0




IMO you can implement one of two solutions.

  • Hybrid - Use a database to store the file path as it exists on the file system.

  • NoSQL - Move away from the database entirely and go with a NoSQL solution that provides an API to access files on the filesystem.

I have used both approaches, #1 (hybrid) before #2 (NoSQL). Even though I've been using #1 for longer, I find #2 to be the better alternative for storing "asset" data (images, HTML descriptions, video files). The main reason NoSQL is better than MySQL when it comes to accessing asset data is that you don't waste a precious database connection just pulling out file paths.



Another recommendation: split up your directories, using product IDs as parent directories. Consider this:

# yours
dir1
  ... 125K files
dir2
  ... 125K files

# chunked (parent dirs are product ids)
3
  ... 3 files
4 
  ... 2 files
5 
  ... 3 files

      

It is much faster to stat files in a directory with only a few files (<100) than in a directory that contains several thousand files.
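The chunked layout above can be sketched as follows, in Python for illustration only. The functions `product_dir` and `list_product_images` and the `root` parameter are hypothetical names, and the sketch assumes one directory per product ID directly under a single root.

```python
import os


def product_dir(root, product_id):
    """Return the per-product directory, e.g. <root>/10010 (assumed layout)."""
    return os.path.join(root, str(product_id))


def list_product_images(root, product_id):
    """List the .jpg files for one product; empty list if the directory is missing."""
    folder = product_dir(root, product_id)
    if not os.path.isdir(folder):
        return []
    # Only this product's small directory is read, not one huge shared directory.
    return sorted(
        name for name in os.listdir(folder) if name.lower().endswith(".jpg")
    )
```

With this layout one directory listing returns all images for a product, so there is no need to probe for `_1`, `_2`, `_3` one at a time.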

+2




Yes, storing the file path in the DB generally makes sense and is the better design, even if there is some performance loss.

Note that keeping data both on disk (the file) and in the database (the file path) makes a new class of errors possible: the two can become inconsistent. Consider this during migration.
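One way to catch such inconsistencies is a periodic check that compares the names recorded in the database with the files actually on disk. The Python sketch below is only an illustration: `find_inconsistencies` is a hypothetical helper, and it assumes all images live in one flat directory and that the database names have already been fetched into a list.

```python
import os


def find_inconsistencies(db_image_names, image_dir):
    """Compare image names from the database with the files in image_dir.

    Returns (missing_on_disk, missing_in_db), each a sorted list of names.
    """
    on_disk = {
        name
        for name in os.listdir(image_dir)
        if os.path.isfile(os.path.join(image_dir, name))
    }
    in_db = set(db_image_names)
    missing_on_disk = sorted(in_db - on_disk)  # rows pointing at no file
    missing_in_db = sorted(on_disk - in_db)    # files with no row
    return missing_on_disk, missing_in_db
```

Running a check like this right after the migration, and again on a schedule, would surface both broken image links and orphaned files.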

Since part of the question is about performance: prototype and measure with your own dataset if performance is a concern. Things to consider when prototyping / measuring:

  • You don't need to build a website for this, a simple console application that reads data from disk / database would be enough to get you started.
  • Try different file layouts (e.g. storing a huge number of files in one directory can slow down file lookups).
  • Make sure you know whether the data will be cached. If your database is too large to be cached, you will end up with completely different numbers than in the case where it fits in memory. Keep in mind that, depending on your server configuration, other services may be competing for the same memory.
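A measurement along these lines can start from a very small timing harness. The sketch below is in Python for illustration; `time_lookup` is a hypothetical helper that times any lookup function (a file-existence check, a database query) over the same set of keys and keeps the best of several runs.

```python
import time


def time_lookup(lookup, keys, repeats=3):
    """Return the best wall-clock time in seconds for calling lookup(key) on every key."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for key in keys:
            lookup(key)
        best = min(best, time.perf_counter() - start)
    return best
```

Passing the same product IDs to a file-system lookup and to a database lookup gives two comparable numbers. Because repeated runs over the same keys are likely to hit warm caches, it is worth also recording the first (cold) run separately when caching is the thing being investigated.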
0




Some things to think about. I assume this is a .NET stack, so:

  • Simple database space
  • Reading from disk will be faster than reading from a relational database
  • You will have more problems in a load balanced environment if all your images are in the database
  • Reading a lot of data from the database will have a negative impact on other queries
  • It doesn't matter if your presentation layer is MVC, web forms, or a console application. Your problem is database design.
0








