Embedded file system and power-off

I am working on an embedded application without any OS that requires the use of a file system. I have met with the people on the project many times, and some of them agree with me that the system should shut down properly whenever a power failure occurs, or the file system can become corrupted.

Others say that it doesn't matter if you just cut the power and let nature take its course, but I think that is one of the worst things you can do, especially when you know it is going to cause problems and possibly shorten the life of your product.

In the last paragraph I simply assumed that this is a problem, but my question remains:

Does cutting the power have any impact on the file system?

+3




5 answers


For a non-journaled file system, an unexpected power-off can damage some data, including the directory structure. This happens if there is unwritten data in the cache, or if the FS is in the middle of a multi-block update and the interruption occurs when only some of the blocks have been written.

Journaling solves this problem for the most part: if a write is interrupted midway, a recovery procedure or a check-and-repair pass performed by the FS (usually implicitly) brings the file system back to a consistent state. However, this state is not always the latest one. That is, if some data was still in the memory cache, it may be lost even with journaling. Journaling saves you from file system corruption, but it doesn't do magic.



Write-through mode (no write caching) reduces the chance of data loss, but does not completely solve the problem, since the journal itself acts as a kind of cache (for a very short time).

Unfortunately, backups or data duplication remain the main ways to prevent data loss.

+3




Below is a list of various techniques to help an embedded system survive a power failure. Not all of them may be practical for your particular application.



  • Use a journaling file system. It can tolerate incomplete writes caused by power failure, an OS crash, etc. Most modern file systems are journaled, but do your homework to confirm.

  • If your application does not need high write performance, disable all write caching. Check your disk drivers for caching options. On Linux / Unix, consider mounting the file system in sync mode.

  • If it does not need to be writable, make it read-only. Try to keep application executables and operating system files on their own partition, with write protection enforced (e.g. mounted read-only on Linux). Your read/write data should be on its own partition. Even if your application data gets corrupted, your system will still be able to boot (albeit with a failsafe default configuration).

    3a. For data that is written only rarely (for example, configuration settings), try to keep it mounted read-only most of the time. If there is a configuration change, remount it as R/W temporarily, update the data, and then unmount/remount it as read-only.

    3b. Use a technique similar to 3a to handle application / OS updates in the field.

    3c. If it is not practical for you to mount the FS as read-only, at least consider opening the individual files as read-only (e.g. fp = fopen("configuration.ini", "r")).

  • If possible, use separate devices for your storage. Keeping things in separate partitions provides some protection, but there are still edge cases where the partition table can become corrupted and render the entire disk unreadable. Using physically separate devices further isolates you from a single damaged device taking down the entire system. In an ideal world, you would have at least 4 separate devices:

    4a. Bootloader

    4b. Operating system and application code

    4c. Configuration settings

    4d. Application data

  • Know the characteristics of your storage devices and control the make/model/revision of the devices used. Some hard drives ignore OS cache flush commands. We have had cases where some models of CompactFlash cards corrupted themselves during a power failure, while the "industrial" models did not have this problem. Of course, this information was not published in any datasheet and had to be gathered through experimental testing. We developed a list of approved CF cards and kept an inventory of these cards. We had to update this list from time to time because old cards became obsolete or the manufacturer made changes.

  • Place your temporary files on a RAM disk. If you keep these writes off the disk, you eliminate them as a potential source of corruption. You also reduce flash wear.

  • Develop automated methods to detect and recover from corruption. All of the above techniques will not help you if the application just hangs because the configuration file is missing. You should be able to recover as gracefully as possible:

    7a. Your system should maintain at least two copies of its configuration settings, a "primary" and a "backup". If the primary fails for any reason, fall back to the backup. You should also consider mechanisms for creating backups whenever the configuration changes, or after the user declares the configuration "good" (tested in production).

    7b. Application data partition failed to mount? Run chkdsk / fsck automatically.

    7c. chkdsk / fsck could not fix the problem? Reformat the partition automatically and return it to a known state.

    7d. Do you have a boot loader or other method of recovering your OS and application after a crash?

    7e. Make sure your system beeps, lights an LED, or does something else to indicate to the user what has happened.

  • Power failure should be part of your system testing. The only way to make sure you have a reliable system is to test it. Unplug the power cord from the system and write down what happens. Try pulling the power at several points in the system's operation (at runtime, at boot, mid-configuration, etc.). Repeat each test several times.

  • If you can't fix all of your power failure issues, include a battery or supercapacitor in the system. Keep in mind that you will need a background process in your OS to initiate a graceful shutdown when power runs low. In addition, batteries require periodic testing and replacement as they age.
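As a rough illustration of item 7a, here is a minimal sketch of a primary/backup configuration store guarded by a checksum. Everything in it is an assumption made for the example: the `config_t` fields, the default values, the file paths passed in, and the choice of CRC-32 as the integrity check. It only uses standard C file I/O, so it presumes your file system library offers a stdio-like interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CFG_MAGIC 0x43464731u /* arbitrary marker, ASCII "CFG1" */

enum { CFG_FROM_PRIMARY = 0, CFG_FROM_BACKUP = 1, CFG_FROM_DEFAULTS = 2 };

/* Hypothetical settings record; all fields are 32-bit, so no padding. */
typedef struct {
    uint32_t magic;
    uint32_t baud_rate;
    uint32_t sample_period_ms;
    uint32_t crc; /* CRC-32 of every field before this one */
} config_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but table-free. */
static uint32_t crc32_calc(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static int save_one(const char *path, const config_t *cfg)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    int ok = (fwrite(cfg, sizeof *cfg, 1, fp) == 1);
    if (fflush(fp) != 0)
        ok = 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

static int load_one(const char *path, config_t *out)
{
    config_t tmp;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    size_t n = fread(&tmp, sizeof tmp, 1, fp);
    fclose(fp);
    if (n != 1 || tmp.magic != CFG_MAGIC ||
        tmp.crc != crc32_calc(&tmp, offsetof(config_t, crc)))
        return -1; /* missing, truncated, or corrupted copy */
    *out = tmp;
    return 0;
}

/* Write the primary first, then the backup, so that a power cut during
 * either write leaves the other copy intact. */
int config_save(const char *primary, const char *backup, const config_t *cfg)
{
    assert(primary != NULL && backup != NULL && cfg != NULL);
    config_t rec = *cfg;
    rec.magic = CFG_MAGIC;
    rec.crc = crc32_calc(&rec, offsetof(config_t, crc));
    if (save_one(primary, &rec) != 0)
        return -1;
    return save_one(backup, &rec);
}

/* Try primary, then backup, then fall back to built-in defaults. */
int config_load(const char *primary, const char *backup, config_t *out)
{
    assert(primary != NULL && backup != NULL && out != NULL);
    if (load_one(primary, out) == 0)
        return CFG_FROM_PRIMARY;
    if (load_one(backup, out) == 0)
        return CFG_FROM_BACKUP;
    out->magic = CFG_MAGIC;
    out->baud_rate = 9600u;          /* assumed failsafe defaults */
    out->sample_period_ms = 1000u;
    out->crc = crc32_calc(out, offsetof(config_t, crc));
    return CFG_FROM_DEFAULTS;
}
```

The return value of `config_load` tells the caller which copy was used, so the application can raise the user-visible indication from 7e and rewrite the damaged copy. This sketch mirrors every save into the backup; deferring the backup write until the user declares the configuration "good" would be a variation on the same idea.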

+11




This is an addition to msemack's answer. Unfortunately, my reputation is too low to post a comment on his answer, hence a separate answer.

Does a power-off affect the file system?

Yes, unless proper measures are taken to prevent corruption. See the previous answers for file system options that help mitigate it. However, if ATA flush / sleep is not implemented properly on your device, you may end up in the scenario we did. In our scenario, the device was corrupted outside the file system, and fdisk / format did not restore the device.

Instead, it took an ATA security erase to recover the device once the corruption had occurred. To avoid this, we issue an ATA sleep command before power is lost. This required about 400 ms of holdup time to cover the 160 ms ATA sleep, and also to leave some headroom for capacitor degradation over the life of the product.
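To give a feel for how such a holdup budget translates into a capacitor size, here is a small sketch based on the energy stored in a capacitor, E = C(V_start² − V_min²)/2, set equal to load power times holdup time. The function name, the load power, the voltages, and the aging margin are all assumptions for illustration; only the 400 ms figure comes from the answer above. Regulator efficiency and capacitor ESR are ignored.

```c
#include <assert.h>

/* Capacitance (farads) needed to supply `load_w` watts for `holdup_s`
 * seconds while the rail sags from `v_start` to `v_min` volts.
 * `aging_margin` (>= 1.0) scales the result up to allow for capacitance
 * lost over the life of the product. */
double holdup_capacitance_f(double load_w, double holdup_s,
                            double v_start, double v_min,
                            double aging_margin)
{
    assert(load_w >= 0.0 && holdup_s >= 0.0);
    assert(v_start > v_min && v_min >= 0.0);
    assert(aging_margin >= 1.0);

    /* Usable energy: E = C * (v_start^2 - v_min^2) / 2 = load_w * holdup_s */
    double usable_v2 = v_start * v_start - v_min * v_min;
    return aging_margin * 2.0 * load_w * holdup_s / usable_v2;
}
```

For example, with an assumed 2.5 W load, a rail that may drop from 5.0 V to 3.0 V, and 0.4 s of holdup, `holdup_capacitance_f(2.5, 0.4, 5.0, 3.0, 1.0)` gives 0.125 F before any margin is applied.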

Notes on our scenario:

  • fdisk / format failed to repair or restore the disk.
  • Our power-safe file system's health checker reported that the device had bad blocks, when in fact it did not.
  • flush / sync returned success quickly, and were most likely not implemented by the device.
  • After the corruption, dd could not read the device beyond the first partition boundary and returned I/O errors from there on.
  • hdparm was used to issue an ATA security erase, the only recovery method for some corruption scenarios.
+4




It depends entirely on the file system in use, and on whether your project requirements allow some data to be lost on a power outage.

One can imagine using a file system that is robust against sudden power-off and can recover from a partial write sequence. In that case, on the application side, if you have no critical data that absolutely must be written before shutdown, there is no need for a special power-down detection routine.

Now, if you want a more specific answer for your project, you will need to provide more information about the filesystem used and your project requirements.

Edit: Since you have critical application data to save before a power outage, I think you have answered the question yourself. The only way to ensure a safe power-off is to detect the power failure and warn the embedded device, in conjunction with some hardware circuitry that keeps the device powered long enough to perform the shutdown procedure.
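On the software side, that detection could look roughly like the sketch below: an interrupt handler only sets a flag, and the main loop polls the flag and flushes and closes every open file while the holdup circuit keeps the board alive. The names `power_fail_isr` and `service_power_fail` are hypothetical, and how the interrupt is wired to a supply supervisor or comparator is entirely hardware-specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Set from interrupt context, read from the main loop. */
static volatile bool power_fail_pending = false;

/* Hypothetical ISR, assumed to be attached to the power-fail signal.
 * It does no file I/O itself; it only records the event. */
void power_fail_isr(void)
{
    power_fail_pending = true;
}

/* Call regularly from the main loop. If a power failure has been
 * signalled, flush and close every open file and return true so the
 * caller can stop issuing writes and halt. */
bool service_power_fail(FILE *open_files[], size_t count)
{
    assert(open_files != NULL || count == 0);

    if (!power_fail_pending)
        return false;

    for (size_t i = 0; i < count; i++) {
        if (open_files[i] != NULL) {
            fflush(open_files[i]);
            fclose(open_files[i]);
            open_files[i] = NULL; /* mark as closed */
        }
    }
    return true;
}
```

The flag is deliberately left set after the shutdown pass, so later calls keep returning true and the application does not resume writing while the supply is collapsing.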

+2




The FAT file system is especially prone to corruption if it is being written to, or a file is open, at power-off, particularly if a buffered operation has not been flushed. On one project I worked on, the solution was to run a file system consistency check and repair (essentially chkdsk / scandisk) at startup. This strategy did not prevent data loss, but it did prevent the file system from becoming unusable.

A number of vendors provide add-on journaling for FAT to counter this very issue. These include Segger, Quadros and Micrium, for example.

In any case, your system should generally use an open-write-close approach to file access, or open-write-flush if you deem it necessary to keep the file open.
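A minimal sketch of the open-write-close pattern in standard C follows. The helper name `append_record` is made up for the example, and the sketch assumes your file system library exposes a stdio-style API. Note that `fflush` only empties the C library's buffer; whether the data has reached the medium still depends on the driver and device beneath it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append one record, flush it, and close the file
 * straight away so it is not left open between writes.
 * Returns 0 on success, -1 on any failure. */
int append_record(const char *path, const char *record)
{
    assert(path != NULL && record != NULL);

    FILE *fp = fopen(path, "ab");
    if (fp == NULL)
        return -1;

    size_t len = strlen(record);
    int ok = (fwrite(record, 1, len, fp) == len);

    /* Push buffered data out of the C library before closing. */
    if (fflush(fp) != 0)
        ok = 0;

    /* Closing also updates the directory entry on most file systems. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A call such as `append_record("log.txt", "boot ok\n")` keeps the window in which the file is open as short as possible. For the open-write-flush variant, the file would stay open and only the `fflush` step would run after each write.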

+2

