File age report

I often find answers on this site without having to ask, but this time I need more personalized help. Hopefully someone can point me in the right direction.

I've been fiddling with pulling a report from my NAS to get age-of-data and data-size statistics so I can provide a chargeback/showback solution.

I've managed to do this mostly with PowerShell using Get-ChildItem, and I've also tried going through .NET with [System.IO.Directory]::EnumerateDirectories and related methods. All of these approaches work, but they are all very slow at collecting this information, especially compared to JAM Software's TreeSize, which produces it quickly.

Of note, I even tried multithreading in PowerShell, thinking that collecting data from several starting points in parallel would make the whole scan faster, but I had mixed results at best.

I hope someone else has tackled this project before and found a good, fast way to do it. I'm open to doing this in other languages as well.

Quick notes: I am doing this in PowerShell v5. I've also started learning a little Python, so if anyone has a suggestion in that direction, this would be a great place for me to hear it.

Edit:

OK, here are some numbers. Timings:

- TreeSize: about 10 seconds
- PowerShell Get-ChildItem: about 2 minutes
- .NET calls from PowerShell: about 2 minutes

The test set is roughly 60,000 objects totaling 120 GB.

With Get-ChildItem -Recurse you get all the file objects at the specified location, including attributes like last access time and size. To get the same thing through .NET you need a combination of EnumerateFiles (and its siblings) plus a loop that wraps each path in a FileInfo, which gives you the file object so you can check its attributes. A sketch of both variants follows.
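
For concreteness, here is a minimal sketch of both approaches as I've been running them; the share path is a placeholder and the selected properties are just the ones I care about:

```powershell
# Variant 1: Get-ChildItem returns FileInfo objects with the attributes
# already populated.
$report = Get-ChildItem -Path '\\nas\share' -File -Recurse -ErrorAction SilentlyContinue |
    Select-Object FullName, Length, LastAccessTime, LastWriteTime

# Variant 2: enumerate paths lazily through .NET, then wrap each path in
# a FileInfo to read size and timestamps.
$report = foreach ($path in [System.IO.Directory]::EnumerateFiles(
        '\\nas\share', '*', [System.IO.SearchOption]::AllDirectories)) {
    $fi = [System.IO.FileInfo]::new($path)
    [PSCustomObject]@{
        FullName       = $fi.FullName
        Length         = $fi.Length
        LastAccessTime = $fi.LastAccessTime
        LastWriteTime  = $fi.LastWriteTime
    }
}
```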

As for multithreading, I would point you to the links I used (it would be too much to add here), but I tried creating a runspace pool, and I also tried manually running two separate runspaces and comparing the results; they were pretty much the same. A sketch of the runspace-pool approach is below.

Why am I so obsessed with timing? The test directory above only takes 2 minutes, but my NAS has millions of files on some volumes. One test I did took an hour and a half, and on other volumes it would take several hours. I just want to get closer to TreeSize's speed.
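
For reference, this is roughly the shape of the runspace-pool version I tried; it splits the work by top-level directory. The share path and thread count are placeholders, and files sitting directly in the root are skipped for brevity:

```powershell
# Scan each top-level directory of the share in a separate runspace and
# collect per-directory file counts and byte totals.
$root = '\\nas\share'
$pool = [RunspaceFactory]::CreateRunspacePool(1, 8)   # up to 8 concurrent runspaces
$pool.Open()

$jobs = foreach ($dir in [System.IO.Directory]::EnumerateDirectories($root)) {
    $ps = [PowerShell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript({
        param($path)
        $count = 0; $bytes = 0L
        foreach ($f in [System.IO.Directory]::EnumerateFiles($path, '*',
                [System.IO.SearchOption]::AllDirectories)) {
            $count++
            $bytes += ([System.IO.FileInfo]::new($f)).Length
        }
        [PSCustomObject]@{ Path = $path; Files = $count; Bytes = $bytes }
    }).AddArgument($dir)
    [PSCustomObject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# Wait for every runspace to finish and gather the results.
$results = foreach ($j in $jobs) {
    $j.Shell.EndInvoke($j.Handle)
    $j.Shell.Dispose()
}
$pool.Close()
$results
```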

Edit: I have accepted the robocopy workaround below as the answer. However, if you have suggestions for a different language or procedure, please feel free to comment; it's something I will consider in the future.

1 answer


I've been there, and getting what you need is tricky, to say the least: TreeSize reads its information directly from the MFT, while Get-ChildItem works at a much higher level, through the OS, so the speeds differ enormously.

So if you want to speed up your report, you really need to go under the hood and work at a lower level.



For me, even if it wasn't the fastest solution, the compromise I settled on was robocopy /l /log:c:\mylog.txt (which doesn't copy any bytes and just writes the file list to mylog.txt), and then I parsed the log. You can play with the multithreading option (/MT:[N], where N defaults to 8) to speed it up.
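
Something along these lines (a sketch, not my exact script: the paths are placeholders, the extra switches are standard robocopy options that make the log easier to parse, and the regex is an assumption about the log layout that you may need to adjust for your locale):

```powershell
# List-only run: /L copies nothing; the destination is required by the
# syntax but is never written to. /BYTES, /TS and /FP put size, timestamp
# and full path on each line; /NC /NDL /NJH /NJS strip everything else.
$log = 'C:\mylog.txt'
robocopy '\\nas\share' 'C:\dummy' /L /E /BYTES /TS /FP /NC /NDL /NJH /NJS "/LOG:$log"

# Each data line should look like: <size> <yyyy/mm/dd hh:mm:ss> <full path>
$report = Get-Content $log | ForEach-Object {
    if ($_ -match '^\s*(\d+)\s+(\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2})\s+(.+)$') {
        [PSCustomObject]@{
            Bytes        = [long]$Matches[1]
            LastModified = [datetime]$Matches[2]
            Path         = $Matches[3]
        }
    }
}
```

Note that the timestamp robocopy logs is the last-modified time, not last access, so the "age" you get from this is write age.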

What I find useful about this method is that if I need to dig deeper, I already have all the data I need in a file, so further queries are fast. It's static, not updated live, but when you're talking about millions of files, I think snapshotting a point in time is a good approach.
