What is an efficient way in C# to download a file and compute its MD5 hash in the same pass?
I am downloading a file and then checking its MD5 to make sure the download succeeded. I have the following code, which should work but is not the most efficient, especially for large files.
```csharp
using (var client = new System.Net.WebClient())
{
    client.DownloadFile(url, destinationFile);
}
var fileHash = GetMD5HashAsStringFromFile(destinationFile);
var successful = expectedHash.Equals(fileHash, StringComparison.OrdinalIgnoreCase);
```
I'm worried that all the bytes are first written to disk, and then MD5 `ComputeHash()` has to open the file and read all the bytes again. Is there a good, clean way to compute the MD5 as part of the download stream? Ideally, the MD5 would just fall out of `DownloadFile()` as a side effect of sorts, i.e. a function with this signature:
```csharp
string DownloadFileAndComputeHash(string url, string filename, HashTypeEnum hashType);
```
Edit: added the code for `GetMD5HashAsStringFromFile()`:
```csharp
public string GetMD5HashAsStringFromFile(string filename)
{
    using (FileStream file = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var md5er = System.Security.Cryptography.MD5.Create()) // MD5 is IDisposable
    {
        // Reads the whole file back from disk to hash it.
        var md5HashBytes = md5er.ComputeHash(file);
        return BitConverter
            .ToString(md5HashBytes)
            .Replace("-", string.Empty)
            .ToLower();
    }
}
```
> Is there a good, clean way to compute the MD5 as part of the download stream? Ideally, the MD5 would just fall out of `DownloadFile()` as a side effect of sorts.
You could follow this strategy to do the calculation in chunks and minimize memory pressure (and duplicated reads):

- Open the response stream in the web client.
- Open the destination file stream.
- Repeat while data is available:
  - Read a chunk from the response stream into a byte buffer.
  - Write it to the destination file stream.
  - Use the `TransformBlock` method to add those bytes to the hash calculation.
- Use `TransformFinalBlock` to finish the hash and get the computed value.
The example code below shows how this can be achieved.
```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class DownloadHelper // class name is illustrative
{
    public static byte[] DownloadAndGetHash(Uri file, string destFilePath, int bufferSize)
    {
        using (var md5 = MD5.Create())
        using (var client = new System.Net.WebClient())
        {
            using (var src = client.OpenRead(file))
            using (var dest = File.Create(destFilePath, bufferSize))
            {
                md5.Initialize();
                var buffer = new byte[bufferSize];
                while (true)
                {
                    var read = src.Read(buffer, 0, buffer.Length);
                    if (read > 0)
                    {
                        // Write the chunk to disk and feed the same bytes to the hash.
                        dest.Write(buffer, 0, read);
                        md5.TransformBlock(buffer, 0, read, null, 0);
                    }
                    else // reached the end.
                    {
                        md5.TransformFinalBlock(buffer, 0, 0);
                        return md5.Hash;
                    }
                }
            }
        }
    }
}
```
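The question also asks for a hex string and a selectable hash type. A minimal sketch of that variant of the same loop, assuming the caller passes any `HashAlgorithm` instance (for example `MD5.Create()` or `SHA256.Create()`) in place of the question's hypothetical `HashTypeEnum`, and taking streams instead of a URL so the loop can be exercised without a network. `HashingCopy` and `CopyAndComputeHash` are illustrative names, not an existing API:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashingCopy
{
    // Copies everything from 'source' to 'destination' in chunks, feeding each
    // chunk to 'hasher' as it goes, and returns the digest as lowercase hex.
    // For a download, 'source' could be the stream from WebClient.OpenRead(url)
    // and 'destination' a FileStream from File.Create(path).
    public static string CopyAndComputeHash(Stream source, Stream destination,
                                            HashAlgorithm hasher, int bufferSize = 81920)
    {
        var buffer = new byte[bufferSize];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);            // save the chunk
            hasher.TransformBlock(buffer, 0, read, null, 0); // hash the same chunk
        }
        hasher.TransformFinalBlock(buffer, 0, 0); // finish; hasher.Hash is now set
        return BitConverter.ToString(hasher.Hash).Replace("-", string.Empty).ToLowerInvariant();
    }
}
```

The result can then be compared with `expectedHash` using `StringComparison.OrdinalIgnoreCase`, as in the question; the caller stays responsible for disposing the hasher and both streams.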
If you are talking about large files (I'm assuming more than 1 GB), you will want to read the data in chunks, feed each chunk to the MD5 algorithm, and then save it to disk. It's doable, but I don't know how much the built-in .NET classes will help you with this.
One approach could be a custom stream wrapper. First you get a `Stream` from the `WebClient` (through `GetWebResponse()` and then `GetResponseStream()`), then you wrap it and pass the wrapper to `ComputeHash(stream)`. When MD5 calls `Read()` on your wrapper, the wrapper calls `Read` on the network stream, writes the received data to the file, and then passes it back to MD5.
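A minimal sketch of such a wrapper, assuming a read-only, non-seekable stream is enough for `ComputeHash` (which, as far as I know, only calls `Read` until it returns 0). `TeeReadStream` is a hypothetical name, and in practice the source could come from `WebClient.OpenRead(url)`, which also returns the response `Stream`:

```csharp
using System;
using System.IO;

// Hypothetical read-only wrapper: every chunk read from 'source' is also written
// to 'copy' before being handed back to the caller (e.g. MD5.ComputeHash).
// It does not dispose either inner stream; the caller owns them.
public sealed class TeeReadStream : Stream
{
    private readonly Stream _source;
    private readonly Stream _copy;

    public TeeReadStream(Stream source, Stream copy)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
        _copy = copy ?? throw new ArgumentNullException(nameof(copy));
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _source.Read(buffer, offset, count);
        if (read > 0)
            _copy.Write(buffer, offset, read); // save the bytes first...
        return read;                           // ...then pass them on to the hasher
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() => _copy.Flush();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Usage would then be roughly `md5.ComputeHash(new TeeReadStream(networkStream, fileStream))`, with the MD5 instance and both streams wrapped in `using` blocks.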
I don't know what pitfalls you might run into if you try this.
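One built-in option that might avoid a hand-written wrapper (a swapped-in technique, not the one described above) is `System.Security.Cryptography.CryptoStream` in read mode with the MD5 instance as its transform: each chunk read through it goes through `TransformBlock`, and at end of input it calls `TransformFinalBlock`, so `md5.Hash` is available once the copy completes. A sketch, assuming `source` is something like the stream from `WebClient.OpenRead(url)` and `destination` a `FileStream`; `CryptoStreamHashing` is an illustrative name:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class CryptoStreamHashing
{
    // Copies 'source' to 'destination' and returns the MD5 of the copied bytes.
    // Note: disposing the CryptoStream also disposes 'source'.
    public static byte[] CopyAndComputeMd5(Stream source, Stream destination)
    {
        using (var md5 = MD5.Create())
        using (var hashing = new CryptoStream(source, md5, CryptoStreamMode.Read))
        {
            // Stream.CopyTo reads in chunks until the CryptoStream reports the end,
            // by which point the final block has been transformed.
            hashing.CopyTo(destination);
            return md5.Hash; // read before the MD5 instance is disposed
        }
    }
}
```

This keeps the chunking inside the framework; the trade-off is less direct control over the buffer size than the explicit `TransformBlock` loop.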