Download files in chunks in multiple goroutines in Golang

I need to download files chunk by chunk in multiple threads. For example, I have 1,000 files, each ~100 MB-1 GB, and I can only download those files in 4096-byte chunks (each HTTP request gives me just 4 KB).

Downloading everything in a single thread might take a long time, so I want to download in, say, 20 threads (one thread per file), and within each of those threads I also need to fetch several chunks at the same time.

Is there any example that shows this kind of logic?



1 answer


Here is an example of how to set up a concurrent downloader. The things to be aware of are bandwidth, memory, and disk space. You can kill your bandwidth by trying to grab everything at once, and the same goes for memory. Your downloads are quite large, so memory can be a problem.

Another thing to note is that with goroutines you lose the ordering of the responses. So if the order of the returned bytes matters, this alone won't work, because you would need to know the byte order to assemble the file at the end. That would mean downloading one chunk at a time is best, unless you implement a way to keep track of the order (perhaps some kind of global map[int][]byte keyed by chunk index, with a mutex to prevent race conditions); a sketch of that idea, and of the two-level setup the question asks for, follows the example below.

An alternative that does not involve Go at all (assuming you have a unix machine, for convenience) is curl; see http://osxdaily.com/2014/02/13/download-with-curl/



package main

import (
    "bytes"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "sync"
    "time"
)

// you're going to have to be careful here, because you can potentially run
// out of memory by downloading too many files at once...
// however, here is an example that can be adapted
func downloader(wg *sync.WaitGroup, sema chan struct{}, fileNum int, URL string) {
    sema <- struct{}{}
    defer func() {
        <-sema
        wg.Done()
    }()

    client := &http.Client{Timeout: 10 * time.Second} // Timeout is a time.Duration, not a number of seconds
    res, err := client.Get(URL)
    if err != nil {
        // don't log.Fatal inside a goroutine: it exits the whole process
        // and skips the deferred cleanup; log and bail out instead
        log.Println(err)
        return
    }
    defer res.Body.Close()
    var buf bytes.Buffer
    // copy into a buffer before writing to the file; io.Copy straight
    // into an *os.File would save memory by streaming to disk instead
    if _, err := io.Copy(&buf, res.Body); err != nil {
        log.Println(err)
        return
    }
    // write the bytes to file
    if err := ioutil.WriteFile(fmt.Sprintf("file%d.txt", fileNum), buf.Bytes(), 0644); err != nil {
        log.Println(err)
    }
}

func main() {
    links := []string{
        "url1",
        "url2", // etc...
    }
    var wg sync.WaitGroup
    // limit to four downloads at a time; this pattern is called a semaphore
    limiter := make(chan struct{}, 4)
    for i, link := range links {
        wg.Add(1)
        go downloader(&wg, limiter, i, link)
    }
    wg.Wait()

}
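
To make the ordering idea concrete, here is a minimal standalone sketch of downloading a single file in 4 KB pieces with HTTP Range requests, parking each piece in a mutex-protected map keyed by chunk index, and stitching the file together at the end. The URL, the file size, and the helper names (fetchChunk, downloadChunked) are made up for illustration, and it assumes the server honors the Range header; the 4 KB-per-request limit in the question suggests it serves partial content.

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "sync"
    "time"
)

// fetchChunk asks the server for one byte range of the file.
// It assumes the server supports the HTTP Range header.
func fetchChunk(client *http.Client, url string, start, end int64) ([]byte, error) {
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
    res, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer res.Body.Close()
    return ioutil.ReadAll(res.Body)
}

// downloadChunked fetches a file of known size in 4096-byte pieces, at most
// `workers` pieces in flight, storing each piece in a mutex-protected map
// keyed by chunk index so the file can be reassembled in order afterwards.
func downloadChunked(client *http.Client, url, dest string, size int64, workers int) error {
    const chunkSize int64 = 4096  // 4 KB per request, as in the question
    parts := make(map[int][]byte) // chunk index -> bytes
    var mu sync.Mutex             // guards parts
    var wg sync.WaitGroup
    sema := make(chan struct{}, workers) // limits chunks in flight

    n := int((size + chunkSize - 1) / chunkSize)
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            sema <- struct{}{}
            defer func() { <-sema }()

            start := int64(i) * chunkSize
            end := start + chunkSize - 1
            if end >= size {
                end = size - 1
            }
            data, err := fetchChunk(client, url, start, end)
            if err != nil {
                log.Printf("chunk %d: %v", i, err)
                return
            }
            mu.Lock()
            parts[i] = data // the map key records the original order
            mu.Unlock()
        }(i)
    }
    wg.Wait()

    // stitch the chunks back together in index order
    var out []byte
    for i := 0; i < n; i++ {
        out = append(out, parts[i]...)
    }
    return ioutil.WriteFile(dest, out, 0644)
}

func main() {
    client := &http.Client{Timeout: 30 * time.Second}
    // placeholder URL and size; in practice the size could come from a
    // HEAD request's Content-Length
    if err := downloadChunked(client, "http://example.com/big.bin", "big.bin", 1<<20, 8); err != nil {
        log.Fatal(err)
    }
}

A real version would propagate chunk errors instead of just logging them, and could write each range into a preallocated file with WriteAt rather than holding everything in memory.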


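And to get both levels of concurrency the question asks for (20 files at a time, several chunks within each file), the same semaphore pattern simply nests. This is a hypothetical wrapper that assumes the downloadChunked sketch above lives in the same package and that the file sizes are already known (say, from HEAD requests):

// downloadAll caps concurrent files with one semaphore, while downloadChunked
// (from the sketch above) caps concurrent chunk requests within each file.
// sizes[i] is assumed to be the length of the file behind links[i].
func downloadAll(client *http.Client, links []string, sizes []int64) {
    var wg sync.WaitGroup
    fileSema := make(chan struct{}, 20) // 20 files at a time, per the question
    for i, link := range links {
        wg.Add(1)
        go func(i int, link string) {
            defer wg.Done()
            fileSema <- struct{}{}
            defer func() { <-fileSema }()
            dest := fmt.Sprintf("file%d.out", i)
            if err := downloadChunked(client, link, dest, sizes[i], 8); err != nil {
                log.Printf("file %d: %v", i, err)
            }
        }(i, link)
    }
    wg.Wait()
}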








