AWS Scalable Architecture Design

I am building my first web application on AWS and am still learning the basics. I've run into a design problem, so here is a scenario that illustrates it:

Suppose I am building a web application that renders ("prints") a web page to PDF and saves it to S3. The front end consists of a single form: the user types in the URL of the page they want to save as a PDF and clicks Submit. The application should then render the page at that URL to PDF and present the file to the user.

To make the application scalable, my plan is for the Submit button to send an SQS message containing the URL to process. A fleet of workers then pulls messages from that queue, generates the PDFs, stores them in S3, and records the S3 key/path in SimpleDB. The problem I'm running into is: how does a worker notify the web application that processing is complete?
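To make the flow above concrete, here is a minimal sketch of the submit/worker pipeline. It assumes in-memory stand-ins: a `queue.Queue` plays the SQS queue, a dict plays the S3 bucket, and another dict plays the SimpleDB domain. `render_pdf`, `submit`, `worker_step`, and the `pdfs/<job_id>.pdf` key layout are hypothetical names for illustration, not AWS APIs.

```python
import queue
import uuid
from typing import Dict

# In-memory stand-ins (assumptions for illustration only):
#   job_queue  plays the role of the SQS queue,
#   fake_s3    maps S3 keys to PDF bytes,
#   job_status plays the role of the SimpleDB domain (job id -> S3 key).
job_queue: "queue.Queue[dict]" = queue.Queue()
fake_s3: Dict[str, bytes] = {}
job_status: Dict[str, str] = {}


def render_pdf(url: str) -> bytes:
    """Hypothetical placeholder for the real URL-to-PDF renderer."""
    return f"%PDF-placeholder for {url}".encode()


def submit(url: str) -> str:
    """Web-app side of the Submit button: enqueue the URL, return a job id."""
    job_id = str(uuid.uuid4())
    job_queue.put({"job_id": job_id, "url": url})
    return job_id


def worker_step() -> bool:
    """One worker iteration: pull a message, render, store in S3, record the key."""
    try:
        msg = job_queue.get_nowait()
    except queue.Empty:
        return False  # nothing to process right now
    key = f"pdfs/{msg['job_id']}.pdf"
    fake_s3[key] = render_pdf(msg["url"])
    # Record completion only after the PDF is stored. With real SQS, the worker
    # would delete the message at this point, after the work has succeeded.
    job_status[msg["job_id"]] = key
    job_queue.task_done()
    return True
```

Usage: `job_id = submit("https://example.com")`, then a worker calls `worker_step()`, after which `job_status[job_id]` holds the S3 key. The open question in the post is exactly how the web app learns that `job_status` now has that entry.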

[Diagram: Example Design]

My guess is that the web application could poll SimpleDB repeatedly until an entry for the S3 key appears, but this solution seems a little clunky. This feels like a pattern/problem that must come up often. Can anyone suggest a general way to solve it?
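The polling approach described here can be sketched as a loop with a deadline, so the web app never waits forever. This is a minimal sketch: `lookup` is an assumed callable standing in for the SimpleDB read (returning `None` while the job is pending), and the default timeout and interval are illustrative values, not recommendations.

```python
import time
from typing import Callable, Optional


def wait_for_result(
    lookup: Callable[[str], Optional[str]],
    job_id: str,
    timeout: float = 30.0,
    interval: float = 0.5,
) -> Optional[str]:
    """Poll lookup(job_id) until it returns a value or timeout seconds pass.

    lookup is a hypothetical stand-in for a SimpleDB read (job id -> S3 key);
    it returns None while the job is still pending.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = lookup(job_id)
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None  # caller decides how to report "still processing"
        time.sleep(interval)
```

Returning `None` on timeout lets the caller show a "still processing" message and retry later instead of holding a request open indefinitely.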

Also, any recommended resources on common design patterns in the cloud would be very helpful.



3 answers


Unless you're using something like WebSockets, I don't see a problem with that approach. After the user submits a request, the web application polls SimpleDB (as you described) to check whether processing has completed (or an error has occurred). If you do use something like WebSockets, you can have a second queue that the web app subscribes to; the web app is notified on that queue when processing is complete and then notifies the browser.
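The push alternative (a second queue the web app listens on) can be sketched as follows. Assumptions: `notify_queue` is an in-memory stand-in for that second queue, and each registered callback stands in for pushing a message to the browser over a WebSocket. `subscribe`, `worker_done`, and `dispatch_forever` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Dict

# Assumption: notify_queue stands in for the second queue the web app
# subscribes to; callbacks stand in for pushing over a WebSocket.
notify_queue: "queue.Queue[dict]" = queue.Queue()
subscribers: Dict[str, Callable[[str], None]] = {}
subscribers_lock = threading.Lock()


def subscribe(job_id: str, push_to_browser: Callable[[str], None]) -> None:
    """Register the browser connection that is waiting on job_id."""
    with subscribers_lock:
        subscribers[job_id] = push_to_browser


def worker_done(job_id: str, s3_key: str) -> None:
    """Called by a worker after it has stored the PDF in S3."""
    notify_queue.put({"job_id": job_id, "s3_key": s3_key})


def dispatch_forever() -> None:
    """Web-app side: forward each completion event to the waiting browser."""
    while True:
        event = notify_queue.get()
        try:
            with subscribers_lock:
                push = subscribers.pop(event["job_id"], None)
            if push is not None:
                push(event["s3_key"])
        finally:
            notify_queue.task_done()
```

Usage: start `threading.Thread(target=dispatch_forever, daemon=True)` once, call `subscribe(job_id, send_fn)` when the browser connects, and have workers call `worker_done(...)`. In this sketch an event for a job nobody has subscribed to yet is simply dropped, so it is still worth recording the result (e.g. in SimpleDB) so a late client can fall back to polling.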





As you said, you've basically solved everything except the front end, which has to poll your API to find out when the media is ready. At my company we do what you describe above: we provide screenshots of web pages, JPG snapshots of PDFs and office documents, and video and audio encoding.



We use Ajax for status updates. The client polls several times per second at first, then gradually backs off to once a second and then to once every few seconds, so as not to put too much load on our servers. Another option, as mentioned in another answer, is WebSockets, which keep a persistent connection to the server over which you can both push and pull data. However, most implementations use the Ajax polling approach. With older servers like Apache, holding thousands of connections open can be a big problem, but with Nginx, Node, and an intermediate caching layer, it doesn't really matter.
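The backoff schedule described above can be sketched as a generator of delays between status checks. In the setup described, this logic lives in browser-side Ajax code; it is written in Python here only to make the schedule concrete. The specific numbers (0.25 s, 1 s, 5 s, and the counts) are assumptions for illustration, not the values this answer's company uses.

```python
from typing import Iterator


def poll_intervals(
    fast: float = 0.25,
    fast_count: int = 8,
    medium: float = 1.0,
    medium_count: int = 10,
    slow: float = 5.0,
) -> Iterator[float]:
    """Yield delays (in seconds) to wait between status checks.

    Several checks per second at first, then once a second, then once every
    few seconds indefinitely. All defaults are illustrative assumptions.
    """
    for _ in range(fast_count):
        yield fast
    for _ in range(medium_count):
        yield medium
    while True:
        yield slow
```

A client loop would check the status, and if the job is not done, sleep for the next value from `poll_intervals()` before checking again. Fast early checks make short jobs feel instant, while the slower tail keeps long-running jobs from generating a steady stream of requests.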



You can store an object in S3 as a token and poll S3 instead of SimpleDB. This way you avoid putting load on SimpleDB, and your polling results will be more stable.
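One reading of this suggestion is sketched below: the worker writes the PDF first and then a small "done" token object, and the poller only checks whether the token key exists. Assumptions: `FakeS3` is an in-memory stand-in for a bucket (with boto3 the existence check would typically be a HEAD request on the key), and `token_key`, `finish_job`, `check_job`, and the `tokens/` / `pdfs/` key layout are hypothetical.

```python
from typing import Dict, Optional


class FakeS3:
    """In-memory stand-in for an S3 bucket (assumption for illustration)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def exists(self, key: str) -> bool:
        return key in self.objects


def token_key(job_id: str) -> str:
    # Hypothetical naming convention for the "done" token object.
    return f"tokens/{job_id}.done"


def finish_job(s3: FakeS3, job_id: str, pdf: bytes) -> None:
    """Worker side: write the PDF first, then the token, so the token's
    presence implies the PDF is already stored."""
    s3.put(f"pdfs/{job_id}.pdf", pdf)
    s3.put(token_key(job_id), b"")


def check_job(s3: FakeS3, job_id: str) -> Optional[str]:
    """Poller side: return the PDF key once the token exists, else None."""
    if s3.exists(token_key(job_id)):
        return f"pdfs/{job_id}.pdf"
    return None
```

Writing the token last is the design choice that matters here: a poller that sees the token never races ahead of the PDF upload.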

You can use this approach when polling from your web app, or even when polling directly from the Ajax layer (although the latter is not the best choice, since errors in those calls will not be logged on your server).
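The point about logging can be illustrated with a small server-side status handler that the browser's Ajax call would hit: the server does the S3/SimpleDB lookup itself, so any failure lands in the server's logs instead of disappearing in the browser. This is a minimal sketch; `status_response` and the injected `lookup` callable are hypothetical, and the JSON shape is an assumption.

```python
import json
import logging
from typing import Callable, Optional

logger = logging.getLogger("pdf_status")


def status_response(job_id: str, lookup: Callable[[str], Optional[str]]) -> str:
    """Build the JSON body for a status endpoint polled by the browser.

    lookup is a hypothetical stand-in for the S3 or SimpleDB check; it returns
    the S3 key when the job is done and None while it is pending.
    """
    try:
        key = lookup(job_id)
    except Exception:
        # Server-side polling means the failure is recorded in our own logs.
        logger.exception("status lookup failed for job %s", job_id)
        return json.dumps({"job_id": job_id, "state": "error"})
    state = "done" if key is not None else "pending"
    return json.dumps({"job_id": job_id, "state": state, "s3_key": key})
```

The browser only ever sees "pending", "done", or "error", while the details of any storage failure stay on the server where they can be investigated.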
