XML to database, which path should I take?
I have access to a traffic data server from which I get XML files with the information I need (for example: point A to point B, travel time 20 minutes, distance 18 miles, etc.).
I download an XML file (which is zipped), extract it, then process it and store it in the DB. I only allow the XML file to be downloaded on a user request, and only if 5 minutes have passed since the last download. The XML on the traffic server is refreshed at intervals of anywhere from 30 seconds to 5 minutes. Within those 5 minutes, any user requesting the webpage gets data from the DB (without a refresh), which limits the number of requests made to the traffic server.
My problem with my current approach is that when I receive a new XML file, the whole process takes a while (3-7 seconds) and it makes the user wait too long before getting anything. However, when XML loading is not required and all data is displayed directly from the database, the process is very fast. The zipped XML is around 100-200KB while the unzipped XML is around 2MB. The XML file contains traffic data from 3 or 4 states, whereas I only need data for one state. This is why I am currently using the DB method.
Is this approach good? I was wondering if I just need to fetch data directly from the loaded XML file for each request and somehow limit how often the XML file is downloaded from the traffic server. Or, can anyone point me to a better way?
Sample XML file
This is how it looks on my site
You need to load the XML every time it changes, but only if users will actually be active during the period before the next download.
Since you can't foresee the future, you don't know if you will receive a user request within the next 7 seconds.
However, you can use an HTTP HEAD request to find out whether the XML file has been updated.
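A minimal Python sketch of that check, assuming the traffic server sends an `ETag` or `Last-Modified` header (the question does not say whether it does). The function names `fetch_validators`, `extract_validators`, and `has_changed` are hypothetical, chosen only for illustration.

```python
import urllib.request
from typing import Mapping, Optional, Tuple

# (ETag, Last-Modified) as reported by the server; either may be missing.
Validators = Tuple[Optional[str], Optional[str]]


def extract_validators(headers: Mapping[str, str]) -> Validators:
    """Pick the two headers that identify a version of the file."""
    return headers.get("ETag"), headers.get("Last-Modified")


def fetch_validators(url: str, timeout: float = 10.0) -> Validators:
    """Send a HEAD request (no body is transferred) and read the headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_validators(response.headers)


def has_changed(previous: Optional[Validators], current: Validators) -> bool:
    """Return True when the file should be downloaded again."""
    if previous is None:
        return True  # nothing downloaded yet
    if current == (None, None):
        return True  # assumption: without validators we cannot tell, so reload
    return previous != current
```

Store the value returned by `fetch_validators` next to the data in the DB, and only download and parse the zipped file when `has_changed` returns `True`.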
This way, you can build a service that loads the XML from the remote system every time it changes. If the data is not really needed that often, you can configure this service to check and/or download more or less frequently.
The rest of your system can stay independent of it. You can work out the best configuration for the download service through statistical analysis of your users' behavior.
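Such a service could look roughly like the following Python sketch. It is only an outline under assumptions: `check_changed` and `refresh` are hypothetical callables that you would supply (for example, a HEAD-based check, and a function that downloads, unzips, filters, and stores the XML), and the 30-second default interval is just the shortest refresh time mentioned in the question.

```python
import time
from typing import Callable, Optional


def run_poller(
    check_changed: Callable[[], bool],
    refresh: Callable[[], None],
    interval_seconds: float = 30.0,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Check the remote file periodically and refresh the DB when it changed.

    Runs forever when max_cycles is None. Returns the number of refreshes.
    """
    refreshes = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        try:
            if check_changed():
                refresh()
                refreshes += 1
        except OSError as error:
            # Network errors (URLError is an OSError) should not stop the
            # service; the web pages keep serving the rows already stored.
            print(f"refresh skipped: {error}")
        cycle += 1
        sleep(interval_seconds)
    return refreshes
```

Because this runs outside the web request, users always read from the DB and never wait the 3-7 seconds for the download and parse. Tuning `interval_seconds` is where the usage statistics come in.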
If you need the data to be even closer to real time, you would have to trigger updates from data changes on the other system, which means setting up two-way communication between the two systems. That is more complex and can lead to more side effects. But from the numbers you give, that level of detail is probably not needed at all, so I won't go into it.