How to request and then process large quantities in a somewhat short time frame
I have a large table of orders. At a set time, I need to send an SMS (different content for each of them, unfortunately), to a large chunk of them (based on whether they were selected for the message and if they contain a phone number). It could be 200,000+ numbers in a fairly short amount of time. (in fact, it is not so high, but theoretically it is possible and necessary to build as such).
They don't need to send everything instantly, but within 1-3 hours from the set time it would be ideal.
I'm using a service with an API to send them, so no problem - problems:
1) how to handle large quantities
2) how to know that they are all processed or reprogrammed by those who did not have
I don't think it would be a good idea to do a MySQL query to get all 200,000+ phone numbers and then skip - I should have assumed that would have sucked up a lot of memory (?).
So - I thought to try cron job and run it every minute (or so). In this script, I could pull out perhaps 5000 records, mark as "in-process" and re-process them, and then update the line to "submitted".
But we have so many potential problems ... what if the SMS service slows down and I can't post all of them. What if there is an unexpected error and the script stops partially after ... etc. If for any number of reasons the script does not send all 5000 records, how can I know which ones are being returned and processed?
It's not only this process, but a few others - a common question keeps coming up , how to handle the large number of lines we need to process and know that each one is done .
Hopefully I'm just overcomplicating the crap out of this and that there is an easier way to handle it.
If I'm not clear please comment and I would be happy to explain any aspect even more.
source to share
Short version:
-
Don't worry about memory consumption. Just don't try to get the whole result at once.
-
Your idea of using a separate table to display each text message and then updating the row when you know if it succeeded or not is generally the right approach (whether you do it in cron or not, it doesn't matter).
-
If you are concerned that your SMS provider might refuse some of your requests, you can implement your own queuing mechanism using ActiveMQ or something similar. However, this kind of defeats a large part of the purpose of using the provider. They should use their own queue so you don't have to worry about it.
Details:
The SMS service should notify you of success or failure. Most high volume SMS services queue your messages and send them in chunks n messages at a time. They will then notify you via some kind of callback or webhook which messages have succeeded and which failed. Most of them also provide an API to check if a specific message (s) has been sent. You need to use these functions.
I think you are on the right track with your cron approach.
One option is to never pull records. Instead, there is a column in the existing table that indicates whether it is waiting for a message to be sent or not. So, instead of doing SELECTs and dealing with hundreds of thousands of rows, you do a simple UPDATE, and then when each callback comes from the API, you can re-update the rows on success / failure.
If you're in a situation where you can send multiple messages at the same time for each row of data, obviously this won't work. You will need to have a separate table with a row for each post that you want to track.
As far as your memory is concerned, I don't think this is a problem. Just don't get the whole set of results. Instead, select each line separately. This will prevent mysql from returning the entire dataset and you don't need to store it in memory.
From php.net
Since mysqli_fetch_all () returns all rows as an array in one step, it can consume more memory than some similar functions such as mysqli_fetch_array (), which returns only one row at a time from a result set. Also, if you need to iterate over a set of results, you will need a loop construct that will further impact performance. For these reasons, mysqli_fetch_all () should only be used in situations where the resulting result set will be sent to another level for processing.
Change / change
to answer the question / question:
I can't pull out just one record per chron - it will be forever ... I understand that I shouldn't get the entire result set either, that's what prompted me to ask "then how else can I do this?
In PHP (using mysqli with mysqlnd) when you execute a query, it doesn't actually return any data. It prepares data to be returned based on your request, but it doesn't return it.
When you use fetch_all, you are requesting the entire result. When you use fetch_array, you are asking for the next result , and you are telling mysql to move the result cursor so that after that you can get the next result . As long as you don't store each individual result in memory (in separate variables), there is no memory issue. Just use the string as you need to use it and then get the next one. Regardless of whether it is cron running or not. You don't need to call script over-and-over, once for each line. The script processes each line in one call. It just reads in one line of data to save memory.
Here's an example script:
$mysqli = new mysqli("host", "user", "pass", "db");
$query = "SELECT * from TextMessages";
$result = $mysqli->query($query);
while ($row = $mysqli->fetch_array($result))
{
//this is the only thing you store in memory, one single row at a time
$row = $result->fetch_array(MYSQLI_ASSOC);
//go send the text message and do whatever else you need to do
if ($row["SomeSmsToken"] == null && $row["TextHasAlreadyBeenSentOrDateSentOrWhatever"] == false)
{
//$someSmsToken = $myTwilioObject->SendByRow($row);
//$this->UpdateRowToTellItThatItHasBeenSentToProviderAndIsWaitingForResponse($row,$someSmsToken);
//..etc...
//then go to the next row.
}
}
$result->free();
And then in some script callback you do something like this.
$mysqli = new mysqli("host", "user", "pass", "db");
$query = "SELECT * from TextMessages where SomeSmsToken = '".$_POST["SomeTokenSentFromProviderInCallback"]."'";
$result = $mysqli->query($query);
while ($row = $mysqli->fetch_array($result))
{
$someObject->UpdateRowToSayThatTheTextWasSentOrItFailed($row,$_POST["SomeStatusSentFromProviderInCallback"]);
}
$result->free();
You can also use mysqli_free_result
to free up any memory consumed by the php mysql driver.
From php.net:
You should always free your result with mysqli_free_result () when your final object is no longer needed.
EDIT: If you need some clever way to deal with "what to do if the script expires", I would suggest running cron every minute. When it starts, it should check if it is running, and if it is not already running, start it. The script will run until it times out. Then, within a minute, cron will start it up again, and since it is not running, it will start again and close where it left off.
source to share