Polymorphic string for template

I am working on a problem where users (truck drivers in this case) send work status updates by SMS. I want to keep the input simple because not all users have smartphones, so I adopted some simple short codes for them to enter. Here are some examples and their meanings:

  • P # 123456-3 (picking up load 123456-3)
  • D # 456789-1 (dropping off load 456789-1)
  • L # 345678-9 (load 345678-9 will be late)

It's pretty straightforward, but users (truck drivers) being who they are, they key in updates in somewhat deviant ways, such as:

  • #D 456789-1
  • D # 456789 - 1
  • D # .456789-1 This load looks wet to me, do we need to cancel this order

You can probably come up with a dozen other permutations, and I have a hard time catching and fixing even the ones I can imagine.

Basically, I use regular expressions to test the input against all of the "bad" patterns I can imagine, then extract what I believe are the good parts and reassemble them in the correct order.

It is the new errors that cause me problems, so I wondered whether there is a more general method where I could pass a "template" and a "message" to a function that would do its best to turn the "message" into something matching the "template".

My searches haven't found anything that really matches what I'm trying to do, and I'm not even sure if there is a good general way to do this. I am using PHP for this implementation, but any example should help. Do any of you have a method?



4 answers

Try something like this:

function parse($input) {
    // Clean up your input: 'D#.456789 - 1 foo bar' to 'D 456789 1 foo bar'
    $clean = trim(preg_replace('/\W+/', ' ', $input));
    // Take first 3 words.
    list($status, $loadId1, $loadId2) = explode(' ', $clean);
    // Glue back your load ID to '456789-1'
    $loadId = $loadId1 . '-' . $loadId2;
    return compact('status', 'loadId');
}

$inputs = array(
    'P # 123456-3',
    '#D 456789-1',
    'D# 456789 - 1',
    'D#.456789-1 This load looks wet to me do weneed to cancelthis order',
);
echo '<pre>';
foreach ($inputs as $s) {
    print_r(parse($s));
}

Output:

Array
(
    [status] => P
    [loadId] => 123456-3
)
Array
(
    [status] => D
    [loadId] => 456789-1
)
Array
(
    [status] => D
    [loadId] => 456789-1
)
Array
(
    [status] => D
    [loadId] => 456789-1
)
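
If a message is too short (for example just `D`), the `list()` assignment above has no second or third word to work with. A minimal sketch of a stricter variant follows, assuming the status is a single letter P, D or L and the load ID is two groups of digits. The name `parseStrict` and the choice to return null are assumptions for illustration, not part of the answer.

```php
<?php
// Hypothetical stricter variant of parse(): returns null when the message
// does not start with a status letter followed by two groups of digits.
function parseStrict($input) {
    // Same clean-up as parse(): collapse each run of non-word characters to one space.
    $clean = trim(preg_replace('/\W+/', ' ', $input));
    // Assumption: status is one of P, D, L (any case); the space after it is
    // optional so that 'D456789-1' is accepted as well.
    if (!preg_match('/^([PDL]) ?(\d+) (\d+)\b/i', $clean, $m)) {
        return null;
    }
    return array(
        'status' => strtoupper($m[1]),
        'loadId' => $m[2] . '-' . $m[3],
    );
}

print_r(parseStrict('d# 456789 - 1 This load looks wet to me'));
var_dump(parseStrict('D'));
```

Returning null lets the caller reply to the driver with a short "please resend" message instead of storing a half-parsed update.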




If the user is having problems with your software, fix the software, not the user!

The problem comes from the fact that your format looks unnecessarily complex. Why do you need a hash in the first place? How about simplifying it to the following:

 operation-code maybe-space load-number maybe-space and comment


Operation codes are assigned to different phone keys, so J, K and L mean the same thing. Load numbers can be sent as digits or as letters; for example, agja means 2452. It is difficult for the user to make a mistake using this format.

Here's some code to illustrate this approach:

function parse($msg) {

    $codes = array(
        3 => 'DROP',
        5 => 'LOAD',
        // etc
    );

    if (!preg_match('~(\S)\s*(\S+)(\s+.+)?~', $msg, $m))
        return null; // cannot parse

    // Phone keypad: punctuation on 1, abc on 2, ..., pqrs on 7, tuv on 8, wxyz on 9
    $a = '.,"?!abcdefghijklmnopqrstuvwxyz';
    $d = '1111122233344455566677778889999';

    return array(
        // strtolower() so that 'J' and 'j' map to the same key
        'opcode'  => $codes[strtr(strtolower($m[1]), $a, $d)],
        'load'    => intval(strtr(strtolower($m[2]), $a, $d)),
        'comment' => isset($m[3]) ? trim($m[3]) : ''
    );
}

print_r(parse(' j ww03 This load looks wet to me'));
//[opcode] => LOAD
//[load] => 9903
//[comment] => This load looks wet to me

//[opcode] => DROP
//[load] => 990123
//[comment] => 




First remove the stuff that shouldn't be there:

$str = preg_replace('/[^PDL\d-]/i', '', $str);


This leaves only the status letter, the digits and the hyphen, so that, for example, `D # 456789 - 1` is normalized to `D456789-1`.



Then try to find the data you want:

if (preg_match('/^([PDL])(\d+-\d)/i', $str, $match)) {
    $code = $match[1];
    $load = $match[2];
} else {
    // uh oh, something wrong with the format!
}
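
To see how the two steps behave together, here is a small sketch that wraps them in one function and runs it on sample messages taken from the question. The function name `extractUpdate` and the null return value are assumptions for illustration.

```php
<?php
// Hypothetical helper combining the two steps above.
function extractUpdate($str) {
    // Step 1: drop everything except the status letters, digits and hyphens.
    $normalized = preg_replace('/[^PDL\d-]/i', '', $str);
    // Step 2: expect a status letter followed by "<digits>-<digit>" at the start.
    if (preg_match('/^([PDL])(\d+-\d)/i', $normalized, $match)) {
        return array('code' => strtoupper($match[1]), 'load' => $match[2]);
    }
    return null; // format not recognised
}

$samples = array(
    'P # 123456-3',
    '#D 456789-1',
    'D # 456789 - 1',
    'D # .456789-1 This load looks wet to me',
);
foreach ($samples as $s) {
    $r = extractUpdate($s);
    echo $s, ' => ', $r === null ? 'no match' : $r['code'] . ' ' . $r['load'], "\n";
}
```

Note that with the `i` flag, any p, d or l letters in a trailing comment survive the first step. The anchored match in the second step ignores them because it only reads from the start of the string.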




Something like



or, to be even more relaxed,



will get you what you want. But I would prefer HamZa's comment as a solution: drop it and tell them to get their act together :)


