Checking time intervals in Matlab

I'm wondering whether there is a way to check a large number of timestamps to see if any are missing. I am looking at a full year of data (365 days) with 48 readings per day, stored in an Excel document, so I have over 17,000 points to analyze. The timestamps are currently formatted like this:

1/01/2011 12:30 AM
1/01/2011 1:00 AM
1/01/2011 1:30 AM
1/01/2011 2:00 AM
1/01/2011 2:30 AM

      

I need to go through the data and check whether any of the 30-minute readings are missing. I thought about using

datenum('')

      

and then comparing the values, raising an error and returning the previous value whenever a timestamp does not follow the 30-minute pattern. But I'm not sure how to go about it.

Any help would be appreciated!



1 answer


You can use datenum and pass in date strings formatted exactly as in the example you provided. If your readings are half an hour apart, the difference between the datenum values of successive timestamps should always be the same. For example, suppose your dates are in a cell array like this:

C = {'1/01/2011 12:30 AM',
'1/01/2011 1:00 AM',
'1/01/2011 1:30 AM',
'1/01/2011 2:00 AM',
'1/01/2011 2:30 AM'};

      

We can take the difference between successive elements using diff. The way diff works is that, for the i-th element x_i of the input, the i-th element y_i of the output is:

y_i = x_{i+1} - x_i
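
As a quick illustration of that definition, here is a minimal sketch on a plain numeric vector. The values in x are made up for the example and are not from the question's data:

```matlab
% Toy input: four values with uneven spacing (illustrative only)
x = [1 4 9 16];

% diff subtracts each element from the one that follows it,
% so the result has numel(x) - 1 elements
y = diff(x);

disp(y)
```

Here y holds 4 - 1, 9 - 4, and 16 - 9, which is one element fewer than x.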

      

Hence, this returns a vector that is one element shorter than the input. Each output element describes one of your dates, from the second onward, relative to the date before it. Applying diff to the datenum of each element in this cell array, we get:

format long
diffs = diff(datenum(C))

diffs =

   0.020833333255723
   0.020833333372138
   0.020833333372138
   0.020833333255723

      

Only the first 7 significant digits or so are meaningful. The remaining digits differ because of floating-point precision, so set those aside. Since datenum counts in days, half an hour is 1/48 of a day, so you need to check whether each difference is about 0.0208333. If it is not, you are missing a reading. Let's try removing a few times:

C = {'1/01/2011 12:30 AM',
'1/01/2011 1:30 AM',
'1/01/2011 2:30 AM',
'1/01/2011 3:00 AM',
'1/01/2011 4:30 AM'};

format long
diffs = diff(datenum(C))

diffs =

   0.041666666627862
   0.041666666627862
   0.020833333372138
   0.062500000000000

      

Therefore, for the second, third and last elements of C, we are skipping measurements at half-hour intervals. In particular, I am assuming that your readings are spaced half an hour apart. The smallest possible gap caused by a missing measurement is therefore one hour, and the jump from 0.0208 to 0.0416 is a difference of about 0.02. Thus, we need to find places in this array where the difference is about 0.0416 or larger. To be safe, use a threshold of 0.03, which sits between the two values. If you want to do this programmatically, you can do this:

diffs = diff(datenum(C));
locs = find(diffs > 0.03) + 1;

      

find returns the locations in a matrix or array that satisfy a particular Boolean condition. In this case, we want the locations where the difference is > 0.03. We also offset by 1 because element i of the diff output corresponds to element i+1 of C, as discussed earlier. Running this on our modified array C, we get:

locs =

     2
     3
     5

      

This tells us that at locations 2, 3, and 5 of our modified date array (C), at least one half-hour measurement was skipped just before that entry.
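
If you also want to know how many readings each gap is missing, one possible sketch is to divide each difference by the expected step and round. This assumes a fixed 30-minute step; the names step and nMissing are illustrative and not part of the original answer:

```matlab
% Modified example with some readings removed
C = {'1/01/2011 12:30 AM';
     '1/01/2011 1:30 AM';
     '1/01/2011 2:30 AM';
     '1/01/2011 3:00 AM';
     '1/01/2011 4:30 AM'};

step = 1/48;                     % assumed spacing: half an hour, in days
diffs = diff(datenum(C));        % gap between consecutive timestamps

% Rounding absorbs the floating-point noise in the trailing digits.
% A gap of one step means nothing is missing, two steps means one reading, etc.
nMissing = round(diffs / step) - 1;

locs = find(nMissing > 0) + 1;   % same locations as the 0.03 threshold test
```

Rounding to whole steps avoids picking a threshold by hand, at the cost of assuming every reading sits on the 30-minute grid.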

As a double check, if we apply this to the first example, where there are no gaps, we get an empty array as expected:

locs = 

[]

      


As a little bonus, we can display the timestamps where a gap was detected, that is, the entries that come right after a gap. In particular:

missingTimes = C(locs)

      

In our modified example, we get:

missingTimes = 

    '1/01/2011 1:30 AM'
    '1/01/2011 2:30 AM'
    '1/01/2011 4:30 AM'
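
Note that C(locs) lists the entries that follow each gap, not the readings that are actually absent. A rough sketch for reconstructing the absent readings, assuming a fixed 30-minute step (the names t, step, missing, and missingStr are illustrative, not from the original answer):

```matlab
C = {'1/01/2011 12:30 AM';
     '1/01/2011 1:30 AM';
     '1/01/2011 2:30 AM';
     '1/01/2011 3:00 AM';
     '1/01/2011 4:30 AM'};

t = datenum(C);      % serial date numbers, in days
step = 1/48;         % assumed spacing: half an hour
missing = [];        % will hold the serial date numbers of absent readings

for k = 1:numel(t) - 1
    % Number of readings that should sit between t(k) and t(k+1)
    n = round((t(k+1) - t(k)) / step) - 1;
    % (1:n).' is empty when n is 0, so nothing is appended for intact steps
    missing = [missing; t(k) + (1:n).' * step];
end

% Convert back to strings only if something is missing
if ~isempty(missing)
    missingStr = cellstr(datestr(missing, 'mm/dd/yyyy HH:MM PM'));
end
```

For this example the loop should collect four readings (1:00 AM, 2:00 AM, 3:30 AM, and 4:00 AM). The strings produced by datestr may be padded differently from the input strings.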

      


Edit

From our conversation in the comments, this approach breaks as soon as an entry has only a date and no time. In particular, when you call datenum on a cell array that contains at least one such entry, we no longer get fractional values. We only get integers (for some odd reason ... and I can't figure out why. Maybe I should make a StackOverflow post about this). In other words, suppose we had:

C = {'1/01/2011',
'1/01/2011 12:30 AM',
'1/01/2011 1:30 AM',
'1/01/2011 2:30 AM',
'1/01/2011 3:00 AM',
'1/01/2011 4:30 AM'};

      

Then, calling:

diff(datenum(C))

      

We get:

ans =

  0
  0
  0
  0
  0

      




To get around this, I had to implement my own version of diff and call datenum on each element of the date array separately. So do this instead:

format long;
diffs = arrayfun(@(x) datenum(C{x}) - datenum(C{x-1}), (2:numel(C)).');

      

I have used arrayfun and given it an input array that goes from 2 up to the number of elements in C. For each index x in that array, we take the datenum representation of element x and subtract the datenum representation of element x-1. This essentially implements diff manually and avoids the small bug that appears when you include a date that has no time. I honestly don't know why the fractional parts are dropped ... but this works for now.

In any case, we get:

diffs =

   0.020833333372138
   0.041666666627862
   0.041666666627862
   0.020833333372138
   0.062500000000000
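
An equivalent way to write the same workaround, shown here only as a sketch, is to convert each string on its own with cellfun and then use the built-in diff. The variable t is an illustrative name:

```matlab
C = {'1/01/2011';
     '1/01/2011 12:30 AM';
     '1/01/2011 1:30 AM';
     '1/01/2011 2:30 AM';
     '1/01/2011 3:00 AM';
     '1/01/2011 4:30 AM'};

% Call datenum once per string so each entry is parsed independently
t = cellfun(@datenum, C);

% The built-in diff can now be used on the numeric vector
diffs = diff(t);
```

This keeps the per-element parsing of the arrayfun version while calling datenum once per entry instead of twice.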

      


Edit #2

It looks like you are still having problems. Another approach I would suggest is to find the entries where the 12:00 AM timestamp is missing and append 12:00 AM to them manually. We can use regular expressions to do this via regexp. Regular expressions find where patterns occur in strings. So we are going to find the strings that don't have a timestamp at the end, and then use some extra code to insert that timestamp. Consider a toy example:

C = {'1/01/2011',
'1/01/2011 12:30 AM',
'1/01/2011 1:30 AM',
'1/01/2011 2:30 AM',
'1/01/2011 3:00 AM',
'1/01/2011 4:30 AM',
'1/02/2011',
'1/02/2011 12:30 AM',
'1/02/2011 1:30 AM',
'1/02/2011 2:30 AM',
'1/02/2011 3:00 AM',
'1/02/2011 4:30 AM',
'1/03/2011',
'1/03/2011 12:30 AM',
'1/03/2011 1:30 AM',
'1/03/2011 2:30 AM',
'1/03/2011 3:00 AM',
'1/03/2011 4:30 AM'};

      

Here we have several dates and times, with some entries missing the 12:00 AM timestamp. This is how I'm going to insert the timestamps:

missingTimeStampsLocs = cellfun(@(x) isempty(regexp(x,'[0-9]{1,2}\/[0-9]{2}\/[0-9]{4} [0-9]{1,2}:[0-9]{2} [AaPp][Mm]')), C);
missingTimeStamps = C(missingTimeStampsLocs);
filledInTimeStamps = cellfun(@(x) [x ' 12:00 AM'], missingTimeStamps, 'uni', 0);
C(missingTimeStampsLocs) = filledInTimeStamps;

      

It looks like a daunting piece of code, but it can certainly be explained. Start with the first line. We call regexp, which takes the string we want to examine as its first parameter and a description of the pattern to look for as its second. Here, I search for all dates that follow this format:

 #/##/#### ##:## xx
       OR
##/##/#### ##:## xx

      

# stands for a digit and x stands for a letter. We search for all dates that follow this exact format and flag any dates that do not conform to it, which means that they have no timestamp. Take a look at this statement:

regexp(x,'[0-9]{1,2}\/[0-9]{2}\/[0-9]{4} [0-9]{1,2}:[0-9]{2} [AaPp][Mm]')

      

This says that, within the string x, we look for a substring made up of 1 or 2 digits, then /, followed by exactly 2 digits, then /, followed by exactly 4 digits, then a space, then 1 or 2 digits, then :, followed by exactly 2 digits, then a space, and finally either AM or PM, matched case-insensitively. This means that AM or PM can be written in uppercase, lowercase, or a mix of both.

What regexp returns are the locations in your string where the pattern was found. In our case, it either returns 1, meaning that the pattern was found starting at index 1, or empty, meaning that no match was found. If regexp returns empty, then this date is missing its timestamp. This is why I wrapped the call in isempty, to check whether regexp returned empty. I then wrap this in cellfun so that we iterate over all the elements in the cell array. The output (stored in missingTimeStampsLocs) is a logical array in which 1 means the timestamp is missing and 0 means it is present.
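
To see the two possible return values side by side, here is a small sketch that runs the same pattern on one string with a timestamp and one without. The variable names pattern, hit, and miss are illustrative:

```matlab
% Same pattern as in the answer's code
pattern = '[0-9]{1,2}\/[0-9]{2}\/[0-9]{4} [0-9]{1,2}:[0-9]{2} [AaPp][Mm]';

% With a timestamp: regexp returns the start index of the match
hit = regexp('1/01/2011 12:30 AM', pattern);

% Date only: no match, so regexp returns an empty array
miss = regexp('1/01/2011', pattern);

% isempty turns the two outcomes into a logical flag
flags = [isempty(hit), isempty(miss)];
```

Here flags is false for the first string and true for the second, which is the per-element value that cellfun collects into missingTimeStampsLocs.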

The next line of code extracts from the original cell array the dates that have no timestamp. I then run cellfun again to iterate over those cells and append the timestamp 12:00 AM to the end of each string in the extracted cell array. Note that I also specify two additional parameters ('uni' and 0, short for 'UniformOutput' and false), because the output of each call is no longer a single value but a string. These strings are placed in a cell array, which is ideal because they came from a cell array anyway. We did not need to specify this in the first cellfun call because its output is a single value per element, in this case a Boolean 0 or 1. Once that is done, we replace the dates that were missing timestamps with the ones we just filled in with 12:00 AM, overwriting those entries in C. Running the above code on our C, we get the following:

C =  

'1/01/2011 12:00 AM'
'1/01/2011 12:30 AM'
'1/01/2011 1:30 AM'
'1/01/2011 2:30 AM'
'1/01/2011 3:00 AM'
'1/01/2011 4:30 AM'
'1/02/2011 12:00 AM'
'1/02/2011 12:30 AM'
'1/02/2011 1:30 AM'
'1/02/2011 2:30 AM'
'1/02/2011 3:00 AM'
'1/02/2011 4:30 AM'
'1/03/2011 12:00 AM'
'1/03/2011 12:30 AM'
'1/03/2011 1:30 AM'
'1/03/2011 2:30 AM'
'1/03/2011 3:00 AM'
'1/03/2011 4:30 AM'
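
As a possible shortcut, the same padding can be sketched in a single call with regexprep, which accepts a cell array of strings. This is an alternative to the cellfun code above, not the method used in the answer, and it assumes date-only entries consist of nothing but the date:

```matlab
C = {'1/01/2011';
     '1/01/2011 12:30 AM';
     '1/02/2011';
     '1/02/2011 12:30 AM'};

% Match strings that are only a date (anchored with ^ and $),
% capture the date as token 1, and append the midnight timestamp
C = regexprep(C, '^([0-9]{1,2}/[0-9]{2}/[0-9]{4})$', '$1 12:00 AM');
```

Entries that already carry a time do not match the anchored pattern and are left unchanged.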

      

We can then run this through our detection code and see where the timestamps jump by more than half an hour.

diffs = diff(datenum(C));
locs = find(diffs > 0.03) + 1;
missingTimes = C(locs)

      

Thus, we get:

missingTimes = 

'1/01/2011 1:30 AM'
'1/01/2011 2:30 AM'
'1/01/2011 4:30 AM'
'1/02/2011 12:00 AM'
'1/02/2011 1:30 AM'
'1/02/2011 2:30 AM'
'1/02/2011 4:30 AM'
'1/03/2011 12:00 AM'
'1/03/2011 1:30 AM'
'1/03/2011 2:30 AM'
'1/03/2011 4:30 AM'

      


I really hope this is the last time I work on this issue (LOL), as I am quite sure I have covered all contingencies. I am also assuming that your dates are formatted in this specific way, and I hope this solves your problem. We also no longer need the custom diff that we wrote, since the date-only entries are now padded with a 12:00 AM timestamp.

Good luck!
