Creating dates in a date range using U-SQL

I need to populate a rowset with all dates between a specific start date and an end date. If my start date is 19/7/2017 and my end date is 21/7/2017 then the rowset should contain 19/7/2017, 20/7/2017 and 21/7/2017.

I was wondering if there is an easy way to do this using U-SQL.

+3




4 answers


We always recommend that developers learn the pure U-SQL approach first, before reaching for C# UDOs, so here is another way to accomplish this task.

Let's first look at how to get a simple list of numbers in U-SQL:

@numbers_10 = 
    SELECT
        *
    FROM 
    (VALUES
        (0),
        (1),
        (2),
        (3),
        (4),
        (5),
        (6),
        (7),
        (8),
        (9)
    ) AS T(Value);

      

That gives only 10 numbers, from 0 to 9. We can use CROSS JOIN to expand the list.

@numbers_100 = 
    SELECT (a.Value*10 + b.Value) AS Value
    FROM @numbers_10 AS a 
        CROSS JOIN @numbers_10 AS b;

      

We now have 0 to 99. We can use the CROSS JOIN to create even more numbers.



@numbers_10000 = 
    SELECT (a.Value*100 + b.Value) AS Value
    FROM @numbers_100 AS a CROSS JOIN @numbers_100 AS b;

      

Then create a list of dates from that.

DECLARE @StartDate = DateTime.Parse("1979-03-31");

...

@result = 
    SELECT 
        Value,
        @StartDate.AddDays( Value ) AS Date
    FROM @numbers_10000;

      

The complete script looks like this:

DECLARE @StartDate = DateTime.Parse("1979-03-31");

@numbers_10 = 
    SELECT
        *
    FROM 
    (VALUES
        (0),
        (1),
        (2),
        (3),
        (4),
        (5),
        (6),
        (7),
        (8),
        (9)
    ) AS T(Value);

@numbers_100 = 
    SELECT (a.Value*10 + b.Value) AS Value
    FROM @numbers_10 AS a CROSS JOIN @numbers_10 AS b;

@numbers_10000 = 
    SELECT (a.Value*100 + b.Value) AS Value
    FROM @numbers_100 AS a CROSS JOIN @numbers_100 AS b;

@result = 
    SELECT 
        Value,
        @StartDate.AddDays( Value ) AS Date
    FROM @numbers_10000;

OUTPUT @result TO "/res.csv" USING Outputters.Csv(outputHeader:true);
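
The script above always emits 10,000 consecutive dates starting at @StartDate. To get only the dates between a start date and an end date, as the question asks, the number list can be filtered on the day offset. The following is a minimal sketch, not part of the original answer: the @EndDate variable and the output path are assumptions made for illustration.

```usql
DECLARE @StartDate DateTime = new DateTime(2017, 7, 19);
DECLARE @EndDate DateTime = new DateTime(2017, 7, 21);   // assumed variable, not in the original script

@numbers_10 =
    SELECT *
    FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS T(Value);

@numbers_100 =
    SELECT (a.Value*10 + b.Value) AS Value
    FROM @numbers_10 AS a CROSS JOIN @numbers_10 AS b;

@numbers_10000 =
    SELECT (a.Value*100 + b.Value) AS Value
    FROM @numbers_100 AS a CROSS JOIN @numbers_100 AS b;

// Keep only the offsets that fall inside the range.
// Using <= makes the end date itself part of the result.
@result =
    SELECT @StartDate.AddDays(Value) AS Date
    FROM @numbers_10000
    WHERE Value <= (@EndDate - @StartDate).Days;

// Hypothetical output path.
OUTPUT @result TO "/dates_in_range.csv" USING Outputters.Csv(outputHeader:true);
```

With 10,000 numbers this covers ranges of up to roughly 27 years; a longer range would need one more CROSS JOIN step.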

      

Once you have a list of numbers or dates, it can be convenient to store it in a U-SQL table so you can easily retrieve the list later.
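
Storing the list could look roughly like the sketch below. The table name dbo.DateList, the index name, the column name and the three sample rows are all hypothetical, and the distribution scheme is only one possible choice.

```usql
// A small illustrative rowset standing in for the generated dates.
@dates =
    SELECT *
    FROM (VALUES
        (new DateTime(2017, 7, 19)),
        (new DateTime(2017, 7, 20)),
        (new DateTime(2017, 7, 21))
    ) AS T(DateValue);

// Hypothetical table; a managed U-SQL table needs a clustered index
// and a distribution scheme.
CREATE TABLE IF NOT EXISTS dbo.DateList
(
    DateValue DateTime,
    INDEX idx_DateList CLUSTERED (DateValue ASC)
    DISTRIBUTED BY HASH (DateValue)
);

// Persist the rowset so later scripts can read it back.
INSERT INTO dbo.DateList
SELECT DateValue
FROM @dates;
```

A later script can then read the dates with a plain SELECT from dbo.DateList instead of regenerating them.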

+4




The easiest way to do this is to export your favorite date dimension from your favorite storage and import it into a U-SQL table.

You can also do it with custom U-SQL code, something like this:

DECLARE @outputFilepath string = "output/output74.csv";

//DECLARE @startDate DateTime = DateTime.Parse("19/7/2017");
//DECLARE @endDate DateTime = DateTime.Parse("21/7/2017");

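// Note: these d/M/yyyy strings parse only under a culture that uses day-first dates;
// an ISO form such as "2017-12-31" avoids that dependency.
DECLARE @startDate DateTime = DateTime.Parse("1/1/2000");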
DECLARE @endDate DateTime = DateTime.Parse("31/12/2017");


// User-defined appliers
// Take one row and produce 0 to n rows
// Used with OUTER/CROSS APPLY
@output =
    SELECT outputDate
    FROM(
        VALUES ( 1 ) 
        ) AS dummy(x)
        CROSS APPLY new USQLtpch.makeDateRange (@startDate, @endDate) AS properties(outputDate DateTime);


OUTPUT @output
TO @outputFilepath
USING Outputters.Tsv();

      



Code file:

using Microsoft.Analytics.Interfaces;
using Microsoft.Analytics.Types.Sql;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace USQLtpch
{

    [SqlUserDefinedApplier]
    public class makeDateRange : IApplier
    {
        private DateTime startDate;
        private DateTime endDate;

        public makeDateRange(DateTime startDate, DateTime endDate)
        {
            this.startDate = startDate;
            this.endDate = endDate;
        }

        public override IEnumerable<IRow> Apply(IRow input, IUpdatableRow output)
        {

            // Initialise
            DateTime outputDate = this.startDate;


            // Loop until date range has been filled out
            while (outputDate <= endDate)
            {
                output.Set<DateTime>("outputDate", outputDate);

                // Increment date
                outputDate = outputDate.AddDays(1);

                yield return output.AsReadOnly();

            }
        }
    }
}

      

I did this using a custom applier, which takes one row and turns it into 0 to n rows.

+5




Step 1: You need a deterministic ordering of the rows in the rowset for this to make logical sense. So decide which column you want to order your rows by.

Step 2: Get a row number assigned to each row. Here is an example of how: https://msdn.microsoft.com/en-us/library/azure/mt763822.aspx

Step 3: Use the row number assigned to each row, in combination with a C# expression, to generate the date that belongs to that row.
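
The three steps above might be sketched as follows. This is an illustration under assumptions, not the answerer's own code: the @rows rowset, its Id column, the start date and the output path are hypothetical, and ROW_NUMBER is assumed to be the function the linked page describes.

```usql
DECLARE @startDate DateTime = new DateTime(2017, 7, 19);

// Hypothetical input: any rowset with a column that gives a deterministic order (Step 1).
@rows =
    SELECT *
    FROM (VALUES ("a"), ("b"), ("c")) AS T(Id);

// Step 2: assign a row number that follows the chosen ordering.
@numbered =
    SELECT Id,
           ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber
    FROM @rows;

// Step 3: turn the 1-based row number into a day offset from the start date.
// The explicit cast keeps AddDays happy whatever integer type RowNumber has.
@dated =
    SELECT Id,
           @startDate.AddDays((double) (RowNumber - 1)) AS Date
    FROM @numbered;

// Hypothetical output path.
OUTPUT @dated TO "/dated_rows.csv" USING Outputters.Csv(outputHeader:true);
```

This approach only yields as many dates as the input rowset has rows, so it fits best when an existing rowset needs one date per row.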

+1




This is a prime example of how the .NET elements of the U-SQL language can be used to great effect. In this case, you can EXPLODE an Enumerable.Range to get a list of increment values that can be applied to the start date:

DECLARE @startDate DateTime = DateTime.Parse("2000/01/01");
DECLARE @endDate DateTime = DateTime.Parse("2017/12/31");

@dates =
    SELECT d.DateValue
    FROM (VALUES(@startDate)) AS sd(s)
         CROSS APPLY    // EXPLODE creates a rowset from all the values in the given list
             EXPLODE(Enumerable.Range(0
                                     ,(@endDate - @startDate).Days + 1   // + 1 so the end date itself is included
                                     )
                                     .Select(offset => sd.s.AddDays(offset))
                    ) AS d(DateValue)
    ;

      

0

