Creating dates in a date range using U-SQL
We always recommend that developers first learn the pure U-SQL approach before reaching for C# UDOs, so here is another way to accomplish this task.
Let's first look at how to get a simple list of numbers in U-SQL:
@numbers_10 =
    SELECT *
    FROM (VALUES
            (0),
            (1),
            (2),
            (3),
            (4),
            (5),
            (6),
            (7),
            (8),
            (9)
         ) AS T(Value);
There are only 10 numbers - from 0 to 9. We can use CROSS JOIN to expand the list.
@numbers_100 =
    SELECT (a.Value*10 + b.Value) AS Value
    FROM @numbers_10 AS a
         CROSS JOIN @numbers_10 AS b;
We now have 0 to 99. We can use the CROSS JOIN to create even more numbers.
@numbers_10000 =
    SELECT (a.Value*100 + b.Value) AS Value
    FROM @numbers_100 AS a
         CROSS JOIN @numbers_100 AS b;
Then create a list of dates by adding each number, as a day offset, to a start date:
DECLARE @StartDate = DateTime.Parse("1979-03-31");
...
@result =
    SELECT Value,
           @StartDate.AddDays( Value ) AS Date
    FROM @numbers_10000;
The complete script looks like this:
DECLARE @StartDate = DateTime.Parse("1979-03-31");

@numbers_10 =
    SELECT *
    FROM (VALUES
            (0),
            (1),
            (2),
            (3),
            (4),
            (5),
            (6),
            (7),
            (8),
            (9)
         ) AS T(Value);

@numbers_100 =
    SELECT (a.Value*10 + b.Value) AS Value
    FROM @numbers_10 AS a
         CROSS JOIN @numbers_10 AS b;

@numbers_10000 =
    SELECT (a.Value*100 + b.Value) AS Value
    FROM @numbers_100 AS a
         CROSS JOIN @numbers_100 AS b;

@result =
    SELECT Value,
           @StartDate.AddDays( Value ) AS Date
    FROM @numbers_10000;

OUTPUT @result TO "/res.csv" USING Outputters.Csv(outputHeader:true);
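The script above always emits 10,000 consecutive dates starting at @StartDate. If you want only the dates up to a given end date, one option is to filter the rowset. The following is a minimal sketch, not part of the original script: @EndDate, @dates_in_range and the output path are hypothetical names chosen for illustration, and the end date has to fall within the 10,000 generated days (roughly 27 years) for the filter to cover the whole range.

```usql
// Assumption: @result is the rowset produced by the complete script above.
// @EndDate is a hypothetical upper bound, inclusive.
DECLARE @EndDate = DateTime.Parse("1989-03-31");

@dates_in_range =
    SELECT Value,
           Date
    FROM @result
    WHERE Date <= @EndDate;

OUTPUT @dates_in_range
TO "/dates_in_range.csv"
USING Outputters.Csv(outputHeader:true);
```

If the range you need is longer than 10,000 days, add one more CROSS JOIN step before filtering.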
Once you have a list of numbers or dates, it can be convenient to store it in a U-SQL table so you can easily retrieve the list later.
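As a rough sketch of what that could look like, the statements below store the @result rowset from the script above in a managed table. The table name dbo.DateList, the index name, and the choice to cluster and distribute on Value are assumptions made for this example, not something the original script prescribes.

```usql
// Assumption: @result (columns Value and Date) comes from the script above.
// dbo.DateList and idx_DateList are hypothetical names.
DROP TABLE IF EXISTS dbo.DateList;

CREATE TABLE dbo.DateList
(
    Value int,
    Date DateTime,
    INDEX idx_DateList CLUSTERED (Value ASC)
    DISTRIBUTED BY HASH (Value)
);

INSERT INTO dbo.DateList
SELECT Value,
       Date
FROM @result;
```

A later script can then read the list back with a plain SELECT Value, Date FROM dbo.DateList instead of regenerating it.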
The easiest way to do this is to export your favorite date dimension from your favorite storage and import it into a U-SQL table.
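A minimal sketch of such an import, assuming the date dimension was exported as a CSV file with a header row and two columns: the file path, the column names DateKey and FullDate, and the table name dbo.DateDimension are all hypothetical and need to be adapted to your own export.

```usql
// Assumption: /input/date_dimension.csv has a header row followed by
// rows of the form <integer key>,<date>. Path and column names are hypothetical.
@date_dimension =
    EXTRACT DateKey int,
            FullDate DateTime
    FROM "/input/date_dimension.csv"
    USING Extractors.Csv(skipFirstNRows:1);

// Create the table and load it in one statement.
CREATE TABLE dbo.DateDimension
(
    INDEX idx_DateDimension CLUSTERED (DateKey ASC)
    DISTRIBUTED BY HASH (DateKey)
) AS SELECT DateKey,
            FullDate
     FROM @date_dimension;
```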
You can also do it with custom U-SQL code, something like this:
DECLARE @outputFilepath string = "output/output74.csv";
// ISO 8601 dates (yyyy-MM-dd) avoid the day/month ambiguity of strings such as "1/1/2000"
//DECLARE @startDate DateTime = DateTime.Parse("2017-07-19");
//DECLARE @endDate DateTime = DateTime.Parse("2017-07-21");
DECLARE @startDate DateTime = DateTime.Parse("2000-01-01");
DECLARE @endDate DateTime = DateTime.Parse("2017-12-31");
// User-defined appliers
// Take one row and produce 0 to n rows
// Used with OUTER/CROSS APPLY
@output =
    SELECT outputDate
    FROM (
            VALUES ( 1 )
         ) AS dummy(x)
         CROSS APPLY new USQLtpch.makeDateRange (@startDate, @endDate) AS properties(outputDate DateTime);

OUTPUT @output
TO @outputFilepath
USING Outputters.Tsv();
Code file:
using Microsoft.Analytics.Interfaces;
using Microsoft.Analytics.Types.Sql;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
namespace USQLtpch
{
    [SqlUserDefinedApplier]
    public class makeDateRange : IApplier
    {
        private DateTime startDate;
        private DateTime endDate;

        public makeDateRange(DateTime startDate, DateTime endDate)
        {
            this.startDate = startDate;
            this.endDate = endDate;
        }

        public override IEnumerable<IRow> Apply(IRow input, IUpdatableRow output)
        {
            // Initialise
            DateTime outputDate = this.startDate;

            // Loop until date range has been filled out
            while (outputDate <= endDate)
            {
                output.Set<DateTime>("outputDate", outputDate);

                // Increment date
                outputDate = outputDate.AddDays(1);

                yield return output.AsReadOnly();
            }
        }
    }
}
I did this using a custom applier that takes one row and produces 0 to n rows.
Step 1: You need a deterministic ordering of the rows in the rowset for this to make logical sense, so decide which column you want to order your rows by.
Step 2: Get the row number assigned to each row. Here is an example of how: https://msdn.microsoft.com/en-us/library/azure/mt763822.aspx
Step 3: Use the row number assigned to each row in a C# expression to generate the date that belongs to that row.
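The three steps could be sketched roughly as follows. This is an illustration under assumptions, not code from the answer: the @rows rowset, its Name column, @startDate and the output path are hypothetical stand-ins for your own data, and the explicit cast to double is there so the expression compiles whether ROW_NUMBER is exposed as a nullable or a non-nullable integer type.

```usql
// Hypothetical start date and input rowset, for illustration only.
DECLARE @startDate DateTime = DateTime.Parse("2000-01-01");

@rows =
    SELECT *
    FROM (VALUES
            ("a"),
            ("b"),
            ("c")
         ) AS T(Name);

// Steps 1 and 2: order by a chosen column and number the rows.
@numbered =
    SELECT Name,
           ROW_NUMBER() OVER (ORDER BY Name ASC) AS RowNumber
    FROM @rows;

// Step 3: turn the row number into a day offset from the start date.
// Row numbers start at 1, so subtract 1 to make the first row land on @startDate.
@dated =
    SELECT Name,
           @startDate.AddDays((double)(RowNumber - 1)) AS Date
    FROM @numbered;

OUTPUT @dated
TO "/output/dated_rows.csv"
USING Outputters.Csv(outputHeader:true);
```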
This is a prime example of how the .NET elements of the U-SQL language can be used to great effect. In this case, you can EXPLODE an Enumerable.Range to get a list of increment values that can be applied to the start date:
DECLARE @startDate DateTime = DateTime.Parse("2000/01/01");
DECLARE @endDate DateTime = DateTime.Parse("2017/12/31");
@dates =
    SELECT d.DateValue
    FROM (VALUES(@startDate)) AS sd(s)
         CROSS APPLY // EXPLODE creates a rowset from all the values in the given list
             EXPLODE(Enumerable.Range(0, (@endDate - @startDate).Days + 1) // + 1 so that @endDate itself is included
                               .Select(offset => sd.s.AddDays(offset))
             ) AS d(DateValue);