Normalizing a table

I have a deprecated table that I cannot change. The values ​​in it can be changed from the legacy application (the application also cannot be changed). Due to the large table access from a new application (new requirement) I would like to create a temporary table that will hopefully speed up queries.

The actual requirement is to calculate the number of business days from X to Y. For example, give me all business days from January 1, 2001 to December 24, 2004. The table is used to denote which days are off as different companies may have different days off - it's not just Saturday + Sunday)

A temporary table will be created from a .NET program, every time the user enters the screen for this query (user can run the query multiple times, with different values, the table is created once), so I would like it to be as fast as possible. The approach below works in less than a second, but I've only tested it with a small dataset, and yet it takes probably almost half a second, which is not very user-friendly for the UI - although it's just an overhead for the first request.

The inherited table looks like this:

CREATE TABLE [business_days](
    [country_code] [char](3) ,
    [state_code] [varchar](4) ,
    [calendar_year] [int] ,
    [calendar_month] [varchar](31) ,
    [calendar_month2] [varchar](31) ,
    [calendar_month3] [varchar](31) ,
    [calendar_month4] [varchar](31) ,
    [calendar_month5] [varchar](31) ,
    [calendar_month6] [varchar](31) ,
    [calendar_month7] [varchar](31) ,
    [calendar_month8] [varchar](31) ,
    [calendar_month9] [varchar](31) ,
    [calendar_month10] [varchar](31) ,
    [calendar_month11] [varchar](31) ,
    [calendar_month12] [varchar](31) ,
misc.
)

      

Each month has 31 characters and any weekend (Saturday + Sunday + holiday) is marked with an X. Every half day is marked with an "H". For example, if the month starts on Thursday it will look like (Thursday + Friday, Saturday + Sunday marked with an X):

'  XX     XX ..'

      

I would like the new table to look like this:

create table #Temp (country varchar(3), state varchar(4), date datetime, hours int)

      

And I would like to only have rows for a few days that are disabled (marked with X or H from a previous request)

What I have done so far is this: Create a temporary staging table that looks like this:

create table #Temp_2 (country_code varchar(3), state_code varchar(4), calendar_year int, calendar_month varchar(31), month_code int)

      

To fill it, I have a union that basically combines calendar_month, calendar_month2, calendar_month3, etc.

Than I have a loop that goes through all the lines in # Temp_2, after each line is processed, it is removed from # Temp_2. There is a loop from 1 to 31 to process the string, and the substring (calendar_month, counter, 1) is checked for X or H, in which case there is an insert into the #Temp table. [edit added code]

Declare @country_code char(3)
Declare @state_code varchar(4)
Declare @calendar_year int
Declare @calendar_month varchar(31)
Declare @month_code int
Declare @calendar_date datetime
Declare @day_code int
WHILE EXISTS(SELECT * From #Temp_2) -- where processed = 0)
BEGIN
    Select Top 1 @country_code = t2.country_code, @state_code = t2.state_code, @calendar_year = t2.calendar_year, @calendar_month = t2.calendar_month, @month_code = t2.month_code From #Temp_2 t2 -- where processed = 0

    set @day_code = 1
    while @day_code <= 31
    begin
        if substring(@calendar_month, @day_code, 1) = 'X'
        begin
            set @calendar_date = convert(datetime, (cast(@month_code as varchar) + '/' + cast(@day_code as varchar) + '/' + cast(@calendar_year as varchar)))
            insert into #Temp (country, state, date, hours) values (@country_code, @state_code, @calendar_date, 8)
        end
        if substring(@calendar_month, @day_code, 1) = 'H'
        begin
            set @calendar_date = convert(datetime, (cast(@month_code as varchar) + '/' + cast(@day_code as varchar) + '/' + cast(@calendar_year as varchar)))
            insert into #Temp (country, state, date, hours) values (@country_code, @state_code, @calendar_date, 4)
        end

        set @day_code = @day_code + 1
    end
    delete from #Temp_2 where @country_code = country_code AND @state_code = state_code AND @calendar_year = calendar_year AND @calendar_month = calendar_month AND @month_code = month_code
    --update #Temp_2 set processed = 1 where @country_code = country_code AND @state_code = state_code AND @calendar_year = calendar_year AND @calendar_month = calendar_month AND @month_code = month_code
END

      

I am not an expert in SQL, so I would like to get some input on my approach and maybe even a much better suggestion.

After the temp table I am planning to do (dates will come from the table):

select cast(convert(datetime, ('01/31/2012'), 101) -convert(datetime, ('01/17/2012'), 101) as int) -  ((select sum(hours) from #Temp where date between convert(datetime, ('01/17/2012'), 101) and convert(datetime, ('01/31/2012'), 101)) / 8)

      

Aside from the table normalization solution, the other solution I have implemented so far is a function that does all this logic of getting weekdays by scanning the current table. It works pretty fast, but I hesitate to call the function if I can instead add a simpler query to get the result.

(I am currently trying to do this in MSSQL, but I will need to do the same for Sybase ASE and Oracle)

+3


source to share


2 answers


This must meet the requirement "... calculate the number of working days from X to Y".

It counts each space as a workday and nothing but X or a space as half a day (there should only be H, according to the OP).

I disabled this in SQL Server 2008 R2:

-- Calculate number of business days from X to Y
declare @start date = '20120101' -- X
declare @end date = '20120101' -- Y
-- Outer query sums the length of the full_year text minus non-work days
-- Spaces are doubled to help account for half-days...then divide by two
select sum(datalength(replace(replace(substring(full_year, first_day, last_day - first_day + 1), ' ', '  '), 'X', '')) / 2.0) as number_of_business_days
from (
    select
        -- Get substring start value for each year
        case
                when calendar_year = datepart(yyyy, @start) then datepart(dayofyear, @start)
                else 1
            end as first_day
        -- Get substring end value for each year
        , case
                when calendar_year = datepart(yyyy, @end) then datepart(dayofyear, @end)
                when calendar_year > datepart(yyyy, @end) then 0
                when calendar_year < datepart(yyyy, @start) then 0
                else datalength(full_year)
            end as last_day
        , full_year
    from (
        select calendar_year
            -- Get text representation of full year
            , calendar_month
            + calendar_month2
            + calendar_month3
            + calendar_month4
            + calendar_month5
            + calendar_month6
            + calendar_month7
            + calendar_month8
            + calendar_month9
            + calendar_month10
            + calendar_month11
            + calendar_month12 as full_year
        from business_days
        -- where country_code = 'USA' etc.
    ) as get_year
) as get_days

      

And where the offer can fulfill the internal request.



This is not an integral part of the legacy format that the OP spends a lot of time on and which will probably take longer (and possibly unnecessary) computational cycles. I assume that such a thing is "nice to see" and not part of the requirements. Jeff Moden has excellent articles on how a table of tables could help in this case (for SQL Server, anyway).

It may be necessary to look at the trailing spaces depending on how the particular RDBMS is installed (note that I am using datalength and not len).

UPDATE: Added the requested OP temporary table:

select country_code
    , state_code
    , dateadd(d, t.N - 1, cast(cast(a.calendar_year as varchar(8)) as date)) as calendar_date
    , case substring(full_year, t.N, 1) when 'X' then 0 when 'H' then 4 else 8 end as business_hours
from (
    select country_code
        , state_code
        , calendar_year
        , calendar_month
            + calendar_month2
            + calendar_month3
            + calendar_month4
            + calendar_month5
            + calendar_month6
            + calendar_month7
            + calendar_month8
            + calendar_month9
            + calendar_month10
            + calendar_month11
            + calendar_month12
            as full_year
    from business_days
) as a, (
        select a.N + b.N * 10 + c.N * 100 + 1 as N
        from (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a
            , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) b
            , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) c
    ) as t -- cross join with Tally table built on the fly
where t.N <= datalength(a.full_year)

      

+2


source


Given that your temporary table is slow, can you precompute it?

If you can put the trigger on an existing table, perhaps you can run a proc which will crash and create a temporary table. Or have an agent job that checks if an existing table is up to date (raise a flag somewhere) and then relink the temp table.



The existing table structure is so bitter that I wouldn't be surprised if it is always expensive to normalize it. Precomputing is a simple and straightforward way to solve this problem.

0


source







All Articles