SAS LENGTH statement

I just realized how useful it is to reduce the length of numeric variables (such as dummies and integers), as it saves both time and disk space. However, I find it convenient to put the LENGTH statement at the end of my code rather than before the SET statement (the latter is where SAS bloggers and other experts generally recommend putting it).

So, is there a difference between the two (see the examples below)? I can't find any difference in the output, but I'm a little concerned that I might be doing something wrong. Could you please explain what the difference is (if any), and why you would prefer one placement over the other?

Thanks in advance!

This is how I use the LENGTH statement:

data b;
    set a;

    dummy = income > 10000;

    label "dummy = Income > 10 000";

    length dummy 3;
run;

      

And this is how the experts recommend doing it:

data b;
    length dummy 3;
    set a;

    dummy = income > 10000;

    label "dummy = Income > 10 000";
run;

      



2 answers


I would have sworn that in previous versions of SAS you could not override the length of a variable once it had been defined by a LENGTH statement or "inherited" from the source data.

I remember notes or warnings along the lines of "length of variable ... has already been set".

In SAS 9.3, however, the following code:

data a;
    length income dummy 8.;
    income = 1234567890;
    dummy = 1234567890;
    output;
    stop;
run;

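data b;
    set a;
    /* first explicit length (and label) for dummy in this step */
    attrib dummy length = 3 label = "dummy = Income > 10 000";

    dummy = income > 10000;
    /* two further LENGTH statements placed after the assignment */
    length dummy 8;
    length dummy 5;
run;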

      

creates a variable dummy of length 5 without any notes. So it seems to me that the behavior has changed. Previously, I would have said that you end up with the length set by the first explicit definition, or by the variable's first appearance in the source data.



That said, assigning values to variables first and defining their basic properties only at the very end certainly does not help the readability and maintainability of the code.

By the way, the correct LABEL syntax names the variable: label dummy = "dummy = Income > 10 000";

Alternatively, you may prefer the ATTRIB statement, which lets you specify several properties of the same variable in a single statement.

data b;  
    set a (drop = dummy);
    attrib dummy length = 3 label = "dummy = Income > 10 000";

    dummy = income > 10000;

run;

      



The length of a numeric variable can be changed at any point in the data step, while the length of a character variable can only be set before the variable is created. This is because the length of a numeric variable only affects the output data set; within the PDV (program data vector), numeric variables always have 8 bytes of precision regardless of any LENGTH statements. Character variables, however, cannot have their length redefined, because the space the PDV allocates to a character variable cannot be changed once it has been defined (by a SET statement, or by the first LENGTH / ATTRIB / assignment statement that mentions the variable). See the LENGTH statement documentation for more details (although not as many as one would like).
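The difference between numeric and character variables can be sketched as follows. This is only an illustration: the data set a and the variables income and region are made up for the example, and the behavior noted in the comments is what I would expect from the rules above, so check your own log.

```sas
/* Hypothetical input: one numeric and one character variable */
data a;
    income = 12000;
    region = "North";       /* character, length 5 from the first assignment */
run;

data b;
    set a;
    dummy = income > 10000;

    /* Numeric: a late LENGTH statement still applies to the output data set */
    length dummy 3;

    /* Character: too late, SET has already fixed the length at 5.       */
    /* SAS should keep length 5 and write a warning to the log.          */
    length region $ 20;
run;

/* Shows the stored lengths of dummy and region in b */
proc contents data=b;
run;
```

If you really need region to be longer, the LENGTH statement for it has to come before the SET statement.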



That said, I prefer to put formats and lengths at the top rather than at the end. Part of the reason is that anyone reading the program knows up front what the result will look like; but mostly it is that some lengths / attributes have to come first: character lengths in particular, and any variable whose type (numeric / character) you need to pre-specify to make sure you end up with the right type. If you usually put lengths at the end, you will end up with a mix of some at the top and some at the bottom, so I prefer to put everything at the top to stay organized.
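A minimal sketch of the "everything at the top" layout, assuming an input data set a with a numeric variable income (the variable grade is hypothetical and only there to show a character length):

```sas
data b;
    /* Declare types and lengths before SET, in one place */
    length dummy 3 grade $ 10;
    set a;

    dummy = income > 10000;

    /* Without the LENGTH statement above, grade would get its length */
    /* from the first assignment below ("low", 3 characters), and     */
    /* "high" would be truncated to "hig".                            */
    if dummy = 0 then grade = "low";
    else grade = "high";

    label dummy = "dummy = Income > 10 000";
run;
```

One side effect to be aware of: variables declared before the SET statement come first in the output data set, ahead of the variables read from a.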
