Saturday 19 September 2015

Learning SAS by example

Program 1
Using IF and ELSE IF statements, compute two new variables as follows: Grad (numeric), with one value if Age is 12 and another value if Age is 13. The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75, D = 70, and F = 65. Using this information, compute a course grade (Course) as a weighted average of the Quiz (20%), Midterm (30%), and Final (50%).

Code:
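A minimal sketch of one possible solution. The data rows and the name QuizGrade are made up for illustration, and because the two Grad values are not given in the problem text above, that variable is left out here.

```sas
data School;
   /* Hypothetical sample rows: Age, Quiz letter, Midterm, Final */
   input Age Quiz : $1. Midterm Final;

   /* Map the quiz letter to its numeric equivalent */
   if      Quiz = 'A' then QuizGrade = 95;
   else if Quiz = 'B' then QuizGrade = 85;
   else if Quiz = 'C' then QuizGrade = 75;
   else if Quiz = 'D' then QuizGrade = 70;
   else if Quiz = 'F' then QuizGrade = 65;

   /* Weighted average: Quiz 20%, Midterm 30%, Final 50% */
   Course = 0.2*QuizGrade + 0.3*Midterm + 0.5*Final;
datalines;
12 A 92 95
13 C 78 75
12 F 55 62
;

proc print data=School noobs;
run;
```

For the first made-up row, Course works out to 0.2*95 + 0.3*92 + 0.5*95 = 94.1. Using ELSE IF means SAS stops testing as soon as one condition is true.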

Output:
Learning:
Use of IF-ELSE in SAS.

Program 2

You have the following seven values for temperatures for each day of the week, starting with Monday: 70, 72, 74, 76, 77, 78, and 85. Create a temporary SAS data set (Temperatures) with a variable (Day) equal to Mon, Tue, Wed, Thu, Fri, Sat, and Sun and a variable called Temp equal to the listed temperature values. Use a DO loop to create the Day variable.

Code:
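One way to write this, as a sketch. The DO loop steps through a list of character values, and the trailing @ on INPUT is an assumption made here so that all seven temperatures can sit on one data line.

```sas
data Temperatures;
   length Day $ 3;
   /* Day takes each listed value in turn */
   do Day = 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun';
      input Temp @;   /* hold the line for the next pass of the loop */
      output;         /* write one observation per day */
   end;
datalines;
70 72 74 76 77 78 85
;

proc print data=Temperatures noobs;
run;
```

This should give seven observations, Mon through Sun, each with its temperature.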

Output:

Learning:
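When this program executes, Day is first set to Mon, the first value in the DO list. All the statements up to the END statement are executed, and Day then takes the next value in the list (Tue, Wed, and so on); the loop ends after Sun. A numeric iterative DO works slightly differently: the index variable starts at the lower limit and is incremented by 1 (the default increment value) each time through. SAS then tests whether the new value is still between the lower limit and the upper limit (the value after the keyword TO). If it is, the statements in the DO group execute again.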

Program 3:
You have daily temperatures for each hour of the day for two cities (Dallas and Houston). The 48 temperature values are strung out in several lines like this:
80 81 82 83 84 84 87 88 89 89
91 93 93 95 96 97 99 95 92 90 88
86 84 80 78 76 77 78
80 81 82 82 86
88 90 92 92 93 96 94 92 90
88 84 82 78 76 74
The first 24 values represent temperatures from Hour 1 to Hour 24 for Dallas and the next 24 values represent temperatures for Hour 1 to Hour 24 for Houston. Using the appropriate DO loops, create a data set (Temperature) with 48 observations, each observation containing the variables City, Hour, and Temp.
Note: For this problem, you will need to use a single trailing @ on your INPUT statement (see Chapter 21, Section 21.11 for an explanation).

Code:
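A sketch of one possible solution, using nested DO loops (City outside, Hour inside) and a single trailing @. The LENGTH statement is an addition made here so that 'Houston' is not truncated to the length of 'Dallas'.

```sas
data Temperature;
   length City $ 7;
   do City = 'Dallas', 'Houston';
      do Hour = 1 to 24;
         input Temp @;   /* hold the line; SAS moves to the next line when one runs out */
         output;
      end;
   end;
datalines;
80 81 82 83 84 84 87 88 89 89
91 93 93 95 96 97 99 95 92 90 88
86 84 80 78 76 77 78
80 81 82 82 86
88 90 92 92 93 96 94 92 90
88 84 82 78 76 74
;

proc print data=Temperature noobs;
run;
```

The first 24 values read go to Dallas and the next 24 to Houston, for 48 observations in all.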
Output:
Learning:
To solve this problem, use a trailing @ at the end of the INPUT statement. This is an instruction to “hold the line” for another INPUT statement in the same DATA step. By “holding the line,” we mean to leave the pointer at the present position and not to advance to the next record. The single trailing @ holds the line until another INPUT statement (without a trailing @) is encountered further down in the DATA step, or the end of the DATA step is reached. When list input reaches the end of a line before finding a value, SAS moves on to the next line, which is why the 48 values can be spread over several lines.


Problem 4
You have several lines of data, consisting of a subject number and two dates (date of birth and visit date). The subject starts in column 1 (and is 3 bytes long), the date of birth starts in column 4 and is in the form mm/dd/yyyy, and the visit date starts in column 14 and is in the form nnmmmyyyy (see sample lines below). Read the following lines of data to create a temporary SAS data set called Dates. Format both dates using the DATE9. format. Include the subject’s age at the time of the visit in this data set.

00110/21/195011Nov2006
00201/02/195525May2005
00312/25/200525Dec2006

Code:
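A sketch of one possible solution, assuming the date of birth is written with slashes (mm/dd/yyyy) so that the visit date starts in column 14 as described. The variable names and the 'Actual' basis for YRDIF are choices made for this example.

```sas
data Dates;
   input @1  Subj      $3.
         @4  DOB       mmddyy10.   /* 10 columns, mm/dd/yyyy */
         @14 VisitDate date9.;     /* 9 columns, ddMonyyyy   */

   /* Age in years at the time of the visit */
   Age = yrdif(DOB, VisitDate, 'Actual');

   format DOB VisitDate date9.;
datalines;
00110/21/195011Nov2006
00201/02/195525May2005
00312/25/200525Dec2006
;

proc print data=Dates noobs;
run;
```

YRDIF returns a fractional number of years; wrapping it in INT() would give completed years instead.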




Output:

Learning:
SAS does not normally store dates in any of these forms. It converts all of these dates into a single number: the number of days from January 1, 1960. Dates after January 1, 1960, are positive integers; dates before January 1, 1960, are negative integers. The first date (starting in column 4) is in the month-day-year form; the last date (starting in column 14) starts with the day of the month, followed by a three-letter month abbreviation and a four-digit year. Notice that some of the dates include separators between the values, while others do not. The date of birth (DOB) takes up 10 columns, so it is read with the MMDDYY10. informat; the visit date takes up 9 columns, so it is read with the DATE9. informat.

Problem 5
A listing of the data file is:
IBM  5/21/2006 $80.0 10007/20/2006 $88.5
CSCO04/05/2005 $17.5 20009/21/2005 $23.6
MOT 03/01/2004 $14.7 50010/10/2006 $19.9
XMSR04/15/2006 $28.4 20004/15/2007 $12.7
BBY 02/15/2005 $45.2 10009/09/2006 $56.8
Create a SAS data set (call it Stocks) by reading the data from this file. Use formatted input. Compute several new variables as follows: TotalPur, TotalSell, and Profit. Print out the contents of this data set using PROC PRINT.

Code:
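A sketch of one possible solution, with several assumptions:
- The column layout is inferred from the sample lines: stock symbol in columns 1-4, purchase date in 5-14, purchase price in 15-20, number of shares in 21-24, selling date in 25-34, and selling price in 35-40.
- The IBM line is padded so that its date lines up with the others.
- The names Stock, PurDate, PurPrice, Number, SellDate, and SellPrice are hypothetical.
- The three computed variables are taken to mean shares times purchase price, shares times selling price, and the difference.
- DATALINES stands in for the external file so the example is self-contained.

```sas
data Stocks;
   input @1  Stock     $4.
         @5  PurDate   mmddyy10.
         @15 PurPrice  dollar6.
         @21 Number    4.
         @25 SellDate  mmddyy10.
         @35 SellPrice dollar6.;

   TotalPur  = Number * PurPrice;    /* assumed definition */
   TotalSell = Number * SellPrice;   /* assumed definition */
   Profit    = TotalSell - TotalPur; /* assumed definition */

   format PurDate SellDate mmddyy10.
          PurPrice SellPrice TotalPur TotalSell Profit dollar10.2;
datalines;
IBM  5/21/2006 $80.0 10007/20/2006 $88.5
CSCO04/05/2005 $17.5 20009/21/2005 $23.6
MOT 03/01/2004 $14.7 50010/10/2006 $19.9
XMSR04/15/2006 $28.4 20004/15/2007 $12.7
BBY 02/15/2005 $45.2 10009/09/2006 $56.8
;

proc print data=Stocks noobs;
run;
```

To read from a real file instead, an INFILE statement naming that file would replace the DATALINES block.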





Output:

Learning:
The @ (at) signs in the INPUT statement are called column pointers, and they do just that. For example, @5 says to SAS, go to column 5. Following the variable names are SAS informats. Informats are built-in instructions that tell SAS how to read a data value. The choice of which informat to use is dictated by the data.
Two of the most basic informats are w.d and $w. The w.d informat reads standard numeric values. The w tells SAS how many columns to read. The optional d tells SAS that there is an implied decimal point in the value. For example, if you have the number 123 and you read it with a 3.0 informat, SAS stores the value 123.0. If you read the same number with a 3.1 informat, SAS stores the value 12.3. If the number you are reading already has a decimal point in it (this counts as one of the columns to be read), SAS ignores the d portion of the informat. So, if you read the value 1.23 with a 4.1 informat, SAS stores a value of 1.23. The $w. informat tells SAS to read w columns of character data; for example, $3. reads a three-column subject number and $1. reads a one-column gender code.
The MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. The DATE9. format prints dates as a two-digit day of the month, a three-character month abbreviation, and a four-digit year. This format helps avoid confusion between the month-day-year and day-month-year forms used in the United States and Europe, respectively. A format such as DOLLAR6.1 makes dollar figures much easier to read.
This is a good place to mention that the COMMAw.d format is useful for displaying large numbers where you don’t need or want dollar signs.
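The w.d examples above can be checked with a small sketch. The data set and variable names are made up; the single data line holds 123 in columns 1-3 and 1.23 in columns 5-8.

```sas
data InformatDemo;
   input @1 A 3.0    /* 123 read as is                          */
         @1 B 3.1    /* same columns, one implied decimal place */
         @5 C 4.1;   /* value already has a decimal point       */
datalines;
123 1.23
;

proc print data=InformatDemo noobs;
run;
```

A should be stored as 123, B as 12.3, and C as 1.23, since the explicit decimal point overrides the d part of the informat.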
           
Problem 6
You have several lines of data, consisting of a subject number and two dates (date of birth and visit date). The subject starts in column 1 (and is 3 bytes long), the date of birth starts in column 4 and is in the form mm/dd/yyyy, and the visit date starts in column 14 and is in the form nnmmmyyyy (see sample lines below). Read the following lines of data to create a temporary SAS data set called Dates. Format both dates using the DATE9. format. Include the subject’s age at the time of the visit in this data set.
00110/21/195011Nov2006
00201/02/195525May2005
00312/25/200525Dec2006

Code:

Output:

Learning:
Same as Problem 4.


Problem 7
Using the Hosp data set, compute the frequencies for the days of the week, months of the year, and year, corresponding to the admission dates (variable AdmitDate). Supply a format for the days of the week and months of the year. Use PROC FREQ to list these frequencies.

Code:
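A sketch of one possible approach. The Hosp data set itself is not shown here, so a tiny stand-in with made-up admission dates is created first. The format names (dayfmt, monfmt) and the data set name Freqs are also just choices for this example.

```sas
proc format;
   value dayfmt 1 = 'Sun' 2 = 'Mon' 3 = 'Tue' 4 = 'Wed'
                5 = 'Thu' 6 = 'Fri' 7 = 'Sat';
   value monfmt 1 = 'Jan' 2  = 'Feb' 3  = 'Mar' 4  = 'Apr'
                5 = 'May' 6  = 'Jun' 7  = 'Jul' 8  = 'Aug'
                9 = 'Sep' 10 = 'Oct' 11 = 'Nov' 12 = 'Dec';
run;

/* Stand-in for Hosp: made-up admission dates */
data Hosp;
   input AdmitDate : mmddyy10.;
   format AdmitDate date9.;
datalines;
01/15/2003
02/03/2003
07/04/2004
07/21/2004
;

data Freqs;
   set Hosp;
   Day   = weekday(AdmitDate);   /* 1 = Sunday, ..., 7 = Saturday */
   Month = month(AdmitDate);     /* 1 to 12 */
   Year  = year(AdmitDate);      /* four-digit year */
   format Day dayfmt. Month monfmt.;
run;

proc freq data=Freqs;
   tables Day Month Year / nocum nopercent;
run;
```

With the real Hosp data set, the stand-in DATA step would be dropped and the SET statement pointed at that data set instead.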











Output:

Learning:
SAS stores a date as a single number: the number of days from January 1, 1960. Dates after January 1, 1960, are positive integers; dates before January 1, 1960, are negative integers. Because AdmitDate is stored this way, the WEEKDAY, MONTH, and YEAR functions can extract the day of the week (1 = Sunday through 7 = Saturday), the month (1 to 12), and the four-digit year from it. User-defined formats can then label the day and month numbers in the PROC FREQ output.


Program 8
Use the following data. If there is a missing value for the day, substitute the 15th of the month.

25 12 2005
. 5 2002
12 8 2006

Code:
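A sketch of one possible solution, assuming the three columns are Day, Month, and Year in that order (the 25 in the first line cannot be a month). The data set name Substitute is made up.

```sas
data Substitute;
   input Day Month Year;

   /* Fall back to the 15th when the day is missing */
   if missing(Day) then Date = mdy(Month, 15, Year);
   else Date = mdy(Month, Day, Year);

   format Date date9.;
datalines;
25 12 2005
. 5 2002
12 8 2006
;

proc print data=Substitute noobs;
run;
```

The second observation should show a Date of 15MAY2002.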


Output:

Learning:
There are occasions where you have a missing value for the day of the month but still want to compute an approximate date. Many people use the 15th of the month to substitute for a missing Day value. The data above demonstrate how this is done. Here the MISSING function tests if there is a missing value for the variable Day. If so, the number 15 is used as the second argument to the MDY function. The resulting listing shows the 15th of the month for the date in the second observation.


Program 9
Using the SAS data set Blood, create two temporary SAS data sets called Subset_A and Subset_B. Include in both of these data sets a variable called Combined equal to .001 times WBC plus RBC. Subset_A should consist of observations from Blood where Gender is equal to Female and BloodType is equal to AB. Subset_B should consist of all observations from Blood where Gender is equal to Female, BloodType is equal to AB, and Combined is greater than or equal to 14.

Code:
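A sketch of one possible solution. The Blood data set is not shown here, so a small stand-in with made-up rows is created first. The point of interest is that a WHERE statement can only test variables that already exist in the input data set, so the condition on Combined (which is created in the DATA step) uses a subsetting IF instead.

```sas
/* Stand-in for Blood: made-up rows */
data Blood;
   input Gender $ BloodType $ WBC RBC;
datalines;
Female AB 7710 7.40
Female AB 5000 5.10
Male   AB 6560 6.00
Female O  7000 7.20
;

data Subset_A;
   set Blood;
   where Gender = 'Female' and BloodType = 'AB';
   Combined = 0.001*WBC + RBC;
run;

data Subset_B;
   set Blood;
   Combined = 0.001*WBC + RBC;
   /* Combined is computed in this step, so a subsetting IF is used */
   if Gender = 'Female' and BloodType = 'AB' and Combined ge 14;
run;
```

With these made-up rows, Subset_A keeps the two Female/AB observations and Subset_B keeps only the first one (Combined = 7.71 + 7.40 = 15.11).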


Output:

Program 10
Look at the following program and determine the storage length of each of the variables:
data storage;
   length A $ 4 B $ 4;
   Name = 'Goldstein';
   AandB = A || B;
   Cat = cats(A,B);
   if Name = 'Smith' then Match = 'No';
   else Match = 'Yes';
   Substring = substr(Name,5,2);
run;

A _________________
B _________________
Name _________________
AandB _________________
Cat _________________
Match _________________
Substring _________________

Code:
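One way to check the answers is to run the program and then look at the Len column from PROC CONTENTS, as in this sketch. The comments give the lengths SAS is expected to assign and the reason for each; the VARNUM option is added only to list the variables in creation order.

```sas
data storage;
   length A $ 4 B $ 4;            /* A and B: 4 each, from LENGTH           */
   Name = 'Goldstein';            /* 9: length of the first value assigned  */
   AandB = A || B;                /* 8: sum of the two lengths              */
   Cat = cats(A,B);               /* 200: default for the CATS function     */
   if Name = 'Smith' then Match = 'No';   /* 2: first assignment seen       */
   else Match = 'Yes';                    /* stored as 'Ye' (truncated)     */
   Substring = substr(Name,5,2);  /* 9: SUBSTR takes the length of Name     */
run;

proc contents data=storage varnum;
run;
```

The length of a character variable is fixed at compile time by the first statement that defines it, which is why Match ends up with a length of 2 even though 'Yes' is the value assigned at run time.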


Output: