
Airline data

Posted by Oleg Solovyev on Apr 13, 2011

The US Bureau of Transportation Statistics not only collects data but also provides access to it through its website, BTS.gov. For example, SAS uses airline data in its advertising and educational materials to demonstrate time series forecasting.

Airline data graph

This time series is ideal for demonstrations because it contains both a trend (the 12-month average keeps increasing) and seasonality (the pattern of values “repeats” every 12 months). This article shows how to download the data and import it into SAS.
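To make “trend” and “seasonality” concrete, here is a minimal sketch on a synthetic monthly series (made-up numbers, not the actual BTS airline data): a linear trend plus a 12-month sine cycle. A 12-month moving average cancels the seasonal cycle, so it rises steadily; once the (known, assumed) trend is subtracted, values 12 months apart coincide.

```python
import math

# Hypothetical monthly series: linear trend plus a 12-month seasonal cycle
# (synthetic numbers, not the actual BTS airline data).
months = range(48)
series = [100 + 2 * t + 20 * math.sin(2 * math.pi * t / 12) for t in months]

def moving_average(values, window=12):
    """Trailing moving average; a 12-month window averages out the seasonal cycle."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

ma = moving_average(series)

# Trend: every 12-month average is higher than the previous one
increasing = all(b > a for a, b in zip(ma, ma[1:]))

# Seasonality: after removing the assumed linear trend,
# values 12 months apart are the same
detrended = [v - (100 + 2 * t) for t, v in zip(months, series)]
repeats = all(abs(detrended[t] - detrended[t + 12]) < 1e-9
              for t in range(len(series) - 12))

print(increasing, repeats)
```

With real data the trend is not known in advance, which is exactly what forecasting procedures have to estimate.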


Excel Import

Posted by Oleg Solovyev on Apr 12, 2011

Recently I had to import an Excel file into a SAS data warehouse (DWH). There are three popular ways to do it:

  1. the import wizard: File → Import Data
  2. a SAS library assigned to the Excel file through the ODBC engine
  3. INFILE and INPUT statements in a DATA step

All these methods work well when the data is in one table, e.g. the first row contains variable names and the rest is data. But my Excel file had several blocks of data. I could have moved them into separate spreadsheets by hand, but manual work invites errors, and I would have to repeat it every time a new file arrived.
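One way to automate this, sketched here in Python rather than SAS and under an assumed layout (not necessarily the approach used in the full post): the sheet has been exported row by row (for example to CSV), blocks are separated by fully empty rows, and each block starts with its own header row. The function name `split_blocks` and the sample sheet are hypothetical.

```python
import csv
import io

def split_blocks(rows):
    """Split rows into blocks separated by empty rows.

    Assumed layout (hypothetical): a row whose cells are all empty ends
    a block, and the first row of each block is its header.
    Returns a list of (header, data_rows) tuples.
    """
    blocks, current = [], []
    for row in rows:
        if all(cell.strip() == "" for cell in row):
            # An empty row closes the current block, if one is open
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(row)
    if current:
        blocks.append(current)
    return [(block[0], block[1:]) for block in blocks]

# Hypothetical sheet exported to CSV: two blocks separated by a blank row
sheet = """region,sales
North,10
South,12
,
product,units
A,5
B,7
"""

rows = list(csv.reader(io.StringIO(sheet)))
for header, data in split_blocks(rows):
    print(header, data)
```

Because the split is driven by the layout rather than by hand, the same code can be rerun on every new file.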


Most important var

Posted by Oleg Solovyev on Mar 15, 2011

They say that Business and IT speak different languages. I would add a third group: Statisticians. All three have trouble communicating with one another, which slows down any joint project.

I once worked on optimizing a marketing campaign. The task was to find the group of customers most likely to buy a product. The problem involved about fifty variables, and Business kept pressing me to find the most important one. They thought there was a single variable that alone could split customers into groups with the highest and lowest response rates.

The most important variables

I told them that models can include dozens of variables, and that a variable’s importance can depend on which other variables are in the model (multicollinearity). Moreover, each modeling method has its own algorithm for picking the “best” ones. But I wasn’t able to convince the Business. They kept looking for the most important variable.
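Here is a small sketch of why “the most important variable” depends on the rest of the model. It uses simulated data and ordinary least squares as a stand-in for whatever model a campaign would actually use: `x2` is almost a copy of `x1`, and the response is driven by `x1` only. On its own, `x2` gets a large coefficient; once `x1` is in the model, the coefficient of `x2` collapses toward zero. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # x2 is highly correlated with x1
y = 2.0 * x1 + rng.normal(size=n)   # response actually driven by x1

def ols(y, *cols):
    """Least-squares coefficients (intercept dropped from the result)."""
    X = np.column_stack([np.ones_like(y), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

alone = ols(y, x2)          # x2 looks very important on its own
together = ols(y, x1, x2)   # with x1 present, x2 adds little

print("x2 alone:  ", alone)
print("x1 and x2: ", together)
```

Exact values depend on the random draw, but the pattern is the point: the same variable can look decisive or nearly useless depending on its companions.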


SPDE library

Posted by Oleg Solovyev on Mar 14, 2011

The default SAS library engine is called V9. Each table in a V9 library is a single file on disk. SPDE (Scalable Performance Data Engine) is an alternative engine. A data set in an SPDE library can be split into several files, either to speed up I/O or to spread big tables across several disks. The cool thing is that SPDE doesn’t require an additional license; it comes with Base SAS.

In Oracle or PostgreSQL terms, this technique is called partitioning.
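The idea of splitting one table across several disks can be illustrated with a toy model: cut the rows into fixed-size partitions and place them round-robin across several locations. This is only a conceptual sketch, not SAS’s actual on-disk format; the function name, paths, and file names are hypothetical. In SAS itself the locations and partition size are, as far as I know, set with the DATAPATH= and PARTSIZE= options of the SPDE LIBNAME statement.

```python
def assign_partitions(n_rows, rows_per_partition, data_paths):
    """Toy model of partitioned storage: cut a table into fixed-size
    partitions and place them round-robin across several locations.
    Paths and file names are hypothetical, not SAS's real file layout."""
    layout = []
    for part, start in enumerate(range(0, n_rows, rows_per_partition)):
        end = min(start + rows_per_partition, n_rows)
        location = data_paths[part % len(data_paths)]  # cycle through disks
        layout.append((part, start, end, f"{location}/part{part}.dat"))
    return layout

for entry in assign_partitions(10, 4, ["/disk1", "/disk2"]):
    print(entry)
```

Consecutive partitions land on different disks, so a full scan can read from both disks in parallel.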


Tree w/o flag

Posted by Oleg Solovyev on Oct 10, 2010

The first time I heard about decision trees was when I learned how to separate good credit applicants from bad ones. In brief, each of the bank’s clients is assigned a value called the good/bad flag: “1” (bad) means that the client defaulted on a loan within 90 days, and “0” (good) otherwise.
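A minimal sketch of the flag and of how a tree might use it. The flag follows the 90-day rule above; the `debt_ratio` variable, the sample data, and the `None` encoding for “no default” are hypothetical. The split criterion is Gini impurity, one common choice (used by CART), not necessarily the one a given tree procedure uses.

```python
def bad_flag(days_to_default):
    """1 (bad) if the client defaulted within 90 days, else 0 (good).
    None means no default was observed (hypothetical encoding)."""
    return 1 if days_to_default is not None and days_to_default <= 90 else 0

def gini(labels):
    """Gini impurity of a binary target: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Threshold on one variable that minimizes the weighted Gini impurity."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical applicants: debt ratio and days until default (None = no default)
debt_ratio = [10, 20, 30, 40, 50, 60]
days = [None, None, 200, 30, 60, 45]
flags = [bad_flag(d) for d in days]

print(flags, best_split(debt_ratio, flags))
```

A tree repeats this search over all variables and then recursively inside each resulting group.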

decision tree with binary target
