Social Network Analysis
Posted by Oleg Solovyev on Aug 14, 2011
One of the newest fields in data mining is Social Network Analysis (SNA). The task is to identify your friends (the first circle), then the friends of your friends (the second circle), and so on. Mathematicians describe this as building a graph made of nodes (the people) and edges (the ties between people).
For example, in telecom, graphs can be built from phone call data. The people you call are your first circle. They are relatives, colleagues or friends. You value those people and listen to their opinions. If one of your friends uses mobile internet, the telecom operator can offer this service to you with a high probability of purchase.
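The idea of circles can be sketched in a few lines of Python. This is only an illustration, not the operator's actual method: the call records, the names and the helper functions `build_graph` and `circles` are all hypothetical, and every call is assumed to create a two-way tie.

```python
from collections import defaultdict


def build_graph(calls):
    """Build an undirected graph: each call ties the caller and the callee."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def circles(graph, person, depth=2):
    """Return a list of sets: circle 1 (direct contacts), circle 2, and so on."""
    seen = {person}          # people already assigned to a circle
    frontier = {person}      # people whose contacts are expanded next
    result = []
    for _ in range(depth):
        next_circle = set()
        for node in frontier:
            # Keep only contacts that are not in a closer circle already.
            next_circle |= graph.get(node, set()) - seen
        seen |= next_circle
        result.append(next_circle)
        frontier = next_circle
    return result


# Hypothetical call records: (caller, callee)
calls = [
    ("ann", "bob"),
    ("ann", "carl"),
    ("bob", "dina"),
    ("carl", "dina"),
    ("dina", "eve"),
]
graph = build_graph(calls)
first, second = circles(graph, "ann")
print(sorted(first), sorted(second))
```

Here the first circle of "ann" is the people she exchanged calls with, and the second circle is their contacts who are not already in the first circle.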
Banks and Credit bureaus
Posted by Oleg Solovyev on Aug 7, 2011
Under Russian legislation, every bank has to report to one of the credit bureaus (CB). The reported credit histories (CH) contain the credit amount, monthly payments and other information. Any bank can request your credit history to assess your creditworthiness and decide whether to issue you another loan.

Consumers with good credit histories can get a new loan at a lower interest rate. But one has to know which CB stores one's credit history and which banks request histories from that CB. If your credit history is poor, for example because you had delinquent loans, you are better off looking for a bank that doesn’t request your credit history.
Gender cleaning
Posted by Oleg Solovyev on Jul 5, 2011
While investigating the ABT table, I found an anomaly in the gender column. The sample contained only 5% males and 95% females, while the expected ratio was 50%/50%. After comparing clients’ names with their recorded gender, I was sure that the gender values were wrong.
I couldn’t simply delete the gender column because gender is often an important factor in the model. So I decided to replace it with a new column calculated from the clients’ patronymics. Most Russian male patronymics are derived from the father’s name by adding “ich”, as in “Ivanovich” and “Ilyich”. Most female patronymics end in “na”, as in “Ivanovna” and “Ilyinichna”.
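The rule can be sketched as follows. This is a minimal Python illustration rather than the code used on the actual table: the function name `gender_from_patronymic` is hypothetical, the Cyrillic endings “ич” and “на” are added on the assumption that the data may not be transliterated, and anything that matches neither ending is left undefined instead of being guessed.

```python
def gender_from_patronymic(patronymic):
    """Return 'M', 'F', or None when the ending matches neither rule."""
    if not patronymic:
        return None
    p = patronymic.strip().lower()
    # Male patronymics: "Ivanovich", "Ilyich" (Cyrillic ending assumed: "ич").
    if p.endswith(("ich", "ич")):
        return "M"
    # Female patronymics: "Ivanovna", "Ilyinichna" (Cyrillic ending assumed: "на").
    if p.endswith(("na", "на")):
        return "F"
    return None


samples = ["Ivanovich", "Ilyich", "Ivanovna", "Ilyinichna", "", "Ogly"]
derived = [gender_from_patronymic(p) for p in samples]
print(derived)
```

Rows where the function returns `None` (a missing or non-standard patronymic) would still need a separate decision, for example keeping the value as missing.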
Locale names
Posted by Oleg Solovyev on Apr 26, 2011
Once, at an interview, I was asked to show my programming skills on the production database. The interviewer got up from his desk and told me to write any code I wanted. I sat down at his computer, opened the SAS window and wrote a simple query that returns a list of all columns in the database:
proc sql; create table test as select * from dictionary.columns; quit;
“Well, you rashly gave me access to your production DB, and now I can get any information out of it.”
“Hey, be careful!”
“For instance, I can get all the column names in your database.”
“Hm… Nice!”
DWH optimization
Posted by Oleg Solovyev on Apr 23, 2011
Indexes
The first thing to start with is indexes. They decrease table read time if one of the columns in the where clause is indexed. They also decrease table join/merge time if one of the ID columns is indexed. The list of DWH indexes is available in the system table DICTIONARY.INDEXES:
proc sql; create table work.indexes_list as select * from dictionary.indexes; quit;
Indexes can be simple or combined (composite, in SAS terminology). A simple index is created on one column. A combined index involves several columns. The main difference is that a combined index is more efficient than a simple index when the query filters or joins on several columns. SAS decides which index to use depending on the query code and the indexes available.
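The difference between the two kinds of index can be tried out on a small scale. The sketch below does not use SAS: it uses Python's built-in sqlite3 module as a stand-in, so the optimizer behavior is SQLite's and only illustrates the general idea. The table `payments`, its columns and both index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table payments (client_id integer, product_id integer, amount real)"
)
conn.executemany(
    "insert into payments values (?, ?, ?)",
    [(c, p, 100.0 * c + p) for c in range(100) for p in range(10)],
)

# Simple index: one column.
conn.execute("create index idx_client on payments (client_id)")
# Combined (composite) index: several columns.
conn.execute(
    "create index idx_client_product on payments (client_id, product_id)"
)

query = "select amount from payments where client_id = ? and product_id = ?"

# Ask the engine which access path it chose for a two-column filter.
plan = conn.execute("explain query plan " + query, (7, 3)).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)

rows = conn.execute(query, (7, 3)).fetchall()
print(rows)
```

The printed plan names the index the engine picked for the two-column filter. As in SAS, the choice is made by the engine, not by the query author.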
