data one;
   input obs pid upmode mode x1 x2 x3 x4 x5 decision;
   cards;
1 1 1 1 # # # # # 1
2 1 1 2 # # # # # 0
3 1 2 3 # # # # # 0
4 1 2 4 # # # # # 0
5 1 2 5 # # # # # 0
6 2 1 1 # # # # # 0
7 2 1 2 # # # # # 0
8 2 2 3 # # # # # 0
9 2 2 4 # # # # # 0
10 2 2 5 # # # # # 1
;

/* You can view choices as a decision tree and model the decision tree by
   using the nested logit model. You need to use either the NEST statement or
   the CHOICE= option of the MODEL statement to specify the nested tree
   structure. Additionally, you need to identify which explanatory variables
   are used at each level of the decision tree. These explanatory variables
   are arguments for what is called a utility function. The utility function
   is specified using UTILITY statements. For example, consider a two-level
   decision tree. The tree structure is displayed in Figure 17.27.

   Figure 17.27 Two-Level Decision Tree
      Level 2, node 1: alternatives 1 2
      Level 2, node 2: alternatives 3 4 5
*/
proc mdc data=one type=nlogit;
   model decision = x1 x2 x3 x4 x5 /
         choice=(upmode 1 2, mode 1 2 3 4 5);
   id pid;
   utility u(1, 3 4 5 @ 2) = x1 x2,
           u(1, 1 2 @ 1) = x3 x4,
           u(2, 1 2) = x5;
run;

/* All model variables, x1 through x5, are specified in the UTILITY statement.
   It is required that entries denoted as # have values for model estimation
   and prediction. The values of the level 2 utility variable x5 should be the
   same for all the primitive (level 1) alternatives below node 1 at level 2
   and, similarly, for all the primitive alternatives below node 2 at level 2.
   In other words, x5 should have the same value for primitive alternatives 1
   and 2 and, similarly, it should have the same value for primitive
   alternatives 3, 4, and 5. More generally, the values of any level 2 or
   higher utility function variables should be constant across primitive
   alternatives under each node for which the utility function applies.
   Since PROC MDC expects this to be the case, it uses the values of x5 only
   for the primitive alternatives 1 and 3, ignoring the values for the
   primitive alternatives 2, 4, and 5. Thus, PROC MDC uses the values of the
   utility function variable only for the primitive alternatives that come
   first under each node for which the utility function applies. This behavior
   applies to any utility function variables that are specified above the
   first level. The choice variable for level 2 (upmode) should be placed
   before the first-level choice variable (mode) when the CHOICE= option is
   given. Alternatively, the NEST statement can be used to specify the
   decision tree. The following SAS statements fit the same nested logit
   model: */
proc mdc data=one type=nlogit;
   model decision = x1 x2 x3 x4 x5 /
         choice=(mode 1 2 3 4 5);
   id pid;
   utility u(1, 3 4 5 @ 2) = x1 x2,
           u(1, 1 2 @ 1) = x3 x4,
           u(2, 1 2) = x5;
   nest level(1) = (1 2 @ 1, 3 4 5 @ 2),
        level(2) = (1 2 @ 1);
run;

/* The U(1, 3 4 5 @ 2)= option specifies three choices, 3, 4, and 5, at level
   1 of the decision tree. They are connected to the upper branch 2. The
   specified variables (x1 and x2) are used to model this utility function.
   The bottom level of the decision tree is level 1. All variables in the
   UTILITY statement must be included in the MODEL statement. When all choices
   at the first level share the same variables, you can omit the second
   argument of the U()= option for that level. However, U(1, ) = x1 x2 is not
   equivalent to the following statements:
      u(1, 3 4 5 @ 2) = x1 x2;
      u(1, 1 2 @ 1) = x1 x2;
*/

* Decision Tree for Revealed and Stated Preference Data ;
* Level 2 node 1: alternatives 1 2 3, node 2: alternative 4, node 3: alternative 5, node 4: alternative 6 ;
proc mdc data=a type=nlogit;
   model decision = x1 x2 x3 /
         spscale
         choice=(mode 1 2 3 4 5 6);
   id pid;
   utility u(1,) = x1 x2 x3;
   nest level(1) = (1 2 3 @ 1, 4 @ 2, 5 @ 3, 6 @ 4),
        level(2) = (1 2 3 4 @ 1);
run;

/* The SPSCALE option specifies that parameters of inclusive values for nodes
   2, 3, and 4 at level 2 be the same.
   When you specify the SAMESCALE option, the MDC procedure imposes the same
   coefficient of inclusive values for choices 1–4. */

*=======================;
/*-- Two-level Nested Logit --*/
* http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/etsug_mdc_sect037.htm;
* Nest 1: alternatives 1 to 8, nest 2: alternative 9, nest 3: alternatives 10 11 12 ;
proc mdc data=small maxit=200 outest=a;
   model decision = r15 r10 ttime ttime_cp sde sde_cp sdl sdlx d2l /
         type=nlogit
         choice=(alt);
   id id;
   utility u(1, ) = r15 r10 ttime ttime_cp sde sde_cp sdl sdlx d2l;
   nest level(1) = (1 2 3 4 5 6 7 8 @ 1, 9 @ 2, 10 11 12 @ 3),
        level(2) = (1 2 3 @ 1);
run;

* with an equality constraint on the tau parameters ;
/*-- Nested Logit with Equal Dissimilarity Parameters --*/
proc mdc data=small maxit=200 outest=a;
   model decision = r15 r10 ttime ttime_cp sde sde_cp sdl sdlx d2l /
         samescale
         type=nlogit
         choice=(alt);
   id id;
   utility u(1, ) = r15 r10 ttime ttime_cp sde sde_cp sdl sdlx d2l;
   nest level(1) = (1 2 3 4 5 6 7 8 @ 1, 9 @ 2, 10 11 12 @ 3),
        level(2) = (1 2 3 @ 1);
run;

* compute the p-value ;
data _null_;
   /*-- test for H0: tau1 = tau2 = tau3 --*/
   /* ln L(max) = -990.8191 */
   /* ln L(0) = -994.3940 */
   stat = -2 * ( -994.3940 + 990.8191 );
   df = 2;
   p_value = 1 - probchi(stat, df);
   put stat= p_value=;
run;

/*******************************************************************/
/* You can estimate the HEV model with unit scale restrictions on all three
   alternatives (theta1 = theta2 = theta3 = 1). */
/*-- HEV Estimation --*/
proc mdc data=newdata;
   model decision = ttime /
         type=hev
         nchoice=3
         hev=(unitscale=1 2 3, integrate=laguerre)
         covest=hess;
   id pid;
run;

/* The test for scale equivalence (SCALE2=SCALE3=1) is performed using a
   likelihood ratio test statistic.
   The following SAS statements compute the test statistic (1.4276) and its
   p-value (0.4898) from the log-likelihood values in Figure 17.4.1 and
   Figure 17.4.2: */
data _null_;
   /*-- test for H0: scale2 = scale3 = 1 --*/
   /* ln L(max) = -33.4138 */
   /* ln L(0) = -34.1276 */
   stat = -2 * ( -34.1276 + 33.4138 );
   df = 2;
   p_value = 1 - probchi(stat, df);
   put stat= p_value=;
run;

/*-- Heteroscedastic Multinomial Probit --*/
proc mdc data=newdata;
   model decision = ttime /
         type=mprobit
         nchoice=3
         unitvariance=(1 2)
         covest=hess;
   id pid;
   restrict RHO_31 = 0;
run;

/*-- Homoscedastic Multinomial Probit: same standard deviation, S1 = S2 = S3 = 1 --*/
proc mdc data=newdata;
   model decision = ttime /
         type=mprobit
         nchoice=3
         unitvariance=(1 2 3)
         covest=hess;
   id pid;
   restrict RHO_21 = 0;
run;

/* The test for homoscedasticity (sigma3 = 1) under RHO_31 = 0 shows that the
   error variance is not heteroscedastic, since the test statistic (1.313) is
   less than 3.84, the 5% critical value of the chi-square distribution with
   1 degree of freedom. The marginal probability or p-value computed in the
   following program from the PROBCHI function is 0.2519.
*/
data _null_;
   /*-- test for H0: sigma3 = 1 --*/
   /* ln L(max) = -33.8860 */
   /* ln L(0) = -34.5425 */
   stat = -2 * ( -34.5425 + 33.8860 );
   df = 1;
   p_value = 1 - probchi(stat, df);
   put stat= p_value=;
run;

/*-- generate simulated series --*/
%let ndim = 3;
%let nobs = 1000;

/* covariance matrix (2 .6 0 / .6 1 0 / 0 0 1); the array lm holds its
   lower-triangular Cholesky factor, stored row by row */
data trichoice;
   array error{&ndim} e1-e3;
   array vtemp{&ndim} _temporary_;
   array lm{6} _temporary_ (1.4142136 0.4242641 0.9055385 0 0 1);
   retain nseed 345678 useed 223344;

   do id = 1 to &nobs;
      index = 0;
      /* generate independent normal variate */
      do i = 1 to &ndim;
         /* index of diagonal element */
         vtemp{i} = rannor(nseed);
      end;

      /* get multivariate normal variate */
      index = 0;
      do i = 1 to &ndim;
         error{i} = 0;
         do j = 1 to i;
            error{i} = error{i} + lm{index+j}*vtemp{j};
         end;
         index = index + i;
      end;

      x1 = 1.0 + 2.0 * ranuni(useed);
      x2 = 1.2 + 2.0 * ranuni(useed);
      x3 = 1.5 + 1.2 * ranuni(useed);

      util1 = 2.0 * x1 + e1;
      util2 = 2.0 * x2 + e2;
      util3 = 2.0 * x3 + e3;

      do i = 1 to &ndim;
         vtemp{i} = 0;
      end;

      if ( util1 > util2 & util1 > util3 ) then vtemp{1} = 1;
      else if ( util2 > util1 & util2 > util3 ) then vtemp{2} = 1;
      else if ( util3 > util1 & util3 > util2 ) then vtemp{3} = 1;
      else continue;

      /*-- first choice --*/
      x = x1;
      mode = 1;
      decision = vtemp{1};
      output;

      /*-- second choice --*/
      x = x2;
      mode = 2;
      decision = vtemp{2};
      output;

      /*-- third choice --*/
      x = x3;
      mode = 3;
      decision = vtemp{3};
      output;
   end;
run;

/*-- Trinomial Probit --*/
/* the fit is fairly good */
proc mdc data=trichoice randnum=halton nsimul=100;
   model decision = x /
         type=mprobit
         choice=(mode 1 2 3)
         covest=op
         optmethod=qn;
   id id;
run;

/* The nested model is also estimated based on a two-level decision tree (see
   the following program). (See Output 17.3.2.) The estimated result (see
   Output 17.3.3) shows that the data support the nested tree model since the
   estimates of the inclusive value parameters are significant and are less
   than 1.
*/
/* Output 17.3.2 Nested Tree Structure: nest 1 holds alternatives 1 2,
   nest 2 holds alternative 3 */

/*-- Two-Level Nested Logit --*/
proc mdc data=trichoice;
   model decision = x /
         type=nlogit
         choice=(mode 1 2 3)
         covest=op
         optmethod=qn;
   id id;
   utility u(1,) = x;
   nest level(1) = (1 2 @ 1, 3 @ 2),
        level(2) = (1 2 @ 1);
run;

data travel;
   length mode $ 8;
   input auto transit mode $;
   datalines;
52.9 4.4 Transit
4.1 28.5 Transit
4.1 86.9 Auto
56.2 31.6 Transit
51.8 20.2 Transit
0.2 91.2 Auto
27.6 79.7 Auto
89.9 2.2 Transit
41.5 24.5 Transit
95.0 43.5 Transit
... more lines ...
;

data new;
   set travel;
   retain id 0;
   id+1;

   /*-- create auto variable --*/
   decision = (upcase(mode) = 'AUTO');
   ttime = auto;
   autodum = 1;
   trandum = 0;
   output;

   /*-- create transit variable --*/
   decision = (upcase(mode) = 'TRANSIT');
   ttime = transit;
   autodum = 0;
   trandum = 1;
   output;
run;

proc print data=new(obs=10);
   var decision autodum trandum ttime;
   id id;
run;

/* The following statements perform the binary logit estimation. */
proc mdc data=new;
   model decision = autodum ttime /
         type=clogit
         nchoice=2;
   id id;
run;

/* To handle more general cases, you can use the MDCDATA statement. It
   generates choice-specific dummy variables and creates multiple observations
   for each individual. The following example converts the original data set
   travel by using the MDCDATA statement and performs conditional logit
   analysis. Interleaved data are output into the new data set new3, which has
   twice as many observations as the original travel data set. */
proc mdc data=travel;
   mdcdata varlist( x1 = (auto transit) )
           select=mode
           id=id
           alt=alternative
           decvar=Decision / out=new3;
   model decision = auto x1 /
         nchoice=2
         type=clogit;
   id id;
run;

/****************************************************/
/* binary logit */
data smdata;
   input gpa tuce psi grade;
   datalines;
2.66 20 0 0
2.89 22 0 0
3.28 24 0 0
... more lines ...
;
data smdata1;
   set smdata;
   retain id 0;
   id + 1;

   /*-- first choice --*/
   choice1 = 1;
   choice2 = 0;
   decision = (grade = 0);
   gpa_2 = 0;
   tuce_2 = 0;
   psi_2 = 0;
   output;

   /*-- second choice --*/
   choice1 = 0;
   choice2 = 1;
   decision = (grade = 1);
   gpa_2 = gpa;
   tuce_2 = tuce;
   psi_2 = psi;
   output;
run;

/*-- Conditional Logit --*/
/* Therefore, you can interpret the binary choice data as the difference
   between the first and second choice characteristics. */
proc mdc data=smdata1;
   model decision = choice2 gpa_2 tuce_2 psi_2 /
         type=clogit
         nchoice=2
         covest=hess;
   id id;
run;

/* Caution: the results differ from the traditional probit, because the
   estimates must be multiplied by a scale factor of 1/sqrt(2). */
/*-- Multinomial Probit --*/
proc mdc data=smdata1;
   model decision = choice2 gpa_2 tuce_2 psi_2 /
         type=mprobit
         nchoice=2
         covest=hess
         unitvariance=(1 2);
   id id;
run;
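
/* A possible cross-check, offered as a sketch and not part of the original
   program: fit the ordinary binary logit and probit on the one-row-per-student
   data set smdata with PROC LOGISTIC. Assumptions made here: grade = 1 is
   treated as the event, and the two PROC MDC probit errors are independent
   with unit variance, so that their difference has variance 2. Under those
   assumptions the logit estimates should be comparable to the conditional
   logit fit, and the probit estimates should be close to the PROC MDC probit
   estimates multiplied by 1/sqrt(2). */

/*-- ordinary binary logit on the original one-row-per-student data --*/
proc logistic data=smdata;
   model grade(event='1') = gpa tuce psi;
run;

/*-- ordinary binary probit, requested through the probit link --*/
proc logistic data=smdata;
   model grade(event='1') = gpa tuce psi / link=probit;
run;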