Published on April 23, 2019

Leon Ginsburg

Artificial Intelligence (AI) is transforming the landscape of many industries and the industrial markets are no exception. At the end of March, Sphere Software sponsored a TechDebate in Chicago to discuss AI’s impact on labor markets. Adam McElhinney, chief of machine learning and AI strategy at Uptake Technologies, brought his years of experience and insights to the discussion.

After the TechDebate, I had the opportunity to talk further with Adam about the topic.

Let’s have part of our conversation within the context of Uptake’s business. Uptake serves industrial markets; what are the typical differences in operations twenty years ago compared to today?

ADAM: Uptake is an industrial AI company, and our forte is helping customers manage big machines that cost a lot of money when they fail or when they are not being utilized to their full potential. For example, we help railway companies manage their locomotives. If a locomotive fails on the tracks, it costs a lot of money.

Repair costs are high. But the bigger cost is the impact on operations because other locomotives can’t just drive around the failed unit. Everything is on hold until the unit is up, running, and out of the way. There is a good chance that this lapse in operation will rack up additional labor costs, or maybe the railway will be charged a penalty for a late shipment. The common thread of Uptake’s business is providing asset performance management(APM) for companies that operate equipment whose failure can have significant costs. big, expensive machinery.

Our technology also helps operate wind turbines for Berkshire Hathaway Energy. Its wind farm operators log in to our software to see wind turbine performance, the location of technicians, the turbine maintenance schedule, how much revenue each turbine is generating, and many other insights. But the secret sauce of our business is how we inject AI to make insights more powerful.

Twenty years ago, you mostly saw companies and users running machines to failure, then fixing them. Industrial enterprises then started doing preventative maintenance largely based on their expertise, intuition, or guidelines from the manufacturer.

Then industry started adopting what I call optimized preventative maintenance, where companies run optimizations and financial calculations to find the balance between performing maintenance tasks before they are truly needed, which adds unnecessary expense, and performing the maintenance too late, which leads to equipment failure.

Now we’re at the point where we can look at the underlying health of a machine using sensor data and make dynamic, real-time optimizations on the health of that machine.

Are some industry segments more open to applying AI than others?

ADAM: Some companies are much more open than others to applying AI, but we also see industry-wide differences. Companies that tend to have higher margins, thus large costs when there is downtime, and operations as one of their core differentiators, tend to be more open to AI implementation than other companies are.

For example, the oil and gas industry is rapidly adopting AI. It’s been highly sophisticated, using advanced software and mathematical modeling in exploration for decades. We’re now seeing the industry start to leverage AI for production and distribution as well as for refining management. Energy segments outside of oil and gas, like renewable and thermal energy, have also been open to adopting AI.

When you look across the industrial market, it’s hard to think of a company or sector that isn’t harmed by equipment failure, even when operating under tighter margins. Why are some sectors not as open to applying AI?

ADAM: AI implantation requires foundational data. I continue to be shocked by the number of industries and companies that still use paper-based records. Obviously, without large sets of digital data, there is nothing to fuel the AI.

Some companies don’t see operations as a differentiator; therefore, maximizing operations may not be a top priority for their time and monetary investments. I believe maximizing operational efficiencies is critical for most every sector and every company, and organizations will come to this conclusion when it is best for the health of their business.

Are there clear differentiations between companies that are adopting AI and machine learning, versus those that aren’t?

ADAM: Absolutely. At Uptake we work to quantify the exact value being delivered to our customers. Our machine learning algorithms operate on a closed loop that helps them get smarter over time; for instance, we can show our executive sponsor the number of value-creating events triggered by our software.

To learn more about rapidly evolving artificial intelligence and other technology advancements, watch for upcoming TechDebates being held around the world.

Our company, Sphere Software, is a sponsor and organizer of TechDebates initiative and events globally.

]]>The full podcast is here. I hope to soon get time to type up notes to share on the site. Stay tuned!

]]>However, I still feel that there is room for improvement. One common example is many times during the model building process, you wish to split data into testing, training and validation data sets. This should be extremely easy and done in a standardized way.

Thus, I have set out to simplify this process. My goals are to write a function in to split data frames R that:

1. Allows the user to specify any number of splits and the size of splits (the splits need not necessarily all be the same size).

2. Specify the names of the resulting split data.

3. Provide default values for the most common use case.

4. Just “work”. Ideally this means that the function is intuitive and can be used without requiring the user to read any documentation. This also means that it should have minimal errors, produce useful error messages and protect against unintended usage.

5. Be written in a readable and well commented manner. This should facilitate debugging and extending functionality, even if this means performance is not 100% optimal.

I have written this code as part of my package that is in development called helpRFunctions, which is designed to make R programming as painless as possible.

The function takes just a few arguments:

1. df : The data frame to split

2. pcts Optional. The percentage of observations to put into each bucket.

3. set.names Optional. What to name the resulting data sets. This must be the same length as the pcts vector.

4. seed Optional. Define a seed to use for sampling. Defaults to NULL which is just the normal random number generator in R

The function then returns a list containing data frames named according to the set.names argument.

Here is a brief example on how to use the function.

` install.package('devtools') # Only needed if you dont have this installed.`

library(devtools)

install_github('adam-m-mcelhinney/helpRFunctions')

df <- data.frame(matrix(rnorm(110), nrow = 11))

t <- split.data(df)

training <- t$training

testing <- t$testing

Let’s see how the function performs for a large-ish data frame.

df <- data.frame(matrix(rnorm(110*1000000), nrow = 11*1000000))

system.time(split.data(df))

Less than 3 seconds to split 11 million rows! Not too bad. The full code can be found here. I always, I welcome any feedback or pull requests.

]]>A few weeks ago, there was an interesting riddle presented.

**Riddle:** On an 8 x 8 chessboard, define two squares to be neighbors if they share a common side. Some squares will have two neighbors, some will have three, and some will have four. Now suppose each square contains a number subject to the following condition: The number in a square equals the average of the numbers of all its neighbors. If the square with coordinates [1, 1] (i.e. a corner square) contains the number 10, then find (with proof) all possible values that the square with coordinates [8, 8] (i.e. opposite corner) can have?

After some consideration, there is an easy proof by contradiction. However, I am trying to learn a new programming language, specifically Mathematica. I thought that attempting to solve this riddle by developing a system of equations in Mathematica would be good practice.

The Mathematica code is located below and also posted on my Github account. By the way, the answer is 10.

(* ::Package:: *)

(* Parameters *)

size = 8;

givenVal = 10;

(* Corners *)

bottemLeftCornerEq[{i_, j_}] := StringJoin[“(c”, ToString[i+1], ToString[j], ” + c”, ToString[i], ToString[j+1], “)/2 == c”, ToString[i], ToString[j]]

topRightCornerEq[{i_, j_}] := StringJoin[“(c”, ToString[i-1], ToString[j], ” + c”, ToString[i], ToString[j-1], “)/2 == c”, ToString[i], ToString[j]]

topLeftCornerEq[{i_, j_}] := StringJoin[“(c”, ToString[i+1], ToString[j], ” + c”, ToString[i], ToString[j-1], “)/2 == c”, ToString[i], ToString[j]]

bottmRightCornerEq[{i_, j_}] := StringJoin[“(c”, ToString[i-1], ToString[j], ” + c”, ToString[i], ToString[j+1], “)/2 == c”, ToString[i], ToString[j]]

(* Edges *)

bottomEdge[{i_, j_}]:= StringJoin[“(c”, ToString[i-1], ToString[j], ” + c”, ToString[i], ToString[j+1], ” + c”, ToString[i+1], ToString[j], “)/3 == c”, ToString[i], ToString[j]]

topEdge[{i_, j_}]:= StringJoin[“(c”, ToString[i-1], ToString[j], ” + c”, ToString[i], ToString[j-1], ” + c”, ToString[i+1], ToString[j], “)/3 == c”, ToString[i], ToString[j]]

leftEdge[{i_, j_}]:= StringJoin[“(c”, ToString[i], ToString[j+1], ” + c”, ToString[i+1], ToString[j], ” + c”, ToString[i], ToString[j-1], “)/3 == c”, ToString[i], ToString[j]]

rightEdge[{i_, j_}]:= StringJoin[“(c”, ToString[i-1], ToString[j], ” + c”, ToString[i], ToString[j+1], ” + c”, ToString[i], ToString[j-1], “)/3 == c”, ToString[i], ToString[j]]

(* Middle *)

middleEq[{i_, j_}] := StringJoin[“(c”, ToString[i-1], ToString[j], ” + c”, ToString[i], ToString[j-1], ” + c”, ToString[i+1], ToString[j],” + c”, ToString[i+1], ToString[j+1], “)/4 == c”, ToString[i], ToString[j]]

equationSelector[{i_, j_}] := Which[ i == 1 && j == 1

, bottemLeftCornerEq[{i, j}]

, i == size && j == 1

, bottmRightCornerEq[{i, j}]

, i == size && j == size

, topRightCornerEq[{i, j}]

, i == 1 && j == size

, topLeftCornerEq[{i, j}]

, i == 1

, leftEdge[{i, j}]

, i == size

, rightEdge[{i, j}]

, j == 1

, bottomEdge[{i, j}]

, j == size

, topEdge[{i, j}]

, i != 1 && i != size && j != 1 && j != size

, middleEq[{i, j}]

]

(* Test All Cases

i = 1; j = 1;

bottemLeftCornerEq[{i, j}] \[Equal] equationSelector[{i, j}]

i = 3; j = 1;

bottmRightCornerEq[{i, j}] \[Equal] equationSelector[{i, j}]

i = 1; j = 3;

topLeftCornerEq[{i, j}] \[Equal] equationSelector[{i, j}]

i = 3; j = 3;

topRightCornerEq[{i, j}] \[Equal] equationSelector[{i, j}]

i = 2; j = 1;

bottomEdge[{i, j}] \[Equal] equationSelector[{i, j}]

i = 2; j = 3;

topEdge[{i, j}] \[Equal] equationSelector[{i, j}]

i = 3; j = 2;

rightEdge[{i, j}] \[Equal] equationSelector[{i, j}]

i = 1; j = 2;

leftEdge[{i, j}] \[Equal] equationSelector[{i, j}]

i = 2; j = 2;

middleEq[{i, j}] \[Equal] equationSelector[{i, j}]

*)

(* Test on simple 2×2 case

eq1 = bottemLeftCornerEq[{1, 1}]

eq2 = bottmRightCornerEq[{2, 1}]

eq3 = topRightCornerEq[{2, 2}]

eq4 = topLeftCornerEq[{1, 2}]

Solve[ToExpression[eq1] && ToExpression[eq2] && ToExpression[eq3] && ToExpression[eq4] && c11 \[Equal] 10, {c11, c12, c21, c22}]

*)

(* Create System *)

(* Iterate over range *)

f[x_, y_] := {x, y}

genVar[{i_, j_}] := StringJoin[“c”, ToString[i], ToString[j]];

varList = Map[genVar, grid];

grid = Flatten[Outer[f, Range[size], Range[size]], 1];

eqPart1 = StringReplace[ToString[Map[equationSelector, grid]], {“{” -> “”, “}” -> “”, “,” -> ” &&”}];

SolveStr = StringJoin[eqPart1, ” && c11 ==”, ToString[givenVal]]

Solve[ToExpression[SolveStr], ToExpression[varList]]

(* Solve without a given value

*)

Solve[ToExpression[eqPart1], ToExpression[varList]]

(*

Solve[(c21 + c12)/2 == c11 && (c22 + c11)/2 == c12 && (c11 + c22)/2 == c21 && (c12 + c21)/2 == c22 && c11\[Equal]10, {c11,c12,c21,c22}] *)

(*

*)

(*

startCell = c11

c11 = 10

bottemLeftCornerEq[1, 1]

topRightCornerEq[8, 8]

topLeftCornerEq[1, 8]

bottmRightCornerEq[8, 1]

bottomEdge[{2,1}]

topEdge[4,8]

leftEdge[1, 4]

rightEdge[8, 4]

Map[bottomEdge, {{2,1}}]

eq1

eq2 *)

I hope to help people answer the questions:

1. Why is there a need for new analytical tools?

2. What are the potential alternatives?

3. What are some of the pro’s and con’s of each alternative?

4. Demonstrate some specific use-cases for each tool

5. Provide a roadmap to get started using these new tools

My full presentation will be available here.

]]>

http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/

However, when I ran this function on my data, I received an error that R ran out of memory. Therefore, I had to create my own stratified sampling function that would work for large data sets with many groups.

After some trial and error, the key turned out to be sorting based on the desired groups and then computing counts for those groups. The procedure is extremely fast, taking only .18 seconds on a large data set. I welcome any feedback on how to improve!

stratified_sampling<-function(df,id, size) {

#df is the data to sample from

#id is the column to use for the groups to sample

#size is the count you want to sample from each group

# Order the data based on the groups

df<-df[order(df[,id],decreasing = FALSE),]

# Get unique groups

groups<-unique(df[,id])

group.counts<-c(0,table(df[,id]))

#group.counts<-table(df[,id])

rows<-mat.or.vec(nr=size, nc=length(groups))

# Generate Matrix of Sample Rows for Each Group

for (i in 1:(length(group.counts)-1)) {

start.row<-sum(group.counts[1:i])+1

samp<-sample(group.counts[i+1]-1,size,replace=FALSE)

rows[,i]<-start.row+samp

}

sample.rows<-as.vector(rows)

df[sample.rows,]

}

I wanted to share some new code with you. The code below allows a user to specify a range of data and then the code will output the standardized values (mean=0 and standard deviation 1) for each of the columns. This can be a big time saver over Excel’s standardize function, which requires the user to input the mean and standard deviation and only standardizes one cell at a time. Also, this allows the user to specify specifically how they want the standard deviation calculated.

I hope that you guys enjoy the code. I welcome any feedback!

Thanks,

Adam

Sub standardize_range() '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 'This code allows a user to select a group of data organized by columns and it will provide a standardized 'output with mean 0, variance 1 ' '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Dim variables As Range Dim change As Range Set variables = Application.InputBox("Select the row of variable names:", Default:=Range("a1:c1").Address, Type:=8) Set change = Application.InputBox("Select the row of cells containing the data to standardize:", Default:=Range("a2:c14").Address, Type:=8) Dim Total, Total_sq, Average, Variance As Double numRow = change.Rows.Count numCol = change.Columns.Count 'Debug.Print "numrow=" & numrow & " numcol=" & numCol '1. Calculate Average and Std for all Columns 'Note: This calculation for Variance and StdDev differs from Excel's slightly For j = 1 To numCol ReDim col(numRow) As Double Total = 0 Total_sq = 0 For i = 1 To numRow col(i) = change.Cells(i, j) 'Debug.Print col(i) Total = Total + col(i) Total_sq = Total_sq + col(i) ^ 2 Next i Average = Total / numRow Variance = (Total_sq / numRow) - (Average) ^ 2 Std = Variance ^ (1 / 2) 'Debug.Print "Total="; Total & " Average="; Average & " Total_sq=" & Total_sq & " Variance=" & Variance _ '& " Std=" & Std 'Store values change.Cells(numRow + 1, j).Value = Average change.Cells(numRow + 2, j).Value = Variance change.Cells(numRow + 3, j).Value = Std Next j '2. Create Sheet to store standardized values Application.DisplayAlerts = False Sheets("Standardized_Values").Delete Sheets.Add.Name = "Standardized_Values" 'Write variable names For k = 1 To numCol Sheets("Standardized_Values").Cells(1, k).Value = variables.Cells(1, k) & "_std" Next k '3. Compute standardized values For n = 1 To numCol For m = 1 To numRow Sheets("Standardized_Values").Cells(m + 1, n).Value = (change.Cells(m, n).Value - change.Cells(numRow + 1, n).Value) / _ change.Cells(numRow + 3, n).Value Next m Next n End Sub

]]>

Adam McElhinneyResearch Conducted Under the Supervision of Roy Gardener

IU Department of Economics

I. Introduction

IU Department of Economics

I. Introduction

Real estate is undoubtedly one of the most important components of the modern economy. In particular, owning a house has long typified the American dream and a residential home is the largest asset for a majority of Americans#. The housing market had swelled 12.5% from 1990 to 200, eventually totaling 119.6 million units#. Along with this growth in the housing market, has come an even greater growth in the accurate assessment of housing prices.

In response to this demand, the consumer market has responded with many websites offer users the ability to get a free and quick estimate of their homes value by inputting several variables such as homes location, size, and features. However, the accuracy of these estimates in unknown and the formulas they are using for estimation are often kept proprietary. Despite this, the internet has allowed for an unprecedented amount of real estate transactions to be tracked, often given details regarding the homes age, size, location, and features.

These data sets are currently being used in the world of financial economics, which has long been interested in real estate valuation. They have employed a variety of statistical techniques in an attempt to accurately predict the values of properties. The bulk of the research has relied on multivariate regression to assign weights to the various features of the home. However, the limitations of these techniques have been found and new statistical techniques such as neural networks are being tested as replacements.

The accuracy of these neural networks in comparison to traditional multivariate method is still in debate; however research is showing a growing niche where neural network models are able to outperform traditional techniques.

II. Background

There are three main methods of valuing real estate. One method is the discounted cash flow method. This is primarily used in the valuation of commercial properties. This method calculates the present value of all the income that one expects to receive from the property. However, this method has several flaws. First, the discount rate used is highly subjective. Various complex statistical techniques have been developed for calculating the discount rate. Further, the discount rate assumes that interest rates will stay constant. Interest rates can be highly volatile, additionally even small changes in the interest rate can have large effects of the property valuation. Lastly, the discounted cash flow model faces encounters circular logic if the purchase of the property is to be funded with a loan. The value of the asset is needed to calculate the WACC; however it is that value we seek to find#.

Another method for real estate valuation is the cost method. This method takes the value of the land as if it was vacant, plus the deprecated value of any structures occupying the land.# However, these methods are very limited in their practicality, especially when applied to large scale valuation of residential properties.

Undoubtedly the most popular technique for real estate valuation is the market comparison approach. In a traditional market comparison approach, a property’s value is determined by finding other recently sold properties with similar physical characteristics, called comparables, or ‘comps’ for short. Then, weights are applied to the various features of the house and adjusted to as closely approximate the target house as possible.

Several practical issues arise from this method. First of all, there may not be a large number of comparable properties that have been sold recently. This is particularly true if the property is very different from the average properties in the area, or has some unique characteristic that does not allow for comparison#. In addition, by omitting many other sales from the analysis, the appraiser may be losing valuable information. In a similar vein, typically only a small number of comparable properties are available. This number of comparable properties may not be enough to assume a statistically normal distribution and thus price estimates may be flawed.

Furthermore, the differences in the properties may be quite significant. There are 5 main differences listed that need to be accounted for in this method; physical, location, market conditions, terms of financing, and conditions of sale. For example, the prime rate, the rate that is largely responsible for the rate a homebuyer will pay for a loan, dropped 2.5% the first six months in 2001#. Thus any property sales that were more than 1 or 2 months old would need to be reconfigured to reflect this new rate.

Lastly, the weighting assigned to each of the various comparable properties is often subject to scrutiny and debate. The accuracy of these weights is highly relevant to the fair pricing of the home.

In order to counteract these limitations, appraisers have begun to rely on statistical techniques such as multivariate regression. In a typical appraisal, the prices of a large number of sold properties are regressed against the characteristics that are seen as influencing the prices of the properties. The defining work on this is the book Real Estate Valuation Theory published by the Appraisal Institute and American Real Estate Society.

In more recent years, appraisers have begun to rely on graphical interface systems (GIS) to assemble databases of properties that incorporate geographic information. Common geographic variables include distance to major metropolitan areas, crime rates, distance to public transportation, etc. An important early study by Wyatt incorporated other buildings, car parks, footpaths, boundary of the urban areas, tree preservation orders, open spaces, conservation areas, water features, road edges and centre-lines and railway lines and stations into his model of commercial real estate prices#.

However, these multiple variable regressions (MVR) have several limitations. First, for small sample sizes, multivariate regressions have proven to be extremely poor estimators of prices#. Therefore, in rural areas or properties that do not have many properties suitable for compassion, neural networks (NN) hold great promise.

In addition, many characteristics of the house are highly inter-correlated. This leads to the possibility of multicolinearity occurring. Further, the data used for the comps is often taken from different times. Thus, hetroskedasticity arises, further diminishing the accuracy of the estimates. Lastly, in an increasingly complex economy, assumptions of linearity seem to be less and less feasible.

In an effort to obtain more accurate sales predictions, some researchers have turned to neural networks to improve performance. The results of using neural networks versus multivariate regressions are contradictory and scattered. Studies by researchers such as Tskudua and Baba, Ngyuen and Cripps, and Hanson, all demonstrate effective ways of using neural networks for valuation #. However, seemingly just as many researchers have found MVR superior to NN#.

III. Explanation of the Neural Network System

To understand the neural network model, one must understand its motivation for creation. The neural network model is a statistical technique modeled off the way the brain processes data. A non-rigorous definition of neural networks is given by Gurney as follows:

A neural network is an interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal neuron. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.

The goal of the method was to create a non-linear and non-parametric statistical technique#. Nonparametric statistics are those that do not make any underlying assumptions, such as normality, about the population distribution#. This is a major weakness of traditional parametric statistical techniques, is that one must assume the underlying probability distribution. Non-parametric statistics are not bound by this constraint.

In order to better understand neural networks, it is helpful to have a little background knowledge on biological neurons, for which our model is based. For our purposes, biological neurons can be broken down into four simple parts. First, are the dendrites. Dendrites are responsible for receiving incoming signals from other neurons. Their structure typically resembles the branches of a tree, allowing them to interact with many different neurons in the brain. The second element is the soma, or cell body. It is the largest part of the cell and responsible for the cells vital functions. Moving forward, the axon is a single truck-like structure that extends from the soma. It serves to carry the impulse outward to the next part of the neuron, the pre-synaptic terminals. The pre-synaptic terminals branch out from the axon, connecting the neuron with other neurons and transmitting the impulse when applicable.

There are two types of impulses that the neurons transmit, inhibitory and exhibitory. When a neuron receives an impulse in the dendrites, it is then transmitted to the cell body#. If the strength of the impulse is greater than some value, it is said to be exhibitory and the cell continues to transfer the impulse. Conversely, if the strength of the impulse is less than some value, the impulse is not transmitted and it is labeled as inhibitory.

The artificial neuron used for our computation follows a similar form. The first artificial neuron neural network was the threshold logic unit (TLU), developed my McCulloch and Pitts#. A neuron receives inputs and the inputs are weighted in some fashion. If the summed value of the impulses exceeds a certain threshold level, the exiting impulse is given a value of one. If not, the impulse is given a value of zero and does not continue.

It is the arrangement of multiple neurons of this type that gives us a neural ‘network’. Arrangement on the neurons can take many forms, but in its simplest is a series of inputs, which then converge onto a hidden layer, continuing on to an outer layer of neurons, which then eventually produces the outputs#.

The network described above is referred to as a feed forward network. However, these networks can take many shapes and have nodes that not only send signals forward, but may send signals laterally and in some cases backward. More details on various structures will be given later.

The weights given to each impulse by the neuron are an important source of their computing strength. These weights need not necessarily be static. Rather, the weights may be altered by a set of known data. One extremely useful process to alter these weights is called supervised learning. In this process, the series of known data is passed through the neural network and the weights are adjusted according to a learning rule, to provide the desired outputs. This process is typically repeated many times until the weights very accurately produce the desired outputs. It is then the hope that when the network receives data it has not seen before, that it will be able to correctly identify the pattern and produce the correct outputs.

As one may notice, neural networks are not a very strongly defined statistical technique, thus giving them flexibility to be applied to a very wide range of areas and applications. Neural network models have been applied in diverse fields such as civil engineering#, pavement crack analysis#, soil and water retention# and many other seemingly unrelated areas. However, the diversity of the applications speaks to the utility of the method.

IV. Data Set

For this paper, data on133 real estate transactions, from the period from 3/20/2006 to 2/28/2007, was gathered from the website Zillow.com. Zillow is a free website designed to give consumers a quick estimate of their homes’ values. In order to calculate these estimates, Zillow contains a database of all home transactions for a particular region, going back to different periods, depending on the location. The transactions recorded are from the northwest suburbs of Chicago, primarily in the 60010, 60047, and 60042 zip codes. In addition, the database lists the address of the home, the number of bedrooms, the number of bathrooms, the size of the home in square-feet, the size of the lot in square feet, the price the home sold for, the date the home was sold on, and the age of the home. The variables, size of home in square feet, size of lot in square feet, have both been divided by 1000 to better condition the numbers to be squared and cubed. Similarly, the variable for the age of the house has been divided by 10. These will be the variables used in our model to predict housing prices.

The data was randomly broken down into 79 “training observations” and 54 “validation” observations. This ratio of training observations to validation observations is consistent with previous literature Each model will be examined for their ability to predict the values of the 54 validation housing prices.

Variable Name | Description |

Price | Listed in US Dollars |

BD | Number of Bedrooms |

BA | Number of Bathrooms |

SIZE | (Square Footage of the House)/1000 |

LOT | (Square Footage of the Lot)/1000 |

AGE | (Number of years since the house was built)/10 |

LN(VAR) | The natural log of the specified variable |

SQ(VAR) | The specified variable squared |

CU(VAR) | The specified variable cubed |

Q2 | Dummy Variable indicating whether or not the house was sold in the second quarter of 2006 |

Q3 | Dummy Variable indicating whether or not the house was sold in the third quarter of 2006 |

Q4 | Dummy Variable indicating whether or not the house was sold in the fourth quarter of 2006 |

Q5 | Dummy Variable indicating whether or not the house was sold in the first quarter of 2007 |

Descriptive statistics of the data are provided below.

===============================================================================

Variable Mean Std.Dev. Minimum Maximum Cases

===============================================================================

——————————————————————————-

All observations in current sample

——————————————————————————-

PRICE 387570.301 189266.582 100000.000 1275000.00 133

BD 3.48872180 .794074088 2.00000000 6.00000000 133

BA 2.36466165 .755980749 1.00000000 5.50000000 133

SIZE 2095.35338 1112.12044 854.000000 10018.0000 133

LOT 10698.8797 8592.70512 70.0000000 52272.0000 133

AGE 35.7142857 24.1584327 3.00000000 140.000000 133

Q2 .368421053 .484200125 .000000000 1.00000000 133

Q3 .323308271 .469507675 .000000000 1.00000000 133

Q4 .157894737 .366020890 .000000000 1.00000000 133

Q5 .827067669E-01 .276479842 .000000000 1.00000000 133

As the graphs show, there is a lot of variability amongst the different variables and their effects on price. This is most likely due in large part to the fact that our model does not control any geographic variables. Anyone who knows the Chicago area can tell you that the closer one gets to the city, the higher housing prices are. Despite this, these complicated relationships are exactly what neural networks are designed to predict and thus this high variability should be a good testing grounds for a non-linear model.

V. Multivariate Regression Models

In accordance with previous literature, several multivariate regression models were tested. Previous literature has hypothesized that real estate data may follow a semi-log, or a log-log pattern. In addition, research has shown that age, property size, and house size may have a squared or even cubic relationship.

Lastly, there is a large possibility for hetroskedasticity to occur due to the fact that data is pulled from different time periods, thus all the models are repeated using White’s model for hetroskedasticity. The “Un-Whitened” model formats are given below.

Linear: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Semi-Log: Ln(Selling price)= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Log-Log: Ln(Selling price)= +LN(BD) + LN(BA) + LN (Size) + LN (Lot) + LN (Age) + LN (Q2) + LN (Q3) + LN (Q4)+ LN (Q5) +

Age: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Lot: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Size: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Age,Lot, Size: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Age: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Lot: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Size: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Age,Lot, Size: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

Somewhat surprisingly, the simplest linear model without correcting for hetreoskedasticity proves to yield the most statistically significant variables. All of the variables, except for the quarter sold variables, are significant. However, previous studies utilizing much larger sample sizes have shown all these variables to be significant, thus we accept that they are relevant for our analysis.

Linear: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1861022040E+13, Std.Dev.= 123005.12383 |

| Fit: R-squared= .606423, Adjusted R-squared = .57762 |

| Model test: F[ 9, 123] = 21.06, Prob value = .00000 |

| Diagnostic: Log-L = -1742.2784, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.512, Akaike Info. Crt.= 26.350 |

| Autocorrel: Durbin-Watson Statistic = 1.64819, Rho = .17591 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -226120.6909 72598.896 -3.115 .0023

BD 37659.81050 17601.113 2.140 .0344 3.4887218

BA 122116.0065 18335.652 6.660 .0000 2.3646617

SIZE 35.09598346 12.983202 2.703 .0078 2095.3534

LOT 2.715900837 1.3582596 2.000 .0478 10698.880

AGE 1245.204431 460.88289 2.702 .0079 35.714286

Q2 33570.17097 45364.143 .740 .4607 .36842105

Q3 54453.79975 45884.666 1.187 .2376 .32330827

Q4 68688.20522 50361.193 1.364 .1751 .15789474

Q5 68402.54427 57322.227 1.193 .2350 .82706767E-01

The model was then back tested using the 54 validation observations. The success or failure of the model was based upon its ability to achieve a low forecasting error.

Forecasting Error= |Actual Housing Price- Predicted Housing Price|

Actual Housing Price

Mean Forecasting Error of the Multiple Variable Regression Models

Linear | Semi-Log | Log-Log | Age | Lot | Size |

0.254316 | 0.235697 | 0.259342 | 0.258416 | 0.252312 | 0.250122 |

Age,Lot, Size | Age | Lot | Size | Age,Lot, Size | |

0.254224 |
0.261117 |
0.253886 |
0.249586 |
0.260392 |

It is somewhat surprising that the Semi-Log model has the lowest mean forecasting error. In order to predict with a semi-log model, one must take the exponential of the predicted prices. This is a non-linear transformation and therefore introduces bias. However, in this case it appears that this bias is overcome by the increased predictive power of the model.

The lack of variability between the 11 models is striking. The different in mean forecasting error between the highest and the lowest models is only .025. However, given a larger sample size, one would expect this to increase as more subtle relationships between the independent variables are teased out.

The distribution of the forecasting errors for the semi-log model is shown below. The model does an impressive job, 32 of the observations have a forecasting error of less than .2 and 46 of the observations have a forecasting effort of less than .40.

VI. Neural Network Model Explanation

In order to contrast the performance of the two statistical techniques, a traditional feed forward neural network was utilized. In a feed forward network, the data is separated into two sets, the training set and the validation set. As stated previously, our data was randomly broken down into a training set of 79 observations and a validation set of 54 observations. Then, the network is ‘trained’, meaning the training data is passed through the network and the initial parameters are adjusted in some manner, for a specified number of iterations. Then the validation set is used to check the accuracy of the model specified by the training. In this analysis, a sigmoid activation function was used for its simplicity and ease of differentiation.

Sigmoid function

The number of hidden layers, the number of neurons per hidden layer, and the number of training iterations to conduct are open to debate. Previous research on neural networks for real estate prediction has shown that one hidden layer is typically sufficient for accurate assessment#. Similarly, one hidden layer is used in our model. For the other parameters, there are a few ‘rules of thumb’ given for these parameters, but due to the low number of observations in our sample, numerous combinations were tested.

The neural network package utilized for this analysis automatically sets the initial values to random numbers within the data range. This results in quicker training than a pure random initialization#.

Neural Network Model Specifications

Model Number | Hidden Layers | Neurons Per Hidden Layer |

1 | 1 | 1 |

2 | 1 | 2 |

3 | 1 | 3 |

4 | 1 | 4 |

5 | 1 | 5 |

6 | 1 | 6 |

7 | 1 | 7 |

8 | 1 | 8 |

9 | 1 | 9 |

10 | 1 | 10 |

11 | 1 | 15 |

12 | 1 | 20 |

13 | 1 | 25 |

The following models were all run with 5, 10, 15, 20, 25, 30, 50 and 75 training iterations. One of the difficulties of the neural network model is estimating how many training iterations to use. If you underestimate the amount needed, the results are not as accurate as they could be. Conversely, if one uses too many training iterations the function may be over-fitted to the data. In turn, this over-fitting causes a loss of accuracy when new data is run through the model.

To compare the accuracy of the models, the validation data was run through both models and the mean forecasting error was computed.

Mean Forecasting Error of Various Neural Network Models

Model Number | Number of Neurons on One Hidden Layer | 5 Training Iterations | 10 Training Iterations | 15 Training Iterations | 20 Training Iterations | 25 Training Iterations | 30 Training Iterations | 50 Training Iterations | 75 Training Iterations |

1 |
1 |
0.277338 | 0.277165 | 0.277342 | 0.277027 | 0.277411 | 0.276779 | 0.2761 | 0.2761 |

2 |
2 |
0.283716 | 0.277502 | 0.280274 | 0.285198 | 0.28401 | 0.282454 | 0.302764 | 0.301875 |

3 |
3 |
0.267825 | 0.273145 | 0.300541 | 0.304717 | 0.317211 | 0.329043 | 0.35199 | 0.350984 |

4 |
4 |
0.293282 | 0.343108 | 0.338736 | 0.360779 | 0.371395 | 0.406882 | 0.429371 | 0.425458 |

5 |
5 |
0.319706 | 0.382284 | 0.385009 | 0.380813 | 0.376205 | 0.382051 | 0.384963 | 0.388435 |

6 |
6 |
2.06627 | 9.448 | 1963.7 | 2052.98 | 2361.01 | 399139. | 3436.7 | 4133.77 |

7 |
7 |
0.312132 | 0.287282 | 0.299599 | 0.309019 | 0.330474 | 0.349585 | 0.493541 | 0.635953 |

8 |
8 |
0.352379 | 0.393616 | 0.635656 | 0.959402 | 1.59276 | 0.538184 | 1.07871 | 6.10403 |

9 |
9 |
1.17289 | 1.57346 | 1.39654 | 93.6565 | 749.619 | 638.547 | 256.935 | 33.1817 |

10 |
10 |
0.297179 | 0.328543 | 0.354833 | 0.412025 | 0.461227 | 0.48286 | 0.703548 | 0.918779 |

11 |
15 |
0.33348 | 0.460503 | 0.489845 | 0.788403 | 0.917985 | 1.20471 | 1.13428 | 1.17958 |

12 |
20 |
0.417438 | 0.530489 | 0.692084 | 0.883574 | 1.16352 | 1.33265 | 1.39361 | 1.48593 |

13 |
25 |
0.657161 | 2.37524 | 3.20335 | 4.61429 | 5.74316 | 7.12753 | 11.5672 | 13.0967 |

FE<0.3 | |||||||||

.3<FE<0.6 | |||||||||

.6<FE<2 | |||||||||

2<FE<10 | |||||||||

FE<10 |

As one can see, the neural network model presented here has the best results with a relatively low number of neurons per hidden layer and low training iterations. In fact, the model with the lowest mean forecasting error has three neurons per hidden layer and only 5 training iterations. This is not entirely surprising, due to the fact that our entire data set was only 133 observations, with nine explanatory variables. If one were to add more observations and more explanatory variables, then the number of training iterations and the number of neurons per hidden layer, and possibly even the number of hidden layers, would likely need to be increased. Interestingly, there is very little variability in mean forecasting error between models 1-5. Despite the parameters ranging from 1-5 neurons and 5-75 training iterations, the mean forecasting error varies by a scant .16.

For models 6-13, there appears to be massive over-training tacking place, resulting in ridiculous price predictions. The absurdity of these estimates underscores the importance of choosing the correct parameters when constructing a neural network. Interestingly, there are some shockingly precise estimates that occur within this range. In particular, model 10 with 5 training iterations and model 7 with 10 and 15 training iterations have mean forecasting errors below .30. This is most likely due to the randomness utilized by the neural network in selecting the initial training parameters and is not repeatable. Again, this serves as a cautionary example of the utilization of neural networks. One must not only concern themselves with the neural network predictions for their particular data set, but whether this network could be repeated in different circumstances.

The chart below shows the distribution of the forecasting errors for the model with the lowest mean forecasting error, model 13. As is apparent in the chart, the model is quite effective for the majority of the predictions. However, several outliers are undoubtedly pulling the mean forecasting error much higher. In fact, approximately 28 of the predicted prices have a forecasting error of less than .2 and an impressive 46 of the predicted prices have a forecasting error of less than .40. This is contrasted by the drastic increases in forecasting errors for predictions 50 and onward. Once again, this high variability is most likely a result of the geographic regions from which the data was taken, but none the less illustrates how a neural network will handle predictions of data with high variability.

VII. Comparison of the Two Models

In our analysis, it seems clear that the Semi-log MVREG model is the most accurate model for predicting the real estate prices. The difference in mean forecasting between the best multivariable regression model and the best neural network model was approximately .032. Additionally, the chart below shows how the distribution of forecasting errors is shown below.

The Semi-Log model beats the NN Model 13 in almost every observation. However, in the low to middle spectrum of the forecasting error, the NN 13 competes closely with the Semi-Log model, even showing superiority in some cases over the range 37-49. However, the largest difference between the two models is that the Semi-Log model seems to do a better job of dealing with outliers than the NN Model 13. This is undoubtedly where a large portion of the difference in mean forecasting error is created.

Additionally, the computational requirements of the neural network models further hinder its practicality. Despite the small sample size used in this analysis, it took the 3.2 GHz Pentium IV with Hyper-Threading approximately 22 minutes to run the 104 neural network models, or .21 minutes per model. However, the 11 multivariate regressions models ran almost instantly.

The ambiguity of a neural network model also presents a problem. The difference between the highest and lowest forecasting error in a multivariate regression model was .025. The difference in the neural network models was over 11. The neural network models were admittedly grossly over trained, but there is not this potential for error in a multivariate regression model.

The greater potential for error in a neural network, model stems from the fact that there are many more parameters that the operator must specify than with the regression model. The chart below summarizes the decisions that must be made to construct the two different models.

Multivariable Regression Model | Neural Network Model |

Data Transformations (e.g., log, semi-log, squared, etc) | Model Structure (Feed-Forward, Radial Basis, Hopfield, etc.) |

Significance Level | Hidden Layers |

Number of Neurons per Hidden Layer | |

Number of Training Iterations | |

Activation Function | |

Initial Training Values |

Although there is research suggesting values for these parameters in the neural network model, there are not hard rules and experimentation is required. One may also pose the question of whether once the appropriate parameters of a neural network model are found, will those parameters remain optimal with different data. If a large scale real estate company was deciding whether or not to purchase several properties, the NN model they had previously relied on may not give accurate predictions of the price for this new set of possible purchases. In contrast, the parameters for the multivariable regression model are much fewer and have long established ranges (e.g. p<.05 significance level for variables, etc).

Another concern with neural network models is variation among various neural network programs. As a result of being very computationally intensive, neural network models tend to be particularly sensitive to computer programming intricacies. Every neural network program comes with a large manual that details the construction of the software and studies have shown how the different programs can produce different results from the same inputs#.

In contrast, the reliability of most regression software is well known. The software is much less sensitive to differences between programs, due to the decreased computations necessary. Additionally, the software is available to a wider range of users, particularly as a result of the data analysis pack that is embedded in all new versions of Microsoft Excel. Neural network software must be purchased separately, often at significant cost.

VII. Conclusion

Our data set showed a neural network model to be inferior to a multivariable regression. There is greater potential for error due to the increased number of user specified parameters. Additionally, the repeatability of the results amongst various computer programs is disputable. Lastly, the computational requirements make the NN models longer to run than the MVR’s. However, there is still much work to be done in the area of neural networks. Neural network models should be studied utilizing much larger data sets that was used in this study. Additionally, rules for establishing the parameters of the neural network model need to be created and rigorously tested. Further, the performance of neural networks with various levels of sufficiency of variables needs to be evaluated. Lastly, our study emphasizes that fact that whatever model of prediction being used, there must be variables to account for geographic differences.

That being said, the regression model was first contemplated in 1885 and is still being refined today#. In contrast, the neural network model as a computational tool came into being in 1949#. Thus, it is still an infant statistical technique, in need of much more refinement to find large scale application. Due to the ever increasing complexity of problems being studied by researchers and the exponential increases in computing power and technology, it is unlikely that neural networks will cease to be a promising area of study.

The complex nature of real estate valuation will certainly provide a plethora of applications for NN models. Once the technique evolves from its current state as something of an art form into a well defined statistical technique, we will see much more accurate real estate valuations.

Works Cited

“2002 Profile of US Real Estate Markets.” National Association of Realtors. 8 Apr. 2007 <http://www.realtor.org/cipshome.NSF/19A06EC3575CECE886256A1B00731E6F/AE4C9FEC10EFF28386256AE2006CE138?OpenDocument#Demographics>.

Adeli, Hojjat. “Neural Networks in Civil Engineering: 1989−2000.” Computer-Aided Civil and Infrastructure Engineering (2001): 126-142.

“Bank Prime Loan Rate Changes: Historical Dates of Changes and Rates.” Federal Reserve Bank of Saint Louis. 8 Apr. 2007 <http://research.stlouisfed.org/fred2/data/PRIME.txt>.

Benjamin, John D., Randall S. Guttery, and C.f. Sirmans. “Mass Appraisal: an Introduction to Multiple Regression Analysis for Real Estate Apprasial.” Journal of Real Estate Practice and Education (2004).

D. O. Hebb. The Organization of Behaviour. John Wiley & Sons, New York, 1949.

Francis Galton. Presidential address, Section H, Anthropology. (1885)

Gurney, Kevin. An Introduction to Neural Networks. London: UCL P Limited, 1997.

Hanson, James. “Australian House Prices; a Comparision of Hedonic and Repeat Sales Measures.” Reserve Bank of Australia (2006).

Hecht-Nielsen, R., Kolmogorov’s Mapping Neural Network Existence Theorem, Paper

presented at IEEE First International Conference on Neural Networks, San Diego, CA,1987.

Hoesli, Martin, Elion Jani, and Andre´ Bender. “Monte Carlo Simulations for Real Estate Valuation.” Journal of Property Investment & Finance (2006).

Johnston, Sarah Jane. “Real Estate: the Most Imperfect Asset.” 30 Aug. 2004. Harvard Business School. 8 Apr. 2007 <http://hbswk.hbs.edu/item/4342.html>.

Kershaw, Paul, and Peter Rossini. “Using Neural Networks to Estimate Constant Quality House Price Indices.” Fifth Annual Pacific-Rim Real Estate Society Conference (1999). 8 Apr. 2007.

Koekoek, E . J . W . “Neural Network Models to Predict Soil Water Retention.” European Journal of Soil Science Sept (1999): 489-495.

Lee, Byoung J. “Position-Invariant Neural Network for Digital.” Computer-Aided Civil and Infrastructure Engineering (2004): 105-118.

Nguyen, Nghiep, and Al Cripps. “Predicting Housing Value: a Comparison of Multiple Regression Analysis and Artificial Neural Network Analysis.” Journal of Real Estate Research 22 (2001): 313-336.

“Nonparametric Statistics.” StatSoft. 8 Apr. 2007 <http://www.statsoft.com/textbook/stnonpar.html>.

Pullum, Dr. Laura L., Dr. Marjorie A Darrah, and Mr. Brian J. Taylor. “Independent Verification and Validation of Neural Networks.” 8 Apr. 2007 <https://www.softwaretechnews.com//stn7-2/ivv.html>.

Segel, Arthur, and Sarah Jane Johnston. “Real Estate: the Most Imperfect Asset.” Harvard Business School (2004). 8 Apr. 2007 <http://hbswk.hbs.edu/item/4342.html>.

Sjoberg, Jonas. Mathmatica Neural Networks. Champaign, Illinois: Wolfram Research, 2004. 8 Apr. 2007 <http://www.wolfram.com/products/applications/neuralnetworks/>

“The Neuron.” UNAM, Instituto de Fisilogia Celular. 8 Apr. 2007 <http://www.ifisiol.unam.mx/Brain/neuron.htm>.

Limdep Code File Used for Multiple Regressions

read ;nobs=1474

;nvar=11

;file = “E:\Econ Thesis\Data\real estate data 3-30-07.txt”

;names =1$

create; if (quarter=2) q2=1$

create; if (quarter=3) q3=1$

create; if (quarter=4) q4=1$

create; if (quarter=5) q5=1$

dstats; rhs=price,bd, ba, size, lot, age, q2, q3, q4, q5$

skip;

regress;

lhs=price;

rhs=one, bd, ba, size, lot, age, q2, q3, q4, q5$ this is a linear regression

create; lnprice=log(price)$

skip;

regress;

lhs=lnprice;

rhs=one, bd, ba, size, lot, age, q2, q3, q4, q5$ this is a semi-log regression

create; lnbd=log(bd)$

create; lnba=log(ba)$

create; lnsize=log(size)$

create; lnlot=log(lot)$

create; lnage=log(age)$

skip;

regress;

lhs=lnprice;

rhs=one, lnbd, lnba, lnsize, lnlot, lnage, q2, q3, q4, q5$ this is a log-log regression

create; sqage=age*age$

create; sqlot=lot*lot$

create; sqsize=size*size$

skip;

regress;

lhs=price;

rhs=one, bd, ba, size, lot, sqage, q2, q3, q4, q5$ this is a linear regression with age^2

regress;

lhs=price;

rhs=one, bd, ba, size, sqlot, age, q2, q3, q4, q5$ this is a linear regression with lot^2

regress;

lhs=price;

rhs=one, bd, ba, sqsize, lot, age, q2, q3, q4, q5$ this is a linear regression with size^2

regress;

lhs=price;

rhs=one, bd, ba, sqsize, sqlot, sqage, q2, q3, q4, q5$ this is a linear regression with age^2,lot^2, and size^2

create; cuage=age*age*age$

create; culot=lot*lot*lot$

create; cusize=size*size*lot$

regress;

lhs=price;

rhs=one, bd, ba, size, lot, cuage, q2, q3, q4, q5$ this is a linear regression with age^3

regress;

lhs=price;

rhs=one, bd, ba, size, culot, age, q2, q3, q4, q5$ this is a linear regression with lot^3

regress;

lhs=price;

rhs=one, bd, ba, cusize, lot, age, q2, q3, q4, q5$ this is a linear regression with size^3

regress;

lhs=price;

rhs=one, bd, ba, cusize, culot, cuage, q2, q3, q4, q5$ this is a linear regression with age^3,lot^3, and size^3

skip;

regress;

lhs=price;

rhs=one, bd, ba, size, lot, age, q2, q3, q4, q5; hetro$ this is a linear regression

skip;

regress;

lhs=lnprice;

rhs=one, bd, ba, size, lot, age, q2, q3, q4, q5; hetro$ this is a semi-log regression

create; lnbd=log(bd)$

create; lnba=log(ba)$

create; lnsize=log(size)$

create; lnlot=log(lot)$

create; lnage=log(age)$

skip;

regress;

lhs=lnprice;

rhs=one, lnbd, lnba, lnsize, lnlot, lnage, q2, q3, q4, q5; hetro$ this is a log-log regression

skip;

regress;

lhs=price;

rhs=one, bd, ba, size, lot, sqage, q2, q3, q4, q5; hetro$ this is a linear regression with age^2

regress;

lhs=price;

rhs=one, bd, ba, size, sqlot, age, q2, q3, q4, q5; hetro$ this is a linear regression with lot^2

regress;

lhs=price;

rhs=one, bd, ba, sqsize, lot, age, q2, q3, q4, q5; hetro$ this is a linear regression with size^2

regress;

lhs=price;

rhs=one, bd, ba, sqsize, sqlot, sqage, q2, q3, q4, q5; hetro$ this is a linear regression with age^2,lot^2, and size^2

regress;

lhs=price;

rhs=one, bd, ba, size, lot, cuage, q2, q3, q4, q5; hetro$ this is a linear regression with age^3

regress;

lhs=price;

rhs=one, bd, ba, size, culot, age, q2, q3, q4, q5; hetro$ this is a linear regression with lot^3

regress;

lhs=price;

rhs=one, bd, ba, cusize, lot, age, q2, q3, q4, q5; hetro$ this is a linear regression with size^3

regress;

lhs=price;

rhs=one, bd, ba, cusize, culot, cuage, q2, q3, q4, q5; hetro$ this is a linear regression with age^3,lot^3, and size^3

Appendix of Regression Results

Linear: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1861022040E+13, Std.Dev.= 123005.12383 |

| Fit: R-squared= .606423, Adjusted R-squared = .57762 |

| Model test: F[ 9, 123] = 21.06, Prob value = .00000 |

| Diagnostic: Log-L = -1742.2784, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.512, Akaike Info. Crt.= 26.350 |

| Autocorrel: Durbin-Watson Statistic = 1.64819, Rho = .17591 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -226120.6909 72598.896 -3.115 .0023

BD 37659.81050 17601.113 2.140 .0344 3.4887218

BA 122116.0065 18335.652 6.660 .0000 2.3646617

SIZE 35.09598346 12.983202 2.703 .0078 2095.3534

LOT 2.715900837 1.3582596 2.000 .0478 10698.880

AGE 1245.204431 460.88289 2.702 .0079 35.714286

Q2 33570.17097 45364.143 .740 .4607 .36842105

Q3 54453.79975 45884.666 1.187 .2376 .32330827

Q4 68688.20522 50361.193 1.364 .1751 .15789474

Q5 68402.54427 57322.227 1.193 .2350 .82706767E-01

Semi-Log: Ln(Selling price)= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = LNPRICE Mean= 12.77153525 , S.D.= .4290662382 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= 9.657298755 , Std.Dev.= .28020 |

| Fit: R-squared= .602595, Adjusted R-squared = .57352 |

| Model test: F[ 9, 123] = 20.72, Prob value = .00000 |

| Diagnostic: Log-L = -14.3136, Restricted(b=0) Log-L = -75.6798 |

| LogAmemiyaPrCrt.= -2.472, Akaike Info. Crt.= .366 |

| Autocorrel: Durbin-Watson Statistic = 1.65052, Rho = .17474 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant 11.37970888 .16537966 68.810 .0000

BD .1151867289 .40095183E-01 2.873 .0048 3.4887218

BA .2576886939 .41768456E-01 6.169 .0000 2.3646617

SIZE .7566868199E-04 .29575623E-04 2.558 .0117 2095.3534

LOT .5521456699E-05 .30941035E-05 1.785 .0768 10698.880

AGE .2349243288E-02 .10498872E-02 2.238 .0270 35.714286

Q2 .3073233877E-01 .10333913 .297 .7667 .36842105

Q3 .9723291039E-01 .10452487 .930 .3541 .32330827

Q4 .1497174823 .11472236 1.305 .1943 .15789474

Q5 .1535488191 .13057954 1.176 .2419 .82706767E-01

Log-Log: Ln(Selling price)= +LN(BD) + LN(BA) + LN (Size) + LN (Lot) + LN (Age) + LN (Q2) + LN (Q3) + LN (Q4)+ LN (Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = LNPRICE Mean= 12.77153525 , S.D.= .4290662382 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= 10.20204619 , Std.Dev.= .28800 |

| Fit: R-squared= .580179, Adjusted R-squared = .54946 |

| Model test: F[ 9, 123] = 18.89, Prob value = .00000 |

| Diagnostic: Log-L = -17.9627, Restricted(b=0) Log-L = -75.6798 |

| LogAmemiyaPrCrt.= -2.417, Akaike Info. Crt.= .420 |

| Autocorrel: Durbin-Watson Statistic = 1.60387, Rho = .19806 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant 8.757137166 .63677360 13.752 .0000

LNBD .2241713068 .15172804 1.477 .1421 1.2228521

LNBA .5008477393 .10998272 4.554 .0000 .81084373

LNSIZE .3844196249 .92943230E-01 4.136 .0001 7.5455685

LNLOT .6238403231E-02 .31755881E-01 .196 .8446 8.9728713

LNAGE .9113424286E-01 .36461800E-01 2.499 .0138 3.3225134

Q2 .3070924469E-01 .10623658 .289 .7730 .36842105

Q3 .8425880607E-01 .10815729 .779 .4375 .32330827

Q4 .1622575363 .11861502 1.368 .1738 .15789474

Q5 .1275544129 .13472863 .947 .3456 .82706767E-01

Age: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1894530209E+13, Std.Dev.= 124107.55265 |

| Fit: R-squared= .599337, Adjusted R-squared = .57002 |

| Model test: F[ 9, 123] = 20.44, Prob value = .00000 |

| Diagnostic: Log-L = -1743.4651, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.530, Akaike Info. Crt.= 26.368 |

| Autocorrel: Durbin-Watson Statistic = 1.58509, Rho = .20745 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -198197.6762 71355.991 -2.778 .0063

BD 41632.61060 17697.361 2.352 .0202 3.4887218

BA 119922.0764 18461.383 6.496 .0000 2.3646617

SIZE 30.77013285 12.951058 2.376 .0191 2095.3534

LOT 2.819835392 1.3728617 2.054 .0421 10698.880

SQAGE 9.136372604 4.0879400 2.235 .0272 1854.7519

Q2 31407.95264 45749.284 .687 .4937 .36842105

Q3 56904.58301 46290.731 1.229 .2213 .32330827

Q4 63905.15657 50810.226 1.258 .2109 .15789474

Q5 64079.37547 57762.299 1.109 .2694 .82706767E-01

Lot: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1864381533E+13, Std.Dev.= 123116.09740 |

| Fit: R-squared= .605713, Adjusted R-squared = .57686 |

| Model test: F[ 9, 123] = 21.00, Prob value = .00000 |

| Diagnostic: Log-L = -1742.3983, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.514, Akaike Info. Crt.= 26.352 |

| Autocorrel: Durbin-Watson Statistic = 1.65096, Rho = .17452 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -218881.7371 72780.583 -3.007 .0032

BD 42017.60864 17509.912 2.400 .0179 3.4887218

BA 119170.0447 18521.575 6.434 .0000 2.3646617

SIZE 36.16375705 12.928519 2.797 .0060 2095.3534

SQLOT .5893264706E-04 .30354526E-04 1.941 .0545 .18774546E+09

AGE 1243.157772 461.28091 2.695 .0080 35.714286

Q2 33332.79329 45410.126 .734 .4643 .36842105

Q3 55029.73049 45919.835 1.198 .2331 .32330827

Q4 71429.12637 50299.962 1.420 .1581 .15789474

Q5 66239.11398 57337.025 1.155 .2502 .82706767E-01

Size: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1940633545E+13, Std.Dev.= 125608.55264 |

| Fit: R-squared= .589586, Adjusted R-squared = .55956 |

| Model test: F[ 9, 123] = 19.63, Prob value = .00000 |

| Diagnostic: Log-L = -1745.0640, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.554, Akaike Info. Crt.= 26.392 |

| Autocorrel: Durbin-Watson Statistic = 1.59022, Rho = .20489 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -241337.0604 74193.454 -3.253 .0015

BD 49579.14015 17172.555 2.887 .0046 3.4887218

BA 135398.2084 17772.520 7.618 .0000 2.3646617

SQSIZE .1810164194E-02 .12924632E-02 1.401 .1639 5618018.3

LOT 3.151652620 1.3736058 2.294 .0235 10698.880

AGE 1117.317431 467.27311 2.391 .0183 35.714286

Q2 39357.25904 46281.101 .850 .3968 .36842105

Q3 61364.81777 46755.053 1.312 .1918 .32330827

Q4 74204.44587 51370.025 1.445 .1511 .15789474

Q5 71571.08568 58518.552 1.223 .2236 .82706767E-01

Age,Lot, Size: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1964510934E+13, Std.Dev.= 126378.92867 |

| Fit: R-squared= .584537, Adjusted R-squared = .55414 |

| Model test: F[ 9, 123] = 19.23, Prob value = .00000 |

| Diagnostic: Log-L = -1745.8772, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.567, Akaike Info. Crt.= 26.404 |

| Autocorrel: Durbin-Watson Statistic = 1.55620, Rho = .22190 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -209053.4763 72658.334 -2.877 .0047

BD 57570.61037 17043.151 3.378 .0010 3.4887218

BA 129251.6231 17991.802 7.184 .0000 2.3646617

SQSIZE .1652037302E-02 .12929569E-02 1.278 .2038 5618018.3

SQLOT .6912860879E-04 .31054480E-04 2.226 .0278 .18774546E+09

SQAGE 8.968801650 4.1642212 2.154 .0332 1854.7519

Q2 37103.33327 46532.998 .797 .4268 .36842105

Q3 63880.95430 47029.821 1.358 .1769 .32330827

Q4 72745.10419 51545.465 1.411 .1607 .15789474

Q5 65725.66481 58752.159 1.119 .2655 .82706767E-01

Age: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1925036441E+13, Std.Dev.= 125102.76885 |

| Fit: R-squared= .592885, Adjusted R-squared = .56310 |

| Model test: F[ 9, 123] = 19.90, Prob value = .00000 |

| Diagnostic: Log-L = -1744.5274, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.546, Akaike Info. Crt.= 26.384 |

| Autocorrel: Durbin-Watson Statistic = 1.52685, Rho = .23657 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -188301.8461 71581.262 -2.631 .0096

BD 43262.82077 17864.936 2.422 .0169 3.4887218

BA 119268.5435 18603.691 6.411 .0000 2.3646617

SIZE 29047.71377 13050.325 2.226 .0278 2.0953534

LOT 2784.715719 1384.2298 2.012 .0464 10.698880

CUAGE 61.32949530 35.606852 1.722 .0875 126.19379

Q2 30085.44489 46106.441 .653 .5153 .36842105

Q3 57714.60912 46667.604 1.237 .2185 .32330827

Q4 61911.92839 51262.681 1.208 .2295 .15789474

Q5 60372.26504 58168.168 1.038 .3014 .82706767E-01

Lot: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1887007804E+13, Std.Dev.= 123860.91740 |

| Fit: R-squared= .600927, Adjusted R-squared = .57173 |

| Model test: F[ 9, 123] = 20.58, Prob value = .00000 |

| Diagnostic: Log-L = -1743.2005, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.526, Akaike Info. Crt.= 26.364 |

| Autocorrel: Durbin-Watson Statistic = 1.64620, Rho = .17690 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -220786.5652 73216.452 -3.016 .0031

BD 43211.70481 17648.554 2.448 .0158 3.4887218

BA 119742.4105 18701.854 6.403 .0000 2.3646617

SIZE 37264.69827 12975.229 2.872 .0048 2.0953534

CULOT .9963556600 .66434010 1.500 .1362 4978.6157

AGE 12253.39723 4639.4159 2.641 .0093 3.5714286

Q2 33928.23230 45686.087 .743 .4591 .36842105

Q3 55877.82472 46192.497 1.210 .2287 .32330827

Q4 74211.29656 50552.385 1.468 .1447 .15789474

Q5 64860.89912 57673.176 1.125 .2629 .82706767E-01

Size: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1925640795E+13, Std.Dev.= 125122.40495 |

| Fit: R-squared= .592757, Adjusted R-squared = .56296 |

| Model test: F[ 9, 123] = 19.89, Prob value = .00000 |

| Diagnostic: Log-L = -1744.5482, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.547, Akaike Info. Crt.= 26.384 |

| Autocorrel: Durbin-Watson Statistic = 1.59886, Rho = .20057 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -221337.2167 76099.097 -2.909 .0043

BD 50303.69757 16942.158 2.969 .0036 3.4887218

BA 131567.8584 18073.353 7.280 .0000 2.3646617

CUSIZE 189.8603463 110.83285 1.713 .0892 75.419611

LOT 1613.499064 1676.2397 .963 .3377 10.698880

AGE 11162.99840 4646.6675 2.402 .0178 3.5714286

Q2 39134.77136 46066.731 .850 .3972 .36842105

Q3 59027.42881 46624.808 1.266 .2079 .32330827

Q4 73710.37655 51173.359 1.440 .1523 .15789474

Q5 69016.76995 58325.435 1.183 .2390 .82706767E-01

Age,Lot, Size: Selling price= +(BD) + (BA) + (Size) + (Lot) + (Age) + (Q2) + (Q3) + (Q4)+(Q5) +

+———————————————————————–+

| Ordinary least squares regression Weighting variable = none |

| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |

| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |

| Residuals: Sum of squares= .1981635825E+13, Std.Dev.= 126928.56403 |

| Fit: R-squared= .580915, Adjusted R-squared = .55025 |

| Model test: F[ 9, 123] = 18.94, Prob value = .00000 |

| Diagnostic: Log-L = -1746.4544, Restricted(b=0) Log-L = -1804.2882 |

| LogAmemiyaPrCrt.= 23.575, Akaike Info. Crt.= 26.413 |

| Autocorrel: Durbin-Watson Statistic = 1.52703, Rho = .23648 |

+———————————————————————–+

+———+————–+—————-+——–+———+———-+

|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|

+———+————–+—————-+——–+———+———-+

Constant -179207.8145 74321.138 -2.411 .0174

BD 56041.05757 17411.842 3.219 .0016 3.4887218

BA 125686.3363 18235.331 6.892 .0000 2.3646617

CUSIZE 207.0734780 108.47541 1.909 .0586 75.419611

CULOT .3907830999 .80803997 .484 .6295 4978.6157

CUAGE 61.71148077 36.127071 1.708 .0901 126.19379

Q2 34966.44174 46677.926 .749 .4552 .36842105

Q3 61530.57060 47310.295 1.301 .1958 .32330827

Q4 69789.42370 51816.146 1.347 .1805 .15789474

Q5 59428.41302 58942.872 1.008 .3153 .82706767E-01

Many thanks to Joris Claassen who provided feedback on how to better optimize this code. Please find the updated code below.

Hey guys,

As I continue my quest to become a master VBA programmer, I would like to share a piece of code with you that I recently constructed. This could was the result of Excel Solver not being able to generate a satisfactory solution to my optimization problem. Thus, I wanted to run a sensitivity analysis to see how various initial values would affect my target cell. This code allows the user to specify a target cell, a set of change cells, a range of lower bounds (inclusive) and a range of upper bounds (inclusive), as well as a precision level. The code then one-by-one tests the sensitivity of each of your change cells on your target cell and generates graphs demonstrating the output. I would appreciate any feedback on the code!

Thanks,

Adam

'Created by Adam McElhinney, 8/4/2011 'This code allows a user to specify a range of cells to adjust, a target cell as well as lower and upper bounds for their adjustments. 'Then the code generates a charts that determine how sensititive the target cell is to the input parameters. This can be useful for 'cases where Solver cannot generate a satisfactory solution. Sub refit_model() Call clear_sheet Call calc_configs Call ArrangeMyCharts End Sub 'Clear Sheets Sub clear_sheet() Dim check As Worksheet Application.DisplayAlerts = False Sheets("output").Select Sheets("output").Cells.Clear Dim wsSheet As Worksheet On Error Resume Next Set wsSheet = Sheets("graphs") If Not wsSheet Is Nothing Then wsSheet.Delete Sheets.Add.Name = "graphs" Else Sheets.Add.Name = "graphs" End If End Sub Sub calc_configs() Application.ScreenUpdating = True 'Timer 'Dim Starttime As Double, EndTime As Double 'Startime = Timer 'Prompt user to select range of coefficients, lower bounds and upper bounds Worksheets("calculation_configs").Activate Dim variables As Range Dim change As Range Dim lower As Range Dim upper As Range Dim current As Range Dim target As Range Dim precision As Integer Dim test_mode As Integer Set variables = Application.InputBox("Select the row of variable names:", Default:=Range("a6:a19").Address, Type:=8) Set current = Application.InputBox("Select the row of cells that are the current configs:", Default:=Range("b6:b19").Address, Type:=8) Set change = Application.InputBox("Select the row of cells to change:", Default:=Range("c6:c19").Address, Type:=8) Set lower = Application.InputBox("Select the row of cells that are lower bound:", Default:=Range("d6:d19").Address, Type:=8) Set upper = Application.InputBox("Select the row of cells that are upper bound:", Default:=Range("e6:e19").Address, Type:=8) Set target = Application.InputBox("Select the target cell", Default:=Range("k16").Address, Type:=8) precision = Application.InputBox("Enter in the graphics precision:", Default:=10) test_mode = (MsgBox("Do you want to run in test mode? Click 'Yes' for test mode or 'No' for regular mode. In test mode, cells will not recalculate.", vbYesNo)) 'Check to ensure that change, lower and upper are all same length Dim j, k, l As Integer j = change.Rows.Count k = lower.Rows.Count l = upper.Rows.Count If j <> k Or k <> l Then MsgBox ("The number of cells to change must equal the number of upper bounds and the number of lower bounds") Exit Sub End If Application.ScreenUpdating = False 'Assign initial coefficients, lower bounds and upper bounds to an array ReDim coeff(j) As Double ReDim lowerb(j) As Double ReDim upperb(j) As Double Dim i As Integer i = 1 For i = 1 To j coeff(i) = change.Cells(i, 1).value lowerb(i) = lower.Cells(i, 1).value upperb(i) = upper.Cells(i, 1).value 'MsgBox (lower.Cells(i, 1).Value) Next i 'QC Code 'ReDim testdisplay(j) As Double 'testdisplay = lowerb 'MsgBox testdisplay(1) & "-" & testdisplay(2) & "-" & testdisplay(3) & "-" & testdisplay(4) & "-" & testdisplay(5) & "-" & testdisplay(6) & "-" & testdisplay(7) & "-" & testdisplay(8) & "-" & testdisplay(9) & "-" & _ 'testdisplay(10) & "-" & testdisplay(11) & "-" & testdisplay(12) & "-" & testdisplay(13) & "-" & testdisplay(14) 'Determine distance between upper and lower bounds, as well as upper bound and initial values and lower bound and initial values ReDim distance(j) As Double ReDim UpperDistance(j) As Double ReDim LowerDistance(j) As Double i = 0 For i = 1 To j distance(i) = upperb(i) - lowerb(i) UpperDistance(i) = upperb(i) - coeff(i) LowerDistance(i) = coeff(i) - lowerb(i) Next i 'Determine the graphing precision change intervals ReDim interval(j) As Double For i = 1 To j interval(i) = distance(i) / precision 'MsgBox (interval(i)) Next i 'QC Code 'ReDim testdisplay2(j) As Double 'testdisplay2 = distance 'MsgBox testdisplay2(1) & "-" & testdisplay2(2) & "-" & testdisplay2(3) & "-" & testdisplay2(4) & "-" & testdisplay2(5) & "-" & testdisplay2(6) & "-" & testdisplay2(7) & "-" & testdisplay2(8) & "-" & testdisplay2(9) & "-" & _ 'testdisplay2(10) & "-" & testdisplay2(11) & "-" & testdisplay2(12) & "-" & testdisplay2(13) & "-" & testdisplay2(14) 'Set all configs to current values change.value = current.value 'Calc sum squares If test_mode = vbNo Then Application.CalculateFull End If 'Record results 'Create index variable Dim ObsCol As Range 'ObsCol=Sheets("output"). Dim ObsNum As Integer ObsNum = 1 For i = 1 To j Sheets("output").Cells(1, i).value = variables.Cells(i, 1) Sheets("output").Cells(2, i).value = change.Cells(i, 1) Next i Sheets("output").Cells(1, j + 1).value = "Target" Sheets("output").Cells(1, j + 2).value = "ObsNum" Sheets("output").Cells(2, j + 1).value = target.Cells(1, 1).value Sheets("output").Cells(2, j + 2).value = ObsNum 'New Counters Dim n As Integer Dim p As Integer n = 1 p = 1 'Iterate through each variable one by one in increments equal to the precision, calculating the value of the target i = 1 'Loop for each variable For i = 1 To j 'Set all configs to lower bounds change.value = lower.value 'Loop for each increment For m = 1 To precision change.Cells(i, 1).value = m * interval(i) + lowerb(i) 'MsgBox (m * interval(i) + lowerb(i)) If test_mode = vbNo Then Application.CalculateFull End If 'Print output n = 1 For n = 1 To j 'Write change values Sheets("output").Cells(2 + i * precision - (precision - m), n).value = change.Cells(n, 1).value Next n 'Write target value Sheets("output").Cells(2 + i * precision - (precision - m), j + 1).value = target.value 'Obs Num Sheets("output").Cells(2 + i * precision - (precision - m), j + 2).value = i * precision - (precision - m) + 1 Next m 'Create graph of output Worksheets("output").Activate ActiveSheet.Shapes.AddChart.Select ActiveChart.SetSourceData Source:=Sheets("output").Range(Cells(i * precision - (precision - 1) + 2, j + 1), _ Cells(i * precision + 2, j + 1)) Debug.Print "Chart Target range:" & i * precision - (precision - 1) + 2 & "-" & j + 1 ActiveChart.ChartType = xlLine ActiveChart.SeriesCollection(1).XValues = Sheets("output").Range(Cells(i * precision - (precision - 1) + 2, i), _ Cells(i * precision + 2, i)) ActiveChart.HasTitle = True ActiveChart.ChartTitle.Text = "Change in Target for a Change in Variable " & variables(i) ActiveChart.Location xlLocationAsObject, "graphs" Debug.Print "Chart Series range:" & ii * precision - (precision - 1) + 2 & "-" & i Next i 'Save workbook Application.DisplayAlerts = False 'ActiveWorkbook.SaveAs Filename:="C:\Documents and Settings\amcelhinney\Desktop\" & test_file EndTime = Timer Debug.Print "Execution Time in seconds:", EndTime - Starttime End Sub Sub ArrangeMyCharts() Worksheets("graphs").Activate Dim iChart As Long Dim nCharts As Long Dim dTop As Double Dim dLeft As Double Dim dHeight As Double Dim dWidth As Double Dim nColumns As Long dTop = 75 ' top of first row of charts dLeft = 100 ' left of first column of charts dHeight = 225 ' height of all charts dWidth = 375 ' width of all charts nColumns = 3 ' number of columns of charts nCharts = ActiveSheet.ChartObjects.Count Debug.Print nCharts For iChart = 1 To nCharts With ActiveSheet.ChartObjects(iChart) .Height = dHeight .Width = dWidth .Top = dTop + Int((iChart - 1) / nColumns) * dHeight .Left = dLeft + ((iChart - 1) Mod nColumns) * dWidth End With Next End Sub Sub ArrangeMyCharts_2() Worksheets("Calculation_Configs").Activate Dim iChart2 As Long Dim nCharts2 As Long Dim dTop As Double Dim dLeft As Double Dim dHeight As Double Dim dWidth As Double Dim nColumns As Long dTop = 300 ' top of first row of charts dLeft = 500 ' left of first column of charts dHeight = 225 ' height of all charts dWidth = 375 ' width of all charts nColumns = 2 ' number of columns of charts nCharts2 = ActiveSheet.ChartObjects.Count Debug.Print nCharts2 For iChart2 = 1 To nCharts2 With ActiveSheet.ChartObjects(iChart2) .Height = dHeight .Width = dWidth .Top = dTop + Int((iChart2 - 1) / nColumns) * dHeight .Left = dLeft + ((iChart2 - 1) Mod nColumns) * dWidth End With Next End Sub]]>

1. Ensure that the exam material is relevant how you use SAS. The certification is extremely detailed and mainly covers data importing, basic data manipulation and basic reporting (full description of exam content). If you use SAS primarily for predictive modeling purposes and already have an established method of getting data into SAS, this exam may not be a good use of your time.

2. Buy the exam book. Unless you have been using SAS for years, you won’t be able to pass the test without it (and even if you have been using SAS for years, I’d be shocked if you were able to pass without this book). Ensure that you read each chapter and make flashcards along the way. Also, take the practice exams at the end of each chapter and record which questions you got wrong. Then revisit those questions periodically until you are able to complete all of the practice exams with 90% or greater accuracy.

3. Consider purchasing the practice exam. I was not excited to shell out another $55 for this test, on top of the cost of the book and the cost of the exam, but it was definitely worth it. This is far more robust than the book and is very representative of the type of questions you will see on the exam.

I hope that this was helpful and good luck with your studies!

]]>