The following is a thesis I wrote as an undergraduate at IU.
Real estate is undoubtedly one of the most important components of the modern economy. In particular, owning a house has long typified the American dream, and a residential home is the largest asset for a majority of Americans#. The housing market swelled 12.5% from 1990 to 2000, eventually totaling 119.6 million units#. Along with this growth in the housing market has come an even greater growth in demand for the accurate assessment of housing prices.
In response to this demand, many websites now offer users the ability to get a free, quick estimate of their home's value by inputting several variables such as the home's location, size, and features. However, the accuracy of these estimates is unknown, and the formulas used for estimation are often kept proprietary. Despite this, the internet has allowed an unprecedented number of real estate transactions to be tracked, often with details regarding the home's age, size, location, and features.
These data sets are now being used in financial economics, a field that has long been interested in real estate valuation. Researchers have employed a variety of statistical techniques in an attempt to accurately predict property values. The bulk of the research has relied on multivariate regression to assign weights to the various features of a home. However, the limitations of these techniques have become apparent, and new statistical techniques such as neural networks are being tested as replacements.
The accuracy of these neural networks compared to traditional multivariate methods is still debated; however, research is revealing a growing niche in which neural network models are able to outperform traditional techniques.
II. Background
There are three main methods of valuing real estate. The first is the discounted cash flow (DCF) method, which is used primarily in the valuation of commercial properties. This method calculates the present value of all the income one expects to receive from the property. However, it has several flaws. First, the discount rate used is highly subjective, and various complex statistical techniques have been developed for estimating it. Further, the model assumes that interest rates will stay constant, yet interest rates can be highly volatile, and even small changes in the interest rate can have large effects on the property valuation. Lastly, the discounted cash flow model encounters circular logic if the purchase of the property is to be funded with a loan: the value of the asset is needed to calculate the WACC, yet it is that value we seek to find#.
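A minimal sketch of this present-value calculation follows (in Python; the income stream, discount rate, and resale value are hypothetical illustrations, not figures from this thesis):

# Discounted cash flow sketch: present value of the expected income from a property.
# All inputs below are illustrative assumptions.
def discounted_cash_flow(annual_incomes, discount_rate, resale_value=0.0):
    """Present value of a stream of annual incomes plus a terminal resale value."""
    pv = 0.0
    for year, income in enumerate(annual_incomes, start=1):
        pv += income / (1 + discount_rate) ** year
    pv += resale_value / (1 + discount_rate) ** len(annual_incomes)
    return pv

# Ten years of $30,000 net rental income, resale for $400,000, discounted at 8%.
print(discounted_cash_flow([30_000] * 10, 0.08, resale_value=400_000))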
Another method of real estate valuation is the cost method, which takes the value of the land as if it were vacant plus the depreciated value of any structures occupying the land.# However, both of these methods are very limited in their practicality, especially when applied to large-scale valuation of residential properties.
Undoubtedly the most popular technique for real estate valuation is the market comparison approach. In a traditional market comparison approach, a property's value is determined by finding other recently sold properties with similar physical characteristics, called comparables, or 'comps' for short. Then, weights are applied to the various features of the comparables and adjusted so that each comparable approximates the target house as closely as possible.
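A minimal sketch of this comparables adjustment follows (the per-feature dollar adjustments are hypothetical illustrations, not weights used elsewhere in this paper):

# Sales-comparison sketch: adjust each comparable's sale price for feature
# differences relative to the subject property, then average the adjusted prices.
# The per-unit adjustments below are illustrative assumptions.
ADJUSTMENTS = {"bedrooms": 15_000, "bathrooms": 10_000, "sqft": 50}

def adjusted_price(comp, subject):
    price = comp["price"]
    for feature, per_unit in ADJUSTMENTS.items():
        price += (subject[feature] - comp[feature]) * per_unit
    return price

subject = {"bedrooms": 4, "bathrooms": 2, "sqft": 2200}
comps = [
    {"price": 380_000, "bedrooms": 3, "bathrooms": 2, "sqft": 2000},
    {"price": 420_000, "bedrooms": 4, "bathrooms": 3, "sqft": 2400},
]
estimates = [adjusted_price(c, subject) for c in comps]
print(sum(estimates) / len(estimates))  # simple equal-weight average of the comps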
Several practical issues arise from this method. First, there may not be a large number of comparable properties that have been sold recently. This is particularly true if the property is very different from the average property in the area, or has some unique characteristic that does not allow for comparison#. In addition, by omitting many other sales from the analysis, the appraiser may be losing valuable information. In a similar vein, the small number of comparable properties typically available may not be enough to assume a statistically normal distribution, and thus price estimates may be flawed.
Furthermore, the differences between the properties may be quite significant. Five main differences need to be accounted for in this method: physical characteristics, location, market conditions, terms of financing, and conditions of sale. For example, the prime rate, which largely determines the rate a homebuyer will pay on a loan, dropped 2.5% in the first six months of 2001#. Thus any property sales more than a month or two old would need to be adjusted to reflect the new rate.
Lastly, the weighting assigned to each of the various comparable properties is often subject to scrutiny and debate. The accuracy of these weights is highly relevant to the fair pricing of the home.
In order to counteract these limitations, appraisers have begun to rely on statistical techniques such as multivariate regression. In a typical appraisal, the prices of a large number of sold properties are regressed against the characteristics that are seen as influencing those prices. The defining work on this approach is the book Real Estate Valuation Theory, published by the Appraisal Institute and the American Real Estate Society.
In more recent years, appraisers have begun to rely on geographic information systems (GIS) to assemble databases of properties that incorporate geographic information. Common geographic variables include distance to major metropolitan areas, crime rates, distance to public transportation, and so on. An important early study by Wyatt incorporated other buildings, car parks, footpaths, the boundary of the urban area, tree preservation orders, open spaces, conservation areas, water features, road edges and centre-lines, and railway lines and stations into his model of commercial real estate prices#.
However, these multiple-variable regressions (MVR) have several limitations. First, for small sample sizes, multivariate regressions have proven to be extremely poor estimators of prices#. Therefore, in rural areas or for properties that do not have many comparable properties suitable for comparison, neural networks (NN) hold great promise.
In addition, many characteristics of a house are highly inter-correlated, which raises the possibility of multicollinearity. Further, the data used for the comps are often taken from different times, so heteroskedasticity arises, further diminishing the accuracy of the estimates. Lastly, in an increasingly complex economy, assumptions of linearity seem less and less feasible.
In an effort to obtain more accurate sales predictions, some researchers have turned to neural networks to improve performance. The results of comparing neural networks with multivariate regressions are contradictory and scattered. Studies by researchers such as Tskudua and Baba, Nguyen and Cripps, and Hanson all demonstrate effective ways of using neural networks for valuation#. However, seemingly just as many researchers have found MVR superior to NN#.
III. Explanation of the Neural Network System
To understand the neural network model, one must understand the motivation for its creation. The neural network model is a statistical technique modeled on the way the brain processes data. A non-rigorous definition of neural networks is given by Gurney as follows:
A neural network is an interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal neuron. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.
The goal of the method was to create a non-linear and non-parametric statistical technique#. Nonparametric statistics are those that do not make underlying assumptions, such as normality, about the population distribution#. A major weakness of traditional parametric statistical techniques is that one must assume the underlying probability distribution; non-parametric statistics are not bound by this constraint.
In order to better understand neural networks, it is helpful to have some background on the biological neurons on which the model is based. For our purposes, a biological neuron can be broken down into four simple parts. First are the dendrites, which are responsible for receiving incoming signals from other neurons. Their structure typically resembles the branches of a tree, allowing them to interact with many different neurons in the brain. The second element is the soma, or cell body; it is the largest part of the cell and is responsible for the cell's vital functions. Next, the axon is a single trunk-like structure that extends from the soma. It serves to carry the impulse outward to the last part of the neuron, the pre-synaptic terminals. The pre-synaptic terminals branch out from the axon, connecting the neuron with other neurons and transmitting the impulse when applicable.
There are two types of impulses that neurons transmit: inhibitory and excitatory. When a neuron receives an impulse at the dendrites, the impulse is transmitted to the cell body#. If the strength of the impulse is greater than some threshold value, it is said to be excitatory and the cell continues to transfer the impulse. Conversely, if the strength of the impulse is less than that value, the impulse is not transmitted and it is labeled inhibitory.
The artificial neuron used for our computation follows a similar form. The first artificial-neuron network was the threshold logic unit (TLU), developed by McCulloch and Pitts#. A neuron receives inputs, and the inputs are weighted in some fashion. If the summed value of the weighted impulses exceeds a certain threshold level, the exiting impulse is given a value of one. If not, the impulse is given a value of zero and does not continue.
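A minimal sketch of such a threshold unit follows (the weights and threshold are arbitrary illustrative values):

# Threshold logic unit (McCulloch-Pitts style): a weighted sum compared to a threshold.
def tlu(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation > threshold else 0

# Illustrative values only: two inputs, both needed to fire (an AND-like unit).
print(tlu([1, 1], weights=[0.6, 0.6], threshold=1.0))  # -> 1
print(tlu([1, 0], weights=[0.6, 0.6], threshold=1.0))  # -> 0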
It is the arrangement of multiple neurons of this type that gives us a neural 'network'. The arrangement of the neurons can take many forms, but in its simplest form it is a series of inputs that converge onto a hidden layer and continue on to an output layer of neurons, which eventually produces the outputs#.
The network described above is referred to as a feed forward network. However, these networks can take many shapes and have nodes that not only send signals forward, but may send signals laterally and in some cases backward. More details on various structures will be given later.
The weights given to each impulse by the neuron are an important source of their computing strength. These weights need not be static; rather, they may be altered using a set of known data. One extremely useful process for altering these weights is called supervised learning. In this process, the series of known data is passed through the neural network and the weights are adjusted according to a learning rule to produce the desired outputs. This process is typically repeated many times until the weights very accurately reproduce the desired outputs. The hope is that when the network then receives data it has not seen before, it will be able to correctly identify the pattern and produce the correct outputs.
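A minimal sketch of this kind of supervised weight adjustment, using a simple perceptron-style learning rule on an illustrative truth-table data set (not the thesis data):

# Supervised learning sketch: repeatedly pass known examples through the unit
# and nudge the weights toward the desired outputs (perceptron learning rule).
def train(examples, n_inputs, learning_rate=0.1, epochs=20):
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):                       # repeated passes over the training set
        for inputs, target in examples:
            output = 1 if sum(x * w for x, w in zip(inputs, weights)) + bias > 0 else 0
            error = target - output               # learning rule: adjust by the error
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Illustrative data: learn a logical AND from its truth table.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train(data, n_inputs=2))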
As one may notice, neural networks are not a very strongly defined statistical technique, thus giving them flexibility to be applied to a very wide range of areas and applications. Neural network models have been applied in diverse fields such as civil engineering#, pavement crack analysis#, soil and water retention# and many other seemingly unrelated areas. However, the diversity of the applications speaks to the utility of the method.
IV. Data Set
For this paper, data on 133 real estate transactions from the period 3/20/2006 to 2/28/2007 were gathered from the website Zillow.com. Zillow is a free website designed to give consumers a quick estimate of their home's value. In order to calculate these estimates, Zillow maintains a database of all home transactions for a particular region, going back different lengths of time depending on the location. The transactions recorded here are from the northwest suburbs of Chicago, primarily in the 60010, 60047, and 60042 zip codes. The database lists the address of the home, the number of bedrooms, the number of bathrooms, the size of the home in square feet, the size of the lot in square feet, the price the home sold for, the date the home was sold, and the age of the home. The variables for the size of the home in square feet and the size of the lot in square feet have both been divided by 1,000 to better condition the numbers for squaring and cubing. Similarly, the variable for the age of the house has been divided by 10. These are the variables used in our model to predict housing prices.
The data were randomly divided into 79 "training" observations and 54 "validation" observations. This ratio of training observations to validation observations is consistent with previous literature. Each model will be examined for its ability to predict the values of the 54 validation housing prices.
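A minimal sketch of such a random split (the seed and the use of Python's random module are illustrative assumptions, not the tool actually used for the thesis):

import random

# Randomly split the 133 transactions into 79 training and 54 validation observations.
random.seed(0)                      # illustrative seed for repeatability
observations = list(range(133))     # stand-in for the 133 transaction records
random.shuffle(observations)
training_idx = observations[:79]
validation_idx = observations[79:]
print(len(training_idx), len(validation_idx))  # -> 79 54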
Variable Name | Description
Price | Listed in US Dollars
BD | Number of Bedrooms
BA | Number of Bathrooms
SIZE | (Square Footage of the House)/1000
LOT | (Square Footage of the Lot)/1000
AGE | (Number of years since the house was built)/10
LN(VAR) | The natural log of the specified variable
SQ(VAR) | The specified variable squared
CU(VAR) | The specified variable cubed
Q2 | Dummy variable indicating whether the house was sold in the second quarter of 2006
Q3 | Dummy variable indicating whether the house was sold in the third quarter of 2006
Q4 | Dummy variable indicating whether the house was sold in the fourth quarter of 2006
Q5 | Dummy variable indicating whether the house was sold in the first quarter of 2007
Descriptive statistics of the data are provided below.
===============================================================================
Variable Mean Std.Dev. Minimum Maximum Cases
===============================================================================
——————————————————————————-
All observations in current sample
——————————————————————————-
PRICE 387570.301 189266.582 100000.000 1275000.00 133
BD 3.48872180 .794074088 2.00000000 6.00000000 133
BA 2.36466165 .755980749 1.00000000 5.50000000 133
SIZE 2095.35338 1112.12044 854.000000 10018.0000 133
LOT 10698.8797 8592.70512 70.0000000 52272.0000 133
AGE 35.7142857 24.1584327 3.00000000 140.000000 133
Q2 .368421053 .484200125 .000000000 1.00000000 133
Q3 .323308271 .469507675 .000000000 1.00000000 133
Q4 .157894737 .366020890 .000000000 1.00000000 133
Q5 .827067669E-01 .276479842 .000000000 1.00000000 133
As the graphs show, there is a great deal of variability among the different variables and their effects on price. This is most likely due in large part to the fact that our model does not control for any geographic variables. Anyone who knows the Chicago area can tell you that housing prices rise the closer one gets to the city. Despite this, these complicated relationships are exactly what neural networks are designed to predict, and thus this high variability should be a good testing ground for a non-linear model.
V. Multivariate Regression Models
In accordance with previous literature, several multivariate regression models were tested. Previous literature has hypothesized that real estate data may follow a semi-log or a log-log pattern. In addition, research has shown that age, lot size, and house size may have a squared or even cubic relationship with price.
Lastly, there is a strong possibility of heteroskedasticity because the data are drawn from different time periods; thus all of the models are repeated using White's correction for heteroskedasticity. The "un-Whitened" model specifications are given below.
Linear: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Semi-Log: Ln(Selling price) = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Log-Log: Ln(Selling price) = β0 + β1LN(BD) + β2LN(BA) + β3LN(SIZE) + β4LN(LOT) + β5LN(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Age²: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5SQ(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Lot²: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4SQ(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Size²: Selling price = β0 + β1(BD) + β2(BA) + β3SQ(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Age², Lot², Size²: Selling price = β0 + β1(BD) + β2(BA) + β3SQ(SIZE) + β4SQ(LOT) + β5SQ(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Age³: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5CU(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Lot³: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4CU(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Size³: Selling price = β0 + β1(BD) + β2(BA) + β3CU(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
Age³, Lot³, Size³: Selling price = β0 + β1(BD) + β2(BA) + β3CU(SIZE) + β4CU(LOT) + β5CU(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
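As a sketch of how these specifications could be estimated, the following uses Python's statsmodels as a stand-in for the LIMDEP runs reported in this paper; the data file name and column names are assumptions mirroring the variable table above:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumed data frame with the columns from the variable table above.
df = pd.read_csv("real_estate_data.csv")  # hypothetical file name

X = sm.add_constant(df[["BD", "BA", "SIZE", "LOT", "AGE", "Q2", "Q3", "Q4", "Q5"]])

linear = sm.OLS(df["PRICE"], X).fit()                                # Linear model
semilog = sm.OLS(np.log(df["PRICE"]), X).fit()                       # Semi-Log model
semilog_white = sm.OLS(np.log(df["PRICE"]), X).fit(cov_type="HC0")   # White-corrected standard errors

print(linear.params)
print(semilog.rsquared)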
Somewhat surprisingly, the simplest linear model, without correcting for heteroskedasticity, yields the most statistically significant variables. All of the variables except the quarter-sold dummies are significant. However, previous studies utilizing much larger sample sizes have shown all of these variables to be significant, so we accept that they are relevant for our analysis.
Linear: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1861022040E+13, Std.Dev.= 123005.12383 |
| Fit: R-squared= .606423, Adjusted R-squared = .57762 |
| Model test: F[ 9, 123] = 21.06, Prob value = .00000 |
| Diagnostic: Log-L = -1742.2784, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.512, Akaike Info. Crt.= 26.350 |
| Autocorrel: Durbin-Watson Statistic = 1.64819, Rho = .17591 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -226120.6909 72598.896 -3.115 .0023
BD 37659.81050 17601.113 2.140 .0344 3.4887218
BA 122116.0065 18335.652 6.660 .0000 2.3646617
SIZE 35.09598346 12.983202 2.703 .0078 2095.3534
LOT 2.715900837 1.3582596 2.000 .0478 10698.880
AGE 1245.204431 460.88289 2.702 .0079 35.714286
Q2 33570.17097 45364.143 .740 .4607 .36842105
Q3 54453.79975 45884.666 1.187 .2376 .32330827
Q4 68688.20522 50361.193 1.364 .1751 .15789474
Q5 68402.54427 57322.227 1.193 .2350 .82706767E-01
The model was then back-tested using the 54 validation observations. The success or failure of the model was judged by its ability to achieve a low forecasting error.
Forecasting Error = |Actual Housing Price - Predicted Housing Price| / Actual Housing Price
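A minimal sketch of this error computation, including the exponential back-transform needed for the semi-log predictions (array and variable names are assumptions):

import numpy as np

def mean_forecasting_error(actual_prices, predicted_prices):
    """Mean of |actual - predicted| / actual over the validation observations."""
    actual = np.asarray(actual_prices, dtype=float)
    predicted = np.asarray(predicted_prices, dtype=float)
    return np.mean(np.abs(actual - predicted) / actual)

# For the semi-log model the regression predicts Ln(price), so exponentiate first:
# ln_price_hat = semilog.predict(X_validation)   # hypothetical validation design matrix
# mfe = mean_forecasting_error(actual_prices, np.exp(ln_price_hat))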
Mean Forecasting Error of the Multiple Variable Regression Models

Model | Mean Forecasting Error
Linear | 0.254316
Semi-Log | 0.235697
Log-Log | 0.259342
Age² | 0.258416
Lot² | 0.252312
Size² | 0.250122
Age², Lot², Size² | 0.254224
Age³ | 0.261117
Lot³ | 0.253886
Size³ | 0.249586
Age³, Lot³, Size³ | 0.260392
It is somewhat surprising that the Semi-Log model has the lowest mean forecasting error. To predict prices with a semi-log model, one must take the exponential of the predicted log-prices; this is a non-linear transformation and therefore introduces bias. In this case, however, that bias appears to be outweighed by the model's increased predictive power.
The lack of variability among the 11 models is striking. The difference in mean forecasting error between the highest and lowest models is only .025. However, given a larger sample size, one would expect this spread to increase as more subtle relationships between the independent variables are teased out.
The distribution of the forecasting errors for the Semi-Log model is shown below. The model does an impressive job: 32 of the observations have a forecasting error of less than .2, and 46 of the observations have a forecasting error of less than .40.
VI. Neural Network Model Explanation
In order to contrast the performance of the two statistical techniques, a traditional feed-forward neural network was utilized. In a feed-forward network, the data are separated into two sets: the training set and the validation set. As stated previously, our data were randomly divided into a training set of 79 observations and a validation set of 54 observations. The network is then 'trained', meaning the training data are passed through the network and the initial parameters are adjusted in some manner for a specified number of iterations. The validation set is then used to check the accuracy of the model specified by the training. In this analysis, a sigmoid activation function was used for its simplicity and ease of differentiation.
Sigmoid function: f(x) = 1 / (1 + e^(-x))
The number of hidden layers, the number of neurons per hidden layer, and the number of training iterations to conduct are open to debate. Previous research on neural networks for real estate prediction has shown that one hidden layer is typically sufficient for accurate assessment#; accordingly, one hidden layer is used in our model. For the other parameters, a few 'rules of thumb' exist, but due to the low number of observations in our sample, numerous combinations were tested.
The neural network package utilized for this analysis automatically sets the initial values to random numbers within the data range. This results in quicker training than a pure random initialization#.
Neural Network Model Specifications

Model Number | Hidden Layers | Neurons Per Hidden Layer
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 1 | 4
5 | 1 | 5
6 | 1 | 6
7 | 1 | 7
8 | 1 | 8
9 | 1 | 9
10 | 1 | 10
11 | 1 | 15
12 | 1 | 20
13 | 1 | 25
The models specified above were each run with 5, 10, 15, 20, 25, 30, 50, and 75 training iterations. One of the difficulties of the neural network model is deciding how many training iterations to use. If one underestimates the number needed, the results are not as accurate as they could be. Conversely, if one uses too many training iterations, the function may be over-fitted to the data; this over-fitting causes a loss of accuracy when new data are run through the model.
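As a rough illustration of this training procedure, the following sketches a one-hidden-layer sigmoid network trained by gradient descent in Python/NumPy; it is not the neural network package actually used in this thesis, and the learning rate, initialization range, and data shapes are assumptions:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_network(X, y, hidden_neurons=3, iterations=5, lr=0.01, seed=0):
    """One hidden sigmoid layer, linear output, trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.uniform(-0.5, 0.5, size=(n_features, hidden_neurons))
    b1 = np.zeros(hidden_neurons)
    W2 = rng.uniform(-0.5, 0.5, size=hidden_neurons)
    b2 = 0.0
    for _ in range(iterations):                  # a small number of training iterations
        H = sigmoid(X @ W1 + b1)                 # hidden-layer activations
        y_hat = H @ W2 + b2                      # network output
        err = y_hat - y
        # Backpropagate the squared-error gradient through both layers.
        W2 -= lr * (H.T @ err) / len(y)
        b2 -= lr * err.mean()
        dH = np.outer(err, W2) * H * (1 - H)
        W1 -= lr * (X.T @ dH) / len(y)
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def predict(X, params):
    W1, b1, W2, b2 = params
    return sigmoid(X @ W1 + b1) @ W2 + b2

# Hypothetical usage on standardized training arrays:
# params = train_network(X_train, y_train, hidden_neurons=3, iterations=5)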
To compare the accuracy of the models, the validation data was run through both models and the mean forecasting error was computed.
Mean Forecasting Error of Various Neural Network Models

Model Number | Neurons per Hidden Layer | 5 Iterations | 10 Iterations | 15 Iterations | 20 Iterations | 25 Iterations | 30 Iterations | 50 Iterations | 75 Iterations
1 | 1 | 0.277338 | 0.277165 | 0.277342 | 0.277027 | 0.277411 | 0.276779 | 0.2761 | 0.2761
2 | 2 | 0.283716 | 0.277502 | 0.280274 | 0.285198 | 0.28401 | 0.282454 | 0.302764 | 0.301875
3 | 3 | 0.267825 | 0.273145 | 0.300541 | 0.304717 | 0.317211 | 0.329043 | 0.35199 | 0.350984
4 | 4 | 0.293282 | 0.343108 | 0.338736 | 0.360779 | 0.371395 | 0.406882 | 0.429371 | 0.425458
5 | 5 | 0.319706 | 0.382284 | 0.385009 | 0.380813 | 0.376205 | 0.382051 | 0.384963 | 0.388435
6 | 6 | 2.06627 | 9.448 | 1963.7 | 2052.98 | 2361.01 | 399139. | 3436.7 | 4133.77
7 | 7 | 0.312132 | 0.287282 | 0.299599 | 0.309019 | 0.330474 | 0.349585 | 0.493541 | 0.635953
8 | 8 | 0.352379 | 0.393616 | 0.635656 | 0.959402 | 1.59276 | 0.538184 | 1.07871 | 6.10403
9 | 9 | 1.17289 | 1.57346 | 1.39654 | 93.6565 | 749.619 | 638.547 | 256.935 | 33.1817
10 | 10 | 0.297179 | 0.328543 | 0.354833 | 0.412025 | 0.461227 | 0.48286 | 0.703548 | 0.918779
11 | 15 | 0.33348 | 0.460503 | 0.489845 | 0.788403 | 0.917985 | 1.20471 | 1.13428 | 1.17958
12 | 20 | 0.417438 | 0.530489 | 0.692084 | 0.883574 | 1.16352 | 1.33265 | 1.39361 | 1.48593
13 | 25 | 0.657161 | 2.37524 | 3.20335 | 4.61429 | 5.74316 | 7.12753 | 11.5672 | 13.0967

(Cells in the original table were shaded by error band: FE < 0.3; 0.3 < FE < 0.6; 0.6 < FE < 2; 2 < FE < 10; FE > 10.)
As one can see, the neural network model presented here has the best results with a relatively low number of neurons per hidden layer and low training iterations. In fact, the model with the lowest mean forecasting error has three neurons per hidden layer and only 5 training iterations. This is not entirely surprising, due to the fact that our entire data set was only 133 observations, with nine explanatory variables. If one were to add more observations and more explanatory variables, then the number of training iterations and the number of neurons per hidden layer, and possibly even the number of hidden layers, would likely need to be increased. Interestingly, there is very little variability in mean forecasting error between models 1-5. Despite the parameters ranging from 1-5 neurons and 5-75 training iterations, the mean forecasting error varies by a scant .16.
For models 6-13, there appears to be massive over-training taking place, resulting in wildly inaccurate price predictions. The absurdity of these estimates underscores the importance of choosing the correct parameters when constructing a neural network. Interestingly, some surprisingly precise estimates occur within this range: in particular, model 10 with 5 training iterations and model 7 with 10 and 15 training iterations have mean forecasting errors below .30. This is most likely due to the randomness utilized by the neural network in selecting the initial training parameters and is not repeatable. Again, this serves as a cautionary example for the use of neural networks: one must be concerned not only with the network's predictions for a particular data set, but also with whether those results could be repeated in different circumstances.
The chart below shows the distribution of the forecasting errors for the model with the lowest mean forecasting error, Model 3 with 5 training iterations. As is apparent in the chart, the model is quite effective for the majority of the predictions. However, several outliers are undoubtedly pulling the mean forecasting error much higher. Approximately 28 of the predicted prices have a forecasting error of less than .2, and an impressive 46 of the predicted prices have a forecasting error of less than .40. This is contrasted by the drastic increases in forecasting errors for predictions 50 and onward. Once again, this high variability is most likely a result of the geographic regions from which the data were taken, but it nonetheless illustrates how a neural network will handle predictions of data with high variability.
VII. Comparison of the Two Models
In our analysis, it seems clear that the Semi-Log MVR model is the most accurate model for predicting real estate prices. The difference in mean forecasting error between the best multivariable regression model and the best neural network model was approximately .032. The distribution of forecasting errors for the two models is shown in the chart below.
The Semi-Log model beats NN Model 3 in almost every observation. However, in the low to middle range of forecasting errors, NN Model 3 competes closely with the Semi-Log model, even showing superiority in some cases over observations 37-49. The largest difference between the two models is that the Semi-Log model seems to do a better job of dealing with outliers than NN Model 3. This is undoubtedly where a large portion of the difference in mean forecasting error is created.
Additionally, the computational requirements of the neural network models further hinder their practicality. Despite the small sample size used in this analysis, it took a 3.2 GHz Pentium IV with Hyper-Threading approximately 22 minutes to run the 104 neural network models, or about .21 minutes per model, whereas the 11 multivariate regression models ran almost instantly.
The ambiguity of a neural network model also presents a problem. The difference between the highest and lowest mean forecasting error among the multivariate regression models was .025; among the neural network models it was over 11. The neural network models were admittedly grossly over-trained, but this potential for error does not exist in a multivariate regression model.
The greater potential for error in a neural network model stems from the fact that there are many more parameters the operator must specify than with the regression model. The table below summarizes the decisions that must be made to construct the two models.
Multivariable Regression Model | Neural Network Model
Data Transformations (e.g., log, semi-log, squared, etc.) | Model Structure (Feed-Forward, Radial Basis, Hopfield, etc.)
Significance Level | Hidden Layers
 | Number of Neurons per Hidden Layer
 | Number of Training Iterations
 | Activation Function
 | Initial Training Values
Although there is research suggesting values for these parameters in the neural network model, there are no hard rules, and experimentation is required. One may also ask whether, once the appropriate parameters of a neural network model have been found, those parameters will remain optimal with different data. If a large-scale real estate company deciding whether to purchase several properties relied on a previously tuned NN model, that model might not give accurate price predictions for the new set of possible purchases. In contrast, the parameters of the multivariable regression model are far fewer and have long-established ranges (e.g., a p < .05 significance level for variables).
Another concern with neural network models is the variation among neural network programs. Because they are so computationally intensive, neural network models tend to be particularly sensitive to programming intricacies. Every neural network program comes with a large manual detailing the construction of the software, and studies have shown that different programs can produce different results from the same inputs#.
In contrast, the reliability of most regression software is well known. Such software is much less sensitive to differences between programs because far fewer computations are necessary. Additionally, regression software is available to a wider range of users, particularly through the data analysis add-in embedded in all new versions of Microsoft Excel, whereas neural network software must be purchased separately, often at significant cost.
VIII. Conclusion
Our data set showed the neural network model to be inferior to multivariable regression. There is greater potential for error due to the increased number of user-specified parameters. Additionally, the repeatability of the results across various computer programs is disputable. Lastly, the computational requirements make the NN models slower to run than the MVRs. However, there is still much work to be done in the area of neural networks. Neural network models should be studied using much larger data sets than was used in this study. Additionally, rules for establishing the parameters of a neural network model need to be created and rigorously tested. Further, the performance of neural networks with varying levels of variable sufficiency needs to be evaluated. Lastly, our study emphasizes the fact that whatever prediction model is used, there must be variables to account for geographic differences.
That being said, the regression model was first contemplated in 1885 and is still being refined today#. In contrast, the neural network model as a computational tool came into being in 1949#. It is thus still an infant statistical technique, in need of much more refinement before it finds large-scale application. Given the ever-increasing complexity of the problems being studied by researchers and the exponential increases in computing power and technology, it is unlikely that neural networks will cease to be a promising area of study.
The complex nature of real estate valuation will certainly provide a plethora of applications for NN models. Once the technique evolves from its current state as something of an art form into a well defined statistical technique, we will see much more accurate real estate valuations.
Works Cited
“2002 Profile of US Real Estate Markets.” National Association of Realtors. 8 Apr. 2007 <http://www.realtor.org/cipshome.NSF/19A06EC3575CECE886256A1B00731E6F/AE4C9FEC10EFF28386256AE2006CE138?OpenDocument#Demographics>.
Adeli, Hojjat. “Neural Networks in Civil Engineering: 1989−2000.” Computer-Aided Civil and Infrastructure Engineering (2001): 126-142.
“Bank Prime Loan Rate Changes: Historical Dates of Changes and Rates.” Federal Reserve Bank of Saint Louis. 8 Apr. 2007 <http://research.stlouisfed.org/fred2/data/PRIME.txt>.
Benjamin, John D., Randall S. Guttery, and C. F. Sirmans. “Mass Appraisal: An Introduction to Multiple Regression Analysis for Real Estate Appraisal.” Journal of Real Estate Practice and Education (2004).
D. O. Hebb. The Organization of Behaviour. John Wiley & Sons, New York, 1949.
Francis Galton. Presidential address, Section H, Anthropology. (1885)
Gurney, Kevin. An Introduction to Neural Networks. London: UCL P Limited, 1997.
Hanson, James. “Australian House Prices: A Comparison of Hedonic and Repeat Sales Measures.” Reserve Bank of Australia (2006).
Hecht-Nielsen, R. “Kolmogorov’s Mapping Neural Network Existence Theorem.” Paper presented at the IEEE First International Conference on Neural Networks, San Diego, CA, 1987.
Hoesli, Martin, Elion Jani, and Andre´ Bender. “Monte Carlo Simulations for Real Estate Valuation.” Journal of Property Investment & Finance (2006).
Johnston, Sarah Jane. “Real Estate: the Most Imperfect Asset.” 30 Aug. 2004. Harvard Business School. 8 Apr. 2007 <http://hbswk.hbs.edu/item/4342.html>.
Kershaw, Paul, and Peter Rossini. “Using Neural Networks to Estimate Constant Quality House Price Indices.” Fifth Annual Pacific-Rim Real Estate Society Conference (1999). 8 Apr. 2007.
Koekoek, E . J . W . “Neural Network Models to Predict Soil Water Retention.” European Journal of Soil Science Sept (1999): 489-495.
Lee, Byoung J. “Position-Invariant Neural Network for Digital.” Computer-Aided Civil and Infrastructure Engineering (2004): 105-118.
Nguyen, Nghiep, and Al Cripps. “Predicting Housing Value: a Comparison of Multiple Regression Analysis and Artificial Neural Network Analysis.” Journal of Real Estate Research 22 (2001): 313-336.
“Nonparametric Statistics.” StatSoft. 8 Apr. 2007 <http://www.statsoft.com/textbook/stnonpar.html>.
Pullum, Dr. Laura L., Dr. Marjorie A Darrah, and Mr. Brian J. Taylor. “Independent Verification and Validation of Neural Networks.” 8 Apr. 2007 <https://www.softwaretechnews.com//stn7-2/ivv.html>.
Segel, Arthur, and Sarah Jane Johnston. “Real Estate: the Most Imperfect Asset.” Harvard Business School (2004). 8 Apr. 2007 <http://hbswk.hbs.edu/item/4342.html>.
Sjoberg, Jonas. Mathematica Neural Networks. Champaign, Illinois: Wolfram Research, 2004. 8 Apr. 2007 <http://www.wolfram.com/products/applications/neuralnetworks/>.
“The Neuron.” UNAM, Instituto de Fisilogia Celular. 8 Apr. 2007 <http://www.ifisiol.unam.mx/Brain/neuron.htm>.
Limdep Code File Used for Multiple Regressions
read ;nobs=1474
;nvar=11
;file = “E:\Econ Thesis\Data\real estate data 3-30-07.txt”
;names =1$
create; if (quarter=2) q2=1$
create; if (quarter=3) q3=1$
create; if (quarter=4) q4=1$
create; if (quarter=5) q5=1$
dstats; rhs=price,bd, ba, size, lot, age, q2, q3, q4, q5$
skip;
regress;
lhs=price;
rhs=one, bd, ba, size, lot, age, q2, q3, q4, q5$ this is a linear regression
create; lnprice=log(price)$
skip;
regress;
lhs=lnprice;
rhs=one, bd, ba, size, lot, age, q2, q3, q4, q5$ this is a semi-log regression
create; lnbd=log(bd)$
create; lnba=log(ba)$
create; lnsize=log(size)$
create; lnlot=log(lot)$
create; lnage=log(age)$
skip;
regress;
lhs=lnprice;
rhs=one, lnbd, lnba, lnsize, lnlot, lnage, q2, q3, q4, q5$ this is a log-log regression
create; sqage=age*age$
create; sqlot=lot*lot$
create; sqsize=size*size$
skip;
regress;
lhs=price;
rhs=one, bd, ba, size, lot, sqage, q2, q3, q4, q5$ this is a linear regression with age^2
regress;
lhs=price;
rhs=one, bd, ba, size, sqlot, age, q2, q3, q4, q5$ this is a linear regression with lot^2
regress;
lhs=price;
rhs=one, bd, ba, sqsize, lot, age, q2, q3, q4, q5$ this is a linear regression with size^2
regress;
lhs=price;
rhs=one, bd, ba, sqsize, sqlot, sqage, q2, q3, q4, q5$ this is a linear regression with age^2,lot^2, and size^2
create; cuage=age*age*age$
create; culot=lot*lot*lot$
create; cusize=size*size*size$
regress;
lhs=price;
rhs=one, bd, ba, size, lot, cuage, q2, q3, q4, q5$ this is a linear regression with age^3
regress;
lhs=price;
rhs=one, bd, ba, size, culot, age, q2, q3, q4, q5$ this is a linear regression with lot^3
regress;
lhs=price;
rhs=one, bd, ba, cusize, lot, age, q2, q3, q4, q5$ this is a linear regression with size^3
regress;
lhs=price;
rhs=one, bd, ba, cusize, culot, cuage, q2, q3, q4, q5$ this is a linear regression with age^3,lot^3, and size^3
skip;
regress;
lhs=price;
rhs=one, bd, ba, size, lot, age, q2, q3, q4, q5; hetro$ this is a linear regression
skip;
regress;
lhs=lnprice;
rhs=one, bd, ba, size, lot, age, q2, q3, q4, q5; hetro$ this is a semi-log regression
create; lnbd=log(bd)$
create; lnba=log(ba)$
create; lnsize=log(size)$
create; lnlot=log(lot)$
create; lnage=log(age)$
skip;
regress;
lhs=lnprice;
rhs=one, lnbd, lnba, lnsize, lnlot, lnage, q2, q3, q4, q5; hetro$ this is a log-log regression
skip;
regress;
lhs=price;
rhs=one, bd, ba, size, lot, sqage, q2, q3, q4, q5; hetro$ this is a linear regression with age^2
regress;
lhs=price;
rhs=one, bd, ba, size, sqlot, age, q2, q3, q4, q5; hetro$ this is a linear regression with lot^2
regress;
lhs=price;
rhs=one, bd, ba, sqsize, lot, age, q2, q3, q4, q5; hetro$ this is a linear regression with size^2
regress;
lhs=price;
rhs=one, bd, ba, sqsize, sqlot, sqage, q2, q3, q4, q5; hetro$ this is a linear regression with age^2,lot^2, and size^2
regress;
lhs=price;
rhs=one, bd, ba, size, lot, cuage, q2, q3, q4, q5; hetro$ this is a linear regression with age^3
regress;
lhs=price;
rhs=one, bd, ba, size, culot, age, q2, q3, q4, q5; hetro$ this is a linear regression with lot^3
regress;
lhs=price;
rhs=one, bd, ba, cusize, lot, age, q2, q3, q4, q5; hetro$ this is a linear regression with size^3
regress;
lhs=price;
rhs=one, bd, ba, cusize, culot, cuage, q2, q3, q4, q5; hetro$ this is a linear regression with age^3,lot^3, and size^3
Appendix of Regression Results
Linear: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1861022040E+13, Std.Dev.= 123005.12383 |
| Fit: R-squared= .606423, Adjusted R-squared = .57762 |
| Model test: F[ 9, 123] = 21.06, Prob value = .00000 |
| Diagnostic: Log-L = -1742.2784, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.512, Akaike Info. Crt.= 26.350 |
| Autocorrel: Durbin-Watson Statistic = 1.64819, Rho = .17591 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -226120.6909 72598.896 -3.115 .0023
BD 37659.81050 17601.113 2.140 .0344 3.4887218
BA 122116.0065 18335.652 6.660 .0000 2.3646617
SIZE 35.09598346 12.983202 2.703 .0078 2095.3534
LOT 2.715900837 1.3582596 2.000 .0478 10698.880
AGE 1245.204431 460.88289 2.702 .0079 35.714286
Q2 33570.17097 45364.143 .740 .4607 .36842105
Q3 54453.79975 45884.666 1.187 .2376 .32330827
Q4 68688.20522 50361.193 1.364 .1751 .15789474
Q5 68402.54427 57322.227 1.193 .2350 .82706767E-01
Semi-Log: Ln(Selling price) = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = LNPRICE Mean= 12.77153525 , S.D.= .4290662382 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= 9.657298755 , Std.Dev.= .28020 |
| Fit: R-squared= .602595, Adjusted R-squared = .57352 |
| Model test: F[ 9, 123] = 20.72, Prob value = .00000 |
| Diagnostic: Log-L = -14.3136, Restricted(b=0) Log-L = -75.6798 |
| LogAmemiyaPrCrt.= -2.472, Akaike Info. Crt.= .366 |
| Autocorrel: Durbin-Watson Statistic = 1.65052, Rho = .17474 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant 11.37970888 .16537966 68.810 .0000
BD .1151867289 .40095183E-01 2.873 .0048 3.4887218
BA .2576886939 .41768456E-01 6.169 .0000 2.3646617
SIZE .7566868199E-04 .29575623E-04 2.558 .0117 2095.3534
LOT .5521456699E-05 .30941035E-05 1.785 .0768 10698.880
AGE .2349243288E-02 .10498872E-02 2.238 .0270 35.714286
Q2 .3073233877E-01 .10333913 .297 .7667 .36842105
Q3 .9723291039E-01 .10452487 .930 .3541 .32330827
Q4 .1497174823 .11472236 1.305 .1943 .15789474
Q5 .1535488191 .13057954 1.176 .2419 .82706767E-01
Log-Log: Ln(Selling price) = β0 + β1LN(BD) + β2LN(BA) + β3LN(SIZE) + β4LN(LOT) + β5LN(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = LNPRICE Mean= 12.77153525 , S.D.= .4290662382 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= 10.20204619 , Std.Dev.= .28800 |
| Fit: R-squared= .580179, Adjusted R-squared = .54946 |
| Model test: F[ 9, 123] = 18.89, Prob value = .00000 |
| Diagnostic: Log-L = -17.9627, Restricted(b=0) Log-L = -75.6798 |
| LogAmemiyaPrCrt.= -2.417, Akaike Info. Crt.= .420 |
| Autocorrel: Durbin-Watson Statistic = 1.60387, Rho = .19806 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant 8.757137166 .63677360 13.752 .0000
LNBD .2241713068 .15172804 1.477 .1421 1.2228521
LNBA .5008477393 .10998272 4.554 .0000 .81084373
LNSIZE .3844196249 .92943230E-01 4.136 .0001 7.5455685
LNLOT .6238403231E-02 .31755881E-01 .196 .8446 8.9728713
LNAGE .9113424286E-01 .36461800E-01 2.499 .0138 3.3225134
Q2 .3070924469E-01 .10623658 .289 .7730 .36842105
Q3 .8425880607E-01 .10815729 .779 .4375 .32330827
Q4 .1622575363 .11861502 1.368 .1738 .15789474
Q5 .1275544129 .13472863 .947 .3456 .82706767E-01
Age²: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5SQ(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1894530209E+13, Std.Dev.= 124107.55265 |
| Fit: R-squared= .599337, Adjusted R-squared = .57002 |
| Model test: F[ 9, 123] = 20.44, Prob value = .00000 |
| Diagnostic: Log-L = -1743.4651, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.530, Akaike Info. Crt.= 26.368 |
| Autocorrel: Durbin-Watson Statistic = 1.58509, Rho = .20745 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -198197.6762 71355.991 -2.778 .0063
BD 41632.61060 17697.361 2.352 .0202 3.4887218
BA 119922.0764 18461.383 6.496 .0000 2.3646617
SIZE 30.77013285 12.951058 2.376 .0191 2095.3534
LOT 2.819835392 1.3728617 2.054 .0421 10698.880
SQAGE 9.136372604 4.0879400 2.235 .0272 1854.7519
Q2 31407.95264 45749.284 .687 .4937 .36842105
Q3 56904.58301 46290.731 1.229 .2213 .32330827
Q4 63905.15657 50810.226 1.258 .2109 .15789474
Q5 64079.37547 57762.299 1.109 .2694 .82706767E-01
Lot²: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4SQ(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1864381533E+13, Std.Dev.= 123116.09740 |
| Fit: R-squared= .605713, Adjusted R-squared = .57686 |
| Model test: F[ 9, 123] = 21.00, Prob value = .00000 |
| Diagnostic: Log-L = -1742.3983, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.514, Akaike Info. Crt.= 26.352 |
| Autocorrel: Durbin-Watson Statistic = 1.65096, Rho = .17452 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -218881.7371 72780.583 -3.007 .0032
BD 42017.60864 17509.912 2.400 .0179 3.4887218
BA 119170.0447 18521.575 6.434 .0000 2.3646617
SIZE 36.16375705 12.928519 2.797 .0060 2095.3534
SQLOT .5893264706E-04 .30354526E-04 1.941 .0545 .18774546E+09
AGE 1243.157772 461.28091 2.695 .0080 35.714286
Q2 33332.79329 45410.126 .734 .4643 .36842105
Q3 55029.73049 45919.835 1.198 .2331 .32330827
Q4 71429.12637 50299.962 1.420 .1581 .15789474
Q5 66239.11398 57337.025 1.155 .2502 .82706767E-01
Size²: Selling price = β0 + β1(BD) + β2(BA) + β3SQ(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1940633545E+13, Std.Dev.= 125608.55264 |
| Fit: R-squared= .589586, Adjusted R-squared = .55956 |
| Model test: F[ 9, 123] = 19.63, Prob value = .00000 |
| Diagnostic: Log-L = -1745.0640, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.554, Akaike Info. Crt.= 26.392 |
| Autocorrel: Durbin-Watson Statistic = 1.59022, Rho = .20489 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -241337.0604 74193.454 -3.253 .0015
BD 49579.14015 17172.555 2.887 .0046 3.4887218
BA 135398.2084 17772.520 7.618 .0000 2.3646617
SQSIZE .1810164194E-02 .12924632E-02 1.401 .1639 5618018.3
LOT 3.151652620 1.3736058 2.294 .0235 10698.880
AGE 1117.317431 467.27311 2.391 .0183 35.714286
Q2 39357.25904 46281.101 .850 .3968 .36842105
Q3 61364.81777 46755.053 1.312 .1918 .32330827
Q4 74204.44587 51370.025 1.445 .1511 .15789474
Q5 71571.08568 58518.552 1.223 .2236 .82706767E-01
Age², Lot², Size²: Selling price = β0 + β1(BD) + β2(BA) + β3SQ(SIZE) + β4SQ(LOT) + β5SQ(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1964510934E+13, Std.Dev.= 126378.92867 |
| Fit: R-squared= .584537, Adjusted R-squared = .55414 |
| Model test: F[ 9, 123] = 19.23, Prob value = .00000 |
| Diagnostic: Log-L = -1745.8772, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.567, Akaike Info. Crt.= 26.404 |
| Autocorrel: Durbin-Watson Statistic = 1.55620, Rho = .22190 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -209053.4763 72658.334 -2.877 .0047
BD 57570.61037 17043.151 3.378 .0010 3.4887218
BA 129251.6231 17991.802 7.184 .0000 2.3646617
SQSIZE .1652037302E-02 .12929569E-02 1.278 .2038 5618018.3
SQLOT .6912860879E-04 .31054480E-04 2.226 .0278 .18774546E+09
SQAGE 8.968801650 4.1642212 2.154 .0332 1854.7519
Q2 37103.33327 46532.998 .797 .4268 .36842105
Q3 63880.95430 47029.821 1.358 .1769 .32330827
Q4 72745.10419 51545.465 1.411 .1607 .15789474
Q5 65725.66481 58752.159 1.119 .2655 .82706767E-01
Age³: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4(LOT) + β5CU(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1925036441E+13, Std.Dev.= 125102.76885 |
| Fit: R-squared= .592885, Adjusted R-squared = .56310 |
| Model test: F[ 9, 123] = 19.90, Prob value = .00000 |
| Diagnostic: Log-L = -1744.5274, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.546, Akaike Info. Crt.= 26.384 |
| Autocorrel: Durbin-Watson Statistic = 1.52685, Rho = .23657 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -188301.8461 71581.262 -2.631 .0096
BD 43262.82077 17864.936 2.422 .0169 3.4887218
BA 119268.5435 18603.691 6.411 .0000 2.3646617
SIZE 29047.71377 13050.325 2.226 .0278 2.0953534
LOT 2784.715719 1384.2298 2.012 .0464 10.698880
CUAGE 61.32949530 35.606852 1.722 .0875 126.19379
Q2 30085.44489 46106.441 .653 .5153 .36842105
Q3 57714.60912 46667.604 1.237 .2185 .32330827
Q4 61911.92839 51262.681 1.208 .2295 .15789474
Q5 60372.26504 58168.168 1.038 .3014 .82706767E-01
Lot³: Selling price = β0 + β1(BD) + β2(BA) + β3(SIZE) + β4CU(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1887007804E+13, Std.Dev.= 123860.91740 |
| Fit: R-squared= .600927, Adjusted R-squared = .57173 |
| Model test: F[ 9, 123] = 20.58, Prob value = .00000 |
| Diagnostic: Log-L = -1743.2005, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.526, Akaike Info. Crt.= 26.364 |
| Autocorrel: Durbin-Watson Statistic = 1.64620, Rho = .17690 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -220786.5652 73216.452 -3.016 .0031
BD 43211.70481 17648.554 2.448 .0158 3.4887218
BA 119742.4105 18701.854 6.403 .0000 2.3646617
SIZE 37264.69827 12975.229 2.872 .0048 2.0953534
CULOT .9963556600 .66434010 1.500 .1362 4978.6157
AGE 12253.39723 4639.4159 2.641 .0093 3.5714286
Q2 33928.23230 45686.087 .743 .4591 .36842105
Q3 55877.82472 46192.497 1.210 .2287 .32330827
Q4 74211.29656 50552.385 1.468 .1447 .15789474
Q5 64860.89912 57673.176 1.125 .2629 .82706767E-01
Size³: Selling price = β0 + β1(BD) + β2(BA) + β3CU(SIZE) + β4(LOT) + β5(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1925640795E+13, Std.Dev.= 125122.40495 |
| Fit: R-squared= .592757, Adjusted R-squared = .56296 |
| Model test: F[ 9, 123] = 19.89, Prob value = .00000 |
| Diagnostic: Log-L = -1744.5482, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.547, Akaike Info. Crt.= 26.384 |
| Autocorrel: Durbin-Watson Statistic = 1.59886, Rho = .20057 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -221337.2167 76099.097 -2.909 .0043
BD 50303.69757 16942.158 2.969 .0036 3.4887218
BA 131567.8584 18073.353 7.280 .0000 2.3646617
CUSIZE 189.8603463 110.83285 1.713 .0892 75.419611
LOT 1613.499064 1676.2397 .963 .3377 10.698880
AGE 11162.99840 4646.6675 2.402 .0178 3.5714286
Q2 39134.77136 46066.731 .850 .3972 .36842105
Q3 59027.42881 46624.808 1.266 .2079 .32330827
Q4 73710.37655 51173.359 1.440 .1523 .15789474
Q5 69016.76995 58325.435 1.183 .2390 .82706767E-01
Age³, Lot³, Size³: Selling price = β0 + β1(BD) + β2(BA) + β3CU(SIZE) + β4CU(LOT) + β5CU(AGE) + β6(Q2) + β7(Q3) + β8(Q4) + β9(Q5) + ε
+———————————————————————–+
| Ordinary least squares regression Weighting variable = none |
| Dep. var. = PRICE Mean= 387570.3008 , S.D.= 189266.5817 |
| Model size: Observations = 133, Parameters = 10, Deg.Fr.= 123 |
| Residuals: Sum of squares= .1981635825E+13, Std.Dev.= 126928.56403 |
| Fit: R-squared= .580915, Adjusted R-squared = .55025 |
| Model test: F[ 9, 123] = 18.94, Prob value = .00000 |
| Diagnostic: Log-L = -1746.4544, Restricted(b=0) Log-L = -1804.2882 |
| LogAmemiyaPrCrt.= 23.575, Akaike Info. Crt.= 26.413 |
| Autocorrel: Durbin-Watson Statistic = 1.52703, Rho = .23648 |
+———————————————————————–+
+———+————–+—————-+——–+———+———-+
|Variable | Coefficient | Standard Error |t-ratio |P[|T|>t] | Mean of X|
+———+————–+—————-+——–+———+———-+
Constant -179207.8145 74321.138 -2.411 .0174
BD 56041.05757 17411.842 3.219 .0016 3.4887218
BA 125686.3363 18235.331 6.892 .0000 2.3646617
CUSIZE 207.0734780 108.47541 1.909 .0586 75.419611
CULOT .3907830999 .80803997 .484 .6295 4978.6157
CUAGE 61.71148077 36.127071 1.708 .0901 126.19379
Q2 34966.44174 46677.926 .749 .4552 .36842105
Q3 61530.57060 47310.295 1.301 .1958 .32330827
Q4 69789.42370 51816.146 1.347 .1805 .15789474
Q5 59428.41302 58942.872 1.008 .3153 .82706767E-01