EC Data MIning Handbok
Share
Explore
Regression

# Linear RegressionLinear Regression

Linear regression is one of the most basic types of regression in machine learning. The linear regression model consists of a predictor variable and a dependent variable related linearly to each other. In case the data involves more than one independent variable, then linear regression is called multiple linear regression models.
The below-given equation is used to denote the linear regression model:
y=mx+c+e
where m is the slope of the line, c is an intercept, and e represents the error in the model.
The best fit line is determined by varying the values of m and c. The predictor error is the difference between the observed values and the predicted value. The values of m and c get selected in such a way that it gives the minimum predictor error. It is important to note that a simple linear regression model is susceptible to outliers. Therefore, it should not be used in case of big size data.
Usage:
You can use linear regression to determine a relationship between two continuous columns. For example, you can use linear regression to compute a trend line from manufacturing or sales data. You could also use the linear regression as a precursor to development of more complex data mining models, to assess the relationships among data columns.

Although there are many ways to compute linear regression that do not require data mining tools, the advantage of using the Microsoft Linear Regression algorithm for this task is that all the possible relationships among the variables are automatically computed and tested.
You do not have to select a computation method, such as solving for least squares. However, linear regression might oversimplify the relationships in scenarios where multiple factors affect the outcome.

The requirements for this model type are as follows:
A single key column Each model must contain one numeric or text column that uniquely identifies each record. Compound keys are not permitted.
A predictable column Requires at least one predictable column. You can include multiple predictable attributes in a model, but the predictable attributes must be continuous numeric data types. You cannot use a datetime data type as a predictable attribute even if the native storage for the data is numeric.
Input columns Input columns must contain continuous numeric data and be assigned the appropriate data type.