EC Data Mining Handbook

# Logistic regression

Logistic regression is used when the dependent variable is discrete, for example 0 or 1, or true or false.
In this binary case the target variable can take only two values, and a sigmoid curve describes the relation between the independent variables and the probability of the target outcome.
Logistic regression uses the logit function to relate the target variable to the independent variables. The equation below defines the model.

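$$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$$

where $p$ is the probability that the event of interest occurs (the target variable equals 1), $X_1, \dots, X_k$ are the independent variables, and $b_0, \dots, b_k$ are the model coefficients.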
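To make the link between the logit and the sigmoid curve concrete, here is a minimal sketch using only the Python standard library. The function names and the coefficient values are illustrative assumptions, not taken from any particular tool.

```python
import math

def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Rearranged form avoids overflow in exp() for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(intercept, coefficients, features):
    """Compute b0 + b1*X1 + ... + bk*Xk, then convert the log-odds to p."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)

# Made-up coefficients for two independent variables (assumption).
p = predict_probability(0.1, [0.8, -0.5], [2.0, 1.0])
print(round(p, 4), round(logit(p), 4))
```

Applying `logit` to the predicted probability recovers the linear combination of the inputs, which is why the coefficients are read on the log-odds scale.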

Logistic regression is a suitable regression technique when the dataset is large and the values of the target variable occur with roughly equal frequency. There should also be little or no multicollinearity, meaning that the independent variables in the dataset should not be strongly correlated with one another.
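The two conditions above can be checked before fitting. The sketch below is one possible approach, not a prescribed procedure: it reports the share of each target value and flags pairs of independent variables whose Pearson correlation exceeds a threshold. The column names, the toy data, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def class_proportions(target):
    """Share of each distinct value in the target variable."""
    counts = Counter(target)
    return {label: n / len(target) for label, n in counts.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

# Toy data (assumption): years_employed is constructed as age minus 23,
# so the two columns are perfectly correlated.
target = [0, 1, 0, 1, 1, 0, 1, 0]
columns = {
    "age": [25, 32, 47, 51, 38, 29, 60, 44],
    "years_employed": [2, 9, 24, 28, 15, 6, 37, 21],
    "commute_km": [12, 3, 8, 15, 1, 9, 4, 11],
}

print(class_proportions(target))
print(correlated_pairs(columns))
```

Pairwise correlation only catches collinearity between two variables at a time; a variable that is a combination of several others would need a different check, such as a variance inflation factor.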

## Usage
Consider a group of people who share similar demographic information and who buy products from the Adventure Works company. By modeling the data to relate to a specific outcome, such as purchase of a target product, you can see how the demographic information contributes to someone's likelihood of buying the target product.
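As a rough illustration of this scenario, the sketch below fits a logistic regression with plain gradient descent on a tiny invented dataset. The feature names (`income_100k`, `has_children`), the records, the learning rate, and the iteration count are all assumptions; this is not the Adventure Works data and not the training procedure of any specific product.

```python
import math

def sigmoid(z):
    """Map log-odds z to a probability, avoiding overflow for negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(bias, weights, row):
    """Predicted probability of purchase for one record."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

def log_loss(bias, weights, rows, labels):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = predict(bias, weights, row)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)

def fit_logistic(rows, labels, learning_rate=0.5, iterations=5000):
    """Batch gradient descent on the log loss, starting from zero weights."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(iterations):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            error = predict(bias, weights, row) - y
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return bias, weights

# Invented records: (income in units of 100,000, has_children as 0 or 1).
rows = [(0.30, 0), (0.35, 1), (0.40, 1), (0.45, 1), (0.50, 0), (0.55, 0),
        (0.60, 1), (0.70, 0), (0.75, 0), (0.80, 1), (0.90, 0)]
bought = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1]  # 1 = bought the target product

bias, weights = fit_logistic(rows, bought)
print("income coefficient:", weights[0])
print("children coefficient:", weights[1])
print("P(buy | income 0.9, no children):", predict(bias, weights, (0.9, 0)))
```

Each coefficient is on the log-odds scale, so `math.exp(weights[0])` gives the factor by which the odds of buying change for a one-unit increase in that input, holding the other input fixed.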

The requirements for a logistic regression model are as follows:

- **A single key column:** Each model must contain one numeric or text column that uniquely identifies each record. Compound keys are not allowed.
- **Input columns:** Each model must contain at least one input column that contains the values that are used as factors in analysis. You can have as many input columns as you want, but depending on the number of values in each column, the addition of extra columns can increase the time it takes to train the model.
- **At least one predictable column:** The model must contain at least one predictable column of any data type, including continuous numeric data. The values of the predictable column can also be treated as inputs to the model, or you can specify that it be used for prediction only. Nested tables are not allowed for predictable columns, but can be used as inputs.
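The column requirements listed above can be expressed as a simple pre-flight check. The sketch below is a plain Python stand-in written for illustration; `check_model_columns` and the sample records are hypothetical and do not correspond to the API of any modeling tool.

```python
def check_model_columns(records, key_column, input_columns, predictable_columns):
    """Return a list of problems; an empty list means the layout looks valid.

    records is a list of dicts, one per row. All names here are hypothetical.
    """
    problems = []

    # A single key column: one column name, not a tuple or list of names.
    if not isinstance(key_column, str):
        problems.append("exactly one key column is required; compound keys are not allowed")
    else:
        keys = [record[key_column] for record in records]
        if len(set(keys)) != len(keys):
            problems.append("key column '%s' does not uniquely identify each record" % key_column)

    # At least one input column.
    if not input_columns:
        problems.append("at least one input column is required")

    # At least one predictable column (it may also appear among the inputs).
    if not predictable_columns:
        problems.append("at least one predictable column is required")

    return problems

# Hypothetical customer records.
customers = [
    {"customer_id": 1, "age": 34, "income": 52000, "bought_bike": 1},
    {"customer_id": 2, "age": 51, "income": 78000, "bought_bike": 0},
    {"customer_id": 3, "age": 27, "income": 41000, "bought_bike": 1},
]

print(check_model_columns(customers, "customer_id", ["age", "income"], ["bought_bike"]))
```

Returning a list of messages rather than raising on the first failure lets all layout problems be reported in one pass.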