MultilinearRegression
Multilinear regression is a statistical technique used to model the relationship between a dependent variable and two or more independent variables. Unlike simple linear regression, which considers only one independent variable, multilinear regression accounts for the influence of multiple factors simultaneously on the target variable.
The fundamental assumption is that there's a linear relationship between the dependent variable and a combination of the independent variables. This relationship is represented by a mathematical equation where the dependent variable (y) is expressed as a weighted sum of the independent variables (x1, x2, … xn), plus an intercept term or constant.
# y = b0 + b1*x1 + b2*x2 + ... + bn*xn
The model aims to find the best set of weights (coefficients) that minimize the difference between predicted and observed values of the target.
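As an illustration of what "finding the best weights" means, the sketch below fits the equation above by ordinary least squares, which minimizes the sum of squared differences between predicted and observed values. The data is synthetic and the names X_design and b are hypothetical; it assumes only that NumPy is installed.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 3*x1 - 1*x2, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

# Prepend a column of ones so the first weight plays the role of the intercept b0.
X_design = np.column_stack([np.ones(len(X)), X])

# Least squares: pick b to minimize the sum of squared residuals.
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(b, 3))  # b0, b1, b2
```

Because the synthetic data contains no noise, the recovered weights should match the values used to generate it.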
This module covers the following steps:
➤ 1. Import Data
➤ 2. Coefficient of determination
➤ 3. Predict and Test the model
➤ 4. Compare the actual and predicted values
➤ 5. Actual vs. Predicted Graph
➤ 6. Metrics
➤ 7. Predicting Close Price
➤ 8. Strengths and Weaknesses
↪ 1. Import Data
This module demonstrates how to predict a stock's closing price using the previous day's closing price, the current day's opening price, and the total trading volume. The code begins by importing pre-processed data, which is then divided into training and testing sets.
# Import required libraries.
import pandas as pd
# Matplotlib is the fundamental plotting library.
import matplotlib.pyplot as plt
# Seaborn builds upon Matplotlib, offering a
# higher-level interface for statistical visualization.
import seaborn as sns
import numpy as np

# Set default style and color scheme for Seaborn plots.
sns.set(style="ticks", color_codes=True)

# Import data
data = pd.read_csv('https://raw.githubusercontent.com/csxplore/data/main/andromeda-cleaned.csv', header=0)
X = data[['Prev Close', 'Open Price', 'Total Traded Quantity']]
y = data[['Close Price']]

# Split the data into training and test sets.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
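To see what the split produces, the sketch below runs train_test_split() on a small hypothetical DataFrame that reuses the column names above. The ten rows are made up; only the shapes matter here. With test_size=0.2, two of the ten rows go to the test set, and rows are shuffled by default before splitting.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the stock data: 10 rows, same column names.
demo = pd.DataFrame({
    'Prev Close': np.arange(10, dtype=float),
    'Open Price': np.arange(10, dtype=float) + 0.5,
    'Total Traded Quantity': np.arange(10) * 1000,
    'Close Price': np.arange(10, dtype=float) + 1.0,
})
X_demo = demo[['Prev Close', 'Open Price', 'Total Traded Quantity']]
y_demo = demo[['Close Price']]

# 80% of the rows go to training, 20% to testing.
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
print(Xtr.shape, Xte.shape)

# Features and targets stay aligned: the same rows land in Xtr and ytr.
print(list(Xtr.index) == list(ytr.index))
```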
↪ 2. Coefficient of determination
The coefficient of determination, commonly known as R-squared (R²), is a statistical measure used in linear regression. It quantifies the proportion of variance in the dependent variable (y) that is predictable from the independent variables (x1, x2, … xn), indicating the model's goodness of fit.
from sklearn.linear_model import LinearRegression  # for linear regression model

lm = LinearRegression()
lm.fit(X_train, y_train)
print("Slope: ", lm.coef_)
print("Intercept: ", lm.intercept_)
---Output---
# Slope: [[1.46742456e-02 9.63624960e-01 4.91555129e-07]]
# Intercept: [30.90481917]
The score() method returns the model's coefficient of determination.
print('Coefficient of determination: ', lm.score(X_train, y_train))
---Output---
# Coefficient of determination: 0.9835526790784476
On the training set, the model explains about 98.36% of the variance in the dependent variable 'Close Price', indicating a strong relationship between 'Prev Close', 'Open Price', 'Total Traded Quantity' and 'Close Price'.
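The value returned by score() follows the standard definition R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the observations from their mean. A minimal sketch of that calculation on made-up numbers (the function name r_squared is hypothetical):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Made-up values: SS_res = 0.5, SS_tot = 20, so R² = 1 - 0.5/20 = 0.975.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
print(r_squared(y_true, y_pred))
```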
↪ 3. Predict and Test the model
predictions = lm.predict(X_test)
The code below calculates and prints the R-squared (R²) score, also known as the coefficient of determination, for a regression model's predictions. It starts by importing the r2_score() function from the sklearn.metrics module. This function is specifically designed to evaluate the performance of regression models. The code then calls the r2_score() function, passing in two arguments: 1) y_test – the actual or true values of the dependent variable from the test dataset, and 2) predictions – the predicted values of the dependent variable generated by the trained regression model.
from sklearn.metrics import r2_score

print('Coefficient of determination: ', r2_score(y_test, predictions))
---Output---
# Coefficient of determination: 0.9718303109010307
The output shows an R² score of approximately 0.9718 on the test set. The best possible score is 1, which indicates a perfect fit. A score of 0 corresponds to always predicting the mean of the target, and the score can be negative on held-out data when the model does worse than that. Here, 0.9718 means the model explains about 97.18% of the variance in the target variable, indicating a strong fit between the model's predictions and the actual values in the test set.
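Three reference points help when reading an R² value. The sketch below evaluates r2_score() on a tiny made-up target for predictions that match exactly, predictions that always equal the mean, and predictions that are worse than the mean.

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]

perfect = r2_score(y_true, [1.0, 2.0, 3.0])   # predictions match exactly
baseline = r2_score(y_true, [2.0, 2.0, 2.0])  # always predict the mean of y_true
worse = r2_score(y_true, [3.0, 2.0, 1.0])     # predictions in reverse order

print(perfect, baseline, worse)
```

The three calls give 1, 0, and a negative value respectively, so a score close to 1 on held-out data indicates that the model does far better than the mean-only baseline.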
↪ 4. Compare the actual and predicted values
y_pred = pd.DataFrame(predictions, columns=['Pred'])
dframe = pd.concat([y_test.reset_index(drop=True).astype(float), y_pred], axis=1)
dframe.columns = ['Actual', 'Predicted']
graph = dframe.head(10)
print(graph)
---Output---
#     Actual    Predicted
# 0  1671.80  1661.674921
# 1  1617.65  1627.239377
# 2  1409.80  1405.884155
# 3  1600.65  1597.242314
# 4  1568.30  1600.170675
# 5  1644.10  1635.130887
# 6  1564.35  1575.210479
# 7  1389.55  1409.412835
# 8  1486.10  1448.353305
# 9  1590.90  1568.568980
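One way to make the comparison easier to read is to add the error per row. The sketch below is an optional extension, not part of the original module: it copies the first three rows of the output above into a small DataFrame and adds two hypothetical columns, 'Error' and 'Abs % Error'.

```python
import pandas as pd

# First three rows copied from the printed comparison.
sample = pd.DataFrame({
    'Actual': [1671.80, 1617.65, 1409.80],
    'Predicted': [1661.674921, 1627.239377, 1405.884155],
})

# Positive error: the model predicted too low; negative: too high.
sample['Error'] = sample['Actual'] - sample['Predicted']

# Size of the error relative to the actual price, in percent.
sample['Abs % Error'] = sample['Error'].abs() / sample['Actual'] * 100

print(sample.round(2))
```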
↪ 5. Actual vs. Predicted Graph
graph.plot(kind='bar')
plt.title('Actual vs Predicted')
plt.ylabel('Closing price')
plt.show()

↪ 6. Metrics
Which error metric is appropriate depends on the type of problem being solved. For regression, the following are typically used:
Mean Absolute Error (MAE)
Mean Absolute Error (MAE) is the mean of the absolute differences between the predicted and actual values. It indicates how large an error to expect, on average, in a prediction, expressed in the same units as the target.
Mean Squared Error (MSE)
Mean Squared Error (MSE), also called Mean Squared Deviation (MSD), is the average squared difference between the actual and predicted values, that is, the average squared residual.
Squaring removes the sign of each difference, so the MSE is never negative. Squaring also amplifies large differences, so the MSE penalizes large errors disproportionately more than small ones.
Variance is the average squared deviation of the observations from the mean. The MSE in contrast is the average of squared deviations of the predictions from the actual values (residuals).
Root Mean Square Error (RMSE)
The Root Mean Square Error (RMSE) is the square root of the MSE, which expresses the typical size of the prediction error in the same units as the target. It can be read as the standard deviation of the residuals, where a residual is the difference between an observed value and the model's prediction for it.
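The three definitions can be checked by hand on a few made-up values. The sketch below uses NumPy only, and the arrays actual and predicted are hypothetical. One prediction is off by 10 while the others are off by at most 2, which shows how a single large error affects MSE and RMSE more than MAE.

```python
import numpy as np

# Made-up prices for illustration.
actual = np.array([100.0, 102.0, 98.0, 105.0])
predicted = np.array([101.0, 100.0, 98.0, 115.0])

residuals = actual - predicted      # [-1, 2, 0, -10]

mae = np.mean(np.abs(residuals))    # (1 + 2 + 0 + 10) / 4 = 3.25
mse = np.mean(residuals ** 2)       # (1 + 4 + 0 + 100) / 4 = 26.25
rmse = np.sqrt(mse)                 # square root of the MSE, about 5.12

print(mae, mse, rmse)
```

The single error of 10 contributes 100 of the 105 total squared error, so the RMSE (about 5.12) ends up well above the MAE (3.25).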
from sklearn import metrics
import math

print('Mean Absolute Error: ', metrics.mean_absolute_error(y_test.astype(float), y_pred))
print('Mean Squared Error: ', metrics.mean_squared_error(y_test.astype(float), y_pred))
print('Root Mean Squared Error: ', math.sqrt(metrics.mean_squared_error(y_test.astype(float), y_pred)))
---Output---
# Mean Absolute Error: 13.217473604160523
# Mean Squared Error: 299.3529767486309
# Root Mean Squared Error: 17.3018200415052
↪ 7. Predicting Close Price
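# Values must follow the training column order:
# Prev Close, Open Price, Total Traded Quantity.
Predict_Close_Price = lm.predict([[1508.80, 1440, 5000000]])
print("Predicted Value: ", Predict_Close_Price)
---Output---
# Predicted Value: [[1443.12303828]]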
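When a model has been fitted on a DataFrame, recent scikit-learn versions typically warn if predict() receives a plain list without feature names. Passing a one-row DataFrame with the same column names avoids the warning and makes the meaning of each value explicit. The sketch below shows the pattern on hypothetical synthetic data in which the close price is constructed as the average of 'Prev Close' and 'Open Price'. That relationship is an assumption for illustration, not a property of the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

cols = ['Prev Close', 'Open Price', 'Total Traded Quantity']

# Hypothetical training data standing in for the stock dataset.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'Prev Close': rng.uniform(1000, 2000, 30),
    'Open Price': rng.uniform(1000, 2000, 30),
    'Total Traded Quantity': rng.uniform(1e6, 1e7, 30),
})
# Assumed relationship for the sketch: close = average of prev close and open.
y_demo = pd.DataFrame({'Close Price': 0.5 * X_demo['Prev Close'] + 0.5 * X_demo['Open Price']})

model = LinearRegression().fit(X_demo, y_demo)

# Name the inputs so each value is matched to the intended feature.
new_row = pd.DataFrame([[1508.80, 1440.0, 5_000_000]], columns=cols)
pred = model.predict(new_row)
print(pred)
```

Because the target was passed as a one-column DataFrame, the prediction comes back as a 2-D array with one row and one column, matching the shape of the output above.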
↪ 8. Strengths and Weaknesses
Strengths
Weaknesses