Simple linear regression is a supervised machine learning algorithm in the regression family.
It is a statistical model that represents the relationship between one independent variable (X) and one dependent variable (y).
When the data are plotted, the model is a straight line, called the best-fit line or regression line.
It is represented by the regression equation:
y = m*X + c
where m is the slope (gradient) of the line, which can be positive, negative, or zero, and c is the y-intercept of the line.
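To make the roles of m and c concrete, here is a minimal sketch that evaluates the line equation for a few inputs. The function name `predict_line` and the slope and intercept values are hypothetical, chosen only for illustration.

```python
def predict_line(x, m, c):
    """Evaluate the regression equation y = m*x + c for one x value."""
    return m * x + c

# Hypothetical slope and intercept, used only to illustrate the equation
m, c = 2.0, 1.0
for x in (0, 1, 2, 3):
    print(x, predict_line(x, m, c))

# The sign of m decides whether y rises, falls, or stays flat as x grows
rising = predict_line(3, 2.0, 1.0) - predict_line(0, 2.0, 1.0)    # positive slope
falling = predict_line(3, -2.0, 1.0) - predict_line(0, -2.0, 1.0)  # negative slope
flat = predict_line(3, 0.0, 1.0) - predict_line(0, 0.0, 1.0)       # zero slope
print(rising, falling, flat)
```

At x = 0 the function returns c, which is why c is called the y-intercept.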
Example: Suppose we have a CSV file named "homeprice.csv" with two columns, area and price. Using simple linear regression, we will predict the price for a given area. Here area is X (the independent variable) and price is y (the dependent variable).
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#Create dataframe (raw string so the backslashes in the Windows path are not read as escape sequences)
df=pd.read_csv(r"E:\dataset\homeprice.csv")
print(df)
Output:
   area   price
0  2600  550000
1  3000  565000
2  3200  610000
3  3600  680000
4  4000  725000
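If the CSV file is not at hand, the same five rows can be built directly in memory. This is a small sketch that assumes the printed output above is the complete dataset; the rest of the walkthrough works the same way with this DataFrame.

```python
import pandas as pd

# Rebuild the five rows shown in the printed output, without reading a file
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})
print(df)
```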
df.shape
Output:
(5, 2)
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   area    5 non-null      int64
 1   price   5 non-null      int64
dtypes: int64(2)
memory usage: 208.0 bytes
df.describe()
Output:
#Check missing value
df.isnull().sum()
Output:
area 0
price 0
dtype: int64
#Scatter plot
plt.scatter(df.area,df.price,marker='*',color='red')
plt.xlabel("area values")
plt.ylabel("price values")
plt.show()
#Create LinearRegression model
from sklearn.linear_model import LinearRegression
obj=LinearRegression()
#Train the model
obj.fit(df[['area']],df.price)
Output:
LinearRegression()
#Predict the value
obj.predict([[4500]])
Output:
array([791660.95890411])
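Because the model was fitted on a DataFrame with a named column, recent scikit-learn versions emit a feature-name warning when `predict` receives a plain nested list. A hedged alternative is to pass a one-row DataFrame with the same column name. The sketch below rebuilds the data in memory (an assumption, so that it runs without the CSV file); `new_area` and `predicted` are illustrative names.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same five rows as in the example, built in memory so the sketch is self-contained
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})

obj = LinearRegression()
obj.fit(df[["area"]], df["price"])

# Pass the new value as a DataFrame whose column name matches the training data
new_area = pd.DataFrame({"area": [4500]})
predicted = obj.predict(new_area)
print(predicted)
```

The printed value should match the prediction shown above, since only the input container changes, not the model.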
#m value (slope)
obj.coef_
Output:
array([135.78767123])
#c value (intercept)
obj.intercept_
Output:
180616.43835616432
#Now substitute the m and c values into the regression equation y=m*X+c
y=135.78767123*4500+180616.43835616432
print(y)
Output:
791660.9588911643
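The m and c values can also be reproduced by hand with the standard least-squares formulas for a single feature: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and c = ȳ - m·x̄. The following is a minimal sketch using NumPy on the same five rows; the variable names are illustrative.

```python
import numpy as np

# The five (area, price) pairs from the example
area = np.array([2600, 3000, 3200, 3600, 4000], dtype=float)
price = np.array([550000, 565000, 610000, 680000, 725000], dtype=float)

x_mean, y_mean = area.mean(), price.mean()

# Slope: covariance of area and price divided by the variance of area
m = np.sum((area - x_mean) * (price - y_mean)) / np.sum((area - x_mean) ** 2)
# Intercept: forces the line through the point of means (x_mean, y_mean)
c = y_mean - m * x_mean

print(m, c)
print(m * 4500 + c)  # prediction for an area of 4500
```

The hand-computed y above differs from the model's prediction in the fifth decimal place only because the slope was typed in rounded to eight decimals; using the unrounded m removes that difference.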
#Plot the best-fit line (regression line)
plt.scatter(df.area,df.price,marker='*',color='red')
plt.plot(df.area,obj.predict(df[['area']]),color='blue')
plt.xlabel("area values")
plt.ylabel("price values")
plt.show()