Sunday, December 8, 2019

Machine Learning Day 1

In [1]:
# we will start with supervised learning, beginning with regression.
# Simple linear regression and multiple regression will be covered
In [2]:
#Simple linear regression
In [3]:
#y = mx + b, where b is the intercept and m is the slope. In ML notation the formula would be F(x1) = w0 + w1*x1
In [4]:
#multiple regression
In [5]:
#F(x1) = w0 + w1*x1, so F(x1, x2, ..., xn) = w0 + w1*x1 + w2*x2 + ... + wn*xn
In [6]:
#we need values for the weights w in the above example; the algorithm learns them from the data
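To make the formula concrete, here is a tiny sketch with made-up weights (not learned from any data, purely for illustration):

```python
import numpy as np

# Hypothetical weights and features, just to show the formula in action
w0 = 2.0                          # intercept
w = np.array([1.5, -0.5, 3.0])    # w1..w3
x = np.array([4.0, 2.0, 1.0])     # x1..x3

# F(x1..xn) = w0 + w1*x1 + ... + wn*xn, written as a dot product
y = w0 + np.dot(w, x)
print(y)  # 2.0 + 6.0 - 1.0 + 3.0 = 10.0
```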
In [7]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
In [8]:
from sklearn import datasets,linear_model
In [9]:
diabetes=datasets.load_diabetes()
print(diabetes.DESCR)
.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - Age
      - Sex
      - Body mass index
      - Average blood pressure
      - S1
      - S2
      - S3
      - S4
      - S5
      - S6

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
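The note in DESCR about feature scaling can be reproduced on a toy column: mean-center it, then divide by the standard deviation times the square root of n_samples, which makes the column's sum of squares equal 1 (a sketch, not the library's internal code):

```python
import numpy as np

# Toy column standing in for one diabetes feature
col = np.array([1.0, 2.0, 3.0, 4.0])
n = len(col)

# Mean-center, then scale so the sum of squares totals 1
scaled = (col - col.mean()) / (col.std() * np.sqrt(n))
print(np.sum(scaled ** 2))  # 1.0
```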
In [10]:
from sklearn.metrics import mean_squared_error
In [16]:
diabetes_X=diabetes.data[:,np.newaxis,2]
In [ ]:
#np.newaxis reshapes the data into the 2-D (n_samples, 1) form that sklearn expects
In [17]:
diabetes_X_test=diabetes_X[-30:]
diabetes_X_training=diabetes_X[:-30]
In [ ]:
#[-30:] means the last 30 rows (held out as the test set); [:-30] is everything before them (the training set)
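A quick slicing sketch on a small array makes the split easier to see:

```python
import numpy as np

a = np.arange(10)     # [0 1 2 3 4 5 6 7 8 9]
print(a[-3:])         # last 3 elements: [7 8 9]   (like the test set)
print(a[:-3])         # all but the last 3: [0 1 2 3 4 5 6]   (like the training set)
```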
In [18]:
diabetes_y_test=diabetes.target[-30:]
diabetes_y_training=diabetes.target[:-30]
In [ ]:
#y is the target column
In [19]:
model=linear_model.LinearRegression()
In [20]:
model.fit(diabetes_X_training,diabetes_y_training)
Out[20]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [21]:
diabetes_y_predict=model.predict(diabetes_X_test)
In [22]:
plt.scatter(diabetes_X_training,diabetes_y_training)
plt.show()
In [23]:
plt.scatter(diabetes_X_training,diabetes_y_training)
plt.plot(diabetes_X_test,diabetes_y_predict)
plt.show()
In [25]:
print("coef",model.coef_)
coef [941.43097333]
In [26]:
print("Intercept=",model.intercept_)
Intercept= 153.39713623331698
In [27]:
print("Mean Squared Error is ",mean_squared_error(diabetes_y_test,diabetes_y_predict))
Mean Squared Error is  3035.0601152912695
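mean_squared_error is just the mean of the squared residuals; a hand computation on hypothetical numbers (not the diabetes predictions) shows the formula:

```python
import numpy as np

# Small made-up arrays to illustrate MSE = mean((y_true - y_pred)^2)
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (1 + 0 + 4) / 3 ≈ 1.6667
```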
In [28]:
type(diabetes)
Out[28]:
sklearn.utils.Bunch