Sunday, December 8, 2019

Machine learning Day2


In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
In [2]:
from sklearn import datasets,linear_model
In [3]:
diabetes=datasets.load_diabetes()
print(diabetes.DESCR)
.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - Age
      - Sex
      - Body mass index
      - Average blood pressure
      - S1
      - S2
      - S3
      - S4
      - S5
      - S6

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
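The note above says each column ends up with a sum of squares of 1. That only works out if the divisor is the standard deviation times the square root of `n_samples`, which equals the Euclidean norm of the centered column. Below is a minimal NumPy sketch on made-up random data of the same shape. It is not the code scikit-learn used to prepare the dataset.

```python
import numpy as np

def scale_like_diabetes(X):
    """Center each column, then divide by std * sqrt(n_samples) (population std, ddof=0)."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    centered = X - X.mean(axis=0)
    # std (ddof=0) times sqrt(n) equals the L2 norm of each centered column
    scale = centered.std(axis=0) * np.sqrt(n_samples)
    return centered / scale

rng = np.random.default_rng(0)
raw = rng.normal(loc=50.0, scale=10.0, size=(442, 10))  # made-up data, same shape as diabetes.data
scaled = scale_like_diabetes(raw)

print(np.round(scaled.mean(axis=0), 12))        # each column mean is ~0
print(np.round((scaled ** 2).sum(axis=0), 12))  # each column sum of squares is ~1
```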
In [ ]:
# There are 10 features; from these 10 features the model predicts one output,
# a measure of how far the patient's diabetes has progressed one year after baseline.
# That output (column 11 in the description) is the target, stored in diabetes.target.
In [4]:
from sklearn.metrics import mean_squared_error
In [5]:
diabetes_X=diabetes.data
In [6]:
diabetes_X_test=diabetes_X[-30:]
diabetes_X_training=diabetes_X[:-30]
In [7]:
diabetes_y_test=diabetes.target[-30:]
diabetes_y_training=diabetes.target[:-30]
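The two cells above hold out the last 30 patients as a test set using negative slicing. `[-30:]` takes the final 30 rows, and `[:-30]` takes everything before them. The two sets therefore never overlap and together cover all 442 rows. Below is a small sketch with made-up arrays of the same shape. `holdout_split` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def holdout_split(X, y, n_test):
    """Return (X_train, X_test, y_train, y_test), using the last n_test rows as the test set."""
    if not 0 < n_test < len(X):
        raise ValueError("n_test must be between 1 and len(X) - 1")
    # One n_test for both X and y keeps features and targets aligned row by row.
    return X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]

X = np.arange(442 * 10).reshape(442, 10)  # stand-in for diabetes.data
y = np.arange(442)                         # stand-in for diabetes.target
X_train, X_test, y_train, y_test = holdout_split(X, y, n_test=30)
print(X_train.shape, X_test.shape)  # (412, 10) (30, 10)
```

A tail split like this assumes the row order carries no meaning. scikit-learn's `train_test_split` shuffles by default instead.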
In [8]:
model=linear_model.LinearRegression()
In [9]:
model.fit(diabetes_X_training,diabetes_y_training)
Out[9]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
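`LinearRegression` fits ordinary least squares. With `fit_intercept=True` (as shown in the output), it chooses the coefficients w and intercept b that minimise the sum of squared residuals, Σ(yᵢ − (xᵢ·w + b))². The same fit can be sketched with NumPy's least-squares solver by appending a column of ones to play the role of the intercept. This sketch illustrates the objective on synthetic data. `ols_fit` is a hypothetical helper and does not reflect scikit-learn's internal code path.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (coef, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the last solved weight acts as the intercept.
    X_design = np.column_stack([X, np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return weights[:-1], weights[-1]

# Synthetic, noise-free data: y = 2*x0 - 3*x1 + 5
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 5
coef, intercept = ols_fit(X, y)
print(np.round(coef, 6), round(intercept, 6))  # recovers roughly [2, -3] and 5
```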
In [10]:
diabetes_y_predict=model.predict(diabetes_X_test)
In [11]:
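# Disabled: plt.scatter needs one x value per y value, but each row of diabetes_X_training has 10 features.
#plt.scatter(diabetes_X_training,diabetes_y_training)
#plt.show()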
In [12]:
#plt.scatter(diabetes_X_training,diabetes_y_training)
#plt.plot(diabetes_X_test,diabetes_y_predict)
#plt.show()
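Restricting the regression to a single feature is what makes a scatter plot with a fitted line meaningful. The sketch below uses synthetic one-feature data as a stand-in for a single column, for example `diabetes_X[:, 2]` (body mass index, the third attribute in the description). The slope, intercept, and noise level are made up.

```python
import numpy as np

# Synthetic one-feature data standing in for a single column such as diabetes_X[:, 2]
rng = np.random.default_rng(2)
x_train = rng.uniform(-0.1, 0.1, size=50)
y_train = 900 * x_train + 150 + rng.normal(scale=5, size=50)

# A degree-1 polynomial fit is simple linear regression; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x_train, y_train, deg=1)

x_test = np.linspace(-0.1, 0.1, 5)
y_pred = slope * x_test + intercept
# With matplotlib: plt.scatter(x_train, y_train); plt.plot(x_test, y_pred); plt.show()
print(round(slope, 1), round(intercept, 1))
```

`np.polyfit` with `deg=1` solves the same least-squares problem as `LinearRegression` restricted to one feature.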
In [13]:
print("coef",model.coef_)
coef [  -1.16924976 -237.18461486  518.30606657  309.04865826 -763.14121622
  458.90999325   80.62441437  174.32183366  721.49712065   79.19307944]
In [14]:
print("Intercept=",model.intercept_)
Intercept= 153.05827988224112
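After fitting, `predict` computes a weighted sum: each prediction equals the feature row dotted with `coef_`, plus `intercept_`. The sketch below reuses the coefficients printed above, but the input rows are made up. `linear_predict` is a hypothetical helper that reproduces the arithmetic rather than calling scikit-learn.

```python
import numpy as np

def linear_predict(X, coef, intercept):
    """Same arithmetic as LinearRegression.predict for a single-output model: X @ coef + intercept."""
    return np.asarray(X, dtype=float) @ np.asarray(coef, dtype=float) + intercept

# Coefficients and intercept copied from the printed model above
coef = np.array([-1.16924976, -237.18461486, 518.30606657, 309.04865826, -763.14121622,
                 458.90999325, 80.62441437, 174.32183366, 721.49712065, 79.19307944])
intercept = 153.05827988224112

# A made-up patient row with every (mean-centered) feature at 0
x_at_mean = np.zeros((1, 10))
print(linear_predict(x_at_mean, coef, intercept))  # equals the intercept
```

Because the diabetes features are mean-centered, the intercept is close to the prediction for an average patient.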
In [15]:
print("Mean Squared Error is ",mean_squared_error(diabetes_y_test,diabetes_y_predict))
Mean Squared Error is  1826.5364191345423
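Mean squared error averages the squared differences between the true and predicted targets: MSE = (1/n) Σ(yᵢ − ŷᵢ)². Because MSE is in squared units, its square root (RMSE) is easier to read. Here √1826.54 ≈ 42.7, a typical error of about 42.7 progression units on the 30 test patients. Below is a dependency-free sketch of the formula. `mean_squared_error_manual` is a hypothetical name.

```python
import math

def mean_squared_error_manual(y_true, y_pred):
    """Average of squared residuals; mirrors sklearn.metrics.mean_squared_error for 1-D inputs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mse = mean_squared_error_manual(y_true, y_pred)
print(mse)             # 0.375
print(math.sqrt(mse))  # RMSE, in the target's units
```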
In [16]:
type(diabetes)
Out[16]:
sklearn.utils.Bunch
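`type(diabetes)` shows that `load_diabetes()` returns a `Bunch`, a dictionary whose keys can also be read as attributes. That is why both `diabetes.data` and `diabetes['data']` work. Below is a toy stand-in that shows the idea. `AttrDict` is a hypothetical name, not scikit-learn's implementation.

```python
class AttrDict(dict):
    """Toy dict whose keys are also readable and writable as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # Raise AttributeError so hasattr() and getattr(..., default) behave normally
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        self[key] = value

bunch = AttrDict(data=[[0.1, 0.2]], target=[151.0], DESCR="toy dataset")
print(bunch.target, bunch["target"])  # [151.0] [151.0]
print(sorted(bunch.keys()))           # ['DESCR', 'data', 'target']
```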
In [17]:
df=pd.read_csv('USA_Housing.csv')
In [18]:
df
Out[18]:
      Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price  Address
   0       79545.45857             5.682861                   7.009188                          4.09      23086.80050  1.059034e+06  208 Michael Ferry Apt. 674\nLaurabury, NE 3701...
   1       79248.64245             6.002900                   6.730821                          3.09      40173.07217  1.505891e+06  188 Johnson Views Suite 079\nLake Kathleen, CA...
   2       61287.06718             5.865890                   8.512727                          5.13      36882.15940  1.058988e+06  9127 Elizabeth Stravenue\nDanieltown, WI 06482...
   3       63345.24005             7.188236                   5.586729                          3.26      34310.24283  1.260617e+06  USS Barnett\nFPO AP 44820
   4       59982.19723             5.040555                   7.839388                          4.23      26354.10947  6.309435e+05  USNS Raymond\nFPO AE 09386
   5       80175.75416             4.988408                   6.104512                          4.04      26748.42842  1.068138e+06  06039 Jennifer Islands Apt. 443\nTracyport, KS...
   6       64698.46343             6.025336                   8.147760                          3.41      60828.24909  1.502056e+06  4759 Daniel Shoals Suite 442\nNguyenburgh, CO ...
   7       78394.33928             6.989780                   6.620478                          2.42      36516.35897  1.573937e+06  972 Joyce Viaduct\nLake William, TN 17778-6483
   8       59927.66081             5.362126                   6.393121                          2.30      29387.39600  7.988695e+05  USS Gilbert\nFPO AA 20957
   9       81885.92718             4.423672                   8.167688                          6.10      40149.96575  1.545155e+06  Unit 9446 Box 0958\nDPO AE 97025
  10       80527.47208             8.093513                   5.042747                          4.10      47224.35984  1.707046e+06  6368 John Motorway Suite 700\nJanetbury, NM 26854
  11       50593.69550             4.496513                   7.467627                          4.49      34343.99189  6.637324e+05  911 Castillo Park Apt. 717\nDavisborough, PW 7...
  12       39033.80924             7.671755                   7.250029                          3.10      39220.36147  1.042814e+06  209 Natasha Stream Suite 961\nHuffmanland, NE ...
  13       73163.66344             6.919535                   5.993188                          2.27      32326.12314  1.291332e+06  829 Welch Track Apt. 992\nNorth John, AR 26532...
  14       69391.38018             5.344776                   8.406418                          4.37      35521.29403  1.402818e+06  PSC 5330, Box 4420\nAPO AP 08302
  15       73091.86675             5.443156                   8.517513                          4.01      23929.52405  1.306675e+06  2278 Shannon View\nNorth Carriemouth, NM 84617
  16       79706.96306             5.067890                   8.219771                          3.12      39717.81358  1.556787e+06  064 Hayley Unions\nNicholsborough, HI 44161-1887
  17       61929.07702             4.788550                   5.097010                          4.30      24595.90150  5.284852e+05  5498 Rachel Locks\nNew Gregoryshire, PW 54755
  18       63508.19430             5.947165                   7.187774                          5.12      35719.65305  1.019426e+06  Unit 7424 Box 2786\nDPO AE 71255
  19       62085.27640             5.739411                   7.091808                          5.49      44922.10670  1.030591e+06  19696 Benjamin Cape\nStephentown, ME 36952-4733
  20       86294.99909             6.627457                   8.011898                          4.07      47560.77534  2.146925e+06  030 Larry Park Suite 665\nThomashaven, HI 8794...
  21       60835.08998             5.551222                   6.517175                          2.10      45574.74166  9.292476e+05  USNS Brown\nFPO AP 85833
  22       64490.65027             4.210323                   5.478088                          4.31      40358.96011  7.188872e+05  95198 Ortiz Key\nPort Sara, TN 24541-2855
  23       60697.35154             6.170484                   7.150537                          6.34      28140.96709  7.439998e+05  9003 Jay Plains Suite 838\nLake Elizabeth, IN ...
  24       59748.85549             5.339340                   7.748682                          4.23      27809.98654  8.957371e+05  24282 Paul Valley\nWest Perry, MI 03169-5806
  25       56974.47654             8.287562                   7.312880                          4.33      40694.86951  1.453975e+06  61938 Brady Falls\nLewisfort, DE 61227
  26       82173.62608             4.018525                   6.992699                          2.03      38853.91807  1.125693e+06  3599 Ramirez Springs\nJacksonhaven, AZ 72798
  27       64626.88098             5.443360                   6.988754                          4.00      27784.74228  9.754295e+05  073 Christopher Falls Suite 882\nWest Cynthia,...
  28       90499.05745             6.384359                   4.242191                          3.04      33970.16499  1.240764e+06  6531 Chase Prairie Apt. 245\nSusanshire, MN 22365
  29       59323.79210             6.977828                   8.273697                          4.07      37520.65773  1.577018e+06  17124 Johnson Squares\nLake Robertfurt, AL 618...
 ...               ...                  ...                        ...                           ...              ...           ...  ...
4970       55980.20481             7.014510                   5.458789                          2.11      43968.68705  1.120943e+06  2558 King Trail\nEast Catherinebury, MP 23625-...
4971       73491.13443             5.784430                   4.425959                          3.37      30800.54106  1.111307e+06  6043 Stevens Stream\nWest Kimberlymouth, ME 49723
4972       83695.27238             7.643507                   7.127219                          5.05      33113.75906  1.736402e+06  33465 Hernandez Forest Apt. 692\nPort Ashleyfo...
4973       78743.75927             6.583685                   6.595683                          4.07      24381.14454  1.340770e+06  805 David Knoll Apt. 216\nMccarthyview, GU 74316
4974       70720.29646             6.411801                   5.048128                          3.01      19114.01925  8.013486e+05  14742 Lopez Ridge Apt. 889\nJessicatown, CA 28254
4975       54037.58088             8.471765                   6.966072                          3.27      28696.17086  1.324382e+06  6278 Jenkins Harbors Apt. 807\nNew Yvettehaven...
4976       75046.31379             5.351169                   7.797825                          5.23      34107.88862  1.340344e+06  55823 Stuart Fields\nNunezstad, NM 03601
4977       75980.43884             6.583105                   5.914892                          3.23      40394.59349  1.518478e+06  1831 Escobar Plain Suite 171\nMartinezberg, OH...
4978       80393.33950             8.899713                   5.652974                          4.04      39547.93249  1.910585e+06  02084 Rivera Lock\nHallville, NJ 32367-9579
4979       82224.69501             5.434087                   8.375708                          3.12      57166.86751  1.823498e+06  4679 Turner Tunnel\nRosariobury, CT 68552-4766
4980       75664.02448             5.789203                   6.415312                          2.02      54724.25127  1.406865e+06  0476 Jessica Shoals\nMelissamouth, DE 39609-2777
4981       71663.87129             6.150745                   7.311907                          6.33      24109.77806  1.203850e+06  1316 Tony Inlet Suite 235\nWest Jimmy, SC 72946
4982       58800.90877             5.976507                   7.304051                          6.43      37426.70975  1.020096e+06  109 Lee Wall Apt. 315\nLunamouth, AZ 05121-3634
4983       69655.18395             7.721100                   6.077795                          4.29      32902.35558  1.194357e+06  39174 Jessica Mission Apt. 539\nWest Cindyboro...
4984       62623.35983             5.071624                   6.771015                          3.33      50985.97120  1.211900e+06  9894 Greg Ridge\nNorth Tiffanyhaven, ID 66602-...
4985       75117.04295             6.036275                   6.538111                          2.22      43976.03106  1.378938e+06  PSC 7442, Box 6234\nAPO AP 13017
4986       71060.40601             5.718839                   7.222730                          4.34      34814.58559  1.260241e+06  5611 Matthew Avenue\nLake Kevin, FM 72963-8891
4987       65729.22233             6.237787                   6.860475                          3.12      25573.85429  1.197073e+06  641 Lisa Parkways Suite 552\nWest Amandaside, ...
4988       67637.84067             7.056673                   5.774409                          3.05      43846.53134  1.275143e+06  6066 Sanders Court Apt. 914\nSouth Alexis, FM ...
4989       47965.40690             5.694638                   7.363327                          5.40      46071.94734  8.852050e+05  19960 Scott Street\nPort Brenda, MO 02292-8651
4990       52723.87656             5.452237                   8.124571                          6.39      14802.08844  4.795006e+05  86727 Kelly Plaza\nLake Veronica, IL 04474
4991       74102.19189             5.657841                   7.683993                          3.13      24041.27059  1.263721e+06  2871 John Lodge\nAmychester, GU 61734-5597
4992       87499.12574             6.403473                   4.836091                          4.02      40815.19968  1.568701e+06  Unit 2096 Box 9559\nDPO AE 80983-8797
4993       69639.14090             5.007510                   7.778375                          6.05      54056.12843  1.381831e+06  5259 David Causeway Apt. 975\nSouth Alexstad, ...
4994       73060.84623             5.293682                   6.312253                          4.16      22695.69548  9.053549e+05  5224 Lamb Passage\nNancystad, GA 16579
4995       60567.94414             7.830362                   6.137356                          3.46      22837.36103  1.060194e+06  USNS Williams\nFPO AP 30153-7653
4996       78491.27543             6.999135                   6.576763                          4.02      25616.11549  1.482618e+06  PSC 9258, Box 8489\nAPO AA 42991-3352
4997       63390.68689             7.250591                   4.805081                          2.13      33266.14549  1.030730e+06  4215 Tracy Garden Suite 076\nJoshualand, VA 01...
4998       68001.33124             5.534388                   7.130144                          5.44      42625.62016  1.198657e+06  USS Wallace\nFPO AE 73316
4999       65510.58180             5.992305                   6.792336                          4.07      46501.28380  1.298950e+06  37778 George Ridges Apt. 509\nEast Holly, NV 2...
5000 rows × 7 columns
In [20]:
df_x=df[::]  # every column, including 'Price' (the target) and 'Address' (text, not numeric)
In [21]:
df_y=df['Price']
In [22]:
df_x_test=df_x[-30:]
In [28]:
df_x_training=df_x[:-30]
In [25]:
df_y_test=df_y[-30:]
In [26]:
df_y_training=df_y[:-30]
In [27]:
model=linear_model.LinearRegression()
In [29]:
model.fit(df_x_training,df_y_training)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-5be6df67d310> in <module>
----> 1 model.fit(df_x_training,df_y_training)

~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in fit(self, X, y, sample_weight)
    461         n_jobs_ = self.n_jobs
    462         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 463                          y_numeric=True, multi_output=True)
    464
    465         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    717                     ensure_min_features=ensure_min_features,
    718                     warn_on_dtype=warn_on_dtype,
--> 719                     estimator=estimator)
    720     if multi_output:
    721         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    534         # make sure we actually converted to numeric:
    535         if dtype_numeric and array.dtype.kind == "O":
--> 536             array = array.astype(np.float64)
    537         if not allow_nd and array.ndim >= 3:
    538             raise ValueError("Found array with dim %d. %s expected <= 2."

ValueError: could not convert string to float: '208 Michael Ferry Apt. 674\nLaurabury, NE 37010-5101'
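The last line of the traceback names the cause. `check_array` tries `array.astype(np.float64)` and fails on the first value of the text column `Address`. `df[::]` also keeps `Price`, the column being predicted, among the features. Dropping both columns leaves five numeric columns that convert cleanly. The sketch below uses a made-up two-row frame with the same column names as `USA_Housing.csv`; its values are copied from the table above, with the addresses shortened.

```python
import numpy as np
import pandas as pd

# Two rows copied from the table above (Address strings shortened)
df = pd.DataFrame({
    "Avg. Area Income": [79545.45857, 79248.64245],
    "Avg. Area House Age": [5.682861, 6.002900],
    "Avg. Area Number of Rooms": [7.009188, 6.730821],
    "Avg. Area Number of Bedrooms": [4.09, 3.09],
    "Area Population": [23086.80050, 40173.07217],
    "Price": [1.059034e+06, 1.505891e+06],
    "Address": ["208 Michael Ferry Apt. 674", "188 Johnson Views Suite 079"],
})

# Features: everything except the target and the free-text address
df_x = df.drop(columns=["Price", "Address"])
df_y = df["Price"]

# This float conversion is the step that failed in the traceback; it now succeeds
X = df_x.to_numpy(dtype=np.float64)
print(X.shape)  # (2, 5)
```

With the full CSV, the same `df_x` can go through the existing slicing and `model.fit(df_x_training, df_y_training)`. Alternatively, `df.select_dtypes(include='number').drop(columns=['Price'])` keeps every numeric column automatically.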
In [ ]:

