Thursday, April 30, 2020

Machine learning: step-by-step commands and structure

Most machine learning projects follow the same path. This post lays out that path as a structure you can reuse for any dataset you need to work with.

Common Steps


Actions that need to be done
Commands
Step 1
Import Libraries
import numpy as np


import pandas as pd


from matplotlib import pyplot as plt


import seaborn as sns



Step 2
Either import the built-in datasets or create a dataframe


to import datasets
from sklearn import datasets

to create a dataframe
df=pd.read_csv('path or the file name')



Step 3
Once the built-in datasets module is imported, load a particular dataset
<dataset name>=datasets.load_<dataset name>() for example diabetes=datasets.load_diabetes()
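As a concrete instance of Steps 2 and 3, here is a minimal sketch that loads the built-in diabetes dataset and looks at what comes back. The loaded object keeps the features in .data, the target in .target and the column names in .feature_names.

```python
from sklearn import datasets

# Load one of scikit-learn's built-in datasets (diabetes is the one used in this post).
diabetes = datasets.load_diabetes()

# The loaded object keeps features and target in separate arrays.
print(diabetes.data.shape)     # features: one row per sample, one column per feature
print(diabetes.target.shape)   # target: one value per sample
print(diabetes.feature_names)  # column names for diabetes.data
```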



Step 4
To see the description of the dataset
print(<dataset name>.DESCR)  for example print(diabetes.DESCR)



Step 5
To see the head of the dataframe
<dataframe name>.head() for example recipes.head()
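Note that head() is a pandas method, so it works on a dataframe (such as one created with pd.read_csv) but not directly on the object returned by datasets.load_<dataset name>(). A minimal sketch, assuming the diabetes dataset, that wraps the loaded arrays in a dataframe first. The names diabetes_df and 'target' are chosen here for illustration.

```python
import pandas as pd
from sklearn import datasets

diabetes = datasets.load_diabetes()

# Build a dataframe from the feature array, using the dataset's own column names.
diabetes_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Keep the target alongside the features (the column name 'target' is our choice).
diabetes_df['target'] = diabetes.target

print(diabetes_df.head())   # first five rows
print(diabetes_df.shape)    # (rows, columns)
```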



Step 6
Get the dimensions of the dataframe
<dataframe name>.shape for example recipes.shape
Step 7
to get info regarding our dataset
df.info() where df is the dataframe name
Step 8
to find null values in the dataframe
df.isnull() where df is the dataframe name
Step 9
to find the correlation
df.corr()
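Correlation is only defined for numeric columns, and depending on the pandas version a text column is either skipped silently or raises an error. A small sketch, using a made-up toy dataframe, that selects the numeric columns explicitly before calling corr():

```python
import pandas as pd

# Hypothetical toy dataframe: two numeric columns and one text column.
df = pd.DataFrame({
    'Age': [20, 30, 40, 50],
    'Salary': [2000, 3000, 4000, 5000],
    'Gender': ['Male', 'Female', 'Female', 'Male'],
})

# Keep only the numeric columns, then compute pairwise correlations.
corr = df.select_dtypes(include='number').corr()
print(corr)
```

The resulting corr dataframe can be passed to sns.heatmap in the same way as df.corr().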
Step 10
Heatmap of the correlation matrix
sns.heatmap(df.corr())
Step 11
to create heatmap of null values
sns.heatmap(df.isnull())
Step 12
to count the null and non-null values of a column
<dataset name>['Column name'].isnull().value_counts()  for example  df['Cabin'].isnull().value_counts()
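To make the output of the null checks easier to picture, here is a short sketch on a made-up dataframe (the values are invented for illustration). It counts missing values for one column and then for every column at once.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataframe with some missing values.
df = pd.DataFrame({
    'Cabin': ['C85', np.nan, 'E46', np.nan, np.nan],
    'Age': [22.0, 38.0, np.nan, 35.0, 54.0],
})

# True = missing, False = present, counted for a single column.
cabin_counts = df['Cabin'].isnull().value_counts()
print(cabin_counts)

# Number of missing values in every column at once.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```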
Step 13
to change the values from alphabetical to numerical
<dict>={"value 1":0,"value 2":1}     df['<column>']=df['<column>'].map(<dict>)   for example  gend={"Male":0,"Female":1}     df['Gender']=df['Gender'].map(gend)
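One thing to watch with map(): any value that is not a key in the dictionary becomes NaN. A small sketch on made-up data (the 'Other' row and the Gender_num column are added here purely to show that effect):

```python
import pandas as pd

# Hypothetical toy dataframe with a text column.
df = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Other']})

gend = {"Male": 0, "Female": 1}
# Written to a new column so the original values stay visible.
df['Gender_num'] = df['Gender'].map(gend)

print(df)
# Values missing from the dictionary come out as NaN, so check for them.
print(df['Gender_num'].isnull().sum())
```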
Step 14
to drop any column from your dataset
df.drop("Address",axis=1,inplace=True) where df is the dataframe name
Step 15
to find the columns
df.columns
Step 16
To scale the data (if needed)
from sklearn.preprocessing import StandardScaler      ###we need to import the scaler class first###


scaler=StandardScaler() where scaler can be any name like bunny, america etc


scaler.fit(df.drop('Purchased',axis=1))  ###fit the scaler on every column except the target column Purchased###


scale_arr=scaler.transform(df.drop('Purchased',axis=1))  ###transform the original dataframe into a new scaled array###


<new dataframe>=pd.DataFrame(scale_arr,columns=['Column 1','Column 2']) 

for example 
 new_df=pd.DataFrame(scale_arr,columns=['Age','EstimatedSalary'])
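The scaling step can be sketched end to end as follows. The data values are made up, fit_transform is used as a shortcut for calling fit and then transform on the same data, and the column names are taken from the dataframe instead of being typed by hand.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with the column names used in this post.
df = pd.DataFrame({
    'Age': [19, 35, 26, 47, 58],
    'EstimatedSalary': [19000, 20000, 43000, 25000, 144000],
    'Purchased': [0, 0, 0, 1, 1],
})

features = df.drop('Purchased', axis=1)

scaler = StandardScaler()
scale_arr = scaler.fit_transform(features)   # fit and transform in one call

# Reuse the original column names instead of typing them by hand.
new_df = pd.DataFrame(scale_arr, columns=features.columns)

print(new_df.mean().round(6))        # close to 0 for every column
print(new_df.std(ddof=0).round(6))   # close to 1 for every column
```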


Linear Regression

Step 17
Now divide the data into X and Y, where Y is the dependent variable (the target) and X holds the independent variables (the features).
<dataset name>_X=<dataset name>.data  for example diabetes_X=diabetes.data
x=<dataframe name >[['columns of dataframe', 'columns of dataframe', 'columns of dataframe',
       'columns of dataframe']] for example x=df[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
       'Area Population']]
y=<dataframe name >[['Target column of dataframe']] for example y=df[['Price']]
Step 18
Import the statistical model
from sklearn import <model>  for example from sklearn import linear_model
from sklearn.model_selection import train_test_split

Import mean_squared_error if you need the mean squared error; import SVC or DecisionTreeRegressor instead if you need one of those
from sklearn.metrics import mean_squared_error
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.30,random_state=42)   #split the data into training and test sets. test_size=0.30 is the fraction of the data kept for testing,
#here it is 30%
#random_state=42 fixes the random seed, so the same rows are picked on every run
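To see what test_size and random_state do in practice, here is a small sketch, assuming the built-in diabetes dataset stands in for x and y. It prints the sizes of the two parts and checks that a second call with the same random_state returns the same test rows.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
x, y = diabetes.data, diabetes.target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=42)

print(len(x_train), len(x_test))   # roughly 70% / 30% of the rows

# Same random_state -> the same rows land in the test set again.
x_train2, x_test2, _, _ = train_test_split(
    x, y, test_size=0.30, random_state=42)
print((x_test == x_test2).all())
```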

Alternatively, split the data manually: take the last 30 rows for testing and the rest for training
<dataset name>_X_test=<dataset name>_X[-30:] for example diabetes_X_test=diabetes_X[-30:]
<dataset name>_X_training=<dataset name>_X[:-30] for example diabetes_X_training=diabetes_X[:-30]
from sklearn.linear_model import LinearRegression


<dataset name>_y_test=<dataset name>.target[-30:] for example diabetes_y_test=diabetes.target[-30:]
<dataset name>_y_training=<dataset name>.target[:-30] for example diabetes_y_training=diabetes.target[:-30]
<it can be any name>=LinearRegression()



<it can be any name>.fit(x_train,y_train)
Step 19
Now create the statistical model
model=linear_model.LinearRegression()
y_predict=<it can be any name>.predict(x_test)



y_predict



<it can be any name>.coef_



<it can be any name>.intercept_
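Putting the linear regression steps together, a minimal end-to-end sketch might look like this. It assumes the built-in diabetes dataset and uses model as the variable name. It is one possible arrangement of the commands above, not the only one.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
x, y = diabetes.data, diabetes.target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=42)

model = LinearRegression()
model.fit(x_train, y_train)          # learn coefficients from the training rows
y_predict = model.predict(x_test)    # predict the held-out rows

print(model.coef_)        # one coefficient per feature
print(model.intercept_)   # constant term
print(mean_squared_error(y_test, y_predict))   # lower is better
```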

Logistic Regression

Step 18
Import the statistical model
from sklearn import <model>  for example from sklearn import linear_model
from sklearn.model_selection import train_test_split

Import mean_squared_error if you need the mean squared error; import SVC or DecisionTreeRegressor instead if you need one of those
from sklearn.metrics import mean_squared_error
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.30,random_state=42)   #split the data into training and test sets. test_size=0.30 is the fraction of the data kept for testing,
#here it is 30%
#random_state=42 fixes the random seed, so the same rows are picked on every run

Alternatively, split the data manually: take the last 30 rows for testing and the rest for training
<dataset name>_X_test=<dataset name>_X[-30:] for example diabetes_X_test=diabetes_X[-30:]
<dataset name>_X_training=<dataset name>_X[:-30] for example diabetes_X_training=diabetes_X[:-30]
from sklearn.linear_model import LogisticRegression


<dataset name>_y_test=<dataset name>.target[-30:] for example diabetes_y_test=diabetes.target[-30:]
<dataset name>_y_training=<dataset name>.target[:-30] for example diabetes_y_training=diabetes.target[:-30]
<it can be any name>=LogisticRegression()



<it can be any name>.fit(x_train,y_train)
Step 19
Now create the statistical model
model=linear_model.LogisticRegression()
y_predict=<it can be any name>.predict(x_test)



y_predict



<it can be any name>.coef_



<it can be any name>.intercept_
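Logistic regression predicts a category, so it needs a dataset with a categorical target. The sketch below assumes the built-in breast cancer dataset for that reason (it is not used elsewhere in this post), scales the features with statistics taken from the training rows only, and reports accuracy instead of mean squared error. The name log_model is arbitrary.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: breast cancer is used because its target has two classes.
cancer = datasets.load_breast_cancer()
x, y = cancer.data, cancer.target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=42)

# Scale using statistics from the training rows only.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

log_model = LogisticRegression()
log_model.fit(x_train, y_train)
y_predict = log_model.predict(x_test)

print(log_model.coef_.shape)   # (1, number of features) for a two-class target
print(log_model.intercept_)
print("Accuracy=", accuracy_score(y_test, y_predict))
```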



KNeighbors Classifier
Step 18
Import the statistical model
from sklearn import <model>  for example from sklearn import linear_model
from sklearn.model_selection import train_test_split

Import mean_squared_error if you need the mean squared error; import SVC or DecisionTreeRegressor instead if you need one of those
from sklearn.metrics import mean_squared_error
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.30,random_state=42,stratify=y)
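The extra stratify=y argument asks train_test_split to keep the class proportions of y roughly the same in the training and test parts. A small sketch, assuming the built-in breast cancer dataset as a stand-in for x and y, that prints the share of class 1 in each part:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()
x, y = cancer.data, cancer.target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=42, stratify=y)

# Share of class 1 in the full data, the training part and the test part.
print(np.mean(y), np.mean(y_train), np.mean(y_test))
```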

Alternatively, split the data manually: take the last 30 rows for testing and the rest for training
<dataset name>_X_test=<dataset name>_X[-30:] for example diabetes_X_test=diabetes_X[-30:]
<dataset name>_X_training=<dataset name>_X[:-30] for example diabetes_X_training=diabetes_X[:-30]
from sklearn.neighbors import KNeighborsClassifier


<dataset name>_y_test=<dataset name>.target[-30:] for example diabetes_y_test=diabetes.target[-30:]
<dataset name>_y_training=<dataset name>.target[:-30] for example diabetes_y_training=diabetes.target[:-30]
<it can be any name>=KNeighborsClassifier(n_neighbors=3)
<it can be any name>.fit(x_train,y_train)



y_predict=<it can be any name>.predict(x_test)
Step 19
Now create the statistical model
model=KNeighborsClassifier(n_neighbors=3)
from sklearn import metrics
print("Accuracy=",metrics.accuracy_score(y_test,y_predict))



<it can be any name>=KNeighborsClassifier(n_neighbors=5)
<it can be any name>.fit(x_train,y_train)
y_predict=<it can be any name>.predict(x_test)
print("Accuracy=",metrics.accuracy_score(y_test,y_predict))



<it can be any name>=KNeighborsClassifier(n_neighbors=7)
<it can be any name>.fit(x_train,y_train)
y_predict=<it can be any name>.predict(x_test)
print("Accuracy=",metrics.accuracy_score(y_test,y_predict))



###if the values in x are on very different scales, the distances between points (and any plot of them) are dominated by the largest feature, so we scale x###



from sklearn.preprocessing import StandardScaler



scaler=StandardScaler()
x_scaled=scaler.fit_transform(x)
x_scaled



x_train,x_test,y_train,y_test=train_test_split(x_scaled,y,test_size=0.30,random_state=42,stratify=y)



<it can be any name>=KNeighborsClassifier(n_neighbors=7)
<it can be any name>.fit(x_train,y_train)
y_predict=<it can be any name>.predict(x_test)
print("Accuracy=",metrics.accuracy_score(y_test,y_predict))



<it can be any name>=KNeighborsClassifier(n_neighbors=9)
<it can be any name>.fit(x_train,y_train)
y_predict=<it can be any name>.predict(x_test)
print("Accuracy=",metrics.accuracy_score(y_test,y_predict))
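The cells above repeat the same lines with a different n_neighbors each time, so they can be folded into one loop. A minimal sketch, assuming the built-in iris dataset as a stand-in for x and y. The names knn and accuracy_by_k are arbitrary.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in iris dataset stands in for x and y.
iris = datasets.load_iris()
x, y = iris.data, iris.target

x_scaled = StandardScaler().fit_transform(x)
x_train, x_test, y_train, y_test = train_test_split(
    x_scaled, y, test_size=0.30, random_state=42, stratify=y)

# Train and score one classifier per value of k.
accuracy_by_k = {}
for k in (3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    y_predict = knn.predict(x_test)
    accuracy_by_k[k] = metrics.accuracy_score(y_test, y_predict)
    print("k=", k, "Accuracy=", accuracy_by_k[k])
```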



from sklearn.model_selection import cross_val_score



neighbors=list(range(1,50,2))
cv_scores=[]
for k in neighbors:
    knn=KNeighborsClassifier(n_neighbors=k)
    scores=cross_val_score(knn,x_scaled,y,scoring='accuracy')
    cv_scores.append(scores.mean())



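###MSE below is the misclassification error (1 - accuracy) for each k; despite the name it is not a mean squared error###
MSE=[1-x for x in cv_scores]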



MSE



optimal_k=neighbors[MSE.index(min(MSE))]



print(optimal_k)
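scikit-learn also ships GridSearchCV, which runs the same kind of cross-validated search over n_neighbors in a single object. This is a different tool from the manual loop above, shown only as an alternative. The sketch assumes the built-in iris dataset as a stand-in for x_scaled and y, and 5-fold cross-validation.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in iris dataset stands in for x_scaled and y.
iris = datasets.load_iris()
x_scaled = StandardScaler().fit_transform(iris.data)
y = iris.target

neighbors = list(range(1, 50, 2))

# Try every k in neighbors with 5-fold cross-validation, scored by accuracy.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={'n_neighbors': neighbors},
    scoring='accuracy',
    cv=5,
)
search.fit(x_scaled, y)

optimal_k = search.best_params_['n_neighbors']
print(optimal_k)            # k with the highest mean cross-validated accuracy
print(search.best_score_)   # that mean accuracy
```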



x_train,x_test,y_train,y_test=train_test_split(x_scaled,y,test_size=0.30,random_state=42,stratify=y)
<it can be any name>=KNeighborsClassifier(n_neighbors=optimal_k)   ###use the optimal k found above (the original run used 25)###
<it can be any name>.fit(x_train,y_train)
y_predict=<it can be any name>.predict(x_test)
print("Accuracy=",metrics.accuracy_score(y_test,y_predict))



plt.plot(neighbors,MSE)
plt.xlabel('Number of K')
plt.ylabel('Error')
plt.show()
