Hi there. I have just started with machine learning and am working through a simple example to learn. I want to classify the files on my disk by file type using a classifier. The code I have written is:
    import sklearn
    import numpy as np

    #Importing a local data set from the desktop
    import pandas as pd
    mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
    print mydata

    x_train = mydata.script
    y_train = mydata.label

    #print x_train
    #print y_train

    x_test = mydata.script

    from sklearn import tree
    classi = tree.DecisionTreeClassifier()
    classi.fit(x_train, y_train)

    predictions = classi.predict(x_test)
    print predictions
And this is the output, including the error I am getting:
       script  class  div   label
    0       5      6    7    html
    1       0      0    0  python
    2       1      1    1     csv
    Traceback (most recent call last):
      File "newtest.py", line 21, in <module>
        classi.fit(x_train, y_train)
      File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 790, in fit
        X_idx_sorted=X_idx_sorted)
      File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 116, in fit
        X = check_array(X, dtype=DTYPE, accept_sparse="csc")
      File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 410, in check_array
        "if it contains a single sample.".format(array))
    ValueError: Expected 2D array, got 1D array instead:
    array=[ 5. 0. 1.].
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
If anyone can help me with the code, it would be very helpful!
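For reference, the reshape hint in the error message can be sketched as below. This is only a minimal sketch under assumptions: the inline DataFrame is a stand-in for file_format.csv, rebuilt from the printed output above, and only the script column is used as the single feature, which is the case where array.reshape(-1, 1) applies. It uses print() with parentheses so that it runs on both Python 2 and Python 3.

```python
import pandas as pd
from sklearn import tree

# Assumption: stand-in for file_format.csv, rebuilt from the printed output
mydata = pd.DataFrame({
    "script": [5, 0, 1],
    "class": [6, 0, 1],
    "div": [7, 0, 1],
    "label": ["html", "python", "csv"],
})

# A single feature column is 1D with shape (3,); fit() expects a 2D array
# of shape (n_samples, n_features), so reshape it to (3, 1).
x_train = mydata["script"].values.reshape(-1, 1)
y_train = mydata["label"]

classi = tree.DecisionTreeClassifier()
classi.fit(x_train, y_train)

# Predicting on the training rows, as in the original script
predictions = classi.predict(x_train)
print(predictions)
```

Selecting several columns with a list, for example mydata[["script", "class", "div"]], also gives a 2D input and would let the tree use all three counts as features; whether that is the intended setup is an assumption.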