Sign Up

Have an account? Sign In Now

Sign In

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

Sorry, you do not have a permission to ask a question, You must login to ask question.

Forgot Password?

Need An Account, Sign Up Here
Sign InSign Up

ErrorCorner

ErrorCorner Logo ErrorCorner Logo

ErrorCorner Navigation

  • Home
  • Contact Us
  • About Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Contact Us
  • About Us
Home/ Questions/Q 638
Next
Answered
Kenil Vasani
Kenil Vasani

Kenil Vasani

  • 646 Questions
  • 567 Answers
  • 77 Best Answers
  • 26 Points
View Profile
  • 4
Kenil Vasani
Asked: December 14, 20202020-12-14T21:08:10+00:00 2020-12-14T21:08:10+00:00In: Python

How to fix “ValueError: Expected 2D array, got 1D array instead” in sklearn/python?

  • 4

I there. I just started with the machine learning with a simple example to try and learn. So, I want to classify the files in my disk based on the file type by making use of a classifier. The code I have written is,

import sklearn
import numpy as np


#Importing a local data set from the desktop
import pandas as pd
mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
print mydata


x_train = mydata.script
y_train = mydata.label

#print x_train
#print y_train
x_test = mydata.script

from sklearn import tree
classi = tree.DecisionTreeClassifier()

classi.fit(x_train, y_train)

predictions = classi.predict(x_test)
print predictions

And I am getting the error as,

  script  class  div   label
0       5      6    7    html
1       0      0    0  python
2       1      1    1     csv
Traceback (most recent call last):
  File "newtest.py", line 21, in <module>
  classi.fit(x_train, y_train)
  File "/home/initiouser2/.local/lib/python2.7/site-
packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/initiouser2/.local/lib/python2.7/site-
packages/sklearn/tree/tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/home/initiouser2/.local/lib/python2.7/site-
packages/sklearn/utils/validation.py", line 410, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 5.  0.  1.].
Reshape your data either using array.reshape(-1, 1) if your data has a 
single feature or array.reshape(1, -1) if it contains a single sample.

If anyone can help me with the code, it would be so helpful to me !!

arraysnumpypythonscikit-learn
  • 1 1 Answer
  • 10 Views
  • 0 Followers
  • 0
Answer
Share
  • Facebook

    1 Answer

    • Voted
    1. Kenil Vasani

      Kenil Vasani

      • 646 Questions
      • 567 Answers
      • 77 Best Answers
      • 26 Points
      View Profile
      Best Answer
      Kenil Vasani
      2020-12-14T21:02:23+00:00Added an answer on December 14, 2020 at 9:02 pm

      When passing your input to the classifiers, pass 2D arrays (of shape (M, N) where N >= 1), not 1D arrays (which have shape (N,)). The error message is pretty clear,

      Reshape your data either using array.reshape(-1, 1) if your data has a
      single feature or array.reshape(1, -1) if it contains a single sample.

      from sklearn.model_selection import train_test_split
      
      # X.shape should be (N, M) where M >= 1
      X = mydata[['script']]  
      # y.shape should be (N, 1)
      y = mydata['label'] 
      # perform label encoding if "label" contains strings
      # y = pd.factorize(mydata['label'])[0].reshape(-1, 1) 
      X_train, X_test, y_train, y_test = train_test_split(
                            X, y, test_size=0.33, random_state=42)
      ...
      
      clf.fit(X_train, y_train) 
      print(clf.score(X_test, y_test))
      

      Some other helpful tips –

      1. split your data into valid train and test portions. Do not use your training data to test – that leads to inaccurate estimations of your classifier’s strength
      2. I’d recommend factorizing your labels, so you’re dealing with integers. It’s just easier.
      • 0
      • Share
        Share
        • Share on Facebook
        • Share on Twitter
        • Share on LinkedIn
        • Share on WhatsApp

    You must login to add an answer.

    Forgot Password?

    Sidebar

    Ask A Question
    • Popular
    • Kenil Vasani

      SyntaxError: invalid syntax to repo init in the AOSP code

      • 5 Answers
    • Kenil Vasani

      xlrd.biffh.XLRDError: Excel xlsx file; not supported

      • 3 Answers
    • Kenil Vasani

      Homebrew fails on MacOS Big Sur

      • 3 Answers
    • Kenil Vasani

      runtimeError: package fails to pass a sanity check for numpy ...

      • 3 Answers
    • Kenil Vasani

      FATAL EXCEPTION: Firebase-Messaging-Intent-Handle — java.lang.NoClassDefFoundError

      • 2 Answers

    Explore

    • Most Answered
    • Most Visited
    • Most Voted
    • Random

    © 2020-2021 ErrorCorner. All Rights Reserved
    by ErrorCorner.com