I am trying to build a new dataset, or overwrite the columns of the current one, so that each column holds only its unique values.
Here is an example of what I am trying to get:
    A  B
   -----
0|  1  1
1|  2  5
2|  1  5
3|  7  9
4|  7  9
5|  8  9
Wanted Result      Not Wanted Result
    A  B               A  B
   -----              -----
0|  1  1           0|  1  1
1|  2  5           1|  2  5
2|  7  9           2|
3|  8              3|  7  9
                   4|
                   5|  8
I don’t really care about the index but it seems to be the problem.
My code so far is pretty simple. I tried two approaches: one with a new DataFrame and one without.
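import pandas as pd

# With a new DataFrame
def UniqueResults(dataframe):
    df = pd.DataFrame()
    for col in dataframe:
        # unique() can return a different number of values for each column
        S = pd.Series(dataframe[col].unique())
        df[col] = S.values
    return df

# Without a new DataFrame
def UniqueResults(dataframe):
    for col in dataframe:
        # the unique values are usually fewer than the rows of the frame
        dataframe[col] = dataframe[col].unique()
    return dataframe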
I get the error “Length of Values does not match length of index” both times.
Kenil Vasani
The error comes up when you try to assign a list or numpy array to a data frame column and its length differs from the number of rows. It can be reproduced as follows.
Take a data frame of four rows and try to assign a list or an array of two elements to one of its columns. Both assignments raise the error, because the data frame has four rows but the list and the array each have only two elements.
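A minimal sketch of that reproduction follows. The frame, the column names `A` and `B`, and the values are assumptions chosen for illustration, not taken from the original post. The exact wording of the message varies between pandas versions, so the snippet only prints whatever it receives.

```python
import numpy as np
import pandas as pd

# Illustrative four-row frame (column name and values are assumptions).
df = pd.DataFrame({"A": [1, 2, 3, 4]})

errors = []
for values in ([1, 2], np.array([1, 2])):
    try:
        # Two values cannot fill a column that must have four rows.
        df["B"] = values
    except ValueError as exc:
        errors.append(str(exc))
        print(type(values).__name__, "->", exc)
```

Neither assignment creates the column, so `df` still has only `A` afterwards.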
Workaround (use with caution): convert the list/array to a pandas Series first. On assignment the Series is aligned with the data frame by index label, and any label missing from the Series is filled with NaN.
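The workaround might look like the sketch below, again on an assumed four-row frame with made-up column names. The Series gets the default labels 0 and 1, so only those two rows receive values.

```python
import pandas as pd

# Illustrative four-row frame (column name and values are assumptions).
df = pd.DataFrame({"A": [1, 2, 3, 4]})

# The Series carries index labels 0 and 1; rows 2 and 3 have no match.
df["B"] = pd.Series([1, 2])

print(df)
```

The caution is that the match is by label, not by position: if the frame's index is not the default 0..n-1, the values may land on unexpected rows or not at all. The column is also upcast to float so it can hold NaN.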
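For your specific problem, if you don’t care about the index or about the correspondence of values between columns, you can drop the duplicates in each column and then reset that column's index, so the unique values are packed at the top and shorter columns are padded with NaN.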
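One way to sketch this, using the data from the question: the helper name `unique_per_column` is hypothetical, and combining the columns with `pd.concat` is just one option for putting them back together.

```python
import pandas as pd

def unique_per_column(dataframe):
    """Return a frame whose columns hold each input column's unique values."""
    columns = [
        # Drop repeats, then renumber from 0 so the columns line up at the top.
        dataframe[col].drop_duplicates().reset_index(drop=True)
        for col in dataframe
    ]
    # Columns of different lengths are padded with NaN at the bottom.
    return pd.concat(columns, axis=1)

df = pd.DataFrame({"A": [1, 2, 1, 7, 7, 8], "B": [1, 5, 5, 9, 9, 9]})
result = unique_per_column(df)
print(result)
```

Rows of the result no longer correspond to rows of the input, and a column that was padded with NaN becomes float.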