Skip to content
This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

Add to_dict SFrame method #2755

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions src/python/turicreate/data_structures/sframe.py
Original file line number Diff line number Diff line change
Expand Up @@ -5881,6 +5881,17 @@ def _group(self, key_columns):
"""
gsf = GroupedSFrame(self, key_columns)
return gsf

def to_dict(self):
dict_sframe = {}
for key in self.column_names():
sa = SArray(self[key])
val_list = []
for val in sa:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm; Pandas actually seems to behave a bit differently. I'm not sure why but it gives dictionaries of dictionaries (where the inner dictionaries have numeric keys ranging from 0 to n for n rows), instead of dictionaries of lists. I suspect this is to better handle sparse dataframes? Example of this behavior:

In [23]: d = df.to_dict()

In [24]: type(d)
Out[24]: dict

In [25]: d.keys()
Out[25]: ['path', 'image']

In [26]: type(d['path'])
Out[26]: dict

In [27]: d['path'].keys()
Out[27]: 
[0,
 1,
 2,
...

In any case, we should probably preserve the same behavior here -- let's use a dict instead of a list for the inner structure, using row index as the key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand pandas' behavior here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pandas always gives a sparse representation (even if the original data was dense) -- i.e. using dictionaries instead of lists, with row index as the dictionary key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@znation - what do you recommend we do here? Should we modify this pull so we match pandas' behavior? Should we close the issue (#1909) as will not fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have an opinion on this one. I am not a fan of Pandas behavior here, but the goal of #1909 was to match Pandas behavior.

val_list.append(val)
dict_sframe.update({key:val_list})
del val_list
return dict_sframe

@property
def shape(self):
Expand Down
6 changes: 6 additions & 0 deletions src/python/turicreate/test/test_sframe.py
Original file line number Diff line number Diff line change
Expand Up @@ -3787,6 +3787,12 @@ def test_export_empty_SFrame(self):
sf2 = SFrame.read_json(f.name)
_assert_sframe_equal(sf, sf2)

def test_to_dict(self):
sf = SFrame({'col1': [1, 2],'col2': [0.5, 0.75]})
result = sf.to_dict()
expected = {'col1': [1, 2], 'col2': [0.5, 0.75]}
self.assertEqual(result, expected)

if __name__ == "__main__":

import sys
Expand Down