Json to datastore #230

Open
aminumoha opened this issue Oct 23, 2024 · 4 comments

Comments

@aminumoha

I tried adding json as one of the xloader formats, thinking it would be easier to convert the JSON to CSV and then follow the same procedure to send the data to the datastore. Unfortunately, xloader fails to recognize the format and simply tries to load the file with the default CSV format, which leads to an error.

@aminumoha aminumoha changed the title Json to data store Json to datastore Oct 23, 2024
@duttonw
Collaborator

duttonw commented Oct 23, 2024

JSON data is a hard one because objects can be nested inside objects, and there is no obvious way to know how the schema should be flattened into a table.

Are you able to attach an example JSON file and the table you expect it to produce?

Also, what do you think the requirements should be for loading it into the datastore?

I.e. is it a list of key-value items or a list of strings?

@aminumoha
Author

aminumoha commented Oct 24, 2024

I.e. is it a list of key-value items or a list of strings?

My case is a list of objects with key-value items, which looks pretty much like [{"id": 1, "name": "abc", ...}, ...]. So I would like to store these objects in a datastore table just as I would with a CSV file. I can see the challenges in the case of nested objects, but for flat, CSV-like JSON records, the utility of having that data in a datastore can be immense.

@duttonw
Collaborator

duttonw commented Oct 25, 2024

So would these test cases be what you're after for importing into the datastore?

[
  {"Name": "Alice", "Age": 30, "Occupation": "Engineer"},
  {"Name": "Bob", "Age": 25, "Occupation": "Designer"},
  {"Name": "Charlie", "Age": 35, "Occupation": "Manager", "Extra field": "wont be included"}
]

Which would make a CSV/table like:

Name, Age, Occupation
Alice, 30, Engineer
Bob, 25, Designer
Charlie, 35, Manager
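
As a rough illustration of that first case, here is a minimal sketch using only the Python standard library. The function name `records_to_csv` is hypothetical and not part of xloader. It assumes the columns are taken from the first record, so keys that only show up in later records (like "Extra field") are dropped and missing keys become empty cells.

```python
import csv
import io
import json


def records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text.

    Hypothetical helper, not an xloader API. Columns come from the
    first record; extra keys in later records are ignored.
    """
    records = json.loads(json_text)
    if not isinstance(records, list) or not records:
        raise ValueError("expected a non-empty JSON array")
    if not all(isinstance(record, dict) for record in records):
        raise ValueError("expected every array item to be an object")

    fieldnames = list(records[0].keys())
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop keys not present in the header
        restval="",             # empty cell when a key is missing
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

The resulting CSV text could then go through the same path a regular CSV upload takes. Whether the header should instead be the union of all keys is a design choice this sketch does not settle.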

I guess we should also handle:

[
  ["Header1", "Header2", "Header3", "Number Header"],
  ["Cell", "Cell", "Cell", 10],
  ["Cell", "Cell", "Cell", 15],
  ["Cell", "Cell", "Cell", 20],
  ["Cell", "Cell", "Cell", 25]
]

Expected output:

Header1, Header2, Header3, Number Header
Cell, Cell, Cell, 10
Cell, Cell, Cell, 15
Cell, Cell, Cell, 20
Cell, Cell, Cell, 25
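
The array-of-arrays case is even closer to CSV. A minimal sketch, again with a hypothetical function name (`rows_to_csv`) and assuming the first inner array is the header and every row must have the same length:

```python
import csv
import io
import json


def rows_to_csv(json_text):
    """Convert a JSON array of arrays into CSV text.

    Hypothetical helper, not an xloader API. The first inner array is
    treated as the header row.
    """
    rows = json.loads(json_text)
    if not isinstance(rows, list) or not rows:
        raise ValueError("expected a non-empty JSON array")
    if not all(isinstance(row, list) for row in rows):
        raise ValueError("expected every array item to be an array")

    width = len(rows[0])
    for index, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {index} has {len(row)} cells, expected {width}")

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Rejecting ragged rows is an assumption made here to keep the table well-formed; padding short rows would be another option.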

I'm unsure how we could handle the case where the array is wrapped in a key, as programming the conversion gets tricky.
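
One conservative way to approach the wrapped case, sketched under the assumption that we only unwrap when the choice is unambiguous: accept a top-level array as-is, accept an object with exactly one array-valued key, and refuse everything else. The name `unwrap_rows` is hypothetical.

```python
def unwrap_rows(data):
    """Return the row array from already-parsed JSON data.

    Hypothetical helper. Only unwraps an object when exactly one of
    its values is a list; anything else is treated as ambiguous.
    """
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        list_values = [value for value in data.values() if isinstance(value, list)]
        if len(list_values) == 1:
            return list_values[0]
        raise ValueError(
            f"found {len(list_values)} array-valued keys, cannot pick one"
        )
    raise ValueError("expected a JSON array or an object wrapping one")
```

For example, `unwrap_rows({"data": [{"id": 1}]})` returns the inner list, while an object holding two arrays raises an error instead of guessing.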

@wardi
Contributor

wardi commented Oct 25, 2024

Yet another format is the one from CKAN's datastore JSON dump endpoint:

{
  "fields": [
    {"id": "Name", "type": "text"},
    {"id": "Age", "type": "numeric"},
    {"id": "Occupation", "type": "text"}
  ],
  "records": [
    ["Alice", 30, "Engineer"],
    ["Bob", 35, "Designer"]
  ]
}

Additional information entered into the data dictionary also appears in the "fields" dicts.
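
A minimal sketch of flattening that shape to CSV, assuming only the "fields" and "records" keys shown above and using each field's "id" as the column name. The function name `datastore_dump_to_csv` is hypothetical, and the "type" values and any data dictionary extras are ignored here.

```python
import csv
import io
import json


def datastore_dump_to_csv(json_text):
    """Convert a {"fields": [...], "records": [...]} document to CSV text.

    Hypothetical helper, not a CKAN or xloader API. Column names come
    from each field's "id"; other keys in the field dicts are ignored.
    """
    dump = json.loads(json_text)
    header = [field["id"] for field in dump["fields"]]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for index, record in enumerate(dump["records"]):
        if len(record) != len(header):
            raise ValueError(f"record {index} does not match the field count")
        writer.writerow(record)
    return out.getvalue()
```

A fuller version could carry the "type" values through to the datastore instead of letting them be re-guessed from the CSV.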


3 participants