Saving CSV data to the DataStore while harvesting into CKAN
I am writing a custom harvester to import data from an external site into CKAN (version 1.8).
It works really well and creates the metadata and the associated resources. I would like to combine these resources into a new CSV and save it to the DataStore during the import stage.
I know I can use the DataStore API, but I would rather not go over HTTP (it makes no sense to me to hand an API key, user, URL, and so on to a harvester that already has permission to add things).
Can DataStore API functions be called directly from the harvester? https://github.com/okfn/ckan/blob/master/ckanext/datastore/logic/action.py
Each function takes a context parameter, which is not documented.
You have a couple of distinct things to do here:
- Convert the CSV to an appropriate Python (or JSON) structure for insertion into the DataStore
- Insert the data into the DataStore
For the latter, you can use either:
- Logic actions (called directly)
- DataStore API
The API just calls the logic actions (plus it does auth), so the two are pretty similar, but the logic-action approach is likely to be faster and may feel more natural if you are already writing code that runs inside CKAN. However, the API can be conceptually cleaner, since it gives you clear boundaries between your components in the form of a well-defined web API.
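To illustrate the direct approach, here is a minimal sketch of calling the `datastore_create` logic action in-process. It assumes the usual CKAN convention that the context is a dict carrying the model, the session and the acting user name; check this against your CKAN version. The helper name `save_records` is hypothetical, and `get_action` is passed in as a parameter so the sketch runs without CKAN installed.

```python
def save_records(get_action, context, resource_id, fields, records):
    """Call the datastore_create logic action directly (no HTTP).

    get_action -- inside CKAN this would be ckan.logic.get_action
    context    -- assumed to be a dict such as
                  {'model': model, 'session': model.Session, 'user': user_name}
    fields     -- list of dicts like {'id': 'column_name'}
    records    -- list of dicts, one per row
    """
    data_dict = {
        "resource_id": resource_id,
        "fields": fields,
        "records": records,
    }
    # Look up the action by name and invoke it with (context, data_dict).
    return get_action("datastore_create")(context, data_dict)


# Inside a harvester, usage might look like this (assumption, untested
# against CKAN 1.8):
#
#   import ckan.model as model
#   from ckan.logic import get_action
#   context = {"model": model, "session": model.Session, "user": "harvest"}
#   save_records(get_action, context, resource["id"], fields, records)
```

Because the action is called in-process, the harvester needs no API key or URL; the user named in the context is what the authorization check sees.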
For the former (i.e. converting CSV to JSON), we recommend the Data Converters library, especially its commas.py module, which converts to exactly the format you need. A complete web service based on Data Converters has also been developed, but it is not fully operational yet.
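If you just want to see the shape of the conversion, the following is a rough sketch that uses only Python's standard `csv` module rather than Data Converters. It assumes the DataStore expects a list of field definitions plus a list of row dicts; the function name `csv_to_records` is hypothetical, and all values are left as strings (no type guessing).

```python
import csv
import io


def csv_to_records(text):
    """Turn CSV text into (fields, records) for a DataStore-style insert."""
    reader = csv.DictReader(io.StringIO(text))
    # The header row becomes the field list; empty input yields no fields.
    fields = [{"id": name} for name in (reader.fieldnames or [])]
    # Each remaining row becomes a plain dict keyed by column name.
    records = [dict(row) for row in reader]
    return fields, records


fields, records = csv_to_records("name,value\nfoo,1\nbar,2\n")
```

Data Converters does more than this (for example, it can guess column types), so treat the sketch only as an outline of the data structure.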
I solved it with ckanext-datastorer (for the DataStore) and ckanclient (for downloading the files).
ckanclient is broken on CKAN 1.8 because it does not handle redirects correctly. We worked around it with this quick and dirty patch: https://gist.github.com/mammadori/4945812
The best solution would be to drop urllib entirely and rewrite ckanclient to use requests instead.
Thanks for the support.