Which approach is better for data integration?

I see that many companies currently use Python as their ETL tool. I come from PDI (Pentaho Data Integration), SSIS, and other ETL tools. How does Python's performance compare to these tools?

My current approach to data integration:

  • If the source is a storage system like MySQL, MSSQL, the Salesforce API, Google Spreadsheets, CSV files, or a NoSQL DB, I prefer the PDI ETL tool for data integration.
  • If the source is an API like Grafana or Humanity, or a dirty source file like a CSV, I prefer Python (see the sketch just after this list).
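
For the Python cases, here is a minimal sketch of the kind of script I mean, assuming a generic REST API and a messy CSV; the URL, file names, and column names are illustrative placeholders:

```python
import requests
import pandas as pd

# Extract: fetch JSON records from a REST endpoint (placeholder URL).
response = requests.get("https://api.example.com/v1/records", timeout=30)
response.raise_for_status()
api_df = pd.DataFrame(response.json())

# Clean: load a dirty CSV, skipping malformed rows and coercing types.
csv_df = pd.read_csv("export.csv", on_bad_lines="skip")
csv_df = csv_df.dropna(subset=["id"])  # drop rows missing the key column
csv_df["amount"] = pd.to_numeric(csv_df["amount"], errors="coerce")

# Load: write the cleaned frames downstream, here simply back to CSV.
api_df.to_csv("records_clean.csv", index=False)
csv_df.to_csv("export_clean.csv", index=False)
```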

Is my approach correct?



2 answers


Python, especially when combined with an ORM like SQLAlchemy, does the job quite well, with the advantage of simpler threading and tight integration with pandas.
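
As a minimal sketch of that combination (assuming a MySQL source and an MSSQL target; the connection strings, table name, and status column are placeholders, not anything from the question):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings for a MySQL source and an MSSQL target.
source = create_engine("mysql+pymysql://user:pass@source-host/sales")
target = create_engine("mssql+pyodbc://user:pass@target_dsn")

# Extract in chunks so large tables never have to fit in memory at once.
for chunk in pd.read_sql("SELECT * FROM orders", source, chunksize=50_000):
    # Transform: a trivial example on a hypothetical column.
    chunk["status"] = chunk["status"].str.upper()
    # Load: append each chunk into the target table.
    chunk.to_sql("orders", target, if_exists="append", index=False)
```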





Automated integration tools like Jitterbit and MuleSoft are very helpful. If those aren't an option, then I think Python works well as a data loader, pulling big data in and storing it in CSV format. And yes, I think your approach is correct!
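
To illustrate the data-loader idea, a chunked read keeps memory use flat on large files (the file names and chunk size below are assumptions, not from the answer):

```python
import pandas as pd

total_rows = 0
first = True
# Stream the big CSV in 100k-row chunks instead of loading it whole.
for chunk in pd.read_csv("big_export.csv", chunksize=100_000):
    # Append each chunk to one consolidated file; write the header once.
    chunk.to_csv("consolidated.csv", mode="a", header=first, index=False)
    first = False
    total_rows += len(chunk)

print(f"loaded {total_rows} rows")
```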









