Copy file content from Azure Storage to Azure SQL DB using Azure Data Factory

First poster, long time reader.

A third-party provider uploads CSV files once a day to a shared Azure Blob Storage account. The filenames include a timestamp, e.g. "dw_palkkatekijat_20170320T021", and all the files sit in the same directory. Each file contains all the data from the previous file, plus the rows added since the previous day. I would like to import all rows from all files into a table in an Azure SQL Database. That part I can already do.

The problem is that I don't know how to add the filename to a separate column in the table, so I can tell which file each row came from and only use the rows from the newest file. I need to import the full contents of every file and keep all the "versions" of the data. Is there a way to send the filename as a parameter to a SQL stored procedure? Or is there any alternative way to deal with this problem?
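
To make the goal concrete, this is roughly the query I'd like to be able to run once each row is tagged with its source file (the table and column names below are made up for illustration):

    -- Illustrative target table: every row from every file, plus the file it came from.
    -- Because the filenames embed a yyyyMMdd timestamp, the newest file sorts last,
    -- so MAX() picks out the latest "version".
    SELECT p.*
    FROM dbo.Palkkatekijat AS p
    WHERE p.SourceFileName = (SELECT MAX(SourceFileName) FROM dbo.Palkkatekijat);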

Thank you for your help.



1 answer


In the situation you've described, you won't be able to get the exact file name out of the box. ADF is an orchestration service rather than a data transformation service, so it doesn't give you functionality at that layer... I wish it did!

However, there are several options for getting the filename, or something close enough to use. None of them, I admit, are perfect!

Option 1 (Best option I think!)

As you suggested: pass a parameter to the SQL DB stored procedure. This is certainly possible using the parameters attribute of the ADF stored procedure activity.

What to pass as a parameter? ...

It helps that your blob source files have a clean date and time in the filename. Presumably that is what you already use in your input dataset definition, so pass the same value to the procedure and store it in the SQL DB table alongside the rows. You can then work out which file each row was loaded from, when it was loaded, and how the overlapping versions relate. Maybe?

You can access the start of the dataset's time slice in the activity. Example JSON...

    "activities": [
        {
            "name": "StoredProcedureActivityTemplate",
            "type": "SqlServerStoredProcedure",
            "inputs": [
                {
                    "name": "BlobFile"
                }
            ],
            "outputs": [
                {
                    "name": "RelationalTable"
                }
            ],
            "typeProperties": {
              "storedProcedureName": "[dbo].[usp_LoadMyBlobs]",
              "storedProcedureParameters": {
                  //like this:
                  "ExactParamName": "$$Text.Format('{0:yyyyMMdd}', Time.AddMinutes(SliceStart, 0))" //tweak the date format
              }
            }, //etc ....
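
For completeness, here is a minimal sketch of what the stored procedure could do with that parameter. All object names below are assumptions for illustration, and it presumes a separate copy activity has already landed the file's rows in a staging table; the procedure just stamps them with the slice date so each "version" can be told apart.

    -- Minimal sketch only. The parameter name must match the key used in
    -- "storedProcedureParameters" above ("ExactParamName").
    CREATE PROCEDURE [dbo].[usp_LoadMyBlobs]
        @ExactParamName CHAR(8)  -- the formatted slice date, e.g. '20170320'
    AS
    BEGIN
        SET NOCOUNT ON;

        -- dbo.PalkkatekijatStaging and dbo.Palkkatekijat are made-up names:
        -- move the newly landed rows into the main table, tagged with the date
        -- taken from the time slice (and therefore matching the filename).
        INSERT INTO dbo.Palkkatekijat (SourceFileDate, SomeKeyColumn, SomeValueColumn)
        SELECT @ExactParamName, SomeKeyColumn, SomeValueColumn
        FROM dbo.PalkkatekijatStaging;

        TRUNCATE TABLE dbo.PalkkatekijatStaging;
    END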

      

Option 2 (a lot of effort)

Create a "middle man" ADF custom activity that reads the file, picks up the filename, and adds that value as a column before loading.



Custom activities in ADF basically give you the freedom to do whatever you want, handling the data transformation behaviour yourself in C#.

I'd recommend researching what's involved in using custom activities if you want to go down that route. It takes more effort and requires an Azure Batch service.

Option 3 (Total overkill)

Use the Azure Data Lake Analytics service, applying the same approach as option 2. Use U-SQL in Data Lake to parse the file and carry the filename through to the output dataset. In U-SQL you can include a wildcard for the filename as part of the extractor's file path and then use that value in the output dataset.

I label this as overkill because bolting on an entire Data Lake Analytics service just to read a filename is overkill. In reality, Data Lake could probably replace your SQL DB layer altogether and give you the filename handling for free.

By the way, you don't need to move the source files into Azure Data Lake Store. You can give Data Lake Analytics access to your existing shared Blob Storage account; you would only be adding the Analytics service.

Option 4

Rethink the solution and use Azure Data Lake instead of Azure SQL DB.

Hope this helps.
