Incremental/delta data export using command line data loader

A few months back, we got a requirement for hourly incremental/delta export of data from Salesforce. There was a restriction that we didn't want to purchase any other licensed ETL tool, so we decided to use the command line data loader using that export can be scheduled also.

Command line data loader requires SOQL to export the data, so to export the data hourly we scheduled the process using windows scheduler for every hour, but the main challenge here was to write a dynamic SOQL that would apply a filter to extract the records for those created date is greater than last 1 hour. Salesforce does not provide any SOQL datetime literal like LAST_1_Hour or LAST_N_HOUR.

There are 2 methods to accomplish the requirement. First is using powershell script and the second is using SOQL itself.

Method 1 - PowerShell Script

In command line data loader, process-conf.xml contains the SOQL, so in case of an hourly incremental data extract SOQL will be like

SELECT Id, Name FROM Contact WHERE CreatedDate > 2018-06-28T08:00:00.000Z AND CreatedDate <= 2018-06-28T09:00:00.000Z

But above SOQL is only for the extract that will run at 09:00, we need to write a SOQL that is dynamic and put the correct datetime in where clause for every run.

Neither command line data loader nor SOQL provides any feature to achieve above requirement. So, to achieve this you have to follow below steps:
 
   Step 1 - Create a process-conf-template.xml file with same content as process-conf.xml but instead of putting any hard coded date or literal in the SOQL put placeholders like below.
           
  SELECT Id, Name FROM Contact WHERE CreatedDate > #LAST_EXTRACT_RUN_DATE_TIME# AND CreatedDate <= #CURRENT_DATE_TIME#

   Step 2 - Now, create a bat file (create-process-config.bat) with below powershell script that will replace the "#LAST_EXTRACT_RUN_DATE_TIME#" and "#CURRENT_DATE_TIME#" placeholder with actual date/time and will put in the process-conf.xml that will be used in the actual data loader process.
           
 powershell.exe -Command "Set-Content -Path "E:\CommandLine\Script\currentdatetime.txt" -Value $(Get-Date).ToString('yyyy-MM-ddTHH:mm:ss.000Z')"

 powershell.exe -Command "Get-Content E:\CommandLine\Script\process-conf-template.xml | Foreach-Object{$_ -replace '#LAST_EXTRACT_RUN_DATE_TIME#',(Get-Content E:\CommandLine\Script\lastrundatetime.txt)} | Foreach-Object{$_ -replace '#CURRENT_DATE_TIME#',(Get-Content E:\CommandLine\Script\currentdatetime.txt)} | set-content E:\CommandLine\Config\process-conf.xml"

 powershell.exe -Command "Set-Content -Path "E:\CommandLine\Script\lastrundatetime.txt" -Value ((Get-Content E:\CommandLine\Script\currentdatetime.txt))"

  In the above powershell script, you can replace your file path in place of "E:\CommandLine\Script\" and "E:\CommandLine\Config\".

  Step 3 - Now, create-process-config.bat must execute before the actual data loader process runs. So, create a new bat file (mainProcess.bat) that will run the create-process-config.bat first and after that it will run the data extract. Put below code in the mainProcess.bat file:
             
                echo Creating process config file
                call E:\CommandLine\Script\create-process-config.bat
                echo Initiating data extract
                call E:\CommandLine\Script\DataExtract.bat

  Step 4 (Optional) - Additionally, if you want to add datetime stamp in the file that is extracted, you can write below script in a bat file (renamefile.bat) and call that file in the end of mainProcess.bat:
             
               echo Initiating File Renaming
               set d=%date:~-4,4%%date:~-7,2%%date:~0,2%
               set d=%d: =_%
               set t=%time:~0,2%%time:~3,2%%time:~6,2%
               set t=%t: =0%
               RENAME "<put file path here>DataExtract.csv" "DataExtract_%d%_EST_%t%.csv"

   Step 5 - Now, schedule the mainProcess.bat file using windows scheduler. This method is dynamic, so it will run for any frequency, so it can be scheduled for every 5 minutes, hourly or any given frequency because it will put last data extract run time and current time in the SOQL.

Method 2 - SOQL

In this case following queries can be used for different scenarios:
  • Scenario 1 - If daily export is required to run at 18:00 (6:00 PM) daily to export the data that is created after 6:00 PM yesterday. 
                 To achieve this, schedule the process to run daily at 18:00 with following SOQL:
                      
                 SELECT Id, Name, OwnerId FROM sObjectName WHERE (CreatedDate = YESTERDAY AND HOUR_IN_DAY(CreatedDate) >= 18) OR (CreatedDate = TODAY AND HOUR_IN_DAY(CreatedDate) < 18)

          Above scenario can be implemented to export the data daily at N:00 hour.
  • Scenario 2 - If daily export is required for every hour. 
                To achieve this, schedule the process for every hour with following SOQL:
                 
                 SELECT Id, Name, OwnerId FROM sObjectName WHERE CreatedDate = TODAY AND HOUR_IN_DAY(CreatedDate) >= N-1 AND HOUR_IN_DAY(CreatedDate) < N

                 In the above query replace N with the time when you are scheduling the process, so you have to schedule the process 24 times for daily. See below examples:
      • Schedular at 9:00 AM will export the data for last 1 hour using below SOQL:
                              SELECT Id, Name, OwnerId FROM sObjectName WHERE CreatedDate = TODAY AND HOUR_IN_DAY(CreatedDate) >= 8 AND HOUR_IN_DAY(CreatedDate) < 9
      • Schedular at 10:00 PM (22:00) will export the data for last 1 hour using below SOQL:
                              SELECT Id, Name, OwnerId FROM sObjectName WHERE CreatedDate =   TODAY AND HOUR_IN_DAY(CreatedDate) >= 21 AND HOUR_IN_DAY(CreatedDate) < 22

Now using the above methods you can get the incremental/delta export of data from Salesforce, the preferred method should be the powershell script one.

Happy Coding!!

Comments

Popular posts from SFDC Drona

Retrieve IP Address for the Guest Users in Lightning Community

Email Alert to Parent object's email field without apex