Import Excel file into MSSQL via Python, using an SQL Agent job


Task specs:

Import Excel file(s) into MSSQL database(s) using Python, in a parametrized manner, via SQL Server Agent job(s). Added requirements: it must be possible to set the parameter values and/or run the job steps from SQL (a query or a stored procedure), and the solution must not use the Access Database Engine or any code that relies on such drivers (in any wrapping).

1 Answer
First. Let's get some preparatory stuff out of the way. We will need to adjust a PowerShell setting. Run Windows PowerShell as Administrator and do:

Set-ExecutionPolicy -ExecutionPolicy Unrestricted

Second. Some assumptions, for reasons of clarity. And those are:

1a. You have at least one instance of SQL Server 2017 or later (Developer / Enterprise / Standard edition) installed and running on your machine.
1b. You have not bootstrapped the installation of this SQL instance so as to exclude Integration Services (SSIS).
1c. SQL Server Agent is running and bound to this SQL instance.
1d. You have some SSMS installed.
2a. There is at least one database attached to this instance (if not, create one; please refrain from using in-memory filegroups for this exercise, I have not tested on those).
2b. There are no database-level DML triggers that log all data changes into a designated table.
3. There is no active Server Audit Specification for this database logging everything we do.
4. Replication is not enabled (I mean the proper MSSQL replication feature, not scripts by 3rd-party apps).
5. You are Windows-authenticated into the chosen SQL instance, and your login, database mappings and privileges are sufficient for at least table creation and basic operations.

For 2b and 3 it's just because I have not tested with those on; for number 4 it definitely won't work with that on.
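If you want a quick sanity check on a couple of these assumptions, something along the lines of the following works (it queries sys.dm_server_services, which needs the VIEW SERVER STATE permission, so skip it if you don't have that):

--optional sanity check: edition/version of the instance, and whether SQL Server Agent is running
select serverproperty('Edition') as [edition], serverproperty('ProductVersion') as [product_version];

select servicename, status_desc
from sys.dm_server_services
where servicename like 'SQL Server Agent%';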

Third. We are going to need some kind of Python script to do this, right? OK, let's make one.

import pandas as pd
import sqlalchemy as sa
import urllib.parse #explicit submodule import; a bare "import urllib" does not reliably expose urllib.parse
import sys
import warnings
import os
import re
import time

#COMMAND LINE PARAMETERS

server = sys.argv[1]
database = sys.argv[2]
ExcelFileHolder = sys.argv[3]
SQLTableName = sys.argv[4]

#END OF COMMAND LINE PARAMETERS

excel_sheet_number_left_to_right = 0

warnings.filterwarnings('ignore')

driver = "SQL Server Native Client 11.0"
params = "DRIVER={%s};SERVER=%s;DATABASE=%s;Trusted_Connection=yes;QuotedID=Yes;" % (driver, server, database) #added the explicit "QuotedID=Yes;" to ensure no issues with column names
params = urllib.parse.quote_plus(params) #urllib.parse.quote_plus for Python 3
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s?charset=utf8" % params) #charset is cool to have here
conn = engine.connect()

def execute_sql_trans(sql_string, log_entry):
    #sa.text() keeps this working on newer SQLAlchemy versions that no longer accept raw SQL strings
    with conn.begin() as trans:
        result = conn.execute(sa.text(sql_string))
    #'log' would need to be an open file handle; the only caller below passes an empty log_entry, so this branch is skipped
    if len(log_entry) >= 1:
        log.write(log_entry + "\n")
    return result

excelfilesCursor = {}

def process_excel_file(excelfile, excel_sheet_name, tableName, withPyIndexOrSQLIndex, orderByCandidateFields):
    withPyIndexOrSQLIndex = 0 #hard-coded here, so the value passed in is effectively ignored (one of the "extra" bits mentioned below)

    excelfilesCursor.update({tableName: withPyIndexOrSQLIndex})

    df = pd.read_excel(open(excelfile,'rb'), sheet_name=excel_sheet_name)
    now = time.time()
    mlsec = repr(now).split('.')[1][:3]
    log_string = "Reading file \"" + excelfile + "\" to memory: " + str(time.strftime("%Y-%m-%d %H:%M:%S.{} %Z".format(mlsec), time.localtime(now))) + "\n"
    print(log_string)
    
    df.to_sql(tableName, engine, if_exists='replace', index_label='index.py')

    now = time.time()
    mlsec = repr(now).split('.')[1][:3]
    log_string = "Writing file \"" + excelfile + "\", sheet " +str(excel_sheet_name)+ " to SQL instance " +server+ ", into ["+database+"].[dbo].["+tableName+"]: " + str(time.strftime("%Y-%m-%d %H:%M:%S.{} %Z".format(mlsec), time.localtime(now))) + "\n"
    print(log_string)

def convert_datetimes_to_dates(tableNameParam):
    #assumes a stored procedure named [convert_datetimes_to_dates] exists in the target database; another optional extra, not called in this script
    sql_string = "exec [convert_datetimes_to_dates] '"+tableNameParam+"';"
    execute_sql_trans(sql_string, "")

process_excel_file(ExcelFileHolder, excel_sheet_number_left_to_right, SQLTableName, 0, None)

sys.exit(0)

Ok, you may or may not notice that my script contains some extra defs; I sometimes use them for convenience, and you may as well ignore them. Save the Python script somewhere nice, say C:\PythonWorkspace\ExcelPythonToSQL.py. Also, needless to mention, you will need some Python modules in your venv: pandas, SQLAlchemy and pyodbc (for the mssql+pyodbc dialect), plus an Excel engine for read_excel such as openpyxl or xlrd. The ones you don't already have, you obviously need to pip install.

Fourth. Connect to your instance in SSMS and create a new Agent job. Let's call it "ExcelPythonToSQL". Add a new step, let's call it "PowerShell parametrized script", set its Type to PowerShell, and place this code inside it:

$pyFile="C:\PythonWorkspace\ExcelPythonToSQL.py"
$SQLInstance="SomeMachineName\SomeNamedSQLInstance"
#or "." or "localhost" or just the computer name, if your SQL instance is a default instance, i.e. not a named one
$dbName="SomeDatabase"
$ExcelFileFullPath="C:\Temp\ExampleExcelFile.xlsx"
$targetTableName="somenewtable"

C:\ProgramData\Miniconda3\envs\YOURVENVNAMEHERE\python $pyFile $SQLInstance $dbName $ExcelFileFullPath $targetTableName

Save the step and the job.
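As an aside, if you prefer to stay in T-SQL rather than click through the SSMS dialogs, the same job can be scripted with the msdb job procedures; a rough sketch (same names as above, with the PowerShell command abbreviated) would be:

--rough sketch: create the same job and step from T-SQL instead of the SSMS dialogs
exec msdb.dbo.sp_add_job        @job_name = N'ExcelPythonToSQL';
exec msdb.dbo.sp_add_jobstep    @job_name = N'ExcelPythonToSQL',
                                @step_name = N'PowerShell parametrized script',
                                @subsystem = N'PowerShell',
                                @command = N'<paste the full PowerShell block from above here>';
exec msdb.dbo.sp_add_jobserver  @job_name = N'ExcelPythonToSQL', @server_name = N'(local)';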

Now let's wrap it in something easier to handle. Because remember, this job and step are not like an SSIS step, where you could potentially alter the parameter values in its configuration tab. You don't want to open the properties of the job and the step every time just to specify a different Excel file or target table. So.

Ah, also, do me a solid and do this little trick. Make a small alteration in the step, anything at all, and then instead of clicking OK do a Script Action to New Query Window. That way we can capture the GUID of the job without having to query for it.
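(If you would rather query for it anyway, it lives in msdb:)

--grabs the job guid directly, assuming the job name used above
select job_id
from msdb.dbo.sysjobs
where name = N'ExcelPythonToSQL';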

So now. Create a SP like so:

use [YourDatabase];
GO

create proc [ExcelPythonToSQL_amend_job_step_params](   @pyFile nvarchar(max),
                                                        @SQLInstance nvarchar(max),
                                                        @dbName nvarchar(max),
                                                        @ExcelFileFullPath nvarchar(max),
                                                        @targetTableName nvarchar(max)='somenewtable'
)
as
begin

declare @sql nvarchar(max);

set @sql = '
exec msdb.dbo.sp_update_jobstep @job_id=N''7f6ff378-56cd-4a8d-ba40-e9057439a5bc'', @step_id=1,
        @command=N''
$pyFile="'+@pyFile+'"
$SQLInstance="'+@SQLInstance+'"
$dbName="'+@dbName+'"
$ExcelFileFullPath="'+@ExcelFileFullPath+'"
$targetTableName="'+@targetTableName+'"

C:\ProgramData\Miniconda3\envs\YOURVENVGOESHERE\python $pyFile $SQLInstance $dbName $ExcelFileFullPath $targetTableName''
';
--print @sql;
exec sp_executesql @sql;

end

But inside it you must replace two things. One, the uniqueidentifier of the Agent job (the @job_id GUID), with the one you found by doing the trick I described earlier, yes, the one with the Script to New Query Window. Two, you must fill in the name of your Python venv in place of the word YOURVENVGOESHERE in the code. Cool.

Now we can play-test it with a simple script. Put something like this in a new query window:

use [YourDatabase];
GO

--to set parameters
exec [ExcelPythonToSQL_amend_job_step_params]   @pyFile='C:\PythonWorkspace\ExcelPythonToSQL.py',
                                                @SQLInstance='.',
                                                @dbName='YourDatabase',
                                                @ExcelFileFullPath='C:\Temp\ExampleExcelFile.xlsx',
                                                @targetTableName='somenewtable';

--to execute the job
exec msdb.dbo.sp_start_job N'ExcelPythonToSQL', @step_name = N'PowerShell parametrized script';

--let's test that the table is there and drop it.
if object_id('YourDatabase..somenewtable') is not null
begin
    select 'Table was here!' [test: table exists?];
    drop table [somenewtable];
end
else select 'NADA!' [test: table exists?];

You can run the set-parameters part, then the execution; be careful to then wait a little bit, a few seconds say, because calling sp_start_job like in this script is asynchronous. Then run the test part to make sure the table went in and to clean up.
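If you would rather not guess at the delay, a rough polling sketch like the one below (assuming the same job name as above) waits until the Agent reports that the run has stopped:

--rough sketch: poll Agent activity until the latest run of the job reports a stop time
declare @job_id uniqueidentifier = (select job_id from msdb.dbo.sysjobs where name = N'ExcelPythonToSQL');

while exists (  select 1
                from msdb.dbo.sysjobactivity ja
                where ja.job_id = @job_id
                  and ja.start_execution_date is not null
                  and ja.stop_execution_date is null )
begin
    waitfor delay '00:00:02';
end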

That's it. Obviously lots of variations are possible: in the job step we could call a batch file instead, or a PowerShell .ps1 file with the parameters inside it, and there are lots and lots of other ways of doing this. I merely described one in this post.