Using Spark, I have successfully loaded a Dataframe from a Wonderware Historian SQL DB (see this SO question).

Listing the column names:

df.columns

'TagName',
'Value',
'vValue',
'Quality', 'QualityDetail', 'OPCQuality',
'wwTagKey',
'wwRowCount',
'wwResolution', ...

I however when I execute an Action on the DataFrame an error is thrown:

df.select('TagName').head()

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 55) (10.252.13.132 executor driver): com.microsoft.sqlserver.jdbc.SQLServerException: An error occurred during the current command (Done status 0). Cannot execute the query "SELECT Tbl1001.TagName Col1005 FROM Runtime.dbo.History Tbl1001" against OLE DB provider "INSQL" for linked server "INSQL".

Any Action raises the same kind of error.

1

There are 1 best solutions below

0
On

Solution was to add a query:

query = "SELECT * FROM History WHERE TagName LIKE '%' AND wwRetrievalMode = 'Full'"

remote_table = (spark.read
  .format("sqlserver")
  .option("host", "mydomainname.com")
  .option("user", "myusername")
  .option("password", "mypassword")
  .option("encrypt", True)
  .option("trustServerCertificate", True)
  .option("query", query)
  .load()
)