I am trying to load a CSV file from S3 into an Apache Pinot table. One column contains a semicolon inside its value, for example: `TestCSV; displayType`
While loading this data into the Pinot table, I get the following error: `java.lang.IllegalArgumentException: Cannot read single-value from Object[] : [TestCSV, displayType]`
From the error, it looks like the semicolon in the data is being turned into a comma (the value shows up as `[TestCSV, displayType]`), and that is what causes the error above.
Here is sample CSV data for reference:
| column1 | column2 | column3 | column4 | column5 |
|---|---|---|---|---|
| 925aa-1 | 00d925 | TestCSV; displayType | testbox | sample.com |
And here is my `jobSpec.yml`:
```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://********/******/******/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: 's3://********/******/******/segments'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: us-east-1
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    fileFormat: 'csv'
    delimiter: ','
tableSpec:
  tableName: 'testload'
  schemaURI: 'http://localhost:9000/tables/testload/schema'
  tableConfigURI: 'http://localhost:9000/tables/testload'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
pushJobSpec:
  # pushAttempts: number of attempts for push job, default is 1, which means no retry.
  pushAttempts: 2
  # pushRetryIntervalMillis: retry wait Ms, default to 1 second.
  pushRetryIntervalMillis: 1000
```
I want to load the data with the semicolon kept in the value. Can anyone help me with this?
Note: The data loads without issues once I remove the semicolon.
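One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: `[TestCSV, displayType]` looks like how a Java `Object[]` is printed, which would mean the value is not being converted to a comma but split into two elements. Pinot's `CSVRecordReaderConfig` has a `multiValueDelimiter` setting whose default is `;`, so the reader may be treating `column3` as a multi-value field. A minimal sketch of a `recordReaderSpec` that overrides it, assuming `|` never appears anywhere in the data (the choice of `|` is purely illustrative):

```yaml
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    fileFormat: 'csv'
    delimiter: ','
    # Assumption: the ';' split comes from the default multi-value delimiter.
    # Hypothetical choice: '|' is assumed never to appear in any column,
    # so 'TestCSV; displayType' would stay a single value.
    multiValueDelimiter: '|'
```

If the schema also has genuine multi-value columns that rely on `;`, this change would affect them too. Some Pinot versions may offer other CSV reader options for this case, so it is worth checking the CSV input-format documentation for the version in use.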