BCP / Bulk Insert fails (tab-delimited file)


I have been trying to import data (tab-delimited) into SQL Server. The source data is exported from IBM Cognos. The data can be downloaded from: sample data

I have tried BCP and BULK INSERT, but neither helped. The original data file contains a header row (which needs to be skipped).

================================== Schema:

CREATE TABLE [dbo].[DIM_Assessment](
    [QueryType] [nvarchar](4000) NULL,
    [QueryDate] [nvarchar](4000) NULL,
    [APUID] [nvarchar](4000) NULL,
    [AssessmentID] [nvarchar](4000) NULL,
    [ICDCode] [nvarchar](4000) NULL,
    [ICDName] [nvarchar](4000) NULL,
    [LoadDate] [nvarchar](4000) NULL
) ON [PRIMARY]
GO

============================= Format File generated using the following command

bcp [dbname].dbo.dim_assessment format nul -c -f C:\config\dim_assessment.Fmt -S <IP>  -U sa -P Pwd

Content of the format file:

11.0
7
1       SQLCHAR             0       8000    "\t"     1     QueryType                    SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR             0       8000    "\t"     2     QueryDate                    SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR             0       8000    "\t"     3     APUID                        SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR             0       8000    "\t"     4     AssessmentID                 SQL_Latin1_General_CP1_CI_AS
5       SQLCHAR             0       8000    "\t"     5     ICDCode                      SQL_Latin1_General_CP1_CI_AS
6       SQLCHAR             0       8000    "\t"     6     ICDName                      SQL_Latin1_General_CP1_CI_AS
7       SQLCHAR             0       8000    "\r\n"   7     LoadDate                     SQL_Latin1_General_CP1_CI_AS

=============================

I tried importing the data using both BCP and BULK INSERT; however, neither of them worked.

bcp [dbname].dbo.dim_assessment IN C:\dim_assessment.dat -f C:\config\dim_assessment.Fmt -S <IP>  -U sa -P Pwd

BULK INSERT dim_assessment FROM '\\dbserver\DIM_Assessment.dat'
WITH (
  DATAFILETYPE = 'char',
  FIELDTERMINATOR = '\t',
  ROWTERMINATOR = '\r\n'
);
GO
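
For reference, neither command above skips the header row on its own; bcp's -F switch and BULK INSERT's FIRSTROW option are the knobs for that. A minimal sketch of both variants, reusing the same table, format file and paths as above:

bcp [dbname].dbo.dim_assessment IN C:\dim_assessment.dat -F 2 -f C:\config\dim_assessment.Fmt -S <IP> -U sa -P Pwd

BULK INSERT dim_assessment FROM '\\dbserver\DIM_Assessment.dat'
WITH (
  DATAFILETYPE = 'char',
  FIRSTROW = 2,
  FIELDTERMINATOR = '\t',
  ROWTERMINATOR = '\r\n'
);
GO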

Thank you in advance for your help!

There are 4 answers below.

============================= Answer 1

I just saved your file as .CSV and bulk inserted it with the following statement.

BULK INSERT dim_assessment FROM 'C:\Blabla\TestFile.csv'
WITH (
  FIRSTROW = 2,
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n'
);
GO

Returned Message

(22587 row(s) affected)

Loaded data: [screenshot of the imported rows]

Just notice that some data from ICDName has overflowed into the LoadDate column. Use the | pipe character as the delimiter instead and run the same bulk insert statement with FIELDTERMINATOR = '|', and happy days.
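
A sketch of that pipe-delimited variant, assuming the source has been re-exported (or re-saved) with | as the field separator to a file such as C:\Blabla\TestFile.txt (path and file name are placeholders):

BULK INSERT dim_assessment FROM 'C:\Blabla\TestFile.txt'
WITH (
  FIRSTROW = 2,
  FIELDTERMINATOR = '|',
  ROWTERMINATOR = '\n'
);
GO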

============================= Answer 2

Your input file is in a terrible format.

Your format file and your BULK INSERT command both state that the end of a row should be a carriage return and line feed combination, and that there are seven columns of data. However, if you open your CSV file in Notepad, you will quickly see that the carriage returns and line feeds are not observed correctly on Windows (meaning they must be something other than precisely \r\n). You can also see that there aren't actually seven columns of data, but five:

QueryType   QueryDate   APUID   AssessmentID    ICDCode ICDName LoadDate
PPIC    2013-11-20 10:23:14 11431   10963       Tremors
PPIC    2013-11-20 10:23:14 11431   11299       THUMB PAIN
PPIC    2013-11-20 10:23:14 11431   11348       Environmental allergies
...

Just looking at it visually you can tell it isn't right, and you need to get a better source file before throwing it over the wall at SQL Server and expecting it to handle it smoothly:

[screenshot of the raw file]
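
If the export turns out to use bare line feeds rather than \r\n (common for files produced on non-Windows systems), that half of the problem can be expressed to SQL Server with the hex row terminator 0x0a. A sketch only, and it does nothing about the missing columns described above:

BULK INSERT dim_assessment FROM '\\dbserver\DIM_Assessment.dat'
WITH (
  DATAFILETYPE = 'char',
  FIRSTROW = 2,
  FIELDTERMINATOR = '\t',
  ROWTERMINATOR = '0x0a'
);
GO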

============================= Answer 3

Opening the file via Excel shows the following:

  • There are indeed 7 column headers
  • Only the first six of them are populated with data
  • Columns 1, 2 and 3 hold identical values
  • There is some confusing data: the fifth column can be empty, filled with numbers, or filled with text.

I guess that, under these conditions, bulk insert might not work properly. As Excel seems to handle your file quite cleanly, you should think about an extra step: from CSV to Excel, and then from Excel to your database (one possible route is sketched below).

[screenshot of the file opened in Excel]
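
One possible way to take that Excel detour, assuming the workbook is saved on the server, the Microsoft ACE OLE DB provider is installed there, and ad hoc distributed queries are enabled (path and sheet name are placeholders):

INSERT INTO [dbo].[DIM_Assessment]
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                'Excel 12.0 Xml;HDR=YES;Database=C:\Data\DIM_Assessment.xlsx',
                'SELECT * FROM [Sheet1$]');
GO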

============================= Answer 4

OK, so this was a seemingly simple task: push delimited data from a flat file to SQL Server. I thought BCP was the way to go (I had used it earlier and was successful).

A quick rundown of what was suggested:
a. fix the source file
b. save the source data in native Excel format
c. save the source data as pipe-delimited data

I tried all the options; they added extra steps to my process, but they were doable.

I stumbled upon the Invoke-Sqlcmd and Import-Csv cmdlets in PowerShell. It turns out I can import the data using PowerShell directly. It is a bit slow at this time, but I can live with that for now.

# Read the tab-delimited export; the first line supplies the column names
$DATA = Import-Csv dim_assessment.CSV -Delimiter "`t"

FOREACH ($LINE in $DATA)
{
    # Quote each value for the INSERT statement
    $QueryType    = "'" + $LINE.QueryType    + "'"
    $QueryDate    = "'" + $LINE.QueryDate    + "'"
    $APUID        = "'" + $LINE.APUID        + "'"
    $AssessmentID = "'" + $LINE.AssessmentID + "'"
    $ICDCode      = "'" + $LINE.ICDCode      + "'"

    # ICDName can contain apostrophes, so double them up before quoting
    $ICDName = $LINE.ICDName.Replace("'", "''")
    $ICDName = "'" + $ICDName + "'"

    $LoadDate = "'" + $LINE.LoadDate + "'"

    $SQLHEADER = "INSERT INTO [dim_assessment] ([QueryType],[QueryDate],[APUID],[AssessmentID],[ICDCode],[ICDName],[LoadDate])"
    $SQLVALUES = "VALUES ($QueryType,$QueryDate,$APUID,$AssessmentID,$ICDCode,$ICDName,$LoadDate)"
    $SQLQUERY  = $SQLHEADER + " " + $SQLVALUES

    # Invoke-Sqlcmd uses -Username/-Password, not bcp-style -U/-P
    Invoke-Sqlcmd -Query $SQLQUERY -ServerInstance HA -Username sa -Password Pwd
}
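
If the row-by-row inserts ever get too slow, the same idea can be done in one shot with System.Data.SqlClient.SqlBulkCopy. A rough sketch, assuming the same tab-delimited file, the HA instance and sa/Pwd credentials used above, with the database name as a placeholder:

# Load the file into an in-memory DataTable whose columns mirror DIM_Assessment
$rows  = Import-Csv dim_assessment.CSV -Delimiter "`t"
$table = New-Object System.Data.DataTable
foreach ($name in 'QueryType','QueryDate','APUID','AssessmentID','ICDCode','ICDName','LoadDate') {
    [void]$table.Columns.Add($name)
}
foreach ($row in $rows) {
    [void]$table.Rows.Add($row.QueryType, $row.QueryDate, $row.APUID, $row.AssessmentID, $row.ICDCode, $row.ICDName, $row.LoadDate)
}

# Push all rows in a single bulk copy instead of one INSERT per line
$bulk = New-Object System.Data.SqlClient.SqlBulkCopy('Server=HA;Database=YourDb;User ID=sa;Password=Pwd')
$bulk.DestinationTableName = 'dbo.DIM_Assessment'
$bulk.WriteToServer($table)
$bulk.Close()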

Thanks for all your help!