gocb: bulk insert into Couchbase using golang - not all data is being inserted


I am creating JSON data (approx. 5000 records) in my SQL Server instance and trying to insert it into a Couchbase bucket using a bulk insert operation in golang. The problem is that not all of the data is being pushed; only a random number of records (between 2000 and 3000) ends up being inserted.

The code is:

package main

import (
    "database/sql"
    "log"
    "fmt"
    _ "github.com/denisenkom/go-mssqldb"
    "gopkg.in/couchbase/gocb.v1"
)


func main() {
    var (
        ID string
        JSONData string
    )

    var items []gocb.BulkOp      
    cluster, _ := gocb.Connect("couchbase://localhost")
    bucket, _ := cluster.OpenBucket("example", "")

    condb, _ := sql.Open("mssql", "server=.\\SQLEXPRESS;port=62587; user id=<id>;password=<pwd>;")

    // Get approx 5000 Records From SQL Server in JSON format
    rows, err := condb.Query("Select id, JSONData From User")
    if err != nil {
        log.Fatal(err)
    }

    for rows.Next() {
        _ = rows.Scan(&ID, &JSONData)
        items = append(items, &gocb.UpsertOp{Key: ID, Value: JSONData})
    }

    //Bulk Load JSON into Couchbase
    err = bucket.Do(items)
    if err != nil {
        fmt.Println("ERRROR PERFORMING BULK INSERT:", err)
    }

    _ = bucket.Close() 
}

Please tell me where I went wrong here.

FYI, the columns ID and JSONData in the SQL query contain valid keys and JSON strings. Also, any advice on improving the way it's coded would be appreciated.

2 Answers

Best Answer

I had missed checking the Err field of the InsertOp type. When I did, I found that the client's operation queue overflows once the data exceeds its capacity, and printing that field shows the message 'queue overflowed'.

for i := range items {
    // Type-assert to the same op type that was appended
    // (*gocb.UpsertOp if you used UpsertOp as in the question's code).
    fmt.Println(items[i].(*gocb.InsertOp).Err)
}

Attached screenshot of the error message is here: Err.png

Is there any workaround for this limitation apart from splitting the data into a number of batches and performing multiple bulk inserts?
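For reference, the batching fallback mentioned above looks roughly like this (a sketch only; the helper name upsertInBatches and the batch size are assumptions, not from the original code):

// upsertInBatches splits the ops into fixed-size batches so that each Do()
// call stays well below the client's internal operation queue capacity.
func upsertInBatches(bucket *gocb.Bucket, items []gocb.BulkOp, batchSize int) {
    for start := 0; start < len(items); start += batchSize {
        end := start + batchSize
        if end > len(items) {
            end = len(items)
        }
        batch := items[start:end]

        if err := bucket.Do(batch); err != nil {
            fmt.Println("error performing bulk upsert:", err)
        }

        // Do() only reports a top-level error, so inspect each op's Err field too.
        for _, op := range batch {
            if upsert, ok := op.(*gocb.UpsertOp); ok && upsert.Err != nil {
                fmt.Println("failed key:", upsert.Key, upsert.Err)
            }
        }
    }
}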

Answer

Why not try using a number of goroutines and a channel to synchronize them? Create a channel of items that need to be inserted, then start 16 or more goroutines which read from the channel, perform the insert, and continue. The most obvious bottleneck for a strictly serial inserter is the network round-trip; if many goroutines perform inserts at once, you will vastly improve throughput. A rough sketch of this pattern follows.
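Here is one way that worker-pool idea could look, assuming gocb v1, a hypothetical record struct, and 16 workers (all names here are illustrative, not from the original post):

// record pairs a document key with its JSON body, as scanned from SQL Server.
type record struct {
    ID       string
    JSONData string
}

// insertWithWorkers starts a fixed pool of goroutines that all read from the
// same channel and upsert documents one at a time, so the network round-trips
// overlap instead of running strictly one after another. Requires "sync" and
// "log" in addition to the imports already used in the question.
func insertWithWorkers(bucket *gocb.Bucket, records <-chan record) {
    const numWorkers = 16
    var wg sync.WaitGroup

    for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for r := range records {
                if _, err := bucket.Upsert(r.ID, r.JSONData, 0); err != nil {
                    log.Println("upsert failed for key", r.ID, ":", err)
                }
            }
        }()
    }
    wg.Wait()
}

The rows.Next() loop would then send each scanned row onto the channel and close the channel once the query is exhausted.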

P.S. The issue with the bulk insert not inserting every document is a strange one; I am going to look into it. As @ingenthr mentioned above, though, is it possible that you are doing upserts and have multiple operations for the same keys?
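If duplicate keys are a possibility, a quick check while building the ops would confirm it, since duplicate IDs would make the final document count smaller than the row count (this snippet is an illustrative sketch, not part of the original post):

seen := make(map[string]int) // counts how many times each ID appears
for rows.Next() {
    _ = rows.Scan(&ID, &JSONData)
    seen[ID]++
    items = append(items, &gocb.UpsertOp{Key: ID, Value: JSONData})
}
for id, n := range seen {
    if n > 1 {
        fmt.Println("duplicate key:", id, "appears", n, "times")
    }
}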

Original comment (posted in the answers section in error): Are you getting any error output from the bulk insert?