Using Fiber v2.38.1, which in turn use fasthttp/websocket v1.5.0.
App crashed with following error:
2022-10-15T04:05:42.983563+00:00 app[web.1]: time="2022-10-15T04:05:42Z" level=info msg="close msg received: &{418 close}"
2022-10-15T04:05:42.983564+00:00 app[web.1]: conn is nil: false, msg: &{418 close <nil> conn 0}
2022-10-15T04:05:42.986035+00:00 app[web.1]: panic: runtime error: invalid memory address or nil pointer dereference
2022-10-15T04:05:42.986035+00:00 app[web.1]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x8db967]
2022-10-15T04:05:42.986036+00:00 app[web.1]:
2022-10-15T04:05:42.986037+00:00 app[web.1]: goroutine 86 [running]:
2022-10-15T04:05:42.986037+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).beginMessage(0x0, 0xc000348f60, 0x1)
2022-10-15T04:05:42.986037+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/[email protected]/conn.go:479 +0x27
2022-10-15T04:05:42.986038+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).NextWriter(0x0, 0x1)
2022-10-15T04:05:42.986038+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/[email protected]/conn.go:520 +0x45
2022-10-15T04:05:42.986039+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).WriteJSON(0xe79a80?, {0xc143a0, 0xc00041e600})
2022-10-15T04:05:42.986039+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/[email protected]/json.go:24 +0x45
2022-10-15T04:05:42.986039+00:00 app[web.1]: github.com/kuchaguangjie/go-fit/model.(*WsClient).WriteMsg(0xc00056a280, 0xc00041e600)
2022-10-15T04:05:42.986040+00:00 app[web.1]: /tmp/build_0b2fe533/model/ws_model.go:97 +0xf9
2022-10-15T04:05:42.986040+00:00 app[web.1]: github.com/kuchaguangjie/go-fit/controller.(*ContentController).Ws.func1.2()
2022-10-15T04:05:42.986041+00:00 app[web.1]: /tmp/build_0b2fe533/controller/content_ctl.go:623 +0x171
2022-10-15T04:05:42.986041+00:00 app[web.1]: created by github.com/kuchaguangjie/go-fit/controller.(*ContentController).Ws.func1
2022-10-15T04:05:42.986042+00:00 app[web.1]: /tmp/build_0b2fe533/controller/content_ctl.go:608 +0x10cc
2022-10-15T04:05:43.113930+00:00 heroku[web.1]: Process exited with status 2
It crashed at conn.go:479, in function beginMessage():
if c.writer != nil {
I've checked that c is not nil (as shown in log), so how it crashed ...
BTW, it's deployed on heroku, I saw following log on heroku:
2022-10-15T04:59:30.344791+00:00 heroku[router]: at=error code=H15 desc="Idle connection"
Not sure is it relevent.
Update: relevant code
- A separate goroutine that handle close msg (part of
content_ctl.go):
// handle close - from a channel,
// TODO: use goroutine pool ?
go func() {
if r := recover(); r != nil { // TODO: is this proper ?,
fmt.Printf("recover from panic: %v\n", r)
}
closeMsg := <-(*client).CloseChan // handle close,
log.Infof("close msg received: %v", closeMsg)
// send close msg,
closeResp := &model.WsResp{
Status: closeMsg.HttpCode,
Source: model.RespSourceConn,
Msg: closeMsg.Msg,
}
log.Infof("is conn nil: %v, msg: %v\n", client.WsConn == nil, closeResp)
err = client.WriteMsg(closeResp)
if err != nil {
log.Errorf("error - on write close msg: %v", closeResp)
}
ctl.cwm.CloseAndCleanup(client.Id) // close & clean up,
}()
- Definition of
WsClientstruct, and its methodWriteMsg()(part ofws_model.go):
// websocket client,
type WsClient struct {
WsUser
ContentUuid string `json:"contentUuid"` // content uuid id,
WsConn *websocket.Conn `json:"-"` // websocket connection,
CloseChan chan *CloseMsg `json:"-"` // close channel,
WriteLock *sync.Mutex // write lock, to avoid concurrent write to ws connection, which will cause panic, acquire lock when write or close,
}
// write ws msg, with lock,
func (wsc *WsClient) WriteMsg(msg *WsResp) error {
if wsc.WsConn == nil {
return errutil.ErrNoConn
}
wsc.WriteLock.Lock()
err := wsc.WsConn.WriteJSON(msg)
wsc.WriteLock.Unlock()
return err
}
CloseMsgstruct (part ofws_model.go):
// close msg sent to close chan,
type CloseMsg struct {
HttpCode int `json:"httpCode"`
Msg string `json:"msg"`
}
Problem status:
- Actually, before asking the question I've solved the problem by checking the
httpCodein theCloseMsg, if it's418, then I won't send the close msg. - Though the issue is gone for now, but still I want to know the actual reason of the problem, so I asked the question.
Some background:
- I've several possible value for
CloseMsg'sHttpCode.418is the default one, it happens only when unexpected io issue happen on the connection (e.g heroku terminate the connection due to idle ?).- For other cases, e.g if I want to kick out a connection, then I will send other http code.
- From the log, I found the problem happens only when the http code from close msg is
418. - BTW, though
418might not be the best http code to use, (it's from a joke), but that doesn't effect the problem I guess. - And, In
WsClient'sWriteMsg()method, I've always checked whether the conn nil, before send msg.
Update - more log from heroku
2022-10-16T05:19:03.088622+00:00 heroku[router]: at=error code=H15 desc="Idle connection" method=GET path="/api/v1/content/ws/RgPx8znal7AJb
2022-10-16T05:19:03.065493+00:00 app[web.1]: time="2022-10-16T05:19:03Z" level=info msg="error - on read: websocket: close 1006 (abnormal closure): unexpected EOF"
2022-10-16T05:20:57.758379+00:00 app[web.1]: time="2022-10-16T05:20:57Z" level=info msg="cleanup client, id = 'wscid-ee670cc5-4100-49d6-9857-8284d93a6d33'"
2022-10-16T05:20:57.758505+00:00 app[web.1]: time="2022-10-16T05:20:57Z" level=info msg="close msg received: &{418 close}"
Seems heroku will kill the websocket connection if there is no msg for a while. That's when 418 appear.
And, I've asked a question about heroku killing the websocket connection on SO: heroku keep killing idle connections (websocket)
This is a fasthttp bug, and fixed in
v1.5.1-rc.8refer: https://github.com/fasthttp/websocket/issues/30#issuecomment-1326860975