MXNet AdamW optimizer

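The Adam optimizer has a known flaw when combined with weight decay. If the decay is implemented as an L2 penalty added to the gradient, Adam's adaptive per-parameter scaling rescales the decay term too, so weights with a large gradient history end up regularized less than intended. In 2018, the AdamW optimizer (Loshchilov & Hutter, "Decoupled Weight Decay Regularization") was proposed to fix this by decoupling weight decay from the gradient-based update.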
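For concreteness, here are the two update rules in the common formulation, with learning rate η, decay coefficient λ and bias-corrected moment estimates. The notation is mine; the paper additionally includes a schedule multiplier that is omitted here. The only difference is where λθ enters: with Adam plus L2 it is folded into the gradient and therefore divided by √v̂ along with everything else, while AdamW subtracts it from the weights directly.

```latex
\begin{aligned}
&\text{Adam + L2:} && g_t = \nabla f(\theta_{t-1}) + \lambda\,\theta_{t-1} \\
&\text{AdamW:}     && g_t = \nabla f(\theta_{t-1}) \\
&\text{both:}      && m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \quad
                      v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \\
&                  && \hat m_t = \frac{m_t}{1-\beta_1^t}, \quad
                      \hat v_t = \frac{v_t}{1-\beta_2^t} \\
&\text{Adam + L2:} && \theta_t = \theta_{t-1} - \eta\,\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon} \\
&\text{AdamW:}     && \theta_t = \theta_{t-1} - \eta\left(\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon} + \lambda\,\theta_{t-1}\right)
\end{aligned}
```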

Is there a standard way to use AdamW in the MXNet framework (Python API)? There is an mxnet.optimizer.Adam class, but no mxnet.optimizer.AdamW (checked with the mxnet-cu102==1.6.0 and mxnet==1.5.0 packages).

P.S. I asked this question on the MXNet forum and on datascience.stackexchange.com, but to no avail.

Short answer: there isn't a standard way to use AdamW in Gluon yet, but there is some existing work in that direction that would make it relatively easy to add.
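Until then, one option is a custom optimizer. Below is a minimal sketch of the AdamW update logic, written in plain NumPy so it runs without MXNet. The method names `create_state(index, weight)` and `update(index, weight, grad, state)` mirror the MXNet 1.x `mx.optimizer.Optimizer` interface. In MXNet itself you would presumably subclass `mx.optimizer.Optimizer`, register it with `mx.optimizer.register`, use the base class's learning-rate and weight-decay helpers, and operate on NDArrays instead of NumPy arrays. The class name `AdamWSketch` and its defaults are hypothetical, not an MXNet API.

```python
import numpy as np


class AdamWSketch:
    """Hypothetical AdamW (decoupled weight decay) optimizer, NumPy only.

    Method names mirror MXNet 1.x's Optimizer interface, but this class
    does not depend on MXNet and is only an illustration.
    """

    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999,
                 epsilon=1e-8, wd=0.01):
        self.lr = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.wd = wd
        self._step = {}  # per-parameter update counts, keyed by index

    def create_state(self, index, weight):
        # First and second moment estimates for one parameter.
        return np.zeros_like(weight), np.zeros_like(weight)

    def update(self, index, weight, grad, state):
        # Updates weight and state in place, like an MXNet optimizer would.
        t = self._step.get(index, 0) + 1
        self._step[index] = t
        mean, var = state
        # Moments use the raw gradient: no wd * weight term is mixed in.
        mean[:] = self.beta1 * mean + (1 - self.beta1) * grad
        var[:] = self.beta2 * var + (1 - self.beta2) * grad ** 2
        m_hat = mean / (1 - self.beta1 ** t)
        v_hat = var / (1 - self.beta2 ** t)
        # Decoupled decay: shrink the previous weights, scaled by lr only.
        weight[:] -= self.lr * (m_hat / (np.sqrt(v_hat) + self.epsilon)
                                + self.wd * weight)


opt = AdamWSketch(learning_rate=0.1, wd=0.1)
w = np.array([1.0, -2.0])
state = opt.create_state(0, w)
opt.update(0, w, np.zeros_like(w), state)  # zero gradient: only decay acts
print(w)  # each weight shrinks by a factor of (1 - lr * wd)
```

The zero-gradient call shows the point of the decoupling: the weights shrink by exactly `lr * wd` of their value regardless of the moment estimates, whereas with L2 folded into Adam's gradient the first step would move each weight by roughly `lr` in the direction of its sign.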

Longer answer:

Please let me know if you get this working, as I'd love to be able to use that as well.