A method for training deep networks comprises calculating (201) an intermediate weight based on an existing quantized weight and an existing quantized residual parameter, and quantizing (202) the intermediate weight to derive an updated weight. The method further comprises generating (203) an updated residual parameter using the intermediate weight and the updated weight.