Mitigating catastrophic forgetting remains a fundamental challenge in incremental learning. This paper identifies a key limitation of the widely used softmax cross-entropy distillation loss: its inherent non-identifiability. To address this issue, we propose two complementary strategies: (1) adopting an imbalance-invariant distillation loss to mitigate the adverse effect of imbalanced weights during distillation, and (2) regularizing the original prediction/distillation loss with shift-sensitive alternatives, which render the optimization problem identifiable and proactively prevent imbalance from arising. These strategies form the foundation of five novel approaches that integrate seamlessly into existing distillation-based incremental learning frameworks such as LWF, LWM, and LUCIR. We validate the effectiveness of our approaches through extensive numerical experiments, demonstrating consistent improvements in predictive accuracy and substantial reductions in forgetting. For example, in a 10-task incremental learning setting on CIFAR-100, our methods improve the average accuracy of three widely used approaches (LWF, LWM, and LUCIR) by 11.8%, 11.5%, and 12.8%, respectively, while reducing their average forgetting rates by 16.5%, 16.8%, and 13.8%. Our code is publicly available at https://github.com/nexais/RethinkSoftmax.
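
As a brief illustration of the non-identifiability referred to above (a sketch of the standard shift-invariance argument, not a derivation taken from the paper): the softmax output is unchanged by adding a common constant to all logits, so distinct logit vectors can produce identical distillation targets.

$$
\mathrm{softmax}(z + c\mathbf{1})_k
= \frac{e^{z_k + c}}{\sum_j e^{z_j + c}}
= \frac{e^{z_k}}{\sum_j e^{z_j}}
= \mathrm{softmax}(z)_k
\quad \text{for any } c \in \mathbb{R}.
$$

Consequently, a softmax cross-entropy distillation loss cannot distinguish logits that differ only by such a shift, which is one standard sense in which the optimization problem is non-identifiable and in which a shift-sensitive regularizer can restore identifiability.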