We investigate the learning dynamics of fully-connected neural networks through the lens of the neural gradient signal-to-noise ratio (SNR), examining the behavior of first-order optimizers on non-convex objectives. Building on the drift/diffusion phases proposed in information bottleneck theory, we identify a third phase, termed "diffusion equilibrium" (DE): a stable training phase characterized by highly ordered neural gradients across the sample space. This phase is marked by an abrupt (first-order) transition, in which sample-wise gradients align and the SNR increases, and by stable optimizer convergence. Moreover, we find that when residuals are also homogeneous across the sample space during the DE phase, generalization improves, as the optimization steps are equally sensitive to each sample. Based on this observation, we propose a sample-wise re-weighting scheme that considerably improves residual homogeneity and generalization for quadratic loss functions by targeting problematic samples with large residuals and vanishing gradients. Finally, we explore the information compression phenomenon, pinpointing a significant saturation-induced compression of activations at the DE phase transition, driven by the directional alignment of sample-wise gradients. Interestingly, it is during this saturation of activations that the model converges, with deeper layers experiencing negligible information loss. Experiments on physics-informed neural networks (PINNs), whose inherent PDE-based interdependence of samples highlights the critical role of gradient agreement, support our findings and suggest that when both sample-wise gradients and residuals transition to an ordered state, convergence is faster and generalization improves. Identifying these phase transitions could improve deep learning optimization strategies, enhancing physics-informed methods and overall machine learning performance.
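
To make the quantities above concrete, the following is a minimal sketch, not the authors' implementation, of how a sample-wise gradient SNR and a residual-based re-weighting of a quadratic loss could be estimated for a small fully-connected network. The toy data, the SNR estimator, and the specific weighting rule (residual magnitude over gradient norm) are illustrative assumptions.

```python
# Minimal sketch: per-sample gradients, a gradient-SNR estimate, and a
# hypothetical re-weighting that emphasizes samples with large residuals
# but small (vanishing) gradients. Illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data and a small fully-connected network.
x = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)
y = torch.sin(3.0 * x)
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

def per_sample_grads(model, x, y):
    """Flattened gradient of the quadratic loss for each individual sample."""
    grads = []
    for xi, yi in zip(x, y):
        model.zero_grad()
        residual = model(xi.unsqueeze(0)) - yi.unsqueeze(0)
        loss = 0.5 * residual.pow(2).sum()
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        grads.append(g)
    return torch.stack(grads)                 # shape: (n_samples, n_params)

G = per_sample_grads(model, x, y)

# Gradient SNR: magnitude of the mean gradient (drift) relative to its
# standard deviation across samples (diffusion), averaged over parameters.
mean_g = G.mean(dim=0)
std_g = G.std(dim=0) + 1e-12
snr = (mean_g.abs() / std_g).mean()
print(f"mean gradient SNR across parameters: {snr.item():.3f}")

# Hypothetical re-weighting: up-weight samples with large residuals and
# small gradient norms, then renormalize so the mean weight is 1.
with torch.no_grad():
    residuals = (model(x) - y).abs().squeeze()
grad_norms = G.norm(dim=1) + 1e-12
weights = residuals / grad_norms
weights = weights * (len(weights) / weights.sum())
```

In a training loop, such weights would multiply the per-sample quadratic losses before averaging; how the weights are scheduled or clipped is a design choice not specified here.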