Large language models (LLMs), despite their remarkable successes, remain expensive to deploy. Among various acceleration strategies, the key-value (KV) cache stands out as a crucial technique for speeding up LLM inference, yet it incurs substantial memory costs. To reduce the KV cache size, conventional methods often sacrifice accuracy or require additional calibration data, which restricts their feasibility in real-world LLM applications. Here, we introduce MPOQ, a novel data-free quantization technique based on matrix product operators (MPO) that effectively compresses the KV cache. The MPO decomposition factorizes the original matrix into a sequence of local tensors, transferring the quantization challenge from the original matrix to these local tensors and thereby reshaping the distribution of its outliers. Specifically, we find that outliers are predominantly concentrated in the smaller local tensors, whereas the larger tensors exhibit a more constrained value range. Leveraging this insight, we propose a mixed-precision strategy that applies low-bit quantization to the large tensor while preserving a high-precision representation for the smaller tensor. Extensive experiments on OPT, LLaMA, and Mistral demonstrate the effectiveness of our method in improving both the performance and efficiency of LLMs (∼75% reduction in memory footprint while maintaining comparable generation quality).
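The sketch below illustrates the general idea in a minimal form: a matrix is split into two local tensors via a reshape-and-SVD step (a two-core MPO), the larger tensor is quantized to a low bit width, and the smaller one is kept in higher precision. The factorization shapes, the rank, the 4-bit per-row uniform quantizer, and the helper names (`mpo_split`, `quantize_uniform`) are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of MPO-style mixed-precision compression (illustrative only;
# shapes, rank, and bit widths are assumptions, not the paper's settings).
import numpy as np

def mpo_split(W, i1, i2, j1, j2, rank=None):
    """Factor W (I x J, with I = i1*i2, J = j1*j2) into two local tensors A, B."""
    M = W.reshape(i1, i2, j1, j2).transpose(0, 2, 1, 3).reshape(i1 * j1, i2 * j2)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    r = rank or len(S)
    A = U[:, :r] * np.sqrt(S[:r])          # small local tensor (kept in high precision)
    B = np.sqrt(S[:r])[:, None] * Vt[:r]   # large local tensor (quantized to low bit)
    return A, B

def quantize_uniform(X, bits=4):
    """Per-row asymmetric uniform quantization to `bits` bits."""
    lo, hi = X.min(axis=1, keepdims=True), X.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / (2 ** bits - 1)
    q = np.round((X - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

# Example: compress a 64 x 64 matrix standing in for a KV cache block.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)
A, B = mpo_split(W, i1=4, i2=16, j1=4, j2=16)       # A: 16 x r (small), B: r x 256 (large)
qB, scale, lo = quantize_uniform(B, bits=4)         # low-bit for the large tensor
W_hat = (A.astype(np.float16).astype(np.float32)    # high precision for the small tensor
         @ dequantize(qB, scale, lo))
W_hat = W_hat.reshape(4, 4, 16, 16).transpose(0, 2, 1, 3).reshape(64, 64)
print("relative reconstruction error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

In this toy setup the bulk of the storage (the r x 256 tensor) is held at 4 bits while only the small 16 x r tensor stays in 16-bit precision, which is the mixed-precision allocation the abstract describes; the actual method's choice of local-tensor shapes and bit widths would follow the outlier analysis reported in the paper.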