Treffer: Adapting radix sort to the memory hierarchy Naila Rahman and Rajeev Raman King's College London
Weitere Informationen
We demonstrate the importance of minimising misses in the translation-lookaside buffer (TLB) for obtaining good performance on modern computer architectures. We focus on least-significant-bit first (LSB) radix sort, standard implementations of which make many TLB misses. We give three techniques for reducing TLB misses for LSB radix sort: reducing working set size, explicit block transfer and grouping by pre-sorting. We note that: ffl Choosing the radix size to minimise TLB misses greatly improves performance relative to radix sizes which are chosen on the basis of cache performance alone, even though doing so nearly doubles the number of operations performed. ffl All TLB-optimised algorithms outperform LSB radix sort tuned for cache alone and cachetuned implementations of comparison-based sorting algorithms. Pre-sorting outperforms all other implementations, improving on the other TLB-optimised algorithms by over 20%, and on cacheoptimised LSB radix sort and cache-optimised quicksort by over 45%. ffl Pre-sorting and explicit block transfer make few cache and TLB misses in the worst case. This is provably not true for standard LSB radix sort. We also apply these techniques to the problem of permuting an array of integers, and obtain gains of over 30 % relative to the naive algorithm by using explicit block transfer.