Treffer: Adapting radix sort to the memory hierarchy Naila Rahman and Rajeev Raman King's College London

Title:

Adapting radix sort to the memory hierarchy Naila Rahman and Rajeev Raman King's College London

Contributors:

The Pennsylvania State University CiteSeerX Archives

Source:

http://www.dcs.kcl.ac.uk/pg/naila/papers/ALENEX2000.ps.gz.

Collection:

CiteSeerX

Subject Terms:

Categories and Subject Descriptors, F.2.2 [Theory of Computation, Analysis of Algorithms and Problem Complexity--Nonnumerical Algorithms and Problems, E.5 [Data, Files--Sorting/searching, D.1.0 [Software, Programming Techniques--General, B.3.2 [Hardware, Memory Structures--Design Styles General Terms, Efficient sorting algorithms, Locality of reference, Radix sort, Memory hierarchy, External-memory algorithms Key Words and Phrases, Cache, Translation-lookaside buffer (TLB

Document Type:

Fachzeitschrift text

File Description:

application/postscript

Language:

English

Relation:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.65.7331; http://www.dcs.kcl.ac.uk/pg/naila/papers/ALENEX2000.ps.gz

Availability:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.65.7331
http://www.dcs.kcl.ac.uk/pg/naila/papers/ALENEX2000.ps.gz

Rights:

Metadata may be used without restrictions as long as the oai identifier remains attached to it.

Accession Number:

edsbas.DE2992CC

Database:

BASE

Weitere Informationen

We demonstrate the importance of minimising misses in the translation-lookaside buffer (TLB) for obtaining good performance on modern computer architectures. We focus on least-significant-bit first (LSB) radix sort, standard implementations of which make many TLB misses. We give three techniques for reducing TLB misses for LSB radix sort: reducing working set size, explicit block transfer and grouping by pre-sorting. We note that: ffl Choosing the radix size to minimise TLB misses greatly improves performance relative to radix sizes which are chosen on the basis of cache performance alone, even though doing so nearly doubles the number of operations performed. ffl All TLB-optimised algorithms outperform LSB radix sort tuned for cache alone and cachetuned implementations of comparison-based sorting algorithms. Pre-sorting outperforms all other implementations, improving on the other TLB-optimised algorithms by over 20%, and on cacheoptimised LSB radix sort and cache-optimised quicksort by over 45%. ffl Pre-sorting and explicit block transfer make few cache and TLB misses in the worst case. This is provably not true for standard LSB radix sort. We also apply these techniques to the problem of permuting an array of integers, and obtain gains of over 30 % relative to the naive algorithm by using explicit block transfer.

Treffer: Adapting radix sort to the memory hierarchy Naila Rahman and Rajeev Raman King's College London

Weitere Informationen

Links

Zusatz-Funktionen