Simplifying Active Memory Clusters by Leveraging Directory Protocol Threads

2007 
Address re-mapping techniques in so-called active memory systems have been shown to dramatically increase the performance of applications with poor cache and/or communication behavior on shared memory multiprocessors. However, these systems require custom hardware in the memory controller for cache line assembly/disassembly, address translation between re-mapped and normal addresses, and coherence logic. In this paper we make the important observation that on a traditional flexible distributed shared memory (DSM) multiprocessor node, equipped with a coherence protocol thread context as in SMTp or a simple dedicated in-order protocol processing core as in a CMP, the address re-mapping techniques can be implemented in software running on the protocol thread or core, without custom hardware in the memory controller, while still delivering high performance. We implement the active memory address re-mapping techniques of parallel reduction and matrix transpose (two popular kernels in scientific, multimedia, and data mining applications) on these systems, outline the novel coherence protocol extensions needed to make them run efficiently in software protocols, and evaluate these protocols on four different DSM multiprocessor architectures with multi-threaded and/or dual-core nodes. The proposed protocol extensions yield speedups of 1.45 for parallel reduction and 1.29 for matrix transpose on a 16-node DSM multiprocessor when compared to non-active memory baseline systems, and achieve performance comparable to the existing active memory architectures that rely on custom hardware in the memory controller.
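
As a rough illustration of what address re-mapping for matrix transpose involves, the sketch below models the two tasks the abstract names: translating a re-mapped (transposed-view) address to a normal address, and assembling or disassembling a cache line from strided elements. It is a toy model under stated assumptions, not the paper's protocol: the function names, the 4-element line size, and the flat row-major layout are all hypothetical, and coherence handling is omitted entirely.

```python
# Toy model of transpose address re-mapping. All names and sizes here are
# illustrative assumptions; this is not the protocol code from the paper.

LINE_ELEMS = 4  # elements per cache line (assumed)


def remap_transpose_index(shadow_index, n):
    """Translate an element index in the re-mapped (transposed) view of an
    n x n row-major matrix to the index of the same element in normal memory."""
    row, col = divmod(shadow_index, n)
    return col * n + row


def assemble_shadow_line(memory, line_number, n):
    """Gather one cache line of the transposed view from strided normal addresses."""
    start = line_number * LINE_ELEMS
    return [memory[remap_transpose_index(start + k, n)] for k in range(LINE_ELEMS)]


def disassemble_shadow_line(memory, line_number, n, line):
    """Scatter a written-back line of the transposed view to normal addresses."""
    start = line_number * LINE_ELEMS
    for k, value in enumerate(line):
        memory[remap_transpose_index(start + k, n)] = value
```

In this model a processor reading consecutive elements of the transposed view receives a densely packed line, while the strided accesses to the normal matrix happen once, at assembly time, on the memory side.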