In addition, the relationship between the optimal block size for matrix transposition and the size and associativity of the processor's cache is formulated. For the parallel optimization of SAR imaging processing, several parallel programming models available on a NUMA system, namely lightweight processes (sproc), POSIX threads, OpenMP, and MPI, are compared, and their speedup and coding complexity are analyzed in light of the characteristics of both the NUMA architecture and SAR imaging processing.
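To make the blocked matrix transpose concrete, the following is a minimal sketch of the tiling loop structure only. The function name `blocked_transpose` and the `block` parameter are illustrative assumptions and do not come from the paper, and the paper's formula relating the block size to cache size and associativity is not reproduced here. Python is used purely to show the access pattern; any cache benefit would be observed in a compiled implementation.

```python
def blocked_transpose(a, block):
    """Transpose a rectangular list-of-lists matrix one tile at a time.

    `block` is the tile edge length (a hypothetical tuning parameter);
    in a compiled implementation it would be chosen so that a source
    tile and a destination tile fit in the cache together.
    """
    if block < 1:
        raise ValueError("block must be a positive integer")
    if not a:
        return []
    rows, cols = len(a), len(a[0])
    out = [[None] * rows for _ in range(cols)]
    # Walk the matrix tile by tile so that reads and writes stay
    # inside a small block-by-block region at any one time.
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            # Edge tiles are clipped when the dimensions are not
            # multiples of the block size.
            for i in range(i0, min(i0 + block, rows)):
                for j in range(j0, min(j0 + block, cols)):
                    out[j][i] = a[i][j]
    return out
```

The result is independent of `block`; only the order in which elements are visited changes, which is why the block size can be tuned to the cache without affecting correctness.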