Decoupled architecture
In computer science, a decoupled architecture is a pipelined processor design with out-of-order execution in which a buffer separates the fetch and decode stages from the execute stage.
The buffer's purpose is to partition the memory access and execute functions of a computer program and to achieve high performance by exploiting the fine-grain parallelism between the two. [1] In doing so, it can effectively hide memory latency from the processor's perspective.
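The latency-hiding effect of the buffer can be illustrated with a toy cycle-count model. The sketch below is not taken from the cited paper; it assumes an access unit that issues at most one load per cycle, a fixed memory latency, an execute unit that consumes one operand per cycle, and a buffer that limits how many loads may be outstanding. The function name simulate and all of its parameters are hypothetical.

```python
from collections import deque


def simulate(n_ops, mem_latency, buffer_size):
    """Return the cycles needed to execute n_ops operations (toy model).

    Assumptions (illustrative only): every operation needs one operand
    from memory, memory latency is constant, and the execute unit
    finishes one operation per cycle once its operand has arrived.
    """
    in_flight = deque()  # arrival cycle of each requested operand, oldest first
    issued = 0           # loads requested so far by the access unit
    done = 0             # operations completed by the execute unit
    cycle = 0
    while done < n_ops:
        cycle += 1
        # Execute unit: consume the oldest operand if it has arrived.
        if in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            done += 1
        # Access unit: run ahead while a buffer slot is free.
        if issued < n_ops and len(in_flight) < buffer_size:
            in_flight.append(cycle + mem_latency)
            issued += 1
    return cycle


if __name__ == "__main__":
    for size in (1, 2, 4, 8):
        print(size, simulate(n_ops=100, mem_latency=4, buffer_size=size))
```

In this model a single-entry buffer exposes the full latency on every operation (401 cycles for 100 operations at a latency of 4), whereas a buffer of at least four entries lets the access unit run far enough ahead that only the initial latency remains visible (104 cycles). Increasing the buffer beyond that point gives no further gain under these assumptions.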
A larger buffer can, in theory, increase throughput. However, if the processor mispredicts a branch, the entire buffer may need to be flushed, wasting many clock cycles and reducing the buffer's effectiveness. Furthermore, larger buffers generate more heat and use more die space. For this reason, processor designers today favour a multi-threaded design approach.
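The cost of flushing can be sketched with a similar toy model. The code below is an illustration under stated assumptions, not a description of any real processor: a misprediction is assumed to be discovered after every flush_interval completed operations, at which point everything still in the buffer is discarded and must be requested again. The function name simulate_with_flushes and its parameters are hypothetical.

```python
from collections import deque


def simulate_with_flushes(n_ops, mem_latency, buffer_size, flush_interval):
    """Return (cycles, discarded_loads) for a toy decoupled pipeline.

    Assumptions (illustrative only): one load issued per cycle at most,
    constant memory latency, one operation executed per cycle, and a
    full buffer flush after every flush_interval completed operations.
    """
    in_flight = deque()  # arrival cycle of each outstanding operand
    issued = 0
    done = 0
    discarded = 0        # loads thrown away by flushes
    cycle = 0
    while done < n_ops:
        cycle += 1
        if in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            done += 1
            if done % flush_interval == 0 and done < n_ops:
                # Misprediction: work fetched ahead was on the wrong path.
                discarded += len(in_flight)
                in_flight.clear()
                issued = done  # restart fetching from the correct path
        if issued < n_ops and len(in_flight) < buffer_size:
            in_flight.append(cycle + mem_latency)
            issued += 1
    return cycle, discarded


if __name__ == "__main__":
    for size in (1, 4):
        print(size, simulate_with_flushes(8, 4, size, flush_interval=4))
```

With 8 operations, a latency of 4 and a flush after every fourth operation, the four-entry buffer finishes in 15 cycles instead of the 12 it would need without a flush, and three loads are discarded. The single-entry buffer takes 33 cycles either way and discards nothing. The larger buffer is still faster in this model, but part of its advantage is lost to wasted work.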
References
[1] Kurian, L.; Hulina, P.T.; Coraor, L.D. "Memory latency effects in decoupled architectures". IEEE Transactions on Computers, Volume 43, Issue 10, Oct. 1994, pp. 1129-1139.