Chart: Comparison of sequential and parallel computations using R; source
R is probably the most versatile computing environment available, but it is definitely not the fastest. More complex calculations can drag on for hours or even days.
One way to get results faster is to utilize all the cores your processor has. By using the doMC package as the backend for foreach's %dopar% operator, you can usually reduce the computation time roughly in proportion to the number of physical cores available.
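A minimal sketch of this pattern, with a toy workload standing in for a real computation (the workload and core count here are illustrative, not the benchmark from the chart):

```r
library(doMC)
library(foreach)

# Register one worker per physical core
registerDoMC(cores = parallel::detectCores(logical = FALSE))

# Toy workload: integrate sin(x) over [0, pi] in 8 independent slices,
# each slice handled by a (potentially) different core
result <- foreach(i = 1:8, .combine = c) %dopar% {
  integrate(sin, lower = (i - 1) * pi / 8, upper = i * pi / 8)$value
}

sum(result)  # the slices add up to the full integral, i.e. 2
```

Swapping %dopar% for %do% runs the same loop sequentially, which makes it easy to compare timings with system.time().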
The chart above shows that the test computations could be sped up by a factor of 8.3 on a dual Intel Xeon machine with 8 cores / 16 threads in total. It is worth mentioning that the hyper-threading implemented in the tested processors didn't deliver any substantial additional boost: the gain came essentially from the physical cores, all of which were fully employed. By comparison, a 6-core AMD Phenom chip yielded only a 3.67x speedup.
When single-machine multicore performance is not enough for your purpose, the next step is to distribute the computation across a multicore cluster. Unfortunately, you shouldn't expect a linear gain proportional to the number of nodes in the cluster.
I have recently tested a cluster of four 8-core / 16-thread homogeneous machines running Linux.
The faster backend in this test, a SOCK cluster, delivered a 3x gain over parallel computation on a single multicore system, but was "just" 25x faster than the sequential calculation. In other words, the 32 physical cores (and 64 threads) comprising the cluster delivered a 25x boost; about 22% of the potential computing power was "lost" to communication and coordination between cluster components.
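For reference, a SOCK cluster can be set up for foreach with the doSNOW backend. The sketch below starts four workers on the local machine so it is self-contained; on a real cluster you would pass a vector of node host names to makeCluster() instead of a worker count:

```r
library(doSNOW)
library(foreach)

# Four local workers; replace 4 with e.g. rep(c("node1", "node2"), each = 8)
# (placeholder host names) to spread workers across remote machines
cl <- makeCluster(4, type = "SOCK")
registerDoSNOW(cl)

# Each iteration is shipped to a worker over the socket connection
result <- foreach(i = 1:32, .combine = c) %dopar% sqrt(i)

stopCluster(cl)
```

The per-task serialization and socket traffic are where part of the overhead mentioned above comes from, so it pays to keep individual tasks coarse-grained.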
Still, that is a pretty decent speed gain. I'm not sure how it will scale as the number of nodes grows, though.