|
15Bit Registered: Jan 27, 2008 Total Posts: 2852 Country: Norway |
The appearance of yet another thread discussing the slow performance of LR4, and the question of how well it uses multicore CPUs, has spurred me into doing some actual performance testing, just to see if I can more quantitatively identify some of the slowdowns many of us seem to see. I have done some testing previously, but not looking at the interactive aspects of the Library and Develop modules. These are at:

![]()
As you would expect, the CPU is pegged at 100% for the core for the duration of the rendering.

Rendering on 2 cores
![]()
Now we start to see spikes in the trace, but each spike peaks at 50% overall load (i.e. filling 2 cores).

Rendering on 3 cores
![]()
Again we see spikes in the trace, but they are now peaking below the 75% max load allowed for 3 cores.

Rendering on 4 cores
![]()
And again the spikes, and again they do not get near the 100% max load allowed for 4 cores, though they do peak higher than for 3 cores, at around 75%.

Scaling with core count
![]()
Scaling reflects what we see in the CPU traces – there is a speed-up, but scaling is some way from ideal. Note that the slowest time (1 core) was 6m 3sec, whilst the fastest (4 cores) was 2m 43sec.

Notes
It seems that LR is using more cores as they become available, though not with great scaling – rendering previews on 4 cores is only 2.2x faster than on 1 core. It is interesting that for rendering on 2-4 cores there are 41 “spikes” in the CPU trace, corresponding to the rendering of 81 images. From the poor scaling and the way that the “spikes” increase in height with core count, I suspect that individual renders are to some extent spread over more than 1 core. We can test this by rendering the large panorama on 4 cores:
![]()
So rendering of a single image is multithreaded, and we can speculate that maybe size “matters” – if there is enough data all cores can spin up to render the image, but if the image is smaller maybe this does not happen.
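To put numbers on the scaling described above, here is a minimal Python sketch using only the two timings quoted in the post (6m 3sec on 1 core, 2m 43sec on 4 cores). The Amdahl's law step is an assumption added for illustration: it treats the render job as a fixed serial part plus a perfectly parallel part, which is a simplification and not something measured here. The function names are made up for this example.

```python
def speedup(t_one_core: float, t_n_cores: float) -> float:
    """How many times faster the n-core run is than the 1-core run."""
    return t_one_core / t_n_cores


def parallel_efficiency(s: float, n_cores: int) -> float:
    """Fraction of the ideal n-fold speedup that was actually achieved."""
    return s / n_cores


def amdahl_parallel_fraction(s: float, n_cores: int) -> float:
    """Parallel fraction p implied by Amdahl's law.

    Amdahl: s = 1 / ((1 - p) + p / n), solved for p.
    Assumes a simple serial + perfectly parallel split (an idealisation).
    """
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n_cores)


t1 = 6 * 60 + 3    # 1 core: 6m 3sec, in seconds
t4 = 2 * 60 + 43   # 4 cores: 2m 43sec, in seconds

s = speedup(t1, t4)
eff = parallel_efficiency(s, 4)
p = amdahl_parallel_fraction(s, 4)

print(f"speedup {s:.2f}x, efficiency {eff:.0%}, implied parallel fraction {p:.0%}")
```

Under those assumptions the 4-core run is about 2.2x faster, uses a little over half of the ideal 4x, and behaves as if roughly three quarters of the work can run in parallel.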
---------------------------------------------------------------------------
Test 2 – Library Module: Rendering previews on the fly - clicking from image to image

For this test I just clicked on images in the 81 image catalogue, zooming in on each to make the image render.

Rendering on 1 core
![]()
Rendering on 4 cores
![]()

Notes:
I only present the 1 and 4 core results as the others pretty much followed the trend above. We can confirm here that the rendering of a single image is multithreaded, and that we are not making full use of the 4 cores. I note though that the X10 images are not all that big – it is set to record at 6MP. Perceptually I would say that it was pretty responsive when I did this on images which had no edits, and a bit slower on images which had. I couldn’t see any increase in the CPU load plot for rendering of images with edits though.

---------------------------------------------------------------------------
Test 3 – Develop Module: Rendering and scrolling around

Now in the Develop module, for this test I’ve used the 81 image catalogue and clicked from image to image, zooming in and scrolling around on each. Some of them are JPEGs, some are RAW, some have edits, some don’t. LR is allowed to use all 4 cores.
![]()

Notes
Every tall spike here represents a change of image; the lower loads represent scrolling around within the image. It thus appears that whilst initial rendering can take up a big chunk of CPU power, scrolling around doesn’t even utilise 2 cores fully. Perceptually I would say that scrolling around within the images felt a little sluggish. I could see the picture being re-drawn every time I dragged it, and when I clocked the CPU back up to 4.3GHz scrolling around in images was much more snappy.

---------------------------------------------------------------------------
Test 4 – Develop Module: Moving the sliders

This test is done on the big panorama and is to see how image adjustments are reflected in the CPU trace.
To start, all adjustments are set to default (i.e. the image is “reset”) and then individual sliders are moved around and set back to default before moving to the next. I was zoomed out of the image so that it fit in the screen while doing these, but I did try repeating with the image zoomed to 100% and saw no difference in the results. In order of the numbers on the trace, the adjustments are:
1. Exposure
2. Highlights
3. Shadows
4. Whites
5. Blacks
6. Clarity
7. Vibrance
8. Saturation
9. Quick tweak on exposure, highlights, shadows followed by zooming in and scrolling around
![]()

Notes
It seemed that no matter how fast I moved the sliders around I couldn’t make LR use much more than 2 cores’ worth of CPU. I would say that I saw no real lag in the editing response. So the edits are threaded, but are also so fast that they don’t need a lot of CPU. I remind you that this is a 454MB TIF image, and I’ve down-clocked my CPU to 1.6GHz, so the software seems pretty well coded. Again I see that scrolling around the image (no. 9 at the end) doesn’t chew up a lot of CPU, but again I can see the image re-drawing as I drag. And again, bumping the CPU back to full speed makes it a lot more responsive.

---------------------------------------------------------------------------
Test 5 – Develop Module: The sharpening and noise reduction sliders

Using the panorama, with the image “reset” to default before starting. Zoomed to fit the screen to begin:
1. Sharpening by typing a number in the box
2. Sharpening by moving the slider
3. Noise reduction by typing a number in the box
4. Noise reduction by moving the slider
5. Same as 1, but zoomed to 100% view
6. Same as 2, zoomed at 100% view
7. Same as 3, zoomed at 100% view
8. Same as 4, zoomed at 100% view
![]()

Notes
Sharpening is again threaded according to the CPU trace, and perceptually happens quickly too. Noise reduction is the first action where I can peg all 4 cores, and I would say it is also perceptually slower to respond.
It wasn’t “very” slow though. Scrolling around after applying sharpening and NR was a lot slower, however, leading to test 6…

---------------------------------------------------------------------------
Test 6 – Develop Module: Scrolling around with sharpening and NR applied

Simple test here - scrolling around the panorama with and without sharpening and NR applied. So:
1. Scrolling around “reset” image
2. Sharpening and NR applied
3. Scrolling around again
![]()

Notes
I think here we have found the problem. When scrolling around and zooming with no sharpening and NR applied, the response is decent. Once sharpening and NR are applied it slows down a lot. Note though that the CPU load is about the same for both actions. Although I didn’t do a CPU load plot for it, I have separately tested NR and sharpening, and it’s the NR that is the culprit.

---------------------------------------------------------------------------
Test 7 – Develop Module: Local adjustments with and without NR applied

Further exploring the effect of having NR turned on. Again the panorama, sweeping around a local adjustment brush that has a combination of Exposure, Contrast, Highlights, Shadows, Clarity and Saturation set. The image was “reset” between adjustments:
1. Zoomed out to fit screen, no NR applied
2. Zoomed out to fit screen, NR applied
3. Zoomed to 100%, no NR applied
4. Zoomed to 100%, NR applied
![]()

Notes
There are some differences to be seen in the CPU trace: when doing local adjustments with the NR turned on it does seem to use more CPU, but it doesn’t get near using 100%. Perceptually I would have to say that it really *needs* to be using 100%, as there were several seconds of lag when doing the local adjustments with NR turned on and the image zoomed to 100%. With the NR turned off it was a touch sluggish, but more than acceptably responsive.
---------------------------------------------------------------------------
Test 8 – Exporting

Testing whether image exporting is using all the CPU. Again the big panorama image. During the course of this I noticed something “funny” going on, and so tried exporting with the long edge set to different lengths, just to see:
![]()

Notes
Again, LR is showing good multithreading – exporting a single image can use multiple cores. This is significant I think, as the easiest option to speed up exports would be to skip multithreading within an image and simply assign 1 image per core. Doing it this way indicates that Adobe has probably spent a lot of time on the code for the export module. There is some interesting behaviour though: with the long edge of the image set to anything above 1000 pixels, all 4 cores get utilised for the export; with the long edge set between 750 and 1000px you get 3 cores; between 500 and 750px you get 2 cores; and below that you get 1 core. I expect this is a deliberate choice Adobe have made, but I don’t know the reasons and I don’t feel like speculating. I’ve not tested, but it is my guess that when exporting a batch of downsized images LR will export multiple images in parallel to compensate for limiting the number of cores each export can use.

---------------------------------------------------------------------------
CONCLUSIONS

I think we can conclude that LR4 is pretty thoroughly multi-threaded, for which Adobe should be praised. All the adjustments I tried utilise multiple cores as they are being applied. Scaling is not perfect, but it never is for these types of calculation.

The only part of the software that seemingly doesn’t make good use of multiple cores is zooming and panning images, and when NR has been applied this really, really slows down. I confess I don’t really understand why this is. It is obvious from the CPU traces that edits (incl. NR) are applied as you move the slider, not as you move around the image.
You can tell this from the fact that applying NR peaks the CPU at over 90%, whilst scrolling around the image with NR applied never gets it close to this. On the other hand, if the images are being fully rendered as you move the sliders then zooming and scrolling should simply involve shifting the data from RAM to display, which should surely be instantaneous as it is in the Library module. So I am curious about what actually goes on in the Develop module. In terms of how many cores you should buy, I would say that from these numbers there is little value in going above 4 cores to improve responsiveness. I would expect more cores to significantly speed up large batches of image exports though. So in summary, for responsiveness in Library and Develop modules I reckon an overclocked quad core CPU is the way to go. And do noise reduction as late in your workflow as possible. |
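The export behaviour noted in the exporting test above can be written down as a small lookup. This is only a sketch of what was observed on this one 4-core machine with one panorama, not Adobe's actual logic; the function name is made up, and how the exact boundary values (500, 750, 1000px) are treated is an assumption, since the post only gives ranges.

```python
def observed_export_cores(long_edge_px: int) -> int:
    """Cores seen in use when exporting one image at a given long-edge size.

    Encodes the observation from the post on a 4-core CPU. Which side the
    exact boundary values fall on was not tested, so that is assumed here.
    """
    if long_edge_px > 1000:
        return 4
    if long_edge_px > 750:
        return 3
    if long_edge_px > 500:
        return 2
    return 1


for edge in (400, 600, 800, 2000):
    print(f"long edge {edge}px: {observed_export_cores(edge)} core(s)")
```

If the guess in the post is right, a batch export of small images would need to run several images at once to keep all the cores busy.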
|
snapsy Registered: Feb 24, 2008 Total Posts: 2329 Country: United States |
Nice work. One important note, though: seeing CPU utilization spread across cores does not necessarily indicate that an application was explicitly written to take advantage of multiple cores, i.e. it doesn't imply multi-threaded operation for a given task. The same distribution appears when the OS scheduler moves a thread to a different core from one time slice to the next. For example, if I write a single-threaded app that does processor-intensive work, I'll see all cores participating in that work even though only one core is executing the logic at any given time. The preemptive dispatches occur at intervals too short to be detected by the coarse sampling of perfmon, giving the false appearance that all cores are participating in the work at the same time. |
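One way to tell the two cases apart from an overall load figure: a single thread that hops between cores can never add up to more than one core's worth of CPU time. The Python sketch below does that arithmetic. It assumes nothing else significant is running on the machine, and the function names and the `margin` parameter are made up for illustration.

```python
def cores_worth(overall_load_percent: float, n_cores: int) -> float:
    """Convert an overall CPU load (0-100%) into 'cores' worth' of work."""
    return overall_load_percent / 100.0 * n_cores


def implies_parallel_work(overall_load_percent: float, n_cores: int,
                          margin: float = 0.1) -> bool:
    """True if the load exceeds what one migrating thread could produce.

    `margin` is a hypothetical allowance for background activity.
    """
    return cores_worth(overall_load_percent, n_cores) > 1.0 + margin


# A single busy thread bouncing around a 4-core CPU: 25% overall.
print(cores_worth(25, 4), implies_parallel_work(25, 4))

# Spikes at 50% and 75% overall on a 4-core CPU, as in the first post.
print(cores_worth(50, 4), implies_parallel_work(50, 4))
print(cores_worth(75, 4), implies_parallel_work(75, 4))
```

By this measure the 50% and 75% peaks reported in the first post correspond to 2 and 3 cores' worth of work, which a lone thread being moved between cores could not produce.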
|
15Bit Registered: Jan 27, 2008 Total Posts: 2852 Country: Norway |
Thanks snapsy, |
|
Hammy Registered: May 21, 2002 Total Posts: 2797 Country: temp |
15Bit wrote:The only part of the software that seemingly doesn’t make good use of multiple cores is zooming and panning images, and when NR has been applied this really really slows down. |
|
amonline Registered: Jul 16, 2006 Total Posts: 5680 Country: United States |
15Bit wrote: |
|
15Bit Registered: Jan 27, 2008 Total Posts: 2852 Country: Norway |
Hammy, |
|
Matthew Cham Registered: Sep 13, 2007 Total Posts: 334 Country: United States |
Thank you for this brilliant testing. This is exactly the kind of information I was looking for in my other thread. Sounds like each additional core beyond the first provides a smaller performance gain. Best bang for the buck is 1 core, and lowest bang for the buck is 6 cores. If money is no object, then get as many cores as you can afford. |
|
morganb4 Registered: Nov 03, 2005 Total Posts: 5200 Country: Australia |
Brilliant work. Very thorough. |
|
Hammy Registered: May 21, 2002 Total Posts: 2797 Country: temp |
15Bit wrote: |
|
15Bit Registered: Jan 27, 2008 Total Posts: 2852 Country: Norway |
Matthew Cham wrote: |
|
morganb4 Registered: Nov 03, 2005 Total Posts: 5200 Country: Australia |
^Of my 8 threads, only 4 are properly active while I play around with the noise slider; 2 are sort of doing something and 2 are doing nothing. My net CPU usage does not go above 41%. |
|
15Bit Registered: Jan 27, 2008 Total Posts: 2852 Country: Norway |
morganb4 wrote: |
|
morganb4 Registered: Nov 03, 2005 Total Posts: 5200 Country: Australia |
Ok... So would you please confirm my little experiment above? I.e., do you get instant noise slider joy with highlights, shadows or clarity set to anything other than zero, zoomed in on a high MP RAW like a 5D3 or 5D2 or something? |
|
15Bit Registered: Jan 27, 2008 Total Posts: 2852 Country: Norway |
I'll try to test it tonight when I get home. I don't have any modern high MP RAW files though - the biggest I have are 1DsII files. I shoot an original 5D... |
|
morganb4 Registered: Nov 03, 2005 Total Posts: 5200 Country: Australia |
The 1Ds2 is 16MP, which will be fine. Thanks HEAPS. |
|
15Bit Registered: Jan 27, 2008 Total Posts: 2852 Country: Norway |
Ok, a bit more testing then. This time to try out the interplay of various sliders with the NR plugin.
![]()

Notes
Perceptually I found NR to be quite responsive without any adjustments applied. With clarity, highlights and shadows turned on, it was a *lot* slower, showing a couple of seconds of lag at each end of the NR slider. The other adjustments were nice and snappy, just like having nothing turned on. You can actually see this in the plots - the three "slow" adjustments use more CPU, *but* they are not maxing out the 4 cores for the duration of the "lag" as I would hope.

I also repeated this experiment with some sharpening turned on, and response slowed a touch for all the "fast" adjustments above (including "no" adjustment). It slowed a lot for the clarity, highlights and shadows though.

Then, just to see, I clocked my CPU back to 4.3GHz and repeated the NR adjustment with no adjustments and with clarity set to +1:
![]()
The CPU use is about the same as at 1.6GHz, and again there is a noticeable slowdown with the clarity set, but the responsiveness is a lot higher.

And finally, I would comment that whilst responsiveness of LR when *moving* the NR slider is seriously slowed by the application of clarity etc, zooming and scrolling around the screen with sharpening, NR and clarity turned on seemed to be about the same as with just NR and sharpening on (and clarity off). So it seems clarity, highlights and shadows only affect the calculation of NR, not the redrawing of the image to the screen.

It would seem that this is a complex issue with many variables. They all (so far) involve the NR slider though, so the easiest solution seems to be to do NR at the end of your workflow. |
|
morganb4 Registered: Nov 03, 2005 Total Posts: 5200 Country: Australia |
Wow, so perceptually, with your fully clocked system, what is the lag on the noise sliders with shadow/highlight set? 0.5 seconds or still 2 seconds? |
|
15Bit Registered: Jan 27, 2008 Total Posts: 2852 Country: Norway |
On the 1DsII image I reckon around a second maximum. I would describe it as laggy but usable. For sure I would not sit swearing at the screen in frustration. For the big pano I tested earlier, the lag is more like 3 secs, so it's moving into annoying territory there. |
|
morganb4 Registered: Nov 03, 2005 Total Posts: 5200 Country: Australia |
Clearly your setup is working better than mine. I have my suspicions that there is some weird CPU board interaction going on here. |