Lightroom 5 Performance Testing: Pt.1 - Library Module
Some of you will hopefully remember that from time to time I try to do semi-thorough performance evaluations of Lightroom. Just to refresh your memories, the old ones are here:
The goal of these tests is to see where LR runs fast and where it doesn’t, and to provide some fact-based insight into the software which we can all use to speed up our post processing and minimise the stresses that come from sitting in front of a computer that doesn’t respond as well or as fast as we’d like.
The system used for testing is:
Intel i5-3570K clocked at various speeds from 1.6Ghz to 4.3Ghz, with turbo boost off.
Asus P8Z77-V-Pro motherboard
16Gb of DDR3-1600 RAM
128Gb Samsung 830 SSD - Boot drive (mounted as C-Drive)
80Gb Intel X25 G2 SSD – Images, LR catalogue, ACR temp dir (mounted as G-Drive)
Assorted other hard disks which don’t come into play here.
I am comparing LR4.4 and LR5.0
System monitoring is done using the nifty utils “Process Explorer” and “Process Monitor”, formerly from sysinternals, but now from Microsoft. Timing is done by hand using the stopwatch on my phone.
For general testing, two catalogues were made, totally independent of one another. Though the same images are used for both, 2 different directories are used to house them. All catalogues and images were located on the Intel X-25 SSD (G-Drive). Each image directory contained the same 200 images, 100 from a Canon 5D and 100 from a Fuji X10. Apologies to those of you with high MP cameras, but I just don’t have the money (or justification) to upgrade my trusty 5D to something more modern.
To really provide a specific and tough test, and thus hopefully slow things down enough to measure any differences between the software versions, additional separate catalogues comprising a single large image were also used. The image was a panorama made up of several images from a 5D stitched together and saved as a TIF. It has a lot of trees, leaves etc. and so a lot of fine detail to render. The image is 13147 x 4532 pixels and takes up 454Mb on disk.
As these tests tend to get a bit long and complicated, I’ve decided to split them up this time. Today then, is testing of only the Library module. When I get time (hopefully tomorrow) I will do a “Pt. 2” testing out the tools in the Develop module.
Test 1 – Library module: Importing
This test takes a completely fresh catalogue, and adds a folder of images on disk. This occurs via the “import” module with concurrent rendering of 1:1 previews for the 200 images. In order to speed things up, the test was run at the full 4.3Ghz clockspeed.
LR5 – import time 4 mins 5 secs
LR4.4 – Import time 4 mins 10 secs
Whilst getting the sysinternals logging software running correctly I repeated these tests a couple of times, and the times vary from run to run by around 5-10 secs. With a 5 sec difference sitting inside that spread, I am uncomfortable concluding that LR5 is definitely faster than LR4.4, though it does appear slightly so.
Looking at the CPU usage (below), LR5 is clearly utilising all 4 cores, but usage never gets near 100%. The results are pretty much exactly the same as I found in last year’s testing on LR4.3, so I’m not going to comment more on that.
Test 2 - Performance Scaling of Importing
Because the numbers above are so similar I chose not to repeat the in-depth core scaling that I did last time, and just did a couple of quick tests with LR5: compare 2 cores with 4 cores, and compare 1.6Ghz with 3.2Ghz and 4.3Ghz.
LR5 – Import time with 2 cores at 4.3Ghz = 5min 46sec
LR5 – Import time with 4 cores at 1.6Ghz = 9min 51sec
LR5 – Import time with 4 cores at 3.2Ghz = 5min 17sec
We can see a few things from these numbers.
- Firstly, importing into LR5 does not scale well with core count. I found the same sort of scaling for LR4.3 last year, and nothing seems to have changed in LR5: Doubling from 2 to 4 cores gives only a 1.41x speed up. I am surprised at how poor that is – I would expect much better scaling for a task where the software has the option to spin out one image per core.
- Secondly, importing into LR5 does scale well with Ghz. Doubling from 1.6Ghz to 3.2Ghz gives a 1.86x speed up. Not quite linear, but very good nevertheless.
- Thirdly, a heavily overclocked dual core is not very much slower than a quad core at standard speed.
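For anyone who wants to check the arithmetic behind those three points, here is a minimal sketch that recomputes the ratios from the import times quoted in Tests 1 and 2. The function and variable names are made up for illustration, and the third ratio treats the 4 cores at 3.2Ghz run as the "standard speed" quad core, which is an assumption.

```python
# Import times for LR5 (200 images with 1:1 previews), copied from Tests 1 and 2.
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds

import_times = {
    ("4 cores", "4.3Ghz"): to_seconds(4, 5),
    ("2 cores", "4.3Ghz"): to_seconds(5, 46),
    ("4 cores", "1.6Ghz"): to_seconds(9, 51),
    ("4 cores", "3.2Ghz"): to_seconds(5, 17),
}

def speedup(slow, fast):
    """How many times faster the `fast` configuration is than the `slow` one."""
    return import_times[slow] / import_times[fast]

# Doubling the core count at a fixed clock speed.
core_scaling = speedup(("2 cores", "4.3Ghz"), ("4 cores", "4.3Ghz"))
# Doubling the clock speed at a fixed core count.
clock_scaling = speedup(("4 cores", "1.6Ghz"), ("4 cores", "3.2Ghz"))
# Overclocked dual core versus quad core at 3.2Ghz (assumed "standard speed").
dual_vs_quad = speedup(("2 cores", "4.3Ghz"), ("4 cores", "3.2Ghz"))

print(f"2 -> 4 cores at 4.3Ghz:       {core_scaling:.2f}x")
print(f"1.6 -> 3.2Ghz on 4 cores:     {clock_scaling:.2f}x")
print(f"quad 3.2Ghz vs dual 4.3Ghz:   {dual_vs_quad:.2f}x")
```

Since the timings were taken by hand with a 5-10 sec run-to-run spread, the second decimal place of these ratios should not be read too closely.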
It is of course interesting to also look at the file I/O that goes on during import, particularly to gauge where to use your SSD if you have one. This is what I see:
I’ve closed down the file tree to keep things compact in the image, so I should add that the I/O to C-drive is all to c:\users\username\AppData\Local\Temp where LR appears to keep some temp files.
On G-Drive most activity is reading the images and writing out ACR cache and previews. Around 3.2Gb of data is read (for reference, there are 2.8Gb of images) and 725Mb is written. In total around 725Mb of data goes into the catalogue directory. Most of this is the previews (714Mb), with the remaining I/O going to the database in some way, as you would expect.
I’m not a programmer, so I confess I don’t really understand why there have to be so many file events accompanying the data reading and writing. Is it really necessary to have 18k file opens and 34k reads to process 200 images? And how does that affect performance?
I also recorded the file I/O for LR4, but it was to all intents identical to LR5.
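To put those event counts in perspective, here is a rough back-of-envelope sketch. It assumes the 18k opens and 34k reads refer to the same G-Drive activity as the 3.2Gb read figure, and that 1Gb means 1024**3 bytes. Neither is stated above, so treat the output as an order-of-magnitude estimate only.

```python
# Figures quoted for the import test. The pairing of the event counts with the
# 3.2Gb read total is an assumption, as is the binary definition of a Gb.
bytes_read = 3.2 * 1024**3
read_events = 34_000
file_opens = 18_000
images = 200

avg_read_kb = bytes_read / read_events / 1024   # average size of one read
reads_per_image = read_events / images
opens_per_image = file_opens / images

print(f"average read size: {avg_read_kb:.0f} KB")
print(f"reads per image:   {reads_per_image:.0f}")
print(f"opens per image:   {opens_per_image:.0f}")
```

Under those assumptions each read moves something under 100 KB, and each image costs on the order of a hundred opens and reads, which is the kind of access pattern where per-request latency matters more than raw transfer rate.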
Test 3 – Browsing in the Library module
As for my LR4.3 testing, I cleared out the caches and then clicked and zoomed on images in the Library module, forcing them to render 1:1. This test was done with the CPU set to 4 cores at 1.6Ghz, just to emphasize any differences. However, given the results above I expected to see the same results for LR5 and LR4. I was not disappointed in this prediction.
LR5 CPU usage:
LR4.4 CPU usage:
Again, LR is using all 4 cores, but not getting close to 100% usage.
I would add that subjectively I didn’t “feel” any difference between the two versions.
Again it is interesting to see the accompanying file access stats. They tell exactly the same story as for the import and rendering of 1:1 previews test.
LR5 File access
Test 4 – So is the LR5 develop module faster to use than LR4?
From the CPU usage plots above, LR5 is clearly pretty similar to LR4 in the way it renders previews. The import times suggest that LR5 *might* be slightly faster, though the result is hardly conclusive and LR5 didn’t feel any snappier when I was browsing the catalogue of 200 images. To really slow things down, this test looks at how LR renders my big pano image when I zoom into it. The CPU is set to 4 cores running at 1.6Ghz, the test is repeated 5 times, and the average time taken for the image to go “sharp” is given.
LR5 – Average 14.4 sec
LR4.4 – Average 15.3 sec
And the CPU load:
Well I didn’t believe this when I measured it, so I repeated it and took care with the timing. There is a spread of 0.4 secs in the timings (which isn’t bad for a manual stopwatch), so you could argue that the averages are within errors of each other, but I think the result is real – LR5 does render the pano about a second faster. Both LR4 and LR5 give similar CPU load profiles, and both use all 4 cores at 100%, but LR5 is just a touch faster.
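One simple way to frame "is the difference real?" is to compare the gap between the two versions against the run-to-run spread. The sketch below does this for both the import test and the pano test, using only the figures quoted in this post. The helper name is hypothetical, and treating the quoted spread as a plain error bar is a simplification rather than a proper statistical test.

```python
def exceeds_spread(time_a, time_b, spread):
    """True if two timings differ by more than the quoted run-to-run spread."""
    return abs(time_a - time_b) > spread

# Import test: 4 min 10 s (LR4.4) vs 4 min 5 s (LR5), spread of 5-10 s.
# The smallest quoted spread is used, which is the most generous case.
import_conclusive = exceeds_spread(250, 245, spread=5)

# Pano zoom test: 15.3 s (LR4.4) vs 14.4 s (LR5), spread of 0.4 s.
pano_conclusive = exceeds_spread(15.3, 14.4, spread=0.4)
pano_gain_percent = (15.3 - 14.4) / 15.3 * 100

print("import difference exceeds spread:", import_conclusive)
print("pano difference exceeds spread:  ", pano_conclusive)
print(f"pano render time reduction:       {pano_gain_percent:.1f}%")
```

On this crude measure the import result stays inconclusive, while the pano result clears the spread by roughly a factor of two.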
And just for the hell of it, I also checked how this test scales:
LR5, 4 cores at 3.2Ghz – Average 7.7 sec
LR5, 2 cores at 4.3Ghz – Average 10.5 sec
LR5, 4 cores at 4.3Ghz – Average 7.0 sec
So going from 2 to 4 cores (at 4.3Ghz) gives a 1.5x speed up. Going from 1.6Ghz to 3.2Ghz (4 cores) gives a 1.9x speed up. By and large, these are the same as we saw for importing and rendering the 200 images in test 2, though there is a more pronounced difference between the timings for 2 cores at 4.3Ghz and 4 cores at 3.2Ghz.
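The comparison between the two workloads can be laid out side by side. This is a minimal sketch using the LR5 timings quoted in this post, with import times converted to seconds. The dictionary layout and the `ratio` helper are illustrative choices, not anything from Lightroom.

```python
# Keys are (cores, clock in Ghz); values are seconds, copied from the tests above.
imports = {(4, 1.6): 591, (4, 3.2): 317, (2, 4.3): 346, (4, 4.3): 245}
pano = {(4, 1.6): 14.4, (4, 3.2): 7.7, (2, 4.3): 10.5, (4, 4.3): 7.0}

def ratio(times, slow, fast):
    """Speed-up of the `fast` configuration over the `slow` one."""
    return times[slow] / times[fast]

for name, times in (("import", imports), ("pano", pano)):
    cores = ratio(times, (2, 4.3), (4, 4.3))         # 2 -> 4 cores at 4.3Ghz
    clock = ratio(times, (4, 1.6), (4, 3.2))         # 1.6 -> 3.2Ghz on 4 cores
    dual_vs_quad = ratio(times, (2, 4.3), (4, 3.2))  # 2 cores 4.3Ghz vs 4 cores 3.2Ghz
    print(f"{name:>6}: cores {cores:.2f}x, clock {clock:.2f}x, "
          f"quad 3.2Ghz over dual 4.3Ghz {dual_vs_quad:.2f}x")
```

The last column is where the two workloads part ways: the quad core at 3.2Ghz is only marginally ahead of the overclocked dual core for the import, but clearly ahead for the pano render.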
Given the similar import times and very similar file I/O I would conclude that Adobe have re-used the same rendering code in LR5 as LR4 (big surprise). There is an indication that they might have tweaked it a touch, but there is no huge speed-up that the user will notice.
Performance scaling is quite interesting. Firstly, performance scales much better with clock speed than with core count. We saw this with LR4.3 last year too, and I am going to draw the same conclusion as I did then: the performance sweet spot is an overclocked quad core i5-3570K or i7-3770K. A hex core i7-3930K might well give you a bit more oomph, but it’s not going to be at a good performance per $.
There is perhaps an insight into the rendering engine in the scaling results too. Notice that when going from 2 to 4 cores there is a 1.4x speed-up for rendering 200 images and a 1.5x speed up for rendering the big pano? Obviously the engine is fully threaded as we see the CPU use flatline at 100% for the pano render, but the numbers sort of suggest that the rendering engine throws all available cores at each image irrespective of the type of workload: for the relatively small 5D and X10 images I have, the CPU never gets to 100% when rendering and so scaling is worse than for the pano render. This also explains the results for 2 cores at 4.3Ghz compared with 4 cores at 3.2Ghz. If LR were to more intelligently assess the workload and divide up rendering according to image size (i.e. assign 1 image per core or per 2 cores for smaller image sizes) I wonder if the scaling wouldn’t improve quite a lot for higher core-count PCs.
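One hedged way to put a number on that difference is Amdahl's law, which is not something used in the testing above but is a standard simplification: assume a fixed fraction p of the work parallelises perfectly and the rest runs on one core. The 2 to 4 core speed-up then pins down p. The function name below is made up for illustration, and the model ignores memory, disk and turbo effects entirely.

```python
def parallel_fraction(speedup_2_to_4):
    """Estimate the parallel fraction p from a 2 -> 4 core speed-up.

    Under Amdahl's law the run time on n cores is T(n) = (1 - p) + p / n,
    so the speed-up S = T(2) / T(4) rearranges to p = (S - 1) / (0.75 * S - 0.5).
    """
    s = speedup_2_to_4
    return (s - 1) / (0.75 * s - 0.5)

# Speed-ups measured at 4.3Ghz, from the timings quoted in this post.
p_import = parallel_fraction(346 / 245)   # 200-image import, about 1.41x
p_pano = parallel_fraction(10.5 / 7.0)    # big pano render, 1.5x

print(f"import: about {p_import:.0%} of the work behaves as parallel")
print(f"pano:   about {p_pano:.0%} of the work behaves as parallel")
```

Under this model the pano render comes out a little more parallel than the import, which is at least consistent with the idea that the small images do not keep all cores busy.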
With respect to hard disks, the headline result is pretty obvious – there is a lot of reading from the image directory and a lot of writing to the ACR cache and preview directories. C-drive is also touched, but not so much. The sticky question is what impact all those file opens and closes have on performance and how much that favours SSDs over spinning disks.