Performance Testing - How does LR4 utilise multiple cores

morganb4
Registered: Nov 03, 2005
Total Posts: 5312
Country: Australia

^Sure. I just loaded up a 1Ds2 file - not used since I got LR4 - and yeah, I get probably 1.25 seconds response time. Similar on 1D4 files, maybe a bit longer; easily 2+ seconds with 5D3 files.

Tried turning off HT. No joy.
Tried bumping process priority in Win7. No joy.
HOWEVER, I just tried using the renice command to bump up process priority on the hackintosh and saw a reasonable speed bump - not usable yet, but closer.
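
For reference, here is roughly how to do the same priority bump from a script - a minimal sketch using the third-party psutil library (pip install psutil). The process-name match is an assumption, so check the actual name in Activity Monitor or Task Manager first, and note that raising priority needs root/admin.

```python
# Rough equivalent of `sudo renice -n -5 -p <pid>`, but cross-platform.
import psutil

for p in psutil.process_iter(['name']):
    if 'lightroom' in (p.info['name'] or '').lower():
        # On Unix/macOS, lower niceness = higher priority.
        # On Windows, pass psutil.HIGH_PRIORITY_CLASS instead of -5.
        p.nice(-5)
        print(p.pid, 'priority raised')
```

As 15Bit notes below, be conservative with the value - going too aggressive can starve the rest of the system.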



morganb4
Registered: Nov 03, 2005
Total Posts: 5312
Country: Australia

OK, NOW this is heading into BS territory:
Two separate parts of the graph: the first block of activity was just me pushing the noise luminance slider around with shadow and highlight set to something.

The NEXT block of activity is me doing the same thing WITHOUT shadow/highlight set.

i.e. with S/H set we get less CPU utilisation than without. It should be the other way round!

I would greatly appreciate it if anyone else having this problem could please reply and vote for the bug report here:

http://feedback.photoshop.com/photoshop_family/topics/adjusting_shadow_highlight_or_clarity_by_only_1pt_causes_huge_lag_on_noise_sliders




15Bit
Registered: Jan 27, 2008
Total Posts: 3589
Country: Norway

It looks to me like the second action is taking more CPU overall - it looks like less because it is throwing more work to the hyperthreading cores.

This is quite an interesting plot, actually - it shows LR is aware of the hyperthreaded cores and tries to avoid them for some jobs.
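
If you want to check how your own box splits physical vs logical (HT) cores while dragging sliders, here is a minimal sketch using the third-party psutil library:

```python
# Print the physical/logical core split, then a one-second snapshot
# of per-core load (HT siblings show up as extra logical cores).
import psutil

print('physical cores:', psutil.cpu_count(logical=False))
print('logical cores (incl. HT):', psutil.cpu_count(logical=True))
print('per-core load %:', psutil.cpu_percent(interval=1.0, percpu=True))
```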



15Bit
Registered: Jan 27, 2008
Total Posts: 3589
Country: Norway

morganb4 wrote:
^Sure. I just loaded up a 1Ds2 file - not used since I got LR 4 and yeah I get probably 1.25 second response time. Similar on 1D4, bit longer maybe, easily 2+ seconds with 5D3 files.


As we're doing it perceptually, I think your 1.25 secs is probably much the same as my 1 sec. And of course we're using different files. So it's not enough of a difference to believe my PC is somehow superior to yours.

Be careful with renice - you can seriously affect the responsiveness of your computer with it, even to the point of effectively locking it up until a process finishes.



morganb4
Registered: Nov 03, 2005
Total Posts: 5312
Country: Australia

Well I wish it would use more resources :-(

Yeah, I know renice can be dodgy but I'll eventually find a balance between stability and speed.

Thanks man!



Hammy
Registered: May 21, 2002
Total Posts: 2843
Country: temp

I just downloaded LR4 and am playing with a 256MB file... on a Windows-based PC.

I'm not getting the anomalies that 15bit and Morgan are getting. When I move the noise slider, ALL 8 cores of my i7 go to 100%.

I'm running it in "Auto" Turbo mode, so when I'm doing nothing it idles at 1.6GHz @ 0.96V, but when all of the cores peak it boosts to 4.8GHz @ 1.38V. So I'm definitely using all cores to their max, at least when I move the Noise slider around. Other adjustments don't spike/peak the cores, but simply nudge them up across the board.

I think it's more the OS that is spreading the threads. While sitting here idle typing this, I have 'activity' on every other core - presumably the physical cores, whereas the virtual HT cores/threads are nonexistent - and this is with LR closed.



15Bit
Registered: Jan 27, 2008
Total Posts: 3589
Country: Norway

Hammy, are you measuring the overall load or the load attributed only to LR?

And how is your responsiveness with the interplay of NR and clarity?
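
(If you don't have Process Explorer handy, here is a minimal sketch of measuring the load attributed to one process, using the third-party psutil library - the process-name match is an assumption:)

```python
# Sample LR's CPU share once a second for half a minute. Note the
# value can exceed 100% on multi-core boxes: 400% = four full cores.
import psutil

lr = next(p for p in psutil.process_iter(['name'])
          if 'lightroom' in (p.info['name'] or '').lower())
for _ in range(30):
    print(f"LR CPU: {lr.cpu_percent(interval=1.0):6.1f}%")
```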



Hammy
Registered: May 21, 2002
Total Posts: 2843
Country: temp

Ah, went back to your original post and found Process Explorer!

Measuring LR as you did, similar results to you across the board. But as I didn't lock my clocks down, it scales up to 3x performance right away.

With 4 cores and 4 virtual HT threads, they were all used when called for. No segregation to physical cores (verified by Perfmon and Task Manager).

For NR, peak CPU usage was 98% @ 4.8GHz, albeit briefly. The spike/peak was definitely longer with Basic adjustments applied. Sluggish response while the CPU was peaked and dedicated to processing; otherwise, very smooth for most operations.

For basic adjustments, the first one will spike at 50% usage, the remaining adjustments all spike around 10%.

One very interesting note is the lag time when going to the LR window. If I have it full screen on one monitor, as soon as I move my mouse over the LR window, it pauses for a few seconds as if to 'get ready' for me to begin doing something by re-applying filters (even though they are already visually applied). If it's not full screen, then as soon as I click my mouse on the LR window, I get the pause. This is with basic adjustments and NR currently applied. With the image in 'reset' mode, responsiveness is immediate and 'ready'.

Disclaimer: I have done no tweaking of the LR setup - just install and play. I'm running the file off a RAID10 of Velociraptors, but I believe the cache that LR set up on install is on my Vertex3 SSD.

Hope all this helps; from what I see, LR does use all the cores you throw at it... for some (needed) filters/adjustments.



morganb4
Registered: Nov 03, 2005
Total Posts: 5312
Country: Australia

Hammy wrote:
I just downloaded LR4 and am playing with a 256MB file... on a Windows-based PC.

I'm not getting the anomalies that 15bit and Morgan are getting. When I move the noise slider, ALL 8 cores of my i7 go to 100%.

I'm running it in "Auto" Turbo mode, so when I'm doing nothing it idles at 1.6GHz @ 0.96V, but when all of the cores peak it boosts to 4.8GHz @ 1.38V. So I'm definitely using all cores to their max, at least when I move the Noise slider around. Other adjustments don't spike/peak the cores, but simply nudge them up across the board.

I think it's more the OS that is spreading the threads. While sitting here idle typing this, I have 'activity' on every other core - presumably the physical cores, whereas the virtual HT cores/threads are nonexistent - and this is with LR closed.


Can you please verify the specific scenario I have outlined above? Can you describe, as much as you can, how you perceive the NR slider works on a highish-MP file (22MP or so) when you have shadow/highlight applied (something <> 0) for those specific adjustments only? Can you describe how it reacts - lag/no lag/acts immediately on release of the slider, etc.?

I am trying to pin down specifically which platforms don't exhibit the negative behaviour I have mentioned...

Thanks in advance.
Ben



Hammy
Registered: May 21, 2002
Total Posts: 2843
Country: temp

Ben,

I just re-tested your scenario with my 252MB (25000x12085) jpg file (from a previous pano stitch).

I clicked the shadow and highlight sliders to the right (+26), then went down to the luminance channel and moved it around, back and forth, clicking more and less, etc...

- Overall CPU usage for LR4.exe went from 70%-98%
- Memory usage bounced around between 2.2GB-2.4GB
- I/O was pretty much nonexistent
(these are the graphs that Process Explorer registers)

Slider access and movement was always available; however, going to the image to move, pan or zoom was suspended until the settings had been applied to the image. Once the NR level was applied (along with Shadow and Highlight), I could move/pan/zoom with ease.

Resetting the image with no Basic adjustments, then moving/bouncing the slider around, resulted in pretty much instantaneous NR application to the image (and therefore freedom to move/pan/zoom on the image) and these numbers from the graph:

- CPU usage bounced from 18-30%
- Memory, however, steadily increased from 2.4GB up to 4.5GB, then dropped to 2.4GB when I stopped pestering it.
- I/O traffic was like a busy intersection: reads from 35KB/s-135KB/s, writes from 136KB/s-336KB/s

Here are my computer specs, to give an idea of what I have available:
- MSI P67 chipset mobo
- i7-2700K (3.5GHz normal, but C-states down to 1.6GHz, Turbos up to 4.8GHz)
- 16GB G.Skill DDR3-1600, 1.35V memory
- 240GB OCZ Vertex3 SSD (f/w 2.15)
- 4x 300GB Velociraptors in RAID10 off the mobo Intel controller
- GTX580, 512 CUDA cores @ 925MHz
- Two 1920x1080 42" LCDs
- Water cooling on CPU and GPU (the custom case is the radiator, for completely silent operation)

I'm not sure whether running the file off the SSD, or having more memory, would have minimized the I/O. LR was pretty much the only thing running, with plenty of physical memory available... and again, no modifications to default LR settings.



15Bit
Registered: Jan 27, 2008
Total Posts: 3589
Country: Norway

Ben,

I think you're going to have to accept that all systems probably show this behaviour. I have an Ivy Bridge i5 that spins up to 4.3GHz; you and Hammy have Sandy Bridge i7s running even faster. And we are all reporting exactly the same results. I suspect there is no real bug, and this is just a product of how LR is coded - it's obvious they've done a really good job coding the software for multicore CPUs, so I would be very surprised if an obvious bug slipped through testing. I can't see that a 6-core i7 is going to offer a dramatically different level of performance either.

The only variables we haven't really looked at here are Windows 8 and AMD's different CPU architecture (also with 8 cores), and perhaps a little more testing could be done with MacOS to confirm the results there.



morganb4
Registered: Nov 03, 2005
Total Posts: 5312
Country: Australia

Hi guys, thanks:
@15bit. I have been canvassing 6-core owners (when I find them), and Amonline here has a 6-core Gulftown and reports no problems. Someone in the LR forum has an 860-based 6-core and has no problem. Another guy in the same forum has a 3930K and does not report the specific problem I am trying to address, so there has to be some degree of platform specificity to this problem. The image I showed above was from MacOS, I think; at any rate I can confirm that the problem exists on the same platform under MacOS 10.6.8 ('Snow Leopard') and 10.8.2 ('Mountain Lion'), and I expect in the upcoming 10.9 ('Ceiling Cat'?).

@Hammy. Thanks mate, you have gone above and beyond - both of you have. Hammy, I was really just angling for a day-to-day RAW conversion test on a 22MB RAW: just your feeling as to how the NR slider operated with those adjustments, i.e. how much lag it has. Your system may be one that calculates the noise only upon release of the slider, in which case, is it immediate, etc.? If you don't have time though, I completely understand. All of us have sunk enough time into this already.

On the basis of reports from the 3930K owner, I'm considering getting one and taking it to 4.5-4.8 or so. But I have not conclusively ruled out the 3770K i7 yet - I just haven't had a report from anyone. All the i7 owners that have detailed this test appear to be running 2600Ks or 2700Ks.

Cheers



15Bit
Registered: Jan 27, 2008
Total Posts: 3589
Country: Norway

I wouldn't bother with the 3770K - there is only a little difference between it and the 2600K you have now, and it is slightly less overclockable. If you can afford and justify the 3930K, then go for it. We've shown here that LR does use multiple cores fairly well, so there's a good chance you'll speed things up a bit. It's just the interaction of clarity etc. with the noise slider that might not benefit greatly, if my tests are correct, and perceptually this is the only part of LR I find to be "slow" with my i5. The 3930K is an expensive upgrade though ($770 for the CPU alone here), and I know I couldn't justify the money over the i5 I bought.

Be sure to report back here after you upgrade.



Hammy
Registered: May 21, 2002
Total Posts: 2843
Country: temp

morganb4 wrote:
Hammy, I was really just angling for a day-to-day RAW conversion test on a 22MB RAW: just your feeling as to how the NR slider operated with those adjustments, i.e. how much lag it has.



Ben,

Sorry to get back to this so late. I actually had to figure out where I had some RAW files... and found some 25MB CR2s from a 7D to work with.

Definitely a lot more responsive working with a file 1/10th the size.
With no Highlight/Shadow applied while moving the Noise Luminance slider:
- CPU bounced from 40-60%, mostly around 40%
- Memory made a few little bumps
- I/O again nearly a no-show

Lag here is nonexistent. Noise is applied as I drag the slider; instant availability of image move/pan/zoom, etc.


With Highlight and Shadow both at +19:
- CPU peaks at 70-80%
- Memory using a tad more, with little spikes during processing
- I/O again like a teenage chatterbox

Lag is noticeable, but much less so than with the large image. I am able to pan the image before the full noise filter is applied, and all adjustments settle on the image in about a second.

So seeing as I'm not pegging a 4-core (+4 HT) CPU, I'm not sure a 6-core would be noticeably better. Of course there is a trade-off between price/longevity and cores/overclockability. I have two 2600K chips that go to 4.6 without much problem, and this 2700K will easily go over 5GHz, but I like the auto voltage from 1.6-4.8GHz.
From what I've gathered, how well an application uses more cores (and I believe in the future they will) is where the benefit of more cores vs more clock speed comes into play. The question comes down to: do you spend the money now for more cores and let software catch up to you, or will it take software so long to utilize the cores that you'll be upgrading again anyway?

Ah, technology... I'm just glad that as it gets faster/better with every generation, it also gets relatively cheaper (unless you buy something new in the first month or two).



morganb4
Registered: Nov 03, 2005
Total Posts: 5312
Country: Australia

Thanks Hammy. I've got a 6-core and board waiting to go in - just got to find the time to beat it into working with MacOS. I will report when it gets past the ice-cream-tub stage and into a proper environment.

After many years of stuffing around with technology, I no longer believe in buying for the future. I get what is necessary right now.

I have rented time on a Cray-1 so that I can use LR 5.



morganb4
Registered: Nov 03, 2005
Total Posts: 5312
Country: Australia

As requested by 15bit, here is my experience of migrating to a 6-core 3930K:

OK, sorry for the delay - I've had a lot on, and getting the X79 board set up under MacOS has been a COW in the utmost.

Yes, it's better.
Is it much better? No.
Have I seen a big improvement in noise slider behaviour due to the different platform? No.
Have I seen an improvement only in line with the extra grunt? Yes.
Are the LR noise sliders better with HT off? Seemingly, yes.
Is it screamingly fast and fun to use? Yes.

Was it worth it? Well... I sank 25+ hours into getting it right, including an RMA due to a screwed BIOS. I got unbelievably frustrated, but pressed on to get a MacOS-based solution on it, and still had to spend extra money on an external audio solution.

So yeah, after about $1100, 25+ hours of work, late nights, cut fingers on cases, lost time and an incremental performance improvement, I would have to say, through gritted teeth and with a slightly robotic glare: "yes, it was totally worth it, best decision I ever made, etc. etc." :-[ (buyer psychology).

Still, it's an improvement - an improvement I needed, one that drags the problem back into the 'workable' range - but it was an expensive one. Also, I'm overclocked to 4.5GHz @ 1.375Vcore. I can't seem to get past this, so I reckon I'm at the ceiling of my chip. If you are considering a jump based on my results, you need to be prepared to O/C, with at least a closed-loop H100 water cooler.


15bit, your conclusions appear to be prophetic.



15Bit
Registered: Jan 27, 2008
Total Posts: 3589
Country: Norway

Thanks for the update Ben



knower
Registered: Aug 13, 2012
Total Posts: 88
Country: Canada

This is a very nice thread. I think I'll add some more information; apologies if I say something already clear or already said.

Multi-threading and multi-core are not the same thing. Lightroom, at the moment, is a multi-threaded application, but not a multi-core one.
A multi-threaded app is only marginally better than a simple single-core one, and significantly worse than a multi-core one.
Maybe some of you remember that when hyper-threading came out it was a partial flop, because 2 physical processors were still much better than a single hyper-threaded one.
Hyper-threading is used in only a small share of calculations, and not in the very tough ones.

This means that you'll see very little benefit from a multi-core CPU if you ONLY use Lightroom. There are still benefits, though, since a multi-core CPU can handle many tasks at the same time in a better way, if the system has enough memory to manage that.

Another nice test you can do is to use the new Nik Color Efex, which does use multiple cores. The increase in speed between the latest version and the previous single-core/multi-threaded one is very high. On a quad core with a 36MP file, the changes are almost real-time.

If you ONLY use Lightroom, then it is better to buy a FASTER processor, whatever it is, dual or quad core.
In general I won't buy a dual-core anymore, since quad-cores handle the system much faster.
You can run Photoshop, Lightroom and other things at the same time. When and IF LR handles quad-cores, you'll see a very big increase in performance, since NR, sharpening and in general all the pixel-interpolation processes can be heavily split between CPUs or cores - but this requires a big rewrite of the algorithms behind them (see the sketch below).
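
To make the tile-splitting idea concrete, here is a minimal sketch (not Adobe's actual code) of denoising an image in parallel strips, one per core. The box blur is just a stand-in for real noise reduction, and in practice the strips would need to overlap so the filter doesn't leave seams at the boundaries:

```python
# Split an image into horizontal strips and process each on its own
# core with a process pool.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def denoise_strip(strip: np.ndarray) -> np.ndarray:
    # Placeholder for real NR: a crude vertical box blur.
    return (strip + np.roll(strip, 1, axis=0) + np.roll(strip, -1, axis=0)) / 3

def denoise_parallel(img: np.ndarray, workers: int = 4) -> np.ndarray:
    strips = np.array_split(img, workers, axis=0)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return np.vstack(list(pool.map(denoise_strip, strips)))

if __name__ == "__main__":
    image = np.random.rand(4000, 6000)   # stand-in for a ~24MP frame
    out = denoise_parallel(image)
    print(out.shape)
```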

If somebody wants to try, they can use a copy of NUKE (by The Foundry) and edit a picture there. You'll see how much faster than Lightroom (and Photoshop) it is. It is also totally non-destructive and nodal. But it is software born to do compositing for VFX and such.

Last thing, just to clarify: Lightroom is NOT using any OpenGL acceleration through the video card for image processing. Photoshop is, but only to a small extent.

Photoshop is very old now; it's a bit frustrating to work with it when there are so many more advanced tools.
Lightroom needs better interface customization, especially when used with two monitors, and proper multi-core development to speed it up considerably.

Hope this helps a bit!

Ciao!
G.




Bifurcator
Registered: Oct 22, 2008
Total Posts: 9247
Country: Japan

A couple of things here.


knower,
Adobe says that LR4 is indeed multi-core and not just multi-threaded. So where is your information coming from?

Nuke is a video and FX editor/compositor, not a photo editor.

OpenGL is a 3D display language, not really suited for 2D image processing at all. There's CUDA and such, which are different, and could be used for 2D image processing and rendering.

Older is usually better in the world of apps - it spells maturity. If the PS GUI and workflow don't suit you personally, that's a different issue. I'm an example of the opposite case.



Anyone,
It seems to me that LR is only a kludge of retrofitted PS code - if I can get away with using such ridiculous layman's terminology. It's a slow dog even compared to PS, which is actually much faster and of course orders of magnitude more capable and diverse. Between Bridge and ACR you have 100% of the librarian, management and display functionality of LR4, and at about 4 to 6 times the speed. If you include a few functions from PS itself, then you have 100% of all functions in all areas - and 4 to 6 times faster. Few people know this because of the way the GUI is laid out, the location of the various menu items, and the metaphorical terminology used to convince us we're doing something different or have something more in LR. Great for Adobe sales, but kinda silly to anyone who has taken the time to actually look at all the functionality in Bridge and ACR. And speaking at a lower level, it's identical as well: the same demosaicing routines are used, the same color models, the same (identical) core routines are present in LR for every one existing in Bridge, ACR, or Photoshop - with a few exceptions, because PS gets the newer improved code before it hits LR 6 to 8 months later.

Here's a speed test for those not convinced of this. Load 200 RAW images into ACR, select all, make some edits you think will be CPU intensive, click Done. Now import those into LR and try browsing them. All the ACR edits render perfectly, but it takes 2 to 3 seconds per image to "load" and render, making everything feel very sluggish and making it nearly impossible, or at least VERY uncomfortable, to navigate and compare images. Now load up Bridge and point it to the folder containing those same 200 RAW images. Everything is nearly instant, and yet still all of the ACR edits render just like in ACR itself or in LR.

Why? Beats me! If I can suggest a reason, I think it's because Adobe actually employs retarded chimpanzees instead of intelligent and accomplished application programmers. And I'm actually serious here - not kidding. I guess it's actually a management decision based on budget constraints and profit profiles. We went through some of that at NewTek for a while too, though that was taken care of some years ago now. Basically, Adobe employed a method (read: hackish kludge) of bringing together existing components into a GUI which streamlines image processing workflow to match that of several competing products, appears unique and different from ACR/Br/PS, and "creates" an additional product.

So why do people use it?
a) They were told it's "the best" by someone with pretty images posted on-line.
b) They're stuck in a rut - it's all they know and don't wanna learn something different.
c) They didn't do any homework and believed the marketing hype - related to a).
d) They got it cheap or are using their dad's computer and can't afford something else.
e) They only process a few images a day anyway so who cares - they like the handholding GUI.

By now you're all thinking that I'm just bashing this app. But I'm really not - it's the actual state of affairs we find ourselves in with this product. Of the "popular" editors it's the very slowest there is. And exactly because of that slowness, it's the most cumbersome and "unusable" to photographers who have any kind of schedule to keep with any sizable workload. So what are we doing here profiling LR? Trying to determine WHY it's slow? Unless Adobe wants to start cutting me a paycheck, I don't really see any value in determining why. It just is, and there's nothing we can do about it.

MP/MC Advantage,
Not only are almost all other editors faster, but what happens in the future when/if Adobe gets it together too? Will you then go out and buy a machine with more processors to take advantage of the added speed? That seems like a waste of time, and probably money too. It's the opposite of how trained systems engineers profile as well. You select the hardware for the kinds of tasks you're wanting or expecting to have to achieve, and then you select the most suitable software to accomplish that - based on budget, company ecological and political standing, EULA, function, performance, and so forth.

Image rendering (not so much editing), and video editing and rendering (what most photogs are concerned with in these modern times), greatly benefit from adding more cores. It scales almost linearly with each added core. The cost per CPU cycle also scales favorably for the typical professional - or can be made to! So if the initial price can be justified, adding more cores to a system's spec is a great way to improve productivity - and perhaps even so for LR in the future too. For the casual photog shooting 1 to 20 images a day on average and almost no video, then for sure, Core i3 2-core boxes running LR are fine - they won't realize any of the advantages of higher-bandwidth systems or (probably) faster editors either. And this profiles across the board up to someone like Chase Jarvis, who shot twenty thousand plus images in a single weekend for a single sporting goods ad. Can you imagine just the sorting & selecting job alone with that many images?
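
"Almost linearly" is worth a quick sanity check against Amdahl's law: the speedup from extra cores depends on how much of the job actually runs in parallel. A minimal illustration - the 95% parallel fraction here is an assumption for the example, not a measured figure for any real editor:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
# parallel fraction of the work and n is the core count.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

for cores in (2, 4, 6, 12):
    print(f"{cores:2d} cores -> {amdahl_speedup(0.95, cores):.2f}x")
# 2 -> 1.90x, 4 -> 3.48x, 6 -> 4.80x, 12 -> 7.74x: near-linear only
# while the serial 5% stays small next to the parallel share.
```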

Here again, the system architecture is selected based on what you expect to be using it for, and the most suitable software follows. For me, LR doesn't fit into my profile anywhere at all - it's just too slow, and IMO too restrictive, as most of these kinds of apps are. It sounds to me from reading this thread that many of you have come to the same conclusion as well. So why fight it? Give it up and move to something that isn't currently a dog.



blob loblaw
Registered: Aug 19, 2007
Total Posts: 315
Country: N/A

I could not agree more!
I'm glad to see I'm not the only one who feels this way, because when you call out LR like that, people don't seem to take you seriously - it sounds like you're just ranting and being unreasonable. Well, in my circles anyway.

Almost two years ago, I decided to try and find a replacement, and tried as many trials as I could get on a PC: Bibble, Adobe Bridge+PS, DxO, CaptureOne, etc, etc.
I was just fed up with the way LR kept running. It took me a few weeks of frustration because of the learning curve, and I've been very happy with C1Pro since.

I was a user of LR since before the initial release of v1, as a beta. With every new update and every new version, I kept hoping they would put priority on performance. They would always mention 'speed improvements', but I could not see any. In fact, it kept feeling slower.
It feels like their target demo is non-professionals - an advanced amateur or non-technical user. I don't want to say an 'Apple/Mac' user, but that's the kind of user I envisioned: someone who is willing to compromise performance for usability.
To me, C1Pro definitely has a steeper learning curve. It requires knowledge of color theory and a host of other industry features, which makes it more of a craftsman's tool than an end-user tool, if that makes any sense - but the results are amazing.





