[KyOSS Discuss] hwloc and process affinity for performance issues

alan blount alan at zeroasterisk.com
Thu Jun 12 23:28:28 EDT 2014


Sounds like it was a great discussion, which is continuing here...  (damn
those kids keeping me at home)

@Jeff (et al.)

CakePHP is a very heavy-handed framework.  It used to be much worse... It
still kinda is heavy-handed, but it's a lot better now at establishing a map
of "how to load stuff" and calling it when needed, vs. "loading in stuff" in
case you need it... but it's still pretty "hog-ish" as frameworks go... I
wouldn't run it without a PHP opcode cache like XCache or APC or whatever.

(I will be glad to give you my reasons for "still" running it anytime
though; a lot of it boils down to wanting more explicit structure to give
to my team, and my goals aligning closely with what CakePHP expects and sets
things up for)

@Charles

Firing a job every minute shouldn't be an issue... if it's pegging your
capacity "some" of the time, I'd suspect it's a queue processor or
something?  Here are some CakePHP ones I've used which may be along the
lines of what you're talking about:

http://cakeresque.kamisama.me/
https://github.com/webtechnick/CakePHP-Queue-Plugin

We have run into issues with "locking/race conditions" where:

1st cron run: starts something that takes a long time...
2nd cron run: attempts to start something and isn't aware that 1st process
is in the middle of running... (etc)

Is that the problem you are trying to solve?

The easy solution is to lock a table, lock a row, or set some redis/memcache
record, so that the 2nd process aborts if the 1st is still running.  That
might put you into a "locked for too long" state (one slow run blocks every
run behind it) and give you a gap in processing... not sure what your goals
and requirements are there... you could also allow up to 2 [or 3, 4, ...]
concurrent processes before aborting, to help mitigate that.
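
Here is a rough sketch of that "abort if already running" idea, just to make
it concrete.  It is Python rather than PHP, and it swaps in a plain lock file
with flock() instead of a table/row/redis lock, so treat it as an illustration
of the pattern and not as what we actually run.  The names acquire_slot,
run_exclusive and LOCK_DIR are made up for the example, and it assumes a
Unix-like box (the fcntl module is not available on Windows).

```python
import fcntl
import os
import tempfile

# Assumption for this sketch: lock files live in the system temp directory.
LOCK_DIR = tempfile.gettempdir()


def acquire_slot(job_name, max_concurrent=1, lock_dir=LOCK_DIR):
    """Try to grab one of `max_concurrent` lock slots for `job_name`.

    Returns an open file object holding the lock, or None if every slot
    is already taken by another run.
    """
    for slot in range(max_concurrent):
        path = os.path.join(lock_dir, "%s.%d.lock" % (job_name, slot))
        handle = open(path, "w")
        try:
            # LOCK_NB makes this fail immediately instead of waiting.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            handle.close()  # slot is busy, try the next one
            continue
        return handle
    return None


def run_exclusive(job_name, work, max_concurrent=1):
    """Run `work()` only if a slot is free; return True if it ran."""
    handle = acquire_slot(job_name, max_concurrent)
    if handle is None:
        return False  # earlier run(s) still going, so abort this one
    try:
        work()
        return True
    finally:
        handle.close()  # closing the file releases the lock
```

A cron entry point would call something like run_exclusive("my_job", do_work)
and just exit quietly when it returns False.  One nice property of flock() is
that the kernel drops the lock when the process dies, so a crashed run doesn't
leave you stuck in the "locked for too long" state the way a forgotten
database row or cache key can.  Raising max_concurrent to 2 or 3 gives you the
"allow a few overlapping runs" variant.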

... and then back to Jeff's point - you could always come back to the
process itself to see if you can make it faster, simpler, nicer...
Profiling and optimization is FUN!



Thanks,
-alan


On Thu, Jun 12, 2014 at 5:30 PM, Jeff Squyres <jeff at squyres.com> wrote:

> @Deven: Yeah, there was a lot of talk about this exact point last night.
>  I was basically making the argument:
>
>    - "Best software practices" should not mean "totally and completely
>    disregarding performance".  Put differently: not-terrible-performance is a
>    best practice, too.
>    - Abstraction should not have to mean performance penalties
>    - If you totally disregard performance, it will cost you in terms of
>    hardware expenditures
>
> All of this is in the spirit of: don't be a performance-freak like Jeff,
> but don't knowingly write bad-performing code and rely on the hardware to
> make it better for you.  The free lunch of processors just getting faster
> and faster is pretty much over.  And IO speeds and memory speeds haven't
> improved much over the past several years.  SSDs are helping, but depending
> on your app, they may not help at all.
>
> I also gave the example of CakePHP last night.  When I played with
> CakePHP, it seemed like a pretty great framework -- it makes writing some
> apps pretty darn easy.  But it also looked *horrendously inefficient* to
> me -- lots of memory usage, and oodles of preemptive database queries for
> every HTTP request, *just in case my app **might** want to use some of that
> database data*!  That was pretty horrifying to me.  I'm all for
> abstraction, but abstraction does not need to be the enemy of performance,
> nor should abstraction preemptively do things under the covers that your
> app *might* want to use in the future.  Particularly when there's great
> cost for those preemptive actions.
>
> So I totally agree: don't be a performance freak like Jeff.  But *do*
> give *some* thought to performance.  It is a best practice, after all.
>
> @Charles: Sure, you can always break things.  That's why people like...
> er... I'm sorry, I forget his name, but the guy who was there last night
> who was a performance engineer... that's why people like him exist.
>  They're trained to know how to get the best performance; they know how to
> tweak OSs, hardware, and even tweak-able applications.  He said multiple
> times last night, "There were times when we just had to throw more hardware
> at the problem to make the program perform at scale."
>
> For which we server vendors thank you.  :-)
>
> That being said, you mentioned that you have performance problems with
> cron jobs that run *every minute*.  It's not hard to make a tweak and see
> what happens to the load / performance. Hmm.. that tweak didn't work, so
> try another tweak; see what happens.  Rinse/repeat.  If you break
> something, fix it in the next minute.  Spend as much time as you want on
> this... until you get into the crossover point of programmer productivity
> vs. performance optimization, as Deven mentioned.
>
> From what little you said about these jobs, I can infer that the most you
> might break it is making your every-minute cron jobs run a little slower.
>  I can't say that for sure, of course, but if the jobs run a little slow
> for a few minutes while you're experimenting with tweaks, is that a
> problem?  Given everyone's attitudes about performance, I suspect not, *but
> only you can answer that* (i.e., I don't know what your application is
> doing and how time-sensitive it is).
>
> If it's super easy to break your every-minute cron job, then you have to
> question how robust that app is and whether you should really be running it
> every minute or not!
>
> As for your description of how it looked like the machine was load
> balancing: maybe, maybe not.  Remember that "top" gives a snapshot at an
> instant in time.  Processor affinity is generally most useful for jobs that
> run for a little time -- not ones that effectively run "immediately" and
> complete.
>
>
> On Thu, Jun 12, 2014 at 10:20 AM, Charles Griffin <cegrif01 at gmail.com>
> wrote:
>
>> @Deven I was thinking the same thing myself.  I always like to start
>>  with "Is it possible to screw it up by doing x thing?  If so, what is the
>> cost of screwing up x thing?"  If it's really risky then you automatically
>> need to pay someone good money to make sure that you reduce your risk.  If
>> it's something that you can try and screwing up is very benign, then I
>> think we should just try it without consulting expert opinion and if it
>> doesn't work, no harm no foul.  Unfortunately most business problems fall
>> in the "high risk, need an expert to make sure things go better" category.
>>
>> @Jeff What are some of the ways tweaking could backfire?  When we were
>> running our crons under heavy load, I noticed that one core would get up to
>> 100% for a split second, then that same core would drop to 20% while
>> another core was at 100%.  In other words, it appeared to be balancing the
>> load.  If I force this series of processes onto just 1 core, could I
>> overheat one of the cores?
>>
>>
>> On Thu, Jun 12, 2014 at 10:01 AM, Deven Phillips <
>> deven.phillips at gmail.com> wrote:
>>
>>> Just to play devil's advocate though, this sort of tweaking incurs a
>>> significant administration and maintenance cost. Having someone
>>> knowledgeable enough making sure that your applications are being
>>> "affinitized" correctly costs time and money which could just as easily be
>>> spent on additional hardware. It would be an interesting case study to
>>> determine where the break-even or threshold of return might lie depending
>>> on the project and the team working on it...
>>>
>>> Deven
>>>
>>>
>>> On Thu, Jun 12, 2014 at 9:55 AM, Conrad Storz <conradstorz at gmail.com>
>>> wrote:
>>>
>>>> I appreciate your help and advice Jeff. Your knowledge of hardware is
>>>> far beyond mine. I liked the concrete examples of usage but I don't know
>>>> anything about that level of hardware optimization any more than I know
>>>> about database structure and optimization. Each of us should be aware of
>>>> the other aspects but each of us is also becoming more and more of a
>>>> specialist. Like doctors, we specialize. I go to a neurologist for one
>>>> problem and a dentist for another. Gone are the days of the barber/dentist!
>>>> Maybe that's a good thing lol
>>>> On Jun 12, 2014 8:46 AM, "Jeff Squyres" <jeff at squyres.com> wrote:
>>>>
>>>>>  Charles / everyone --
>>>>>
>>>>> For all my talk last night, the only point I was really trying to
>>>>> convey is that programmers cannot stick their fingers in their ears and
>>>>> cover their eyes and ignore the underlying hardware, and just trust that it
>>>>> will always go fast.  You absolutely don't need to be an expert in the
>>>>> underlying hardware, but you should know *something* about it, and at least
>>>>> keep it in mind when writing software.
>>>>>
>>>>> A good example is your car: 99% of the world doesn't know (or
>>>>> care) how a carburetor works, and yet they can operate their vehicles just
>>>>> fine.  But consider: everyone had to take a minimum competency test and
>>>>> certification (i.e., driver's test/license) before they were allowed to
>>>>> operate that car.  Meaning: everyone knows about pushing on the gas and the
>>>>> brakes, windshield wipers, turn signals, ...etc.
>>>>>
>>>>> This kind of basic information -- gas/brakes/windshield wipers/turn
>>>>> signals/etc. -- is all that I'm encouraging programmers to understand.
>>>>>  Understanding and designing for the basic model of a modern server can
>>>>> actually make tangible differences in the operating performance of your
>>>>> software.  And that, in turn, can turn into tangible savings in hardware
>>>>> expenditures (regardless of your hosting scenario).
>>>>>
>>>>> Finally, I want to give some disclaimers about the affinity advice I
>>>>> gave to Charles last night...
>>>>>
>>>>>    1. The commands you want to use out of the hwloc package are
>>>>>    lstopo (list topology) and hwloc-bind (bind a process -- and its children
>>>>>    -- to a set of cores/hyperthreads).
>>>>>    2. Adding process affinity to your cron jobs will likely not
>>>>>    magically solve your performance problems.  Affinity may *help*, but the
>>>>>    degree to which it helps your performance issues depends on exactly what
>>>>>    the performance problems are.
>>>>>    3. Many other factors come into play, too.  You should examine the
>>>>>    processes in question and see exactly what the bottlenecks are: raw disk
>>>>>    IO? Memory pressure / swapping? Database queries?  Network activity?  ...?
>>>>>    4. Affinity *may* help (some) in these cases -- e.g., if part of
>>>>>    your problem is raw disk IO, try locking the process down to a core (or
>>>>>    hyperthread) that is NUMA-close to where the disk is located.  Remember
>>>>>    last night that I showed a server with 2 NUMA domains, and the disk was a
>>>>>    PCI device hanging off one of them.  Likewise, if the bottleneck is network
>>>>>    IO, then try locking the process to a core NUMA-close to the network device
>>>>>    that you're using.  And so on.
>>>>>    5. I spoke last night about the example of running one web server
>>>>>    (apache, nginx, etc.) per processor (i.e., set of 8 cores). This not only
>>>>>    tends to keep the web server process physically close to the memory that it
>>>>>    uses, you can also configure the web server to use a NIC that is
>>>>>    NUMA-close, too, further reducing server-internal network congestion (I
>>>>>    don't believe I mentioned the latter point last night).
>>>>>    6. Sometimes using process affinity does not increase the
>>>>>    performance of any individual process.  But if used judiciously with lots
>>>>>    of processes in a single server, it can improve the overall throughput of
>>>>>    the server because you've decreased the amount of "code movement" within a
>>>>>    server, and potentially removed contention for internal resources (L1/L2/L3
>>>>>    caches, NUMA interconnect, memory controllers, etc.).  There have been a
>>>>>    few academic papers showing exactly this effect -- individual processes
>>>>>    weren't noticeably faster/more efficient when affinitized vs.
>>>>>    non-affinitized, but servers were able to be loaded higher and still run
>>>>>    efficiently/with a high degree of concurrency as compared to not using
>>>>>    affinitized/locale-aware processes.  Put simply: without
>>>>>    affinitization/locale-awareness, they could run X processes at Y%
>>>>>    efficiency, but *with* affinitization/locale-awareness, they could run
>>>>>    (X+Z) processes at the same Y% efficiency.  Meaning: you can run more stuff
>>>>>    at the same level of efficiency, because you're effectively using the same
>>>>>    hardware more efficiently.
>>>>>    7. Additionally, if your jobs are running in a VM, if the VM does
>>>>>    not lock virtual cores to actual cores (or virtual cores to physical
>>>>>    hyperthreads, at the very least), then affinity likely won't help much --
>>>>>    if at all -- because the hypervisor has already virtualized the processors,
>>>>>    and can therefore remap your affinitized process around at will (i.e., your
>>>>>    guest OS thinks the process is locked to a core, but the definition of
>>>>>    that core may be changed at any time by the hypervisor).
>>>>>
>>>>> In short: as usual, YMMV.
>>>>>
>>>>> PS: Bonus words of the day include "affinitized" and
>>>>> "affinitization".  Use them in sentences today.  :-)
>>>>>
>>>>> --
>>>>> {+} Jeff Squyres
>>>>>
>>>>> _______________________________________________
>>>>> KyOSS-Discuss mailing list
>>>>> KyOSS-Discuss at kyoss.org
>>>>> Subscribe by sending email to kyoss-discuss-subscribe at kyoss.org
>>>>> Unsubscribe by sending email (from the address you wish to
>>>>> unsubscribe) to kyoss-discuss-unsubscribe at kyoss.org
>>>>> Difficulty unsubscribing? Check your email headers for originally-to
>>>>> address in case you are forwarding your mail.
>>>>> More options at
>>>>> http://kyoss.org/cgi-bin/mailman/listinfo/kyoss-discuss
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> {+} Jeff Squyres
>
>