From: (Anatol Belski)
Date: 29 Dec 2018 06:19:16 Hong Kong Time

RE: [INTERNALS-WIN] --enable-native-intrinsics issue

Hi Kalle,

> -----Original Message-----
> From: Kalle Sommer Nielsen 
> Sent: Friday, December 28, 2018 9:03 PM
> To: Anatol Belski 
> Cc: Christoph M. Becker ; internals-win>
> Subject: Re: [INTERNALS-WIN] --enable-native-intrinsics issue
> Hi Anatol
> On Fri, 28 Dec 2018 at 20:38, Anatol Belski wrote:
> > --enable-native-intrinsics doesn't require the actual host to support the
> > chosen features. Thus, the chosen features need to be present on the target
> > machine. AVX2 is still a rare case, but most current x86 hardware
> > supports AVX.
> >
> > The default processor feature set is SSE2. If the compiler generates an
> > AVX2-optimized binary, it won't run on machines that don't support AVX2. An
> > external launcher could be used that invokes a suitable binary depending
> > on the detected features. However, that's not universally doable, for
> > example with Apache, where PHP is a DLL. For FCGI it might be error-prone,
> > too. Otherwise, one would have to deliver binaries with different
> > optimizations along with the default ones. So that's not a real solution.
> > And by the way, even the SSE2 builds have dynamic feature detection, which
> > is used internally for some functions in the core, such as addslashes().
> >
> > Still, binaries generated to use more recent CPU features might perform
> > better. Snapshots currently produce AVX-optimized binaries for 7.3 and
> > master. I haven't had a chance to test AVX2 yet :( In any case, that's no
> > different from cross compilation on other platforms. For example,
> > distributions usually ship binaries targeting at most SSE2, but users are
> > free to use -march=native when compiling themselves. I don't see a reason
> > to overcomplicate this, especially as there's no good way to deal with it
> > at runtime. Implementing -march=native might make sense, but its absence
> > is not critical. Free tools like coreinfo or cpuz are available, too.
> Yeah, I realize that. However, what I was thinking of was adding a program
> that reports the CPU features so that potential builders can spot them more
> easily, like a simple C++ program that exposes them via __cpuid/__cpuidex.
Yeah, sure. It's just that cpuz and other tools already exist :) A custom tool could probably be useful for -march=native-like functionality that sets the feature set automatically, though likely only for the common case. Perhaps it could be done via --with-native-intrinsics=native (similar to how one can currently pass "all"), which would then use your tool to enable the highest available CPU features automatically. What do you think?
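For illustration only, the core of such a tool could look roughly like the C sketch below: it maps raw CPUID/XCR0 values to the highest level name, the way a hypothetical --with-native-intrinsics=native step might consume it. The names highest_isa(), isa_name() and the CPUFEAT_DEMO guard are made up for this sketch. The bit positions are the documented CPUID ones (leaf 1 EDX bits 25/26 for SSE/SSE2; leaf 1 ECX bits 0/9/19/20 for SSE3/SSSE3/SSE4.1/SSE4.2, bit 27 for OSXSAVE, bit 28 for AVX; leaf 7 EBX bit 5 for AVX2), and AVX is only reported when XCR0 shows that the OS saves YMM state. The MSVC-only main() reads those values with __cpuid, __cpuidex and _xgetbv.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Instruction set levels, lowest to highest (hypothetical naming). */
enum isa_level {
    ISA_NONE, ISA_SSE, ISA_SSE2, ISA_SSE3, ISA_SSSE3,
    ISA_SSE41, ISA_SSE42, ISA_AVX, ISA_AVX2, ISA_COUNT
};

static const char *isa_name(enum isa_level level)
{
    static const char *const names[ISA_COUNT] = {
        "none", "sse", "sse2", "sse3", "ssse3",
        "sse4.1", "sse4.2", "avx", "avx2"
    };
    assert((unsigned)level < (unsigned)ISA_COUNT); /* guard the table index */
    return names[level];
}

#define BIT(n) (UINT32_C(1) << (n))

/* edx1/ecx1: CPUID leaf 1; ebx7: CPUID leaf 7, subleaf 0;
   xcr0: XGETBV(0), only meaningful when OSXSAVE (ECX bit 27) is set. */
static enum isa_level highest_isa(uint32_t edx1, uint32_t ecx1,
                                  uint32_t ebx7, uint64_t xcr0)
{
    /* AVX needs CPU support plus OS support for saving XMM and YMM state. */
    int os_saves_ymm = (ecx1 & BIT(27)) && ((xcr0 & 0x6) == 0x6);
    if (os_saves_ymm && (ecx1 & BIT(28))) {
        return (ebx7 & BIT(5)) ? ISA_AVX2 : ISA_AVX;
    }
    if (ecx1 & BIT(20)) return ISA_SSE42;
    if (ecx1 & BIT(19)) return ISA_SSE41;
    if (ecx1 & BIT(9))  return ISA_SSSE3;
    if (ecx1 & BIT(0))  return ISA_SSE3;
    if (edx1 & BIT(26)) return ISA_SSE2;
    if (edx1 & BIT(25)) return ISA_SSE;
    return ISA_NONE;
}

#if defined(_MSC_VER) && defined(CPUFEAT_DEMO)
#include <intrin.h>
#include <immintrin.h>

int main(void)
{
    int info[4];                 /* EAX, EBX, ECX, EDX */
    uint32_t edx1, ecx1, ebx7 = 0;
    uint64_t xcr0 = 0;

    __cpuid(info, 0);
    int max_leaf = info[0];      /* highest supported standard leaf */

    __cpuid(info, 1);
    ecx1 = (uint32_t)info[2];
    edx1 = (uint32_t)info[3];

    if (max_leaf >= 7) {
        __cpuidex(info, 7, 0);
        ebx7 = (uint32_t)info[1];
    }
    /* XGETBV faults unless the OS has enabled it (OSXSAVE). */
    if (ecx1 & BIT(27)) {
        xcr0 = _xgetbv(0);
    }
    printf("%s\n", isa_name(highest_isa(edx1, ecx1, ebx7, xcr0)));
    return 0;
}
#endif
```

Built with cl /DCPUFEAT_DEMO, it would print a single level name such as avx2. Keeping the bit mapping separate from the CPUID calls means the mapping can be checked on any machine, not only on the exact hardware in question.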

Otherwise, there is no reason to maintain our own tool. AFAIR there was a ready-to-use sample on MSDN that could serve as a base for the general case, but processor-model-specific differences will surely appear, and it's not always possible to get onto the exact hardware. Referring to an existing tool would be appropriate in that case (processor manufacturers also provide specific tools, which are surely better than a basic sample). Even if it takes a few more steps than just running `lscpu` on Linux, the general expectation is that builders know what they are building anyway.

Visual Studio provides a dedicated switch only starting with /arch:AVX, so for earlier instruction sets we can only assume they will be present. Say when passing --with-native-intrinsics=sse4.2, all that can be done is enabling the corresponding macros that allow code generation for SSE4.2 and lower. There's a subtle difference in this process, as /arch:AVX also defines __AVX__, so that macro doesn't have to be set manually. For automation, therefore, it's impossible to go by CFLAGS alone.
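To make that difference concrete: a translation unit only sees what the compiler or the build defines. The minimal C sketch below reports the compile-time baseline purely from macros. It assumes the MSVC behaviour described above, i.e. /arch:AVX and /arch:AVX2 predefine __AVX__ and __AVX2__, while __SSE3__ through __SSE4_2__ are not predefined and would have to be passed by the build itself (GCC and Clang set them from -m flags or -march). compile_time_isa() and the ISA_DEMO guard are hypothetical names.

```c
#include <stdio.h>

/* Report the instruction set baseline this file was compiled for, based
   only on predefined or build-supplied macros (no runtime CPUID check). */
static const char *compile_time_isa(void)
{
#if defined(__AVX2__)
    return "avx2";               /* e.g. MSVC /arch:AVX2, GCC -mavx2 */
#elif defined(__AVX__)
    return "avx";                /* e.g. MSVC /arch:AVX, GCC -mavx */
#elif defined(__SSE4_2__)
    return "sse4.2";             /* on MSVC only if the build defines it */
#elif defined(__SSE4_1__)
    return "sse4.1";
#elif defined(__SSSE3__)
    return "ssse3";
#elif defined(__SSE3__)
    return "sse3";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "sse2";               /* x64 always has SSE2 */
#else
    return "none";
#endif
}

#ifdef ISA_DEMO
int main(void)
{
    printf("compiled for: %s\n", compile_time_isa());
    return 0;
}
#endif
```

With 64-bit MSVC and no /arch option this would report sse2 (via _M_X64), and with /arch:AVX it would report avx; anything between those two only shows up if the build passes the macros explicitly.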

> But I agree, a runtime solution is not a worthwhile option at all, given the
> obvious performance degradation it comes with. However, I think we could
> help our users/developers (especially those not too familiar with more
> advanced Windows build setups) on this front.
Probably a launcher tool could help in such cases, even if not universally. A launcher could also handle an absent or wrong CRT, or, for example, a CRT that is not Spectre-mitigated. Other topics might come up, too. Performance-wise, it would only affect startup time and wouldn't cause any critical slowdown. It's still not applicable to non-.exe SAPIs, but it could be something to start thinking about. Of course, we should try to make these topics as user-friendly as possible, also taking into account the resource capacity we have :)

Once (or if) we deliver some non-SSE2 binaries, more convenient approaches should be considered. For now, such builds are experimental or meant for users with the corresponding expertise. The dynamic CPU feature discovery in 7.3 is very new, too. Most of the support work would be handling bug reports, I'd guess. Otherwise, the topic is quite internals-specific, too labour-intensive, and probably of no interest to developers unfamiliar with low-level details.