A proprietary embedded operating system that uses clang as its primary
compiler ships headers that require __clang__ to be defined. Defining
that macro causes libstdc++ to adopt workarounds that work for clang
but that break for GCC.
So, introduce a _GLIBCXX_CLANG macro, and a convention to test for it
rather than for __clang__, so that a GCC variant that adds -D__clang__
to satisfy system headers can also -D_GLIBCXX_CLANG=0 to avoid
workarounds that are not meant for GCC.
I've left fast_float and ryu files alone, their tests for __clang__
don't seem to be harmful for GCC, they don't include bits/c++config,
and patching such third-party files would just make trouble for
updating them without visible benefit. pstl_config.h, though also
imported, required adjustment.
for libstdc++-v3/ChangeLog
* include/bits/c++config (_GLIBCXX_CLANG): Define or undefine.
* include/bits/locale_facets_nonio.tcc: Test for it.
* include/bits/stl_bvector.h: Likewise.
* include/c_compatibility/stdatomic.h: Likewise.
* include/experimental/bits/simd.h: Likewise.
* include/experimental/bits/simd_builtin.h: Likewise.
* include/experimental/bits/simd_detail.h: Likewise.
* include/experimental/bits/simd_x86.h: Likewise.
* include/experimental/simd: Likewise.
* include/std/complex: Likewise.
* include/std/ranges: Likewise.
* include/std/variant: Likewise.
* include/pstl/pstl_config.h: Likewise.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/114958
* include/experimental/bits/simd.h (__as_vector): Return scalar
simd as one-element vector. Return vector from single-vector
fixed_size simd.
(__vec_shuffle): New.
(__extract_part): Adjust return type signature.
(split): Use __extract_part for any split into non-fixed_size
simds.
(concat): If the return type stores a single vector, use
__vec_shuffle (which calls __builtin_shufflevector) to produce
the return value.
* include/experimental/bits/simd_builtin.h
(__shift_elements_right): Removed.
(__extract_part): Return single elements directly. Use
__vec_shuffle (which calls __builtin_shufflevector) to for all
non-trivial cases.
* include/experimental/bits/simd_fixed_size.h (__extract_part):
Return single elements directly.
* testsuite/experimental/simd/pr114958.cc: New test.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/114803
* include/experimental/bits/simd_builtin.h
(_SimdBase2::operator __vector_type_t): There is no __builtin()
function in _SimdWrapper, instead use its conversion operator.
* testsuite/experimental/simd/pr114803_vecbuiltin_cvt.cc: New
test.
Avoid
-Wnarrowing in C code;
-Wtautological-compare in unconditional static_assert (necessary for
faking a dependency on a template parameter)
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Ignore -Wnarrowing for
arm_neon.h.
(__int_for_sizeof): Replace tautological compare with checking
for invalid template parameter value.
* include/experimental/bits/simd_builtin.h (__extract_part):
Remove tautological compare by combining two static_assert.
This resolves failing tests in check-simd.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/114750
* include/experimental/bits/simd_builtin.h
(_SimdImplBuiltin::_S_load, _S_store): Fall back to copying
scalars if the memory type cannot be vectorized for the target.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add simd_sve.h.
* include/Makefile.in: Add simd_sve.h.
* include/experimental/bits/simd.h: Add new SveAbi.
* include/experimental/bits/simd_builtin.h: Use
__no_sve_deduce_t to support existing Neon Abi.
* include/experimental/bits/simd_converter.h: Convert
sequentially when sve is available.
* include/experimental/bits/simd_detail.h: Define sve
specific macro.
* include/experimental/bits/simd_math.h: Fallback frexp
to execute sequntially when sve is available, to handle
fixed_size_simd return type that always uses sve.
* include/experimental/simd: Include bits/simd_sve.h.
* testsuite/experimental/simd/tests/bits/main.h: Enable
testing for sve128, sve256, sve512.
* include/experimental/bits/simd_sve.h: New file.
Signed-off-by: Srinivas Yadav Singanaboina <vasu.srinivasvasu.14@gmail.com>
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/109822
* include/experimental/bits/simd_builtin.h (_S_store): Rewrite
to avoid casts to other vector types. Implement store as
succession of power-of-2 sized memcpy to avoid PR90424.
The call to the base implementation sometimes didn't find a matching
signature because the _Abi parameter of _SimdImpl* was "wrong" after
conversion. It has to call into <new ABI tag>::_SimdImpl instead of the
current ABI tag's _SimdImpl. This also reduces the number of possible
template instantiations.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/110054
* include/experimental/bits/simd_builtin.h (_S_masked_store):
Call into deduced ABI's SimdImpl after conversion.
* include/experimental/bits/simd_x86.h (_S_masked_store_nocvt):
Don't use _mm_maskmoveu_si128. Use the generic fall-back
implementation. Also fix masked stores without SSE2, which
were not doing anything before.
The constexpr API is only available with -std=gnu++XX (and proposed for
C++26). The proposal is to have the complete simd API usable in constant
expressions.
This patch resolves several issues with using simd in constant
expressions.
Issues why constant_evaluated branches are necessary:
* subscripting vector builtins is not allowed in constant expressions
* if the implementation needs/uses memcpy
* if the implementation would otherwise call SIMD intrinsics/builtins
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/109261
* include/experimental/bits/simd.h (_SimdWrapper::_M_set):
Avoid vector builtin subscripting in constant expressions.
(resizing_simd_cast): Avoid memcpy if constant_evaluated.
(const_where_expression, where_expression, where)
(__extract_part, simd_mask, _SimdIntOperators, simd): Add either
_GLIBCXX_SIMD_CONSTEXPR (on public APIs), or constexpr (on
internal APIs).
* include/experimental/bits/simd_builtin.h (__vector_permute)
(__vector_shuffle, __extract_part, _GnuTraits::_SimdCastType1)
(_GnuTraits::_SimdCastType2, _SimdImplBuiltin)
(_MaskImplBuiltin::_S_store): Add constexpr.
(_CommonImplBuiltin::_S_store_bool_array)
(_SimdImplBuiltin::_S_load, _SimdImplBuiltin::_S_store)
(_SimdImplBuiltin::_S_reduce, _MaskImplBuiltin::_S_load): Add
constant_evaluated case.
* include/experimental/bits/simd_fixed_size.h
(_S_masked_load): Reword comment.
(__tuple_element_meta, __make_meta, _SimdTuple::_M_apply_r)
(_SimdTuple::_M_subscript_read, _SimdTuple::_M_subscript_write)
(__make_simd_tuple, __optimize_simd_tuple, __extract_part)
(__autocvt_to_simd, _Fixed::__traits::_SimdBase)
(_Fixed::__traits::_SimdCastType, _SimdImplFixedSize): Add
constexpr.
(_SimdTuple::operator[], _M_set): Add constexpr and add
constant_evaluated case.
(_MaskImplFixedSize::_S_load): Add constant_evaluated case.
* include/experimental/bits/simd_scalar.h: Add constexpr.
* include/experimental/bits/simd_x86.h (_CommonImplX86): Add
constexpr and add constant_evaluated case.
(_SimdImplX86::_S_equal_to, _S_not_equal_to, _S_less)
(_S_less_equal): Value-initialize to satisfy constexpr
evaluation.
(_MaskImplX86::_S_load): Add constant_evaluated case.
(_MaskImplX86::_S_store): Add constexpr and constant_evaluated
case. Value-initialize local variables.
(_MaskImplX86::_S_logical_and, _S_logical_or, _S_bit_not)
(_S_bit_and, _S_bit_or, _S_bit_xor): Add constant_evaluated
case.
* testsuite/experimental/simd/pr109261_constexpr_simd.cc: New
test.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/108856
* include/experimental/bits/simd_builtin.h
(_SimdImplBuiltin::_S_masked_unary): More efficient
implementation of masked inc-/decrement for integers and floats
without AVX2.
* include/experimental/bits/simd_x86.h
(_SimdImplX86::_S_masked_unary): New. Use AVX512 masked subtract
builtins for masked inc-/decrement.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_builtin.h (_S_set): Compare as
int. The actual range of these indexes is very small.
All of the annotated lambdas are simply a necessary means for
implementing these functions and should never result in an actual
function call. Many of these lambdas would go away if C++ had better
language support for packs.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/108030
* include/experimental/bits/simd_detail.h: Define
_GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA.
* include/experimental/bits/simd.h: Annotate lambdas with
_GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA.
* include/experimental/bits/simd_builtin.h: Ditto.
* include/experimental/bits/simd_converter.h: Ditto.
* include/experimental/bits/simd_fixed_size.h: Ditto.
* include/experimental/bits/simd_math.h: Ditto.
* include/experimental/bits/simd_neon.h: Ditto.
* include/experimental/bits/simd_x86.h: Ditto.
Explicitly support use of the stdx::simd implementation in situations
where the user links TUs that were compiled with different -m flags. In
general, this is always a (quasi) ODR violation for inline functions
because at least codegen may differ in important ways. However, in the
resulting executable only one (unspecified which one) of them might be
used. For simd we want to support users to compile code multiple times,
with different -m flags and have a runtime dispatch to the TU matching
the target CPU. But if internal functions are not inlined this may lead
to unexpected performance loss or execution of illegal instructions.
Therefore, inline functions that are not marked as always_inline must
use an additional template parameter somewhere in their name, to
disambiguate between the different -m translations.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Move feature detection bools
and add __have_avx512bitalg, __have_avx512vbmi2,
__have_avx512vbmi, __have_avx512ifma, __have_avx512cd,
__have_avx512vnni, __have_avx512vpopcntdq.
(__detail::__machine_flags): New function which returns a unique
uint64 depending on relevant -m and -f flags.
(__detail::__odr_helper): New type alias for either an anonymous
type or a type specialized with the __machine_flags number.
(_SimdIntOperators): Change template parameters from _Impl to
_Tp, _Abi because _Impl now has an __odr_helper parameter which
may be _OdrEnforcer from the anonymous namespace, which makes
for a bad base class.
(many): Either add __odr_helper template parameter or mark as
always_inline.
* include/experimental/bits/simd_detail.h: Add defines for
AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD,
AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT.
* include/experimental/bits/simd_builtin.h: Add __odr_helper
template parameter or mark as always_inline.
* include/experimental/bits/simd_fixed_size.h: Ditto.
* include/experimental/bits/simd_math.h: Ditto.
* include/experimental/bits/simd_scalar.h: Ditto.
* include/experimental/bits/simd_neon.h: Add __odr_helper
template parameter.
* include/experimental/bits/simd_ppc.h: Ditto.
* include/experimental/bits/simd_x86.h: Ditto.
This also resolves a test failure on aarch64 with -ffast-math and
fixed_size<N> with large N.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Add missing operator~
overload for simd<floating-point> to __float_bitwise_operators.
* include/experimental/bits/simd_builtin.h
(_SimdImplBuiltin::_S_complement): Bitcast to int (and back) to
implement complement for floating-point vectors.
* include/experimental/bits/simd_fixed_size.h
(_SimdImplFixedSize::_S_copysign): New function, forwarding to
copysign implementation of _SimdTuple members.
* include/experimental/bits/simd_math.h (copysign): Call
_SimdImpl::_S_copysign for fixed_size arguments. Simplify
generic copysign implementation using the new ~ operator.
Two simd tests FAIL on Solaris, both SPARC and x86:
FAIL: experimental/simd/standard_abi_usable.cc -msse2 -O2 -Wno-psabi (test for excess errors)
FAIL: experimental/simd/standard_abi_usable_2.cc -msse2 -O2 -Wno-psabi (test for excess errors)
This happens because the simd headers use identifiers documented in the
libstdc++ manual as reserved by system headers.
Fixed as follows, tested on i386-pc-solaris2.11, sparc-sun-solaris2.11,
and x86_64-pc-linux-gnu.
2021-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libstdc++-v3:
* include/experimental/bits/simd.h: Replace reserved _X, _B by
_Xp, _Bp.
* include/experimental/bits/simd_builtin.h: Likewise.
* include/experimental/bits/simd_x86.h: Likewise.
POWER7 does not support __vector long long reductions, making the
generic _S_popcount implementation ill-formed. Specializing _S_popcount
for PPC allows optimization and avoids the issue.
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h: Add __have_power10vec
conditional on _ARCH_PWR10.
* include/experimental/bits/simd_builtin.h: Forward declare
_MaskImplPpc and use it as _MaskImpl when __ALTIVEC__ is
defined.
(_MaskImplBuiltin::_S_some_of): Call _S_popcount from the
_SuperImpl for optimizations and correctness.
* include/experimental/bits/simd_ppc.h: Add _MaskImplPpc.
(_MaskImplPpc::_S_popcount): Implement via vec_cntm for POWER10.
Otherwise, for >=int use -vec_sums divided by a sizeof factor.
For <int use -vec_sums(vec_sum4s(...)) to sum all mask entries.
Adds <experimental/simd>.
This implements the simd and simd_mask class templates via
[[gnu::vector_size(N)]] data members. It implements overloads for all of
<cmath> for simd. Explicit vectorization of the <cmath> functions is not
finished.
The majority of functions are marked as [[gnu::always_inline]] to enable
quasi-ODR-conforming linking of TUs with different -m flags.
Performance optimization was done for x86_64. ARM, Aarch64, and POWER
rely on the compiler to recognize reduction, conversion, and shuffle
patterns.
Besides verification using many different machine flages, the code was
also verified with different fast-math flags.
libstdc++-v3/ChangeLog:
* doc/xml/manual/status_cxx2017.xml: Add implementation status
of the Parallelism TS 2. Document implementation-defined types
and behavior.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/experimental/simd: New file. New header for
Parallelism TS 2.
* include/experimental/bits/numeric_traits.h: New file.
Implementation of P1841R1 using internal naming. Addition of
missing IEC559 functionality query.
* include/experimental/bits/simd.h: New file. Definition of the
public simd interfaces and general implementation helpers.
* include/experimental/bits/simd_builtin.h: New file.
Implementation of the _VecBuiltin simd_abi.
* include/experimental/bits/simd_converter.h: New file. Generic
simd conversions.
* include/experimental/bits/simd_detail.h: New file. Internal
macros for the simd implementation.
* include/experimental/bits/simd_fixed_size.h: New file. Simd
fixed_size ABI specific implementations.
* include/experimental/bits/simd_math.h: New file. Math
overloads for simd.
* include/experimental/bits/simd_neon.h: New file. Simd NEON
specific implementations.
* include/experimental/bits/simd_ppc.h: New file. Implement bit
shifts to avoid invalid results for integral types smaller than
int.
* include/experimental/bits/simd_scalar.h: New file. Simd scalar
ABI specific implementations.
* include/experimental/bits/simd_x86.h: New file. Simd x86
specific implementations.
* include/experimental/bits/simd_x86_conversions.h: New file.
x86 specific conversion optimizations. The conversion patterns
work around missing conversion patterns in the compiler and
should be removed as soon as PR85048 is resolved.
* testsuite/experimental/simd/standard_abi_usable.cc: New file.
Test that all (not all fixed_size<N>, though) standard simd and
simd_mask types are usable.
* testsuite/experimental/simd/standard_abi_usable_2.cc: New
file. As above but with -ffast-math.
* testsuite/libstdc++-dg/conformance.exp: Don't build simd tests
from the standard test loop. Instead use
check_vect_support_and_set_flags to build simd tests with the
relevant machine flags.