The following fixes a regression from the time we split load groups
along SLP boundaries.  When we face a permuted load from an access
that is contiguous across loop iterations we emit code that loads
the whole group and then emit the required permutations.  The
permutations might not need all those loads, and if we had split the
group we would not have emitted them.  Fortunately, when analyzing a
permutation we compute both the number of required permutes and the
number of loads that will survive the following DCE.  So make sure to
use that when costing.

This allows the previously added testcase for PR123190 to undergo
epilog vectorization also at -O2, and when using non-generic tuning,
such as tuning for Zen4, which ups the cost for XMM loads.

	PR tree-optimization/123190
	* tree-vectorizer.h (vect_load_store_data): Add n_loads member.
	* tree-vect-stmts.cc (get_load_store_type): Record the number
	of required loads for permuted loads.
	(vectorizable_load): Make use of this when costing loads for
	VMAT_CONTIGUOUS[_REVERSE].

	* gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-1.c: Do not
	require -mtune=generic.
	* gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-2.c: Add variant
	with -O2 instead of -O3, inner loop not unrolled.

Notes for testsuite/gcc.dg.
1) There should be only one driver, dg.exp.
2) Try to organize the tests by topic using file name prefixes.
   E.g., all bitfield tests are named "bf-*.c".
   This lets the person running the tests easily choose particular sets
   of tests to run (using wildcards).
   E.g.: make check RUNTESTFLAGS='dg.exp=bf-*.c'
3) Remember DOS file name restrictions (8.3). Sigh.
4) Send bugs, comments, etc. to dje@cygnus.com.
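
Notes 2 and 3 can be checked together before adding a test. The sketch
below is not part of the testsuite: fits_8_3 and check_prefix are
hypothetical helper names, and the check covers only the 8.3 length rule
(a base of at most 8 characters and an extension of at most 3), not the
DOS character-set rules.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME fits the DOS 8.3 length pattern
# (base of 1-8 characters, optional extension of at most 3 characters).
fits_8_3() {
    name=$1
    case $name in
        *.*) base=${name%.*}; ext=${name##*.} ;;
        *)   base=$name; ext= ;;
    esac
    # Reject an empty base and a base that still contains a dot.
    case $base in
        *.*|'') return 1 ;;
    esac
    [ "${#base}" -le 8 ] && [ "${#ext}" -le 3 ]
}

# Hypothetical helper: list the files in DIR that a "PREFIX-*.c" wildcard
# would select, marking each as "ok" or "long" against the 8.3 rule.
check_prefix() {
    dir=$1
    prefix=$2
    for f in "$dir"/"$prefix"-*.c; do
        [ -e "$f" ] || continue    # the pattern matched nothing
        n=${f##*/}
        if fits_8_3 "$n"; then
            echo "ok   $n"
        else
            echo "long $n"
        fi
    done
}
```

For example, after sourcing these functions, running
check_prefix gcc/testsuite/gcc.dg bf from the top of a GCC source tree
would list the files that the RUNTESTFLAGS wildcard in note 2 selects.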
Copyright (C) 1997-2026 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.