GCC #80

Merged
ghost merged 43 commits from gcc into main 2022-08-28 09:29:16 -04:00
ghost commented 2022-08-26 19:50:45 -04:00 (Migrated from github.com)

Now compiles on GCC and Clang.

Now compiles on GCC and Clang.
ArtemisX64 commented 2022-08-26 23:08:29 -04:00 (Migrated from github.com)

Can confirm using gcc also fixes the final linking error

Can confirm using gcc also fixes the final linking error
ArtemisX64 commented 2022-08-26 23:13:29 -04:00 (Migrated from github.com)

Weird but it no longer compiles on clang(clang 12)
Oops as expected. Won't compile on Windows too. (Checked in actions on my repo)

Weird but it no longer compiles on clang(clang 12) Oops as expected. Won't compile on Windows too. (Checked in actions on my repo)
ArtemisX64 commented 2022-08-26 23:44:53 -04:00 (Migrated from github.com)

It will work fine in clang14(libstdc++12)

It will work fine in clang14(libstdc++12)
Tachi107 (Migrated from github.com) requested changes 2022-08-27 03:54:39 -04:00
Tachi107 (Migrated from github.com) left a comment

Thanks for working on this! I've left a few comments below.

Thanks for working on this! I've left a few comments below.
@ -418,3 +418,3 @@
throw std::invalid_argument(fmt::format("persistent id {:#x} is invalid", persistent_id));
return CemuApp::GetMLCPath(fmt::format(L"usr\\save\\system\\act\\{:08x}\\account.dat", persistent_id)).ToStdWstring();
return CemuApp::GetMLCPath(fmt::format(L"usr/save/system/act/{:08x}/account.dat", persistent_id)).ToStdWstring();
Tachi107 (Migrated from github.com) commented 2022-08-27 03:36:51 -04:00

Maybe this have to be ifdefd, it might create issues on Windows, no?

Maybe this have to be `ifdef`d, it might create issues on Windows, no?
Tachi107 (Migrated from github.com) commented 2022-08-27 03:38:13 -04:00

Use STREQUAL instead of MATCHES. The latter is for regular expressions.

if((CMAKE_C_COMPILER_ID STREQUAL "GNU") OR (CMAKE_C_COMPILER_ID STREQUAL "Clang"))
Use `STREQUAL` instead of `MATCHES`. The latter is for regular expressions. ```suggestion if((CMAKE_C_COMPILER_ID STREQUAL "GNU") OR (CMAKE_C_COMPILER_ID STREQUAL "Clang")) ```
@ -5,13 +5,6 @@
#include "util/helpers/fspinlock.h"
#include "util/highresolutiontimer/HighResolutionTimer.h"
Tachi107 (Migrated from github.com) commented 2022-08-27 03:40:40 -04:00

Is uint64 standard? Why not uint64_t?

Is `uint64` standard? Why not `uint64_t`?
Tachi107 (Migrated from github.com) commented 2022-08-27 03:47:45 -04:00

C++20 has std::rotl and std::rotr

C++20 has [`std::rotl`](https://en.cppreference.com/w/cpp/numeric/rotl) and [`std::rotr`](https://en.cppreference.com/w/cpp/numeric/rotr)
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
Tachi107 (Migrated from github.com) commented 2022-08-27 03:48:45 -04:00

thread_local is standard since C++11, you can drop this macro

[`thread_local`](https://en.cppreference.com/w/cpp/keyword/thread_local) is standard since C++11, you can drop this macro
Tachi107 (Migrated from github.com) commented 2022-08-27 03:49:31 -04:00

Stackoverflow suggests a solution, I don't know how affective it is.

Stackoverflow suggests a [solution](https://stackoverflow.com/questions/25667901/assume-clause-in-gcc/26195434#26195434), I don't know how affective it is.
Tachi107 (Migrated from github.com) commented 2022-08-27 03:53:02 -04:00

I've never used break points in code (my code editor allows me to insert them at a specific line without altering the source), so the best I can do is link this SO page

I've never used break points in code (my code editor allows me to insert them at a specific line without altering the source), so the best I can do is link this [SO page](https://stackoverflow.com/questions/173618/is-there-a-portable-equivalent-to-debugbreak-debugbreak)
Tachi107 (Migrated from github.com) commented 2022-08-27 03:53:35 -04:00

Clang defines __GNUC__ already

Clang defines `__GNUC__` already
Tachi107 (Migrated from github.com) commented 2022-08-27 03:53:57 -04:00

?

?
ArtemisX64 (Migrated from github.com) reviewed 2022-08-27 03:58:25 -04:00
ArtemisX64 (Migrated from github.com) commented 2022-08-27 03:58:25 -04:00

It causes UB when used with 64 bit values. (Need to investigate further)

It causes UB when used with 64 bit values. (Need to investigate further)
ArtemisX64 (Migrated from github.com) reviewed 2022-08-27 03:59:07 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
ArtemisX64 (Migrated from github.com) commented 2022-08-27 03:59:07 -04:00

I added it in a separate PR

I added it in a separate PR
ArtemisX64 (Migrated from github.com) reviewed 2022-08-27 03:59:47 -04:00
@ -5,13 +5,6 @@
#include "util/helpers/fspinlock.h"
#include "util/highresolutiontimer/HighResolutionTimer.h"
ArtemisX64 (Migrated from github.com) commented 2022-08-27 03:59:46 -04:00

I think it's for consistency. (Since uint64 is used throughout the code)

I think it's for consistency. (Since uint64 is used throughout the code)
ArtemisX64 (Migrated from github.com) reviewed 2022-08-27 04:00:42 -04:00
@ -418,3 +418,3 @@
throw std::invalid_argument(fmt::format("persistent id {:#x} is invalid", persistent_id));
return CemuApp::GetMLCPath(fmt::format(L"usr\\save\\system\\act\\{:08x}\\account.dat", persistent_id)).ToStdWstring();
return CemuApp::GetMLCPath(fmt::format(L"usr/save/system/act/{:08x}/account.dat", persistent_id)).ToStdWstring();
ArtemisX64 (Migrated from github.com) commented 2022-08-27 04:00:42 -04:00
It won't. See https://github.com/cemu-project/Cemu/pull/70
ArtemisX64 (Migrated from github.com) reviewed 2022-08-27 04:07:00 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
ArtemisX64 (Migrated from github.com) commented 2022-08-27 04:07:00 -04:00
https://stackoverflow.com/questions/13106049/what-is-the-performance-penalty-of-c11-thread-local-variables-in-gcc-4-8
ArtemisX64 (Migrated from github.com) reviewed 2022-08-27 04:07:35 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
ArtemisX64 (Migrated from github.com) commented 2022-08-27 04:07:34 -04:00

I have added it in a separate PR

I have added it in a separate PR
ArtemisX64 (Migrated from github.com) reviewed 2022-08-27 04:09:22 -04:00
ArtemisX64 (Migrated from github.com) commented 2022-08-27 04:09:22 -04:00

Fixes the need for -fdelayed-template-parsing flag in clang and make it work on gcc

Fixes the need for -fdelayed-template-parsing flag in clang and make it work on gcc
Tachi107 (Migrated from github.com) reviewed 2022-08-27 04:11:56 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
Tachi107 (Migrated from github.com) commented 2022-08-27 04:11:56 -04:00

Oh, wow, didn't know that. But it also says:

This wrapper is not needed for in every use case of thread_local though. This can be revealed from decl2.c. The wrapper is generated only when:

It is not function-local, and,
    It is extern (the example shown above), or
    The type has a non-trivial destructor (which is not allowed for __thread variables), or
    The type variable is initialized by a non-constant-expression (which is also not allowed for __thread variables).

In all other use cases, it behaves the same as __thread. That means, unless you have some extern __thread variables, you could replace all __thread by thread_local without any loss of performance.

Oh, wow, didn't know that. But it also says: > This wrapper is not needed for in every use case of thread_local though. This can be revealed from decl2.c. The wrapper is generated only when: > > It is not function-local, and, > It is extern (the example shown above), or > The type has a non-trivial destructor (which is not allowed for __thread variables), or > The type variable is initialized by a non-constant-expression (which is also not allowed for __thread variables). > > In all other use cases, it behaves the same as __thread. That means, unless you have some extern __thread variables, you could replace all __thread by thread_local without any loss of performance.
Tachi107 (Migrated from github.com) reviewed 2022-08-27 04:15:13 -04:00
@ -5,13 +5,6 @@
#include "util/helpers/fspinlock.h"
#include "util/highresolutiontimer/HighResolutionTimer.h"
Tachi107 (Migrated from github.com) commented 2022-08-27 04:15:13 -04:00

I would personally replace all instances of uint64 with uint64_t, but it's not a big deal obviously :)

I would personally replace all instances of `uint64` with `uint64_t`, but it's not a big deal obviously :)
Tachi107 (Migrated from github.com) reviewed 2022-08-27 04:22:43 -04:00
Tachi107 (Migrated from github.com) commented 2022-08-27 04:22:43 -04:00

What UB does it cause? The C++ standard says that the implementation should behave basically as if it was implemented like these two functions, see https://timsong-cpp.github.io/cppwp/n4868/bit.rotate

Let r be s % N.

Returns: If r is 0, x; if r is positive, (x << r) | (x >> (N - r)); if r is negative, rotr(x, -r).

What UB does it cause? The C++ standard says that the implementation should behave basically as if it was implemented like these two functions, see https://timsong-cpp.github.io/cppwp/n4868/bit.rotate > Let `r` be `s % N`. > > Returns: If `r` is `0`, `x`; if `r` is positive, `(x << r) | (x >> (N - r))`; if `r` is negative, `rotr(x, -r)`.
ArtemisX64 (Migrated from github.com) reviewed 2022-08-27 04:40:14 -04:00
ArtemisX64 (Migrated from github.com) commented 2022-08-27 04:40:14 -04:00

I don't know why but yesterday, when I used it, I got an UB. Might be related to the code I was testing

I don't know why but yesterday, when I used it, I got an UB. Might be related to the code I was testing
Tachi107 commented 2022-08-27 04:41:21 -04:00 (Migrated from github.com)

What kind of UB did you encounter?

--
OpenPGP key: 66DE F152 8299 0C21 99EF A801 A8A1 28A8 AB1C EE49

What kind of UB did you encounter? -- OpenPGP key: 66DE F152 8299 0C21 99EF A801 A8A1 28A8 AB1C EE49
ArtemisX64 commented 2022-08-27 04:51:14 -04:00 (Migrated from github.com)

What kind of UB did you encounter?

-- OpenPGP key: 66DE F152 8299 0C21 99EF A801 A8A1 28A8 AB1C EE49

Can't recollect. But I think it will be fine. (As long as it receives uint64_t as a param)

> What kind of UB did you encounter? > […](#) > -- OpenPGP key: 66DE F152 8299 0C21 99EF A801 A8A1 28A8 AB1C EE49 Can't recollect. But I think it will be fine. (As long as it receives uint64_t as a param)
Tachi107 commented 2022-08-27 05:03:14 -04:00 (Migrated from github.com)

What kind of UB did you encounter?

Can't recollect. But I think it will be fine. (As long as it receives uint64_t as a param)

Maybe the issue was that the parameter type was uint64 instead of uint64_t, and numeric_limits<uint64>​::​digits doesn't work on some platforms?

> > What kind of UB did you encounter? > > Can't recollect. But I think it will be fine. (As long as it receives uint64_t as a param) Maybe the issue was that the parameter type was uint64 instead of uint64_t, and `numeric_limits<uint64>​::​digits` doesn't work on some platforms?
RyzenDew commented 2022-08-27 17:05:13 -04:00 (Migrated from github.com)

after hours of testing and compiling on Fedora 36/37 and Arch and testing games ive not had any issues or crashing that isn't normal for the main linux build great work LGTM

after hours of testing and compiling on Fedora 36/37 and Arch and testing games ive not had any issues or crashing that isn't normal for the main linux build great work LGTM
RyzenDew (Migrated from github.com) approved these changes 2022-08-27 17:05:55 -04:00
Fijxu reviewed 2022-08-27 19:23:33 -04:00

Yep, thanks. Fixed.

Yep, thanks. Fixed.
Fijxu reviewed 2022-08-27 19:29:52 -04:00

I only wrote these wrappers because _rotl64 and _rotr64 were used in the code and I wanted to get it compiling asap. Since then I've added <bit> to precompiled.h so maybe it's a good idea to just go through and replace all occurrences with the std functions. Thoughts?

I only wrote these wrappers because _rotl64 and _rotr64 were used in the code and I wanted to get it compiling asap. Since then I've added `<bit>` to `precompiled.h` so maybe it's a good idea to just go through and replace all occurrences with the std functions. Thoughts?
Fijxu reviewed 2022-08-27 19:32:04 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
See https://github.com/tomlally/Cemu/commit/73a8e6faaca15a80ae3e90cf6cbf603d9e6f03b1
Fijxu reviewed 2022-08-27 19:36:22 -04:00
@ -5,13 +5,6 @@
#include "util/helpers/fspinlock.h"
#include "util/highresolutiontimer/HighResolutionTimer.h"

I think it's for consistency. (Since uint64 is used throughout the code)

Yep, that was my reasoning.

> I think it's for consistency. (Since uint64 is used throughout the code) Yep, that was my reasoning.
Fijxu reviewed 2022-08-27 19:51:42 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif

I would use thread_local if I knew it had no overhead on all compilers but I don't know that for sure so... macros :(

I would use `thread_local` if I knew it had no overhead on all compilers but I don't know that for sure so... macros :(
Fijxu reviewed 2022-08-27 20:12:28 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif

I've realised __debugbreak was only used in precompiled.h so now I'm thinking it may be tidier to conditionally replace the cemu_assert_... functions.

I've realised `__debugbreak` was only used in `precompiled.h` so now I'm thinking it may be tidier to conditionally replace the cemu_assert_... functions.
Crementif (Migrated from github.com) requested changes 2022-08-27 21:39:45 -04:00
Crementif (Migrated from github.com) left a comment

Seems like the PR kinda got stacked with changes meant for Linux. Nice work!

Also, you've got empty line changes in many (unrelated) files that can probably be cleaned up as well. And if you could follow @Tachi107 suggestions.

Seems like the PR kinda got stacked with changes meant for Linux. Nice work! Also, you've got empty line changes in many (unrelated) files that can probably be cleaned up as well. And if you could follow @Tachi107 suggestions.
Crementif (Migrated from github.com) commented 2022-08-27 19:27:23 -04:00

This is a remnant of Cemuhook requiring this function to be not inlined, but that's no longer necessary with #73.

This is a remnant of Cemuhook requiring this function to be not inlined, but that's no longer necessary with #73.
Crementif (Migrated from github.com) commented 2022-08-27 20:28:24 -04:00

Is this code change necessary for GCC support or is there another reason?

Is this code change necessary for GCC support or is there another reason?
Crementif (Migrated from github.com) commented 2022-08-27 20:25:17 -04:00

Seems like a workaround, if this is needed then dependencies aren't matching or the cmakelists need to be adjusted.

Seems like a workaround, if this is needed then dependencies aren't matching or the cmakelists need to be adjusted.
Crementif (Migrated from github.com) commented 2022-08-27 20:39:07 -04:00

Is this change necessary?

Is this change necessary?
Crementif (Migrated from github.com) commented 2022-08-27 20:40:17 -04:00

noinline can be removed entirely, see other comment.

noinline can be removed entirely, see other comment.
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
Crementif (Migrated from github.com) commented 2022-08-27 20:59:54 -04:00

Only having DLLEXPORT is fine, I don't think there's really a need for DLLIMPORT.

Only having DLLEXPORT is fine, I don't think there's really a need for DLLIMPORT.
Crementif (Migrated from github.com) commented 2022-08-27 21:00:42 -04:00

NOINLINE can be removed, see other comment.

NOINLINE can be removed, see other comment.
Crementif (Migrated from github.com) commented 2022-08-27 21:17:27 -04:00

Is this include necessary for compilation on GCC?

Is this include necessary for compilation on GCC?
Exzap (Migrated from github.com) reviewed 2022-08-27 22:29:25 -04:00
Exzap (Migrated from github.com) commented 2022-08-27 22:29:25 -04:00

I believe doing offsetof where the member is a dynamic expression like in spr.UGQR[sprIndex-SPR_UGQR0] is not very portable. So this change is fine.

I believe doing offsetof where the member is a dynamic expression like in `spr.UGQR[sprIndex-SPR_UGQR0]` is not very portable. So this change is fine.
ArtemisX64 (Migrated from github.com) reviewed 2022-08-28 02:41:43 -04:00
ArtemisX64 (Migrated from github.com) commented 2022-08-28 02:41:43 -04:00

I only wrote these wrappers because _rotl64 and _rotr64 were used in the code and I wanted to get it compiling asap. Since then I've added <bit> to precompiled.h so maybe it's a good idea to just go through and replace all occurrences with the std functions. Thoughts?

Since we are on C++20, it is safe to use rotr or rotl rather than a custom implementation. And also, I found rotr and rotl to be more optimized in godbolt

> I only wrote these wrappers because _rotl64 and _rotr64 were used in the code and I wanted to get it compiling asap. Since then I've added `<bit>` to `precompiled.h` so maybe it's a good idea to just go through and replace all occurrences with the std functions. Thoughts? Since we are on C++20, it is safe to use rotr or rotl rather than a custom implementation. And also, I found rotr and rotl to be more optimized in godbolt
Tatsh (Migrated from github.com) reviewed 2022-08-28 03:22:57 -04:00
Tatsh (Migrated from github.com) commented 2022-08-28 03:22:57 -04:00

I ran into this same issue with GCC 11.3.0.

I ran into this same issue with GCC 11.3.0.
Fijxu reviewed 2022-08-28 05:06:18 -04:00

I think it is for unix/platform which is included in Common/platform. Will double check.

I think it is for unix/platform which is included in Common/platform. Will double check.
Fijxu reviewed 2022-08-28 05:07:53 -04:00

Noted. Will remove NOINLINE.

Noted. Will remove NOINLINE.
Fijxu reviewed 2022-08-28 05:41:58 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif

Done.

Done.
Fijxu reviewed 2022-08-28 05:50:07 -04:00
Fijxu reviewed 2022-08-28 05:56:54 -04:00

Sorry, didn't see this one. That was blatant hacking, I'll undo this. I think the actual solution was to have the spirv-headers package installed.

Sorry, didn't see this one. That was blatant hacking, I'll undo this. ~I think the actual solution was to have the spirv-headers package installed.~
Fijxu reviewed 2022-08-28 06:09:56 -04:00

I suppose __clang__ or __GNUC__ is somewhat redundant. Changed to just __GNUC__ in 71af6e54f4

I suppose `__clang__` or `__GNUC__` is somewhat redundant. Changed to just `__GNUC__` in https://github.com/cemu-project/Cemu/commit/71af6e54f44339ab9151c878335a0dfdd1bd1b05
Fijxu reviewed 2022-08-28 06:17:19 -04:00

It was causing problems for some people. I was following advice in this comment: https://github.com/cemu-project/Cemu/issues/1#issuecomment-1225289477. RendererShaderVk.cpp:134:18: error: excess elements in scalar initializer was the error.

It was causing problems for some people. I was following advice in this comment: https://github.com/cemu-project/Cemu/issues/1#issuecomment-1225289477. `RendererShaderVk.cpp:134:18: error: excess elements in scalar initializer` was the error.
Fijxu reviewed 2022-08-28 07:10:19 -04:00

Looks okay w/out. Removed here 3d28b5e014

Looks okay w/out. Removed here https://github.com/tomlally/Cemu/commit/3d28b5e0142ad641ac68887ceaff54adaf6c0ae3
Fijxu reviewed 2022-08-28 07:17:48 -04:00

I only wrote these wrappers because _rotl64 and _rotr64 were used in the code and I wanted to get it compiling asap. Since then I've added <bit> to precompiled.h so maybe it's a good idea to just go through and replace all occurrences with the std functions. Thoughts?

Since we are on C++20, it is safe to use rotr or rotl rather than a custom implementation. And also, I found rotr and rotl to be more optimized in godbolt

I've gone ahead replaced _rotl64, _rotr64. Slightly different semantics since std::rot on type T also returns T but there's only one place where that matters: 6408cfdd4b/src/Cafe/HW/Latte/Core/LatteShader.cpp (L228)

> > > I only wrote these wrappers because _rotl64 and _rotr64 were used in the code and I wanted to get it compiling asap. Since then I've added `<bit>` to `precompiled.h` so maybe it's a good idea to just go through and replace all occurrences with the std functions. Thoughts? > > Since we are on C++20, it is safe to use rotr or rotl rather than a custom implementation. And also, I found rotr and rotl to be more optimized in godbolt I've gone ahead replaced _rotl64, _rotr64. Slightly different semantics since std::rot on type T also returns T but there's only one place where that matters: https://github.com/tomlally/Cemu/blob/6408cfdd4b391ca47f8296fe002cd1bf8adec37a/src/Cafe/HW/Latte/Core/LatteShader.cpp#L228
Fijxu reviewed 2022-08-28 07:22:06 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif

Thanks. Removed DLLIMPORT here 3b5eb49469

Thanks. Removed DLLIMPORT here https://github.com/cemu-project/Cemu/commit/3b5eb4946993064949f2f7bce6c74bc3e918458a
Fijxu reviewed 2022-08-28 08:24:57 -04:00

Works w/out on gcc-12.2.0 and clang-14.0.6 (debug) so I've undone this change (see 056ae1f882). Is it possible the trailing comma was causing the error?

Works w/out on gcc-12.2.0 and clang-14.0.6 (debug) so I've undone this change (see https://github.com/tomlally/Cemu/commit/056ae1f882726ed46cd3389f00538a47bfcb1b86). Is it possible the trailing comma was causing the error?
Crementif (Migrated from github.com) reviewed 2022-08-28 08:36:28 -04:00
Crementif (Migrated from github.com) commented 2022-08-28 08:36:28 -04:00

Pretty sure this is cuz the struct has too many values if you don't have maxDualSourceDrawBuffersEXT in the header. Or a fix might be that we need to make sure that spirv-headers are also used as a dependency.

Maybe all compilers would be happier if this was just a separate struct that we use for limits or we could actually solve the issue and use a designated initializer list to make the code easier to check since we use C++20 now.

Pretty sure this is cuz the struct has too many values if you don't have maxDualSourceDrawBuffersEXT in the header. Or a fix might be that we need to make sure that spirv-headers are also used as a dependency. Maybe all compilers would be happier if this was just a separate struct that we use for limits or we could actually solve the issue and use a [designated initializer list](https://en.cppreference.com/w/cpp/language/aggregate_initialization#Designated_initializers) to make the code easier to check since we use C++20 now.
Fijxu reviewed 2022-08-28 08:40:47 -04:00

That sounds quite likely. C++20 designated initialisers may be useful here?

That sounds quite likely. C++20 designated initialisers may be useful here?
Crementif (Migrated from github.com) reviewed 2022-08-28 09:24:58 -04:00
Crementif (Migrated from github.com) commented 2022-08-28 09:24:57 -04:00

Maybe, but I think we can do that separately.

Maybe, but I think we can do that separately.
Crementif (Migrated from github.com) approved these changes 2022-08-28 09:28:23 -04:00
Crementif (Migrated from github.com) left a comment

LGTM

LGTM
Tachi107 (Migrated from github.com) reviewed 2022-08-28 09:41:23 -04:00
Tachi107 (Migrated from github.com) commented 2022-08-28 09:41:23 -04:00

I'm facing the same error in https://github.com/cemu-project/Cemu/pull/75#issuecomment-1229451547 - do you have any suggestions?

I'm facing the same error in https://github.com/cemu-project/Cemu/pull/75#issuecomment-1229451547 - do you have any suggestions?
Tachi107 (Migrated from github.com) reviewed 2022-08-28 09:50:00 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
Tachi107 (Migrated from github.com) commented 2022-08-28 09:50:00 -04:00

As mentioned in the previous comment using thread_local the way it is currently used has no performance impact on GCC, while MSVC's documentation explicitly states that:

On Windows, thread_local is functionally equivalent to the Microsoft-specific __declspec( thread ) attribute.

See this for details: https://docs.microsoft.com/en-us/cpp/cpp/storage-classes-cpp?view=msvc-170#thread_local

So we could get rid of the macro.

As mentioned in the previous comment using `thread_local` the way it is currently used has no performance impact on GCC, while MSVC's documentation explicitly states that: > On Windows, `thread_local` is functionally equivalent to the Microsoft-specific `__declspec( thread )` attribute. See this for details: https://docs.microsoft.com/en-us/cpp/cpp/storage-classes-cpp?view=msvc-170#thread_local So we could get rid of the macro.
bitscher commented 2022-09-02 04:11:48 -04:00 (Migrated from github.com)

The addition of the -mavx2 flag is causing issues on pre-Haswell CPUs #144

The addition of the `-mavx2` flag is causing issues on pre-Haswell CPUs #144
Fijxu reviewed 2022-09-02 07:14:58 -04:00
@ -206,0 +216,4 @@
#define UNREACHABLE __builtin_unreachable()
#else
#define UNREACHABLE
#endif
Thanks. https://github.com/cemu-project/Cemu/pull/101
ghost commented 2022-09-02 08:05:50 -04:00 (Migrated from github.com)

-mavx2 is required for all the _mm256_... functions.
-mssse3 is for _mm_shuffle_epi8

Other instructions used in Cemu:

  • sse _mm_prefetch, _mm_pause, _mm_setcsr
  • sse2 _mm_mfence, _mm_set_epi8, _mm_set_epi16, _mm_loadu_si128, _mm_store_si128
  • sse4.1 _mm_min_epu16, _mm_max_epu16

So it may actually be a good idea to add additional flags.

`-mavx2` is required for all the `_mm256_...` functions. `-mssse3` is for `_mm_shuffle_epi8` Other instructions used in Cemu: - sse `_mm_prefetch`, `_mm_pause`, `_mm_setcsr` - sse2 `_mm_mfence`, `_mm_set_epi8`, `_mm_set_epi16`, `_mm_loadu_si128`, `_mm_store_si128` - sse4.1 `_mm_min_epu16`, `_mm_max_epu16` So it may actually be a good idea to add additional flags.
Tachi107 commented 2022-09-02 08:15:38 -04:00 (Migrated from github.com)

What about adding -march=x86-64-v2 and trying to stick to the x86_64-v2 psABI?

Edit: nevermind, AVX2 is part of v3

What about adding `-march=x86-64-v2` and trying to stick to the [x86_64-v2 psABI](https://gitlab.com/x86-psABIs/x86-64-ABI/-/wikis/uploads/01de35b2c8adc7545de52604cc45d942/x86-64-psABI-2021-05-20.pdf#page=16)? Edit: nevermind, AVX2 is part of v3
Exzap commented 2022-09-02 09:07:23 -04:00 (Migrated from github.com)

Both clang and gcc have a way to selectively enable CPU extensions for individual functions. Here is an example. It's better to do it this way than to set -mavx2 or similar globally as Cemu doesn't have a hard requirement on any of the optional CPU extensions.

Both clang and gcc have a way to selectively enable CPU extensions for individual functions. [Here](https://godbolt.org/z/c9cP49Psc) is an example. It's better to do it this way than to set `-mavx2` or similar globally as Cemu doesn't have a hard requirement on any of the optional CPU extensions.
ghost commented 2022-09-02 09:17:51 -04:00 (Migrated from github.com)

Both clang and gcc have a way to selectively enable CPU extensions for individual functions. Here is an example. It's better to do it this way than to set -mavx2 or similar globally

I agree, I can make a pr changing this if you wish but I don't think it would help insofar as compiling Cemu for pre-haswell CPUs on GCC.
From https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

For instance, on an x86, you could declare one function with the target("sse4.1,arch=core2") attribute and another with target("sse4a,arch=amdfam10"). This is equivalent to compiling the first function with -msse4.1 and -march=core2 options, and the second function with -msse4a and -march=amdfam10 options. It is up to you to make sure that a function is only invoked on a machine that supports the particular ISA it is compiled for

> Both clang and gcc have a way to selectively enable CPU extensions for individual functions. [Here](https://godbolt.org/z/c9cP49Psc) is an example. It's better to do it this way than to set `-mavx2` or similar globally I agree, I can make a pr changing this if you wish but I don't think it would help insofar as compiling Cemu for pre-haswell CPUs on GCC. From https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes > For instance, on an x86, you could declare one function with the target("sse4.1,arch=core2") attribute and another with target("sse4a,arch=amdfam10"). This is equivalent to compiling the first function with -msse4.1 and -march=core2 options, and the second function with -msse4a and -march=amdfam10 options. It is up to you to make sure that a function is only invoked on a machine that supports the particular ISA it is compiled for
Exzap commented 2022-09-02 10:33:39 -04:00 (Migrated from github.com)

I agree, I can make a pr changing this if you wish

It would be appreciated.

I don't think it would help insofar as compiling Cemu for pre-haswell CPUs on GCC.

Why not? As it says in your quote from the spec, you have to manually make sure the CPU extensions are supported at runtime before running the function with the intrinsics. This is something we already do. Optionally running the hand-optimized functions if the CPU extensions are supported and otherwise falling back to a generic implementation that can run on any CPU. The code for handling GPU drawcall indices is a good example. On Windows this is all pretty well tested and people run Cemu with the most ancient of CPUs.

> I agree, I can make a pr changing this if you wish It would be appreciated. > I don't think it would help insofar as compiling Cemu for pre-haswell CPUs on GCC. Why not? As it says in your quote from the spec, you have to manually make sure the CPU extensions are supported at runtime before running the function with the intrinsics. This is something we already do. Optionally running the hand-optimized functions if the CPU extensions are supported and otherwise falling back to a generic implementation that can run on any CPU. The code for handling GPU drawcall indices is a good [example](https://github.com/cemu-project/Cemu/blob/68fa5b32a16d9c74dbd4c99854f00c84b62e2921/src/Cafe/HW/Latte/Core/LatteIndices.cpp#L685). On Windows this is all pretty well tested and people run Cemu with the most ancient of CPUs.
ghost commented 2022-09-02 10:37:27 -04:00 (Migrated from github.com)

Alright, I didn't realise you already checked this, thanks.

Alright, I didn't realise you already checked this, thanks.
ghost commented 2022-09-02 10:58:46 -04:00 (Migrated from github.com)
Here you go https://github.com/cemu-project/Cemu/pull/152
ghost commented 2022-09-02 11:16:41 -04:00 (Migrated from github.com)

I must have missed these clang specific pragmas when I made #80. Sorry about that. 68fa5b32a1/src/Cafe/HW/Latte/Core/LatteIndices.cpp (L288)

I must have missed these clang specific pragmas when I made #80. Sorry about that. https://github.com/cemu-project/Cemu/blob/68fa5b32a16d9c74dbd4c99854f00c84b62e2921/src/Cafe/HW/Latte/Core/LatteIndices.cpp#L288
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: cemu-project_Mirror/Cemu-2024-03-05#80
No description provided.