Compare commits


68 commits
v2.2...main

Author SHA1 Message Date
goeiecool9999
e6a64aadda undo revert of style improvement 2025-04-27 17:03:00 +02:00
goeiecool9999
a5f3558b79 Revert "fix building with fmt11 and GCC"
This reverts commit 372c314f06.
It broke formatting in an attempt to fix GCC builds.
Some other change (perhaps dependency updates) has resolved the issue.
2025-04-27 16:57:22 +02:00
Exzap
b089ae5b32
PowerPC recompiler rework (#641) 2025-04-26 17:59:32 +02:00
Exzap
06233e3462 UI: Fix wxWidgets debug assert
Adding the same component multiple times is not allowed. Use sizers instead
2025-04-16 14:36:11 +02:00
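
For illustration, a minimal wxWidgets sketch (not taken from this commit) of the pattern the message recommends: each control is created and added to a sizer exactly once, and later layout or visibility changes go through the existing sizer rather than repeated Add() calls on the same window.

```cpp
// Minimal sketch, not Cemu's UI code. Re-adding an already-added window is what
// triggers the wxWidgets debug assert mentioned above.
#include <wx/wx.h>

void BuildPanelLayout(wxPanel* panel)
{
    auto* sizer  = new wxBoxSizer(wxVERTICAL);
    auto* label  = new wxStaticText(panel, wxID_ANY, "Status");
    auto* button = new wxButton(panel, wxID_ANY, "Refresh");
    sizer->Add(label, 0, wxALL, 5);              // each window added exactly once
    sizer->Add(button, 0, wxALL | wxEXPAND, 5);
    panel->SetSizerAndFit(sizer);                // the sizer owns the layout from here on
    // Later updates go through the existing objects, e.g. label->SetLabel("...")
    // or sizer->Show(button, false), instead of adding the same window again.
}
```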
Exzap
4972381edc Vulkan: Fix imgui validation error when sRGB framebuffer is used 2025-04-15 22:46:19 +02:00
Exzap
cd6eb1097b Vulkan: Fix a validation error + minor code refactor
We were using VK_EXT_DEPTH_CLIP_ENABLE but didn't actually request it.

Also fixed an assert when closing Cemu that was caused by incorrect tracking of the number of allocated pipelines
2025-04-15 21:10:11 +02:00
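
For context, a hedged sketch of what requesting VK_EXT_depth_clip_enable at device creation looks like in the Vulkan API. This is generic usage, not Cemu's renderer code, and a real implementation would first confirm the extension is reported by the physical device.

```cpp
// Generic Vulkan usage sketch: the extension must be listed in
// ppEnabledExtensionNames and its feature struct chained into pNext, otherwise
// using depth-clip state in pipelines produces validation errors.
#include <vulkan/vulkan.h>
#include <vector>

VkDevice CreateDeviceWithDepthClip(VkPhysicalDevice physDev, const VkDeviceQueueCreateInfo& queueInfo)
{
    std::vector<const char*> extensions = { VK_EXT_DEPTH_CLIP_ENABLE_EXTENSION_NAME };

    VkPhysicalDeviceDepthClipEnableFeaturesEXT depthClipFeature{};
    depthClipFeature.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DEPTH_CLIP_ENABLE_FEATURES_EXT;
    depthClipFeature.depthClipEnable = VK_TRUE;

    VkDeviceCreateInfo deviceInfo{};
    deviceInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
    deviceInfo.pNext = &depthClipFeature;        // feature actually requested here
    deviceInfo.queueCreateInfoCount = 1;
    deviceInfo.pQueueCreateInfos = &queueInfo;
    deviceInfo.enabledExtensionCount = static_cast<uint32_t>(extensions.size());
    deviceInfo.ppEnabledExtensionNames = extensions.data();

    VkDevice device = VK_NULL_HANDLE;
    vkCreateDevice(physDev, &deviceInfo, nullptr, &device);
    return device;
}
```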
Exzap
c4eab08f30 Update vcpkg 2025-04-03 19:11:14 +02:00
mitoposter
57ff99ce53
cubeb: Show default device option even if enumerating devices fails (#1515) 2025-03-19 17:06:55 +01:00
capitalistspz
8b5cafa98e
Wiimote/L2CAP: More accurate descriptions for descriptors (#1512) 2025-03-13 01:09:45 +01:00
Crementif
186e92221a
debugger: allow printing registers using logging breakpoint placeholders (#1510)
This allows a savvy user, developer, or modder to change the comment field of a logging breakpoint to include placeholders such as {r3} or {f3}, which log the corresponding register values whenever that code is hit.
2025-03-07 23:40:17 +01:00
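
A rough sketch of the substitution idea, assuming a hypothetical register-view struct (this is not Cemu's debugger interface): scan the comment for {rN}/{fN} tokens and replace them with the current register values when the breakpoint is hit.

```cpp
// Illustrative sketch only; PPCRegisterView and its fields are hypothetical.
#include <cctype>
#include <cstdint>
#include <string>
#include <fmt/format.h>

struct PPCRegisterView      // hypothetical snapshot of the guest CPU state
{
    uint32_t gpr[32];       // r0..r31
    double   fpr[32];       // f0..f31
};

// Expand "{r3}" / "{f1}" style placeholders in a logging-breakpoint comment.
std::string ExpandBreakpointComment(const std::string& comment, const PPCRegisterView& regs)
{
    std::string out;
    size_t i = 0;
    while (i < comment.size())
    {
        size_t end = std::string::npos;
        if (comment[i] == '{' && i + 2 < comment.size() &&
            (comment[i + 1] == 'r' || comment[i + 1] == 'f') &&
            std::isdigit(static_cast<unsigned char>(comment[i + 2])))
            end = comment.find('}', i + 2);
        if (end != std::string::npos)
        {
            int index = std::stoi(comment.substr(i + 2, end - (i + 2)));
            if (index >= 0 && index < 32)
            {
                out += (comment[i + 1] == 'r') ? fmt::format("0x{:08x}", regs.gpr[index])
                                               : fmt::format("{}", regs.fpr[index]);
                i = end + 1;
                continue;
            }
        }
        out += comment[i++];  // not a valid placeholder, copy verbatim
    }
    return out;
}
// e.g. a comment of "health={r3} speed={f1}" might log as "health=0x00000064 speed=2.5"
```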
goeiecool9999
31d2db6f78 OpenGL: Add explicit/matching qualifiers in output shader interface
Fixes issues with old Intel drivers
2025-03-05 22:23:06 +01:00
capitalistspz
ebb5ab53e2
Add menu item for opening shader cache directory (#1494) 2025-02-14 20:56:51 +01:00
capitalistspz
a6fb0a48eb
BUILD.md: Provide more info about build configuration flags (#1486) 2025-02-04 10:56:33 +01:00
Exzap
ec2d7c086a coreinit: Clean up time functions 2025-01-30 03:49:17 +01:00
Exzap
c714e8cb6b coreinit: Time to tick conversion is unsigned
The result is treated as signed in most cases, but the calculation uses unsigned arithmetic.

As a concrete example where this matters, DS VC passes -1 (2^64-1) to OSWaitEventWithTimeout, which internally causes an overflow. Only with unsigned arithmetic does this produce a large positive number that behaves like the intended infinite timeout; with signed arithmetic the result is negative and the events time out immediately.
2025-01-30 03:32:24 +01:00
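
A small worked example of the arithmetic described above, using a simplified conversion and an illustrative tick rate rather than the exact coreinit constants; it only shows why the unsigned form yields an extremely large (effectively infinite) tick count for a -1 timeout, while a signed version collapses to zero.

```cpp
// Simplified illustration; function names and the tick-rate constant are made up
// for this example and do not match the real coreinit conversion exactly.
#include <cstdint>
#include <cstdio>

constexpr uint64_t TICKS_PER_SECOND = 62156250ull; // illustrative timer clock

uint64_t NanosecondsToTicksUnsigned(uint64_t ns)
{
    return (ns / 1000000000ull) * TICKS_PER_SECOND
         + (ns % 1000000000ull) * TICKS_PER_SECOND / 1000000000ull;
}

int64_t NanosecondsToTicksSigned(int64_t ns)
{
    return (ns / 1000000000ll) * (int64_t)TICKS_PER_SECOND
         + (ns % 1000000000ll) * (int64_t)TICKS_PER_SECOND / 1000000000ll;
}

int main()
{
    uint64_t timeout = (uint64_t)-1; // the -1 that DS VC passes as a timeout
    // Unsigned: a huge positive tick count (centuries of ticks), i.e. effectively infinite.
    std::printf("unsigned: %llu\n", (unsigned long long)NanosecondsToTicksUnsigned(timeout));
    // Signed: -1 ns converts to zero ticks, so the wait expires immediately.
    std::printf("signed:   %lld\n", (long long)NanosecondsToTicksSigned((int64_t)timeout));
}
```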
goeiecool9999
e834515f43
Vulkan: Improve post-shutdown cleanup and minor improvements (#1401) 2025-01-23 21:20:03 +01:00
Exzap
4f9eea07e0 CI: Update action version 2025-01-23 21:06:07 +01:00
goeiecool9999
372c314f06 fix building with fmt11 and GCC 2025-01-23 21:03:11 +01:00
Exzap
5bd253a1f8 Revert "Fix building against fmt 11.1.0 (#1474)"
Reverting commit 4ac65159ef because game profile enums use the stringifying formatters from config.h and are not supposed to store raw integers
2025-01-23 17:33:06 +01:00
Alexandre Bouvier
4ac65159ef
Fix building against fmt 11.1.0 (#1474) 2025-01-16 12:54:29 +01:00
Joshua de Reeper
eab1b24320
nsyshid: Initialise interface index as 0 (#1473) 2025-01-12 20:20:48 +01:00
Exzap
07cd402531
Update precompiled.h 2025-01-12 18:33:15 +01:00
Joshua de Reeper
0a59085021
nsyshid: Make Libusb the Windows backend (#1471) 2025-01-12 14:33:24 +01:00
Exzap
8dd809d725
Latte: Implement better index caching (#1443) 2025-01-12 12:39:02 +01:00
rcaridade145
1923b7a7c4
Vulkan: Added R5_G6_B5_UNORM to supported readback formats (#1430) 2025-01-12 12:37:56 +01:00
brysma1
f61539a262
Update build instructions for fedora and add troubleshooting step for alternative architectures (#1468) 2025-01-08 04:22:55 +01:00
Crementif
92021db230
Use one CPU emulation thread for --force-interpreter (#1467) 2025-01-05 04:08:13 +01:00
Crementif
4b792aa4d2
debug: Fix shader dumping (#1466) 2025-01-04 20:38:42 +01:00
capitalistspz
1e30d72658
build: Add ALLOW_PORTABLE flag (#1464)
* Add ALLOW_PORTABLE cmake flag
* Also check that `portable` is a directory
2024-12-30 18:49:51 +01:00
Mike Lothian
2b0cbf7f6b
Fix building against Boost 1.87.0 (#1455) 2024-12-18 22:15:42 +01:00
goeiecool9999
3738ccd2e6
Play bootSound.btsnd while shaders/pipelines are compiling (#1047) 2024-12-18 15:55:23 +01:00
Exzap
b53b223ba9 Vulkan: Use cache for sampler objects 2024-12-16 13:05:22 +01:00
Exzap
6aaad1eb83 Debugger: Added right click context menu to disasm view + small fixes 2024-12-16 13:05:22 +01:00
Exzap
adab729f43 UI: Correctly handle unicode paths during save export 2024-12-16 13:05:22 +01:00
capitalistspz
dd0af0a56f
Linux: Allow connecting Wiimotes via L2CAP (#1353) 2024-12-07 12:02:40 +01:00
Exzap
934cb54605 Properly check if MLC is writeable 2024-12-07 10:26:17 +01:00
Exzap
356cf0e5e0 Multiple smaller HLE improvements 2024-12-07 10:26:17 +01:00
Exzap
e2d0871ca3 Camera: Set error code in CAMInit
Fixes Hunter's Trophy 2 crashing on boot
2024-12-07 10:26:17 +01:00
Cemu-Language CI
40d9664d1c Update translation files 2024-12-07 07:14:20 +00:00
neebyA
eca7374567
Set version for macOS bundle (#1431) 2024-12-02 05:19:15 +01:00
Jeremy Kescher
80a6057512
build: Fix linker failure with glslang 15.0.0 (#1436) 2024-12-02 01:01:22 +01:00
capitalistspz
0735237686
Input: Move pairing dialog button and source (#1424) 2024-11-30 23:05:50 +01:00
capitalistspz
90eb2e01f4
nsyshid/dimensions: add missing return (#1425) 2024-11-22 13:43:12 +01:00
Exzap
409f12b13a coreinit: Fix calculation of thread total awake time 2024-11-21 20:34:24 +01:00
Exzap
7b513f1744 Latte: Add workaround for infinite loop in Fatal Frame shaders 2024-11-21 20:34:24 +01:00
Exzap
c3e29fb619 Latte: Add support for shader instructions MIN_UINT and MAX_UINT
Seen in the eShop version of Fatal Frame
Also made some warnings less spammy, since this game seems to trigger them a lot
2024-11-21 20:34:24 +01:00
Exzap
2065ac5f63 GfxPack: Better logging messages for diagnosing problems in rules.txt 2024-11-21 20:34:24 +01:00
goeiecool9999
269d5b9aab
Vulkan: Make scaling shaders compatible + fixes (#1392) 2024-11-16 10:02:43 +01:00
Exzap
6f9f3d52ea CI: Remove outdated workflow 2024-11-13 06:38:17 +01:00
Exzap
719c631f13 config: Fix receive_untested_updates using the wrong default 2024-11-13 06:29:24 +01:00
Exzap
66658351c1 erreula: Rework implementation and fix bugs
- ErrEula doesn't disappear on its own anymore. The expected behavior is for the game to call Disappear once a button has been selected. This fixes issues where the dialog would softlock in some games
- Modernized code a bit
- Added a subtle fade in/out effect
2024-11-13 06:29:24 +01:00
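
A hedged pseudocode sketch of the lifecycle described above (the names are illustrative, not the actual erreula exports): after a button press the dialog keeps reporting a decided state until the game explicitly dismisses it.

```cpp
// Illustrative only; this approximates the nn::erreula flow described in the
// commit message and does not use the real exported symbol names.
enum class ErrEulaState { Hidden, Visible, ButtonSelected };

struct ErrEulaDialog
{
    ErrEulaState state = ErrEulaState::Hidden;

    void Appear()               { state = ErrEulaState::Visible; }         // game shows the dialog
    void OnUserPressedButton()  { state = ErrEulaState::ButtonSelected; }  // emulator records the selection
    bool IsDecideSelect() const { return state == ErrEulaState::ButtonSelected; }
    void Disappear()            { state = ErrEulaState::Hidden; }          // game dismisses it

    // Per the rework: the dialog no longer hides itself after OnUserPressedButton();
    // it keeps reporting IsDecideSelect() == true until the game calls Disappear(),
    // avoiding the softlocks seen when the dialog vanished on its own.
};
```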
Exzap
a5717e1b11 FST: Refactoring to fix a read bug + verify all reads
- Fixes a bug where corrupted data would be returned when reading files from unhashed sections with non-block aligned offset or size
- Added hash checks for all reads where possible. This means that FST now can automatically catch corruptions when they are encountered while reading from the volume
2024-11-13 06:29:23 +01:00
Joshua de Reeper
ca2e0a7c31
nsyshid: Add support for emulated Dimensions Toypad (#1371) 2024-11-11 08:58:01 +01:00
capitalistspz
2e829479d9
nsyshid/libusb: correct error message formatting and print error string on open fail (#1407) 2024-11-09 06:22:13 +01:00
capitalistspz
4ac1ab162a
procui: swap tickDelay and priority args in callbacks (#1408) 2024-11-09 06:21:06 +01:00
SamoZ256
813f9148b1
macOS: Fix absolute path to libusb dylib (#1405) 2024-11-07 07:09:35 +01:00
SamoZ256
9941e00b54
macOS: Fix libusb path for bundle (#1403) 2024-11-05 22:22:00 +01:00
Exzap
1c49a8a1ba nn_nfp: Implement GetNfpReadOnlyInfo and fix deactivate event
Fixes Amiibos not being detected in MK8
2024-11-01 22:47:19 +01:00
capitalistspz
47001ad233
Make MEMPTR<T> a little more T*-like (#1385) 2024-10-30 23:10:32 +01:00
goeiecool9999
459fd5d9bb
input: Fix crash when closing add controller dialog before search completes (#1386) 2024-10-28 09:37:30 +01:00
capitalistspz
63e1289bb5
Windows: Save icons to Cemu user data directory (#1390) 2024-10-25 18:48:21 +02:00
goeiecool9999
f9a4b2dbb1
input: Add option to make show screen button a toggle (#1383) 2024-10-19 01:56:56 +02:00
goeiecool9999
d6575455ee Linux: Fix crash on invalid command-line arguments
Use std::cout instead of wxMessageBox, which does not work when wxWidgets has not been initialised yet
2024-10-17 22:24:20 +02:00
goeiecool9999
3acd0c4f2c
Vulkan: Protect against uniform var ringbuffer overflow (#1378) 2024-10-14 14:03:36 +02:00
Alexandre Bouvier
6dc73f5d79
Add support for fmt 11 (#1366) 2024-10-03 08:48:25 +02:00
capitalistspz
8508c62540
Various smaller code improvements (#1343) 2024-09-17 02:00:26 +02:00
Andrea Toska
adffd53dbd
boss: Fix BOSS not honoring the proxy_server setting (#1344) 2024-09-16 12:40:38 +02:00
goeiecool9999
a05bdb172d
Vulkan: Add explicit synchronization on frame boundaries (#1290) 2024-09-15 20:23:11 +02:00
211 changed files with 21178 additions and 15067 deletions


@ -39,7 +39,7 @@ jobs:
- name: "Install system dependencies"
run: |
sudo apt update -qq
sudo apt install -y clang-15 cmake freeglut3-dev libgcrypt20-dev libglm-dev libgtk-3-dev libpulse-dev libsecret-1-dev libsystemd-dev libudev-dev nasm ninja-build
sudo apt install -y clang-15 cmake freeglut3-dev libgcrypt20-dev libglm-dev libgtk-3-dev libpulse-dev libsecret-1-dev libsystemd-dev libudev-dev nasm ninja-build libbluetooth-dev
- name: "Setup cmake"
uses: jwlawson/actions-setup-cmake@v2
@ -96,7 +96,7 @@ jobs:
- name: "Install system dependencies"
run: |
sudo apt update -qq
sudo apt install -y clang-15 cmake freeglut3-dev libgcrypt20-dev libglm-dev libgtk-3-dev libpulse-dev libsecret-1-dev libsystemd-dev nasm ninja-build appstream
sudo apt install -y clang-15 cmake freeglut3-dev libgcrypt20-dev libglm-dev libgtk-3-dev libpulse-dev libsecret-1-dev libsystemd-dev nasm ninja-build appstream libbluetooth-dev
- name: "Build AppImage"
run: |


@ -1,4 +1,4 @@
name: Deploy experimental release
name: Deploy release
on:
workflow_dispatch:
inputs:
@ -54,7 +54,7 @@ jobs:
next_version_major: ${{ needs.calculate-version.outputs.next_version_major }}
next_version_minor: ${{ needs.calculate-version.outputs.next_version_minor }}
deploy:
name: Deploy experimental release
name: Deploy release
runs-on: ubuntu-22.04
needs: [call-release-build, calculate-version]
steps:


@ -1,85 +0,0 @@
name: Create new release
on:
workflow_dispatch:
inputs:
PlaceholderInput:
description: PlaceholderInput
required: false
jobs:
call-release-build:
uses: ./.github/workflows/build.yml
with:
deploymode: release
deploy:
name: Deploy release
runs-on: ubuntu-20.04
needs: call-release-build
steps:
- uses: actions/checkout@v3
- uses: actions/download-artifact@v4
with:
name: cemu-bin-linux-x64
path: cemu-bin-linux-x64
- uses: actions/download-artifact@v4
with:
name: cemu-appimage-x64
path: cemu-appimage-x64
- uses: actions/download-artifact@v4
with:
name: cemu-bin-windows-x64
path: cemu-bin-windows-x64
- uses: actions/download-artifact@v4
with:
name: cemu-bin-macos-x64
path: cemu-bin-macos-x64
- name: Initialize
run: |
mkdir upload
sudo apt update -qq
sudo apt install -y zip
- name: Get Cemu release version
run: |
gcc -o getversion .github/getversion.cpp
echo "Cemu CI version: $(./getversion)"
echo "CEMU_FOLDER_NAME=Cemu_$(./getversion)" >> $GITHUB_ENV
echo "CEMU_VERSION=$(./getversion)" >> $GITHUB_ENV
- name: Create release from windows-bin
run: |
ls ./
ls ./bin/
cp -R ./bin ./${{ env.CEMU_FOLDER_NAME }}
mv cemu-bin-windows-x64/Cemu.exe ./${{ env.CEMU_FOLDER_NAME }}/Cemu.exe
zip -9 -r upload/cemu-${{ env.CEMU_VERSION }}-windows-x64.zip ${{ env.CEMU_FOLDER_NAME }}
rm -r ./${{ env.CEMU_FOLDER_NAME }}
- name: Create appimage
run: |
VERSION=${{ env.CEMU_VERSION }}
echo "Cemu Version is $VERSION"
ls cemu-appimage-x64
mv cemu-appimage-x64/Cemu-*-x86_64.AppImage upload/Cemu-$VERSION-x86_64.AppImage
- name: Create release from ubuntu-bin
run: |
ls ./
ls ./bin/
cp -R ./bin ./${{ env.CEMU_FOLDER_NAME }}
mv cemu-bin-linux-x64/Cemu ./${{ env.CEMU_FOLDER_NAME }}/Cemu
zip -9 -r upload/cemu-${{ env.CEMU_VERSION }}-ubuntu-20.04-x64.zip ${{ env.CEMU_FOLDER_NAME }}
rm -r ./${{ env.CEMU_FOLDER_NAME }}
- name: Create release from macos-bin
run: cp cemu-bin-macos-x64/Cemu.dmg upload/cemu-${{ env.CEMU_VERSION }}-macos-12-x64.dmg
- name: Create release
run: |
wget -O ghr.tar.gz https://github.com/tcnksm/ghr/releases/download/v0.15.0/ghr_v0.15.0_linux_amd64.tar.gz
tar xvzf ghr.tar.gz; rm ghr.tar.gz
ghr_v0.15.0_linux_amd64/ghr -t ${{ secrets.GITHUB_TOKEN }} -n "Cemu ${{ env.CEMU_VERSION }}" -b "Changelog:" v${{ env.CEMU_VERSION }} ./upload


@ -35,7 +35,7 @@ jobs:
-o cemu.pot
- name: Upload artifact
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
with:
name: POT file
path: ./cemu.pot


@ -46,10 +46,10 @@ To compile Cemu, a recent enough compiler and STL with C++20 support is required
### Dependencies
#### For Arch and derivatives:
`sudo pacman -S --needed base-devel clang cmake freeglut git glm gtk3 libgcrypt libpulse libsecret linux-headers llvm nasm ninja systemd unzip zip`
`sudo pacman -S --needed base-devel bluez-libs clang cmake freeglut git glm gtk3 libgcrypt libpulse libsecret linux-headers llvm nasm ninja systemd unzip zip`
#### For Debian, Ubuntu and derivatives:
`sudo apt install -y cmake curl clang-15 freeglut3-dev git libgcrypt20-dev libglm-dev libgtk-3-dev libpulse-dev libsecret-1-dev libsystemd-dev libtool nasm ninja-build`
`sudo apt install -y cmake curl clang-15 freeglut3-dev git libbluetooth-dev libgcrypt20-dev libglm-dev libgtk-3-dev libpulse-dev libsecret-1-dev libsystemd-dev libtool nasm ninja-build`
You may also need to install `libusb-1.0-0-dev` as a workaround for an issue with the vcpkg hidapi package.
@ -57,7 +57,7 @@ At Step 3 in [Build Cemu using cmake and clang](#build-cemu-using-cmake-and-clan
`cmake -S . -B build -DCMAKE_BUILD_TYPE=release -DCMAKE_C_COMPILER=/usr/bin/clang-15 -DCMAKE_CXX_COMPILER=/usr/bin/clang++-15 -G Ninja -DCMAKE_MAKE_PROGRAM=/usr/bin/ninja`
#### For Fedora and derivatives:
`sudo dnf install clang cmake cubeb-devel freeglut-devel git glm-devel gtk3-devel kernel-headers libgcrypt-devel libsecret-devel libtool libusb1-devel llvm nasm ninja-build perl-core systemd-devel zlib-devel zlib-static`
`sudo dnf install bluez-libs-devel clang cmake cubeb-devel freeglut-devel git glm-devel gtk3-devel kernel-headers libgcrypt-devel libsecret-devel libtool libusb1-devel llvm nasm ninja-build perl-core systemd-devel wayland-protocols-devel zlib-devel zlib-static`
### Build Cemu
@ -120,6 +120,9 @@ This section refers to running `cmake -S...` (truncated).
* Compiling failed during rebuild after `git pull` with an error that mentions RPATH
* Add the following and try running the command again:
* `-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON`
* Environment variable `VCPKG_FORCE_SYSTEM_BINARIES` must be set.
* Execute the following and then try running the command again:
* `export VCPKG_FORCE_SYSTEM_BINARIES=1`
* If you are getting a random error, read the [package-name-and-platform]-out.log and [package-name-and-platform]-err.log for the actual reason to see if you might be lacking the headers from a dependency.
@ -189,3 +192,41 @@ Then install the dependencies:
If CMake complains about Cemu already being compiled or another similar error, try deleting the `CMakeCache.txt` file inside the `build` folder and retry building.
## CMake configure flags
Some flags can be passed during CMake configure to customise which features are enabled on build.
Example usage: `cmake -S . -B build -DCMAKE_BUILD_TYPE=release -DENABLE_SDL=ON -DENABLE_VULKAN=OFF`
### All platforms
| Flag | | Description | Default | Note |
|--------------------|:--|-----------------------------------------------------------------------------|---------|--------------------|
| ALLOW_PORTABLE | | Allow Cemu to use the `portable` directory to store configs and data | ON | |
| CEMU_CXX_FLAGS | | Flags passed straight to the compiler, e.g. `-march=native`, `-Wall`, `/W3` | "" | |
| ENABLE_CUBEB | | Enable cubeb audio backend | ON | |
| ENABLE_DISCORD_RPC | | Enable Discord Rich presence support | ON | |
| ENABLE_OPENGL | | Enable OpenGL graphics backend | ON | Currently required |
| ENABLE_HIDAPI | | Enable HIDAPI (used for Wiimote controller API) | ON | |
| ENABLE_SDL | | Enable SDLController controller API | ON | Currently required |
| ENABLE_VCPKG | | Use VCPKG package manager to obtain dependencies | ON | |
| ENABLE_VULKAN | | Enable the Vulkan graphics backend | ON | |
| ENABLE_WXWIDGETS | | Enable wxWidgets UI | ON | Currently required |
### Windows
| Flag | Description | Default | Note |
|--------------------|-----------------------------------|---------|--------------------|
| ENABLE_DIRECTAUDIO | Enable DirectAudio audio backend | ON | Currently required |
| ENABLE_DIRECTINPUT | Enable DirectInput controller API | ON | Currently required |
| ENABLE_XAUDIO | Enable XAudio audio backend | ON | |
| ENABLE_XINPUT | Enable XInput controller API | ON | |
### Linux
| Flag | Description | Default |
|-----------------------|----------------------------------------------------|---------|
| ENABLE_BLUEZ | Build with Bluez (used for Wiimote controller API) | ON |
| ENABLE_FERAL_GAMEMODE | Enable Feral Interactive GameMode support | ON |
| ENABLE_WAYLAND | Enable Wayland support | ON |
### macOS
| Flag | Description | Default |
|--------------|------------------------------------------------|---------|
| MACOS_BUNDLE | MacOS executable will be an application bundle | OFF |


@ -2,6 +2,7 @@ cmake_minimum_required(VERSION 3.21.1)
option(ENABLE_VCPKG "Enable the vcpkg package manager" ON)
option(MACOS_BUNDLE "The executable when built on macOS will be created as an application bundle" OFF)
option(ALLOW_PORTABLE "Allow Cemu to be run in portable mode" ON)
# used by CI script to set version:
set(EMULATOR_VERSION_MAJOR "0" CACHE STRING "")
@ -98,6 +99,7 @@ endif()
if (UNIX AND NOT APPLE)
option(ENABLE_WAYLAND "Build with Wayland support" ON)
option(ENABLE_FERAL_GAMEMODE "Enables Feral Interactive GameMode Support" ON)
option(ENABLE_BLUEZ "Build with Bluez support" ON)
endif()
option(ENABLE_OPENGL "Enables the OpenGL backend" ON)
@ -122,23 +124,6 @@ if (WIN32)
endif()
option(ENABLE_CUBEB "Enabled cubeb backend" ON)
# usb hid backends
if (WIN32)
option(ENABLE_NSYSHID_WINDOWS_HID "Enables the native Windows HID backend for nsyshid" ON)
endif ()
# libusb and windows hid backends shouldn't be active at the same time; otherwise we'd see all devices twice!
if (NOT ENABLE_NSYSHID_WINDOWS_HID)
option(ENABLE_NSYSHID_LIBUSB "Enables the libusb backend for nsyshid" ON)
else ()
set(ENABLE_NSYSHID_LIBUSB OFF CACHE BOOL "" FORCE)
endif ()
if (ENABLE_NSYSHID_WINDOWS_HID)
add_compile_definitions(NSYSHID_ENABLE_BACKEND_WINDOWS_HID)
endif ()
if (ENABLE_NSYSHID_LIBUSB)
add_compile_definitions(NSYSHID_ENABLE_BACKEND_LIBUSB)
endif ()
option(ENABLE_WXWIDGETS "Build with wxWidgets UI (Currently required)" ON)
set(THREADS_PREFER_PTHREAD_FLAG true)
@ -179,6 +164,12 @@ if (UNIX AND NOT APPLE)
endif()
find_package(GTK3 REQUIRED)
if(ENABLE_BLUEZ)
find_package(bluez REQUIRED)
set(ENABLE_WIIMOTE ON)
add_compile_definitions(HAS_BLUEZ)
endif()
endif()
if (ENABLE_VULKAN)
@ -222,7 +213,7 @@ if (ENABLE_CUBEB)
option(BUILD_TOOLS "" OFF)
option(BUNDLE_SPEEX "" OFF)
set(USE_WINMM OFF CACHE BOOL "")
add_subdirectory("dependencies/cubeb" EXCLUDE_FROM_ALL)
add_subdirectory("dependencies/cubeb" EXCLUDE_FROM_ALL SYSTEM)
set_property(TARGET cubeb PROPERTY MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")
add_library(cubeb::cubeb ALIAS cubeb)
endif()

4 binary files changed (contents not shown)

boost.natvis (new file, 26 lines)

@ -0,0 +1,26 @@
<?xml version='1.0' encoding='utf-8'?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
<Type Name="boost::container::small_vector&lt;*&gt;">
<Expand>
<Item Name="[size]">m_holder.m_size</Item>
<ArrayItems>
<Size>m_holder.m_size</Size>
<ValuePointer>m_holder.m_start</ValuePointer>
</ArrayItems>
</Expand>
</Type>
<Type Name="boost::container::static_vector&lt;*&gt;">
<DisplayString>{{ size={m_holder.m_size} }}</DisplayString>
<Expand>
<Item Name="[size]" ExcludeView="simple">m_holder.m_size</Item>
<Item Name="[capacity]" ExcludeView="simple">static_capacity</Item>
<ArrayItems>
<Size>m_holder.m_size</Size>
<ValuePointer>($T1*)m_holder.storage.data</ValuePointer>
</ArrayItems>
</Expand>
</Type>
</AutoVisualizer>

cmake/Findbluez.cmake (new file, 20 lines)

@ -0,0 +1,20 @@
# SPDX-FileCopyrightText: 2022 Andrea Pappacoda <andrea@pappacoda.it>
# SPDX-License-Identifier: ISC
find_package(bluez CONFIG)
if (NOT bluez_FOUND)
find_package(PkgConfig)
if (PKG_CONFIG_FOUND)
pkg_search_module(bluez IMPORTED_TARGET GLOBAL bluez-1.0 bluez)
if (bluez_FOUND)
add_library(bluez::bluez ALIAS PkgConfig::bluez)
endif ()
endif ()
endif ()
find_package_handle_standard_args(bluez
REQUIRED_VARS
bluez_LINK_LIBRARIES
bluez_FOUND
VERSION_VAR bluez_VERSION
)

dependencies/vcpkg (vendored submodule)

@ -1 +1 @@
Subproject commit a4275b7eee79fb24ec2e135481ef5fce8b41c339
Subproject commit 533a5fda5c0646d1771345fb572e759283444d5f


@ -82,8 +82,8 @@ if (MACOS_BUNDLE)
set(MACOSX_BUNDLE_ICON_FILE "cemu.icns")
set(MACOSX_BUNDLE_GUI_IDENTIFIER "info.cemu.Cemu")
set(MACOSX_BUNDLE_BUNDLE_NAME "Cemu")
set(MACOSX_BUNDLE_SHORT_VERSION_STRING ${CMAKE_PROJECT_VERSION})
set(MACOSX_BUNDLE_BUNDLE_VERSION ${CMAKE_PROJECT_VERSION})
set(MACOSX_BUNDLE_SHORT_VERSION_STRING "${EMULATOR_VERSION_MAJOR}.${EMULATOR_VERSION_MINOR}.${EMULATOR_VERSION_PATCH}")
set(MACOSX_BUNDLE_BUNDLE_VERSION "${EMULATOR_VERSION_MAJOR}.${EMULATOR_VERSION_MINOR}.${EMULATOR_VERSION_PATCH}")
set(MACOSX_BUNDLE_COPYRIGHT "Copyright © 2024 Cemu Project")
set(MACOSX_BUNDLE_CATEGORY "public.app-category.games")
@ -101,12 +101,18 @@ if (MACOS_BUNDLE)
COMMAND ${CMAKE_COMMAND} ARGS -E copy_directory "${CMAKE_SOURCE_DIR}/bin/${folder}" "${CMAKE_SOURCE_DIR}/bin/${OUTPUT_NAME}.app/Contents/SharedSupport/${folder}")
endforeach(folder)
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
set(LIBUSB_PATH "${CMAKE_BINARY_DIR}/vcpkg_installed/x64-osx/debug/lib/libusb-1.0.0.dylib")
else()
set(LIBUSB_PATH "${CMAKE_BINARY_DIR}/vcpkg_installed/x64-osx/lib/libusb-1.0.0.dylib")
endif()
add_custom_command (TARGET CemuBin POST_BUILD
COMMAND ${CMAKE_COMMAND} ARGS -E copy "/usr/local/lib/libMoltenVK.dylib" "${CMAKE_SOURCE_DIR}/bin/${OUTPUT_NAME}.app/Contents/Frameworks/libMoltenVK.dylib"
COMMAND ${CMAKE_COMMAND} ARGS -E copy "${CMAKE_BINARY_DIR}/vcpkg_installed/x64-osx/lib/libusb-1.0.0.dylib" "${CMAKE_SOURCE_DIR}/bin/${OUTPUT_NAME}.app/Contents/Frameworks/libusb-1.0.0.dylib"
COMMAND ${CMAKE_COMMAND} ARGS -E copy "${LIBUSB_PATH}" "${CMAKE_SOURCE_DIR}/bin/${OUTPUT_NAME}.app/Contents/Frameworks/libusb-1.0.0.dylib"
COMMAND ${CMAKE_COMMAND} ARGS -E copy "${CMAKE_SOURCE_DIR}/src/resource/update.sh" "${CMAKE_SOURCE_DIR}/bin/${OUTPUT_NAME}.app/Contents/MacOS/update.sh"
COMMAND bash -c "install_name_tool -add_rpath @executable_path/../Frameworks ${CMAKE_SOURCE_DIR}/bin/${OUTPUT_NAME}.app/Contents/MacOS/${OUTPUT_NAME}"
COMMAND bash -c "install_name_tool -change /Users/runner/work/Cemu/Cemu/build/vcpkg_installed/x64-osx/lib/libusb-1.0.0.dylib @executable_path/../Frameworks/libusb-1.0.0.dylib ${CMAKE_SOURCE_DIR}/bin/${OUTPUT_NAME}.app/Contents/MacOS/${OUTPUT_NAME}")
COMMAND bash -c "install_name_tool -change ${LIBUSB_PATH} @executable_path/../Frameworks/libusb-1.0.0.dylib ${CMAKE_SOURCE_DIR}/bin/${OUTPUT_NAME}.app/Contents/MacOS/${OUTPUT_NAME}")
endif()
set_target_properties(CemuBin PROPERTIES


@ -67,24 +67,31 @@ add_library(CemuCafe
HW/Espresso/Recompiler/PPCFunctionBoundaryTracker.h
HW/Espresso/Recompiler/PPCRecompiler.cpp
HW/Espresso/Recompiler/PPCRecompiler.h
HW/Espresso/Recompiler/PPCRecompilerImlAnalyzer.cpp
HW/Espresso/Recompiler/IML/IML.h
HW/Espresso/Recompiler/IML/IMLSegment.cpp
HW/Espresso/Recompiler/IML/IMLSegment.h
HW/Espresso/Recompiler/IML/IMLInstruction.cpp
HW/Espresso/Recompiler/IML/IMLInstruction.h
HW/Espresso/Recompiler/IML/IMLDebug.cpp
HW/Espresso/Recompiler/IML/IMLAnalyzer.cpp
HW/Espresso/Recompiler/IML/IMLOptimizer.cpp
HW/Espresso/Recompiler/IML/IMLRegisterAllocator.cpp
HW/Espresso/Recompiler/IML/IMLRegisterAllocator.h
HW/Espresso/Recompiler/IML/IMLRegisterAllocatorRanges.cpp
HW/Espresso/Recompiler/IML/IMLRegisterAllocatorRanges.h
HW/Espresso/Recompiler/PPCRecompilerImlGen.cpp
HW/Espresso/Recompiler/PPCRecompilerImlGenFPU.cpp
HW/Espresso/Recompiler/PPCRecompilerIml.h
HW/Espresso/Recompiler/PPCRecompilerImlOptimizer.cpp
HW/Espresso/Recompiler/PPCRecompilerImlRanges.cpp
HW/Espresso/Recompiler/PPCRecompilerImlRanges.h
HW/Espresso/Recompiler/PPCRecompilerImlRegisterAllocator2.cpp
HW/Espresso/Recompiler/PPCRecompilerImlRegisterAllocator.cpp
HW/Espresso/Recompiler/PPCRecompilerIntermediate.cpp
HW/Espresso/Recompiler/PPCRecompilerX64AVX.cpp
HW/Espresso/Recompiler/PPCRecompilerX64BMI.cpp
HW/Espresso/Recompiler/PPCRecompilerX64.cpp
HW/Espresso/Recompiler/PPCRecompilerX64FPU.cpp
HW/Espresso/Recompiler/PPCRecompilerX64Gen.cpp
HW/Espresso/Recompiler/PPCRecompilerX64GenFPU.cpp
HW/Espresso/Recompiler/PPCRecompilerX64.h
HW/Espresso/Recompiler/x64Emit.hpp
HW/Espresso/Recompiler/BackendX64/BackendX64AVX.cpp
HW/Espresso/Recompiler/BackendX64/BackendX64BMI.cpp
HW/Espresso/Recompiler/BackendX64/BackendX64.cpp
HW/Espresso/Recompiler/BackendX64/BackendX64FPU.cpp
HW/Espresso/Recompiler/BackendX64/BackendX64Gen.cpp
HW/Espresso/Recompiler/BackendX64/BackendX64GenFPU.cpp
HW/Espresso/Recompiler/BackendX64/BackendX64.h
HW/Espresso/Recompiler/BackendX64/X64Emit.hpp
HW/Espresso/Recompiler/BackendX64/x86Emitter.h
HW/Latte/Common/RegisterSerializer.cpp
HW/Latte/Common/RegisterSerializer.h
HW/Latte/Common/ShaderSerializer.cpp
@ -463,8 +470,8 @@ add_library(CemuCafe
OS/libs/nsyshid/BackendEmulated.h
OS/libs/nsyshid/BackendLibusb.cpp
OS/libs/nsyshid/BackendLibusb.h
OS/libs/nsyshid/BackendWindowsHID.cpp
OS/libs/nsyshid/BackendWindowsHID.h
OS/libs/nsyshid/Dimensions.cpp
OS/libs/nsyshid/Dimensions.h
OS/libs/nsyshid/Infinity.cpp
OS/libs/nsyshid/Infinity.h
OS/libs/nsyshid/Skylander.cpp
@ -530,6 +537,12 @@ set_property(TARGET CemuCafe PROPERTY MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CON
target_include_directories(CemuCafe PUBLIC "../")
if (glslang_VERSION VERSION_LESS "15.0.0")
set(glslang_target "glslang::SPIRV")
else()
set(glslang_target "glslang")
endif()
target_link_libraries(CemuCafe PRIVATE
CemuAsm
CemuAudio
@ -545,7 +558,7 @@ target_link_libraries(CemuCafe PRIVATE
Boost::nowide
CURL::libcurl
fmt::fmt
glslang::SPIRV
${glslang_target}
ih264d
OpenSSL::Crypto
OpenSSL::SSL
@ -561,15 +574,16 @@ if (ENABLE_WAYLAND)
target_link_libraries(CemuCafe PUBLIC Wayland::Client)
endif()
if (ENABLE_NSYSHID_LIBUSB)
if (ENABLE_VCPKG)
find_package(PkgConfig REQUIRED)
pkg_check_modules(libusb REQUIRED IMPORTED_TARGET libusb-1.0)
target_link_libraries(CemuCafe PRIVATE PkgConfig::libusb)
else ()
find_package(libusb MODULE REQUIRED)
target_link_libraries(CemuCafe PRIVATE libusb::libusb)
endif ()
if (ENABLE_VCPKG)
if(WIN32)
set(PKG_CONFIG_EXECUTABLE "${VCPKG_INSTALLED_DIR}/x64-windows/tools/pkgconf/pkgconf.exe")
endif()
find_package(PkgConfig REQUIRED)
pkg_check_modules(libusb REQUIRED IMPORTED_TARGET libusb-1.0)
target_link_libraries(CemuCafe PRIVATE PkgConfig::libusb)
else ()
find_package(libusb MODULE REQUIRED)
target_link_libraries(CemuCafe PRIVATE libusb::libusb)
endif ()
if (ENABLE_WXWIDGETS)


@ -9,6 +9,7 @@
#include "audio/IAudioAPI.h"
#include "audio/IAudioInputAPI.h"
#include "config/ActiveSettings.h"
#include "config/LaunchSettings.h"
#include "Cafe/TitleList/GameInfo.h"
#include "Cafe/GraphicPack/GraphicPack2.h"
#include "util/helpers/SystemException.h"
@ -396,7 +397,7 @@ void cemu_initForGame()
// replace any known function signatures with our HLE implementations and patch bugs in the games
GamePatch_scan();
}
LatteGPUState.alwaysDisplayDRC = ActiveSettings::DisplayDRCEnabled();
LatteGPUState.isDRCPrimary = ActiveSettings::DisplayDRCEnabled();
InfoLog_PrintActiveSettings();
Latte_Start();
// check for debugger entrypoint bp
@ -637,40 +638,40 @@ namespace CafeSystem
fsc_unmount("/cemuBossStorage/", FSC_PRIORITY_BASE);
}
STATUS_CODE LoadAndMountForegroundTitle(TitleId titleId)
PREPARE_STATUS_CODE LoadAndMountForegroundTitle(TitleId titleId)
{
cemuLog_log(LogType::Force, "Mounting title {:016x}", (uint64)titleId);
sGameInfo_ForegroundTitle = CafeTitleList::GetGameInfo(titleId);
if (!sGameInfo_ForegroundTitle.IsValid())
{
cemuLog_log(LogType::Force, "Mounting failed: Game meta information is either missing, inaccessible or not valid (missing or invalid .xml files in code and meta folder)");
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
}
// check base
TitleInfo& titleBase = sGameInfo_ForegroundTitle.GetBase();
if (!titleBase.IsValid())
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
if(!titleBase.ParseXmlInfo())
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
cemuLog_log(LogType::Force, "Base: {}", titleBase.GetPrintPath());
// mount base
if (!titleBase.Mount("/vol/content", "content", FSC_PRIORITY_BASE) || !titleBase.Mount(GetInternalVirtualCodeFolder(), "code", FSC_PRIORITY_BASE))
{
cemuLog_log(LogType::Force, "Mounting failed");
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
}
// check update
TitleInfo& titleUpdate = sGameInfo_ForegroundTitle.GetUpdate();
if (titleUpdate.IsValid())
{
if (!titleUpdate.ParseXmlInfo())
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
cemuLog_log(LogType::Force, "Update: {}", titleUpdate.GetPrintPath());
// mount update
if (!titleUpdate.Mount("/vol/content", "content", FSC_PRIORITY_PATCH) || !titleUpdate.Mount(GetInternalVirtualCodeFolder(), "code", FSC_PRIORITY_PATCH))
{
cemuLog_log(LogType::Force, "Mounting failed");
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
}
}
else
@ -682,20 +683,20 @@ namespace CafeSystem
// todo - support for multi-title AOC
TitleInfo& titleAOC = aocList[0];
if (!titleAOC.ParseXmlInfo())
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
cemu_assert_debug(titleAOC.IsValid());
cemuLog_log(LogType::Force, "DLC: {}", titleAOC.GetPrintPath());
// mount AOC
if (!titleAOC.Mount(fmt::format("/vol/aoc{:016x}", titleAOC.GetAppTitleId()), "content", FSC_PRIORITY_PATCH))
{
cemuLog_log(LogType::Force, "Mounting failed");
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
}
}
else
cemuLog_log(LogType::Force, "DLC: Not present");
sForegroundTitleId = titleId;
return STATUS_CODE::SUCCESS;
return PREPARE_STATUS_CODE::SUCCESS;
}
void UnmountForegroundTitle()
@ -723,7 +724,7 @@ namespace CafeSystem
}
}
STATUS_CODE SetupExecutable()
PREPARE_STATUS_CODE SetupExecutable()
{
// set rpx path from cos.xml if available
_pathToBaseExecutable = _pathToExecutable;
@ -755,7 +756,7 @@ namespace CafeSystem
}
}
LoadMainExecutable();
return STATUS_CODE::SUCCESS;
return PREPARE_STATUS_CODE::SUCCESS;
}
void SetupMemorySpace()
@ -769,7 +770,7 @@ namespace CafeSystem
memory_unmapForCurrentTitle();
}
STATUS_CODE PrepareForegroundTitle(TitleId titleId)
PREPARE_STATUS_CODE PrepareForegroundTitle(TitleId titleId)
{
CafeTitleList::WaitForMandatoryScan();
sLaunchModeIsStandalone = false;
@ -780,21 +781,21 @@ namespace CafeSystem
// mount mlc storage
MountBaseDirectories();
// mount title folders
STATUS_CODE r = LoadAndMountForegroundTitle(titleId);
if (r != STATUS_CODE::SUCCESS)
PREPARE_STATUS_CODE r = LoadAndMountForegroundTitle(titleId);
if (r != PREPARE_STATUS_CODE::SUCCESS)
return r;
gameProfile_load();
// setup memory space and PPC recompiler
SetupMemorySpace();
PPCRecompiler_init();
r = SetupExecutable(); // load RPX
if (r != STATUS_CODE::SUCCESS)
if (r != PREPARE_STATUS_CODE::SUCCESS)
return r;
InitVirtualMlcStorage();
return STATUS_CODE::SUCCESS;
return PREPARE_STATUS_CODE::SUCCESS;
}
STATUS_CODE PrepareForegroundTitleFromStandaloneRPX(const fs::path& path)
PREPARE_STATUS_CODE PrepareForegroundTitleFromStandaloneRPX(const fs::path& path)
{
sLaunchModeIsStandalone = true;
cemuLog_log(LogType::Force, "Launching executable in standalone mode due to incorrect layout or missing meta files");
@ -812,7 +813,7 @@ namespace CafeSystem
if (!r)
{
cemuLog_log(LogType::Force, "Failed to mount {}", _pathToUtf8(contentPath));
return STATUS_CODE::UNABLE_TO_MOUNT;
return PREPARE_STATUS_CODE::UNABLE_TO_MOUNT;
}
}
}
@ -824,7 +825,7 @@ namespace CafeSystem
// since a lot of systems (including save folder location) rely on a TitleId, we derive a placeholder id from the executable hash
auto execData = fsc_extractFile(_pathToExecutable.c_str());
if (!execData)
return STATUS_CODE::INVALID_RPX;
return PREPARE_STATUS_CODE::INVALID_RPX;
uint32 h = generateHashFromRawRPXData(execData->data(), execData->size());
sForegroundTitleId = 0xFFFFFFFF00000000ULL | (uint64)h;
cemuLog_log(LogType::Force, "Generated placeholder TitleId: {:016x}", sForegroundTitleId);
@ -834,7 +835,7 @@ namespace CafeSystem
// load executable
SetupExecutable();
InitVirtualMlcStorage();
return STATUS_CODE::SUCCESS;
return PREPARE_STATUS_CODE::SUCCESS;
}
void _LaunchTitleThread()
@ -843,7 +844,7 @@ namespace CafeSystem
module->TitleStart();
cemu_initForGame();
// enter scheduler
if (ActiveSettings::GetCPUMode() == CPUMode::MulticoreRecompiler)
if (ActiveSettings::GetCPUMode() == CPUMode::MulticoreRecompiler && !LaunchSettings::ForceInterpreter())
coreinit::OSSchedulerBegin(3);
else
coreinit::OSSchedulerBegin(1);


@ -15,20 +15,19 @@ namespace CafeSystem
virtual void CafeRecreateCanvas() = 0;
};
enum class STATUS_CODE
enum class PREPARE_STATUS_CODE
{
SUCCESS,
INVALID_RPX,
UNABLE_TO_MOUNT, // failed to mount through TitleInfo (most likely caused by an invalid or outdated path)
//BAD_META_DATA, - the title list only stores titles with valid meta, so this error code is impossible
};
void Initialize();
void SetImplementation(SystemImplementation* impl);
void Shutdown();
STATUS_CODE PrepareForegroundTitle(TitleId titleId);
STATUS_CODE PrepareForegroundTitleFromStandaloneRPX(const fs::path& path);
PREPARE_STATUS_CODE PrepareForegroundTitle(TitleId titleId);
PREPARE_STATUS_CODE PrepareForegroundTitleFromStandaloneRPX(const fs::path& path);
void LaunchForegroundTitle();
bool IsTitleRunning();


@ -3,8 +3,7 @@
#include "Cemu/ncrypto/ncrypto.h"
#include "Cafe/Filesystem/WUD/wud.h"
#include "util/crypto/aes128.h"
#include "openssl/evp.h" /* EVP_Digest */
#include "openssl/sha.h" /* SHA1 / SHA256_DIGEST_LENGTH */
#include "openssl/sha.h" /* SHA1 / SHA256 */
#include "fstUtil.h"
#include "FST.h"
@ -141,7 +140,7 @@ struct DiscPartitionTableHeader
static constexpr uint32 MAGIC_VALUE = 0xCCA6E67B;
/* +0x00 */ uint32be magic;
/* +0x04 */ uint32be sectorSize; // must be 0x8000?
/* +0x04 */ uint32be blockSize; // must be 0x8000?
/* +0x08 */ uint8 partitionTableHash[20]; // hash of the data range at +0x800 to end of sector (0x8000)
/* +0x1C */ uint32be numPartitions;
};
@ -164,10 +163,10 @@ struct DiscPartitionHeader
static constexpr uint32 MAGIC_VALUE = 0xCC93A4F5;
/* +0x00 */ uint32be magic;
/* +0x04 */ uint32be sectorSize; // must match DISC_SECTOR_SIZE
/* +0x04 */ uint32be sectorSize; // must match DISC_SECTOR_SIZE for hashed blocks
/* +0x08 */ uint32be ukn008;
/* +0x0C */ uint32be ukn00C;
/* +0x0C */ uint32be ukn00C; // h3 array size?
/* +0x10 */ uint32be h3HashNum;
/* +0x14 */ uint32be fstSize; // in bytes
/* +0x18 */ uint32be fstSector; // relative to partition start
@ -178,13 +177,15 @@ struct DiscPartitionHeader
/* +0x24 */ uint8 fstHashType;
/* +0x25 */ uint8 fstEncryptionType; // purpose of this isn't really understood. Maybe it controls which key is being used? (1 -> disc key, 2 -> partition key)
/* +0x26 */ uint8 versionA;
/* +0x27 */ uint8 ukn027; // also a version field?
/* +0x26 */ uint8be versionA;
/* +0x27 */ uint8be ukn027; // also a version field?
// there is an array at +0x40 ? Related to H3 list. Also related to value at +0x0C and h3HashNum
/* +0x28 */ uint8be _uknOrPadding028[0x18];
/* +0x40 */ uint8be h3HashArray[32]; // dynamic size. Only present if fstHashType != 0
};
static_assert(sizeof(DiscPartitionHeader) == 0x28);
static_assert(sizeof(DiscPartitionHeader) == 0x40+0x20);
bool FSTVolume::FindDiscKey(const fs::path& path, NCrypto::AesKey& discTitleKey)
{
@ -269,7 +270,7 @@ FSTVolume* FSTVolume::OpenFromDiscImage(const fs::path& path, NCrypto::AesKey& d
cemuLog_log(LogType::Force, "Disc image rejected because decryption failed");
return nullptr;
}
if (partitionHeader->sectorSize != DISC_SECTOR_SIZE)
if (partitionHeader->blockSize != DISC_SECTOR_SIZE)
{
cemuLog_log(LogType::Force, "Disc image rejected because partition sector size is invalid");
return nullptr;
@ -336,6 +337,9 @@ FSTVolume* FSTVolume::OpenFromDiscImage(const fs::path& path, NCrypto::AesKey& d
cemu_assert_debug(partitionHeaderSI.fstEncryptionType == 1);
// todo - check other fields?
if(partitionHeaderSI.fstHashType == 0 && partitionHeaderSI.h3HashNum != 0)
cemuLog_log(LogType::Force, "FST: Partition uses unhashed blocks but stores a non-zero amount of H3 hashes");
// GM partition
DiscPartitionHeader partitionHeaderGM{};
if (!readPartitionHeader(partitionHeaderGM, gmPartitionIndex))
@ -349,9 +353,10 @@ FSTVolume* FSTVolume::OpenFromDiscImage(const fs::path& path, NCrypto::AesKey& d
// if decryption is necessary
// load SI FST
dataSource->SetBaseOffset((uint64)partitionArray[siPartitionIndex].partitionAddress * DISC_SECTOR_SIZE);
auto siFST = OpenFST(dataSource.get(), (uint64)partitionHeaderSI.fstSector * DISC_SECTOR_SIZE, partitionHeaderSI.fstSize, &discTitleKey, static_cast<FSTVolume::ClusterHashMode>(partitionHeaderSI.fstHashType));
auto siFST = OpenFST(dataSource.get(), (uint64)partitionHeaderSI.fstSector * DISC_SECTOR_SIZE, partitionHeaderSI.fstSize, &discTitleKey, static_cast<FSTVolume::ClusterHashMode>(partitionHeaderSI.fstHashType), nullptr);
if (!siFST)
return nullptr;
cemu_assert_debug(!(siFST->HashIsDisabled() && partitionHeaderSI.h3HashNum != 0)); // if hash is disabled, no H3 data may be present
// load ticket file for partition that we want to decrypt
NCrypto::ETicketParser ticketParser;
std::vector<uint8> ticketData = siFST->ExtractFile(fmt::format("{:02x}/title.tik", gmPartitionIndex));
@ -360,16 +365,32 @@ FSTVolume* FSTVolume::OpenFromDiscImage(const fs::path& path, NCrypto::AesKey& d
cemuLog_log(LogType::Force, "Disc image ticket file is invalid");
return nullptr;
}
#if 0
// each SI partition seems to contain a title.tmd that we could parse and which should have information about the associated GM partition
// but the console seems to ignore this file for disc images, at least when mounting, so we shouldn't rely on it either
std::vector<uint8> tmdData = siFST->ExtractFile(fmt::format("{:02x}/title.tmd", gmPartitionIndex));
if (tmdData.empty())
{
cemuLog_log(LogType::Force, "Disc image TMD file is missing");
return nullptr;
}
// parse TMD
NCrypto::TMDParser tmdParser;
if (!tmdParser.parse(tmdData.data(), tmdData.size()))
{
cemuLog_log(LogType::Force, "Disc image TMD file is invalid");
return nullptr;
}
#endif
delete siFST;
NCrypto::AesKey gmTitleKey;
ticketParser.GetTitleKey(gmTitleKey);
// load GM partition
dataSource->SetBaseOffset((uint64)partitionArray[gmPartitionIndex].partitionAddress * DISC_SECTOR_SIZE);
FSTVolume* r = OpenFST(std::move(dataSource), (uint64)partitionHeaderGM.fstSector * DISC_SECTOR_SIZE, partitionHeaderGM.fstSize, &gmTitleKey, static_cast<FSTVolume::ClusterHashMode>(partitionHeaderGM.fstHashType));
FSTVolume* r = OpenFST(std::move(dataSource), (uint64)partitionHeaderGM.fstSector * DISC_SECTOR_SIZE, partitionHeaderGM.fstSize, &gmTitleKey, static_cast<FSTVolume::ClusterHashMode>(partitionHeaderGM.fstHashType), nullptr);
if (r)
SET_FST_ERROR(OK);
cemu_assert_debug(!(r->HashIsDisabled() && partitionHeaderGM.h3HashNum != 0)); // if hash is disabled, no H3 data may be present
return r;
}
@ -426,15 +447,15 @@ FSTVolume* FSTVolume::OpenFromContentFolder(fs::path folderPath, ErrorCode* erro
}
// load FST
// fstSize = size of first cluster?
FSTVolume* fstVolume = FSTVolume::OpenFST(std::move(dataSource), 0, fstSize, &titleKey, fstHashMode);
FSTVolume* fstVolume = FSTVolume::OpenFST(std::move(dataSource), 0, fstSize, &titleKey, fstHashMode, &tmdParser);
if (fstVolume)
SET_FST_ERROR(OK);
return fstVolume;
}
FSTVolume* FSTVolume::OpenFST(FSTDataSource* dataSource, uint64 fstOffset, uint32 fstSize, NCrypto::AesKey* partitionTitleKey, ClusterHashMode fstHashMode)
FSTVolume* FSTVolume::OpenFST(FSTDataSource* dataSource, uint64 fstOffset, uint32 fstSize, NCrypto::AesKey* partitionTitleKey, ClusterHashMode fstHashMode, NCrypto::TMDParser* optionalTMD)
{
cemu_assert_debug(fstHashMode != ClusterHashMode::RAW || fstHashMode != ClusterHashMode::RAW2);
cemu_assert_debug(fstHashMode != ClusterHashMode::RAW || fstHashMode != ClusterHashMode::RAW_STREAM);
if (fstSize < sizeof(FSTHeader))
return nullptr;
constexpr uint64 FST_CLUSTER_OFFSET = 0;
@ -465,6 +486,34 @@ FSTVolume* FSTVolume::OpenFST(FSTDataSource* dataSource, uint64 fstOffset, uint3
clusterTable[i].offset = clusterDataTable[i].offset;
clusterTable[i].size = clusterDataTable[i].size;
clusterTable[i].hashMode = static_cast<FSTVolume::ClusterHashMode>((uint8)clusterDataTable[i].hashMode);
clusterTable[i].hasContentHash = false; // from the TMD file (H4?)
}
// if the TMD is available (when opening .app files) we can use the extra info from it to validate unhashed clusters
// each content entry in the TMD corresponds to one cluster used by the FST
if(optionalTMD)
{
if(numCluster != optionalTMD->GetContentList().size())
{
cemuLog_log(LogType::Force, "FST: Number of clusters does not match TMD content list");
return nullptr;
}
auto& contentList = optionalTMD->GetContentList();
for(size_t i=0; i<contentList.size(); i++)
{
auto& cluster = clusterTable[i];
auto& content = contentList[i];
cluster.hasContentHash = true;
cluster.contentHashIsSHA1 = HAS_FLAG(contentList[i].contentFlags, NCrypto::TMDParser::TMDContentFlags::FLAG_SHA1);
cluster.contentSize = content.size;
static_assert(sizeof(content.hash32) == sizeof(cluster.contentHash32));
memcpy(cluster.contentHash32, content.hash32, sizeof(cluster.contentHash32));
// if unhashed mode, then initialize the hash context
if(cluster.hashMode == ClusterHashMode::RAW || cluster.hashMode == ClusterHashMode::RAW_STREAM)
{
cluster.singleHashCtx.reset(EVP_MD_CTX_new());
EVP_DigestInit_ex(cluster.singleHashCtx.get(), cluster.contentHashIsSHA1 ? EVP_sha1() : EVP_sha256(), nullptr);
}
}
}
// preprocess FST table
FSTHeader_FileEntry* fileTable = (FSTHeader_FileEntry*)(clusterDataTable + numCluster);
@ -491,16 +540,17 @@ FSTVolume* FSTVolume::OpenFST(FSTDataSource* dataSource, uint64 fstOffset, uint3
fstVolume->m_offsetFactor = fstHeader->offsetFactor;
fstVolume->m_sectorSize = DISC_SECTOR_SIZE;
fstVolume->m_partitionTitlekey = *partitionTitleKey;
std::swap(fstVolume->m_cluster, clusterTable);
std::swap(fstVolume->m_entries, fstEntries);
std::swap(fstVolume->m_nameStringTable, nameStringTable);
fstVolume->m_hashIsDisabled = fstHeader->hashIsDisabled != 0;
fstVolume->m_cluster = std::move(clusterTable);
fstVolume->m_entries = std::move(fstEntries);
fstVolume->m_nameStringTable = std::move(nameStringTable);
return fstVolume;
}
FSTVolume* FSTVolume::OpenFST(std::unique_ptr<FSTDataSource> dataSource, uint64 fstOffset, uint32 fstSize, NCrypto::AesKey* partitionTitleKey, ClusterHashMode fstHashMode)
FSTVolume* FSTVolume::OpenFST(std::unique_ptr<FSTDataSource> dataSource, uint64 fstOffset, uint32 fstSize, NCrypto::AesKey* partitionTitleKey, ClusterHashMode fstHashMode, NCrypto::TMDParser* optionalTMD)
{
FSTDataSource* ds = dataSource.release();
FSTVolume* fstVolume = OpenFST(ds, fstOffset, fstSize, partitionTitleKey, fstHashMode);
FSTVolume* fstVolume = OpenFST(ds, fstOffset, fstSize, partitionTitleKey, fstHashMode, optionalTMD);
if (!fstVolume)
{
delete ds;
@ -757,7 +807,7 @@ uint32 FSTVolume::ReadFile(FSTFileHandle& fileHandle, uint32 offset, uint32 size
return 0;
cemu_assert_debug(!HAS_FLAG(entry.GetFlags(), FSTEntry::FLAGS::FLAG_LINK));
FSTCluster& cluster = m_cluster[entry.fileInfo.clusterIndex];
if (cluster.hashMode == ClusterHashMode::RAW || cluster.hashMode == ClusterHashMode::RAW2)
if (cluster.hashMode == ClusterHashMode::RAW || cluster.hashMode == ClusterHashMode::RAW_STREAM)
return ReadFile_HashModeRaw(entry.fileInfo.clusterIndex, entry, offset, size, dataOut);
else if (cluster.hashMode == ClusterHashMode::HASH_INTERLEAVED)
return ReadFile_HashModeHashed(entry.fileInfo.clusterIndex, entry, offset, size, dataOut);
@ -765,87 +815,15 @@ uint32 FSTVolume::ReadFile(FSTFileHandle& fileHandle, uint32 offset, uint32 size
return 0;
}
uint32 FSTVolume::ReadFile_HashModeRaw(uint32 clusterIndex, FSTEntry& entry, uint32 readOffset, uint32 readSize, void* dataOut)
{
const uint32 readSizeInput = readSize;
uint8* dataOutU8 = (uint8*)dataOut;
if (readOffset >= entry.fileInfo.fileSize)
return 0;
else if ((readOffset + readSize) >= entry.fileInfo.fileSize)
readSize = (entry.fileInfo.fileSize - readOffset);
const FSTCluster& cluster = m_cluster[clusterIndex];
uint64 clusterOffset = (uint64)cluster.offset * m_sectorSize;
uint64 absFileOffset = entry.fileInfo.fileOffset * m_offsetFactor + readOffset;
// make sure the raw range we read is aligned to AES block size (16)
uint64 readAddrStart = absFileOffset & ~0xF;
uint64 readAddrEnd = (absFileOffset + readSize + 0xF) & ~0xF;
bool usesInitialIV = readOffset < 16;
if (!usesInitialIV)
readAddrStart -= 16; // read previous AES block since we require it for the IV
uint32 prePadding = (uint32)(absFileOffset - readAddrStart); // number of extra bytes we read before readOffset (for AES alignment and IV calculation)
uint32 postPadding = (uint32)(readAddrEnd - (absFileOffset + readSize));
uint8 readBuffer[64 * 1024];
// read first chunk
// if file read offset (readOffset) is within the first AES-block then use initial IV calculated from cluster index
// otherwise read previous AES-block is the IV (AES-CBC)
uint64 readAddrCurrent = readAddrStart;
uint32 rawBytesToRead = (uint32)std::min((readAddrEnd - readAddrStart), (uint64)sizeof(readBuffer));
if (m_dataSource->readData(clusterIndex, clusterOffset, readAddrCurrent, readBuffer, rawBytesToRead) != rawBytesToRead)
{
cemuLog_log(LogType::Force, "FST read error in raw content");
return 0;
}
readAddrCurrent += rawBytesToRead;
uint8 iv[16]{};
if (usesInitialIV)
{
// for the first AES block, the IV is initialized from cluster index
iv[0] = (uint8)(clusterIndex >> 8);
iv[1] = (uint8)(clusterIndex >> 0);
AES128_CBC_decrypt_updateIV(readBuffer, readBuffer, rawBytesToRead, m_partitionTitlekey.b, iv);
std::memcpy(dataOutU8, readBuffer + prePadding, rawBytesToRead - prePadding - postPadding);
dataOutU8 += (rawBytesToRead - prePadding - postPadding);
readSize -= (rawBytesToRead - prePadding - postPadding);
}
else
{
// IV is initialized from previous AES block (AES-CBC)
std::memcpy(iv, readBuffer, 16);
AES128_CBC_decrypt_updateIV(readBuffer + 16, readBuffer + 16, rawBytesToRead - 16, m_partitionTitlekey.b, iv);
std::memcpy(dataOutU8, readBuffer + prePadding, rawBytesToRead - prePadding - postPadding);
dataOutU8 += (rawBytesToRead - prePadding - postPadding);
readSize -= (rawBytesToRead - prePadding - postPadding);
}
// read remaining chunks
while (readSize > 0)
{
uint32 bytesToRead = (uint32)std::min((uint32)sizeof(readBuffer), readSize);
uint32 alignedBytesToRead = (bytesToRead + 15) & ~0xF;
if (m_dataSource->readData(clusterIndex, clusterOffset, readAddrCurrent, readBuffer, alignedBytesToRead) != alignedBytesToRead)
{
cemuLog_log(LogType::Force, "FST read error in raw content");
return 0;
}
AES128_CBC_decrypt_updateIV(readBuffer, readBuffer, alignedBytesToRead, m_partitionTitlekey.b, iv);
std::memcpy(dataOutU8, readBuffer, bytesToRead);
dataOutU8 += bytesToRead;
readSize -= bytesToRead;
readAddrCurrent += alignedBytesToRead;
}
return readSizeInput - readSize;
}
constexpr size_t BLOCK_SIZE = 0x10000;
constexpr size_t BLOCK_HASH_SIZE = 0x0400;
constexpr size_t BLOCK_FILE_SIZE = 0xFC00;
struct FSTRawBlock
{
std::vector<uint8> rawData; // unhashed block size depends on sector size field in partition header
};
struct FSTHashedBlock
{
uint8 rawData[BLOCK_SIZE];
@ -887,12 +865,160 @@ struct FSTHashedBlock
static_assert(sizeof(FSTHashedBlock) == BLOCK_SIZE);
struct FSTCachedRawBlock
{
FSTRawBlock blockData;
uint8 ivForNextBlock[16];
uint64 lastAccess;
};
struct FSTCachedHashedBlock
{
FSTHashedBlock blockData;
uint64 lastAccess;
};
// Checks cache fill state and if necessary drops least recently accessed block from the cache. Optionally allows to recycle the released cache entry to cut down cost of memory allocation and clearing
void FSTVolume::TrimCacheIfRequired(FSTCachedRawBlock** droppedRawBlock, FSTCachedHashedBlock** droppedHashedBlock)
{
// calculate size used by cache
size_t cacheSize = 0;
for (auto& itr : m_cacheDecryptedRawBlocks)
cacheSize += itr.second->blockData.rawData.size();
for (auto& itr : m_cacheDecryptedHashedBlocks)
cacheSize += sizeof(FSTCachedHashedBlock) + sizeof(FSTHashedBlock);
// only trim if cache is full (larger than 2MB)
if (cacheSize < 2*1024*1024) // 2MB
return;
// scan both cache lists to find least recently accessed block to drop
auto dropRawItr = std::min_element(m_cacheDecryptedRawBlocks.begin(), m_cacheDecryptedRawBlocks.end(), [](const auto& a, const auto& b) -> bool
{ return a.second->lastAccess < b.second->lastAccess; });
auto dropHashedItr = std::min_element(m_cacheDecryptedHashedBlocks.begin(), m_cacheDecryptedHashedBlocks.end(), [](const auto& a, const auto& b) -> bool
{ return a.second->lastAccess < b.second->lastAccess; });
uint64 lastAccess = std::numeric_limits<uint64>::max();
if(dropRawItr != m_cacheDecryptedRawBlocks.end())
lastAccess = dropRawItr->second->lastAccess;
if(dropHashedItr != m_cacheDecryptedHashedBlocks.end())
lastAccess = std::min<uint64>(lastAccess, dropHashedItr->second->lastAccess);
if(dropRawItr != m_cacheDecryptedRawBlocks.end() && dropRawItr->second->lastAccess == lastAccess)
{
if (droppedRawBlock)
*droppedRawBlock = dropRawItr->second;
else
delete dropRawItr->second;
m_cacheDecryptedRawBlocks.erase(dropRawItr);
return;
}
else if(dropHashedItr != m_cacheDecryptedHashedBlocks.end() && dropHashedItr->second->lastAccess == lastAccess)
{
if (droppedHashedBlock)
*droppedHashedBlock = dropHashedItr->second;
else
delete dropHashedItr->second;
m_cacheDecryptedHashedBlocks.erase(dropHashedItr);
}
}
void FSTVolume::DetermineUnhashedBlockIV(uint32 clusterIndex, uint32 blockIndex, uint8 ivOut[16])
{
memset(ivOut, 0, sizeof(ivOut));
if(blockIndex == 0)
{
ivOut[0] = (uint8)(clusterIndex >> 8);
ivOut[1] = (uint8)(clusterIndex >> 0);
}
else
{
// the last 16 encrypted bytes of the previous block are the IV (AES CBC)
// if the previous block is cached we can grab the IV from there. Otherwise we have to read the 16 bytes from the data source
uint32 prevBlockIndex = blockIndex - 1;
uint64 cacheBlockId = ((uint64)clusterIndex << (64 - 16)) | (uint64)prevBlockIndex;
auto itr = m_cacheDecryptedRawBlocks.find(cacheBlockId);
if (itr != m_cacheDecryptedRawBlocks.end())
{
memcpy(ivOut, itr->second->ivForNextBlock, 16);
}
else
{
cemu_assert(m_sectorSize >= 16);
uint64 clusterOffset = (uint64)m_cluster[clusterIndex].offset * m_sectorSize;
uint8 prevIV[16];
if (m_dataSource->readData(clusterIndex, clusterOffset, blockIndex * m_sectorSize - 16, prevIV, 16) != 16)
{
cemuLog_log(LogType::Force, "Failed to read IV for raw FST block");
m_detectedCorruption = true;
return;
}
memcpy(ivOut, prevIV, 16);
}
}
}
FSTCachedRawBlock* FSTVolume::GetDecryptedRawBlock(uint32 clusterIndex, uint32 blockIndex)
{
FSTCluster& cluster = m_cluster[clusterIndex];
uint64 clusterOffset = (uint64)cluster.offset * m_sectorSize;
// generate id for cache
uint64 cacheBlockId = ((uint64)clusterIndex << (64 - 16)) | (uint64)blockIndex;
// lookup block in cache
FSTCachedRawBlock* block = nullptr;
auto itr = m_cacheDecryptedRawBlocks.find(cacheBlockId);
if (itr != m_cacheDecryptedRawBlocks.end())
{
block = itr->second;
block->lastAccess = ++m_cacheAccessCounter;
return block;
}
// if cache already full, drop least recently accessed block and recycle FSTCachedRawBlock object if possible
TrimCacheIfRequired(&block, nullptr);
if (!block)
block = new FSTCachedRawBlock();
block->blockData.rawData.resize(m_sectorSize);
// block not cached, read new
block->lastAccess = ++m_cacheAccessCounter;
if (m_dataSource->readData(clusterIndex, clusterOffset, blockIndex * m_sectorSize, block->blockData.rawData.data(), m_sectorSize) != m_sectorSize)
{
cemuLog_log(LogType::Force, "Failed to read raw FST block");
delete block;
m_detectedCorruption = true;
return nullptr;
}
// decrypt hash data
uint8 iv[16]{};
DetermineUnhashedBlockIV(clusterIndex, blockIndex, iv);
memcpy(block->ivForNextBlock, block->blockData.rawData.data() + m_sectorSize - 16, 16);
AES128_CBC_decrypt(block->blockData.rawData.data(), block->blockData.rawData.data(), m_sectorSize, m_partitionTitlekey.b, iv);
// if this is the next block, then hash it
if(cluster.hasContentHash)
{
if(cluster.singleHashNumBlocksHashed == blockIndex)
{
cemu_assert_debug(!(cluster.contentSize % m_sectorSize)); // size should be multiple of sector size? Regardless, the hashing code below can handle non-aligned sizes
bool isLastBlock = blockIndex == (std::max<uint32>(cluster.contentSize / m_sectorSize, 1) - 1);
uint32 hashSize = m_sectorSize;
if(isLastBlock)
hashSize = cluster.contentSize - (uint64)blockIndex*m_sectorSize;
EVP_DigestUpdate(cluster.singleHashCtx.get(), block->blockData.rawData.data(), hashSize);
cluster.singleHashNumBlocksHashed++;
if(isLastBlock)
{
uint8 hash[32];
EVP_DigestFinal_ex(cluster.singleHashCtx.get(), hash, nullptr);
if(memcmp(hash, cluster.contentHash32, cluster.contentHashIsSHA1 ? 20 : 32) != 0)
{
cemuLog_log(LogType::Force, "FST: Raw section hash mismatch");
delete block;
m_detectedCorruption = true;
return nullptr;
}
}
}
}
// register in cache
m_cacheDecryptedRawBlocks.emplace(cacheBlockId, block);
return block;
}
FSTCachedHashedBlock* FSTVolume::GetDecryptedHashedBlock(uint32 clusterIndex, uint32 blockIndex)
{
const FSTCluster& cluster = m_cluster[clusterIndex];
@ -908,22 +1034,17 @@ FSTCachedHashedBlock* FSTVolume::GetDecryptedHashedBlock(uint32 clusterIndex, ui
block->lastAccess = ++m_cacheAccessCounter;
return block;
}
// if cache already full, drop least recently accessed block (but recycle the FSTHashedBlock* object)
if (m_cacheDecryptedHashedBlocks.size() >= 16)
{
auto dropItr = std::min_element(m_cacheDecryptedHashedBlocks.begin(), m_cacheDecryptedHashedBlocks.end(), [](const auto& a, const auto& b) -> bool
{ return a.second->lastAccess < b.second->lastAccess; });
block = dropItr->second;
m_cacheDecryptedHashedBlocks.erase(dropItr);
}
else
// if cache already full, drop least recently accessed block and recycle FSTCachedHashedBlock object if possible
TrimCacheIfRequired(nullptr, &block);
if (!block)
block = new FSTCachedHashedBlock();
// block not cached, read new
block->lastAccess = ++m_cacheAccessCounter;
if (m_dataSource->readData(clusterIndex, clusterOffset, blockIndex * BLOCK_SIZE, block->blockData.rawData, BLOCK_SIZE) != BLOCK_SIZE)
{
cemuLog_log(LogType::Force, "Failed to read FST block");
cemuLog_log(LogType::Force, "Failed to read hashed FST block");
delete block;
m_detectedCorruption = true;
return nullptr;
}
// decrypt hash data
@ -931,11 +1052,46 @@ FSTCachedHashedBlock* FSTVolume::GetDecryptedHashedBlock(uint32 clusterIndex, ui
AES128_CBC_decrypt(block->blockData.getHashData(), block->blockData.getHashData(), BLOCK_HASH_SIZE, m_partitionTitlekey.b, iv);
// decrypt file data
AES128_CBC_decrypt(block->blockData.getFileData(), block->blockData.getFileData(), BLOCK_FILE_SIZE, m_partitionTitlekey.b, block->blockData.getH0Hash(blockIndex%16));
// compare with H0 to verify data integrity
NCrypto::CHash160 h0;
SHA1(block->blockData.getFileData(), BLOCK_FILE_SIZE, h0.b);
uint32 h0Index = (blockIndex % 4096);
if (memcmp(h0.b, block->blockData.getH0Hash(h0Index & 0xF), sizeof(h0.b)) != 0)
{
cemuLog_log(LogType::Force, "FST: Hash H0 mismatch in hashed block (section {} index {})", clusterIndex, blockIndex);
delete block;
m_detectedCorruption = true;
return nullptr;
}
// register in cache
m_cacheDecryptedHashedBlocks.emplace(cacheBlockId, block);
return block;
}
uint32 FSTVolume::ReadFile_HashModeRaw(uint32 clusterIndex, FSTEntry& entry, uint32 readOffset, uint32 readSize, void* dataOut)
{
uint8* dataOutU8 = (uint8*)dataOut;
if (readOffset >= entry.fileInfo.fileSize)
return 0;
else if ((readOffset + readSize) >= entry.fileInfo.fileSize)
readSize = (entry.fileInfo.fileSize - readOffset);
uint64 absFileOffset = entry.fileInfo.fileOffset * m_offsetFactor + readOffset;
uint32 remainingReadSize = readSize;
while (remainingReadSize > 0)
{
const FSTCachedRawBlock* rawBlock = this->GetDecryptedRawBlock(clusterIndex, absFileOffset/m_sectorSize);
if (!rawBlock)
break;
uint32 blockOffset = (uint32)(absFileOffset % m_sectorSize);
uint32 bytesToRead = std::min<uint32>(remainingReadSize, m_sectorSize - blockOffset);
std::memcpy(dataOutU8, rawBlock->blockData.rawData.data() + blockOffset, bytesToRead);
dataOutU8 += bytesToRead;
remainingReadSize -= bytesToRead;
absFileOffset += bytesToRead;
}
return readSize - remainingReadSize;
}
uint32 FSTVolume::ReadFile_HashModeHashed(uint32 clusterIndex, FSTEntry& entry, uint32 readOffset, uint32 readSize, void* dataOut)
{
/*
@ -966,7 +1122,6 @@ uint32 FSTVolume::ReadFile_HashModeHashed(uint32 clusterIndex, FSTEntry& entry,
*/
const FSTCluster& cluster = m_cluster[clusterIndex];
uint64 clusterBaseOffset = (uint64)cluster.offset * m_sectorSize;
uint64 fileReadOffset = entry.fileInfo.fileOffset * m_offsetFactor + readOffset;
uint32 blockIndex = (uint32)(fileReadOffset / BLOCK_FILE_SIZE);
uint32 bytesRemaining = readSize;
@ -1019,6 +1174,8 @@ bool FSTVolume::Next(FSTDirectoryIterator& directoryIterator, FSTFileHandle& fil
FSTVolume::~FSTVolume()
{
for (auto& itr : m_cacheDecryptedRawBlocks)
delete itr.second;
for (auto& itr : m_cacheDecryptedHashedBlocks)
delete itr.second;
if (m_sourceIsOwned)
@ -1115,4 +1272,4 @@ bool FSTVerifier::VerifyHashedContentFile(FileStream* fileContent, const NCrypto
void FSTVolumeTest()
{
FSTPathUnitTest();
}
}

View file

@ -1,5 +1,6 @@
#pragma once
#include "Cemu/ncrypto/ncrypto.h"
#include "openssl/evp.h"
struct FSTFileHandle
{
@ -45,6 +46,7 @@ public:
~FSTVolume();
uint32 GetFileCount() const;
bool HasCorruption() const { return m_detectedCorruption; }
bool OpenFile(std::string_view path, FSTFileHandle& fileHandleOut, bool openOnlyFiles = false);
@ -86,15 +88,25 @@ private:
enum class ClusterHashMode : uint8
{
RAW = 0, // raw data + encryption, no hashing?
RAW2 = 1, // raw data + encryption, with hash stored in tmd?
RAW_STREAM = 1, // raw data + encryption, with hash stored in tmd?
HASH_INTERLEAVED = 2, // hashes + raw interleaved in 0x10000 blocks (0x400 bytes of hashes at the beginning, followed by 0xFC00 bytes of data)
};
struct FSTCluster
{
FSTCluster() : singleHashCtx(nullptr, &EVP_MD_CTX_free) {}
uint32 offset;
uint32 size;
ClusterHashMode hashMode;
// extra data if TMD is available
bool hasContentHash;
uint8 contentHash32[32];
bool contentHashIsSHA1; // if true then it's SHA1 (with extra bytes zeroed out), otherwise it's SHA256
uint64 contentSize; // size of the content in bytes
// hash context for single hash mode (content hash must be available)
std::unique_ptr<EVP_MD_CTX, decltype(&EVP_MD_CTX_free)> singleHashCtx; // unique_ptr to make this move-only
uint32 singleHashNumBlocksHashed{0};
};
struct FSTEntry
@ -164,17 +176,30 @@ private:
bool m_sourceIsOwned{};
uint32 m_sectorSize{}; // for cluster offsets
uint32 m_offsetFactor{}; // for file offsets
bool m_hashIsDisabled{}; // disables hash verification (for all clusters of this volume?)
std::vector<FSTCluster> m_cluster;
std::vector<FSTEntry> m_entries;
std::vector<char> m_nameStringTable;
NCrypto::AesKey m_partitionTitlekey;
bool m_detectedCorruption{false};
/* Cache for decrypted hashed blocks */
bool HashIsDisabled() const
{
return m_hashIsDisabled;
}
/* Cache for decrypted raw and hashed blocks */
std::unordered_map<uint64, struct FSTCachedRawBlock*> m_cacheDecryptedRawBlocks;
std::unordered_map<uint64, struct FSTCachedHashedBlock*> m_cacheDecryptedHashedBlocks;
uint64 m_cacheAccessCounter{};
void DetermineUnhashedBlockIV(uint32 clusterIndex, uint32 blockIndex, uint8 ivOut[16]);
struct FSTCachedRawBlock* GetDecryptedRawBlock(uint32 clusterIndex, uint32 blockIndex);
struct FSTCachedHashedBlock* GetDecryptedHashedBlock(uint32 clusterIndex, uint32 blockIndex);
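// drops the least recently accessed block when either cache grows past its limit; the evicted object can be handed back through the out parameters for reuse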
void TrimCacheIfRequired(struct FSTCachedRawBlock** droppedRawBlock, struct FSTCachedHashedBlock** droppedHashedBlock);
/* File reading */
uint32 ReadFile_HashModeRaw(uint32 clusterIndex, FSTEntry& entry, uint32 readOffset, uint32 readSize, void* dataOut);
uint32 ReadFile_HashModeHashed(uint32 clusterIndex, FSTEntry& entry, uint32 readOffset, uint32 readSize, void* dataOut);
@ -185,7 +210,10 @@ private:
/* +0x00 */ uint32be magic;
/* +0x04 */ uint32be offsetFactor;
/* +0x08 */ uint32be numCluster;
/* +0x0C */ uint32be ukn0C;
/* +0x0C */ uint8be hashIsDisabled;
/* +0x0D */ uint8be ukn0D;
/* +0x0E */ uint8be ukn0E;
/* +0x0F */ uint8be ukn0F;
/* +0x10 */ uint32be ukn10;
/* +0x14 */ uint32be ukn14;
/* +0x18 */ uint32be ukn18;
@ -262,8 +290,8 @@ private:
static_assert(sizeof(FSTHeader_FileEntry) == 0x10);
static FSTVolume* OpenFST(FSTDataSource* dataSource, uint64 fstOffset, uint32 fstSize, NCrypto::AesKey* partitionTitleKey, ClusterHashMode fstHashMode);
static FSTVolume* OpenFST(std::unique_ptr<FSTDataSource> dataSource, uint64 fstOffset, uint32 fstSize, NCrypto::AesKey* partitionTitleKey, ClusterHashMode fstHashMode);
static FSTVolume* OpenFST(FSTDataSource* dataSource, uint64 fstOffset, uint32 fstSize, NCrypto::AesKey* partitionTitleKey, ClusterHashMode fstHashMode, NCrypto::TMDParser* optionalTMD);
static FSTVolume* OpenFST(std::unique_ptr<FSTDataSource> dataSource, uint64 fstOffset, uint32 fstSize, NCrypto::AesKey* partitionTitleKey, ClusterHashMode fstHashMode, NCrypto::TMDParser* optionalTMD);
static bool ProcessFST(FSTHeader_FileEntry* fileTable, uint32 numFileEntries, uint32 numCluster, std::vector<char>& nameStringTable, std::vector<FSTEntry>& fstEntries);
bool MatchFSTEntryName(FSTEntry& entry, std::string_view comparedName)

View file

@ -140,7 +140,7 @@ bool gameProfile_loadEnumOption(IniParser& iniParser, const char* optionName, T&
for(const T& v : T())
{
// test integer option
if (boost::iequals(fmt::format("{}", static_cast<typename std::underlying_type<T>::type>(v)), *option_value))
if (boost::iequals(fmt::format("{}", fmt::underlying(v)), *option_value))
{
option = v;
return true;

View file

@ -345,7 +345,7 @@ GraphicPack2::GraphicPack2(fs::path rulesPath, IniParser& rules)
const auto preset_name = rules.FindOption("name");
if (!preset_name)
{
cemuLog_log(LogType::Force, "Graphic pack \"{}\": Preset in line {} skipped because it has no name option defined", m_name, rules.GetCurrentSectionLineNumber());
cemuLog_log(LogType::Force, "Graphic pack \"{}\": Preset in line {} skipped because it has no name option defined", GetNormalizedPathString(), rules.GetCurrentSectionLineNumber());
continue;
}
@ -369,7 +369,7 @@ GraphicPack2::GraphicPack2(fs::path rulesPath, IniParser& rules)
}
catch (const std::exception & ex)
{
cemuLog_log(LogType::Force, "Graphic pack \"{}\": Can't parse preset \"{}\": {}", m_name, *preset_name, ex.what());
cemuLog_log(LogType::Force, "Graphic pack \"{}\": Can't parse preset \"{}\": {}", GetNormalizedPathString(), *preset_name, ex.what());
}
}
else if (boost::iequals(currentSectionName, "RAM"))
@ -383,7 +383,7 @@ GraphicPack2::GraphicPack2(fs::path rulesPath, IniParser& rules)
{
if (m_version <= 5)
{
cemuLog_log(LogType::Force, "Graphic pack \"{}\": [RAM] options are only available for graphic pack version 6 or higher", m_name, optionNameBuf);
cemuLog_log(LogType::Force, "Graphic pack \"{}\": [RAM] options are only available for graphic pack version 6 or higher", GetNormalizedPathString(), optionNameBuf);
throw std::exception();
}
@ -393,12 +393,12 @@ GraphicPack2::GraphicPack2(fs::path rulesPath, IniParser& rules)
{
if (addrEnd <= addrStart)
{
cemuLog_log(LogType::Force, "Graphic pack \"{}\": start address (0x{:08x}) must be greater than end address (0x{:08x}) for {}", m_name, addrStart, addrEnd, optionNameBuf);
cemuLog_log(LogType::Force, "Graphic pack \"{}\": start address (0x{:08x}) must be less than end address (0x{:08x}) for {}", GetNormalizedPathString(), addrStart, addrEnd, optionNameBuf);
throw std::exception();
}
else if ((addrStart & 0xFFF) != 0 || (addrEnd & 0xFFF) != 0)
{
cemuLog_log(LogType::Force, "Graphic pack \"{}\": addresses for %s are not aligned to 0x1000", m_name, optionNameBuf);
cemuLog_log(LogType::Force, "Graphic pack \"{}\": addresses for {} are not aligned to 0x1000", GetNormalizedPathString(), optionNameBuf);
throw std::exception();
}
else
@ -408,7 +408,7 @@ GraphicPack2::GraphicPack2(fs::path rulesPath, IniParser& rules)
}
else
{
cemuLog_log(LogType::Force, "Graphic pack \"{}\": has invalid syntax for option {}", m_name, optionNameBuf);
cemuLog_log(LogType::Force, "Graphic pack \"{}\": has invalid syntax for option {}", GetNormalizedPathString(), optionNameBuf);
throw std::exception();
}
}
@ -422,24 +422,32 @@ GraphicPack2::GraphicPack2(fs::path rulesPath, IniParser& rules)
std::unordered_map<std::string, std::vector<PresetPtr>> tmp_map;
// all vars must be defined in the default preset vars before
for (const auto& entry : m_presets)
std::vector<std::pair<std::string, std::string>> mismatchingPresetVars;
for (const auto& presetEntry : m_presets)
{
tmp_map[entry->category].emplace_back(entry);
tmp_map[presetEntry->category].emplace_back(presetEntry);
for (auto& kv : entry->variables)
for (auto& presetVar : presetEntry->variables)
{
const auto it = m_preset_vars.find(kv.first);
const auto it = m_preset_vars.find(presetVar.first);
if (it == m_preset_vars.cend())
{
cemuLog_log(LogType::Force, "Graphic pack: \"{}\" contains preset variables which are not defined in the default section", m_name);
throw std::exception();
mismatchingPresetVars.emplace_back(presetEntry->name, presetVar.first);
continue;
}
// overwrite var type with default var type
kv.second.first = it->second.first;
presetVar.second.first = it->second.first;
}
}
if(!mismatchingPresetVars.empty())
{
cemuLog_log(LogType::Force, "Graphic pack \"{}\" contains preset variables which are not defined in the [Default] section:", GetNormalizedPathString());
for (const auto& [presetName, varName] : mismatchingPresetVars)
cemuLog_log(LogType::Force, "Preset: {} Variable: {}", presetName, varName);
throw std::exception();
}
// have first entry be default active for every category if no default= is set
for(auto entry : get_values(tmp_map))
{
@ -469,7 +477,7 @@ GraphicPack2::GraphicPack2(fs::path rulesPath, IniParser& rules)
auto& p2 = kv.second[i + 1];
if (p1->variables.size() != p2->variables.size())
{
cemuLog_log(LogType::Force, "Graphic pack: \"{}\" contains inconsistent preset variables", m_name);
cemuLog_log(LogType::Force, "Graphic pack: \"{}\" contains inconsistent preset variables", GetNormalizedPathString());
throw std::exception();
}
@ -477,14 +485,14 @@ GraphicPack2::GraphicPack2(fs::path rulesPath, IniParser& rules)
std::set<std::string> keys2(get_keys(p2->variables).begin(), get_keys(p2->variables).end());
if (keys1 != keys2)
{
cemuLog_log(LogType::Force, "Graphic pack: \"{}\" contains inconsistent preset variables", m_name);
cemuLog_log(LogType::Force, "Graphic pack: \"{}\" contains inconsistent preset variables", GetNormalizedPathString());
throw std::exception();
}
if(p1->is_default)
{
if(has_default)
cemuLog_log(LogType::Force, "Graphic pack: \"{}\" has more than one preset with the default key set for the same category \"{}\"", m_name, p1->name);
cemuLog_log(LogType::Force, "Graphic pack: \"{}\" has more than one preset with the default key set for the same category \"{}\"", GetNormalizedPathString(), p1->name);
p1->active = true;
has_default = true;
}
@ -960,7 +968,7 @@ bool GraphicPack2::Activate()
auto option_upscale = rules.FindOption("upscaleMagFilter");
if(option_upscale && boost::iequals(*option_upscale, "NearestNeighbor"))
m_output_settings.upscale_filter = LatteTextureView::MagFilter::kNearestNeighbor;
auto option_downscale = rules.FindOption("NearestNeighbor");
auto option_downscale = rules.FindOption("downscaleMinFilter");
if (option_downscale && boost::iequals(*option_downscale, "NearestNeighbor"))
m_output_settings.downscale_filter = LatteTextureView::MagFilter::kNearestNeighbor;
}

View file

@ -8,6 +8,7 @@
#include "gui/debugger/DebuggerWindow2.h"
#include "Cafe/OS/libs/coreinit/coreinit.h"
#include "util/helpers/helpers.h"
#if BOOST_OS_WINDOWS
#include <Windows.h>
@ -136,11 +137,6 @@ void debugger_createCodeBreakpoint(uint32 address, uint8 bpType)
debugger_updateExecutionBreakpoint(address);
}
void debugger_createExecuteBreakpoint(uint32 address)
{
debugger_createCodeBreakpoint(address, DEBUGGER_BP_T_NORMAL);
}
namespace coreinit
{
std::vector<std::thread::native_handle_type>& OSGetSchedulerThreads();
@ -294,8 +290,23 @@ void debugger_toggleExecuteBreakpoint(uint32 address)
}
else
{
// create new breakpoint
debugger_createExecuteBreakpoint(address);
// create new execution breakpoint
debugger_createCodeBreakpoint(address, DEBUGGER_BP_T_NORMAL);
}
}
void debugger_toggleLoggingBreakpoint(uint32 address)
{
auto existingBP = debugger_getFirstBP(address, DEBUGGER_BP_T_LOGGING);
if (existingBP)
{
// delete existing breakpoint
debugger_deleteBreakpoint(existingBP);
}
else
{
// create new logging breakpoint
debugger_createCodeBreakpoint(address, DEBUGGER_BP_T_LOGGING);
}
}
@ -447,6 +458,34 @@ bool debugger_hasPatch(uint32 address)
return false;
}
void debugger_removePatch(uint32 address)
{
for (sint32 i = 0; i < debuggerState.patches.size(); i++)
{
auto& patch = debuggerState.patches[i];
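// skip patches that do not contain the requested address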
if (address < patch->address || address >= (patch->address + patch->length))
continue;
MPTR startAddress = patch->address;
MPTR endAddress = patch->address + patch->length;
// remove any breakpoints overlapping with the patch
for (auto& bp : debuggerState.breakpoints)
{
if (bp->address + 4 > startAddress && bp->address < endAddress)
{
bp->enabled = false;
debugger_updateExecutionBreakpoint(bp->address);
}
}
// restore original data
memcpy(MEMPTR<void>(startAddress).GetPtr(), patch->origData.data(), patch->length);
PPCRecompiler_invalidateRange(startAddress, endAddress);
// remove patch
delete patch;
debuggerState.patches.erase(debuggerState.patches.begin() + i);
return;
}
}
void debugger_stepInto(PPCInterpreter_t* hCPU, bool updateDebuggerWindow = true)
{
bool isRecEnabled = ppcRecompilerEnabled;
@ -510,7 +549,48 @@ void debugger_enterTW(PPCInterpreter_t* hCPU)
{
if (bp->bpType == DEBUGGER_BP_T_LOGGING && bp->enabled)
{
std::string logName = !bp->comment.empty() ? "Breakpoint '"+boost::nowide::narrow(bp->comment)+"'" : fmt::format("Breakpoint at 0x{:08X} (no comment)", bp->address);
std::string comment = !bp->comment.empty() ? boost::nowide::narrow(bp->comment) : fmt::format("Breakpoint at 0x{:08X} (no comment)", bp->address);
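// placeholders of the form {rN} and {fN} (N = 0..31) in the comment are expanded below to the current GPR value (8-digit hex) or FPR value (plain floating point)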
auto replacePlaceholders = [&](const std::string& prefix, const auto& formatFunc)
{
size_t pos = 0;
while ((pos = comment.find(prefix, pos)) != std::string::npos)
{
size_t endPos = comment.find('}', pos);
if (endPos == std::string::npos)
break;
try
{
if (int regNum = ConvertString<int>(comment.substr(pos + prefix.length(), endPos - pos - prefix.length())); regNum >= 0 && regNum < 32)
{
std::string replacement = formatFunc(regNum);
comment.replace(pos, endPos - pos + 1, replacement);
pos += replacement.length();
}
else
{
pos = endPos + 1;
}
}
catch (...)
{
pos = endPos + 1;
}
}
};
// Replace integer register placeholders {rX}
replacePlaceholders("{r", [&](int regNum) {
return fmt::format("0x{:08X}", hCPU->gpr[regNum]);
});
// Replace floating point register placeholders {fX}
replacePlaceholders("{f", [&](int regNum) {
return fmt::format("{}", hCPU->fpr[regNum].fpr);
});
std::string logName = "Breakpoint '" + comment + "'";
std::string logContext = fmt::format("Thread: {:08x} LR: 0x{:08x}{}", MEMPTR<OSThread_t>(coreinit::OSGetCurrentThread()).GetMPTR(), hCPU->spr.LR, cemuLog_advancedPPCLoggingEnabled() ? " Stack Trace:" : "");
cemuLog_log(LogType::Force, "[Debugger] {} was executed! {}", logName, logContext);
if (cemuLog_advancedPPCLoggingEnabled())
@ -547,7 +627,7 @@ void debugger_enterTW(PPCInterpreter_t* hCPU)
debuggerState.debugSession.stepInto = false;
debuggerState.debugSession.stepOver = false;
debuggerState.debugSession.run = false;
while (true)
while (debuggerState.debugSession.isTrapped)
{
std::this_thread::sleep_for(std::chrono::milliseconds(1));
// check for step commands

View file

@ -100,8 +100,8 @@ extern debuggerState_t debuggerState;
// new API
DebuggerBreakpoint* debugger_getFirstBP(uint32 address);
void debugger_createCodeBreakpoint(uint32 address, uint8 bpType);
void debugger_createExecuteBreakpoint(uint32 address);
void debugger_toggleExecuteBreakpoint(uint32 address); // create/remove execute breakpoint
void debugger_toggleLoggingBreakpoint(uint32 address); // create/remove logging breakpoint
void debugger_toggleBreakpoint(uint32 address, bool state, DebuggerBreakpoint* bp);
void debugger_createMemoryBreakpoint(uint32 address, bool onRead, bool onWrite);
@ -114,6 +114,7 @@ void debugger_updateExecutionBreakpoint(uint32 address, bool forceRestore = fals
void debugger_createPatch(uint32 address, std::span<uint8> patchData);
bool debugger_hasPatch(uint32 address);
void debugger_removePatch(uint32 address);
void debugger_forceBreak(); // force breakpoint at the next possible instruction
bool debugger_isTrapped();

View file

@ -91,13 +91,15 @@ namespace Espresso
BCCTR = 528
};
enum class OPCODE_31
enum class Opcode31
{
TW = 4,
MFTB = 371,
};
inline PrimaryOpcode GetPrimaryOpcode(uint32 opcode) { return (PrimaryOpcode)(opcode >> 26); };
inline Opcode19 GetGroup19Opcode(uint32 opcode) { return (Opcode19)((opcode >> 1) & 0x3FF); };
inline Opcode31 GetGroup31Opcode(uint32 opcode) { return (Opcode31)((opcode >> 1) & 0x3FF); };
struct BOField
{
@ -132,6 +134,12 @@ namespace Espresso
uint8 bo;
};
// returns true if LK bit is set, only valid for branch instructions
inline bool DecodeLK(uint32 opcode)
{
return (opcode & 1) != 0;
}
inline void _decodeForm_I(uint32 opcode, uint32& LI, bool& AA, bool& LK)
{
LI = opcode & 0x3fffffc;
@ -183,13 +191,7 @@ namespace Espresso
_decodeForm_D_branch(opcode, BD, BO, BI, AA, LK);
}
inline void decodeOp_BCLR(uint32 opcode, BOField& BO, uint32& BI, bool& LK)
{
// form XL (with BD field expected to be zero)
_decodeForm_XL(opcode, BO, BI, LK);
}
inline void decodeOp_BCCTR(uint32 opcode, BOField& BO, uint32& BI, bool& LK)
inline void decodeOp_BCSPR(uint32 opcode, BOField& BO, uint32& BI, bool& LK) // handles both BCLR and BCCTR
{
// form XL (with BD field expected to be zero)
_decodeForm_XL(opcode, BO, BI, LK);

View file

@ -3,12 +3,12 @@ static void PPCInterpreter_setXerOV(PPCInterpreter_t* hCPU, bool hasOverflow)
{
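// note: SO (summary overflow) is sticky; it is set together with OV but never cleared here, since on PowerPC only an explicit write to XER clears it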
if (hasOverflow)
{
hCPU->spr.XER |= XER_SO;
hCPU->spr.XER |= XER_OV;
hCPU->xer_so = 1;
hCPU->xer_ov = 1;
}
else
{
hCPU->spr.XER &= ~XER_OV;
hCPU->xer_ov = 0;
}
}
@ -246,7 +246,7 @@ static void PPCInterpreter_SUBFCO(PPCInterpreter_t* hCPU, uint32 opcode)
uint32 a = hCPU->gpr[rA];
uint32 b = hCPU->gpr[rB];
hCPU->gpr[rD] = ~a + b + 1;
// update xer
// update carry
if (ppc_carry_3(~a, b, 1))
hCPU->xer_ca = 1;
else
@ -848,8 +848,7 @@ static void PPCInterpreter_CMP(PPCInterpreter_t* hCPU, uint32 opcode)
hCPU->cr[cr * 4 + CR_BIT_GT] = 1;
else
hCPU->cr[cr * 4 + CR_BIT_EQ] = 1;
if ((hCPU->spr.XER & XER_SO) != 0)
hCPU->cr[cr * 4 + CR_BIT_SO] = 1;
hCPU->cr[cr * 4 + CR_BIT_SO] = hCPU->xer_so;
PPCInterpreter_nextInstruction(hCPU);
}
@ -871,8 +870,7 @@ static void PPCInterpreter_CMPL(PPCInterpreter_t* hCPU, uint32 opcode)
hCPU->cr[cr * 4 + CR_BIT_GT] = 1;
else
hCPU->cr[cr * 4 + CR_BIT_EQ] = 1;
if ((hCPU->spr.XER & XER_SO) != 0)
hCPU->cr[cr * 4 + CR_BIT_SO] = 1;
hCPU->cr[cr * 4 + CR_BIT_SO] = hCPU->xer_so;
PPCInterpreter_nextInstruction(hCPU);
}
@ -895,8 +893,7 @@ static void PPCInterpreter_CMPI(PPCInterpreter_t* hCPU, uint32 opcode)
hCPU->cr[cr * 4 + CR_BIT_GT] = 1;
else
hCPU->cr[cr * 4 + CR_BIT_EQ] = 1;
if (hCPU->spr.XER & XER_SO)
hCPU->cr[cr * 4 + CR_BIT_SO] = 1;
hCPU->cr[cr * 4 + CR_BIT_SO] = hCPU->xer_so;
PPCInterpreter_nextInstruction(hCPU);
}
@ -919,8 +916,7 @@ static void PPCInterpreter_CMPLI(PPCInterpreter_t* hCPU, uint32 opcode)
hCPU->cr[cr * 4 + CR_BIT_GT] = 1;
else
hCPU->cr[cr * 4 + CR_BIT_EQ] = 1;
if (hCPU->spr.XER & XER_SO)
hCPU->cr[cr * 4 + CR_BIT_SO] = 1;
hCPU->cr[cr * 4 + CR_BIT_SO] = hCPU->xer_so;
PPCInterpreter_nextInstruction(hCPU);
}

View file

@ -32,7 +32,7 @@ espresso_frsqrte_entry_t frsqrteLookupTable[32] =
{0x20c1000, 0x35e},{0x1f12000, 0x332},{0x1d79000, 0x30a},{0x1bf4000, 0x2e6},
};
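// ATTR_MS_ABI below presumably pins these helpers to the Microsoft x64 calling convention so that recompiler-generated code can call them directly on any host platform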
double frsqrte_espresso(double input)
ATTR_MS_ABI double frsqrte_espresso(double input)
{
unsigned long long x = *(unsigned long long*)&input;
@ -111,7 +111,7 @@ espresso_fres_entry_t fresLookupTable[32] =
{0x88400, 0x11a}, {0x65000, 0x11a}, {0x41c00, 0x108}, {0x20c00, 0x106}
};
double fres_espresso(double input)
ATTR_MS_ABI double fres_espresso(double input)
{
// based on testing we know that fres uses only the first 15 bits of the mantissa
// seee eeee eeee mmmm mmmm mmmm mmmx xxxx .... (s = sign, e = exponent, m = mantissa, x = not used)

View file

@ -50,9 +50,9 @@
#define CR_BIT_EQ 2
#define CR_BIT_SO 3
#define XER_SO (1<<31) // summary overflow bit
#define XER_OV (1<<30) // overflow bit
#define XER_BIT_CA (29) // carry bit index. To accelerate frequent access, this bit is stored as a separate uint8
#define XER_BIT_SO (31) // summary overflow, counterpart to CR SO
#define XER_BIT_OV (30)
// FPSCR
#define FPSCR_VXSNAN (1<<24)
@ -118,7 +118,8 @@
static inline void ppc_update_cr0(PPCInterpreter_t* hCPU, uint32 r)
{
hCPU->cr[CR_BIT_SO] = (hCPU->spr.XER&XER_SO) ? 1 : 0;
cemu_assert_debug(hCPU->xer_so <= 1);
hCPU->cr[CR_BIT_SO] = hCPU->xer_so;
hCPU->cr[CR_BIT_LT] = ((r != 0) ? 1 : 0) & ((r & 0x80000000) ? 1 : 0);
hCPU->cr[CR_BIT_EQ] = (r == 0);
hCPU->cr[CR_BIT_GT] = hCPU->cr[CR_BIT_EQ] ^ hCPU->cr[CR_BIT_LT] ^ 1; // this works because EQ and LT can never be set at the same time. So the only case where GT becomes 1 is when LT=0 and EQ=0
@ -190,8 +191,8 @@ inline double roundTo25BitAccuracy(double d)
return *(double*)&v;
}
double fres_espresso(double input);
double frsqrte_espresso(double input);
ATTR_MS_ABI double fres_espresso(double input);
ATTR_MS_ABI double frsqrte_espresso(double input);
void fcmpu_espresso(PPCInterpreter_t* hCPU, int crfD, double a, double b);

View file

@ -85,7 +85,8 @@ static void PPCInterpreter_STWCX(PPCInterpreter_t* hCPU, uint32 Opcode)
ppc_setCRBit(hCPU, CR_BIT_GT, 0);
ppc_setCRBit(hCPU, CR_BIT_EQ, 1);
}
ppc_setCRBit(hCPU, CR_BIT_SO, (hCPU->spr.XER&XER_SO) != 0 ? 1 : 0);
cemu_assert_debug(hCPU->xer_so <= 1);
ppc_setCRBit(hCPU, CR_BIT_SO, hCPU->xer_so);
// remove reservation
hCPU->reservedMemAddr = 0;
hCPU->reservedMemValue = 0;

View file

@ -63,16 +63,24 @@ void PPCInterpreter_setDEC(PPCInterpreter_t* hCPU, uint32 newValue)
uint32 PPCInterpreter_getXER(PPCInterpreter_t* hCPU)
{
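// reassemble the architectural XER value from the separately tracked CA/SO/OV flag bytes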
uint32 xerValue = hCPU->spr.XER;
xerValue &= ~(1<<XER_BIT_CA);
if( hCPU->xer_ca )
xerValue |= (1<<XER_BIT_CA);
xerValue &= ~(1 << XER_BIT_CA);
xerValue &= ~(1 << XER_BIT_SO);
xerValue &= ~(1 << XER_BIT_OV);
if (hCPU->xer_ca)
xerValue |= (1 << XER_BIT_CA);
if (hCPU->xer_so)
xerValue |= (1 << XER_BIT_SO);
if (hCPU->xer_ov)
xerValue |= (1 << XER_BIT_OV);
return xerValue;
}
void PPCInterpreter_setXER(PPCInterpreter_t* hCPU, uint32 v)
{
hCPU->spr.XER = v;
hCPU->xer_ca = (v>>XER_BIT_CA)&1;
hCPU->xer_ca = (v >> XER_BIT_CA) & 1;
hCPU->xer_so = (v >> XER_BIT_SO) & 1;
hCPU->xer_ov = (v >> XER_BIT_OV) & 1;
}
uint32 PPCInterpreter_getCoreIndex(PPCInterpreter_t* hCPU)

View file

@ -5,7 +5,6 @@
#include "Cafe/OS/libs/coreinit/coreinit_CodeGen.h"
#include "../Recompiler/PPCRecompiler.h"
#include "../Recompiler/PPCRecompilerX64.h"
#include <float.h>
#include "Cafe/HW/Latte/Core/LatteBufferCache.h"

View file

@ -49,6 +49,8 @@ struct PPCInterpreter_t
uint32 fpscr;
uint8 cr[32]; // 0 -> bit not set, 1 -> bit set (upper 7 bits of each byte must always be zero) (cr0 starts at index 0, cr1 at index 4 ..)
uint8 xer_ca; // carry from xer
uint8 xer_so;
uint8 xer_ov;
uint8 LSQE;
uint8 PSE;
// thread remaining cycles
@ -67,7 +69,8 @@ struct PPCInterpreter_t
uint32 reservedMemValue;
// temporary storage for recompiler
FPR_t temporaryFPR[8];
uint32 temporaryGPR[4];
uint32 temporaryGPR[4]; // deprecated, the backend dependency on this should be refactored away
uint32 temporaryGPR_reg[4];
// values below this are not used by Cafe OS usermode
struct
{

File diff suppressed because it is too large

View file

@ -1,104 +1,56 @@
typedef struct
#include "../PPCRecompiler.h" // todo - get rid of dependency
#include "x86Emitter.h"
struct x64RelocEntry_t
{
x64RelocEntry_t(uint32 offset, void* extraInfo) : offset(offset), extraInfo(extraInfo) {};
uint32 offset;
uint8 type;
void* extraInfo;
}x64RelocEntry_t;
};
typedef struct
struct x64GenContext_t
{
uint8* codeBuffer;
sint32 codeBufferIndex;
sint32 codeBufferSize;
// cr state
sint32 activeCRRegister; // current x86 condition flags reflect this cr* register
sint32 activeCRState; // describes the way in which x86 flags map to the cr register (signed / unsigned)
IMLSegment* currentSegment{};
x86Assembler64* emitter;
sint32 m_currentInstructionEmitIndex;
x64GenContext_t()
{
emitter = new x86Assembler64();
}
~x64GenContext_t()
{
delete emitter;
}
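// returns the IML instruction at the given offset relative to the instruction currently being emitted, or nullptr if the index falls outside the current segment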
IMLInstruction* GetNextInstruction(sint32 relativeIndex = 1)
{
sint32 index = m_currentInstructionEmitIndex + relativeIndex;
if(index < 0 || index >= (sint32)currentSegment->imlList.size())
return nullptr;
return currentSegment->imlList.data() + index;
}
// relocate offsets
x64RelocEntry_t* relocateOffsetTable;
sint32 relocateOffsetTableSize;
sint32 relocateOffsetTableCount;
}x64GenContext_t;
// Some of these are defined by winnt.h and gnu headers
#undef REG_EAX
#undef REG_ECX
#undef REG_EDX
#undef REG_EBX
#undef REG_ESP
#undef REG_EBP
#undef REG_ESI
#undef REG_EDI
#undef REG_NONE
#undef REG_RAX
#undef REG_RCX
#undef REG_RDX
#undef REG_RBX
#undef REG_RSP
#undef REG_RBP
#undef REG_RSI
#undef REG_RDI
#undef REG_R8
#undef REG_R9
#undef REG_R10
#undef REG_R11
#undef REG_R12
#undef REG_R13
#undef REG_R14
#undef REG_R15
#define REG_EAX 0
#define REG_ECX 1
#define REG_EDX 2
#define REG_EBX 3
#define REG_ESP 4 // reserved for low half of hCPU pointer
#define REG_EBP 5
#define REG_ESI 6
#define REG_EDI 7
#define REG_NONE -1
#define REG_RAX 0
#define REG_RCX 1
#define REG_RDX 2
#define REG_RBX 3
#define REG_RSP 4 // reserved for hCPU pointer
#define REG_RBP 5
#define REG_RSI 6
#define REG_RDI 7
#define REG_R8 8
#define REG_R9 9
#define REG_R10 10
#define REG_R11 11
#define REG_R12 12
#define REG_R13 13 // reserved to hold pointer to memory base? (Not decided yet)
#define REG_R14 14 // reserved as temporary register
#define REG_R15 15 // reserved for pointer to ppcRecompilerInstanceData
#define REG_AL 0
#define REG_CL 1
#define REG_DL 2
#define REG_BL 3
#define REG_AH 4
#define REG_CH 5
#define REG_DH 6
#define REG_BH 7
std::vector<x64RelocEntry_t> relocateOffsetTable2;
};
// reserved registers
#define REG_RESV_TEMP (REG_R14)
#define REG_RESV_HCPU (REG_RSP)
#define REG_RESV_MEMBASE (REG_R13)
#define REG_RESV_RECDATA (REG_R15)
#define REG_RESV_TEMP (X86_REG_R14)
#define REG_RESV_HCPU (X86_REG_RSP)
#define REG_RESV_MEMBASE (X86_REG_R13)
#define REG_RESV_RECDATA (X86_REG_R15)
// reserved floating-point registers
#define REG_RESV_FPR_TEMP (15)
#define reg32ToReg16(__x) (__x) // deprecated
extern sint32 x64Gen_registerMap[12];
#define tempToRealRegister(__x) (x64Gen_registerMap[__x])
#define tempToRealFPRRegister(__x) (__x)
#define reg32ToReg16(__x) (__x)
// deprecated condition flags
enum
{
X86_CONDITION_EQUAL, // or zero
@ -119,36 +71,23 @@ enum
X86_CONDITION_NONE, // no condition, jump always
};
#define PPCREC_CR_TEMPORARY (8) // never stored
#define PPCREC_CR_STATE_TYPE_UNSIGNED_ARITHMETIC (0) // for unsigned arithmetic operations
#define PPCREC_CR_STATE_TYPE_SIGNED_ARITHMETIC (1) // for signed arithmetic operations (ADD, CMPI)
#define PPCREC_CR_STATE_TYPE_LOGICAL (2) // for logical operations (CMPLI)
#define X86_RELOC_MAKE_RELATIVE (0) // make code imm relative to instruction
#define X64_RELOC_LINK_TO_PPC (1) // translate from ppc address to x86 offset
#define X64_RELOC_LINK_TO_SEGMENT (2) // link to beginning of segment
#define PPC_X64_GPR_USABLE_REGISTERS (16-4)
#define PPC_X64_FPR_USABLE_REGISTERS (16-1) // Use XMM0 - XMM14, XMM15 is the temp register
bool PPCRecompiler_generateX64Code(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompilerX64Gen_crConditionFlags_forget(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext);
bool PPCRecompiler_generateX64Code(struct PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompilerX64Gen_redirectRelativeJump(x64GenContext_t* x64GenContext, sint32 jumpInstructionOffset, sint32 destinationOffset);
void PPCRecompilerX64Gen_generateRecompilerInterfaceFunctions();
void PPCRecompilerX64Gen_imlInstruction_fpr_r_name(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, PPCRecImlInstruction_t* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_name_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, PPCRecImlInstruction_t* imlInstruction);
bool PPCRecompilerX64Gen_imlInstruction_fpr_load(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, PPCRecImlInstruction_t* imlInstruction, bool indexed);
bool PPCRecompilerX64Gen_imlInstruction_fpr_store(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, PPCRecImlInstruction_t* imlInstruction, bool indexed);
void PPCRecompilerX64Gen_imlInstruction_fpr_r_name(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_name_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction);
bool PPCRecompilerX64Gen_imlInstruction_fpr_load(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction, bool indexed);
bool PPCRecompilerX64Gen_imlInstruction_fpr_store(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction, bool indexed);
void PPCRecompilerX64Gen_imlInstruction_fpr_r_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, PPCRecImlInstruction_t* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_r_r_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, PPCRecImlInstruction_t* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_r_r_r_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, PPCRecImlInstruction_t* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, PPCRecImlInstruction_t* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_r_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_r_r_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_r_r_r_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_r(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction);
void PPCRecompilerX64Gen_imlInstruction_fpr_compare(PPCRecFunction_t* PPCRecFunction, ppcImlGenContext_t* ppcImlGenContext, x64GenContext_t* x64GenContext, IMLInstruction* imlInstruction);
// ASM gen
void x64Gen_writeU8(x64GenContext_t* x64GenContext, uint8 v);
@ -196,9 +135,6 @@ void x64Gen_or_reg64Low8_mem8Reg64(x64GenContext_t* x64GenContext, sint32 dstReg
void x64Gen_and_reg64Low8_mem8Reg64(x64GenContext_t* x64GenContext, sint32 dstRegister, sint32 memRegister64, sint32 memImmS32);
void x64Gen_mov_mem8Reg64_reg64Low8(x64GenContext_t* x64GenContext, sint32 dstRegister, sint32 memRegister64, sint32 memImmS32);
void x64Gen_lock_cmpxchg_mem32Reg64PlusReg64_reg64(x64GenContext_t* x64GenContext, sint32 memRegisterA64, sint32 memRegisterB64, sint32 memImmS32, sint32 srcRegister);
void x64Gen_lock_cmpxchg_mem32Reg64_reg64(x64GenContext_t* x64GenContext, sint32 memRegister64, sint32 memImmS32, sint32 srcRegister);
void x64Gen_add_reg64_reg64(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister);
void x64Gen_add_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister);
void x64Gen_add_reg64_imm32(x64GenContext_t* x64GenContext, sint32 srcRegister, uint32 immU32);
@ -207,9 +143,6 @@ void x64Gen_sub_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 des
void x64Gen_sub_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegister, uint32 immU32);
void x64Gen_sub_reg64_imm32(x64GenContext_t* x64GenContext, sint32 srcRegister, uint32 immU32);
void x64Gen_sub_mem32reg64_imm32(x64GenContext_t* x64GenContext, sint32 memRegister, sint32 memImmS32, uint64 immU32);
void x64Gen_sbb_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister);
void x64Gen_adc_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister);
void x64Gen_adc_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegister, uint32 immU32);
void x64Gen_dec_mem32(x64GenContext_t* x64GenContext, sint32 memoryRegister, uint32 memoryImmU32);
void x64Gen_imul_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 operandRegister);
void x64Gen_idiv_reg64Low32(x64GenContext_t* x64GenContext, sint32 operandRegister);
@ -241,9 +174,7 @@ void x64Gen_not_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister);
void x64Gen_neg_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister);
void x64Gen_cdq(x64GenContext_t* x64GenContext);
void x64Gen_bswap_reg64(x64GenContext_t* x64GenContext, sint32 destRegister);
void x64Gen_bswap_reg64Lower32bit(x64GenContext_t* x64GenContext, sint32 destRegister);
void x64Gen_bswap_reg64Lower16bit(x64GenContext_t* x64GenContext, sint32 destRegister);
void x64Gen_lzcnt_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister);
void x64Gen_bsr_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister);
@ -329,4 +260,8 @@ void x64Gen_movBEZeroExtend_reg64Low16_mem16Reg64PlusReg64(x64GenContext_t* x64G
void x64Gen_movBETruncate_mem32Reg64PlusReg64_reg64(x64GenContext_t* x64GenContext, sint32 memRegisterA64, sint32 memRegisterB64, sint32 memImmS32, sint32 srcRegister);
void x64Gen_shrx_reg64_reg64_reg64(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB);
void x64Gen_shlx_reg64_reg64_reg64(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB);
void x64Gen_shrx_reg32_reg32_reg32(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB);
void x64Gen_sarx_reg64_reg64_reg64(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB);
void x64Gen_sarx_reg32_reg32_reg32(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB);
void x64Gen_shlx_reg64_reg64_reg64(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB);
void x64Gen_shlx_reg32_reg32_reg32(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB);

View file

@ -1,5 +1,4 @@
#include "PPCRecompiler.h"
#include "PPCRecompilerX64.h"
#include "BackendX64.h"
void _x64Gen_writeMODRMDeprecated(x64GenContext_t* x64GenContext, sint32 dataRegister, sint32 memRegisterA64, sint32 memRegisterB64, sint32 memImmS32);
@ -21,11 +20,10 @@ void _x64Gen_vex128_nds(x64GenContext_t* x64GenContext, uint8 opcodeMap, uint8 a
x64Gen_writeU8(x64GenContext, opcode);
}
#define VEX_PP_0F 0 // guessed
#define VEX_PP_0F 0
#define VEX_PP_66_0F 1
#define VEX_PP_F3_0F 2 // guessed
#define VEX_PP_F2_0F 3 // guessed
#define VEX_PP_F3_0F 2
#define VEX_PP_F2_0F 3
void x64Gen_avx_VPUNPCKHQDQ_xmm_xmm_xmm(x64GenContext_t* x64GenContext, sint32 dstRegister, sint32 srcRegisterA, sint32 srcRegisterB)
{

View file

@ -1,5 +1,4 @@
#include "PPCRecompiler.h"
#include "PPCRecompilerX64.h"
#include "BackendX64.h"
void _x64Gen_writeMODRMDeprecated(x64GenContext_t* x64GenContext, sint32 dataRegister, sint32 memRegisterA64, sint32 memRegisterB64, sint32 memImmS32);
@ -69,6 +68,34 @@ void x64Gen_shrx_reg64_reg64_reg64(x64GenContext_t* x64GenContext, sint32 regist
x64Gen_writeU8(x64GenContext, 0xC0 + (registerDst & 7) * 8 + (registerA & 7));
}
void x64Gen_shrx_reg32_reg32_reg32(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB)
{
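// SHRX reg32, reg32, reg32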
x64Gen_writeU8(x64GenContext, 0xC4);
x64Gen_writeU8(x64GenContext, 0xE2 - ((registerDst >= 8) ? 0x80 : 0) - ((registerA >= 8) ? 0x20 : 0));
x64Gen_writeU8(x64GenContext, 0x7B - registerB * 8);
x64Gen_writeU8(x64GenContext, 0xF7);
x64Gen_writeU8(x64GenContext, 0xC0 + (registerDst & 7) * 8 + (registerA & 7));
}
void x64Gen_sarx_reg64_reg64_reg64(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB)
{
// SARX reg64, reg64, reg64
x64Gen_writeU8(x64GenContext, 0xC4);
x64Gen_writeU8(x64GenContext, 0xE2 - ((registerDst >= 8) ? 0x80 : 0) - ((registerA >= 8) ? 0x20 : 0));
x64Gen_writeU8(x64GenContext, 0xFA - registerB * 8);
x64Gen_writeU8(x64GenContext, 0xF7);
x64Gen_writeU8(x64GenContext, 0xC0 + (registerDst & 7) * 8 + (registerA & 7));
}
void x64Gen_sarx_reg32_reg32_reg32(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB)
{
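// SARX reg32, reg32, reg32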
x64Gen_writeU8(x64GenContext, 0xC4);
x64Gen_writeU8(x64GenContext, 0xE2 - ((registerDst >= 8) ? 0x80 : 0) - ((registerA >= 8) ? 0x20 : 0));
x64Gen_writeU8(x64GenContext, 0x7A - registerB * 8);
x64Gen_writeU8(x64GenContext, 0xF7);
x64Gen_writeU8(x64GenContext, 0xC0 + (registerDst & 7) * 8 + (registerA & 7));
}
void x64Gen_shlx_reg64_reg64_reg64(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB)
{
// SHLX reg64, reg64, reg64
@ -77,4 +104,13 @@ void x64Gen_shlx_reg64_reg64_reg64(x64GenContext_t* x64GenContext, sint32 regist
x64Gen_writeU8(x64GenContext, 0xF9 - registerB * 8);
x64Gen_writeU8(x64GenContext, 0xF7);
x64Gen_writeU8(x64GenContext, 0xC0 + (registerDst & 7) * 8 + (registerA & 7));
}
void x64Gen_shlx_reg32_reg32_reg32(x64GenContext_t* x64GenContext, sint32 registerDst, sint32 registerA, sint32 registerB)
{
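// SHLX reg32, reg32, reg32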
x64Gen_writeU8(x64GenContext, 0xC4);
x64Gen_writeU8(x64GenContext, 0xE2 - ((registerDst >= 8) ? 0x80 : 0) - ((registerA >= 8) ? 0x20 : 0));
x64Gen_writeU8(x64GenContext, 0x79 - registerB * 8);
x64Gen_writeU8(x64GenContext, 0xF7);
x64Gen_writeU8(x64GenContext, 0xC0 + (registerDst & 7) * 8 + (registerA & 7));
}

View file

@ -1,62 +1,31 @@
#include "PPCRecompiler.h"
#include "PPCRecompilerIml.h"
#include "PPCRecompilerX64.h"
#include "BackendX64.h"
// x86/x64 extension opcodes that could be useful:
// ANDN
// mulx, rorx, sarx, shlx, shrx
// PDEP, PEXT
void x64Gen_checkBuffer(x64GenContext_t* x64GenContext)
{
// todo
}
void x64Gen_writeU8(x64GenContext_t* x64GenContext, uint8 v)
{
if( x64GenContext->codeBufferIndex+1 > x64GenContext->codeBufferSize )
{
x64GenContext->codeBufferSize *= 2;
x64GenContext->codeBuffer = (uint8*)realloc(x64GenContext->codeBuffer, x64GenContext->codeBufferSize);
}
*(uint8*)(x64GenContext->codeBuffer+x64GenContext->codeBufferIndex) = v;
x64GenContext->codeBufferIndex++;
x64GenContext->emitter->_emitU8(v);
}
void x64Gen_writeU16(x64GenContext_t* x64GenContext, uint32 v)
{
if( x64GenContext->codeBufferIndex+2 > x64GenContext->codeBufferSize )
{
x64GenContext->codeBufferSize *= 2;
x64GenContext->codeBuffer = (uint8*)realloc(x64GenContext->codeBuffer, x64GenContext->codeBufferSize);
}
*(uint16*)(x64GenContext->codeBuffer+x64GenContext->codeBufferIndex) = v;
x64GenContext->codeBufferIndex += 2;
x64GenContext->emitter->_emitU16(v);
}
void x64Gen_writeU32(x64GenContext_t* x64GenContext, uint32 v)
{
if( x64GenContext->codeBufferIndex+4 > x64GenContext->codeBufferSize )
{
x64GenContext->codeBufferSize *= 2;
x64GenContext->codeBuffer = (uint8*)realloc(x64GenContext->codeBuffer, x64GenContext->codeBufferSize);
}
*(uint32*)(x64GenContext->codeBuffer+x64GenContext->codeBufferIndex) = v;
x64GenContext->codeBufferIndex += 4;
x64GenContext->emitter->_emitU32(v);
}
void x64Gen_writeU64(x64GenContext_t* x64GenContext, uint64 v)
{
if( x64GenContext->codeBufferIndex+8 > x64GenContext->codeBufferSize )
{
x64GenContext->codeBufferSize *= 2;
x64GenContext->codeBuffer = (uint8*)realloc(x64GenContext->codeBuffer, x64GenContext->codeBufferSize);
}
*(uint64*)(x64GenContext->codeBuffer+x64GenContext->codeBufferIndex) = v;
x64GenContext->codeBufferIndex += 8;
x64GenContext->emitter->_emitU64(v);
}
#include "x64Emit.hpp"
#include "X64Emit.hpp"
void _x64Gen_writeMODRMDeprecated(x64GenContext_t* x64GenContext, sint32 dataRegister, sint32 memRegisterA64, sint32 memRegisterB64, sint32 memImmS32)
{
@ -67,7 +36,7 @@ void _x64Gen_writeMODRMDeprecated(x64GenContext_t* x64GenContext, sint32 dataReg
forceUseOffset = true;
}
if (memRegisterB64 == REG_NONE)
if (memRegisterB64 == X86_REG_NONE)
{
// memRegisterA64 + memImmS32
uint8 modRM = (dataRegister & 7) * 8 + (memRegisterA64 & 7);
@ -352,7 +321,7 @@ void x64Gen_mov_mem32Reg64_imm32(x64GenContext_t* x64GenContext, sint32 memRegis
void x64Gen_mov_mem64Reg64_imm32(x64GenContext_t* x64GenContext, sint32 memRegister, uint32 memImmU32, uint32 dataImmU32)
{
// MOV QWORD [<memReg>+<memImmU32>], dataImmU32
if( memRegister == REG_R14 )
if( memRegister == X86_REG_R14 )
{
sint32 memImmS32 = (sint32)memImmU32;
if( memImmS32 == 0 )
@ -384,7 +353,7 @@ void x64Gen_mov_mem64Reg64_imm32(x64GenContext_t* x64GenContext, sint32 memRegis
void x64Gen_mov_mem8Reg64_imm8(x64GenContext_t* x64GenContext, sint32 memRegister, uint32 memImmU32, uint8 dataImmU8)
{
// MOV BYTE [<memReg64>+<memImmU32>], dataImmU8
if( memRegister == REG_RSP )
if( memRegister == X86_REG_RSP )
{
sint32 memImmS32 = (sint32)memImmU32;
if( memImmS32 >= -128 && memImmS32 <= 127 )
@ -625,7 +594,7 @@ void _x64_op_reg64Low_mem8Reg64(x64GenContext_t* x64GenContext, sint32 dstRegist
if (memRegister64 >= 8)
x64Gen_writeU8(x64GenContext, 0x41);
x64Gen_writeU8(x64GenContext, opByte);
_x64Gen_writeMODRMDeprecated(x64GenContext, dstRegister, memRegister64, REG_NONE, memImmS32);
_x64Gen_writeMODRMDeprecated(x64GenContext, dstRegister, memRegister64, X86_REG_NONE, memImmS32);
}
void x64Gen_or_reg64Low8_mem8Reg64(x64GenContext_t* x64GenContext, sint32 dstRegister, sint32 memRegister64, sint32 memImmS32)
@ -643,40 +612,6 @@ void x64Gen_mov_mem8Reg64_reg64Low8(x64GenContext_t* x64GenContext, sint32 dstRe
_x64_op_reg64Low_mem8Reg64(x64GenContext, dstRegister, memRegister64, memImmS32, 0x88);
}
void x64Gen_lock_cmpxchg_mem32Reg64PlusReg64_reg64(x64GenContext_t* x64GenContext, sint32 memRegisterA64, sint32 memRegisterB64, sint32 memImmS32, sint32 srcRegister)
{
// LOCK CMPXCHG DWORD [<reg64> + <reg64> + <imm64>], <srcReg64> (low dword)
x64Gen_writeU8(x64GenContext, 0xF0); // LOCK prefix
if( srcRegister >= 8 || memRegisterA64 >= 8|| memRegisterB64 >= 8 )
x64Gen_writeU8(x64GenContext, 0x40+((srcRegister>=8)?4:0)+((memRegisterA64>=8)?1:0)+((memRegisterB64>=8)?2:0));
x64Gen_writeU8(x64GenContext, 0x0F);
x64Gen_writeU8(x64GenContext, 0xB1);
_x64Gen_writeMODRMDeprecated(x64GenContext, srcRegister, memRegisterA64, memRegisterB64, memImmS32);
}
void x64Gen_lock_cmpxchg_mem32Reg64_reg64(x64GenContext_t* x64GenContext, sint32 memRegister64, sint32 memImmS32, sint32 srcRegister)
{
// LOCK CMPXCHG DWORD [<reg64> + <imm64>], <srcReg64> (low dword)
x64Gen_writeU8(x64GenContext, 0xF0); // LOCK prefix
if( srcRegister >= 8 || memRegister64 >= 8 )
x64Gen_writeU8(x64GenContext, 0x40+((srcRegister>=8)?4:0)+((memRegister64>=8)?1:0));
x64Gen_writeU8(x64GenContext, 0x0F);
x64Gen_writeU8(x64GenContext, 0xB1);
if( memImmS32 == 0 )
{
x64Gen_writeU8(x64GenContext, 0x45+(srcRegister&7)*8);
x64Gen_writeU8(x64GenContext, 0x00);
}
else
assert_dbg();
}
void x64Gen_add_reg64_reg64(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister)
{
// ADD <destReg>, <srcReg>
@ -732,7 +667,7 @@ void x64Gen_add_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegis
}
else
{
if( srcRegister == REG_RAX )
if( srcRegister == X86_REG_RAX )
{
// special EAX short form
x64Gen_writeU8(x64GenContext, 0x05);
@ -772,7 +707,7 @@ void x64Gen_sub_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegis
}
else
{
if( srcRegister == REG_RAX )
if( srcRegister == X86_REG_RAX )
{
// special EAX short form
x64Gen_writeU8(x64GenContext, 0x2D);
@ -811,7 +746,7 @@ void x64Gen_sub_mem32reg64_imm32(x64GenContext_t* x64GenContext, sint32 memRegis
{
// SUB <mem32_memReg64>, <imm32>
sint32 immS32 = (sint32)immU32;
if( memRegister == REG_RSP )
if( memRegister == X86_REG_RSP )
{
if( memImmS32 >= 128 )
{
@ -843,64 +778,11 @@ void x64Gen_sub_mem32reg64_imm32(x64GenContext_t* x64GenContext, sint32 memRegis
}
}
void x64Gen_sbb_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister)
{
// SBB <destReg64_low32>, <srcReg64_low32>
if( destRegister >= 8 && srcRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x45);
else if( srcRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x44);
else if( destRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x41);
x64Gen_writeU8(x64GenContext, 0x19);
x64Gen_writeU8(x64GenContext, 0xC0+(srcRegister&7)*8+(destRegister&7));
}
void x64Gen_adc_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister)
{
// ADC <destReg64_low32>, <srcReg64_low32>
if( destRegister >= 8 && srcRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x45);
else if( srcRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x44);
else if( destRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x41);
x64Gen_writeU8(x64GenContext, 0x11);
x64Gen_writeU8(x64GenContext, 0xC0+(srcRegister&7)*8+(destRegister&7));
}
void x64Gen_adc_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegister, uint32 immU32)
{
sint32 immS32 = (sint32)immU32;
if( srcRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x41);
if( immS32 >= -128 && immS32 <= 127 )
{
x64Gen_writeU8(x64GenContext, 0x83);
x64Gen_writeU8(x64GenContext, 0xD0+(srcRegister&7));
x64Gen_writeU8(x64GenContext, (uint8)immS32);
}
else
{
if( srcRegister == REG_RAX )
{
// special EAX short form
x64Gen_writeU8(x64GenContext, 0x15);
}
else
{
x64Gen_writeU8(x64GenContext, 0x81);
x64Gen_writeU8(x64GenContext, 0xD0+(srcRegister&7));
}
x64Gen_writeU32(x64GenContext, immU32);
}
}
void x64Gen_dec_mem32(x64GenContext_t* x64GenContext, sint32 memoryRegister, uint32 memoryImmU32)
{
// DEC dword [<reg64>+imm]
sint32 memoryImmS32 = (sint32)memoryImmU32;
if (memoryRegister != REG_RSP)
if (memoryRegister != X86_REG_RSP)
assert_dbg(); // not supported yet
if (memoryImmS32 >= -128 && memoryImmS32 <= 127)
{
@ -981,7 +863,7 @@ void x64Gen_and_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegis
}
else
{
if( srcRegister == REG_RAX )
if( srcRegister == X86_REG_RAX )
{
// special EAX short form
x64Gen_writeU8(x64GenContext, 0x25);
@ -1026,7 +908,7 @@ void x64Gen_test_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegi
sint32 immS32 = (sint32)immU32;
if( srcRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x41);
if( srcRegister == REG_RAX )
if( srcRegister == X86_REG_RAX )
{
// special EAX short form
x64Gen_writeU8(x64GenContext, 0xA9);
@ -1052,7 +934,7 @@ void x64Gen_cmp_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegis
}
else
{
if( srcRegister == REG_RAX )
if( srcRegister == X86_REG_RAX )
{
// special RAX short form
x64Gen_writeU8(x64GenContext, 0x3D);
@ -1082,7 +964,7 @@ void x64Gen_cmp_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 des
void x64Gen_cmp_reg64Low32_mem32reg64(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 memRegister, sint32 memImmS32)
{
// CMP <destReg64_lowDWORD>, DWORD [<memRegister>+<immS32>]
if( memRegister == REG_RSP )
if( memRegister == X86_REG_RSP )
{
if( memImmS32 >= -128 && memImmS32 <= 127 )
assert_dbg(); // todo -> Shorter instruction form
@ -1112,7 +994,7 @@ void x64Gen_or_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegist
}
else
{
if( srcRegister == REG_RAX )
if( srcRegister == X86_REG_RAX )
{
// special EAX short form
x64Gen_writeU8(x64GenContext, 0x0D);
@ -1172,7 +1054,7 @@ void x64Gen_xor_reg64Low32_imm32(x64GenContext_t* x64GenContext, sint32 srcRegis
}
else
{
if( srcRegister == REG_RAX )
if( srcRegister == X86_REG_RAX )
{
// special EAX short form
x64Gen_writeU8(x64GenContext, 0x35);
@ -1326,16 +1208,6 @@ void x64Gen_cdq(x64GenContext_t* x64GenContext)
x64Gen_writeU8(x64GenContext, 0x99);
}
void x64Gen_bswap_reg64(x64GenContext_t* x64GenContext, sint32 destRegister)
{
if( destRegister >= 8 )
x64Gen_writeU8(x64GenContext, 0x41|8);
else
x64Gen_writeU8(x64GenContext, 0x40|8);
x64Gen_writeU8(x64GenContext, 0x0F);
x64Gen_writeU8(x64GenContext, 0xC8+(destRegister&7));
}
void x64Gen_bswap_reg64Lower32bit(x64GenContext_t* x64GenContext, sint32 destRegister)
{
if( destRegister >= 8 )
@ -1344,16 +1216,6 @@ void x64Gen_bswap_reg64Lower32bit(x64GenContext_t* x64GenContext, sint32 destReg
x64Gen_writeU8(x64GenContext, 0xC8+(destRegister&7));
}
void x64Gen_bswap_reg64Lower16bit(x64GenContext_t* x64GenContext, sint32 destRegister)
{
assert_dbg(); // do not use this instruction, its result is always undefined. Use ROL <reg16>, 8 instead
//x64Gen_writeU8(x64GenContext, 0x66);
//if( destRegister >= 8 )
// x64Gen_writeU8(x64GenContext, 0x41);
//x64Gen_writeU8(x64GenContext, 0x0F);
//x64Gen_writeU8(x64GenContext, 0xC8+(destRegister&7));
}
void x64Gen_lzcnt_reg64Low32_reg64Low32(x64GenContext_t* x64GenContext, sint32 destRegister, sint32 srcRegister)
{
// SSE4
@ -1388,7 +1250,7 @@ void x64Gen_setcc_mem8(x64GenContext_t* x64GenContext, sint32 conditionType, sin
{
// SETcc [<reg64>+imm]
sint32 memoryImmS32 = (sint32)memoryImmU32;
if( memoryRegister != REG_RSP )
if( memoryRegister != X86_REG_RSP )
assert_dbg(); // not supported
if( memoryRegister >= 8 )
assert_dbg(); // not supported
@ -1627,7 +1489,7 @@ void x64Gen_bt_mem8(x64GenContext_t* x64GenContext, sint32 memoryRegister, uint3
{
// BT [<reg64>+imm], bitIndex (bit test)
sint32 memoryImmS32 = (sint32)memoryImmU32;
if( memoryRegister != REG_RSP )
if( memoryRegister != X86_REG_RSP )
assert_dbg(); // not supported yet
if( memoryImmS32 >= -128 && memoryImmS32 <= 127 )
{
@ -1662,7 +1524,7 @@ void x64Gen_jmp_imm32(x64GenContext_t* x64GenContext, uint32 destImm32)
void x64Gen_jmp_memReg64(x64GenContext_t* x64GenContext, sint32 memRegister, uint32 immU32)
{
if( memRegister == REG_NONE )
if( memRegister == X86_REG_NONE )
{
assert_dbg();
}

View file

@ -1,6 +1,4 @@
#include "PPCRecompiler.h"
#include "PPCRecompilerIml.h"
#include "PPCRecompilerX64.h"
#include "BackendX64.h"
void x64Gen_genSSEVEXPrefix2(x64GenContext_t* x64GenContext, sint32 xmmRegister1, sint32 xmmRegister2, bool use64BitMode)
{
@ -44,7 +42,7 @@ void x64Gen_movupd_xmmReg_memReg128(x64GenContext_t* x64GenContext, sint32 xmmRe
// SSE2
// move two doubles from memory into xmm register
// MOVUPD <xmm>, [<reg>+<imm>]
if( memRegister == REG_ESP )
if( memRegister == X86_REG_ESP )
{
// todo: Short form of instruction if memImmU32 is 0 or in -128 to 127 range
// 66 0F 10 84 E4 23 01 00 00
@ -56,7 +54,7 @@ void x64Gen_movupd_xmmReg_memReg128(x64GenContext_t* x64GenContext, sint32 xmmRe
x64Gen_writeU8(x64GenContext, 0xE4);
x64Gen_writeU32(x64GenContext, memImmU32);
}
else if( memRegister == REG_NONE )
else if( memRegister == X86_REG_NONE )
{
assert_dbg();
//x64Gen_writeU8(x64GenContext, 0x66);
@ -76,7 +74,7 @@ void x64Gen_movupd_memReg128_xmmReg(x64GenContext_t* x64GenContext, sint32 xmmRe
// SSE2
// move two doubles from xmm register into memory
// MOVUPD [<reg>+<imm>], <xmm>
if( memRegister == REG_ESP )
if( memRegister == X86_REG_ESP )
{
// todo: Short form of instruction if memImmU32 is 0 or in -128 to 127 range
x64Gen_writeU8(x64GenContext, 0x66);
@ -87,7 +85,7 @@ void x64Gen_movupd_memReg128_xmmReg(x64GenContext_t* x64GenContext, sint32 xmmRe
x64Gen_writeU8(x64GenContext, 0xE4);
x64Gen_writeU32(x64GenContext, memImmU32);
}
else if( memRegister == REG_NONE )
else if( memRegister == X86_REG_NONE )
{
assert_dbg();
//x64Gen_writeU8(x64GenContext, 0x66);
@ -106,7 +104,7 @@ void x64Gen_movddup_xmmReg_memReg64(x64GenContext_t* x64GenContext, sint32 xmmRe
{
// SSE3
// move one double from memory into lower and upper half of a xmm register
if( memRegister == REG_RSP )
if( memRegister == X86_REG_RSP )
{
// MOVDDUP <xmm>, [<reg>+<imm>]
// todo: Short form of instruction if memImmU32 is 0 or in -128 to 127 range
@ -119,7 +117,7 @@ void x64Gen_movddup_xmmReg_memReg64(x64GenContext_t* x64GenContext, sint32 xmmRe
x64Gen_writeU8(x64GenContext, 0xE4);
x64Gen_writeU32(x64GenContext, memImmU32);
}
else if( memRegister == REG_R15 )
else if( memRegister == X86_REG_R15 )
{
// MOVDDUP <xmm>, [<reg>+<imm>]
// todo: Short form of instruction if memImmU32 is 0 or in -128 to 127 range
@ -131,7 +129,7 @@ void x64Gen_movddup_xmmReg_memReg64(x64GenContext_t* x64GenContext, sint32 xmmRe
x64Gen_writeU8(x64GenContext, 0x87+(xmmRegister&7)*8);
x64Gen_writeU32(x64GenContext, memImmU32);
}
else if( memRegister == REG_NONE )
else if( memRegister == X86_REG_NONE )
{
// MOVDDUP <xmm>, [<imm>]
// 36 F2 0F 12 05 - 00 00 00 00
@ -185,7 +183,7 @@ void x64Gen_movsd_memReg64_xmmReg(x64GenContext_t* x64GenContext, sint32 xmmRegi
{
// SSE2
// move lower 64bits (double) of xmm register to memory location
if( memRegister == REG_NONE )
if( memRegister == X86_REG_NONE )
{
// MOVSD [<imm>], <xmm>
// F2 0F 11 05 - 45 23 01 00
@ -197,7 +195,7 @@ void x64Gen_movsd_memReg64_xmmReg(x64GenContext_t* x64GenContext, sint32 xmmRegi
//x64Gen_writeU8(x64GenContext, 0x05+xmmRegister*8);
//x64Gen_writeU32(x64GenContext, memImmU32);
}
else if( memRegister == REG_RSP )
else if( memRegister == X86_REG_RSP )
{
// MOVSD [RSP+<imm>], <xmm>
// F2 0F 11 84 24 - 33 22 11 00
@ -219,7 +217,7 @@ void x64Gen_movlpd_xmmReg_memReg64(x64GenContext_t* x64GenContext, sint32 xmmReg
{
// SSE3
// move one double from memory into lower half of a xmm register, leave upper half unchanged(?)
if( memRegister == REG_NONE )
if( memRegister == X86_REG_NONE )
{
// MOVLPD <xmm>, [<imm>]
//x64Gen_writeU8(x64GenContext, 0x66);
@ -229,7 +227,7 @@ void x64Gen_movlpd_xmmReg_memReg64(x64GenContext_t* x64GenContext, sint32 xmmReg
//x64Gen_writeU32(x64GenContext, memImmU32);
assert_dbg();
}
else if( memRegister == REG_RSP )
else if( memRegister == X86_REG_RSP )
{
// MOVLPD <xmm>, [<reg64>+<imm>]
// 66 0F 12 84 24 - 33 22 11 00
@ -348,11 +346,11 @@ void x64Gen_mulpd_xmmReg_xmmReg(x64GenContext_t* x64GenContext, sint32 xmmRegist
void x64Gen_mulpd_xmmReg_memReg128(x64GenContext_t* x64GenContext, sint32 xmmRegister, sint32 memRegister, uint32 memImmU32)
{
// SSE2
if (memRegister == REG_NONE)
if (memRegister == X86_REG_NONE)
{
assert_dbg();
}
else if (memRegister == REG_R14)
else if (memRegister == X86_REG_R14)
{
x64Gen_writeU8(x64GenContext, 0x66);
x64Gen_writeU8(x64GenContext, (xmmRegister < 8) ? 0x41 : 0x45);
@ -404,7 +402,7 @@ void x64Gen_comisd_xmmReg_mem64Reg64(x64GenContext_t* x64GenContext, sint32 xmmR
{
// SSE2
// compare bottom double with double from memory location
if( memoryReg == REG_R15 )
if( memoryReg == X86_REG_R15 )
{
x64Gen_writeU8(x64GenContext, 0x66);
x64Gen_genSSEVEXPrefix1(x64GenContext, xmmRegisterDest, true);
@ -432,7 +430,7 @@ void x64Gen_comiss_xmmReg_mem64Reg64(x64GenContext_t* x64GenContext, sint32 xmmR
{
// SSE2
// compare bottom float with float from memory location
if (memoryReg == REG_R15)
if (memoryReg == X86_REG_R15)
{
x64Gen_genSSEVEXPrefix1(x64GenContext, xmmRegisterDest, true);
x64Gen_writeU8(x64GenContext, 0x0F);
@ -448,7 +446,7 @@ void x64Gen_orps_xmmReg_mem128Reg64(x64GenContext_t* x64GenContext, sint32 xmmRe
{
// SSE2
// or xmm register with 128 bit value from memory
if( memReg == REG_R15 )
if( memReg == X86_REG_R15 )
{
x64Gen_genSSEVEXPrefix2(x64GenContext, memReg, xmmRegisterDest, false);
x64Gen_writeU8(x64GenContext, 0x0F);
@ -464,7 +462,7 @@ void x64Gen_xorps_xmmReg_mem128Reg64(x64GenContext_t* x64GenContext, sint32 xmmR
{
// SSE2
// xor xmm register with 128 bit value from memory
if( memReg == REG_R15 )
if( memReg == X86_REG_R15 )
{
x64Gen_genSSEVEXPrefix1(x64GenContext, xmmRegisterDest, true); // todo: should be x64Gen_genSSEVEXPrefix2() with memReg?
x64Gen_writeU8(x64GenContext, 0x0F);
@ -479,11 +477,11 @@ void x64Gen_xorps_xmmReg_mem128Reg64(x64GenContext_t* x64GenContext, sint32 xmmR
void x64Gen_andpd_xmmReg_memReg128(x64GenContext_t* x64GenContext, sint32 xmmRegister, sint32 memRegister, uint32 memImmU32)
{
// SSE2
if (memRegister == REG_NONE)
if (memRegister == X86_REG_NONE)
{
assert_dbg();
}
else if (memRegister == REG_R14)
else if (memRegister == X86_REG_R14)
{
x64Gen_writeU8(x64GenContext, 0x66);
x64Gen_writeU8(x64GenContext, (xmmRegister < 8) ? 0x41 : 0x45);
@ -502,7 +500,7 @@ void x64Gen_andps_xmmReg_mem128Reg64(x64GenContext_t* x64GenContext, sint32 xmmR
{
// SSE2
// and xmm register with 128 bit value from memory
if( memReg == REG_R15 )
if( memReg == X86_REG_R15 )
{
x64Gen_genSSEVEXPrefix1(x64GenContext, xmmRegisterDest, true); // todo: should be x64Gen_genSSEVEXPrefix2() with memReg?
x64Gen_writeU8(x64GenContext, 0x0F);
@ -528,7 +526,7 @@ void x64Gen_pcmpeqd_xmmReg_mem128Reg64(x64GenContext_t* x64GenContext, sint32 xm
{
// SSE2
// doubleword integer compare
if( memReg == REG_R15 )
if( memReg == X86_REG_R15 )
{
x64Gen_writeU8(x64GenContext, 0x66);
x64Gen_genSSEVEXPrefix1(x64GenContext, xmmRegisterDest, true);
@ -610,7 +608,7 @@ void x64Gen_cvtpi2pd_xmmReg_mem64Reg64(x64GenContext_t* x64GenContext, sint32 xm
{
// SSE2
// converts two signed 32bit integers to two doubles
if( memReg == REG_RSP )
if( memReg == X86_REG_RSP )
{
x64Gen_writeU8(x64GenContext, 0x66);
x64Gen_genSSEVEXPrefix1(x64GenContext, xmmRegisterDest, false);
@ -684,7 +682,7 @@ void x64Gen_rcpss_xmmReg_xmmReg(x64GenContext_t* x64GenContext, sint32 xmmRegist
void x64Gen_mulss_xmmReg_memReg64(x64GenContext_t* x64GenContext, sint32 xmmRegister, sint32 memRegister, uint32 memImmU32)
{
// SSE2
if( memRegister == REG_NONE )
if( memRegister == X86_REG_NONE )
{
assert_dbg();
}


@ -203,7 +203,6 @@ template<class opcodeBytes, typename TA, typename TB>
void _x64Gen_writeMODRM_internal(x64GenContext_t* x64GenContext, TA opA, TB opB)
{
static_assert(TA::getType() == MODRM_OPR_TYPE::REG);
x64Gen_checkBuffer(x64GenContext);
// REX prefix
// 0100 WRXB
if constexpr (TA::getType() == MODRM_OPR_TYPE::REG && TB::getType() == MODRM_OPR_TYPE::REG)

File diff suppressed because it is too large


@ -0,0 +1,16 @@
#pragma once
#include "IMLInstruction.h"
#include "IMLSegment.h"
// optimizer passes
void IMLOptimizer_OptimizeDirectFloatCopies(struct ppcImlGenContext_t* ppcImlGenContext);
void IMLOptimizer_OptimizeDirectIntegerCopies(struct ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompiler_optimizePSQLoadAndStore(struct ppcImlGenContext_t* ppcImlGenContext);
void IMLOptimizer_StandardOptimizationPass(ppcImlGenContext_t& ppcImlGenContext);
// debug
void IMLDebug_DisassembleInstruction(const IMLInstruction& inst, std::string& disassemblyLineOut);
void IMLDebug_DumpSegment(struct ppcImlGenContext_t* ctx, IMLSegment* imlSegment, bool printLivenessRangeInfo = false);
void IMLDebug_Dump(struct ppcImlGenContext_t* ppcImlGenContext, bool printLivenessRangeInfo = false);


@ -0,0 +1,5 @@
#include "IML.h"
//#include "PPCRecompilerIml.h"
#include "util/helpers/fixedSizeList.h"
#include "Cafe/HW/Espresso/Interpreter/PPCInterpreterInternal.h"


@ -0,0 +1,528 @@
#include "IML.h"
#include "IMLInstruction.h"
#include "IMLSegment.h"
#include "IMLRegisterAllocatorRanges.h"
#include "util/helpers/StringBuf.h"
#include "../PPCRecompiler.h"
const char* IMLDebug_GetOpcodeName(const IMLInstruction* iml)
{
static char _tempOpcodename[32];
uint32 op = iml->operation;
if (op == PPCREC_IML_OP_ASSIGN)
return "MOV";
else if (op == PPCREC_IML_OP_ADD)
return "ADD";
else if (op == PPCREC_IML_OP_ADD_WITH_CARRY)
return "ADC";
else if (op == PPCREC_IML_OP_SUB)
return "SUB";
else if (op == PPCREC_IML_OP_OR)
return "OR";
else if (op == PPCREC_IML_OP_AND)
return "AND";
else if (op == PPCREC_IML_OP_XOR)
return "XOR";
else if (op == PPCREC_IML_OP_LEFT_SHIFT)
return "LSH";
else if (op == PPCREC_IML_OP_RIGHT_SHIFT_U)
return "RSH";
else if (op == PPCREC_IML_OP_RIGHT_SHIFT_S)
return "ARSH";
else if (op == PPCREC_IML_OP_LEFT_ROTATE)
return "LROT";
else if (op == PPCREC_IML_OP_MULTIPLY_SIGNED)
return "MULS";
else if (op == PPCREC_IML_OP_DIVIDE_SIGNED)
return "DIVS";
sprintf(_tempOpcodename, "OP0%02x_T%d", iml->operation, iml->type);
return _tempOpcodename;
}
std::string IMLDebug_GetRegName(IMLReg r)
{
std::string regName;
uint32 regId = r.GetRegID();
switch (r.GetRegFormat())
{
case IMLRegFormat::F32:
regName.append("f");
break;
case IMLRegFormat::F64:
regName.append("fd");
break;
case IMLRegFormat::I32:
regName.append("i");
break;
case IMLRegFormat::I64:
regName.append("r");
break;
default:
DEBUG_BREAK;
}
regName.append(fmt::format("{}", regId));
return regName;
}
void IMLDebug_AppendRegisterParam(StringBuf& strOutput, IMLReg virtualRegister, bool isLast = false)
{
strOutput.add(IMLDebug_GetRegName(virtualRegister));
if (!isLast)
strOutput.add(", ");
}
void IMLDebug_AppendS32Param(StringBuf& strOutput, sint32 val, bool isLast = false)
{
if (val < 0)
{
strOutput.add("-");
val = -val;
}
strOutput.addFmt("0x{:08x}", val);
if (!isLast)
strOutput.add(", ");
}
void IMLDebug_PrintLivenessRangeInfo(StringBuf& currentLineText, IMLSegment* imlSegment, sint32 offset)
{
// pad to 70 characters
sint32 index = currentLineText.getLen();
while (index < 70)
{
currentLineText.add(" ");
index++;
}
raLivenessRange* subrangeItr = imlSegment->raInfo.linkedList_allSubranges;
while (subrangeItr)
{
if (subrangeItr->interval.start.GetInstructionIndexEx() == offset)
{
if(subrangeItr->interval.start.IsInstructionIndex() && !subrangeItr->interval.start.IsOnInputEdge())
currentLineText.add(".");
else
currentLineText.add("|");
currentLineText.addFmt("{:<4}", subrangeItr->GetVirtualRegister());
}
else if (subrangeItr->interval.end.GetInstructionIndexEx() == offset)
{
if(subrangeItr->interval.end.IsInstructionIndex() && !subrangeItr->interval.end.IsOnOutputEdge())
currentLineText.add("* ");
else
currentLineText.add("| ");
}
else if (subrangeItr->interval.ContainsInstructionIndexEx(offset))
{
currentLineText.add("| ");
}
else
{
currentLineText.add(" ");
}
index += 5;
// next
subrangeItr = subrangeItr->link_allSegmentRanges.next;
}
}
std::string IMLDebug_GetSegmentName(ppcImlGenContext_t* ctx, IMLSegment* seg)
{
if (!ctx)
{
return "<NoNameWithoutCtx>";
}
// find segment index
for (size_t i = 0; i < ctx->segmentList2.size(); i++)
{
if (ctx->segmentList2[i] == seg)
{
return fmt::format("Seg{:04x}", i);
}
}
return "<SegmentNotInCtx>";
}
std::string IMLDebug_GetConditionName(IMLCondition cond)
{
switch (cond)
{
case IMLCondition::EQ:
return "EQ";
case IMLCondition::NEQ:
return "NEQ";
case IMLCondition::UNSIGNED_GT:
return "UGT";
case IMLCondition::UNSIGNED_LT:
return "ULT";
case IMLCondition::SIGNED_GT:
return "SGT";
case IMLCondition::SIGNED_LT:
return "SLT";
default:
cemu_assert_unimplemented();
}
return "ukn";
}
void IMLDebug_DisassembleInstruction(const IMLInstruction& inst, std::string& disassemblyLineOut)
{
const sint32 lineOffsetParameters = 10;//18;
StringBuf strOutput(1024);
strOutput.reset();
if (inst.type == PPCREC_IML_TYPE_R_NAME || inst.type == PPCREC_IML_TYPE_NAME_R)
{
if (inst.type == PPCREC_IML_TYPE_R_NAME)
strOutput.add("R_NAME");
else
strOutput.add("NAME_R");
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
if(inst.type == PPCREC_IML_TYPE_R_NAME)
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_name.regR);
strOutput.add("name_");
if (inst.op_r_name.name >= PPCREC_NAME_R0 && inst.op_r_name.name < (PPCREC_NAME_R0 + 999))
{
strOutput.addFmt("r{}", inst.op_r_name.name - PPCREC_NAME_R0);
}
else if (inst.op_r_name.name >= PPCREC_NAME_FPR0 && inst.op_r_name.name < (PPCREC_NAME_FPR0 + 999))
{
strOutput.addFmt("f{}", inst.op_r_name.name - PPCREC_NAME_FPR0);
}
else if (inst.op_r_name.name >= PPCREC_NAME_SPR0 && inst.op_r_name.name < (PPCREC_NAME_SPR0 + 999))
{
strOutput.addFmt("spr{}", inst.op_r_name.name - PPCREC_NAME_SPR0);
}
else if (inst.op_r_name.name >= PPCREC_NAME_CR && inst.op_r_name.name <= PPCREC_NAME_CR_LAST)
strOutput.addFmt("cr{}", inst.op_r_name.name - PPCREC_NAME_CR);
else if (inst.op_r_name.name == PPCREC_NAME_XER_CA)
strOutput.add("xer.ca");
else if (inst.op_r_name.name == PPCREC_NAME_XER_SO)
strOutput.add("xer.so");
else if (inst.op_r_name.name == PPCREC_NAME_XER_OV)
strOutput.add("xer.ov");
else if (inst.op_r_name.name == PPCREC_NAME_CPU_MEMRES_EA)
strOutput.add("cpuReservation.ea");
else if (inst.op_r_name.name == PPCREC_NAME_CPU_MEMRES_VAL)
strOutput.add("cpuReservation.value");
else
{
strOutput.addFmt("name_ukn{}", inst.op_r_name.name);
}
if (inst.type != PPCREC_IML_TYPE_R_NAME)
{
strOutput.add(", ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_name.regR, true);
}
}
else if (inst.type == PPCREC_IML_TYPE_R_R)
{
strOutput.addFmt("{}", IMLDebug_GetOpcodeName(&inst));
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r.regR);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r.regA, true);
}
else if (inst.type == PPCREC_IML_TYPE_R_R_R)
{
strOutput.addFmt("{}", IMLDebug_GetOpcodeName(&inst));
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_r.regR);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_r.regA);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_r.regB, true);
}
else if (inst.type == PPCREC_IML_TYPE_R_R_R_CARRY)
{
strOutput.addFmt("{}", IMLDebug_GetOpcodeName(&inst));
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_r_carry.regR);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_r_carry.regA);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_r_carry.regB);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_r_carry.regCarry, true);
}
else if (inst.type == PPCREC_IML_TYPE_COMPARE)
{
strOutput.add("CMP ");
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_compare.regA);
IMLDebug_AppendRegisterParam(strOutput, inst.op_compare.regB);
strOutput.addFmt("{}", IMLDebug_GetConditionName(inst.op_compare.cond));
strOutput.add(" -> ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_compare.regR, true);
}
else if (inst.type == PPCREC_IML_TYPE_COMPARE_S32)
{
strOutput.add("CMP ");
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_compare_s32.regA);
strOutput.addFmt("{}", inst.op_compare_s32.immS32);
strOutput.addFmt(", {}", IMLDebug_GetConditionName(inst.op_compare_s32.cond));
strOutput.add(" -> ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_compare_s32.regR, true);
}
else if (inst.type == PPCREC_IML_TYPE_CONDITIONAL_JUMP)
{
strOutput.add("CJUMP ");
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_conditional_jump.registerBool, true);
if (!inst.op_conditional_jump.mustBeTrue)
strOutput.add("(inverted)");
}
else if (inst.type == PPCREC_IML_TYPE_JUMP)
{
strOutput.add("JUMP");
}
else if (inst.type == PPCREC_IML_TYPE_R_R_S32)
{
strOutput.addFmt("{}", IMLDebug_GetOpcodeName(&inst));
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_s32.regR);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_s32.regA);
IMLDebug_AppendS32Param(strOutput, inst.op_r_r_s32.immS32, true);
}
else if (inst.type == PPCREC_IML_TYPE_R_R_S32_CARRY)
{
strOutput.addFmt("{}", IMLDebug_GetOpcodeName(&inst));
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_s32_carry.regR);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_s32_carry.regA);
IMLDebug_AppendS32Param(strOutput, inst.op_r_r_s32_carry.immS32);
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_r_s32_carry.regCarry, true);
}
else if (inst.type == PPCREC_IML_TYPE_R_S32)
{
strOutput.addFmt("{}", IMLDebug_GetOpcodeName(&inst));
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_r_immS32.regR);
IMLDebug_AppendS32Param(strOutput, inst.op_r_immS32.immS32, true);
}
else if (inst.type == PPCREC_IML_TYPE_LOAD || inst.type == PPCREC_IML_TYPE_STORE ||
inst.type == PPCREC_IML_TYPE_LOAD_INDEXED || inst.type == PPCREC_IML_TYPE_STORE_INDEXED)
{
if (inst.type == PPCREC_IML_TYPE_LOAD || inst.type == PPCREC_IML_TYPE_LOAD_INDEXED)
strOutput.add("LD_");
else
strOutput.add("ST_");
if (inst.op_storeLoad.flags2.signExtend)
strOutput.add("S");
else
strOutput.add("U");
strOutput.addFmt("{}", inst.op_storeLoad.copyWidth);
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_storeLoad.registerData);
if (inst.type == PPCREC_IML_TYPE_LOAD_INDEXED || inst.type == PPCREC_IML_TYPE_STORE_INDEXED)
strOutput.addFmt("[{}+{}]", IMLDebug_GetRegName(inst.op_storeLoad.registerMem), IMLDebug_GetRegName(inst.op_storeLoad.registerMem2));
else
strOutput.addFmt("[{}+{}]", IMLDebug_GetRegName(inst.op_storeLoad.registerMem), inst.op_storeLoad.immS32);
}
else if (inst.type == PPCREC_IML_TYPE_ATOMIC_CMP_STORE)
{
strOutput.add("ATOMIC_ST_U32");
while ((sint32)strOutput.getLen() < lineOffsetParameters)
strOutput.add(" ");
IMLDebug_AppendRegisterParam(strOutput, inst.op_atomic_compare_store.regEA);
IMLDebug_AppendRegisterParam(strOutput, inst.op_atomic_compare_store.regCompareValue);
IMLDebug_AppendRegisterParam(strOutput, inst.op_atomic_compare_store.regWriteValue);
IMLDebug_AppendRegisterParam(strOutput, inst.op_atomic_compare_store.regBoolOut, true);
}
else if (inst.type == PPCREC_IML_TYPE_NO_OP)
{
strOutput.add("NOP");
}
else if (inst.type == PPCREC_IML_TYPE_MACRO)
{
if (inst.operation == PPCREC_IML_MACRO_B_TO_REG)
{
strOutput.addFmt("MACRO B_TO_REG {}", IMLDebug_GetRegName(inst.op_macro.paramReg));
}
else if (inst.operation == PPCREC_IML_MACRO_BL)
{
strOutput.addFmt("MACRO BL 0x{:08x} -> 0x{:08x} cycles (depr): {}", inst.op_macro.param, inst.op_macro.param2, (sint32)inst.op_macro.paramU16);
}
else if (inst.operation == PPCREC_IML_MACRO_B_FAR)
{
strOutput.addFmt("MACRO B_FAR 0x{:08x} -> 0x{:08x} cycles (depr): {}", inst.op_macro.param, inst.op_macro.param2, (sint32)inst.op_macro.paramU16);
}
else if (inst.operation == PPCREC_IML_MACRO_LEAVE)
{
strOutput.addFmt("MACRO LEAVE ppc: 0x{:08x}", inst.op_macro.param);
}
else if (inst.operation == PPCREC_IML_MACRO_HLE)
{
strOutput.addFmt("MACRO HLE ppcAddr: 0x{:08x} funcId: 0x{:08x}", inst.op_macro.param, inst.op_macro.param2);
}
else if (inst.operation == PPCREC_IML_MACRO_COUNT_CYCLES)
{
strOutput.addFmt("MACRO COUNT_CYCLES cycles: {}", inst.op_macro.param);
}
else
{
strOutput.addFmt("MACRO ukn operation {}", inst.operation);
}
}
else if (inst.type == PPCREC_IML_TYPE_FPR_LOAD)
{
strOutput.addFmt("{} = ", IMLDebug_GetRegName(inst.op_storeLoad.registerData));
if (inst.op_storeLoad.flags2.signExtend)
strOutput.add("S");
else
strOutput.add("U");
strOutput.addFmt("{} [{}+{}] mode {}", inst.op_storeLoad.copyWidth / 8, IMLDebug_GetRegName(inst.op_storeLoad.registerMem), inst.op_storeLoad.immS32, inst.op_storeLoad.mode);
if (inst.op_storeLoad.flags2.notExpanded)
{
strOutput.addFmt(" <No expand>");
}
}
else if (inst.type == PPCREC_IML_TYPE_FPR_STORE)
{
if (inst.op_storeLoad.flags2.signExtend)
strOutput.add("S");
else
strOutput.add("U");
strOutput.addFmt("{} [t{}+{}]", inst.op_storeLoad.copyWidth / 8, inst.op_storeLoad.registerMem.GetRegID(), inst.op_storeLoad.immS32);
strOutput.addFmt(" = {} mode {}", IMLDebug_GetRegName(inst.op_storeLoad.registerData), inst.op_storeLoad.mode);
}
else if (inst.type == PPCREC_IML_TYPE_FPR_R_R)
{
strOutput.addFmt("{:>6} ", IMLDebug_GetOpcodeName(&inst));
strOutput.addFmt("{}, {}", IMLDebug_GetRegName(inst.op_fpr_r_r.regR), IMLDebug_GetRegName(inst.op_fpr_r_r.regA));
}
else if (inst.type == PPCREC_IML_TYPE_FPR_R_R_R_R)
{
strOutput.addFmt("{:>6} ", IMLDebug_GetOpcodeName(&inst));
strOutput.addFmt("{}, {}, {}, {}", IMLDebug_GetRegName(inst.op_fpr_r_r_r_r.regR), IMLDebug_GetRegName(inst.op_fpr_r_r_r_r.regA), IMLDebug_GetRegName(inst.op_fpr_r_r_r_r.regB), IMLDebug_GetRegName(inst.op_fpr_r_r_r_r.regC));
}
else if (inst.type == PPCREC_IML_TYPE_FPR_R_R_R)
{
strOutput.addFmt("{:>6} ", IMLDebug_GetOpcodeName(&inst));
strOutput.addFmt("{}, {}, {}", IMLDebug_GetRegName(inst.op_fpr_r_r_r.regR), IMLDebug_GetRegName(inst.op_fpr_r_r_r.regA), IMLDebug_GetRegName(inst.op_fpr_r_r_r.regB));
}
else if (inst.type == PPCREC_IML_TYPE_CJUMP_CYCLE_CHECK)
{
strOutput.addFmt("CYCLE_CHECK");
}
else if (inst.type == PPCREC_IML_TYPE_X86_EFLAGS_JCC)
{
strOutput.addFmt("X86_JCC {}", IMLDebug_GetConditionName(inst.op_x86_eflags_jcc.cond));
}
else
{
strOutput.addFmt("Unknown iml type {}", inst.type);
}
disassemblyLineOut.assign(strOutput.c_str());
}
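// For orientation, a few hypothetical lines in the format produced above (opcode padded to
// column 10, operands comma-separated; register names come from IMLDebug_GetRegName):
//   ADD       i3, i5, 0x00000010
//   CMP       i3, i7, SGT -> i9
//   LD_S32    i4, [i3+32]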
void IMLDebug_DumpSegment(ppcImlGenContext_t* ctx, IMLSegment* imlSegment, bool printLivenessRangeInfo)
{
StringBuf strOutput(4096);
strOutput.addFmt("SEGMENT {} | PPC=0x{:08x} Loop-depth {}", IMLDebug_GetSegmentName(ctx, imlSegment), imlSegment->ppcAddress, imlSegment->loopDepth);
if (imlSegment->isEnterable)
{
strOutput.addFmt(" ENTERABLE (0x{:08x})", imlSegment->enterPPCAddress);
}
if (imlSegment->deadCodeEliminationHintSeg)
{
strOutput.addFmt(" InheritOverwrite: {}", IMLDebug_GetSegmentName(ctx, imlSegment->deadCodeEliminationHintSeg));
}
cemuLog_log(LogType::Force, "{}", strOutput.c_str());
if (printLivenessRangeInfo)
{
strOutput.reset();
IMLDebug_PrintLivenessRangeInfo(strOutput, imlSegment, RA_INTER_RANGE_START);
cemuLog_log(LogType::Force, "{}", strOutput.c_str());
}
//debug_printf("\n");
strOutput.reset();
std::string disassemblyLine;
for (sint32 i = 0; i < imlSegment->imlList.size(); i++)
{
const IMLInstruction& inst = imlSegment->imlList[i];
// don't log NOP instructions
if (inst.type == PPCREC_IML_TYPE_NO_OP)
continue;
strOutput.reset();
strOutput.addFmt("{:02x} ", i);
//cemuLog_log(LogType::Force, "{:02x} ", i);
disassemblyLine.clear();
IMLDebug_DisassembleInstruction(inst, disassemblyLine);
strOutput.add(disassemblyLine);
if (printLivenessRangeInfo)
{
IMLDebug_PrintLivenessRangeInfo(strOutput, imlSegment, i);
}
cemuLog_log(LogType::Force, "{}", strOutput.c_str());
}
// all ranges
if (printLivenessRangeInfo)
{
strOutput.reset();
strOutput.add("Ranges-VirtReg ");
raLivenessRange* subrangeItr = imlSegment->raInfo.linkedList_allSubranges;
while (subrangeItr)
{
strOutput.addFmt("v{:<4}", (uint32)subrangeItr->GetVirtualRegister());
subrangeItr = subrangeItr->link_allSegmentRanges.next;
}
cemuLog_log(LogType::Force, "{}", strOutput.c_str());
strOutput.reset();
strOutput.add("Ranges-PhysReg ");
subrangeItr = imlSegment->raInfo.linkedList_allSubranges;
while (subrangeItr)
{
strOutput.addFmt("p{:<4}", subrangeItr->GetPhysicalRegister());
subrangeItr = subrangeItr->link_allSegmentRanges.next;
}
cemuLog_log(LogType::Force, "{}", strOutput.c_str());
}
// branch info
strOutput.reset();
strOutput.add("Links from: ");
for (sint32 i = 0; i < imlSegment->list_prevSegments.size(); i++)
{
if (i)
strOutput.add(", ");
strOutput.addFmt("{}", IMLDebug_GetSegmentName(ctx, imlSegment->list_prevSegments[i]).c_str());
}
cemuLog_log(LogType::Force, "{}", strOutput.c_str());
if (imlSegment->nextSegmentBranchNotTaken)
cemuLog_log(LogType::Force, "BranchNotTaken: {}", IMLDebug_GetSegmentName(ctx, imlSegment->nextSegmentBranchNotTaken).c_str());
if (imlSegment->nextSegmentBranchTaken)
cemuLog_log(LogType::Force, "BranchTaken: {}", IMLDebug_GetSegmentName(ctx, imlSegment->nextSegmentBranchTaken).c_str());
if (imlSegment->nextSegmentIsUncertain)
cemuLog_log(LogType::Force, "Dynamic target");
}
void IMLDebug_Dump(ppcImlGenContext_t* ppcImlGenContext, bool printLivenessRangeInfo)
{
for (size_t i = 0; i < ppcImlGenContext->segmentList2.size(); i++)
{
IMLDebug_DumpSegment(ppcImlGenContext, ppcImlGenContext->segmentList2[i], printLivenessRangeInfo);
cemuLog_log(LogType::Force, "");
}
}
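A brief usage sketch (the call site below is hypothetical; IMLDebug_Dump and ppcImlGenContext_t are the declarations shown above):
void DumpRecompiledFunctionIML(ppcImlGenContext_t* ppcImlGenContext)
{
	// Logs every segment and its instructions through cemuLog_log(LogType::Force, ...),
	// optionally annotated with register liveness ranges.
	IMLDebug_Dump(ppcImlGenContext, true);
}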


@ -0,0 +1,669 @@
#include "IMLInstruction.h"
#include "IML.h"
#include "../PPCRecompiler.h"
#include "../PPCRecompilerIml.h"
// return true if an instruction has side effects on top of just reading and writing registers
bool IMLInstruction::HasSideEffects() const
{
bool hasSideEffects = true;
if(type == PPCREC_IML_TYPE_R_R || type == PPCREC_IML_TYPE_R_R_S32 || type == PPCREC_IML_TYPE_COMPARE || type == PPCREC_IML_TYPE_COMPARE_S32)
hasSideEffects = false;
// todo - add more cases
return hasSideEffects;
}
void IMLInstruction::CheckRegisterUsage(IMLUsedRegisters* registersUsed) const
{
registersUsed->readGPR1 = IMLREG_INVALID;
registersUsed->readGPR2 = IMLREG_INVALID;
registersUsed->readGPR3 = IMLREG_INVALID;
registersUsed->readGPR4 = IMLREG_INVALID;
registersUsed->writtenGPR1 = IMLREG_INVALID;
registersUsed->writtenGPR2 = IMLREG_INVALID;
if (type == PPCREC_IML_TYPE_R_NAME)
{
registersUsed->writtenGPR1 = op_r_name.regR;
}
else if (type == PPCREC_IML_TYPE_NAME_R)
{
registersUsed->readGPR1 = op_r_name.regR;
}
else if (type == PPCREC_IML_TYPE_R_R)
{
if (operation == PPCREC_IML_OP_X86_CMP)
{
// both operands are read only
registersUsed->readGPR1 = op_r_r.regR;
registersUsed->readGPR2 = op_r_r.regA;
}
else if (
operation == PPCREC_IML_OP_ASSIGN ||
operation == PPCREC_IML_OP_ENDIAN_SWAP ||
operation == PPCREC_IML_OP_CNTLZW ||
operation == PPCREC_IML_OP_NOT ||
operation == PPCREC_IML_OP_NEG ||
operation == PPCREC_IML_OP_ASSIGN_S16_TO_S32 ||
operation == PPCREC_IML_OP_ASSIGN_S8_TO_S32)
{
// result is written, operand is read
registersUsed->writtenGPR1 = op_r_r.regR;
registersUsed->readGPR1 = op_r_r.regA;
}
else
cemu_assert_unimplemented();
}
else if (type == PPCREC_IML_TYPE_R_S32)
{
cemu_assert_debug(operation != PPCREC_IML_OP_ADD &&
operation != PPCREC_IML_OP_SUB &&
operation != PPCREC_IML_OP_AND &&
operation != PPCREC_IML_OP_OR &&
operation != PPCREC_IML_OP_XOR); // deprecated, use r_r_s32 for these
if (operation == PPCREC_IML_OP_LEFT_ROTATE)
{
// register operand is read and write
registersUsed->readGPR1 = op_r_immS32.regR;
registersUsed->writtenGPR1 = op_r_immS32.regR;
}
else if (operation == PPCREC_IML_OP_X86_CMP)
{
// register operand is read only
registersUsed->readGPR1 = op_r_immS32.regR;
}
else
{
// register operand is write only
// todo - use explicit lists, avoid default cases
registersUsed->writtenGPR1 = op_r_immS32.regR;
}
}
else if (type == PPCREC_IML_TYPE_R_R_S32)
{
registersUsed->writtenGPR1 = op_r_r_s32.regR;
registersUsed->readGPR1 = op_r_r_s32.regA;
}
else if (type == PPCREC_IML_TYPE_R_R_S32_CARRY)
{
registersUsed->writtenGPR1 = op_r_r_s32_carry.regR;
registersUsed->readGPR1 = op_r_r_s32_carry.regA;
// some operations read carry
switch (operation)
{
case PPCREC_IML_OP_ADD_WITH_CARRY:
registersUsed->readGPR2 = op_r_r_s32_carry.regCarry;
break;
case PPCREC_IML_OP_ADD:
break;
default:
cemu_assert_unimplemented();
}
// carry is always written
registersUsed->writtenGPR2 = op_r_r_s32_carry.regCarry;
}
else if (type == PPCREC_IML_TYPE_R_R_R)
{
// in all cases result is written and other operands are read only
// with the exception of XOR, where if regA == regB then all bits are zeroed out. So we don't consider it a read
registersUsed->writtenGPR1 = op_r_r_r.regR;
if(!(operation == PPCREC_IML_OP_XOR && op_r_r_r.regA == op_r_r_r.regB))
{
registersUsed->readGPR1 = op_r_r_r.regA;
registersUsed->readGPR2 = op_r_r_r.regB;
}
}
else if (type == PPCREC_IML_TYPE_R_R_R_CARRY)
{
registersUsed->writtenGPR1 = op_r_r_r_carry.regR;
registersUsed->readGPR1 = op_r_r_r_carry.regA;
registersUsed->readGPR2 = op_r_r_r_carry.regB;
// some operations read carry
switch (operation)
{
case PPCREC_IML_OP_ADD_WITH_CARRY:
registersUsed->readGPR3 = op_r_r_r_carry.regCarry;
break;
case PPCREC_IML_OP_ADD:
break;
default:
cemu_assert_unimplemented();
}
// carry is always written
registersUsed->writtenGPR2 = op_r_r_r_carry.regCarry;
}
else if (type == PPCREC_IML_TYPE_CJUMP_CYCLE_CHECK)
{
// no effect on registers
}
else if (type == PPCREC_IML_TYPE_NO_OP)
{
// no effect on registers
}
else if (type == PPCREC_IML_TYPE_MACRO)
{
if (operation == PPCREC_IML_MACRO_BL || operation == PPCREC_IML_MACRO_B_FAR || operation == PPCREC_IML_MACRO_LEAVE || operation == PPCREC_IML_MACRO_DEBUGBREAK || operation == PPCREC_IML_MACRO_COUNT_CYCLES || operation == PPCREC_IML_MACRO_HLE)
{
// no effect on registers
}
else if (operation == PPCREC_IML_MACRO_B_TO_REG)
{
cemu_assert_debug(op_macro.paramReg.IsValid());
registersUsed->readGPR1 = op_macro.paramReg;
}
else
cemu_assert_unimplemented();
}
else if (type == PPCREC_IML_TYPE_COMPARE)
{
registersUsed->readGPR1 = op_compare.regA;
registersUsed->readGPR2 = op_compare.regB;
registersUsed->writtenGPR1 = op_compare.regR;
}
else if (type == PPCREC_IML_TYPE_COMPARE_S32)
{
registersUsed->readGPR1 = op_compare_s32.regA;
registersUsed->writtenGPR1 = op_compare_s32.regR;
}
else if (type == PPCREC_IML_TYPE_CONDITIONAL_JUMP)
{
registersUsed->readGPR1 = op_conditional_jump.registerBool;
}
else if (type == PPCREC_IML_TYPE_JUMP)
{
// no registers affected
}
else if (type == PPCREC_IML_TYPE_LOAD)
{
registersUsed->writtenGPR1 = op_storeLoad.registerData;
if (op_storeLoad.registerMem.IsValid())
registersUsed->readGPR1 = op_storeLoad.registerMem;
}
else if (type == PPCREC_IML_TYPE_LOAD_INDEXED)
{
registersUsed->writtenGPR1 = op_storeLoad.registerData;
if (op_storeLoad.registerMem.IsValid())
registersUsed->readGPR1 = op_storeLoad.registerMem;
if (op_storeLoad.registerMem2.IsValid())
registersUsed->readGPR2 = op_storeLoad.registerMem2;
}
else if (type == PPCREC_IML_TYPE_STORE)
{
registersUsed->readGPR1 = op_storeLoad.registerData;
if (op_storeLoad.registerMem.IsValid())
registersUsed->readGPR2 = op_storeLoad.registerMem;
}
else if (type == PPCREC_IML_TYPE_STORE_INDEXED)
{
registersUsed->readGPR1 = op_storeLoad.registerData;
if (op_storeLoad.registerMem.IsValid())
registersUsed->readGPR2 = op_storeLoad.registerMem;
if (op_storeLoad.registerMem2.IsValid())
registersUsed->readGPR3 = op_storeLoad.registerMem2;
}
else if (type == PPCREC_IML_TYPE_ATOMIC_CMP_STORE)
{
registersUsed->readGPR1 = op_atomic_compare_store.regEA;
registersUsed->readGPR2 = op_atomic_compare_store.regCompareValue;
registersUsed->readGPR3 = op_atomic_compare_store.regWriteValue;
registersUsed->writtenGPR1 = op_atomic_compare_store.regBoolOut;
}
else if (type == PPCREC_IML_TYPE_CALL_IMM)
{
if (op_call_imm.regParam0.IsValid())
registersUsed->readGPR1 = op_call_imm.regParam0;
if (op_call_imm.regParam1.IsValid())
registersUsed->readGPR2 = op_call_imm.regParam1;
if (op_call_imm.regParam2.IsValid())
registersUsed->readGPR3 = op_call_imm.regParam2;
registersUsed->writtenGPR1 = op_call_imm.regReturn;
}
else if (type == PPCREC_IML_TYPE_FPR_LOAD)
{
// fpr load operation
registersUsed->writtenGPR1 = op_storeLoad.registerData;
// address is in gpr register
if (op_storeLoad.registerMem.IsValid())
registersUsed->readGPR1 = op_storeLoad.registerMem;
// determine partially written result
switch (op_storeLoad.mode)
{
case PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0:
case PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0_PS1:
cemu_assert_debug(op_storeLoad.registerGQR.IsValid());
registersUsed->readGPR2 = op_storeLoad.registerGQR;
break;
case PPCREC_FPR_LD_MODE_DOUBLE_INTO_PS0:
// PS1 remains the same
cemu_assert_debug(op_storeLoad.registerGQR.IsInvalid());
registersUsed->readGPR2 = op_storeLoad.registerData;
break;
case PPCREC_FPR_LD_MODE_SINGLE_INTO_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0:
case PPCREC_FPR_LD_MODE_PSQ_S16_PS0:
case PPCREC_FPR_LD_MODE_PSQ_S16_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_U16_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_U16_PS0:
case PPCREC_FPR_LD_MODE_PSQ_S8_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_U8_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_U8_PS0:
case PPCREC_FPR_LD_MODE_PSQ_S8_PS0:
cemu_assert_debug(op_storeLoad.registerGQR.IsInvalid());
break;
default:
cemu_assert_unimplemented();
}
}
else if (type == PPCREC_IML_TYPE_FPR_LOAD_INDEXED)
{
// fpr load operation
registersUsed->writtenGPR1 = op_storeLoad.registerData;
// address is in gpr registers
if (op_storeLoad.registerMem.IsValid())
registersUsed->readGPR1 = op_storeLoad.registerMem;
if (op_storeLoad.registerMem2.IsValid())
registersUsed->readGPR2 = op_storeLoad.registerMem2;
// determine partially written result
switch (op_storeLoad.mode)
{
case PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0:
case PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0_PS1:
cemu_assert_debug(op_storeLoad.registerGQR.IsValid());
registersUsed->readGPR3 = op_storeLoad.registerGQR;
break;
case PPCREC_FPR_LD_MODE_DOUBLE_INTO_PS0:
// PS1 remains the same
cemu_assert_debug(op_storeLoad.registerGQR.IsInvalid());
registersUsed->readGPR3 = op_storeLoad.registerData;
break;
case PPCREC_FPR_LD_MODE_SINGLE_INTO_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0:
case PPCREC_FPR_LD_MODE_PSQ_S16_PS0:
case PPCREC_FPR_LD_MODE_PSQ_S16_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_U16_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_U16_PS0:
case PPCREC_FPR_LD_MODE_PSQ_S8_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_U8_PS0_PS1:
case PPCREC_FPR_LD_MODE_PSQ_U8_PS0:
cemu_assert_debug(op_storeLoad.registerGQR.IsInvalid());
break;
default:
cemu_assert_unimplemented();
}
}
else if (type == PPCREC_IML_TYPE_FPR_STORE)
{
// fpr store operation
registersUsed->readGPR1 = op_storeLoad.registerData;
if (op_storeLoad.registerMem.IsValid())
registersUsed->readGPR2 = op_storeLoad.registerMem;
// PSQ generic stores also access GQR
switch (op_storeLoad.mode)
{
case PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0:
case PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0_PS1:
cemu_assert_debug(op_storeLoad.registerGQR.IsValid());
registersUsed->readGPR3 = op_storeLoad.registerGQR;
break;
default:
cemu_assert_debug(op_storeLoad.registerGQR.IsInvalid());
break;
}
}
else if (type == PPCREC_IML_TYPE_FPR_STORE_INDEXED)
{
// fpr store operation
registersUsed->readGPR1 = op_storeLoad.registerData;
// address is in gpr registers
if (op_storeLoad.registerMem.IsValid())
registersUsed->readGPR2 = op_storeLoad.registerMem;
if (op_storeLoad.registerMem2.IsValid())
registersUsed->readGPR3 = op_storeLoad.registerMem2;
// PSQ generic stores also access GQR
switch (op_storeLoad.mode)
{
case PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0:
case PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0_PS1:
cemu_assert_debug(op_storeLoad.registerGQR.IsValid());
registersUsed->readGPR4 = op_storeLoad.registerGQR;
break;
default:
cemu_assert_debug(op_storeLoad.registerGQR.IsInvalid());
break;
}
}
else if (type == PPCREC_IML_TYPE_FPR_R_R)
{
// fpr operation
if (operation == PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_BOTTOM_AND_TOP ||
operation == PPCREC_IML_OP_FPR_COPY_TOP_TO_BOTTOM_AND_TOP ||
operation == PPCREC_IML_OP_FPR_COPY_BOTTOM_AND_TOP_SWAPPED ||
operation == PPCREC_IML_OP_ASSIGN ||
operation == PPCREC_IML_OP_FPR_NEGATE_PAIR ||
operation == PPCREC_IML_OP_FPR_ABS_PAIR ||
operation == PPCREC_IML_OP_FPR_FRES_PAIR ||
operation == PPCREC_IML_OP_FPR_FRSQRTE_PAIR)
{
// operand read, result written
registersUsed->readGPR1 = op_fpr_r_r.regA;
registersUsed->writtenGPR1 = op_fpr_r_r.regR;
}
else if (
operation == PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_BOTTOM ||
operation == PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_TOP ||
operation == PPCREC_IML_OP_FPR_COPY_TOP_TO_TOP ||
operation == PPCREC_IML_OP_FPR_COPY_TOP_TO_BOTTOM ||
operation == PPCREC_IML_OP_FPR_EXPAND_BOTTOM32_TO_BOTTOM64_AND_TOP64 ||
operation == PPCREC_IML_OP_FPR_BOTTOM_FCTIWZ ||
operation == PPCREC_IML_OP_FPR_BOTTOM_RECIPROCAL_SQRT
)
{
// operand read, result read and (partially) written
registersUsed->readGPR1 = op_fpr_r_r.regA;
registersUsed->readGPR2 = op_fpr_r_r.regR;
registersUsed->writtenGPR1 = op_fpr_r_r.regR;
}
else if (operation == PPCREC_IML_OP_FPR_MULTIPLY_BOTTOM ||
operation == PPCREC_IML_OP_FPR_MULTIPLY_PAIR ||
operation == PPCREC_IML_OP_FPR_DIVIDE_BOTTOM ||
operation == PPCREC_IML_OP_FPR_DIVIDE_PAIR ||
operation == PPCREC_IML_OP_FPR_ADD_BOTTOM ||
operation == PPCREC_IML_OP_FPR_ADD_PAIR ||
operation == PPCREC_IML_OP_FPR_SUB_PAIR ||
operation == PPCREC_IML_OP_FPR_SUB_BOTTOM)
{
// operand read, result read and written
registersUsed->readGPR1 = op_fpr_r_r.regA;
registersUsed->readGPR2 = op_fpr_r_r.regR;
registersUsed->writtenGPR1 = op_fpr_r_r.regR;
}
else if (operation == PPCREC_IML_OP_FPR_FCMPU_BOTTOM ||
operation == PPCREC_IML_OP_FPR_FCMPU_TOP ||
operation == PPCREC_IML_OP_FPR_FCMPO_BOTTOM)
{
// operand read, result read
registersUsed->readGPR1 = op_fpr_r_r.regA;
registersUsed->readGPR2 = op_fpr_r_r.regR;
}
else
cemu_assert_unimplemented();
}
else if (type == PPCREC_IML_TYPE_FPR_R_R_R)
{
// fpr operation
registersUsed->readGPR1 = op_fpr_r_r_r.regA;
registersUsed->readGPR2 = op_fpr_r_r_r.regB;
registersUsed->writtenGPR1 = op_fpr_r_r_r.regR;
// handle partially written result
switch (operation)
{
case PPCREC_IML_OP_FPR_MULTIPLY_BOTTOM:
case PPCREC_IML_OP_FPR_ADD_BOTTOM:
case PPCREC_IML_OP_FPR_SUB_BOTTOM:
registersUsed->readGPR3 = op_fpr_r_r_r.regR;
break;
case PPCREC_IML_OP_FPR_SUB_PAIR:
break;
default:
cemu_assert_unimplemented();
}
}
else if (type == PPCREC_IML_TYPE_FPR_R_R_R_R)
{
// fpr operation
registersUsed->readGPR1 = op_fpr_r_r_r_r.regA;
registersUsed->readGPR2 = op_fpr_r_r_r_r.regB;
registersUsed->readGPR3 = op_fpr_r_r_r_r.regC;
registersUsed->writtenGPR1 = op_fpr_r_r_r_r.regR;
// handle partially written result
switch (operation)
{
case PPCREC_IML_OP_FPR_SELECT_BOTTOM:
registersUsed->readGPR4 = op_fpr_r_r_r_r.regR;
break;
case PPCREC_IML_OP_FPR_SUM0:
case PPCREC_IML_OP_FPR_SUM1:
case PPCREC_IML_OP_FPR_SELECT_PAIR:
break;
default:
cemu_assert_unimplemented();
}
}
else if (type == PPCREC_IML_TYPE_FPR_R)
{
// fpr operation
if (operation == PPCREC_IML_OP_FPR_NEGATE_BOTTOM ||
operation == PPCREC_IML_OP_FPR_ABS_BOTTOM ||
operation == PPCREC_IML_OP_FPR_NEGATIVE_ABS_BOTTOM ||
operation == PPCREC_IML_OP_FPR_EXPAND_BOTTOM32_TO_BOTTOM64_AND_TOP64 ||
operation == PPCREC_IML_OP_FPR_ROUND_TO_SINGLE_PRECISION_BOTTOM ||
operation == PPCREC_IML_OP_FPR_ROUND_TO_SINGLE_PRECISION_PAIR)
{
registersUsed->readGPR1 = op_fpr_r.regR;
registersUsed->writtenGPR1 = op_fpr_r.regR;
}
else
cemu_assert_unimplemented();
}
else if (type == PPCREC_IML_TYPE_FPR_COMPARE)
{
registersUsed->writtenGPR1 = op_fpr_compare.regR;
registersUsed->readGPR1 = op_fpr_compare.regA;
registersUsed->readGPR2 = op_fpr_compare.regB;
}
else if (type == PPCREC_IML_TYPE_X86_EFLAGS_JCC)
{
// no registers read or written (except for the implicit eflags)
}
else
{
cemu_assert_unimplemented();
}
}
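// As an illustration of how this analysis can be consumed (hypothetical helper, not part of
// this diff): a pass fills an IMLUsedRegisters via CheckRegisterUsage() and queries it afterwards.
// Sketch: true if 'inst' writes virtual register 'regId' without also reading it,
// i.e. the previous value of that register is dead at this instruction.
static bool InstructionOverwritesRegister(const IMLInstruction& inst, IMLRegID regId)
{
	IMLUsedRegisters used;
	inst.CheckRegisterUsage(&used);
	bool isRead = false;
	used.ForEachReadGPR([&](IMLReg r) {
		if (r.GetRegID() == regId)
			isRead = true;
	});
	return !isRead && used.IsWrittenByRegId(regId);
}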
IMLReg replaceRegisterIdMultiple(IMLReg reg, const std::unordered_map<IMLRegID, IMLRegID>& translationTable)
{
if (reg.IsInvalid())
return reg;
const auto& it = translationTable.find(reg.GetRegID());
cemu_assert_debug(it != translationTable.cend());
IMLReg alteredReg = reg;
alteredReg.SetRegID(it->second);
return alteredReg;
}
void IMLInstruction::RewriteGPR(const std::unordered_map<IMLRegID, IMLRegID>& translationTable)
{
if (type == PPCREC_IML_TYPE_R_NAME)
{
op_r_name.regR = replaceRegisterIdMultiple(op_r_name.regR, translationTable);
}
else if (type == PPCREC_IML_TYPE_NAME_R)
{
op_r_name.regR = replaceRegisterIdMultiple(op_r_name.regR, translationTable);
}
else if (type == PPCREC_IML_TYPE_R_R)
{
op_r_r.regR = replaceRegisterIdMultiple(op_r_r.regR, translationTable);
op_r_r.regA = replaceRegisterIdMultiple(op_r_r.regA, translationTable);
}
else if (type == PPCREC_IML_TYPE_R_S32)
{
op_r_immS32.regR = replaceRegisterIdMultiple(op_r_immS32.regR, translationTable);
}
else if (type == PPCREC_IML_TYPE_R_R_S32)
{
op_r_r_s32.regR = replaceRegisterIdMultiple(op_r_r_s32.regR, translationTable);
op_r_r_s32.regA = replaceRegisterIdMultiple(op_r_r_s32.regA, translationTable);
}
else if (type == PPCREC_IML_TYPE_R_R_S32_CARRY)
{
op_r_r_s32_carry.regR = replaceRegisterIdMultiple(op_r_r_s32_carry.regR, translationTable);
op_r_r_s32_carry.regA = replaceRegisterIdMultiple(op_r_r_s32_carry.regA, translationTable);
op_r_r_s32_carry.regCarry = replaceRegisterIdMultiple(op_r_r_s32_carry.regCarry, translationTable);
}
else if (type == PPCREC_IML_TYPE_R_R_R)
{
op_r_r_r.regR = replaceRegisterIdMultiple(op_r_r_r.regR, translationTable);
op_r_r_r.regA = replaceRegisterIdMultiple(op_r_r_r.regA, translationTable);
op_r_r_r.regB = replaceRegisterIdMultiple(op_r_r_r.regB, translationTable);
}
else if (type == PPCREC_IML_TYPE_R_R_R_CARRY)
{
op_r_r_r_carry.regR = replaceRegisterIdMultiple(op_r_r_r_carry.regR, translationTable);
op_r_r_r_carry.regA = replaceRegisterIdMultiple(op_r_r_r_carry.regA, translationTable);
op_r_r_r_carry.regB = replaceRegisterIdMultiple(op_r_r_r_carry.regB, translationTable);
op_r_r_r_carry.regCarry = replaceRegisterIdMultiple(op_r_r_r_carry.regCarry, translationTable);
}
else if (type == PPCREC_IML_TYPE_COMPARE)
{
op_compare.regR = replaceRegisterIdMultiple(op_compare.regR, translationTable);
op_compare.regA = replaceRegisterIdMultiple(op_compare.regA, translationTable);
op_compare.regB = replaceRegisterIdMultiple(op_compare.regB, translationTable);
}
else if (type == PPCREC_IML_TYPE_COMPARE_S32)
{
op_compare_s32.regR = replaceRegisterIdMultiple(op_compare_s32.regR, translationTable);
op_compare_s32.regA = replaceRegisterIdMultiple(op_compare_s32.regA, translationTable);
}
else if (type == PPCREC_IML_TYPE_CONDITIONAL_JUMP)
{
op_conditional_jump.registerBool = replaceRegisterIdMultiple(op_conditional_jump.registerBool, translationTable);
}
else if (type == PPCREC_IML_TYPE_CJUMP_CYCLE_CHECK || type == PPCREC_IML_TYPE_JUMP)
{
// no effect on registers
}
else if (type == PPCREC_IML_TYPE_NO_OP)
{
// no effect on registers
}
else if (type == PPCREC_IML_TYPE_MACRO)
{
if (operation == PPCREC_IML_MACRO_BL || operation == PPCREC_IML_MACRO_B_FAR || operation == PPCREC_IML_MACRO_LEAVE || operation == PPCREC_IML_MACRO_DEBUGBREAK || operation == PPCREC_IML_MACRO_HLE || operation == PPCREC_IML_MACRO_COUNT_CYCLES)
{
// no effect on registers
}
else if (operation == PPCREC_IML_MACRO_B_TO_REG)
{
op_macro.paramReg = replaceRegisterIdMultiple(op_macro.paramReg, translationTable);
}
else
{
cemu_assert_unimplemented();
}
}
else if (type == PPCREC_IML_TYPE_LOAD)
{
op_storeLoad.registerData = replaceRegisterIdMultiple(op_storeLoad.registerData, translationTable);
if (op_storeLoad.registerMem.IsValid())
{
op_storeLoad.registerMem = replaceRegisterIdMultiple(op_storeLoad.registerMem, translationTable);
}
}
else if (type == PPCREC_IML_TYPE_LOAD_INDEXED)
{
op_storeLoad.registerData = replaceRegisterIdMultiple(op_storeLoad.registerData, translationTable);
if (op_storeLoad.registerMem.IsValid())
op_storeLoad.registerMem = replaceRegisterIdMultiple(op_storeLoad.registerMem, translationTable);
if (op_storeLoad.registerMem2.IsValid())
op_storeLoad.registerMem2 = replaceRegisterIdMultiple(op_storeLoad.registerMem2, translationTable);
}
else if (type == PPCREC_IML_TYPE_STORE)
{
op_storeLoad.registerData = replaceRegisterIdMultiple(op_storeLoad.registerData, translationTable);
if (op_storeLoad.registerMem.IsValid())
op_storeLoad.registerMem = replaceRegisterIdMultiple(op_storeLoad.registerMem, translationTable);
}
else if (type == PPCREC_IML_TYPE_STORE_INDEXED)
{
op_storeLoad.registerData = replaceRegisterIdMultiple(op_storeLoad.registerData, translationTable);
if (op_storeLoad.registerMem.IsValid())
op_storeLoad.registerMem = replaceRegisterIdMultiple(op_storeLoad.registerMem, translationTable);
if (op_storeLoad.registerMem2.IsValid())
op_storeLoad.registerMem2 = replaceRegisterIdMultiple(op_storeLoad.registerMem2, translationTable);
}
else if (type == PPCREC_IML_TYPE_ATOMIC_CMP_STORE)
{
op_atomic_compare_store.regEA = replaceRegisterIdMultiple(op_atomic_compare_store.regEA, translationTable);
op_atomic_compare_store.regCompareValue = replaceRegisterIdMultiple(op_atomic_compare_store.regCompareValue, translationTable);
op_atomic_compare_store.regWriteValue = replaceRegisterIdMultiple(op_atomic_compare_store.regWriteValue, translationTable);
op_atomic_compare_store.regBoolOut = replaceRegisterIdMultiple(op_atomic_compare_store.regBoolOut, translationTable);
}
else if (type == PPCREC_IML_TYPE_CALL_IMM)
{
op_call_imm.regReturn = replaceRegisterIdMultiple(op_call_imm.regReturn, translationTable);
if (op_call_imm.regParam0.IsValid())
op_call_imm.regParam0 = replaceRegisterIdMultiple(op_call_imm.regParam0, translationTable);
if (op_call_imm.regParam1.IsValid())
op_call_imm.regParam1 = replaceRegisterIdMultiple(op_call_imm.regParam1, translationTable);
if (op_call_imm.regParam2.IsValid())
op_call_imm.regParam2 = replaceRegisterIdMultiple(op_call_imm.regParam2, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_LOAD)
{
op_storeLoad.registerData = replaceRegisterIdMultiple(op_storeLoad.registerData, translationTable);
op_storeLoad.registerMem = replaceRegisterIdMultiple(op_storeLoad.registerMem, translationTable);
op_storeLoad.registerGQR = replaceRegisterIdMultiple(op_storeLoad.registerGQR, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_LOAD_INDEXED)
{
op_storeLoad.registerData = replaceRegisterIdMultiple(op_storeLoad.registerData, translationTable);
op_storeLoad.registerMem = replaceRegisterIdMultiple(op_storeLoad.registerMem, translationTable);
op_storeLoad.registerMem2 = replaceRegisterIdMultiple(op_storeLoad.registerMem2, translationTable);
op_storeLoad.registerGQR = replaceRegisterIdMultiple(op_storeLoad.registerGQR, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_STORE)
{
op_storeLoad.registerData = replaceRegisterIdMultiple(op_storeLoad.registerData, translationTable);
op_storeLoad.registerMem = replaceRegisterIdMultiple(op_storeLoad.registerMem, translationTable);
op_storeLoad.registerGQR = replaceRegisterIdMultiple(op_storeLoad.registerGQR, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_STORE_INDEXED)
{
op_storeLoad.registerData = replaceRegisterIdMultiple(op_storeLoad.registerData, translationTable);
op_storeLoad.registerMem = replaceRegisterIdMultiple(op_storeLoad.registerMem, translationTable);
op_storeLoad.registerMem2 = replaceRegisterIdMultiple(op_storeLoad.registerMem2, translationTable);
op_storeLoad.registerGQR = replaceRegisterIdMultiple(op_storeLoad.registerGQR, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_R)
{
op_fpr_r.regR = replaceRegisterIdMultiple(op_fpr_r.regR, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_R_R)
{
op_fpr_r_r.regR = replaceRegisterIdMultiple(op_fpr_r_r.regR, translationTable);
op_fpr_r_r.regA = replaceRegisterIdMultiple(op_fpr_r_r.regA, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_R_R_R)
{
op_fpr_r_r_r.regR = replaceRegisterIdMultiple(op_fpr_r_r_r.regR, translationTable);
op_fpr_r_r_r.regA = replaceRegisterIdMultiple(op_fpr_r_r_r.regA, translationTable);
op_fpr_r_r_r.regB = replaceRegisterIdMultiple(op_fpr_r_r_r.regB, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_R_R_R_R)
{
op_fpr_r_r_r_r.regR = replaceRegisterIdMultiple(op_fpr_r_r_r_r.regR, translationTable);
op_fpr_r_r_r_r.regA = replaceRegisterIdMultiple(op_fpr_r_r_r_r.regA, translationTable);
op_fpr_r_r_r_r.regB = replaceRegisterIdMultiple(op_fpr_r_r_r_r.regB, translationTable);
op_fpr_r_r_r_r.regC = replaceRegisterIdMultiple(op_fpr_r_r_r_r.regC, translationTable);
}
else if (type == PPCREC_IML_TYPE_FPR_COMPARE)
{
op_fpr_compare.regA = replaceRegisterIdMultiple(op_fpr_compare.regA, translationTable);
op_fpr_compare.regB = replaceRegisterIdMultiple(op_fpr_compare.regB, translationTable);
op_fpr_compare.regR = replaceRegisterIdMultiple(op_fpr_compare.regR, translationTable);
}
else if (type == PPCREC_IML_TYPE_X86_EFLAGS_JCC)
{
// no registers read or written (except for the implicit eflags)
}
else
{
cemu_assert_unimplemented();
}
}
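// Usage sketch (hypothetical helper, not part of this diff): RewriteGPR expects every register
// id referenced by the instruction to have an entry in the translation table, since
// replaceRegisterIdMultiple() asserts on a missing id. A single-register rename therefore starts
// from an identity mapping built via CheckRegisterUsage():
static void RenameRegisterInInstruction(IMLInstruction& inst, IMLRegID oldId, IMLRegID newId)
{
	std::unordered_map<IMLRegID, IMLRegID> translationTable;
	IMLUsedRegisters used;
	inst.CheckRegisterUsage(&used);
	used.ForEachAccessedGPR([&](IMLReg r, bool isWritten) {
		translationTable[r.GetRegID()] = r.GetRegID(); // identity entry for every accessed register
	});
	translationTable[oldId] = newId; // the actual rename
	inst.RewriteGPR(translationTable);
}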


@ -0,0 +1,785 @@
#pragma once
using IMLRegID = uint16; // 16 bit ID
using IMLPhysReg = sint32; // arbitrary value that is up to the architecture backend, usually this will be the register index. A value of -1 is reserved and means not assigned
// format of IMLReg:
// 0-15 (16 bit) IMLRegID
// 19-23 (5 bit) Offset In elements, for SIMD registers
// 24-27 (4 bit) IMLRegFormat RegFormat
// 28-31 (4 bit) IMLRegFormat BaseFormat
enum class IMLRegFormat : uint8
{
INVALID_FORMAT,
I64,
I32,
I16,
I8,
// I1 ?
F64,
F32,
TYPE_COUNT,
};
class IMLReg
{
public:
IMLReg()
{
m_raw = 0; // 0 is invalid
}
IMLReg(IMLRegFormat baseRegFormat, IMLRegFormat regFormat, uint8 viewOffset, IMLRegID regId)
{
m_raw = 0;
m_raw |= ((uint8)baseRegFormat << 28);
m_raw |= ((uint8)regFormat << 24);
m_raw |= (uint32)regId;
}
IMLReg(IMLReg&& baseReg, IMLRegFormat viewFormat, uint8 viewOffset, IMLRegID regId)
{
DEBUG_BREAK;
//m_raw = 0;
//m_raw |= ((uint8)baseRegFormat << 28);
//m_raw |= ((uint8)viewFormat << 24);
//m_raw |= (uint32)regId;
}
IMLReg(const IMLReg& other) : m_raw(other.m_raw) {}
IMLRegFormat GetBaseFormat() const
{
return (IMLRegFormat)((m_raw >> 28) & 0xF);
}
IMLRegFormat GetRegFormat() const
{
return (IMLRegFormat)((m_raw >> 24) & 0xF);
}
IMLRegID GetRegID() const
{
cemu_assert_debug(GetBaseFormat() != IMLRegFormat::INVALID_FORMAT);
cemu_assert_debug(GetRegFormat() != IMLRegFormat::INVALID_FORMAT);
return (IMLRegID)(m_raw & 0xFFFF);
}
void SetRegID(IMLRegID regId)
{
cemu_assert_debug(regId <= 0xFFFF);
m_raw &= ~0xFFFF;
m_raw |= (uint32)regId;
}
bool IsInvalid() const
{
return GetBaseFormat() == IMLRegFormat::INVALID_FORMAT;
}
bool IsValid() const
{
return GetBaseFormat() != IMLRegFormat::INVALID_FORMAT;
}
bool IsValidAndSameRegID(IMLRegID regId) const
{
return IsValid() && GetRegID() == regId;
}
// compare all fields
bool operator==(const IMLReg& other) const
{
return m_raw == other.m_raw;
}
private:
uint32 m_raw;
};
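// Worked example of the packed encoding documented at the top of this file (values hypothetical):
// an I32 view of 64-bit virtual register id 0x12 is stored as
//   m_raw = (I64=1) << 28 | (I32=2) << 24 | 0x0012 = 0x12000012
// so for IMLReg r(IMLRegFormat::I64, IMLRegFormat::I32, 0, 0x12):
//   r.GetBaseFormat() == IMLRegFormat::I64, r.GetRegFormat() == IMLRegFormat::I32, r.GetRegID() == 0x12
// Note that the viewOffset parameter (bits 19-23 of the documented layout) is not packed by the
// constructor above.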
static const IMLReg IMLREG_INVALID(IMLRegFormat::INVALID_FORMAT, IMLRegFormat::INVALID_FORMAT, 0, 0);
static const IMLRegID IMLRegID_INVALID(0xFFFF);
using IMLName = uint32;
enum
{
PPCREC_IML_OP_ASSIGN, // '=' operator
PPCREC_IML_OP_ENDIAN_SWAP, // '=' operator with 32bit endian swap
PPCREC_IML_OP_MULTIPLY_SIGNED, // '*' operator (signed multiply)
PPCREC_IML_OP_MULTIPLY_HIGH_UNSIGNED, // unsigned 64bit multiply, store only high 32bit-word of result
PPCREC_IML_OP_MULTIPLY_HIGH_SIGNED, // signed 64bit multiply, store only high 32bit-word of result
PPCREC_IML_OP_DIVIDE_SIGNED, // '/' operator (signed divide)
PPCREC_IML_OP_DIVIDE_UNSIGNED, // '/' operator (unsigned divide)
// binary operation
PPCREC_IML_OP_OR, // '|' operator
PPCREC_IML_OP_AND, // '&' operator
PPCREC_IML_OP_XOR, // '^' operator
PPCREC_IML_OP_LEFT_ROTATE, // left rotate operator
PPCREC_IML_OP_LEFT_SHIFT, // shift left operator
PPCREC_IML_OP_RIGHT_SHIFT_U, // right shift operator (unsigned)
PPCREC_IML_OP_RIGHT_SHIFT_S, // right shift operator (signed)
// ppc
PPCREC_IML_OP_SLW, // SLW (shift based on register by up to 63 bits)
PPCREC_IML_OP_SRW, // SRW (shift based on register by up to 63 bits)
PPCREC_IML_OP_CNTLZW,
// FPU
PPCREC_IML_OP_FPR_ADD_BOTTOM,
PPCREC_IML_OP_FPR_ADD_PAIR,
PPCREC_IML_OP_FPR_SUB_PAIR,
PPCREC_IML_OP_FPR_SUB_BOTTOM,
PPCREC_IML_OP_FPR_MULTIPLY_BOTTOM,
PPCREC_IML_OP_FPR_MULTIPLY_PAIR,
PPCREC_IML_OP_FPR_DIVIDE_BOTTOM,
PPCREC_IML_OP_FPR_DIVIDE_PAIR,
PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_BOTTOM_AND_TOP,
PPCREC_IML_OP_FPR_COPY_TOP_TO_BOTTOM_AND_TOP,
PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_BOTTOM,
PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_TOP, // leave bottom of destination untouched
PPCREC_IML_OP_FPR_COPY_TOP_TO_TOP, // leave bottom of destination untouched
PPCREC_IML_OP_FPR_COPY_TOP_TO_BOTTOM, // leave top of destination untouched
PPCREC_IML_OP_FPR_COPY_BOTTOM_AND_TOP_SWAPPED,
PPCREC_IML_OP_FPR_EXPAND_BOTTOM32_TO_BOTTOM64_AND_TOP64, // expand bottom f32 to f64 in bottom and top half
PPCREC_IML_OP_FPR_FCMPO_BOTTOM, // deprecated
PPCREC_IML_OP_FPR_FCMPU_BOTTOM, // deprecated
PPCREC_IML_OP_FPR_FCMPU_TOP, // deprecated
PPCREC_IML_OP_FPR_NEGATE_BOTTOM,
PPCREC_IML_OP_FPR_NEGATE_PAIR,
PPCREC_IML_OP_FPR_ABS_BOTTOM, // abs(fp0)
PPCREC_IML_OP_FPR_ABS_PAIR,
PPCREC_IML_OP_FPR_FRES_PAIR, // 1.0/fp approx (Espresso accuracy)
PPCREC_IML_OP_FPR_FRSQRTE_PAIR, // 1.0/sqrt(fp) approx (Espresso accuracy)
PPCREC_IML_OP_FPR_NEGATIVE_ABS_BOTTOM, // -abs(fp0)
PPCREC_IML_OP_FPR_ROUND_TO_SINGLE_PRECISION_BOTTOM, // round 64bit double to 64bit double with 32bit float precision (in bottom half of xmm register)
PPCREC_IML_OP_FPR_ROUND_TO_SINGLE_PRECISION_PAIR, // round two 64bit doubles to 64bit double with 32bit float precision
PPCREC_IML_OP_FPR_BOTTOM_RECIPROCAL_SQRT,
PPCREC_IML_OP_FPR_BOTTOM_FCTIWZ,
PPCREC_IML_OP_FPR_SELECT_BOTTOM, // selectively copy bottom value from operand B or C based on value in operand A
PPCREC_IML_OP_FPR_SELECT_PAIR, // selectively copy top/bottom from operand B or C based on value in top/bottom of operand A
// PS
PPCREC_IML_OP_FPR_SUM0,
PPCREC_IML_OP_FPR_SUM1,
// R_R_R only
// R_R_S32 only
// R_R_R + R_R_S32
PPCREC_IML_OP_ADD, // also R_R_R_CARRY
PPCREC_IML_OP_SUB,
// R_R only
PPCREC_IML_OP_NOT,
PPCREC_IML_OP_NEG,
PPCREC_IML_OP_ASSIGN_S16_TO_S32,
PPCREC_IML_OP_ASSIGN_S8_TO_S32,
// R_R_R_carry
PPCREC_IML_OP_ADD_WITH_CARRY, // similar to ADD but also adds carry bit (0 or 1)
// X86 extension
PPCREC_IML_OP_X86_CMP, // R_R and R_S32
PPCREC_IML_OP_INVALID
};
#define PPCREC_IML_OP_FPR_COPY_PAIR (PPCREC_IML_OP_ASSIGN)
enum
{
PPCREC_IML_MACRO_B_TO_REG, // branch to PPC address in register (used for BCCTR, BCLR)
PPCREC_IML_MACRO_BL, // call to different function (can be within same function)
PPCREC_IML_MACRO_B_FAR, // branch to different function
PPCREC_IML_MACRO_COUNT_CYCLES, // decrease current remaining thread cycles by a certain amount
PPCREC_IML_MACRO_HLE, // HLE function call
PPCREC_IML_MACRO_LEAVE, // leaves recompiler and switches to interpreter
// debugging
PPCREC_IML_MACRO_DEBUGBREAK, // throws a debugbreak
};
enum class IMLCondition : uint8
{
EQ,
NEQ,
SIGNED_GT,
SIGNED_LT,
UNSIGNED_GT,
UNSIGNED_LT,
// floating point conditions
UNORDERED_GT, // a > b, false if either is NaN
UNORDERED_LT, // a < b, false if either is NaN
UNORDERED_EQ, // a == b, false if either is NaN
UNORDERED_U, // unordered (true if either operand is NaN)
ORDERED_GT,
ORDERED_LT,
ORDERED_EQ,
ORDERED_U
};
enum
{
PPCREC_IML_TYPE_NONE,
PPCREC_IML_TYPE_NO_OP, // no-op instruction
PPCREC_IML_TYPE_R_R, // r* = (op) *r (can also be r* (op) *r)
PPCREC_IML_TYPE_R_R_R, // r* = r* (op) r*
PPCREC_IML_TYPE_R_R_R_CARRY, // r* = r* (op) r* (reads and/or updates carry)
PPCREC_IML_TYPE_R_R_S32, // r* = r* (op) s32*
PPCREC_IML_TYPE_R_R_S32_CARRY, // r* = r* (op) s32* (reads and/or updates carry)
PPCREC_IML_TYPE_LOAD, // r* = [r*+s32*]
PPCREC_IML_TYPE_LOAD_INDEXED, // r* = [r*+r*]
PPCREC_IML_TYPE_STORE, // [r*+s32*] = r*
PPCREC_IML_TYPE_STORE_INDEXED, // [r*+r*] = r*
PPCREC_IML_TYPE_R_NAME, // r* = name
PPCREC_IML_TYPE_NAME_R, // name* = r*
PPCREC_IML_TYPE_R_S32, // r* (op) imm
PPCREC_IML_TYPE_MACRO,
PPCREC_IML_TYPE_CJUMP_CYCLE_CHECK, // jumps only if remaining thread cycles < 0
// conditions and branches
PPCREC_IML_TYPE_COMPARE, // r* = r* CMP[cond] r*
PPCREC_IML_TYPE_COMPARE_S32, // r* = r* CMP[cond] imm
PPCREC_IML_TYPE_JUMP, // jump always
PPCREC_IML_TYPE_CONDITIONAL_JUMP, // jump conditionally based on boolean value in register
// atomic
PPCREC_IML_TYPE_ATOMIC_CMP_STORE,
// function call
PPCREC_IML_TYPE_CALL_IMM, // call to fixed immediate address
// FPR
PPCREC_IML_TYPE_FPR_LOAD, // r* = (bitdepth) [r*+s32*] (single or paired single mode)
PPCREC_IML_TYPE_FPR_LOAD_INDEXED, // r* = (bitdepth) [r*+r*] (single or paired single mode)
PPCREC_IML_TYPE_FPR_STORE, // (bitdepth) [r*+s32*] = r* (single or paired single mode)
PPCREC_IML_TYPE_FPR_STORE_INDEXED, // (bitdepth) [r*+r*] = r* (single or paired single mode)
PPCREC_IML_TYPE_FPR_R_R,
PPCREC_IML_TYPE_FPR_R_R_R,
PPCREC_IML_TYPE_FPR_R_R_R_R,
PPCREC_IML_TYPE_FPR_R,
PPCREC_IML_TYPE_FPR_COMPARE, // r* = r* CMP[cond] r*
// X86 specific
PPCREC_IML_TYPE_X86_EFLAGS_JCC,
};
enum // IMLName
{
PPCREC_NAME_NONE,
PPCREC_NAME_TEMPORARY = 1000,
PPCREC_NAME_R0 = 2000,
PPCREC_NAME_SPR0 = 3000,
PPCREC_NAME_FPR0 = 4000,
PPCREC_NAME_TEMPORARY_FPR0 = 5000, // 0 to 7
PPCREC_NAME_XER_CA = 6000, // carry bit from XER
PPCREC_NAME_XER_OV = 6001, // overflow bit from XER
PPCREC_NAME_XER_SO = 6002, // summary overflow bit from XER
PPCREC_NAME_CR = 7000, // CR register bits (31 to 0)
PPCREC_NAME_CR_LAST = PPCREC_NAME_CR+31,
PPCREC_NAME_CPU_MEMRES_EA = 8000,
PPCREC_NAME_CPU_MEMRES_VAL = 8001
};
#define PPC_REC_INVALID_REGISTER 0xFF // deprecated. Use IMLREG_INVALID instead
enum
{
// fpr load
PPCREC_FPR_LD_MODE_SINGLE_INTO_PS0,
PPCREC_FPR_LD_MODE_SINGLE_INTO_PS0_PS1,
PPCREC_FPR_LD_MODE_DOUBLE_INTO_PS0,
PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0,
PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0,
PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_S16_PS0,
PPCREC_FPR_LD_MODE_PSQ_S16_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_U16_PS0,
PPCREC_FPR_LD_MODE_PSQ_U16_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_S8_PS0,
PPCREC_FPR_LD_MODE_PSQ_S8_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_U8_PS0,
PPCREC_FPR_LD_MODE_PSQ_U8_PS0_PS1,
// fpr store
PPCREC_FPR_ST_MODE_SINGLE_FROM_PS0, // store 1 single precision float from ps0
PPCREC_FPR_ST_MODE_DOUBLE_FROM_PS0, // store 1 double precision float from ps0
PPCREC_FPR_ST_MODE_UI32_FROM_PS0, // store raw low-32bit of PS0
PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0,
PPCREC_FPR_ST_MODE_PSQ_FLOAT_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_FLOAT_PS0,
PPCREC_FPR_ST_MODE_PSQ_S8_PS0,
PPCREC_FPR_ST_MODE_PSQ_S8_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_U8_PS0,
PPCREC_FPR_ST_MODE_PSQ_U8_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_U16_PS0,
PPCREC_FPR_ST_MODE_PSQ_U16_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_S16_PS0,
PPCREC_FPR_ST_MODE_PSQ_S16_PS0_PS1,
};
struct IMLUsedRegisters
{
IMLUsedRegisters() {};
bool IsWrittenByRegId(IMLRegID regId) const
{
if (writtenGPR1.IsValid() && writtenGPR1.GetRegID() == regId)
return true;
if (writtenGPR2.IsValid() && writtenGPR2.GetRegID() == regId)
return true;
return false;
}
bool IsBaseGPRWritten(IMLReg imlReg) const
{
cemu_assert_debug(imlReg.IsValid());
auto regId = imlReg.GetRegID();
return IsWrittenByRegId(regId);
}
template<typename Fn>
void ForEachWrittenGPR(Fn F) const
{
if (writtenGPR1.IsValid())
F(writtenGPR1);
if (writtenGPR2.IsValid())
F(writtenGPR2);
}
template<typename Fn>
void ForEachReadGPR(Fn F) const
{
if (readGPR1.IsValid())
F(readGPR1);
if (readGPR2.IsValid())
F(readGPR2);
if (readGPR3.IsValid())
F(readGPR3);
if (readGPR4.IsValid())
F(readGPR4);
}
template<typename Fn>
void ForEachAccessedGPR(Fn F) const
{
// GPRs
if (readGPR1.IsValid())
F(readGPR1, false);
if (readGPR2.IsValid())
F(readGPR2, false);
if (readGPR3.IsValid())
F(readGPR3, false);
if (readGPR4.IsValid())
F(readGPR4, false);
if (writtenGPR1.IsValid())
F(writtenGPR1, true);
if (writtenGPR2.IsValid())
F(writtenGPR2, true);
}
IMLReg readGPR1;
IMLReg readGPR2;
IMLReg readGPR3;
IMLReg readGPR4;
IMLReg writtenGPR1;
IMLReg writtenGPR2;
};
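// Usage sketch (hypothetical helper, not part of this diff): an analysis pass can enumerate the
// registers an instruction touches via the ForEach* helpers above. At most six registers are
// reported per instruction (readGPR1-4 and writtenGPR1-2).
inline sint32 CollectAccessedRegisterIds(const IMLUsedRegisters& used, IMLRegID (&idsOut)[6])
{
	sint32 count = 0;
	used.ForEachAccessedGPR([&](IMLReg r, bool isWritten) {
		idsOut[count++] = r.GetRegID();
	});
	return count;
}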
struct IMLInstruction
{
IMLInstruction() {}
IMLInstruction(const IMLInstruction& other)
{
memcpy(this, &other, sizeof(IMLInstruction));
}
uint8 type;
uint8 operation;
union
{
struct
{
uint8 _padding[7];
}padding;
struct
{
IMLReg regR;
IMLReg regA;
}op_r_r;
struct
{
IMLReg regR;
IMLReg regA;
IMLReg regB;
}op_r_r_r;
struct
{
IMLReg regR;
IMLReg regA;
IMLReg regB;
IMLReg regCarry;
}op_r_r_r_carry;
struct
{
IMLReg regR;
IMLReg regA;
sint32 immS32;
}op_r_r_s32;
struct
{
IMLReg regR;
IMLReg regA;
IMLReg regCarry;
sint32 immS32;
}op_r_r_s32_carry;
struct
{
IMLReg regR;
IMLName name;
}op_r_name; // alias op_name_r
struct
{
IMLReg regR;
sint32 immS32;
}op_r_immS32;
struct
{
uint32 param;
uint32 param2;
uint16 paramU16;
IMLReg paramReg;
}op_macro;
struct
{
IMLReg registerData;
IMLReg registerMem;
IMLReg registerMem2;
IMLReg registerGQR;
uint8 copyWidth;
struct
{
bool swapEndian : 1;
bool signExtend : 1;
bool notExpanded : 1; // for floats
}flags2;
uint8 mode; // transfer mode (copy width, ps0/ps1 behavior)
sint32 immS32;
}op_storeLoad;
struct
{
uintptr_t callAddress;
IMLReg regParam0;
IMLReg regParam1;
IMLReg regParam2;
IMLReg regReturn;
}op_call_imm;
struct
{
IMLReg regR;
IMLReg regA;
}op_fpr_r_r;
struct
{
IMLReg regR;
IMLReg regA;
IMLReg regB;
}op_fpr_r_r_r;
struct
{
IMLReg regR;
IMLReg regA;
IMLReg regB;
IMLReg regC;
}op_fpr_r_r_r_r;
struct
{
IMLReg regR;
}op_fpr_r;
struct
{
IMLReg regR; // stores the boolean result of the comparison
IMLReg regA;
IMLReg regB;
IMLCondition cond;
}op_fpr_compare;
struct
{
IMLReg regR; // stores the boolean result of the comparison
IMLReg regA;
IMLReg regB;
IMLCondition cond;
}op_compare;
struct
{
IMLReg regR; // stores the boolean result of the comparison
IMLReg regA;
sint32 immS32;
IMLCondition cond;
}op_compare_s32;
struct
{
IMLReg registerBool;
bool mustBeTrue;
}op_conditional_jump;
struct
{
IMLReg regEA;
IMLReg regCompareValue;
IMLReg regWriteValue;
IMLReg regBoolOut;
}op_atomic_compare_store;
// conditional operations (emitted if supported by target platform)
struct
{
// r_s32
IMLReg regR;
sint32 immS32;
// condition
uint8 crRegisterIndex;
uint8 crBitIndex;
bool bitMustBeSet;
}op_conditional_r_s32;
// X86 specific
struct
{
IMLCondition cond;
bool invertedCondition;
}op_x86_eflags_jcc;
};
bool IsSuffixInstruction() const
{
if ((type == PPCREC_IML_TYPE_MACRO && operation == PPCREC_IML_MACRO_BL) ||
(type == PPCREC_IML_TYPE_MACRO && operation == PPCREC_IML_MACRO_B_FAR) ||
(type == PPCREC_IML_TYPE_MACRO && operation == PPCREC_IML_MACRO_B_TO_REG) ||
(type == PPCREC_IML_TYPE_MACRO && operation == PPCREC_IML_MACRO_LEAVE) ||
(type == PPCREC_IML_TYPE_MACRO && operation == PPCREC_IML_MACRO_HLE) ||
type == PPCREC_IML_TYPE_CJUMP_CYCLE_CHECK ||
type == PPCREC_IML_TYPE_JUMP ||
type == PPCREC_IML_TYPE_CONDITIONAL_JUMP ||
type == PPCREC_IML_TYPE_X86_EFLAGS_JCC)
return true;
return false;
}
// instruction setters
void make_no_op()
{
type = PPCREC_IML_TYPE_NO_OP;
operation = 0;
}
void make_r_name(IMLReg regR, IMLName name)
{
cemu_assert_debug(regR.GetBaseFormat() == regR.GetRegFormat()); // for name load/store instructions the register must match the base format
type = PPCREC_IML_TYPE_R_NAME;
operation = PPCREC_IML_OP_ASSIGN;
op_r_name.regR = regR;
op_r_name.name = name;
}
void make_name_r(IMLName name, IMLReg regR)
{
cemu_assert_debug(regR.GetBaseFormat() == regR.GetRegFormat()); // for name load/store instructions the register must match the base format
type = PPCREC_IML_TYPE_NAME_R;
operation = PPCREC_IML_OP_ASSIGN;
op_r_name.regR = regR;
op_r_name.name = name;
}
void make_debugbreak(uint32 currentPPCAddress = 0)
{
make_macro(PPCREC_IML_MACRO_DEBUGBREAK, 0, currentPPCAddress, 0, IMLREG_INVALID);
}
void make_macro(uint32 macroId, uint32 param, uint32 param2, uint16 paramU16, IMLReg regParam)
{
this->type = PPCREC_IML_TYPE_MACRO;
this->operation = macroId;
this->op_macro.param = param;
this->op_macro.param2 = param2;
this->op_macro.paramU16 = paramU16;
this->op_macro.paramReg = regParam;
}
void make_cjump_cycle_check()
{
this->type = PPCREC_IML_TYPE_CJUMP_CYCLE_CHECK;
this->operation = 0;
}
void make_r_r(uint32 operation, IMLReg regR, IMLReg regA)
{
this->type = PPCREC_IML_TYPE_R_R;
this->operation = operation;
this->op_r_r.regR = regR;
this->op_r_r.regA = regA;
}
void make_r_s32(uint32 operation, IMLReg regR, sint32 immS32)
{
this->type = PPCREC_IML_TYPE_R_S32;
this->operation = operation;
this->op_r_immS32.regR = regR;
this->op_r_immS32.immS32 = immS32;
}
void make_r_r_r(uint32 operation, IMLReg regR, IMLReg regA, IMLReg regB)
{
this->type = PPCREC_IML_TYPE_R_R_R;
this->operation = operation;
this->op_r_r_r.regR = regR;
this->op_r_r_r.regA = regA;
this->op_r_r_r.regB = regB;
}
void make_r_r_r_carry(uint32 operation, IMLReg regR, IMLReg regA, IMLReg regB, IMLReg regCarry)
{
this->type = PPCREC_IML_TYPE_R_R_R_CARRY;
this->operation = operation;
this->op_r_r_r_carry.regR = regR;
this->op_r_r_r_carry.regA = regA;
this->op_r_r_r_carry.regB = regB;
this->op_r_r_r_carry.regCarry = regCarry;
}
void make_r_r_s32(uint32 operation, IMLReg regR, IMLReg regA, sint32 immS32)
{
this->type = PPCREC_IML_TYPE_R_R_S32;
this->operation = operation;
this->op_r_r_s32.regR = regR;
this->op_r_r_s32.regA = regA;
this->op_r_r_s32.immS32 = immS32;
}
void make_r_r_s32_carry(uint32 operation, IMLReg regR, IMLReg regA, sint32 immS32, IMLReg regCarry)
{
this->type = PPCREC_IML_TYPE_R_R_S32_CARRY;
this->operation = operation;
this->op_r_r_s32_carry.regR = regR;
this->op_r_r_s32_carry.regA = regA;
this->op_r_r_s32_carry.immS32 = immS32;
this->op_r_r_s32_carry.regCarry = regCarry;
}
void make_compare(IMLReg regA, IMLReg regB, IMLReg regR, IMLCondition cond)
{
this->type = PPCREC_IML_TYPE_COMPARE;
this->operation = PPCREC_IML_OP_INVALID;
this->op_compare.regR = regR;
this->op_compare.regA = regA;
this->op_compare.regB = regB;
this->op_compare.cond = cond;
}
void make_compare_s32(IMLReg regA, sint32 immS32, IMLReg regR, IMLCondition cond)
{
this->type = PPCREC_IML_TYPE_COMPARE_S32;
this->operation = PPCREC_IML_OP_INVALID;
this->op_compare_s32.regR = regR;
this->op_compare_s32.regA = regA;
this->op_compare_s32.immS32 = immS32;
this->op_compare_s32.cond = cond;
}
void make_conditional_jump(IMLReg regBool, bool mustBeTrue)
{
this->type = PPCREC_IML_TYPE_CONDITIONAL_JUMP;
this->operation = PPCREC_IML_OP_INVALID;
this->op_conditional_jump.registerBool = regBool;
this->op_conditional_jump.mustBeTrue = mustBeTrue;
}
void make_jump()
{
this->type = PPCREC_IML_TYPE_JUMP;
this->operation = PPCREC_IML_OP_INVALID;
}
// load from memory
void make_r_memory(IMLReg regD, IMLReg regMem, sint32 immS32, uint32 copyWidth, bool signExtend, bool switchEndian)
{
this->type = PPCREC_IML_TYPE_LOAD;
this->operation = 0;
this->op_storeLoad.registerData = regD;
this->op_storeLoad.registerMem = regMem;
this->op_storeLoad.immS32 = immS32;
this->op_storeLoad.copyWidth = copyWidth;
this->op_storeLoad.flags2.swapEndian = switchEndian;
this->op_storeLoad.flags2.signExtend = signExtend;
}
// store to memory
void make_memory_r(IMLReg regS, IMLReg regMem, sint32 immS32, uint32 copyWidth, bool switchEndian)
{
this->type = PPCREC_IML_TYPE_STORE;
this->operation = 0;
this->op_storeLoad.registerData = regS;
this->op_storeLoad.registerMem = regMem;
this->op_storeLoad.immS32 = immS32;
this->op_storeLoad.copyWidth = copyWidth;
this->op_storeLoad.flags2.swapEndian = switchEndian;
this->op_storeLoad.flags2.signExtend = false;
}
void make_atomic_cmp_store(IMLReg regEA, IMLReg regCompareValue, IMLReg regWriteValue, IMLReg regSuccessOutput)
{
this->type = PPCREC_IML_TYPE_ATOMIC_CMP_STORE;
this->operation = 0;
this->op_atomic_compare_store.regEA = regEA;
this->op_atomic_compare_store.regCompareValue = regCompareValue;
this->op_atomic_compare_store.regWriteValue = regWriteValue;
this->op_atomic_compare_store.regBoolOut = regSuccessOutput;
}
void make_call_imm(uintptr_t callAddress, IMLReg param0, IMLReg param1, IMLReg param2, IMLReg regReturn)
{
this->type = PPCREC_IML_TYPE_CALL_IMM;
this->operation = 0;
this->op_call_imm.callAddress = callAddress;
this->op_call_imm.regParam0 = param0;
this->op_call_imm.regParam1 = param1;
this->op_call_imm.regParam2 = param2;
this->op_call_imm.regReturn = regReturn;
}
void make_fpr_compare(IMLReg regA, IMLReg regB, IMLReg regR, IMLCondition cond)
{
this->type = PPCREC_IML_TYPE_FPR_COMPARE;
this->operation = -999;
this->op_fpr_compare.regR = regR;
this->op_fpr_compare.regA = regA;
this->op_fpr_compare.regB = regB;
this->op_fpr_compare.cond = cond;
}
/* X86 specific */
void make_x86_eflags_jcc(IMLCondition cond, bool invertedCondition)
{
this->type = PPCREC_IML_TYPE_X86_EFLAGS_JCC;
this->operation = -999;
this->op_x86_eflags_jcc.cond = cond;
this->op_x86_eflags_jcc.invertedCondition = invertedCondition;
}
void CheckRegisterUsage(IMLUsedRegisters* registersUsed) const;
bool HasSideEffects() const; // returns true if the instruction has side effects beyond just reading and writing registers. Dead code elimination uses this to know if an instruction can be dropped when the regular register outputs are not used
void RewriteGPR(const std::unordered_map<IMLRegID, IMLRegID>& translationTable);
};
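/*
 * Illustrative sketch (not part of the original source): an IML generation routine that wants to
 * emit something like "rDst = rSrc + 100" would fill in an instruction through the setters above,
 * roughly as
 *   IMLInstruction inst;
 *   inst.make_r_r_s32(<some PPCREC_IML_OP_* arithmetic operation>, regDst, regSrc, 100);
 * where the operation constant and the IMLReg values are assumed to come from the surrounding
 * recompiler code and are placeholders here.
 */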
// architecture specific constants
namespace IMLArchX86
{
static constexpr int PHYSREG_GPR_BASE = 0;
static constexpr int PHYSREG_FPR_BASE = 16;
};


@@ -0,0 +1,794 @@
#include "Cafe/HW/Espresso/Interpreter/PPCInterpreterInternal.h"
#include "Cafe/HW/Espresso/Recompiler/IML/IML.h"
#include "Cafe/HW/Espresso/Recompiler/IML/IMLInstruction.h"
#include "../PPCRecompiler.h"
#include "../PPCRecompilerIml.h"
#include "../BackendX64/BackendX64.h"
#include "Common/FileStream.h"
#include <boost/container/static_vector.hpp>
#include <boost/container/small_vector.hpp>
IMLReg _FPRRegFromID(IMLRegID regId)
{
return IMLReg(IMLRegFormat::F64, IMLRegFormat::F64, 0, regId);
}
void PPCRecompiler_optimizeDirectFloatCopiesScanForward(ppcImlGenContext_t* ppcImlGenContext, IMLSegment* imlSegment, sint32 imlIndexLoad, IMLReg fprReg)
{
IMLRegID fprIndex = fprReg.GetRegID();
IMLInstruction* imlInstructionLoad = imlSegment->imlList.data() + imlIndexLoad;
if (imlInstructionLoad->op_storeLoad.flags2.notExpanded)
return;
IMLUsedRegisters registersUsed;
sint32 scanRangeEnd = std::min<sint32>(imlIndexLoad + 25, imlSegment->imlList.size()); // don't scan too far (keeps the pass cheap, and the chance of merging the load+store drops quickly with distance)
bool foundMatch = false;
sint32 lastStore = -1;
for (sint32 i = imlIndexLoad + 1; i < scanRangeEnd; i++)
{
IMLInstruction* imlInstruction = imlSegment->imlList.data() + i;
if (imlInstruction->IsSuffixInstruction())
break;
// check if FPR is stored
if ((imlInstruction->type == PPCREC_IML_TYPE_FPR_STORE && imlInstruction->op_storeLoad.mode == PPCREC_FPR_ST_MODE_SINGLE_FROM_PS0) ||
(imlInstruction->type == PPCREC_IML_TYPE_FPR_STORE_INDEXED && imlInstruction->op_storeLoad.mode == PPCREC_FPR_ST_MODE_SINGLE_FROM_PS0))
{
if (imlInstruction->op_storeLoad.registerData.GetRegID() == fprIndex)
{
if (foundMatch == false)
{
// flag the load-single instruction as "don't expand" (leave single value as-is)
imlInstructionLoad->op_storeLoad.flags2.notExpanded = true;
}
// also set the flag for the store instruction
IMLInstruction* imlInstructionStore = imlInstruction;
imlInstructionStore->op_storeLoad.flags2.notExpanded = true;
foundMatch = true;
lastStore = i + 1;
continue;
}
}
// check if FPR is overwritten (we can actually ignore read operations?)
imlInstruction->CheckRegisterUsage(&registersUsed);
if (registersUsed.writtenGPR1.IsValidAndSameRegID(fprIndex) || registersUsed.writtenGPR2.IsValidAndSameRegID(fprIndex))
break;
if (registersUsed.readGPR1.IsValidAndSameRegID(fprIndex))
break;
if (registersUsed.readGPR2.IsValidAndSameRegID(fprIndex))
break;
if (registersUsed.readGPR3.IsValidAndSameRegID(fprIndex))
break;
if (registersUsed.readGPR4.IsValidAndSameRegID(fprIndex))
break;
}
if (foundMatch)
{
// insert expand instruction after store
IMLInstruction* newExpand = PPCRecompiler_insertInstruction(imlSegment, lastStore);
PPCRecompilerImlGen_generateNewInstruction_fpr_r(ppcImlGenContext, newExpand, PPCREC_IML_OP_FPR_EXPAND_BOTTOM32_TO_BOTTOM64_AND_TOP64, _FPRRegFromID(fprIndex));
}
}
/*
* Scans for patterns:
* <Load sp float into register f>
* <Random unrelated instructions>
* <Store sp float from register f>
* For these patterns the store and load are modified to work with un-extended values (the float stays a float, no double conversion)
* The float->double extension is then executed later
* Advantages:
* Keeps denormals and other special float values intact
* Slightly improves performance
*/
void IMLOptimizer_OptimizeDirectFloatCopies(ppcImlGenContext_t* ppcImlGenContext)
{
for (IMLSegment* segIt : ppcImlGenContext->segmentList2)
{
for (sint32 i = 0; i < segIt->imlList.size(); i++)
{
IMLInstruction* imlInstruction = segIt->imlList.data() + i;
if (imlInstruction->type == PPCREC_IML_TYPE_FPR_LOAD && imlInstruction->op_storeLoad.mode == PPCREC_FPR_LD_MODE_SINGLE_INTO_PS0_PS1)
{
PPCRecompiler_optimizeDirectFloatCopiesScanForward(ppcImlGenContext, segIt, i, imlInstruction->op_storeLoad.registerData);
}
else if (imlInstruction->type == PPCREC_IML_TYPE_FPR_LOAD_INDEXED && imlInstruction->op_storeLoad.mode == PPCREC_FPR_LD_MODE_SINGLE_INTO_PS0_PS1)
{
PPCRecompiler_optimizeDirectFloatCopiesScanForward(ppcImlGenContext, segIt, i, imlInstruction->op_storeLoad.registerData);
}
}
}
}
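/*
 * Illustrative sketch (not part of the original source) of the float-copy pass above:
 *
 *   FPR_LOAD  f <- [memA]   (mode SINGLE_INTO_PS0_PS1)   load flagged notExpanded
 *   ...unrelated instructions that neither read nor write f...
 *   FPR_STORE [memB] <- f   (mode SINGLE_FROM_PS0)        store flagged notExpanded
 *   FPR_EXPAND_BOTTOM32_TO_BOTTOM64_AND_TOP64 f           inserted after the last matching store
 *
 * The raw 32-bit value is copied through unchanged, which preserves denormals and other special
 * float values; the float->double expansion only happens afterwards, in case f is still read as
 * a double later on.
 */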
void PPCRecompiler_optimizeDirectIntegerCopiesScanForward(ppcImlGenContext_t* ppcImlGenContext, IMLSegment* imlSegment, sint32 imlIndexLoad, IMLReg gprReg)
{
cemu_assert_debug(gprReg.GetBaseFormat() == IMLRegFormat::I64); // todo - proper handling required for non-standard sizes
cemu_assert_debug(gprReg.GetRegFormat() == IMLRegFormat::I32);
IMLRegID gprIndex = gprReg.GetRegID();
IMLInstruction* imlInstructionLoad = imlSegment->imlList.data() + imlIndexLoad;
if ( imlInstructionLoad->op_storeLoad.flags2.swapEndian == false )
return;
bool foundMatch = false;
IMLUsedRegisters registersUsed;
sint32 scanRangeEnd = std::min<sint32>(imlIndexLoad + 25, imlSegment->imlList.size()); // don't scan too far (keeps the pass cheap, and the chance of merging the load+store drops quickly with distance)
sint32 i = imlIndexLoad + 1;
for (; i < scanRangeEnd; i++)
{
IMLInstruction* imlInstruction = imlSegment->imlList.data() + i;
if (imlInstruction->IsSuffixInstruction())
break;
// check if GPR is stored
if ((imlInstruction->type == PPCREC_IML_TYPE_STORE && imlInstruction->op_storeLoad.copyWidth == 32 ) )
{
if (imlInstruction->op_storeLoad.registerMem.GetRegID() == gprIndex)
break;
if (imlInstruction->op_storeLoad.registerData.GetRegID() == gprIndex)
{
IMLInstruction* imlInstructionStore = imlInstruction;
if (foundMatch == false)
{
// switch the endian swap flag for the load instruction
imlInstructionLoad->op_storeLoad.flags2.swapEndian = !imlInstructionLoad->op_storeLoad.flags2.swapEndian;
foundMatch = true;
}
// switch the endian swap flag for the store instruction
imlInstructionStore->op_storeLoad.flags2.swapEndian = !imlInstructionStore->op_storeLoad.flags2.swapEndian;
// keep scanning
continue;
}
}
// check if GPR is accessed
imlInstruction->CheckRegisterUsage(&registersUsed);
if (registersUsed.readGPR1.IsValidAndSameRegID(gprIndex) ||
registersUsed.readGPR2.IsValidAndSameRegID(gprIndex) ||
registersUsed.readGPR3.IsValidAndSameRegID(gprIndex))
{
break;
}
if (registersUsed.IsBaseGPRWritten(gprReg))
return; // GPR overwritten, we don't need to byte swap anymore
}
if (foundMatch)
{
PPCRecompiler_insertInstruction(imlSegment, i)->make_r_r(PPCREC_IML_OP_ENDIAN_SWAP, gprReg, gprReg);
}
}
/*
* Scans for patterns:
* <Load 32-bit integer into register r>
* <Random unrelated instructions>
* <Store 32-bit integer from register r>
* For these patterns the store and load are modified to work with non-swapped values
* The big_endian->little_endian conversion is then executed later
* Advantages:
* Slightly improves performance
*/
void IMLOptimizer_OptimizeDirectIntegerCopies(ppcImlGenContext_t* ppcImlGenContext)
{
for (IMLSegment* segIt : ppcImlGenContext->segmentList2)
{
for (sint32 i = 0; i < segIt->imlList.size(); i++)
{
IMLInstruction* imlInstruction = segIt->imlList.data() + i;
if (imlInstruction->type == PPCREC_IML_TYPE_LOAD && imlInstruction->op_storeLoad.copyWidth == 32 && imlInstruction->op_storeLoad.flags2.swapEndian )
{
PPCRecompiler_optimizeDirectIntegerCopiesScanForward(ppcImlGenContext, segIt, i, imlInstruction->op_storeLoad.registerData);
}
}
}
}
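/*
 * Illustrative sketch (not part of the original source) of the integer-copy pass above:
 *
 *   LOAD  r <- [memA]   (32-bit, swapEndian)   swap flag cleared by the pass
 *   ...unrelated instructions...
 *   STORE [memB] <- r   (32-bit, swapEndian)   swap flag cleared by the pass
 *   ENDIAN_SWAP r, r                           inserted where the scan stops, unless r was overwritten first
 *
 * The value travels memory-to-memory without ever being byte-swapped; the explicit swap is only
 * materialized at the end of the scanned region so that any later reader of r still sees the
 * correctly swapped value.
 */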
IMLName PPCRecompilerImlGen_GetRegName(ppcImlGenContext_t* ppcImlGenContext, IMLReg reg);
sint32 _getGQRIndexFromRegister(ppcImlGenContext_t* ppcImlGenContext, IMLReg gqrReg)
{
if (gqrReg.IsInvalid())
return -1;
sint32 namedReg = PPCRecompilerImlGen_GetRegName(ppcImlGenContext, gqrReg);
if (namedReg >= (PPCREC_NAME_SPR0 + SPR_UGQR0) && namedReg <= (PPCREC_NAME_SPR0 + SPR_UGQR7))
{
return namedReg - (PPCREC_NAME_SPR0 + SPR_UGQR0);
}
else
{
cemu_assert_suspicious();
}
return -1;
}
bool PPCRecompiler_isUGQRValueKnown(ppcImlGenContext_t* ppcImlGenContext, sint32 gqrIndex, uint32& gqrValue)
{
// UGQR 2 to 7 are initialized by the OS and we assume that games won't ever permanently touch those
// todo - hack - replace with more accurate solution
if (gqrIndex == 2)
gqrValue = 0x00040004;
else if (gqrIndex == 3)
gqrValue = 0x00050005;
else if (gqrIndex == 4)
gqrValue = 0x00060006;
else if (gqrIndex == 5)
gqrValue = 0x00070007;
else
return false;
return true;
}
/*
* If value of GQR can be predicted for a given PSQ load or store instruction then replace it with an optimized version
*/
void PPCRecompiler_optimizePSQLoadAndStore(ppcImlGenContext_t* ppcImlGenContext)
{
for (IMLSegment* segIt : ppcImlGenContext->segmentList2)
{
for(IMLInstruction& instIt : segIt->imlList)
{
if (instIt.type == PPCREC_IML_TYPE_FPR_LOAD || instIt.type == PPCREC_IML_TYPE_FPR_LOAD_INDEXED)
{
if(instIt.op_storeLoad.mode != PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0 &&
instIt.op_storeLoad.mode != PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0_PS1 )
continue;
// get GQR value
cemu_assert_debug(instIt.op_storeLoad.registerGQR.IsValid());
sint32 gqrIndex = _getGQRIndexFromRegister(ppcImlGenContext, instIt.op_storeLoad.registerGQR);
cemu_assert(gqrIndex >= 0);
if (ppcImlGenContext->tracking.modifiesGQR[gqrIndex])
continue;
uint32 gqrValue;
if (!PPCRecompiler_isUGQRValueKnown(ppcImlGenContext, gqrIndex, gqrValue))
continue;
uint32 formatType = (gqrValue >> 16) & 7;
uint32 scale = (gqrValue >> 24) & 0x3F;
if (scale != 0)
continue; // only generic handler supports scale
if (instIt.op_storeLoad.mode == PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0)
{
if (formatType == 0)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0;
else if (formatType == 4)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_U8_PS0;
else if (formatType == 5)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_U16_PS0;
else if (formatType == 6)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_S8_PS0;
else if (formatType == 7)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_S16_PS0;
if (instIt.op_storeLoad.mode != PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0)
instIt.op_storeLoad.registerGQR = IMLREG_INVALID;
}
else if (instIt.op_storeLoad.mode == PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0_PS1)
{
if (formatType == 0)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0_PS1;
else if (formatType == 4)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_U8_PS0_PS1;
else if (formatType == 5)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_U16_PS0_PS1;
else if (formatType == 6)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_S8_PS0_PS1;
else if (formatType == 7)
instIt.op_storeLoad.mode = PPCREC_FPR_LD_MODE_PSQ_S16_PS0_PS1;
if (instIt.op_storeLoad.mode != PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0_PS1)
instIt.op_storeLoad.registerGQR = IMLREG_INVALID;
}
}
else if (instIt.type == PPCREC_IML_TYPE_FPR_STORE || instIt.type == PPCREC_IML_TYPE_FPR_STORE_INDEXED)
{
if(instIt.op_storeLoad.mode != PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0 &&
instIt.op_storeLoad.mode != PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0_PS1)
continue;
// get GQR value
cemu_assert_debug(instIt.op_storeLoad.registerGQR.IsValid());
sint32 gqrIndex = _getGQRIndexFromRegister(ppcImlGenContext, instIt.op_storeLoad.registerGQR);
cemu_assert(gqrIndex >= 0 && gqrIndex < 8);
if (ppcImlGenContext->tracking.modifiesGQR[gqrIndex])
continue;
uint32 gqrValue;
if(!PPCRecompiler_isUGQRValueKnown(ppcImlGenContext, gqrIndex, gqrValue))
continue;
uint32 formatType = (gqrValue >> 16) & 7;
uint32 scale = (gqrValue >> 24) & 0x3F;
if (scale != 0)
continue; // only generic handler supports scale
if (instIt.op_storeLoad.mode == PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0)
{
if (formatType == 0)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_FLOAT_PS0;
else if (formatType == 4)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_U8_PS0;
else if (formatType == 5)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_U16_PS0;
else if (formatType == 6)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_S8_PS0;
else if (formatType == 7)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_S16_PS0;
if (instIt.op_storeLoad.mode != PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0)
instIt.op_storeLoad.registerGQR = IMLREG_INVALID;
}
else if (instIt.op_storeLoad.mode == PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0_PS1)
{
if (formatType == 0)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_FLOAT_PS0_PS1;
else if (formatType == 4)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_U8_PS0_PS1;
else if (formatType == 5)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_U16_PS0_PS1;
else if (formatType == 6)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_S8_PS0_PS1;
else if (formatType == 7)
instIt.op_storeLoad.mode = PPCREC_FPR_ST_MODE_PSQ_S16_PS0_PS1;
if (instIt.op_storeLoad.mode != PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0_PS1)
instIt.op_storeLoad.registerGQR = IMLREG_INVALID;
}
}
}
}
}
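/*
 * Worked example for the pass above (illustration, not part of the original source):
 * a generic PSQ load through UGQR2 is assumed to see gqrValue 0x00040004, so
 *   formatType = (0x00040004 >> 16) & 7    = 4  -> unsigned 8-bit
 *   scale      = (0x00040004 >> 24) & 0x3F = 0  -> no scaling
 * and PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0 is rewritten to PPCREC_FPR_LD_MODE_PSQ_U8_PS0 with
 * registerGQR cleared to IMLREG_INVALID, removing the runtime GQR dependency entirely.
 */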
// analyses register dependencies across the entire function
// per segment this will generate information about which registers need to be preserved and which ones don't (e.g. are overwritten)
class IMLOptimizerRegIOAnalysis
{
public:
// constructor with segment pointer list as span
IMLOptimizerRegIOAnalysis(std::span<IMLSegment*> segmentList, uint32 maxRegId) : m_segmentList(segmentList), m_maxRegId(maxRegId)
{
m_segRegisterInOutList.resize(segmentList.size());
}
struct IMLSegmentRegisterInOut
{
// todo - since our register ID range is usually pretty small (<64) we could use integer bitmasks to accelerate this? There is a helper class used in RA code already
std::unordered_set<IMLRegID> regWritten; // registers which are modified in this segment
std::unordered_set<IMLRegID> regImported; // registers which are read in this segment before they are written (importing value from previous segments)
std::unordered_set<IMLRegID> regForward; // registers which are not read or written in this segment, but are imported into a later segment (propagated info)
};
// calculate which registers are imported (read-before-written) and forwarded (read-before-written by a later segment) per segment
// then in a second step propagate the dependencies across linked segments
void ComputeDepedencies()
{
std::vector<IMLSegmentRegisterInOut>& segRegisterInOutList = m_segRegisterInOutList;
IMLSegmentRegisterInOut* segIO = segRegisterInOutList.data();
uint32 index = 0;
for(auto& seg : m_segmentList)
{
seg->momentaryIndex = index;
index++;
for(auto& instr : seg->imlList)
{
IMLUsedRegisters registerUsage;
instr.CheckRegisterUsage(&registerUsage);
// registers are considered imported if they are read before being written in this seg
registerUsage.ForEachReadGPR([&](IMLReg gprReg) {
IMLRegID gprId = gprReg.GetRegID();
if (!segIO->regWritten.contains(gprId))
{
segIO->regImported.insert(gprId);
}
});
registerUsage.ForEachWrittenGPR([&](IMLReg gprReg) {
IMLRegID gprId = gprReg.GetRegID();
segIO->regWritten.insert(gprId);
});
}
segIO++;
}
// for every exit segment, import all registers
for(auto& seg : m_segmentList)
{
if (!seg->nextSegmentIsUncertain)
continue;
if(seg->deadCodeEliminationHintSeg)
continue;
IMLSegmentRegisterInOut& segIO = segRegisterInOutList[seg->momentaryIndex];
for(uint32 i=0; i<=m_maxRegId; i++)
{
segIO.regImported.insert((IMLRegID)i);
}
}
// broadcast dependencies across segment chains
std::unordered_set<uint32> segIdsWhichNeedUpdate;
for (uint32 i = 0; i < m_segmentList.size(); i++)
{
segIdsWhichNeedUpdate.insert(i);
}
while(!segIdsWhichNeedUpdate.empty())
{
auto firstIt = segIdsWhichNeedUpdate.begin();
uint32 segId = *firstIt;
segIdsWhichNeedUpdate.erase(firstIt);
// forward regImported and regForward to earlier segments into their regForward, unless the register is written
auto& curSeg = m_segmentList[segId];
IMLSegmentRegisterInOut& curSegIO = segRegisterInOutList[segId];
for(auto& prevSeg : curSeg->list_prevSegments)
{
IMLSegmentRegisterInOut& prevSegIO = segRegisterInOutList[prevSeg->momentaryIndex];
bool prevSegChanged = false;
for(auto& regId : curSegIO.regImported)
{
if (!prevSegIO.regWritten.contains(regId))
prevSegChanged |= prevSegIO.regForward.insert(regId).second;
}
for(auto& regId : curSegIO.regForward)
{
if (!prevSegIO.regWritten.contains(regId))
prevSegChanged |= prevSegIO.regForward.insert(regId).second;
}
if(prevSegChanged)
segIdsWhichNeedUpdate.insert(prevSeg->momentaryIndex);
}
// same for hint links
for(auto& prevSeg : curSeg->list_deadCodeHintBy)
{
IMLSegmentRegisterInOut& prevSegIO = segRegisterInOutList[prevSeg->momentaryIndex];
bool prevSegChanged = false;
for(auto& regId : curSegIO.regImported)
{
if (!prevSegIO.regWritten.contains(regId))
prevSegChanged |= prevSegIO.regForward.insert(regId).second;
}
for(auto& regId : curSegIO.regForward)
{
if (!prevSegIO.regWritten.contains(regId))
prevSegChanged |= prevSegIO.regForward.insert(regId).second;
}
if(prevSegChanged)
segIdsWhichNeedUpdate.insert(prevSeg->momentaryIndex);
}
}
}
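// Illustration (not part of the original source): given segments A -> B where B reads r5 before
// writing it, r5 ends up in B's regImported; the broadcast loop above then adds r5 to A's
// regForward unless A itself writes r5 (in which case A's own write already satisfies B's demand).
// Exit segments with an uncertain successor and no dead-code hint conservatively import every
// register up to m_maxRegId.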
std::unordered_set<IMLRegID> GetRegistersNeededAtEndOfSegment(IMLSegment& seg)
{
std::unordered_set<IMLRegID> regsNeeded;
if(seg.nextSegmentIsUncertain)
{
if(seg.deadCodeEliminationHintSeg)
{
auto& nextSegIO = m_segRegisterInOutList[seg.deadCodeEliminationHintSeg->momentaryIndex];
regsNeeded.insert(nextSegIO.regImported.begin(), nextSegIO.regImported.end());
regsNeeded.insert(nextSegIO.regForward.begin(), nextSegIO.regForward.end());
}
else
{
// add all regs
for(uint32 i = 0; i <= m_maxRegId; i++)
regsNeeded.insert(i);
}
return regsNeeded;
}
if(seg.nextSegmentBranchTaken)
{
auto& nextSegIO = m_segRegisterInOutList[seg.nextSegmentBranchTaken->momentaryIndex];
regsNeeded.insert(nextSegIO.regImported.begin(), nextSegIO.regImported.end());
regsNeeded.insert(nextSegIO.regForward.begin(), nextSegIO.regForward.end());
}
if(seg.nextSegmentBranchNotTaken)
{
auto& nextSegIO = m_segRegisterInOutList[seg.nextSegmentBranchNotTaken->momentaryIndex];
regsNeeded.insert(nextSegIO.regImported.begin(), nextSegIO.regImported.end());
regsNeeded.insert(nextSegIO.regForward.begin(), nextSegIO.regForward.end());
}
return regsNeeded;
}
bool IsRegisterNeededAtEndOfSegment(IMLSegment& seg, IMLRegID regId)
{
if(seg.nextSegmentIsUncertain)
{
if(!seg.deadCodeEliminationHintSeg)
return true;
auto& nextSegIO = m_segRegisterInOutList[seg.deadCodeEliminationHintSeg->momentaryIndex];
if(nextSegIO.regImported.contains(regId))
return true;
if(nextSegIO.regForward.contains(regId))
return true;
return false;
}
if(seg.nextSegmentBranchTaken)
{
auto& nextSegIO = m_segRegisterInOutList[seg.nextSegmentBranchTaken->momentaryIndex];
if(nextSegIO.regImported.contains(regId))
return true;
if(nextSegIO.regForward.contains(regId))
return true;
}
if(seg.nextSegmentBranchNotTaken)
{
auto& nextSegIO = m_segRegisterInOutList[seg.nextSegmentBranchNotTaken->momentaryIndex];
if(nextSegIO.regImported.contains(regId))
return true;
if(nextSegIO.regForward.contains(regId))
return true;
}
return false;
}
private:
std::span<IMLSegment*> m_segmentList;
uint32 m_maxRegId;
std::vector<IMLSegmentRegisterInOut> m_segRegisterInOutList;
};
// scan backwards starting from index and return the index of the first found instruction which writes to the given register (by id)
sint32 IMLUtil_FindInstructionWhichWritesRegister(IMLSegment& seg, sint32 startIndex, IMLReg reg, sint32 maxScanDistance = -1)
{
sint32 endIndex = std::max<sint32>(startIndex - maxScanDistance, 0);
for (sint32 i = startIndex; i >= endIndex; i--)
{
IMLInstruction& imlInstruction = seg.imlList[i];
IMLUsedRegisters registersUsed;
imlInstruction.CheckRegisterUsage(&registersUsed);
if (registersUsed.IsBaseGPRWritten(reg))
return i;
}
return -1;
}
// returns true if the instruction can safely be moved while keeping ordering constraints and data dependencies intact
// initialIndex is inclusive, targetIndex is exclusive
bool IMLUtil_CanMoveInstructionTo(IMLSegment& seg, sint32 initialIndex, sint32 targetIndex)
{
boost::container::static_vector<IMLRegID, 8> regsWritten;
boost::container::static_vector<IMLRegID, 8> regsRead;
// get list of read and written registers
IMLUsedRegisters registersUsed;
seg.imlList[initialIndex].CheckRegisterUsage(&registersUsed);
registersUsed.ForEachAccessedGPR([&](IMLReg reg, bool isWritten) {
if (isWritten)
regsWritten.push_back(reg.GetRegID());
else
regsRead.push_back(reg.GetRegID());
});
// check all the instructions inbetween
if(initialIndex < targetIndex)
{
sint32 scanStartIndex = initialIndex+1; // +1 to skip the moving instruction itself
sint32 scanEndIndex = targetIndex;
for (sint32 i = scanStartIndex; i < scanEndIndex; i++)
{
IMLUsedRegisters registersUsed;
seg.imlList[i].CheckRegisterUsage(&registersUsed);
// in order to be able to move an instruction past another instruction, any of the read registers must not be modified (written)
// and any of its written registers must not be read
bool canMove = true;
registersUsed.ForEachAccessedGPR([&](IMLReg reg, bool isWritten) {
IMLRegID regId = reg.GetRegID();
if (!isWritten)
canMove = canMove && std::find(regsWritten.begin(), regsWritten.end(), regId) == regsWritten.end();
else
canMove = canMove && std::find(regsRead.begin(), regsRead.end(), regId) == regsRead.end();
});
if(!canMove)
return false;
}
}
else
{
cemu_assert_unimplemented(); // backwards scan is todo
return false;
}
return true;
}
sint32 IMLUtil_CountRegisterReadsInRange(IMLSegment& seg, sint32 scanStartIndex, sint32 scanEndIndex, IMLRegID regId)
{
cemu_assert_debug(scanStartIndex <= scanEndIndex);
cemu_assert_debug(scanEndIndex < seg.imlList.size());
sint32 count = 0;
for (sint32 i = scanStartIndex; i <= scanEndIndex; i++)
{
IMLUsedRegisters registersUsed;
seg.imlList[i].CheckRegisterUsage(&registersUsed);
registersUsed.ForEachReadGPR([&](IMLReg reg) {
if (reg.GetRegID() == regId)
count++;
});
}
return count;
}
// move instruction from one index to another
// instruction will be inserted before the instruction at targetIndex
// returns the new instruction index of the moved instruction
sint32 IMLUtil_MoveInstructionTo(IMLSegment& seg, sint32 initialIndex, sint32 targetIndex)
{
cemu_assert_debug(initialIndex != targetIndex);
IMLInstruction temp = seg.imlList[initialIndex];
if (initialIndex < targetIndex)
{
cemu_assert_debug(targetIndex > 0);
targetIndex--;
for(size_t i=initialIndex; i<targetIndex; i++)
seg.imlList[i] = seg.imlList[i+1];
seg.imlList[targetIndex] = temp;
return targetIndex;
}
else
{
cemu_assert_unimplemented(); // testing needed
std::copy(seg.imlList.begin() + targetIndex, seg.imlList.begin() + initialIndex, seg.imlList.begin() + targetIndex + 1);
seg.imlList[targetIndex] = temp;
return targetIndex;
}
}
// x86 specific
bool IMLOptimizerX86_ModifiesEFlags(IMLInstruction& inst)
{
// this is a very conservative implementation. There are more cases but this is good enough for now
if(inst.type == PPCREC_IML_TYPE_NAME_R || inst.type == PPCREC_IML_TYPE_R_NAME)
return false;
if((inst.type == PPCREC_IML_TYPE_R_R || inst.type == PPCREC_IML_TYPE_R_S32) && inst.operation == PPCREC_IML_OP_ASSIGN)
return false;
return true; // if we don't know for sure, assume it does
}
void IMLOptimizer_DebugPrintSeg(ppcImlGenContext_t& ppcImlGenContext, IMLSegment& seg)
{
printf("----------------\n");
IMLDebug_DumpSegment(&ppcImlGenContext, &seg);
fflush(stdout);
}
void IMLOptimizer_RemoveDeadCodeFromSegment(IMLOptimizerRegIOAnalysis& regIoAnalysis, IMLSegment& seg)
{
// algorithm works like this:
// Calculate which registers need to be preserved at the end of each segment
// Then for each segment:
// - Iterate instructions backwards
// - Maintain a list of registers which are read at a later point (initially this is the list from the first step)
// - If an instruction only modifies registers which are not in the read list and has no side effects, then it is dead code and can be replaced with a no-op
std::unordered_set<IMLRegID> regsNeeded = regIoAnalysis.GetRegistersNeededAtEndOfSegment(seg);
// start with suffix instruction
if(seg.HasSuffixInstruction())
{
IMLInstruction& imlInstruction = seg.imlList[seg.GetSuffixInstructionIndex()];
IMLUsedRegisters registersUsed;
imlInstruction.CheckRegisterUsage(&registersUsed);
registersUsed.ForEachWrittenGPR([&](IMLReg reg) {
regsNeeded.erase(reg.GetRegID());
});
registersUsed.ForEachReadGPR([&](IMLReg reg) {
regsNeeded.insert(reg.GetRegID());
});
}
// iterate instructions backwards
for (sint32 i = seg.imlList.size() - (seg.HasSuffixInstruction() ? 2:1); i >= 0; i--)
{
IMLInstruction& imlInstruction = seg.imlList[i];
IMLUsedRegisters registersUsed;
imlInstruction.CheckRegisterUsage(&registersUsed);
// register read -> remove from overwritten list
// register written -> add to overwritten list
// check if this instruction only writes registers which will never be read
bool onlyWritesRedundantRegisters = true;
registersUsed.ForEachWrittenGPR([&](IMLReg reg) {
if (regsNeeded.contains(reg.GetRegID()))
onlyWritesRedundantRegisters = false;
});
// check if any of the written registers are read after this point
registersUsed.ForEachWrittenGPR([&](IMLReg reg) {
regsNeeded.erase(reg.GetRegID());
});
registersUsed.ForEachReadGPR([&](IMLReg reg) {
regsNeeded.insert(reg.GetRegID());
});
if(!imlInstruction.HasSideEffects() && onlyWritesRedundantRegisters)
{
imlInstruction.make_no_op();
}
}
}
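/*
 * Illustration (not part of the original source) for the pass above: walking backwards with an
 * initially empty regsNeeded set,
 *   ASSIGN r6, 0x1234      <- r6 is never read afterwards and the op has no side effects -> make_no_op()
 *   STORE  [r3] <- r4      <- a memory store has side effects beyond registers, so it is always kept
 * Reads encountered during the walk re-insert their registers into regsNeeded, which keeps the
 * earlier producers of those values alive.
 */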
void IMLOptimizerX86_SubstituteCJumpForEflagsJump(IMLOptimizerRegIOAnalysis& regIoAnalysis, IMLSegment& seg)
{
// convert and optimize bool condition jumps to eflags condition jumps
// - Moves the eflags setter (e.g. cmp) closer to the eflags consumer (the conditional jump) if necessary. If that is required but not possible, exit early
// - Since we only rely on eflags, the boolean register can be optimized out if DCE considers it unused
// - Further detect and optimize patterns like DEC + CMP + JCC into fused ops (todo)
// check if this segment ends with a conditional jump
if(!seg.HasSuffixInstruction())
return;
sint32 cjmpInstIndex = seg.GetSuffixInstructionIndex();
if(cjmpInstIndex < 0)
return;
IMLInstruction& cjumpInstr = seg.imlList[cjmpInstIndex];
if( cjumpInstr.type != PPCREC_IML_TYPE_CONDITIONAL_JUMP )
return;
IMLReg regCondBool = cjumpInstr.op_conditional_jump.registerBool;
bool invertedCondition = !cjumpInstr.op_conditional_jump.mustBeTrue;
// find the instruction which sets the bool
sint32 cmpInstrIndex = IMLUtil_FindInstructionWhichWritesRegister(seg, cjmpInstIndex-1, regCondBool, 20);
if(cmpInstrIndex < 0)
return;
// check if it's an instruction combo which can be optimized (currently only cmp + cjump) and get the condition
IMLInstruction& condSetterInstr = seg.imlList[cmpInstrIndex];
IMLCondition cond;
if(condSetterInstr.type == PPCREC_IML_TYPE_COMPARE)
cond = condSetterInstr.op_compare.cond;
else if(condSetterInstr.type == PPCREC_IML_TYPE_COMPARE_S32)
cond = condSetterInstr.op_compare_s32.cond;
else
return;
// check if instructions inbetween modify eflags
sint32 indexEflagsSafeStart = -1; // index of the first instruction which does not modify eflags up to cjump
for(sint32 i = cjmpInstIndex-1; i > cmpInstrIndex; i--)
{
if(IMLOptimizerX86_ModifiesEFlags(seg.imlList[i]))
{
indexEflagsSafeStart = i+1;
break;
}
}
if(indexEflagsSafeStart >= 0)
{
cemu_assert(indexEflagsSafeStart > 0);
// there are eflags-modifying instructions inbetween the bool setter and cjump
// try to move the eflags setter close enough to the cjump (to indexEflagsSafeStart)
bool canMove = IMLUtil_CanMoveInstructionTo(seg, cmpInstrIndex, indexEflagsSafeStart);
if(!canMove)
{
return;
}
else
{
cmpInstrIndex = IMLUtil_MoveInstructionTo(seg, cmpInstrIndex, indexEflagsSafeStart);
}
}
// we can turn the jump into an eflags jump
cjumpInstr.make_x86_eflags_jcc(cond, invertedCondition);
if (IMLUtil_CountRegisterReadsInRange(seg, cmpInstrIndex, cjmpInstIndex, regCondBool.GetRegID()) > 1 || regIoAnalysis.IsRegisterNeededAtEndOfSegment(seg, regCondBool.GetRegID()))
return; // bool register is used beyond the CMP, we can't drop it
auto& cmpInstr = seg.imlList[cmpInstrIndex];
cemu_assert_debug(cmpInstr.type == PPCREC_IML_TYPE_COMPARE || cmpInstr.type == PPCREC_IML_TYPE_COMPARE_S32);
if(cmpInstr.type == PPCREC_IML_TYPE_COMPARE)
{
IMLReg regA = cmpInstr.op_compare.regA;
IMLReg regB = cmpInstr.op_compare.regB;
seg.imlList[cmpInstrIndex].make_r_r(PPCREC_IML_OP_X86_CMP, regA, regB);
}
else
{
IMLReg regA = cmpInstr.op_compare_s32.regA;
sint32 val = cmpInstr.op_compare_s32.immS32;
seg.imlList[cmpInstrIndex].make_r_s32(PPCREC_IML_OP_X86_CMP, regA, val);
}
}
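/*
 * Illustration (not part of the original source) of the transformation above:
 *   COMPARE          regBool = compare(rA, rB, cond)
 *   ...instructions that do not clobber eflags (or the compare is moved down past them)...
 *   CONDITIONAL_JUMP regBool, mustBeTrue
 * becomes
 *   X86_CMP          rA, rB
 *   X86_EFLAGS_JCC   cond (invertedCondition = !mustBeTrue)
 * and when regBool is not read anywhere else and not live out of the segment, the original COMPARE
 * is rewritten in place to the plain X86_CMP so the boolean register is never materialized.
 */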
void IMLOptimizer_StandardOptimizationPassForSegment(IMLOptimizerRegIOAnalysis& regIoAnalysis, IMLSegment& seg)
{
IMLOptimizer_RemoveDeadCodeFromSegment(regIoAnalysis, seg);
// x86 specific optimizations
IMLOptimizerX86_SubstituteCJumpForEflagsJump(regIoAnalysis, seg); // this pass should be applied late since it creates invisible eflags dependencies (which would break further register dependency analysis)
}
void IMLOptimizer_StandardOptimizationPass(ppcImlGenContext_t& ppcImlGenContext)
{
IMLOptimizerRegIOAnalysis regIoAnalysis(ppcImlGenContext.segmentList2, ppcImlGenContext.GetMaxRegId());
regIoAnalysis.ComputeDepedencies();
for (IMLSegment* segIt : ppcImlGenContext.segmentList2)
{
IMLOptimizer_StandardOptimizationPassForSegment(regIoAnalysis, *segIt);
}
}

File diff suppressed because it is too large


@@ -0,0 +1,125 @@
#pragma once
// container for storing a set of register indices
// specifically optimized for the typical range of physical register indices (expected to be below 64)
class IMLPhysRegisterSet
{
public:
void SetAvailable(uint32 index)
{
cemu_assert_debug(index < 64);
m_regBitmask |= ((uint64)1 << index);
}
void SetReserved(uint32 index)
{
cemu_assert_debug(index < 64);
m_regBitmask &= ~((uint64)1 << index);
}
void SetAllAvailable()
{
m_regBitmask = ~0ull;
}
bool HasAllAvailable() const
{
return m_regBitmask == ~0ull;
}
bool IsAvailable(uint32 index) const
{
return (m_regBitmask & ((uint64)1 << index)) != 0;
}
IMLPhysRegisterSet& operator&=(const IMLPhysRegisterSet& other)
{
this->m_regBitmask &= other.m_regBitmask;
return *this;
}
IMLPhysRegisterSet& operator=(const IMLPhysRegisterSet& other)
{
this->m_regBitmask = other.m_regBitmask;
return *this;
}
void RemoveRegisters(const IMLPhysRegisterSet& other)
{
this->m_regBitmask &= ~other.m_regBitmask;
}
bool HasAnyAvailable() const
{
return m_regBitmask != 0;
}
bool HasExactlyOneAvailable() const
{
return m_regBitmask != 0 && (m_regBitmask & (m_regBitmask - 1)) == 0;
}
// returns index of first available register. Do not call when HasAnyAvailable() == false
IMLPhysReg GetFirstAvailableReg()
{
cemu_assert_debug(m_regBitmask != 0);
sint32 regIndex = 0;
auto tmp = m_regBitmask;
while ((tmp & 0xFF) == 0)
{
regIndex += 8;
tmp >>= 8;
}
while ((tmp & 0x1) == 0)
{
regIndex++;
tmp >>= 1;
}
return regIndex;
}
// returns index of next available register (search includes any register index >= startIndex)
// returns -1 if there is no more register
IMLPhysReg GetNextAvailableReg(sint32 startIndex) const
{
if (startIndex >= 64)
return -1;
uint32 regIndex = startIndex;
auto tmp = m_regBitmask;
tmp >>= regIndex;
if (!tmp)
return -1;
while ((tmp & 0xFF) == 0)
{
regIndex += 8;
tmp >>= 8;
}
while ((tmp & 0x1) == 0)
{
regIndex++;
tmp >>= 1;
}
return regIndex;
}
sint32 CountAvailableRegs() const
{
return std::popcount(m_regBitmask);
}
private:
uint64 m_regBitmask{ 0 };
};
struct IMLRegisterAllocatorParameters
{
inline IMLPhysRegisterSet& GetPhysRegPool(IMLRegFormat regFormat)
{
return perTypePhysPool[stdx::to_underlying(regFormat)];
}
IMLPhysRegisterSet perTypePhysPool[stdx::to_underlying(IMLRegFormat::TYPE_COUNT)];
std::unordered_map<IMLRegID, IMLName> regIdToName;
};
void IMLRegisterAllocator_AllocateRegisters(ppcImlGenContext_t* ppcImlGenContext, IMLRegisterAllocatorParameters& raParam);
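/*
 * A minimal usage sketch (illustration only; the real backend setup lives elsewhere and the
 * register counts below are assumptions, not the actual x64 configuration):
 *
 *   IMLRegisterAllocatorParameters raParam;
 *   for (int i = 0; i < 8; i++) // hypothetical: expose 8 integer registers to the allocator
 *       raParam.GetPhysRegPool(IMLRegFormat::I64).SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + i);
 *   for (int i = 0; i < 8; i++) // hypothetical: expose 8 floating point registers
 *       raParam.GetPhysRegPool(IMLRegFormat::F64).SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + i);
 *   IMLRegisterAllocator_AllocateRegisters(ppcImlGenContext, raParam);
 */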


@@ -0,0 +1,635 @@
#include "../PPCRecompiler.h"
#include "../PPCRecompilerIml.h"
#include "IMLRegisterAllocatorRanges.h"
#include "util/helpers/MemoryPool.h"
uint32 IMLRA_GetNextIterationIndex();
IMLRegID raLivenessRange::GetVirtualRegister() const
{
return virtualRegister;
}
sint32 raLivenessRange::GetPhysicalRegister() const
{
return physicalRegister;
}
IMLName raLivenessRange::GetName() const
{
return name;
}
void raLivenessRange::SetPhysicalRegister(IMLPhysReg physicalRegister)
{
this->physicalRegister = physicalRegister;
}
void raLivenessRange::SetPhysicalRegisterForCluster(IMLPhysReg physicalRegister)
{
auto clusterRanges = GetAllSubrangesInCluster();
for(auto& range : clusterRanges)
range->physicalRegister = physicalRegister;
}
boost::container::small_vector<raLivenessRange*, 128> raLivenessRange::GetAllSubrangesInCluster()
{
uint32 iterationIndex = IMLRA_GetNextIterationIndex();
boost::container::small_vector<raLivenessRange*, 128> subranges;
subranges.push_back(this);
this->lastIterationIndex = iterationIndex;
size_t i = 0;
while(i<subranges.size())
{
raLivenessRange* cur = subranges[i];
i++;
// check successors
if(cur->subrangeBranchTaken && cur->subrangeBranchTaken->lastIterationIndex != iterationIndex)
{
cur->subrangeBranchTaken->lastIterationIndex = iterationIndex;
subranges.push_back(cur->subrangeBranchTaken);
}
if(cur->subrangeBranchNotTaken && cur->subrangeBranchNotTaken->lastIterationIndex != iterationIndex)
{
cur->subrangeBranchNotTaken->lastIterationIndex = iterationIndex;
subranges.push_back(cur->subrangeBranchNotTaken);
}
// check predecessors
for(auto& prev : cur->previousRanges)
{
if(prev->lastIterationIndex != iterationIndex)
{
prev->lastIterationIndex = iterationIndex;
subranges.push_back(prev);
}
}
}
return subranges;
}
void raLivenessRange::GetAllowedRegistersExRecursive(raLivenessRange* range, uint32 iterationIndex, IMLPhysRegisterSet& allowedRegs)
{
range->lastIterationIndex = iterationIndex;
for (auto& it : range->list_fixedRegRequirements)
allowedRegs &= it.allowedReg;
// check successors
if (range->subrangeBranchTaken && range->subrangeBranchTaken->lastIterationIndex != iterationIndex)
GetAllowedRegistersExRecursive(range->subrangeBranchTaken, iterationIndex, allowedRegs);
if (range->subrangeBranchNotTaken && range->subrangeBranchNotTaken->lastIterationIndex != iterationIndex)
GetAllowedRegistersExRecursive(range->subrangeBranchNotTaken, iterationIndex, allowedRegs);
// check predecessors
for (auto& prev : range->previousRanges)
{
if (prev->lastIterationIndex != iterationIndex)
GetAllowedRegistersExRecursive(prev, iterationIndex, allowedRegs);
}
};
bool raLivenessRange::GetAllowedRegistersEx(IMLPhysRegisterSet& allowedRegisters)
{
uint32 iterationIndex = IMLRA_GetNextIterationIndex();
allowedRegisters.SetAllAvailable();
GetAllowedRegistersExRecursive(this, iterationIndex, allowedRegisters);
return !allowedRegisters.HasAllAvailable();
}
IMLPhysRegisterSet raLivenessRange::GetAllowedRegisters(IMLPhysRegisterSet regPool)
{
IMLPhysRegisterSet fixedRegRequirements = regPool;
if(interval.ExtendsPreviousSegment() || interval.ExtendsIntoNextSegment())
{
auto clusterRanges = GetAllSubrangesInCluster();
for(auto& subrange : clusterRanges)
{
for(auto& fixedRegLoc : subrange->list_fixedRegRequirements)
fixedRegRequirements &= fixedRegLoc.allowedReg;
}
return fixedRegRequirements;
}
for(auto& fixedRegLoc : list_fixedRegRequirements)
fixedRegRequirements &= fixedRegLoc.allowedReg;
return fixedRegRequirements;
}
void PPCRecRARange_addLink_perVirtualGPR(std::unordered_map<IMLRegID, raLivenessRange*>& root, raLivenessRange* subrange)
{
IMLRegID regId = subrange->GetVirtualRegister();
auto it = root.find(regId);
if (it == root.end())
{
// new single element
root.try_emplace(regId, subrange);
subrange->link_sameVirtualRegister.prev = nullptr;
subrange->link_sameVirtualRegister.next = nullptr;
}
else
{
// insert in first position
raLivenessRange* priorFirst = it->second;
subrange->link_sameVirtualRegister.next = priorFirst;
it->second = subrange;
subrange->link_sameVirtualRegister.prev = nullptr;
priorFirst->link_sameVirtualRegister.prev = subrange;
}
}
void PPCRecRARange_addLink_allSegmentRanges(raLivenessRange** root, raLivenessRange* subrange)
{
subrange->link_allSegmentRanges.next = *root;
if (*root)
(*root)->link_allSegmentRanges.prev = subrange;
subrange->link_allSegmentRanges.prev = nullptr;
*root = subrange;
}
void PPCRecRARange_removeLink_perVirtualGPR(std::unordered_map<IMLRegID, raLivenessRange*>& root, raLivenessRange* subrange)
{
#ifdef CEMU_DEBUG_ASSERT
raLivenessRange* cur = root.find(subrange->GetVirtualRegister())->second;
bool hasRangeFound = false;
while(cur)
{
if(cur == subrange)
{
hasRangeFound = true;
break;
}
cur = cur->link_sameVirtualRegister.next;
}
cemu_assert_debug(hasRangeFound);
#endif
IMLRegID regId = subrange->GetVirtualRegister();
raLivenessRange* nextRange = subrange->link_sameVirtualRegister.next;
raLivenessRange* prevRange = subrange->link_sameVirtualRegister.prev;
raLivenessRange* newBase = prevRange ? prevRange : nextRange;
if (prevRange)
prevRange->link_sameVirtualRegister.next = subrange->link_sameVirtualRegister.next;
if (nextRange)
nextRange->link_sameVirtualRegister.prev = subrange->link_sameVirtualRegister.prev;
if (!prevRange)
{
if (nextRange)
{
root.find(regId)->second = nextRange;
}
else
{
cemu_assert_debug(root.find(regId)->second == subrange);
root.erase(regId);
}
}
#ifdef CEMU_DEBUG_ASSERT
subrange->link_sameVirtualRegister.prev = (raLivenessRange*)1;
subrange->link_sameVirtualRegister.next = (raLivenessRange*)1;
#endif
}
void PPCRecRARange_removeLink_allSegmentRanges(raLivenessRange** root, raLivenessRange* subrange)
{
raLivenessRange* tempPrev = subrange->link_allSegmentRanges.prev;
if (subrange->link_allSegmentRanges.prev)
subrange->link_allSegmentRanges.prev->link_allSegmentRanges.next = subrange->link_allSegmentRanges.next;
else
(*root) = subrange->link_allSegmentRanges.next;
if (subrange->link_allSegmentRanges.next)
subrange->link_allSegmentRanges.next->link_allSegmentRanges.prev = tempPrev;
#ifdef CEMU_DEBUG_ASSERT
subrange->link_allSegmentRanges.prev = (raLivenessRange*)1;
subrange->link_allSegmentRanges.next = (raLivenessRange*)1;
#endif
}
MemoryPoolPermanentObjects<raLivenessRange> memPool_livenessSubrange(4096);
// startPosition and endPosition are inclusive
raLivenessRange* IMLRA_CreateRange(ppcImlGenContext_t* ppcImlGenContext, IMLSegment* imlSegment, IMLRegID virtualRegister, IMLName name, raInstructionEdge startPosition, raInstructionEdge endPosition)
{
raLivenessRange* range = memPool_livenessSubrange.acquireObj();
range->previousRanges.clear();
range->list_accessLocations.clear();
range->list_fixedRegRequirements.clear();
range->imlSegment = imlSegment;
cemu_assert_debug(startPosition <= endPosition);
range->interval.start = startPosition;
range->interval.end = endPosition;
// register mapping
range->virtualRegister = virtualRegister;
range->name = name;
range->physicalRegister = -1;
// default values
range->hasStore = false;
range->hasStoreDelayed = false;
range->lastIterationIndex = 0;
range->subrangeBranchNotTaken = nullptr;
range->subrangeBranchTaken = nullptr;
cemu_assert_debug(range->previousRanges.empty());
range->_noLoad = false;
// add to segment linked lists
PPCRecRARange_addLink_perVirtualGPR(imlSegment->raInfo.linkedList_perVirtualRegister, range);
PPCRecRARange_addLink_allSegmentRanges(&imlSegment->raInfo.linkedList_allSubranges, range);
return range;
}
void _unlinkSubrange(raLivenessRange* range)
{
IMLSegment* imlSegment = range->imlSegment;
PPCRecRARange_removeLink_perVirtualGPR(imlSegment->raInfo.linkedList_perVirtualRegister, range);
PPCRecRARange_removeLink_allSegmentRanges(&imlSegment->raInfo.linkedList_allSubranges, range);
// unlink reverse references
if(range->subrangeBranchTaken)
range->subrangeBranchTaken->previousRanges.erase(std::find(range->subrangeBranchTaken->previousRanges.begin(), range->subrangeBranchTaken->previousRanges.end(), range));
if(range->subrangeBranchNotTaken)
range->subrangeBranchNotTaken->previousRanges.erase(std::find(range->subrangeBranchNotTaken->previousRanges.begin(), range->subrangeBranchNotTaken->previousRanges.end(), range));
range->subrangeBranchTaken = (raLivenessRange*)(uintptr_t)-1;
range->subrangeBranchNotTaken = (raLivenessRange*)(uintptr_t)-1;
// remove forward references
for(auto& prev : range->previousRanges)
{
if(prev->subrangeBranchTaken == range)
prev->subrangeBranchTaken = nullptr;
if(prev->subrangeBranchNotTaken == range)
prev->subrangeBranchNotTaken = nullptr;
}
range->previousRanges.clear();
}
void IMLRA_DeleteRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange* range)
{
_unlinkSubrange(range);
range->list_accessLocations.clear();
range->list_fixedRegRequirements.clear();
memPool_livenessSubrange.releaseObj(range);
}
void IMLRA_DeleteRangeCluster(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange* range)
{
auto clusterRanges = range->GetAllSubrangesInCluster();
for (auto& subrange : clusterRanges)
IMLRA_DeleteRange(ppcImlGenContext, subrange);
}
void IMLRA_DeleteAllRanges(ppcImlGenContext_t* ppcImlGenContext)
{
for(auto& seg : ppcImlGenContext->segmentList2)
{
raLivenessRange* cur;
while ((cur = seg->raInfo.linkedList_allSubranges) != nullptr)
IMLRA_DeleteRange(ppcImlGenContext, cur);
seg->raInfo.linkedList_allSubranges = nullptr;
seg->raInfo.linkedList_perVirtualRegister.clear();
}
}
void IMLRA_MergeSubranges(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange* subrange, raLivenessRange* absorbedSubrange)
{
#ifdef CEMU_DEBUG_ASSERT
PPCRecRA_debugValidateSubrange(subrange);
PPCRecRA_debugValidateSubrange(absorbedSubrange);
if (subrange->imlSegment != absorbedSubrange->imlSegment)
assert_dbg();
cemu_assert_debug(subrange->interval.end == absorbedSubrange->interval.start);
if (subrange->subrangeBranchTaken || subrange->subrangeBranchNotTaken)
assert_dbg();
if (subrange == absorbedSubrange)
assert_dbg();
#endif
// update references
subrange->subrangeBranchTaken = absorbedSubrange->subrangeBranchTaken;
subrange->subrangeBranchNotTaken = absorbedSubrange->subrangeBranchNotTaken;
absorbedSubrange->subrangeBranchTaken = nullptr;
absorbedSubrange->subrangeBranchNotTaken = nullptr;
if(subrange->subrangeBranchTaken)
*std::find(subrange->subrangeBranchTaken->previousRanges.begin(), subrange->subrangeBranchTaken->previousRanges.end(), absorbedSubrange) = subrange;
if(subrange->subrangeBranchNotTaken)
*std::find(subrange->subrangeBranchNotTaken->previousRanges.begin(), subrange->subrangeBranchNotTaken->previousRanges.end(), absorbedSubrange) = subrange;
// merge usage locations
for (auto& accessLoc : absorbedSubrange->list_accessLocations)
subrange->list_accessLocations.push_back(accessLoc);
absorbedSubrange->list_accessLocations.clear();
// merge fixed reg locations
#ifdef CEMU_DEBUG_ASSERT
if(!subrange->list_fixedRegRequirements.empty() && !absorbedSubrange->list_fixedRegRequirements.empty())
{
cemu_assert_debug(subrange->list_fixedRegRequirements.back().pos < absorbedSubrange->list_fixedRegRequirements.front().pos);
}
#endif
for (auto& fixedReg : absorbedSubrange->list_fixedRegRequirements)
subrange->list_fixedRegRequirements.push_back(fixedReg);
absorbedSubrange->list_fixedRegRequirements.clear();
subrange->interval.end = absorbedSubrange->interval.end;
PPCRecRA_debugValidateSubrange(subrange);
IMLRA_DeleteRange(ppcImlGenContext, absorbedSubrange);
}
// remove all inter-segment connections from the range cluster and split it into local ranges. Ranges are trimmed and if they have no access location they will be removed
void IMLRA_ExplodeRangeCluster(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange* originRange)
{
cemu_assert_debug(originRange->interval.ExtendsPreviousSegment() || originRange->interval.ExtendsIntoNextSegment()); // only call this on ranges that span multiple segments
auto clusterRanges = originRange->GetAllSubrangesInCluster();
for (auto& subrange : clusterRanges)
{
if (subrange->list_accessLocations.empty())
continue;
raInterval interval;
interval.SetInterval(subrange->list_accessLocations.front().pos, subrange->list_accessLocations.back().pos);
raLivenessRange* newSubrange = IMLRA_CreateRange(ppcImlGenContext, subrange->imlSegment, subrange->GetVirtualRegister(), subrange->GetName(), interval.start, interval.end);
// copy locations and fixed reg indices
newSubrange->list_accessLocations = subrange->list_accessLocations;
newSubrange->list_fixedRegRequirements = subrange->list_fixedRegRequirements;
if(originRange->HasPhysicalRegister())
{
cemu_assert_debug(subrange->list_fixedRegRequirements.empty()); // avoid unassigning a register from a range with a fixed register requirement
}
// validate
if(!newSubrange->list_accessLocations.empty())
{
cemu_assert_debug(newSubrange->list_accessLocations.front().pos >= newSubrange->interval.start);
cemu_assert_debug(newSubrange->list_accessLocations.back().pos <= newSubrange->interval.end);
}
if(!newSubrange->list_fixedRegRequirements.empty())
{
cemu_assert_debug(newSubrange->list_fixedRegRequirements.front().pos >= newSubrange->interval.start); // fixed register requirements outside of the actual access range probably means there is a mistake in GetInstructionFixedRegisters()
cemu_assert_debug(newSubrange->list_fixedRegRequirements.back().pos <= newSubrange->interval.end);
}
}
// delete the original range cluster
IMLRA_DeleteRangeCluster(ppcImlGenContext, originRange);
}
#ifdef CEMU_DEBUG_ASSERT
void PPCRecRA_debugValidateSubrange(raLivenessRange* range)
{
// validate subrange
if (range->subrangeBranchTaken && range->subrangeBranchTaken->imlSegment != range->imlSegment->nextSegmentBranchTaken)
assert_dbg();
if (range->subrangeBranchNotTaken && range->subrangeBranchNotTaken->imlSegment != range->imlSegment->nextSegmentBranchNotTaken)
assert_dbg();
if(range->subrangeBranchTaken || range->subrangeBranchNotTaken)
{
cemu_assert_debug(range->interval.end.ConnectsToNextSegment());
}
if(!range->previousRanges.empty())
{
cemu_assert_debug(range->interval.start.ConnectsToPreviousSegment());
}
// validate locations
if (!range->list_accessLocations.empty())
{
cemu_assert_debug(range->list_accessLocations.front().pos >= range->interval.start);
cemu_assert_debug(range->list_accessLocations.back().pos <= range->interval.end);
}
// validate fixed reg requirements
if (!range->list_fixedRegRequirements.empty())
{
cemu_assert_debug(range->list_fixedRegRequirements.front().pos >= range->interval.start);
cemu_assert_debug(range->list_fixedRegRequirements.back().pos <= range->interval.end);
for(sint32 i = 0; i < (sint32)range->list_fixedRegRequirements.size()-1; i++)
cemu_assert_debug(range->list_fixedRegRequirements[i].pos < range->list_fixedRegRequirements[i+1].pos);
}
}
#else
void PPCRecRA_debugValidateSubrange(raLivenessRange* range) {}
#endif
// trim start and end of range to match first and last read/write locations
// does not trim start/endpoints which extend into the next/previous segment
void IMLRA_TrimRangeToUse(raLivenessRange* range)
{
if(range->list_accessLocations.empty())
{
// special case where we trim ranges extending from other segments to a single instruction edge
cemu_assert_debug(!range->interval.start.IsInstructionIndex() || !range->interval.end.IsInstructionIndex());
if(range->interval.start.IsInstructionIndex())
range->interval.start = range->interval.end;
if(range->interval.end.IsInstructionIndex())
range->interval.end = range->interval.start;
return;
}
// trim start and end
raInterval prevInterval = range->interval;
if(range->interval.start.IsInstructionIndex())
range->interval.start = range->list_accessLocations.front().pos;
if(range->interval.end.IsInstructionIndex())
range->interval.end = range->list_accessLocations.back().pos;
// extra checks
#ifdef CEMU_DEBUG_ASSERT
cemu_assert_debug(range->interval.start <= range->interval.end);
for(auto& loc : range->list_accessLocations)
{
cemu_assert_debug(range->interval.ContainsEdge(loc.pos));
}
cemu_assert_debug(prevInterval.ContainsWholeInterval(range->interval));
#endif
}
// split range at the given position
// After the split there will be two ranges:
// head -> the original subrange, shortened to end at splitPosition (exclusive)
// tail -> a new subrange that spans from splitPosition (inclusive) to the end of the original subrange
// if head has a physical register assigned it will not carry over to tail
// The return value is the tail range
// If trimToUsage is true, the end of the head subrange and the start of the tail subrange will be shrunk to fit the read/write locations within. If there are no locations then the range will be deleted
raLivenessRange* IMLRA_SplitRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange*& subrange, raInstructionEdge splitPosition, bool trimToUsage)
{
cemu_assert_debug(splitPosition.IsInstructionIndex());
cemu_assert_debug(!subrange->interval.IsNextSegmentOnly() && !subrange->interval.IsPreviousSegmentOnly());
cemu_assert_debug(subrange->interval.ContainsEdge(splitPosition));
// determine new intervals
raInterval headInterval, tailInterval;
headInterval.SetInterval(subrange->interval.start, splitPosition-1);
tailInterval.SetInterval(splitPosition, subrange->interval.end);
cemu_assert_debug(headInterval.start <= headInterval.end);
cemu_assert_debug(tailInterval.start <= tailInterval.end);
// create tail
raLivenessRange* tailSubrange = IMLRA_CreateRange(ppcImlGenContext, subrange->imlSegment, subrange->GetVirtualRegister(), subrange->GetName(), tailInterval.start, tailInterval.end);
tailSubrange->SetPhysicalRegister(subrange->GetPhysicalRegister());
// carry over branch targets and update reverse references
tailSubrange->subrangeBranchTaken = subrange->subrangeBranchTaken;
tailSubrange->subrangeBranchNotTaken = subrange->subrangeBranchNotTaken;
subrange->subrangeBranchTaken = nullptr;
subrange->subrangeBranchNotTaken = nullptr;
if(tailSubrange->subrangeBranchTaken)
*std::find(tailSubrange->subrangeBranchTaken->previousRanges.begin(), tailSubrange->subrangeBranchTaken->previousRanges.end(), subrange) = tailSubrange;
if(tailSubrange->subrangeBranchNotTaken)
*std::find(tailSubrange->subrangeBranchNotTaken->previousRanges.begin(), tailSubrange->subrangeBranchNotTaken->previousRanges.end(), subrange) = tailSubrange;
// we assume that list_accessLocations is ordered by instruction index and contains no duplicate indices, so let's verify that here just in case
#ifdef CEMU_DEBUG_ASSERT
if(subrange->list_accessLocations.size() > 1)
{
for(size_t i=0; i<subrange->list_accessLocations.size()-1; i++)
{
cemu_assert_debug(subrange->list_accessLocations[i].pos < subrange->list_accessLocations[i+1].pos);
}
}
#endif
// split locations
auto it = std::lower_bound(
subrange->list_accessLocations.begin(), subrange->list_accessLocations.end(), splitPosition,
[](const raAccessLocation& accessLoc, raInstructionEdge value) { return accessLoc.pos < value; }
);
size_t originalCount = subrange->list_accessLocations.size();
tailSubrange->list_accessLocations.insert(tailSubrange->list_accessLocations.end(), it, subrange->list_accessLocations.end());
subrange->list_accessLocations.erase(it, subrange->list_accessLocations.end());
cemu_assert_debug(subrange->list_accessLocations.empty() || subrange->list_accessLocations.back().pos < splitPosition);
cemu_assert_debug(tailSubrange->list_accessLocations.empty() || tailSubrange->list_accessLocations.front().pos >= splitPosition);
cemu_assert_debug(subrange->list_accessLocations.size() + tailSubrange->list_accessLocations.size() == originalCount);
// split fixed reg requirements
for (sint32 i = 0; i < subrange->list_fixedRegRequirements.size(); i++)
{
raFixedRegRequirement* fixedReg = subrange->list_fixedRegRequirements.data() + i;
if (tailInterval.ContainsEdge(fixedReg->pos))
{
tailSubrange->list_fixedRegRequirements.push_back(*fixedReg);
}
}
// remove tail fixed reg requirements from head
for (sint32 i = 0; i < subrange->list_fixedRegRequirements.size(); i++)
{
raFixedRegRequirement* fixedReg = subrange->list_fixedRegRequirements.data() + i;
if (!headInterval.ContainsEdge(fixedReg->pos))
{
subrange->list_fixedRegRequirements.resize(i);
break;
}
}
// adjust intervals
subrange->interval = headInterval;
tailSubrange->interval = tailInterval;
// trim both parts to their usage if requested
if(trimToUsage)
{
if(subrange->list_accessLocations.empty() && (subrange->interval.start.IsInstructionIndex() && subrange->interval.end.IsInstructionIndex()))
{
IMLRA_DeleteRange(ppcImlGenContext, subrange);
subrange = nullptr;
}
else
{
IMLRA_TrimRangeToUse(subrange);
}
if(tailSubrange->list_accessLocations.empty() && (tailSubrange->interval.start.IsInstructionIndex() && tailSubrange->interval.end.IsInstructionIndex()))
{
IMLRA_DeleteRange(ppcImlGenContext, tailSubrange);
tailSubrange = nullptr;
}
else
{
IMLRA_TrimRangeToUse(tailSubrange);
}
}
// validation
cemu_assert_debug(!subrange || subrange->interval.start <= subrange->interval.end);
cemu_assert_debug(!tailSubrange || tailSubrange->interval.start <= tailSubrange->interval.end);
cemu_assert_debug(!tailSubrange || tailSubrange->interval.start >= splitPosition);
if (!trimToUsage)
cemu_assert_debug(!tailSubrange || tailSubrange->interval.start == splitPosition);
if(subrange)
PPCRecRA_debugValidateSubrange(subrange);
if(tailSubrange)
PPCRecRA_debugValidateSubrange(tailSubrange);
return tailSubrange;
}
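A minimal usage sketch of the split API above, assuming ctx and range are a valid ppcImlGenContext_t* and raLivenessRange* obtained elsewhere; the function name and the instruction index 10 are illustrative only:
static void ExampleSplitAtInstruction10(ppcImlGenContext_t* ctx, raLivenessRange*& range)
{
    raInstructionEdge splitPos;
    splitPos.Set(10, true); // input edge of instruction 10
    raLivenessRange* tail = IMLRA_SplitRange(ctx, range, splitPos, true);
    // the head (range) now ends before the split position and the tail starts at or after it;
    // with trimToUsage == true either pointer may become nullptr if its part has no accesses left
    if (range && tail)
        cemu_assert_debug(range->interval.end < tail->interval.start);
}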
sint32 IMLRA_GetSegmentReadWriteCost(IMLSegment* imlSegment)
{
sint32 v = imlSegment->loopDepth + 1;
v *= 5;
return v*v; // 25, 100, 225, 400
}
// calculate the additional cost that the range would incur after calling _ExplodeRange() on it
sint32 IMLRA_CalculateAdditionalCostOfRangeExplode(raLivenessRange* subrange)
{
auto ranges = subrange->GetAllSubrangesInCluster();
sint32 cost = 0;//-PPCRecRARange_estimateTotalCost(ranges);
for (auto& subrange : ranges)
{
if (subrange->list_accessLocations.empty())
continue; // this range would be deleted and thus has no cost
sint32 segmentLoadStoreCost = IMLRA_GetSegmentReadWriteCost(subrange->imlSegment);
bool hasAdditionalLoad = subrange->interval.ExtendsPreviousSegment();
bool hasAdditionalStore = subrange->interval.ExtendsIntoNextSegment();
if(hasAdditionalLoad && subrange->list_accessLocations.front().IsWrite()) // if written before read then a load isn't necessary
{
cemu_assert_debug(!subrange->list_accessLocations.front().IsRead());
cost += segmentLoadStoreCost;
}
if(hasAdditionalStore)
{
bool hasWrite = std::find_if(subrange->list_accessLocations.begin(), subrange->list_accessLocations.end(), [](const raAccessLocation& loc) { return loc.IsWrite(); }) != subrange->list_accessLocations.end();
if(!hasWrite) // ranges which don't modify their value do not need to be stored
cost += segmentLoadStoreCost;
}
}
// todo - properly calculating all the data-flow dependency based costs is more complex, so this is currently an approximation
return cost;
}
sint32 IMLRA_CalculateAdditionalCostAfterSplit(raLivenessRange* subrange, raInstructionEdge splitPosition)
{
// validation
#ifdef CEMU_DEBUG_ASSERT
if (subrange->interval.ExtendsIntoNextSegment())
assert_dbg();
#endif
cemu_assert_debug(splitPosition.IsInstructionIndex());
sint32 cost = 0;
// find split position in location list
if (subrange->list_accessLocations.empty())
return 0;
if (splitPosition <= subrange->list_accessLocations.front().pos)
return 0;
if (splitPosition > subrange->list_accessLocations.back().pos)
return 0;
size_t firstTailLocationIndex = 0;
for (size_t i = 0; i < subrange->list_accessLocations.size(); i++)
{
if (subrange->list_accessLocations[i].pos >= splitPosition)
{
firstTailLocationIndex = i;
break;
}
}
std::span<raAccessLocation> headLocations{subrange->list_accessLocations.data(), firstTailLocationIndex};
std::span<raAccessLocation> tailLocations{subrange->list_accessLocations.data() + firstTailLocationIndex, subrange->list_accessLocations.size() - firstTailLocationIndex};
cemu_assert_debug(headLocations.empty() || headLocations.back().pos < splitPosition);
cemu_assert_debug(tailLocations.empty() || tailLocations.front().pos >= splitPosition);
sint32 segmentLoadStoreCost = IMLRA_GetSegmentReadWriteCost(subrange->imlSegment);
auto CalculateCostFromLocationRange = [segmentLoadStoreCost](std::span<raAccessLocation> locations, bool trackLoadCost = true, bool trackStoreCost = true) -> sint32
{
if(locations.empty())
return 0;
sint32 cost = 0;
if(locations.front().IsRead() && trackLoadCost)
cost += segmentLoadStoreCost; // not overwritten, so there is a load cost
bool hasWrite = std::find_if(locations.begin(), locations.end(), [](const raAccessLocation& loc) { return loc.IsWrite(); }) != locations.end();
if(hasWrite && trackStoreCost)
cost += segmentLoadStoreCost; // modified, so there is a store cost
return cost;
};
sint32 baseCost = CalculateCostFromLocationRange(subrange->list_accessLocations);
bool tailOverwritesValue = !tailLocations.empty() && !tailLocations.front().IsRead() && tailLocations.front().IsWrite();
sint32 newCost = CalculateCostFromLocationRange(headLocations) + CalculateCostFromLocationRange(tailLocations, !tailOverwritesValue, true);
cemu_assert_debug(newCost >= baseCost);
cost = newCost - baseCost;
return cost;
}
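A hedged worked example for the estimate above, assuming a purely local range with accesses read@2, write@5, read@8 inside a segment with loopDepth 0 (per-access cost 25):
// unsplit: the first access is a read -> one load (25); the range contains a write -> one store (25); base cost 50
// split between instructions 5 and 8: head {r@2, w@5} still needs load+store (50), tail {r@8} now needs its own load (25); new cost 75
// IMLRA_CalculateAdditionalCostAfterSplit therefore returns 75 - 50 = 25 for that split position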

View file

@ -0,0 +1,364 @@
#pragma once
#include "IMLRegisterAllocator.h"
struct raLivenessSubrangeLink
{
struct raLivenessRange* prev;
struct raLivenessRange* next;
};
struct raInstructionEdge
{
friend struct raInterval;
public:
raInstructionEdge()
{
index = 0;
}
raInstructionEdge(sint32 instructionIndex, bool isInputEdge)
{
Set(instructionIndex, isInputEdge);
}
void Set(sint32 instructionIndex, bool isInputEdge)
{
if(instructionIndex == RA_INTER_RANGE_START || instructionIndex == RA_INTER_RANGE_END)
{
index = instructionIndex;
return;
}
index = instructionIndex * 2 + (isInputEdge ? 0 : 1);
cemu_assert_debug(index >= 0 && index < 0x100000*2); // make sure index value is sane
}
void SetRaw(sint32 index)
{
this->index = index;
cemu_assert_debug(index == RA_INTER_RANGE_START || index == RA_INTER_RANGE_END || (index >= 0 && index < 0x100000*2)); // make sure index value is sane
}
// sint32 GetRaw()
// {
// this->index = index;
// }
std::string GetDebugString()
{
if(index == RA_INTER_RANGE_START)
return "RA_START";
else if(index == RA_INTER_RANGE_END)
return "RA_END";
std::string str = fmt::format("{}", GetInstructionIndex());
if(IsOnInputEdge())
str += "i";
else if(IsOnOutputEdge())
str += "o";
return str;
}
sint32 GetInstructionIndex() const
{
cemu_assert_debug(index != RA_INTER_RANGE_START && index != RA_INTER_RANGE_END);
return index >> 1;
}
// returns instruction index or RA_INTER_RANGE_START/RA_INTER_RANGE_END
sint32 GetInstructionIndexEx() const
{
if(index == RA_INTER_RANGE_START || index == RA_INTER_RANGE_END)
return index;
return index >> 1;
}
sint32 GetRaw() const
{
return index;
}
bool IsOnInputEdge() const
{
cemu_assert_debug(index != RA_INTER_RANGE_START && index != RA_INTER_RANGE_END);
return (index&1) == 0;
}
bool IsOnOutputEdge() const
{
cemu_assert_debug(index != RA_INTER_RANGE_START && index != RA_INTER_RANGE_END);
return (index&1) != 0;
}
bool ConnectsToPreviousSegment() const
{
return index == RA_INTER_RANGE_START;
}
bool ConnectsToNextSegment() const
{
return index == RA_INTER_RANGE_END;
}
bool IsInstructionIndex() const
{
return index != RA_INTER_RANGE_START && index != RA_INTER_RANGE_END;
}
// comparison operators
bool operator>(const raInstructionEdge& other) const
{
return index > other.index;
}
bool operator<(const raInstructionEdge& other) const
{
return index < other.index;
}
bool operator<=(const raInstructionEdge& other) const
{
return index <= other.index;
}
bool operator>=(const raInstructionEdge& other) const
{
return index >= other.index;
}
bool operator==(const raInstructionEdge& other) const
{
return index == other.index;
}
raInstructionEdge operator+(sint32 offset) const
{
cemu_assert_debug(IsInstructionIndex());
cemu_assert_debug(offset >= 0 && offset < RA_INTER_RANGE_END);
raInstructionEdge edge;
edge.index = index + offset;
return edge;
}
raInstructionEdge operator-(sint32 offset) const
{
cemu_assert_debug(IsInstructionIndex());
cemu_assert_debug(offset >= 0 && offset < RA_INTER_RANGE_END);
raInstructionEdge edge;
edge.index = index - offset;
return edge;
}
raInstructionEdge& operator++()
{
cemu_assert_debug(IsInstructionIndex());
index++;
return *this;
}
private:
sint32 index; // can also be RA_INTER_RANGE_START or RA_INTER_RANGE_END, otherwise contains instruction index * 2
};
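A small self-checking sketch of the edge encoding described above (instruction index times two, plus one for the output edge); the helper name is illustrative:
static void ExampleEdgeEncoding()
{
    raInstructionEdge readEdge(5, true);   // input edge of instruction 5  -> raw index 10
    raInstructionEdge writeEdge(5, false); // output edge of instruction 5 -> raw index 11
    cemu_assert_debug(readEdge.GetRaw() == 10 && writeEdge.GetRaw() == 11);
    cemu_assert_debug(readEdge < writeEdge); // the read edge orders before the write edge of the same instruction
    cemu_assert_debug(writeEdge.GetInstructionIndex() == 5);
}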
struct raAccessLocation
{
raAccessLocation(raInstructionEdge pos) : pos(pos) {}
bool IsRead() const
{
return pos.IsOnInputEdge();
}
bool IsWrite() const
{
return pos.IsOnOutputEdge();
}
raInstructionEdge pos;
};
struct raInterval
{
raInterval()
{
}
raInterval(raInstructionEdge start, raInstructionEdge end)
{
SetInterval(start, end);
}
// isStartOnInput = Input+Output edge on first instruction. If false then only output
// isEndOnOutput = Input+Output edge on last instruction. If false then only input
void SetInterval(sint32 start, bool isStartOnInput, sint32 end, bool isEndOnOutput)
{
this->start.Set(start, isStartOnInput);
this->end.Set(end, !isEndOnOutput);
}
void SetInterval(raInstructionEdge start, raInstructionEdge end)
{
cemu_assert_debug(start <= end);
this->start = start;
this->end = end;
}
void SetStart(const raInstructionEdge& edge)
{
start = edge;
}
void SetEnd(const raInstructionEdge& edge)
{
end = edge;
}
sint32 GetStartIndex() const
{
return start.GetInstructionIndex();
}
sint32 GetEndIndex() const
{
return end.GetInstructionIndex();
}
bool ExtendsPreviousSegment() const
{
return start.ConnectsToPreviousSegment();
}
bool ExtendsIntoNextSegment() const
{
return end.ConnectsToNextSegment();
}
bool IsNextSegmentOnly() const
{
return start.ConnectsToNextSegment() && end.ConnectsToNextSegment();
}
bool IsPreviousSegmentOnly() const
{
return start.ConnectsToPreviousSegment() && end.ConnectsToPreviousSegment();
}
// returns true if range is contained within a single segment
bool IsLocal() const
{
return start.GetRaw() > RA_INTER_RANGE_START && end.GetRaw() < RA_INTER_RANGE_END;
}
bool ContainsInstructionIndex(sint32 instructionIndex) const
{
cemu_assert_debug(instructionIndex != RA_INTER_RANGE_START && instructionIndex != RA_INTER_RANGE_END);
return instructionIndex >= start.GetInstructionIndexEx() && instructionIndex <= end.GetInstructionIndexEx();
}
// similar to ContainsInstructionIndex, but allows RA_INTER_RANGE_START/END as input
bool ContainsInstructionIndexEx(sint32 instructionIndex) const
{
if(instructionIndex == RA_INTER_RANGE_START)
return start.ConnectsToPreviousSegment();
if(instructionIndex == RA_INTER_RANGE_END)
return end.ConnectsToNextSegment();
return instructionIndex >= start.GetInstructionIndexEx() && instructionIndex <= end.GetInstructionIndexEx();
}
bool ContainsEdge(const raInstructionEdge& edge) const
{
return edge >= start && edge <= end;
}
bool ContainsWholeInterval(const raInterval& other) const
{
return other.start >= start && other.end <= end;
}
bool IsOverlapping(const raInterval& other) const
{
return start <= other.end && end >= other.start;
}
sint32 GetPreciseDistance()
{
cemu_assert_debug(!start.ConnectsToNextSegment()); // how to handle this?
if(start == end)
return 1;
cemu_assert_debug(!end.ConnectsToPreviousSegment() && !end.ConnectsToNextSegment());
if(start.ConnectsToPreviousSegment())
return end.GetRaw() + 1;
return end.GetRaw() - start.GetRaw() + 1; // +1 because end is inclusive
}
// note: kept public on purpose; making these private would only force us to create loads of verbose getters and setters
raInstructionEdge start;
raInstructionEdge end;
};
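A minimal sketch of the interval semantics above (both endpoints are inclusive edges); the values are illustrative:
static void ExampleIntervalUsage()
{
    raInterval iv(raInstructionEdge(2, true), raInstructionEdge(4, false)); // covers 2i .. 4o (raw 4 .. 9)
    cemu_assert_debug(iv.ContainsEdge(raInstructionEdge(3, false)));
    cemu_assert_debug(iv.GetPreciseDistance() == 6); // end is inclusive
    raInterval other(raInstructionEdge(4, true), raInstructionEdge(6, false));
    cemu_assert_debug(iv.IsOverlapping(other)); // both cover the edges of instruction 4
}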
struct raFixedRegRequirement
{
raInstructionEdge pos;
IMLPhysRegisterSet allowedReg;
};
struct raLivenessRange
{
IMLSegment* imlSegment;
raInterval interval;
// dirty state tracking
bool _noLoad;
bool hasStore;
bool hasStoreDelayed;
// next
raLivenessRange* subrangeBranchTaken;
raLivenessRange* subrangeBranchNotTaken;
// reverse counterpart of BranchTaken/BranchNotTaken
boost::container::small_vector<raLivenessRange*, 4> previousRanges;
// processing
uint32 lastIterationIndex;
// instruction read/write locations
std::vector<raAccessLocation> list_accessLocations;
// ordered list of all raInstructionEdge indices which require a fixed register
std::vector<raFixedRegRequirement> list_fixedRegRequirements;
// linked list (subranges with same GPR virtual register)
raLivenessSubrangeLink link_sameVirtualRegister;
// linked list (all subranges for this segment)
raLivenessSubrangeLink link_allSegmentRanges;
// register info
IMLRegID virtualRegister;
IMLName name;
// register allocator result
IMLPhysReg physicalRegister;
boost::container::small_vector<raLivenessRange*, 128> GetAllSubrangesInCluster();
bool GetAllowedRegistersEx(IMLPhysRegisterSet& allowedRegisters); // if the cluster has fixed register requirements in any instruction, returns true and writes the combined register mask to allowedRegisters. Otherwise returns false, in which case allowedRegisters is left undefined
IMLPhysRegisterSet GetAllowedRegisters(IMLPhysRegisterSet regPool); // return regPool with fixed register requirements filtered out
IMLRegID GetVirtualRegister() const;
sint32 GetPhysicalRegister() const;
bool HasPhysicalRegister() const { return physicalRegister >= 0; }
IMLName GetName() const;
void SetPhysicalRegister(IMLPhysReg physicalRegister);
void SetPhysicalRegisterForCluster(IMLPhysReg physicalRegister);
void UnsetPhysicalRegister() { physicalRegister = -1; }
private:
void GetAllowedRegistersExRecursive(raLivenessRange* range, uint32 iterationIndex, IMLPhysRegisterSet& allowedRegs);
};
raLivenessRange* IMLRA_CreateRange(ppcImlGenContext_t* ppcImlGenContext, IMLSegment* imlSegment, IMLRegID virtualRegister, IMLName name, raInstructionEdge startPosition, raInstructionEdge endPosition);
void IMLRA_DeleteRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange* subrange);
void IMLRA_DeleteAllRanges(ppcImlGenContext_t* ppcImlGenContext);
void IMLRA_ExplodeRangeCluster(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange* originRange);
void IMLRA_MergeSubranges(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange* subrange, raLivenessRange* absorbedSubrange);
raLivenessRange* IMLRA_SplitRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange*& subrange, raInstructionEdge splitPosition, bool trimToUsage = false);
void PPCRecRA_debugValidateSubrange(raLivenessRange* subrange);
// cost estimation
sint32 IMLRA_GetSegmentReadWriteCost(IMLSegment* imlSegment);
sint32 IMLRA_CalculateAdditionalCostOfRangeExplode(raLivenessRange* subrange);
//sint32 PPCRecRARange_estimateAdditionalCostAfterSplit(raLivenessRange* subrange, sint32 splitIndex);
sint32 IMLRA_CalculateAdditionalCostAfterSplit(raLivenessRange* subrange, raInstructionEdge splitPosition);

View file

@ -0,0 +1,133 @@
#include "IMLInstruction.h"
#include "IMLSegment.h"
void IMLSegment::SetEnterable(uint32 enterAddress)
{
cemu_assert_debug(!isEnterable || enterPPCAddress == enterAddress);
isEnterable = true;
enterPPCAddress = enterAddress;
}
bool IMLSegment::HasSuffixInstruction() const
{
if (imlList.empty())
return false;
const IMLInstruction& imlInstruction = imlList.back();
return imlInstruction.IsSuffixInstruction();
}
sint32 IMLSegment::GetSuffixInstructionIndex() const
{
cemu_assert_debug(HasSuffixInstruction());
return (sint32)(imlList.size() - 1);
}
IMLInstruction* IMLSegment::GetLastInstruction()
{
if (imlList.empty())
return nullptr;
return &imlList.back();
}
void IMLSegment::SetLinkBranchNotTaken(IMLSegment* imlSegmentDst)
{
if (nextSegmentBranchNotTaken)
nextSegmentBranchNotTaken->list_prevSegments.erase(std::find(nextSegmentBranchNotTaken->list_prevSegments.begin(), nextSegmentBranchNotTaken->list_prevSegments.end(), this));
nextSegmentBranchNotTaken = imlSegmentDst;
if(imlSegmentDst)
imlSegmentDst->list_prevSegments.push_back(this);
}
void IMLSegment::SetLinkBranchTaken(IMLSegment* imlSegmentDst)
{
if (nextSegmentBranchTaken)
nextSegmentBranchTaken->list_prevSegments.erase(std::find(nextSegmentBranchTaken->list_prevSegments.begin(), nextSegmentBranchTaken->list_prevSegments.end(), this));
nextSegmentBranchTaken = imlSegmentDst;
if (imlSegmentDst)
imlSegmentDst->list_prevSegments.push_back(this);
}
IMLInstruction* IMLSegment::AppendInstruction()
{
IMLInstruction& inst = imlList.emplace_back();
memset(&inst, 0, sizeof(IMLInstruction));
return &inst;
}
void IMLSegment_SetLinkBranchNotTaken(IMLSegment* imlSegmentSrc, IMLSegment* imlSegmentDst)
{
// make sure segments aren't already linked
if (imlSegmentSrc->nextSegmentBranchNotTaken == imlSegmentDst)
return;
// add as next segment for source
if (imlSegmentSrc->nextSegmentBranchNotTaken != nullptr)
assert_dbg();
imlSegmentSrc->nextSegmentBranchNotTaken = imlSegmentDst;
// add as previous segment for destination
imlSegmentDst->list_prevSegments.push_back(imlSegmentSrc);
}
void IMLSegment_SetLinkBranchTaken(IMLSegment* imlSegmentSrc, IMLSegment* imlSegmentDst)
{
// make sure segments aren't already linked
if (imlSegmentSrc->nextSegmentBranchTaken == imlSegmentDst)
return;
// add as next segment for source
if (imlSegmentSrc->nextSegmentBranchTaken != nullptr)
assert_dbg();
imlSegmentSrc->nextSegmentBranchTaken = imlSegmentDst;
// add as previous segment for destination
imlSegmentDst->list_prevSegments.push_back(imlSegmentSrc);
}
void IMLSegment_RemoveLink(IMLSegment* imlSegmentSrc, IMLSegment* imlSegmentDst)
{
if (imlSegmentSrc->nextSegmentBranchNotTaken == imlSegmentDst)
{
imlSegmentSrc->nextSegmentBranchNotTaken = nullptr;
}
else if (imlSegmentSrc->nextSegmentBranchTaken == imlSegmentDst)
{
imlSegmentSrc->nextSegmentBranchTaken = nullptr;
}
else
assert_dbg();
bool matchFound = false;
for (sint32 i = 0; i < imlSegmentDst->list_prevSegments.size(); i++)
{
if (imlSegmentDst->list_prevSegments[i] == imlSegmentSrc)
{
imlSegmentDst->list_prevSegments.erase(imlSegmentDst->list_prevSegments.begin() + i);
matchFound = true;
break;
}
}
if (matchFound == false)
assert_dbg();
}
/*
* Replaces all links to segment orig with links to segment new
*/
void IMLSegment_RelinkInputSegment(IMLSegment* imlSegmentOrig, IMLSegment* imlSegmentNew)
{
while (imlSegmentOrig->list_prevSegments.size() != 0)
{
IMLSegment* prevSegment = imlSegmentOrig->list_prevSegments[0];
if (prevSegment->nextSegmentBranchNotTaken == imlSegmentOrig)
{
IMLSegment_RemoveLink(prevSegment, imlSegmentOrig);
IMLSegment_SetLinkBranchNotTaken(prevSegment, imlSegmentNew);
}
else if (prevSegment->nextSegmentBranchTaken == imlSegmentOrig)
{
IMLSegment_RemoveLink(prevSegment, imlSegmentOrig);
IMLSegment_SetLinkBranchTaken(prevSegment, imlSegmentNew);
}
else
{
assert_dbg();
}
}
}
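A hedged sketch of how the link helpers above compose when routing a predecessor through a new segment; segA, segB and segNew are placeholders for freshly created, initially unlinked segments:
static void ExampleRelink(IMLSegment* segA, IMLSegment* segB, IMLSegment* segNew)
{
    IMLSegment_SetLinkBranchTaken(segA, segB);   // segA --taken--> segB
    IMLSegment_RelinkInputSegment(segB, segNew); // every predecessor of segB is redirected to segNew
    cemu_assert_debug(segA->nextSegmentBranchTaken == segNew);
    IMLSegment_SetLinkBranchTaken(segNew, segB); // reconnect the new segment to the original target
}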

View file

@ -0,0 +1,193 @@
#pragma once
#include "IMLInstruction.h"
#include <boost/container/small_vector.hpp>
// special values to mark the index of ranges that reach across the segment border
#define RA_INTER_RANGE_START (-1)
#define RA_INTER_RANGE_END (0x70000000)
struct IMLSegmentPoint
{
friend struct IMLSegmentInterval;
sint32 index;
struct IMLSegment* imlSegment; // do we really need to track this? SegmentPoints are always accessed via the segment that they are part of
IMLSegmentPoint* next;
IMLSegmentPoint* prev;
// the index is the instruction index times two.
// this gives us the ability to cover half an instruction with RA ranges
// covering only the first half of an instruction (0-0) means that the register is read, but not preserved
// covering first and the second half means the register is read and preserved
// covering only the second half means the register is written but not read
sint32 GetInstructionIndex() const
{
return index;
}
void SetInstructionIndex(sint32 index)
{
this->index = index;
}
void ShiftIfAfter(sint32 instructionIndex, sint32 shiftCount)
{
if (!IsPreviousSegment() && !IsNextSegment())
{
if (GetInstructionIndex() >= instructionIndex)
index += shiftCount;
}
}
void DecrementByOneInstruction()
{
index--;
}
// the segment point can point beyond the first and last instruction which indicates that it is an infinite range reaching up to the previous or next segment
bool IsPreviousSegment() const { return index == RA_INTER_RANGE_START; }
bool IsNextSegment() const { return index == RA_INTER_RANGE_END; }
// comparison operators
bool operator>(const IMLSegmentPoint& other) const { return index > other.index; }
bool operator<(const IMLSegmentPoint& other) const { return index < other.index; }
bool operator==(const IMLSegmentPoint& other) const { return index == other.index; }
bool operator!=(const IMLSegmentPoint& other) const { return index != other.index; }
// comparison operators against sint32
bool operator>(const sint32 other) const { return index > other; }
bool operator<(const sint32 other) const { return index < other; }
bool operator<=(const sint32 other) const { return index <= other; }
bool operator>=(const sint32 other) const { return index >= other; }
};
struct IMLSegmentInterval
{
IMLSegmentPoint start;
IMLSegmentPoint end;
bool ContainsInstructionIndex(sint32 offset) const { return start <= offset && end > offset; }
bool IsRangeOverlapping(const IMLSegmentInterval& other)
{
// todo - compare the raw index
sint32 r1start = this->start.GetInstructionIndex();
sint32 r1end = this->end.GetInstructionIndex();
sint32 r2start = other.start.GetInstructionIndex();
sint32 r2end = other.end.GetInstructionIndex();
if (r1start < r2end && r1end > r2start)
return true;
if (this->start.IsPreviousSegment() && r1start == r2start)
return true;
if (this->end.IsNextSegment() && r1end == r2end)
return true;
return false;
}
bool ExtendsIntoPreviousSegment() const
{
return start.IsPreviousSegment();
}
bool ExtendsIntoNextSegment() const
{
return end.IsNextSegment();
}
bool IsNextSegmentOnly() const
{
if(!start.IsNextSegment())
return false;
cemu_assert_debug(end.IsNextSegment());
return true;
}
bool IsPreviousSegmentOnly() const
{
if (!end.IsPreviousSegment())
return false;
cemu_assert_debug(start.IsPreviousSegment());
return true;
}
sint32 GetDistance() const
{
// todo - assert if either start or end is outside the segment
// we may also want to switch this to raw indices?
return end.GetInstructionIndex() - start.GetInstructionIndex();
}
};
struct PPCSegmentRegisterAllocatorInfo_t
{
// used during loop detection
bool isPartOfProcessedLoop{};
sint32 lastIterationIndex{};
// linked lists
struct raLivenessRange* linkedList_allSubranges{};
std::unordered_map<IMLRegID, struct raLivenessRange*> linkedList_perVirtualRegister;
};
struct IMLSegment
{
sint32 momentaryIndex{}; // index in segment list, generally not kept up to date except if needed (necessary for loop detection)
sint32 loopDepth{};
uint32 ppcAddress{}; // ppc address (0xFFFFFFFF if not associated with an address)
uint32 x64Offset{}; // x64 code offset of segment start
// list of intermediate instructions in this segment
std::vector<IMLInstruction> imlList;
// segment link
IMLSegment* nextSegmentBranchNotTaken{}; // this is also the default for segments where there is no branch
IMLSegment* nextSegmentBranchTaken{};
bool nextSegmentIsUncertain{};
std::vector<IMLSegment*> list_prevSegments{};
// source for overwrite analysis (if nextSegmentIsUncertain is true)
// sometimes a segment is marked as an exit point, but for the purposes of dead code elimination we know the next segment
IMLSegment* deadCodeEliminationHintSeg{};
std::vector<IMLSegment*> list_deadCodeHintBy{};
// enterable segments
bool isEnterable{}; // this segment can be entered from outside the recompiler (no preloaded registers necessary)
uint32 enterPPCAddress{}; // used if isEnterable is true
// register allocator info
PPCSegmentRegisterAllocatorInfo_t raInfo{};
// segment state API
void SetEnterable(uint32 enterAddress);
void SetLinkBranchNotTaken(IMLSegment* imlSegmentDst);
void SetLinkBranchTaken(IMLSegment* imlSegmentDst);
IMLSegment* GetBranchTaken()
{
return nextSegmentBranchTaken;
}
IMLSegment* GetBranchNotTaken()
{
return nextSegmentBranchNotTaken;
}
void SetNextSegmentForOverwriteHints(IMLSegment* seg)
{
cemu_assert_debug(!deadCodeEliminationHintSeg);
deadCodeEliminationHintSeg = seg;
if (seg)
seg->list_deadCodeHintBy.push_back(this);
}
// instruction API
IMLInstruction* AppendInstruction();
bool HasSuffixInstruction() const;
sint32 GetSuffixInstructionIndex() const;
IMLInstruction* GetLastInstruction();
// segment points
IMLSegmentPoint* segmentPointList{};
};
void IMLSegment_SetLinkBranchNotTaken(IMLSegment* imlSegmentSrc, IMLSegment* imlSegmentDst);
void IMLSegment_SetLinkBranchTaken(IMLSegment* imlSegmentSrc, IMLSegment* imlSegmentDst);
void IMLSegment_RelinkInputSegment(IMLSegment* imlSegmentOrig, IMLSegment* imlSegmentNew);
void IMLSegment_RemoveLink(IMLSegment* imlSegmentSrc, IMLSegment* imlSegmentDst);

View file

@ -21,6 +21,16 @@ public:
};
public:
~PPCFunctionBoundaryTracker()
{
while (!map_ranges.empty())
{
PPCRange_t* range = *map_ranges.begin();
delete range;
map_ranges.erase(map_ranges.begin());
}
}
void trackStartPoint(MPTR startAddress)
{
processRange(startAddress, nullptr, nullptr);
@ -40,10 +50,34 @@ public:
return false;
}
std::vector<PPCRange_t> GetRanges()
{
std::vector<PPCRange_t> r;
for (auto& it : map_ranges)
r.emplace_back(*it);
return r;
}
bool ContainsAddress(uint32 addr) const
{
for (auto& it : map_ranges)
{
if (addr >= it->startAddress && addr < it->getEndAddress())
return true;
}
return false;
}
const std::set<uint32>& GetBranchTargets() const
{
return map_branchTargetsAll;
}
private:
void addBranchDestination(PPCRange_t* sourceRange, MPTR address)
{
map_branchTargets.emplace(address);
map_queuedBranchTargets.emplace(address);
map_branchTargetsAll.emplace(address);
}
// process flow of instruction
@ -114,7 +148,7 @@ private:
Espresso::BOField BO;
uint32 BI;
bool LK;
Espresso::decodeOp_BCLR(opcode, BO, BI, LK);
Espresso::decodeOp_BCSPR(opcode, BO, BI, LK);
if (BO.branchAlways() && !LK)
{
// unconditional BLR
@ -218,7 +252,7 @@ private:
auto rangeItr = map_ranges.begin();
PPCRange_t* previousRange = nullptr;
for (std::set<uint32_t>::const_iterator targetItr = map_branchTargets.begin() ; targetItr != map_branchTargets.end(); )
for (std::set<uint32_t>::const_iterator targetItr = map_queuedBranchTargets.begin() ; targetItr != map_queuedBranchTargets.end(); )
{
while (rangeItr != map_ranges.end() && ((*rangeItr)->startAddress + (*rangeItr)->length) <= (*targetItr))
{
@ -239,7 +273,7 @@ private:
(*targetItr) < ((*rangeItr)->startAddress + (*rangeItr)->length))
{
// delete visited targets
targetItr = map_branchTargets.erase(targetItr);
targetItr = map_queuedBranchTargets.erase(targetItr);
continue;
}
@ -289,5 +323,6 @@ private:
};
std::set<PPCRange_t*, RangePtrCmp> map_ranges;
std::set<uint32> map_branchTargets;
std::set<uint32> map_queuedBranchTargets;
std::set<uint32> map_branchTargetsAll;
};

View file

@ -2,7 +2,6 @@
#include "PPCFunctionBoundaryTracker.h"
#include "PPCRecompiler.h"
#include "PPCRecompilerIml.h"
#include "PPCRecompilerX64.h"
#include "Cafe/OS/RPL/rpl.h"
#include "util/containers/RangeStore.h"
#include "Cafe/OS/libs/coreinit/coreinit_CodeGen.h"
@ -14,6 +13,14 @@
#include "util/helpers/helpers.h"
#include "util/MemMapper/MemMapper.h"
#include "IML/IML.h"
#include "IML/IMLRegisterAllocator.h"
#include "BackendX64/BackendX64.h"
#include "util/highresolutiontimer/HighResolutionTimer.h"
#define PPCREC_FORCE_SYNCHRONOUS_COMPILATION 0 // if 1, then function recompilation will block and execute on the thread that called PPCRecompiler_visitAddressNoBlock
#define PPCREC_LOG_RECOMPILATION_RESULTS 0
struct PPCInvalidationRange
{
MPTR startAddress;
@ -37,11 +44,36 @@ void ATTR_MS_ABI (*PPCRecompiler_leaveRecompilerCode_unvisited)();
PPCRecompilerInstanceData_t* ppcRecompilerInstanceData;
#if PPCREC_FORCE_SYNCHRONOUS_COMPILATION
static std::mutex s_singleRecompilationMutex;
#endif
bool ppcRecompilerEnabled = false;
void PPCRecompiler_recompileAtAddress(uint32 address);
// this function never blocks and can fail if the recompiler lock cannot be acquired immediately
void PPCRecompiler_visitAddressNoBlock(uint32 enterAddress)
{
#if PPCREC_FORCE_SYNCHRONOUS_COMPILATION
if (ppcRecompilerInstanceData->ppcRecompilerDirectJumpTable[enterAddress / 4] != PPCRecompiler_leaveRecompilerCode_unvisited)
return;
PPCRecompilerState.recompilerSpinlock.lock();
if (ppcRecompilerInstanceData->ppcRecompilerDirectJumpTable[enterAddress / 4] != PPCRecompiler_leaveRecompilerCode_unvisited)
{
PPCRecompilerState.recompilerSpinlock.unlock();
return;
}
ppcRecompilerInstanceData->ppcRecompilerDirectJumpTable[enterAddress / 4] = PPCRecompiler_leaveRecompilerCode_visited;
PPCRecompilerState.recompilerSpinlock.unlock();
s_singleRecompilationMutex.lock();
if (ppcRecompilerInstanceData->ppcRecompilerDirectJumpTable[enterAddress / 4] == PPCRecompiler_leaveRecompilerCode_visited)
{
PPCRecompiler_recompileAtAddress(enterAddress);
}
s_singleRecompilationMutex.unlock();
return;
#endif
// quick read-only check without lock
if (ppcRecompilerInstanceData->ppcRecompilerDirectJumpTable[enterAddress / 4] != PPCRecompiler_leaveRecompilerCode_unvisited)
return;
@ -127,15 +159,15 @@ void PPCRecompiler_attemptEnter(PPCInterpreter_t* hCPU, uint32 enterAddress)
PPCRecompiler_enter(hCPU, funcPtr);
}
}
bool PPCRecompiler_ApplyIMLPasses(ppcImlGenContext_t& ppcImlGenContext);
PPCRecFunction_t* PPCRecompiler_recompileFunction(PPCFunctionBoundaryTracker::PPCRange_t range, std::set<uint32>& entryAddresses, std::vector<std::pair<MPTR, uint32>>& entryPointsOut)
PPCRecFunction_t* PPCRecompiler_recompileFunction(PPCFunctionBoundaryTracker::PPCRange_t range, std::set<uint32>& entryAddresses, std::vector<std::pair<MPTR, uint32>>& entryPointsOut, PPCFunctionBoundaryTracker& boundaryTracker)
{
if (range.startAddress >= PPC_REC_CODE_AREA_END)
{
cemuLog_log(LogType::Force, "Attempting to recompile function outside of allowed code area");
return nullptr;
}
uint32 codeGenRangeStart;
uint32 codeGenRangeSize = 0;
coreinit::OSGetCodegenVirtAddrRangeInternal(codeGenRangeStart, codeGenRangeSize);
@ -153,29 +185,61 @@ PPCRecFunction_t* PPCRecompiler_recompileFunction(PPCFunctionBoundaryTracker::PP
PPCRecFunction_t* ppcRecFunc = new PPCRecFunction_t();
ppcRecFunc->ppcAddress = range.startAddress;
ppcRecFunc->ppcSize = range.length;
#if PPCREC_LOG_RECOMPILATION_RESULTS
BenchmarkTimer bt;
bt.Start();
#endif
// generate intermediate code
ppcImlGenContext_t ppcImlGenContext = { 0 };
bool compiledSuccessfully = PPCRecompiler_generateIntermediateCode(ppcImlGenContext, ppcRecFunc, entryAddresses);
ppcImlGenContext.debug_entryPPCAddress = range.startAddress;
bool compiledSuccessfully = PPCRecompiler_generateIntermediateCode(ppcImlGenContext, ppcRecFunc, entryAddresses, boundaryTracker);
if (compiledSuccessfully == false)
{
// todo: Free everything
PPCRecompiler_freeContext(&ppcImlGenContext);
delete ppcRecFunc;
return NULL;
return nullptr;
}
uint32 ppcRecLowerAddr = LaunchSettings::GetPPCRecLowerAddr();
uint32 ppcRecUpperAddr = LaunchSettings::GetPPCRecUpperAddr();
if (ppcRecLowerAddr != 0 && ppcRecUpperAddr != 0)
{
if (ppcRecFunc->ppcAddress < ppcRecLowerAddr || ppcRecFunc->ppcAddress > ppcRecUpperAddr)
{
delete ppcRecFunc;
return nullptr;
}
}
// apply passes
if (!PPCRecompiler_ApplyIMLPasses(ppcImlGenContext))
{
delete ppcRecFunc;
return nullptr;
}
// emit x64 code
bool x64GenerationSuccess = PPCRecompiler_generateX64Code(ppcRecFunc, &ppcImlGenContext);
if (x64GenerationSuccess == false)
{
PPCRecompiler_freeContext(&ppcImlGenContext);
return nullptr;
}
if (ActiveSettings::DumpRecompilerFunctionsEnabled())
{
FileStream* fs = FileStream::createFile2(ActiveSettings::GetUserDataPath(fmt::format("dump/recompiler/ppc_{:08x}.bin", ppcRecFunc->ppcAddress)));
if (fs)
{
fs->writeData(ppcRecFunc->x86Code, ppcRecFunc->x86Size);
delete fs;
}
}
// collect list of PPC-->x64 entry points
entryPointsOut.clear();
for (sint32 s = 0; s < ppcImlGenContext.segmentListCount; s++)
for(IMLSegment* imlSegment : ppcImlGenContext.segmentList2)
{
PPCRecImlSegment_t* imlSegment = ppcImlGenContext.segmentList[s];
if (imlSegment->isEnterable == false)
continue;
@ -185,10 +249,83 @@ PPCRecFunction_t* PPCRecompiler_recompileFunction(PPCFunctionBoundaryTracker::PP
entryPointsOut.emplace_back(ppcEnterOffset, x64Offset);
}
PPCRecompiler_freeContext(&ppcImlGenContext);
#if PPCREC_LOG_RECOMPILATION_RESULTS
bt.Stop();
uint32 codeHash = 0;
for (uint32 i = 0; i < ppcRecFunc->x86Size; i++)
{
codeHash = _rotr(codeHash, 3);
codeHash += ((uint8*)ppcRecFunc->x86Code)[i];
}
cemuLog_log(LogType::Force, "[Recompiler] PPC 0x{:08x} -> x64: 0x{:x} Took {:.4}ms | Size {:04x} CodeHash {:08x}", (uint32)ppcRecFunc->ppcAddress, (uint64)(uintptr_t)ppcRecFunc->x86Code, bt.GetElapsedMilliseconds(), ppcRecFunc->x86Size, codeHash);
#endif
return ppcRecFunc;
}
void PPCRecompiler_NativeRegisterAllocatorPass(ppcImlGenContext_t& ppcImlGenContext)
{
IMLRegisterAllocatorParameters raParam;
for (auto& it : ppcImlGenContext.mappedRegs)
raParam.regIdToName.try_emplace(it.second.GetRegID(), it.first);
auto& gprPhysPool = raParam.GetPhysRegPool(IMLRegFormat::I64);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_RAX);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_RDX);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_RBX);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_RBP);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_RSI);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_RDI);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_R8);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_R9);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_R10);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_R11);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_R12);
gprPhysPool.SetAvailable(IMLArchX86::PHYSREG_GPR_BASE + X86_REG_RCX);
// add XMM registers, except XMM15 which is the temporary register
auto& fprPhysPool = raParam.GetPhysRegPool(IMLRegFormat::F64);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 0);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 1);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 2);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 3);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 4);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 5);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 6);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 7);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 8);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 9);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 10);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 11);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 12);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 13);
fprPhysPool.SetAvailable(IMLArchX86::PHYSREG_FPR_BASE + 14);
IMLRegisterAllocator_AllocateRegisters(&ppcImlGenContext, raParam);
}
bool PPCRecompiler_ApplyIMLPasses(ppcImlGenContext_t& ppcImlGenContext)
{
// isolate entry points from function flow (enterable segments must not be the target of any other segment)
// this simplifies logic during register allocation
PPCRecompilerIML_isolateEnterableSegments(&ppcImlGenContext);
// if GQRs can be predicted, optimize PSQ load/stores
PPCRecompiler_optimizePSQLoadAndStore(&ppcImlGenContext);
// merge certain float load+store patterns (must happen before FPR register remapping)
IMLOptimizer_OptimizeDirectFloatCopies(&ppcImlGenContext);
// delay byte swapping for certain load+store patterns
IMLOptimizer_OptimizeDirectIntegerCopies(&ppcImlGenContext);
IMLOptimizer_StandardOptimizationPass(ppcImlGenContext);
PPCRecompiler_NativeRegisterAllocatorPass(ppcImlGenContext);
return true;
}
bool PPCRecompiler_makeRecompiledFunctionActive(uint32 initialEntryPoint, PPCFunctionBoundaryTracker::PPCRange_t& range, PPCRecFunction_t* ppcRecFunc, std::vector<std::pair<MPTR, uint32>>& entryPoints)
{
// update jump table
@ -202,7 +339,7 @@ bool PPCRecompiler_makeRecompiledFunctionActive(uint32 initialEntryPoint, PPCFun
return false;
}
// check if the current range got invalidated in the time it took to recompile it
// check if the current range got invalidated during the time it took to recompile it
bool isInvalidated = false;
for (auto& invRange : PPCRecompilerState.invalidationRanges)
{
@ -280,7 +417,7 @@ void PPCRecompiler_recompileAtAddress(uint32 address)
PPCRecompilerState.recompilerSpinlock.unlock();
std::vector<std::pair<MPTR, uint32>> functionEntryPoints;
auto func = PPCRecompiler_recompileFunction(range, entryAddresses, functionEntryPoints);
auto func = PPCRecompiler_recompileFunction(range, entryAddresses, functionEntryPoints, funcBoundaries);
if (!func)
{
@ -295,6 +432,10 @@ std::atomic_bool s_recompilerThreadStopSignal{false};
void PPCRecompiler_thread()
{
SetThreadName("PPCRecompiler");
#if PPCREC_FORCE_SYNCHRONOUS_COMPILATION
return;
#endif
while (true)
{
if(s_recompilerThreadStopSignal)
@ -475,6 +616,41 @@ void PPCRecompiler_invalidateRange(uint32 startAddr, uint32 endAddr)
#if defined(ARCH_X86_64)
void PPCRecompiler_initPlatform()
{
ppcRecompilerInstanceData->_x64XMM_xorNegateMaskBottom[0] = 1ULL << 63ULL;
ppcRecompilerInstanceData->_x64XMM_xorNegateMaskBottom[1] = 0ULL;
ppcRecompilerInstanceData->_x64XMM_xorNegateMaskPair[0] = 1ULL << 63ULL;
ppcRecompilerInstanceData->_x64XMM_xorNegateMaskPair[1] = 1ULL << 63ULL;
ppcRecompilerInstanceData->_x64XMM_xorNOTMask[0] = 0xFFFFFFFFFFFFFFFFULL;
ppcRecompilerInstanceData->_x64XMM_xorNOTMask[1] = 0xFFFFFFFFFFFFFFFFULL;
ppcRecompilerInstanceData->_x64XMM_andAbsMaskBottom[0] = ~(1ULL << 63ULL);
ppcRecompilerInstanceData->_x64XMM_andAbsMaskBottom[1] = ~0ULL;
ppcRecompilerInstanceData->_x64XMM_andAbsMaskPair[0] = ~(1ULL << 63ULL);
ppcRecompilerInstanceData->_x64XMM_andAbsMaskPair[1] = ~(1ULL << 63ULL);
ppcRecompilerInstanceData->_x64XMM_andFloatAbsMaskBottom[0] = ~(1 << 31);
ppcRecompilerInstanceData->_x64XMM_andFloatAbsMaskBottom[1] = 0xFFFFFFFF;
ppcRecompilerInstanceData->_x64XMM_andFloatAbsMaskBottom[2] = 0xFFFFFFFF;
ppcRecompilerInstanceData->_x64XMM_andFloatAbsMaskBottom[3] = 0xFFFFFFFF;
ppcRecompilerInstanceData->_x64XMM_singleWordMask[0] = 0xFFFFFFFFULL;
ppcRecompilerInstanceData->_x64XMM_singleWordMask[1] = 0ULL;
ppcRecompilerInstanceData->_x64XMM_constDouble1_1[0] = 1.0;
ppcRecompilerInstanceData->_x64XMM_constDouble1_1[1] = 1.0;
ppcRecompilerInstanceData->_x64XMM_constDouble0_0[0] = 0.0;
ppcRecompilerInstanceData->_x64XMM_constDouble0_0[1] = 0.0;
ppcRecompilerInstanceData->_x64XMM_constFloat0_0[0] = 0.0f;
ppcRecompilerInstanceData->_x64XMM_constFloat0_0[1] = 0.0f;
ppcRecompilerInstanceData->_x64XMM_constFloat1_1[0] = 1.0f;
ppcRecompilerInstanceData->_x64XMM_constFloat1_1[1] = 1.0f;
*(uint32*)&ppcRecompilerInstanceData->_x64XMM_constFloatMin[0] = 0x00800000;
*(uint32*)&ppcRecompilerInstanceData->_x64XMM_constFloatMin[1] = 0x00800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMask1[0] = 0x7F800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMask1[1] = 0x7F800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMask1[2] = 0x7F800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMask1[3] = 0x7F800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMaskResetSignBits[0] = ~0x80000000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMaskResetSignBits[1] = ~0x80000000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMaskResetSignBits[2] = ~0x80000000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMaskResetSignBits[3] = ~0x80000000;
// mxcsr
ppcRecompilerInstanceData->_x64XMM_mxCsr_ftzOn = 0x1F80 | 0x8000;
ppcRecompilerInstanceData->_x64XMM_mxCsr_ftzOff = 0x1F80;
@ -512,42 +688,6 @@ void PPCRecompiler_init()
PPCRecompiler_allocateRange(mmuRange_TRAMPOLINE_AREA.getBase(), mmuRange_TRAMPOLINE_AREA.getSize());
PPCRecompiler_allocateRange(mmuRange_CODECAVE.getBase(), mmuRange_CODECAVE.getSize());
// init x64 recompiler instance data
ppcRecompilerInstanceData->_x64XMM_xorNegateMaskBottom[0] = 1ULL << 63ULL;
ppcRecompilerInstanceData->_x64XMM_xorNegateMaskBottom[1] = 0ULL;
ppcRecompilerInstanceData->_x64XMM_xorNegateMaskPair[0] = 1ULL << 63ULL;
ppcRecompilerInstanceData->_x64XMM_xorNegateMaskPair[1] = 1ULL << 63ULL;
ppcRecompilerInstanceData->_x64XMM_xorNOTMask[0] = 0xFFFFFFFFFFFFFFFFULL;
ppcRecompilerInstanceData->_x64XMM_xorNOTMask[1] = 0xFFFFFFFFFFFFFFFFULL;
ppcRecompilerInstanceData->_x64XMM_andAbsMaskBottom[0] = ~(1ULL << 63ULL);
ppcRecompilerInstanceData->_x64XMM_andAbsMaskBottom[1] = ~0ULL;
ppcRecompilerInstanceData->_x64XMM_andAbsMaskPair[0] = ~(1ULL << 63ULL);
ppcRecompilerInstanceData->_x64XMM_andAbsMaskPair[1] = ~(1ULL << 63ULL);
ppcRecompilerInstanceData->_x64XMM_andFloatAbsMaskBottom[0] = ~(1 << 31);
ppcRecompilerInstanceData->_x64XMM_andFloatAbsMaskBottom[1] = 0xFFFFFFFF;
ppcRecompilerInstanceData->_x64XMM_andFloatAbsMaskBottom[2] = 0xFFFFFFFF;
ppcRecompilerInstanceData->_x64XMM_andFloatAbsMaskBottom[3] = 0xFFFFFFFF;
ppcRecompilerInstanceData->_x64XMM_singleWordMask[0] = 0xFFFFFFFFULL;
ppcRecompilerInstanceData->_x64XMM_singleWordMask[1] = 0ULL;
ppcRecompilerInstanceData->_x64XMM_constDouble1_1[0] = 1.0;
ppcRecompilerInstanceData->_x64XMM_constDouble1_1[1] = 1.0;
ppcRecompilerInstanceData->_x64XMM_constDouble0_0[0] = 0.0;
ppcRecompilerInstanceData->_x64XMM_constDouble0_0[1] = 0.0;
ppcRecompilerInstanceData->_x64XMM_constFloat0_0[0] = 0.0f;
ppcRecompilerInstanceData->_x64XMM_constFloat0_0[1] = 0.0f;
ppcRecompilerInstanceData->_x64XMM_constFloat1_1[0] = 1.0f;
ppcRecompilerInstanceData->_x64XMM_constFloat1_1[1] = 1.0f;
*(uint32*)&ppcRecompilerInstanceData->_x64XMM_constFloatMin[0] = 0x00800000;
*(uint32*)&ppcRecompilerInstanceData->_x64XMM_constFloatMin[1] = 0x00800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMask1[0] = 0x7F800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMask1[1] = 0x7F800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMask1[2] = 0x7F800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMask1[3] = 0x7F800000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMaskResetSignBits[0] = ~0x80000000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMaskResetSignBits[1] = ~0x80000000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMaskResetSignBits[2] = ~0x80000000;
ppcRecompilerInstanceData->_x64XMM_flushDenormalMaskResetSignBits[3] = ~0x80000000;
// setup GQR scale tables
for (uint32 i = 0; i < 32; i++)
@ -623,4 +763,4 @@ void PPCRecompiler_Shutdown()
// mark as unmapped
ppcRecompiler_reservedBlockMask[i] = false;
}
}
}

View file

@ -1,4 +1,4 @@
#include <vector>
#pragma once
#define PPC_REC_CODE_AREA_START (0x00000000) // lower bound of executable memory area. Recompiler expects this address to be 0
#define PPC_REC_CODE_AREA_END (0x10000000) // upper bound of executable memory area
@ -6,336 +6,113 @@
#define PPC_REC_ALIGN_TO_4MB(__v) (((__v)+4*1024*1024-1)&~(4*1024*1024-1))
#define PPC_REC_MAX_VIRTUAL_GPR (40) // enough to store 32 GPRs + a few SPRs + temp registers (usually only 1-2)
#define PPC_REC_MAX_VIRTUAL_GPR (40 + 32) // enough to store 32 GPRs + a few SPRs + temp registers (usually only 1-2)
typedef struct
struct ppcRecRange_t
{
uint32 ppcAddress;
uint32 ppcSize;
//void* x86Start;
//size_t x86Size;
void* storedRange;
}ppcRecRange_t;
};
typedef struct
struct PPCRecFunction_t
{
uint32 ppcAddress;
uint32 ppcSize; // ppc code size of function
void* x86Code; // pointer to x86 code
size_t x86Size;
std::vector<ppcRecRange_t> list_ranges;
}PPCRecFunction_t;
#define PPCREC_IML_OP_FLAG_SIGNEXTEND (1<<0)
#define PPCREC_IML_OP_FLAG_SWITCHENDIAN (1<<1)
#define PPCREC_IML_OP_FLAG_NOT_EXPANDED (1<<2) // set single-precision load instructions to indicate that the value should not be rounded to double-precision
#define PPCREC_IML_OP_FLAG_UNUSED (1<<7) // used to mark instructions that are not used
typedef struct
{
uint8 type;
uint8 operation;
uint8 crRegister; // set to 0xFF if not set, not all IML instruction types support cr.
uint8 crMode; // only used when crRegister is valid, used to differentiate between various forms of condition flag set/clear behavior
uint32 crIgnoreMask; // bit set for every respective CR bit that doesn't need to be updated
uint32 associatedPPCAddress; // ppc address that is associated with this instruction
union
{
struct
{
uint8 _padding[7];
}padding;
struct
{
// R (op) A [update cr* in mode *]
uint8 registerResult;
uint8 registerA;
}op_r_r;
struct
{
// R = A (op) B [update cr* in mode *]
uint8 registerResult;
uint8 registerA;
uint8 registerB;
}op_r_r_r;
struct
{
// R = A (op) immS32 [update cr* in mode *]
uint8 registerResult;
uint8 registerA;
sint32 immS32;
}op_r_r_s32;
struct
{
// R/F = NAME or NAME = R/F
uint8 registerIndex;
uint8 copyWidth;
uint32 name;
uint8 flags;
}op_r_name;
struct
{
// R (op) s32 [update cr* in mode *]
uint8 registerIndex;
sint32 immS32;
}op_r_immS32;
struct
{
uint32 address;
uint8 flags;
}op_jumpmark;
struct
{
uint32 param;
uint32 param2;
uint16 paramU16;
}op_macro;
struct
{
uint32 jumpmarkAddress;
bool jumpAccordingToSegment; //PPCRecImlSegment_t* destinationSegment; // if set, this replaces jumpmarkAddress
uint8 condition; // only used when crRegisterIndex is 8 or above (update: Apparently only used to mark jumps without a condition? -> Cleanup)
uint8 crRegisterIndex;
uint8 crBitIndex;
bool bitMustBeSet;
}op_conditionalJump;
struct
{
uint8 registerData;
uint8 registerMem;
uint8 registerMem2;
uint8 registerGQR;
uint8 copyWidth;
//uint8 flags;
struct
{
bool swapEndian : 1;
bool signExtend : 1;
bool notExpanded : 1; // for floats
}flags2;
uint8 mode; // transfer mode (copy width, ps0/ps1 behavior)
sint32 immS32;
}op_storeLoad;
struct
{
struct
{
uint8 registerMem;
sint32 immS32;
}src;
struct
{
uint8 registerMem;
sint32 immS32;
}dst;
uint8 copyWidth;
}op_mem2mem;
struct
{
uint8 registerResult;
uint8 registerOperand;
uint8 flags;
}op_fpr_r_r;
struct
{
uint8 registerResult;
uint8 registerOperandA;
uint8 registerOperandB;
uint8 flags;
}op_fpr_r_r_r;
struct
{
uint8 registerResult;
uint8 registerOperandA;
uint8 registerOperandB;
uint8 registerOperandC;
uint8 flags;
}op_fpr_r_r_r_r;
struct
{
uint8 registerResult;
//uint8 flags;
}op_fpr_r;
struct
{
uint32 ppcAddress;
uint32 x64Offset;
}op_ppcEnter;
struct
{
uint8 crD; // crBitIndex (result)
uint8 crA; // crBitIndex
uint8 crB; // crBitIndex
}op_cr;
// conditional operations (emitted if supported by target platform)
struct
{
// r_s32
uint8 registerIndex;
sint32 immS32;
// condition
uint8 crRegisterIndex;
uint8 crBitIndex;
bool bitMustBeSet;
}op_conditional_r_s32;
};
}PPCRecImlInstruction_t;
typedef struct _PPCRecImlSegment_t PPCRecImlSegment_t;
typedef struct _ppcRecompilerSegmentPoint_t
{
sint32 index;
PPCRecImlSegment_t* imlSegment;
_ppcRecompilerSegmentPoint_t* next;
_ppcRecompilerSegmentPoint_t* prev;
}ppcRecompilerSegmentPoint_t;
struct raLivenessLocation_t
{
sint32 index;
bool isRead;
bool isWrite;
raLivenessLocation_t() = default;
raLivenessLocation_t(sint32 index, bool isRead, bool isWrite)
: index(index), isRead(isRead), isWrite(isWrite) {};
};
struct raLivenessSubrangeLink_t
{
struct raLivenessSubrange_t* prev;
struct raLivenessSubrange_t* next;
};
#include "Cafe/HW/Espresso/Recompiler/IML/IMLInstruction.h"
#include "Cafe/HW/Espresso/Recompiler/IML/IMLSegment.h"
struct raLivenessSubrange_t
{
struct raLivenessRange_t* range;
PPCRecImlSegment_t* imlSegment;
ppcRecompilerSegmentPoint_t start;
ppcRecompilerSegmentPoint_t end;
// dirty state tracking
bool _noLoad;
bool hasStore;
bool hasStoreDelayed;
// next
raLivenessSubrange_t* subrangeBranchTaken;
raLivenessSubrange_t* subrangeBranchNotTaken;
// processing
uint32 lastIterationIndex;
// instruction locations
std::vector<raLivenessLocation_t> list_locations;
// linked list (subranges with same GPR virtual register)
raLivenessSubrangeLink_t link_sameVirtualRegisterGPR;
// linked list (all subranges for this segment)
raLivenessSubrangeLink_t link_segmentSubrangesGPR;
};
struct raLivenessRange_t
{
sint32 virtualRegister;
sint32 physicalRegister;
sint32 name;
std::vector<raLivenessSubrange_t*> list_subranges;
};
struct PPCSegmentRegisterAllocatorInfo_t
{
// analyzer stage
bool isPartOfProcessedLoop{}; // used during loop detection
sint32 lastIterationIndex{};
// linked lists
raLivenessSubrange_t* linkedList_allSubranges{};
raLivenessSubrange_t* linkedList_perVirtualGPR[PPC_REC_MAX_VIRTUAL_GPR]{};
};
struct PPCRecVGPRDistances_t
{
struct _RegArrayEntry
{
sint32 usageStart{};
sint32 usageEnd{};
}reg[PPC_REC_MAX_VIRTUAL_GPR];
bool isProcessed[PPC_REC_MAX_VIRTUAL_GPR]{};
};
typedef struct _PPCRecImlSegment_t
{
sint32 momentaryIndex{}; // index in segment list, generally not kept up to date except if needed (necessary for loop detection)
sint32 startOffset{}; // offset to first instruction in iml instruction list
sint32 count{}; // number of instructions in segment
uint32 ppcAddress{}; // ppc address (0xFFFFFFFF if not associated with an address)
uint32 x64Offset{}; // x64 code offset of segment start
uint32 cycleCount{}; // number of PPC cycles required to execute this segment (roughly)
// list of intermediate instructions in this segment
PPCRecImlInstruction_t* imlList{};
sint32 imlListSize{};
sint32 imlListCount{};
// segment link
_PPCRecImlSegment_t* nextSegmentBranchNotTaken{}; // this is also the default for segments where there is no branch
_PPCRecImlSegment_t* nextSegmentBranchTaken{};
bool nextSegmentIsUncertain{};
sint32 loopDepth{};
//sList_t* list_prevSegments;
std::vector<_PPCRecImlSegment_t*> list_prevSegments{};
// PPC range of segment
uint32 ppcAddrMin{};
uint32 ppcAddrMax{};
// enterable segments
bool isEnterable{}; // this segment can be entered from outside the recompiler (no preloaded registers necessary)
uint32 enterPPCAddress{}; // used if isEnterable is true
// jump destination segments
bool isJumpDestination{}; // segment is a destination for one or more (conditional) jumps
uint32 jumpDestinationPPCAddress{};
// PPC FPR use mask
bool ppcFPRUsed[32]{}; // same as ppcGPRUsed, but for FPR
// CR use mask
uint32 crBitsInput{}; // bits that are expected to be set from the previous segment (read in this segment but not overwritten)
uint32 crBitsRead{}; // all bits that are read in this segment
uint32 crBitsWritten{}; // bits that are written in this segment
// register allocator info
PPCSegmentRegisterAllocatorInfo_t raInfo{};
PPCRecVGPRDistances_t raDistances{};
bool raRangeExtendProcessed{};
// segment points
ppcRecompilerSegmentPoint_t* segmentPointList{};
}PPCRecImlSegment_t;
struct IMLInstruction* PPCRecompilerImlGen_generateNewEmptyInstruction(struct ppcImlGenContext_t* ppcImlGenContext);
struct ppcImlGenContext_t
{
PPCRecFunction_t* functionRef;
class PPCFunctionBoundaryTracker* boundaryTracker;
uint32* currentInstruction;
uint32 ppcAddressOfCurrentInstruction;
IMLSegment* currentOutputSegment;
struct PPCBasicBlockInfo* currentBasicBlock{};
// fpr mode
bool LSQE{ true };
bool PSE{ true };
// cycle counter
uint32 cyclesSinceLastBranch; // used to track ppc cycles
// temporary general purpose registers
uint32 mappedRegister[PPC_REC_MAX_VIRTUAL_GPR];
// temporary floating point registers (single and double precision)
uint32 mappedFPRRegister[256];
// list of intermediate instructions
PPCRecImlInstruction_t* imlList;
sint32 imlListSize;
sint32 imlListCount;
std::unordered_map<IMLName, IMLReg> mappedRegs;
uint32 GetMaxRegId() const
{
if (mappedRegs.empty())
return 0;
return mappedRegs.size()-1;
}
// list of segments
PPCRecImlSegment_t** segmentList;
sint32 segmentListSize;
sint32 segmentListCount;
std::vector<IMLSegment*> segmentList2;
// code generation control
bool hasFPUInstruction; // if true, PPCEnter macro will create FP_UNAVAIL checks -> Not needed in user mode
// register allocator info
struct
{
std::vector<raLivenessRange_t*> list_ranges;
}raInfo;
// analysis info
struct
{
bool modifiesGQR[8];
}tracking;
// debug helpers
uint32 debug_entryPPCAddress{0};
~ppcImlGenContext_t()
{
for (IMLSegment* imlSegment : segmentList2)
delete imlSegment;
segmentList2.clear();
}
// append raw instruction
IMLInstruction& emitInst()
{
return *PPCRecompilerImlGen_generateNewEmptyInstruction(this);
}
IMLSegment* NewSegment()
{
IMLSegment* seg = new IMLSegment();
segmentList2.emplace_back(seg);
return seg;
}
size_t GetSegmentIndex(IMLSegment* seg)
{
for (size_t i = 0; i < segmentList2.size(); i++)
{
if (segmentList2[i] == seg)
return i;
}
cemu_assert_error();
return 0;
}
IMLSegment* InsertSegment(size_t index)
{
IMLSegment* newSeg = new IMLSegment();
segmentList2.insert(segmentList2.begin() + index, 1, newSeg);
return newSeg;
}
std::span<IMLSegment*> InsertSegments(size_t index, size_t count)
{
segmentList2.insert(segmentList2.begin() + index, count, {});
for (size_t i = index; i < (index + count); i++)
segmentList2[i] = new IMLSegment();
return { segmentList2.data() + index, count};
}
void UpdateSegmentIndices()
{
for (size_t i = 0; i < segmentList2.size(); i++)
segmentList2[i]->momentaryIndex = (sint32)i;
}
};
typedef void ATTR_MS_ABI (*PPCREC_JUMP_ENTRY)();
@ -385,8 +162,6 @@ extern void ATTR_MS_ABI (*PPCRecompiler_leaveRecompilerCode_unvisited)();
#define PPC_REC_INVALID_FUNCTION ((PPCRecFunction_t*)-1)
// todo - move some of the stuff above into PPCRecompilerInternal.h
// recompiler interface
void PPCRecompiler_recompileIfUnvisited(uint32 enterAddress);

View file

@ -1,275 +1,29 @@
bool PPCRecompiler_generateIntermediateCode(ppcImlGenContext_t& ppcImlGenContext, PPCRecFunction_t* PPCRecFunction, std::set<uint32>& entryAddresses, class PPCFunctionBoundaryTracker& boundaryTracker);
#define PPCREC_CR_REG_TEMP 8 // there are only 8 cr registers (0-7) we use the 8th as temporary cr register that is never stored (BDNZ instruction for example)
IMLSegment* PPCIMLGen_CreateSplitSegmentAtEnd(ppcImlGenContext_t& ppcImlGenContext, PPCBasicBlockInfo& basicBlockInfo);
IMLSegment* PPCIMLGen_CreateNewSegmentAsBranchTarget(ppcImlGenContext_t& ppcImlGenContext, PPCBasicBlockInfo& basicBlockInfo);
enum
{
PPCREC_IML_OP_ASSIGN, // '=' operator
PPCREC_IML_OP_ENDIAN_SWAP, // '=' operator with 32bit endian swap
PPCREC_IML_OP_ADD, // '+' operator
PPCREC_IML_OP_SUB, // '-' operator
PPCREC_IML_OP_SUB_CARRY_UPDATE_CARRY, // complex operation, result = operand + ~operand2 + carry bit, updates carry bit
PPCREC_IML_OP_COMPARE_SIGNED, // arithmetic/signed comparison operator (updates cr)
PPCREC_IML_OP_COMPARE_UNSIGNED, // logical/unsigned comparison operator (updates cr)
PPCREC_IML_OP_MULTIPLY_SIGNED, // '*' operator (signed multiply)
PPCREC_IML_OP_MULTIPLY_HIGH_UNSIGNED, // unsigned 64bit multiply, store only high 32bit-word of result
PPCREC_IML_OP_MULTIPLY_HIGH_SIGNED, // signed 64bit multiply, store only high 32bit-word of result
PPCREC_IML_OP_DIVIDE_SIGNED, // '/' operator (signed divide)
PPCREC_IML_OP_DIVIDE_UNSIGNED, // '/' operator (unsigned divide)
PPCREC_IML_OP_ADD_CARRY, // complex operation, result = operand + carry bit, updates carry bit
PPCREC_IML_OP_ADD_CARRY_ME, // complex operation, result = operand + carry bit + (-1), updates carry bit
PPCREC_IML_OP_ADD_UPDATE_CARRY, // '+' operator but also updates carry flag
PPCREC_IML_OP_ADD_CARRY_UPDATE_CARRY, // '+' operator and also adds carry, updates carry flag
// assign operators with cast
PPCREC_IML_OP_ASSIGN_S16_TO_S32, // copy 16bit and sign extend
PPCREC_IML_OP_ASSIGN_S8_TO_S32, // copy 8bit and sign extend
// binary operation
PPCREC_IML_OP_OR, // '|' operator
PPCREC_IML_OP_ORC, // '|' operator, second operand is complemented first
PPCREC_IML_OP_AND, // '&' operator
PPCREC_IML_OP_XOR, // '^' operator
PPCREC_IML_OP_LEFT_ROTATE, // left rotate operator
PPCREC_IML_OP_LEFT_SHIFT, // shift left operator
PPCREC_IML_OP_RIGHT_SHIFT, // right shift operator (unsigned)
PPCREC_IML_OP_NOT, // complement each bit
PPCREC_IML_OP_NEG, // negate
// ppc
PPCREC_IML_OP_RLWIMI, // RLWIMI instruction (rotate, merge based on mask)
PPCREC_IML_OP_SRAW, // SRAWI/SRAW instruction (algebraic shift right, sets ca flag)
PPCREC_IML_OP_SLW, // SLW (shift based on register by up to 63 bits)
PPCREC_IML_OP_SRW, // SRW (shift based on register by up to 63 bits)
PPCREC_IML_OP_CNTLZW,
PPCREC_IML_OP_SUBFC, // SUBFC and SUBFIC (subtract from and set carry)
PPCREC_IML_OP_DCBZ, // clear 32 bytes aligned to 0x20
PPCREC_IML_OP_MFCR, // copy cr to gpr
PPCREC_IML_OP_MTCRF, // copy gpr to cr (with mask)
// condition register
PPCREC_IML_OP_CR_CLEAR, // clear cr bit
PPCREC_IML_OP_CR_SET, // set cr bit
PPCREC_IML_OP_CR_OR, // OR cr bits
PPCREC_IML_OP_CR_ORC, // OR cr bits, complement second input operand bit first
PPCREC_IML_OP_CR_AND, // AND cr bits
PPCREC_IML_OP_CR_ANDC, // AND cr bits, complement second input operand bit first
// FPU
PPCREC_IML_OP_FPR_ADD_BOTTOM,
PPCREC_IML_OP_FPR_ADD_PAIR,
PPCREC_IML_OP_FPR_SUB_PAIR,
PPCREC_IML_OP_FPR_SUB_BOTTOM,
PPCREC_IML_OP_FPR_MULTIPLY_BOTTOM,
PPCREC_IML_OP_FPR_MULTIPLY_PAIR,
PPCREC_IML_OP_FPR_DIVIDE_BOTTOM,
PPCREC_IML_OP_FPR_DIVIDE_PAIR,
PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_BOTTOM_AND_TOP,
PPCREC_IML_OP_FPR_COPY_TOP_TO_BOTTOM_AND_TOP,
PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_BOTTOM,
PPCREC_IML_OP_FPR_COPY_BOTTOM_TO_TOP, // leave bottom of destination untouched
PPCREC_IML_OP_FPR_COPY_TOP_TO_TOP, // leave bottom of destination untouched
PPCREC_IML_OP_FPR_COPY_TOP_TO_BOTTOM, // leave top of destination untouched
PPCREC_IML_OP_FPR_COPY_BOTTOM_AND_TOP_SWAPPED,
PPCREC_IML_OP_FPR_EXPAND_BOTTOM32_TO_BOTTOM64_AND_TOP64, // expand bottom f32 to f64 in bottom and top half
PPCREC_IML_OP_FPR_BOTTOM_FRES_TO_BOTTOM_AND_TOP, // calculate reciprocal with Espresso accuracy of source bottom half and write result to destination bottom and top half
PPCREC_IML_OP_FPR_FCMPO_BOTTOM,
PPCREC_IML_OP_FPR_FCMPU_BOTTOM,
PPCREC_IML_OP_FPR_FCMPU_TOP,
PPCREC_IML_OP_FPR_NEGATE_BOTTOM,
PPCREC_IML_OP_FPR_NEGATE_PAIR,
PPCREC_IML_OP_FPR_ABS_BOTTOM, // abs(fp0)
PPCREC_IML_OP_FPR_ABS_PAIR,
PPCREC_IML_OP_FPR_FRES_PAIR, // 1.0/fp approx (Espresso accuracy)
PPCREC_IML_OP_FPR_FRSQRTE_PAIR, // 1.0/sqrt(fp) approx (Espresso accuracy)
PPCREC_IML_OP_FPR_NEGATIVE_ABS_BOTTOM, // -abs(fp0)
PPCREC_IML_OP_FPR_ROUND_TO_SINGLE_PRECISION_BOTTOM, // round 64bit double to 64bit double with 32bit float precision (in bottom half of xmm register)
PPCREC_IML_OP_FPR_ROUND_TO_SINGLE_PRECISION_PAIR, // round two 64bit doubles to 64bit double with 32bit float precision
PPCREC_IML_OP_FPR_BOTTOM_RECIPROCAL_SQRT,
PPCREC_IML_OP_FPR_BOTTOM_FCTIWZ,
PPCREC_IML_OP_FPR_SELECT_BOTTOM, // selectively copy bottom value from operand B or C based on value in operand A
PPCREC_IML_OP_FPR_SELECT_PAIR, // selectively copy top/bottom from operand B or C based on value in top/bottom of operand A
// PS
PPCREC_IML_OP_FPR_SUM0,
PPCREC_IML_OP_FPR_SUM1,
};
void PPCIMLGen_AssertIfNotLastSegmentInstruction(ppcImlGenContext_t& ppcImlGenContext);
#define PPCREC_IML_OP_FPR_COPY_PAIR (PPCREC_IML_OP_ASSIGN)
enum
{
PPCREC_IML_MACRO_BLR, // macro for BLR instruction code
PPCREC_IML_MACRO_BLRL, // macro for BLRL instruction code
PPCREC_IML_MACRO_BCTR, // macro for BCTR instruction code
PPCREC_IML_MACRO_BCTRL, // macro for BCTRL instruction code
PPCREC_IML_MACRO_BL, // call to different function (can be within same function)
PPCREC_IML_MACRO_B_FAR, // branch to different function
PPCREC_IML_MACRO_COUNT_CYCLES, // decrease current remaining thread cycles by a certain amount
PPCREC_IML_MACRO_HLE, // HLE function call
PPCREC_IML_MACRO_MFTB, // get TB register value (low or high)
PPCREC_IML_MACRO_LEAVE, // leaves recompiler and switches to interpreter
// debugging
PPCREC_IML_MACRO_DEBUGBREAK, // throws a debugbreak
};
enum
{
PPCREC_JUMP_CONDITION_NONE,
PPCREC_JUMP_CONDITION_E, // equal / zero
PPCREC_JUMP_CONDITION_NE, // not equal / not zero
PPCREC_JUMP_CONDITION_LE, // less or equal
PPCREC_JUMP_CONDITION_L, // less
PPCREC_JUMP_CONDITION_GE, // greater or equal
PPCREC_JUMP_CONDITION_G, // greater
// special case:
PPCREC_JUMP_CONDITION_SUMMARYOVERFLOW, // needs special handling
PPCREC_JUMP_CONDITION_NSUMMARYOVERFLOW, // not summaryoverflow
};
enum
{
PPCREC_CR_MODE_COMPARE_SIGNED,
PPCREC_CR_MODE_COMPARE_UNSIGNED, // alias logic compare
// others: PPCREC_CR_MODE_ARITHMETIC,
PPCREC_CR_MODE_ARITHMETIC, // arithmetic use (for use with add/sub instructions without generating extra code)
PPCREC_CR_MODE_LOGICAL,
};
enum
{
PPCREC_IML_TYPE_NONE,
PPCREC_IML_TYPE_NO_OP, // no-op instruction
PPCREC_IML_TYPE_JUMPMARK, // possible jump destination (generated before each ppc instruction)
PPCREC_IML_TYPE_R_R, // r* (op) *r
PPCREC_IML_TYPE_R_R_R, // r* = r* (op) r*
PPCREC_IML_TYPE_R_R_S32, // r* = r* (op) s32*
PPCREC_IML_TYPE_LOAD, // r* = [r*+s32*]
PPCREC_IML_TYPE_LOAD_INDEXED, // r* = [r*+r*]
PPCREC_IML_TYPE_STORE, // [r*+s32*] = r*
PPCREC_IML_TYPE_STORE_INDEXED, // [r*+r*] = r*
PPCREC_IML_TYPE_R_NAME, // r* = name
PPCREC_IML_TYPE_NAME_R, // name* = r*
PPCREC_IML_TYPE_R_S32, // r* (op) imm
PPCREC_IML_TYPE_MACRO,
PPCREC_IML_TYPE_CJUMP, // conditional jump
PPCREC_IML_TYPE_CJUMP_CYCLE_CHECK, // jumps only if remaining thread cycles >= 0
PPCREC_IML_TYPE_PPC_ENTER, // used to mark locations that should be written to recompilerCallTable
PPCREC_IML_TYPE_CR, // condition register specific operations (one or more operands)
// conditional
PPCREC_IML_TYPE_CONDITIONAL_R_S32,
// FPR
PPCREC_IML_TYPE_FPR_R_NAME, // name = f*
PPCREC_IML_TYPE_FPR_NAME_R, // f* = name
PPCREC_IML_TYPE_FPR_LOAD, // r* = (bitdepth) [r*+s32*] (single or paired single mode)
PPCREC_IML_TYPE_FPR_LOAD_INDEXED, // r* = (bitdepth) [r*+r*] (single or paired single mode)
PPCREC_IML_TYPE_FPR_STORE, // (bitdepth) [r*+s32*] = r* (single or paired single mode)
PPCREC_IML_TYPE_FPR_STORE_INDEXED, // (bitdepth) [r*+r*] = r* (single or paired single mode)
PPCREC_IML_TYPE_FPR_R_R,
PPCREC_IML_TYPE_FPR_R_R_R,
PPCREC_IML_TYPE_FPR_R_R_R_R,
PPCREC_IML_TYPE_FPR_R,
// special
PPCREC_IML_TYPE_MEM2MEM, // memory to memory copy (deprecated)
};
enum
{
PPCREC_NAME_NONE,
PPCREC_NAME_TEMPORARY,
PPCREC_NAME_R0 = 1000,
PPCREC_NAME_SPR0 = 2000,
PPCREC_NAME_FPR0 = 3000,
PPCREC_NAME_TEMPORARY_FPR0 = 4000, // 0 to 7
//PPCREC_NAME_CR0 = 3000, // value mapped condition register (usually it isn't needed and can be optimized away)
};
// special cases for LOAD/STORE
#define PPC_REC_LOAD_LWARX_MARKER (100) // lwarx instruction (similar to LWZX but sets reserved address/value)
#define PPC_REC_STORE_STWCX_MARKER (100) // stwcx instruction (similar to STWX but writes only if reservation from LWARX is valid)
#define PPC_REC_STORE_STSWI_1 (200) // stswi nb = 1
#define PPC_REC_STORE_STSWI_2 (201) // stswi nb = 2
#define PPC_REC_STORE_STSWI_3 (202) // stswi nb = 3
#define PPC_REC_STORE_LSWI_1 (200) // lswi nb = 1
#define PPC_REC_STORE_LSWI_2 (201) // lswi nb = 2
#define PPC_REC_STORE_LSWI_3 (202) // lswi nb = 3
#define PPC_REC_INVALID_REGISTER 0xFF
#define PPCREC_CR_BIT_LT 0
#define PPCREC_CR_BIT_GT 1
#define PPCREC_CR_BIT_EQ 2
#define PPCREC_CR_BIT_SO 3
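// Illustrative (not from the Cemu source): the CR tracking analyzer later in this diff builds per-bit
// masks as 1 << (crRegisterIndex * 4 + crBitIndex), so cr0.eq maps to bit 2 and cr7.so maps to bit 31:
static_assert((1u << (0 * 4 + PPCREC_CR_BIT_EQ)) == 0x4u, "cr0.eq -> mask 0x4");
static_assert((1u << (7 * 4 + PPCREC_CR_BIT_SO)) == 0x80000000u, "cr7.so -> mask 0x80000000");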
enum
{
// fpr load
PPCREC_FPR_LD_MODE_SINGLE_INTO_PS0,
PPCREC_FPR_LD_MODE_SINGLE_INTO_PS0_PS1,
PPCREC_FPR_LD_MODE_DOUBLE_INTO_PS0,
PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0,
PPCREC_FPR_LD_MODE_PSQ_GENERIC_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0,
PPCREC_FPR_LD_MODE_PSQ_FLOAT_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_S16_PS0,
PPCREC_FPR_LD_MODE_PSQ_S16_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_U16_PS0,
PPCREC_FPR_LD_MODE_PSQ_U16_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_S8_PS0,
PPCREC_FPR_LD_MODE_PSQ_S8_PS0_PS1,
PPCREC_FPR_LD_MODE_PSQ_U8_PS0,
PPCREC_FPR_LD_MODE_PSQ_U8_PS0_PS1,
// fpr store
PPCREC_FPR_ST_MODE_SINGLE_FROM_PS0, // store 1 single precision float from ps0
PPCREC_FPR_ST_MODE_DOUBLE_FROM_PS0, // store 1 double precision float from ps0
PPCREC_FPR_ST_MODE_UI32_FROM_PS0, // store raw low-32bit of PS0
PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_GENERIC_PS0,
PPCREC_FPR_ST_MODE_PSQ_FLOAT_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_FLOAT_PS0,
PPCREC_FPR_ST_MODE_PSQ_S8_PS0,
PPCREC_FPR_ST_MODE_PSQ_S8_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_U8_PS0,
PPCREC_FPR_ST_MODE_PSQ_U8_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_U16_PS0,
PPCREC_FPR_ST_MODE_PSQ_U16_PS0_PS1,
PPCREC_FPR_ST_MODE_PSQ_S16_PS0,
PPCREC_FPR_ST_MODE_PSQ_S16_PS0_PS1,
};
bool PPCRecompiler_generateIntermediateCode(ppcImlGenContext_t& ppcImlGenContext, PPCRecFunction_t* PPCRecFunction, std::set<uint32>& entryAddresses);
void PPCRecompiler_freeContext(ppcImlGenContext_t* ppcImlGenContext); // todo - move to destructor
PPCRecImlInstruction_t* PPCRecompilerImlGen_generateNewEmptyInstruction(ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompiler_pushBackIMLInstructions(PPCRecImlSegment_t* imlSegment, sint32 index, sint32 shiftBackCount);
PPCRecImlInstruction_t* PPCRecompiler_insertInstruction(PPCRecImlSegment_t* imlSegment, sint32 index);
IMLInstruction* PPCRecompilerImlGen_generateNewEmptyInstruction(ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompiler_pushBackIMLInstructions(IMLSegment* imlSegment, sint32 index, sint32 shiftBackCount);
IMLInstruction* PPCRecompiler_insertInstruction(IMLSegment* imlSegment, sint32 index);
void PPCRecompilerIml_insertSegments(ppcImlGenContext_t* ppcImlGenContext, sint32 index, sint32 count);
void PPCRecompilerIml_setSegmentPoint(ppcRecompilerSegmentPoint_t* segmentPoint, PPCRecImlSegment_t* imlSegment, sint32 index);
void PPCRecompilerIml_removeSegmentPoint(ppcRecompilerSegmentPoint_t* segmentPoint);
void PPCRecompilerIml_setSegmentPoint(IMLSegmentPoint* segmentPoint, IMLSegment* imlSegment, sint32 index);
void PPCRecompilerIml_removeSegmentPoint(IMLSegmentPoint* segmentPoint);
// GPR register management
uint32 PPCRecompilerImlGen_loadRegister(ppcImlGenContext_t* ppcImlGenContext, uint32 mappedName, bool loadNew = false);
uint32 PPCRecompilerImlGen_loadOverwriteRegister(ppcImlGenContext_t* ppcImlGenContext, uint32 mappedName);
IMLReg PPCRecompilerImlGen_loadRegister(ppcImlGenContext_t* ppcImlGenContext, uint32 mappedName);
// FPR register management
uint32 PPCRecompilerImlGen_loadFPRRegister(ppcImlGenContext_t* ppcImlGenContext, uint32 mappedName, bool loadNew = false);
uint32 PPCRecompilerImlGen_loadOverwriteFPRRegister(ppcImlGenContext_t* ppcImlGenContext, uint32 mappedName);
IMLReg PPCRecompilerImlGen_loadFPRRegister(ppcImlGenContext_t* ppcImlGenContext, uint32 mappedName, bool loadNew = false);
IMLReg PPCRecompilerImlGen_loadOverwriteFPRRegister(ppcImlGenContext_t* ppcImlGenContext, uint32 mappedName);
// IML instruction generation
void PPCRecompilerImlGen_generateNewInstruction_jump(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlInstruction_t* imlInstruction, uint32 jumpmarkAddress);
void PPCRecompilerImlGen_generateNewInstruction_jumpSegment(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlInstruction_t* imlInstruction);
void PPCRecompilerImlGen_generateNewInstruction_r_s32(ppcImlGenContext_t* ppcImlGenContext, uint32 operation, uint8 registerIndex, sint32 immS32, uint32 copyWidth, bool signExtend, bool bigEndian, uint8 crRegister, uint32 crMode);
void PPCRecompilerImlGen_generateNewInstruction_conditional_r_s32(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlInstruction_t* imlInstruction, uint32 operation, uint8 registerIndex, sint32 immS32, uint32 crRegisterIndex, uint32 crBitIndex, bool bitMustBeSet);
void PPCRecompilerImlGen_generateNewInstruction_r_r(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlInstruction_t* imlInstruction, uint32 operation, uint8 registerResult, uint8 registerA, uint8 crRegister = PPC_REC_INVALID_REGISTER, uint8 crMode = 0);
// IML instruction generation (new style, can generate new instructions but also overwrite existing ones)
void PPCRecompilerImlGen_generateNewInstruction_noOp(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlInstruction_t* imlInstruction);
void PPCRecompilerImlGen_generateNewInstruction_memory_memory(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlInstruction_t* imlInstruction, uint8 srcMemReg, sint32 srcImmS32, uint8 dstMemReg, sint32 dstImmS32, uint8 copyWidth);
void PPCRecompilerImlGen_generateNewInstruction_fpr_r(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlInstruction_t* imlInstruction, sint32 operation, uint8 registerResult, sint32 crRegister = PPC_REC_INVALID_REGISTER);
void PPCRecompilerImlGen_generateNewInstruction_conditional_r_s32(ppcImlGenContext_t* ppcImlGenContext, IMLInstruction* imlInstruction, uint32 operation, IMLReg registerIndex, sint32 immS32, uint32 crRegisterIndex, uint32 crBitIndex, bool bitMustBeSet);
void PPCRecompilerImlGen_generateNewInstruction_fpr_r(ppcImlGenContext_t* ppcImlGenContext, IMLInstruction* imlInstruction, sint32 operation, IMLReg registerResult);
// IML generation - FPU
bool PPCRecompilerImlGen_LFS(ppcImlGenContext_t* ppcImlGenContext, uint32 opcode);
@ -347,76 +101,4 @@ bool PPCRecompilerImlGen_PS_CMPU1(ppcImlGenContext_t* ppcImlGenContext, uint32 o
// IML general
bool PPCRecompiler_isSuffixInstruction(PPCRecImlInstruction_t* iml);
void PPCRecompilerIML_linkSegments(ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompilerIml_setLinkBranchNotTaken(PPCRecImlSegment_t* imlSegmentSrc, PPCRecImlSegment_t* imlSegmentDst);
void PPCRecompilerIml_setLinkBranchTaken(PPCRecImlSegment_t* imlSegmentSrc, PPCRecImlSegment_t* imlSegmentDst);
void PPCRecompilerIML_relinkInputSegment(PPCRecImlSegment_t* imlSegmentOrig, PPCRecImlSegment_t* imlSegmentNew);
void PPCRecompilerIML_removeLink(PPCRecImlSegment_t* imlSegmentSrc, PPCRecImlSegment_t* imlSegmentDst);
void PPCRecompilerIML_isolateEnterableSegments(ppcImlGenContext_t* ppcImlGenContext);
PPCRecImlInstruction_t* PPCRecompilerIML_getLastInstruction(PPCRecImlSegment_t* imlSegment);
// IML analyzer
typedef struct
{
uint32 readCRBits;
uint32 writtenCRBits;
}PPCRecCRTracking_t;
bool PPCRecompilerImlAnalyzer_isTightFiniteLoop(PPCRecImlSegment_t* imlSegment);
bool PPCRecompilerImlAnalyzer_canTypeWriteCR(PPCRecImlInstruction_t* imlInstruction);
void PPCRecompilerImlAnalyzer_getCRTracking(PPCRecImlInstruction_t* imlInstruction, PPCRecCRTracking_t* crTracking);
// IML optimizer
bool PPCRecompiler_reduceNumberOfFPRRegisters(ppcImlGenContext_t* ppcImlGenContext);
bool PPCRecompiler_manageFPRRegisters(ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompiler_removeRedundantCRUpdates(ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompiler_optimizeDirectFloatCopies(ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompiler_optimizeDirectIntegerCopies(ppcImlGenContext_t* ppcImlGenContext);
void PPCRecompiler_optimizePSQLoadAndStore(ppcImlGenContext_t* ppcImlGenContext);
// IML register allocator
void PPCRecompilerImm_allocateRegisters(ppcImlGenContext_t* ppcImlGenContext);
// late optimizations
void PPCRecompiler_reorderConditionModifyInstructions(ppcImlGenContext_t* ppcImlGenContext);
// debug
void PPCRecompiler_dumpIMLSegment(PPCRecImlSegment_t* imlSegment, sint32 segmentIndex, bool printLivenessRangeInfo = false);
typedef struct
{
union
{
struct
{
sint16 readNamedReg1;
sint16 readNamedReg2;
sint16 readNamedReg3;
sint16 writtenNamedReg1;
};
sint16 gpr[4]; // 3 read + 1 write
};
// FPR
union
{
struct
{
// note: If destination operand is not fully written, it will be added as a read FPR as well
sint16 readFPR1;
sint16 readFPR2;
sint16 readFPR3;
sint16 readFPR4; // usually this is set to the result FPR if only partially overwritten
sint16 writtenFPR1;
};
sint16 fpr[4];
};
}PPCImlOptimizerUsedRegisters_t;
void PPCRecompiler_checkRegisterUsage(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlInstruction_t* imlInstruction, PPCImlOptimizerUsedRegisters_t* registersUsed);


@ -1,137 +0,0 @@
#include "PPCRecompiler.h"
#include "PPCRecompilerIml.h"
#include "util/helpers/fixedSizeList.h"
#include "Cafe/HW/Espresso/Interpreter/PPCInterpreterInternal.h"
/*
* Analyzes a single segment and returns true if it is a tight finite loop
*/
bool PPCRecompilerImlAnalyzer_isTightFiniteLoop(PPCRecImlSegment_t* imlSegment)
{
bool isTightFiniteLoop = false;
// base criteria, must jump to beginning of same segment
if (imlSegment->nextSegmentBranchTaken != imlSegment)
return false;
// loops using BDNZ are assumed to always be finite
for (sint32 t = 0; t < imlSegment->imlListCount; t++)
{
if (imlSegment->imlList[t].type == PPCREC_IML_TYPE_R_S32 && imlSegment->imlList[t].operation == PPCREC_IML_OP_SUB && imlSegment->imlList[t].crRegister == 8)
{
return true;
}
}
// for non-BDNZ loops, check for common patterns
// risky approach, look for ADD/SUB operations and assume that potential overflow means finite (does not include r_r_s32 ADD/SUB)
// this catches most loops with load-update and store-update instructions, but also those with decrementing counters
FixedSizeList<sint32, 64, true> list_modifiedRegisters;
for (sint32 t = 0; t < imlSegment->imlListCount; t++)
{
if (imlSegment->imlList[t].type == PPCREC_IML_TYPE_R_S32 && (imlSegment->imlList[t].operation == PPCREC_IML_OP_ADD || imlSegment->imlList[t].operation == PPCREC_IML_OP_SUB) )
{
list_modifiedRegisters.addUnique(imlSegment->imlList[t].op_r_immS32.registerIndex);
}
}
if (list_modifiedRegisters.count > 0)
{
// remove all registers from the list that are modified by non-ADD/SUB instructions
// todo: We should also cover the case where ADD+SUB on the same register cancel the effect out
PPCImlOptimizerUsedRegisters_t registersUsed;
for (sint32 t = 0; t < imlSegment->imlListCount; t++)
{
if (imlSegment->imlList[t].type == PPCREC_IML_TYPE_R_S32 && (imlSegment->imlList[t].operation == PPCREC_IML_OP_ADD || imlSegment->imlList[t].operation == PPCREC_IML_OP_SUB))
continue;
PPCRecompiler_checkRegisterUsage(NULL, imlSegment->imlList + t, &registersUsed);
if(registersUsed.writtenNamedReg1 < 0)
continue;
list_modifiedRegisters.remove(registersUsed.writtenNamedReg1);
}
if (list_modifiedRegisters.count > 0)
{
return true;
}
}
return false;
}
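// Illustrative note (not from the Cemu source): a BDNZ-style loop decrements CTR once per iteration.
// In IML that decrement appears as a PPCREC_IML_TYPE_R_S32 SUB whose crRegister is the temporary CR
// (PPCREC_CR_REG_TEMP == 8), which is why the first scan above can immediately classify a segment
// that branches back to itself as a finite loop.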
/*
* Returns true if the imlInstruction can overwrite CR (depending on value of ->crRegister)
*/
bool PPCRecompilerImlAnalyzer_canTypeWriteCR(PPCRecImlInstruction_t* imlInstruction)
{
if (imlInstruction->type == PPCREC_IML_TYPE_R_R)
return true;
if (imlInstruction->type == PPCREC_IML_TYPE_R_R_R)
return true;
if (imlInstruction->type == PPCREC_IML_TYPE_R_R_S32)
return true;
if (imlInstruction->type == PPCREC_IML_TYPE_R_S32)
return true;
if (imlInstruction->type == PPCREC_IML_TYPE_FPR_R_R)
return true;
if (imlInstruction->type == PPCREC_IML_TYPE_FPR_R_R_R)
return true;
if (imlInstruction->type == PPCREC_IML_TYPE_FPR_R_R_R_R)
return true;
if (imlInstruction->type == PPCREC_IML_TYPE_FPR_R)
return true;
return false;
}
void PPCRecompilerImlAnalyzer_getCRTracking(PPCRecImlInstruction_t* imlInstruction, PPCRecCRTracking_t* crTracking)
{
crTracking->readCRBits = 0;
crTracking->writtenCRBits = 0;
if (imlInstruction->type == PPCREC_IML_TYPE_CJUMP)
{
if (imlInstruction->op_conditionalJump.condition != PPCREC_JUMP_CONDITION_NONE)
{
uint32 crBitFlag = 1 << (imlInstruction->op_conditionalJump.crRegisterIndex * 4 + imlInstruction->op_conditionalJump.crBitIndex);
crTracking->readCRBits = (crBitFlag);
}
}
else if (imlInstruction->type == PPCREC_IML_TYPE_CONDITIONAL_R_S32)
{
uint32 crBitFlag = 1 << (imlInstruction->op_conditional_r_s32.crRegisterIndex * 4 + imlInstruction->op_conditional_r_s32.crBitIndex);
crTracking->readCRBits = crBitFlag;
}
else if (imlInstruction->type == PPCREC_IML_TYPE_R_S32 && imlInstruction->operation == PPCREC_IML_OP_MFCR)
{
crTracking->readCRBits = 0xFFFFFFFF;
}
else if (imlInstruction->type == PPCREC_IML_TYPE_R_S32 && imlInstruction->operation == PPCREC_IML_OP_MTCRF)
{
crTracking->writtenCRBits |= ppc_MTCRFMaskToCRBitMask((uint32)imlInstruction->op_r_immS32.immS32);
}
else if (imlInstruction->type == PPCREC_IML_TYPE_CR)
{
if (imlInstruction->operation == PPCREC_IML_OP_CR_CLEAR ||
imlInstruction->operation == PPCREC_IML_OP_CR_SET)
{
uint32 crBitFlag = 1 << (imlInstruction->op_cr.crD);
crTracking->writtenCRBits = crBitFlag;
}
else if (imlInstruction->operation == PPCREC_IML_OP_CR_OR ||
imlInstruction->operation == PPCREC_IML_OP_CR_ORC ||
imlInstruction->operation == PPCREC_IML_OP_CR_AND ||
imlInstruction->operation == PPCREC_IML_OP_CR_ANDC)
{
uint32 crBitFlag = 1 << (imlInstruction->op_cr.crD);
crTracking->writtenCRBits = crBitFlag;
crBitFlag = 1 << (imlInstruction->op_cr.crA);
crTracking->readCRBits = crBitFlag;
crBitFlag = 1 << (imlInstruction->op_cr.crB);
crTracking->readCRBits |= crBitFlag;
}
else
assert_dbg();
}
else if (PPCRecompilerImlAnalyzer_canTypeWriteCR(imlInstruction) && imlInstruction->crRegister >= 0 && imlInstruction->crRegister <= 7)
{
crTracking->writtenCRBits |= (0xF << (imlInstruction->crRegister * 4));
}
else if ((imlInstruction->type == PPCREC_IML_TYPE_STORE || imlInstruction->type == PPCREC_IML_TYPE_STORE_INDEXED) && imlInstruction->op_storeLoad.copyWidth == PPC_REC_STORE_STWCX_MARKER)
{
// overwrites CR0
crTracking->writtenCRBits |= (0xF << 0);
}
}

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,399 +0,0 @@
#include "PPCRecompiler.h"
#include "PPCRecompilerIml.h"
#include "PPCRecompilerX64.h"
#include "PPCRecompilerImlRanges.h"
#include "util/helpers/MemoryPool.h"
void PPCRecRARange_addLink_perVirtualGPR(raLivenessSubrange_t** root, raLivenessSubrange_t* subrange)
{
#ifdef CEMU_DEBUG_ASSERT
if ((*root) && (*root)->range->virtualRegister != subrange->range->virtualRegister)
assert_dbg();
#endif
subrange->link_sameVirtualRegisterGPR.next = *root;
if (*root)
(*root)->link_sameVirtualRegisterGPR.prev = subrange;
subrange->link_sameVirtualRegisterGPR.prev = nullptr;
*root = subrange;
}
void PPCRecRARange_addLink_allSubrangesGPR(raLivenessSubrange_t** root, raLivenessSubrange_t* subrange)
{
subrange->link_segmentSubrangesGPR.next = *root;
if (*root)
(*root)->link_segmentSubrangesGPR.prev = subrange;
subrange->link_segmentSubrangesGPR.prev = nullptr;
*root = subrange;
}
void PPCRecRARange_removeLink_perVirtualGPR(raLivenessSubrange_t** root, raLivenessSubrange_t* subrange)
{
raLivenessSubrange_t* tempPrev = subrange->link_sameVirtualRegisterGPR.prev;
if (subrange->link_sameVirtualRegisterGPR.prev)
subrange->link_sameVirtualRegisterGPR.prev->link_sameVirtualRegisterGPR.next = subrange->link_sameVirtualRegisterGPR.next;
else
(*root) = subrange->link_sameVirtualRegisterGPR.next;
if (subrange->link_sameVirtualRegisterGPR.next)
subrange->link_sameVirtualRegisterGPR.next->link_sameVirtualRegisterGPR.prev = tempPrev;
#ifdef CEMU_DEBUG_ASSERT
subrange->link_sameVirtualRegisterGPR.prev = (raLivenessSubrange_t*)1;
subrange->link_sameVirtualRegisterGPR.next = (raLivenessSubrange_t*)1;
#endif
}
void PPCRecRARange_removeLink_allSubrangesGPR(raLivenessSubrange_t** root, raLivenessSubrange_t* subrange)
{
raLivenessSubrange_t* tempPrev = subrange->link_segmentSubrangesGPR.prev;
if (subrange->link_segmentSubrangesGPR.prev)
subrange->link_segmentSubrangesGPR.prev->link_segmentSubrangesGPR.next = subrange->link_segmentSubrangesGPR.next;
else
(*root) = subrange->link_segmentSubrangesGPR.next;
if (subrange->link_segmentSubrangesGPR.next)
subrange->link_segmentSubrangesGPR.next->link_segmentSubrangesGPR.prev = tempPrev;
#ifdef CEMU_DEBUG_ASSERT
subrange->link_segmentSubrangesGPR.prev = (raLivenessSubrange_t*)1;
subrange->link_segmentSubrangesGPR.next = (raLivenessSubrange_t*)1;
#endif
}
MemoryPoolPermanentObjects<raLivenessRange_t> memPool_livenessRange(4096);
MemoryPoolPermanentObjects<raLivenessSubrange_t> memPool_livenessSubrange(4096);
raLivenessRange_t* PPCRecRA_createRangeBase(ppcImlGenContext_t* ppcImlGenContext, uint32 virtualRegister, uint32 name)
{
raLivenessRange_t* livenessRange = memPool_livenessRange.acquireObj();
livenessRange->list_subranges.resize(0);
livenessRange->virtualRegister = virtualRegister;
livenessRange->name = name;
livenessRange->physicalRegister = -1;
ppcImlGenContext->raInfo.list_ranges.push_back(livenessRange);
return livenessRange;
}
raLivenessSubrange_t* PPCRecRA_createSubrange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range, PPCRecImlSegment_t* imlSegment, sint32 startIndex, sint32 endIndex)
{
raLivenessSubrange_t* livenessSubrange = memPool_livenessSubrange.acquireObj();
livenessSubrange->list_locations.resize(0);
livenessSubrange->range = range;
livenessSubrange->imlSegment = imlSegment;
PPCRecompilerIml_setSegmentPoint(&livenessSubrange->start, imlSegment, startIndex);
PPCRecompilerIml_setSegmentPoint(&livenessSubrange->end, imlSegment, endIndex);
// default values
livenessSubrange->hasStore = false;
livenessSubrange->hasStoreDelayed = false;
livenessSubrange->lastIterationIndex = 0;
livenessSubrange->subrangeBranchNotTaken = nullptr;
livenessSubrange->subrangeBranchTaken = nullptr;
livenessSubrange->_noLoad = false;
// add to range
range->list_subranges.push_back(livenessSubrange);
// add to segment
PPCRecRARange_addLink_perVirtualGPR(&(imlSegment->raInfo.linkedList_perVirtualGPR[range->virtualRegister]), livenessSubrange);
PPCRecRARange_addLink_allSubrangesGPR(&imlSegment->raInfo.linkedList_allSubranges, livenessSubrange);
return livenessSubrange;
}
void _unlinkSubrange(raLivenessSubrange_t* subrange)
{
PPCRecImlSegment_t* imlSegment = subrange->imlSegment;
PPCRecRARange_removeLink_perVirtualGPR(&imlSegment->raInfo.linkedList_perVirtualGPR[subrange->range->virtualRegister], subrange);
PPCRecRARange_removeLink_allSubrangesGPR(&imlSegment->raInfo.linkedList_allSubranges, subrange);
}
void PPCRecRA_deleteSubrange(ppcImlGenContext_t* ppcImlGenContext, raLivenessSubrange_t* subrange)
{
_unlinkSubrange(subrange);
subrange->range->list_subranges.erase(std::find(subrange->range->list_subranges.begin(), subrange->range->list_subranges.end(), subrange));
subrange->list_locations.clear();
PPCRecompilerIml_removeSegmentPoint(&subrange->start);
PPCRecompilerIml_removeSegmentPoint(&subrange->end);
memPool_livenessSubrange.releaseObj(subrange);
}
void _PPCRecRA_deleteSubrangeNoUnlinkFromRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessSubrange_t* subrange)
{
_unlinkSubrange(subrange);
PPCRecompilerIml_removeSegmentPoint(&subrange->start);
PPCRecompilerIml_removeSegmentPoint(&subrange->end);
memPool_livenessSubrange.releaseObj(subrange);
}
void PPCRecRA_deleteRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range)
{
for (auto& subrange : range->list_subranges)
{
_PPCRecRA_deleteSubrangeNoUnlinkFromRange(ppcImlGenContext, subrange);
}
ppcImlGenContext->raInfo.list_ranges.erase(std::find(ppcImlGenContext->raInfo.list_ranges.begin(), ppcImlGenContext->raInfo.list_ranges.end(), range));
memPool_livenessRange.releaseObj(range);
}
void PPCRecRA_deleteRangeNoUnlink(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range)
{
for (auto& subrange : range->list_subranges)
{
_PPCRecRA_deleteSubrangeNoUnlinkFromRange(ppcImlGenContext, subrange);
}
memPool_livenessRange.releaseObj(range);
}
void PPCRecRA_deleteAllRanges(ppcImlGenContext_t* ppcImlGenContext)
{
for(auto& range : ppcImlGenContext->raInfo.list_ranges)
{
PPCRecRA_deleteRangeNoUnlink(ppcImlGenContext, range);
}
ppcImlGenContext->raInfo.list_ranges.clear();
}
void PPCRecRA_mergeRanges(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range, raLivenessRange_t* absorbedRange)
{
cemu_assert_debug(range != absorbedRange);
cemu_assert_debug(range->virtualRegister == absorbedRange->virtualRegister);
// move all subranges from absorbedRange to range
for (auto& subrange : absorbedRange->list_subranges)
{
range->list_subranges.push_back(subrange);
subrange->range = range;
}
absorbedRange->list_subranges.clear();
PPCRecRA_deleteRange(ppcImlGenContext, absorbedRange);
}
void PPCRecRA_mergeSubranges(ppcImlGenContext_t* ppcImlGenContext, raLivenessSubrange_t* subrange, raLivenessSubrange_t* absorbedSubrange)
{
#ifdef CEMU_DEBUG_ASSERT
PPCRecRA_debugValidateSubrange(subrange);
PPCRecRA_debugValidateSubrange(absorbedSubrange);
if (subrange->imlSegment != absorbedSubrange->imlSegment)
assert_dbg();
if (subrange->end.index > absorbedSubrange->start.index)
assert_dbg();
if (subrange->subrangeBranchTaken || subrange->subrangeBranchNotTaken)
assert_dbg();
if (subrange == absorbedSubrange)
assert_dbg();
#endif
subrange->subrangeBranchTaken = absorbedSubrange->subrangeBranchTaken;
subrange->subrangeBranchNotTaken = absorbedSubrange->subrangeBranchNotTaken;
// merge usage locations
for (auto& location : absorbedSubrange->list_locations)
{
subrange->list_locations.push_back(location);
}
absorbedSubrange->list_locations.clear();
subrange->end.index = absorbedSubrange->end.index;
PPCRecRA_debugValidateSubrange(subrange);
PPCRecRA_deleteSubrange(ppcImlGenContext, absorbedSubrange);
}
// remove all inter-segment connections from the range and split it into local ranges (also removes empty ranges)
void PPCRecRA_explodeRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range)
{
if (range->list_subranges.size() == 1)
assert_dbg();
for (auto& subrange : range->list_subranges)
{
if (subrange->list_locations.empty())
continue;
raLivenessRange_t* newRange = PPCRecRA_createRangeBase(ppcImlGenContext, range->virtualRegister, range->name);
raLivenessSubrange_t* newSubrange = PPCRecRA_createSubrange(ppcImlGenContext, newRange, subrange->imlSegment, subrange->list_locations.data()[0].index, subrange->list_locations.data()[subrange->list_locations.size() - 1].index + 1);
// copy locations
for (auto& location : subrange->list_locations)
{
newSubrange->list_locations.push_back(location);
}
}
// remove original range
PPCRecRA_deleteRange(ppcImlGenContext, range);
}
#ifdef CEMU_DEBUG_ASSERT
void PPCRecRA_debugValidateSubrange(raLivenessSubrange_t* subrange)
{
// validate subrange
if (subrange->subrangeBranchTaken && subrange->subrangeBranchTaken->imlSegment != subrange->imlSegment->nextSegmentBranchTaken)
assert_dbg();
if (subrange->subrangeBranchNotTaken && subrange->subrangeBranchNotTaken->imlSegment != subrange->imlSegment->nextSegmentBranchNotTaken)
assert_dbg();
}
#else
void PPCRecRA_debugValidateSubrange(raLivenessSubrange_t* subrange) {}
#endif
// split subrange at the given index
// After the split there will be two ranges/subranges:
// head -> subrange is shortened to end at splitIndex
// tail -> a new subrange that reaches from splitIndex to the end of the original subrange
// if head has a physical register assigned it will not carry over to tail
// The return value is the tail subrange
// If trimToHole is true, the end of the head subrange and the start of the tail subrange will be moved to fit the locations
// Ranges that begin at RA_INTER_RANGE_START are allowed and can be split
raLivenessSubrange_t* PPCRecRA_splitLocalSubrange(ppcImlGenContext_t* ppcImlGenContext, raLivenessSubrange_t* subrange, sint32 splitIndex, bool trimToHole)
{
// validation
#ifdef CEMU_DEBUG_ASSERT
if (subrange->end.index == RA_INTER_RANGE_END || subrange->end.index == RA_INTER_RANGE_START)
assert_dbg();
if (subrange->start.index >= splitIndex)
assert_dbg();
if (subrange->end.index <= splitIndex)
assert_dbg();
#endif
// create tail
raLivenessRange_t* tailRange = PPCRecRA_createRangeBase(ppcImlGenContext, subrange->range->virtualRegister, subrange->range->name);
raLivenessSubrange_t* tailSubrange = PPCRecRA_createSubrange(ppcImlGenContext, tailRange, subrange->imlSegment, splitIndex, subrange->end.index);
// copy locations
for (auto& location : subrange->list_locations)
{
if (location.index >= splitIndex)
tailSubrange->list_locations.push_back(location);
}
// remove tail locations from head
for (sint32 i = 0; i < subrange->list_locations.size(); i++)
{
raLivenessLocation_t* location = subrange->list_locations.data() + i;
if (location->index >= splitIndex)
{
subrange->list_locations.resize(i);
break;
}
}
// adjust start/end
if (trimToHole)
{
if (subrange->list_locations.empty())
{
subrange->end.index = subrange->start.index+1;
}
else
{
subrange->end.index = subrange->list_locations.back().index + 1;
}
if (tailSubrange->list_locations.empty())
{
assert_dbg(); // should not happen? (In this case we can just avoid generating a tail at all)
}
else
{
tailSubrange->start.index = tailSubrange->list_locations.front().index;
}
}
return tailSubrange;
}
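// Illustrative worked example (not from the Cemu source): splitting a subrange spanning [4,20) with
// usage locations at indices 5, 9 and 14 at splitIndex 10 yields:
//   head: keeps locations 5 and 9; with trimToHole its end is moved to 9+1 = 10
//   tail: a new range/subrange with location 14; with trimToHole its start is trimmed to 14, end stays 20
// Any physical register assigned to the head does not carry over to the tail.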
void PPCRecRA_updateOrAddSubrangeLocation(raLivenessSubrange_t* subrange, sint32 index, bool isRead, bool isWrite)
{
if (subrange->list_locations.empty())
{
subrange->list_locations.emplace_back(index, isRead, isWrite);
return;
}
raLivenessLocation_t* lastLocation = subrange->list_locations.data() + (subrange->list_locations.size() - 1);
cemu_assert_debug(lastLocation->index <= index);
if (lastLocation->index == index)
{
// update
lastLocation->isRead = lastLocation->isRead || isRead;
lastLocation->isWrite = lastLocation->isWrite || isWrite;
return;
}
// add new
subrange->list_locations.emplace_back(index, isRead, isWrite);
}
sint32 PPCRecRARange_getReadWriteCost(PPCRecImlSegment_t* imlSegment)
{
sint32 v = imlSegment->loopDepth + 1;
v *= 5;
return v*v; // 25, 100, 225, 400
}
// calculate cost of entire range
// ignores data flow and does not detect avoidable reads/stores
sint32 PPCRecRARange_estimateCost(raLivenessRange_t* range)
{
sint32 cost = 0;
// todo - this algorithm isn't accurate. If we have 10 parallel branches with a load each then the actual cost is still only that of one branch (plus minimal extra cost for generating more code).
// currently we calculate the cost based on the most expensive entry/exit point
sint32 mostExpensiveRead = 0;
sint32 mostExpensiveWrite = 0;
sint32 readCount = 0;
sint32 writeCount = 0;
for (auto& subrange : range->list_subranges)
{
if (subrange->start.index != RA_INTER_RANGE_START)
{
//cost += PPCRecRARange_getReadWriteCost(subrange->imlSegment);
mostExpensiveRead = std::max(mostExpensiveRead, PPCRecRARange_getReadWriteCost(subrange->imlSegment));
readCount++;
}
if (subrange->end.index != RA_INTER_RANGE_END)
{
//cost += PPCRecRARange_getReadWriteCost(subrange->imlSegment);
mostExpensiveWrite = std::max(mostExpensiveWrite, PPCRecRARange_getReadWriteCost(subrange->imlSegment));
writeCount++;
}
}
cost = mostExpensiveRead + mostExpensiveWrite;
cost = cost + (readCount + writeCount) / 10;
return cost;
}
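// Illustrative, self-contained restatement of the cost model above (not from the Cemu source):
#include <cassert>
static int readWriteCost(int loopDepth) { int v = (loopDepth + 1) * 5; return v * v; } // 25, 100, 225, 400
int main()
{
	// a range whose most expensive entry load sits in a loop-depth-1 segment (100) and whose most
	// expensive exit store sits in a loop-depth-0 segment (25), with one load and one store in total:
	int cost = readWriteCost(1) + readWriteCost(0) + (1 + 1) / 10; // 100 + 25 + 0 = 125 (integer division)
	assert(cost == 125);
	return 0;
}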
// estimate the additional cost the range would incur after calling PPCRecRA_explodeRange() on it (relative to its current cost)
sint32 PPCRecRARange_estimateAdditionalCostAfterRangeExplode(raLivenessRange_t* range)
{
sint32 cost = -PPCRecRARange_estimateCost(range);
for (auto& subrange : range->list_subranges)
{
if (subrange->list_locations.empty())
continue;
cost += PPCRecRARange_getReadWriteCost(subrange->imlSegment) * 2; // we assume a read and a store
}
return cost;
}
sint32 PPCRecRARange_estimateAdditionalCostAfterSplit(raLivenessSubrange_t* subrange, sint32 splitIndex)
{
// validation
#ifdef CEMU_DEBUG_ASSERT
if (subrange->end.index == RA_INTER_RANGE_END)
assert_dbg();
#endif
sint32 cost = 0;
// find split position in location list
if (subrange->list_locations.empty())
{
assert_dbg(); // should not happen?
return 0;
}
if (splitIndex <= subrange->list_locations.front().index)
return 0;
if (splitIndex > subrange->list_locations.back().index)
return 0;
// todo - determine exact cost of split subranges
cost += PPCRecRARange_getReadWriteCost(subrange->imlSegment) * 2; // currently we assume that the additional region will require a read and a store
//for (sint32 f = 0; f < subrange->list_locations.size(); f++)
//{
// raLivenessLocation_t* location = subrange->list_locations.data() + f;
// if (location->index >= splitIndex)
// {
// ...
// return cost;
// }
//}
return cost;
}


@ -1,27 +0,0 @@
#pragma once
raLivenessRange_t* PPCRecRA_createRangeBase(ppcImlGenContext_t* ppcImlGenContext, uint32 virtualRegister, uint32 name);
raLivenessSubrange_t* PPCRecRA_createSubrange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range, PPCRecImlSegment_t* imlSegment, sint32 startIndex, sint32 endIndex);
void PPCRecRA_deleteSubrange(ppcImlGenContext_t* ppcImlGenContext, raLivenessSubrange_t* subrange);
void PPCRecRA_deleteRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range);
void PPCRecRA_deleteAllRanges(ppcImlGenContext_t* ppcImlGenContext);
void PPCRecRA_mergeRanges(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range, raLivenessRange_t* absorbedRange);
void PPCRecRA_explodeRange(ppcImlGenContext_t* ppcImlGenContext, raLivenessRange_t* range);
void PPCRecRA_mergeSubranges(ppcImlGenContext_t* ppcImlGenContext, raLivenessSubrange_t* subrange, raLivenessSubrange_t* absorbedSubrange);
raLivenessSubrange_t* PPCRecRA_splitLocalSubrange(ppcImlGenContext_t* ppcImlGenContext, raLivenessSubrange_t* subrange, sint32 splitIndex, bool trimToHole = false);
void PPCRecRA_updateOrAddSubrangeLocation(raLivenessSubrange_t* subrange, sint32 index, bool isRead, bool isWrite);
void PPCRecRA_debugValidateSubrange(raLivenessSubrange_t* subrange);
// cost estimation
sint32 PPCRecRARange_getReadWriteCost(PPCRecImlSegment_t* imlSegment);
sint32 PPCRecRARange_estimateCost(raLivenessRange_t* range);
sint32 PPCRecRARange_estimateAdditionalCostAfterRangeExplode(raLivenessRange_t* range);
sint32 PPCRecRARange_estimateAdditionalCostAfterSplit(raLivenessSubrange_t* subrange, sint32 splitIndex);
// special values to mark the index of ranges that reach across the segment border
#define RA_INTER_RANGE_START (-1)
#define RA_INTER_RANGE_END (0x70000000)


@ -1,414 +0,0 @@
#include "PPCRecompiler.h"
#include "PPCRecompilerIml.h"
#include "PPCRecompilerX64.h"
#include "PPCRecompilerImlRanges.h"
#include <queue>
bool _isRangeDefined(PPCRecImlSegment_t* imlSegment, sint32 vGPR)
{
return (imlSegment->raDistances.reg[vGPR].usageStart != INT_MAX);
}
void PPCRecRA_calculateSegmentMinMaxRanges(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* imlSegment)
{
for (sint32 i = 0; i < PPC_REC_MAX_VIRTUAL_GPR; i++)
{
imlSegment->raDistances.reg[i].usageStart = INT_MAX;
imlSegment->raDistances.reg[i].usageEnd = INT_MIN;
}
// scan instructions for usage range
sint32 index = 0;
PPCImlOptimizerUsedRegisters_t gprTracking;
while (index < imlSegment->imlListCount)
{
// end loop at suffix instruction
if (PPCRecompiler_isSuffixInstruction(imlSegment->imlList + index))
break;
// get accessed GPRs
PPCRecompiler_checkRegisterUsage(NULL, imlSegment->imlList + index, &gprTracking);
for (sint32 t = 0; t < 4; t++)
{
sint32 virtualRegister = gprTracking.gpr[t];
if (virtualRegister < 0)
continue;
cemu_assert_debug(virtualRegister < PPC_REC_MAX_VIRTUAL_GPR);
imlSegment->raDistances.reg[virtualRegister].usageStart = std::min(imlSegment->raDistances.reg[virtualRegister].usageStart, index); // index before/at instruction
imlSegment->raDistances.reg[virtualRegister].usageEnd = std::max(imlSegment->raDistances.reg[virtualRegister].usageEnd, index+1); // index after instruction
}
// next instruction
index++;
}
}
void PPCRecRA_calculateLivenessRangesV2(ppcImlGenContext_t* ppcImlGenContext)
{
// for each register calculate min/max index of usage range within each segment
for (sint32 s = 0; s < ppcImlGenContext->segmentListCount; s++)
{
PPCRecRA_calculateSegmentMinMaxRanges(ppcImlGenContext, ppcImlGenContext->segmentList[s]);
}
}
raLivenessSubrange_t* PPCRecRA_convertToMappedRanges(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* imlSegment, sint32 vGPR, raLivenessRange_t* range)
{
if (imlSegment->raDistances.isProcessed[vGPR])
{
// return already existing segment
return imlSegment->raInfo.linkedList_perVirtualGPR[vGPR];
}
imlSegment->raDistances.isProcessed[vGPR] = true;
if (_isRangeDefined(imlSegment, vGPR) == false)
return nullptr;
// create subrange
cemu_assert_debug(imlSegment->raInfo.linkedList_perVirtualGPR[vGPR] == nullptr);
raLivenessSubrange_t* subrange = PPCRecRA_createSubrange(ppcImlGenContext, range, imlSegment, imlSegment->raDistances.reg[vGPR].usageStart, imlSegment->raDistances.reg[vGPR].usageEnd);
// traverse forward
if (imlSegment->raDistances.reg[vGPR].usageEnd == RA_INTER_RANGE_END)
{
if (imlSegment->nextSegmentBranchTaken && imlSegment->nextSegmentBranchTaken->raDistances.reg[vGPR].usageStart == RA_INTER_RANGE_START)
{
subrange->subrangeBranchTaken = PPCRecRA_convertToMappedRanges(ppcImlGenContext, imlSegment->nextSegmentBranchTaken, vGPR, range);
cemu_assert_debug(subrange->subrangeBranchTaken->start.index == RA_INTER_RANGE_START);
}
if (imlSegment->nextSegmentBranchNotTaken && imlSegment->nextSegmentBranchNotTaken->raDistances.reg[vGPR].usageStart == RA_INTER_RANGE_START)
{
subrange->subrangeBranchNotTaken = PPCRecRA_convertToMappedRanges(ppcImlGenContext, imlSegment->nextSegmentBranchNotTaken, vGPR, range);
cemu_assert_debug(subrange->subrangeBranchNotTaken->start.index == RA_INTER_RANGE_START);
}
}
// traverse backward
if (imlSegment->raDistances.reg[vGPR].usageStart == RA_INTER_RANGE_START)
{
for (auto& it : imlSegment->list_prevSegments)
{
if (it->raDistances.reg[vGPR].usageEnd == RA_INTER_RANGE_END)
PPCRecRA_convertToMappedRanges(ppcImlGenContext, it, vGPR, range);
}
}
// return subrange
return subrange;
}
void PPCRecRA_createSegmentLivenessRanges(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* imlSegment)
{
for (sint32 i = 0; i < PPC_REC_MAX_VIRTUAL_GPR; i++)
{
if( _isRangeDefined(imlSegment, i) == false )
continue;
if( imlSegment->raDistances.isProcessed[i])
continue;
raLivenessRange_t* range = PPCRecRA_createRangeBase(ppcImlGenContext, i, ppcImlGenContext->mappedRegister[i]);
PPCRecRA_convertToMappedRanges(ppcImlGenContext, imlSegment, i, range);
}
// create lookup table of ranges
raLivenessSubrange_t* vGPR2Subrange[PPC_REC_MAX_VIRTUAL_GPR];
for (sint32 i = 0; i < PPC_REC_MAX_VIRTUAL_GPR; i++)
{
vGPR2Subrange[i] = imlSegment->raInfo.linkedList_perVirtualGPR[i];
#ifdef CEMU_DEBUG_ASSERT
if (vGPR2Subrange[i] && vGPR2Subrange[i]->link_sameVirtualRegisterGPR.next != nullptr)
assert_dbg();
#endif
}
// parse instructions and convert to locations
sint32 index = 0;
PPCImlOptimizerUsedRegisters_t gprTracking;
while (index < imlSegment->imlListCount)
{
// end loop at suffix instruction
if (PPCRecompiler_isSuffixInstruction(imlSegment->imlList + index))
break;
// get accessed GPRs
PPCRecompiler_checkRegisterUsage(NULL, imlSegment->imlList + index, &gprTracking);
// handle accessed GPR
for (sint32 t = 0; t < 4; t++)
{
sint32 virtualRegister = gprTracking.gpr[t];
if (virtualRegister < 0)
continue;
bool isWrite = (t == 3);
// add location
PPCRecRA_updateOrAddSubrangeLocation(vGPR2Subrange[virtualRegister], index, isWrite == false, isWrite);
#ifdef CEMU_DEBUG_ASSERT
if (index < vGPR2Subrange[virtualRegister]->start.index)
assert_dbg();
if (index+1 > vGPR2Subrange[virtualRegister]->end.index)
assert_dbg();
#endif
}
// next instruction
index++;
}
}
void PPCRecRA_extendRangeToEndOfSegment(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* imlSegment, sint32 vGPR)
{
if (_isRangeDefined(imlSegment, vGPR) == false)
{
imlSegment->raDistances.reg[vGPR].usageStart = RA_INTER_RANGE_END;
imlSegment->raDistances.reg[vGPR].usageEnd = RA_INTER_RANGE_END;
return;
}
imlSegment->raDistances.reg[vGPR].usageEnd = RA_INTER_RANGE_END;
}
void PPCRecRA_extendRangeToBeginningOfSegment(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* imlSegment, sint32 vGPR)
{
if (_isRangeDefined(imlSegment, vGPR) == false)
{
imlSegment->raDistances.reg[vGPR].usageStart = RA_INTER_RANGE_START;
imlSegment->raDistances.reg[vGPR].usageEnd = RA_INTER_RANGE_START;
}
else
{
imlSegment->raDistances.reg[vGPR].usageStart = RA_INTER_RANGE_START;
}
// propagate backwards
for (auto& it : imlSegment->list_prevSegments)
{
PPCRecRA_extendRangeToEndOfSegment(ppcImlGenContext, it, vGPR);
}
}
void _PPCRecRA_connectRanges(ppcImlGenContext_t* ppcImlGenContext, sint32 vGPR, PPCRecImlSegment_t** route, sint32 routeDepth)
{
#ifdef CEMU_DEBUG_ASSERT
if (routeDepth < 2)
assert_dbg();
#endif
// extend starting range to end of segment
PPCRecRA_extendRangeToEndOfSegment(ppcImlGenContext, route[0], vGPR);
// extend all the connecting segments in both directions
for (sint32 i = 1; i < (routeDepth - 1); i++)
{
PPCRecRA_extendRangeToEndOfSegment(ppcImlGenContext, route[i], vGPR);
PPCRecRA_extendRangeToBeginningOfSegment(ppcImlGenContext, route[i], vGPR);
}
// extend the final segment towards the beginning
PPCRecRA_extendRangeToBeginningOfSegment(ppcImlGenContext, route[routeDepth-1], vGPR);
}
void _PPCRecRA_checkAndTryExtendRange(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* currentSegment, sint32 vGPR, sint32 distanceLeft, PPCRecImlSegment_t** route, sint32 routeDepth)
{
if (routeDepth >= 64)
{
cemuLog_logDebug(LogType::Force, "Recompiler RA route maximum depth exceeded for function 0x{:08x}", ppcImlGenContext->functionRef->ppcAddress);
return;
}
route[routeDepth] = currentSegment;
if (currentSegment->raDistances.reg[vGPR].usageStart == INT_MAX)
{
// measure distance to end of segment
distanceLeft -= currentSegment->imlListCount;
if (distanceLeft > 0)
{
if (currentSegment->nextSegmentBranchNotTaken)
_PPCRecRA_checkAndTryExtendRange(ppcImlGenContext, currentSegment->nextSegmentBranchNotTaken, vGPR, distanceLeft, route, routeDepth + 1);
if (currentSegment->nextSegmentBranchTaken)
_PPCRecRA_checkAndTryExtendRange(ppcImlGenContext, currentSegment->nextSegmentBranchTaken, vGPR, distanceLeft, route, routeDepth + 1);
}
return;
}
else
{
// measure distance to range
if (currentSegment->raDistances.reg[vGPR].usageStart == RA_INTER_RANGE_END)
{
if (distanceLeft < currentSegment->imlListCount)
return; // range too far away
}
else if (currentSegment->raDistances.reg[vGPR].usageStart != RA_INTER_RANGE_START && currentSegment->raDistances.reg[vGPR].usageStart > distanceLeft)
return; // out of range
// found close range -> connect ranges
_PPCRecRA_connectRanges(ppcImlGenContext, vGPR, route, routeDepth + 1);
}
}
void PPCRecRA_checkAndTryExtendRange(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* currentSegment, sint32 vGPR)
{
#ifdef CEMU_DEBUG_ASSERT
if (currentSegment->raDistances.reg[vGPR].usageEnd < 0)
assert_dbg();
#endif
// count instructions to end of initial segment
if (currentSegment->raDistances.reg[vGPR].usageEnd == RA_INTER_RANGE_START)
assert_dbg();
sint32 instructionsUntilEndOfSeg;
if (currentSegment->raDistances.reg[vGPR].usageEnd == RA_INTER_RANGE_END)
instructionsUntilEndOfSeg = 0;
else
instructionsUntilEndOfSeg = currentSegment->imlListCount - currentSegment->raDistances.reg[vGPR].usageEnd;
#ifdef CEMU_DEBUG_ASSERT
if (instructionsUntilEndOfSeg < 0)
assert_dbg();
#endif
sint32 remainingScanDist = 45 - instructionsUntilEndOfSeg;
if (remainingScanDist <= 0)
return; // can't reach end
// also don't forget: extending is easier if we allow 'non-symmetric' branches, e.g. a register range that only enters one branch
PPCRecImlSegment_t* route[64];
route[0] = currentSegment;
if (currentSegment->nextSegmentBranchNotTaken)
{
_PPCRecRA_checkAndTryExtendRange(ppcImlGenContext, currentSegment->nextSegmentBranchNotTaken, vGPR, remainingScanDist, route, 1);
}
if (currentSegment->nextSegmentBranchTaken)
{
_PPCRecRA_checkAndTryExtendRange(ppcImlGenContext, currentSegment->nextSegmentBranchTaken, vGPR, remainingScanDist, route, 1);
}
}
void PPCRecRA_mergeCloseRangesForSegmentV2(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* imlSegment)
{
for (sint32 i = 0; i < PPC_REC_MAX_VIRTUAL_GPR; i++) // todo: Use dynamic maximum or list of used vGPRs so we can avoid parsing empty entries
{
if(imlSegment->raDistances.reg[i].usageStart == INT_MAX)
continue; // not used
// check and extend if possible
PPCRecRA_checkAndTryExtendRange(ppcImlGenContext, imlSegment, i);
}
#ifdef CEMU_DEBUG_ASSERT
if (imlSegment->list_prevSegments.empty() == false && imlSegment->isEnterable)
assert_dbg();
if ((imlSegment->nextSegmentBranchNotTaken != nullptr || imlSegment->nextSegmentBranchTaken != nullptr) && imlSegment->nextSegmentIsUncertain)
assert_dbg();
#endif
}
void PPCRecRA_followFlowAndExtendRanges(ppcImlGenContext_t* ppcImlGenContext, PPCRecImlSegment_t* imlSegment)
{
std::vector<PPCRecImlSegment_t*> list_segments;
list_segments.reserve(1000);
sint32 index = 0;
imlSegment->raRangeExtendProcessed = true;
list_segments.push_back(imlSegment);
while (index < list_segments.size())
{
PPCRecImlSegment_t* currentSegment = list_segments[index];
PPCRecRA_mergeCloseRangesForSegmentV2(ppcImlGenContext, currentSegment);
// follow flow
if (currentSegment->nextSegmentBranchNotTaken && currentSegment->nextSegmentBranchNotTaken->raRangeExtendProcessed == false)
{
currentSegment->nextSegmentBranchNotTaken->raRangeExtendProcessed = true;
list_segments.push_back(currentSegment->nextSegmentBranchNotTaken);
}
if (currentSegment->nextSegmentBranchTaken && currentSegment->nextSegmentBranchTaken->raRangeExtendProcessed == false)
{
currentSegment->nextSegmentBranchTaken->raRangeExtendProcessed = true;
list_segments.push_back(currentSegment->nextSegmentBranchTaken);
}
index++;
}
}
void PPCRecRA_mergeCloseRangesV2(ppcImlGenContext_t* ppcImlGenContext)
{
for (sint32 s = 0; s < ppcImlGenContext->segmentListCount; s++)
{
PPCRecImlSegment_t* imlSegment = ppcImlGenContext->segmentList[s];
if (imlSegment->list_prevSegments.empty())
{
if (imlSegment->raRangeExtendProcessed)
assert_dbg(); // should not happen
PPCRecRA_followFlowAndExtendRanges(ppcImlGenContext, imlSegment);
}
}
}
void PPCRecRA_extendRangesOutOfLoopsV2(ppcImlGenContext_t* ppcImlGenContext)
{
for (sint32 s = 0; s < ppcImlGenContext->segmentListCount; s++)
{
PPCRecImlSegment_t* imlSegment = ppcImlGenContext->segmentList[s];
auto localLoopDepth = imlSegment->loopDepth;
if( localLoopDepth <= 0 )
continue; // not inside a loop
// look for loop exit
bool hasLoopExit = false;
if (imlSegment->nextSegmentBranchTaken && imlSegment->nextSegmentBranchTaken->loopDepth < localLoopDepth)
{
hasLoopExit = true;
}
if (imlSegment->nextSegmentBranchNotTaken && imlSegment->nextSegmentBranchNotTaken->loopDepth < localLoopDepth)
{
hasLoopExit = true;
}
if(hasLoopExit == false)
continue;
// extend looping ranges into all exits (this allows the data flow analyzer to move stores out of the loop)
for (sint32 i = 0; i < PPC_REC_MAX_VIRTUAL_GPR; i++) // todo: Use dynamic maximum or list of used vGPRs so we can avoid parsing empty entries
{
if (imlSegment->raDistances.reg[i].usageEnd != RA_INTER_RANGE_END)
continue; // range not set or does not reach end of segment
if(imlSegment->nextSegmentBranchTaken)
PPCRecRA_extendRangeToBeginningOfSegment(ppcImlGenContext, imlSegment->nextSegmentBranchTaken, i);
if(imlSegment->nextSegmentBranchNotTaken)
PPCRecRA_extendRangeToBeginningOfSegment(ppcImlGenContext, imlSegment->nextSegmentBranchNotTaken, i);
}
}
}
void PPCRecRA_processFlowAndCalculateLivenessRangesV2(ppcImlGenContext_t* ppcImlGenContext)
{
// merge close ranges
PPCRecRA_mergeCloseRangesV2(ppcImlGenContext);
// extra pass to move register stores out of loops
PPCRecRA_extendRangesOutOfLoopsV2(ppcImlGenContext);
// calculate liveness ranges
for (sint32 s = 0; s < ppcImlGenContext->segmentListCount; s++)
{
PPCRecImlSegment_t* imlSegment = ppcImlGenContext->segmentList[s];
PPCRecRA_createSegmentLivenessRanges(ppcImlGenContext, imlSegment);
}
}
void PPCRecRA_analyzeSubrangeDataDependencyV2(raLivenessSubrange_t* subrange)
{
bool isRead = false;
bool isWritten = false;
bool isOverwritten = false;
for (auto& location : subrange->list_locations)
{
if (location.isRead)
{
isRead = true;
}
if (location.isWrite)
{
if (isRead == false)
isOverwritten = true;
isWritten = true;
}
}
subrange->_noLoad = isOverwritten;
subrange->hasStore = isWritten;
if (subrange->start.index == RA_INTER_RANGE_START)
subrange->_noLoad = true;
}
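// Illustrative (not from the Cemu source): a subrange whose first access is a write (e.g. locations
// {index 3: write, index 7: read}) is flagged isOverwritten, so _noLoad becomes true and the register
// does not need to be loaded from its named home before the subrange starts; hasStore is set because
// the value is written at some point. Subranges that begin at RA_INTER_RANGE_START receive their value
// from a predecessor segment and therefore also skip the load.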
void _analyzeRangeDataFlow(raLivenessSubrange_t* subrange);
void PPCRecRA_analyzeRangeDataFlowV2(ppcImlGenContext_t* ppcImlGenContext)
{
// this function is called after _assignRegisters(), which means that all ranges are already final and won't change anymore
// first do a per-subrange pass
for (auto& range : ppcImlGenContext->raInfo.list_ranges)
{
for (auto& subrange : range->list_subranges)
{
PPCRecRA_analyzeSubrangeDataDependencyV2(subrange);
}
}
// then do a second pass where we scan along subrange flow
for (auto& range : ppcImlGenContext->raInfo.list_ranges)
{
for (auto& subrange : range->list_subranges) // todo - traversing this backwards should be faster and yield better results due to the nature of the algorithm
{
_analyzeRangeDataFlow(subrange);
}
}
}


@ -1,173 +1,26 @@
#include "PPCRecompiler.h"
#include "PPCRecompilerIml.h"
PPCRecImlSegment_t* PPCRecompiler_getSegmentByPPCJumpAddress(ppcImlGenContext_t* ppcImlGenContext, uint32 ppcOffset)
{
for(sint32 s=0; s<ppcImlGenContext->segmentListCount; s++)
{
if( ppcImlGenContext->segmentList[s]->isJumpDestination && ppcImlGenContext->segmentList[s]->jumpDestinationPPCAddress == ppcOffset )
{
return ppcImlGenContext->segmentList[s];
}
}
debug_printf("PPCRecompiler_getSegmentByPPCJumpAddress(): Unable to find segment (ppcOffset 0x%08x)\n", ppcOffset);
return NULL;
}
void PPCRecompilerIml_setLinkBranchNotTaken(PPCRecImlSegment_t* imlSegmentSrc, PPCRecImlSegment_t* imlSegmentDst)
{
// make sure segments aren't already linked
if (imlSegmentSrc->nextSegmentBranchNotTaken == imlSegmentDst)
return;
// add as next segment for source
if (imlSegmentSrc->nextSegmentBranchNotTaken != NULL)
assert_dbg();
imlSegmentSrc->nextSegmentBranchNotTaken = imlSegmentDst;
// add as previous segment for destination
imlSegmentDst->list_prevSegments.push_back(imlSegmentSrc);
}
void PPCRecompilerIml_setLinkBranchTaken(PPCRecImlSegment_t* imlSegmentSrc, PPCRecImlSegment_t* imlSegmentDst)
{
// make sure segments aren't already linked
if (imlSegmentSrc->nextSegmentBranchTaken == imlSegmentDst)
return;
// add as next segment for source
if (imlSegmentSrc->nextSegmentBranchTaken != NULL)
assert_dbg();
imlSegmentSrc->nextSegmentBranchTaken = imlSegmentDst;
// add as previous segment for destination
imlSegmentDst->list_prevSegments.push_back(imlSegmentSrc);
}
void PPCRecompilerIML_removeLink(PPCRecImlSegment_t* imlSegmentSrc, PPCRecImlSegment_t* imlSegmentDst)
{
if (imlSegmentSrc->nextSegmentBranchNotTaken == imlSegmentDst)
{
imlSegmentSrc->nextSegmentBranchNotTaken = NULL;
}
else if (imlSegmentSrc->nextSegmentBranchTaken == imlSegmentDst)
{
imlSegmentSrc->nextSegmentBranchTaken = NULL;
}
else
assert_dbg();
bool matchFound = false;
for (sint32 i = 0; i < imlSegmentDst->list_prevSegments.size(); i++)
{
if (imlSegmentDst->list_prevSegments[i] == imlSegmentSrc)
{
imlSegmentDst->list_prevSegments.erase(imlSegmentDst->list_prevSegments.begin()+i);
matchFound = true;
break;
}
}
if (matchFound == false)
assert_dbg();
}
/*
* Replaces all links to segment orig with links to segment new
*/
void PPCRecompilerIML_relinkInputSegment(PPCRecImlSegment_t* imlSegmentOrig, PPCRecImlSegment_t* imlSegmentNew)
{
while (imlSegmentOrig->list_prevSegments.size() != 0)
{
PPCRecImlSegment_t* prevSegment = imlSegmentOrig->list_prevSegments[0];
if (prevSegment->nextSegmentBranchNotTaken == imlSegmentOrig)
{
PPCRecompilerIML_removeLink(prevSegment, imlSegmentOrig);
PPCRecompilerIml_setLinkBranchNotTaken(prevSegment, imlSegmentNew);
}
else if (prevSegment->nextSegmentBranchTaken == imlSegmentOrig)
{
PPCRecompilerIML_removeLink(prevSegment, imlSegmentOrig);
PPCRecompilerIml_setLinkBranchTaken(prevSegment, imlSegmentNew);
}
else
{
assert_dbg();
}
}
}
void PPCRecompilerIML_linkSegments(ppcImlGenContext_t* ppcImlGenContext)
{
for(sint32 s=0; s<ppcImlGenContext->segmentListCount; s++)
{
PPCRecImlSegment_t* imlSegment = ppcImlGenContext->segmentList[s];
bool isLastSegment = (s+1)>=ppcImlGenContext->segmentListCount;
PPCRecImlSegment_t* nextSegment = isLastSegment?NULL:ppcImlGenContext->segmentList[s+1];
// handle empty segment
if( imlSegment->imlListCount == 0 )
{
if (isLastSegment == false)
PPCRecompilerIml_setLinkBranchNotTaken(imlSegment, ppcImlGenContext->segmentList[s+1]); // continue execution to next segment
else
imlSegment->nextSegmentIsUncertain = true;
continue;
}
// check last instruction of segment
PPCRecImlInstruction_t* imlInstruction = imlSegment->imlList+(imlSegment->imlListCount-1);
if( imlInstruction->type == PPCREC_IML_TYPE_CJUMP || imlInstruction->type == PPCREC_IML_TYPE_CJUMP_CYCLE_CHECK )
{
// find destination segment by ppc jump address
PPCRecImlSegment_t* jumpDestSegment = PPCRecompiler_getSegmentByPPCJumpAddress(ppcImlGenContext, imlInstruction->op_conditionalJump.jumpmarkAddress);
if( jumpDestSegment )
{
if (imlInstruction->op_conditionalJump.condition != PPCREC_JUMP_CONDITION_NONE)
PPCRecompilerIml_setLinkBranchNotTaken(imlSegment, nextSegment);
PPCRecompilerIml_setLinkBranchTaken(imlSegment, jumpDestSegment);
}
else
{
imlSegment->nextSegmentIsUncertain = true;
}
}
else if( imlInstruction->type == PPCREC_IML_TYPE_MACRO )
{
// currently we assume that the next segment is unknown for all macros
imlSegment->nextSegmentIsUncertain = true;
}
else
{
// all other instruction types do not branch
//imlSegment->nextSegment[0] = nextSegment;
PPCRecompilerIml_setLinkBranchNotTaken(imlSegment, nextSegment);
//imlSegment->nextSegmentIsUncertain = true;
}
}
}
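For orientation, the linking helpers above maintain a small control-flow graph per translated function: each segment carries at most one branch-taken successor, one fall-through successor, and a list of predecessors that setLink/removeLink keep in sync. A hedged sketch of the fields involved (simplified, not the actual struct definition):

#include <vector>

struct Segment
{
	Segment* nextSegmentBranchTaken = nullptr;      // target when the terminating branch is taken
	Segment* nextSegmentBranchNotTaken = nullptr;   // fall-through successor
	std::vector<Segment*> list_prevSegments;        // back edges, updated by the setLink/removeLink helpers
	bool nextSegmentIsUncertain = false;            // set for macros and unresolved jump targets
	bool isEnterable = false;                       // entry points reachable from outside the function
};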
void PPCRecompilerIML_isolateEnterableSegments(ppcImlGenContext_t* ppcImlGenContext)
{
sint32 initialSegmentCount = ppcImlGenContext->segmentListCount;
for (sint32 i = 0; i < ppcImlGenContext->segmentListCount; i++)
size_t initialSegmentCount = ppcImlGenContext->segmentList2.size();
for (size_t i = 0; i < initialSegmentCount; i++)
{
PPCRecImlSegment_t* imlSegment = ppcImlGenContext->segmentList[i];
IMLSegment* imlSegment = ppcImlGenContext->segmentList2[i];
if (imlSegment->list_prevSegments.empty() == false && imlSegment->isEnterable)
{
// spawn new segment at end
PPCRecompilerIml_insertSegments(ppcImlGenContext, ppcImlGenContext->segmentListCount, 1);
PPCRecImlSegment_t* entrySegment = ppcImlGenContext->segmentList[ppcImlGenContext->segmentListCount-1];
PPCRecompilerIml_insertSegments(ppcImlGenContext, ppcImlGenContext->segmentList2.size(), 1);
IMLSegment* entrySegment = ppcImlGenContext->segmentList2[ppcImlGenContext->segmentList2.size()-1];
entrySegment->isEnterable = true;
entrySegment->enterPPCAddress = imlSegment->enterPPCAddress;
// create jump instruction
PPCRecompiler_pushBackIMLInstructions(entrySegment, 0, 1);
PPCRecompilerImlGen_generateNewInstruction_jumpSegment(ppcImlGenContext, entrySegment->imlList + 0);
PPCRecompilerIml_setLinkBranchTaken(entrySegment, imlSegment);
entrySegment->imlList.data()[0].make_jump();
IMLSegment_SetLinkBranchTaken(entrySegment, imlSegment);
// remove enterable flag from original segment
imlSegment->isEnterable = false;
imlSegment->enterPPCAddress = 0;
}
}
}
PPCRecImlInstruction_t* PPCRecompilerIML_getLastInstruction(PPCRecImlSegment_t* imlSegment)
{
if (imlSegment->imlListCount == 0)
return nullptr;
return imlSegment->imlList + (imlSegment->imlListCount - 1);
}
}

File diff suppressed because it is too large

View file

@ -52,7 +52,7 @@ struct LatteGPUState_t
uint32 gx2InitCalled; // incremented every time GX2Init() is called
// OpenGL control
uint32 glVendor; // GLVENDOR_*
bool alwaysDisplayDRC = false;
bool isDRCPrimary = false;
// temporary (replace with proper solution later)
bool tvBufferUsesSRGB;
bool drcBufferUsesSRGB;

View file

@ -141,6 +141,14 @@ private:
void LatteCP_processCommandBuffer(DrawPassContext& drawPassCtx);
// called whenever the GPU runs out of commands or hits a wait condition (semaphores, HLE waits)
void LatteCP_signalEnterWait()
{
// based on the assumption that games won't do a rugpull and swap out buffer data in the middle of an uninterrupted sequence of drawcalls,
// we only flush caches when the GPU goes idle or has to wait for any operation
LatteIndices_invalidateAll();
}
/*
* Read a U32 from the command buffer
* If no data is available then wait in a busy loop
@ -466,6 +474,8 @@ LatteCMDPtr LatteCP_itWaitRegMem(LatteCMDPtr cmd, uint32 nWords)
const uint32 GPU7_WAIT_MEM_OP_GREATER = 6;
const uint32 GPU7_WAIT_MEM_OP_NEVER = 7;
LatteCP_signalEnterWait();
bool stalls = false;
if ((word0 & 0x10) != 0)
{
@ -594,6 +604,7 @@ LatteCMDPtr LatteCP_itMemSemaphore(LatteCMDPtr cmd, uint32 nWords)
else if(SEM_SIGNAL == 7)
{
// wait
LatteCP_signalEnterWait();
size_t loopCount = 0;
while (true)
{
@ -788,7 +799,7 @@ LatteCMDPtr LatteCP_itHLESampleTimer(LatteCMDPtr cmd, uint32 nWords)
{
cemu_assert_debug(nWords == 1);
MPTR timerMPTR = (MPTR)LatteReadCMD();
memory_writeU64(timerMPTR, coreinit::coreinit_getTimerTick());
memory_writeU64(timerMPTR, coreinit::OSGetSystemTime());
return cmd;
}
@ -1305,11 +1316,13 @@ void LatteCP_processCommandBuffer(DrawPassContext& drawPassCtx)
}
case IT_HLE_TRIGGER_SCANBUFFER_SWAP:
{
LatteCP_signalEnterWait();
LatteCP_itHLESwapScanBuffer(cmdData, nWords);
break;
}
case IT_HLE_WAIT_FOR_FLIP:
{
LatteCP_signalEnterWait();
LatteCP_itHLEWaitForFlip(cmdData, nWords);
break;
}
@ -1594,12 +1607,14 @@ void LatteCP_ProcessRingbuffer()
}
case IT_HLE_TRIGGER_SCANBUFFER_SWAP:
{
LatteCP_signalEnterWait();
LatteCP_itHLESwapScanBuffer(cmd, nWords);
timerRecheck += CP_TIMER_RECHECK / 64;
break;
}
case IT_HLE_WAIT_FOR_FLIP:
{
LatteCP_signalEnterWait();
LatteCP_itHLEWaitForFlip(cmd, nWords);
timerRecheck += CP_TIMER_RECHECK / 1;
break;

View file

@ -1,6 +1,7 @@
#include "Cafe/HW/Latte/Core/LatteConst.h"
#include "Cafe/HW/Latte/Renderer/Renderer.h"
#include "Cafe/HW/Latte/ISA/RegDefines.h"
#include "Cafe/HW/Latte/Core/LattePerformanceMonitor.h"
#include "Common/cpu_features.h"
#if defined(ARCH_X86_64) && defined(__GNUC__)
@ -9,32 +10,53 @@
struct
{
const void* lastPtr;
uint32 lastCount;
LattePrimitiveMode lastPrimitiveMode;
LatteIndexType lastIndexType;
// output
uint32 indexMin;
uint32 indexMax;
Renderer::INDEX_TYPE renderIndexType;
uint32 outputCount;
uint32 indexBufferOffset;
uint32 indexBufferIndex;
struct CacheEntry
{
// input data
const void* lastPtr;
uint32 lastCount;
LattePrimitiveMode lastPrimitiveMode;
LatteIndexType lastIndexType;
uint64 lastUsed;
// output
uint32 indexMin;
uint32 indexMax;
Renderer::INDEX_TYPE renderIndexType;
uint32 outputCount;
Renderer::IndexAllocation indexAllocation;
};
std::array<CacheEntry, 8> entry;
uint64 currentUsageCounter{0};
}LatteIndexCache{};
void LatteIndices_invalidate(const void* memPtr, uint32 size)
{
if (LatteIndexCache.lastPtr >= memPtr && (LatteIndexCache.lastPtr < ((uint8*)memPtr + size)) )
for(auto& entry : LatteIndexCache.entry)
{
LatteIndexCache.lastPtr = nullptr;
LatteIndexCache.lastCount = 0;
if (entry.lastPtr >= memPtr && (entry.lastPtr < ((uint8*)memPtr + size)) )
{
if(entry.lastPtr != nullptr)
g_renderer->indexData_releaseIndexMemory(entry.indexAllocation);
entry.lastPtr = nullptr;
entry.lastCount = 0;
}
}
}
void LatteIndices_invalidateAll()
{
LatteIndexCache.lastPtr = nullptr;
LatteIndexCache.lastCount = 0;
for(auto& entry : LatteIndexCache.entry)
{
if (entry.lastPtr != nullptr)
g_renderer->indexData_releaseIndexMemory(entry.indexAllocation);
entry.lastPtr = nullptr;
entry.lastCount = 0;
}
}
uint64 LatteIndices_GetNextUsageIndex()
{
return LatteIndexCache.currentUsageCounter++;
}
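The cache above replaces the old single-entry scheme with eight entries and a monotonically increasing usage counter that doubles as an LRU timestamp. A minimal sketch of that pattern with simplified names (not the actual Cemu types):

#include <array>
#include <algorithm>
#include <cstdint>

struct Entry { const void* ptr = nullptr; uint64_t lastUsed = 0; };
std::array<Entry, 8> entries;
uint64_t usageCounter = 0;

void Touch(Entry& e)
{
	// every cache hit or insert refreshes the stamp
	e.lastUsed = usageCounter++;
}

Entry& PickVictim()
{
	// eviction picks the entry with the oldest stamp
	return *std::min_element(entries.begin(), entries.end(),
		[](const Entry& a, const Entry& b) { return a.lastUsed < b.lastUsed; });
}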
uint32 LatteIndices_calculateIndexOutputSize(LattePrimitiveMode primitiveMode, LatteIndexType indexType, uint32 count)
@ -532,7 +554,7 @@ void LatteIndices_alternativeCalculateIndexMinMax(const void* indexData, LatteIn
}
}
void LatteIndices_decode(const void* indexData, LatteIndexType indexType, uint32 count, LattePrimitiveMode primitiveMode, uint32& indexMin, uint32& indexMax, Renderer::INDEX_TYPE& renderIndexType, uint32& outputCount, uint32& indexBufferOffset, uint32& indexBufferIndex)
void LatteIndices_decode(const void* indexData, LatteIndexType indexType, uint32 count, LattePrimitiveMode primitiveMode, uint32& indexMin, uint32& indexMax, Renderer::INDEX_TYPE& renderIndexType, uint32& outputCount, Renderer::IndexAllocation& indexAllocation)
{
// what this should do:
// [x] use fast SIMD-based index decoding
@ -542,17 +564,18 @@ void LatteIndices_decode(const void* indexData, LatteIndexType indexType, uint32
// [ ] better cache implementation, allow to cache across frames
// reuse from cache if data didn't change
if (LatteIndexCache.lastPtr == indexData &&
LatteIndexCache.lastCount == count &&
LatteIndexCache.lastPrimitiveMode == primitiveMode &&
LatteIndexCache.lastIndexType == indexType)
auto cacheEntry = std::find_if(LatteIndexCache.entry.begin(), LatteIndexCache.entry.end(), [indexData, count, primitiveMode, indexType](const auto& entry)
{
indexMin = LatteIndexCache.indexMin;
indexMax = LatteIndexCache.indexMax;
renderIndexType = LatteIndexCache.renderIndexType;
outputCount = LatteIndexCache.outputCount;
indexBufferOffset = LatteIndexCache.indexBufferOffset;
indexBufferIndex = LatteIndexCache.indexBufferIndex;
return entry.lastPtr == indexData && entry.lastCount == count && entry.lastPrimitiveMode == primitiveMode && entry.lastIndexType == indexType;
});
if (cacheEntry != LatteIndexCache.entry.end())
{
indexMin = cacheEntry->indexMin;
indexMax = cacheEntry->indexMax;
renderIndexType = cacheEntry->renderIndexType;
outputCount = cacheEntry->outputCount;
indexAllocation = cacheEntry->indexAllocation;
cacheEntry->lastUsed = LatteIndices_GetNextUsageIndex();
return;
}
@ -576,10 +599,12 @@ void LatteIndices_decode(const void* indexData, LatteIndexType indexType, uint32
indexMin = 0;
indexMax = std::max(count, 1u)-1;
renderIndexType = Renderer::INDEX_TYPE::NONE;
indexAllocation = {};
return; // no indices
}
// query index buffer from renderer
void* indexOutputPtr = g_renderer->indexData_reserveIndexMemory(indexOutputSize, indexBufferOffset, indexBufferIndex);
indexAllocation = g_renderer->indexData_reserveIndexMemory(indexOutputSize);
void* indexOutputPtr = indexAllocation.mem;
// decode indices
indexMin = std::numeric_limits<uint32>::max();
@ -704,16 +729,25 @@ void LatteIndices_decode(const void* indexData, LatteIndexType indexType, uint32
// recalculate index range but filter out primitive restart index
LatteIndices_alternativeCalculateIndexMinMax(indexData, indexType, count, indexMin, indexMax);
}
g_renderer->indexData_uploadIndexMemory(indexBufferOffset, indexOutputSize);
g_renderer->indexData_uploadIndexMemory(indexAllocation);
performanceMonitor.cycle[performanceMonitor.cycleIndex].indexDataUploaded += indexOutputSize;
// get least recently used cache entry
auto lruEntry = std::min_element(LatteIndexCache.entry.begin(), LatteIndexCache.entry.end(), [](const auto& a, const auto& b)
{
return a.lastUsed < b.lastUsed;
});
// invalidate previous allocation
if(lruEntry->lastPtr != nullptr)
g_renderer->indexData_releaseIndexMemory(lruEntry->indexAllocation);
// update cache
LatteIndexCache.lastPtr = indexData;
LatteIndexCache.lastCount = count;
LatteIndexCache.lastPrimitiveMode = primitiveMode;
LatteIndexCache.lastIndexType = indexType;
LatteIndexCache.indexMin = indexMin;
LatteIndexCache.indexMax = indexMax;
LatteIndexCache.renderIndexType = renderIndexType;
LatteIndexCache.outputCount = outputCount;
LatteIndexCache.indexBufferOffset = indexBufferOffset;
LatteIndexCache.indexBufferIndex = indexBufferIndex;
lruEntry->lastPtr = indexData;
lruEntry->lastCount = count;
lruEntry->lastPrimitiveMode = primitiveMode;
lruEntry->lastIndexType = indexType;
lruEntry->indexMin = indexMin;
lruEntry->indexMax = indexMax;
lruEntry->renderIndexType = renderIndexType;
lruEntry->outputCount = outputCount;
lruEntry->indexAllocation = indexAllocation;
lruEntry->lastUsed = LatteIndices_GetNextUsageIndex();
}
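With this change the renderer hands out an opaque IndexAllocation instead of an (offset, bufferIndex) pair. A hedged sketch of the caller-side contract as it is used in the decode path above (decodedIndices is a hypothetical placeholder for the written data; not a complete excerpt):

Renderer::IndexAllocation alloc = g_renderer->indexData_reserveIndexMemory(indexOutputSize);
memcpy(alloc.mem, decodedIndices, indexOutputSize);   // fill the reserved memory with decoded indices
g_renderer->indexData_uploadIndexMemory(alloc);       // make the range visible to the GPU
// later, when the owning cache entry is evicted or invalidated:
g_renderer->indexData_releaseIndexMemory(alloc);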

View file

@ -4,4 +4,4 @@
void LatteIndices_invalidate(const void* memPtr, uint32 size);
void LatteIndices_invalidateAll();
void LatteIndices_decode(const void* indexData, LatteIndexType indexType, uint32 count, LattePrimitiveMode primitiveMode, uint32& indexMin, uint32& indexMax, Renderer::INDEX_TYPE& renderIndexType, uint32& outputCount, uint32& indexBufferOffset, uint32& indexBufferIndex);
void LatteIndices_decode(const void* indexData, LatteIndexType indexType, uint32 count, LattePrimitiveMode primitiveMode, uint32& indexMin, uint32& indexMax, Renderer::INDEX_TYPE& renderIndexType, uint32& outputCount, Renderer::IndexAllocation& indexAllocation);

View file

@ -107,7 +107,13 @@ void LatteOverlay_renderOverlay(ImVec2& position, ImVec2& pivot, sint32 directio
ImGui::Text("VRAM: %dMB / %dMB", g_state.vramUsage, g_state.vramTotal);
if (config.overlay.debug)
{
// general debug info
ImGui::Text("--- Debug info ---");
ImGui::Text("IndexUploadPerFrame: %dKB", (performanceMonitor.stats.indexDataUploadPerFrame+1023)/1024);
// backend specific info
g_renderer->AppendOverlayDebugInfo();
}
position.y += (ImGui::GetWindowSize().y + 10.0f) * direction;
}

View file

@ -74,7 +74,6 @@ void LattePerformanceMonitor_frameEnd()
uniformBankDataUploadedPerFrame /= 1024ULL;
uint32 uniformBankCountUploadedPerFrame = (uint32)(uniformBankUploadedCount / (uint64)elapsedFrames);
uint64 indexDataUploadPerFrame = (indexDataUploaded / (uint64)elapsedFrames);
indexDataUploadPerFrame /= 1024ULL;
double fps = (double)elapsedFrames2S * 1000.0 / (double)totalElapsedTimeFPS;
uint32 shaderBindsPerFrame = shaderBindCounter / elapsedFrames;
@ -82,7 +81,7 @@ void LattePerformanceMonitor_frameEnd()
uint32 rlps = (uint32)((uint64)recompilerLeaveCount * 1000ULL / (uint64)totalElapsedTime);
uint32 tlps = (uint32)((uint64)threadLeaveCount * 1000ULL / (uint64)totalElapsedTime);
// set stats
performanceMonitor.stats.indexDataUploadPerFrame = indexDataUploadPerFrame;
// next counter cycle
sint32 nextCycleIndex = (performanceMonitor.cycleIndex + 1) % PERFORMANCE_MONITOR_TRACK_CYCLES;
performanceMonitor.cycle[nextCycleIndex].drawCallCounter = 0;

View file

@ -124,6 +124,7 @@ typedef struct
LattePerfStatCounter numGraphicPipelines;
LattePerfStatCounter numImages;
LattePerfStatCounter numImageViews;
LattePerfStatCounter numSamplers;
LattePerfStatCounter numRenderPass;
LattePerfStatCounter numFramebuffer;
@ -131,6 +132,12 @@ typedef struct
LattePerfStatCounter numDrawBarriersPerFrame;
LattePerfStatCounter numBeginRenderpassPerFrame;
}vk;
// calculated stats (per frame)
struct
{
uint32 indexDataUploadPerFrame;
}stats;
}performanceMonitor_t;
extern performanceMonitor_t performanceMonitor;

View file

@ -11,7 +11,6 @@
#include "Cafe/HW/Latte/Core/LattePerformanceMonitor.h"
#include "Cafe/GraphicPack/GraphicPack2.h"
#include "config/ActiveSettings.h"
#include "Cafe/HW/Latte/Renderer/Vulkan/VulkanRenderer.h"
#include "gui/guiWrapper.h"
#include "Cafe/OS/libs/erreula/erreula.h"
#include "input/InputManager.h"
@ -933,13 +932,6 @@ void LatteRenderTarget_copyToBackbuffer(LatteTextureView* textureView, bool isPa
if (shader == nullptr)
{
sint32 scaling_filter = downscaling ? GetConfig().downscale_filter : GetConfig().upscale_filter;
if (g_renderer->GetType() == RendererAPI::Vulkan)
{
// force linear or nearest neighbor filter
if(scaling_filter != kLinearFilter && scaling_filter != kNearestNeighborFilter)
scaling_filter = kLinearFilter;
}
if (scaling_filter == kLinearFilter)
{
@ -957,7 +949,7 @@ void LatteRenderTarget_copyToBackbuffer(LatteTextureView* textureView, bool isPa
else
shader = RendererOutputShader::s_bicubic_shader;
filter = LatteTextureView::MagFilter::kNearestNeighbor;
filter = LatteTextureView::MagFilter::kLinear;
}
else if (scaling_filter == kBicubicHermiteFilter)
{
@ -989,8 +981,6 @@ void LatteRenderTarget_copyToBackbuffer(LatteTextureView* textureView, bool isPa
g_renderer->ImguiEnd();
}
bool ctrlTabHotkeyPressed = false;
void LatteRenderTarget_itHLECopyColorBufferToScanBuffer(MPTR colorBufferPtr, uint32 colorBufferWidth, uint32 colorBufferHeight, uint32 colorBufferSliceIndex, uint32 colorBufferFormat, uint32 colorBufferPitch, Latte::E_HWTILEMODE colorBufferTilemode, uint32 colorBufferSwizzle, uint32 renderTarget)
{
cemu_assert_debug(colorBufferSliceIndex == 0); // todo - support for non-zero slice
@ -1000,38 +990,31 @@ void LatteRenderTarget_itHLECopyColorBufferToScanBuffer(MPTR colorBufferPtr, uin
return;
}
auto getVPADScreenActive = [](size_t n) -> std::pair<bool, bool> {
auto controller = InputManager::instance().get_vpad_controller(n);
if (!controller)
return {false,false};
auto pressed = controller->is_screen_active();
auto toggle = controller->is_screen_active_toggle();
return {pressed && !toggle, pressed && toggle};
};
const bool tabPressed = gui_isKeyDown(PlatformKeyCodes::TAB);
const bool ctrlPressed = gui_isKeyDown(PlatformKeyCodes::LCONTROL);
const auto [vpad0Active, vpad0Toggle] = getVPADScreenActive(0);
const auto [vpad1Active, vpad1Toggle] = getVPADScreenActive(1);
bool showDRC = swkbd_hasKeyboardInputHook() == false && tabPressed;
bool& alwaysDisplayDRC = LatteGPUState.alwaysDisplayDRC;
const bool altScreenRequested = (!ctrlPressed && tabPressed) || vpad0Active || vpad1Active;
const bool togglePressed = (ctrlPressed && tabPressed) || vpad0Toggle || vpad1Toggle;
static bool togglePressedLast = false;
if (ctrlPressed && tabPressed)
{
if (ctrlTabHotkeyPressed == false)
{
alwaysDisplayDRC = !alwaysDisplayDRC;
ctrlTabHotkeyPressed = true;
}
}
else
ctrlTabHotkeyPressed = false;
bool& isDRCPrimary = LatteGPUState.isDRCPrimary;
if (alwaysDisplayDRC)
showDRC = !tabPressed;
if(togglePressed && !togglePressedLast)
isDRCPrimary = !isDRCPrimary;
togglePressedLast = togglePressed;
if (!showDRC)
{
auto controller = InputManager::instance().get_vpad_controller(0);
if (controller && controller->is_screen_active())
showDRC = true;
if (!showDRC)
{
controller = InputManager::instance().get_vpad_controller(1);
if (controller && controller->is_screen_active())
showDRC = true;
}
}
bool showDRC = swkbd_hasKeyboardInputHook() == false && (isDRCPrimary ^ altScreenRequested);
if ((renderTarget & RENDER_TARGET_DRC) && g_renderer->IsPadWindowActive())
LatteRenderTarget_copyToBackbuffer(texView, true);
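The old always-display-DRC flag plus ad-hoc controller checks is replaced by a single primary-screen flag combined with a momentary swap request. A hedged illustration of the resulting semantics, ignoring the software-keyboard override (not actual Cemu code):

// Ctrl+Tab (or the VPAD toggle input) flips isDRCPrimary persistently;
// plain Tab (or holding the VPAD screen button) swaps screens only while held.
bool ShowDRC(bool isDRCPrimary, bool altScreenRequested)
{
	return isDRCPrimary ^ altScreenRequested;
}
// isDRCPrimary=false, altScreenRequested=false -> TV on the main window
// isDRCPrimary=false, altScreenRequested=true  -> DRC while the request is held
// isDRCPrimary=true,  altScreenRequested=false -> DRC
// isDRCPrimary=true,  altScreenRequested=true  -> TV while the request is held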

View file

@ -451,9 +451,8 @@ void LatteShader_DumpShader(uint64 baseHash, uint64 auxHash, LatteDecompilerShad
suffix = "gs";
else if (shader->shaderType == LatteConst::ShaderType::Pixel)
suffix = "ps";
fs::path dumpPath = "dump/shaders";
dumpPath /= fmt::format("{:016x}_{:016x}_{}.txt", baseHash, auxHash, suffix);
FileStream* fs = FileStream::createFile2(dumpPath);
FileStream* fs = FileStream::createFile2(ActiveSettings::GetUserDataPath("dump/shaders/{:016x}_{:016x}_{}.txt", baseHash, auxHash, suffix));
if (fs)
{
if (shader->strBuf_shaderSource)
@ -479,9 +478,8 @@ void LatteShader_DumpRawShader(uint64 baseHash, uint64 auxHash, uint32 type, uin
suffix = "copy";
else if (type == SHADER_DUMP_TYPE_COMPUTE)
suffix = "compute";
fs::path dumpPath = "dump/shaders";
dumpPath /= fmt::format("{:016x}_{:016x}_{}.bin", baseHash, auxHash, suffix);
FileStream* fs = FileStream::createFile2(dumpPath);
FileStream* fs = FileStream::createFile2(ActiveSettings::GetUserDataPath("dump/shaders/{:016x}_{:016x}_{}.bin", baseHash, auxHash, suffix));
if (fs)
{
fs->writeData(programCode, programLen);

View file

@ -25,6 +25,9 @@
#include "util/helpers/Serializer.h"
#include <wx/msgdlg.h>
#include <audio/IAudioAPI.h>
#include <util/bootSound/BootSoundReader.h>
#include <thread>
#if BOOST_OS_WINDOWS
#include <psapi.h>
@ -155,6 +158,118 @@ bool LoadTGAFile(const std::vector<uint8>& buffer, TGAFILE *tgaFile)
return true;
}
class BootSoundPlayer
{
public:
BootSoundPlayer() = default;
~BootSoundPlayer()
{
m_stopRequested = true;
}
void StartSound()
{
if (!m_bootSndPlayThread.joinable())
{
m_fadeOutRequested = false;
m_stopRequested = false;
m_bootSndPlayThread = std::thread{[this]() {
StreamBootSound();
}};
}
}
void FadeOutSound()
{
m_fadeOutRequested = true;
}
void ApplyFadeOutEffect(std::span<sint16> samples, uint64& fadeOutSample, uint64 fadeOutDuration)
{
for (size_t i = 0; i < samples.size(); i += 2)
{
const float decibel = (float)fadeOutSample / fadeOutDuration * -60.0f;
const float volumeFactor = pow(10, decibel / 20);
samples[i] *= volumeFactor;
samples[i + 1] *= volumeFactor;
fadeOutSample++;
}
}
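The fade ramps linearly in decibels, from 0 dB down to -60 dB over fadeOutDuration samples, and converts each step to a linear gain with 10^(dB/20). A quick sanity check of the endpoints (hedged arithmetic, not part of the source):

#include <cmath>

// progress is fadeOutSample / fadeOutDuration in the code above
float FadeGain(float progress)
{
	return std::pow(10.0f, (progress * -60.0f) / 20.0f);
}
// progress = 0.0 -> 10^( 0.0) = 1.0      (full volume)
// progress = 0.5 -> 10^(-1.5) ≈ 0.0316
// progress = 1.0 -> 10^(-3.0) = 0.001    (effectively silent)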
void StreamBootSound()
{
SetThreadName("bootsnd");
constexpr sint32 sampleRate = 48'000;
constexpr sint32 bitsPerSample = 16;
constexpr sint32 samplesPerBlock = sampleRate / 10; // block is 1/10th of a second
constexpr sint32 nChannels = 2;
static_assert(bitsPerSample % 8 == 0, "bits per sample is not a multiple of 8");
AudioAPIPtr bootSndAudioDev;
try
{
bootSndAudioDev = IAudioAPI::CreateDeviceFromConfig(true, sampleRate, nChannels, samplesPerBlock, bitsPerSample);
if(!bootSndAudioDev)
return;
}
catch (const std::runtime_error& ex)
{
cemuLog_log(LogType::Force, "Failed to initialise audio device for bootup sound");
return;
}
bootSndAudioDev->SetAudioDelayOverride(4);
bootSndAudioDev->Play();
std::string sndPath = fmt::format("{}/meta/{}", CafeSystem::GetMlcStoragePath(CafeSystem::GetForegroundTitleId()), "bootSound.btsnd");
sint32 fscStatus = FSC_STATUS_UNDEFINED;
if(!fsc_doesFileExist(sndPath.c_str()))
return;
FSCVirtualFile* bootSndFileHandle = fsc_open(sndPath.c_str(), FSC_ACCESS_FLAG::OPEN_FILE | FSC_ACCESS_FLAG::READ_PERMISSION, &fscStatus);
if(!bootSndFileHandle)
{
cemuLog_log(LogType::Force, "failed to open bootSound.btsnd");
return;
}
constexpr sint32 audioBlockSize = samplesPerBlock * (bitsPerSample/8) * nChannels;
BootSoundReader bootSndFileReader(bootSndFileHandle, audioBlockSize);
uint64 fadeOutSample = 0; // track how far into the fadeout
constexpr uint64 fadeOutDuration = sampleRate * 2; // fadeout should last 2 seconds
while(fadeOutSample < fadeOutDuration && !m_stopRequested)
{
while (bootSndAudioDev->NeedAdditionalBlocks())
{
sint16* data = bootSndFileReader.getSamples();
if(data == nullptr)
{
// break outer loop
m_stopRequested = true;
break;
}
if(m_fadeOutRequested)
ApplyFadeOutEffect({data, samplesPerBlock * nChannels}, fadeOutSample, fadeOutDuration);
bootSndAudioDev->FeedBlock(data);
}
// sleep for the duration of a single block
std::this_thread::sleep_for(std::chrono::milliseconds(samplesPerBlock / (sampleRate/ 1'000)));
}
if(bootSndFileHandle)
fsc_close(bootSndFileHandle);
}
private:
std::thread m_bootSndPlayThread;
std::atomic_bool m_fadeOutRequested = false;
std::atomic_bool m_stopRequested = false;
};
static BootSoundPlayer g_bootSndPlayer;
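For reference, the streaming constants above work out to 100 ms blocks of 19,200 bytes. A small worked check, assuming the values shown in this diff:

constexpr int sampleRate      = 48'000;
constexpr int samplesPerBlock = sampleRate / 10;                         // 4'800 samples per block
constexpr int audioBlockSize  = samplesPerBlock * (16 / 8) * 2;          // 4'800 * 2 bytes * 2 channels = 19'200 bytes
constexpr int sleepMs         = samplesPerBlock / (sampleRate / 1'000);  // 4'800 / 48 = 100 ms between refill checks
static_assert(audioBlockSize == 19'200 && sleepMs == 100);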
void LatteShaderCache_finish()
{
if (g_renderer->GetType() == RendererAPI::Vulkan)
@ -299,6 +414,9 @@ void LatteShaderCache_Load()
loadBackgroundTexture(true, g_shaderCacheLoaderState.textureTVId);
loadBackgroundTexture(false, g_shaderCacheLoaderState.textureDRCId);
if(GetConfig().play_boot_sound)
g_bootSndPlayer.StartSound();
sint32 numLoadedShaders = 0;
uint32 loadIndex = 0;
@ -365,6 +483,11 @@ void LatteShaderCache_Load()
g_renderer->DeleteTexture(g_shaderCacheLoaderState.textureTVId);
if (g_shaderCacheLoaderState.textureDRCId)
g_renderer->DeleteTexture(g_shaderCacheLoaderState.textureDRCId);
g_bootSndPlayer.FadeOutSound();
if(Latte_GetStopSignal())
LatteThread_Exit();
}
void LatteShaderCache_ShowProgress(const std::function <bool(void)>& loadUpdateFunc, bool isPipelines)
@ -505,8 +628,6 @@ void LatteShaderCache_LoadVulkanPipelineCache(uint64 cacheTitleId)
g_shaderCacheLoaderState.loadedPipelines = 0;
LatteShaderCache_ShowProgress(LatteShaderCache_updatePipelineLoadingProgress, true);
pipelineCache.EndLoading();
if(Latte_GetStopSignal())
LatteThread_Exit();
}
bool LatteShaderCache_updatePipelineLoadingProgress()
@ -805,4 +926,4 @@ void LatteShaderCache_handleDeprecatedCacheFiles(fs::path pathGeneric, fs::path
fs::remove(pathGenericPre1_25_0, ec);
}
}
}
}

View file

@ -235,6 +235,8 @@ void Latte_Start()
void Latte_Stop()
{
std::unique_lock _lock(sLatteThreadStateMutex);
if (!sLatteThreadRunning)
return;
sLatteThreadRunning = false;
_lock.unlock();
sLatteThread.join();
@ -257,6 +259,7 @@ void LatteThread_Exit()
LatteSHRC_UnloadAll();
// close disk cache
LatteShaderCache_Close();
RendererOutputShader::ShutdownStatic();
// destroy renderer but make sure that g_renderer remains valid until the destructor has finished
if (g_renderer)
{

View file

@ -370,6 +370,8 @@ bool LatteDecompiler_IsALUTransInstruction(bool isOP3, uint32 opcode)
opcode == ALU_OP2_INST_LSHR_INT ||
opcode == ALU_OP2_INST_MAX_INT ||
opcode == ALU_OP2_INST_MIN_INT ||
opcode == ALU_OP2_INST_MAX_UINT ||
opcode == ALU_OP2_INST_MIN_UINT ||
opcode == ALU_OP2_INST_MOVA_FLOOR ||
opcode == ALU_OP2_INST_MOVA_INT ||
opcode == ALU_OP2_INST_SETE_DX10 ||

View file

@ -140,6 +140,8 @@ bool _isIntegerInstruction(const LatteDecompilerALUInstruction& aluInstruction)
case ALU_OP2_INST_SUB_INT:
case ALU_OP2_INST_MAX_INT:
case ALU_OP2_INST_MIN_INT:
case ALU_OP2_INST_MAX_UINT:
case ALU_OP2_INST_MIN_UINT:
case ALU_OP2_INST_SETE_INT:
case ALU_OP2_INST_SETGT_INT:
case ALU_OP2_INST_SETGE_INT:

View file

@ -1415,19 +1415,23 @@ void _emitALUOP2InstructionCode(LatteDecompilerShaderContext* shaderContext, Lat
}
else if( aluInstruction->opcode == ALU_OP2_INST_ADD_INT )
_emitALUOperationBinary<LATTE_DECOMPILER_DTYPE_SIGNED_INT>(shaderContext, aluInstruction, " + ");
else if( aluInstruction->opcode == ALU_OP2_INST_MAX_INT || aluInstruction->opcode == ALU_OP2_INST_MIN_INT )
else if( aluInstruction->opcode == ALU_OP2_INST_MAX_INT || aluInstruction->opcode == ALU_OP2_INST_MIN_INT ||
aluInstruction->opcode == ALU_OP2_INST_MAX_UINT || aluInstruction->opcode == ALU_OP2_INST_MIN_UINT)
{
// not verified
bool isUnsigned = aluInstruction->opcode == ALU_OP2_INST_MAX_UINT || aluInstruction->opcode == ALU_OP2_INST_MIN_UINT;
auto opType = isUnsigned ? LATTE_DECOMPILER_DTYPE_UNSIGNED_INT : LATTE_DECOMPILER_DTYPE_SIGNED_INT;
_emitInstructionOutputVariableName(shaderContext, aluInstruction);
if( aluInstruction->opcode == ALU_OP2_INST_MAX_INT )
src->add(" = max(");
src->add(" = ");
_emitTypeConversionPrefix(shaderContext, opType, outputType);
if( aluInstruction->opcode == ALU_OP2_INST_MAX_INT || aluInstruction->opcode == ALU_OP2_INST_MAX_UINT )
src->add("max(");
else
src->add(" = min(");
_emitTypeConversionPrefix(shaderContext, LATTE_DECOMPILER_DTYPE_SIGNED_INT, outputType);
_emitOperandInputCode(shaderContext, aluInstruction, 0, LATTE_DECOMPILER_DTYPE_SIGNED_INT);
src->add("min(");
_emitOperandInputCode(shaderContext, aluInstruction, 0, opType);
src->add(", ");
_emitOperandInputCode(shaderContext, aluInstruction, 1, LATTE_DECOMPILER_DTYPE_SIGNED_INT);
_emitTypeConversionSuffix(shaderContext, LATTE_DECOMPILER_DTYPE_SIGNED_INT, outputType);
_emitOperandInputCode(shaderContext, aluInstruction, 1, opType);
_emitTypeConversionSuffix(shaderContext, opType, outputType);
src->add(");" _CRLF);
}
else if( aluInstruction->opcode == ALU_OP2_INST_SUB_INT )

View file

@ -60,6 +60,8 @@
#define ALU_OP2_INST_SUB_INT (0x035) // integer instruction
#define ALU_OP2_INST_MAX_INT (0x036) // integer instruction
#define ALU_OP2_INST_MIN_INT (0x037) // integer instruction
#define ALU_OP2_INST_MAX_UINT (0x038) // integer instruction
#define ALU_OP2_INST_MIN_UINT (0x039) // integer instruction
#define ALU_OP2_INST_SETE_INT (0x03A) // integer instruction
#define ALU_OP2_INST_SETGT_INT (0x03B) // integer instruction
#define ALU_OP2_INST_SETGE_INT (0x03C) // integer instruction

View file

@ -570,13 +570,10 @@ void OpenGLRenderer::DrawBackbufferQuad(LatteTextureView* texView, RendererOutpu
g_renderer->ClearColorbuffer(padView);
}
sint32 effectiveWidth, effectiveHeight;
texView->baseTexture->GetEffectiveSize(effectiveWidth, effectiveHeight, 0);
shader_unbind(RendererShader::ShaderType::kGeometry);
shader_bind(shader->GetVertexShader());
shader_bind(shader->GetFragmentShader());
shader->SetUniformParameters(*texView, { effectiveWidth, effectiveHeight }, { imageWidth, imageHeight });
shader->SetUniformParameters(*texView, {imageWidth, imageHeight});
// set viewport
glViewportIndexedf(0, imageX, imageY, imageWidth, imageHeight);
@ -584,6 +581,12 @@ void OpenGLRenderer::DrawBackbufferQuad(LatteTextureView* texView, RendererOutpu
LatteTextureViewGL* texViewGL = (LatteTextureViewGL*)texView;
texture_bindAndActivate(texView, 0);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
texViewGL->samplerState.clampS = texViewGL->samplerState.clampT = 0xFF;
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, useLinearTexFilter ? GL_LINEAR : GL_NEAREST);
texViewGL->samplerState.filterMin = 0xFFFFFFFF;
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, useLinearTexFilter ? GL_LINEAR : GL_NEAREST);
texViewGL->samplerState.filterMag = 0xFFFFFFFF;

View file

@ -102,16 +102,21 @@ public:
static void SetAttributeArrayState(uint32 index, bool isEnabled, sint32 aluDivisor);
static void SetArrayElementBuffer(GLuint arrayElementBuffer);
// index
void* indexData_reserveIndexMemory(uint32 size, uint32& offset, uint32& bufferIndex) override
// index (not used by OpenGL renderer yet)
IndexAllocation indexData_reserveIndexMemory(uint32 size) override
{
assert_dbg();
return nullptr;
cemu_assert_unimplemented();
return {};
}
void indexData_uploadIndexMemory(uint32 offset, uint32 size) override
void indexData_releaseIndexMemory(IndexAllocation& allocation) override
{
assert_dbg();
cemu_assert_unimplemented();
}
void indexData_uploadIndexMemory(IndexAllocation& allocation) override
{
cemu_assert_unimplemented();
}
// uniform

View file

@ -138,8 +138,15 @@ public:
virtual void draw_endSequence() = 0;
// index
virtual void* indexData_reserveIndexMemory(uint32 size, uint32& offset, uint32& bufferIndex) = 0;
virtual void indexData_uploadIndexMemory(uint32 offset, uint32 size) = 0;
struct IndexAllocation
{
void* mem; // pointer to index data inside buffer
void* rendererInternal; // for renderer use
};
virtual IndexAllocation indexData_reserveIndexMemory(uint32 size) = 0;
virtual void indexData_releaseIndexMemory(IndexAllocation& allocation) = 0;
virtual void indexData_uploadIndexMemory(IndexAllocation& allocation) = 0;
// occlusion queries
virtual LatteQueryObject* occlusionQuery_create() = 0;

View file

@ -2,18 +2,7 @@
#include "Cafe/HW/Latte/Renderer/OpenGL/OpenGLRenderer.h"
const std::string RendererOutputShader::s_copy_shader_source =
R"(#version 420
#ifdef VULKAN
layout(location = 0) in vec2 passUV;
layout(binding = 0) uniform sampler2D textureSrc;
layout(location = 0) out vec4 colorOut0;
#else
in vec2 passUV;
layout(binding=0) uniform sampler2D textureSrc;
layout(location = 0) out vec4 colorOut0;
#endif
R"(
void main()
{
colorOut0 = vec4(texture(textureSrc, passUV).rgb,1.0);
@ -22,20 +11,6 @@ void main()
const std::string RendererOutputShader::s_bicubic_shader_source =
R"(
#version 420
#ifdef VULKAN
layout(location = 0) in vec2 passUV;
layout(binding = 0) uniform sampler2D textureSrc;
layout(binding = 1) uniform vec2 textureSrcResolution;
layout(location = 0) out vec4 colorOut0;
#else
in vec2 passUV;
layout(binding=0) uniform sampler2D textureSrc;
uniform vec2 textureSrcResolution;
layout(location = 0) out vec4 colorOut0;
#endif
vec4 cubic(float x)
{
float x2 = x * x;
@ -48,24 +23,23 @@ vec4 cubic(float x)
return w / 6.0;
}
vec4 bcFilter(vec2 texcoord, vec2 texscale)
vec4 bcFilter(vec2 uv, vec4 texelSize)
{
float fx = fract(texcoord.x);
float fy = fract(texcoord.y);
texcoord.x -= fx;
texcoord.y -= fy;
vec2 pixel = uv*texelSize.zw - 0.5;
vec2 pixelFrac = fract(pixel);
vec2 pixelInt = pixel - pixelFrac;
vec4 xcubic = cubic(fx);
vec4 ycubic = cubic(fy);
vec4 xcubic = cubic(pixelFrac.x);
vec4 ycubic = cubic(pixelFrac.y);
vec4 c = vec4(texcoord.x - 0.5, texcoord.x + 1.5, texcoord.y - 0.5, texcoord.y + 1.5);
vec4 c = vec4(pixelInt.x - 0.5, pixelInt.x + 1.5, pixelInt.y - 0.5, pixelInt.y + 1.5);
vec4 s = vec4(xcubic.x + xcubic.y, xcubic.z + xcubic.w, ycubic.x + ycubic.y, ycubic.z + ycubic.w);
vec4 offset = c + vec4(xcubic.y, xcubic.w, ycubic.y, ycubic.w) / s;
vec4 sample0 = texture(textureSrc, vec2(offset.x, offset.z) * texscale);
vec4 sample1 = texture(textureSrc, vec2(offset.y, offset.z) * texscale);
vec4 sample2 = texture(textureSrc, vec2(offset.x, offset.w) * texscale);
vec4 sample3 = texture(textureSrc, vec2(offset.y, offset.w) * texscale);
vec4 sample0 = texture(textureSrc, vec2(offset.x, offset.z) * texelSize.xy);
vec4 sample1 = texture(textureSrc, vec2(offset.y, offset.z) * texelSize.xy);
vec4 sample2 = texture(textureSrc, vec2(offset.x, offset.w) * texelSize.xy);
vec4 sample3 = texture(textureSrc, vec2(offset.y, offset.w) * texelSize.xy);
float sx = s.x / (s.x + s.y);
float sy = s.z / (s.z + s.w);
@ -76,20 +50,13 @@ vec4 bcFilter(vec2 texcoord, vec2 texscale)
}
void main(){
colorOut0 = vec4(bcFilter(passUV*textureSrcResolution, vec2(1.0,1.0)/textureSrcResolution).rgb,1.0);
vec4 texelSize = vec4( 1.0 / textureSrcResolution.xy, textureSrcResolution.xy);
colorOut0 = vec4(bcFilter(passUV, texelSize).rgb,1.0);
}
)";
const std::string RendererOutputShader::s_hermite_shader_source =
R"(#version 420
in vec4 gl_FragCoord;
in vec2 passUV;
layout(binding=0) uniform sampler2D textureSrc;
uniform vec2 textureSrcResolution;
uniform vec2 outputResolution;
layout(location = 0) out vec4 colorOut0;
R"(
// https://www.shadertoy.com/view/MllSzX
vec3 CubicHermite (vec3 A, vec3 B, vec3 C, vec3 D, float t)
@ -111,7 +78,7 @@ vec3 BicubicHermiteTexture(vec2 uv, vec4 texelSize)
vec2 frac = fract(pixel);
pixel = floor(pixel) / texelSize.zw - vec2(texelSize.xy/2.0);
vec4 doubleSize = texelSize*texelSize;
vec4 doubleSize = texelSize*2.0;
vec3 C00 = texture(textureSrc, pixel + vec2(-texelSize.x ,-texelSize.y)).rgb;
vec3 C10 = texture(textureSrc, pixel + vec2( 0.0 ,-texelSize.y)).rgb;
@ -142,15 +109,17 @@ vec3 BicubicHermiteTexture(vec2 uv, vec4 texelSize)
}
void main(){
vec4 texelSize = vec4( 1.0 / outputResolution.xy, outputResolution.xy);
vec4 texelSize = vec4( 1.0 / textureSrcResolution.xy, textureSrcResolution.xy);
colorOut0 = vec4(BicubicHermiteTexture(passUV, texelSize), 1.0);
}
)";
RendererOutputShader::RendererOutputShader(const std::string& vertex_source, const std::string& fragment_source)
{
m_vertex_shader = g_renderer->shader_create(RendererShader::ShaderType::kVertex, 0, 0, vertex_source, false, false);
m_fragment_shader = g_renderer->shader_create(RendererShader::ShaderType::kFragment, 0, 0, fragment_source, false, false);
auto finalFragmentSrc = PrependFragmentPreamble(fragment_source);
m_vertex_shader.reset(g_renderer->shader_create(RendererShader::ShaderType::kVertex, 0, 0, vertex_source, false, false));
m_fragment_shader.reset(g_renderer->shader_create(RendererShader::ShaderType::kFragment, 0, 0, finalFragmentSrc, false, false));
m_vertex_shader->PreponeCompilation(true);
m_fragment_shader->PreponeCompilation(true);
@ -163,74 +132,45 @@ RendererOutputShader::RendererOutputShader(const std::string& vertex_source, con
if (g_renderer->GetType() == RendererAPI::OpenGL)
{
m_attributes[0].m_loc_texture_src_resolution = m_vertex_shader->GetUniformLocation("textureSrcResolution");
m_attributes[0].m_loc_input_resolution = m_vertex_shader->GetUniformLocation("inputResolution");
m_attributes[0].m_loc_output_resolution = m_vertex_shader->GetUniformLocation("outputResolution");
m_uniformLocations[0].m_loc_textureSrcResolution = m_vertex_shader->GetUniformLocation("textureSrcResolution");
m_uniformLocations[0].m_loc_nativeResolution = m_vertex_shader->GetUniformLocation("nativeResolution");
m_uniformLocations[0].m_loc_outputResolution = m_vertex_shader->GetUniformLocation("outputResolution");
m_attributes[1].m_loc_texture_src_resolution = m_fragment_shader->GetUniformLocation("textureSrcResolution");
m_attributes[1].m_loc_input_resolution = m_fragment_shader->GetUniformLocation("inputResolution");
m_attributes[1].m_loc_output_resolution = m_fragment_shader->GetUniformLocation("outputResolution");
m_uniformLocations[1].m_loc_textureSrcResolution = m_fragment_shader->GetUniformLocation("textureSrcResolution");
m_uniformLocations[1].m_loc_nativeResolution = m_fragment_shader->GetUniformLocation("nativeResolution");
m_uniformLocations[1].m_loc_outputResolution = m_fragment_shader->GetUniformLocation("outputResolution");
}
else
{
cemuLog_logDebug(LogType::Force, "RendererOutputShader() - todo for Vulkan");
m_attributes[0].m_loc_texture_src_resolution = -1;
m_attributes[0].m_loc_input_resolution = -1;
m_attributes[0].m_loc_output_resolution = -1;
m_attributes[1].m_loc_texture_src_resolution = -1;
m_attributes[1].m_loc_input_resolution = -1;
m_attributes[1].m_loc_output_resolution = -1;
}
}
void RendererOutputShader::SetUniformParameters(const LatteTextureView& texture_view, const Vector2i& input_res, const Vector2i& output_res) const
void RendererOutputShader::SetUniformParameters(const LatteTextureView& texture_view, const Vector2i& output_res) const
{
float res[2];
// vertex shader
if (m_attributes[0].m_loc_texture_src_resolution != -1)
{
res[0] = (float)texture_view.baseTexture->width;
res[1] = (float)texture_view.baseTexture->height;
m_vertex_shader->SetUniform2fv(m_attributes[0].m_loc_texture_src_resolution, res, 1);
}
sint32 effectiveWidth, effectiveHeight;
texture_view.baseTexture->GetEffectiveSize(effectiveWidth, effectiveHeight, 0);
auto setUniforms = [&](RendererShader* shader, const UniformLocations& locations){
float res[2];
if (locations.m_loc_textureSrcResolution != -1)
{
res[0] = (float)effectiveWidth;
res[1] = (float)effectiveHeight;
shader->SetUniform2fv(locations.m_loc_textureSrcResolution, res, 1);
}
if (m_attributes[0].m_loc_input_resolution != -1)
{
res[0] = (float)input_res.x;
res[1] = (float)input_res.y;
m_vertex_shader->SetUniform2fv(m_attributes[0].m_loc_input_resolution, res, 1);
}
if (locations.m_loc_nativeResolution != -1)
{
res[0] = (float)texture_view.baseTexture->width;
res[1] = (float)texture_view.baseTexture->height;
shader->SetUniform2fv(locations.m_loc_nativeResolution, res, 1);
}
if (m_attributes[0].m_loc_output_resolution != -1)
{
res[0] = (float)output_res.x;
res[1] = (float)output_res.y;
m_vertex_shader->SetUniform2fv(m_attributes[0].m_loc_output_resolution, res, 1);
}
// fragment shader
if (m_attributes[1].m_loc_texture_src_resolution != -1)
{
res[0] = (float)texture_view.baseTexture->width;
res[1] = (float)texture_view.baseTexture->height;
m_fragment_shader->SetUniform2fv(m_attributes[1].m_loc_texture_src_resolution, res, 1);
}
if (m_attributes[1].m_loc_input_resolution != -1)
{
res[0] = (float)input_res.x;
res[1] = (float)input_res.y;
m_fragment_shader->SetUniform2fv(m_attributes[1].m_loc_input_resolution, res, 1);
}
if (m_attributes[1].m_loc_output_resolution != -1)
{
res[0] = (float)output_res.x;
res[1] = (float)output_res.y;
m_fragment_shader->SetUniform2fv(m_attributes[1].m_loc_output_resolution, res, 1);
}
if (locations.m_loc_outputResolution != -1)
{
res[0] = (float)output_res.x;
res[1] = (float)output_res.y;
shader->SetUniform2fv(locations.m_loc_outputResolution, res, 1);
}
};
setUniforms(m_vertex_shader.get(), m_uniformLocations[0]);
setUniforms(m_fragment_shader.get(), m_uniformLocations[1]);
}
RendererOutputShader* RendererOutputShader::s_copy_shader;
@ -247,8 +187,8 @@ std::string RendererOutputShader::GetOpenGlVertexSource(bool render_upside_down)
// vertex shader
std::ostringstream vertex_source;
vertex_source <<
R"(#version 400
out vec2 passUV;
R"(#version 420
layout(location = 0) smooth out vec2 passUV;
out gl_PerVertex
{
@ -341,6 +281,27 @@ void main(){
)";
return vertex_source.str();
}
std::string RendererOutputShader::PrependFragmentPreamble(const std::string& shaderSrc)
{
return R"(#version 430
#ifdef VULKAN
layout(push_constant) uniform pc {
vec2 textureSrcResolution;
vec2 nativeResolution;
vec2 outputResolution;
};
#else
uniform vec2 textureSrcResolution;
uniform vec2 nativeResolution;
uniform vec2 outputResolution;
#endif
layout(location = 0) smooth in vec2 passUV;
layout(binding = 0) uniform sampler2D textureSrc;
layout(location = 0) out vec4 colorOut0;
)" + shaderSrc;
}
void RendererOutputShader::InitializeStatic()
{
std::string vertex_source, vertex_source_ud;
@ -349,28 +310,30 @@ void RendererOutputShader::InitializeStatic()
{
vertex_source = GetOpenGlVertexSource(false);
vertex_source_ud = GetOpenGlVertexSource(true);
s_copy_shader = new RendererOutputShader(vertex_source, s_copy_shader_source);
s_copy_shader_ud = new RendererOutputShader(vertex_source_ud, s_copy_shader_source);
s_bicubic_shader = new RendererOutputShader(vertex_source, s_bicubic_shader_source);
s_bicubic_shader_ud = new RendererOutputShader(vertex_source_ud, s_bicubic_shader_source);
s_hermit_shader = new RendererOutputShader(vertex_source, s_hermite_shader_source);
s_hermit_shader_ud = new RendererOutputShader(vertex_source_ud, s_hermite_shader_source);
}
else
{
vertex_source = GetVulkanVertexSource(false);
vertex_source_ud = GetVulkanVertexSource(true);
s_copy_shader = new RendererOutputShader(vertex_source, s_copy_shader_source);
s_copy_shader_ud = new RendererOutputShader(vertex_source_ud, s_copy_shader_source);
/* s_bicubic_shader = new RendererOutputShader(vertex_source, s_bicubic_shader_source); TODO
s_bicubic_shader_ud = new RendererOutputShader(vertex_source_ud, s_bicubic_shader_source);
s_hermit_shader = new RendererOutputShader(vertex_source, s_hermite_shader_source);
s_hermit_shader_ud = new RendererOutputShader(vertex_source_ud, s_hermite_shader_source);*/
}
s_copy_shader = new RendererOutputShader(vertex_source, s_copy_shader_source);
s_copy_shader_ud = new RendererOutputShader(vertex_source_ud, s_copy_shader_source);
s_bicubic_shader = new RendererOutputShader(vertex_source, s_bicubic_shader_source);
s_bicubic_shader_ud = new RendererOutputShader(vertex_source_ud, s_bicubic_shader_source);
s_hermit_shader = new RendererOutputShader(vertex_source, s_hermite_shader_source);
s_hermit_shader_ud = new RendererOutputShader(vertex_source_ud, s_hermite_shader_source);
}
void RendererOutputShader::ShutdownStatic()
{
delete s_copy_shader;
delete s_copy_shader_ud;
delete s_bicubic_shader;
delete s_bicubic_shader_ud;
delete s_hermit_shader;
delete s_hermit_shader_ud;
}

View file

@ -17,19 +17,20 @@ public:
RendererOutputShader(const std::string& vertex_source, const std::string& fragment_source);
virtual ~RendererOutputShader() = default;
void SetUniformParameters(const LatteTextureView& texture_view, const Vector2i& input_res, const Vector2i& output_res) const;
void SetUniformParameters(const LatteTextureView& texture_view, const Vector2i& output_res) const;
RendererShader* GetVertexShader() const
{
return m_vertex_shader;
return m_vertex_shader.get();
}
RendererShader* GetFragmentShader() const
{
return m_fragment_shader;
return m_fragment_shader.get();
}
static void InitializeStatic();
static void ShutdownStatic();
static RendererOutputShader* s_copy_shader;
static RendererOutputShader* s_copy_shader_ud;
@ -43,16 +44,18 @@ public:
static std::string GetVulkanVertexSource(bool render_upside_down);
static std::string GetOpenGlVertexSource(bool render_upside_down);
protected:
RendererShader* m_vertex_shader;
RendererShader* m_fragment_shader;
static std::string PrependFragmentPreamble(const std::string& shaderSrc);
struct
protected:
std::unique_ptr<RendererShader> m_vertex_shader;
std::unique_ptr<RendererShader> m_fragment_shader;
struct UniformLocations
{
sint32 m_loc_texture_src_resolution = -1;
sint32 m_loc_input_resolution = -1;
sint32 m_loc_output_resolution = -1;
} m_attributes[2]{};
sint32 m_loc_textureSrcResolution = -1;
sint32 m_loc_nativeResolution = -1;
sint32 m_loc_outputResolution = -1;
} m_uniformLocations[2]{};
private:
static const std::string s_copy_shader_source;

View file

@ -202,6 +202,13 @@ VkSampler LatteTextureViewVk::GetDefaultTextureSampler(bool useLinearTexFilter)
VkSamplerCreateInfo samplerInfo{};
samplerInfo.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
// emulate OpenGL minFilters
// see note under: https://docs.vulkan.org/spec/latest/chapters/samplers.html#VkSamplerCreateInfo
// if maxLod = 0 then magnification is always performed
samplerInfo.mipmapMode = VK_SAMPLER_MIPMAP_MODE_NEAREST;
samplerInfo.minLod = 0.0f;
samplerInfo.maxLod = 0.25f;
if (useLinearTexFilter)
{
samplerInfo.magFilter = VK_FILTER_LINEAR;
@ -212,6 +219,9 @@ VkSampler LatteTextureViewVk::GetDefaultTextureSampler(bool useLinearTexFilter)
samplerInfo.magFilter = VK_FILTER_NEAREST;
samplerInfo.minFilter = VK_FILTER_NEAREST;
}
samplerInfo.addressModeU = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
samplerInfo.addressModeV = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
samplerInfo.addressModeW = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
if (vkCreateSampler(m_device, &samplerInfo, nullptr, &sampler) != VK_SUCCESS)
{

View file

@ -211,6 +211,9 @@ RendererShaderVk::~RendererShaderVk()
{
while (!list_pipelineInfo.empty())
delete list_pipelineInfo[0];
VkDevice vkDev = VulkanRenderer::GetInstance()->GetLogicalDevice();
vkDestroyShaderModule(vkDev, m_shader_module, nullptr);
}
void RendererShaderVk::Init()

View file

@ -60,7 +60,7 @@ void SwapchainInfoVk::Create()
VkAttachmentDescription colorAttachment = {};
colorAttachment.format = m_surfaceFormat.format;
colorAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
colorAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
colorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
@ -146,8 +146,17 @@ void SwapchainInfoVk::Create()
UnrecoverableError("Failed to create semaphore for swapchain acquire");
}
VkFenceCreateInfo fenceInfo = {};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;
result = vkCreateFence(m_logicalDevice, &fenceInfo, nullptr, &m_imageAvailableFence);
if (result != VK_SUCCESS)
UnrecoverableError("Failed to create fence for swapchain");
m_acquireIndex = 0;
hasDefinedSwapchainImage = false;
m_queueDepth = 0;
}
void SwapchainInfoVk::Cleanup()
@ -177,6 +186,12 @@ void SwapchainInfoVk::Cleanup()
m_swapchainFramebuffers.clear();
if (m_imageAvailableFence)
{
WaitAvailableFence();
vkDestroyFence(m_logicalDevice, m_imageAvailableFence, nullptr);
m_imageAvailableFence = nullptr;
}
if (m_swapchain)
{
vkDestroySwapchainKHR(m_logicalDevice, m_swapchain, nullptr);
@ -189,6 +204,18 @@ bool SwapchainInfoVk::IsValid() const
return m_swapchain && !m_acquireSemaphores.empty();
}
void SwapchainInfoVk::WaitAvailableFence()
{
if(m_awaitableFence != VK_NULL_HANDLE)
vkWaitForFences(m_logicalDevice, 1, &m_awaitableFence, VK_TRUE, UINT64_MAX);
m_awaitableFence = VK_NULL_HANDLE;
}
void SwapchainInfoVk::ResetAvailableFence() const
{
vkResetFences(m_logicalDevice, 1, &m_imageAvailableFence);
}
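Taken together, the fence-related hunks in this file form a simple lifecycle that prevents resetting or destroying resources while an acquire is still pending. A hedged summary in commented pseudocode (not additional source):

// Create():             fence is created in the signaled state
// AcquireImage():       ResetAvailableFence();                     // back to unsignaled
//                       vkAcquireNextImageKHR(..., fence, ...);    // driver signals it once the image is usable
//                       m_awaitableFence = fence;                  // remember the pending signal
// WaitAvailableFence(): block on the pending signal, if any, then clear m_awaitableFence
// Cleanup():            WaitAvailableFence(); vkDestroyFence(...); // never destroy a fence with a pending acquire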
VkSemaphore SwapchainInfoVk::ConsumeAcquireSemaphore()
{
VkSemaphore ret = m_currentSemaphore;
@ -198,8 +225,10 @@ VkSemaphore SwapchainInfoVk::ConsumeAcquireSemaphore()
bool SwapchainInfoVk::AcquireImage()
{
ResetAvailableFence();
VkSemaphore acquireSemaphore = m_acquireSemaphores[m_acquireIndex];
VkResult result = vkAcquireNextImageKHR(m_logicalDevice, m_swapchain, 1'000'000'000, acquireSemaphore, nullptr, &swapchainImageIndex);
VkResult result = vkAcquireNextImageKHR(m_logicalDevice, m_swapchain, 1'000'000'000, acquireSemaphore, m_imageAvailableFence, &swapchainImageIndex);
if (result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR)
m_shouldRecreate = true;
if (result == VK_TIMEOUT)
@ -216,6 +245,7 @@ bool SwapchainInfoVk::AcquireImage()
return false;
}
m_currentSemaphore = acquireSemaphore;
m_awaitableFence = m_imageAvailableFence;
m_acquireIndex = (m_acquireIndex + 1) % m_swapchainImages.size();
return true;
@ -319,6 +349,7 @@ VkExtent2D SwapchainInfoVk::ChooseSwapExtent(const VkSurfaceCapabilitiesKHR& cap
VkPresentModeKHR SwapchainInfoVk::ChoosePresentMode(const std::vector<VkPresentModeKHR>& modes)
{
m_maxQueued = 0;
const auto vsyncState = (VSync)GetConfig().vsync.GetValue();
if (vsyncState == VSync::MAILBOX)
{
@ -345,6 +376,7 @@ VkPresentModeKHR SwapchainInfoVk::ChoosePresentMode(const std::vector<VkPresentM
return VK_PRESENT_MODE_FIFO_KHR;
}
m_maxQueued = 1;
return VK_PRESENT_MODE_FIFO_KHR;
}

View file

@ -26,6 +26,9 @@ struct SwapchainInfoVk
bool IsValid() const;
void WaitAvailableFence();
void ResetAvailableFence() const;
bool AcquireImage();
// retrieve semaphore of last acquire for submitting a wait operation
// only one wait operation must be submitted per acquire (which submits a single signal operation)
@ -67,7 +70,11 @@ struct SwapchainInfoVk
VkSurfaceFormatKHR m_surfaceFormat{};
VkSwapchainKHR m_swapchain{};
Vector2i m_desiredExtent{};
VkExtent2D m_actualExtent{};
uint32 swapchainImageIndex = (uint32)-1;
uint64 m_presentId = 1;
uint64 m_queueDepth = 0; // number of frames with pending presentation requests
uint64 m_maxQueued = 0; // the maximum number of frames with presentation requests.
// swapchain image ringbuffer (indexed by swapchainImageIndex)
@ -81,8 +88,9 @@ struct SwapchainInfoVk
private:
uint32 m_acquireIndex = 0;
std::vector<VkSemaphore> m_acquireSemaphores; // indexed by m_acquireIndex
VkFence m_imageAvailableFence{};
VkFence m_awaitableFence = VK_NULL_HANDLE;
VkSemaphore m_currentSemaphore = VK_NULL_HANDLE;
std::array<uint32, 2> m_swapchainQueueFamilyIndices;
VkExtent2D m_actualExtent{};
};

View file

@ -22,7 +22,7 @@ uint32 LatteTextureReadbackInfoVk::GetImageSize(LatteTextureView* textureView)
cemu_assert(textureFormat == VK_FORMAT_R8G8B8A8_UNORM);
return baseTexture->width * baseTexture->height * 4;
}
else if (textureView->format == Latte::E_GX2SURFFMT::R8_UNORM)
else if (textureView->format == Latte::E_GX2SURFFMT::R8_UNORM )
{
cemu_assert(textureFormat == VK_FORMAT_R8_UNORM);
return baseTexture->width * baseTexture->height * 1;
@ -79,6 +79,13 @@ uint32 LatteTextureReadbackInfoVk::GetImageSize(LatteTextureView* textureView)
// todo - if driver does not support VK_FORMAT_D24_UNORM_S8_UINT this is represented as VK_FORMAT_D32_SFLOAT_S8_UINT which is 8 bytes
return baseTexture->width * baseTexture->height * 4;
}
else if (textureView->format == Latte::E_GX2SURFFMT::R5_G6_B5_UNORM )
{
if(textureFormat == VK_FORMAT_R5G6B5_UNORM_PACK16){
return baseTexture->width * baseTexture->height * 2;
}
return 0;
}
else
{
cemuLog_log(LogType::Force, "Unsupported texture readback format {:04x}", (uint32)textureView->format);

View file

@ -19,7 +19,7 @@ public:
virtual ~VKRMoveableRefCounter()
{
cemu_assert_debug(refCount == 0);
cemu_assert_debug(m_refCount == 0);
// remove references
#ifdef CEMU_DEBUG_ASSERT
@ -30,7 +30,11 @@ public:
}
#endif
for (auto itr : refs)
itr->ref->refCount--;
{
itr->ref->m_refCount--;
if (itr->ref->m_refCount == 0)
itr->ref->RefCountReachedZero();
}
refs.clear();
delete selfRef;
selfRef = nullptr;
@ -41,8 +45,8 @@ public:
VKRMoveableRefCounter(VKRMoveableRefCounter&& rhs) noexcept
{
this->refs = std::move(rhs.refs);
this->refCount = rhs.refCount;
rhs.refCount = 0;
this->m_refCount = rhs.m_refCount;
rhs.m_refCount = 0;
this->selfRef = rhs.selfRef;
rhs.selfRef = nullptr;
this->selfRef->ref = this;
@ -57,7 +61,7 @@ public:
void addRef(VKRMoveableRefCounter* refTarget)
{
this->refs.emplace_back(refTarget->selfRef);
refTarget->refCount++;
refTarget->m_refCount++;
#ifdef CEMU_DEBUG_ASSERT
// add reverse ref
@ -68,16 +72,23 @@ public:
// methods to directly increment/decrement ref counter (for situations where no external object is available)
void incRef()
{
this->refCount++;
m_refCount++;
}
void decRef()
{
this->refCount--;
m_refCount--;
if (m_refCount == 0)
RefCountReachedZero();
}
protected:
int refCount{};
virtual void RefCountReachedZero()
{
// does nothing by default
}
int m_refCount{};
private:
VKRMoveableRefCounterRef* selfRef;
std::vector<VKRMoveableRefCounterRef*> refs;
@ -88,7 +99,7 @@ private:
void moveObj(VKRMoveableRefCounter&& rhs)
{
this->refs = std::move(rhs.refs);
this->refCount = rhs.refCount;
this->m_refCount = rhs.m_refCount;
this->selfRef = rhs.selfRef;
this->selfRef->ref = this;
}
@ -131,6 +142,25 @@ public:
VkSampler m_textureDefaultSampler[2] = { VK_NULL_HANDLE, VK_NULL_HANDLE }; // relict from LatteTextureViewVk, get rid of it eventually
};
class VKRObjectSampler : public VKRDestructibleObject
{
public:
VKRObjectSampler(VkSamplerCreateInfo* samplerInfo);
~VKRObjectSampler() override;
static VKRObjectSampler* GetOrCreateSampler(VkSamplerCreateInfo* samplerInfo);
static void DestroyCache();
void RefCountReachedZero() override; // sampler objects are destroyed when not referenced anymore
VkSampler GetSampler() const { return m_sampler; }
private:
static std::unordered_map<uint64, VKRObjectSampler*> s_samplerCache;
VkSampler m_sampler{ VK_NULL_HANDLE };
uint64 m_hash;
};
class VKRObjectRenderPass : public VKRDestructibleObject
{
public:
@ -191,11 +221,14 @@ public:
VKRObjectPipeline();
~VKRObjectPipeline() override;
void setPipeline(VkPipeline newPipeline);
void SetPipeline(VkPipeline newPipeline);
VkPipeline GetPipeline() const { return m_pipeline; }
VkPipeline pipeline = VK_NULL_HANDLE;
VkDescriptorSetLayout vertexDSL = VK_NULL_HANDLE, pixelDSL = VK_NULL_HANDLE, geometryDSL = VK_NULL_HANDLE;
VkPipelineLayout pipeline_layout = VK_NULL_HANDLE;
VkDescriptorSetLayout m_vertexDSL = VK_NULL_HANDLE, m_pixelDSL = VK_NULL_HANDLE, m_geometryDSL = VK_NULL_HANDLE;
VkPipelineLayout m_pipelineLayout = VK_NULL_HANDLE;
private:
VkPipeline m_pipeline = VK_NULL_HANDLE;
};
class VKRObjectDescriptorSet : public VKRDestructibleObject

View file

@ -4,6 +4,14 @@
/* VKRSynchronizedMemoryBuffer */
VKRSynchronizedRingAllocator::~VKRSynchronizedRingAllocator()
{
for(auto& buf : m_buffers)
{
m_vkrMemMgr->DeleteBuffer(buf.vk_buffer, buf.vk_mem);
}
}
void VKRSynchronizedRingAllocator::addUploadBufferSyncPoint(AllocatorBuffer_t& buffer, uint32 offset)
{
auto cmdBufferId = m_vkr->GetCurrentCommandBufferId();
@ -23,11 +31,11 @@ void VKRSynchronizedRingAllocator::allocateAdditionalUploadBuffer(uint32 sizeReq
AllocatorBuffer_t newBuffer{};
newBuffer.writeIndex = 0;
newBuffer.basePtr = nullptr;
if (m_bufferType == BUFFER_TYPE::STAGING)
if (m_bufferType == VKR_BUFFER_TYPE::STAGING)
m_vkrMemMgr->CreateBuffer(bufferAllocSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, newBuffer.vk_buffer, newBuffer.vk_mem);
else if (m_bufferType == BUFFER_TYPE::INDEX)
else if (m_bufferType == VKR_BUFFER_TYPE::INDEX)
m_vkrMemMgr->CreateBuffer(bufferAllocSize, VK_BUFFER_USAGE_INDEX_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, newBuffer.vk_buffer, newBuffer.vk_mem);
else if (m_bufferType == BUFFER_TYPE::STRIDE)
else if (m_bufferType == VKR_BUFFER_TYPE::STRIDE)
m_vkrMemMgr->CreateBuffer(bufferAllocSize, VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, newBuffer.vk_buffer, newBuffer.vk_mem);
else
cemu_assert_debug(false);
@ -53,7 +61,7 @@ VKRSynchronizedRingAllocator::AllocatorReservation_t VKRSynchronizedRingAllocato
uint32 distanceToSyncPoint;
if (!itr.queue_syncPoints.empty())
{
if(itr.queue_syncPoints.front().offset < itr.writeIndex)
if (itr.queue_syncPoints.front().offset < itr.writeIndex)
distanceToSyncPoint = 0xFFFFFFFF;
else
distanceToSyncPoint = itr.queue_syncPoints.front().offset - itr.writeIndex;
@ -100,7 +108,7 @@ VKRSynchronizedRingAllocator::AllocatorReservation_t VKRSynchronizedRingAllocato
void VKRSynchronizedRingAllocator::FlushReservation(AllocatorReservation_t& uploadReservation)
{
cemu_assert_debug(m_bufferType == BUFFER_TYPE::STAGING); // only the staging buffer isn't coherent
cemu_assert_debug(m_bufferType == VKR_BUFFER_TYPE::STAGING); // only the staging buffer isn't coherent
// todo - use nonCoherentAtomSize for flush size (instead of hardcoded constant)
VkMappedMemoryRange flushedRange{};
flushedRange.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
@ -167,15 +175,88 @@ void VKRSynchronizedRingAllocator::GetStats(uint32& numBuffers, size_t& totalBuf
}
}
/* VKRSynchronizedHeapAllocator */
VKRSynchronizedHeapAllocator::VKRSynchronizedHeapAllocator(class VKRMemoryManager* vkMemoryManager, VKR_BUFFER_TYPE bufferType, size_t minimumBufferAllocSize)
: m_vkrMemMgr(vkMemoryManager), m_chunkedHeap(bufferType, minimumBufferAllocSize) {};
VKRSynchronizedHeapAllocator::AllocatorReservation* VKRSynchronizedHeapAllocator::AllocateBufferMemory(uint32 size, uint32 alignment)
{
CHAddr addr = m_chunkedHeap.alloc(size, alignment);
m_activeAllocations.emplace_back(addr);
AllocatorReservation* res = m_poolAllocatorReservation.allocObj();
res->bufferIndex = addr.chunkIndex;
res->bufferOffset = addr.offset;
res->size = size;
res->memPtr = m_chunkedHeap.GetChunkPtr(addr.chunkIndex) + addr.offset;
m_chunkedHeap.GetChunkVkMemInfo(addr.chunkIndex, res->vkBuffer, res->vkMem);
return res;
}
void VKRSynchronizedHeapAllocator::FreeReservation(AllocatorReservation* uploadReservation)
{
// put the allocation on a delayed release queue for the current command buffer
uint64 currentCommandBufferId = VulkanRenderer::GetInstance()->GetCurrentCommandBufferId();
auto it = std::find_if(m_activeAllocations.begin(), m_activeAllocations.end(), [&uploadReservation](const TrackedAllocation& allocation) { return allocation.allocation.chunkIndex == uploadReservation->bufferIndex && allocation.allocation.offset == uploadReservation->bufferOffset; });
cemu_assert_debug(it != m_activeAllocations.end());
m_releaseQueue[currentCommandBufferId].emplace_back(it->allocation);
m_activeAllocations.erase(it);
m_poolAllocatorReservation.freeObj(uploadReservation);
}
void VKRSynchronizedHeapAllocator::FlushReservation(AllocatorReservation* uploadReservation)
{
if (m_chunkedHeap.RequiresFlush(uploadReservation->bufferIndex))
{
VkMappedMemoryRange flushedRange{};
flushedRange.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
flushedRange.memory = uploadReservation->vkMem;
flushedRange.offset = uploadReservation->bufferOffset;
flushedRange.size = uploadReservation->size;
vkFlushMappedMemoryRanges(VulkanRenderer::GetInstance()->GetLogicalDevice(), 1, &flushedRange);
}
}
void VKRSynchronizedHeapAllocator::CleanupBuffer(uint64 latestFinishedCommandBufferId)
{
auto it = m_releaseQueue.begin();
while (it != m_releaseQueue.end())
{
if (it->first <= latestFinishedCommandBufferId)
{
// release allocations
for(auto& addr : it->second)
m_chunkedHeap.free(addr);
it = m_releaseQueue.erase(it);
continue;
}
it++;
}
}
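A minimal sketch of the deferred-release flow implemented above, assuming an existing VKRSynchronizedHeapAllocator instance (here called indexAllocator) and illustrative uploadSize/srcData/alignment values; the real call sites live in the index-data path elsewhere in this diff:
// reserve memory in the chunked heap and copy the upload data into the mapped pointer
VKRSynchronizedHeapAllocator::AllocatorReservation* res = indexAllocator.AllocateBufferMemory(uploadSize, 32);
std::memcpy(res->memPtr, srcData, uploadSize); // requires <cstring>
indexAllocator.FlushReservation(res); // only flushes if the backing chunk is non-coherent
// ... record GPU work that reads res->vkBuffer at res->bufferOffset ...
indexAllocator.FreeReservation(res); // queued on the release list of the current command buffer
// called once per finished command buffer (see cleanupBuffers()); only now is the range returned to the heap
indexAllocator.CleanupBuffer(latestFinishedCommandBufferId);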
void VKRSynchronizedHeapAllocator::GetStats(uint32& numBuffers, size_t& totalBufferSize, size_t& freeBufferSize) const
{
m_chunkedHeap.GetStats(numBuffers, totalBufferSize, freeBufferSize);
}
/* VkTextureChunkedHeap */
VkTextureChunkedHeap::~VkTextureChunkedHeap()
{
VkDevice device = VulkanRenderer::GetInstance()->GetLogicalDevice();
for (auto& i : m_list_chunkInfo)
{
vkFreeMemory(device, i.mem, nullptr);
}
}
uint32 VkTextureChunkedHeap::allocateNewChunk(uint32 chunkIndex, uint32 minimumAllocationSize)
{
cemu_assert_debug(m_list_chunkInfo.size() == chunkIndex);
m_list_chunkInfo.resize(m_list_chunkInfo.size() + 1);
// pad minimumAllocationSize to 32KB alignment
minimumAllocationSize = (minimumAllocationSize + (32*1024-1)) & ~(32 * 1024 - 1);
minimumAllocationSize = (minimumAllocationSize + (32 * 1024 - 1)) & ~(32 * 1024 - 1);
uint32 allocationSize = 1024 * 1024 * 128;
if (chunkIndex == 0)
@ -189,8 +270,7 @@ uint32 VkTextureChunkedHeap::allocateNewChunk(uint32 chunkIndex, uint32 minimumA
std::vector<uint32> deviceLocalMemoryTypeIndices = m_vkrMemoryManager->FindMemoryTypes(m_typeFilter, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
std::vector<uint32> hostLocalMemoryTypeIndices = m_vkrMemoryManager->FindMemoryTypes(m_typeFilter, 0);
// remove device local memory types from host local vector
auto pred = [&deviceLocalMemoryTypeIndices](const uint32& v) ->bool
{
auto pred = [&deviceLocalMemoryTypeIndices](const uint32& v) -> bool {
return std::find(deviceLocalMemoryTypeIndices.begin(), deviceLocalMemoryTypeIndices.end(), v) != deviceLocalMemoryTypeIndices.end();
};
hostLocalMemoryTypeIndices.erase(std::remove_if(hostLocalMemoryTypeIndices.begin(), hostLocalMemoryTypeIndices.end(), pred), hostLocalMemoryTypeIndices.end());
@ -206,7 +286,7 @@ uint32 VkTextureChunkedHeap::allocateNewChunk(uint32 chunkIndex, uint32 minimumA
allocInfo.memoryTypeIndex = memType;
VkDeviceMemory imageMemory;
VkResult r = vkAllocateMemory(m_device, &allocInfo, nullptr, &imageMemory);
VkResult r = vkAllocateMemory(VulkanRenderer::GetInstance()->GetLogicalDevice(), &allocInfo, nullptr, &imageMemory);
if (r != VK_SUCCESS)
continue;
m_list_chunkInfo[chunkIndex].mem = imageMemory;
@ -221,7 +301,7 @@ uint32 VkTextureChunkedHeap::allocateNewChunk(uint32 chunkIndex, uint32 minimumA
allocInfo.memoryTypeIndex = memType;
VkDeviceMemory imageMemory;
VkResult r = vkAllocateMemory(m_device, &allocInfo, nullptr, &imageMemory);
VkResult r = vkAllocateMemory(VulkanRenderer::GetInstance()->GetLogicalDevice(), &allocInfo, nullptr, &imageMemory);
if (r != VK_SUCCESS)
continue;
m_list_chunkInfo[chunkIndex].mem = imageMemory;
@ -238,28 +318,76 @@ uint32 VkTextureChunkedHeap::allocateNewChunk(uint32 chunkIndex, uint32 minimumA
return 0;
}
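The 32 KiB padding near the top of this function relies on the alignment being a power of two: for any power-of-two A, (x + A - 1) & ~(A - 1) rounds x up to the next multiple of A. A self-contained illustration (the helper name is ours, not part of the diff):
#include <cstdint>
// round x up to the next multiple of a power-of-two alignment
constexpr uint64_t AlignUpPow2(uint64_t x, uint64_t alignment)
{
	return (x + alignment - 1) & ~(alignment - 1);
}
static_assert(AlignUpPow2(1, 32 * 1024) == 32 * 1024);
static_assert(AlignUpPow2(32 * 1024, 32 * 1024) == 32 * 1024);     // already aligned values are unchanged
static_assert(AlignUpPow2(32 * 1024 + 1, 32 * 1024) == 64 * 1024); // crossing a boundary moves to the next multiple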
uint32_t VKRMemoryManager::FindMemoryType(uint32_t typeFilter, VkMemoryPropertyFlags properties) const
{
VkPhysicalDeviceMemoryProperties memProperties;
vkGetPhysicalDeviceMemoryProperties(m_vkr->GetPhysicalDevice(), &memProperties);
/* VkBufferChunkedHeap */
for (uint32 i = 0; i < memProperties.memoryTypeCount; i++)
VKRBuffer* VKRBuffer::Create(VKR_BUFFER_TYPE bufferType, size_t bufferSize, VkMemoryPropertyFlags properties)
{
auto* memMgr = VulkanRenderer::GetInstance()->GetMemoryManager();
VkBuffer buffer;
VkDeviceMemory bufferMemory;
bool allocSuccess = false; // initialize so an unhandled buffer type cannot leave this unset below
if (bufferType == VKR_BUFFER_TYPE::STAGING)
allocSuccess = memMgr->CreateBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, properties, buffer, bufferMemory);
else if (bufferType == VKR_BUFFER_TYPE::INDEX)
allocSuccess = memMgr->CreateBuffer(bufferSize, VK_BUFFER_USAGE_INDEX_BUFFER_BIT, properties, buffer, bufferMemory);
else if (bufferType == VKR_BUFFER_TYPE::STRIDE)
allocSuccess = memMgr->CreateBuffer(bufferSize, VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, properties, buffer, bufferMemory);
else
cemu_assert_debug(false);
if (!allocSuccess)
return nullptr;
VKRBuffer* bufferObj = new VKRBuffer(buffer, bufferMemory);
// if host visible, then map buffer
void* data = nullptr;
if (properties & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)
{
if ((typeFilter & (1 << i)) != 0 && (memProperties.memoryTypes[i].propertyFlags & properties) == properties)
return i;
vkMapMemory(VulkanRenderer::GetInstance()->GetLogicalDevice(), bufferMemory, 0, bufferSize, 0, &data);
bufferObj->m_requiresFlush = !HAS_FLAG(properties, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
}
m_vkr->UnrecoverableError(fmt::format("failed to find suitable memory type ({0:#08x} {1:#08x})", typeFilter, properties).c_str());
return 0;
bufferObj->m_mappedMemory = (uint8*)data;
return bufferObj;
}
bool VKRMemoryManager::FindMemoryType2(uint32 typeFilter, VkMemoryPropertyFlags properties, uint32& memoryIndex) const
VKRBuffer::~VKRBuffer()
{
if (m_mappedMemory)
vkUnmapMemory(VulkanRenderer::GetInstance()->GetLogicalDevice(), m_bufferMemory);
if (m_bufferMemory != VK_NULL_HANDLE)
vkFreeMemory(VulkanRenderer::GetInstance()->GetLogicalDevice(), m_bufferMemory, nullptr);
if (m_buffer != VK_NULL_HANDLE)
vkDestroyBuffer(VulkanRenderer::GetInstance()->GetLogicalDevice(), m_buffer, nullptr);
}
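A sketch of how a host-visible VKRBuffer created by the factory above is typically written to, based only on the accessors visible in this diff; srcData/byteCount are illustrative, and the explicit flush mirrors what the heap allocator does when the coherent fallback was not available:
VKRBuffer* buf = VKRBuffer::Create(VKR_BUFFER_TYPE::INDEX, 4u * 1024 * 1024, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
if (!buf)
	return; // allocation failed, the caller decides how to recover
std::memcpy(buf->GetPtr(), srcData, byteCount); // the buffer stays persistently mapped for its lifetime
if (buf->RequiresFlush()) // true when the fallback memory type lacks HOST_COHERENT
{
	VkMappedMemoryRange range{};
	range.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
	range.memory = buf->GetVkBufferMemory();
	range.offset = 0;
	range.size = byteCount; // real code should round this to nonCoherentAtomSize or pass VK_WHOLE_SIZE
	vkFlushMappedMemoryRanges(VulkanRenderer::GetInstance()->GetLogicalDevice(), 1, &range);
}
delete buf; // unmaps and frees the buffer and its memory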
VkBufferChunkedHeap::~VkBufferChunkedHeap()
{
for (auto& chunk : m_chunkBuffers)
delete chunk;
}
uint32 VkBufferChunkedHeap::allocateNewChunk(uint32 chunkIndex, uint32 minimumAllocationSize)
{
size_t allocationSize = std::max<size_t>(m_minimumBufferAllocationSize, minimumAllocationSize);
VKRBuffer* buffer = VKRBuffer::Create(m_bufferType, allocationSize, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
if(!buffer)
buffer = VKRBuffer::Create(m_bufferType, allocationSize, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);
if(!buffer)
VulkanRenderer::GetInstance()->UnrecoverableError("Failed to allocate buffer memory for VkBufferChunkedHeap");
cemu_assert_debug(buffer);
cemu_assert_debug(m_chunkBuffers.size() == chunkIndex);
m_chunkBuffers.emplace_back(buffer);
// todo - VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT might be worth it?
return allocationSize;
}
bool VKRMemoryManager::FindMemoryType(uint32 typeFilter, VkMemoryPropertyFlags properties, uint32& memoryIndex) const
{
VkPhysicalDeviceMemoryProperties memProperties;
vkGetPhysicalDeviceMemoryProperties(m_vkr->GetPhysicalDevice(), &memProperties);
for (uint32_t i = 0; i < memProperties.memoryTypeCount; i++)
{
if (typeFilter & (1 << i) && memProperties.memoryTypes[i].propertyFlags == properties)
if (typeFilter & (1 << i) && (memProperties.memoryTypes[i].propertyFlags & properties) == properties)
{
memoryIndex = i;
return true;
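Note the changed predicate above: instead of requiring an exact match, the reworked FindMemoryType accepts any memory type whose property flags are a superset of the requested ones, provided the type is also permitted by typeFilter (the memoryTypeBits mask from VkMemoryRequirements, where bit i means memory type i is usable). A standalone restatement of the test, separate from the class for illustration:
// true when memory type i is allowed by the resource's typeFilter and offers at least the requested flags
bool IsSuitableMemoryType(uint32 i, uint32 typeFilter, VkMemoryPropertyFlags typeFlags, VkMemoryPropertyFlags requested)
{
	return (typeFilter & (1u << i)) != 0 && (typeFlags & requested) == requested;
}
// e.g. a HOST_VISIBLE request is now satisfied by a HOST_VISIBLE | HOST_COHERENT type,
// which the previous equality comparison would have rejected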
@ -330,31 +458,7 @@ size_t VKRMemoryManager::GetTotalMemoryForBufferType(VkBufferUsageFlags usage, V
return total;
}
void VKRMemoryManager::CreateBuffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) const
{
VkBufferCreateInfo bufferInfo{};
bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
bufferInfo.usage = usage;
bufferInfo.size = size;
bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
if (vkCreateBuffer(m_vkr->GetLogicalDevice(), &bufferInfo, nullptr, &buffer) != VK_SUCCESS)
m_vkr->UnrecoverableError("Failed to create buffer");
VkMemoryRequirements memRequirements;
vkGetBufferMemoryRequirements(m_vkr->GetLogicalDevice(), buffer, &memRequirements);
VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = memRequirements.size;
allocInfo.memoryTypeIndex = FindMemoryType(memRequirements.memoryTypeBits, properties);
if (vkAllocateMemory(m_vkr->GetLogicalDevice(), &allocInfo, nullptr, &bufferMemory) != VK_SUCCESS)
m_vkr->UnrecoverableError("Failed to allocate buffer memory");
if (vkBindBufferMemory(m_vkr->GetLogicalDevice(), buffer, bufferMemory, 0) != VK_SUCCESS)
m_vkr->UnrecoverableError("Failed to bind buffer memory");
}
bool VKRMemoryManager::CreateBuffer2(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) const
bool VKRMemoryManager::CreateBuffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) const
{
VkBufferCreateInfo bufferInfo{};
bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
@ -363,7 +467,7 @@ bool VKRMemoryManager::CreateBuffer2(VkDeviceSize size, VkBufferUsageFlags usage
bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
if (vkCreateBuffer(m_vkr->GetLogicalDevice(), &bufferInfo, nullptr, &buffer) != VK_SUCCESS)
{
cemuLog_log(LogType::Force, "Failed to create buffer (CreateBuffer2)");
cemuLog_log(LogType::Force, "Failed to create buffer (CreateBuffer)");
return false;
}
@ -373,7 +477,7 @@ bool VKRMemoryManager::CreateBuffer2(VkDeviceSize size, VkBufferUsageFlags usage
VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = memRequirements.size;
if (!FindMemoryType2(memRequirements.memoryTypeBits, properties, allocInfo.memoryTypeIndex))
if (!FindMemoryType(memRequirements.memoryTypeBits, properties, allocInfo.memoryTypeIndex))
{
vkDestroyBuffer(m_vkr->GetLogicalDevice(), buffer, nullptr);
return false;
@ -386,7 +490,7 @@ bool VKRMemoryManager::CreateBuffer2(VkDeviceSize size, VkBufferUsageFlags usage
if (vkBindBufferMemory(m_vkr->GetLogicalDevice(), buffer, bufferMemory, 0) != VK_SUCCESS)
{
vkDestroyBuffer(m_vkr->GetLogicalDevice(), buffer, nullptr);
cemuLog_log(LogType::Force, "Failed to bind buffer (CreateBuffer2)");
cemuLog_log(LogType::Force, "Failed to bind buffer (CreateBuffer)");
return false;
}
return true;
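With the throwing variant removed, callers that previously relied on CreateBuffer2 now probe several property-flag combinations in priority order and fall back when the preferred flags are unavailable (the uniform ring buffer setup later in this diff is an example). A hypothetical helper capturing that pattern, not part of the diff:
#include <initializer_list>
// try each candidate flag set in order; returns false only if none of them could be allocated
bool CreateBufferWithFallback(VKRMemoryManager* memMgr, VkDeviceSize size, VkBufferUsageFlags usage,
	std::initializer_list<VkMemoryPropertyFlags> propertyCandidates, VkBuffer& buffer, VkDeviceMemory& memory)
{
	for (VkMemoryPropertyFlags properties : propertyCandidates)
	{
		if (memMgr->CreateBuffer(size, usage, properties, buffer, memory))
			return true;
	}
	return false;
}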
@ -408,7 +512,7 @@ bool VKRMemoryManager::CreateBufferFromHostMemory(void* hostPointer, VkDeviceSiz
if (vkCreateBuffer(m_vkr->GetLogicalDevice(), &bufferInfo, nullptr, &buffer) != VK_SUCCESS)
{
cemuLog_log(LogType::Force, "Failed to create buffer (CreateBuffer2)");
cemuLog_log(LogType::Force, "Failed to create buffer (CreateBufferFromHostMemory)");
return false;
}
@ -423,13 +527,13 @@ bool VKRMemoryManager::CreateBufferFromHostMemory(void* hostPointer, VkDeviceSiz
importHostMem.sType = VK_STRUCTURE_TYPE_IMPORT_MEMORY_HOST_POINTER_INFO_EXT;
importHostMem.handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT;
importHostMem.pHostPointer = hostPointer;
// VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT or
// VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT or
// VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_MAPPED_FOREIGN_MEMORY_BIT_EXT
// what's the difference?
allocInfo.pNext = &importHostMem;
if (!FindMemoryType2(memRequirements.memoryTypeBits, properties, allocInfo.memoryTypeIndex))
if (!FindMemoryType(memRequirements.memoryTypeBits, properties, allocInfo.memoryTypeIndex))
{
vkDestroyBuffer(m_vkr->GetLogicalDevice(), buffer, nullptr);
return false;
@ -469,11 +573,11 @@ VkImageMemAllocation* VKRMemoryManager::imageMemoryAllocate(VkImage image)
auto it = map_textureHeap.find(typeFilter);
if (it == map_textureHeap.end())
{
texHeap = new VkTextureChunkedHeap(this, typeFilter, m_vkr->GetLogicalDevice());
texHeap = new VkTextureChunkedHeap(this, typeFilter);
map_textureHeap.emplace(typeFilter, texHeap);
}
else
texHeap = it->second;
texHeap = it->second.get();
// alloc mem from heap
uint32 allocationSize = (uint32)memRequirements.size;

View file

@ -2,6 +2,36 @@
#include "Cafe/HW/Latte/Renderer/Renderer.h"
#include "Cafe/HW/Latte/Renderer/Vulkan/VulkanAPI.h"
#include "util/ChunkedHeap/ChunkedHeap.h"
#include "util/helpers/MemoryPool.h"
enum class VKR_BUFFER_TYPE
{
STAGING, // staging upload buffer
INDEX, // buffer for index data
STRIDE, // buffer for stride-adjusted vertex data
};
class VKRBuffer
{
public:
static VKRBuffer* Create(VKR_BUFFER_TYPE bufferType, size_t bufferSize, VkMemoryPropertyFlags properties);
~VKRBuffer();
VkBuffer GetVkBuffer() const { return m_buffer; }
VkDeviceMemory GetVkBufferMemory() const { return m_bufferMemory; }
uint8* GetPtr() const { return m_mappedMemory; }
bool RequiresFlush() const { return m_requiresFlush; }
private:
VKRBuffer(VkBuffer buffer, VkDeviceMemory bufferMem) : m_buffer(buffer), m_bufferMemory(bufferMem) { };
VkBuffer m_buffer;
VkDeviceMemory m_bufferMemory;
uint8* m_mappedMemory;
bool m_requiresFlush{false};
};
struct VkImageMemAllocation
{
@ -14,18 +44,17 @@ struct VkImageMemAllocation
uint32 getAllocationSize() { return allocationSize; }
};
class VkTextureChunkedHeap : private ChunkedHeap
class VkTextureChunkedHeap : private ChunkedHeap<>
{
public:
VkTextureChunkedHeap(class VKRMemoryManager* memoryManager, uint32 typeFilter, VkDevice device) : m_vkrMemoryManager(memoryManager), m_typeFilter(typeFilter), m_device(device) { };
VkTextureChunkedHeap(class VKRMemoryManager* memoryManager, uint32 typeFilter) : m_vkrMemoryManager(memoryManager), m_typeFilter(typeFilter) { };
~VkTextureChunkedHeap();
struct ChunkInfo
{
VkDeviceMemory mem;
};
uint32 allocateNewChunk(uint32 chunkIndex, uint32 minimumAllocationSize) override;
CHAddr allocMem(uint32 size, uint32 alignment)
{
if (alignment < 4)
@ -43,11 +72,6 @@ public:
this->free(addr);
}
void setDevice(VkDevice dev)
{
m_device = dev;
}
VkDeviceMemory getChunkMem(uint32 index)
{
if (index >= m_list_chunkInfo.size())
@ -57,29 +81,75 @@ public:
void getStatistics(uint32& totalHeapSize, uint32& allocatedBytes) const
{
totalHeapSize = numHeapBytes;
allocatedBytes = numAllocatedBytes;
totalHeapSize = m_numHeapBytes;
allocatedBytes = m_numAllocatedBytes;
}
VkDevice m_device;
private:
uint32 allocateNewChunk(uint32 chunkIndex, uint32 minimumAllocationSize) override;
uint32 m_typeFilter{ 0xFFFFFFFF };
class VKRMemoryManager* m_vkrMemoryManager;
std::vector<ChunkInfo> m_list_chunkInfo;
};
class VkBufferChunkedHeap : private ChunkedHeap<>
{
public:
VkBufferChunkedHeap(VKR_BUFFER_TYPE bufferType, size_t minimumBufferAllocationSize) : m_bufferType(bufferType), m_minimumBufferAllocationSize(minimumBufferAllocationSize) { };
~VkBufferChunkedHeap();
using ChunkedHeap::alloc;
using ChunkedHeap::free;
uint8* GetChunkPtr(uint32 index) const
{
if (index >= m_chunkBuffers.size())
return nullptr;
return m_chunkBuffers[index]->GetPtr();
}
void GetChunkVkMemInfo(uint32 index, VkBuffer& buffer, VkDeviceMemory& mem)
{
if (index >= m_chunkBuffers.size())
{
buffer = VK_NULL_HANDLE;
mem = VK_NULL_HANDLE;
return;
}
buffer = m_chunkBuffers[index]->GetVkBuffer();
mem = m_chunkBuffers[index]->GetVkBufferMemory();
}
void GetStats(uint32& numBuffers, size_t& totalBufferSize, size_t& freeBufferSize) const
{
numBuffers = m_chunkBuffers.size();
totalBufferSize = m_numHeapBytes;
freeBufferSize = m_numHeapBytes - m_numAllocatedBytes;
}
bool RequiresFlush(uint32 index) const
{
if (index >= m_chunkBuffers.size())
return false;
return m_chunkBuffers[index]->RequiresFlush();
}
private:
uint32 allocateNewChunk(uint32 chunkIndex, uint32 minimumAllocationSize) override;
VKR_BUFFER_TYPE m_bufferType;
std::vector<VKRBuffer*> m_chunkBuffers;
size_t m_minimumBufferAllocationSize;
};
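A sketch of how the pieces of VkBufferChunkedHeap fit together; heap, size and alignment are illustrative. alloc() returns a CHAddr (chunk index plus offset inside that chunk), and the accessors above translate it into the CPU pointer and the Vulkan handles needed for GPU binds:
CHAddr addr = heap.alloc(size, alignment); // allocates a new VKRBuffer chunk on demand via allocateNewChunk()
uint8* cpuDst = heap.GetChunkPtr(addr.chunkIndex) + addr.offset; // persistently mapped chunk memory
VkBuffer gpuBuffer;
VkDeviceMemory gpuMemory;
heap.GetChunkVkMemInfo(addr.chunkIndex, gpuBuffer, gpuMemory);
// e.g. for index data: vkCmdBindIndexBuffer(cmd, gpuBuffer, addr.offset, VK_INDEX_TYPE_UINT16);
heap.free(addr); // only safe once the GPU is done with the range; VKRSynchronizedHeapAllocator below defers this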
// a circular ring-buffer which tracks and releases memory per command-buffer
class VKRSynchronizedRingAllocator
{
public:
enum class BUFFER_TYPE
{
STAGING, // staging upload buffer
INDEX, // buffer for index data
STRIDE, // buffer for stride-adjusted vertex data
};
VKRSynchronizedRingAllocator(class VulkanRenderer* vkRenderer, class VKRMemoryManager* vkMemoryManager, BUFFER_TYPE bufferType, uint32 minimumBufferAllocSize) : m_vkr(vkRenderer), m_vkrMemMgr(vkMemoryManager), m_bufferType(bufferType), m_minimumBufferAllocSize(minimumBufferAllocSize) {};
VKRSynchronizedRingAllocator(class VulkanRenderer* vkRenderer, class VKRMemoryManager* vkMemoryManager, VKR_BUFFER_TYPE bufferType, uint32 minimumBufferAllocSize) : m_vkr(vkRenderer), m_vkrMemMgr(vkMemoryManager), m_bufferType(bufferType), m_minimumBufferAllocSize(minimumBufferAllocSize) {};
VKRSynchronizedRingAllocator(const VKRSynchronizedRingAllocator&) = delete; // disallow copy
~VKRSynchronizedRingAllocator();
struct BufferSyncPoint_t
{
@ -126,13 +196,53 @@ private:
const class VulkanRenderer* m_vkr;
const class VKRMemoryManager* m_vkrMemMgr;
const BUFFER_TYPE m_bufferType;
const VKR_BUFFER_TYPE m_bufferType;
const uint32 m_minimumBufferAllocSize;
std::vector<AllocatorBuffer_t> m_buffers;
};
// heap style allocator with released memory being freed after the current command buffer finishes
class VKRSynchronizedHeapAllocator
{
struct TrackedAllocation
{
TrackedAllocation(CHAddr allocation) : allocation(allocation) {};
CHAddr allocation;
};
public:
VKRSynchronizedHeapAllocator(class VKRMemoryManager* vkMemoryManager, VKR_BUFFER_TYPE bufferType, size_t minimumBufferAllocSize);
VKRSynchronizedHeapAllocator(const VKRSynchronizedHeapAllocator&) = delete; // disallow copy
struct AllocatorReservation
{
VkBuffer vkBuffer;
VkDeviceMemory vkMem;
uint8* memPtr;
uint32 bufferOffset;
uint32 size;
uint32 bufferIndex;
};
AllocatorReservation* AllocateBufferMemory(uint32 size, uint32 alignment);
void FreeReservation(AllocatorReservation* uploadReservation);
void FlushReservation(AllocatorReservation* uploadReservation);
void CleanupBuffer(uint64 latestFinishedCommandBufferId);
void GetStats(uint32& numBuffers, size_t& totalBufferSize, size_t& freeBufferSize) const;
private:
const class VKRMemoryManager* m_vkrMemMgr;
VkBufferChunkedHeap m_chunkedHeap;
// allocations
std::vector<TrackedAllocation> m_activeAllocations;
MemoryPool<AllocatorReservation> m_poolAllocatorReservation{32};
// release queue
std::unordered_map<uint64, std::vector<CHAddr>> m_releaseQueue;
};
void LatteIndices_invalidateAll();
class VKRMemoryManager
@ -140,15 +250,15 @@ class VKRMemoryManager
friend class VKRSynchronizedRingAllocator;
public:
VKRMemoryManager(class VulkanRenderer* renderer) :
m_stagingBuffer(renderer, this, VKRSynchronizedRingAllocator::BUFFER_TYPE::STAGING, 32u * 1024 * 1024),
m_indexBuffer(renderer, this, VKRSynchronizedRingAllocator::BUFFER_TYPE::INDEX, 4u * 1024 * 1024),
m_vertexStrideMetalBuffer(renderer, this, VKRSynchronizedRingAllocator::BUFFER_TYPE::STRIDE, 4u * 1024 * 1024)
m_stagingBuffer(renderer, this, VKR_BUFFER_TYPE::STAGING, 32u * 1024 * 1024),
m_indexBuffer(this, VKR_BUFFER_TYPE::INDEX, 4u * 1024 * 1024),
m_vertexStrideMetalBuffer(renderer, this, VKR_BUFFER_TYPE::STRIDE, 4u * 1024 * 1024)
{
m_vkr = renderer;
}
// texture memory management
std::unordered_map<uint32, VkTextureChunkedHeap*> map_textureHeap; // one heap per memory type
std::unordered_map<uint32, std::unique_ptr<VkTextureChunkedHeap>> map_textureHeap; // one heap per memory type
std::vector<uint8> m_textureUploadBuffer;
// texture upload buffer
@ -167,7 +277,7 @@ public:
}
VKRSynchronizedRingAllocator& getStagingAllocator() { return m_stagingBuffer; }; // allocator for texture/attribute/uniform uploads
VKRSynchronizedRingAllocator& getIndexAllocator() { return m_indexBuffer; }; // allocator for index data
VKRSynchronizedHeapAllocator& GetIndexAllocator() { return m_indexBuffer; }; // allocator for index data
VKRSynchronizedRingAllocator& getMetalStrideWorkaroundAllocator() { return m_vertexStrideMetalBuffer; }; // allocator for stride-adjusted vertex data
void cleanupBuffers(uint64 latestFinishedCommandBufferId)
@ -178,9 +288,7 @@ public:
m_vertexStrideMetalBuffer.CleanupBuffer(latestFinishedCommandBufferId);
}
// memory helpers
uint32_t FindMemoryType(uint32_t typeFilter, VkMemoryPropertyFlags properties) const;
bool FindMemoryType2(uint32 typeFilter, VkMemoryPropertyFlags properties, uint32& memoryIndex) const; // searches for exact properties. Can gracefully fail without throwing exception (returns false)
bool FindMemoryType(uint32 typeFilter, VkMemoryPropertyFlags properties, uint32& memoryIndex) const; // searches for a memory type that includes the requested properties. Can gracefully fail without throwing an exception (returns false)
std::vector<uint32> FindMemoryTypes(uint32_t typeFilter, VkMemoryPropertyFlags properties) const;
// image memory allocation
@ -190,8 +298,7 @@ public:
// buffer management
size_t GetTotalMemoryForBufferType(VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, size_t minimumBufferSize = 16 * 1024 * 1024);
void CreateBuffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) const;
bool CreateBuffer2(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) const; // same as CreateBuffer but doesn't throw exception on failure
bool CreateBuffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) const; // returns false on failure instead of throwing an exception
bool CreateBufferFromHostMemory(void* hostPointer, VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) const;
void DeleteBuffer(VkBuffer& buffer, VkDeviceMemory& deviceMem) const;
@ -202,6 +309,6 @@ public:
private:
class VulkanRenderer* m_vkr;
VKRSynchronizedRingAllocator m_stagingBuffer;
VKRSynchronizedRingAllocator m_indexBuffer;
VKRSynchronizedHeapAllocator m_indexBuffer;
VKRSynchronizedRingAllocator m_vertexStrideMetalBuffer;
};

View file

@ -26,7 +26,6 @@ PipelineInfo::PipelineInfo(uint64 minimalStateHash, uint64 pipelineHash, LatteFe
// init VKRObjPipeline
m_vkrObjPipeline = new VKRObjectPipeline();
m_vkrObjPipeline->pipeline = VK_NULL_HANDLE;
// track dependency with shaders
if (vertexShaderVk)

View file

@ -165,6 +165,7 @@ VKFUNC_DEVICE(vkCmdDraw);
VKFUNC_DEVICE(vkCmdCopyBufferToImage);
VKFUNC_DEVICE(vkCmdCopyImageToBuffer);
VKFUNC_DEVICE(vkCmdClearColorImage);
VKFUNC_DEVICE(vkCmdClearAttachments);
VKFUNC_DEVICE(vkCmdBindIndexBuffer);
VKFUNC_DEVICE(vkCmdBindVertexBuffers);
VKFUNC_DEVICE(vkCmdDrawIndexed);
@ -188,6 +189,9 @@ VKFUNC_DEVICE(vkCmdPipelineBarrier2KHR);
VKFUNC_DEVICE(vkCmdBeginRenderingKHR);
VKFUNC_DEVICE(vkCmdEndRenderingKHR);
// khr_present_wait
VKFUNC_DEVICE(vkWaitForPresentKHR);
// transform feedback extension
VKFUNC_DEVICE(vkCmdBindTransformFeedbackBuffersEXT);
VKFUNC_DEVICE(vkCmdBeginTransformFeedbackEXT);
@ -195,6 +199,7 @@ VKFUNC_DEVICE(vkCmdEndTransformFeedbackEXT);
// query
VKFUNC_DEVICE(vkCreateQueryPool);
VKFUNC_DEVICE(vkDestroyQueryPool);
VKFUNC_DEVICE(vkCmdResetQueryPool);
VKFUNC_DEVICE(vkCmdBeginQuery);
VKFUNC_DEVICE(vkCmdEndQuery);
@ -233,6 +238,7 @@ VKFUNC_DEVICE(vkAllocateDescriptorSets);
VKFUNC_DEVICE(vkFreeDescriptorSets);
VKFUNC_DEVICE(vkUpdateDescriptorSets);
VKFUNC_DEVICE(vkCreateDescriptorPool);
VKFUNC_DEVICE(vkDestroyDescriptorPool);
VKFUNC_DEVICE(vkDestroyDescriptorSetLayout);
#undef VKFUNC_INIT

View file

@ -558,8 +558,8 @@ void PipelineCompiler::InitRasterizerState(const LatteContextRegister& latteRegi
rasterizerExt.flags = 0;
rasterizer.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
rasterizer.pNext = &rasterizerExt;
rasterizer.rasterizerDiscardEnable = LatteGPUState.contextNew.PA_CL_CLIP_CNTL.get_DX_RASTERIZATION_KILL();
rasterizer.pNext = VulkanRenderer::GetInstance()->m_featureControl.deviceExtensions.depth_clip_enable ? &rasterizerExt : nullptr;
// GX2SetSpecialState(0, true) workaround
if (!LatteGPUState.contextNew.PA_CL_VTE_CNTL.get_VPORT_X_OFFSET_ENA())
rasterizer.rasterizerDiscardEnable = false;
@ -730,7 +730,7 @@ void PipelineCompiler::InitDescriptorSetLayouts(VulkanRenderer* vkRenderer, Pipe
{
cemu_assert_debug(descriptorSetLayoutCount == 0);
CreateDescriptorSetLayout(vkRenderer, vertexShader, descriptorSetLayout[descriptorSetLayoutCount], vkrPipelineInfo);
vkObjPipeline->vertexDSL = descriptorSetLayout[descriptorSetLayoutCount];
vkObjPipeline->m_vertexDSL = descriptorSetLayout[descriptorSetLayoutCount];
descriptorSetLayoutCount++;
}
@ -738,7 +738,7 @@ void PipelineCompiler::InitDescriptorSetLayouts(VulkanRenderer* vkRenderer, Pipe
{
cemu_assert_debug(descriptorSetLayoutCount == 1);
CreateDescriptorSetLayout(vkRenderer, pixelShader, descriptorSetLayout[descriptorSetLayoutCount], vkrPipelineInfo);
vkObjPipeline->pixelDSL = descriptorSetLayout[descriptorSetLayoutCount];
vkObjPipeline->m_pixelDSL = descriptorSetLayout[descriptorSetLayoutCount];
descriptorSetLayoutCount++;
}
else if (geometryShader)
@ -757,7 +757,7 @@ void PipelineCompiler::InitDescriptorSetLayouts(VulkanRenderer* vkRenderer, Pipe
{
cemu_assert_debug(descriptorSetLayoutCount == 2);
CreateDescriptorSetLayout(vkRenderer, geometryShader, descriptorSetLayout[descriptorSetLayoutCount], vkrPipelineInfo);
vkObjPipeline->geometryDSL = descriptorSetLayout[descriptorSetLayoutCount];
vkObjPipeline->m_geometryDSL = descriptorSetLayout[descriptorSetLayoutCount];
descriptorSetLayoutCount++;
}
}
@ -918,7 +918,7 @@ bool PipelineCompiler::InitFromCurrentGPUState(PipelineInfo* pipelineInfo, const
pipelineLayoutInfo.pPushConstantRanges = nullptr;
pipelineLayoutInfo.pushConstantRangeCount = 0;
VkResult result = vkCreatePipelineLayout(vkRenderer->m_logicalDevice, &pipelineLayoutInfo, nullptr, &m_pipeline_layout);
VkResult result = vkCreatePipelineLayout(vkRenderer->m_logicalDevice, &pipelineLayoutInfo, nullptr, &m_pipelineLayout);
if (result != VK_SUCCESS)
{
cemuLog_log(LogType::Force, "Failed to create pipeline layout: {}", result);
@ -936,7 +936,7 @@ bool PipelineCompiler::InitFromCurrentGPUState(PipelineInfo* pipelineInfo, const
// ##########################################################################################################################################
pipelineInfo->m_vkrObjPipeline->pipeline_layout = m_pipeline_layout;
pipelineInfo->m_vkrObjPipeline->m_pipelineLayout = m_pipelineLayout;
// increment ref counter for vkrObjPipeline and renderpass object to make sure they don't get released while we are using them
m_vkrObjPipeline->incRef();
@ -989,7 +989,7 @@ bool PipelineCompiler::Compile(bool forceCompile, bool isRenderThread, bool show
pipelineInfo.pRasterizationState = &rasterizer;
pipelineInfo.pMultisampleState = &multisampling;
pipelineInfo.pColorBlendState = &colorBlending;
pipelineInfo.layout = m_pipeline_layout;
pipelineInfo.layout = m_pipelineLayout;
pipelineInfo.renderPass = m_renderPassObj->m_renderPass;
pipelineInfo.pDepthStencilState = &depthStencilState;
pipelineInfo.subpass = 0;
@ -1037,7 +1037,7 @@ bool PipelineCompiler::Compile(bool forceCompile, bool isRenderThread, bool show
}
else if (result == VK_SUCCESS)
{
m_vkrObjPipeline->setPipeline(pipeline);
m_vkrObjPipeline->SetPipeline(pipeline);
}
else
{

View file

@ -41,7 +41,7 @@ public:
bool InitFromCurrentGPUState(PipelineInfo* pipelineInfo, const LatteContextRegister& latteRegister, VKRObjectRenderPass* renderPassObj);
void TrackAsCached(uint64 baseHash, uint64 pipelineStateHash); // stores pipeline to permanent cache if not yet cached. Must be called synchronously from render thread due to dependency on GPU state
VkPipelineLayout m_pipeline_layout;
VkPipelineLayout m_pipelineLayout;
VKRObjectRenderPass* m_renderPassObj{};
/* shader stages */

View file

@ -47,7 +47,10 @@ const std::vector<const char*> kOptionalDeviceExtensions =
VK_EXT_FILTER_CUBIC_EXTENSION_NAME, // not supported by any device yet
VK_EXT_EXTERNAL_MEMORY_HOST_EXTENSION_NAME,
VK_KHR_SYNCHRONIZATION_2_EXTENSION_NAME,
VK_KHR_SHADER_FLOAT_CONTROLS_EXTENSION_NAME
VK_KHR_SHADER_FLOAT_CONTROLS_EXTENSION_NAME,
VK_KHR_PRESENT_WAIT_EXTENSION_NAME,
VK_KHR_PRESENT_ID_EXTENSION_NAME,
VK_EXT_DEPTH_CLIP_ENABLE_EXTENSION_NAME
};
const std::vector<const char*> kRequiredDeviceExtensions =
@ -80,8 +83,6 @@ VKAPI_ATTR VkBool32 VKAPI_CALL DebugUtilsCallback(VkDebugUtilsMessageSeverityFla
if (strstr(pCallbackData->pMessage, "Number of currently valid sampler objects is not less than the maximum allowed"))
return VK_FALSE;
assert_dbg();
#endif
cemuLog_log(LogType::Force, (char*)pCallbackData->pMessage);
@ -252,12 +253,24 @@ void VulkanRenderer::GetDeviceFeatures()
pcc.pNext = prevStruct;
prevStruct = &pcc;
VkPhysicalDevicePresentIdFeaturesKHR pidf{};
pidf.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PRESENT_ID_FEATURES_KHR;
pidf.pNext = prevStruct;
prevStruct = &pidf;
VkPhysicalDevicePresentWaitFeaturesKHR pwf{};
pwf.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PRESENT_WAIT_FEATURES_KHR;
pwf.pNext = prevStruct;
prevStruct = &pwf;
VkPhysicalDeviceFeatures2 physicalDeviceFeatures2{};
physicalDeviceFeatures2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
physicalDeviceFeatures2.pNext = prevStruct;
vkGetPhysicalDeviceFeatures2(m_physicalDevice, &physicalDeviceFeatures2);
cemuLog_log(LogType::Force, "Vulkan: present_wait extension: {}", (pwf.presentWait && pidf.presentId) ? "supported" : "unsupported");
/* Get Vulkan device properties and limits */
VkPhysicalDeviceFloatControlsPropertiesKHR pfcp{};
prevStruct = nullptr;
@ -300,7 +313,10 @@ void VulkanRenderer::GetDeviceFeatures()
cemuLog_log(LogType::Force, "VK_EXT_custom_border_color not supported. Cannot emulate arbitrary border color");
}
}
if (!m_featureControl.deviceExtensions.depth_clip_enable)
{
cemuLog_log(LogType::Force, "VK_EXT_depth_clip_enable not supported");
}
// get limits
m_featureControl.limits.minUniformBufferOffsetAlignment = std::max(prop2.properties.limits.minUniformBufferOffsetAlignment, (VkDeviceSize)4);
m_featureControl.limits.nonCoherentAtomSize = std::max(prop2.properties.limits.nonCoherentAtomSize, (VkDeviceSize)4);
@ -425,7 +441,7 @@ VulkanRenderer::VulkanRenderer()
GetDeviceFeatures();
// init memory manager
memoryManager = new VKRMemoryManager(this);
memoryManager.reset(new VKRMemoryManager(this));
try
{
@ -490,6 +506,24 @@ VulkanRenderer::VulkanRenderer()
customBorderColorFeature.customBorderColors = VK_TRUE;
customBorderColorFeature.customBorderColorWithoutFormat = VK_TRUE;
}
// enable VK_KHR_present_id
VkPhysicalDevicePresentIdFeaturesKHR presentIdFeature{};
if(m_featureControl.deviceExtensions.present_wait)
{
presentIdFeature.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PRESENT_ID_FEATURES_KHR;
presentIdFeature.pNext = deviceExtensionFeatures;
deviceExtensionFeatures = &presentIdFeature;
presentIdFeature.presentId = VK_TRUE;
}
// enable VK_KHR_present_wait
VkPhysicalDevicePresentWaitFeaturesKHR presentWaitFeature{};
if(m_featureControl.deviceExtensions.present_wait)
{
presentWaitFeature.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PRESENT_WAIT_FEATURES_KHR;
presentWaitFeature.pNext = deviceExtensionFeatures;
deviceExtensionFeatures = &presentWaitFeature;
presentWaitFeature.presentWait = VK_TRUE;
}
std::vector<const char*> used_extensions;
VkDeviceCreateInfo createInfo = CreateDeviceCreateInfo(queueCreateInfos, deviceFeatures, deviceExtensionFeatures, used_extensions);
@ -545,15 +579,15 @@ VulkanRenderer::VulkanRenderer()
void* bufferPtr;
// init ringbuffer for uniform vars
m_uniformVarBufferMemoryIsCoherent = false;
if (memoryManager->CreateBuffer2(UNIFORMVAR_RINGBUFFER_SIZE, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT, m_uniformVarBuffer, m_uniformVarBufferMemory))
if (memoryManager->CreateBuffer(UNIFORMVAR_RINGBUFFER_SIZE, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT, m_uniformVarBuffer, m_uniformVarBufferMemory))
m_uniformVarBufferMemoryIsCoherent = true;
else if (memoryManager->CreateBuffer2(UNIFORMVAR_RINGBUFFER_SIZE, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, m_uniformVarBuffer, m_uniformVarBufferMemory))
else if (memoryManager->CreateBuffer(UNIFORMVAR_RINGBUFFER_SIZE, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, m_uniformVarBuffer, m_uniformVarBufferMemory))
m_uniformVarBufferMemoryIsCoherent = true; // unified memory
else if (memoryManager->CreateBuffer2(UNIFORMVAR_RINGBUFFER_SIZE, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, m_uniformVarBuffer, m_uniformVarBufferMemory))
else if (memoryManager->CreateBuffer(UNIFORMVAR_RINGBUFFER_SIZE, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, m_uniformVarBuffer, m_uniformVarBufferMemory))
m_uniformVarBufferMemoryIsCoherent = true;
else
{
memoryManager->CreateBuffer2(UNIFORMVAR_RINGBUFFER_SIZE, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, m_uniformVarBuffer, m_uniformVarBufferMemory);
memoryManager->CreateBuffer(UNIFORMVAR_RINGBUFFER_SIZE, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, m_uniformVarBuffer, m_uniformVarBufferMemory);
}
if (!m_uniformVarBufferMemoryIsCoherent)
@ -596,6 +630,31 @@ VulkanRenderer::~VulkanRenderer()
m_pipeline_cache_semaphore.notify();
m_pipeline_cache_save_thread.join();
vkDestroyPipelineCache(m_logicalDevice, m_pipeline_cache, nullptr);
if(!m_backbufferBlitDescriptorSetCache.empty())
{
std::vector<VkDescriptorSet> freeVector;
freeVector.reserve(m_backbufferBlitDescriptorSetCache.size());
std::transform(m_backbufferBlitDescriptorSetCache.begin(), m_backbufferBlitDescriptorSetCache.end(), std::back_inserter(freeVector), [](auto& i) {
return i.second;
});
vkFreeDescriptorSets(m_logicalDevice, m_descriptorPool, freeVector.size(), freeVector.data());
}
vkDestroyDescriptorPool(m_logicalDevice, m_descriptorPool, nullptr);
for(auto& i : m_backbufferBlitPipelineCache)
{
vkDestroyPipeline(m_logicalDevice, i.second, nullptr);
}
m_backbufferBlitPipelineCache = {};
if(m_occlusionQueries.queryPool != VK_NULL_HANDLE)
vkDestroyQueryPool(m_logicalDevice, m_occlusionQueries.queryPool, nullptr);
vkDestroyDescriptorSetLayout(m_logicalDevice, m_swapchainDescriptorSetLayout, nullptr);
// shut down imgui
ImGui_ImplVulkan_Shutdown();
@ -608,10 +667,6 @@ VulkanRenderer::~VulkanRenderer()
memoryManager->DeleteBuffer(m_xfbRingBuffer, m_xfbRingBufferMemory);
memoryManager->DeleteBuffer(m_occlusionQueries.bufferQueryResults, m_occlusionQueries.memoryQueryResults);
memoryManager->DeleteBuffer(m_bufferCache, m_bufferCacheMemory);
// texture memory
// todo
// upload buffers
// todo
m_padSwapchainInfo = nullptr;
m_mainSwapchainInfo = nullptr;
@ -634,12 +689,20 @@ VulkanRenderer::~VulkanRenderer()
it = VK_NULL_HANDLE;
}
for(auto& sem : m_commandBufferSemaphores)
{
vkDestroySemaphore(m_logicalDevice, sem, nullptr);
sem = VK_NULL_HANDLE;
}
if (m_pipelineLayout != VK_NULL_HANDLE)
vkDestroyPipelineLayout(m_logicalDevice, m_pipelineLayout, nullptr);
if (m_commandPool != VK_NULL_HANDLE)
vkDestroyCommandPool(m_logicalDevice, m_commandPool, nullptr);
VKRObjectSampler::DestroyCache();
// destroy debug callback
if (m_debugCallback)
{
@ -647,6 +710,12 @@ VulkanRenderer::~VulkanRenderer()
vkDestroyDebugUtilsMessengerEXT(m_instance, m_debugCallback, nullptr);
}
while(!m_destructionQueue.empty())
ProcessDestructionQueue();
// destroy memory manager
memoryManager.reset();
// destroy instance, devices
if (m_instance != VK_NULL_HANDLE)
{
@ -658,9 +727,6 @@ VulkanRenderer::~VulkanRenderer()
vkDestroyInstance(m_instance, nullptr);
}
// destroy memory manager
delete memoryManager;
// crashes?
//glslang::FinalizeProcess();
}
@ -791,7 +857,14 @@ void VulkanRenderer::HandleScreenshotRequest(LatteTextureView* texView, bool pad
VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = memRequirements.size;
allocInfo.memoryTypeIndex = memoryManager->FindMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
uint32 memIndex;
bool foundMemory = memoryManager->FindMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, memIndex);
if(!foundMemory)
{
cemuLog_log(LogType::Force, "Screenshot request failed due to incompatible vulkan memory types.");
return;
}
allocInfo.memoryTypeIndex = memIndex;
if (vkAllocateMemory(m_logicalDevice, &allocInfo, nullptr, &imageMemory) != VK_SUCCESS)
{
@ -1047,6 +1120,13 @@ VkDeviceCreateInfo VulkanRenderer::CreateDeviceCreateInfo(const std::vector<VkDe
used_extensions.emplace_back(VK_KHR_DYNAMIC_RENDERING_EXTENSION_NAME);
if (m_featureControl.deviceExtensions.shader_float_controls)
used_extensions.emplace_back(VK_KHR_SHADER_FLOAT_CONTROLS_EXTENSION_NAME);
if (m_featureControl.deviceExtensions.depth_clip_enable)
used_extensions.emplace_back(VK_EXT_DEPTH_CLIP_ENABLE_EXTENSION_NAME);
if (m_featureControl.deviceExtensions.present_wait)
{
used_extensions.emplace_back(VK_KHR_PRESENT_ID_EXTENSION_NAME);
used_extensions.emplace_back(VK_KHR_PRESENT_WAIT_EXTENSION_NAME);
}
VkDeviceCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
@ -1143,7 +1223,9 @@ bool VulkanRenderer::CheckDeviceExtensionSupport(const VkPhysicalDevice device,
info.deviceExtensions.synchronization2 = isExtensionAvailable(VK_KHR_SYNCHRONIZATION_2_EXTENSION_NAME);
info.deviceExtensions.shader_float_controls = isExtensionAvailable(VK_KHR_SHADER_FLOAT_CONTROLS_EXTENSION_NAME);
info.deviceExtensions.dynamic_rendering = false; // isExtensionAvailable(VK_KHR_DYNAMIC_RENDERING_EXTENSION_NAME);
info.deviceExtensions.depth_clip_enable = isExtensionAvailable(VK_EXT_DEPTH_CLIP_ENABLE_EXTENSION_NAME);
// dynamic rendering doesn't provide any benefits for us right now. Driver implementations are very unoptimized as of Feb 2022
info.deviceExtensions.present_wait = isExtensionAvailable(VK_KHR_PRESENT_WAIT_EXTENSION_NAME) && isExtensionAvailable(VK_KHR_PRESENT_ID_EXTENSION_NAME);
// check for framedebuggers
info.debugMarkersSupported = false;
@ -1513,37 +1595,35 @@ void VulkanRenderer::DeleteNullObjects()
void VulkanRenderer::ImguiInit()
{
if (m_imguiRenderPass == VK_NULL_HANDLE)
{
// TODO: renderpass swapchain format may change between srgb and rgb -> need reinit
VkAttachmentDescription colorAttachment = {};
colorAttachment.format = m_mainSwapchainInfo->m_surfaceFormat.format;
colorAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
colorAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
colorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
colorAttachment.initialLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
colorAttachment.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
VkRenderPass prevRenderPass = m_imguiRenderPass;
VkAttachmentReference colorAttachmentRef = {};
colorAttachmentRef.attachment = 0;
colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
VkSubpassDescription subpass = {};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentRef;
VkAttachmentDescription colorAttachment = {};
colorAttachment.format = m_mainSwapchainInfo->m_surfaceFormat.format;
colorAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
colorAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
colorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
colorAttachment.initialLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
colorAttachment.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
VkRenderPassCreateInfo renderPassInfo = {};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = 1;
renderPassInfo.pAttachments = &colorAttachment;
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;
const auto result = vkCreateRenderPass(m_logicalDevice, &renderPassInfo, nullptr, &m_imguiRenderPass);
if (result != VK_SUCCESS)
throw VkException(result, "can't create imgui renderpass");
}
VkAttachmentReference colorAttachmentRef = {};
colorAttachmentRef.attachment = 0;
colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
VkSubpassDescription subpass = {};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentRef;
VkRenderPassCreateInfo renderPassInfo = {};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = 1;
renderPassInfo.pAttachments = &colorAttachment;
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;
const auto result = vkCreateRenderPass(m_logicalDevice, &renderPassInfo, nullptr, &m_imguiRenderPass);
if (result != VK_SUCCESS)
throw VkException(result, "can't create imgui renderpass");
ImGui_ImplVulkan_InitInfo info{};
info.Instance = m_instance;
@ -1557,6 +1637,9 @@ void VulkanRenderer::ImguiInit()
info.ImageCount = info.MinImageCount;
ImGui_ImplVulkan_Init(&info, m_imguiRenderPass);
if (prevRenderPass != VK_NULL_HANDLE)
vkDestroyRenderPass(GetLogicalDevice(), prevRenderPass, nullptr);
}
void VulkanRenderer::Initialize()
@ -1569,6 +1652,7 @@ void VulkanRenderer::Initialize()
void VulkanRenderer::Shutdown()
{
DeleteFontTextures();
Renderer::Shutdown();
SubmitCommandBuffer();
WaitDeviceIdle();
@ -1769,7 +1853,6 @@ void VulkanRenderer::ImguiEnd()
vkCmdEndRenderPass(m_state.currentCommandBuffer);
}
std::vector<LatteTextureVk*> g_imgui_textures; // TODO manage better
ImTextureID VulkanRenderer::GenerateTexture(const std::vector<uint8>& data, const Vector2i& size)
{
try
@ -1799,6 +1882,7 @@ void VulkanRenderer::DeleteTexture(ImTextureID id)
void VulkanRenderer::DeleteFontTextures()
{
WaitDeviceIdle();
ImGui_ImplVulkan_DestroyFontsTexture();
}
@ -1837,7 +1921,7 @@ void VulkanRenderer::InitFirstCommandBuffer()
vkResetFences(m_logicalDevice, 1, &m_cmd_buffer_fences[m_commandBufferIndex]);
VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
vkBeginCommandBuffer(m_state.currentCommandBuffer, &beginInfo);
vkCmdSetViewport(m_state.currentCommandBuffer, 0, 1, &m_state.currentViewport);
@ -1855,6 +1939,7 @@ void VulkanRenderer::ProcessFinishedCommandBuffers()
if (fenceStatus == VK_SUCCESS)
{
ProcessDestructionQueue();
m_uniformVarBufferReadIndex = m_cmdBufferUniformRingbufIndices[m_commandBufferSyncIndex];
m_commandBufferSyncIndex = (m_commandBufferSyncIndex + 1) % m_commandBuffers.size();
memoryManager->cleanupBuffers(m_countCommandBufferFinished);
m_countCommandBufferFinished++;
@ -1948,6 +2033,7 @@ void VulkanRenderer::SubmitCommandBuffer(VkSemaphore signalSemaphore, VkSemaphor
cemuLog_logDebug(LogType::Force, "Vulkan: Waiting for available command buffer...");
WaitForNextFinishedCommandBuffer();
}
m_cmdBufferUniformRingbufIndices[nextCmdBufferIndex] = m_cmdBufferUniformRingbufIndices[m_commandBufferIndex];
m_commandBufferIndex = nextCmdBufferIndex;
@ -1957,7 +2043,7 @@ void VulkanRenderer::SubmitCommandBuffer(VkSemaphore signalSemaphore, VkSemaphor
VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
vkBeginCommandBuffer(m_state.currentCommandBuffer, &beginInfo);
// make sure some states are set for this command buffer
@ -2478,9 +2564,8 @@ VkPipeline VulkanRenderer::backbufferBlit_createGraphicsPipeline(VkDescriptorSet
hash += (uint64)(chainInfo.m_usesSRGB);
hash += ((uint64)padView) << 1;
static std::unordered_map<uint64, VkPipeline> s_pipeline_cache;
const auto it = s_pipeline_cache.find(hash);
if (it != s_pipeline_cache.cend())
const auto it = m_backbufferBlitPipelineCache.find(hash);
if (it != m_backbufferBlitPipelineCache.cend())
return it->second;
std::vector<VkPipelineShaderStageCreateInfo> shaderStages;
@ -2542,10 +2627,18 @@ VkPipeline VulkanRenderer::backbufferBlit_createGraphicsPipeline(VkDescriptorSet
colorBlending.blendConstants[2] = 0.0f;
colorBlending.blendConstants[3] = 0.0f;
VkPushConstantRange pushConstantRange{
.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT,
.offset = 0,
.size = 3 * sizeof(float) * 2 // 3 vec2's
};
VkPipelineLayoutCreateInfo pipelineLayoutInfo{};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = 1;
pipelineLayoutInfo.pSetLayouts = &descriptorLayout;
pipelineLayoutInfo.pushConstantRangeCount = 1;
pipelineLayoutInfo.pPushConstantRanges = &pushConstantRange;
VkResult result = vkCreatePipelineLayout(m_logicalDevice, &pipelineLayoutInfo, nullptr, &m_pipelineLayout);
if (result != VK_SUCCESS)
@ -2576,7 +2669,7 @@ VkPipeline VulkanRenderer::backbufferBlit_createGraphicsPipeline(VkDescriptorSet
throw std::runtime_error(fmt::format("Failed to create graphics pipeline: {}", result));
}
s_pipeline_cache[hash] = pipeline;
m_backbufferBlitPipelineCache[hash] = pipeline;
m_pipeline_cache_semaphore.notify();
return pipeline;
@ -2695,11 +2788,21 @@ void VulkanRenderer::SwapBuffer(bool mainWindow)
ClearColorImageRaw(chainInfo.m_swapchainImages[chainInfo.swapchainImageIndex], 0, 0, clearColor, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_PRESENT_SRC_KHR);
}
const size_t currentFrameCmdBufferID = GetCurrentCommandBufferId();
VkSemaphore presentSemaphore = chainInfo.m_presentSemaphores[chainInfo.swapchainImageIndex];
SubmitCommandBuffer(presentSemaphore); // submit all command and signal semaphore
cemu_assert_debug(m_numSubmittedCmdBuffers > 0);
// wait for the previous frame to finish rendering
WaitCommandBufferFinished(m_commandBufferIDOfPrevFrame);
m_commandBufferIDOfPrevFrame = currentFrameCmdBufferID;
chainInfo.WaitAvailableFence();
VkPresentIdKHR presentId = {};
VkPresentInfoKHR presentInfo = {};
presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
presentInfo.swapchainCount = 1;
@ -2709,6 +2812,24 @@ void VulkanRenderer::SwapBuffer(bool mainWindow)
presentInfo.waitSemaphoreCount = 1;
presentInfo.pWaitSemaphores = &presentSemaphore;
// if present_wait is available and enabled, add frame markers to present requests
// and limit the number of queued present operations
if (m_featureControl.deviceExtensions.present_wait && chainInfo.m_maxQueued > 0)
{
presentId.sType = VK_STRUCTURE_TYPE_PRESENT_ID_KHR;
presentId.swapchainCount = 1;
presentId.pPresentIds = &chainInfo.m_presentId;
presentInfo.pNext = &presentId;
if(chainInfo.m_queueDepth >= chainInfo.m_maxQueued)
{
uint64 waitFrameId = chainInfo.m_presentId - chainInfo.m_queueDepth;
vkWaitForPresentKHR(m_logicalDevice, chainInfo.m_swapchain, waitFrameId, 40'000'000);
chainInfo.m_queueDepth--;
}
}
VkResult result = vkQueuePresentKHR(m_presentQueue, &presentInfo);
if (result < 0 && result != VK_ERROR_OUT_OF_DATE_KHR)
{
@ -2717,6 +2838,12 @@ void VulkanRenderer::SwapBuffer(bool mainWindow)
if(result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR)
chainInfo.m_shouldRecreate = true;
if(result >= 0)
{
chainInfo.m_queueDepth++;
chainInfo.m_presentId++;
}
chainInfo.hasDefinedSwapchainImage = false;
chainInfo.swapchainImageIndex = -1;
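To make the pacing concrete (field names from this diff, numbers illustrative, assuming the id counter starts at 1): each successful present attaches the current m_presentId and then increments both m_presentId and m_queueDepth. With m_maxQueued = 2, presents 1 and 2 are queued without waiting; before queuing present 3 the depth has reached the limit, so the renderer calls vkWaitForPresentKHR for id 3 - 2 = 1, i.e. it blocks until the frame two presents back has actually been shown, then decrements the depth and queues the new present. This bounds the number of in-flight presented frames to m_maxQueued, and the 40'000'000 ns timeout caps the wait at 40 ms so a stalled presentation engine cannot hang the render thread indefinitely.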
@ -2839,9 +2966,6 @@ void VulkanRenderer::DrawBackbufferQuad(LatteTextureView* texView, RendererOutpu
LatteTextureViewVk* texViewVk = (LatteTextureViewVk*)texView;
draw_endRenderPass();
if (clearBackground)
ClearColorbuffer(padView);
// barrier for input texture
VkMemoryBarrier memoryBarrier{};
memoryBarrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
@ -2878,11 +3002,40 @@ void VulkanRenderer::DrawBackbufferQuad(LatteTextureView* texView, RendererOutpu
vkCmdBeginRenderPass(m_state.currentCommandBuffer, &renderPassInfo, VK_SUBPASS_CONTENTS_INLINE);
if (clearBackground)
{
VkClearAttachment clearAttachment{};
clearAttachment.clearValue = {0,0,0,0};
clearAttachment.colorAttachment = 0;
clearAttachment.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
VkClearRect clearExtent = {{{0,0},chainInfo.m_actualExtent}, 0, 1};
vkCmdClearAttachments(m_state.currentCommandBuffer, 1, &clearAttachment, 1, &clearExtent);
}
vkCmdBindPipeline(m_state.currentCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
m_state.currentPipeline = pipeline;
vkCmdBindDescriptorSets(m_state.currentCommandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, m_pipelineLayout, 0, 1, &descriptSet, 0, nullptr);
// update push constants
Vector2f pushData[3];
// textureSrcResolution
sint32 effectiveWidth, effectiveHeight;
texView->baseTexture->GetEffectiveSize(effectiveWidth, effectiveHeight, 0);
pushData[0] = {(float)effectiveWidth, (float)effectiveHeight};
// nativeResolution
pushData[1] = {
(float)texViewVk->baseTexture->width,
(float)texViewVk->baseTexture->height,
};
// outputResolution
pushData[2] = {(float)imageWidth,(float)imageHeight};
vkCmdPushConstants(m_state.currentCommandBuffer, m_pipelineLayout, VK_SHADER_STAGE_FRAGMENT_BIT, 0, sizeof(float) * 2 * 3, &pushData);
vkCmdDraw(m_state.currentCommandBuffer, 6, 1, 0, 0);
vkCmdEndRenderPass(m_state.currentCommandBuffer);
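The push-constant data written above must match the 3 * sizeof(float) * 2 range declared for the blit pipeline layout. A small struct makes the 24-byte layout explicit; the struct and field names are ours and assume Vector2f is a plain pair of floats, which the existing pushData[3] usage already relies on:
struct BackbufferBlitPushConstants
{
	Vector2f textureSrcResolution; // effective (possibly upscaled) size of the source texture
	Vector2f nativeResolution;     // unscaled texture dimensions
	Vector2f outputResolution;     // dimensions of the output image being drawn to
};
static_assert(sizeof(BackbufferBlitPushConstants) == 3 * sizeof(float) * 2);
// equivalent upload: vkCmdPushConstants(cmd, m_pipelineLayout, VK_SHADER_STAGE_FRAGMENT_BIT, 0, sizeof(BackbufferBlitPushConstants), &pc);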
@ -2923,9 +3076,8 @@ VkDescriptorSet VulkanRenderer::backbufferBlit_createDescriptorSet(VkDescriptorS
hash += (uint64)texViewVk->GetViewRGBA();
hash += (uint64)texViewVk->GetDefaultTextureSampler(useLinearTexFilter);
static std::unordered_map<uint64, VkDescriptorSet> s_set_cache;
const auto it = s_set_cache.find(hash);
if (it != s_set_cache.cend())
const auto it = m_backbufferBlitDescriptorSetCache.find(hash);
if (it != m_backbufferBlitDescriptorSetCache.cend())
return it->second;
VkDescriptorSetAllocateInfo allocInfo = {};
@ -2956,7 +3108,7 @@ VkDescriptorSet VulkanRenderer::backbufferBlit_createDescriptorSet(VkDescriptorS
vkUpdateDescriptorSets(m_logicalDevice, 1, &descriptorWrites, 0, nullptr);
performanceMonitor.vk.numDescriptorSamplerTextures.increment();
s_set_cache[hash] = result;
m_backbufferBlitDescriptorSetCache[hash] = result;
return result;
}
@ -3089,7 +3241,8 @@ VkDescriptorSetInfo::~VkDescriptorSetInfo()
performanceMonitor.vk.numDescriptorDynUniformBuffers.decrement(statsNumDynUniformBuffers);
performanceMonitor.vk.numDescriptorStorageBuffers.decrement(statsNumStorageBuffers);
VulkanRenderer::GetInstance()->ReleaseDestructibleObject(m_vkObjDescriptorSet);
auto renderer = VulkanRenderer::GetInstance();
renderer->ReleaseDestructibleObject(m_vkObjDescriptorSet);
m_vkObjDescriptorSet = nullptr;
}
@ -3491,13 +3644,13 @@ void VulkanRenderer::buffer_bindUniformBuffer(LatteConst::ShaderType shaderType,
switch (shaderType)
{
case LatteConst::ShaderType::Vertex:
dynamicOffsetInfo.shaderUB[VulkanRendererConst::SHADER_STAGE_INDEX_VERTEX].unformBufferOffset[bufferIndex] = offset;
dynamicOffsetInfo.shaderUB[VulkanRendererConst::SHADER_STAGE_INDEX_VERTEX].uniformBufferOffset[bufferIndex] = offset;
break;
case LatteConst::ShaderType::Geometry:
dynamicOffsetInfo.shaderUB[VulkanRendererConst::SHADER_STAGE_INDEX_GEOMETRY].unformBufferOffset[bufferIndex] = offset;
dynamicOffsetInfo.shaderUB[VulkanRendererConst::SHADER_STAGE_INDEX_GEOMETRY].uniformBufferOffset[bufferIndex] = offset;
break;
case LatteConst::ShaderType::Pixel:
dynamicOffsetInfo.shaderUB[VulkanRendererConst::SHADER_STAGE_INDEX_FRAGMENT].unformBufferOffset[bufferIndex] = offset;
dynamicOffsetInfo.shaderUB[VulkanRendererConst::SHADER_STAGE_INDEX_FRAGMENT].uniformBufferOffset[bufferIndex] = offset;
break;
default:
cemu_assert_debug(false);
@ -3599,7 +3752,7 @@ void VulkanRenderer::bufferCache_copyStreamoutToMainBuffer(uint32 srcOffset, uin
void VulkanRenderer::AppendOverlayDebugInfo()
{
ImGui::Text("--- Vulkan info ---");
ImGui::Text("--- Vulkan debug info ---");
ImGui::Text("GfxPipelines %u", performanceMonitor.vk.numGraphicPipelines.get());
ImGui::Text("DescriptorSets %u", performanceMonitor.vk.numDescriptorSets.get());
ImGui::Text("DS ImgSamplers %u", performanceMonitor.vk.numDescriptorSamplerTextures.get());
@ -3607,6 +3760,7 @@ void VulkanRenderer::AppendOverlayDebugInfo()
ImGui::Text("DS StorageBuf %u", performanceMonitor.vk.numDescriptorStorageBuffers.get());
ImGui::Text("Images %u", performanceMonitor.vk.numImages.get());
ImGui::Text("ImageView %u", performanceMonitor.vk.numImageViews.get());
ImGui::Text("ImageSampler %u", performanceMonitor.vk.numSamplers.get());
ImGui::Text("RenderPass %u", performanceMonitor.vk.numRenderPass.get());
ImGui::Text("Framebuffer %u", performanceMonitor.vk.numFramebuffer.get());
m_spinlockDestructionQueue.lock();
@ -3616,7 +3770,7 @@ void VulkanRenderer::AppendOverlayDebugInfo()
ImGui::Text("BeginRP/f %u", performanceMonitor.vk.numBeginRenderpassPerFrame.get());
ImGui::Text("Barriers/f %u", performanceMonitor.vk.numDrawBarriersPerFrame.get());
ImGui::Text("--- Cache info ---");
ImGui::Text("--- Cache debug info ---");
uint32 bufferCacheHeapSize = 0;
uint32 bufferCacheAllocationSize = 0;
@ -3636,7 +3790,7 @@ void VulkanRenderer::AppendOverlayDebugInfo()
ImGui::SameLine(60.0f);
ImGui::Text("%06uKB / %06uKB Buffers: %u", ((uint32)(totalSize - freeSize) + 1023) / 1024, ((uint32)totalSize + 1023) / 1024, (uint32)numBuffers);
memoryManager->getIndexAllocator().GetStats(numBuffers, totalSize, freeSize);
memoryManager->GetIndexAllocator().GetStats(numBuffers, totalSize, freeSize);
ImGui::Text("Index");
ImGui::SameLine(60.0f);
ImGui::Text("%06uKB / %06uKB Buffers: %u", ((uint32)(totalSize - freeSize) + 1023) / 1024, ((uint32)totalSize + 1023) / 1024, (uint32)numBuffers);
@ -3652,7 +3806,7 @@ void VKRDestructibleObject::flagForCurrentCommandBuffer()
bool VKRDestructibleObject::canDestroy()
{
if (refCount > 0)
if (m_refCount > 0)
return false;
return VulkanRenderer::GetInstance()->HasCommandBufferFinished(m_lastCmdBufferId);
}
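For context on the pattern used throughout this file: objects derived from VKRDestructibleObject are not deleted directly. A sketch of the intended sequence, with the call sites simplified and the local name obj illustrative:
// while recording: remember the command buffer that may still reference the object
obj->flagForCurrentCommandBuffer();
// when the owner is done with it: queue it instead of deleting it
VulkanRenderer::GetInstance()->ReleaseDestructibleObject(obj);
// once a command buffer's fence signals, ProcessDestructionQueue() walks the queue
// and deletes every object whose canDestroy() now returns true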
@ -3693,6 +3847,111 @@ VKRObjectTextureView::~VKRObjectTextureView()
performanceMonitor.vk.numImageViews.decrement();
}
static uint64 CalcHashSamplerCreateInfo(const VkSamplerCreateInfo& info)
{
uint64 h = 0xcbf29ce484222325ULL;
auto fnvHashCombine = [](uint64_t &h, auto val) {
using T = decltype(val);
static_assert(sizeof(T) <= 8);
uint64_t val64 = 0;
std::memcpy(&val64, &val, sizeof(val));
h ^= val64;
h *= 0x100000001b3ULL;
};
cemu_assert_debug(info.sType == VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO);
fnvHashCombine(h, info.flags);
fnvHashCombine(h, info.magFilter);
fnvHashCombine(h, info.minFilter);
fnvHashCombine(h, info.mipmapMode);
fnvHashCombine(h, info.addressModeU);
fnvHashCombine(h, info.addressModeV);
fnvHashCombine(h, info.addressModeW);
fnvHashCombine(h, info.mipLodBias);
fnvHashCombine(h, info.anisotropyEnable);
if(info.anisotropyEnable == VK_TRUE)
fnvHashCombine(h, info.maxAnisotropy);
fnvHashCombine(h, info.compareEnable);
if(info.compareEnable == VK_TRUE)
fnvHashCombine(h, info.compareOp);
fnvHashCombine(h, info.minLod);
fnvHashCombine(h, info.maxLod);
fnvHashCombine(h, info.borderColor);
fnvHashCombine(h, info.unnormalizedCoordinates);
// handle custom border color
VkBaseOutStructure* ext = (VkBaseOutStructure*)info.pNext;
while(ext)
{
if(ext->sType == VK_STRUCTURE_TYPE_SAMPLER_CUSTOM_BORDER_COLOR_CREATE_INFO_EXT)
{
auto* extInfo = (VkSamplerCustomBorderColorCreateInfoEXT*)ext;
fnvHashCombine(h, extInfo->customBorderColor.uint32[0]);
fnvHashCombine(h, extInfo->customBorderColor.uint32[1]);
fnvHashCombine(h, extInfo->customBorderColor.uint32[2]);
fnvHashCombine(h, extInfo->customBorderColor.uint32[3]);
}
else
{
cemu_assert_unimplemented();
}
ext = ext->pNext;
}
return h;
}
std::unordered_map<uint64, VKRObjectSampler*> VKRObjectSampler::s_samplerCache;
VKRObjectSampler::VKRObjectSampler(VkSamplerCreateInfo* samplerInfo)
{
auto* vulkanRenderer = VulkanRenderer::GetInstance();
if (vkCreateSampler(vulkanRenderer->GetLogicalDevice(), samplerInfo, nullptr, &m_sampler) != VK_SUCCESS)
vulkanRenderer->UnrecoverableError("Failed to create texture sampler");
performanceMonitor.vk.numSamplers.increment();
m_hash = CalcHashSamplerCreateInfo(*samplerInfo);
}
VKRObjectSampler::~VKRObjectSampler()
{
vkDestroySampler(VulkanRenderer::GetInstance()->GetLogicalDevice(), m_sampler, nullptr);
performanceMonitor.vk.numSamplers.decrement();
// remove from cache
auto it = s_samplerCache.find(m_hash);
if(it != s_samplerCache.end())
s_samplerCache.erase(it);
}
void VKRObjectSampler::RefCountReachedZero()
{
VulkanRenderer::GetInstance()->ReleaseDestructibleObject(this);
}
VKRObjectSampler* VKRObjectSampler::GetOrCreateSampler(VkSamplerCreateInfo* samplerInfo)
{
auto* vulkanRenderer = VulkanRenderer::GetInstance();
uint64 hash = CalcHashSamplerCreateInfo(*samplerInfo);
auto it = s_samplerCache.find(hash);
if (it != s_samplerCache.end())
{
auto* sampler = it->second;
return sampler;
}
auto* sampler = new VKRObjectSampler(samplerInfo);
s_samplerCache[hash] = sampler;
return sampler;
}
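A minimal usage sketch of the sampler cache above, based only on what is visible in this diff; the incRef()/decRef() calls follow the ref-counting used by the other VKR objects in this file, and treating m_sampler as accessible through a getter is an assumption:
VkSamplerCreateInfo samplerInfo{};
samplerInfo.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
samplerInfo.magFilter = VK_FILTER_LINEAR;
samplerInfo.minFilter = VK_FILTER_LINEAR;
samplerInfo.addressModeU = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
samplerInfo.addressModeV = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
samplerInfo.addressModeW = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
// identical create-infos hash to the same value, so repeat calls return the cached object
VKRObjectSampler* sampler = VKRObjectSampler::GetOrCreateSampler(&samplerInfo);
sampler->incRef(); // keep it alive while referenced by a descriptor set
// ... use the underlying VkSampler in descriptor writes ...
sampler->decRef(); // when the count reaches zero, RefCountReachedZero() queues delayed destruction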
void VKRObjectSampler::DestroyCache()
{
// assuming all other objects which depend on vkSampler are destroyed, this cache should also have been emptied already
// but just to be sure let's still clear the cache
cemu_assert_debug(s_samplerCache.empty());
while (!s_samplerCache.empty())
{
VKRObjectSampler* sampler = s_samplerCache.begin()->second;
cemu_assert_debug(sampler->m_refCount == 0);
s_samplerCache.erase(s_samplerCache.begin()); // erase first so the destructor's own lookup-and-erase becomes a no-op
delete sampler;
}
}
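
Hypothetical call site (not part of this diff) showing how the cache is meant to be consumed: the caller fills a VkSamplerCreateInfo and receives either an existing VKRObjectSampler or a freshly created one, while the destructor removes the entry from s_samplerCache once the ref-counted object is eventually destroyed.

VkSamplerCreateInfo info{};
info.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
info.magFilter = VK_FILTER_LINEAR;
info.minFilter = VK_FILTER_LINEAR;
info.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
info.addressModeU = VK_SAMPLER_ADDRESS_MODE_REPEAT;
info.addressModeV = VK_SAMPLER_ADDRESS_MODE_REPEAT;
info.addressModeW = VK_SAMPLER_ADDRESS_MODE_REPEAT;

// returns a shared, deduplicated sampler object; its lifetime is handled by the
// existing ref-count / deferred-destruction machinery, not by the caller
VKRObjectSampler* sampler = VKRObjectSampler::GetOrCreateSampler(&info);
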
VKRObjectRenderPass::VKRObjectRenderPass(AttachmentInfo_t& attachmentInfo, sint32 colorAttachmentCount)
{
// generate helper hash for pipeline state
@@ -3860,33 +4119,36 @@ VKRObjectFramebuffer::~VKRObjectFramebuffer()
VKRObjectPipeline::VKRObjectPipeline()
{
// todo
}
void VKRObjectPipeline::setPipeline(VkPipeline newPipeline)
void VKRObjectPipeline::SetPipeline(VkPipeline newPipeline)
{
cemu_assert_debug(pipeline == VK_NULL_HANDLE);
pipeline = newPipeline;
if(newPipeline != VK_NULL_HANDLE)
if (m_pipeline == newPipeline)
return;
cemu_assert_debug(m_pipeline == VK_NULL_HANDLE); // replacing an already assigned pipeline is not intended
if(m_pipeline == VK_NULL_HANDLE && newPipeline != VK_NULL_HANDLE)
performanceMonitor.vk.numGraphicPipelines.increment();
else if(m_pipeline != VK_NULL_HANDLE && newPipeline == VK_NULL_HANDLE)
performanceMonitor.vk.numGraphicPipelines.decrement();
m_pipeline = newPipeline;
}
VKRObjectPipeline::~VKRObjectPipeline()
{
auto vkr = VulkanRenderer::GetInstance();
if (pipeline != VK_NULL_HANDLE)
if (m_pipeline != VK_NULL_HANDLE)
{
vkDestroyPipeline(vkr->GetLogicalDevice(), pipeline, nullptr);
vkDestroyPipeline(vkr->GetLogicalDevice(), m_pipeline, nullptr);
performanceMonitor.vk.numGraphicPipelines.decrement();
}
if (vertexDSL != VK_NULL_HANDLE)
vkDestroyDescriptorSetLayout(vkr->GetLogicalDevice(), vertexDSL, nullptr);
if (pixelDSL != VK_NULL_HANDLE)
vkDestroyDescriptorSetLayout(vkr->GetLogicalDevice(), pixelDSL, nullptr);
if (geometryDSL != VK_NULL_HANDLE)
vkDestroyDescriptorSetLayout(vkr->GetLogicalDevice(), geometryDSL, nullptr);
if (pipeline_layout != VK_NULL_HANDLE)
vkDestroyPipelineLayout(vkr->GetLogicalDevice(), pipeline_layout, nullptr);
if (m_vertexDSL != VK_NULL_HANDLE)
vkDestroyDescriptorSetLayout(vkr->GetLogicalDevice(), m_vertexDSL, nullptr);
if (m_pixelDSL != VK_NULL_HANDLE)
vkDestroyDescriptorSetLayout(vkr->GetLogicalDevice(), m_pixelDSL, nullptr);
if (m_geometryDSL != VK_NULL_HANDLE)
vkDestroyDescriptorSetLayout(vkr->GetLogicalDevice(), m_geometryDSL, nullptr);
if (m_pipelineLayout != VK_NULL_HANDLE)
vkDestroyPipelineLayout(vkr->GetLogicalDevice(), m_pipelineLayout, nullptr);
}
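
The rewritten SetPipeline keeps the numGraphicPipelines counter consistent with the number of non-null handles actually owned: it increments only when a null slot gains a pipeline and decrements only when a live pipeline is cleared. The invariant, reduced to a standalone sketch with illustrative names:

#include <vulkan/vulkan.h>

// counter mirrors the number of live (non-null) pipelines owned by the object
void SetPipelineSketch(VkPipeline& slot, VkPipeline newPipeline, int& counter)
{
	if (slot == newPipeline)
		return;                                         // no state change
	if (slot == VK_NULL_HANDLE && newPipeline != VK_NULL_HANDLE)
		++counter;                                      // gained a pipeline
	else if (slot != VK_NULL_HANDLE && newPipeline == VK_NULL_HANDLE)
		--counter;                                      // lost a pipeline
	slot = newPipeline;
}
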
VKRObjectDescriptorSet::VKRObjectDescriptorSet()

@@ -137,8 +137,8 @@ class VulkanRenderer : public Renderer
public:
// memory management
VKRMemoryManager* memoryManager{};
VKRMemoryManager* GetMemoryManager() const { return memoryManager; };
std::unique_ptr<VKRMemoryManager> memoryManager;
VKRMemoryManager* GetMemoryManager() const { return memoryManager.get(); };
VkSupportedFormatInfo_t m_supportedFormatInfo;
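
The memory manager switches from a raw owning pointer to std::unique_ptr, with the getter still handing out a raw, non-owning pointer so existing call sites keep working. A minimal illustration of the pattern (placeholder types, not the real classes):

#include <memory>

struct VKRMemoryManagerStub {}; // placeholder for the real VKRMemoryManager

class RendererSketch
{
public:
	RendererSketch() : memoryManager(std::make_unique<VKRMemoryManagerStub>()) {}
	// call sites keep using a raw, non-owning pointer
	VKRMemoryManagerStub* GetMemoryManager() const { return memoryManager.get(); }
private:
	std::unique_ptr<VKRMemoryManagerStub> memoryManager; // owned, freed automatically
};
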
@@ -328,8 +328,9 @@ public:
RendererShader* shader_create(RendererShader::ShaderType type, uint64 baseHash, uint64 auxHash, const std::string& source, bool isGameShader, bool isGfxPackShader) override;
void* indexData_reserveIndexMemory(uint32 size, uint32& offset, uint32& bufferIndex) override;
void indexData_uploadIndexMemory(uint32 offset, uint32 size) override;
IndexAllocation indexData_reserveIndexMemory(uint32 size) override;
void indexData_releaseIndexMemory(IndexAllocation& allocation) override;
void indexData_uploadIndexMemory(IndexAllocation& allocation) override;
// externally callable
void GetTextureFormatInfoVK(Latte::E_GX2SURFFMT format, bool isDepth, Latte::E_DIM dim, sint32 width, sint32 height, FormatInfoVK* formatInfoOut);
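
The index-data interface also changes shape: instead of returning an offset and buffer index through out-parameters, indexData_reserveIndexMemory now returns an IndexAllocation handle that is later passed to the upload call and to the new explicit release call. The real IndexAllocation type is defined in the shared Renderer interface and is not shown in this hunk; the flow below is a hedged sketch with assumed field names:

#include <cstdint>

// Assumed shape for illustration only; the actual struct lives in Renderer.h.
struct IndexAllocationSketch
{
	void*    mem;          // CPU-writable destination for index data
	uint32_t bufferOffset; // where the data ends up inside the index buffer
};

// Hypothetical caller flow:
//   IndexAllocation alloc = renderer->indexData_reserveIndexMemory(sizeInBytes);
//   std::memcpy(alloc.mem, indices, sizeInBytes);
//   renderer->indexData_uploadIndexMemory(alloc);
//   ... record the draw that consumes the indices ...
//   renderer->indexData_releaseIndexMemory(alloc);   // new: explicit release
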
@@ -450,6 +451,8 @@ private:
bool synchronization2 = false; // VK_KHR_synchronization2
bool dynamic_rendering = false; // VK_KHR_dynamic_rendering
bool shader_float_controls = false; // VK_KHR_shader_float_controls
bool present_wait = false; // VK_KHR_present_wait
bool depth_clip_enable = false; // VK_EXT_depth_clip_enable
}deviceExtensions;
struct
@@ -457,7 +460,7 @@ private:
bool shaderRoundingModeRTEFloat32{ false };
}shaderFloatControls; // from VK_KHR_shader_float_controls
struct
struct
{
bool debug_utils = false; // VK_EXT_DEBUG_UTILS
}instanceExtensions;
@@ -581,6 +584,8 @@ private:
std::shared_mutex m_pipeline_cache_save_mutex;
std::thread m_pipeline_cache_save_thread;
VkPipelineCache m_pipeline_cache{ nullptr };
std::unordered_map<uint64, VkPipeline> m_backbufferBlitPipelineCache;
std::unordered_map<uint64, VkDescriptorSet> m_backbufferBlitDescriptorSetCache;
VkPipelineLayout m_pipelineLayout{nullptr};
VkCommandPool m_commandPool{ nullptr };
@@ -590,6 +595,7 @@ private:
bool m_uniformVarBufferMemoryIsCoherent{false};
uint8* m_uniformVarBufferPtr = nullptr;
uint32 m_uniformVarBufferWriteIndex = 0;
uint32 m_uniformVarBufferReadIndex = 0;
// transform feedback ringbuffer
VkBuffer m_xfbRingBuffer = VK_NULL_HANDLE;
@@ -635,6 +641,8 @@ private:
size_t m_commandBufferIndex = 0; // current buffer being filled
size_t m_commandBufferSyncIndex = 0; // latest buffer that finished execution (updated on submit)
size_t m_commandBufferIDOfPrevFrame = 0;
std::array<size_t, kCommandBufferPoolSize> m_cmdBufferUniformRingbufIndices {}; // index in the uniform ringbuffer
std::array<VkFence, kCommandBufferPoolSize> m_cmd_buffer_fences;
std::array<VkCommandBuffer, kCommandBufferPoolSize> m_commandBuffers;
std::array<VkSemaphore, kCommandBufferPoolSize> m_commandBufferSemaphores;
@@ -657,7 +665,7 @@ private:
uint32 uniformVarBufferOffset[VulkanRendererConst::SHADER_STAGE_INDEX_COUNT];
struct
{
uint32 unformBufferOffset[LATTE_NUM_MAX_UNIFORM_BUFFERS];
uint32 uniformBufferOffset[LATTE_NUM_MAX_UNIFORM_BUFFERS];
}shaderUB[VulkanRendererConst::SHADER_STAGE_INDEX_COUNT];
}dynamicOffsetInfo{};
@@ -855,7 +863,7 @@ private:
memBarrier.pNext = nullptr;
VkPipelineStageFlags srcStages = VK_PIPELINE_STAGE_TRANSFER_BIT;
VkPipelineStageFlags dstStages = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
VkPipelineStageFlags dstStages = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
memBarrier.srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
memBarrier.dstAccessMask = 0;
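
The destination stage mask for this post-transfer barrier changes from VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT to VK_PIPELINE_STAGE_ALL_COMMANDS_BIT; current synchronization guidance discourages TOP_OF_PIPE/BOTTOM_OF_PIPE as barrier stages, and ALL_COMMANDS states the intent (order the transfer against all subsequent work) explicitly. For context, such a global memory barrier would typically be recorded like this (a sketch, assuming the surrounding helper passes these values to vkCmdPipelineBarrier):

vkCmdPipelineBarrier(
	commandBuffer,
	srcStages,            // VK_PIPELINE_STAGE_TRANSFER_BIT
	dstStages,            // VK_PIPELINE_STAGE_ALL_COMMANDS_BIT
	0,                    // no dependency flags
	1, &memBarrier,       // one global VkMemoryBarrier
	0, nullptr,           // no buffer memory barriers
	0, nullptr);          // no image memory barriers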

Some files were not shown because too many files have changed in this diff.