- 07 Feb, 2018 4 commits
-
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324496 | yroux | 2018-02-07 19:27:25 +0100 (Wed, 07 Feb 2018) | 9 lines [asan] Fix filename size on linux platforms. This is a a fix for: https://bugs.llvm.org/show_bug.cgi?id=35996 Use filename limits from system headers to be synchronized with what LD_PRELOAD can handle. Differential Revision: https://reviews.llvm.org/D42900 ``` --------------------------------------------------------------------- llvm-svn: 324506
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324467 | atanasyan | 2018-02-07 11:02:49 +0100 (Wed, 07 Feb 2018) | 9 lines [ELF][MIPS] Ignore incorrect version definition index for _gp_disp symbol MIPS BFD linker puts _gp_disp symbol into DSO files and assigns zero version definition index to it. This value means 'unversioned local symbol' while _gp_disp is a section global symbol. We have to handle this bug in the LLD because BFD linker is used for building MIPS toolchain libraries. Differential revision: https://reviews.llvm.org/D42486 ``` --------------------------------------------------------------------- ------------------------------------------------------------------------ r324468 | atanasyan | 2018-02-07 11:14:22 +0100 (Wed, 07 Feb 2018) | 1 line [ELF][MIPS] Mark the test as required MIPS target support. NFC ------------------------------------------------------------------------ llvm-svn: 324471
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324422 | efriedma | 2018-02-07 00:00:17 +0100 (Wed, 07 Feb 2018) | 16 lines [LivePhysRegs] Fix handling of return instructions. See D42509 for the original version of this. Basically, there are two significant changes to behavior here: - addLiveOuts always adds all pristine registers (even if a block has no successors). - addLiveOuts and addLiveOutsNoPristines always add all callee-saved registers for return blocks (including conditional return blocks). I cleaned up the functions a bit to make it clear these properties hold. Differential Revision: https://reviews.llvm.org/D42655 ``` --------------------------------------------------------------------- llvm-svn: 324466
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324439 | compnerd | 2018-02-07 02:55:08 +0100 (Wed, 07 Feb 2018) | 5 lines AST: support SwiftCC on MS ABI Microsoft has reserved the identifier 'S' as the swift calling convention. Decorate the symbols appropriately. This enables swift on Windows. ``` --------------------------------------------------------------------- llvm-svn: 324460
-
- 06 Feb, 2018 5 commits
-
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324153 | ericwf | 2018-02-02 23:39:59 +0100 (Fri, 02 Feb 2018) | 6 lines Fix has_unique_object_representation after Clang commit r324134. Clang previously reported an empty union as having a unique object representation. This was incorrect and was fixed in a recent Clang commit. This patch fixes the libc++ tests. ``` --------------------------------------------------------------------- llvm-svn: 324345
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324246 | mzeren-vmw | 2018-02-05 16:59:00 +0100 (Mon, 05 Feb 2018) | 33 lines [clang-format] Re-land: Fixup #include guard indents after parseFile() Summary: When a preprocessor indent closes after the last line of normal code we do not correctly fixup include guard indents. For example: #ifndef HEADER_H #define HEADER_H #if 1 int i; # define A 0 #endif #endif incorrectly reformats to: #ifndef HEADER_H #define HEADER_H #if 1 int i; # define A 0 # endif #endif To resolve this issue we must fixup levels after parseFile(). Delaying the fixup introduces a new state, so consolidate include guard search state into an enum. Reviewers: krasimir, klimek Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D42035 ``` --------------------------------------------------------------------- llvm-svn: 324331
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323904 | mzeren-vmw | 2018-01-31 21:05:50 +0100 (Wed, 31 Jan 2018) | 34 lines [clang-format] Align preprocessor comments with # Summary: r312125, which introduced preprocessor indentation, shipped with a known issue where "indentation of comments immediately before indented preprocessor lines is toggled on each run". For example these two forms toggle: #ifndef HEADER_H #define HEADER_H #if 1 // comment # define A 0 #endif #endif #ifndef HEADER_H #define HEADER_H #if 1 // comment # define A 0 #endif #endif This happens because we check vertical alignment against the '#' yet indent to the level of the 'define'. This patch resolves this issue by aligning against the '#'. Reviewers: krasimir, klimek, djasper Reviewed By: krasimir Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D42408 ``` --------------------------------------------------------------------- llvm-svn: 324329 -
Hans Wennborg authored
```--------------------------------------------------------------------- r324234 | kamil | 2018-02-05 14:16:22 +0100 (Mon, 05 Feb 2018) | 29 lines Fix a crash in *NetBSD::Factory::Launch Summary: We cannot call process_up->SetState() inside the NativeProcessNetBSD::Factory::Launch function because it triggers a NULL pointer deference. The generic code for launching a process in: GDBRemoteCommunicationServerLLGS::LaunchProcess sets the m_debugged_process_up pointer after a successful call to m_process_factory.Launch(). If we attempt to call process_up->SetState() inside a platform specific Launch function we end up dereferencing a NULL pointer in NativeProcessProtocol::GetCurrentThreadID(). Use the proper call process_up->SetState(,false) that sets notify_delegates to false. Sponsored by <The NetBSD Foundation> Reviewers: labath, joerg Reviewed By: labath Subscribers: lldb-commits Differential Revision: https://reviews.llvm.org/D42868 ``` --------------------------------------------------------------------- llvm-svn: 324327
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324251 | kamil | 2018-02-05 18:12:23 +0100 (Mon, 05 Feb 2018) | 14 lines Sync PlatformNetBSD.cpp with Linux Summary: Various changes in logging from log->Printf() to generic LLDB_LOG(). Sponsored by <The NetBSD Foundation> Reviewers: labath, joerg Reviewed By: labath Subscribers: llvm-commits, lldb-commits Differential Revision: https://reviews.llvm.org/D42912 ``` --------------------------------------------------------------------- llvm-svn: 324326
-
- 05 Feb, 2018 8 commits
-
-
Krzysztof Parzyszek authored
llvm-svn: 324248
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324059 | mstorsjo | 2018-02-02 07:22:35 +0100 (Fri, 02 Feb 2018) | 21 lines [MinGW] Emit typeinfo locally for dllimported classes without key functions This fixes building Qt as shared libraries with clang in MinGW mode; previously subclasses of the QObjectData class (in other DLLs than the base DLL) failed to find the typeinfo symbols (that neither were emitted in the base DLL nor in the DLL containing the subclass). If the virtual destructor in the newly added testcase wouldn't be pure (or if there'd be another non-pure virtual method), it'd be a key function and things would work out even before this change. Make sure to locally emit the typeinfo for these classes as well. This matches what GCC does in this specific testcase. This fixes the root issue that spawned PR35146. (The difference to GCC that is initially described in that bug still is present though.) Differential Revision: https://reviews.llvm.org/D42641 ``` --------------------------------------------------------------------- llvm-svn: 324219
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324039 | matze | 2018-02-02 01:08:19 +0100 (Fri, 02 Feb 2018) | 33 lines SplitKit: Fix liveness recomputation in some remat cases. Example situation: ``` BB0: %0 = ... use %0 ; ... condjump BB1 jmp BB2 BB1: %0 = ... ; rematerialized def from above (from earlier split step) jmp BB2 BB2: ; ... use %0 ``` %0 will have a live interval with 3 value numbers (for the BB0, BB1 and BB2 parts). Now SplitKit tries and succeeds in rematerializing the value number in BB2 (This only works because it is a secondary split so SplitKit is can trace this back to a single original def). We need to recompute all live ranges affected by a value number that we rematerialize. The case that we missed before is that when the value that is rematerialized is at a join (Phi VNI) then we also have to recompute liveness for the predecessor VNIs. rdar://35699130 Differential Revision: https://reviews.llvm.org/D42667 ``` --------------------------------------------------------------------- llvm-svn: 324218
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324002 | ctopper | 2018-02-01 21:48:50 +0100 (Thu, 01 Feb 2018) | 7 lines [DAGCombiner] When folding (insert_subvector undef, (bitcast (extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits. Fixes PR36199 Differential Revision: https://reviews.llvm.org/D42809 ``` --------------------------------------------------------------------- llvm-svn: 324216
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323935 | rsmith | 2018-02-01 01:28:36 +0100 (Thu, 01 Feb 2018) | 5 lines PR36181: Teach CodeGen to properly ignore requests to emit dependent entities. Previously, friend function definitions within class templates slipped through the gaps and caused the MS mangler to assert. ``` --------------------------------------------------------------------- llvm-svn: 324215
-
Hans Wennborg authored
```--------------------------------------------------------------------- r324134 | ericwf | 2018-02-02 21:30:39 +0100 (Fri, 02 Feb 2018) | 14 lines Make __has_unique_object_representations reject empty union types. Summary: Clang incorrectly reports empty unions as having a unique object representation. However, this is not correct since `sizeof(EmptyUnion) == 1` AKA it has 8 bits of padding. Therefore it should be treated the same as an empty struct and report `false`. @erichkeane also suggested this fix should be merged into the 6.0 release branch, so the initial release of `__has_unique_object_representations` is as bug-free as possible. Reviewers: erichkeane, rsmith, aaron.ballman, majnemer Reviewed By: erichkeane Subscribers: cfe-commits, erichkeane Differential Revision: https://reviews.llvm.org/D42863 ``` --------------------------------------------------------------------- llvm-svn: 324213
-
Hans Wennborg authored
Patch by Amara Emerson. Differential Revision: https://reviews.llvm.org/D42860 llvm-svn: 324212
-
Hans Wennborg authored
Patch by Amara Emerson. Differential Revision: https://reviews.llvm.org/D42861 llvm-svn: 324211
-
- 02 Feb, 2018 14 commits
-
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323908 | mareko | 2018-01-31 21:18:04 +0100 (Wed, 31 Jan 2018) | 7 lines AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 ``` --------------------------------------------------------------------- llvm-svn: 324103 -
Hans Wennborg authored
```--------------------------------------------------------------------- r324043 | ruiu | 2018-02-02 01:31:05 +0100 (Fri, 02 Feb 2018) | 6 lines Fix typo: --nopie -> --no-pie. --nopie was a typo. GNU gold doesn't recognize it. It is also inconsistent with other options that have --foo and --no-foo. Differential Revision: https://reviews.llvm.org/D42825 ``` --------------------------------------------------------------------- llvm-svn: 324100
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323643 | jdevlieghere | 2018-01-29 13:10:32 +0100 (Mon, 29 Jan 2018) | 16 lines [Sparc] Account for bias in stack readjustment Summary: This was broken long ago in D12208, which failed to account for the fact that 64-bit SPARC uses a stack bias of 2047, and it is the *unbiased* value which should be aligned, not the biased one. This was seen to be an issue with Rust. Patch by: jrtc27 (James Clarke) Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: jacob_hansen, JDevlieghere, fhahn, fedor.sergeev, llvm-commits Differential Revision: https://reviews.llvm.org/D39425 ``` --------------------------------------------------------------------- llvm-svn: 324090
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323909 | mareko | 2018-01-31 21:18:11 +0100 (Wed, 31 Jan 2018) | 13 lines AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9 Summary: This enables load merging into x2, x4, which is driven by inline offsets. 6500 shaders are affected: Code Size in affected shaders: -15.14 % Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42078 ``` --------------------------------------------------------------------- llvm-svn: 324089
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323907 | mareko | 2018-01-31 21:17:52 +0100 (Wed, 31 Jan 2018) | 11 lines [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs Summary: !amdgpu.uniform needs to be preserved for AMDGPU, otherwise bad things happen. Reviewers: arsenm, nhaehnle, jingyue, broune, majnemer, bjarke.roune, dblaikie Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D42744 ``` --------------------------------------------------------------------- ------------------------------------------------------------------------ r323913 | mareko | 2018-01-31 21:49:19 +0100 (Wed, 31 Jan 2018) | 1 line [SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test ------------------------------------------------------------------------ llvm-svn: 324088
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323536 | arichardson | 2018-01-26 16:56:14 +0100 (Fri, 26 Jan 2018) | 11 lines [MIPS] Don't crash on unsized extern types with -mgpopt Summary: This fixes an assertion when building the FreeBSD MIPS64 kernel. Reviewers: atanasyan, sdardis, emaste Reviewed By: sdardis Subscribers: krytarowski, llvm-commits Differential Revision: https://reviews.llvm.org/D42571 ``` --------------------------------------------------------------------- llvm-svn: 324087
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323759 | spatel | 2018-01-30 14:53:59 +0100 (Tue, 30 Jan 2018) | 10 lines [DSE] make sure memory is not modified before partial store merging (PR36129) We missed a critical check in D30703. We must make sure that no intermediate store is sitting between the stores that we want to merge. This should fix: https://bugs.llvm.org/show_bug.cgi?id=36129 Differential Revision: https://reviews.llvm.org/D42663 ``` --------------------------------------------------------------------- llvm-svn: 324086
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323781 | sdardis | 2018-01-30 17:24:10 +0100 (Tue, 30 Jan 2018) | 15 lines [mips] Fix incorrect sign extension for fpowi libcall PR36061 showed that during the expansion of ISD::FPOWI, that there was an incorrect zero extension of the integer argument which for MIPS64 would then give incorrect results. Address this with the existing mechanism for correcting sign extensions. This resolves PR36061. Thanks to James Cowgill for reporting the issue! Reviewers: atanasyan, hfinkel Differential Revision: https://reviews.llvm.org/D42537 ``` --------------------------------------------------------------------- llvm-svn: 324085
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323857 | rogfer01 | 2018-01-31 10:23:43 +0100 (Wed, 31 Jan 2018) | 19 lines [ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR
↔ GPR. In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do copies CPSR↔ GPR but not all Thumb1 targets implement them. The schedule can attempt, before attempting a copy, to clone the instructions but it does not currently do that for nodes with input glue. In this patch we introduce a target-hook to let the hook decide if a glued machinenode is still eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS . As a follow-up of this change we should actually implement the copies for the Thumb1 targets that do implement them and restrict the hook to the targets that can't really do such copy as these clones are not ideal. This change fixes PR35836. Differential Revision: https://reviews.llvm.org/D42051 ``` --------------------------------------------------------------------- llvm-svn: 324082 -
Hans Wennborg authored
```--------------------------------------------------------------------- r323915 | chandlerc | 2018-01-31 21:56:37 +0100 (Wed, 31 Jan 2018) | 17 lines [x86] Make the retpoline thunk insertion a machine function pass. Summary: This removes the need for a machine module pass using some deeply questionable hacks. This should address PR36123 which is a case where in full LTO the memory usage of a machine module pass actually ended up being significant. We should revert this on trunk as soon as we understand and fix the memory usage issue, but we should include this in any backports of retpolines themselves. Reviewers: echristo, MatzeB Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D42726 ``` --------------------------------------------------------------------- llvm-svn: 324071
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323288 | ruiu | 2018-01-24 01:26:57 +0100 (Wed, 24 Jan 2018) | 3 lines Fix retpoline PLT header size for i386. Differential Revision: https://reviews.llvm.org/D42397 ``` --------------------------------------------------------------------- llvm-svn: 324070
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 ``` --------------------------------------------------------------------- llvm-svn: 324069
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 ``` --------------------------------------------------------------------- llvm-svn: 324068
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 ``` --------------------------------------------------------------------- llvm-svn: 324067
-
- 01 Feb, 2018 1 commit
-
-
Hans Wennborg authored
llvm-svn: 323948
-
- 31 Jan, 2018 5 commits
-
-
Anastasia Stulova authored
Differential Revision: https://reviews.llvm.org/D42307 llvm-svn: 323875
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323813 | tejohnson | 2018-01-30 21:16:32 +0100 (Tue, 30 Jan 2018) | 14 lines Teach ValueMapper to use ODR uniqued types when available Summary: This is exposed during ThinLTO compilation, when we import an alias by creating a clone of the aliasee. Without this fix the debug type is unnecessarily cloned and we get a duplicate, undoing the uniquing. Fixes PR36089. Reviewers: mehdi_amini, pcc Subscribers: eraman, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D41669 ``` --------------------------------------------------------------------- llvm-svn: 323854
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323811 | mstorsjo | 2018-01-30 20:50:58 +0100 (Tue, 30 Jan 2018) | 3 lines [GlobalISel] Bail out on calls to dllimported functions Differential Revision: https://reviews.llvm.org/D42568 ``` --------------------------------------------------------------------- llvm-svn: 323853
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323810 | mstorsjo | 2018-01-30 20:50:51 +0100 (Tue, 30 Jan 2018) | 3 lines [AArch64] Properly handle dllimport of variables when using fast-isel Differential Revision: https://reviews.llvm.org/D42567 ``` --------------------------------------------------------------------- llvm-svn: 323852
-
Hans Wennborg authored
```--------------------------------------------------------------------- r322588 | eugenis | 2018-01-16 20:21:45 +0100 (Tue, 16 Jan 2018) | 9 lines [hwasan] Build runtime library with -fPIC, not -fPIE. Summary: -fPIE can not be used when building a shared library. Reviewers: alekseyshl, peter.smith Subscribers: kubamracek, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D42121 ``` --------------------------------------------------------------------- llvm-svn: 323850
-
- 30 Jan, 2018 3 commits
-
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323706 | mareko | 2018-01-30 00:19:10 +0100 (Tue, 30 Jan 2018) | 15 lines AMDGPU: Allow a SGPR for the conditional KILL operand Patch by: Bas Nieuwenhuizen Just use the _e64 variant if needed. This should be possible as per def : Pat < (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))), (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond)) > ; I don't think we can get an immediate for the other operand for which we need the second 32-bit word. https://reviews.llvm.org/D42302 ``` --------------------------------------------------------------------- llvm-svn: 323772
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323515 | fhahn | 2018-01-26 11:36:50 +0100 (Fri, 26 Jan 2018) | 7 lines [CallSiteSplitting] Fix infinite loop when recording conditions. Fix infinite loop when recording conditions by correctly marking basic blocks as visited. Fixes https://bugs.llvm.org/show_bug.cgi?id=36105 ``` --------------------------------------------------------------------- llvm-svn: 323771
-
Hans Wennborg authored
```--------------------------------------------------------------------- r323469 | ctopper | 2018-01-25 22:23:57 +0100 (Thu, 25 Jan 2018) | 3 lines [X86] Teach Intel syntax InstPrinter to print lock prefixes that have been parsed from the asm parser. The asm parser puts the lock prefix in the MCInst flags so we need to check that in addition to TSFlags. This matches what the ATT printer does. ``` --------------------------------------------------------------------- llvm-svn: 323770
-