and though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here

  1. 07 Feb, 2018 4 commits
    • Merging r324496: · 5aa8942f
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324496 | yroux | 2018-02-07 19:27:25 +0100 (Wed, 07 Feb 2018) | 9 lines
      
      [asan] Fix filename size on linux platforms.
      
      This is a fix for:
      https://bugs.llvm.org/show_bug.cgi?id=35996
      
      Use filename limits from system headers to be synchronized with what
      LD_PRELOAD can handle.
      
      Differential Revision: https://reviews.llvm.org/D42900
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324506
      5aa8942f
    • Merging r324467 and r324468: · 79ce1bb1
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324467 | atanasyan | 2018-02-07 11:02:49 +0100 (Wed, 07 Feb 2018) | 9 lines
      
      [ELF][MIPS] Ignore incorrect version definition index for _gp_disp symbol
      
      MIPS BFD linker puts _gp_disp symbol into DSO files and assigns zero
      version definition index to it. This value means 'unversioned local
      symbol' while _gp_disp is a section global symbol. We have to handle
      this bug in the LLD because BFD linker is used for building MIPS
      toolchain libraries.
      
      Differential revision: https://reviews.llvm.org/D42486
      ---------------------------------------------------------------------

      ------------------------------------------------------------------------
      r324468 | atanasyan | 2018-02-07 11:14:22 +0100 (Wed, 07 Feb 2018) | 1 line

      [ELF][MIPS] Mark the test as required MIPS target support. NFC
      ------------------------------------------------------------------------
      ```
      
      llvm-svn: 324471
      79ce1bb1
    • Merging r324422: · 43ba9d9f
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324422 | efriedma | 2018-02-07 00:00:17 +0100 (Wed, 07 Feb 2018) | 16 lines
      
      [LivePhysRegs] Fix handling of return instructions.
      
      See D42509 for the original version of this.
      
      Basically, there are two significant changes to behavior here:
      
      - addLiveOuts always adds all pristine registers (even if a block has
      no successors).
      - addLiveOuts and addLiveOutsNoPristines always add all callee-saved
      registers for return blocks (including conditional return blocks).
      
      I cleaned up the functions a bit to make it clear these properties hold.
      
      Differential Revision: https://reviews.llvm.org/D42655
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324466
      43ba9d9f
    • Merging r324439: · 0e83d687
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324439 | compnerd | 2018-02-07 02:55:08 +0100 (Wed, 07 Feb 2018) | 5 lines
      
      AST: support SwiftCC on MS ABI
      
      Microsoft has reserved the identifier 'S' as the swift calling
      convention.  Decorate the symbols appropriately.  This enables swift on
      Windows.
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324460
      0e83d687
  2. 06 Feb, 2018 5 commits
    • Merging r324153: · 34c856a8
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324153 | ericwf | 2018-02-02 23:39:59 +0100 (Fri, 02 Feb 2018) | 6 lines
      
      Fix has_unique_object_representation after Clang commit r324134.
      
      Clang previously reported an empty union as having a unique object
      representation. This was incorrect and was fixed in a recent Clang commit.
      
      This patch fixes the libc++ tests.
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324345
      34c856a8
    • Merging r324246: · a6dc176b
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324246 | mzeren-vmw | 2018-02-05 16:59:00 +0100 (Mon, 05 Feb 2018) | 33 lines
      
      [clang-format] Re-land: Fixup #include guard indents after parseFile()
      
      Summary:
      When a preprocessor indent closes after the last line of normal code we do not
      correctly fixup include guard indents. For example:
      
        #ifndef HEADER_H
        #define HEADER_H
        #if 1
        int i;
        #  define A 0
        #endif
        #endif
      
      incorrectly reformats to:
      
        #ifndef HEADER_H
        #define HEADER_H
        #if 1
        int i;
        #    define A 0
        #  endif
        #endif
      
      To resolve this issue we must fixup levels after parseFile(). Delaying
      the fixup introduces a new state, so consolidate include guard search
      state into an enum.
      
      Reviewers: krasimir, klimek
      
      Subscribers: cfe-commits
      
      Differential Revision: https://reviews.llvm.org/D42035
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324331
      a6dc176b
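For context, the preprocessor-indentation behavior exercised by this fix is opt-in. A minimal `.clang-format` sketch that enables it (assuming clang-format 6.0 or newer, where the `IndentPPDirectives` option from r312125 is available):

```yaml
# Enable preprocessor indentation; include guards are detected and
# deliberately left un-indented, which is what the post-parseFile()
# fixup in this patch repairs for the end-of-file case.
BasedOnStyle: LLVM
IndentPPDirectives: AfterHash
```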
    • Merging r323904: · 0b67f617
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323904 | mzeren-vmw | 2018-01-31 21:05:50 +0100 (Wed, 31 Jan 2018) | 34 lines
      
      [clang-format] Align preprocessor comments with #
      
      Summary:
      r312125, which introduced preprocessor indentation, shipped with a known
      issue where "indentation of comments immediately before indented
      preprocessor lines is toggled on each run". For example these two forms
      toggle:
      
        #ifndef HEADER_H
        #define HEADER_H
        #if 1
        // comment
        #   define A 0
        #endif
        #endif
      
        #ifndef HEADER_H
        #define HEADER_H
        #if 1
           // comment
        #   define A 0
        #endif
        #endif
      
      This happens because we check vertical alignment against the '#' yet
      indent to the level of the 'define'. This patch resolves this issue by
      aligning against the '#'.
      
      Reviewers: krasimir, klimek, djasper
      
      Reviewed By: krasimir
      
      Subscribers: cfe-commits
      Differential Revision: https://reviews.llvm.org/D42408
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324329
      0b67f617
    • Merging r324234: · d85785aa
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324234 | kamil | 2018-02-05 14:16:22 +0100 (Mon, 05 Feb 2018) | 29 lines
      
      Fix a crash in *NetBSD::Factory::Launch
      
      Summary:
      We cannot call process_up->SetState() inside
      the NativeProcessNetBSD::Factory::Launch
      function because it triggers a NULL pointer
      deference.
      
      The generic code for launching a process in:
      GDBRemoteCommunicationServerLLGS::LaunchProcess
      sets the m_debugged_process_up pointer after
      a successful call to  m_process_factory.Launch().
      If we attempt to call process_up->SetState()
      inside a platform specific Launch function we
      end up dereferencing a NULL pointer in
      NativeProcessProtocol::GetCurrentThreadID().
      
      Use the proper call process_up->SetState(,false)
      that sets notify_delegates to false.
      
      Sponsored by <The NetBSD Foundation>
      
      Reviewers: labath, joerg
      
      Reviewed By: labath
      
      Subscribers: lldb-commits
      
      Differential Revision: https://reviews.llvm.org/D42868
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324327
      d85785aa
    • Merging r324251: · bdb1df16
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324251 | kamil | 2018-02-05 18:12:23 +0100 (Mon, 05 Feb 2018) | 14 lines
      
      Sync PlatformNetBSD.cpp with Linux
      
      Summary:
      Various changes in logging from log->Printf() to generic LLDB_LOG().
      
      Sponsored by <The NetBSD Foundation>
      
      Reviewers: labath, joerg
      
      Reviewed By: labath
      
      Subscribers: llvm-commits, lldb-commits
      
      Differential Revision: https://reviews.llvm.org/D42912
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324326
      bdb1df16
  3. 05 Feb, 2018 8 commits
    • [Hexagon] Add release notes for 6.0.0 · e337f237
      Krzysztof Parzyszek authored
      llvm-svn: 324248
      e337f237
    • Merging r324059: · 2cf6b595
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324059 | mstorsjo | 2018-02-02 07:22:35 +0100 (Fri, 02 Feb 2018) | 21 lines
      
      [MinGW] Emit typeinfo locally for dllimported classes without key functions
      
      This fixes building Qt as shared libraries with clang in MinGW
      mode; previously subclasses of the QObjectData class (in other
      DLLs than the base DLL) failed to find the typeinfo symbols
      (that neither were emitted in the base DLL nor in the DLL
      containing the subclass).
      
      If the virtual destructor in the newly added testcase wouldn't
      be pure (or if there'd be another non-pure virtual method),
      it'd be a key function and things would work out even before this
      change. Make sure to locally emit the typeinfo for these classes
      as well.
      
      This matches what GCC does in this specific testcase.
      
      This fixes the root issue that spawned PR35146. (The difference
      to GCC that is initially described in that bug still is present
      though.)
      
      Differential Revision: https://reviews.llvm.org/D42641
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324219
      2cf6b595
    • Merging r324039: (test case modified to work around r323886 et al.) · 4d860d29
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324039 | matze | 2018-02-02 01:08:19 +0100 (Fri, 02 Feb 2018) | 33 lines
      
      SplitKit: Fix liveness recomputation in some remat cases.
      
      Example situation:
        BB0:
          %0 = ...
          use %0
          ; ...
          condjump BB1
          jmp BB2

        BB1:
          %0 = ...   ; rematerialized def from above (from earlier split step)
          jmp BB2

        BB2:
          ; ...
          use %0
      
      %0 will have a live interval with 3 value numbers (for the BB0, BB1 and
      BB2 parts). Now SplitKit tries and succeeds in rematerializing the value
      number in BB2 (This only works because it is a secondary split so
      SplitKit can trace this back to a single original def).
      
      We need to recompute all live ranges affected by a value number that we
      rematerialize. The case that we missed before is that when the value
      that is rematerialized is at a join (Phi VNI) then we also have to
      recompute liveness for the predecessor VNIs.
      
      rdar://35699130
      
      Differential Revision: https://reviews.llvm.org/D42667
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324218
      4d860d29
    • Merging r324002: · eb2e0b41
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324002 | ctopper | 2018-02-01 21:48:50 +0100 (Thu, 01 Feb 2018) | 7 lines
      
      [DAGCombiner] When folding (insert_subvector undef, (bitcast (extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output
      
      We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits.
      
      Fixes PR36199
      
      Differential Revision: https://reviews.llvm.org/D42809
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324216
      eb2e0b41
    • Merging r323935: · 8be5478f
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323935 | rsmith | 2018-02-01 01:28:36 +0100 (Thu, 01 Feb 2018) | 5 lines
      
      PR36181: Teach CodeGen to properly ignore requests to emit dependent entities.
      
      Previously, friend function definitions within class templates slipped through
      the gaps and caused the MS mangler to assert.
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324215
      8be5478f
    • Merging r324134: · e71868f8
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324134 | ericwf | 2018-02-02 21:30:39 +0100 (Fri, 02 Feb 2018) | 14 lines
      
      Make __has_unique_object_representations reject empty union types.
      
      Summary:
      Clang incorrectly reports empty unions as having a unique object representation. However, this is not correct since `sizeof(EmptyUnion) == 1` AKA it has 8 bits of padding. Therefore it should be treated the same as an empty struct and report `false`.
      
      @erichkeane also suggested this fix should be merged into the 6.0 release branch, so the initial release of `__has_unique_object_representations` is as bug-free as possible. 
      
      Reviewers: erichkeane, rsmith, aaron.ballman, majnemer
      
      Reviewed By: erichkeane
      
      Subscribers: cfe-commits, erichkeane
      
      Differential Revision: https://reviews.llvm.org/D42863
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324213
      e71868f8
    • [ReleaseNotes] Add note for the new -fexperimental-isel flag. · 3980c8df
      Hans Wennborg authored
      Patch by Amara Emerson.
      
      Differential Revision: https://reviews.llvm.org/D42860
      
      llvm-svn: 324212
      3980c8df
    • [ReleaseNotes] Add note for enabling GlobalISel for AArch64 -O0 · 8ec2d86a
      Hans Wennborg authored
      Patch by Amara Emerson.
      
      Differential Revision: https://reviews.llvm.org/D42861
      
      llvm-svn: 324211
      8ec2d86a
  4. 02 Feb, 2018 14 commits
    • Merging r323908: · 0f84e3ee
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323908 | mareko | 2018-01-31 21:18:04 +0100 (Wed, 31 Jan 2018) | 7 lines
      
      AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
      
      Reviewers: arsenm, nhaehnle
      
      Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye
      
      Differential Revision: https://reviews.llvm.org/D41663
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324103
      0f84e3ee
    • Merging r324043: · 9b8da2b9
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r324043 | ruiu | 2018-02-02 01:31:05 +0100 (Fri, 02 Feb 2018) | 6 lines
      
      Fix typo: --nopie -> --no-pie.
      
      --nopie was a typo. GNU gold doesn't recognize it. It is also
      inconsistent with other options that have --foo and --no-foo.
      
      Differential Revision: https://reviews.llvm.org/D42825
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324100
      9b8da2b9
    • Merging r323643: · 27021b8a
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323643 | jdevlieghere | 2018-01-29 13:10:32 +0100 (Mon, 29 Jan 2018) | 16 lines
      
      [Sparc] Account for bias in stack readjustment
      
      Summary: This was broken long ago in D12208, which failed to account for
      the fact that 64-bit SPARC uses a stack bias of 2047, and it is the
      *unbiased* value which should be aligned, not the biased one. This was
      seen to be an issue with Rust.
      
      Patch by: jrtc27 (James Clarke)
      
      Reviewers: jyknight, venkatra
      
      Reviewed By: jyknight
      
      Subscribers: jacob_hansen, JDevlieghere, fhahn, fedor.sergeev, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D39425
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324090
      27021b8a
    • Merging r323909: · 3b724f63
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323909 | mareko | 2018-01-31 21:18:11 +0100 (Wed, 31 Jan 2018) | 13 lines
      
      AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
      
      Summary:
      This enables load merging into x2, x4, which is driven by inline offsets.
      
      6500 shaders are affected:
      Code Size in affected shaders: -15.14 %
      
      Reviewers: arsenm, nhaehnle
      
      Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D42078
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324089
      3b724f63
    • Merging r323907 and r323913: · 9f3da91d
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323907 | mareko | 2018-01-31 21:17:52 +0100 (Wed, 31 Jan 2018) | 11 lines
      
      [SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs
      
      Summary:
      !amdgpu.uniform needs to be preserved for AMDGPU, otherwise bad things
      happen.
      
      Reviewers: arsenm, nhaehnle, jingyue, broune, majnemer, bjarke.roune, dblaikie
      
      Subscribers: wdng, tpr, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D42744
      ---------------------------------------------------------------------

      ------------------------------------------------------------------------
      r323913 | mareko | 2018-01-31 21:49:19 +0100 (Wed, 31 Jan 2018) | 1 line

      [SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test
      ------------------------------------------------------------------------
      ```
      
      llvm-svn: 324088
      9f3da91d
    • Merging r323536: · 87627246
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323536 | arichardson | 2018-01-26 16:56:14 +0100 (Fri, 26 Jan 2018) | 11 lines
      
      [MIPS] Don't crash on unsized extern types with -mgpopt
      
      Summary: This fixes an assertion when building the FreeBSD MIPS64 kernel.
      
      Reviewers: atanasyan, sdardis, emaste
      
      Reviewed By: sdardis
      
      Subscribers: krytarowski, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D42571
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324087
      87627246
    • Merging r323759: · 6291b0d9
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323759 | spatel | 2018-01-30 14:53:59 +0100 (Tue, 30 Jan 2018) | 10 lines
      
      [DSE] make sure memory is not modified before partial store merging (PR36129)
      
      We missed a critical check in D30703. We must make sure that no intermediate 
      store is sitting between the stores that we want to merge.
      
      This should fix:
      https://bugs.llvm.org/show_bug.cgi?id=36129
      
      Differential Revision: https://reviews.llvm.org/D42663
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324086
      6291b0d9
    • Merging r323781: · 7a8cd3ec
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323781 | sdardis | 2018-01-30 17:24:10 +0100 (Tue, 30 Jan 2018) | 15 lines
      
      [mips] Fix incorrect sign extension for fpowi libcall
      
      PR36061 showed that during the expansion of ISD::FPOWI, that there
      was an incorrect zero extension of the integer argument which for
      MIPS64 would then give incorrect results. Address this with the
      existing mechanism for correcting sign extensions.
      
      This resolves PR36061.
      
      Thanks to James Cowgill for reporting the issue!
      
      Reviewers: atanasyan, hfinkel
      
      Differential Revision: https://reviews.llvm.org/D42537
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324085
      7a8cd3ec
    • Merging r323857: · 00852763
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323857 | rogfer01 | 2018-01-31 10:23:43 +0100 (Wed, 31 Jan 2018) | 19 lines
      
      [ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR.
      
      In Thumb 1, with the new ADDCARRY / SUBCARRY, the scheduler may need to do
      copies CPSR ↔ GPR, but not all Thumb1 targets implement them.
      
      The scheduler can attempt, before resorting to a copy, to clone the instructions,
      but it does not currently do that for nodes with input glue. In this patch we
      introduce a target-hook to let the hook decide if a glued machinenode is still
      eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS .
      
      As a follow-up of this change we should actually implement the copies for the
      Thumb1 targets that do implement them and restrict the hook to the targets that
      can't really do such copy as these clones are not ideal.
      
      This change fixes PR35836.
      
      Differential Revision: https://reviews.llvm.org/D42051
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324082
      00852763
    • Merging r323915: · eeabe19e
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323915 | chandlerc | 2018-01-31 21:56:37 +0100 (Wed, 31 Jan 2018) | 17 lines
      
      [x86] Make the retpoline thunk insertion a machine function pass.
      
      Summary:
      This removes the need for a machine module pass using some deeply
      questionable hacks. This should address PR36123 which is a case where in
      full LTO the memory usage of a machine module pass actually ended up
      being significant.
      
      We should revert this on trunk as soon as we understand and fix the
      memory usage issue, but we should include this in any backports of
      retpolines themselves.
      
      Reviewers: echristo, MatzeB
      
      Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D42726
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324071
      eeabe19e
    • Merging r323288: · 7df01702
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323288 | ruiu | 2018-01-24 01:26:57 +0100 (Wed, 24 Jan 2018) | 3 lines
      
      Fix retpoline PLT header size for i386.
      
      Differential Revision: https://reviews.llvm.org/D42397
      ---------------------------------------------------------------------
      ```
      
      llvm-svn: 324070
      7df01702
    • Merging r323155: · f21e34b3
      Hans Wennborg authored
      ```
      ---------------------------------------------------------------------
      r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines
      
      Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection"; it is one of the two halves of Spectre.
      
      Summary:
      First, we need to explain the core of the vulnerability. Note that this
      is a very incomplete description, please see the Project Zero blog post
      for details:
      https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
      
      The basis for branch target injection is to direct speculative execution
      of the processor to some "gadget" of executable code by poisoning the
      prediction of indirect branches with the address of that gadget. The
      gadget in turn contains an operation that provides a side channel for
      reading data. Most commonly, this will look like a load of secret data
      followed by a branch on the loaded value and then a load of some
      predictable cache line. The attacker then uses timing of the processor's
      cache to determine which direction the branch took *in the speculative
      execution*, and in turn what one bit of the loaded value was. Due to the
      nature of these timing side channels and the branch predictor on Intel
      processors, this allows an attacker to leak data only accessible to
      a privileged domain (like the kernel) back into an unprivileged domain.
      
      The goal is simple: avoid generating code which contains an indirect
      branch that could have its prediction poisoned by an attacker. In many
      cases, the compiler can simply use directed conditional branches and
      a small search tree. LLVM already has support for lowering switches in
      this way and the first step of this patch is to disable jump-table
      lowering of switches and introduce a pass to rewrite explicit indirectbr
      sequences into a switch over integers.
      
      However, there is no fully general alternative to indirect calls. We
      introduce a new construct we call a "retpoline" to implement indirect
      calls in a non-speculatable way. It can be thought of loosely as
      a trampoline for indirect calls which uses the RET instruction on x86.
      Further, we arrange for a specific call->ret sequence which ensures the
      processor predicts the return to go to a controlled, known location. The
      retpoline then "smashes" the return address pushed onto the stack by the
      call with the desired target of the original indirect call. The result
      is a predicted return to the next instruction after a call (which can be
      used to trap speculative execution within an infinite loop) and an
      actual indirect branch to an arbitrary address.
      
      On 64-bit x86 ABIs, this is especially easy to do in the compiler by
      using a guaranteed scratch register to pass the target into this device.
      For 32-bit ABIs there isn't a guaranteed scratch register and so several
      different retpoline variants are introduced to use a scratch register if
      one is available in the calling convention and to otherwise use direct
      stack push/pop sequences to pass the target address.
      
      This "retpoline" mitigation is fully described in the following blog
      post: https://support.google.com/faqs/answer/7625886
      
      We also support a target feature that disables emission of the retpoline
      thunk by the compiler to allow for custom thunks if users want them.
      These are particularly useful in environments like kernels that
      routinely do hot-patching on boot and want to hot-patch their thunk to
      different code sequences. They can write this custom thunk and use
      `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
      case, on x86-64 the thunk name must be:
        __llvm_external_retpoline_r11
      or on 32-bit:
        __llvm_external_retpoline_eax
        __llvm_external_retpoline_ecx
        __llvm_external_retpoline_edx
        __llvm_external_retpoline_push
      And the target of the retpoline is passed in the named register, or in
      the case of the `push` suffix on the top of the stack via a `pushl`
      instruction.
      
      There is one other important source of indirect branches in x86 ELF
      binaries: the PLT. These patches also include support for LLD to
      generate PLT entries that perform a retpoline-style indirection.
      
      The only other indirect branches remaining that we are aware of are from
      precompiled runtimes (such as crt0.o and similar). The ones we have
      found are not really attackable, and so we have not focused on them
      here, but eventually these runtimes should also be replicated for
      retpoline-ed configurations for completeness.
      
      For kernels or other freestanding or fully static executables, the
      compiler switch `-mretpoline` is sufficient to fully mitigate this
      particular attack. For dynamic executables, you must compile *all*
      libraries with `-mretpoline` and additionally link the dynamic
      executable and all shared libraries with LLD and pass `-z retpolineplt`
      (or use similar functionality from some other linker). We strongly
      recommend also using `-z now` as non-lazy binding allows the
      retpoline-mitigated PLT to be substantially smaller.
      
      When manually applying transformations similar to `-mretpoline` to the
      Linux kernel we observed very small performance hits to applications
      running typical workloads, and relatively minor hits (approximately 2%)
      even for extremely syscall-heavy applications. This is largely due to
      the small number of indirect branches that occur in performance
      sensitive paths of the kernel.
      
      When using these patches on statically linked applications, especially
      C++ applications, you should expect to see a much more dramatic
      performance hit. For microbenchmarks that are switch, indirect-, or
      virtual-call heavy we have seen overheads ranging from 10% to 50%.
      
      However, real-world workloads exhibit substantially lower performance
      impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
      the impact of hot indirect calls (by speculatively promoting them to
      direct calls) and allow optimized search trees to be used to lower
      switches. If you need to deploy these techniques in C++ applications, we
      *strongly* recommend that you ensure all hot call targets are statically
      linked (avoiding PLT indirection) and use both PGO and ThinLTO.
      Well-tuned servers using all of these techniques saw 5% to 10% overhead
      from the use of retpoline.
      
      We will add detailed documentation covering these components in
      subsequent patches, but wanted to make the core functionality available
      as soon as possible. Happy for more code review, but we'd really like to
      get these patches landed and backported ASAP for obvious reasons. We're
      planning to backport this to both 6.0 and 5.0 release streams and get
      a 5.0 release with just this cherry picked ASAP for distros and vendors.
      
      This patch is the work of a number of people over the past month: Eric, Reid,
      Rui, and myself. I'm mailing it out as a single commit due to the time
      sensitive nature of landing this and the need to backport it. Huge thanks to
      everyone who helped out here, and everyone at Intel who helped out in
      discussions about how to craft this. Also, credit goes to Paul Turner (at
      Google, but not an LLVM contributor) for much of the underlying retpoline
      design.
      
      Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
      
      Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41723
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 324069
      f21e34b3
    • Hans Wennborg's avatar
      Merging r323155: · 64aeffbb
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines
      
      Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.
      
      Summary:
      First, we need to explain the core of the vulnerability. Note that this
      is a very incomplete description, please see the Project Zero blog post
      for details:
      https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
      
      The basis for branch target injection is to direct speculative execution
      of the processor to some "gadget" of executable code by poisoning the
      prediction of indirect branches with the address of that gadget. The
      gadget in turn contains an operation that provides a side channel for
      reading data. Most commonly, this will look like a load of secret data
      followed by a branch on the loaded value and then a load of some
      predictable cache line. The attacker then uses timing of the processor's
      cache to determine which direction the branch took *in the speculative
      execution*, and in turn what one bit of the loaded value was. Due to the
      nature of these timing side channels and the branch predictor on Intel
      processors, this allows an attacker to leak data only accessible to
      a privileged domain (like the kernel) back into an unprivileged domain.
      
      The goal is simple: avoid generating code which contains an indirect
      branch that could have its prediction poisoned by an attacker. In many
      cases, the compiler can simply use directed conditional branches and
      a small search tree. LLVM already has support for lowering switches in
      this way and the first step of this patch is to disable jump-table
      lowering of switches and introduce a pass to rewrite explicit indirectbr
      sequences into a switch over integers.
      
      However, there is no fully general alternative to indirect calls. We
      introduce a new construct we call a "retpoline" to implement indirect
      calls in a non-speculatable way. It can be thought of loosely as
      a trampoline for indirect calls which uses the RET instruction on x86.
      Further, we arrange for a specific call->ret sequence which ensures the
      processor predicts the return to go to a controlled, known location. The
      retpoline then "smashes" the return address pushed onto the stack by the
      call with the desired target of the original indirect call. The result
      is a predicted return to the next instruction after a call (which can be
      used to trap speculative execution within an infinite loop) and an
      actual indirect branch to an arbitrary address.
      
      On 64-bit x86 ABIs, this is especially easily done in the compiler by
      using a guaranteed scratch register to pass the target into this device.
      For 32-bit ABIs there isn't a guaranteed scratch register and so several
      different retpoline variants are introduced to use a scratch register if
      one is available in the calling convention and to otherwise use direct
      stack push/pop sequences to pass the target address.
      
      This "retpoline" mitigation is fully described in the following blog
      post: https://support.google.com/faqs/answer/7625886
      
      We also support a target feature that disables emission of the retpoline
      thunk by the compiler to allow for custom thunks if users want them.
      These are particularly useful in environments like kernels that
      routinely do hot-patching on boot and want to hot-patch their thunk to
      different code sequences. They can write this custom thunk and use
      `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
      case, on x86-64 the thunk name must be:
      ```
        __llvm_external_retpoline_r11
      ```
      or on 32-bit:
      ```
        __llvm_external_retpoline_eax
        __llvm_external_retpoline_ecx
        __llvm_external_retpoline_edx
        __llvm_external_retpoline_push
      ```
      The target of the retpoline is passed in the named register or, in the
      case of the `push` suffix, on the top of the stack via a `pushl`
      instruction.
      
      There is one other important source of indirect branches in x86 ELF
      binaries: the PLT. These patches also include support for LLD to
      generate PLT entries that perform a retpoline-style indirection.
      
      The only other indirect branches remaining that we are aware of are from
      precompiled runtimes (such as crt0.o and similar). The ones we have
      found are not really attackable, and so we have not focused on them
      here, but eventually these runtimes should also be replicated for
      retpoline-ed configurations for completeness.
      
      For kernels or other freestanding or fully static executables, the
      compiler switch `-mretpoline` is sufficient to fully mitigate this
      particular attack. For dynamic executables, you must compile *all*
      libraries with `-mretpoline` and additionally link the dynamic
      executable and all shared libraries with LLD and pass `-z retpolineplt`
      (or use similar functionality from some other linker). We strongly
      recommend also using `-z now` as non-lazy binding allows the
      retpoline-mitigated PLT to be substantially smaller.
      
      When manually applying transformations similar to `-mretpoline` to the
      Linux kernel, we observed very small performance hits to applications
      running typical workloads, and relatively minor hits (approximately 2%)
      even for extremely syscall-heavy applications. This is largely due to
      the small number of indirect branches that occur in performance-sensitive
      paths of the kernel.
      
      When using these patches on statically linked applications, especially
      C++ applications, you should expect to see a much more dramatic
      performance hit. For microbenchmarks that are switch-, indirect-, or
      virtual-call heavy we have seen overheads ranging from 10% to 50%.
      
      However, real-world workloads exhibit substantially lower performance
      impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
      the impact of hot indirect calls (by speculatively promoting them to
      direct calls) and allow optimized search trees to be used to lower
      switches. If you need to deploy these techniques in C++ applications, we
      *strongly* recommend that you ensure all hot call targets are statically
      linked (avoiding PLT indirection) and use both PGO and ThinLTO.
      Well-tuned servers using all of these techniques saw 5% to 10% overhead
      from the use of retpoline.
      
      We will add detailed documentation covering these components in
      subsequent patches, but wanted to make the core functionality available
      as soon as possible. Happy for more code review, but we'd really like to
      get these patches landed and backported ASAP for obvious reasons. We're
      planning to backport this to both 6.0 and 5.0 release streams and get
      a 5.0 release with just this cherry picked ASAP for distros and vendors.
      
      This patch is the work of a number of people over the past month: Eric, Reid,
      Rui, and myself. I'm mailing it out as a single commit due to the time
      sensitive nature of landing this and the need to backport it. Huge thanks to
      everyone who helped out here, and everyone at Intel who helped out in
      discussions about how to craft this. Also, credit goes to Paul Turner (at
      Google, but not an LLVM contributor) for much of the underlying retpoline
      design.
      
      Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
      
      Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41723
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 324068
      64aeffbb
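The call->ret trick described in the commit message above can be sketched as a small C program. This is a hedged illustration only: `my_retpoline_r11`, `call_indirect`, and `add_one` are invented names, the thunk mimics the shape of a retpoline for the r11 convention rather than the exact code LLVM emits, and the asm path assumes x86-64 with GNU-style inline assembly (other targets fall back to a plain, unmitigated indirect call):

```c
#include <assert.h>

typedef int (*fn_t)(int);

static int add_one(int x) { return x + 1; }

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/* A retpoline-shaped thunk: the inner call pushes the address of the
 * speculation trap; the mov then "smashes" that return address with the
 * real target held in %r11, so the ret branches to the target while the
 * return predictor points at the pause/lfence loop. (As with real
 * retpolines, this is incompatible with CET shadow stacks.) */
__asm__(".text\n"
        "my_retpoline_r11:\n"
        "  call 1f\n"
        "0:\n"                      /* speculative execution traps here */
        "  pause\n"
        "  lfence\n"
        "  jmp 0b\n"
        "1:\n"
        "  mov %r11, (%rsp)\n"      /* smash return addr with target    */
        "  ret\n");                 /* 'return' into the real target    */

static int call_indirect(fn_t fn, int arg) {
    register fn_t target __asm__("r11") = fn; /* target in the scratch reg */
    register long a __asm__("rdi") = arg;     /* first argument, SysV ABI  */
    long ret;
    /* Skip the red zone before calling from inline asm, then restore. */
    __asm__ volatile("sub $128, %%rsp\n\t"
                     "call my_retpoline_r11\n\t"
                     "add $128, %%rsp"
                     : "=a"(ret), "+r"(a), "+r"(target)
                     :
                     : "rcx", "rdx", "rsi", "r8", "r9", "r10",
                       "memory", "cc");
    return (int)ret;
}
#else
/* Fallback on other targets: an ordinary (unmitigated) indirect call. */
static int call_indirect(fn_t fn, int arg) { return fn(arg); }
#endif
```

On the x86-64 path, `call_indirect(add_one, 41)` reaches `add_one` only through a `ret` whose predicted destination is the trap loop; no `jmp *%r11`-style indirect branch is ever executed.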
    • Hans Wennborg's avatar
      Merging r323155: · 4321822f
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines
      
      Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.
      
      Summary:
      First, we need to explain the core of the vulnerability. Note that this
      is a very incomplete description, please see the Project Zero blog post
      for details:
      https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
      
      The basis for branch target injection is to direct speculative execution
      of the processor to some "gadget" of executable code by poisoning the
      prediction of indirect branches with the address of that gadget. The
      gadget in turn contains an operation that provides a side channel for
      reading data. Most commonly, this will look like a load of secret data
      followed by a branch on the loaded value and then a load of some
      predictable cache line. The attacker then uses timing of the processor's
      cache to determine which direction the branch took *in the speculative
      execution*, and in turn what one bit of the loaded value was. Due to the
      nature of these timing side channels and the branch predictor on Intel
      processors, this allows an attacker to leak data only accessible to
      a privileged domain (like the kernel) back into an unprivileged domain.
      
      The goal is simple: avoid generating code which contains an indirect
      branch that could have its prediction poisoned by an attacker. In many
      cases, the compiler can simply use directed conditional branches and
      a small search tree. LLVM already has support for lowering switches in
      this way and the first step of this patch is to disable jump-table
      lowering of switches and introduce a pass to rewrite explicit indirectbr
      sequences into a switch over integers.
      
      However, there is no fully general alternative to indirect calls. We
      introduce a new construct we call a "retpoline" to implement indirect
      calls in a non-speculatable way. It can be thought of loosely as
      a trampoline for indirect calls which uses the RET instruction on x86.
      Further, we arrange for a specific call->ret sequence which ensures the
      processor predicts the return to go to a controlled, known location. The
      retpoline then "smashes" the return address pushed onto the stack by the
      call with the desired target of the original indirect call. The result
      is a predicted return to the next instruction after a call (which can be
      used to trap speculative execution within an infinite loop) and an
      actual indirect branch to an arbitrary address.
      
      On 64-bit x86 ABIs, this is especially easily done in the compiler by
      using a guaranteed scratch register to pass the target into this device.
      For 32-bit ABIs there isn't a guaranteed scratch register and so several
      different retpoline variants are introduced to use a scratch register if
      one is available in the calling convention and to otherwise use direct
      stack push/pop sequences to pass the target address.
      
      This "retpoline" mitigation is fully described in the following blog
      post: https://support.google.com/faqs/answer/7625886
      
      We also support a target feature that disables emission of the retpoline
      thunk by the compiler to allow for custom thunks if users want them.
      These are particularly useful in environments like kernels that
      routinely do hot-patching on boot and want to hot-patch their thunk to
      different code sequences. They can write this custom thunk and use
      `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
      case, on x86-64 the thunk name must be:
      ```
        __llvm_external_retpoline_r11
      ```
      or on 32-bit:
      ```
        __llvm_external_retpoline_eax
        __llvm_external_retpoline_ecx
        __llvm_external_retpoline_edx
        __llvm_external_retpoline_push
      ```
      The target of the retpoline is passed in the named register or, in the
      case of the `push` suffix, on the top of the stack via a `pushl`
      instruction.
      
      There is one other important source of indirect branches in x86 ELF
      binaries: the PLT. These patches also include support for LLD to
      generate PLT entries that perform a retpoline-style indirection.
      
      The only other indirect branches remaining that we are aware of are from
      precompiled runtimes (such as crt0.o and similar). The ones we have
      found are not really attackable, and so we have not focused on them
      here, but eventually these runtimes should also be replicated for
      retpoline-ed configurations for completeness.
      
      For kernels or other freestanding or fully static executables, the
      compiler switch `-mretpoline` is sufficient to fully mitigate this
      particular attack. For dynamic executables, you must compile *all*
      libraries with `-mretpoline` and additionally link the dynamic
      executable and all shared libraries with LLD and pass `-z retpolineplt`
      (or use similar functionality from some other linker). We strongly
      recommend also using `-z now` as non-lazy binding allows the
      retpoline-mitigated PLT to be substantially smaller.
      
      When manually applying transformations similar to `-mretpoline` to the
      Linux kernel, we observed very small performance hits to applications
      running typical workloads, and relatively minor hits (approximately 2%)
      even for extremely syscall-heavy applications. This is largely due to
      the small number of indirect branches that occur in performance-sensitive
      paths of the kernel.
      
      When using these patches on statically linked applications, especially
      C++ applications, you should expect to see a much more dramatic
      performance hit. For microbenchmarks that are switch-, indirect-, or
      virtual-call heavy we have seen overheads ranging from 10% to 50%.
      
      However, real-world workloads exhibit substantially lower performance
      impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
      the impact of hot indirect calls (by speculatively promoting them to
      direct calls) and allow optimized search trees to be used to lower
      switches. If you need to deploy these techniques in C++ applications, we
      *strongly* recommend that you ensure all hot call targets are statically
      linked (avoiding PLT indirection) and use both PGO and ThinLTO.
      Well-tuned servers using all of these techniques saw 5% to 10% overhead
      from the use of retpoline.
      
      We will add detailed documentation covering these components in
      subsequent patches, but wanted to make the core functionality available
      as soon as possible. Happy for more code review, but we'd really like to
      get these patches landed and backported ASAP for obvious reasons. We're
      planning to backport this to both 6.0 and 5.0 release streams and get
      a 5.0 release with just this cherry picked ASAP for distros and vendors.
      
      This patch is the work of a number of people over the past month: Eric, Reid,
      Rui, and myself. I'm mailing it out as a single commit due to the time
      sensitive nature of landing this and the need to backport it. Huge thanks to
      everyone who helped out here, and everyone at Intel who helped out in
      discussions about how to craft this. Also, credit goes to Paul Turner (at
      Google, but not an LLVM contributor) for much of the underlying retpoline
      design.
      
      Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
      
      Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41723
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 324067
      4321822f
  5. 01 Feb, 2018 1 commit
  6. 31 Jan, 2018 5 commits
    • Anastasia Stulova's avatar
      [Docs] Added release notes for OpenCL. · c4a589b2
      Anastasia Stulova authored
         
      Differential Revision: https://reviews.llvm.org/D42307
      
      llvm-svn: 323875
      c4a589b2
    • Hans Wennborg's avatar
      Merging r323813: · 91d53cf3
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r323813 | tejohnson | 2018-01-30 21:16:32 +0100 (Tue, 30 Jan 2018) | 14 lines
      
      Teach ValueMapper to use ODR uniqued types when available
      
      Summary:
      This is exposed during ThinLTO compilation, when we import an alias by
      creating a clone of the aliasee. Without this fix the debug type is
      unnecessarily cloned and we get a duplicate, undoing the uniquing.
      
      Fixes PR36089.
      
      Reviewers: mehdi_amini, pcc
      
      Subscribers: eraman, JDevlieghere, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D41669
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 323854
      91d53cf3
    • Hans Wennborg's avatar
      Merging r323811: · 05bd093d
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r323811 | mstorsjo | 2018-01-30 20:50:58 +0100 (Tue, 30 Jan 2018) | 3 lines
      
      [GlobalISel] Bail out on calls to dllimported functions
      
      Differential Revision: https://reviews.llvm.org/D42568
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 323853
      05bd093d
    • Hans Wennborg's avatar
      Merging r323810: · 5dd46a0b
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r323810 | mstorsjo | 2018-01-30 20:50:51 +0100 (Tue, 30 Jan 2018) | 3 lines
      
      [AArch64] Properly handle dllimport of variables when using fast-isel
      
      Differential Revision: https://reviews.llvm.org/D42567
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 323852
      5dd46a0b
    • Hans Wennborg's avatar
      Merging r322588: · dc521008
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r322588 | eugenis | 2018-01-16 20:21:45 +0100 (Tue, 16 Jan 2018) | 9 lines
      
      [hwasan] Build runtime library with -fPIC, not -fPIE.
      
      Summary: -fPIE can not be used when building a shared library.
      
      Reviewers: alekseyshl, peter.smith
      
      Subscribers: kubamracek, llvm-commits, mgorny
      
      Differential Revision: https://reviews.llvm.org/D42121
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 323850
      dc521008
  7. 30 Jan, 2018 3 commits
    • Hans Wennborg's avatar
      Merging r323706: · 3890cc2b
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r323706 | mareko | 2018-01-30 00:19:10 +0100 (Tue, 30 Jan 2018) | 15 lines
      
      AMDGPU: Allow a SGPR for the conditional KILL operand
      
      Patch by: Bas Nieuwenhuizen
      
      Just use the _e64 variant if needed. This should be possible as per
      
      def : Pat <
        (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
        (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
      > ;
      
      I don't think we can get an immediate for the other operand for which we
      need the second 32-bit word.
      
      https://reviews.llvm.org/D42302
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 323772
      3890cc2b
    • Hans Wennborg's avatar
      Merging r323515: · 93fb7556
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r323515 | fhahn | 2018-01-26 11:36:50 +0100 (Fri, 26 Jan 2018) | 7 lines
      
      [CallSiteSplitting] Fix infinite loop when recording conditions.
      
      Fix infinite loop when recording conditions by correctly marking basic
      blocks as visited.
      
      Fixes https://bugs.llvm.org/show_bug.cgi?id=36105
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 323771
      93fb7556
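The actual fix lives in C++ inside LLVM's CallSiteSplitting pass, but the pattern it applies is generic. A hedged sketch in C (`Block` and `record_conditions` are invented stand-ins, not LLVM's data structures) shows why walking a predecessor chain without marking blocks as visited spins forever on a loop back-edge:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block graph: each block records a single predecessor
 * link, which may form a cycle via a loop back-edge. */
struct Block {
    int id;
    struct Block *pred;
};

/* Walk the predecessor chain, recording each block at most once.
 * Without the visited check, a cyclic chain would never terminate --
 * the essence of the infinite loop fixed in r323515. Returns the
 * number of blocks recorded into 'visited'. */
static size_t record_conditions(struct Block *bb, struct Block **visited,
                                size_t max_seen) {
    size_t n = 0;
    while (bb != NULL && n < max_seen) {
        int seen = 0;
        for (size_t i = 0; i < n; i++)
            if (visited[i] == bb)
                seen = 1;            /* already visited: stop the walk  */
        if (seen)
            break;
        visited[n++] = bb;           /* mark visited *before* moving on */
        bb = bb->pred;
    }
    return n;
}
```

On a three-block cycle the walk records each block once and stops when it re-encounters the starting block, instead of looping indefinitely.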
    • Hans Wennborg's avatar
      Merging r323469: · 2a3963f2
      Hans Wennborg authored
      ```---------------------------------------------------------------------
      r323469 | ctopper | 2018-01-25 22:23:57 +0100 (Thu, 25 Jan 2018) | 3 lines
      
      [X86] Teach Intel syntax InstPrinter to print lock prefixes that have been parsed from the asm parser.
      
      The asm parser puts the lock prefix in the MCInst flags so we need to check that in addition to TSFlags. This matches what the ATT printer does.
      ```
      
      ---------------------------------------------------------------------
      
      llvm-svn: 323770
      2a3963f2