    x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() · 0b9ccc0a
    Peter Zijlstra authored
    Nadav Amit reported that commit:
    
      b59167ac ("x86/percpu: Fix this_cpu_read()")
    
    added a bunch of constraints to all sorts of code; and while some of
    that was correct and desired, some of that seems superfluous.
    
    The thing is, the this_cpu_*() operations are defined to be IRQ-safe.
    This means the values are subject to change from IRQs, and thus must
    be reloaded on every access.
    
    Also, the generic form:
    
      local_irq_save()
      __this_cpu_read()
      local_irq_restore()
    
    would not allow the re-use of previous values either; if nothing else,
    the barrier()s implied by local_irq_*() prevent it.
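
    As a rough user-space sketch of why the generic form cannot re-use a
    previously loaded value: everything below is a hypothetical model, not
    kernel code. The model_*() names, the 'slot' variable and the
    'irqs_off_model' flag are made up for illustration; only the
    compiler-barrier idiom matches what barrier() does under GCC/Clang.

```c
#include <assert.h>

/* Compiler barrier: forbids caching memory values across this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int irqs_off_model;	/* hypothetical stand-in for the IRQ-disabled state */
static int slot;		/* hypothetical stand-in for a per-CPU variable */

/* Models __this_cpu_read(): relies on the caller having disabled IRQs. */
static inline int model_raw_read(void)
{
	assert(irqs_off_model);
	return slot;
}

/* Models the generic this_cpu_read(): save, raw read, restore. */
static inline int model_this_cpu_read(void)
{
	int flags = irqs_off_model;	/* models local_irq_save() */
	int v;

	irqs_off_model = 1;
	barrier();
	v = model_raw_read();
	barrier();
	irqs_off_model = flags;		/* models local_irq_restore() */
	return v;
}
```

    Because each call passes through two barriers, two back-to-back
    model_this_cpu_read() calls each perform their own load of 'slot'.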
    
    Which raises the point that percpu_from_op() and the others also need
    that volatile.
    
    OTOH the __this_cpu_*() operations are not IRQ-safe; they assume the
    caller has already disabled preemption/IRQs, and could thus be allowed
    more room for optimization.
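
    One way to picture the distinction, as a sketch only: the two helpers
    below differ solely in the volatile qualifier on the asm statement.
    The names model_this_cpu_read(), model_raw_cpu_read() and 'slot' are
    hypothetical, and this assumes GCC or Clang; the x86-64 'movq' is used
    where available, with a plain-C approximation elsewhere. It is not the
    kernel's percpu.h implementation.

```c
static long slot = 1;	/* hypothetical stand-in for a per-CPU variable */

/* Models this_cpu_read(): volatile asm, so every call emits its own load. */
static inline long model_this_cpu_read(void)
{
	long v;
#if defined(__x86_64__)
	__asm__ __volatile__("movq %1, %0" : "=r" (v) : "m" (slot));
#else
	v = *(volatile long *)&slot;	/* portable approximation */
#endif
	return v;
}

/*
 * Models __this_cpu_read(): plain asm, so the compiler is free to merge
 * two identical reads when nothing it can see has modified 'slot'.
 */
static inline long model_raw_cpu_read(void)
{
	long v;
#if defined(__x86_64__)
	__asm__("movq %1, %0" : "=r" (v) : "m" (slot));
#else
	v = slot;			/* portable approximation */
#endif
	return v;
}
```

    Both helpers return the same value in straight-line code; the
    difference only shows up in how many loads the compiler is obliged to
    keep when a function reads the same variable more than once.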
    
    This makes the this_cpu_*() vs __this_cpu_*() behaviour more
    consistent with other architectures.
    
      $ ./compare.sh defconfig-build defconfig-build1 vmlinux.o
      x86_pmu_cancel_txn                                         80         71   -9,+0
      __text_poke                                               919        964   +45,+0
      do_user_addr_fault                                       1082       1058   -24,+0
      __do_page_fault                                          1194       1178   -16,+0
      do_exit                                                  2995       3027   -43,+75
      process_one_work                                         1008        989   -67,+48
      finish_task_switch                                        524        505   -19,+0
      __schedule_bug                                            103         98   -59,+54
      __schedule_bug                                            103         98   -59,+54
      __sched_setscheduler                                     2015       2030   +15,+0
      freeze_processes                                          203        230   +31,-4
      rcu_gp_kthread_wake                                       106         99   -7,+0
      rcu_core                                                 1841       1834   -7,+0
      call_timer_fn                                             298        286   -12,+0
      can_stop_idle_tick                                        146        139   -31,+24
      perf_pending_event                                        253        239   -14,+0
      shmem_alloc_page                                          209        213   +4,+0
      __alloc_pages_slowpath                                   3284       3269   -15,+0
      umount_tree                                               671        694   +23,+0
      advance_transaction                                       803        798   -5,+0
      con_put_char                                               71         51   -20,+0
      xhci_urb_enqueue                                         1302       1295   -7,+0
      xhci_urb_enqueue                                         1302       1295   -7,+0
      tcp_sacktag_write_queue                                  2130       2075   -55,+0
      tcp_try_undo_loss                                         229        208   -21,+0
      tcp_v4_inbound_md5_hash                                   438        411   -31,+4
      tcp_v4_inbound_md5_hash                                   438        411   -31,+4
      tcp_v6_inbound_md5_hash                                   469        411   -33,-25
      tcp_v6_inbound_md5_hash                                   469        411   -33,-25
      restricted_pointer                                        434        420   -14,+0
      irq_exit                                                  162        154   -8,+0
      get_perf_callchain                                        638        624   -14,+0
      rt_mutex_trylock                                          169        156   -13,+0
      avc_has_extended_perms                                   1092       1089   -3,+0
      avc_has_perm_noaudit                                      309        306   -3,+0
      __perf_sw_event                                           138        122   -16,+0
      perf_swevent_get_recursion_context                        116        102   -14,+0
      __local_bh_enable_ip                                       93         72   -21,+0
      xfrm_input                                               4175       4161   -14,+0
      avc_has_perm                                              446        443   -3,+0
      vm_events_fold_cpu                                         57         56   -1,+0
      vfree                                                      68         61   -7,+0
      freeze_processes                                          203        230   +31,-4
      _local_bh_enable                                           44         30   -14,+0
      ip_do_fragment                                           1982       1944   -38,+0
      do_exit                                                  2995       3027   -43,+75
      __do_softirq                                              742        724   -18,+0
      cpu_init                                                 1510       1489   -21,+0
      account_system_time                                        80         79   -1,+0
                                                   total   12985281   12984819   -742,+280
    Reported-by: Nadav Amit <nadav.amit@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lkml.kernel.org/r/20181206112433.GB13675@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>