    [PATCH] smptimers, old BH removal, tq-cleanup · dd140c87
    Ingo Molnar authored
    This is the smptimers patch plus the removal of old BHs and a rewrite of
    task-queue handling.
    
    Basically with the removal of TIMER_BH i think the time is right to get
    rid of old BHs forever, and to do a massive cleanup of all related
    fields.  The following five basic 'execution context' abstractions are
    supported by the kernel:
    
      - hardirq
      - softirq
      - tasklet
      - keventd-driven task-queues
      - process contexts
    
    I've done the following cleanups/simplifications to task-queues:
    
     - removed the ability to define your own task-queue; what remains is
       the ability to schedule_task() a given task to keventd, and to flush
       all pending tasks.
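
To illustrate the reduced interface (queue a task, flush everything pending), here is a minimal user-space sketch. It is not the kernel code: struct task, schedule_task_model() and flush_tasks_model() are hypothetical names made up for this example, and the single global FIFO merely stands in for the keventd queue.

```c
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's task-queue types. */
struct task {
    void (*routine)(void *data);  /* function to run later */
    void *data;                   /* argument passed to routine */
    int queued;                   /* nonzero while the task is pending */
    struct task *next;
};

/* One global FIFO, standing in for the single keventd-driven queue. */
static struct task *queue_head;
static struct task **queue_tail = &queue_head;

/* Queue a task for deferred execution. Returns 1 if it was queued,
 * 0 if it was already pending (a task is never queued twice). */
static int schedule_task_model(struct task *t)
{
    if (t->queued)
        return 0;
    t->queued = 1;
    t->next = NULL;
    *queue_tail = t;
    queue_tail = &t->next;
    return 1;
}

/* Run every pending task in FIFO order. Returns how many tasks ran. */
static int flush_tasks_model(void)
{
    int ran = 0;

    while (queue_head) {
        struct task *t = queue_head;

        queue_head = t->next;
        if (!queue_head)
            queue_tail = &queue_head;
        t->queued = 0;            /* the task may be requeued from now on */
        t->routine(t->data);
        ran++;
    }
    return ran;
}

/* Example routine: increment the int that data points to. */
static void bump(void *data)
{
    ++*(int *)data;
}
```

In this model a caller fills in a struct task once, calls schedule_task_model() from wherever the work is noticed, and the flush step later runs it in a separate pass, which is the deferred-processing pattern described above.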
    
    This is actually a quite easy transition, since 90% of all task-queue
    users in the kernel used BH_IMMEDIATE - which is very similar in
    functionality to keventd.
    
    I believe task-queues should not be removed from the kernel altogether.
    It's true that they were written as a candidate replacement for BHs
    originally, but they do make sense in a different way: it's perhaps the
    easiest interface to do deferred processing from IRQ context, in
    performance-uncritical code areas.  They are easier to use than
    tasklets.
    
    code that cares about performance should convert to tasklets - as the
    timer code and the serial subsystem have done already. For extreme
    performance softirqs should be used - the net subsystem does this.
    
    and we can do this for 2.6 - there are only a couple of areas left after
    fixing all the BH_IMMEDIATE places.
    
    i have moved all the taskqueue handling code into kernel/context.c, and
    only kept the basic 'queue a task' definitions in include/linux/tqueue.h.
    I've converted three of the most commonly used BH_IMMEDIATE users:
    tty_io.c, floppy.c and random.c. [random.c might need more thought
    though.]
    
    i've also cleaned up kernel/timer.c over that of the stock smptimers
    patch: privatized the timer-vec definitions (nothing needs it,
    init_timer() used it mistakenly) and cleaned up the code. Plus i've moved
    some code around that does not belong in timer.c, and within timer.c
    i've organized data and functions along functionality and further
    separated the base timer code from the NTP bits.
    
    net_bh_lock: i have removed it, since it would synchronize to nothing. The
    old protocol handlers should still run on UP, and on SMP the kernel prints
    a warning upon use. Alexey, is this approach fine with you?
    
    scalable timers: i've further improved the patch ported to 2.5 by wli and
    Dipankar. There is only one pending issue i can see, the question of
    whether to migrate timers in mod_timer() or not. I'm quite convinced that
    they should be migrated, but i might be wrong. It's a 10-line change to
    switch between migrating and non-migrating timers; we can do performance
    tests later on. The current, more complex migration code is pretty fast
    and has been stable under extremely high networking loads in the past 2
    years, so we can immediately switch to the simpler variant if someone
    proves it improves performance. (I'd say if non-migrating timers improve
    Apache performance on one of the bigger NUMA boxes then the point is
    proven, no further thought will be needed.)
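
The migrating versus non-migrating choice in mod_timer() can be sketched with a small user-space model. This is an illustration under assumptions, not the patch's code: struct timer_model, pending_on[], NR_CPUS_MODEL and mod_timer_model() are hypothetical names, and the per-CPU timer base is reduced to a counter of pending timers.

```c
#include <stddef.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count for the model */

/* Hypothetical model of a timer; NOT the kernel's struct timer_list. */
struct timer_model {
    unsigned long expires;  /* expiry time, in arbitrary ticks */
    int base_cpu;           /* CPU whose base holds the timer, -1 if idle */
};

/* Number of pending timers on each CPU's (modelled) timer base. */
static int pending_on[NR_CPUS_MODEL];

/*
 * Re-arm a timer from this_cpu.
 * migrate != 0: a pending timer moves to the calling CPU's base.
 * migrate == 0: a pending timer stays on the base it is already on.
 * A timer that is not pending always goes to the calling CPU's base.
 */
static void mod_timer_model(struct timer_model *t, unsigned long expires,
                            int this_cpu, int migrate)
{
    int target = this_cpu;

    if (t->base_cpu >= 0) {
        if (!migrate)
            target = t->base_cpu;    /* keep the old base */
        pending_on[t->base_cpu]--;   /* detach from the old base */
    }
    t->expires = expires;
    t->base_cpu = target;
    pending_on[target]++;            /* attach to the chosen base */
}
```

The two variants differ only in how target is chosen, which is consistent with the switch being a small change; which one performs better is left open here, as in the text above.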