1. 23 Mar, 2011 40 commits
    • job control: Small reorganization of wait_consider_task() · 823b018e
      Tejun Heo authored
      Move EXIT_DEAD test in wait_consider_task() above ptrace check.  As
      ptraced tasks can't be EXIT_DEAD, this change doesn't cause any
      behavior change.  This is to prepare for further changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • job control: Don't set group_stop exit_code if re-entering job control stop · 408a37de
      Tejun Heo authored
      While ptraced, a task may be resumed while the containing process is
      still job control stopped.  If the task receives another stop signal
      in this state, it will still initiate group stop, which generates
      group_exit_code, which the real parent would be able to see once the
      ptracer detaches.
      
      In this scenario, the real parent may see two consecutive CLD_STOPPED
      events from two stop signals without intervening SIGCONT, which
      normally is impossible.
      
      Test case follows.
      
        #include <signal.h>
        #include <stdio.h>
        #include <unistd.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        int main(void)
        {
      	  pid_t tracee;
      	  siginfo_t si;
      
      	  tracee = fork();
      	  if (!tracee)
      		  while (1)
      			  pause();
      
      	  kill(tracee, SIGSTOP);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      
      	  if (!fork()) {
      		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
      		  return 0;
      	  }
      
      	  while (1) {
      		  si.si_pid = 0;
      		  waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG);
      		  if (si.si_pid)
      			  printf("st=%02d c=%02d\n", si.si_status, si.si_code);
      	  }
      	  return 0;
        }
      
      Before the patch, the latter waitid() in polling mode reports the
      second stopped event generated by the implied SIGSTOP of
      PTRACE_ATTACH.
      
        st=19 c=05
        ^C
      
      After the patch, the second event is not reported.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • ptrace: Always put ptracee into appropriate execution state · 0e9f0a4a
      Tejun Heo authored
      Currently, __ptrace_unlink() wakes up the tracee iff it's in
      TASK_TRACED.  For unlinking from PTRACE_DETACH, this is correct as the
      tracee is guaranteed to be in TASK_TRACED or dead; however, unlinking
      also happens when the ptracer exits.  In this case the ptracee can be
      in any state and might be left running even if the group it belongs
      to is stopped.
      
      This patch updates __ptrace_unlink() such that GROUP_STOP_PENDING is
      reinstated regardless of the ptracee's current state as long as it's
      alive and makes sure that signal_wake_up() is called if execution
      state transition is necessary.
      
      Test case follows.
      
        #include <unistd.h>
        #include <time.h>
        #include <sys/types.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        static const struct timespec ts1s = { .tv_sec = 1 };
      
        int main(void)
        {
      	  pid_t tracee;
      	  siginfo_t si;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  while (1) {
      			  nanosleep(&ts1s, NULL);
      			  write(1, ".", 1);
      		  }
      	  }
      
      	  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      	  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      	  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      	  write(1, "exiting", 7);
      	  return 0;
        }
      
      Before the patch, after the parent process exits, the child is left
      running and prints out "." every second.
      
        exiting..... (continues)
      
      After the patch, the group stop initiated by the implied SIGSTOP from
      PTRACE_ATTACH is re-established when the parent exits.
      
        exiting
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • ptrace: Collapse ptrace_untrace() into __ptrace_unlink() · e3bd058f
      Tejun Heo authored
      Remove the extra task_is_traced() check in __ptrace_unlink() and
      collapse ptrace_untrace() into __ptrace_unlink().  This is to prepare
      for further changes.
      
      While at it, drop the comment on top of ptrace_untrace() and convert
      the __ptrace_unlink() comment to docbook format.  A detailed comment
      will be added by the next patch.
      
      This patch doesn't cause any visible behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • ptrace: Clean transitions between TASK_STOPPED and TRACED · d79fdd6d
      Tejun Heo authored
      Currently, if the task is STOPPED on ptrace attach, it's left alone
      and the state is silently changed to TRACED on the next ptrace call.
      The behavior breaks the assumption that arch_ptrace_stop() is called
      before any task is poked by ptrace and is ugly in that a task
      manipulates the state of another task directly.
      
      With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
      TRACED can be made clean.  The tracer can use the flag to tell the
      tracee to retry stop on attach and detach.  On retry, the tracee will
      enter the desired state in the correct way.  The lower 16 bits of
      task->group_stop are used to remember the signal number which caused
      the last group stop.  This is used while retrying for ptrace attach as
      the original group_exit_code could have been consumed with wait(2) by
      then.
      
      As the real parent may wait(2) and consume the group_exit_code
      anytime, the group_exit_code needs to be saved separately so that it
      can be used when switching from regular sleep to ptrace_stop().  This
      is recorded in the lower 16 bits of task->group_stop.
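
      The packing described above can be pictured with a small user-space
      sketch.  The mask and flag bit positions below are assumptions chosen
      for illustration and the helper names are hypothetical; only the idea
      (signal number in the low 16 bits, state flags above them) comes from
      the text.

```c
/* Illustrative layout only: the low 16 bits hold the signal number,
 * higher bits hold state flags.  Bit positions are assumed. */
#define GROUP_STOP_SIGMASK  0xffffu
#define GROUP_STOP_PENDING  (1u << 16)
#define GROUP_STOP_TRAPPING (1u << 18)

/* Remember the signal that caused the last group stop, leaving flags alone. */
static unsigned int group_stop_set_sig(unsigned int group_stop, int signr)
{
	return (group_stop & ~GROUP_STOP_SIGMASK) |
	       ((unsigned int)signr & GROUP_STOP_SIGMASK);
}

/* Fetch the saved signal number, ignoring the flag bits. */
static int group_stop_get_sig(unsigned int group_stop)
{
	return (int)(group_stop & GROUP_STOP_SIGMASK);
}
```

      Because the signal number lives in its own field, it survives even
      after the real parent has consumed group_exit_code with wait(2).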
      
      If a task is already stopped and there's no intervening SIGCONT, a
      ptrace request immediately following a successful PTRACE_ATTACH should
      always succeed even if the tracer doesn't wait(2) for attach
      completion; however, with this change, the tracee might still be
      TASK_RUNNING trying to enter TASK_TRACED which would cause the
      following request to fail with -ESRCH.
      
      This intermediate state is hidden from the ptracer by setting
      GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
      for it to clear on its signal->wait_chldexit.  Completing the
      transition or getting killed clears TRAPPING and wakes up the tracer.
      
      Note that the STOPPED -> RUNNING -> TRACED transition is still visible
      to other threads which are in the same group as the ptracer and the
      reverse transition is visible to all.  Please read the comments for
      details.
      
      Oleg:
      
      * Spotted a race condition where a task may retry group stop without
        proper bookkeeping.  Fixed by redoing bookkeeping on retry.
      
      * Spotted that the transition is visible to userland in several
        different ways.  Most are fixed with GROUP_STOP_TRAPPING.  Unhandled
        corner case is documented.
      
      * Pointed out not setting GROUP_STOP_SIGMASK on an already stopped
        task would result in more consistent behavior.
      
      * Pointed out that calling ptrace_stop() from do_signal_stop() in
        TASK_STOPPED can race with group stop start logic and then confuse
        the TRAPPING wait in ptrace_check_attach().  ptrace_stop() is now
        called with TASK_RUNNING.
      
      * Suggested using signal->wait_chldexit instead of bit wait.
      
      * Spotted a race condition between TRACED transition and clearing of
        TRAPPING.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
    • ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced · 5224fa36
      Tejun Heo authored
      A ptraced task would still stop at do_signal_stop() when it's stopping
      for stop signals and do_signal_stop() behaves the same whether the
      task is ptraced or not.  However, in addition to stopping,
      ptrace_stop() also does ptrace specific stuff like calling
      architecture specific callbacks, so this behavior makes the code more
      fragile and difficult to understand.
      
      This patch makes do_signal_stop() test whether the task is ptraced and
      use ptrace_stop() if so.  This renders tracehook_notify_jctl() rather
      pointless as the ptrace notification is now handled by ptrace_stop()
      regardless of the return value from the tracehook.  It probably is a
      good idea to update it.
      
      This doesn't solve the whole problem as tasks already in stopped state
      would stay in the regular stop when ptrace attached.  That part will
      be handled by the next patch.
      
      Oleg pointed out that this makes a userland-visible change.  Before,
      SIGCONT would be able to wake up a task in group stop even if the task
      is ptraced if the tracer hasn't issued another ptrace command
      afterwards (as the next ptrace command transitions the state into
      TASK_TRACED which ignores SIGCONT wakeups).  With this and the next
      patch, SIGCONT may race with the transition into TASK_TRACED and is
      ignored if the tracee already entered TASK_TRACED.
      
      Another userland visible change of this and the next patch is that the
      ptracee's state would now be TASK_TRACED where it used to be
      TASK_STOPPED, which is visible via fs/proc.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
    • ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for group stop · 0ae8ce1c
      Tejun Heo authored
      Currently, ptrace_stop() unconditionally participates in group stop
      bookkeeping.  This is unnecessary and inaccurate.  Make it only
      participate if the task is trapping for group stop, i.e. if @why is
      CLD_STOPPED.  As ptrace_stop() currently is not used when trapping for
      group stop, this amounts to disabling group stop participation from
      ptrace_stop().
      
      A visible behavior change is increased likelihood of delayed group
      stop completion if the thread group contains one or more ptraced
      tasks.
      
      This is to prepare for further cleanup of the interaction between
      group stop and ptrace.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
    • signal: Use GROUP_STOP_PENDING to stop once for a single group stop · 39efa3ef
      Tejun Heo authored
      Currently task->signal->group_stop_count is used to decide whether to
      stop for group stop.  However, if there is a task in the group which
      is taking a long time to stop, other tasks which are continued by
      ptrace would repeatedly stop for the same group stop until the group
      stop is complete.
      
      Conversely, if a ptraced task is in TASK_TRACED state, the debugger
      won't get notified of group stops which is inconsistent compared to
      the ptraced task in any other state.
      
      This patch introduces GROUP_STOP_PENDING which tracks whether a task
      is yet to stop for the group stop in progress.  The flag is set when a
      group stop starts, cleared when the task stops the first time for
      the group stop, and consulted whenever it must be determined whether
      the task should participate in a group stop.  Note that tasks in
      TASK_TRACED now also participate in group stop.
      
      This results in the following behavior changes.
      
      * For a single group stop, a ptracer would see at most one stop
        reported.
      
      * A ptracee in TASK_TRACED now also participates in group stop and the
        tracer would get the notification.  However, as a ptraced task could
        be in TASK_STOPPED state or any ptrace trap could consume group
        stop, the notification may still be missing.  These will be
        addressed with further patches.
      
      * A ptracee may start a group stop while one is still in progress if
        the tracer lets it continue with stop signal delivery.  Group stop
        code handles this correctly.
      
      Oleg:
      
      * Spotted that a task might skip signal check even when its
        GROUP_STOP_PENDING is set.  Fixed by updating
        recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
        group_stop_count.
      
      * Pointed out that task->group_stop should be cleared whenever
        task->signal->group_stop_count is cleared.  Fixed accordingly.
      
      * Pointed out the behavior inconsistency between TASK_TRACED and
        RUNNING and the last behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
    • signal: Fix premature completion of group stop when interfered by ptrace · e5c1902e
      Tejun Heo authored
      task->signal->group_stop_count is used to track the progress of group
      stop.  It's initialized to the number of tasks which need to stop for
      group stop to finish and each stopping or trapping task decrements.
      However, each task doesn't keep track of whether it decremented the
      counter or not and if woken up before the group stop is complete and
      stops again, it can decrement the counter multiple times.
      
      Please consider the following example code.
      
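       #include <pthread.h>
       #include <signal.h>
       #include <unistd.h>
       #include <sys/ptrace.h>
       #include <sys/wait.h>

       static void *worker(void *arg)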
       {
      	 while (1) ;
      	 return NULL;
       }
      
       int main(void)
       {
      	 pthread_t thread;
      	 pid_t pid;
      	 int i;
      
      	 pid = fork();
      	 if (!pid) {
      		 for (i = 0; i < 5; i++)
      			 pthread_create(&thread, NULL, worker, NULL);
      		 while (1) ;
      		 return 0;
      	 }
      
      	 ptrace(PTRACE_ATTACH, pid, NULL, NULL);
      	 while (1) {
      		 waitid(P_PID, pid, NULL, WSTOPPED);
      		 ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
      	 }
      	 return 0;
       }
      
      The child creates five threads and the parent continuously traps the
      first thread and whenever the child gets a signal, SIGSTOP is
      delivered.  If an external process sends SIGSTOP to the child, all
      other threads in the process should reliably stop.  However, due to
      the above bug, the first thread will often end up consuming
      group_stop_count multiple times and SIGSTOP often ends up stopping
      none or part of the other four threads.
      
      This patch adds a new field task->group_stop which is protected by
      siglock and uses GROUP_STOP_CONSUME flag to track which task is still
      to consume group_stop_count to fix this bug.
      
      task_clear_group_stop_pending() and task_participate_group_stop() are
      added to help manipulating group stop states.  As ptrace_stop() now
      also uses task_participate_group_stop(), it will set
      SIGNAL_STOP_STOPPED if it completes a group stop.
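
      The consume-once rule can be modelled in user space as follows.  This
      is only a sketch of the bookkeeping idea: the model_* names and the
      flag's bit position are hypothetical, and locking (siglock in the
      kernel) is left out.

```c
#include <stdbool.h>

#define GROUP_STOP_CONSUME (1u << 17)	/* illustrative bit position */

struct model_signal { int group_stop_count; };
struct model_task   { unsigned int group_stop; struct model_signal *signal; };

/* Start a group stop: every task must consume the counter exactly once. */
static void model_start_group_stop(struct model_signal *sig,
				   struct model_task *tasks, int n)
{
	sig->group_stop_count = n;
	for (int i = 0; i < n; i++) {
		tasks[i].signal = sig;
		tasks[i].group_stop |= GROUP_STOP_CONSUME;
	}
}

/* Returns true if this call completed the group stop. */
static bool model_participate(struct model_task *t)
{
	if (!(t->group_stop & GROUP_STOP_CONSUME))
		return false;	/* already counted: stopping again is a no-op */
	t->group_stop &= ~GROUP_STOP_CONSUME;
	return --t->signal->group_stop_count == 0;
}
```

      A task that is woken and stops again before the group stop completes
      calls model_participate() a second time, but the counter is not
      decremented again because its CONSUME flag is already clear.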
      
      There still are many issues regarding the interaction between group
      stop and ptrace.  Patches to address them will follow.
      
      - Oleg spotted duplicate GROUP_STOP_CONSUME.  Dropped.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
    • ptrace: Add @why to ptrace_stop() · fe1bc6a0
      Tejun Heo authored
      To prepare for cleanup of the interaction between group stop and
      ptrace, add @why to ptrace_stop().  Existing users are updated such
      that there is no behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Roland McGrath <roland@redhat.com>
    • ptrace: Kill tracehook_notify_jctl() · edf2ed15
      Tejun Heo authored
      tracehook_notify_jctl() aids in determining whether and what to report
      to the parent when a task is stopped or continued.  The function also
      adds an extra requirement that siglock may be released across it,
      which is currently unused and quite difficult to satisfy in a
      well-defined manner.
      
      As job control and the notifications are about to receive a major
      overhaul, remove the tracehook and open code it.  If ever necessary,
      let's factor it out after the overhaul.
      
      * Oleg spotted incorrect CLD_CONTINUED/STOPPED selection when ptraced.
        Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
    • signal: Remove superflous try_to_freeze() loop in do_signal_stop() · 71db5eb9
      Tejun Heo authored
      do_signal_stop() is used only by get_signal_to_deliver() and after a
      successful signal stop, it always calls try_to_freeze(), so the
      try_to_freeze() loop around schedule() in do_signal_stop() is
      superfluous and confusing.  Remove it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
    • ptrace: Remove the extra wake_up_state() from ptrace_detach() · 9f2bf651
      Tejun Heo authored
      This wake_up_state() has a turbulent history.  It is a remnant from an
      ancient ptrace implementation and patently wrong.  Commit 95a3540d
      (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic) removed
      it but the change was reverted later by commit edaba2c5 (ptrace:
      revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic")
      citing compatibility breakage and general brokenness of the whole group
      stop / ptrace interaction.  Then, recently, it got converted from
      wake_up_process() to wake_up_state() to make it less dangerous.
      
      Digging through the mailing list archives, the compatibility breakage
      doesn't seem to be critical in the sense that the behavior isn't well
      defined or reliable to begin with and it seems to have been agreed to
      remove the wakeup with proper cleanup of the whole thing.
      
      Now that the group stop and its interaction with ptrace are being
      cleaned up, it's high time to finally kill this silliness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
    • signal: Fix SIGCONT notification code · c672af35
      Tejun Heo authored
      After a task receives SIGCONT, its parent is notified via SIGCHLD with
      its siginfo describing what the notified event is.  If SIGCONT is
      received while the child process is stopped, the code should be
      CLD_CONTINUED.  If SIGCONT is received while the child process is in
      the process of being stopped, it should be CLD_STOPPED.  Which code to
      use is determined in prepare_signal() and recorded in signal->flags
      using SIGNAL_CLD_CONTINUED|STOP flags.
      
      get_signal_to_deliver() should test these flags and then notify
      accordingly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
      instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying
      CLD_CONTINUED if the signal is delivered before the task is wait(2)ed
      and CLD_STOPPED if the state was fetched already.
      
      Fix it by testing SIGNAL_CLD_CONTINUED.  While at it, uncompress the
      ?: test into if/else clause for better readability.
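
      A minimal user-space sketch of the corrected selection, written as the
      if/else form mentioned above.  The MODEL_* flag values and the function
      name are hypothetical stand-ins for the signal->flags bits; only
      CLD_CONTINUED and CLD_STOPPED are the real <signal.h> constants.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical stand-ins for the notification bits kept in signal->flags. */
#define MODEL_CLD_STOPPED   0x10u
#define MODEL_CLD_CONTINUED 0x20u
#define MODEL_CLD_MASK      (MODEL_CLD_STOPPED | MODEL_CLD_CONTINUED)

/* Returns the si_code to report to the parent, or 0 if nothing is pending. */
static int model_cld_code(unsigned int flags)
{
	int why;

	if (!(flags & MODEL_CLD_MASK))
		return 0;

	/* Test the notification flag itself, not the "stop state" flag. */
	if (flags & MODEL_CLD_CONTINUED)
		why = CLD_CONTINUED;
	else
		why = CLD_STOPPED;
	return why;
}
```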
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx · 6447f55d
      Linus Torvalds authored
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (66 commits)
        avr32: at32ap700x: fix typo in DMA master configuration
        dmaengine/dmatest: Pass timeout via module params
        dma: let IMX_DMA depend on IMX_HAVE_DMA_V1 instead of an explicit list of SoCs
        fsldma: make halt behave nicely on all supported controllers
        fsldma: reduce locking during descriptor cleanup
        fsldma: support async_tx dependencies and automatic unmapping
        fsldma: fix controller lockups
        fsldma: minor codingstyle and consistency fixes
        fsldma: improve link descriptor debugging
        fsldma: use channel name in printk output
        fsldma: move related helper functions near each other
        dmatest: fix automatic buffer unmap type
        drivers, pch_dma: Fix warning when CONFIG_PM=n.
        dmaengine/dw_dmac fix: use readl & writel instead of __raw_readl & __raw_writel
        avr32: at32ap700x: Specify DMA Flow Controller, Src and Dst msize
        dw_dmac: Setting Default Burst length for transfers as 16.
        dw_dmac: Allow src/dst msize & flow controller to be configured at runtime
        dw_dmac: Changing type of src_master and dest_master to u8.
        dw_dmac: Pass Channel Priority from platform_data
        dw_dmac: Pass Channel Allocation Order from platform_data
        ...
    • bloat-o-meter: include read-only data section in report · c50e3f51
      Jean Delvare authored
      I'm not sure why the read-only data section is excluded from the report;
      it seems as relevant as the other data sections (b and d).
      
      I've stripped the symbols starting with __mod_ as they can have their
      names dynamically generated and thus comparison between binaries is not
      possible.
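
      The filtering rule described above, rendered as a small C sketch.  The
      function name is hypothetical and the set of accepted nm type letters
      is an assumption based on the sections named in the text (text, data,
      bss, plus read-only data).

```c
#include <stdbool.h>
#include <string.h>

/* nm section letters: t = text, d = data, b = bss, r = read-only data
 * (upper case marks a global symbol). */
static bool keep_symbol(char nm_type, const char *name)
{
	/* strchr() would match the terminator, so reject '\0' explicitly. */
	if (nm_type == '\0' || !strchr("tTdDbBrR", nm_type))
		return false;

	/* __mod_* names may be generated dynamically, so they cannot be
	 * matched between two builds. */
	if (strncmp(name, "__mod_", 6) == 0)
		return false;

	return true;
}
```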
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Acked-by: Nathan Lynch <ntl@pobox.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zlib: slim down zlib_deflate() workspace when possible · 565d76cb
      Jim Keniston authored
      Instead of always creating a huge (268K) deflate_workspace with the
      maximum compression parameters (windowBits=15, memLevel=8), allow the
      caller to obtain a smaller workspace by specifying smaller parameter
      values.
      
      For example, when capturing oops and panic reports to a medium with
      limited capacity, such as NVRAM, compression may be the only way to
      capture the whole report.  In this case, a small workspace (24K works
      fine) is a win, whether you allocate the workspace when you need it (i.e.,
      during an oops or panic) or at boot time.
      
      I've verified that this patch works with all accepted values of windowBits
      (positive and negative), memLevel, and compression level.
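
      To get a feel for how the two parameters drive memory use, the rule of
      thumb from the user-space zlib documentation can be sketched as below.
      This is not the kernel's workspace-size computation; the function name
      is hypothetical and the formula leaves out the few kilobytes of
      fixed-size state, so its result for the maximum parameters comes out
      below the 268K figure quoted above.

```c
#include <stddef.h>

/* Rough deflate memory estimate from the user-space zlib documentation:
 * (1 << (windowBits + 2)) + (1 << (memLevel + 9)) bytes, plus a few
 * kilobytes for small objects.  Negative windowBits (raw deflate) are
 * treated by absolute value. */
static size_t deflate_mem_estimate(int window_bits, int mem_level)
{
	if (window_bits < 0)
		window_bits = -window_bits;
	return ((size_t)1 << (window_bits + 2)) +
	       ((size_t)1 << (mem_level + 9));
}
```

      For example, deflate_mem_estimate(15, 8) yields 262144 bytes, while
      deflate_mem_estimate(11, 4) yields 16384 bytes.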
      Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/devpts/inode.c: correctly check d_alloc_name() return code in devpts_pty_new() · b12d1259
      Andrey Vagin authored
      d_alloc_name() returns NULL on error, but devpts_pty_new() expects an
      errno value.
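
      Why this matters can be shown with user-space stand-ins modelled on the
      kernel's error-pointer convention: a NULL return is not an encoded
      errno, so an error-pointer test lets it through.  The model_* helpers
      are hypothetical, and -ENOMEM is an assumed choice of error code, not
      something the text confirms.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest errno value that can be encoded in a pointer (as in the kernel). */
#define MODEL_MAX_ERRNO 4095

/* True only for pointers in the top MODEL_MAX_ERRNO addresses. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Turn a NULL-on-failure result into an errno-style return value. */
static int model_check_alloc(const void *dentry)
{
	if (dentry == NULL)
		return -ENOMEM;	/* assumed error code for illustration */
	return 0;
}
```

      model_is_err(NULL) is false, so a caller that only tests for an error
      pointer would carry on and use the NULL result.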
      
      Addresses http://bugzilla.openvz.org/show_bug.cgi?id=1758
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: wake all waiters when destroying ctx · e91f90bb
      Roland Dreier authored
      The test program below will hang because io_getevents() uses
      add_wait_queue_exclusive(), which means the wake_up() in io_destroy() only
      wakes up one of the threads.  Fix this by using wake_up_all() in the aio
      code paths where we want to make sure no one gets stuck.
      
      	// t.c -- compile with gcc -lpthread -laio t.c
      
      	#include <libaio.h>
      	#include <pthread.h>
      	#include <stdio.h>
      	#include <unistd.h>
      
      	static const int nthr = 2;
      
      	void *getev(void *ctx)
      	{
      		struct io_event ev;
      		io_getevents(ctx, 1, 1, &ev, NULL);
      		printf("io_getevents returned\n");
      		return NULL;
      	}
      
      	int main(int argc, char *argv[])
      	{
      		io_context_t ctx = 0;
      		pthread_t thread[nthr];
      		int i;
      
      		io_setup(1024, &ctx);
      
      		for (i = 0; i < nthr; ++i)
      			pthread_create(&thread[i], NULL, getev, ctx);
      
      		sleep(1);
      
      		io_destroy(ctx);
      
      		for (i = 0; i < nthr; ++i)
      			pthread_join(thread[i], NULL);
      
      		return 0;
      	}
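
      As a user-space analogy (POSIX condition variables, not the kernel wait
      queue API), the difference between waking one waiter and waking all of
      them looks like this.  The model_* names are hypothetical and error
      handling is omitted to keep the sketch short.

```c
#include <pthread.h>

struct model_ctx {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
	int             dead;	/* set when the context is destroyed */
	int             woken;	/* waiters that observed the destruction */
};

static void *model_waiter(void *arg)
{
	struct model_ctx *ctx = arg;

	pthread_mutex_lock(&ctx->lock);
	while (!ctx->dead)
		pthread_cond_wait(&ctx->wait, &ctx->lock);
	ctx->woken++;
	pthread_mutex_unlock(&ctx->lock);
	return NULL;
}

/* Destroy the context and wake every waiter; returns how many woke up. */
static int model_destroy_and_count(int nthr)
{
	struct model_ctx ctx = { .dead = 0, .woken = 0 };
	pthread_t thr[16];
	int i;

	if (nthr > 16)
		nthr = 16;
	pthread_mutex_init(&ctx.lock, NULL);
	pthread_cond_init(&ctx.wait, NULL);
	for (i = 0; i < nthr; i++)
		pthread_create(&thr[i], NULL, model_waiter, &ctx);

	pthread_mutex_lock(&ctx.lock);
	ctx.dead = 1;
	/* pthread_cond_signal() is only guaranteed to wake one waiter. */
	pthread_cond_broadcast(&ctx.wait);
	pthread_mutex_unlock(&ctx.lock);

	for (i = 0; i < nthr; i++)
		pthread_join(thr[i], NULL);
	pthread_cond_destroy(&ctx.wait);
	pthread_mutex_destroy(&ctx.lock);
	return ctx.woken;
}
```

      With the broadcast, model_destroy_and_count(2) returns 2; replacing it
      with a single pthread_cond_signal() could leave a blocked waiter stuck,
      which mirrors the hang in the test program.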
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pps: remove unreachable code · 77d1c8eb
      Alexander Gordeev authored
      Remove code enabled only when CONFIG_PREEMPT_RT is turned on because it is
      not used in the vanilla kernel.
      Signed-off-by: Alexander Gordeev <lasaine@lvk.cs.msu.su>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Rodolfo Giometti <giometti@linux.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • adfs: add hexadecimal filetype suffix option · da23ef05
      Stuart Swales authored
      ADFS (FileCore) storage complies with the RISC OS filetype specification
      (12 bits of file type information is stored in the file load address,
      rather than using a file extension).  The existing driver largely ignores
      this information and does not present it to the end user.
      
      It is desirable that stored filetypes be made visible to the end user to
      facilitate a precise copy of data and metadata from a hard disc (or image
      thereof) into a RISC OS emulator (such as RPCEmu) or to a network share
      which can be accessed by real Acorn systems.
      
      This patch implements a per-mount filetype suffix option (use -o
      ftsuffix=1) to present any filetype as a ,xyz hexadecimal suffix on each
      file.  This type suffix is compatible with that used by RISC OS systems
      that access network servers using NFS client software and by RPCemu's host
      filing system.
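
      A sketch of how such a suffix could be derived.  It assumes the usual
      RISC OS convention that a load address whose top 12 bits are all ones
      carries the file type in bits 8 to 19; the function names are
      hypothetical and this is not the driver's code.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed convention: top 12 bits all ones means "typed file", and the
 * 12-bit file type then sits in bits 8..19 of the load address. */
static int sketch_filetype(uint32_t loadaddr)
{
	if ((loadaddr & 0xfff00000u) != 0xfff00000u)
		return -1;	/* untyped: the field is a real load address */
	return (int)((loadaddr >> 8) & 0xfff);
}

/* Writes "name,xyz" into buf when a type is present, else just "name". */
static int format_with_suffix(char *buf, size_t len, const char *name,
			      uint32_t loadaddr)
{
	int type = sketch_filetype(loadaddr);

	if (type < 0)
		return snprintf(buf, len, "%s", name);
	return snprintf(buf, len, "%s,%03x", name, (unsigned int)type);
}
```

      With a load address of 0xfffffd12, a file named "ReadMe" would be
      presented as "ReadMe,ffd".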
      Signed-off-by: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • adfs: improve timestamp precision · 7a9730af
      Stuart Swales authored
      ADFS (FileCore) storage complies with the RISC OS timestamp specification
      (40-bit centiseconds since 01 Jan 1900 00:00:00).  It is desirable that
      stored timestamp precision be maintained to facilitate a precise copy of
      data and metadata from a hard disc (or image thereof) into a RISC OS
      emulator (such as RPCEmu).
      
      This patch implements a full-precision conversion from ADFS to Unix
      timestamps.  The existing driver, for ease of calculation with old
      32-bit compilers, uses the common trick of shifting the 40 bits
      representing centiseconds around into 32 bits representing seconds,
      thereby losing precision.
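
      A full-precision conversion of this kind can be sketched with 64-bit
      arithmetic.  The function name is hypothetical and the input is taken
      as an already assembled 40-bit value; how the driver extracts it from
      the on-disc fields is not shown.

```c
#include <stdint.h>
#include <time.h>

/* Seconds between 1900-01-01 00:00:00 and 1970-01-01 00:00:00. */
#define SECS_1900_TO_1970 2208988800LL

/* Convert centiseconds since 1900 into seconds plus nanoseconds since 1970. */
static struct timespec riscos_cs_to_timespec(uint64_t centiseconds)
{
	struct timespec ts;

	centiseconds &= 0xffffffffffULL;	/* keep the 40 significant bits */
	ts.tv_sec  = (time_t)((int64_t)(centiseconds / 100) - SECS_1900_TO_1970);
	ts.tv_nsec = (long)(centiseconds % 100) * 10000000L;
	return ts;
}
```

      Each centisecond maps to 10,000,000 ns, so the sub-second part of the
      stamp is preserved instead of being shifted away.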
      
      Signed-off-by: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • adfs: fix E+/F+ dir size > 2048 crashing kernel · 2f09719a
      Stuart Swales authored
      Kernel crashes in fs/adfs module when accessing directories with a large
      number of objects on mounted Acorn ADFS E+/F+ format discs (or images) as
      the existing code writes off the end of the fixed array of struct
      buffer_head pointers.
      
      Additionally, each directory access that didn't crash would leak a buffer
      as nr_buffers was not adjusted correctly for E+/F+ discs (was always left
      as one less than required).
      
      The patch fixes this by allocating a dynamically-sized set of struct
      buffer_head pointers if necessary for the E+/F+ case (many directories
      still do in fact fit in 2048 bytes) and sets the correct nr_buffers so
      that all buffers are released.
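
      The approach can be pictured with a user-space sketch: use a small
      built-in array when it is large enough, fall back to a heap array
      otherwise, and record the exact count so that release frees everything.
      All names, the fixed array size, and the block size in the example are
      hypothetical.

```c
#include <stdlib.h>

#define FIXED_BUFFERS 4	/* hypothetical size of the built-in array */

struct model_dir {
	void  *fixed[FIXED_BUFFERS];
	void **bufs;		/* points at fixed[] or at a heap array */
	unsigned int nr_buffers;
};

/* Number of blocks needed to cover dir_size bytes, rounded up. */
static unsigned int model_blocks_needed(unsigned int dir_size,
					unsigned int block_size)
{
	return (dir_size + block_size - 1) / block_size;
}

/* Returns 0 on success, -1 if the heap allocation fails. */
static int model_dir_setup(struct model_dir *dir, unsigned int dir_size,
			   unsigned int block_size)
{
	unsigned int n = model_blocks_needed(dir_size, block_size);

	dir->bufs = dir->fixed;
	if (n > FIXED_BUFFERS) {
		dir->bufs = calloc(n, sizeof(*dir->bufs));
		if (!dir->bufs)
			return -1;
	}
	dir->nr_buffers = n;	/* exact count, so nothing is leaked later */
	return 0;
}

static void model_dir_release(struct model_dir *dir)
{
	if (dir->bufs != dir->fixed)
		free(dir->bufs);
	dir->bufs = NULL;
	dir->nr_buffers = 0;
}
```

      A 2048-byte directory with 512-byte blocks fits the fixed array (4
      buffers), while a 4096-byte directory takes the heap path (8 buffers).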
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=26072
      
      Tested by tar'ing the contents of my RISC PC's E+ format 20Gb HDD which
      contains a number of large directories that previously crashed the kernel.
      Signed-off-by: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Documentation/vm/page-types.c: auto debugfs mount for hwpoison operation · 12da58b0
      Chen Gong authored
      page-types.c doesn't supply a way to specify the debugfs path, and the
      original debugfs path is not the usual one on most machines.  This
      patch adds a way to mount debugfs automatically if needed.
      
      This patch is heavily inspired by tools/perf/utils/debugfs.c
      
      [akpm@linux-foundation.org: make functions static]
      [akpm@linux-foundation.org: fix debugfs_mount() signature]
      Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      12da58b0
    • Christian Kujau's avatar
      Documentation/Changes: minor corrections · e06c3744
      Christian Kujau authored
      I noticed the 'mcelog' program had no comment and then ended up "fixing"
      a few more things:
      
        * reiserfsck -V does not print "reiserfsprogs" (any more?)
        * is "udevinfo" still shipped? udevd certainly is
        * grub2 doesn't have a 'grub' binary
        * add a "# how to get the mcelog version" comment
      Signed-off-by: Christian Kujau <lists@nerdbynature.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e06c3744
    • Harry Wei's avatar
      Documentation/CodingStyle: flesh out if-else examples · 38829dc9
      Harry Wei authored
      There is a missing case in "Chapter 3: Placing Braces and Spaces".  We
      all know that braces should not be used where a single statement will
      do.  The first case is:
      
      	if (condition)
      		action();
      
      Another case is:
      
      	if (condition)
      		do_this();
      	else
      		do_that();
      
      However, I cannot find a description of the second case.
      Signed-off-by: Harry Wei <harryxiyou@gmail.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      38829dc9
    • Rakib Mullick's avatar
      codafs: fix compile warning when CONFIG_SYSCTL=n · 0bc825d2
      Rakib Mullick authored
      When CONFIG_SYSCTL=n, we get the following warning:
      
      fs/coda/sysctl.c:18: warning: `coda_table' defined but not used
      
      Fix the warning by making sure coda_table and its callee function are in
      the same context.  Also clean up the code by removing extra #ifdef.
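
      A minimal sketch of the idea, assuming a simplified layout: the table
      and the only function that reads it sit under the same #ifdef, so a
      build without the option never sees an unused table.  demo_table, its
      entries and demo_sysctl_init() are hypothetical names, not the code in
      fs/coda/sysctl.c.

```c
#include <stddef.h>

/* Build with -DCONFIG_SYSCTL to compile the table in. */
#ifdef CONFIG_SYSCTL
/* Hypothetical entries; the table lives next to its only user. */
static const char *demo_table[] = { "timeout", "hard", "fake_statfs", NULL };

/* Returns the number of entries that would be registered. */
static int demo_sysctl_init(void)
{
	int n = 0;

	while (demo_table[n])
		n++;
	return n;
}
#else
/* Nothing to register, and no table left behind to warn about. */
static int demo_sysctl_init(void)
{
	return 0;
}
#endif
```
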
      
      [akpm@linux-foundation.org: remove unneeded stub macros]
      Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
      Cc: Jan Harkes <jaharkes@cs.cmu.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0bc825d2
    • David Rientjes's avatar
      x86: allow CONFIG_ISA_DMA_API to be disabled · 1c00f016
      David Rientjes authored
      Not all 64-bit systems require ISA-style DMA, so allow it to be
      configurable.  x86 utilizes the generic ISA DMA allocator from
      kernel/dma.c, so require it only when CONFIG_ISA_DMA_API is enabled.
      
      Disabling CONFIG_ISA_DMA_API depends on x86_64, since those machines do
      not have ISA slots and benefit the most from disabling the option, and
      on CONFIG_EXPERT, as required by H. Peter Anvin.
      
      When disabled, this also avoids declaring claim_dma_lock(),
      release_dma_lock(), request_dma(), and free_dma() since those interfaces
      will no longer be provided.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c00f016
    • David Rientjes's avatar
      x86: only compile floppy driver if CONFIG_ISA_DMA_API is enabled · 8df3bd9e
      David Rientjes authored
      The generic floppy disk driver utilizes the interface provided by
      CONFIG_ISA_DMA_API, specifically claim_dma_lock(), release_dma_lock(),
      request_dma(), and free_dma().  Thus, there's a strict dependency on the
      config option and the driver should only be loaded if the kernel supports
      ISA-style DMA.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8df3bd9e
    • David Rientjes's avatar
      x86: only compile 8237A if CONFIG_ISA_DMA_API is enabled · 4061d68e
      David Rientjes authored
      8237A utilizes the interface provided by CONFIG_ISA_DMA_API, specifically
      claim_dma_lock() and release_dma_lock().  Thus, there's a strict
      dependency on the config option and the module should only be loaded if
      the kernel supports ISA-style DMA.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4061d68e
    • David Rientjes's avatar
      pnp: only assign IORESOURCE_DMA if CONFIG_ISA_DMA_API is enabled · 586f83e2
      David Rientjes authored
      IORESOURCE_DMA cannot be assigned without utilizing the interface
      provided by CONFIG_ISA_DMA_API, specifically request_dma() and
      free_dma().  Thus, there's a strict dependency on the config option, which
      limits IORESOURCE_DMA to architectures that support ISA-style DMA.
      
      ia64 is not one of those architectures, so pnp_check_dma() no longer
      needs to be special-cased for that architecture.
      
      pnp_assign_resources() will now return -EINVAL if IORESOURCE_DMA is
      attempted on such a kernel.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      586f83e2
    • Andrew Chew's avatar
      rtc: add real-time clock driver for NVIDIA Tegra · ff859ba6
      Andrew Chew authored
      This is a platform driver that supports the built-in real-time clock on
      Tegra SOCs.
      Signed-off-by: Andrew Chew <achew@nvidia.com>
      Acked-by: Alessandro Zummo <a.zummo@towertech.it>
      Acked-by: Wan ZongShun <mcuos.com@gmail.com>
      Acked-by: Jon Mayo <jmayo@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ff859ba6
    • Vasiliy Kulikov's avatar
      drivers/rtc/rtc-ds1511.c: world-writable sysfs nvram file · 49d50fb1
      Vasiliy Kulikov authored
      Don't allow everybody to write to NVRAM.
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Andy Sharp <andy.sharp@onstor.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49d50fb1
    • Ryan Mallon's avatar
      drivers/rtc/rtc-isl1208.c: add alarm support · cf044f0e
      Ryan Mallon authored
      Add alarm/wakeup support to rtc isl1208 driver
      Signed-off-by: Ryan Mallon <ryan@bluewatersys.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf044f0e
    • Mark Brown's avatar
      rtc: convert DS1374 to dev_pm_ops · bc96ba74
      Mark Brown authored
      There is a general move to replace bus-specific PM ops with dev_pm_ops in
      order to facilitate core improvements. Do this conversion for DS1374.
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc96ba74
    • Davidlohr Bueso's avatar
      init: return proper error code in do_mounts_rd() · ea611b26
      Davidlohr Bueso authored
      In do_mounts_rd() if memory cannot be allocated, return -ENOMEM.
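
      The pattern is the usual one for allocation failures.  A userspace
      sketch follows; load_image_demo(), alloc_fn and failing_alloc() are
      hypothetical names, and the allocator is passed in only so the error
      path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

/* Hypothetical allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Returns 0 on success or -ENOMEM if the work buffer cannot be allocated. */
static int load_image_demo(alloc_fn alloc, size_t bufsize)
{
	void *buf = alloc(bufsize);

	if (!buf)
		return -ENOMEM;	/* tell the caller why loading failed */

	/* ... copy the image through buf ... */
	free(buf);
	return 0;
}
```

      A caller can then tell an out-of-memory condition apart from other
      failures by comparing the return value against -ENOMEM.
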
      Signed-off-by: Davidlohr Bueso <dave@gnu.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea611b26
    • David Daney's avatar
      binfmt_elf: quiet GCC-4.6 'set but not used' warning in load_elf_binary() · 1a530a6f
      David Daney authored
      With GCC-4.6 we get warnings about things being 'set but not used'.
      
      In load_elf_binary() this can happen with reloc_func_desc if ELF_PLAT_INIT
      is defined, but doesn't use the reloc_func_desc argument.
      
      Quiet the warning/error by marking reloc_func_desc as __maybe_unused.
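
      The situation can be reproduced in a few lines of standalone C.  In
      this sketch DEMO_MAYBE_UNUSED mirrors how __maybe_unused maps to a GCC
      attribute, while DEMO_PLAT_INIT and demo_load() are hypothetical
      stand-ins for ELF_PLAT_INIT and load_elf_binary().

```c
/* GCC attribute that suppresses "unused" diagnostics for a variable. */
#define DEMO_MAYBE_UNUSED __attribute__((unused))

/* Hypothetical platform hook that ignores its second argument. */
#define DEMO_PLAT_INIT(regs, desc) do { *(regs) = 0; } while (0)

static int demo_load(void)
{
	int regs = -1;
	unsigned long desc DEMO_MAYBE_UNUSED = 0;

	desc = 0x1000;			/* set ... */
	DEMO_PLAT_INIT(&regs, desc);	/* ... but the macro drops it */
	return regs;
}
```

      Without the attribute, a compiler with -Wunused-but-set-variable
      enabled may warn that desc is set but never read, because the macro
      expansion discards it.
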
      Signed-off-by: David Daney <ddaney@caviumnetworks.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a530a6f
    • Shawn Bohrer's avatar
      epoll: fix compiler warning and optimize the non-blocking path · f4d93ad7
      Shawn Bohrer authored
      Add a comment to ep_poll(), rename labels a bit more clearly, fix a gcc
      warning about an unused variable, and optimize the non-blocking path a
      little.
      Hinted-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      
      hannes@cmpxchg.org:
      
      : The non-blocking ep_poll path optimization introduced skipping over the
      : return value setup.
      :
      : Initialize it properly, my userspace gets upset by epoll_wait() returning
      : random things.
      :
      : In addition, remove the reinitialization at the fetch_events label, the
      : return value is guaranteed to be zero when execution reaches there.
      
      [hannes@cmpxchg.org: fix initialization]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Shawn Bohrer <shawn.bohrer@gmail.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f4d93ad7
    • Davide Libenzi's avatar
      epoll: move ready event check into proper inline · 3fb0e584
      Davide Libenzi authored
      Move the event readiness check into a proper inline, and use it uniformly
      inside ep_poll() code.  Events in the ->ovflist are no less ready than the
      ones in ->rdllist.
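
      As a rough illustration of such a helper, assuming a much simplified
      model in which both lists are plain pointers that are NULL when empty
      (the real eventpoll structure is more involved; all demo_ names here
      are hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_item {
	struct demo_item *next;
};

struct demo_ep {
	struct demo_item *rdllist;	/* ready list, NULL when empty */
	struct demo_item *ovflist;	/* overflow list, NULL when empty */
};

/* One place that answers "is anything ready?" for both lists. */
static inline bool demo_events_available(const struct demo_ep *ep)
{
	return ep->rdllist != NULL || ep->ovflist != NULL;
}
```

      Routing every readiness test through one helper keeps callers from
      checking only one of the two lists.
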
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Shawn Bohrer <shawn.bohrer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3fb0e584
    • Konstantin Khlebnikov's avatar
      crc32: add missed brackets in macro · d03e1617
      Konstantin Khlebnikov authored
      Add brackets around typecasted argument in crc32() macro.
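
      Why the brackets matter can be shown with a small standalone example.
      The macros below are hypothetical and are not the crc32() definition
      itself; first_byte() stands in for the checksum routine.

```c
#include <stdint.h>

/* Stand-in for the real checksum routine: just returns the first byte. */
static unsigned first_byte(const unsigned char *p)
{
	return p[0];
}

/* The cast binds only to the first operand of an expression argument. */
#define FIRST_BYTE_BAD(data)  first_byte((const unsigned char *)data)
/* Brackets make the cast apply to the whole argument. */
#define FIRST_BYTE_GOOD(data) first_byte((const unsigned char *)(data))

/* Every byte of a word is identical, so byte order does not matter. */
static const uint32_t demo_words[2] = { 0x11111111u, 0x22222222u };
```

      With the argument demo_words + 1, FIRST_BYTE_BAD casts demo_words first
      and then advances by one byte, so it reads 0x11 from the first word.
      FIRST_BYTE_GOOD advances by one whole element before casting and reads
      0x22 from the second word.
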
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d03e1617