    mm: teach mm by current context info to not do I/O during memory allocation · 21caf2fc
    Ming Lei authored
    This patch introduces PF_MEMALLOC_NOIO as a process flag (in the 'flags'
    field of 'struct task_struct'), so that a task can set the flag to avoid
    doing I/O inside memory allocations made in that task's context.
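
    As a rough illustration of the idea, the following userspace sketch
    models how a per-task flag could strip the I/O bit from an allocation
    mask.  It is not the kernel code: the bit values, the mock 'struct
    task_struct' and the 'current_task' pointer are assumptions made so the
    example compiles on its own; only the names PF_MEMALLOC_NOIO,
    memalloc_noio_flags(), GFP_KERNEL and GFP_NOIO come from this message.

```c
#include <assert.h>

/* Illustrative bit values only; the real ones live in the kernel headers. */
typedef unsigned int gfp_t;
#define __GFP_WAIT        0x10u
#define __GFP_IO          0x40u
#define __GFP_FS          0x80u
#define GFP_NOIO          (__GFP_WAIT)
#define GFP_KERNEL        (__GFP_WAIT | __GFP_IO | __GFP_FS)
#define PF_MEMALLOC_NOIO  0x00080000u

/* Mock of the kernel's task structure; only 'flags' is modelled. */
struct task_struct {
	unsigned int flags;
};

static struct task_struct mock_task;
/* Hypothetical stand-in for the kernel's 'current' task pointer. */
static struct task_struct *current_task = &mock_task;

/* Drop __GFP_IO from the mask when the calling task has the flag set. */
static inline gfp_t memalloc_noio_flags(gfp_t flags)
{
	if (current_task->flags & PF_MEMALLOC_NOIO)
		flags &= ~__GFP_IO;
	return flags;
}

/* Small self-check that shows the intended behaviour. */
static inline void noio_mask_selfcheck(void)
{
	current_task->flags = 0;
	/* Flag clear: the caller's mask is passed through unchanged. */
	assert(memalloc_noio_flags(GFP_KERNEL) == GFP_KERNEL);

	current_task->flags |= PF_MEMALLOC_NOIO;
	/* Flag set: the same GFP_KERNEL request may no longer start I/O. */
	assert((memalloc_noio_flags(GFP_KERNEL) & __GFP_IO) == 0);

	current_task->flags &= ~PF_MEMALLOC_NOIO;
}
```

    In this model the caller keeps passing GFP_KERNEL; the restriction comes
    from the task context rather than from each call site.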
    
    The patch tries to solve a deadlock problem caused by block devices; the
    problem may happen in at least the situations below:
    
    - during block device runtime resume, if a memory allocation with
      GFP_KERNEL is made inside the runtime resume callback of any of its
      ancestors (or of the block device itself), a deadlock may be triggered
      inside the memory allocation, since the allocation might not complete
      until the block device becomes active and the involved page I/O
      finishes.  This situation was first pointed out by Alan Stern.  It is
      not a good approach to convert all GFP_KERNEL[1] allocations in the
      path into GFP_NOIO, because several subsystems may be involved (for
      example, PCI, USB and SCSI for a USB mass storage device, and network
      devices too in the iSCSI case).
    
    - during block device runtime suspend, because a runtime resume needs to
      wait for completion of a concurrent runtime suspend.
    
    - during error handling of a USB mass storage device, a USB bus reset
      will be performed on the device, so there must not be any memory
      allocation with GFP_KERNEL during the USB bus reset, otherwise a
      deadlock similar to the above may be triggered.  Unfortunately, any USB
      device may in theory include a mass storage interface, so all USB
      interface drivers would be required to handle the situation.  In fact,
      most USB drivers don't know how to handle a bus reset on the device and
      don't provide .pre_reset() and .post_reset() callbacks at all, so the
      USB core has to unbind and rebind the driver for these devices.  So it
      is still not practical to resort to GFP_NOIO to solve the problem.
    
    The introduced solution can also be used by the block subsystem or by
    block drivers, for example by setting the PF_MEMALLOC_NOIO flag before
    doing the actual I/O transfer.
    
    It is not a good idea to convert all these GFP_KERNEL allocations in the
    affected path into GFP_NOIO, because the functions doing them may be
    implemented as library code and be called in many other contexts.
    
    In fact, memalloc_noio_flags() allows some of the current static
    GFP_NOIO allocations to be converted back into GFP_KERNEL in other,
    non-affected contexts.  At least, almost all GFP_NOIO allocations in the
    USB subsystem can be converted into GFP_KERNEL after applying this
    approach, so that allocations without I/O generally happen only in the
    runtime resume, bus reset and block I/O transfer contexts.
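
    The scoping described above can be sketched in userspace as follows.
    This is a simplified model under stated assumptions, not the patch
    itself: the bit values, the mock task and 'current_task' are made up,
    the helper names memalloc_noio_save() and memalloc_noio_restore() are
    assumed here for illustration, and library_alloc_mask() and
    resume_path_alloc_mask() are hypothetical functions standing in for
    library code and a runtime resume callback.

```c
#include <assert.h>

/* Illustrative bit values only; the real ones live in the kernel headers. */
typedef unsigned int gfp_t;
#define __GFP_WAIT        0x10u
#define __GFP_IO          0x40u
#define __GFP_FS          0x80u
#define GFP_KERNEL        (__GFP_WAIT | __GFP_IO | __GFP_FS)
#define PF_MEMALLOC_NOIO  0x00080000u

/* Mock task; 'current_task' is a stand-in for the kernel's 'current'. */
struct task_struct {
	unsigned int flags;
};
static struct task_struct mock_task;
static struct task_struct *current_task = &mock_task;

static inline gfp_t memalloc_noio_flags(gfp_t flags)
{
	if (current_task->flags & PF_MEMALLOC_NOIO)
		flags &= ~__GFP_IO;
	return flags;
}

/* Enter a no-I/O section; returns the previous state of the flag. */
static inline unsigned int memalloc_noio_save(void)
{
	unsigned int old = current_task->flags & PF_MEMALLOC_NOIO;

	current_task->flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Leave a no-I/O section, putting back whatever state was saved. */
static inline void memalloc_noio_restore(unsigned int old)
{
	assert((old & ~PF_MEMALLOC_NOIO) == 0);
	current_task->flags = (current_task->flags & ~PF_MEMALLOC_NOIO) | old;
}

/*
 * Hypothetical library helper: it always requests GFP_KERNEL and returns
 * the mask an allocator would effectively use in the calling context.
 */
static inline gfp_t library_alloc_mask(void)
{
	return memalloc_noio_flags(GFP_KERNEL);
}

/* Hypothetical runtime resume path calling into the library helper. */
static inline gfp_t resume_path_alloc_mask(void)
{
	unsigned int old = memalloc_noio_save();
	gfp_t mask = library_alloc_mask();

	memalloc_noio_restore(old);
	return mask;
}
```

    Saving and restoring the previous flag state, rather than clearing the
    flag unconditionally, keeps nested no-I/O sections from ending the outer
    section early.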
    
    [1] Several examples of GFP_KERNEL allocations in the runtime resume path:
    
    - pci subsystem
    acpi_os_allocate
    	<-acpi_ut_allocate
    		<-ACPI_ALLOCATE_ZEROED
    			<-acpi_evaluate_object
    				<-__acpi_bus_set_power
    					<-acpi_bus_set_power
    						<-acpi_pci_set_power_state
    							<-platform_pci_set_power_state
    								<-pci_platform_power_transition
    									<-__pci_complete_power_transition
    										<-pci_set_power_state
    											<-pci_restore_standard_config
    												<-pci_pm_runtime_resume
    - usb subsystem
    usb_get_status
    	<-finish_port_resume
    		<-usb_port_resume
    			<-generic_resume
    				<-usb_resume_device
    					<-usb_resume_both
    						<-usb_runtime_resume
    
    - some individual usb drivers
    usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, ....
    
    That is just what I have found.  Unfortunately, such allocations can
    currently only be found by manual inspection, and there are likely many
    more that have not been found, since any function in the resume path
    (call tree) may allocate memory with GFP_KERNEL.
    Signed-off-by: Ming Lei <ming.lei@canonical.com>
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Cc: Alan Stern <stern@rowland.harvard.edu>
    Cc: Oliver Neukum <oneukum@suse.de>
    Cc: Jiri Kosina <jiri.kosina@suse.com>
    Cc: Mel Gorman <mel@csn.ul.ie>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
    Cc: Greg KH <greg@kroah.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: David Decotigny <david.decotigny@google.com>
    Cc: Tom Herbert <therbert@google.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>