diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/DocBook/genericirq.tmpl | 84 | ||||
-rw-r--r-- | Documentation/DocBook/kernel-locking.tmpl | 14 | ||||
-rw-r--r-- | Documentation/RCU/checklist.txt | 46 | ||||
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 18 | ||||
-rw-r--r-- | Documentation/RCU/trace.txt | 13 | ||||
-rw-r--r-- | Documentation/cputopology.txt | 23 | ||||
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 28 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 11 | ||||
-rw-r--r-- | Documentation/kprobes.txt | 8 | ||||
-rw-r--r-- | Documentation/pcmcia/driver-changes.txt | 25 |
10 files changed, 185 insertions, 85 deletions
diff --git a/Documentation/DocBook/genericirq.tmpl b/Documentation/DocBook/genericirq.tmpl index 1448b33fd222..fb10fd08c05c 100644 --- a/Documentation/DocBook/genericirq.tmpl +++ b/Documentation/DocBook/genericirq.tmpl @@ -28,7 +28,7 @@ </authorgroup> <copyright> - <year>2005-2006</year> + <year>2005-2010</year> <holder>Thomas Gleixner</holder> </copyright> <copyright> @@ -100,6 +100,10 @@ <listitem><para>Edge type</para></listitem> <listitem><para>Simple type</para></listitem> </itemizedlist> + During the implementation we identified another type: + <itemizedlist> + <listitem><para>Fast EOI type</para></listitem> + </itemizedlist> In the SMP world of the __do_IRQ() super-handler another type was identified: <itemizedlist> @@ -153,6 +157,7 @@ is still available. This leads to a kind of duality for the time being. Over time the new model should be used in more and more architectures, as it enables smaller and cleaner IRQ subsystems. + It's deprecated for three years now and about to be removed. </para> </chapter> <chapter id="bugs"> @@ -217,6 +222,7 @@ <itemizedlist> <listitem><para>handle_level_irq</para></listitem> <listitem><para>handle_edge_irq</para></listitem> + <listitem><para>handle_fasteoi_irq</para></listitem> <listitem><para>handle_simple_irq</para></listitem> <listitem><para>handle_percpu_irq</para></listitem> </itemizedlist> @@ -233,33 +239,33 @@ are used by the default flow implementations. The following helper functions are implemented (simplified excerpt): <programlisting> -default_enable(irq) +default_enable(struct irq_data *data) { - desc->chip->unmask(irq); + desc->chip->irq_unmask(data); } -default_disable(irq) +default_disable(struct irq_data *data) { - if (!delay_disable(irq)) - desc->chip->mask(irq); + if (!delay_disable(data)) + desc->chip->irq_mask(data); } -default_ack(irq) +default_ack(struct irq_data *data) { - chip->ack(irq); + chip->irq_ack(data); } -default_mask_ack(irq) +default_mask_ack(struct irq_data *data) { - if (chip->mask_ack) { - chip->mask_ack(irq); + if (chip->irq_mask_ack) { + chip->irq_mask_ack(data); } else { - chip->mask(irq); - chip->ack(irq); + chip->irq_mask(data); + chip->irq_ack(data); } } -noop(irq) +noop(struct irq_data *data)) { } @@ -278,12 +284,27 @@ noop(irq) <para> The following control flow is implemented (simplified excerpt): <programlisting> -desc->chip->start(); +desc->chip->irq_mask(); handle_IRQ_event(desc->action); -desc->chip->end(); +desc->chip->irq_unmask(); </programlisting> </para> - </sect3> + </sect3> + <sect3 id="Default_FASTEOI_IRQ_flow_handler"> + <title>Default Fast EOI IRQ flow handler</title> + <para> + handle_fasteoi_irq provides a generic implementation + for interrupts, which only need an EOI at the end of + the handler + </para> + <para> + The following control flow is implemented (simplified excerpt): + <programlisting> +handle_IRQ_event(desc->action); +desc->chip->irq_eoi(); + </programlisting> + </para> + </sect3> <sect3 id="Default_Edge_IRQ_flow_handler"> <title>Default Edge IRQ flow handler</title> <para> @@ -294,20 +315,19 @@ desc->chip->end(); The following control flow is implemented (simplified excerpt): <programlisting> if (desc->status & running) { - desc->chip->hold(); + desc->chip->irq_mask(); desc->status |= pending | masked; return; } -desc->chip->start(); +desc->chip->irq_ack(); desc->status |= running; do { if (desc->status & masked) - desc->chip->enable(); + desc->chip->irq_unmask(); desc->status &= ~pending; handle_IRQ_event(desc->action); } while (status & pending); desc->status &= ~running; -desc->chip->end(); </programlisting> </para> </sect3> @@ -342,9 +362,9 @@ handle_IRQ_event(desc->action); <para> The following control flow is implemented (simplified excerpt): <programlisting> -desc->chip->start(); handle_IRQ_event(desc->action); -desc->chip->end(); +if (desc->chip->irq_eoi) + desc->chip->irq_eoi(); </programlisting> </para> </sect3> @@ -375,8 +395,7 @@ desc->chip->end(); mechanism. (It's necessary to enable CONFIG_HARDIRQS_SW_RESEND when you want to use the delayed interrupt disable feature and your hardware is not capable of retriggering an interrupt.) - The delayed interrupt disable can be runtime enabled, per interrupt, - by setting the IRQ_DELAYED_DISABLE flag in the irq_desc status field. + The delayed interrupt disable is not configurable. </para> </sect2> </sect1> @@ -387,13 +406,13 @@ desc->chip->end(); contains all the direct chip relevant functions, which can be utilized by the irq flow implementations. <itemizedlist> - <listitem><para>ack()</para></listitem> - <listitem><para>mask_ack() - Optional, recommended for performance</para></listitem> - <listitem><para>mask()</para></listitem> - <listitem><para>unmask()</para></listitem> - <listitem><para>retrigger() - Optional</para></listitem> - <listitem><para>set_type() - Optional</para></listitem> - <listitem><para>set_wake() - Optional</para></listitem> + <listitem><para>irq_ack()</para></listitem> + <listitem><para>irq_mask_ack() - Optional, recommended for performance</para></listitem> + <listitem><para>irq_mask()</para></listitem> + <listitem><para>irq_unmask()</para></listitem> + <listitem><para>irq_retrigger() - Optional</para></listitem> + <listitem><para>irq_set_type() - Optional</para></listitem> + <listitem><para>irq_set_wake() - Optional</para></listitem> </itemizedlist> These primitives are strictly intended to mean what they say: ack means ACK, masking means masking of an IRQ line, etc. It is up to the flow @@ -458,6 +477,7 @@ desc->chip->end(); <para> This chapter contains the autogenerated documentation of the internal functions. </para> +!Ikernel/irq/irqdesc.c !Ikernel/irq/handle.c !Ikernel/irq/chip.c </chapter> diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index a0d479d1e1dd..f66f4df18690 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl @@ -1645,7 +1645,9 @@ the amount of locking which needs to be done. all the readers who were traversing the list when we deleted the element are finished. We use <function>call_rcu()</function> to register a callback which will actually destroy the object once - the readers are finished. + all pre-existing readers are finished. Alternatively, + <function>synchronize_rcu()</function> may be used to block until + all pre-existing are finished. </para> <para> But how does Read Copy Update know when the readers are @@ -1714,7 +1716,7 @@ the amount of locking which needs to be done. - object_put(obj); + list_del_rcu(&obj->list); cache_num--; -+ call_rcu(&obj->rcu, cache_delete_rcu, obj); ++ call_rcu(&obj->rcu, cache_delete_rcu); } /* Must be holding cache_lock */ @@ -1725,14 +1727,6 @@ the amount of locking which needs to be done. if (++cache_num > MAX_CACHE_SIZE) { struct object *i, *outcast = NULL; list_for_each_entry(i, &cache, list) { -@@ -85,6 +94,7 @@ - obj->popularity = 0; - atomic_set(&obj->refcnt, 1); /* The cache holds a reference */ - spin_lock_init(&obj->lock); -+ INIT_RCU_HEAD(&obj->rcu); - - spin_lock_irqsave(&cache_lock, flags); - __cache_add(obj); @@ -104,12 +114,11 @@ struct object *cache_find(int id) { diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 790d1a812376..0c134f8afc6f 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -218,13 +218,22 @@ over a rather long period of time, but improvements are always welcome! include: a. Keeping a count of the number of data-structure elements - used by the RCU-protected data structure, including those - waiting for a grace period to elapse. Enforce a limit - on this number, stalling updates as needed to allow - previously deferred frees to complete. - - Alternatively, limit only the number awaiting deferred - free rather than the total number of elements. + used by the RCU-protected data structure, including + those waiting for a grace period to elapse. Enforce a + limit on this number, stalling updates as needed to allow + previously deferred frees to complete. Alternatively, + limit only the number awaiting deferred free rather than + the total number of elements. + + One way to stall the updates is to acquire the update-side + mutex. (Don't try this with a spinlock -- other CPUs + spinning on the lock could prevent the grace period + from ever ending.) Another way to stall the updates + is for the updates to use a wrapper function around + the memory allocator, so that this wrapper function + simulates OOM when there is too much memory awaiting an + RCU grace period. There are of course many other + variations on this theme. b. Limiting update rate. For example, if updates occur only once per hour, then no explicit rate limiting is required, @@ -365,3 +374,26 @@ over a rather long period of time, but improvements are always welcome! and the compiler to freely reorder code into and out of RCU read-side critical sections. It is the responsibility of the RCU update-side primitives to deal with this. + +17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and + the __rcu sparse checks to validate your RCU code. These + can help find problems as follows: + + CONFIG_PROVE_RCU: check that accesses to RCU-protected data + structures are carried out under the proper RCU + read-side critical section, while holding the right + combination of locks, or whatever other conditions + are appropriate. + + CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the + same object to call_rcu() (or friends) before an RCU + grace period has elapsed since the last time that you + passed that same object to call_rcu() (or friends). + + __rcu sparse checks: tag the pointer to the RCU-protected data + structure with __rcu, and sparse will warn you if you + access that pointer without the services of one of the + variants of rcu_dereference(). + + These debugging aids can help you find problems that are + otherwise extremely difficult to spot. diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 44c6dcc93d6d..862c08ef1fde 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -80,6 +80,24 @@ o A CPU looping with bottom halves disabled. This condition can o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel without invoking schedule(). +o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might + happen to preempt a low-priority task in the middle of an RCU + read-side critical section. This is especially damaging if + that low-priority task is not permitted to run on any other CPU, + in which case the next RCU grace period can never complete, which + will eventually cause the system to run out of memory and hang. + While the system is in the process of running itself out of + memory, you might see stall-warning messages. + +o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that + is running at a higher priority than the RCU softirq threads. + This will prevent RCU callbacks from ever being invoked, + and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent + RCU grace periods from ever completing. Either way, the + system will eventually run out of memory and hang. In the + CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning + messages. + o A bug in the RCU implementation. o A hardware failure. This is quite unlikely, but has occurred diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index efd8cc95c06b..a851118775d8 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -125,6 +125,17 @@ o "b" is the batch limit for this CPU. If more than this number of RCU callbacks is ready to invoke, then the remainder will be deferred. +o "ci" is the number of RCU callbacks that have been invoked for + this CPU. Note that ci+ql is the number of callbacks that have + been registered in absence of CPU-hotplug activity. + +o "co" is the number of RCU callbacks that have been orphaned due to + this CPU going offline. + +o "ca" is the number of RCU callbacks that have been adopted due to + other CPUs going offline. Note that ci+co-ca+ql is the number of + RCU callbacks registered on this CPU. + There is also an rcu/rcudata.csv file with the same information in comma-separated-variable spreadsheet format. @@ -180,7 +191,7 @@ o "s" is the "signaled" state that drives force_quiescent_state()'s o "jfq" is the number of jiffies remaining for this grace period before force_quiescent_state() is invoked to help push things - along. Note that CPUs in dyntick-idle mode thoughout the grace + along. Note that CPUs in dyntick-idle mode throughout the grace period will not report on their own, but rather must be check by some other CPU via force_quiescent_state(). diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt index f1c5c4bccd3e..902d3151f527 100644 --- a/Documentation/cputopology.txt +++ b/Documentation/cputopology.txt @@ -14,25 +14,39 @@ to /proc/cpuinfo. identifier (rather than the kernel's). The actual value is architecture and platform dependent. -3) /sys/devices/system/cpu/cpuX/topology/thread_siblings: +3) /sys/devices/system/cpu/cpuX/topology/book_id: + + the book ID of cpuX. Typically it is the hardware platform's + identifier (rather than the kernel's). The actual value is + architecture and platform dependent. + +4) /sys/devices/system/cpu/cpuX/topology/thread_siblings: internel kernel map of cpuX's hardware threads within the same core as cpuX -4) /sys/devices/system/cpu/cpuX/topology/core_siblings: +5) /sys/devices/system/cpu/cpuX/topology/core_siblings: internal kernel map of cpuX's hardware threads within the same physical_package_id. +6) /sys/devices/system/cpu/cpuX/topology/book_siblings: + + internal kernel map of cpuX's hardware threads within the same + book_id. + To implement it in an architecture-neutral way, a new source file, -drivers/base/topology.c, is to export the 4 attributes. +drivers/base/topology.c, is to export the 4 or 6 attributes. The two book +related sysfs files will only be created if CONFIG_SCHED_BOOK is selected. For an architecture to support this feature, it must define some of these macros in include/asm-XXX/topology.h: #define topology_physical_package_id(cpu) #define topology_core_id(cpu) +#define topology_book_id(cpu) #define topology_thread_cpumask(cpu) #define topology_core_cpumask(cpu) +#define topology_book_cpumask(cpu) The type of **_id is int. The type of siblings is (const) struct cpumask *. @@ -45,6 +59,9 @@ not defined by include/asm-XXX/topology.h: 3) thread_siblings: just the given CPU 4) core_siblings: just the given CPU +For architectures that don't support books (CONFIG_SCHED_BOOK) there are no +default definitions for topology_book_id() and topology_book_cpumask(). + Additionally, CPU topology information is provided under /sys/devices/system/cpu and includes these files. The internal source for the output is in brackets ("[]"). diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 842aa9de84a6..5e2bc4ab897a 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -386,34 +386,6 @@ Who: Tejun Heo <tj@kernel.org> ---------------------------- -What: Support for VMware's guest paravirtuliazation technique [VMI] will be - dropped. -When: 2.6.37 or earlier. -Why: With the recent innovations in CPU hardware acceleration technologies - from Intel and AMD, VMware ran a few experiments to compare these - techniques to guest paravirtualization technique on VMware's platform. - These hardware assisted virtualization techniques have outperformed the - performance benefits provided by VMI in most of the workloads. VMware - expects that these hardware features will be ubiquitous in a couple of - years, as a result, VMware has started a phased retirement of this - feature from the hypervisor. We will be removing this feature from the - Kernel too. Right now we are targeting 2.6.37 but can retire earlier if - technical reasons (read opportunity to remove major chunk of pvops) - arise. - - Please note that VMI has always been an optimization and non-VMI kernels - still work fine on VMware's platform. - Latest versions of VMware's product which support VMI are, - Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence - releases for these products will continue supporting VMI. - - For more details about VMI retirement take a look at this, - http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html - -Who: Alok N Kataria <akataria@vmware.com> - ----------------------------- - What: Support for lcd_switch and display_get in asus-laptop driver When: March 2010 Why: These two features use non-standard interfaces. There are the diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 8dd7248508a9..3a0009e03d14 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -455,7 +455,7 @@ and is between 256 and 4096 characters. It is defined in the file [ARM] imx_timer1,OSTS,netx_timer,mpu_timer2, pxa_timer,timer3,32k_counter,timer0_1 [AVR32] avr32 - [X86-32] pit,hpet,tsc,vmi-timer; + [X86-32] pit,hpet,tsc; scx200_hrt on Geode; cyclone on IBM x440 [MIPS] MIPS [PARISC] cr16 @@ -2153,6 +2153,11 @@ and is between 256 and 4096 characters. It is defined in the file Reserves a hole at the top of the kernel virtual address space. + reservelow= [X86] + Format: nn[K] + Set the amount of memory to reserve for BIOS at + the bottom of the address space. + reset_devices [KNL] Force drivers to reset the underlying device during initialization. @@ -2435,6 +2440,10 @@ and is between 256 and 4096 characters. It is defined in the file disables clocksource verification at runtime. Used to enable high-resolution timer mode on older hardware, and in virtualized environment. + [x86] noirqtime: Do not use TSC to do irq accounting. + Used to run time disable IRQ_TIME_ACCOUNTING on any + platforms where RDTSC is slow and this accounting + can add overhead. turbografx.map[2|3]= [HW,JOY] TurboGraFX parallel port interface diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 1762b81fcdf2..741fe66d6eca 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt @@ -542,9 +542,11 @@ Kprobes does not use mutexes or allocate memory except during registration and unregistration. Probe handlers are run with preemption disabled. Depending on the -architecture, handlers may also run with interrupts disabled. In any -case, your handler should not yield the CPU (e.g., by attempting to -acquire a semaphore). +architecture and optimization state, handlers may also run with +interrupts disabled (e.g., kretprobe handlers and optimized kprobe +handlers run without interrupt disabled on x86/x86-64). In any case, +your handler should not yield the CPU (e.g., by attempting to acquire +a semaphore). Since a return probe is implemented by replacing the return address with the trampoline's address, stack backtraces and calls diff --git a/Documentation/pcmcia/driver-changes.txt b/Documentation/pcmcia/driver-changes.txt index 26c0f9c00545..dd04361dd361 100644 --- a/Documentation/pcmcia/driver-changes.txt +++ b/Documentation/pcmcia/driver-changes.txt @@ -1,4 +1,29 @@ This file details changes in 2.6 which affect PCMCIA card driver authors: +* pcmcia_loop_config() and autoconfiguration (as of 2.6.36) + If struct pcmcia_device *p_dev->config_flags is set accordingly, + pcmcia_loop_config() now sets up certain configuration values + automatically, though the driver may still override the settings + in the callback function. The following autoconfiguration options + are provided at the moment: + CONF_AUTO_CHECK_VCC : check for matching Vcc + CONF_AUTO_SET_VPP : set Vpp + CONF_AUTO_AUDIO : auto-enable audio line, if required + CONF_AUTO_SET_IO : set ioport resources (->resource[0,1]) + CONF_AUTO_SET_IOMEM : set first iomem resource (->resource[2]) + +* pcmcia_request_configuration -> pcmcia_enable_device (as of 2.6.36) + pcmcia_request_configuration() got renamed to pcmcia_enable_device(), + as it mirrors pcmcia_disable_device(). Configuration settings are now + stored in struct pcmcia_device, e.g. in the fields config_flags, + config_index, config_base, vpp. + +* pcmcia_request_window changes (as of 2.6.36) + Instead of win_req_t, drivers are now requested to fill out + struct pcmcia_device *p_dev->resource[2,3,4,5] for up to four ioport + ranges. After a call to pcmcia_request_window(), the regions found there + are reserved and may be used immediately -- until pcmcia_release_window() + is called. + * pcmcia_request_io changes (as of 2.6.36) Instead of io_req_t, drivers are now requested to fill out struct pcmcia_device *p_dev->resource[0,1] for up to two ioport |