commit bd1a9eb6a755e1cb342725a11242251d2bfad567 Author: Greg Kroah-Hartman Date: Fri Jul 21 07:19:02 2017 +0200 Linux 4.11.12 commit c69bb567124fd43fcbf439318216556e4259c11f Author: Haozhong Zhang Date: Tue Jul 4 10:27:41 2017 +0800 kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS commit 691bd4340bef49cf7e5855d06cf24444b5bf2d85 upstream. It's easier for host applications, such as QEMU, if they can always access guest MSR_IA32_BNDCFGS in VMCS, even though MPX is disabled in guest cpuid. Signed-off-by: Haozhong Zhang Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit bedb27f7481485fee092611409da00f1ce345c4a Author: Jim Mattson Date: Tue May 23 11:52:54 2017 -0700 kvm: vmx: Check value written to IA32_BNDCFGS commit 4531662d1abf6c1f0e5c2b86ddb60e61509786c8 upstream. Bits 11:2 must be zero and the linear addess in bits 63:12 must be canonical. Otherwise, WRMSR(BNDCFGS) should raise #GP. Fixes: 0dd376e709975779 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save") Signed-off-by: Jim Mattson Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit e0c0372d43bbe3a7d0f2c61cbf7c1a9566a9ed58 Author: Jim Mattson Date: Wed May 24 10:49:25 2017 -0700 kvm: x86: Guest BNDCFGS requires guest MPX support commit 4439af9f911ae0243ffe4e2dfc12bace49605d8b upstream. The BNDCFGS MSR should only be exposed to the guest if the guest supports MPX. (cf. the TSC_AUX MSR and RDTSCP.) Fixes: 0dd376e709975779 ("KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save") Change-Id: I3ad7c01bda616715137ceac878f3fa7e66b6b387 Signed-off-by: Jim Mattson Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit e36fbd0c09fc3f33ce156b022a2ec0a17802f6e9 Author: Jim Mattson Date: Tue May 23 11:52:52 2017 -0700 kvm: vmx: Do not disable intercepts for BNDCFGS commit a8b6fda38f80e75afa3b125c9e7f2550b579454b upstream. The MSR permission bitmaps are shared by all VMs. However, some VMs may not be configured to support MPX, even when the host does. If the host supports VMX and the guest does not, we should intercept accesses to the BNDCFGS MSR, so that we can synthesize a #GP fault. Furthermore, if the host does not support MPX and the "ignore_msrs" kvm kernel parameter is set, then we should intercept accesses to the BNDCFGS MSR, so that we can skip over the rdmsr/wrmsr without raising a #GP fault. Fixes: da8999d31818fdc8 ("KVM: x86: Intel MPX vmx and msr handle") Signed-off-by: Jim Mattson Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit 27a231e2cbf4b1b276dbdc7722f3ae3e49533c5a Author: Dan Carpenter Date: Mon Jul 10 10:21:40 2017 +0300 PM / QoS: return -EINVAL for bogus strings commit 2ca30331c156ca9e97643ad05dd8930b8fe78b01 upstream. In the current code, if the user accidentally writes a bogus command to this sysfs file, then we set the latency tolerance to an uninitialized variable. Fixes: 2d984ad132a8 (PM / QoS: Introcuce latency tolerance device PM QoS type) Signed-off-by: Dan Carpenter Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit 81d6d6cff7a1b4052f4a985e3b4dd7e45d61a6ca Author: Ville Syrjälä Date: Thu Apr 27 19:02:21 2017 +0300 ALSA: x86: Clear the pdata.notify_lpe_audio pointer before teardown commit 8d5c30308d7c5a17db96fa5452c0232f633377c2 upstream. Clear the notify function pointer in the platform data before we tear down the driver. Otherwise i915 would end up calling a stale function pointer and possibly explode. Cc: Takashi Iwai Cc: Pierre-Louis Bossart Signed-off-by: Ville Syrjälä Link: http://patchwork.freedesktop.org/patch/msgid/20170427160231.13337-3-ville.syrjala@linux.intel.com Reviewed-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 4ad8c2aa7bc94e43c4692f56de86362c85ff8002 Author: Thomas Gleixner Date: Sun Jun 25 19:31:13 2017 +0200 PM / wakeirq: Convert to SRCU commit ea0212f40c6bc0594c8eff79266759e3ecd4bacc upstream. The wakeirq infrastructure uses RCU to protect the list of wakeirqs. That breaks the irq bus locking infrastructure, which is allows sleeping functions to be called so interrupt controllers behind slow busses, e.g. i2c, can be handled. The wakeirq functions hold rcu_read_lock and call into irq functions, which in case of interrupts using the irq bus locking will trigger a might_sleep() splat. Convert the wakeirq infrastructure to Sleepable RCU and unbreak it. Fixes: 4990d4fe327b (PM / Wakeirq: Add automated device wake IRQ handling) Reported-by: Brian Norris Suggested-by: Paul E. McKenney Signed-off-by: Thomas Gleixner Reviewed-by: Paul E. McKenney Tested-by: Tony Lindgren Tested-by: Brian Norris Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit f8d51665764029049703fa64bb631215c02b76b3 Author: Peter Zijlstra Date: Tue Apr 25 14:00:49 2017 +0200 sched/topology: Fix overlapping sched_group_mask commit 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 upstream. The point of sched_group_mask is to select those CPUs from sched_group_cpus that can actually arrive at this balance domain. The current code gets it wrong, as can be readily demonstrated with a topology like: node 0 1 2 3 0: 10 20 30 20 1: 20 10 20 30 2: 30 20 10 20 3: 20 30 20 10 Where (for example) domain 1 on CPU1 ends up with a mask that includes CPU0: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 (mask: 1), 2, 0 [] domain 1: span 0-3 level NUMA [] groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072) This causes sched_balance_cpu() to compute the wrong CPU and consequently should_we_balance() will terminate early resulting in missed load-balance opportunities. The fixed topology looks like: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 (mask: 1), 2, 0 [] domain 1: span 0-3 level NUMA [] groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072) (note: this relies on OVERLAP domains to always have children, this is true because the regular topology domains are still here -- this is before degenerate trimming) Debugged-by: Lauro Ramos Venancio Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans") Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 25816b4f1c992c6bb0978d2550735816cab2d8dc Author: Lauro Ramos Venancio Date: Thu Apr 20 16:51:40 2017 -0300 sched/topology: Optimize build_group_mask() commit f32d782e31bf079f600dcec126ed117b0577e85c upstream. The group mask is always used in intersection with the group CPUs. So, when building the group mask, we don't have to care about CPUs that are not part of the group. Signed-off-by: Lauro Ramos Venancio Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: lwang@redhat.com Cc: riel@redhat.com Link: http://lkml.kernel.org/r/1492717903-5195-2-git-send-email-lvenanci@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 5d2515d4fdbf77284d71de8d3e78ab936f7bd234 Author: Peter Zijlstra Date: Fri Apr 14 17:24:02 2017 +0200 sched/topology: Fix building of overlapping sched-groups commit 0372dd2736e02672ac6e189c31f7d8c02ad543cd upstream. When building the overlapping groups, we very obviously should start with the previous domain of _this_ @cpu, not CPU-0. This can be readily demonstrated with a topology like: node 0 1 2 3 0: 10 20 30 20 1: 20 10 20 30 2: 30 20 10 20 3: 20 30 20 10 Where (for example) CPU1 ends up generating the following nonsensical groups: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 2 0 [] domain 1: span 0-3 level NUMA [] groups: 1-3 (cpu_capacity = 3072) 0-1,3 (cpu_capacity = 3072) Where the fact that domain 1 doesn't include a group with span 0-2 is the obvious fail. With patch this looks like: [] CPU1 attaching sched-domain: [] domain 0: span 0-2 level NUMA [] groups: 1 0 2 [] domain 1: span 0-3 level NUMA [] groups: 0-2 (cpu_capacity = 3072) 0,2-3 (cpu_capacity = 3072) Debugged-by: Lauro Ramos Venancio Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans") Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 497deefb42dc3015886f969127b97125ea2cb048 Author: Peter Zijlstra Date: Fri Apr 14 14:20:05 2017 +0200 sched/fair, cpumask: Export for_each_cpu_wrap() commit c6508a39640b9a27fc2bc10cb708152672c82045 upstream. commit c743f0a5c50f2fcbc628526279cfa24f3dabe182 upstream. More users for for_each_cpu_wrap() have appeared. Promote the construct to generic cpumask interface. The implementation is slightly modified to reduce arguments. Signed-off-by: Peter Zijlstra (Intel) Cc: Lauro Ramos Venancio Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Cc: lwang@redhat.com Link: http://lkml.kernel.org/r/20170414122005.o35me2h5nowqkxbv@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Mel Gorman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman commit 5e256be7bb4e757e52289544ec24d503294bd75a Author: Horia Geantă Date: Fri Jul 7 16:57:06 2017 +0300 crypto: caam - fix signals handling commit 7459e1d25ffefa2b1be799477fcc1f6c62f6cec7 upstream. Driver does not properly handle the case when signals interrupt wait_for_completion_interruptible(): -it does not check for return value -completion structure is allocated on stack; in case a signal interrupts the sleep, it will go out of scope, causing the worker thread (caam_jr_dequeue) to fail when it accesses it wait_for_completion_interruptible() is replaced with uninterruptable wait_for_completion(). We choose to block all signals while waiting for I/O (device executing the split key generation job descriptor) since the alternative - in order to have a deterministic device state - would be to flush the job ring (aborting *all* in-progress jobs). Fixes: 045e36780f115 ("crypto: caam - ahash hmac support") Fixes: 4c1ec1f930154 ("crypto: caam - refactor key_gen, sg") Signed-off-by: Horia Geantă Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit d060c2f05f59d6aed891e067fb6a7eafc2407b61 Author: David Gstir Date: Wed Jun 28 15:27:10 2017 +0200 crypto: caam - properly set IV after {en,de}crypt commit 854b06f768794cd664886ec3ba3a5b1c58d42167 upstream. Certain cipher modes like CTS expect the IV (req->info) of ablkcipher_request (or equivalently req->iv of skcipher_request) to contain the last ciphertext block when the {en,de}crypt operation is done. This is currently not the case for the CAAM driver which in turn breaks e.g. cts(cbc(aes)) when the CAAM driver is enabled. This patch fixes the CAAM driver to properly set the IV after the {en,de}crypt operation of ablkcipher finishes. This issue was revealed by the changes in the SW CTS mode in commit 0605c41cc53ca ("crypto: cts - Convert to skcipher") Signed-off-by: David Gstir Reviewed-by: Horia Geantă Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit 96a5ede157dc0f27e922abd5874f8d9095b6f811 Author: Herbert Xu Date: Tue Jul 4 12:21:12 2017 +0800 crypto: sha1-ssse3 - Disable avx2 commit b82ce24426a4071da9529d726057e4e642948667 upstream. It has been reported that sha1-avx2 can cause page faults by reading beyond the end of the input. This patch disables it until it can be fixed. Fixes: 7c1da8d0d046 ("crypto: sha - SHA1 transform x86_64 AVX2") Reported-by: Jan Stancek Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit d7e274720ad901bcf7c26f8577e4e54deff9fb0c Author: Gilad Ben-Yossef Date: Wed Jun 28 10:22:03 2017 +0300 crypto: atmel - only treat EBUSY as transient if backlog commit 1606043f214f912a52195293614935811a6e3e53 upstream. The Atmel SHA driver was treating -EBUSY as indication of queueing to backlog without checking that backlog is enabled for the request. Fix it by checking request flags. Signed-off-by: Gilad Ben-Yossef Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit a7439186c9d5104d35de0f15a24339c15877940f Author: Martin Hicks Date: Tue May 2 09:38:35 2017 -0400 crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD commit 03d2c5114c95797c0aa7d9f463348b171a274fd4 upstream. An updated patch that also handles the additional key length requirements for the AEAD algorithms. The max keysize is not 96. For SHA384/512 it's 128, and for the AEAD algorithms it's longer still. Extend the max keysize for the AEAD size for AES256 + HMAC(SHA512). Fixes: 357fb60502ede ("crypto: talitos - add sha224, sha384 and sha512 to existing AEAD algorithms") Signed-off-by: Martin Hicks Acked-by: Horia Geantă Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit 87dde1dce9f2f9376a9c4b1ad142b0629979f767 Author: Helge Deller Date: Fri Jul 14 14:49:38 2017 -0700 mm: fix overflow check in expand_upwards() commit 37511fb5c91db93d8bd6e3f52f86e5a7ff7cfcdf upstream. Jörn Engel noticed that the expand_upwards() function might not return -ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE. Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa which all define TASK_SIZE as 0xffffffff, but since none of those have an upwards-growing stack we currently have no actual issue. Nevertheless let's fix this just in case any of the architectures with an upward-growing stack (currently parisc, metag and partly ia64) define TASK_SIZE similar. Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box Fixes: bd726c90b6b8 ("Allow stack to grow up to address space limit") Signed-off-by: Helge Deller Reported-by: Jörn Engel Cc: Hugh Dickins Cc: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit defb6f0beee7ed31c8bad9e93174388a3eda14b7 Author: Andy Lutomirski Date: Thu Jun 29 08:46:12 2017 -0700 selftests/capabilities: Fix the test_execve test commit 796a3bae2fba6810427efdb314a1c126c9490fb3 upstream. test_execve does rather odd mount manipulations to safely create temporary setuid and setgid executables that aren't visible to the rest of the system. Those executables end up in the test's cwd, but that cwd is MNT_DETACHed. The core namespace code considers MNT_DETACHed trees to belong to no mount namespace at all and, in general, MNT_DETACHed trees are only barely function. This interacted with commit 380cf5ba6b0a ("fs: Treat foreign mounts as nosuid") to cause all MNT_DETACHed trees to act as though they're nosuid, breaking the test. Fix it by just not detaching the tree. It's still in a private mount namespace and is therefore still invisible to the rest of the system (except via /proc, and the same nosuid logic will protect all other programs on the system from believing in test_execve's setuid bits). While we're at it, fix some blatant whitespace problems. Reported-by: Naresh Kamboju Fixes: 380cf5ba6b0a ("fs: Treat foreign mounts as nosuid") Cc: "Eric W. Biederman" Cc: Kees Cook Cc: Shuah Khan Cc: Greg KH Cc: linux-kselftest@vger.kernel.org Signed-off-by: Andy Lutomirski Acked-by: Greg Kroah-Hartman Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit d61020efab56b36c98ed3f3d522716d9a3f7ab61 Author: Eric W. Biederman Date: Mon Oct 24 17:25:19 2016 -0500 mnt: Make propagate_umount less slow for overlapping mount propagation trees commit 296990deb389c7da21c78030376ba244dc1badf5 upstream. Andrei Vagin pointed out that time to executue propagate_umount can go non-linear (and take a ludicrious amount of time) when the mount propogation trees of the mounts to be unmunted by a lazy unmount overlap. Make the walk of the mount propagation trees nearly linear by remembering which mounts have already been visited, allowing subsequent walks to detect when walking a mount propgation tree or a subtree of a mount propgation tree would be duplicate work and to skip them entirely. Walk the list of mounts whose propgatation trees need to be traversed from the mount highest in the mount tree to mounts lower in the mount tree so that odds are higher that the code will walk the largest trees first, allowing later tree walks to be skipped entirely. Add cleanup_umount_visitation to remover the code's memory of which mounts have been visited. Add the functions last_slave and skip_propagation_subtree to allow skipping appropriate parts of the mount propagation tree without needing to change the logic of the rest of the code. A script to generate overlapping mount propagation trees: $ cat runs.h set -e mount -t tmpfs zdtm /mnt mkdir -p /mnt/1 /mnt/2 mount -t tmpfs zdtm /mnt/1 mount --make-shared /mnt/1 mkdir /mnt/1/1 iteration=10 if [ -n "$1" ] ; then iteration=$1 fi for i in $(seq $iteration); do mount --bind /mnt/1/1 /mnt/1/1 done mount --rbind /mnt/1 /mnt/2 TIMEFORMAT='%Rs' nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 )) echo -n "umount -l /mnt/1 -> $nr " time umount -l /mnt/1 nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l ) time umount -l /mnt/2 $ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done Here are the performance numbers with and without the patch: mhash | 8192 | 8192 | 1048576 | 1048576 mounts | before | after | before | after ------------------------------------------------ 1025 | 0.040s | 0.016s | 0.038s | 0.019s 2049 | 0.094s | 0.017s | 0.080s | 0.018s 4097 | 0.243s | 0.019s | 0.206s | 0.023s 8193 | 1.202s | 0.028s | 1.562s | 0.032s 16385 | 9.635s | 0.036s | 9.952s | 0.041s 32769 | 60.928s | 0.063s | 44.321s | 0.064s 65537 | | 0.097s | | 0.097s 131073 | | 0.233s | | 0.176s 262145 | | 0.653s | | 0.344s 524289 | | 2.305s | | 0.735s 1048577 | | 7.107s | | 2.603s Andrei Vagin reports fixing the performance problem is part of the work to fix CVE-2016-6213. Fixes: a05964f3917c ("[PATCH] shared mounts handling: umount") Reported-by: Andrei Vagin Reviewed-by: Andrei Vagin Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman commit b43f81ef0babcc07db1eaaece37c989204e4ff00 Author: Eric W. Biederman Date: Mon Oct 24 16:16:13 2016 -0500 mnt: In propgate_umount handle visiting mounts in any order commit 99b19d16471e9c3faa85cad38abc9cbbe04c6d55 upstream. While investigating some poor umount performance I realized that in the case of overlapping mount trees where some of the mounts are locked the code has been failing to unmount all of the mounts it should have been unmounting. This failure to unmount all of the necessary mounts can be reproduced with: $ cat locked_mounts_test.sh mount -t tmpfs test-base /mnt mount --make-shared /mnt mkdir -p /mnt/b mount -t tmpfs test1 /mnt/b mount --make-shared /mnt/b mkdir -p /mnt/b/10 mount -t tmpfs test2 /mnt/b/10 mount --make-shared /mnt/b/10 mkdir -p /mnt/b/10/20 mount --rbind /mnt/b /mnt/b/10/20 unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi' sleep 1 umount -l /mnt/b wait %% $ unshare -Urm ./locked_mounts_test.sh This failure is corrected by removing the prepass that marks mounts that may be umounted. A first pass is added that umounts mounts if possible and if not sets mount mark if they could be unmounted if they weren't locked and adds them to a list to umount possibilities. This first pass reconsiders the mounts parent if it is on the list of umount possibilities, ensuring that information of umoutability will pass from child to mount parent. A second pass then walks through all mounts that are umounted and processes their children unmounting them or marking them for reparenting. A last pass cleans up the state on the mounts that could not be umounted and if applicable reparents them to their first parent that remained mounted. While a bit longer than the old code this code is much more robust as it allows information to flow up from the leaves and down from the trunk making the order in which mounts are encountered in the umount propgation tree irrelevant. Fixes: 0c56fe31420c ("mnt: Don't propagate unmounts to locked mounts") Reviewed-by: Andrei Vagin Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman commit 2d3d57171b36639ac533a889d6e4453c9c03e96e Author: Eric W. Biederman Date: Mon May 15 14:42:07 2017 -0500 mnt: In umount propagation reparent in a separate pass commit 570487d3faf2a1d8a220e6ee10f472163123d7da upstream. It was observed that in some pathlogical cases that the current code does not unmount everything it should. After investigation it was determined that the issue is that mnt_change_mntpoint can can change which mounts are available to be unmounted during mount propagation which is wrong. The trivial reproducer is: $ cat ./pathological.sh mount -t tmpfs test-base /mnt cd /mnt mkdir 1 2 1/1 mount --bind 1 1 mount --make-shared 1 mount --bind 1 2 mount --bind 1/1 1/1 mount --bind 1/1 1/1 echo grep test-base /proc/self/mountinfo umount 1/1 echo grep test-base /proc/self/mountinfo $ unshare -Urm ./pathological.sh The expected output looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 The output without the fix looks like: 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 That last mount in the output was in the propgation tree to be unmounted but was missed because the mnt_change_mountpoint changed it's parent before the walk through the mount propagation tree observed it. Fixes: 1064f874abc0 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.") Acked-by: Andrei Vagin Reviewed-by: Ram Pai Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman commit ec39c02d391259a88629177526c7842f6044bf50 Author: Michael Kelley Date: Thu May 18 10:46:07 2017 -0700 Drivers: hv: vmbus: Close timing hole that can corrupt per-cpu page commit 13b9abfc92be7c4454bff912021b9f835dea6e15 upstream. Extend the disabling of preemption to include the hypercall so that another thread can't get the CPU and corrupt the per-cpu page used for hypercall arguments. Signed-off-by: Michael Kelley Signed-off-by: K. Y. Srinivasan Signed-off-by: Greg Kroah-Hartman commit 9b5eaef15cfc272707a71e4176c714eedcd8a093 Author: Johan Hovold Date: Fri Jun 9 10:59:07 2017 +0100 nvmem: core: fix leaks on registration errors commit 3360acdf839170b612f5b212539694c20e3f16d0 upstream. Make sure to deregister and release the nvmem device and underlying memory on registration errors. Note that the private data must be freed using put_device() once the struct device has been initialised. Also note that there's a related reference leak in the deregistration function as reported by Mika Westerberg which is being fixed separately. Fixes: b6c217ab9be6 ("nvmem: Add backwards compatibility support for older EEPROM drivers.") Fixes: eace75cfdcf7 ("nvmem: Add a simple NVMEM framework for nvmem providers") Cc: Andrew Lunn Cc: Srinivas Kandagatla Cc: Mika Westerberg Signed-off-by: Johan Hovold Acked-by: Andrey Smirnov Signed-off-by: Srinivas Kandagatla Signed-off-by: Greg Kroah-Hartman commit 98ea29734ec8feac8fe35e6dc67d1ed52a0bdc9b Author: Paul E. McKenney Date: Fri Apr 28 20:11:09 2017 -0700 rcu: Add memory barriers for NOCB leader wakeup commit 6b5fc3a1331810db407c9e0e673dc1837afdc9d0 upstream. Wait/wakeup operations do not guarantee ordering on their own. Instead, either locking or memory barriers are required. This commit therefore adds memory barriers to wake_nocb_leader() and nocb_leader_wait(). Signed-off-by: Paul E. McKenney Tested-by: Krister Johansen Signed-off-by: Greg Kroah-Hartman commit 994b8608572b27ed758a56b54235c466c3fe3d28 Author: Adam Borowski Date: Sat Jun 3 09:35:06 2017 +0200 vt: fix unchecked __put_user() in tioclinux ioctls commit 6987dc8a70976561d22450b5858fc9767788cc1c upstream. Only read access is checked before this call. Actually, at the moment this is not an issue, as every in-tree arch does the same manual checks for VERIFY_READ vs VERIFY_WRITE, relying on the MMU to tell them apart, but this wasn't the case in the past and may happen again on some odd arch in the future. If anyone cares about 3.7 and earlier, this is a security hole (untested) on real 80386 CPUs. Signed-off-by: Adam Borowski Signed-off-by: Greg Kroah-Hartman commit a6b177129a76f49e77b5b42d5f74308ebb86e165 Author: Dong Bo Date: Tue Apr 25 14:11:29 2017 +0800 arm64: Preventing READ_IMPLIES_EXEC propagation commit 48f99c8ec0b25756d0283ab058826ae07d14fad7 upstream. Like arch/arm/, we inherit the READ_IMPLIES_EXEC personality flag across fork(). This is undesirable for a number of reasons: * ELF files that don't require executable stack can end up with it anyway * We end up performing un-necessary I-cache maintenance when mapping what should be non-executable pages * Restricting what is executable is generally desirable when defending against overflow attacks This patch clears the personality flag when setting up the personality for newly spwaned native tasks. Given that semi-recent AArch64 toolchains emit a non-executable PT_GNU_STACK header, userspace applications can already not rely on READ_IMPLIES_EXEC so shouldn't be adversely affected by this change. Reported-by: Peter Maydell Signed-off-by: Dong Bo [will: added comment to compat code, rewrote commit message] Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 618986c4bcda58a6ec5fc3244f061a756d549548 Author: Marc Zyngier Date: Wed Jun 21 22:45:08 2017 +0100 ARM64: dts: marvell: armada37xx: Fix timer interrupt specifiers commit 88cda00733f0731711c76e535d4972c296ac512e upstream. Contrary to popular belief, PPIs connected to a GICv3 to not have an affinity field similar to that of GICv2. That is consistent with the fact that GICv3 is designed to accomodate thousands of CPUs, and fitting them as a bitmap in a byte is... difficult. Fixes: adbc3695d9e4 ("arm64: dts: add the Marvell Armada 3700 family and a development board") Signed-off-by: Marc Zyngier Signed-off-by: Gregory CLEMENT Signed-off-by: Greg Kroah-Hartman commit 21f81d9022bbb7f68ff2194f3321e3b381f8b6d6 Author: Balbir Singh Date: Thu Jun 29 21:57:26 2017 +1000 powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR commit 1e2a516e89fc412a754327522ab271b42f99c6b4 upstream. This patch fixes a crash seen while doing a kexec from radix mode to hash mode. Key 0 is special in hash and used in the RPN by default, we set the key values to 0 today. In radix mode key 0 is used to control supervisor<->user access. In hash key 0 is used by default, so the first instruction after the switch causes a crash on kexec. Commit 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user space") introduced the setting of IAMR and AMOR values to prevent execution of user mode instructions from supervisor mode. We need to clean up these SPR's on kexec. Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user space") Reported-by: Benjamin Herrenschmidt Signed-off-by: Balbir Singh Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 2ee500dcfdcb688aceb06ea164541a5e99aecfac Author: Kees Cook Date: Fri Jul 7 11:57:29 2017 -0700 exec: Limit arg stack to at most 75% of _STK_LIM commit da029c11e6b12f321f36dac8771e833b65cec962 upstream. To avoid pathological stack usage or the need to special-case setuid execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB). Signed-off-by: Kees Cook Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 4d5266f108957aff03a335cf3ba23b3742290bcf Author: Kees Cook Date: Mon Jul 10 15:52:51 2017 -0700 s390: reduce ELF_ET_DYN_BASE commit a73dc5370e153ac63718d850bddf0c9aa9d871e6 upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. On 32-bit use 4MB, which is the traditional x86 minimum load location, likely to avoid historically requiring a 4MB page table entry when only a portion of the first 4MB would be used (since the NULL address is avoided). For s390 the position could be 0x10000, but that is needlessly close to the NULL address. Link: http://lkml.kernel.org/r/1498154792-49952-5-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: James Hogan Cc: Pratyush Anand Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit fa3a378b228b6f8d760fb470e7f17b387919e22f Author: Kees Cook Date: Mon Jul 10 15:52:47 2017 -0700 powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB commit 47ebb09d54856500c5a5e14824781902b3bb738e upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. On 32-bit use 4MB, which is the traditional x86 minimum load location, likely to avoid historically requiring a 4MB page table entry when only a portion of the first 4MB would be used (since the NULL address is avoided). Link: http://lkml.kernel.org/r/1498154792-49952-4-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook Tested-by: Michael Ellerman Acked-by: Michael Ellerman Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: James Hogan Cc: Pratyush Anand Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 4e33130d2c1b989f1b03667d7c2bf134f221ae64 Author: Kees Cook Date: Mon Jul 10 15:52:44 2017 -0700 arm64: move ELF_ET_DYN_BASE to 4GB / 4MB commit 02445990a96e60a67526510d8b00f7e3d14101c3 upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. On 32-bit use 4MB, to match ARM. This could be 0x8000, the standard ET_EXEC load address, but that is needlessly close to the NULL address, and anyone running arm compat PIE will have an MMU, so the tight mapping is not needed. Link: http://lkml.kernel.org/r/1498251600-132458-4-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Mark Rutland Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 612584f74dfe5a1345b6153d0d020d3514ddb4fa Author: Kees Cook Date: Mon Jul 10 15:52:40 2017 -0700 arm: move ELF_ET_DYN_BASE to 4MB commit 6a9af90a3bcde217a1c053e135f5f43e5d5fafbd upstream. Now that explicitly executed loaders are loaded in the mmap region, we have more freedom to decide where we position PIE binaries in the address space to avoid possible collisions with mmap or stack regions. 4MB is chosen here mainly to have parity with x86, where this is the traditional minimum load location, likely to avoid historically requiring a 4MB page table entry when only a portion of the first 4MB would be used (since the NULL address is avoided). For ARM the position could be 0x8000, the standard ET_EXEC load address, but that is needlessly close to the NULL address, and anyone running PIE on 32-bit ARM will have an MMU, so the tight mapping is not needed. Link: http://lkml.kernel.org/r/1498154792-49952-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: James Hogan Cc: Pratyush Anand Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Alexander Viro Cc: Andy Lutomirski Cc: Daniel Micay Cc: Dmitry Safonov Cc: Grzegorz Andrejczuk Cc: Kees Cook Cc: Masahiro Yamada Cc: Qualys Security Advisory Cc: Rik van Riel Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 9b1bbf6ea9b2b596ba271bec23b93c48181ad896 Author: Kees Cook Date: Mon Jul 10 15:52:37 2017 -0700 binfmt_elf: use ELF_ET_DYN_BASE only for PIE commit eab09532d40090698b05a07c1c87f39fdbc5fab5 upstream. The ELF_ET_DYN_BASE position was originally intended to keep loaders away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2 /bin/cat" might cause the subsequent load of /bin/cat into where the loader had been loaded.) With the advent of PIE (ET_DYN binaries with an INTERP Program Header), ELF_ET_DYN_BASE continued to be used since the kernel was only looking at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the top 1/3rd of the TASK_SIZE, a substantial portion of the address space is unused. For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are loaded above the mmap region. This means they can be made to collide (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with pathological stack regions. Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap region in all cases, and will now additionally avoid programs falling back to the mmap region by enforcing MAP_FIXED for program loads (i.e. if it would have collided with the stack, now it will fail to load instead of falling back to the mmap region). To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP) are loaded into the mmap region, leaving space available for either an ET_EXEC binary with a fixed location or PIE being loaded into mmap by the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE, which means architectures can now safely lower their values without risk of loaders colliding with their subsequently loaded programs. For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and suggestions on how to implement this solution. Fixes: d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR") Link: http://lkml.kernel.org/r/20170621173201.GA114489@beast Signed-off-by: Kees Cook Acked-by: Rik van Riel Cc: Daniel Micay Cc: Qualys Security Advisory Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Alexander Viro Cc: Dmitry Safonov Cc: Andy Lutomirski Cc: Grzegorz Andrejczuk Cc: Masahiro Yamada Cc: Benjamin Herrenschmidt Cc: Catalin Marinas Cc: Heiko Carstens Cc: James Hogan Cc: Martin Schwidefsky Cc: Michael Ellerman Cc: Paul Mackerras Cc: Pratyush Anand Cc: Russell King Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 422b6365be82d8d2b225e0e24af46b25d6ab7990 Author: Cyril Bur Date: Mon Jul 10 15:52:21 2017 -0700 checkpatch: silence perl 5.26.0 unescaped left brace warnings commit 8d81ae05d0176da1c54aeaed697fa34be5c5575e upstream. As of perl 5, version 26, subversion 0 (v5.26.0) some new warnings have occurred when running checkpatch. Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/^(.\s*){ <-- HERE \s*/ at scripts/checkpatch.pl line 3544. Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/^(.\s*){ <-- HERE \s*/ at scripts/checkpatch.pl line 3885. Unescaped left brace in regex is deprecated here (and will be fatal in Perl 5.30), passed through in regex; marked by <-- HERE in m/^(\+.*(?:do|\))){ <-- HERE / at scripts/checkpatch.pl line 4374. It seems perfectly reasonable to do as the warning suggests and simply escape the left brace in these three locations. Link: http://lkml.kernel.org/r/20170607060135.17384-1-cyrilbur@gmail.com Signed-off-by: Cyril Bur Acked-by: Joe Perches Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 29d52923b259ddba006265a42a818bd50fcb6fd5 Author: Sahitya Tummala Date: Mon Jul 10 15:50:00 2017 -0700 fs/dcache.c: fix spin lockup issue on nlru->lock commit b17c070fb624cf10162cf92ea5e1ec25cd8ac176 upstream. __list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer duration if there are more number of items in the lru list. As per the current code, it can hold the spin lock for upto maximum UINT_MAX entries at a time. So if there are more number of items in the lru list, then "BUG: spinlock lockup suspected" is observed in the below path: spin_bug+0x90 do_raw_spin_lock+0xfc _raw_spin_lock+0x28 list_lru_add+0x28 dput+0x1c8 path_put+0x20 terminate_walk+0x3c path_lookupat+0x100 filename_lookup+0x6c user_path_at_empty+0x54 SyS_faccessat+0xd0 el0_svc_naked+0x24 This nlru->lock is acquired by another CPU in this path - d_lru_shrink_move+0x34 dentry_lru_isolate_shrink+0x48 __list_lru_walk_one.isra.10+0x94 list_lru_walk_node+0x40 shrink_dcache_sb+0x60 do_remount_sb+0xbc do_emergency_remount+0xb0 process_one_work+0x228 worker_thread+0x2e0 kthread+0xf4 ret_from_fork+0x10 Fix this lockup by reducing the number of entries to be shrinked from the lru list to 1024 at once. Also, add cond_resched() before processing the lru list again. Link: http://marc.info/?t=149722864900001&r=1&w=2 Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org Signed-off-by: Sahitya Tummala Suggested-by: Jan Kara Suggested-by: Vladimir Davydov Acked-by: Vladimir Davydov Cc: Alexander Polakov Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 7d212c20781ed5c21e9b5eddd3aad6cbf15ae541 Author: Sahitya Tummala Date: Mon Jul 10 15:49:57 2017 -0700 mm/list_lru.c: fix list_lru_count_node() to be race free commit 2c80cd57c74339889a8752b20862a16c28929c3a upstream. list_lru_count_node() iterates over all memcgs to get the total number of entries on the node but it can race with memcg_drain_all_list_lrus(), which migrates the entries from a dead cgroup to another. This can return incorrect number of entries from list_lru_count_node(). Fix this by keeping track of entries per node and simply return it in list_lru_count_node(). Link: http://lkml.kernel.org/r/1498707555-30525-1-git-send-email-stummala@codeaurora.org Signed-off-by: Sahitya Tummala Acked-by: Vladimir Davydov Cc: Jan Kara Cc: Alexander Polakov Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 35038e4ff37512b1f56196a7e4ca91933974445b Author: Marcin Nowakowski Date: Thu Jul 6 15:35:31 2017 -0700 kernel/extable.c: mark core_kernel_text notrace commit c0d80ddab89916273cb97114889d3f337bc370ae upstream. core_kernel_text is used by MIPS in its function graph trace processing, so having this method traced leads to an infinite set of recursive calls such as: Call Trace: ftrace_return_to_handler+0x50/0x128 core_kernel_text+0x10/0x1b8 prepare_ftrace_return+0x6c/0x114 ftrace_graph_caller+0x20/0x44 return_to_handler+0x10/0x30 return_to_handler+0x0/0x30 return_to_handler+0x0/0x30 ftrace_ops_no_ops+0x114/0x1bc core_kernel_text+0x10/0x1b8 core_kernel_text+0x10/0x1b8 core_kernel_text+0x10/0x1b8 ftrace_ops_no_ops+0x114/0x1bc core_kernel_text+0x10/0x1b8 prepare_ftrace_return+0x6c/0x114 ftrace_graph_caller+0x20/0x44 (...) Mark the function notrace to avoid it being traced. Link: http://lkml.kernel.org/r/1498028607-6765-1-git-send-email-marcin.nowakowski@imgtec.com Signed-off-by: Marcin Nowakowski Reviewed-by: Masami Hiramatsu Cc: Peter Zijlstra Cc: Thomas Meyer Cc: Ingo Molnar Cc: Steven Rostedt Cc: Daniel Borkmann Cc: Paul Gortmaker Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 687203d9a7c3de69552a175c70f26b4043c62e56 Author: Kirill A. Shutemov Date: Thu Jul 6 15:35:28 2017 -0700 thp, mm: fix crash due race in MADV_FREE handling commit bbf29ffc7f963bb894f84f0580c70cfea01c3892 upstream. Reinette reported the following crash: BUG: Bad page state in process log2exe pfn:57600 page:ffffea00015d8000 count:0 mapcount:0 mapping: (null) index:0x20200 flags: 0x4000000000040019(locked|uptodate|dirty|swapbacked) raw: 4000000000040019 0000000000000000 0000000000020200 00000000ffffffff raw: ffffea00015d8020 ffffea00015d8020 0000000000000000 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: 0x1(locked) Modules linked in: rfcomm 8021q bnep intel_rapl x86_pkg_temp_thermal coretemp efivars btusb btrtl btbcm pwm_lpss_pci snd_hda_codec_hdmi btintel pwm_lpss snd_hda_codec_realtek snd_soc_skl snd_hda_codec_generic snd_soc_skl_ipc spi_pxa2xx_platform snd_soc_sst_ipc snd_soc_sst_dsp i2c_designware_platform i2c_designware_core snd_hda_ext_core snd_soc_sst_match snd_hda_intel snd_hda_codec mei_me snd_hda_core mei snd_soc_rt286 snd_soc_rl6347a snd_soc_core efivarfs CPU: 1 PID: 354 Comm: log2exe Not tainted 4.12.0-rc7-test-test #19 Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0027.2016.1108.1529 11/08/2016 Call Trace: bad_page+0x16a/0x1f0 free_pages_check_bad+0x117/0x190 free_hot_cold_page+0x7b1/0xad0 __put_page+0x70/0xa0 madvise_free_huge_pmd+0x627/0x7b0 madvise_free_pte_range+0x6f8/0x1150 __walk_page_range+0x6b5/0xe30 walk_page_range+0x13b/0x310 madvise_free_page_range.isra.16+0xad/0xd0 madvise_free_single_vma+0x2e4/0x470 SyS_madvise+0x8ce/0x1450 If somebody frees the page under us and we hold the last reference to it, put_page() would attempt to free the page before unlocking it. The fix is trivial reorder of operations. Dave said: "I came up with the exact same patch. For posterity, here's the test case, generated by syzkaller and trimmed down by Reinette: https://www.sr71.net/~dave/intel/log2.c And the config that helps detect this: https://www.sr71.net/~dave/intel/config-log2" Fixes: b8d3c4c3009d ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called") Link: http://lkml.kernel.org/r/20170628101249.17879-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov Reported-by: Reinette Chatre Acked-by: Dave Hansen Acked-by: Michal Hocko Acked-by: Minchan Kim Cc: Huang Ying Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit f795825cd63572fa9bb3dae3e50cb9b732017683 Author: Ben Hutchings Date: Thu May 25 12:58:33 2017 +0000 tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth commit 98dcea0cfd04e083ac74137ceb9a632604740e2d upstream. liblockdep has been broken since commit 75dd602a5198 ("lockdep: Fix lock_chain::base size"), as that adds a check that MAX_LOCK_DEPTH is within the range of lock_chain::depth and in liblockdep it is much too large. That should have resulted in a compiler error, but didn't because: - the check uses ARRAY_SIZE(), which isn't yet defined in liblockdep so is assumed to be an (undeclared) function - putting a function call inside a BUILD_BUG_ON() expression quietly turns it into some nonsense involving a variable-length array It did produce a compiler warning, but I didn't notice because liblockdep already produces too many warnings if -Wall is enabled (which I'll fix shortly). Even before that commit, which reduced lock_chain::depth from 8 bits to 6, MAX_LOCK_DEPTH was too large. Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: a.p.zijlstra@chello.nl Link: http://lkml.kernel.org/r/20170525130005.5947-3-alexander.levin@verizon.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit e03e4acf59845d38995589f0958ec3cb55305414 Author: Helge Deller Date: Mon May 29 17:14:16 2017 +0200 parisc/mm: Ensure IRQs are off in switch_mm() commit 649aa24254e85bf6bd7807dd372d083707852b1f upstream. This is because of commit f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") in which switch_mm_irqs_off() is called by the scheduler, vs switch_mm() which is used by use_mm(). This patch lets the parisc code mirror the x86 and powerpc code, ie. it disables interrupts in switch_mm(), and optimises the scheduler case by defining switch_mm_irqs_off(). Signed-off-by: Helge Deller Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit b05deb78923ed491eec923b3f01d109163437a95 Author: Thomas Bogendoerfer Date: Mon Jul 3 10:38:05 2017 +0200 parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs commit 33f9e02495d15a061f0c94ef46f5103a2d0c20f3 upstream. Enabling parport pc driver on a B2600 (and probably other 64bit PARISC systems) produced following BUG: CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc5-30198-g1132d5e #156 task: 000000009e050000 task.stack: 000000009e04c000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001101111111100001111 Not tainted r00-03 000000ff0806ff0f 000000009e04c990 0000000040871b78 000000009e04cac0 r04-07 0000000040c14de0 ffffffffffffffff 000000009e07f098 000000009d82d200 r08-11 000000009d82d210 0000000000000378 0000000000000000 0000000040c345e0 r12-15 0000000000000005 0000000040c345e0 0000000000000000 0000000040c9d5e0 r16-19 0000000040c345e0 00000000f00001c4 00000000f00001bc 0000000000000061 r20-23 000000009e04ce28 0000000000000010 0000000000000010 0000000040b89e40 r24-27 0000000000000003 0000000000ffffff 000000009d82d210 0000000040c14de0 r28-31 0000000000000000 000000009e04ca90 000000009e04cb40 0000000000000000 sr00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000404aece0 00000000404aece4 IIR: 03ffe01f ISR: 0000000010340000 IOR: 000001781304cac8 CPU: 0 CR30: 000000009e04c000 CR31: 00000000e2976de2 ORIG_R28: 0000000000000200 IAOQ[0]: sba_dma_supported+0x80/0xd0 IAOQ[1]: sba_dma_supported+0x84/0xd0 RP(r2): parport_pc_probe_port+0x178/0x1200 Cause is a call to dma_coerce_mask_and_coherenet in parport_pc_probe_port, which PARISC DMA API doesn't handle very nicely. This commit gives back DMA_ERROR_CODE for DMA API calls, if device isn't capable of DMA transaction. Signed-off-by: Thomas Bogendoerfer Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit e8de691feb0d4aba93a96b6e48b73d27870f8922 Author: Eric Biggers Date: Mon Jun 12 23:18:30 2017 -0700 parisc: use compat_sys_keyctl() commit b0f94efd5aa8daa8a07d7601714c2573266cd4c9 upstream. Architectures with a compat syscall table must put compat_sys_keyctl() in it, not sys_keyctl(). The parisc architecture was not doing this; fix it. Signed-off-by: Eric Biggers Acked-by: Helge Deller Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit 1d4e20497b07e023bc908269ca5c52a3850c1a78 Author: Helge Deller Date: Sun Jul 2 22:00:41 2017 +0200 parisc: Report SIGSEGV instead of SIGBUS when running out of stack commit 247462316f85a9e0479445c1a4223950b68ffac1 upstream. When a process runs out of stack the parisc kernel wrongly faults with SIGBUS instead of the expected SIGSEGV signal. This example shows how the kernel faults: do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000] trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000 The vma->vm_end value is the first address which does not belong to the vma, so adjust the check to include vma->vm_end to the range for which to send the SIGSEGV signal. This patch unbreaks building the debian libsigsegv package. Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit 46bed2221cb9d05a5ed4164ec88c22bb474bfff7 Author: Suzuki K Poulose Date: Fri Jun 30 10:58:28 2017 +0100 irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity commit 866d7c1b0a3c70387646c4e455e727a58c5d465a upstream. The GICv3 driver doesn't check if the target CPU for gic_set_affinity is valid before going ahead and making the changes. This triggers the following splat with KASAN: [ 141.189434] BUG: KASAN: global-out-of-bounds in gic_set_affinity+0x8c/0x140 [ 141.189704] Read of size 8 at addr ffff200009741d20 by task swapper/1/0 [ 141.189958] [ 141.190158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.12.0-rc7 [ 141.190458] Hardware name: Foundation-v8A (DT) [ 141.190658] Call trace: [ 141.190908] [] dump_backtrace+0x0/0x328 [ 141.191224] [] show_stack+0x14/0x20 [ 141.191507] [] dump_stack+0xa4/0xc8 [ 141.191858] [] print_address_description+0x13c/0x250 [ 141.192219] [] kasan_report+0x210/0x300 [ 141.192547] [] __asan_load8+0x84/0x98 [ 141.192874] [] gic_set_affinity+0x8c/0x140 [ 141.193158] [] irq_do_set_affinity+0x54/0xb8 [ 141.193473] [] irq_set_affinity_locked+0x64/0xf0 [ 141.193828] [] __irq_set_affinity+0x48/0x78 [ 141.194158] [] arm_perf_starting_cpu+0x104/0x150 [ 141.194513] [] cpuhp_invoke_callback+0x17c/0x1f8 [ 141.194783] [] notify_cpu_starting+0x8c/0xb8 [ 141.195130] [] secondary_start_kernel+0x15c/0x200 [ 141.195390] [<0000000080db81b4>] 0x80db81b4 [ 141.195603] [ 141.195685] The buggy address belongs to the variable: [ 141.196012] __cpu_logical_map+0x200/0x220 [ 141.196176] [ 141.196315] Memory state around the buggy address: [ 141.196586] ffff200009741c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 141.196913] ffff200009741c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 141.197158] >ffff200009741d00: 00 00 00 00 fa fa fa fa 00 00 00 00 00 00 00 00 [ 141.197487] ^ [ 141.197758] ffff200009741d80: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00 [ 141.198060] ffff200009741e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 141.198358] ================================================================== [ 141.198609] Disabling lock debugging due to kernel taint [ 141.198961] CPU1: Booted secondary processor [410fd051] This patch adds the check to make sure the cpu is valid. Fixes: commit 021f653791ad17e03f98 ("irqchip: gic-v3: Initial support for GICv3") Signed-off-by: Suzuki K Poulose Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit da3538c85910f7be46be7cff083dc8b615b038ad Author: Alex Deucher Date: Fri Jun 2 16:30:46 2017 -0400 drm/amdgpu/gfx6: properly cache mc_arb_ramcfg commit 6653ebd48f493efe3f3598ff3fe7b3d5451665df upstream. This was missing for gfx6. Acked-by: Huang Rui Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit fc0a2569a0ae9af51da2632715d84be7ec97df7c Author: Srinivas Dasari Date: Fri Jul 7 01:43:40 2017 +0300 cfg80211: Check if NAN service ID is of expected size commit 0a27844ce86d039d74221dd56cd8c0349b146b63 upstream. nla policy checks for only maximum length of the attribute data when the attribute type is NLA_BINARY. If userspace sends less data than specified, cfg80211 may access illegal memory. When type is NLA_UNSPEC, nla policy check ensures that userspace sends minimum specified length number of bytes. Remove type assignment to NLA_BINARY from nla_policy of NL80211_NAN_FUNC_SERVICE_ID to make these NLA_UNSPEC and to make sure minimum NL80211_NAN_FUNC_SERVICE_ID_LEN bytes are received from userspace with NL80211_NAN_FUNC_SERVICE_ID. Fixes: a442b761b24 ("cfg80211: add add_nan_func / del_nan_func") Signed-off-by: Srinivas Dasari Signed-off-by: Jouni Malinen Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 0e60de601f58acb4682c43cb77362f1dd0800655 Author: Srinivas Dasari Date: Fri Jul 7 01:43:39 2017 +0300 cfg80211: Check if PMKID attribute is of expected size commit 9361df14d1cbf966409d5d6f48bb334384fbe138 upstream. nla policy checks for only maximum length of the attribute data when the attribute type is NLA_BINARY. If userspace sends less data than specified, the wireless drivers may access illegal memory. When type is NLA_UNSPEC, nla policy check ensures that userspace sends minimum specified length number of bytes. Remove type assignment to NLA_BINARY from nla_policy of NL80211_ATTR_PMKID to make this NLA_UNSPEC and to make sure minimum WLAN_PMKID_LEN bytes are received from userspace with NL80211_ATTR_PMKID. Fixes: 67fbb16be69d ("nl80211: PMKSA caching support") Signed-off-by: Srinivas Dasari Signed-off-by: Jouni Malinen Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 880b98c33913e23c6f97e5f838b235faa9d54abb Author: Srinivas Dasari Date: Fri Jul 7 01:43:42 2017 +0300 cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES commit d7f13f7450369281a5d0ea463cc69890a15923ae upstream. validate_scan_freqs() retrieves frequencies from attributes nested in the attribute NL80211_ATTR_SCAN_FREQUENCIES with nla_get_u32(), which reads 4 bytes from each attribute without validating the size of data received. Attributes nested in NL80211_ATTR_SCAN_FREQUENCIES don't have an nla policy. Validate size of each attribute before parsing to avoid potential buffer overread. Fixes: 2a519311926 ("cfg80211/nl80211: scanning (and mac80211 update to use it)") Signed-off-by: Srinivas Dasari Signed-off-by: Jouni Malinen Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 61d3f24df74b44a00f4662a679446d0ed1f5fdf5 Author: Srinivas Dasari Date: Fri Jul 7 01:43:41 2017 +0300 cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE commit 8feb69c7bd89513be80eb19198d48f154b254021 upstream. Buffer overread may happen as nl80211_set_station() reads 4 bytes from the attribute NL80211_ATTR_LOCAL_MESH_POWER_MODE without validating the size of data received when userspace sends less than 4 bytes of data with NL80211_ATTR_LOCAL_MESH_POWER_MODE. Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE to avoid the buffer overread. Fixes: 3b1c5a5307f ("{cfg,nl}80211: mesh power mode primitives and userspace access") Signed-off-by: Srinivas Dasari Signed-off-by: Jouni Malinen Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 5c5b4bff5f4edb04285e8affb717e4830e4fc6bc Author: Daniel Kiper Date: Thu Jun 22 12:51:36 2017 +0200 efi: Process the MEMATTR table only if EFI_MEMMAP is enabled commit 457ea3f7e97881f937136ce0ba1f29f82b9abdb0 upstream. Otherwise e.g. Xen dom0 on x86_64 EFI platforms crashes. In theory we can check EFI_PARAVIRT too, however, EFI_MEMMAP looks more targeted and covers more cases. Signed-off-by: Daniel Kiper Reviewed-by: Ard Biesheuvel Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: andrew.cooper3@citrix.com Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: linux-efi@vger.kernel.org Cc: matt@codeblueprint.co.uk Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1498128697-12943-2-git-send-email-daniel.kiper@oracle.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 046f5b7733ab3363ec0f5b430ac98030411e5e69 Author: Peter S. Housel Date: Mon Jun 12 11:46:22 2017 +0100 brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain commit 5ea59db8a375216e6c915c5586f556766673b5a7 upstream. An earlier change to this function (3bdae810721b) fixed a leak in the case of an unsuccessful call to brcmf_sdiod_buffrw(). However, the glom_skb buffer, used for emulating a scattering read, is never used or referenced after its contents are copied into the destination buffers, and therefore always needs to be freed by the end of the function. Fixes: 3bdae810721b ("brcmfmac: Fix glob_skb leak in brcmf_sdiod_recv_chain") Fixes: a413e39a38573 ("brcmfmac: fix brcmf_sdcard_recv_chain() for host without sg support") Signed-off-by: Peter S. Housel Signed-off-by: Arend van Spriel Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit 305b4b1b5bbadb65b9eb4a412f751a3258de1fdf Author: Christophe Jaillet Date: Wed Jun 21 07:45:53 2017 +0200 brcmfmac: Fix a memory leak in error handling path in 'brcmf_cfg80211_attach' commit 57c00f2fac512837f8de73474ec1f54020015bae upstream. If 'wiphy_new()' fails, we leak 'ops'. Add a new label in the error handling path to free it in such a case. Fixes: 5c22fb85102a7 ("brcmfmac: add wowl gtk rekeying offload support") Signed-off-by: Christophe JAILLET Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit a6378366f6bbeefdb806fc0c2bda7914cb942589 Author: Bart Van Assche Date: Wed Jun 14 13:27:50 2017 -0600 block: Fix a blk_exit_rl() regression commit dc9edc44de6cd7cc8cc7f5b36c1adb221eda3207 upstream. Avoid that the following complaint is reported: BUG: sleeping function called from invalid context at kernel/workqueue.c:2790 in_atomic(): 1, irqs_disabled(): 0, pid: 41, name: rcuop/3 1 lock held by rcuop/3/41: #0: (rcu_callback){......}, at: [] rcu_nocb_kthread+0x282/0x500 Call Trace: dump_stack+0x86/0xcf ___might_sleep+0x174/0x260 __might_sleep+0x4a/0x80 flush_work+0x7e/0x2e0 __cancel_work_timer+0x143/0x1c0 cancel_work_sync+0x10/0x20 blk_throtl_exit+0x25/0x60 blkcg_exit_queue+0x35/0x40 blk_release_queue+0x42/0x130 kobject_put+0xa9/0x190 This happens since we invoke callbacks that need to block from the queue release handler. Fix this by pushing the final release to a workqueue. Reported-by: Ross Zwisler Fixes: commit b425e5049258 ("block: Avoid that blk_exit_rl() triggers a use-after-free") Signed-off-by: Bart Van Assche Tested-by: Ross Zwisler Signed-off-by: Greg Kroah-Hartman Updated changelog Signed-off-by: Jens Axboe Cc: Laura Abbott Signed-off-by: Greg Kroah-Hartman commit 27a85c4accc0e59fdeb9a408c72f735a3c4f9a1a Author: Nitin Gupta Date: Thu Jun 22 17:15:08 2017 -0700 sparc64: Fix gup_huge_pmd [ Upstream commit dbd2667a4fb9ce4f547982b07cd69dda127c47ea ] The function assumes that each PMD points to head of a huge page. This is not correct as a PMD can point to start of any 8M region with a, say 256M, hugepage. The fix ensures that it points to the correct head of any PMD huge page. Cc: Julian Calaby Signed-off-by: Nitin Gupta Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 64e3d664517f0075ef3848bfecb6a21276ed5cc3 Author: Nagarathnam Muthusamy Date: Mon Jun 19 13:08:50 2017 -0400 Adding the type of exported symbols [ Upstream commit f5a651f1d5e524cab345250a783702fb6a3f14d6 ] Missing symbol type for few functions prevents genksyms from generating symbol versions for those functions. This patch fixes them. Signed-off-by: Nagarathnam Muthusamy Reviewed-by: Babu Moger Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2ea8b8188f423b089394d5a86b481716ea58150a Author: Nagarathnam Muthusamy Date: Mon Jun 19 13:08:49 2017 -0400 sed regex in Makefile.build requires line break between exported symbols [ Upstream commit d16c0649feb4fe4e814f44803df5a617769c3233 ] The following regex in Makefile.build matches only one ___EXPORT_SYMBOL per line. sed 's/.*___EXPORT_SYMBOL[[:space:]]*\([a-zA-Z0-9_]*\)[[:space:]]*,.*/EXPORT_SYMBOL(\1);/' ATOMIC_OPS macro in atomic_64.S expands multiple symbols in same line hence version generation is done only for the last matched symbol. This patch adds new line between the symbol expansions. Signed-off-by: Nagarathnam Muthusamy Reviewed-by: Babu Moger Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4aac8676b889d83bfae4111cdd4f3c6fad3ff1bb Author: Nagarathnam Muthusamy Date: Mon Jun 19 13:08:48 2017 -0400 Adding asm-prototypes.h for genksyms to generate crc [ Upstream commit bdca8cc096203b17ad0ac4e19f50578207e054d2 ] This patch adds the prototypes of assembly defined functions to asm-prototypes.h. Some prototypes are directly added as they are not present in any existing header files. Signed-off-by: Nagarathnam Muthusamy Reviewed-by: Babu Moger Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 065d0643c0d62adde224415c4aa0ca434fdd53b0 Author: Bert Kenward Date: Wed Jul 12 17:19:41 2017 +0100 sfc: don't read beyond unicast address list [ Upstream commit c70d68150f71b84cea6997a53493e17bf18a54db ] If we have more than 32 unicast MAC addresses assigned to an interface we will read beyond the end of the address table in the driver when adding filters. The next 256 entries store multicast addresses, so we will end up attempting to insert duplicate filters, which is mostly harmless. If we add more than 288 unicast addresses we will then read past the multicast address table, which is likely to be more exciting. Fixes: 12fb0da45c9a ("sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode") Signed-off-by: Bert Kenward Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0dc4be778d53ba5ffc2ef434f7c6eabdf509e802 Author: Arend van Spriel Date: Fri Jul 7 21:09:06 2017 +0100 brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx() [ Upstream commit 8f44c9a41386729fea410e688959ddaa9d51be7c ] The lower level nl80211 code in cfg80211 ensures that "len" is between 25 and NL80211_ATTR_FRAME (2304). We subtract DOT11_MGMT_HDR_LEN (24) from "len" so thats's max of 2280. However, the action_frame->data[] buffer is only BRCMF_FIL_ACTION_FRAME_SIZE (1800) bytes long so this memcpy() can overflow. memcpy(action_frame->data, &buf[DOT11_MGMT_HDR_LEN], le16_to_cpu(action_frame->len)); Cc: stable@vger.kernel.org # 3.9.x Fixes: 18e2f61db3b70 ("brcmfmac: P2P action frame tx.") Reported-by: "freenerguo(郭大兴)" Signed-off-by: Arend van Spriel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a470f5fb25accb0cd4a9675960a8b5fd4e5bfdf1 Author: Eduardo Valentin Date: Tue Jul 11 14:55:12 2017 -0700 bridge: mdb: fix leak on complete_info ptr on fail path [ Upstream commit 1bfb159673957644951ab0a8d2aec44b93ddb1ae ] We currently get the following kmemleak report: unreferenced object 0xffff8800039d9820 (size 32): comm "softirq", pid 0, jiffies 4295212383 (age 792.416s) hex dump (first 32 bytes): 00 0c e0 03 00 88 ff ff ff 02 00 00 00 00 00 00 ................ 00 00 00 01 ff 11 00 02 86 dd 00 00 ff ff ff ff ................ backtrace: [] kmemleak_alloc+0x4a/0xa0 [] kmem_cache_alloc_trace+0xb8/0x1c0 [] __br_mdb_notify+0x2a3/0x300 [bridge] [] br_mdb_notify+0x6e/0x70 [bridge] [] br_multicast_add_group+0x109/0x150 [bridge] [] br_ip6_multicast_add_group+0x58/0x60 [bridge] [] br_multicast_rcv+0x1d5/0xdb0 [bridge] [] br_handle_frame_finish+0xcf/0x510 [bridge] [] br_nf_hook_thresh.part.27+0xb/0x10 [br_netfilter] [] br_nf_hook_thresh+0x48/0xb0 [br_netfilter] [] br_nf_pre_routing_finish_ipv6+0x109/0x1d0 [br_netfilter] [] br_nf_pre_routing_ipv6+0xd0/0x14c [br_netfilter] [] br_nf_pre_routing+0x197/0x3d0 [br_netfilter] [] nf_iterate+0x52/0x60 [] nf_hook_slow+0x5c/0xb0 [] br_handle_frame+0x1a4/0x2c0 [bridge] This happens when switchdev_port_obj_add() fails. This patch frees complete_info object in the fail path. Reviewed-by: Vallish Vaidyeshwara Signed-off-by: Eduardo Valentin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c5363b1aa8facf7616a664e59c633771bd907ff9 Author: WANG Cong Date: Mon Jul 10 10:05:50 2017 -0700 tap: convert a mutex to a spinlock [ Upstream commit ffa423fb3251f8737303ffc3b0659e86e501808e ] We are not allowed to block on the RCU reader side, so can't just hold the mutex as before. As a quick fix, convert it to a spinlock. Fixes: d9f1f61c0801 ("tap: Extending tap device create/destroy APIs") Reported-by: Christian Borntraeger Tested-by: Christian Borntraeger Cc: Sainath Grandhi Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5af29e937ed030e5a737585a229bed03e4035cd0 Author: Guilherme G. Piccoli Date: Mon Jul 10 10:55:46 2017 -0300 cxgb4: fix BUG() on interrupt deallocating path of ULD [ Upstream commit 6a146f3a5894b751cef16feb3d7903e45e3c445c ] Since the introduction of ULD (Upper-Layer Drivers), the MSI-X deallocating path changed in cxgb4: the driver frees the interrupts of ULD when unregistering it or on shutdown PCI handler. Problem is that if a MSI-X is not freed before deallocated in the PCI layer, it will trigger a BUG() due to still "alive" interrupt being tentatively quiesced. The below trace was observed when doing a simple unbind of Chelsio's adapter PCI function, like: "echo 001e:80:00.4 > /sys/bus/pci/drivers/cxgb4/unbind" Trace: kernel BUG at drivers/pci/msi.c:352! Oops: Exception in kernel mode, sig: 5 [#1] ... NIP [c0000000005a5e60] free_msi_irqs+0xa0/0x250 LR [c0000000005a5e50] free_msi_irqs+0x90/0x250 Call Trace: [c0000000005a5e50] free_msi_irqs+0x90/0x250 (unreliable) [c0000000005a72c4] pci_disable_msix+0x124/0x180 [d000000011e06708] disable_msi+0x88/0xb0 [cxgb4] [d000000011e06948] free_some_resources+0xa8/0x160 [cxgb4] [d000000011e06d60] remove_one+0x170/0x3c0 [cxgb4] [c00000000058a910] pci_device_remove+0x70/0x110 [c00000000064ef04] device_release_driver_internal+0x1f4/0x2c0 ... This patch fixes the issue by refactoring the shutdown path of ULD on cxgb4 driver, by properly freeing and disabling interrupts on PCI remove handler too. Fixes: 0fbc81b3ad51 ("Allocate resources dynamically for all cxgb4 ULD's") Reported-by: Harsha Thyagaraja Signed-off-by: Guilherme G. Piccoli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9192b9c71e68bde74dc828c94aa472ec3b71c514 Author: Huy Nguyen Date: Thu Jun 29 16:50:01 2017 -0500 net/mlx5e: Initialize CEE's getpermhwaddr address buffer to 0xff [ Upstream commit d968f0f2e4404152f37ed2384b4a2269dd2dae5a ] Latest change in open-lldp code uses bytes 6-11 of perm_addr buffer as the Ethernet source address for the host TLV packet. Since our driver does not fill these bytes, they stay at zero and the open-lldp code ends up sending the TLV packet with zero source address and the switch drops this packet. The fix is to initialize these bytes to 0xff. The open-lldp code considers 0xff:ff:ff:ff:ff:ff as the invalid address and falls back to use the host's mac address as the Ethernet source address. Fixes: 3a6a931dfb8e ("net/mlx5e: Support DCBX CEE API") Signed-off-by: Huy Nguyen Reviewed-by: Daniel Jurgens Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 1567144c579e336dcea085068d82def38b1f9384 Author: Sowmini Varadhan Date: Thu Jul 6 08:15:06 2017 -0700 rds: tcp: use sock_create_lite() to create the accept socket [ Upstream commit 0933a578cd55b02dc80f219dc8f2efb17ec61c9a ] There are two problems with calling sock_create_kern() from rds_tcp_accept_one() 1. it sets up a new_sock->sk that is wasteful, because this ->sk is going to get replaced by inet_accept() in the subsequent ->accept() 2. The new_sock->sk is a leaked reference in sock_graft() which expects to find a null parent->sk Avoid these problems by calling sock_create_lite(). Signed-off-by: Sowmini Varadhan Acked-by: Santosh Shilimkar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 372cc0d29ed770dafe73259c230467956b082ac8 Author: Nikolay Aleksandrov Date: Thu Jul 6 15:24:40 2017 +0300 vrf: fix bug_on triggered by rx when destroying a vrf [ Upstream commit f630c38ef0d785101363a8992bbd4f302180f86f ] When destroying a VRF device we cleanup the slaves in its ndo_uninit() function, but that causes packets to be switched (skb->dev == vrf being destroyed) even though we're pass the point where the VRF should be receiving any packets while it is being dismantled. This causes a BUG_ON to trigger if we have raw sockets (trace below). The reason is that the inetdev of the VRF has been destroyed but we're still sending packets up the stack with it, so let's free the slaves in the dellink callback as David Ahern suggested. Note that this fix doesn't prevent packets from going up when the VRF device is admin down. [ 35.631371] ------------[ cut here ]------------ [ 35.631603] kernel BUG at net/ipv4/fib_frontend.c:285! [ 35.631854] invalid opcode: 0000 [#1] SMP [ 35.631977] Modules linked in: [ 35.632081] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.12.0-rc7+ #45 [ 35.632247] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 35.632477] task: ffff88005ad68000 task.stack: ffff88005ad64000 [ 35.632632] RIP: 0010:fib_compute_spec_dst+0xfc/0x1ee [ 35.632769] RSP: 0018:ffff88005ad67978 EFLAGS: 00010202 [ 35.632910] RAX: 0000000000000001 RBX: ffff880059a7f200 RCX: 0000000000000000 [ 35.633084] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff82274af0 [ 35.633256] RBP: ffff88005ad679f8 R08: 000000000001ef70 R09: 0000000000000046 [ 35.633430] R10: ffff88005ad679f8 R11: ffff880037731cb0 R12: 0000000000000001 [ 35.633603] R13: ffff8800599e3000 R14: 0000000000000000 R15: ffff8800599cb852 [ 35.634114] FS: 0000000000000000(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000 [ 35.634306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 35.634456] CR2: 00007f3563227095 CR3: 000000000201d000 CR4: 00000000000406e0 [ 35.634632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 35.634865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 35.635055] Call Trace: [ 35.635271] ? __lock_acquire+0xf0d/0x1117 [ 35.635522] ipv4_pktinfo_prepare+0x82/0x151 [ 35.635831] raw_rcv_skb+0x17/0x3c [ 35.636062] raw_rcv+0xe5/0xf7 [ 35.636287] raw_local_deliver+0x169/0x1d9 [ 35.636534] ip_local_deliver_finish+0x87/0x1c4 [ 35.636820] ip_local_deliver+0x63/0x7f [ 35.637058] ip_rcv_finish+0x340/0x3a1 [ 35.637295] ip_rcv+0x314/0x34a [ 35.637525] __netif_receive_skb_core+0x49f/0x7c5 [ 35.637780] ? lock_acquire+0x13f/0x1d7 [ 35.638018] ? lock_acquire+0x15e/0x1d7 [ 35.638259] __netif_receive_skb+0x1e/0x94 [ 35.638502] ? __netif_receive_skb+0x1e/0x94 [ 35.638748] netif_receive_skb_internal+0x74/0x300 [ 35.639002] ? dev_gro_receive+0x2ed/0x411 [ 35.639246] ? lock_is_held_type+0xc4/0xd2 [ 35.639491] napi_gro_receive+0x105/0x1a0 [ 35.639736] receive_buf+0xc32/0xc74 [ 35.639965] ? detach_buf+0x67/0x153 [ 35.640201] ? virtqueue_get_buf_ctx+0x120/0x176 [ 35.640453] virtnet_poll+0x128/0x1c5 [ 35.640690] net_rx_action+0x103/0x343 [ 35.640932] __do_softirq+0x1c7/0x4b7 [ 35.641171] run_ksoftirqd+0x23/0x5c [ 35.641403] smpboot_thread_fn+0x24f/0x26d [ 35.641646] ? sort_range+0x22/0x22 [ 35.641878] kthread+0x129/0x131 [ 35.642104] ? __list_add+0x31/0x31 [ 35.642335] ? __list_add+0x31/0x31 [ 35.642568] ret_from_fork+0x2a/0x40 [ 35.642804] Code: 05 bd 87 a3 00 01 e8 1f ef 98 ff 4d 85 f6 48 c7 c7 f0 4a 27 82 41 0f 94 c4 31 c9 31 d2 41 0f b6 f4 e8 04 71 a1 ff 45 84 e4 74 02 <0f> 0b 0f b7 93 c4 00 00 00 4d 8b a5 80 05 00 00 48 03 93 d0 00 [ 35.644342] RIP: fib_compute_spec_dst+0xfc/0x1ee RSP: ffff88005ad67978 Fixes: 193125dbd8eb ("net: Introduce VRF device driver") Reported-by: Chris Cormier Signed-off-by: Nikolay Aleksandrov Acked-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 04e0f78f877dd21ef3114c1d18281d7daaa07e59 Author: David Ahern Date: Wed Jul 5 14:41:46 2017 -0600 net: ipv6: Compare lwstate in detecting duplicate nexthops [ Upstream commit f06b7549b79e29a672336d4e134524373fb7a232 ] Lennert reported a failure to add different mpls encaps in a multipath route: $ ip -6 route add 1234::/16 \ nexthop encap mpls 10 via fe80::1 dev ens3 \ nexthop encap mpls 20 via fe80::1 dev ens3 RTNETLINK answers: File exists The problem is that the duplicate nexthop detection does not compare lwtunnel configuration. Add it. Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: David Ahern Reported-by: João Taveira Araújo Reported-by: Lennert Buytenhek Acked-by: Roopa Prabhu Tested-by: Lennert Buytenhek Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5269faa455c680f4f1e44e45ef41dae0878c45d4 Author: Derek Chickles Date: Wed Jul 5 11:59:27 2017 -0700 liquidio: fix bug in soft reset failure detection [ Upstream commit 05a6b4cae8c0cc1680c9dd33a97a49a13c0f01bc ] The code that detects a failed soft reset of Octeon is comparing the wrong value against the reset value of the Octeon SLI_SCRATCH_1 register, resulting in an inability to detect a soft reset failure. Fix it by using the correct value in the comparison, which is any non-zero value. Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters") Fixes: c0eab5b3580a ("liquidio: CN23XX firmware download") Signed-off-by: Derek Chickles Signed-off-by: Satanand Burla Signed-off-by: Raghu Vatsavayi Signed-off-by: Felix Manlunas Reviewed-by: Leon Romanovsky Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8a2f02b890e0c8a327cb48340f501ea3b23a67f5 Author: Alban Browaeys Date: Mon Jul 3 03:20:13 2017 +0200 net: core: Fix slab-out-of-bounds in netdev_stats_to_stats64 [ Upstream commit 9af9959e142c274f4a30fefb71d97d2b028b337f ] commit 9256645af098 ("net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64") made an attempt to read beyond the size of the source a possibility. Fix to only copy src size to dest. As dest might be bigger than src. ================================================================== BUG: KASAN: slab-out-of-bounds in netdev_stats_to_stats64+0xe/0x30 at addr ffff8801be248b20 Read of size 192 by task VBoxNetAdpCtl/6734 CPU: 1 PID: 6734 Comm: VBoxNetAdpCtl Tainted: G O 4.11.4prahal+intel+ #118 Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET52WW (1.32 ) 05/04/2017 Call Trace: dump_stack+0x63/0x86 kasan_object_err+0x1c/0x70 kasan_report+0x270/0x520 ? netdev_stats_to_stats64+0xe/0x30 ? sched_clock_cpu+0x1b/0x190 ? __module_address+0x3e/0x3b0 ? unwind_next_frame+0x1ea/0xb00 check_memory_region+0x13c/0x1a0 memcpy+0x23/0x50 netdev_stats_to_stats64+0xe/0x30 dev_get_stats+0x1b9/0x230 rtnl_fill_stats+0x44/0xc00 ? nla_put+0xc6/0x130 rtnl_fill_ifinfo+0xe9e/0x3700 ? rtnl_fill_vfinfo+0xde0/0xde0 ? sched_clock+0x9/0x10 ? sched_clock+0x9/0x10 ? sched_clock_local+0x120/0x130 ? __module_address+0x3e/0x3b0 ? unwind_next_frame+0x1ea/0xb00 ? sched_clock+0x9/0x10 ? sched_clock+0x9/0x10 ? sched_clock_cpu+0x1b/0x190 ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? depot_save_stack+0x1d8/0x4a0 ? depot_save_stack+0x34f/0x4a0 ? depot_save_stack+0x34f/0x4a0 ? save_stack+0xb1/0xd0 ? save_stack_trace+0x16/0x20 ? save_stack+0x46/0xd0 ? kasan_slab_alloc+0x12/0x20 ? __kmalloc_node_track_caller+0x10d/0x350 ? __kmalloc_reserve.isra.36+0x2c/0xc0 ? __alloc_skb+0xd0/0x560 ? rtmsg_ifinfo_build_skb+0x61/0x120 ? rtmsg_ifinfo.part.25+0x16/0xb0 ? rtmsg_ifinfo+0x47/0x70 ? register_netdev+0x15/0x30 ? vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp] ? vboxNetAdpCreate+0x210/0x400 [vboxnetadp] ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? do_vfs_ioctl+0x17f/0xff0 ? SyS_ioctl+0x74/0x80 ? do_syscall_64+0x182/0x390 ? __alloc_skb+0xd0/0x560 ? __alloc_skb+0xd0/0x560 ? save_stack_trace+0x16/0x20 ? init_object+0x64/0xa0 ? ___slab_alloc+0x1ae/0x5c0 ? ___slab_alloc+0x1ae/0x5c0 ? __alloc_skb+0xd0/0x560 ? sched_clock+0x9/0x10 ? kasan_unpoison_shadow+0x35/0x50 ? kasan_kmalloc+0xad/0xe0 ? __kmalloc_node_track_caller+0x246/0x350 ? __alloc_skb+0xd0/0x560 ? kasan_unpoison_shadow+0x35/0x50 ? memset+0x31/0x40 ? __alloc_skb+0x31f/0x560 ? napi_consume_skb+0x320/0x320 ? br_get_link_af_size_filtered+0xb7/0x120 [bridge] ? if_nlmsg_size+0x440/0x630 rtmsg_ifinfo_build_skb+0x83/0x120 rtmsg_ifinfo.part.25+0x16/0xb0 rtmsg_ifinfo+0x47/0x70 register_netdevice+0xa2b/0xe50 ? __kmalloc+0x171/0x2d0 ? netdev_change_features+0x80/0x80 register_netdev+0x15/0x30 vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp] vboxNetAdpCreate+0x210/0x400 [vboxnetadp] ? vboxNetAdpComposeMACAddress+0x1d0/0x1d0 [vboxnetadp] ? kasan_check_write+0x14/0x20 VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? VBoxNetAdpLinuxOpen+0x20/0x20 [vboxnetadp] ? lock_acquire+0x11c/0x270 ? __audit_syscall_entry+0x2fb/0x660 do_vfs_ioctl+0x17f/0xff0 ? __audit_syscall_entry+0x2fb/0x660 ? ioctl_preallocate+0x1d0/0x1d0 ? __audit_syscall_entry+0x2fb/0x660 ? kmem_cache_free+0xb2/0x250 ? syscall_trace_enter+0x537/0xd00 ? exit_to_usermode_loop+0x100/0x100 SyS_ioctl+0x74/0x80 ? do_sys_open+0x350/0x350 ? do_vfs_ioctl+0xff0/0xff0 do_syscall_64+0x182/0x390 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f7e39a1ae07 RSP: 002b:00007ffc6f04c6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffc6f04c730 RCX: 00007f7e39a1ae07 RDX: 00007ffc6f04c730 RSI: 00000000c0207601 RDI: 0000000000000007 RBP: 00007ffc6f04c700 R08: 00007ffc6f04c780 R09: 0000000000000008 R10: 0000000000000541 R11: 0000000000000206 R12: 0000000000000007 R13: 00000000c0207601 R14: 00007ffc6f04c730 R15: 0000000000000012 Object at ffff8801be248008, in cache kmalloc-4096 size: 4096 Allocated: PID = 6734 save_stack_trace+0x16/0x20 save_stack+0x46/0xd0 kasan_kmalloc+0xad/0xe0 __kmalloc+0x171/0x2d0 alloc_netdev_mqs+0x8a7/0xbe0 vboxNetAdpOsCreate+0x65/0x1c0 [vboxnetadp] vboxNetAdpCreate+0x210/0x400 [vboxnetadp] VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] do_vfs_ioctl+0x17f/0xff0 SyS_ioctl+0x74/0x80 do_syscall_64+0x182/0x390 return_from_SYSCALL_64+0x0/0x6a Freed: PID = 5600 save_stack_trace+0x16/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x73/0xc0 kfree+0xe4/0x220 kvfree+0x25/0x30 single_release+0x74/0xb0 __fput+0x265/0x6b0 ____fput+0x9/0x10 task_work_run+0xd5/0x150 exit_to_usermode_loop+0xe2/0x100 do_syscall_64+0x26c/0x390 return_from_SYSCALL_64+0x0/0x6a Memory state around the buggy address: ffff8801be248a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801be248b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8801be248b80: 00 00 00 00 00 00 00 00 00 00 00 07 fc fc fc fc ^ ffff8801be248c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801be248c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Signed-off-by: Alban Browaeys Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c1daa73dac700d5d8fc99434ccd6427f2a24ab9c Author: Jiri Benc Date: Sun Jul 2 19:00:58 2017 +0200 geneve: fix hlist corruption [ Upstream commit 4b4c21fad6ae6bd58ff1566f23b0f4f70fdc9a30 ] It's not a good idea to add the same hlist_node to two different hash lists. This leads to various hard to debug memory corruptions. Fixes: 8ed66f0e8235 ("geneve: implement support for IPv6-based tunnels") Cc: John W. Linville Signed-off-by: Jiri Benc Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 173ced803da20b073c84c17822e780b128cd3cbd Author: Jiri Benc Date: Sun Jul 2 19:00:57 2017 +0200 vxlan: fix hlist corruption [ Upstream commit 69e766612c4bcb79e19cebed9eed61d4222c1d47 ] It's not a good idea to add the same hlist_node to two different hash lists. This leads to various hard to debug memory corruptions. Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device") Signed-off-by: Jiri Benc Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3ca557c85ab07b35d4cf2bcb4439b3d12a4c246a Author: Sabrina Dubroca Date: Thu Jun 29 16:56:54 2017 +0200 ipv6: dad: don't remove dynamic addresses if link is down [ Upstream commit ec8add2a4c9df723c94a863b8fcd6d93c472deed ] Currently, when the link for $DEV is down, this command succeeds but the address is removed immediately by DAD (1): ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 In the same situation, this will succeed and not remove the address (2): ip addr add 1111::12/64 dev $DEV ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 The comment in addrconf_dad_begin() when !IF_READY makes it look like this is the intended behavior, but doesn't explain why: * If the device is not ready: * - keep it tentative if it is a permanent address. * - otherwise, kill it. We clearly cannot prevent userspace from doing (2), but we can make (1) work consistently with (2). addrconf_dad_stop() is only called in two cases: if DAD failed, or to skip DAD when the link is down. In that second case, the fix is to avoid deleting the address, like we already do for permanent addresses. Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") Signed-off-by: Sabrina Dubroca Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit febc276df9ace2450e5e3e8fcfed588490539b90 Author: Gal Pressman Date: Sun Jun 25 16:46:25 2017 +0300 net/mlx5e: Fix TX carrier errors report in get stats ndo [ Upstream commit 8ff93de7668bd81bc8efa819d1184ebd48fae72d ] Symbol error during carrier counter from PPCNT was mistakenly reported as TX carrier errors in get_stats ndo, although it's an RX counter. Fixes: 269e6b3af3bf ("net/mlx5e: Report additional error statistics in get stats ndo") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit fa8b93e1dcc5c8354bc146892e44d3240faa2f9e Author: Mohamad Haj Yahia Date: Thu Mar 30 17:09:00 2017 +0300 net/mlx5: Cancel delayed recovery work when unloading the driver [ Upstream commit 2a0165a034ac024b60cca49c61e46f4afa2e4d98 ] Draining the health workqueue will ignore future health works including the one that report hardware failure and thus we can't enter error state Instead cancel the recovery flow and make sure only recovery flow won't be scheduled. Fixes: 5e44fca50470 ('net/mlx5: Only cancel recovery work when cleaning up device') Signed-off-by: Mohamad Haj Yahia Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman commit 0a3eafac6c118a4d88d95f4d54d16ec322b930f9 Author: Michal Kubeček Date: Thu Jun 29 11:13:36 2017 +0200 net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish() [ Upstream commit e44699d2c28067f69698ccb68dd3ddeacfebc434 ] Recently I started seeing warnings about pages with refcount -1. The problem was traced to packets being reused after their head was merged into a GRO packet by skb_gro_receive(). While bisecting the issue pointed to commit c21b48cc1bbf ("net: adjust skb->truesize in ___pskb_trim()") and I have never seen it on a kernel with it reverted, I believe the real problem appeared earlier when the option to merge head frag in GRO was implemented. Handling NAPI_GRO_FREE_STOLEN_HEAD state was only added to GRO_MERGED_FREE branch of napi_skb_finish() so that if the driver uses napi_gro_frags() and head is merged (which in my case happens after the skb_condense() call added by the commit mentioned above), the skb is reused including the head that has been merged. As a result, we release the page reference twice and eventually end up with negative page refcount. To fix the problem, handle NAPI_GRO_FREE_STOLEN_HEAD in napi_frags_finish() the same way it's done in napi_skb_finish(). Fixes: d7e8883cfcf4 ("net: make GRO aware of skb->head_frag") Signed-off-by: Michal Kubecek Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 87a82d25054b5b0b273da5c407c6b918305f54b4 Author: Daniel Borkmann Date: Thu Jun 29 03:04:59 2017 +0200 bpf: prevent leaking pointer via xadd on unpriviledged [ Upstream commit 6bdf6abc56b53103324dfd270a86580306e1a232 ] Leaking kernel addresses on unpriviledged is generally disallowed, for example, verifier rejects the following: 0: (b7) r0 = 0 1: (18) r2 = 0xffff897e82304400 3: (7b) *(u64 *)(r1 +48) = r2 R2 leaks addr into ctx Doing pointer arithmetic on them is also forbidden, so that they don't turn into unknown value and then get leaked out. However, there's xadd as a special case, where we don't check the src reg for being a pointer register, e.g. the following will pass: 0: (b7) r0 = 0 1: (7b) *(u64 *)(r1 +48) = r0 2: (18) r2 = 0xffff897e82304400 ; map 4: (db) lock *(u64 *)(r1 +48) += r2 5: (95) exit We could store the pointer into skb->cb, loose the type context, and then read it out from there again to leak it eventually out of a map value. Or more easily in a different variant, too: 0: (bf) r6 = r1 1: (7a) *(u64 *)(r10 -8) = 0 2: (bf) r2 = r10 3: (07) r2 += -8 4: (18) r1 = 0x0 6: (85) call bpf_map_lookup_elem#1 7: (15) if r0 == 0x0 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp 8: (b7) r3 = 0 9: (7b) *(u64 *)(r0 +0) = r3 10: (db) lock *(u64 *)(r0 +0) += r6 11: (b7) r0 = 0 12: (95) exit from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp 11: (b7) r0 = 0 12: (95) exit Prevent this by checking xadd src reg for pointer types. Also add a couple of test cases related to this. Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs") Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Acked-by: Martin KaFai Lau Acked-by: Edward Cree Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dedf7348e8090d8f46388f2c7e3d05e16a52a571 Author: Dan Carpenter Date: Wed Jun 28 14:44:21 2017 +0300 rocker: move dereference before free [ Upstream commit acb4b7df48b539cb391287921de57e4e5fae3460 ] My static checker complains that ofdpa_neigh_del() can sometimes free "found". It just makes sense to use it first before deleting it. Fixes: ecf244f753e0 ("rocker: fix maybe-uninitialized warning") Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7841af1a3abe8359736e1f90e7f8d4832bf7ad4b Author: Ido Schimmel Date: Wed Jun 28 09:03:12 2017 +0300 mlxsw: spectrum_router: Fix NULL pointer dereference [ Upstream commit 6b27c8adf27edf1dabe2cdcfaa101ef7e2712415 ] In case a VLAN device is enslaved to a bridge we shouldn't create a router interface (RIF) for it when it's configured with an IP address. This is already handled by the driver for other types of netdevs, such as physical ports and LAG devices. If this IP address is then removed and the interface is subsequently unlinked from the bridge, a NULL pointer dereference can happen, as the original 802.1d FID was replaced with an rFID which was then deleted. To reproduce: $ ip link set dev enp3s0np9 up $ ip link add name enp3s0np9.111 link enp3s0np9 type vlan id 111 $ ip link set dev enp3s0np9.111 up $ ip link add name br0 type bridge $ ip link set dev br0 up $ ip link set enp3s0np9.111 master br0 $ ip address add dev enp3s0np9.111 192.168.0.1/24 $ ip address del dev enp3s0np9.111 192.168.0.1/24 $ ip link set dev enp3s0np9.111 nomaster Fixes: 99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces") Signed-off-by: Ido Schimmel Reported-by: Petr Machata Tested-by: Petr Machata Reviewed-by: Petr Machata Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3f4715e60e1d93b8ce952a3e4127008bfd58a4f8 Author: Gao Feng Date: Wed Jun 28 12:53:54 2017 +0800 net: sched: Fix one possible panic when no destroy callback [ Upstream commit c1a4872ebfb83b1af7144f7b29ac8c4b344a12a8 ] When qdisc fail to init, qdisc_create would invoke the destroy callback to cleanup. But there is no check if the callback exists really. So it would cause the panic if there is no real destroy callback like the qdisc codel, fq, and so on. Take codel as an example following: When a malicious user constructs one invalid netlink msg, it would cause codel_init->codel_change->nla_parse_nested failed. Then kernel would invoke the destroy callback directly but qdisc codel doesn't define one. It causes one panic as a result. Now add one the check for destroy to avoid the possible panic. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Gao Feng Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 378345d61fca426c0b7c3f604c14dafb67895f40 Author: Jason Wang Date: Wed Jun 28 09:51:03 2017 +0800 virtio-net: serialize tx routine during reset [ Upstream commit 713a98d90c5ea072c1bb00ef40617aee2cef2232 ] We don't hold any tx lock when trying to disable TX during reset, this would lead a use after free since ndo_start_xmit() tries to access the virtqueue which has already been freed. Fix this by using netif_tx_disable() before freeing the vqs, this could make sure no tx after vq freeing. Reported-by: Jean-Philippe Menil Tested-by: Jean-Philippe Menil Fixes commit f600b6905015 ("virtio_net: Add XDP support") Cc: John Fastabend Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Acked-by: Robert McCabe Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 28b35b3ffbe0eb9fae8b1b6e7f680a8648dea2f7 Author: Eric Dumazet Date: Tue Jun 27 07:02:20 2017 -0700 net: prevent sign extension in dev_get_stats() [ Upstream commit 6f64ec74515925cced6df4571638b5a099a49aae ] Similar to the fix provided by Dominik Heidler in commit 9b3dc0a17d73 ("l2tp: cast l2tp traffic counter to unsigned") we need to take care of 32bit kernels in dev_get_stats(). When using atomic_long_read(), we add a 'long' to u64 and might misinterpret high order bit, unless we cast to unsigned. Fixes: caf586e5f23ce ("net: add a core netdev->rx_dropped counter") Fixes: 015f0688f57ca ("net: net: add a core netdev->tx_dropped counter") Fixes: 6e7333d315a76 ("net: add rx_nohandler stat counter") Signed-off-by: Eric Dumazet Cc: Jarod Wilson Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 85c1186f5df971d6d30affb7c064a7841ccdbd3a Author: WANG Cong Date: Sat Jun 24 23:50:30 2017 -0700 tcp: reset sk_rx_dst in tcp_disconnect() [ Upstream commit d747a7a51b00984127a88113cdbbc26f91e9d815 ] We have to reset the sk->sk_rx_dst when we disconnect a TCP connection, because otherwise when we re-connect it this dst reference is simply overridden in tcp_finish_connect(). This fixes a dst leak which leads to a loopback dev refcnt leak. It is a long-standing bug, Kevin reported a very similar (if not same) bug before. Thanks to Andrei for providing such a reliable reproducer which greatly narrows down the problem. Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Reported-by: Andrei Vagin Reported-by: Kevin Xu Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a700bd8decb8626f3f4652d793674fc3ac8ef21b Author: Richard Cochran Date: Fri Jun 23 17:51:31 2017 +0200 net: dp83640: Avoid NULL pointer dereference. [ Upstream commit db9d8b29d19d2801793e4419f4c6272bf8951c62 ] The function, skb_complete_tx_timestamp(), used to allow passing in a NULL pointer for the time stamps, but that was changed in commit 62bccb8cdb69051b95a55ab0c489e3cab261c8ef ("net-timestamp: Make the clone operation stand-alone from phy timestamping"), and the existing call sites, all of which are in the dp83640 driver, were fixed up. Even though the kernel-doc was subsequently updated in commit 7a76a021cd5a292be875fbc616daf03eab1e6996 ("net-timestamp: Update skb_complete_tx_timestamp comment"), still a bug fix from Manfred Rudigier came into the driver using the old semantics. Probably Manfred derived that patch from an older kernel version. This fix should be applied to the stable trees as well. Fixes: 81e8f2e930fe ("net: dp83640: Fix tx timestamp overflow handling.") Signed-off-by: Richard Cochran Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8a0cd5660dfe186b046979aec0f61c7de2bbe721 Author: Michal Kubeček Date: Mon Jun 19 13:03:43 2017 +0200 net: account for current skb length when deciding about UFO [ Upstream commit a5cb659bbc1c8644efa0c3138a757a1e432a4880 ] Our customer encountered stuck NFS writes for blocks starting at specific offsets w.r.t. page boundary caused by networking stack sending packets via UFO enabled device with wrong checksum. The problem can be reproduced by composing a long UDP datagram from multiple parts using MSG_MORE flag: sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 3000, 0, ...); Assume this packet is to be routed via a device with MTU 1500 and NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(), this condition is tested (among others) to decide whether to call ip_ufo_append_data(): ((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb)) At the moment, we already have skb with 1028 bytes of data which is not marked for GSO so that the test is false (fragheaderlen is usually 20). Thus we append second 1000 bytes to this skb without invoking UFO. Third sendto(), however, has sufficient length to trigger the UFO path so that we end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb() uses udp_csum() to calculate the checksum but that assumes all fragments have correct checksum in skb->csum which is not true for UFO fragments. When checking against MTU, we need to add skb->len to length of new segment if we already have a partially filled skb and fragheaderlen only if there isn't one. In the IPv6 case, skb can only be null if this is the first segment so that we have to use headersize (length of the first IPv6 header) rather than fragheaderlen (length of IPv6 header of further fragments) for skb == NULL. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Fixes: e4c5e13aa45c ("ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output") Signed-off-by: Michal Kubecek Acked-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e2894b8778a8399024290567c398ec118ee094d6 Author: Martin Habets Date: Thu Jun 22 10:50:41 2017 +0100 sfc: Fix MCDI command size for filter operations [ Upstream commit bb53f4d4f5116d3dae76bb12fb16bc73771f958a ] The 8000 series adapters uses catch-all filters for encapsulated traffic to support filtering VXLAN, NVGRE and GENEVE traffic. This new filter functionality requires a longer MCDI command. This patch increases the size of buffers on stack that were missed, which fixes a kernel panic from the stack protector. Fixes: 9b41080125176 ("sfc: insert catch-all filters for encapsulated traffic") Signed-off-by: Martin Habets Acked-by: Edward Cree Acked-by: Bert Kenward bkenward@solarflare.com Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 527bad77ddfd3941dac7f7db925e2e4645db79a4 Author: Arnd Bergmann Date: Thu Jun 22 00:16:37 2017 +0200 netvsc: don't access netdev->num_rx_queues directly [ Upstream commit b92b7d3312033a08cae2c879b9243c42ad7de94b ] This structure member is hidden behind CONFIG_SYSFS, and we get a build error when that is disabled: drivers/net/hyperv/netvsc_drv.c: In function 'netvsc_set_channels': drivers/net/hyperv/netvsc_drv.c:754:49: error: 'struct net_device' has no member named 'num_rx_queues'; did you mean 'num_tx_queues'? drivers/net/hyperv/netvsc_drv.c: In function 'netvsc_set_rxfh': drivers/net/hyperv/netvsc_drv.c:1181:25: error: 'struct net_device' has no member named 'num_rx_queues'; did you mean 'num_tx_queues'? As the value is only set once to the argument of alloc_netdev_mq(), we can compare against that constant directly. Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table") Fixes: 2b01888d1b45 ("netvsc: allow more flexible setting of number of channels") Signed-off-by: Arnd Bergmann Reviewed-by: Haiyang Zhang Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 72056dab81057db52cf4133f7e3accb8a19cedcc Author: WANG Cong Date: Wed Jun 21 14:34:58 2017 -0700 ipv6: avoid unregistering inet6_dev for loopback [ Upstream commit 60abc0be96e00ca71bac083215ac91ad2e575096 ] The per netns loopback_dev->ip6_ptr is unregistered and set to NULL when its mtu is set to smaller than IPV6_MIN_MTU, this leads to that we could set rt->rt6i_idev NULL after a rt6_uncached_list_flush_dev() and then crash after another call. In this case we should just bring its inet6_dev down, rather than unregistering it, at least prior to commit 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") we always override the case for loopback. Thanks a lot to Andrey for finding a reliable reproducer. Fixes: 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") Reported-by: Andrey Konovalov Cc: Andrey Konovalov Cc: Daniel Lezcano Cc: David Ahern Signed-off-by: Cong Wang Acked-by: David Ahern Tested-by: Andrey Konovalov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5dc86c8d7849aee36159bd24163911fac6ff849e Author: Zach Brown Date: Tue Jun 20 12:48:11 2017 -0500 net/phy: micrel: configure intterupts after autoneg workaround [ Upstream commit b866203d872d5deeafcecd25ea429d6748b5bd56 ] The commit ("net/phy: micrel: Add workaround for bad autoneg") fixes an autoneg failure case by resetting the hardware. This turns off intterupts. Things will work themselves out if the phy polls, as it will figure out it's state during a poll. However if the phy uses only intterupts, the phy will stall, since interrupts are off. This patch fixes the issue by calling config_intr after resetting the phy. Fixes: d2fd719bcb0e ("net/phy: micrel: Add workaround for bad autoneg ") Signed-off-by: Zach Brown Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman