namespaces now completely use the unconfined profile to track the
refcount and rcu freeing cycle. So rework the code to simplify (track
everything through the profile path right up to the end), and move the
rcu_head from policy base to profile as the namespace no longer needs
it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
ns->unconfined is being used read side without locking, nor rcu but is
being updated when a namespace is removed. This works for the root ns
which is never removed but has a race window and can cause failures when
children namespaces are removed.
Also ns and ns->unconfined have a circular refcounting dependency that
is problematic and must be broken. Currently this is done incorrectly
when the namespace is destroyed.
Fix this by forward referencing unconfined via the replacedby infrastructure
instead of directly updating the ns->unconfined pointer.
Remove the circular refcount dependency by making the ns and its unconfined
profile share the same refcount.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
remove the use of replaced by chaining and move to profile invalidation
and lookup to handle task replacement.
Replacement chaining can result in large chains of profiles being pinned
in memory when one profile in the chain is use. With implicit labeling
this will be even more of a problem, so move to a direct lookup method.
Signed-off-by: John Johansen <john.johansen@canonical.com>
previously profiles had to be loaded one at a time, which could result
in cases where a replacement of a set would partially succeed, and then fail
resulting in inconsistent policy.
Allow multiple profiles to replaced "atomically" so that the replacement
either succeeds or fails for the entire set of profiles.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Add a policy directory to features to contain features that can affect
policy compilation but do not affect mediation. Eg of such features would
be types of dfa compression supported, etc.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
This is a follow-up to commit b5b3ee6c "apparmor: no need to delay vfree()".
Since vmalloc() will do "size = PAGE_ALIGN(size);",
we don't need to check for "size >= sizeof(struct work_struct)".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Smack interface for loading rules has always parsed only single rule from
data written to it. This requires user program to call one write() per
each rule it wants to load.
This change makes it possible to write multiple rules, separated by new
line character. Smack will load at most PAGE_SIZE-1 characters and properly
return number of processed bytes. In case when user buffer is larger, it
will be additionally truncated. All characters after last \n will not get
parsed to avoid partial rule near input buffer boundary.
Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Previously, all css descendant iterators didn't include the origin
(root of subtree) css in the iteration. The reasons were maintaining
consistency with css_for_each_child() and that at the time of
introduction more use cases needed skipping the origin anyway;
however, given that css_is_descendant() considers self to be a
descendant, omitting the origin css has become more confusing and
looking at the accumulated use cases rather clearly indicates that
including origin would result in simpler code overall.
While this is a change which can easily lead to subtle bugs, cgroup
API including the iterators has recently gone through major
restructuring and no out-of-tree changes will be applicable without
adjustments making this a relatively acceptable opportunity for this
type of change.
The conversions are mostly straight-forward. If the iteration block
had explicit origin handling before or after, it's moved inside the
iteration. If not, if (pos == origin) continue; is added. Some
conversions add extra reference get/put around origin handling by
consolidating origin handling and the rest. While the extra ref
operations aren't strictly necessary, this shouldn't cause any
noticeable difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
cgroup is currently in the process of transitioning to using css
(cgroup_subsys_state) as the primary handle instead of cgroup in
subsystem API. For hierarchy iterators, this is beneficial because
* In most cases, css is the only thing subsystems care about anyway.
* On the planned unified hierarchy, iterations for different
subsystems will need to skip over different subtrees of the
hierarchy depending on which subsystems are enabled on each cgroup.
Passing around css makes it unnecessary to explicitly specify the
subsystem in question as css is intersection between cgroup and
subsystem
* For the planned unified hierarchy, css's would need to be created
and destroyed dynamically independent from cgroup hierarchy. Having
cgroup core manage css iteration makes enforcing deref rules a lot
easier.
Most subsystem conversions are straight-forward. Noteworthy changes
are
* blkio: cgroup_to_blkcg() is no longer used. Removed.
* freezer: cgroup_freezer() is no longer used. Removed.
* devices: cgroup_to_devcgroup() is no longer used. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup.
Please see the previous commit which converts the subsystem methods
for rationale.
This patch converts all cftype file operations to take @css instead of
@cgroup. cftypes for the cgroup core files don't have their subsytem
pointer set. These will automatically use the dummy_css added by the
previous patch and can be converted the same way.
Most subsystem conversions are straight forwards but there are some
interesting ones.
* freezer: update_if_frozen() is also converted to take @css instead
of @cgroup for consistency. This will make the code look simpler
too once iterators are converted to use css.
* memory/vmpressure: mem_cgroup_from_css() needs to be exported to
vmpressure while mem_cgroup_from_cont() can be made static.
Updated accordingly.
* cpu: cgroup_tg() doesn't have any user left. Removed.
* cpuacct: cgroup_ca() doesn't have any user left. Removed.
* hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
Removed.
* net_cls: cgrp_cls_state() doesn't have any user left. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.
* With unified hierarchy, subsystems will be dynamically bound and
unbound from cgroups and thus css's (cgroup_subsys_state) may be
created and destroyed dynamically over the lifetime of a cgroup,
which is different from the current state where all css's are
allocated and destroyed together with the associated cgroup. This
in turn means that cgroup_css() should be synchronized and may
return NULL, making it more cumbersome to use.
* Differing levels of per-subsystem granularity in the unified
hierarchy means that the task and descendant iterators should behave
differently depending on the specific subsystem the iteration is
being performed for.
* In majority of the cases, subsystems only care about its part in the
cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods
often obtain the matching css pointer from the cgroup and don't
bother with the cgroup pointer itself. Passing around css fits
much better.
This patch converts all cgroup_subsys methods to take @css instead of
@cgroup. The conversions are mostly straight-forward. A few
noteworthy changes are
* ->css_alloc() now takes css of the parent cgroup rather than the
pointer to the new cgroup as the css for the new cgroup doesn't
exist yet. Knowing the parent css is enough for all the existing
subsystems.
* In kernel/cgroup.c::offline_css(), unnecessary open coded css
dereference is replaced with local variable access.
This patch shouldn't cause any behavior differences.
v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
with local variable @css as suggested by Li Zefan.
Rebased on top of new for-3.12 which includes for-3.11-fixes so
that ->css_free() invocation added by da0a12caff ("cgroup: fix a
leak when percpu_ref_init() fails") is converted too. Suggested
by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Currently, controllers have to explicitly follow the cgroup hierarchy
to find the parent of a given css. cgroup is moving towards using
cgroup_subsys_state as the main controller interface construct, so
let's provide a way to climb the hierarchy using just csses.
This patch implements css_parent() which, given a css, returns its
parent. The function is guarnateed to valid non-NULL parent css as
long as the target css is not at the top of the hierarchy.
freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
are converted to use css_parent() instead of accessing cgroup->parent
directly.
* __parent_ca() is dropped from cpuacct and its usage is replaced with
parent_ca(). The only difference between the two was NULL test on
cgroup->parent which is now embedded in css_parent() making the
distinction moot. Note that eventually a css->parent field will be
added to css and the NULL check in css_parent() will go away.
This patch shouldn't cause any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
css (cgroup_subsys_state) is usually embedded in a subsys specific
data structure. Subsystems either use container_of() directly to cast
from css to such data structure or has an accessor function wrapping
such cast. As cgroup as whole is moving towards using css as the main
interface handle, add and update such accessors to ease dealing with
css's.
All accessors explicitly handle NULL input and return NULL in those
cases. While this looks like an extra branch in the code, as all
controllers specific data structures have css as the first field, the
casting doesn't involve any offsetting and the compiler can trivially
optimize out the branch.
* blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
accessor. Added.
* memory, hugetlb and devices already had one but didn't explicitly
handle NULL input. Updated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
The names of the two struct cgroup_subsys_state accessors -
cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
The former clashes with the type name and the latter doesn't even
indicate it's somehow related to cgroup.
We're about to revamp large portion of cgroup API, so, let's rename
them so that they're less awkward. Most per-controller usages of the
accessors are localized in accessor wrappers and given the amount of
scheduled changes, this isn't gonna add any noticeable headache.
Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
to task_css(). This patch is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
The original implementation of the Smack IPv6 port based
local controls works most of the time using a sockaddr as
a temporary variable, but not always as it overflows in
some circumstances. The correct data is a sockaddr_in6.
A struct sockaddr isn't as large as a struct sockaddr_in6.
There would need to be casting one way or the other. This
patch gets it the right way.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The Smack code that matches incoming CIPSO tags with Smack labels
reaches through the NetLabel interfaces and compares the network
data with the CIPSO header associated with a Smack label. This was
done in a ill advised attempt to optimize performance. It works
so long as the categories fit in a single capset, but this isn't
always the case.
This patch changes the Smack code to use the appropriate NetLabel
interfaces to compare the incoming CIPSO header with the CIPSO
header associated with a label. It will always match the CIPSO
headers correctly.
Targeted for git://git.gitorious.org/smack-next/kernel.git
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Accepted for the smack-next tree after changing the number of
slots from 128 to 16.
This patch adds a hash table to quicken searching of a smack label by its name.
Basically, the patch improves performance of SMACK initialization. Parsing of
rules involves translation from a string to a smack_known (aka label) entity
which is done in smk_find_entry().
The current implementation of the function iterates over a global list of
smack_known resulting in O(N) complexity for smk_find_entry(). The total
complexity of SMACK initialization becomes O(rules * labels). Therefore it
scales quadratically with a complexity of a system.
Applying the patch reduced the complexity of smk_find_entry() to O(1) as long
as number of label is in hundreds. If the number of labels is increased please
update SMACK_HASH_SLOTS constant defined in security/smack/smack.h. Introducing
the configuration of this constant with Kconfig or cmdline might be a good
idea.
The size of the hash table was adjusted experimentally. The rule set used by
TIZEN contains circa 17K rules for 500 labels. The table above contains
results of SMACK initialization using 'time smackctl apply' bash command.
The 'Ref' is a kernel without this patch applied. The consecutive values
refers to value of SMACK_HASH_SLOTS. Every measurement was repeated three
times to reduce noise.
| Ref | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512
--------------------------------------------------------------------------------------------
Run1 | 1.156 | 1.096 | 0.883 | 0.764 | 0.692 | 0.667 | 0.649 | 0.633 | 0.634 | 0.629 | 0.620
Run2 | 1.156 | 1.111 | 0.885 | 0.764 | 0.694 | 0.661 | 0.649 | 0.651 | 0.634 | 0.638 | 0.623
Run3 | 1.160 | 1.107 | 0.886 | 0.764 | 0.694 | 0.671 | 0.661 | 0.638 | 0.631 | 0.624 | 0.638
AVG | 1.157 | 1.105 | 0.885 | 0.764 | 0.693 | 0.666 | 0.653 | 0.641 | 0.633 | 0.630 | 0.627
Surprisingly, a single hlist is slightly faster than a double-linked list.
The speed-up saturates near 64 slots. Therefore I chose value 128 to provide
some margin if more labels were used.
It looks that IO becomes a new bottleneck.
Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
The smack_parsed_rule structure is allocated. If a rule is successfully
installed then the last reference to the object is lost. This patch fixes this
leak. Moreover smack_parsed_rule is allocated on stack because it no longer
needed ofter smk_write_rules_list() is finished.
Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:
- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
entries even when the policy is only applied for one address family.
Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.
Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the packet class in SELinux is not checked if there are no
SECMARK rules in the security or mangle netfilter tables. Some systems
prefer that packets are always checked, for example, to protect the system
should the netfilter rules fail to load or if the nefilter rules
were maliciously flushed.
Add the always_check_network policy capability which, when enabled, treats
SECMARK as enabled, even if there are no netfilter SECMARK rules and
treats peer labeling as enabled, even if there is no Netlabel or
labeled IPSEC configuration.
Includes definition of "redhat1" SELinux policy capability, which
exists in the SELinux userpace library, to keep ordering correct.
The SELinux userpace portion of this was merged last year, but this kernel
change fell on the floor.
Signed-off-by: Chris PeBenito <cpebenito@tresys.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
When the BUG() macro is disabled at compile time it can cause some
problems in the SELinux netnode code: invalid return codes and
uninitialized variables. This patch fixes this by making sure we take
some corrective action after the BUG() macro.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Use a helper to determine if a superblock should have the seclabel flag
rather than doing it in the function. I'm going to use this in the
security server as well.
Signed-off-by: Eric Paris <eparis@redhat.com>
Rather than passing pointers to memory locations, strings, and other
stuff just give up on the separation and give security_fs_use the
superblock. It just makes the code easier to read (even if not easier to
reuse on some other OS)
Signed-off-by: Eric Paris <eparis@redhat.com>
Instead of having special code around the 'non-mount' seclabel mount option
just handle it like the mount options.
Signed-off-by: Eric Paris <eparis@redhat.com>
We only have 6 options, so char is good enough, but use a short as that
packs nicely. This shrinks the superblock_security_struct just a little
bit.
Signed-off-by: Eric Paris <eparis@redhat.com>
Just to make it clear that we have mount time options and flags,
separate them. Since I decided to move the non-mount options above
above 0x10, we need a short instead of a char. (x86 padding says
this takes up no additional space as we have a 3byte whole in the
structure)
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently we set the initialize and seclabel flag in one place. Do some
unrelated printk then we unset the seclabel flag. Eww. Instead do the flag
twiddling in one place in the code not seperated by unrelated printk. Also
don't set and unset the seclabel flag. Only set it if we need to.
Signed-off-by: Eric Paris <eparis@redhat.com>
We had this random hard coded value of '8' in the code (I put it there)
for the number of bits to check for mount options. This is stupid. Instead
use the #define we already have which tells us the number of mount
options.
Signed-off-by: Eric Paris <eparis@redhat.com>
We check if the fsname is proc and if so set the proc superblock security
struct flag. We then check if the flag is set and use the string 'proc'
for the fsname instead of just using the fsname. What's the point? It's
always proc... Get rid of the useless conditional.
Signed-off-by: Eric Paris <eparis@redhat.com>
The /sys/fs/selinux/policy file is not valid on big endian systems like
ppc64 or s390. Let's see why:
static int hashtab_cnt(void *key, void *data, void *ptr)
{
int *cnt = ptr;
*cnt = *cnt + 1;
return 0;
}
static int range_write(struct policydb *p, void *fp)
{
size_t nel;
[...]
/* count the number of entries in the hashtab */
nel = 0;
rc = hashtab_map(p->range_tr, hashtab_cnt, &nel);
if (rc)
return rc;
buf[0] = cpu_to_le32(nel);
rc = put_entry(buf, sizeof(u32), 1, fp);
So size_t is 64 bits. But then we pass a pointer to it as we do to
hashtab_cnt. hashtab_cnt thinks it is a 32 bit int and only deals with
the first 4 bytes. On x86_64 which is little endian, those first 4
bytes and the least significant, so this works out fine. On ppc64/s390
those first 4 bytes of memory are the high order bits. So at the end of
the call to hashtab_map nel has a HUGE number. But the least
significant 32 bits are all 0's.
We then pass that 64 bit number to cpu_to_le32() which happily truncates
it to a 32 bit number and does endian swapping. But the low 32 bits are
all 0's. So no matter how many entries are in the hashtab, big endian
systems always say there are 0 entries because I screwed up the
counting.
The fix is easy. Use a 32 bit int, as the hashtab_cnt expects, for nel.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
rootfs (ramfs) can support setting of security contexts
by userspace due to the vfs fallback behavior of calling
the security module to set the in-core inode state
for security.* attributes when the filesystem does not
provide an xattr handler. No xattr handler required
as the inodes are pinned in memory and have no backing
store.
This is useful in allowing early userspace to label individual
files within a rootfs while still providing a policy-defined
default via genfs.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
used as bitmaps. The overhead ratio is 1/4.
On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
are left for bitmap purpose and the overhead ratio is 1/2. With a
3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
function to be called about 9 million times. The average number of
ebitmap_node traversal is about 3.7.
This patch increases the size of the ebitmap_node structure to 64
bytes for 64-bit system to keep the overhead ratio at 1/4. This may
also improve performance a little bit by making node to node traversal
less frequent (< 2) as more bits are available in each node.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
While running the high_systime workload of the AIM7 benchmark on
a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a pretty sizable amount of time was
spent in the SELinux code. Below was the perf trace of the "perf
record -a -s" of a test run at 1500 users:
5.04% ls [kernel.kallsyms] [k] ebitmap_get_bit
1.96% ls [kernel.kallsyms] [k] mls_level_isvalid
1.95% ls [kernel.kallsyms] [k] find_next_bit
The ebitmap_get_bit() was the hottest function in the perf-report
output. Both the ebitmap_get_bit() and find_next_bit() functions
were, in fact, called by mls_level_isvalid(). As a result, the
mls_level_isvalid() call consumed 8.95% of the total CPU time of
all the 24 virtual CPUs which is quite a lot. The majority of the
mls_level_isvalid() function invocations come from the socket creation
system call.
Looking at the mls_level_isvalid() function, it is checking to see
if all the bits set in one of the ebitmap structure are also set in
another one as well as the highest set bit is no bigger than the one
specified by the given policydb data structure. It is doing it in
a bit-by-bit manner. So if the ebitmap structure has many bits set,
the iteration loop will be done many times.
The current code can be rewritten to use a similar algorithm as the
ebitmap_contains() function with an additional check for the
highest set bit. The ebitmap_contains() function was extended to
cover an optional additional check for the highest set bit, and the
mls_level_isvalid() function was modified to call ebitmap_contains().
With that change, the perf trace showed that the used CPU time drop
down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total which is about 100X less than before.
0.07% ls [kernel.kallsyms] [k] ebitmap_contains
0.05% ls [kernel.kallsyms] [k] ebitmap_get_bit
0.01% ls [kernel.kallsyms] [k] mls_level_isvalid
0.01% ls [kernel.kallsyms] [k] find_next_bit
The remaining ebitmap_get_bit() and find_next_bit() functions calls
are made by other kernel routines as the new mls_level_isvalid()
function will not call them anymore.
This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as impressive as is suggested by the
reduction in CPU time spent in the ebitmap functions. The table below
shows the performance change on the 2-socket x86-64 system (with HT
on) mentioned above.
+--------------+---------------+----------------+-----------------+
| Workload | mean % change | mean % change | mean % change |
| | 10-100 users | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime | +0.1% | +0.9% | +2.6% |
+--------------+---------------+----------------+-----------------+
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Remove the BUG_ON() from selinux_skb_xfrm_sid() and propogate the
error code up to the caller. Also check the return values in the
only caller function, selinux_skb_peerlbl_sid().
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Remove the unused get_sock_isec() function and do some formatting
fixes.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The SELinux labeled IPsec code state management functions have been
long neglected and could use some cleanup and consolidation.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The xfrm_state_alloc_security() LSM hook implementation is really a
multiplexed hook with two different behaviors depending on the
arguments passed to it by the caller. This patch splits the LSM hook
implementation into two new hook implementations, which match the
LSM hooks in the rest of the kernel:
* xfrm_state_alloc
* xfrm_state_alloc_acquire
Also included in this patch are the necessary changes to the SELinux
code; no other LSMs are affected.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Since everybody sets kstrdup()ed constant string to "struct xattr"->name but
nobody modifies "struct xattr"->name , we can omit kstrdup() and its failure
checking by constifying ->name member of "struct xattr".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Joel Becker <jlbec@evilplan.org> [ocfs2]
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Tested-by: Paul Moore <paul@paul-moore.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Pull nfsd changes from Bruce Fields:
"Changes this time include:
- 4.1 enabled on the server by default: the last 4.1-specific issues
I know of are fixed, so we're not going to find the rest of the
bugs without more exposure.
- Experimental support for NFSv4.2 MAC Labeling (to allow running
selinux over NFS), from Dave Quigley.
- Fixes for some delicate cache/upcall races that could cause rare
server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
debugging persistence.
- Fixes for some bugs found at the recent NFS bakeathon, mostly v4
and v4.1-specific, but also a generic bug handling fragmented rpc
calls"
* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
nfsd4: support minorversion 1 by default
nfsd4: allow destroy_session over destroyed session
svcrpc: fix failures to handle -1 uid's
sunrpc: Don't schedule an upcall on a replaced cache entry.
net/sunrpc: xpt_auth_cache should be ignored when expired.
sunrpc/cache: ensure items removed from cache do not have pending upcalls.
sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
sunrpc/cache: remove races with queuing an upcall.
nfsd4: return delegation immediately if lease fails
nfsd4: do not throw away 4.1 lock state on last unlock
nfsd4: delegation-based open reclaims should bypass permissions
svcrpc: don't error out on small tcp fragment
svcrpc: fix handling of too-short rpc's
nfsd4: minor read_buf cleanup
nfsd4: fix decoding of compounds across page boundaries
nfsd4: clean up nfs4_open_delegation
NFSD: Don't give out read delegations on creates
nfsd4: allow client to send no cb_sec flavors
nfsd4: fail attempts to request gss on the backchannel
nfsd4: implement minimal SP4_MACH_CRED
...
Pull networking updates from David Miller:
"This is a re-do of the net-next pull request for the current merge
window. The only difference from the one I made the other day is that
this has Eliezer's interface renames and the timeout handling changes
made based upon your feedback, as well as a few bug fixes that have
trickeled in.
Highlights:
1) Low latency device polling, eliminating the cost of interrupt
handling and context switches. Allows direct polling of a network
device from socket operations, such as recvmsg() and poll().
Currently ixgbe, mlx4, and bnx2x support this feature.
Full high level description, performance numbers, and design in
commit 0a4db187a999 ("Merge branch 'll_poll'")
From Eliezer Tamir.
2) With the routing cache removed, ip_check_mc_rcu() gets exercised
more than ever before in the case where we have lots of multicast
addresses. Use a hash table instead of a simple linked list, from
Eric Dumazet.
3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
Marek Puzyniak, Michal Kazior, and Sujith Manoharan.
4) Support reporting the TUN device persist flag to userspace, from
Pavel Emelyanov.
5) Allow controlling network device VF link state using netlink, from
Rony Efraim.
6) Support GRE tunneling in openvswitch, from Pravin B Shelar.
7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
Daniel Borkmann and Eric Dumazet.
8) Allow controlling of TCP quickack behavior on a per-route basis,
from Cong Wang.
9) Several bug fixes and improvements to vxlan from Stephen
Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
support receiving on multiple UDP ports.
10) Major cleanups, particular in the area of debugging and cookie
lifetime handline, to the SCTP protocol code. From Daniel
Borkmann.
11) Allow packets to cross network namespaces when traversing tunnel
devices. From Nicolas Dichtel.
12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
manner akin to how we monitor real network traffic via ptype_all.
From Daniel Borkmann.
13) Several bug fixes and improvements for the new alx device driver,
from Johannes Berg.
14) Fix scalability issues in the netem packet scheduler's time queue,
by using an rbtree. From Eric Dumazet.
15) Several bug fixes in TCP loss recovery handling, from Yuchung
Cheng.
16) Add support for GSO segmentation of MPLS packets, from Simon
Horman.
17) Make network notifiers have a real data type for the opaque
pointer that's passed into them. Use this to properly handle
network device flag changes in arp_netdev_event(). From Jiri
Pirko and Timo Teräs.
18) Convert several drivers over to module_pci_driver(), from Peter
Huewe.
19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
O(1) calculation instead. From Eric Dumazet.
20) Support setting of explicit tunnel peer addresses in ipv6, just
like ipv4. From Nicolas Dichtel.
21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.
22) Prevent a single high rate flow from overruning an individual cpu
during RX packet processing via selective flow shedding. From
Willem de Bruijn.
23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
Dumazet.
24) Don't just drop GSO packets which are above the TBF scheduler's
burst limit, chop them up so they are in-bounds instead. Also
from Eric Dumazet.
25) VLAN offloads are missed when configured on top of a bridge, fix
from Vlad Yasevich.
26) Support IPV6 in ping sockets. From Lorenzo Colitti.
27) Receive flow steering targets should be updated at poll() time
too, from David Majnemer.
28) Fix several corner case regressions in PMTU/redirect handling due
to the routing cache removal, from Timo Teräs.
29) We have to be mindful of ipv4 mapped ipv6 sockets in
upd_v6_push_pending_frames(). From Hannes Frederic Sowa.
30) Fix L2TP sequence number handling bugs, from James Chapman."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
drivers/net: caif: fix wrong rtnl_is_locked() usage
drivers/net: enic: release rtnl_lock on error-path
vhost-net: fix use-after-free in vhost_net_flush
net: mv643xx_eth: do not use port number as platform device id
net: sctp: confirm route during forward progress
virtio_net: fix race in RX VQ processing
virtio: support unlocked queue poll
net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
Documentation: Fix references to defunct linux-net@vger.kernel.org
net/fs: change busy poll time accounting
net: rename low latency sockets functions to busy poll
bridge: fix some kernel warning in multicast timer
sfc: Fix memory leak when discarding scattered packets
sit: fix tunnel update via netlink
dt:net:stmmac: Add dt specific phy reset callback support.
dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
dt:net:stmmac: Allocate platform data only if its NULL.
net:stmmac: fix memleak in the open method
ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
net: ipv6: fix wrong ping_v6_sendmsg return value
...
Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and
add support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR2vsSAAoJEGcL54qWCgDyWBIP/AqlpBBAblxbNQ1Bl/0m1Pdb
iKH961qgM4U1BzK0svGtHTZqkovpm4o/VbkbKBT5mQ4g6SbbsJ/AsS1plCyfnIZi
bdnKNJyj6zg0NsAkJ3vKWqd4BTaP+icdSfEIlRKQxAPESewN7b5B3OWgY4KdYmnk
q5BP25anC1ryxVycSY67ux8S2IKXVSRZeCZv+RO21rvZ2G0bV5y7t8Om28ztxEnU
RKrHgQHgaaktR7i8QVO0sbiWq3iqLa3GPkUvFLwWGr8PQJtTkYY0QwYSrsV3N4rY
hYpMRUZFHpZ8UG5YvBT6xyOy/XaGwMGKSfZjB9/YG4QVju+tTy50U1JbTil5PEWY
GHWYF68aurIeUkXrhSv8AVnOnhir0mISx5ou/SV7p0QoAZ92V6kq+LkPrW520qlc
z8ILh3j28pN3ZUCIEArcaZhYCt48uO2hwBi5TqevQyyGRsXFGbN1moD5jvHkllft
Fi0XGuCBdvhrzFRZcsEl+PDq7fT8lXUK2BHe8oR5jz9PhUp+jpEl9m/eg3RsjJjN
DuxsHye2U4chScdnRtLBQvpFtdINvWX/Gy8Bi7kdE5tsQySvOa+rdwuBc7h88PHC
+4xI2iX3z4O1+GpsAe/T9+pjW689jEilS+eVDRVEGl6yHGn9q8PYOayjPjwbJHxS
R2mLTRhKu1DKguTzO13f
=wGjn
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and add
support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been
reviewed and acked by James Morris."
* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
nfs: have NFSv3 try server-specified auth flavors in turn
nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
nfs: move server_authlist into nfs_try_mount_request
nfs: refactor "need_mount" code out of nfs_try_mount
SUNRPC: PipeFS MOUNT notification optimization for dying clients
SUNRPC: split client creation routine into setup and registration
SUNRPC: fix races on PipeFS UMOUNT notifications
SUNRPC: fix races on PipeFS MOUNT notifications
NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
NFS: Improve legacy idmapping fallback
NFSv4.1 end back channel session draining
NFS: Apply v4.1 capabilities to v4.2
NFSv4.1: Clean up layout segment comparison helper names
NFSv4.1: layout segment comparison helpers should take 'const' parameters
NFSv4: Move the DNS resolver into the NFSv4 module
rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
...
Pull security subsystem updates from James Morris:
"In this update, Smack learns to love IPv6 and to mount a filesystem
with a transmutable hierarchy (i.e. security labels are inherited
from parent directory upon creation rather than creating process).
The rest of the changes are maintenance"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (37 commits)
tpm/tpm_i2c_infineon: Remove unused header file
tpm: tpm_i2c_infinion: Don't modify i2c_client->driver
evm: audit integrity metadata failures
integrity: move integrity_audit_msg()
evm: calculate HMAC after initializing posix acl on tmpfs
maintainers: add Dmitry Kasatkin
Smack: Fix the bug smackcipso can't set CIPSO correctly
Smack: Fix possible NULL pointer dereference at smk_netlbl_mls()
Smack: Add smkfstransmute mount option
Smack: Improve access check performance
Smack: Local IPv6 port based controls
tpm: fix regression caused by section type conflict of tpm_dev_release() in ppc builds
maintainers: Remove Kent from maintainers
tpm: move TPM_DIGEST_SIZE defintion
tpm_tis: missing platform_driver_unregister() on error in init_tis()
security: clarify cap_inode_getsecctx description
apparmor: no need to delay vfree()
apparmor: fix fully qualified name parsing
apparmor: fix setprocattr arg processing for onexec
apparmor: localize getting the security context to a few macros
...