Privacy policies
Privacy policies are declared within server configuration files to control how much information is exposed to clients. Keep in mind that the outcomes of vectorized operations (that is, operations between columns, like additions) are not exposed at all. Whereas map-reduce operations already expose only aggregate quantities to external clients. However, it may still be important to add more degrees of anonymity to avoid accidental information leakage, even under the assumption that clients are well-meaning. Below we describe commonly available privacy policies, although you can also write your own.
Data security
This describes available privacy policies and how to declare them inside the configuration file. These policies are important to ensure that sensitive information is handled correctly and that the privacy of individuals is maintained.
k-Anonymity Ensures that any computations that return non-None values are computed across at least k data samples.
privacy:
- policy: fedmed.privacy.Anonymity
params:
k: 3
filter: ["*"] # optional (this default is to apply on all fragments)
reject: [] # optional (this default is to reject nothing)
Noise Addition Adds random noise to reduce operation outcomes to obscure individual values. The amount of noise should depend on differential privacy needs.
privacy:
- policy: fedmed.privacy.Noise
params:
value: 0.01
type: float
filter: ["*"]
reject: []
Coarsening Reduces the precision of numerical values to a specified level. This is especially needed when noise is added.
privacy:
- policy: fedmed.privacy.Coarsening
params:
value: 0.1
type: float
filter: ["*"]
reject: []
Saturation Limits the outcomes of reduce operations to within a specified range.
privacy:
- policy: fedmed.privacy.Saturation
params:
min: 0
max: 100
type: int
filter: ["*"]
reject: []
Workload limits
Cache Limit Restricts the number of cached computations to protect server memory.
privacy:
- policy: fedmed.privacy.CacheLimit
params:
limit: 100
filter: ["*"]
reject: []
Complexity Cap Limits the number of dependent operations to prevent overly complex computations. Basically, this is one more than the maximum allowed depth of the abstract syntax tree of vectorized operations.
privacy:
- policy: fedmed.privacy.ComplexityCap
params:
cap: 10
filter: ["*"]
reject: []
Custom policies
To implement a new policy, follow the prototype below. This example shows how to create a new privacy policy that can be applied to server operations.
class PrivacyPolicy:
def __init__(self, mandatory_arg0, mandatory_arg1, **kwargs):
...
self.applied = 0 # holds how many times the policy is applied
# kwargs are optional arguments
self.condition = kwargs.get("filter", ["*"])
self.reject = kwargs.get("reject", [])
def name(self):
return '<span class="badge bg-secondary text-light">value</span> Policy name (this will appear in the server panel)'
def description(self):
return 'Your policy description here (this will appear in the server panel)'
def on(self, fragment):
# standard fragment matching on when to apply the policy
for condition in self.reject:
if fnmatch.fnmatch(fragment, condition):
return None
for condition in self.condition:
if fnmatch.fnmatch(fragment, condition):
return self
return None
def bins(self, results):
# how to apply the policy when bins of numbers are returned
return [(value, self.postprocess(count)) for value, count in results]
def preprocess(self, entries):
return entries
def postprocess(self, result):
# how to apply the policy returning the transformed (more anonymous outcome)
# this is an example of a coarsening application
if "float" == result.__class__.__name__:
self.applied += 1 # keep track of the times the policy was applied
return int(result / 0.01) * 0.01
return result
def acknowledge(self, server, fragment):
# Called after the server acknowledges the fragment.
# Workload cap or similar policies can use this method
# to remove too complex fragments from the server to
# prevent their reuse. Implement this with care, as
# it can be catastrophic for those trying to run
# operations on your data.
pass
The above policy can be added to a configuration file per the following snippet. Do not forget to also share the policy module as a file or installable package with anyone that will be using the configuration:
privacy:
# other policies applied before the PrivacyPolicy
- policy: module.PrivacyPolicy # module is where to import PrivacyPolicy from
params:
mandatory_arg0: ...
mandatory_arg1: ...
optional_arg0: ... # may be omitted (typically, the optional arguments are `filer` and `reject`
# other policies applied after the PrivacyPolicy
methods:
# methods implemented by the server