US8032924B2 - Access control method and a system for privacy protection - Google Patents


Info

Publication number
US8032924B2
US8032924B2
Authority
US
United States
Prior art keywords
label, channel, read, processing component, labels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US12/130,308
Other versions
US20090064343A1 (en)
Inventor
Weifeng Chen
Alexandre V. Evfimievski
Zhen Liu
Ralf Rantzau
Anton Viktorovich Riabov
Pankaj Rohatgi
Angela Marie Schuett
Ramakrishnan Srikant
Grant Wagner
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US12/130,308
Publication of US20090064343A1
Application granted
Publication of US8032924B2
Status: Expired - Fee Related, adjusted expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/10: Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L 63/105: Multiple levels of security

Definitions

  • the present invention relates to network security, and more particularly, to an access control method and a system for privacy protection.
  • the protection of PII involves two aspects: information security and privacy protection.
  • Information security measures such as access control, firewalls, sandboxing and secure communication channels are used to prevent unauthorized access to PII.
  • Information security measures alone are not enough to make information systems privacy preserving, since they are low level and do not distinguish between accessing PII for statistics computation and accessing it for record retrieval. Privacy policies generally allow the former and prohibit the latter.
  • Lattice-based access control (LBAC) models such as those of Bell and LaPadula, D. Bell and L. LaPadula. Secure computer system: Unified exposition and Multics interpretation. Technical Report ESD-TR-75-306, ESD/AFSC, Hanscom AFB, Bedford, Mass., 1975, and Denning, D. Denning. A lattice model of secure information flow. Communications of the ACM, 19(5):236-243, May 1976, can provide formal end-to-end security guarantees in systems where the components are not trusted. These models are often overly restrictive. However, these issues can be addressed by modifications that permit declassification of data using trusted components, e.g., D. Bell. Secure computer systems: A refinement of the mathematical model. MTR-2547, Vol. III, MITRE Corp., 1974.
  • the Bell-LaPadula policy is widely used in multi-level secure (MLS) systems.
  • the models of Bell-LaPadula, Denning, Biba and several others can be unified under the common LBAC framework described by Sandhu, R. Sandhu. Lattice-based access control models. IEEE Computer, 26(11):9-19, 1993.
  • Security and integrity models have been combined in practice, e.g., in the Caemarvon protocol H. Scherzer, R. Canetti, P. A. Karger, H. Krawczyk, T. Rabin, and D.C. Toll. Authenticating mandatory access controls and preserving privacy for a high-assurance smart card. In ESORICS, pages 181-200, 2003.
  • a method for protecting information in a distributed stream processing system comprises: assigning a principal label to a processing component; assigning a first channel label to a first communication channel that is input to the processing component; comparing the principal label to the first channel label to determine if the processing component can read data attributes of the first channel label; and reading the data attributes of the first channel label when the principal label is equal to or has precedence over the first channel label, wherein the principal label includes a read label and a write label and at least one of a selection label, an addition label or a suppression label.
  • the principal label includes data attributes.
  • the principal label includes a name and a risk level for each of the data attributes of the principal label.
  • the principal label includes an integrity level for the data attributes of the principal label.
  • the first channel label includes a name and a risk level for each of the data attributes of the first channel label.
  • the first channel label includes an integrity level for the data attributes of the first channel label.
  • the first channel label further includes a processing history of the first communication channel.
  • the method further comprises: assigning a second channel label to a second communication channel that is output from the processing component; comparing the principal label to the second channel label to determine if data attributes of the principal label can be written to the second communication channel; and writing the data attributes of the principal label to the second communication channel when the second channel label is equal to or has precedence over the principal label.
  • the method further comprises: assigning a user label to a user that is connected to the second communication channel; comparing the user label to the second channel label to determine if the user can read data attributes of the second channel label; and reading the data attributes of the second channel label when the user label is equal to or has precedence over the second channel label.
  • the user label includes data attributes, the data attributes of the user label including a user name, a user role or a user purpose.
  • the user label includes a read label.
  • a method for verifying privacy policy compliance of a workflow in a distributed stream processing system comprises: assigning a read label to a user; assigning channel labels to channels connecting processing components of the workflow to external data sources; assigning read and write labels to the processing components; accepting the workflow when each of the read labels dominates the channel labels of all channels read by its corresponding processing component, each of the write labels is dominated by the channel labels of all channels written to by its corresponding processing component and each of the write labels dominates the read label of its corresponding processing component; or rejecting the workflow if each of the read labels does not dominate the channel labels of all the channels read by its corresponding processing component, each of the write labels is not dominated by the channel labels of all the channels written to by its corresponding processing component, each of the write labels does not dominate the read label of its corresponding processing component or if the user label does not dominate the channel labels of all channels read by the user, wherein the read label of the user, the channel labels of the channels and the read and write labels of the processing components include data attributes.
  • At least one of the channel labels is already assigned to at least one of the communication channels before the channel labels are assigned.
  • At least one set of the read and write labels is already assigned to at least one of the processing components before the read and write labels are assigned.
  • the method further comprises: assigning a maximum read label to one of the processing components; and rejecting the workflow if the read label is not dominated by the maximum read label.
  • the method further comprises: assigning an addition label to one of the processing components; and rejecting the workflow if the addition label is not dominated by the write label.
  • the method further comprises: assigning a selection label to one of the processing components; and rejecting the workflow if the selection label is not dominated by the write label.
  • the method further comprises: assigning a suppression label to one of the processing components, wherein when the suppression label is assigned the write label dominates a minimum value of the suppression level and the read label.
  • the method further comprises: generating a planning task representation; reading the planning task representation using a planning algorithm; assigning the read label to the user, the channel labels to the channels and the read and write labels to the processing components using the planning algorithm; and generating the workflow by translating an output of the planning algorithm.
  • the method further comprises deploying the workflow when it is accepted.
  • the method further comprises translating the read and write labels of the processing components into multi-level secure (MLS) system compatible labels, wherein the MLS system enforces a Bell-LaPadula privacy policy and the workflow is deployed in the MLS system.
  • MLS multi-level secure
  • a computer program product comprising a computer useable medium having computer program logic recorded thereon for protecting information in a distributed stream processing system
  • the computer program logic comprises: program code for assigning a principal label to a processing component; program code for assigning a first channel label to a first communication channel that is input to the processing component; program code for comparing the principal label to the first channel label to determine if the processing component can read data attributes of the first channel label; and program code for reading the data attributes of the first channel label when the principal label is equal to or has precedence over the first channel label, wherein the principal label includes a read label and a write label and at least one of a selection label, an addition label or a suppression label.
  • the computer program product further comprises: program code for assigning a second channel label to a second communication channel that is output from the processing component; program code for comparing the principal label to the second channel label to determine if data attributes of the principal label can be written to the second communication channel; and program code for writing the data attributes of the principal label to the second communication channel when the second channel label is equal to or has precedence over the principal label.
  • the computer program product further comprises: program code for assigning a user label to a user that is connected to the second communication channel; program code for comparing the user label to the second channel label to determine if the user can read data attributes of the second channel label; and program code for reading the data attributes of the second channel label when the user label is equal to or has precedence over the second channel label.
  • a computer program product comprising a computer useable medium having computer program logic recorded thereon for verifying privacy policy compliance of a workflow in a distributed stream processing system
  • the computer program logic comprises: program code for assigning a read label to a user; program code for assigning channel labels to channels connecting processing components of the workflow to external data sources; program code for assigning read and write labels to the processing components; program code for accepting the workflow when each of the read labels dominates the channel labels of all channels read by its corresponding processing component, each of the write labels is dominated by the channel labels of all channels written to by its corresponding processing component and each of the write labels dominates the read label of its corresponding processing component; or rejecting the workflow if each of the read labels does not dominate the channel labels of all the channels read by its corresponding processing component, each of the write labels is not dominated by the channel labels of all the channels written to by its corresponding processing component, each of the write labels does not dominate the read label of its corresponding processing component or if the user label does not dominate the channel labels of all channels read by the user.
  • the computer program product further comprises: program code for assigning a maximum read label to one of the processing components; and program code for rejecting the workflow if the read label is not dominated by the maximum read label.
  • the computer program product further comprises: program code for assigning an addition label to one of the processing components; and program code for rejecting the workflow if the addition label is not dominated by the write label.
  • the computer program product further comprises: program code for assigning a selection label to one of the processing components; and program code for rejecting the workflow if the selection label is not dominated by the write label.
  • the computer program product further comprises: program code for assigning a suppression label to one of the processing components, wherein when the suppression label is assigned the write label dominates a minimum value of the suppression level and the read label.
  • the computer program product further comprises: program code for generating a planning task representation; program code for reading the planning task representation using a planning algorithm; program code for assigning the read label to the user, the channel labels to the channels and the read and write labels to the processing components using the planning algorithm; and program code for generating the workflow by translating an output of the planning algorithm.
  • the computer program product further comprises program code for deploying the workflow when it is accepted.
  • the computer program product further comprises program code for translating the read and write labels of the processing components into MLS system compatible labels, wherein the MLS system enforces a Bell-LaPadula privacy policy.
  • a workflow deployment unit for protecting personally identifiable information (PII) in a distributed stream processing system comprises: an assigner for assigning principal labels to processing components, channel labels to communication channels, and a user label to a user; a verifier for verifying that the channel label of a communication channel that is input to a processing component can be read by the processing component, for verifying that the principal label of a processing component can be written to by the processing component to a communication channel that is output from the processing component and for verifying that the user label can be read by the channel label of the communication channel that is output from the processing component; and a deployer for deploying a workflow that includes the processing component that can read the channel label of the communication channel that is input to the processing component and that can write the principal label to the communication channel that is output from the processing component, the communication channel that is input to the processing component, the communication channel that is output from the processing component and the user that can read the channel label of the communication channel that is output from the processing component.
  • PII personally identifiable information
  • the principal labels and the channel labels include PII.
  • the workflow deployment unit further comprises: a first interface for receiving PII; and a second interface for receiving the principal labels, the channel labels and the user label.
  • FIG. 1 illustrates a method for protecting data in a distributed stream processing system according to an exemplary embodiment of the present invention
  • FIG. 2 illustrates controlling access to a stream of data according to an exemplary embodiment of the present invention
  • FIG. 3A illustrates a workflow generated with privacy constraints according to an exemplary embodiment of the present invention
  • FIG. 3B illustrates a workflow generated without privacy constraints according to an exemplary embodiment of the present invention
  • FIG. 4 illustrates a system for workflow verification and deployment according to an exemplary embodiment of the present invention.
  • FIG. 5 illustrates a label assignment algorithm according to an exemplary embodiment of the present invention.
  • a set of communicating components connected by communication channels is referred to as a workflow or an information flow graph.
  • There are three categories of nodes in a workflow. Nodes that have no incoming edges are referred to as source nodes. Nodes that have no outgoing edges are referred to as sink nodes. The rest of the nodes are referred to as processing components or components.
  • source nodes represent the data sources (e.g., databases or data streams) from which the data arrives in the system.
  • a sink node typically represents an end user who receives results of the processing.
  • Processing components are programs that transform data, producing new data streams from the data they receive.
  • labels are assigned to the communication channels and to the processing components.
  • the labels describe data attributes present in channels or accessed by components.
  • the components are treated as principals, and may or may not be trusted.
  • a channel label is chosen and assigned such that it can correctly reflect both the content and processing history (e.g., selection and suppression) of information passing through the channel.
  • An access decision is made by comparing a label assigned to an output channel and a label of a user according to a role of the user and a user's access purpose.
  • a privacy policy in the MSA model provides a description of a set of categories of information that must be protected, e.g., taxpayer identifier, salary amount or bank account number. These categories are referred to as attributes, to distinguish from coarser security models and to better reflect operations on the data, such as selection. If a channel carries information belonging to a category, the channel is said to contain that attribute. For example, if a channel can carry a document containing the following sentence: “Alice's salary of $XXXX is directly deposited to her bank account YYYYYY”, the label of the channel must contain both salary and bank account attributes.
  • the channels in the MSA model can carry unstructured data.
  • a channel o in the MSA model is assigned a label L(o).
  • when the channel is clear from context, L(o) is written simply as L.
  • a channel label describes the attributes contained in the channel, as well as a processing history of the channel.
  • L.A and L.t are used to denote the attribute set and the integrity level of the label, respectively; the processing history is recorded within the attribute set via selection attributes, as described below.
  • the attribute set L.A includes a pair (attribute-name, risk-level) for each attribute contained in the channel.
  • the risk level r_i represents the risk of disclosing information corresponding to the attribute att_i if the contents of the channel collected over a fixed time are disclosed.
  • a risk level of 1 means that the attribute is explicitly present in the channel
  • a risk level of 0 means that it is impossible to derive the value of the attribute from the channel.
  • practical issues of risk evaluation will be addressed.
  • an attribute with the risk level of 0 can be omitted from the labels.
  • the integrity level L.t is an integer between 1 and T, where T is the highest level of integrity specified in the privacy policy.
  • Information from a channel with a higher t is considered more accurate, reliable and trustworthy than that from a channel with a lower t, as described, e.g., in K. J. Biba. Integrity considerations for secure computer systems.
  • a low integrity level corresponds to either potentially damaging data (e.g., data containing a virus), or to data that may be misleading or unreliable.
  • L(o).t is determined by the labels of a principal that writes to a channel, or it is determined based on processing history.
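  • To make the label structure just described concrete, the following is a minimal Python sketch (not from the patent; the class name Label and field names attrs and t are illustrative) of a label carrying an attribute set of (name, risk-level) pairs and an integrity level:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    """MSA-style label: attribute risk levels plus an integrity level (illustrative)."""
    attrs: frozenset  # pairs (attribute_name, risk_level), risk in [0, 1]
    t: int            # integrity level, 1..T (higher = more trustworthy)

    def risk(self, att: str) -> float:
        # Attributes with risk level 0 may simply be omitted from the label.
        return max((r for a, r in self.attrs if a == att), default=0.0)

# Example: a channel explicitly carrying salary data, with partial bank-account leakage.
salary_channel = Label(frozenset({("salary", 1.0), ("bank-account", 0.5)}), t=2)
```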
  • Each principal p is assigned the following labels: a read label L_R(p), a write label L_W(p), a selection label L_S(p), an addition label L_A(p), and a suppression (or downgrading) label L_D(p).
  • L_R determines which channels can be read by a principal
  • L_W controls labels of channels written by the principal.
  • the selection label L_S is used to record in the processing history of a channel selection operations that have been performed by the principal. In particular, if a principal conducts a selection based on an attribute att, it is required that (att-selected, 1) ∈ L_S, where att-selected is a new attribute representing that att has been selected.
  • the access control rules to be described below guarantee that the selection history is recorded when information is transmitted throughout the workflow graph.
  • the addition label L_A includes the attributes that are added to output data by the principal.
  • the suppression label L_D describes the attributes that are suppressed by the principal.
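  • Continuing the sketch above, a principal can be modeled as a record of the five labels just listed (again an illustration, not the patent's own code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    """A processing component viewed as a principal, with its five MSA labels."""
    L_R: Label  # read label: bounds the channels the principal may read
    L_W: Label  # write label: lower bound on the labels of channels it writes
    L_S: Label  # selection label: records selections, e.g. (att-selected, 1)
    L_A: Label  # addition label: attributes the principal adds to its output
    L_D: Label  # suppression label: attributes the principal may suppress
```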
  • an attribute set contains more sensitive information than another if every attribute of the latter appears in the former with an equal or higher risk level; Definition 1 can be extended to compare two labels as follows.
  • L ⪯ L′ means that label L′ represents more sensitive information than L: every attribute of L appears in L′ with an equal or higher risk level, and L.t ≥ L′.t. It is easy to show that ⪯ is a partial order (e.g., reflexive, transitive and antisymmetric). Since ⪯ is not a total order, there may exist two labels that are not comparable. Use L ⋠ L′ to denote the fact that (1) L and L′ are not comparable, or (2) (L′ ⪯ L) ∧ (L ≠ L′).
  • the join of two labels is L ⊔ L′ = (L.A ∪ L′.A, min{t, t′}), where an attribute appearing in both attribute sets is taken with the maximum of its two risk levels.
  • L ⊔ L′ represents the union of the attributes annotated by L and L′.
  • the join is a least upper bound: (L_1 ⪯ L) ∧ (L_2 ⪯ L) if and only if L_1 ⊔ L_2 ⪯ L.
  • the integrity level of L ⊔ L′ is min{t, t′}.
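  • Under the reading of ⪯ and ⊔ reconstructed above, the dominance test and the join can be sketched as follows (dominates(big, small) returns True when small ⪯ big):

```python
def dominates(big: Label, small: Label) -> bool:
    """True iff small ⪯ big: big carries every attribute of small at an equal
    or higher risk level, and small has equal or higher integrity."""
    attr_ok = all(big.risk(att) >= r for att, r in small.attrs)
    return attr_ok and small.t >= big.t

def join(a: Label, b: Label) -> Label:
    """Least upper bound L ⊔ L': per-attribute maximum risk, minimum integrity."""
    names = {att for att, _ in a.attrs | b.attrs}
    attrs = frozenset((att, max(a.risk(att), b.risk(att))) for att in names)
    return Label(attrs, t=min(a.t, b.t))
```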
  • a principal can read channels that contain at most as much sensitive information as represented by the principal's read label.
  • the integrity level t_R of L_R can be used to restrict allowed input of the principal to channels with integrity at or above a specified level (t_R).
  • This rule specifies that the channel written by the principal contains all the attributes that are represented by the write label L_W, and L(o′).t ≤ L_W.t.
  • the label of a channel written by a principal with a low integrity level has the same or lower integrity level, which disallows read access to the channel for other principals with high input integrity requirements. Consequently, access to the channels produced by low-integrity principals can be restricted, and potential privacy violations due to label uncertainty can be avoided.
  • the channels written by the principal will have a risk level for each of the attributes that is equal to or exceeding the risk level of the same attribute in any of the inputs or selection and addition labels. It is clear that (L_R ⊔ L_S ⊔ L_A ⪯ L_W) ⟹ (L_R ⪯ L_W) ∧ (L_S ⪯ L_W) ∧ (L_A ⪯ L_W). Since ⪯ is transitive, (L(o) ⪯ L_R ⪯ L_W ⪯ L(o′)) ⟹ (L(o) ⪯ L(o′)), which means that, for an untrusted principal, all attributes contained in the label of a channel read by the principal must be contained in the labels of all channels written by the principal, with the same or higher risk level.
  • Condition L_S ⪯ L_W ⪯ L(o′) guarantees that if the principal performs a selection based on attribute att, label L(o′) includes the pair (att-selected, 1), thus recording the selection operation in the processing history of the information.
  • L_A ⪯ L_W ⪯ L(o′) ensures that L(o′) contains the attributes added by the principal. Note that L(o′).t ≤ L_W.t ≤ min{L_R.t, L_S.t, L_A.t}; hence the integrity level of the label of a channel written by a principal is no larger than the minimum of the integrity levels of the principal's labels.
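  • The access control rules described above then reduce to a few one-line checks; the rule numbering in the comments follows the text, and the helper names are the ones introduced in the sketches above:

```python
def can_read(p: Principal, channel: Label) -> bool:
    # Rule (1): a principal may read channel o only if L(o) ⪯ L_R.
    return dominates(p.L_R, channel)

def can_write(p: Principal, channel: Label) -> bool:
    # Write rule: a channel o' written by p must satisfy L_W ⪯ L(o').
    return dominates(channel, p.L_W)

def accumulation_ok(p: Principal) -> bool:
    # Rule (3), untrusted principals: L_R ⊔ L_S ⊔ L_A ⪯ L_W, so everything
    # read, selected on, or added is reflected in every output label.
    return dominates(p.L_W, join(join(p.L_R, p.L_S), p.L_A))
```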
  • a principal label is assigned to a processing component ( 105 ) and a first channel label is assigned to a first communication channel ( 110 ).
  • the principal label is compared to the first channel label to determine if the processing component can read data attributes of the first channel label ( 115 ). If the principal label is equal to or has precedence over the first channel label, the data attributes of the first channel label are read by the processing component ( 120 ). For example, if a channel label L_W contains attribute "SSN" with risk level 0.8, and attribute "Name" with risk level 0.6, and a principal label L_R contains attribute "SSN" with risk level 1.0, and attribute "Name" with risk level 1.0, the principal is allowed read access to the channel.
  • a second channel label is assigned to a second communication channel that is output from the processing component ( 125 ).
  • the principal label is compared to the second channel label to determine if data attributes of the principal label can be written to the second communication channel ( 130 ). If the second channel label is equal to or has precedence over the principal label, the data attributes of the principal label are written to the second communication channel ( 135 ).
  • a user label is assigned to a user that is connected to the second communication channel ( 140 ).
  • the user label is compared to the second channel label to determine if the user can read data attributes of the second channel label ( 145 ). If the user label is equal to or has precedence over the second channel label, the data attributes of the second channel label are read by the user ( 150 ).
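  • The SSN/Name example above plays out as follows under the sketch (risk values taken from the example; the integrity level of 1 is an added assumption):

```python
channel = Label(frozenset({("SSN", 0.8), ("Name", 0.6)}), t=1)
reader  = Label(frozenset({("SSN", 1.0), ("Name", 1.0)}), t=1)

assert dominates(reader, channel)       # read access granted (steps 115-120)
assert not dominates(channel, reader)   # the reverse check would deny access
```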
  • FIG. 2 is an example of controlling access to a stream of data according to an exemplary embodiment of the present invention.
  • FIG. 2 shows data sources 210 a - b connected to processing components 220 a - c via channels (indicated by arrows) that have labels 240 a - e assigned thereto and a user 230 .
  • the user 230 has a read label 230 a assigned thereto.
  • user read access to the channel label 240 e is denied (by applying the method of FIG. 1 ) since the read label 230 a does not dominate the channel label 240 e .
  • a suppressor in the processing component 220 c can be used to reduce the risk level of attributes, and the selection labels shown in labels 240 d - e can add new attributes indicating that a selection has been performed.
  • a channel label is correct if it contains all the attributes contained in the channel, and the risk level corresponding to the attribute in the label is equal to or higher than a privacy risk measure of the channel with respect to the attribute. For example, each attribute contained in the channel is included in the label with an appropriate risk level, and selection operations performed using any attribute of the channel according to selection criteria are recorded in the label.
  • the attribute accumulation rule (3) ensures that all attributes contained in the channels read by the principal are preserved and appear in the label of every channel written by the principal. In this case, assuming that the input channel labels are correct, the labels of the channels written by the principal are also correct since no new attributes are introduced by the principal and the output label dominates the input label. In the case when a principal adds a new attribute att to the channels it writes, the policy requires that for this principal (att, 1) ∈ L_A. Therefore, rule (3) guarantees that (att, 1) ∈ L(o′).A, and hence, att is correctly reflected in the label L′ of each channel written to by the principal.
  • rule (3) guarantees that this selection operation is reflected by the labels of the channels written by the principal performing selection.
  • rule (4) makes sure that any attribute that is not suppressed is correctly reflected in label L(o′) of each channel written by the principal. For those attributes that are suppressed, it is assumed that the principal correctly reduces the risk level of the corresponding attribute since the principal is trusted to perform a suppression operation on all channels that satisfy the input requirements of the principal, and whose labels are dominated by the principal's read label.
  • Rule (3) describes label computation at the level of a single principal. In the following, however, it will be shown that as a whole the system of principals and channels, in which MSA access control rules are enforced, achieves privacy policy compliance.
  • In the first case, there are no trusted principals performing suppression in the information flow graph.
  • In the second case, there exists a principal that suppresses one or more attributes. For simplicity, a single attribute att will be the focus.
  • the second case can trivially be extended for multiple attributes.
  • Proposition (2) states that once a selection on att is performed, att-selected is always included in the label when the information is transmitted through the workflow graph.
  • the channel label still includes att-selected, indicating that the stream has been selected based on att.
  • the MSA model can apply different access control rules in cases when the suppression of att is performed with or without prior selection of att.
  • the channel should be considered more sensitive with respect to the attribute att.
  • SPPL Stream Processing Planning Language
  • An example of a workflow-planning problem will now be described with reference to FIGS. 3A and 3B .
  • Consider a marketing research department of a company that analyzes conversations within an Internet chatroom hosted by the company for the purpose of studying opinions about products. The analysis is performed by a stream processing system in real time. Privacy policy requires that no information entered by a child under 13 can be analyzed or stored, irrespective of purpose. Chatroom conversations may be stored and analyzed for research purposes, as long as the conversations are not associated with the names of the persons participating, and unless customers "opt-out", e.g., request that their conversations be excluded.
  • In FIGS. 3A and 3B , the components are shown as rectangles 320 a - f and 320 a - b connected by streams (indicated by arrows), and for each stream, a label 340 a - g and 340 a - c with corresponding data types and privacy attributes is shown.
  • the workflow in FIG. 3A is generated if the MSA privacy policy is enforced. In the absence of privacy constraints, intermediate steps between GetConversations 320 a and AnalyzeChat 320 f components of FIG. 3A will be automatically removed, since they would become unnecessary.
  • FIG. 3B shows the workflow in the absence of privacy constraints.
  • ScreenNameToID 320 b was included in the workflow shown in FIG. 3A only because an ID type is required by subsequent suppressors. This type corresponds to a user identifier that is needed to access user properties, such as age and opt-out flag.
  • SPPL planners can create workflow directed acyclic graphs (DAGs) of higher complexity than shown in this example. A general approach will now be described.
  • the SPPL model describes the components of a stream processing workflow as actions that can have preconditions and effects. Streams in this model are described by predicates that can correspond to the type, format, and security or privacy attributes of data passing through the stream. Primal streams must be described as completely as possible based on the available knowledge about the external data sources.
  • the descriptions of the derived streams, e.g., the streams produced by a component, are computed based on the descriptions of the component and of the streams processed by the component. Hence, the description of the output stream can be computed recursively.
  • the subset x.t describes the format and type of data carried by the stream, and the subset x.s represents the privacy label of the data, as will now be described.
  • An action c can be applied in a state if there exist streams x_1, . . . , x_{J(c)} in the state such that the precondition of each input port j of c is satisfied by the corresponding stream description x_j.
  • In the resulting state, K(c) new streams are created.
  • the new streams have type and format defined by the corresponding output ports of c, and have a privacy-related set equal to the intersection of input privacy-related sets, possibly affected by port-specific additions and deletions.
  • Given a goal set G and a cost budget B, the planning problem is to find a set of actions that leads to a state in which there exists a stream x such that G ⊆ x, such that the sum of quality values of the actions is maximized, and the sum of resource costs is less than the cost budget B.
  • a universal set U of all privacy predicates is defined as follows. The set is defined using the values of risk and integrity that appear in the labels used in describing components, primal streams or users. Since there is a limited number of these labels, the size of U is polynomial in the number of components and privacy attributes. For each value of integrity level t, set U includes the element int_atleast(t). For each attribute-risk (att, r) pair used in the labels, set U includes the element no_att(att, r).
  • the primal streams have write labels, which are represented as initial state vectors. Given L_W, the corresponding set x.s describing one of the initial state streams is constructed as follows.
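  • The exact formula for x.s is not reproduced in this extract, so the following Python sketch shows one plausible monotone encoding consistent with the description: the stream satisfies int_atleast(t) for every integrity level it meets, and no_att(att, r) for every attribute-risk pair in U whose risk it stays below:

```python
def initial_state_predicates(L_W: Label, universe: frozenset) -> set:
    """Encode a primal stream's write label as its privacy predicate set x.s.
    `universe` is the set U of (att, r) pairs appearing in any label; the
    encoding below is an assumed reading of the construction."""
    preds = {("int_atleast", t) for t in range(1, L_W.t + 1)}
    preds |= {("no_att", att, r) for att, r in universe if L_W.risk(att) < r}
    return preds
```

Under this encoding, intersecting the predicate sets of several input streams keeps only the guarantees that all inputs share, which mirrors taking the join of their labels.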
  • Proposition 4: The planning problem without resource and quality constraints is NP-complete.
  • polynomial algorithms can be derived for other cases.
  • One such case arises when labels are ordered, that is, when labels are one-dimensional, e.g., contain a single attribute (possibly with different risk levels) and have equal integrity levels.
  • the same algorithm solves the planning problem in O(
  • a general search method such as branch-and-bound, can be employed for solving the problem.
  • the polynomial algorithm of Proposition 5 can be used to compute bounds and speedup the search.
  • An example architecture of an MSA-compliant system is shown in FIG. 4 .
  • In this architecture, the requirement for automatic workflow verification is addressed by including a workflow deployment interface 410 that is coupled to a privacy metadata database 405 and which performs the necessary analysis and verification of workflows 415 a . . . x before deployment.
  • the workflow interface 410 ensures that PII is protected
  • Privacy architecture also extends a system runtime environment 430 by inserting a number of trusted enforcement components 460 a . . . x , which are used to ensure that the description of the workflow analyzed before deployment correctly reflects the information flow in runtime.
  • the system runtime 430 is a distributed system, which consists of one or more computers 450 a . . . x , a data interface 435 and a results interface 440 connected via a network 470 .
  • Sources 420 are connected to the data interface 435 and sinks 425 are connected to the results interface 440 .
  • Each of the computers 450 a . . . x runs one or more operating systems and may have trusted I/Os 455 a . . . x .
  • Local operating systems can host components, either directly, or using compositional framework software.
  • Workflows 415 a . . . x are submitted to a global scheduler 445 for deployment, and the scheduler 445 allocates components to operating systems and manages allocation of resources in runtime.
  • When a workflow 415 a . . . x is sent to the scheduler 445 for deployment, it is verified for policy compliance by using the method of FIG. 1 and/or variations of the method of FIG. 1 as will be described hereinafter with reference to FIG. 5 . If a privacy violation is detected, the workflow graph is rejected; otherwise, the workflow 415 a . . . x is sent to the scheduler 445 , which will deploy it, provided that there are sufficient resources available.
  • the system may have a choice of labels assigned to the components 460 a . . . x . This is because the least sensitive information can be labeled as most sensitive without violating the policy. However, this also means that access to that information will be restricted. Therefore, minimal write labels are assigned, so that read labels can also remain low while satisfying inequalities (rules (1), (2), (3) and (4)).
  • In runtime, the privacy enforcement architecture must ensure that no communication between the components takes place except those specified in the workflow graph. Further, it must verify the correctness of the write labels assigned to the nodes of the workflow 415 a . . . x , especially when components 460 a . . . x are not trusted. This can be achieved by selectively logging and auditing the dataflow between the components 460 a . . . x , since full analysis of all data can be infeasible because of significant resource requirements. Finally, each workflow 415 a . . . x must be logged and later audited to manage the risk of users obtaining privacy-sensitive information through submitting a combination of multiple workflows.
  • the privacy architecture provides functionality spanning the entire lifetime of the workflow as follows: 1) workflow verification before deployment; 2) runtime privacy policy enforcement during execution; and 3) auditing of data, components and workflows during and after execution.
  • modules comprising the privacy architecture will now be described, starting with metadata for specifying policy and policy parameters, followed by a detailed description of verification, runtime enforcement and auditing.
  • Privacy Policy Metadata: Descriptions of attributes, components, sources and sinks are direct reflections of the privacy policy. Changing any of these descriptions is equivalent to changing the policy, and therefore the metadata must be protected by access control measures.
  • RBAC role-based access control
  • RBAC fits naturally in the privacy framework, and a similar approach can be used to specify privacy policy rules instead of specifying read labels for every node.
  • This mapping can be implemented as a table look-up. Assuming the privacy policy metadata contains this table, specifying a role and purpose for a principal is equivalent to specifying a read label.
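  • For instance, the table can be as simple as a dictionary keyed on (role, purpose); the roles, purposes and label contents below are invented for illustration:

```python
READ_LABELS = {
    ("marketing-analyst", "product-research"):
        Label(frozenset({("chat-text", 1.0), ("age-selected", 1.0)}), t=1),
    ("auditor", "compliance-review"):
        Label(frozenset({("chat-text", 1.0), ("Name", 1.0), ("ID", 1.0)}), t=1),
}

def read_label_for(role: str, purpose: str) -> Label:
    # Specifying a (role, purpose) pair is equivalent to specifying a read label.
    return READ_LABELS[(role, purpose)]
```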
  • Metadata for each of the data sources contains a single write label L_W that describes the set of attributes and integrity of the information arriving from this source.
  • the set of attributes L_W.A should contain all attributes that may appear in the data stream supplied by the source. Risk levels for the attributes should be chosen conservatively. A higher risk level assigned to an attribute restricts access to the information derived from the source. Restricted access may be alleviated by stronger suppression, if such suppressor components are available. In this case, specifying a higher risk level is equivalent to requiring valid workflows to apply stronger suppression.
  • The integrity of the source, L_W.t, can be used to control suppression.
  • Low integrity sources may supply data in which it is hard to suppress private information, e.g., because steganography methods are used to hide PII in usually non-sensitive data.
  • read labels assigned to suppressors should specify a minimum integrity level necessary for suppression.
  • Sink nodes correspond to outputs of the system, and typically represent flows that are presented to end users of the system. For each of the sink nodes, the read labels are used to describe access rights of the recipients of these flows. As discussed above, the user role and access purpose can be specified instead.
  • Purpose can be uniquely determined by the user role, or the user can be given a choice of purposes, if the policy allows that. If users can choose among multiple purposes, it is hard to detect the situation in which data is requested for one purpose, but used for another. However, auditors may detect a violation even in this case based on the processing history of submitted workflows.
  • Instantiated Components are components that are already deployed when the workflow is submitted, and for which labels cannot be changed. Workflows can make use of components already existing in the system, applying different processing to the output of already deployed workflows to derive new results. It is assumed that instantiated components as principals comply with the MSA model, and that rule (4), which defines the relationship between principal labels, holds for each instantiated component. For these instantiated components, privacy metadata must contain read and write labels.
  • Deployment of the workflow may involve not only establishing connections between already deployed instantiated components, but also require automated deployment of new component instances. Each of these newly deployed components is an instance of a corresponding component class. Metadata describing a component class includes the location of executable code that must be deployed during component instantiation, as well as various configuration parameters and resource dependencies.
  • Component class metadata must also include a description of component functionality and access rights from a privacy perspective. As other types of privacy metadata, privacy-related sections of component class description must be protected from unauthorized modification. While a component class itself does not process any data and does not act as a principal in the information flow, component instances become principals, and therefore, must be assigned a set of principal labels. Component class metadata defines template labels that are used to compute principal labels for component instances.
  • Template labels defined in component class metadata are (L_R, L_A, L_S, L_D), where L_R is the template read label, L_A is the template addition label, L_S is the template selection label, and L_D is the template suppression label.
  • read labels L_R can be specified indirectly by assigning roles and purposes to component classes.
  • Principal labels for new component instances are computed before deployment of the workflow. Principal labels (L̂_R, L̂_A, L̂_S, L̂_D, L̂_W) are based on the labels of component inputs in the workflow and are derived from template labels using the procedure that will be described below.
  • (att-selected, 1) ∈ L̂_S only for attributes att ∈ Attr(L̂_R), and L̂_S.t = L_S.t.
  • rule (1) is verified for every principal p and every channel o supplied to p. This condition requires that the read labels of principals dominate channel labels of the edges carrying flow into the principals. If this condition is violated, the workflow is rejected. Otherwise, the workflow with the computed labels complies with all rules of the MSA model, and can be deployed.
  • the metadata specifies the values of write labels.
  • the algorithm proceeds recursively, that is, starting from the sources.
  • channel labels are first assigned to all edges leaving the nodes with assigned write labels. If this assignment causes a conflict with rule (1), the workflow is rejected due to a policy violation. Then, the loop at step 7 of the algorithm assigns read labels to all new component instances that do not yet have read labels and for which the channel labels of all input edges have been assigned.
  • the read label assigned to the component instance should allow as little access to sensitive information as possible due to the principle of least privilege.
  • the read label is chosen to be just enough to read the channels received by the component instance.
  • the read label L̂_R(p) satisfies L̂_R(p) = ⊔_i L(o_i), the join of the labels of all channels o_i read by the component instance p.
  • Privacy metadata requires that the read label specified in metadata describing component class P of component p must dominate the read label assigned to p.
  • L̂_R(p) must satisfy constraints C(P) specified in the description of the component class. If either of the two conditions is violated, the workflow must be rejected as shown in step 10 .
  • the selection label specified in the component class metadata may enumerate a large set of attributes, exceeding the set present in the read label of a component instance.
  • the selection label L̂_S(p) assigned to the component instance in step 11 depends on the attributes contained in the channels read by the component instance. More specifically, L̂_S(p) should include the attribute att-selected only if att is contained in the read label L̂_R(p).
  • the algorithm terminates after a finite number of iterations, which does not exceed the number of nodes in the workflow. It is also straightforward to show that this algorithm assigns labels such that if a feasible assignment exists, a feasible solution will be found by the algorithm, and the workflow will be accepted. Further, label assignment is feasible if it satisfies rules (1), (2), (4) and component instantiation rules (5).
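  • A compressed sketch of this assignment loop follows; FIG. 5 itself is not reproduced here, so the control flow is reconstructed from the description, and the workflow object with its edges_from_sources, topological_order and in_edges accessors is hypothetical:

```python
def assign_and_verify(workflow) -> bool:
    """Propagate labels from sources toward sinks; reject on any violation."""
    for edge in workflow.edges_from_sources():
        edge.label = edge.source.L_W              # channel labels from write labels
    for p in workflow.topological_order():        # proceeds "from the sources"
        in_labels = [e.label for e in workflow.in_edges(p)]
        L_hat_R = in_labels[0]
        for lab in in_labels[1:]:
            L_hat_R = join(L_hat_R, lab)          # just enough to read all inputs
        if not dominates(p.template_L_R, L_hat_R):
            return False                          # class read label must dominate (step 10)
        p.L_R = L_hat_R
        # ... derive L_S, L_A, L_D and L_W similarly, then label the output edges,
        # rejecting if any newly labeled edge violates rule (1) downstream.
    return True
```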
  • Runtime Enforcement and MLS: The verification method described above helps prevent the deployment of privacy-violating workflows. However, at runtime, continuous privacy policy enforcement is required to make sure that the information flows stay within the boundaries defined by the approved workflow specifications which were initially submitted for deployment.
  • Access control mechanisms of existing multi-level secure (MLS) systems can be leveraged to enforce the MSA model constraints. This can be achieved by creating an order-preserving one-to-one mapping M(·) between MSA labels L and MLS labels l, such that for all MSA labels L ⪯ L′ it follows that M(L) ⪯ M(L′) in the partial order of MLS, and for all l ⪯ l′ it follows that M⁻¹(l) ⪯ M⁻¹(l′). With this mapping, the MLS enforcement mechanisms will automatically enforce the MSA condition (6) by enforcing a similar constraint of the Bell-LaPadula policy on corresponding MLS labels.
  • MLS multi-level secure
  • Secrecy level can be chosen from one of several discrete values, such as unclassified or secret.
  • the category set contains the categories relevant to the data described by the label. The relation is defined as follows: (s, C) ⪯ (s′, C′) if and only if s ≤ s′ and C ⊆ C′.
  • This mapping has all the required properties.
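  • The extract does not spell out M(·) itself, so the following is one order-preserving construction consistent with the stated requirements: every attribute-risk threshold the label reaches becomes an MLS category, and secrecy grows as integrity falls (risk_grid, the finite set of (att, r) pairs used in the system, and T, the highest integrity level, are assumed parameters):

```python
def msa_to_mls(L: Label, risk_grid: frozenset, T: int):
    """Map an MSA label to an MLS (secrecy-level, category-set) pair."""
    categories = frozenset((att, r) for att, r in risk_grid if L.risk(att) >= r)
    secrecy = T - L.t + 1   # lower integrity maps to a higher secrecy level
    return secrecy, categories
```

With this construction, L ⪯ L′ yields a category set and secrecy level for L that are dominated by those for L′, matching the MLS order (s, C) ⪯ (s′, C′) defined above.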
  • the MLS implementation must support trusted downgraders, which are allowed to violate the “no write-down” rule of the Bell-LaPadula policy. It should also be noted that the mapping between MSA and MLS can be modified such that existing MLS labels on the channels can be used jointly with MSA labels, enforcing regular MLS data confidentiality protection in addition to MSA privacy protection constraints.
  • the MSA model makes final access control decisions based on the label assigned to the flow leaving the system. These labels, and therefore these decisions, can only be trusted if the metadata assigned to components and sources exactly or with conservative overprotection represents the functionality of the components and the contents of the sources. Thus, it is crucial for successful deployment of an MSA-based system to establish procedures for verifying the correctness of the metadata, e.g., by performing code reviews or selective logging and auditing of data flows.
  • MSA metadata has been identified as requiring attention during auditing.
  • a write label assigned to a source does not contain all sensitive attributes contained in the output of the source, or contains these attributes with levels lower than the observed levels. Sensitive information added by a component to its output is not reflected in the addition label of the component. Sensitive attribute suppression declared in the suppression label of a component is not performed correctly.
  • Integrity violations: An integrity level assigned to the write label of a component is above the true integrity level.
  • a write label assigned to a source specifies an integrity higher than true integrity of the data.
  • rule (1) needs to be extended to take into account combinations of attributes. Recall that the original rule (1) requires that the read label of a component dominates the object label of the channel: L(o) ⪯ L_R.
  • This condition is then extended by adding a principal-specific set of attribute combinations that are not allowed to appear in any channel connected to the principal. For example, the combination of the SSN and FullName attributes can be prohibited, while SSN or FullName alone is allowed.
  • the set of combinations is then stored as part of component or component class metadata, together with the read label, or a read template label.
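  • A sketch of the extended check (the prohibited-combinations set is the per-principal metadata just mentioned; names are illustrative):

```python
def combination_ok(channel: Label, prohibited: set) -> bool:
    """Extended rule (1): reject a channel whose label carries any prohibited
    combination of attributes, even when each attribute alone is allowed."""
    present = {att for att, r in channel.attrs if r > 0}
    return not any(set(combo) <= present for combo in prohibited)

# SSN together with FullName is forbidden; SSN alone passes.
assert combination_ok(Label(frozenset({("SSN", 1.0)}), t=1),
                      prohibited={("SSN", "FullName")})
assert not combination_ok(Label(frozenset({("SSN", 1.0), ("FullName", 1.0)}), t=1),
                          prohibited={("SSN", "FullName")})
```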
  • the privacy protection framework based on the MSA access control policy supports a general composition architecture where data streams are processed by workflows formed of stream processing components connected by communication channels.
  • any component may produce new streams of data that become available for analysis by other components.
  • the framework can easily be enforced in existing MLS systems that implement the Bell-LaPadula policy. This together with an RBAC approach makes configuration and administration of the framework simple and practical. For example, while additional tools are needed to visualize and modify MSA labels at the level of system administrators and users, the underlying enforcement mechanisms of MLS systems do not require any modification.
  • the framework supports suppression algorithms with varying degrees of suppression.
  • the MSA policy can prevent a disclosure of results derived from selected data if the selection criteria are sensitive, through a mechanism similar to those used in the RBAC approach.
  • the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof.
  • the present invention may be implemented in software as an application program tangibly embodied on a program storage device (e.g., magnetic floppy disk, RAM, CD ROM, DVD, ROM, and flash memory).
  • the application program may be uploaded to, and executed by, a machine comprising any suitable architecture.

Abstract

A method for protecting information in a distributed stream processing system, including: assigning a principal label to a processing component; assigning a first channel label to a first communication channel that is input to the processing component; comparing the principal label to the first channel label to determine if the processing component can read data attributes of the first channel label; and reading the data attributes of the first channel label when the principal label is equal to or has precedence over the first channel label, wherein the principal label includes a read label and a write label and at least one of a selection label, an addition label or a suppression label.

Description

CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation of application Ser. No. 11/496,821, filed Aug. 1, 2006, the disclosure of which is incorporated by reference herein in its entirety.
GOVERNMENT INTERESTS
This invention was made with Government support under Contract No.: TIA H98230-04-3-0001 awarded by Distillery Phase II Program. The Government has certain rights in this invention.
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to network security, and more particularly, to an access control method and a system for privacy protection.
2. Discussion of the Related Art
Privacy protection of personally identifiable information (PII) in information systems used in industry and government applications has become increasingly important due to the proliferation of computerized information and management systems. The work on privacy protection in computer systems that rely on databases and data mining algorithms for storing and processing information has largely focused on masking protected data at the point of retrieval, e.g., at the data source. In the area of information security protection, significant attention has focused on information flows between multiple principals.
The protection of PII involves two aspects: information security and privacy protection. Information security measures such as access control, firewalls, sandboxing and secure communication channels are used to prevent unauthorized access to PII. Information security measures alone are not enough to make information systems privacy preserving, since they are low level and do not distinguish between accessing PII for statistics computation and accessing it for record retrieval. Privacy policies generally allow the former and prohibit the latter.
Existing privacy protection methods, such as privacy-preserving data mining, R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proc. SIGMOD, 2000, define a formal framework for privacy analysis and protection. A common assumption in existing privacy protection methods is that the software, e.g., data mining software, can be trusted to protect private information. However, this is not always the case, especially when large systems are built using a large set of components that are open source or using commercial software developed by third parties.
Further, business circumstances often demand that the third party software be installed and running as soon as possible. However, there may not always be sufficient time to verify the code for potential security flaws or Trojan horses. One example of such an application where privacy constraints are important is the national airline passenger-prescreening program called Secure Flight that is currently being developed by the Transportation Security Administration (TSA) of the U.S. Department of Homeland Security, Secure Flight Program. U.S. Department of Homeland Security, Transportation Security Administration, http://www.tsa.gov/public/interapp/editorial/editorial1716.xml.
Lattice-based access control (LBAC) models, such as those of Bell and LaPadula D. Bell and L. LaPadula. Secure computer system: Unified exposition and Multics interpretation. Technical Report ESD-TR-75-306, ESD/AFSC, Hanscom AFB, Bedford, Mass., 1975, and Denning D. Denning. A lattice model of secure information flow. Communications of the ACM, 19(5):236-243, May 1976, can provide formal end-to-end security guarantees in systems where the components are not trusted. These models are often overly restrictive. However, these issues can be addressed by modifications that permit declassification of data using trusted components, e.g., D. Bell. Secure computer systems: A refinement of the mathematical model. MTR-2547, Vol. III, MITRE Corp., 1974.
Tracking data integrity as data is transformed within stream processing systems is also important and can be addressed by the Biba integrity model as described, e.g., in, K. J. Biba. Integrity considerations for secure computer systems. Technical Report ESD-TR-76-372, ESD/AFSC, Hanscom AFB, Bedford, Mass., 1977.
The Bell-LaPadula policy is widely used in multi-level secure (MLS) systems. The models of Bell-LaPadula, Denning, Biba and several others can be unified under the common LBAC framework described by Sandhu, R. Sandhu. Lattice-based access control models. IEEE Computer, 26(11):9-19, 1993. Security and integrity models have been combined in practice, e.g., in the Caemarvon protocol H. Scherzer, R. Canetti, P. A. Karger, H. Krawczyk, T. Rabin, and D.C. Toll. Authenticating mandatory access controls and preserving privacy for a high-assurance smart card. In ESORICS, pages 181-200, 2003.
While the above-mentioned models are well suited for the protection of confidentiality and integrity, they do not completely address the needs of privacy protection, since they protect confidentiality but not anonymity. For example, anonymity protection methods, such as k-anonymity, described in L. Sweeney. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10(5):557-570, 2002, are often parametric and provide varying degrees of protection depending on their configuration. In addition, selection, e.g., content-based filtering, may noticeably reduce anonymity, especially if it results in the disclosure of data derived from a significantly smaller population. Further, privacy policies mandate that different access decisions must be made for different access purposes; however, the notion of purpose is not supported by the LBAC models.
SUMMARY OF THE INVENTION
In an exemplary embodiment of the present invention, a method for protecting information in a distributed stream processing system, comprises: assigning a principal label to a processing component; assigning a first channel label to a first communication channel that is input to the processing component; comparing the principal label to the first channel label to determine if the processing component can read data attributes of the first channel label; and reading the data attributes of the first channel label when the principal label is equal to or has precedence over the first channel label, wherein the principal label includes a read label and a write label and at least one of a selection label, an addition label or a suppression label.
The principal label includes data attributes. The principal label includes a name and a risk level for each of the data attributes of the principal label. The principal label includes an integrity level for the data attributes of the principal label.
The first channel label includes a name and a risk level for each of the data attributes of the first channel label. The first channel label includes an integrity level for the data attributes of the first channel label. The first channel label further includes a processing history of the first communication channel.
The method further comprises: assigning a second channel label to a second communication channel that is output from the processing component; comparing the principal label to the second channel label to determine if data attributes of the principal label can be written to the second communication channel; and writing the data attributes of the principal label to the second communication channel when the second channel label is equal to or has precedence over the principal label.
The method further comprises: assigning a user label to a user that is connected to the second communication channel; comparing the user label to the second channel label to determine if the user can read data attributes of the second channel label; and reading the data attributes of the second channel label when the user label is equal to or has precedence over the second channel label.
The user label includes data attributes, the data attributes of the user label including a user name, a user role or a user purpose. The user label includes a read label.
In an exemplary embodiment of the present invention, a method for verifying privacy policy compliance of a workflow in a distributed stream processing system, comprises: assigning a read label to a user; assigning channel labels to channels connecting processing components of the workflow to external data sources; assigning read and write labels to the processing components; accepting the workflow when each of the read labels dominates the channel labels of all channels read by its corresponding processing component, each of the write labels is dominated by the channel labels of all channels written to by its corresponding processing component and each of the write labels dominates the read label of its corresponding processing component; or rejecting the workflow if each of the read labels does not dominate the channel labels of all the channels read by its corresponding processing component, each of the write labels is not dominated by the channel labels of all the channels written to by its corresponding processing component, each of the write labels does not dominate the read label of its corresponding processing component or if the user label does not dominate the channel labels of all channels read by the user, wherein the read label of the user, the channel labels of the channels and the read and write labels of the processing components include data attributes.
At least one of the channel labels is already assigned to at least one of the communication channels before the channel labels are assigned. At least one set of the read and write labels is already assigned to at least one of the processing components before the read and write labels are assigned.
The method further comprises: assigning a maximum read label to one of the processing components; and rejecting the workflow if the read label is not dominated by the maximum read label. The method further comprises: assigning an addition label to one of the processing components; and rejecting the workflow if the addition label is not dominated by the write label.
The method further comprises: assigning a selection label to one of the processing components; and rejecting the workflow if the selection label is not dominated by the write label. The method further comprises: assigning a suppression label to one of the processing components, wherein, when the suppression label is assigned, the write label dominates a minimum value of the suppression label and the read label.
The method further comprises: generating a planning task representation; reading the planning task representation using a planning algorithm; assigning the read label to the user, the channel labels to the channels and the read and write labels to the processing components using the planning algorithm; and generating the workflow by translating an output of the planning algorithm.
The method further comprises deploying the workflow when it is accepted. The method further comprises translating the read and write labels of the processing components into multi-level secure (MLS) system compatible labels, wherein the MLS system enforces a Bell-LaPadula privacy policy and the workflow is deployed in the MLS system.
In an exemplary embodiment of the present invention, a computer program product comprising a computer useable medium having computer program logic recorded thereon for protecting information in a distributed stream processing system, the computer program logic comprises: program code for assigning a principal label to a processing component; program code for assigning a first channel label to a first communication channel that is input to the processing component; program code for comparing the principal label to the first channel label to determine if the processing component can read data attributes of the first channel label; and program code for reading the data attributes of the first channel label when the principal label is equal to or has precedence over the first channel label, wherein the principal label includes a read label and a write label and at least one of a selection label, an addition label or a suppression label.
The computer program product, further comprises: program code for assigning a second channel label to a second communication channel that is output from the processing component; program code for comparing the principal label to the second channel label to determine if data attributes of the principal label can be written to the second communication channel; and program code for writing the data attributes of the principal label to the second communication channel when the second channel label is equal to or has precedence over the principal label.
The computer program product, further comprises: program code for assigning a user label to a user that is connected to the second communication channel; program code for comparing the user label to the second channel label to determine if the user can read data attributes of the second channel label; and program code for reading the data attributes of the second channel label when the user label is equal to or has precedence over the second channel label.
In an exemplary embodiment of the present invention, a computer program product comprising a computer useable medium having computer program logic recorded thereon for verifying privacy policy compliance of a workflow in a distributed stream processing system, the computer program logic comprises: program code for assigning a read label to a user; program code for assigning channel labels to channels connecting processing components of the workflow to external data sources; program code for assigning read and write labels to the processing components; program code for accepting the workflow when each of the read labels dominates the channel labels of all channels read by its corresponding processing component, each of the write labels is dominated by the channel labels of all channels written to by its corresponding processing component and each of the write labels dominates the read label of its corresponding processing component; or rejecting the workflow if each of the read labels does not dominate the channel labels of all the channels read by its corresponding processing component, each of the write labels is not dominated by the channel labels of all the channels written to by its corresponding processing component, each of the write labels does not dominate the read label of its corresponding processing component or if the user label does not dominate the channel labels of all channels read by the user, wherein the read label of the user, the channel labels of the channels and the read and write labels of the processing components include data attributes.
The computer program product, further comprises: program code for assigning a maximum read label to one of the processing components; and program code for rejecting the workflow if the read label is not dominated by the maximum read label. The computer program product, further comprises: program code for assigning an addition label to one of the processing components; and program code for rejecting the workflow if the addition label is not dominated by the write label.
The computer program product, further comprises: program code for assigning a selection label to one of the processing components; and program code for rejecting the workflow if the selection label is not dominated by the write label. The computer program product, further comprises: program code for assigning a suppression label to one of the processing components, wherein, when the suppression label is assigned, the write label dominates a minimum value of the suppression label and the read label.
The computer program product, further comprises: program code for generating a planning task representation; program code for reading the planning task representation using a planning algorithm; program code for assigning the read label to the user, the channel labels to the channels and the read and write labels to the processing components using the planning algorithm; and program code for generating the workflow by translating an output of the planning algorithm.
The computer program product, further comprises program code for deploying the workflow when it is accepted. The computer program product, further comprises program code for translating the read and write labels of the processing components into MLS system compatible labels, wherein the MLS system enforces a Bell-LaPadula privacy policy.
In an exemplary embodiment of the present invention, a workflow deployment unit for protecting personally identifiable information (PII) in a distributed stream processing system, comprises: an assigner for assigning principal labels to processing components, channel labels to communication channels, and a user label to a user; a verifier for verifying that the channel label of a communication channel that is input to a processing component can be read by the processing component, for verifying that the principal label of a processing component can be written by the processing component to a communication channel that is output from the processing component and for verifying that the user label can read the channel label of the communication channel that is output from the processing component; and a deployer for deploying a workflow that includes the processing component that can read the channel label of the communication channel that is input to the processing component and that can write the principal label to the communication channel that is output from the processing component, the communication channel that is input to the processing component, the communication channel that is output from the processing component and the user that can read the channel label of the communication channel that is output from the processing component.
The principal labels and the channel labels include PII. The workflow deployment unit further comprises: a first interface for receiving PII; and a second interface for receiving the principal labels, the channel labels and the user label.
The foregoing features are of representative embodiments and are presented to assist in understanding the invention. It should be understood that they are not intended to be considered limitations on the invention as defined by the claims, or limitations on equivalents to the claims. Therefore, this summary of features should not be considered dispositive in determining equivalents. Additional features of the invention will become apparent in the following description, from the drawings and from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a method for protecting data in a distributed stream processing system according to an exemplary embodiment of the present invention;
FIG. 2 illustrates controlling access to a stream of data according to an exemplary embodiment of the present invention;
FIG. 3A illustrates a workflow generated with privacy constraints according to an exemplary embodiment of the present invention;
FIG. 3B illustrates a workflow generated without privacy constraints according to an exemplary embodiment of the present invention;
FIG. 4 illustrates a system for workflow verification and deployment according to an exemplary embodiment of the present invention; and
FIG. 5 illustrates a label assignment algorithm according to an exemplary embodiment of the present invention.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
An access control method and a system for privacy protection according to an exemplary embodiment of the present invention will now be described beginning first with a general stream processing system followed by a multi-set attribute (MSA) privacy model of the privacy protection framework.
General Model of Stream Processing. Consider a general model of a stream processing system in which data streams are processed by one or more components (e.g., principals). The components are connected by communication channels (e.g., objects). The communication channels also connect the components to external data sources and to sinks thus delivering information to a user. The communication channels are one-way links allowing the data stream to flow from its source to its destination. To simplify verification, it is assumed that a resulting information flow graph is acyclic.
A set of communicating components connected by communication channels is referred to as workflows or information flow graphs. There are three categories of nodes in the workflow. Nodes that have no incoming edges are referred to as source nodes. Nodes that have no outgoing edges are referred to as sink nodes. The rest of the nodes are referred to as processing components or components. Generally, source nodes represent the data sources (e.g., databases or data streams) from which the data arrives in the system. A sink node typically represents an end user who receives results of the processing. Processing components are programs that transform data, producing new data streams from the data they receive.
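As a concrete illustration of this graph model, the sketch below classifies nodes by their edges. It is a minimal Python rendering with invented class and field names, not an interface defined by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    name: str

@dataclass
class Workflow:
    nodes: List[Node] = field(default_factory=list)
    # Each edge is a one-way communication channel (producer, consumer).
    edges: List[Tuple[Node, Node]] = field(default_factory=list)

    def sources(self) -> List[Node]:
        # Source nodes have no incoming edges (external data sources).
        consumers = {id(d) for _, d in self.edges}
        return [n for n in self.nodes if id(n) not in consumers]

    def sinks(self) -> List[Node]:
        # Sink nodes have no outgoing edges (end users receiving results).
        producers = {id(s) for s, _ in self.edges}
        return [n for n in self.nodes if id(n) not in producers]

    def components(self) -> List[Node]:
        # Processing components both consume and produce streams.
        endpoints = {id(n) for n in self.sources() + self.sinks()}
        return [n for n in self.nodes if id(n) not in endpoints]
```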
In the privacy protection framework, labels are assigned to the communication channels and to the processing components. The labels describe data attributes present in channels or accessed by components. The components are treated as principals, and may or may not be trusted. A channel label is chosen and assigned such that it can correctly reflect both the content and the processing history (e.g., selection and suppression) of information passing through the channel. An access decision is made by comparing a label assigned to an output channel and a label of a user according to a role of the user and the user's access purpose.
Channel labels and principal labels will now be described in more detail.
Channel Label. A privacy policy in the MSA model provides a description of a set of categories of information that must be protected, e.g., taxpayer identifier, salary amount or bank account number. These categories are referred to as attributes, to distinguish from coarser security models and to better reflect operations on the data, such as selection. If a channel carries information belonging to a category, the channel is said to contain that attribute. For example, if a channel can carry a document containing the following sentence: “Alice's salary of $XXXX is directly deposited to her bank account YYYYYY”, the label of the channel must contain both salary and bank account attributes.
There is a significant difference between the notion of attributes in the MSA model and of attributes in relational databases. In particular, the channels in the MSA model can carry unstructured data.
A channel o in the MSA model is assigned a label L(o). In the following, when the context is clear, L(o) is written as L. A channel label describes the attributes contained in the channel, as well as a processing history of the channel. A label L has two elements: an attribute set A and an integrity level t, i.e., L = ⟨A, t⟩. For notational convenience, L.A and L.t are used to denote these two elements, respectively.
Attribute set A includes a pair (attribute-name, risk-level) for each attribute contained in the channel, i.e., A = {(att_1, r_1), . . . , (att_n, r_n)}, where att_i is the name of attribute i and r_i (0 ≤ r_i ≤ 1) is the corresponding risk level. In the following, denote Attr(L.A) = Attr(L) = {att_1, . . . , att_n}. The risk level r_i represents the risk of disclosing information corresponding to the attribute att_i if the contents of the channel collected over a fixed time are disclosed. For example, a risk level of 1 means that the attribute is explicitly present in the channel, and a risk level of 0 means that it is impossible to derive the value of the attribute from the channel. In the following, practical issues of risk evaluation will be addressed. Generally, an attribute with a risk level of 0 can be omitted from the labels.
The integrity level L.t is an integer between 1 and T, where T is the highest level of integrity specified in the privacy policy. Information from a channel with a higher t is considered more accurate, reliable and trustworthy than that from a channel with a lower t, as described, e.g., in K. J. Biba. Integrity considerations for secure computer systems. Technical Report ESD-TR-76-372, ESD/AFSC, Hanscom AFB, Bedford, Mass., 1977.
A low integrity level corresponds to either potentially damaging data (e.g., data containing a virus), or to data that may be misleading or unreliable. L(o).t is determined by the labels of a principal that writes to a channel, or it is determined based on processing history.
Principal Label. Each principal p is assigned the following labels: a read label LR(p), a write label LW(p), a selection label LS(p), an addition label LA(p), and a suppression (or downgrading) label LD(p). LR determines which channels can be read by a principal, and LW controls the labels of channels written by the principal. The selection label LS is used to record, in the processing history of a channel, selection operations that have been performed by the principal. In particular, if a principal conducts a selection based on an attribute att, it is required that (att-selected, 1) ∈ LS.A, where att-selected is a new attribute representing that att has been selected. The access control rules described below guarantee that the selection history is recorded when information is transmitted throughout the workflow graph. The addition label LA includes the attributes that are added to output data by the principal. Finally, the suppression label LD describes the attributes that are suppressed by the principal.
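To make the label vocabulary concrete, here is one possible encoding in Python. The field names and the example component are assumptions for illustration, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict

# A label L = <A, t>: A maps attribute name -> risk level in [0, 1];
# t is an integer integrity level (higher = more trustworthy).
@dataclass(frozen=True)
class Label:
    attrs: Dict[str, float] = field(default_factory=dict)  # L.A
    t: int = 1                                              # L.t

@dataclass
class Principal:
    read: Label      # L_R: bounds what the principal may read
    write: Label     # L_W: lower bound on labels of written channels
    select: Label    # L_S: selection history, e.g. {"age-selected": 1.0}
    add: Label       # L_A: attributes the principal adds to its output
    suppress: Label  # L_D: attributes a trusted principal may downgrade

# Example: a component that selects records by age must carry
# ("age-selected", 1) in its selection label.
p = Principal(read=Label({"age": 1.0, "chat": 1.0}, t=1),
              write=Label({"age": 1.0, "chat": 1.0, "age-selected": 1.0}, t=1),
              select=Label({"age-selected": 1.0}, t=1),
              add=Label(), suppress=Label())
```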
Access Rules. Before access control rules of the MSA model are presented, notations used to describe the rules will be introduced.
Definition 1. Given two attribute sets A and A′, A ⪯ A′ if and only if for every (att, r) ∈ A there exists (att′, r′) ∈ A′ such that att = att′ and r ≤ r′.

To remain consistent with notations in existing security models, A′ dominates A if and only if A ⪯ A′. Informally, A ⪯ A′ means that attribute set A′ contains more sensitive information than A. Definition 1 can be extended to compare two labels as follows.
Definition 2. Given two labels L = ⟨A, t⟩ and L′ = ⟨A′, t′⟩, L ⪯ L′ if and only if A ⪯ A′ and t ≥ t′.

Since a lower integrity level indicates a lower trustworthiness of the label assigned to the information, L ⪯ L′ means that label L′ represents more sensitive information than L. It is easy to show that ⪯ is a partial order (i.e., reflexive, transitive and antisymmetric). Since ⪯ is not a total order, there may exist two labels that are not comparable. L ⋠ L′ is used to denote the fact that (1) L and L′ are not comparable, or (2) (L′ ⪯ L) ∧ (L ≠ L′).
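Definitions 1 and 2 translate almost verbatim into code. Below is a minimal sketch, with labels encoded as (attrs, t) pairs where attrs maps attribute names to risk levels; the function names are illustrative.

```python
def attrs_leq(A, A2):
    # A ⪯ A': every attribute of A appears in A' with equal or higher risk.
    return all(att in A2 and r <= A2[att] for att, r in A.items())

def leq(L, L2):
    # L ⪯ L': attribute sets ordered as above, and integrity t >= t'
    # (a lower integrity level makes the label itself less trustworthy).
    return attrs_leq(L[0], L2[0]) and L[1] >= L2[1]

L1 = ({"salary": 0.6}, 2)
L2 = ({"salary": 1.0, "account": 0.5}, 1)
assert leq(L1, L2) and not leq(L2, L1)   # L2 dominates L1
```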
Definition 3. Given two labels L = ⟨A, t⟩ and L′ = ⟨A′, t′⟩, define A ⊔ A′ = {(att, max{r_att^A, r_att^A′}) | att ∈ Attr(A) ∪ Attr(A′)}, where r_att^A and r_att^A′ are the risk levels of att in A and A′, respectively. Consequently, L ⊔ L′ is defined as L ⊔ L′ = ⟨A ⊔ A′, min{t, t′}⟩.

Recall that if att ∉ Attr(A), then r_att^A = 0. Intuitively, L ⊔ L′ represents the union of the attributes annotated by L and L′. The following result is straightforward: (L1 ⪯ L) ∧ (L2 ⪯ L) ⟺ (L1 ⊔ L2) ⪯ L.
Definition 4. Given two labels L = ⟨A, t⟩ and L′ = ⟨A′, t′⟩, define A − A′ = {(att′, min{r_att′^A, r_att′^A′}) | att′ ∈ Attr(A) ∩ Attr(A′)} ∪ {(att, r_att) | att ∈ Attr(A) \ Attr(A′)}. Consequently, L − L′ is defined as L − L′ = ⟨A − A′, min{t, t′}⟩.

Under this definition, A − A′ includes the pair (att, r_att) if att is contained in A but not in A′. If att is contained in both A and A′, att is also included in A − A′, and its risk level is the minimum of the risk levels of att in A and A′, respectively.
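Definitions 3 and 4 likewise admit a direct transcription. As above, a label is an (attrs, t) pair and a missing attribute carries implicit risk 0.

```python
def join(L, L2):                     # L ⊔ L' (Definition 3)
    A, t = L
    A2, t2 = L2
    names = set(A) | set(A2)
    # Per-attribute maximum risk, minimum integrity.
    return ({a: max(A.get(a, 0.0), A2.get(a, 0.0)) for a in names}, min(t, t2))

def minus(L, L2):                    # L − L' (Definition 4)
    A, t = L
    A2, t2 = L2
    # Attributes only in A keep their risk; attributes in both drop to
    # the minimum of the two risk levels.
    return ({a: min(r, A2[a]) if a in A2 else r for a, r in A.items()},
            min(t, t2))

L = ({"name": 1.0, "salary": 0.8}, 2)
LD = ({"salary": 0.2}, 3)
assert minus(L, LD) == ({"name": 1.0, "salary": 0.2}, 2)
```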
The access control rules of the MSA model are described below.
Rule 1. Read access. A principal can read a channel o with label L(o) from an incoming link only if

L(o) ⪯ LR.  (1)

Hence, a principal can read channels that contain at most as much sensitive information as represented by the principal's read label. Note that L(o) = ⟨A_o, t_o⟩ ⪯ ⟨A_R, t_R⟩ = LR implies t_o ≥ t_R. The integrity level t_R of LR can therefore be used to restrict the allowed input of the principal to channels with integrity at or above a specified level (t_R).
Rule 2. Write access. The label L(o′) of a channel o′ written to an outgoing link must satisfy
L W
Figure US08032924-20111004-P00014
L(o′).  (2)
This rule specifies that the channel written by the principal contains all the attributes that are represented by the write label LW, and L(o′).t≦LW.t. Thus, the label of a channel written by a principal with a low integrity level has the same or lower integrity level, which disallows read access to the channel for other principals with high input integrity requirements. Consequently, access to the channels produced by low-integrity principals can be restricted, and potential privacy violations due to label uncertainty can be avoided.
Rule 3. Attribute accumulation. For untrusted principals the following inequality must hold:
L R ␣L S ␣L A
Figure US08032924-20111004-P00014
L W.  (3)
Hence, the channels written by the principal will have a risk level for each of the attributes that is equal to or exceeding the risk level of the same attribute in any of the inputs or selection and addition labels. It is clear that (LR␣LS␣LA
Figure US08032924-20111004-P00014
LW)
Figure US08032924-20111004-P00032
(LR
Figure US08032924-20111004-P00014
LW)
Figure US08032924-20111004-P00033
(LS
Figure US08032924-20111004-P00014
LW)
Figure US08032924-20111004-P00034
(LA
Figure US08032924-20111004-P00014
LW). Since
Figure US08032924-20111004-P00014
is transitive, (L(o)
Figure US08032924-20111004-P00014
LR
Figure US08032924-20111004-P00014
LW
Figure US08032924-20111004-P00014
L(o′))
Figure US08032924-20111004-P00032
(L(o)
Figure US08032924-20111004-P00014
L(o′)), which means that, for an untrusted principal, all attributes contained in the label of a channel read by the principal must be contained in the labels of all channels written by the principal, with the same or higher risk level. Condition LS
Figure US08032924-20111004-P00014
LW
Figure US08032924-20111004-P00014
L(o′) guarantees that if the principal performs a selection based on attribute att, label L(o′) includes the pair of (att-selected, 1) thus recording the selection operation in the processing history of the information.
Additionally, LA
Figure US08032924-20111004-P00014
LW
Figure US08032924-20111004-P00014
L(o′) ensures that L(o′) contains the attributes added by the principal. Note that L(o′).t≦LW.t≦min{LR.t, LS.t, LA.t}, hence the integrity level of the label of a channel written by a principal is no larger than the minimum of the integrity levels of the principal's labels.
Rule 4. Suppression. For trusted principals, suppression operations are described by defining a nonempty suppression label LD. Condition (3) above is then adjusted to allow suppression of the attributes:

LR ⊔ LS ⊔ LA − LD ⪯ LW.  (4)

When this rule is applied, all implications of rule 3 hold for all attributes, with the exception of the suppressed attributes contained in LD. The suppressed attributes in LW can have a maximum risk level equal to or higher than the risk level specified for the attribute in the suppression label of the principal. Hence, trusted principals can reduce the risk levels of one or several attributes, effectively labeling the channels that they write as less sensitive than the channels that they read. Note that since ⪯ is a partial order, the information flow represented by a workflow graph is similar to Denning's lattice model. Once the four access control rules described above hold and are enforced, the MSA model can guarantee privacy policy compliance.
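Taken together, rules (1)-(4) reduce to a handful of comparisons over the label algebra. The following self-contained sketch restates ⪯, ⊔ and − compactly and then checks a principal's labels; the function names are mine, not the patent's.

```python
def leq(L, L2):                      # L ⪯ L' (Definition 2)
    A, t = L
    A2, t2 = L2
    return t >= t2 and all(a in A2 and r <= A2[a] for a, r in A.items())

def join(L, L2):                     # L ⊔ L' (Definition 3)
    A, t = L
    A2, t2 = L2
    names = set(A) | set(A2)
    return ({a: max(A.get(a, 0.0), A2.get(a, 0.0)) for a in names}, min(t, t2))

def minus(L, L2):                    # L − L' (Definition 4)
    A, t = L
    A2, t2 = L2
    return ({a: min(r, A2[a]) if a in A2 else r for a, r in A.items()},
            min(t, t2))

def may_read(LR, channel):           # rule (1): L(o) ⪯ L_R
    return leq(channel, LR)

def may_write(LW, channel):          # rule (2): L_W ⪯ L(o')
    return leq(LW, channel)

def labels_consistent(LR, LS, LA, LW, LD=None):
    lhs = join(join(LR, LS), LA)
    if LD is not None:               # trusted principal: rule (4)
        lhs = minus(lhs, LD)
    return leq(lhs, LW)              # otherwise rule (3)
```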
Now that the general stream processing system and the MSA privacy model have been described, a method for protecting information such as personally identifiable information (PII) using the MSA control model in a general stream processing system will be described with reference to FIG. 1.
As shown in FIG. 1, a principal label is assigned to a processing component (105) and a first channel label is assigned to a first communication channel (110). The principal label is compared to the first channel label to determine if the processing component can read data attributes of the first channel label (115). If the principal label is equal to or has precedence over the first channel label, the data attributes of the first channel label are read by the processing component (120). For example, if a channel label contains the attribute "SSN" with risk level 0.8 and the attribute "Name" with risk level 0.6, and a principal's read label LR contains the attribute "SSN" with risk level 1.0 and the attribute "Name" with risk level 1.0, the principal is allowed read access to the channel.
A second channel label is assigned to a second communication channel that is output from the processing component (125). The principal label is compared to the second channel label to determine if data attributes of the principal label can be written to the second communication channel (130). If the second channel label is equal to or has precedence over the principal label, the data attributes of the principal label are written to the second communication channel (135).
A user label is assigned to a user that is connected to the second communication channel (140). The user label is compared to the second channel label to determine if the user can read data attributes of the second channel label (145). If the user label is equal to or has precedence over the second channel label, the data attributes of the second channel label are read by the user label (150).
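Using the may_read/may_write helpers from the preceding sketch (assumed in scope), the FIG. 1 sequence for the SSN/Name example above plays out as follows; the variable names are illustrative.

```python
chan_in = ({"SSN": 0.8, "Name": 0.6}, 1)      # label of the first channel
LR      = ({"SSN": 1.0, "Name": 1.0}, 1)      # principal's read label
assert may_read(LR, chan_in)                  # steps 115/120: read allowed

LW       = ({"SSN": 1.0, "Name": 1.0}, 1)     # principal's write label
chan_out = ({"SSN": 1.0, "Name": 1.0}, 1)     # label of the second channel
assert may_write(LW, chan_out)                # steps 130/135: write allowed

user = ({"Name": 1.0}, 1)                     # user's read label
assert not may_read(user, chan_out)           # step 145: denied, SSN present
```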
FIG. 2 is an example of controlling access to a stream of data according to an exemplary embodiment of the present invention. FIG. 2 shows data sources 210 a-b connected to processing components 220 a-c via channels (indicated by arrows) that have labels 240 a-e assigned thereto, and a user 230. The user 230 has a read label 230 a assigned thereto. In this example, user read access to the channel with label 240 e is denied (by applying the method of FIG. 1) since the read label 230 a does not dominate the channel label 240 e. A suppressor in the processing component 220 c can be used to reduce the risk level of attributes, and selection labels can add new attributes, shown in labels 240 d-e, indicating that a selection has been performed.
The following description will focus on how the MSA model can be used for proving policy compliance.
Privacy Policy Compliance Properties. Note that if channel labels correctly reflect privacy sensitivity of the channels, principal labels can provide a flexible way of describing the trust placed in the principals. Depending on the degree of trust, the sensitivity and integrity of input information can be limited as needed by specifying appropriate read labels LR. The constraints and read labels directly represent the access permissions provided by the policy. However, to prove that the entire system of principals and channels complies with the policy when the access control rules are satisfied, it must be shown that the channel labels correctly annotate channel content.
What follows is the definition of correctness.
Definition 5. A channel label is correct if it contains all the attributes contained in the channel, and the risk level corresponding to the attribute in the label is equal to or higher than a privacy risk measure of the channel with respect to the attribute. For example, each attribute contained in the channel is included in the label with an appropriate risk level, and selection operations performed using any attribute of the channel according to selection criteria are recorded in the label.
This definition implies that once a label includes the pair (att, 1), attribute att of the channel is correctly reflected by the label, independent of the actual channel content. There are two scenarios of how incorrect labels can occur. First, a channel can implicitly or explicitly contain an attribute att while there is no pair (att, ·) in the label of the channel. Second, the pair (att, r) appears in the label, but the measure of risk that att can be inferred based on the channel is higher than r.

In the absence of suppression and addition, e.g., when LD.A = LA.A = ∅, the attribute accumulation rule (3) ensures that all attributes contained in the channels read by the principal are preserved and appear in the label of every channel written by the principal. In this case, assuming that the input channel labels are correct, the labels of the channels written by the principal are also correct, since no new attributes are introduced by the principal and the output label dominates the input label. In the case when a principal adds a new attribute att to the channels it writes, the policy requires that for this principal (att, 1) ∈ LA.A. Therefore, rule (3) guarantees that (att, 1) ∈ L(o′).A, and hence att is correctly reflected in the label L(o′) of each channel written to by the principal.
To ensure that the label correctly reflects a selection performed on attribute att, the pair (att-selected, 1) must be included in LS.A. Rule (3) then guarantees that this selection operation is reflected by the labels of the channels written by the principal performing the selection. When suppression is performed by trusted principals, rule (4) makes sure that any attribute that is not suppressed is correctly reflected in the label L(o′) of each channel o′ written by the principal. For those attributes that are suppressed, it is assumed that the principal correctly reduces the risk level of the corresponding attribute, since the principal is trusted to perform a suppression operation on all channels that satisfy the input requirements of the principal and whose labels are dominated by the principal's read label.
Rule (3) describes label computation at the level of a single principal. In the following, however, it will be shown that, as a whole, the system of principals and channels in which the MSA access control rules are enforced achieves privacy policy compliance. Consider the following two cases. In the first case, there are no trusted principals performing suppression in the information flow graph. In the second case, there exists a principal that suppresses one or more attributes. For simplicity, a single attribute att will be the focus; the second case can trivially be extended to multiple attributes.
Before moving on, the following notation will be introduced.
Definition 6. A path P from principal p_1 to principal p_n is a sequence of principals (p_1, p_2, . . . , p_n) such that there exists a directed edge in the information flow graph from p_i to p_{i+1} for each i = 1, 2, . . . , n−1. The principals on the path P are referred to as predecessors of p_n on P.
In the case where there are no principals performing suppression in the information flow graph, the following proposition stands.
Proposition 1. Let L(o_p) be the label of a channel o_p written by principal p. Without suppression, L(o_q) ⪯ L(o_p) for any predecessor q of p, where o_q is a channel written by q.

Proposition 2. If L(o_p) is the label of a channel o_p written by a principal p such that att-selected ∉ Attr(L(o_p)), then no predecessor of p on any path to p performs selection based on attribute att.

Proposition (2) states that once a selection on att is performed, att-selected is always included in the label when the information is transmitted through the workflow graph.
When suppression is performed by one of the principals in the information flow graph, one has the following result.
Proposition 3. Once attribute att is selected, even when att is suppressed later, the channel label still includes att-selected, indicating that the stream has been selected based on att.
Thus, the MSA model can apply different access control rules in cases when the suppression of att is performed with or without prior selection of att. When att is pre-selected the channel should be considered more sensitive with respect to the attribute att.
Workflow Planning. When formulating the workflow planning problem, the workflow composition formalism of the Stream Processing Planning Language (SPPL), described in A. Riabov and Z. Liu. Planning for stream processing systems. In Proc. of AAAI-05, 2005, is used.
An example of a workflow-planning problem will now be described with reference to FIGS. 3A and 3B. Here, consider a marketing research department of a company that analyzes conversations within an Internet chatroom hosted by the company for the purpose of studying opinions about products. The analysis is performed by a stream processing system in real time. Privacy policy requires that no information entered by a child under 13 can be analyzed or stored, irrespective of purpose. Chatroom conversations may be stored and analyzed for research purposes, as long as the conversations are not associated with the names of the persons participating, and unless customers “opt-out”, e.g., request that their conversations be excluded.
In FIGS. 3A and 3B, the components are shown as rectangles 320 a-f and 320 a-b connected by streams (indicated by arrows), and for each stream, a label 340 a-g and 340 a-c with corresponding data types and privacy attributes is shown. The workflow in FIG. 3A is generated if the MSA privacy policy is enforced. In the absence of privacy constraints, intermediate steps between GetConversations 320 a and AnalyzeChat 320 f components of FIG. 3A will be automatically removed, since they would become unnecessary. FIG. 3B shows the workflow in the absence of privacy constraints.
Note that a component called ScreenNameToID 320 b was included in the workflow shown in FIG. 3A only because an ID type is required by subsequent suppressors. This type corresponds to a user identifier that is needed to access user properties, such as age and opt-out flag. In general, SPPL planners can create workflow directed acyclic graphs (DAGs) of higher complexity than shown in this example. A general approach will now be described.
SPPL. The SPPL model describes the components of a stream processing workflow as actions that can have preconditions and effects. Streams in this model are described by predicates that can correspond to the type, format, and security or privacy attributes of data passing through the stream. Primal streams must be described as completely as possible based on the available knowledge about the external data sources. The descriptions of the derived streams, e.g., the streams produced by a component, are computed based on the descriptions of the component and of the streams processed by the component. Hence, the description of the output stream can be computed recursively.
In the privacy protection framework, consider a simplification of the SPPL model in which predicates are restricted to ground formulas and only AND-logic for predicate propagation is used. In this model, each data stream is described by a set x=(x.t, x.s). The subset x.t describes the format and type of data carried by the stream, and the subset x.s represents the privacy label of the data, as will now be described.
Each component is represented by an SPPL action c ∈ Â, where Â is the set of all actions and c is defined by: 1) a precondition set p_j(c) for each input port j, 1 ≤ j ≤ J(c); 2) an add-effect set a_k(c) and a delete-effect set d_k(c), where a_k(c) ∩ d_k(c) = ∅, for each output port k, 1 ≤ k ≤ K(c); and 3) two real numbers for the resource cost r(c) and quality q(c) contributions of the component to the cost and quality of the workflow.

The initial state for workflow planning is defined as a set of primal streams, S_0 := {x_1^0, x_2^0, . . . , x_I^0}. An action c can be applied in a state S if there exist streams x_1, . . . , x_{J(c)} ∈ S such that p_j(c) ⊆ x_j for each input port j.

When action c is applied, a transition is made to a new state S′ defined as:

S′ := S ∪ ⋃_{k=1}^{K(c)} { ( a_k(c).t, [ a_k(c).s ∪ ⋂_{j=1}^{J(c)} x_j.s ] \ d_k(c).s ) }

In state S′, K(c) new streams are created. The new streams have the type and format defined by the corresponding output ports of c, and have a privacy-related set equal to the intersection of the input privacy-related sets, possibly modified by port-specific additions and deletions.
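A minimal, self-contained sketch of this simplified SPPL step, with streams encoded as (type-tags, privacy-predicates) pairs; the class names and the example component are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# A stream x = (x.t, x.s): type/format tags and privacy predicates.
Stream = Tuple[FrozenSet[str], FrozenSet[str]]

@dataclass
class Action:                        # one component c
    pre: List[FrozenSet[str]]        # p_j(c): required predicates per input port
    add: List[Tuple[FrozenSet[str], FrozenSet[str]]]   # (a_k(c).t, a_k(c).s)
    dele: List[FrozenSet[str]]       # d_k(c).s per output port

def applicable(c: Action, inputs: List[Stream]) -> bool:
    return (len(inputs) == len(c.pre) and
            all(p <= (x[0] | x[1]) for p, x in zip(c.pre, inputs)))

def apply_action(c: Action, inputs: List[Stream]) -> List[Stream]:
    # Output privacy set: intersection of input privacy sets, plus
    # add-effects, minus delete-effects (assumes at least one input).
    common = frozenset.intersection(*(x[1] for x in inputs))
    return [(t, (s | common) - d) for (t, s), d in zip(c.add, c.dele)]

s1: Stream = (frozenset({"Chat"}), frozenset({"no_att(name,1.0)"}))
c = Action(pre=[frozenset({"Chat"})],
           add=[(frozenset({"ChatStats"}), frozenset())],
           dele=[frozenset()])
assert applicable(c, [s1])
print(apply_action(c, [s1]))   # privacy predicates propagate to the output
```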
Given a goal set g, the planning problem P(S_0, Â, B, g) is to find a set of actions that leads to a state S in which there exists a stream x ∈ S such that g ⊆ x, such that the sum of the quality values of the actions is maximized, and such that the sum of the resource costs is less than a cost budget B.
Privacy Planning Domain. A basic procedure that can be used to generate action descriptions for components taking into account the labels LR, LS, LA and LD will now be described. A similar approach is used to generate goal and primal stream descriptions based on the write labels of the sources and the read label of the user. This procedure will ensure that the workflows generated by an SPPL planner will comply with the MSA policy.
Universal Attribute Set. A universal set U of all privacy predicates is defined as follows. The set is defined using the values of risk and integrity that appear in the labels used in describing components, primal streams or users. Since there is a limited number of these labels, the size of U is polynomial in the number of components and privacy attributes. For each value of integrity level t, set U includes the element int_atleast(t). For each attribute-risk (att, r) pair used in the labels, set U includes the element no_att(att, r).
Initial State. The primal streams have write labels, which are represented as initial state vectors. Given LW, the corresponding set x.s describing one of the initial state streams is:
x.s := {int_atleast(t′) ∈ U | t′ ≤ LW.t} ∪ {no_att(att, r) ∈ U | ∃(att, r′) ∈ LW.A, r′ < r} ∪ {no_att(att, r) ∈ U | att ∉ Attr(LW)}
Preconditions and Goals. The read labels assigned to components are translated into preconditions, and the read label of the user is translated into the goal using the following formula:
p.s := {int_atleast(LR.t)} ∪ {no_att(att, r) ∈ U | ∃(att, r′) ∈ LR.A, r′ < r}
Effects. The addition and the selection labels LA and LS are translated into delete-effects, while the suppression labels LD are translated into add-effects:
d.s := {int_atleast(t) ∈ U | t > min{LA.t, LS.t}} ∪ {no_att(att, r) ∈ U | ∃(att, r′) ∈ LA.A ∪ LS.A, r′ ≥ r};

a.s := {int_atleast(t) ∈ U | t ≤ LD.t} ∪ {no_att(att, r) ∈ U | ∃(att, r′) ∈ LD.A, r′ < r}.
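These translations can be sketched as follows, under the assumption (consistent with the definition of U above) that U is enumerated by exactly the risk values and integrity levels occurring in the labels; the textual predicate encoding and function names are illustrative, not the patent's representation.

```python
# Labels are (attrs, t) pairs; risks maps each attribute to the set of risk
# values appearing anywhere in the labels; t_levels lists integrity values.

def initial_state(LW, risks, t_levels):
    A, t = LW
    s = {f"int_atleast({u})" for u in t_levels if u <= t}
    # An attribute absent from LW has implicit risk 0.
    s |= {f"no_att({a},{r})" for a in risks for r in risks[a]
          if A.get(a, 0.0) < r}
    return s

def precondition(LR, risks):
    A, t = LR
    s = {f"int_atleast({t})"}
    s |= {f"no_att({a},{r})" for a, r0 in A.items()
          for r in risks[a] if r0 < r}
    return s

def delete_effects(LA, LS, risks, t_levels):
    both = dict(LA[0])
    for a, r in LS[0].items():
        both[a] = max(both.get(a, 0.0), r)
    t = min(LA[1], LS[1])
    return ({f"int_atleast({u})" for u in t_levels if u > t} |
            {f"no_att({a},{r})" for a, r0 in both.items()
             for r in risks.get(a, ()) if r0 >= r})

def add_effects(LD, risks, t_levels):
    return ({f"int_atleast({u})" for u in t_levels if u <= LD[1]} |
            {f"no_att({a},{r})" for a, r0 in LD[0].items()
             for r in risks.get(a, ()) if r0 < r})

risks = {"SSN": {0.2, 0.8, 1.0}}
print(initial_state(({"SSN": 0.8}, 2), risks, {1, 2, 3}))
# -> int_atleast(1), int_atleast(2), no_att(SSN,1.0)
```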
It is straightforward to show that if this procedure is followed, any solution to the planning problem P(S_0, Â, B, g) satisfies the constraints of the MSA policy.
Efficiency of Planning Algorithms. In terms of efficiency, the SPPL planning algorithm described in A. Riabov and Z. Liu. Planning for stream processing systems. In Proc. of AAAI-05, 2005, shows excellent performance on benchmark problems, composing large workflows of 500 components in 60 seconds. In the worst case, however, the planning problem P(S_0, Â, B, g) is an extension of the propositional STRIPS planning formulation, which is PSPACE-complete. It is desirable, therefore, to identify special classes of planning problems that are tractable in practice.
It will now be shown that certain simplifying assumptions guarantee that the workflow planning problem is always tractable. First, assume that there are no resource cost budget constraints and no quality optimization objectives. In practice, these constraints can often be handled separately by alternative methods. Second, assume that action preconditions can only contain non-propagating predicates. Under these assumptions, the problem is in NP.
Proposition 4. The planning problem P(S_0, Â, g) without resource and quality constraints is NP-complete.
Although this problem is NP-complete if the use of suppressors is allowed, if the suppression labels LD are ordered the problem can be solved in polynomial time.
Proposition 5. If all non-empty suppression labels LD are ordered, e.g., LD^1 ⪯ LD^2 ⪯ . . . ⪯ LD^d, the planning problem P(S_0, Â, g) without resource and quality constraints can be solved in O(|Â|²d) operations.
From this result, polynomial algorithms can be derived for other cases. One example is the case where labels are ordered, that is, if labels are one-dimensional, e.g., contain a single attribute (possibly with different risk levels) and have equal integrity levels. Similarly, in systems where there are no suppressors, the same algorithm solves the planning problem in O(|Â|²) operations. When the assumption of Proposition 5 does not hold, a general search method, such as branch-and-bound, can be employed for solving the problem. In this case, the polynomial algorithm of Proposition 5 can be used to compute bounds and speed up the search.
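The patent does not spell the polynomial algorithm out, but the flavor of the suppressor-free case can be illustrated by a fixed-point saturation: because labels only accumulate, repeatedly applying any applicable action reaches a fixed point after at most |Â| rounds of |Â| applicability tests. The single-input sketch below is a deliberate simplification for illustration, not the algorithm of Proposition 5.

```python
# Streams and the goal are frozensets of predicates; each action is a
# (precondition, output-stream) pair.

def saturate(actions, initial_streams, goal):
    streams = set(initial_streams)
    changed = True
    while changed:
        if any(goal <= x for x in streams):
            return True
        changed = False
        for pre, out in actions:
            if out not in streams and any(pre <= x for x in streams):
                streams.add(out)
                changed = True
    return any(goal <= x for x in streams)

acts = [(frozenset({"raw"}), frozenset({"raw", "parsed"})),
        (frozenset({"parsed"}), frozenset({"parsed", "anonymized"}))]
assert saturate(acts, [frozenset({"raw"})], frozenset({"anonymized"}))
```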
MLS-based Implementation. Practical issues of implementing MSA-compliant systems and the general system architecture and procedures that are required for policy compliance enforcement will now be described.
An example architecture of an MSA-compliant system is shown in FIG. 4. In this architecture, the requirement for automatic workflow verification is addressed by including a workflow deployment interface 410 that is coupled to a privacy metadata database 405 and that performs the necessary analysis and verification of workflows 415 a . . . x before deployment. In other words, the workflow interface 410 ensures that PII is protected. The privacy architecture also extends a system runtime environment 430 by inserting a number of trusted enforcement components 460 a . . . x, which are used to ensure that the description of the workflow analyzed before deployment correctly reflects the information flow in runtime.
The system runtime 430 is a distributed system, which consists of one or more computers 450 a . . . x, a data interface 435 and a results interface 440 connected via a network 470. Sources 420 are connected to the data interface 435 and sinks 425 are connected to the results interface 440. Each of the computers 450 a . . . x runs one or more operating systems and may have trusted I/Os 455 a . . . x. Local operating systems can host components, either directly, or using compositional framework software. Workflows 415 a . . . x are submitted to a global scheduler 445 for deployment, and the scheduler 445 allocates components to operating systems and manages allocation of resources in runtime.
Before a workflow 415 a . . . x is sent to the scheduler 445 for deployment, it is verified for policy compliance using the method of FIG. 1 and/or variations of the method of FIG. 1, as will be described hereinafter with reference to FIG. 5. If a privacy violation is detected, the workflow graph is rejected; otherwise, the workflow 415 a . . . x is sent to the scheduler 445, which will deploy it, provided that there are sufficient resources available. During verification of the workflow 415 a . . . x, the system may have a choice of labels assigned to the components 460 a . . . x. This is because the least sensitive information can be labeled as most sensitive without incurring a violation of the policy; however, this also means that access to that information will be restricted. Therefore, minimal write labels are assigned, so that read labels can also remain low while satisfying the inequalities of rules (1), (2), (3) and (4).
In runtime, the privacy enforcement architecture must ensure that no communication between the components takes place except those specified in the workflow graph. Further, it must verify correctness of the write labels assigned to the nodes of the workflow 415 a . . . x, especially when components 460 a . . . x are not trusted. This can be achieved by selectively logging and auditing the dataflow between the components 460 a . . . x, since full analysis of all data can be infeasible because of significant resource requirements. Finally, each workflow 415 a . . . x must be logged and later audited to manage the risk of users obtaining privacy-sensitive information through submitting a combination of multiple workflows.
In sum, the privacy architecture provides functionality spanning the entire lifetime of the workflow as follows: 1) workflow verification before deployment; 2) runtime privacy policy enforcement during execution; and 3) auditing of data, components and workflows during and after execution.
Details of modules comprising the privacy architecture will now be described, starting with metadata for specifying policy and policy parameters, followed by a detailed description of verification, runtime enforcement and auditing.
Privacy Policy Metadata. Descriptions of attributes, components, sources and sinks are direct reflections of the privacy policy. Changing any of these descriptions is equivalent to changing the policy, and therefore the metadata must be protected by access control measures.
Role, Purpose and Read Labels. Access rules based on information categories have been extensively studied in existing security models, and the read label together with rule (1) follows these approaches by specifying the principal label that must dominate the label of all accessible channels.
However, in practice, privacy policy rules are commonly specified as combinations of information categories and allowed purposes of access. Therefore, in the implementation of the MSA-compliant system the rules can be specified in a more natural format. The work on role-based access control (RBAC) models, such as those described in D. Ferraiolo and D. Kuhn. Role-based access controls. In 15th NIST-NSA National Computer Security Conference, Baltimore, Md., October 1992, and in R. Sandhu and E. Coyne. Role-based access control models. IEEE Computer, 29(2), February 1996, has shown that RBAC can reduce the complexity and cost of security administration.
RBAC fits naturally in the privacy framework, and a similar approach can be used to specify privacy policy rules instead of specifying read labels for every node.
Combining principals' roles and purposes, the privacy policy defines a mapping F that determines the read label LR(p) for each principal p based on the role and current purpose of the principal, e.g., LR(p) = F(role(p), purpose(p)). In practice, the mapping F can be implemented as a table look-up. Assuming the privacy policy metadata contains this table, specifying a role and purpose for a principal is equivalent to specifying a read label.
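A table look-up realization of F might look as follows; the roles, purposes and labels are invented for illustration, and labels are (attrs, t) pairs as in the earlier sketches.

```python
READ_LABELS = {
    ("analyst", "marketing-research"): ({"chat": 1.0, "age-selected": 1.0}, 1),
    ("auditor", "compliance-review"):  ({"chat": 1.0, "name": 1.0}, 1),
}

def read_label(role, purpose):
    # F(role(p), purpose(p)) -> L_R(p); no table entry means no access.
    try:
        return READ_LABELS[(role, purpose)]
    except KeyError:
        raise PermissionError(f"no access granted for {role}/{purpose}")

print(read_label("analyst", "marketing-research"))
```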
Sources. Metadata for each of the data sources contains a single write label LW that describes the set of attributes and integrity of the information arriving from this source.
The set of attributes LW.A should contain all attributes that may appear in the data stream supplied by the source. Risk levels for the attributes should be chosen conservatively. A higher risk level assigned to an attribute restricts access to the information derived from the source. Restricted access may be alleviated by stronger suppression, if such suppressor components are available. In this case, specifying a higher risk level is equivalent to requiring valid workflows to apply stronger suppression.
Integrity of the source LW.t can be used to control suppression. Low integrity sources may supply data in which it is hard to suppress private information, e.g., because steganography methods are used to hide PII in usually non-sensitive data. To prevent unsuccessful suppression, read labels assigned to suppressors should specify a minimum integrity level necessary for suppression.
Sinks and Users. Sink nodes correspond to outputs of the system, and typically represent flows that are presented to end users of the system. For each of the sink nodes, the read labels are used to describe access rights of the recipients of these flows. As discussed above, the user role and access purpose can be specified instead.
Purpose can be uniquely determined by the user role, or the user can be given a choice of purposes, if the policy allows that. If users can choose among multiple purposes, it is hard to detect the situation in which data is requested for one purpose, but used for another. However, auditors may detect a violation even in this case based on the processing history of submitted workflows.
Instantiated Components. Instantiated components are components that are already deployed when the workflow is submitted, and for which labels cannot be changed. Workflows can make use of components already existing in the system, applying different processing to the output of already deployed workflows to derive new results. It is assumed that instantiated components, as principals, comply with the MSA model, and that rule (4), which defines the relationship between principal labels, holds for each instantiated component. For these instantiated components, the privacy metadata must contain read and write labels.
Deployment of the workflow may involve not only establishing connections between already deployed instantiated components, but also require automated deployment of new component instances. Each of these newly deployed components is an instance of a corresponding component class. Metadata describing a component class includes the location of executable code that must be deployed during component instantiation, as well as various configuration parameters and resource dependencies.
Component class metadata must also include a description of component functionality and access rights from a privacy perspective. As other types of privacy metadata, privacy-related sections of component class description must be protected from unauthorized modification. While a component class itself does not process any data and does not act as a principal in the information flow, component instances become principals, and therefore, must be assigned a set of principal labels. Component class metadata defines template labels that are used to compute principal labels for component instances.
Template labels defined in component class metadata are (LR, LA, LS, LD), where LR is the template read label, LA is the template addition label, LS is the template selection label, and LD is the template suppression label. As discussed above, read labels LR can be specified indirectly by assigning roles and purposes to component classes.
Principal labels for new component instances are computed before deployment of the workflow. Principal labels (L̂R, L̂A, L̂S, L̂D, L̂W) are based on the labels of component inputs in the workflow and are derived from template labels using the procedure described below. Principal labels must satisfy the following conditions:

L̂R ⊑ LR
L̂A = LA
L̂S = LS ↓ L̂R
L̂D = LD
L̂W ⊒ L̂R ⊔ L̂S ⊔ L̂A − L̂D.  (5)
The above notation L̂S = LS ↓ L̂R means that the selection label L̂S is adjusted to avoid conflicts with the chosen value of the read label L̂R. This operation removes from LS the selection of attributes that cannot be contained in the input because of the read access rule (1). More precisely, L̂S = LS ↓ L̂R holds if and only if the following holds:

L̂S.A = {(att-selected, r) ∈ LS.A | att ∈ Attr(L̂R)}
L̂S.t = LS.t.

In this case, L̂S ⊑ LS.
The formulas in (5) imply that the description of a component class defines most of the principal labels for component instances, and establishes an upper bound, in the partial order, on the read label. This bound specifies the maximum sensitivity of the information that the component can access. It is important to specify this label correctly for trusted components performing suppression, since successful suppression requires that the potential content of the data subjected to suppression be known. The suppressors can only be trusted to work with data of limited sensitivity and limited attribute range, and only if this data is at or above a given minimum integrity level. The template read label LR allows all of these requirements to be specified. The write label L̂W is chosen to satisfy rule (4), which is repeated in the last inequality of (5).
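To make the label algebra in (5) concrete, the following Python sketch models an MSA label as a set A of (attribute, risk) pairs together with an integrity level t, and implements dominance (⊑), join (⊔), suppression subtraction (−), and the restriction operation (↓) under those assumptions. The Label class, its field and method names, and the numeric treatment of risk and integrity are illustrative choices for exposition, not the patent's required encoding.

```python
# Illustrative sketch only: an assumed encoding of MSA labels and the
# instantiation rules (5).  Field and method names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    A: frozenset  # set of (attribute, risk) pairs
    t: int        # integrity level; higher t is assumed to mean higher integrity

    def risk(self, att):
        """Highest risk recorded for att, or 0 if att is absent."""
        return max((r for a, r in self.A if a == att), default=0)

    def attrs(self):
        return {a for a, _ in self.A}

    def dominated_by(self, other):
        """L ⊑ L': every attribute risk is covered, integrity not overstated."""
        return (all(self.risk(a) <= other.risk(a) for a in self.attrs())
                and self.t >= other.t)

    def join(self, other):
        """L ⊔ L': per-attribute maximum of risks, minimum of integrity."""
        atts = self.attrs() | other.attrs()
        return Label(frozenset((a, max(self.risk(a), other.risk(a))) for a in atts),
                     min(self.t, other.t))

    def minus(self, other):
        """L − L_D: drop attribute risks covered by the suppression label
        (a simplification; the patent allows varying degrees of suppression)."""
        return Label(frozenset((a, r) for a, r in self.A if r > other.risk(a)),
                     self.t)

    def restrict(self, read):
        """L_S ↓ L̂_R: keep (att-selected, r) only when att ∈ Attr(L̂_R)."""
        return Label(frozenset((a, r) for a, r in self.A
                               if a.removesuffix("-selected") in read.attrs()),
                     self.t)

def instantiate(L_R_hat, L_R, L_A, L_S, L_D):
    """Derive principal labels per (5) from template labels, given a chosen
    read label L̂_R.  Returns (L̂_R, L̂_A, L̂_S, L̂_D, L̂_W)."""
    assert L_R_hat.dominated_by(L_R), "violates L̂_R ⊑ L_R"
    S_hat = L_S.restrict(L_R_hat)                      # L̂_S = L_S ↓ L̂_R
    W_hat = L_R_hat.join(S_hat).join(L_A).minus(L_D)   # smallest L̂_W allowed by (5)
    return L_R_hat, L_A, S_hat, L_D, W_hat
```

Under this encoding, choosing the smallest write label permitted by (5) maximizes the audience that may legally read the component's output, which is exactly the flexibility discussed next.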
The ability of this model to adapt to different read labels allows greater flexibility in workflow configuration. Equations (5) generally allow L̂W to be smaller for smaller values of L̂R. From (5) it follows that if L̂R ⊑ LR, then L̂R ⊔ L̂S ⊔ L̂A − L̂D ⊑ LR ⊔ LS ⊔ LA − LD, and if L̂W = L̂R ⊔ L̂S ⊔ L̂A − L̂D is chosen, then by rule (4), L̂W = L̂R ⊔ L̂S ⊔ L̂A − L̂D ⊑ LR ⊔ LS ⊔ LA − LD ⊑ LW, and thus

L̂R ⊑ LR ⟹ ∃L̂W: L̂W ⊑ LW.

Hence, components working with less sensitive information than the maximum LR allowed by the component design can produce output that is marked as less sensitive, and is therefore accessible to a broader audience in compliance with the MSA policy.
Workflow Verification. To guarantee that the privacy policy is enforced, verification of workflows for policy compliance must take place before the workflows are deployed. A verification procedure is applied to the workflow in two steps, as shown, for example, in FIG. 5. As shown in FIG. 5, first, read and write labels satisfying rule (4) of the MSA model are assigned to the components that are not yet deployed and do not have these labels specified directly. The metadata necessary for assigning read and write labels, when these labels are not specified directly, is stored as part of a component class description in a global directory of component classes. At the same time, channel labels are assigned to the edges, such that the channel label of an edge is equal to the write label of the tail node of that edge. Hence, after channel labels have been assigned, the workflow automatically satisfies rule (2).
Once read and write labels satisfying rule (4) have been assigned to every principal in the workflow, and channel labels satisfying rule (2) have been assigned to every edge, the second step of the verification procedure starts. At this step, rule (1) is verified for every principal p and every channel o supplied to p. This condition requires that the read labels of principals dominate channel labels of the edges carrying flow into the principals. If this condition is violated, the workflow is rejected. Otherwise, the workflow with the computed labels complies with all rules of the MSA model, and can be deployed.
A more detailed description of the label assignment algorithm of the workflow verification method will now be given. This algorithm is performed only for verification of workflows that require deployment of new components, and the information flow graph is required to be acyclic.
As discussed above, for each of the sources, the metadata specifies the values of the write labels. As shown in FIG. 5, the algorithm proceeds recursively, starting from the sources. At each iteration of the algorithm, channel labels are first assigned to all edges leaving the nodes with assigned write labels. If this assignment causes a conflict with rule (1), the workflow is rejected due to a policy violation. Then, the loop at step 7 of the algorithm assigns read labels to all new component instances that do not yet have read labels and whose input edges have all been assigned channel labels.
The read label assigned to the component instance should allow as little access to sensitive information as possible, in keeping with the principle of least privilege. For example, the read label is chosen to be just enough to read the channels received by the component instance. In other words, the read label L̂R(p) satisfies

L̂R(p) = ⊔i L(oi),

where L(oi) (i = 1, . . . , n) are the labels of the channels to be read by the component. Privacy metadata requires that the read label specified in the metadata describing component class P of component p dominate the read label assigned to p. Furthermore, L̂R(p) must satisfy the constraints C(P) specified in the description of the component class. If either of the two conditions is violated, the workflow must be rejected, as shown in step 10.
The selection label specified in the component class metadata may enumerate a large set of attributes, exceeding the set present in the read label of a component instance. The selection label L̂S(p) assigned to the component instance in step 11 depends on the attributes contained in the channels read by the component instance. More specifically, L̂S(p) should include the attribute att-selected only if att is contained in the read label L̂R(p). For example, if L(o) = ⟨{(Name, 0.6), (Salary, 1)}, t⟩ and LS(P) = ⟨{(Name-selected, 1), (SSN-selected, 1)}, t⟩, the algorithm assigns L̂S(p) = ⟨{(Name-selected, 1)}, t⟩.
After the principal labels are assigned to the component instance p of class P, the write label L̂W(p) is determined as L̂W(p) = L̂R(p) ⊔ L̂S(p) ⊔ L̂A(p) − L̂D(p), in accordance with (5). Iterations are repeated until all workflow edges have been assigned channel labels.
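As a summary of the assignment loop just described, the sketch below (reusing the hypothetical Label helpers from the earlier sketch) walks an acyclic workflow in topological order, propagates channel labels per rule (2), enforces rule (1), and instantiates labels per (5). The node and edge attribute names are assumptions for illustration; FIG. 5 governs the actual step numbering.

```python
# Hypothetical sketch of the FIG. 5 label-assignment algorithm.
# Assumes: nodes are in topological order; sources and instantiated
# components arrive with fixed labels; new instances carry template labels.
from functools import reduce

def verify_and_label(nodes, edges):
    channel = {}                                        # edge -> channel label
    for p in nodes:
        inputs = [e for e in edges if e.head is p]
        if p.read_label is not None:
            # Pre-labeled principal (source, sink, user, or instantiated
            # component): check rule (1) against incoming channel labels.
            if not all(channel[e].dominated_by(p.read_label) for e in inputs):
                return None                             # reject: policy violation
        elif inputs:
            # New component instance: least privilege, L̂_R(p) = ⊔_i L(o_i).
            p.read_label = reduce(Label.join, (channel[e] for e in inputs))
            if not p.read_label.dominated_by(p.template_read):
                return None                             # reject (step 10)
            p.sel_label = p.template_sel.restrict(p.read_label)   # step 11
            p.write_label = (p.read_label.join(p.sel_label)
                             .join(p.template_add).minus(p.template_sup))
        # Rule (2) holds by construction: each outgoing edge gets the
        # writer's write label as its channel label.
        for e in edges:
            if e.tail is p:
                channel[e] = p.write_label
    return channel                                      # feasible assignment
```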
The algorithm of FIG. 5 terminates after a finite number of iterations, which does not exceed the number of nodes in the workflow. It is also straightforward to show that if a feasible label assignment exists, the algorithm will find one and the workflow will be accepted. A label assignment is feasible if it satisfies rules (1), (2), (4) and the component instantiation rules (5).
Runtime Enforcement and MLS. The verification method described above helps prevent the deployment of privacy-violating workflows. However, at runtime, continuous privacy policy enforcement is required to ensure that information flows stay within the boundaries defined by the approved workflow specifications that were initially submitted for deployment.
In particular, the system runtime must ensure that there are no information flows except those permitted by rules (1) and (2). It is easy to see that these rules can be satisfied if and only if, for every principal p′ receiving information from any principal p, the following condition on the corresponding read and write labels is satisfied:

LW(p) ⊑ LR(p′).  (6)
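Condition (6) reduces runtime enforcement to one dominance check per connection. A minimal sketch, assuming the hypothetical Label helpers above and principal objects carrying read_label and write_label fields:

```python
def connection_allowed(p, p_prime):
    """Runtime gate for condition (6): allow a flow from principal p to
    principal p' only if L_W(p) ⊑ L_R(p')."""
    return p.write_label.dominated_by(p_prime.read_label)
```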
Access control mechanisms of existing multi-level secure (MLS) systems, such as IBM zSeries (IBM Corporation, http://www.ibm.com/servers/eserver/zseries/security/mls.html), can be leveraged to enforce the MSA model constraints. This can be achieved by creating an order-preserving one-to-one mapping M(·) between MSA labels L and MLS labels l, such that for all MSA labels L ⊑ L′ it follows that M(L) ≼ M(L′) in the partial order ≼ of MLS, and for all MLS labels l ≼ l′ it follows that M⁻¹(l) ⊑ M⁻¹(l′). With this mapping, the MLS enforcement mechanisms will automatically enforce the MSA condition (6) by enforcing a similar constraint of the Bell-LaPadula policy on the corresponding MLS labels.
An MLS label l = (s, C) is a combination of a secrecy level s and a category set C. The secrecy level is chosen from one of several discrete values, such as unclassified or secret. The category set contains the categories relevant to the data described by the label. The relation ≼ is defined as follows: (s, C) ≼ (s′, C′) if and only if s ≤ s′ and C ⊆ C′.
Assume that over all MSA labels at most n different risk levels are used for each attribute. Since the number of labels is finite, this is a reasonable assumption. For each attribute att, define MLS categories att-1, att-2, . . . , att-n. Similarly, for m integrity levels define MLS categories int-1, int-2, . . . , int-m. The mapping M is defined as follows: (s, C) = M(L), where s is constant and

C := C(L) = { att-i | (att, r) ∈ L.A, 1 ≤ i ≤ μ(r) } ∪ { int-i | L.t ≤ i ≤ m },

where μ(r) is the position of risk value r in the ascending order of the n risk values. This mapping has all the required properties.
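Written out under the same assumed label encoding, the mapping M and the MLS dominance test look as follows; the function names, the fixed secrecy level, and the risk-value list are illustrative assumptions.

```python
def to_mls(label, risk_values, m, s="system-high"):
    """Map an MSA label L to an MLS label (s, C): categories att-1..att-μ(r)
    for each (att, r) in L.A, plus int-i for i = L.t .. m; s is constant."""
    mu = {r: i + 1 for i, r in enumerate(risk_values)}  # μ(r): 1-based position
    cats = set()
    for att, r in label.A:
        cats.update(f"{att}-{i}" for i in range(1, mu[r] + 1))
    cats.update(f"int-{i}" for i in range(label.t, m + 1))
    return (s, frozenset(cats))

def mls_leq(l1, l2):
    """(s, C) ≼ (s', C') iff s ≤ s' and C ⊆ C'; with a constant secrecy
    level the secrecy comparison collapses to equality."""
    (s1, c1), (s2, c2) = l1, l2
    return s1 == s2 and c1 <= c2
```

Because larger MSA risks and lower MSA integrity both enlarge the category set C, the mapping is order preserving, so the Bell-LaPadula check on (s, C) enforces condition (6) automatically.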
If privacy suppressors must be supported, the MLS implementation must support trusted downgraders, which are allowed to violate the “no write-down” rule of the Bell-LaPadula policy. It should also be noted that the mapping between MSA and MLS can be modified such that existing MLS labels on the channels can be used jointly with MSA labels, enforcing regular MLS data confidentiality protection in addition to MSA privacy protection constraints.
Auditing of MSA Metadata Correctness. The MSA model makes final access control decisions based on the label assigned to the flow leaving the system. These labels, and therefore these decisions, can only be trusted if the metadata assigned to components and sources exactly or with conservative overprotection represents the functionality of the components and the contents of the sources. Thus, it is crucial for successful deployment of an MSA-based system to establish procedures for verifying the correctness of the metadata, e.g., by performing code reviews or selective logging and auditing of data flows.
The following aspects of MSA metadata have been identified as requiring attention during auditing.
Confidentiality violations. A write label assigned to a source does not contain all sensitive attributes contained in the output of the source, or contains these attributes with levels lower than the observed levels. Sensitive information added by a component to its output is not reflected in the addition label of the component. Sensitive attribute suppression declared in the suppression label of a component is not performed correctly.
Anonymity violations. Selection on sensitive attributes performed by a component is not represented in the selection label of the component.
Integrity violations. An integrity level assigned to the write label of a component is above the true integrity level. A write label assigned to a source specifies an integrity higher than the true integrity of the data.
Note that in this description there is no distinction between components and component classes, assuming that component class metadata must be valid for all possible component instantiations.
Attribute Combination Constraints. In many practical applications rule (1) needs to be extended to take into account combinations of attributes. Recall that the original rule (1) requires that the read label of a component dominates the object label of the channel:
L(o) ⊑ LR.
This condition is then extended with a principal-specific set of attribute combinations that are not allowed to appear in any channel connected to the principal. For example, the combination of the SSN and FullName attributes can be prohibited, while SSN or FullName alone is allowed. The set of combinations is stored as part of the component or component class metadata, together with the read label or a read template label. A sketch of this extended check follows.
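A minimal sketch of the extended check, again assuming the hypothetical Label helpers; the forbidden-combination set is supplied per principal from its (class) metadata:

```python
def extended_rule_1(channel_label, read_label, forbidden):
    """Rule (1) plus attribute-combination constraints: the channel label
    must be dominated by the read label, and the channel must not carry
    any prohibited combination of attributes."""
    if not channel_label.dominated_by(read_label):
        return False
    atts = channel_label.attrs()
    return not any(combo <= atts for combo in forbidden)

# Example: SSN and FullName may not appear together, though each is
# allowed alone.
# extended_rule_1(L_o, L_R, [frozenset({"SSN", "FullName"})])
```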
In accordance with an exemplary embodiment of the present invention, the privacy protection framework based on the MSA access control policy supports a general composition architecture where data streams are processed by workflows formed of stream processing components connected by communication channels. In the framework, any component may produce new streams of data that become available for analysis by other components. The framework can easily be enforced in existing MLS systems that implement the Bell-LaPadula policy. This together with an RBAC approach makes configuration and administration of the framework simple and practical. For example, while additional tools are needed to visualize and modify MSA labels at the level of system administrators and users, the underlying enforcement mechanisms of MLS systems do not require any modification.
In addition, the framework supports suppression algorithms with varying degrees of suppression. Further, the MSA policy can prevent disclosure of results derived from selected data when the selection criteria themselves are sensitive, using a mechanism similar to those used in the RBAC approach.
It should be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In one embodiment, the present invention may be implemented in software as an application program tangibly embodied on a program storage device (e.g., magnetic floppy disk, RAM, CD ROM, DVD, ROM, and flash memory). The application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
It is to be further understood that because some of the constituent system components and method steps depicted in the accompanying figures may be implemented in software, the actual connections between the system components (or the process steps) may differ depending on the manner in which the present invention is programmed. Given the teachings of the present invention provided herein, one of ordinary skill in the art will be able to contemplate these and similar implementations or configurations of the present invention.
It should also be understood that the above description is only representative of illustrative embodiments. For the convenience of the reader, the above description has focused on a representative sample of possible embodiments, a sample that is illustrative of the principles of the invention. The description has not attempted to exhaustively enumerate all possible variations. That alternative embodiments may not have been presented for a specific portion of the invention, or that further undescribed alternatives may be available for a portion, is not to be considered a disclaimer of those alternate embodiments. Other applications and embodiments can be implemented without departing from the spirit and scope of the present invention.
It is therefore intended that the invention not be limited to the specifically described embodiments, because numerous permutations and combinations of the above, and implementations involving non-inventive substitutions for the above, can be created; rather, the invention is to be defined in accordance with the claims that follow. It can be appreciated that many of those undescribed embodiments are within the literal scope of the following claims, and that others are equivalent.

Claims (18)

1. A method for verifying privacy policy compliance of a workflow in a distributed stream processing system, comprising:
assigning, by using a processor, a read label to a user, wherein the read label of the user includes a risk level for data readable by the user;
assigning, by using the processor, read and write labels to processing components, wherein a processing component receives data and includes a program that transforms the data to produce new data as output, the read label of a processing component includes a risk level for data readable by the processing component and the write label of a processing component includes a risk level of data written from the processing component;
assigning, by using the processor, channel labels to channels, wherein channels connect the processing components to other processing components, external data sources or the user and the channel labels identify a risk level of data readable by the channel;
generating, by using the processor, a workflow, wherein the workflow includes first and second processing components, first and second channels and the user, wherein the first and second processing components are connected to each other by the first channel and the second channel connects an output of the second processing component to the user, wherein the first processing component, the first channel, the second processing component, the second channel and the user are sequentially connected;
determining, by using the processor, if the workflow satisfies the following numbered conditions: (1) the read label of the second processing component dominates the channel label of the first channel, (2) the write label of the first processing component dominates the channel label of the first channel, (3) the write label of the first processing component dominates the read label of the first processing component, (4) the write label of the second processing component dominates the channel label of the second channel, (5) the write label of the second processing component dominates the read label of the second processing component and (6) the read label of the user dominates the channel labels of the first and second channels, wherein a label dominates another label when its risk level is greater than that of the another label and the label does not dominate the another label when its risk level is less than that of the another label;
verifying, by using the processor, that the workflow is in compliance with a privacy policy of the distributed stream processing system when the conditions 1-6 are affirmatively satisfied; and
accepting, by using the processor, the workflow such that the output of the second processing component is readable by the user.
2. The method of claim 1, wherein at least one of the channel labels of the first and second channels is pre-assigned.
3. The method of claim 1, wherein at least one set of the read and write labels of the first and second processing components is pre-assigned.
4. The method of claim 1, further comprising:
assigning, by using the processor, a maximum read label to one of the first and second processing components,
wherein the workflow is accepted if the read label of the processing component having the maximum read label is dominated by the maximum read label.
5. The method of claim 1, further comprising:
assigning, by using the processor, an addition label to one of the first and second processing components, the addition label includes data attributes added to data output by the processing component having the addition label,
wherein the workflow is accepted if the addition label is dominated by the write label of the processing component having the addition label.
6. The method of claim 1, further comprising:
assigning, by using the processor, a selection label to one of the first and second processing components, the selection label storing selection operations performed by the component having the selection label,
wherein the workflow is accepted if the selection label is dominated by the write label of the processing component having the selection label.
7. The method of claim 1, further comprising:
assigning, by using the processor, a suppression label to one of the first and second processing components, the suppression label describing data attributes that are to be suppressed by the processing component having the suppression label, wherein when the suppression label is assigned, the write label of the processing component to which it is assigned dominates a minimum value of the suppression label and the read label of that processing component.
8. The method of claim 1, further comprising:
generating, by using the processor, a planning task representation; and
reading, by using the processor, the planning task representation using a planning algorithm,
wherein the read label is assigned to the user, the channel labels are assigned to the channels and the read and write labels are assigned to the processing components using the planning algorithm, and
wherein the workflow is generated by translating an output of the planning algorithm.
9. The method of claim 1, further comprising:
deploying, by using the processor, the workflow when it is accepted.
10. The method of claim 9, further comprising:
translating, by using the processor, the read and write labels of the first and second processing components into multi-level secure (MLS) system compatible labels, wherein the MLS system enforces a Bell-LaPadula privacy policy and the workflow is deployed in the MLS system.
11. A computer program product comprising a non-transitory computer useable medium having computer program logic recorded thereon for verifying privacy policy compliance of a workflow in a distributed stream processing system, the computer program logic comprising:
program code for assigning a read label to a user, wherein the read label of the user includes a risk level for data readable by the user;
program code for assigning read and write labels to processing components, wherein a processing component receives data and includes a program that transforms the data to produce new data as output, the read label of a processing component includes a risk level for data readable by the processing component and the write label of a processing component includes a risk level of data written from the processing component;
program code for assigning channel labels to channels, wherein channels connect the processing components to other processing components, external data sources or the user and the channel labels identify a risk level of data readable by the channel;
program code for generating a workflow, wherein the workflow includes first and second processing components, first and second channels and the user, wherein the first and second processing components are connected to each other by the first channel and the second channel connects an output of the second processing component to the user, wherein the first processing component, the first channel, the second processing component, the second channel and the user are sequentially connected;
program code for determining if the workflow satisfies the following numbered conditions: (1) the read label of the second processing component dominates the channel label of the first channel, (2) the write label of the first processing component dominates the channel label of the first channel, (3) the write label of the first processing component dominates the read label of the first processing component, (4) the write label of the second processing component dominates the channel label of the second channel, (5) the write label of the second processing component dominates the read label of the second processing component and (6) the read label of the user dominates the channel labels of the first and second channels, wherein a label dominates another label when its risk level is greater than that of the another label and the label does not dominate the another label when its risk level is less than that of the another label;
program code for verifying that the workflow is in compliance with a privacy policy of the distributed stream processing system when the conditions 1-6 are affirmatively satisfied; and
program code for accepting the workflow such that the output of the second processing component is readable by the user.
12. The computer program product of claim 11, further comprising:
program code for assigning a maximum read label to one of the first and second processing components,
wherein the workflow is accepted if the read label of the processing component having the maximum read label is dominated by the maximum read label.
13. The computer program product of claim 11, further comprising:
program code for assigning an addition label to one of the first and second processing components, the addition label includes data attributes added to data output by the processing component having the addition label,
wherein the workflow is accepted if the addition label is dominated by the write label of the processing component having the addition label.
14. The computer program product of claim 11, further comprising:
program code for assigning a selection label to one of the first and second processing components, the selection label storing selection operations performed by the component having the selection label,
wherein the workflow is accepted if the selection label is dominated by the write label of the processing component having the selection label.
15. The computer program product of claim 11, further comprising:
program code for assigning a suppression label to one of the first and second processing components, the suppression label describing data attributes that are to be suppressed by the processing component having the suppression label, wherein when the suppression label is assigned, the write label of the processing component to which it is assigned dominates a minimum value of the suppression label and the read label of that processing component.
16. The computer program product of claim 11, further comprising:
program code for generating a planning task representation; and
program code for reading the planning task representation using a planning algorithm,
wherein the read label is assigned to the user, the channel labels are assigned to the channels and the read and write labels are assigned to the processing components using the planning algorithm, and
wherein the workflow is generated by translating an output of the planning algorithm.
17. The computer program product of claim 11, further comprising:
program code for deploying the workflow when it is accepted.
18. The computer program product of claim 17, further comprising:
program code for translating the read and write labels of the first and second processing components into multi-level secure (MLS) system compatible labels, wherein the MLS system enforces a Bell-LaPadula privacy policy.
Non-Patent Citations (14)

Agrawal, R. and Srikant, R., "Privacy-Preserving Data Mining," In Proc. SIGMOD-97, 1997.
Bell, D. E. and La Padula, L. J., "Secure Computer System: Unified Exposition and Multics Interpretation," The MITRE Corporation, Project No. 522B, Report No. ESD-TR-75-306, Mar. 1976.
Blythe, J., Deelman, E., Gil, Y., Kesselman, C., Agarwal, A., Mehta, G. and Vahi, K., "The Role of Planning in Grid Computing," ICAPS 2003.
Brown, A., Keller, A. and Hellerstein, J., "A Model of Configuration Complexity and its Application to a Change Management System," In Proc. IM-05, 2005.
Denning, D., "A Lattice Model of Secure Information Flow," Communications of the ACM, 19(5):236-243, May 1976.
Ferraiolo, D. and Kuhn, R., "Role-Based Access Control," In 15th NIST-NSA National Computer Security Conference, Baltimore, MD, Oct. 1992.
Gil, Y., Deelman, E., Blythe, J., Kesselman, C. and Tangmunarunkit, H., "Artificial Intelligence and Grids: Workflow Planning and Beyond," IEEE Intelligent Systems, Jan. 2004.
Myers, A. and Liskov, B., "Complete, Safe Information Flow with Decentralized Labels," Proceedings of the IEEE Symposium on Security and Privacy, May 1998, Oakland, California.
Pistore, M., Traverso, P. and Bertoli, P., "Automated Composition of Web Services by Planning in Asynchronous Domains," In Proc. of ISWC2003, 2003.
Riabov, A. and Liu, Z., "Planning for Stream Processing Systems," AAAI 2005.
Sandhu, R., "Lattice-Based Access Control Models," IEEE Computer, 26(11):9-19, 1993.
Sandhu, R., Coyne, E., Feinstein, H. and Youman, C., "Role-Based Access Control Models," IEEE Computer, 29(2), Feb. 1996.
Sweeney, L., "k-Anonymity: A Model for Protecting Privacy," Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10(5):557-570, 2002.
Wu, D., Parsia, B., Sirin, E., Hendler, J. and Nau, D., "Automating DAML-S Web Services Compositions Using SHOP2," In Proc. of ISWC2003, 2003.