WO2001037097A1 - Method for identifying unique entities in disparate data files - Google Patents
- Publication number
- WO2001037097A1 (application PCT/US2000/031399)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- record
- data
- records
- unique identifier
- agreement
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/80—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
Abstract
This invention relates to a method of matching computer-based records (301) for identifying unique entities (303) both within and between disparate data files. This method of record-linkage has particular utility in the fields of epidemiology and health services research.
Description
Method for Identifying Unique Entities in Disparate Data Files
Field of the Invention
This invention relates to a method of matching computer-based records for identifying unique entities both within and between disparate data files. This method of record linkage has particular utility in the fields of epidemiology and health services research.
Background of the Invention
A custom universal identifier methodology was developed in response to the limitations of exact matching techniques. The methodology was designed to incorporate a combination of exact and probabilistic matching techniques. The term record linkage has been used to indicate the bringing together of two or more separately recorded pieces of information concerning a particular entity. Integrating patient information from various sources is essential for multivariate research. The various facts concerning an individual, if brought together, form an extensive history of that individual. There are many purposes for linking records. Examples range from obtaining more data elements about an individual by merging data from different data sources, to creating a more comprehensive name and address list by merging the names and addresses from several data sources. In the first case, it is important to ensure that the matching is done accurately so that the matched data truly represent a multivariate observation from a single individual. In the second, the merging is intended to ensure as complete a list as possible while eliminating duplication.
The idea of linking records in the interest of science has a long pedigree. Fisher (Box, 1979, p. 237) lectured at a Zurich public health congress in 1929, arguing the usefulness of public records supplemented by (and presumably linked with) family data in human genetics research. Earlier, Alexander Graham Bell exploited genealogical records and administrative records on marriages, census results and others, apparently linking some sources, to sustain his familial studies of deafness (Bruce, 1973; Bell, 1906).
For many applications involving multiple databases, enough information is present to allow an accurate human judgement about whether a record from one source refers to the same case as a record from other sources. However, this is an extremely time-consuming, error-prone, and unreliable method except for small data sets. Computer methods are necessary to perform this task for a record matching exercise to be cost effective.
Summary of the Invention
The present invention is a computer-implemented system and method for creating a universal identifier for more than one record in one or more data files, the process comprising standardizing one or more data elements in each record, estimating the agreement and disagreement weights employed in the probabilistic function, and assigning a randomly generated unique identifier to each record.
In a second aspect, this invention relates to a computer-implemented system and method for concatenating records belonging to the same source within a data base or between data bases, the process comprising:
(1) creating a universal identifier for each record in one or more data files, by a) standardizing one or more data elements in each record, b) estimating the agreement and disagreement weights employed in the probabilistic function, and c) assigning a randomly generated unique identifier to each record; and
(2) concatenating records having the same unique identifier.
In yet a third aspect, this invention relates to a computer-implemented system and method for concatenating records belonging to the same source where some records have a unique identifier and new records are created, the process comprising:
(1) creating a universal identifier for each new record in one or more data files, by a) standardizing one or more data elements in each record, b) estimating the agreement and disagreement weights employed in the probabilistic function, and c) assigning a randomly generated unique identifier to each record; and
(2) concatenating records newly assigned a unique identifier with existing records having the same unique identifier.
Description of the Figures
Figure 1 is a block diagram of illustrative input record components and atomic components.
Figure 2 is a flowchart of weights calculated based on chance agreement using an iterative bootstrap technique.
Figure 3 is a flowchart of the process for generating randomly assigned unique identifiers.
Description of the Invention
General Overview
This invention provides a means for generating a unique identifier for records that ultimately relate back to a single source. It is particularly useful where the characterizing data identifying that source expands or changes over time. Specific examples are financial data and patient data. However, in both instances, data can normally be stored in a centralised data file such as a central server only if it is adequately secured and anonymized. One way to meet this security interest is to use a trusted third-party environment. This invention has its greatest use in the trusted third-party environment.
A Trusted Third Party (TTP) service is a current way of anonymizing patient data. The data is sent to a TTP, which takes the data and replaces all patient identifiers with a new code. The TTP matches codes against the patients; it therefore knows all the codes and patients.
Working within the purview of a TTP, or elsewhere, this invention addresses the step of creating and assigning a unique identifier to a record, after which these records are concatenated based on the unique identifier. The creation and assignment steps have three major components: i) data standardization, ii) weight estimation, and iii) the assignment of a unique identifier, in that order.
Definitions
For the purposes of this invention, the following definitions and abbreviations are used:
μ-Probability: The probability that any random element pair will match by chance.
$$\mu = \frac{\#\text{match}}{n_A \times n_B}$$
p-Probability: The reliability of the data element. If the Element Error Rate (EER) is > 0.99 then p = 1 - EER, else p = 0.99 - EER.
Agreement: A condition such that a given element pair matches exactly and both elements are known.
$$A_e = B_e$$
Agreement Weight: The weight assigned to an element pair when they agree during the record matching process.
Cartesian Product: The set of ordered pairs
$$A \times B = \{(a, b) \mid a \in A \wedge b \in B\}$$
Disagreement: A condition such that a given element pair does not exactly match and both elements are known.
$$A_e \neq B_e$$
Disagreement Weight: The weight assigned to an element pair when they disagree during the record matching process.
$$\log_2\left(\frac{1 - p}{1 - \mu}\right)$$
Element Error Rate: The proportion of element pairs where at least one element is unknown, e.g., null.
Frequency Table: A summary of the number of times each distinct value of a variable occurs, and its percentage of the total.
Mean: Arithmetic average
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
No Decision: A condition in which one or both elements of a given element pair are unknown.
Random Number Assignment: Every row in the data set will be assigned a random number such that υ blocks of approximately 1500 rows are created,
$$\rho = \mathrm{int}[(\upsilon \times P) + 1]$$
where ρ = Random Number, υ = Upper Bound and P = Random Function.
Threshold: The threshold utilized in probabilistic matching is a binit odds ratio with a range of $-\infty \le x \le \infty$.
Upper Bound: Number of strata such that the data set is divided into approximately equal blocks of 1500 rows.
$$\upsilon = \mathrm{int}\left(\frac{\text{Number of Records in Data Set}}{1500}\right)$$
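As a rough illustration of the Upper Bound and Random Number Assignment definitions, the sketch below computes the number of strata and gives each row a random number between 1 and that bound. The function names, the use of Python's `random.Random` as the "Random Function", and the floor of one stratum for small data sets are assumptions made for the example, not details taken from the text.

```python
import random

BLOCK_SIZE = 1500  # approximate rows per block, as given in the definitions


def upper_bound(n_records, block_size=BLOCK_SIZE):
    """Number of strata: int(n_records / block_size).

    Assumption: never fewer than one stratum, so small files still work.
    """
    return max(1, n_records // block_size)


def assign_random_numbers(n_records, rng):
    """Give every row a random number in 1..upper bound via int(bound * P + 1)."""
    bound = upper_bound(n_records)
    # rng.random() returns a float in [0, 1), so the result stays within 1..bound.
    return [int(bound * rng.random() + 1) for _ in range(n_records)]


rng = random.Random(0)  # fixed seed only to make the example repeatable
strata = assign_random_numbers(4500, rng)
```

With 4500 rows the bound is 3, so each row lands in stratum 1, 2 or 3 and each stratum holds roughly 1500 rows.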
As regards the computer and machine language used in this process, just about any piece of hardware capable of executing a fairly large number of calculations in short order will fill the bill. Any current state-of-the-art PC or server could be used. As for the operating system, UNIX is preferred, but Windows 98, Windows NT or the like could be used. The source code can be written in any language, though Java is preferred.
Data Standardization
The first step of this process involves the standardization of data in an input file. This standardization is required for increased precision and reliability. The input file can contain any number of variables, of which one or more are or may be unique to a particular data source such as an individual. Examples of useful variables are: member identifier, drivers' license number, social security number, insurance company code number, name, gender, date of birth, street address, city, state, postal code, citizenship. In addition, some identifiers can be further distilled down into their basic, or atomic, components. Figure 1 illustrates the use of selected input record components and atomic components of some records that are amenable to such further distillation. Referring to Figure 1, Input Record 100 illustrates data which can be used as the basis for assigning a unique identifier, and how that data can be broken out into its atomic and subatomic components, exemplified by Street Address 110, Date of Birth 120 and Name 130.
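A minimal sketch of standardization and decomposition into atomic components might look like the following. The field layout (first, middle, last; year, month, day), the ISO date format, and the small nickname table are assumptions for illustration; the figure itself is not reproduced here, so the actual components it shows may differ.

```python
from datetime import date

# Hypothetical nickname table; a real one would be built from data analysis.
NICKNAMES = {"BOB": "ROBERT", "ROB": "ROBERT", "ROBBY": "ROBERT"}


def standardize_name(raw):
    """Uppercase a full name and split it into first, middle and last parts."""
    parts = raw.strip().upper().split()
    first = NICKNAMES.get(parts[0], parts[0]) if parts else None
    last = parts[-1] if len(parts) > 1 else None
    middle = " ".join(parts[1:-1]) or None
    return {"first": first, "middle": middle, "last": last}


def standardize_dob(raw):
    """Split an ISO date string (YYYY-MM-DD) into year, month and day."""
    d = date.fromisoformat(raw)
    return {"year": d.year, "month": d.month, "day": d.day}


name = standardize_name("  bob q smith ")
dob = standardize_dob("1970-01-31")
```

Keeping the components separate lets later matching steps compare, say, year of birth and last name independently, so a typo in one component does not hide agreement in the others.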
During the standardization process, all character data is preferably transformed to a single case, for example uppercase. First names, for instance, are standardized to a common uppercase form, e.g., {BOB, ROB, ROBBY} = ROBERT. Common names for cities and streets may be transformed to the postal standard, e.g., in the U.S. to the United States Postal Service standard. In the latter instance this can be done using industry-standard CASS-certified software.
Weight Estimation
A fundamental component of this algorithm is the process of estimating the agreement and disagreement weights necessary for the probabilistic function. Weights are calculated based on probabilities of chance agreement using an iterative bootstrap technique. Figure 2 provides a flow of the process. The first step in the weight estimation process is to determine the number of strata required such that the data set can be divided into approximately equal blocks of 1500 rows (Fig. 2, 201-219), see equation 1.
$$\upsilon = \mathrm{int}\left(\frac{\text{Number of Records in Data Set}}{1500}\right) \quad (1)$$
The source file is then scanned and the records are assigned a random number between 1 and υ. A data matrix is created containing a Cartesian product of the records that were assigned the random number 1. The resulting matrix is then scanned. Each element pair within each record pair is assessed and assigned a value in the following manner:
$$Q = \begin{cases} 1 & \text{if } A_e = B_e \text{ (Agreement)} \\ 0 & \text{if } A_e = \text{Null and/or } B_e = \text{Null (No decision)} \\ -1 & \text{if } A_e \neq B_e \text{ (Disagreement)} \end{cases} \quad (2)$$
where $A_e$ is the nth element from record A and $B_e$ is the corresponding element from record B.
Once the matrix has been fully assessed, percentages for each Q are tabulated and stored. This process is repeated for 15 iterations.
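The scan-and-tabulate loop described above can be sketched roughly as follows. Records are assumed to be dictionaries with `None` for unknown values, and the function names are hypothetical. One deliberate simplification: the sketch compares unordered pairs of distinct records rather than the full Cartesian product, so that a record is never compared with itself.

```python
import itertools
import random
from collections import Counter


def compare(a, b):
    """Equation 2: 1 = agreement, 0 = no decision, -1 = disagreement."""
    if a is None or b is None:
        return 0
    return 1 if a == b else -1


def tabulate(records, fields):
    """Share of each Q value per field over all unordered record pairs."""
    counts = {f: Counter() for f in fields}
    n_pairs = 0
    for r1, r2 in itertools.combinations(records, 2):
        n_pairs += 1
        for f in fields:
            counts[f][compare(r1.get(f), r2.get(f))] += 1
    if n_pairs == 0:
        raise ValueError("need at least two records to form a pair")
    return {f: {q: counts[f][q] / n_pairs for q in (1, 0, -1)} for f in fields}


def bootstrap(records, fields, iterations=15, block_size=1500, seed=0):
    """Average the tabulated shares over repeated random blocks."""
    rng = random.Random(seed)
    bound = max(1, len(records) // block_size)
    runs = []
    for _ in range(iterations):
        # Keep the rows whose random number int(bound * P + 1) equals 1.
        block = [r for r in records if int(bound * rng.random() + 1) == 1]
        if len(block) >= 2:
            runs.append(tabulate(block, fields))
    if not runs:
        raise ValueError("no iteration produced a usable block")
    return {
        f: {q: sum(run[f][q] for run in runs) / len(runs) for q in (1, 0, -1)}
        for f in fields
    }


sample = [{"last": "SMITH"}, {"last": "SMITH"}, {"last": None}, {"last": "JONES"}]
shares = tabulate(sample, ["last"])
means = bootstrap(sample, ["last"])
```

The shares are returned as proportions between 0 and 1 rather than percentages, which keeps them directly usable as probabilities in the later weight formulas.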
Mean percentages of Agreements and No Decisions are calculated for each data element (Fig. 2, 221). The p probability, or the reliability, for each data element is then calculated, see equation 3.
$$\text{let } \varepsilon = \overline{\text{Percent No Decision}}, \qquad p = \begin{cases} 1 - \varepsilon & \text{if } \varepsilon > 0.99 \\ 0.99 - \varepsilon & \text{otherwise} \end{cases} \quad (3)$$
The μ probability, or the probability that element n for any given record pair will match by chance, is calculated (Fig. 2, 223), see equation 4.
$$\mu = \overline{\text{Percent Agreement}} \quad (4)$$
From the p and μ probabilities, the disagreement and agreement weights are calculated (Fig. 2, 225) employing equations 5 and 6 respectively.
$$\text{Disagreement} = \log_2\left(\frac{1 - p}{1 - \mu}\right) \quad (5)$$
$$\text{Agreement} = \log_2\left(\frac{p}{\mu}\right) \quad (6)$$
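To make the weight formulas concrete, the sketch below turns a mean agreement share and a mean no-decision share into the two weights. It assumes both inputs are proportions between 0 and 1, and it reads the reliability rule as using a 0.99 cut-off; the function names are hypothetical.

```python
import math


def reliability(element_error_rate):
    """p probability: 1 - EER if EER > 0.99, otherwise 0.99 - EER (assumed reading)."""
    if element_error_rate > 0.99:
        return 1 - element_error_rate
    return 0.99 - element_error_rate


def estimate_weights(mean_agreement, mean_no_decision):
    """Return (agreement weight, disagreement weight) in base-2 log units."""
    p = reliability(mean_no_decision)
    mu = mean_agreement
    agreement = math.log2(p / mu)
    disagreement = math.log2((1 - p) / (1 - mu))
    return agreement, disagreement


# Example element: 1% chance agreement, 4% of pairs with an unknown value.
agree_w, disagree_w = estimate_weights(mean_agreement=0.01, mean_no_decision=0.04)
```

An element that rarely agrees by chance earns a large positive agreement weight, while the disagreement weight is negative. The sketch does not guard against μ of 0 or 1, where the logarithms are undefined.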
Unique Identifier Assignment
The final stage of this process is the action of uniquely identifying entities within the input data set. Figure 3 provides an overview of this process.
Each record from the input file is evaluated against a reference file, using a combination of deterministic and probabilistic matching techniques, to determine if the entity represented by the data has been previously identified. If it is judged that the entity is already represented in the reference set, the input record is assigned the unique identifier (UID) from the reference record that it has matched against. If it is judged that the entity represented by the data is not yet in the reference set, a new UID is randomly generated and assigned. Random numbers are generated using the facilities of whatever language the process is implemented in.
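The assignment rule just described can be sketched as below. The `find_match` callback stands in for the combined deterministic and probabilistic matching and is hypothetical, as are the `uid` and `ssn` field names; the 16-character hexadecimal identifier drawn from Python's `secrets` module is one possible choice of random UID, not the format used by the method.

```python
import secrets


def assign_uid(record, reference, find_match):
    """Reuse the matched reference record's UID, or generate a new random one."""
    match = find_match(record, reference)
    if match is not None:
        record["uid"] = match["uid"]
        return record
    existing = {r["uid"] for r in reference}
    uid = secrets.token_hex(8)
    while uid in existing:  # extremely unlikely, but keeps UIDs unique
        uid = secrets.token_hex(8)
    record["uid"] = uid
    return record


def match_by_ssn(record, reference):
    """Toy matcher for the example: exact match on a known SSN."""
    for ref in reference:
        if record.get("ssn") is not None and record["ssn"] == ref.get("ssn"):
            return ref
    return None


reference = [{"ssn": "123-45-6789", "uid": "a1b2c3d4e5f60718"}]
known = assign_uid({"ssn": "123-45-6789"}, reference, match_by_ssn)
new = assign_uid({"ssn": "987-65-4321"}, reference, match_by_ssn)
```

Because the identifier is random rather than derived from the record, it carries no personal information of its own.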
After the UID assignment occurs, the input record is evaluated, in its entirety, to determine if the record is a unique representation of the entity not already contained in the reference table. If it is a new record, then it is inserted into the reference table for future use.
Deterministic Matching Technique
The deterministic matching technique employs simple Boolean logic. Two records are judged to match if certain criteria are met, such as the following:
First Name Matches Exactly
Last Name Matches Exactly
Date of Birth Matches Exactly
Social Security Number OR Member Identifier Matches Exactly
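The example criteria above can be written as a single Boolean test. The sketch assumes the four criteria are combined with AND, that unknown values never count as a match, and that records are dictionaries with the hypothetical field names shown.

```python
def deterministic_match(a, b):
    """True when the two records satisfy all of the example criteria."""

    def same(field):
        # An unknown value on either side is not treated as an exact match.
        return a.get(field) is not None and a.get(field) == b.get(field)

    return (
        same("first_name")
        and same("last_name")
        and same("date_of_birth")
        and (same("ssn") or same("member_id"))
    )


rec_a = {"first_name": "ROBERT", "last_name": "SMITH",
         "date_of_birth": "1970-01-31", "ssn": None, "member_id": "M42"}
rec_b = {"first_name": "ROBERT", "last_name": "SMITH",
         "date_of_birth": "1970-01-31", "ssn": "123-45-6789", "member_id": "M42"}
rec_c = dict(rec_b, last_name="SMYTH")
```

Here `rec_a` and `rec_b` match on the member identifier even though one social security number is unknown, while `rec_c` fails on the last name and would be passed on to the next matching step.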
If two records satisfy the criteria for deterministic matching, no probabilistic processing occurs. However, if no deterministic match occurs, the input record is presented for a probabilistic match.
Probabilistic Matching Technique
The first step in the probabilistic matching process is to build a set of candidate records from the reference table based on characteristics of specific elements of the input record. This process is referred to as blocking; the set of candidate records is referred to as the blocking table. Not all data sets use the same characteristics; the elements used in this process are determined through data analysis. However, it is suggested that blocking variables consist of those elements that are somewhat unique to an entity, e.g., social security number, or a combination of date of birth and last name.
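A blocking step along the lines suggested here could be sketched as follows, using the two example blocking variables: social security number, or date of birth together with last name. The field names and the in-memory list standing in for the reference table are assumptions.

```python
def known_and_equal(a, b, field):
    """True when both records have the field and the values are identical."""
    return a.get(field) is not None and a.get(field) == b.get(field)


def build_blocking_table(input_record, reference):
    """Select the reference records that share a blocking variable with the input."""
    candidates = []
    for ref in reference:
        same_ssn = known_and_equal(input_record, ref, "ssn")
        same_dob_and_last = (
            known_and_equal(input_record, ref, "date_of_birth")
            and known_and_equal(input_record, ref, "last_name")
        )
        if same_ssn or same_dob_and_last:
            candidates.append(ref)
    return candidates


reference = [
    {"uid": "u1", "ssn": "123-45-6789", "last_name": "SMITH", "date_of_birth": "1970-01-31"},
    {"uid": "u2", "ssn": None, "last_name": "SMITH", "date_of_birth": "1970-01-31"},
    {"uid": "u3", "ssn": "555-00-1111", "last_name": "JONES", "date_of_birth": "1982-06-15"},
]
incoming = {"ssn": None, "last_name": "SMITH", "date_of_birth": "1970-01-31"}
blocking_table = build_blocking_table(incoming, reference)
```

Blocking keeps the expensive element-by-element scoring confined to a handful of plausible candidates instead of the whole reference table.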
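Upon completion of the construction of the blocking table, each element for each candidate record is compared against its corresponding element from the input record. See equation 7 for the scoring mechanism.
$$W = \begin{cases} \text{Agreement Weight} & \text{if } A_e = B_e \\ 0 & \text{if } A_e = \text{Null and/or } B_e = \text{Null} \\ \text{Disagreement Weight} & \text{if } A_e \neq B_e \end{cases} \quad (7)$$
where $A_e$ is the nth element from record A. A composite weight is then calculated for all candidate records, see equation 8.
$$W = \sum_{i=1}^{n} W_i \quad (8)$$
where $W_i$ is the weight from equation 7 for the ith element and n is the number of elements compared.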
The candidate record with the highest composite weight is then evaluated against a predefined threshold. If the weight meets or exceeds the threshold, the candidate record is judged to match the input record. If the weight falls below the threshold, it is assumed that the input record represents an entity not yet included in the reference set.
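The scoring and threshold test of equations 7 and 8 can be sketched as below. The weight values, field names and threshold are made up for the example; in practice the weights would come from the weight estimation step and the threshold from analysis of the data.

```python
def element_weight(a, b, agree_w, disagree_w):
    """Equation 7: agreement weight, 0 for no decision, or disagreement weight."""
    if a is None or b is None:
        return 0.0
    return agree_w if a == b else disagree_w


def composite_weight(input_record, candidate, weights):
    """Equation 8: sum of the element weights over all compared fields."""
    return sum(
        element_weight(input_record.get(f), candidate.get(f), aw, dw)
        for f, (aw, dw) in weights.items()
    )


def best_match(input_record, candidates, weights, threshold):
    """Return the top-scoring candidate if it meets the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: composite_weight(input_record, c, weights))
    if composite_weight(input_record, best, weights) >= threshold:
        return best
    return None


# Hypothetical (agreement, disagreement) weights per field.
weights = {"last_name": (5.0, -3.0), "date_of_birth": (7.0, -4.0)}
incoming = {"last_name": "SMITH", "date_of_birth": "1970-01-31"}
candidates = [
    {"uid": "a", "last_name": "SMITH", "date_of_birth": "1970-01-31"},
    {"uid": "b", "last_name": "SMITH", "date_of_birth": None},
    {"uid": "c", "last_name": "JONES", "date_of_birth": "1970-01-31"},
]
match = best_match(incoming, candidates, weights, threshold=10.0)
no_match = best_match(incoming, candidates, weights, threshold=13.0)
```

Candidate `a` agrees on both fields and scores 12.0, so it is returned when the threshold is 10.0; with a threshold of 13.0 no candidate qualifies and the input record would be treated as a new entity.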
Claims
What is claimed is:
1. A computer-implemented system and method for creating a universal identifier for more than one record in one or more data files, the process comprising standardizing one or more data elements in each record, estimating the agreement and disagreement weights employed in the probabilistic function, and assigning a randomly generated unique identifier to each record.
2. A computer-implemented system and method for concatenating records belonging to the same source within a data base or between data bases, the process comprising:
(A) creating a universal identifier for each record in one or more data files, by a) standardizing one or more data elements in each record, b) estimating the agreement and disagreement weights employed in the probabilistic function, and c) assigning a randomly generated unique identifier to each record; and
(B) concatenating records having the same unique identifier.
3. A computer-implemented system and method for concatenating records belonging to the same source where some records have a unique identifier and new records are created, the process comprising:
(A) creating a universal identifier for each new record in one or more data files, by a) standardizing one or more data elements in each record, b) estimating the agreement and disagreement weights employed in the probabilistic function, and c) assigning a randomly generated unique identifier to each record; and
(B) concatenating records newly assigned a unique identifier with existing records having the same unique identifier.
4. A method for assigning a unique identification number to a source or owner of data as described herein.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU16126/01A AU1612601A (en) | 1999-11-15 | 2000-11-15 | Method for identifying unique entities in disparate data files |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16562199P | 1999-11-15 | 1999-11-15 | |
US60/165,621 | 1999-11-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001037097A1 (en) | 2001-05-25 |
Family
ID=22599696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2000/031399 WO2001037097A1 (en) | 1999-11-15 | 2000-11-15 | Method for identifying unique entities in disparate data files |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU1612601A (en) |
WO (1) | WO2001037097A1 (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004023335A2 (en) * | 2002-09-03 | 2004-03-18 | Sap Aktiengesellschaft | Central master data management |
WO2004023287A2 (en) * | 2002-09-03 | 2004-03-18 | Sap Aktiengesellschaft | Collaborative master data management |
WO2004036455A2 (en) * | 2002-10-16 | 2004-04-29 | Sap Aktiengesellschaft | Master data access |
EP1537499A2 (en) * | 2002-09-03 | 2005-06-08 | Sap Ag | Distribution of data in a master data management system |
US7031787B2 (en) | 2002-03-21 | 2006-04-18 | Sap Aktiengesellschaft | Change management |
EP1647929A1 (en) * | 2004-10-12 | 2006-04-19 | International Business Machines Corporation | Method, system and computer programm for associating healthcare records with an individual |
US7133878B2 (en) | 2002-03-21 | 2006-11-07 | Sap Aktiengesellschaft | External evaluation processes |
US7236973B2 (en) | 2002-11-27 | 2007-06-26 | Sap Aktiengesellschaft | Collaborative master data management system for identifying similar objects including identical and non-identical attributes |
US7272776B2 (en) | 2003-12-30 | 2007-09-18 | Sap Aktiengesellschaft | Master data quality |
CN100353313C (en) * | 2002-09-03 | 2007-12-05 | Sap股份公司 | Collaborative master data management |
CN100361624C (en) * | 2004-06-01 | 2008-01-16 | 株式会社东芝 | Medical image storage apparatus protecting personal information |
US7725565B2 (en) | 2008-02-25 | 2010-05-25 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event related information |
US7930149B2 (en) | 2003-12-19 | 2011-04-19 | Sap Aktiengesellschaft | Versioning of elements in a configuration model |
US8061604B1 (en) | 2003-02-13 | 2011-11-22 | Sap Ag | System and method of master data management using RFID technology |
US8200501B2 (en) | 2006-01-26 | 2012-06-12 | International Business Machines Corporation | Methods, systems and computer program products for synthesizing medical procedure information in healthcare databases |
US8499036B2 (en) | 2002-03-21 | 2013-07-30 | Sap Ag | Collaborative design process |
US8566113B2 (en) | 2006-02-07 | 2013-10-22 | International Business Machines Corporation | Methods, systems and computer program products for providing a level of anonymity to patient records/information |
US8881040B2 (en) | 2008-08-28 | 2014-11-04 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event-related information |
US9063991B2 (en) | 2013-01-25 | 2015-06-23 | Wipro Limited | Methods for identifying unique entities across data sources and devices thereof |
US20160196537A1 (en) * | 2015-01-02 | 2016-07-07 | Bank Of America Corporation | File Locking Framework |
US9529974B2 (en) | 2008-02-25 | 2016-12-27 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event-related information |
CN106777070A (en) * | 2016-12-12 | 2017-05-31 | 江苏师范大学 | A kind of system and method for the Web record links based on piecemeal |
US9870381B2 (en) | 2015-05-22 | 2018-01-16 | International Business Machines Corporation | Detecting quasi-identifiers in datasets |
US10095883B2 (en) | 2016-07-22 | 2018-10-09 | International Business Machines Corporation | Method/system for the online identification and blocking of privacy vulnerabilities in data streams |
US10503347B2 (en) | 2008-02-25 | 2019-12-10 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event-related information |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4821184A (en) * | 1981-05-22 | 1989-04-11 | Data General Corporation | Universal addressing system for a digital data processing system |
US5487164A (en) * | 1993-09-14 | 1996-01-23 | International Business Machines Corporation | Distribution-based replacement selection sorting system |
US5594889A (en) * | 1992-01-03 | 1997-01-14 | Digital Equipment Corporation | Memory resource allocation look ahead system and method |
US5668897A (en) * | 1994-03-15 | 1997-09-16 | Stolfo; Salvatore J. | Method and apparatus for imaging, image processing and data compression merge/purge techniques for document image databases |
- 2000-11-15: WO application PCT/US2000/031399, published as WO2001037097A1 (application filing)
- 2000-11-15: AU application AU16126/01A, published as AU1612601A (abandoned)
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7031787B2 (en) | 2002-03-21 | 2006-04-18 | Sap Aktiengesellschaft | Change management |
US9400836B2 (en) | 2002-03-21 | 2016-07-26 | Sap Se | External evaluation processes |
US8499036B2 (en) | 2002-03-21 | 2013-07-30 | Sap Ag | Collaborative design process |
US8117157B2 (en) | 2002-03-21 | 2012-02-14 | Sap Ag | External evaluation processes |
US7133878B2 (en) | 2002-03-21 | 2006-11-07 | Sap Aktiengesellschaft | External evaluation processes |
CN100409238C (en) * | 2002-09-03 | 2008-08-06 | Sap股份公司 | Central master data management |
WO2004023335A3 (en) * | 2002-09-03 | 2004-08-26 | Sap Ag | Central master data management |
WO2004023287A3 (en) * | 2002-09-03 | 2004-09-02 | Sap Ag | Collaborative master data management |
WO2004023287A2 (en) * | 2002-09-03 | 2004-03-18 | Sap Aktiengesellschaft | Collaborative master data management |
CN100353313C (en) * | 2002-09-03 | 2007-12-05 | Sap股份公司 | Collaborative master data management |
WO2004023335A2 (en) * | 2002-09-03 | 2004-03-18 | Sap Aktiengesellschaft | Central master data management |
EP1537499A2 (en) * | 2002-09-03 | 2005-06-08 | Sap Ag | Distribution of data in a master data management system |
CN100410932C (en) * | 2002-09-03 | 2008-08-13 | Sap股份公司 | Data distribution in master data management system |
US7509326B2 (en) | 2002-09-03 | 2009-03-24 | Sap Ag | Central master data management |
WO2004036455A3 (en) * | 2002-10-16 | 2004-12-16 | Sap Ag | Master data access |
US9256655B2 (en) | 2002-10-16 | 2016-02-09 | Sap Se | Dynamic access of data |
WO2004036455A2 (en) * | 2002-10-16 | 2004-04-29 | Sap Aktiengesellschaft | Master data access |
US8180732B2 (en) | 2002-11-27 | 2012-05-15 | Sap Ag | Distributing data in master data management systems |
US7236973B2 (en) | 2002-11-27 | 2007-06-26 | Sap Aktiengesellschaft | Collaborative master data management system for identifying similar objects including identical and non-identical attributes |
US8061604B1 (en) | 2003-02-13 | 2011-11-22 | Sap Ag | System and method of master data management using RFID technology |
US9691053B1 (en) | 2003-02-13 | 2017-06-27 | Sap Se | System and method of master data management |
US7930149B2 (en) | 2003-12-19 | 2011-04-19 | Sap Aktiengesellschaft | Versioning of elements in a configuration model |
US7272776B2 (en) | 2003-12-30 | 2007-09-18 | Sap Aktiengesellschaft | Master data quality |
CN100361624C (en) * | 2004-06-01 | 2008-01-16 | 株式会社东芝 | Medical image storage apparatus protecting personal information |
US9230060B2 (en) | 2004-10-12 | 2016-01-05 | International Business Machines Corporation | Associating records in healthcare databases with individuals |
US8495069B2 (en) | 2004-10-12 | 2013-07-23 | International Business Machines Corporation | Associating records in healthcare databases with individuals |
EP1647929A1 (en) * | 2004-10-12 | 2006-04-19 | International Business Machines Corporation | Method, system and computer programm for associating healthcare records with an individual |
US8892571B2 (en) | 2004-10-12 | 2014-11-18 | International Business Machines Corporation | Systems for associating records in healthcare database with individuals |
US8200501B2 (en) | 2006-01-26 | 2012-06-12 | International Business Machines Corporation | Methods, systems and computer program products for synthesizing medical procedure information in healthcare databases |
US8566113B2 (en) | 2006-02-07 | 2013-10-22 | International Business Machines Corporation | Methods, systems and computer program products for providing a level of anonymity to patient records/information |
US9489495B2 (en) | 2008-02-25 | 2016-11-08 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event-related information |
US9529974B2 (en) | 2008-02-25 | 2016-12-27 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event-related information |
US10503347B2 (en) | 2008-02-25 | 2019-12-10 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event-related information |
US7725565B2 (en) | 2008-02-25 | 2010-05-25 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event related information |
US10055502B2 (en) | 2008-02-25 | 2018-08-21 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event related information |
US8881040B2 (en) | 2008-08-28 | 2014-11-04 | Georgetown University | System and method for detecting, collecting, analyzing, and communicating event-related information |
US9063991B2 (en) | 2013-01-25 | 2015-06-23 | Wipro Limited | Methods for identifying unique entities across data sources and devices thereof |
US10380087B2 (en) * | 2015-01-02 | 2019-08-13 | Bank Of America Corporation | File locking framework |
US20160196537A1 (en) * | 2015-01-02 | 2016-07-07 | Bank Of America Corporation | File Locking Framework |
US9870381B2 (en) | 2015-05-22 | 2018-01-16 | International Business Machines Corporation | Detecting quasi-identifiers in datasets |
US10380088B2 (en) | 2015-05-22 | 2019-08-13 | International Business Machines Corporation | Detecting quasi-identifiers in datasets |
US11269834B2 (en) | 2015-05-22 | 2022-03-08 | International Business Machines Corporation | Detecting quasi-identifiers in datasets |
US10095883B2 (en) | 2016-07-22 | 2018-10-09 | International Business Machines Corporation | Method/system for the online identification and blocking of privacy vulnerabilities in data streams |
US11030340B2 (en) | 2016-07-22 | 2021-06-08 | International Business Machines Corporation | Method/system for the online identification and blocking of privacy vulnerabilities in data streams |
CN106777070A (en) * | 2016-12-12 | 2017-05-31 | 江苏师范大学 | A kind of system and method for the Web record links based on piecemeal |
CN106777070B (en) * | 2016-12-12 | 2020-06-26 | 江苏师范大学 | Web record link system and method based on block |
Also Published As
Publication number | Publication date |
---|---|
AU1612601A (en) | 2001-05-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2001037097A1 (en) | | Method for identifying unique entities in disparate data files |
WO2000049531A9 (en) | | Apparatus and method for depersonalizing information |
CA2564307C (en) | | Data record matching algorithms for longitudinal patient level databases |
US6658412B1 (en) | | Computer-based method and system for linking records in data files |
JP5401037B2 (en) | | A method of linking unidentified patient records using encrypted and unencrypted demographic information and healthcare information from multiple data sources. |
US20030126156A1 (en) | | Duplicate resolution system and method for data management |
EP1240574A2 (en) | | Anonymously linking a plurality of data records |
MXPA04006390A (en) | | Real time data warehousing. |
WO2003071443A1 (en) | | Method and system for a data service to control access to personal information |
JP2014238892A (en) | | Method, computer system and computer program for retrieving secured data |
CN111709714B (en) | | Loss personnel prediction method and device based on artificial intelligence |
US7634559B2 (en) | | System and method for analyzing network software application changes |
US6694459B1 (en) | | Method and apparatus for testing a data retrieval system |
Cannon-Albright et al. | | Creation of a national resource with linked genealogy and phenotypic data: the Veterans Genealogy Project |
CN116010941B (en) | | Multi-center medical queue construction system and method based on sandboxes |
WO2016029124A1 (en) | | System and method of matching identities among disparate physician records |
US8782025B2 (en) | | Systems and methods for address intelligence |
US20050125257A1 (en) | | System and method for creating data links between diagnostic information and prescription infornation records |
US20020004728A1 (en) | | Testing method and system |
Graves | | Integrating Order and Distance Relationships from Heterogeneous Maps. |
Schnell et al. | | Microsimulation of an educational attainment register to predict future record linkage quality |
CN111652742B (en) | | User data processing method, device, electronic equipment and readable storage medium |
Deutsch | | Using Unique Identifiers Within Syringe Service Programs |
Vijenthira et al. | | Registration errors among patients receiving blood transfusions: a national analysis from 2008 to 2017 |
CN115660628A (en) | | Block chain-based man-hour information processing method, block chain-based man-hour information processing device, electronic apparatus, and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AL AU BA BB BG BR BZ CA CN CZ DZ EE GE GH GM HR HU ID IL IN IS JP KP KR LC LK LR LT LV MA MG MK MN MX MZ NO NZ PL RO SG SI SK SL TR TT TZ UA US UZ VN YU ZA |
 | AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
 | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
 | 122 | EP: PCT application non-entry in European phase | |