US7698312B2 - Performing recursive database operations - Google Patents

Performing recursive database operations

Info

Publication number
US7698312B2
Authority
US
United States
Prior art keywords
candidate
slaves
iteration
steps
element set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US11/600,272
Other versions
US20070067327A1 (en
Inventor
Thierry Cruanes
Wei Li
Ari Mozes
Benoit Dageville
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp
Priority to US11/600,272
Publication of US20070067327A1
Application granted
Publication of US7698312B2
Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/24566 Recursive queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation

Definitions

  • the present invention relates to databases, and in particular, to performing recursive database operations.
  • Some computational tasks are especially suitable for recursive processing.
  • An “itemset” is a set of items. For example, one itemset might include the items (apple, banana), while another itemset might include the items (apple, orange), while yet another itemset might include the items (banana, orange).
  • An itemset is “frequent”, relative to a set of data structures, if the number of the data structures that contain all of the items in the itemset is at least a specified fraction of the total number of the data structures in the set.
  • each data structure might represent a different customer's transaction at a supermarket.
  • a first data structure might contain the items (apple, banana, milk), while a second data structure might contain the items (apple, banana, milk, orange), while a third data structure might contain the item (orange).
  • the itemset (apple, banana) is a frequent itemset because “apple” occurs with “banana” in two of the three data structures, but the itemsets (apple, orange) and (banana, orange) are not frequent itemsets because “apple” occurs with “orange” in only one of the three data structures and “banana” occurs with “orange” in only one of the three data structures.
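  • As a minimal sketch of the definition above, the following Python fragment checks the three example data structures against a threshold. The function name `is_frequent` and the threshold of 1/2 are assumptions made for illustration; the text does not state which fraction the example uses.

```python
from fractions import Fraction


def is_frequent(itemset, transactions, threshold):
    """Return True if the share of transactions that contain every item
    of `itemset` is at least `threshold`."""
    containing = sum(1 for t in transactions if set(itemset) <= t)
    return containing >= threshold * len(transactions)


# The three example data structures (one per customer transaction).
transactions = [
    {"apple", "banana", "milk"},
    {"apple", "banana", "milk", "orange"},
    {"orange"},
]

# Assumed threshold: the example only says "two of the three" is enough.
THRESHOLD = Fraction(1, 2)

apple_banana = is_frequent({"apple", "banana"}, transactions, THRESHOLD)
apple_orange = is_frequent({"apple", "orange"}, transactions, THRESHOLD)
banana_orange = is_frequent({"banana", "orange"}, transactions, THRESHOLD)
```

  • Under that assumed threshold, only (apple, banana) qualifies, which matches the outcome described in the example.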
  • the determination of whether a particular itemset is frequent relative to that set of data structures becomes more computationally intensive.
  • Frequent itemset determination lends itself especially well to recursive processing due at least in part to the observation that an N-element itemset cannot be a frequent itemset relative to a set of data structures unless all of the (N−1)-element subsets of the N-element itemset are also frequent itemsets relative to that set of data structures.
  • the 3-element itemset (apple, banana, milk) cannot be a frequent itemset relative to the set of data structures in the above example unless all of the 2-element subsets of that 3-element itemset, namely, (apple, banana), (apple, milk), and (banana, milk), are also frequent itemsets relative to the set of data structures in the above example.
  • frequent itemsets might be determined in the following manner.
  • An application that is external to a database server might send a query to the database server.
  • the query would cause the database server to select, from a set of data structures, each data structure that contains all of the items in a specified itemset.
  • the database server would execute the query and return the selected data structures to the application.
  • the application might count the selected data structures and determine whether the number of selected data structures meets a specified threshold. If the number of selected data structures met the specified threshold, then the application might place the specified itemset in a set of frequent itemsets.
  • the application might perform the above steps for each 1-element itemset that is a subset of an M-element itemset, one 1-element itemset at a time, and one 1-element itemset after another.
  • the application might determine, for each particular 2-element subset of the M-element itemset, whether all of the 1-element subsets of that particular 2-element subset are contained in the set of frequent itemsets. If all of the 1-element subsets of the particular 2-element subset were contained in the group of frequent itemsets, then the application might send, to the database server, a query that would cause the database server to select, from the set of data structures, each data structure that contains all of the items in the particular 2-element itemset. The database server would execute the query and return the selected data structures to the application. The application might count the selected data structures and determine whether the number of selected data structures meets the specified threshold.
  • the application might place the particular 2-element itemset in the set of frequent itemsets.
  • the application might perform the above steps for each 2-element itemset that is a subset of the M-element itemset, one 2-element itemset at a time, and one 2-element itemset after another.
  • For each successive value of N, the application might perform the above steps for the N-element itemsets that are subsets of the M-element itemset until N was greater than M or there were no (N−1)-element itemsets in the set of frequent itemsets, whichever came first.
  • By sending a multitude of queries to a database server in serial manner and counting the results of such queries, the application might determine frequent itemsets that are subsets of the M-element itemset.
  • FIG. 1 is a block diagram that illustrates a system in which recursive database operations may be performed in a parallelized manner, according to an embodiment of the present invention.
  • FIG. 2 is a flow diagram that illustrates a technique for performing a recursive database operation using two stages of concurrently executing slaves, according to an embodiment of the present invention.
  • FIG. 3 is a block diagram that illustrates a computer system upon which an embodiment of the invention may be implemented.
  • a plurality of first-stage slaves and a plurality of second-stage slaves are established in a database server.
  • the first-stage slaves concurrently process data items and send the results of the first-stage slaves' processing to the second-stage slaves.
  • the second-stage slaves receive the first results of the first-stage slaves' processing and concurrently process those first results.
  • the second-stage slaves store the first results of the second-stage slaves' processing in a data repository.
  • the first-stage slaves obtain the first results of the second-stage slaves' processing from the data repository, concurrently process those first results, and send the second results of the first-stage slaves' processing to the second-stage slaves.
  • the second-stage slaves receive the second results of the first-stage slaves' processing and concurrently process those second results.
  • the second-stage slaves store the second results of the second-stage slaves' processing in the data repository.
  • Subsequent iterations of the recursive database operation proceed in this manner until the recursive database operation has been completed.
  • the first-stage slaves consume the product of the second-stage slaves' processing during the previous iteration
  • the second-stage slaves consume the product of the first-stage slaves' processing during the current iteration.
  • the above embodiment makes it unnecessary for an application to send multiple queries to and receive multiple results from the database server during the performance of the recursive database operation. Additionally, application programmers are spared the burden of programming applications to perform the processing that is performed by the first-stage slaves and second-stage slaves.
  • FIG. 1 is a block diagram that illustrates a system 100 in which recursive database operations may be performed in a parallelized manner, according to an embodiment of the present invention.
  • System 100 comprises a database server 102 and a database 104 .
  • Database server 102 is communicatively coupled to database 104 .
  • Database server 102 comprises a plurality of first-stage slaves 106 A-N and a plurality of second-stage slaves 108 A-N.
  • Each of first-stage slaves 106 A-N may be a separate thread of a first process
  • each of second-stage slaves 108 A-N may be a separate thread of a second process.
  • each of first-stage slaves 106 A-N and second-stage slaves 108 A-N may be a separate thread of the same process.
  • each of first-stage slaves 106 A-N and each of second-stage slaves 108 A-N may be a separate process.
  • First-stage slaves 106 A-N execute concurrently with each other and each of second-stage slaves 108 A-N.
  • Database server 102 also comprises a query coordinator 110 .
  • Query coordinator 110 is a process that executes concurrently with first-stage slaves 106 A-N and second-stage slaves 108 A-N.
  • Query coordinator 110 sends messages to, and receives messages from, first-stage slaves 106 A-N and second-stage slaves 108 A-N. Such messages may be sent and received using inter-process communication mechanisms, for example.
  • Database 104 comprises data structures 112 A-N. Each of data structures 112 A-N may be, for example, a separate row of a database table. In one embodiment, each of data structures 112 A-N contains one or more items. For example, one of data structures 112 A-N might contain items such as “apple,” “banana,” and “orange.” Database 104 also comprises a data repository 114 . Data repository 114 may be, for example, a temporary database table.
  • data repository 114 comprises two separate segments 116 A and 116 B.
  • One of segments 116 A-B is designated as a “read” segment, and one of segments 116 A-B is designated as a “write” segment.
  • query coordinator 110 maintains state information that indicates which of segments 116 A-B is currently designated as the “read” segment, and which of segments 116 A-B is currently designated as the “write” segment. The designations of segments 116 A-B can be swapped.
  • FIG. 2 is a flow diagram that illustrates a technique 200 for performing a recursive database operation using two stages of concurrently executing slaves, according to an embodiment of the present invention.
  • a plurality of first-stage slaves is established in a database server.
  • first-stage slaves 106 A-N may be established in database server 102 .
  • a plurality of second-stage slaves is established in the database server.
  • second-stage slaves 108 A-N may be established in database server 102 .
  • An iteration of the recursive database operation is represented in blocks 206 - 212 .
  • data items in a data repository are processed by causing the first-stage slaves to process concurrently the data items.
  • database server 102 may process a plurality of data items that are stored in data repository 114 by causing first-stage slaves 106 A-N to process concurrently the data items.
  • each data item is a separate itemset.
  • the products of the first-stage slaves' processing are processed by causing the second-stage slaves to process concurrently those products.
  • database server 102 may process the products of the first-stage slaves' earlier processing by causing second-stage slaves 108 A-N to process concurrently the first-stage slaves' products.
  • each of second-stage slaves 108 A-N may store, in data repository 114 , data items that are the products of the second-stage slaves' processing.
  • query coordinator 110 may determine whether the recursive database operation is complete. If the recursive database operation is complete, then technique 200 ends. Alternatively, if the recursive database operation is not complete, then control passes back to block 206 .
  • the first-stage slaves will, in block 206 , process concurrently the products produced by the second-stage slaves during the then-previous iteration of the recursive database operation.
  • These products may be, for example, data items that second-stage slaves 108 A-N stored in data repository 114 .
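  • The iteration described above can be sketched as a loop in which each pass feeds the second stage's stored products back to the first stage. This is an illustrative sketch only: the function `run_recursive_operation`, the use of a thread pool, and the completion test (stop when the second stage stores nothing) are assumptions, and the two stages run one after the other here rather than overlapping.

```python
from concurrent.futures import ThreadPoolExecutor


def run_recursive_operation(seed_items, first_stage, second_stage, workers=4):
    """Repeat two-stage processing until the second stage stores nothing.

    first_stage(item)     -> iterable of intermediate products
    second_stage(product) -> a result to store, or None to discard it
    """
    repository = list(seed_items)  # stands in for the data repository
    all_results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while repository:  # assumed completion test: nothing left to process
            # First stage: process the repository contents concurrently.
            batches = pool.map(first_stage, repository)
            products = [p for batch in batches for p in batch]
            # Second stage: process the first stage's products concurrently.
            kept = [r for r in pool.map(second_stage, products) if r is not None]
            all_results.extend(kept)
            repository = kept  # input for the next iteration
    return all_results


# Toy operation: each number spawns two children; only values up to 6 are kept.
results = run_recursive_operation(
    [1],
    first_stage=lambda n: (2 * n, 2 * n + 1),
    second_stage=lambda p: p if p <= 6 else None,
)
```

  • In this toy run the loop ends on the third pass, because every product of that pass is discarded and the repository is left empty.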
  • At least some of the processing performed by first-stage slaves 106 A-N, second-stage slaves 108 A-N, or both involves determining whether data items satisfy specified criteria, and producing only those of the data items that satisfy the specified criteria.
  • second-stage slaves 108 A-N concurrently process data items produced by the processing of first-stage slaves 106 A-N by determining, for each particular data item, whether that particular data item satisfies specified criteria. In one embodiment, second-stage slaves 108 A-N select only the data items that satisfy the specified criteria, and store in data repository 114 only the selected data items. In one embodiment, second-stage slaves 108 A-N return, to an application or other set of processes or threads, only the selected data items. In one embodiment, because only the selected data items are stored in data repository 114 , first-stage slaves 106 A-N process only the selected data items during the next iteration of the recursive database operation.
  • the data items processed and produced by first-stage slaves 106 A-N and second-stage slaves 108 A-N are itemsets.
  • the processing performed by first-stage slaves 106 A-N involves determining candidate itemsets and counting how many of data structures 112 A-N contain all of the elements of each such candidate itemset.
  • The observation that an N-element itemset cannot be a frequent itemset relative to a set of data structures unless all of the (N−1)-element subsets of the N-element itemset are also frequent itemsets relative to that set of data structures allows the computationally intensive determination of whether itemsets are frequent to be performed for fewer itemsets during each successive iteration of a recursive frequent itemset-determining operation.
  • At least one of first-stage slaves 106 A-N does the following. At least one of first-stage slaves 106 A-N reads (N−1)-element itemsets from data repository 114 .
  • the (N−1)-element itemsets are, in one embodiment, frequent (N−1)-element itemsets that second-stage slaves 108 A-N stored in data repository 114 during a previous iteration of the operation.
  • At least one of first-stage slaves 106 A-N determines all of the different possible N-element itemsets that are subsets of an M-element itemset, where the M-element itemset includes one instance of every item that occurs in data structures 112 A-N.
  • For example, if “apple,” “banana,” “milk,” and “orange” are the only items that occur once or more in data structures 112 A-N, then the possible 2-element itemsets of that 4-element itemset are (apple, banana), (apple, milk), (apple, orange), (banana, milk), (banana, orange), and (milk, orange).
  • At least one of first-stage slaves 106 A-N determines, for each possible N-element itemset, whether all of the (N−1)-element subsets of that possible N-element itemset are frequent (N−1)-element itemsets that were read from data repository 114 . If all of the (N−1)-element subsets of a particular N-element itemset are frequent (N−1)-element itemsets, then at least one of first-stage slaves 106 A-N places the particular N-element itemset in a group of candidate N-element itemsets. Otherwise, the particular N-element itemset is not placed in the group of candidate N-element itemsets.
  • first-stage slaves 106 A-N do not determine candidate 1-element itemsets based on the contents of data repository 114 , because at that point, data repository 114 is empty. Instead, first-stage slaves 106 A-N assume that all of the 1-element subsets of the M-element itemset referenced above are candidate 1-element itemsets.
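  • The candidate-determination rule above, including the special case of the first iteration, can be sketched as follows. The function name `candidate_itemsets` and the sample set of frequent 2-element itemsets are hypothetical and chosen only to show the pruning.

```python
from itertools import combinations


def candidate_itemsets(all_items, frequent_prev, n):
    """Return the N-element itemsets whose (N-1)-element subsets are all
    in `frequent_prev` (the itemsets read from the repository)."""
    if n == 1:
        # First iteration: the repository is empty, so every single item
        # is assumed to be a candidate.
        return [frozenset([item]) for item in sorted(all_items)]
    candidates = []
    for combo in combinations(sorted(all_items), n):
        subsets = (frozenset(s) for s in combinations(combo, n - 1))
        if all(s in frequent_prev for s in subsets):
            candidates.append(frozenset(combo))
    return candidates


items = {"apple", "banana", "milk", "orange"}

# Hypothetical output of the previous iteration (frequent 2-element itemsets).
frequent_pairs = {
    frozenset({"apple", "banana"}),
    frozenset({"apple", "milk"}),
    frozenset({"banana", "milk"}),
    frozenset({"apple", "orange"}),
}

singles = candidate_itemsets(items, set(), 1)
triples = candidate_itemsets(items, frequent_pairs, 3)
```

  • With these inputs only (apple, banana, milk) survives as a candidate 3-element itemset, since every other 3-element combination has at least one 2-element subset that is not frequent.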
  • only one of first-stage slaves 106 A-N determines the group of candidate N-element itemsets as discussed above. In one embodiment, after making this determination, that first-stage slave sends, to the others of first-stage slaves 106 A-N, the candidate N-element itemsets.
  • a group of candidate N-element itemsets is generated based on the contents of data repository 114 , and the contents of that group are obtained by each of first-stage slaves 106 A-N. Once the group of candidate N-element itemsets is known, the occurrences of those candidate N-element itemsets in data structures 112 A-N can be counted. Occurrences of non-candidate N-element itemsets do not need to be counted.
  • each of first-stage slaves 106 A-N is assigned a separate subset of data structures 112 A-N. For example, if there were 10 first-stage slaves 106 A-N and 100 data structures 112 A-N, then each of first-stage slaves 106 A-N might be assigned 10 separate data structures of data structures 112 A-N.
  • the divvying of data structures 112 A-N among first-stage slaves 106 A-N might be done according to range, for example, or according to a hash mapping, for another example.
  • Each of data structures 112 A-N may indicate an identifier that is unique to that data structure.
  • a particular data structure's identifier may be used to determine the range in which the particular data structure belongs.
  • a particular data structure's identifier may be used as input to a hash function that produces a hash value to which the particular data structure corresponds. For example, if there were 10 first-stage slaves 106 A-N and 100 data structures 112 A-N, then the hash function might, for each data structure, divide that data structure's identifier by 10 and take the whole remainder resulting from that division to be the hash value associated with that data structure.
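  • The two ways of dividing data structures among first-stage slaves can be sketched as below. The function names and the use of consecutive integer identifiers 0 through 99 are assumptions for illustration; the hash variant uses the remainder rule from the example above.

```python
def assign_by_hash(identifiers, num_slaves):
    """Assign each identifier to the slave numbered (identifier mod num_slaves)."""
    partitions = [[] for _ in range(num_slaves)]
    for ident in identifiers:
        partitions[ident % num_slaves].append(ident)
    return partitions


def assign_by_range(identifiers, num_slaves):
    """Assign contiguous runs of sorted identifiers to each slave."""
    ordered = sorted(identifiers)
    size = -(-len(ordered) // num_slaves)  # ceiling division
    return [ordered[i * size:(i + 1) * size] for i in range(num_slaves)]


# Assumed identifiers: 100 data structures numbered 0..99, 10 slaves.
ids = list(range(100))
by_hash = assign_by_hash(ids, 10)
by_range = assign_by_range(ids, 10)
```

  • Either way each of the 10 slaves receives 10 data structures; the hash variant gives slave 3 the identifiers 3, 13, 23, and so on, while the range variant gives slave 0 the identifiers 0 through 9.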
  • each of first-stage slaves 106 A-N counts, concurrently with each other of first-stage slaves 106 A-N, the occurrences of the candidate N-element itemsets in data structures in the subset of data structures 112 A-N that has been assigned to that first-stage slave. In one embodiment, because multiple slaves count occurrences of the candidate N-element itemsets in parallel, the time taken to count all such occurrences is reduced.
  • the contents of data structures 112 A-C assigned to first-stage slave 106 A might be (apple, banana, milk), (apple, banana, milk, orange), and (orange), respectively
  • the contents of data structures 112 D-F assigned to first-stage slave 106 B might be (banana, milk, orange), (apple, milk, orange), and (apple, banana, orange), respectively.
  • first-stage slave 106 A would count 2 occurrences of (apple, banana), 2 occurrences of (apple, milk), 1 occurrence of (apple, orange), 2 occurrences of (banana, milk), 1 occurrence of (banana, orange), and 1 occurrence of (milk, orange).
  • first-stage slave 106 B would count 1 occurrence of (apple, banana), 1 occurrence of (apple, milk), 2 occurrences of (apple, orange), 1 occurrence of (banana, milk), 2 occurrences of (banana, orange), and 2 occurrences of (milk, orange).
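  • The per-slave counts in this example can be reproduced with a short sketch. It uses a plain subset test per data structure, which is an assumption made for brevity and not necessarily the counting technique an embodiment would use; the names `count_candidates`, `slave_a`, and `slave_b` are hypothetical.

```python
from itertools import combinations


def count_candidates(candidates, assigned):
    """For each candidate itemset, count the assigned data structures
    that contain all of its items."""
    return {c: sum(1 for t in assigned if c <= t) for c in candidates}


items = ["apple", "banana", "milk", "orange"]
candidates = [frozenset(pair) for pair in combinations(items, 2)]

# Data structures 112 A-C (first slave) and 112 D-F (second slave).
slave_a = [
    {"apple", "banana", "milk"},
    {"apple", "banana", "milk", "orange"},
    {"orange"},
]
slave_b = [
    {"banana", "milk", "orange"},
    {"apple", "milk", "orange"},
    {"apple", "banana", "orange"},
]

counts_a = count_candidates(candidates, slave_a)
counts_b = count_candidates(candidates, slave_b)
```

  • Because the two calls touch disjoint subsets of the data structures, they could run at the same time on separate slaves without coordination.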
  • the counting of occurrences of candidate N-element itemsets is performed using either a “bitmap intersection” technique or a “prefix tree counting” technique. Both of these techniques are specifically described in co-pending U.S. patent application Ser. No. 10/643,563, titled “DYNAMIC SELECTION OF FREQUENT ITEMSET COUNTING TECHNIQUE.”
  • the technique that is used to count occurrences of candidate N-element itemsets is dynamically determined at each iteration of the operation. Thus, the counting technique used during one iteration of the operation may differ from the counting technique used during another iteration of the operation. Techniques for dynamically selecting counting techniques also are described in U.S. patent application Ser. No. 10/643,563.
  • each of first-stage slaves 106 A-N counts the total number of data structures in the subset of data structures 112 A-N assigned to that first-stage slave.
  • each particular first-stage slave of first-stage slaves 106 A-N sends, to one or more of second-stage slaves 108 A-N, at least the following type of information: (a) one or more of the candidate N-element itemsets, (b) a count of occurrences of those candidate N-element itemsets in those of data structures 112 A-N assigned to the particular first-stage slave, and (c) a count of the total number of those of data structures 112 A-N assigned to the particular first-stage slave.
  • each of first-stage slaves 106 A-N sends the above type of information to multiple second-stage slaves of second-stage slaves 108 A-N.
  • all of the counts associated with a particular candidate N-element itemset are sent to the same second-stage slave, regardless of which first-stage slave determined the counts. However, counts associated with different N-element itemsets may be sent to different second-stage slaves.
  • each of first-stage slaves 106 A-N may send counts to various ones of second-stage slaves 108 A-N in the following manner: counts associated with (apple, banana) may be sent to second-stage slave 108 A, counts associated with (apple, milk) may be sent to second-stage slave 108 B, counts associated with (apple, orange) may be sent to second-stage slave 108 C, counts associated with (banana, milk) may be sent to second-stage slave 108 D, counts associated with (banana, orange) may be sent to second-stage slave 108 E, and counts associated with (milk, orange) may be sent to second-stage slave 108 F.
  • counts associated with (apple, banana) and counts associated with (apple, milk) may be sent to second-stage slave 108 A
  • counts associated with (apple, orange) and counts associated with (banana, milk) may be sent to second-stage slave 108 B
  • counts associated with (banana, orange) and counts associated with (milk, orange) may be sent to second-stage slave 108 C.
  • the candidate itemsets are divvied among second-stage slaves 108 A-N in as balanced a manner as possible, so that each of second-stage slaves 108 A-N receives counts associated with approximately the same number of candidate itemsets during a particular iteration of the operation.
  • Candidate itemsets may be divvied among second-stage slaves 108 A-N through a hash-mapping technique, for example.
  • the elements of a particular candidate itemset may be enumerated, combined, and input into a hash function.
  • Counts associated with the particular candidate itemset may be sent to the second-stage slave that is associated with the hash value produced by the hash function.
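  • One way to sketch the hash-mapping step is shown below. The choice of CRC-32 as the hash function, the sorting of elements into a canonical order, and the function name `target_slave` are all assumptions; the text only says that the elements are enumerated, combined, and hashed.

```python
import zlib


def target_slave(itemset, num_second_stage):
    """Map a candidate itemset to the index of a second-stage slave.

    Elements are sorted first so that every first-stage slave derives
    the same key for the same itemset, whatever order it holds them in.
    """
    key = "\x1f".join(sorted(itemset)).encode("utf-8")
    return zlib.crc32(key) % num_second_stage


slave_for_ab = target_slave({"apple", "banana"}, 3)
slave_for_ba = target_slave(["banana", "apple"], 3)
```

  • Since the mapping depends only on the itemset, every first-stage slave routes counts for a given candidate to the same second-stage slave. A built-in per-process randomized hash would not give that guarantee across separate processes, which is why a fixed checksum is used in this sketch.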
  • the processing performed by second-stage slaves 108 A-N involves aggregating preliminary counts received from first-stage slaves 106 A-N, and selecting one or more candidate N-element itemsets based on whether aggregate counts for those itemsets meet a specified threshold.
  • each of second-stage slaves 108 A-N receives, from first-stage slaves 106 A-N, one or more preliminary occurrence counts respectively associated with one or more separate subsets of the candidate N-element itemsets. In one embodiment, each of second-stage slaves 108 A-N also receives, from each of first-stage slaves 106 A-N that sends a preliminary occurrence count to that second-stage slave, a count of the total number of data structures 112 A-N that were assigned to that first-stage slave.
  • each of first-stage slaves 106 A-N might determine, for 10 separate database structures of database structures 112 A-N, how many occurrences of each of the candidate 2-element itemsets are within those database structures.
  • second-stage slave 108 A might be associated with candidate 2-element itemsets (apple, banana) and (apple, milk).
  • each particular first-stage slave of first-stage slaves 106 A-N sends, to second-stage slave 108 A, information that indicates: the number of database structures that the particular first-stage slave evaluated (in this case, 10), the number of occurrences of (apple, banana) in the database structures that the particular first-stage slave evaluated, and the number of occurrences of (apple, milk) in the database structures that the particular first-stage slave evaluated.
  • each particular second-stage slave of second-stage slaves 108 A-N separately aggregates (i.e., adds up), for each of the candidate N-element itemsets with which the particular second-stage slave is associated, the preliminary counts that the particular second-stage slave receives from first-stage slaves 106 A-N.
  • second-stage slave 108 A might receive, from first-stage slave 106 A, a count of 2 for (apple, banana) and a count of 2 for (apple, milk).
  • second-stage slave 108 A might receive, from first-stage slave 106 B, a count of 1 for (apple, banana) and a count of 1 for (apple, milk). Therefore, in this example, second-stage slave 108 A would determine an aggregate count of 3 for (apple, banana) and an aggregate count of 3 for (apple, milk).
  • second-stage slaves 108 A-N concurrently determine aggregate counts for candidate N-element itemsets.
  • second-stage slave 108 A may aggregate counts for (apple, banana) and (apple, milk) at the same time that second-stage slave 108 B aggregates counts for (apple, orange) and (banana, milk), and at the same time that second-stage slave 108 C aggregates counts for (banana, orange) and (milk, orange).
  • second-stage slaves 108 A-N aggregate counts concurrently with first-stage slaves 106 A-N determining the counts and sending the counts to second-stage slaves 108 A-N.
  • each of second-stage slaves 108 A-N receives, from each of first-stage slaves 106 A-N that sends a preliminary occurrence count to that second-stage slave, a count of the total number of data structures 112 A-N that were assigned to that first-stage slave. In one embodiment, each of second-stage slaves 108 A-N aggregates each such total number of data structures to determine an aggregate total number of data structures 112 A-N.
  • the particular second-stage slave determines whether the aggregate count for the particular candidate N-element itemset is at least as great as a specified fraction of the aggregate total number of data structures 112 A-N.
  • the specified fraction is referred to herein as the “threshold.” If the aggregate count for the particular candidate N-element itemset is at least as great as the number derived from the specified threshold, then the particular second-stage slave determines that the particular candidate N-element itemset is a frequent N-element itemset relative to data structures 112 A-N.
  • second-stage slave 108 A might receive, from first-stage slave 106 A, an indication that first-stage slave 106 A evaluated 3 of database structures 112 A-N.
  • second-stage slave 108 A might receive, from first-stage slave 106 B, an indication that first-stage slave 106 B evaluated 3 of database structures 112 A-N. Assuming for purposes of example that these were the only “database structure totals” received from first-stage slaves 106 A-B, second-stage slave 108 A determines that the aggregate total number of data structures 112 A-N is 6 (i.e., 3+3).
  • second-stage slave 108 A determines that the fraction of data structures 112 A-N that contain (apple, banana) is 1/2 (i.e., 3/6). Continuing the example, if the aggregate count for (apple, milk) is 3, then second-stage slave 108 A determines that the fraction of data structures 112 A-N that contain (apple, milk) is also 1/2 (i.e., 3/6). Assuming for purposes of the example that the specified threshold is 1/3, second-stage slave 108 A determines that both (apple, banana) and (apple, milk) are frequent 2-element itemsets relative to data structures 112 A-N.
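  • The aggregation and threshold test performed by a single second-stage slave in this example can be sketched as follows. The message shape (a pair of per-itemset counts and a total of evaluated data structures) and the function name `select_frequent` are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction


def select_frequent(messages, threshold):
    """Aggregate preliminary counts and keep the itemsets whose share of
    all data structures is at least `threshold`.

    messages: iterable of (counts_by_itemset, structures_evaluated),
              one pair per first-stage slave.
    """
    aggregate = Counter()
    total = 0
    for counts, evaluated in messages:
        aggregate.update(counts)  # adds the counts itemset by itemset
        total += evaluated
    if total == 0:
        return {}
    shares = {s: Fraction(c, total) for s, c in aggregate.items()}
    return {s: share for s, share in shares.items() if share >= threshold}


ab = frozenset({"apple", "banana"})
am = frozenset({"apple", "milk"})

messages = [
    ({ab: 2, am: 2}, 3),  # from the first slave, which evaluated 3 structures
    ({ab: 1, am: 1}, 3),  # from the second slave, which evaluated 3 structures
]

frequent = select_frequent(messages, Fraction(1, 3))
```

  • Both itemsets end up with a share of 3/6 = 1/2, which is at least the threshold of 1/3, so both are selected.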
  • each of second-stage slaves 108 A-N determines, for each of the candidate N-element itemsets with which that second-stage slave is associated, whether that candidate N-element itemset is a frequent N-element itemset relative to data structures 112 A-N. In one embodiment, each of second-stage slaves 108 A-N selects, from among the candidate N-element itemsets with which that second-stage slave is associated, only the frequent N-element itemsets.
  • each of second-stage slaves 108 A-N stores, in data repository 114 , only the selected frequent N-element itemsets.
  • each of second-stage slaves 108 A-N sends, to one or more other entities that may include an application external to database server 102 , the selected frequent N-element itemsets.
  • each of second-stage slaves 108 A-N performs the above determination, selection, storage, and sending concurrently with each of the others of second-stage slaves 108 A-N.
  • one or more of first-stage slaves 106 A-N uses the frequent N-element itemsets stored in data repository 114 to determine candidate (N+1)-element itemsets during a next iteration of the operation.
  • the determinations of each successive iteration may be based upon the determinations of previous iterations.
  • each particular first-stage slave of first-stage slaves 106 A-N sends, to second-stage slaves 108 A-N, a message that indicates that the particular first-stage slave has finished evaluating the subset of database structures 112 A-N to which the particular first-stage slave was assigned.
  • a particular first-stage slave sends the message only after the particular first-stage slave has evaluated all of the database structures in the particular first-stage slave's assigned subset of data structures.
  • each of first-stage slaves 106 A-N waits, after sending such a message, to receive a signal from query coordinator 110 before proceeding to the next iteration of the operation.
  • When a particular second-stage slave of second-stage slaves 108 A-N has received such a message from each of first-stage slaves 106 A-N, the particular second-stage slave begins to determine, based on the aggregated counts and database structure total, which of the candidate N-element itemsets are frequent N-element itemsets. In one embodiment, second-stage slaves 108 A-N do not begin to perform this determination until such a message has been received from each of first-stage slaves 106 A-N.
  • When the particular second-stage slave has finished storing frequent N-element itemsets in data repository 114 , the particular second-stage slave sends a message to query coordinator 110 .
  • the message indicates to query coordinator 110 that the particular second-stage slave has finished.
  • second-stage slaves 108 A-N store frequent N-element itemsets in a particular segment of segments 116 A-B that has been designated as the “write” segment.
  • After query coordinator 110 has received a message from each of second-stage slaves 108 A-N indicating that those second-stage slaves have finished, query coordinator 110 swaps the designations of the “read” and “write” segments 116 A-B, so that the former “read” segment becomes the new “write” segment for the next iteration of the operation, and the former “write” segment becomes the new “read” segment for the next iteration of the operation. After the designations have been swapped, the newly designated “write” segment may be emptied.
  • After query coordinator 110 swaps the designations of the “read” and “write” segments, query coordinator 110 sends, to second-stage slaves 108 A-N, a reference to the newly designated “write” segment. In one embodiment, when second-stage slaves 108 A-N store frequent N-element itemsets in the “write” segment, second-stage slaves 108 A-N do so by writing the frequent N-element itemsets to a location based on the reference to the “write” segment.
  • After query coordinator 110 swaps the designations of the “read” and “write” segments, query coordinator 110 sends, to first-stage slaves 106 A-N, a reference to the newly designated “read” segment, which contains the frequent itemsets stored during the previous iteration of the operation.
  • each of first-stage slaves 106 A-N generates candidate (N+1)-element itemsets based on frequent N-element itemsets read from a location based on the reference to the “read” segment.
  • After sending the newly designated “read” segment reference to each of first-stage slaves 106 A-N as described above, query coordinator 110 sends, to each of first-stage slaves 106 A-N, a signal that informs first-stage slaves 106 A-N that first-stage slaves 106 A-N may proceed with the next iteration of the operation. In one embodiment, query coordinator 110 then waits to receive “finished” messages from each of second-stage slaves 108 A-N as described above.
  • first-stage slave determines (a) whether the currently designated “write” segment of segments 116 A-B is empty and (b) whether “N” is greater than “M” as used in the above context of N-element itemsets and the M-element itemset. In one embodiment, if the currently designated “write” segment is empty (meaning that there are no candidate (N+1)-element itemsets for the next iteration of the operation), or if “N” is greater than “M,” then the recursive operation is ended. Otherwise, “N” is incremented and the next iteration of the recursive operation is performed as described above.
  • FIG. 3 is a block diagram that illustrates a computer system 300 upon which an embodiment of the invention may be implemented.
  • Computer system 300 includes a bus 302 or other communication mechanism for communicating information, and a processor 304 coupled with bus 302 for processing information.
  • Computer system 300 also includes a main memory 306 , such as a random access memory (RAM) or other dynamic storage device, coupled to bus 302 for storing information and instructions to be executed by processor 304 .
  • Main memory 306 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 304 .
  • Computer system 300 further includes a read only memory (ROM) 308 or other static storage device coupled to bus 302 for storing static information and instructions for processor 304 .
  • a storage device 310 such as a magnetic disk or optical disk, is provided and coupled to bus 302 for storing information and instructions.
  • Computer system 300 may be coupled via bus 302 to a display 312 , such as a cathode ray tube (CRT), for displaying information to a computer user.
  • An input device 314 is coupled to bus 302 for communicating information and command selections to processor 304 .
  • Another type of user input device is cursor control 316 , such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 304 and for controlling cursor movement on display 312 .
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • the invention is related to the use of computer system 300 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 300 in response to processor 304 executing one or more sequences of one or more instructions contained in main memory 306 . Such instructions may be read into main memory 306 from another computer-readable medium, such as storage device 310 . Execution of the sequences of instructions contained in main memory 306 causes processor 304 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 310 .
  • Volatile media includes dynamic memory, such as main memory 306 .
  • Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 302 . Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
  • Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 304 for execution.
  • the instructions may initially be carried on a magnetic disk of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to computer system 300 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 302 .
  • Bus 302 carries the data to main memory 306 , from which processor 304 retrieves and executes the instructions.
  • the instructions received by main memory 306 may optionally be stored on storage device 310 either before or after execution by processor 304 .
  • Computer system 300 also includes a communication interface 318 coupled to bus 302 .
  • Communication interface 318 provides a two-way data communication coupling to a network link 320 that is connected to a local network 322 .
  • communication interface 318 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line.
  • communication interface 318 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links may also be implemented.
  • communication interface 318 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 320 typically provides data communication through one or more networks to other data devices.
  • network link 320 may provide a connection through local network 322 to a host computer 324 or to data equipment operated by an Internet Service Provider (ISP) 326 .
  • ISP 326 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 328 .
  • Internet 328 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on network link 320 and through communication interface 318 which carry the digital data to and from computer system 300 , are exemplary forms of carrier waves transporting the information.
  • Computer system 300 can send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318 .
  • a server 330 might transmit a requested code for an application program through Internet 328 , ISP 326 , local network 322 and communication interface 318 .
  • the received code may be executed by processor 304 as it is received, and/or stored in storage device 310 , or other non-volatile storage for later execution. In this manner, computer system 300 may obtain application code in the form of a carrier wave.

Abstract

A method and apparatus for performing recursive database operations is provided. According to one aspect, a plurality of first-stage slaves and a plurality of second-stage slaves are established in a database server. During one or more iterations of a recursive database operation, the first-stage slaves concurrently process data items stored in a data repository and send results to the second-stage slaves. The second-stage slaves receive the results and concurrently process those results. The second-stage slaves store the results of the second-stage slaves' processing in the data repository. Subsequent iterations of the recursive database operation proceed in this manner until the recursive database operation has been completed. In each iteration, the first-stage slaves consume the product of the second-stage slaves' previous iteration's processing, and the second-stage slaves consume the product of the first-stage slaves' current iteration's processing.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority under 35 U.S.C. §120 as a continuation of U.S. patent application Ser. No. 10/867,923, filed on Jun. 14, 2004, now U.S. Pat. No. 7,155,446, which application lists as inventors Thierry Cruanes, Wei Li, Ari Mozes and Benoit Dageville, which application is titled PERFORMING RECURSIVE DATABASE OPERATIONS, and which application claims domestic priority to provisional U.S. Patent Application Ser. No. 60/571,441, entitled PERFORMING RECURSIVE DATABASE OPERATIONS, filed May 14, 2004; the contents of both of which applications are hereby incorporated by reference in their entirety for all purposes. This application is also related to the following U.S. patent applications: Ser. No. 10/643,629, entitled FREQUENT ITEMSET COUNTING USING CLUSTERED PREFIXES AND INDEX SUPPORT, filed on Aug. 18, 2003; Ser. No. 10/643,563, entitled DYNAMIC SELECTION OF FREQUENT ITEMSET COUNTING TECHNIQUE, filed on Aug. 18, 2003; and Ser. No. 10/643,628, entitled EXPRESSING FREQUENT ITEMSET COUNTING OPERATIONS, filed on Aug. 18, 2003; the contents of each of which are hereby incorporated by reference in their entirety for all purposes.
FIELD OF THE INVENTION
The present invention relates to databases, and in particular, to performing recursive database operations.
BACKGROUND
Some computational tasks are especially suitable for recursive processing. One example of a computational task that lends itself especially well to recursive processing is “frequent itemset” determination. An “itemset” is a set of items. For example, one itemset might include the items (apple, banana), while another itemset might include the items (apple, orange), while yet another itemset might include the items (banana, orange). An itemset is “frequent”, relative to a set of data structures, if the number of the data structures that contain all of the items in the itemset is at least a specified fraction of the total number of the data structures in the set.
For example, in a set of three data structures, each data structure might represent a different customer's transaction at a supermarket. A first data structure might contain the items (apple, banana, milk), while a second data structure might contain the items (apple, banana, milk, orange), while a third data structure might contain the item (orange). Assuming that the specified fraction is ⅔, the itemset (apple, banana) is a frequent itemset because “apple” occurs with “banana” in two of the three data structures, but the itemsets (apple, orange) and (banana, orange) are not frequent itemsets because “apple” occurs with “orange” in only one of the three data structures and “banana” occurs with “orange” in only one of the three data structures. As the number of data structures in a set of data structures increases, the determination of whether a particular itemset is frequent relative to that set of data structures becomes more computationally intensive.
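The supermarket example above can be checked with a short sketch. The function names `support` and `is_frequent` are illustrative assumptions and do not come from the specification; each data structure is modeled simply as a Python set of items.

```python
from fractions import Fraction


def support(itemset, data_structures):
    """Fraction of data structures that contain every item in `itemset`."""
    wanted = set(itemset)
    containing = sum(1 for ds in data_structures if wanted <= set(ds))
    return Fraction(containing, len(data_structures))


def is_frequent(itemset, data_structures, threshold):
    """An itemset is frequent if its support is at least the threshold."""
    return support(itemset, data_structures) >= threshold


# The three transactions from the example above.
transactions = [
    {"apple", "banana", "milk"},
    {"apple", "banana", "milk", "orange"},
    {"orange"},
]
threshold = Fraction(2, 3)

apple_banana = is_frequent(("apple", "banana"), transactions, threshold)
apple_orange = is_frequent(("apple", "orange"), transactions, threshold)
banana_orange = is_frequent(("banana", "orange"), transactions, threshold)
```

Exact fractions are used here so that a support of exactly ⅔ compares equal to the threshold without floating-point rounding.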
Frequent itemset determination lends itself especially well to recursive processing due at least in part to the observation that an N-element itemset cannot be a frequent itemset relative to a set of data structures unless all of the (N−1)-element subsets of the N-element itemset are also frequent itemsets relative to that set of data structures. For example, the 3-element itemset (apple, banana, milk) cannot be a frequent itemset relative to the set of data structures in the above example unless all of the 2-element subsets of that 3-element itemset, namely, (apple, banana), (apple, milk), and (banana, milk), are also frequent itemsets relative to the set of data structures in the above example.
This observation allows the computationally intensive determination of whether itemsets are frequent to be performed for fewer itemsets. The determination of whether a particular N-element itemset is frequent needs to be performed only if all of the (N−1)-element subsets of the particular N-element itemset are also frequent. Thus, for each successive value of N, the group of N-element itemsets for which this determination needs to be performed can be based on the determinations already performed for the (N−1)-element itemsets. Frequent itemset counting is, therefore, a task that can be performed more efficiently using a recursive approach.
According to one theoretical approach, frequent itemsets might be determined in the following manner. An application that is external to a database server might send a query to the database server. When executed, the query would cause the database server to select, from a set of data structures, each data structure that contains all of the items in a specified itemset. The database server would execute the query and return the selected data structures to the application. The application might count the selected data structures and determine whether the number of selected data structures meets a specified threshold. If the number of selected data structures met the specified threshold, then the application might place the specified itemset in a set of frequent itemsets. The application might perform the above steps for each 1-element itemset that is a subset of an M-element itemset, one 1-element itemset at a time, and one 1-element itemset after another.
Once the application had performed the above steps for each such 1-element itemset, the application might determine, for each particular 2-element subset of the M-element itemset, whether all of the 1-element subsets of that particular 2-element subset are contained in the set of frequent itemsets. If all of the 1-element subsets of the particular 2-element subset were contained in the group of frequent itemsets, then the application might send, to the database server, a query that would cause the database server to select, from the set of data structures, each data structure that contains all of the items in the particular 2-element itemset. The database server would execute the query and return the selected data structures to the application. The application might count the selected data structures and determine whether the number of selected data structures meets the specified threshold. If the number of selected data structures met the specified threshold, then the application might place the particular 2-element itemset in the set of frequent itemsets. The application might perform the above steps for each 2-element itemset that is a subset of the M-element itemset, one 2-element itemset at a time, and one 2-element itemset after another.
For each successive value of N, the application might perform the above steps for the N-element itemsets that are subsets of the M-element itemset until N was greater than M or there were no (N−1)-element itemsets in the set of frequent itemsets, whichever came first. Thus, by sending a multitude of queries to a database server in serial manner and counting the results of such queries, the application might determine frequent itemsets that are subsets of the M-element itemset.
Unfortunately, considerable overhead would be involved in the above approach. It would take significant time for the application to send the many queries to the database server and for the database server to send the results of the many queries back to the application.
Furthermore, because most of the operations performed in the above approach would be performed by the application (the database server would just execute queries and return the results), application programmers would be burdened with implementing the functionality required to perform most of the operations involved in the above approach.
These are some of the problems that would attend the above approach. A technique that overcomes these problems is needed.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
FIG. 1 is a block diagram that illustrates a system in which recursive database operations may be performed in a parallelized manner, according to an embodiment of the present invention;
FIG. 2 is a flow diagram that illustrates a technique for performing a recursive database operation using two stages of concurrently executing slaves, according to an embodiment of the present invention; and
FIG. 3 is a block diagram that illustrates a computer system upon which an embodiment of the invention may be implemented.
DETAILED DESCRIPTION
A method and apparatus is described for performing recursive database operations. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Overview
In order to perform recursive database operations more efficiently, according to one embodiment of the invention, a plurality of first-stage slaves and a plurality of second-stage slaves are established in a database server. In a first iteration of a recursive database operation, the first-stage slaves concurrently process data items and send the results of the first-stage slaves' processing to the second-stage slaves. The second-stage slaves receive the first results of the first-stage slaves' processing and concurrently process those first results. The second-stage slaves store the first results of the second-stage slaves' processing in a data repository.
In a second iteration of the recursive database operation, the first-stage slaves obtain the first results of the second-stage slaves' processing from the data repository, concurrently process those first results, and send the second results of the first-stage slaves' processing to the second-stage slaves. The second-stage slaves receive the second results of the first-stage slaves' processing and concurrently process those second results. The second-stage slaves store the second results of the second-stage slaves' processing in the data repository.
Subsequent iterations of the recursive database operation proceed in this manner until the recursive database operation has been completed. In each iteration, the first-stage slaves consume the product of the second-stage slaves' processing during the previous iteration, and the second-stage slaves consume the product of the first-stage slaves' processing during the current iteration.
Because the first-stage slaves and second-stage slaves are implemented in the database server, the above embodiment makes it unnecessary for an application to send multiple queries to and receive multiple results from the database server during the performance of the recursive database operation. Additionally, application programmers are spared the burden of programming applications to perform the processing that is performed by the first-stage slaves and second-stage slaves.
Multi-Slave Database Server
FIG. 1 is a block diagram that illustrates a system 100 in which recursive database operations may be performed in a parallelized manner, according to an embodiment of the present invention. System 100 comprises a database server 102 and a database 104. Database server 102 is communicatively coupled to database 104.
Database server 102 comprises a plurality of first-stage slaves 106A-N and a plurality of second-stage slaves 108A-N. Each of first-stage slaves 106A-N may be a separate thread of a first process, and each of second-stage slaves 108A-N may be a separate thread of a second process. Alternatively, each of first-stage slaves 106A-N and second-stage slaves 108A-N may be a separate thread of the same process. Alternatively, each of first-stage slaves 106A-N and each of second-stage slaves 108A-N may be a separate process. First-stage slaves 106A-N execute concurrently with each other and each of second-stage slaves 108A-N.
Database server 102 also comprises a query coordinator 110. Query coordinator 110 is a process that executes concurrently with first-stage slaves 106A-N and second-stage slaves 108A-N. Query coordinator 110 sends messages to, and receives messages from, first-stage slaves 106A-N and second-stage slaves 108A-N. Such messages may be sent and received using inter-process communication mechanisms, for example.
Database 104 comprises data structures 112A-N. Each of data structures 112A-N may be, for example, a separate row of a database table. In one embodiment, each of data structures 112A-N contains one or more items. For example, one of data structures 112A-N might contain items such as “apple,” “banana,” and “orange.” Database 104 also comprises a data repository 114. Data repository 114 may be, for example, a temporary database table.
In one embodiment, data repository 114 comprises two separate segments 116A and 116B. One of segments 116A-B is designated as a “read” segment, and one of segments 116A-B is designated as a “write” segment. In one embodiment, query coordinator 110 maintains state information that indicates which of segments 116A-B is currently designated as the “read” segment, and which of segments 116A-B is currently designated as the “write” segment. The designations of segments 116A-B can be swapped.
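The two-segment arrangement can be pictured with a minimal sketch. The class name `DataRepository`, its attributes, and the choice to empty the new “write” segment inside `swap` are assumptions made for illustration; the specification only says that the designations can be swapped and that the newly designated “write” segment may be emptied afterward.

```python
class DataRepository:
    """Two segments whose 'read' and 'write' designations can be swapped."""

    def __init__(self):
        # Keys mirror reference numerals 116A and 116B for readability only.
        self.segments = {"116A": [], "116B": []}
        self.read_name = "116A"
        self.write_name = "116B"

    @property
    def read_segment(self):
        return self.segments[self.read_name]

    @property
    def write_segment(self):
        return self.segments[self.write_name]

    def swap(self):
        """Swap designations, then empty the newly designated write segment."""
        self.read_name, self.write_name = self.write_name, self.read_name
        self.write_segment.clear()


repo = DataRepository()
repo.write_segment.append(("apple",))   # results written during one iteration
repo.swap()                             # become readable in the next iteration
readable = list(repo.read_segment)
```

Keeping the designation as state separate from the segments themselves means a swap moves no data; only the two names change.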
Dual-Stage Parallelized Performance of Recursive Database Operations
FIG. 2 is a flow diagram that illustrates a technique 200 for performing a recursive database operation using two stages of concurrently executing slaves, according to an embodiment of the present invention. In block 202, a plurality of first-stage slaves is established in a database server. For example, first-stage slaves 106A-N may be established in database server 102. In block 204, a plurality of second-stage slaves is established in the database server. For example, second-stage slaves 108A-N may be established in database server 102.
An iteration of the recursive database operation is represented in blocks 206-212. In block 206, data items in a data repository are processed by causing the first-stage slaves to process concurrently the data items. For example, database server 102 may process a plurality of data items that are stored in data repository 114 by causing first-stage slaves 106A-N to process concurrently the data items. In one embodiment, each data item is a separate itemset.
In block 208, the products of the first-stage slaves' processing are processed by causing the second-stage slaves to process concurrently those products. For example, database server 102 may process the products of the first-stage slaves' earlier processing by causing second-stage slaves 108A-N to process concurrently the first-stage slaves' products.
In block 210, the products of the second-stage slaves' processing are stored in the data repository. For example, each of second-stage slaves 108A-N may store, in data repository 114, data items that are the products of the second-stage slaves' processing.
In block 212, it is determined whether the recursive database operation is complete. For example, query coordinator 110 may determine whether the recursive database operation is complete. If the recursive database operation is complete, then technique 200 ends. Alternatively, if the recursive database operation is not complete, then control passes back to block 206.
Thus, if the recursive database operation is not complete, then, during a next iteration of the recursive database operation, the first-stage slaves will, in block 206, process concurrently the products produced by the second-stage slaves during the then-previous iteration of the recursive database operation. These products may be, for example, data items that second-stage slaves 108A-N stored in data repository 114.
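The control flow of blocks 206-212 can be sketched as a loop over two worker pools. This is a simplified illustration under stated assumptions: thread pools stand in for the slaves, a Python list stands in for data repository 114, the second stage starts only after the first stage has finished an iteration, and the names `run_recursive_operation`, `double`, and `keep_below_20` are hypothetical. The toy stage functions are not the itemset processing described elsewhere in this description.

```python
from concurrent.futures import ThreadPoolExecutor


def run_recursive_operation(initial_items, first_stage, second_stage,
                            is_complete, workers=4):
    """Iterate blocks 206-212 until `is_complete(repository)` is true."""
    repository = list(initial_items)   # stands in for data repository 114
    produced = []                      # every product stored by stage two
    with ThreadPoolExecutor(max_workers=workers) as first_pool:
        with ThreadPoolExecutor(max_workers=workers) as second_pool:
            while True:
                # Block 206: first-stage workers process the repository items.
                first_out = [p for out in first_pool.map(first_stage, repository)
                             for p in out]
                # Block 208: second-stage workers process stage-one products.
                second_out = [p for out in second_pool.map(second_stage, first_out)
                              for p in out]
                # Block 210: store stage-two products for the next iteration.
                repository = second_out
                produced.extend(second_out)
                # Block 212: decide whether the operation is complete.
                if is_complete(repository):
                    return produced


def double(x):
    """Toy first stage: emit one product per input item."""
    return [x * 2]


def keep_below_20(x):
    """Toy second stage: keep only products that satisfy a criterion."""
    return [x] if x < 20 else []


results = run_recursive_operation([1, 3, 5], double, keep_below_20,
                                  is_complete=lambda repo: not repo)
```

Each stage function returns a list so that a stage can emit zero, one, or several products per input, which is how the toy second stage filters out items.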
Selecting Criteria-Satisfying Data Items
In one embodiment of the invention, at least some of the processing performed by first-stage slaves 106A-N, second-stage slaves 108A-N, or both, involves determining whether data items satisfy specified criteria, and producing only those of the data items that satisfy the specified criteria.
In one embodiment, second-stage slaves 108A-N concurrently process data items produced by the processing of first-stage slaves 106A-N by determining, for each particular data item, whether that particular data item satisfies specified criteria. In one embodiment, second-stage slaves 108A-N select only the data items that satisfy the specified criteria, and store in data repository 114 only the selected data items. In one embodiment, second-stage slaves 108A-N return, to an application or other set of processes or threads, only the selected data items. In one embodiment, because only the selected data items are stored in data repository 114, first-stage slaves 106A-N process only the selected data items during the next iteration of the recursive database operation.
Determining Candidate Itemsets
In one embodiment, the data items processed and produced by first-stage slaves 106A-N and second-stage slaves 108A-N are itemsets. In one embodiment, the processing performed by first-stage slaves 106A-N involves determining candidate itemsets and counting how many of data structures 112A-N contain all of the elements of each such candidate itemset.
As is discussed above, the observation that an N-element itemset cannot be a frequent itemset relative to a set of data structures unless all of the (N−1)-element subsets of the N-element itemset are also frequent itemsets relative to that set of data structures allows the computationally intensive determination of whether itemsets are frequent to be performed for fewer itemsets during each successive iteration of a recursive frequent itemset-determining operation.
Accordingly, in one embodiment, during an iteration of the recursive frequent itemset-determining operation, at least one of first-stage slaves 106A-N does the following. At least one of first-stage slaves 106A-N reads (N−1)-element itemsets from data repository 114. The (N−1)-element itemsets are, in one embodiment, frequent (N−1)-element itemsets that second-stage slaves 108A-N stored in data repository 114 during a previous iteration of the operation. At least one of first-stage slaves 106A-N determines all of the different possible N-element itemsets that are subsets of an M-element itemset, where the M-element itemset includes one instance of every item that occurs in data structures 112A-N.
For example, if items (apple, banana, milk, orange) are the only items that occur once or more in data structures 112A-N, then the possible 2-element itemsets of that 4-element itemset are (apple, banana), (apple, milk), (apple, orange), (banana, milk), (banana, orange), and (milk, orange).
In one embodiment, for each such possible N-element subset, at least one of first-stage slaves 106A-N determines whether all of the (N−1)-element subsets of that possible N-element subset are frequent (N−1)-element itemsets that were read from data repository 114. If all of the (N−1)-element subsets of a particular N-element itemset are frequent (N−1)-element itemsets, then at least one of first-stage slaves 106A-N places the particular N-element itemset in a group of candidate N-element itemsets. Otherwise, the particular N-element itemset is not placed in the group of candidate N-element itemsets.
For example, if (apple), (banana), and (orange) are the only frequent 1-element itemsets relative to data structures 112A-N, then (apple, banana), (apple, orange), and (banana, orange) might be frequent 2-element itemsets relative to data structures 112A-N, and are therefore candidate 2-element itemsets. However, in this example, (apple, milk), (banana, milk), and (milk, orange) cannot be frequent 2-element itemsets relative to data structures 112A-N because (milk) is not a frequent 1-element itemset relative to data structures 112A-N. Consequently, the determinations of whether (apple, milk), (banana, milk), and (milk, orange) actually are frequent 2-element itemsets relative to data structures 112A-N do not need to be performed.
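The candidate determination described above can be sketched as follows. The function name `candidate_itemsets` and the tuple representation of itemsets are assumptions for illustration; the sketch enumerates every possible N-element subset of the M-element itemset and keeps those whose (N−1)-element subsets are all frequent.

```python
from itertools import combinations


def candidate_itemsets(all_items, frequent_prev, n):
    """Return N-element itemsets whose (N-1)-element subsets are all frequent.

    all_items:     every item occurring in the data structures (the M-element itemset)
    frequent_prev: frequent (N-1)-element itemsets read from the repository
    """
    frequent_prev = {frozenset(s) for s in frequent_prev}
    candidates = []
    for itemset in combinations(sorted(all_items), n):
        subsets = combinations(itemset, n - 1)
        if all(frozenset(sub) in frequent_prev for sub in subsets):
            candidates.append(itemset)
    return candidates


# The example above: (milk) is not a frequent 1-element itemset.
items = {"apple", "banana", "milk", "orange"}
frequent_1 = [("apple",), ("banana",), ("orange",)]
candidates_2 = candidate_itemsets(items, frequent_1, 2)
```

With these inputs, the three itemsets containing milk are excluded, so their occurrences never need to be counted.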
In one embodiment, during the initial iteration of the operation, first-stage slaves 106A-N do not determine candidate 1-element itemsets based on the contents of data repository 114, because at that point, data repository 114 is empty. Instead, first-stage slaves 106A-N assume that all of the 1-element subsets of the M-element itemset referenced above are candidate 1-element itemsets.
In one embodiment, only one of first-stage slaves 106A-N determines the group of candidate N-element itemsets as discussed above. In one embodiment, after making this determination, that first-stage slave sends, to the others of first-stage slaves 106A-N, the candidate N-element itemsets.
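The candidate-determination rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by first-stage slaves 106A-N: the function name candidate_itemsets and its parameters are hypothetical, and an in-memory list stands in for the frequent itemsets read from data repository 114.

```python
from itertools import combinations

def candidate_itemsets(all_items, frequent_prev, n):
    """Return the N-element subsets of all_items whose (N-1)-element
    subsets are all present in frequent_prev (hypothetical helper)."""
    ordered = sorted(all_items)
    if n == 1:
        # Initial iteration: the repository is empty, so every
        # 1-element subset is assumed to be a candidate.
        return [(item,) for item in ordered]
    frequent = {frozenset(s) for s in frequent_prev}
    candidates = []
    for possible in combinations(ordered, n):
        # Keep the possible itemset only if every (N-1)-element subset is frequent.
        if all(frozenset(sub) in frequent for sub in combinations(possible, n - 1)):
            candidates.append(possible)
    return candidates

items = ["apple", "banana", "milk", "orange"]
frequent_1 = [("apple",), ("banana",), ("orange",)]
candidates_2 = candidate_itemsets(items, frequent_1, 2)
print(candidates_2)
# [('apple', 'banana'), ('apple', 'orange'), ('banana', 'orange')]
```

As in the example above, the three 2-element itemsets that contain milk are never generated as candidates, so their occurrences never need to be counted.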
Concurrently Counting Occurrences of Candidate Itemsets
As is discussed above, in one embodiment, a group of candidate N-element itemsets is generated based on the contents of data repository 114, and the contents of that group are obtained by each of first-stage slaves 106A-N. Once the group of candidate N-element itemsets is known, the occurrences of those candidate N-element itemsets in data structures 112A-N can be counted. Occurrences of non-candidate N-element itemsets do not need to be counted.
In order to count these occurrences more efficiently, in one embodiment, the counting is performed concurrently, or in parallel, by first-stage slaves 106A-N. Each of first-stage slaves 106A-N may be assigned a separate subset of data structures 112A-N. For example, if there were 10 first-stage slaves 106A-N and 100 data structures 112A-N, then each of first-stage slaves 106A-N might be assigned 10 separate data structures of data structures 112A-N. The divvying of data structures 112A-N among first-stage slaves 106A-N might be done according to range, for example, or according to a hash mapping, for another example.
Each of data structures 112A-N may indicate an identifier that is unique to that data structure. A particular data structure's identifier may be used to determine the range in which the particular data structure belongs. Alternatively, a particular data structure's identifier may be used as input to a hash function that produces a hash value to which the particular data structure corresponds. For example, if there were 10 first-stage slaves 106A-N and 100 data structures 112A-N, then the hash function might, for each data structure, divide that data structure's identifier by 10 and take the whole remainder resulting from that division to be the hash value associated with that data structure.
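The two assignment schemes described above might be sketched as follows. The function names, the use of integer identifiers 0 through 99, and the numbering of slaves from 0 are assumptions made for illustration only.

```python
def assign_by_hash(structure_ids, num_slaves):
    """Assign each identifier to the slave given by the remainder of
    dividing the identifier by the number of slaves."""
    assignment = {slave: [] for slave in range(num_slaves)}
    for structure_id in structure_ids:
        assignment[structure_id % num_slaves].append(structure_id)
    return assignment

def assign_by_range(structure_ids, num_slaves):
    """Assign contiguous ranges of sorted identifiers to each slave."""
    ordered = sorted(structure_ids)
    size = -(-len(ordered) // num_slaves)  # ceiling division
    return {slave: ordered[slave * size:(slave + 1) * size]
            for slave in range(num_slaves)}

ids = list(range(100))            # 100 hypothetical data structure identifiers
by_hash = assign_by_hash(ids, 10)
by_range = assign_by_range(ids, 10)
print(by_hash[3])   # [3, 13, 23, 33, 43, 53, 63, 73, 83, 93]
print(by_range[3])  # [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
```

In both cases each of the 10 slaves receives 10 of the 100 identifiers, and no identifier is assigned to more than one slave.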
In one embodiment, each of first-stage slaves 106A-N counts, concurrently with each other of first-stage slaves 106A-N, the occurrences of the candidate N-element itemsets in data structures in the subset of data structures 112A-N that has been assigned to that first-stage slave. In one embodiment, because multiple slaves count occurrences of the candidate N-element itemsets in parallel, the time taken to count all such occurrences is reduced.
For example, the contents of data structures 112A-C assigned to first-stage slave 106A might be (apple, banana, milk), (apple, banana, milk, orange), and (orange), respectively, and the contents of data structures 112D-F assigned to first-stage slave 106B might be (banana, milk, orange), (apple, milk, orange), and (apple, banana, orange), respectively. In this example, assuming that the candidate 2-element itemsets are (apple, banana), (apple, milk), (apple, orange), (banana, milk), (banana, orange), and (milk, orange), first-stage slave 106A would count 2 occurrences of (apple, banana), 2 occurrences of (apple, milk), 1 occurrence of (apple, orange), 2 occurrences of (banana, milk), 1 occurrence of (banana, orange), and 1 occurrence of (milk, orange). In this example, assuming the same candidate 2-element itemsets, first-stage slave 106B would count 1 occurrence of (apple, banana), 1 occurrence of (apple, milk), 2 occurrences of (apple, orange), 1 occurrence of (banana, milk), 2 occurrences of (banana, orange), and 2 occurrences of (milk, orange).
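The counts in this example can be reproduced with a short sketch. Here two Python threads stand in for the slaves that process data structures 112A-C and 112D-F. That substitution, the function name count_candidates, and the simple subset test (rather than the bitmap intersection or prefix tree counting techniques) are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def count_candidates(data_structures, candidates):
    """Count, per candidate, how many data structures contain all of its elements."""
    counts = {}
    for candidate in candidates:
        needed = set(candidate)
        counts[candidate] = sum(1 for ds in data_structures if needed <= set(ds))
    return counts

candidates = [("apple", "banana"), ("apple", "milk"), ("apple", "orange"),
              ("banana", "milk"), ("banana", "orange"), ("milk", "orange")]

# Contents of data structures 112A-C and 112D-F from the example.
partition_a = [("apple", "banana", "milk"),
               ("apple", "banana", "milk", "orange"),
               ("orange",)]
partition_b = [("banana", "milk", "orange"),
               ("apple", "milk", "orange"),
               ("apple", "banana", "orange")]

# Each partition is counted by its own worker, concurrently with the other.
with ThreadPoolExecutor(max_workers=2) as pool:
    counts_a, counts_b = pool.map(
        lambda part: count_candidates(part, candidates),
        [partition_a, partition_b])

print(list(counts_a.values()))  # [2, 2, 1, 2, 1, 1]
print(list(counts_b.values()))  # [1, 1, 2, 1, 2, 2]
```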
In one embodiment, the counting of occurrences of candidate N-element itemsets is performed using either a “bitmap intersection” technique or a “prefix tree counting” technique. Both of these techniques are specifically described in co-pending U.S. patent application Ser. No. 10/643,563, titled “DYNAMIC SELECTION OF FREQUENT ITEMSET COUNTING TECHNIQUE.” In one embodiment, the technique that is used to count occurrences of candidate N-element itemsets is dynamically determined at each iteration of the operation. Thus, the counting technique used during one iteration of the operation may differ from the counting technique used during another iteration of the operation. Techniques for dynamically selecting counting techniques also are described in U.S. patent application Ser. No. 10/643,563.
Distributing Preliminary Counts Among Second-Stage Slaves
In one embodiment, each of first-stage slaves 106A-N counts the total number of data structures in the subset of data structures 112A-N assigned to that first-stage slave. In one embodiment, each particular first-stage slave of first-stage slaves 106A-N sends, to one or more of second-stage slaves 108A-N, at least the following type of information: (a) one or more of the candidate N-element itemsets, (b) a count of occurrences of those candidate N-element itemsets in those of data structures 112A-N assigned to the particular first-stage slave, and (c) a count of the total number of those of data structures 112A-N assigned to the particular first-stage slave.
In one embodiment, each of first-stage slaves 106A-N sends the above type of information to multiple second-stage slaves of second-stage slaves 108A-N. In one embodiment, all of the counts associated with a particular candidate N-element itemset are sent to the same second-stage slave, regardless of which first-stage slave determined the counts. However, counts associated with different N-element itemsets may be sent to different second-stage slaves.
For example, assuming the candidate 2-element itemsets of the example above, each of first-stage slaves 106A-N may send counts to various ones of second-stage slaves 108A-N in the following manner: counts associated with (apple, banana) may be sent to second-stage slave 108A, counts associated with (apple, milk) may be sent to second-stage slave 108B, counts associated with (apple, orange) may be sent to second-stage slave 108C, counts associated with (banana, milk) may be sent to second-stage slave 108D, counts associated with (banana, orange) may be sent to second-stage slave 108E, and counts associated with (milk, orange) may be sent to second-stage slave 108F.
For another example, if there are fewer second-stage slaves 108A-N, then counts associated with (apple, banana) and counts associated with (apple, milk) may be sent to second-stage slave 108A, counts associated with (apple, orange) and counts associated with (banana, milk) may be sent to second-stage slave 108B, and counts associated with (banana, orange) and counts associated with (milk, orange) may be sent to second-stage slave 108C.
In one embodiment, the candidate itemsets are divvied among second-stage slaves 108A-N in as balanced a manner as possible, so that each of second-stage slaves 108A-N receives counts associated with approximately the same number of candidate itemsets during a particular iteration of the operation. Candidate itemsets may be divvied among second-stage slaves 108A-N through a hash-mapping technique, for example. For example, the elements of a particular candidate itemset may be enumerated, combined, and input into a hash function. Counts associated with the particular candidate itemset may be sent to the second-stage slave that is associated with the hash value produced by the hash function.
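One way to sketch such a hash mapping is shown below. The choice of CRC-32 as the hash function, the separator byte, and the function name second_stage_slave_for are assumptions. Any hash function will serve provided that every first-stage slave computes the same value for the same itemset, which is why a process-independent hash is used here rather than Python's built-in hash().

```python
import zlib

def second_stage_slave_for(itemset, num_second_stage_slaves):
    """Map a candidate itemset to the index of a second-stage slave."""
    # Enumerate the elements in a canonical order and combine them into one
    # key, so that the same itemset always produces the same hash value.
    key = "\x1f".join(sorted(itemset)).encode("utf-8")
    return zlib.crc32(key) % num_second_stage_slaves

candidates = [("apple", "banana"), ("apple", "milk"), ("apple", "orange"),
              ("banana", "milk"), ("banana", "orange"), ("milk", "orange")]
routing = {c: second_stage_slave_for(c, 3) for c in candidates}
for itemset, slave_index in routing.items():
    print(itemset, "-> second-stage slave", slave_index)
```

Because the mapping depends only on the itemset, counts for a given candidate arrive at the same second-stage slave no matter which first-stage slave produced them. How evenly the candidates are spread depends on the hash function and on the number of candidates.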
Concurrently Aggregating Preliminary Counts
In one embodiment, the processing performed by second-stage slaves 108A-N involves aggregating preliminary counts received from first-stage slaves 106A-N, and selecting one or more candidate N-element itemsets based on whether aggregate counts for those itemsets meet a specified threshold.
In one embodiment, due to the techniques described above, each of second-stage slaves 108A-N receives, from first-stage slaves 106A-N, one or more preliminary occurrence counts respectively associated with one or more separate subsets of the candidate N-element itemsets. In one embodiment, each of second-stage slaves 108A-N also receives, from each of first-stage slaves 106A-N that sends a preliminary occurrence count to that second-stage slave, a count of the total number of data structures 112A-N that were assigned to that first-stage slave.
For example, each of first-stage slaves 106A-N might determine, for 10 separate data structures of data structures 112A-N, how many occurrences of each of the candidate 2-element itemsets are within those data structures. Continuing the example, second-stage slave 108A might be associated with candidate 2-element itemsets (apple, banana) and (apple, milk). Therefore, in this example, each particular first-stage slave of first-stage slaves 106A-N sends, to second-stage slave 108A, information that indicates: the number of data structures that the particular first-stage slave evaluated (in this case, 10), the number of occurrences of (apple, banana) in the data structures that the particular first-stage slave evaluated, and the number of occurrences of (apple, milk) in the data structures that the particular first-stage slave evaluated.
In one embodiment, each particular second-stage slave of second-stage slaves 108A-N separately aggregates (i.e., adds up), for each of the candidate N-element itemsets with which the particular second-stage slave is associated, the preliminary counts that the particular second-stage slave receives from first-stage slaves 106A-N. For example, second-stage slave 108A might receive, from first-stage slave 106A, a count of 2 for (apple, banana) and a count of 2 for (apple, milk). Continuing the example, second-stage slave 108A might receive, from first-stage slave 106B, a count of 1 for (apple, banana) and a count of 1 for (apple, milk). Therefore, in this example, second-stage slave 108A would determine an aggregate count of 3 for (apple, banana) and an aggregate count of 3 for (apple, milk).
In one embodiment, second-stage slaves 108A-N concurrently determine aggregate counts for candidate N-element itemsets. For example, second-stage slave 108A may aggregate counts for (apple, banana) and (apple, milk) at the same time that second-stage slave 108B aggregates counts for (apple, orange) and (banana, milk), and at the same time that second-stage slave 108C aggregates counts for (banana, orange) and (milk, orange).
In one embodiment, second-stage slaves 108A-N aggregate counts concurrently with first-stage slaves 106A-N determining the counts and sending the counts to second-stage slaves 108A-N.
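The aggregation performed by a single second-stage slave can be sketched as follows. The message layout (a dictionary with the hypothetical keys structures_evaluated and counts) is an assumption; the description above does not specify a message format.

```python
from collections import defaultdict

def aggregate(messages):
    """Sum the preliminary counts and data structure totals received
    from first-stage slaves (hypothetical message format)."""
    aggregate_counts = defaultdict(int)
    total_structures = 0
    for message in messages:
        total_structures += message["structures_evaluated"]
        for itemset, count in message["counts"].items():
            aggregate_counts[itemset] += count
    return dict(aggregate_counts), total_structures

# Messages that second-stage slave 108A might receive in the example above.
from_106a = {"structures_evaluated": 3,
             "counts": {("apple", "banana"): 2, ("apple", "milk"): 2}}
from_106b = {"structures_evaluated": 3,
             "counts": {("apple", "banana"): 1, ("apple", "milk"): 1}}

aggregate_counts, total_structures = aggregate([from_106a, from_106b])
print(aggregate_counts)  # {('apple', 'banana'): 3, ('apple', 'milk'): 3}
print(total_structures)  # 6
```

Because the function consumes messages one at a time, the same logic could be applied incrementally as messages arrive, which is consistent with second-stage slaves aggregating while first-stage slaves are still counting.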
Selecting Frequent Itemsets from Among Candidate Itemsets
As is described above, each of second-stage slaves 108A-N receives, from each of first-stage slaves 106A-N that sends a preliminary occurrence count to that second-stage slave, a count of the total number of data structures 112A-N that were assigned to that first-stage slave. In one embodiment, each of second-stage slaves 108A-N aggregates each such total number of data structures to determine an aggregate total number of data structures 112A-N.
In one embodiment, once a particular second-stage slave of second-stage slaves 108A-N has determined an aggregate count for a particular candidate N-element itemset, as described above, the particular second-stage slave determines whether the aggregate count for the particular candidate N-element itemset is at least as great as a specified fraction of the aggregate total number of data structures 112A-N. The specified fraction is referred to herein as the “threshold.” If the aggregate count for the particular candidate N-element itemset is at least as great as the number derived from the specified threshold, then the particular second-stage slave determines that the particular candidate N-element itemset is a frequent N-element itemset relative to data structures 112A-N.
For example, second-stage slave 108A might receive, from first-stage slave 106A, an indication that first-stage slave 106A evaluated 3 of data structures 112A-N. Continuing the example, second-stage slave 108A might receive, from first-stage slave 106B, an indication that first-stage slave 106B evaluated 3 of data structures 112A-N. Assuming for purposes of example that these were the only “data structure totals” received from first-stage slaves 106A-B, second-stage slave 108A determines that the aggregate total number of data structures 112A-N is 6 (i.e., 3+3).
Continuing the example, if the aggregate count for (apple, banana) is 3, then second-stage slave 108A determines that the fraction of data structures 112A-N that contain (apple, banana) is ½ (i.e., 3/6). Continuing the example, if the aggregate count for (apple, milk) is 3, then second-stage slave 108A determines that the fraction of data structures 112A-N that contain (apple, milk) is also ½ (i.e., 3/6). Assuming for purposes of the example that the specified threshold is ⅓, second-stage slave 108A determines that both (apple, banana) and (apple, milk) are frequent 2-element itemsets relative to data structures 112A-N.
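The threshold comparison in this example can be sketched as follows. The function name select_frequent is hypothetical, and the third itemset, (kiwi, plum), is an invented entry added only to show a candidate that falls below the threshold. Exact fractions are used so that the comparison against ⅓ is not affected by floating-point rounding.

```python
from fractions import Fraction

def select_frequent(aggregate_counts, total_structures, threshold):
    """Return the candidates whose fraction of containing data
    structures is at least the threshold."""
    return [itemset for itemset, count in aggregate_counts.items()
            if Fraction(count, total_structures) >= threshold]

aggregate_counts = {
    ("apple", "banana"): 3,
    ("apple", "milk"): 3,
    ("kiwi", "plum"): 1,   # hypothetical itemset, not part of the example above
}
frequent = select_frequent(aggregate_counts, 6, Fraction(1, 3))
print(frequent)  # [('apple', 'banana'), ('apple', 'milk')]
```

Both itemsets from the example occur in 3/6 = ½ of the data structures, which is at least ⅓, so both are selected. The hypothetical itemset occurs in only 1/6 of the data structures and is not selected.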
In one embodiment, each of second-stage slaves 108A-N determines, for each of the candidate N-element itemsets with which that second-stage slave is associated, whether that candidate N-element itemset is a frequent N-element itemset relative to data structures 112A-N. In one embodiment, each of second-stage slaves 108A-N selects, from among the candidate N-element itemsets with which that second-stage slave is associated, only the frequent N-element itemsets.
In one embodiment, for a particular iteration of the operation, each of second-stage slaves 108A-N stores, in data repository 114, only the selected frequent N-element itemsets. In one embodiment, each of second-stage slaves 108A-N sends, to one or more other entities that may include an application external to database server 102, the selected frequent N-element itemsets. In one embodiment, each of second-stage slaves 108A-N performs the above determination, selection, storage, and sending concurrently with each of the others of second-stage slaves 108A-N.
In one embodiment, one or more of first-stage slaves 106A-N uses the frequent N-element itemsets stored in data repository 114 to determine candidate (N+1)-element itemsets during a next iteration of the operation. Thus, the determinations of each successive iteration may be based upon the determinations of previous iterations.
Flow Control
In one embodiment, each particular first-stage slave of first-stage slaves 106A-N sends, to second-stage slaves 108A-N, a message that indicates that the particular first-stage slave has finished evaluating the subset of data structures 112A-N to which the particular first-stage slave was assigned. A particular first-stage slave sends the message only after the particular first-stage slave has evaluated all of the data structures in the particular first-stage slave's assigned subset of data structures. In one embodiment, each of first-stage slaves 106A-N waits, after sending such a message, to receive a signal from query coordinator 110 before proceeding to the next iteration of the operation.
In one embodiment, when a particular second-stage slave of second-stage slaves 108A-N has received such a message from each of first-stage slaves 106A-N, the particular second-stage slave begins to determine, based on the aggregated counts and data structure total, which of the candidate N-element itemsets are frequent N-element itemsets. In one embodiment, second-stage slaves 108A-N do not begin to perform this determination until such a message has been received from each of first-stage slaves 106A-N.
In one embodiment, for each particular second-stage slave of second-stage slaves 108A-N, when the particular second-stage slave has finished storing frequent N-element itemsets in data repository 114, the particular second-stage slave sends a message to query coordinator 110. The message indicates to query coordinator 110 that the particular second-stage slave has finished. In one embodiment, second-stage slaves 108A-N store frequent N-element itemsets in a particular segment of segments 116A-B that has been designated as the “write” segment.
In one embodiment, after query coordinator 110 has received a message from each of second-stage slaves 108A-N indicating that those second-stage slaves have finished, query coordinator 110 swaps the designations of the “read” and “write” segments 116A-B, so that the former “read” segment becomes the new “write” segment for the next iteration of the operation, and the former “write” segment becomes the new “read” segment for the next iteration of the operation. After the designations have been swapped, the newly designated “write” segment may be emptied.
In one embodiment, after query coordinator 110 swaps the designations of the “read” and “write” segments, query coordinator 110 sends, to second-stage slaves 108A-N, a reference to the newly designated “write” segment. In one embodiment, when second-stage slaves 108A-N store frequent N-element itemsets in the “write” segment, second-stage slaves 108A-N do so by writing the frequent N-element itemsets to a location based on the reference to the “write” segment.
In one embodiment, after query coordinator 110 swaps the designations of the “read” and “write” segments, query coordinator 110 sends, to first-stage slaves 106A-N, a reference to the newly designated “read” segment, which contains the frequent itemsets stored during the previous iteration of the operation. In one embodiment, one or more of first-stage slaves 106A-N generates candidate (N+1)-element itemsets based on frequent N-element itemsets read from a location based on the reference to the “read” segment.
In one embodiment, after sending the newly designated “read” segment reference to each of first-stage slaves 106A-N as described above, query coordinator 110 sends, to each of first-stage slaves 106A-N, a signal that informs first-stage slaves 106A-N that first-stage slaves 106A-N may proceed with the next iteration of the operation. In one embodiment, query coordinator 110 then waits to receive “finished” messages from each of second-stage slaves 108A-N as described above.
In one embodiment, after at least one of first-stage slaves 106A-N has received a signal from query coordinator 110 that first-stage slaves 106A-N may proceed with the next iteration of the operation, that first-stage slave (or slaves) determines (a) whether the currently designated “write” segment of segments 116A-B is empty and (b) whether “N” is greater than “M” as used in the above context of N-element itemsets and the M-element itemset. In one embodiment, if the currently designated “write” segment is empty (meaning that there are no candidate (N+1)-element itemsets for the next iteration of the operation), or if “N” is greater than “M,” then the recursive operation is ended. Otherwise, “N” is incremented and the next iteration of the recursive operation is performed as described above.
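The interplay of the “read” and “write” segments across iterations can be sketched in a single-process simulation. This is only an approximation of the flow control described above. There are no separate slaves, messages, or query coordinator; two Python lists stand in for segments 116A-B; and the termination test is applied to the itemsets written during the current iteration, before the swap. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def run_iterations(data_structures, threshold):
    """Simulate the multi-iteration operation with two swapped segments."""
    items = sorted({item for ds in data_structures for item in ds})
    read_segment, write_segment = [], []
    results = {}
    n = 1
    while n <= len(items):
        # First stage: derive candidates from the "read" segment.
        if n == 1:
            candidates = [(item,) for item in items]
        else:
            previous = set(read_segment)
            candidates = [c for c in combinations(items, n)
                          if all(s in previous for s in combinations(c, n - 1))]
        # Second stage: store only the frequent itemsets in the "write" segment.
        for candidate in candidates:
            count = sum(1 for ds in data_structures if set(candidate) <= set(ds))
            if Fraction(count, len(data_structures)) >= threshold:
                write_segment.append(candidate)
        if not write_segment:
            break  # nothing was stored, so no further candidates can exist
        results[n] = list(write_segment)
        # Coordinator: swap the designations, then empty the new "write" segment.
        read_segment, write_segment = write_segment, read_segment
        write_segment.clear()
        n += 1
    return results

data = [("apple", "banana", "milk"), ("apple", "banana", "milk", "orange"),
        ("orange",), ("banana", "milk", "orange"),
        ("apple", "milk", "orange"), ("apple", "banana", "orange")]
results = run_iterations(data, Fraction(1, 2))
print(sorted(results))   # [1, 2]
print(len(results[2]))   # 6
```

With a threshold of ½ over these six data structures, all four 1-element itemsets and all six 2-element itemsets are frequent, while every 3-element candidate occurs only twice. The third iteration therefore stores nothing and the operation ends.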
Hardware Overview
FIG. 3 is a block diagram that illustrates a computer system 300 upon which an embodiment of the invention may be implemented. Computer system 300 includes a bus 302 or other communication mechanism for communicating information, and a processor 304 coupled with bus 302 for processing information. Computer system 300 also includes a main memory 306, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 302 for storing information and instructions to be executed by processor 304. Main memory 306 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 304. Computer system 300 further includes a read only memory (ROM) 308 or other static storage device coupled to bus 302 for storing static information and instructions for processor 304. A storage device 310, such as a magnetic disk or optical disk, is provided and coupled to bus 302 for storing information and instructions.
Computer system 300 may be coupled via bus 302 to a display 312, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 314, including alphanumeric and other keys, is coupled to bus 302 for communicating information and command selections to processor 304. Another type of user input device is cursor control 316, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 304 and for controlling cursor movement on display 312. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
The invention is related to the use of computer system 300 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 300 in response to processor 304 executing one or more sequences of one or more instructions contained in main memory 306. Such instructions may be read into main memory 306 from another computer-readable medium, such as storage device 310. Execution of the sequences of instructions contained in main memory 306 causes processor 304 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 304 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 310. Volatile media includes dynamic memory, such as main memory 306. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 302. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 304 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 300 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 302. Bus 302 carries the data to main memory 306, from which processor 304 retrieves and executes the instructions. The instructions received by main memory 306 may optionally be stored on storage device 310 either before or after execution by processor 304.
Computer system 300 also includes a communication interface 318 coupled to bus 302. Communication interface 318 provides a two-way data communication coupling to a network link 320 that is connected to a local network 322. For example, communication interface 318 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 318 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 318 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 320 typically provides data communication through one or more networks to other data devices. For example, network link 320 may provide a connection through local network 322 to a host computer 324 or to data equipment operated by an Internet Service Provider (ISP) 326. ISP 326 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 328. Local network 322 and Internet 328 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 320 and through communication interface 318, which carry the digital data to and from computer system 300, are exemplary forms of carrier waves transporting the information.
Computer system 300 can send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318. In the Internet example, a server 330 might transmit a requested code for an application program through Internet 328, ISP 326, local network 322 and communication interface 318.
The received code may be executed by processor 304 as it is received, and/or stored in storage device 310, or other non-volatile storage for later execution. In this manner, computer system 300 may obtain application code in the form of a carrier wave.
In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (28)

1. A method comprising:
during a first iteration of a multi-iteration operation performed by a first set of slaves and a second set of slaves, generating results of operations performed by the second set of slaves;
during each iteration of two or more iterations of the multi-iteration operation, the two or more iterations occurring after the first iteration of the multi-iteration operation, (a) providing, to the first set of slaves, results of operations performed by the second set of slaves during a previous iteration of the multi-iteration operation, and (b) providing, to the second set of slaves, results of operations performed by the first set of slaves during the current iteration of the multi-iteration operation;
wherein the operations performed by the first set of slaves are first operations for counting occurrences of itemsets in each of a plurality of data subsets;
wherein the operations performed by the second set of slaves are second operations for identifying which of said itemsets occur above a threshold frequency within the plurality of data subsets;
wherein each slave in the second set of slaves performs the second operations for a different subset of said itemsets;
wherein each slave in the first set of slaves and the second set of slaves is a process or thread executed by one or more processors in a database system;
wherein each iteration of the multi-iteration operation is performed by one or more computing devices.
2. The method of claim 1, wherein:
the first set of slaves executes concurrently with the second set of slaves;
at least some of the operations performed by the first set of slaves are performed during a same time that at least some of the operations performed by the second set of slaves are performed;
during each particular iteration of the two or more iterations:
(a) operations performed by the second set of slaves during said particular iteration are operations that the second set of slaves performs on results of operations performed by the first set of slaves during the particular iteration; and
(b) operations performed by the first set of slaves during said particular iteration are operations that the first set of slaves performs on results of operations performed by the second set of slaves during said iteration previous to said particular iteration.
3. The method of claim 1, wherein, during each iteration of the two or more iterations, the first set of slaves reads, from a repository common to each of the first set of slaves, results that were (a) results of operations performed by the second set of slaves during the previous iteration of the multi-iteration operation, and (b) stored in the repository by the second set of slaves during the previous iteration of the multi-iteration operation.
4. The method of claim 1, further comprising:
during a first iteration of the multi-iteration operation, determining candidate N-element sets based on (N−1)-element sets stored in a data repository.
5. The method of claim 4, wherein determining the candidate N-element sets comprises:
determining possible N-element sets that are subsets of an M-element set; and
for each particular possible N-element set of the possible N-element sets, performing steps comprising:
determining whether all (N−1)-element subsets of the particular possible N-element set are stored in the data repository; and
including the particular possible N-element set in the candidate N-element sets only if all (N−1)-element subsets of the particular possible N-element set are stored in the data repository.
6. The method of claim 4, further comprising:
assigning a first subset of a set of data structures to a first first-stage slave of the first set of slaves, wherein each data structure in the set of data structures comprises a separate set of elements;
assigning a second subset of the set of data structures to a second first-stage slave of the first set of slaves;
causing the first first-stage slave to perform first steps; and
causing the second first-stage slave to perform second steps;
wherein the first steps comprise determining, for each particular candidate N-element set of the candidate N-element sets, how many data structures in the first subset contain all of the elements of the particular candidate N-element set; and
wherein the second steps comprise determining, for each particular candidate N-element set of the candidate N-element sets, how many data structures in the second subset contain all of the elements of the particular candidate N-element set;
wherein the first steps and the second steps are performed concurrently.
7. The method of claim 6, wherein the first steps further comprise:
for each particular candidate N-element set of the candidate N-element sets, determining a separate hash value that corresponds to the particular candidate N-element set; and
for each particular candidate N-element set of the candidate N-element sets, (a) selecting, from among the second set of slaves, a second-stage slave that corresponds to the separate hash value that corresponds to the particular candidate N-element set, and (b) sending, to said second-stage slave, separate count information that includes an indication of how many data structures in the first subset contain all of the elements of the particular candidate N-element set.
8. The method of claim 4, further comprising:
causing a first second-stage slave of the second set of slaves to perform first steps; and
causing a second second-stage slave of the second set of slaves to perform second steps;
wherein the first steps and the second steps are performed concurrently;
wherein the first steps comprise:
receiving, from a first first-stage slave of the first set of slaves, first count information for a first candidate N-element set of the candidate N-element sets;
receiving, from a second first-stage slave of the first set of slaves, second count information for the first candidate N-element set, wherein the second first-stage slave is separate from the first first-stage slave;
determining a first sum of at least the first count information for the first candidate N-element set and the second count information for the first candidate N-element set;
determining whether the first sum satisfies a specified threshold; and
including the first candidate N-element set in the second processed data items only if the first sum satisfies the specified threshold; and
wherein the second steps comprise:
receiving, from the first first-stage slave, first count information for a second candidate N-element set of the candidate N-element sets, wherein the second candidate N-element set differs from the first candidate N-element set;
receiving, from the second first-stage slave, second count information for the second candidate N-element set;
determining a second sum of at least the first count information for the second candidate N-element set and the second count information for the second candidate N-element set;
determining whether the second sum satisfies the specified threshold; and
including the second candidate N-element set in the second processed data items only if the second sum satisfies the specified threshold.
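The summing and threshold test performed by a single second-stage slave may be illustrated as follows. This is a sketch under assumptions: "satisfies the specified threshold" is read here as "is greater than or equal to" the threshold, and the function name `second_stage` and the message format are hypothetical.

```python
from collections import Counter

def second_stage(messages, threshold):
    """Sum the per-subset counts received for each candidate and keep
    only the candidates whose total meets the threshold."""
    totals = Counter()
    for candidate, count in messages:
        totals[candidate] += count
    # Assumption: a sum "satisfies" the threshold when it is >= threshold.
    return {c: total for c, total in totals.items() if total >= threshold}

ab, bc = frozenset({"a", "b"}), frozenset({"b", "c"})
# Count information as it might arrive from two first-stage workers.
inbox = [(ab, 2), (bc, 1), (ab, 1), (bc, 1)]
frequent = second_stage(inbox, threshold=3)
```

Here {a, b} totals 3 and is retained, while {b, c} totals 2 and is excluded.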
9. The method of claim 8, further comprising:
during a second iteration of the multi-iteration operation, determining candidate (N+1)-element sets based at least in part on N-element sets stored in the data repository.
10. The method of claim 9, wherein determining the candidate (N+1)-element sets comprises:
determining possible (N+1)-element sets that are subsets of the M-element set; and
for each particular possible (N+1)-element set of the possible (N+1)-element sets, performing steps comprising:
determining whether all N-element subsets of the particular possible (N+1)-element set are stored in the data repository; and
including the particular possible (N+1)-element set in the candidate (N+1)-element sets only if all N-element subsets of the particular possible (N+1)-element set are stored in the data repository.
11. The method of claim 9, further comprising:
causing the first first-stage slave to perform third steps; and
causing the second first-stage slave to perform fourth steps;
wherein the third steps and the fourth steps are performed concurrently;
wherein the third steps comprise determining, for each particular candidate (N+1)-element set of the candidate (N+1)-element sets, how many data structures in the first subset contain all of the elements of the particular candidate (N+1)-element set; and
wherein the fourth steps comprise determining, for each particular candidate (N+1)-element set of the candidate (N+1)-element sets, how many data structures in the second subset contain all of the elements of the particular candidate (N+1)-element set.
12. The method of claim 9, further comprising:
causing the first second-stage slave to perform third steps; and
causing the second second-stage slave to perform fourth steps;
wherein the third steps and the fourth steps are performed concurrently;
wherein the third steps comprise:
receiving, from the first first-stage slave, first count information for a first candidate (N+1)-element set of the candidate (N+1)-element sets;
receiving, from the second first-stage slave, second count information for the first candidate (N+1)-element set;
determining a third sum of at least the first count information for the first candidate (N+1)-element set and the second count information for the first candidate (N+1)-element set;
determining whether the third sum satisfies the specified threshold; and
including the first candidate (N+1)-element set in the fourth processed data items only if the third sum satisfies the specified threshold; and
wherein the fourth steps comprise:
receiving, from the first first-stage slave, first count information for a second candidate (N+1)-element set of the candidate (N+1)-element sets, wherein the second candidate (N+1)-element set differs from the first candidate (N+1)-element set;
receiving, from the second first-stage slave, second count information for the second candidate (N+1)-element set;
determining a fourth sum of at least the first count information for the second candidate (N+1)-element set and the second count information for the second candidate (N+1)-element set;
determining whether the fourth sum satisfies the specified threshold; and
including the second candidate (N+1)-element set in the fourth processed data items only if the fourth sum satisfies the specified threshold.
13. One or more computer-readable storage media storing one or more sequences of instructions which, when executed by one or more processors, causes:
during a first iteration of a multi-iteration operation, generating results of operations performed by a second set of slaves;
during each iteration of two or more iterations of the multi-iteration operation, the two or more iterations occurring after the first iteration of the multi-iteration operation, (a) providing, to a first set of slaves, results of operations performed by a second set of slaves during a previous iteration of the multi-iteration operation, and (b) providing, to the second set of slaves, results of operations performed by the first set of slaves during the current iteration of the multi-iteration operation;
wherein the operations performed by the first set of slaves include first operations for counting occurrences of itemsets in each of a plurality of data subsets;
wherein the operations performed by the second set of slaves include second operations for identifying which of said itemsets occur above a threshold frequency within the plurality of data subsets;
wherein each slave in the second set of slaves performs the second operations for a different subset of said itemsets;
wherein each slave in the first set of slaves and the second set of slaves is a process or thread executed by one or more processors in a database system.
14. The one or more computer-readable storage media of claim 13, wherein:
the first set of slaves executes concurrently with the second set of slaves;
at least some of the operations performed by the first set of slaves are performed during a same time that at least some of the operations performed by the second set of slaves are performed;
during each particular iteration of the two or more iterations:
(a) operations performed by the second set of slaves during said particular iteration are operations that the second set of slaves performs on results of operations performed by the first set of slaves during the particular iteration; and
(b) operations performed by the first set of slaves during said particular iteration are operations that the first set of slaves performs on results of operations performed by the second set of slaves during said iteration previous to said particular iteration.
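The flow of results between the two sets of slaves across iterations may be sketched as a simple loop. The sketch is sequential for clarity and does not model the concurrent execution recited above; the names `first_stage`, `second_stage` and `mine`, and the use of "greater than or equal to" as the threshold test, are assumptions.

```python
from itertools import combinations

def first_stage(data_subsets, candidates):
    # One dictionary of local counts per data subset (one per first-stage worker).
    return [{c: sum(1 for row in subset if c <= row) for c in candidates}
            for subset in data_subsets]

def second_stage(local_counts, threshold):
    # Total the local counts and keep the itemsets that meet the threshold.
    totals = {}
    for counts in local_counts:
        for c, n in counts.items():
            totals[c] = totals.get(c, 0) + n
    return {c for c, n in totals.items() if n >= threshold}

def mine(data_subsets, threshold):
    items = sorted({e for subset in data_subsets for row in subset for e in row})
    previous = None  # second-stage results from the previous iteration
    levels, size = [], 1
    while True:
        # The first stage consumes the previous iteration's second-stage results.
        candidates = [
            frozenset(p) for p in combinations(items, size)
            if previous is None
            or all(frozenset(s) in previous for s in combinations(p, size - 1))
        ]
        if not candidates:
            break
        # The second stage consumes the current iteration's first-stage results.
        frequent = second_stage(first_stage(data_subsets, candidates), threshold)
        if not frequent:
            break
        levels.append(frequent)
        previous, size = frequent, size + 1
    return levels

subsets = [[{"a", "b", "c"}, {"a", "b"}], [{"b", "c"}, {"a", "b", "c", "d"}]]
levels = mine(subsets, threshold=3)
```

Each pass through the loop corresponds to one iteration: the candidates are built from what the second stage produced last time, and the second stage filters what the first stage counted this time.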
15. The one or more computer-readable storage media of claim 13, wherein during each iteration of the two or more iterations, the first set of slaves reads, from a repository common to each of the first set of slaves, results that were (a) results of operations performed by the second set of slaves during the previous iteration of the multi-iteration operation, and (b) stored in the repository by the second set of slaves during the previous iteration of the multi-iteration operation.
16. The one or more computer-readable storage media of claim 13, wherein the one or more sequences of instructions, when executed by the one or more processors, further causes:
during a first iteration of the multi-iteration operation, determining candidate N-element sets based on (N−1)-element sets stored in a data repository.
17. The one or more computer-readable storage media of claim 16, wherein determining the candidate N-element sets comprises:
determining possible N-element sets that are subsets of an M-element set; and
for each particular possible N-element set of the possible N-element sets, performing steps comprising:
determining whether all (N−1)-element subsets of the particular possible N-element set are stored in the data repository; and
including the particular possible N-element set in the candidate N-element sets only if all (N−1)-element subsets of the particular possible N-element set are stored in the data repository.
18. The one or more computer-readable storage media of claim 16, wherein the one or more sequences of instructions, when executed by the one or more processors, further causes:
assigning a first subset of a set of data structures to a first first-stage slave of the first set of slaves, wherein each data structure in the set of data structures comprises a separate set of elements;
assigning a second subset of the set of data structures to a second first-stage slave of the first set of slaves;
causing the first first-stage slave to perform first steps; and
causing the second first-stage slave to perform second steps;
wherein the first steps comprise determining, for each particular candidate N-element set of the candidate N-element sets, how many data structures in the first subset contain all of the elements of the particular candidate N-element set; and
wherein the second steps comprise determining, for each particular candidate N-element set of the candidate N-element sets, how many data structures in the second subset contain all of the elements of the particular candidate N-element set;
wherein the first steps and the second steps are performed concurrently.
19. The one or more computer-readable storage media of claim 18, wherein the first steps further comprise:
for each particular candidate N-element set of the candidate N-element sets, determining a separate hash value that corresponds to the particular candidate N-element set; and
for each particular candidate N-element set of the candidate N-element sets, (a) selecting, from among the second set of slaves, a second-stage slave that corresponds to the separate hash value that corresponds to the particular candidate N-element set, and (b) sending, to said second-stage slave, separate count information that includes an indication of how many data structures in the first subset contain all of the elements of the particular candidate N-element set.
20. The one or more computer-readable storage media of claim 16, wherein the one or more sequences of instructions, when executed by the one or more processors, further causes:
causing a first second-stage slave of the second set of slaves to perform first steps; and
causing a second second-stage slave of the second set of slaves to perform second steps;
wherein the first steps and the second steps are performed concurrently;
wherein the first steps comprise:
receiving, from a first first-stage slave of the first set of slaves, first count information for a first candidate N-element set of the candidate N-element sets;
receiving, from a second first-stage slave of the first set of slaves, second count information for the first candidate N-element set, wherein the second first-stage slave is separate from the first first-stage slave;
determining a first sum of at least the first count information for the first candidate N-element set and the second count information for the first candidate N-element set;
determining whether the first sum satisfies a specified threshold; and
including the first candidate N-element set in the second processed data items only if the first sum satisfies the specified threshold; and
wherein the second steps comprise:
receiving, from the first first-stage slave, first count information for a second candidate N-element set of the candidate N-element sets, wherein the second candidate N-element set differs from the first candidate N-element set;
receiving, from the second first-stage slave, second count information for the second candidate N-element set;
determining a second sum of at least the first count information for the second candidate N-element set and the second count information for the second candidate N-element set;
determining whether the second sum satisfies the specified threshold; and
including the second candidate N-element set in the second processed data items only if the second sum satisfies the specified threshold.
21. The one or more computer-readable storage media of claim 20, wherein the one or more sequences of instructions, when executed by the one or more processors, further causes:
during a second iteration of the multi-iteration operation, determining candidate (N+1)-element sets based at least in part on N-element sets stored in the data repository.
22. The one or more computer-readable storage media of claim 21, wherein determining the candidate (N+1)-element sets comprises:
determining possible (N+1)-element sets that are subsets of the M-element set; and
for each particular possible (N+1)-element set of the possible (N+1)-element sets, performing steps comprising:
determining whether all N-element subsets of the particular possible (N+1)-element set are stored in the data repository; and
including the particular possible (N+1)-element set in the candidate (N+1)-element sets only if all N-element subsets of the particular possible (N+1)-element set are stored in the data repository.
23. The one or more computer-readable storage media of claim 21, wherein the one or more sequences of instructions, when executed by the one or more processors, further causes:
causing the first first-stage slave to perform third steps; and
causing the second first-stage slave to perform fourth steps;
wherein the third steps and the fourth steps are performed concurrently;
wherein the third steps comprise determining, for each particular candidate (N+1)-element set of the candidate (N+1)-element sets, how many data structures in the first subset contain all of the elements of the particular candidate (N+1)-element set; and
wherein the fourth steps comprise determining, for each particular candidate (N+1)-element set of the candidate (N+1)-element sets, how many data structures in the second subset contain all of the elements of the particular candidate (N+1)-element set.
24. The one or more computer-readable storage media of claim 21, wherein the one or more sequences of instructions, when executed by the one or more processors, further causes:
causing the first second-stage slave to perform third steps; and
causing the second second-stage slave to perform fourth steps;
wherein the third steps and the fourth steps are performed concurrently;
wherein the third steps comprise:
receiving, from the first first-stage slave, first count information for a first candidate (N+1)-element set of the candidate (N+1)-element sets;
receiving, from the second first-stage slave, second count information for the first candidate (N+1)-element set;
determining a third sum of at least the first count information for the first candidate (N+1)-element set and the second count information for the first candidate (N+1)-element set;
determining whether the third sum satisfies the specified threshold; and
including the first candidate (N+1)-element set in the fourth processed data items only if the third sum satisfies the specified threshold; and
wherein the fourth steps comprise:
receiving, from the first first-stage slave, first count information for a second candidate (N+1)-element set of the candidate (N+1)-element sets, wherein the second candidate (N+1)-element set differs from the first candidate (N+1)-element set;
receiving, from the second first-stage slave, second count information for the second candidate (N+1)-element set;
determining a fourth sum of at least the first count information for the second candidate (N+1)-element set and the second count information for the second candidate (N+1)-element set;
determining whether the fourth sum satisfies the specified threshold; and
including the second candidate (N+1)-element set in the fourth processed data items only if the fourth sum satisfies the specified threshold.
25. A database system comprising:
one or more processors;
a database;
a first set of slaves and a second set of slaves executing on the one or more processors,
wherein the first set of slaves is different than the second set of slaves;
wherein the one or more processors are configured to:
during a first iteration of a multi-iteration operation performed by the first set of slaves and the second set of slaves, generate results of operations performed by the second set of slaves;
during each iteration of two or more iterations of the multi-iteration operation, the two or more iterations occurring after the first iteration of the multi-iteration operation, (a) provide, to the first set of slaves, results of operations performed by the second set of slaves during a previous iteration of the multi-iteration operation, and (b) provide, to the second set of slaves, results of operations performed by the first set of slaves during the current iteration of the multi-iteration operation;
wherein the operations performed by the first set of slaves are first operations for counting occurrences of itemsets in each of a plurality of data subsets;
wherein the operations performed by the second set of slaves are second operations for identifying which of said itemsets occur above a threshold frequency within the plurality of data subsets;
wherein each slave in the second set of slaves performs the second operations for a different subset of said itemsets;
wherein each slave in the first set of slaves and the second set of slaves is a process or thread executed by the one or more processors.
26. The system of claim 25, wherein:
for each of the candidate itemsets, each first slave of the first set of slaves is configured to send a count of occurrences to only a second slave in the second set of slaves that is associated with the candidate itemset, wherein the count of occurrences is a count of occurrences of the candidate itemset in a subset of data assigned to the second slave.
27. A method for performing a multiple-iteration operation for identifying frequent itemsets in a database, the method comprising:
storing, in a first segment of data, a first set of frequent itemsets discovered during a first iteration of the multi-iteration operation;
storing, in a second segment of data, a second set of frequent itemsets discovered during a second iteration of the multiple-iteration operation;
storing a designation indicating that the second segment is readable for a third iteration of the multiple-iteration operation;
based on the designation, determining to utilize the data in the second segment as input during the third iteration;
based on the designation, determining to output results of the third iteration to the first segment;
replacing the data in the first segment with a third set of frequent itemsets discovered during the third iteration;
modifying the designation to indicate that the first segment is readable for a fourth iteration of the multiple-iteration operation;
based on the designation, determining to utilize the data in the first segment as input during the fourth iteration;
based on the designation, determining to output results of the fourth iteration to the second segment;
replacing the data in the second segment with a fourth set of frequent itemsets discovered during the fourth iteration;
wherein each iteration of the multiple-iteration operation is performed by one or more computing devices.
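The alternation between the two segments may be illustrated with the sketch below. It is offered as an example only: the class name `PingPongSegments`, the integer used as the designation, and the `step` callback that stands in for one iteration's itemset discovery are all hypothetical.

```python
class PingPongSegments:
    """Two segments plus a designation naming the one readable next."""

    def __init__(self, first_results, second_results):
        self.segments = [first_results, second_results]
        self.readable = 1  # designation: the second segment is readable

    def run_iteration(self, step):
        source = self.segments[self.readable]  # input chosen from the designation
        target = 1 - self.readable             # output goes to the other segment
        self.segments[target] = step(source)   # replace the data in that segment
        self.readable = target                 # modify the designation
        return self.segments[target]

# Placeholder step: marks each input item to show which segment fed it.
def grow(source):
    return [item + "'" for item in source]

store = PingPongSegments(["iter1"], ["iter2"])
third = store.run_iteration(grow)   # reads segment 2, overwrites segment 1
fourth = store.run_iteration(grow)  # reads segment 1, overwrites segment 2
```

Only two segments are needed regardless of the number of iterations, because each iteration reads the output of the immediately preceding one and overwrites the older segment.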
28. One or more computer-readable storage media storing one or more sequences of instructions which, when executed by one or more processors, causes:
storing, in a first segment of data, a first set of frequent itemsets discovered during a first iteration of the multi-iteration operation;
storing, in a second segment of data, a second set of frequent itemsets discovered during a second iteration of the multiple-iteration operation;
storing a designation indicating that the second segment is readable for a third iteration of the multiple-iteration operation;
based on the designation, determining to utilize the data in the second segment as input during the third iteration;
based on the designation, determining to output results of the third iteration to the first segment;
replacing the data in the first segment with a third set of frequent itemsets discovered during the third iteration;
modifying the designation to indicate that the first segment is readable for a fourth iteration of the multiple-iteration operation;
based on the designation, determining to utilize the data in the first segment as input during the fourth iteration;
based on the designation, determining to output results of the fourth iteration to the second segment;
replacing the data in the second segment with a fourth set of frequent itemsets discovered during the fourth iteration;
wherein each iteration of the multiple-iteration operation is performed by one or more computing devices.
US11/600,272 2004-05-14 2006-11-14 Performing recursive database operations Active 2025-05-15 US7698312B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/600,272 US7698312B2 (en) 2004-05-14 2006-11-14 Performing recursive database operations

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US57144104P 2004-05-14 2004-05-14
US10/867,923 US7155446B2 (en) 2004-05-14 2004-06-14 Performing recursive database operations
US11/600,272 US7698312B2 (en) 2004-05-14 2006-11-14 Performing recursive database operations

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/867,923 Continuation US7155446B2 (en) 2004-05-14 2004-06-14 Performing recursive database operations

Publications (2)

Publication Number Publication Date
US20070067327A1 (en) 2007-03-22
US7698312B2 (en) 2010-04-13

Family

ID=35515275

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/867,923 Active 2025-06-07 US7155446B2 (en) 2004-05-14 2004-06-14 Performing recursive database operations
US11/600,272 Active 2025-05-15 US7698312B2 (en) 2004-05-14 2006-11-14 Performing recursive database operations

Country Status (1)

Country Link
US (2) US7155446B2 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7962526B2 (en) * 2003-08-18 2011-06-14 Oracle International Corporation Frequent itemset counting using clustered prefixes and index support
US7720790B2 (en) * 2003-08-18 2010-05-18 Oracle International Corporation Dynamic selection of frequent itemset counting technique
US8655911B2 (en) * 2003-08-18 2014-02-18 Oracle International Corporation Expressing frequent itemset counting operations
US20070022274A1 (en) * 2005-06-29 2007-01-25 Roni Rosner Apparatus, system, and method of predicting and correcting critical paths
US9176995B2 (en) 2010-02-22 2015-11-03 International Business Machines Corporation Organization of data within a database
US8886593B2 (en) * 2011-02-01 2014-11-11 Siemens Product Lifecycle Management Software Inc. Controlled dispersion rates for transfer swarms
US20160313450A1 (en) * 2015-04-27 2016-10-27 Autoliv Asp, Inc. Automotive gnss real time kinematic dead reckoning receiver
US10360240B2 (en) * 2016-08-08 2019-07-23 International Business Machines Corporation Providing multidimensional attribute value information
US10311057B2 (en) 2016-08-08 2019-06-04 International Business Machines Corporation Attribute value information for a data extent
US11461323B2 (en) * 2019-06-28 2022-10-04 Visa International Service Association Techniques for efficient query processing

Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5259066A (en) 1990-04-16 1993-11-02 Schmidt Richard Q Associative program control
US5615341A (en) * 1995-05-08 1997-03-25 International Business Machines Corporation System and method for mining generalized association rules in databases
US5724573A (en) * 1995-12-22 1998-03-03 International Business Machines Corporation Method and system for mining quantitative association rules in large relational tables
US5794209A (en) 1995-03-31 1998-08-11 International Business Machines Corporation System and method for quickly mining association rules in databases
US5842200A (en) * 1995-03-31 1998-11-24 International Business Machines Corporation System and method for parallel mining of association rules in databases
US6049797A (en) 1998-04-07 2000-04-11 Lucent Technologies, Inc. Method, apparatus and programmed medium for clustering databases with categorical attributes
US6061682A (en) * 1997-08-12 2000-05-09 International Business Machine Corporation Method and apparatus for mining association rules having item constraints
US6138117A (en) 1998-04-29 2000-10-24 International Business Machines Corporation Method and system for mining long patterns from databases
US6324533B1 (en) * 1998-05-29 2001-11-27 International Business Machines Corporation Integrated database and data-mining system
US20020073019A1 (en) 1989-05-01 2002-06-13 David W. Deaton System, method, and database for processing transactions
US6415287B1 (en) 2000-01-20 2002-07-02 International Business Machines Corporation Method and system for mining weighted association rule
US20020116457A1 (en) 2001-02-22 2002-08-22 John Eshleman Systems and methods for managing distributed database resources
US6453404B1 (en) 1999-05-27 2002-09-17 Microsoft Corporation Distributed data cache with memory allocation model
US6473757B1 (en) 2000-03-28 2002-10-29 Lucent Technologies Inc. System and method for constraint based sequential pattern mining
US6490582B1 (en) 2000-02-08 2002-12-03 Microsoft Corporation Iterative validation and sampling-based clustering using error-tolerant frequent item sets
US6507843B1 (en) 1999-08-14 2003-01-14 Kent Ridge Digital Labs Method and apparatus for classification of data by aggregating emerging patterns
US6567936B1 (en) 2000-02-08 2003-05-20 Microsoft Corporation Data clustering using error-tolerant frequent item sets
US6665669B2 (en) 2000-01-03 2003-12-16 Db Miner Technology Inc. Methods and system for mining frequent patterns
US6760718B2 (en) 2000-07-07 2004-07-06 Mitsubishi Denki Kabushiki Kaisha Database operation processor
US20040225742A1 (en) 2003-05-09 2004-11-11 Oracle International Corporation Using local locks for global synchronization in multi-node systems
US20050044087A1 (en) 2003-08-18 2005-02-24 Oracle International Corporation Dynamic selection of frequent itemset counting technique
US20050044094A1 (en) 2003-08-18 2005-02-24 Oracle International Corporation Expressing frequent itemset counting operations
US20050044062A1 (en) 2003-08-18 2005-02-24 Oracle International Corporation Frequent itemset counting using clustered prefixes and index support
US20050149540A1 (en) 2000-12-20 2005-07-07 Chan Wilson W.S. Remastering for asymmetric clusters in high-load scenarios
US6968335B2 (en) 2002-11-14 2005-11-22 Sesint, Inc. Method and system for parallel processing of database queries

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4601525A (en) * 1985-05-20 1986-07-22 Amp Incorporated Cover for chip carrier sockets
US6042412A (en) * 1998-08-28 2000-03-28 The Whitaker Corporation Land grid array connector assembly

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
"The Authoritative Dictionary of IEEE Standards Terms Seventh Edition," IEEE Press, 2000. *
Current Claims in Canadian Patent Application No. 2,448,050 (48 pgs.).
Current Claims in European Patent Application No. 01968979.3-2212 (3 pgs).
Current Claims in PCT Patent Application No. PCT/US02/06981 (8 pgs.).
Office Action from Canadian Patent Application No. 2,448,050 dated Oct. 1, 2004 (2 pgs).
Office Action from European Patent Application No. 01968979.3-2212, dated Aug. 6, 2004 (3 pgs.).
Oracle Corporation, "Oracle® Data Mining, Concepts," 10g Release 1 (10.1), Part No. B10698-01, Dec. 2003, 118 pages.
Wei, Li et al., "Computing Frequent Itemsets Inside Oracle 10G," Proceedings of the 30th VLDB Conference, Toronto, Canada, Aug. 29, 2004, 4 pages.
Written Opinion from PCT Patent Application No. PCT/US02/06981 dated Oct. 3, 2004 (8 pgs.).

Also Published As

Publication number Publication date
US20070067327A1 (en) 2007-03-22
US7155446B2 (en) 2006-12-26
US20060004807A1 (en) 2006-01-05

Similar Documents

Publication Publication Date Title
US7698312B2 (en) Performing recursive database operations
US7779008B2 (en) Parallel partition-wise aggregation
US6622138B1 (en) Method and apparatus for optimizing computation of OLAP ranking functions
US6430550B1 (en) Parallel distinct aggregates
US8433702B1 (en) Horizon histogram optimizations
US6205451B1 (en) Method and apparatus for incremental refresh of summary tables in a database system
US20180081939A1 (en) Techniques for dictionary based join and aggregation
US8315980B2 (en) Parallel execution of window functions
US20060161546A1 (en) Method for sorting data
US20110029557A1 (en) Techniques for partition pruning
US8352476B2 (en) Frequent itemset counting using clustered prefixes and index support
US11475006B2 (en) Query and change propagation scheduling for heterogeneous database systems
EP1738290A1 (en) Partial query caching
US10977280B2 (en) Systems and methods for memory optimization interest-driven business intelligence systems
CN108628898A (en) The method, apparatus and equipment of data loading
US20160092134A1 (en) Scalable, multi-dimensional search for optimal configuration
CN110309122A (en) Obtain method, apparatus, server and the storage medium of incremental data
CN110928900B (en) Multi-table data query method, device, terminal and computer storage medium
US6389410B1 (en) Method for minimizing the number of sorts required for a query block containing window functions
US20180060374A1 (en) Optimizing column based database table compression
US8655911B2 (en) Expressing frequent itemset counting operations
CN113468169B (en) Hardware database query method, database system query method and device
CN115599801A (en) Data query method, system, electronic equipment and storage medium
US7720790B2 (en) Dynamic selection of frequent itemset counting technique
CN110929207B (en) Data processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
STCF Information on status: patent grant. Free format text: PATENTED CASE
CC Certificate of correction
FPAY Fee payment. Year of fee payment: 4
MAFP Maintenance fee payment. Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552). Year of fee payment: 8
MAFP Maintenance fee payment. Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 12