US20130205009A1 - Overhead management for event tracing - Google Patents

Overhead management for event tracing

Info

Publication number
US20130205009A1
US20130205009A1 (application US13/400,973)
Authority
US
United States
Prior art keywords
trace
quota
application server
trace process
throughput
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/400,973
Inventor
Patrick Malloy
Peter Anthony CROSBY
Robert Meagher
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Riverbed Technology LLC
Opnet Technologies Inc
Original Assignee
Opnet Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US13/400,973 priority Critical patent/US20130205009A1/en
Application filed by Opnet Technologies Inc filed Critical Opnet Technologies Inc
Assigned to OPNET TECHNOLOGIES, INC. reassignment OPNET TECHNOLOGIES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CROSBY, PETER ANTHONY, MALLOY, PATRICK, MEAGHER, ROBERT
Assigned to MORGAN STANLEY & CO. LLC reassignment MORGAN STANLEY & CO. LLC SECURITY AGREEMENT Assignors: OPNET TECHNOLOGIES, INC., RIVERBED TECHNOLOGY, INC.
Assigned to OPNET TECHNOLOGIES LLC reassignment OPNET TECHNOLOGIES LLC CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: OPNET TECHNOLOGIES, INC.
Assigned to RIVERBED TECHNOLOGY, INC. reassignment RIVERBED TECHNOLOGY, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: OPNET TECHNOLOGIES LLC
Assigned to OPNET TECHNOLOGIES, INC. reassignment OPNET TECHNOLOGIES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MALLOY, PATRICK, CROSBY, PETER ANTHONY, MEAGHER, ROBERT
Publication of US20130205009A1 publication Critical patent/US20130205009A1/en
Assigned to RIVERBED TECHNOLOGY, INC. reassignment RIVERBED TECHNOLOGY, INC. RELEASE OF PATENT SECURITY INTEREST Assignors: MORGAN STANLEY & CO. LLC, AS COLLATERAL AGENT
Assigned to JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT reassignment JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT PATENT SECURITY AGREEMENT Assignors: RIVERBED TECHNOLOGY, INC.
Assigned to RIVERBED TECHNOLOGY, INC. reassignment RIVERBED TECHNOLOGY, INC. RELEASE OF SECURITY INTEREST IN PATENTS Assignors: BARCLAYS BANK PLC
Assigned to RIVERBED TECHNOLOGY, INC. reassignment RIVERBED TECHNOLOGY, INC. CORRECTIVE ASSIGNMENT TO CORRECT THE CONVEYING PARTY NAME PREVIOUSLY RECORDED ON REEL 035521 FRAME 0069. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST IN PATENTS. Assignors: JPMORGAN CHASE BANK, N.A.

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/14Arrangements for monitoring or testing data switching networks using software, i.e. software packages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3495Performance evaluation by tracing or monitoring for systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/86Event-based monitoring
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/87Monitoring of transactions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/508Monitor


Abstract

The present invention relates to managing data generated by transaction tracing software, such as event tracing software. In one embodiment, data generated by event tracing software is monitored. The throughput of the data generated may then be modulated based on various criteria, such as a target data rate. The throughput target may be specified on a per-system or individual basis. Based on the throughput, the level of detail recorded is modulated. Individual processes may determine a limit or quota depending on their contribution to the throughput. In one embodiment, the method calls for a trace are modified with different property specifications to meet a desired throughput of event tracing data.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of priority of U.S. Provisional Application No. 61/439,658 filed Feb. 4, 2011, and U.S. Non-Provisional application Ser. No. 13/365,496 filed Feb. 3, 2012, entitled “Overhead Management for Event Tracing Software,” which is incorporated by reference in its entirety.
  • BACKGROUND
  • Application performance management relates to technologies and systems for monitoring and managing the performance of applications. For example, application performance management is commonly used to monitor and manage transactions performed by an application running on a server to a client.
  • Today, many applications can be accessed over a network, such as the Internet or intranet. For example, due to the ubiquity of web browsers on most client devices, web applications have become particularly popular. Web applications typically employ a browser-supported infrastructure, such as Java or a .NET framework. However, the performance of these types of applications is difficult to monitor and manage because of the complexity of the software and hardware and numerous components that may be involved.
  • A transaction typically comprises a sequence of method calls in a program that represent a complete set of operations necessary to perform a self-contained unit of work, such as a web request or a database query. Transactions can be traced to monitor and manage their performance. For example, a trace can be performed in an application server to obtain detailed information about the execution of an application within that server.
  • In a traditional transaction trace for web applications, Java or .NET instrumentation components are running (on the application server, the client, etc.) and write records of all of the method calls of a transaction to a transaction trace file. Such tracing must be initiated manually or triggered by a program condition, and runs for only a limited period of time. It is necessary to limit trace duration and detail in conventional systems because the act of tracing is relatively expensive and could negatively impact performance and disk space of the server, the client, etc.
  • Unfortunately, this means that in many circumstances the execution of an application within a system cannot be diagnosed or monitored.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:
  • FIG. 1 illustrates an exemplary system in accordance with the principles of the present invention.
  • FIG. 2 illustrates an exemplary process flow in accordance with the principles of the present invention.
  • Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.
  • DETAILED DESCRIPTION
  • The embodiments relate to monitoring and managing applications, such as web applications running via the hardware and software in a web infrastructure. In particular, the embodiments provide a framework for tracing as many transactions as possible in real-time. The framework may support continuous, periodic, or on-demand tracing. In one embodiment, whenever possible, the application performance management systems and methods will attempt to trace every call in every transaction. In one embodiment, a throughput manager manages the tradeoff between performance and completeness of detail harvested by the tracing process or continuous tracing process, while maintaining a low overhead and minimizing impact on the system's performance.
  • In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular architecture, interfaces, techniques, etc., in order to provide an understanding of the concepts of the invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments, which depart from these specific details.
  • Certain embodiments of the inventions will now be described. These embodiments are presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. For example, for purposes of simplicity and clarity, detailed descriptions of well-known components, such as circuits, are omitted so as not to obscure the description of the present invention with unnecessary detail. To illustrate some of the embodiments, reference will now be made to the figures.
  • FIG. 1 illustrates an exemplary system to support a multi-tier application and an application performance management system. As shown, the system 100 may comprise a set of clients 102, a web server 104, application servers 106, a database server 108, a database 110, and an application performance management system 112. The application performance management system 112 may comprise a collector 114, a monitoring server 116, and a monitoring database 118. The application performance management system 112 may also be accessed via a monitoring client 120. These components will now be further described.
  • Clients 102 refer to any device requesting and accessing services of applications provided by system 100. Clients 102 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, application software, etc. For example, clients 102 may be implemented on a personal computer, a laptop computer, a tablet computer, a smart phone, and the like. Such devices are known to those skilled in the art and may be employed in one embodiment.
  • The clients 102 may access various applications based on client software running or installed on the clients 102. The clients 102 may execute a thick client, a thin client, or a hybrid client. For example, the clients 102 may access applications via a thin client, such as a browser application like Internet Explorer, Firefox, etc. Programming for these thin clients may include, for example, JavaScript/AJAX, JSP, ASP, PHP, Flash, Silverlight, and others. Such browsers and programming code are known to those skilled in the art.
  • Alternatively, the clients 102 may execute a thick client, such as a stand-alone application, installed on the clients 102. Programming for thick clients may be based on the .NET framework, Java, Visual Studio, etc.
  • Web server 104 provides content for the applications of system 100 over a network, such as network 124. Web server 104 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. to deliver application content. For example, web server 104 may deliver content via HTML pages and employ various IP protocols, such as HTTP.
  • Application servers 106 provide a hardware and software environment on which the applications of system 100 may execute. In one embodiment, application servers 106 may be implemented as Java application servers, Windows Server implementing a .NET framework, LINUX, UNIX, WebSphere, etc., running on known hardware platforms. Application servers 106 may be implemented on the same hardware platform as the web server 104, or as shown in FIG. 1, they may be implemented on their own hardware.
  • In one embodiment, application servers 106 may provide various applications, such as mail, word processors, spreadsheets, point-of-sale, multimedia, etc. Application servers 106 may perform various transactions related to requests by the clients 102. In addition, application servers 106 may interface with the database server 108 and database 110 on behalf of clients 102, implement business logic for the applications, and perform other functions known to those skilled in the art.
  • Database server 108 provides database services to database 110 for transactions and queries requested by clients 102. Database server 108 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. For example, database server 108 may be implemented based on Oracle, DB2, Ingres, SQL Server, MySQL, etc., software running on the server 108.
  • Database 110 represents the storage infrastructure for data and information requested by clients 102. Database 110 may be implemented using known hardware and software. For example, database 110 may be implemented as a relational database based on known database management systems, such as SQL, MySQL, etc. Database 110 may also comprise other types of databases, such as object-oriented databases, XML databases, and so forth.
  • Application performance management system 112 represents the hardware and software used for monitoring and managing the applications provided by system 100. As shown, application performance management system 112 may comprise a collector 114, a monitoring server 116, a monitoring database 118, a monitoring client 120, and agents 122. These components will now be further described.
  • Collector 114 collects application performance information from the components of system 100. For example, collector 114 may receive information from clients 102, web server 104, application servers 106, database server 108, and network 124. The application performance information may comprise a variety of information, such as trace files, system logs, etc. Collector 114 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. For example, collector 114 may be implemented as software running on a general-purpose server. Alternatively, collector 114 may be implemented as an appliance or virtual machine running on a server.
  • Monitoring server 116 hosts the application performance management system. Monitoring server 116 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc. Monitoring server 116 may be implemented as software running on a general-purpose server. Alternatively, monitoring server 116 may be implemented as an appliance or virtual machine running on a server.
  • Monitoring database 118 provides a storage infrastructure for storing the application performance information processed by the monitoring server 116. Monitoring database 118 may be implemented using known hardware and software, such as a processor, a memory, communication interfaces, an operating system, etc.
  • Monitoring client 120 serves as an interface for accessing monitoring server 116. For example, monitoring client 120 may be implemented as a personal computer running an application or web browser accessing the monitoring server 116.
  • Agents 122 serve as instrumentation for the application performance management system. As shown, the agents 122 may be distributed and running on the various components of system 100. Agents 122 may be implemented as software running on the components or may be a hardware device coupled to the component. For example, agents 122 may implement monitoring instrumentation for Java and .NET framework applications. In one embodiment, the agents 122 implement, among other things, tracing of method calls for various transactions. In particular, in one embodiment, agents 122 may interface with known tracing configurations provided by Java and the .NET framework to enable tracing periodically, continuously, or in response to various events and to modulate the level of detail of the tracing.
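  • For illustration only, the following minimal Java sketch shows one way an agent-style wrapper could time an instrumented method call and hand the resulting record to a throughput manager that decides how much of it to write; the names MethodCallRecorder, ThroughputManager, and the record() signature are hypothetical assumptions, not an implementation prescribed by this disclosure.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of an agent-style wrapper around an instrumented call.
// Names and structure are illustrative; the patent does not specify them.
public class MethodCallRecorder {

    /** Minimal stand-in for the component that decides what gets written. */
    public interface ThroughputManager {
        void record(String methodName, long durationMicros, Map<String, String> properties);
    }

    private final ThroughputManager throughputManager;

    public MethodCallRecorder(ThroughputManager throughputManager) {
        this.throughputManager = throughputManager;
    }

    /** Times the wrapped work, then offers the call record to the throughput manager. */
    public <T> T traceCall(String methodName, Map<String, String> properties, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long durationMicros = (System.nanoTime() - start) / 1_000L;
            // The throughput manager may write full detail, drop some properties,
            // or skip the event entirely, depending on the current quota.
            throughputManager.record(methodName, durationMicros, properties);
        }
    }
}
```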
  • In one embodiment, the agents 122 may implement or comprise a throughput manager to allow for continuous tracing of the node or entity being monitored, such as clients 102 or application server 106. As noted, conventional tracing on a server, such as application server 106, must be initiated manually or triggered by a program condition, and runs for only a limited period of time. Conventionally, it is considered necessary to limit trace duration and detail because the act of tracing is relatively expensive and could negatively impact performance and disk space of the application server 106.
  • In contrast, the embodiments permit continuous, rather than intermittent, tracing of an entity. The continuous tracing may be performed for various durations. In addition, in the embodiments, the continuous tracing may be temporarily suspended. However, in one embodiment, the throughput manager in agents 122 may continue to run and re-initiate tracing when system performance allows. For example, in one embodiment, the agents 122 automatically modulate the level of detail written to meet a set of throughput goals set by the user. In one embodiment, the user, for example via monitoring client 120, may set a target data rate, such as in kilobytes per second, and a maximum amount of disk space to be used by agents 122.
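  • As a small, hedged illustration of the user-set throughput goals mentioned above, the sketch below models them as a simple value object holding a target data rate and a maximum amount of disk space; the field names and units are assumptions for illustration only.

```java
// Illustrative only: user-set throughput goals for continuous tracing,
// e.g., a target data rate in kilobytes per second and a disk-space cap.
public class ThroughputGoalsExample {

    record ThroughputGoals(double targetKilobytesPerSecond, long maxDiskSpaceBytes) {}

    public static void main(String[] args) {
        // Example: 100 KB/s target rate and at most 2 GB of trace files on disk.
        ThroughputGoals goals = new ThroughputGoals(100.0, 2L * 1024 * 1024 * 1024);
        System.out.println(goals);
    }
}
```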
  • In one embodiment, the user-defined target data rate applies communally on a per-system basis to all processes for which continuous tracing has been enabled. Based on the communal rate, the agents 122 may set quotas for the individual contributing processes.
  • In one embodiment, the amount of data being written communally by agents 122 is measured based on a time interval. For example, the agents 122 may measure the communal data rate every 30 seconds, 1 minute, 2 minutes, etc.
  • Based on the communal data rate measured, the agents 122 may then adjust the level of transaction method call detail written to a transaction trace file to ensure these targets are met. If the current data rate is low enough, the agents 122 allow every detail of each method call, including information tags known as properties. A property is a pair of strings comprising a name and a value. The name of a property derives from a set of strings that identify characteristics, such as method arguments, environment settings at the time of a call, etc., to be associated with each specific method call of a transaction. For example, properties such as SQL statements, database URLs, HTTP methods, etc. may be traced in the embodiments. If, however, the data rate of trace data written by agents 122 becomes excessive, the agents 122 will omit some property details, or even some method call events themselves, from the transaction trace file.
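  • As a concrete illustration of such properties, the short Java sketch below models a trace event whose properties are name-value string pairs, such as an SQL statement, a database URL, or an HTTP method; the property names and record types shown are illustrative assumptions only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a trace event carrying name-value string properties
// of the kind described above (SQL statement, database URL, HTTP method).
public class TraceEventExample {

    record TraceEvent(String methodName, long durationMicros, Map<String, String> properties) {}

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("sql.statement", "SELECT * FROM orders WHERE id = ?");
        properties.put("db.url", "jdbc:mysql://db.example.com/orders");
        properties.put("http.method", "GET");

        TraceEvent event = new TraceEvent("OrderService.findOrder", 420, properties);
        System.out.println(event);
    }
}
```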
  • Network 124 serves as a communications infrastructure for the system 100. Network 124 may comprise various known network elements, such as routers, firewalls, hubs, switches, etc. In one embodiment, network 124 may support various communications protocols, such as TCP/IP. Network 124 may refer to any scale of network, such as a local area network, a metropolitan area network, a wide area network, the Internet, etc.
  • FIG. 2 illustrates an exemplary process flow for continuous tracing. For purposes of illustration, FIG. 2 provides an example of continuous tracing by an agent 122 monitoring one of application servers 106. Those skilled in the art will recognize that the continuous tracing by agents 122 may also be employed in other components or portions of the system 100.
  • Referring now to FIG. 2, in phase 200, the agent 122 receives a target data rate. The target data rate may be provided to the agent 122 in a variety of ways. For example, a user may access monitoring client 120 and specify a desired target data rate for continuously monitoring one of application servers 106. Monitoring server 116 may then communicate this data rate to agent 122 via network 124. Alternatively, a user may directly access agent 122 locally, for example, on application server 106 via a command interface or other interface provided by agent 122.
  • In phase 202, the agent 122 determines a communal data rate and quotas based on the target data rate for various processes running on the application server 106. In one embodiment, the agent 122 may divide the communal data rate in various ways to determine individual quotas for the processes. For example, the agent 122 may divide the communal data rate evenly among the current processes running on application server 106. Alternatively, the agent 122 may individually set different quota rates for different processes based on their characteristics, such as process type, duration, etc.
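  • The hedged Java sketch below illustrates the two division strategies just described: splitting a communal target rate evenly across the traced processes, or weighting the split by a per-process characteristic. The method names and the use of numeric weights are illustrative assumptions rather than the disclosed implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: deriving per-process quotas from a communal target data rate.
public class QuotaDivisionExample {

    /** Even split of the communal rate (e.g., bytes per second) across all traced processes. */
    static Map<String, Double> evenQuotas(double communalRate, List<String> processes) {
        Map<String, Double> quotas = new LinkedHashMap<>();
        for (String process : processes) {
            quotas.put(process, communalRate / processes.size());
        }
        return quotas;
    }

    /** Weighted split, e.g., by process type, expected duration, or traffic. */
    static Map<String, Double> weightedQuotas(double communalRate, Map<String, Double> weights) {
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> quotas = new LinkedHashMap<>();
        weights.forEach((process, weight) -> quotas.put(process, communalRate * (weight / total)));
        return quotas;
    }

    public static void main(String[] args) {
        // Example: a 100,000 bytes/second communal target split across three processes.
        System.out.println(evenQuotas(100_000, List.of("jvm-1", "jvm-2", "jvm-3")));
        System.out.println(weightedQuotas(100_000, Map.of("jvm-1", 2.0, "jvm-2", 1.0, "jvm-3", 1.0)));
    }
}
```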
  • In phase 204, the agent 122 monitors the data rate consumed, such as traffic level, by the processes in relation to their quota for tracing. For example, the agent 122 may monitor CPU time or cycles, number of bytes written to disk space, and the like.
  • In phase 206, the agent 122 modulates the level of detail written by the individual processes for tracing. In one embodiment, the agent 122 employs a token-bucket algorithm based on a process quota to modulate the level of detail written by each individual process depending on its traffic level. The token-bucket algorithm is a mechanism that monitors some rate of resource usage or data transfer. The bucket is initially holding a specific number of tokens.
  • Each time a defined quantum of a resource is used (such as a number of bytes written to a disk, or number of CPU cycles used to process data) during a fixed period, a token is removed from the bucket. Each time a resource is not used during the same period, a token is added to the bucket, until the token count is restored to its initial level.
  • Accordingly, the number of tokens remaining in the bucket fluctuates between zero and a maximum, such as the initial number assigned. In one embodiment, agent 122 can use the percentage of tokens remaining compared to the initial number of tokens as a level of activity or throughput regulator.
  • At the start of each interval, the throughput manager of the agent 122 uses the current bucket token count to compute a threshold value, for example, from 0 to 10, where 0=all tokens present in the bucket, and 10=no tokens present in the bucket. In other words, these values represent the percentage of the maximum number of tokens that have been consumed from the bucket, truncated to the nearest 10 percent.
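  • A minimal sketch of the token-bucket bookkeeping described above, assuming tokens are consumed per quantum of trace output and replenished at the start of each interval; the 0-to-10 threshold is derived from the fraction of tokens missing from the bucket (0 = bucket full, 10 = bucket empty). Class and method names are hypothetical.

```java
// Illustrative token-bucket sketch: tokens are consumed as trace data is written
// and replenished each interval; the threshold (0..10) reflects how much of the
// bucket has been drained (0 = all tokens present, 10 = no tokens present).
public class TokenBucketExample {

    private final long capacity;          // maximum (and initial) token count
    private final long refillPerInterval; // tokens restored at the start of each interval
    private long tokens;

    public TokenBucketExample(long capacity, long refillPerInterval) {
        this.capacity = capacity;
        this.refillPerInterval = refillPerInterval;
        this.tokens = capacity;
    }

    /** Consume one token per quantum of resource used (e.g., per kilobyte written). */
    public void consume(long quanta) {
        tokens = Math.max(0, tokens - quanta);
    }

    /** Called at the start of each measurement interval to restore tokens. */
    public void refill() {
        tokens = Math.min(capacity, tokens + refillPerInterval);
    }

    /** Threshold from 0 (all tokens present) to 10 (no tokens present), truncated. */
    public int threshold() {
        long missing = capacity - tokens;
        return (int) ((missing * 10) / capacity);
    }

    public static void main(String[] args) {
        TokenBucketExample bucket = new TokenBucketExample(1000, 200);
        bucket.consume(650);                    // heavy trace output this interval
        System.out.println(bucket.threshold()); // 6: more detail will be filtered
        bucket.refill();
        System.out.println(bucket.threshold()); // 4: some detail can return
    }
}
```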
  • In one embodiment, the agent 122 assigns each of the values from 0 to 10 two filter values: a number (a minimum call duration) and a set of properties. The number and the property set are used by the agent 122 to restrict the amount of detailed information about each method call that is to be written to the trace file during the period, until the next threshold value is computed at the start of the next period.
  • If the data rate is too high, such as when fewer tokens are available in the bucket, the throughput manager in agent 122 omits lower duration method calls. Properties, which are name-value pairs such as SQL statements, database URLs, etc., may also be omitted if the data rate is excessive. Thus, the agent 122 can continuously determine a detail level based on the quota and traffic level.
  • For purposes of illustration, a simplified example is provided below showing the bucket divided into three levels, such as 0, 5, and 10. For level 0, there is no minimum method duration, and thus, the agent 122 traces all method calls.
  • As also shown, three possible properties, A, B, and C, may be specified by the agent 122 to indicate an allowable level of detail. In the simplified example shown, property A is always written to a trace. Property B, however, is written for levels 0 through 5, and property C is only written for level 0.
  • Throughput Level   Minimum call duration          Properties to write
    0                  0 (all calls may be written)   A, B, C
    5                  5 microseconds                 A, B
    10                 20 microseconds                A
  • When agent 122 restricts the data written in any given period of time, the trace traffic of the processes is thus lowered, and the token bucket maintained by agent 122 replenishes tokens, eventually permitting more data to be written in future periods.
  • When modulating the data rate, in one embodiment, the agent 122 may enforce a stepped policy that omits method calls and calls to children based on call duration versus current throughput level. For method calls that are permitted to be written, the subset of associated property name-value pairs is also selected from a set that is defined for each throughput level. In other words, in one embodiment, each data filtering or permitted detail output level corresponds to a minimum call duration and the set of property-value pairs to include. The highest detail level includes all items. In one embodiment, the lowest level of detail may correspond to various levels that minimize impact to system performance. For example, the lowest level of detail may correspond to tracing being temporarily suspended.
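  • The sketch below turns the simplified three-level table above into code: each throughput level maps to a minimum call duration and the set of property names that may still be written, and a method-call record is either omitted or reduced accordingly. The table values mirror the example above; the class, record, and method names are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: applying the level -> (minimum duration, allowed properties)
// policy from the simplified table above to individual method-call records.
public class DetailFilterExample {

    record Filter(long minDurationMicros, Set<String> allowedProperties) {}

    // Mirrors the three-level example: level 0 writes everything, level 5 requires
    // at least 5 microseconds and drops property C, level 10 requires at least
    // 20 microseconds and keeps only property A.
    static final Map<Integer, Filter> FILTERS = Map.of(
            0,  new Filter(0,  Set.of("A", "B", "C")),
            5,  new Filter(5,  Set.of("A", "B")),
            10, new Filter(20, Set.of("A")));

    /** Returns the properties to write for this call, or null if the call is omitted. */
    static Map<String, String> filterCall(int level, long durationMicros,
                                          Map<String, String> properties) {
        Filter filter = FILTERS.get(level);
        if (durationMicros < filter.minDurationMicros()) {
            return null; // call too short for the current throughput level: omit it
        }
        Map<String, String> kept = new LinkedHashMap<>();
        properties.forEach((name, value) -> {
            if (filter.allowedProperties().contains(name)) {
                kept.put(name, value);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("A", "sql", "B", "dbUrl", "C", "httpMethod");
        System.out.println(filterCall(0, 3, props));   // all properties kept
        System.out.println(filterCall(5, 3, props));   // null: call omitted (too short)
        System.out.println(filterCall(10, 25, props)); // only property A kept
    }
}
```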
  • In other embodiments, the agent 122 may enforce other policies to ensure compliance with the communal data rate. For example, the agent 122 may throttle one or more processes, cap the data rate of one or more processes, and the like. The throughput manager in agent 122 may implement any form of scheduling and policing algorithm.
  • In one embodiment, to permit a consumer of the transaction trace file to observe the effects of modulating the detail level, the agent 122 may write the current detail level to the trace file when it changes.
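  • As a small illustration of that idea, the hedged sketch below writes a marker record into the trace output whenever the computed detail level differs from the previous one, so a downstream consumer can account for the varying amount of detail; the marker format shown is an assumption, not one prescribed by this disclosure.

```java
import java.io.PrintWriter;

// Illustrative only: record the current detail level in the trace output
// whenever it changes, so a trace consumer can interpret gaps in detail.
public class DetailLevelMarker {

    private final PrintWriter traceOut;
    private int lastLevel = -1; // sentinel: no level written yet

    public DetailLevelMarker(PrintWriter traceOut) {
        this.traceOut = traceOut;
    }

    /** Called at the start of each interval with the newly computed detail level. */
    public void onIntervalStart(int newLevel, long timestampMillis) {
        if (newLevel != lastLevel) {
            // Hypothetical marker format.
            traceOut.printf("# detail-level %d at %d%n", newLevel, timestampMillis);
            lastLevel = newLevel;
        }
    }

    public static void main(String[] args) {
        DetailLevelMarker marker = new DetailLevelMarker(new PrintWriter(System.out, true));
        marker.onIntervalStart(0, System.currentTimeMillis()); // written
        marker.onIntervalStart(0, System.currentTimeMillis()); // suppressed: level unchanged
        marker.onIntervalStart(5, System.currentTimeMillis()); // written
    }
}
```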
  • The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. Other system configuration and optimization features will be evident to one of ordinary skill in the art in view of this disclosure, and are included within the scope of the following claims.
  • The features and attributes of the specific embodiments disclosed above may be combined in different ways to form additional embodiments, all of which fall within the scope of the present disclosure. Although the present disclosure provides certain embodiments and applications, other embodiments that are apparent to those of ordinary skill in the art, including embodiments, which do not provide all of the features and advantages set forth herein, are also within the scope of this disclosure. Accordingly, the scope of the present disclosure is intended to be defined only by reference to the appended claims.

Claims (16)

1. A method of modulating an amount of resources used by a trace process running on an application server to record transaction data, wherein the application server is one of a plurality of systems running trace processes, said method comprising:
receiving, for the application server, a target data rate that is a portion of a communal limit applied collectively to the plurality of systems running trace processes;
determining a quota for a trace process running on the application server based on the target data rate;
determining an amount of resources used by the trace process to record transaction data; and
modulating the trace process based on the amount of resources used by the application in relation to the quota and in relation to the communal limit.
2. The method of claim 1, wherein receiving the target data rate comprises receiving a requested data rate from a user.
3. The method of claim 1, wherein determining the quota for the trace process comprises determining a quota for a number of bytes written to a storage.
4. The method of claim 1, wherein determining the quota for the trace process comprises determining a quota for a number of processor cycles.
5. The method of claim 1, wherein determining the quota for the trace process comprises determining a quota for a maximum amount of storage space that may be used by the trace process.
6. The method of claim 1, wherein modulating the trace process based on the amount of resources used in relation to the quota comprises modulating the method calls traced by the trace process.
7. The method of claim 1, wherein modulating the trace process based on the amount of resources used in relation to the quota comprises tracing method calls having at least a minimum call duration.
8. The method of claim 1, wherein modulating the trace process based on the amount of resources used in relation to the quota comprises tracing method calls having a set of selected properties.
9. An application server configured to trace transactions serviced by the application server and modulate the resources consumed by tracing the transactions, wherein the application server is one of a plurality of systems tracing transactions, said application server comprising:
at least one processor configured to execute program code for at least one application, at least one trace process, and a throughput manager;
wherein the at least one application, running on the application server, performs transactions requested of the at least one application;
wherein the at least one trace process, running on the application server, records trace information related to the transactions;
a storage for storing trace information recorded by the at least one trace process; and
wherein the throughput manager is configured to receive a requested limit for the at least one trace process, wherein the requested limit is a portion of a communal limit applied collectively to the plurality of systems tracing transactions, determine a quota for the at least one process, and modulate the at least one trace process based on resources of the application server consumed by the at least one process in relation to the quota and in relation to the communal limit.
10. The application server of claim 9, wherein the throughput manager is configured to determine a quota for a number of bytes written by the at least one trace process to the storage.
11. The application server of claim 9, wherein the throughput manager is configured to determine a quota for a number of processor cycles used to service the at least one trace process.
12. The application server of claim 9, wherein the throughput manager is configured to determine a quota for a maximum amount of storage space that may be used by the trace process.
13. The application server of claim 9, wherein the throughput manager is configured to modulate the method calls traced by the trace process.
14. The application server of claim 9, wherein the throughput manager is configured to trace method calls only having at least a minimum call duration.
15. The application server of claim 9, wherein the throughput manager is configured to trace method calls only having a set of selected properties.
16. A method for modulating an amount of data recorded by a first system for a plurality of transactions performed by the first system, wherein the first system is one of a plurality of systems recording transactions, said method comprising:
monitoring a data activity rate by the first system relative to a communal limit that is applied collectively to the plurality of systems recording transactions;
determining a throughput level for the first system based on the data activity rate, wherein the throughput level is associated with a minimum call duration and a set of property types; and
recording one or more select method calls and one or more select properties by the first system based on modulating the throughput level of the first system in relation to the communal limit, wherein the one or more select method calls are selected based on the associated minimum call duration, and wherein the one or more select properties are selected based on the set of property types.
US13/400,973 2011-02-04 2012-02-21 Overhead management for event tracing Abandoned US20130205009A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/400,973 US20130205009A1 (en) 2011-02-04 2012-02-21 Overhead management for event tracing

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201161439658P 2011-02-04 2011-02-04
US13/365,496 US9137136B2 (en) 2011-02-04 2012-02-03 Overhead management for event tracing
US13/400,973 US20130205009A1 (en) 2011-02-04 2012-02-21 Overhead management for event tracing

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/365,496 Continuation US9137136B2 (en) 2011-02-04 2012-02-03 Overhead management for event tracing

Publications (1)

Publication Number Publication Date
US20130205009A1 true US20130205009A1 (en) 2013-08-08

Family

ID=45582066

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/365,496 Active 2032-06-07 US9137136B2 (en) 2011-02-04 2012-02-03 Overhead management for event tracing
US13/400,973 Abandoned US20130205009A1 (en) 2011-02-04 2012-02-21 Overhead management for event tracing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/365,496 Active 2032-06-07 US9137136B2 (en) 2011-02-04 2012-02-03 Overhead management for event tracing

Country Status (3)

Country Link
US (2) US9137136B2 (en)
EP (1) EP2671362A2 (en)
WO (1) WO2012106571A2 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140019598A1 (en) * 2013-01-25 2014-01-16 Concurix Corporation Tracing with a Workload Distributor
US9021262B2 (en) 2013-01-25 2015-04-28 Concurix Corporation Obfuscating trace data
US9137136B2 (en) 2011-02-04 2015-09-15 Riverbed Technology, Inc. Overhead management for event tracing
US9207969B2 (en) 2013-01-25 2015-12-08 Microsoft Technology Licensing, Llc Parallel tracing for performance and detail
CN105277191A (en) * 2014-06-30 2016-01-27 卡西欧计算机株式会社 Electronic device and arrival determination method
US9575874B2 (en) 2013-04-20 2017-02-21 Microsoft Technology Licensing, Llc Error list and bug report analysis for configuring an application tracer
US9658936B2 (en) 2013-02-12 2017-05-23 Microsoft Technology Licensing, Llc Optimization analysis using similar frequencies
US9665474B2 (en) 2013-03-15 2017-05-30 Microsoft Technology Licensing, Llc Relationships derived from trace data
US9767006B2 (en) 2013-02-12 2017-09-19 Microsoft Technology Licensing, Llc Deploying trace objectives using cost analyses
US9772927B2 (en) 2013-11-13 2017-09-26 Microsoft Technology Licensing, Llc User interface for selecting tracing origins for aggregating classes of trace data
US9804949B2 (en) 2013-02-12 2017-10-31 Microsoft Technology Licensing, Llc Periodicity optimization in an automated tracing system
US9864672B2 (en) 2013-09-04 2018-01-09 Microsoft Technology Licensing, Llc Module specific tracing in a shared module environment

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9639446B2 (en) * 2009-12-21 2017-05-02 International Business Machines Corporation Trace monitoring
US9323863B2 (en) 2013-02-01 2016-04-26 Microsoft Technology Licensing, Llc Highlighting of time series data on force directed graph
US9256969B2 (en) 2013-02-01 2016-02-09 Microsoft Technology Licensing, Llc Transformation function insertion for dynamically displayed tracer data
US9021447B2 (en) 2013-02-12 2015-04-28 Concurix Corporation Application tracing by distributed objectives
US8843901B2 (en) * 2013-02-12 2014-09-23 Concurix Corporation Cost analysis for selecting trace objectives
US9734040B2 (en) 2013-05-21 2017-08-15 Microsoft Technology Licensing, Llc Animated highlights in a graph representing an application
US8990777B2 (en) 2013-05-21 2015-03-24 Concurix Corporation Interactive graph for navigating and monitoring execution of application code
US9280841B2 (en) 2013-07-24 2016-03-08 Microsoft Technology Licensing, Llc Event chain visualization of performance data
US9912570B2 (en) * 2013-10-25 2018-03-06 Brocade Communications Systems LLC Dynamic cloning of application infrastructures
WO2015071777A1 (en) 2013-11-13 2015-05-21 Concurix Corporation Software component recommendation based on multiple trace runs
US9483378B2 (en) * 2014-05-21 2016-11-01 Dynatrace Llc Method and system for resource monitoring of large-scale, orchestrated, multi process job execution environments
US9654483B1 (en) * 2014-12-23 2017-05-16 Amazon Technologies, Inc. Network communication rate limiter
US10296436B2 (en) 2016-09-08 2019-05-21 International Business Machines Corporation Adjusting trace points based on overhead analysis
US11307958B2 (en) * 2018-09-19 2022-04-19 International Business Machines Corporation Data collection in transaction problem diagnostic
US10592378B1 (en) 2018-10-04 2020-03-17 Ca, Inc. Managing monitoring feature overhead
WO2020150642A1 (en) * 2019-01-18 2020-07-23 GalaxE.Solutions, Inc. Systems and methods for transaction tracing within an it environment
US11438239B2 (en) * 2020-06-22 2022-09-06 New Relic, Inc. Tail-based span data sampling
US20230078122A1 (en) * 2021-09-07 2023-03-16 Elasticsearch B.V. Distributed network data management systems and methods

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040267897A1 (en) * 2003-06-24 2004-12-30 Sychron Inc. Distributed System Providing Scalable Methodology for Real-Time Control of Server Pools and Data Centers
US20080008095A1 (en) * 2006-07-10 2008-01-10 International Business Machines Corporation Method for Distributed Traffic Shaping across a Cluster

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6775824B1 (en) 2000-01-12 2004-08-10 Empirix Inc. Method and system for software object testing
US7870431B2 (en) * 2002-10-18 2011-01-11 Computer Associates Think, Inc. Transaction tracer
US20040148237A1 (en) * 2003-01-29 2004-07-29 Msafe Ltd. Real time management of a communication network account
US20050223366A1 (en) * 2004-03-30 2005-10-06 Tonic Solutions, Inc. System and methods for transaction tracing
US7827539B1 (en) * 2004-06-25 2010-11-02 Identify Software Ltd. System and method for automated tuning of program execution tracing
US7761875B2 (en) * 2005-06-10 2010-07-20 Hewlett-Packard Development Company, L.P. Weighted proportional-share scheduler that maintains fairness in allocating shares of a resource to competing consumers when weights assigned to the consumers change
US7716335B2 (en) * 2005-06-27 2010-05-11 Oracle America, Inc. System and method for automated workload characterization of an application server
US20100229218A1 (en) * 2009-03-05 2010-09-09 Microsoft Corporation Quota management for network services
US9529694B2 (en) 2009-09-14 2016-12-27 Oracle International Corporation Techniques for adaptive trace logging
US8660022B2 (en) 2009-11-16 2014-02-25 International Business Machines Corporation Adaptive remote decision making under quality of information requirements
EP2671362A2 (en) 2011-02-04 2013-12-11 OPNET Technologies, Inc. Overhead management for event tracing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040267897A1 (en) * 2003-06-24 2004-12-30 Sychron Inc. Distributed System Providing Scalable Methodology for Real-Time Control of Server Pools and Data Centers
US20080008095A1 (en) * 2006-07-10 2008-01-10 International Business Machines Corporation Method for Distributed Traffic Shaping across a Cluster

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9137136B2 (en) 2011-02-04 2015-09-15 Riverbed Technology, Inc. Overhead management for event tracing
US20140019598A1 (en) * 2013-01-25 2014-01-16 Concurix Corporation Tracing with a Workload Distributor
US8954546B2 (en) * 2013-01-25 2015-02-10 Concurix Corporation Tracing with a workload distributor
US9021262B2 (en) 2013-01-25 2015-04-28 Concurix Corporation Obfuscating trace data
US9207969B2 (en) 2013-01-25 2015-12-08 Microsoft Technology Licensing, Llc Parallel tracing for performance and detail
US10178031B2 (en) 2013-01-25 2019-01-08 Microsoft Technology Licensing, Llc Tracing with a workload distributor
US9767006B2 (en) 2013-02-12 2017-09-19 Microsoft Technology Licensing, Llc Deploying trace objectives using cost analyses
US9658936B2 (en) 2013-02-12 2017-05-23 Microsoft Technology Licensing, Llc Optimization analysis using similar frequencies
US9804949B2 (en) 2013-02-12 2017-10-31 Microsoft Technology Licensing, Llc Periodicity optimization in an automated tracing system
US9665474B2 (en) 2013-03-15 2017-05-30 Microsoft Technology Licensing, Llc Relationships derived from trace data
US9575874B2 (en) 2013-04-20 2017-02-21 Microsoft Technology Licensing, Llc Error list and bug report analysis for configuring an application tracer
US9864672B2 (en) 2013-09-04 2018-01-09 Microsoft Technology Licensing, Llc Module specific tracing in a shared module environment
US9772927B2 (en) 2013-11-13 2017-09-26 Microsoft Technology Licensing, Llc User interface for selecting tracing origins for aggregating classes of trace data
CN105277191A (en) * 2014-06-30 2016-01-27 卡西欧计算机株式会社 Electronic device and arrival determination method

Also Published As

Publication number Publication date
US20130145015A1 (en) 2013-06-06
WO2012106571A2 (en) 2012-08-09
WO2012106571A3 (en) 2012-09-20
US9137136B2 (en) 2015-09-15
EP2671362A2 (en) 2013-12-11

Similar Documents

Publication Publication Date Title
US9137136B2 (en) Overhead management for event tracing
US10333863B2 (en) Adaptive resource allocation based upon observed historical usage
CN109073350B (en) Predictive summary and caching of application performance data
US8775585B2 (en) Autonomic SLA breach value estimation
US8316101B2 (en) Resource management system for hosting of user solutions
US7664847B2 (en) Managing workload by service
US9811445B2 (en) Methods and systems for the use of synthetic users to performance test cloud applications
US9128792B2 (en) Systems and methods for installing, managing, and provisioning applications
US20150067171A1 (en) Cloud service brokering systems and methods
US8943269B2 (en) Apparatus and method for meeting performance metrics for users in file systems
US9235491B2 (en) Systems and methods for installing, managing, and provisioning applications
CA2533744C (en) Hierarchical management of the dynamic allocation of resources in a multi-node system
US20120278578A1 (en) Cost-aware replication of intermediate data in dataflows
US20130024442A1 (en) System load query governor
US9135259B2 (en) Multi-tenancy storage node
US9317269B2 (en) Systems and methods for installing, managing, and provisioning applications
US20170278087A1 (en) Virtual machine pricing model
US20160094392A1 (en) Evaluating Configuration Changes Based on Aggregate Activity Level
US8725868B2 (en) Interactive service management
US10761726B2 (en) Resource fairness control in distributed storage systems using congestion data
WO2015122880A1 (en) Monitoring a computing environment
Zeng et al. Argus: A Multi-tenancy NoSQL store with workload-aware resource reservation
US20220188289A1 (en) Online file system consistency check for container data on a clustered filesystem
US20220191215A1 (en) Control of usage of computing services based on dynamic grouping
US11265250B2 (en) Targeted rate limiting of tenant systems in online services

Legal Events

Date Code Title Description
AS Assignment

Owner name: OPNET TECHNOLOGIES, INC., MARYLAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MALLOY, PATRICK;CROSBY, PETER ANTHONY;MEAGHER, ROBERT;SIGNING DATES FROM 20120502 TO 20120503;REEL/FRAME:028188/0187

AS Assignment

Owner name: MORGAN STANLEY & CO. LLC, MARYLAND

Free format text: SECURITY AGREEMENT;ASSIGNORS:RIVERBED TECHNOLOGY, INC.;OPNET TECHNOLOGIES, INC.;REEL/FRAME:029646/0060

Effective date: 20121218

AS Assignment

Owner name: OPNET TECHNOLOGIES LLC, MARYLAND

Free format text: CHANGE OF NAME;ASSIGNOR:OPNET TECHNOLOGIES, INC.;REEL/FRAME:030411/0273

Effective date: 20130401

AS Assignment

Owner name: RIVERBED TECHNOLOGY, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OPNET TECHNOLOGIES LLC;REEL/FRAME:030462/0148

Effective date: 20130401

AS Assignment

Owner name: OPNET TECHNOLOGIES, INC., MARYLAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MALLOY, PATRICK;CROSBY, PETER ANTHONY;MEAGHER, ROBERT;SIGNING DATES FROM 20121025 TO 20121031;REEL/FRAME:030484/0530

AS Assignment

Owner name: RIVERBED TECHNOLOGY, INC., CALIFORNIA

Free format text: RELEASE OF PATENT SECURITY INTEREST;ASSIGNOR:MORGAN STANLEY & CO. LLC, AS COLLATERAL AGENT;REEL/FRAME:032113/0425

Effective date: 20131220

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:RIVERBED TECHNOLOGY, INC.;REEL/FRAME:032421/0162

Effective date: 20131220

Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT

Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:RIVERBED TECHNOLOGY, INC.;REEL/FRAME:032421/0162

Effective date: 20131220

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: RIVERBED TECHNOLOGY, INC., CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:BARCLAYS BANK PLC;REEL/FRAME:035521/0069

Effective date: 20150424

AS Assignment

Owner name: RIVERBED TECHNOLOGY, INC., CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE CONVEYING PARTY NAME PREVIOUSLY RECORDED ON REEL 035521 FRAME 0069. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:035807/0680

Effective date: 20150424