WO2015105498A1 - Auto completion of source code constructs - Google Patents

Publication number
WO2015105498A1
WO2015105498A1 PCT/US2014/010951 US2014010951W WO2015105498A1 WO 2015105498 A1 WO2015105498 A1 WO 2015105498A1 US 2014010951 W US2014010951 W US 2014010951W WO 2015105498 A1 WO2015105498 A1 WO 2015105498A1
Authority
WO
WIPO (PCT)
Prior art keywords
source code
constructs
features
processor
previously used
Prior art date
Application number
PCT/US2014/010951
Other languages
French (fr)
Inventor
Ohad Assulin
Elad BENEDICT
Amit BEZALEL
Original Assignee
Hewlett-Packard Development Company, L.P.
Application filed by Hewlett-Packard Development Company, L.P.
Priority to PCT/US2014/010951
Publication of WO2015105498A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G06F8/33: Intelligent editors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/274: Converting codes to words; Guess-ahead of partial word inputs

Abstract

Disclosed herein are techniques for auto completion of source code constructs. Source code samples are used to differentiate between proper and improper source code constructs. Auto completion options are displayed such that the completions are ordered based at least partially on a likelihood that each option results in a proper construct.

Description

AUTO COMPLETION OF SOURCE CODE CONSTRUCTS
[0001] Computer programs may contain instructions that describe actions to be performed by a computer processor. A computer programmer may create the instructions ("source code") of a computer program. A programmer may edit a program's source code manually or may be assisted by an integrated development environment ("IDE").
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 is a block diagram of an example system in accordance with aspects of the present disclosure.
[0003] FIG. 2 is a flow diagram of an example method in accordance with aspects of the present disclosure.
[0004] FIG. 3 is an example multidimensional space in accordance with aspects of the present disclosure.
[0005] FIG. 4 is a further example multidimensional space in accordance with aspects of the present disclosure.
DETAILED DESCRIPTION
[0006] As noted above, a programmer may manually edit a program's source code or may be assisted by an IDE. A programmer may use a typed programming language to implement the source code of a computer program. In a typed programming language, a variable's type may limit the operations applicable to that variable. By way of example, a variable of type String may contain the value "text." A compiler of a programming language may reject an attempt to divide a number by such a variable because its type is defined as a string of characters and not as an integer. Thus, types may make it easier for a compiler to validate language constraints on the source code.
[0007] A static data type may be determined when the program is compiled or before the program is executed. Static types may be explicitly defined or declared by a programmer. Many popular programming languages, such as C++, C# and Java, allow programmers to explicitly define static types that may be detected by a compiler. In contrast, dynamic data types may be determined at execution time. Therefore, dynamic data types may be associated with run-time values rather than predefined textual expressions. In this instance, a programmer is not required to explicitly define such types. However, type errors cannot be automatically detected until the program is executed. Some examples of dynamically typed languages include, but are not limited to, Lisp, Perl, Python, JavaScript, and Ruby.
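By way of a non-limiting illustration of the last point, the following Python sketch shows a type error that surfaces only when the offending statement executes. Python is one of the dynamically typed languages listed above; the function name `divide_by` is hypothetical and is not part of the disclosure.

```python
def divide_by(value):
    # No type is declared for `value`; its type is known only at run time.
    return 10 / value


print(divide_by(2))  # a numeric argument works

try:
    divide_by("text")  # the call itself is accepted without complaint ...
except TypeError as err:
    # ... and the mismatch is reported only when the division executes.
    print("type error at run time:", err)
```

In a statically typed language, a compiler could reject the second call before the program runs.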
[0008] IDEs often provide programmers with auto completion to make coding more efficient and easier. By way of example, a type Integer may have been predefined with two methods, add and subtract. A programmer may define a variable i of type Integer. While the programmer types "i.", a drop down box may appear displaying the methods add and subtract, which allows the programmer to simply click on the method of choice. Upon clicking the method of choice, the IDE may insert the selected method into the code. Such auto completions may save a programmer time while coding. Unfortunately, auto completion of source code constructs based on dynamic types is often problematic, since the properties of these types are unknown until runtime. While dynamic types have become increasingly popular of late, the rise in popularity of dynamic types may also make auto code completion less effective at saving time while coding.
[0009] In view of the foregoing, disclosed herein are a system, non-transitory computer readable medium, and method for auto completion of source code constructs. In one example, source code samples may be used to differentiate between proper and improper source code constructs. In another example, auto completion options may be displayed such that the completions are ordered based at least partially on a likelihood that each option would result in a proper source code construct. The techniques disclosed herein may be used to provide auto completion options for dynamic types that are not predefined. Thus, the techniques disclosed herein may provide auto completion options to programmers even while coding dynamic types. The aspects, features and advantages of the present disclosure will be appreciated when considered with reference to the following description of examples and accompanying figures. The following description does not limit the application; rather, the scope of the disclosure is defined by the appended claims and equivalents.
[0010] FIG. 1 presents a schematic diagram of an illustrative computer apparatus 100 for executing the techniques disclosed herein. The computer apparatus 100 may include all the components normally used in connection with a computer. For example, it may have a keyboard and mouse and/or various other types of input devices such as pen-inputs, joysticks, buttons, touch screens, etc., as well as a display, which could include, for instance, a CRT, LCD, plasma screen monitor, TV, projector, etc. Computer apparatus 100 may also comprise a network interface (not shown) to communicate with other devices over a network. The computer apparatus 100 may also contain a processor 110, which may be any number of well-known processors, such as processors from Intel® Corporation. In another example, processor 110 may be an application specific integrated circuit ("ASIC"). Non-transitory computer readable medium ("CRM") 112 may store instructions that may be retrieved and executed by processor 110. In one example, the instructions may include a learning module 114 and a code completion module 116. Non-transitory CRM 112 may be used by or in connection with any instruction execution system that can fetch or obtain the logic therefrom and execute the instructions contained therein.
[0011] Non-transitory CRM 112 may comprise any one of many physical media such as, for example, electronic, magnetic, optical, electromagnetic, or semiconductor media. More specific examples of suitable non-transitory CRM include, but are not limited to, a portable magnetic computer diskette such as floppy diskettes or hard drives, a read-only memory ("ROM"), an erasable programmable read-only memory, a portable compact disc or other storage devices that may be coupled to computer apparatus 100 directly or indirectly. Alternatively, non-transitory CRM 112 may be a random access memory ("RAM") device or may be divided into multiple memory segments organized as dual in-line memory modules ("DIMMs"). The non-transitory CRM 112 may also include any combination of one or more of the foregoing and/or other devices as well. While only one processor and one non-transitory CRM are shown in FIG. 1, computer apparatus 100 may actually comprise additional processors and memories that may or may not be stored within the same physical housing or location.
[0012] The instructions residing in non-transitory CRM 112 may comprise any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by processor 110. In this regard, the terms "instructions," "scripts," and "applications" may be used interchangeably herein. The computer executable instructions may be stored in any computer language or format, such as in object code or modules of source code. Furthermore, it is understood that the instructions may be implemented in the form of hardware, software, or a combination of hardware and software and that the examples herein are merely illustrative.
[0013] As will be discussed in more detail below, learning module 114 may instruct processor 110 to differentiate between proper and improper source code constructs contained in source code samples. Furthermore, code completion module 116 may instruct processor 110 to display an ordered list of auto completion options when a source code prefix is detected such that the completions are ordered based on a likelihood that each completion results in a proper source code construct, as determined by the learning module, when appended to the prefix.
[0014] Working examples of the system, method, and non-transitory computer-readable medium are shown in FIGS. 2-4. In particular, FIG. 2 illustrates a flow diagram of an example method 200 for source code auto completion. FIG. 3 is an example of features of source code samples plotted in a multidimensional space and FIG. 4 is an example of features in previously used constructs plotted in the multidimensional space. The actions shown in FIGS. 3-4 will be discussed below with regard to the flow diagram of FIG. 2.
[0015] As shown in block 202 of FIG. 2, features of source code samples may be categorized to distinguish between proper and improper source code constructs. In one implementation, a team of researchers may determine which sample source code constructs are proper and improper. Such determination may be done visually or with the assistance of automated tools, such as dimensionality reduction algorithms (e.g., Kernel principal component analysis, multi-linear principal component analysis, etc.). In one example, a proper source code construct may be defined as a construct that would compile successfully and execute successfully at runtime. The features of the source code constructs may include, but are not limited to, number of characters in the construct, presence of special characters, or the presence of certain key words.
[0016] In another example, cross validation may be employed to determine which of the extracted features are most indicative of proper and improper source code constructs. Cross validation is a statistical technique for estimating the accuracy of a predictive model. Cross validation may filter out features that seem significant within the context of a limited data set, but are insignificant generally. Thus, cross validation prevents researchers from accepting that a feature is highly indicative of a proper source code construct generally based on a limited data set. One round of cross-validation may involve partitioning a sample of data into complementary subsets. One subset may be used as a training set and another set may be used to validate the analysis of the training set. Multiple rounds of cross-validation may be performed using different partitions and the validation results may be averaged over the multiple rounds.
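By way of a non-limiting illustration, the rounds of cross validation described above may be sketched in Python as follows. The helper names (`k_fold_splits`, `cross_validate`), the toy `train` and `score` functions, and the sample data are all hypothetical and are used only to show the partition, validate, and average steps.

```python
def k_fold_splits(samples, k):
    """Yield (training, validation) pairs; each fold serves as the validation set once."""
    folds = [samples[i::k] for i in range(k)]  # k complementary subsets
    for i in range(k):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation


def cross_validate(samples, k, train, score):
    """Average the validation score over k rounds with different partitions."""
    scores = []
    for training, validation in k_fold_splits(samples, k):
        model = train(training)
        scores.append(score(model, validation))
    return sum(scores) / len(scores)


# Hypothetical labeled samples: (has_special_character, is_proper)
samples = [(1, True), (1, True), (0, False), (0, False), (1, True), (0, False)]


def train(training):
    # Toy "model": the set of feature values that were seen with a proper label.
    return {feature for feature, label in training if label}


def score(model, validation):
    # Fraction of validation samples whose predicted label matches the true label.
    hits = sum((feature in model) == label for feature, label in validation)
    return hits / len(validation)


print(cross_validate(samples, 3, train, score))
```

A feature that scores well in only one partition but poorly on average would be a candidate for removal.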
[0017] Referring now to FIG. 3, a multidimensional space 300 is shown. For ease of illustration, only three dimensions are depicted, but it should be understood that many more dimensions may be used depending on the number of features detected. That is, each feature may be represented by a dimension in the graph. Learning module 114 may model the determination of proper and improper source code constructs as a binary classification problem. In one example, learning module 114 may comprise a support vector machine ("SVM") algorithm. An SVM algorithm is a binary classifier that may be employed to categorize new data into one of two classes (e.g., proper or improper source code constructs) based on a set of training samples. However, it is understood that other algorithms may be employed, such as, but not limited to, naive Bayes or neural networks.
[0018] As noted above, learning module 114 may be provided with a set of source code samples such that each sample is manually labeled as a proper or improper source code construct. Moreover, each sample submitted to learning module 114 may be accompanied by an associated vector and each value in the vector may correspond to one of the detected features. Learning module 114 may plot these features in an n-dimensional space such that n is equal to the number of detected features. Since the vectors are already labeled as proper and improper, learning module 114 may associate different patterns of vector values with one of the two categories. By way of example, there may be three features detected during analysis of the source code: number of characters, whether the construct contains a particular key word, and whether the construct contains a special character. Thus, a training source code construct of "t.ssn_number" may be represented by the vector <12, 0, 1>, wherein 12 is the number of characters in the construct, 0 indicates that the construct does not contain a keyword, and 1 indicates that the construct does have a special character (in this example the special character is "_"). Learning module 114 may plot this vector in a three-dimensional space. As noted above, three dimensions are used in the examples herein for ease of illustration. That is, if twelve features are detected, learning module 114 may plot such features in a twelve dimensional space.
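By way of a non-limiting illustration, the three example features may be extracted as in the following Python sketch. The function name `extract_features` is hypothetical. The disclosure does not define "key word" or "special character" precisely, so the sketch assumes that a key word is a Python reserved word appearing as a dot-separated part and that a special character is any character other than a letter, a digit, or the "." separator.

```python
import keyword


def extract_features(construct):
    """Return [number of characters, contains key word, contains special character]."""
    length = len(construct)
    # Assumption: a "key word" is a Python reserved word used as a dotted part.
    has_keyword = int(any(keyword.iskeyword(part) for part in construct.split(".")))
    # Assumption: a "special character" is anything but letters, digits, and ".".
    has_special = int(any(not (ch.isalnum() or ch == ".") for ch in construct))
    return [length, has_keyword, has_special]


print(extract_features("t.ssn_number"))  # [12, 0, 1]
```

Each additional feature would simply add one more entry, and therefore one more dimension, to the returned vector.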
[0019] Multidimensional space 300 shown in FIG. 3 may be generated by learning module 114 in accordance with three features. Each point plotted in cluster 304 may be associated with proper source code constructs and each point plotted in cluster 302 may be associated with improper source code constructs. Learning module 114 may identify a boundary 306 that differentiates the two classes of source code constructs. As will be discussed further below, boundary 306 may be a decision boundary that may be used to assess future source code constructs. Thus, one goal of learning module 114 may be to determine the line or hyperplane, out of all possible lines or hyperplanes, that best represents the boundary between proper and improper source code constructs. If a boundary could not be found, learning module 114 may utilize statistical techniques, such as a Gaussian kernel, to rearrange the graph.
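By way of a non-limiting illustration, the search for a separating hyperplane may be sketched in Python as follows. For brevity the sketch swaps in a perceptron rather than an SVM: a perceptron finds some hyperplane that separates linearly separable data, whereas an SVM would select the maximum-margin hyperplane. The function names and the small training vectors (laid out as length, key word, special character) are hypothetical values chosen only so that the toy example converges quickly.

```python
def train_perceptron(vectors, labels, epochs=100):
    """Learn weights w and bias b so that the sign of w.x + b matches each label (+1 or -1)."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(vectors, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point lies on its correct side
            break
    return w, b


def classify(w, b, x):
    """Return +1 (proper side) or -1 (improper side) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical labeled training vectors: +1 = proper, -1 = improper.
vectors = [[3, 0, 1], [6, 1, 0], [2, 0, 1], [7, 1, 1], [2, 0, 0], [8, 1, 0]]
labels = [1, -1, 1, -1, 1, -1]

w, b = train_perceptron(vectors, labels)
print([classify(w, b, x) for x in vectors])
```

The learned pair (w, b) plays the role of the decision boundary: a new feature vector is assessed by which side of the hyperplane it falls on.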
[0020] Referring back to FIG. 2, it may be determined whether a source code prefix is detected, as shown in block 204. After learning module 114 is trained, the resulting graph may be used to rank other source code constructs. In another example, real-time typing of source code may be monitored to detect a source code prefix. One popular source code format is "someObject.XXX". In this instance, upon detection of the prefix "someObject.", code completion module 116 may display an ordered list of auto completion options. The completions may be ranked or ordered based at least partially on an analysis of the multidimensional space and on a likelihood that each completion results in a proper source code construct when appended to the prefix, as shown in block 206 of FIG. 2. In another example, the auto completion options may include previously used completions of previously used constructs. In a further example, previously used completions may be completions used for constructs in a software project associated with the source code file in which the prefix was detected. In another aspect, features of the previously used constructs may be detected and plotted in the multidimensional space to determine the likelihood that each previously used completion would result in a proper construct when appended to the prefix. In turn, the programmer may select the desired previously used completion for the prefix.
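By way of a non-limiting illustration, prefix detection and the collection of previously used completions may be sketched in Python as follows. The function names, the regular expressions, and the sample project text are hypothetical. The sketch assumes that a prefix is an identifier followed by "." at the end of the text typed so far, and that the project's source is available as a single string.

```python
import re


def detect_prefix(typed_text):
    """Return the trailing 'someObject.' style prefix being typed, or None."""
    match = re.search(r"([A-Za-z_]\w*)\.$", typed_text)
    return match.group(0) if match else None


def previously_used_completions(prefix, project_source):
    """Collect distinct names already used after the prefix, in first-seen order."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"([A-Za-z_]\w*)")
    seen = []
    for name in pattern.findall(project_source):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical project text in which someObject has been used before.
project_source = "someObject.name = 1\nprint(someObject.size)\nsomeObject.name += 1\n"

prefix = detect_prefix("total = someObject.")
print(prefix)
print(previously_used_completions(prefix, project_source))
```

Each collected completion, appended to the prefix, yields a candidate construct whose features can then be plotted and ranked.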
[0021] Referring now to FIG. 4, constructs that have been used by a programmer are shown being plotted in the multidimensional space. FIG. 4 shows points 402, 404, 406, 408, and 410 plotted in the multidimensional space. These example points are indicative of feature vectors associated with previously used constructs. Learning module 114 may determine which side of boundary 306 to plot the constructs previously used by the programmer based on their features. As the distribution changes over time, learning module 114 may determine that a new boundary should be defined.
[0022] In one example, the likelihood that a completion of a previously used construct would result in a proper construct may be based on the distance between the detected features of the previously used constructs plotted in multidimensional space 300 and the boundary 306. As such, the further a vector associated with a given construct is plotted from boundary 306, the higher or lower the completion resulting in that construct is ranked in the auto complete list, depending on the side of boundary 306 on which the vector is plotted. In the example of FIG. 4, the completion resulting in the construct associated with point 406 may be ranked higher than any other, since it is the furthest from boundary 306 and it is on the "proper" side of boundary 306. The lowest ranked completion may be the completion resulting in the construct associated with point 408, since it is the furthest from boundary 306 and it is on the "improper" side of boundary 306.
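As a non-limiting sketch of the distance-based ranking described above, the example below assumes that the boundary is a linear hyperplane of the form weights · x + bias = 0, with positive signed distances lying on the "proper" side. The weights, the bias, the two-dimensional feature vectors, and the completion names are hypothetical values chosen for illustration; the disclosure does not specify them.

```python
import math


def signed_distance(features, weights, bias):
    """Signed distance from a feature vector to the hyperplane weights.x + bias = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, features)) + bias) / norm


def rank_completions(candidates, weights, bias):
    """Order completions from most to least likely to yield a proper construct."""
    return sorted(
        candidates,
        key=lambda name: signed_distance(candidates[name], weights, bias),
        reverse=True,
    )


# Assumed boundary and feature vectors, for illustration only.
weights, bias = (1.0, -1.0), 0.0
candidates = {
    "length": (1.0, 0.5),   # slightly on the proper side
    "lenght": (0.5, 2.5),   # far on the improper side
    "size": (3.0, 1.0),     # far on the proper side
    "foo": (1.0, 1.2),      # slightly on the improper side
}
ranked = rank_completions(candidates, weights, bias)
# ranked == ["size", "length", "foo", "lenght"]
```

Sorting on the signed distance, rather than the absolute distance, places the completion furthest on the proper side first and the completion furthest on the improper side last, which mirrors the treatment of points 406 and 408.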
[0023] Advantageously, the foregoing system, method, and non-transitory computer readable medium provide a ranked auto completion list for source code constructs associated with dynamic types. In this regard, rather than displaying auto completions randomly, the completions may be ranked based on features of the resulting constructs as compared to features learned from source code samples. In turn, programmers may continue to code efficiently despite their use of dynamic types.
[0024] Although the disclosure herein has been described with reference to particular examples, it is to be understood that these examples are merely illustrative of the principles of the disclosure. It is therefore to be understood that numerous modifications may be made to the examples and that other arrangements may be devised without departing from the spirit and scope of the disclosure as defined by the appended claims. Furthermore, while particular processes are shown in a specific order in the appended drawings, such processes are not limited to any particular order unless such order is expressly set forth herein; rather, processes may be performed in a different order or concurrently and steps may be added or omitted.

Claims

1. A system comprising:
a learning module which upon execution instructs at least one processor to differentiate between proper and improper source code constructs contained in source code samples; and
a code completion module which upon execution instructs at least one processor to display an ordered list of auto completion options when a source code prefix is detected such that the completions are ordered based on a likelihood that each completion would result in a proper source code construct, as determined by the learning module, when appended to the prefix.
2. The system of claim 1, wherein the learning module upon execution instructs at least one processor to plot features of the source code samples in a multidimensional space and determine a boundary within the plotted features that differentiates between proper source code constructs and improper source code constructs.
3. The system of claim 2, wherein the learning module upon execution instructs at least one processor to determine the boundary using a support vector machine algorithm.
4. The system of claim 2, wherein the completion module upon execution further instructs at least one processor to:
include previously used completions of previously used constructs in the list of auto completion options;
detect features of each previously used construct; and
plot the detected features in the multidimensional space to determine the likelihood that each previously used completion would result in a proper source code construct when appended to the prefix.
5. The system of claim 4, wherein the likelihood is further based on a distance between the detected features of the previously used constructs plotted in the multidimensional space and the boundary within the plotted features that distinguishes between proper source code constructs and improper source code constructs.
6. A non-transitory computer readable medium having instructions therein which, if executed, cause a processor to:
plot features of source code samples in a multidimensional space;
differentiate between proper and improper source code constructs based on an analysis of the plotted features;
monitor real-time typing of source code to detect a source code prefix; and
display an ordered list of auto completion options when the prefix is detected such that the completions are ordered based at least partially on an analysis of the multidimensional space and a likelihood that each completion results in a proper source code construct when appended to the prefix.
7. The non-transitory computer readable medium of claim 6, wherein the instructions therein, if executed, further instruct at least one processor to determine a boundary within the plotted features in the multidimensional space to differentiate between proper source code constructs and improper source code constructs.
8. The non-transitory computer readable medium of claim 7, wherein the boundary is determined using a support vector machine algorithm.
9. The non-transitory computer readable medium of claim 7, wherein the instructions therein, if executed, further instruct at least one processor to:
include previously used completions of previously used constructs in the list of auto completion options;
detect features of each previously used construct; and
plot the detected features in the multidimensional space to determine the likelihood that each previously used completion would result in a proper source code construct when appended to the prefix.
10. The non-transitory computer readable medium of claim 9, wherein the likelihood is further based on a distance between the detected features of the previously used constructs plotted in the multidimensional space and the boundary within the plotted features that distinguishes between proper source code constructs and improper source code constructs.
11. A method comprising:
plotting, using at least one processor, features of source code samples in a multidimensional space;
categorizing, using at least one processor, the plotted features as being indicative of proper source code constructs or improper source code constructs;
monitoring, using at least one processor, typing of source code to detect a source code prefix; and
if the prefix is detected, displaying, using at least one processor, a list of auto completion options that are ranked based at least partially on an analysis of the multidimensional space and on a likelihood that each completion results in a proper source code construct when appended to the prefix.
12. The method of claim 11, determining, using at least one processor, a boundary within the plotted features that differentiates between proper source code constructs and improper source code constructs.
13. The method of claim 12, wherein the boundary is determined using a support vector machine algorithm.
14. The method of claim 12, further comprising
including, using at least one processor, previously used completions of previously used constructs in the list of auto completion options;
detecting, using at least one processor, features of each previously used construct; and
plotting, using at least one processor, the detected features in the multidimensional space to determine the likelihood that each previously used completion would result in a proper source code construct when appended to the prefix.
15. The method of claim 12, wherein the likelihood is further based on a distance between the detected features of the previously used constructs plotted in the multidimensional space and the boundary within the plotted features that distinguishes between proper source code constructs and improper source code constructs.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2014/010951 WO2015105498A1 (en) 2014-01-10 2014-01-10 Auto completion of source code constructs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2014/010951 WO2015105498A1 (en) 2014-01-10 2014-01-10 Auto completion of source code constructs

Publications (1)

Publication Number Publication Date
WO2015105498A1 true WO2015105498A1 (en) 2015-07-16

Family

ID=53524211

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/010951 WO2015105498A1 (en) 2014-01-10 2014-01-10 Auto completion of source code constructs

Country Status (1)

Country Link
WO (1) WO2015105498A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070044066A1 (en) * 2005-08-19 2007-02-22 Microsoft Corporation Embedded multi-language programming
US8051408B1 (en) * 2004-09-13 2011-11-01 The Mathworks, Inc. Method of providing interactive usage descriptions based on source code analysis
US8364696B2 (en) * 2009-01-09 2013-01-29 Microsoft Corporation Efficient incremental parsing of context sensitive programming languages
US8458161B2 (en) * 1999-03-22 2013-06-04 Esdr Network Solutions Llc Method, product, and apparatus for enhancing resolution services, registration services, and search services

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170242668A1 (en) * 2016-02-24 2017-08-24 Microsoft Technology Licensing, Llc Content publishing
CN107239264A (en) * 2016-03-28 2017-10-10 阿里巴巴集团控股有限公司 The generation method and device of code prompt message
US10678514B2 (en) 2016-03-28 2020-06-09 Alibaba Group Holding Limited Method and device for generating code assistance information
CN107239264B (en) * 2016-03-28 2020-06-23 阿里巴巴集团控股有限公司 Method and device for generating code prompt information
US10235141B2 (en) 2016-06-28 2019-03-19 Hcl Technologies Ltd. Method and system for providing source code suggestion to a user in real-time

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14878052

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14878052

Country of ref document: EP

Kind code of ref document: A1