Summary of the invention
The technical problem to be solved by the present invention is to provide a sample file analysis method and device that extract visible character strings from a sample file in binary format as a basis for judging whether the file is a virus, and that filter the visible character strings for validity, thereby effectively reducing the analysis result set of the sample file and greatly improving virus detection and removal efficiency.
To solve the above technical problem, embodiments of the present invention provide a sample file analysis method, comprising:
Obtaining a sample file in binary format;
Mapping the sample file in binary format into memory;
Performing full-text string filtering analysis on the sample file in binary format mapped into memory, to obtain a filtered sample file in binary format;
Outputting the filtered sample file in binary format.
Wherein the step of performing full-text string filtering analysis on the sample file in binary format mapped into memory, to obtain the filtered sample file in binary format, comprises:
Performing full-text string matching on the sample file in binary format mapped into memory, according to a character set and the feature strings in a virus family library, to obtain the strings that fail to match;
Filtering out the strings that fail to match, to obtain the filtered sample file in binary format.
Wherein the step of performing full-text string matching on the sample file in binary format mapped into memory, according to the character set and the feature strings in the virus family library, comprises:
Matching the sample file in binary format mapped into memory against the character strings in the character set, to obtain the meaningless strings that fail to match and the sample file corresponding to the strings that match successfully;
Matching the sample file corresponding to the successfully matched strings against the feature strings in the virus family library, to obtain the strings to be filtered that fail to match.
Wherein the step of matching the sample file formed by the successfully matched strings against the feature strings in the virus family library, to obtain the strings to be filtered that fail to match, comprises:
Calculating the hash value of each string in the sample file corresponding to the successfully matched strings;
Calculating the hash value of each feature string in the virus family library;
Comparing the hash value of a string in the sample file in binary format with the hash value of a feature string in the virus family library; if the two values are not equal, the two compared strings are considered not to match and the string is taken as a string to be filtered; otherwise, the match is considered successful.
Wherein the step of comparing the hash value of a string in the sample file in binary format with the hash value of a feature string in the virus family library comprises:
Using a single processor instruction to compare the hash value of the string in the sample file in binary format with the hash value of the feature string in the virus family library.
Wherein the step of filtering out the strings that fail to match, to obtain the filtered sample file in binary format, comprises:
Filtering out the meaningless strings that fail to match and the strings to be filtered that fail to match, to obtain the filtered sample file in binary format.
Wherein the character set comprises UNICODE, UTF-8, GBK, GB2312 and/or MBCS character encodings.
Embodiments of the present invention also provide a sample file analysis device, comprising:
An obtaining module, configured to obtain a sample file in binary format;
A mapping module, configured to map the sample file in binary format into memory;
An analysis module, configured to perform full-text string filtering analysis on the sample file in binary format mapped into memory, to obtain a filtered sample file in binary format;
An output module, configured to output the filtered sample file in binary format.
Wherein the analysis module comprises:
A first analysis submodule, configured to perform full-text string matching on the sample file in binary format mapped into memory, according to a character set and the feature strings in a virus family library, to obtain the strings that fail to match;
A second analysis submodule, configured to filter out the strings that fail to match, to obtain the filtered sample file in binary format.
Wherein the first analysis submodule comprises:
A first matching module, configured to match the sample file in binary format mapped into memory against the character strings in the character set, to obtain the meaningless strings that fail to match and the sample file formed by the strings that match successfully;
A second matching module, configured to match the sample file formed by the successfully matched strings against the feature strings in the virus family library, to obtain the strings to be filtered that fail to match.
Wherein the second matching module comprises:
A first computing module, configured to calculate the hash value of each string in the sample file formed by the successfully matched strings;
A second computing module, configured to calculate the hash value of each feature string in the virus family library;
A matching submodule, configured to compare the hash value of a string in the sample file in binary format with the hash value of a feature string in the virus family library; if the two values are not equal, the two compared strings are considered not to match and the string is taken as a string to be filtered; otherwise, the match is considered successful.
Wherein the second analysis submodule is specifically configured to filter out the meaningless strings that fail to match and the strings to be filtered that fail to match, to obtain the filtered sample file in binary format.
The beneficial effects of the above technical solution of the present invention are as follows:
In the above solution, the obtained sample file in binary format is mapped into memory and full-text string filtering analysis is performed on it, so that the strings that fail to match are filtered out and the valid strings are extracted. This effectively reduces the analysis result set of the sample file and greatly improves virus detection and removal efficiency.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
As shown in Figure 1, embodiments of the present invention provide a sample file analysis method, comprising:
Step 11, obtaining a sample file in binary format;
Step 12, mapping the sample file in binary format into memory;
Step 13, performing full-text string filtering on the sample file in binary format mapped into memory, to obtain a filtered sample file in binary format;
Step 14, outputting the filtered sample file in binary format.
In this embodiment of the present invention, the obtained sample file in binary format is mapped into memory and full-text string filtering analysis is performed on it, which effectively reduces the analysis result set of the sample file and greatly improves virus detection and removal efficiency.
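As an illustration only, steps 11 and 12 might be sketched as follows. The embodiment does not prescribe a programming language or an operating system interface; Python's standard mmap module and the helper name mapped_sample are assumptions made for this sketch.

```python
import mmap
from contextlib import contextmanager


@contextmanager
def mapped_sample(path):
    """Yield a read-only memory mapping of the binary sample at `path`.

    Hypothetical helper for steps 11 and 12. The mapping and the file are
    closed when the `with` block ends. Mapping an empty file raises
    ValueError, so callers may want to skip zero-length samples.
    """
    with open(path, "rb") as sample:
        # Length 0 maps the whole file; ACCESS_READ keeps the sample unmodified.
        with mmap.mmap(sample.fileno(), 0, access=mmap.ACCESS_READ) as view:
            yield view
```

The mapping behaves like a bytes object, so later steps can slice it or search it without first copying the whole file into a separate buffer.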
In another embodiment of the present invention, on the basis of steps 11 to 14 above, step 13 comprises:
Step 131, performing full-text string matching on the sample file in binary format mapped into memory, according to a character set and the feature strings in a virus family library, to obtain the strings that fail to match;
Step 132, filtering out the strings that fail to match, to obtain the filtered sample file in binary format.
Wherein the character set covers all commonly used character sets, such as UNICODE, UTF-8, GBK, GB2312 and MBCS, and the virus family library comprises a set of feature strings formed from the feature strings corresponding to one or more identified types of virus.
In another embodiment of the present invention, on the basis of steps 11 to 14 above, step 131 comprises:
Step 1311, matching the sample file in binary format mapped into memory against the character strings in the character set, to obtain the meaningless strings that fail to match and the sample file corresponding to the strings that match successfully;
Step 1312, matching the sample file corresponding to the successfully matched strings against the feature strings in the virus family library, to obtain the strings to be filtered that fail to match.
In this embodiment, the sample file in binary format mapped into memory is matched against the character strings in the character set to obtain the meaningless strings that fail to match. A common-character filtering method is used here to exclude obviously meaningless strings, such as "the heir S cowherb of rewarding with food and drink spreads whiz and scalds", and to obtain the sample file corresponding to the successfully matched strings. This reduces the volume of the sample file in binary format and can significantly improve virus detection and removal efficiency.
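The first part of step 1311, locating candidate visible strings in the mapped bytes, might look like the sketch below. It covers only two encodings (single-byte ASCII and ASCII text stored as UTF-16LE); the minimum run length of 4 and the name extract_visible_strings are assumptions of this sketch and are not taken from the embodiment.

```python
import re

# Runs of at least 4 printable ASCII bytes (assumed threshold).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# The same characters stored as UTF-16LE: each byte is followed by 0x00.
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def extract_visible_strings(data):
    """Return candidate visible strings found in a bytes-like object.

    Hypothetical helper: ASCII runs are listed first, then UTF-16LE runs.
    Bytes outside these runs are treated as non-text and skipped.
    """
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16LE_RUN.finditer(data)]
    return found
```

A fuller version would also decode GBK, GB2312 and other multi-byte encodings named in the character set; those are omitted here to keep the example short.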
Further, in another embodiment of the present invention, step 1312 may comprise:
Step 13121, calculating the hash value of each string in the sample file corresponding to the successfully matched strings;
Step 13122, calculating the hash value of each feature string in the virus family library;
Step 13123, comparing the hash value of a string in the sample file in binary format with the hash value of a feature string in the virus family library; if the two values are not equal, the two compared strings are considered not to match and the string is taken as a string to be filtered; otherwise, the match is considered successful.
In this embodiment, the hash value of a string and the hash value of a feature string are each a DWORD (double word, that is, a 32-bit value) generated with the CRC32 algorithm. During string matching, a single processor instruction operating on these CRC32 values, for example a compare instruction between two 32-bit registers (CMP eRx, eRx), is enough to judge whether the CRC32 values of two strings are equal and hence whether the two strings match, which greatly improves analysis efficiency.
Specifically, in step 13123 above, the step of comparing the hash value of a string in the sample file in binary format with the hash value of a feature string in the family library comprises: using a single processor instruction to compare the hash value of the string in the sample file in binary format with the hash value of the feature string in the family library.
After the strings to be filtered that fail to match are obtained, they may be sorted, for example with the quicksort method, so that they can be forcibly filtered out.
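The CRC32-based comparison of steps 13121 to 13123 can be sketched as follows. This is a minimal illustration under stated assumptions: zlib.crc32 stands in for whichever CRC32 routine an implementation uses, strings are hashed as UTF-8 bytes, the names crc32_of and split_by_family are hypothetical, and Python's built-in sort (Timsort) is used in place of quicksort. Different strings can share a CRC32 value, so equality of the two values indicates a match with high probability rather than with certainty.

```python
import zlib


def crc32_of(text, encoding="utf-8"):
    """Return the CRC32 of the encoded string as an unsigned 32-bit integer."""
    return zlib.crc32(text.encode(encoding)) & 0xFFFFFFFF


def split_by_family(strings, feature_strings):
    """Split strings into (matched, to_filter) by comparing CRC32 values.

    A string is 'matched' when its CRC32 equals the CRC32 of some feature
    string; otherwise it is a string to be filtered. The strings to be
    filtered are returned sorted.
    """
    feature_hashes = {crc32_of(f) for f in feature_strings}
    matched, to_filter = [], []
    for s in strings:
        # Each membership test reduces to comparisons of 32-bit integers.
        if crc32_of(s) in feature_hashes:
            matched.append(s)
        else:
            to_filter.append(s)
    return matched, sorted(to_filter)
```

Hashing the feature strings once and reusing the resulting set keeps the per-string work to one CRC32 computation and one integer lookup.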
Correspondingly, in the above embodiment, step 132 may comprise: filtering out the meaningless strings that fail to match and the strings to be filtered that fail to match, to obtain the filtered sample file in binary format.
Wherein, in the above embodiment, the character set comprises UNICODE, UTF-8, GBK, GB2312 and MBCS character encodings. The 3500 commonly used Chinese characters, English letters, symbols and the like may be used as the valid character set for matching, thereby excluding invalid characters.
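Matching against a valid character set might be sketched as below. The table of 3500 commonly used Chinese characters is not reproduced in this text, so COMMON_HANZI holds only a handful of characters as a stand-in; that table, the rule that every character of a string must be valid, and the names is_meaningful and split_by_validity are all assumptions of this sketch.

```python
import string

# Stand-in for the table of 3500 commonly used Chinese characters
# (illustrative subset only; a real table would be loaded from data).
COMMON_HANZI = set("的一是在不了有和人这中大为上个国我以要他")

# English letters, digits, punctuation symbols and the space character.
VALID_CHARS = set(
    string.ascii_letters + string.digits + string.punctuation + " "
) | COMMON_HANZI


def is_meaningful(text):
    """Return True if text is non-empty and every character is valid."""
    return bool(text) and all(ch in VALID_CHARS for ch in text)


def split_by_validity(candidates):
    """Split candidate strings into (kept, dropped) by the valid set."""
    kept = [s for s in candidates if is_meaningful(s)]
    dropped = [s for s in candidates if not is_meaningful(s)]
    return kept, dropped
```

With this stand-in table, a string made of rarely used characters is dropped as meaningless, while ordinary identifiers and common Chinese words are kept for the later match against the virus family library.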
In the above embodiments of the present invention, the sample file in binary format is mapped into memory and matched against a valid character set covering all the character sets, so that obviously meaningless strings are excluded. This reduces the volume of the sample file in binary format and the amount of sample data to be analyzed, and can therefore significantly improve virus detection and removal efficiency. Further, the sample file remaining after the meaningless strings are excluded (that is, the extracted visible strings) is matched against the feature strings in the virus family library, and the strings that fail to match are filtered out, which effectively reduces the analysis result set and improves virus detection and removal efficiency.
As shown in Figure 2, embodiments of the present invention also provide a sample file analysis device, comprising:
An obtaining module 21, configured to obtain a sample file in binary format;
A mapping module 22, configured to map the sample file in binary format into memory;
An analysis module 23, configured to perform full-text string filtering analysis on the sample file in binary format mapped into memory, to obtain a filtered sample file in binary format;
An output module 24, configured to output the filtered sample file in binary format.
This device embodiment of the present invention likewise maps the obtained sample file in binary format into memory and performs full-text string filtering analysis on it, which effectively reduces the analysis result set of the sample file and greatly improves virus detection and removal efficiency.
Wherein the analysis module comprises:
A first analysis submodule, configured to perform full-text string matching on the sample file in binary format mapped into memory, according to a character set and the feature strings in a virus family library, to obtain the strings that fail to match;
A second analysis submodule, configured to filter out the strings that fail to match, to obtain the filtered sample file in binary format.
Wherein the character set covers all commonly used character sets, such as UNICODE, UTF-8, GBK, GB2312 and MBCS, and the virus family library comprises a set of feature strings formed from the feature strings corresponding to one or more identified types of virus.
Wherein the first analysis submodule comprises:
A first matching module, configured to match the sample file in binary format mapped into memory against the character strings in the character set, to obtain the meaningless strings that fail to match and the sample file formed by the strings that match successfully;
A second matching module, configured to match the sample file formed by the successfully matched strings against the feature strings in the virus family library, to obtain the strings to be filtered that fail to match.
Wherein the second matching module comprises:
A first computing module, configured to calculate the hash value of each string in the sample file formed by the successfully matched strings;
A second computing module, configured to calculate the hash value of each feature string in the virus family library;
A matching submodule, configured to compare the hash value of a string in the sample file in binary format with the hash value of a feature string in the virus family library; if the two values are not equal, the two compared strings are considered not to match and the string is taken as a string to be filtered; otherwise, the match is considered successful.
In this embodiment, the sample file in binary format mapped into memory is matched against the character strings in the character set to obtain the meaningless strings that fail to match. A common-character filtering method is used here to exclude obviously meaningless strings, such as "the heir S cowherb of rewarding with food and drink spreads whiz and scalds", and to obtain the sample file corresponding to the successfully matched strings. This reduces the volume of the sample file in binary format and can significantly improve virus detection and removal efficiency.
Wherein the second analysis submodule is specifically configured to filter out the meaningless strings that fail to match and the strings to be filtered that fail to match, to obtain the filtered sample file in binary format.
This device embodiment of the present invention likewise maps the sample file in binary format into memory and matches it against a valid character set covering all the character sets, so that obviously meaningless strings are excluded. This reduces the volume of the sample file in binary format and the amount of sample data to be analyzed, and can therefore significantly improve virus detection and removal efficiency. Further, the sample file remaining after the meaningless strings are excluded (that is, the extracted visible strings) is matched against the feature strings in the virus family library, and the strings that fail to match are filtered out, which effectively reduces the analysis result set and improves virus detection and removal efficiency.
The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make a number of improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.