CN103020482A - Relation-based spam comment detection method - Google Patents

Publication number
CN103020482A
Authority
CN
China
Prior art keywords
comment
reviewer
mark
obtains
hotel owner
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013100025837A
Other languages
Chinese (zh)
Inventor
张卫丰
王云
周国强
张迎周
王子元
周国富
钱小燕
许碧欢
陆柳敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-01-05
Filing date
2013-01-05
Publication date
2013-04-03
Application filed by Nanjing Post and Telecommunication University
Priority to CN2013100025837A
Publication of CN103020482A
Legal status: Pending

Abstract

The invention relates to a relation-based spam comment detection method built on the relationships among online-shopping reviewers, comments, and shop owners. The method introduces the concepts of reviewer credibility, comment honesty, and shop owner reliability, and derives the mutual relationship among the three: the more honest the comments a reviewer writes, the higher the reviewer's credibility; the more honest comments from credible reviewers a shop owner has, the higher the shop owner's reliability; and the more other honest comments support a comment, the higher that comment's honesty. This iterative relationship is proposed for the first time and applied to actual detection work. The relational feature is used to build a model, which is combined with models obtained from the other features of the three entities to give an improved model for spam comment detection.

Description

A relation-based spam comment detection method
Technical field
The present invention relates to a relation-based method for detecting spam comments. It analyzes the mutual relationships among reviewers, comments, and shop owners, proposes a model based on these relationships, and combines that model with models derived from the other features of the three entities in order to detect spam comments. It mainly addresses the narrowness and limitations of the models proposed by current spam comment detection techniques, and belongs to the fields of machine learning and data mining.
Background art
Online shopping comments give customers valuable information for comparing product quality, shop owner service, and many other aspects. However, spam reviewers have now appeared; their aim is to mislead ordinary customers' impressions of products or shop owners by posting false or unfair comments. One example is the professional negative reviewer: as the name suggests, a person who makes a living by giving others negative comments, an emerging occupation spawned by Taobao.
More broadly, most research on spam activity has concentrated on web pages and e-mail. Spam on web pages falls into two broad classes: content spam and link spam. Link spam is spam carried in hyperlinks; because comments generally contain no links, link spam does not appear in spam comments. Content spam means adding irrelevant text to a web page in order to deceive search engines; reviewers do not add irrelevant text to their comments. E-mail spam usually means sending unsolicited commercial advertisements. Advertisements can appear in comments, but they are rare.
Early spam comment detection algorithms all used reviewer behavior to identify spam reviewers, for example the similarity of comment texts, the similarity and deviation of ratings, and the number of products receiving spam comments. According to existing research, these behaviors are effective indicators for particular types of spam comment activity. For example, a reviewer who uses large amounts of similar text across comments on the same product, or who frequently gives different products unusually high or low ratings within a short period, is very likely a spam reviewer.
Jindal and Liu first raised the problem of spam comment detection in 2008. They divided spam comments into three types: false comments, comments on the brand only, and comments with no review content. They detected spam comments with a supervised method: first, extract a feature set describing the comment, the reviewer, and the product; then label spam comments, mainly by text similarity and some manual means. A classifier built from these features and the training data is then used to detect spam comments. This method depends heavily on text similarity and is effective only against that kind of spam comment behavior.
In 2010 Jindal proposed detecting spam comments with an unexpected-rule mining algorithm. Each comment is treated as a record associated with an evaluation class, where the classes are positive, negative, and neutral. The unexpected-rule mining algorithm then generates a list of unexpected rules. However, this method cannot identify actual spam reviewers; it can only find strange behavior that shows up as unexpected rules.
In 2010 Lim proposed another spam comment detection method based on reviewer behavior. They identified many characteristics of spam comment behavior, for example multiple ratings or comments, and the effort spent, on a single product or a group of products. Each reviewer receives a score on each of these features, the scores are combined linearly, and the final total is the reviewer's degree of suspicion. This method is unsupervised and saves much of the cost of manual labeling. According to their study, however, it still depends essentially on text similarity, so it too can detect only certain specific types of spam comments.
A shortcoming shared by all the above methods is that they study and use only the text or rating features of spam comments, which is limiting. A new method for detecting spam comments is therefore urgently needed. In online shopping, the reviewer, the comment, and the shop owner are not isolated individuals; many intrinsic relationships exist among them. Identifying these relationships, applying them to spam comment detection, and then identifying the dependence between this relational feature and other behavioral features will greatly improve detection accuracy.
Summary of the invention
Technical problem: the object of the invention is to provide a novel relation-based method for detecting spam comments. The relational features among reviewers, comments, and shop owners are used for modeling, and the resulting model is combined with models derived from the inherent features of the three entities, giving three interrelated models that represent the reviewer, the comment, and the shop owner respectively. Finally, these models yield the reviewer's credibility, the comment's honesty, and the shop owner's reliability, and spam comments are detected according to a given criterion.
Technical solution: the relation-based spam comment detection method proposed by the invention is a detection method based on the relational features of online-shopping reviewers, comments, and shop owners. It introduces the concepts of reviewer credibility, comment honesty, and shop owner reliability, and derives the mutual relationship among the three: the more honest the comments a reviewer writes, the higher the reviewer's credibility; the more honest comments from credible reviewers a shop owner has, the higher the shop owner's reliability; the more other honest comments support a comment, the higher that comment's honesty. Among current spam comment detection methods, this iterative relationship is proposed for the first time and applied to actual detection work. The relational feature is used for modeling, and the resulting model is combined with models obtained from the other features of the three entities to give an improved model for spam comment detection.
The relation-based spam comment detection method mainly comprises the following steps:
Step 1) Calculate the comment honesty score:
Step 1.1) Input the comment set information;
Step 1.2) Obtain the rating value and comment time of all comments;
Step 1.3) Calculate the mean rating and the earliest comment time;
Step 1.4) Obtain one comment record;
Step 1.5) Determine whether the comment record is empty; if it is not empty, go to step 1.6); otherwise, go to step 1.10);
Step 1.6) Calculate the honesty score of the comment:
Step 1.6.1) Obtain the rating value of this comment;
Step 1.6.2) Using the mean from step 1.3), calculate the rating difference;
Step 1.6.3) Obtain the comment time of this comment;
Step 1.6.4) Using the earliest comment time from step 1.3), calculate the comment time difference;
Step 1.6.5) Obtain the text of this comment;
Step 1.6.6) Calculate the text similarity of the comment text using cosine similarity;
Step 1.6.7) From the rating difference IRD of step 1.6.2), the time difference IETF of step 1.6.4), and the similarity ICS of step 1.6.6), calculate the comment honesty score A:
$$A = \beta_1\,\mathrm{IRD} + \beta_2\,\mathrm{ICS} + \beta_3\,\mathrm{IETF} \tag{1}$$
where $\beta_1$, $\beta_2$, $\beta_3$ are constants satisfying $\beta_1 + \beta_2 + \beta_3 = 1$;
Step 1.7) Update the honesty attribute of the comment;
Step 1.8) Obtain the next comment record;
Step 1.9) Determine whether this comment record is empty; if it is empty, go to step 1.10); otherwise, go to step 1.2);
Step 1.10) Output the comment honesty score;
Step 2) Calculate the shop owner's reliability:
Step 2.1) Set the variable h=1;
Step 2.2) Obtain the information of the h-th shop owner;
Step 2.3) Determine whether the shop owner is empty; if not empty, go to step 2.4); otherwise, go to step 2.8);
Step 2.4) Calculate the shop owner's reliability score:
Step 2.4.1) Obtain the quantified information on this shop owner's product conformity, seller service, products and services, product price, and product delivery;
Step 2.4.2) Calculate the "S"-type score:
$$S(x) = \begin{cases} \alpha\sqrt[3]{x-\beta} + \gamma, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{2}$$
where $\alpha$, $\beta$, $\gamma$ are constants and $x$ is the shop owner's quantified information;
Step 2.4.3) Generate the weight vector of the scores;
Step 2.4.4) Multiply the "S"-type scores of step 2.4.2) by the weight vector to obtain the reliability score;
Step 2.5) Update the shop owner's reliability attribute;
Step 2.6) Set h=h+1 and go to step 2.2);
Step 2.8) Output the shop owner's reliability score;
Step 3) Calculate the reviewer's credibility:
Step 3.1) Obtain the information of all reviewers;
Step 3.2) Obtain one reviewer's information;
Step 3.3) Determine whether the reviewer's information is empty; if not empty, go to step 3.4); otherwise, go to step 3.8);
Step 3.4) Calculate the reviewer's credibility score:
Step 3.4.1) Obtain this reviewer's transaction amount and credit information;
Step 3.4.2) Obtain the corresponding score values;
Step 3.4.3) Generate the weight vector of the score values;
Step 3.4.4) Multiply the score values of step 3.4.2) by the weight vector to obtain the reviewer's credibility score;
Step 3.5) Update the reviewer's credibility attribute;
Step 3.6) Obtain the next reviewer's information and go to step 3.3);
Step 3.8) Output the reviewer's credibility score;
Step 4) Initialize the iteration count to 0;
Step 5) Update the comment honesty score:
Step 5.1) Obtain the relational model:
$$H(r) = R(s)\left(\frac{2}{1+e^{T(r)}} - 1\right) \tag{3}$$
where $R(s)$ is the reliability score of shop owner $s$ and $T(r)$ is the credibility score of reviewer $r$;
Step 5.2) Calculate the honesty score of the comment:
Step 5.2.1) Obtain the credibility score of the reviewer who posted this comment;
Step 5.2.2) Obtain the reliability score of the shop owner that the comment is about;
Step 5.3.3) Calculate the honesty score with the model of step 5.1);
Step 5.4) Update the honesty attribute information of the comment;
Step 5.5) Output the updated comment honesty score;
Step 6) Update the reviewer's credibility score:
Step 6.1) Obtain the relational model:
$$T(r) = \frac{2}{1+e^{H(r)}} - 1 \tag{4}$$
where $H(r)$ is the honesty score of comment $r$;
Step 6.2) Calculate the reviewer's credibility score:
Step 6.2.1) Obtain the honesty of all comments posted by this reviewer;
Step 6.2.2) Calculate the reviewer's credibility score with the model of step 6.1);
Step 6.3) Update the reviewer's credibility attribute information;
Step 6.4) Output the updated reviewer credibility score;
Step 7) Update the shop owner's reliability score:
Step 7.1) Obtain the relational model:
$$R(s) = \frac{2}{1+e^{-\theta}} - 1 \tag{5}$$
$$\theta = \sum_{v \in U_s,\; T(k_v) > 0} T(k_v)\,(\Psi_v - \mu) \tag{6}$$
where $T(k_v)$ is the credibility of reviewer $k_v$, who posted comment $v$, $\Psi_v$ is the rating given in comment $v$, and $\mu$ is the mean rating of comments in the system;
Step 7.2) Calculate the shop owner's reliability score:
Step 7.2.1) Obtain the credibility scores of this shop owner's reviewers;
Step 7.2.2) Obtain the ratings of all the reviewers' comments;
Step 7.2.3) Calculate the shop owner's reliability score with the model of step 7.1);
Step 7.3) Update the shop owner's reliability attribute;
Step 8) Increment the iteration count by 1;
Step 9) Determine whether the iteration count is less than 5; if so, go to step 5); otherwise, go to step 10);
Step 10) Output the shop owner's reliability score, the comment honesty score, and the reviewer's credibility score;
Step 11) Output the detection result: normal comment or spam comment; normal reviewer or spam reviewer.
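Steps 4) to 10) can be sketched as the loop below. This is a minimal illustration, not the patented implementation: the function names, the tuple layout of `comments`, and the sample values are hypothetical. The extracted formulas (3) and (4) show no minus sign in the exponent; the sketch assumes the same logistic form as formula (5), 2/(1+e^(-x)) - 1, so that each score rises with its input as the technical solution describes. It also assumes that formula (4) is applied to the sum of the honesty scores of a reviewer's comments.

```python
import math
from collections import defaultdict


def squash(x):
    """Map a real number into (-1, 1) using 2 / (1 + e^-x) - 1 (assumed sign)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0


def iterate(comments, credibility, reliability, mu, rounds=5):
    """Run the mutual update of honesty, credibility, and reliability.

    comments: list of (reviewer, shop, rating) tuples (hypothetical layout)
    credibility: initial reviewer scores from step 3)
    reliability: initial shop owner scores from step 2)
    mu: mean rating over all comments in the system
    """
    credibility = dict(credibility)
    reliability = dict(reliability)
    honesty = [0.0] * len(comments)
    for _ in range(rounds):  # steps 8) and 9): five passes by default
        # Step 5): comment honesty from shop reliability and reviewer credibility.
        honesty = [reliability[s] * squash(credibility[r]) for r, s, _ in comments]
        # Step 6): reviewer credibility from the summed honesty of their comments.
        total = defaultdict(float)
        for (r, _, _), h in zip(comments, honesty):
            total[r] += h
        for r, h_sum in total.items():
            credibility[r] = squash(h_sum)
        # Step 7): shop reliability from ratings by reviewers with positive credibility.
        theta = defaultdict(float)
        for r, s, rating in comments:
            if credibility[r] > 0:
                theta[s] += credibility[r] * (rating - mu)
        for s in reliability:
            reliability[s] = squash(theta[s])
    return honesty, credibility, reliability


# Hypothetical data: two reviewers rate the same shop on a 1 to 5 scale.
comments = [("alice", "shopA", 5), ("bob", "shopA", 1)]
h, t, r = iterate(comments, {"alice": 0.8, "bob": 0.1}, {"shopA": 0.6}, mu=3.0)
```

All three outputs stay inside the open interval (-1, 1) because every update passes through `squash`. How the final scores are thresholded into spam and normal classes in step 11) is not specified in the text, so it is left out here.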
Beneficial effects: compared with existing technology, the invention has the following innovation:
For the intrinsic dependence among reviewers, comments, and shop owners, a model based on this relationship is proposed and combined with the models derived from the other features of the three entities.
In short, using this method yields results with good reference and decision value and improves the precision and recall of spam comment detection.
Description of drawings
Fig. 1 is the flow chart for detecting spam comments;
Fig. 2 is the flow chart for calculating shop owner reliability;
Fig. 3 is the flow chart for calculating comment honesty;
Fig. 4 is the flow chart for calculating reviewer credibility.
Embodiment
The relation-based spam comment detection method is implemented with Eclipse as the development tool, and data analysis is performed with MATLAB together with the yaahp analytic hierarchy process software. The detailed steps are as follows; see Fig. 1.
1. A relation-based spam comment detection method, characterized in that the method mainly comprises the following steps:
Step 1) Build the comment honesty model: the model is built from three aspects, namely the rating given in the comment, the text similarity between the comment and other comments, and the time at which the comment was posted, as shown in Fig. 3.
Step 1.1) From the information of all comments, calculate the mean rating and the earliest comment time;
Step 1.2) From the rating value of the comment, calculate the difference between the rating and the mean rating:
$$D(p) = \frac{\left| r_p - \bar{r}_p \right|}{4} \tag{1}$$
where $r_p$ is the rating this comment gives product $p$, $\bar{r}_p$ is the average rating of the comments product $p$ has received, and the maximum rating difference is 4. $D(p)$ measures how far the comment's rating deviates from the product's average rating.
Step 1.3) From the comment's posting time, calculate the difference between it and the earliest comment time:
To have greater impact, spam reviewers often post misleading information early, so the closer a comment's posting time is to the earliest comment time for the product, the more likely it is a spam comment.
$$\mathrm{GTF}(p) = \begin{cases} 0, & \text{if } T(p) - A(p) > \beta \\ 1 - \dfrac{T(p) - A(p)}{\beta}, & \text{otherwise} \end{cases} \tag{2}$$
where $T(p)$ is the time at which the comment was posted, $A(p)$ is the time of the earliest comment on product $p$, and $\beta$ is a time threshold: if the time difference exceeds this threshold, the possibility of the comment being spam is taken as 0. $\mathrm{GTF}(p)$ computes the posting-time difference.
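Formulas (1) and (2) can be illustrated with a short Python sketch. The function names are hypothetical, ratings are assumed to lie on a 1 to 5 scale (consistent with the maximum difference of 4), and times are assumed to be plain numbers such as days since an arbitrary origin.

```python
def rating_deviation(rating, mean_rating, max_diff=4.0):
    """D(p): absolute deviation from the product's mean rating, scaled to [0, 1]."""
    return abs(rating - mean_rating) / max_diff


def time_factor(comment_time, earliest_time, beta):
    """GTF(p): 0 beyond the threshold beta, otherwise 1 - (T(p) - A(p)) / beta."""
    delta = comment_time - earliest_time
    if delta > beta:
        return 0.0
    return 1.0 - delta / beta


# A 5-star comment on a product averaging 3 stars, posted 3 days after the
# first comment, with an assumed 30-day threshold.
d = rating_deviation(5, 3.0)
g = time_factor(3.0, 0.0, beta=30.0)
```

A comment posted at the same moment as the earliest one gets the maximum time factor of 1, and the factor falls linearly to 0 as the gap approaches the threshold.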
Step 1.4) From the comment text, calculate its text similarity:
A spam reviewer may comment on the same product repeatedly. Because writing different content every time is tiring, the comment text is often copied, or nearly copied, from other comment texts; so the higher the text similarity, the more likely the comment is spam.
$$\mathrm{ICS} = \operatorname{avg}\bigl(\operatorname{cosine}(c(p))\bigr) \tag{3}$$
where $c(p)$ is the comment text for product $p$, and $\operatorname{cosine}(c(p))$ is its text similarity to another comment, computed with the vector-space cosine similarity algorithm. ICS is the mean of these text similarities.
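Formula (3) can be sketched with a bag-of-words vector space. The whitespace tokenizer and the function names are assumptions for illustration; Chinese comment text would need a word segmenter in place of `split()`.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity of two texts represented as term-frequency vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[term] * vb[term] for term in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def ics(text, other_texts):
    """ICS: mean cosine similarity between one comment and the other comments."""
    if not other_texts:
        return 0.0
    return sum(cosine(text, other) for other in other_texts) / len(other_texts)


# One exact copy and one unrelated comment give an average of about 0.5.
score = ics("good phone", ["good phone", "bad case"])
```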
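Step 1.5) Linearly combine the rating difference, the time difference, and the text similarity to calculate the comment honesty score:
$$A(r) = \beta_1\,\mathrm{IRD}(p) + \beta_2\,\mathrm{ICS}(p) + \beta_3\,\mathrm{IETF}(p) \tag{4}$$
where $\mathrm{IRD}(p)$ is the rating difference of formula (1), $\mathrm{ICS}(p)$ is the text similarity of formula (3), $\mathrm{IETF}(p)$ is the time difference of formula (2), and $\beta_1 = \frac{1}{5}$, $\beta_2 = \frac{2}{5}$, $\beta_3 = \frac{2}{5}$;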
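As a worked illustration of formula (4), the sketch below combines the three components with the stated weights 1/5, 2/5, 2/5. The function name and the sample component values are hypothetical.

```python
def honesty_score(ird, ics, ietf, betas=(0.2, 0.4, 0.4)):
    """A = beta1 * IRD + beta2 * ICS + beta3 * IETF, with the weights summing to 1."""
    b1, b2, b3 = betas
    if abs(b1 + b2 + b3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return b1 * ird + b2 * ics + b3 * ietf


# 0.2 * 0.5 + 0.4 * 0.25 + 0.4 * 0.9 = 0.1 + 0.1 + 0.36 = 0.56
a = honesty_score(ird=0.5, ics=0.25, ietf=0.9)
```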
Step 2) Calculate the shop owner's reliability: after a transaction closes, the buyer rates satisfaction on five aspects (product conformity, seller service, products and services, product price, and product delivery); the model is built by combining these ratings with their respective weights, as shown in Fig. 2.
Step 2.1) Construct the score function from the shop owner's product conformity, seller service, products and services, product price, and product delivery information:
When user satisfaction changes from very good to fairly good, the score should change slowly; when it changes from fairly good to very poor, the score should change greatly, because user satisfaction has undergone a qualitative change. The poorer the satisfaction, the lower the score. The score function is therefore:
$$S(x) = \begin{cases} \alpha\sqrt[3]{x-\beta} + \gamma, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{5}$$
where $\alpha$, $\beta$, $\gamma$ are constants and $x$ is the quantified value of the shop owner's information.
Step 2.2) Calculate the score of each item of information with the score function;
Step 2.3) Linearly combine the scores to obtain the shop owner's reliability score;
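Steps 2.1) to 2.3) can be sketched as follows. The formula for S(x) is garbled in the extracted text; this sketch reads it as a cube root of (x - β), which is an assumption. The constants α, β, γ and the weight vector are not given in the text, so the values used here are purely illustrative, as are the function names.

```python
import math


def s_score(x, alpha, beta, gamma):
    """S(x) = alpha * cbrt(x - beta) + gamma for x >= 0, else 0 (assumed reading)."""
    if x < 0:
        return 0.0
    d = x - beta
    # Real cube root that also handles negative d.
    root = math.copysign(abs(d) ** (1.0 / 3.0), d)
    return alpha * root + gamma


def initial_reliability(values, weights, alpha, beta, gamma):
    """Weighted linear combination of the S-type scores of the quantified aspects."""
    scores = [s_score(v, alpha, beta, gamma) for v in values]
    return sum(w * s for w, s in zip(weights, scores))


# Two illustrative aspect values with equal weights:
# cbrt(11 - 3) = 2 and cbrt(4 - 3) = 1, so the result is 0.5 * 2 + 0.5 * 1 = 1.5.
rel = initial_reliability([11.0, 4.0], [0.5, 0.5], alpha=1.0, beta=3.0, gamma=0.0)
```

In the embodiment the weights would come from the analytic hierarchy process analysis rather than being fixed by hand.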
Step 3) Calculate the reviewer's credibility: the score function model is built from two aspects, the transaction amount and the buyer's credit rating, as shown in Fig. 4.
Step 3.1) Obtain the information of all reviewers;
Step 3.2) From each reviewer's transaction amount and credit information, calculate the corresponding score values and the weight vector of the score values;
Step 3.3) Calculate the reviewer's credibility score from the score values and the weight vector;
Step 4) Update the comment honesty: even if a comment is inconsistent with the other comments around it, if it was posted by a credible reviewer and the surrounding comments were posted by non-credible reviewers, the comment is still an honest comment:
Step 4.1) Calculate the honesty score according to the comment honesty relational model:
$$H(r) = R(s)\left(\frac{2}{1+e^{T(r)}} - 1\right) \tag{9}$$
where $R(s)$ is the shop owner's reliability score and $T(r)$ is the reviewer's credibility score.
Step 4.3) Update the honesty attribute information of the comment;
Step 5) Update the reviewer's credibility: a reviewer's credibility depends on how many positive and negative comments the reviewer has posted. The higher the sum of the honesty scores of the comments posted, the higher the reviewer's credibility;
Step 5.1) Calculate the credibility score according to the reviewer credibility relational model:
$$T(r) = \frac{2}{1+e^{H(r)}} - 1 \tag{10}$$
where $H(r)$ is the honesty score of the comment.
Step 5.3) Update the reviewer's credibility attribute information;
Step 6) Update the shop owner's reliability: a shop owner's reliability depends mainly on the comments made by all credible reviewers. The more positive comments from credible reviewers a shop owner has, the higher the shop owner's reliability;
Step 6.1) Calculate the reliability score according to the shop owner reliability relational model:
$$R(s) = \frac{2}{1+e^{-\theta}} - 1 \tag{11}$$
$$\theta = \sum_{v \in U_s,\; T(k_v) > 0} T(k_v)\,(\Psi_v - \mu) \tag{12}$$
where $T(k_v)$ is the reviewer's credibility, $\Psi_v$ is the rating of the comment posted by that reviewer, and $\mu$ is the mean rating of comments in the system.
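Formulas (11) and (12) can be illustrated with the sketch below. It assumes that U_s denotes the set of comments on shop owner s, which the text does not define explicitly; the function name and the `(credibility, rating)` pair layout are hypothetical.

```python
import math


def shop_reliability(comment_pairs, mu):
    """R(s) = 2 / (1 + e^-theta) - 1, with theta summed over credible reviewers only.

    comment_pairs: iterable of (reviewer credibility T(k_v), rating Psi_v) for
    the comments on one shop owner.
    mu: mean rating of all comments in the system.
    """
    theta = sum(t * (psi - mu) for t, psi in comment_pairs if t > 0)
    return 2.0 / (1.0 + math.exp(-theta)) - 1.0


# A credible reviewer (T = 0.5) gives 5 stars; a non-credible one (T = -0.3)
# gives 1 star and is ignored. With mu = 3, theta = 0.5 * (5 - 3) = 1.0.
rel = shop_reliability([(0.5, 5), (-0.3, 1)], mu=3.0)
```

Because 2/(1+e^(-θ)) - 1 equals tanh(θ/2), the score is 0 when no credible reviewer has commented, positive when credible reviewers rate above the system mean, and negative when they rate below it.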
Step 6.3) Update the shop owner's reliability attribute;
Step 7) Output the shop owner's reliability score, the comment honesty score, and the reviewer's credibility score;
Step 8) Output the detection result: normal comment or spam comment; normal reviewer or spam reviewer.

Claims (1)

1. A relation-based spam comment detection method, characterized in that the method mainly comprises the following steps:
Step 1) Calculate the comment honesty score:
Step 1.1) Input the comment set information;
Step 1.2) Obtain the rating value and comment time of all comments;
Step 1.3) Calculate the mean rating and the earliest comment time;
Step 1.4) Obtain one comment record;
Step 1.5) Determine whether the comment record is empty; if it is not empty, go to step 1.6); otherwise, go to step 1.10);
Step 1.6) Calculate the honesty score of the comment:
Step 1.6.1) Obtain the rating value of this comment;
Step 1.6.2) Using the mean from step 1.3), calculate the rating difference;
Step 1.6.3) Obtain the comment time of this comment;
Step 1.6.4) Using the earliest comment time from step 1.3), calculate the comment time difference;
Step 1.6.5) Obtain the text of this comment;
Step 1.6.6) Calculate the text similarity of the comment text using cosine similarity;
Step 1.6.7) From the rating difference IRD of step 1.6.2), the time difference IETF of step 1.6.4), and the similarity ICS of step 1.6.6), calculate the comment honesty score A:
$$A = \beta_1\,\mathrm{IRD} + \beta_2\,\mathrm{ICS} + \beta_3\,\mathrm{IETF} \tag{1}$$
where $\beta_1$, $\beta_2$, $\beta_3$ are constants satisfying $\beta_1 + \beta_2 + \beta_3 = 1$;
Step 1.7) Update the honesty attribute of the comment;
Step 1.8) Obtain the next comment record;
Step 1.9) Determine whether this comment record is empty; if it is empty, go to step 1.10); otherwise, go to step 1.2);
Step 1.10) Output the comment honesty score;
Step 2) Calculate the shop owner's reliability:
Step 2.1) Set the variable h=1;
Step 2.2) Obtain the information of the h-th shop owner;
Step 2.3) Determine whether the shop owner is empty; if not empty, go to step 2.4); otherwise, go to step 2.8);
Step 2.4) Calculate the shop owner's reliability score:
Step 2.4.1) Obtain the quantified information on this shop owner's product conformity, seller service, products and services, product price, and product delivery;
Step 2.4.2) Calculate the "S"-type score:
$$S(x) = \begin{cases} \alpha\sqrt[3]{x-\beta} + \gamma, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{2}$$
where $\alpha$, $\beta$, $\gamma$ are constants and $x$ is the shop owner's quantified information;
Step 2.4.3) Generate the weight vector of the scores;
Step 2.4.4) Multiply the "S"-type scores of step 2.4.2) by the weight vector to obtain the reliability score;
Step 2.5) Update the shop owner's reliability attribute;
Step 2.6) Set h=h+1 and go to step 2.2);
Step 2.8) Output the shop owner's reliability score;
Step 3) Calculate the reviewer's credibility:
Step 3.1) Obtain the information of all reviewers;
Step 3.2) Obtain one reviewer's information;
Step 3.3) Determine whether the reviewer's information is empty; if not empty, go to step 3.4); otherwise, go to step 3.8);
Step 3.4) Calculate the reviewer's credibility score:
Step 3.4.1) Obtain this reviewer's transaction amount and credit information;
Step 3.4.2) Obtain the corresponding score values;
Step 3.4.3) Generate the weight vector of the score values;
Step 3.4.4) Multiply the score values of step 3.4.2) by the weight vector to obtain the reviewer's credibility score;
Step 3.5) Update the reviewer's credibility attribute;
Step 3.6) Obtain the next reviewer's information and go to step 3.3);
Step 3.8) Output the reviewer's credibility score;
Step 4) Initialize the iteration count to 0;
Step 5) Update the comment honesty score:
Step 5.1) Obtain the relational model:
$$H(r) = R(s)\left(\frac{2}{1+e^{T(r)}} - 1\right) \tag{3}$$
where $R(s)$ is the reliability score of shop owner $s$ and $T(r)$ is the credibility score of reviewer $r$;
Step 5.2) Calculate the honesty score of the comment:
Step 5.2.1) Obtain the credibility score of the reviewer who posted this comment;
Step 5.2.2) Obtain the reliability score of the shop owner that the comment is about;
Step 5.3.3) Calculate the honesty score with the model of step 5.1);
Step 5.4) Update the honesty attribute information of the comment;
Step 5.5) Output the updated comment honesty score;
Step 6) Update the reviewer's credibility score:
Step 6.1) Obtain the relational model:
$$T(r) = \frac{2}{1+e^{H(r)}} - 1 \tag{4}$$
where $H(r)$ is the honesty score of comment $r$;
Step 6.2) Calculate the reviewer's credibility score:
Step 6.2.1) Obtain the honesty of all comments posted by this reviewer;
Step 6.2.2) Calculate the reviewer's credibility score with the model of step 6.1);
Step 6.3) Update the reviewer's credibility attribute information;
Step 6.4) Output the updated reviewer credibility score;
Step 7) Update the shop owner's reliability score:
Step 7.1) Obtain the relational model:
$$R(s) = \frac{2}{1+e^{-\theta}} - 1 \tag{5}$$
$$\theta = \sum_{v \in U_s,\; T(k_v) > 0} T(k_v)\,(\Psi_v - \mu) \tag{6}$$
where $T(k_v)$ is the credibility of reviewer $k_v$, who posted comment $v$, $\Psi_v$ is the rating given in comment $v$, and $\mu$ is the mean rating of comments in the system;
Step 7.2) Calculate the shop owner's reliability score:
Step 7.2.1) Obtain the credibility scores of this shop owner's reviewers;
Step 7.2.2) Obtain the ratings of all the reviewers' comments;
Step 7.2.3) Calculate the shop owner's reliability score with the model of step 7.1);
Step 7.3) Update the shop owner's reliability attribute;
Step 8) Increment the iteration count by 1;
Step 9) Determine whether the iteration count is less than 5; if so, go to step 5); otherwise, go to step 10);
Step 10) Output the shop owner's reliability score, the comment honesty score, and the reviewer's credibility score;
Step 11) Output the detection result: normal comment or spam comment; normal reviewer or spam reviewer.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013100025837A CN103020482A (en) 2013-01-05 2013-01-05 Relation-based spam comment detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013100025837A CN103020482A (en) 2013-01-05 2013-01-05 Relation-based spam comment detection method

Publications (1)

Publication Number Publication Date
CN103020482A true CN103020482A (en) 2013-04-03

Family

ID=47969080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013100025837A Pending CN103020482A (en) 2013-01-05 2013-01-05 Relation-based spam comment detection method

Country Status (1)

Country Link
CN (1) CN103020482A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103745001A (en) * 2014-01-24 2014-04-23 福州大学 System for detecting reviewers of negative comments on products
CN104123328A (en) * 2013-04-28 2014-10-29 北京千橡网景科技发展有限公司 Method and device used for inhibiting spam comments in website
CN104463601A (en) * 2014-11-13 2015-03-25 电子科技大学 Method for detecting users who score maliciously in online social media system
KR20150033901A (en) * 2013-09-25 2015-04-02 에스케이텔레콤 주식회사 Apparatus for interpreting meaning of text emoticon, and recording medium therefor
CN104536980A (en) * 2014-12-05 2015-04-22 百度在线网络技术(北京)有限公司 To-be-commented item quality information determination method and device
CN104867020A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 False evaluation ID judgment and identification system
CN104867033A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce client evaluation judging and marking system
CN104867032A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce client evaluation identification system
CN104867018A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce evaluation judgment system based on evaluation content and ID similarity identification
CN104867017A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce client false evaluation identification system
CN104867019A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce evaluation identification system based on ID similarity identification
CN104881796A (en) * 2015-05-16 2015-09-02 成都数联铭品科技有限公司 False comment judgment system based on comment content and ID recognition
CN104881795A (en) * 2015-05-16 2015-09-02 成都数联铭品科技有限公司 E-commerce false comment judging and recognizing method
CN105809379A (en) * 2014-12-30 2016-07-27 阿里巴巴集团控股有限公司 Logistics branch evaluation method, device and electronic device
CN106233316A (en) * 2014-03-05 2016-12-14 电子湾有限公司 Products & services are utilized to comment on
CN106484679A (en) * 2016-10-20 2017-03-08 北京邮电大学 A kind of false review information recognition methodss being applied on consumption platform and device
CN106844349A (en) * 2017-02-14 2017-06-13 广西师范大学 Comment spam recognition methods based on coorinated training
CN109344176A (en) * 2018-09-05 2019-02-15 浙江工业大学 False comment detection method based on Two-way Cycle figure
KR102032091B1 (en) * 2019-03-15 2019-10-14 배준철 Method And System of Comment Emotion Analysis based on Artificial Intelligence
CN114626885A (en) * 2022-03-17 2022-06-14 华院分析技术(上海)有限公司 Retail management method and system based on big data

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
JINDAL N, ET AL.: "Opinion Spam and Analysis", 《PROCEEDINGS OF THE FIRST ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM)》, 12 February 2009 (2009-02-12) *
JINDAL N, ET AL.: "Review Spam and Detection", 《PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB》, 31 December 2007 (2007-12-31) *
JINDAL N, ET AL.: "Analyzing and Detecting Review Spam", 《IN PROCEEDINGS OF THE 7TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM'07)》, 31 December 2007 (2007-12-31) *
JINGJING LIU, ET AL.: "Low-Quality Product Review Detection in Opinion Summarization", 《PROCEEDINGS OF THE 2007 JOINT CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND COMPUTATIONAL NATURAL LANGUAGE LEARNING》, 30 June 2007 (2007-06-30) *
LIM E P, ET AL.: "Detecting Product Review Spammers Using Rating Behaviors", 《IN PROCEEDINGS OF THE 19TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT (CIKM’10)》, 31 December 2010 (2010-12-31) *
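孙升芸 et al.: "A Survey of Research on Product Review Spam Detection", 《计算机科学》 (Computer Science), vol. 38, no. 10, 31 October 2011 (2011-10-31) *
孙升芸 et al.: "Research on Identifying Product Review Spam Based on Reviewing Behavior", 《计算机工程与设计》 (Computer Engineering and Design), vol. 33, no. 11, 30 November 2012 (2012-11-30) *
邱云飞 et al.: "Research on Detecting Product Review Spammers Based on User Behavior", 《计算机工程》 (Computer Engineering), vol. 38, no. 11, 30 June 2012 (2012-06-30) *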

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123328A (en) * 2013-04-28 2014-10-29 北京千橡网景科技发展有限公司 Method and device used for inhibiting spam comments in website
KR20150033901A (en) * 2013-09-25 2015-04-02 에스케이텔레콤 주식회사 Apparatus for interpreting meaning of text emoticon, and recording medium therefor
KR102108129B1 (en) 2013-09-25 2020-05-07 에스케이텔레콤 주식회사 Apparatus for interpreting meaning of text emoticon, and recording medium therefor
CN103745001A (en) * 2014-01-24 2014-04-23 福州大学 System for detecting reviewers of negative comments on products
CN103745001B (en) * 2014-01-24 2016-10-05 福州大学 Product review spammer detection system
CN106233316A (en) * 2014-03-05 2016-12-14 电子湾有限公司 Utilizing product and service reviews
CN104463601A (en) * 2014-11-13 2015-03-25 电子科技大学 Method for detecting users who score maliciously in online social media system
CN104536980A (en) * 2014-12-05 2015-04-22 百度在线网络技术(北京)有限公司 To-be-commented item quality information determination method and device
WO2016086724A1 (en) * 2014-12-05 2016-06-09 百度在线网络技术(北京)有限公司 Method and apparatus for determining quality information about to-be-commented item
CN105809379A (en) * 2014-12-30 2016-07-27 阿里巴巴集团控股有限公司 Logistics branch evaluation method, device and electronic device
CN104881795A (en) * 2015-05-16 2015-09-02 成都数联铭品科技有限公司 E-commerce false comment judging and recognizing method
CN104867020A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 False evaluation ID judgment and identification system
CN104867019A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce evaluation identification system based on ID similarity identification
CN104867017A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce client false evaluation identification system
CN104867018A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce evaluation judgment system based on evaluation content and ID similarity identification
CN104867032A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce client evaluation identification system
CN104867033A (en) * 2015-05-16 2015-08-26 成都数联铭品科技有限公司 Electronic commerce client evaluation judging and marking system
CN104881796A (en) * 2015-05-16 2015-09-02 成都数联铭品科技有限公司 False comment judgment system based on comment content and ID recognition
CN106484679A (en) * 2016-10-20 2017-03-08 北京邮电大学 False review information recognition method and device applied to a consumption platform
CN106484679B (en) * 2016-10-20 2020-02-11 北京邮电大学 False comment information identification method and device applied to consumption platform
CN106844349B (en) * 2017-02-14 2019-10-18 广西师范大学 Spam comment recognition method based on co-training
CN106844349A (en) * 2017-02-14 2017-06-13 广西师范大学 Spam comment recognition method based on co-training
CN109344176A (en) * 2018-09-05 2019-02-15 浙江工业大学 False comment detection method based on a double-loop graph
KR102032091B1 (en) * 2019-03-15 2019-10-14 배준철 Method And System of Comment Emotion Analysis based on Artificial Intelligence
CN114626885A (en) * 2022-03-17 2022-06-14 华院分析技术(上海)有限公司 Retail management method and system based on big data

Similar Documents

Publication Publication Date Title
CN103020482A (en) Relation-based spam comment detection method
US11954739B2 (en) Methods and systems for automatically detecting fraud and compliance issues in expense reports and invoices
Shmueli et al. Data mining for business analytics: Concepts, techniques, and applications with XLMiner
Shmueli et al. Data mining for business analytics: concepts, techniques and applications in Python
CN111444334B (en) Data processing method, text recognition device and computer equipment
CN108229590A (en) Method and apparatus for obtaining a multi-label user portrait
CN104536980A (en) To-be-commented item quality information determination method and device
CN105302810A (en) Information search method and apparatus
CN109360089A (en) Credit risk prediction technique and device
US20160110763A1 (en) Extracting product purchase information from electronic messages
CN105612515A (en) Device for collecting contradictory expression and computer program for same
CN106202481A (en) Method and system for evaluating perception data
CN104778186A (en) Method and system for hanging commodity object to standard product unit (SPU)
CN106170002A (en) Chinese counterfeit domain name detection method and system
CN104346408A (en) Method and equipment for labeling network user
CN110163661A (en) Marketing message promotion method, device, electronic equipment and computer-readable medium
CN108572988A (en) Method and device for generating property valuation data
CN109345272A (en) Shop credit risk prediction method based on an improved Markov chain
Goo et al. Improving the prediction of going concern of Taiwanese listed companies using a hybrid of LASSO with data mining techniques
CN105303447A (en) Method and device for carrying out credit rating through network information
Patil et al. Online review spam detection using language model and feature selection
Gupta Applied analytics through case studies using Sas and R: implementing predictive models and machine learning techniques
CN105138572A (en) Method and device for obtaining correlation weight of user tag
CN107291686B (en) Method and system for identifying emotion identification
Shmueli et al. Machine Learning for Business Analytics: Concepts, Techniques and Applications with JMP Pro

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130403