CN104699675A - Message translating method and device - Google Patents

Info

Publication number
CN104699675A
Authority
CN
China
Prior art keywords
emoticon
information
word
translation
mark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510119654.0A
Other languages
Chinese (zh)
Other versions
CN104699675B (en
Inventor
徐金安
赵雁榕
韩晓光
肖冰
徐凡
陈钰枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Application filed by Beijing Jiaotong University
Priority to CN201510119654.0A
Publication of CN104699675A
Application granted
Publication of CN104699675B
Legal status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a message translating method and device and belongs to the field of natural language processing. The method includes: acquiring an emoticon contained in a first message in a source language; replacing the emoticon in the first message with a first identifier that identifies the emoticon, to obtain a second message; translating the second message into a third message in a target language; extracting from the third message a second identifier corresponding to the first identifier; and replacing the second identifier in the third message with the emoticon to which the second identifier corresponds, to obtain a fourth message. The device comprises a first acquisition module, a first replacement module, a translating module, a first extraction module and a second replacement module. With the method and device, translation is not limited by an emoticon library or a translation dictionary, emoticons are translated with high accuracy, the cost of constructing translation dictionaries, translation rules, translation models, language models and the like that contain emoticons is reduced, and emoticons not recorded in an emoticon dictionary can be recognized, translated and generated.

Description

Method and apparatus for translating information
Technical field
The present invention relates to the field of natural language processing, and in particular to a method and apparatus for translating information.
Background
With the development of computer networks and communication technology, mobile terminals have become increasingly widespread, and social media such as e-mail, SMS, Facebook, QQ, WeChat and Weibo increasingly permeate people's daily work and life. Short text messages appear in large numbers in everyday communication, and their text is interspersed with various emoticons, each made up of a string of symbols.
Meanwhile, the development of Internet and communication technology keeps expanding the space in which people communicate, and communication is becoming increasingly international. Translating information is an important means of cross-cultural communication. In particular, a user with a weak command of a foreign language who encounters information in that language will generally use machine translation to translate it into the target language. Such information may contain a large number of emoticons. Machine translation usually translates an emoticon into the target language by means of a translation dictionary, which contains emoticons and their corresponding words in the target language.
In the course of making the present invention, the inventors found that the prior art has at least the following problem:
Because emoticons change constantly, constructing a translation dictionary is time-consuming and costly, and an emoticon in the information that is absent from the translation dictionary, translation model or translation examples cannot be translated.
Summary of the invention
To solve the problems of the prior art, the invention provides a method for translating information. The technical solution is as follows:
In one aspect, the invention provides a method for translating information, the method comprising:
Obtaining an emoticon contained in first information in a source language;
Replacing the emoticon in the first information with a first identifier that identifies the emoticon, to obtain second information;
Translating the second information into third information in a target language;
Extracting from the third information a second identifier corresponding to the first identifier;
Replacing the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain fourth information.
Further, the first identifier is a temporary variable, and the format of the temporary variable is the same in every language;
Replacing the emoticon in the first information with the first identifier that identifies the emoticon, to obtain the second information, comprises:
Assigning an interim number to the emoticon;
Replacing the emoticon in the first information with its interim number, to obtain fifth information;
Assigning a temporary variable to the emoticon according to the position of the emoticon in the first information;
Associating the temporary variable of the emoticon with its interim number;
Replacing the interim number of the emoticon in the fifth information with the temporary variable associated with that interim number, to obtain the second information.
Further, the second identifier is a temporary variable, and extracting from the third information the second identifier corresponding to the first identifier comprises:
Extracting from the third information the temporary variable that it contains;
Correspondingly, replacing the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain the fourth information, comprises:
Obtaining the interim number associated with the temporary variable;
Replacing the temporary variable in the third information with the interim number associated with it, to obtain sixth information;
Obtaining the emoticon corresponding to the interim number;
Replacing the interim number in the sixth information with the emoticon corresponding to it, to obtain the fourth information.
Further, the first identifier is a word corresponding to the emoticon, and the word is in the source language;
Replacing the emoticon in the first information with the first identifier that identifies the emoticon, to obtain the second information, comprises:
Obtaining attribute information of the emoticon according to the emoticon;
Obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon;
Replacing the emoticon in the first information with each of the at least one word in turn, to obtain the second information corresponding to each word.
Further, obtaining the attribute information of the emoticon according to the emoticon comprises:
Obtaining the index number of the emoticon from a correspondence between icon data and index numbers, according to the icon data of the emoticon;
Obtaining the attribute information of the emoticon from the source-language correspondence between index numbers and attribute information, according to the index number of the emoticon.
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises:
Calculating the similarity between the attribute information of the emoticon and each item of attribute information in a semantic dictionary, the semantic dictionary storing a correspondence between attribute information and words;
Obtaining from the semantic dictionary at least one item of attribute information whose similarity to the attribute information of the emoticon satisfies a preset condition;
Obtaining from the semantic dictionary the word corresponding to each of the at least one item of attribute information.
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises:
Extracting the word corresponding to the emoticon from the attribute information of the emoticon;
Obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and using the synonyms and near-synonyms as words corresponding to the emoticon.
Further, extracting from the third information the second identifier corresponding to the first identifier comprises:
Extracting from the third information the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, replacing the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain the fourth information, comprises:
Obtaining, from the target-language correspondence between attribute information and index numbers, the entry that contains the second identifier;
Extracting the index number contained in the obtained entry;
Obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
Replacing the second identifier in the third information with the obtained icon data of the emoticon, to obtain the fourth information.
Further, extracting from the third information the second identifier corresponding to the first identifier comprises:
Extracting from the third information the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, replacing the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain the fourth information, comprises:
Obtaining the first identifier corresponding to the second identifier;
Obtaining, from the source-language correspondence between attribute information and index numbers, the entry that contains the first identifier;
Extracting the index number contained in the obtained entry;
Obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
Replacing the second identifier in the third information with the icon data of the emoticon, to obtain the fourth information.
In another aspect, the invention provides an apparatus for translating information, the apparatus comprising:
A first acquisition module, configured to obtain an emoticon contained in first information in a source language;
A first replacement module, configured to replace the emoticon in the first information with a first identifier that identifies the emoticon, to obtain second information;
A translation module, configured to translate the second information into third information in a target language;
A first extraction module, configured to extract from the third information a second identifier corresponding to the first identifier;
A second replacement module, configured to replace the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain fourth information.
In the embodiments of the present invention, an emoticon contained in first information in a source language is obtained; the emoticon in the first information is replaced with a first identifier that identifies the emoticon, to obtain second information; the second information is translated into third information in a target language; a second identifier corresponding to the first identifier is extracted from the third information; and the second identifier in the third information is replaced with the emoticon corresponding to the second identifier, to obtain fourth information. Translation is therefore not limited by an emoticon library or a translation dictionary, emoticons can be translated with high accuracy, and the cost of constructing translation dictionaries, translation rules, translation models, language models and the like that contain emoticons is reduced. The recognition, translation and generation of emoticons not recorded in an emoticon dictionary can also be handled effectively.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below evidently show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for translating information provided by Embodiment 1 of the present invention;
Fig. 2-1 is a flowchart of the method for translating information provided by Embodiment 2 of the present invention;
Fig. 2-2 is a schematic diagram of first information provided by Embodiment 2 of the present invention;
Fig. 2-3 is a schematic diagram of word-segmentation analysis of the second information provided by Embodiment 2 of the present invention;
Fig. 2-4 is a schematic diagram of word-segmentation analysis of the second information provided by Embodiment 2 of the present invention;
Fig. 2-5 is a schematic diagram of fourth information provided by Embodiment 2 of the present invention;
Fig. 2-6 is a schematic diagram of further fourth information provided by Embodiment 2 of the present invention;
Fig. 3-1 is a flowchart of the method for translating information provided by Embodiment 3 of the present invention;
Fig. 3-2 is a schematic diagram of first information provided by Embodiment 3 of the present invention;
Fig. 3-3 is a schematic diagram of fourth information provided by Embodiment 3 of the present invention;
Fig. 3-4 is a schematic diagram of further fourth information provided by Embodiment 3 of the present invention;
Fig. 3-5 is a schematic diagram of further fourth information provided by Embodiment 3 of the present invention;
Fig. 3-6 is a schematic diagram of further fourth information provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of the apparatus for translating information provided by Embodiment 4 of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the drawings.
Embodiment 1
This embodiment of the present invention provides a method for translating information. The method is executed by a terminal and can be implemented as part or all of the terminal in software, hardware or a combination of both; the terminal need not have any symbol database. The terminal includes a mobile terminal, a fixed terminal, a server or the like.
Referring to Fig. 1, the method comprises:
Step 101: obtain an emoticon contained in first information in a source language;
Step 102: replace the emoticon in the first information with a first identifier that identifies the emoticon, to obtain second information;
Step 103: translate the second information into third information in a target language;
Step 104: extract from the third information a second identifier corresponding to the first identifier;
Step 105: replace the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain fourth information.
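Steps 101 to 105 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the bracketed names such as "[rose]" stand in for real icon data, the identifiers X0, X1, ... are one possible form of first identifier, and toy_translate with its tiny glossary is a hypothetical stand-in for a real translation engine.

```python
import re

# Hypothetical emoticon notation: bracketed names such as "[rose]" stand in
# for real icon data, which the source does not specify.
EMOTICON = re.compile(r"\[[a-z]+\]")


def toy_translate(text):
    # Stand-in for a real translation engine (assumption). The identifiers
    # pass through unchanged because they are not in the glossary.
    glossary = {"give you": "te doy", "and": "y"}
    for source_phrase, target_phrase in glossary.items():
        text = text.replace(source_phrase, target_phrase)
    return text


def translate_with_emoticons(first_info, translate=toy_translate):
    mapping = {}

    def to_identifier(match):
        # Steps 101-102: each emoticon found is swapped for an identifier.
        identifier = f"X{len(mapping)}"
        mapping[identifier] = match.group(0)
        return identifier

    second_info = EMOTICON.sub(to_identifier, first_info)
    third_info = translate(second_info)  # step 103
    # Steps 104-105: locate the identifiers in the translation and put the
    # original emoticons back in their place.
    return re.sub(r"X\d+", lambda m: mapping.get(m.group(0), m.group(0)), third_info)
```

Because the translator only ever sees the identifiers, the emoticons themselves never need an entry in any translation dictionary.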
Further, the first identifier is a temporary variable, and the format of the temporary variable is the same in every language;
Replacing the emoticon in the first information with the first identifier that identifies the emoticon, to obtain the second information, comprises:
Assigning an interim number to the emoticon;
Replacing the emoticon in the first information with its interim number, to obtain fifth information;
Assigning a temporary variable to the emoticon according to the position of the emoticon in the first information;
Associating the temporary variable of the emoticon with its interim number;
Replacing the interim number of the emoticon in the fifth information with the temporary variable associated with that interim number, to obtain the second information.
Further, the second identifier is a temporary variable, and extracting from the third information the second identifier corresponding to the first identifier comprises:
Extracting from the third information the temporary variable that it contains;
Correspondingly, replacing the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain the fourth information, comprises:
Obtaining the interim number associated with the temporary variable;
Replacing the temporary variable in the third information with the interim number associated with it, to obtain sixth information;
Obtaining the emoticon corresponding to the interim number;
Replacing the interim number in the sixth information with the emoticon corresponding to it, to obtain the fourth information.
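The two-stage restoration described above (temporary variable to interim number, then interim number to emoticon) might look like the sketch below. The function name restore_emoticons, the two dictionaries and the bracketed emoticon names are illustrative assumptions; the source only specifies that the two correspondences exist.

```python
import re


def restore_emoticons(third_info, var_to_number, number_to_emoticon):
    # Temporary variable -> interim number yields the sixth information.
    sixth_info = third_info
    for variable, number in var_to_number.items():
        # Whole-word match so a variable such as "X" is not replaced inside
        # an ordinary word of the translation.
        sixth_info = re.sub(
            rf"\b{re.escape(variable)}\b",
            lambda _match, n=number: n,
            sixth_info,
        )
    # Interim number -> emoticon yields the fourth information.
    fourth_info = sixth_info
    for number, emoticon in number_to_emoticon.items():
        fourth_info = fourth_info.replace(number, emoticon)
    return sixth_info, fourth_info
```

Called with the variables X, Y and Z and the interim numbers temp001 to temp003, the function returns both intermediate and final text, which makes each stage easy to inspect.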
Further, the first identifier is a word corresponding to the emoticon, and the word is in the source language;
Replacing the emoticon in the first information with the first identifier that identifies the emoticon, to obtain the second information, comprises:
Obtaining attribute information of the emoticon according to the emoticon;
Obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon;
Replacing the emoticon in the first information with each of the at least one word in turn, to obtain the second information corresponding to each word.
Further, obtaining the attribute information of the emoticon according to the emoticon comprises:
Obtaining the index number of the emoticon from a correspondence between icon data and index numbers, according to the icon data of the emoticon;
Obtaining the attribute information of the emoticon from the source-language correspondence between index numbers and attribute information, according to the index number of the emoticon.
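The two lookups above amount to chaining two tables. The sketch below assumes both correspondences are held as in-memory dictionaries; the table contents, the byte strings used as icon data and the function name attribute_info are invented for illustration.

```python
# Hypothetical tables; the source does not give their contents.
ICON_TO_INDEX = {b"rose-icon-bytes": 17, b"lollipop-icon-bytes": 18}
SOURCE_INDEX_TO_ATTRIBUTES = {17: "rose flower love", 18: "lollipop candy sweet"}


def attribute_info(icon_data):
    # First lookup: icon data -> index number.
    index = ICON_TO_INDEX.get(icon_data)
    if index is None:
        return None  # unknown icon data
    # Second lookup: index number -> source-language attribute information.
    return SOURCE_INDEX_TO_ATTRIBUTES.get(index)
```

Keeping the index number as the shared key is what lets a separate table per language describe the same icon.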
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises:
Calculating the similarity between the attribute information of the emoticon and each item of attribute information in a semantic dictionary, the semantic dictionary storing a correspondence between attribute information and words;
Obtaining from the semantic dictionary at least one item of attribute information whose similarity to the attribute information of the emoticon satisfies a preset condition;
Obtaining from the semantic dictionary the word corresponding to each of the at least one item of attribute information.
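The source does not say which similarity measure or preset condition is used. The sketch below substitutes Jaccard similarity over whitespace-separated tokens and a fixed threshold, purely as an assumption for illustration; jaccard, candidate_words and the sample dictionary are hypothetical.

```python
def jaccard(a, b):
    # Jaccard similarity of two attribute strings, compared as token sets.
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def candidate_words(emoticon_attributes, semantic_dictionary, threshold=0.5):
    # semantic_dictionary maps attribute information to a word.
    # The "preset condition" is assumed here to be similarity >= threshold.
    words = []
    for attributes, word in semantic_dictionary.items():
        if jaccard(emoticon_attributes, attributes) >= threshold:
            words.append(word)
    return words
```

For instance, with the entries "red flower love" for rose and "flower love gift" for bouquet, an emoticon described as "red flower love gift" scores 0.75 against both and therefore yields two candidate words.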
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises:
Extracting the word corresponding to the emoticon from the attribute information of the emoticon;
Obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and using the synonyms and near-synonyms as words corresponding to the emoticon.
Further, extracting from the third information the second identifier corresponding to the first identifier comprises:
Extracting from the third information the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, replacing the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain the fourth information, comprises:
Obtaining, from the target-language correspondence between attribute information and index numbers, the entry that contains the second identifier;
Extracting the index number contained in the obtained entry;
Obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
Replacing the second identifier in the third information with the obtained icon data of the emoticon, to obtain the fourth information.
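The reverse direction, from a target-language word back to icon data, can be sketched in the same style. The table contents, the bracketed icon placeholders and the function name restore_from_word are assumptions; real icon data would be image bytes or a symbol string rather than text.

```python
# Hypothetical target-language tables; contents are invented for illustration.
TARGET_ATTRIBUTES_TO_INDEX = {"rose flower love": 17, "lollipop candy sweet": 18}
INDEX_TO_ICON = {17: "[rose icon]", 18: "[lollipop icon]"}


def restore_from_word(third_info, second_identifier):
    # Find the entry whose attribute information contains the second identifier.
    for attributes, index in TARGET_ATTRIBUTES_TO_INDEX.items():
        if second_identifier in attributes.split():
            # Index number -> icon data, then substitute it into the text.
            return third_info.replace(second_identifier, INDEX_TO_ICON[index])
    return third_info  # no matching entry: leave the translation unchanged
```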
Further, extracting from the third information the second identifier corresponding to the first identifier comprises:
Extracting from the third information the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, replacing the second identifier in the third information with the emoticon corresponding to the second identifier, to obtain the fourth information, comprises:
Obtaining the first identifier corresponding to the second identifier;
Obtaining, from the source-language correspondence between attribute information and index numbers, the entry that contains the first identifier;
Extracting the index number contained in the obtained entry;
Obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
Replacing the second identifier in the third information with the icon data of the emoticon, to obtain the fourth information.
In this embodiment of the present invention, an emoticon contained in first information in a source language is obtained; the emoticon in the first information is replaced with a first identifier that identifies the emoticon, to obtain second information; the second information is translated into third information in a target language; a second identifier corresponding to the first identifier is extracted from the third information; and the second identifier in the third information is replaced with the emoticon corresponding to the second identifier, to obtain fourth information. Translation is therefore not limited by an emoticon library or a translation dictionary, emoticons can be translated with high accuracy, and the cost of constructing translation dictionaries, translation rules, translation models, language models and the like that contain emoticons is reduced. The recognition, translation and generation of emoticons not recorded in an emoticon dictionary can also be handled effectively.
Embodiment 2
This embodiment of the present invention provides a method for translating information. The method is executed by a terminal and can be implemented as part or all of the terminal in software, hardware or a combination of both; the terminal need not have any symbol database. The terminal can be a mobile terminal, a fixed terminal, a server or the like. This embodiment applies to the scenario in which the user's input device has no emoticon database.
This embodiment is described taking the case in which the first identifier and the second identifier are temporary variables of the emoticon as an example. Referring to Fig. 2-1, the method comprises:
Step 201: obtain the emoticon contained in the first information in the source language;
Step 201 can be implemented through the following steps (1) to (2):
(1): the terminal obtains the first information in the source language input by the user;
The user inputs the first information in the source language to the terminal, and the terminal obtains it. The first information comprises at least one sentence to be translated.
In this embodiment, the user can input the first information to the terminal manually or by copying and pasting.
Manual input can be one or more of file input, voice input, keyboard input, touch input, handwriting input and optical character recognition input.
The source language can be any language and is not specifically limited in this embodiment. For example, take Chinese as the source language and the first information "Valentine's Day, give you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!", as shown in Fig. 2-2. In this step, the terminal obtains the first information "Valentine's Day, give you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!" input by the user.
(2): detect whether the first information contains an emoticon and, if it does, obtain the emoticon contained in the first information.
An emoticon can be a symbol with a particular meaning formed from one or more symbols such as letters, digits, punctuation marks, pinyin, kana, fonts, categories and images.
Because emoticons and words are encoded in different formats, the step of detecting whether the first information contains an emoticon can be:
Determine whether the first information contains content whose encoding format is a preset encoding format. If it does, the first information contains an emoticon, and the content in the preset encoding format is the emoticon; if it does not, the first information contains no emoticon.
The preset encoding format can be the encoding format of a picture format, of a symbol string, or the like.
For example, it is determined that the first information "Valentine's Day, give you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!" contains emoticons, and the emoticons it contains, "rose emoticon", "lollipop emoticon" and "titter emoticon", are obtained.
Further, the terminal has a real-time save function: when it obtains the emoticons contained in the first information, it stores the words and emoticons contained in the first information in a preset data structure, for example a hash array "hash [key]=value" or a linked list.
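Detection by encoding format can be sketched as below. The source leaves the preset encoding format open, so this example assumes, for illustration only, that an emoticon is any character in the Unicode range U+1F300 to U+1FAFF; detect_emoticons and the dictionary keyed by storage order are likewise hypothetical, and the sketch stores only the emoticons, not the surrounding words.

```python
import re

# Assumed "preset encoding format": code points in a common emoji range.
PRESET_FORMAT = re.compile("[\U0001F300-\U0001FAFF]")


def detect_emoticons(first_info):
    # The dictionary plays the role of hash[key] = value, with the storage
    # order of each emoticon as the key.
    store = {}
    for order, match in enumerate(PRESET_FORMAT.finditer(first_info), start=1):
        store[order] = match.group(0)
    return store
```

An empty result means the first information contains no emoticon, so the text can be passed to the translator unchanged.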
Step 202: assign an interim number to the emoticon;
An interim number is assigned to the emoticon according to its storage order in the first information. After the interim number is assigned, the emoticon is associated with its interim number, that is, the emoticon and its interim number are stored in the correspondence between emoticons and interim numbers.
For example, the storage order of "rose emoticon", "lollipop emoticon" and "titter emoticon" in the first information is 1, 2 and 3. They are therefore assigned the interim numbers temp001, temp002 and temp003 respectively, and "rose emoticon" is associated with temp001, "lollipop emoticon" with temp002, and "titter emoticon" with temp003. That is, these pairs are stored in the correspondence between emoticons and interim numbers, as shown in Table 1 below:
Table 1
Emoticon    Interim number
Rose        temp001
Lollipop    temp002
Titter      temp003
……          ……
Step 203: replace the emoticon in the first information with its interim number, to obtain the fifth information;
For example, in the first information "Valentine's Day, give you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!", "rose emoticon" is replaced with its interim number temp001, "lollipop emoticon" with its interim number temp002, and "titter emoticon" with its interim number temp003, giving the fifth information "Valentine's Day, give you temp001 and temp002! temp003!".
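Steps 202 and 203 can be sketched together. The tempNNN format follows the example above; the function name assign_interim_numbers and the bracketed emoticon names are assumptions, and the sketch assumes each emoticon occurs once (a repeated emoticon would overwrite its table entry).

```python
def assign_interim_numbers(first_info, emoticons):
    # emoticons is the list of emoticons in their storage order.
    table = {}               # correspondence between emoticons and interim numbers
    fifth_info = first_info
    for order, emoticon in enumerate(emoticons, start=1):
        number = f"temp{order:03d}"      # temp001, temp002, ...
        table[emoticon] = number
        # Replace only the first remaining occurrence of this emoticon.
        fifth_info = fifth_info.replace(emoticon, number, 1)
    return table, fifth_info
```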
Step 204: assign a temporary variable to the emoticon according to its position in the first information;
For example, according to the positions of "rose emoticon", "lollipop emoticon" and "titter emoticon" in the first information, the temporary variable X is assigned to "rose emoticon", Y to "lollipop emoticon" and Z to "titter emoticon".
In this embodiment, the temporary variable can also be assigned according to the position and/or category of the emoticon in the first information. The format of the temporary variable is the same in every language.
If the first information contains many emoticons, the combination of a temporary variable and a number can be used as the temporary variable of an emoticon, so that every emoticon in the first information is assigned a unique temporary variable. For example, if temporary variables are English letters, only 26 are available; when the first information contains more than 26 emoticons, numbers can be obtained and combinations of an English letter and a number used as temporary variables, for example X0, X1, X2 ... Xn and Y1, Y2 ...
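One way to generate an unlimited supply of unique temporary variables is a generator that yields the 26 bare letters first and then letter-plus-number combinations. The ordering chosen here (A to Z, then A0 to Z0, A1 to Z1, ...) is an assumption; the source's own example starts from X and does not fix an order.

```python
import string
from itertools import count, islice


def temporary_variables():
    # First the 26 bare letters ...
    yield from string.ascii_uppercase
    # ... then letter+number combinations, which never repeat.
    for n in count():
        for letter in string.ascii_uppercase:
            yield f"{letter}{n}"


# Example use: take as many names as there are emoticons in the first information.
names = list(islice(temporary_variables(), 30))
```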
Step 205: associate the temporary variable of the emoticon with its interim number;
The temporary variable and the interim number of the emoticon are stored in the correspondence between temporary variables and interim numbers, which associates the two.
For example, the temporary variable X and interim number temp001 of "rose emoticon", the temporary variable Y and interim number temp002 of "lollipop emoticon", and the temporary variable Z and interim number temp003 of "titter emoticon" are stored in the correspondence between temporary variables and interim numbers, as shown in Table 2 below:
Table 2
Temporary variable | Temporary number
X | temp001
Y | temp002
Z | temp003
…… | ……
Step 206: replace the temporary number of the emoticon in the fifth information with the temporary variable associated with that temporary number to obtain second information;
According to the temporary number of the emoticon, the temporary variable associated with that temporary number is obtained from the correspondence between temporary numbers and temporary variables; the temporary number of the emoticon in the fifth information is then replaced with the associated temporary variable to obtain the second information.
For example, according to temporary number temp001 of the "rose emoticon", temporary number temp002 of the "lollipop emoticon" and temporary number temp003 of the "titter emoticon", temporary variable X associated with temp001, temporary variable Y associated with temp002 and temporary variable Z associated with temp003 are obtained from Table 2; in the fifth information, temp001 is replaced with X, temp002 with Y and temp003 with Z, and the second information obtained is "Valentine's Day, give you X and Y!".
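Steps 205 and 206 amount to a table lookup followed by a string substitution. The sketch below assumes the table is held in a Python dictionary and uses a hypothetical fifth-information string; the function name `numbers_to_variables` is likewise an assumption made for illustration.

```python
import re

# Table 2 as a dictionary: temporary number -> temporary variable
NUMBER_TO_VARIABLE = {"temp001": "X", "temp002": "Y", "temp003": "Z"}


def numbers_to_variables(fifth_info, number_to_variable=NUMBER_TO_VARIABLE):
    """Replace every temporary number in the text with its temporary variable."""
    # Longest keys first, so a longer number is never cut short by a shorter one
    keys = sorted(number_to_variable, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: number_to_variable[m.group(0)], fifth_info)


# Hypothetical fifth information, for illustration only
print(numbers_to_variables("Valentine's Day, give you temp001 and temp002! temp003"))
```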
Step 207: translate the second information into third information in the target language format;
The target language may be any language, and the embodiments of the present invention place no specific limitation on the target language. Step 207 may be implemented through the following steps (1) to (2):
(1): select at least one translation algorithm from a translation algorithm set;
The translation algorithm set comprises a rule-based translation algorithm, an example-based translation algorithm and a statistics-based translation algorithm.
Any one translation algorithm, any two translation algorithms, or all three translation algorithms may be selected from the translation algorithm set.
(2): translate the second information into the third information in the target language format by means of the selected translation algorithm;
For example, taking Japanese as the target language and the rule-based translation algorithm as the selected translation algorithm, the second information "Valentine's Day, give you X and Y!" is translated by the rule-based translation algorithm into the third information in Japanese format, "バレンタインデーは、あなたにXとYを与える！". For another example, taking Japanese as the target language and the example-based translation algorithm as the selected translation algorithm, the second information "Valentine's Day, give you X and Y!" is translated into the third information in Japanese format, "バレンタインデーは、あなたに贈り、XとY！".
When the second information is translated into the third information in the target language format by the selected translation algorithm, lexical and/or syntactic analysis may also be performed on the second information according to the selected translation algorithm. That is, only lexical analysis may be performed on the second information, or only syntactic analysis, or lexical analysis followed by syntactic analysis. The embodiments of the present invention place no specific limitation on lexical analysis and syntactic analysis.
The rule-based translation algorithm, the example-based translation algorithm and syntax-based translation algorithms require both lexical analysis and syntactic analysis of the second information, whereas some translation algorithms perform only lexical analysis or only syntactic analysis on the second information.
Many tools are available for lexical analysis and syntactic analysis. For example, the word segmentation tool may be the Stanford POS Tagger (English, Chinese, Arabic), the ICTCLAS Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences, the THULAC word segmentation system of Tsinghua University, or ChaSen, MeCab, JUMAN and the like for Japanese. The syntactic analysis tool may be, for example, the Stanford Parser (English, Chinese, Arabic), the Chinese parser of Harbin Institute of Technology, or parsers such as CaboCha and KNP for Japanese. The embodiments of the present invention place no specific limitation on the lexical analysis tool and the syntactic analysis tool.
For example, the THULAC word segmentation tool of Tsinghua University is used to segment the second information "Valentine's Day, give you X and Y!", and the segmentation result obtained is "Valentine's Day/t ,/w give/v you/r X/x and/c Y/x !/w Z/x !/w".
The part-of-speech symbols used in the segmentation result are shown in Table 3:
Table 3
Symbol | Part of speech | Symbol | Part of speech | Symbol | Part of speech
n | Noun | s | Place word | r | Pronoun
np | Person name | v | Verb | c | Conjunction
ns | Place name | vm | Modal verb | p | Preposition
ni | Organization name | vd | Directional verb | u | Particle
nz | Other proper noun | a | Adjective | y | Modal particle
m | Numeral | d | Adverb | e | Interjection
q | Measure word | h | Prefix component | o | Onomatopoeia
mq | Numeral-classifier compound | k | Suffix component | g | Morpheme
t | Time word | i | Idiom | w | Punctuation
f | Locality word | j | Abbreviation | x | Other
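The segmentation result can be read back into (word, symbol) pairs to check that the temporary variables survive lexical analysis as tokens of class x ("Other"). The sketch below does not call the segmentation tool itself; it only parses a string in the word/symbol form shown in the example, and the function names `parse_tagged` and `placeholders` are hypothetical.

```python
import re

# One token is "word/symbol", followed by whitespace or the end of the string
TOKEN = re.compile(r"(.+?)/([a-z]+)(?:\s+|$)")


def parse_tagged(segmented):
    """Split a tagged segmentation result into (word, symbol) pairs."""
    return [(m.group(1), m.group(2)) for m in TOKEN.finditer(segmented)]


def placeholders(segmented):
    """Return the words tagged x, which is where temporary variables end up."""
    return [word for word, symbol in parse_tagged(segmented) if symbol == "x"]


result = "Valentine's Day/t ,/w give/v you/r X/x and/c Y/x !/w Z/x !/w"
print(placeholders(result))
```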
For example, the Chinese parser of Harbin Institute of Technology is used to perform dependency parsing on the second information "Valentine's Day, give you X and Y!", and the resulting dependency trees are shown in Fig. 2-3 and Fig. 2-4. Conversion between Chinese dependency trees and phrase structure trees is a relatively mature technique. Statistical machine translation methods based on phrase structure trees or on dependency trees include tree-to-string, tree-to-tree, forest-to-string and string-to-tree models. Combined with the present invention, the syntactic analysis result is equally applicable to these syntax-based statistical machine translation methods and to example-based translation methods.
Step 208: extract from the third information the temporary variables contained in the third information;
For example, the temporary variables X, Y and Z contained in the third information are extracted from the third information "バレンタインデーは、あなたにXとYを与える！" or "バレンタインデーは、あなたに贈り、XとY！".
Step 209: obtain the temporary numbers associated with the temporary variables;
According to each extracted temporary variable, the temporary number associated with it is obtained from the correspondence between temporary variables and temporary numbers.
For example, according to temporary variables X, Y and Z, temporary number temp001 of temporary variable X, temporary number temp002 of temporary variable Y and temporary number temp003 of temporary variable Z are obtained from the correspondence between temporary variables and temporary numbers in Table 2.
Step 210: replace each temporary variable in the third information with the temporary number associated with it to obtain sixth information;
For example, in the third information "バレンタインデーは、あなたにXとYを与える！", temporary variable X is replaced with its associated temporary number temp001, temporary variable Y with temp002 and temporary variable Z with temp003, and the sixth information obtained is "バレンタインデーは、あなたにtemp001とtemp002を与える！temp003". For another example, in the third information "バレンタインデーは、あなたに贈り、XとY！", temporary variable X is replaced with temp001, temporary variable Y with temp002 and temporary variable Z with temp003, and the sixth information obtained is "バレンタインデーは、あなたに贈り、temp001とtemp002！".
Step 211: obtain the emoticon corresponding to each temporary number;
According to the temporary number, the emoticon corresponding to that temporary number is obtained from the correspondence between temporary numbers and emoticons.
For example, according to temporary numbers temp001, temp002 and temp003, the "rose emoticon" corresponding to temp001, the "lollipop emoticon" corresponding to temp002 and the "titter emoticon" corresponding to temp003 are obtained from Table 1.
Step 212: replace each temporary number in the sixth information with the emoticon corresponding to it to obtain fourth information.
For example, in the sixth information "バレンタインデーは、あなたにtemp001とtemp002を与える！temp003", temporary number temp001 is replaced with the "rose emoticon", temp002 with the "lollipop emoticon" and temp003 with the "titter emoticon", and the fourth information obtained is "バレンタインデーは、あなたに+rose emoticonと+lollipop emoticonを与える！+titter emoticon", as shown in Fig. 2-5. For another example, in the sixth information "バレンタインデーは、あなたに贈り、temp001とtemp002！", temporary number temp001 is replaced with the "rose emoticon", temp002 with the "lollipop emoticon" and temp003 with the "titter emoticon", and the fourth information obtained is "バレンタインデーは、あなたに贈り、+rose emoticonと+lollipop emoticon！+titter emoticon！", as shown in Fig. 2-6.
As the two fourth-information results "バレンタインデーは、あなたに+rose emoticonと+lollipop emoticonを与える！+titter emoticon" and "バレンタインデーは、あなたに贈り、+rose emoticonと+lollipop emoticon！+titter emoticon！" show, different translation algorithms produce different translation results, but this does not affect the effect of the present invention.
Further, after the temporary numbers in the sixth information are replaced with their corresponding emoticons to obtain the fourth information, the fourth information is output. The fourth information may be output as one or more of text, image, speech, emoticon and the like. The embodiments of the present invention place no specific limitation on the output mode of the fourth information.
In this embodiment of the present invention, the emoticon contained in the first information in the source language format is obtained; the emoticon in the first information is replaced with a temporary variable identifying the emoticon to obtain the second information; the second information is translated into the third information in the target language format; the temporary variable is extracted from the third information; and the temporary variable in the third information is replaced with the corresponding emoticon to obtain the fourth information. Translation is therefore not limited by an emoticon library or a translation dictionary, emoticons can be translated with high accuracy, and the cost of building translation dictionaries, translation rules, translation models, language models and the like that contain emoticons is reduced. The method also addresses the recognition, translation and generation of emoticons that are not registered in an emoticon dictionary, as well as the determination and generation of the structural position of an emoticon on the target-language side: the position of the emoticon is correctly identified on the target-language side, which preserves the sentence structure and semantic integrity of the translation result. Moreover, the present invention is not limited by language and can address the recognition, translation and generation of emoticons in any language.
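The overall flow of this embodiment (protect the emoticons, translate, restore them) can be sketched as below. It is a simplified illustration under stated assumptions: the temporary-number layer is folded into a single mapping, the placeholder names X0, X1, ... and the function names are hypothetical, and `translate` stands for any translation engine that leaves the placeholders intact.

```python
import re


def protect(first_info, emoticons):
    """Swap each emoticon for a placeholder; return the text and the mapping."""
    mapping = {}
    text = first_info
    for i, emoticon in enumerate(emoticons):
        variable = f"X{i}"  # assumed naming scheme: X0, X1, X2, ...
        text = text.replace(emoticon, variable, 1)
        mapping[variable] = emoticon
    return text, mapping


def restore(third_info, mapping):
    """Put the emoticons back wherever their placeholders ended up."""
    keys = sorted(mapping, key=len, reverse=True)  # X10 is tried before X1
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], third_info)


def translate_message(first_info, emoticons, translate):
    """First information -> second -> third (translated) -> fourth information."""
    second_info, mapping = protect(first_info, emoticons)
    third_info = translate(second_info)
    return restore(third_info, mapping)
```

Because the emoticon never reaches the translation engine, the engine needs no emoticon entries in its dictionaries or models; it only has to carry the placeholder through to the right position in the target sentence.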
Embodiment 3
An embodiment of the present invention provides a method of translating information. The method is executed by a terminal and may be implemented as part or all of the terminal through software, hardware or a combination of both. The terminal may be a mobile terminal, a fixed terminal, a server or the like. This embodiment applies where the user's input device is equipped with given resources such as an emoticon database and a synonym dictionary.
In this embodiment, the description takes as an example the case where the first identifier is the word corresponding to the emoticon in the source language and the second identifier is the word corresponding to the emoticon in the target language. Referring to Fig. 3-1, the method comprises:
Step 301: obtain the emoticon contained in the first information in the source language format;
Step 301 may be implemented through the following steps (1) to (2):
(1): the terminal obtains the first information in the source language format input by the user;
The user inputs the first information in the source language format to the terminal, and the terminal obtains it; the first information comprises at least one sentence to be translated.
In this embodiment, the user may input the first information to the terminal manually, or by copying and pasting.
Manual input by the user may be one or more of file input, voice input, keyboard input, touch input, handwriting input and optical character recognition input.
The source language may be any language, and the embodiments of the present invention place no specific limitation on the source language. For example, taking Chinese as the source language and "he feels very+happiness emoticon" as the first information, as shown in Fig. 3-2, the terminal in this step obtains "he feels very+happiness emoticon" input by the user.
(2): detect whether the first information contains an emoticon, and if so, obtain the emoticon contained in the first information.
An emoticon may be a symbol with a specific meaning formed from one or more of letters, digits, punctuation marks, pinyin, kana, glyphs, categories, images and the like.
Because emoticons and text use different encoding formats, the step of detecting whether the first information contains an emoticon may be:
Determine whether the first information contains content whose encoding format is a preset encoding format; if so, determine that the first information contains an emoticon and that the content in the preset encoding format is the emoticon; if not, determine that the first information does not contain an emoticon.
The preset encoding format may be the encoding format corresponding to an image format, a symbol string, or the like.
For example, it is determined that the first information "he feels very+happiness emoticon" contains an emoticon, and the emoticon "happiness emoticon" contained in the first information is obtained.
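Detection by preset encoding format can be illustrated for the symbol-string case with a regular expression. The two formats below (a slash code such as "/gx" and a kaomoji in parentheses such as "(⊙o⊙)") are assumptions chosen for the sketch, and the pattern is deliberately naive: ordinary text such as "and/or" or a parenthesised word would also match, so a real system would need a stricter format definition.

```python
import re

# Assumed preset formats: slash codes such as "/gx", and kaomoji in parentheses
PRESET_FORMAT = re.compile(r"/[a-z]+|\([^()\s]{2,}\)")


def find_emoticons(first_info):
    """Return every piece of content that is in a preset emoticon format."""
    return PRESET_FORMAT.findall(first_info)


def contains_emoticon(first_info):
    """True if the first information contains at least one emoticon."""
    return bool(find_emoticons(first_info))
```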
Further, the terminal has a real-time storage function: when the emoticon contained in the first information is obtained, the text and the emoticon contained in the first information are stored in a preset data structure, for example a hash array "hash [key]=value", a linked list or the like.
Further, after the emoticon contained in the first information is obtained, a temporary number is assigned to the obtained emoticon, and the emoticon in the first information is replaced with the temporary number corresponding to the emoticon.
For example, temporary number temp001 is assigned to the "happiness emoticon", and the "happiness emoticon" in the first information "he feels very+happiness emoticon" is replaced with temp001 to obtain "he feels very temp001".
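The storage and numbering described above can be sketched with a dictionary playing the role of the hash array "hash [key]=value". The function name `assign_temp_numbers` and the slash code "/gx" used to stand for the happiness emoticon are assumptions for illustration.

```python
def assign_temp_numbers(first_info, emoticons):
    """Assign temp001, temp002, ... to the emoticons and substitute them in the text.

    Returns the rewritten text and the table temporary number -> emoticon.
    """
    table = {}
    text = first_info
    for i, emoticon in enumerate(emoticons, start=1):
        number = f"temp{i:03d}"
        table[number] = emoticon                  # hash[key] = value
        text = text.replace(emoticon, number, 1)  # replace one occurrence per entry
    return text, table


print(assign_temp_numbers("he feels very /gx", ["/gx"]))
```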
Step 302: obtain the attribute information of the emoticon according to the emoticon;
The attribute information may be the word, semantics, category, part of speech, structure, concept, length, title, expression size, format, content and/or pinyin of the emoticon, and so on. Pattern matching is performed between the emoticon and the emoticons in an emoticon library to obtain the attribute information of the emoticon; the emoticon library stores the correspondence between emoticons and attribute information. The content of the emoticon library may include information such as the emoticon library name, total expression data length, number of expressions, recently used emoticons, expression index, expression length, expression title, expression size, expression format, expression content, written form, semantics, category, part of speech, structure, concept and display position.
Step 302 may be implemented through the following steps (1) and (2):
(1): according to the icon data of the emoticon, obtain the index number of the emoticon from the correspondence between icon data and index numbers;
According to the emoticon, the icon data of the emoticon is obtained from the correspondence between emoticons and icon data; according to the icon data of the emoticon, the index number of the emoticon is obtained from the correspondence between icon data and index numbers.
In this embodiment, the correspondence between emoticons and icon data and the correspondence between icon data and index numbers are stored in the terminal in advance; the correspondence between emoticons and icon data is shown in Table 4 below, and the correspondence between icon data and index numbers is shown in Table 5 below:
Table 4
Icon data | Emoticon
010011000111……0100100 | Surprised
010011000111……0100101 | Glad
010011000111……0100110 | Titter
010011000111……0100111 | Strong
010011000111……0101000 | Lollipop
010011000111……0111000 | Rose
Table 5
Index number | Icon data
X…X001 | 010011000111……0100100
X…X002 | 010011000111……0100101
X…X003 | 010011000111……0100110
X…X004 | 010011000111……0100111
X…X005 | 010011000111……0101000
X…X100 | 010011000111……0111000
For example, according to the "happiness emoticon", the icon data corresponding to the "happiness emoticon", 010011000111……0100101, is obtained from the correspondence between emoticons and icon data in Table 4; according to this icon data, the index number corresponding to the "happiness emoticon", X…X002, is obtained from the correspondence between icon data and index numbers in Table 5.
(2): according to the index number of the emoticon, obtain the attribute information of the emoticon from the correspondence between index numbers and attribute information for the source language.
According to the source language, the correspondence between index numbers and attribute information for the source language is obtained; according to the index number of the emoticon, the attribute information of the emoticon is obtained from that correspondence.
In this embodiment, the terminal stores in advance the correspondence between index numbers and attribute information for each language. For example, the correspondence between index numbers and attribute information for Chinese is shown in Table 6 and Table 7 below:
Table 6
Length | Title | Expression size | Format | Content | Word | Position | Index number
100bytes | /jy | 16*16 | bmp | (⊙o⊙) | Surprised | 1 | X…X001
 | /gx | 16*16 | bmp | (*^﹏^*) | Glad | 2 | X…X002
 | /tx | 16*16 | bmp | | Titter | 3 | X…X003
 | /qiang | 16*16 | bmp | | Strong | 4 | X…X004
 | /bangbangt | 16*16 | bmp | | Lollipop | 5 | X…X005
Table 7
Index number | Word | Title | Pinyin | Part of speech | Semantics
X…X001 | Surprised | /jy | jingya | adj | Emotion
X…X002 | Glad | /gx | gaoxing | adj | Emotion
X…X003 | Titter | /tx | touxiao | v | Behavior
X…X004 | Strong | /qiang | qiang | adj | Degree
X…X005 | Lollipop | /bangbangt | bangbangtang | n | Food
For example, according to index number X…X002, the attribute information of the "happiness emoticon" obtained from Table 6 and Table 7 includes: title /gx, expression size 16*16, format bmp, content (*^﹏^*), word "glad", position 2, pinyin gaoxing, part of speech adj, semantics emotion, and so on. For another example, the "happiness emoticon" and temporary number temp001 form the array (temp001, happiness emoticon), and pattern matching is performed between the emoticon in this array and the emoticon library: the index number "X…X002" of the happiness emoticon is obtained from Table 4 and Table 5, and pattern matching against Table 6 and Table 7 with this index number then yields the attribute information of the "happiness emoticon", such as expression length 100bytes, expression title Happy, expression size 16*16, expression format bmp, word "glad", pinyin gaoxing, semantics emotion, and part of speech adj (adjective).
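The two lookups of step 302 (emoticon to icon data to index number, then index number to attribute information for the source language) can be sketched with dictionaries. The entries are excerpts of Tables 4 to 7 with the elided icon data and index numbers kept exactly as printed; the English keys, the language code "zh" and the function name `lookup_attributes` are assumptions for illustration.

```python
# Excerpts of Tables 4 to 7; elided values are kept as printed, not filled in
EMOTICON_TO_ICON = {"glad": "010011000111……0100101"}
ICON_TO_INDEX = {"010011000111……0100101": "X…X002"}
ATTRIBUTES = {
    "zh": {  # correspondence for the source language (Chinese)
        "X…X002": {
            "title": "/gx", "size": "16*16", "format": "bmp",
            "word": "glad", "pinyin": "gaoxing",
            "part_of_speech": "adj", "semantics": "emotion",
        },
    },
}


def lookup_attributes(emoticon, source_language):
    """Emoticon -> icon data -> index number -> attribute information."""
    icon_data = EMOTICON_TO_ICON[emoticon]
    index_number = ICON_TO_INDEX[icon_data]
    return ATTRIBUTES[source_language][index_number]


print(lookup_attributes("glad", "zh")["pinyin"])
```

Keeping the index number as the join key means only the last table has to be duplicated per language.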
Step 303: obtain at least one word corresponding to the emoticon according to the attribute information of the emoticon;
Step 303 may be implemented in a first way or a second way. In the first implementation, step 303 may be implemented through the following steps (1) to (3):
(1): calculate the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, where the semantic dictionary stores the correspondence between attribute information and words;
The semantic dictionary may be a synonym or near-synonym dictionary of the source language, a translation dictionary between the source language and the target language, a translation model, a language model, or the like. The method of calculating similarity may be adjusted according to the lexical resource used.
The translation dictionary between the source language and the target language, the translation model or the language model may be resources that the translation system already carries. For example, the bilingual translation dictionary of the rule-based or example-based translation algorithm, or the translation model or language model of the statistics-based translation algorithm, can all be used for calculating the semantic similarity of words. These techniques are relatively mature and are not described here.
For Chinese, a Chinese thesaurus or HowNet (http://www.keenage.com/) can be used; for English, WordNet (http://wordnet.princeton.edu/) can be used as the synonym/near-synonym dictionary.
In addition, EuroWordNet (http://www.illc.uva.nl/EuroWordNet/) is a multilingual semantic network dictionary for Europe and can be used for semantic similarity calculation in languages such as Dutch, Italian, Spanish, German, French, Czech and Estonian.
The Indian semantic dictionary IndoWordNet (http://en.wikipedia.org/wiki/IndoWordNet) covers the semantic networks of 18 official languages of India.
For Japanese, Japanese WordNet (http://nlpwww.nict.go.jp/wn-ja/), the Japanese lexicon Goi-Taikei (http://www.kecl.ntt.co.jp/icl/lirg/resources/GoiTaikei/) and the like can be used for semantic similarity calculation.
The semantic similarity calculation method differs slightly from language to language and is not specifically limited here. For Chinese, a semantic similarity calculation method based on HowNet can be used, for example: Liu Qun, Li Sujian. Word semantic similarity computation based on HowNet [J]. Computational Linguistics and Chinese Language Processing, 2002, 7(2): 59-76.
For English, a semantic similarity calculation method is, for example: Pedersen T, Patwardhan S, Michelizzi J. WordNet::Similarity: measuring the relatedness of concepts [C]. Demonstration Papers at HLT-NAACL 2004. Association for Computational Linguistics, 2004: 38-41.
Many existing semantic similarity calculation techniques and methods are likewise available for other languages; all are applicable to the present invention and are not described here.
(2): obtain from the semantic dictionary at least one piece of attribute information whose similarity to the attribute information of the emoticon meets a preset condition;
The preset condition may be that the similarity is greater than a preset threshold, or that the similarity is among a preset number of largest similarities. Step (2) may then be: obtain from the semantic dictionary the attribute information whose similarity to the attribute information of the emoticon is greater than the preset threshold; or obtain from the semantic dictionary the preset number of pieces of attribute information with the largest similarity to the attribute information of the emoticon.
The preset threshold and the preset number may be set as required, and the embodiments of the present invention place no specific limitation on them.
(3): obtain from the semantic dictionary the word corresponding to each of the at least one piece of attribute information.
According to each piece of attribute information obtained, the word corresponding to it is obtained from the correspondence between attribute information and words in the semantic dictionary.
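The two forms of the preset condition in step (2) can be sketched as follows. The similarity values are assumed to have been computed already by whichever method suits the language; the scores, the entry names and the function names below are hypothetical.

```python
import heapq


def select_by_threshold(scores, threshold):
    """Keep the entries whose similarity is greater than the preset threshold."""
    return [entry for entry, similarity in scores.items() if similarity > threshold]


def select_top_k(scores, k):
    """Keep the preset number k of entries with the largest similarity."""
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical similarities between the emoticon's attributes and dictionary entries
scores = {"glad": 0.95, "joyful": 0.80, "tired": 0.10}
print(select_by_threshold(scores, 0.5))
print(select_top_k(scores, 1))
```

The threshold form returns a variable number of candidates, while the top-k form bounds the number of second-information variants that later have to be translated.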
In the second implementation, step 303 may be implemented through the following steps (A) and (B):
(A): extract the word corresponding to the emoticon from the attribute information of the emoticon;
The attribute information includes a word, and the word corresponding to the emoticon is extracted from the attribute information of the emoticon.
For example, the word "glad" corresponding to the "happiness emoticon" is extracted from the attribute information of the "happiness emoticon".
(B): obtain the synonyms or near-synonyms of the word corresponding to the emoticon, and use the obtained synonyms and near-synonyms as words corresponding to the emoticon.
For example, a synonym of "glad" is "happy", and the near-synonyms of "glad" include joyful, joy, gambol, rouse oneself, proud, peaceful and comfortable, cheerful and light-hearted, satisfactory, great rejoicing, jump for joy, satisfied, excited, achieve one's ambition, entertainment, jubilant, pleased and so on. "Glad" and "happy" are obtained from the synonyms and near-synonyms of "glad" and are used as the words corresponding to the "happiness emoticon".
Step 304: replace the emoticon in the first information with each of the at least one word respectively to obtain the second information corresponding to each word;
For example, in the first information "he feels very+happiness emoticon", the "happiness emoticon" is replaced with "glad" and with "happy" respectively; the second information corresponding to "glad" is "he feels very glad", and the second information corresponding to "happy" is "he feels very happy".
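Steps 303 (second way) and 304 can be sketched together: collect the word and its synonyms, then produce one second information per word. The synonym table, the slash code "/gx" standing for the happiness emoticon, and the function names are assumptions for illustration.

```python
# Hypothetical synonym table for the source language
NEAR_SYNONYMS = {"glad": ["happy"]}


def candidate_words(word, synonyms=NEAR_SYNONYMS):
    """The emoticon's own word followed by its synonyms or near-synonyms."""
    return [word] + synonyms.get(word, [])


def second_informations(first_info, emoticon, words):
    """Replace the emoticon with each word in turn: word -> second information."""
    return {word: first_info.replace(emoticon, word) for word in words}


words = candidate_words("glad")
print(second_informations("he feels very /gx", "/gx", words))
```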
Step 305: translate the second information into third information in the target language format;
The target language may be any language, and the embodiments of the present invention place no specific limitation on the target language. Step 305 may be implemented through the following steps (1) to (2):
(1): select at least one translation algorithm from a translation algorithm set;
The translation algorithm set comprises a rule-based translation algorithm, an example-based translation algorithm and a statistics-based translation algorithm.
Any one translation algorithm, any two translation algorithms, or all three translation algorithms may be selected from the translation algorithm set.
(2): translate the second information into the third information in the target language format by means of the selected translation algorithm;
For example, taking Japanese as the target language, the second information "he feels very glad" is translated into the third information in Japanese format, "that は と て も う れ い", where "う れ い" is a Japanese word meaning glad; for another example, the second information "he feels very happy" is translated into the third information in Japanese format, "that は と て も ease い", where "ease い" is a Japanese word meaning happy.
When the second information is translated into the third information in the target language format by the selected translation algorithm, lexical and/or syntactic analysis may also be performed on the second information according to the selected translation algorithm. That is, only lexical analysis may be performed on the second information, or only syntactic analysis, or lexical analysis followed by syntactic analysis. The embodiments of the present invention place no specific limitation on lexical analysis and syntactic analysis.
The rule-based translation algorithm, the example-based translation algorithm and syntax-based translation algorithms require both lexical analysis and syntactic analysis of the second information, whereas some translation algorithms perform only lexical analysis or only syntactic analysis on the second information.
Many tools are available for lexical analysis and syntactic analysis. For example, the word segmentation tool may be the Stanford POS Tagger (English, Chinese, Arabic), the ICTCLAS Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences, the THULAC word segmentation system of Tsinghua University, or ChaSen, MeCab, JUMAN and the like for Japanese. The syntactic analysis tool may be, for example, the Stanford Parser (English, Chinese, Arabic), the Chinese parser of Harbin Institute of Technology, or parsers such as CaboCha and KNP for Japanese. The embodiments of the present invention place no specific limitation on the lexical analysis tool and the syntactic analysis tool.
Step 306: extract from the third information the word in the target language format corresponding to the emoticon;
The word in the target language format corresponding to the emoticon is obtained according to the word in the source language format corresponding to the emoticon, and is then extracted from the third information.
Step 307: replace the extracted word in the third information with the emoticon corresponding to the extracted word to obtain fourth information.
Step 307 may be implemented in a first way or a second way. In the first implementation, step 307 may be implemented through the following steps (1) to (4):
(1): from the correspondence between attribute information and index numbers for the target language, obtain the correspondence entry that contains the word in the target language format corresponding to the emoticon;
According to the target language, the correspondence between attribute information and index numbers for the target language is obtained; according to the word in the source language format corresponding to the emoticon, the entry for the word in the target language format corresponding to the emoticon is obtained from that correspondence.
The correspondence among index numbers, icon data and picture content for Japanese is shown in Table 8 below:
Table 8
Call number Icon data Picture material (remarks)
Y…Y001 010011000111……1000100 Happiness emoticon in Japanese
Y…Y002 010011000111……1000101 Happy emoticon in Japanese
Y…Y100 010011000111……1011010 Emoticon tired out in Japanese
Wherein, the attribute information of the emoticon in Japanese is as shown in table 9 below:
Table 9
For example, "うれしい" and "楽しい" are found in the third information as the objects for emoticon generation on the Japanese side. Looking up "うれしい" and "楽しい" in Table 8 gives the index numbers of their corresponding icons, "Y…Y001" and "Y…Y002" respectively. "Y…Y001" is substituted into "彼はとてもうれしい" to obtain "彼はとてもうれしい (Y…Y001)", and "Y…Y002" is substituted into "彼はとても楽しい" to obtain "彼はとても楽しい (Y…Y002)".
(2): Extract the index number contained in the obtained correspondence;
The correspondence contains an index number, which is extracted from the obtained correspondence.
(3): According to the index number, obtain the icon data of the emoticon from the correspondence between index numbers and icon data;
The correspondence between index numbers and icon data is stored in the terminal.
(4): In the third information, replace the word with the icon data of the obtained emoticon to obtain the fourth information.
For example, in the third information "彼はとてもうれしい", "うれしい" is replaced with the emoticon corresponding to happiness, and the fourth information obtained is "彼はとても + happiness emoticon", as shown in Fig. 3-3. As another example, in the third information "彼はとても楽しい", "楽しい" is replaced with the emoticon corresponding to happy, and the fourth information obtained is "彼はとても + happy emoticon", as shown in Fig. 3-4.
Alternatively, the "happiness emoticon" is inserted to the right of "彼はとてもうれしい", and the fourth information obtained is "彼はとてもうれしい (happiness emoticon)", as shown in Fig. 3-5; the "happy emoticon" is inserted to the right of "彼はとても楽しい", and the fourth information obtained is "彼はとても楽しい (happy emoticon)", as shown in Fig. 3-6.
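The lookup chain of steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the index numbers `JA001` and `JA002`, the two dictionaries, the function name `restore_emoticon`, and the use of Unicode emoji in place of binary icon data are all hypothetical, and the Japanese words are assumed readings of the example above.

```python
# Hypothetical tables. The real index numbers and icon data are elided in Table 8,
# so placeholder values are used here.
TARGET_WORD_TO_INDEX = {"うれしい": "JA001", "楽しい": "JA002"}
INDEX_TO_ICON = {"JA001": "😊", "JA002": "😄"}


def restore_emoticon(third_info, word, mode="replace"):
    """Return the fourth information built from the third information.

    mode="replace" swaps the word for its icon;
    any other mode keeps the word and inserts the icon to its right.
    """
    index = TARGET_WORD_TO_INDEX.get(word)   # steps (1) and (2): find the entry, take its index
    icon = INDEX_TO_ICON.get(index)          # step (3): index number -> icon data
    if icon is None or word not in third_info:
        return third_info                    # nothing to restore
    if mode == "replace":                    # step (4): replace the word
        return third_info.replace(word, icon, 1)
    return third_info.replace(word, word + icon, 1)
```

Calling `restore_emoticon("彼はとてもうれしい", "うれしい")` replaces the word with the icon, while `mode="insert_right"` keeps the word and appends the icon, which corresponds to the two variants shown in Figs. 3-3 to 3-6.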
In the second implementation, step 307 is realized by the following steps (A) to (E):
(A): Obtain the source-language word corresponding to the emoticon;
According to the emoticon, the source-language word corresponding to it is obtained. For example, the source-language word obtained for the "happiness emoticon" is "happiness".
(B): From the correspondence between attribute information and index numbers for the source language, obtain the correspondence that contains the source-language word of the emoticon;
According to the source language, the correspondence between attribute information and index numbers for that language is obtained, and the correspondence that contains the source-language word of the emoticon is obtained from it.
(C): Extract the index number contained in the obtained correspondence;
The correspondence contains an index number, which is extracted from the obtained correspondence.
(D): According to the index number, obtain the icon data of the emoticon from the correspondence between index numbers and icon data;
(E): In the third information, replace the word with the icon data of the emoticon to obtain the fourth information.
In step (E), the icon data of the emoticon can also be inserted to the left or right of the word in the third information.
Note that the two implementations differ in which emoticon is restored: the first implementation uses the target-language form of the emoticon, whereas the second implementation uses the source-language form.
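The difference between the two implementations comes down to which table supplies the icon. The sketch below illustrates steps (A) to (E) under assumptions: the source-language words are written in English as they appear in this text, and the index numbers `SRC001` and `SRC002`, the dictionaries, and the function name `restore_with_source_icon` are hypothetical.

```python
# Hypothetical source-language tables; emoji stand in for binary icon data.
SOURCE_WORD_TO_INDEX = {"happiness": "SRC001", "happy": "SRC002"}
INDEX_TO_ICON = {"SRC001": "😊", "SRC002": "😄"}


def restore_with_source_icon(third_info, source_word, target_word, side=None):
    """Replace target_word in third_info with the source-language emoticon.

    source_word: the source-language word of the emoticon, step (A).
    side: None replaces the word; "left" or "right" inserts the icon beside it.
    """
    index = SOURCE_WORD_TO_INDEX.get(source_word)  # steps (B) and (C)
    icon = INDEX_TO_ICON.get(index)                # step (D)
    if icon is None or target_word not in third_info:
        return third_info
    if side == "left":
        replacement = icon + target_word
    elif side == "right":
        replacement = target_word + icon
    else:
        replacement = icon
    return third_info.replace(target_word, replacement, 1)  # step (E)
```

Because the lookup is keyed on the source-language word, the target side needs no emoticon table of its own in this variant; only the position of the translated word is taken from the third information.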
Further, after the extracted word is replaced in the third information with its corresponding emoticon to obtain the fourth information, the fourth information is output. The fourth information can be output in one or more forms, such as text, image, voice, and emoticon.
The first effect of the present invention is that it is not limited by the emoticon library or the translation dictionary: it can translate emoticons with high accuracy, and it reduces the cost of building translation dictionaries, translation rules, translation models, language models, and the like that contain emoticons.
The second effect of the present invention is that it effectively solves the recognition, translation, and generation of emoticons that are not registered in the emoticon dictionary.
The third effect of the present invention is that it effectively solves the determination and generation of the structural position of an emoticon on the target-language side. On the target-language side, the position of the emoticon is identified correctly, which preserves the sentence structure and semantic integrity of the translation result.
The fourth effect of the present invention is that it is not limited by language and can effectively solve the recognition, translation, and generation of emoticons in any language.
Embodiment 4
This embodiment of the present invention provides a device for translating information. Referring to Fig. 4, the device comprises:
First acquisition module 401, for obtaining an emoticon contained in first information in a source language;
First replacement module 402, for replacing the emoticon in the first information with a first identifier used to identify the emoticon, to obtain second information;
Translation module 403, for translating the second information into third information in a target language;
First extraction module 404, for extracting, from the third information, a second identifier corresponding to the first identifier;
Second replacement module 405, for replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information.
Further, the first identifier is a temporary variable, and the format of the temporary variable is the same in every language;
First replacement module 402 comprises:
First allocation unit, for assigning a temporary number to the emoticon;
First replacement unit, for replacing the emoticon in the first information with the temporary number of the emoticon, to obtain fifth information;
Second allocation unit, for assigning a temporary variable to the emoticon according to the position of the emoticon in the first information;
Association unit, for associating the temporary variable of the emoticon with the temporary number;
Second replacement unit, for replacing, in the fifth information, the temporary number of the emoticon with the temporary variable associated with that temporary number, to obtain the second information.
Further, the second identifier is a temporary variable, and first extraction module 404 comprises:
First extraction unit, for extracting, from the third information, the temporary variable contained in the third information;
Correspondingly, second replacement module 405 comprises:
First acquiring unit, for obtaining the temporary number associated with the temporary variable;
Third replacement unit, for replacing, in the third information, the temporary variable with the temporary number associated with it, to obtain sixth information;
Second acquiring unit, for obtaining the emoticon corresponding to the temporary number;
Fourth replacement unit, for replacing, in the sixth information, the temporary number with the emoticon corresponding to it, to obtain the fourth information.
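The temporary-variable route through modules 402, 404, and 405 can be sketched as a round trip. This is an illustration under assumptions rather than the device itself: emoticons are taken to be Unicode characters in the range U+1F600 to U+1F64F, the temporary number format `<n>` and the variable format `XVARn` are hypothetical, the input is assumed not to contain those formats already, and the translator is expected to pass `XVARn` through unchanged.

```python
import re

# Assumption: emoticons are Unicode characters in the Emoticons block.
EMOTICON_RE = re.compile("[\U0001F600-\U0001F64F]")


def to_second_info(first_info):
    """Emoticon -> temporary number (fifth information) -> temporary variable."""
    number_to_icon = {}

    def assign_number(match):
        number = len(number_to_icon) + 1          # temporary number
        number_to_icon[number] = match.group(0)
        return f"<{number}>"

    fifth_info = EMOTICON_RE.sub(assign_number, first_info)

    var_to_number = {}

    def assign_variable(match):
        var = f"XVAR{len(var_to_number)}"         # variable named by position
        var_to_number[var] = int(match.group(1))  # associate variable and number
        return var

    second_info = re.sub(r"<(\d+)>", assign_variable, fifth_info)
    return second_info, var_to_number, number_to_icon


def to_fourth_info(third_info, var_to_number, number_to_icon):
    """Temporary variable -> temporary number (sixth information) -> emoticon."""
    def var_to_num(match):
        var = match.group(0)
        return f"<{var_to_number[var]}>" if var in var_to_number else var

    sixth_info = re.sub(r"XVAR\d+", var_to_num, third_info)
    return re.sub(
        r"<(\d+)>",
        lambda m: number_to_icon.get(int(m.group(1)), m.group(0)),
        sixth_info,
    )
```

Because each variable is matched as a whole token (`XVAR\d+`), an emoticon is restored wherever the translator moved its variable, so word reordering in the target language does not lose the emoticon position.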
Further, the first identifier is a word corresponding to the emoticon, and the word is in the source language;
First replacement module 402 comprises:
Third acquiring unit, for obtaining the attribute information of the emoticon according to the emoticon;
Fourth acquiring unit, for obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon;
Fifth replacement unit, for replacing the emoticon in the first information with each of the at least one word in turn, to obtain the second information corresponding to each word.
Further, the third acquiring unit comprises:
First obtaining subunit, for obtaining the index number of the emoticon from the correspondence between icon data and index numbers, according to the icon data of the emoticon;
Second obtaining subunit, for obtaining the attribute information of the emoticon from the correspondence between index numbers and attribute information for the source language, according to the index number of the emoticon.
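The two lookups performed by the third acquiring unit can be sketched as a pair of dictionary accesses. The tables below are hypothetical: the icon bytes, the index numbers, and the attribute fields (`word`, `category`) are placeholders chosen for illustration, since the actual attribute table is not reproduced in this text.

```python
from typing import Optional

# Hypothetical correspondence between icon data and index numbers.
ICON_DATA_TO_INDEX = {b"\x01\x02": "IDX001", b"\x01\x03": "IDX002"}

# Hypothetical per-language correspondence between index numbers and attribute information.
ATTRIBUTES_BY_LANGUAGE = {
    "en": {
        "IDX001": {"word": "happiness", "category": "emotion"},
        "IDX002": {"word": "happy", "category": "emotion"},
    },
}


def get_attribute_info(icon_data: bytes, source_language: str) -> Optional[dict]:
    """Icon data -> index number -> attribute information for the source language."""
    index = ICON_DATA_TO_INDEX.get(icon_data)  # first obtaining subunit
    if index is None:
        return None
    # second obtaining subunit
    return ATTRIBUTES_BY_LANGUAGE.get(source_language, {}).get(index)
```

Keeping the icon-to-index table separate from the per-language attribute tables means a new language only needs a new attribute table, while the icon data is stored once.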
Further, the fourth acquiring unit comprises:
Computation subunit, for calculating the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, the semantic dictionary being used to store the correspondence between attribute information and words;
Third obtaining subunit, for obtaining, from the semantic dictionary, at least one piece of attribute information whose similarity to the attribute information of the emoticon meets a preset condition;
Fourth obtaining subunit, for obtaining, from the semantic dictionary, the word corresponding to each of the at least one piece of attribute information.
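A minimal sketch of this similarity-based lookup follows. The text does not specify the similarity measure or the preset condition, so Jaccard similarity over sets of attribute tags and a fixed threshold are used here purely as assumptions; the semantic dictionary entries and function names are likewise hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two attribute sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical semantic dictionary: (attribute information, word) pairs.
SEMANTIC_DICTIONARY = [
    ({"smile", "positive", "joy"}, "happy"),
    ({"smile", "positive", "calm"}, "pleased"),
    ({"tears", "negative"}, "sad"),
]


def candidate_words(emoticon_attrs: set, threshold: float = 0.5) -> list:
    """Words whose attribute information is similar enough to the emoticon's."""
    return [
        word
        for attrs, word in SEMANTIC_DICTIONARY
        if jaccard(emoticon_attrs, attrs) >= threshold  # assumed preset condition
    ]


def candidate_second_infos(first_info: str, emoticon: str, words: list) -> list:
    """One candidate second information per word, as the fifth replacement unit produces."""
    return [first_info.replace(emoticon, word) for word in words]
```

Producing one candidate per word lets a later stage choose among the translated candidates, so an emoticon that is missing from the emoticon dictionary can still be matched through its attributes.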
Further, the fourth acquiring unit comprises:
Extraction subunit, for extracting the word corresponding to the emoticon from the attribute information corresponding to the emoticon;
Fifth obtaining subunit, for obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and using the synonyms and near-synonyms as words corresponding to the emoticon.
Further, first extraction module 404 comprises:
Second extraction unit, for extracting, from the third information, the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, second replacement module 405 comprises:
Fifth acquiring unit, for obtaining the correspondence that contains the second identifier from the correspondence between attribute information and index numbers for the target language;
Third extraction unit, for extracting the index number contained in the obtained correspondence;
Sixth acquiring unit, for obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
Sixth replacement unit, for replacing, in the third information, the second identifier with the icon data of the obtained emoticon, to obtain the fourth information.
Further, first extraction module 404 comprises:
Third extraction unit, for extracting, from the third information, the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, second replacement module 405 comprises:
Seventh acquiring unit, for obtaining the first identifier corresponding to the second identifier;
Eighth acquiring unit, for obtaining the correspondence that contains the first identifier from the correspondence between attribute information and index numbers for the source language;
Fourth extraction unit, for extracting the index number contained in the obtained correspondence;
Ninth acquiring unit, for obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
Seventh replacement unit, for replacing, in the third information, the second identifier with the icon data of the emoticon, to obtain the fourth information.
The first effect of the present invention is that it is not limited by the emoticon library or the translation dictionary: it can translate emoticons with high accuracy, and it reduces the cost of building translation dictionaries, translation rules, translation models, language models, and the like that contain emoticons.
The second effect of the present invention is that it effectively solves the recognition, translation, and generation of emoticons that are not registered in the emoticon dictionary.
The third effect of the present invention is that it effectively solves the determination and generation of the structural position of an emoticon on the target-language side. On the target-language side, the position of the emoticon is identified correctly, which preserves the sentence structure and semantic integrity of the translation result.
The fourth effect of the present invention is that it is not limited by language and can effectively solve the recognition, translation, and generation of emoticons in any language.
It should be noted that the device for translating information provided in the above embodiment is described only in terms of the above division into functional modules. In practical applications, the above functions can be assigned to different functional modules as required; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the device for translating information and the method for translating information provided in the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
It should be added that the method and device for translating information of the present invention are not proposed for two specific languages; the method of the invention is generally applicable. The present invention is equally applicable to other language pairs.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which can be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. A method for translating information, characterized in that the method comprises:
obtaining an emoticon contained in first information in a source language;
replacing the emoticon in the first information with a first identifier used to identify the emoticon, to obtain second information;
translating the second information into third information in a target language;
extracting, from the third information, a second identifier corresponding to the first identifier;
replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information.
2. The method of claim 1, characterized in that the first identifier is a temporary variable, and the format of the temporary variable is the same in every language;
the replacing the emoticon in the first information with a first identifier used to identify the emoticon, to obtain second information, comprises:
assigning a temporary number to the emoticon;
replacing the emoticon in the first information with the temporary number of the emoticon, to obtain fifth information;
assigning a temporary variable to the emoticon according to the position of the emoticon in the first information;
associating the temporary variable of the emoticon with the temporary number;
replacing, in the fifth information, the temporary number of the emoticon with the temporary variable associated with the temporary number of the emoticon, to obtain the second information.
3. The method of claim 1, characterized in that the second identifier is a temporary variable, and the extracting, from the third information, a second identifier corresponding to the first identifier comprises:
extracting, from the third information, the temporary variable contained in the third information;
correspondingly, the replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information, comprises:
obtaining the temporary number associated with the temporary variable;
replacing, in the third information, the temporary variable with the temporary number associated with the temporary variable, to obtain sixth information;
obtaining the emoticon corresponding to the temporary number;
replacing, in the sixth information, the temporary number with the emoticon corresponding to the temporary number, to obtain the fourth information.
4. The method of claim 1, characterized in that the first identifier is a word corresponding to the emoticon, and the word is in the source language;
the replacing the emoticon in the first information with a first identifier used to identify the emoticon, to obtain second information, comprises:
obtaining attribute information of the emoticon according to the emoticon;
obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon;
replacing the emoticon in the first information with each of the at least one word in turn, to obtain the second information corresponding to each word.
5. The method of claim 4, characterized in that the obtaining attribute information of the emoticon according to the emoticon comprises:
obtaining the index number of the emoticon from the correspondence between icon data and index numbers, according to the icon data of the emoticon;
obtaining the attribute information of the emoticon from the correspondence between index numbers and attribute information for the source language, according to the index number of the emoticon.
6. The method of claim 4, characterized in that the obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises:
calculating the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, the semantic dictionary being used to store the correspondence between attribute information and words;
obtaining, from the semantic dictionary, at least one piece of attribute information whose similarity to the attribute information of the emoticon meets a preset condition;
obtaining, from the semantic dictionary, the word corresponding to each of the at least one piece of attribute information.
7. The method of claim 4, characterized in that the obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises:
extracting the word corresponding to the emoticon from the attribute information corresponding to the emoticon;
obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and using the synonyms and near-synonyms as words corresponding to the emoticon.
8. The method of claim 1, characterized in that the extracting, from the third information, a second identifier corresponding to the first identifier comprises:
extracting, from the third information, the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
correspondingly, the replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information, comprises:
obtaining the correspondence that contains the second identifier from the correspondence between attribute information and index numbers for the target language;
extracting the index number contained in the obtained correspondence;
obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
replacing, in the third information, the second identifier with the icon data of the obtained emoticon, to obtain the fourth information.
9. The method of claim 1, characterized in that the extracting, from the third information, a second identifier corresponding to the first identifier comprises:
extracting, from the third information, the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
correspondingly, the replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information, comprises:
obtaining the first identifier corresponding to the second identifier;
obtaining the correspondence that contains the first identifier from the correspondence between attribute information and index numbers for the source language;
extracting the index number contained in the obtained correspondence;
obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
replacing, in the third information, the second identifier with the icon data of the emoticon, to obtain the fourth information.
10. A device for translating information, characterized in that the device comprises:
a first acquisition module, for obtaining an emoticon contained in first information in a source language;
a first replacement module, for replacing the emoticon in the first information with a first identifier used to identify the emoticon, to obtain second information;
a translation module, for translating the second information into third information in a target language;
a first extraction module, for extracting, from the third information, a second identifier corresponding to the first identifier;
a second replacement module, for replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information.
CN201510119654.0A 2015-03-18 2015-03-18 The method and apparatus of translation information Active CN104699675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510119654.0A CN104699675B (en) 2015-03-18 2015-03-18 The method and apparatus of translation information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510119654.0A CN104699675B (en) 2015-03-18 2015-03-18 The method and apparatus of translation information

Publications (2)

Publication Number Publication Date
CN104699675A true CN104699675A (en) 2015-06-10
CN104699675B CN104699675B (en) 2018-01-30

Family

ID=53346814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510119654.0A Active CN104699675B (en) 2015-03-18 2015-03-18 The method and apparatus of translation information

Country Status (1)

Country Link
CN (1) CN104699675B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN106708810A (en) * | 2016-12-19 | 2017-05-24 | 新译信息科技(深圳)有限公司 | Machine translation method, device and terminal device |
| CN108027812A (en) * | 2015-09-18 | 2018-05-11 | 迈克菲有限责任公司 | System and method for multipath language translation |
| CN110688840A (en) * | 2019-09-26 | 2020-01-14 | 联想(北京)有限公司 | Text conversion method and device |

Citations (4)

* Cited by examiner, † Cited by third party
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN1655231A (en) * | 2004-02-10 | 2005-08-17 | 乐金电子(中国)研究开发中心有限公司 | Expression figure explanation treatment method for text and voice transfer system |
| US6963839B1 (en) * | 2000-11-03 | 2005-11-08 | At&T Corp. | System and method of controlling sound in a multi-media communication application |
| CN101030368A (en) * | 2006-03-03 | 2007-09-05 | 国际商业机器公司 | Method and system for communicating across channels simultaneously with emotion preservation |
| CN101937431A (en) * | 2010-08-18 | 2011-01-05 | 华南理工大学 | Emotional voice translation device and processing method |

Also Published As

Publication number Publication date
CN104699675B (en) 2018-01-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant