Background
Existing telephone traffic prediction methods mainly combine clustering with the ARIMA model (Autoregressive Integrated Moving Average model). Their workflow is as follows: (1) based on prior knowledge, traffic cells are divided into four types: traffic arteries, busy commercial districts, institutions of higher learning, and residential neighborhoods; (2) preprocessing yields the clustering features of each traffic cell, namely the correlation coefficient, variance, maximum, median, mean, minimum, mode (the most frequent value), and standard deviation; (3) the K-means clustering algorithm is applied to these features to obtain refined traffic-cell types; (4) ARIMA models are used for traffic forecasting, with the same modeling parameters chosen for refined traffic cells of the same type.
When traffic is forecast in this way, dividing the traffic cells according to experts' historical experience is highly subjective, and the division is inaccurate. The approach also has the following problems:
1. The clustering results are hard to interpret in business terms. The method builds its clustering model from the correlation coefficient, variance, maximum, median, mean, minimum, mode, and standard deviation. The correlation coefficient reflects correlation within the data; the mean, median, and mode measure the central value of the data; and the variance, standard deviation, maximum, and minimum measure its dispersion. Cells with similar means may differ greatly in standard deviation, cells with similar standard deviations may differ greatly in mean, and the mean and standard deviation also carry different weights. Clustering works by computing distances and similarities between variables, and every variable enters that computation. Because the different variables reflect the dispersion, intensity, and correlation of the data respectively, building one clustering model that treats them all in the same way makes the clustering results hard to interpret.
2. The clustering results cannot be fixed into the traffic forecasting algorithm. Clustering has to be tuned repeatedly and its results must be explained in business terms; only when that explanation is satisfactory can the model be applied. Because a clustering model merely computes similarities, its results have no explicit boundaries, so they cannot be fixed into a prediction algorithm.
Summary of the invention
In view of the defects of the prior art, the object of the present invention is to provide a telephone traffic prediction method that improves the accuracy of cell traffic prediction.
To achieve the above object, the present invention adopts the following technical solution:
A telephone traffic prediction method, comprising the following steps:
(1) obtain the call data of each selected traffic cell and calculate the traffic measurement features of each cell, the features comprising the mean, variance, standard deviation, median, maximum, and minimum of the cell's traffic within a set time period;
(2) group the statistical features: the mean and median form the concentration-group features, and the variance, standard deviation, maximum, and minimum form the dispersion-group features;
(3) cluster all selected traffic cells separately on the mean (from the concentration-group features) and on the standard deviation (from the dispersion-group features), and divide the cells into types according to the clustering results;
(4) build a classification model on the clustering results and derive from it the boundary rules of the different cell types;
(5) determine the type of each cell to be identified according to the boundary rules, build an ARIMA (autoregressive integrated moving average) model for cells of the same type, and forecast traffic with that model.
Further, in the telephone traffic prediction method described above, step (3) clusters the mean and the standard deviation and divides the cells into types according to the clustering results as follows:
A clustering algorithm is applied separately to the means and to the standard deviations of all selected traffic cells. Clustering the means automatically yields two groups, A and B; clustering the standard deviations automatically yields two groups, C and D.
The average value of group A and of group B is calculated; the group with the higher average is defined as the high-mean group, and the group with the lower average as the low-mean group.
The average value of group C and of group D is calculated; the group with the higher average is defined as the high-dispersion group, and the group with the lower average as the low-dispersion group.
The high-mean, low-mean, high-dispersion, and low-dispersion groups are crossed, and according to the groups each cell belongs to, the cells are divided into four types: high mean-high dispersion, high mean-low dispersion, low mean-high dispersion, and low mean-low dispersion.
Further, in the method described above, a two-step clustering algorithm is used to cluster the means and the standard deviations of all selected traffic cells separately.
Further, in the method described above, the C5.0 decision tree algorithm is used to build the classification model on the clustering results.
Further, in step (4), when the C5.0 decision tree algorithm is used to build the classification model on the clustering results, the model inputs are the mean and standard deviation of each cell, and the output variable is the cell type.
Further still, the boundary rules are rules for identifying the cell type and comprise boundary parameters for the mean and the standard deviation.
Further, in step (5), when traffic is forecast with the ARIMA model, the same ARIMA model parameters are used for cells of the same type.
The beneficial effects of the present invention are:
1. Compared with the prior-art scheme, the technical solution effectively improves the quality of cell clustering and thereby the accuracy of traffic forecasting. The existing scheme clusters the central-value features and the dispersion features together, so after clustering, the mean and variance of the clustering indicators within a class are both large. With grouped clustering, the mean and variance of the clustering indicators within a class are clearly lower than in the original scheme, which effectively improves clustering quality.
2. Because a classification model identifies the clustering results, they can be fixed into the algorithm. In the existing scheme, neither the clustering process nor its results can be fixed into the traffic forecasting algorithm, whereas the present invention accurately identifies the boundaries of the refined cell types after clustering; these boundary rules can be fixed completely into the prediction algorithm and run automatically.
Embodiment
The present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 shows the flow chart of a telephone traffic prediction method in an embodiment of the present invention. The method mainly comprises the following steps:
Step S11: calculate the traffic measurement features of each traffic cell.
Obtain the call data of each selected traffic cell and calculate its traffic measurement features, namely the mean, variance, standard deviation, median, maximum, and minimum of the cell's traffic within a set time period.
In this embodiment, the traffic measurement features of a cell are calculated as follows:
1. Mean
Let $x_1, x_2, x_3, \ldots, x_n$ ($n \le 31$) be the daily traffic of a cell over one month. Their arithmetic mean, called the mean for short, is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
2. Variance
$$S^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}$$
3. Standard deviation
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}}$$
4. Median
Sort $x_1, x_2, x_3, \ldots, x_n$; the value at the middle position of the sorted sequence is the median.
5. Maximum and minimum
The maximum and minimum are the largest and smallest values among $x_1, x_2, x_3, \ldots, x_n$.
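The five formulas above translate directly into code. The following is a minimal Python sketch, assuming a cell's daily traffic is available as a plain list of numbers; the function name `traffic_features` and the dictionary keys are hypothetical. The variance divides by n, as in the formulas. For an even number of days, `statistics.median` averages the two middle values, a detail the text does not specify.

```python
import math
import statistics


def traffic_features(daily_traffic):
    """Compute the six traffic measurement features of one cell.

    daily_traffic: daily traffic values x_1..x_n for the set time period.
    Variance and standard deviation use the population form (divide by n).
    """
    n = len(daily_traffic)
    mean = sum(daily_traffic) / n
    variance = sum((x - mean) ** 2 for x in daily_traffic) / n
    return {
        "mean": mean,
        "variance": variance,
        "std": math.sqrt(variance),
        # Averages the two middle values when n is even.
        "median": statistics.median(daily_traffic),
        "max": max(daily_traffic),
        "min": min(daily_traffic),
    }


features = traffic_features([1.0, 2.0, 3.0, 4.0, 5.0])
```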
Step S12: group the statistical features.
The statistical features are grouped: the mean and median form the concentration-group features, and the variance, standard deviation, maximum, and minimum form the dispersion-group features.
Among the traffic measurement features calculated in step S11, the mean and median largely characterize the intensity of a cell's traffic, whereas the variance, standard deviation, maximum, and minimum largely characterize its dispersion. According to the influence of each statistical feature on cell traffic, this embodiment assigns the mean and median to the concentration-group features and the variance, standard deviation, maximum, and minimum to the dispersion-group features.
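The grouping itself is a fixed partition of the feature names. A small sketch, assuming the features of one cell are stored in a dictionary keyed by the hypothetical names used here:

```python
# Hypothetical feature names; the partition follows step S12.
CONCENTRATION_GROUP = ("mean", "median")
DISPERSION_GROUP = ("variance", "std", "max", "min")


def group_features(features):
    """Split one cell's feature dict into concentration and dispersion groups."""
    concentration = {k: features[k] for k in CONCENTRATION_GROUP}
    dispersion = {k: features[k] for k in DISPERSION_GROUP}
    return concentration, dispersion


cell = {"mean": 3.0, "median": 3.0, "variance": 2.0,
        "std": 1.41, "max": 5.0, "min": 1.0}
conc, disp = group_features(cell)
```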
Step S13: cluster the statistical features by group and divide the cells into types according to the clustering results.
Clustering works by computing distances and similarities between variables. In traffic forecasting analysis, the different traffic measurement variables reflect the dispersion, intensity, and correlation of the data respectively. After clustering on all of a cell's traffic measurement features together, cells with similar means may differ greatly in standard deviation, and cells with similar standard deviations may differ greatly in mean. Putting all traffic measurement features into a single clustering therefore gives poor results.
To overcome this problem, the present invention divides a cell's traffic measurement features into concentration-group and dispersion-group features as in step S12, and clusters the two groups separately, obtaining one clustering result for traffic intensity and one for traffic dispersion.
After the indicators (traffic measurement features) are grouped, choosing the clustering indicators is an exploratory process. Single-indicator and multi-indicator clustering were carried out on the mean, variance, standard deviation, median, maximum, and minimum; by inspecting the clustering results and their interpretability, it was found that clustering separately on the mean from the concentration group and on the standard deviation from the dispersion group gives a good clustering effect. This embodiment therefore clusters on the mean and on the standard deviation separately, and uses the other statistical features to describe the clusters, that is, to judge the quality of the clusters obtained from the mean and standard deviation. For example, for the median, the mean and standard deviation of the medians of the cells in each cluster can be calculated to describe the cluster and evaluate the quality of the clustering result.
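Describing a clustering with an auxiliary feature can be sketched as follows: group the cells by cluster label and report the mean and standard deviation of that feature (for example, the median) within each cluster. This is an illustrative helper with hypothetical names, not the Modeler output used in the embodiment; a well-separated clustering should show clearly different per-cluster means with small spreads.

```python
import statistics


def describe_clusters(labels, values):
    """Summarize an auxiliary feature (e.g. the median) per cluster.

    labels: cluster label of each cell; values: the feature value of each
    cell in the same order. Returns {label: (mean, population std)}.
    """
    by_cluster = {}
    for label, value in zip(labels, values):
        by_cluster.setdefault(label, []).append(value)
    return {label: (statistics.mean(vs), statistics.pstdev(vs))
            for label, vs in by_cluster.items()}


summary = describe_clusters(["A", "A", "B", "B"], [0.4, 0.6, 1.8, 2.2])
```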
In this embodiment, a two-step clustering algorithm is used to cluster the means and the standard deviations of all selected traffic cells separately, and according to the clustering results the cells are divided into four types: high mean-high dispersion, high mean-low dispersion, low mean-high dispersion, and low mean-low dispersion. The mean and standard deviation are clustered and the cell types are derived as follows:
A clustering algorithm is applied separately to the means and to the standard deviations of all selected traffic cells. Clustering the means automatically yields two groups, A and B; clustering the standard deviations automatically yields two groups, C and D.
The average value of group A and of group B is calculated; the group with the higher average is defined as the high-mean group, and the group with the lower average as the low-mean group.
The average value of group C and of group D is calculated; the group with the higher average is defined as the high-dispersion group, and the group with the lower average as the low-dispersion group.
The high-mean, low-mean, high-dispersion, and low-dispersion groups are crossed, and according to the groups each cell belongs to, the selected cells are divided into four types: high mean-high dispersion, high mean-low dispersion, low mean-high dispersion, and low mean-low dispersion.
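The grouping and crossing steps above can be sketched in Python. The text uses two-step clustering in Modeler, which is not reproduced here; as a swapped-in stand-in, this sketch uses an exact one-dimensional two-cluster split that minimizes the within-group sum of squares (the k-means objective with k = 2, whose optimum for one variable is a cut point in sorted order). Labelling the group with the higher average as "high" follows the rule in the text. Function names are hypothetical.

```python
def split_two_groups(values):
    """Split 1-D values into two groups minimizing within-group sum of squares.

    Stand-in for two-step clustering: every cut point in sorted order is
    tried. Needs at least two values. Returns booleans that are True for
    members of the group with the higher average (the "high" group).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    sorted_vals = [values[i] for i in order]

    def sse(group):
        m = sum(group) / len(group)
        return sum((v - m) ** 2 for v in group)

    best_cut = min(range(1, len(sorted_vals)),
                   key=lambda c: sse(sorted_vals[:c]) + sse(sorted_vals[c:]))
    # Values at or after the cut form the group with the higher average.
    is_high = [False] * len(values)
    for rank, i in enumerate(order):
        is_high[i] = rank >= best_cut
    return is_high


def cell_types(means, stds):
    """Cross the mean grouping with the std grouping into four cell types."""
    high_mean = split_two_groups(means)
    high_disp = split_two_groups(stds)
    return [("high mean" if hm else "low mean") + "-"
            + ("high dispersion" if hd else "low dispersion")
            for hm, hd in zip(high_mean, high_disp)]


types = cell_types(means=[0.4, 0.5, 1.8, 1.9], stds=[0.2, 1.5, 0.3, 1.6])
```

Because the two variables are split independently, each cell's type is just the pair of its two group memberships, which is what makes the later boundary rules a pair of thresholds.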
In this embodiment, traffic data for 827 cells over two weeks in April was clustered. After the two-step clustering algorithm was applied to the means, the result was automatically divided into two groups, A and B. The average of all data in each group was then calculated: 0.48 for group A and 1.83 for group B, so group B is the high-mean group and group A is the low-mean group. The high-dispersion and low-dispersion groups are obtained from the standard deviations in the same way. Crossing the two mean groups with the two dispersion groups gives four groups, which divide the cells into four types. For example, if a cell's mean belongs to the high-mean group and its standard deviation to the low-dispersion group, the cell is of the high mean-low dispersion type.
The modeling tool used in this embodiment is Modeler. The two-step clustering algorithm is an existing method; using it to cluster the mean and the standard deviation automatically divides each result into a high group and a low group. The mean silhouette value of the clusters obtained from the mean and standard deviation is 0.7, indicating good cluster quality.
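The mean silhouette value quoted above is the average, over all cells, of the silhouette coefficient s(i) = (b - a) / max(a, b). The sketch below computes it for one-dimensional data (for example, the cell means) with absolute difference as the distance; that distance choice is an assumption, since the text does not state the measure Modeler uses. It needs at least two clusters, and the function name is hypothetical.

```python
def mean_silhouette(values, labels):
    """Mean silhouette coefficient for 1-D data with absolute distance.

    a: mean distance from a point to the other points of its own cluster.
    b: smallest mean distance from the point to the points of another cluster.
    Points in singleton clusters score 0.
    """
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The point itself contributes a distance of 0 to the sum.
        a = sum(abs(v - w) for w in own) / (len(own) - 1)
        b = min(sum(abs(v - w) for w in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


score = mean_silhouette([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
```

Values near 1 mean cells sit far closer to their own cluster than to the other one; the 0.7 reported for the 827 cells falls in the range usually read as a reasonably clear structure.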
Step S14: build a classification model on the clustering results and derive the boundary rules of the different cell types from it.
The boundary rules are rules for identifying the cell type and comprise boundary parameters for the mean and the standard deviation. In this embodiment, the C5.0 decision tree algorithm is used to build the classification model on the clustering results and identify the boundary rules of the different cell types.
Once step S13 has produced the four cell types (high mean-high dispersion, high mean-low dispersion, low mean-high dispersion, and low mean-low dispersion), the selected traffic cells are divided into four groups by type, and these four groups are used as samples to build a classification model that yields the classification rules of each group. The classification algorithm is C5.0: the inputs are the mean and standard deviation of every cell in the four groups, and the output variable is the class (one of the four types). Building a decision tree model with C5.0 yields the boundary rules of each cell type. The modeling tool used in this embodiment is Modeler, and the C5.0 decision tree algorithm is prior art.
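The rules produced later in this step are thresholds of the form "mean <= t", which is what a single split of a decision tree on one numeric variable yields. The sketch below finds one such split by maximizing information gain over the observed values. It is a simplified stand-in, not C5.0: C5.0 uses the gain ratio, grows and prunes a full tree, and was run in Modeler in this embodiment. Function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find t maximizing the information gain of the split value <= t vs > t.

    Candidate thresholds are the observed values except the largest, so the
    learned rule reads like "mean <= t". One split of an entropy-based tree.
    """
    n = len(labels)
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


sample_labels = ["low", "low", "low", "high", "high"]
t, gain = best_threshold([0.4, 0.5, 0.9, 1.8, 1.9], sample_labels)
```

Applied first to the mean and then, inside each mean branch, to the standard deviation, such splits produce rules with the same shape as those listed below, including a separate std threshold per mean branch.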
In this embodiment, after the classification model was built with the Modeler tool and the C5.0 decision tree algorithm on the clustering results for the 827 cells selected in step S13, the boundary rules obtained for the different cell types are as follows:
Here, mean denotes the mean and std the standard deviation; cluster-1 denotes low mean and cluster-2 denotes low dispersion. The boundary rules are:
Low mean-low dispersion: mean <= 1.034, std <= 1.002
Low mean-high dispersion: mean <= 1.034, std > 1.002
High mean-low dispersion: mean > 1.034, std <= 1.003
High mean-high dispersion: mean > 1.034, std > 1.003
For example, if the traffic mean of a cell to be identified is at most 1.034 and its standard deviation is at most 1.002, the cell is of the low mean-low dispersion type.
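Because the boundary rules are plain thresholds, fixing them into the forecasting algorithm reduces to a few comparisons. A sketch that hard-codes the thresholds obtained for the 827 cells in this embodiment; the function and constant names are hypothetical, and the thresholds would change whenever the classification model is rebuilt.

```python
# Thresholds from the boundary rules obtained in this embodiment.
MEAN_CUT = 1.034
STD_CUT_LOW_MEAN = 1.002   # std threshold within the low-mean branch
STD_CUT_HIGH_MEAN = 1.003  # std threshold within the high-mean branch


def classify_cell(mean, std):
    """Return the cell type for a cell's traffic mean and standard deviation."""
    if mean <= MEAN_CUT:
        level, std_cut = "low mean", STD_CUT_LOW_MEAN
    else:
        level, std_cut = "high mean", STD_CUT_HIGH_MEAN
    spread = "low dispersion" if std <= std_cut else "high dispersion"
    return f"{level}-{spread}"
```

For instance, a cell with mean 1.0 and standard deviation 1.0 is classified as low mean-low dispersion, matching the worked example in the text.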
Step S15: identify the type of each cell to be identified according to the boundary rules, build an ARIMA model for cells of the same type, and forecast traffic with that model.
The boundary rules obtained in step S14 are used to identify the type of each cell to be identified. Once its type is known, an ARIMA model (autoregressive integrated moving average model) is built for cells of that type and used to predict cell traffic.
ARIMA stands for autoregressive integrated moving average model. It is a well-known time-series forecasting method proposed by Box and Jenkins in the early 1970s, also called the Box-Jenkins method. In ARIMA(p, d, q), AR denotes autoregression and p is the number of autoregressive terms; MA denotes moving average and q is the number of moving-average terms; and d is the number of differences needed to make the time series stationary.
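The order d refers to repeated first differencing. The sketch below shows differencing and the inverse step that turns forecasts of the differenced series back into traffic levels; the function names are hypothetical.

```python
def difference(series, d):
    """Apply d rounds of first differencing: y_t = x_t - x_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def undifference(forecast, history, d):
    """Turn forecasts of the d-times differenced series back into levels.

    history: the original, undifferenced observations preceding the forecast.
    """
    if d == 0:
        return list(forecast)
    # The last value of the (d-1)-times differenced history anchors the sum.
    level = difference(history, d - 1)[-1]
    restored = []
    for step in forecast:
        level += step
        restored.append(level)
    return undifference(restored, history, d - 1)
```

For the squares 1, 4, 9, 16, 25, two differences give the constant series 2, 2, 2; forecasting the next two second differences as 2 and undoing both differences recovers 36 and 49.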
The basic idea of the ARIMA model is to treat the data sequence formed by the forecast object over time as a random sequence and to describe that sequence approximately with a mathematical model. Once the model has been identified, future values can be predicted from the past and present values of the time series.
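To make this idea concrete, here is a deliberately minimal sketch of ARIMA(1, 1, 0), that is p = 1, d = 1, q = 0, with the AR coefficient fitted by ordinary least squares and no intercept. It illustrates the principle only: the embodiment builds its models in Modeler, and practical use would rely on a full ARIMA implementation with moving-average terms, maximum-likelihood estimation, and residual diagnostics. Function names are hypothetical.

```python
def fit_ar1(y):
    """Least-squares AR(1) coefficient without intercept: y_t ~ phi * y_{t-1}."""
    num = sum(a * b for a, b in zip(y[:-1], y[1:]))
    den = sum(a * a for a in y[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, steps):
    """Forecast with ARIMA(1, 1, 0): an AR(1) model on first differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    out = []
    for _ in range(steps):
        last_diff = phi * last_diff  # predicted next difference
        level += last_diff           # integrate back to a traffic level
        out.append(level)
    return out


print(forecast_arima_110([10, 12, 13, 13.5, 13.75], steps=2))
```

Here the differences 2, 1, 0.5, 0.25 halve at each step, so the fitted coefficient is 0.5 and the forecast continues the halving: 13.875, then 13.9375.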
When traffic is forecast with the ARIMA model, cells of the same type use the same ARIMA model parameters (p, d, q), and the Akaike information criterion (AIC) is used to evaluate the fit of the model.
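Sharing one (p, d, q) across a cell type requires some way of choosing it. A hedged sketch: with q fixed to 0 for simplicity, fit AR(p) by least squares to the d-times differenced traffic of every cell of one type, score each candidate p with the Gaussian form of AIC, n·ln(RSS/n) + 2k (up to an additive constant), and keep the p with the lowest AIC summed over the cells. Summing AIC across cells is an assumption made for illustration; the text only states that AIC is used to evaluate the fit. Function names are hypothetical.

```python
import math

import numpy as np


def ar_aic(y, p, start):
    """Gaussian AIC, n*ln(RSS/n) + 2p, of a least-squares AR(p) fit on y[start:].

    No intercept is fitted. All candidate orders use the same start index,
    so their AIC values are computed on the same observations.
    """
    y = np.asarray(y, dtype=float)
    target = y[start:]
    n = len(target)
    if p == 0:
        rss = float(np.sum(target ** 2))
    else:
        # Column j holds the lag-(j + 1) value for every target observation.
        X = np.column_stack([y[start - j - 1:len(y) - j - 1] for j in range(p)])
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ coef) ** 2))
    return n * math.log(rss / n) + 2 * p


def select_shared_order(series_by_cell, d=1, max_p=3):
    """Pick one AR order p for all cells of one type by minimum summed AIC.

    The model is ARIMA(p, d, 0): AR(p) on the d-times differenced traffic.
    """
    diffs = [np.diff(np.asarray(s, dtype=float), n=d)
             for s in series_by_cell.values()]
    totals = {p: sum(ar_aic(y, p, max_p) for y in diffs)
              for p in range(max_p + 1)}
    return min(totals, key=totals.get)
```

The selected p (together with the chosen d) would then be applied to every cell of that type, which is what allows a newly classified cell to be forecast without a fresh model search.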
Figs. 2 to 5 show the results of forecasting traffic with the method of the present invention for cells of the four types: low mean-low dispersion, low mean-high dispersion, high mean-low dispersion, and high mean-high dispersion. In each figure, curve A is the actual traffic of the cell and curve B is the fitted traffic predicted by the method of the present invention. As the figures show, the traffic predicted by the method essentially coincides with the actual traffic, and the forecasting results are good.
The method of the present invention uses grouped clustering: the variables (traffic measurement features) are first grouped according to whether they characterize concentration (concentration-group features) or dispersion (dispersion-group features), and are then clustered. After grouping, the clusters are cohesive and well separated, with clear profiles, which effectively improves the quality of cell division and hence the accuracy of traffic forecasting. In addition, identifying the clustering results with a classification model accurately determines the boundaries (boundary rules) of the different cell types after clustering. These boundary rules can be fixed completely into the prediction algorithm and run automatically, so judging a cell's type no longer requires re-clustering; the type is determined directly from the identification rules. Because cell characteristics change over time, the prior art must cluster and identify the cells again every time before forecasting traffic. With this technique, once the model has been built, traffic forecasting no longer requires rebuilding the clustering and classification models; they need to be adjusted only when the model is to be optimized.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.