WO2007131445A1

WO2007131445A1 - A method, a system and a apparatus for censoring video code stream

Info

Publication number: WO2007131445A1
Application number: PCT/CN2007/001548
Authority: WO
Inventors: Zhong Luo
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2006-05-16
Filing date: 2007-05-14
Publication date: 2007-11-22
Also published as: CN1968250A; CN1968250B

Abstract

A method, a system and a related apparatus are provided to solve the problem of censoring a video code stream based on layered video coding. The correlation between the base layer and the enhancement layers in layered coding is used to generate a censoring code stream. The censoring code stream includes the base layer code stream and may further include part of the enhancement layer code stream. A specifically designed censoring apparatus censors and forwards the censoring code stream. When the code stream includes inappropriate content, the censoring apparatus cuts off the transfer of the censoring code stream so that the user receiving the stream can no longer decode and reconstruct the original video stream. In this way, inappropriate content is prevented from being broadcast. The disclosed method, system and apparatus can be applied in live mode or storage mode in streaming media services.

Description

Video code stream review method, system and device

The present invention relates to multimedia communication technologies, and in particular to a method, system and related device for video code stream review based on layered coding in a multimedia communication process.

Background art

As a basic form of multimedia communication, streaming media has spawned many multimedia communication services: conference television/video telephony, IPTV (Internet Protocol Television), VOD (Video on Demand), instant messaging, and so on. Streaming media will therefore become the basic form of communication on the NGN (Next Generation Network). With the rapid rise of IPTV services at home and abroad in recent years, the application of streaming media on the network is also developing rapidly.

One type of streaming media service, such as IPTV and VOD, is designed to provide video and audio content. The content is very broad, including film and television programs, news, sports competitions, concerts, and so on. Many countries, China in particular, have always attached great importance to the safety and monitoring of content, and all have relevant laws. Countries also have regulations aimed at protecting minors. Operators/ISPs (Internet Service Providers) and content providers have the same need. IPTV is about to be launched on a large scale in China, so the first question is how to ensure effective content monitoring and filtering so that harmful content is filtered out. If this problem is not solved, IPTV cannot be operated in China, and the relevant state departments will not be able to issue licenses. The solution to this problem is therefore of great significance for promoting the development of the IPTV industry. Content security is usually understood to include two aspects:

1. Content protection: preventing content from being received by users without permission, for example preventing theft of TV programs. There are many mature technologies for this, such as encryption and scrambling, authentication and authorization, and Digital Rights Management (DRM);

2. Prevention of the intrusion of harmful and illegal content: here the object of protection is the target of the content attack, usually the audience.

This requires real-time review of the content. Drawing on current practice in the broadcast and television industry, inspection nodes are set up on the path of the TV program stream (generally transmitted according to DVB-T, Digital Video Broadcasting-Terrestrial, with video and audio compressed in MPEG-2 format; MPEG = Moving Picture Experts Group, an international standards organization) from the program source (such as a satellite) to the user's TV/STB, for manual review. With the advancement of technology, some content review can be done automatically by the system or semi-automatically (human-machine combination). In manual review, once a problem is found in the program content, measures must be taken to stop the transmission of the program stream and, in most cases, to replace the harmful program with a temporary substitute such as public service advertisements or subtitle announcements. Manual judgment requires the context of the content, and reacting and taking action takes a certain amount of time, so a delay device must provide this delay, for example 5 seconds.

Content filtering is the processing and judgment of certain attributes of content. These attributes can include the name of the content provider, the URL of the content (Uniform Resource Locator; the URL is an important type of URI), the IP address of the content server, and, where the content is encapsulated in packets, the packet header information, the information inside the packet, and so on. This processing and filtering is thus carried out level by level, from shallow to deep.

Prior art 1 mainly performs content filtering according to external, or shallow, features of the content. The most typical example is URL filtering, whose principle is shown in Figure 1. The content filtering device is located between the core network and the edge access network, at a gateway that the media stream from the content source must pass through to reach the receiving end. In practice, for an enterprise network it can be placed at the same network location as the proxy or the NAT (Network Address Translator)/FW (Firewall); for broadband home users it can be placed at the same location as the BAS (Broadband Administration System) or DSLAM, or at the POP (Point of Presence). The filtering device has an internal database with information about many content source URLs. According to this database, it determines whether a content source is harmful, blocking harmful content sources and letting harmless ones through. There are also many content rating service providers offering third-party services, whose databases are richer and more professional. Content filtering devices can connect to such third-party providers and use their services for URL filtering.

Prior art 1 has the following problems:

1. False positives: URL filtering may filter out harmless content. For example, some websites provide video on demand in which some programs are harmful and others are healthy movies, and the two cannot be distinguished by URL alone;

2. False negatives: a URL rated as a qualified website in the rating system may still have problems (the site may have been hacked and impersonated, or may itself attempt something illegal, etc.);

3. URL filtering usually also requires a third-party rating system. Such systems are available, and some paid rating service providers specialize in providing rating services, but their results are neither completely accurate nor exhaustive of all the content on the network. Moreover, content on the web changes constantly, and no rating system can keep up with these changes in a timely manner.

For very demanding application scenarios, such as IPTV for the national public, the harm caused by the successful invasion of harmful content, especially politically sensitive content, is enormous. It must be foolproof, so shallow filtering is unreliable. The deepest level of content filtering must be used, that is, the filtering of video and audio data itself, such as the recognition of images, the identification of harmful scenes (violent, pornographic, etc.), harmful text information (subtitles), faces of specific people, and so on.

To achieve high filtering accuracy, filtering must reach the deepest level, the content data itself. Deep packet filtering (DPF) is a research focus in this area.

Prior art 2, deep DPF, is based on manual deep content review. In this case, the content filtering device decodes the media stream and plays the content for review by human monitors (assuming that encryption is not an obstacle, since it can be resolved through lawful interception requirements on communication equipment). If a problem is found, the monitor immediately takes action to cut off the harmful content and switch to harmless content such as a public service advertisement. A delay device of considerable capacity must follow the content filtering device to hold back the content and give the monitor time to judge and react (for example, 5 seconds). This process can also be implemented by automatic or semi-automatic (human-machine combined) methods. The implementation principle is shown in Figure 2.

The basic idea of prior art 2 is correct; it has been applied in practice for many years in broadcast television with good results. However, content review for streaming media content services on IP (Internet Protocol) networks requires considerable improvement. The main problems are:

1. IP networks are much more complicated than broadcast TV networks in structure and topology, and carry many more programs. If every program under review is transmitted to the network center, too many communication resources are occupied.

2. An IP network has many content sources and many programs. If decoding is centralized during review, the decoding workload is too large and the capacity required of the review device is too high for existing equipment to satisfy.

Layered video coding is a method of compressing a video data stream in layers. The main idea is to output multiple coding layers. The most important is the base layer; above the base layer there are one or more enhancement layers, and the base layer and the enhancement layers can be sent separately (possibly over different network paths). At the receiving end, the base layer can be decoded independently to reconstruct the base layer video, but an enhancement layer must rely on the base layer and/or the enhancement layers below it to decode the corresponding video. The receiving end superimposes the reconstructed base layer and enhancement layer videos according to rules specified by the particular layered coding method, thereby obtaining the total video stream. Layered coding has many advantages. The most significant is that it makes streaming services more adaptable to varying network conditions (static differences and dynamic changes). For example, the program source can output a base layer with a bit rate of 384 kbps and an enhancement layer with a bit rate of 768 kbps; if the user's access bandwidth is greater than 1024 kbps, the base layer plus the enhancement layer can be received, and if it is less than 1024 kbps, only the base layer can be received. This is the case of static differences in a heterogeneous network. The more common situation is dynamic change: for example, network congestion reduces the bandwidth, so a user who could originally receive the base layer and the enhancement layer can receive only the base layer afterwards. The user can still watch, however. Without layered coding there is only one layer, and when the bandwidth drops the user may be unable to watch at all.
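The adaptation described above can be illustrated with a minimal sketch. It assumes that a layer is admitted only when the cumulative bit rate of that layer and all layers below it fits within the available bandwidth; the function name `receivable_layers` and the bandwidth values used in the calls are illustrative and do not come from the original text.

```python
from itertools import accumulate


def receivable_layers(layer_rates_kbps, bandwidth_kbps):
    """Return how many layers (base layer first) fit in the bandwidth.

    A layer is only useful if every layer below it is also received,
    so layers are admitted in order until the cumulative bit rate
    would exceed the available bandwidth (an assumed admission rule).
    """
    count = 0
    for total in accumulate(layer_rates_kbps):
        if total > bandwidth_kbps:
            break
        count += 1
    return count


rates = [384, 768]  # base layer and one enhancement layer, in kbps
for bandwidth in (2000, 512, 256):
    print(bandwidth, "kbps ->", receivable_layers(rates, bandwidth), "layer(s)")
```

When congestion lowers the bandwidth, the same call simply returns a smaller layer count, which corresponds to the dynamic case in the text.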

Figure 3 is a schematic diagram of the principle of layered video coding, summarizing the major layered coding techniques currently in use. In general, there are three different layered coding mechanisms:

1. Temporal layered coding

Some coded frames are discarded to reduce the number of coded frames per unit time (the frame rate) and thereby the bit rate. If the discarded frames are bi-predictive frames (B-frames), they can form an enhancement layer, because the encoding and decoding of B-frames depends on other frames, such as intra-coded frames (I-frames) and predictive-coded frames (P-frames), while other frames do not depend on B-frames. This exactly matches the dependency between the base layer and the enhancement layer. Temporal layered coding is therefore formed as follows: the base layer contains only I-frames and P-frames; B-frames are then generated by bidirectional prediction between the I-frames and P-frames to form the enhancement layer, and the number of B-frames can be increased as needed.
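As a small illustration of the temporal split, the sketch below separates a sequence of frame types into a base layer (I- and P-frames) and an enhancement layer (B-frames). The string representation of a group of pictures and the function names are assumptions made for the example; real encoders operate on coded frames, not on letters.

```python
def split_temporal_layers(frame_types):
    """Split a frame sequence into base (I/P) and enhancement (B) layers.

    frame_types is a string such as "IBBPBBP". Frame indices are
    returned so that a receiver can merge the layers back into
    display order.
    """
    base = [i for i, t in enumerate(frame_types) if t in ("I", "P")]
    enhancement = [i for i, t in enumerate(frame_types) if t == "B"]
    return base, enhancement


def merge_layers(base, enhancement):
    """Recombine frame indices from both layers into display order."""
    return sorted(base + enhancement)


gop = "IBBPBBP"
base, enhancement = split_temporal_layers(gop)
print("base layer frames:", base)
print("enhancement layer frames:", enhancement)
print("merged:", merge_layers(base, enhancement))
```

Dropping the enhancement list leaves a decodable sequence at a lower frame rate, which is the property the review scheme relies on.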

2. SNR layered coding

The coding quality of the base layer can be relatively low, resulting in a relatively low signal-to-noise ratio (SNR = Signal-to-Noise Ratio). In general, coarse quantization by increasing the quantization parameter (QP = Quantization Parameter) can lower the SNR of the base layer and obtain a lower bit rate. The reconstructed residual of the base layer is sent to the enhancement layer for encoding. In the enhancement layer, if intra coding is performed, an EI frame is obtained, and if inter prediction coding is performed, an EP frame is obtained.
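The idea of a coarse base layer refined by a coded residual can be sketched with plain uniform quantization. This is a simplification under stated assumptions: it quantizes raw sample values rather than transform coefficients, it omits prediction and entropy coding, and the names `quantize`, `snr_encode` and `snr_decode` are hypothetical.

```python
def quantize(values, step):
    """Uniform quantization: map each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]


def snr_encode(samples, base_step, enh_step):
    """Produce a coarse base layer and an enhancement layer of residuals."""
    base = quantize(samples, base_step)            # large step: low SNR, low rate
    residual = [s - b for s, b in zip(samples, base)]
    enhancement = quantize(residual, enh_step)     # finer step refines the base
    return base, enhancement


def snr_decode(base, enhancement=None):
    """Reconstruct from the base layer alone, or from base plus enhancement."""
    if enhancement is None:
        return list(base)
    return [b + e for b, e in zip(base, enhancement)]


samples = [3, 17, 42, 58, 90]
base, enhancement = snr_encode(samples, base_step=16, enh_step=1)
print("base only:       ", snr_decode(base))
print("base + enhanced: ", snr_decode(base, enhancement))
```

A larger `base_step` plays the role of a larger quantization parameter: the base layer alone is rougher, and the enhancement layer carries what was lost.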

3. Spatial layered coding

Spatial layered coding is similar to SNR layered coding, except that the base layer reconstruction is spatially enlarged before it is used as input data for enhancement layer coding. In video coding terminology this is called upsampling. In general, the image is magnified k_h and k_v times in the horizontal and vertical directions. Usually k_h and k_v are equal, since otherwise the image is distorted and its proportions are out of balance, but in special applications they can be unequal. A typical application magnifies the base layer image by a factor of two in both directions: if the base layer is a QCIF image, the enhancement layer is a CIF image.

Corresponding to Fig. 3, in order to unify the above three mechanisms, a decomposition transform T at the encoding end and a synthesis transform R at the decoding end are introduced. As long as appropriate transforms T and R are defined, the different layered coding methods all fit this basic framework. For example, for spatial layered coding, the decomposition transform T takes the difference to obtain the reconstruction residual and performs upsampling, while the synthesis transform R first upsamples the lower-layer reconstructed frame and then superimposes it with the reconstructed frame of the layer above.
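One possible arrangement of T and R for spatial layering is sketched below on small grayscale images stored as nested lists. It is an illustration under assumptions, not the transform pair of any particular standard: downsampling keeps every k-th pixel without filtering, upsampling is nearest-neighbour, nothing is quantized (so reconstruction is exact), and the function names are hypothetical. Image dimensions are assumed to be multiples of k.

```python
def downsample(img, k=2):
    """Keep every k-th pixel in both directions (no anti-alias filtering)."""
    return [row[::k] for row in img[::k]]


def upsample(img, k=2):
    """Nearest-neighbour enlargement by k horizontally and vertically."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(k)]
        out.extend(list(wide) for _ in range(k))
    return out


def decompose(img, k=2):
    """T: base layer plus the residual an enhancement layer would code."""
    base = downsample(img, k)
    predicted = upsample(base, k)
    residual = [[a - b for a, b in zip(r1, r2)]
                for r1, r2 in zip(img, predicted)]
    return base, residual


def synthesize(base, residual, k=2):
    """R: upsample the base reconstruction and superimpose the residual."""
    predicted = upsample(base, k)
    return [[a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(predicted, residual)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
base, residual = decompose(image)
print("base layer:", base)
print("reconstruction matches:", synthesize(base, residual) == image)
```

With k = 2 the base layer has a quarter of the pixels of the original, which mirrors the QCIF/CIF relationship mentioned in the text.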

The above three basic mechanisms are widely used in major video compression coding standards such as ITU-T H.263/H.263+, H.264, MPEG-2, and MPEG-4. On top of these three basic mechanisms there are other variant techniques, such as FGS (Fine Granularity Scalability), in which the enhancement layer is not coded in the conventional way; instead, after the DCT (Discrete Cosine Transform) of the motion prediction residual, encoding is performed bit plane by bit plane, yielding a so-called embedded code stream that can still be decoded normally when truncated at any position. This gives the enhancement layer a very fine bandwidth granularity and better adaptability to network conditions. However, because fine-grained layered coding is computationally complex, its practicability is low.

Summary of the invention

The present invention provides a video code stream review method, system and related device to solve the problem of how to implement video code stream deep content review based on layered coding technology in the prior art.

In order to solve the above problems, the present invention provides the following technical solutions:

A video code stream review method, in which the content source layer-encodes the original video code stream into a base layer code stream and at least one enhancement layer code stream. When transmitting the video code stream, the method includes the following steps:

A. Forwarding the base layer code stream to the review device, and delaying the enhancement layer code stream before transmitting it to the user receiving end;

B. The review device checks whether the base layer code stream contains harmful content; if not, it forwards the base layer code stream to the user receiving end; otherwise, it cuts off forwarding of the base layer code stream to the user receiving end.
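Steps A and B can be sketched as a small simulation. The sketch is a simplification under assumptions: the stream is modelled as a list of units, the check is applied per unit rather than cutting the stream off permanently, the delay is not modelled, and `Receiver`, `transmit` and the `is_harmful` callback are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """User receiving end: records which units arrived on each layer."""
    base: list = field(default_factory=list)
    enhancement: list = field(default_factory=list)

    def can_decode(self, index):
        # Enhancement data is useless without the matching base layer unit.
        return index in self.base


def transmit(units, is_harmful, receiver):
    """units: iterable of (index, base_payload, enhancement_payload)."""
    for index, base_payload, _enh_payload in units:
        # Step A: the enhancement layer bypasses the review device.
        receiver.enhancement.append(index)
        # Step B: the base layer is forwarded only if the review passes.
        if not is_harmful(base_payload):
            receiver.base.append(index)


receiver = Receiver()
stream = [(0, "news", "e0"), (1, "blocked-scene", "e1"), (2, "sport", "e2")]
transmit(stream, lambda payload: payload == "blocked-scene", receiver)
print([receiver.can_decode(i) for i in range(3)])
```

The unit whose base layer was withheld cannot be decoded even though its enhancement data arrived, which is the dependency the method exploits.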

In the live broadcast mode, in step A, the content source directly sends or forwards the corresponding layered code streams; in step B, according to the required video quality, the review device notifies the content source to raise or lower the coding quality of the base layer code stream, or, while the base layer code stream is being forwarded, to add or remove enhancement layer code streams used to enhance the review effect and accuracy.

In the stored content play mode, in step A, the content source first stores the generated base layer code stream and enhancement layer code streams in the corresponding base layer code stream track and enhancement layer code stream tracks; the streaming media server reads the base layer code stream from the base layer code stream track and forwards it to the review device, and reads the enhancement layer code streams, delays them, and sends them to the user receiving end; in step B, according to the required video quality, the review device notifies the streaming media server to add or remove, while the base layer code stream is being forwarded, enhancement layer code streams used to enhance the review effect and accuracy.

The review performed by the review device includes: decoding the base layer code stream into images and feeding them to an automatic identification device, which automatically identifies harmful content by comparing the harmful content in a pre-stored harmful content database with the content of the base layer images; and/or displaying the base layer images to a monitor for manual identification of harmful content.

When the manual recognition and the automatic recognition are simultaneously performed, if the recognition results of the two are inconsistent, the judgment result of the automatic identification device or the monitor is preferentially executed.

Alternatively, when manual identification and automatic identification are performed simultaneously, the automatic identification device and the monitor each give a harmfulness score for the identified harmful content according to preset rules, and the two scores are then weighted to obtain the final result. When a score is received from only one party, the other party's score for the content defaults to zero.

The specific weighting method is as follows:

$$S = \frac{W_M \times S_M + W_H \times S_H}{W_M + W_H}$$

where W_M and W_H are the weights of the automatic identification device and the monitor, and their relative size expresses the degree of trust in each recognition result; S_M and S_H are the scores given by the automatic identification device and the monitor respectively. If S is greater than a given value, the content is judged harmful; otherwise it is judged harmless. W_M, W_H and the given value are each set according to empirical values.
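The weighted decision can be written out directly. In the sketch below, the function names, the use of `None` to represent a score that was not received, and the example weights and threshold are assumptions for illustration; the text only says that these values are set empirically.

```python
def combined_score(s_m, s_h, w_m, w_h):
    """Weighted harmfulness score S = (W_M*S_M + W_H*S_H) / (W_M + W_H).

    s_m: score from the automatic identification device, or None
    s_h: score from the human monitor, or None
    A score that was not received defaults to zero.
    """
    s_m = 0.0 if s_m is None else s_m
    s_h = 0.0 if s_h is None else s_h
    return (w_m * s_m + w_h * s_h) / (w_m + w_h)


def is_harmful(s_m, s_h, w_m, w_h, threshold):
    """Judge the content harmful when the combined score exceeds the threshold."""
    return combined_score(s_m, s_h, w_m, w_h) > threshold


# Example values (assumed): the monitor is trusted three times as much.
print(combined_score(0.25, 1.0, w_m=1, w_h=3))
print(is_harmful(0.25, 1.0, w_m=1, w_h=3, threshold=0.5))
print(is_harmful(1.0, None, w_m=1, w_h=3, threshold=0.5))
```

Note that with a zero default, a high score from the less trusted party alone may stay below the threshold, so the choice of weights and threshold determines whether a single party can trigger the cut-off.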

In the method, when harmful content is found, the review device cuts off forwarding of the base layer code stream and starts forwarding a standby harmless video code stream.

The method also includes: the review device recording and saving, for a specified time period, the base layer code stream and the enhancement layer code streams used to enhance the review effect and accuracy.

The method also includes: recording the identification of the harmful content in a log and generating a report.

In the method, when the user receiving end has received all the enhancement layer code streams and the base layer code stream, it decodes and reconstructs the original video stream.

When the method is based on spatial layered video coding, the content source reduces the original video code stream by a set ratio and then performs layered coding.

The present invention also provides a video code stream review system, including: a video input device, a video encoding device, a streaming media server, and a review device;

The video encoding device layer-encodes the original video code stream collected by the video input device into a base layer code stream and at least one enhancement layer code stream, forwards the base layer code stream to the review device through the communication network, and sends all the enhancement layer code streams to the user receiving end through the communication network after delay processing; the review device checks whether the base layer code stream contains harmful content, and if not, forwards the base layer code stream to the user receiving end through the communication network; otherwise it cuts off forwarding of the base layer code stream to the user receiving end.

The video encoding device includes:

An encoder that encodes the base layer code stream and the enhancement layer code stream, and forwards the base layer code stream directly to the reviewing device;

A first delay module, which delays the enhancement layer code stream and sends it to the user receiving end.

The review device further includes: a communication module connected to the video encoding device, through which the review device, according to the required video quality, notifies the video encoding device to raise or lower the coding quality of the base layer code stream, or, while the base layer code stream is being forwarded, to add or remove enhancement layer code streams used to enhance the review effect and accuracy; the enhancement layer code streams used to enhance the review effect and accuracy are delayed and forwarded to the user receiving end.

The video code stream review system further includes: a video content database in which the base layer code stream and the enhancement layer code streams are saved, the base layer code stream being saved in a set base layer code stream track and the enhancement layer code streams in set enhancement layer code stream tracks; the streaming media server comprises a code stream reading module and a second delay module, wherein the code stream reading module reads the base layer code stream from the base layer code stream track and forwards it to the review device, and reads the enhancement layer code streams from the enhancement layer code stream tracks and sends them to the user receiving end after delay processing by the second delay module.

The review device further includes: a communication module connected to the streaming media server, through which the review device, according to the required video quality, notifies the streaming media server to add or remove enhancement layer code streams used to enhance the review effect and accuracy; the enhancement layer code streams used to enhance the review effect and accuracy are delayed and forwarded to the user receiving end.

The review device includes:

A third delay module, which delays all the received code streams and forwards them to the user receiving end;

A review module, which reviews all the received code streams and outputs a corresponding control signal when it detects that a code stream contains harmful content;

a switch module connected behind the third delay module;

And a control module, connected between the review module and the switch module, triggering disconnection of the switch module according to the control signal.
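The interaction of the delay module, the switch and the control signal can be sketched as a delay line followed by a switch. The class name `ReviewGate`, the unit-based delay, and the substitution of standby content once the switch opens are assumptions made for illustration; the sketch does not model timing, decoding or the review itself.

```python
from collections import deque


class ReviewGate:
    """Delay line followed by a switch, driven by a 'harmful' control signal."""

    def __init__(self, delay_units, standby):
        self.buffer = deque()
        self.delay_units = delay_units   # how many units the stream is held back
        self.standby = standby           # harmless replacement content
        self.switch_closed = True        # closed = reviewed stream is forwarded

    def push(self, unit, harmful=False):
        """Feed one unit in; return what leaves the device, or None while filling."""
        if harmful:
            self.switch_closed = False   # control signal opens the switch
        self.buffer.append(unit)
        if len(self.buffer) <= self.delay_units:
            return None
        delayed = self.buffer.popleft()
        return delayed if self.switch_closed else self.standby


gate = ReviewGate(delay_units=2, standby="PSA")
inputs = ["a", "b", "c", "bad", "d", "e"]
outputs = [gate.push(u, harmful=(u == "bad")) for u in inputs]
print(outputs)
```

Because the switch sits behind the delay, the decision made when the harmful unit enters the device takes effect before that unit reaches the output, which is why the delay must cover the reviewer's reaction time.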

The review module specifically includes a manual identification sub-module and/or an automatic identification sub-module, where: the manual identification sub-module includes a decoding unit, an enhancement processing unit, a display unit, and an instruction receiving unit; the decoding unit decodes all the received code streams into images, which are processed by the enhancement processing unit and shown to the monitor on the display unit; when the monitor recognizes that an image contains harmful content, the monitor triggers, through the instruction receiving unit, the main control unit to output the control signal;

The automatic identification sub-module includes an automatic identification unit and a harmful content database; the automatic identification unit automatically identifies harmful content by comparing the harmful content in the harmful content database with the content of all the received code streams, and automatically triggers the main control unit to output the control signal when harmful content is identified.

When the review module includes both the manual identification sub-module and the automatic identification sub-module, the review module further includes: a review mode switching sub-module, which starts the manual identification sub-module and/or the automatic identification sub-module according to the control option of the control module; and a decision sub-module, which receives the output signals of the instruction receiving unit and the automatic identification unit at the same time and determines, according to a set decision rule, whether to trigger the main control unit to output the control signal.

The review module also includes the following main structures:

A decision rule storage sub-module, connected between the decision sub-module and the control module, which stores the decision rule entered through the control module.

The content recording module is connected to the third delay module, and is configured to record all the code streams output by the third delay module in a specified time period.

The review device further includes:

A logging module, connected to each of the other modules or sub-modules, for generating and outputting records of the running status of the review device;

A decision support knowledge base, connected to the control module, used to store specific or temporary harmful content text, face images, and related legal and regulatory documents;

A replacement program source library, connected to the switch module, which stores harmless standby video content.

The present invention also provides a video encoding apparatus, a streaming server, and a reviewing apparatus for the above system.

The beneficial effects of the present invention are as follows:

The technical solution of the present invention uses the relationship between the base layer and the enhancement layers in the multi-layer code stream structure of layered coding to generate a code stream for review, which includes the base layer code stream and may further include part of the enhancement layer code stream. The code stream for review is examined and forwarded by a dedicated review device. When the dedicated review device finds that the code stream for review contains harmful content, it cuts off forwarding of that code stream to the user, and the client can no longer decode and reconstruct the original video stream, so that the transmission of harmful content is controlled. The code stream for review formed on the basis of layered coding has a small data volume, so forwarding it does not burden the system;

Further, in layered coding the bit rate of each layer can be set flexibly, and the quality of the reconstructed video code stream can reach a high level as long as the total bit rate reaches a certain level. Therefore, to reduce the overhead caused by the code stream for review, the bit rate of the base layer code stream can be set lower, further reducing the system load;

The technical solution of the present invention can be used in both the live broadcast mode and the stored content play mode. It provides a general deep content review scheme, based on layered coding, for existing streaming media services, thereby ensuring the content security of current streaming media services such as IPTV and digital television and effectively preventing the spread of harmful content through them.

Drawings

FIG. 1 is a schematic diagram of the principle of existing URL-based content filtering;

FIG. 2 is a schematic diagram of the principle of existing deep review of streaming media content;

FIG. 3 is a schematic diagram of the layered video coding principle;

FIG. 4 is a schematic structural diagram of a review system implementing the technical idea of the present invention;

FIG. 5 and FIG. 6 are schematic structural diagrams of the review device according to the present invention;

FIG. 7 is a schematic diagram of the layered coding principle when the technical method of the present invention is implemented based on spatial layered coding.

Detailed description

The present invention is based on existing layered coding techniques. The review code stream is generated by the basic layer, and the review device checks the review code stream. When there is no problem in the review, the review device forwards the base layer code stream to the receiving user receiving end, and the enhancement layer code stream is not subjected to the review device. After being sent to the receiving end of the user, after receiving all the base layers and the enhancement layer, the user terminal can decode and reconstruct the total video code stream and watch the program. If the user terminal does not receive the basic layer, the reconstructed viewing program cannot be decoded.

The schematic diagram of the structure of the review system for implementing the technical idea of the present invention is as shown in FIG. 4, and mainly includes a related device of a content source, a content database, and a streaming media server. Review devices and user sink devices, where:

The solid line part is the review implementation process of the live broadcast mode. In the live broadcast mode, the content source directly sends the basic code stream for review directly to the content review device through the streaming media server, and the remaining enhanced code stream is delayed and sent to the user receiving end for review. When the device review basic code stream does not contain harmful content, the basic code stream is forwarded to the user receiving end. If the basic code stream does not contain harmful content, the basic code stream is not forwarded to the user receiving end, so that the user receiving end When the elementary stream is not received, the reconstructed original video stream cannot be decoded. In this process, the reviewing device can notify the content source to increase/decrease the encoding quality of the base layer code stream according to the required video quality, or increase/decrease forwarding for enhancing the review effect and while forwarding the base layer code stream. Accuracy layer code stream. if the latter one, A portion of the enhanced code stream is also included in the code stream for review accordingly.

The dotted line part is the review process in stored content playback mode. In this mode, the content source first stores the generated base layer code stream and enhancement layer code stream in the corresponding base layer code stream track and enhancement layer track, respectively. When playing the video content, the streaming media server reads the base layer code stream from the base layer code stream track and forwards it to the review device, and reads the enhancement layer code stream, delays it, and then sends it to the user terminal. Likewise, in this process, the review device can notify the streaming media server, according to the required video quality, to increase or decrease the enhancement layer code stream forwarded to improve the review effect and accuracy; correspondingly, the code stream used for review then includes a portion of the enhancement layer code stream.

The specific structure of each part of the system is described in detail below:

First, the content source

The content source includes a video input device for capturing the video stream and a video encoding device; the video input device is generally a camera. Because the technical idea of the present invention is implemented on the basis of layered coding, the video encoding device uses a layered coding technique to encode the original video code stream into a base layer code stream and at least one enhancement layer code stream.

Still referring to FIG. 4, the video encoding device mainly includes the following structure:

An encoder, which hierarchically encodes the original video stream into a base layer code stream and at least one enhancement layer code stream;

In the live broadcast mode, the base layer code stream is output directly to the streaming media server through the communication network, and the enhancement layer code stream is output after being delayed by the first delay module; or the base layer code stream and part of the enhancement layer code stream are output directly to the streaming media server, and the remaining enhancement layer code stream is output after being delayed by the first delay module.

The streaming media server forwards the code stream used for review to the review device, and sends the non-review code stream directly to the user receiving end.

Of course, in the live mode, the video content can also be stored in the content database at the same time, as follows: the base layer code stream is stored in the designated base layer code stream track, and the enhancement layer code stream is stored in the designated enhancement layer code stream track.

Second, the streaming media server

In the live broadcast mode, the streaming media server only performs forwarding and does not perform delay processing. In the stored content playback mode, the streaming media server needs to delay the non-review code stream.

Still referring to FIG. 4, the streaming media server includes a video stream reading module and a second delay module connected to the video stream reading module. The video stream reading module reads the layer-coded base layer code stream and at least one enhancement layer code stream from the externally connected content database, and then:

Directly outputs the base layer code stream, and outputs the enhancement layer code stream after it is delayed by the second delay module; or

Directly outputs the base layer code stream and part of the enhancement layer code stream, and outputs the remaining enhancement layer code stream after it is delayed by the second delay module.
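The two output options above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the names `DelayLine` and `route`, the packet labels, and the use of a time-stamped queue for the delay module are all assumptions made for illustration.

```python
from collections import deque


class DelayLine:
    """Hypothetical delay module: releases a packet `delay` seconds after it was pushed."""

    def __init__(self, delay):
        self.delay = delay
        self._queue = deque()  # entries are (release_time, packet)

    def push(self, packet, now):
        self._queue.append((now + self.delay, packet))

    def pop_ready(self, now):
        """Return all packets whose release time has been reached, in order."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(self._queue.popleft()[1])
        return ready


def route(layers, review_enh_count, delay_line, now):
    """Split one time slice of layered packets.

    `layers` is [base, enh1, enh2, ...]. The base layer and the first
    `review_enh_count` enhancement layers form the review stream and are
    returned for direct output; the remaining layers enter the delay line.
    """
    review_stream = layers[: 1 + review_enh_count]
    for packet in layers[1 + review_enh_count:]:
        delay_line.push(packet, now)
    return review_stream
```

With `review_enh_count = 0` the sketch corresponds to the first option (only the base layer is output directly); a positive value corresponds to the second option.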

Third, the review device

As shown in FIG. 5, a review device for reviewing video stream content based on layered coding includes:

A communication module, which establishes communication connections with other network devices;

A third delay module, which receives the base layer code stream, or the base layer code stream and part of the enhancement layer code stream, delays all the received code streams, and forwards them to the user receiving end;

A review module, which examines all code streams whose forwarding is being delayed, and outputs a corresponding control signal when the examined code stream contains harmful content;

A switch module, connected after the third delay module;

A control module, connected between the review module and the switch module, which triggers the switch module to disconnect according to the control signal.
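The interaction of the delay, review, switch, and control modules can be sketched as follows. This is only an illustrative Python model under stated assumptions: the class name `ReviewDevice`, the `is_harmful` callback standing in for the review module, the packet-count delay (`hold`), and the optional `replacement` content are hypothetical and not taken from the source.

```python
from collections import deque


class ReviewDevice:
    """Toy model: packets wait in a delay buffer while they are reviewed."""

    def __init__(self, is_harmful, hold=3, replacement=None):
        self.is_harmful = is_harmful    # hypothetical review-module callback
        self.hold = hold                # delay module depth, counted in packets
        self.replacement = replacement  # optional harmless standby content
        self._buffer = deque()
        self.switch_closed = True       # closed switch = stream is forwarded

    def receive(self, packet):
        """Accept one review-stream packet and return what is forwarded (or None)."""
        # Review module: the packet is inspected as soon as it arrives,
        # i.e. before it leaves the delay buffer.
        if self.is_harmful(packet):
            # Control module: open the switch and drop everything still buffered.
            self.switch_closed = False
            self._buffer.clear()
        if not self.switch_closed:
            return self.replacement
        self._buffer.append(packet)
        if len(self._buffer) > self.hold:
            return self._buffer.popleft()
        return None
```

In this sketch the delay is what gives the review module time to act: packets that are still buffered when harmful content is detected never reach the user.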

As shown in FIG. 6, the review module includes a manual identification submodule and/or an automatic identification submodule:

The manual identification submodule includes a decoding unit, an enhancement processing unit, a display unit, and an instruction receiving unit. The decoding unit decodes all received code streams into images, which are processed by the enhancement processing unit and displayed to the monitor on the display unit. When the monitor manually recognizes that an image contains harmful content, the instruction receiving unit triggers the main control unit to output a control signal;

The automatic identification submodule includes an automatic identification unit and a harmful content database. The automatic identification unit automatically identifies harmful content by comparing the harmful content in the harmful content database with the related content contained in all the received code streams, and, when harmful content is identified, triggers the main control unit to output a control signal.

When the review module includes both the manual identification submodule and the automatic identification submodule, the review module further includes:

A review mode switching submodule, which starts the manual identification submodule and/or the automatic identification submodule under the control of the control module;

A decision submodule, which receives the output signals of the instruction receiving unit and the automatic identification unit at the same time, and determines according to the set decision principle whether to trigger the control module to output the control signal;

A decision principle storage submodule, connected between the decision submodule and the control module, which stores the decision principle input through the control module;

A decision support knowledge base, connected to the control module, used to store specific or temporary harmful content text and face images (such as newly received images of terrorists) and related regulatory documents.

The review device further includes: a content recording module, connected to the output end of the third delay module, for recording all the code streams output by the third delay module in a specified time period.

The review device further includes: a logging module, which is respectively connected to other modules or sub-modules for generating and outputting an operation status log of the review device.

The review device also includes: a replacement source library, connected to the switch module, which stores harmless standby video content.

The review system consisting of the above video encoding device, streaming media server, and review device is shown in FIG. 4. The content source hierarchically encodes the original video code stream into a base layer code stream and at least one enhancement layer code stream. In the live mode, the process of the embodiment of the technical solution of the present invention includes the following steps:

1. The content source forwards the base layer code stream to the review device, and delays the enhancement layer code stream before sending it to the user receiving end;

2. The review device checks whether the base layer code stream contains harmful content. If not, it forwards the base layer code stream to the user receiving end; otherwise, it cuts off forwarding of the base layer code stream to the user receiving end.

Moreover, in the live mode, the review device notifies the content source to increase or decrease the encoding quality of the base layer code stream according to the required video quality, or to increase or decrease the enhancement layer code stream forwarded together with the base layer code stream to improve the review effect and accuracy.

In the stored content playback mode, the content source first stores the generated base layer code stream and enhancement layer code stream in the corresponding base layer code stream track and enhancement layer track, respectively; the streaming media server reads the base layer code stream from the base layer code stream track and forwards it to the review device, and reads the enhancement layer code stream, delays it, and sends it to the user terminal;

The review device can notify the streaming media server to increase or decrease, while forwarding the base layer code stream, the enhancement layer code stream used to improve the review effect and accuracy.

The review methods used by the review device include:

Decoding the base layer code stream into images and inputting them to the automatic identification device, which automatically identifies harmful content by comparing the harmful content in the pre-stored harmful content database with the related content contained in the base layer code stream images; and/or

Displaying the base layer code stream images to the monitor for manual identification of harmful content.

When manual identification and automatic identification are performed simultaneously, the automatic identification device and the monitor each give a harmfulness score for the identified harmful content according to a preset rule, and the two scores are then weighted to obtain the final judgment result to be executed. When a score is received from only one party, the score of the other party for that content defaults to zero.

The above weighting method is:

$$S = \frac{W_M S_M + W_H S_H}{W_M + W_H}$$
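As a worked illustration of this weighting, the following Python sketch computes the fused score, treating a missing score as zero as described above. The function names, the default weights, and the score scale are assumptions; the comparison of S against a given value follows the claims, where W_M and S_M belong to the automatic identification device and W_H and S_H to the monitor.

```python
def fused_score(s_machine=None, s_human=None, w_machine=1.0, w_human=1.0):
    """Weighted average S = (W_M*S_M + W_H*S_H) / (W_M + W_H).

    A score that was not received (None) defaults to zero.
    """
    s_m = 0.0 if s_machine is None else s_machine
    s_h = 0.0 if s_human is None else s_human
    return (w_machine * s_m + w_human * s_h) / (w_machine + w_human)


def is_harmful(score, threshold):
    """The judgment is 'harmful' when S exceeds the given (empirically set) value."""
    return score > threshold
```

For example, with scores 0.8 (machine) and 0.4 (monitor) and weights 1 and 3, the fused score is (0.8 + 1.2) / 4 = 0.5, so the monitor's lower score dominates because it is trusted more.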

In the present invention, the harmful content includes at least one of the following: harmful images, harmful superimposed characters or symbols, and specific face images.

In the present invention, when the review device cuts off forwarding of the base layer code stream, it starts forwarding a standby harmless video code stream. The review device can also record and save, for a specified time period, the base layer code stream and the enhancement layer code stream used to improve the review effect and accuracy, and can record the identification of harmful content in a log and generate a report.

When the spatial layered video coding method is used, the content source first downscales the original video code stream by the set ratio, and then performs layered coding.

The technical solution of the present invention is mainly implemented based on the three existing mainstream layered coding technologies:

1. Time-based layered video coding method

For the base layer, a relatively low frame rate is set, for example, 5 frames per second (5 fps). The base layer is encoded using the intra coding mode and the inter prediction coding mode, and the remaining frames are then encoded in the bidirectional prediction mode to generate the enhancement layer. The enhancement layer plus the base layer gives 25 fps (for PAL) or 30 fps (for NTSC).
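The frame assignment implied by this scheme can be sketched in Python as follows. The sketch only shows which frames would belong to which layer; it performs no actual intra, inter, or bidirectional coding, and the function name and the requirement that the base frame rate divide the full frame rate are simplifying assumptions.

```python
def split_temporal(frames, full_fps=25, base_fps=5):
    """Assign frames to a low-frame-rate base layer and an enhancement layer.

    Every (full_fps // base_fps)-th frame goes to the base layer; all other
    frames go to the enhancement layer.
    """
    if full_fps % base_fps != 0:
        raise ValueError("base_fps must divide full_fps in this sketch")
    step = full_fps // base_fps
    base = [f for i, f in enumerate(frames) if i % step == 0]
    enhancement = [f for i, f in enumerate(frames) if i % step != 0]
    return base, enhancement
```

For one second of PAL video (25 frames), this yields 5 base layer frames and 20 enhancement layer frames.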

2. SNR-based layered video coding method

Influencing factors such as the quantization parameters of the base layer coding are controlled so that the bit rate of the base layer is lower than R_b (for example, 128 kbps). The first enhancement layer, called the main enhancement layer (Prime Enhancement Layer), is then formed with a bit rate of R_pe (for example, 384 kbps). On the basis of the main enhancement layer, a plurality of Secondary Enhancement Layers are then formed, with bit rates R_se1, R_se2, ..., R_sek, respectively.

Because in this scheme the base layer serves only to allow review of the program, it is enough that basic information such as the outlines in the image can be seen, so its bit rate should be as low as possible. A user who receives only the base layer cannot watch the program satisfactorily; the main enhancement layer must be combined with it to allow normal viewing. Of course, if the network bandwidth is sufficient, users can receive more or even all enhancement layers to watch high-definition programs. In the above example, R_b = 128 kbps, R_pe = 384 kbps, and R_se1 = 512 kbps can be set. Then, if the user is an ADSL (Asymmetric Digital Subscriber Line) broadband user with a guaranteed 512 kbps, the base layer and the main enhancement layer can be received. If the user is an Ethernet broadband user, the base layer and all enhancement layers can be received for the best video quality.
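The bandwidth example can be checked with a short Python sketch. It assumes that the stated bit rates are per-layer rates that add up (which matches the example, since 128 + 384 = 512 kbps) and that layers must be received in order; the function and layer names are hypothetical.

```python
def receivable_layers(bandwidth_kbps, layer_rates_kbps):
    """Return the names of the layers that fit cumulatively into the bandwidth.

    `layer_rates_kbps` is an ordered list of (name, rate) pairs, base layer first.
    """
    chosen, total = [], 0
    for name, rate in layer_rates_kbps:
        if total + rate > bandwidth_kbps:
            break  # higher layers depend on this one, so stop here
        total += rate
        chosen.append(name)
    return chosen


LAYERS = [("base", 128), ("main_enhancement", 384), ("secondary_1", 512)]
```

Under these assumptions, a 512 kbps ADSL user receives the base layer and the main enhancement layer, while a user with enough bandwidth for all 1024 kbps receives every layer.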

3. Spatial-based layered video coding method

In order to reduce the bit rate of the base layer, in the scheme that generates the code stream by spatial layered coding, the invention first reduces the original video data stream, which is technically called down-sampling, so that the image shrinks by factors of k_h and k_v in the horizontal and vertical directions, respectively. The advantage is a further reduction in the bit rate of the review code stream. For example, if under SNR layering 128 kbps is the bit rate that provides the video quality required for review, then under spatial layered coding, if the image is first reduced to 1/4 of its original size (by a factor of two both horizontally and vertically), the video quality required for review should be obtainable at around 32 kbps. The principle is shown in FIG. 7, where transform U represents interpolation followed by upsampling and transform D represents downsampling.
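The arithmetic of this example can be reproduced with a small Python sketch. It assumes, as the example itself implies, that the required bit rate scales with the number of pixels; the plain decimation shown for transform D omits the low-pass filtering a real encoder would apply, and both function names are hypothetical.

```python
def review_bitrate_estimate(snr_bitrate_kbps, k_h, k_v):
    """Rough estimate: bit rate shrinks in proportion to the pixel count."""
    return snr_bitrate_kbps / (k_h * k_v)


def downsample(image, k_h, k_v):
    """Naive transform D: keep every k_v-th row and every k_h-th column.

    `image` is a list of rows, each row a list of pixel values.
    """
    return [row[::k_h] for row in image[::k_v]]
```

With k_h = k_v = 2, a 128 kbps review stream is estimated at 128 / 4 = 32 kbps, and a 4x4 image is reduced to 2x2.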

In the streaming media business, the two most important service modes are stored content playback and live broadcast.

1. Stored content playback

The video content is compressed, encoded, and stored in a content database in the form of a video streaming file. Within the video file, the base layer code stream and the enhancement layer code stream can be placed on two different media tracks (see FIG.). The base layer code stream and the individual enhancement layer code streams are first generated by the video encoding device, then written to the video file and stored in the content database.

When playback is required, the streaming server reads the video data of the review track and the enhancement layer track from the video file. It then sends the enhancement layer data to the network through a delay stage whose delay time can be set, while the review code stream is sent directly without delay.

The streaming server reads data from a file and packages and sends the packets according to the auxiliary information in the file. The auxiliary information generally provides packaging rules and synchronization information between the multiple media tracks in the file (such as a video track, an audio track, a text track, etc.), so that the streaming server knows which tracks to read, and how much data to read from each, for packing and sending in any particular time period [t, t+Δt]. In this case, the video encoder is responsible for layered encoding and for generating the video file. The streaming server is responsible for reading the data from the video file and packaging it according to the auxiliary information:

(1) extracting the data of the review code stream for packet transmission and sending to the review device;

Here, the review code stream may include only the base layer code stream, or the base layer code stream together with a part of the enhancement layer code stream used to improve the review effect and accuracy; the specific number of enhancement layer code streams is increased or decreased according to the indication of the review device.

(2) extracting the non-review enhancement layer code stream for packet transmission and transmitting it directly to the user terminal;

(3) maintaining time synchronization between the respective media tracks;

(4) sending the packets of the review code stream ahead of the enhancement layer code stream by a lead time, such as 5 seconds or 10 seconds. This is because the review code stream, unlike the enhancement layer code stream, has to be reviewed, and manual review in particular needs a certain lead time so that there is enough time to take action once harmful content is found. This lead can be achieved by delaying the enhancement layer code stream by the corresponding amount of time, because leading and lagging are relative.
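The relative timing described in item (4) can be sketched in Python as a simple send schedule. The packet representation (media time plus a track label), the track names, and the function name are assumptions for illustration; a real streaming server would derive this timing from the auxiliary information in the file.

```python
def send_schedule(packets, lead_seconds):
    """Compute send times so that the review stream leads the enhancement stream.

    `packets` is a list of (media_time, track) pairs, where track is either
    "review" or "enhancement". Review packets are sent at their media time;
    enhancement packets are delayed by `lead_seconds`.
    Returns (send_time, media_time, track) tuples sorted by send time.
    """
    schedule = []
    for media_time, track in packets:
        delay = lead_seconds if track == "enhancement" else 0.0
        schedule.append((media_time + delay, media_time, track))
    return sorted(schedule)
```

With a lead of 5 seconds, the review packet for media time 0 is sent at t = 0 and the matching enhancement packet at t = 5, which leaves the review device 5 seconds to act.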

2. Live broadcast situation

The video encoder packs the encoded video stream into the code stream for review (the base layer code stream, or the base layer code stream and part of the enhancement layer code stream) and the non-review code stream (all or the remaining portion of the enhancement layer code stream) and transmits them to the streaming server, which only needs to forward them. When the encoder outputs the non-review video code stream, it delays it by the set time through a delay stage before sending it to the network, while the review code stream is sent directly without delay.

Once the content review device receives the review video stream from the streaming server, it reviews it. There are two basic ways to do so:

1. Manual review: a human reviewer examines the video content decoded and displayed on the screen and, based on experience, judges whether the content under review is harmful according to national laws and regulations or principles generally agreed on by society.

2. Automated review: content is reviewed automatically by machine. There are many approaches: recognizing specific harmful scenes, such as violence or pornography, by analyzing the video frames; locating and recognizing superimposed text such as subtitles in the video to determine whether it contains harmful textual information; or performing face recognition to find out whether a specific person, such as a terrorist, appears in the picture. With recent technological advances, these automated review methods have become increasingly mature and practical, and several of them, such as overlay text positioning and recognition, do not require decoding the code stream, which greatly improves efficiency and further improves their practicality.

Of course, manual review and machine review can sometimes be performed at the same time and their results fused. For example, when one recognition result is unreliable, it may be necessary to combine the two. If the quality of the code stream is poor and the displayed image is blurred so that the human eye cannot clearly distinguish it, the machine can be used. In another case, the human reviewer may think that a face possibly belongs to a person who must be controlled but cannot be sure; the face can then be identified by automatic review. If the person's face is in the decision support knowledge base, attributes such as the name can be accurately identified.

There are many information fusion methods for combining the two review results, such as the weighted average method.

After the review, if the content is found to be acceptable, the review code stream is passed and forwarded to the user terminal, which decodes and reconstructs it together with the received enhancement layers to obtain viewable video content.

If the review finds a problem, the harmful content must be cut off, which interrupts program viewing at the user terminal. To avoid the resulting "black screen" problem, the content review device can send alternative video content, such as a public service advertisement, to the user terminal from its replacement source library. At the same time, the harmful content should be recorded for later tracing, legal forensics, etc.

The technical solution provided by the embodiment of the present invention can also be used for recognizing other video content, for example: identifying sports highlights, such as a soccer goal shot, a long-range basketball shot, or a dunk, so that the identified video fragments can be stored and recorded; identifying images related to specific people in news programs for archiving; analyzing videos automatically recorded by electronic eyes (i.e., cameras installed at major intersections) in the transportation system to identify violations and the license numbers of the offending vehicles; and identifying specific content in TV programs, such as a Harry Potter movie, so that IPTV users can be notified to watch once it is recognized.

It is apparent that those skilled in the art can make various modifications and variations to the invention without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover the modifications and variations of the inventions

Claims


A method for reviewing a video code stream, wherein a content source hierarchically encodes the original video code stream into a base layer code stream and at least one enhancement layer code stream, and when the video code stream is transmitted, the method includes the following steps:

2. The method according to claim 1, wherein, in the live mode,

In the step A, the content source directly sends or forwards the corresponding code stream of the layered code;

In the step B, the review device notifies the content source to increase/decrease the coding quality of the base layer code stream according to the required video quality, or to increase/decrease, while forwarding the base layer code stream, the partial enhancement layer code stream forwarded to improve the review effect and accuracy.

The method according to claim 1, wherein, in the stored content playback mode, in the step A, the content source first stores the generated base layer code stream and enhancement layer code stream in a base layer code stream track and an enhancement layer track, respectively; the streaming server reads the base layer code stream from the base layer code stream track and forwards it to the review device, and reads the enhancement layer code stream, delays it, and then sends it to the user terminal;

In the step B, the reviewing device notifies the streaming media server to increase/decrease the enhanced layer code stream for enhancing the review effect and accuracy while forwarding the base layer code stream according to the required video quality.

4. The method according to claim 1, wherein in step B, the review method of the review device comprises:

Decoding the base layer code stream image and inputting it to the automatic identification device, the automatic identification device comparing the harmful content in the pre-stored harmful content database with the related content included in the base layer code stream image to perform automatic identification of the harmful content; and/or

displaying the base layer code stream image to the monitor for manual identification of the harmful content.

The method according to claim 4, wherein when the manual recognition and the automatic recognition are simultaneously performed, if the recognition results of the two are inconsistent, the judgment result of the automatic identification device or the monitor is preferentially executed.

6. The method according to claim 4, wherein when the manual identification and the automatic identification are performed simultaneously, the automatic identification device and the monitor each give a corresponding harmfulness score for the identified harmful content according to a preset rule, and the two scores are then weighted to obtain the final judgment result to be executed; when a score for the identified harmful content is received from only one party, the score given by the other party for that content defaults to zero.

7. The method according to claim 6, wherein the weighting processing method is:

$$S = \frac{W_M S_M + W_H S_H}{W_M + W_H}$$

Wherein W_M and W_H represent the weights of the automatic identification device and the monitor, respectively, and the relative size of W_M and W_H represents the degree of trust in each recognition result; S_M and S_H are the scores given by the automatic identification device and the monitor, respectively. If S is greater than a given value, the judgment result is harmful; otherwise the judgment result is harmless. W_M, W_H, and the given value are each set according to empirical values.

8. The method according to claim 1, wherein the harmful content comprises at least one of the following: a harmful image, a harmful superimposed text or symbol, a specific facial image.

The method according to claim 1, wherein in the step B, when the review device cuts off forwarding of the base layer code stream, it starts forwarding a standby harmless video code stream.

The method according to claim 2 or 3, wherein the method further comprises: the review device recording and saving the base layer code stream of the specified time period and the enhancement layer code stream used to improve the review effect and accuracy.

11. The method according to claim 1, wherein the method further comprises: recording the identification of the harmful content in a log and generating a log report.

The method according to claim 1, wherein the step B further comprises the following step: when the user receiving end has received all the enhancement layer code streams and the base layer code stream, decoding and reconstructing the original video code stream.

The method according to claim 1, wherein when the spatial layered video coding method is used, the content source first performs reduction processing on the original video code stream according to the set ratio, and then performs layered coding.

A video stream content review system, comprising: a video input device, a video encoding device, and a streaming media server, wherein the video stream review system further includes a review device; the video encoding device hierarchically encodes the original video code stream collected by the video input device into a base layer code stream and at least one enhancement layer code stream, forwards the base layer code stream to the review device through the communication network, and delays all enhancement layer code streams before sending them to the user receiving end through the communication network; the review device checks whether the base layer code stream contains harmful content, and if not, forwards the base layer code stream to the user receiving end through the communication network, otherwise cuts off forwarding of the base layer code stream to the user receiving end.

The system according to claim 14, wherein the video encoding device comprises: an encoder that encodes the base layer code stream and the enhancement layer code stream, and forwards the base layer code stream directly to the review device;

a first delay module that delays the enhancement layer code stream and then sends it to the user receiving end.

The system according to claim 15, wherein the review device further comprises: a communication module, communicatively connected to the video encoding device, through which the review device notifies the video encoding device to increase/decrease the coding quality of the base layer code stream according to the required video quality, or to increase/decrease, while forwarding the base layer code stream, the enhancement layer code stream used to improve the review effect and accuracy; the enhancement layer code stream used to improve the review effect and accuracy is likewise delayed and forwarded to the user receiving end.

17. The system of claim 14 wherein:

The video stream review system further includes: a video content database, where the base layer code stream and the enhancement layer code stream are saved, wherein the base layer code stream is saved in a set base layer code stream track, and the enhancement layer code stream is saved in a set enhancement layer code stream track;

The streaming media server includes: a code stream reading module and a second delay module, wherein the code stream reading module reads the base layer code stream from the base layer code stream track and forwards it to the review device, and reads the enhancement layer code stream from the enhancement layer code stream track, which is delayed by the second delay module and then sent to the user receiving end.

The system according to claim 17, wherein the review device further comprises: a communication module, communicatively connected to the streaming media server, through which the review device notifies the streaming media server, according to the required video quality, to increase/decrease, while forwarding the base layer code stream, the enhancement layer code stream used to improve the review effect and accuracy; the enhancement layer code stream used to improve the review effect and accuracy is likewise delayed and forwarded to the user receiving end.

The system of any of claims 14-18, wherein the reviewing device comprises:

a switch module connected behind the third delay module;

The system according to claim 19, wherein the review module specifically includes a manual identification submodule and/or an automatic identification submodule, wherein:

Specifically, the manual identification sub-module includes: a decoding unit, an enhancement processing unit, a display unit, and an instruction receiving unit, where the decoding unit decodes all received code stream images, and the image is processed by the enhancement processing unit on the display unit. Displayed to the monitor, when the monitor manually recognizes that the image contains harmful content, the instruction receiving unit triggers the main control unit to output the control signal;

21. The system of claim 20, wherein when the review module simultaneously includes the manual identification sub-module and the automatic identification sub-module, the review module further includes:

a review mode switching sub-module, which activates the manual identification sub-module and/or the automatic identification sub-module according to the control of the control module;

a decision sub-module, which receives the output signals of the instruction receiving unit and the automatic identification unit at the same time, and determines according to the set decision principle whether to trigger the main control unit to output the control signal.

The system according to claim 20, wherein the review module further comprises a decision principle storage submodule connected between the decision submodule and the control module to store a decision principle input through the control module.

The system according to any one of claims 14-18, wherein the review device further comprises one or any combination of the following:

a content recording module, connected to the third delay module, configured to record all code streams output by the third delay module in a specified time period;

a decision support knowledge base, connected to the control module, for storing specific or temporary harmful content text, face images, and related legal and regulatory documents;

A video encoding device, comprising an encoder, wherein the video encoding device further comprises a first delay module connected to the encoder; the encoder hierarchically encodes the original video stream into a base layer code stream and at least one enhancement layer code stream, and then:

Directly outputting the base layer code stream, and performing delay processing by the first delay module to output the enhancement layer code stream; or

The base layer code stream and the partial enhancement layer code stream are directly output, and the remaining enhancement layer code stream is output after the delay processing by the first delay module.
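The two output alternatives of the first delay module can be sketched as a simple scheduling step. This is an illustrative model only: the `Packet` type, the `schedule_output` function, the numeric layer indices, and the fixed delay in seconds are all assumptions, and the claims do not specify how the delay is implemented.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    layer: int            # 0 = base layer, 1..n = enhancement layers (assumed numbering)
    capture_time: float   # seconds since the start of the stream
    payload: bytes = b""


def schedule_output(packets: Iterable[Packet], delay_s: float,
                    direct_layers: FrozenSet[int] = frozenset({0})
                    ) -> List[Tuple[float, Packet]]:
    """Return (send_time, packet) pairs ordered by send time.

    Layers listed in direct_layers are output immediately; every other
    layer is held back by delay_s, modelling the first delay module.
    """
    scheduled = []
    for packet in packets:
        offset = 0.0 if packet.layer in direct_layers else delay_s
        scheduled.append((packet.capture_time + offset, packet))
    # sorted() is stable, so packets with equal send times keep input order.
    return sorted(scheduled, key=lambda item: item[0])
```

With the default `direct_layers` only the base layer bypasses the delay (the first alternative); passing, for example, `frozenset({0, 1})` lets the base layer and part of the enhancement layers through directly while the remaining layers are delayed (the second alternative).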

25. A streaming media server comprising a video stream reading module, wherein the streaming media server further comprises a second delay module connected to the video stream reading module, and the video stream reading module reads, from an external database, a base layer code stream and at least one enhancement layer code stream produced by layered coding, and then:

the base layer code stream is output directly, and the enhancement layer code stream is output after delay processing by the second delay module; or

the base layer code stream and part of the enhancement layer code stream are output directly, and the remaining enhancement layer code stream is output after delay processing by the second delay module.

26. A review device, wherein the original video code stream to be reviewed is hierarchically encoded by the content source into a base layer code stream and at least one enhancement layer code stream, the review device including a communication module for communicating with other network devices, characterized in that the review device further includes:

a review module, which examines all delayed forwarded code streams, and outputs a corresponding control signal when it is detected that the code stream contains harmful content;

a switch module connected downstream of the third delay module;

27. The review device according to claim 26, wherein the review module specifically includes a manual identification sub-module and/or an automatic identification sub-module, wherein:

28. The review apparatus according to claim 27, wherein when said review module includes both the manual identification sub-module and the automatic identification sub-module, the review module further includes: a review mode switching sub-module, which activates the manual identification sub-module and/or the automatic identification sub-module under the control of the control module;

29. The review apparatus according to claim 28, wherein the review module further comprises a decision principle storage sub-module connected between the decision sub-module and the control module to store a decision principle input through the control module.

30. The review apparatus according to any one of claims 26 to 29, wherein the review apparatus further comprises one or any combination of the following:

a logging module, which is respectively connected to other modules or submodules for generating and outputting an operation status log of the reviewing device;