WO2014127823A1

WO2014127823A1 - Digital verification

Info

Publication number: WO2014127823A1
Application number: PCT/EP2013/053475
Authority: WO
Inventors: Dimitrios TSOLIS; Vasileios DOURDOUNIS; Emmanouil KARATZAS; Vagelis PAPAKONSTANTINOU; Theoklitos DRAGONAS
Original assignee: Digital Verification Services Ltd
Priority date: 2013-02-21
Filing date: 2013-02-21
Publication date: 2014-08-28
Also published as: GB2527227A; GB201516703D0

Abstract

A method of watermarking digital content by a trusted third party, comprising: capturing an image of the digital content; embedding a first watermark into the digital content, wherein the presence of the first watermark indicates the owner of the digital content; embedding a second watermark into the digital content, wherein the presence of the second watermark indicates integrity of the digital content; storing the digital content in a repository.

Description

Digital Verification

Technical Field

This invention relates to methods and systems for verifying digital content, and more specifically for verifying the integrity and ownership of web-based content.

Background to the Invention

The internet provides a vast and changing source of information which is created, edited and accessed by users on a global scale. News, opinion, announcements, facts, views and discussions are constantly being published and unpublished as images, video and text on websites, forums, networks, chat rooms, blogs, etc. User generated content (i.e. content generated and edited by individuals) is a widespread and growing aspect of internet use today. Inherently, such content is largely uncontrolled, and consequently the risk of misuse of content in the form of copyright infringement, harassment, privacy issues, or libellous publications amongst others can be an issue. When content is deleted, traces of its existence are, in most cases, also deleted. It is then very difficult, if not impossible, to establish that such content was indeed published. In such instances, without evidence of the publication, it may then be difficult to provide recourse for the injured party or prevent further misuse of data. Such issues are particularly relevant for content published on sites on which transient or immediate content is published, such as social networks, forums and blogs.

If content created by a party is copied without authorisation, it is difficult to establish that copyright infringement occurred unless ownership of the copyright can be properly established. Again, this is made difficult by the fact that content may be frequently edited or deleted.

Watermarking principles are mainly used whenever copyright protection of digital content is required. Some parties who are aware of the existence of the watermark may have an interest in removing it. In this framework the most popular and demanding application of watermarking is to give proof of ownership of digital data by embedding copyright statements. For this kind of application the embedded information should be robust against manipulations that may attempt to remove it. Many watermarking schemes show weaknesses in a number of attacks and specifically those causing de-synchronization, which is an efficient tool against most marking techniques. De-synchronization of the detector means that the detector is unable to detect a watermark embedded in an image. Thus detection, rather than embedding, is a core problem of digital watermarking. A weakness of many watermarking detection mechanisms is their inability to counter attacks involving the de-synchronization of the detector due to geometrical attacks. In such cases, the watermarked content has been manipulated to the extent that the detector cannot detect the watermarks embedded in it.

Various techniques are currently employed to watermark digital content. For example, a digital mark is embedded into the digital content, such that the content is then 'watermarked'. The watermark itself is imperceptible under normal viewing of the content, and is only detected under certain conditions, and by querying the content using particular algorithms. Such watermarks may be 'robust' or 'weak'. If it can be verified that a watermark was embedded in the content, but is not detected, it might be concluded that the content has been altered or tampered with to the extent that the watermark has been removed. Thus integrity of the digital content cannot be verified. In other cases, a detected watermark may indicate that the digital content is owned by a particular party. However, known techniques are at present limited in their applicability and effectiveness for rapidly changing internet content, for preserving such content and, in the event of any dispute involving the watermarked content, for ensuring that the evaluation of the watermarked content is impartial and not biased towards any particular party.

It is an aim of the present invention to overcome or at least mitigate some of the drawbacks of known art.

Summary of the Invention

Accordingly, an aspect of the invention provides a method of watermarking digital content by a trusted third party, comprising: capturing an image of the digital content; embedding a first watermark into the digital content, wherein the presence of the first watermark indicates the owner of the digital content; embedding a second watermark into the digital content, wherein the presence of the second watermark indicates integrity of the digital content.

A second aspect of the invention provides a system according to appended claim 9.

A third aspect of the invention provides a system according to appended claim 12.

Preferable features of the invention are defined in the appended dependent claims.

The present invention provides an on-demand mechanism by which content is watermarked and stored by an entity acting as a trusted, secure and impartial authority to enable the verification that digital content published on a website included specific content at a specific, defined time, to protect and verify the existence and publication of such content, as well as authenticate its authorship.

Additionally, the present invention provides an on-demand mechanism of providing watermarked content to requesting parties, and for detecting and reading the watermark(s) so as to verify ownership and/or integrity of the data.

Brief description of the drawings

Figure 1 shows components of the overall system architecture for, and method of, applying watermarking to digital content according to an aspect of the invention;

Figure 2 shows steps of a method of providing watermarked content according to an embodiment of the invention;

Figure 3 shows steps of a method of watermarking content according to an embodiment;

Figure 4 shows steps of a method of detecting a digital security mark according to an aspect of the invention;

Figure 5 is a table illustrating the change in quality of the signal due to watermarking;

Figure 6 is a table illustrating the degree of success against attacks due to the watermarking techniques of the present invention;

Figure 7 is a table illustrating detection results due to digital to analogue attacks;

Figure 8 shows a graph depicting the relative strength of a watermark against attacks according to an embodiment.

Detailed Description

The main system components employed for the provision of watermarked content will be described with reference to Figure 1, which shows data verification system 100 and internet content 101. System 100 comprises capturing module 106, watermarking module 112, digital archive 110 and digital library 108. System 100 is operated and maintained by a trusted third party that is financially, commercially and legally distinct from the requesting party. In this way, the trusted third party is an independent and impartial registration and verification authority, and therefore has no bias towards any other party. This is advantageous since the integrity of the capturing and watermark embedding process, as well as the detection and verification process, can be assured - the trusted third party has no connection with any other party or entity involved in the process. The trusted third party is typically a university or other independent institution.

An application form from a requesting party is received by the system 100. On receipt of the request on the application form, the online capturing module 106 takes a 'snapshot' of the requested webpage from internet 101. A still image of the complete content of the webpage is then created. In one embodiment, an administrator of the system is notified upon receipt of the application form and prompted to capture the website. However, in other embodiments this step may be automated.

A copy of the original, non-watermarked image is stored in Digital Archive 110, along with information extracted from the application form. Watermarks are embedded into the image in watermarking module 112. The watermarked image of the captured webpage, along with metadata, is stored in digital library 108.

The process steps are described with reference to Figure 2. As mentioned above, the process is initiated by the submission of an application form (202). The application form is a web-based form which may be accessed by any internet user. The application form is used by any party to request the watermarking of content on a particular website. The application is submitted to the system 100 using an internet portal but may also be sent via email, for example, or via other digital means. The information received by system 100 from the application form includes names, contact addresses and the address of the website to which digital watermarking techniques are to be applied. In an alternative embodiment, the content to be captured, watermarked and stored is provided by the requesting party directly. The requesting party, instead of or in addition to requesting a webpage to be captured, provides content directly to the trusted third party operating system 100 in the form of a digital document.

At 204, the online capturing module 106 validates the IP address of the requested web address so as to verify the original source of the web page and trace the route of the web page. Information about this route is stored (as will be described below) to avoid possible forgery of the web address in the future. Online capturing module 106 applies an algorithm which parses the source code of the web page so as to integrate the whole web page, regardless of its length, width and included items (text, images, banners etc.), into a single still image (step 206). The algorithm can be used with the various and differing technologies and programming languages used to construct and support a web page, as well as engines used by known internet browsers to present a web page to the final user. The still image of the webpage is of a standard format (e.g. JPEG).
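The address check at step 204 can be illustrated with a minimal sketch. It assumes that validation amounts to resolving the host name of the requested web address and recording the IP addresses returned; tracing the route is not shown. The function name `resolve_web_address` and the injectable `resolver` parameter are hypothetical and are not taken from the described system.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_web_address(url, resolver=socket.gethostbyname_ex):
    """Return the host name of `url` and the IP addresses it resolves to.

    `resolver` defaults to the standard-library DNS lookup; it is a
    parameter only so that the lookup can be replaced in tests.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name in {url!r}")
    _name, _aliases, addresses = resolver(host)
    # Reject anything that is not a syntactically valid IP address.
    validated = [str(ipaddress.ip_address(a)) for a in addresses]
    return {"host": host, "addresses": validated}
```

The returned record could be stored alongside the snapshot so that the source of the page can be checked later.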

At 208, an original, full size and original quality image of the captured webpage is stored (i.e. prior to watermarking) in archive 110. Metadata is directly assigned (202) to the specific snapshot and includes the web address of the captured website, a timestamp, the applicant's name and surname (where applicable), the requesting party's email address, telephone number and fax number, a description of the webpage, any comments provided by the applicant in relation to the request, and details of the administrator or personnel involved in the capturing of the webpage. In addition, the image is accompanied by technical and descriptive metadata based on international metadata standards. The metadata is also stored with the original image in the archive.
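One possible shape for the metadata record assigned to a snapshot is sketched below. The class `SnapshotMetadata`, the helper `make_metadata` and all field names are hypothetical; the description lists the kinds of information stored but does not define a schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class SnapshotMetadata:
    web_address: str
    captured_at: str          # ISO 8601 timestamp in UTC
    applicant_name: str
    applicant_email: str
    description: str = ""
    comments: str = ""
    captured_by: str = ""     # administrator or personnel involved


def make_metadata(web_address, applicant_name, applicant_email, now=None, **extra):
    """Build an immutable metadata record stamped with the capture time."""
    now = now or datetime.now(timezone.utc)
    return SnapshotMetadata(
        web_address, now.isoformat(), applicant_name, applicant_email, **extra
    )
```

Making the record immutable (`frozen=True`) is a design choice for this sketch: once assigned, the metadata of a snapshot should not change silently.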

Two watermarking algorithms are applied (212 and 214) to the image so as to embed in the snapshot two watermarks using two encryption keys. The first watermark is a robust and invisible watermark since it is embedded using an identification number which is unique at a global level (it is based on a DOI (Digital Object Identifier) schema to ensure its uniqueness). Extraction of this watermark is only possible using the unique identification number and so it can be verified, by extracting the watermark, that the digital image is owned by the administrator of the digital library, i.e. the trusted third party, and that it has been stored in the trusted third party's library. In the case of a dispute, where two parties claim ownership of the content, only the true owner will have access to the DOI number (via the trusted third party, using the securely stored metadata). If this DOI number is provided to the detector and the first watermark is extracted, that party will be shown to be the true owner of the content.
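The idea of a watermark that can only be detected with the right key can be illustrated with a toy additive spread-spectrum scheme. This is a swapped-in textbook technique used for illustration, not the embedding algorithm of the invention, which is not specified here. The functions `embed` and `detect`, the `strength` parameter and the example key are assumptions.

```python
import random


def _pattern(key, n):
    """Pseudo-random +1/-1 sequence that only the key holder can regenerate."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(pixels, key, strength=4.0):
    """Add a faint key-dependent pattern to a flat list of pixel values."""
    pattern = _pattern(key, len(pixels))
    return [p + strength * w for p, w in zip(pixels, pattern)]


def detect(pixels, key, strength=4.0):
    """Correlate the content with the key's pattern and compare to a threshold."""
    pattern = _pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    corr = sum((p - mean) * w for p, w in zip(pixels, pattern)) / len(pixels)
    return corr > strength / 2
```

Because the detector subtracts the mean before correlating, a uniform brightness change does not remove the mark in this sketch, while a detector fed a different key sees only noise.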

The second watermark is weak and invisible and is used to provide verification of the integrity of the image and control copying of the image. This watermark is embedded into the content using a constant, well known number. Extraction of this watermark can therefore be achieved fairly easily using the constant, well known number acting as a 'key'. Once extracted, this second watermark acts as a "never copy" instruction. Compliant devices are equipped with the detector of the watermarking mechanism and have knowledge of the constant key. If the detector of the compliant device detects the watermark it verifies the "never copy" instruction and forbids the replication of the content. If this watermark is not detected (or has changed), it can be concluded that the content has been altered and its authenticity can therefore not be verified.
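A weak (fragile) watermark keyed by a publicly known constant can be sketched with least-significant-bit embedding. This is an illustrative stand-in, not the algorithm of the invention; `CONSTANT_KEY`, `embed_fragile` and `is_intact` are hypothetical names, and pixel values are assumed to be integers.

```python
import random

CONSTANT_KEY = 20130221  # hypothetical publicly known constant


def _bits(key, n):
    """Key-dependent bit sequence, reproducible by any compliant detector."""
    rng = random.Random(key)
    return [rng.getrandbits(1) for _ in range(n)]


def embed_fragile(pixels, key=CONSTANT_KEY):
    """Overwrite the least significant bit of each pixel with a key bit."""
    return [(p & ~1) | b for p, b in zip(pixels, _bits(key, len(pixels)))]


def is_intact(pixels, key=CONSTANT_KEY):
    """True only if every least significant bit still matches the key bits."""
    return all((p & 1) == b for p, b in zip(pixels, _bits(key, len(pixels))))
```

In this toy version only changes that disturb a least significant bit are noticed; a practical fragile mark would be designed so that any processing of the image disturbs it.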

Transaction tracking is also facilitated by the use of a weak watermark. This type of watermark is typically embedded into the content at each stage of its distribution. In the event that content is found to be in the possession of a non-authorized party, the second watermark is extracted using the well known key, and the extracted watermark (where the detector outputs a numerical value, rather than a Boolean response) indicates the distribution source (i.e. the point from which the content fell into the wrong hands).

When both watermarks are detected in the digital content, it can be verified that the snapshot is stored in the system 100 (and therefore owned by a party known to the trusted third party) and that the image has not in any way been altered, processed or manipulated (e.g. that colors, texts, images have not been changed and that cropping, rotation, resizing or any other type of processing has not occurred). The watermarked image is stored in the digital library at step 216. To ensure long-term, safe and trusted storage of the watermarked images, the images, in addition to storage in a secure database, are also stored on optical and magnetic storage media. The snapshots are stored based on standard formats. Lossless data types are typically used (e.g. .tiff). A backup routine produces full content surrogates each week and ensures zero data loss in case of hardware and software failure. The optical media (e.g. DVDs) which store images are renewed every 2 years so as not to lose data due to deterioration. In addition, the images are migrated to new storage media in the case that the maximum lifespan of the old storage media has been reached, based on the manufacturer's estimate, or on discovery of read errors during storage media tests. Quality checks (bits/bytes comparison, checksum evaluation) are conducted following such migration to ensure data integrity. At step 218, a read-only copy of the watermarked image is sent to the requesting party.
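The checksum evaluation carried out after a migration can be sketched as follows. SHA-256 is an assumed choice of checksum, and the function names `sha256_of` and `migration_ok` are hypothetical; the description does not name a specific algorithm.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migration_ok(source_path, target_path):
    """True if the migrated copy has the same checksum as the source file."""
    return sha256_of(source_path) == sha256_of(target_path)
```

A mismatch would indicate that the copy on the new medium differs from the source and that the migration should be repeated.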

The API (Application Programming Interface) which supports the watermarking module 112 supports both embedding and detection of watermarks. Consequently, system 100 provides a watermarking mechanism which handles both the watermarking of content and the detection of the watermarks, for example in the case of a dispute between two or more parties claiming ownership of content. The API is structured as an independent dynamic link library which is universally applicable by just referencing the corresponding class.

The steps involved in the process of embedding the robust watermark into the image are described with reference to Figure 3. A unique number is generated (302) and is used to embed the first watermark into the image (304). This number is required in order for the detector to detect the watermark in the digital content (i.e. the snapshot of the website).

For security, the unique number used to embed the first watermark is generated by watermarking module 112 and then provided, via secure means, to one of a limited number of known personnel. The unique number key, and the individual to which it has been provided, is stored. After the individual has embedded the watermark using the unique number key provided to him by an administrator of the system 100, he submits the watermarked content, along with the unique number key, to the administrator for the purposes of adding the watermarked content to the digital library. The administrator then checks that the unique number key provided to the individual matches the unique number key the individual has submitted along with the watermarked content. In this way, unauthorized or unknown individuals are prevented from populating the digital library with potentially malicious data.
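The administrator's check that a submitted key matches the key originally issued can be sketched as below. The in-memory `issued_keys` store, the random hexadecimal key format and the function names are assumptions made for illustration; the description does not specify how keys are generated or recorded.

```python
import hmac
import secrets

issued_keys = {}  # hypothetical record: person -> key issued to that person


def issue_key(person):
    """Generate a key, remember who received it, and hand it out."""
    key = secrets.token_hex(16)
    issued_keys[person] = key
    return key


def accept_submission(person, submitted_key):
    """Accept content only if the submitted key is the one issued to `person`."""
    expected = issued_keys.get(person)
    # compare_digest avoids leaking how many characters matched.
    return expected is not None and hmac.compare_digest(expected, submitted_key)
```

A submission from a person with no recorded key, or with a key that differs from the recorded one, is rejected.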

Prior to embedding the watermark, supplementary information relating to the original, non-marked digital content is stored in the library using the DOI number (i.e. the key for the first watermark). This information is also retrieved using the DOI number, thereby linking it to the watermarked digital content. Such supplementary information typically includes the format of the image, aspect ratio, average colour and histogram. When the DOI number is created for an image to be embedded in the image as a watermark, an algorithm automatically runs which extracts supplementary information from the image. The stored supplementary information may be used in instances where the original image has been lost (due, for example, to technical reasons) or altered (by human or computer intervention, for example).
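Extraction of the supplementary information listed above (format, aspect ratio, average colour and histogram) can be sketched as follows. The function name, the representation of the image as a list of RGB tuples and the use of a 256-bin intensity histogram are assumptions.

```python
def extract_supplementary(pixels, width, height, image_format):
    """Summarise an image given as a row-major list of (r, g, b) tuples."""
    if len(pixels) != width * height or not pixels:
        raise ValueError("pixel count does not match width * height")
    n = len(pixels)
    average = tuple(sum(p[c] for p in pixels) / n for c in range(3))
    histogram = [0] * 256
    for r, g, b in pixels:
        histogram[(r + g + b) // 3] += 1  # bin by mean intensity
    return {
        "format": image_format,
        "aspect_ratio": width / height,
        "average_colour": average,
        "histogram": histogram,
    }
```

The resulting dictionary could then be stored under the DOI number, for example `store[doi] = extract_supplementary(...)`, so that it can be retrieved with the same key used for the first watermark.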

A watermark detection process is described with reference to Figure 4. A comparison between the original content and watermarked content which has been manipulated provides a countermeasure against de-synchronization attacks as mentioned above. Image registration enables the original copy of the image to be located. If the original copy of the image is found, the detector is more likely to achieve synchronization and detect the watermark. The necessary information required to achieve synchronization is, in many cases, the original content. However, as discussed below, supplementary information may also be used to do this.

The detection process is initiated at step 402. The DOI number key is fed into the detector, and a detection algorithm is executed in relation to the watermarked content. The detector may detect the presence of a watermark in the watermarked image (412). If the detector does not detect the watermark (in the case that the watermarked content has been manipulated and therefore the detector is desynchronized), the original, non-watermarked content is retrieved from the digital archive. To find the original image, a QBIC (query by image content) is constructed at step 404 which enables a search based on the actual content of the images in the archive. A search is then conducted of the digital archive at step 406. If the original watermarked copy cannot be found, the possibility remains that the watermark is not detected (step 414). If the original watermarked copy is found, it is 'registered' in the archive by assigning to it the first identification number. This allows the image to be located in future in the library. Upon finding the original content, the detector compares the original content with the watermarked content and typically derives supplementary information so as to perform any estimates or adjustments that are necessary to achieve synchronization. The detection process is re-applied at step 410 and the watermark is found at step 412.
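The control flow of the detection process can be sketched as a function that takes the detector, the archive search and the re-synchronization step as parameters. All three callables and the function name are hypothetical placeholders; only the order of the steps follows the description above.

```python
def detect_with_fallback(content, key, detect, find_original, resynchronise):
    """Try direct detection, then retry after aligning with the original.

    detect(content, key) -> bool
    find_original(content) -> original content, or None if not found
    resynchronise(content, original) -> content aligned to the original
    """
    if detect(content, key):
        return True                      # watermark found (step 412)
    original = find_original(content)    # content-based search (steps 404-406)
    if original is None:
        return False                     # watermark not detected (step 414)
    aligned = resynchronise(content, original)
    return detect(aligned, key)          # detection re-applied (step 410)
```

Passing the steps in as functions keeps the sketch independent of any particular watermarking or image-search implementation.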

If the original content has not been found, a supplementary information database can be queried to retrieve the supplementary information extracted for the original content. By comparison of the supplementary information and the watermarked content, any alterations to the content can be identified. When data load on the system is high, the supplementary information database is queried instead of finding the original content to save time, since it typically takes longer to derive the supplementary information from the original content than to pre-process the content and find the supplementary information which has already been stored.

The supplementary information associated with the image by the DOI number enhances the robustness of the watermark since it provides a defense against geometrical attacks during the detection process which may result in the failure of the detector to detect the 'robust' watermark. As such, the supplementary information helps to synchronize the detector and detect the watermark. For example, in the case of an image that has been distorted, the key used to embed the watermark is fed into the detector. If the watermark is not detected, the reason for this may be that the detector is desynchronized due to the distortion. In this case, the supplementary information is retrieved, using the first identification number, in order to determine the extent to which the distortion occurred. Knowledge of the original image enables the detection algorithm to compensate for changes to the watermarked image caused by manipulation of that image.

The value returned by the detector indicates the watermark's existence (a Yes or No Boolean response). However, in other embodiments, the detector may return an integer value, which can indicate information regarding the digital object.

Figure 5 shows the peak signal to noise ratio (PSNR), which provides a measure of the extent of change of quality introduced due to the embedding of watermarks. The higher the PSNR, the smaller the difference in signal quality between the original image and the watermarked image. The results of various geometrical attacks are provided in Figure 6, where the average score indicates the degree of success against the attacks.
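For reference, PSNR is conventionally computed as 10·log10(MAX² / MSE), where MSE is the mean squared difference between the two images and MAX is the largest possible sample value. A minimal sketch, assuming 8-bit samples given as flat lists (the function name is hypothetical):

```python
import math


def psnr(original, watermarked, max_value=255.0):
    """Peak signal to noise ratio in decibels between two equal-length signals."""
    if len(original) != len(watermarked) or not original:
        raise ValueError("signals must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For example, changing every 8-bit sample by one level gives an MSE of 1 and a PSNR of about 48.13 dB.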

Figure 7 details the detector outputs for a digital to analogue attack, in which a small number of images compressed with a JPEG algorithm were printed on plain paper and then scanned back into digital form.

The watermarking technique of the present invention requires consideration of the data payload embedded into the content and the detector's ability to detect multiple watermarks. Data payload refers to the number of bits a watermark encodes within a unit of time or within a digital object. A disadvantage of embedding a substantial number of bits into the content is the extent of distortion to the original content. In the present invention, three zero-bit watermarks (where the detector's output is either one or zero) and one 14-bit watermark encoding 16384 different fingerprints (unique numbers) are used. Distortion introduced by the encoding of 17 bits is imperceptible, as indicated by the calculated PSNR (peak signal to noise ratio) values discussed above. Here, multiple watermarks refers to the detector's potential for detecting and distinguishing portions of different watermarks embedded into a single piece of content without confusion. The algorithm which generates the watermarks makes this possible by allowing the detector to resolve the signal as either above or below a determined threshold.
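The payload arithmetic can be made concrete: 14 bits distinguish 2^14 = 16384 fingerprints, and each bit can be read back by comparing a detector response with a threshold. The sketch below assumes one real-valued response per bit and a threshold of zero; the function names and that mapping are illustrative assumptions, not the detector described above.

```python
FINGERPRINT_BITS = 14  # 2 ** 14 == 16384 distinct fingerprints


def to_bits(fingerprint):
    """Most-significant-bit-first list of the fingerprint's 14 bits."""
    if not 0 <= fingerprint < 2 ** FINGERPRINT_BITS:
        raise ValueError("fingerprint out of range")
    return [(fingerprint >> i) & 1 for i in reversed(range(FINGERPRINT_BITS))]


def from_responses(responses, threshold=0.0):
    """Rebuild the fingerprint by thresholding one detector response per bit."""
    value = 0
    for r in responses:
        value = (value << 1) | (1 if r > threshold else 0)
    return value
```

A zero-bit watermark corresponds to a single such thresholded response, reporting only presence or absence.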

The detector used in the proposed DRM system reveals the existence of 11 watermarks. Three of them correspond to the three zero-bit schemes while the remaining eight responses are used for the encoding of the fingerprint.

Using the present invention, in addition to being able to verify ownership of content or a time at which content is published, or that content has not been tampered with, for example, a person who is the target or victim of harassment or abuse on a social networking site is able to request that the webpage containing such offensive material is captured by the techniques of the present invention. Based on the verification techniques disclosed herein, the actual publication of such content can be verified (long after, for example, the particular page has been edited and the offensive content is no longer publicly accessible or viewable). Whilst the invention has been described with reference to a still image derived from webpage content, the techniques described could also be used to watermark video, sound and other moving images or animations.

Claims

1. A method of watermarking digital content by a trusted third party, comprising:

capturing an image of the digital content;

embedding a first watermark into the digital content, wherein the presence of the first watermark indicates the owner of the digital content;

embedding a second watermark into the digital content, wherein the presence of the second watermark indicates integrity of the digital content;

storing the digital content in a repository.

2. The method of claim 1, wherein the first watermark is embedded using a unique image identification key.

3. The method of claim 1 or claim 2, wherein the second watermark is embedded using a constant, well known key.

4. The method of any preceding claim, wherein the digital content is content contained in a webpage and wherein the step of capturing an image comprises determining and validating the IP address of the webpage.

5. The method of claim 4, wherein the step of taking a snapshot comprises taking a still image of a website, wherein the still image comprises all content of the website.

6. The method of any of claims 4 or 5, wherein the step of taking a snapshot further comprises parsing the source code of the website.

7. The method of any preceding claim, further comprising the step of extracting supplementary information relating to the digital content prior to embedding the first and second watermarks, and storing the supplementary information in the repository.

8. The method of any preceding claim, further comprising storing, in a digital archive, a copy of the original digital content.

9. The method of any preceding claim, further comprising generating an authenticity certificate, wherein the authenticity certificate comprises the digital content, the unique image identification key, the constant key and a digital signature.

10. A digital verification system, comprising

means for capturing a still image of digital content

means for embedding a first watermark into the digital content

means for embedding a second watermark into the digital content

a repository for storing the digital content,

wherein the system is operated and maintained by a trusted third party.

11. The system of claim 10, wherein the trusted third party is a university.

12. The system of claim 10 or claim 11, further comprising means for detecting the first and second watermarks.

13. A method of capturing digital content, comprising

receiving, by an entity, a request for capturing digital content from a requesting party,

capturing, by the entity, the requested digital content,

watermarking, by the entity, the captured digital content and

storing, by the entity, the watermarked digital content in a secure repository, wherein the first entity is a public organisation which is commercially, financially and legally neutral.

14. The method of claim 13, wherein the entity is a university.

15. The method of claim 13, further comprising assigning metadata to the image of the digital content, the metadata identifying the requesting party, and wherein the metadata further comprises a time stamp indicating the time the image was captured.

16. The method of claim 13, 14 or 15, wherein the digital content is a webpage, and wherein the step of capturing an image comprises capturing an image of the complete website.