US20140032264A1 - Data refining engine for high performance analysis system and method
- Publication number
- US20140032264A1 (application US13/951,248)
- Authority
- US
- United States
- Prior art keywords
- price
- product
- identifier
- uri
- specific
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Definitions
- This disclosure relates to a method and system to analyze price and product information.
- Search engines such as Google, Bing, and others search and index vast quantities of information on the Internet.
- “Crawlers” (a.k.a. “spiders”) utilize URLs obtained from a “queue” to obtain content, usually from web pages.
- the crawlers or other software store and index some of the content. Users can then search the indexed content, view results, and follow hyperlinks back to the original source or to the stored content (the stored content often being referred to as a “cache”).
- Computing resources to crawl and index are not limitless.
- the URL queues are commonly prioritized to direct crawler resources to web page servers which can accommodate the traffic, which do not block crawlers (such as according to “robots.txt” files commonly available from webpage servers), which experience greater traffic from users, and which experience more change in content.
- Search engines, if presented with a query, will find corresponding products. For example, it is possible to search for “men's shoes” and then be presented with a webpage of search results covering hundreds of thousands of webpages for men's shoes. The search results may be narrowed further by category of men's shoes, brand, and store. Search engines have been incorporated into online stores, wherein a user may search for products by keyword and/or by category, and results can be ordered by price.
- Price history is only narrowly viewed and, when it is, never in the context of a rich attribute set which explores, in detail, which attributes are associated with changes in price. Price histories are not made available in real time, and do not allow intricate comparisons based on stores, merchants, brands, regions, time/date, and other dimensions.
- FIG. 1 is a network and device diagram illustrating exemplary computing devices configured according to embodiments disclosed in this paper.
- FIG. 2 is a functional block diagram of an exemplary Indix Server 200 computing device and some data structures and/or components thereof.
- FIG. 3 is a functional block diagram of the Indix Datastore 300 illustrated in the computing device of FIG. 2 .
- FIG. 4 is a flowchart illustrating an embodiment of an Analytics Routine 400 .
- FIG. 5 is a flowchart illustrating an embodiment of a Core Price Routine 500 .
- FIG. 6 is a flowchart illustrating an embodiment of an Insights Routine 600 .
- FIG. 7 is a flowchart illustrating an embodiment of a Volatility Routine 700 .
- FIGS. 8A-8C are flowcharts illustrating embodiments of a Substitution Routine 800 .
- FIG. 9 is a flowchart illustrating an embodiment of a Mix Routine 900 .
- FIG. 10 is a flowchart illustrating an embodiment of a Prediction Routine 1000 .
- FIG. 11 is a flowchart illustrating an embodiment of a Competition Routine 1100 .
- FIG. 12 is a flowchart illustrating an embodiment of a Promotion Routine 1200 .
- FIG. 13 is a flowchart illustrating an embodiment of a Leadership Routine 1300 .
- FIG. 14 is a flowchart illustrating an embodiment of a Premium Routine 1400 .
- FIG. 15 is a flowchart illustrating an embodiment of a Price Range Routine 1500 .
- FIG. 16 is a flowchart illustrating an embodiment of a Reach Routine 1600 .
- FIG. 17 is a flowchart illustrating an embodiment of a User Contact Routine 1700 .
- the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.”
- the term “connected,” “coupled,” or any variant thereof means any connection or coupling, either direct or indirect, between two or more elements; the coupling or connection between the elements can be physical, logical, or a combination thereof.
- the words, “herein,” “above,” “below,” and words of similar import, when used in this application shall refer to this application as a whole and not to particular portions of this application.
- words using the singular may also include the plural while words using the plural may also include the singular.
- the word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of one or more of the items in the list.
- URI Uniform Resource Identifier
- a Uniform Resource Identifier is a string of characters used to identify a resource on a computing device and/or a network, such as the Internet. Such identification enables interaction with representations of the resource using specific protocols. “Schemes” specifying a syntax and associated protocols define each URI.
- RFC Request for Comments
- IETF Internet Engineering Task Force
- a URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme.
- the scheme name consists of a letter followed by any combination of letters, digits, and the plus (“+”), period (“.”), or hyphen (“-”) characters; and is terminated by a colon (“:”).
- the hierarchical portion of the URI is intended to hold identification information that is hierarchical in nature. Often this part is delineated with a double forward slash (“//”), followed by an optional authority part and an optional path.
- the optional authority part holds an optional user information part (not shown) terminated with “@” (e.g. username:password@), a hostname (i.e., domain name or IP address, here “example.com”), and an optional port number preceded by a colon “:”.
- the path part is a sequence of one or more segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash (“/”). If a URI includes an authority part, then the path part may be empty.
- the optional query portion is delineated with a question mark and contains additional identification information that is not necessarily hierarchical in nature. Together, the path part and the query portion identify a resource within the scope of the URI's scheme and authority.
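- The URI parts described above can be illustrated with a minimal sketch using Python's standard `urllib.parse` module. The example URI, the function name `describe_uri`, and the dictionary keys are assumptions chosen for illustration; they do not appear in this disclosure.

```python
from urllib.parse import urlsplit, parse_qs


def describe_uri(uri: str) -> dict:
    """Split a URI into the parts discussed above (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,        # text before the first ":"
        "username": parts.username,    # optional user information, None if absent
        "host": parts.hostname,        # domain name or IP address
        "port": parts.port,            # optional port number, None if absent
        # Path segments are separated by "/"; empty segments are dropped.
        "path_segments": [s for s in parts.path.split("/") if s],
        # The query portion follows "?" and need not be hierarchical.
        "query": parse_qs(parts.query),
    }


# Hypothetical example URI, used only to exercise the sketch.
example = describe_uri("http://example.com:8042/over/there?name=ferret")
```

- In this sketch the authority part is `example.com:8042`, the path has the two segments `over` and `there`, and the query carries the single pair `name=ferret`.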
- RFC 3986 provides additional information related to the syntax and structure of URIs.
- RFC 3986 is hereby incorporated by reference, for all purposes.
- References to “Product” herein shall be understood to mean “products or services.” References to “Product Attribute” herein shall be understood to mean “product or service attribute.” As used herein, “Products” are associated with iPIDs.
- an “iPID” or iPID 330 is a unique identifier assigned within the Indix System to a URI for a product, such as URI 305 .
- the iPID 330 may be, for example, a hash of URI 305 .
- a “Master iPID” or “MPID” or MPID 332 is an iPID 330 assigned to a group of iPIDs 330 derived from URIs 305 which lead to webpages offering the same Product for sale.
- An MPID is generally meant to identify a single Product, generally produced by a common manufacturer, though the Product may be distributed and sold by multiple parties.
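- As a minimal sketch of the identifiers described above, an iPID can be computed as a hash of a URI, and one iPID from each group of equivalent listings can serve as the group's MPID. The choice of SHA-256, the `product_key` values used to decide that two URIs offer the same Product, and the rule of picking the smallest iPID as the MPID are all assumptions made for illustration; the disclosure does not specify them.

```python
import hashlib
from collections import defaultdict


def make_ipid(uri: str) -> str:
    """Return a hash of the URI as the iPID (SHA-256 is an assumed choice)."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()


def assign_mpids(product_key_by_uri: dict) -> dict:
    """Map each iPID to an MPID shared by all iPIDs for the same Product.

    product_key_by_uri maps a URI to a hypothetical key that is equal for
    URIs whose webpages offer the same Product for sale.
    """
    groups = defaultdict(list)
    for uri, product_key in product_key_by_uri.items():
        groups[product_key].append(make_ipid(uri))

    mpid_by_ipid = {}
    for ipids in groups.values():
        mpid = min(ipids)  # assumed rule: the smallest iPID acts as the MPID
        for ipid in ipids:
            mpid_by_ipid[ipid] = mpid
    return mpid_by_ipid
```

- Because the MPID is itself one of the iPIDs in the group, it can be stored in the same column type as any other iPID.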
- iPIDs and MPIDs are associated with Price Attribute 340 records and Product Attribute 345 records.
- a Price Attribute 340 record may comprise one or more records comprising, for example, values which encode an iPRID which may be an identifier for a price observed at a particular time, an iPID (discussed above), a Product Name (a “Product Name” value in this record may also be referred to herein as a “Product”), a Standard Price, a Sale, a Price, a Rebate amount, a Price Instructions record (containing special instructions relating to a price, such as that the price only applies to students), a Currency Type, a Date and Time Stamp, a Tax record, a Shipping record (indicating costs relating to shipping to different locations, whether tax is calculated on shipping costs, etc.), a Price Validity Start Date, a Price Validity End Date, a Quantity, a Unit of Measure Type, a Unit of Measure Value, a Merchant Name (with the name of a merchant from whom the Product is available; a “Merchant Name” value in this record may also be referred to herein as
- a Product Attribute 345 record may comprise, for example, values encoding features of or describing a Product.
- the entire Product Attribute 345 schema may comprise thousands of columns, though only tens or hundreds of the columns may be applicable to any given Product.
- An example set of values in a Product Attribute 345 record for a ring is as follows: Title, “Sterling Silver Diamond & Blue Topaz Ring;” Brand, “Blue Nile;” Category (such as, for example, a Category 335 in a category schema), “rings;” Metal Name, “silver;” Stone Shape, “cushion;” Stone Name, “topaz;” Width, “3 mm;” Stone Color, “blue;” Product Type, “rings,” birthstone, “September;” and Setting Type, “prong.”
- An example set of Product Attributes 345 for a shoe is as follows: Brand, “Asics;” Category (such as, for example, a Category 335 in a category schema or tax
- Content comprises text, graphics, images (including still and video images), audio, graphical arrangement, and instructions for graphical arrangement, including HTML and CSS instructions which may, for example, be interpreted by browser applications.
- Event is information generally in news or current events. Events may be found in Content. Listing Pages, Product Pages, and Event Pages are all examples of Webpage Types 350 .
- PriceDNA comprises a Product Attribute 345 record, one or more Price Attribute 340 records, the output of the Core Price Routine 500 (generally found in the Core Price 380 records), and the output of the Insights Routine 600 (generally found in the Insights 375 records).
- a “Brand” is a family or group of Products sold by or under a common trademark, such as the “Nike®” Brand, which sells under this trademark a family of shoes, exercise equipment, and other apparel. Brand is a value within a Product Attribute 345 record.
- a “Store” is an online or physical sales venue.
- a Store is a value within a Price Attribute 340 record.
- a “Merchant” is an operator of one or more Stores.
- a Merchant is a value in a Price Attribute 340 record.
- an Analysis Routine 400 obtains Price Attribute 340 and Product Attribute 345 records from the Indix Database 300 shortly after the records are produced following a crawl of webpages accessed via the URIs 305 .
- the Analysis Routine 400 merges the records, performs a Core Price Routine 500 to develop core price information, such as changes in price, and exports the records and the result of the Core Price Routine 500 to a sequential file which is indexed.
- the result of the Core Price Routine 500 may be searched and accessed by users in close to real-time.
- the Analysis Routine 400 also performs an Insight Routine 600 .
- the Insight Routine 600 comprises a set of sub-routines for deriving additional information from the Price Attribute 340 and Product Attribute 345 records and from the output of the Core Price Routine 500 .
- the Insight Routine 600 identifies what Product Attributes 345 and Price Attributes 340 across the datasets are associated with the changes in price.
- the output of the Insight Routine 600 is also stored in the Indix Database 300 and may be searched and accessed by users, though the accessible values may be refreshed more slowly than the data from the Core Price Routine 500 .
- a User Contact Routine 1700 allows users to search and obtain information and to set alerts relative to the information in the Indix Database 300 .
- FIG. 1 is a network and device diagram illustrating exemplary computing devices configured according to embodiments disclosed in this paper. Illustrated in FIG. 1 are an Indix Server 200 and an Indix Database 300 . The Indix Database 300 is discussed further in relation to FIG. 3 .
- Also illustrated in FIG. 1 is a Crawl Agent 400, representing Crawl Agents 1 to N, and a Crawl Agent Database 500.
- the Crawl Agent 400 and Crawl Agent Database 500 are used to crawl webpages accessed via the URIs 305 .
- Also illustrated in FIG. 1 is a Client Device 105, such as a mobile or non-mobile computer device.
- the Client Device 105 is an example of computing devices such as, for example, a mobile phone, a tablet, laptop, personal computer, gaming computer, or media playback computer.
- the Client Device 105 represents any computing device capable of rendering Content in a browser or an equivalent user-interface. Client Devices are used by “users.”
- the Client Device 105 may interact with the User Contact Routine 1700 .
- Also illustrated in FIG. 1 is a Web Server 115, which may serve Content in the form of webpages or equivalent output in response to URIs, such as URI 305.
- Ecommerce Platform 160 may provide ecommerce services, such as website and/or webpage hosting via webpage templates comprising HTML and CSS elements.
- Customers of Ecommerce Platform 160 may complete the webpage templates with Content and serve the webpages and websites from, for example, Web Server 115 .
- Interaction among devices illustrated in FIG. 1 may be accomplished, for example, through the use of credentials to authenticate and authorize a machine or user with respect to other machines.
- the computing machines may be physically separate computing devices or logically separate processes executed by a common computing device.
- Certain components are illustrated in FIG. 1 as connecting directly to one another (such as, for example, the Indix Database 300 to the Indix Server 200 ), though the connections may be through the Network 150 . If these components are embodied in separate computers, then additional steps may be added to the disclosed invention to recite communicating between the components.
- the Network 150 comprises computers, network connections among the computers, and software routines to enable communication between the computers over the network connections.
- Examples of the Network 150 comprise an Ethernet network, the Internet, and/or a wireless network, such as a GSM, TDMA, CDMA, EDGE, HSPA, LTE or other network provided by a wireless service provider, or a television broadcast facility. Connection to the Network 150 may be via a Wi-Fi connection. More than one network may be involved in a communication session between the illustrated devices. Connection to the Network 150 may require that the computers execute software routines which enable, for example, the seven layers of the OSI model of computer networking or equivalent in a wireless phone network.
- This paper may discuss a first computer as connecting to a second computer (such as a Crawl Agent 400 connecting to the Indix Server 200 ) or to a corresponding datastore (such as to Indix Database 300 ); it should be understood that such connections may be to, through, or via the other of the two components (for example, a statement that a computing device connects with or sends data to the Indix Server 200 should be understood as saying that the computing device may connect with or send data to the Indix Database 300 ).
- References herein to “database” should be understood as equivalent to “datastore.”
- the computers and databases may be provided by common (or separate) physical hardware and common (or separate) logic processors and memory components. Though discussed as occurring within one computing device, the software routines and data groups used by the software routines may be stored and/or executed remotely relative to any of the computers through, for example, application virtualization.
- FIG. 2 is a functional block diagram of an exemplary Indix Server 200 computing device and some data structures and/or components thereof.
- the Indix Server 200 in FIG. 2 comprises at least one Processing Unit 210 , Indix Server Memory 250 , a Display 240 and Input 245 , all interconnected along with the Network Interface 230 via a Bus 220 .
- the Processing Unit 210 may comprise one or more general-purpose Central Processing Units (“CPU”) 212 as well as one or more special-purpose Graphics Processing Units (“GPU”) 214 .
- the components of the Processing Unit 210 may be utilized by the Operating System 255 for different functions required by the routines executed by the Indix Server 200 .
- the Network Interface 230 may be utilized to form connections with the Network 150 or to form device-to-device connections with other computers.
- the Indix Server Memory 250 generally comprises a random access memory (“RAM”), a read only memory (“ROM”), and a permanent mass storage device, such as a disk drive or SDRAM (synchronous dynamic random-access memory).
- the Indix Server Memory 250 stores program code for software routines, such as, for example, Analysis Routine 400 , Core Price Routine 500 , Insight Routine 600 , Volatility Routine 700 , Substitution Routine 800 , Mix Routine 900 , Prediction Routine 1000 , Competition Routine 1100 , Promotion Routine 1200 , Leadership Routine 1300 , Premium Routine 1400 , Price Range Routine 1500 , Reach Routine 1600 , and User Contact Routine 1700 as well as, for example, browser, email client and server routines, client applications, and database applications (discussed further below). Additional data groups for routines, such as for a webserver and web browser, may also be present on and executed by the Indix Server 200 and the other computers illustrated in FIG. 1 .
- Webserver and browser routines may provide an interface for interaction among the computing devices, for example, through webserver and web browser routines which may serve and respond to data and information in the form of webpages and html documents or files.
- the browsers and webservers are meant to illustrate machine-interface and user-interface enabling routines generally, and may be replaced by equivalent routines for serving and rendering information to and in interfaces in a computing device (whether in a web browser or in, for example, a mobile device application).
- the Indix Server Memory 250 also stores an Operating System 255 .
- These software components may be loaded from a non-transient Computer Readable Storage Medium 295 into Indix Server Memory 250 of the computing device using a drive mechanism (not shown) associated with a non-transient Computer Readable Storage Medium 295 , such as a floppy disc, tape, DVD/CD-ROM drive, memory card, or other like storage medium.
- software components may also or instead be loaded via a mechanism other than a drive mechanism and Computer Readable Storage Medium 295 (e.g., via Network Interface 230 ).
- the computing device 200 may also comprise hardware supporting input modalities, Input 245 , such as, for example, a touchscreen, a camera, a keyboard, a mouse, a trackball, a stylus, motion detectors, and a microphone.
- the Input 245 may also serve as a Display 240 , as in the case of a touchscreen display which also serves as Input 245 , and which may respond to input in the form of contact by a finger or stylus with the surface of the Input 245 .
- the computing device 200 may also comprise or communicate via Bus 220 with Indix Datastore 300 , illustrated further in FIG. 3 .
- Bus 220 may comprise a storage area network (“SAN”), a high speed serial bus, and/or other suitable communication technology.
- the Indix Server 200 may communicate with the Indix Datastore 300 via Network Interface 230 .
- the Indix Server 200 may, in some embodiments, include many more components than those shown in this Figure. However, it is not necessary that all of these generally conventional components be shown in order to disclose an illustrative embodiment.
- FIG. 3 is a functional block diagram of the Indix Datastore 300 illustrated in the computing device of FIG. 2 .
- the components of the Indix Datastore 300 are data groups used by routines and are discussed further herein in the discussion of other of the Figures.
- the data groups used by routines illustrated in FIG. 3 may be represented by a cell in a column or a value separated from other values in a defined structure in a digital document or file. Though referred to herein as individual records or entries, the records may comprise more than one database entry.
- the database entries may be, represent, or encode numbers, numerical operators, binary values, logical values, text, string operators, joins, conditional logic, tests, and similar.
- FIG. 4 is a flowchart illustrating an embodiment of an Analytics Routine 400 .
- the Analytic Routine 400 may be performed by, for example, the Indix Server 200 .
- the Analytic Routine 400 obtains a new set of Price Attribute 340 records and a new set of Product Attribute 345 records, with an assigned MPID 332 and Category 335. This may occur as frequently as URIs 305 are crawled and the webpages therefrom are parsed into Parse Results 325 comprising new Price Attribute 340 and Product Attribute 345 records with an assigned MPID 332 and Category 335.
- the Analytic Routine 400 performs the Core Price Routine 500 (discussed further below).
- the Analytic Routine 400 appends the then-current Price Attribute 340 record (of box 405 ) to a set of Price Attribute 340 records associated with each iPID 330 (each iPID 330 may be associated with a set of Price Attribute 340 records).
- the Analytic Routine 400 merges the then-current Product Attribute 345 record (of box 405) into a Product Attribute 345 record associated with each iPID 330 (each iPID 330 may be associated with one Product Attribute 345 record).
- new values overwrite old values unless the old record is longer or unless the old record otherwise is judged to be of higher quality (such as if the old record uses fewer words, but the words are less common than the words in the new record); if a new record does not have a value where an old value exists, the old value may be left.
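- The merge rule above can be sketched as follows. This is one possible reading, applied field by field: a new value overwrites the old one unless the old value is longer, and an old value is kept when the new record has no value for that field. The function name and the dictionary representation of a Product Attribute 345 record are assumptions, and the separate quality judgment based on word rarity is not implemented here.

```python
def merge_product_attributes(old: dict, new: dict) -> dict:
    """Merge a new Product Attribute record into an old one (hypothetical helper)."""
    merged = dict(old)
    for field, new_value in new.items():
        if new_value is None or new_value == "":
            continue  # the new record has no value here: keep the old value
        old_value = old.get(field)
        if old_value and len(str(old_value)) > len(str(new_value)):
            continue  # the old value is longer: treat it as the better value
        merged[field] = new_value
    return merged
```

- For example, an old Title of “Sterling Silver Diamond & Blue Topaz Ring” would survive a new Title of “Topaz Ring”, while a Brand value present only in the new record would be added.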
- the output of the Core Price Routine 500 and of boxes 410 and 415 is written to a Sequential File 365 record, which is stored, for example, in the Indix Database 300 and is indexed, for example, to allow the contents of the Sequential File 365 to be searched and its values accessed as the file and the index are updated. Updates may occur, for example, in close to real time, following a crawl of a webpage and output of new Price Attribute 340 and Product Attribute 345 records.
- the Analytic Routine 400 performs the Insight Routine 600 utilizing and expanding upon the output of the Core Price Routine 500 and the boxes, above. Generally, the Insight Routine 600 identifies what Product Attributes 345 and Price Attributes 340 across the datasets are associated with the changes in price.
- the Analytic Routine 400 stores the output of the Insight Routine 600 in the Indix Database 300 as Insights 375 .
- the Analytic Routine 400 performs the User Contact Routine 1700 . Utilizing the User Contact Routine 1700 , users may query the Indix Database 300 and set alerts.
- FIG. 5 is a flowchart illustrating an embodiment of a Core Price Routine 500 .
- Boxes 505 through 540 may iterate for each new Price Attribute 340 record associated with an iPID 330 .
- all Price Attribute 340 records associated with the iPID 330 including the new Price Attribute 340 record of box 405 and historic records (and/or summary values derived therefrom), may be obtained.
- the high, low, average, mean, magnitude, and number of price values over several time periods for the iPID 330 may be calculated.
- a default time period may be 45 or 30 days, though these values may be calculated for several time periods.
- the output may be saved, for example, to the Core Price 380 records, and indexed.
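- A minimal sketch of the per-iPID calculation above is shown below. It computes the high, low, mean, and number of price values within a trailing time period. The representation of price observations as `(timestamp, price)` pairs and the function name are assumptions; the “magnitude” value is omitted because the text does not define it.

```python
from datetime import datetime, timedelta
from statistics import mean


def price_summary(observations, as_of: datetime, days: int = 30):
    """Summarize prices observed in the `days` before `as_of`.

    observations: iterable of (timestamp, price) pairs for one iPID.
    Returns None when no price falls inside the time period.
    """
    start = as_of - timedelta(days=days)
    prices = [price for ts, price in observations if start <= ts <= as_of]
    if not prices:
        return None
    return {
        "high": max(prices),
        "low": min(prices),
        "mean": mean(prices),
        "count": len(prices),
    }
```

- Calling the function once per time period (for example with `days=30` and `days=45`) yields the several summaries mentioned above.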
- an MPID 332 associated with the iPID 330 may be obtained.
- the high, low, average, mean, magnitude, and number of price values over several time periods may be calculated for the MPID 332 utilizing the new value associated with the iPID 330 from box 515 .
- the iPID 330 may be a hash of a URI 305 and the result of box 515 is thus limited to a particular sales channel (typically a Store or Merchant) for a particular Product (taking into account that duplicate iPIDs 330 from a base domain name may be treated as equivalent); the MPID 332 is assigned to all iPIDs 330 which represent the same Product, so the MPID version of this calculation in box 525 returns values relating to the Product across Stores, Merchants, Locations, etc.
- the calculation of box 525 may return values which are or may be sorted by, for example, Store, Merchant, Location (such as Region), and by time periods such as a Season.
- the output may be saved, for example, to the Core Price 380 records, and indexed.
- all calculations and other routines which utilize the values for the iPID 330 and the associated MPID 332 may insert the new values calculated for the iPID 330 and the MPID 332 and may recalculate the values. For example, the high, low, average, mean, magnitude, and number of price changes over time periods by Category 335 , such as a Category 335 associated with the iPID 330 , may be calculated.
- the output may be saved, for example, to the Core Price 380 records, and indexed.
- Calculations or other routines which utilize the values calculated in FIG. 5 may refer to data addresses.
- the Core Price Routine 500 may update the values stored at these data addresses, which causes the calculations or other routines to update their output when such calculations or other routines are (re)executed, such as on a schedule or on the occurrence of an event.
- the Core Price Routine 500 may return, for example, to the Analysis Routine 400 .
- FIG. 6 is a flowchart illustrating an embodiment of an Insights Routine 600 .
- the Insights Routine 600 may perform one or more of a set of sub-routines.
- a Volatility Routine 700 may be performed to determine the volatility of prices relative to the many dimensions available in the PriceDNA.
- a Substitution Routine 800 determines substitutes for an iPID 330 , MPID 332 , or Category 335 .
- a Mix Routine 900 determines “how many” relative to the many dimensions available in the PriceDNA.
- a Prediction Routine 1000 makes price predictions relative to the many dimensions available in the PriceDNA.
- a Competition Routine 1100 determines competitors relative to a Product, Store, or Brand.
- a Promotion Routine 1200 determines promotions relative to Products, Stores, Brand, Seasons, and other dimensions available in the PriceDNA.
- a Leadership Routine 1300 determines which Products lead or follow others in terms of price changes.
- a Premium Routine 1400 determines which Products in a Category 335 charge higher (premium) prices.
- a Price Range Routine 1500 determines the number of price ranges and maximum and minimum for iPIDs, MPIDs, and categories.
- a Reach Routine 1600 determines the reach of an iPID or MPID in terms of the number of people who visit a sales venue.
- FIG. 7 is a flowchart illustrating an embodiment of a Volatility Routine 700 .
- the prices associated with an iPID 330 over a time period, such as 30 days, may be obtained, such as from the Core Price 380 records.
- the number of price changes within the time period may be determined (if this was not already a value in the Core Price 380 records).
- the number of price changes within the time period (“VBF”) may be determined relative to, for example, the iPID 330 , relative to an MPID 332 associated with the iPID 330 , relative to a Brand, relative to a Region, relative to a Price Band by MPID 332 , relative to a Category 335 , and relative to all iPIDs 330 associated with a Merchant.
- the values may be saved and indexed to accelerate access to and/or enable searching for the values and/or the values may be calculated on an as-needed basis.
- the values may be saved to the Insights 375 records.
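- The count of price changes (“VBF”) described above can be sketched as follows. The first function counts changes in one chronological price series for an iPID; the second rolls the counts up to a grouping such as Merchant, Brand, or Category. Summing the per-iPID counts to obtain a group VBF is an assumption, as are the function names and input shapes.

```python
from collections import defaultdict


def count_price_changes(prices) -> int:
    """Count how often consecutive prices differ in a chronological list."""
    return sum(1 for prev, cur in zip(prices, prices[1:]) if cur != prev)


def vbf_by_group(series_by_ipid: dict, group_of: dict) -> dict:
    """Sum per-iPID change counts for each group (assumed aggregation rule).

    series_by_ipid maps an iPID to its chronological prices in the period;
    group_of maps an iPID to its group, e.g. a Merchant name.
    """
    totals = defaultdict(int)
    for ipid, prices in series_by_ipid.items():
        totals[group_of[ipid]] += count_price_changes(prices)
    return dict(totals)
```

- Repeated observations of an unchanged price do not increase the count, so frequent crawling alone does not inflate the VBF in this sketch.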
- the benchmark number of Price changes in the period of time may be determined.
- the benchmark may be, for example, the VBF relative to additional criteria, such as, for example, the VBF for a Product (or MPID), plus 1, divided by the maximum VBF of other Products in the same Category as the Product, multiplied by 100 over 101.
- the benchmark VBF for a Category may be determined by the VBF for the Category, plus 1, divided by the maximum VBF of the Category, multiplied by 100 over 101.
- the benchmark VBF for a Merchant may be the VBF of the Merchant, plus 1, divided by the maximum VBF of the Merchant, multiplied by 100 over 101.
- the benchmark VBF for a Brand may be the VBF of the Brand, plus 1, divided by the maximum VBF of the Brand, multiplied by 100 over 101.
- the values may be saved to the Insights 375 records.
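- The benchmark described above can be sketched under one reading of the wording, namely (VBF + 1) divided by the maximum VBF, with the quotient multiplied by 100/101. The text admits other groupings of the terms, so this reading, the function name, and the handling of a zero maximum (returning `None` rather than dividing by zero) are assumptions.

```python
def benchmark_vbf(vbf: int, max_vbf: int):
    """Return ((vbf + 1) / max_vbf) * (100 / 101), or None if max_vbf <= 0."""
    if max_vbf <= 0:
        return None  # assumed handling: no benchmark without a positive maximum
    return (vbf + 1) / max_vbf * 100 / 101
```

- The same function applies whether `vbf` and `max_vbf` are taken for a Product, a Category, a Merchant, or a Brand; only the inputs change.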
- FIGS. 8A-8C are flowcharts illustrating embodiments of a Substitution Routine 800 .
- substitute Products within a Category 335 are identified.
- a Product may be identified by, for example, a user or a routine, and the MPID 332 corresponding thereto may be obtained.
- a Category 335 may be obtained, whether corresponding to the Product and MPID of step 801 or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained.
- a Price Band may be obtained or calculated relative to the Category 335 (such as from or according to the Price Range Routine 1500 ); the Price Band may be selected by a user. Boxes 815 through 830 may iterate for each iPID 330 within the Category of box 805 .
- the iPIDs 330 in the Category of box 805 and with a Price value within the Price Band of box 810 are identified, such as from the Core Price 380 records.
- the result of box 820 may be subdivided, grouped, or filtered by Region, Time, Used/New, and according to other dimensions available in the PriceDNA.
- the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the Category 335 .
- the results may be saved as Substitutes, such as to the Insights 375 records.
- the process may return.
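A minimal sketch of the Price Band selection described above, assuming each Core Price record can be reduced to an iPID, a Category, a Price, and a Region. The `PriceRecord` type and both function names are hypothetical and are not part of the disclosed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRecord:
    ipid: str      # identifier for one product URI
    category: str  # Category the product belongs to
    price: float   # most recent observed Price
    region: str    # Region in which the Price was observed


def substitutes_in_band(records, category, band_low, band_high):
    """Return iPIDs in `category` whose Price lies inside the Price Band."""
    return [
        r.ipid
        for r in records
        if r.category == category and band_low <= r.price <= band_high
    ]


def group_by_region(records, ipids):
    """Subdivide the selected iPIDs by Region."""
    wanted = set(ipids)
    grouped = {}
    for r in records:
        if r.ipid in wanted:
            grouped.setdefault(r.region, []).append(r.ipid)
    return grouped
```

Grouping by Time or Used/New status would follow the same pattern with a different key.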
- a Category 335 may be obtained, whether corresponding to a Product or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained.
- the Product Attributes 345 of all iPIDs 330 within the MPIDs 332 may be obtained.
- the Product Attributes 345 may be clustered to identify the iPIDs 330 with at least a 50% Product Attribute 345 match or overlap.
- a Price Band may be obtained or calculated relative to the Category 335 (such as from or according to the Price Range Routine 1500 ); the Price Band may be selected by a user.
- Boxes 860 through 870 may iterate for each iPID 330 within the MPIDs 332 and Attribute 345 match of box 850 .
- the iPIDs 330 with a Price value within the Price Band of box 855 and with the Product Attribute 345 match or overlap of box 850 are identified.
- the result of box 865 may be subdivided or grouped further by sub-Price Ranges.
- the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the MPIDs 332 within the Category 335 .
- the results may be saved as Substitutes in the Insights 375 records.
- the process may return.
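The 50% attribute match can be illustrated with a simple pairwise overlap measure. The text speaks of clustering without naming a method, so the Jaccard ratio over (attribute name, value) pairs used here is a stand-in assumption, and the function names are hypothetical.

```python
def attribute_overlap(a: dict, b: dict) -> float:
    """Jaccard overlap of the (name, value) pairs of two Product Attribute records."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def attribute_matches(target_ipid, attributes, threshold=0.5):
    """Return iPIDs whose attributes overlap the target's by at least `threshold`."""
    target = attributes[target_ipid]
    return [
        ipid
        for ipid, attrs in attributes.items()
        if ipid != target_ipid and attribute_overlap(target, attrs) >= threshold
    ]
```

The resulting candidates could then be narrowed by Price Band in the same way as in the first embodiment.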
- substitute Products within a Category 335 with a percentage overlap in Attributes 340 / 345 and in the top or bottom of a Price Range are identified.
- a Category 335 may be obtained, whether corresponding to a Product or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained.
- the Product Attributes 345 of all iPIDs 330 within the MPIDs 332 may be obtained.
- the Product Attributes 345 may be clustered to identify the iPIDs 330 with at least a 50% Product Attribute 345 match or overlap.
- Boxes 890 through 897 may iterate for each iPID 330 within the MPIDs 332 and Attribute 345 match of box 885 .
- the iPIDs 330 with the Product Attribute 345 match or overlap of box 885 and in the top or bottom of a Price Range or Price Band relative to the starting iPID 330 are identified.
- the top or bottom five (or another subset) of box 895 may be selected.
- this embodiment of the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the MPIDs 332 within the Category 335 .
- the results may be saved as Substitutes in the Insights 375 records.
- the process may return.
- FIG. 9 is a flowchart illustrating an embodiment of a Mix Routine 900 .
- the Mix Routine 900 determines “how many” relative to the many dimensions available in the PriceDNA.
- the Mix Routine 900 obtains a first segmentation criteria, such as, for example, a Product Name, Brand, or Category.
- a first sub-segmentation criteria may be obtained, such as, for example, a Store, Location, or Price Band.
- a second sub-segmentation criteria may be obtained, such as, for example, a Store, Location, or Price Band.
- the number of Products, such as by MPID 332, which meet the criteria of blocks 905, 910, and 915 may be counted.
- the result of block 920 may be subdivided or grouped by Location, Time, Season, Price Band, Used/New or other dimensions available in the PriceDNA.
- the results of blocks 920 and/or 925 may be saved as Mix values in the Insights 375 records.
- the process may return.
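The counting performed by the Mix Routine can be sketched as a grouped count of distinct MPIDs. The record layout (plain dictionaries with an "mpid" key) and the function name are assumptions made for illustration.

```python
from collections import Counter


def mix_counts(records, first_criteria, sub_criteria):
    """Count distinct MPIDs per combination of sub-segmentation values.

    records:        iterable of dicts, each with an "mpid" key plus dimensions
    first_criteria: dict of dimension -> required value (e.g. {"Brand": "Asics"})
    sub_criteria:   list of dimension names to group by (e.g. ["Store", "Price Band"])
    """
    seen = set()
    counts = Counter()
    for rec in records:
        # Skip records that fail the first segmentation criteria.
        if any(rec.get(k) != v for k, v in first_criteria.items()):
            continue
        key = tuple(rec.get(name) for name in sub_criteria)
        # Count each MPID at most once per group.
        if (rec["mpid"], key) not in seen:
            seen.add((rec["mpid"], key))
            counts[key] += 1
    return counts
```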
- FIG. 10 is a flowchart illustrating an embodiment of a Prediction Routine 1000 .
- the Prediction Routine 1000 makes price predictions relative to the many dimensions available in the PriceDNA.
- the Prediction Routine 1000 obtains a Product and obtains or identifies an MPID 332 and/or iPIDs 330 associated therewith.
- the last Price of the Product by MPID 332 and/or iPID 330 may be obtained, such as from the Core Price 380 records.
- first and second linear regression parameters may be calculated or obtained.
- the first parameter may be added to the second parameter multiplied by the last price of the Product from block 1010.
- an error term may be added to the result of block 1020 .
- a confidence interval may be calculated.
- the result may be saved as Predictions in the Insights 375 records.
- the Prediction Routine 1000 may then return.
- the parameters of the model are estimated using the ordinary least squares method as follows:
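The estimation formula itself does not appear in the text above. As a hedged illustration, the sketch below uses the standard closed-form least squares estimates for a line y = a + b·x, fitted on consecutive (previous price, next price) pairs. Pairing consecutive prices, the function names, and the omission of the error term and confidence interval are all assumptions of this sketch.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Closed-form least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return a, b


def predict_next_price(history):
    """Predict the next price from past prices (oldest first, at least three)."""
    if len(history) < 3:
        raise ValueError("need at least three prices")
    # Regress each price on the price that preceded it.
    a, b = fit_ols(history[:-1], history[1:])
    # First parameter plus second parameter times the last observed price.
    return a + b * history[-1]
```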
- FIG. 11 is a flowchart illustrating an embodiment of a Competition Routine 1100 .
- the Competition Routine 1100 determines competitors relative to Stores, Brands, or Merchants.
- a first and second (or more) Store, Brand, or Merchant may be obtained, along with an optional Category 335 . These may be obtained from a user or another routine.
- all Products sold by or under each of the entities of box 1105 may be obtained, such as from the PriceDNA.
- the Products may optionally be filtered by the Category of box 1105 .
- the affirmative output of this box may be saved as Competitors in the Insights 375 records.
- the Competitors may be filtered by, for example, one or more of Store, Substitute, Substitute by Price Band, Brand, Location (including Region), Time (including Season), and whether the Products are sold as used or new. Which criteria are used in the filter may be determined by input from a user. The output of box 1120 may be saved in the Insights 375 records.
- the average price of Products in the Category 335 of box 1105 may be obtained relative to, for example, the Category 335 , Substitute, Substitute by Price Band, Brand, Location, Time, used/new status, and other criteria.
- the output of box 1125 may be ranked and saved as Price Competitiveness in the Insights 375 records.
- a Store and Location for a target Product may be obtained, such as from a user.
- the Competitors from box 1115 may be obtained or determined and the Competitors filtered to select only Competitors with sales in the Location of box 1135 .
- Stores in the Location which are the same as the Store of box 1135 may be removed from the set of Competitors, leaving the remainder (those not removed).
- the output of box 1150 may be placed in a Voronoi Diagram or similar data structure, with the location in the Voronoi Diagram being based on physical location of the Stores of the Competitors. Generally, a Voronoi Diagram determines the distance between objects in a geometric manner, rather than a power-law manner.
- the distance between the target Store and each Competitor may be ranked.
- the output of box 1160 may be saved as Reach Competitiveness in the Insights 375 records.
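The distance ranking step can be sketched without building a Voronoi Diagram. The example below is a simpler stand-in that ranks Competitor Stores by great-circle distance from the target Store; the use of latitude/longitude pairs, the haversine formula, and the function names are assumptions of this sketch, not the disclosed data structure.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def rank_competitors(target, competitors):
    """Return Competitor Store names sorted by distance from the target, nearest first.

    target:      (lat, lon) of the target Store
    competitors: dict of Store name -> (lat, lon)
    """
    return sorted(competitors, key=lambda name: haversine_km(target, competitors[name]))
```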
- FIG. 12 is a flowchart illustrating an embodiment of a Promotion Routine 1200 .
- the Promotion Routine 1200 determines promotions relative to Products, Stores, Brand, Seasons, and other dimensions available in the PriceDNA.
- a Product may be obtained, such as from user input, and the MPID 332 and/or an iPID 330 corresponding to the Product may be identified in the Attributes 340/345 (via, for example, the Sequential File 365).
- the Product may be a single Product or a Bundle comprising multiple Products.
- a “Promotion” value may be identified in the Attributes 340/345 associated with the MPID 332 and/or iPID 330; the “Promotion” value may be a Sale Price and/or a Promotion Code in the Price Attribute 340 records associated with the MPID 332 and/or iPID 330.
- the Price history for the MPID 332 and/or iPID 330 may be graphed.
- the number, length, date/time, and magnitude of the Promotions may be determined and saved as Promotions in the Insights 375 records.
- the number, length, date/time, and magnitude of the low-points in the graph of box 1210 may be determined and saved as Promotions in the Insights 375 records.
- the output of box 1215 may be filtered by criteria such as, for example, date/time, Price Band, Location (including Region), Season, and Holidays.
- the criteria may be received from, for example, a user and/or a default set of criteria may be applied, with the result of each being saved in the Insights 375 records.
- a time period and a Merchant may be obtained, such as from a user; the Merchant may be associated with the Product of box 1205 .
- the number of Products sold by the Merchant in Promotion during the time period may be determined.
- the result of box 1215 may be benchmarked relative to average Promotion times, durations, and magnitude for other Products (including other Bundles of the Product), the timing of Promotions for other Products, relative to the magnitude of Promotions for other Products, relative to the Products associated with a Brand, relative to all Products sold at a Store, relative to Products in a Price Band, and relative to Competitors and Substitutes.
- the result may be saved in the Insights 375 records.
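One way to picture the determination of the number, length, date/time, and magnitude of Promotions is to scan a Price history for runs in which the observed Price sits below the Standard Price. The tuple layout, integer day numbers, and function name below are assumptions made for illustration.

```python
def find_promotions(observations):
    """Find runs of discounted prices in a price history.

    observations: list of (day, standard_price, observed_price), sorted by day.
    Returns a list of dicts with start, end, length (in days), and magnitude
    (the largest discount seen during the run).
    """
    promotions, current = [], None
    for day, standard, observed in observations:
        if observed < standard:
            discount = standard - observed
            if current is None:
                current = {"start": day, "end": day, "magnitude": discount}
            else:
                current["end"] = day
                current["magnitude"] = max(current["magnitude"], discount)
        elif current is not None:
            # Price returned to standard: close the current run.
            promotions.append(current)
            current = None
    if current is not None:
        promotions.append(current)
    for promo in promotions:
        promo["length"] = promo["end"] - promo["start"] + 1
    return promotions
```

The resulting list could then be filtered by Season or Location, or compared against averages for other Products, as the benchmarking step describes.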
- FIG. 13 is a flowchart illustrating an embodiment of a Leadership Routine 1300 .
- the Leadership Routine 1300 determines which Products lead or follow others in terms of price changes.
- a Product may be obtained, for example, from a user or another routine, and the associated MPID 332 determined.
- Substitutes for the Product may be obtained (such as from or by the Substitution Routine 800).
- the change in Price, or Price delta, for the Product and the Substitutes may be determined over periods of time.
- the Price delta may be determined in an absolute sense (whether the change was positive or negative) and/or with a determination of the magnitude of the Price delta.
- the Price deltas determined at box 1315 may be matched, to determine if any of the Price deltas with the same absolute value (positive or negative) occurred within a time window of one another (deltas beyond the time window may not be considered to be correlated), with the result being saved as a Leader/Follower indication in the Insights 375 records.
- the matching Price deltas of box 1320 may be graphed according to time.
- the result of box 1325 may be filtered by criteria such as Region, Time, Date/Time, Season, Price Band, and Store.
- the number of Leaders and followers may be determined relative to a time period.
- the average lead/follow time may be determined.
- leaders/followers with respect to exact Product matches (for different Stores selling the same Product, determined at box 1330 ) may be identified.
- the results may be benchmarked relative to the number of leaders/followers and other criteria. The result of various of the boxes in FIG. 13 may be saved in the Insights 375 records.
- the Leadership Routine 1300 may return.
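The delta matching at the heart of the Leadership Routine can be sketched as follows. This sketch assumes integer timestamps, treats two deltas as matching only when they are equal in both sign and size (one possible reading of the text), and uses hypothetical function names.

```python
def price_deltas(history):
    """history: list of (time, price) sorted by time -> list of (time, delta)."""
    return [
        (t2, p2 - p1)
        for (t1, p1), (t2, p2) in zip(history, history[1:])
        if p2 != p1
    ]


def leader_follower_pairs(product_deltas, substitute_deltas, window):
    """Match equal Price deltas that occur within `window` time units of each other.

    Returns tuples of (leader, delta, lag), where leader is "product" or
    "substitute" depending on which changed price first.
    """
    matches = []
    for t_p, d_p in product_deltas:
        for t_s, d_s in substitute_deltas:
            if d_p == d_s and t_p != t_s and abs(t_p - t_s) <= window:
                leader = "product" if t_p < t_s else "substitute"
                matches.append((leader, d_p, abs(t_s - t_p)))
    return matches
```

Averaging the lag values in the result gives one version of the average lead/follow time mentioned above.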
- FIG. 14 is a flowchart illustrating an embodiment of a Premium Routine 1400 .
- the Premium Routine 1400 determines which Products (generally, by MPID) in a Category 335 charge higher prices (premium).
- a Product may be received, such as from input by a user or another routine.
- the Substitutes for the Product may be determined or obtained from another routine, such as the Substitution Routine 800 and/or the Insights 375 records.
- the prices of the Product and of the Substitutes may be obtained, such as from the Core Price 380 records.
- the obtained Prices of box 1415 may be graphed or mapped and the top of the Price distribution identified.
- the top of the Price distribution, which may be the top five or ten percent or the top five Products or Substitutes, may be identified and saved as the “Premium” Products in the Insights 375 records.
- the Product Attributes 345 of the Products and Substitutes of box 1410 may be obtained and clustered by similarity.
- the Product Attributes 345 unique to or dominant in the Premium Products, determined by the clusters of box 1425, may be identified and saved in the Insights 375 records.
- user votes regarding Product Attributes 345 of Premium Products may be received.
- the user votes may be tallied and, at box 1445 , the “winning” Product Attributes 345 (with the most votes) may be set as the Product Attributes 345 associated with the Premium Products in the Insights 375 records.
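Selecting the top of the Price distribution can be sketched in a few lines. The function name, the default of ten percent, and the choice to always return at least one Product are assumptions of this sketch.

```python
import math


def premium_products(prices, top_fraction=0.10):
    """Return identifiers in the top `top_fraction` of prices, highest first.

    prices: dict of identifier (e.g. MPID) -> price
    """
    if not prices:
        return []
    count = max(1, math.ceil(len(prices) * top_fraction))
    ranked = sorted(prices, key=prices.get, reverse=True)
    return ranked[:count]
```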
- FIG. 15 is a flowchart illustrating an embodiment of a Price Range Routine 1500 .
- the Price Range Routine 1500 determines the number of price ranges and maximum and minimum for iPIDs, MPIDs, and categories.
- a Product may be obtained, such as from a user or another routine.
- the prices for the Product may be obtained, such as from the PriceDNA for the Product.
- the prices of box 1510 may be clustered by similarity and with a minimum cluster size, with the range in Price across each cluster being saved as Price Ranges for the Product in the Insights 375 records.
- the Channel Range for the Product may be set as the minimum and maximum of the prices of box 1510 and saved in the Insights 375 records.
- the results of boxes 1510, 1515, and 1520 may be filtered by, for example, Region, Date/Time, and according to other criteria and saved in the Insights 375 records.
- the Price Ranges may be determined relative to all Products in a Category 335 , all Products by a Brand, and relative to a benchmark which may be, for example, the maximum number of Price Ranges within a Category 335 . The result thereof may be saved as Price Ranges in the Insights 375 records.
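Clustering prices by similarity with a minimum cluster size can be illustrated with a one-dimensional gap rule: sorted prices stay in the same cluster until the gap to the next price exceeds a threshold. The gap rule, the `max_gap` parameter, and the function names are assumptions, since the text does not name a clustering method.

```python
def price_ranges(prices, max_gap, min_cluster_size=2):
    """Cluster prices and return the (low, high) range of each kept cluster."""
    clusters, current = [], []
    for price in sorted(prices):
        # A gap larger than max_gap closes the current cluster.
        if current and price - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(price)
    if current:
        clusters.append(current)
    return [(c[0], c[-1]) for c in clusters if len(c) >= min_cluster_size]


def channel_range(prices):
    """Channel Range: the minimum and maximum of all observed prices."""
    return min(prices), max(prices)
```

The number of Price Ranges for a Product is then simply the length of the list returned by `price_ranges`.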
- FIG. 16 is a flowchart illustrating an embodiment of a Reach Routine 1600 .
- the Reach Routine 1600 determines the reach of an iPID or MPID in terms of the number of people who visit a sales venue.
- a Product may be obtained, such as from a user or another routine.
- the Stores offering the Product for sale may be obtained.
- the traffic at the stores may be obtained, such as from a source for online webpage/website traffic, such as Alexa or similar.
- the result of box 1615 may be filtered by, for example, criteria such as Date/Time (including Season), Location (including Region), Holiday, and other criteria. The result thereof may be saved as Reach in the Insights 375 records.
- the Reach Routine 1600 may return.
- FIG. 17 is a flowchart illustrating an embodiment of a User Contact Routine 1700 .
- a user contact with the User Contact Routine 1700 may be detected.
- the user contact may be part of a user-interface served by the User Contact Routine 1700 .
- a user query may be received, such as for PriceDNA records and/or Insight records.
- the user query may be executed relative to the Index 370 and the Sequential File 365 .
- a determination may be made regarding whether the user has requested that the query be stored as an alert. If so, then at box 1725 a time period for the alert may be obtained or set (such as according to a default time period, such as once per day or week).
- the query may be executed relative to the Index 370 and the Sequential File 365 .
- an alert or other message may be sent to contact information associated with the user.
- the User Contact Routine 1700 may conclude.
Abstract
Price and product attributes from webpages are analyzed over time to identify price changes specific to products on individual webpages and for products across all webpages as well as to identify longitudinal correlations between price changes and product attributes. Users may search the data and set alerts.
Description
- This application claims the benefit of and incorporates by reference U.S. Provisional Patent Application No. 61/675,492, filed on Jul. 25, 2012. This application also incorporates by reference co-pending U.S. patent application Ser. No. ______, filed on Jul. 25, 2013, titled, “Adaptive Gathering of Structured and Unstructured Data System and Method,” which application also claims the benefit of U.S. Provisional Patent Application No. 61/675,492.
- This disclosure relates to a method and system to analyze price and product information.
- The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
- Search engines, such as Google, Bing, and others search and index vast quantities of information on the Internet. “Crawlers” (a.k.a. “spiders”) utilize URLs obtained from a “queue” to obtain content, usually from web pages. The crawlers or other software store and index some of the content. Users can then search the indexed content, view results, and follow hyperlinks back to the original source or to the stored content (the stored content often being referred to as a “cache”). Computing resources to crawl and index, however, are not limitless. The URL queues are commonly prioritized to direct crawler resources to web page servers which can accommodate the traffic, which do not block crawlers (such as according to “robots.txt” files commonly available from webpage servers), which experience greater traffic from users, and which experience more change in content.
- Conventional search engines, however, are not focused on price and product information. If a price changes on a webpage, but the rest of the webpage remains the same, traditional crawlers (or the queue manager) will not prioritize the webpage position in the queue, generally because the price is a tiny fraction of the overall content and the change is not labeled as being significant; conversely, if the webpage changes, but the price and/or product information remains the same, the change in webpage content may cause a traditional crawler to prioritize the webpage position in the queue due to the overall change in content, notwithstanding that the price and product information remained the same.
- Conventional search engines, if presented with a query, will find corresponding products. For example, it is possible to search for “men's shoes” and to then be presented with a webpage comprising search results for hundreds of thousands of webpages for men's shoes. The search result may further be narrowed by category of men's shoes, brand, and store. Search engines have been incorporated into online stores, wherein a user may search for products, by keyword and/or by category and results can be ordered by price.
- Price history, however, is only narrowly viewed and, when it is, never in the context of a rich attribute set which explores, in detail, which attributes are associated with changes in price. Price histories are not made available in real time, and do not allow intricate comparisons based on stores, merchants, brands, regions, time/date, and other dimensions.
- FIG. 1 is a network and device diagram illustrating exemplary computing devices configured according to embodiments disclosed in this paper.
- FIG. 2 is a functional block diagram of an exemplary Indix Server 200 computing device and some data structures and/or components thereof.
- FIG. 3 is a functional block diagram of the Indix Datastore 300 illustrated in the computing device of FIG. 2.
- FIG. 4 is a flowchart illustrating an embodiment of an Analytics Routine 400.
- FIG. 5 is a flowchart illustrating an embodiment of a Core Price Routine 500.
- FIG. 6 is a flowchart illustrating an embodiment of an Insights Routine 600.
- FIG. 7 is a flowchart illustrating an embodiment of a Volatility Routine 700.
- FIGS. 8A-8C are flowcharts illustrating embodiments of a Substitution Routine 800.
- FIG. 9 is a flowchart illustrating an embodiment of a Mix Routine 900.
- FIG. 10 is a flowchart illustrating an embodiment of a Prediction Routine 1000.
- FIG. 11 is a flowchart illustrating an embodiment of a Competition Routine 1100.
- FIG. 12 is a flowchart illustrating an embodiment of a Promotion Routine 1200.
- FIG. 13 is a flowchart illustrating an embodiment of a Leadership Routine 1300.
- FIG. 14 is a flowchart illustrating an embodiment of a Premium Routine 1400.
- FIG. 15 is a flowchart illustrating an embodiment of a Price Range Routine 1500.
- FIG. 16 is a flowchart illustrating an embodiment of a Reach Routine 1600.
- FIG. 17 is a flowchart illustrating an embodiment of a User Contact Routine 1700.
- The following Detailed Description provides specific details for an understanding of various examples of the technology. One skilled in the art will understand that the technology may be practiced without many of these details. In some instances, structures and functions have not been shown or described in detail or at all to avoid unnecessarily obscuring the description of the examples of the technology. It is intended that the terminology used in the description presented below be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain examples of the technology. Although certain terms may be emphasized below, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section.
- Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” As used herein, the term “connected,” “coupled,” or any variant thereof means any connection or coupling, either direct or indirect, between two or more elements; the coupling or connection between the elements can be physical, logical, or a combination thereof. Additionally, the words, “herein,” “above,” “below,” and words of similar import, when used in this application, shall refer to this application as a whole and not to particular portions of this application. When the context permits, words using the singular may also include the plural while words using the plural may also include the singular. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of one or more of the items in the list.
- Certain elements appear in various of the Figures with the same capitalized element text, but a different element number. When referred to herein with the capitalized element text but with no element number, these references should be understood to be largely equivalent and to refer to any of the elements with the same capitalized element text, though potentially with differences based on the computing device within which the various embodiments of the element appears.
- As used herein, a Uniform Resource Identifier (“URI”) is a string of characters used to identify a resource on a computing device and/or a network, such as the Internet. Such identification enables interaction with representations of the resource using specific protocols. “Schemes” specifying a syntax and associated protocols define each URI.
- The generic syntax for URI schemes is defined in Request for Comments (“RFC”) memorandum 3986 published by the Internet Engineering Task Force (“IETF”). According to RFC 3986, a URI (including a URL) consists of four parts:
- <scheme name>: <hierarchical part> [?<query>] [#<fragment>]
- A URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme. The scheme name consists of a letter followed by any combination of letters, digits, and the plus (“+”), period (“.”), or hyphen (“-”) characters; and is terminated by a colon (“:”).
- The hierarchical portion of the URI is intended to hold identification information that is hierarchical in nature. Often this part is delineated with a double forward slash (“//”), followed by an optional authority part and an optional path.
- The optional authority part holds an optional user information part (not shown) terminated with “@” (e.g. username:password@), a hostname (i.e., domain name or IP address, here “example.com”), and an optional port number preceded by a colon “:”.
- The path part is a sequence of one or more segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash (“/”). If a URI includes an authority part, then the path part may be empty.
- The optional query portion is delineated with a question mark and contains additional identification information that is not necessarily hierarchical in nature. Together, the path part and the query portion identify a resource within the scope of the URI's scheme and authority.
- The query string syntax is not generically defined, but is commonly organized as a sequence of zero or more <key>=<value> pairs separated by a semicolon or ampersand, for example:
- key1=value1;key2=value2;key3=value3 (Semicolon), or
- key1=value1&key2=value2&key3=value3 (Ampersand)
- Much of the above information is taken from RFC 3986, which provides additional information related to the syntax and structure of URIs. RFC 3986 is hereby incorporated by reference, for all purposes.
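The URI parts described above can be pulled apart with Python's standard library. The sketch below is illustrative only: `split_uri` is a hypothetical helper, the sample URI is invented, and query values are left undecoded. The query string is split by hand on both separators mentioned above, semicolon and ampersand.

```python
import re
from urllib.parse import urlsplit


def split_uri(uri):
    """Split a URI into scheme, authority, path, query pairs, and fragment."""
    parts = urlsplit(uri)
    pairs = {}
    # Accept either ";" or "&" between <key>=<value> pairs.
    for item in re.split(r"[;&]", parts.query):
        if item:
            key, _, value = item.partition("=")
            pairs[key] = value
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": pairs,
        "fragment": parts.fragment,
    }
```

For instance, `split_uri("http://example.com:8042/over/there?key1=value1;key2=value2#nose")` yields the scheme `http`, the authority `example.com:8042`, the path `/over/there`, two query pairs, and the fragment `nose`.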
- As used herein, “Product” shall be understood to mean “products or services.” References to “Product Attribute” herein shall be understood to mean “product or service attribute.” As used herein, “Products” are associated with iPIDs.
- As used herein, an “iPID” or iPID 330 is a unique identifier assigned within the Indix System to a URI for a product, such as URI 305. The iPID 330 may be, for example, a hash of URI 305. When multiple URIs 305 from a common base domain name lead to webpages which, when parsed for Price and Product Attributes, produce the same Parse Result 325 (notwithstanding that the webpages may contain other Content which does not contribute to the Parse Result 325), the URIs 305 may be labeled as equivalent in, for example, the Equivalent iPID 334 record and may be treated as the same iPID 330.
- As used herein, a “Master iPID” or “MPID” or MPID 332 is an iPID 330 assigned to a group of iPIDs 330 derived from URIs 305 which lead to webpages offering the same Product for sale. An MPID is generally meant to identify a single Product, generally produced by a common manufacturer, though the Product may be distributed and sold by multiple parties.
- iPIDs and MPIDs are associated with Price Attribute 340 records and Product Attribute 345 records.
- A Price Attribute 340 record may comprise one or more records comprising, for example, values which encode an iPRID which may be an identifier for a price observed at a particular time, an iPID (discussed above), a Product Name (a “Product Name” value in this record may also be referred to herein as a “Product”), a Standard Price, a Sale Price, a Rebate amount, a Price Instructions record (containing special instructions relating to a price, such as that the price only applies to students), a Currency Type, a Date and Time Stamp, a Tax record, a Shipping record (indicating costs relating to shipping to different locations, whether tax is calculated on shipping costs, etc.), a Price Validity Start Date, a Price Validity End Date, a Quantity, a Unit of Measure Type, a Unit of Measure Value, a Merchant Name (with the name of a merchant from whom the Product is available; a “Merchant Name” value in this record may also be referred to herein as a “Merchant”), a Store Name (a Merchant may have multiple stores; a “Store Name” value in this record may also be referred to herein as a “Store”), a User ID, a Data Channel (indicating the source of the Price Attribute 340 record, such as an online crawl, a crowdsource, a licensed supplier of price information, or from a merchant), a Source Details record (for example, indicating a URI, a newspaper advertisement), an Availability Flag, a Promotion Code, a Bundle Details record (indicating products which are part of a bundle), a Condition Type record (indicating new, used, poor, good, and similar), a Social Rank record (indicating a rank of “likes” and similar of the price), a Votes/Likes record (indicating a number of “likes” and similar which a Price or Product has received), a Price Rank record, a Visibility Indicator record (indicating whether the price is visible to the public, whether it is only visible to a Merchant, or the like), a Supply Chain Reference record (indicating whether the price was obtained from a retailer, a wholesaler, or another party in a supply chain), a Sale Location (indicating a geographic location where the product is available at the price), a Manufactured Location record (indicating where the product was produced or manufactured), a Launch Date record (indicating how long the product has been on the market), and an Age of Product record (indicating how long the product was used by the user). When capitalized herein, the foregoing terms (such as Product, Price, Merchant, Store, Source Details, etc.) are meant to refer to values in a Price Attribute 340 record.
- A Product Attribute 345 record may comprise, for example, values encoding features of or describing a Product. The entire Product Attribute 345 schema may comprise thousands of columns, though only tens or hundreds of the columns may be applicable to any given Product. An example set of values in a Product Attribute 345 record for a ring is as follows: Title, “Sterling Silver Diamond & Blue Topaz Ring;” Brand, “Blue Nile;” Category (such as, for example, a Category 335 in a category schema), “rings;” Metal Name, “silver;” Stone Shape, “cushion;” Stone Name, “topaz;” Width, “3 mm;” Stone Color, “blue;” Product Type, “rings;” Birthstone, “September;” and Setting Type, “prong.” An example set of Product Attributes 345 for a shoe is as follows: Brand, “Asics;” Category (such as, for example, a Category 335 in a category schema or taxonomy), “Men's Sneakers & Athletic;” Shoe Size, “8;” Product Type, “wrestling shoes;” Color, “black;” Shoe Style, “sneakers;” Sports, “athletic;” Upper Material, “mesh.” When capitalized herein, the foregoing terms (such as Brand, Category, Metal Name, Product Type, etc.) are meant to refer to values in a Product Attribute 345 record.
- As used herein, “Content” comprises text, graphics, images (including still and video images), audio, graphical arrangement, and instructions for graphical arrangement, including HTML and CSS instructions which may, for example, be interpreted by browser applications.
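The iPID definition given above says the identifier may be a hash of the URI and that URIs from a common base domain with the same Parse Result may be treated as equivalent. A minimal sketch of both ideas follows; the choice of SHA-256, the tuple used to stand in for a Parse Result, and the function names are assumptions, not the disclosed implementation.

```python
import hashlib
from collections import defaultdict


def make_ipid(uri: str) -> str:
    """Derive an identifier for a product URI by hashing it (SHA-256 assumed)."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()


def group_equivalent(parse_results):
    """Group iPIDs whose URIs share a base domain and an identical parse result.

    parse_results: dict of uri -> (base_domain, parse_result), where
    parse_result is any hashable summary of the parsed Price and Product
    Attributes. Returns a list of groups, each a sorted list of iPIDs.
    """
    groups = defaultdict(list)
    for uri, (base_domain, parsed) in parse_results.items():
        groups[(base_domain, parsed)].append(make_ipid(uri))
    return [sorted(ipids) for ipids in groups.values() if len(ipids) > 1]
```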
- As used herein, “Event” is information generally in news or current events. Events may be found in Content. Listing Pages, Product Pages, and Event Pages are all examples of Webpage Types 350.
- As used herein, “PriceDNA” comprises a
Product Attribute 345 record, one ormore Price Attribute 340 records, the output of the Core Price Routine 500 (generally found in theCore Price 380 records), and the output of theInsights 600 routine (generally found in theInsights 375 records). - As used herein, a “Brand” is a family or group of Products sold by or under a common trademark, such as the “Nike®” Brand, which sells under this trademark a family of shoes, exercise equipment, and other apparel. Brand is a value within a
Product Attribute 345 record. - As used herein, a “Store” is an online or physical sales venue. A Store is a value within a
Price Attribute 340 record. - As used herein, a “Merchant” is an operator of one or more Stores. A Merchant is a value in a
Price Attribute 340 record. - Generally, an
Analysis Routine 400 obtainsPrice Attribute 340 andProduct Attribute 345 records from theIndix Database 300 shortly after the records are produced following a crawl of webpages accessed via theURIs 305. TheAnalysis Routine 400 merges the records, performs aCore Price Routine 500 to develop core price information, such as changes in price, and exports the records and the result to theCore Price Routine 500 to a sequential file which is indexed. The result of theCore Price Routine 500 may be searched and accessed by users in close to real-time. TheAnalysis Routine 400 also performs anInsight Routine 600. TheInsight Routine 600 comprises a set of sub-routines for deriving additional information from thePrice Attribute 340 andProduct Attribute 345 records and from the output of theCore Price Routine 500. Generally, theInsight Routine 600 identifies what Product Attributes 345 and Price Attributes 340 across the datasets are associated with the changes in price. The output of theInsight Routine 600 is also stored in theIndix Database 300 and may be searched and accessed by users, though the accessible values may be refreshed more slowly than the data from theCore Price Routine 500. AUser Contact Routine 1700 allows users to search and obtain information and to set alerts relative to the information in theIndix Database 300. -
FIG. 1 is a network and device diagram illustrating exemplary computing devices configured according to embodiments disclosed in this paper. Illustrated in FIG. 1 are an Indix Server 200 and an Indix Database 300. The Indix Database 300 is discussed further in relation to FIG. 3. - Also illustrated in
FIG. 1 is a Crawl Agent 400, representing Crawl Agents 1 to N, and a Crawl Agent Database 500. The Crawl Agent 400 and Crawl Agent Database 500 are used to crawl webpages accessed via the URIs 305. - Also illustrated in
FIG. 1 is a Client Device 105, such as a mobile or non-mobile computer device. The Client Device 105 is an example of computing devices such as, for example, a mobile phone, a tablet, laptop, personal computer, gaming computer, or media playback computer. The Client Device 105 represents any computing device capable of rendering Content in a browser or an equivalent user-interface. Client Devices are used by “users.” The Client Device 105 may interact with the User Contact Routine 1700. - Also illustrated in
FIG. 1 is a Web Server 115, which may serve Content in the form of webpages or equivalent output in response to URIs, such as URI 305. - Also illustrated in
FIG. 1 is an Ecommerce Platform 160, which may provide ecommerce services, such as website and/or webpage hosting via webpage templates comprising HTML and CSS elements. Customers of Ecommerce Platform 160 may complete the webpage templates with Content and serve the webpages and websites from, for example, Web Server 115. - Interaction among devices illustrated in
FIG. 1 may be accomplished, for example, through the use of credentials to authenticate and authorize a machine or user with respect to other machines. - In
FIG. 1, the computing machines may be physically separate computing devices or logically separate processes executed by a common computing device. Certain components are illustrated in FIG. 1 as connecting directly to one another (such as, for example, the Indix Database 300 to the Indix Server 200), though the connections may be through the Network 150. If these components are embodied in separate computers, then additional steps may be added to the disclosed invention to recite communicating between the components. - The
Network 150 comprises computers, network connections among the computers, and software routines to enable communication between the computers over the network connections. Examples of the Network 150 comprise an Ethernet network, the Internet, and/or a wireless network, such as a GSM, TDMA, CDMA, EDGE, HSPA, LTE or other network provided by a wireless service provider, or a television broadcast facility. Connection to the Network 150 may be via a Wi-Fi connection. More than one network may be involved in a communication session between the illustrated devices. Connection to the Network 150 may require that the computers execute software routines which enable, for example, the seven layers of the OSI model of computer networking or equivalent in a wireless phone network. - This paper may discuss a first computer as connecting to a second computer (such as a
Crawl Agent 400 connecting to the Indix Server 200) or to a corresponding datastore (such as to Indix Database 300); it should be understood that such connections may be to, through, or via the other of the two components (for example, a statement that a computing device connects with or sends data to the Indix Server 200 should be understood as saying that the computing device may connect with or send data to the Indix Database 300). References herein to “database” should be understood as equivalent to “datastore.” Although illustrated as components integrated in one physical unit, the computers and databases may be provided by common (or separate) physical hardware and common (or separate) logic processors and memory components. Though discussed as occurring within one computing device, the software routines and data groups used by the software routines may be stored and/or executed remotely relative to any of the computers through, for example, application virtualization. -
FIG. 2 is a functional block diagram of an exemplary Indix Server 200 computing device and some data structures and/or components thereof. The Indix Server 200 in FIG. 2 comprises at least one Processing Unit 210, Indix Server Memory 250, a Display 240 and Input 245, all interconnected along with the Network Interface 230 via a Bus 220. The Processing Unit 210 may comprise one or more general-purpose Central Processing Units (“CPU”) 212 as well as one or more special-purpose Graphics Processing Units (“GPU”) 214. The components of the Processing Unit 210 may be utilized by the Operating System 255 for different functions required by the routines executed by the Indix Server 200. The Network Interface 230 may be utilized to form connections with the Network 150 or to form device-to-device connections with other computers. The Indix Server Memory 250 generally comprises a random access memory (“RAM”), a read only memory (“ROM”), and a permanent mass storage device, such as a disk drive or SDRAM (synchronous dynamic random-access memory). - The
Indix Server Memory 250 stores program code for software routines, such as, for example, Analysis Routine 400, Core Price Routine 500, Insight Routine 600, Volatility Routine 700, Substitution Routine 800, Mix Routine 900, Prediction Routine 1000, Competition Routine 1100, Promotion Routine 1200, Leadership Routine 1300, Premium Routine 1400, Price Range Routine 1500, Reach Routine 1600, and User Contact Routine 1700 as well as, for example, browser, email client and server routines, client applications, and database applications (discussed further below). Additional data groups for routines, such as for a webserver and web browser, may also be present on and executed by the Indix Server 200 and the other computers illustrated in FIG. 1. Webserver and browser routines may provide an interface for interaction among the computing devices, for example, through webserver and web browser routines which may serve and respond to data and information in the form of webpages and html documents or files. The browsers and webservers are meant to illustrate machine- and user-interface and user-interface enabling routines generally, and may be replaced by equivalent routines for serving and rendering information to and in interfaces in a computing device (whether in a web browser or in, for example, a mobile device application). - In addition, the
Indix Server Memory 250 also stores an Operating System 255. These software components may be loaded from a non-transient Computer Readable Storage Medium 295 into Indix Server Memory 250 of the computing device using a drive mechanism (not shown) associated with a non-transient Computer Readable Storage Medium 295, such as a floppy disc, tape, DVD/CD-ROM drive, memory card, or other like storage medium. In some embodiments, software components may also or instead be loaded via a mechanism other than a drive mechanism and Computer Readable Storage Medium 295 (e.g., via Network Interface 230). - The
computing device 200 may also comprise hardware supporting input modalities, Input 245, such as, for example, a touchscreen, a camera, a keyboard, a mouse, a trackball, a stylus, motion detectors, and a microphone. The Input 245 may also serve as a Display 240, as in the case of a touchscreen display which also serves as Input 245, and which may respond to input in the form of contact by a finger or stylus with the surface of the Input 245. - The
computing device 200 may also comprise or communicate via Bus 220 with Indix Datastore 300, illustrated further in FIG. 3. In various embodiments, Bus 220 may comprise a storage area network (“SAN”), a high speed serial bus, and/or other suitable communication technology. In some embodiments, the Indix Server 200 may communicate with the Indix Datastore 300 via Network Interface 230. The Indix Server 200 may, in some embodiments, include many more components than those shown in this Figure. However, it is not necessary that all of these generally conventional components be shown in order to disclose an illustrative embodiment. -
FIG. 3 is a functional block diagram of the Indix Datastore 300 illustrated in the computing device of FIG. 2. The components of the Indix Datastore 300 are data groups used by routines and are discussed further herein in the discussion of other of the Figures. The data groups used by routines illustrated in FIG. 3 may be represented by a cell in a column or a value separated from other values in a defined structure in a digital document or file. Though referred to herein as individual records or entries, the records may comprise more than one database entry. The database entries may be, represent, or encode numbers, numerical operators, binary values, logical values, text, string operators, joins, conditional logic, tests, and similar. -
FIG. 4 is a flowchart illustrating an embodiment of an Analysis Routine 400. The Analysis Routine 400 may be performed by, for example, the Indix Server 200. At box 405, the Analysis Routine 400 obtains a new set of Price Attribute 340 records and a new set of Product Attribute 345 records, with an assigned MPID 332 and Category 335. This may occur as frequently as URIs 305 are crawled and the webpages therefrom are parsed into Parse Results 325 comprising new Price Attribute 340 and Product Attribute 345 records with an assigned MPID 332 and Category 335. - At
box 500, the Analysis Routine 400 performs the Core Price Routine 500 (discussed further below). At box 410, for each iPID 330 associated with a Price Attribute 340 or Product Attribute 345 record in box 405, the Analysis Routine 400 appends the then-current Price Attribute 340 record (of box 405) to a set of Price Attribute 340 records associated with each iPID 330 (each iPID 330 may be associated with a set of Price Attribute 340 records). At box 415, for each iPID 330 associated with a Price Attribute 340 or Product Attribute 345 record in box 405, the Analysis Routine 400 merges the then-current Product Attribute 345 record (of box 405) into a Product Attribute 345 record associated with each iPID 330 (each iPID 330 may be associated with one Product Attribute 345 record). In this merger, new values overwrite old values unless the old record is longer or unless the old record otherwise is judged to be of higher quality (such as if the old record uses fewer words, but the words are less common than the words in the new record); if a new record does not have a value where an old value exists, the old value may be left. - At
box 420, the output of the Core Price Routine 500 and of the boxes above may be exported to a Sequential File 365 record, which Sequential File 365 record is stored, for example, in the Indix Database 300 and which Sequential File 365 is indexed, for example, to allow the contents of the Sequential File 365 to be searched and values in it accessed as it and the index are updated. Updates may occur, for example, in close to real-time, following crawl of a webpage and output of new Price Attribute 340 and Product Attribute 345 records. - At
box 600, the Analysis Routine 400 performs the Insight Routine 600 utilizing and expanding upon the output of the Core Price Routine 500 and the boxes, above. Generally, the Insight Routine 600 identifies what Product Attributes 345 and Price Attributes 340 across the datasets are associated with the changes in price. At box 435, the Analysis Routine 400 stores the output of the Insight Routine 600 in the Indix Database 300 as Insights 375. At box 1700, the Analysis Routine 400 performs the User Contact Routine 1700. Utilizing the User Contact Routine 1700, users may query the Indix Database 300 and set alerts. -
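The append and merge steps of boxes 410 and 415 can be illustrated with a minimal Python sketch. The function names, the dictionary layout of the records, and the use of string length as the only quality test are assumptions made for illustration; the description leaves the quality judgment open.

```python
def append_price_record(history, ipid, price_record):
    """Box 410 (sketch): append the newest price record to the iPID's history."""
    history.setdefault(ipid, []).append(price_record)


def merge_product_attributes(old, new):
    """Box 415 (sketch): merge a new product record into the stored one.

    Assumed rule: a new value overwrites the old value unless the new value
    is missing or the old value is longer.
    """
    merged = dict(old)
    for key, new_value in new.items():
        if new_value in (None, ""):
            continue  # no new value: the old value is left in place
        old_value = old.get(key)
        if old_value and len(str(old_value)) > len(str(new_value)):
            continue  # the old value is longer, so it is kept
        merged[key] = new_value
    return merged


old = {"title": "Acme Trail Running Shoe, Blue", "brand": "Acme", "color": "blue"}
new = {"title": "Acme Shoe", "brand": "ACME Inc.", "color": ""}
merged = merge_product_attributes(old, new)

history = {}
append_price_record(history, "ipid-1", {"price": 59.0})
append_price_record(history, "ipid-1", {"price": 55.0})
```

Here the longer stored title survives, the brand is overwritten, and the empty color in the new record leaves the stored color untouched.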
FIG. 5 is a flowchart illustrating an embodiment of a Core Price Routine 500. Boxes 505 through 540 may iterate for each new Price Attribute 340 record associated with an iPID 330. At box 510, all Price Attribute 340 records associated with the iPID 330, including the new Price Attribute 340 record of box 405 and historic records (and/or summary values derived therefrom), may be obtained. At box 515, the high, low, average, mean, magnitude, and number of price values over several time periods for the iPID 330 may be calculated. A default time period may be 45 or 30 days, though these values may be calculated for several time periods. The output may be saved, for example, to the Core Price 380 records, and indexed. - At
box 520, an MPID 332 associated with the iPID 330 may be obtained. At box 525, the high, low, average, mean, magnitude, and number of price values over several time periods may be calculated for the MPID 332 utilizing the new value associated with the iPID 330 from box 515. The iPID 330 may be a hash of a URI 305 and the result of box 515 is thus limited to a particular sales channel (typically a Store or Merchant) for a particular Product (taking into account that duplicate iPIDs 330 from a base domain name may be treated as equivalent); the MPID 332 is assigned to all iPIDs 330 which represent the same Product, so the MPID version of this calculation in box 525 returns values relating to the Product across Stores, Merchants, Locations, etc. The calculation of box 525 may return values which are or may be sorted by, for example, Store, Merchant, Location (such as Region), and by time periods such as a Season. The output may be saved, for example, to the Core Price 380 records, and indexed. - At
box 535, all calculations and other routines which utilize the values for the iPID 330 and the associated MPID 332 may insert the new values calculated for the iPID 330 and the MPID 332 and may recalculate the values. For example, the high, low, average, mean, magnitude, and number of price changes over time periods by Category 335, such as a Category 335 associated with the iPID 330, may be calculated. The output may be saved, for example, to the Core Price 380 records, and indexed. - Calculations or other routines which utilize the values calculated in
FIG. 5 may refer to data addresses. The Core Price Routine 500 may update the values stored at these data addresses, which causes the calculations or other routines to update their output when such calculations or other routines are (re)executed, such as on a schedule or on the occurrence of an event. - At
box 599, the Core Price Routine 500 may return, for example, to the Analysis Routine 400. -
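The per-iPID summary of box 515 can be sketched in Python as follows. This is a minimal illustration only: the function name, the `(timestamp, price)` tuple layout, and the choice of which statistics to return are assumptions, and the same function could be applied to the pooled records of an MPID for box 525.

```python
from datetime import datetime, timedelta
from statistics import mean


def core_price_summary(records, now, window_days=30):
    """Summarize (timestamp, price) records that fall inside the time window."""
    cutoff = now - timedelta(days=window_days)
    window = sorted((t, p) for t, p in records if cutoff <= t <= now)
    if not window:
        return None
    prices = [p for _, p in window]
    # A price change is counted whenever two consecutive observations differ.
    changes = sum(1 for a, b in zip(prices, prices[1:]) if a != b)
    return {
        "high": max(prices),
        "low": min(prices),
        "average": mean(prices),
        "count": len(prices),
        "changes": changes,
    }


records = [
    (datetime(2013, 5, 1), 100.0),   # outside the 30-day window
    (datetime(2013, 7, 1), 90.0),
    (datetime(2013, 7, 10), 90.0),
    (datetime(2013, 7, 20), 80.0),
]
summary = core_price_summary(records, now=datetime(2013, 7, 25))
```

Calling the function with `window_days=45` gives the other default period mentioned above.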
FIG. 6 is a flowchart illustrating an embodiment of an Insight Routine 600. - The
Insight Routine 600 may perform one or more of a set of sub-routines. At box 700, a Volatility Routine 700 may be performed to determine the volatility of prices relative to the many dimensions available in the PriceDNA. At box 800, a Substitution Routine 800 determines substitutes for an iPID 330, MPID 332, or Category 335. At box 900, a Mix Routine 900 determines “how many” relative to the many dimensions available in the PriceDNA. At box 1000, a Prediction Routine 1000 makes price predictions relative to the many dimensions available in the PriceDNA. At box 1100, a Competition Routine 1100 determines competitors relative to a Product, Store, or Brand. At box 1200, a Promotion Routine 1200 determines promotions relative to Products, Stores, Brand, Seasons, and other dimensions available in the PriceDNA. At box 1300, a Leadership Routine 1300 determines which Products lead or follow others in terms of price changes. At box 1400, a Premium Routine 1400 determines which Products in a Category 335 charge higher (premium) prices. At box 1500, a Price Range Routine 1500 determines the number of price ranges and maximum and minimum for iPIDs, MPIDs, and categories. At box 1600, a Reach Routine 1600 determines the reach of an iPID or MPID in terms of the number of people who visit a sales venue. -
FIG. 7 is a flowchart illustrating an embodiment of a Volatility Routine 700. At box 705, the Prices associated with an iPID 330 over a time period, such as 30 days, may be obtained, such as from the Core Price 380 records. At box 710, the number of price changes within the time period may be determined (if this was not already a value in the Core Price 380 records). At box 715, the number of price changes within the time period (“VBF”) may be determined relative to, for example, the iPID 330, relative to an MPID 332 associated with the iPID 330, relative to a Brand, relative to a Region, relative to a Price Band by MPID 332, relative to a Category 335, and relative to all iPIDs 330 associated with a Merchant. The values may be saved and indexed to accelerate access to and/or enable searching for the values and/or the values may be calculated on an as-needed basis. The values may be saved to the Insights 375 records. - At
box 720, the benchmark number of Price changes in the period of time may be determined. The benchmark may be, for example, the VBF relative to additional criteria, such as, for example, the VBF for a Product (or MPID), plus 1, divided by the maximum VBF of other Products in the same Category as the Product, multiplied by 100 over 101. The benchmark VBF for a Category may be determined by the VBF for the Category, plus 1, divided by the maximum VBF of the Category, multiplied by 100 over 101. The benchmark VBF for a Merchant may be the VBF of the Merchant, plus 1, divided by the maximum VBF of the Merchant, multiplied by 100 over 101. The benchmark VBF for a Brand may be the VBF of the Brand, plus 1, divided by the maximum VBF of the Brand, multiplied by 100 over 101. The values may be saved to the Insights 375 records. -
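The VBF count and its benchmark can be sketched in Python. The grouping of the arithmetic, (VBF + 1) / max VBF × 100 / 101, is one reading of the sentence above and is an assumption, as are the function names.

```python
def count_price_changes(prices):
    """VBF (sketch): number of times consecutive prices differ in the period."""
    return sum(1 for a, b in zip(prices, prices[1:]) if a != b)


def benchmark_vbf(vbf, max_vbf):
    """Benchmark (assumed grouping): (VBF + 1) / max VBF * 100 / 101."""
    if max_vbf <= 0:
        raise ValueError("max_vbf must be positive")
    return (vbf + 1) / max_vbf * 100 / 101


product_vbf = count_price_changes([10, 10, 9, 9, 11])
score = benchmark_vbf(4, 10)
```

The same `benchmark_vbf` function applies to a Category, Merchant, or Brand by passing the corresponding VBF and maximum.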
FIGS. 8A-8C are flowcharts illustrating embodiments of a Substitution Routine 800. In a first example of an embodiment of a Substitution Routine 800 illustrated in FIG. 8A, substitute Products within a Category 335 are identified. At box 801, which, like other steps, may be optional, a Product may be identified by, for example, a user or a routine, and the MPID 332 corresponding thereto may be obtained. At box 805, a Category 335 may be obtained, whether corresponding to the Product and MPID of step 801 or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained. At box 810, a Price Band may be obtained or calculated relative to the Category 335 (such as from or according to the Price Range Routine 1500); the Price Band may be selected by a user. Boxes 815 through 830 may iterate for each iPID 330 within the Category of box 805. - At
box 820, the iPIDs 330 in the Category of box 805 and with a Price value within the Price Band of box 810 are identified, such as from the Core Price 380 records. At box 825, the result of box 820 may be subdivided, grouped, or filtered by Region, Time, Used/New, and according to other dimensions available in the PriceDNA. At box 830, the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the Category 335. At box 835, the results may be saved as Substitutes, such as to the Insights 375 records. At box 839, the process may return. - In a second example of an embodiment of a
Substitution Routine 800 illustrated in FIG. 8B, substitute Products within a Category 335 with a percentage overlap in Attributes 340/345 and within a Price Band are identified. At box 840, a Category 335 may be obtained, whether corresponding to a Product or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained. At box 845, the Product Attributes 345 of all iPIDs 330 within the MPIDs 332 may be obtained. At box 850, the Product Attributes 345 may be clustered to identify the iPIDs 330 with at least a 50% Product Attribute 345 match or overlap. At box 855, a Price Band may be obtained or calculated relative to the Category 335 (such as from or according to the Price Range Routine 1500); the Price Band may be selected by a user. -
Boxes 860 through 870 may iterate for each iPID 330 within the MPIDs 332 and Attribute 345 match of box 850. At box 865, the iPIDs 330 with a Price value within the Price Band of box 855 and with the Product Attribute 345 match or overlap of box 850 are identified. The result of box 865 may be subdivided or grouped further by sub-Price Ranges. At box 870, the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the MPIDs 332 within the Category 335. At box 871, the results may be saved as Substitutes in the Insights 375 records. At box 874, the process may return. - In a third example of an embodiment of a
Substitution Routine 800 illustrated in FIG. 8C, substitute Products within a Category 335 with a percentage overlap in Attributes 340/345 and in the top or bottom of a Price Range are identified. At box 875, a Category 335 may be obtained, whether corresponding to a Product or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained. At box 880, the Product Attributes 345 of all iPIDs 330 within the MPIDs 332 may be obtained. At box 885, the Product Attributes 345 may be clustered to identify the iPIDs 330 with at least a 50% Product Attribute 345 match or overlap. -
Boxes 890 through 897 may iterate for each iPID 330 within the MPIDs 332 and Attribute 345 match of box 885. At box 895, the iPIDs 330 with the Product Attribute 345 match or overlap of box 885 and in the bottom of a Price Range or Price Band relative to the starting iPID 330 are identified. At box 896, the top or bottom five (or another subset) of box 895 may be selected. At box 897, this embodiment of the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the MPIDs 332 within the Category 335. At box 898, the results may be saved as Substitutes in the Insights 375 records. At box 899, the process may return. -
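The second embodiment (FIG. 8B) can be sketched in Python as a filter on attribute overlap and Price Band. A pairwise overlap score is used here in place of the clustering step, which is a simplification; the record layout, the function names, and the definition of overlap (share of the target's attribute values also present in the candidate) are assumptions.

```python
def attribute_overlap(target_attrs, candidate_attrs):
    """Share of the target's attribute values that the candidate also has."""
    if not target_attrs:
        return 0.0
    shared = sum(1 for k, v in target_attrs.items() if candidate_attrs.get(k) == v)
    return shared / len(target_attrs)


def find_substitutes(target_ipid, products, price_band, min_overlap=0.5):
    """Return iPIDs inside the Price Band with at least min_overlap attribute match."""
    low, high = price_band
    target = products[target_ipid]
    return sorted(
        ipid
        for ipid, p in products.items()
        if ipid != target_ipid
        and low <= p["price"] <= high
        and attribute_overlap(target["attributes"], p["attributes"]) >= min_overlap
    )


products = {
    "a": {"price": 50.0, "attributes": {"type": "shoe", "color": "red", "size": "10", "brand": "X"}},
    "b": {"price": 55.0, "attributes": {"type": "shoe", "color": "red", "size": "9", "brand": "Y"}},
    "c": {"price": 52.0, "attributes": {"type": "shoe", "color": "blue", "size": "9", "brand": "Y"}},
    "d": {"price": 120.0, "attributes": {"type": "shoe", "color": "red", "size": "10", "brand": "X"}},
}
substitutes = find_substitutes("a", products, price_band=(40.0, 60.0))
```

In this sample, "b" qualifies with a 50% match, "c" falls below the threshold, and "d" matches fully but lies outside the Price Band. Sorting the result by price and taking the first or last five entries would approximate the selection of box 896.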
FIG. 9 is a flowchart illustrating an embodiment of a Mix Routine 900. The Mix Routine 900 determines “how many” relative to the many dimensions available in the PriceDNA. At block 905, the Mix Routine 900 obtains a first segmentation criterion, such as, for example, a Product Name, Brand, or Category. At block 910, a first sub-segmentation criterion may be obtained, such as, for example, a Store, Location, or Price Band. At block 915, a second sub-segmentation criterion may be obtained, such as, for example, a Store, Location, or Price Band. At block 920, the number of Products, such as by MPID 332, which meet the criteria of the preceding blocks may be determined. At block 925, the result of block 920 may be subdivided or grouped by Location, Time, Season, Price Band, Used/New or other dimensions available in the PriceDNA. At block 930, the results of blocks 920 and/or 925 may be saved as Mix values in the Insights 375 records. At block 999, the process may return. -
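The counting and grouping of blocks 920 and 925 can be sketched in Python. The flat record layout, the field names, and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mix_counts(records, criteria, group_by=None):
    """Count distinct MPIDs matching all criteria, optionally grouped by one field."""
    groups = defaultdict(set)
    for record in records:
        if all(record.get(field) == value for field, value in criteria.items()):
            key = record.get(group_by) if group_by else "all"
            groups[key].add(record["mpid"])
    return {key: len(mpids) for key, mpids in groups.items()}


records = [
    {"mpid": "m1", "brand": "X", "store": "S1", "location": "US"},
    {"mpid": "m1", "brand": "X", "store": "S1", "location": "UK"},
    {"mpid": "m2", "brand": "X", "store": "S1", "location": "US"},
    {"mpid": "m3", "brand": "X", "store": "S2", "location": "US"},
    {"mpid": "m4", "brand": "Y", "store": "S1", "location": "US"},
]
total = mix_counts(records, {"brand": "X", "store": "S1"})
by_location = mix_counts(records, {"brand": "X", "store": "S1"}, group_by="location")
```

Using a set per group ensures that a Product listed several times under the same MPID is counted once.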
FIG. 10 is a flowchart illustrating an embodiment of a Prediction Routine 1000. The Prediction Routine 1000 makes price predictions relative to the many dimensions available in the PriceDNA. At block 1005, the Prediction Routine 1000 obtains a Product and obtains or identifies an MPID 332 and/or iPIDs 330 associated therewith. At block 1010, the last Price of the Product by MPID 332 and/or iPID 330 may be obtained, such as from the Core Price 380 records. At block 1015, first and second linear regression parameters may be calculated or obtained. - At
block 1020, to the first parameter may be added the second parameter multiplied by the last price of the Product from block 1010. At block 1025, an error term may be added to the result of block 1020. At block 1030, a confidence interval may be calculated. At block 1035, the result may be saved as Predictions in the Insights 375 records. At block 1035, the Prediction Routine 1000 may then return. - In
FIG. 10, the predicted Price for a product may be determined according to the following equation: $p_t = \alpha + \beta p_{t-1} + \varepsilon$, where $p_t$ is the price at time $t$, $\alpha$ and $\beta$ are the parameters of the linear regression, and $\varepsilon$ is the error term, which is assumed to be Normally distributed. Confidence, C, is a measure that represents the chance of making a 0.01% error in predicting the price of the product, C = normsdist(Z), and -
- In this formula, the parameters of the model are estimated using the ordinary least squares method as follows:
-
-
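The regression of blocks 1015 and 1020 can be sketched in Python. The sketch assumes the textbook closed-form least squares estimates for a simple regression of each price on the previous price, $\beta = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\alpha = \bar{y} - \beta \bar{x}$, with $x_i = p_{i-1}$ and $y_i = p_i$; the function names are hypothetical, and the confidence calculation is not attempted.

```python
def fit_price_regression(prices):
    """Least squares fit of p_t = alpha + beta * p_(t-1) on a price series."""
    x = prices[:-1]   # lagged prices p_(t-1)
    y = prices[1:]    # current prices p_t
    n = len(x)
    if n < 2:
        raise ValueError("need at least three prices")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged prices are constant; beta is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def predict_next_price(prices):
    """Block 1020 (sketch): alpha plus beta times the last observed price."""
    alpha, beta = fit_price_regression(prices)
    return alpha + beta * prices[-1]


# Sample series generated exactly by p_t = 2 + 0.5 * p_(t-1).
series = [10.0, 7.0, 5.5, 4.75, 4.375]
alpha, beta = fit_price_regression(series)
next_price = predict_next_price(series)
```

The point prediction omits the error term, whose expected value is zero under the Normal assumption stated above.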
FIG. 11 is a flowchart illustrating an embodiment of a Competition Routine 1100. The Competition Routine 1100 determines competitors relative to Stores, Brands, or Merchants. At box 1105, a first and second (or more) Store, Brand, or Merchant may be obtained, along with an optional Category 335. These may be obtained from a user or another routine. At box 1110, all Products sold by or under each of the entities of box 1105 may be obtained, such as from the PriceDNA. The Products may optionally be filtered by the Category of box 1105. - At
box 1115, a determination may be made regarding whether or not the entities of box 1105 have 70% or more overlapping Products, per the Products of box 1110. The affirmative output of this box may be saved as Competitors in the Insights 375 records. - At
box 1120, the Competitors may be filtered by, for example, one or more of Store, Substitute, Substitute by Price Band, Brand, Location (including Region), Time (including Season), and whether the Products are sold as used or new. Which criteria are used in the filter may be determined by input from a user. The output of box 1120 may be saved in the Insights 375 records. - At
box 1125, the average price of Products in the Category 335 of box 1105 may be obtained relative to, for example, the Category 335, Substitute, Substitute by Price Band, Brand, Location, Time, used/new status, and other criteria. At box 1130, the output of box 1125 may be ranked and saved as Price Competitiveness in the Insights 375 records. - At
box 1135, a Store and Location for a target Product may be obtained, such as from a user. At box 1145, the Competitors from box 1115 may be obtained or determined and the Competitors filtered to select only Competitors with sales in the Location of box 1135. At box 1145, Stores in the Location which are the same as the Store of box 1135 may be removed from the set of Competitors, leaving the remainder (those not removed). - At
box 1150, the output of box 1150 may be placed in a Voronoi Diagram or similar data structure, with the location in the Voronoi Diagram being based on physical location of the Stores of the Competitors. Generally, a Voronoi Diagram determines the distance between objects in a geometric manner, rather than a power-law manner. At box 1155, the distance between the target Store and each Competitor may be ranked. At box 1160, the output of box 1160 may be saved as Reach Competitiveness in the Insights 375 records. -
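The 70% overlap test of box 1115 can be sketched in Python. How overlap is normalized is not stated above; measuring it against the smaller of the two catalogs is an assumption, as are the function names.

```python
def product_overlap(products_a, products_b):
    """Shared Products as a fraction of the smaller catalog (assumed definition)."""
    smaller = min(len(products_a), len(products_b))
    if smaller == 0:
        return 0.0
    return len(products_a & products_b) / smaller


def are_competitors(products_a, products_b, threshold=0.7):
    """Box 1115 (sketch): two entities compete if overlap meets the threshold."""
    return product_overlap(products_a, products_b) >= threshold


store_a = set(range(1, 11))    # MPIDs 1..10
store_b = set(range(4, 14))    # MPIDs 4..13, seven of which are shared
store_c = {1, 2, 50, 51, 52}
ab = are_competitors(store_a, store_b)
ac = are_competitors(store_a, store_c)
```

A Jaccard index (intersection over union) would be a stricter alternative normalization.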
FIG. 12 is a flowchart illustrating an embodiment of a Promotion Routine 1200. The Promotion Routine 1200 determines promotions relative to Products, Stores, Brand, Seasons, and other dimensions available in the PriceDNA. At box 1205, a Product may be obtained, such as from user input, and the MPID 332 and/or an iPID 330 corresponding to the Product may be identified in the Attributes 340/345 (via, for example, the Sequential File 365). The Product may be a single Product or a Bundle comprising multiple Products. At box 1210, a “Promotion” value may be identified in the Attributes 340/345 associated with the MPID 332 and/or iPID 330; the “Promotion” value may be a Sale Price and/or a Promotion Code in the Price Attribute 340 records associated with the MPID 332 and/or iPID 330. Alternatively, at box 1210 the Price history for the MPID 332 and/or iPID 330 may be graphed. - At
box 1215, the number, length, date/time, and magnitude of the Promotions may be determined and saved as Promotions in the Insights 375 records. Alternatively, the number, length, date/time, and magnitude of the low-points in the graph of box 1210 may be determined and saved as Promotions in the Insights 375 records. At box 1220, the output of box 1215 may be filtered by criteria such as, for example, date/time, Price Band, Location (including Region), Season, and Holidays. The criteria may be received from, for example, a user and/or a default set of criteria may be applied, with the result of each being saved in the Insights 375 records. - At box 1225 a time period and a Merchant may be obtained, such as from a user; the Merchant may be associated with the Product of
box 1205. At box 1225, the number of Products sold by the Merchant in Promotion during the time period may be determined. - At
box 1230, the result of box 1215 may be benchmarked relative to average Promotion times, durations, and magnitude for other Products (including other Bundles of the Product), the timing of Promotions for other Products, relative to the magnitude of Promotions for other Products, relative to the Products associated with a Brand, relative to all Products sold at a Store, relative to Products in a Price Band, and relative to Competitors and Substitutes. The result may be saved in the Insights 375 records. -
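The alternative path of boxes 1210 and 1215, in which Promotions are inferred from low-points in the Price history, can be sketched in Python. Treating the median price as the regular price, and any run of observations below it as one Promotion, is an assumption made for illustration; the function name and record layout are hypothetical.

```python
from statistics import median


def find_promotions(history):
    """Find runs of below-baseline prices in an ordered list of (day, price)."""
    baseline = median(price for _, price in history)
    promotions = []
    run = []
    for day, price in history:
        if price < baseline:
            run.append((day, price))
        elif run:
            promotions.append(_summarize(run, baseline))
            run = []
    if run:  # the history ended while a promotion was still running
        promotions.append(_summarize(run, baseline))
    return promotions


def _summarize(run, baseline):
    """Start day, length in observations, and deepest discount of one run."""
    return {
        "start": run[0][0],
        "length": len(run),
        "magnitude": baseline - min(price for _, price in run),
    }


history = [(1, 100), (2, 100), (3, 80), (4, 80), (5, 100), (6, 100), (7, 90), (8, 100)]
promotions = find_promotions(history)
```

The number of Promotions is then `len(promotions)`, and each entry carries the length, date, and magnitude named in box 1215.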
FIG. 13 is a flowchart illustrating an embodiment of a Leadership Routine 1300. The Leadership Routine 1300 determines which Products lead or follow others in terms of price changes. At box 1305, a Product may be obtained, for example, from a user or another routine, and the associated MPID 332 determined. At box 1310, Substitutes for the Product may be obtained (such as from or by the Substitution Routine 800). At box 1315, the change in Price, or Price delta, for the Product and the Substitutes may be determined over periods of time. The Price delta may be determined in an absolute sense (whether the change was positive or negative) and/or with a determination of the magnitude of the Price delta. - At
box 1320, the Price deltas determined at box 1315 may be matched, to determine if any of the Price deltas with the same absolute value (positive or negative) occurred within a time window of one another (deltas beyond the time window may not be considered to be correlated), with the result being saved as a Leader/Follower indication in the Insights 375 records. - At
box 1325, the matching Price deltas of box 1320 may be graphed according to time. At box 1330, the result of box 1325 may be filtered by criteria such as Region, Time, Date/Time, Season, Price Band, and Store. - At
box 1335, the number of Leaders and Followers may be determined relative to a time period. At box 1340, the average lead/follow time may be determined. At box 1345, leaders/followers with respect to exact Product matches (for different Stores selling the same Product, determined at box 1330) may be identified. At box 1350, the results may be benchmarked relative to the number of leaders/followers and other criteria. The result of various of the boxes in FIG. 13 may be saved in the Insights 375 records. At box 1399, the Leadership Routine 1300 may return. -
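The delta matching of boxes 1315 and 1320 can be sketched in Python. Two deltas are treated as matching when they are equal in sign and size and fall within the time window of one another; that matching rule, the labels "a" and "b", and the function names are assumptions.

```python
def price_deltas(history):
    """Box 1315 (sketch): (time, delta) for each change in an ordered (time, price) list."""
    return [
        (t2, p2 - p1)
        for (_, p1), (t2, p2) in zip(history, history[1:])
        if p2 != p1
    ]


def leader_follower(deltas_a, deltas_b, window):
    """Box 1320 (sketch): list (leader, follower, lag) for matched deltas."""
    matches = []
    for time_a, delta_a in deltas_a:
        for time_b, delta_b in deltas_b:
            lag = time_b - time_a
            if delta_a == delta_b and lag != 0 and abs(lag) <= window:
                if lag > 0:
                    matches.append(("a", "b", lag))
                else:
                    matches.append(("b", "a", -lag))
    return matches


product = [(0, 100), (5, 90), (20, 95)]      # (day, price)
substitute = [(0, 100), (7, 90), (30, 95)]
matches = leader_follower(price_deltas(product), price_deltas(substitute), window=3)
```

The average lead/follow time of box 1340 is the mean of the `lag` values in the result.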
FIG. 14 is a flowchart illustrating an embodiment of a Premium Routine 1400. The Premium Routine 1400 determines which Products (generally, by MPID) in a Category 335 charge higher Prices (premium). At box 1405, a Product may be received, such as from input by a user or another routine. At box 1410, the Substitutes for the Product may be determined or obtained from another routine, such as the Substitution Routine 800 and/or the Insights 375 records. At box 1415, the Prices of the Product and of the Substitutes may be obtained, such as from the Core Price 380 records. At box 1420, the obtained Prices of box 1415 may be graphed or mapped and the top of the Price distribution identified. The top of the Price distribution may be, for example, the top five or ten percent, or the top five Products or Substitutes; these may be identified and saved as the “Premium” Products in the Insights 375 records. - At
box 1425, the Product Attributes 345 of the Products and Substitutes of box 1410 may be obtained and clustered by similarity. At box 1430, the Product Attributes 345 unique to or dominant in the Premium Products, determined by the clusters of box 1425, may be identified and saved in the Insights 375 records. - At
box 1435, user votes regarding Product Attributes 345 of Premium Products may be received. At box 1440, the user votes may be tallied and, at box 1445, the “winning” Product Attributes 345 (with the most votes) may be set as the Product Attributes 345 associated with the Premium Products in the Insights 375 records. -
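The selection of box 1420 can be sketched in Python as taking the top fraction of a price ranking. The function name, the mapping from MPID to Price, and rounding the cutoff up to at least one Product are assumptions.

```python
import math


def premium_products(prices_by_mpid, top_fraction=0.1):
    """Return the MPIDs in the top fraction of the price ranking, highest first."""
    ranked = sorted(prices_by_mpid.items(), key=lambda item: item[1], reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return [mpid for mpid, _ in ranked[:keep]]


# Ten Products priced 10, 20, ..., 100.
prices = {f"m{i}": 10.0 * i for i in range(1, 11)}
premium = premium_products(prices, top_fraction=0.2)
```

Replacing the fraction with a fixed count (for example `ranked[:5]`) gives the "top five Products" variant mentioned above.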
FIG. 15 is a flowchart illustrating an embodiment of a Price Range Routine 1500. The Price Range Routine 1500 determines the number of price ranges and the maximum and minimum for iPIDs, MPIDs, and categories. At box 1505, a Product may be obtained, such as from a user or another routine. At box 1510, the Prices for the Product may be obtained, such as from the PriceDNA for the Product. At box 1515, the Prices of box 1510 may be clustered by similarity and with a minimum cluster size, with the range in Price across each cluster being saved as Price Ranges for the Product in the Insights 375 records.
At box 1520, the Channel Range for the Product may be set as the minimum and maximum of the Prices of box 1510 and saved in the Insights 375 records. At box 1525, the results of the preceding boxes may be saved in the Insights 375 records. At box 1530, the Price Ranges may be determined relative to all Products in a Category 335, all Products by a Brand, and relative to a benchmark which may be, for example, the maximum number of Price Ranges within a Category 335. The result thereof may be saved as Price Ranges in the Insights 375 records.
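One way to picture boxes 1515 and 1520 of the Price Range Routine 1500 is the Python sketch below. It assumes a simple gap-based grouping of sorted Prices as the similarity clustering; the `max_gap` and `min_cluster_size` parameters, the function names, and the sample Prices are illustrative choices, not values taken from the disclosure.

```python
def price_ranges(prices, max_gap=5.0, min_cluster_size=2):
    """Group sorted prices into clusters and return each cluster's (low, high).

    A new cluster starts whenever the gap to the previous price exceeds
    `max_gap`. Clusters with fewer than `min_cluster_size` prices are dropped.
    """
    clusters, current = [], []
    for price in sorted(prices):
        if current and price - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(price)
    if current:
        clusters.append(current)
    return [(c[0], c[-1]) for c in clusters if len(c) >= min_cluster_size]


def channel_range(prices):
    """Return the overall (minimum, maximum) of the observed prices."""
    return (min(prices), max(prices))


# Hypothetical Prices observed for one Product across several Stores.
observed = [9.99, 10.49, 11.0, 24.0, 25.5, 60.0]
ranges = price_ranges(observed)        # clusters that meet the minimum size
overall = channel_range(observed)      # Channel Range across all Prices
```

The number of Price Ranges for the Product is then simply `len(ranges)`, which can be compared against a Category-level or Brand-level benchmark as in box 1530.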
FIG. 16 is a flowchart illustrating an embodiment of a Reach Routine 1600. The Reach Routine 1600 determines the reach of an iPID or MPID in terms of the number of people who visit a sales venue. At box 1605, a Product may be obtained, such as from a user or another routine. At box 1610, the Stores offering the Product for sale may be obtained. At box 1615, the traffic at the Stores may be obtained, such as from a source for online webpage/website traffic, such as Alexa or similar. At box 1620, the result of box 1615 may be filtered by, for example, criteria such as Date/Time (including Season), Location (including Region), Holiday, and other criteria. The result thereof may be saved as Reach in the Insights 375 records. At box 1699, the Reach Routine 1600 may return.
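The filtering and aggregation of boxes 1610 through 1620 can be sketched in Python as follows. The `TrafficRecord` type, its fields, and the sample figures are hypothetical; a real traffic source would supply its own record format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TrafficRecord:
    """One traffic observation for a Store (hypothetical record layout)."""
    store: str
    day: date
    region: str
    visitors: int


def reach(stores, traffic, start=None, end=None, region=None):
    """Sum visitors at the given Stores, optionally filtered by date and region."""
    total = 0
    for rec in traffic:
        if rec.store not in stores:
            continue
        if start is not None and rec.day < start:
            continue
        if end is not None and rec.day > end:
            continue
        if region is not None and rec.region != region:
            continue
        total += rec.visitors
    return total


# Hypothetical data: Stores offering the Product, plus traffic observations.
offering_stores = {"shop-a", "shop-b"}
traffic_log = [
    TrafficRecord("shop-a", date(2013, 7, 1), "US", 1000),
    TrafficRecord("shop-a", date(2013, 12, 24), "US", 3000),
    TrafficRecord("shop-b", date(2013, 12, 24), "EU", 500),
    TrafficRecord("shop-c", date(2013, 12, 24), "US", 9999),  # does not sell it
]
total_reach = reach(offering_stores, traffic_log)
holiday_us_reach = reach(
    offering_stores, traffic_log,
    start=date(2013, 12, 1), end=date(2013, 12, 31), region="US",
)
```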
FIG. 17 is a flowchart illustrating an embodiment of a User Contact Routine 1700. At box 1705, a user contact with the User Contact Routine 1700 may be detected. The user contact may be part of a user-interface served by the User Contact Routine 1700. At box 1710, a user query may be received, such as for PriceDNA records and/or Insight records. At box 1715, the user query may be executed relative to the Index 370 and the Sequential File 365. At box 1720, a determination may be made regarding whether the user has requested that the query be stored as an alert. If so, then at box 1725 a time period for the alert may be obtained or set (such as according to a default time period, such as once per day or week). At box 1730, on occurrence of the time period of box 1725, the query may be executed relative to the Index 370 and the Sequential File 365. At box 1735, an alert or other message may be sent to contact information associated with the user. At box 1799, the User Contact Routine 1700 may conclude.
The above Detailed Description of embodiments is not intended to be exhaustive or to limit the disclosure to the precise form disclosed above. While specific embodiments and examples are described above for illustrative purposes, various equivalent modifications are possible within the scope of the system, as those skilled in the art will recognize. For example, while processes or blocks are presented in a given order, alternative embodiments may perform routines having operations, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified. While processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times. Further, any specific numbers noted herein are only examples; alternative implementations may employ differing values or ranges.
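By way of a non-limiting example, the stored-alert handling of the User Contact Routine 1700 (boxes 1720 through 1735) might be sketched in Python as below. The `Alert` type, the `run_due_alerts` helper, and the `send` callback are hypothetical names introduced only for this sketch; the query is modeled as a plain callable rather than an actual search of the Index 370 and the Sequential File 365.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class Alert:
    """A stored query with a repeat period and the user's contact information."""
    contact: str
    query: Callable[[], list]
    period: timedelta = timedelta(days=1)   # default: once per day
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        """An alert is due if it never ran or a full period has elapsed."""
        return self.last_run is None or now - self.last_run >= self.period


def run_due_alerts(alerts, now, send):
    """Execute every due alert, send its results, and return how many ran."""
    sent = 0
    for alert in alerts:
        if alert.is_due(now):
            send(alert.contact, alert.query())
            alert.last_run = now
            sent += 1
    return sent


# Hypothetical usage: one daily alert whose query returns a fixed result.
outbox = []
stored = [Alert(contact="user@example.com", query=lambda: ["price drop"])]
t0 = datetime(2013, 7, 25, 9, 0)
first = run_due_alerts(stored, t0, lambda c, r: outbox.append((c, r)))
second = run_due_alerts(stored, t0 + timedelta(hours=1), lambda c, r: outbox.append((c, r)))
third = run_due_alerts(stored, t0 + timedelta(days=1), lambda c, r: outbox.append((c, r)))
```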
Claims (24)
1. A computer implemented method of processing information from webpages, the method comprising:
receiving a first and a second set of price and product attributes for a first product, which attributes comprise:
a first identifier of a first identifier-type derived from a URI which links to a webpage offering the product for sale, a second identifier of a second identifier-type assigned to all instances of the product as offered for sale at any URI, and a first category in a category taxonomy;
performing a first URI-specific price analysis of price values in the first and second sets of price and product attributes to identify changes in price for the first product and associating the result with the first identifier of the first identifier-type and saving the result as a first URI-specific core price result;
receiving a third and fourth set of price and product attributes for a second product, which attributes comprise:
a third identifier of the first identifier-type, a fourth identifier of the second identifier-type, and a second category in the category taxonomy;
performing a second URI-specific price analysis of price values in the third and fourth sets of price and product attributes to identify changes in price for the second product and associating the result with the third identifier of the first identifier-type and saving the result as a second URI-specific core price result;
when the second and fourth identifiers are the same, performing a first non-URI-specific price analysis utilizing the first and second URI-specific core price results to identify changes in price according to the second identifier-type and saving the result as a first non-URI-specific core price result;
saving and indexing the output of the URI-specific and non-URI-specific price analyses in a first file structure and making the first file structure available to be searched substantially as the sets of product and price attributes are received;
performing a meta-analysis utilizing the URI-specific and non-URI-specific core price results to identify what product and price attributes across the datasets are associated with the changes in price; and
saving and indexing the output of the meta-analysis as a second file structure and making the second file structure available to be searched.
2. The method of claim 1, further comprising merging new product attribute records into prior product attribute records and saving new price attribute records along with prior price attribute records.
3. The method of claim 1, wherein the URI-specific price analysis comprises determining the high, low, average, mean, magnitude and number of price changes over at least one time period for the price and product attributes associated with the same identifier of the first identifier-type.
4. The method of claim 1, wherein the non-URI-specific price analysis comprises determining the high, low, average, mean, magnitude and number of price changes over at least one time period for the price and product attributes associated with the same identifier of the second identifier-type.
5. The method of claim 4, wherein at least one of the first and second identifier-types is further associated with at least one of a store, a merchant, and a location and wherein the non-URI-specific price analysis produces results associated therewith.
6. The method of claim 1, wherein the price attributes comprise at least one of a time, a product name, a price, a quantity, a unit of measurement, a merchant name, a store name, a bundle detail, and a location.
7. The method of claim 1, wherein the product attributes comprise at least one of a title, a brand, a category in the category taxonomy, a color, a product type, and a size.
8. The method of claim 1, further comprising receiving a user query and executing the query relative to the first and/or second file structures.
9. The method of claim 1, further comprising receiving a user query and a schedule for executing the query, executing the query at the scheduled time on the first and/or second file structures, and alerting the user regarding the result of the query.
10. The method of claim 1, wherein the first and second file structures may be searched by at least one of the first identifier-type, the second identifier-type, or a category in the category taxonomy.
11. The method of claim 1, wherein the first and second categories are the same.
12. The method of claim 1, wherein the meta-analysis determines the volatility of price changes over time for each of the first and second products.
13. The method of claim 12, wherein the volatility is determined by counting the number of price changes in a time period according to at least one of the first identifier-type, the second identifier-type, a brand, a region, a price band, and a category in the category taxonomy.
14. The method of claim 1, wherein the meta-analysis determines whether one of the products is a substitute for the other.
15. The method of claim 14, wherein whether one of the products is a substitute for the other is determined by determining if the first and second products are in the same category in the category taxonomy and by determining whether the first and second products are within a price band within the category.
16. The method of claim 15, further comprising determining if the first and second products share at least fifty percent of the same product attributes.
17. The method of claim 1, wherein the meta-analysis determines predictions regarding the future prices for the products.
18. The method of claim 17, wherein the predictions are determined by obtaining the last price of at least one of the products from the URI-specific core price associated therewith, calculating or obtaining first and second linear regression parameters, multiplying the second linear regression parameter by the last price and adding this to the first linear regression parameter.
19. The method of claim 1, wherein the price and product attributes comprise at least one of a store, merchant, or brand and the meta-analysis determines products associated therewith and competitors thereof.
20. The method of claim 1, wherein the meta-analysis determines whether a price change for the first product leads or follows a price change for the second product.
21. The method of claim 1, wherein the meta-analysis determines whether the first or second product is a premium product relative to the other.
22. The method of claim 1, wherein the meta-analysis determines the price ranges in which the products are offered for sale.
23. A webpage information processing computing apparatus, the apparatus comprising a processor and a memory storing instructions that, when executed by the processor, configure the apparatus to:
receive a first and a second set of price and product attributes for a first product, which attributes comprise:
a first identifier of a first identifier-type derived from a URI which links to a webpage offering the product for sale, a second identifier of a second identifier-type assigned to all instances of the product as offered for sale at any URI, and a first category in a category taxonomy;
perform a first URI-specific price analysis of price values in the first and second sets of price and product attributes to identify changes in price for the first product and associate the result with the first identifier of the first identifier-type and save the result as a first URI-specific core price result;
receive a third and fourth set of price and product attributes for a second product, which attributes comprise:
a third identifier of the first identifier-type, a fourth identifier of the second identifier-type, and a second category in the category taxonomy;
perform a second URI-specific price analysis of price values in the third and fourth sets of price and product attributes to identify changes in price for the second product and associate the result with the third identifier of the first identifier-type and save the result as a second URI-specific core price result;
when the second and fourth identifiers are the same, perform a first non-URI-specific price analysis utilizing the first and second URI-specific core price results to identify changes in price according to the second identifier-type and save the result as a first non-URI-specific core price result;
save and index the output of the URI-specific and non-URI-specific price analyses in a first file structure and make the first file structure available to be searched substantially as the sets of product and price attributes are received;
perform a meta-analysis utilizing the URI-specific and non-URI-specific core price results to identify what price and product attributes across the datasets are associated with the changes in price; and
save and index the output of the meta-analysis as a second file structure and make the second file structure available to be searched.
24. A non-transient computer-readable storage medium having stored thereon instructions that, when executed by a processor, configure the processor to:
receive a first and a second set of price and product attributes for a first product, which attributes comprise:
a first identifier of a first identifier-type derived from a URI which links to a webpage offering the product for sale, a second identifier of a second identifier-type assigned to all instances of the product as offered for sale at any URI, and a first category in a category taxonomy;
perform a first URI-specific price analysis of price values in the first and second sets of price and product attributes to identify changes in price for the first product and associate the result with the first identifier of the first identifier-type and save the result as a first URI-specific core price result;
receive a third and fourth set of price and product attributes for a second product, which attributes comprise:
a third identifier of the first identifier-type, a fourth identifier of the second identifier-type, and a second category in the category taxonomy;
perform a second URI-specific price analysis of price values in the third and fourth sets of price and product attributes to identify changes in price for the second product and associate the result with the third identifier of the first identifier-type and save the result as a second URI-specific core price result;
when the second and fourth identifiers are the same, perform a first non-URI-specific price analysis utilizing the first and second URI-specific core price results to identify changes in price according to the second identifier-type and save the result as a first non-URI-specific core price result;
save and index the output of the URI-specific and non-URI-specific price analyses in a first file structure and make the first file structure available to be searched substantially as the sets of product and price attributes are received;
perform a meta-analysis utilizing the URI-specific and non-URI-specific core price results to identify what price and product attributes across the datasets are associated with the changes in price; and
save and index the output of the meta-analysis as a second file structure and make the second file structure available to be searched.
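For illustration only, the price analysis recited in claim 3 and the prediction recited in claim 18 might be sketched in Python as follows. The function names are hypothetical. Fitting the regression of each price on the price before it is an assumption of this sketch; claim 18 states only that the second linear regression parameter is multiplied by the last price and added to the first.

```python
from statistics import mean


def core_price_summary(prices):
    """Summarize a chronological price series for one identifier."""
    changes = [b - a for a, b in zip(prices, prices[1:]) if b != a]
    return {
        "high": max(prices),
        "low": min(prices),
        "mean": mean(prices),
        "num_changes": len(changes),
        "max_change": max((abs(c) for c in changes), default=0.0),
    }


def fit_lag1(prices):
    """Least-squares fit of price[t+1] = a + b * price[t]; returns (a, b).

    This choice of regression is an assumption made for the sketch.
    """
    if len(prices) < 2:
        raise ValueError("need at least two prices to fit a regression")
    xs, ys = prices[:-1], prices[1:]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # no variation in x: predict the mean of y
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def predict_next(last_price, a, b):
    """Claim 18 form: first parameter plus second parameter times last price."""
    return a + b * last_price


# Hypothetical price history for a single URI-specific identifier.
history = [10.0, 12.0, 12.0, 11.0, 13.0]
summary = core_price_summary(history)
a, b = fit_lag1(history)
forecast = predict_next(history[-1], a, b)
```

The same summary could be computed per identifier of the second identifier-type by pooling the series of every URI that shares that identifier.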
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/951,244 US9047614B2 (en) | 2012-07-25 | 2013-07-25 | Adaptive gathering of structured and unstructured data system and method |
US13/951,248 US20140032264A1 (en) | 2012-07-25 | 2013-07-25 | Data refining engine for high performance analysis system and method |
US14/656,171 US11514496B2 (en) | 2012-07-25 | 2015-03-12 | Summarization and personalization of big data method and apparatus |
US14/656,554 US20150287060A1 (en) | 2012-07-25 | 2015-03-12 | Product score method and system |
US14/800,524 US20150356487A1 (en) | 2013-07-25 | 2015-07-15 | Product score method and system |
US14/935,332 US10169802B2 (en) | 2012-07-25 | 2015-11-06 | Data refining engine for high performance analysis system and method |
US16/232,963 US20190205963A1 (en) | 2012-07-25 | 2018-12-26 | Data refining engine for high performance analysis system and method |
US17/973,389 US11922475B1 (en) | 2013-07-25 | 2022-10-25 | Summarization and personalization of big data method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261675492P | 2012-07-25 | 2012-07-25 | |
US13/951,248 US20140032264A1 (en) | 2012-07-25 | 2013-07-25 | Data refining engine for high performance analysis system and method |
Related Parent Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/951,244 Continuation-In-Part US9047614B2 (en) | 2012-07-25 | 2013-07-25 | Adaptive gathering of structured and unstructured data system and method |
US13/951,244 Continuation US9047614B2 (en) | 2012-07-25 | 2013-07-25 | Adaptive gathering of structured and unstructured data system and method |
US14/726,707 Continuation-In-Part US9466066B2 (en) | 2012-07-25 | 2015-06-01 | Adaptive gathering of structured and unstructured data system and method |
Related Child Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/951,244 Continuation-In-Part US9047614B2 (en) | 2012-07-25 | 2013-07-25 | Adaptive gathering of structured and unstructured data system and method |
US14/656,554 Continuation-In-Part US20150287060A1 (en) | 2012-07-25 | 2015-03-12 | Product score method and system |
US14/656,171 Continuation-In-Part US11514496B2 (en) | 2012-07-25 | 2015-03-12 | Summarization and personalization of big data method and apparatus |
US14/726,707 Continuation-In-Part US9466066B2 (en) | 2012-07-25 | 2015-06-01 | Adaptive gathering of structured and unstructured data system and method |
US14/935,332 Continuation-In-Part US10169802B2 (en) | 2012-07-25 | 2015-11-06 | Data refining engine for high performance analysis system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140032264A1 (en) | 2014-01-30 |
Family ID: 49995732
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/951,244 Active US9047614B2 (en) | 2012-07-25 | 2013-07-25 | Adaptive gathering of structured and unstructured data system and method |
US13/951,248 Abandoned US20140032264A1 (en) | 2012-07-25 | 2013-07-25 | Data refining engine for high performance analysis system and method |
US14/726,707 Active US9466066B2 (en) | 2012-07-25 | 2015-06-01 | Adaptive gathering of structured and unstructured data system and method |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/951,244 Active US9047614B2 (en) | 2012-07-25 | 2013-07-25 | Adaptive gathering of structured and unstructured data system and method |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/726,707 Active US9466066B2 (en) | 2012-07-25 | 2015-06-01 | Adaptive gathering of structured and unstructured data system and method |
Country Status (6)
Country | Link |
---|---|
US (3) | US9047614B2 (en) |
CN (2) | CN104685490B (en) |
GB (1) | GB2518117A (en) |
IL (2) | IL236890B (en) |
IN (2) | IN2015DN00474A (en) |
WO (2) | WO2014018781A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140279222A1 (en) * | 2013-03-15 | 2014-09-18 | Aggregate Shopping Corp. | Checkout System and Method for Purchasing Multiple Items On-Line |
WO2015138789A1 (en) * | 2014-03-12 | 2015-09-17 | Indix Corporation | Summarization and personalization of big data method and apparatus |
US9224167B2 (en) | 2012-06-13 | 2015-12-29 | Aggregate Shopping Corp. | System and method for aiding user in online searching and purchasing of multiple items |
US9384504B2 (en) | 2012-06-13 | 2016-07-05 | Aggregate Shopping Corp. | System and method for a user to perform online searching and purchasing of multiple items |
CN110618999A (en) * | 2019-08-01 | 2019-12-27 | 平安科技(深圳)有限公司 | Data query method and device, computer storage medium and electronic equipment |
US10936637B2 (en) | 2016-04-14 | 2021-03-02 | Hewlett Packard Enterprise Development Lp | Associating insights with data |
US11017426B1 (en) | 2013-12-20 | 2021-05-25 | BloomReach Inc. | Content performance analytics |
US11922475B1 (en) | 2013-07-25 | 2024-03-05 | Avalara, Inc. | Summarization and personalization of big data method and apparatus |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8898630B2 (en) | 2011-04-06 | 2014-11-25 | Media Direct, Inc. | Systems and methods for a voice- and gesture-controlled mobile application development and deployment platform |
US8261231B1 (en) | 2011-04-06 | 2012-09-04 | Media Direct, Inc. | Systems and methods for a mobile application development and development platform |
US8978006B2 (en) | 2011-04-06 | 2015-03-10 | Media Direct, Inc. | Systems and methods for a mobile business application development and deployment platform |
US9134964B2 (en) | 2011-04-06 | 2015-09-15 | Media Direct, Inc. | Systems and methods for a specialized application development and deployment platform |
US10169802B2 (en) | 2012-07-25 | 2019-01-01 | Indix Corporation | Data refining engine for high performance analysis system and method |
US20140244391A1 (en) * | 2013-02-28 | 2014-08-28 | talktUp LLC | Online advertising method and system |
US20140281886A1 (en) * | 2013-03-14 | 2014-09-18 | Media Direct, Inc. | Systems and methods for creating or updating an application using website content |
WO2015138787A1 (en) * | 2014-03-12 | 2015-09-17 | Indix Corporation | Product score method and system |
US9741018B2 (en) * | 2014-10-28 | 2017-08-22 | Ebay Inc. | Systems and methods for extracting similar group elements |
US10223453B2 (en) * | 2015-02-18 | 2019-03-05 | Ubunifu, LLC | Dynamic search set creation in a search engine |
US9519766B1 (en) | 2015-09-07 | 2016-12-13 | Voicebox Technologies Corporation | System and method of providing and validating enhanced CAPTCHAs |
WO2017044415A1 (en) | 2015-09-07 | 2017-03-16 | Voicebox Technologies Corporation | System and method for eliciting open-ended natural language responses to questions to train natural language processors |
US9401142B1 (en) * | 2015-09-07 | 2016-07-26 | Voicebox Technologies Corporation | System and method for validating natural language content using crowdsourced validation jobs |
US9448993B1 (en) | 2015-09-07 | 2016-09-20 | Voicebox Technologies Corporation | System and method of recording utterances using unmanaged crowds for natural language processing |
US9734138B2 (en) | 2015-09-07 | 2017-08-15 | Voicebox Technologies Corporation | System and method of annotating utterances based on tags assigned by unmanaged crowds |
US20170076222A1 (en) * | 2015-09-14 | 2017-03-16 | International Business Machines Corporation | System and method to cognitively process and answer questions regarding content in images |
CN106844402B (en) * | 2015-12-04 | 2020-08-28 | 阿里巴巴集团控股有限公司 | Data processing method and device |
US10552898B2 (en) | 2016-11-16 | 2020-02-04 | Microsoft Technology Licensing, Llc | User trainable user interface page classification system |
CN108446287A (en) * | 2017-02-16 | 2018-08-24 | 北京国双科技有限公司 | Web page crawl method and device |
US11880414B2 (en) * | 2017-08-07 | 2024-01-23 | Criteo Technology Sas | Generating structured classification data of a website |
US11803664B2 (en) * | 2018-10-09 | 2023-10-31 | Ebay Inc. | Distributed application architectures using blockchain and distributed file systems |
CN111046083A (en) * | 2019-12-13 | 2020-04-21 | 北京中电普华信息技术有限公司 | Data analysis method and system and big data platform |
CN112633810B (en) * | 2020-12-29 | 2022-11-29 | 上海拓扑丝路供应链科技有限公司 | Price conversion method based on international freight service standard price conversion device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040143600A1 (en) * | 1993-06-18 | 2004-07-22 | Musgrove Timothy Allen | Content aggregation method and apparatus for on-line purchasing system |
US20070073758A1 (en) * | 2005-09-23 | 2007-03-29 | Redcarpet, Inc. | Method and system for identifying targeted data on a web page |
US20080059348A1 (en) * | 2006-09-05 | 2008-03-06 | Brian Scott Glassman | Web Site Valuation |
US7707053B2 (en) * | 2003-01-10 | 2010-04-27 | Google, Inc. | Determining a minimum price |
US8195559B2 (en) * | 2001-12-13 | 2012-06-05 | Bgc Partners, Inc. | System and method for determining an index for an item based on market information |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6185614B1 (en) * | 1998-05-26 | 2001-02-06 | International Business Machines Corp. | Method and system for collecting user profile information over the world-wide web in the presence of dynamic content using document comparators |
US6381597B1 (en) | 1999-10-07 | 2002-04-30 | U-Know Software Corporation | Electronic shopping agent which is capable of operating with vendor sites which have disparate formats |
US6785671B1 (en) * | 1999-12-08 | 2004-08-31 | Amazon.Com, Inc. | System and method for locating web-based product offerings |
WO2001050320A1 (en) * | 1999-12-30 | 2001-07-12 | Auctionwatch.Com, Inc. | Minimal impact crawler |
US20010047404A1 (en) * | 2000-05-24 | 2001-11-29 | Takashi Suda | Apparatus for managing web site addresses |
WO2002010961A2 (en) | 2000-07-25 | 2002-02-07 | Zilliant, Inc. | System and method for product price tracking and analysis |
AU2001212315A1 (en) | 2000-10-24 | 2002-05-06 | Netscape Communications Corporation | Method and apparatus for recognizing electronic commerce web pages and sites |
US20040168124A1 (en) * | 2001-06-07 | 2004-08-26 | Michael Beisiegel | System and method of mapping between software objects & structured language element-based documents |
US7912753B2 (en) * | 2001-06-27 | 2011-03-22 | Hewlett-Packard Development Company, L.P. | System and method for controlling the presentation of advertisements |
US7711775B2 (en) * | 2001-10-24 | 2010-05-04 | Groove Networks, Inc. | Method and apparatus for managing software component downloads and updates |
US7152778B2 (en) * | 2003-06-23 | 2006-12-26 | Bitstock | Collecting and valuating used items for sale |
US7287279B2 (en) | 2004-10-01 | 2007-10-23 | Webroot Software, Inc. | System and method for locating malware |
US7617193B2 (en) | 2005-03-28 | 2009-11-10 | Elan Bitan | Interactive user-controlled relevance ranking retrieved information in an information search system |
US7685133B2 (en) | 2006-05-24 | 2010-03-23 | The United States Of America As Represented By The Secretary Of The Navy | System and method for automated discovery, binding, and integration of non-registered geospatial web services |
US8090622B2 (en) * | 2007-09-21 | 2012-01-03 | Microsoft Corporation | Preferred items list management |
KR101074578B1 (en) | 2008-10-01 | 2011-10-17 | 엔에이치엔(주) | Method and Apparatus for Managing Search Database |
US8438076B2 (en) | 2009-02-13 | 2013-05-07 | Y-Check, LLC | Price comparison process and system |
US20100235329A1 (en) * | 2009-03-10 | 2010-09-16 | Sandisk Il Ltd. | System and method of embedding second content in first content |
WO2011037691A1 (en) * | 2009-09-25 | 2011-03-31 | National Electronics Warranty, Llc | Service plan web crawler and dynamic mapper |
US20110087647A1 (en) * | 2009-10-13 | 2011-04-14 | Alessio Signorini | System and method for providing web search results to a particular computer user based on the popularity of the search results with other computer users |
CN101944221A (en) * | 2010-09-07 | 2011-01-12 | 上海腾唐数码科技有限公司 | Price comparing network shopping system and method |
CN102361484B (en) * | 2011-07-05 | 2012-11-28 | 上海交通大学 | Passive network performance measuring system and page identification method thereof |
2013
- 2013-07-25: CN application CN201380049781.7A (CN104685490B), Active
- 2013-07-25: US application US13/951,244 (US9047614B2), Active
- 2013-07-25: IN application IN474DEN2015 (IN2015DN00474A), status unknown
- 2013-07-25: IN application IN473DEN2015 (IN2015DN00473A), status unknown
- 2013-07-25: WO application PCT/US2013/052108 (WO2014018781A1), Application Filing
- 2013-07-25: GB application GB1500830.3A (GB2518117A), Withdrawn
- 2013-07-25: WO application PCT/US2013/052106 (WO2014018780A1), Application Filing
- 2013-07-25: CN application CN201380049798.2A (CN104662529B), Active
- 2013-07-25: US application US13/951,248 (US20140032264A1), Abandoned
2015
- 2015-01-22: IL application IL236890A (IL236890B), IP Right Grant
- 2015-06-01: US application US14/726,707 (US9466066B2), Active
2017
- 2017-11-23: IL application IL255873A (IL255873B), IP Right Grant
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040143600A1 (en) * | 1993-06-18 | 2004-07-22 | Musgrove Timothy Allen | Content aggregation method and apparatus for on-line purchasing system |
US8195559B2 (en) * | 2001-12-13 | 2012-06-05 | Bgc Partners, Inc. | System and method for determining an index for an item based on market information |
US7707053B2 (en) * | 2003-01-10 | 2010-04-27 | Google, Inc. | Determining a minimum price |
US20070073758A1 (en) * | 2005-09-23 | 2007-03-29 | Redcarpet, Inc. | Method and system for identifying targeted data on a web page |
US7912755B2 (en) * | 2005-09-23 | 2011-03-22 | Pronto, Inc. | Method and system for identifying product-related information on a web page |
US20080059348A1 (en) * | 2006-09-05 | 2008-03-06 | Brian Scott Glassman | Web Site Valuation |
Also Published As
Publication number | Publication date |
---|---|
US9047614B2 (en) | 2015-06-02 |
US20150339682A1 (en) | 2015-11-26 |
US9466066B2 (en) | 2016-10-11 |
IL255873A (en) | 2018-01-31 |
WO2014018781A1 (en) | 2014-01-30 |
GB2518117A (en) | 2015-03-11 |
IL236890B (en) | 2018-08-30 |
CN104662529B (en) | 2019-03-29 |
CN104685490A (en) | 2015-06-03 |
CN104662529A (en) | 2015-05-27 |
WO2014018780A1 (en) | 2014-01-30 |
IL255873B (en) | 2019-02-28 |
CN104685490B (en) | 2017-09-15 |
US20140032263A1 (en) | 2014-01-30 |
IN2015DN00474A (en) | 2015-06-26 |
GB201500830D0 (en) | 2015-03-04 |
IN2015DN00473A (en) | 2015-06-26 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INDIX CORPORATION, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: KALIKIVAYI, SATYANARAYANA RAO; PARTHASARATHY, SANJAY; SELVAM, PRAVEEN; AND OTHERS; SIGNING DATES FROM 20120917 TO 20120921; REEL/FRAME: 031891/0447 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: AVALARA, INC., WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: INDIX CORPORATION; REEL/FRAME: 050068/0182. Effective date: 20190205 |