US20120005172A1 - Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product - Google Patents
- Publication number
- US20120005172A1 (application US13/232,089)
- Authority
- US
- United States
- Prior art keywords
- compressed
- file
- archives
- compressed files
- files
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1727—Details of free space management performed by the file system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/122—Replacement control using replacement algorithms of the least frequently used [LFU] type, e.g. with individual count value
Definitions
- the dictionary is divided into numerous files and numerous disc areas, requiring a very long time for opening processes and reading processes to cause the dictionary to reside in a cache, and fragmentation of the storage area occurs in the cache resulting in a problem of the storage size increasing.
- a computer-readable recording medium stores therein an information searching program executed by a computer having access to archives that include a compressed file group of compressed files that are to be searched and that have character strings described therein.
- the information searching program causes the computer to execute sorting the compressed files in descending order of access frequency of the compressed files; combining the compressed files in descending order of access frequency after the sorting at the sorting such that a storage capacity of a cache area for a storage area that stores therein the compressed file group is not exceeded by a combined size of the compressed files combined; and writing, from the storage area into the cache area, the compressed files combined at the combining, the compressed files combined being written prior to a search of the compressed files combined.
- FIG. 1 is a block diagram of an information search apparatus according to a first embodiment
- FIG. 2 is a schematic depicting stored content of archives
- FIG. 3 is a schematic for explaining relations between a compressed file and files to be searched
- FIG. 4 is a schematic of a single-character appearance map M 1 ;
- FIG. 5 is a schematic of a consecutive-character appearance map
- FIG. 6 is a schematic of a compression parameter
- FIG. 7 is a schematic of a file path table
- FIG. 8 is a schematic of a character appearance map linking table
- FIG. 9 is a schematic of the file path linking table 212 ;
- FIG. 10 is a schematic of a virtual archive capacity table
- FIG. 11 is a block diagram of a functional configuration of an information searching apparatus
- FIG. 12 is a schematic of an example of strict selection of the compressed file using the character appearance map
- FIG. 13 is a flowchart of a virtual archives setting process
- FIG. 14 is a flowchart of an information search process
- FIG. 15 is a schematic of a system configuration of a searching system according to a second embodiment
- FIG. 16 is a schematic for describing a sharing of archives
- FIG. 17 is a schematic for describing an allocation process for new archives
- FIG. 18 is a schematic of a compression symbol table and the compression parameter of archives 200 - 1 ;
- FIG. 19 is a schematic of a Huffman tree generated from the compression symbol table of the archives 200 - 1 ;
- FIG. 20 is a schematic of a compression symbol table and the compression parameter of archives
- FIG. 21 is a schematic of the Huffman tree generated from the compression symbol table of the archives 200 - 2 ;
- FIG. 22 is a schematic of a compression symbol table and the compression parameter of integrated archives
- FIG. 23 is a schematic of a common Huffman tree generated from the compression symbol table of the integrated archives
- FIG. 24 is a schematic of the stored contents of the archives 200 - 1 ;
- FIG. 25 is a schematic of the single-character appearance map of the archives 200 - 1 ;
- FIG. 26 is a schematic of the consecutive-character appearance map of the archives 200 - 1 ;
- FIG. 27 is a schematic of the compression parameter of the archives 200 - 1 ;
- FIG. 28 is a schematic of a file path table of the archives 200 - 1 ;
- FIG. 29 is a schematic of a character appearance map linking table of the archives 200 - 1 ;
- FIG. 30 is a schematic of a file path linking table of the archives 200 - 1 ;
- FIG. 31 is a schematic of the stored contents of the archives 200 - 2 ;
- FIG. 32 is a schematic of the single-character appearance map of the archives 200 - 2 ;
- FIG. 33 is a schematic of the consecutive-character appearance map of the archives 200 - 2 ;
- FIG. 34 is a schematic of the compression parameter of the archives 200 - 2 ;
- FIG. 35 is a schematic of a file path table 222 b of the archives 200 - 2 ;
- FIG. 36 is a schematic of a character appearance map linking table of the archives 200 - 2 ;
- FIG. 37 is a schematic of a file path linking table of the archives 200 - 2 ;
- FIG. 38 is a schematic for explaining an example of common parameter generation
- FIG. 39 is a schematic for explaining a reconfiguration of the character appearance map linking tables
- FIG. 40 is a schematic for explaining reconfiguration of the file path linking tables
- FIG. 41 is a schematic for explaining reconfiguration of the file path tables
- FIG. 42 is a schematic for explaining reconfiguration of the single-character appearance maps
- FIG. 43 is a schematic for explaining reconfiguration of the consecutive-character appearance maps
- FIG. 44 is a schematic for explaining reconfiguration of the compressed file groups
- FIG. 45 is a schematic of the stored contents of new archives A 1 ;
- FIG. 46 is a schematic of the stored contents of new archives A 2 ;
- FIG. 47 is a block diagram of a functional configuration of the master server (information managing apparatus).
- FIGS. 48 and 49 are flowcharts of an archives reconfiguring process by the master server
- FIG. 50 is a schematic for explaining an example of generating a converting Huffman tree
- FIG. 51 is a schematic for explaining a second example of generating a converting Huffman tree
- FIG. 52 is a schematic of a first converting Huffman tree
- FIG. 53 is a schematic of a second converting Huffman tree
- FIG. 54 is a block diagram of a functional configuration of the master server (information managing apparatus) according to a third embodiment
- FIG. 55 is a flowchart of the archives reconfiguring process (latter half) by the master server.
- FIG. 56 is a flowchart of a compressed symbol setting process to the Huffman tree.
- In a narrow sense, archiving generally is a technique of consolidating multiple folders and numerous files of the folders, into one file. Archives are transmitted and received as email attachments, and are used for such purposes as data exchange. In a broad sense, archives are also introduced as an accessory technique of compression because often the archives are combined with a compressing technique. With the prevalence of the Internet, archiving technology advances and a wide variety of tools have been developed combining operability and compression schemes. The advancement of hardware such as a personal computer is remarkable and, especially, the increased speed of central processing units (CPUs) and the increased capacity of recording media such as a memory, a hard disc, and an optical disc are conspicuous.
- archiving is technology developed mainly in the fields of data storage, information transmission, and information exchange and is characterized by compression and consolidation into one file. When a file is used, expansion (or temporary expansion) is executed. Archives such as ZIP have no full text search function. The search function becomes more important as the number of files increases.
- the time required for a file access process such as reading is reduced by causing a file having a high access frequency to reside in a cache memory when a full text search in a compressed file is implemented.
- FIG. 1 is a block diagram of an information search apparatus according to the first embodiment.
- the information search apparatus includes a central processing unit (CPU) 101 , a read-only memory (ROM) 102 , a random access memory (RAM) 103 , a magnetic disc drive 104 , a magnetic disc 105 , an optical disc drive 106 , a removable recording medium such as an optical disc 107 , a display 108 , an interface (I/F) 109 , a keyboard 110 , a mouse 111 , a scanner 112 , and a printer 113 , connected to one another by way of a bus 100 .
- the CPU 101 governs overall control of the information search apparatus.
- the ROM 102 stores therein programs such as a boot program.
- the RAM 103 is used as a work area of the CPU 101 .
- the magnetic disc drive 104 under the control of the CPU 101 , controls reading/writing of data from/to the magnetic disc 105 .
- the magnetic disc 105 stores therein the data written under control of the magnetic disc drive 104 .
- the optical disc drive 106 under the control of the CPU 101 , controls reading/writing of data from/to the optical disc 107 .
- the optical disc 107 stores therein the data written under control of the optical disc drive 106 , the data being read by a computer.
- the display 108 displays a cursor, an icon, a tool box, and data such as document, image, and function information.
- the display 108 may be, for example, a cathode ray tube (CRT), a thin-film-transistor (TFT) liquid crystal display, or a plasma display.
- the I/F 109 is connected to a network 114 such as a local area network (LAN), a wide area network (WAN), and the Internet through a communications line and is connected to other devices by way of the network 114 .
- the I/F 109 manages the network 114 and an internal interface, and controls the input and output of data from and to external devices.
- the I/F 109 may be, for example, a modem or a local area network (LAN) adapter.
- the keyboard 110 is equipped with keys for the input of characters, numerals, and various instructions, and data is entered through the keyboard 110 .
- the keyboard 110 may be a touch-panel input pad or a numeric keypad.
- the mouse 111 performs cursor movement, range selection, and movement, size change, etc., of a window.
- the mouse 111 may be a trackball or a joystick provided the trackball or joystick has similar functions as a pointing device.
- the scanner 112 optically reads an image and takes in the image data into the information search apparatus.
- the scanner 112 may have an optical character recognition (OCR) function.
- the printer 113 prints image data and document data.
- the printer 113 may be, for example, a laser printer or an ink jet printer.
- FIG. 2 is a schematic depicting stored contents of archives.
- the archives are stored in a storage area such as the RAM 103 or the magnetic disc 105 depicted in FIG. 1 .
- Archives 200 include a library area 201 , a management area 202 , and a data area 203 .
- the library area 201 stores therein a character appearance map linking table 211 , a file path linking table 212 , and a virtual archive capacity table 213 .
- the management area 202 stores therein a compression parameter 221 , a file path table 222 , and a character appearance map M (a single-character appearance map M 1 and a consecutive-character appearance map M 2 ).
- the data area 203 stores therein a compressed file group f (compressed files f 1 to fn).
- the archives 200 are stored in the storage area 230 and compressed files from the head to a compressed file f′ are stored in a cache area 240 .
- the cache area 240 is a storage area that is determined relative to the storage area 230 of the archives 200 and that can be accessed at a higher speed than the storage area 230 of the archives 200 .
- the cache area 240 is provided on a main memory heap area, etc.
- the cache area further stores therein some or all of character appearance maps, file paths, and virtual archives.
- FIG. 3 is a schematic for explaining relations between a compressed file fi and files to be searched.
- “n” compressed files f 1 to fn are compressed using a common Huffman tree and are expanded using the Huffman tree.
- the expanded file group to be searched is a file group that has character strings described therein, for example, a dictionary or a glossary.
- Each file is described in a computer readable language such as HyperText Markup Language (HTML) or Extensible Markup Language (XML).
- character strings within thick brackets are headwords.
- FIG. 4 is a schematic of the single-character appearance map M 1 .
- the single-character appearance map M 1 includes a bit row for each character. Bits in the bit row are arranged sequentially according to bit number.
- a bit number “i” corresponds to a file number “i” of a compressed file.
- “1” indicates that a given character is present and “0” indicates that the given character is not present.
- FIG. 5 is a schematic of the consecutive-character appearance map M 2 .
- Consecutive characters are a string of characters. In the example, two consecutive characters are exemplified; however, three or more consecutive characters may be employed.
- the map format is identical to that of the single-character appearance map M 1 depicted in FIG. 4 .
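- The two maps can be pictured as one bit row per character or character pair. The following is a minimal sketch, not the patented implementation: the function name `build_appearance_maps` and the sample files are hypothetical, files are numbered from 0 rather than from 1, and each bit row is held in a Python integer.

```python
from collections import defaultdict


def build_appearance_maps(files):
    """Build a single-character map (M1) and a two-character map (M2).

    files: list of strings; list index i stands in for file number i
    (an assumption; the description numbers files from 1).
    Each map value is an int used as a bit row: bit i is 1 when the
    character (or character pair) appears in file i, and 0 otherwise.
    """
    m1 = defaultdict(int)
    m2 = defaultdict(int)
    for i, text in enumerate(files):
        for ch in set(text):
            m1[ch] |= 1 << i
        for pair in {text[j:j + 2] for j in range(len(text) - 1)}:
            m2[pair] |= 1 << i
    return dict(m1), dict(m2)


# Hypothetical sample data: three tiny "files to be searched".
files = ["cat", "cart", "dog"]
m1, m2 = build_appearance_maps(files)
print(format(m1["c"], "03b"))   # bit row for the character "c"
print(format(m2["ar"], "03b"))  # bit row for the pair "ar"
```

- Pairs are used here because the description exemplifies two consecutive characters; longer runs would only change the slice width.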
- FIG. 6 is a schematic of the compression parameter 221 .
- the compression parameter 221 is a table correlating characters/consecutive characters described in a file group to be searched and formed by expanding a compressed file group f, with the frequency of appearance of each character. According to the compression parameter 221 , a Huffman tree is generated to compress the file group to be searched into the compressed file group f.
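- As an illustration of how a frequency table of this kind can drive compression and expansion, the sketch below builds a textbook Huffman code from a symbol-to-frequency mapping. It is an assumption-laden example: the names `build_huffman_codes`, `encode`, and `decode` are hypothetical, symbols are single characters only, and bits are kept as text strings for readability rather than packed.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(frequencies):
    """Return {symbol: bit string} for a {symbol: frequency} table."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tiebreak), sym) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Internal nodes are (left, right) tuples; leaves are symbols.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # codes are prefix-free, so this is unambiguous
            out.append(inverse[current])
            current = ""
    return "".join(out)


# Hypothetical frequency table standing in for the compression parameter.
compression_parameter = Counter("abracadabra")
codes = build_huffman_codes(compression_parameter)
bits = encode("abracadabra", codes)
print(len(bits), decode(bits, codes))
```

- Because one table yields one tree, every file compressed with it can be expanded with the same tree, which matches the common-tree arrangement described for the compressed files f 1 to fn.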
- FIG. 7 is a schematic of the file path table 222 .
- the file path table 222 includes description of a path (file path) to the compressed file fi. More specifically, for example, for each file ID, the table 222 correlates a file path to a compressed file fi, the headword described in the file that is to be searched and formed by expanding the compressed file fi, the address of the compressed file fi, and the size of the compressed file fi.
- a file ID is information that uniquely identifies a compressed file fi.
- a reference symbol allocated to a compressed file is a “file ID”.
- a file path of a compressed file f 23 having a file ID of “f23” is “honmon\file23.html” having a headword of “ ” (see FIG. 3 ).
- When a resident flag described hereinafter is set to be “1”, a file path to the cache area 240 (for example, “cash\file23.html”) is written and, when the flag is returned to “0”, the file path to the cache area 240 is deleted.
- FIG. 8 is a schematic of the character appearance map linking table 211 .
- the character appearance map linking table 211 includes the address, the size, the access frequency, and the resident flag for each bit in the bit row of each character in the character appearance map M.
- the “address” is an address indicative of the area in which a given compressed file corresponding to a given bit number in a bit row on the character appearance map M is stored.
- the “size” is the size of a given compressed file. For example, a compressed file that corresponds to the bit number “i” is the compressed file fi. The address at which this compressed file fi is stored is “adri” and the size of the compressed file fi is “si”.
- the “access frequency” is the degree to which the compressed file fi corresponding to the bit number i is accessed and, in the embodiment, the “access frequency” is the number of accesses. In addition to “the number of accesses”, the access frequency may be represented by probability (the number of accesses of the compressed file fi/the total number of accesses of all the compressed files).
- the “resident flag” is a flag that indicates whether the compressed file fi corresponding to the bit number i resides in the cache area 240 as a result of being moved from the storage area 230 of the archives 200 to the cache area 240 for the storage area 230 .
- When the compressed file fi resides in the cache area 240 , the resident flag is “1”. On the other hand, when the compressed file fi does not reside in the cache area 240 and is stored in the storage area 230 , the resident flag is “0”. When the resident flag is set to be “1”, the address of the cache area 240 is written into the “Address” column and, when the resident flag is returned to “0”, the address of the cache area 240 is deleted.
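- One way to picture a row of the character appearance map linking table is a small record holding the four columns. The sketch below is illustrative only: the class `LinkEntry`, its field names, and the helper functions are hypothetical, and keeping the storage address and the cache address in separate fields is a simplifying assumption rather than something the description specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkEntry:
    address: int                         # address in the storage area ("adri")
    size: int                            # size of the compressed file ("si")
    access_frequency: int = 0            # number of accesses
    resident: bool = False               # resident flag: True stands for "1"
    cache_address: Optional[int] = None  # present only while resident


def make_resident(entry, cache_address):
    """Set the resident flag to "1" and record the cache address."""
    entry.resident = True
    entry.cache_address = cache_address


def evict(entry):
    """Return the resident flag to "0" and delete the cache address."""
    entry.resident = False
    entry.cache_address = None


def current_address(entry):
    """Address to read from, chosen by the resident flag."""
    return entry.cache_address if entry.resident else entry.address


entry = LinkEntry(address=1000, size=64)
make_resident(entry, cache_address=16)
print(current_address(entry))
evict(entry)
print(current_address(entry))
```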
- FIG. 9 is a schematic of the file path linking table 212 .
- the file path linking table 212 is a table that links a file path and the bit number i. When the resident flag is set to be “1”, the file path to the cache area 240 is written and, when the resident flag is returned to “0”, the file path to the cache area 240 is deleted.
- FIG. 10 is a schematic of the virtual archive capacity table 213 .
- the virtual archives include the compressed file group f′ stored in the cache area 240 of the archives 200 stored in the storage area 230 .
- Tables, etc., in the library area 201 and the management area 202 may be included in the virtual archives.
- FIG. 11 is a block diagram of a functional configuration of an information searching apparatus.
- An information searching apparatus 1100 includes a sorting processing unit 1101 , a combining unit 1102 , a writing unit 1103 , a setting unit 1104 , an input unit 1105 , an identifying unit 1106 , a reading unit 1107 , an expanding unit 1108 , a searching unit 1109 , an output unit 1110 , and an updating unit 1111 .
- Units including the sorting processing unit 1101 to the input unit 1105 and the updating unit 1111 implement a virtual archive setting function.
- Units including the input unit 1105 to the output unit 1110 implement an information searching function.
- Because International Publication No. 2006-123448 describes the information searching function in detail, only the attributes characterizing the function according to the present embodiment are briefly described.
- functions of the sorting processing unit 1101 to the updating unit 1111 are implemented by, for example, causing the CPU 101 to execute a program stored in the storage area 230 such as the ROM 102 , the RAM 103 , and the magnetic disc 105 depicted in FIG. 1 , or by the I/F 109 .
- the sorting processing unit 1101 has a function of sorting the compressed files in the character appearance map linking table 211 in descending order of the access frequency of each compressed file. This sorting is a process executed before setting the resident flag.
- the combining unit 1102 has a function of combining, in terms of size, the compressed files fi in descending order of access frequency after the sorting by the sorting processing unit 1101 . More specifically, the combination is executed such that the storage capacity of the cache area 240 for the storage area 230 that stores therein the compressed file group f is not exceeded. For example, the compressed files are combined in descending order of the access frequency after the sorting by the sorting processing unit 1101 such that the combined size is the largest combined size that does not exceed the storage capacity of the cache area 240 . By calculating the greatest combined value in this manner, the storage capacity of the cache area 240 can be fully utilized.
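- The sorting and combining described above can be sketched as a walk down the sorted list that stops at the first file that would overflow the cache. This is a minimal sketch under stated assumptions: the function name `select_for_cache`, the tuple layout, and the sample sizes are hypothetical.

```python
def select_for_cache(files, capacity):
    """Pick files for the cache in descending order of access frequency.

    files: list of (file_id, size, access_frequency) tuples.
    capacity: storage capacity of the cache area.
    Returns (selected file IDs, combined size). Selection stops at the
    first file whose size would push the combined size past capacity.
    """
    ordered = sorted(files, key=lambda f: f[2], reverse=True)
    selected, total = [], 0
    for file_id, size, _frequency in ordered:
        if total + size > capacity:
            break
        selected.append(file_id)
        total += size
    return selected, total


# Hypothetical sample data.
files = [("f1", 40, 3), ("f2", 30, 9), ("f3", 50, 5), ("f4", 10, 1)]
selected, total = select_for_cache(files, capacity=100)
print(selected, total)
```

- In this sample, f4 would still fit after f2 and f3, but the walk stops at f1 because the files are taken strictly in frequency order; skipping ahead to smaller files would be a different policy from the one sketched here.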
- the writing unit 1103 has a function of writing the compressed file group combined by the combining unit 1102 from the storage area 230 into the cache area 240 , prior to a search of the file group.
- the compressed file group f to be written into the cache area 240 may be deleted from the storage area 230 or may remain in the storage area 230 . In this manner, by writing the compressed file group having a high access frequency into the cache area 240 , a faster file access speed can be achieved.
- the setting unit 1104 has a function of setting the resident flag for the compressed file group written into the cache area 240 by the writing unit 1103 . More specifically, for example, the resident flag of the compressed file group written in the cache area 240 in the character appearance map linking table 211 is changed from “0” to “1”. When the resident flag is already “1”, the flag is not changed. For a compressed file that was written into the cache area 240 the previous time and is deleted this time, the resident flag that had been set to “1” is changed to “0”. Thereby, a compressed file having a high access frequency and residing in the cache area 240 can be identified.
- the input unit 1105 has a function of receiving input of a search character string. More specifically, the input unit 1105 receives a search character string input through the use of an input apparatus such as the keyboard depicted in FIG. 1 . For example, the input unit 1105 receives input of a search character string such as “ ”. In addition to search character strings, the input unit 1105 may receive input of search conditions such as forward coincidence and reverse coincidence.
- the identifying unit 1106 has a function of identifying a compressed file that includes all the characters included in the search character string received by the input unit 1105 . More specifically, by referring to the character appearance map M, the compressed file group is strictly selected and compressed files having therein all the characters constituting the search character string are obtained. For example, when the search character string is “ ”, the string is disassembled into single characters of “ ”, “ ”, “ ”, and “ ”. Logical multiplication is performed on the bit rows of the single characters “ ”, “ ”, “ ”, and “ ” from the single-character appearance map M 1 , thereby narrowing down the compressed files to be searched to those corresponding to bit numbers for which the result of the logical multiplication is “1” (strict selection).
- FIG. 12 is a schematic of an example of strict selection of the compressed file fi using the character appearance map M.
- a logical product is computed by logical multiplication (AND).
- a compressed file that may include the search character string can be identified while the file is in a compressed format.
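- The strict selection can be sketched as a bitwise AND over the bit rows of the query characters. The example below is illustrative: the function name `narrow_candidates` and the sample map (built for three hypothetical files "cat", "cart", and "dog", numbered from 0) are assumptions.

```python
def narrow_candidates(appearance_map, query, file_count):
    """Return file numbers whose bit is 1 for every character of query.

    appearance_map: {character: int bit row}, bit i set when the character
    appears in file i. A character missing from the map appears nowhere.
    """
    result = (1 << file_count) - 1  # start with every file as a candidate
    for ch in query:
        result &= appearance_map.get(ch, 0)
    return [i for i in range(file_count) if (result >> i) & 1]


# Hypothetical single-character map for files 0: "cat", 1: "cart", 2: "dog".
m1 = {
    "c": 0b011, "a": 0b011, "t": 0b011, "r": 0b010,
    "d": 0b100, "o": 0b100, "g": 0b100,
}
print(narrow_candidates(m1, "art", 3))
print(narrow_candidates(m1, "cat", 3))
```

- For the query "cat", file 1 ("cart") stays in the candidate list even though it does not contain the string itself, which is why the selection only identifies files that may include the search character string and the expanded files are still searched afterwards.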
- the reading unit 1107 has a function of reading, from an area based on the resident flag set by the setting unit 1104 , the compressed file identified by the identifying unit 1106 . More specifically, for example, by referencing the resident flag in the character appearance map linking table 211 , the area storing therein the identified compressed file is identified based on the value of the resident flag.
- When the resident flag is “0”, the compressed file is not stored in the cache area 240 and is read from the storage area 230 of the archives 200 .
- the file path table is referenced and the corresponding compressed file fi is opened based on the head address and the size of a file ID that coincide in the binary search.
- the character appearance map linking table 211 is referenced and the head address and the size corresponding to the bit number can be obtained.
- the corresponding compressed file fi can be accessed at a high speed.
- When the resident flag is “1”, it is known that the file is stored in the cache area 240 .
- the compressed file fi is accessible from the cache area 240 , thereby further increasing the speed.
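- The choice of read location by resident flag can be sketched as follows. This is a hedged illustration: the function name `read_compressed`, the dictionary-based table, and the use of a single bytes object as the storage area are all assumptions made for the example.

```python
def read_compressed(bit_number, table, cache, storage):
    """Read one compressed file, choosing the area by its resident flag.

    table: {bit_number: {"address": int, "size": int, "resident": 0 or 1}}
    cache: {bit_number: bytes} for files residing in the cache area
    storage: bytes object standing in for the storage area of the archives
    Returns (source, data), where source is "cache" or "storage".
    """
    entry = table[bit_number]
    if entry["resident"] == 1:
        return "cache", cache[bit_number]
    start = entry["address"]
    return "storage", storage[start:start + entry["size"]]


# Hypothetical layout: two compressed files stored back to back.
storage = b"AAAABBB"
table = {
    0: {"address": 0, "size": 4, "resident": 0},
    1: {"address": 4, "size": 3, "resident": 1},
}
cache = {1: b"BBB"}
print(read_compressed(0, table, cache, storage))
print(read_compressed(1, table, cache, storage))
```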
- the expanding unit 1108 has a function of expanding a compressed file read by the reading unit 1107 . More specifically, for example, the read compressed file fi is expanded using the Huffman tree generated based on the compression parameter 221 . Consequently, the read compressed file fi only has to be expanded and therefore, the speed of file accesses can be increased.
- the searching unit 1109 has a function of searching the file expanded by the expanding unit 1108 for a character string that coincides with or is related to a search character string. More specifically, for example, a file to be searched having therein a character string that coincides with the search character string is extracted from the file group that is to be searched and whose files have been expanded. A file to be searched having a character string that includes the search character string in forward coincidence or reverse coincidence is extracted as a related file to be searched. In addition, when a character string co-occurring with the search character string is set, the file to be searched including the co-occurring character string is extracted as a related file to be searched.
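- The matching modes can be sketched as below. This example assumes that forward coincidence means the candidate string begins with the search character string and that reverse coincidence means it ends with it; that reading, the function name `matches`, and the mode names are assumptions, not definitions taken from the description.

```python
def matches(candidate, query, mode="exact"):
    """Compare a candidate string with the search character string.

    mode "exact":   the strings coincide.
    mode "forward": candidate starts with query (assumed meaning).
    mode "reverse": candidate ends with query (assumed meaning).
    """
    if mode == "exact":
        return candidate == query
    if mode == "forward":
        return candidate.startswith(query)
    if mode == "reverse":
        return candidate.endswith(query)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical headwords taken from expanded files.
headwords = ["search", "searching", "research"]
print([h for h in headwords if matches(h, "search", "forward")])
print([h for h in headwords if matches(h, "search", "reverse")])
```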
- the output unit 1110 has a function of outputting a file to be searched that has been expanded by the expanding unit 1108 . More specifically, the form of output from the output unit 1110 may be, for example, display on a display, output by printing by a printer, transmission to another computer, and storage in the storage area 230 of the information searching apparatus 1100 .
- When the output is displayed on a display, the expanded files to be searched may be displayed.
- the names of the expanded files to be searched may be displayed in a list and a user may select the name of one of the expanded files to be searched and the linked file to be searched may be read and displayed on a screen.
- the retrieved file to be searched may be displayed.
- the names of the retrieved files to be searched may be displayed in a list and a user may select the name of one of the files to be searched and the linked file to be searched may be read and displayed on a screen.
- the updating unit 1111 has a function of updating the access frequency of the compressed file when the compressed file is expanded by the expanding unit 1108 . More specifically, for example, when the access frequency is expressed by the number of accesses, one is added to the number of accesses of the compressed file fi that has been expanded. When the access frequency is expressed by probability, one is added to the number of accesses of the expanded file and one is also added to the total number of accesses made to the compressed files f 1 to fn.
- the sorting processing unit 1101 executes the sorting process according to the access frequency of the compressed file group f after the frequency is updated.
- the access frequency of the compressed file fi that tends to be strictly selected based on the character appearance map M increases, thereby enabling a faster expansion speed to be realized at subsequent expansions.
- the updating unit 1111 may update the access frequency for the compressed file fi retrieved by the searching unit 1109 and not for the compressed file fi that has been expanded by the expanding unit 1108 . More specifically, for example, when the access frequency is expressed by the number of accesses, one is added to the number of accesses of the compressed file fi including a file to be searched that has been retrieved. When the access frequency is expressed by probability, one is added to the number of accesses of the compressed file fi including a file to be searched that has been retrieved, and one is also added to the total number of accesses made to the compressed files f 1 to fn.
- the sorting processing unit 1101 executes the sorting process according to the access frequency of the compressed file group after the frequency is updated. Thereby, the access frequency of the compressed file fi that is actually searched is increased and, therefore a faster searching speed can be realized at subsequent searches.
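- The two ways of expressing access frequency, and the re-sorting that follows an update, can be sketched with a small counter. The class `AccessCounter` and its method names are hypothetical, and whether `record` is called on expansion or on retrieval is left to the caller.

```python
class AccessCounter:
    """Track accesses per compressed file and across all files."""

    def __init__(self, file_ids):
        self.counts = {file_id: 0 for file_id in file_ids}
        self.total = 0

    def record(self, file_id):
        """Add one to the file's count and to the overall total."""
        self.counts[file_id] += 1
        self.total += 1

    def probability(self, file_id):
        """Accesses of this file divided by accesses of all files."""
        return self.counts[file_id] / self.total if self.total else 0.0

    def sorted_ids(self):
        """File IDs in descending order of access count (ties keep order)."""
        return sorted(self.counts, key=lambda f: self.counts[f], reverse=True)


counter = AccessCounter(["f1", "f2", "f3"])
for file_id in ["f2", "f1", "f2"]:
    counter.record(file_id)
print(counter.sorted_ids(), counter.probability("f2"))
```

- Sorting by count and sorting by probability give the same order, since every probability shares the same denominator.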
- FIG. 13 is a flowchart of a virtual archives setting process executed by a virtual archive setting function of the information searching apparatus 1100 .
- the sorting processing unit 1101 sorts the compressed files in the character appearance map linking table 211 in descending order of access frequency (step S 1301 ).
- Whether the total size s(1_k+1) exceeds Ts, that is, whether s(1_k+1) > Ts, is judged (step S 1304 ).
- “Ts” is the maximum storage capacity of the cache area 240 .
- When s(1_k+1) > Ts does not hold (step S 1304 : NO), k is incremented (step S 1305 ) and the procedure returns to step S 1303 . On the other hand, when s(1_k+1) > Ts holds (step S 1304 : YES), because no more compressed files can be stored in the cache area 240 , the virtual archive capacity table 213 is updated such that the bit numbers, the access frequencies, and the sizes are those of the compressed files having the sort positions 1 to k+1 (step S 1306 ).
- the writing unit 1103 writes the compressed files having the sort positions 1 to k into the cache area 240 (step S 1307 ).
- the compressed files having sort positions after k are deleted from the cache area 240 .
- the setting unit 1104 sets the resident flags of the compressed files having the sort positions 1 to k in the character appearance map linking table 211 to be “ON” (from “0” to “1”) (step S 1308 ).
- For the compressed files having sort positions after k, the resident flag is set to be “OFF” (from “1” to “0”), ending a series of processing.
- compressed files each having a high access frequency can be set preferentially as the virtual archives and therefore, a faster file accessing speed can be realized.
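- The selection of resident files in FIG. 13 can be sketched as a greedy prefix selection. This is a simplified illustration, assuming each compressed file is a hypothetical `(bit_number, access_frequency, size)` tuple and that `ts` stands for the capacity Ts of the cache area 240 ; the function name `select_resident` is not from the specification.

```python
def select_resident(files, ts):
    """Return bit numbers of the files kept resident in the cache area.

    files: iterable of (bit_number, access_frequency, size) tuples.
    ts:    maximum storage capacity of the cache area.
    """
    # Sort in descending order of access frequency (cf. step S1301).
    ranked = sorted(files, key=lambda f: f[1], reverse=True)
    resident, total = [], 0
    for bit_number, _freq, size in ranked:
        # Stop as soon as adding the next file would exceed Ts (cf. step S1304).
        if total + size > ts:
            break
        total += size
        resident.append(bit_number)
    return resident


# Hypothetical file group and a cache capacity of 100.
files = [(1, 5, 40), (2, 9, 30), (3, 7, 50), (4, 1, 10)]
resident = select_resident(files, 100)
```

- Note that the loop stops at the first file that does not fit, so a smaller file with a lower access frequency is not used to fill the remaining space; this mirrors the flowchart, which only considers a prefix of the sorted list.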
- FIG. 14 is a flowchart of an information search process executed by an information searching function of the information searching apparatus 1100 .
- the input unit 1105 receives input of a search character string (step S 1401 ).
- the search character string is disassembled into single characters or consecutive characters (hereinafter, “character”) (step S 1402 ).
- the bit row for each disassembled character is extracted from the character appearance map M (step S 1403 ) and for each bit number among the extracted bit rows, a logical product is computed by logical multiplication (step S 1404 ).
- the compressed files fi having logical products of “1” as a result of the computing are identified as compressed files that include the disassembled characters (step S 1405 ). Subsequently, whether unprocessed compressed files fi among the identified compressed files fi are present is judged (step S 1406 ). When an unprocessed compressed file fi is present (step S 1406 : YES), an unprocessed compressed file fi is selected (step S 1407 ) and whether the resident flag is “ON” for the selected compressed file fi is judged (step S 1408 ).
- When the resident flag is “ON” (step S 1408 : YES), the reading unit 1107 transfers the selected compressed file fi directly from the cache area 240 to a register of the CPU 101 (step S 1409 ) and the procedure advances to step S 1411 .
- On the other hand, when the resident flag is “OFF” (step S 1408 : NO), the reading unit 1107 reads the selected compressed file fi from the storage area 230 of the archives 200 to the cache area 240 and causes the CPU 101 to read this file (step S 1410 ), and the procedure advances to step S 1411 .
- the expanding unit 1108 executes an expansion process using the Huffman tree based on the compression parameter 221 (step S 1411 ) and the procedure returns to step S 1406 .
- When no unprocessed compressed files fi are present (step S 1406 : NO), the searching unit 1109 searches the expanded files using the search character string (step S 1412 ). The output unit 1110 outputs the result of the search (step S 1413 ). Subsequently, the updating unit 1111 adds one to the access frequency of the corresponding compressed file fi in the character appearance map linking table 211 (step S 1414 ), and a series of processing ends.
- the compressed file fi whose resident flag is set to be “ON” (“1”) is read from the cache area 240 and the expansion process is executed. Therefore, a faster file accessing speed can be realized. Because the access frequency for the compressed file fi is updated each time a search is executed, the compressed file fi written in the cache area 240 can be updated one by one. Therefore, a faster file accessing speed can be realized at subsequent accesses.
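- The flow of FIG. 14 can be sketched as follows. This is an illustration under stated assumptions, not the apparatus itself: each bit row of the character appearance map is held as a Python integer (bit i stands for file i), only single characters are mapped, `zlib` stands in for the Huffman-based compression of the specification, and all function names are hypothetical.

```python
import zlib


def build_appearance_map(texts):
    """Map each character to a bit row; bit i is set if file i contains it."""
    amap = {}
    for i, text in enumerate(texts):
        for ch in set(text):
            amap[ch] = amap.get(ch, 0) | (1 << i)
    return amap


def candidates(amap, query, n_files):
    """AND the bit rows of all query characters (cf. steps S1403 to S1405)."""
    mask = (1 << n_files) - 1
    for ch in query:
        mask &= amap.get(ch, 0)
    return [i for i in range(n_files) if (mask >> i) & 1]


def search(query, amap, resident, cache, storage, freq):
    """Expand only candidate files, preferring the cache, then search them."""
    hits = []
    for i in candidates(amap, query, len(storage)):
        # Resident files are read from the cache area, others from storage.
        blob = cache[i] if resident[i] else storage[i]
        if query in zlib.decompress(blob).decode("utf-8"):
            hits.append(i)
            freq[i] += 1  # update access frequency of retrieved files
    return hits


texts = ["abc", "cab", "xyz", "bca"]
storage = [zlib.compress(t.encode("utf-8")) for t in texts]
amap = build_appearance_map(texts)
resident = [True, False, False, False]
cache = {0: storage[0]}
freq = [0, 0, 0, 0]
narrowed = candidates(amap, "ab", len(texts))
hits = search("ab", amap, resident, cache, storage, freq)
```

- In this example the map narrows four files down to three candidates, and only those three are expanded; file 3 contains both characters but not the string itself, so it is expanded yet not retrieved.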
- the information searching apparatus 1100 of the first embodiment is applicable to a portable terminal such as a portable telephone, a portable game apparatus, and an electronic dictionary in addition to a personal computer and a search server.
- a second embodiment will be described.
- For a site search on the Internet, updating of each site is regularly monitored; a large-scale index is generated based on the summarized data to which morphological analysis is executed; and a full text search is executed.
- Conventionally, as the amount of data of a site increases, increasing the speed of the monitoring process for each site, increasing throughput, and the scalability of searches by multiple computers are problems.
- the second embodiment realizes faster speeds of addition, merger, and deletion of the archives 200 .
- the second embodiment realizes increased efficiency of the searching speed by dividing a search among slave servers and executing parallel processing, and by substantially equalizing the operating rate of each slave server.
- FIG. 15 is a schematic of a system configuration of a searching system according to the second embodiment.
- a searching system 1500 includes a master server 1501 and slave servers 1502 - 1 to 1502 -N.
- the master server 1501 and each of the slave servers 1502 - 1 to 1502 -N, as well as the slave servers 1502 - 1 to 1502 -N among themselves, are mutually communicable through the network 114 .
- the master server 1501 supervises and manages the slave servers 1502 - 1 to 1502 -N.
- Each slave server 1502 -I corresponds to the information searching apparatus 1100 of the first embodiment and a slave server 1502 -I has the virtual archive setting function and the information searching function.
- archives 200 -I retained by a slave server 1502 -I are archives of a Japanese dictionary
- archives 200 -J (J ≠ I) retained by a slave server 1502 -J are archives of a glossary
- archives 200 -K (K ≠ I, J) retained by a slave server 1502 -K are archives of an English-Japanese dictionary, and similarly, the types and the publishing companies differ among the sets of archives.
- The archives 200 -I differ among the slave servers 1502 -I, and the compression parameter 221 -I in the archives 200 -I also differs among the slave servers 1502 -I. Therefore, the Huffman tree h-I retained in each slave server 1502 -I also has a different structure.
- a multi-book search that is referred to as “meta-search” is executed with respect to the slave server group 1502 above by providing a common search keyword from the master server 1501 .
- Each slave server 1502 -I returns the search result to the master server 1501 and thereby, the master server 1501 is able to obtain a search result from multiple dictionaries.
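- The meta-search described above, in which a common keyword is given to every slave server and the results are gathered by the master server, can be sketched as follows. The slave servers are modeled here as local callables run on a thread pool; this is an assumption for illustration only, since the specification has them communicate through the network 114 . The names `meta_search` and `make_slave` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(keyword, slaves):
    """Send the same keyword to every slave and collect results per slave."""
    with ThreadPoolExecutor(max_workers=len(slaves)) as pool:
        futures = {name: pool.submit(fn, keyword) for name, fn in slaves.items()}
        # result() blocks, so the overall wait is set by the slowest slave.
        return {name: fut.result() for name, fut in futures.items()}


def make_slave(entries):
    """Build a stand-in slave that searches its own dictionary entries."""
    def search(keyword):
        return sorted(e for e in entries if keyword in e)
    return search


# Hypothetical dictionaries held by two slaves.
slaves = {
    "1502-1": make_slave(["search", "archive"]),
    "1502-2": make_slave(["research", "tree"]),
}
result = meta_search("search", slaves)
```

- Because the master waits for the last result, the total response time depends on the slowest slave, which is why equalizing the load among slave servers matters.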
- FIG. 16 is a schematic for describing a sharing of archives.
- the master server 1501 collects the sets of archives 200 - 1 and 200 - 2 of the slave servers 1502 - 1 and 1502 - 2 , and Huffman trees h- 1 and h- 2 through the network 114 .
- Integrated archives A formed by aggregating the sets of archives 200 - 1 and 200 - 2 , and a common Huffman tree formed by making the Huffman trees h- 1 and h- 2 in archives 200 - 1 and 200 - 2 common are generated.
- FIG. 17 is a schematic for describing an allocation process for new archives.
- the master server 1501 divides the integrated archives A and transmits the divided sets of archives to the slave servers 1502 - 1 and 1502 - 2 as sets of new archives A 1 and A 2 respectively specific to slave servers 1502 - 1 and 1502 - 2 such that the search processes of the slave servers 1502 - 1 and 1502 - 2 are substantially equalized.
- New common Huffman trees H 1 and H 2 are transmitted respectively to the slave servers 1502 - 1 and 1502 - 2 .
- one set of archives is allocated to one slave server and therefore, the common Huffman tree H is transmitted to the slave servers 1502 - 1 and 1502 - 2 .
- a common Huffman tree specific to each slave server is transmitted to the slave server.
- the archives 200 - 1 and 200 - 2 and the Huffman trees h- 1 and h- 2 are present in the slave server 1502 - 1
- the common Huffman tree H is transmitted to the slave server 1502 - 1 .
- FIG. 18 is a schematic of a compression symbol table and the compression parameter 221 of the archives 200 - 1 .
- characters “a” to “f” are described in a file group that is to be searched, is compressed, and is in the archives 200 - 1 .
- section (A) depicts a compression symbol table 1800 of the archives 200 - 1
- section (B) depicts a compression parameter P 1 of the archives 200 - 1 .
- a shorter compression symbol is allocated to a character having a larger frequency of appearance.
- FIG. 19 is a schematic of the Huffman tree h- 1 generated from the compression symbol table 1800 of the archives 200 - 1 .
- a circle is a node.
- the highest sort position node is referred to as “root R” and other nodes are referred to as “internal nodes”.
- a square is a leaf.
- a line connecting nodes or a node and a leaf is a branch.
- a character depicted within a leaf is a character obtained after expansion.
- a character string depicted below a leaf is the compression symbol allocated to the character obtained after expansion indicated in the leaf.
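- How a frequency table leads to compression symbols of differing lengths can be illustrated with the textbook Huffman construction. This is a sketch using the standard heap-based algorithm, which may differ in detail from the procedure of the specification; the frequencies below are hypothetical and are not the values of FIG. 18, and the name `huffman_codes` is an assumption.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Return {character: compression symbol} for a frequency table."""
    tick = count()  # tie-breaker so heap entries never compare nodes directly
    heap = [(f, next(tick), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent nodes into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: two branches
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a character after expansion
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Hypothetical frequencies of appearance for characters "a" to "f".
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(freq)
lengths = {ch: len(code) for ch, code in codes.items()}
```

- In the resulting table the most frequent character receives the shortest compression symbol, and no symbol is a prefix of another, so a compressed bit string can be expanded by walking from the root to a leaf.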
- FIG. 20 is a schematic of a compression symbol table and the compression parameter 221 of the archives 200 - 2 .
- characters “a” to “f” are described in a file group that is to be searched, is compressed, and is in the archives 200 - 2 .
- section (A) depicts a compression symbol table 2000 of the archives 200 - 2
- section (B) depicts a compression parameter P 2 of the archives 200 - 2 .
- a shorter compression symbol is allocated to a character having a larger frequency of appearance.
- FIG. 21 is a schematic of the Huffman tree h- 2 generated from the compression symbol table 2000 of the archives 200 - 2 .
- FIG. 22 is a schematic of a compression symbol table and the compression parameter 221 of the integrated archives A. Because the integrated archives A is an integration of the archives 200 - 1 and 200 - 2 , characters “a” to “f” are described in the files that are to be searched, are compressed, and are included in the integrated archives A. Therefore, the frequency of appearance of the common compression parameter P depicted in FIG. 22 is a value obtained by summing, for each character, the frequency of appearance of the compression parameter P 1 of the archives 200 - 1 and of the compression parameter P 2 of the archives 200 - 2 .
- FIG. 23 is a schematic of the common Huffman tree H generated from the compression symbol table of the integrated archives A.
- the integrated archives A are generated by integrating the archives 200 - 1 and the archives 200 - 2 .
- the stored contents of the archives 200 - 1 will be described.
- FIG. 24 is a schematic of the stored contents of the archives 200 - 1 .
- the archives 200 - 1 are stored in the storage area 230 such as the RAM 103 or the magnetic disc 105 depicted in FIG. 1 .
- the archives 200 - 1 include the library area 201 , the management area 202 , and the data area 203 .
- the library area 201 stores therein a character appearance map linking table 211 a , a file path linking table 212 a , and a virtual archive capacity table 213 a .
- the management area 202 stores therein the compression parameter P 1 , a file path table 222 a , and a character appearance map Ma (including a single-character appearance map Ma 1 and a consecutive-character appearance map Ma 2 ).
- the data area 203 stores therein a compressed file group fa (compressed files fa_ 1 to fa_n) as depicted in FIG. 3 . Descriptions of these components are identical to those described in the first embodiment.
- FIG. 25 is a schematic of the single-character appearance map Ma 1 of the archives 200 - 1 .
- FIG. 26 is a schematic of the consecutive-character appearance map Ma 2 of the archives 200 - 1 .
- the bit numbers in the archives 200 - 1 are indicated as a_ 1 to a_n.
- FIG. 27 is a schematic of the compression parameter P 1 of the archives 200 - 1 .
- FIG. 28 is a schematic of a file path table 222 a of the archives 200 - 1 .
- FIG. 29 is a schematic of a character appearance map linking table 211 a of the archives 200 - 1 .
- FIG. 30 is a schematic of a file path linking table 212 a of the archives 200 - 1 .
- FIG. 31 is a schematic of the stored contents of the archives 200 - 2 .
- the archives 200 - 2 are stored in the storage area 230 such as the RAM 103 or the magnetic disc 105 depicted in FIG. 1 .
- the archives 200 - 2 include the library area 201 , the management area 202 , and the data area 203 .
- the library area 201 stores therein a character appearance map linking table 211 b , a file path linking table 212 b , and a virtual archive capacity table 213 b .
- the management area 202 stores therein the compression parameter P 2 , a file path table 222 b , and a character appearance map Mb (including a single-character appearance map Mb 1 and a consecutive-character appearance map Mb 2 ).
- the data area 203 stores therein a compressed file group fb (compressed files fb_ 1 to fb_m) as depicted in FIG. 3 . Descriptions of these components are identical to those described in the first embodiment.
- FIG. 32 is a schematic of the single-character appearance map Mb 1 of the archives 200 - 2 .
- FIG. 33 is a schematic of the consecutive-character appearance map Mb 2 of the archives 200 - 2 .
- In the character appearance map Mb of the archives 200 - 2 , for convenience, to distinguish the bit numbers in the archives 200 - 2 from those in the archives 200 - 1 , the bit numbers in the archives 200 - 2 are indicated as b_ 1 to b_m.
- FIG. 34 is a schematic of the compression parameter P 2 of the archives 200 - 2 .
- FIG. 35 is a schematic of a file path table 222 b of the archives 200 - 2 .
- FIG. 36 is a schematic of a character appearance map linking table 211 b of the archives 200 - 2 .
- FIG. 37 is a schematic of a file path linking table 212 b of the archives 200 - 2 .
- FIG. 38 is a schematic for explaining an example of common parameter generation.
- the common compression parameter P is generated by summing, for each of the characters, the frequency of appearance of the compression parameter P 1 of the archives 200 - 1 and of the compression parameter P 2 of the archives 200 - 2 .
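- The per-character summation that yields the common compression parameter P can be sketched in a few lines. The function name `common_parameter` and the sample frequencies are assumptions for illustration; they are not the values of FIG. 38.

```python
from collections import Counter


def common_parameter(*params):
    """Sum the frequency of appearance of each character over all parameters."""
    total = Counter()
    for p in params:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)


# Hypothetical compression parameters P1 and P2.
p1 = {"a": 5, "b": 2}
p2 = {"a": 1, "c": 4}
p = common_parameter(p1, p2)
```

- A character that appears in only one set of archives simply keeps its own frequency, so the common parameter covers the union of the characters of both sets.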
- FIG. 39 is a schematic for explaining a reconfiguration of the character appearance map linking tables 211 a and 211 b .
- the character appearance map linking table 211 a of the archives 200 - 1 and the character appearance map linking table 211 b of the archives 200 - 2 are integrated and the items in the tables 211 a and 211 b are sorted in descending order of access frequency.
- a character appearance map linking table 3900 obtained after the integration includes access frequencies for n+m bit numbers, respectively. Subsequently, the access frequencies are allocated to the slave servers 1502 - 1 and 1502 - 2 such that the access frequencies are substantially equivalent between the slave servers 1502 - 1 and 1502 - 2 .
- a new character appearance map linking table 3900 a is generated by allocating, to the slave server 1502 - 1 , the access frequencies of the bit numbers whose sort positions in descending order of access frequency are odd numbered.
- a new character appearance map linking table 3900 b is generated by allocating, to the slave server 1502 - 2 , the access frequencies of the bit numbers whose sort positions in descending order of access frequency are even numbered.
- the sort positions are divided into odd-numbered sort positions and even-numbered sort positions as a method of allocation in this example. However, the sort positions one, four, five, eight, nine, etc., may be allocated to the slave server 1502 - 1 while the sort positions two, three, six, seven, ten, etc., may be allocated to the slave server 1502 - 2 . Further, any allocation method may be employed as long as the sort positions (or the access frequencies) are allocated such that the totals of the allocated sort positions (or the allocated access frequencies) are equivalent.
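- Both allocation patterns mentioned above can be sketched as follows, assuming each record of the integrated table is a hypothetical `(bit_number, access_frequency)` pair. The function names and the sample frequencies are illustrative and not taken from the specification.

```python
def rank(records):
    """Sort records in descending order of access frequency."""
    return sorted(records, key=lambda r: r[1], reverse=True)


def allocate_odd_even(records):
    """Odd sort positions to the first server, even ones to the second."""
    ranked = rank(records)
    return ranked[0::2], ranked[1::2]


def allocate_1_4_5_8(records):
    """Positions 1, 4, 5, 8, 9, ... to the first server, the rest to the second."""
    first, second = [], []
    for pos, rec in enumerate(rank(records), start=1):
        (first if pos % 4 in (0, 1) else second).append(rec)
    return first, second


def total(records):
    return sum(freq for _bit, freq in records)


# Hypothetical integrated table with n + m = 6 bit numbers.
table = [("a_1", 10), ("a_2", 4), ("b_1", 9), ("b_2", 7), ("a_3", 6), ("b_3", 1)]
odd, even = allocate_odd_even(table)
first, second = allocate_1_4_5_8(table)
```

- With these sample values the odd/even split gives totals of 21 and 16, while the second pattern gives 20 and 17, which shows why the choice of pattern can affect how closely the access frequencies are equalized.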
- FIG. 40 is a schematic for explaining reconfiguration of the file path linking tables 212 a and 212 b .
- the file path linking tables 212 a and 212 b respectively of the sets of archives 200 - 1 and 200 - 2 are integrated.
- a file path linking table 4000 obtained after the integration includes file paths for a total of n+m bit numbers. Subsequently, the bit numbers are allocated according to the allocation method employed for the character appearance map linking table 3900 .
- a file path linking table 4000 a for the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502 - 1 is obtained.
- a file path linking table 4000 b for the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502 - 2 is obtained.
- FIG. 41 is a schematic for explaining reconfiguration of the file path tables 222 a and 222 b .
- the file path tables 222 a and 222 b respectively of the sets of archives 200 - 1 and 200 - 2 are integrated.
- a file path table 4100 obtained after the integration has file paths for a total of n+m file IDs.
- the file IDs corresponding to the bit numbers are allocated according to the allocation method employed for the character appearance map linking table 3900 .
- a file path table 4100 a for the file IDs corresponding to the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502 - 1 is obtained.
- a file path table 4100 b for the file IDs corresponding to the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502 - 2 is obtained.
- FIG. 42 is a schematic for explaining reconfiguration of the single-character appearance maps Ma 1 and Mb 1 .
- the single-character appearance maps Ma 1 and Mb 1 respectively of the sets of archives 200 - 1 and 200 - 2 are integrated.
- a single-character appearance map Mab 1 obtained after the integration has a bit row including a total of n+m bits for each character. Subsequently, the bit numbers are allocated according to the allocation method employed for the character appearance map linking table 3900 .
- a single-character appearance map MA 1 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502 - 1 is obtained.
- a single-character appearance map MB 1 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502 - 2 is obtained.
- FIG. 43 is a schematic for explaining reconfiguration of the consecutive-character appearance maps Ma 2 and Mb 2 .
- the consecutive-character appearance maps Ma 2 and Mb 2 respectively of the sets of archives 200 - 1 and 200 - 2 are integrated.
- a consecutive-character appearance map Mab 2 obtained after the integration has a bit string including n+m bits in total for each character. Subsequently, the bit numbers are allocated according to the allocation method employed for the character appearance map linking table 3900 .
- a consecutive-character appearance map MA 2 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502 - 1 is obtained.
- a consecutive-character appearance map MB 2 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502 - 2 is obtained.
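- The integration and re-division of the appearance maps in FIGS. 42 and 43 can be sketched as concatenating bit rows and then selecting columns. In this illustration each bit row is a Python list, columns are 0-based positions in the integrated row, and a character absent from one map is given an all-zero row; these representations and the function names are assumptions.

```python
def integrate_maps(map_a, n, map_b, m):
    """Concatenate the n-bit rows of map_a with the m-bit rows of map_b."""
    chars = set(map_a) | set(map_b)
    return {
        ch: map_a.get(ch, [0] * n) + map_b.get(ch, [0] * m)
        for ch in chars
    }


def project_map(integrated, columns):
    """Keep only the bits whose bit numbers were allocated to one slave."""
    return {ch: [row[c] for c in columns] for ch, row in integrated.items()}


# Hypothetical maps: two files in the first archives, one in the second.
map_a = {"x": [1, 0], "y": [0, 1]}
map_b = {"x": [1], "z": [1]}
mab = integrate_maps(map_a, 2, map_b, 1)
ma = project_map(mab, [0, 2])   # bit numbers allocated to slave 1502-1
mb = project_map(mab, [1])      # bit numbers allocated to slave 1502-2
```

- Using the same column list for the linking tables, the file path tables, and the maps keeps every table of a new set of archives aligned on the same bit numbers.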
- FIG. 44 is a schematic for explaining reconfiguration of the compressed file groups fa and fb.
- the compressed file group fa of the archives 200 - 1 is expanded using the Huffman tree h- 1 corresponding thereto. Thus, a file group Fa to be searched is obtained.
- the compressed file group fb of the archives 200 - 2 is expanded using the Huffman tree h- 2 corresponding thereto. Thus, a file group Fb to be searched is obtained.
- the file group Fa to be searched is recompressed using the common Huffman tree H.
- a compressed file group ga is obtained.
- the file group Fb to be searched is recompressed using the common Huffman tree H.
- a compressed file group gb is obtained.
- the compressed file groups ga and gb that have been recompressed are integrated.
- the files are sorted in descending order of access frequency according to the allocation method employed for the character appearance map linking table 3900 .
- an integrated compressed file group “g” in descending order of access frequency is obtained.
- a compressed file group g 1 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502 - 1 is allocated to the slave server 1502 - 1 .
- a compressed file group g 2 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502 - 2 is allocated to the slave server 1502 - 2 .
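- The expansion with the original Huffman tree followed by recompression with the common Huffman tree H can be sketched with code tables in place of trees. The tables `h1` and `common`, the bit-string representation, and the function names are assumptions for illustration; they are not the tables of FIGS. 18 to 23.

```python
def encode(text, codes):
    """Compress text by concatenating the compression symbol of each character."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Expand a bit string by matching prefix-free compression symbols."""
    inverse = {symbol: ch for ch, symbol in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a compression symbol")
    return "".join(out)


def recompress(bits, old_codes, common_codes):
    """Two steps: expand with the old table, then compress with the common one."""
    return encode(decode(bits, old_codes), common_codes)


# Hypothetical tables standing in for the Huffman tree h-1 and the common tree H.
h1 = {"a": "0", "b": "10", "c": "11"}
common = {"a": "00", "b": "01", "c": "1"}
fa = encode("abca", h1)
ga = recompress(fa, h1, common)
```

- After this step every slave server can expand its allocated files with the single common table, regardless of which set of archives a file originally came from.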
- FIG. 45 is a schematic of the stored contents of the new archives A 1 .
- the new archives A 1 are transmitted to the slave server 1502 - 1 .
- the new archives A 1 store therein the common compression parameter P depicted in FIG. 38 , the character appearance map linking table 3900 a after the reconfiguration depicted in FIG. 39 , the file path linking table 4000 a after the reconfiguration depicted in FIG. 40 , the file path table 4100 a after the reconfiguration depicted in FIG. 41 , the single-character appearance map MA 1 after the reconfiguration depicted in FIG. 42 , the consecutive-character appearance map MA 2 after the reconfiguration depicted in FIG. 43 , and the compressed file group g 1 after the reconfiguration depicted in FIG. 44 .
- FIG. 46 is a schematic of the stored contents of the new archives A 2 .
- the new archives A 2 are transmitted to the slave server 1502 - 2 .
- the new archives A 2 store therein the common compression parameter P depicted in FIG. 38 , the character appearance map linking table 3900 b after the reconfiguration depicted in FIG. 39 , the file path linking table 4000 b after the reconfiguration depicted in FIG. 40 , the file path table 4100 b after the reconfiguration depicted in FIG. 41 , the single-character appearance map MB 1 after the reconfiguration depicted in FIG. 42 , the consecutive-character appearance map MB 2 after the reconfiguration depicted in FIG. 43 , and the compressed file group g 2 after the reconfiguration depicted in FIG. 44 .
- FIG. 47 is a block diagram of a functional configuration of the master server 1501 (information managing apparatus).
- the master server 1501 includes a receiving unit 4701 , a common compression parameter generating unit 4702 , a common Huffman tree generating unit 4703 , an expanding unit 4704 , a compressing unit 4705 , a reconfiguring unit 4706 , and a transmitting unit 4707 .
- Functions of the units from the receiving unit 4701 to the transmitting unit 4707 are implemented by, for example, causing the CPU 101 to execute a program stored in the storage area such as the ROM 102 , the RAM 103 , and the magnetic disc 105 depicted in FIG. 1 , or by the I/F 109 .
- the receiving unit 4701 has a function of receiving data transmitted from the slave servers 1502 -I. More specifically, for example, the receiving unit 4701 receives the sets of archives 200 - 1 to 200 -N and the Huffman trees h- 1 to h-N from the slave servers 1502 - 1 to 1502 -N.
- the common compression parameter generating unit 4702 has a function of generating the common compression parameter P for all sets of archives 200 - 1 to 200 -N. More specifically, for example, as depicted in FIG. 38 , the compression parameters P 1 and P 2 included in the sets of archives 200 - 1 and 200 - 2 received from the slave servers 1502 - 1 and 1502 - 2 are extracted. The common compression parameter P is generated by summing, for each character, the frequency of appearance of the extracted compression parameters P 1 and P 2 . The generated common compression parameter P is transmitted to the common Huffman tree generating unit 4703 and the archives generating unit 4713 .
- the common Huffman tree generating unit 4703 has a function of generating the common Huffman tree H for all sets of archives 200 - 1 to 200 -N. More specifically, for example, the common Huffman tree generating unit 4703 generates the common Huffman tree H by allocating “0” and “1” to characters in descending order of the frequency of appearance of the common compression parameter P using the binary search (see FIGS. 22 and 23 ). The generated common Huffman tree is transmitted to the expanding unit 4704 and the archives generating unit 4713 .
- the expanding unit 4704 has a function of expanding, for each set of archives 200 -I, the compressed file group f included in the archives 200 -I.
- the Huffman tree used in the expansion process is the Huffman tree h-I transmitted together with the archives 200 -I.
- the file group Fa to be searched is formed by expanding the compressed file group fa using the Huffman tree h- 1 that is used for the compression of the compressed file group fa.
- the file group Fb to be searched is obtained by expanding the compressed file group fb using the Huffman tree h- 2 that is used for the compression of the compressed file group fb.
- the compressing unit 4705 has a function of recompressing the file group to be searched that has been expanded by the expanding unit 4704 .
- the Huffman tree used for the recompression is the common Huffman tree H.
- the compressed file group ga is obtained by recompressing the file group Fa using the common Huffman tree H.
- the compressed file group gb is obtained by recompressing the file group Fb using the common Huffman tree H.
- the compressed file groups ga and gb are integrated by an integrating unit 4711 .
- the reconfiguring unit 4706 has a function of reconfiguring each set of received archives 200 -I and each Huffman tree h-I.
- the reconfiguring unit 4706 includes the integrating unit 4711 , an allocating unit 4712 , and an archives generating unit 4713 .
- the integrating unit 4711 has a function of integrating the data in each set of archives 200 -I.
- the integrating unit 4711 integrates: the character appearance map linking tables 211 a and 211 b respectively of the sets of archives 200 - 1 and 200 - 2 ; the file path linking tables 212 a and 212 b ; the file path tables 222 a and 222 b ; the single-character appearance maps Ma 1 and Mb 1 ; and the consecutive-character appearance maps Ma 2 and Mb 2 and, thereby, the integrating unit 4711 obtains the character appearance map linking table 3900 after the integration, the file path linking table 4000 after the integration, the file path table 4100 after the integration, the single-character appearance map Mab 1 after the integration, and the consecutive-character appearance map Mab 2 after the integration.
- the integrating unit 4711 integrates the compressed file groups ga and gb that are recompressed by the compressing unit 4705 respectively for the sets of archives 200 - 1 and 200 - 2 and the integrating unit 4711 obtains the compressed file group g after the integration.
- the allocating unit 4712 has a function of allocating to each slave server 1502 -I, the data integrated by the integrating unit 4711 such that the access frequency after the allocation is equivalent in each slave server 1502 -I for each set of archives.
- the records of the character appearance map linking table 3900 after the integration are allocated such that the access frequencies or the sort positions thereof are substantially equivalent, and thereby, the character appearance map linking tables 3900 a and 3900 b reconfigured respectively for the slave servers 1502 - 1 and 1502 - 2 are obtained.
- the allocating unit 4712 further allocates, for the compressed file group g after the integration, such that the access frequencies or the sort positions thereof are substantially equivalent, and the compressed file groups g 1 and g 2 that are reconfigured for each of the slave servers 1502 - 1 and 1502 - 2 are obtained.
- the archives generating unit 4713 has a function of generating new archives that are reconfigured for each slave server 1502 -I. More specifically, for example, the archives generating unit 4713 aggregates, for each of the slave servers 1502 - 1 and 1502 - 2 , the data allocated respectively thereto and thereby, forms the sets of new archives A 1 and A 2 .
- the transmitting unit 4707 has a function of transmitting data to slave servers 1502 -I. More specifically, for example, the transmitting unit 4707 transmits a request for the collection of the sets of archives 200 - 1 to 200 -N and the Huffman trees h- 1 to h-N. The transmitting unit 4707 transmits the new archives A 1 (A 2 ) respectively together with the common Huffman tree H to respective allocation destinations, the slave server 1502 - 1 ( 1502 - 2 ).
- FIGS. 48 and 49 are flowcharts of the archives reconfiguring process by the master server 1501 .
- the receiving unit 4701 collects the sets of archives 200 - 1 to 200 -N and the Huffman trees h- 1 to h-N of the slave servers 1502 - 1 to 1502 -N (step S 4801 ).
- the integrating unit 4711 extracts and integrates the character appearance map linking tables 211 a and 211 b in the sets of archives 200 - 1 and 200 - 2 (step S 4802 ).
- the allocating unit 4712 sorts the character appearance map linking table 3900 after the integration in descending order of access frequency (step S 4803 ), and allocates the character appearance map linking tables 3900 a and 3900 b respectively to the slave servers 1502 - 1 and 1502 - 2 (step S 4804 ).
- the file path tables 222 a and 222 b , the file path linking tables 212 a and 212 b , and the character appearance maps Ma and Mb are integrated, and are allocated to slave servers 1502 - 1 and 1502 - 2 according to bit numbers that each have a high access frequency (or the corresponding file ID) and, thereby, the reconfiguration is executed (step S 4805 ).
- the common compression parameter generating unit 4702 generates the common compression parameter P (step S 4806 ).
- the common Huffman tree generating unit 4703 generates the common Huffman tree H (step S 4807 ).
- the expanding unit 4704 expands the compressed file groups fa and fb respectively for the sets of archives 200 - 1 and 200 - 2 using respectively the Huffman trees h- 1 and h- 2 that are used for the compression of the file groups fa and fb (step S 4901 ).
- the compressing unit 4705 recompresses the file groups Fa and Fb that are to be searched and that have been expanded respectively for the sets of archives 200 - 1 and 200 - 2 , using the common Huffman tree H (step S 4902 ).
- the integrating unit 4711 integrates the compressed file groups ga and gb that have been recompressed (step S 4903 ), and sorts the bit numbers in descending order of access frequency (step S 4904 ).
- the allocating unit 4712 allocates the bit numbers to the slave servers 1502 - 1 and 1502 - 2 such that the totals of the access frequencies or the sort positions thereof are substantially equivalent (step S 4905 ).
- the archives generating unit 4713 generates the sets of new archives A 1 and A 2 respectively for the slave servers 1502 - 1 and 1502 - 2 (step S 4906 ), and transmits the new archives A 1 (A 2 ) and the common Huffman tree H to the slave server 1502 - 1 ( 1502 - 2 ) that is the allocation destination of the new archives A 1 (A 2 ) (step S 4907 ), ending a series of processing.
- According to the second embodiment, by reconfiguring the sets of archives 200 - 1 and 200 - 2 of the slave servers 1502 - 1 and 1502 - 2 , substantial equalization of the searching speed between the slave servers 1502 - 1 and 1502 - 2 is achieved. Therefore, when the same search character string is given to each of the slave servers 1502 - 1 and 1502 - 2 , the search results are returned substantially simultaneously from the slave servers 1502 - 1 and 1502 - 2 . That is, the waiting time for the last search result can be reduced and therefore, improvement of the searching speed is enabled.
- a third embodiment is configured by improving a portion of the second embodiment.
- the second embodiment is configured to execute the step of expanding the compressed file group of each set of archives 200-I using the Huffman tree h-I that was used for compressing the compressed file group, and the step of recompressing the expanded file group that is to be searched, using the common Huffman tree H.
- the second embodiment is also configured to be able to compress and expand in each slave server 1502 -I using the common Huffman tree H by executing this two-path processing, that is, the expansion and the recompression.
- the third embodiment is configured to identify, from the common Huffman tree H, the leaves of the same characters as the expanded characters buried in the leaves of the Huffman tree of each slave server 1502 -I by the expansion process using the Huffman tree.
- the compressed symbols allocated to the identified leaves of the common Huffman tree H are set instead of the expanded characters of the leaves of the Huffman tree h-I that is the identification origin.
- the Huffman tree after the setting is referred to as “converting Huffman tree”.
- the compressed file group that has been compressed using the Huffman tree h-I obtained before the setting is converted into a compressed file group corresponding to the compression symbols of the common Huffman tree H by executing the expansion process on the compressed file group using the converting Huffman tree.
- the compressed file group corresponding to the compression symbol of the common Huffman tree H remaining in the compressed format, can be obtained for each slave server 1502 -I by the one-path processing of one converting process. Therefore, an increased speed of the reconfiguring process in the master server 1501 can be realized.
- FIG. 50 is a schematic for explaining an example of generating the converting Huffman tree.
- the example depicted in FIG. 50 is of generation that uses the Huffman tree h-1 of the archives 200-1 and the common Huffman tree H.
- the leaf to which a character “b” of the Huffman tree h-1 is set is noted.
- a leaf of the common Huffman tree H to which the same character as the character “b” of the noted leaf is set is identified (step S5001).
- a compression symbol “110” and the length of the compression symbol (in this example, “three”) that are allocated to the character “b” of the identified leaf of the common Huffman tree H are read (step S5002).
- the compression symbol and the length of the compression symbol “110 (3)” that are read are written to the noted leaf of the Huffman tree h-1 (step S5003).
- The other characters, “a” and “c” to “f”, are similarly converted.
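Steps S5001 to S5003 can be sketched with code tables in place of tree structures. Only the common-tree symbols "110" for "b" and "1110" for "f" come from the text; every other code word, the dictionary representation, and the name `build_converting_table` are hypothetical choices for illustration.

```python
def build_converting_table(old_codes, common_codes):
    """Map each code word of h-1 to (code word in H, its length)."""
    table = {}
    for char, old_code in old_codes.items():
        new_code = common_codes[char]                # S5001: same character in H
        table[old_code] = (new_code, len(new_code))  # S5002-S5003: read and write
    return table


# Hypothetical code words for the Huffman tree h-1 of the archives 200-1.
h1_codes = {"a": "0", "b": "10", "c": "110", "d": "1110", "e": "11110", "f": "11111"}
# Common Huffman tree H: "b" and "f" follow the text, the rest are assumed.
common_codes = {"a": "0", "c": "10", "b": "110", "f": "1110", "d": "11110", "e": "11111"}

converting_h1 = build_converting_table(h1_codes, common_codes)
print(converting_h1["10"])  # entry for the leaf that used to expand to "b"
```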
- FIG. 51 is a schematic for explaining a second example of generating the converting Huffman tree.
- the example depicted in FIG. 51 is of generation that uses the Huffman tree h- 2 of the archives 200 - 2 and the common Huffman tree H.
- a leaf to which a character “f” of the Huffman tree h-2 is set is noted.
- a leaf of the common Huffman tree H to which the same character as the character “f” of the noted leaf is set is identified (step S5101).
- a compression symbol “1110” and the length of the compression symbol (in this example, “four”) that are allocated to the character “f” of the identified leaf of the common Huffman tree H are read (step S5102).
- the compression symbol and the length of the compression symbol “1110 (4)” that are read are written to the noted leaf of the Huffman tree h-2 (step S5103).
- Other characters “a” to “e” are similarly converted.
- FIG. 52 is a schematic of a first converting Huffman tree.
- a converting Huffman tree H 1 is a Huffman tree generated by the generation process that uses the Huffman tree h- 1 of the archives 200 - 1 and the common Huffman tree H depicted in FIG. 50 .
- the expansion does not provide the character “b” depicted in FIG. 50 , but rather provides conversion to the compression symbol “110” set instead of the character “b”. Therefore, the two-path processing including the expansion and the recompression of the compressed file is not necessary.
- the compressed file that can be compressed and expanded using the common Huffman tree H can be obtained by the one-path processing handling the compressed file in the compressed format.
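The difference between the two-path and the one-path processing can be illustrated on a bit string. This is a minimal sketch: the code tables are hypothetical (only "110" for "b" in the common tree is taken from the text), bit strings are modelled as Python strings of "0" and "1", and the function names are invented for the example.

```python
# Hypothetical code words; only common_codes["b"] == "110" comes from the text.
h1_codes = {"a": "0", "b": "10", "c": "110", "d": "1110", "e": "11110", "f": "11111"}
common_codes = {"a": "0", "c": "10", "b": "110", "f": "1110", "d": "11110", "e": "11111"}

# Converting table: old code word -> code word of the common Huffman tree H.
converting = {old: common_codes[ch] for ch, old in h1_codes.items()}


def transcode(bits, table):
    """One pass: walk the old code words and emit the new ones directly."""
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in table:          # a leaf of the converting tree is reached
            out.append(table[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)


def expand_then_recompress(bits):
    """Two passes, for comparison: decode to characters, then encode again."""
    decode = {code: ch for ch, code in h1_codes.items()}
    chars, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:
            chars.append(decode[current])
            current = ""
    return "".join(common_codes[ch] for ch in chars)


compressed_with_h1 = "100110"          # "b", "a", "c" under the hypothetical h-1
converted = transcode(compressed_with_h1, converting)
print(converted)
```

Both routes produce the same bit string, but the one-path route never materializes the expanded characters.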
- FIG. 53 is a schematic of a second converting Huffman tree.
- a converting Huffman tree H 2 is a Huffman tree generated by the generation process that uses the Huffman tree h- 2 of the archives 200 - 2 and the common Huffman tree H depicted in FIG. 51 .
- the expansion does not provide the character “f” depicted in FIG. 51 , but rather provides conversion to the compression symbol “1110” set instead of the character “f”. Therefore, the two-path processing including the expansion and the recompression of the compressed file is not necessary.
- the compressed file that can be compressed and expanded using the common Huffman tree H can be obtained by the one-path processing handling the compressed file in the compressed format.
- FIG. 54 is a block diagram of a functional configuration of the master server 1501 (information managing apparatus) according to the third embodiment.
- the master server 1501 includes a selecting unit 5401 , an identifying unit 5402 , a setting unit 5403 , and a converting unit 5404 . More specifically, functions of the units from the selecting unit 5401 to the converting unit 5404 are implemented by, for example, causing the CPU 101 to execute a program stored in the storage area such as the ROM 102 , the RAM 103 , and the magnetic disc 105 depicted in FIG. 1 , or by the I/F 109 .
- the selecting unit 5401 has a function of successively selecting arbitrary leaves from the Huffman tree h-I used for compression of the compressed file group in a corresponding archives 200 -I, for each set of archives 200 -I. More specifically, for example, the selecting unit 5401 successively selects the leaves of the Huffman tree h-I depicted in FIGS. 50 and 51 .
- the identifying unit 5402 has a function of identifying, from the common Huffman tree H, the leaves of the same character as the character expanded using the leaves successively selected by the selecting unit 5401. More specifically, as depicted in FIGS. 50 and 51, the identifying unit 5402 identifies the leaves of the common Huffman tree H to which the same character is set.
- the setting unit 5403 has a function of setting, to a leaf selected in the Huffman tree h-I, the compression symbol allocated to the leaf identified by the identifying unit 5402 instead of the character expanded using the selected leaf. More specifically, the setting unit 5403 overwrites the area of the structure of the selected leaf into which the expanded character is written with the compression symbol identified from the common Huffman tree H. The setting unit 5403 writes the length of the compression symbol into another blank area. In the structure of the leaf that is the setting target, the pointer to an upper node remains as it is and, therefore, designating the compression symbol originally allocated to the selected leaf yields, as the conversion result, the compression symbol written in the structure of the leaf.
- the converting unit 5404 has a function of converting the compressed file compressed using the Huffman trees h- 1 and h- 2 before the conversion, using the converting Huffman trees H 1 and H 2 handling the compressed file remaining in the compressed format. Thereby, the compressed file groups ga and gb that can be compressed and expanded using the common Huffman tree H are obtained. Similarly to the second embodiment, the compressed file groups ga and gb after the conversion are integrated by the integrating unit 4711 , and are allocated to the slave servers 1502 - 1 and 1502 - 2 such that the totals of the access frequencies or the sort positions thereof are substantially equivalent between the slave servers 1502 - 1 and 1502 - 2 .
- FIG. 55 is a flowchart of the archives reconfiguring process (latter half) by the master server 1501 .
- the reconfiguring process (former half) is identical to that depicted in FIG. 48 and the description thereof is omitted.
- a compression symbol setting process for the Huffman trees h- 1 and h- 2 is executed (step S 5501 ).
- the converting unit 5404 executes the one-path converting process (step S 5502 ).
- the one-path converting process is executed for each of the compressed file groups fa and fb in the sets of archives 200-1 and 200-2 using the converting Huffman trees H1 and H2 obtained respectively for the sets of archives 200-1 and 200-2.
- the integrating unit 4711 integrates the compressed file groups ga and gb after the conversion (step S 5503 ), and sorts in descending order of access frequency (step S 5504 ). Subsequently, the allocating unit 4712 allocates the compressed files to the slave servers 1502 - 1 and 1502 - 2 such that the totals of the access frequencies or the sort positions thereof are substantially equivalent (step S 5505 ).
- the archives generating unit 4713 generates the sets of new archives A 1 and A 2 respectively for the slave servers 1502 - 1 and 1502 - 2 (step S 5506 ), and transmits the new archives A 1 (A 2 ) and the common Huffman tree H to the slave server 1502 - 1 ( 1502 - 2 ) that is the allocation destination of the new archives A 1 (A 2 ) (step S 5507 ), ending a series of processing.
- FIG. 56 is a flowchart of the compressed symbol setting process to the Huffman tree.
- the selecting unit 5401 judges whether any unprocessed Huffman trees are present from the Huffman trees h- 1 and h- 2 respectively of the sets of archives 200 - 1 and 200 - 2 (step S 5601 ).
- When an unprocessed Huffman tree is present (step S5601: YES), an unprocessed Huffman tree is selected (step S5602).
- the selecting unit 5401 judges whether any unprocessed leaves are present in the selected Huffman tree (step S 5603 ).
- When an unprocessed leaf is present (step S5603: YES), an unprocessed leaf is selected (step S5604).
- the identifying unit 5402 identifies the leaf to which the same character as the character set in the selected leaf is set, from the common Huffman tree H (step S 5605 ).
- the setting unit 5403 sets, in the structure of the selected leaf, the compressed symbol and the length of the compressed symbol that are allocated to the identified leaf (step S 5606 ). The procedure returns to step S 5603 .
- At step S5603, when no unprocessed leaf is present (step S5603: NO), the procedure returns to step S5601.
- At step S5601, when no unprocessed Huffman tree is present (step S5601: NO), the procedure moves to the one-path converting process (step S5502).
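The loop of FIG. 56 can be sketched over simple leaf records. The `Leaf` class, the representation of each Huffman tree as a plain list of its leaves, and all code words other than "110" for "b" and "1110" for "f" are assumptions for illustration. Unlike the setting described for the setting unit 5403, this sketch keeps the character in the record so that the result is easy to inspect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    char: str                     # character the leaf originally expands to
    symbol: Optional[str] = None  # compression symbol copied from the common tree H
    length: int = 0               # length of that compression symbol


def set_compression_symbols(trees, common_codes):
    """Visit every leaf of every tree and copy in the symbol from H."""
    for leaves in trees:                     # S5601, S5602: next unprocessed tree
        for leaf in leaves:                  # S5603, S5604: next unprocessed leaf
            code = common_codes[leaf.char]   # S5605: same character in H
            leaf.symbol = code               # S5606: set symbol and length
            leaf.length = len(code)


# Common Huffman tree H: "b" and "f" follow the text, the rest are hypothetical.
common_codes = {"a": "0", "c": "10", "b": "110", "f": "1110", "d": "11110", "e": "11111"}
h1 = [Leaf(ch) for ch in "abcdef"]
h2 = [Leaf(ch) for ch in "abcdef"]
set_compression_symbols([h1, h2], common_codes)
print(h1[1], h2[5])
```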
- the compressed file groups corresponding to the compression symbols of the common Huffman tree H can be obtained for the slave servers 1502-1 and 1502-2, while remaining in the compressed format, using a one-path process, i.e., a one-time converting process. The file opening process (expansion process) for applying the files to the common Huffman tree H therefore becomes unnecessary, and increased speed of the archives reconfiguring process in the master server 1501 can be realized.
- the archives 200 reconfiguring process can be realized with a simple configuration and without creating a new algorithm because the existing process, the Huffman tree expansion process, is applied.
- higher efficiency of the search process can be realized by achieving increased speed of the file accessing process when a full text search is implemented to compressed files, handling the files remaining in the compressed format.
- Increased speed of a full text search can be realized by causing the cache area to be a resident memory and increasing the efficiency of the server resources.
- the method explained in the present embodiment can be implemented by a computer, such as a personal computer and a workstation, executing a program that is prepared in advance.
- the program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read out from the recording medium by a computer.
- the program may also be distributed through a transmission medium, namely a network such as the Internet.
- compressed files in the archives are compressed using a common parameter; and the management area of the archives has written therein a table that stores the bit numbers of a character appearance map, which is used for the strict selection of the files to be searched such as the strict selection described in International Publication Pamphlet No. 2006-123448, together with the head addresses of the compressed files corresponding to the bit numbers.
- the head address of a corresponding compressed file can be obtained based on the position number of a flag of a file that is strictly selected using the character appearance map and, therefore, a high-speed opening process can be realized.
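How a flag position leads directly to a head address can be sketched as follows. The bit rows are modelled as Python integers whose lowest bit stands for bit number 1; that encoding, the function name `candidate_head_addresses`, and the addresses in the table are assumptions for illustration.

```python
def candidate_head_addresses(bit_rows, head_address):
    """AND the bit rows, then look up head addresses by bit number."""
    hits = bit_rows[0]
    for row in bit_rows[1:]:
        hits &= row                  # keep only files flagged in every row
    addresses = []
    bit_number = 1
    while hits:
        if hits & 1:
            # Direct table lookup: no search over file names is needed.
            addresses.append(head_address[bit_number])
        hits >>= 1
        bit_number += 1
    return addresses


# Hypothetical table: bit number -> head address of the compressed file.
head_address = {1: 0, 2: 4096, 3: 9000, 4: 15000}
# Hypothetical rows for two characters of a search string.
rows = [0b1011,   # first character appears in files 1, 2, and 4
        0b1110]   # second character appears in files 2, 3, and 4
print(candidate_head_addresses(rows, head_address))
```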
- archives that include a compressed file group including compressed files that are to be searched and that have described therein character strings are accessed; the compressed files are sorted in descending order of the frequency of accesses; the compressed files are combined in the descending order of the access frequency after the sorting such that the combined size does not exceed the storage capacity of a cache area for the storage area having stored therein the compressed file group; and the combined compressed file group is written from the storage area into the cache area prior to a search in the file group.
- compressed files having a high access frequency can be stored in the cache area with preference.
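The selection of files for the cache area can be sketched as below. The tuple layout, the function name `select_for_cache`, the sizes, and the choice to stop at the first file that no longer fits (rather than trying smaller files further down the list) are assumptions for illustration.

```python
def select_for_cache(files, cache_capacity):
    """Pick the most frequently accessed files that fit into the cache.

    files is a list of (name, size, access_frequency) tuples.
    """
    # Sort in descending order of access frequency.
    ordered = sorted(files, key=lambda f: f[2], reverse=True)
    chosen, used = [], 0
    for name, size, _freq in ordered:
        if used + size > cache_capacity:
            break                    # combined size must not exceed the capacity
        chosen.append(name)
        used += size
    return chosen, used


# Hypothetical compressed files: (name, size in bytes, access count).
files = [("f1", 400, 12), ("f2", 300, 50), ("f3", 500, 30), ("f4", 200, 5)]
chosen, used = select_for_cache(files, cache_capacity=1000)
print(chosen, used)
```

The chosen files would then be written from the storage area into the cache area before any search is run.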
- a plurality of slave servers are accessed, each having stored therein archives that include a compressed file group including compressed files that have character strings described therein and that are to be searched; the archives are received from each of the slave servers; based on each character described in the file group to be searched for each set of received archives and compression parameters concerning the appearance frequency of each character, appearance frequencies are totaled for each character; thereby, a compression parameter that is common to the compressed file group is generated; based on the generated common compression parameter, a Huffman tree common to the compressed file group is generated; the compressed files are allocated to the slave servers such that sums of the access frequencies of each compressed file are substantially equivalent among the slave servers; and new archives including the compressed file group allocated to each slave server and the common Huffman tree are transmitted to the slave server to which the compressed files are allocated.
- archives including the compressed file group for which the access frequencies are equivalent can be distributed to the slave servers and, therefore, higher efficiency and a higher speed of a search among all the grid computers can be facilitated.
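Totaling the appearance frequencies into a common compression parameter can be sketched with a counter per set of archives. The character counts and the function name `common_parameter` are hypothetical; the common Huffman tree would then be generated from the totals.

```python
from collections import Counter


def common_parameter(*params):
    """Total the appearance frequency of each character over all archives."""
    total = Counter()
    for param in params:
        total.update(param)          # adds counts character by character
    return total


# Hypothetical compression parameters received from two slave servers.
param_1 = {"a": 50, "b": 20, "c": 5}
param_2 = {"a": 10, "b": 30, "f": 8}
common = common_parameter(param_1, param_2)
print(dict(common))
```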
- an effect is exerted that, when a full text search is implemented for a compressed file remaining in the compressed format, higher efficiency of the search process can be realized by enabling a higher-speed file accessing process.
Abstract
A computer-readable recording medium stores therein an information searching program that causes a computer having access to archives including a compressed file group of compressed files that are to be searched and that have described therein character strings, to execute: sorting the compressed files in descending order of access frequency of the compressed files; combining the compressed files in descending order of access frequency after the sorting at the sorting such that a storage capacity of a cache area for a storage area that stores therein the compressed file group is not exceeded by a combined size of the compressed files combined; and writing, from the storage area into the cache area, the compressed files combined at the combining, the compressed files combined being written prior to a search of the compressed files combined.
Description
- This is a Divisional of application Ser. No. 12/361,316, filed Jan. 28, 2009.
- This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2008-143527, filed on May 30, 2008, the entire contents of which are incorporated herein by reference.
- The embodiments discussed herein are related to information search and management with respect to compressed files.
- Conventionally, as disclosed in International Publication Pamphlet No. 2006-123448, a technique of reducing the number of files that are opened involves handling the files in a compressed format and strictly selecting files that have a potential of satisfying search conditions. In general, concerning file searches, the use of archives is accepted to be effective against the problems of fragmentation of storage areas and increased storage size, problems occurring because the number of times opening processes are executed increases and file management is executed for each sector.
- However, with the archives above, the computation of a compression parameter is necessary because files are respectively compressed using different compression parameters. Consequently, a problem has arisen in that the time necessary for the opening processes increases overall. With the technique disclosed in International Publication Pamphlet No. 2006-123448 and with archives, a problem has arisen in that the percentage of the files for which the opening processes are performed drastically increases as the number of files to be searched increases. Particularly, for large-scale dictionaries, opening processes account for 20 to 30% of the entire file processing and consequently, a problem has arisen in that this becomes a factor in reducing the speed of a full text search. In addition, a problem has arisen in that 13 comparisons are necessary in a binary search to identify a designated file among approximately 5,000 files. Furthermore, fragmentation of the disc area occurs because file management is executed for each sector, giving rise to a problem of the storage size increasing.
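The figure of 13 comparisons can be checked with a short simulation. This sketch assumes a textbook binary search with one three-way comparison per probe over exactly 5,000 sorted entries; the function name `probes_needed` is hypothetical.

```python
def probes_needed(n, target):
    """Count the probes a binary search makes to find index `target` among n."""
    lo, hi, probes = 0, n - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if mid == target:
            return probes
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


# Worst case over every possible target among 5,000 files.
worst = max(probes_needed(5000, t) for t in range(5000))
print(worst)  # 13, since 2**12 = 4096 <= 5000 < 8192 = 2**13
```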
- Meanwhile, for a search in a dictionary on a system including a single server that includes a large-capacity main storage memory, the dictionary is divided into numerous files and numerous disc areas, requiring a very long time for opening processes and reading processes to cause the dictionary to reside in a cache, and fragmentation of the storage area occurs in the cache resulting in a problem of the storage size increasing.
- In many dictionary searches using grid computing, the entire search process is affected when a search process of a grid computer is delayed causing reduced search efficiency.
- According to an aspect of an embodiment, a computer-readable recording medium stores therein an information searching program for a computer having access to archives including a compressed file group of compressed files that are to be searched and that have described therein character strings. The information searching program causes the computer to execute sorting the compressed files in descending order of access frequency of the compressed files; combining the compressed files in descending order of access frequency after the sorting such that a storage capacity of a cache area for a storage area that stores therein the compressed file group is not exceeded by a combined size of the compressed files combined; and writing, from the storage area into the cache area, the compressed files combined at the combining, the compressed files combined being written prior to a search of the compressed files combined.
- The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
- It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
- FIG. 1 is a block diagram of an information search apparatus according to a first embodiment;
- FIG. 2 is a schematic depicting stored content of archives;
- FIG. 3 is a schematic for explaining relations between a compressed file and files to be searched;
- FIG. 4 is a schematic of a single-character appearance map M1;
- FIG. 5 is a schematic of a consecutive-character appearance map;
- FIG. 6 is a schematic of a compression parameter;
- FIG. 7 is a schematic of a file path table;
- FIG. 8 is a schematic of a character appearance map linking table;
- FIG. 9 is a schematic of the file path linking table 212;
- FIG. 10 is a schematic of a virtual archive capacity table;
- FIG. 11 is a block diagram of a functional configuration of an information searching apparatus;
- FIG. 12 is a schematic of an example of strict selection of the compressed file using the character appearance map;
- FIG. 13 is a flowchart of a virtual archives setting process;
- FIG. 14 is a flowchart of an information search process;
- FIG. 15 is a schematic of a system configuration of a searching system according to a second embodiment;
- FIG. 16 is a schematic for describing a sharing of archives;
- FIG. 17 is a schematic for describing an allocation process for new archives;
- FIG. 18 is a schematic of a compression symbol table and the compression parameter of archives 200-1;
- FIG. 19 is a schematic of a Huffman tree generated from the compression symbol table of the archives 200-1;
- FIG. 20 is a schematic of a compression symbol table and the compression parameter of archives;
- FIG. 21 is a schematic of the Huffman tree generated from the compression symbol table of the archives 200-2;
- FIG. 22 is a schematic of a compression symbol table and the compression parameter of integrated archives;
- FIG. 23 is a schematic of a common Huffman tree generated from the compression symbol table of the integrated archives;
- FIG. 24 is a schematic of the stored contents of the archives 200-1;
- FIG. 25 is a schematic of the single-character appearance map of the archives 200-1;
- FIG. 26 is a schematic of the consecutive-character appearance map of the archives 200-1;
- FIG. 27 is a schematic of the compression parameter of the archives 200-1;
- FIG. 28 is a schematic of a file path table of the archives 200-1;
- FIG. 29 is a schematic of a character appearance map linking table of the archives 200-1;
- FIG. 30 is a schematic of a file path linking table of the archives 200-1;
- FIG. 31 is a schematic of the stored contents of the archives 200-2;
- FIG. 32 is a schematic of the single-character appearance map of the archives 200-2;
- FIG. 33 is a schematic of the consecutive-character appearance map of the archives 200-2;
- FIG. 34 is a schematic of the compression parameter of the archives 200-2;
- FIG. 35 is a schematic of a file path table 222 b of the archives 200-2;
- FIG. 36 is a schematic of a character appearance map linking table of the archives 200-2;
- FIG. 37 is a schematic of a file path linking table of the archives 200-2;
- FIG. 38 is a schematic for explaining an example of common parameter generation;
- FIG. 39 is a schematic for explaining a reconfiguration of the character appearance map linking tables;
- FIG. 40 is a schematic for explaining reconfiguration of the file path linking tables;
- FIG. 41 is a schematic for explaining reconfiguration of the file path tables;
- FIG. 42 is a schematic for explaining reconfiguration of the single-character appearance maps;
- FIG. 43 is a schematic for explaining reconfiguration of the consecutive-character appearance maps;
- FIG. 44 is a schematic for explaining reconfiguration of the compressed file groups;
- FIG. 45 is a schematic of the stored contents of new archives A1;
- FIG. 46 is a schematic of the stored contents of new archives A2;
- FIG. 47 is a block diagram of a functional configuration of the master server (information managing apparatus);
- FIGS. 48 and 49 are flowcharts of an archives reconfiguring process by the master server;
- FIG. 50 is a schematic for explaining an example of generating a converting Huffman tree;
- FIG. 51 is a schematic for explaining a second example of generating a converting Huffman tree;
- FIG. 52 is a schematic of a first converting Huffman tree;
- FIG. 53 is a schematic of a second converting Huffman tree;
- FIG. 54 is a block diagram of a functional configuration of the master server (information managing apparatus) according to a third embodiment;
- FIG. 55 is a flowchart of the archives reconfiguring process (latter half) by the master server; and
- FIG. 56 is a flowchart of a compressed symbol setting process to the Huffman tree.
- Preferred embodiments of the present invention will be explained with reference to the accompanying drawings.
- In a narrow sense, archiving generally is a technique of consolidating multiple folders and the numerous files of the folders into one file. Archives are transmitted and received as email attachments, and are used for such purposes as data exchange. In a broad sense, archives are also introduced as an accessory technique of compression because the archives are often combined with a compressing technique. With the prevalence of the Internet, archiving technology has advanced and a wide variety of tools have been developed combining operability and compression schemes. The advancement of hardware such as a personal computer is remarkable and, especially, the increased speed of central processing units (CPUs) and the increased capacity of recording media such as a memory, a hard disc, and an optical disc are conspicuous.
- With the advancement of hardware, the diversification of data and changes in, for example, practical applications of data such as for analysis, inquiries, and research are also remarkable. Conversely, concerning leaks of personal information and window-dressed accounting, the strengthening of security functions compliant with legislation such as the personal information protection law is demanded. Conventionally, archiving technology has focused on compression and expansion performance. However, from now on, functions linked to searching, security, etc. will be the focus.
- Conventional archiving is technology developed mainly in the fields of data storage, information transmission, and information exchange and is characterized by compression and consolidation into one file. When a file is used, expansion (or temporary expansion) is executed. Archives such as ZIP have no full text search function. The search function becomes more important as the number of files increases.
- According to a first embodiment, the time required for a file access process such as reading is reduced by causing a file having a high access frequency to reside in a cache memory when a full text search in a compressed file is implemented.
- FIG. 1 is a block diagram of an information search apparatus according to the first embodiment. As depicted in FIG. 1, the information search apparatus includes a central processing unit (CPU) 101, a read-only memory (ROM) 102, a random access memory (RAM) 103, a magnetic disc drive 104, a magnetic disc 105, an optical disc drive 106, a removable recording medium such as an optical disc 107, a display 108, an interface (I/F) 109, a keyboard 110, a mouse 111, a scanner 112, and a printer 113, connected to one another by way of a bus 100.
- The CPU 101 governs overall control of the information search apparatus. The ROM 102 stores therein programs such as a boot program. The RAM 103 is used as a work area of the CPU 101. The magnetic disc drive 104, under the control of the CPU 101, controls reading/writing of data from/to the magnetic disc 105. The magnetic disc 105 stores therein the data written under control of the magnetic disc drive 104.
- The optical disc drive 106, under the control of the CPU 101, controls reading/writing of data from/to the optical disc 107. The optical disc 107 stores therein the data written under control of the optical disc drive 106, the data being read by a computer.
- The display 108 displays a cursor, an icon, a tool box, and data such as document, image, and function information. The display 108 may be, for example, a cathode ray tube (CRT), a thin-film-transistor (TFT) liquid crystal display, or a plasma display.
- The I/F 109 is connected to a network 114 such as a local area network (LAN), a wide area network (WAN), or the Internet through a communications line and is connected to other devices by way of the network 114. The I/F 109 manages the network 114 and an internal interface, and controls the input and output of data from and to external devices. The I/F 109 may be, for example, a modem or a local area network (LAN) adapter.
- The keyboard 110 is equipped with keys for the input of characters, numerals, and various instructions, and data is entered through the keyboard 110. The keyboard 110 may be a touch-panel input pad or a numeric keypad. The mouse 111 performs cursor movement, range selection, and movement, size change, etc., of a window. The mouse 111 may be a trackball or a joystick provided the trackball or joystick has similar functions as a pointing device.
- The scanner 112 optically reads an image and takes the image data into the information search apparatus. The scanner 112 may have an optical character recognition (OCR) function. The printer 113 prints image data and document data. The printer 113 may be, for example, a laser printer or an ink jet printer.
- FIG. 2 is a schematic depicting stored contents of archives. The archives are stored in a storage area such as the RAM 103 or the magnetic disc 105 depicted in FIG. 1. Archives 200 include a library area 201, a management area 202, and a data area 203. The library area 201 stores therein a character appearance map linking table 211, a file path linking table 212, and a virtual archive capacity table 213. The management area 202 stores therein a compression parameter 221, a file path table 222, and a character appearance map M (a single-character appearance map M1 and a consecutive-character appearance map M2). The data area 203 stores therein a compressed file group f (compressed files f1 to fn).
- The archives 200 are stored in the storage area 230 and compressed files from the head to a compressed file f′ are stored in a cache area 240. In this example, the cache area 240 is a storage area that is determined relative to the storage area 230 of the archives 200 and that can be accessed at a higher speed than the storage area 230 of the archives 200. For example, when the storage area 230 of the archives 200 is the magnetic disc 105, the cache area 240 is provided on a main memory heap area, etc. The cache area further stores therein some or all of character appearance maps, file paths, and virtual archives.
- FIG. 3 is a schematic for explaining relations between a compressed file fi and files to be searched. “n” compressed files f1 to fn are compressed using a common Huffman tree and are expanded using the Huffman tree. The expanded file group to be searched is a file group that has described therein, for example, character strings such as those of a dictionary or a glossary. Each file is described in a computer readable language such as HyperText Markup Language (HTML) or Extensible Markup Language (XML). For a Japanese dictionary, the number of characters in one file may be 4,000 or more and the number of files “n” is on the order of n=4,000 to 6,000.
- For example, concerning a Japanese dictionary, when a compressed file f23 having a file number of i=23 is expanded, a file F23 to be searched is obtained as depicted in section (A) of FIG. 3; when a compressed file f158 having a file number of i=158 is expanded, a file F158 to be searched is obtained as depicted in section (B) of FIG. 3; and, when a compressed file f4971 having a file number of i=4971 is expanded, a file F4971 to be searched is obtained as depicted in section (C) of FIG. 3. In the files F23, F158, and F4971 to be searched, character strings within thick brackets are headwords.
- FIG. 4 is a schematic of the single-character appearance map M1. The single-character appearance map M1 includes a bit row for each character. Bits in the bit row are arranged sequentially according to bit number. A bit number “i” corresponds to a file number “i” of a compressed file. In the bit row, “1” indicates that a given character is present and “0” indicates that the given character is not present. For example, a bit number i=1 for a Hiragana character “” is “1” and therefore, the Hiragana character “” is present in a file that is to be searched and formed by an expansion of the compressed file f1. On the other hand, a bit number i=1 for a Kanji character “” is “0” and therefore, the Kanji character “” is not present in a file that is to be searched and formed by an expansion of the compressed file f1.
FIG. 5 is a schematic of the consecutive-character appearance map M2. Consecutive characters are a string of characters. In the example, two consecutive characters are exemplified; however, three or more consecutive characters may be employed. The map format is identical to that of the single-character appearance map M1 depicted in FIG. 4. For example, a bit number i=1 for consecutive numerals “99” is “0” and therefore, the consecutive numerals “99” are not present in a file that is to be searched and formed by an expansion of the compressed file f1. On the other hand, a bit number i=2 for the consecutive numerals “99” is “1” and therefore, the consecutive numerals “99” are present in a file that is to be searched and formed by an expansion of the compressed file f2. -
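The two map formats can be illustrated with a minimal sketch. It assumes each bit row is held as a Python integer whose bit i corresponds to file number i; the function names `build_appearance_maps` and `is_present` and the two sample texts are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict


def build_appearance_maps(files):
    """Build single-character and two-character appearance maps.

    `files` is a list of expanded file texts; the file at list index 0 is
    treated as file number 1, so bit number i corresponds to file number i.
    Each bit row is a Python int with bit i set when the character (or the
    pair of consecutive characters) occurs in file i.
    """
    single = defaultdict(int)
    consecutive = defaultdict(int)
    for i, text in enumerate(files, start=1):
        bit = 1 << i
        for ch in set(text):
            single[ch] |= bit
        for pair in {text[j:j + 2] for j in range(len(text) - 1)}:
            consecutive[pair] |= bit
    return dict(single), dict(consecutive)


def is_present(bit_row, i):
    """Return True when bit number i of the bit row is 1."""
    return (bit_row >> i) & 1 == 1


# Hypothetical sample data: "99" occurs only in file number 2.
m1, m2 = build_appearance_maps(["page 98", "page 99"])
print(is_present(m2["99"], 1), is_present(m2["99"], 2))
```

With this sample data the bit for “99” is 0 at bit number 1 and 1 at bit number 2, which mirrors the reading of the map described above.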
FIG. 6 is a schematic of the compression parameter 221. The compression parameter 221 is a table correlating characters/consecutive characters described in a file group to be searched and formed by expanding a compressed file group f, with the frequency of appearance of each character. According to the compression parameter 221, a Huffman tree is generated to compress the file group to be searched into the compressed file group f. -
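As a rough illustration of how a frequency table of this kind can yield a Huffman tree, the following sketch uses the standard heap-based construction. The function name `huffman_codes` and the frequency values are assumptions made for illustration; the embodiment does not specify the construction procedure or any concrete frequencies.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Return {symbol: bit string} for a {symbol: frequency} table.

    Symbols are assumed to be strings; internal nodes are (left, right) tuples.
    """
    if not frequencies:
        return {}
    tiebreak = count()  # unique counter so heap entries never compare nodes
    heap = [(freq, next(tiebreak), sym) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent nodes into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Hypothetical compression parameter: character -> frequency of appearance.
parameter = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
print(huffman_codes(parameter))
```

A character with a larger frequency of appearance receives a shorter code; in this sample, “a” receives a one-bit code while “e” and “f” receive four-bit codes.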
FIG. 7 is a schematic of the file path table 222. The file path table 222 includes description of a path (file path) to the compressed file fi. More specifically, for example, for each file ID, the table 222 correlates a file path to a compressed file fi, the headword described in the file that is to be searched and formed by expanding the compressed file fi, the address of the compressed file fi, and the size of the compressed file fi. A file ID is information that uniquely identifies a compressed file fi. - For explanatory purposes, a reference symbol allocated to a compressed file is a “file ID”. For example, a file path of a compressed file f23 having a file ID of “f23” is “honmon\file23.html” having a headword of “” (see
FIG. 3). When a resident flag described hereinafter is set to be “1”, a file path to the cache area 240 (for example, “cash\file23.html”) is written and, when the flag is returned to “0”, the file path to the cache area 240 is deleted. -
FIG. 8 is a schematic of the character appearance map linking table 211. The character appearance map linking table 211 includes the address, the size, the access frequency, and the resident flag for each bit in the bit row of each character in the character appearance map M. The “address” is an address indicative of the area in which a given compressed file corresponding to a given bit number in a bit row on the character appearance map M is stored. The “size” is the size of a given compressed file. For example, a compressed file that corresponds to the bit number “i” is the compressed file fi. The address at which this compressed file fi is stored is “adri” and the size of the compressed file fi is “si”. - The “access frequency” is the degree to which the compressed file fi corresponding to the bit number i is accessed and, in the embodiment, the “access frequency” is the number of accesses. In addition to “the number of accesses”, the access frequency may be represented by probability (the number of accesses of the compressed file fi/the total number of accesses of all the compressed files). The “resident flag” is a flag that indicates whether the compressed file fi corresponding to the bit number i resides in the
cache area 240 as a result of the compressed file fi having been moved from the storage area 230 of the archives 200 to the cache area 240 provided for the storage area 230. - When the compressed file fi resides in the
cache area 240, the resident flag is “1”. On the other hand, when the compressed file fi does not reside in the cache area 240 and is stored in the storage area 230, the resident flag is “0”. When the resident flag is set to be “1”, the address of the cache area 240 is written into the “Address” column and, when the resident flag is returned to “0”, the address of the cache area 240 is deleted. -
FIG. 9 is a schematic of the file path linking table 212. The file path linking table 212 is a table that links a file path and the bit number i. When the resident flag is set to be “1”, the file path to the cache area 240 is written and, when the resident flag is returned to “0”, the file path to the cache area 240 is deleted. -
FIG. 10 is a schematic of the virtual archive capacity table 213. The virtual archives include the compressed file group f′ that, among the archives 200 stored in the storage area 230, is stored in the cache area 240. Tables, etc., in the library area 201 and the management area 202 may be included in the virtual archives. -
FIG. 11 is a block diagram of a functional configuration of an information searching apparatus. An information searching apparatus 1100 includes a sorting processing unit 1101, a combining unit 1102, a writing unit 1103, a setting unit 1104, an input unit 1105, an identifying unit 1106, a reading unit 1107, an expanding unit 1108, a searching unit 1109, an output unit 1110, and an updating unit 1111. - Units including the
sorting processing unit 1101 to the input unit 1105 and the updating unit 1111 implement a virtual archive setting function. Units including the input unit 1105 to the output unit 1110 implement an information searching function. Because International Publication No. 2006-123448 describes the information searching function in detail, only the attributes characterizing the function according to the present embodiment are briefly described. - More specifically, functions of the sorting
processing unit 1101 to the updating unit 1111 are implemented by, for example, causing the CPU 101 to execute a program stored in the storage area 230 such as the ROM 102, the RAM 103, and the magnetic disc 105 depicted in FIG. 1, or by the I/F 109. - The sorting
processing unit 1101 has a function of sorting the compressed files in the character appearance map linking table 211 in descending order of the access frequency of each compressed file. This sorting is a process executed before setting the resident flag. - The combining
unit 1102 has a function of combining, in terms of size, the compressed files fi in descending order of access frequency after the sorting by the sorting processing unit 1101. More specifically, the combination is executed such that the storage capacity of the cache area 240 for the storage area 230 that stores therein the compressed file group f is not exceeded. For example, the compressed files are combined in descending order of the access frequency after the sorting by the sorting processing unit 1101 such that the combined size is the largest combined size that does not exceed the storage capacity of the cache area 240. By calculating the greatest combined value in this manner, the storage capacity of the cache area 240 can be fully utilized. - The
writing unit 1103 has a function of writing the compressed file group combined by the combining unit 1102 from the storage area 230 into the cache area 240, prior to a search of the file group. The compressed file group f to be written into the cache area 240 may be deleted from the storage area 230 or may remain in the storage area 230. In this manner, by writing the compressed file group having a high access frequency into the cache area 240, a faster file access speed can be achieved. - The
setting unit 1104 has a function of setting the resident flag for the compressed file group written into the cache area 240 by the writing unit 1103. More specifically, for example, the resident flag of the compressed file group written in the cache area 240 in the character appearance map linking table 211 is changed from “0” to “1”. When the resident flag is already “1”, the flag is not changed. For a compressed file that was written into the cache area 240 the previous time and is deleted this time, the resident flag that had been set to “1” is changed to “0”. Thereby, a compressed file having a high access frequency and residing in the cache area 240 can be identified. - The
input unit 1105 has a function of receiving input of a search character string. More specifically, the input unit 1105 receives a search character string input through the use of an input apparatus such as the keyboard depicted in FIG. 1. For example, the input unit 1105 receives input of a search character string such as “”. In addition to search character strings, the input unit 1105 may receive input of search conditions such as forward coincidence and reverse coincidence. - The identifying
unit 1106 has a function of identifying a compressed file that includes all the characters included in the search character string received by the input unit 1105. More specifically, by referring to the character appearance map M, the compressed file group is strictly selected and compressed files having therein all the characters constituting the search character string are obtained. For example, when the search character string is “”, the string is disassembled into single characters of “”, “”, “”, and “”. Logical multiplication is performed with respect to the bit rows of the single characters “”, “”, “”, and “” from the single-character appearance map M1, thereby narrowing down the compressed files to be searched to those corresponding to bit numbers for which the result of the logical multiplication is “1” (strict selection). -
FIG. 12 is a schematic of an example of strict selection of the compressed file fi using the character appearance map M. As depicted in FIG. 12, for each bit number among the bit rows for the characters “”, “”, “”, and “”, a logical product is computed by logical multiplication (AND). Thus, it is known that all of the single characters “”, “”, “”, and “” are included in the files compressed into the compressed files f1, f23, f158, and f4971, respectively corresponding to the bit numbers i=1, 23, 158, and 4971 and each having a logical product of “1”. - Although, at this stage, it is only known that the files include all of the single characters “”, “”, “”, and “”, and not whether the single characters “”, “”, “”, and “” are included sequentially as a character string, a compressed file that may include the search character string can be identified while the file is in a compressed format.
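The strict selection described above can be sketched as a bitwise AND over bit rows. This is an illustrative sketch only: the function name `strict_select`, the integer representation of bit rows (bit i for file number i), and the Latin sample characters are assumptions, not part of the embodiment.

```python
from functools import reduce


def strict_select(appearance_map, search_string, file_count):
    """Return the file numbers whose bit is 1 for every searched character.

    appearance_map maps a character to an int bit row in which bit i
    corresponds to file number i (1 <= i <= file_count).
    """
    all_files = ((1 << file_count) - 1) << 1  # bits 1..file_count set
    # A character missing from the map appears in no file (bit row 0).
    rows = [appearance_map.get(ch, 0) for ch in set(search_string)]
    hits = reduce(lambda acc, row: acc & row, rows, all_files)
    return [i for i in range(1, file_count + 1) if (hits >> i) & 1]


# Hypothetical map over three files: "c" in files 1 and 2, "a" in files
# 1 and 3, "t" in files 1, 2, and 3.
sample_map = {"c": 0b0110, "a": 0b1010, "t": 0b1110}
print(strict_select(sample_map, "cat", 3))
```

Only file number 1 survives the AND in this sample. As noted above, a selected file is merely a candidate; whether the characters occur consecutively is determined only after expansion.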
- The
reading unit 1107 has a function of reading, from an area based on the resident flag set by the setting unit 1104, the compressed file identified by the identifying unit 1106. More specifically, for example, by referencing the resident flag in the character appearance map linking table 211, the area storing therein the identified compressed file is identified based on the value of the resident flag. - When the resident flag is “0”, the compressed file is not stored in the
cache area 240 and is read from the storage area 230 of the archives 200. In an opening process based on a common file name, the file path table 222 is referenced and the corresponding compressed file fi is opened based on the head address and the size of the file ID matched by a binary search. With archives that have many compressed files stored therein, the time for the opening process becomes long. - On the other hand, in an opening process based on the bit number in the character appearance map, the
character appearance map linking table 211 is referenced and the head address and the size corresponding to the bit number can be obtained. Thus, the corresponding compressed file fi can be accessed at a high speed. When the resident flag is “1”, it is known that the file is stored in the cache area 240. Hence, the compressed file fi is accessible from the cache area 240, thereby further increasing the speed. - The expanding
unit 1108 has a function of expanding a compressed file read by the reading unit 1107. More specifically, for example, the read compressed file fi is expanded using the Huffman tree generated based on the compression parameter 221. Consequently, only the read compressed file fi has to be expanded and therefore, the speed of file accesses can be increased. - The searching
unit 1109 has a function of searching the file expanded by the expanding unit 1108 for a character string that coincides with or is related to a search character string. More specifically, for example, a file to be searched having therein a character string that coincides with the search character string is extracted from the file group that is to be searched and whose files have been expanded. A file to be searched having a character string that includes the search character string in forward coincidence or reverse coincidence is extracted as a related file to be searched. In addition, when a character string co-occurring with the search character string is set, the file to be searched including the co-occurring character string is extracted as a related file to be searched. - The
output unit 1110 has a function of outputting a file to be searched that has been expanded by the expanding unit 1108. More specifically, the form of output from the output unit 1110 may be, for example, display on a display, printing by a printer, transmission to another computer, or storage in the storage area 230 of the information searching apparatus 1100. When the output is displayed on a display, the expanded files to be searched may be displayed. Alternatively, the names of the expanded files to be searched may be displayed in a list, a user may select the name of one of the expanded files to be searched, and the linked file to be searched may be read and displayed on a screen. - When the search is executed by the searching
unit 1109, the retrieved file to be searched may be displayed. Alternatively, the names of the retrieved files to be searched may be displayed in a list and a user may select the name of one of the files to be searched and the linked file to be searched may be read and displayed on a screen. - The
updating unit 1111 has a function of updating the access frequency of the compressed file when the compressed file is expanded by the expanding unit 1108. More specifically, for example, when the access frequency is expressed by the number of accesses, one is added to the number of accesses of the compressed file fi that has been expanded. When the access frequency is expressed by probability, one is added to the number of accesses of the expanded file and one is also added to the total number of accesses made to the compressed files f1 to fn. - On the other hand, for the compressed file fi that is not expanded, only the total number of accesses made to the compressed files f1 to fn is incremented by one. Therefore, the sorting
processing unit 1101 executes the sorting process according to the access frequency of the compressed file group f after the frequency is updated. Thus, the access frequency of the compressed file fi that tends to be strictly selected based on the character appearance map M increases, thereby enabling a faster expansion speed to be realized at subsequent expansions. - The
updating unit 1111 may update the access frequency for the compressed file fi retrieved by the searching unit 1109 and not for the compressed file fi that has been expanded by the expanding unit 1108. More specifically, for example, when the access frequency is expressed by the number of accesses, one is added to the number of accesses of the compressed file fi including a file to be searched that has been retrieved. When the access frequency is expressed by probability, one is added to the number of accesses of the compressed file fi including a file to be searched that has been retrieved, and one is also added to the total number of accesses made to the compressed files f1 to fn. - On the other hand, for the compressed file fi of the files to be searched that is not retrieved, one is added only to the total number of accesses made to the compressed files f1 to fn. Therefore, the sorting
processing unit 1101 executes the sorting process according to the access frequency of the compressed file group after the frequency is updated. Thereby, the access frequency of the compressed file fi that is actually searched is increased and, therefore a faster searching speed can be realized at subsequent searches. -
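The two ways of expressing the access frequency can be sketched as follows. The class name `AccessStats` and its methods are hypothetical, and the sketch assumes one reading of the description: each access adds one to the count of the accessed file and one to a total shared by all compressed files, so that a file that is not accessed sees only the total grow.

```python
from dataclasses import dataclass, field


@dataclass
class AccessStats:
    """Access counts per file number plus a total shared by all files."""
    counts: dict = field(default_factory=dict)  # file number -> accesses
    total: int = 0

    def record(self, accessed_files):
        """Record the files expanded (or retrieved) in one search."""
        for i in accessed_files:
            self.counts[i] = self.counts.get(i, 0) + 1
            self.total += 1  # the denominator grows for every file

    def probability(self, i):
        """Accesses of file i divided by the accesses of all files."""
        return self.counts.get(i, 0) / self.total if self.total else 0.0

    def ranked(self):
        """File numbers in descending order of access frequency."""
        return sorted(self.counts, key=self.counts.get, reverse=True)


stats = AccessStats()
stats.record([1, 23])   # first search touches f1 and f23
stats.record([23])      # second search touches f23 only
print(stats.counts, stats.probability(23), stats.ranked())
```

Sorting by count and sorting by probability give the same order here, because every file shares the same denominator.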
FIG. 13 is a flowchart of a virtual archives setting process executed by a virtual archive setting function of the information searching apparatus 1100. The sorting processing unit 1101 sorts the compressed files in the character appearance map linking table 211 in descending order of access frequency (step S1301). Here, a sort position “k” after the sorting is set to be k=1 (step S1302) and the combining unit 1102 calculates the total size of the compressed files having the sort positions 1 to k+1 (step S1303). Whether the total size “s(1_k+1)” satisfies s(1_k+1)>Ts is judged (step S1304). In this example, “Ts” is the maximum storage capacity of the cache area 240. - When s(1_k+1)>Ts is not satisfied (step S1304: NO), k is incremented (step S1305) and the procedure returns to step S1303. On the other hand, when s(1_k+1)>Ts is satisfied (step S1304: YES), because no more compressed files can be stored in the
cache area 240, the virtual archive capacity table 213 is updated such that the bit numbers, the access frequencies, and the sizes are those of the compressed files having the sort positions 1 to k+1 (step S1306). - The
writing unit 1103 writes the compressed files having the sort positions 1 to k into the cache area 240 (step S1307). In the example, the compressed files having sort positions after k are deleted from the cache area 240. Subsequently, the setting unit 1104 sets the resident flags of the compressed files having the sort positions 1 to k in the character appearance map linking table 211 to be “ON” (from “0” to “1”) (step S1308). - For each of the compressed files having sort positions after k, the resident flag is set to be “OFF” (from “1” to “0”), ending a series of processing. According to the virtual archives setting process, prior to a search, compressed files each having a high access frequency can be set preferentially as the virtual archives and therefore, a faster file accessing speed can be realized.
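The virtual archives setting process can be sketched roughly as below. The function name `select_resident_files` and the tuple layout are hypothetical. Unlike the flowchart, which starts its size check at sort positions 1 to k+1, this sketch also checks that the first file fits in the cache; that check is an assumption added for safety.

```python
def select_resident_files(entries, cache_capacity):
    """Pick the highest-frequency prefix of files that fits in the cache.

    entries: iterable of (bit_number, size, access_count) tuples.
    Returns (resident_flags, used_capacity), where resident_flags maps
    each bit number to 1 (resident in the cache) or 0.
    """
    # Step S1301: sort in descending order of access frequency.
    ranked = sorted(entries, key=lambda e: e[2], reverse=True)
    resident, used = set(), 0
    for bit_number, size, _count in ranked:
        # Step S1304: stop at the first file that would exceed Ts.
        if used + size > cache_capacity:
            break
        resident.add(bit_number)
        used += size
    # Step S1308: flag is 1 for resident files and 0 for all others.
    flags = {b: int(b in resident) for b, _size, _count in ranked}
    return flags, used


# Hypothetical table rows: (bit number, size, access count); Ts = 100.
rows = [(1, 40, 5), (2, 30, 9), (3, 50, 7), (4, 10, 1)]
print(select_resident_files(rows, 100))
```

In this sample, files 2 and 3 become resident (80 units used). File 4 would still fit, but it is not added because the selection stops at the first sort position that overflows the capacity, as in the flowchart.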
-
FIG. 14 is a flowchart of an information search process executed by an information searching function of the information searching apparatus 1100. The input unit 1105 receives input of a search character string (step S1401). The search character string is disassembled into single characters or consecutive characters (hereinafter, “character”) (step S1402). The bit row for each disassembled character is extracted from the character appearance map M (step S1403) and for each bit number among the extracted bit rows, a logical product is computed by logical multiplication (step S1404). - The compressed files fi having logical products of “1” as a result of the computing are identified as compressed files that include the disassembled characters (step S1405). Subsequently, whether unprocessed compressed files fi among the identified compressed files fi are present is judged (step S1406). When an unprocessed compressed file fi is present (step S1406: YES), an unprocessed compressed file fi is selected (step S1407) and whether the resident flag is “ON” for the selected compressed file fi is judged (step S1408).
- When the resident flag is “ON” (step S1408: YES), the
reading unit 1107 transfers the selected compressed file fi directly from the cache area 240 to a register of the CPU 101 (step S1409) and the procedure advances to step S1411. On the other hand, when the resident flag is “OFF” (step S1408: NO), the reading unit 1107 reads the selected compressed file fi from the storage area 230 of the archives 200 to the cache area 240 and causes the CPU 101 to read this file (step S1410), and the procedure advances to step S1411. At step S1411, the expanding unit 1108 executes an expansion process using the Huffman tree based on the compression parameter 221 and the procedure returns to step S1406. - At step S1406, when no unprocessed compressed files fi are present (step S1406: NO), the searching
unit 1109 searches the expanded files using the search character string (step S1412). The output unit 1110 outputs the result of the search (step S1413). Subsequently, the updating unit 1111 adds one to the access frequency of the corresponding compressed file fi in the character appearance map linking table 211 (step S1414), and a series of processing ends. - According to the information search process, the compressed file fi whose resident flag is set to be “ON” (“1”) is read from the
cache area 240 and the expansion process is executed. Therefore, a faster file accessing speed can be realized. Because the access frequency for the compressed file fi is updated each time a search is executed, the compressed file fi written in the cache area 240 can be updated one by one. Therefore, a faster file accessing speed can be realized at subsequent accesses. - As described above, according to the first embodiment, in accessing the compressed file fi in the archives, a faster speed can be achieved by using the character appearance map linking table 211 based on the bit number in the character appearance map. Files can be accessed in less time by placing files having a high access frequency in the cache memory. Therefore, expansion can be completed in significantly less time and a faster search speed can be achieved. Memory can be saved by effectively using the cache area. The
information searching apparatus 1100 of the first embodiment is applicable to a portable terminal such as a portable telephone, a portable game apparatus, and an electronic dictionary in addition to a personal computer and a search server. - A second embodiment will be described. For a site search on the Internet, updating of each site is regularly monitored; a large-scale index is generated based on the summarized data to which morphological analysis is executed; and a full text search is executed. With respect to increases in the amount of data of a site, conventionally, increasing the speed of the monitoring process for each site and increasing throughput, and the scalability of searches by multiple computers are problems.
- With respect to such problems, the second embodiment realizes faster speeds of addition, merger, and deletion of the
archives 200. For the scalability concerning grid computers, etc., the second embodiment realizes increased efficiency of the searching speed by dividing a search among slave servers and executing parallel processing, and by substantially equalizing the operating rate of each slave server. -
FIG. 15 is a schematic of a system configuration of a searching system according to the second embodiment. A searching system 1500 includes a master server 1501 and slave servers 1502-1 to 1502-N. The master server 1501 and each of the slave servers 1502-1 to 1502-N, or the slave servers 1502-1 to 1502-N among themselves, are mutually communicable through the network 114. The master server 1501 supervises and manages the slave servers 1502-1 to 1502-N. Each slave server 1502-I corresponds to the information searching apparatus 1100 of the first embodiment and has the virtual archive setting function and the information searching function. - The type of the
archives 200 included in each slave server 1502-I (I=1 to N) differs. For example, archives 200-I retained by a slave server 1502-I are archives of a Japanese dictionary; archives 200-J (J≠I) retained by a slave server 1502-J are archives of a glossary; and archives 200-K (K≠I, J) retained by a slave server 1502-K are archives of an English-Japanese dictionary, and similarly, the types and the publishing companies differ among the sets of archives. - Because each slave server 1502-I has different archives 200-I, the compression parameter 221-I in the archives 200-I also differs among the slave servers 1502-I. Therefore, a Huffman tree h-I retained in each slave server 1502-I has a structure that also differs.
- A multi-book search that is referred to as “meta-search” is executed with respect to the
slave server group 1502 above by providing a common search keyword from the master server 1501. Each slave server 1502-I returns the search result to the master server 1501 and thereby, the master server 1501 is able to obtain a search result from multiple dictionaries. Hereinafter, for simplicity of description, it is assumed that the number of the slave servers 1502-I is two (N=2). -
FIG. 16 is a schematic for describing a sharing of archives. To substantially equalize the search processes in each of the slave servers 1502-1 and 1502-2, the master server 1501 collects the sets of archives 200-1 and 200-2 of the slave servers 1502-1 and 1502-2, and Huffman trees h-1 and h-2 through the network 114. Integrated archives A formed by aggregating the sets of archives 200-1 and 200-2, and a common Huffman tree formed by making the Huffman trees h-1 and h-2 in archives 200-1 and 200-2 common are generated. -
FIG. 17 is a schematic for describing an allocation process for new archives. The master server 1501 divides the integrated archives A and transmits the divided sets of archives to the slave servers 1502-1 and 1502-2 as sets of new archives A1 and A2 respectively specific to slave servers 1502-1 and 1502-2 such that the search processes of the slave servers 1502-1 and 1502-2 are substantially equalized. New common Huffman trees H1 and H2 are transmitted respectively to the slave servers 1502-1 and 1502-2. In the example, one set of archives is allocated to one slave server and therefore, the common Huffman tree H is transmitted to the slave servers 1502-1 and 1502-2. However, when plural sets of archives and respective Huffman trees are present for each slave server, a common Huffman tree specific to each slave server is transmitted to the slave server. For example, when the archives 200-1 and 200-2 and the Huffman trees h-1 and h-2 are present in the slave server 1502-1, the common Huffman tree H is transmitted to the slave server 1502-1. -
FIG. 18 is a schematic of a compression symbol table and the compression parameter 221 of the archives 200-1. In FIG. 18, for simplicity of description, characters “a” to “f” are described in a file group that is to be searched, is compressed, and is in the archives 200-1. In FIG. 18, section (A) depicts a compression symbol table 1800 of the archives 200-1 and section (B) depicts a compression parameter P1 of the archives 200-1. In the compression symbol table 1800, a shorter compression symbol is allocated to a character having a larger frequency of appearance. -
FIG. 19 is a schematic of the Huffman tree h-1 generated from the compression symbol table 1800 of the archives 200-1. In FIG. 19, a circle is a node. The highest sort position node is referred to as “root R” and other nodes are referred to as “internal nodes”. A square is a leaf. A line connecting nodes or a node and a leaf is a branch. A character depicted within a leaf is a character obtained after expansion. A character string depicted below a leaf is the compression symbol allocated to the character obtained after expansion indicated in the leaf. -
FIG. 20 is a schematic of a compression symbol table and the compression parameter 221 of the archives 200-2. In FIG. 20, for simplicity of description, characters “a” to “f” are described in a file group that is to be searched, is compressed, and is in the archives 200-2. In FIG. 20, section (A) depicts a compression symbol table 2000 of the archives 200-2 and section (B) depicts a compression parameter P2 of the archives 200-2. In the compression symbol table 2000, a shorter compression symbol is allocated to a character having a larger frequency of appearance. FIG. 21 is a schematic of the Huffman tree h-2 generated from the compression symbol table 2000 of the archives 200-2. -
FIG. 22 is a schematic of a compression symbol table and the compression parameter 221 of the integrated archives A. Because the integrated archives A are an integration of the archives 200-1 and 200-2, characters “a” to “f” are described in the files that are to be searched, are compressed, and are included in the integrated archives A. Therefore, the frequency of appearance of the common compression parameter P depicted in FIG. 22 is a value obtained by summing, for each character, the frequency of appearance of the compression parameter P1 of the archives 200-1 and of the compression parameter P2 of the archives 200-2. FIG. 23 is a schematic of the common Huffman tree H generated from the compression symbol table of the integrated archives A. - Reconfiguration of the archives in the second embodiment will be described. The integrated archives A are generated by integrating the archives 200-1 and the archives 200-2. The stored contents of the archives 200-1 will be described.
-
FIG. 24 is a schematic of the stored contents of the archives 200-1. The archives 200-1 are stored in the storage area 230 such as the RAM 103 or the magnetic disc 105 depicted in FIG. 1. The archives 200-1 include the library area 201, the management area 202, and the data area 203. The library area 201 stores therein a character appearance map linking table 211a, a file path linking table 212a, and a virtual archive capacity table 213a. The management area 202 stores therein the compression parameter P1, a file path table 222a, and a character appearance map Ma (including a single-character appearance map Ma1 and a consecutive-character appearance map Ma2). The data area 203 stores therein a compressed file group fa (compressed files fa_1 to fa_n) as depicted in FIG. 3. Descriptions of these components are identical to those described in the first embodiment. -
FIG. 25 is a schematic of the single-character appearance map Ma1 of the archives 200-1. FIG. 26 is a schematic of the consecutive-character appearance map Ma2 of the archives 200-1. In the character appearance map Ma of the archives 200-1, for convenience, to distinguish bit numbers in the archives 200-1 from the bit numbers in the archives 200-2, the bit numbers in the archives 200-1 are indicated as a_1 to a_n. FIG. 27 is a schematic of the compression parameter P1 of the archives 200-1. FIG. 28 is a schematic of a file path table 222a of the archives 200-1. FIG. 29 is a schematic of a character appearance map linking table 211a of the archives 200-1. FIG. 30 is a schematic of a file path linking table 212a of the archives 200-1. -
FIG. 31 is a schematic of the stored contents of the archives 200-2. The archives 200-2 are stored in the storage area 230 such as the RAM 103 or the magnetic disc 105 depicted in FIG. 1. The archives 200-2 include the library area 201, the management area 202, and the data area 203. The library area 201 stores therein a character appearance map linking table 211b, a file path linking table 212b, and a virtual archive capacity table 213b. The management area 202 stores therein the compression parameter P2, a file path table 222b, and a character appearance map Mb (including a single-character appearance map Mb1 and a consecutive-character appearance map Mb2). The data area 203 stores therein a compressed file group fb (compressed files fb_1 to fb_m) as depicted in FIG. 3. Descriptions of these components are identical to those described in the first embodiment. -
FIG. 32 is a schematic of the single-character appearance map Mb1 of the archives 200-2.FIG. 33 is a schematic of the consecutive-character appearance map Mb2 of the archives 200-2. In the character appearance map Mb of the archives 200-2, for convenience, to distinguish the bit numbers in the archives 200-2 from those in the archives 200-1, the bit numbers in the archives 200-2 are indicated as b_1 to b_m.FIG. 34 is a schematic of the compression parameter P2 of the archives 200-2.FIG. 35 is a schematic of a file path table 222 b of the archives 200-2.FIG. 36 is a schematic of a character appearance map linking table 211 b of the archives 200-2.FIG. 37 is a schematic of a file path linking table 212 b of the archives 200-2. -
FIG. 38 is a schematic for explaining an example of common parameter generation. The common compression parameter P is generated by summing, for each of the characters, the frequency of appearance of the compression parameter P1 of the archives 200-1 and of the compression parameter P2 of the archives 200-2. -
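The per-character summation can be illustrated with a minimal sketch. It assumes that each compression parameter is representable as a mapping from a character to its frequency of appearance; the function name and the sample frequencies are hypothetical and are not taken from FIG. 38.

```python
from collections import Counter

def common_compression_parameter(*params):
    """Sum, for each character, the appearance frequencies of all given parameters."""
    total = Counter()
    for param in params:
        total.update(param)  # Counter.update adds counts instead of replacing them
    return dict(total)

# Hypothetical compression parameters P1 and P2 (character -> frequency of appearance).
p1 = {"a": 5, "b": 3, "c": 1}
p2 = {"a": 2, "c": 4, "d": 6}

p = common_compression_parameter(p1, p2)
print(p)
```

A character that appears in only one of the archives simply keeps its single frequency, so the common parameter covers the union of the characters of both archives.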
FIG. 39 is a schematic for explaining a reconfiguration of the character appearance map linking tables 211 a and 211 b. The character appearance map linking table 211 a of the archives 200-1 and the character appearance map linking table 211 b of the archives 200-2 are integrated and the items in the tables 211 a and 211 b are sorted in descending order of access frequency. A character appearance map linking table 3900 obtained after the integration includes access frequencies for n+m bit numbers, respectively. Subsequently, the access frequencies are allocated to the slave servers 1502-1 and 1502-2 such that the totals of the access frequencies are substantially equivalent between the slave servers 1502-1 and 1502-2. - A new character appearance map linking table 3900 a is generated by allocating, to the slave server 1502-1, the access frequencies of the bit numbers whose sort positions in descending order of access frequency are odd numbered. A new character appearance map linking table 3900 b is generated by allocating, to the slave server 1502-2, the access frequencies of the bit numbers whose sort positions in descending order of access frequency are even numbered. In this example, the sort positions are divided into odd-numbered and even-numbered sort positions as the method of allocation. However, the sort positions one, four, five, eight, nine, etc., may be allocated to the slave server 1502-1 while the sort positions two, three, six, seven, ten, etc., may be allocated to the slave server 1502-2. Further, any allocation method may be employed as long as the sort positions (or the access frequencies) are allocated such that the totals of the allocated sort positions (or the allocated access frequencies) are substantially equivalent.
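The two allocation methods mentioned for FIG. 39 can be sketched as follows. The sketch assumes that a linking table is a mapping from a bit number to an access frequency; the function names and the sample frequencies are hypothetical.

```python
def integrate_and_sort(table_a, table_b):
    """Merge two linking tables and sort the records in descending order of access frequency."""
    merged = list(table_a.items()) + list(table_b.items())
    return sorted(merged, key=lambda rec: rec[1], reverse=True)

def allocate_odd_even(sorted_records):
    """Odd-numbered sort positions go to the first server, even-numbered to the second."""
    return sorted_records[0::2], sorted_records[1::2]

def allocate_alternating_pairs(sorted_records):
    """Sort positions 1, 4, 5, 8, 9, ... go to the first server; 2, 3, 6, 7, 10, ... to the second."""
    first, second = [], []
    for pos, rec in enumerate(sorted_records, start=1):
        (first if pos % 4 in (0, 1) else second).append(rec)
    return first, second

def total(records):
    return sum(freq for _, freq in records)

# Hypothetical access frequencies for bit numbers a_1..a_3 and b_1..b_3.
table_211a = {"a_1": 40, "a_2": 10, "a_3": 25}
table_211b = {"b_1": 30, "b_2": 20, "b_3": 5}

table_3900 = integrate_and_sort(table_211a, table_211b)
odd_even = allocate_odd_even(table_3900)
pairs = allocate_alternating_pairs(table_3900)
print([total(t) for t in odd_even], [total(t) for t in pairs])
```

With the strictly decreasing frequencies assumed here, the odd/even split always favors the first server, whereas the alternating-pair pattern narrows the difference between the two totals. Which pattern balances better in practice depends on the actual distribution of access frequencies.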
-
FIG. 40 is a schematic for explaining reconfiguration of the file path linking tables 212 a and 212 b. The file path linking tables 212 a and 212 b respectively of the sets of archives 200-1 and 200-2 are integrated. A file path linking table 4000 obtained after the integration includes file paths for a total of n+m bit numbers. Subsequently, the bit numbers are allocated according to the allocation method employed for the character appearance map linking table 3900. - Thus, for the slave server 1502-1, a file path linking table 4000 a for the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502-1 is obtained. Similarly, for the slave server 1502-2, a file path linking table 4000 b for the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502-2 is obtained.
-
FIG. 41 is a schematic for explaining reconfiguration of the file path tables 222 a and 222 b. The file path tables 222 a and 222 b respectively of the sets of archives 200-1 and 200-2 are integrated. A file path table 4100 obtained after the integration has file paths for a total of n+m file IDs. The file IDs corresponding to the bit numbers are allocated according to the allocation method employed for the character appearance map linking table 3900. - Thus, for the slave server 1502-1, a file path table 4100 a for the file IDs corresponding to the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502-1 is obtained. Similarly, for the slave server 1502-2, a file path table 4100 b for the file IDs corresponding to the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502-2 is obtained.
-
FIG. 42 is a schematic for explaining reconfiguration of the single-character appearance maps Ma1 and Mb1. The single-character appearance maps Ma1 and Mb1 respectively of the sets of archives 200-1 and 200-2 are integrated. A single-character appearance map Mab1 obtained after the integration has a bit row including a total of n+m bits for each character. Subsequently, the bit numbers are allocated according to the allocation method employed for the character appearance map linking table 3900. - Thus, for the slave server 1502-1, a single-character appearance map MA1 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502-1 is obtained. Similarly, for the slave server 1502-2, a single-character appearance map MB1 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502-2 is obtained.
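The integration and subsequent division of an appearance map can be sketched as follows. The sketch assumes that an appearance map is a mapping from a character to a bit row with one bit per file, and that a character absent from one archive contributes a row of zeros for that archive; both the representation and the column assignment are hypothetical.

```python
def integrate_maps(map_a, map_b, n, m):
    """Concatenate the n-bit rows of map_a and the m-bit rows of map_b into n+m-bit rows."""
    chars = sorted(set(map_a) | set(map_b))
    return {ch: map_a.get(ch, [0] * n) + map_b.get(ch, [0] * m) for ch in chars}

def split_map(integrated, columns):
    """Keep only the bit columns (bit numbers) allocated to one slave server."""
    return {ch: [row[i] for i in columns] for ch, row in integrated.items()}

# Hypothetical single-character appearance maps: n = 3 files in 200-1, m = 2 files in 200-2.
Ma1 = {"a": [1, 0, 1], "b": [0, 1, 1]}
Mb1 = {"a": [1, 1], "c": [0, 1]}

Mab1 = integrate_maps(Ma1, Mb1, n=3, m=2)

# Hypothetical allocation: columns 0, 3, 4 to slave server 1502-1, columns 1, 2 to 1502-2.
MA1 = split_map(Mab1, [0, 3, 4])
MB1 = split_map(Mab1, [1, 2])
print(Mab1, MA1, MB1)
```

The same column lists would be reused for the consecutive-character appearance map, the file path linking table, and the file path table so that every structure on a given slave server refers to the same set of files.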
-
FIG. 43 is a schematic for explaining reconfiguration of the consecutive-character appearance maps Ma2 and Mb2. The consecutive-character appearance maps Ma2 and Mb2 respectively of the sets of archives 200-1 and 200-2 are integrated. A consecutive-character appearance map Mab2 obtained after the integration has a bit string including n+m bits in total for each character. Subsequently, the bit numbers are allocated according to the allocation method employed for the character appearance map linking table 3900. - Thus, for the slave server 1502-1, a consecutive-character appearance map MA2 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502-1 is obtained. Similarly, for the slave server 1502-2, a consecutive-character appearance map MB2 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502-2 is obtained.
-
FIG. 44 is a schematic for explaining reconfiguration of the compressed file groups fa and fb. The compressed file group fa of the archives 200-1 is expanded using the Huffman tree h-1 corresponding thereto. Thus, a file group Fa to be searched is obtained. Similarly, the compressed file group fb of the archives 200-2 is expanded using the Huffman tree h-2 corresponding thereto. Thus, a file group Fb to be searched is obtained. - The file group Fa to be searched is recompressed using the common Huffman tree H. Thus, a compressed file group ga is obtained. Similarly, the file group Fb to be searched is recompressed using the common Huffman tree H. Thus, a compressed file group gb is obtained.
- Subsequently, the compressed file groups ga and gb that have been recompressed are integrated. The files are sorted in descending order of access frequency according to the allocation method employed for the character appearance map linking table 3900. Thus, an integrated compressed file group “g” in descending order of access frequency is obtained.
- Hence, a compressed file group g1 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 a allocated to the slave server 1502-1 is allocated to the slave server 1502-1. Similarly, a compressed file group g2 for the same bit numbers as the bit numbers in the character appearance map linking table 3900 b allocated to the slave server 1502-2 is allocated to the slave server 1502-2.
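The expansion and recompression of FIG. 44 can be sketched as follows. The sketch substitutes the textbook Huffman construction (repeatedly merging the two least frequent subtrees) for the tree generation of the embodiments, models compressed data as strings of "0" and "1" characters, and uses hypothetical frequencies and file contents.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(freq):
    """Return {character: bit string} using the standard Huffman construction."""
    tiebreak = count()  # unique counter so that the dicts in the heap are never compared
    heap = [(f, next(tiebreak), {ch: ""}) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {ch: "0" for ch in heap[0][2]}
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c0.items()}
        merged.update({ch: "1" + code for ch, code in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

def compress(text, code):
    return "".join(code[ch] for ch in text)

def expand(bits, code):
    """Decode a bit string; greedy matching is valid because the code is prefix-free."""
    decode = {symbol: ch for ch, symbol in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in decode:
            out.append(decode[buf])
            buf = ""
    return "".join(out)

# Hypothetical compression parameters and file contents for the two sets of archives.
p1, p2 = {"a": 5, "b": 3, "c": 1}, {"a": 2, "c": 4, "d": 6}
h1, h2 = huffman_code(p1), huffman_code(p2)   # Huffman trees h-1 and h-2
H = huffman_code(Counter(p1) + Counter(p2))   # common Huffman tree H

text_a, text_b = "abacab", "dadcdc"
fa, fb = compress(text_a, h1), compress(text_b, h2)    # compressed files fa, fb
Fa, Fb = expand(fa, h1), expand(fb, h2)                # expansion (first pass)
ga, gb = compress(Fa, H), compress(Fb, H)              # recompression (second pass)
print(ga, gb)
```

After this step both ga and gb can be expanded with the single tree H, which is what allows the integrated file group to be divided freely between the slave servers.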
-
FIG. 45 is a schematic of the stored contents of the new archives A1. The new archives A1 are transmitted to the slave server 1502-1. The new archives A1 store therein the common compression parameter P depicted in FIG. 38, the character appearance map linking table 3900 a after the reconfiguration depicted in FIG. 39, the file path linking table 4000 a after the reconfiguration depicted in FIG. 40, the file path table 4100 a after the reconfiguration depicted in FIG. 41, the single-character appearance map MA1 after the reconfiguration depicted in FIG. 42, the consecutive-character appearance map MA2 after the reconfiguration depicted in FIG. 43, and the compressed file group g1 after the reconfiguration depicted in FIG. 44. -
FIG. 46 is a schematic of the stored contents of the new archives A2. The new archives A2 are transmitted to the slave server 1502-2. The new archives A2 store therein the common compression parameter P depicted in FIG. 38, the character appearance map linking table 3900 b after the reconfiguration depicted in FIG. 39, the file path linking table 4000 b after the reconfiguration depicted in FIG. 40, the file path table 4100 b after the reconfiguration depicted in FIG. 41, the single-character appearance map MB1 after the reconfiguration depicted in FIG. 42, the consecutive-character appearance map MB2 after the reconfiguration depicted in FIG. 43, and the compressed file group g2 after the reconfiguration depicted in FIG. 44. -
FIG. 47 is a block diagram of a functional configuration of the master server 1501 (information managing apparatus). The master server 1501 includes a receiving unit 4701, a common compression parameter generating unit 4702, a common Huffman tree generating unit 4703, an expanding unit 4704, a compressing unit 4705, a reconfiguring unit 4706, and a transmitting unit 4707. Functions of the units from the receiving unit 4701 to the transmitting unit 4707 are implemented by, for example, causing the CPU 101 to execute a program stored in the storage area such as the ROM 102, the RAM 103, and the magnetic disc 105 depicted in FIG. 1, or by the I/F 109. - The receiving
unit 4701 has a function of receiving data transmitted from the slave servers 1502-I. More specifically, for example, the receiving unit 4701 receives the sets of archives 200-1 to 200-N and the Huffman trees h-1 to h-N from the slave servers 1502-1 to 1502-N. - The common compression
parameter generating unit 4702 has a function of generating the common compression parameter P for all sets of archives 200-1 to 200-N. More specifically, for example, as depicted in FIG. 38, the compression parameters P1 and P2 included in the sets of archives 200-1 and 200-2 received from the slave servers 1502-1 and 1502-2 are extracted. The common compression parameter P is generated by summing, for each character, the frequency of appearance of the extracted compression parameters P1 and P2. The generated common compression parameter P is transmitted to the common Huffman tree generating unit 4703 and the archives generating unit 4713. - The common Huffman
tree generating unit 4703 has a function of generating the common Huffman tree H for all sets of archives 200-1 to 200-N. More specifically, for example, the common Huffman tree generating unit 4703 generates the common Huffman tree H by allocating “0” and “1” to characters in descending order of the frequency of appearance of the common compression parameter P using the binary search (see FIGS. 22 and 23). The generated common Huffman tree H is transmitted to the expanding unit 4704 and the archives generating unit 4713. - The expanding
unit 4704 has a function of expanding, for each set of archives 200-I, the compressed file group included in the archives 200-I. The Huffman tree used in the expansion process is the Huffman tree h-I transmitted together with the archives 200-I. For example, as depicted in FIG. 44, the file group Fa to be searched is obtained by expanding the compressed file group fa using the Huffman tree h-1 that is used for the compression of the compressed file group fa. Similarly, the file group Fb to be searched is obtained by expanding the compressed file group fb using the Huffman tree h-2 that is used for the compression of the compressed file group fb. - The
compressing unit 4705 has a function of recompressing the file group to be searched that has been expanded by the expanding unit 4704. The Huffman tree used for the recompression is the common Huffman tree H. For example, as depicted in FIG. 44, the compressed file group ga is obtained by recompressing the file group Fa using the common Huffman tree H. Similarly, the compressed file group gb is obtained by recompressing the file group Fb using the same common Huffman tree H. The compressed file groups ga and gb are integrated by an integrating unit 4711. - The
reconfiguring unit 4706 has a function of reconfiguring each set of received archives 200-I and each Huffman tree h-I. The reconfiguring unit 4706 includes the integrating unit 4711, an allocating unit 4712, and an archives generating unit 4713. The integrating unit 4711 has a function of integrating the data in each set of archives 200-I. - More specifically, for example, as depicted in
FIGS. 39 to 43, the integrating unit 4711 integrates, respectively for the sets of archives 200-1 and 200-2: the character appearance map linking tables 211 a and 211 b; the file path linking tables 212 a and 212 b; the file path tables 222 a and 222 b; the single-character appearance maps Ma1 and Mb1; and the consecutive-character appearance maps Ma2 and Mb2. The integrating unit 4711 thereby obtains the character appearance map linking table 3900 after the integration, the file path linking table 4000 after the integration, the file path table 4100 after the integration, the single-character appearance map Mab1 after the integration, and the consecutive-character appearance map Mab2 after the integration. - As depicted in
FIG. 44, the integrating unit 4711 integrates the compressed file groups ga and gb that are recompressed by the compressing unit 4705 respectively for the sets of archives 200-1 and 200-2, and thereby obtains the compressed file group g after the integration. - The allocating
unit 4712 has a function of allocating to each slave server 1502-I, the data integrated by the integrating unit 4711 such that the access frequency after the allocation is equivalent in each slave server 1502-I for each set of archives. - More specifically, for example, as depicted in
FIG. 39, the records of the character appearance map linking table 3900 after the integration are allocated such that the access frequencies or the sort positions thereof are substantially equivalent, and thereby, the character appearance map linking tables 3900 a and 3900 b reconfigured respectively for the slave servers 1502-1 and 1502-2 are obtained. - Descriptions of the file path linking table 4000 after the integration, the file path table 4100 after the integration, the single-character appearance map Mab1 after the integration, and the consecutive-character appearance map Mab2 are identical to those given with respect to
FIGS. 40 to 43. As depicted in FIG. 44, the allocating unit 4712 further allocates the compressed files of the compressed file group g after the integration such that the access frequencies or the sort positions thereof are substantially equivalent, and thereby the compressed file groups g1 and g2 that are reconfigured for each of the slave servers 1502-1 and 1502-2 are obtained. - The
archives generating unit 4713 has a function of generating new archives that are reconfigured for each slave server 1502-I. More specifically, for example, the archives generating unit 4713 aggregates, for each of the slave servers 1502-1 and 1502-2, the data allocated respectively thereto and thereby forms the sets of new archives A1 and A2. - The
transmitting unit 4707 has a function of transmitting data to the slave servers 1502-I. More specifically, for example, the transmitting unit 4707 transmits a request for the collection of the sets of archives 200-1 to 200-N and the Huffman trees h-1 to h-N. The transmitting unit 4707 also transmits the new archives A1 (A2), together with the common Huffman tree H, to the slave server 1502-1 (1502-2) that is the allocation destination thereof. -
FIGS. 48 and 49 are flowcharts of the archives reconfiguring process by the master server 1501. As depicted in FIG. 48, the receiving unit 4701 collects the sets of archives 200-1 to 200-N and the Huffman trees h-1 to h-N of the slave servers 1502-1 to 1502-N (step S4801). - The integrating unit 4711 extracts and integrates the character appearance map linking tables 211 a and 211 b in the sets of archives 200-1 and 200-2 (step S4802). The allocating unit 4712 sorts the character appearance map linking table 3900 after the integration in descending order of access frequency (step S4803), and allocates the character appearance map linking tables 3900 a and 3900 b respectively to the slave servers 1502-1 and 1502-2 (step S4804).
- The file path tables 222 a and 222 b, the file path linking tables 212 a and 212 b, and the character appearance maps Ma and Mb are integrated, and are allocated to slave servers 1502-1 and 1502-2 according to bit numbers that each have a high access frequency (or the corresponding file ID) and, thereby, the reconfiguration is executed (step S4805).
- The common compression
parameter generating unit 4702 generates the common compression parameter P (step S4806). The common Huffman tree generating unit 4703 generates the common Huffman tree H (step S4807). - As depicted in
FIG. 49, the expanding unit 4704 expands the compressed file groups fa and fb respectively for the sets of archives 200-1 and 200-2 using respectively the Huffman trees h-1 and h-2 that are used for the compression of the file groups fa and fb (step S4901). The compressing unit 4705 recompresses the file groups Fa and Fb that are to be searched and that have been expanded respectively for the sets of archives 200-1 and 200-2, using the common Huffman tree H (step S4902). The integrating unit 4711 integrates the compressed file groups ga and gb that have been recompressed (step S4903), and sorts the bit numbers in descending order of access frequency (step S4904). - Subsequently, the allocating
unit 4712 allocates the bit numbers to the slave servers 1502-1 and 1502-2 such that the totals of the access frequencies or the sort positions thereof are substantially equivalent (step S4905). The archives generating unit 4713 generates the sets of new archives A1 and A2 respectively for the slave servers 1502-1 and 1502-2 (step S4906), and the transmitting unit 4707 transmits the new archives A1 (A2) and the common Huffman tree H to the slave server 1502-1 (1502-2) that is the allocation destination of the new archives A1 (A2) (step S4907), which ends the series of processing. - As described above, according to the second embodiment, by reconfiguring the sets of archives 200-1 and 200-2 of the slave servers 1502-1 and 1502-2, substantial equalization of the searching speed between the slave servers 1502-1 and 1502-2 is achieved. Therefore, when the same search character string is given to each of the slave servers 1502-1 and 1502-2, the search results are returned substantially simultaneously from the slave servers 1502-1 and 1502-2. That is, the waiting time for the last search result can be reduced and therefore, improvement of the searching speed is enabled.
- A third embodiment is configured by improving a portion of the second embodiment. The second embodiment executes the step of expanding the compressed file group of each set of archives 200-I using the Huffman tree h-I that is used for compressing the compressed file group, and the step of recompressing the expanded file group to be searched using the common Huffman tree H. By executing this two-path processing, that is, the expansion and the recompression, the second embodiment enables compression and expansion in each slave server 1502-I using the common Huffman tree H.
- In contrast, the third embodiment identifies, from the common Huffman tree H, the leaves holding the same characters as the expanded characters embedded in the leaves of the Huffman tree h-I of each slave server 1502-I. The compression symbols allocated to the identified leaves of the common Huffman tree H are then set in place of the expanded characters in the leaves of the Huffman tree h-I that is the identification origin. The Huffman tree after the setting is referred to as “converting Huffman tree”.
- The compressed file group that has been compressed using the Huffman tree h-I obtained before the setting is subjected to the expansion process using the converting Huffman tree, and is thereby converted into a compressed file group corresponding to the compression symbols of the common Huffman tree H. In this manner, the compressed file group corresponding to the compression symbols of the common Huffman tree H can be obtained for each slave server 1502-I, with the files remaining in the compressed format, by the one-path processing of a single converting process. Therefore, an increased speed of the reconfiguring process in the
master server 1501 can be realized. - The components identical to those in the first and the second embodiments are given identical reference numerals and the description thereof is omitted. In the third embodiment, for simplicity of description, the description will be given assuming that the number of the slave servers 1502-I is two (N=2).
-
FIG. 50 is a schematic for explaining an example of generating the converting Huffman tree. The example depicted in FIG. 50 is of generation that uses the Huffman tree h-1 of the archives 200-1 and the common Huffman tree H. In the example, the leaf of the Huffman tree h-1 to which a character “b” is set is noted. The leaf of the common Huffman tree H to which the same character as the character “b” of the noted leaf is set is identified (step S5001). A compression symbol “110” and the length of the compression symbol (in this example, “three”) that are allocated to the character “b” of the identified leaf of the common Huffman tree H are read (step S5002). Instead of the character “b” set to the noted leaf, the compression symbol and the length of the compression symbol “110 (3)” that are read are written (step S5003). The other characters “a” and “c” to “f” are similarly converted. -
FIG. 51 is a schematic for explaining a second example of generating the converting Huffman tree. The example depicted in FIG. 51 is of generation that uses the Huffman tree h-2 of the archives 200-2 and the common Huffman tree H. In the example, the leaf of the Huffman tree h-2 to which a character “f” is set is noted. The leaf of the common Huffman tree H to which the same character as the character “f” of the noted leaf is set is identified (step S5101). A compression symbol “1110” and the length of the compression symbol (in this example, “four”) that are allocated to the character “f” of the identified leaf of the common Huffman tree H are read (step S5102). Instead of the character “f” set to the noted leaf, the compression symbol and the length of the compression symbol “1110 (4)” that are read are written (step S5103). The other characters “a” to “e” are similarly converted. -
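The generation of a converting Huffman tree can be sketched with code tables in place of tree structures. The sketch assumes that a Huffman tree is representable as a mapping from a character to its compression symbol. Only the common symbol “110” for “b”, the common symbol “1110” for “f”, and the symbol “1110” for “b” in the tree h-1 follow the examples in the description; every other symbol and the function name are hypothetical.

```python
def build_converting_table(local_code, common_code):
    """For every leaf of the local tree, replace the character by the common symbol and its length."""
    converting = {}
    for ch, local_symbol in local_code.items():   # select a leaf of h-I
        common_symbol = common_code[ch]           # identify the leaf of H holding the same character
        converting[local_symbol] = (common_symbol, len(common_symbol))  # set symbol and length
    return converting

# Hypothetical common Huffman tree H ("b" -> "110" and "f" -> "1110" as in FIGS. 50 and 51).
H = {"a": "0", "c": "10", "b": "110", "f": "1110", "d": "11110", "e": "11111"}

# Hypothetical Huffman tree h-1 ("b" -> "1110" as in FIG. 52).
h1 = {"a": "0", "c": "10", "d": "110", "b": "1110", "e": "11110", "f": "11111"}

H1 = build_converting_table(h1, H)
print(H1["1110"])
```

The keys of the table play the role of the unchanged paths from the root of h-1 to its leaves, and the values play the role of the contents overwritten into the leaf structures.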
FIG. 52 is a schematic of a first converting Huffman tree. A converting Huffman tree H1 is a Huffman tree generated by the generation process that uses the Huffman tree h-1 of the archives 200-1 and the common Huffman tree H depicted in FIG. 50. For example, when the compression symbol “1110” is expanded using the converting Huffman tree H1, the expansion does not yield the character “b” depicted in FIG. 50 but instead yields the compression symbol “110” set in place of the character “b”. Therefore, the two-path processing including the expansion and the recompression of the compressed file is not necessary. Hence, a compressed file that can be compressed and expanded using the common Huffman tree H can be obtained by the one-path processing that handles the compressed file in the compressed format. -
FIG. 53 is a schematic of a second converting Huffman tree. A converting Huffman tree H2 is a Huffman tree generated by the generation process that uses the Huffman tree h-2 of the archives 200-2 and the common Huffman tree H depicted in FIG. 51. For example, when the compression symbol “1111” is expanded using the converting Huffman tree H2, the expansion does not yield the character “f” depicted in FIG. 51 but instead yields the compression symbol “1110” set in place of the character “f”. Therefore, the two-path processing including the expansion and the recompression of the compressed file is not necessary. Hence, a compressed file that can be compressed and expanded using the common Huffman tree H can be obtained by the one-path processing that handles the compressed file in the compressed format. -
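The equivalence of the one-path conversion and the two-path processing can be checked with a small sketch. Compressed data is modeled as a string of "0" and "1" characters. Only the symbol “1111” for “f” in the tree h-2 and the common symbol “1110” for “f” follow the description; the remaining symbols, the function names, and the sample text are hypothetical.

```python
def convert_one_path(bits, converting):
    """Walk the local symbols of a compressed file and emit common symbols directly."""
    out, buf = [], ""
    for bit in bits:
        buf += bit
        common_symbol = converting.get(buf)
        if common_symbol is not None:   # reached a leaf of the converting Huffman tree
            out.append(common_symbol)
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete compression symbol")
    return "".join(out)

def two_path(bits, local_code, common_code):
    """Reference: expand to characters with h-I, then recompress with H."""
    decode = {symbol: ch for ch, symbol in local_code.items()}
    chars, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in decode:
            chars.append(decode[buf])
            buf = ""
    return "".join(common_code[ch] for ch in chars)

# Hypothetical prefix-free code tables for the common tree H and the tree h-2.
H = {"a": "0", "c": "10", "b": "110", "f": "1110", "d": "11110", "e": "11111"}
h2 = {"c": "0", "a": "10", "b": "110", "f": "1111", "d": "11100", "e": "11101"}

# Converting Huffman tree H2: local symbol of h-2 -> common symbol of H.
H2 = {local_symbol: H[ch] for ch, local_symbol in h2.items()}

fb_1 = "".join(h2[ch] for ch in "face")   # a file compressed with h-2
gb_1 = convert_one_path(fb_1, H2)
print(gb_1 == two_path(fb_1, h2, H))
```

The one-path version never materializes the expanded characters; it performs the same leaf lookups as an ordinary expansion and only the payload stored in each leaf differs.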
FIG. 54 is a block diagram of a functional configuration of the master server 1501 (information managing apparatus) according to the third embodiment. In addition to the configuration described in the second embodiment, the master server 1501 includes a selecting unit 5401, an identifying unit 5402, a setting unit 5403, and a converting unit 5404. More specifically, functions of the units from the selecting unit 5401 to the converting unit 5404 are implemented by, for example, causing the CPU 101 to execute a program stored in the storage area such as the ROM 102, the RAM 103, and the magnetic disc 105 depicted in FIG. 1, or by the I/F 109. - The selecting
unit 5401 has a function of successively selecting, for each set of archives 200-I, arbitrary leaves from the Huffman tree h-I used for compression of the compressed file group in the corresponding archives 200-I. More specifically, for example, the selecting unit 5401 successively selects the leaves of the Huffman tree h-I depicted in FIGS. 50 and 51. - The identifying
unit 5402 has a function of identifying, from the common Huffman tree H, the leaf of the same character as the character expanded using each leaf successively selected by the selecting unit 5401. More specifically, as depicted in FIGS. 50 and 51, the identifying unit 5402 identifies the leaves of the common Huffman tree H to which the same character is set. - The
setting unit 5403 has a function of setting, to a leaf selected in the Huffman tree h-I, the compression symbol allocated to the leaf identified by the identifying unit 5402 instead of the character expanded using the selected leaf. More specifically, the setting unit 5403 overwrites, with the compression symbol identified from the common Huffman tree H, the area in the structure of the selected leaf into which the expanded character is written. The setting unit 5403 writes the length of the compression symbol into another blank area. In the structure of the leaf that is the setting target, the pointer to an upper node remains as it is and, therefore, designating the compression symbol allocated to the selected leaf yields, by the conversion, the compression symbol written in the structure of the leaf. - The converting
unit 5404 has a function of converting the compressed files that have been compressed using the Huffman trees h-1 and h-2 before the conversion, using the converting Huffman trees H1 and H2, while the compressed files remain in the compressed format. Thereby, the compressed file groups ga and gb that can be compressed and expanded using the common Huffman tree H are obtained. Similarly to the second embodiment, the compressed file groups ga and gb after the conversion are integrated by the integrating unit 4711, and are allocated to the slave servers 1502-1 and 1502-2 such that the totals of the access frequencies or the sort positions thereof are substantially equivalent between the slave servers 1502-1 and 1502-2. -
FIG. 55 is a flowchart of the archives reconfiguring process (latter half) by the master server 1501. The reconfiguring process (former half) is identical to that depicted in FIG. 48 and the description thereof is omitted. - A compression symbol setting process for the Huffman trees h-1 and h-2 is executed (step S5501). The converting
unit 5404 executes the one-path converting process (step S5502). The one-path converting process is executed for each of the compressed file groups fa and fb in the sets of archives 200-1 and 200-2 using the converting Huffman trees H1 and H2 obtained respectively for the sets of archives 200-1 and 200-2. - The integrating unit 4711 integrates the compressed file groups ga and gb after the conversion (step S5503), and sorts them in descending order of access frequency (step S5504). Subsequently, the allocating
unit 4712 allocates the compressed files to the slave servers 1502-1 and 1502-2 such that the totals of the access frequencies or the sort positions thereof are substantially equivalent (step S5505). - The
archives generating unit 4713 generates the sets of new archives A1 and A2 respectively for the slave servers 1502-1 and 1502-2 (step S5506), and transmits the new archives A1 (A2) and the common Huffman tree H to the slave server 1502-1 (1502-2) that is the allocation destination of the new archives A1 (A2) (step S5507), ending a series of processing. -
FIG. 56 is a flowchart of the compression symbol setting process for the Huffman tree. The selecting unit 5401 judges whether any unprocessed Huffman trees are present among the Huffman trees h-1 and h-2 respectively of the sets of archives 200-1 and 200-2 (step S5601). When an unprocessed Huffman tree is present (step S5601: YES), an unprocessed Huffman tree is selected (step S5602). The selecting unit 5401 judges whether any unprocessed leaves are present in the selected Huffman tree (step S5603). - When an unprocessed leaf is present (step S5603: YES), an unprocessed leaf is selected (step S5604). The identifying
unit 5402 identifies the leaf to which the same character as the character set in the selected leaf is set, from the common Huffman tree H (step S5605). Subsequently, the setting unit 5403 sets, in the structure of the selected leaf, the compression symbol and the length of the compression symbol that are allocated to the identified leaf (step S5606). The procedure returns to step S5603. - At step S5603, when no unprocessed leaf is present (step S5603: NO), the procedure returns to step S5601. At step S5601, when no unprocessed Huffman tree is present (step S5601: NO), the procedure moves to the one-path converting process (step S5502).
- As described above, according to the third embodiment, by utilizing an existing process (the Huffman tree expansion process) the compressed file groups corresponding to the compression symbols of the common Huffman tree, remaining in a compressed format, can be obtained for the slave servers 1502-1 and 1502-2 using a one-path process, i.e., the one-time converting process. Therefore, the file opening process (expansion process) to apply the file to the common Huffman tree H becomes unnecessary and, therefore, increased speed of the archives reconfiguring process in the
master server 1501 can be realized. The reconfiguring process of the archives 200 can be realized with a simple configuration and without creating a new algorithm because the existing process, the Huffman tree expansion process, is applied. - According to the first to the third embodiments, higher efficiency of the search process can be realized by achieving increased speed of the file accessing process when a full text search is performed on compressed files while the files remain in the compressed format. Increased speed of a full text search can be realized by causing the cache area to be a resident memory and increasing the efficiency of the server resources.
- The method explained in the present embodiment can be implemented by a computer, such as a personal computer or a workstation, executing a program that is prepared in advance. The program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read out from the recording medium by the computer. The program may also be a transmission medium that can be distributed through a network such as the Internet.
- According to the first embodiment, the compressed files in the archives are compressed using a common parameter, and the management area of the archives has written therein a table that stores the bit numbers in a character appearance map, which is used for the strict selection of the files to be searched (such as the strict selection described in International Publication Pamphlet No. 2006-123448), together with the head addresses of the compressed files corresponding to those bit numbers.
- According to the first embodiment, numerous opening processes are possible after computing the compression parameters only once. The head address of the corresponding compressed file can be obtained from the position number of the flag of a file that is strictly selected using the character appearance map and, therefore, a high-speed opening process can be realized.
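As a rough illustration of how the character appearance map and the head-address table work together, the sketch below assumes that the map stores one integer bit string per character, with bit i set when file i contains that character, and that the table maps each bit number to a byte offset in the archive. The names `select_files`, `appearance_map`, and `head_address`, and all the values, are assumptions made for this example.

```python
from typing import Dict, List


def select_files(query: str, appearance_map: Dict[str, int], file_count: int) -> List[int]:
    """Return the bit numbers of the files that contain every character of the query."""
    candidates = (1 << file_count) - 1              # start with every file flagged
    for ch in query:
        candidates &= appearance_map.get(ch, 0)     # AND the per-character bit strings
    return [i for i in range(file_count) if (candidates >> i) & 1]


# Hypothetical map for three files: bit i is 1 when file i contains the character.
appearance_map = {"c": 0b011, "a": 0b111, "t": 0b110}
# Hypothetical table: bit number -> head address of the compressed file.
head_address = {0: 0x0000, 1: 0x1A40, 2: 0x3F10}

hits = select_files("cat", appearance_map, file_count=3)
offsets = [head_address[i] for i in hits]           # where to start reading each hit
```

Only the files that survive the AND operation need to be opened, and the table turns each surviving bit number directly into a read position.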
- According to the second embodiment, archives that include a compressed file group including compressed files that are to be searched and that have described therein character strings are accessed; the compressed files are sorted in descending order of the frequency of accesses; the compressed files are combined in the descending order of the access frequency after the sorting such that the combined size does not exceed the storage capacity of a cache area for the storage area having stored therein the compressed file group; and the combined compressed file group is written from the storage area into the cache area prior to a search in the file group.
- According to the second embodiment, compressed files having a high access frequency can be stored in the cache area with preference.
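The sorting and combining of the second embodiment can be sketched as a simple greedy selection. The sketch assumes that the combining stops at the first file that would overflow the cache area; the specification does not spell out whether smaller, less frequently accessed files may be used to fill the remaining space, so this stopping rule is an assumption. `CompressedFile`, `choose_resident_files`, and the sample sizes are hypothetical.

```python
from typing import List, NamedTuple


class CompressedFile(NamedTuple):
    name: str
    size: int            # size of the compressed file in bytes
    access_count: int    # access frequency of the file


def choose_resident_files(files: List[CompressedFile], cache_capacity: int) -> List[CompressedFile]:
    """Pick files in descending access frequency until the cache would overflow."""
    ordered = sorted(files, key=lambda f: f.access_count, reverse=True)
    chosen = []
    used = 0
    for f in ordered:
        if used + f.size > cache_capacity:
            break        # assumption: stop at the first file that does not fit
        chosen.append(f)
        used += f.size
    return chosen


files = [
    CompressedFile("f1", 40, 9),
    CompressedFile("f2", 50, 30),
    CompressedFile("f3", 30, 12),
    CompressedFile("f4", 10, 1),
]
resident = choose_resident_files(files, cache_capacity=100)
```

The returned files are the ones that would be written from the storage area into the cache area before any search is run.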
- According to the third embodiment, a plurality of slave servers are accessed, each having stored therein archives that include a compressed file group including compressed files that have character strings described therein and that are to be searched; the archives are received from each of the slave servers; based on each character described in the file group to be searched for each set of received archives and compression parameters concerning the appearance frequency of each character, appearance frequencies are totaled for each character; thereby, a compression parameter that is common to the compressed file group is generated; based on the generated common compression parameter, a Huffman tree common to the compressed file group is generated; the compressed files are allocated to the slave servers such that sums of the access frequencies of each compressed file are substantially equivalent among the slave servers; and new archives including the compressed file group allocated to each slave server and the common Huffman tree are transmitted to the slave server to which the compressed files are allocated.
- According to the third embodiment, archives including compressed file groups for which the access frequencies are equivalent can be distributed to the slave servers and, therefore, higher efficiency and higher speed of a search across all the grid computers can be facilitated.
- According to the embodiments, an effect is exerted that, when a full text search is implemented for a compressed file remaining in the compressed format, higher efficiency of the search process can be realized by enabling a higher-speed file accessing process.
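Two steps of the third embodiment lend themselves to a short sketch: totaling the per-archive appearance frequencies into a common Huffman code, and allocating compressed files so that the access-frequency sums are roughly equal among the slave servers. The sketch assumes a textbook Huffman construction and a greedy "give the next most accessed file to the least loaded server" heuristic; the specification does not state that this particular allocation heuristic is the one used. All names and data are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List


def common_huffman_codes(per_archive_counts: List[Counter]) -> Dict[str, str]:
    """Total the character frequencies of all archives and build one Huffman code."""
    total = sum(per_archive_counts, Counter())
    # Heap entries: (frequency, unique tie-breaker, {character: code so far}).
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(sorted(total.items()))]
    if not heap:
        return {}
    if len(heap) == 1:
        return {ch: "0" for ch in heap[0][2]}
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def allocate_files(access_counts: Dict[str, int], n_slaves: int) -> List[List[str]]:
    """Greedy balancing: the most accessed file goes to the least loaded slave."""
    loads = [(0, i) for i in range(n_slaves)]
    heapq.heapify(loads)
    groups: List[List[str]] = [[] for _ in range(n_slaves)]
    for name, count in sorted(access_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(loads)
        groups[i].append(name)
        heapq.heappush(loads, (load + count, i))
    return groups


codes = common_huffman_codes([Counter("aab"), Counter("abcc")])
groups = allocate_files({"f1": 50, "f2": 30, "f3": 20, "f4": 10}, n_slaves=2)
```

With the sample data the two groups receive access-frequency sums of 60 and 50, which is as close to equal as this heuristic gets for these four files.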
- All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment(s) of the present inventions have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Claims (7)
1. A computer-readable recording medium storing therein an information searching program that causes a computer having access to archives including a compressed file group of compressed files that are to be searched and that have described therein character strings, to execute:
sorting the compressed files in descending order of access frequency of the compressed files;
combining the compressed files in descending order of access frequency after the sorting at the sorting such that a storage capacity of a cache area for a storage area that stores therein the compressed file group is not exceeded by a combined size of the compressed files combined; and
writing, from the storage area into the cache area, the compressed files combined at the combining, the compressed files combined being written prior to a search of the compressed files combined.
2. The computer-readable recording medium according to claim 1 , wherein
combining includes combining the compressed files in descending order of access frequency after the sorting at the sorting, such that the combined size of the compressed files is of a greatest value that does not exceed the storage capacity of the cache area, and
the writing includes writing from the storage area into the cache area, the compressed files whose combined size is the greatest value, the writing being performed prior to the search of the compressed files combined.
3. The computer-readable recording medium according to claim 1 , wherein the information searching program further causes the computer to execute:
setting a resident flag for the compressed files written in the cache area at the writing;
receiving input of a search character string;
identifying a compressed file that includes all the characters included in the search character string received at the receiving, by referencing a character appearance map indicating a presence or absence of characters in each compressed file;
reading the compressed file identified at the identifying, from an area according to the resident flag set at the setting;
expanding, into a file to be searched, the compressed file read at the reading; and
outputting the file to be searched expanded at the expanding.
4. The computer-readable recording medium according to claim 3 , wherein
the information searching program further causes the computer to execute searching for a character string that coincides with or is related to the search character string from the file that is to be searched expanded at the expanding, and
the outputting includes outputting a search result obtained at the searching.
5. The computer-readable recording medium according to claim 3 , wherein
the information searching program further causes the computer to execute updating the access frequency of the compressed file when the compressed file is expanded at the expanding, and
the sorting includes sorting the compressed files in order of descending access frequency of the compressed files based on an access frequency after updating at the updating.
6. The computer-readable recording medium according to claim 4 , wherein
the information searching program further causes the computer to execute updating, when a file to be searched is retrieved at the searching, the access frequency of a compressed file formed by compressing the file to be searched, and
the sorting includes sorting the compressed files in descending order of access frequency of the compressed files based on an access frequency after the updating at the updating.
7. An information searching apparatus having access to archives including a compressed file group of compressed files that are to be searched and that have described therein character strings, the information searching apparatus comprising:
a sorting unit that sorts the compressed files in descending order of access frequency of the compressed files;
a combining unit that combines the compressed files in descending order of access frequency after the sorting by the sorting unit such that a storage capacity of a cache area for a storage area that stores therein the compressed file group is not exceeded by a combined size of the compressed files combined; and
a writing unit that writes, from the storage area into the cache area, the compressed files combined by the combining unit, the writing unit writing the compressed files combined prior to a search of the compressed files combined.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/232,089 US20120005172A1 (en) | 2008-05-30 | 2011-09-14 | Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product |
US15/044,781 US9858282B2 (en) | 2008-05-30 | 2016-02-16 | Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008-143527 | 2008-05-30 | ||
JP2008143527A JP5782214B2 (en) | 2008-05-30 | 2008-05-30 | Information search program, information search device, and information search method |
US12/361,316 US8037035B2 (en) | 2008-05-30 | 2009-01-28 | Apparatus for searching and managing compressed files |
US13/232,089 US20120005172A1 (en) | 2008-05-30 | 2011-09-14 | Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/361,316 Division US8037035B2 (en) | 2008-05-30 | 2009-01-28 | Apparatus for searching and managing compressed files |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/044,781 Continuation US9858282B2 (en) | 2008-05-30 | 2016-02-16 | Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120005172A1 (en) | 2012-01-05 |
Family
ID=41381027
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/361,316 Active 2030-03-13 US8037035B2 (en) | 2008-05-30 | 2009-01-28 | Apparatus for searching and managing compressed files |
US13/232,089 Abandoned US20120005172A1 (en) | 2008-05-30 | 2011-09-14 | Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product |
US15/044,781 Active US9858282B2 (en) | 2008-05-30 | 2016-02-16 | Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/361,316 Active 2030-03-13 US8037035B2 (en) | 2008-05-30 | 2009-01-28 | Apparatus for searching and managing compressed files |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/044,781 Active US9858282B2 (en) | 2008-05-30 | 2016-02-16 | Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product |
Country Status (2)
Country | Link |
---|---|
US (3) | US8037035B2 (en) |
JP (1) | JP5782214B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160275072A1 (en) * | 2015-03-16 | 2016-09-22 | Fujitsu Limited | Information processing apparatus, and data management method |
US9519574B2 (en) | 2012-11-28 | 2016-12-13 | Microsoft Technology Licensing, Llc | Dynamic content access window loading and unloading |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8918374B1 (en) * | 2009-02-13 | 2014-12-23 | At&T Intellectual Property I, L.P. | Compression of relational table data files |
US8949260B2 (en) * | 2009-10-09 | 2015-02-03 | Ricoh Co., Ltd. | Method and apparatus for secure and oblivious document matching |
JP5418218B2 (en) * | 2009-12-25 | 2014-02-19 | 富士通株式会社 | Information processing program, information search program, information processing device, and information search device |
US9176995B2 (en) * | 2010-02-22 | 2015-11-03 | International Business Machines Corporation | Organization of data within a database |
WO2012117544A1 (en) * | 2011-03-02 | 2012-09-07 | 富士通株式会社 | Search program, search device, and search method |
US9323769B2 (en) * | 2011-03-23 | 2016-04-26 | Novell, Inc. | Positional relationships between groups of files |
WO2012140701A1 (en) * | 2011-04-15 | 2012-10-18 | Hitachi, Ltd. | File sharing system and file sharing method |
JPWO2012150637A1 (en) * | 2011-05-02 | 2014-07-28 | 富士通株式会社 | Extraction method, information processing method, extraction program, information processing program, extraction device, and information processing device |
JP5844554B2 (en) * | 2011-06-08 | 2016-01-20 | Jfeシステムズ株式会社 | Data management storage system |
US8898592B2 (en) * | 2011-06-30 | 2014-11-25 | International Business Machines Corporation | Grouping expanded and collapsed rows in a tree structure |
US9251289B2 (en) | 2011-09-09 | 2016-02-02 | Microsoft Technology Licensing, Llc | Matching target strings to known strings |
EP2581704A1 (en) * | 2011-10-14 | 2013-04-17 | Harman Becker Automotive Systems GmbH | Method for compressing navigation map data |
JP5939259B2 (en) * | 2011-11-04 | 2016-06-22 | 富士通株式会社 | Collation control program, collation control device, and collation control method |
GB2510523B (en) | 2011-12-22 | 2014-12-10 | Ibm | Storage device access system |
WO2014045320A1 (en) | 2012-09-21 | 2014-03-27 | 富士通株式会社 | Control program, control method and control device |
US9448740B2 (en) * | 2012-11-27 | 2016-09-20 | Hitachi, Ltd. | Storage apparatus and hierarchy control method |
US9330159B2 (en) | 2012-12-27 | 2016-05-03 | Teradata Us, Inc. | Techniques for finding a column with column partitioning |
US10423596B2 (en) * | 2014-02-11 | 2019-09-24 | International Business Machines Corporation | Efficient caching of Huffman dictionaries |
US20190087599A1 (en) | 2014-04-02 | 2019-03-21 | International Business Machines Corporation | Compressing a slice name listing in a dispersed storage network |
JP6609404B2 (en) * | 2014-07-22 | 2019-11-20 | 富士通株式会社 | Compression program, compression method, and compression apparatus |
KR20170027036A (en) * | 2015-09-01 | 2017-03-09 | 에스케이하이닉스 주식회사 | Data processing system |
US9930146B2 (en) | 2016-04-04 | 2018-03-27 | Cisco Technology, Inc. | System and method for compressing content centric networking messages |
JP6737117B2 (en) * | 2016-10-07 | 2020-08-05 | 富士通株式会社 | Encoded data search program, encoded data search method, and encoded data search device |
CN109429101B (en) * | 2017-08-31 | 2021-03-05 | 中国电信股份有限公司 | Desktop loading method and device of interactive network television |
US10877959B2 (en) * | 2018-01-17 | 2020-12-29 | Sap Se | Integrated database table access |
CN109413176B (en) * | 2018-10-19 | 2021-06-08 | 中国银行股份有限公司 | Report downloading method and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5675789A (en) * | 1992-10-22 | 1997-10-07 | Nec Corporation | File compression processor monitoring current available capacity and threshold value |
US5809527A (en) * | 1993-12-23 | 1998-09-15 | Unisys Corporation | Outboard file cache system |
US5822759A (en) * | 1996-11-22 | 1998-10-13 | Versant Object Technology | Cache system |
US20030217113A1 (en) * | 2002-04-08 | 2003-11-20 | Microsoft Corporation | Caching techniques for streaming media |
US20040225497A1 (en) * | 2003-05-05 | 2004-11-11 | Callahan James Patrick | Compressed yet quickly searchable digital textual data format |
US20060242163A1 (en) * | 2005-04-22 | 2006-10-26 | Microsoft Corporation | Local thumbnail cache |
US20070168398A1 (en) * | 2005-12-16 | 2007-07-19 | Powerfile, Inc. | Permanent Storage Appliance |
US20080098024A1 (en) * | 2005-05-20 | 2008-04-24 | Fujitsu Limited | Information retrieval apparatus, information retrieval method and computer product |
US20080201341A1 (en) * | 2007-02-19 | 2008-08-21 | Takuya Okamoto | Contents management method |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5532694A (en) * | 1989-01-13 | 1996-07-02 | Stac Electronics, Inc. | Data compression apparatus and method using matching string searching and Huffman encoding |
JP3220865B2 (en) * | 1991-02-28 | 2001-10-22 | 株式会社日立製作所 | Full text search method |
US5333313A (en) * | 1990-10-22 | 1994-07-26 | Franklin Electronic Publishers, Incorporated | Method and apparatus for compressing a dictionary database by partitioning a master dictionary database into a plurality of functional parts and applying an optimum compression technique to each part |
JPH0877201A (en) * | 1994-09-09 | 1996-03-22 | Toshiba Corp | System and method for document database retrieval |
JPH08221954A (en) * | 1995-02-14 | 1996-08-30 | Sanyo Electric Co Ltd | Multimedia reproducing device |
US5748121A (en) * | 1995-12-06 | 1998-05-05 | Intel Corporation | Generation of huffman tables for signal encoding |
JP3305190B2 (en) * | 1996-03-11 | 2002-07-22 | 富士通株式会社 | Data compression device and data decompression device |
JP4105260B2 (en) * | 1997-08-25 | 2008-06-25 | 富士通株式会社 | Information processing device |
US6112208A (en) * | 1997-08-25 | 2000-08-29 | Fujitsu Limited | Data compressing method and apparatus to generate bit maps in accordance with extracted data symbols |
JP3737885B2 (en) * | 1998-06-02 | 2006-01-25 | 大日本印刷株式会社 | Virtual space sharing system |
US6393149B2 (en) * | 1998-09-17 | 2002-05-21 | Navigation Technologies Corp. | Method and system for compressing data and a geographic database formed therewith and methods for use thereof in a navigation application program |
JP3753598B2 (en) * | 2000-07-06 | 2006-03-08 | 株式会社日立製作所 | Computer, computer system and data transfer method |
JP4556087B2 (en) * | 2001-03-22 | 2010-10-06 | ソニー株式会社 | DATA PROCESSING DEVICE, DATA PROCESSING METHOD, PROGRAM, AND PROGRAM RECORDING MEDIUM |
JP4229626B2 (en) * | 2002-03-26 | 2009-02-25 | 富士通株式会社 | File management system |
JP2003337822A (en) * | 2002-05-21 | 2003-11-28 | Fujitsu Ltd | Compression retrieval archive processing method, compression retrieval archive processing program and recording medium with its program recorded |
US7126500B2 (en) * | 2002-06-26 | 2006-10-24 | Microsoft Corporation | Method and system for selecting grammar symbols for variable length data compressors |
JP2004258865A (en) * | 2003-02-25 | 2004-09-16 | Canon Inc | Method of processing information |
JP4490068B2 (en) * | 2003-09-22 | 2010-06-23 | 大日本印刷株式会社 | Data storage system using network |
JP2006302012A (en) * | 2005-04-21 | 2006-11-02 | Sony Corp | File management device, file management method and program |
JP4736593B2 (en) | 2005-07-25 | 2011-07-27 | ソニー株式会社 | Data storage device, data recording method, recording and / or reproducing system, and electronic device |
US7307552B2 (en) * | 2005-11-16 | 2007-12-11 | Cisco Technology, Inc. | Method and apparatus for efficient hardware based deflate |
US8776052B2 (en) * | 2007-02-16 | 2014-07-08 | International Business Machines Corporation | Method, an apparatus and a system for managing a distributed compression system |
US7688233B2 (en) * | 2008-02-07 | 2010-03-30 | Red Hat, Inc. | Compression for deflate algorithm |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9519574B2 (en) | 2012-11-28 | 2016-12-13 | Microsoft Technology Licensing, Llc | Dynamic content access window loading and unloading |
US20160275072A1 (en) * | 2015-03-16 | 2016-09-22 | Fujitsu Limited | Information processing apparatus, and data management method |
US10380240B2 (en) * | 2015-03-16 | 2019-08-13 | Fujitsu Limited | Apparatus and method for data compression extension |
Also Published As
Publication number | Publication date |
---|---|
US8037035B2 (en) | 2011-10-11 |
US9858282B2 (en) | 2018-01-02 |
JP2009289196A (en) | 2009-12-10 |
JP5782214B2 (en) | 2015-09-24 |
US20160162504A1 (en) | 2016-06-09 |
US20090299973A1 (en) | 2009-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9858282B2 (en) | Information searching apparatus, information managing apparatus, information searching method, information managing method, and computer product | |
US6678687B2 (en) | Method for creating an index and method for searching an index | |
EP2443564B1 (en) | Data compression for reducing storage requirements in a database system | |
US6694323B2 (en) | System and methodology for providing compact B-Tree | |
US7743060B2 (en) | Architecture for an indexer | |
US20040205044A1 (en) | Method for storing inverted index, method for on-line updating the same and inverted index mechanism | |
JP4646624B2 (en) | Store and query relational data in a compressed storage format | |
US8866647B2 (en) | Computer product, information processing apparatus, and information search apparatus | |
KR20160145785A (en) | Flash optimized columnar data layout and data access algorithms for big data query engines | |
US10810174B2 (en) | Database management system, database server, and database management method | |
CN105404677A (en) | Tree structure based retrieval method | |
JPH09245043A (en) | Information retrieval device | |
JP5448428B2 (en) | Data management system, data management method, and data management program | |
US7953721B1 (en) | Integrated search engine devices that support database key dumping and methods of operating same | |
JP6006740B2 (en) | Index management device | |
CN105426490A (en) | Tree structure based indexing method | |
Bookstein et al. | Using bitmaps for medium sized information retrieval systems | |
JP5494860B2 (en) | Information management program, information management apparatus, and information management method | |
JP2007048318A (en) | Relational database processing method and relational database processor | |
WO2013069149A1 (en) | Data search device, data search method and program | |
JPWO2011099114A1 (en) | Hybrid database system and operation method thereof | |
JP5238105B2 (en) | Program and data extraction method | |
CN110825747B (en) | Information access method, device and medium | |
JP2006073035A (en) | Computerized document retrieval system, retrieval device and recording medium | |
JP2001312517A (en) | Index generation system and document retrieval system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |