Methods, apparatus, and systems are provided to generate, from a set of training documents, a set of training data and a set of features for a taxonomy of categories. In this generated taxonomy, the degree of feature overlap among categories is minimized in order to optimize use with a machine-based categorizer....
Patent US20070185901
Creating Taxonomies And Training Data For Document Categorization
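
The abstract does not say how "degree of feature overlap among categories" is measured or minimized. As a rough illustration only, the sketch below assumes each category is represented by a set of feature terms, measures overlap as the mean pairwise Jaccard similarity, and reduces it by dropping any feature shared by more than one category. The function names, the Jaccard measure, the pruning rule, and the toy taxonomy are all assumptions made for this example, not the method claimed in the patent.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two feature sets: shared features / all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(features_by_category):
    """Average Jaccard overlap over all unordered pairs of categories."""
    pairs = list(combinations(features_by_category.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def drop_shared_features(features_by_category):
    """Keep only the features that occur in exactly one category.

    This is a deliberately blunt pruning rule (an assumption of this
    sketch); it drives the overlap to zero at the cost of discarding
    every shared feature.
    """
    counts = {}
    for feats in features_by_category.values():
        for f in feats:
            counts[f] = counts.get(f, 0) + 1
    return {
        cat: {f for f in feats if counts[f] == 1}
        for cat, feats in features_by_category.items()
    }


# Hypothetical toy taxonomy: category name -> set of feature terms.
taxonomy = {
    "sports": {"ball", "team", "score", "market"},
    "finance": {"market", "stock", "score"},
    "cooking": {"recipe", "oven", "team"},
}

before = mean_pairwise_overlap(taxonomy)
pruned = drop_shared_features(taxonomy)
after = mean_pairwise_overlap(pruned)
print(f"overlap before pruning: {before:.3f}, after pruning: {after:.3f}")
```

In practice a categorizer would probably want a softer rule, for example keeping a shared feature only in the category where it is most frequent, since removing every shared term can leave some categories with very few features.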