January 20, 2009
Draft!! What kinds of semantic web tools can information architects use to make it easier for search engines to match specific queries with the most relevant information? Tools developed in Library Sciences (and now Information Studies) such as taxonomies, faceted classification and thesauri enhance search results. Information Studies approaches with a focus on marketing are less concerned with helping users link to the most relevant information in response to their queries than with advertising products and services. A higher Google ranking means greater findability and more potential clients, but it can never ensure robust truth claims or the most efficient research avenue on the subject under investigation. Behaviour-based taxonomy (BBT) architectures attempt to direct more traffic to their sites, which could, if used unethically, simply translate into claiming more tags and categories than are justified.
Advances in technology make it easier for Information Architects to monitor traffic on sites and collect detailed statistics on queries and hits (page views, etc.). Behaviour-based taxonomy (BBT) uses data on visitors/users/clients’ queries to revisit, redefine and refine taxonomies: categories, tags, metatags, etc. If I understand it correctly, behaviour-based taxonomy (BBT) could be developed by reclassifying objects in different hierarchies within traditional subject and content-based taxonomies, using faceted categories effectively when classical taxonomy is not sufficient, and being aware of other common terms that describe the object (thesauri).
Again, if I understand behaviour-based taxonomy correctly, it could also be a behaviour-based folksonomy, in which an ordinary blogger like myself could monitor WordPress dashboard data on visitors’ queries to develop categories and tags.
Faceted categories: Although the faceted categories system was first introduced by S.R. Ranganathan in 1933 in his publication Colon Classification, this simple but highly efficient tool from Library and Information Studies (LIS) is proving to be highly effective in Web 2.0 applications such as del.icio.us social bookmarking and WordPress blogs, and its popularity has increased. Ranganathan’s faceted universal classification system used only five faceted categories (PMEST): Personality (the something in question, e.g. a person or event in a classification of history, or an animal in a classification of zoology), Matter (what something is made of), Energy (how something changes, is processed, evolves), Space (where something is), Time (when it happens). These have also become the classic who, what, how, where, when.
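To make the five PMEST facets a little more concrete, here is a minimal sketch in Python of how items might be described and filtered facet by facet. The class name, the function name and the three catalogue entries are all invented for illustration; this is not Ranganathan's notation, only one way a blogger or information architect might experiment with the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetedRecord:
    """One item described along the five PMEST facets (hypothetical structure)."""
    title: str
    personality: str  # the something in question
    matter: str       # what it is made of
    energy: str       # how it changes or is processed
    space: str        # where it is
    time: str         # when it happens


def filter_by_facets(records, **facets):
    """Return the records whose facet values match every keyword given."""
    return [
        record for record in records
        if all(getattr(record, name) == value for name, value in facets.items())
    ]


# Hypothetical catalogue entries, made up for this example only.
catalogue = [
    FacetedRecord("Weaving wool in Peru", "textiles", "wool", "weaving", "Peru", "1900s"),
    FacetedRecord("Dyeing wool in Scotland", "textiles", "wool", "dyeing", "Scotland", "1800s"),
    FacetedRecord("Casting bronze in Benin", "sculpture", "bronze", "casting", "Benin", "1500s"),
]

# Any combination of facets can be used to narrow the set, in any order.
wool_items = filter_by_facets(catalogue, matter="wool")
wool_weaving = filter_by_facets(catalogue, matter="wool", energy="weaving")

print([record.title for record in wool_items])
print([record.title for record in wool_weaving])
```

The point of the sketch is that no single hierarchy is imposed: the same item can be reached through its material, its process, its place or its period.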
Louise Spiteri, a professor at Dalhousie University’s School of Library and Information Studies, simplified Ranganathan’s system in an article (1998-04) hosted on the Information Architecture Institute’s site. She also explained developments in the use of faceted categories following the establishment in 1952 of the British Classification Research Group (CRG), which set out to expand on and modify Ranganathan’s facet analysis in response to the limitations of traditional enumerative classification systems when confronted with compound subjects. The CRG studies facet analysis, relational operators and the theory of integrative levels (Spiteri 1995). Spiteri explored how the CRG’s bottom-up approach provided a useful alternative to the traditional top-down approach to classification. In the top-down approach, predetermined areas of knowledge are broken down into their constituent elements. The bottom-up approach pieces together individual elements and then determines the areas of knowledge they form using the theory of integrative levels.
Faceted categories are based on emergent phenomena whose behaviour cannot be predicted from their constituent parts versus resultant phenomena whose behaviour can be predicted from their constituent parts.
Emergence has been used as a concept/metaphor in philosophy of mind to illustrate why mechanistic calculations of brain architecture as used in cognitive science are inadequate. Michael Polanyi found emergence to be a potent concept in the study of knowledge itself, helping to explain why we know what we know and how we build new ideas when our knowledge becomes tacit (Polanyi 1966). See Tennis (2004). Polanyi (1967: 4) explored processes inherent in connoisseurship/discovery of new models and theories versus Popper’s validation/refutation of theories and models in value-free scientific knowledge, and developed the concept of tacit knowledge, a pre-logical phase of knowing where ‘we can know more than we can tell’ (1967: 4), comprised of concepts, images and sensory information that help make sense of our experiences. Polanyi argued that the knowledge of approaching discovery or establishing authenticity is based on a highly personal (not objective) tacit knowledge where the knower senses that there is something valuable to be discovered and feels compelled or committed to investigating the hidden truth claim and relating evidence to an external reality (Polanyi 1967: 24-5). See Smith (2003).
A more recent faceted universal classification system, the Bliss Bibliographic Classification (BC2), expanded and redefined the classical categories to include: thing/entity [who], kind, part, property, material, process [how], operation [how], patient, product, by-product, agent [who], space [where], time [when] (Broughton 2001:79).
Broughton (2001) described how faceted classifications continue to be “powerful tools for the management of vocabulary, characterised by a rigorous analytical approach to terms, and the clear identification of semantic and syntactic relationships and structures. The philosophy and function of BC2 are described, as is the process of building a knowledge structure on facet analytical principles. The range of related functions of such structures when employed as knowledge management tools (as classification, thesaurus, subject heading list, browsable index) is considered, as is the potential of facet analytical knowledge structures for the management of digital materials. Facet analysis is regarded as a powerful methodology for the creation of structures appropriate to specific retrieval requirements in a range of contexts, with emphasis on the problems of complex subject description and retrieval and multidimensionality.”
William Denton described (2007-02) “How to Make a Faceted Classification and Put It On the Web” by choosing between these two systems.
Metadata, in terms of content management and information architecture, refers to “information about objects” (documents, images, etc.). The well-known Dublin Core specification for metadata includes 15 elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights. The DCMI Type Vocabulary “provides a general, cross-domain list of approved terms that may be used as values”. DCMI also provides a more comprehensive document, “DCMI Metadata Terms”. These are described in detail on their site. For example, under subject:
Term Name: subject
Definition: The topic of the resource.
Comment: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element.
Type of Term: Property
Note: This term is intended to be used with non-literal values as defined in the DCMI Abstract Model (http://dublincore.org/documents/abstract-model/). As of December 2007, the DCMI Usage Board is seeking a way to express this intention with a formal range declaration.
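As a rough illustration of how the 15 Dublin Core elements might be used to describe a single object, here is a small Python sketch. The helper `make_record` and the sample values are my own assumptions for this example and are not part of any DCMI tool; only the element names and the type value "Text" come from the DCMI vocabularies.

```python
# The 15 Dublin Core elements named above, in lower case.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a metadata record, rejecting names outside the 15 elements.

    Hypothetical helper for illustration; every element is optional.
    """
    unknown = set(fields) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


# Sample values invented for this example.
record = make_record(
    title="Behaviour-based taxonomy (draft)",
    creator="(blog author)",
    subject=["taxonomy", "faceted classification", "findability"],
    date="2009-01-20",
    type="Text",  # a term from the DCMI Type Vocabulary
    language="en",
)

for element, value in record.items():
    print(f"{element}: {value}")
```

Following the recommended best practice quoted above, the subject values would ideally come from a controlled vocabulary rather than free-form keywords as they do here.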
- Subject-Based Taxonomies (SBT): “Subject or “Domain” taxonomies attempt to completely describe all of the terms in a field, as well as the relationship between the terms. Typically these relationships are hierarchical, and they are the kind of taxonomies we use to classify knowledge – the kind of taxonomies your biology teacher would talk about. You need a real subject matter expert to create useful subject based taxonomy. And whatever you do, don’t hire two (or more) subject experts, because they will never agree on the taxonomy (Enterprise Search 2007-04-25).”
Subject-based classification includes metadata properties or fields that directly describe objects by listing discrete subjects.
- Content-Based Taxonomies (CBT): “Content based taxonomies are organized using existing content. Organization charts, computer directory/folder structures, or social tagging content is typically a ‘content based’ taxonomy. These taxonomies are often built by humans – you do it yourself when you decide what folders to use on your computer. But these can also be done automatically with tools many search and content management vendors sell (Enterprise Search 2007-04-25).”
Guarino, Masolo and Vetere’s (1999-05) article described how the OntoSeek system adopts “a language of limited expressiveness for content representation and uses a large ontology based on WordNet for content matching. [S]tructured content representations coupled with linguistic ontologies can increase both recall and precision of content-based retrieval.” They compared tf-idf results from a corpus of 100 documents for the term ‘cancer’ obtained from Google (1. cancer, 2. cell, 3. breast, 4. research, 5. treatment, 6. tumor, 7. information, 8. color, 9. patient, 10. health, 11. support, 12. news, 13. care, 14. wealth, 15. tomorrow, 16. entering, 17. writing, 18. loss, 19. dine, 20. mine, 21. dinner) and terms expanded using WordNet (1. cancer, 2. cell, 3. tumor, 4. patient, 5. document, 6. carcinoma, 7. lymphoma, 8. disease, 9. access, 10. treatment, 11. skin, 12. liver, 13. leukemia, 14. risk, 15. breast, 16. genetic, 17. tobacco, 18. thymoma, 19. malignant, 20. gene, 21. clinical).
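For readers unfamiliar with tf-idf, the ranking measure behind the two term lists above, here is a minimal sketch of the textbook formula in Python. It is an assumption on my part that a plain version like this is close to what the researchers used; their exact weighting is not given here, and the three-document corpus is invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = count of the term in the document / number of terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Tiny invented corpus; the real study used 100 documents.
corpus = [
    "cancer cell research",
    "cancer treatment news",
    "cancer patient care",
]
scores = tf_idf(corpus)

for doc, doc_scores in zip(corpus, scores):
    ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)
    print(doc, "->", ranked)
```

Note that in this toy corpus the word "cancer" appears in every document, so its idf is log(1) = 0 and it scores zero: tf-idf rewards terms that distinguish one document from the rest, not terms that are common everywhere.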
- Behavior-Based Taxonomy (BBT): Enterprise Search (2007-04-25) argued that Behavior-Based Taxonomy (BBT) is more important than Subject-Based Taxonomies (SBT) or Content-Based Taxonomies (CBT). Behaviour-Based Taxonomy refers to the list of search terms that people actually use when they search a site, which can be monitored by reviewing your site’s search logs every few months. They recommend that you aim to provide useful results for the top hundred queries on your site (Enterprise Search 2007-04-25).
WordPress.com offers an amazing service to its bloggers by providing statistics on page views, etc. It also lists search queries, which has in some ways encouraged me to improve responses to the most frequent search enquiries. There are search analytics tools available, but we can monitor our own sites using slow world technologies as well.
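Counting the most frequent queries by hand is possible, but a few lines of Python can do the same job. This is only a sketch: the function name `top_queries` and the sample log are hypothetical, and it assumes the queries have already been copied out of the dashboard or search log as plain strings.

```python
from collections import Counter


def top_queries(queries, n=100):
    """Return the n most frequent queries after light normalisation.

    Normalisation here is just lower-casing and collapsing extra spaces,
    so that "Dublin  Core" and "dublin core" count as the same query.
    """
    normalised = (" ".join(query.lower().split()) for query in queries)
    counts = Counter(query for query in normalised if query)
    return counts.most_common(n)


# Hypothetical search log, invented for this example.
search_log = [
    "faceted classification",
    "Faceted  Classification",
    "dublin core",
    "wordnet dog",
    "faceted classification",
    "dublin core",
]

for query, hits in top_queries(search_log, n=3):
    print(f"{hits:3d}  {query}")
```

The default of 100 follows the recommendation above to aim for useful results on the top hundred queries; the resulting list is a starting point for deciding which categories and tags to add or rename.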
tags: queries, Search Analytics, findability, search relevance, search engine optimization (SEO), taxonomy, taxonomies, hierarchies, taxonomy, ontology, semantic web, Data Management, Knowledge Management, Information Architects, information specialists, classification, bottom-up, top-down, hierarchies (arborescent or rhizomic), folksonomy, what is being done in the name of,
faceted tagging: taxonomy: hierarchies,
categories: taxonomies, Internet search engines,
The British Classification Society (BCS)’s cross-disciplinary membership includes anthropologists, archaeologists, astronomers, biologists, chemists, computer scientists, forensic scientists, geologists, information specialists, librarians, psychologists, soil scientists and statisticians who are concerned about principles and practice of classification.
Several subject-specific faceted classification systems such as the London Education Classification, the London Classification for Business Studies, and the Classification for Library and Information Science emerged through this initiative. Facet analysis has been applied not only to several classification systems, but has also been used in the design of information retrieval thesauri such as Thesaurofacet, DHSS-DATA Thesaurus and BSI Root Thesaurus, indexing systems such as BTI, CIFT, and GREMAS, and knowledge-based indexing systems such as MedIndex and SIMPR (Aitchison 1969, 1985; British Standards Institution 1988; Burton 1986; Gibb & Fleming 1991; Revie & Smart 1991; Stiles 1985; Travis 1989, 1990).
Simpli “was an early search engine that offered disambiguation to search terms. A user could enter in a search term that was ambiguous (e.g., Java) and the search engine would return a list of alternatives (coffee, programming language, island in the South Seas). The technology was rooted in brain science and built by academics to model the way in which the mind stored and utilized language. The early technology was derived heavily from WordNet, which was invented by George A. Miller at Princeton University. George Miller was an advisory Board member to Simpli.” wiki
FAST: On April 25, 2008, Microsoft acquired FAST Search & Transfer.
During the last five years of his life, from 1967 until he died in 1972, the renowned library scientist Ranganathan went through a period of prolific writing. It was at this time that he worked on Colon Classification and “proved that the design and development [of] a scheme for classification is a life time activity” (biography).
3. Garshol’s (2004-10-26) article focussed on the advantages of using topic maps to represent metadata and subject-based classification, thereby re-using existing classifications and classification techniques with more precise descriptions. He also provided clear explanations of the role of faceted categories, subject-based classification and metadata.
1. wiki: The semantic lexicon WordNet groups English words into synsets, provides short definitions, and records the various semantic relations between these synonym sets thereby producing a more intuitive dictionary/thesaurus that is also capable of supporting automatic text analysis and artificial intelligence applications. Every synset contains a group of synonymous words or collocations (a collocation is a sequence of words that go together to form a specific meaning, such as “car pool”); different senses of a word are in different synsets. The meaning of the synsets is further clarified with short defining glosses (Definitions and/or example sentences). A typical example synset with gloss is: good, right, ripe — (most suitable or right for a particular purpose; “a good time to plant tomatoes”; “the right time to act”; “the time is ripe for great sociological changes”) Most synsets are connected to other synsets via a number of semantic relations.”
dog, domestic dog, Canis familiaris
=> canine, canid
=> placental, placental mammal, eutherian, eutherian mammal
=> vertebrate, craniate
=> animal, animate being, beast, brute, creature, fauna
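The chain above, from dog up to animal, can be read as a simple "is a kind of" lookup. Here is a minimal Python sketch using a hand-typed dictionary that copies only the abbreviated chain shown here; it does not query WordNet itself, the names `HYPERNYM`, `hypernym_chain` and `is_a` are invented for the example, and real WordNet synsets can have more than one parent.

```python
# Each term points to its broader term, copied from the abbreviated chain above.
# Only the first word of each synset is used as a label.
HYPERNYM = {
    "dog": "canine",
    "canine": "placental",
    "placental": "vertebrate",
    "vertebrate": "animal",
}


def hypernym_chain(term, parents=HYPERNYM):
    """Walk upward from a term to the broadest term reachable from it."""
    chain = [term]
    seen = {term}
    while chain[-1] in parents:
        broader = parents[chain[-1]]
        if broader in seen:  # guard against accidental loops in the data
            break
        chain.append(broader)
        seen.add(broader)
    return chain


def is_a(term, ancestor, parents=HYPERNYM):
    """True if ancestor appears somewhere above term in the chain."""
    return ancestor in hypernym_chain(term, parents)[1:]


print(" => ".join(hypernym_chain("dog")))
print(is_a("dog", "vertebrate"))
print(is_a("animal", "dog"))
```

A search engine could use a lookup like this to broaden a query: someone searching for "canine" might reasonably be shown pages tagged "dog" as well.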
On the WordNet site they provide an online search. The query: “dog” gives these results:
S: (n) dog, domestic dog, Canis familiaris (a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds) “the dog barked all night”
S: (n) frump, dog (a dull unattractive unpleasant girl or woman) “she got a reputation as a frump”; “she’s a real dog”
S: (n) dog (informal term for a man) “you lucky dog”
S: (n) cad, bounder, blackguard, dog, hound, heel (someone who is morally reprehensible) “you dirty dog”
S: (n) frank, frankfurter, hotdog, hot dog, dog, wiener, wienerwurst, weenie (a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll)
S: (n) pawl, detent, click, dog (a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward)
S: (n) andiron, firedog, dog, dog-iron (metal supports for logs in a fireplace) “the andirons were too hot to touch”
S: (v) chase, chase after, trail, tail, tag, give chase, dog, go after, track (go after with the intent to catch) “The policeman chased the mugger down the alley”; “the dog chased the rabbit”
5. In his useful article Tennis (2004) described three paths of interdisciplinary work that shape the future of classification research: emergence, encyclopedism, and ecology.
Webliography and Bibliography
Broughton, Vanda. 2001. “Faceted classification as a basis for knowledge organization in a digital environment; the Bliss Bibliographic Classification as a model for vocabulary management and the creation of multi-dimensional knowledge structures.” The New Review of Hypermedia and Multimedia 2001: 67-102.
Denton, William. 2007-02. “How to Make a Faceted Classification and Put It On the Web.”
Guarino, Nicola; Masolo, Claudio; Vetere, Guido. 1999-05. “OntoSeek: Content-Based Access to the Web.” IEEE Intelligent Systems 14: 3:70-80.
Garshol, Lars Marius. 2004-10-26. “Metadata? Thesauri? Taxonomies? Topic Maps!: Making sense of it all.” XML Europe 2004.
Garshol, Lars Marius. 2004. “Metadata? Thesauri? Taxonomies? Topic Maps!: Making sense of it all.” Journal of Information Science. Chartered Institute of Library and Information Professionals. 30:4:378-39.
Garshol, Lars Marius. 2004-09. “Metadata? Thesauri? Taxonomies? Topic Maps!: Making sense of it all.” interChange. 10:3:17-30.
Jones, Matthew; Alani, Harith. 2006-07-23. “Content-based Ontology Ranking.” 9th International Protege Conference. Stanford, California.
Kehoe, Miles. 2007-Q1. “Interpreting Your Search Activity Reports: What to look for as you review your reports.” New Idea Engineering Inc. 4:1.
McIlwaine, I.; Broughton, Vanda. 2000. “The Classification Research Group: then and now.” Knowledge Organization. 27:4: 195-199.
Polanyi, Michael. 1966. The Tacit Dimension. Garden City, NY: Doubleday. See Michael Polanyi and Tacit Dimension.
Ranganathan, S.R. 1933 [1987 7th ed]. Colon Classification. Madras: Madras Library Association.
Ranganathan, S.R. 1933. Colon Classification. 6th Edition. Ess Ess Publications, Delhi, India.
Ranganathan, S.R. 1962. Elements of library classification. New York: Asia Publishing House.
Smith, Mark K. 2003. “Michael Polanyi and tacit knowledge.” Encyclopedia of Informal Education.
Spiteri, Louise F. 1995-06. “The Classification Research Group and the Theory of Integrative Levels.” Katharine Sharp Review. 1:1-6.
Taylor, A. G. 1992. Introduction to Cataloging and Classification. 8th ed. Englewood, Colorado: Libraries Unlimited.
Tennis, Joseph T. 2004. “Three Spheres of Classification Research: Emergence, Encyclopedism, and Ecology.” In Advances in Classification Research. vol. 13. (Medford, NJ: Information Today for the American Society for Information Science and Technology).
November 23, 2008
Knowing that this blog, begun in November 2006, has now reached its 100,000th visit left me speechless. I believe my first blogging experience was with Flickr in the fall of 2006. Since then I have discovered the interconnectivity of a rhizome of 2.0 technologies.
Over the past few hours, search engines sent visitors to this site to find answers for these queries: “Treaty Seven timeline” (2), Jennifer Naglingniq (1), “dangerous concepts for a business economy” (1), “ripples” (1), bringing the blog into six digits.
I believe I uploaded this image to Flickr on October 22, 2006. By January 2009, 25 people had tagged this image as a favorite and it had been viewed 46,003 times. Because it was one of the earlier images I posted in my Flickr account, once my free Flickr account hit the magic number of 200 images, this one and many others were no longer visible on my photostream. However, since it is well-tagged and linked to this blog and others, and has been highlighted and used by others through Creative Commons licensing, it is still being found through searches!
I know a pro account would bring these missing images back into the photo stream, but I do not want to take the risk of album deletions, etc. And my cyber experiment has been with open source and free accounts.