IT: Big I and Small T or Big T and Small I; An Assessment
Introduction
I wrote my M.Sc. thesis for Jason Farradane on the subject of SDI (Selective Dissemination of Information), probably the first information service designed to be delivered by computer. That was in 1969.
Those of you who knew Jason know the problem I had in getting my opus past his gimlet-like eyes. He didn't believe in computers; he believed in intellectual assessment of information. He believed that information had to be categorised in relational fashion and that only the (experienced) human mind could do that.
I stuck to my task; I was young and enthusiastic. I firmly believed that computing could and would provide a solution for managing information. I think, even now, 30 years later, that I persuaded Jason that the computer had a place in information handling. He probably accepted that it could be used for drudge jobs such as sorting, but nothing more elaborate. Of course he never admitted that. To be fair, he passed my thesis.
Was I right to be so enthusiastic?
Summary
Since computers were first applied to information management there has been a tension between the view that computing could solve the problems of information handling and the view that computing could only be an auxiliary to intellectual effort.
In the more than 40 years that have elapsed between KWIC and KWOC indexing and the Web portals and other search engines of today, the argument has ebbed and flowed. This paper will attempt to assess whether the argument has been resolved.
1. The early attempts
In 1969 the computer was still a curiosity. Hans Peter Luhn had been working in the IBM laboratories in the US, where he invented what came to be called the KWIC (and KWOC, etc.) indexing system. Of course it was not indexing in the true sense; it was a means of feeding the titles of items into a computer and permuting them so that they could be listed in alphabetical order of their different words. It was nonetheless called an index of the content of the documents.
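To make the permutation concrete, here is a minimal sketch of the idea in Python. It is not Luhn's actual implementation; the stop-word list and the output format are invented purely for illustration.

```python
# A minimal sketch of KWIC (Key Word In Context) permutation, not Luhn's
# actual IBM implementation: each significant word of a title becomes an
# index entry, with the rotated title shown as its context.
STOPWORDS = {"a", "an", "and", "by", "in", "of", "on", "the", "to"}

def kwic(titles):
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue
            # Rotate the title so the keyword comes first.
            rotated = " ".join(words[i:] + words[:i])
            entries.append((word.lower(), rotated))
    # The listing is presented in alphabetical order of keyword.
    return sorted(entries)

for keyword, context in kwic(["Selective Dissemination of Information",
                              "Indexing by Permutation of Titles"]):
    print(f"{keyword:15} {context}")
```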
In 1969 it was unusual to use a computer to manipulate text. Tony Kent was probably still only allowed to use the computer at Nottingham on Saturdays, because his text retrieval experiments interfered with the real computing done from Monday to Friday by the engineers, physicists and mathematicians. What we call computing today was then called data processing; in that world, text was merely data without many numbers.
The capability of those early computers was very limited. They could deal with data only in computer-specific formats and, since they were not optimised for text, they were consequently very slow. The major problem was that they worked in batch mode, so you had to wait until the end of the batch run to discover the result. More often than not you discovered an input error on your punched cards and the whole operation had been a waste.
Nevertheless, pioneers such as the American Chemical Society were experimenting with the magnetic tapes that resulted from using computers to set the type for the printed volumes of Chemical Abstracts. They were creating KWIC and similar indexes and also selecting items on the basis of their subject matter. More importantly, they were using the computer to sort out chemical compounds and to rationalise their names and structures in the Registry system. They were also using the computer to create what they called patent concordances, an attempt to recognise the same patent filed in different jurisdictions.
The intellectual arguments at the information level were between the zealots of the thesaurus and the high priests of free indexing.[1] Computing for them was limited to creating the lists of thesaurus terms or the alphabetical list of index terms. That of itself was a challenge: the word processor had not been invented and Bill Gates was still wearing short trousers.
Those early experiments, and even the early products, made no attempt or claim to improve information handling beyond rationalising, in a very basic fashion, the execution of certain tasks. The challenges were in the computing: devising clever ways to create an alphabetical list, scanning text quickly, comparing items efficiently, presenting output nicely. Research was directed at improving these processes; the intellectual effort went into the programming.
Looking at that list of challenges from the nineteen sixties, one is struck by their similarity to the challenges of today. The details may be different (alphabetical listing is not a priority now) but the processes of scanning, sorting, comparing and presenting are still the fundamental objectives of information researchers.
The I and T assessment of those early years comes down clearly on the side of the T, such as it was at that time.
2. The middle years
The late seventies brought the microprocessor, providing flexibility and a low-cost environment that opened up the field. The nineteen eighties brought the PC, and information management using computers took off. The democratisation of computing brought a lot of new players into the world of information handling. In the same time-frame, telecommunications was linked to computing to provide remote access, and the big computer centres offering online access to databases grew.
Research in this period moved on from the religious wars of the nineteen sixties to the development of various clever ways of searching for information using computers. The rather crude truncation capability was elaborated into the capacity to search phrases, sentences and paragraphs, to designate word order, to search specific parts of documents, to specify the field where the text sought was located, to sort output in a meaningful way.
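In modern terms those capabilities amount to pattern matching over designated parts of a record. The following Python sketch illustrates right truncation, word-order (adjacency) searching and field-specific searching; the record layout and matching rules are assumptions for illustration, not the actual syntax of Dialog, ESA or any other host system of the period.

```python
# A hedged sketch of the searching capabilities described above: right
# truncation, word-order (adjacency) searching and field-specific searching.
# The record format and matching rules are invented for illustration.
import re

record = {
    "title": "Selective dissemination of information by computer",
    "abstract": "An early computer-based current awareness service.",
}

def truncate_match(stem, field):
    """Right truncation: 'comput' matches computer, computing, ..."""
    return re.search(rf"\b{re.escape(stem)}\w*", record[field], re.I) is not None

def adjacent_match(w1, w2, field):
    """Word-order searching: w1 immediately followed by w2."""
    return re.search(rf"\b{re.escape(w1)}\s+{re.escape(w2)}\b",
                     record[field], re.I) is not None

print(truncate_match("comput", "title"))                      # True
print(adjacent_match("selective", "dissemination", "title"))  # True
print(adjacent_match("dissemination", "selective", "title"))  # False
```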
Once again these developments were computer-related. The improvements were the consequence of increases in computing power. Whilst it had always been possible, in theory, to search for words in a certain order, the computing power required to do so was available to only a limited number of people. It is interesting to recall that the early offers of online information suppliers such as Dialog and ESA were based upon search languages that offered little more than single-term selection and truncation, with very little in the way of sequence or word-order searching. The eighties were a period when improvements in searching capability were offered in a sort of leapfrog between the big players, the competition between them being the spur to improving quality.
In the same period other applications of computers for information management made their impact. The best known of these was citation indexing and searching. My fellow speaker in this session was the pioneer here, and it is a measure of his dedication that this area of information handling became so efficient and popular. The use of computers to sort and present contents pages originated in the same environment. In the USA, but not much in Europe, specialised computer-based information services for the field of law were developed. In the world of library management the application of computers to cataloguing made rapid progress.
The nineteen seventies and eighties were the years when computing came out of the research laboratory. In the world of information management computers were widely applied to the growing volumes of computer-readable material. The democratisation of computing brought about by the PC did not permeate the world of information to any great extent in that period. At the end of the eighties computer-based information was still an area restricted to relatively few specialists.
The middle years were the period when computing generally became more respectable. However, with a few notable exceptions, in the world of information computing was looked upon with suspicion: it was a relatively crude method of trawling the available material that tried to give the impression it was a great deal more useful than it really was. The big user populations in the universities and major industry sectors were becoming convinced of its value, but outside these sectors the penetration was poor.
The I&T assessment for the nineteen seventies and nineteen eighties shows a movement towards the I but is still dominated by the T. The technology improved significantly and it had an impact on the amount of data being processed. The information world made good use of the technology but perhaps it did not make the best use of the opportunities to leverage the broadening of the user base.
3. The Web and all that follows
The nineteen nineties have been dominated by IT, and not just in the field of information handling and management. For example, it is quite normal, and very refreshing, to see young television presenters explaining complex computing-based developments in an articulate fashion, with the help of sophisticated graphics.
Of the many developments in IT one has been seminal to information management. Tim Berners-Lee has been widely acclaimed for his invention, the World Wide Web. It can truly be said that it has revolutionised the presentation of information.
Many people, including information professionals, tend to forget that the Berners-Lee invention was the Web, not its manifestation on the screen in Windows 98 or in a browser such as Internet Explorer or Netscape Communicator. At CERN, the European nuclear research centre where Berners-Lee worked, researchers were frustrated by the fact that documents concerning their research were located on computers all over Europe. Nuclear physicists turned computer specialists bent their minds to the problem and came up with the idea of applying the computing power and telecommunications at their command to the creation of an interconnected set of documents. The problem should be familiar to information professionals: how to deal with large collections of documents in different formats in different places. The major difference is that information handling deals with a lot of material which cannot physically be located on computers.
The information world did have the concept of interconnected documents. Its efforts had been concentrated, up to the time the Web was created, on external management of the links, most notably through citation management but also through terminology control, indexing and cataloguing conventions, specific identifier systems for complex products and patent concordance and classification schemes. The real invention of the CERN researchers was the automated connection of documents through internal cross-referencing. This was possible in their distributed computing environment.
The Web has provided the opportunity for the creator of a piece of information to manage how the reader will connect it to other related items. This capability has been extended to the form and shape of information products and services where the creators can direct the users' attention through the design of the product. The Web enables the intellectual effort of indexing to be graphically presented to the user. It is a watershed in the relationship between I&T.
The effect of the Web on information management could not have been felt were it not for a parallel revolution in communications technology. For those of us who were blooded on X.25 and the politics of monopoly telecommunications supply, what has happened in the last 10 years can be likened to a hurricane blowing through the bureaucracy of telecommunications policy. The time scale of the change is so short that it cannot be said that older telecommunications executives are spinning in their graves; they are spinning in their comfortably upholstered office chairs. It is quite ironic to recall that the CERN development of the Web was possible because CERN used its own leased-line communications; it would not have been possible over the public data networks of the time.
This liberation of world-wide communications has been accompanied by internal networking of computing in almost all organisations. This has brought about the Intranet, enabling the internal presentation of information in a common format.
The other main information management development, intimately associated with the Web, was the implementation of a standardised information presentation language, HTML. This development has delivered what was the holy grail of the document management advocates: a structured data set for describing documents.
HTML is an application of SGML, which was a creation of the printing and publishing sector but was considered too complex by many in the information sector to be used as a standard for document description. How we have changed our opinion since the advent of the Web! We are now about to become ardent advocates of XML, a new standard which permits further depth to be added to document description in distributed computing environments.[2]
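As an illustration of that further depth, the sketch below contrasts a purely presentational HTML fragment with an XML record describing the same document. It is a minimal sketch using Python's standard xml.etree.ElementTree module; the element names are invented and follow no particular metadata standard.

```python
# A minimal sketch contrasting presentational HTML with descriptive XML.
# The element and field names are invented for illustration only.
import xml.etree.ElementTree as ET

# HTML marks up how a reference *looks*: the tags carry no meaning.
html_fragment = "<p><b>IT: Big I and Small T</b>, <i>B. Mahon</i>, 1999</p>"

# XML marks up what the reference *is*: each element names a property
# of the document, which is the "further depth" referred to above.
record = ET.Element("document")
ET.SubElement(record, "title").text = "IT: Big I and Small T or Big T and Small I"
ET.SubElement(record, "author").text = "Barry Mahon"
ET.SubElement(record, "year").text = "1999"

print(html_fragment)
print(ET.tostring(record, encoding="unicode"))
```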
A small parenthesis is needed here to re-define the meaning of "document". Digitisation of information materials means that any form of information communication (text, audio, video and combinations of these) can be treated in a common fashion. "Document" in this context means any presentation of information in any format.
The combined effect of these developments has been to bring a new set of players into the information management sector. Names like AltaVista, Yahoo and Lycos have entered the vocabulary of the general public as well as that of the London Stock Exchange and Wall Street. It is doubtful whether any of these newcomers have ever heard of the Anglo-American Cataloguing Rules.
Research in information handling has been rejuvenated. However, there is a distinct impression that the newer researchers, driven by a different set of priorities, do not take much cognisance of previous work. Since so much of the work is done in commercial environments, it is hard to know whether the creators of AltaVista, Netseek, Dejanews, etc., have incorporated the findings of previous information retrieval researchers in their products. The likelihood is that they have used the computing power at their disposal to develop their products without much reference to history. The establishment of the Microsoft research facility in Cambridge, where well-known IR researchers will be based, could change that.
This recalls the days when information professionals bemoaned the interference of computer specialists who seemed to think that solving information problems was easy. The difference is that today's entrepreneurs have proved that they can create and launch new information management products and services that satisfy large audiences. They may not make the best use of the research available, but they work.
The information retrieval researchers have not been idle. The TREC group carries on the proud tradition of IR research, their work being given a new meaning by the expectations of the new audiences for information products and services.
The real pressure on traditional information suppliers is that user expectation has been heightened by the publicity surrounding the new Web-based information services.
The traditional information services have reacted to the changes in the market place and in user behaviour by offering Web or Web-like interfaces. Some have gone further and restructured their offer to make full use of the new options, Engineering Information being one of the early pioneers.
There is a lot of activity underway in many information supply organisations. Dialog has recently unveiled a new rule-based online automatic indexing system, built around an indexing engine in a Web browser. Dialog is also undertaking a complete re-indexing of the news archives of the BBC using the same technology, and has purchased a majority of the shares of Muscat, a developer of new IR software. At the same time, Reuters is publicising that it is (still) using human indexers for its new archive, and Yahoo recruits librarians to classify the output of its Web search engine.
4. The overall assessment
The argument between the advocates of computing and the advocates of (human) intellectual processing may well continue; in many ways it must continue if we are to have any future for research. In terms of I&T it seems clear that the emphasis has moved towards I. There is a widely held view that we are not awaiting a major breakthrough in technology to improve information management. The ball is clearly in the court of the information organisers; the technologists have delivered the means.
In this paper the phrases "information management" and "information handling" have been used more or less interchangeably to describe the different elements which make up the world in which we all work. This loose use of terminology has been unavoidable; we are not able to decide which phrase should be used where. If there is a major challenge facing research in this field, it is that we should be striving to deliver information meaning, a much more specific term than information handling or information management.
The latest buzz phrase is "knowledge management". It has caught the imagination of a wide audience in business and is seen as the next big thing for software developers. The well-known names in document management are dusting off some of their existing IR offerings and re-labelling them with knowledge management terminology. We risk another turf war with these zealots. To avoid it we should rally beneath the banner of creating applications which deliver information meaning. The challenge is great; the philosopher in Jack Meadows may shrink from such an objective, since there is an implication that we are trying to tell the audience what the information means. Without weakening the objective, we might re-label it as delivering the means to tell the client what a document (in the new definition) is about, not merely that it exists.
[1] It still goes on; see Muddamalle, JASIS 49(10), 881-887.
By Barry Mahon
P.O. Box 1416, L-1014 Luxembourg