Internet, Libraries and a VCL

The Agent based Digital Library Infrastructure Project

The aim of this project is to create a Virtual Community Library where each user has a personal library and, at the same time, is part of a larger community consisting of the other users' personal libraries and, through intermediaries, other digital libraries. Being part of a community means that each user can benefit from the work put into the other libraries, e.g. by obtaining documents through search queries or through recommendations based on social filtering, and by getting help in organizing the personal library.

Background

The Internet is sometimes compared to a giant digital library. This is not, however, an entirely suitable metaphor. A library usually has characteristics not found on the Internet, e.g. commitments to the persistence of its collection and the ability to search it.

On the WWW there are no commitments whatsoever regarding the persistence or classification of content. This makes it very easy to make information public, but harder to ensure that the information will be found by its intended audience, or that it can be found again. This commitment-less philosophy has probably been of great importance for the rapid growth of the WWW, but it makes it very difficult to build services that rely on the information published there.

Usually, a library will also have a scope of interest, i.e. it maintains a collection of items for a particular purpose. The main reason for this need not be to reduce costs. Maintaining a collection and making it searchable means that one has to maintain some dictionary or more elaborate structure, e.g. a topic hierarchy. The larger this structure gets, the harder it is to keep it consistent and to make sure that the maintainers use it consistently (e.g. how do we decide which keywords appropriately describe a document?).

If we see the library as a collection of items for a particular purpose (e.g. a department's book collection, or something broader such as "contemporary Swedish literature"), one can talk about the relevance of adding an item to the collection of a certain library. This means that one can also talk about the collection as being more or less complete with regard to the topic, and about the indexing as being more or less sufficient for that library's purpose. Basically, this is how libraries already tend to be organized: different libraries have different fields of expertise, but they can search the collections of other libraries. There are often efforts to merge the collections of libraries, but this work is done in parallel with the expansion and evolution of the libraries' local collections, i.e. one does not want to sacrifice the usefulness of the current collection for the utopian goal of a single, all-encompassing collection.

Searchable structures encompassing the entire WWW (Yahoo etc.) will be incomplete, outdated and often inconsistent because of the sheer size of the subject they try to cover. (If the WWW as a whole is seen as a library, its scope is very wide indeed.) It is also difficult to assess the quality of such broad collections with regard to the accuracy of their classification and their "updatedness". Within interest communities, smaller, topic-centred collections of resources are often maintained. These can be seen as parallels to topic-centred libraries, although without any of the commitments usually associated with a library. Individual users also maintain collections (e.g. bookmark lists and home pages), so there is really no lower bound on what could be seen as a collection.

Is it possible to merge the two concepts, the openness of the Internet and the stability of the library, to gain the advantages of both? Can we combine a library's persistence and the searchability of its collection with the WWW idea that anyone can contribute information? Our approach is to provide personal libraries to a community of users, all maintaining information relevant to themselves, individually or in coalitions. By exchanging information between the libraries, more information than was registered in any one personal library becomes available to each user. We call this combined search space a Virtual Community Library (VCL).
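To make the idea of a combined search space more concrete, here is a minimal sketch in Python. All names (`Item`, `PersonalLibrary`, `vcl_search`) are hypothetical and not part of any existing system. The sketch assumes that each personal library can answer a simple keyword query, and it naively treats two items with the same title as the same item, which sidesteps the genuinely hard problem of identifying "same" documents across libraries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """A hypothetical bibliographic record: a title plus a set of keywords."""
    title: str
    keywords: frozenset[str]


@dataclass
class PersonalLibrary:
    owner: str
    items: list[Item] = field(default_factory=list)

    def search(self, keyword: str) -> list[Item]:
        # Each library answers only from its own collection.
        return [item for item in self.items if keyword in item.keywords]


def vcl_search(own: PersonalLibrary, community: list[PersonalLibrary],
               keyword: str) -> dict[str, list[str]]:
    """Search the user's own library first, then the other members' libraries.

    Returns a mapping from title to the owners whose libraries hold it, so the
    user can see which hits come from elsewhere in the community.
    Assumption: equal titles are treated as the same item.
    """
    hits: dict[str, list[str]] = {}
    for library in [own, *community]:
        for item in library.search(keyword):
            hits.setdefault(item.title, []).append(library.owner)
    return hits


# Usage: Alice's search also returns an item that only Bob has registered.
alice = PersonalLibrary("alice", [Item("Agents and Ontologies", frozenset({"agents", "ontology"}))])
bob = PersonalLibrary("bob", [Item("Social Filtering Basics", frozenset({"filtering", "agents"})),
                              Item("Agents and Ontologies", frozenset({"agents"}))])
print(vcl_search(alice, [bob], "agents"))
# {'Agents and Ontologies': ['alice', 'bob'], 'Social Filtering Basics': ['bob']}
```

Keeping the owner list per hit, rather than just a merged list of items, reflects the point that the combined space is built from separately maintained libraries.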

By requiring that the information be relevant to the individual users, we address the problem that users will rarely bother to register information they themselves have no interest in. Classification of information by end-users, however, usually means that non-librarians classify items to the best of their own knowledge. This makes it harder to develop usage conventions for classification (e.g. marking an item with a keyword that is not mentioned in the text but is still relevant). We will still need librarians who classify larger collections in order to get homogeneous usage conventions. Such professionally maintained collections can also be seen as personal libraries (they cannot claim to be "complete" either, or to reflect more than the classifiers' opinions), but they might be used more frequently by others than a library representing only one individual and, in that sense, be more influential.

The VCL can be seen as a community of agents, in the sense that the participants are all basically self-interested and have incomplete knowledge. We want the agents to be able to make use of other agents' work or, at least, to know when work can be shared. The agents might commit to things such as maintaining collections of information or informing others of changes in their knowledge bases, but would typically not allow others to manipulate their own knowledge.

Each agent's internal knowledge representation may be as detailed as the agent needs, but the agents might be constrained in what they can communicate, either by the communication language or by what knowledge is shared. For example, it might not be possible to communicate that two books are written by the same person if the other agent cannot understand the concept of a "person". Since we want as many information sources as possible to be regarded as agents, we cannot demand that they all commit to a single ontology.
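The constraint described above can be illustrated with a small sketch, assuming (hypothetically) that each agent advertises the set of concepts it can interpret and that each statement lists the concepts it depends on. The names `Statement`, `Agent` and `communicable` are invented for illustration; a real system would need an agent communication language and ontology descriptions rather than plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A hypothetical assertion together with the concepts needed to understand it."""
    text: str
    concepts: frozenset[str]


@dataclass
class Agent:
    name: str
    vocabulary: frozenset[str]  # concepts this agent can interpret

    def understands(self, statement: Statement) -> bool:
        # A statement is interpretable only if every concept it uses is known.
        return statement.concepts <= self.vocabulary


def communicable(statements: list[Statement], receiver: Agent) -> list[str]:
    """Keep only what the receiver can interpret; the sender's richer internal
    knowledge is simply not expressible to this receiver."""
    return [s.text for s in statements if receiver.understands(s)]


knowledge = [
    Statement("book-1 and book-2 have the same author", frozenset({"book", "person"})),
    Statement("book-1 and book-2 share the keyword 'agents'", frozenset({"book", "keyword"})),
]
catalogue = Agent("catalogue", frozenset({"book", "keyword"}))  # knows nothing of "person"
print(communicable(knowledge, catalogue))
# ["book-1 and book-2 share the keyword 'agents'"]
```

The sender keeps its full knowledge internally; only the projection onto the receiver's vocabulary crosses the boundary, which is why a single shared ontology is not required.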

[Tail of this section is not edited]

*goals: share work, manage different classification schemes, generate representations according to different standards*
* hard: finding "same" papers/items ... *
* SO: What is relevant to a user? His/her papers, references, projects, reading lists, personal info, interests...
 

Since more can be said and guaranteed about the information in a library, a library may well have useful properties when experimenting with techniques such as social filtering [see Recommending And Evaluating Choices In A Virtual Community].
*examples here*
A personal digital library can probably also help in collecting information about a particular user's interests, and thereby further guide the filtering.
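As a rough illustration of how a personal library could feed social filtering, the sketch below uses each user's library as that user's interest profile and applies a generic user-based collaborative filtering scheme with Jaccard similarity. This is a textbook technique chosen for illustration, not the method of the cited work; the function names and data are hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, libraries: dict[str, set[str]],
              top_n: int = 3) -> list[tuple[str, float]]:
    """Score items the user does not yet hold by the summed similarity of the
    other users who hold them, and return the top_n (item, score) pairs."""
    own = libraries[user]
    scores: dict[str, float] = {}
    for other, items in libraries.items():
        if other == user:
            continue
        sim = jaccard(own, items)
        if sim == 0.0:
            continue  # users with nothing in common contribute no evidence
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


libraries = {
    "alice": {"paper-a", "paper-b", "paper-c"},
    "bob": {"paper-a", "paper-b", "paper-d"},
    "carol": {"paper-c", "paper-e"},
    "dave": {"paper-x"},
}
print(recommend("alice", libraries))
# [('paper-d', 0.5), ('paper-e', 0.25)]
```

Bob shares two of Alice's three papers, so his extra paper outranks Carol's; Dave shares nothing and has no influence. A real VCL would likely weight items by more than mere presence in a library (annotations, classifications, reading lists), but the basic idea of reusing other users' collections as evidence is the same.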

The Requirements of a Virtual Community Library

The Characteristics of the Web, a Library and a Personal Library

To get a picture of what a Virtual Community Library can be, we have analysed the characteristics of the three components we think should be combined: the web, a library and a personal library. Basically, the web and a library do not have much in common, while a personal library can in many ways be seen as a special case of a library. The analysis is presented in Table 1.

Goal
  The Web: To make it easy to make documents public.
  A Library: To have a complete collection in its knowledge area.
  A Personal Library: To have a complete collection in its interest area.

Commitments
  The Web: No commitments.
  A Library: Issues references to the documents in its collection, to be used for later retrieval of the documents. Issues bibliographic information for the documents in its collection, to be used for later searching.
  A Personal Library: Same commitments as a library.

How one adds new documents
  The Web: Anyone may publish documents. There is no need to register documents, and new documents are immediately accessible.
  A Library: Only certain people are allowed to add and register documents. Documents are actively chosen according to the library's goal.
  A Personal Library: Documents are added in the same way as in a library.

Information organization
  The Web: No centralized control of the flow of information. Users can organize their information with unidirectional links.
  A Library: Organizes information about internal and external documents. Attaches meta-information to the items in the collection.
  A Personal Library: Same organization as a library, plus the possibility of making personal annotations (e.g. tags and notes).

Durability of documents
  The Web: A document can be found only as long as the publishing user wants it to be.
  A Library: Different users should be able to find exactly the same documents over and over again.
  A Personal Library: I have to be able to find exactly the same document over and over again.

Table 1. The characteristics of the web, a library and a personal library.

Note: If the bibliographic information is standardized in some way (e.g. title or ISBN), the same information can be used by another library that holds a copy of the item in its collection. For this reason, collections of standardized bibliographic information become a resource worth maintaining in their own right.
Note: A library will usually also keep collections of references and bibliographic information contained in other libraries. This can be thought of as keeping a reference to the other library.
Note: Libraries traditionally maintain collections of "frozen" documents. Maintaining references to living documents (and handling the consequences of changing content) is still a research topic and not part of traditional library cataloguing work. The characterization of a library above refers to a traditional library.
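The first note above suggests that standardized bibliographic information lets libraries recognize copies of the same item. As a minimal sketch, assuming the ISBN is the shared identifier, the code below normalizes ISBN-10 and ISBN-13 strings to one canonical 13-digit form using the standard check-digit rules, so that two personal libraries that recorded the same book in different formats can detect the match. The function names are hypothetical, and many items (e.g. papers and web documents) have no ISBN, so this covers only one narrow part of the problem of finding "same" items.

```python
from __future__ import annotations


def _clean(isbn: str) -> str:
    # Drop hyphens and spaces; a lowercase 'x' check digit becomes 'X'.
    return isbn.replace("-", "").replace(" ", "").upper()


def _is_digits(s: str) -> bool:
    return len(s) > 0 and all(c in "0123456789" for c in s)


def isbn10_is_valid(isbn: str) -> bool:
    s = _clean(isbn)
    if len(s) != 10 or not _is_digits(s[:9]) or not (_is_digits(s[9]) or s[9] == "X"):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weights run 10, 9, ..., 1; a valid ISBN-10 sums to a multiple of 11.
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def isbn13_check_digit(first12: str) -> int:
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return (10 - total % 10) % 10


def isbn13_is_valid(isbn: str) -> bool:
    s = _clean(isbn)
    return len(s) == 13 and _is_digits(s) and isbn13_check_digit(s[:12]) == int(s[12])


def normalize_isbn(isbn: str) -> str | None:
    """Return a canonical 13-digit ISBN, or None if the input is not a valid ISBN."""
    s = _clean(isbn)
    if isbn13_is_valid(s):
        return s
    if isbn10_is_valid(s):
        body = "978" + s[:9]  # ISBN-10s map into the 978 prefix
        return body + str(isbn13_check_digit(body))
    return None


# Two personal libraries recorded the same book in different ISBN formats.
a = normalize_isbn("0-306-40615-2")
b = normalize_isbn("978-0-306-40615-7")
print(a, a == b)  # 9780306406157 True
```

Rejecting strings with a bad check digit, rather than guessing, keeps a typo in one library's record from silently linking two unrelated items.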

The Characteristics of a Virtual Community Library (VCL)

A Virtual Community Library should combine the different characteristics of the web, the library and the personal library.
Goal
Commitments
Note: We cannot guarantee that the bibliographic information associated with an item in one personal library can be used to retrieve a copy or variant of the document stored in another personal library, since the libraries are not committed to using the same bibliographic standards.
How one adds new documents
The information organization
Durability of documents
A desirable characteristic is that the work of storing documents and attaching information should be distributed across the community, to the users who have access to the correct information.
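The note on commitments above points out that personal libraries are not committed to the same bibliographic standards. One way to picture the consequence is a crosswalk between schemes: fields with a counterpart can be translated, while the rest are lost. The sketch below assumes a simplified, hypothetical mapping from BibTeX-style field names to Dublin Core element names; real crosswalks are considerably richer, and the record (including its local `rating` field) is invented for illustration.

```python
from __future__ import annotations

# Simplified, illustrative crosswalk from BibTeX-style field names to
# Dublin Core element names (an assumption, not a complete mapping).
BIBTEX_TO_DC = {
    "author": "creator",
    "title": "title",
    "year": "date",
    "publisher": "publisher",
    "isbn": "identifier",
}


def translate(record: dict[str, str],
              crosswalk: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Translate the fields the crosswalk covers; report the ones it cannot."""
    translated: dict[str, str] = {}
    lost: list[str] = []
    for field_name, value in record.items():
        target = crosswalk.get(field_name)
        if target is None:
            lost.append(field_name)  # no counterpart in the receiving scheme
        else:
            translated[target] = value
    return translated, lost


record = {
    "author": "A. Author",
    "title": "An Example Title",
    "year": "1999",
    "annote": "my personal note",
    "rating": "5",
}
dc, lost = translate(record, BIBTEX_TO_DC)
print(dc)    # {'creator': 'A. Author', 'title': 'An Example Title', 'date': '1999'}
print(lost)  # ['annote', 'rating']
```

Reporting the lost fields explicitly, instead of dropping them silently, lets an agent tell its peer what it could not express, which fits the idea that agents should at least know when work can be shared.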