Unit-VII: Search and Resource Discovery Paradigms

Sakshi Education

Information search and Retrieval:
Search and retrieval begins when a user provides a description of the information being sought to an automated discovery system. Using its knowledge of the environment, the system attempts to locate information that matches the given description. The retrieval method used depends on the libraries being searched. The challenge is to develop user-friendly search tools for domains such as electronic shopping. Search and retrieval methods refine queries through computing techniques such as nearest-neighbor matching, which considers variants of the original query.

Electronic catalogs and directories:
Information organizing and browsing is accomplished using directories or catalogs. Organizing refers to how information is interrelated, for example by placing it in a hierarchy. Maintaining large amounts of data is difficult.

Information filtering:
The goal of information filtering is to select data that is relevant, manageable, and understandable.

Filters are of two types:

Local filter
Remote filter

Local filters: Local filters work on incoming data to a PC, such as news feeds.

Remote filters: Remote filters are often software agents that work on behalf of the user and roam around the network from one database to another.

Consumer Search and Retrieval

Search and Resource Discovery Paradigms
Information Search and Retrieval
Electronic Commerce Catalogs or Directories
Information Filtering

Information Search and Retrieval
Information search is the process of sifting through large volumes of information to find some target information. Search and retrieval systems are designed for unstructured and semi-structured data. The process of searching can be divided into two phases:

The end-user retrieval phase
The publisher indexing phase

The end-user retrieval phase consists of three steps:

First, the user formulates a text-based query to search the data.
Second, the server interprets the user's query, performs the search, and returns a list of documents to the user.
Third, the user selects documents from the hit list and browses them, reading and perhaps printing selected portions of the retrieved data.

The publisher indexing phase:

It consists of entering documents into the system and creating indexes and pointers to facilitate subsequent searches.
The process of loading a document and updating indexes is normally not a concern to the user.
The retrieval and indexing phases are highly interdependent.

WAIS (Wide Area Information Servers):
It enables users to search the content of files for any string of text that they supply. WAIS has three elements:

Client
Server
Indexer

It provides an English-language query front end to a large assortment of databases that contain text-based documents.
It allows users to search the full text of all the documents on the server.
Users on different platforms can access personal, company, and published information, whether text, pictures, voice, or formatted documents, from one interface.
Anyone can use this system because it uses natural language questions to find relevant documents.
The servers take the user's questions and do their best to find relevant documents. WAIS then returns a list of documents, from which the user selects the appropriate ones.
Today, a browser with forms capability, such as Netscape or NCSA Mosaic, is often used as a front end to talk to a WAIS server.

Search Engines:
WAIS is a sophisticated search engine. The purpose of the search engine in any indexing system is to find every item that matches a query, no matter where it is located in the file system. Search engines are now being designed to go beyond the simple, broad-based searches for which WAIS is so popular. Newer engines use both keywords and information searching to rank the relevance of each document. Other approaches to data searching on the web or on other wide area networks are also available.
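
The idea of ranking documents by keyword relevance can be sketched as follows. This is a minimal illustration that assumes the score is simply the number of times the query terms occur in a document; it is not the scoring method of WAIS or of any particular engine, and the function name and sample documents are hypothetical.

```python
from collections import Counter


def rank_documents(query, documents):
    """Return (doc_id, score) pairs, highest keyword score first.

    Assumption: the score is the total count of query terms in the document.
    """
    terms = query.lower().split()
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:  # documents with no matching term are not hits
            scored.append((doc_id, score))
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample collection.
documents = {
    "catalog.txt": "electronic catalog of products and catalog prices",
    "mail.txt": "electronic mail filtering agents",
    "news.txt": "daily news stories",
}
hits = rank_documents("electronic catalog", documents)
```

Here "catalog.txt" ranks first because it contains three occurrences of the query terms, "mail.txt" follows with one, and "news.txt" is left out of the hit list.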

Indexing methods:
To accomplish accuracy and conserve disk space, two types of indexing methods are used by search engines. They are:

File-level indexing
Word-level indexing

File-level indexing:
It associates each indexed word with a list of all files in which that word appears at least once. It does not carry any information about the location of words within the file.
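
A file-level index can be sketched as a mapping from each word to the set of files that contain it. The function name and sample files below are hypothetical, and splitting on whitespace is a simplifying assumption.

```python
from collections import defaultdict


def build_file_level_index(files):
    """Map each word to the set of file names in which it appears."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)  # no position is stored, only the file
    return index


# Hypothetical sample files.
files = {
    "a.txt": "white pages list people",
    "b.txt": "yellow pages list businesses",
}
index = build_file_level_index(files)
files_with_pages = sorted(index["pages"])
```

Because only file names are stored, the index can say that "pages" occurs in both files but not where in each file it occurs.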

Word-level indexing:
It is more sophisticated and stores the location of each instance of the word. The disadvantage of word-level indexing is that all the extra information consumes a lot of disk space: the index can be 35 to 100 percent of the size of the original data. Indexing data is a simple process, and a large number of indexing packages are available. These packages fall into three categories:

The client-server approach
The mainframe-based approach
The parallel-processing approach
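
Returning to word-level indexing, the extra location information is what makes phrase searches possible. The sketch below assumes positions are counted in words from the start of each file; the function names and sample files are hypothetical.

```python
from collections import defaultdict


def build_word_level_index(files):
    """Map each word to {file name: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in files.items():
        for position, word in enumerate(text.lower().split()):
            index[word][name].append(position)
    return index


def find_phrase(index, phrase):
    """Return sorted (file name, start position) pairs where the phrase occurs."""
    words = phrase.lower().split()
    if not words:
        return []
    matches = []
    for name, starts in index.get(words[0], {}).items():
        for start in starts:
            # Every following word must appear at the next consecutive position.
            if all(start + offset in index.get(word, {}).get(name, [])
                   for offset, word in enumerate(words)):
                matches.append((name, start))
    return sorted(matches)


# Hypothetical sample files.
files = {
    "a.txt": "yellow pages and white pages",
    "b.txt": "pages printed on yellow paper",
}
index = build_word_level_index(files)
white_pages_hits = find_phrase(index, "white pages")
```

Storing a position for every occurrence is also why this kind of index takes so much more disk space than a file-level index.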

Search and new data types:
Effective search must now cope with the following new data types:

Hypertext: richly interwoven links among items in displays allow users to move in relatively ad hoc sequences from display to display within multimedia.

Sound: speech input and output, music, and a wide variety of acoustic cues, including realistic sounds that supplement or replace visual communication.

Video: analog or digital video input from multiple media, including videotapes, CD-ROM, incorporated broadcast video tuners, cable, and satellite.

3D images: virtual reality displays offer a 3D environment in which all portions of the user interface are 3D.

Searching these new types of information poses interesting challenges that need to be addressed soon.

Electronic Commerce Catalogs or Directories
A directory performs an essential support function: it guides customers through a maze of options by enabling the organization of the information space.

Directories are of two types:

The white pages
Yellow pages

The white pages are used to locate people or institutions, and the yellow pages are used to locate consumers and organizations.

Electronic white pages:
Analogous to the telephone white pages, the electronic white pages provide services ranging from a static listing of e-mail addresses to directory assistance. White pages directories, also found within organizations, are integral to work efficiency. The problems facing organizations are similar to the problems facing individuals.

A white pages schema is a data model, specifically a logical schema, for organizing the data contained in entries in a directory service, database, or application, such as an address book. A white pages schema typically defines, for each real-world object being represented:

What attributes of that object are to be represented in the entry for that object.
What relationships of that object to other objects are to be represented.
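
As a rough sketch of such a schema, the entry below has a few attributes and one relationship to another object. The class and attribute names are illustrative assumptions and are not taken from X.520, X.521, or any other standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OrganizationalUnit:
    """A division of an organization that person entries can belong to."""
    name: str


@dataclass
class PersonEntry:
    """One white pages entry for an individual (hypothetical attribute set)."""
    # Attributes of the object itself.
    common_name: str
    surname: str
    mail: str
    telephone_number: Optional[str] = None
    # Relationship to other objects: the units this person belongs to.
    member_of: List[OrganizationalUnit] = field(default_factory=list)


# Invented sample data.
sales = OrganizationalUnit(name="Sales")
entry = PersonEntry(
    common_name="Asha Rao",
    surname="Rao",
    mail="asha.rao@example.com",
    member_of=[sales],
)
```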
One of the earliest attempts to standardize a white pages schema for electronic mail use was in X.520 and X.521, part of the X.500 series of specifications, which was derived from the addressing requirements of X.400.

In a white pages directory, each entry typically represents an individual person who makes use of network resources, such as by receiving e-mail or having an account to log into a system.

In some environments, the schema may also include the representation of organizational divisions, roles, groups, and devices.

The term is derived from the white pages, the listing of individuals in a telephone directory, typically sorted by the individual's home location (e.g. city) and then by their name.

White pages through x.500:
One of the first goals of the X.500 project was to create a directory for keeping track of individual electronic mail addresses on the Internet.

X.500 offers the following features:

Decentralized maintenance: Each site running X.500 is responsible only for its local part of the directory.

Searching capabilities: X.500 provides powerful searching capabilities. For example, in the white pages you can search solely for users in one country. From there you can view a list of organizations, then departments, then individual names. This reflects the tree structure of the directory.

Single global name space: X.500 provides a single name space to users.

Structured information framework: X.500 defines the information framework used in the directory, allowing local extensions.

Standards-based directory: X.500 can be used to build directory applications that require distributed information.
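
The country, organization, department, and individual hierarchy described under searching capabilities can be sketched as a nested tree. This is an in-memory illustration only, not the X.500 access protocol; the labels use the customary c, o, ou, and cn prefixes, and all entries and function names are invented for the example.

```python
# Hypothetical directory tree: country -> organization -> department -> person.
directory = {
    "c=IN": {
        "o=Example University": {
            "ou=Computer Science": {
                "cn=Asha Rao": {"mail": "asha.rao@example.edu"},
            },
            "ou=Library": {
                "cn=Vikram Shah": {"mail": "vikram.shah@example.edu"},
            },
        },
    },
    "c=US": {
        "o=Example Corp": {
            "ou=Sales": {
                "cn=Pat Lee": {"mail": "pat.lee@example.com"},
            },
        },
    },
}


def list_children(tree, path):
    """List the entries directly below the node reached by following path."""
    node = tree
    for component in path:
        node = node[component]
    return sorted(node)


def find_people(node, prefix=()):
    """Collect (full name path, attributes) for every person below node."""
    people = []
    for key, child in node.items():
        if key.startswith("cn="):  # person entries are the leaves
            people.append((prefix + (key,), child))
        else:
            people.extend(find_people(child, prefix + (key,)))
    return people


# Search solely within one country, then drill down level by level.
organizations = list_children(directory, ["c=IN"])
departments = list_children(directory, ["c=IN", "o=Example University"])
people_in_india = [path[-1] for path, _ in find_people(directory["c=IN"], ("c=IN",))]
```

Each full path, such as c=IN, o=Example University, ou=Library, cn=Vikram Shah, names exactly one entry, which is the sense in which the directory offers a single name space.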

Electronic Yellow Pages:
The term Yellow Pages refers to a telephone directory of businesses, categorized according to the product or service provided. In 1886 Reuben H. Donnelley created the first official yellow pages directory, inventing an industry. The traditional term Yellow Pages is now also applied to online directories of businesses. To avoid the increasing cost of yellow paper, the yellow background of the pages is currently printed on white paper using ink. Yellow paper is no longer used.

The name and concept of "Yellow Pages" came about in 1883, when a printer in Cheyenne, Wyoming working on a regular telephone directory ran out of white paper and used yellow paper instead.

Today, the expression Yellow Pages is used globally, in both English-speaking and non-English speaking countries. In the US, it refers to the category, while in some other countries it is a registered name and therefore a proper noun.

Third-party directories can be categorized variously:
Basic yellow pages: These are organized by human-oriented products and services.

Business directories: These provide extended information about companies, such as financial health and news clippings.

State business directories: This type of directory is useful for businesses that operate on a state or geographic basis.

Directories by SIC (Standard Industrial Classification): These directories are compiled by the government.

Manufacturers' directories: If your goal is to sell your product or service to manufacturers, then this type of directory is used.

Big-business directory: This directory lists companies with 100 or more employees.

Metropolitan area business directory: It develops sales and marketing tools for specific cities.

Credit reference directory: This directory provides credit rating codes for millions of US companies.

World Wide Web directory: This lists hyperlinks to the various servers scattered around the Internet.

Information Filtering
An Information filtering system is a system that removes redundant or unwanted information from an information stream using (semi)automated or computerized methods prior to presentation to a human user. Its main goal is the management of the information overload and increment of the semantic signal-to-noise ratio. To do this the user's profile is compared to some reference characteristics. A notable application can be found in the field of email spam filters. Thus, it is not only the information explosion that necessitates some form of filters, but also inadvertently or maliciously introduced pseudo-information.

On the presentation level, information filtering takes the form of user-preference-based newsfeeds and similar services. Recommender systems are active information filtering systems that attempt to present to the user information items (movies, music, books, news, web pages) that the user is likely to be interested in.

Information filtering describes a variety of processes involving the delivery of information to people who need it. This technology is needed because of the rapid accumulation of information in electronic databases. Information filtering is needed for e-mail, multimedia distributed systems, and electronic office documents. The features of information filtering are:

Filtering systems involve large amounts of data (gigabytes of text).
Filtering typically involves streams of incoming data, either being broadcast by remote sources or sent directly by other sources, such as e-mail.
Filtering has also been used to describe the process of accessing and retrieving information from remote databases.
Filtering is based on descriptions of individual or group information preferences, often called profiles.
Filtering systems deal primarily with textual information.
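
The features above, a stream of incoming text compared against a profile, can be sketched as follows. The profile here is assumed to be two sets of keywords, one to include and one to exclude; real profiles can be far richer, and all names and sample items are hypothetical.

```python
def matches_profile(text, profile):
    """True if the text has an included keyword and no excluded keyword."""
    words = set(text.lower().split())
    return bool(words & profile["include"]) and not (words & profile["exclude"])


def filter_stream(items, profile):
    """Yield only the incoming items that match the user's profile."""
    for item in items:
        if matches_profile(item, profile):
            yield item


# Hypothetical profile and incoming stream.
profile = {"include": {"commerce", "catalog"}, "exclude": {"lottery"}}
incoming = [
    "new electronic commerce standards announced",
    "win the lottery with this catalog",
    "weather report for tuesday",
]
relevant = list(filter_stream(incoming, profile))
```

Only the first item passes: the second contains an excluded keyword and the third contains no included keyword. Writing the filter as a generator fits the stream-oriented nature of filtering, since items are examined one at a time as they arrive.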

Email filtering:
It is the processing of e-mail to organize it according to specified criteria. Most often this refers to the automatic processing of incoming messages, but the term also applies to the intervention of human intelligence in addition to anti-spam techniques, and to outgoing emails as well as those being received.

E-mail filtering software takes e-mail as its input. For its output, it might pass the message through unchanged for delivery to the user's mailbox, redirect the message for delivery elsewhere, or even throw the message away. Some mail filters are able to edit messages during processing. Common uses for mail filters include removal of spam and of computer viruses. A less common use is inspecting outgoing e-mail at some companies to ensure that employees comply with appropriate laws. Users might also employ a mail filter to prioritize messages and to sort them into folders based on subject matter or other criteria.
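
The three outcomes described above (deliver, redirect, discard) can be sketched as a rule-based filter in which the first matching rule decides what happens to a message. The rule format, folder names, and addresses are assumptions made for illustration and do not correspond to any particular mail program.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Message:
    sender: str
    subject: str
    body: str


DELIVER, REDIRECT, DISCARD = "deliver", "redirect", "discard"

# A rule is (predicate, action, target folder or None); hypothetical format.
Rule = Tuple[Callable[[Message], bool], str, Optional[str]]


def apply_rules(message: Message, rules: List[Rule]) -> Tuple[str, Optional[str]]:
    """Return the (action, target) of the first rule that matches."""
    for predicate, action, target in rules:
        if predicate(message):
            return action, target
    # No rule matched: pass the message through unchanged to the inbox.
    return DELIVER, "inbox"


rules: List[Rule] = [
    (lambda m: "free money" in m.subject.lower(), DISCARD, None),
    (lambda m: m.sender.endswith("@lists.example.org"), REDIRECT, "mailing-lists"),
    (lambda m: "invoice" in m.subject.lower(), DELIVER, "finance"),
]

spam = Message("x@spam.example", "FREE MONEY now", "")
digest = Message("announce@lists.example.org", "Weekly digest", "")
outcome_spam = apply_rules(spam, rules)
outcome_digest = apply_rules(digest, rules)
```

Rule order matters in this design: putting the discard rule first means a spam message is thrown away even if it would also match a later sorting rule.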

Mail-filtering agents:
Users of mail-filtering agents can instruct them to watch for items of interest in e-mail in-boxes, on-line news services, electronic discussion forums, and the like. The mail agent will pull the relevant information and put it in the user's personalized newspaper at predetermined intervals.

An example is Apple's AppleSearch software. Mail filters can be installed by the user, either as separate programs or as part of their e-mail program (e-mail client).

In e-mail programs, users can make personal, "manual" filters that then automatically filter mail according to the chosen criteria. Most e-mail programs now also have an automatic spam filtering function. Internet service providers can also install mail filters in their mail transfer agents as a service to all of their customers. Corporations often use them to protect their employees and their information technology assets.

News-filtering agents:
These deliver real-time on-line news. Users can indicate topics of interest, and the agent will alert them to news stories on those topics as they appear on the newswire. Users can also create personalized news clipping reports by selecting from news services. Consumers can retrieve their news through the delivery channel of their choice, such as fax, e-mail, a WWW page, or the Lotus Notes platform.

Published date : 29 Jul 2015 02:46PM
