To start thinking about how technology can be applied toward achieving this goal, consider two widely used metaphors for better communications devices: the picture phone and desktop videoconferencing. The picture phone, a telephone with video capability, has been a popular (though sometimes maligned) concept since the AT&T demonstrations in the 1960s. By itself, a pair of picture phones is limited to two participants and only partially transcends distance and physical boundaries. The need for a handset for sound, and the very small picture relative to the people and surroundings, make the participants very conscious that they are using a special device. (Use of the picture phone might be compared to use of a telescope: there is noticeable benefit, but little illusion.) In the desktop videoconferencing approach, a personal computer or workstation is augmented with audio and video. As with the picture phone, there is little attempt to mask the obvious boundaries between sites or to present an illusion of shared space. However, the augmented computer provides tremendous communication capabilities and will be the most cost-effective approach for many users.
We don't mean to imply that picture phones and desktop videoconferencing systems are inherently limited to a pair of participants. Just as it is possible to have conference calls with multiple telephones, various approaches may be used to extend the number of participants in picture phone and desktop conferences. Depending on the application, such multipoint conferences may be fundamental to effective communication.
To transcend the physical boundaries, first imagine an environment without the boundaries and then attempt to extend the environment beyond its normal limits. For example, try to think of stretching a conference room across multiple sites. First, we want audio provided in such a way as to allow hands-free group discussion. Typically, this means using multiple microphones and speakers with appropriate acoustic controls. Second, we need video cameras and large monitors to make the participants easily visible to each other. And, in many cases, we must provide for shared presentation materials, documents, marker boards, and so forth, so that most of the routine meeting facilities are shared across the multiple sites.
Similarly, imagine a classroom, a medical practice, or a brokerage house stretched across multiple sites. We want to provide what seem to be single-site facilities across multiple sites, so that physical boundaries are not barriers to the participants in the activities.
These metaphors, and the notion that an illusion of shared space can be achieved, set high expectations for system performance. Also, we are used to the audio quality established by telephones and radio and the video quality provided by commercial television. If the conferencing equipment does not have comparable audio and video quality, the illusion will be diminished.
Achieving this performance requires real-time transfer of large amounts of audio, video and data, orders of magnitude more than the quantities associated with telephones. The transfer must be directed to specific sites and often must be secure, so the broadcast technology associated with television is not appropriate. Thus the biggest obstacle to pervasive use of videoconferencing is the gap between the communication requirements and the limitations of the available communication infrastructure. Much of our discussion in this book is about narrowing and eliminating that gap.
Despite difficult gaps between communication requirements and capability, videoconferencing is practical and rapidly growing in popularity. Business meetings are effectively conducted by joining desks and conference rooms with videoconferencing equipment. Distance learning across multiple classrooms and campuses is now a routine practice. Telemedicine enables specialists and general practitioners to collaborate, and provides medical care in rural areas that would otherwise do without. Employers and job candidates meet without either having to travel for a face-to-face meeting. Arraignments and other legal proceedings are conducted by videoconference. Few of the participants in any of these situations have the illusion that they are located at the same site. But many of them forget they are at multiple sites and proceed as if they were all together.
Let's take a closer look at direct costs. It is feasible to equip a conference room with a reasonable video system for roughly twenty thousand dollars. Depreciated over three years, that amount comes to well under a thousand dollars per room per month. Charges for intra-continental long distance communication for the video systems can easily be kept well under a hundred dollars an hour. So even if a room videoconferencing system is used only once a month, it will likely cost less than the direct travel costs for a meeting. These are fairly conservative figures; some users will see better cost benefits. With more frequent use, the direct cost comparison clearly favors videoconferencing over travel. Similar arguments can be used to justify the costs of desktop videoconferencing systems; in this case, the cost benefit may be realized sooner, since desktop systems are much less expensive.
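The arithmetic above can be made concrete with a short Python sketch. It uses the text's figures ($20,000 of equipment depreciated straight-line over 36 months, line charges at the $100-per-hour upper bound) and assumes a hypothetical two-hour meeting; the constant and function names are illustrative only. Spreading the monthly depreciation over the number of meetings held that month shows why more frequent use favors videoconferencing.

```python
EQUIPMENT_COST = 20_000     # dollars, from the text
DEPRECIATION_MONTHS = 36    # three years, from the text
LINE_COST_PER_HOUR = 100    # upper bound on line charges, from the text
MEETING_HOURS = 2           # hypothetical meeting length (assumption)


def cost_per_meeting(meetings_per_month: int) -> float:
    """Monthly depreciation spread over the month's meetings, plus line charges."""
    monthly_depreciation = EQUIPMENT_COST / DEPRECIATION_MONTHS
    return monthly_depreciation / meetings_per_month + MEETING_HOURS * LINE_COST_PER_HOUR


for n in (1, 2, 4, 8):
    print(f"{n} meeting(s) per month: ${cost_per_meeting(n):,.0f} per meeting")
```

One meeting a month comes to roughly $756; at eight meetings a month the figure falls to about $269 per meeting, most of which is line charges. The text gives no travel figure, so the comparison with travel is left to each organization's own costs.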
Of course, you could say it would be more cost-effective to use telephones. But in many circumstances, telephones are insufficient. Visual contact between people may or may not be the qualitative difference that makes an activity effective. Those who participate in multiway telephone conferences know that communication is seriously impaired without visual contact between people and without shared access to documents, visual aids, diagnostic equipment, stock tickers, and so on. To help reduce these barriers, audiographics systems have been developed to augment telephones with graphics such as shared documents. For some activities, audiographics may be sufficient. We believe that audiographics is a major aspect of videoconferencing, and that motion video is becoming affordable enough that most applications will include it. Much of the discussion in this book is not about motion video per se, but about the aspects of videoconferencing encompassed by audiographics.
Cost of travel versus cost of videoconferencing is often not the correct comparison. Videoconferencing is more than just a replacement for travel; it is an enabler of communication that otherwise would not take place. Physical meetings are necessary from time to time, but videoconferencing users can make more electronic trips in a day (or week) than physical ones, and with much less wear and tear. The telephone is still useful, but when it is insufficient and a physical meeting is not possible, videoconferencing technologies allow meetings that would otherwise fail, or perhaps not even be attempted.
There are some limits that will likely not be overcome. Some individuals are reluctant to be on camera and resist the new technology, just as some avoid telephones and airplanes. The boundaries between the sites of a conference are visible; they inhibit some activities (e.g., side conversations during a meeting) and preclude others (e.g., physical contact).
As with other new technologies, estimating the extent and pace of usage growth is necessarily guesswork. Analogies to the computer industry have significant defects, but are still useful. Some have said that videoconferencing is of limited value and that few systems will be deployed. When IBM began making computers, there were serious questions about whether more than a few tens of computers would ever be sold and used! Rapidly increasing sales of videoconferencing systems contradict the minimal-usage predictions. At the other extreme, some suggest that videoconferencing is the next killer application (in the sense that the computer spreadsheet was the killer application that spawned the personal computer market) and will drive demand for computers and communication lines. For the next few years, at least, there are enough obstacles to rule out the killer-application scenarios. But it is reasonable to expect growth sufficient to strain the delivery capacity of equipment suppliers and communication lines. In the personal computer industry, local area networks seemed ready for widespread use every year from 1984 onward; each new year was declared "the Year of the LAN." Local area networks became pervasive by 1989. For several years now, analysts have forecast widespread availability of videoconferencing. Some year soon, the forecasts will become reality, without a recognizable "Year of Videoconferencing."
Sounds audible to humans have frequencies up to roughly 20,000 cycles per second, or Hertz (abbreviated Hz). The sounds required for speech use a much smaller frequency range, up to roughly 3500 Hz. The telephone network is designed to transmit sounds in this smaller frequency range. Originally, this was done with analog signals, in which the strength of the signal on the telephone wires is directly analogous to the loudness of the sound, and the voltage of the signal alternates (between positive and negative) at the frequency of the sound. Most telephone service for residences and small organizations today uses the same analog conventions that were established early in the twentieth century.
Analog telephone service is not well suited to sophisticated connection of calls, whether local or long distance. Also, when analog signals are sent long distances, the quality of the signals always degrades. For these and other reasons, long distance telephone service and private branch exchanges (PBXs), which connect telephones within large organizations, began converting to digital signals in the 1960s. Essentially all long distance service is digital now, as is most PBX service.
For home telephones and other phones connected directly to the telephone company switching facility, the last mile (the circuit from the home to the switching facility) is usually still analog. The telephone uses analog signals, which are converted to and from digital signals at the switching facility. When these circuits are used for fax and computer purposes, digital information must be converted to and from analog at both ends, because the switching facility always performs its own analog/digital conversion. Fax machines and computers use modems (MOdulator/DEModulators) to perform the conversion. The maximum achievable data rate for modems using analog telephone circuits appears to be about 34,000 bits per second.
Digital representations of sounds use numbers, usually called samples, to represent the loudness of the sound at successive instants. The number of bits per sample, which sets the range of values a sample can take, determines the achievable signal-to-noise ratio. Seven-bit samples are sufficient for speech, and 16-bit samples are sufficient for a high-fidelity representation of music. Slightly more than two samples per cycle of the highest frequency present are sufficient to get a good representation. Thus, for speech (up to roughly 3500 Hz), roughly 8000 seven-bit samples per second are enough. For music (up to roughly 20,000 Hz), roughly 40,000 sixteen-bit samples per second are needed for high fidelity; compact discs use 44,100 sixteen-bit samples per second, and professional recording equipment uses 48,000 sixteen-bit samples per second.
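The sampling arithmetic can be sketched in a few lines of Python. This is an illustration rather than a description of any particular product: `pcm_bit_rate` and `min_sample_rate` are hypothetical helper names, and the rates are per audio channel unless `channels` is given (a stereo compact disc carries two channels, doubling its rate).

```python
def pcm_bit_rate(samples_per_second: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed bit rate, in bits per second, of sampled audio."""
    return samples_per_second * bits_per_sample * channels


def min_sample_rate(highest_frequency_hz: int) -> int:
    """Two samples per cycle of the highest frequency; practical rates sit a bit above this."""
    return 2 * highest_frequency_hz


print(min_sample_rate(3_500))         # speech band: 7000 samples/s minimum
print(pcm_bit_rate(8_000, 7))         # telephone-style speech
print(pcm_bit_rate(44_100, 16))       # one compact-disc channel
print(pcm_bit_rate(44_100, 16, 2))    # stereo compact disc
print(pcm_bit_rate(48_000, 16))       # one professional-recording channel
```

Speech at 8000 seven-bit samples per second comes to 56,000 bits per second, while a single compact-disc channel needs 705,600 and stereo twice that.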
Digital telephone systems are designed to handle connection and transmission of many channels of 56,000 (8000 x 7) or 64,000 (8000 x 8) bits per second, each channel carrying the sounds of one telephone conversation. (In the U.S., 56,000 bit channels were used originally, but the trend worldwide is toward 64,000 bit channels.) A 64,000 bit channel is referred to as a B-channel (B for bearer); a 56,000 bit channel is referred to as restricted. A typical telephone line in urban areas is capable of carrying a pair of B-channels. If video is going to travel on the telephone network, it should fit within a few of these B-channels. As we will see in discussing the components of video signals, transmitting video within a reasonable number of B-channels is a significant challenge.
A picture on a television screen consists of many small dots, called pixels (picture elements). These are intended to be small enough that only the composite picture is seen, not the dots, but the pixels are readily visible if one looks closely at the screen. For North American broadcast systems, there is a maximum of roughly 360 to 400 pixels per row. There are roughly 480 visible rows broadcast, but most televisions show slightly more than half the rows. For videoconferencing (world-wide), a standard picture consists of 352 pixels horizontal resolution by 288 pixels vertical resolution. A single 352 by 288 picture is usually referred to as a frame.
352 x 288 equals 101,376 pixels. Directly representing a full-color pixel requires 24 bits (eight bits for each of the three primary colors of light: red, green and blue). Thus one picture could take 101,376 x 24 = 2,433,024 bits. Motion requires 15 to 30 frames per second, so full-color, full-motion video at standard resolution could require up to 73 million bits per second (at 30 frames per second), well over a thousand B-channels! Fortunately, there are bridges across this apparent chasm.
By discarding less important information (for example, using far fewer than 24 bits of color per pixel) and coding the information (for example, sending only the differences between frames rather than entire pictures), it is possible to send a tiny fraction of those 73 million bits and still get good results. A pair of B-channels, carried on one telephone circuit, gives acceptable results for many uses. With today's coding technology, six B-channels (three telephone circuits) are enough to get very good results. Using more than six B-channels, say 12 or more, gives excellent results.
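The same calculation can be written as a Python sketch, assuming the 352 by 288 frame, 24 bits per pixel, and 64,000-bit B-channels described in the text; `raw_video_bps` is a hypothetical name. Rounding up gives the whole number of B-channels an uncompressed stream would occupy at each end of the 15 to 30 frames-per-second range.

```python
import math

B_CHANNEL_BPS = 64_000  # one B-channel, bits per second


def raw_video_bps(width: int, height: int, bits_per_pixel: int, frames_per_second: int) -> int:
    """Bit rate of uncompressed video: every pixel of every frame sent in full."""
    return width * height * bits_per_pixel * frames_per_second


for fps in (15, 30):
    bps = raw_video_bps(352, 288, 24, fps)
    channels = math.ceil(bps / B_CHANNEL_BPS)  # whole channels needed
    print(f"{fps} frames/s: {bps:,} bits/s, {channels:,} B-channels")
```

At 30 frames per second the raw stream is 72,990,720 bits per second, or 1,141 B-channels; even 15 frames per second needs 571.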
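To see how much reduction the coding must achieve, divide the raw rate by the capacity of the channels in use. The sketch below assumes 30 frames per second of uncoded 24-bit color at 352 by 288 and, for simplicity, ignores the share of the channels needed for audio and control; `required_compression` is a hypothetical name.

```python
RAW_BPS = 352 * 288 * 24 * 30  # about 73 million bits/s of uncoded video
B_CHANNEL_BPS = 64_000         # one B-channel, bits per second


def required_compression(b_channels: int, raw_bps: int = RAW_BPS) -> float:
    """Factor by which raw video must shrink to fit in the given number of B-channels."""
    return raw_bps / (b_channels * B_CHANNEL_BPS)


for n in (2, 6, 12):
    print(f"{n:2d} B-channels ({n * 64} kbit/s): about {required_compression(n):,.0f}:1")
```

The ratios come out to about 570:1 for two B-channels, 190:1 for six and 95:1 for twelve, which is why both discarding information and coding frame differences are needed.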
The most aggressive coding techniques have led to products intended for use with modems and analog telephone lines, using about 20,000 bits per second for the video. These products use lower pixel resolution (176 x 144 or lower) and low frame rates. They are becoming available in 1996 bundled with personal computers sold for home use, so they are likely to be present in large quantities. It is unknown how well low resolution and low frame rates will be accepted for home use. For some families, even low-quality video will be a wonderful benefit in allowing members to see each other, while in other situations the resolution and frame rate will be too limited to be considered valuable. Low resolution and frame rates are not likely to be considered sufficient for serious applications.
Coding techniques can also be used to reduce the bandwidth required for audio, but not by such dramatic factors. Instead of using a full B-channel for audio, as implied earlier, it is practical to get speech quality audio in as little as one tenth of a B-channel.
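A short sketch shows why audio coding still matters on a small number of channels: whatever the audio does not use is left for video. It assumes audio and video simply share the aggregate B-channel capacity, ignoring framing and control overhead; `video_budget` is a hypothetical name.

```python
B_CHANNEL_BPS = 64_000  # one B-channel, bits per second


def video_budget(b_channels: int, audio_bps: int) -> int:
    """Bits per second left for video after reserving the audio stream."""
    total = b_channels * B_CHANNEL_BPS
    if audio_bps > total:
        raise ValueError("audio alone exceeds the available channels")
    return total - audio_bps


coded_audio = B_CHANNEL_BPS // 10  # one tenth of a B-channel: 6,400 bits/s
for n in (2, 6):
    uncoded = video_budget(n, B_CHANNEL_BPS)  # audio uses a full B-channel
    coded = video_budget(n, coded_audio)
    print(f"{n} B-channels: {uncoded:,} bits/s for video with uncoded audio, {coded:,} with coded audio")
```

On two B-channels, uncoded audio leaves 64,000 bits per second for video, while audio at one tenth of a B-channel leaves 121,600.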
The primary alternative to using the telephone network directly is to use local area networks and other networks designed for computer communication. The most overused phrase of 1994 was "the Information Superhighway," so we abuse the phrase a few more times to depict the bumps in that road to videoconferencing! The telephone network is based on circuit switching, which means that once a telephone call (or videoconference) is established, there are circuits (B-channels) dedicated to the call. Most computer networks are based on packet switching, which means that packets (packages) of data from many sources travel on the same network, much like boxes on a conveyor belt or trucks on a highway. As long as the packets flow smoothly, a computer network is a very good highway for audio and video data, providing capacity equivalent to many B-channels. However, there are almost always traffic jams on the information highway. When the jams are minor, audio and video can get through in time and things work well. When the jams are major, conversation is halted. Improving computer networks to manage traffic jams, and improving videoconferencing technology to mask their effects, are major thrusts of current development.
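One common way for videoconferencing software to mask minor traffic jams is a playout (jitter) buffer: the receiver deliberately delays playback by a fixed amount so that packets held up in transit still arrive before they are needed. The text does not describe this mechanism; the Python sketch below illustrates it under simplifying assumptions (sender and receiver share a clock, the playout delay is fixed rather than adaptive, and the packet trace is invented). `Packet` and `late_packets` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int         # sequence number
    sent_ms: int     # when the sender emitted the packet
    arrived_ms: int  # when it reached the receiver


def late_packets(packets: list[Packet], playout_delay_ms: int) -> list[int]:
    """Sequence numbers of packets that miss their playout deadline.

    Each packet is scheduled to play at sent_ms + playout_delay_ms; a packet
    arriving after that moment is too late to be played and is discarded.
    """
    return [p.seq for p in packets if p.arrived_ms > p.sent_ms + playout_delay_ms]


# Hypothetical trace: one packet every 20 ms, with varying network delay.
delays_ms = [40, 45, 120, 50, 42, 200, 48, 41]
trace = [Packet(i, i * 20, i * 20 + d) for i, d in enumerate(delays_ms)]

for delay in (60, 150, 250):
    print(f"playout delay {delay} ms -> late packets {late_packets(trace, delay)}")
```

With a 60 ms playout delay, packets 2 and 5 arrive too late; 150 ms rescues packet 2, and 250 ms rescues both. The cost is that every millisecond of buffering is added to the conversational delay, so large delays cannot be hidden without making conversation noticeably awkward.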
In 1996 there has been a plethora of products and prototypes for telephony on the Internet. As Internet telephony becomes practical and standardized, this will benefit Internet videoconferencing. We will discuss this further in Chapters 2 and 13.
Figure 1.1 - Roll-about Videoconferencing System
Figure 1.1 shows a representative roll-about system circa 1996. This is a good example to start with, from which we can consider the commonality and extensions to both desktop and larger room environments. A roll-about system is a medium scale system intended for use by small groups in typical meetings. It is transportable from room to room, as long as the room has appropriate connections to telephone or local area networks.
The cabinet under the television monitor houses a personal computer and additional equipment to support conferencing. In most cases, the additional equipment is installed inside the personal computer. The camera on top of the monitor is motorized to enable convenient positioning (pan, tilt, zoom in/out) by the participants.
The core technology is based on an industry standard personal computer, with added support for audio and audio coding, motion video and video coding, and communication protocols and interfaces. There are two major benefits of beginning with the PC. First, by taking advantage of the mass production and low cost of the PC, many of the conferencing functions can be provided by the PC hardware and software at relatively low cost. Second, integrating the PC makes its normal capabilities directly available to the conference participants.
The television monitor is well suited to motion video, but has relatively few pixels per inch (roughly thirty) compared to a computer monitor (roughly 75 pixels per inch). With a group system and this relatively low-resolution monitor, it is usually appropriate to devote the full screen to motion video from the remote site(s). Displaying shared presentation materials, shared marker boards, shared computer applications, etc., requires either switching the monitor away from motion video or overlaying the video with these alternate images. (This is analogous to combining a forecaster and a weather map on a newscast.)
For an individual, a desktop system is both more manageable in physical size and more functional. The user is usually sitting much closer to the monitor than with a group system. Rather than dedicating the full screen to motion video from a remote site, the remote site video is shown in a window, of perhaps 352 by 288 pixels out of a total of 1024 by 768 pixels. The leftover screen pixels can be used for a local site (preview) video window, shared presentation materials, computer applications, and other images. Figure 1.2 depicts a personal computer display with a collection of such windows.
Figure 1.2 - Desktop Windows Example
For a larger scale room system, two or more monitors are used to allow for display of multiple sources of video, shared presentation materials and computer applications. Figure 1.3 shows such a system.
Figure 1.3 - Large Scale Videoconferencing System
Business meetings. Larger companies with multiple locations have videoconferencing rooms at each of their locations. They may even have videoconferencing systems in most of their conference rooms. The systems used are typically mid-scale, larger than the roll-about shown in Figure 1.1 but not as complete as the system in Figure 1.3. The rooms and videoconferencing equipment are scheduled as part of the overall meeting scheduling. Use for intra-company meetings is probably much more prevalent than for meetings involving other companies. One explanation for this tendency toward internal meetings is that the participants are familiar with each other: people seem to be more comfortable using videoconferencing with people they already know than as part of a first meeting. Once companies become familiar with videoconferencing capabilities, they use it extensively. A large financial institution installed systems in two large cities 1500 miles apart, anticipating their use in certain emergencies, with dedicated communication circuits already in place. The systems quickly came into use six hours per day for collaboration among groups that had been unable to get travel funds to visit each other. One large multinational company reportedly spent $500,000 on long distance charges for videoconferencing in 1994. That figure suggests that the company keeps dozens of videoconferencing facilities busy at least half the business day.
Distance Learning. Most state universities in the United States have multiple campuses, typically a primary campus and several additional campuses, and most of these have videoconferencing facilities for instructional purposes. It is often the case that a single instructor can conduct a class simultaneously across several of the campuses by videoconference. Broadcast lectures sent from a central site have set the precedent for decades. The broadcast approach has obvious limitations, for example, in interaction between the instructor and students and in shared use of marker boards. Current videoconferencing approaches can overcome these limitations and also provide capabilities analogous to traditional classroom facilities. For example, student response terminals can be used not only to give the effect of raising hands but also to communicate specific responses. The same approaches apply outside of universities, of course, in corporate, government and other learning and training environments.
Professional Conferences. Is a professional conference a business meeting or a learning event? In many cases it is both of these and more. The number of conferences is both daunting and tempting for many of us because there are many more interesting conferences than we can attend. (We can't spend full time going to conferences.) Fortunately, it is more and more common for major portions of technical conferences and similar meetings to be available, at least with audio and video, on the Internet. With Internet video availability, I can easily (and discreetly) attend the most interesting portions of a technical conference without leaving my office.
Telemedicine. Medical practice is tending toward higher degrees of specialization, along with increasing numbers of general practitioners. In rural or remote areas, there may be no physicians at all. Videoconferencing is being used to allow specialists and generalists to collaborate more effectively in diagnosis and treatment, not only by allowing physicians at different sites to view a single patient, but also by letting them share radiographic and other diagnostic information and instrumentation. Even in urban areas, it is feasible to use videoconferencing systems for patient monitoring, potentially allowing patients to remain at home. Again, a patient/physician visit would communicate not just motion video but also diagnostic information (heart rate, blood pressure, temperature, etc.) from instruments that can be jointly managed during the visit.
Financial. There is an obvious surge toward electronic funds transfer, automated teller machines and home banking services via telephone and personal computer. On the other hand, there are still some services, for example, opening/closing accounts, loan application, and so forth, that seemingly require human intervention, possibly with more senior personnel. In many instances, these activities can be handled at a branch location via videoconferencing kiosks. In brokerage firms, traders often have a plethora of computing and information devices on their desks. One of these devices is likely to be a videoconferencing system, used for contact with other brokers, with clients and with sources of information.
Product Support. Remote support of products by telephone is routine in many industries, covering everything from household appliances to manufacturing systems. In many cases, the customer problems are much more readily resolved if the support person can see the product and/or the customer can see visual examples from the support person. For sufficiently expensive products, manufacturers find it worthwhile to include videoconferencing equipment with the products to better enable remote support.
Legal. Some preliminary legal proceedings, such as arraignments and depositions, are handled by videoconference. The New York District Attorney's office uses desktop videoconferencing systems for arraignments. Rather than requiring the arresting officer to appear in person in court, the officer participates by videoconference, lessening travel, scheduling and other difficulties.
Employment Interviews. Travel for employment interviews is often a significant impediment for both applicant and employer, especially for initial interviews. By using videoconferences for initial interviews, the employer is able to better evaluate a pool of candidates, then have on-site interviews as warranted.
Sales Kiosks. Direct sales of many kinds of merchandise via telephone and television have been a significant trend in recent years. The customer and the salesperson often desire better communication than is achievable with a telephone call, and desire an interactivity not possible with the television shopping networks. Videoconferencing kiosks allow the centralization efficiency of telemarketing, with added communication benefits for both parties.
System costs. Semiconductor and computing performance, and cost/performance ratio, continue to improve without foreseeable barriers. These trends have a twofold benefit to the effectiveness and cost of videoconferencing equipment. First, the capabilities and costs of specialized hardware follow the general trend. Second, the capabilities of mass produced computers more adequately match the needs of videoconferencing, reducing the need for specialized equipment.
New Applications. As availability and capability improve, many new applications will be found. Telecommuting will become a reality for a noticeable fraction of the work force. Entertainment and social events will be augmented by, or even based on, videoconferencing.
New Approaches. The increase in communication and computing capability will stimulate improvements in providing the illusion of shared space. For example, with current technology, conferences amongst multiple sites typically allow only one or a few of the sites to be seen concurrently, often in an unrealistic fashion. Better technology will enable more sites to be visually active, and may allow virtual reality approaches to presenting the many sites as a shared space.
The first words of many multiway telephone conferences are "Does everyone have a copy of the faxed charts?"
For most people, any such reluctance goes away with experience.
In the computer industry, it is normal to use K (kilo) and M (mega) to represent 1024 and 1,048,576, respectively. In the communications industry, it is normal to use these to represent 1000 and 1,000,000, respectively. For example, a 56,000 bit per second line would often be referred to as 56K. In general, we will limit our use of these abbreviations, so as to avoid confusion.
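The difference between the two conventions is small but real, as this Python sketch shows; the constant names are illustrative only.

```python
COMM_K, COMM_M = 1_000, 1_000_000          # communications usage
COMPUTER_K, COMPUTER_M = 1_024, 1_048_576  # computer usage: 2**10 and 2**20

line_rate = 56 * COMM_K      # a "56K" line: 56,000 bits per second
misread = 56 * COMPUTER_K    # the same label read the computer way
print(f"{line_rate:,} vs {misread:,} (difference {misread - line_rate:,})")
```

Read the computer way, "56K" would be 57,344 bits per second, 1,344 more than the line actually carries.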
There are many differences in numbers of visible pixels, based on the broadcast standard in use (there are three standards used by different countries) and the design of the television receiver.