This thesis provides a framework for integrating non-speech sound into human-computer interfaces. Previously there was no structured way of doing this; it was done in an ad hoc manner by individual designers, which led to ineffective uses of sound. To add sounds that improve usability, two questions must be answered: what sounds should be used, and where is it best to use them? With these answers a structured method for adding sound can be created.
An investigation of earcons as a means of presenting information in sound was undertaken. A series of detailed experiments showed that earcons were effective, especially if musical timbres were used. Parallel earcons, in which two earcons are played simultaneously, were also investigated, and an experiment showed that they could increase sound presentation rates. From these results, guidelines were drawn up for designers to use when creating usable earcons. These formed the first half of the structured method for integrating sound into interfaces.
An informal analysis technique was designed to investigate interactions and identify situations where hidden information existed and where non-speech sound could be used to overcome the associated problems. Interactions were considered in terms of events, status and modes to find hidden information. This information was then categorised in terms of the feedback needed to present it. Several examples of the use of the technique were presented. This technique formed the second half of the structured method.
The structured method was evaluated by testing sonically-enhanced scrollbars, buttons and windows. Experimental results showed that sound could improve usability by increasing performance, reducing time to recover from errors and reducing workload. There was also no increased annoyance due to the sound. Thus the structured method for integrating sound into interfaces was shown to be effective when applied to existing interface widgets.
The combination of visual and auditory information at the human-computer interface is a natural step forward. In everyday life both senses combine to give complementary information about the world; they are interdependent. The visual system gives us detailed data about a small area of focus whereas the auditory system provides general data from all around, alerting us to things outside our peripheral vision. The combination of these two senses gives much of the information we need about our everyday environment. Dannenberg & Blattner (pp xviii-xix) discuss some of the advantages of using this approach in multimedia/multimodal computer systems: "In our interaction with the world around us, we use many senses. Through each sense we interpret the external world using representations and organisations to accommodate that use. The senses enhance each other in various ways, adding synergies or further informational dimensions". They go on to say:
"People communicate more effectively through multiple channels. ... Music and other sound in film or drama can be used to communicate aspects of the plot or situation that are not verbalised by the actors. Ancient drama used a chorus and musicians to put the action into its proper setting without interfering with the plot. Similarly, non-speech audio messages can communicate to the computer user without interfering with an application". These advantages can be brought to the multimodal human-computer interface. Whilst directing our visual attention to one task, such as editing a document, we can still monitor the state of other tasks on our machine. Currently, almost all information presented by computers uses the visual sense. This means...