How to work with the Newton Text-to-Speech extension

Note: this file originally appeared on the Newton Underground site (http://www.newton-underground.com/dev/a0000003.shtml, later moved to http://resources.pdadash.com/newtund/NU/dev/). Since it no longer seems to be available, I have uploaded it here, and commented out or modified links as appropriate. Steve

Article 00003.1
How to work with the Text-to-Speech extension. Contributed by William Nelson <will@newton-underground.com> and Jake Bordens <jake@newton-underground.com>

Note: This article is presented for informational purposes only. We cannot provide the TTS extensions themselves, nor can we tell you where to find them. Hopefully, Apple/Newton will realize their potential and release them publicly.

Working with the Text-to-Speech extensions for OS 2.1 Newtons is not difficult - currently, the most difficult aspect of integrating TTS support into your applications is locating the extensions themselves, which are not publicly available. They are, however, fairly widely dispersed amongst the user community, and most users who really want them have found them.

The pre-beta version of TTS that has been circulating consists of two autoparts: Macintalk, which is the actual speech codec, and SpeakText:Newton, which installs as a transport and routes text to Macintalk for speaking with a given set of preferences for voice type, rate, pitch, and so on. However, SpeakText is unnecessary for TTS functionality; it's nothing more than a nice global hook for routing text to Macintalk. When incorporating TTS support into your applications, you'll want to send text to Macintalk directly, with the appropriate control codes.

With Macintalk installed, playSound(textstring) will produce spoken results. So for instance,

playSound("Hello, this is Newton");

someText := "12:00";
playSound(someText);

playSound("It is now " & someText & " o'clock");

will all result in spoken output.

Note that raw Macintalk speaking of this sort is quite low in volume, so you'll want to increase the volume, preferably by using the delimited volume command (see Jim Bailey's article, More Text to Speech), or by bracketing any spoken text with a routine that raises system volume to the maximum and then reduces it to the user preference, e.g.

thevolume := GetVolume();
SetVolume(4);
playSound(yourtext);
SetVolume(thevolume);
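
The bracketing approach can be packaged so that the user's volume is restored even if speaking throws an exception. This is only a minimal sketch: kSpeakLoudFunc is a hypothetical name (not part of Macintalk or the Newton OS), and like the lines above it assumes Macintalk is installed and that playSound does not return until the text has been spoken.

```newtonscript
// Hypothetical helper: speak text at maximum volume, then restore
// the user's own volume setting.
DefConst('kSpeakLoudFunc,
func(text)
begin
	local savedVolume := GetVolume();	// remember the user's setting
	try
		SetVolume(4);			// same maximum value as used above
		playSound(text);		// Macintalk speaks the string
	onexception |evt.ex| do
		nil;				// ignore errors; volume is restored below
	SetVolume(savedVolume);
end
);
```

It would be called as: call kSpeakLoudFunc with ("Hello, this is Newton");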

The default voice type is "Fred", and the rate and pitch controls default to mid-range values. However, you can easily produce speech in any of the 9 available voices, and across a wide range of pitches and speaking rates. To control these options, embed control codes in the text string; Macintalk parses them and responds accordingly. Any text between double brackets -- [[any text]] -- that is sent to Macintalk via playSound as part of a text string will not be spoken, but rather parsed by Macintalk for control codes. This is true even if the bracketed text is in an invalid format.

Jim Bailey's More Text to Speech has a full glossary of controls, so consult it for an in-depth discussion. As an example, [[svox xxxx]] will cause text to be spoken in voice xxxx, where xxxx is one of the following nine voices:

fred - Fred, the default; sort of like Kermit the Frog
zarv - Zarvox, a kind of spacey computer voice
ralf - Ralph, a deep-voiced Fred
junr - Junior, a bit like Fred's son
kath - Kathy, Fred's wife (can sound "sexy" at the lower pitches)
whis - Whisper, a kind of creepy whisper voice
prin - Princess, another (higher) female voice
gnws - Good News, essentially sings text
bnws - Bad News, sings text to a dirgelike tune

So, for example,

playSound("[[svox zarv]] Hello, this is Newton");

will speak that sentence in Zarvox's voice.
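
Since an invalid control code is silently ignored, a mistyped voice name simply falls back to whatever voice is current. If you take the voice from a user preference, you may want to check it first. The following is a sketch only: kWithVoiceFunc is a hypothetical name, and the list of codes is copied from the table above.

```newtonscript
// Hypothetical helper: prefix text with an [[svox ...]] control code,
// falling back to the default voice if the code is not in the list.
DefConst('kWithVoiceFunc,
func(voice, text)
begin
	local known := nil;
	foreach v in ["fred", "zarv", "ralf", "junr", "kath",
			"whis", "prin", "gnws", "bnws"] do
		if StrEqual(v, voice) then known := true;
	if not known then voice := "fred";	// default voice
	"[[svox " & voice & "]] " & text;	// value of the last expression is returned
end
);
```

For example: playSound(call kWithVoiceFunc with ("zarv", "Hello, this is Newton"));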

Putting it all together

Two things to note about the control codes are that 1) they may be placed anywhere within the text string to be spoken; and 2) multiple control codes are possible. So for instance, this text string will be spoken as intended:

playSound("[[svox gnws]][[pbas -10]][[rate +200]] Hello, this is [[svox zarv]][[rate -500]] Newton");
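
Because the codes are plain text, a prefix can be assembled from a frame of preferences rather than typed by hand. This is a rough sketch under stated assumptions: kSpeechPrefixFunc and the slot names voice, pitch and rate are made up for illustration, and only the svox, pbas and rate codes shown in this article are used.

```newtonscript
// Hypothetical helper: build a control-code prefix from a frame such as
// {voice: "zarv", pitch: 45, rate: 100}. Missing slots are skipped.
DefConst('kSpeechPrefixFunc,
func(prefs)
begin
	local prefix := "";
	if prefs.voice then prefix := prefix & "[[svox " & prefs.voice & "]]";
	if prefs.pitch then prefix := prefix & "[[pbas " & prefs.pitch & "]]";
	if prefs.rate then prefix := prefix & "[[rate " & prefs.rate & "]]";
	prefix;		// value of the last expression is returned
end
);
```

A caller might then write:

```newtonscript
local prefix := call kSpeechPrefixFunc with ({voice: "zarv", pitch: 45});
playSound(prefix & " Hello, this is Newton");
```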

Sample Routine:

The following is a sample NewtonScript snippet that I wrote for use in GestureLaunch. It speaks the user's highlighted text in the voice of their choice (parm, a four-letter text string such as zarv or kath). It could easily be keyed to an on-screen button, with similar preferences set for rate and pitch.

begin
	local hilitedText;
	local hiliteOffsets;
	local thevolume;
	local voxchoice;

	thevolume := GetVolume(); // Get current volume setting
	voxchoice := Clone(parm); // Get the user's choice of voice type
	hiliteOffsets := GetHiliteOffsets(); // Get the hilited section

	// Nothing to speak if there is no hilited text
	if not (ClassOf(hiliteOffsets) = 'array) or Length(hiliteOffsets) = 0 then
		return;

	// Extract the hilited text from the first hilite
	hilitedText := SubStr(hiliteOffsets[0][0].text, hiliteOffsets[0][1],
		hiliteOffsets[0][2] - hiliteOffsets[0][1]);

	try
		SetVolume(4); // Set volume to max
		// Prepend the voice control code to the text to be spoken
		playSound("[[svox " & voxchoice & "]] " & hilitedText);
	onexception |evt.ex| do
		nil; // Ignore errors so the volume is always restored

	SetVolume(thevolume); // Return volume to user setting
end

Advanced topic: Multiple Sound Channels

The MP2x00 can create and play up to four sound channels simultaneously (possibly more). You can take advantage of this to have Macintalk speak the same or different text in several voices at once, e.g., up to four voices together. The following NTK project text opens three channels and has each one sing the same song in a different voice:

// Text of project C:\bottles\bottles2.ntk written on: 01/05/98 15:33:24
// Beginning of text file definitions.txt

DefConst('kSongText, "[[nmbr norm]][[pmod 1]]99[[pbas -8]] bottles of
[[pbas +8]]beer[[pbas -8]] on the [[rate 0]][[pbas +8]]wall [[rate 100]]
[[pbas +3]]99[[pbas -8]]bottles of [[rate 0]][[pbas +8]]beer[[pbas -5]]
[[rate 200]] you take one [[rate 0]]down[[rate 200]] pass it around [[slnc 200]]
[[pbas -8]]98[[pbas +3]] bottles of [[pbas +2]]beer on the [[pbas +2]]wall");



DefConst('kCloseChannelsFunc,
func(channel)
begin
	print("Checking to see if we can close the channels");
	if (channel[2] = nil) then return;
	if (channel[2]:isActive() = nil) then
	begin
		//close the channels
		print("closing the channels");
		channel[0]:Close();
		channel[1]:Close();
		channel[2]:Close();
	end
	else
		//still speaking, so check again later
		AddDeferredCall(channel[3], [channel]);
end
);



// End of text file C:\rowing\definitions.txt
// Beginning of file base.lyt

rowingBaseView :=
{viewBounds: {left: -3, top: 20, bottom: 254, right: 151},
channel1: nil,
channel2: nil,
channel3: nil,
debug: "rowingBaseView",
_proto: @179
};



_view000 := /* child of rowingBaseView */ {_proto: @166};



_view001 := /* child of rowingBaseView */
{
buttonClickScript:
func()
begin
	//initialize the sound channels
	print("Initializing the sound channels");
	spCh1 := {_proto: @431};
	spCh2 := {_proto: @431};
	spCh3 := {_proto: @431};

	//open the sound channels
	print("Opening the sound channels");
	spCh1:Open();
	spCh2:Open();
	spCh3:Open();

	//schedule the text to speak
	print("Scheduling the text to speak");
	spCh1:Schedule("[[rset 0]][[svox ralf]][[pbas 60]][[slnc 10]]" & kSongText);
	spCh2:Schedule("[[rset 0]][[svox zarv]][[pbas 45]]" & kSongText);
	spCh3:Schedule("[[rset 0]][[svox prin]][[pbas 45]]" & kSongText);

	//start the song
	print("starting the song");
	spCh1:start(true);
	spCh2:start(true);
	spCh3:start(true);

	//add a deferred call to close the sound channels
	print("registering the deferred call");
	AddDeferredCall(kCloseChannelsFunc, [[spCh1, spCh2, spCh3, kCloseChannelsFunc]]);
end,
text: "Sing?",
viewBounds: {left: 14, top: 194, right: 140, bottom: 210},
_proto: @226
};



_view002 := /* child of rowingBaseView */
{
text:
"2 Drunks & a Robot!! --
The following initializes three sound channels and plays a voice track in each.
Be warned, use this at your own risk. Now...
without further delay....",
viewBounds: {left: 12, top: 12, right: 142, bottom: 182},
viewJustify: 0,
viewFont: ROM_fontSystem10,
_proto: @218
};



constant |layout_base.lyt| := rowingBaseView;

// End of file base.lyt
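
The close routine in definitions.txt re-queues itself with AddDeferredCall until the third channel goes quiet, and it checks only that one channel. One possible variation, sketched here under stated assumptions, polls every channel at a fixed interval instead. kCloseWhenDoneFunc is a hypothetical name, the 500 ms interval is arbitrary, and the sketch assumes AddDelayedCall(funcObj, paramArray, delay) takes its delay in milliseconds, as documented for Newton OS 2.x.

```newtonscript
// Hypothetical variation: close the channels only once none is active.
DefConst('kCloseWhenDoneFunc,
func(channels, pollFunc)
begin
	foreach ch in channels do
		if ch:IsActive() then
		begin
			// still speaking: look again in half a second
			AddDelayedCall(pollFunc, [channels, pollFunc], 500);
			return;
		end;
	// every channel is idle, so it is safe to close them all
	foreach ch in channels do ch:Close();
end
);
```

In buttonClickScript, the final AddDeferredCall line would then become something like: AddDelayedCall(kCloseWhenDoneFunc, [[spCh1, spCh2, spCh3], kCloseWhenDoneFunc], 500);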