An SDK for iOS mobile applications enabling use of the Bluemix Watson Speech to Text and Text to Speech APIs from Watson Developer Cloud.
The SDK includes support for recording and streaming audio and receiving a transcript of the audio in response.
Using the framework
- Download the watsonsdk.framework.zip and unzip it somewhere convenient
- Once unzipped, drag the watsonsdk.framework folder into your Xcode project view, under the Frameworks folder.
Some additional iOS standard frameworks must be added.
- Select your project in the Xcode file explorer and open the "Build Phases" tab. Expand the "Link Binary With Libraries" section and click the + icon.
- Add the following frameworks:
- AudioToolbox.framework
- AVFoundation.framework
- CFNetwork.framework
- CoreAudio.framework
- Foundation.framework
- libicucore.tbd (or libicucore.dylib on older versions)
- QuartzCore.framework
- Security.framework
Include the SDK headers.
in Objective-C
#import <watsonsdk/SpeechToText.h>
#import <watsonsdk/TextToSpeech.h>
in Swift
Add the Objective-C headers above to a bridging header file (see SwiftSpeechHeader.h in the Swift sample).
This repository contains a sample application demonstrating the SDK functionality.
To run the application, clone this repository and then navigate in Finder to the folder containing the SDK files.
Double-click watsonsdk.xcodeproj to launch Xcode.
To run the sample application, change the compile target to 'watsonsdktest-objective-c' or 'watsonsdktest-swift' and run on the iPhone simulator.
Note that this is sample code and no security review has been performed on the code.
The Swift sample was tested in Xcode 8.2.1.
By default the Configuration uses the IBM Bluemix service API endpoint; a custom endpoint can be set using setApiURL, as sketched after the examples below, though in most cases this is not required.
in Objective-C
STTConfiguration *conf = [[STTConfiguration alloc] init];
in Swift
let conf:STTConfiguration = STTConfiguration()
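For example, a custom endpoint could be set on the configuration created above. This is a minimal sketch: it assumes the setApiURL: setter mentioned earlier is exposed to Swift as setApiURL(_:), and the URL is only a placeholder for your own service endpoint.
in Swift
// placeholder URL: substitute the endpoint of your own service instance
conf.setApiURL("https://<custom-speech-to-text-endpoint>/api")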
There are currently two authentication options.
Basic Authentication, using the credentials provided by the Bluemix Service instance.
in Objective-C
[conf setBasicAuthUsername:@"<userid>"];
[conf setBasicAuthPassword:@"<password>"];
in Swift
conf.basicAuthUsername = "<userid>"
conf.basicAuthPassword = "<password>"
Token authentication, using a token authentication provider running at, for example, https://my-token-factory/token
in Objective-C
[conf setTokenGenerator:^(void (^tokenHandler)(NSString *token)){
    NSURL *url = [[NSURL alloc] initWithString:@"https://<token-factory-url>"];
    NSMutableURLRequest *request = [[NSMutableURLRequest alloc] init];
    [request setHTTPMethod:@"GET"];
    [request setURL:url];

    NSError *error = nil;
    NSHTTPURLResponse *responseCode = nil;
    NSData *oResponseData = [NSURLConnection sendSynchronousRequest:request returningResponse:&responseCode error:&error];

    if ([responseCode statusCode] != 200) {
        NSLog(@"Error getting %@, HTTP status code %li", url, (long)[responseCode statusCode]);
        return;
    }
    tokenHandler([[NSString alloc] initWithData:oResponseData encoding:NSUTF8StringEncoding]);
}];
in Swift
...
confSTT.tokenGenerator = self.tokenGenerator()
...
func tokenGenerator() -> ((((String?) -> Void)?)) -> Void {
    let url = URL(string: "https://<token-factory-url>")
    return ({ ( _ tokenHandler: (((_ token:String?) -> Void)?) ) -> () in
        SpeechUtility.performGet({ (data:Data?, response:URLResponse?, error:Error?) in
            if error != nil {
                print("Error occurred while requesting token: \(error?.localizedDescription ?? "")")
                return
            }
            guard let httpResponse: HTTPURLResponse = response as? HTTPURLResponse else {
                print("Invalid response")
                return
            }
            if httpResponse.statusCode != 200 {
                print("Error response: \(httpResponse.statusCode)")
                return
            }
            let token:String = String(data: data!, encoding: String.Encoding.utf8)!
            tokenHandler!(token)
        }, for: url, delegate: self, disableCache: true, header: nil)
    })
}
Create an instance of SpeechToText using the configuration.
in Objective-C
@property (strong, nonatomic) SpeechToText *stt;
...
self.stt = [SpeechToText initWithConfig:conf];
in Swift
var stt:SpeechToText?
...
self.stt = SpeechToText(config: conf)
A list of speech recognition models supported by the service can be obtained using the listModels function.
in Objective-C
[stt listModels:^(NSDictionary* jsonDict, NSError* err){
    if(err == nil)
        ... read values from NSDictionary ...
}];
in Swift
stt?.listModels({ (jsonDict: [AnyHashable: Any]?, error: Error?) in
    if error == nil {
        print(jsonDict!)
    }
})
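As a sketch of reading the listModels response above: the dictionary is assumed here to contain a "models" array whose entries have a "name" field; adjust the keys to the actual response of your service instance.
in Swift
stt?.listModels({ (jsonDict: [AnyHashable: Any]?, error: Error?) in
    if error != nil {
        return
    }
    // assumption: the response holds a "models" array of dictionaries with a "name" entry
    if let models = jsonDict?["models"] as? [[AnyHashable: Any]] {
        for model in models {
            print(model["name"] ?? "")
        }
    }
})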
Details of a specific speech recognition model can be obtained using the listModel function.
in Objective-C
[stt listModel:^(NSDictionary* jsonDict, NSError* err){
    if(err == nil)
        ... read values from NSDictionary ...
} withName:@"WatsonSpeechModel"];
in Swift
stt?.listModel({ (jsonDict: [AnyHashable : Any]?, error: Error?) in
    if error == nil {
        print(jsonDict!)
    }
}, withName: "WatsonSpeechModel")
The speech recognition model can be changed in the configuration.
in Objective-C
[conf setModelName:@"ja-JP_BroadbandModel"];
in Swift
confSTT.modelName = "ja-JP_BroadbandModel"
By default, audio sent to the server is uncompressed PCM encoded data; compressed audio using the Opus codec can be enabled.
in Objective-C
[conf setAudioCodec:WATSONSDK_AUDIO_CODEC_TYPE_OPUS];
in Swift
confSTT.audioCodec = WATSONSDK_AUDIO_CODEC_TYPE_OPUS
Audio transcription is started with the recognize function, which streams audio from the microphone and delivers interim and final results to the callback.
in Objective-C
[stt recognize:^(NSDictionary* res, NSError* err){
    if(err == nil) {
        SpeechToTextResult *sttResult = [stt getResult:res];
        if([sttResult transcript]) {
            if([sttResult isFinal]) {
                // final transcript
                NSLog(@"%@", [sttResult transcript]);
            }
            else {
                // partial transcript
                NSLog(@"%@", [sttResult transcript]);
            }
        }
    }
    else {
        [stt stopRecordingAudio];
        [stt endConnection];
    }
}];
in Swift
self.sttInstance?.recognize({ (result: [AnyHashable : Any]?, error: Error?) in
    if error == nil {
        let sttResult = self.sttInstance?.getResult(result)
        guard let transcript = sttResult?.transcript else {
            return
        }
        if (sttResult?.isFinal)! {
            // final transcript
            print(sttResult?.transcript ?? "")
        }
        else {
            // partial transcript
            print(sttResult?.transcript ?? "")
        }
        self.result.text = transcript
    }
    else {
        self.sttInstance?.stopRecordingAudio()
        self.sttInstance?.endConnection()
    }
})
The app must explicitly indicate to the SDK when transmission should be ended if the continuous option is YES; a usage sketch follows the example below.
in Objective-C
[conf setContinuous:YES];
...
[stt endTransmission];
in Swift
conf.continuous = true
...
stt?.endTransmission()
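As a usage sketch, a stop button in the app could end both transmission and recording; the action name is only illustrative and the stt property is assumed to exist as declared earlier.
in Swift
@IBAction func stopPressed(_ sender: Any) {
    // illustrative only: stop streaming to the service and stop capturing audio
    stt?.endTransmission()
    stt?.stopRecordingAudio()
}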
A confidence score is available for any final transcripts (whole sentences). This can be obtained from the SpeechToTextResult instance.
in Objective-C
SpeechToTextResult *sttResult = [stt getResult:res];
NSLog(@"Confidence score: %@", [sttResult confidenceScore])
in Swift
let sttResult = self.sttInstance?.getResult(result)
print("Confidence score: \(sttResult?.confidenceScore)")
The microphone input level can be monitored during recognition by supplying a powerHandler callback.
in Objective-C
[stt recognize:^(NSDictionary *res, NSError *err) {
    ...
} powerHandler:^(float power) {
    NSLog(@"Power level: %f", power);
}];
in Swift
self.sttInstance?.recognize({ (result: [AnyHashable : Any]?, error: Error?) in
    ...
}, powerHandler: { (power: Float) in
    print("Power level: \(power)")
})
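The power value can be used to drive a simple level meter. The sketch below assumes the value is a decibel level (negative, with 0 the loudest) and that a UIProgressView outlet named levelMeter exists; both are illustrative assumptions.
in Swift
self.sttInstance?.recognize({ (result: [AnyHashable : Any]?, error: Error?) in
    // handle transcription results as shown above
}, powerHandler: { (power: Float) in
    // assumption: power is a dB value roughly in -40...0; map it to 0...1
    let level = max(0.0, min(1.0, (power + 40.0) / 40.0))
    self.levelMeter.progress = level
})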
By default the Configuration uses the IBM Bluemix service API endpoint; a custom endpoint can be set using setApiURL, though in most cases this is not required.
in Objective-C
TTSConfiguration *conf = [[TTSConfiguration alloc] init];
[conf setBasicAuthUsername:@"<userid>"];
[conf setBasicAuthPassword:@"<password>"];
in Swift
let conf: TTSConfiguration = TTSConfiguration()
conf.basicAuthUsername = "<userid>"
conf.basicAuthPassword = "<password>"
You can change the voice model used for TTS by setting it in the configuration.
in Objective-C
[conf setVoiceName:@"en-US_MichaelVoice"];
in Swift
conf.voiceName = "en-US_MichaelVoice"
If you use tokens (from your own server) to get access to the service, provide a token generator to the Configuration. The userid and password will not be used if a token generator is provided.
in Objective-C
[conf setTokenGenerator:^(void (^tokenHandler)(NSString *token)){
    // get a token from your server in a secure way
    NSString *token = ...
    // provide the token to the tokenHandler
    tokenHandler(token);
}];
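A Swift form is not shown above; the sketch below mirrors the Speech to Text token generator example and assumes TTSConfiguration exposes the same tokenGenerator property and that SpeechUtility.performGet is available as used earlier.
in Swift
conf.tokenGenerator = { (tokenHandler: (((String?) -> Void)?)) in
    let url = URL(string: "https://<token-factory-url>")
    SpeechUtility.performGet({ (data: Data?, response: URLResponse?, error: Error?) in
        if error != nil {
            print("Error occurred while requesting token: \(error?.localizedDescription ?? "")")
            return
        }
        guard let httpResponse = response as? HTTPURLResponse, httpResponse.statusCode == 200 else {
            print("Invalid token response")
            return
        }
        // forward the token text to the SDK
        guard let data = data, let token = String(data: data, encoding: .utf8) else {
            return
        }
        tokenHandler?(token)
    }, for: url, delegate: self, disableCache: true, header: nil)
}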
Create an instance of TextToSpeech using the configuration.
in Objective-C
self.tts = [TextToSpeech initWithConfig:conf];
in Swift
var tts: TextToSpeech?
...
self.tts = TextToSpeech(config: conf)
A list of voices supported by the service can be obtained using the listVoices function.
in Objective-C
[tts listVoices:^(NSDictionary* jsonDict, NSError* err){
    if(err == nil)
        ... read values from NSDictionary ...
}];
in Swift
tts?.listVoices({ (jsonDict:[AnyHashable: Any]?, error:Error?) in
    if error == nil {
        print(jsonDict!)
    }
})
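As a sketch of using the listVoices response: the dictionary is assumed here to contain a "voices" array whose entries have a "name" field; a name obtained this way can then be assigned to conf.voiceName as shown in the voice section above.
in Swift
tts?.listVoices({ (jsonDict: [AnyHashable: Any]?, error: Error?) in
    if error != nil {
        return
    }
    // assumption: the response holds a "voices" array of dictionaries with a "name" entry
    if let voices = jsonDict?["voices"] as? [[AnyHashable: Any]] {
        for voice in voices {
            print(voice["name"] ?? "")
        }
    }
})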
Text can be synthesized and the resulting audio played back using the synthesize and playAudio functions.
in Objective-C
[self.tts synthesize:^(NSData *data, NSError *reqErr) {
    // request error
    if(reqErr){
        NSLog(@"Error requesting data: %@", [reqErr description]);
        return;
    }
    // play audio and log when playing has finished
    [self.tts playAudio:^(NSError *err) {
        if(err)
            NSLog(@"error playing audio %@", [err localizedDescription]);
        else
            NSLog(@"audio finished playing");
    } withData:data];
} theText:@"Hello World"];
in Swift
tts?.synthesize({ (data: NSData!, reqError: NSError!) -> Void in
    if reqError == nil {
        tts?.playAudio({ (error: NSError!) -> Void in
            if error == nil {
                ... do something after the audio has played ...
            }
            else {
                ... audio playback error handling ...
            }
        }, withData: data)
    }
    else {
        ... request error handling ...
    }
}, theText: "Hello World")
A custom voice model can be used by passing its customization id to the synthesize function.
in Objective-C
[self.tts synthesize:^(NSData *data, NSError *reqErr) {
    // request error
    if(reqErr){
        NSLog(@"Error requesting data: %@", [reqErr description]);
        return;
    }
    // play audio and log when playing has finished
    [self.tts playAudio:^(NSError *err) {
        if(err)
            NSLog(@"error playing audio %@", [err localizedDescription]);
        else
            NSLog(@"audio finished playing");
    } withData:data];
} theText:@"Hello World" customizationId:@"your-customization-id"];
in Swift
tts?.synthesize({ (data: NSData!, reqError: NSError!) -> Void in
    if reqError == nil {
        tts?.playAudio({ (error: NSError!) -> Void in
            if error == nil {
                ... do something after the audio has played ...
            }
            else {
                ... audio playback error handling ...
            }
        }, withData: data)
    }
    else {
        ... request error handling ...
    }
}, theText: "Hello World", customizationId: "your-customization-id")
Find more open source projects on the IBM GitHub page.
Copyright 2017 IBM Corporation under the Apache 2.0 license.