diff --git a/.Rbuildignore b/.Rbuildignore
index e3dcca5..7c43eda 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -9,5 +9,5 @@
 ^\.httr-oauth$
 ^cran-comments\.md$
 ^\.Renviron$
-^build$
+^cloud_build$
 ^CRAN-RELEASE$
diff --git a/build/build.R b/cloud_build/build.R
similarity index 100%
rename from build/build.R
rename to cloud_build/build.R
diff --git a/build/cloudbuild-tests.yml b/cloud_build/cloudbuild-tests.yml
similarity index 100%
rename from build/cloudbuild-tests.yml
rename to cloud_build/cloudbuild-tests.yml
diff --git a/vignettes/speech.Rmd b/vignettes/speech.Rmd
index e694209..590c647 100644
--- a/vignettes/speech.Rmd
+++ b/vignettes/speech.Rmd
@@ -1,17 +1,17 @@
 ---
-title: "Google Cloud Speech API"
+title: "Google Cloud Speech-to-Text API"
 author: "Mark Edmondson"
 date: "`r Sys.Date()`"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Google Cloud Speech API}
+  %\VignetteIndexEntry{Google Cloud Speech-to-Text API}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
 
-The Google Cloud Speech API enables you to convert audio to text by applying neural network models in an easy to use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice among many other use cases.
+The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy-to-use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice, among many other use cases.
 
-Read more [on the Google Cloud Speech Website](https://cloud.google.com/speech/)
+Read more [on the Google Cloud Speech-to-Text Website](https://cloud.google.com/speech/)
 
 The Cloud Speech API provides audio transcription. It's accessible via the `gl_speech` function.
@@ -47,7 +47,7 @@ return$timings
 # etc...
 ```
 
-### Demo for Google Cloud Speech API
+### Demo for Google Cloud Speech-to-Text API
 
 A test audio file is installed with the package that reads:
@@ -96,6 +96,23 @@ result$timings
 #5 0.900s 1s Dream
 ```
 
+## Custom configurations
+
+You can also send in other arguments which can help shape the output, such as speaker diarization (labelling different speakers). To use such custom configurations, create a [`RecognitionConfig`](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig) object. This can be done via R lists, which are converted to JSON via `library(jsonlite)`; an example is shown below:
+
+```r
+## Use a custom configuration
+my_config <- list(encoding = "LINEAR16",
+                  diarizationConfig = list(
+                    enableSpeakerDiarization = TRUE,
+                    minSpeakerCount = 2,
+                    maxSpeakerCount = 3
+                  ))
+
+# languageCode is required, so will be added if not in your custom config
+gl_speech(my_audio, languageCode = "en-US", customConfig = my_config)
+```
+
 ## Asynchronous calls
 
 For speech files greater than 60 seconds, or if you don't want your results straight away, set `asynch = TRUE` in the call to the API.
diff --git a/vignettes/speech.html b/vignettes/speech.html
index 187bea3..eb6dada 100644
The Google Cloud Speech-to-Text API enables you to convert audio to text by applying neural network models in an easy-to-use API. The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application’s microphone or enable command-and-control through voice, among many other use cases.

Read more [on the Google Cloud Speech-to-Text Website](https://cloud.google.com/speech/)
The Cloud Speech API provides audio transcription. It's accessible via the `gl_speech()` function.
Arguments include the audio source file, its `encoding`, the `languageCode` of the speech (required), `customConfig` for custom configurations, and `asynch` for asynchronous calls.
A test audio file is installed with the package that reads:

> “To administer medicine to animals is frequently a very difficult matter, and yet sometimes it’s necessary to do so”
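A minimal sketch of transcribing that file - the bundled file name here is an assumption for illustration:

```r
library(googleLanguageR)

## assumed file name of the demo audio shipped with the package
test_audio <- system.file("woman1_wb.wav", package = "googleLanguageR")

result <- gl_speech(test_audio, languageCode = "en-US")
result$transcript
```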
### Word transcripts

```r
result$timings
# ...
#4 0.700s 0.900s     A
#5 0.900s     1s Dream
```
## Custom configurations

You can also send in other arguments which can help shape the output, such as speaker diarization (labelling different speakers). To use such custom configurations, create a [`RecognitionConfig`](https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig) object. This can be done via R lists, which are converted to JSON via `library(jsonlite)`; an example is shown below:
```r
## Use a custom configuration
my_config <- list(encoding = "LINEAR16",
                  diarizationConfig = list(
                    enableSpeakerDiarization = TRUE,
                    minSpeakerCount = 2,
                    maxSpeakerCount = 3
                  ))

# languageCode is required, so will be added if not in your custom config
gl_speech(my_audio, languageCode = "en-US", customConfig = my_config)
```
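Since the list is converted to JSON before being sent, you can sanity-check what the API will receive with `jsonlite` directly (a sketch; the `toJSON()` options shown are illustrative):

```r
library(jsonlite)

## preview the JSON the custom config list will become
toJSON(my_config, auto_unbox = TRUE, pretty = TRUE)
```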
## Asynchronous calls

For speech files greater than 60 seconds, or if you don't want your results straight away, set `asynch = TRUE` in the call to the API. This will return an object of class `"gl_speech_op"`, which should be passed to the `gl_speech_op()` function to check the status of the task. If the task is finished, it will return an object of the same form as the non-asynchronous case.
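A minimal sketch of that flow (assuming `my_audio` points at a long recording):

```r
## start the transcription job without waiting for the result
op <- gl_speech(my_audio, languageCode = "en-US", asynch = TRUE)

## check back later - returns the transcript once the job has finished
result <- gl_speech_op(op)
```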
## Supported languages

The Text-to-Speech API can speak several different languages, with more being added over time. You can get the current list via the function `gl_talk_languages()` or online.

```r
gl_talk_languages()
# A tibble: 32 x 4
   languageCodes name             ssmlGender naturalSampleRateHertz
   <chr>         <chr>            <chr>                       <int>
 1 es-ES         es-ES-Standard-A FEMALE                      24000
 2 ja-JP         ja-JP-Standard-A FEMALE                      22050
 3 pt-BR         pt-BR-Standard-A FEMALE                      24000
 4 tr-TR         tr-TR-Standard-A FEMALE                      22050
 5 sv-SE         sv-SE-Standard-A FEMALE                      22050
 6 nl-NL         nl-NL-Standard-A FEMALE                      24000
 7 en-US         en-US-Wavenet-A  MALE                        24000
 8 en-US         en-US-Wavenet-B  MALE                        24000
 9 en-US         en-US-Wavenet-C  FEMALE                      24000
10 en-US         en-US-Wavenet-D  MALE                        24000
```
If you are looking for a specific language, specify it in the function call, e.g. to see only Spanish (`es`) voices, issue:

```r
gl_talk_languages(languageCode = "es")
# A tibble: 1 x 4
  languageCodes name             ssmlGender naturalSampleRateHertz
  <chr>         <chr>            <chr>                       <int>
1 es-ES         es-ES-Standard-A FEMALE                      24000
```
You can then specify that voice when calling the API via the `name` argument, which overrides the `gender` and `languageCode` arguments:

```r
gl_talk("Hasta la vista", name = "es-ES-Standard-A")
```
Otherwise, specify your own `gender` and `languageCode` and the voice will be picked for you:

```r
gl_talk("Would you like a cup of tea?", gender = "FEMALE", languageCode = "en-GB")
```
Some languages are not yet supported, such as Danish. The API will return an error in those cases.
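If you need to handle that case programmatically, a plain `tryCatch()` wrapper works (a sketch; `safe_talk()` is a hypothetical helper, not part of the package):

```r
## hypothetical helper: returns NULL instead of erroring
safe_talk <- function(...){
  tryCatch(
    gl_talk(...),
    error = function(e){
      message("Text-to-Speech request failed: ", conditionMessage(e))
      NULL
    }
  )
}

safe_talk("Hej med dig", languageCode = "da-DK")
```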
Support is also included for Speech Synthesis Markup Language (SSML) - more details on using this to insert pauses, sounds and breaks in your audio can be found here: https://cloud.google.com/text-to-speech/docs/ssml

To use it, wrap your SSML markup around the text you want spoken and set `inputType = "ssml"`.
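A minimal sketch, with illustrative SSML content:

```r
## the <break> tag inserts a two-second pause into the audio
gl_talk('<speak>Let me think <break time="2s"/> yes, that will work.</speak>',
        inputType = "ssml")
```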
## Audio profiles

You can output audio files that are optimised for playing on various devices. To use audio profiles, supply a character vector of the available audio profiles listed at https://cloud.google.com/text-to-speech/docs/audio-profiles - the audio profiles are applied in the order given. For instance, `effectsProfileIds = "wearable-class-device"` will optimise output for smart watches, while `effectsProfileIds = c("wearable-class-device", "telephony-class-application")` will apply sound filters optimised for smart watches, then telephonic devices.
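For example (the spoken text here is illustrative):

```r
## optimise for smart watches, then apply telephony filtering
gl_talk("Your taxi has arrived",
        effectsProfileIds = c("wearable-class-device",
                              "telephony-class-application"))
```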
## Browser player

Creating and clicking on the audio file to play it can be a bit of a drag, so a function is also provided that will play the audio file for you, launching via the browser. This can be piped via the tidyverse's `%>%`:

```r
library(magrittr)
gl_talk("This is my audio player") %>% gl_talk_player()

## non-piped equivalent
gl_talk_player(gl_talk("This is my audio player"))
```
`gl_talk_player()` creates an HTML file called `player.html` in your working directory by default.
## Shiny module

A Shiny module has been created to help integrate text-to-speech into your Shiny apps:

```r
library(shiny)
library(googleLanguageR) # assume auto auth setup

ui <- fluidPage(
  gl_talk_shinyUI("talk")
)

server <- function(input, output, session){

  transcript <- reactive({
    paste("This is a demo talking Shiny app!")
  })

  callModule(gl_talk_shiny, "talk", transcript = transcript)
}

shinyApp(ui = ui, server = server)
```