Closed Bug 1003457 Opened 10 years ago Closed 9 years ago

Text to Speech API on Firefox for Windows

Categories

(Core :: DOM: Core & HTML, defect)

All
Windows 8
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: yash.girdhar, Assigned: m_kato)

References

(Depends on 1 open bug)

Details

(Keywords: dev-doc-complete, Whiteboard: [webspeechapi])

Attachments

(2 files, 12 obsolete files)

19.89 KB, text/plain
Details
15.24 KB, patch
jimm
: review+
Details | Diff | Splinter Review
As a part of Web Speech API (https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html), implement Text to Speech API on firefox for OS X.

Plan is to use SAPI (Speech Application Programming API).

The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech synthesis within Windows applications.
Component: Disability Access APIs → DOM
Assignee: nobody → yash.girdhar
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
First and a very basic version of the Sapi service. Following functionalities have been implemented :
-> speak() function
-> getVoices() function
-> specific voice can be selected. Demo : http://eeejay.github.io/webspeechdemos/
-> events support(start and end of input stream, word and sentence boundary), the stop/pause/resume/interrupt functions (Though I have tested them at a basic level).

Though I am still working on the patch right now, feedback/suggestions are welcome.
I just tried this with a debug build on windows. It works indeed! Some quick feedback, please use the feedback flag in the future when you are ready with a new version.

1. The patch needs to be rebased, nsStaticXULComponents.cpp went away.
2. Please clean up all the whitespace, replace all tabs with spaces and remove trailing whitespace.
3. Please conform to the coding style in that module.

After playing with it, I noticed:
1. The voice doesn't change, even though I choose different voices.
2. The boundary events don't give accurate timestamps.
3. It crashes a lot. This is a 64bit debug try build, I'm not really set up for getting a stack trace in windows, soI don't know if it related to the patch.
-> support for cancelling the current speech, along with pause and resume events.I have tested them at some level, but they need more testing.
-> support for changing pitch, rate and volume.
-> It gives correct timestamps for the boundary events now. I have used windows inbuilt function and I am searching if there is any better option.

@Eitan : Regarding changing the voice, it works fine on my system, so I am still looking for what could the problem in your case.

Next to do :

Make the speech concurrent, so that two windows can simultaneously use it. Right now, the speech for two different windows is not independent.
Attachment #8451741 - Attachment is obsolete: true
Attachment #8460158 - Flags: feedback?
Attachment #8460158 - Flags: feedback? → feedback?(eitan)
Comment on attachment 8460158 [details] [diff] [review]
Added some more functionalities to the service

Review of attachment 8460158 [details] [diff] [review]:
-----------------------------------------------------------------

Hey Yash.

Before I give feedback, please rebase this on top of most recent mozilla-central. Thanks!
Attachment #8460158 - Flags: feedback?(eitan)
Some minor changes and rebased on the Mozilla-central head.
Attachment #8460158 - Attachment is obsolete: true
Attachment #8461732 - Flags: feedback+
some changes, to resolve the try errors.
Attachment #8461732 - Attachment is obsolete: true
Attachment #8462011 - Flags: feedback+
The previous one failed on try on windows, due to a known bug in visual studio 2010 - http://connect.microsoft.com/VisualStudio/feedback/details/621653/including-stdint-after-intsafe-generates-warnings .
Added a workaround for that in this patch. Didn't test locally.
Attachment #8462011 - Attachment is obsolete: true
Attachment #8462960 - Flags: feedback+
Disabled 'FAIL_ON_WARNINGS', so that the patch runs on the try server. 
Removed extra whitespace and corrected newlines.
Attachment #8462960 - Attachment is obsolete: true
Attachment #8470183 - Flags: feedback+
Rebased on top of Mozilla-central.
Attachment #8470183 - Attachment is obsolete: true
Attachment #8470206 - Flags: feedback+
Corrected a minor mistake.
Attachment #8470206 - Attachment is obsolete: true
Attachment #8470414 - Flags: feedback+
Comment on attachment 8470414 [details] [diff] [review]
Added more functionalities to the service

Review of attachment 8470414 [details] [diff] [review]:
-----------------------------------------------------------------

This works as expected!

To be honest, the windows code is a bit hard for me to understand. I don't really get how this does not completely block the UI thread, but it magically doesn't.

- Still plenty of whitespace issues, both trailing whitespaces and tabs. Also, code style is wrong.
- The mochitests fail on windows. Looks like the addition of more voices are not accounted for in the test.
- The voice lang name is wrong, for David I see "409", it should be "en-US".
- The voice URLs are strange (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_DAVID_11.0). They should be consistent with other speech services: urn:moz-tts:[servicename]:[voicename]?[languagecode].

::: content/media/webspeech/synth/nsISpeechService.idl
@@ +149,5 @@
>     * @param aTask  task instance for utterance, used for sending events or audio
>     *                 data back to browser.
>     */
>    void speak(in DOMString aText, in DOMString aUri,
> +             in float aRate, in float aPitch, in float aVolume,

Add documentation above. Explain that the volume should only be used in indirect services.

::: content/media/webspeech/synth/nsSpeechTask.cpp
@@ +462,5 @@
>      DebugOnly<nsresult> rv = mCallback->OnPause();
>      NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onPause() callback");
>    }
> +  
> +  if (!mIndirectAudio)

Trailing spaces (above if line).

As a rule, indirect services should emit their own events, correct?

::: content/media/webspeech/synth/nsSynthVoiceRegistry.cpp
@@ +542,5 @@
>                              const nsAString& aLang,
>                              const nsAString& aUri,
>                              const float& aRate,
>                              const float& aPitch,
> +							const float& aVolume,

Remove tab indentations.

::: content/media/webspeech/synth/nsSynthVoiceRegistry.h
@@ +34,5 @@
>    already_AddRefed<nsSpeechTask> SpeakUtterance(SpeechSynthesisUtterance& aUtterance,
>                                                  const nsAString& aDocLang);
>  
>    void Speak(const nsAString& aText, const nsAString& aLang,
> +             const nsAString& aUri, const float& aRate, const float& aPitch, const float& aVolume,

Wrap lines at 80 columns.

::: content/media/webspeech/synth/sapi/nsSapiService.cpp
@@ +22,5 @@
> +#include "mozilla/DebugOnly.h"
> +
> +enum TtsGenderType {
> +  TTS_GENDER_NONE,
> +    TTS_GENDER_MALE,

indentation

@@ +26,5 @@
> +    TTS_GENDER_MALE,
> +    TTS_GENDER_FEMALE
> +};
> +
> +double starting_time;

Why is this a global?

@@ +39,5 @@
> +        public:
> +          SapiVoice() {}
> +          ~SapiVoice() {}
> +
> +          NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice)

No need for this to be thread safe, AFAICT.

@@ +43,5 @@
> +          NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice)
> +
> +          WCHAR *mName;
> +          WCHAR *mLanguage;
> +          TtsGenderType mGender;

Why do we need gender? Our API does not use that, and I don't see it used anywhere.

@@ +47,5 @@
> +          TtsGenderType mGender;
> +          CComPtr<ISpObjectToken> voice_token;
> +      };
> +    
> +      HRESULT WaitAndPumpMessagesWithTimeout(HANDLE hWaitHandle, DWORD dwMilliseconds)

Not sure why this needs do be a windowsy function. One argument could be gotten from a field off of runnable, and the other is always the same (INFINITE). This could be a private method of the runnable and return void.

@@ +50,5 @@
> +    
> +      HRESULT WaitAndPumpMessagesWithTimeout(HANDLE hWaitHandle, DWORD dwMilliseconds)
> +      {
> +        HRESULT hr = S_OK;
> +          BOOL fContinue = TRUE;

Don't use platform types when not necessary.

@@ +54,5 @@
> +          BOOL fContinue = TRUE;
> +
> +          while (fContinue)
> +          {
> +            DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hWaitHandle, dwMilliseconds, QS_ALLINPUT, MWMO_INPUTAVAILABLE);

This DWORD is the only platform type you need to use in this entire file.

@@ +106,5 @@
> +        std::wstring mText;
> +
> +        // We use this pointer to compare it with the current service task.
> +        // If they differ, this runnable should stop.
> +        nsISpeechTask* mTask;

The comment above is not true. You use mTask for dispatching events to gecko.

@@ +211,5 @@
> +        DebugOnly<nsresult> rv =
> +        data->mRegistry->AddVoice(
> +            data->mService, aUri, nsDependentString(static_cast<char16_t*>(aVoice->mName)), nsDependentString(static_cast<char16_t*>(aVoice->mLanguage)), true);
> +        NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Failed to add voice");
> +        rv = data->mRegistry->SetDefaultVoice(aUri, true);

So every voice is set as default? That doesn't make sense..

@@ +221,5 @@
> +      {
> +        //not sure if we need this
> +        /*if (mCurrentTask != callbackRunnable->mTask) { 
> +          return;
> +          } */

From some crude testing, it is not. Not sure why. But you should remove this before review.

@@ +241,5 @@
> +              break;
> +            }
> +            case SPEI_TTS_BOOKMARK:{
> +              currentIndex = static_cast<ULONG>(speechEvent.lParam) - prefixLen;
> +              callbackRunnable->mTask->DispatchBoundary(nsDependentString(static_cast<char16_t*>(L"mark")), GetTickCount() - starting_time, currentIndex);

Wrap long lines

@@ +291,5 @@
> +        mVoice->SetInterest(event_mask, event_mask);
> +
> +        /*set the callback function for receiving the events*/
> +        SPNOTIFYCALLBACK *cb = SpeechEventCallback;
> +        mVoice->SetNotifyCallbackFunction(cb, 0, 0);

Can't you directly cast SpeechEventCallback here instead of an intermediate variable?

@@ +315,5 @@
> +
> +        for (unsigned int i = 0; i < voice_count; i++){
> +          SapiVoice *voice = new SapiVoice();
> +          CComPtr<ISpObjectToken> voice_token;
> +          if (S_OK != voice_tokens->Next(1, &voice_token, NULL)){

How about that condition being the while loop instead of a for loop?

@@ +342,5 @@
> +            else if (0 == _wcsicmp(gender_s, L"female")){
> +              gender = TTS_GENDER_FEMALE;
> +            }   
> +          }
> +          voice->mGender = gender;

This gender stuff looks useless.

@@ +345,5 @@
> +          }
> +          voice->mGender = gender;
> +            
> +            WCHAR *language = nullptr;
> +            if (S_OK != attributes->GetStringValue(L"Language", &language)){

It looks like this returns an MS LangID, it needs to be converted to a BCP 47 language tag (eg. en-GB).

@@ +365,5 @@
> +
> +      void nsSapiService::RegisterVoices()
> +      {
> +        VoiceTraverserData data = { this, nsSynthVoiceRegistry::GetInstance() };
> +        mVoices.Enumerate(SapiAddVoiceTraverser, &data);

Can't this happen in PopulateVoices instead of going over the list again here?

::: content/media/webspeech/synth/sapi/nsSapiService.h
@@ +63,5 @@
> +        bool mInitialized;
> +
> +        nsRefPtrHashtable<nsStringHashKey, SapiVoice> mVoices;
> +
> +        ISpVoice* mVoice;

The name is misleading, since it does not refer to the same distinct voices that we use. I would call this mSapiVoice or mSapiClient.
Attachment #8470414 - Flags: feedback+
(In reply to Eitan Isaacson [:eeejay] from comment #12)
> Comment on attachment 8470414 [details] [diff] [review]
> Added more functionalities to the service
> 
> Review of attachment 8470414 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This works as expected!
> 
> To be honest, the windows code is a bit hard for me to understand. I don't
> really get how this does not completely block the UI thread, but it
> magically doesn't.
> 
> - Still plenty of whitespace issues, both trailing whitespaces and tabs.
> Also, code style is wrong.
> - The mochitests fail on windows. Looks like the addition of more voices are
> not accounted for in the test.
> - The voice lang name is wrong, for David I see "409", it should be "en-US".
> - The voice URLs are strange
> (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-
> US_DAVID_11.0). They should be consistent with other speech services:
> urn:moz-tts:[servicename]:[voicename]?[languagecode].
> 
> ::: content/media/webspeech/synth/nsISpeechService.idl
> @@ +149,5 @@
> >     * @param aTask  task instance for utterance, used for sending events or audio
> >     *                 data back to browser.
> >     */
> >    void speak(in DOMString aText, in DOMString aUri,
> > +             in float aRate, in float aPitch, in float aVolume,
> 
> Add documentation above. Explain that the volume should only be used in
> indirect services.

Done.

> 
> ::: content/media/webspeech/synth/nsSpeechTask.cpp
> @@ +462,5 @@
> >      DebugOnly<nsresult> rv = mCallback->OnPause();
> >      NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onPause() callback");
> >    }
> > +  
> > +  if (!mIndirectAudio)
> 
> Trailing spaces (above if line).
> 
> As a rule, indirect services should emit their own events, correct?

Correct, AFAIK.

> 
> ::: content/media/webspeech/synth/nsSynthVoiceRegistry.cpp
> @@ +542,5 @@
> >                              const nsAString& aLang,
> >                              const nsAString& aUri,
> >                              const float& aRate,
> >                              const float& aPitch,
> > +							const float& aVolume,
> 
> Remove tab indentations.

Done.

> 
> ::: content/media/webspeech/synth/nsSynthVoiceRegistry.h
> @@ +34,5 @@
> >    already_AddRefed<nsSpeechTask> SpeakUtterance(SpeechSynthesisUtterance& aUtterance,
> >                                                  const nsAString& aDocLang);
> >  
> >    void Speak(const nsAString& aText, const nsAString& aLang,
> > +             const nsAString& aUri, const float& aRate, const float& aPitch, const float& aVolume,
> 
> Wrap lines at 80 columns.

Done.

> ::: content/media/webspeech/synth/sapi/nsSapiService.cpp
> @@ +22,5 @@
> > +#include "mozilla/DebugOnly.h"
> > +
> > +enum TtsGenderType {
> > +  TTS_GENDER_NONE,
> > +    TTS_GENDER_MALE,
> 
> indentation

Gender Code removed.

> @@ +26,5 @@
> > +    TTS_GENDER_MALE,
> > +    TTS_GENDER_FEMALE
> > +};
> > +
> > +double starting_time;
> 
> Why is this a global?

It is used in various functions and I didn't think it's important enough to be a field of the class. 

> 
> @@ +39,5 @@
> > +        public:
> > +          SapiVoice() {}
> > +          ~SapiVoice() {}
> > +
> > +          NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice)
> 
> No need for this to be thread safe, AFAICT.

If I remove this, it gives error in using SapiVoice in the nsRefPtrHashtable. 
 
> @@ +43,5 @@
> > +          NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice)
> > +
> > +          WCHAR *mName;
> > +          WCHAR *mLanguage;
> > +          TtsGenderType mGender;
> 
> Why do we need gender? Our API does not use that, and I don't see it used
> anywhere.

Removed.

> @@ +47,5 @@
> > +          TtsGenderType mGender;
> > +          CComPtr<ISpObjectToken> voice_token;
> > +      };
> > +    
> > +      HRESULT WaitAndPumpMessagesWithTimeout(HANDLE hWaitHandle, DWORD dwMilliseconds)
> 
> Not sure why this needs do be a windowsy function. One argument could be
> gotten from a field off of runnable, and the other is always the same
> (INFINITE). This could be a private method of the runnable and return void.

Have made this a method of the runnable.

> @@ +50,5 @@
> > +    
> > +      HRESULT WaitAndPumpMessagesWithTimeout(HANDLE hWaitHandle, DWORD dwMilliseconds)
> > +      {
> > +        HRESULT hr = S_OK;
> > +          BOOL fContinue = TRUE;
> 
> Don't use platform types when not necessary.

Converted.
 
> @@ +54,5 @@
> > +          BOOL fContinue = TRUE;
> > +
> > +          while (fContinue)
> > +          {
> > +            DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hWaitHandle, dwMilliseconds, QS_ALLINPUT, MWMO_INPUTAVAILABLE);
> 
> This DWORD is the only platform type you need to use in this entire file.
> 
> @@ +106,5 @@
> > +        std::wstring mText;
> > +
> > +        // We use this pointer to compare it with the current service task.
> > +        // If they differ, this runnable should stop.
> > +        nsISpeechTask* mTask;
> 
> The comment above is not true. You use mTask for dispatching events to gecko.

Changed.

> @@ +211,5 @@
> > +        DebugOnly<nsresult> rv =
> > +        data->mRegistry->AddVoice(
> > +            data->mService, aUri, nsDependentString(static_cast<char16_t*>(aVoice->mName)), nsDependentString(static_cast<char16_t*>(aVoice->mLanguage)), true);
> > +        NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Failed to add voice");
> > +        rv = data->mRegistry->SetDefaultVoice(aUri, true);
> 
> So every voice is set as default? That doesn't make sense..

Changed it. It was initially written for some testing, forgot to remove it.

> @@ +221,5 @@
> > +      {
> > +        //not sure if we need this
> > +        /*if (mCurrentTask != callbackRunnable->mTask) { 
> > +          return;
> > +          } */
> 
> From some crude testing, it is not. Not sure why. But you should remove this
> before review.

Removed.
 
> @@ +241,5 @@
> > +              break;
> > +            }
> > +            case SPEI_TTS_BOOKMARK:{
> > +              currentIndex = static_cast<ULONG>(speechEvent.lParam) - prefixLen;
> > +              callbackRunnable->mTask->DispatchBoundary(nsDependentString(static_cast<char16_t*>(L"mark")), GetTickCount() - starting_time, currentIndex);
> 
> Wrap long lines

Done.
 
> @@ +291,5 @@
> > +        mVoice->SetInterest(event_mask, event_mask);
> > +
> > +        /*set the callback function for receiving the events*/
> > +        SPNOTIFYCALLBACK *cb = SpeechEventCallback;
> > +        mVoice->SetNotifyCallbackFunction(cb, 0, 0);
> 
> Can't you directly cast SpeechEventCallback here instead of an intermediate
> variable?

Done.

> @@ +315,5 @@
> > +
> > +        for (unsigned int i = 0; i < voice_count; i++){
> > +          SapiVoice *voice = new SapiVoice();
> > +          CComPtr<ISpObjectToken> voice_token;
> > +          if (S_OK != voice_tokens->Next(1, &voice_token, NULL)){
> 
> How about that condition being the while loop instead of a for loop?

Done.

> @@ +342,5 @@
> > +            else if (0 == _wcsicmp(gender_s, L"female")){
> > +              gender = TTS_GENDER_FEMALE;
> > +            }   
> > +          }
> > +          voice->mGender = gender;
> 
> This gender stuff looks useless.

Removed.
 
> @@ +345,5 @@
> > +          }
> > +          voice->mGender = gender;
> > +            
> > +            WCHAR *language = nullptr;
> > +            if (S_OK != attributes->GetStringValue(L"Language", &language)){
> 
> It looks like this returns an MS LangID, it needs to be converted to a BCP
> 47 language tag (eg. en-GB).

Will do that in the next version. Tried some in-Built methods, but no success.

> @@ +365,5 @@
> > +
> > +      void nsSapiService::RegisterVoices()
> > +      {
> > +        VoiceTraverserData data = { this, nsSynthVoiceRegistry::GetInstance() };
> > +        mVoices.Enumerate(SapiAddVoiceTraverser, &data);
> 
> Can't this happen in PopulateVoices instead of going over the list again
> here?

Done.

> ::: content/media/webspeech/synth/sapi/nsSapiService.h
> @@ +63,5 @@
> > +        bool mInitialized;
> > +
> > +        nsRefPtrHashtable<nsStringHashKey, SapiVoice> mVoices;
> > +
> > +        ISpVoice* mVoice;
> 
> The name is misleading, since it does not refer to the same distinct voices
> that we use. I would call this mSapiVoice or mSapiClient.

Renamed to mSapiClient.
Attached patch First version of the service (obsolete) — Splinter Review
worked on the feedback. 
except MS LangID still not converted to a BCP 47 language tag. will do that in the next version.
Attachment #8470414 - Attachment is obsolete: true
Attachment #8473735 - Flags: feedback?(eitan)
Comment on attachment 8473735 [details] [diff] [review]
First version of the service

Review of attachment 8473735 [details] [diff] [review]:
-----------------------------------------------------------------

Overall this is progress.

I am sure smaug will have a lot more feedback in his review. Before you request a review make sure that:
1. The language string is fixed.
2. Mochitests pass on Windows.

::: content/media/webspeech/synth/nsSpeechTask.cpp
@@ +454,5 @@
>      return;
>    }
>  
> +  if (mStream) {
> +	  mStream->ChangeExplicitBlockerCount(1);

whitespace

@@ +463,5 @@
>      NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onPause() callback");
>    }
>  
> +  if (!mIndirectAudio){
> +	  DispatchPauseImpl(GetCurrentTime(), GetCurrentCharOffset());

whitespace

@@ +477,5 @@
>      return;
>    }
>  
> +  if (mStream) {
> +	  mStream->ChangeExplicitBlockerCount(-1);

whitespace

@@ +486,5 @@
>      NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onResume() callback");
>    }
>  
> +  if (!mIndirectAudio){
> +	  DispatchResumeImpl(GetCurrentTime(), GetCurrentCharOffset());

whitespace

@@ +498,5 @@
>  
>    LOG(PR_LOG_DEBUG, ("nsSpeechTask::Cancel"));
>  
> +  if (mStream) {
> +	  mStream->ChangeExplicitBlockerCount(1);

whitespace

@@ +507,5 @@
>      NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onCancel() callback");
>    }
>  
> +  if (!mIndirectAudio){
> +	  DispatchEndImpl(GetCurrentTime(), GetCurrentCharOffset());

whitespace

::: content/media/webspeech/synth/sapi/nsSapiService.cpp
@@ +20,5 @@
> +#include "prenv.h"
> +
> +#include "mozilla/DebugOnly.h"
> +
> +double starting_time;

This still seems wrong to me, especially since starting_time is specific to each call of speak(). Please make this a field in the runnable class (have it public so service instance could access it).
Attachment #8473735 - Flags: feedback?(eitan) → feedback+
Made the code changes suggested in the last feedback. 

However, regarding the mochitest failing, I am getting some test timed out errors on running the mochitest for the synth directory(on two tests) on my system.

Eitan, I guess you were working on something similar here(https://bugzilla.mozilla.org/show_bug.cgi?id=1059472) . I tried after adding your patch also, but its not working.

I am also attaching the errors which I am getting with this patch.
Attachment #8473735 - Attachment is obsolete: true
Flags: needinfo?(eitan)
Here is the errors which I am getting on my system, when running the mochitest.
Flags: needinfo?(eitan)
I was finally able to build it and test.

First of all: this patch introduces a dependency on atlbase.h. In the past, I tried to get this to compile with visual studio express 2013 with no luck because apparently, that file is not included.

I decided this week to give it a second chance on a clean VM, and it builds fine. The 3 variables I could think of that would make this work are:
1. Something else in m-c introduced an include for atlbase.h and we somehow added a workaround for it.
2. A newer MozillaBuild is doing something different.
3. I built it with 32bit toolchain (i think), perhaps it is missing in 64bit?

I have no clue...

But I am still concerned about this introduction, and I am not sure how we should be dealing with this. Should we disable TTS support on windows during configure if no atlbase.h is found?

Ryan, what are the best practices for this?
Flags: needinfo?(ryanvm)
(In reply to Yash from comment #16)
> I am also attaching the errors which I am getting with this patch.

Those errors have to do with the fact that the SAPI voices have higher priority since they were added earlier. They are interfering with our voice selecting priority test. You will need to provide a preference for removing SAPI voices during mochitests, and disable it in the test's pushPrefEnv().
Comment on attachment 8513423 [details] [diff] [review]
Updated patch for the sapi service

Review of attachment 8513423 [details] [diff] [review]:
-----------------------------------------------------------------

Besides the whitespace comments, please rebase this on mozilla-central.
Also, the voice.lang attribute needs to be fixed and return a proper locale string. Maybe someone more familiar with windows could give better insight on how to fix it. But it is definitely a show stopper.

After you rebase, and address these issues. Please flag smaug for feedback, he should probably review this.

::: content/media/webspeech/synth/moz.build
@@ +41,5 @@
>  
> +    if CONFIG['MOZ_SYNTH_SAPI']:
> +	      DIRS = ['sapi']
> +	      CFLAGS += [
> +            '-wd4005'

Can't this be put in sapi/moz.build?

@@ +49,5 @@
>      'ipc/PSpeechSynthesis.ipdl',
>      'ipc/PSpeechSynthesisRequest.ipdl',
>  ]
>  
> +FAIL_ON_WARNINGS = False

Is this necessary?

::: content/media/webspeech/synth/nsSpeechTask.cpp
@@ +474,5 @@
>      DebugOnly<nsresult> rv = mCallback->OnPause();
>      NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onPause() callback");
>    }
>  
> +  if (!mIndirectAudio){

Add space between ) and {

@@ +497,5 @@
>      DebugOnly<nsresult> rv = mCallback->OnResume();
>      NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onResume() callback");
>    }
>  
> +  if (!mIndirectAudio){

Same

@@ +518,5 @@
>      DebugOnly<nsresult> rv = mCallback->OnCancel();
>      NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onCancel() callback");
>    }
>  
> +  if (!mIndirectAudio){

Same.

::: content/media/webspeech/synth/sapi/moz.build
@@ +6,5 @@
> +
> +UNIFIED_SOURCES += [
> +'nsSapiService.cpp',
> +'SapiModule.cpp'
> +]

Indentation.

::: content/media/webspeech/synth/sapi/nsSapiService.cpp
@@ +32,5 @@
> +  SapiVoice() {}
> +  ~SapiVoice() {}
> +  NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice)
> +  WCHAR *mName;
> +  WCHAR *mLanguage;

* goes near type.

@@ +64,5 @@
> +  nsRefPtr<nsSapiService> mService;
> +
> +  // A windows handle to receive speech events, passed on to WaitForEvents()
> +  HANDLE hEventsWaitHandle; 
> +  

Trailing whitespace

@@ +79,5 @@
> +void SapiCallbackRunnable::WaitForEvents(HANDLE hEventsWaitHandle)
> +{
> +  DWORD dwMilliseconds = INFINITE;
> +  bool fContinue = true;
> +  while (fContinue){

space between ) and {.

@@ +82,5 @@
> +  bool fContinue = true;
> +  while (fContinue){
> +    DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hEventsWaitHandle, dwMilliseconds,
> +                                                   QS_ALLINPUT, MWMO_INPUTAVAILABLE);
> +    switch (dwWaitId){

Same.

@@ +83,5 @@
> +  while (fContinue){
> +    DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hEventsWaitHandle, dwMilliseconds,
> +                                                   QS_ALLINPUT, MWMO_INPUTAVAILABLE);
> +    switch (dwWaitId){
> +    case WAIT_OBJECT_0:{

Same.

@@ +87,5 @@
> +    case WAIT_OBJECT_0:{
> +      fContinue = false;
> +      break;
> +    }
> +    case WAIT_OBJECT_0 + 1:{

Same. Not sure I like this + in a switch statement. There is no proper constant to use?

@@ +89,5 @@
> +      break;
> +    }
> +    case WAIT_OBJECT_0 + 1:{
> +      MSG Msg;
> +      while (::PeekMessage(&Msg, NULL, 0, 0, PM_REMOVE)){

Same.

@@ +95,5 @@
> +        ::DispatchMessage(&Msg);
> +      }
> +      break;
> +    }
> +    case WAIT_TIMEOUT:{

Same.

@@ +99,5 @@
> +    case WAIT_TIMEOUT:{
> +      fContinue = false;
> +      break;
> +    }
> +    default:{

Same. I think we don't do curly brackets if there are no scoped variables in the switch statement.

@@ +110,5 @@
> +}
> +
> +NS_IMETHODIMP SapiCallbackRunnable::Run()
> +{
> +  if (!IsCurrentTask()){

Same.

@@ +117,5 @@
> +  }
> +
> +  nsISpeechTask* task = mTask;
> +  task->Setup(this, 1, 16000, 2);	//the last three parameters doesn't matter
> +                                  //for an indirect service

Either align these comments or just make it one above the Setup() line.

@@ +128,5 @@
> +
> +NS_IMETHODIMP
> +SapiCallbackRunnable::OnPause()
> +{
> +  if (S_OK != mService->mSapiClient->Pause()){

Same.

@@ +160,5 @@
> +  mService->mSapiClient->Speak(L"", SPF_PURGEBEFORESPEAK | SPF_ASYNC, NULL);
> +
> +  mService->mCurrentTask = nullptr;
> +  mTask->DispatchEnd(GetTickCount() - starting_time, mService->currentIndex);
> +  if (mService->isPaused){

Same. You get the picture, fix this everywhere :) I'll stop annotating these.

@@ +263,5 @@
> +
> +  mSapiClient->SetInterest(event_mask, event_mask);
> +
> +  /*set the callback function for receiving the events*/
> +  mSapiClient->SetNotifyCallbackFunction((SPNOTIFYCALLBACK *)SpeechEventCallback, 0, 0);

No space in cast (SPNOTIFYCALLBACK*)

@@ +285,5 @@
> +  unsigned long voice_count, i = 0;
> +  voice_tokens->GetCount(&voice_count);
> +
> +  while(i < voice_count){
> +    SapiVoice *voice = new SapiVoice();

* goes near type.

@@ +291,5 @@
> +    if (S_OK != voice_tokens->Next(1, &voice_token, NULL)){
> +      return;
> +    }
> +
> +    WCHAR *description = nullptr;

Same.

@@ +304,5 @@
> +      NS_WARNING("unable to get voice attributes from sapi");
> +      return;
> +    }
> +
> +    WCHAR *language = nullptr;

Same.

@@ +378,5 @@
> +{
> +  MOZ_ASSERT(NS_IsMainThread());
> +  if (XRE_GetProcessType() != GeckoProcessType_Default) {
> +    MOZ_ASSERT(false, "nsSapiService can only be started on main gecko process");
> +      return nullptr;

indentation.
+void SapiCallbackRunnable::WaitForEvents(HANDLE hEventsWaitHandle)
+{
+  DWORD dwMilliseconds = INFINITE;
+  bool fContinue = true;
+  while (fContinue){
+    DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hEventsWaitHandle, dwMilliseconds,
+                                                   QS_ALLINPUT, MWMO_INPUTAVAILABLE);
+    switch (dwWaitId){
+    case WAIT_OBJECT_0:{
+      fContinue = false;
+      break;
+    }
+    case WAIT_OBJECT_0 + 1:{
+      MSG Msg;
+      while (::PeekMessage(&Msg, NULL, 0, 0, PM_REMOVE)){
+        ::TranslateMessage(&Msg);
+        ::DispatchMessage(&Msg);
+      }
+      break;
+    }
+    case WAIT_TIMEOUT:{
+      fContinue = false;
+      break;
+    }
+    default:{
+      fContinue = false;
+      break;
+    }
+    }
+  }
+  return;
+}

Is this getting called on the main thread? (It looks like it since we dispatch the runnable to the main thread.)

+  callbackRunnable = new SapiCallbackRunnable(final_text, aTask, this);
+  return NS_DispatchToMainThread(callbackRunnable);

In which case this is dangerous. Can we move it to a background thread?
Comment on attachment 8513423 [details] [diff] [review]
Updated patch for the sapi service

Review of attachment 8513423 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/webspeech/synth/sapi/nsSapiService.cpp
@@ +242,5 @@
> +{
> +  MOZ_ASSERT(!mInitialized);
> +
> +  /*initialize the COM*/
> +  if (FAILED(::CoInitialize(NULL))){

You shouldn't need this, and there's a risk here that if it fails for some reason, you still call couninitialize in the destructor, which would leave our coinit count unbalanced. If you do call this, you should track the success of the call and always match that to the uninitialize. (Actually looking at this more, if this did fail we'd just crash in destructor as it's currently defined, which isn't much better.)

@@ +326,5 @@
> +                                  nsDependentString(static_cast<char16_t*>(voice->mName)),
> +                                  nsDependentString(static_cast<char16_t*>(voice->mLanguage)),
> +                                  true);
> +    NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Failed to add voice");
> +    if(i==0){

nit - if (!i) {

also please use something like 'idx' or 'index'.

@@ +355,5 @@
> +
> +  original_text = aText.BeginReading();
> +
> +  /*set the pitch using xml*/
> +  std::wstring pitch = std::to_wstring(static_cast<long long>(aPitch * 10 - 10));

I don't think std::wstring is approved for use in mozilla code. Look at the mozilla string classes as a replacement.

@@ +388,5 @@
> +}
> +
> +nsSapiService::~nsSapiService()
> +{
> +  mSapiClient->Release();

Isn't it possible for mSapiClient to be null here? What versions of Windows support ISpVoice?
(In reply to Eitan Isaacson [:eeejay] from comment #18)
> Ryan, what are the best practices for this?

My understanding is that it doesn't come with the Express editions. However, the MSVC 2013 Community Edition (functionally equivalent to Professional but with a free for open source development license) has largely usurped Express AFAICT and definitely does come with it. Given that we've officially dropped support for anything older than MSVC2013, I'd just steer people to the Community Edition and get on with life, personally :)

I'm not sure if there's catches to implicitly dropping support for the Express edition that I'm not aware of, though. Nor am I sure who the right person to answer that would be either. Is this a feature that could be disabled via .mozconfig if such a need were to arise? Maybe that would suffice.
Flags: needinfo?(ryanvm)
(In reply to Eitan Isaacson [:eeejay] from comment #18)
> First of all: this patch introduces a dependency on atlbase.h.

Based on a cursory glance, the only ATL thing the patch uses is CComPtr. The ATL dependency can be dropped by switching to Microsoft::WRL::ComPtr.
(In reply to Birunthan Mohanathas [:poiru] from comment #24)
> (In reply to Eitan Isaacson [:eeejay] from comment #18)
> > First of all: this patch introduces a dependency on atlbase.h.
> 
> Based on a cursory glance, the only ATL thing the patch uses is CComPtr. The
> ATL dependency can be dropped by switching to Microsoft::WRL::ComPtr.

There's no need for this dependency, the uses of CComPtr should simply be removed. COM pointers can be stored in nsRefPtr if need be.
(In reply to Jim Mathies [:jimm] from comment #25)
> (In reply to Birunthan Mohanathas [:poiru] from comment #24)
> > (In reply to Eitan Isaacson [:eeejay] from comment #18)
> > > First of all: this patch introduces a dependency on atlbase.h.
> > 
> > Based on a cursory glance, the only ATL thing the patch uses is CComPtr. The
> > ATL dependency can be dropped by switching to Microsoft::WRL::ComPtr.
> 
> There's no need for this dependency, the uses of CComPtr should simply be
> removed. COM pointers can be stored in nsRefPtr if need be.

I think I would need help on that.
Flags: needinfo?(jmathies)
(In reply to Yash from comment #26)
> (In reply to Jim Mathies [:jimm] from comment #25)
> > (In reply to Birunthan Mohanathas [:poiru] from comment #24)
> > > (In reply to Eitan Isaacson [:eeejay] from comment #18)
> > > > First of all: this patch introduces a dependency on atlbase.h.
> > > 
> > > Based on a cursory glance, the only ATL thing the patch uses is CComPtr. The
> > > ATL dependency can be dropped by switching to Microsoft::WRL::ComPtr.
> > 
> > There's no need for this dependency, the uses of CComPtr should simply be
> > removed. COM pointers can be stored in nsRefPtr if need be.
> 
> I think I would need help on that.

We do this throughout the code base, here are some example in hal of using nsrefptr to store a com pointer - 

http://mxr.mozilla.org/mozilla-central/search?find=%2Fhal%2Fwindows%2F&string=nsRefPtr
Flags: needinfo?(jmathies)
Attached patch Updated patch for the service. (obsolete) — Splinter Review
Worked on the feedback provided and also removed the ATL dependency. 

But, the application is crashing on pause, remove and resume buttons. I tried to run it with the last successful patch, but it's still resulting in a crash. So, I guess the recent changes to remove the atl dependency are not the cause behind the crash.

I have not been getting enough time to debug it as it's my last month of graduation and there is so much going on. 
I don't want to hold up this patch. So, I am removing myself as the assignee.
 
Anyone who is interested can take this up.
Attachment #8513423 - Attachment is obsolete: true
Assignee: yash.girdhar → nobody
Status: ASSIGNED → NEW
Whiteboard: [webspeechapi]
Depends on: 1160844
Attached patch Implement Windows SAPI backend (obsolete) — Splinter Review
based on yash's code.

At first, we shouldn't use nsRunnable for SAPI implementation.  Speak method is async and this COM interface has callback function for notifications such as start and end, so it is unnecessary to use nsRunnable.

Now although I discuss with Eitan about current mochitests, some tests has bugs that it doesn't work with any backend even if Pico for B2G.
Assignee: nobody → m_kato
Attachment #8592161 - Attachment is obsolete: true
Attachment #8602033 - Attachment is obsolete: true
Comment on attachment 8603164 [details] [diff] [review]
Implement Windows SAPI backend v1.1

based no Yash's code.
Attachment #8603164 - Flags: review?(jmathies)
Comment on attachment 8603164 [details] [diff] [review]
Implement Windows SAPI backend v1.1

Review of attachment 8603164 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webspeech/synth/windows/SapiService.cpp
@@ +98,5 @@
> +  // After cancel, mCurrentIndex may be updated.
> +  // At cancel case, use mCurrentIndex for DispatchEnd.
> +  mSpeakTextLen = 0;
> +  // Purge all the previous utterances and speak an empty string
> +  if (FAILED(mSapiClient->Speak(nullptr, SPF_PURGEBEFORESPEAK, nullptr))) {

any risk of jank here?

@@ +347,5 @@
> +{
> +  MOZ_ASSERT(NS_IsMainThread());
> +  if (XRE_GetProcessType() != GeckoProcessType_Default) {
> +    MOZ_ASSERT(false,
> +               "SapiService can only be started on main gecko process");

will this trigger on startup in content processes?
Attachment #8603164 - Flags: review?(jmathies) → review+
Hey Makoto,

Looks like this may be ready to land? Do you have answers about jank/startup?
Flags: needinfo?(m_kato)
Vladan, if we land this on nightly and there is jank will that be automagically caught via tooling?
Flags: needinfo?(vdjeric)
(In reply to Jim Mathies [:jimm] from comment #32)
> Comment on attachment 8603164 [details] [diff] [review]
> Implement Windows SAPI backend v1.1
> 
> Review of attachment 8603164 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/webspeech/synth/windows/SapiService.cpp
> @@ +98,5 @@
> > +  // After cancel, mCurrentIndex may be updated.
> > +  // At cancel case, use mCurrentIndex for DispatchEnd.
> > +  mSpeakTextLen = 0;
> > +  // Purge all the previous utterances and speak an empty string
> > +  if (FAILED(mSapiClient->Speak(nullptr, SPF_PURGEBEFORESPEAK, nullptr))) {
> 
> any risk of jank here?

No.  all speak queue is reset only

> @@ +347,5 @@
> > +{
> > +  MOZ_ASSERT(NS_IsMainThread());
> > +  if (XRE_GetProcessType() != GeckoProcessType_Default) {
> > +    MOZ_ASSERT(false,
> > +               "SapiService can only be started on main gecko process");
> 
> will this trigger on startup in content processes?

This implementation is XPCOM.  So it is possible that addon creates instance.
(In reply to Makoto Kato (:m_kato) from comment #35)
> > will this trigger on startup in content processes?
> 
> This implementation is XPCOM.  So it is possible that addon creates instance.

But I think that we can remove this.  I will file new bug if needed
(In reply to David Bolter [:davidb] from comment #34)
> Vladan, if we land this on nightly and there is jank will that be
> automagically caught via tooling?

If there is a regression in "hot" startup time (startup time without disk I/O cost), it will be flagged.

The jank from actually using the API will not be caught because existing Talos tests don't exercise text-to-speech functionality.
Flags: needinfo?(vdjeric)
We turn off webspeech prefs for Firefox desktop now.  And mochitest always uses Mock implementation.
Flags: needinfo?(m_kato)
https://hg.mozilla.org/mozilla-central/rev/cd54ba9f7d53
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
It seems this patch breaks compiling with VS2013 Express. It reports error:
> C:\Program Files (x86)\Windows Kits\8.1\include\um\sphelper.h(51) : fatal error C1083:
> Cannot open include file: 'atlbase.h': No such file or directory
when compiling Unified_cpp_synth_windows0.cpp. And it seems ATL is part of Visual Studio except Express Edition [1].

Well, I know we have official deprecated Express Edition... (And I eventually have to upgrade MozillaBuild and try VS2015 now)


[1] http://stackoverflow.com/a/16404536
> It seems this patch breaks compiling with VS2013 Express.

I just hit this as well.
Maybe we could detect lack of ATL during configure, and disable this code?
Why does Windows SDK use ATL? :-<

But I think that I can replace SpEnumToken with SAPI COM API, then I can remove sphelper.h.
Blocks: 1187155
patch is coming at bug 1187155 soon.
Depends on: 1221443
Depends on: 1228134
This has now been documented — see all the goodness at https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API! A technical review would be really nice, although I appreciate that it is a fairly big API, so we might want to share the work between the different engineers involved.
Depends on: 1230533
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: