How to Build a Text to Speech IOT Speaker

Motivation

This project is about building a remote-controlled speaker with text-to-speech ability. I started working on it because my home-automation project lacked the ability to send me spoken messages.

The speaker will expose a simple REST API for text to speech, protected by a username and password. Another goal of this project is keeping costs down: the budget is under $30.

Some possible usages for this kind of project are:
- integrate it with a home automation system

- use it with a platform like thingspeak as an alarm clock or integrate it with twitter or with various custom sensors

- use the API to integrate it with your custom application if you know a little programming

- learn something new as you build it

The building process overview:

For this project you'll need a speaker with built-in amplification and an audio jack input, a development board (I've used C.H.I.P. to keep the cost down), a 5V 2A power supply (we'll be powering both the development board and the speaker from it), USB cables, and an audio cable with two male jacks. If you follow the links you'll see where you can purchase the components.

As for the software, we'll be using Linux and Python. C.H.I.P. comes with Linux pre-installed; we'll use git to clone the project from GitHub and then configure it. So you'll need some experience working with Linux, some knowledge of Python programming, and basic networking.

Assembly

The assembly part is pretty simple: the development board and the speaker must both be powered. I've chosen a 5V 2A phone charger and a USB hub to achieve that, but anything will do as long as both are powered.

Besides power, an audio cable is needed to connect the audio output of the development board to the audio input of the speaker.

I haven't built an enclosure for the whole thing; you can leave it open or enclose it in a box as you see fit. For example, you can use a plastic box like this from eBay; you'll need to drill some holes for the cables and the speaker output.

Getting the Development Board Ready

By getting ready I mean:

- the operating system up and running

- wifi configured

- ssh enabled

- python, pip, git and festival installed

Every board has its own setup mechanism. Luckily, C.H.I.P. comes with a pre-installed operating system, and all you need to do is configure wifi and enable ssh. I won't get into details about how to configure those because it's different for every board. C.H.I.P. has setup documentation here.

To install the packages we need on a Debian-based development board, use apt-get like so (on Debian the pip package for Python 2 is named python-pip). I've assumed Python >= 2.7 is already installed.

sudo apt-get install -y python-pip git festival

Installing and Configuring the Software (python Server)

First we need to download the project and install python dependencies:

cd /root
git clone https://github.com/danionescu0/home-automation
cd home-automation/remote-speaker
sudo pip install -r requirements.txt

This project is part of a larger home-automation project. We won't be talking about that now, but feel free to explore it and ask questions if you like.

Then comes the configuration part: by editing config.py you can change the default port, username and password for the API authentication. The port can remain 80, but the user/pass combination must be changed for security.

vi config.py
host = '0.0.0.0' 
port = 80 
http_user = 'test_user' 
http_pass = 'test_pass'

The last part is making our web server start automatically on boot. We'll be using systemd (through systemctl) to do that:

vi remote-speaker.service 
[Unit] 
Description=Starts remote speaker 
After=network-online.target 
[Service] 
Type=simple 
ExecStart=/usr/bin/python "/root/projects/remote-speaker/server.py" 
[Install] 
WantedBy=multi-user.target

Replace /root/projects/remote-speaker/server.py with your own path, like: /root/home-automation/remote-speaker/server.py

Now copy the service configuration to /etc/systemd/system/ and enable it:

cp remote-speaker.service /etc/systemd/system/
sudo systemctl enable remote-speaker.service
sudo systemctl start remote-speaker.service
sudo systemctl status remote-speaker.service

If all went well you will see something like this:

● remote-speaker.service - Starts remote speaker 
  Loaded: loaded (/etc/systemd/system/remote-speaker.service; enabled) 
  Active: active (running) since Mon 2017-06-19 05:39:58 UTC; 2s ago 
Main PID: 29626 (python) 
  CGroup: /system.slice/remote-speaker.service 
          └─29626 /usr/bin/python /root/projects/remote-speaker/server.py 
Jun 19 05:39:58 chip systemd[1]: Started Starts remote speaker. 
Jun 19 05:39:58 chip python[29626]: Bottle v0.12.13 server starting up (using WSGIRefServer())... 
Jun 19 05:39:58 chip python[29626]: Listening on http://0.0.0.0:80/ 
Jun 19 05:39:58 chip python[29626]: Hit Ctrl-C to quit.

Testing the Implementation

OK, so now our Python server is listening on port 80 for incoming API calls, and we're going to test that. But first, a few things about how to call the service. There are two options:

a) Call the service from the local network, for example from your laptop which happens to be on the same wireless network as the development board. In this case you need to find the board's internal IP address:

ifconfig
.............................................................
wlan0     Link encap:Ethernet  HWaddr cc:79:cf:23:74:e9
          inet addr:192.168.0.107  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::ce79:cfff:fe23:74e9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
.............................................................

So in this case the local IP address is "192.168.0.107". To test if the API is working, you can use curl to make an HTTP request on the local network like so:

sudo apt-get install -y curl
....
curl -i --user test_user:test_pass "http://192.168.0.107/api/tts?text=my+speech+test%0D%0A&times=1"
HTTP/1.0 200 OK
Date: Tue, 20 Jun 2017 18:49:27 GMT
Server: WSGIServer/0.1 Python/2.7.9
Content-Length: 0
Content-Type: text/html; charset=UTF-8

--user adds the authentication header with username "test_user" and password "test_pass". Then comes the URL of the GET request, composed of:
* protocol: "http"

* ip address: 192.168.0.107

* implicit port: 80

* url: /api/tts

* URL-encoded GET parameters text and times: text is the text to be spoken, times is how many times to repeat it

If you don't hear the voice playing, go to the development board and issue:

sudo systemctl status remote-speaker.service

and it will print some debug lines from the server log.

b) Call the service from the outside world. You will need to know your external IP and then configure port forwarding in your router. The easiest way to find your external IP is to search Google for "what is my ip". Port forwarding is a bit more complicated; you can check this tutorial for more details.

Basically, you tell the router to forward an external port (80, or some other port) to port 80 on the local IP "192.168.0.107". The configuration differs for every router interface.
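
The curl request above can also be issued from a script. Below is a minimal Python 3 sketch using only the standard library. It assumes the server accepts standard HTTP basic authentication (which is what curl's --user sends) and reuses the example IP address and test credentials from above. The function names build_tts_request and say are made up for this illustration and are not part of the project.

```python
import base64
import urllib.parse
import urllib.request


def build_tts_request(host, text, times, user, password):
    """Build a GET request for /api/tts carrying a basic-auth header."""
    # urlencode escapes spaces as "+", like the curl example
    query = urllib.parse.urlencode({"text": text, "times": times})
    url = "http://{}/api/tts?{}".format(host, query)
    # Same kind of header that curl's --user option generates
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def say(host, text, times=1, user="test_user", password="test_pass"):
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for error responses
    such as a rejected username or password.
    """
    request = build_tts_request(host, text, times, user, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example call (needs the speaker to be reachable on the network):
# say("192.168.0.107", "my speech test")
```

When calling from the outside world, the host argument would be your external IP (plus ":port" if you forwarded a port other than 80).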

How Does It Work (optional)

You can safely skip this if you're not interested in the code.

The project is written in Python and it's actually very simple. We have two main files with the code: server.py and TextToSpeech.py. server.py loads the configuration and starts an HTTP server using the Bottle framework. It exposes one route, "/api/tts".

When called, it checks the authorization header and verifies it against the config, then calls the TextToSpeech object. TextToSpeech.py contains a wrapper around the festival command, which you can try out in the console:

echo "some text" | festival --tts

So that's about it; the rest of the files are for configuration.
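
To make the flow described above more concrete, here is a rough Python 3 sketch of the two steps: verifying a basic-auth header against the configured credentials, and piping text to festival. This is not the project's actual server.py or TextToSpeech.py. The function names, the runner parameter, and the assumption that the header uses standard HTTP basic authentication are all made up for illustration.

```python
import base64
import hmac
import subprocess


def is_authorized(auth_header, user, password):
    """Return True if a 'Basic ...' Authorization header matches user/password."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):]).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    expected = "{}:{}".format(user, password)
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


def speak(text, times=1, runner=subprocess.run):
    """Pipe text to `festival --tts`, repeating it `times` times.

    `runner` is a hypothetical hook so the function can be tried
    on a machine without festival installed.
    """
    for _ in range(times):
        # Roughly what this shell command does: echo "some text" | festival --tts
        runner(["festival", "--tts"], input=text.encode("utf-8"), check=True)
```

On the board itself, speak("some text") would run festival directly; passing a different runner lets you check the logic elsewhere.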

Using the Speaker

We've talked about how to build the thing, but what can it be used for? I'll try to give some ideas and examples:

If you know a little programming, you can write a small Python script to do anything with this: alarms, reading your email, telling you jokes, notifications about your Facebook account, etc.
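
As one example of such a script, here is a hedged Python 3 sketch of a simple alarm. It assumes the speaker is reachable at the example address 192.168.0.107 with the default test credentials and standard HTTP basic authentication. seconds_until and wake_up are hypothetical helper names, and the 07:00 alarm time is arbitrary.

```python
import base64
import datetime
import time
import urllib.parse
import urllib.request


def seconds_until(hour, minute, now):
    """Seconds from `now` until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += datetime.timedelta(days=1)  # already passed today
    return (target - now).total_seconds()


def wake_up(host="192.168.0.107", user="test_user", password="test_pass"):
    """Wait until 07:00, then ask the speaker to say the message twice."""
    time.sleep(seconds_until(7, 0, datetime.datetime.now()))
    query = urllib.parse.urlencode({"text": "please wake up", "times": 2})
    request = urllib.request.Request("http://{}/api/tts?{}".format(host, query))
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    urllib.request.urlopen(request, timeout=10).close()
```

Calling wake_up() blocks until the alarm time, so a script like this would typically be started in the background or wrapped in a loop for a daily alarm.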

If your programming skills are not so great, you can use something like ThingSpeak. ThingSpeak is an open source “Internet of Things” web application. If you don't have an account, visit the account creation page.

ThingSpeak lets you define "actions" in the system. These actions can be HTTP calls, and they are called "thinghttp". So we can define an HTTP call that makes the speaker say "please wake up" and use "timecontrols" to define when the call should be made; it supports one-time or recurring calls. The interface is pretty easy to use. Please check the video for details on how to use this part of the ThingSpeak web interface.

But that's not all this online tool can do: you can use an app called "thingtweets" to trigger the HTTP call defined earlier when you post something on Twitter. A more complicated tool is "reacts"; with it you can define thresholds that trigger an action when a sensor attached to ThingSpeak reports some value.

This is a powerful and versatile tool for something you can build for under $30. It enables voice notifications via an API, so it opens a world of possibilities if you have some coding skills.