Enable the Where'sMyPhone skill on your Alexa device here!
Ever lose your phone in your own house/apartment? Under a cushion, in the bathroom, in the microwave...
Ok, maybe that last one's just me. In any case, the time-honored human tradition of asking your friend to call your phone so you can find it seems out-of-place in this advanced world we live in. In 2016, who needs a human friend - we have Alexa!
Where'sMyPhone integrates your Alexa-enabled device with the Twilio Voice API, reducing the task of locating your phone to less than ten words, spoken to the small black cylinder on your shelf. Isn't it great living in the future?
UX
(VUI diagram here)
The user's experience with Where'sMyPhone starts with a simple but secure onboarding process. When the user launches Where'sMyPhone for the first time (Alexa, open Where'sMyPhone), the skill notifies the user that they have no phone number set, and asks them if they wish to set a number. When the user accepts, the skill prompts them to dictate their phone number. Phone numbers are added to a secure DynamoDB database, protecting them from hackers and tech-savvy telemarketers alike.
When the user finishes speaking their number, the skill notifies them that it will send a verification call to their phone. After the user answers their phone and presses 1 to verify their number (as prompted by the recorded voice of yours truly), the verification status of the phone is updated in the database, and the skill can now be used to call the phone normally.
Security and user privacy were always in mind during the development of this skill. Phone numbers are stored in the database with anonymized identifiers provided by Alexa, and there is no way for a user to directly query the database through the skill (or any other means).
The Nitty-Gritty
The Alexa part of this project is pretty ordinary - I used the wonderful alexa-app, my go-to library, to write the Lambda handler, with session variables keeping track of the state of the VUI.
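To make the session-variable idea concrete, here's a rough sketch of the onboarding flow as a plain state machine. It deliberately leaves alexa-app out so it can run on its own, and everything in it (the state names, the `nextTurn` function, the prompts) is made up for illustration rather than taken from the skill's source.

```javascript
// Hypothetical VUI states; the real skill's names may differ.
const STATES = {
  START: "START",
  ASK_SET_NUMBER: "ASK_SET_NUMBER",
  AWAIT_NUMBER: "AWAIT_NUMBER",
  AWAIT_VERIFICATION: "AWAIT_VERIFICATION",
};

// Given the session so far and what the user just said, return the speech
// to play and the session to carry into the next turn (null ends the session).
function nextTurn(session, input) {
  const state = session.state || STATES.START;
  switch (state) {
    case STATES.START:
      return {
        say: "You have no phone number set. Would you like to set one?",
        session: { state: STATES.ASK_SET_NUMBER },
      };
    case STATES.ASK_SET_NUMBER:
      if (input === "yes") {
        return {
          say: "Okay, what is your phone number?",
          session: { state: STATES.AWAIT_NUMBER },
        };
      }
      return { say: "Okay, goodbye.", session: null };
    case STATES.AWAIT_NUMBER:
      return {
        say: "Thanks. I'll call your phone now to verify it.",
        session: { state: STATES.AWAIT_VERIFICATION, number: input },
      };
    default:
      return { say: "Sorry, I didn't catch that.", session: null };
  }
}
```

In a real handler, the returned `session` object would be written back to the Alexa session at the end of each turn, so the next request picks up where this one left off.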
The Twilio side is where it starts to get interesting. Twilio uses an XML-based document format called TwiML to define the phone call experience (also a form of VUI!) When you place a call with the API, you specify a URL for a TwiML document to start off that call. As the person on the other end picks up, Twilio requests your TwiML document (which may contain instructions to speak, play an audio file, dial another number, etc.) and starts running it.
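For a feel of what a TwiML document looks like, here's a minimal sketch that builds one as a string. `Say` and `Hangup` are standard TwiML verbs; the helper names (`escapeXml`, `sayAndHangup`) and the message are just placeholders, not code from the skill.

```javascript
// Escape text so it is safe inside an XML element.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a TwiML document that speaks a message and then hangs up.
function sayAndHangup(message) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Say>${escapeXml(message)}</Say>`,
    "  <Hangup/>",
    "</Response>",
  ].join("\n");
}

console.log(sayAndHangup("Here is your phone!"));
```

Whatever serves the TwiML URL only has to return a document like this; Twilio takes care of actually speaking it on the call.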
Button interactions (para español, oprima dos!) are handled with the Gather TwiML "verb". Gather instructions tell Twilio to wait for a certain number of buttons to be pressed, and then POST the buttons pressed to an HTTP endpoint. That endpoint also gets all kinds of juicy information about the call, and it's expected to send back another TwiML document for Twilio to run. This system may seem overcomplicated at first, but since phone calls can conceivably run for hours, it makes sense to fetch instructions one step at a time rather than script the whole call up front.
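Here's a small sketch of both halves of that exchange: a TwiML document with a `Gather` that waits for one key press, and a parser for the form-encoded callback Twilio sends afterwards. `numDigits`, `action`, and `method` are real Gather attributes, and `Digits` and `CallSid` are real callback parameters; the function names and the URL are hypothetical.

```javascript
// TwiML that plays a prompt and waits for a single key press.
// `actionUrl` is where Twilio will POST the result (hypothetical here).
// `prompt` is assumed to contain no XML special characters.
function gatherOneDigit(prompt, actionUrl) {
  // Ampersands in the URL must be escaped inside an XML attribute.
  const action = actionUrl.replace(/&/g, "&amp;");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Gather numDigits="1" action="${action}" method="POST">`,
    `    <Say>${prompt}</Say>`,
    "  </Gather>",
    "</Response>",
  ].join("\n");
}

// Twilio sends the callback as a form-encoded body; the keys pressed
// arrive in the `Digits` parameter.
function parseGatherCallback(body) {
  const params = new URLSearchParams(body);
  return { digits: params.get("Digits"), callSid: params.get("CallSid") };
}
```

The callback body carries many more fields than the two picked out here; these are just the ones a simple press-one-or-two flow would need.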
When a user triggers the verification call through Where'sMyPhone, the skill looks up their number in the database and sends it to another Lambda function, the "caller" function. This function handles all the interactions with Twilio. It tells Twilio to call the number it's been given, but for the TwiML URL it specifies a special URL that actually just points back to itself (through Amazon API Gateway).
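As a sketch of what "telling Twilio to call a number" involves, the snippet below builds (but doesn't send) the request to Twilio's Calls REST endpoint, with the `Url` parameter pointing back at the function's own API Gateway address. The skill may well use Twilio's helper library instead of raw HTTP; the `buildCallRequest` name, the credentials, the phone numbers, and the API Gateway URL are all placeholders.

```javascript
// Build (but do not send) the HTTP request that asks Twilio to place a call.
// Twilio authenticates with HTTP Basic auth: account SID + auth token.
function buildCallRequest({ accountSid, authToken, from, to, twimlUrl }) {
  const body = new URLSearchParams({ To: to, From: from, Url: twimlUrl });
  const auth = Buffer.from(`${accountSid}:${authToken}`).toString("base64");
  return {
    method: "POST",
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls.json`,
    headers: {
      Authorization: `Basic ${auth}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: body.toString(),
  };
}

const request = buildCallRequest({
  accountSid: "ACxxxxxxxx", // placeholder
  authToken: "your_auth_token", // placeholder
  from: "+15550001111",
  to: "+15552223333",
  // Hypothetical API Gateway URL that routes back to the caller function.
  twimlUrl: "https://example.execute-api.us-east-1.amazonaws.com/prod/caller?state=verify",
});
console.log(request.url);
```

Passing the state as a query parameter on `twimlUrl` is what lets a single function tell a "give me the first TwiML document" request apart from a later callback.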
When the user picks up, the recorded message asks them to press one to verify their number, or press two if the call was sent in error. After the user chooses a number, the Gather verb sends it to the caller function through API Gateway, along with a state variable as a query parameter. The function checks the number that was pressed - if it's one, it updates the verification status of the phone number in the database, tells the user their phone is now verified, and hangs up. If the number is two, it deletes the number, apologizes for any inconvenience, and - you guessed it - hangs up. That's another cool thing about TwiML - you can generate it on the fly based on what happens in the call.
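The branching described above could look roughly like the sketch below. A `Map` stands in for the DynamoDB table, the `userId` argument is a hypothetical way of identifying the record (the post doesn't say how the callback is tied back to a user), and the fallback branch for any other key is an added assumption.

```javascript
// In-memory stand-in for the DynamoDB table, keyed by the anonymized user ID.
const phones = new Map([
  ["user-abc", { number: "+15552223333", verified: false }],
]);

// Wrap a message in a TwiML document that speaks it and hangs up.
function twiml(message) {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    `<Response><Say>${message}</Say><Hangup/></Response>`
  );
}

// Decide what to do based on the key pressed during the verification call,
// and generate the TwiML response on the fly.
function handleVerificationDigit(userId, digits) {
  if (digits === "1") {
    const record = phones.get(userId);
    if (record) record.verified = true;
    return twiml("Your phone is now verified. Goodbye.");
  }
  if (digits === "2") {
    phones.delete(userId);
    return twiml("Sorry for the inconvenience. Goodbye.");
  }
  // Assumption: any other key just ends the call without changing anything.
  return twiml("Sorry, I did not understand that. Goodbye.");
}
```

Because the response is built per request, the same endpoint can answer with a different document for each outcome instead of serving a fixed script.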