Lqd's Journal

It's actually pronounced liquid!

Uncovering an invisible button on the iPhone, with sound

I’d like to explain an idea I had last summer for “adding a new button” to the iPhone thanks to the microphone, as an interaction design mockup. I initially wanted to cover it in an article dedicated to interaction design strategies for iPhone games, but it doesn’t look like I’ll be able to write that soon, so here is the first of them. It’ll be clearer than 140 characters at a time on Twitter.

I thought about this a week after Sonar Ruler was released (the site was down when I tried to find a link, so here’s a demo on Vimeo). This app made me think of other uncommon uses of the microphone. While using your voice has become a first-class citizen in the interaction landscape (or close to it), probably starting with science fiction a while back, I was trying to focus on sound in an indirect way, as a side-effect or by-product of the interaction.

The “Sonic Button”

I never really thought about a name for this, but SonicButton and “sonic tapping” are probably descriptive enough (suggestions are welcome and appreciated).

The concept, as the name and the title of this post suggest, is to use the microphone to listen to the sound of interactions with the iPhone, effectively turning the whole phone into a button: listen to taps, and especially taps made with a nail. The great thing about this is that it works anywhere on the phone: front, back or sides, the whole surface is available for the interaction. In turn, this gives users great flexibility: any finger can be used, in any orientation; you hold the phone as you normally would and tap wherever you want with whichever finger you want. You can even tap with the nail on the screen itself without registering a finger touch. As I said, the *whole* surface is available for you to use.

Of course, this is applicable to any phone and not just the iPhone, provided it has a microphone (pretty much all of them except the ones used by mimes) usable from an API (far less common). Android comes to mind as an additional platform, and surely you could name others. However, I only tested the microphone behavior on my iPhone 3G.

Mechanics

While I’m not an iPhone or Android developer (yet), I did the best I could to test this theory using existing applications. I used the SoundMeter app to see how the microphone reacted to this interaction under different conditions: portrait and landscape orientations, holding the phone with one and two hands, for regular use and games (where the grip is usually different), and in calm and noisy environments.

The typical sonic tap will obviously manifest itself as a short spike in the audio stream. Depending on the noise conditions, where and how you tap (with the nail or a fingertip), the intensity will of course be different, but in a calm room, hitting with a nail, I usually get between 25 and 35 dB for a soft-to-regular-strength sonic tap.
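
To make the detection part concrete, here is a minimal sketch (Swift, purely illustrative, not tied to any particular audio API): compute the level of each incoming audio buffer and flag it as a sonic tap when it jumps well above the recent background level. The normalized float samples and the 25 dB jump are assumptions on my part, the latter loosely inspired by the SoundMeter readings above.

```swift
import Foundation

// Rough sketch: `samples` is one buffer of normalized float samples (-1.0...1.0)
// coming from the microphone; levels are expressed in dB relative to full scale.
func bufferLevelDb(_ samples: [Float]) -> Float {
    // Root-mean-square of the buffer, converted to dB (0 dB = full scale).
    let meanSquare = samples.reduce(0) { $0 + $1 * $1 } / Float(max(samples.count, 1))
    return 20 * log10(max(sqrt(meanSquare), 1e-7)) // clamp to avoid log(0)
}

func isSonicTap(_ samples: [Float], backgroundDb: Float, minimumJumpDb: Float = 25) -> Bool {
    // A sonic tap is simply a buffer whose level jumps well above the recent background.
    return bufferLevelDb(samples) - backgroundDb >= minimumJumpDb
}
```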

Something interesting happens with a “game grip” (your hands, thumbs and index fingers pretty much covering every side of the phone). The microphone is obstructed and picks up sound at a high level (80-90 dB, out of what I think is a maximum of around 105 dB). Even here, with the microphone completely blocked, a sonic tap still registers as a spike, I believe in the same way that people hear their own voice even when they block their ear canals with their fingers: the sound travels through the skull rather than from the outside in (note: my biology knowledge is pretty limited, so this might be wrong). Here, I think the sonic tap travels through the body of the phone and the mic picks it up.

This is actually something that can be taken advantage of: in a noisy room, where a sound spike coming from the environment could be mistaken for a sonic tap, you can block the microphone deliberately and still use this interaction.

Analysis

To my eyes, the most interesting part of this is that it lets you interact with the phone through something other than the screen. Even though it’s only one button, a button that can be used without obscuring the view is really nice.

It’s also a discrete and simple event, and in that sense would be far easier to use than the accelerometer, for instance (which, depending on the use case, can be rather imprecise and tends to break down when used along two axes at once). It’s not that it’s hard to tilt your phone, it’s that it’s hard to tilt by just the amount you need to do what you want (an amount that is also app-dependent), whereas a tap is a tap, in every app. Sure, the variety of environments and of implementation thresholds could turn this into the same non-deterministic behavior, but a strong sonic tap should generate a spike high enough to be detected by most implementations. We’ll see.

Just like with the accelerometer, one problem is that it would probably require calibration to match the user’s behavior and environment. Sensible defaults could be chosen for the tap strength, and an app could detect a noisy environment and take appropriate action, be it changing thresholds or asking the user to switch grips, for instance.
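
To illustrate what such calibration could look like (all names and constants here are mine, nothing official): keep a slowly-moving estimate of the ambient level, feed it only with non-tap buffers, and derive the tap threshold from it, so the same logic works in a quiet room, a noisy one, or with the mic blocked by the game grip. This builds on the bufferLevelDb() helper sketched earlier.

```swift
// Hypothetical calibration sketch; the smoothing factor and default jump are guesses.
struct TapCalibrator {
    private(set) var ambientDb: Float = -60   // running noise-floor estimate, in dB
    var jumpAboveAmbientDb: Float = 25        // sensible default, could be user-adjustable

    // Feed the level of each incoming buffer; returns true if it looks like a sonic tap.
    mutating func process(levelDb: Float) -> Bool {
        let isTap = levelDb - ambientDb >= jumpAboveAmbientDb
        if !isTap {
            // Only quiet buffers update the ambient estimate (a simple low-pass filter),
            // so taps themselves don't raise the threshold.
            ambientDb = 0.95 * ambientDb + 0.05 * levelDb
        }
        return isTap
    }
}
```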

It’s also about as hard to discover as it gets, otherwise I wouldn’t be talking about it. However, I don’t feel discoverability is such an issue: the usual in-app “tutorial” solves that kind of problem with ease most of the time, and even if tutorials are rarer in utility apps than in games, any app with a different enough UI offers one.

Once past the initial (really short) learning curve, I feel an interaction like this one would be fun and useful, and should offer a great experience to users, which is what the iPhone spirit is all about.

I see this being used in immersive apps (games and the like) for local consumption only, i.e. I don’t think it would be useful to broadcast the sonic tap event over the network, except maybe for a mini drum simulator played remotely from the back of your phone, or a human metronome, who knows.

The possible interactions

The most common way to hold a phone (in my own experience, and limited testing with real people) is in portrait mode, with one hand. It can be seen as a vertical version of the position called “the dealer’s grip” in the card-playing world. In this position the index finger is barely used for holding, resting most of the time close to the lock button (with the left hand; that button is not located there by chance) or on the back (and probably not lower than the Apple logo), while the other fingers touch the sides. This is the simplest case for hitting the SonicButton, on the back of the phone. It’s also possible to use the middle finger, but it’s not as comfortable, so it didn’t look like as good a choice in my tests (I only tested with a handful of people, though). As I said before, letting users choose means they naturally end up with the most comfortable finger and position for them; in practice I found this to be pretty powerful.

Using both hands in portrait mode can happen when the user is typing, on the web or writing an email or SMS, and only if the user is skilled at typing on the virtual keyboard (the small number of beginners I know all type roughly the same way: holding the phone with one hand and hitting the keyboard either with that hand’s thumb or with the index finger of the other hand. The latter is more common, and this also held true for people coming from phones with a physical portrait keyboard, a slider, or a clamshell). In this position, a very interesting situation comes up where you can still use your thumbs but hit the glass *under* the screen, to the left or right of the Home button. I did say you could sonic tap anywhere.

People rarely use landscape mode with a single hand, but it can happen. If the user is holding the phone without interacting with it, reading or watching a movie, their fingers aren’t in front of the screen and they can sonic tap on the back. If they are interacting with the phone, it’s usually with the thumb, and a sonic tap on the sides of the screen around the top speaker or Home button, or once again on the screen with the nail only, is doable. In practice I didn’t see this behavior in my limited testing, but it still works if you do use the phone that way.

When people use the phone with both hands in landscape mode, the grip matters (it can block the microphone), but basically you can sonic tap with your thumbs on the sides of the screen on the front, or with your index fingers on the back of the phone. Using the “game grip”, you can also use your index fingers on the top edge or, depending on your dexterity, your middle fingers on the lower part of the back, close to the bottom edge.

As with regular taps, you can have multiple sonic taps, even though I suspect filtering will change the way the following taps are detected.
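
For what it’s worth, here is a sketch of how consecutive spikes could be grouped into double taps; all the timing constants are pure guesses. A spike too close to the previous one is treated as the same physical tap still ringing, one arriving within a short window completes a double tap, and anything later starts a new gesture.

```swift
import Foundation

// Hypothetical double-tap detector for sonic taps; window lengths are illustrative.
final class DoubleTapDetector {
    private var previousSpike: Date?
    private let minimumGap: TimeInterval = 0.08  // closer than this: same tap, still ringing
    private let maximumGap: TimeInterval = 0.40  // farther than this: a new, separate tap

    /// Feed each detected spike; returns true when it completes a double tap.
    func registerSpike(at now: Date = Date()) -> Bool {
        if let previous = previousSpike {
            let gap = now.timeIntervalSince(previous)
            if gap < minimumGap { return false }  // echo of the same tap, ignore it
            if gap <= maximumGap {
                previousSpike = nil               // consume both spikes
                return true
            }
        }
        previousSpike = now                       // first tap of a potential pair
        return false
    }
}
```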

What’s also interesting is that this allows eyes-free interaction, like a real physical button; since environmental noise could be an issue there, a double tap might be the gesture to use. It’s not that interesting in practice, though, because it would mean an app running while the phone is in a pocket, unlocked, and with the microphone and sound processing active, the battery probably wouldn’t last long.

Implementation

As this is an interaction concept, the only pointers I can give are Audio Queue Services inside Core Audio for the iPhone, and Stephen Celis’ sc_listener, which seems to be the perfect candidate for the job.
I don’t know if you can get a live stream on Android, but AudioRecord, MediaRecorder, and this tutorial could certainly be a start.
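
For anyone on a current SDK, here is a minimal sketch of one simple route (level metering on AVAudioRecorder, a lighter-weight alternative to raw Audio Queue Services): poll the meters and treat a loud enough peak as a sonic tap. The threshold, polling interval and recording settings are guesses to tune per app, and a real app would also need the microphone usage permission.

```swift
import AVFoundation

// Illustrative sketch only: record to /dev/null so no audio is kept, enable
// metering, and poll the peak level on a timer.
final class SonicTapListener {
    private var recorder: AVAudioRecorder?
    private var timer: Timer?
    var onTap: (() -> Void)?

    func start() throws {
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .default, options: [])
        try session.setActive(true)

        let settings: [String: Any] = [
            AVFormatIDKey: Int(kAudioFormatAppleIMA4),
            AVSampleRateKey: 44_100.0,
            AVNumberOfChannelsKey: 1,
        ]
        let recorder = try AVAudioRecorder(url: URL(fileURLWithPath: "/dev/null"),
                                           settings: settings)
        recorder.isMeteringEnabled = true
        recorder.record()
        self.recorder = recorder

        // Poll the meters ~30 times per second and fire on a strong enough peak.
        timer = Timer.scheduledTimer(withTimeInterval: 0.03, repeats: true) { [weak self] _ in
            guard let self = self, let recorder = self.recorder else { return }
            recorder.updateMeters()
            // peakPower is in dB relative to full scale (0 dB = maximum);
            // -10 dB is an arbitrary "strong spike" threshold.
            if recorder.peakPower(forChannel: 0) > -10 {
                self.onTap?()
            }
        }
    }

    func stop() {
        timer?.invalidate()
        recorder?.stop()
        recorder = nil
    }
}
```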

In conclusion

This sonic button is closely related to Chris Harrison’s ScratchInput: I actually read his project page when it came out, and I remembered it and looked it up again after coming up with this. The mechanics are roughly the same and rely on the exact same principles. In ScratchInput, though, the listening for scratches on surfaces is done with custom hardware (because scratching is a lot softer than tapping), and it could be built into mobile phones to turn them into passive listening devices (broadcasting data over the local WiFi network) that make a regular surface scratch-enabled; whereas what I presented here focuses on taps happening on the phone itself (not on a wall or desk), with the input data actively used by the iPhone and its apps to enable new features. So in my mind they’re really close, part of the same interaction family, but I wouldn’t say they’re exactly the same.

I’d love to get feedback on the concept; you can find me at twitter.com/lqd.