
Designing Multimodal Experiences: UX Beyond the Screen

Designing for voice, gesture, and spatial interfaces alongside screens. A practical UX framework for multimodal products.

Published: 23 March 2026 • Updated: 23 March 2026 • 6 min read
[Image: person interacting with multiple devices through voice, gesture and screen-based interfaces at the same time]

Screens are not disappearing, but they are no longer the only surface that matters. Voice assistants handle millions of daily queries. Gesture-based interactions are now standard on phones and are showing up in spatial computing. Haptic feedback can carry information without demanding visual attention. Smart environments respond to presence and context.

For UX designers, that changes the brief. Designing for a single screen is not enough. The task now is to make experiences hold together across screen, voice, gesture, haptics and spatial interfaces, sometimes all at once.

What multimodal UX actually means

Multimodal UX is not about building for every interface under the sun. It is about designing for the modalities your users actually meet, and making sure they work together instead of competing.

A simple example: a cooking app shows recipe steps on screen, reads the next step aloud when a user whose hands are covered in flour asks for it, and sends a haptic pulse when the timer ends. Each modality does a specific job. Together, they do something a single channel cannot.

The three types of multimodal interaction

  1. Sequential multimodal: the user moves between modalities over time, starting on a phone, continuing on a smart speaker, then finishing on a laptop.
  2. Simultaneous multimodal: the user engages two or more modalities at the same time, such as looking at a screen while issuing voice commands.
  3. Complementary multimodal: different modalities handle different parts of the task, with the screen carrying visual information, haptics handling alerts and voice covering hands-free control.

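Which one you are designing for shapes the rest of the work: sequential journeys depend on clean handoffs, simultaneous ones on modalities that never contradict each other, and complementary ones on a clear split of responsibilities.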

A framework for multimodal design decisions

Not every product needs every modality. This framework helps narrow the field and keep the design grounded.

Step 1: Map the user's context

For each key task, document the environment, attention, hands, social context and accessibility needs.

Ask where the user is: at a desk, in a car, in a kitchen, in a public space or walking. Work out how much visual or cognitive attention they really have. Check whether their hands are free, occupied or dirty. Note whether they are alone or in a shared or public space. Accessibility needs should be part of the map from the start, not added later.

That context map tells you which modalities are realistic. Voice works when hands are occupied, but it falls apart in noisy public spaces. A screen works when the user has visual attention, but not when driving. Haptics work almost anywhere, though they carry limited information.
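To make the context map actionable, each row can be recorded as data and filtered against the rules of thumb above. The Python sketch below is illustrative only: `TaskContext`, its fields and `realistic_modalities` are hypothetical names, and the rules are deliberately crude restatements of this section (a screen needs visual attention, touch needs free hands, voice needs a quiet-enough environment, haptics are nearly always available).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContext:
    """One row of a context map (hypothetical fields, not a standard schema)."""
    task: str
    visual_attention: bool  # can the user look at a screen right now?
    hands_free: bool        # are hands available for touch or gesture?
    noisy: bool             # would background noise defeat speech recognition?


def realistic_modalities(ctx: TaskContext) -> list[str]:
    """Keep only the modalities this context leaves open."""
    options = []
    if ctx.visual_attention:
        options.append("screen")
    if ctx.hands_free:
        options.append("touch")
    if not ctx.noisy:
        options.append("voice")
    options.append("haptics")  # works almost anywhere, but carries little information
    return options


kneading = TaskContext("read next recipe step", visual_attention=True, hands_free=False, noisy=False)
driving = TaskContext("hear next turn", visual_attention=False, hands_free=False, noisy=False)
print(realistic_modalities(kneading))  # ['screen', 'voice', 'haptics']
print(realistic_modalities(driving))   # ['voice', 'haptics']
```

Real context maps carry more nuance than four booleans; the point is that writing the constraints down makes the ruled-out modalities explicit.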

Step 2: Assign modalities to tasks

For each task, choose a primary modality and one or two fallbacks.

The primary modality is the one that fits the typical context. The first fallback covers the case where the primary is unavailable, and the second covers accessibility needs.

For navigation instructions, that might mean a visual map on screen as the primary, spoken turn-by-turn directions as the first fallback and haptic pulses for left and right turns as the second.
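The navigation example can be written down as a small plan object: the primary modality first, then the fallbacks in order, with the first one the current context allows taking over. This is a minimal sketch; `ModalityPlan` and `choose` are hypothetical names, and the set of available modalities is assumed to come from the context mapping in Step 1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModalityPlan:
    """Step 2 for one task: a primary modality plus ordered fallbacks (hypothetical)."""
    task: str
    primary: str
    fallbacks: tuple[str, ...]  # first: primary unavailable; second: accessibility

    def choose(self, available: set[str]) -> str | None:
        """Return the highest-priority modality the current context allows."""
        for modality in (self.primary, *self.fallbacks):
            if modality in available:
                return modality
        return None  # nothing planned fits this context: a gap to design for


navigation = ModalityPlan("turn-by-turn directions", "screen", ("voice", "haptics"))
print(navigation.choose({"screen", "voice", "haptics"}))  # screen
print(navigation.choose({"voice", "haptics"}))            # voice
print(navigation.choose({"haptics"}))                     # haptics
```

Returning `None` rather than guessing keeps uncovered contexts visible during design reviews.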

Step 3: Design the transitions

The hardest part of multimodal UX is the handoff between modalities. Users should be able to switch without losing context. That means state must stay synchronised across modalities, the user should not have to repeat information when switching, and each modality should acknowledge the current state rather than starting from scratch.
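One common way to keep modalities in sync is a single shared session state that every modality reads and writes, so a handoff only changes which modality is active, never where the user is. The sketch below assumes a hypothetical `RecipeSession` for the cooking example; a real product would also need to persist and sync this state between devices.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RecipeSession:
    """Single source of truth shared by screen, voice and any other modality."""
    steps: list[str]
    current: int = 0
    active_modality: str = "screen"

    def next_step(self) -> str:
        """Advance one step; every modality sees the same position afterwards."""
        if self.current < len(self.steps) - 1:
            self.current += 1
        return self.steps[self.current]

    def switch_to(self, modality: str) -> str:
        """Hand off without resetting, and acknowledge where the user already is."""
        self.active_modality = modality
        return f"Continuing at step {self.current + 1}: {self.steps[self.current]}"


session = RecipeSession(["Weigh the flour", "Add the eggs", "Knead for 10 minutes"])
session.next_step()                 # user taps "Next" on the screen
print(session.switch_to("voice"))   # Continuing at step 2: Add the eggs
```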

Step 4: Define the information hierarchy per modality

Each modality has different bandwidth. A screen can show a complex table. Voice can get across one or two key points. Haptics can signal yes, no or urgency levels. Design the information hierarchy for each modality separately, then keep the meaning aligned across all of them.
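A single piece of meaning, such as which kitchen timer needs attention, can be rendered at each modality's bandwidth from the same data. In this hypothetical sketch the screen gets every timer, voice gets only the most pressing one, and haptics get nothing but an urgency level; `Timer`, `render` and the 60-second urgency threshold are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Timer:
    label: str
    seconds_left: int


def render(timers: list[Timer], modality: str) -> str | list[str]:
    """Render the same timer state at each modality's bandwidth (assumes at least one timer)."""
    ranked = sorted(timers, key=lambda t: t.seconds_left)  # most pressing first
    soonest = ranked[0]
    if modality == "screen":
        # High bandwidth: every timer as a row, soonest first.
        return [f"{t.label}: {t.seconds_left // 60}:{t.seconds_left % 60:02d}" for t in ranked]
    if modality == "voice":
        # One key point: only the timer that matters next.
        s = soonest.seconds_left
        when = "in under a minute" if s < 60 else f"in about {round(s / 60)} min"
        return f"{soonest.label} finishes {when}."
    if modality == "haptics":
        # Lowest bandwidth: an urgency level, no content at all.
        return "urgent" if soonest.seconds_left <= 60 else "none"
    raise ValueError(f"unknown modality: {modality}")


timers = [Timer("Pasta", 540), Timer("Sauce", 45)]
print(render(timers, "screen"))   # ['Sauce: 0:45', 'Pasta: 9:00']
print(render(timers, "voice"))    # Sauce finishes in under a minute.
print(render(timers, "haptics"))  # urgent
```

Because all three outputs derive from one sorted list, the meaning stays aligned even though the level of detail differs.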

Designing for voice alongside screens

Voice interfaces are mature enough to be useful, but still limited enough to need careful design. In 2026, the common pattern is voice alongside a screen, not voice instead of one.

When voice works

Voice is a good fit for hands-free contexts such as cooking, driving and exercise. It also works for quick queries with simple answers, for accessibility when a user cannot see or interact with a screen, and for commands such as "play", "next", "set timer" and "call".

When voice falls short

It struggles with complex decisions that need comparison, because voice cannot show a table. Noisy environments degrade speech recognition. Private information is awkward to say or hear aloud in public spaces. Tasks that need precision, such as editing text or positioning elements, are also a poor fit.

Practical voice UX patterns

  • Confirm before acting: echo the interpretation before executing a command, as in "Setting timer for 15 minutes. Is that right?"
  • Offer escape hatches: always provide a screen-based alternative.
  • Keep responses short: as a rule of thumb, attention drops once spoken answers run beyond about 15 seconds.
  • Handle errors gracefully: "I didn't understand that. You can say X or Y" is better than silence.
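The confirm-before-acting and graceful-error patterns can be sketched with a toy command matcher. It is only an illustration: the regular expressions stand in for a real speech platform's intent recognition, and `respond` and the two supported phrases are hypothetical.

```python
import re

# Toy grammar standing in for a real intent recogniser (an assumption for this sketch).
TIMER = re.compile(r"set (?:a )?timer for (\d+) minutes?", re.IGNORECASE)
CANCEL = re.compile(r"cancel (?:the )?timer", re.IGNORECASE)


def respond(utterance: str) -> str:
    """Echo the interpretation before acting; on failure, say what will work."""
    match = TIMER.search(utterance)
    if match:
        minutes = int(match.group(1))
        unit = "minute" if minutes == 1 else "minutes"
        return f"Setting timer for {minutes} {unit}. Is that right?"
    if CANCEL.search(utterance):
        return "Cancelling your timer. Is that right?"
    # Never fail silently: offer concrete phrases the user can try next.
    return "I didn't understand that. You can say \"set timer for 10 minutes\" or \"cancel timer\"."


print(respond("Set timer for 15 minutes"))  # Setting timer for 15 minutes. Is that right?
print(respond("play some jazz"))
```

The error message only suggests phrases the matcher actually handles, so following the advice always works.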

Designing for gesture and spatial interfaces

Gesture interfaces run from phone swipe patterns to hand tracking in spatial computing. The problem is discoverability. Unlike buttons, gestures stay invisible until someone learns them.

Design principles for gesture

Start with simple, discoverable gestures and introduce more complex ones as users become more proficient. Use visual affordances (animation, ghost hands, tutorial overlays) to show what is possible. Build in forgiveness: every gesture-based action should be undoable. And every gesture needs a button or voice equivalent for accessibility.
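Forgiveness and equivalence can share one code path: the gesture and its button (or voice) equivalent call the same action, and every action is pushed onto an undo stack. `Inbox`, `swipe_archive` and `archive_button` are hypothetical names chosen for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Inbox:
    """Hypothetical list where a swipe archives an item and every swipe can be undone."""
    items: list[str]
    undo_stack: list[tuple[int, str]] = field(default_factory=list)  # (position, item)

    def swipe_archive(self, index: int) -> str:
        """Gesture handler: archive the item and remember how to restore it."""
        item = self.items.pop(index)
        self.undo_stack.append((index, item))
        return f"Archived {item!r}. Undo?"  # visible prompt offers the way back

    def archive_button(self, index: int) -> str:
        """Non-gesture equivalent: same action, same undo behaviour."""
        return self.swipe_archive(index)

    def undo(self) -> bool:
        """Restore the most recent archive to its original position."""
        if not self.undo_stack:
            return False
        index, item = self.undo_stack.pop()
        self.items.insert(index, item)
        return True


inbox = Inbox(["Invoice", "Newsletter", "Meeting notes"])
print(inbox.swipe_archive(1))  # Archived 'Newsletter'. Undo?
inbox.undo()
print(inbox.items)             # ['Invoice', 'Newsletter', 'Meeting notes']
```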

Spatial design considerations

Spatial computing, whether AR or VR, adds a third dimension and physical space to the design canvas. The basics still matter, only more so. Keep arm positions comfortable to avoid "gorilla arm" fatigue. Use depth to signal hierarchy and importance. Anchor spatial elements to the environment or to the user so they do not appear to float at random. Keep performance budgets strict, because spatial interfaces are extremely sensitive to latency and any lag breaks the illusion.

Haptic design patterns

Haptics are underused in most product design, even though they are one of the most efficient ways to carry simple signals.

Effective haptic patterns

A short pulse can confirm that an action succeeded. A distinct pattern, such as a double pulse or escalating vibration, can warn about errors or alerts. Directional pulses work for turn-by-turn guidance. Rhythmic pulses that change as a process completes can show progress.

Haptic design rules

Keep patterns simple and distinct from each other. Test on multiple devices, because haptic motors vary a lot. Never rely on haptics as the only communication channel. Let users customise or turn off haptic feedback.
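These patterns and rules can be encoded in a small pattern table. The sketch below stores patterns as alternating vibrate and pause durations in milliseconds, the same shape the browser Vibration API's `navigator.vibrate()` accepts; the specific timings, the crude distinctness check and `notify` are assumptions for illustration, and real timings need tuning on actual devices.

```python
from __future__ import annotations

# Hypothetical pattern table: alternating vibrate/pause durations in milliseconds.
PATTERNS: dict[str, list[int]] = {
    "success": [40],                  # single short pulse
    "error": [80, 60, 80],            # double pulse
    "alert": [60, 80, 120, 80, 200],  # escalating vibration
}


def signature(pattern: list[int]) -> tuple[int, int]:
    """Crude distinctness key: pulse count and total vibration time."""
    pulses = pattern[0::2]  # even positions are vibrations, odd positions are pauses
    return len(pulses), sum(pulses)


def patterns_are_distinct(patterns: dict[str, list[int]]) -> bool:
    """True if no two patterns share the same crude signature."""
    keys = [signature(p) for p in patterns.values()]
    return len(keys) == len(set(keys))


def notify(event: str, message: str, haptics_enabled: bool) -> dict[str, object]:
    """Pair every pulse with visible text, so haptics are never the only channel."""
    return {
        "text": message,
        # Respect the user's setting: no vibration at all when haptics are turned off.
        "vibration": PATTERNS[event] if haptics_enabled else None,
    }


print(patterns_are_distinct(PATTERNS))  # True
```

The signature check is intentionally coarse; it catches obvious collisions but is no substitute for feeling the patterns on several devices.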

Accessibility in multimodal design

Multimodal design can improve accessibility because it offers alternatives. A user who cannot see a screen can use voice. A user who cannot speak can use touch. A user with limited mobility can use voice or eye tracking.

The key principle is blunt: every critical action must be possible through at least two modalities. That is not just accessibility practice. It is good multimodal design, because any user can temporarily lose access to a modality in a noisy room, with full hands or in bright sunlight.
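The two-modality rule is easy to audit mechanically once critical actions are inventoried. In this hypothetical sketch, `CRITICAL_ACTIONS` lists which modalities can trigger each action, and any action reachable through fewer than two is flagged.

```python
from __future__ import annotations

# Hypothetical inventory of critical actions and the modalities that can trigger them.
CRITICAL_ACTIONS: dict[str, set[str]] = {
    "start timer": {"screen", "voice"},
    "next step": {"screen", "voice", "gesture"},
    "stop alarm": {"screen"},  # breaks the rule: useless with flour-covered hands
}


def single_channel_actions(actions: dict[str, set[str]], minimum: int = 2) -> list[str]:
    """List critical actions reachable through fewer than `minimum` modalities."""
    return sorted(name for name, modalities in actions.items() if len(modalities) < minimum)


print(single_channel_actions(CRITICAL_ACTIONS))  # ['stop alarm']
```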

For foundational accessibility guidance, refer to our creative audit checklist and the W3C Web Accessibility Initiative (WAI) fundamentals.

Testing multimodal experiences

Traditional usability testing tends to focus on a single interface. Multimodal testing needs a few extra checks. Test in the actual environments where the modalities will be used, or in realistic simulations. Check what happens when users switch between modalities mid-task. Measure cognitive load as well, because multimodal interactions can reduce it or add to it depending on the design. And walk through each modality on its own to make sure it still works as a standalone experience.

Common pitfalls

  • Modality overload: adding modalities because you can, not because they earn their place.
  • Inconsistent mental models: the voice interface uses different terms from the screen interface.
  • Ignoring fallbacks: assuming the primary modality will always be available.
  • Over-engineering: building complex multimodal flows when a plain screen interaction would do the job.

What to do next

Start with the user's actual contexts. Map the environments, tasks and constraints. Add modalities only where they genuinely help, not where they merely look impressive. If you need support designing for multimodal experiences, book a call or explore our services.

Written by CID Creative

Senior-led studio for brand systems, web delivery, and campaign creative. We focus on clarity, accessibility, and lightweight performance.

Last updated: 23 March 2026