Personal Projects · In Development

Synesthesia Machine

An experimental project translating audio signals into visual representations in real time.

Python · Audio Processing · Visualization

This project is currently in development

What It Will Do

The Synesthesia Machine translates sound into sight in real time. Feed it music, speech, ambient noise, or any audio signal, and it generates a live visual representation that maps the sonic characteristics to color, shape, movement, and texture. It's not a waveform visualizer or a spectrum analyzer. Those show you what sound looks like as data. This shows you what sound feels like as an experience.

The core idea draws from synesthesia, the neurological phenomenon where stimulation of one sense triggers automatic perception in another. Some people genuinely see colors when they hear music. The Synesthesia Machine simulates that cross-sensory experience computationally, creating a visual language for audio that's consistent, learnable, and genuinely beautiful.

The end product will be a web application where you can pipe in audio from your microphone or upload a track, and watch as the visual field responds to every beat, melody, and texture change. Different instruments get different visual signatures. A sustained violin note will look fundamentally different from a staccato piano chord, not because I hard-coded those mappings, but because the audio features that distinguish them naturally produce distinct visual outputs.

Why

I've always been fascinated by the boundaries between senses. There's something compelling about the question: if you could see music, what would it actually look like? Not as a graph or a meter, but as an immersive visual experience. The computational challenge is equally interesting. Doing real-time audio analysis, feature extraction, and visual generation at interactive frame rates is a genuinely hard problem that sits at the intersection of signal processing, creative coding, and performance engineering.

This project is also a playground for ideas I want to explore more deeply. Audio feature extraction has applications far beyond visualization: music information retrieval, speech analysis, sound classification. Building the Synesthesia Machine forces me to develop a deep understanding of these techniques in a context where the output is immediately, viscerally testable. If the visuals don't respond convincingly to the audio, you know instantly.

Beyond the technical motivation, I think there's real value in tools that make the invisible visible. Sound is inherently temporal and ephemeral. A visual representation gives it persistence, makes it something you can study, compare, and share. Whether that ends up being useful for musicians, sound designers, educators, or just people who enjoy looking at interesting things while listening to music, the exploration is worth the build.

Current Progress

Completed:

  • Initial research and concept definition
  • Audio input pipeline prototype (Python)
  • Basic frequency analysis and FFT implementation
  • Spectral feature extraction (centroid, bandwidth, rolloff)
  • Onset detection for beat-reactive visuals

Up next:

  • Visual rendering engine (Canvas/WebGL)
  • Audio-to-visual mapping system
  • Real-time synchronization pipeline
  • User-configurable mapping profiles
  • Web deployment (WebAudio API + Canvas/WebGL)
  • Public demo

The audio analysis side is working and tested. The Python prototype can take a live audio stream, extract spectral features in real time, and output a structured feature vector at around 30 frames per second. The features include spectral centroid (brightness), spectral bandwidth (richness), spectral rolloff (high-frequency content), onset strength (percussive energy), and chroma features (pitch class information).
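To make the spectral features concrete, here's a minimal NumPy-only sketch of how centroid, bandwidth, and rolloff can be computed from a single audio frame. This is an illustration of the standard definitions, not the project's actual pipeline; the function name, frame size, and sample rate are assumptions, and onset strength and chroma are omitted for brevity.

```python
import numpy as np

def spectral_features(frame, sr, rolloff_pct=0.85):
    """Compute spectral centroid, bandwidth, and rolloff for one frame.

    Illustrative sketch only: real pipelines typically also apply
    overlapping frames, mel weighting, or library routines.
    """
    window = np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(frame * window))          # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)    # bin center frequencies
    total = mag.sum() + 1e-12                          # guard against silence

    # Centroid: magnitude-weighted mean frequency ("brightness")
    centroid = (freqs * mag).sum() / total
    # Bandwidth: magnitude-weighted spread around the centroid ("richness")
    bandwidth = np.sqrt(((freqs - centroid) ** 2 * mag).sum() / total)
    # Rolloff: frequency below which rolloff_pct of the magnitude lies
    cumulative = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cumulative, rolloff_pct * cumulative[-1])]
    return centroid, bandwidth, rolloff

# Sanity check: a pure 440 Hz sine should have a centroid near 440 Hz
sr = 22050
t = np.arange(2048) / sr
c, b, r = spectral_features(np.sin(2 * np.pi * 440.0 * t), sr)
```

For a pure tone, both centroid and rolloff land near the tone's frequency; a bass-heavy track pulls the centroid low while a bright acoustic recording pushes it high, which is exactly what makes these features discriminative.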

What's proven is that the audio features are discriminative enough to drive meaningfully different visual outputs. A bass-heavy electronic track produces a fundamentally different feature trajectory than a solo acoustic guitar piece. That's the foundation the visual system will build on.

Next Milestone

The immediate next step is building the visual rendering engine. The plan is to start with HTML Canvas for a 2D prototype, then evaluate whether WebGL is needed for the visual complexity and performance targets I'm aiming for. The mapping system (which audio features drive which visual parameters) is the creative core of the project, and I want to iterate on it quickly with a rendering layer that's easy to experiment with before optimizing for performance.
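Since the mapping system is still to be designed, here is one hypothetical shape it could take: a pure function from a feature frame to visual parameters, which keeps the creative layer easy to iterate on independently of the renderer. Every name and scaling constant below is an assumption for illustration, not the project's design.

```python
def map_to_visuals(centroid, bandwidth, onset, sr=22050):
    """Hypothetical audio-feature -> visual-parameter mapping.

    All mappings here are illustrative placeholders:
      centroid  -> hue        (brighter sound, different color)
      bandwidth -> saturation (richer spectrum, more saturated)
      onset     -> size       (percussive hits pulse larger)
    """
    nyquist = sr / 2.0
    hue = 360.0 * min(centroid / nyquist, 1.0)
    saturation = min(bandwidth / (nyquist / 4.0), 1.0)
    size = 1.0 + 4.0 * min(max(onset, 0.0), 1.0)
    return {"hue": hue, "saturation": saturation, "size": size}

# Example frame: a moderately bright, fairly rich, percussive moment
v = map_to_visuals(centroid=440.0, bandwidth=1200.0, onset=0.8)
```

Keeping the mapping a pure, side-effect-free function means the same logic can drive a 2D Canvas prototype now and a WebGL renderer later without changes.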

The target is a working web demo where you can drop in an audio file and see responsive visuals within the next few months. It won't be the final product, but it'll be enough to validate the core concept and start getting feedback on whether the visual mappings feel right.

Impact

Making musical perception tangible and visual. Exploring the intersection of sensory processing, signal analysis, and creative computation.

Constraints

Real-time processing latency must stay under 30ms for audio-visual synchronization to feel responsive. Browser audio APIs have their own latency characteristics that need careful management.
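The 30 ms budget can be reasoned about with simple arithmetic: the analysis hop size determines how often a fresh feature frame is available, and that interval has to fit inside the budget alongside rendering. The sample rate and hop size below are assumptions for illustration, not measured values from the project.

```python
# Sketch: how hop size bounds the analysis side of the latency budget.
sr = 44100   # sample rate in Hz (assumed; browsers commonly use 44.1/48 kHz)
hop = 512    # samples between successive feature frames (assumed)

frame_latency_ms = 1000.0 * hop / sr   # time until the next feature frame
budget_ms = 30.0                       # the post's target end-to-end latency
render_headroom_ms = budget_ms - frame_latency_ms
```

With these numbers, each hop costs roughly 11.6 ms, leaving about 18 ms of headroom for mapping and rendering; doubling the hop to 1024 samples would consume most of the budget on analysis alone.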

Trade-offs

Web deployment over native for accessibility, accepting some performance constraints. 2D Canvas first over WebGL for faster iteration on the mapping system, with a plan to upgrade when visual complexity demands it.