from Andreas Blixt, former Technology Lead at Spotify (2009-2014) on Quora:

The core of all our clients is C++, but that core has since Rasmus's post gotten condensed, with functionality split out into modules. As Spotify becomes available on more and more platforms as well as getting a richer feature set, we need to ensure that "core" doesn't become "a little bit of everything". This has meant breaking out certain features, such as playback control, into their own separate modules. These modules are still C++ but are self-contained enough that their logic could theoretically be implemented in other languages. We call the interface layer to these modules "Cosmos", and it works in a way not too dissimilar from HTTP. Cosmos lets any part of the client communicate with a module using arbitrary paths and payloads, allowing for a much more flexible architecture. Some obvious benefits are versioned interfaces (example: GET sp://player/v1/main returns player state) and JSON for passing data around. This is important for another change in our desktop client.
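(A quick aside from me, not part of the quoted answer.) To make the "not too dissimilar from HTTP" idea concrete, here is a minimal sketch of what a request router with paths and JSON payloads could look like. The only thing taken from the quote is the example path `sp://player/v1/main`; the `Router` class, the `bind`/`request` names, and the 200/404 status codes are all my own assumptions, not Spotify's actual Cosmos API.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical handler type: takes a decoded JSON payload, returns a dict.
Handler = Callable[[dict], dict]


class Router:
    """Toy request router keyed on (method, path), loosely HTTP-shaped."""

    def __init__(self) -> None:
        self._bindings: Dict[Tuple[str, str], Handler] = {}

    def bind(self, method: str, path: str, handler: Handler) -> None:
        self._bindings[(method, path)] = handler

    def request(self, method: str, path: str, body: str = "{}") -> Tuple[int, str]:
        handler = self._bindings.get((method, path))
        if handler is None:
            # Status codes are an assumption borrowed from HTTP.
            return 404, json.dumps({"error": "no binding for " + path})
        return 200, json.dumps(handler(json.loads(body)))


router = Router()
player_state = {"playing": False, "track": None}

# Versioned path taken from the quote; the handler body is made up.
router.bind("GET", "sp://player/v1/main", lambda _payload: dict(player_state))

status, body = router.request("GET", "sp://player/v1/main")
missing_status, _ = router.request("GET", "sp://player/v2/main")
```

Because the version lives in the path, a `v2` handler could be bound next to `v1` without breaking callers that still ask for `v1`.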

A lot of our desktop UI these days is actually using Chromium Embedded Framework (CEF), which basically means our views are powered by JavaScript, HTML and CSS. For all of our feature teams to be able to work on their features without fear of breaking someone else's view, each view is sandboxed in their own "browser" (I guess you can think of the views as tabs in Chrome, except we show more than one at a time). This brings with it one restriction though: sharing data between views gets more difficult. This is where Cosmos comes in and really simplifies the communication between core (C++) and JavaScript land: the JS clients can make arbitrary requests and if there's a binding, that request gets handled and responded to. One example is the "messages" endpoint which lets any view push JSON data out to any other view that's listening (kind of like window.postMessage in HTML5, except this one can also interface with C++ modules). This is also how all the play buttons in the client know whether a track is playing or not, or whether it's available offline (another Cosmos module), or whether you've saved a song to your music.
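(Another aside from me, not part of the quote.) The "messages" endpoint sounds like a publish/subscribe hub between sandboxed views. Here is a rough sketch of that pattern under my own assumptions: `MessageBus`, the topic string, and the payload fields are hypothetical, and the JSON round trip only imitates data crossing a sandbox boundary.

```python
import json
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical listener type: receives a decoded JSON payload.
Listener = Callable[[dict], None]


class MessageBus:
    """Toy pub/sub hub: views subscribe to a topic and receive JSON payloads."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, topic: str, listener: Listener) -> None:
        self._listeners[topic].append(listener)

    def publish(self, topic: str, payload: dict) -> int:
        # Serialize once, decode per listener, so every view gets its own copy.
        wire = json.dumps(payload)
        listeners = self._listeners.get(topic, [])
        for listener in listeners:
            listener(json.loads(wire))
        return len(listeners)


bus = MessageBus()
sidebar_seen: List[dict] = []
playlist_seen: List[dict] = []
bus.subscribe("player/state", sidebar_seen.append)
bus.subscribe("player/state", playlist_seen.append)

delivered = bus.publish("player/state", {"track": "hypothetical-track-id", "playing": True})
```

Every play button would be one more subscriber on the same topic, which is how they could all stay in sync without knowing about each other.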

Another important change to our technology stack is that we've moved some logic further "back", into view aggregation services. So where we would before do almost all logic in the clients, only using the backend as a data store, we now do much more work in a logic layer between the data stores and the clients, exposing endpoints very similar to Cosmos (in fact, you can call a backend the exact same way you call a Cosmos module, so moving between layers is not a hassle). The reason for this is two-fold: one, it lets us expand to more platforms more quickly because there's less client logic to implement and two, it really helps us keep our client behavior more consistent and up-to-date because the client is more "stupid". To mitigate any slowdown that might come from this we have ensured that there are caching rules for all data, so that the client will still keep data locally, it's just not responsible for as much business logic as it used to be.
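(One more aside from me, not part of the quote.) The "caching rules for all data" bit is the part that keeps a "stupid" client fast. A minimal sketch of that idea, assuming a simple time-to-live rule: `CachingClient`, the 60 second TTL, the fake backend, and the path are all hypothetical, since the quote doesn't say what the real rules look like.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingClient:
    """Thin client: no business logic, just fetch and keep results for a while."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def get(self, path: str) -> dict:
        now = self._clock()
        hit = self._cache.get(path)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, skip the backend
        data = self._fetch(path)
        self._cache[path] = (now, data)
        return data


calls: List[str] = []


def fake_backend(path: str) -> dict:
    # Stand-in for a view aggregation service; records each request.
    calls.append(path)
    return {"path": path, "items": ["a", "b"]}


now = [0.0]  # fake clock so the example doesn't need to sleep
client = CachingClient(fake_backend, ttl_seconds=60, clock=lambda: now[0])

home = "sp://hypothetical-view/v1/home"
first = client.get(home)
second = client.get(home)   # served from the local cache
now[0] = 61.0
third = client.get(home)    # TTL expired, so the backend is asked again
```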

On top of this marvelous C++ "one stop shop" they also have:

from Consistent, Thin, & Dumb: Redesigning the Spotify iOS App by Hector Zarate on Realm.io:

The point here is that it’s very dynamic, and the content is just different types, not just one kind of operating a specific model. This team, they came with a very, very elegant model of describing this, so they figured this out. They have a space that contains blocks that, in turn, contain items, and while they have some basic properties like an item has a title, subtitle, a block has a title as well, and the space also has a title, and there is one key property here. That is the render type for the block.

This render type dictates the layout of the page and the presentation of the information that is stored in the models in each of our screens and sub-screens. We extracted this idea so it could be used for almost every component of the app, based on a cool name called Ceramic to compose these blocks.
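(Aside from me, not part of the talk.) The space, block, item model maps nicely onto a few small data classes. This is only a sketch of the shape described above: the field names follow the quote (title, subtitle, render type), but the render type values `"carousel"` and `"list"` and the text-based rendering are my own inventions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    title: str
    subtitle: str = ""


@dataclass
class Block:
    title: str
    render_type: str  # the key property: decides how the block is laid out
    items: List[Item] = field(default_factory=list)


@dataclass
class Space:
    title: str
    blocks: List[Block] = field(default_factory=list)


def render_block(block: Block) -> str:
    titles = [item.title for item in block.items]
    if block.render_type == "carousel":      # hypothetical render type
        body = " | ".join(titles)
    elif block.render_type == "list":        # hypothetical render type
        body = "\n".join("- " + t for t in titles)
    else:
        body = ", ".join(titles)             # fallback for unknown types
    return block.title + "\n" + body


def render_space(space: Space) -> str:
    return "\n\n".join([space.title] + [render_block(b) for b in space.blocks])


space = Space("Browse", [
    Block("New releases", "carousel", [Item("A"), Item("B")]),
    Block("Charts", "list", [Item("Top 50", "Global")]),
])
page = render_space(space)
```

The client only needs to know how to draw each render type; what goes on the page is decided by whoever builds the `Space`.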

Yep, so that's a shit ton of quoting but I think these are great bits of context into how my opinion of Spotify has been shaped. Next up: my credentials.



81,872 minutes... Holy wow. So there are only 525,600 minutes in a year. This would mean that for 15.57686454% of last year I had music playing. And then let's say I slept for about a third of every day. 525,600 * (2/3) = 350,400 waking minutes, which would mean that for 23.3652968% of my waking hours I had music playing.
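If you want to check my math, here it is as a tiny script. The only assumptions are the ones already stated: a 365-day year and sleeping a third of every day.

```python
MINUTES_LISTENED = 81_872
MINUTES_PER_YEAR = 365 * 24 * 60             # 525,600
WAKING_MINUTES = MINUTES_PER_YEAR * 2 // 3   # 350,400, assuming 1/3 of the day asleep

share_of_year = MINUTES_LISTENED / MINUTES_PER_YEAR * 100
share_of_waking = MINUTES_LISTENED / WAKING_MINUTES * 100

print(f"{share_of_year:.2f}% of the year")        # 15.58% of the year
print(f"{share_of_waking:.2f}% of waking time")   # 23.37% of waking time
```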

As a dev, I know how hard it is to build and maintain an application. I have context into:

This shit is so ridiculously hard to keep track of. And then, cool, so you have one "backend" API powering all of the nitty-gritty, but there's still so much internally that you have to have the headspace for. Some people sort of cop out here and say that there's no way a product owner or CTO could keep track of it all. I wholeheartedly agree, but I think that's the fundamental problem with the state of SaaS (startup as a service) projects, where you eventually need to attempt to support all the clients. React Native, Xamarin, blah blah blah, etc. are all good tools for the task, but Spotify, it seems, takes it to the next level. This C++ lib they talk about sounds like it does some fancy stuff.

Think of the C++ lib Spotify describes as an "interface" (like a Java interface that each platform implements) for the frontend client layer of EVERYTHING. Let's start with audio as the primary example: coreaudiod on macOS is a single audio "daemon" that all audio gets piped into, and tools like Boom audio can filter on top of it. This C++ lib would be able to interpret commands like play, pause, skip and many other very mechanical, very hardware-oriented things, then translate and actually execute them basically anywhere you can get your code compiled to run.
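Here is roughly what I mean by "interface", as a sketch in Python rather than C++. Everything in it is hypothetical (`AudioBackend`, `RecordingBackend`, `Player`, the track names); the point is just that the play/pause/skip logic is written once and each platform only supplies the small part that actually touches the hardware.

```python
from abc import ABC, abstractmethod
from typing import List


class AudioBackend(ABC):
    """What each platform has to implement (all names here are hypothetical)."""

    @abstractmethod
    def start(self, track: str) -> None:
        ...

    @abstractmethod
    def stop(self) -> None:
        ...


class RecordingBackend(AudioBackend):
    """Stand-in for a real platform backend: records calls instead of making sound."""

    def __init__(self, platform: str) -> None:
        self.platform = platform
        self.calls: List[str] = []

    def start(self, track: str) -> None:
        self.calls.append(f"{self.platform}: start {track}")

    def stop(self) -> None:
        self.calls.append(f"{self.platform}: stop")


class Player:
    """Shared core logic: queue handling lives here, once, for every platform."""

    def __init__(self, backend: AudioBackend, queue: List[str]) -> None:
        self._backend = backend
        self._queue = queue
        self._index = 0

    def play(self) -> None:
        self._backend.start(self._queue[self._index])

    def pause(self) -> None:
        self._backend.stop()

    def skip(self) -> None:
        # Wrap around to the first track after the last one.
        self._index = (self._index + 1) % len(self._queue)
        self.play()


desktop = RecordingBackend("desktop")
speaker = RecordingBackend("speaker")
for backend in (desktop, speaker):
    player = Player(backend, ["track-a", "track-b"])
    player.play()
    player.skip()
    player.pause()
```

Both backends end up with the same sequence of commands, which is the whole appeal: new platform, new backend, same core.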

To me, here is the craziest concept: "don't be afraid to use a lower-level language"

This large push on the Rust language is showing me the bowels of audio processing and of WebAssembly's in-browser translation for any CPU architecture, and it basically brings me full circle back around to how Spotify is already partially acting on this. The only other company I can think of that has all the contexts is WhatsApp, but not quite to the extent that Spotify does, with its hardware implementations on Echo and Chromecast devices, where I can scrub songs to certain places. It floors me how well Spotify keeps track of which device I'm on in a very unobtrusive (almost kind of non-scary) way. Bluetooth/WiFi/mobile/desktop (Windows/macOS)... it reminds me of iTunes, when they specifically had to build a Windows version for adoption purposes.

I digress... So lemme recess...

So I have many more thoughts on this matter, but I want to get these core ideas out on the interwebs for now! I will follow up with how this example of proper "context switching" is also helping me balance the way I develop code and manage my time.

Share the love!

Thank ya for the read!