Remote Ruby

Decoding Postgres: A Journey Through User-Friendly Database Experiences with Craig Kerstiens

Jason Charnes, Chris Oliver, Andrew Mason

Get ready to embark on a captivating journey into the world of databases with our esteemed guest, Craig Kerstiens, a 12-year veteran of working with Postgres. From an unplanned stint as product manager for Heroku Postgres to the unique challenges he faced in marketing the platform to developers, Craig's story is as intriguing as it is enlightening. 

Fasten your seat belts as we navigate through the terrain of user-friendly database experiences, the evolution of language in the tech industry, and the sometimes-dreaded realm of Postgres among developers. Do you ever wonder what makes a database experience user-friendly? Or perhaps you're curious about tightening security for your databases and the role of a solid database checklist for production? We're on hand to guide you through these essential topics, alongside insights into the importance of multi-tenancy in databases, and how a well-thought-out strategy can make all the difference.

But that's not all! We have a special treat for all you cocktail lovers out there - a delightful chat about our favorite Tiki bars, because who doesn't enjoy a good drink while contemplating databases? So, join us for an episode packed with technical insights, practical advice, and a dash of fun. Whether you're a tech aficionado or curious about how language evolves in the tech industry, this episode has something for everyone. Tune in and quench your thirst for knowledge (and perhaps a cocktail too)!

Honeybadger
Honeybadger is an application health monitoring tool built by developers for developers.

Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.


Speaker 1:

This is Remote Ruby. Have you any remote ideas as to the meaning of the word? Are you good?

Speaker 2:

Chris, I'm pretty good. I was fried mentally yesterday, to the point where I'd call Colin and we'd pair on Tuple for something, and it was like two seconds after he joins the call and I try to explain. I'm like, this route, it's not working. And then I scroll a little bit over in the routes file to show him the route, and there's an if statement at the end that's clearly skipping this route, and I'm like, all right, thanks, man, hang up. He didn't have to say anything. Yeah, that was yesterday. I was struggling. We had three or four calls like that that were just very basic things. My brain was not functioning.

Speaker 3:

So to start Friday off and head into the weekend, let's dive into databases, because that's the way to go.

Speaker 2:

Yeah, let's get in the weeds.

Speaker 1:

Well, yeah, let's go there. Let's go there right now. Today we have Craig Kerstiens from Crunchy Data, but that is not the only place you might know Craig from, and I will let Craig introduce himself real quick.

Speaker 3:

Cool. I'm Craig, running product and engineering at Crunchy Data. For better or worse, I've just kind of been doing Postgres in the cloud for people for 12, 13, maybe almost 14 years now, since the early days of Heroku. So I joined Heroku, actually (to maybe alienate the whole audience right now), to launch their Python support. I was one of the few Heroku people that was not a Rubyist; I was more in the Python community. I have many friends in the Ruby community, but I was not one of those people when I joined, and I ended up spending most of my time building and running Heroku Postgres. Went from there to Citus Data, which turned Postgres into a sharded, distributed, horizontally scalable database. That's a lot of buzzwords in there, but think kind of infinitely scalable Postgres.

Speaker 3:

We ran, I think, a million and a half databases at Heroku. At the time that I left Citus it was a little different: the average customer was 40 terabytes, the largest was 960 terabytes, so not quite petabyte scale, but I'm going to round up and put it on my resume. And now, for several years, I've been over at Crunchy Data. I joke because it feels kind of like unfinished business from the Heroku Postgres days: just running Postgres really well, creating that developer joy for Postgres that we always set out to create at Heroku and got close to but never quite finished. So let's start there.

Speaker 1:

Early days of Heroku. What was kind of your role in Heroku Postgres?

Speaker 3:

Yeah, as I mentioned, I started over on the Python side of things and was product manager on billing and fraud and some of the core languages. Before I joined Heroku I had been bootstrapping my own thing, built out my own Heroku-like deployment thing on a VPS, as we all were doing, and, okay, it never came close to Heroku, right. If we remember way, way back 15 years ago, the git push is still magic; no one else is doing it as well or as easy. It just works, right, and you get back to building. So I'd been building out the Python side of things, and they're like, oh, we have this buildpack stuff. I remember really early on we leaked the Python support several times on Hacker News, because you would do a git push with a requirements.txt (which is the Python equivalent of your Gemfile), and it would detect Django. And the Hacker News comments were amazing. They're like, look, I've hacked Heroku, I figured out how to deploy a Python app. And it's like, the support was fully there. It was amazing.

Speaker 3:

We leaked it five times before we officially announced it. But then we had all these internal people there, and we had launched Heroku Postgres (I can get into some of that story too). We had all these internal Rails developers and they're like, yeah, yeah, Postgres, fine, whatever. And I'm like, why aren't you using this in Postgres? Why aren't you using that? And I found myself internally evangelizing Postgres. And the GM was like, hey, you're doing all this marketing internally, you should come help us with actual marketing for Postgres. And I'm like, I'm not a marketer, marketers are slimy and gross. And they're like, we're not running ad campaigns; developers should just know why Postgres is awesome. Just give it a shot. And so I did that. And shortly after that, the PM that was over there moved out of that into another area.

Speaker 3:

And that's how I found myself kind of running product, marketing, kind of all the business side for Heroku Postgres, really accidentally. Just telling developers: quit treating it like, you know, I think DHH called it "a dumb hash in the sky," right? And no, it's not; it's a really powerful thing. Here's why you don't have to be afraid of it. Here's how you can take advantage of it. I ended up doing that for three and a half years there, and then they gave me a whole bunch of other product areas to run for a little while. Once it was kind of humming along, they were like, hey, go fix other broken things.

Speaker 3:

But man, I could wax poetic about how we thought about things like fork and follow, trying to move away from the idea of "master/slave." We really thought about fork and follow a lot, because we wanted you to interact with it and think about it the same way you work with Git, right: a follower is this continuous stream, a fork is this point in time. Move away from database terminology like point-in-time recovery and standby and replica; those things are foreign to developers for the most part. So how do we actually create this as an experience that developers find joy in working with?

Speaker 2:

Just for some context. What year did you join Heroku?

Speaker 3:

Right around the acquisition, so right before all that announcement. So I saw that early transition to being owned by Salesforce. There's a lot of talk now about Heroku and Salesforce, and there's a lot we could dig into there; that's probably a whole other deep dive. I think a lot of it was that we were left untouched for five years. No acquisition ever gets left untouched for that long. The COO of Salesforce at the time basically said: no acquisition has ever been left this alone, no acquisition has ever been this successful, and no acquisition will ever be left this alone again. And you're like, wait, what? The Salesforce acquisition was good? And I think Heroku is in the state it's in now because of a lack of investment over time, not because of the acquisition. What was that, 11, 12 years ago now?

Speaker 2:

Yeah, yeah.

Speaker 3:

Over 10. And now people are talking about hey, what's happening with Heroku in the last few years? But no one talked about it for the first five or six.

Speaker 1:

Yeah, that's a really good point. Like you said, there are still ergonomics of Heroku that just still feel magical, even today. They still feel above everything else, like other companies are catching up, which is just a testament to Heroku itself. So within Heroku, you're kind of overseeing, I guess, the product Heroku Postgres, is that right? What are some of the other areas you worked in?

Speaker 3:

So I touched a bunch of areas: the core languages at the time for a little while, the Add-ons Marketplace, which kind of became the Heroku Elements stuff, the API, the CLI. I touched almost all areas except for the core runtime. The languages stuff is a lot of the buildpack stuff, so that's what people think of and work with. But the core runtime itself, like dynos and dyno sizes, is a lot of internal orchestration. So not so much "oh, you shipped this shiny feature," right. You think about stuff like Let's Encrypt, and dyno sizes and packing in dynos, and LXC, and migrating from cgroups to cgroups v2 and all that sort of stuff. So I never touched that area, but almost every other area of Heroku, at some point in my career, I touched the product side of those pieces. It's amazing.

Speaker 1:

Maybe now's not the right time to get into it, but I would like to kind of hear some of the things you envisioned for Heroku Postgres that maybe never came to fruition, and how they've come to fruition maybe through either Citus or Crunchy.

Speaker 3:

So we built Heroku Postgres. It became a great Postgres experience for Heroku; we never built Heroku for Postgres, if that makes sense. So think about that magic git push or rollback, right, from a data safety perspective. Because we have the Postgres base backups and the write-ahead log, I've had customers that have dropped a table, right. If you run your Rails tests and the first thing they do is truncate tables, and oops, I ran that against production. That's a bad day. I've had customers do that Friday at 4 pm and say, oh, can you get us back? We can look at the Postgres write-ahead log, see where that command ran, and get you back to exactly that moment in time, where you never lost a transaction, right.

Speaker 3:

Now you're down while we're bringing that back up. But from a security perspective: if you say, hey, can we run git bisect to find some bug that snuck in X time ago? Can we run git bisect to see when we had an inside actor changing data maliciously? Can we spin up 20 forks and just replay the write-ahead log bit by bit by bit, to see when this data changed? Who was connected at this time? What were they doing? That kind of stuff.
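For the curious, the point-in-time recovery being described here comes down to a couple of settings on a server restored from a base backup. A minimal sketch, with an illustrative timestamp, not Crunchy's actual tooling:

```sql
-- Tell Postgres where in the write-ahead log to stop replaying.
-- Run on a server restored from a base backup; values are illustrative.
ALTER SYSTEM SET recovery_target_time = '2024-01-05 15:59:00+00';
ALTER SYSTEM SET recovery_target_action = 'promote';
-- Then create a recovery.signal file in the data directory and start
-- the server; it replays WAL up to that moment and promotes itself.
```

Everything after the chosen timestamp, like a rogue TRUNCATE, simply never happened on the restored server.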

Speaker 3:

When you look at what Postgres already has under the covers, you can take that and apply cloud automation and innovation on top. One thing we built into Crunchy Bridge: a senior engineer that worked on Heroku Postgres is running their database on Crunchy Bridge. They know Postgres really well. They know how to orchestrate it. They know how to manage it. We built something called the production check, and this was a feature: there was a production check on Heroku, but we never built that for the database.

Speaker 3:

It goes and audits all these things about your Postgres database and says, yeah, you're good to go to production. Do you have HA enabled? What's your backup configuration? Are you connecting with a superuser? Because you shouldn't for production; that's bad, a superuser can do everything. Do you have things like a statement timeout, so long-running queries over 30 seconds automatically get killed? Do you log all slow queries? All of these things that you should do.

Speaker 3:

Right, and there are blog posts we've written about it over the years, others have written about it: here's what you should do to make sure your database is ready for production. But you go to RDS, you go to Heroku, and there's no easy button as a developer to say, am I good? I'm not doing this every day; I don't ship a new app to production every day. What's my Postgres checklist, right? This senior database engineer that helped me build Heroku Postgres is using our service, and he came in and clicked the button. I'm kind of like, a lot of this is 101 or 201 level stuff, and you're a distinguished-level engineer, principal-level engineer, staff, like very senior, and you're sitting here clicking this easy button. And it's like, yeah, it makes my life easier and I don't have to think about this. Right, like with Heroku, you don't have to think about what your strategy is for a rollback for a bad deploy.
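A couple of the checks described are plain SQL you can run against your own database. A hedged sketch; the real production check presumably covers much more:

```sql
-- Are we connected as a superuser? (should be false in production)
SELECT rolsuper FROM pg_roles WHERE rolname = current_user;

-- Is a statement timeout set? ('0' means disabled)
SHOW statement_timeout;

-- Are slow queries being logged? ('-1' means disabled)
SHOW log_min_duration_statement;
```

The value of an "easy button" is exactly that you don't have to remember which catalog tables and settings to poke at.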

Speaker 1:

In your career, around what time did you transition out of Heroku and into Citus?

Speaker 3:

What was it, eight or nine years ago now? So it was Citus for almost five years, Heroku for about five, and Crunchy now for about three. Hopefully it's a lot more. It's a good time.

Speaker 3:

So, yeah, I came into Citus fairly early. They'd been around for a long time, but really enterprise focused, and I came in and built out their cloud offering, which made Citus way more accessible. So myself and kind of founding engineers one and two of Heroku Postgres came with me to Citus, and then also came with me to Crunchy. So in addition to myself, it's a very similar team. If you have Heroku tickets from 12 or 13 years ago that are database related, there's a chance one of the three of us responded to it. I joke to people: if you want that customer support experience again, just come on over to Crunchy.

Speaker 3:

So, yeah, when I came over to Citus, it was a very special-purpose tool, a little more data-warehousing focused, distributed systems, and it shifted bit by bit to focusing more on app developers and really just sharding and scaling for performance, right. A lot of people hear "distributed database" and think, you know, distributed clocks and consensus and Cassandra-type workloads, and it's much less that. It's more of: I am a SaaS, B2B app; I want to shard my data and I want it to scale for performance.

Speaker 3:

Think of one of the biggest, probably, SaaS B2B apps, which I know is one of DHH's favorites: salesforce.com. When you go to salesforce.com and log in, you'll see something like na23 in the URL. That's actually a physical shard somewhere that determines how you get routed. So that's a whole copy of Salesforce that's running with a dedicated database. Internally at Salesforce, they'll talk about org 62, which is the internal one, the number 62 one that was created; that's Salesforce's own internal Salesforce to power Salesforce. But lots of SaaS apps are B2B apps, and so how do you shard them, scale them efficiently, worry about noisy neighbors, all that? So Citus really kind of dove deep on that side.

Speaker 1:

That's cool. I'm also curious, then: what led you to Crunchy?

Speaker 3:

I joke, but it's kind of true: it's unfinished business. I look across the database providers in the world and no one is creating a Heroku-like experience. And I think my view on databases is a little conflicted lately. We have a lot of new shiny databases. They're really pretty, and you're like, look at this new shiny bell-and-whistle feature. For me, at the core of it, it's running a production-ready database.

Speaker 3:

I've heard about other providers with shoddy experiences with support, or with backups or transactions or consistency. As a database provider, don't lose your data. The worst thing my bank can do is lose my money, right? I'm putting it in there; they can annoy me with this credit card offer or that credit card offer or all sorts of things, but don't lose my money. As a database, don't lose your data, and really focus on that to start, right. I would say that was the first year and a half of us building Crunchy Bridge: just being really solid. And now we're focusing on how do we make this a delightful experience, with things like the production check, our saved queries (a data clips equivalent), container apps where you can launch a Podman container directly from inside Postgres. Cool, let's see how much fun we can have with the database, right. We've got Python support for data science stuff, rich support for extensions. But a lot of this is making it simple. A lot of that magic of Heroku wasn't shiny; it was just working, simple features, and meeting you where you were. So how do we do that on a database? I didn't feel like anyone was doing that. We were on that path at Heroku and it kind of stalled out.

Speaker 3:

I don't think the big-box providers provide a curated, lovely, polished experience, and a lot of the newcomers in the space are trying to be more shiny than "get back to building." For the most part, you shouldn't think about Crunchy Data or Crunchy Bridge. You think about what features you want to build. And when I'm having a bad day, or I don't understand something about my database, how can I get help when I need it, so I can get back to building? You shouldn't think about us all the time; we should fade into the background. If you think about your bank every day, that's a bad situation. The only time you hear from your bank, when they call you, it's "great, your social security number's been stolen," right? You shouldn't hear from your database provider; they should just fade into the background so you can focus on your app.

Speaker 2:

That's a very common thing, and part of the magic of Heroku was like, look, you don't have to know anything about servers; we take your code that was running fine in development and now it runs fine in production. And now we have Docker and stuff like that, where the DevOps work is sort of the developer's responsibility to some extent, which most people don't really want to deal with. And you definitely have the other rabbit hole: you've spent your entire career learning all the ins and outs of Postgres and how you can make it better and easier to use, and the average developer doesn't want to have to do that. They want you to do that for them. So it's important to have that.

Speaker 2:

What was it, the data clips at Heroku? I remember seeing that for the first time and I was like, oh, this is amazing. And seeing the slow queries and stuff that I didn't even know about. And then I'd see, oh, production is telling me, hey, with production loads (in development everything was fast because it didn't have a big database), production is telling me this is not efficient, you should add an index here or whatever. Those were things that I found really valuable early on in my career. I didn't know that; I wasn't paying attention. And now they're giving me tips on how to improve it, and I found that incredibly useful. Were you a part of that?

Speaker 3:

Yeah, data clips. It was so funny. The entire thing was pushed by one person; the entire rest of the team was anti data clips before building it. They're like, this is dumb, no one's going to use this, really, why? And then once it was there, man, data clips was the most hidden feature: hardly anyone as a customer ever used it at Heroku, and it's gorgeous. And so one of the first things at Crunchy was like, yeah, we've got to rebuild this, and we've got to build it better and iterate and improve on it.

Speaker 3:

When you're early on, bootstrapping and building something out, that's how I would recommend building your internal dashboard. So data clips, or saved queries on Crunchy Bridge: append .csv to that URL and it'll automatically serve a CSV. Now what you can do in Google Sheets is go into your formula bar, type equals, the import function, open parenthesis, quote, and put in that CSV URL, and it'll automatically import it and periodically refresh it. So you've now got live SQL data imported that you can then chart, signups week over week, and you can pull all of that directly into Google Sheets without having to go and be like, all right, what BI tool are we going to get and build and all of that. It's a wonderful tool right there.
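As a sketch of the workflow described: the query, the URL, and the sheet formula below are illustrative (in current Google Sheets the CSV import function is `IMPORTDATA`):

```sql
-- A saved query ("data clip") charting signups week over week:
SELECT date_trunc('week', created_at) AS week,
       count(*)                       AS signups
FROM   users
GROUP  BY 1
ORDER  BY 1;

-- Google Sheets formula, pointed at the query's .csv URL:
--   =IMPORTDATA("https://example.com/clips/signups.csv")
```

One saved query plus one spreadsheet formula gets you a self-refreshing chart, with no BI tool in between.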

Speaker 3:

I've got to admit, to follow up on my story, I was wrong on data clips. But yeah, the slow query stuff, right, absolutely, look at that. Have either of you ever run a query or migration against production? You ran it in staging, you run it in production, and then it hangs because it's writing out a default value, right? Or you ran a query and it's taking way too long and you're like, quick, kill this, kill this. And some other engineer comes at you in Slack or walks over to your desk and is like, what are you doing? Because production is down because of you. Have either of you ever had that experience?

Speaker 1:

Yeah, I'm the king of locking tables at Podia.

Speaker 2:

Oh, you don't do anything without locking tables; it's for safety, right?

Speaker 1:

I put an exclusive lock on and just go get a cup of coffee and then come back and get back to what I was doing.

Speaker 3:

This is important.

Speaker 1:

I need the lock to make sure no one else messes with it, right yeah?

Speaker 3:

We actually set certain defaults on Crunchy Bridge databases, like statement timeouts: hey, if this query runs for more than 30 seconds, kill it. Now, that doesn't mean you can't override it, right. But why bring down production because you didn't know? It was like a rite of passage for every person that worked on the API team at Heroku to bring down production by running a migration that wrote out a default value. Everyone did it. Things like logging slow queries: any query over two seconds, we log it to your logs.

Speaker 3:

If you look at your logs as a Rails developer, you get an alert on errors and all that sort of thing, and it's a setting right there in Postgres. Why not set it, right? But most people don't go in and set that. So it's those sane defaults, that you don't have to think about, that can make all the difference. And I'm giving away our product secrets right here for everyone to copy.
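The two defaults mentioned can be set once at the database level. A minimal sketch; the database name and exact values are illustrative:

```sql
-- Kill any query that runs longer than 30 seconds...
ALTER DATABASE myapp_production SET statement_timeout = '30s';

-- ...and log any query slower than 2 seconds.
ALTER DATABASE myapp_production SET log_min_duration_statement = '2s';

-- A session that genuinely needs longer can still override:
SET statement_timeout = '10min';
```

Because the session-level SET wins over the database default, the safety net never blocks a deliberate long-running migration.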

Speaker 1:

No, but it's great, though. As we joked about in Amsterdam at Rails World, we have actually started taking steps toward migrating Podia to Crunchy Bridge, and even things you have in the UI are things that, at Heroku right now, I either have to get into Postgres myself and run, or use some kind of command line tool for. With you, it's available to me just by clicking around in the interface. I don't know, that little stuff is just so valuable to me. It's less brain space I have to use remembering what command to run or what query to write.

Speaker 3:

Oh yeah, completely. The database insights of slow queries, or cache hit rates, or bloat. It really is about being thoughtful on that developer experience for the database. I can show you the query for bloat, and you'd never run that against your database. It looks like a Ruby core dump. It's like, what are you about to do to my production database? And it's harmless, right. But you get the results and you're like, okay, this is below 99%, here's what I should do. How do we just simplify it so that you're not scared? As you mentioned, hey, we're not database experts and we want to get back to building stuff.

Speaker 3:

And I was at a conference, this was like seven or eight years ago, and at the speaker's dinner the night before, one of the speakers was talking about how we as developers pretend we're so thought-out and well-reasoned in every decision we make. It's like, of course I'm going to pick Ruby, I would never pick Python because of this and this, and they're basically the same language. The communities are so similar. Yes, there are differences, and I'll probably get raked over the coals for this, but no, they're the same language. Sure, let's go talk Go or Rust and how they're different, but Python and Ruby are the same, and you get this religion of "no, I would never do this in Python or that in Python," right. And it's not like that; you could just swap back and forth.

Speaker 3:

Take, for example, Postgres. Of course I'm a Rails developer, of course I'm using Postgres; why wouldn't I use Postgres? Actually, I had no idea why Postgres. But if you ask me, right, I'm such a bandwagon fan. And it was really funny, because half the table got super quiet and just looked over at me, and I was just like, well, because of transactional DDL, concurrent index creation, JSONB, range types, timestamps with time zone. Okay, so there is a good reason. But I do think there's this barrier that I would love to take on. Right, you're scared of the database, right? You don't want to. It's "hey, I love Postgres," and then I show you an explain plan and it's like, well, let's not get crazy now, I didn't mean actual Postgres, I meant the idea of Postgres.
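For anyone wondering what that list of features looks like in practice, a quick hedged tour; the table and column names here are made up:

```sql
-- Concurrent index creation: no long write-blocking lock on the table.
CREATE INDEX CONCURRENTLY idx_events_account ON events (account_id);

-- JSONB: schemaless data you can still index and query.
SELECT * FROM events WHERE payload @> '{"kind": "signup"}';

-- Range types with time zones: e.g. a booking interval.
CREATE TABLE bookings (id bigserial PRIMARY KEY, during tstzrange);
```

Each of these is a built-in, not an extension, which is part of the "why Postgres" answer.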

Speaker 1:

I feel like I'm a little different, because I want to know that stuff, specifically about Postgres, but at the same time I'm still scared of it. I want to be able to run an EXPLAIN and be like, oh, of course, there's too many iterations here or whatever. But no, it scares the hell out of me, and I'm trying to learn. I'm rereading Andrew Atkinson's new High Performance PostgreSQL for Rails book right now and it's making it less scary. But there's a lot of stuff there, and yeah, at some point I kind of feel like I need to buckle down and learn more about the database. But then at the same time I'm like, I'm just going to kind of coast, fall back to: it's there, I know some things about it, and I'm just going to write Active Record and close my eyes.

Speaker 2:

Let's take a second to thank our sponsor, Honeybadger. The number one reason startups fail is that they run out of money. There are so many ways for startups to lose money; downtime should not be one. Recent studies found that downtime can cost $427 per minute for small businesses and up to $9,000 per minute for medium-sized businesses. That's every single minute. That's insane. A monthly subscription with Honeybadger helps you prevent costly downtime by giving you all the monitoring you need in one easy-to-use platform, so you can quickly understand what's going on and how to fix it, which helps you stay in business. That is good. Best of all, Honeybadger is free for small teams, and setup takes as little as five minutes. Get started today at honeybadger.io. That is www.honeybadger.io.

Speaker 2:

I want to know the internals and how it works and I'm excited to learn that stuff.

Speaker 2:

But I also am getting paid to ship features and build stuff.

Speaker 2:

There's always a fine line of, well, we can dive into the EXPLAIN when the query is really slow or something, but you're not usually getting paid to go into all those nuances and learn the differences.

Speaker 2:

I feel like a lot of developers also have the same feeling, like you were saying, about "use Postgres"; it was popularized by Rails and Heroku and whatever. But when I was learning Rails, I came from WordPress and PHP and stuff, and it was all MySQL, and I switched over to Postgres and was like, oh, there's this cool JSON column now. And I learned a few things, but I didn't really know a lot about MySQL to even really compare them properly. And I think that's probably another thing. Also, people are starting to talk a lot about SQLite in the Rails community lately too, and you don't really have a lot of ideas on the differences between all three of them, because there's a lot to understand about the limitations or benefits or whatever of every different system, and it can take a long time to go learn all of those and know what fits the bill.

Speaker 2:

Here and there you can see Pieter Levels using SQLite on RemoteOK and his massive apps, and it works great for his stuff, but maybe it doesn't work great for your business or whatever. So I feel like we get overwhelmed by that as well, where there's just so much to learn at the same time.

Speaker 3:

Yeah, no, it totally makes sense. And I think on the explain side: I've worked with Postgres almost nonstop for close to 15 years now, right, with some small breaks in there, and I still see an explain plan and have to be like, all right, have I had my coffee? It's not like, oh yeah, here's the problem. A Postgres EXPLAIN is the single most intimidating thing, and you don't have to be an expert at it; just find the right tool. There are a number of tools that visualize this and color-code it, like ours at Crunchy Bridge, right? And how do we just, hey, we've already got your slow queries, how do we bubble up the explain and give you easy-to-understand stuff? Right. I think, from the feature perspective, Rails 7.1, to me, I read it and I'm like, man, this is like the Postgres release.
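For reference, the intimidating artifact in question is produced like this (query illustrative); the visualization tools mentioned just render this same output in a friendlier form:

```sql
-- Ask Postgres to run the query and report the plan it chose,
-- with actual timings and buffer usage:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
-- A "Seq Scan" on a large table in the output is the classic hint
-- that an index is missing.
```

Note that ANALYZE actually executes the query, so be careful running it against writes in production.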

Speaker 3:

Other people are talking about all sorts of stuff in Rails 7.1, and I'm just like, there's a lot of database stuff in here, if you hadn't noticed. And Rails is great about picking up the latest and greatest stuff, like range type support, JSONB, all that. And I don't think you have to become an expert. You can, right: if you want to learn how GIN indexes work, or GiST indexes, cool. It's cool, actual CS stuff. We talked about the LeetCode interviews; for databases that stuff kind of matters, right? What are the five index types? Can you reverse a linked list? Well, in databases you kind of do need to do some of that stuff, but you don't have to understand it to take advantage of all the features, because Postgres is doing that work under the covers, just being really solid. But yeah, when things come out, if you're curious, Google it, and we probably have a blog post on it. Or there's our Postgres playground. Have you guys checked that out?

Speaker 2:

No, I don't think so.

Speaker 3:

So we put Postgres in your web browser, and there are guided tutorials: oh, you want to learn how to write CTEs? It'll load up a sample database in your web browser. So don't refresh it, or you'll lose your state. But people are like, well, what's this connected to? We got security reports. They were like, look, I broke out to the shell and I'm now in your system. I'm like, no, you're now in your own web browser. But thank you for the security report; I love people trying to hack it. And look, I got you, and you got yourself. But yeah, just a ton of tutorials.

Speaker 3:

Over the summer, one of our marketing folks was like, hey, I'm volunteering with this kids' coding bootcamp, and created like a SQL for kids, kid-level basics of the database. Don't be afraid to spend 10, 15, 30 minutes on how a feature works, and then totally get back to writing features and shipping features, right? I don't think you have to dive deep into explain. Most of my talks I put up, and we've got a "Why Postgres" page on crunchydata.com; check out that page and our Postgres tips, and just go through and browse the features, and you'll be like, I had no idea. Like pub/sub: LISTEN/NOTIFY is pub/sub in your database. That's pretty sweet.
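The pub/sub feature mentioned is literally two commands; the channel name and payload here are illustrative:

```sql
-- In one session, subscribe to a channel:
LISTEN order_updates;

-- In another session, publish to it:
NOTIFY order_updates, '{"order_id": 123}';
-- or equivalently: SELECT pg_notify('order_updates', '{"order_id": 123}');
```

Any connected client listening on the channel receives the payload, with no message broker to run.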

Speaker 2:

Yeah, I remember that was one of the options for a backend for Action Cable, and I think there's a limit of like 4,000 characters or something that maybe makes it not as production-ready if you're piping over giant HTML or anything. But it totally works, and it's one less dependency, as long as you pay attention to that limitation. You're going to run Postgres anyway, so you could just take advantage of it, which I thought was awesome. But Rails has, I think, kind of not mentioned it a whole ton because of the limit. If you're rendering all the HTML from background jobs or something and piping it over to the browser, you might catch yourself with things getting cut off or whatever. But that's another incredible feature that Postgres just has that I don't think hardly anybody knows about, which is amazing. There are so many things it can do like that. And there's a lot that just falls into the background too, like transactional DDL.
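For reference, using Postgres as the Action Cable backend is a one-line config change; a sketch (the payload cap Chris mentions comes from Postgres's NOTIFY limit, which is roughly 8,000 bytes by default, in the ballpark of the figure here):

```yaml
# config/cable.yml sketch: Postgres instead of Redis for Action Cable.
# Note NOTIFY payloads are capped (~8 KB by default), so very large
# broadcasts can get cut off or rejected.
production:
  adapter: postgresql
```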

Speaker 3:

If you're running a migration and you add a column and you write out a default, in certain other databases, if that fails, you're in a broken state, whereas Postgres rolls it back. Right? You can't have a table in an inconsistent state. Cleaning that up in other databases is a huge nightmare. Well, that's what you would expect a database to do, right? Don't leave me in a broken state while running a migration. And the rich indexing types, right? Most databases have CREATE INDEX, which is a B-tree index. Postgres has GIN, GiST, BRIN indexes. It just keeps coming. That is really crazy.
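A sketch of what transactional DDL means in practice (the table is illustrative):

```sql
-- Schema changes participate in transactions, so a failed migration
-- leaves nothing half-applied.
BEGIN;
ALTER TABLE users ADD COLUMN plan text DEFAULT 'free';
-- ...suppose a later statement in the migration errors out...
ROLLBACK;  -- the new column is gone; the table is exactly as before
```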

Speaker 1:

Super rich. In a way, going back to our earlier discussion, I chose Postgres mostly because that's what Heroku used and I saw other Rails developers using it. I was using MySQL before that, with PHP. I just removed a bunch of indexes in preparation for our migration to Crunchy, and I did those using the concurrently algorithm. Do other databases have support for doing concurrent structure changes like that, or is that a Postgres-specific thing?
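For context, the "concurrently" being referred to is Postgres's non-blocking index maintenance; a sketch (index and column names are illustrative):

```sql
-- Build or drop an index without taking a lock that blocks writes.
-- Note: these cannot run inside a transaction block, which is why
-- Rails migrations using them disable the DDL transaction.
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
DROP INDEX CONCURRENTLY idx_users_email;
```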

Speaker 3:

I'd have to go and check database by database. I don't believe many use CONCURRENTLY as the syntax; I don't know if that's in the SQL standard or not. I joke: are you going to quiz me on my SQL knowledge? And now I'm going to quiz you on your MySQL knowledge.

Speaker 1:

Well, let me ask you do you have to vacuum a table after vacuuming a chicken?

Speaker 3:

Yeah, vacuuming. Postgres is the biggest thing people ask about. I love when developers are like, that keeps running, I'm going to turn off vacuum. No. This is because, under the covers, Postgres is like a giant append-only database. If you want to understand how it works: when you update a record, it doesn't go in and change things on disk. It marks the old one as dead and writes a new record. This is where things like index-only scans come in, because the index is maintained and fresh, and so if you can query stuff that's fully in the index, it doesn't have to go to disk. Otherwise it's, oh, I scan the table, but then I have to check, is this row dead? Oh, it's dead, so it's actually deleted. Gross oversimplification of how it works, right? But then vacuum comes in and says, oh, there's a dead block here. I'm not going to go and delete it, I'm going to free it up, so the next time I write a new record, here are all these dead blocks I can write new data to. So if you're only ever writing data, you don't have bloat and you don't need to worry about vacuum.
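You can actually see the dead rows he's describing; a quick look via the statistics views:

```sql
-- Dead tuples (bloat candidates) per table, plus when autovacuum
-- last cleaned each one up.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```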

Speaker 3:

But it's when the deletes come in that it's, okay, it's actually deleting this data in the background. And it all ties into MVCC, multi-version concurrency control, where you can have one person writing data right now, one person reading data, and still get a consistent view. It's not, oh, this person's doing a transaction. It knows, oh, this transaction was in flight, I need to wait to commit this, but in the meantime this other person can still get this entire view of the database at this point in time. It's pretty fascinating from just a CS perspective when you dive into how it all works, and as a developer you never thought about that, right? It's just, I write, you read.

Speaker 1:

It's amazing. The more I use Postgres, the more features I learn about, the more I'm amazed by it, and the more I'm amazed at how easy it makes my life to do some of these things. Without the tools there, I wouldn't know how to do this type of database maintenance or migrations. I'm really glad to have it. I am curious, though. You mentioned the write-ahead logs earlier, and being able to fork off of those and do playback to see if there are any bad actors, and also being able to restore from write-ahead logs. Do you have any other features at Crunchy that are built off the write-ahead logs?

Speaker 3:

Good question. I mean, at the core of it, it powers kind of everything. That is your state of record for Postgres. A lot of people think, well, I have a daily backup. Well, Postgres is more like a timeline. I was just catching up on Loki, and it's that idea of timelines, right? Your database is at a point in time; it's a timeline. Now, do you want your database at this point in time?

Speaker 3:

So everything from our HA functionality, to you want to fork something, to you want a read replica, all of that gets exercised constantly by our system, and our system under the covers is a Ruby finite state machine. So it's Ruby. We love Ruby because of the REPL. There's a little bit of Crystal in there too; we're fans of Crystal, though I don't know that we use it for the full-blown thing. But we're a little different on the stack: it's Ruby with Sorbet, which I know is kind of a loaded topic. Wait, static typing in Ruby, what? For large code refactors, it's amazing. It took us some time to get fully on board with Sorbet, but we've given some talks; I can dig up some links on how this works.

Speaker 3:

It's basically one large finite state machine orchestrating AWS and Azure and GCP. No extra abstraction there; it's natively orchestrating. But then it's like, hey, we know how databases work and how they fail, and having run on cloud so long, we expect this to fail. Here's the health check, right? And then, based on this health check, here's the remediation path. So just a complex finite state machine orchestrating all of that, which is really about managing the WAL and the instance and its state. Yeah, it's fascinating.

Speaker 1:

You'll be surprised to learn I don't know much about write-ahead logs, but I'm familiar with the concept, and the more I learn about how crucial they are to Postgres, the more fascinating it is.

Speaker 3:

I mean, your database is just an append-only log. Gross oversimplification, but that's all Postgres is.

Speaker 2:

Yeah, it's cool. I think if anybody hasn't really dove into that stuff, they probably haven't realized that the history of what happens is different from the current state. Your bank account balance is $10,000, but the set of changes that got you to the current state is what you actually need to keep track of; the balance is just a view at this moment in time. And your database is the same kind of thing, and analytics systems are too. All kinds of systems work similarly: here are all the changes over time, and then you can also look at the current state of the world as well.

Speaker 3:

Yeah. And then when you think of hundreds of people making changes to the same record at the same time. With my bank account, it's usually only me, not ten people with my credit card. You think about the complexity of multiple people doing all this action at the same time on similar data sets, and I don't know how I would have designed this myself from scratch. It's kind of a masterful piece of software.

Speaker 2:

Yeah, it is. Problems you'd rather have somebody else solve and just consume yourself. One question that I get every week is, how do you do multi-tenancy? And I thought you would be a good person to talk to about that, because everybody's seen the Apartment gem using schemas, and then there's row-level multi-tenancy where you put the account or tenant ID on every record. I'd love to have you speak to that and explain how you should do it, and maybe help people understand what approach to take when adding multi-tenancy to their applications, because I think it's just one of those things that people have a hard time wrapping their heads around.

Speaker 3:

So everyone's going to think I slipped you a twenty for that question, because I'm working on two blog posts right now that dive so deep into that. Oh my goodness, perfect. It really is a matter of what scale you're going to get to, right? We have a bunch of multi-tenant style customers, a ton, and some use one database per customer. You're like, no, that's a bad idea. Well, if you have 10 customers and they're really high value, that's totally fine. I would say, if you're never going to get beyond 100 customers, hey, our customers pay us a million a year, that's our market, and we're going to grow to 100 customers, that's fine. What you're then doing, right, is losing a lot of what you get with Active Record and Rails.

Speaker 3:

Okay, I need to run a migration. Well, now I'm running 100 migrations. And what happens, with this idea of transactional DDL, what happens if it fails on number 96? Does my code work with both versions or not? Or am I running 100 copies of my app? You're now reinventing all of these things that we just get out of the box, and you've got to worry about them. Schemas were interesting with Apartment. Is Apartment still maintained or not? I'm kind of unclear, but I think it still mostly works.

Speaker 2:

Yeah, I think it still works, but there are no new features really being added to it, which it probably doesn't need either. As far as I know, there hasn't been much activity on it. And I think, for the sake of the audience, you might even need to explain what a schema is, because a lot of people are just, I connect to a database and I have a user and a password and a database, and they haven't quite wrapped their heads around schemas either.

Speaker 3:

Yeah. So in Postgres it's super confusing, because you're not just connecting to a database. When you connect to an instance, that's also a Postgres cluster, and you can have multiple databases inside an instance. You can get inside Postgres and, if you have superuser, you can CREATE DATABASE and make a new one. So you can have database one, database two, database three. These are completely isolated, not from a WAL perspective, but from a pg_dump perspective. If I take a dump, I can dump database one or database two, and if I run a migration, it runs on database one. Schemas are sort of that, except you can see across them. I can query from schema one to schema two. In Postgres, if you've ever written a query like SELECT * FROM user_1.events, user_1 is the schema. You can create schemas kind of just like you can create databases. The difference, and there's a nuance here, is that you can query across databases, but you're doing it with foreign data wrappers and dblink; it's not easy to just query from database one to database two. You connect, and you're isolated to that one. Schemas are all in the same database, but you can query across them, so they're reasonably isolated. And then the third approach is what people call a tenant discriminator: putting a tenant ID or customer ID on every row. Schemas are a really nice middle ground, because you can query across them if you need to, but they share a lot of the same problems as separate databases in terms of running migrations; you don't run a migration across multiple schemas at once. Apartment helps you with some of this stuff. Citus recently added support for schema-based sharding. It used to be that Citus only supported the tenant discriminator approach, where you had to have the tenant ID or customer ID on every table. They recently got schema support, which I think makes the combination of Citus and Apartment super interesting. So now you can have a little better data isolation and say, no, this customer is only in this schema; they're not mixed and matched.
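To make the schema approach concrete, a small sketch (schema and table names are illustrative):

```sql
-- Schemas are namespaces inside one database.
CREATE SCHEMA tenant_1;
CREATE SCHEMA tenant_2;
CREATE TABLE tenant_1.events (id bigserial PRIMARY KEY, body text);
CREATE TABLE tenant_2.events (id bigserial PRIMARY KEY, body text);

-- Unlike separate databases, you can query across schemas:
SELECT * FROM tenant_1.events
UNION ALL
SELECT * FROM tenant_2.events;

-- search_path controls which schema unqualified names resolve to:
SET search_path TO tenant_1;
SELECT count(*) FROM events;  -- reads tenant_1.events
```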

Speaker 3:

I do think the customer ID discriminator is the scalable way to do things. Oversimplifying it: you think of, oh well, we'll have users grouping one, users grouping two. You'll have two tables in sharded tenant setups, and customer one will go in users grouping one, customer three will go in users grouping one, right, you split them up, round-robin them, that sort of thing. And then when you join, something like activerecord-multi-tenant enforces that you have the tenant ID on each table, so every query, let's join events and messages for customer ID one, it'll make sure that customer ID one is there on every single one. Fascinatingly, we helped migrate a bunch of Rails apps to Citus that had that multi-tenant setup, and usually it was three to six months of effort to get ready for Citus. It wasn't drop-in magic, but once you're there, it's kind of infinitely scalable. And we've got Citus support now on Crunchy Bridge as well; we're slow-rolling it, gradually embracing it for customers. It's not drop-in magic or magically infinitely scalable at the start, but once you get onto Citus, it is pretty infinitely scalable.
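A sketch of the tenant-discriminator modeling he describes, with the composite key leading on the tenant column (all names illustrative):

```sql
-- Every table carries the tenant column, and the composite primary
-- key and indexes lead with it.
CREATE TABLE events (
  customer_id bigint NOT NULL,
  id bigserial,
  created_at timestamptz NOT NULL DEFAULT now(),
  body text,
  PRIMARY KEY (customer_id, id)  -- composite key, tenant first
);
CREATE INDEX idx_events_customer ON events (customer_id, created_at);

-- Queries then always filter by tenant, which the planner loves:
SELECT * FROM events
WHERE customer_id = 1
ORDER BY created_at DESC
LIMIT 5;
```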

Speaker 3:

If you're running a multi-tenant app and you're going to need a petabyte, Citus can actually handle it, and I don't know how many others can.

Speaker 3:

You're in unique territory when you're at that size. And so to me, now that Citus has support for schema-based sharding, it's really fascinating, and it works with PgBouncer as well for connection scaling.

Speaker 3:

That's one of those things, PgBouncer, that I cannot evangelize enough. Everyone should just use it, and it just got support for prepared statements, which was usually the argument against it. Every time I saw someone disable prepared statements in Rails and use PgBouncer, it was still just a huge win in terms of performance. Fascinatingly, every time someone used that tenant discriminator, the customer ID on every table, and added the indexes and the composite keys, which is now in Rails 7.1, composite primary keys, which is awesome, they would get a performance boost of like one to three x just by having that extra index, and the Postgres query planner would work so much better. So if you're building a multi-tenant app, just do that: put customer ID there no matter what, add your indexes there, and the planner works really well. Though I am really curious, with the new Citus support for schemas, to explore Apartment again. Maybe Apartment gets reinvigorated; Apartment plus Citus was always super tempting to me, they just never played together for a long time.

Speaker 2:

Yeah, that would be cool to see, because I know it got a bad rap for a while and people moved on; acts_as_tenant is kind of maintained now, and it does the row-level stuff, and that works really well. I know people would talk about performance issues with schemas at a certain level or whatever, but those were also probably people who weren't super knowledgeable about Postgres and that sort of thing. You mentioned PgBouncer. Probably a very common thing that I see as well is: I'm running Puma, I'm running Sidekiq, I ran out of database connections and my jobs are failing. But we have a connection pool in Rails, and people don't quite understand what PgBouncer is and how it helps and fits into the puzzle. I thought that would be a good one to talk about briefly too. Yeah, I need like a whiteboard.

Speaker 3:

So when you've got your connection pool in Rails, a request comes in: give me a connection to the database. You've got 10 connections sitting there. Prior to Postgres 14, every connection to your database consumed about 10 megs of memory. So Active Record comes in, you get a new request, and it says, give me a connection to the database. It sets up a bunch of stuff, does some logic, and then runs a query: okay, give me the last five events for the user. Now let me template this, blah, blah, blah. Now let me release the connection and return the response.

Speaker 3:

Right, so much of that time was not database stuff, and you've got a connection there doing absolutely nothing. If it's not doing anything, well, there are 10 megabytes just wasted on nothing; that connection is sitting there open in the pool. Now, the alternative is, okay, well, let's just get rid of the pool, right? The problem with that is that SSL and TLS negotiation is slow, 50 to 100 milliseconds. So you're like, oh, let me grab a new connection to the database. Okay, cool, here's the connection. Let me run this query, which is select the last five events for a user, which takes two milliseconds. My database is slow! No, grabbing the connection was slow. So you want the pool to lower the latency. The problem is you don't want a thousand open connections, each consuming 10 megs of memory.

Speaker 3:

So what PgBouncer does is sit there running on a server, speaking the native protocol, and your Rails connection thinks it has a connection, but it doesn't. PgBouncer won't actually give you a real connection until you start to run a transaction, until you go BEGIN. And in Active Record, even if you never explicitly begin a transaction, you say, hey, give me the last five events. Well, under the covers, Active Record goes BEGIN, give me the last five events, COMMIT. That's what happens under the covers. So PgBouncer is just sitting there waiting for those transaction keywords.

Speaker 3:

Rails never knew it didn't actually have a connection; it's all hidden away. It's not that you shouldn't use the Rails connection pool, but these things are not the same thing. One's a true connection pool to the database; the other is lowering the latency of grabbing that connection. And you can query your database with a SELECT count(*) of how many connections are idle; there's a query in a blog post that'll walk you through it. We bubble this up in the Crunchy Bridge dashboard, plain as day: here's the state of your connections. It's one of our production check items, like, it looks like you have more than 20 idle connections; you're just wasting resources. There are cases where I've seen putting PgBouncer in and it didn't improve performance. I've never seen it negatively impact performance. 98% of the time, PgBouncer is basically a free lunch, if there ever was one, when it comes to your databases and your Rails apps.
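The idle-connection check he mentions is a one-liner against the statistics views; a sketch:

```sql
-- Count connections by state; lots of 'idle' rows means memory
-- spent on connections doing nothing.
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;
```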

Speaker 2:

Yeah, that's a great description of it. Pretty much, if you're going to install Postgres, also add PgBouncer, because you might as well. Those rare cases are probably pretty obscure situations, I would guess.

Speaker 3:

It was like the first thing we added to Crunchy Bridge, and to me, you're not running a production Postgres service without it. I'm amazed at others that just don't have it in this day and age. It's been out for years, and it's one of the most valuable things. That it just got support for prepared statements is huge, because usually you had to go into Active Record and disable prepared statements. And there are some caveats: because your DDL is transactional, you don't want to run your migrations through PgBouncer, because of the way it multiplexes transactions. But that's not a bad thing; you have other connections you can connect on. With us it's 5431 for PgBouncer, and you can connect on 5432 for the standard connection, so you can totally go around it. For your main production Rails app, you totally want to be using it.
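For readers who want to see what transaction pooling looks like, a minimal PgBouncer config sketch; all names, ports, and limits are illustrative, and transaction mode is the setting that multiplexes many client connections over a few server ones:

```ini
; pgbouncer.ini sketch; names and limits are illustrative
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; hand out a server conn per transaction
default_pool_size = 20       ; real Postgres connections kept open
max_client_conn = 1000       ; app-side connections PgBouncer accepts
```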

Speaker 1:

We need to start wrapping up, unfortunately, but there's one thing I want to talk about. I am consistently amazed, A, at your Postgres knowledge, but B, I was stunned in Amsterdam to learn about your tiki bar knowledge. I was standing at a table, and this is all I said: yeah, we've got a long layover on the way home, and my favorite bar is in Chicago, so we're going to leave the airport. And without blinking, Craig goes, Three Dots and a Dash? Why, yes, that is my favorite bar and subsequently where we're going. It was delightful. But you were telling me about one you went to recently, and I'd like to hear more about that.

Speaker 3:

I think the next night, or two nights later, I was at Smuggler's Cove, which I think is consistently in the top 50 bars in the US, a classic tiki bar. They've got a book. Yeah, I definitely have a thing for tiki bars, which is so funny, because there are maybe 20 good ones in all the US. I was there with one colleague, he'll be so excited that I'm talking about this, and we went to an actual tiki bar in Amsterdam at Rails World, and he walked into the place.

Speaker 3:

He's like, I'm staying for one drink, I can't believe I'm in this place. He's looking at these things like hurricanes and daiquiris, and I think he first ordered a mojito. He's like, that is the best mojito I've ever had, and he lives in Alabama. You think Mai Tai and it's like pineapple and orange juice, but no, a proper Mai Tai is orgeat, which is an almond syrup, and lime juice, Cointreau, and rum. That's a proper Mai Tai. Most places you order one, that is not what you get. You don't order a Mai Tai in your college dive bar.

Speaker 3:

But it's so funny, because then he ended up having a classic daiquiri and is now on a complete daiquiri campaign of, man, this is amazing, I'm going to get home and make these. I think he texted a hundred of his friends. I was like, you do realize you're going to get made fun of for this, and he did not care. Proper tiki drinks are wonderful. They can be slightly sweet, but for the most part not sweet like you'd think. It's a wonderful hobby, with way too many tiki mugs as well. It's a good thing you ended on this; otherwise you would have never gotten to Postgres.

Speaker 1:

Well, we should have a tiki episode sometime, because Three Dots and a Dash is the only place I've ever been, and so now I've got to add Smuggler's Cove to the list.

Speaker 2:

Yep.

Speaker 3:

That's my list. And Strong Water in Anaheim is unbelievable. Yeah, there's a long list.

Speaker 3:

This is a whole new world for me. So this is going to sound ridiculous, but there is actually a tiki bar at Disneyland and Disney World. If you ever go, it is legit and as good as Three Dots and a Dash. The drinks aren't necessarily better, I would say close in quality, very close, but you can order certain drinks and it's like Disney-fied. There is a ship in a bottle, and when someone orders one, the whole place will start going: the lights go down, there are lightning effects, and the ship in the bottle starts sinking. And as it's sinking, the bartenders grab spray bottles of water, and as you're at the bar and tables, they start spraying around as if you're drowning. A Disney-fied Three Dots and a Dash, an immersive experience, you can imagine.

Speaker 3:

I feel like people are going to be like, why did I respect Craig? He sounded knowledgeable about Postgres, and now he's talking about tiki bars at Disney. I'm never going to trust him for food or drink choices. Seriously, give it a shot and you'll be like, okay, now I'm going to follow you. Usually when we're out at team meetings, they're like, wherever Craig's going, that's what I'm drinking and where I'm going.

Speaker 2:

Sign me up. Speaking of Disney, we went to the cantina at Disney and had, I forget what the drink was, but it had the foam, yeah, the numbing foam stuff in it. It was amazing, it was so wild.

Speaker 3:

Yeah, I think the drink is called a Fuzzy Tauntaun. Yeah, I think that's right.

Speaker 2:

Yeah, and we looked up whatever it's made out of.

Speaker 3:

It's a buzz button, a flower that you can get, and I actually thought about making this foam at home. We were there one time with some family friends, and one of the ladies we were with drank it and was like, yeah, I don't know about this foam. And the waiter, so perfectly in character, brought her over a whole separate paper cup of it. So this foam makes you numb; it's made out of buzz buttons, these flowers, and your mouth will go numb for a couple of minutes, and it's fine.

Speaker 2:

And yeah, it was wild, because we had no idea that was in it or anything, and you're in this story. I think it said something on there or whatever.

Speaker 2:

But we were like, yeah, whatever, maybe this is a gimmick. But no, it felt like a drink you would actually expect from a Star Wars cantina. It was wild. We had such a good time with that, because it was so over the top. Where else are you going to get that? I didn't even know a buzz button was a thing. It was so cool, and totally in character.

Speaker 3:

The cast member came over to take the cup, and she was like, no, thanks. He was like, oh no, and poured it off into her drink and made sure, like, you're finishing this before you leave. And she was half there, but he was in character in a good way, full commitment to it.

Speaker 1:

Amazing. This is great, Craig. Where can people find you on the interwebs?

Speaker 3:

Craig Kerstiens on Twitter. To me, it'll always be Twitter and not X, just like Facebook is always "the Facebook." Bluesky as well. craigkerstiens.com. And check out the Crunchy resources if you're curious about Postgres. It's not like, I'm going to become a DBA; no, we talk to you as an app developer. Our Postgres playground, our collection of Postgres tips, which are just, hey, small did-you-know things, and Why Postgres, if you ever have the question of why you should use Postgres, and how to share that easily with others. So definitely take a look at our blog, our newsletter, our playground, all that sort of thing. If you're not at all intrigued, okay, cool, go use SQLite. But if you're at all intrigued, take a look at some of those Crunchy resources.

Speaker 1:

Awesome. Well, thanks for taking the time to sit down with us and talk all things Postgres. We appreciate it.

Speaker 2:

Yeah, thanks. Good fun. Next time at the tiki bar, with the erupting volcano, you know, in the background. That sounds perfect. I'm sold.

People on this episode