Ever wondered why a seasoned software engineer would transition into becoming an author? Meet our guest for today, Andrew Atkinson, a software veteran who is about to launch his book focusing on database operations for Rails developers. Andrew's rich 15-year career as a software engineer has culminated in this exciting new venture, as he peels back the layers of database operations, specifically in the Ruby on Rails landscape.

In our lively discussion, we delve into the heart of relational databases - comparing the merits of Postgres and MongoDB, and when to use one over the other. Andrew demystifies the assumption that one necessarily needs multiple databases, discussing how Postgres could be potentially used as a catch-all solution. Not stopping there, we journey through the thorny terrain of data synchronization challenges across multiple databases and the treasures of transactional consistency.

Finally, we discuss the importance of performance optimization in Rails applications and the role of database internals. Andrew dispenses nuggets of wisdom on how to optimize Rails performance and database queries. We also talk about the benefits of strict loading in active record - a key player in avoiding the notorious n plus one query problem. Wrapping up our discussion, Andrew guides us to the Pragmatic Bookshelf where his upcoming book awaits all keen Rails developers. So, lean in and listen, as we uncover layers of database operations that could dramatically level-up your projects.

Honeybadger
Honeybadger is an application health monitoring tool built by developers for developers.

Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.

Listen on

Apple Podcasts Spotify Amazon Music Podcast Index Overcast iHeartRadio +

Share Episode

Share on Facebook Share on Twitter Share on LinkedIn

Speaker 2: 0:00

This is remote, ruby have you any remote ideas to the meaning of the word.

Speaker 3: 0:08

All right, I'm here with all three Andrews Andrew Oliver, Andrew Mason, andrew Atkinson, andrew Charnes, andrew Charnes. Today We'll call Andrew Oliver, chris, we'll call Andrew Mason, we'll just call him Andrew I'm trying to remember that we're recording right now and we'll call Andrew Atkinson. Andy, for the sake of this conversation, you will call me. You call me daddy, andrew. All right, all right, dad. So yeah, today we have Andy Atkinson with us. I will briefly mention Andy and I met since it Ruby in Vegas, and I was thinking about this last night. It's 2022 was the last in cities. That right? Yep, that's right, because he skipped 23. And then you're speaking at 2024.

Speaker 2: 0:53

Yeah, that's right. Yeah, I'm excited about that. So I'll be out in Vegas again with everyone else that's attending there, and next spring. That's awesome.

Speaker 3: 1:00

Well, andy, I will let you give a quick introduction and overview about yourself, and then we'll dive in.

Speaker 2: 1:07

Well, thank you. Yeah, I'm Andrew or Andy Atkinson and I'm a software engineer for about 15 years and worked with Ruby on Rails and postcards for about a decade and recently I have written a book that is a poster, as a book for mostly at Rails developers, and I've kind of been exploring this specialization in my career area to work with kind of all aspects of the database operations. So, yeah, that's kind of what I've been up to.

Speaker 3: 1:36

I remember you, I think were the first people that came up to me after my talk in the city because mine was like not very deep but covered some database stuff with Rails and you're like, oh, like this is my wheelhouse. And then the next thing I know you're writing a book and as someone who has read through it, it is a very detailed like this is what we've been missing, I think in the Rails community is a book like this so a lot we can talk about database wise.

Speaker 4: 2:03

I have a question first, though. I'm sorry. I'm sorry. I just want to know how do you feel about MongoDB? Oh God.

Speaker 2: 2:10

I don't really have an opinion about it. I actually did use it, like when it was really new quite a long time ago. Obviously, the books about relational databases and even one of the last chapters is about ways you can use relational databases for what. You might use other types of databases like key value, store objects, store databases like Mongo, ways you might do that with Postgres. So at least in terms of staying on brand, I would say, hey, do you really need Mongo? But it's just like.

Speaker 3: 2:36

You need to read that chapter.

Speaker 4: 2:38

I know you do.

Speaker 3: 2:40

Andrew's trying to destroy all his relationships in life.

Speaker 4: 2:43

Yes.

Speaker 3: 2:44

So let's dig in. I think, first and foremost, I'd be interested in hearing how the book came about. Pragprog is, in my opinion, one of the gold standards of book publishers for technical and programming books, so I'd love to hear how you got synced up with them, how the book came to life.

Speaker 2: 3:03

Yes, I was really honored to have the proposal accepted. The kind of quicker story is I didn't really set out to write a book, but I have always kind of liked pairing up with other developers and having programming be kind of a social thing and kind of building things. And I've been blogging for a long time and used to go to a lot of meetups but still kind of get to some meetups and conferences and things like that and I hadn't really spoken much at conferences or that kind of thing. But I was kind of getting to a point in my career where I wanted to kind of grow my impact outside of just my immediate team or even immediate company and I thought that getting into conference presentations would be a way to do that and it also, I think, forced me to work hard on the quality and accuracy of the content and that kind of thing, more so than if I was just giving a presentation within my team or something like that, which is also great but it's not as maybe rigorous. So what happened was I was at a company past company where we did have a lot of significant post-gres scaling challenges and it was a Rails app and it was a social media style app with heavy load in the pandemic. It was an education app K through 12. And teachers were trying it all around the country and even outside the United States. So we were getting a huge amount of increased load on our system.

Speaker 2: 4:25

And I joined after the first wave, which was the kind of initial shutdown period, and the team had already done loads of work and had scaled the instance vertically to the max size available on AWS, for example, and we were at the point where we wanted to try to squeeze more performance out because we didn't have much left for headroom. So we were doing kind of like I had bought a book called Postgres. It was actually 9.0. It was quite old, but we were on an older version of Postgres. It actually bought a high performance Postgres book and in that pandemic period just kind of really dove in, so I was able to take on query optimization, some auto vacuum tuning. We ended up splitting out a second database which was kind of application level sharding which I didn't know what that term was at the time but later found out that's what you call that and did a whole bunch of things, put it all into a presentation and some of them were quite small, some of them were big, but put it into a presentation, learned that at conferences you can do a case study and that way you can kind of just talk about your experience and got a talk proposal accepted to Postgres conference in New York City PGConf NYC back in 2021. So I gave that presentation. That was my first ever databases conference first, technically my second presentation, but my first in a decade.

Speaker 2: 5:41

I had done one a long time ago, totally different topic. That was probably the backstory, and then the rest of it's kind of downhill like in terms of there wasn't as much work that I was doing, it was more about getting the proposal, I guess. So I was contacted by what they call a acquisitions editor, which I think most book publishers have. I didn't really know what that was either, but they said that they would like to publish in Postgres and actually was not pragmatic programmers as a different publisher. And they said, hey, it turns out that this person actually had written a Postgres book many years ago, which I found out later, which was cool to find out and this person's name is Jonathan. We didn't end up going with that publisher, but Jonathan was great to work with. I believe that I won't say where he's at now because I'm not 100%. Jonathan helped me develop a proposal and that's really what his job was is to reach out to prospective authors and encourage them if they're interested and then essentially you do a proposal similar to a conference proposal, but each publisher has different standards.

Speaker 2: 6:40

So I went through the proposal process with that publisher and it was accepted and that was, of course, really critical milestone to realize oh, actually this could be a thing, but, as we probably all do, for when you're learning about something new, how does this work? As a new author, and I learned about the concept of it's okay to submit your proposal to other publishers. So I went ahead and had a connection. Brian Hogan at Pragmatic Programmers talked with him. We hadn't really met before that, but we had mutual friends and he really encouraged me to submit a proposal there and shared with me a bunch of the benefits of working with that publisher.

Speaker 2: 7:18

And I started to explore that and got really excited because I also purchased a lot of Prag Prog Books, especially when I was learning Ruby and considered them, at least within the Ruby world, to be a great publisher, and what really stood out to me with them was the amount of support that they provide the author, so they have a development editor that you work with through the whole thing. So you write a chapter and then the development editors focuses on the narrative and says, hey, this doesn't really make sense. We launched into this topic but there's no kind of explanation or it doesn't sound a sequence or whatever, and it's always excellent feedback. You're like, oh yeah, it's the kind of thing that's hard to do when you're the author also. So that's been really huge.

Speaker 2: 7:58

And that's been Don who had been working with that Prag Prog. So to wrap this up, I guess. So I submitted the proposal to Prag Prog. They had a little bit of a different setup, but had it accepted there too. Very exciting. And yeah, we kind of specced out all the chapters. And then it was this year plus blast of writing trying to maintain my family life, my full time job and, yeah, quite challenging and time intensive, but really happy to be on the other side of it now just about done.

Speaker 3: 8:24

Yeah, that's awesome. It has been a long process and I remember I think he reached out to me, probably about over a year ago now, saying, hey, I'm like considering writing a book. That's a lot of effort, but from what I've read of it like it's very high quality you can tell that you've put in the effort. I think, too, it's interesting that you were talking about like it was really like 2020 pandemic, when you like really started to get into Postgres or like really kind of get into like the internals and stuff or like lower level concepts. Because, yeah, like, whenever I hear anyone talk about being an expert at the database, I'm like, oh, they've been doing it like 15, 20 years and that's just what I assumed with you because you were so knowledgeable. So it's amazing to me, like how much you've picked up in that short amount of time.

Speaker 2: 9:03

Thanks, yeah, I mean there's all kinds of layers of expertise. You know what I mean, and I have been doing Rails since my first paid Rails job was like 2008. So much longer. More like on the 15 year timeline Right, there was a few years where I was, there was a couple of years where I was not working with Rails, but for the most part the last 15 years has been Rails and so I kind of saw this overlap.

Speaker 2: 9:24

If you had this middle ground between Postgres and a full stack web framework, but more obviously backend oriented, where there's all kinds of knowledge around operations it's more at the user level of Postgres but all kinds of knowledge around, like tuning the queries you write, having an understanding all the way from active record to how those queries execute and whether or not there's optimizations you can make through the instance or to parameters or all kinds of things.

Speaker 2: 9:53

I thought that was an opportunity and I think at a lot of companies that's been a way to level up within the IC track as a backend engineer is to have kind of operational expertise.

Speaker 2: 10:05

So partly why I want to tell the pandemic story is because I think a lot of databases topics can be pretty abstract and can be kind of this is what you would do if you have this scenario and a little bit conceptual, and we really were facing a lot of those scenarios and then it was like, oh okay, now I get it Now or now I understand why this capability exists If you have to optimize rights or you're trying to optimize reads or whatever.

Speaker 2: 10:30

So, from a book marketing perspective, hopefully the book is useful to people facing these kinds of challenges, and then we also kind of position the book as could be aspirational. It could be that you want to get into some of these challenges, but the truth is you may not have them at your workplace. It depends on if you're working on a smaller scale app with lower query volumes, with databases sized, say, less than 10 gigabytes or something. You're not going to be pushing the boundaries of your Postgres instance. But there's still loads and loads of content in the book that I think you'll find useful. I think a reader would find useful to optimize at their particular instance level and have a lot of headroom on their instance so they can do things like big backfills or online migrations or things like that.

Speaker 4: 11:16

I have some more Railsy questions for you and some stuff around, some of the Rails world stuff that was announced. But before we get into all that, dhh has always been like yeah, we use MySQL. And then PlanetScale was like yeah, we're going to make MySQL cool. And now MySQL feels like the cool thing. So I'm just wondering why would you still recommend people use Postgres over MySQL?

Speaker 2: 11:37

I'd say that a lot of people will probably join a company where, if they're using a relational database, it's already been decided, so they might not really be choosing. But if you're setting up a brand new project, I think there are some good reasons to choose Postgres. I think that it really depends on the context of who's evaluating. If you're more of a back-end engineer and if you want to do more product development, then you can kind of operate probably either one of them without a huge amount of differences. If you have a preferred hosting provider, if you are happy with PlanetScale they have the test for sharding that's built in. They have some query cache kind of boosting stuff, and I think you can find similar things in the Postgres world. But you got to probably it's more like a little bit legos you got to put together some things yourself a bit. If you're kind of like oh, I have a lot of experience with a major cloud hosting provider of Postgres, like it could be AWS or Google or Azure, both of them are going to be comparable in terms of cost and it might be more about what kind of integrations with. If you want to stay within one of those ecosystems, you want to bring on more engineers and have access, management and logging and metrics all kind of work the same way. I'm most familiar with AWS, so if you have other AWS services it can be nice to have everything in kind of one console spot. As far as Postgres itself, with Rails there's a few things that you can dig up that they're kind of less common. Let me give you one less common thing that is thrown around, as there's a bunch of different database constraint types, there's a constraint type in Postgres that's exclusive to Postgres.

Speaker 2: 13:12

Maybe a more interesting thing is Postgres has this built-in concept called transactional DDL, which means that when you do like a Rails migration and you add a column or you create an index or something that's a DDL change or a structural change. The way Postgres works is when that change is applied. It either fully commits or fully rolls back and there's. You can't be in like a partially committed state, and in other database engines that's not the case. They don't have this concept of being able to perform DDL change within a transaction that rolls back. But this gets into the weeds of like databases. People like that kind of thing because you can also do explicit transactions yourself and make DDL changes that you manually roll back. If you want to add an index, if you want to mess around with structure, but roll it back, you have that ability. But day-to-day active record, migrations and stuff like that it's almost more like a nice thing that you just take for granted. It's just kind of there behind the scenes.

Speaker 2: 14:14

But it's a good question. I think my answer is a little bit unsatisfying. I mean, back in the day Heroku was for the Rails world was extremely popular platform. It's kind of become less popular, I think, with post-salesforce acquisition, post-removal of free tier and things like that On Heroku. Of course it was really helpful to have Postgres operational knowledge so that you could make sure that you could easily operate your database, since they at least initially launched. I'm not sure if they ever supported other relational databases, but I know that when they launched they were just Postgres. I think they're just not as well known in the Rails world.

Speaker 2: 14:51

But there are some exciting startups building on Postgres. And another way to answer your question might be is there some kind of leverage I can get or some kind of well-tuned Postgres stack or a cost-efficient stack or something? There's three I can think of, though. One is Superbase. They're built on Postgres and they have a lot of great technical blogs and a lot of great extensions that they support. That lets you take a Postgres instance and use it for full-text search or use it for vector similarity search or different things like that.

Speaker 2: 15:22

A newer one is Tembo and they did a rebrand a bit ago, but got to meet some of the folks over there and what they're trying to do is interesting. They're taking stock open-source Postgres and then they're creating different stacks for different types of web applications or different types of applications. So one might be OLTP what we would think of as a Ruby on Rails app a lot of high-volume short transactions, people logging in, people creating things that sort of thing OLTP workloads and then they'll have a different one that's more for analytical queries, where you have give me 5,000 rows and do some calculations on them and dump a huge file. And Postgres is it's both a blessing and a curse that it's out of the box. It's a pre-minimal and it's got several hundred parameters you can tune and it does demand a little bit more. But on the opposite side of things, it's very flexible and you can use it for a lot of different types of workloads. So that's kind of where Tembo is trying to target their offering.

Speaker 2: 16:21

And then another interesting one is Neon. I don't know if you guys have seen Neon, but one of the big things with AWS's Postgres offering was Aurora, which one of the main features is separating the compute from the storage. So, databases you're writing all this data, you're reading all this data, and you kind of want your storage to scale up and down and that kind of thing, and ideally it's kind of separate from your compute, like you might scale up your compute for different times of the day if you have e-commerce or Black Friday or something like that. So Postgres doesn't work that way out of the box though. So Neon, what they have done is they're kind of doing the separate storage and compute. They're at least partially open source. I believe they're offering a hosted version. I haven't really tried it out myself and then only throw one more company and I'll wrap that up.

Speaker 2: 17:06

So another company also met them recently Hydra.

Speaker 2: 17:10

There's a whole other world from what we think of in Ruby on Rails, with web app transactions, all of the analytical transactions.

Speaker 2: 17:17

So what Hydra is doing is they're taking Postgres and allowing you to do what they call columnar tables.

Speaker 2: 17:23

So instead of tables that have rows, you have tables that have columns as your primary mechanism, and the reason to do that is you get much better performance for calculations across really large sets of data, interesting in that, they're offering their service as an extension to Postgres.

Speaker 2: 17:39

So if you can set up any extensions that you have, or if you can self host Postgres and you can install extensions, you can set up Hydra and you can actually just convert one or two of your tables to this kind of columnar style table and you might want to do that to have these really big analytics queries that you run every day or something like that. So, in closing, I guess there are some interesting startups building on Postgres, pushing the boundaries of what it's doing, and some of those things kind of feed back into its upstream release cycle as well. Folks, I think, in the Postgres community consider it to be a pretty exciting time, similar maybe to how I do feel, like there's some similarity in how Rails feels kind of exciting again now, right, like with the 7-1 release, and that's how I would answer that, I guess.

Speaker 4: 18:24

So I have a question about solid cache, but before I have a selfish question that I thought of while you were talking. I learned about unlogged tables recently and here's my question to you Could I speed up my tests by telling Postgres to only use unlogged tables, like in the test environment?

Speaker 2: 18:41

Yeah, that's actually in. Also have a few connections over at Crunchy Data, but they just wrote a great post. We can send you a link. It might be their most recent post on their tech blog, but that was one of the things they showed in. There actually is there's a new. I think it's an active support. There's a new setting. That's something like create unlogged tables.

Speaker 4: 18:58

Yes, I saw that setting recently.

Speaker 2: 19:01

Yeah, incidentally, I tried it out recently and in a test app I have where the test suite runs pretty quickly, I didn't really see a big difference. But I'm also using the. It's actually the app for the book. It's using the SQL structure format, the db slash structuresql file, and in Postgres unlogged tables. You can create a table as unlogged by giving the unlogged keyword when you create the table, and I'm wondering if it creates the tables. Yeah, I did check and what I found was actually, with that setting enabled, I didn't see the table was unlogged. It was a regular logged table. So I needed to look into that a little bit more. But that is what's advertised. That one thing I'm wondering is if it relies on the db, the Ruby schema format, and that's just maybe like not documented.

Speaker 2: 19:49

I was going to dig into it a bit more. But that is what's advertised and I can say from my experience with unlogged tables for other use cases the speedup is really, really significant. The tradeoff is you lose what Postgres docs call crash protection, which for tests it doesn't matter. It's totally right.

Speaker 4: 20:06

Exactly. Yeah, I feel like I tried to turn it on in our CI but then didn't see much of a change, so I need to go back and try that again.

Speaker 1: 20:16

I just want to take a second to thank our sponsor, honey badgerio. Do you currently use one service for uptime monitoring, another for air tracking, another for status pages and yet another to monitor your cron jobs and microservices? Paying for all of those services separately may be costing you more than you think. If you want to simplify your stack and lower your bills, it's time to check out Honey Badger. Honey Badger combines all of those services into one easy platform. It's everything you need to keep production healthy and your customers happy. Best of all, honey Badger is free for small teams and setup takes as little as five minutes. Get started today at honey badgerio. That's honey badgerio.

Speaker 4: 20:58

So I was sitting in the keynote for Rails world and DHH is like Redis don't need it, we can use the database for some of the caching stuff. And my brain, as someone who y'all have been doing this much longer than me and my brain is like what is happening right now. It's shattering my understanding of the world Because I'm like I thought we use Redis for caching because Redis is fast and I've always heard that in Rails the database is always the slow part. So I'm confused about why we're going back to the database for this.

Speaker 2: 21:27

Yeah, it's funny because writing a Postgres book I was drinking the Kool-Aid and kind of in a way. I know I'm in a little bit of an echo chamber now but I over the last. It's actually been about 18 months since I started and in that time there's been some interesting posts that kind of validated this idea that like, hey, you could actually probably just use Postgres for everything. Like a lot of shops have multiple databases, they have what they would call heterogeneous databases, like completely different paradigms, like document-oriented key value store. So a recent job we had Elasticsearch and Redis and there were some discussions about other places I've worked might have Kafka or, like you mentioned, mongodb could be used for different things, and I think actually I noticed DHH mentioned this and Christopher over at CrunchyData mentioned this in a presentation as well.

Speaker 2: 22:22

But I'd say the biggest change that's easy to understand for me is how much faster storage has gotten in the last decade and you can look at these staggering benchmarks where it's like order of magnitude improvement. And then I liked how in the keynote there was a discussion about how like the world has changed from when some of these technologies were created, where storage was slow. So we wanted to have an in-memory store so that we had really fast reads and writes. And at a certain point, when storage gets so much faster not as fast as memory, but faster as well let's actually be a little bit more skeptical or critical of this challenge around operating multiple databases. And like one of the big ones, that's a big reason to just do everything in Postgres or any relational database, like you could also say, do everything in MySQL, is if you have to synchronize data to another database. You're going to be building a bunch of stuff, you're going to build a sync pipeline and then you're going to have likely inconsistencies. You're going to have this row in this relational database that's represented by this object over here.

Speaker 2: 23:25

What happens if it goes down for a while? Okay, now we have to run a backfill and a sync? What happens if that big batch job like then affects the performance of it and it doesn't serve queries Like okay, now we've footgunned ourselves. So there's a lot of operational challenges with multiple databases. And then there's kind of that when you're within the same instance, like if you were just within a Postgres instance you have the concept of a transaction. So you can say in this transaction do a bunch of stuff but do it all together or don't do any of it, and that's something you give up when you start to move to multiple databases. So if we look at background jobs and caching and some of these things we might use other databases for it is worth considering. Can we trade off a little bit of performance, just pure write and read speed in order to get some of these benefits like transactional consistency, less operational burden, probably less cost?

Speaker 4: 24:19

So, yeah, I don't understand how SSDs relate to Postgres performance.

Speaker 2: 24:24

Oh yeah, well, so every row that you're writing let's just say you have a blogging app and you're all these posts and comments, easy, classic example All of that stored in disk pages in Postgres and storage which would be. It's not a spinning disk, but it's like a SSD drive in a data center somewhere and on the operating system. Postgres also has an operating system level cache and then all of the data is stored in your table data. But then, as we know, for queries, we want fast query performance for the web applications, so we build indexes. So indexes a whole nother topic we can get into, but indexes are going to be critical for high performance queries. That's also just another data structure on that same storage medium, and Postgres has its own concept of caching. And then there's also the OS cache.

Speaker 2: 25:13

So really, what you're doing often when you're running these queries is the Postgres query planner is trying to say where is this data Right? I need to fetch Andrew's posts. So he wrote a query that's like select star from posts where user ID equals Andrew, and so the query planner says, okay, I'm going to basically filter down. Let's say there's no indexes. I need to go and load every row of the table for posts and then I need to filter it down to just Andrew's. So it's going to read all of those pages for where the posts are stored, which is hidden from us. We just think of tables and rows and columns, but under the covers is a bunch of data structures. So it's either going to do that or it's going to say, oh, we have an index on the user ID, so I'm going to access the index. I can go really quickly right to where Andrew's posts are, and then I'm going to still read the table data, but I know exactly where to go to get it because I use the index to do that initial filter step.

Speaker 2: 26:06

So anyway, if you think about that, all of that is disk IO. The examples I was giving were more reads, but if you think about when you're creating posts or creating comments or think about like with social media apps, you have a really high volume of rights, right, you have people liking, hearts, reactions like retweeting, like you have a lot of rights happening and from the database's perspective, that's also disk access, right, like it needs to go and put a row right into a page for where that row is. It needs to go and possibly update an index that reflects that row change. So those are all the rights and then simultaneously you have all these queries and all of that's just like IO input output on the disk. So even if no other changes happen to Postgres over 10 years, but disks got 10 times faster, like Postgres, and everything else running on the hardware gets faster.

Speaker 4: 26:58

Today I learned I was talking to someone the other day about solid cache and they were asking me like I don't get, like how the relation? And I was thinking I was like I don't really either. So thanks for explaining that, yeah hopefully that was useful.

Speaker 2: 27:11

And then this is something that I'm not super expert on, but I've learned over the course of my learnings with Postgres as well. But if you compare Postgres to an in memory data store like Redis, what's usually talked about as in memory data store I think there was a point in time where I thought, oh, postgres, just everything happens to the disk Right, like it's always, like a new row is written or an update, it's going to be like a disk right. But that's not the case. Actually, it's also everything's really just in memory as well, and then it's periodically flushed to disk, like essentially asynchronously. So just in terms of a mental model, it's not really. Again, this is a little bit of a highlight, like you know. Someone else might be able to nitpick this a little bit. But in terms of a mental model like, I think of them both as being primarily in memory, like main access, at least for the rights and then for the reads. What Postgres tries to do is, every time you run a query, it actually tracks all kinds of data about what is happening and it tries to put that same data because it's likely it will be queried again into a hot path like a prioritized place. So it has its own shared memory, called share buffers, that whenever you so if you access a row like it's going to be in. It's a limited space but it's going to be put into this memory and that way the second time you read that row it's actually going to be like incredibly faster because it's just coming straight out of memory. If it's not something you've accessed in a year but it's in your database, it's going to likely be a discreet. In that case it's going to find it and then go and figure out where it is and pull it. But again, the second time you access it super fast. So those are some of the other ways and this kind of gets into beyond the scope of the book, but I do have some references to some great. There's even a great online book about Postgres internals, like the storage internals, where you can read about that kind of stuff.

Speaker 2: 28:56

But yeah, this book is targeting more like I think it's good to know some of that because it helps inform. Like if you're digging into the query planner and you're like why is this query slow and there's layers kind of of learning about, oh, it's reading off of the disk. Oh, it's reading off of memory oh, it's a lot of IO. Oh, it's a lot of CPU. Like it's building an index. That's slower. There's other things that are CPU intensive versus disk intensive. So I think it's kind of good to know about that stuff.

Speaker 2: 29:23

But as a Postgres user operator, the idea is you don't really need to care either. Right, kind of like a web framework, like we don't really need to know about all of the nitty gritty details about the HTTP spec, like we have a request object and we can read headers and some things off of it. But with the database side there's a similar concept where you have these abstractions, tables and things like I mentioned. So it's kind of where I'm at. Like I'm interested in learning more about the internals. But this book again is more aimed at I want my Postgres database to run reliably. I want to be able to scale it if I need to, I want to squeeze everything I can out of it and I also want to make sure my application side is good. I want my queries that ActiveRecre generates to be reasonable and I want to understand what they do. That kind of thing.

Speaker 4: 30:10

What do you think about the average Rails developer? I don't think about the performance of my database all the time. I'm like, oh, I need to get these records. I'm going to slap a wear on there. Maybe I do. An includes, maybe I don't. But what are some average everyday things that, as a Rails developer, you couldn't think about? To leverage better performance out of the database and using active record.

Speaker 2: 30:31

I was thinking about that too. It's a good question and I think that the first advice might be surprising, but it's actually not really to focus on the database performance. It's really to focus at your application level. So if you have an APM tool like New Relic or Datadog or something application performance monitoring tool for a Rails web app, you probably want to look more, at least for backend performance. You want to look more at the API level and then dig into whether or not the database query is a significant portion of a poorly performing endpoint or not. It could be something else, like it could be. We're serializing a thousand objects and it's all CPU time, but there's one fast query. So I'd say, start there.

Speaker 2: 31:12

But let's say then you've dug in and you have this important endpoint that, like for your application, is used a lot by customers and the queries are slow. You're seeing like multiple seconds which for a real, you know a web application is slow. Even half a second is slow 500 milliseconds and the challenge to it a lot of companies is if you really want to prioritize that engineering work, that kind of performance engineering work, it's going to be very context dependent. If you have business app that gets occasional use. Maybe it's fine. Say that in my experience, e-commerce, or if you have the fortune of being very successful as an app with a lot of traffic, you're kind of forced into the situation. But if there is a desire to say, ok, yeah, this query is 500 milliseconds, can we make it five milliseconds? Then I'd say likely. You're writing active record, so you want to go and see. Is there anything? Even if we don't talk about indexes, is there anything I can do to select fewer things? If the query is slow? Is there conditions you can add to the where clause or the join conditions? Do you really need to join all the tables you're joining? Like, maybe there's five tables being joined and maybe actually, when you go and look at it, there's only data needed from two tables, like in the view or in a serializer. Remove the three joins. Maybe there's 30 columns in the table or 50 columns Some of them are JSON columns and some of them are from a feature that is dead. And you go and look in the serializer and you're like well, we actually only display three things. Just select those three columns. Then that can make a really big improvement. But there's a limit to what you can do there. But again, going back to our discussion of IO, you always want to be thinking about, when you're optimizing queries, like how do I reduce what I'm asking for? Like, I ask for a smaller thing, it's faster. So, yeah, fewer tables, fewer columns, fewer rows.

Speaker 2: 33:04

Then you get into more of the weeds though about, let's say, you've cut it down as much as you can. Then you want to look at your indexes on the table and ideally look at your where clause, so like where user ID equals one and user is active, generally anything in your where clause. You want to have an index that is built, that has its definition, that matches those conditions. So you might want to index that's like in this example, on the user ID column, and that would let you quickly zero in on just. You know that query can efficiently retrieve just that user's rows, and so that will then make what they call, instead of a full table scan or a sequential scan, that will then use an index scan, and then there's an additional optimization. Where this is going to be, though, you're going to possibly need some database tooling to get at this info, but with database indexes you can actually, if you think about it, you can think of them as kind of like a cache as well. You can put the data you need for a query If it's a very high performance or like a high call volume query, like it's called thousands of times, possibly a second, and you think about it and it's got two or three fields, you actually can put those in the definition of an index.

Speaker 2: 34:12

So you'd have a custom built index just for that query and that's going to be very fast. That's about as fast or as efficient as you can get. So yeah, there's kind of a spectrum of like start from your application level, determine if it's database or not. If it's database, dig back to the query the active record code that generates it. Try to cut down what's being fetched at one time. Once you've done that, then go and look like, okay, we're pulling the minimal set of data. Let's look at our indexes and try to optimize our indexes, and there's a bunch of tools for optimizing indexes and things. So, yeah, those are some high level, generic kind of tips.

Speaker 1: 34:49

What about, like Basecamp talks about doing, was the Russian doll cashing and strict loading is built into active record. Maybe you can talk a little bit about how that affects possibly requiring stuff too trying to think about do we need to load all this data because we've already got it in cash? But maybe we do in order to check the cash to see if it's cashed.

Speaker 2: 35:10

Yeah, it's a good point and actually I kind of have gone. So caches are super important for performance and there's all kinds of different caches and in the book, like there's a bunch of them are covered. Even the application ones are the ones that Ruby on Rails provides. So what I said before was more like thinking about things just with the capabilities of the database, but we have the benefit of using Rails as a framework, so we're not limited to only Postgres and we can use the query cache, we can use countercache columns and there are some examples in the book of where those are valuable and you can even there's hands on examples where you can turn them on and off and check the performance back and forth if you want and see the effect. There's also a prepared statements, which are a way to take a query and separate out the parameters and not need to generate a new query plan on the fly but keep parameter list version of the query in a cache, and Rails supports that as well.

Speaker 2: 36:07

Regarding strict loading, active record is a nice way to write queries with kind of like a navigating different associations, like I want to get the post comments or I want to get the social media profile of the users or something like that. But for those that aren't familiar with strict loading out of the box, you can do this lazy loading in active record. That then can contribute to this n plus one query problem where you get. You know, you load an association and then there's a bunch of repeated queries to that associated table. That could all be one query if it was loaded up front.

Speaker 2: 36:38

And Rails encourages you to solve that by using eager loading. Though you say, oh, I know I need posts and comments and social media profiles or whatever, and so I'm going to eager load using the includes method or preload those associations, so comments or whatever, and active record is going to take care of determining for me if that's going to be another query to that table at that time, or left outer join or a different type of query. But I'm not going to have that problem with those repeated queries and with strict loading. Then you can actually say, like, actually I want this to always be the default, I don't want to allow lazy loading, I want anyone that uses this code. And actually I think, chris, you showed that in your Rails world talk, even on the association, which I just watched yesterday, so it's fresh in my memory cache yeah.

Speaker 1: 37:26

Yeah.

Speaker 2: 37:27

You showed it on the association, which I didn't show in the book, but that's pretty cool too. I'm not setting it globally in the app, which I think you can do as well, but I'm going to set it just for this association. So anybody that's loading this is going to be forced to eager load and they're not going to have that M plus one query problem.

Speaker 1: 37:44

I don't know if you've seen there's the mode option Like if you use strict loading as like a scope, you can pass in the mode there's not actually a configuration option for that globally or whatever to set N plus one only Because I guess it will handle literally any time you access the association by default.

Speaker 1: 38:06

But you can also set it to be like well, only raise an error if we're doing an N plus one situation, like you can enable this on like the belongs to, like the most examples show it. Like on a has many, but really the belongs to where you're like here's a post, let's make a query for the author, let's make a query for the next author and whatever we may have efficiently loaded the posts but not the authors, and it can catch just that situation. But if I remember right, if you loaded just one post and tried to access the author, it's going to make a separate query, no matter what. So the N plus one only mode I think would not raise an error in that situation. It's like well, only if you had a whole bunch of posts and then a whole bunch of authors would it raise that error. But I was kind of surprised there's not a config option to set that like globally. So I don't know entirely if that's not used very often or what.

Speaker 2: 39:03

We hadn't yet used it.

Speaker 2: 39:04

We were using you mentioned this in your talk as well the N plus one detection gems like bullet is popular and then in the book I recommend the I've never heard it pronounced, but prosopite, I believe, is how I say it and another thing that the documentation for that gem has that I always wanted to get set up but we didn't really quite get there.

Speaker 2: 39:23

Maybe I'll set it up, for the example app is to do that N plus one detection in your CI and then you get a little bit of earlier feedback of at least discovering those or maybe configuring it to raise an exception and fail a test. Or you would do that by configuring it to fail a test so that you could go and fix that up, do an eager load. But yeah, I think if you're just one developer, like it's challenging to try to. In my opinion, like you can't really care about everything performance wise, you're probably going to focus more on application level concerns. But there is this whole world of being more focused on database optimizations and then the interplay between the queries of the application and the database. That I think is interesting.

Speaker 1: 40:05

Yeah, it's like a scale that moves over time where, like you're building your app for the first time, you just need it to work.

Speaker 1: 40:11

You don't have hardly any data. So who cares if we're doing N plus one queries but over time, like we got millions of rows in the table. Well, it's helpful if our entire team knows using strong migrations gem to make sure that nobody makes a mistake and takes down production because it's creating an index and did something very inefficient or blocking while your application still needs to run. So I feel like the same thing applies to the N plus one queries where it's like at the beginning, do you care? Probably not, but you can go through and turns out this is a problem. Let's strict load on this association. We don't need to worry about everything, but you know, eventually you get probably to a point where it's helpful to enforce that for every developer on the team, because I assume everybody's going to have different levels of experience with databases and stuff. So you can kind of teach them and enforce that stuff together by using strict loading or strong migrations or whatever else.

Speaker 2: 41:16

It's probably a good thing to do, but not necessarily a thing to do at the beginning, because you may like a one thing yeah, you actually you jog my memory now with what I was going to say too which is, if you have the privilege of working on a team, you'll probably have I do encourage all developers to try to become aware of their production environment, and I have worked with teams where that's not necessarily the case, like CI is good and finding N plus one queries is good and that kind of thing. But if you only have a right code and run your unit tests and kind of ship it to production but you don't really have an awareness of the production characteristics, you're missing out on a lot of what we're talking about in the conversation today, where your code and your queries and that kind of thing could be much more efficient. And really what matters to end users is how it executes and runs in production right. It doesn't matter if it's how it runs on your local machine. And the other thing, too is for bigger organizations.

Speaker 2: 42:09

You might have multiple deployments of your Rails app and there could be different levels of like different hardware capabilities in those deployments.

Speaker 2: 42:15

Like you might have twice the CPUs on one or half the memory or that kind of thing, and that is a real thing I've faced and that can make performance optimization challenging as well, because then you kind of want to optimize for the lower resource setup so that it works well, and then that kind of thing.

Speaker 2: 42:33

But yeah, I think it's important to, like you said, chris, like there's a spectrum of moving from some of these like lower hanging fruits, like eliminating n plus one queries and then having awareness of, like, how much data your queries are pulling and that kind of thing, and then kind of getting more into the operational side with indexes and possibly we didn't really get into this but like read and write splitting, having a primary instance and a replica instance with queries that run on a replica to separate those that load right, so you don't have all of the load on one database server instance. So yeah, there's levels, I guess, of which problems are appropriate for your organization, like that are you're actually facing, and then choosing the right set of techniques to improve things that are going to make a real difference.

Speaker 1: 43:22

Yeah, there's a rabbit hole you are preparing yourself to jump into. If you want to look at all of this stuff all of the time, you'll probably never build any features if you're constantly worried about it. So I think it's one of those things where you, as junior, you like start learning about all these different optimizations and you want to do everything perfect and do it right. But it's like, well, in some cases, if you do it this way, it's going to be a terrible situation, because it really is a trade off in a lot of cases on speed of development or whatever.

Speaker 1: 43:56

But as you get into your large application, well, things are a lot more set in stone which means that you can invest in those like performance optimizations, and that'll be a good thing because it's not likely to change a whole lot.

Speaker 2: 44:12

Yeah, that's a really good point, and part of this too is like the book is aimed at an intermediate audience that has used Ruby on Rails and use Postgres and operates it, and the reason for that is because this book really isn't for day one, folks.

Speaker 2: 44:25

It could be if you really want to leverage some features that are part of your product development, like some database features, you want to use a view or a trigger function or something like that, but it shouldn't really distract from your differentiator as you're building your application. Like you want unique value for your application, the database is one of the tools that's part of the technical stack. So then, doesn't that kind of contradict. What is the purpose of this book then? Well, organizations, over time they calcify. Change becomes really difficult. So it's like you can't just swap out your relational database. That's a big change, right? You can't just immediately improve all of the performance of all the queries on the instance. That's also a big change.

Speaker 2: 45:08

So if you are in a team and you are a back-end engineer and you want to essentially do some of those fixes so that other people don't have to or care about it too and that's what I've been doing in my career lately, and a little bit more of also, who this book is aimed at. Where you've got plenty developers, you're a Rails shop and you're debating, do we need to scale up our instance? It's going to be another 20 grand a month, or we can try to fix these queries, make them run faster, eliminate them, drop some tables, archive some data, some things like that, which are all things that are covered in the book, and it's like, oh, we don't actually need to, we just saved scaling up to this instance and that 20 grand a month or whatever.

Speaker 1: 45:48

So yeah, it could be 20 grand one month to invest in those optimizations. And then you save the 20 grand every month after that which David's been talking about at base camp, the moving off the cloud, and just the sheer amount of monthly fees now that they invest on buying hardware for the next five or 10 years or something, and that gets amortized across and it's good. I think too, if you're on a smaller team or by yourself or whatever, reading this book is important because you can be aware of the tools you have available. You may not need them right away, but you could plan for them. You can think about okay, my application right now doesn't exist, I'm building it, but I do know that it is going to be heavy on rights. So we should make sure that we think about how we design it early so we don't have like well, you don't walk ourselves into a trap later where we're like oh God, everything falls apart all the time and you can prepare yourself for it.

Speaker 2: 46:52

Yeah, thank you for that. It's a good way to frame it and I think what it does to, or what I was thinking when you're saying that, is like it forces some awareness on, even if you didn't do anything specifically to optimize rights for that example, but you had an awareness of what your current setup was. That's a win, right? You're like, okay, we have 100 commits a second on our database. I can go and I know where to look for that and I can see it, and I can see that, like, our CPU is fine, everything's fine, we got enough headroom. If that's something that a reader didn't know how to do or think of before, and then they read the book and they do that after, I think that's a win for them. Because whether or not you go for all the optimizations, a lot of it's engineering work, it's challenging and it only really makes sense if there's a purpose, like a business purpose. So first step is awareness. That's how it always goes for me anyways.

Speaker 4: 47:41

I have one quick question for you Structuresql or schemarb?

Speaker 2: 47:46

Structuresql all day, all right.

Speaker 4: 47:49

And that's it for today.

Speaker 3: 47:52

Whoa Damn son. I have a very important question how are you feeling about the Timber Wolf season this year?

Speaker 2: 47:59

Who so excited? It's possibly Jason. I forgot Jason. You and I are both NBA fans and a little bit of a fun rivalry, at least in my head. I don't know about your head, but Grizzlies, timberwolves, they're kind of Midwest-ish teams.

Speaker 3: 48:11

Yeah, the first I think, right as we became friends was the year we played each other in the playoffs.

Speaker 2: 48:16

Yeah, Really excited. I hope to get to a few games, and last year I took my seven-year-old daughter to her first game and I was like this is what we do in our family we go to the NBA game. No, it's good.

Speaker 3: 48:27

But yeah, I've been trying to do that my kids could care less. Yeah, oh, basketball again.

Speaker 2: 48:32

She was like let's go get a stuffy and some nachos and stuff. So that's cool too.

Speaker 4: 48:38

Yeah, whatever Sun's Nation Sun's.

Speaker 2: 48:41

Nation. Yeah, quite a stacked team over there at Phoenix too.

Speaker 4: 48:44

Damn right. Yeah, wow, kd Nation.

Speaker 3: 48:49

Well, Andy, where can people find you online?

Speaker 2: 48:52

Yeah, I'm still using Twitter and X quite a bit lately to promote the book and stuff. So it's Andi A-T-K-I and Atky on there and then I blog at AndiAtkinsoncom. So there's probably the two of the best spots.

Speaker 3: 49:07

And then with the book. Where can people pick that up at?

Speaker 2: 49:11

Yeah, so it's Pragmatic Programmers and the publishing portion is called the Pragmatic Bookshelf and the website is pragpragcom, p-r-a-g, p-r-o-g, and if you just search Postgres on the site you'll find it. I think there's only a couple of databases books that they have published, so you'll be able to find it there. And then I also I think I'm going to keep doing this, but I started a newsletter related to the book. Initially it was to like in a pre-launch, but now I'm thinking like it might be fun to keep it going a bit for people that want to hear about, like additional content that's related or relevant. So that's at PGRailsBookcom and I'm also using that hashtag on Twitter like PGRailsBook, with some tweets and stuff related to the book.

Speaker 3: 49:55

Awesome, well, thanks for joining us and answering all of Andrew's questions, and I'm sure he has more.

Speaker 2: 50:02

You'll hear about soon, but it's been awesome chatting with you. Thanks for the opportunity. Guys Really appreciated it.

Remote Ruby

Unleashing the Power of Postgres with Andrew Atkinson

Listen to this podcast on