S01E48: Xavier Shay

0:00 About Xavier Shay
0:38 Xavier’s introduction to open source
1:19 What is DataMapper? What are the goals of DataMapper?
2:52 Xavier’s work on the analytics team at Square
3:55 Technologies Square uses: Ruby, JVM, Cube, D3
4:50 Impressions of and uses for various Ruby interpreters: Rubinius, MRI, JRuby
6:10 Xavier’s interest in speeding up Rails startup time
8:20 Protecting yourself from data integrity problems and the importance of using foreign keys
12:35 Considering tradeoffs of various databases
14:41 Xavier’s favorite talks from RubyConf 2011

Links About Xavier Shay \"Twitter:\":http://twitter.com/#!/xshay \"Github:\":https://github.com/xaviershay

\"DataMapper\":http://datamapper.org/

\"Square\":https://squareup.com

\"Cube\":http://square.github.com/cube/

\"d3.js\":http://mbostock.github.com/d3/

\"Rubinius\":http://rubini.us/ \"MRI\":http://www.ruby-lang.org/en/

\"JRuby\":http://jruby.org/

\"Xavier Shay’s Blog Post “Speeding up Rails Startup Time”\":http://rhnh.net/2011/05/28/speeding-up-rails-startup-time

\"Xavier’s Workshop “Database is your friend”\":http://www.dbisyourfriend.com/

RubyConf 2011 \"1. Greg Moeck’s “Why You Don’t Get Mock Objects” presentation\":http://confreaks.net/videos/659-rubyconf2011-why-you-don-t-get-mock-objects \"2. Brian Ford’s presentation on improving tooling for Ruby “Nikita: The Ruby Secret Agent”\":http://confreaks.net/videos/691-rubyconf2011-nikita-the-ruby-secret-agent \"3. Dr. Nic’s “Threading versus Evented”\":http://rubyconf.org/presentations/18

Eric: Hi, my name is Eric Jones, I’m the database administrator at Engine Yard working primarily in the support organization but also helping assist with some of the back end development and on the database – and data tier. I’m here with Xavier Shay, we’re gonna go over some questions and whatnot. Xavier, do you wanna introduce yourself?

Xavier: Yeah, hi my name’s Xavier Shay; I’m currently living in San Francisco but I’m originally from Melbourne, Australia. I’m tech lead with the analytics team at Square at the moment.

Eric: Cool. So you’ve done a good bit of open source work, when were you first introduced to open source in general?

Xavier: So I actually don’t really remember. Back sort of at university I was sort of playing around with stuff. I think it really sort of started taking off when I got a GitHub account because that’s basically the earliest record of stuff that I can find on the internet that I’ve done, and that started off sort of with Ruby on Rails back in 2006 I guess.

Eric: Okay.

Xavier: So yeah, I guess Ruby’s kind of been the main contribution. Before that it was just random stuff like AngelScript which probably nobody has ever heard of. I did some work with LiteStep and other kind of random projects but I can’t really find any record of that anymore so. Yeah, that’s – and so mostly Ruby since then.

Eric: Cool. So you’ve been a DataMapper contributor, give us just like a description of what DataMapper is, why you consider it to be important?

Xavier: Yeah, for sure. DataMapper is an ORM, sort of the same class as say ActiveRecord. It sort of has a couple of different goals I guess. So DataMapper works much more closely with your database. For instance the ActiveRecord line is kind of – your database is this big hash in the sky, right? Whereas DataMapper takes the opposite approach and says well, you’re probably using foreign keys and stuff in your database and we wanna be able to map to that. Furthermore, DataMapper actually goes further and says well, we don’t really even care whether you’re using a database, so you can have like a DataMapper adapter for instance for say Git or IMAP or any sort of random backend data store that you want. And so for me I think it’s important because, one, it gives you consistent access to a whole heap of different things; two, since it’s really good at mapping database conventions, it’s also really good for legacy databases that you don’t necessarily have control over; and three, it also just does a few other things which I think are really nice, like the introspection on it is really good. You declare your properties in your models which I kind of like. And it also means you can actually operate, especially once again for legacy databases, on like half of the table and DataMapper will just ignore the rest for instance.

Eric: Okay.

Xavier: So yeah, I think it’s a really cool project from that perspective. I should mention that I’m not actually really an active contributor at the moment. I was using it on some previous projects; I’m not actually using it currently. So for instance I was involved in the 1.1 release but the 1.2 release I didn’t really help out with. That’s just – I don’t have a project that’s suitable for it at the moment.

Eric: Okay, cool. And you mentioned a minute ago that you moved to San Francisco to work with Square. Can you tell us a little bit about what you’re working on there?

Xavier: Yeah, definitely. So I’m – as I mentioned I lead on the analytics team, that basically means that we are providing the rest of the organization with good sources of data and good ways to interpret that data. So at the moment that means we’re doing a lot of work supporting our risk analysts because managing risk as a company that processes credit cards is kind of important for us. But yeah, more generally we also support the marketing guys, the finance guys, and even actually also other engineers, like if we want to provide products to our consumers that are built on top of data we wanna make sure that they have good access to that. So I kinda see us as like an internal team and then any of the external facing stuff sort of comes – another team will build on top of what we do to provide that externally. So that’s kind of the approach we’re taking. We also do a lot of visualization stuff so in the media if you see any visualizations about Square or whatever they generally come from our team.

Eric: Okay, neat.

Xavier: We do like internally for radiators and stuff like that.

Eric: Cool, cool. So can you tell us a little bit about what kind of technologies you guys are leveraging there?

Xavier: Yeah, definitely. So we’re using Ruby and the JVM – yeah, quite a bit. That’s our sort of go-to at the moment. Some of the interesting stuff, we’ve got two open source projects actually. One called Cube which is built on top of MongoDB and does some really cool – allegedly really cool ad hoc visualization of time series data so that’s really quite cool. And also a JavaScript library called D3 which is, if you wanna do sort of data visualization, it’s kind of one of the better JavaScript libraries for that. And yeah, we’re sort of moving more towards the JVM, we’ve sort of got some bigger projects coming out soon that we’ll probably be investigating other sort of JVM languages for but we haven’t made any strong decisions there yet.

Eric: Okay. So you mentioned you’re using like the JVM a lot. Are you using JRuby? Do you prefer that over MRI or Rubinius? What do you think about the different interpreters that are out there?

Xavier: I actually use them all. I think they all have their strengths. So Rubinius I don’t actually use in production but like on the weekend I hack around on it just because it’s fun. I really just enjoy being able to say hey, here’s a patch for a Ruby interpreter, like I think that’s a really – they have a really good ecosystem around that. MRI I use because it’s fast, just like the startup time’s really quick. If you’re developing Ruby I think MRI is the nicest experience there because – I mean I think startup time is really, really important if you’re trying to keep a tight TDD loop. You can’t – at the moment you can’t get that with JRuby. I mean you can kinda use like Nailgun and these sorts of things but it’s kind of – it’s still a bit flaky we found, especially compared to MRI. For deployment though we’re using JRuby; we deploy on JRuby for most of our apps and that’s really nice.

I mean the JVM as a platform is great, we actually have a lot of Java stuff in house at Square, not necessarily the analytics team but we do deploy Java apps so it’s nice to have one single platform to deploy to. And also just access to things like a real threading model and a better garbage collector and this sort of stuff just really helps in deployment. So I think there’s a spot for all three and I’m happy jumping between them all.

Eric: Okay. You mentioned a minute ago about how important you think startup time is and you wrote a blog post that got fairly popular a while back about speeding up Rails startup time with MRI, right? And what made you really get into that? I mean was there like an itch, was there a certain project that it was really just kinda driving you crazy or is it –

Xavier: I was working on a project and it had about a 17 or 18 second startup time on 1.9.2 and I was kinda like man, this kinda sucks a bit but – and I hadn’t really done anything about it. We did this cool thing – this is my last company back in Melbourne – we did a dev swap where with one of the other companies we just switched two devs for a week and that was really cool. But we had this new guy come in and he was like wow, your app takes like twice as long as ours to start and their app was much larger than ours. And they’re on 1.8.7 and we’re on 1.9.2 and I’m like oh, that doesn’t sound great. And so I was like well, we must be doing something wrong. So we did some benchmarking, we did some profiling and we found that yeah, we were doing some things wrong but that only like cut about three seconds off our startup time, we were still sitting up around 16 seconds.

So I started digging in and started asking around on IRC and everybody was like it’s probably RubyGems, RubyGems is crap, we’re fixing it, don’t worry about it. I didn’t really know if that was true so I did some digging, did some benchmarking and it was like no, I don’t – don’t think it’s RubyGems, I’ve got that disabled. I got down to this point where it dawned on me, I was like wow, no, I think there’s actually a problem with Ruby, with the C code. And so that was kinda scary because I haven’t done C code since university and I like my Ruby but I thought well, it’s just code so I went down, I dug into it and spent quite a while trying to figure out what was going on and rewriting bits and pieces and yeah, sort of popped my head up a bit later and had a patch so.

Eric: Awesome.

Xavier: Yeah, it was kinda just a –

Eric: Right, first you were looking about how to fix it in your app and it ended up going down to being a patch in the interpreter.

Xavier: Yeah.

Eric: Cool.

Xavier: And I really liked the fact that I actually can go right down to the interpreter and it’s just – it’s just the next logical step, right?

Eric: Right, right. And one of the major benefits working with open source technologies.

Xavier: That’s it. Like it’s C code but it’s still – it’s just programming, right? Like it’s all the same programming stuff. When it comes down to it it was just an algorithm that needed tweaking so.

Eric: Right. Cool, cool. So switching back towards the database gears for a minute because you were talking – we were talking about DataMapper a minute ago, you’ve got a really keen interest in databases and relational databases particularly. You gave a really popular workshop a while back, a Ruby on Rails workshop called Your Database Is Your Friend. For – what’s something in that, real quick, that you’d say is some really important piece of information that developers should know or consider to help make their lives better, because I mean this is Ruby on Rails, a lot of times the database really is one of the last things developers will really look at when they’re working on a project. So what do you think that they should really be getting in front of?

Xavier: So I think the most important thing for Ruby developers or for Rails developers specifically is Rails is really, really good at prototyping stuff really fast. A lot of the Rails best practices don’t scale up to an app that’s actually in production. So for instance I haven’t seen a large Ruby on Rails application that doesn’t have data integrity problems. So –

Eric: As a long time DBA with Engine Yard I can back you up on that.

Xavier: Yeah. So I think one thing you need to acknowledge is that you will have concurrency problems and the only way to protect against these data integrity issues proactively is to actually start using database constraints. So for instance I think it’s widely accepted in the Rails world that if you have a validates_uniqueness_of you need to have a uniqueness constraint backing that. This is in the documentation, everybody’s sort of accepted this. What’s also true is for exactly the same reason that you need that, you also need foreign keys because they protect against the same – it’s sort of slightly more –

Eric: And you specifically needed to have a uniqueness constraint in the database –

Xavier: To back the validates_uniqueness_of, yes.

Eric: Right.
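
The race behind this advice can be sketched outside Rails. The following is a minimal illustration, not Rails code: it uses Python’s built-in sqlite3 module so that it runs on its own, and the users table, the index name, and the helper functions are all hypothetical. Two sign-up requests both pass an application-level uniqueness check before either one inserts; only the database-level unique index stops the duplicate.

```python
import sqlite3


def make_users(with_unique_index):
    """Create an in-memory users table, optionally backed by a unique index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    if with_unique_index:
        # Hypothetical index name; this is the database-level constraint.
        conn.execute("CREATE UNIQUE INDEX index_users_on_email ON users (email)")
    return conn


def email_is_free(conn, email):
    """Application-level check, analogous in spirit to a uniqueness validation."""
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row[0] == 0


def racing_signups(conn, email):
    """Simulate two requests that both validate before either one inserts."""
    request_a_valid = email_is_free(conn, email)
    request_b_valid = email_is_free(conn, email)
    rejected = 0
    for valid in (request_a_valid, request_b_valid):
        if valid:
            try:
                conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            except sqlite3.IntegrityError:
                # Only raised when the unique index exists.
                rejected += 1
    stored = conn.execute(
        "SELECT COUNT(*) FROM users WHERE email = ?", (email,)
    ).fetchone()[0]
    return stored, rejected


if __name__ == "__main__":
    print(racing_signups(make_users(with_unique_index=False), "a@example.com"))
    print(racing_signups(make_users(with_unique_index=True), "a@example.com"))
```

The interleaving is simulated sequentially on one connection for simplicity; in a real deployment the two checks would come from separate processes, which is exactly why the application-level check alone cannot close the gap.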

Xavier: Whereas – and if you have a has-many relationship, to protect from the same sorts of problems, like one process deletes a parent whilst another process adds a child, you need to have a foreign key in your database. And new people to Rails actually kind of assume that Rails does this for you. A lot of the language, like in a migration you say t.references, whatever – this is something I picked up –

Eric: Especially coming from other ORM’s that do do that.

Xavier: Yes. So this is one thing I actually picked up teaching the course, is that most people expect that Rails just does it for you, which I find is kinda – kinda interesting. So that’ll be my – yeah, that will sorta be my one major thing, if starting up a new project, use foreign keys. They really help out. And they actually don’t really slow you down. It used to be a pain with fixtures and stuff but nowadays since nobody really uses those too much –
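
The parent-and-child scenario described here can be sketched the same way. This is a minimal illustration under stated assumptions, not Rails code: it uses Python’s built-in sqlite3 module, the posts and comments tables and both helper functions are hypothetical, and SQLite only enforces foreign keys when the foreign_keys pragma is switched on. One process deletes the parent row, then another inserts a child pointing at it.

```python
import sqlite3


def make_db(enforce_foreign_keys):
    """Create an in-memory parent/child schema with a foreign key declared."""
    conn = sqlite3.connect(":memory:")
    # SQLite ignores declared foreign keys unless this pragma is ON.
    conn.execute(
        "PRAGMA foreign_keys = " + ("ON" if enforce_foreign_keys else "OFF")
    )
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE comments (
               id INTEGER PRIMARY KEY,
               post_id INTEGER NOT NULL REFERENCES posts(id),
               body TEXT NOT NULL
           )"""
    )
    conn.execute("INSERT INTO posts (id, title) VALUES (1, 'hello')")
    return conn


def delete_then_comment(conn):
    """One process deletes the parent; another then adds a child to it."""
    conn.execute("DELETE FROM posts WHERE id = 1")
    try:
        conn.execute("INSERT INTO comments (post_id, body) VALUES (1, 'first!')")
    except sqlite3.IntegrityError:
        return "rejected"  # the foreign key refused the orphaned child
    return "orphaned"  # the child row now points at a missing parent


if __name__ == "__main__":
    print(delete_then_comment(make_db(enforce_foreign_keys=True)))
    print(delete_then_comment(make_db(enforce_foreign_keys=False)))
```

Without enforcement the orphaned comment is stored silently, which is the kind of data integrity problem that only shows up later in production.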

Eric: – also adding an index is –

Xavier: Yes. Yeah, yeah, yeah, index is another big one. For the course I actually didn’t really even focus on performance very much. Most people sort of figured out that oh, my app doesn’t – is slower, I need to add indexes.

Eric: Right.

Xavier: I was more focused for my course on sort of the data integrity and architecture side of things. So I guess that’s kind of my number one thing, use foreign keys. My other one is – I don’t know how to phrase it but you sort of get to a point with ActiveRecord where it can do some really stupid things and it gets really hard to get it not to do those stupid things. Like it’s loading in eight different tables and you try and eager load them but you can’t because you’ve used a polymorphic association. Don’t use polymorphic associations, that’s my other thing.

Eric: Okay.

Xavier: Basically because you can’t do data integrity on them properly. But yeah, so you get to this point where actually if you’re pulling all these things you can’t eager load so then you need to try and hook into the ActiveRecord eager loading code but then it’s still selecting a whole bunch of stuff out of the database and you sort of get to this point where you’ve got all this complexity and you kinda can’t really make it simple enough to get performance gains or whatever. Most people at this point just go to caching, which you know from a cost benefit tradeoff is actually kind of fine but I think it’s just something to be aware of. I don’t have a great solution, I mean it’s not like – DataMapper has this problem as well, right?

Eric: Right, right. I think instead of the – regardless of what ORM people use they need to not be afraid to step into using SQL instead of forcing the ORM to try and do something that it’s really not gonna wanna do.

Xavier: Yeah, that I like because you get to the point where actually it’s not even about it generating bad queries. The queries it’s generating execute really, really fast but it’s loading up hundreds of objects which take time and then also need to be garbage collected, so I think people in Ruby-land are generally just too afraid of SQL. I’m not gonna say it’s perfect but it’s – it actually has some good qualities and we should recognize those.
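
The cost of loading objects, as opposed to the cost of the query itself, can be illustrated with a small sketch. It is written in Python with the built-in sqlite3 module rather than ActiveRecord, and the payments table, the Payment class, and the helper functions are hypothetical. Both approaches compute the same total, but the first builds one object per row while the second lets the database do the aggregation and returns a single value.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Payment:
    """Hypothetical model object, standing in for an ORM record."""
    id: int
    merchant: str
    cents: int


def make_payments(n=500):
    """Create n payment rows, alternating between merchants 'a' and 'b'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, merchant TEXT, cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO payments (merchant, cents) VALUES (?, ?)",
        [("a" if i % 2 == 0 else "b", i) for i in range(n)],
    )
    return conn


def total_via_objects(conn, merchant):
    """ORM-style: instantiate every matching row, then sum in application code."""
    rows = conn.execute(
        "SELECT id, merchant, cents FROM payments WHERE merchant = ?", (merchant,)
    )
    payments = [Payment(*row) for row in rows]
    return sum(p.cents for p in payments), len(payments)


def total_via_sql(conn, merchant):
    """Plain SQL: the database aggregates and no per-row objects are built."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(cents), 0) FROM payments WHERE merchant = ?",
        (merchant,),
    ).fetchone()
    return total


if __name__ == "__main__":
    conn = make_payments()
    print(total_via_objects(conn, "a"))
    print(total_via_sql(conn, "a"))
```

With the default 500 rows, the object-based version allocates 250 Payment instances for merchant 'a' just to produce one number; that allocation and later garbage collection is the overhead being described, independent of how fast the query runs.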

Eric: Cool, cool. So getting away from SQL, what are your thoughts on the – the plethora of NoSQL databases that have sprung up in the last few years; a lot of them are really popular. Do you have preferences, pro and con on any of them or just the whole idea in general?

Xavier: Yeah, so I – I mean I think there’s definitely a lot of use cases for which they’re really great. My biggest concern is people who have sort of been using Ruby on Rails with the database being a hash in the sky now thinking oh, I’ve actually got a big hash in the sky without realizing that – like that actually – this idea of treating a database as a hash has some pretty serious concurrency and transaction related issues. So I think if you just take a standard app and say oh, well I can use Mongo because I don’t have to worry about schema, you’re gonna hit all the same problems. I must say I haven’t used – I’ve started developing on Mongo and I’ve never got to the point of actually releasing the product because we had too many sort of concurrency related problems that we just – or data integrity related problems, so I guess I have to put that caveat in there because I haven’t – I couldn’t call myself an expert on it. And so I think there’s definitely some use cases where these technologies are really good. I think people really need to focus on the tradeoffs. SQL databases like Postgres are – they’re solid pieces of technology, they’re boring but they do what they do really, really well. That’s why –

Eric: Right. Even if you were using them as a hash in the sky there’s a lot of integrity and safety guarantees that you get just for using those that don’t necessarily come along with the NoSQL databases and I think that’s where – I agree with you in that people need to really learn and understand those – those tradeoffs that are kinda behind the scenes, that aren’t really developer facing.

Xavier: Yeah. So I mean I think the burden of proof is on the NoSQL solutions because I think the SQL ones are – they’re the boring pieces of tech, right? And I’m actually like, I’m in technology but I’m actually like, I’m not a gadget person, I’m not on the bleeding edge, I haven’t upgraded to Lion. I’m a big fan of boring stuff that works so I guess I’m a little curmudgeonly like that.

Eric: That will be the next big framework, boring stuff that works.

Xavier: I should register that to my name or something.

Eric: Right, right. All right, so just to kinda wrap up with something kinda just general and just – you went to RubyConf in New Orleans this year. What were some of your favorite talks? Were they on the street outside of the conference or were there – good stuff that you got to hear?

Xavier: I was actually really impressed by the quality of talks at RubyConf. Some of my favorites, the – Greg Moeck did one on why you don’t get mocks and – so a testing related talk which I thought was really, really important. I think it’s online, if you haven’t watched the talk I think you probably should. It – yeah, I think it’s really – it showed me something that I’ve sort of been going through recently, is there’s been a bit of a backlash about mocking in the Ruby community and I think because people sort of run into this big problem where if you’re using ActiveRecord and try to mock things out there’s this fundamental incompatibility because ActiveRecord needs to be able to talk to the database, so you try mocking out the database and people just sort of run into these big balls of mud, and then you realize that actually sort of the Java guys who came up with this mocking idea like ages ago were explicitly saying hey guys, don’t do that. And so we ran into this problem where the way most Rails apps were designed, they don’t really allow you to mock very well at all. But so what that says to me is not that oh well, we shouldn’t use mocking, it says that well we should actually listen to the mocks and realize that the way we design a lot of Rails apps actually doesn’t really scale up very well. We should probably be looking at our early design a bit more closely. That’s coming from someone who has written a lot of code that would fall in this category. So I don’t know, I think that was really important. There’s been a bit of – that’s sort of – there’s been a lot of bad press and stuff sort of in the past six months or whatever about this in the Ruby community and I think it’s really important. So that was kind of one of my favorites.

Brian Ford gave a talk about tooling for Ruby and basically how it’s not very good and we need to make it better. I just found it inspirational I guess in that he was like guys, everybody’s moving to Java because it’s got better tooling; Ruby tooling sucks, what are we doing about it? It was kind of a real call to action which I really liked. It probably sounds a bit kiss ass to say on an Engine Yard show but Dr. Nic’s talk was kind of fun about sort of this eventing versus threading and how it’s basically like okay guys, here’s how you should deploy applications that make sense and he did it in a very entertaining way so those were probably my – a couple of my favorites just on the top of my head. But yeah, just talking to people just randomly in the street was also really good. It was one of those conferences where normally I find either the talks are good or the people are good but when both of them were good it was really valuable so.

Eric: Right, right. Hallway track’s always got something good to offer.

Xavier: Yeah, to –

Eric: Cool. Well, thanks a lot Xavier, appreciate your time.

Xavier: Oh, my pleasure. Yeah, thanks Eric, that was – that was good.

Eric: Hopefully we’ll meet in person some time.

Xavier: Yeah, I’ll see you around.