82. A.I. and Materials Discovery - an Interview with Taylor Sparks

Breaking Math Podcast
1.7k Plays, 1 year ago

Transcripts of this episode are available upon request. Email us at BreakingMathPodcast@gmail.com.

In this episode, Gabriel Hesch interviews Taylor Sparks, a professor of materials science and engineering, about his recent paper on the use of generative modeling AI for materials discovery. The paper is published in the journal Digital Discovery and is titled 'Generative Adversarial Networks and Diffusion Models in Material Discovery'. They discuss the purpose of the call, the process of generative modeling, creating a representation for materials, using image-based generative models, and a comparison with Google's approach. They also touch on the conditional generation of materials, the importance of open-source resources and collaboration, and exciting developments in materials and AI. The conversation concludes with a discussion of future collaboration opportunities.

Takeaways

  • Generative modeling is an exciting approach in materials science that allows for the prediction and creation of new materials.
  • Creating a representation for materials, such as using the crystallographic information file, enables the application of image-based generative models.
  • Google's approach to generative modeling received attention but also criticism for its lack of novelty and unconditioned generation of materials.
  • Open-source resources and collaboration are crucial in advancing materials informatics and machine learning in the field of materials science.

Help Support The Podcast by clicking on the links below:

  • Start YOUR podcast on Zencastr! Use my special link Zencastr Discount to save 30% on your first month of any Zencastr paid plan
  • Visit our Patreon

    How is Machine Learning being used to further original scientific discoveries?  



Transcript

Introduction by Gabriel Hesch

00:00:01
Speaker
All right, Taylor, thank you so much for joining me. I appreciate this. Just so that all of our guests know, my name is Gabriel Hesch. I'm affiliated with the Breaking Math podcast. Thank you to all our listeners for being patient with me and my voice. Quick story: I was breaking up a fight between my kids, I raised my voice, and it damaged my vocal cords. But my son told me that if I didn't yell, then maybe I'd earn my voice back.
00:00:28
Speaker
Thank you for your patience.

Taylor Sparks Introduction

00:00:29
Speaker
Oh, and Taylor, would you please introduce yourself? Tell us who you are, what you do, and about your podcast. Absolutely. My name is Taylor Sparks. I'm a professor, right? So I'm an academic. I'm in a materials science and engineering department. And I've got a podcast called Materialism, where we try and talk about all things materials science.
00:00:47
Speaker
Yes. Awesome. Awesome.

Purpose of the Call: Machine Learning & Materials

00:00:49
Speaker
Now, the purpose of this call is based on a paper that you published recently. I'm going to stop talking as soon as I can, because I don't like hearing my own voice when it's all scratchy. The Breaking Math podcast is delving super deep into machine learning. And we're going to have a bunch of conversations with materials scientists, with neuroscientists, with biologists, as well as humanities and public researchers, including linguists.
00:01:17
Speaker
And we're all going to talk about how artificial intelligence and machine learning have been used in each field, as well as their limitations. So I was very, very excited to see your paper. Without further ado, if you could share a bit of your abstract, you can either read it or just give us the TL;DR.
00:01:37
Speaker
Well, I don't have it in front of me, so I'll just tell you what we did. I will say that we've been interested in generative modeling for materials for a while. And one of the reasons why is that we've been doing materials informatics, machine learning plus materials research, for 10 years, a little more. And what I had seen a lot of is that people would build models that would allow you to predict a material property or something. And then they would take all the known materials and pour them in the top of this model. And then it would make predictions for those materials. It would apply these labels to them.
00:02:05
Speaker
And then you can screen them, right? So maybe you're trying to find a high-bandgap material, or a really low bandgap, or high strength, or whatever else you are interested in. And that's great, but that's not materials discovery. It's materials identification, right?
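To make the screening workflow concrete (train a property model, pour known materials in the top, keep the ones whose predictions land in a target range), here is a minimal Python sketch. It assumes a hypothetical `toy_predict` function standing in for a trained model; the material names and band-gap values are placeholders, not data from the paper.

```python
from typing import Callable, Dict, List


def screen(candidates: List[str],
           predict: Callable[[str], float],
           lo: float, hi: float) -> List[str]:
    """Keep candidates whose predicted property falls in [lo, hi].

    Every candidate is already known, so this can only identify materials;
    it never proposes anything outside the input list.
    """
    return [name for name in candidates if lo <= predict(name) <= hi]


# Hypothetical stand-in for a trained band-gap model (illustrative values, eV).
TOY_PREDICTIONS: Dict[str, float] = {
    "material_A": 0.4,
    "material_B": 2.1,
    "material_C": 3.5,
}


def toy_predict(name: str) -> float:
    return TOY_PREDICTIONS[name]


hits = screen(list(TOY_PREDICTIONS), toy_predict, lo=1.5, hi=4.0)
print(hits)  # ['material_B', 'material_C']
```

Whatever the model, the output is always a subset of the input list, which is exactly the limitation Taylor points to.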

Generative Modeling in Materials Science

00:02:17
Speaker
Because you built a model and then you poured into the top of it stuff that we already knew. And so if you're going to do something other than just screening through materials and actually finding materials, that's where we started to think a lot about generative models.
00:02:29
Speaker
And so starting back in COVID is actually when we got into this. In 2020, when our labs were shut and we were kind of working remotely anyways, I had a grad student who thought, boy, wouldn't it be great if we could just predict new materials outright, like new crystal structures. And I was like, God, that'd be hard, though, because...
00:02:45
Speaker
Our friends in chemistry have done this, but they do it for individual molecules. It gets hard when you take that molecule and try to figure out how those individual molecules stack together if they crystallize, because there was no good representation that allowed you to do that. If you tried to use the tools that they had, it would basically put atom chunks on top of one another. It just didn't work. It didn't follow even the basic rules of translational symmetry, let alone more complicated ones like rotational symmetry or the other symmetries these structures can have.
00:03:14
Speaker
So he had the idea: what if we just took the information from the CIF card? The CIF card is the Crystallographic Information File. It basically has all the symmetry information that you need to make a periodic, crystalline, repeating unit cell. You can make as many unit cells as you want; you could fill up space with this thing.
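The "fill up space" idea is translational symmetry: every atom in the unit cell repeats at integer offsets along the lattice vectors. The sketch below is a simplified illustration under stated assumptions: it starts from a full list of sites in fractional coordinates (a real CIF usually lists an asymmetric unit plus symmetry operations, which this skips), and the two-site cubic cell with species labels `A` and `B` is hypothetical.

```python
from itertools import product
from typing import List, Tuple

Vec = Tuple[float, float, float]
Site = Tuple[str, Vec]  # (species label, fractional coordinates)


def frac_to_cart(frac: Vec, lattice: List[Vec]) -> Vec:
    """Cartesian position r = f1*a + f2*b + f3*c for lattice vectors a, b, c."""
    x, y, z = (sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3))
    return (x, y, z)


def supercell(sites: List[Site], lattice: List[Vec],
              reps: Tuple[int, int, int]) -> List[Site]:
    """Tile the unit cell reps[0] x reps[1] x reps[2] times by pure translation."""
    tiled: List[Site] = []
    for n1, n2, n3 in product(range(reps[0]), range(reps[1]), range(reps[2])):
        for species, (f1, f2, f3) in sites:
            tiled.append((species, frac_to_cart((f1 + n1, f2 + n2, f3 + n3), lattice)))
    return tiled


# Hypothetical two-site cubic cell with a = 4.0 angstroms.
cubic = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
cell_sites = [("A", (0.0, 0.0, 0.0)), ("B", (0.5, 0.5, 0.5))]
block = supercell(cell_sites, cubic, (2, 2, 2))
print(len(block))  # 16
```

Because only translations are applied, the result is the defect-free periodic crystal Taylor describes; vacancies or dislocations would have to be introduced separately.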
00:03:33
Speaker
It doesn't account for defects, like a missing atom. It doesn't account for things like edge dislocations, the stuff that you learn about in materials science. But it'll get you a defect-free material, which is a heck of a lot better than what we could do in the past. So what we came up with was a representation based on the information in the CIF card. Basically, you take the info out of the CIF card and put it into a matrix, a tensor, so that we can do machine learning with it.
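As a rough illustration of turning CIF-style information into a fixed-size matrix, the sketch below packs lattice parameters and (atomic number, fractional coordinate) rows into an 8-bit grid that could be saved as a grayscale image. The row layout, the padding size `MAX_SITES`, and the value ranges are assumptions chosen for the example; this is not the actual layout used in the Sparks group's representation.

```python
from typing import List, Tuple

MAX_SITES = 4                # hypothetical fixed number of site rows
Z_MAX = 118                  # largest known atomic number
LEN_RANGE = (0.0, 20.0)      # assumed range for a, b, c (angstroms)
ANG_RANGE = (0.0, 180.0)     # range for alpha, beta, gamma (degrees)

Lattice = Tuple[float, float, float, float, float, float]
Site = Tuple[int, Tuple[float, float, float]]  # (atomic number Z, fractional xyz)


def to_pixel(v: float, lo: float, hi: float) -> int:
    """Map v in [lo, hi] to an 8-bit value in [0, 255], like a grayscale pixel."""
    return round(255 * (v - lo) / (hi - lo))


def from_pixel(p: int, lo: float, hi: float) -> float:
    return lo + p * (hi - lo) / 255


def encode(lattice: Lattice, sites: List[Site]) -> List[List[int]]:
    """Row 0 holds lattice parameters; each later row holds one site, zero-padded."""
    if len(sites) > MAX_SITES:
        raise ValueError("too many sites for this fixed-size layout")
    header = [to_pixel(v, *LEN_RANGE) for v in lattice[:3]]
    header += [to_pixel(v, *ANG_RANGE) for v in lattice[3:]]
    rows = [header]
    for z, frac in sites:
        rows.append([to_pixel(z, 0, Z_MAX)] + [to_pixel(f, 0.0, 1.0) for f in frac] + [0, 0])
    rows += [[0] * 6 for _ in range(MAX_SITES - len(sites))]  # Z pixel 0 marks padding
    return rows


def decode(matrix: List[List[int]]) -> Tuple[Lattice, List[Site]]:
    """Invert encode(), up to 8-bit quantization error."""
    hdr = matrix[0]
    lengths = [from_pixel(p, *LEN_RANGE) for p in hdr[:3]]
    angles = [from_pixel(p, *ANG_RANGE) for p in hdr[3:6]]
    sites = []
    for row in matrix[1:]:
        if row[0] == 0:
            continue  # padding row
        z = round(from_pixel(row[0], 0, Z_MAX))
        frac = tuple(from_pixel(p, 0.0, 1.0) for p in row[1:4])
        sites.append((z, frac))
    return tuple(lengths + angles), sites


image = encode((4.0, 4.0, 4.0, 90.0, 90.0, 90.0),
               [(55, (0.0, 0.0, 0.0)), (17, (0.5, 0.5, 0.5))])
print(len(image), len(image[0]))  # 5 6
```

Quantizing to 256 levels loses precision (about 0.08 Å per step for lengths over a 0 to 20 Å range), which is one trade-off of an image-style encoding; atomic numbers survive exactly because 255 steps are enough to separate 118 elements.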
00:03:57
Speaker
And our first paper in this area was called xtal2png ("crystal to PNG"), because we realized that we could encode that information in a format that looked kind of like a PNG image file. In fact, for any material, we can create a kind of cool image. It kind of looks like a QR code or something, but it's a representation of a material. And once we had that, what's awesome is that now that you've represented your material as an image,
00:04:22
Speaker
you can turn loose all these cool image-based generative models, so Imagen and DALL·E 2 and Midjourney, stuff like that. With those models, and very little tweaking, we can now make new materials, right? It's going to come out looking like this weird barcode, and then we have to convert that back into an actual structure. And so the paper that we just published last week is us demonstrating that. We showed it with GANs, which are a type of machine learning architecture.
00:04:48
Speaker
Then Wasserstein GANs, which are a different one. And then the big ones, things like Midjourney and Imagen, use what are called diffusion models. And the best-performing ones in our case were also the diffusion models. They turned out to be pretty great. And the types of materials that we were able to predict,
00:05:03
Speaker
They look really good. They look like reasonable structures. The bond distances aren't crazy. Some of the chemistry is crazy because we haven't encoded any chemical information yet. But as a first pass, it's a really great first start to show that you can generate new materials.
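For readers curious what a diffusion model does with such an image: during training, data are progressively corrupted with Gaussian noise, and a network learns to undo that corruption step by step; generation runs the learned reversal starting from pure noise. The sketch below shows only the forward (noising) half in closed form, assuming a linear variance schedule from 1e-4 to 0.02 over 1,000 steps as in the original DDPM paper; the pixel values are hypothetical.

```python
import math
import random
from typing import List


def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly increasing noise variances beta_1..beta_T."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod over s <= t of (1 - beta_s): the share of signal left at step t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def q_sample(x0: List[float], t: int, alpha_bar: List[float], rng: random.Random) -> List[float]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I) in one shot."""
    signal, noise = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


alpha_bar = cumulative_alphas(linear_betas())
x0 = [2 * p / 255 - 1 for p in (119, 37, 128, 0)]   # hypothetical pixels rescaled to [-1, 1]
rng = random.Random(0)
early = q_sample(x0, 0, alpha_bar, rng)    # nearly the original image
late = q_sample(x0, 999, alpha_bar, rng)   # nearly pure noise
```

By the last step almost no signal remains (`alpha_bar[-1]` is tiny), which is why sampling can begin from random noise and let the trained network denoise its way back to a plausible "barcode".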
00:05:19
Speaker
Awesome. Awesome. Very cool. Very cool.

Critique of Google's DeepMind Project

00:05:21
Speaker
Now, if you wouldn't mind, for our general audience: I know there's been a lot of media attention on what Google did, and it then received a lot of criticism within the materials science and, I think, chemistry communities. Can you tell us a little bit about what Google did with materials, why it didn't work out, and how yours is different?
00:05:44
Speaker
Sure. So Google has a company, DeepMind, and they do really great work. For example, AlphaFold was a paper that came out, I don't know, three or four years ago. They used machine learning to predict how protein structures fold. This was considered a grand challenge problem. For 50-some-odd years, we've been trying to solve it, and it was a really hard problem. Using machine learning, they were able to address it. So they've got some cred in this sort of materials space.
00:06:14
Speaker
But this was, to my knowledge, and I'm pretty sure of this, the only instance of them actually moving into the hard materials space, where they were looking at new inorganic compounds. What they did is pretty impressive at first glance, but I think it did get overhyped. It came out in the middle of the MRS conference, the fall meeting, which is the big one in my field.
00:06:34
Speaker
They timed its release to be right in the middle of that conference. They released it alongside another paper that was trying to synthesize the new materials they were predicting. Dozens of news articles were written about it by outlets that had been given embargoed press-release versions. It was not your typical paper in that respect. It felt a little less like science and more like a press release, because it was a press release.
00:07:00
Speaker
So I'm not trying to disrespect them in that way, but it was very different from your typical paper. And then the paper itself: I got an early-access view myself. I got interviewed by one of these publishers. And it's fine. I think that it's great. It looks like they used a lot of tools that were already out there in the wild, if you will. There are a lot of tools developed by academics like us, for example xtal2png, and things like that are being generated all the time.
00:07:29
Speaker
And then it looks like DeepMind gobbled some of those up. That's not the right word. You know, they brought those different things together and they've deployed them at scale. Now I've since seen them push back on that notion and they claim that there were some novel aspects of the methodology itself.
00:07:45
Speaker
Whatever. I'll take their word for it. But they deployed these things at scale. Whereas ours, you know, our representation, we've been calling it Crystal Tensor or xtal2png. Theirs takes a different approach. It uses graphs. And others have gone the graph route; graph neural networks are a powerful way to represent materials.
00:08:04
Speaker
Graphs are made up of nodes and edges, so a node can be an atom and an edge can be a bond. It's a natural framework for thinking about materials. Anyway, using this approach and massive compute resources at Google, they claim to have generated millions of new compounds.
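A minimal sketch of the graph view Taylor describes: atoms become nodes, and an edge connects two atoms closer than a cutoff distance, measured across periodic cell boundaries. It assumes an orthogonal cell and the minimum-image convention, and treats "bond" as simply "within the cutoff"; production crystal-graph models typically add edge features and can include several periodic images of a neighbour, so treat this only as an illustration of the data structure.

```python
import math
from itertools import combinations
from typing import List, Tuple

Site = Tuple[str, Tuple[float, float, float]]  # (species, fractional xyz)
Edge = Tuple[int, int, float]                  # (atom i, atom j, distance in angstroms)


def build_graph(sites: List[Site], cell: Tuple[float, float, float],
                cutoff: float) -> Tuple[List[str], List[Edge]]:
    """Nodes are atoms; an edge joins atoms closer than `cutoff`.

    Assumes an orthogonal cell (all angles 90 degrees) and uses the minimum-image
    convention, so only the nearest periodic copy of each neighbour is counted.
    That is exact while the cutoff is under half the shortest cell length.
    """
    nodes = [species for species, _ in sites]
    edges: List[Edge] = []
    for i, j in combinations(range(len(sites)), 2):
        d2 = 0.0
        for k in range(3):
            df = sites[j][1][k] - sites[i][1][k]
            df -= round(df)                    # wrap to the nearest periodic image
            d2 += (df * cell[k]) ** 2
        d = math.sqrt(d2)
        if d <= cutoff:
            edges.append((i, j, d))
    return nodes, edges


# Hypothetical atoms near opposite faces of a 4-angstrom cubic cell.
nodes, edges = build_graph([("A", (0.05, 0.0, 0.0)), ("B", (0.95, 0.0, 0.0))],
                           (4.0, 4.0, 4.0), cutoff=1.0)
print(nodes, edges)  # one edge: the atoms are about 0.4 angstroms apart across the boundary
```

Without the wrap-around step the two atoms would appear 3.6 Å apart and no edge would be created, which is the kind of periodicity error Taylor says naive molecular tools ran into.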
00:08:24
Speaker
And then they've gone further and said that of these millions of compounds, some large fraction have been found to be actually stable. Because you could dream up any collection of atoms in some 3D arrangement that you want; that doesn't mean it will actually exist in nature, right? That has to do with thermodynamics. So they've done a calculation of whether or not some of these will be stable. So that's, I don't know, it certainly sucked all the air out of the room. A lot of people were talking about it nonstop. But for those of us that have worked in this area, one of my immediate
00:08:51
Speaker
critiques was the one that I mentioned: yeah, this looks like existing tools that they've just deployed at scale. Nothing wrong with that, but it doesn't look like there's a lot new there. A second criticism might be that we already have loads of materials that we've been predicting. I don't know, hundreds of thousands of stable materials? We have lots and lots, way more than we can synthetically access, right? It's already been the case that we have way more materials that
00:09:16
Speaker
we would love to explore experimentally. And so adding another 100,000, what does that do for us? And maybe a third criticism of that work, and this is just my thoughts on it, maybe a weakness, is that these were unconditioned generations of new materials.
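On the stability check mentioned above: a common yardstick is the energy above the convex hull, i.e. how far a compound's formation energy sits above the lowest-energy combination of competing phases at the same composition. The sketch below handles only a two-element A-B system, assumes formation energies per atom are already known (in practice they come from calculations such as DFT), and uses hypothetical numbers; it is not a description of DeepMind's pipeline.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (fraction of element B, formation energy in eV/atom)


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull by Andrew's monotone chain."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull: List[Point], x: float) -> float:
    """Linearly interpolate the hull at composition x."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return min(e0, e1)
            return e0 + (e1 - e0) * (x - x0) / (x1 - x0)
    raise ValueError("composition outside [0, 1]")


def energy_above_hull(known: List[Point], candidate: Point) -> float:
    """Height of a candidate above the hull of known phases (pure elements pinned at 0).

    Zero or negative means no mixture of known phases is lower in energy, so the
    candidate is predicted stable; a positive value measures how unstable it is
    against decomposition into those phases.
    """
    hull = lower_hull(known + [(0.0, 0.0), (1.0, 0.0)])
    return candidate[1] - hull_energy(hull, candidate[0])


known_phases = [(0.5, -0.4)]  # hypothetical AB compound
print(energy_above_hull(known_phases, (0.25, -0.1)))  # about 0.1 eV/atom above the hull
```

Screening millions of generated structures this way is cheap once energies exist; the expensive part is computing those energies, which is where large compute budgets matter.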
00:09:33
Speaker
For example, you could imagine a scenario where you condition the generation to say: not only give me new materials, but give me ones that have a specific band gap or high strength or whatever else. Conditional models are very interesting. If you think of the image space, instead of just "give me pictures of a human face," it's "give me a picture of a guy with a mustache and freckles." Give me something specific. That would be conditioning. They didn't demonstrate that. Our paper came out the day after theirs, which is ironic.
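One widely used way to condition a diffusion model on a target property is classifier-free guidance (Ho and Salimans): the same network is trained both with and without the condition, and at sampling time the two noise predictions are blended. The sketch shows only that blending step and the training-time condition dropout; `toy_model` is a hypothetical stand-in for a trained network, and nothing here is claimed to be how MatterGen or the Sparks group's models implement conditioning.

```python
import random
from typing import Callable, List, Optional

# A denoiser maps (noisy sample, timestep, optional condition) to a noise estimate.
Denoiser = Callable[[List[float], int, Optional[float]], List[float]]


def guided_estimate(model: Denoiser, x_t: List[float], t: int,
                    target: float, w: float) -> List[float]:
    """Classifier-free guidance: blend unconditional and conditional predictions.

    w = 0 ignores the condition, w = 1 uses the plain conditional prediction,
    and w > 1 pushes samples harder toward the requested property.
    """
    eps_uncond = model(x_t, t, None)     # condition withheld
    eps_cond = model(x_t, t, target)     # condition supplied, e.g. a target band gap
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def drop_condition(target: float, p_drop: float, rng: random.Random) -> Optional[float]:
    """Training-time trick: hide the condition some of the time so one network learns both modes."""
    return None if rng.random() < p_drop else target


def toy_model(x_t: List[float], t: int, cond: Optional[float]) -> List[float]:
    """Hypothetical stand-in for a trained network."""
    return [0.0 if cond is None else cond for _ in x_t]


print(guided_estimate(toy_model, [0.3, -0.2], t=10, target=1.5, w=2.0))  # [3.0, 3.0]
```

In the image analogy, the condition would be "mustache and freckles"; for materials it would be a property value such as a band gap or modulus.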
00:10:00
Speaker
And then like, I want to say like two days after that, Microsoft released their MatterGen paper, which is another one of these things. Theirs is conditional though. It's property conditioned. So now you can say like, give me materials that have a specific bandgap or specific modulus, something like that, which is I think an improvement over DeepMind. I haven't dug into that paper yet. I don't know how many stable compounds. I don't know if they've reported that or anything and I haven't had a chance to look through it yet, but
00:10:24
Speaker
This is a cool field. It's moving fast. It's the same sort of tools that have made deepfakes and all these other things in the image and video space possible. If you saw that and thought, man, what a dystopian world we live in, I think this offers a counterpoint. It offers another point of view where you could say, boy, those same sorts of tools, when applied to materials research, maybe they're going to help us find the next super cheap, low-cost,
00:10:51
Speaker
high, crazy-high-efficiency photovoltaic, or catalyst, or battery, or you name it, right? The vast majority of other engineering disciplines... I'm a materials scientist at the University of Utah, right, but when I look across campus at other buildings, you have mechanical engineers or chemical engineers. They're all waiting on new materials. Every single one of them could tell you, boy, I can make this amazing gadget if only
00:11:13
Speaker
I had a material that did X, Y, and Z, right? And X, Y, and Z properties might be things that are just inaccessible right now, but with new materials, they might become accessible. So I'm actually quite positive and optimistic about this. Google and Microsoft, I don't think they're releasing any of their data or code. They haven't yet that I know of.

Educational Resources in Materials Informatics

00:11:32
Speaker
Maybe we can convince them to change their mind, if not.
00:11:37
Speaker
Maybe you have to use it without knowing exactly how the model works or how they trained it. You can't really validate it if you don't have the code. But academics are continuing to release their code, and at the academic level, you're still seeing really rapid progress, even without the massive compute resources that maybe Google has. That's actually really exciting. Regarding machine learning techniques, I have to ask you: are you or any of your colleagues familiar with the professor who goes by eigensteve on all social media?
00:12:06
Speaker
I don't know that name. I'll have to take a look. Here's why I asked, and I think you're going to see very quickly why I bring him up. He just wrote a brand-new second edition of a data science and machine learning textbook. A big, fat textbook on methods used in data science and machine learning.
00:12:26
Speaker
And the entire thing is free in PDF format on his website. Not only the book itself, but, I don't know that it's every problem, but problems throughout every chapter on every concept: the MATLAB and the Python code
00:12:44
Speaker
is available for free. He is an extremely open source kind of guy and everything is beautiful. So what I've been thinking with breaking math, you know, we're pivoting a little bit.
00:12:59
Speaker
We kind of want to get into bringing separate fields together, and maybe, you know, fingers crossed here, it would be really awesome if we helped catalyze some relationship, like if we had a giant dinner party with all the scientists, you know what I mean? Or like an Avengers Assemble, but with scientists in different fields, over the topic of machine learning. It might yield something, I don't know, you know what I mean? One can hope.
00:13:28
Speaker
Well, I'm encouraged to hear that. I myself, you know, teach a class called Materials Informatics. I'll be teaching it again this spring. And every lecture I do is on YouTube. It's all free. All of my notes and slides are on GitHub, along with all the workbooks and notebooks that I put together in Jupyter, you know, or whatever else we do them in.
00:13:46
Speaker
That's all available because I am a big proponent of free and open education. I see that as sort of the textbook of the future. You know, in my field it's kind of interesting: the textbook that's used all around the world, the most popular materials science textbook, is Callister's intro to materials science. Callister's my neighbor; he's from Utah, right? He put us on the map as being, like, the materials education...
00:14:08
Speaker
And I see, you know, the stuff that we're generating as just the next version of that for 2023, right? So all my courses are on YouTube for free. You can learn from them as you like. If you want to come to the University of Utah, you can hear it in person.
00:14:23
Speaker
And you can get assessment, which you can't get from YouTube, which is why we still think that it's worth going to college. But the learning is out there. It's for free for anybody. We try to make that as accessible as possible. We also have a really great, I don't know if you're going to have show notes for this, but I can send you a link. We have a best practices paper for people interested in getting into materials informatics.
00:14:43
Speaker
So it covers everything from: where's the data? Where's the problem? What are the types of tasks that we do with materials plus machine learning? It talks about splitting your data, avoiding p-hacking, all this sort of really introductory stuff. I think anybody who's getting into this space ought to start with that article. We published it in Chemistry of Materials, and it's been downloaded a gajillion times. I think it's a great first place to take a look.
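On the data-splitting point: one common safeguard against leakage in materials datasets is to split by compound rather than by row, so repeated entries for the same formula never straddle train and test. This is a generic sketch, not a procedure taken from the best-practices paper; the formulas and values are placeholders.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, float]  # (chemical formula, measured or computed property)


def group_split(records: List[Record], test_frac: float,
                seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Split so that every formula lands wholly in train or wholly in test.

    Random row-level splits can put duplicate entries of the same compound on
    both sides, which inflates test scores (data leakage).
    """
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    formulas = sorted(groups)
    random.Random(seed).shuffle(formulas)
    n_test = max(1, round(test_frac * len(formulas)))
    test_set = set(formulas[:n_test])
    train = [r for f in formulas if f not in test_set for r in groups[f]]
    test = [r for f in formulas if f in test_set for r in groups[f]]
    return train, test


# Hypothetical placeholder formulas and values, for illustration only.
data = [("AB", 1.0), ("AB", 1.1), ("A2B", 0.5), ("AB2", 2.0), ("AB3", 2.4)]
train, test = group_split(data, test_frac=0.25)
```

Grouping by formula is only one choice; grouping by chemical system or structure family gives a stricter test of how well a model extrapolates to genuinely new materials.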
00:15:05
Speaker
Oh, fantastic, fantastic, okay. Now, I realize my intention is to keep this somewhat short, but before I do, you have had much more feedback on your publication, obviously, than I'm even aware

Impact of AI on Material Science Research

00:15:17
Speaker
of. So rather than myself coming up with the questions, I'd love to give you the opportunity to tell us anything that we as a general audience with just general interest in materials machine learning should know about your entire process, machine learning, or materials in general.
00:15:34
Speaker
So much to cover there. I would just say it's an exciting time to be studying materials. AI is an enabling technology for materials research. We are finding materials faster. We're getting better at predicting their properties. We're having higher confidence in the models. We're wasting less time chasing dead ends.
00:15:52
Speaker
It is a really great time to be doing materials science, particularly if you come to it with an interest in learning to code, and things like Copilot and ChatGPT are making even that easier. It feels like an inflection point. It feels like a new type of materials science engineer is being trained right now.
00:16:10
Speaker
Okay. Cool. It froze up there for a second, but we're good. We're good. Awesome. Awesome. Okay. Very good. Well, this has been kind of a short video. I will be producing a newsletter very soon with the Breaking Math podcast. And I want to include any links from everybody we interview, and, you know, all my future guests are going to know about this as well. So you may get some interesting questions from some interesting perspectives.
00:16:33
Speaker
So yeah, and this will also be posted on anything that allows me to post. So I think that's all that I have for now. I really appreciate you coming on here and talking because I know it's cognitively taxing to always read the small papers, but if you just listen to someone talk, that's at least an introduction to it. So yeah. Okay, happy to help.
00:16:57
Speaker
Awesome, dude. All right. All right. Well, thank you so much. And it's always a blast collaborating with the Materialism podcast. I was going to go into materials before I went into EE, so it's a way to live vicariously for me. So I will send you these videos as soon as I have them, if you want to use them for any purposes. This is just for the general public. I'll upload them when I can, but for anybody, anytime. OK. Sounds good, man.
00:17:49
Speaker
Awesome. Thank you so much, sir. And we'll be in touch soon.