From Homework to Portfolio: NYC Open Data in the Classroom

S12 E311 · The PolicyViz Podcast

Welcome back to the show! In this week’s episode, I chat with Christian Martinez, a faculty member at Brooklyn College and several other CUNY schools, and Shannon Joyce, a newly minted master’s graduate in psychological research—who, as we note at the top, literally graduated the day before we recorded. Christian shares how he redesigned his graduate stats and R course around NYC Open Data, building what he calls an “accidental author” process that transforms students’ weekly homework into portfolio books and, ultimately, chapters in a published student gallery. Shannon walks us through her own project exploring the relationship between mold complaints and domestic violence rates in New York City, and reflects on what it means to learn to code by asking questions you actually care about. We also dig into the NYC Open Data R package Christian and his students built together—now streamlined from 40 functions down to three and approaching 2,000 installs—and close with a lively conversation about whether open data skews too negative and what a truly positive city dataset might look like.

Keywords: NYC open data, R programming, data visualization, teaching data science, open data, CUNY Brooklyn College, R package, data education, open educational resources, data storytelling, Quarto, RStudio, graduate education, data literacy, public data

Subscribe to the PolicyViz Podcast wherever you get your podcasts.

Become a patron of the PolicyViz Podcast (https://patreon.com/policyviz) for as little as a buck a month

Find Christian Martinez and all student work at NYCOpenDataLab.org. Find Shannon Joyce on GitHub (github.com/ShannonJoyce) and LinkedIn.

Follow me on Instagram, LinkedIn, Substack, Twitter, Website, YouTube

Email: jon@policyviz.com

Transcript

Introduction to the PolicyViz Podcast

00:00:12
Speaker
Welcome back to the PolicyViz Podcast. I'm your host, Jon Schwabish. We are very slowly and surely getting to the end of this season, season 12 of the podcast, and then I'll take a little break this summer, get some R&R, and not record for a little bit.
00:00:27
Speaker
But I'm not done quite yet.

Christian's Class at CUNY and R Package Creation

00:00:29
Speaker
On this week's show, I am joined by Christian Martinez, an instructor at CUNY in New York City, and one of his students, Shannon Joyce. We talk about learning R in Christian's class, using New York City Open Data, and creating the New York City Open Data R package. If you're interested in how to code, in how students are using AI as they learn to code, or if you're a fan of open data, this is the episode for you. We talk a lot about teaching data and data visualization, about learning to code and how AI is facilitating that learning, and of course about open data. We also take a bit of a turn at the end to talk about positive data, rather than the negative data you often find in these New York City open data files.
00:01:21
Speaker
Now, before I set you off to listen to this week's episode of the show, I just want to note that I've been building a lot of different web apps and tools.

Jon's Web App Projects

00:01:24
Speaker
Yes, with my friend Claude, but a lot of things that I've been interested in, some serious, some sort of fun. Most recently, I've built a podcast explorer tool of the Lonely Island with Seth Meyers podcast, one of my favorite fun podcasts to listen to. It's sort of a precursor to a tool I've been working on for a while now with my own show. I have some more advantages there because many of the transcripts for this show have been human-transcribed, and they are also tagged by speaker. So I can actually look at which guests and the host have said what over time, compare them, and link them all together. Can't quite do that with the Lonely Island podcast, but it's still a fun project to create and a fun tool to build. I've also built a Social Security explorer tool and a data visualization style guide builder. So if you haven't checked that out, that is, I think, a pretty cool project that I've been working on, along with a variety of other things, all available over at policyviz.com.
00:02:25
Speaker
All right, so that's enough of that. Let me get you over to the show. This, in case you were wondering, is episode 311 of the PolicyViz Podcast. As we get near the end of season 12, here's my conversation with Christian Martinez and Shannon Joyce.
00:02:42
Speaker
Two fun new guests. Hi, Christian, Shannon. Good to meet you both. Thanks for coming on the show.

Christian's Teaching Role and Inspiration

00:02:45
Speaker
Thanks for having us. Yes, thank you. I'm very excited. We've got a lot of open data, open source stuff to talk about, big projects, fun stuff. Let's start in the natural way. Let's start with introductions so folks know who they're listening to today. Christian, you want to start, and then Shannon, you can go.
00:03:06
Speaker
Absolutely. Hello, everyone. My name is Christian Martinez. I'm currently a faculty member at Brooklyn College and a few other CUNY schools, which has been such a fun experience because I am a CUNY alumni myself.
00:03:20
Speaker
And this semester and the past semester, my students and I, our class, did some amazing work with open data, specifically New York City Open Data, and I'm really excited to talk about it.
00:03:33
Speaker
Awesome. Shannon?

Shannon's Interest in Psychology and Sociology

00:03:35
Speaker
Hi, I'm Shannon. I just graduated from Brooklyn College with a master's in psychological research. Let's caveat that for folks who are listening, because that happened yesterday. So that's very exciting.
00:03:50
Speaker
So, I'm not totally surprised, but that you're up and awake and not super hungover, that's awesome. Still processing it. Still processing. That's right. Congratulations. Maybe tonight. Yeah, yeah, yeah, yeah.
00:04:05
Speaker
Anyway, I interrupted. Oh, no. But yeah, so I just graduated, just starting the job search. And I'm just excited to start working on projects outside of school and delve into some more subjects that I'm more interested in.
00:04:23
Speaker
Terrific. So I want to hear more about the multiple skills that you pieced together there, but let's start with this project. Christian, you gave us a little bit of a glimpse, but maybe you can talk about the evolution of how you thought about teaching data, data viz, and open data in class, and then how you came to this open data platform where students would publicly put all their projects.
00:04:51
Speaker
Yeah, this is a great question. So I'm going to be as upfront as possible and say last semester was the first time I've ever taught at the graduate level.

Teaching Open Data at Graduate Level

00:05:01
Speaker
So I've been teaching at John Jay College of Criminal Justice, one of the other CUNY schools, since maybe spring of 2023. And I only taught undergraduates, additionally in the psych program, but a totally different beast from this.
00:05:15
Speaker
So when I started teaching and when I was creating my syllabus, I wanted to make sure that my students had something that they could be proud of. And more so that I'm brand new to the program. I'm brand new at teaching at the graduate level.
00:05:33
Speaker
I don't have as much to offer my students as some of my colleagues. For example, Shannon has a research advisor where she did her thesis. Other colleagues have more traditional labs than myself.
00:05:47
Speaker
And I was thinking, okay, I don't want my class to just be a class where students do homework and that's it, where they just learn statistics or learn R and that's it.
00:05:58
Speaker
I wanted there to be something tangible. And so I thought of the idea of what if we take all of our research projects and turn them into a book?
00:06:10
Speaker
I had never done anything like that before. I'd never been the author of anything. Sure, I created and performed my thesis, but not in the more traditional sense of authoring a book.
00:06:22
Speaker
And so I sent a lot, a lot, a lot of emails to a lot of different people. I then got to the Open Educational Resources Librarians at Brooklyn College.
00:06:35
Speaker
And I said, hey, I have this idea of taking my students' final research projects and turning them all into individual chapters of a student gallery book.
00:06:48
Speaker
What do you think? And here we came to our New York City Open Data student gallery book. And I mean, there's a lot in between there, but that's the origins of the project.

Integrating Psychology and Sociology in Research

00:07:00
Speaker
Yeah. So Shannon, let's start with your core field of study. And then how did you come to this class? And then what was your immediate reaction to, okay, we're not just going to be taking blue book exams or whatever people take now?
00:07:21
Speaker
I don't know. You know, high school, so Google Chromebooks or whatever. And then having stuff out in the world. Because I can imagine initially being like, I don't want people to see my stuff.
00:07:33
Speaker
Definitely. So the approach that I have taken to my research throughout this program has kind of been mixing psychology and sociology, just to see how macro and micro structures
00:07:50
Speaker
interact with one another and how they interact with ourselves. So I was just kind of taking it from that angle. For my final project for this class, I wanted to see how mold complaints in New York City correlated with domestic violence complaints in New York City. Not that they are necessarily related to one another specifically, but I wanted to see how they trended together and what that tells us about those two variables and about the interaction overall. So I didn't really know what to expect.
00:08:35
Speaker
I figured that they would be interacting in some way and that they would be correlating, having the same trends. And they did. And I'll talk more about that in a little bit. But coming into this class, I didn't know what to expect. I've never coded before. The only statistics that I had ever dealt with was a research statistics class in the year prior, which was...
00:08:57
Speaker
you know, it was the exams and the typical homework assignments, where it didn't really relate to anything in real life, at least in my life and what I was interested in. It was just learning the statistics, which was very helpful. But I didn't have that intrinsic motivation to ask questions and find out insights about certain statistical questions.
00:09:26
Speaker
So coming into this class, the first few classes were really just about learning the basics in RStudio. Again, I had never coded before, so that was my first introduction to it. And I feel like it stuck pretty easily. And I feel like maybe a month in, it started to really take off into what projects we can create using NYC Open Data.
00:09:52
Speaker
And I feel like that's where the class really transformed for all of us in there, because we were able to dive into data sets that we were personally interested in and ask questions that we wanted to ask. It wasn't going off of something specific that Christian was telling us to find out. We were able to take it into our own hands, ask our own questions, and kind of be creative with it.
00:10:17
Speaker
Right. So, okay, the teaching R part is really fascinating. A few years ago, I spent a lot of time trying to learn R and thinking about how to teach R. So Christian, can you tell me a little bit about the students that are coming in? I guess the real question is, how many students were like Shannon and had never seen a coding language before? And how do you think about the difference, when you're teaching coding, between showing students how to do something and then saying, you go do it for a while, versus, let's write a line of code? How do you approach teaching a coding language? Because I think this is a big question. Obviously, now with the AI tools, it may change things. But I'm curious how you approach this philosophy of teaching a coding language like R.

Teaching R with Real-World Examples

00:11:12
Speaker
But it could be anything, I guess.
00:11:14
Speaker
Fantastic question. And again, this is at the graduate level, so I only had nine students, which is the smallest class I've had. But that was great, because I then knew that I had to pay even more attention to each individual student, and we could be more of a collective and work together.
00:11:30
Speaker
So when I was developing the class, it was structured on a Tuesday-Thursday schedule. And my thought was, okay, let's just do a drinking-from-the-fire-hose situation on Tuesday, where we do a code dump. I throw as much code as possible out there just to show you what is actually possible.
00:11:50
Speaker
So it's supposed to be overwhelming. It's supposed to be confusing, but it's supposed to just introduce you to what's going on. The Thursday class is where we really tie it together, because I say, hey,
00:12:04
Speaker
remember what we did on Tuesday? Let's do a real-world example of that today. So I'll give you an example. Let's say we were doing correlation and how to run correlations in R. That was on Tuesday. And in the Thursday class, we pretended like we were analysts.
00:12:24
Speaker
And we were trying to see if there's any relationship between the minutes played and points. So we'd start from scratch. Let's open an R script. Let's load our packages.
00:12:36
Speaker
Here is the data that we have. And let's work with it. And so, at least in my opinion, and I'd love to hear Shannon's opinion as well, this was cool because on Tuesdays you got to experience it, and then you got to digest it, and then on Thursdays it was like, oh snap, this is kind of how people are really doing these exact problems. If you were an NBA analyst, if you were working with the police commissioner, if you were doing any of the things that we were practicing, this is how we be starting.
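A rough sketch of the Thursday exercise Christian describes, for readers who want to try it. The class worked in R; this version is in Python using only the standard library, and the minutes and points below are invented for illustration, not data from the course. The function names `pearson_r` and `text_bars` are likewise hypothetical. It follows the same order he lays out: start with the data, look at it, then compute the correlation.

```python
from math import sqrt

# Hypothetical per-game numbers, invented for illustration only.
minutes_played = [12, 20, 25, 30, 34, 38]
points = [4, 9, 11, 15, 18, 24]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def text_bars(xs, ys):
    """Look at the data first: one row per game, sorted by minutes,
    with a bar whose length is the points scored."""
    return [f"{x:>3} min | {'*' * y} {y}" for x, y in sorted(zip(xs, ys))]


# Step 1: visualize (crudely) before computing anything.
for row in text_bars(minutes_played, points):
    print(row)

# Step 2: compute the correlation.
print(f"Pearson r = {pearson_r(minutes_played, points):.3f}")
```

The text bars stand in for a real scatter plot, which a plotting library would do better; the point is only the order of operations.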
00:13:06
Speaker
This is how we be doing the things in the middle. And then this is how we be ending it. And then to tie it off, keep everything a little coagulated, I had homeworks due every Monday.
00:13:17
Speaker
So the following Monday. I had a similar prompt. And so this way you had time to look at what we did on Tuesdays, look at what we did on Thursdays, and then do it yourself on Mondays. And I thought that would be the best spacing, because we are learning R, the language, and it could be dense.
00:13:40
Speaker
But then, if you try to make it where, hey, I'm not just teaching you a language to be perfunctory. I'm teaching it because this is what real people are doing and these are real scenarios that we're practicing.
00:13:52
Speaker
Then there's a little more. And if I may go back to Shannon's point, I have to give credit to her and her peers on that.

Engaging Students with Relevant Datasets

00:14:04
Speaker
They let me know very early on that the data sets we were using were boring, little apple pray boring.
00:14:10
Speaker
But it's so important, especially at the graduate level, and it's one of the great things about having a smaller class: we can be a little more intimate. I'm happy that I at least created an environment where I wasn't just saying, hey, you can talk to me about anything, and no one says anything.
00:14:23
Speaker
They were very straight up. For example, I was teaching base R, and I thought it was important when teaching base R to show that R comes with preset data sets. One of them was mtcars, maybe the classic R data set.
00:14:39
Speaker
And I'm teaching mtcars, which is a data set about cars, to people that live in New York City. None of my students have a car. No one knows what a cylinder is. No one drives a car. Everyone uses the bus or train or walks. So there's no connection to the data set. And I had a colleague of mine describe it as using cadaver data sets. They're dull. They're boring. They're dead.
00:15:03
Speaker
And so when one of her peers was like, hey, Professor Martinez, this sucks, I was like, okay, okay, let's change it up. And that's when I had the idea of really incorporating New York City Open Data, not just waiting till the end for the final project, but incorporating it into as many, if not all, of the weekly assignments and classes that we had.
00:15:24
Speaker
And I think, personally, that's when it changed the game. Yeah. So I'm curious on one thing. I think my favorite book on R is Hadley Wickham's R for Data Science. And what's interesting about Hadley's book is that it starts with data viz. It doesn't start with analysis. It doesn't start with calculations or regressions or correlations. It starts with data viz. And I think,
00:15:46
Speaker
from a book perspective, that makes a lot of sense, because it's really easy in a tool like R to make something with just a couple of lines of code, especially if you're bringing in mtcars or, what is it, the penguins data set, right? Really easy to build something. But it doesn't sound like you started with the data viz. Is that because it's just different in a room when you're with students teaching, rather than a person working on their own with a textbook?
00:16:16
Speaker
So I would say that it's not that I didn't start with visualizations. I would say, and Shannon, you could correct me if you think I'm wrong, is that every week on Tuesdays and Thursdays, we started with visualization. So it's not like we went, here's how to visualize data, and here's how to correlate data, and here's how to this. It's, hey, we're working on, again, correlations, but how do we visualize the data first?
00:16:41
Speaker
My biggest thing, and if it's the only thing that my students take away, it's this: visualize the data before you do anything else. I'm probably mispronouncing it, but it's the Anscombe's principle, where you have the four different data sets. They all look the same, but they're totally different when you visualize them. So that's something I brought up immediately. And every single class I said, hey,
00:17:07
Speaker
We're visualizing the data first. Hey, your homework's due. I want to see visualizations first. So kind of taking the same principles, which I think are incredibly important. Yeah. So Shannon, I want to take this in a little bit of a different direction, because you can't have a conversation these days without talking about AI. It doesn't matter what you're talking about. AI has to be involved.
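The four data sets Christian mentions a moment earlier are commonly known as Anscombe's quartet. As a rough illustration of why he insists on plotting first, the Python sketch below (the class itself used R, where the same values ship as the built-in `anscombe` data set) computes the usual summary statistics for each set. The helper names `mean`, `pearson_r`, and `summarize` are my own, not anything from the course.

```python
from math import sqrt

# Anscombe's quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarize(xs, ys):
    """Summary statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "r": round(pearson_r(xs, ys), 2),
    }


# The printed summaries match across all four sets,
# even though the underlying y values clearly differ.
for name, (xs, ys) in QUARTET.items():
    print(name, summarize(xs, ys))
```

On a scatter plot the four sets look nothing alike: one is roughly linear, one is curved, one is a tight line with a single outlier, and one is a vertical stack with a single far-off point. The summary table alone would not reveal any of that.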

Shannon's Approach to AI and Coding

00:17:29
Speaker
I'm curious, because this is something I've been wrestling with in my head when teaching data viz: how is AI changing the way in which I teach or the assignments I'm going to give?
00:17:39
Speaker
Did you and your classmates use AI tools? I mean, at least the universities I teach in don't have any rules yet about proper AI usage. So it's like the wild, wild west. What was that like? Did you try AI tools? Or did you find it
00:18:00
Speaker
more worth your time to just stay away from those and really solve the problem in the code itself? Shannon, now that you graduated, you can tell the truth. Yeah, now you graduated, you can say whatever you want. You got the cap yesterday, you're good. Right, right. Well, I definitely feel like, asking about my classmates, maybe all of us used AI in different ways or approached them in different ways.
00:18:26
Speaker
The only time that I usually used it was to understand errors, because I noticed that if I tried to use it to learn how to code what I was just taught, I retained absolutely nothing.
00:18:39
Speaker
And it wasn't worth my time, personally. And I noticed that if I was using it to try to come up with a model or clean the data, I didn't even know what it was doing necessarily. And actually, it wasn't even necessarily doing exactly what I wanted it to do. So if I was going to use it, it was in a way where I was prompting it very specifically with what I'd already done. And it was more so a way to understand what I should be doing, and not to have it do it for me. So if I was having trouble
00:19:19
Speaker
just aggregating the data in a certain way, or I wanted to visualize it in a certain way, I would maybe have it help me in that way. But overall, and maybe this is just because of the way that R is structured, I prefer to build it myself, because I know exactly which steps I took. When I need to backtrack, I know exactly which step to go back to.
00:19:46
Speaker
I feel a lot more connected to the whole code when I'm doing it, rather than having the steps told to me. And so I feel like I really didn't use it all that much, because it just didn't serve much of a purpose for me personally. Right. I'm so proud.
00:20:08
Speaker
Yeah, I mean, that I think echoes my experience as well. Like when you have those weird errors, or you just can't debug it. And sometimes the AI tools still can't get it right. Exactly. So, Christian, I want to turn this back to you, thinking ahead. Maybe this is more of a question for your undergrad classes, actually, because I would guess those are larger classes and maybe students have more of an inclination to use the AI tools to go faster. But how are you thinking about potentially changing your approach to teaching coding skills in this new era where
00:20:54
Speaker
you could conceivably just pop a question into Claude and say, write this for me? Yeah, it's not something I've fully grasped yet. I will say that I don't think I was shy about encouraging my students to use AI, because of the fact that
00:21:14
Speaker
if you're not using it in some capacity, now you're slower than the rest. And my whole mantra was, hey, you're about to graduate, I'm trying to prepare you as best as possible. So if I just teach you how to code and don't expect you to use AI, when you get out of my class and into the real world, you're going to be way behind.
00:21:33
Speaker
And so it's not gonna be a skill.

Balancing AI in Learning Coding

00:21:43
Speaker
When creating my homeworks, I tried to make sure that they mimicked our Tuesday and Thursday classes as best as possible, so that you would not have to use AI. Now, I have to admit that there are two different categories, because there are the times where I could see a student use AI and I was like, all right, I understand why they used it there or how it helped them progress.
00:22:07
Speaker
And then there are other times where I'm like, what the hell did you write? I'm not saying I'm the master, that's not what I'm saying. But there was one time in particular that I'm thinking of,
00:22:18
Speaker
where a student wrote something that was so convoluted and so complex, I was like, it's impossible that they wrote this. So it's the catch-22, because it can be very helpful and push them to almost like what I think Shannon's describing, a better version of what Stack Overflow was.
00:22:36
Speaker
Yeah. Yeah. A hundred percent. Yeah, exactly. It was almost like a more instantaneous or more individualized answer for what I needed in the moment. But
00:22:48
Speaker
I felt like, the way that you had set up the classes and the corresponding homework, there kind of was no need for it, you know? And actually, a lot of the stuff I found on Stack Overflow. So it wasn't even like I needed to necessarily use an AI tool to find that out.
00:23:06
Speaker
And again, I feel like I retained so much more of it when I didn't just get a quick answer. And kind of pushing back against your
00:23:18
Speaker
going-into-the-job-market answer: part of why I prefer to build it myself is to make mistakes and go back. I would spend hours on these homeworks. And I don't think these homeworks necessarily required hours, but I learned so much just by trial and error. And I used Stack Overflow a ton. And I feel like, going into the job market, maybe that's something
00:23:49
Speaker
that hopefully I can provide, because sometimes these AI tools might answer the question that you're looking for, and they might do it very quickly, but it doesn't necessarily mean that the quality is there and
00:24:04
Speaker
the attention to detail is there, and even the creative approach to answering these questions. And so I feel like, by kind of straying away from it but also using it as a tool when needed, I was able to learn so much through using it in that way rather than going to it first. I feel like it was my last effort rather than my first effort.
00:24:33
Speaker
No, I think that's the right way to think about it. Personally, I think taking more of the approach that you took, which is, let me learn this skill and use AI as that supplemental tool, is how you actually learn these things.
00:24:51
Speaker
I mean, they are just tools. They're not quite there yet. I've seen them make lots of errors in the code that I have tried to write. And yeah, I think it's a harder piece from the instruction side: how to grade, you know, evaluate people's projects when you've got someone spending hours and someone spending five minutes, in a coding world where, if the answer is just a mathematical result,
00:25:25
Speaker
I think that's harder. But the thing about R, and what we're going to get to in a moment with these projects, is that there is a visual component to them. And that's where the creativity comes in. And that, at least for the moment, is still the human endeavor. So if I may, I've said this joke probably too many times, but I always say R puts the R in artist.
00:25:48
Speaker
And I know it's a corny joke, but exactly that. I think so many people get, maybe appropriately, scared when they are introduced to any programming language, whether it's R or something else, and it's like, oh my God, this is crazy and this is technical.
00:26:01
Speaker
But I really see it as an art form, because Shannon and any of my other students could all get to 100 on the homeworks and all have totally different code. It's not right or wrong. It's their way of speaking. It's their way of displaying their art form. It's their way of personalizing their own work. And that's the beauty of it. And so another mantra I was trying to introduce is, hey, this is an art form.
00:26:27
Speaker
Have fun, do what you want to do and what comes to you. And I think, if you add that to what we've been talking about, you're a little more excited to write code yourself.
00:26:39
Speaker
Yeah, I wonder if the R package that's "artist" would leave out the A or the I. I don't know. You'd have to leave out something. Okay, so let's turn to the final product, which is this open library.

Turning Student Projects into Public Books

00:26:56
Speaker
So Christian, when you got to this idea and talked to the libraries, like, I want to build this sort of digital book of these projects,
00:27:06
Speaker
was there any hesitation that maybe the students wouldn't want to do this? Or had you already crossed that bridge, and the students were like, yeah, we're on board, let's build something that becomes a portfolio? What was the feeling for you about turning the coursework into a public project?
00:27:24
Speaker
Okay. So can I answer that twofold? Yeah. So the first one is that, I believe I mentioned, once I got word that we could do this book and there was funding for it and grants and it was possible, I had the idea and I sent it to my students: hey, this is our final project.
00:27:47
Speaker
It's already available. You can start working on it, but I want to turn this into a book. But if I may step back a little bit, I actually started prepping my students on being authors, I would say, from the first week. So I had this idea, what I call the accidental author process.
00:28:08
Speaker
And Shannon may be able to attest to this. Each week, as I've mentioned, you had homeworks, right? And I wanted, of course, to get as much of my students' work out into the open, because I don't have a traditional lab, and I wanted to get away from this one-and-done world in academia where you do a homework and you never see it again. Maybe it lives in your downloads folder. Maybe it lives in your documents. Maybe you already threw it out.
00:28:39
Speaker
A lot of times in academia, and to no one's fault, you do a homework and that's it. So my thought was instead to turn all of their homeworks into their own portfolio book.
00:28:54
Speaker
So what ends up happening is that their last homework assignment, right before they have to do the final project, is to take all of their homeworks and turn them into a Quarto book.
00:29:06
Speaker
So they each have their own portfolio. It takes their homeworks from just a homework to, man, this is a portfolio piece that I could send to a recruiter, to a job. And from a pedagogical standpoint, what happens is I double the amount of touch points that you have for each homework. Okay, I did my homework from week one and week three, and that's the only time I touch it. But now I have to put it into a portfolio book. I want to make sure that all the code looks good, add an introduction, et cetera. So now that's two times you've worked on the same homework. And I've had this outstanding
00:29:47
Speaker
rule that if you wanted to get points back on your homeworks at any time, you could work on them and resubmit them to me, which means a maximum of three different touch points for each homework.
00:29:59
Speaker
So even before we talked about this collaborative book, my students were already surprised to become their own authors and work on stuff themselves. And I was hoping that that would prep them for, okay,
00:30:13
Speaker
I've worked on my own stuff that is a representation of me when we're doing this New York City Open Data Student Gallery book of real research. Maybe that could push me even further.
00:30:25
Speaker
So that was my twofold goal of getting students' work out there. And so it was at this point where you said, we're not going to use mtcars, we're not going to use penguins, we're going to use the New York City open data. And was that part of the requirement? I mean, there's a lot of opportunities there, so it's not like there's two things to look at. And we should also talk about the R package that you both developed. But was that part of the requirement of the final project, that you have to be within this sort of New York City open data ecosystem?
00:31:04
Speaker
Yes. So I had a lot of tricks up my sleeve all semester. Maybe too many, but... Future students should be listening to this, right? Yeah, it's so...
00:31:17
Speaker
You're absolutely right. I did not come into this class thinking that we would create a book. That was something that came together through all of this. And I did not know that we'd be using open data throughout the entire class.
00:31:30
Speaker
What I did know is that the final project was going to mimic the requirements for New York City Open Data Week. So for anyone that's not aware, New York City has the New York City Open Data Portal.
00:31:45
Speaker
And then each year they host a conference, the New York City Open Data Week, where anyone using data that's related to New York City and open can present.
00:32:00
Speaker
So we tie it back to, I don't have a traditional lab. I don't have like conferences that I traditionally go to, but what if all of my students as their final projects create research that uses New York City and open data, potentially New York City open data, but that wasn't a hard requirement, just those two.
00:32:21
Speaker
And their final project can live outside of just the classroom and potentially be presented at New York City Open Data. Gotcha. Now, my individual students did not end up presenting at New York City Open Data, but our book, which was funded by the OER librarians and by Brooklyn College and CUNY, we proposed to present at New York City Open Data Week this past March, and we were accepted. So all of my students and I presented our work, which was amazing. Nice, nice. So a collaborative presentation of a collaborative project. Yeah.
00:32:58
Speaker
Okay, Shannon, how about this? Maybe you can tell us about your project. And then I think I want to ask you both an outside-the-box question. But I want to hear about this mold and domestic violence project, and not just the data itself and your process, but also, what was the final thing that you created?

Shannon's Presentation at Open Data Week

00:33:25
Speaker
Yes. So it all obviously happened in R. The final creation, at least when I had submitted it for class, which was the semester prior, was in R Markdown. So it was a lot of coding. It was a lot of cleaning, and the tests that I ran, and visualization. So that was the format that it was uploaded as. And when we had...
00:33:51
Speaker
presented it in Open Data Week, we had turned it into, what was it, a Quarto presentation? Yeah, yeah. So we all put it into a Quarto presentation. So what was cool was it was all done within RStudio, down to the presentation format. But so my project, looking at mold and domestic violence rates, I just wanted to see how they kind of co-occurred together, if they did, if they didn't, what kind of trends were there. And unsurprisingly, they did trend together.
00:34:22
Speaker
There are lots of reasons why this could be. Something that I found interesting was I was looking at resolution times. I wanted to see the resolution times for mold complaints, and if longer or shorter resolution times correlated with higher or lower domestic violence rates. And interestingly enough, longer resolution times were correlated with lower domestic violence rates. And my takeaway from that was kind of, these people are probably, you know, pulling away from the system because they're not seeing results when they report stuff.
00:35:00
Speaker
So the project overall was interesting. It was a lot of cleaning because what I do, and what I keep doing to myself, is I love to take these big projects and then I'm like, yeah, it should take me, you know, maybe a week or something.
00:35:17
Speaker
I don't know why I do that, but then I'm up for hours just combing through and combing through. And that wasn't the first time, and it's definitely not the last time.
00:35:31
Speaker
And I'm kind of going through it right now with another one. But you know, you really get to know the data. A lot of these data sets I'm looking at over a period of years, and they're all different data sets per year. So now I'm working with like 15 different data sets at this point, and I'm filtering within the open data portal before I even download it to my computer because the data sets are massive. So it was just an interesting process altogether. I think if I were to go back, I would have done a few things differently.
00:36:06
Speaker
I probably would have looked at rates rather than raw numbers because I feel like that kind of gives me a better insight. But the whole thing was a learning process. And when it came down to kind of going back to the project and recreating it for Open Data Week, because I submitted it back in December and Open Data Week was in March, I kind of returned to it and I was looking at it and I'm like, why did I do it this way? And now I can't even recreate it. And now I'm going to present this. So it was cool to go back a few months later,
00:36:41
Speaker
even having learned more since then, and seeing how much I've progressed since then. Yeah. But yeah, it was just cool to be able to kind of create my own thing with the open world of the open data portal. Right, right, right.
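Editorial aside on the "rates rather than raw numbers" point above: a minimal Python sketch of why a rate can tell a different story than a raw count. The function name `rate_per_1000`, the area names, and every number here are invented for illustration; none of it comes from Shannon's project or from real NYC data.

```python
def rate_per_1000(count: int, population: int) -> float:
    """Convert a raw count into a rate per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * count / population


# Invented toy numbers for two hypothetical areas, not real NYC data.
areas = {
    "Area A": {"complaints": 200, "population": 100_000},
    "Area B": {"complaints": 150, "population": 50_000},
}

for name, row in areas.items():
    rate = rate_per_1000(row["complaints"], row["population"])
    print(f"{name}: {row['complaints']} complaints, {rate:.1f} per 1,000")
```

In this toy case the area with more raw complaints has the lower rate once population is taken into account, which is the kind of reversal that comparing raw counts can hide.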
00:37:01
Speaker
Before I turn to my out-of-the-box question, I had a note. I want to make sure we talk about the R package. So you two, and maybe others, worked together to build the NYC Open Data R package. Christian, maybe you can talk just for a couple minutes about what that is.

Developing R Package for NYC Open Data

00:37:17
Speaker
Absolutely. So... Common theme: mtcars, penguins, a lot of the data sets that I was experimenting with, epic fail. My students are like, hey, I need something awesome and relevant. And so we turned to New York City Open Data, and all of a sudden we're looking at slashings and stabbings and interesting things. And I tried to switch it up each week for a different topic and do totally different data sets.
00:37:42
Speaker
So the problem is that with pulling in New York City open data, we have two options. We can either download the Excel files and put them into R, which structurally works and is an important skill to learn if you're going into the corporate world, because you're working with a ton of data sets.
00:38:01
Speaker
But if you're working in a more programming-oriented setting, it's not the best way to do it. Because number one, if you're working with something like the 311 data set, which changes every day, now you have to download and upload, and your names get switched up, and where is this and where is that.
00:38:21
Speaker
And one of the beautiful things about New York City Open Data, the portal specifically, is it works with an API. So you can connect through the API. But then that would mean that I have to teach my students how to use R, how to run statistics, how to visualize, how to storytell, how to create documents, how to create presentations, all the different skills within all of those, and then add how to use APIs, which is like an explosion of information and really just too much for my students, in my opinion. And don't worry, he didn't throw that in there. We did learn how to do that. So I was like, okay, how do I not just absolutely lose all of my students, which I'm sure I was on the brink of many times. So I was like, you know what, what the hell?
00:39:10
Speaker
Let me see if I can make an R package. So it started with just the 311 open data set, where it was like New York City underscore 311, and it would just call the New York City 311 data set.
00:39:23
Speaker
And then it turned into several functions where each function would call a specific data set from the New York City open data portal. And then I provided the opportunity for my students to become
00:39:39
Speaker
creators of their own functions within a package. My thought was, hey, we know R. Let's see if we could take it to the next level. Here's my code. You can remix it however you'd like to work for data sets on whichever you want.
00:39:54
Speaker
Again, trying to maximize the level of creativity. So hey, you like this data? Boom, work with this data. And that was regardless of whether you wanted to do it for your final project or for the package.
00:40:07
Speaker
So then, I think very cool, all of my students contributed one specific function to the package and they became, you know, junior software developers. Unfortunately, I'd say good and bad, all of their functions, including all of my original ones, are no longer part of the package. Yeah, so I submitted it to rOpenSci, and they're one of, if not the, premier communities that try their best to make sure R packages are at their highest caliber.
00:40:44
Speaker
So I submitted it for peer review. They loved it. It's reviewed by two reviewers; it has been approved by one and hopefully will be approved by the second one this week. But one of the things that they were saying was, hey,
00:40:57
Speaker
you've got 40 different functions. They all basically do the same thing. It's really not the best way to handle this. And they're 100% right. Was it great for my students to get exposure and learn how package creation works and how functions work and how they can remix other people's code? 100%. But from a practical perspective, it was not maintainable. That meant that if there was a problem in one function, all the other ones had to be fixed too.
00:41:25
Speaker
You know, for anyone that codes, copy and paste is a nightmare. Yeah. But now it's been downloaded or installed over 2,000 times by over 2,000 different people.
00:41:37
Speaker
It's about to be, hopefully, approved by rOpenSci. And I'm just really glad that it can help not only my students, but anyone that's using R. So that's great. So did you then go back and streamline the whole thing? Like, you had all these functions, and they said, you know, this is great, but X, Y, and Z. So did you go through that and just streamline the whole package?
00:42:00
Speaker
Yeah. So I turned about 40 different functions into three. So the first one, which is super impactful, pulls all the metadata about all the data sets on the New York City open data portal, and there's like, don't quote me, maybe 2,500, 2,000, so there's a lot, and that means you have access to all of them. Then the second one is you could take the ID that you get from the first one and just plug it in. So now it's just, hey, what specific data set do you want to pull?
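Editorial aside: the move from many near-identical wrappers to one function that takes a dataset ID can be sketched roughly as follows. This is Python, not the R package's actual code, and it rests on assumptions: the portal domain, the `/resource/<id>.json` URL pattern, and the `$limit` query parameter reflect the Socrata-style API the portal is understood to expose, and the names `dataset_url` and `pull_dataset` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed portal domain and URL pattern; check the portal's API docs.
PORTAL = "https://data.cityofnewyork.us"


def dataset_url(dataset_id: str, limit: int = 1000) -> str:
    """Build the JSON endpoint URL for a single dataset ID."""
    query = urlencode({"$limit": limit})
    return f"{PORTAL}/resource/{dataset_id}.json?{query}"


def pull_dataset(dataset_id: str, limit: int = 1000) -> list:
    """Download up to `limit` rows as a list of dicts (needs network access)."""
    with urlopen(dataset_url(dataset_id, limit)) as response:
        return json.load(response)


# "abcd-1234" is a placeholder ID, not a real dataset.
print(dataset_url("abcd-1234", limit=5))
```

Because the ID is a parameter, a fix to the download logic happens in one place instead of being copied into dozens of wrappers, which is the maintainability concern the reviewers raised.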
00:42:36
Speaker
And then the third one is, I would describe it as a fail-safe. So you could take the actual JSON link and put it in. Just in case you can't find it in the metadata, or there's something wrong, or you just want to do it your own way, you could put in your own JSON link and it works. So 40 to three is huge. Yeah, definitely. Definitely an efficiency gain. OK, I want to ask you a little outside-the-box question

Positive Datasets in Open Data

00:42:58
Speaker
here. And I'm going to give credit for this question to Jason Forrest from Data Vandals, who was on the show a few weeks ago. He and I were talking yesterday, and we happened to be talking about New York City open data as part of our conversation. And he made a comment that the data in the New York City open data, and I would venture to say most city-level open data platforms, tend to be pretty negative, right? It's
00:43:25
Speaker
mold, it's domestic violence, it's probably a lot of 311 and 911 calls, right, which tend to be kind of on the negative side. And I haven't gone through all the projects, but I've seen a bunch of those that are sort of in that similar vein. And I'm curious,
00:43:40
Speaker
whether you think there should be a place in these open data sets for stuff that's more positive. And I don't know what that is necessarily. I mean, obviously the one that comes to mind is like, you have a survey of people saying how great New York City is or how great their train station is, whatever. And I know I'm throwing this at you last second, but I'm curious what you think about that, of having this data set that is 2,500 different series that tend to be kind of negative things when there's a lot of positive stuff that we could get in data.
00:44:15
Speaker
Shannon? Shannon, if you want to go first. Yeah. So I love the idea of positive data sets in open data. And whether or not what's on there already is negative, maybe it's more neutral but the insights might be negative. The hard thing about it is, and correct me if I'm wrong, Christian, because you're closer to that community, but I believe that most of these data sets do come from city agencies. So the data sets that they're putting in there are, you know, stuff that they've collected for their own missions and for their own reasons. And so I think,
00:45:00
Speaker
it just makes it hard to get those more positive data sets, for a reason that would be for agencies, if that makes sense. Like, I feel like the agencies would need a positive reason to have that. But with that being said, I do love the work that Data Vandals does, because they get to engage with the community and they do get to get these opinions. And a lot of them are very positive. They're just different than what you usually would get with open data because it's a little bit more qualitative. Yeah.
00:45:29
Speaker
Yeah. And just getting to talk to real people on the street. I would be interested to see how something like that can be worked into the open data system. I'm just not sure how, given that it mostly comes from the agencies, if that makes sense. Yeah, I agree. Christian, I will give you a chance. I mean, I think negative is maybe a harsh word because, you know, we want people to report domestic violence incidents, right? So you don't want there to be domestic violence, but when there is, you do want people to report it.
00:46:06
Speaker
You know, having reports of the potholes being fixed, like, that I get, that's a positive thing, right? And you want to have that information that the potholes are being fixed. But I think, to your point, and I think what Jason was getting at in our conversation, is like, there are a lot of great things happening, and they're not always collected.
00:46:28
Speaker
And yeah, I don't know how to collect them, certainly at scale. I think that's another hard thing to do. But I think your point about, like, these are agency mission-driven data sets. And so there's probably not the New York City happiness agency, although that would be a lot of fun, right? Like, they're not going around giving balloons to everybody. Right. And it also might be that the way that we are engaging with the data is we're looking for issues. And not to sound like, oh, we're looking for issues, but we're looking for a problem that needs to be fixed. We're trying to find data on what's going wrong so that we can figure out how to make it right.
00:47:08
Speaker
But a project that I recently have been working on, and Christian's been helping me out with it, is looking to see how speed cameras have improved crash-related injuries over time. And, you know, maybe it's not the most engaging data set, but that is a positive, you know, that's an improvement in infrastructure. And so I think it might just be the way that we are analyzing and engaging with the data. It's a lot easier to find what's wrong
00:47:42
Speaker
in whatever sector you're looking in. Yeah. But depending on the question that you ask, you can also probably find a lot of good, you know? Yeah. Christian, I'll let you take us out. Let me rephrase it a little bit. If you had unlimited supply, unlimited funds, unlimited time, what positive data set would you like to collect from your fellow New York City folks?
00:48:15
Speaker
All right. Imagine that you just asked me this question off the fly. I would love to see the amount of friends made when moving to New York City.
00:48:28
Speaker
Oh, yeah. I think that would be a cool one. Something with, like, time spent with other people, because I think that… I've been to all 50 states, which I think is super cool. And there is still no city like New York, and there's always something to do. And as long as you are yourself in New York City, I personally feel New York City will accept you. Like, it doesn't matter how niche you are, as long as you are yourself.
00:48:54
Speaker
And so I think there's such a beautiful community inside that if there was a way to measure how much or how intertwined a person was, either on a normal scale or when they, like, become a New Yorker and they no longer are tourists and have moved in, I think that would be cool.
00:49:13
Speaker
That's my hot take. I like it. I like that hot take. I like ending on a positive note. That's a good one. That would be fun. Okay. Let me just round out. Christian, where can people find the data lab and where can they find the R package?
00:49:29
Speaker
Yeah, so all of the amazing work from myself and my absolutely great students, who are so hireable and you should definitely hire, especially Shannon. Good one. Yeah, all of them. All of the work can be found on NYCOpenDataLab.org.
00:49:49
Speaker
And so that includes the R packages that we created, because we took the New York City one and did it for Chicago, LA, we're working on Austin, and we're working on another one. So there are a few different cities that we've done, especially New York City. The book, the New York City Open Data Gallery book that we all created, is on there.
00:50:15
Speaker
Additionally, I created like a meta book of all of my individual students' portfolio books, which can be found on there. Shannon is currently working on an amazing analysis of the relationship between the increase in speed cameras and motor vehicle crashes, specifically death-related crashes, which can also be found on the lab, NYCOpenDataLab.org. So that's the place to be. I tried to create one ecosystem, not only for all the work that I'm doing, but all the work that my students are doing and have done.
00:50:49
Speaker
Awesome. And Shannon, in addition to the part of the the lab that has your project, where can, and I'm going to be explicit about this, where can potential employers find you?
00:51:02
Speaker
Well, they can find me at, let me pull it up, but my GitHub username is ShannonJoyce. Yeah. github.com/ShannonJoyce. They can find me there. They can find me on LinkedIn.
00:51:15
Speaker
If they search me, they can find me. Okay. Sounds good. The way of 2026. Easy to find. Exactly. All right. Christian, Shannon, thanks so much for coming on the show. This has been a lot of fun. Thanks. Thanks for having us. Thanks everyone for tuning into the show. I hope you enjoyed that episode. I hope you learned a little bit about R, learned a little bit about AI, and learned a lot about New York City Open Data. I'd encourage you to check out the New York City Open Data Lab, where all those projects are published, and also check out the R package, and I'd encourage you to listen to more episodes of this show.
00:51:49
Speaker
One last request: if you could take a moment of your time and please rate or review the show wherever you get your podcasts, iTunes, Spotify, wherever you're listening to it, I would appreciate it. So until next time, this has been the PolicyViz Podcast.
00:52:01
Speaker
Thanks so much for listening.