Open Context and Data Sharing with the Kansas - Episode 6

The ArchaeoTech Podcast
245 plays, 10 years ago

Russell and Doug chat with Eric Kansa and Sarah Whitcher Kansa, the husband-and-wife archaeologist team behind the data publication tool "Open Context", about data sharing in archaeology, the culture of archaeologists, White House awards, and more!

Transcript

Introduction to ArchaeoTech Podcast

00:00:01
Speaker
You are listening to the Archaeology Podcast Network. Hello and welcome to the ArchaeoTech Podcast, episode four.

Introducing Guests and Discussion Topics

00:00:11
Speaker
I'm your host, Russell Alleen-Willems. Today on the show, my co-host Doug and I talk with Eric Kansa and Sarah Whitcher Kansa about their work with Open Context, open science, and sharing zooarchaeological data.
00:00:28
Speaker
Eric, would you please give our listeners a little background about yourself and your current project?

Eric Kansa's Journey and Open Context

00:00:32
Speaker
Hi, thanks, Russell. Yeah, so I direct Open Context, which is a data publishing venue for archaeology. And my background is mainly from Near Eastern archaeology. So I started out as an undergrad working my first field school in the Negev Desert of Israel.
00:00:52
Speaker
And in grad school, I was working in Israel, in Egypt, and a bit in Jordan and Turkey, doing research on the Early Bronze Age. While I was doing that, I shifted gears and got really interested in digital data in archaeology. After I finished my PhD in 2001, we started really focusing on the needs of data sharing in archaeology. Great. Thanks for that, Eric.

Sarah's Work and Founding the Alexandria Archive Institute

00:01:22
Speaker
Sarah, would you mind telling us a little bit about your background and some of your current projects? Sure. Yeah. Hi, and thanks for having me. So I also started my research in grad school working in Israel, in the Negev Desert. You can see the link between the two of us,
00:01:37
Speaker
especially since that's where we met and started talking about these things. I ended up going into zooarchaeology, and I worked mostly in Israel and then moved into Turkey and worked there for 15 years on a Neolithic site. Over the course of that time, Eric and I started talking about our frustrations about access to data and about the data that we were producing: its future, and where it was going to end up or not end up, as was the case.
00:02:03
Speaker
And that's where the seed of the idea for the Alexandria Archive Institute was really born, back in about the year 2000, when we decided it was time for us to try to
00:02:17
Speaker
put something together that would help people provide access to the data that they worked so hard to collect, data that would otherwise, in general, just languish. So we founded this nonprofit organization, and several years later the idea of Open Context was born out of it. Open Context has become the main focus of our research efforts, and the Alexandria Archive Institute focuses on getting funding for Open Context to help people publish their data openly and provide access to it.

Understanding Open Context Platform

00:02:47
Speaker
Could you guys tell us a little bit more about what Open Context is? Obviously, we're going to put a link in the show notes so people can check it out. But could you tell us a bit about how it works and who puts their data on it? Yeah. So Open Context is what we call a data publishing venue. And the way it works is that it's
00:03:11
Speaker
basically a giant database, and it has a very generalized kind of backend data structure, data model, or schema. That allows us to publish the very disparate kinds of research data that archaeologists create. Archaeologists work in all sorts of different regions and time periods, and they work with very different kinds of materials.
00:03:40
Speaker
And in order to handle that wide diversity, we have to have a pretty generalized system in the backend. By doing that, we can also index everything and make it all cross-searchable. And increasingly, we do a lot more in the way of linked data annotation that cross-references these different data sets with one another,
00:04:10
Speaker
so that there are more meaningful linkages across these different databases.

Data Publishing Process

00:04:18
Speaker
So the data sets that we get are coming to us from field researchers, who for the most part send us Excel spreadsheets. Sometimes people send us relational databases, like Access or FileMaker, and big image archives. What we do is map these different data sets
00:04:39
Speaker
into our own system. We have a data cleanup stage where we review the data sets, try to make them more consistent, and link them up to linked data standards that allow them to be cross-referenced with one another. Then we publish them, and when we publish them, they're web resources that you can link to. Every individual potsherd, bone, or whatever is its own web resource with a stable identifier.
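A minimal sketch of the mapping step just described, assuming a simple subject/attribute/value layout as the "generalized" backend model. This is not Open Context's actual schema or code: the base URI `https://example.org/subjects/`, the `mint_uri` and `row_to_assertions` helpers, and the sample rows are all hypothetical. It shows how rows from differently shaped spreadsheets can land in one generic structure, with each record getting a deterministic identifier.

```python
import uuid

# Hypothetical base URI; not Open Context's real identifier scheme.
BASE_URI = "https://example.org/subjects/"

# A fixed namespace makes uuid5 deterministic: the same project + local ID
# always yields the same identifier, so re-imports don't mint new URIs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, BASE_URI)


def mint_uri(project: str, local_id: str) -> str:
    """Derive a stable URI for one record from its project and local ID."""
    return BASE_URI + str(uuid.uuid5(NAMESPACE, f"{project}/{local_id}"))


def row_to_assertions(project: str, row: dict, id_column: str) -> list:
    """Turn one spreadsheet row into (subject, attribute, value) triples."""
    subject = mint_uri(project, row[id_column])
    return [
        (subject, attribute, value)
        for attribute, value in row.items()
        if attribute != id_column and value not in ("", None)
    ]


# Two projects with differently shaped spreadsheets end up in one generic table.
bones = {"Bone ID": "B-17", "Taxon": "wild sheep", "Element": "tibia"}
sherds = {"Find no.": "P-203", "Ware": "red slip", "Period": "EB II"}

assertions = (row_to_assertions("dig-a", bones, "Bone ID")
              + row_to_assertions("dig-b", sherds, "Find no."))
for triple in assertions:
    print(triple)
```

Because the identifier is derived from the project and the contributor's own find number, a corrected re-import keeps the same web address for each bone or sherd.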
00:05:08
Speaker
And once we do that, we put it into a digital library: the California Digital Library, which is the main university repository for the University of California. That acts as our preservation repository, because we don't have the capacity ourselves to preserve things for the long term. And I'd actually like to add something about who the users are.

User Flexibility in Open Context

00:05:37
Speaker
We have tried to keep it very flexible and open because this is all sort of a brave new world in data sharing and data publishing. Open Context, I think, was a concept a little ahead of its time, and more and more technologies are emerging that are making it possible to realize
00:05:53
Speaker
the vision, I think, that we had for it in terms of linked data and ontologies and that kind of thing. But in terms of data publishing, there's the question of what that is and how it looks to people. So we have tried to keep it very open: we will take large field projects that have a whole bunch of diverse types of data, and we will also work with graduate students or individual researchers if they have one data set they want to publish.
00:06:21
Speaker
We'll work with them on that. We have several projects that specifically link data sets with chapters in books or with specific publications, and those are of a slightly different nature. We've tried to leave it open and flexible to accommodate the different ways that people want to share data, because there's no set idea of what data sharing is. Now, data publishing is something very new.
00:06:52
Speaker
So, for a listener who has never done this before, how is it similar to, let's say, publishing an article, and how is it different? Do you guys do peer review for these data sets, or do you just do your cleanup and then it's out there for anyone to use?

Peer Review in Open Context

00:07:10
Speaker
Yeah, that's a good question. And this is also one of these things that is
00:07:16
Speaker
we're honestly trying to sort through: what data publishing really means and how to do the process well, because this is a really new kind of thing. The fundamental need that we do see is that contributing researchers who have data need to work with other people in a sort of
00:07:46
Speaker
co-production between the researchers who submit the data and us, as a sort of editorial team, working with them on improving the quality of that data and aligning it to expected standards. That's the fundamental need. And that's where it's very similar in some ways to conventional publishing, where you submit an article to a journal or a manuscript for a book.
00:08:16
Speaker
It goes through an editorial process that also really involves collaboration between people who have different roles, right? Somebody produces the content; other people bring editorial expertise. Sometimes that's copy editing. Sometimes it's more thematic editing to make it fit expectations of quality and standards. Data need the same kind of thing. They really need that kind of investment, and it's an intellectual investment too.
00:08:46
Speaker
As far as how peer review goes, that's something that needs to be a little bit different for data, I think. We don't want to filter for impact, which is what a lot of journals do. If you submit a paper to Nature or Science, they might say, well, it's scientifically fine, but it's not very interesting, and they'll reject it. Data, I think, doesn't need to work that way. I think data is best when it's
00:09:15
Speaker
shared, and when it is shared, the impact might be far downstream. If anything, it could just expand the sample size or statistical power for future meta-analyses that use lots of data together. So we don't want to reject a data set because we think it's boring, and that's a different kind of thing. Peer review of data needs to focus much more on how well a data set meets the expectations of
00:09:45
Speaker
what people need to see in terms of documentation for understanding that data. If the data seem to have any big methodological problems or gaps in documentation, that's what peer review is intended to catch.
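A minimal sketch of the documentation-focused review described above: instead of judging whether a data set is interesting, it lists gaps to send back to the contributor. The codebook layout, the column names (`GL`, for greatest length, is used only as an illustrative measurement), and the `review` function are hypothetical, not Open Context's actual review tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewReport:
    """Gaps found during review; an empty report means nothing to fix."""
    undocumented: list = field(default_factory=list)
    missing_units: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not (self.undocumented or self.missing_units)


def review(columns, codebook, measurement_columns=()):
    """Flag documentation gaps instead of accepting or rejecting the data set."""
    report = ReviewReport()
    for col in columns:
        entry = codebook.get(col, {})
        # Every column needs a human-readable description.
        if not entry.get("description", "").strip():
            report.undocumented.append(col)
        # Measurements also need a unit to be comparable across projects.
        if col in measurement_columns and not entry.get("unit"):
            report.missing_units.append(col)
    return report


columns = ["Bone ID", "Taxon", "GL", "Fusion"]
codebook = {
    "Bone ID": {"description": "Excavation find number"},
    "Taxon": {"description": "Lowest identified taxon"},
    "GL": {"description": "Greatest length"},  # no unit recorded
}
report = review(columns, codebook, measurement_columns={"GL"})
print(report.undocumented)   # ['Fusion']
print(report.missing_units)  # ['GL']
print(report.ok)             # False
```

The output is a to-do list for the editorial back-and-forth rather than a verdict, in line with working with contributors to improve a data set rather than rejecting it.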

Linked Data in Archaeology

00:10:03
Speaker
And again, for us, it's mainly about working with the researchers who submit the data to improve it, rather than just rejecting it. You guys mentioned linked data.
00:10:13
Speaker
Now, is that linked inside of Open Context, or do you guys have linked data out to other databases or other repositories? Or do other people link into you and use your data in another project on a different website? So, just to explain a little bit. Yeah. The essential bit about linked data that's really important is
00:10:38
Speaker
the notion of referencing different concepts or other kinds of entities, like a record about a bone or a potsherd,
00:10:47
Speaker
with a web identifier, a stable web identifier, so a web URI. When you publish data that has stable web URIs for different entities and different concepts (and those concepts could be terms in a controlled vocabulary, or they could be records of individual bones or potsherds or whatever),
00:11:11
Speaker
then you can cross-reference different data sets across the web. So it allows you to do data integration on a web scale, which is really cool. But it also works within Open Context. A concrete example is the zooarchaeological data that we've been working with recently, a project in Turkey where we integrated over a dozen data sets from around the Neolithic period.
00:11:38
Speaker
People will record species names in different ways. And one of our aims is that we do not want to force people to standardize the way that they record their data. So I'm not going to make people call a wild sheep Ovis orientalis, even though that is the Latin name for it. Some people might just say wild sheep. Some people might say Ovis orientalis. Some people might have a code that they use for it. And there's all these different ways that people might record that particular species in their data sets.
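To make the problem concrete, here is a minimal sketch of reconciling these local labels to a single concept identifier, the kind of editorial step described next. The concept URI is a placeholder rather than a real Encyclopedia of Life page, and the code `OVOR`, the lookup table, and the `annotate` helper are hypothetical. The original labels are left untouched; the shared identifier is added alongside them.

```python
# Hypothetical lookup table built during editorial review. The concept URI
# below is a placeholder, not a real Encyclopedia of Life page.
WILD_SHEEP = "https://example.org/taxa/ovis-orientalis"

LABEL_TO_CONCEPT = {
    "wild sheep": WILD_SHEEP,
    "ovis orientalis": WILD_SHEEP,
    "ovor": WILD_SHEEP,  # a hypothetical project-specific code
}


def normalize(label: str) -> str:
    """Case- and whitespace-insensitive key for matching labels."""
    return " ".join(label.lower().split())


def annotate(records):
    """Attach a concept URI to each record; original labels are kept as-is."""
    out = []
    for rec in records:
        concept = LABEL_TO_CONCEPT.get(normalize(rec["taxon"]))
        out.append({**rec, "taxon_concept": concept})
    return out


records = [
    {"dataset": "site-1", "taxon": "Wild sheep"},
    {"dataset": "site-2", "taxon": "Ovis orientalis"},
    {"dataset": "site-3", "taxon": "OVOR"},
    {"dataset": "site-4", "taxon": "unidentified mammal"},
]

annotated = annotate(records)
# Every wild sheep record, however it was labelled, now shares one identifier.
sheep = [r["dataset"] for r in annotated if r["taxon_concept"] == WILD_SHEEP]
print(sheep)  # ['site-1', 'site-2', 'site-3']
```

Unmatched labels get `None`, which leaves them for a human editor rather than guessing.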
00:12:07
Speaker
What we do then, as part of the editorial process with Open Context, is take all the different ways that people have referred to that particular species and use an authoritative identifier. With the taxa, we used the Encyclopedia of Life, which provides concepts for different species. So we can then link those different
00:12:31
Speaker
descriptions of wild sheep to the authoritative link that describes wild sheep. Within Open Context, those data sets are then linked, and they're also linked with anything else on the web that references that concept. So it does linking within Open Context as well as beyond it. Yeah, absolutely. So to answer your question, yeah, we
00:12:54
Speaker
make linked data, or linkable data, and we also reference other people's linked data to annotate the data that we publish and flesh out its meaning. In that way, it makes the data more intelligible and more interoperable, both with other data in Open Context and with data that anybody else is publishing on the web. So it's really intended to make sure that
00:13:23
Speaker
the data in Open Context is related to that bigger ecosystem of information that is the World Wide Web.

Founding Open Context: Motivations and Challenges

00:13:29
Speaker
So how did you guys come up with this idea of making all this linked data and putting it on the website, Open Context? Was there a single moment, or is this a long-term thing that you'd been thinking about and eventually ended up doing? There was a single moment. We actually remember it.
00:13:51
Speaker
I mean, the very beginning of it was in 2000, right? We were driving down the freeway, 580, going to Eric's parents' house, and we were talking about what I mentioned earlier, the frustration of having collected so much data for our dissertations. Wouldn't it be great if there was a place you could go to access that data, and the data other people have collected, so that it would save time in doing your research? And that's where it was born. We decided we needed to
00:14:17
Speaker
create a place where this kind of thing can happen. Obviously, over the years it changed and evolved in very significant ways. The idea of data publishing came many years later, and for us it really came as a result of having started working with researchers to publish, or to share, their data and realizing they had a lot of
00:14:36
Speaker
needs and a lot of concerns about the way their data was visualized and the way other people would access it. It's not just about throwing data on the web. And as we started working with them, we realized it's also not just about making sure you've got your spreadsheet archived, because in the end it may not be very usable if you haven't gone through an editing process to ensure that people can actually understand it.
00:15:03
Speaker
Yeah. So that was it. The other thing is just personal stuff: we really wanted to live in the San Francisco Bay Area. My parents are out here. Sarah's parents are out here. We wanted to have kids. And the option of going for tenure-track archaeology jobs was just not that attractive, because you obviously don't have a lot of wiggle room in where you're going to live. You just go wherever you get a job, if you're lucky enough to get one.
00:15:33
Speaker
And two people with PhDs in Near Eastern Archaeology, you know, yeah, there aren't so many jobs. So we really wanted to try to do something that gave us a little bit more agency about where we could locate ourselves and also
00:15:50
Speaker
this kind of focus on data, which I think is still really hard for people to pursue in mainstream academic positions. That kind of professional world just doesn't suit a focus on digital outcomes, you know, doing stuff like software, playing with data, that kind of thing. It's still very difficult for people to do within conventional academia.
00:16:19
Speaker
All right, I'm here with Jordan Harbinger from The Art of Charm, and he's going to tell us what The Art of Charm podcast is all about. Go ahead, Jordan. Hey, sure. Thanks for the opportunity. Basically, AOC is the show that we wish we'd had 10 years ago, and I'm 34 now. There are a lot of folks in their 20s and 30s, and we look at how we live our lives and the way that we do things, and it's always, if I had only known.
00:16:46
Speaker
What I'm doing with The Art of Charm, and what we're doing as a team here, is bringing together the best minds in pretty much every industry to teach people how to crush it in life: in their relationships, at work, et cetera. It's like having a mix of experienced mentors teaching you their expertise and packing all their research, testing, and tough lessons, School of Hard Knocks or otherwise, into a curriculum. And we're very practical, which is great for your sort of scientific audience as well.
00:17:13
Speaker
Yeah, absolutely. This is great for networking, for learning personal skills that you can use on the job, for finding jobs, and for your relationships with people.
00:17:23
Speaker
Yeah, so we talk about things like body language (the way you sit, stand, walk, and talk), networking, how to follow up with your network, and how to be authentic when you're creating relationships for work. A lot of people think networking is like, here's my business card, give me a call when you want to buy a used car, and it's like, no, it's about giving; it's about relationships. But since people don't have a game plan, they kind of ignore it. Especially in your field, they're probably thinking, oh, I really hope my work stands up for itself someday and I get that promotion, and it's like,
00:17:50
Speaker
well, it's all about who you know. And you can either say, oh, it's all about who you know, and I hate that. Or you can say, thank goodness it's all about who you know, because otherwise I'm never going to be at the top of this industry until it's too late for me to care, right?

Jordan Harbinger's Introduction to The Art of Charm

00:18:05
Speaker
Right, right. All right, so go check out The Art of Charm podcast on iTunes, Stitcher, or wherever you download podcasts. You can also find them at www.theartofcharm.com.
00:18:21
Speaker
I believe Russell has a question. Yeah. So Eric, it sounds like both you and Sarah found out through your own dissertation work that you needed data sets or that you wanted to make yours more available to other researchers.

Evolving Open Context and Web Standards

00:18:35
Speaker
How did you find out about the web standards and about ways that you could use these open web standards to share your data, not only with other archaeologists, but kind of interoperate with other data sets like the Encyclopedia of Life?
00:18:49
Speaker
Yeah, that's a great question. This is where we've gone through so many iterations. Open Context, the site that's up now, is basically in its third major revision, and I'm doing another major revision right now. So we've always been trying to follow what's going on on the web, but also with
00:19:15
Speaker
other kinds of archaeological informatics approaches. Initially, we were going to be using a system at the University of Chicago called XSTAR, which then became OCHRE, a system being developed there for very sophisticated archaeological data management.
00:19:40
Speaker
And OCHRE is really powerful. David Schloen, who is one of the conceptual architects behind it, is on our board of directors. But when we wanted to use it for some of the archaeology work that we wanted to share,
00:20:02
Speaker
it wasn't ready yet, and we had grant deadlines. It was also a somewhat different vision. OCHRE is very useful for active data management: creating data and doing research with the data. What we wanted was to focus more on the dissemination side, and this was back in 2006 or 2007,
00:20:24
Speaker
and there wasn't a lot around. So we took the important conceptual bits of OCHRE, the schema and the organization of the data, and implemented them in a MySQL database with PHP, and that was the original Open Context. We are still very interested in a lot of the
00:20:49
Speaker
kinds of information organization approaches that OCHRE and other researchers are developing. Now we're working much more with linked data. A lot of the new version of Open Context is retaining
00:21:06
Speaker
some of the key data structures that we got from OCHRE. Their organization system was called ArchaeoML, the Archaeological Markup Language. They're not using that term anymore, but we're retaining the key ways of organizing information that they developed, while also using more linked data to reference controlled vocabularies, ontologies,
00:21:32
Speaker
and other data sets that other people are publishing, not just at the University of Chicago but in lots of different places around the web. So there's a lot of iteration with this, and one of the key things is that you're never really done, because the world of standards is constantly evolving, expectations change, and technologies change. It really is a full-time effort to keep up with all this.
00:22:01
Speaker
And I think keeping up means, Eric, at least as it seems to me, really keeping your finger on the pulse of the digital humanities community and what's happening in those areas. So it's
00:22:16
Speaker
being aware of these changes that are happening and thinking about how we might implement these various new technologies is very important to sort of not become, you know, cemented in one approach.

Guidance on Data Submission

00:22:27
Speaker
Yeah. And if there are some listeners out there who want to submit something to Open Context, do you have some advice for before they submit, and then some advice on what the process is and how to make it run smoothly when they do submit something to you guys?
00:22:47
Speaker
So yeah, on our website we have tips and good practices for how to prepare your data for being on the web. We also have a project cost estimation form that someone can fill out just to get an idea of how much
00:23:07
Speaker
publication and archiving of their project might cost. This is especially for NSF applicants, because NSF requires a data management plan. So we offer this estimation form on Open Context for people to fill out. Filling it out is not a commitment; it's just an estimate. And you get an email with a lot of tips about things to put in your data management plan and about ways to work with your data to make it easier to publish.
00:23:38
Speaker
What has emerged from working with researchers firsthand on publishing their data, and from the back-and-forth editorial process to prepare it for publication, is that we, and researchers too, are starting to realize that there are things they had assumed were common practices or standard approaches that actually aren't.
00:24:02
Speaker
The way people implement different standards or methodological approaches actually varies pretty wildly across the research community. And it's something that nobody really noticed looking at the summarized data and summary tables that people put in publications. You say you're using the method of whoever you cited, and everyone goes, OK, great, I know that method, I understand these data. Well, it turns out that people make their own little tweaks and changes to it, or they
00:24:30
Speaker
interpret the methodology a little differently. So when you start looking at their actual data sets, these differences emerge. We have heard people say, wow, I would never have thought of doing it that way, and I really like that approach. So what we're thinking is that over time, new approaches and new standards will emerge that are a lot better informed, and the research community will be able to make a lot more valid and realistic comparisons across data sets.
00:24:57
Speaker
I think that's a really good point, because as we publish data, one of the things we're also publishing is the ways that people organize their data. That modeling aspect of archaeological research, the way that you organize your information, has really just been done very informally, as a background thing. Nobody ever cared to look at it, and nobody really talked about it much.
00:25:25
Speaker
And as more data gets published and shared, the ways that people organize and model their information become really critical for future downstream uses of those data. There are good ways of modeling data, and there are some not-so-great ways of modeling data. And I think that sharing not just the data but also the models is one of the interesting areas where this new frontier of open data is actually going to help advance the discipline.

Cultural Challenges in Data Sharing

00:25:54
Speaker
So, Sarah, it sounds like you've both learned quite a bit about the culture of archaeology and archaeologists just from having collected people's data and finding out about assumptions they had that weren't shared by other colleagues. Can you speak a little bit more about any other cultural obstacles to sharing data, or to getting people to agree on a standard, or on whether there should even be standards in archaeological data?
00:26:18
Speaker
Yes, that's an interesting topic. When you mention standards, people either seethe or they nod their heads eagerly. People have very different ideas about the use of standards. The approach of using linked open data to facilitate ties, or links, across data sets has been wonderful, because it has helped
00:26:44
Speaker
let us have that conversation about standards without having to move toward some kind of agreement about standards, because it kind of solves that problem: you don't have to tell people, we're insisting that you change the way you do things. Another thing, working with people on data sharing, is people's willingness to share data. It's been really interesting.
00:27:11
Speaker
In the beginning, when we first started doing this, people always said to us, oh, of course the grad students will do it, right, but the older people won't. Or they said, oh, of course the older people will do it, but the grad students won't. Everyone had their assumptions about who would be eager to share data and who wouldn't. Over the years, for us, it really seems more like it's about the person and their confidence in their work and their
00:27:37
Speaker
eagerness to get their data out there. It hasn't been about their juniorness or seniorness. It's just been person to person, and it seems almost more personality-based. Having said that, there are a lot of people who are still very anxious about publishing or sharing their data, but there are so many people who want to do it that right now we're just happy working with the people who want to share data and learning about data sharing in that process.
00:28:04
Speaker
And I think it's just becoming so much more common now that it's happening, and it's going to happen.

Motivations for Data Sharing

00:28:14
Speaker
So one project that we did recently, and that we've now published on, was funded partly by the Encyclopedia of Life and by the National Endowment for the Humanities. It was the project where we shared,
00:28:27
Speaker
as I mentioned earlier, over a dozen data sets from Anatolia. The interesting thing about getting that project going was that we worked with our colleague Ben Arbuckle, who's at the University of North Carolina at Chapel Hill now. He had already been talking for several years with all these different researchers about doing this project. He wanted to do a data sharing project, so he was getting the ball rolling and establishing relationships with all of them. And they trusted him because he had been
00:28:56
Speaker
working with them for years. When the funding came through, it was sort of sudden, and he told them, oh my gosh, it's happening. They were already all on board because he had that relationship established with them. That was something we found very interesting: everybody jumped on and said, great, let's do it. We were surprised by that. We thought there'd be a little more reluctance to put data out there. But I think it had to do with the fact that there was this person whom they trusted and who had already established that this was going to happen.
00:29:25
Speaker
And it actually went really smoothly. Yeah. Yeah, I think personal networks of trust are really important in the whole thing. The other thing was that the zooarchaeologists, in that case, all had an interest in seeing each other's data too. So there are a couple of different motivations for people publishing with us. One is the sort of exhibitionism of individual researchers. Sometimes they want to show off how
00:29:54
Speaker
awesome their research is and the richness of the finds that they've got, the sort of sophistication of their recording. Those are the kinds of things that, you know, data publishing can actually help demonstrate. It's a way of sort of advertising really the sort of the richness of your own research program.
00:30:17
Speaker
And then the second thing with the zooarchaeologists was that they were really interested in seeing each other's data and working with each other's data. So they had a vested interest in collaborating with each other. That was a tight-knit community, and Ben Arbuckle really fostered it by building up those collaborative ties, those ties of trust. And they had a mutual interest in seeing each other's research. Well, and increasingly, another motivation, which is
00:30:43
Speaker
interesting to see in recent years, is that people know it's something they should be doing. So rather than having to be convinced or strong-armed into doing it, it's more like, oh my God, I've got to get my data together and do it. It's more of a time issue than anything, I think. And have either of you two seen, on the back end of Open Context, people reusing the data sets who maybe weren't the original contributors, say, grad students doing a thesis or a dissertation?

Anecdotes of Data Reuse

00:31:13
Speaker
Yeah, so with the zooarchaeology data, one of the big things going on right now is that the data set is being reused over and over again in teaching, and also by different people wanting to write different papers about it. So there's a lot of activity around that one.
00:31:34
Speaker
Other reuse that we see around Open Context is in classes. There have been classes using the API, the Application Programming Interface, which is a way of getting machine-readable data out of Open Context. They've essentially been using it to develop, say, geospatial data skills or web data skills. All right, so as far as the way we understand reuse,
00:31:59
Speaker
a lot of it is going to be anecdotal, in the sense that it's what people tell us. Say when we're at conferences or we get emails from people, we hear about how the data that we publish gets reused. And a lot of it is being reused now in an instructional context, especially where
00:32:23
Speaker
some of the data we've got available is used to teach data analysis and research skills with primary data. So that's a really important area of reuse. We also know about people
00:32:38
Speaker
reusing the data for research applications, people who want to publish papers. With that zooarchaeology dataset from Anatolia, the initial publication came out in PLOS ONE over the summer, and some of the other people participating in that project want to publish other papers reusing the same dataset to address other kinds of issues. But we mainly understand these things anecdotally.
00:33:09
Speaker
Where we could learn more about reuse is by collecting more information in terms of web statistics and web analytics. The problem is that we want to be very careful about web analytics, because there are political sensitivities in archaeological data. If we collect
00:33:34
Speaker
data about how individual users are using Open Context, that information could be politically sensitive. It could get some people into trouble. This is one of the reasons why we adopt the privacy guidelines that the American Library Association has developed for patron data. We want to make sure that we're collecting as little as possible about individuals,
00:34:04
Speaker
because you never know how this information could be used against individuals. It's an important aspect of academic freedom.
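The episode doesn't describe how Open Context handles its logs. As one hypothetical illustration of "collecting as little as possible about individuals," a site can truncate visitor IP addresses before anything is stored, so aggregate traffic can still be counted while individual visitors become harder to identify. The function names and prefix lengths below are assumptions made for this sketch, not Open Context's implementation.

```python
import ipaddress


def anonymize_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero out the host part of an address before it is logged.

    The /24 (IPv4) and /48 (IPv6) prefix lengths are illustrative assumptions.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False allows building a network from an address that has host bits set
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def minimal_log_entry(addr: str, path: str) -> dict:
    """Keep only what aggregate statistics need: a coarse address and the page.

    No user agent, referrer, cookie, or account identifier is recorded.
    """
    return {"ip": anonymize_ip(addr), "path": path}


print(minimal_log_entry("203.0.113.77", "/subjects/example"))  # hypothetical path
print(anonymize_ip("2001:db8:abcd:12:1::5"))
```

Truncation reduces identifiability but does not make logs fully anonymous on its own; the design choice here is simply never to write the full address in the first place.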

Scaling Open Context

00:34:11
Speaker
So this is going to be a slightly
00:34:14
Speaker
off-topic question from what we've been talking about, but I've been thinking about this for a couple of minutes: if each object or entry gets its own page, how many entries are you up to? The website must be somewhat massive at this point. Yeah. Well, in the next version we're exposing even more entities. So we're looking at around
00:34:41
Speaker
1.1 million different entities at the moment, especially with the redesign of Open Context. So yes, there are scaling issues. One really useful thing right now is that I'm working closely with the German Archaeological Institute, the DAI, which is part of the German Foreign Ministry.
00:35:07
Speaker
They have pretty large-scale computing facilities, and we're going to be deploying Open Context and its index on their cloud computing infrastructure. That's going to allow us to scale things better and more cheaply than we could by ourselves.
00:35:28
Speaker
That will allow our index to grow and let us keep publishing at that kind of granularity without running into severe memory issues and that sort of thing.
00:35:44
Speaker
And are all these web pages searchable and open to the internet? In other words, could I search for an object and a certain number and find it through Google? Or are some parts open to the web and others not, since, obviously, if you have bots going through 1.1 million different pages, that can really take your server out?
00:36:11
Speaker
Yeah, for the most part, we allow Google on everything. The only area we don't allow Google on is the part of the website that runs the faceted search. Just to explain what faceted search is: it gives a summary of the overall collection, or the overall corpus.
00:36:35
Speaker
It does that across different areas of metadata. One area of metadata would be, say, context, right? Where was something found? Another area of metadata would be authorship. Project information and geospatial information are also different facets.
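The facet summary described here comes down to counting how many records share each value of each metadata field, then letting a user drill down by one value. As a rough illustration only, here is a minimal Python sketch over a few made-up records; the field names (`context`, `project`, `category`), the values, and the helper functions are hypothetical and are not Open Context's schema or code.

```python
from collections import Counter, defaultdict

# Hypothetical records; field names and values are invented for illustration.
RECORDS = [
    {"context": "Turkey", "project": "Project A", "category": "Animal bone"},
    {"context": "Turkey", "project": "Project B", "category": "Animal bone"},
    {"context": "Israel", "project": "Project C", "category": "Pottery"},
]

FACET_FIELDS = ("context", "project", "category")


def facet_counts(records, fields):
    """Count how many records share each value of each facet field."""
    counts = defaultdict(Counter)
    # Every record is visited for every facet, so the cost grows with the
    # size of the (filtered) corpus each time the summary is computed.
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is not None:
                counts[field][value] += 1
    return {field: dict(counter) for field, counter in counts.items()}


def filter_by(records, field, value):
    """Drill down: keep only records matching a selected facet value."""
    return [r for r in records if r.get(field) == value]


summary = facet_counts(RECORDS, FACET_FIELDS)
print(summary["context"])  # {'Turkey': 2, 'Israel': 1}

drilled = facet_counts(filter_by(RECORDS, "context", "Turkey"), FACET_FIELDS)
print(drilled["category"])  # {'Animal bone': 2}
```

Every distinct filter combination produces a new summary to compute, which is why letting a crawler walk through all of them would be costly compared with serving individual record pages.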
00:36:52
Speaker
Computing those facets, counting up how much stuff we have in each of those different kinds of categories, is all pretty computationally expensive. So we don't let Google crawl that, but the individual records of everything, we let Google crawl. Another aspect of that, though, is that yes, everything in Open Context is open. That just
00:37:16
Speaker
made me think about that. So far, we don't deal with information that needs to be kept private for some reason. We've only focused on publishing data that can be totally open access, and authors, the data contributors, have to ensure that they've obtained the proper approvals and agreements in order to have the data published openly. Yeah. One of the things is that, being small,
00:37:46
Speaker
We don't want to get into the liability issues of trying to manage permissions around potentially sensitive information, especially in the context of, say, US archaeology, where there are pretty strict laws regulating site locations and other aspects of archaeological data.
00:38:05
Speaker
We want to make sure that we're not in a situation where, if Open Context gets hacked, we would be legally liable for something, because we definitely don't have the staff to work to the highest standards of IT security and that type of thing. So the openness is part of
00:38:32
Speaker
our mission, but there are also practical reasons for sticking with open data: we want to make sure that we don't get into a realm of security issues that we don't have the expertise or capacity to handle.
00:38:48
Speaker
Having said that, we do work with data contributors to make sure that what goes out on the web is what they want on the web. So if they don't want their exact site location pinpointed, we will work with them to reduce the precision there. Yeah.
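The episode doesn't say how location precision is reduced. One simple approach, shown here as a hypothetical sketch, is to round coordinates to fewer decimal places before publishing. The function name and the default of two decimal places (0.01 degree of latitude is roughly 1.1 km) are assumptions, not Open Context's method.

```python
def generalize_location(lat: float, lon: float, decimals: int = 2):
    """Coarsen a site location by rounding latitude/longitude.

    Hypothetical helper: two decimal places is an illustrative default,
    not a recommendation for any particular site or jurisdiction.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    if decimals < 0:
        raise ValueError("decimals must be non-negative")
    return round(lat, decimals), round(lon, decimals)


print(generalize_location(31.2589, 34.7837))              # (31.26, 34.78)
print(generalize_location(31.2589, 34.7837, decimals=1))  # (31.3, 34.8)
```

Rounding alone is a blunt tool; as the conversation notes, the appropriate precision for a given site is something worked out with the data contributors.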

Recognition for Transparency in Science

00:39:03
Speaker
Speaking of openness, you were recognized by the White House recently, well, about a year ago, I think. Did you get to go to the White House to receive your award?
00:39:15
Speaker
Yeah, we did, and that was pretty amazing and weird. So, just a little bit of background about that: in 2013, in response to a public petition that was, I think, signed by something like 65,000 people, the White House Office of Science and Technology Policy, OSTP,
00:39:44
Speaker
issued a policy recommendation for open access to federally funded research. So if you're a research agency with, I believe, a budget of over $100 million a year or something like that,
00:39:58
Speaker
then you're now required by the White House, this is an executive order, to make sure that the published outcomes of that research are openly available without restriction. The specific implementation of that new order is still up in the air, but that OSTP recommendation also includes some discussion of open data.
00:40:26
Speaker
The White House, in making that announcement, also wanted to recognize different activists, basically, people who've been active in trying to promote more transparency and openness in science. We were recognized as one of those. Actually, there was one other person from the humanities.
00:40:48
Speaker
He's now at the University of Pennsylvania, and he does some amazing work on imaging papyri and other documents.
00:41:04
Speaker
We were the two people recognized who were mainly funded by the National Endowment for the Humanities. NEH was really thrilled about this too, because, along with the Human Genome Project and the Sloan Digital Sky Survey, all of these really big science initiatives funded by the National Institutes of Health and the National Science Foundation, which have budgets orders of magnitude bigger than NEH's,
00:41:32
Speaker
NEH had two people recognized there at the White House, and it was kind of a thrill to have the humanities represented. Yeah, and this was out of, I think, 13 people who were recognized. Eric was among those 13, Eric and this other guy, Will,
00:41:47
Speaker
who were the humanities folks. It was called Champions of Change for Open Science, and it was part of the White House's whole Champions of Change program. I don't know if it's still going on, but it was going on for at least a year, recognizing people who were changing things and making an impact in their communities in various areas. The open science event came out of, I think, the OSTP mandate that Eric mentioned.
00:42:17
Speaker
Yeah, they had a panel, and it's online: a whole hour-long panel discussion with all these people. Unfortunately, the president was not there. We had actually flown back from Germany to go to this thing, and it turns out he had flown to Germany to do something else. So we missed him.
00:42:36
Speaker
On to the future of Open Context: what are your plans? What are some things coming down the pipeline fairly soon? Well, I mean, yeah.

Future Plans for Open Context

00:42:48
Speaker
We would actually love to expand, because we're getting to the point now where we're getting much more data and more people wanting to publish with us. We're now working with
00:43:03
Speaker
the site file managers from many US states, at the SHPOs (State Historic Preservation Offices), and they're publishing site file databases with us. So we're starting to need to worry about scaling things up, and on the technical side, cloud computing is definitely one of the things we're actively working on right now.
00:43:29
Speaker
We're working on a new version of Open Context that uses Python as the programming language, Django as the web framework, Postgres as the backend database, and Apache Solr for faceted search. Those are all really neat technologies that scale well.
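Since Solr is named here as the faceted search engine, this is a hedged sketch of what asking Solr for facet counts can look like, using standard Solr request parameters (`q`, `rows`, `facet`, `facet.field`, `facet.mincount`, `wt`). The host, the core name `example_core`, and the field names are placeholders, not Open Context's configuration, and the sketch only builds the request URL and parses a made-up response instead of contacting a server.

```python
from urllib.parse import urlencode

# Placeholder Solr location and core name (assumptions for this sketch).
SOLR_SELECT = "http://localhost:8983/solr/example_core/select"


def facet_query_url(query="*:*", fields=("context", "project"), mincount=1):
    """Build a Solr request that returns facet counts but no documents."""
    params = [
        ("q", query),
        ("rows", "0"),  # only the counts are wanted, not the matching records
        ("facet", "true"),
        ("facet.mincount", str(mincount)),  # skip values with fewer hits
        ("wt", "json"),
    ]
    # facet.field is repeated once per field to facet on
    params += [("facet.field", f) for f in fields]
    return f"{SOLR_SELECT}?{urlencode(params)}"


def parse_facet_field(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# A made-up response fragment shaped like Solr's facet_counts section.
sample = {"facet_counts": {"facet_fields": {"context": ["Turkey", 2, "Israel", 1]}}}

print(facet_query_url())
print(parse_facet_field(sample["facet_counts"]["facet_fields"]["context"]))
```

Setting `rows=0` keeps the response small when only the facet summary is needed; the flat list shape shown is Solr's default JSON layout for `facet_fields`.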
00:43:49
Speaker
The data we publish is going to be a lot cleaner, and it's going to use much more up-to-date web standards that will be really useful for people wanting to do geospatial work and linked data work. So that's what's going on on the technical side. On the non-technical side, organizationally, we're actually
00:44:15
Speaker
really wanting to find ways of financing giving away other people's data for free. Part of that is developing more ties with the library community to make sure that data management
00:44:35
Speaker
and grant budgets and things like that can help support this kind of thing. But the other side of this is, honestly, we probably need to raise
00:44:47
Speaker
something of an endowment, actually, to help sustain this kind of thing. So we're starting to build up an effort in that area to help subsidize and underwrite the costs of the editorial processes, the data cleanup, the technology development, all the things that go into making high-quality data accessible.
00:45:11
Speaker
Yeah, because I'm sure this is what listeners are going to wonder: how is this kind of thing maintained? We get that question a whole lot. Since its inception, we've had a variety of support from private and federal granting agencies. We've done some consulting, and now we're actually getting
00:45:31
Speaker
a bit of revenue from people actually putting data publishing costs and archiving costs into their grant applications. So there are a variety of ways of funding this thing.
00:45:42
Speaker
So far, we write a lot of grant applications. So we are really exploring ways to make this a more permanent thing, so that we can focus on doing the work rather than writing quite so many grant applications year to year. Could you go into a little more detail about what you're talking about with starting an endowment? Yeah. This is an area that ultimately
00:46:11
Speaker
Open data is, in some ways, almost a perfect public good, right? If data is open, if it has no copyright or legal restrictions associated with it, and no technical restrictions around formats or anything like that, it can flow anywhere and be used and reused by anybody. And the big challenge in that is
00:46:41
Speaker
how to do that sustainably, because most cost recovery mechanisms, say, those associated with conventional publishing, where you have subscription or access charges, something like that, those kinds of ways of financing dissemination
00:47:00
Speaker
don't work well with data, because they break the utility, the value, of the data. So we need to find a way of financing open data, because open data is the most valuable kind of data. Ultimately, that means we can't charge subscriptions; we can't use a lot of the conventional approaches of
00:47:21
Speaker
market monetization. To sustain this, an endowment is one strategy. Now, the endowment doesn't necessarily need to support all of our operating costs, right? So essentially it's a mixed model: an endowment to help underwrite part of it, and then other cost recovery coming in through data management charges,
00:47:47
Speaker
that type of thing. That's probably the realistic approach. If we can raise a bit of money to help maintain basic operations, alongside a revenue stream from our publishing activities, then grants and everything like that can be used for the important work on new technologies, expanding, that sort of thing.
00:48:17
Speaker
So yeah, we're in the middle right now of cultivating donors. So if you know of any, send them our way. Great. Well, thank you so much. That was Sarah and Eric from Open Context.

Conclusion and Resources

00:48:31
Speaker
This was another episode of the ArchaeoTech Podcast. And thank you so much for listening. Thank you so much, Doug. And thank you so much, Russell. Thanks for having us.
00:48:48
Speaker
Still recording on paper in the field? Hate having to process hundreds of site records when you get back to the office and would rather go straight to report writing and research? DigTech has the answer. Hi, I'm Chris Webster, founder of DigTech LLC, a disabled-veteran-owned CRM firm and archaeological technology research and development firm. At DigTech, we're creating applications for smartphones and tablets that will increase efficiency in the field, keep archaeologists doing what they love, archaeology, and reduce the amount of busy work in the office. Some of what we do involves enhancing existing third-party applications that are already on the app stores.
00:49:17
Speaker
Use our consultation form on the website at www.digtech-llc.com/tablet, and we'll help you figure out what digital solution is best for you. The cost of going digital is a lot less than you think, and once you do it, you'll wonder why you ever recorded on paper to begin with. Contact Chris at DigTech, the parent company of the Archaeology Podcast Network, today, and let DigTech help you save paper, save time, save resources, and go digital. Now back to the show.
00:49:49
Speaker
That's it for another episode of the ArchaeoTech Podcast. Links to some of the items mentioned on the show are in the show notes for this podcast, which can be found at www.archaeologypodcastnetwork.com/archaeotech. If you like the show and want to comment, please do. You can leave comments about this or any other episode on the website or on the iTunes page for this episode.
00:50:08
Speaker
You can also email us at chris@archaeologypodcastnetwork.com or use the contact form on the podcast webpage. If you'd like us to answer a question on a future episode, email us, use the contact form on the website, or tweet your questions with the hashtag #archaeotech or tag @arcpodnet in your tweet. Please share the link to this show wherever you saw it. If you'd like to subscribe to this podcast, you can do so on iTunes or on Stitcher Radio. You can also type the name of the podcast into your favorite podcasting app and subscribe that way.
00:50:35
Speaker
Don't forget to go over to iTunes and leave a review of the show. It helps us get noticed so more people can find our podcast and benefit from the content. Also, send us show suggestions and interview suggestions. We want this to be a resource for field technicians everywhere and we want to know what you want to know about it.
00:50:54
Speaker
This has been a presentation of the Archaeology Podcast Network. Visit us on the web for show notes and other podcasts at www.archaeologypodcastnetwork.com. Contact us at chris@archaeologypodcastnetwork.com.
00:51:13
Speaker
Thanks again for listening to this episode and for supporting the Archaeology Podcast Network. If you want these shows to keep going, consider becoming a member for just $7.99 US dollars a month. That's cheaper than a venti quad eggnog latte. Go to arcpodnet.com/members for more info.