I first read about it on Reddit, followed shortly by the CANLIB-DATA Listserv, but as of today Google has a new search engine dedicated to research data sets, the cleverly-named Google Dataset Search.
The good: Surfacing this stuff is great! Google is using schema.org to discover stuff, and has a pretty extensive page on how this all works. Results link through to Google Scholar to show who has cited a dataset. Likely not comprehensive, but a good start. Oh wait, it's not very good at all - I thought it was linking to the DOI, but it's just some sort of keyword linking. I just found a declassified Los Alamos report from 1957 in the top spot that supposedly links to one of these datasets :-/ Right idea, totally the wrong approach.
The annoying: Just as with Google Scholar, there's no way to know exactly what is and isn't being indexed. Also annoying not to have a count on the number of results. I can't get the "share" button to work, but that may very well be something specific with my browser and some extension - not a huge deal right now.
The weird: Of course I did a search for MPOW, and "University of Calgary" auto-suggests to a record about our institutional repository, but nothing from within our IR. Do we not conform with schema.org? (entirely possible). Why is that link from the French version of the National Research Council Canada?
The bad: No filters or facets of any sort - boo!
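On the "do we conform with schema.org?" question above, here is a minimal sketch of how one might check a repository landing page for Dataset markup. It assumes the markup is embedded as JSON-LD in a `<script type="application/ld+json">` tag, which is only one of the ways schema.org data can be published; microdata, RDFa and nested `@graph` structures are not handled. The sample page and its record are made up purely for illustration, and this is not a description of how Google's crawler actually works.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_datasets(html):
    """Return every top-level JSON-LD object whose @type includes Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    found = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup rather than crash
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Dataset" in types:
                found.append(item)
    return found


# Hypothetical landing page, invented for this example.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Hypothetical survey data",
 "description": "A made-up record used only to exercise the parser."}
</script>
</head><body>A repository landing page.</body></html>
"""

for dataset in find_datasets(SAMPLE_PAGE):
    print(dataset["name"])
```

If a page from the IR comes back with an empty list, that would at least be consistent with the markup not being there in this form, though it would not prove anything about why Google skipped it.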
I have already found a couple of datasets of interest, and one that eventually led through to a deleted dataset, making me wonder how fresh the index is.
Definitely one to watch!
Somehow I wasn't subscribed to the Google Scholar blog already, and missed last week's announcement that they'd made some tweaks to the interface of Google Scholar. Today though I was presented with a big link under the search box to the post on their blog touting Better ways of getting around. I've gotta say I found their screenshot pretty confusing. In a nutshell, they've moved things like Alerts and Metrics to the hamburger menu which appears to the left of the words "Google Scholar" on all the results pages.
Feb 2, 2017 update: improvements have been made!
The other day I saw a headline about a new extension for the Chrome browser from the Internet Archive, Wayback Machine. I had initially ignored this, assuming it was the same thing as Wayback Chrome, but upon further inspection, the latter, which has been around for quite some time, is from a third party, not the IA itself. That said, I'm keeping them both installed, and here's why.
Wayback Chrome, the earlier extension, has always worked upon a mouse click. You land on a page, see that it doesn't resolve, and then click the Wayback button, which takes you directly to the archived version at the Internet Archive. I just tried it for http://change.gov which initially lands me at:
I then hit the Wayback button in my toolbar and am taken to the archived website.
This newer extension tries to be more helpful in the following way (emphases mine):
By using the “Wayback Machine” extension for Chrome, users are automatically offered the opportunity to view archived pages whenever any one of several error conditions, including code 404, or “page not found,” are encountered. If those codes are detected, the Wayback Machine extension silently queries the Wayback Machine, in real-time, to see if an archived version is available. If one is available, a notice is displayed via Chrome, offering the user the option to see the archived page.
That means it doesn't work at all for my previous example, because http://change.gov doesn't return a 404 error code, it returns a, um, I can't figure out what code it's returning :-(
I also can't find a single page where it DOES work, because I can't think of how to find a disappeared page and I've already spent 20 minutes trying to find an example. Hey Archive.org, can you put an example on your blog post please? OK, I kept looking and finally found an example for you: http://www.supremecourt.ohio.gov/publications/annrep/IOCS/2011OCS.pdf Hit that with the new extension installed and you'll be offered a visit to the archived version. Which OMG doesn't work!!!!!!
FFS. I guess I'll keep it installed for those times it might pop up and actually prove handy, but I'm going to continue to rely upon the original Wayback Chrome extension I've been using.
Oh, I looked again at my post title. Maybe it doesn't suck, but it's sure not as useful as it might be! Much too restrictive in the error codes it acts upon IMHO.
Update on Jan 17, 2017: I've been emailing back and forth with the Director of the Wayback Machine and have agreed that the word "sucks" is too strong, so have changed that out from the title of the blog post :-) He gave me a URL to test with, http://www.pfaw.org/attacks.htm and mentioned that URL does appear on the Chrome web store, and it has also been added to the original blog post announcing the tool. Finally, he clarified that "http://change.gov results in a DNS error. The 'Wayback Machine' extension does not detect DNS errors at this time. We are looking into adding that feature. Note that we ARE detecting error conditions: 404, 408, 410, 451, 500, 502, 503, 504, 509, 520, 521, 523, 524, 525, and 526. We are also looking to add a persistent UI to save and lookup URLs on demand."
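To make the behaviour described above a bit more concrete, here is a rough sketch of the same idea: only bother the Wayback Machine when the status code is one of those listed, then ask whether a snapshot exists. It uses the Internet Archive's public availability endpoint at `https://archive.org/wayback/available`; the function names are my own, and I am not claiming this is how the extension itself is written.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The error conditions listed in the Director's email.
TRIGGER_CODES = {404, 408, 410, 451, 500, 502, 503, 504, 509,
                 520, 521, 523, 524, 525, 526}

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def should_check_archive(status_code):
    """True when the HTTP status is one that triggers an archive lookup."""
    return status_code in TRIGGER_CODES


def availability_query(url):
    """Build the availability request for the given page URL."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(payload):
    """Pull the archived URL out of an availability response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, status_code):
    """Query the archive (network call) and return a snapshot URL or None."""
    if not should_check_archive(status_code):
        return None
    with urlopen(availability_query(url), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

Something like `lookup("http://www.pfaw.org/attacks.htm", 404)` would do the actual request. Note that a DNS failure never produces a status code in the first place, so a check keyed on status codes has nothing to act on, which fits what happened with http://change.gov.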
As I did last year, I have captured all the tweets from this year's Semantic Web in Libraries conference using the TAGS tool for Google Sheets. As of this posting there are 630 tweets with the hashtag #SWIB, 573 of them unique. Last year had 1,736 tweets, with 1,633 of them unique!
Here's the archive for 2016. Have fun!
Want to impress the young folk in your life with an awesome book or two this holiday season?
Want to avoid the disappointed / horrified / mortified looks of these same young people when you commit stunning gaffes with selections like Boy’s Body Book or Golden Girls Forever or A History of Interest Rates (4th ed.)?
Or perhaps you’re aiming bigger and want to turn them into life-long readers or at least temporarily redirect their attention to something other than social media, their phones or computer games?
Then this list may just get you on your way.
Check it out!
Mita Williams has an excellent blog post titled Why Libraries Should Maintain the Open Data of Their Communities. It's a long (for a blog post) but important read, and includes an excellent history of how Canadian government data has evolved, and how it compares (poorly) to US government data.
I've been making the same case for Libraries and Open Data myself this year, but in a much less eloquent and scholarly way :-) While Mita's post is based on slightly older research (2014), the bibliography is still a great place to learn pretty much all you need to know on the subject. Having similarly researched over the course of this year, I sadly don't think much has come out in the interim, except maybe Brian Jackson's 2015 article, The State of Canadian Library Data.
If you're at all interested in the role libraries can or should play when it comes to Open Data, you owe it to yourself to carve out some time to give Mita's post a read.
Surely someone smarter than me can think of something interesting to do with the following?
Each of these contains citation information for peer reviewed scientific and technical articles published and authored or co-authored by National Research Council scientists and researchers. Includes .csv, .xlsx, .xml, and .ris files.
Is there any significance to the fact that the numbers have fallen each year from 2012-2015?
Update: It's an hour or so later and I thought I'd take a stab at a citation network using VOSviewer, which is pretty cool, but as soon as I saw how dirty the NRC bibliographic data was I gave up. Good luck if you're going at it! :-(
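For anyone who does want to poke at the falling numbers, here is a small sketch that counts records per publication year in a .ris export. It assumes the usual RIS conventions (a `PY` or `Y1` tag carrying the year, and `ER` closing each record); I have not checked that the NRC files follow them cleanly, and given how dirty the data looked they may well not. The sample records are invented.

```python
import re
from collections import Counter

# RIS tags look like "PY  - 2012"; some exporters use Y1 for the year.
YEAR_TAG = re.compile(r"^(?:PY|Y1)\s*-\s*(\d{4})")
END_TAG = re.compile(r"^ER\s*-")


def records_per_year(ris_text):
    """Count RIS records by the first year tag found in each record."""
    counts = Counter()
    year = None
    for raw_line in ris_text.splitlines():
        line = raw_line.strip()
        match = YEAR_TAG.match(line)
        if match and year is None:
            year = match.group(1)
        elif END_TAG.match(line):
            # Records with no usable year are grouped under "unknown".
            counts[year or "unknown"] += 1
            year = None
    return counts


# Made-up records, only here to show the shape of the input.
SAMPLE_RIS = """TY  - JOUR
TI  - A hypothetical article
PY  - 2012
ER  -
TY  - JOUR
TI  - Another hypothetical article
PY  - 2012/03/01/
ER  -
TY  - JOUR
TI  - A record with no year
ER  -
"""

for year, total in sorted(records_per_year(SAMPLE_RIS).items()):
    print(year, total)
```

The "unknown" bucket is deliberate: with messy data, it seems more useful to see how many records lack a year than to drop them silently.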
I'm about halfway through an excellent Coursera MOOC called Getting and Cleaning Data, part of my plan to learn R. We were just shown a slide listing a bunch of collections where we could find data sets to play with, and the first one on the list had disappeared. Turns out it was originally created by Hilary Mason, who apparently is someone I should know about in the data world! Hilary used to be the chief scientist at bitly.com, and the disappearance of this excellent list makes me wonder if they had a falling out. Regardless, and even though it's available on Archive.org, I thought I'd recreate it myself as a Delicious Tag Bundle, so I did. I added annotations to many of them, weeded out one or two that were really dead and gone, and added a couple as well, so it's not quite canonical ;-) Of course Delicious has been pretty flaky of late, so fingers crossed IT doesn't disappear!
Earlier this month I happened across a blog post from a company called OpenDataSoft (ODS) in which they described how they put together and mapped a list of over 1,600 Open Data portals around the world. I thought that was pretty cool, and did a little exploration of their web-based platform and decided to try my hand at some data enrichment and publishing. I trolled the City of Calgary Open Data website for something I thought might work well and settled upon their List of City Amenities (dog parks, EMS stations, arenas and such). While they offer all the data as a .csv file, the geographic / mapping file was completely separate, and I wanted to try to put it all together into one nifty application.
For my purposes, the .csv provided by the city was juuuust short; it didn't include any postal codes with the addresses, so I couldn't automatically generate a map with ODS. I ended up throwing the addresses at http://geocoder.ca/ which allowed me to grab the postal codes. Then I realized that at this time, ODS only maps postal codes in France (they're based in Paris). So I grabbed an API key from Google Maps (linked to from within ODS) and THEN I was able to generate the desired map. There were a few outliers that I had to manually correct, but here's the result. Note the filters on the left - those were added with an option in ODS, and then there's the separate map tab, which again is an enhancement over what the City provided. I did take a look at the City's .kml file in Google Earth - it wasn't very useful, or accurate, IMHO.
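The enrichment step described above boils down to: look up each address, add a postal code column, and set aside whatever didn't match for manual fixing. Here is a minimal sketch of that loop. The column names (`name`, `type`, `address`), the sample rows and the postal code are all made up, and the lookup is a plain dictionary standing in for a real geocoding service; I am not reproducing the geocoder.ca or Google Maps APIs here.

```python
import csv
import io


def add_postal_codes(rows, lookup):
    """Add a postal_code column to each row using the supplied lookup.

    `lookup` is any callable that takes an address and returns a postal
    code or None. Rows with no match are also returned separately so
    they can be corrected by hand.
    """
    enriched, needs_review = [], []
    for row in rows:
        code = lookup(row["address"])
        new_row = dict(row, postal_code=code or "")
        enriched.append(new_row)
        if not code:
            needs_review.append(new_row)
    return enriched, needs_review


def filter_by_type(rows, amenity_type):
    """Keep only rows of one amenity type, e.g. for a filtered export."""
    return [row for row in rows if row["type"] == amenity_type]


# Hypothetical data: not the City's actual file or real addresses.
SAMPLE_CSV = """name,type,address
Hypothetical Pool,Outdoor Pool,123 Example St SW
Hypothetical Arena,Arena,456 Sample Ave NE
"""

# Stand-in for a geocoding service; the postal code is invented.
FAKE_GEOCODER = {"123 Example St SW": "T2P 0A1"}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
enriched, needs_review = add_postal_codes(rows, FAKE_GEOCODER.get)

print(len(enriched), "rows enriched;", len(needs_review), "need manual review")
```

Passing the lookup in as a function keeps the loop the same whichever service ends up supplying the codes.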
There are a number of share/embed options, including the ability to download / share only filtered results. Here's a map of Calgary's Outdoor Pools (we're not a big outdoor pool city, as you might imagine):
Anyhoo, do take a look at OpenDataSoft - all this was done for free, though if you're planning to work with lots of data sets you'll have to pay, I think.
Here's a thought-provoking talk given by Simone Kortekaas of Utrecht University Library in the Netherlands at this year's UKSG conference. In it, she talks about how they decided to do away with their discovery tool and steer users to Google Scholar, Web of Science, and Scopus. Utrecht appears to be a science-heavy institution. The title is a bit off, as they do still run their traditional catalogue for now, but still, their statistics showed their users were using tools other than those built by the library, so that's where they focused their efforts. Think you could get away with this at your school? Where are YOUR users actually starting their research?
Thanks for the link Heather!