A couple of months ago I posted about using Flourish to build a bar chart race data visualization. Shortly after that Flourish had another blog post highlighting all the different ways their site could be used to visualize election data, and I've had it bookmarked waiting for another opportunity to experiment with the service. Earlier this week I learned of a wonderful new data set, Canadian Federal Candidates: 1867-2017, by Semra Sevi, and I gave it a go.
This is admittedly very quick and dirty, but I'm pretty happy with it so far. I did some text manipulation with OpenRefine to take care of what was originally presented in CAPS, and I've actually done a bunch of normalization of the Occupation field, which deserves more exploration, but the original purpose of the data set seems to be to explore gender, so that's what I initially focused on too. Oh, and this is only showing those candidates who were elected, while the original data set includes candidates who didn't win election as well.
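For anyone who would rather script that cleanup than click through OpenRefine, here is a minimal sketch of the same two steps in Python: title-casing the names that arrive in CAPS and keeping only the elected candidates. The column names (`candidate_name`, `elected`) and the sample rows are made up for illustration and will not match the real data set, and this is not what I actually ran.

```python
import csv
import io

# Made-up sample rows; the real data set's column names and values may differ.
SAMPLE = """candidate_name,gender,elected
DOE JANE,F,1
ROE RICHARD,M,0
"""


def clean_rows(text):
    """Title-case the name column and keep only elected candidates."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumed encoding: "1" means the candidate won the seat.
        if row["elected"] != "1":
            continue
        row["candidate_name"] = row["candidate_name"].title()
        rows.append(row)
    return rows


cleaned = clean_rows(SAMPLE)
print(cleaned)
```

Note that `str.title()` is blunt: it will turn MACDONALD into Macdonald, so names like that still need a manual pass.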
So you'll want to play around with the stuff in the pale yellow box to the right of the screen. The play button will move from the default of all data to showing data for each Parliament from 01 to 42. You'll want to choose Group by Gender and Shade by Gender, and then just play around with the other options to watch the data change. That's pretty much the whole reason I chose the Survey Response type visualization, because of all the flying changes ;-) It'll take a moment for the embed to load below...
I want to do some more data cleanup on the date fields, which may provide additional ways to show the data on a timeline. I also want to explore mapping the data - I think Flourish is expecting country names, but it should work at the Provincial level as well. TWT.
A couple of notes on the process with Flourish. I had a minor heart attack when I explored including a background image behind the visualization, and it seemed I couldn't get rid of it again. I ended up fixing it by uploading a new image (I had originally pointed to an online image), and once I'd done that I then was able to delete it from the box, but that wasn't an option when I was pointing to a web-hosted image.
I also wish there were some more options to customize the display. In particular, I couldn't figure out how to include a label on the slider, which is showing the Parliament number. I'd also love to insist that Flourish didn't show something when there was no data to show (like a Province that didn't exist before a certain year). Oh, and a way to set the default groupings and shadings would be nice too.
I might try to take some time to see how Tableau compares...
I guess I'll wait to see what TV+ brings later before declaring yesterday's Apple event a total yawner, but it did remind me to check out Flipster and Browzine again before spending any time and money on News+. What are Flipster and Browzine, you ask? They're digital magazine and journal platforms that you might want to check before you shell out money to Apple, since you may already be able to access the content you want for free.
Flipster is available to me through my local public library, and offers only 51 titles, though what your library offers likely differs. You can browse all potentially-available titles on the EBSCO Flipster site.
Browzine is more oriented towards scholarly literature, and is thus likely offered through your local college or university. I couldn't find a list of individual titles, but here's a list of the publishers they cover.
Both offer apps for your iOS (and Android) devices, and both work quite nicely.
Oh, and for newspapers, poke around your public library web page for PressReader.
Here's a list of the titles available in Canada for News+.
UPDATE: Lifehacker has a similar post, and theirs clued me in to something called RBdigital Magazines, which was new to me. My Public Library has it, and it appears to offer 190 titles, so depending on what they all are, possibly my best bet. Will be exploring...
We've been doing a lot of talking about Data Management Plans at MPOW, and I recently came across the following article from PLOS One: A funder-imposed data publication requirement seldom inspired data sharing.
In it, the authors take a look at 315 research projects that had been funded by the Exxon Valdez Oil Spill Trustee Council (EVOSTC) between 1989 and 2010 to see how many data sets could be obtained. This is an interesting funding agency, as it funded research from many different disciplines, including the Social Sciences. As you might guess, the number of retrieved data sets is pretty darn low (26%), but the single biggest reason was that the authors were unable to reach the original researcher. I guess that shouldn't surprise me as much as it did. The other thing I found surprising is that they found slightly lower retrievability for research that was conducted after the EVOSTC enacted a formal data policy in 1995. I guess that's the sound bite for the article title!
But the other thing we've been talking about at MPOW is what kind of reward might entice researchers to post their data, and this article addressed that as well:
"Inherent in all of the above hurdles to data recovery is the absence of a reward structure for sharing data. There is no reward for the time investment required to learn how to curate data or construct data packages (including well documented and formatted error-free data, with project and file level metadata) in cases where original data curation was insufficient. The focus in science has traditionally been on production and citation of publications, and as such, a process for identification and citation of manuscripts has been well developed. Similar attribution and recognition of data would incentivize the archiving and sharing of data in the scientific community. One way publications are tracked and cited is through the use of digital object identifiers (DOIs), a tool that is also increasingly being used to attribute data. DOIs are particularly important for data identification because unlike manuscripts, data can be regularly updated or exist in multiple formats or subsets, so identifying specific versions of the data via a DOI is key to data proper data attribution and use. Although the use of data DOIs is a relatively new practice, they should facilitate more routine data citation as compared to traditional methods. Incorporating data citations into a scientists’ overall research output alongside journal publications should further incentivize data sharing."
So they're making some claims that they can't back up with this research, but which certainly make sense. The one thing that might get researchers to post their data sets is better recognition in the tenure and promotion process of the work that's involved in making this information available online.
As a snarky aside, I wonder if the EVOSTC website has seen a redesign since it was first launched? :-0
I first read about it on Reddit, followed shortly by the CANLIB-DATA Listserv, but as of today Google has a new search engine dedicated to research data sets, the cleverly-named Google Dataset Search.
The good: Surfacing this stuff is great! Google is using schema.org to discover stuff, and has a pretty extensive page on how this all works. Results link through to Google Scholar to show who has cited a dataset. Likely not comprehensive, but a good start. Oh wait, it's not very good at all - I thought it was linking to the DOI, but it's just some sort of keyword linking. I just found a declassified Los Alamos report from 1957 in the top spot that supposedly links to one of these datasets :-/ Right idea, totally the wrong approach.
The annoying: Just as with Google Scholar, there's no way to know exactly what is and isn't being indexed. Also annoying not to have a count on the number of results. I can't get the "share" button to work, but that may very well be something specific with my browser and some extension - not a huge deal right now.
The weird: Of course I did a search for MPOW, and "University of Calgary" auto-suggests to a record about our institutional repository, but nothing from within our IR. Do we not conform with schema.org? (entirely possible). Why is that link from the French version of the National Research Council Canada?
The bad: No filters or facets of any sort - boo!
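Since I wondered above whether our IR conforms with schema.org, here is a rough sketch of what the minimum markup looks like. It builds a schema.org `Dataset` description as JSON-LD, which a repository would embed in a `<script type="application/ld+json">` tag on the dataset's landing page. The function name, the sample values and the example.org URL are all placeholders of mine, and a real record would normally carry more properties than these three.

```python
import json


def dataset_jsonld(name, description, url):
    """Build a minimal schema.org Dataset description as a JSON-LD string."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    return json.dumps(record, indent=2)


# Placeholder values, not a real data set.
markup = dataset_jsonld(
    "Example survey results",
    "A hypothetical data set used only to illustrate the markup.",
    "https://example.org/datasets/example-survey",
)
print(markup)
```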
I have already found a couple of datasets of interest, and one that eventually led through to a deleted dataset, making me wonder how fresh the index is.
Definitely one to watch!
Somehow I wasn't subscribed to the Google Scholar blog already, and missed last week's announcement that they'd made some tweaks to the interface of Google Scholar. Today though I was presented with a big link under the search box to the post on their blog touting Better ways of getting around. I've gotta say I found their screenshot pretty confusing. In a nutshell, they've moved things like Alerts and Metrics to the hamburger menu which appears to the left of the words "Google Scholar" on all the results pages.
Feb 2, 2017 update: improvements have been made!
The other day I saw a headline about a new extension for the Chrome browser from the Internet Archive, Wayback Machine. I had initially ignored this, assuming it was the same thing as Wayback Chrome, but upon further inspection, the latter, which has been around for quite some time, is from a third party, not the IA itself. That said, I'm keeping them both installed, and here's why.
Wayback Chrome, the earlier extension, has always worked upon a mouse click. You land on a page, see that it doesn't resolve, and then click the Wayback button, which takes you directly to the archived version at the Internet Archive. I just tried it for http://change.gov which initially lands me at:
I then hit the wayback button in my toolbar and am taken to the archived website.
This newer extension tries to be more helpful in the following way (emphases mine):
By using the “Wayback Machine” extension for Chrome, users are automatically offered the opportunity to view archived pages whenever any one of several error conditions, including code 404, or “page not found,” are encountered. If those codes are detected, the Wayback Machine extension silently queries the Wayback Machine, in real-time, to see if an archived version is available. If one is available, a notice is displayed via Chrome, offering the user the option to see the archived page.
That means it doesn't work at all for my previous example, because http://change.gov doesn't return a 404 error code, it returns a, um, I can't figure out what code it's returning :-(
I also can't find a single page where it DOES work, because I can't think of how to find a disappeared page and I've already spent 20 minutes trying to find an example. Hey Archive.org, can you put an example on your blog post please? OK, I kept looking and finally found an example for you: http://www.supremecourt.ohio.gov/publications/annrep/IOCS/2011OCS.pdf Hit that with the new extension installed and you'll be offered a visit to the archived version. Which OMG doesn't work!!!!!!
FFS. I guess I'll keep it installed for those times it might pop up and actually prove handy, but I'm going to continue to rely upon the original Wayback Chrome extension I've been using.
Oh, I looked again at my post title. Maybe it doesn't suck, but it's sure not as useful as it might be! Much too restrictive in the error codes it acts upon IMHO.
Update on Jan 17, 2017: I've been emailing back and forth with the Director of the Wayback Machine and have agreed that the word "sucks" is too strong, so have changed that out from the title of the blog post :-) He gave me a URL to test with, http://www.pfaw.org/attacks.htm and mentioned that URL does appear on the Chrome web store, and it has also been added to the original blog post announcing the tool. Finally, he clarified that "http://change.gov results in a DNS error. The "Wayback Machine" extension does not detect DNS errors at this time. We are looking into adding that feature. Note that we ARE detecting error conditions: 404, 408, 410, 451, 500, 502, 503, 504, 509, 520, 521, 523, 524, 525, and 526 We are also looking to add a persistent UI to save and lookup URLs on demand."
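To make the behaviour described in that update concrete, here is a small sketch of the decision logic as I understand it. The list of status codes comes straight from the email quoted above. Everything else is my assumption: the function names are made up, and I am guessing that the lookup goes through the Internet Archive's public availability endpoint. This is not the extension's actual code.

```python
from urllib.parse import urlencode

# Status codes the extension is said to act on, per the update above.
HANDLED_CODES = {404, 408, 410, 451, 500, 502, 503, 504, 509,
                 520, 521, 523, 524, 525, 526}


def should_offer_archive(status_code):
    """Return True if this response would trigger an archive lookup.

    A DNS failure never produces an HTTP status code, so it is modelled
    here as None and is not handled.
    """
    return status_code in HANDLED_CODES


def availability_query(url):
    """Build a lookup URL for the Wayback Machine availability endpoint."""
    return "https://archive.org/wayback/available?" + urlencode({"url": url})


print(should_offer_archive(404))   # a "page not found" response
print(should_offer_archive(None))  # a DNS error, as with http://change.gov
print(availability_query("http://www.pfaw.org/attacks.htm"))
```

The `None` case is the whole story with http://change.gov: there is no response at all, so there is no code for the extension to match against.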
As I did last year, I have captured all the tweets from this year's Semantic Web in Libraries conference using the TAGS tool for Google Sheets. As of this posting there are 630 tweets with the hashtag #SWIB, 573 of them unique. Last year there were 1,736 tweets, with 1,633 of them unique!
Here's the archive for 2016. Have fun!
Want to impress the young folk in your life with an awesome book or two this holiday season?
Want to avoid the disappointed / horrified / mortified looks of these same young people at stunning gaffes like Boy’s Body Book or Golden Girls Forever or A History of Interest Rates (4th ed.)?
Or perhaps you’re aiming bigger and want to turn them into life-long readers or at least temporarily redirect their attention to something other than social media, their phones or computer games?
Then this list may just get you on your way.
Check it out!
Mita Williams has an excellent blog post titled Why Libraries Should Maintain the Open Data of Their Communities. It's a long (for a blog post) but important read, and includes an excellent history of how Canadian government data has evolved, and how it compares (poorly) to US government data.
I've been making the same case for Libraries and Open Data myself this year, but in a much less eloquent and scholarly way :-) While Mita's post is based on slightly older research (2014), the bibliography is still a great place to learn pretty much all you need to know on the subject. Having done similar research over the course of this year, I sadly don't think much has come out in the interim, except maybe Brian Jackson's 2015 article, The State of Canadian Library Data.
If you're at all interested in the role libraries can or should play when it comes to Open Data, you owe it to yourself to carve out some time to give Mita's post a read.
Surely someone smarter than me can think of something interesting to do with the following?
Each of these contains citation information for peer reviewed scientific and technical articles published and authored or co-authored by National Research Council scientists and researchers. Includes .csv, .xlsx, .xml, and .ris files.
Is there any significance to the fact that the numbers have fallen each year from 2012-2015?
Update: It's an hour or so later and I thought I'd take a stab at a citation network using VOSviewer, which is pretty cool, but as soon as I saw how dirty the NRC bibliographic data was I gave up. Good luck if you're going at it! :-(
I'm about halfway through an excellent Coursera MOOC called Getting and Cleaning Data, part of my plan to learn R. We were just shown a slide listing a bunch of collections where we could find data sets to play with, and the first one on the list had disappeared. Turns out it was originally created by Hilary Mason, who apparently is someone I should know about in the data world! Hilary used to be the chief scientist at bitly.com, and the disappearance of this excellent list makes me wonder if they had a falling out. Regardless, and even though it's available on Archive.org, I thought I'd recreate it myself as a Delicious Tag Bundle, so I did. I added annotations to many of them, weeded out one or two that were really dead and gone, and added a couple as well, so it's not quite canonical ;-) Of course Delicious has been pretty flaky of late, so fingers crossed IT doesn't disappear!