The 7th edition of the Designing Libraries conference just wrapped up here at the U of Calgary, and I thought it'd be fun to capture the tweets using the hashtag #designinglibraries, so here you go:
Here's the full archive (457 tweets as of posting)
Here's a dashboard summary view
Here's a nifty view that allows for easy searching and filtering. (only one tweet even mentioned the weather!)
All this brought to you by the awesome TAGS project.
Update: Here's an article from our campus newspaper: Calgary 'head of the list' of library cities around the world.
One of my hats these days has me supporting students and faculty in the use of NVivo, a piece of software used for qualitative data analysis. Often people want to analyze the text of recorded interviews, and of course that usually requires someone to transcribe the audio into text. I'm always on the lookout for free tools to automate this process.
I didn't realize it was that long ago, but last December I started playing with an automatic transcription tool called AutoEdit2, and found it pretty decent. Yesterday and today, ResearchBuzz led me to two new options, so I thought I'd do a quick comparison.
First up was a link to this announcement on TechCrunch about Deepgram. Then today was a pointer to Hongkiat's How to Transcribe YouTube Videos Automatically, which actually has pointers to several different tools or methods. I was most interested in the new-to-me option of having Google Docs automatically type what it heard into a document. Do note this method requires the use of the Chrome browser - I missed that and couldn't find the option in Firefox when I first tried...
Here's what Deepgram made of a short clip of George W. Bush taking the oath of office:
please your right and never after me i george walk bush too solemnly swear i told what university so this work but i will say for extra office of person of the united states that i was lately excuse a of the united states and roll to the best of my and well the best of my ability sure protect learn the constitution of the united states sarah protect and the fan constitution of the united states saw not like so help me god in congratulations
Here's what Google Docs heard:
The only answer please raise your right hand and repeat after me I George Walker Bush do solemnly swear a George Walker Bush to sell them I swear that I will Faithfully execute the office of President of the United States that I will Faithfully execute the office of President of the United States and will to the best of my ability and will to the best of my ability preserve protect and defend the Constitution of the United States Constitution of the United States
And here's what AutoEdit2 came up with:
the only please raise your right hand and repeat after me I George Walker bush do solemnly swear I George Walker bush do solemnly swear that I will faithfully execute the office of president of the United States that I will faithfully execute the office of president of the United States and will to the best of my ability and will to the best in my ability preserve protect and defend the constitution of the United States preserve protect and defend the constitution of the United States so help me god so help me god congratulate
So it seems to me, on this very short audio sample, that AutoEdit2 did very well, followed by Google Docs, with Deepgram bringing up the rear. Deepgram is the easiest to use, requiring the user to simply upload an audio file or point to either a YouTube video or online audio file. Once installed (macOS only), AutoEdit2 is also very simple, though it does require the file to be local, not online. And Google Docs is a bit of a pain to set up initially, but after a little trial and error it's fairly straightforward and can record either a local file or something streaming online.
Curious to hear if you've played with any of these, or other tools, with longer audio clips, and whether you found the same results.
Some time ago I learned about TheHerdlocker.com from a post on ResearchBuzz. Since then, it's grown a bit, and I've been receiving useful weekly updates to a few hashtags I like to follow on the Twitter.
In a nutshell, rather than you creating a column within Tweetdeck to collect all mentions of a given hashtag, TheHerdLocker monitors a hashtag and then sends you a weekly email listing the top content using that hashtag over the past week. They claim by filtering out retweets, duplicates and spam, they knock out about 88% of what would be considered noise on a given hashtag. You can't, at this time, simply provide a hashtag to monitor; you have to pick from amongst the ones that are already being monitored. I don't know if that feature is in the works, but I do know that I've emailed the developer three times and gotten three hashtags added to the list right away, so if you don't see one you're interested in, just ask.
I'm currently monitoring these hashtags:
And here's what one of the actual emails you'll receive looks like.
It occurs to me that this is also a good way to keep a finger on the pulse of Twitter without ever even needing to visit the site, or even have an account!
Good stuff! No kickback, just a happy user :-) Check it out.
For the past couple of weeks I've been helping a student gather a collection of historical tweets, based on specific hashtags. You may be aware that Twitter's search API limits searches to only the past week or so, which sucks for those doing historical research. I'm still on the hunt for a tool that would allow me to easily gather older tweets, so please let me know if you have one! I tried Octoparse and Import.io, but neither ended up being reliable for this purpose.
At this point I reached out to Ed Summers, creator of the awesome command-line tool twarc, who suggested I try scraping with Webrecorder.io (protip: use the autoscroll button at the top). Webrecorder.io did a pretty decent job of capturing my historical search, but it took a fair amount of work. It seemed to bog down a bit after a while, so I ended up chunking my searches into 2-3 day windows, which meant running the scrape about a dozen times to capture the entire event. And then I was knee-deep in learning about WARC files. I was unable to find a tool I could make work to extract either the full tweets or the tweet-ids from the WARC files, so simply having the entire search results at my disposal still didn't help me.
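If you're handy with Python, I suspect something along these lines with the warcio library would at least dig the tweet-ids out of a Webrecorder WARC. I didn't get this far myself; the filename is made up, and the regex is my guess at what Twitter's status URLs look like in the captured markup:

import re
from warcio.archiveiterator import ArchiveIterator

# Scan every response record in the WARC for anything that looks like a
# Twitter status URL and collect the numeric ids.
tweet_ids = set()
with open('twitter-search.warc.gz', 'rb') as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type != 'response':
            continue
        body = record.content_stream().read().decode('utf-8', errors='ignore')
        tweet_ids.update(re.findall(r'/status(?:es)?/(\d{8,19})', body))

with open('tweet-ids.txt', 'w') as out:
    for tweet_id in sorted(tweet_ids):
        out.write(tweet_id + '\n')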
Then I did what I should've done first. I searched to see if anyone had posted an archive of the tweets my student was looking for. And someone had. If ever you're doing social media research around crisis situations, you'll want to know about CrisisLex.org.
So now we almost had what we needed. But Twitter also says that if you've collected a pile of tweets, you can't post them for someone else to download; you can only post a file of the tweet-ids. Ed again has some good thoughts on this rule. This is why I needed to "hydrate" the tweet-ids contained in the CrisisLex files in order to get the actual details of the original tweets.
twarc does this, but somehow I screwed up my extraction of the tweet-ids from the .csv provided by CrisisLex and it didn't seem to be working correctly for me. So I went crying back to Ed to see if he knew what I was doing wrong, and he showed me a screenshot of a tool he used to prove that it was all working for him.
Hey, if I can find a "program" to do something so I don't have to do it in the command line, I am one happy camper! Turned out he was using Hydrator, which works on OS X, Windows and Linux. You feed it a file of tweet-ids and it spits out a JSON file AND a CSV file of the actual tweets. Golden. My student is tickled and I can move on to something new, but with a wicked new tool at my disposal. I owe you several drinks, Ed!
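For the record, once you do have your tweet-ids straightened out, the twarc version of that hydration step is only a few lines of Python. This is a minimal sketch with placeholder credentials, assuming the ids are sitting one per line in ids.txt:

import json
from twarc import Twarc

# Your own Twitter API credentials go here
t = Twarc('YOUR_CONSUMER_KEY', 'YOUR_CONSUMER_SECRET',
          'YOUR_ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN_SECRET')

# Hydrate: feed twarc the bare ids, get the full tweets back as line-oriented JSON
with open('ids.txt') as ids, open('tweets.jsonl', 'w') as out:
    for tweet in t.hydrate(ids):
        out.write(json.dumps(tweet) + '\n')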
Over the weekend Ed Summers released a tool called diffengine that monitors an RSS feed and then posts captures of any changed text to Twitter. It's meant for monitoring news sites, but could be used for anything that has an RSS feed. It's useful for keeping an eye on your news site of choice for disappearing news, and also for reports that change on the fly. Here's what it looks like in action, from a CBC article:
Probably a really neat tool for journalism students too! Right now there are a handful of mainstream or influential sites publicly posted, such as The Washington Post, Breitbart and the Toronto Star. Not sure if there will be a logical directory of these things once people start running their own instances?
As I did last year, I have captured all the tweets from this year's Semantic Web in Libraries conference using the TAGS tool for Google Sheets. As of this posting there are 630 tweets with the hashtag #SWIB, 573 of them unique. Last year there were 1,736 tweets(!), 1,633 of them unique.
Here's the archive for 2016. Have fun!
Surely someone smarter than me can think of something interesting to do with the following?
Each of these contains citation information for peer reviewed scientific and technical articles published and authored or co-authored by National Research Council scientists and researchers. Includes .csv, .xlsx, .xml, and .ris files.
Is there any significance to the fact that the numbers have fallen each year from 2012-2015?
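If you want to eyeball that trend yourself, it's only a few lines of Python. A rough sketch, assuming you've downloaded the yearly .csv files and saved them as nrc-2012.csv through nrc-2015.csv (those filenames are mine, not NRC's):

import pandas as pd

# Count the citation records in each year's file
for year in range(2012, 2016):
    records = pd.read_csv('nrc-%d.csv' % year)
    print(year, len(records), 'articles')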
Update: It's an hour or so later and I thought I'd take a stab at a citation network using VOSviewer, which is pretty cool, but as soon as I saw how dirty the NRC bibliographic data was I gave up. Good luck if you're going at it! :-(
Disclaimer: I haven't actually tried either of these, but have heard the results of the first one, and the second one is such a good review it must be worth a look, right? :-)
First up, a web-based podcast-recording studio called Cast. Awful name, IMHO, as it's hard to find amongst all the Chromecast results in a search, but there you go. It's not free, but you do get a month to play with it before having to pony up. You'll record audio, alone or with guests elsewhere, then edit, and then host, all within Cast, no download required. Includes analytics. I first heard it (and heard about it) on ep 50 of the DC Rainmaker podcast, so go ahead and give that episode a listen if you want to check the quality first.
Second up is a post from Lifehacker in which Eric Ravenscraft declares Discord to be "the Voice Chat App I’ve Always Wanted." He says it's similar to Slack in feel, and while it's actually geared towards gaming it sounds as though it might work well for collaboration in general. Definitely adding to my bookmark list.
Tara Calishain at ResearchBuzz has posted part 2 of her 3-part series on creating and working with information traps, Setting up and sharing Google alerts. In this post I learned how to automatically create and populate columns in a Google Sheet using IFTTT. Off the top of my head I can think of at least two projects where this is going to immediately come in handy!
I've said it before, and I'll say it again, Tara's work is worth supporting financially! C'mon, go create an account at Patreon and become part of the sharing economy - it really will give you the warm fuzzies. :-0
Last week I was intrigued when this paper came out: Trust, tribalism and tweets: has political polarization made science a “wedge issue”?. In it, "Helmuth and his Northeastern colleagues analyzed the Twitter accounts of U.S. senators to see which legislators followed research-oriented science organizations, including those covering global warming. Democrats, they found, were three times more likely than Republicans to follow them, leading the researchers to note that “overt interest in science may now primarily be a ‘Democrat’ value.”" Interesting approach, I thought. I was very pleased to learn that the data sets for the article are available at Northeastern University's data repository (yay open science!).
Then for fun I thought I'd check the datasets to see whether any US senators followed Edward Snowden. None did! Then I realized that's 'cause the data sets are from February 2015, and @Snowden didn't appear on twitter until the fall of 2015! Doh!
Now I'm on a rabbit hunt to figure out how to compare @Snowden's 2.12 million followers to see if any of them are US senators. Since I had the list of senators from the aforementioned data sets, I thought it would be easiest to just get a list of all @Snowden's followers and compare the two. Surely somebody has a utility that will allow me to download all 2.12 million followers, right? Not so much :-(
I started with something in PHP I found on GitHub that sounded absolutely perfect, but it flat out wouldn't work, and so far no feedback from the developer. Next I checked with Nick Ruest to see if twarc might be able to do it, and he suggested I look at tweepy (Python) instead. That looked really promising, except I couldn't figure out how to work around the rate limits imposed by Twitter on gathering such a large follower list. Next up, a random kind stranger suggested a program written in Ruby called simply "t". I now LOVE this program, in part because it's very simple and very well documented. But it still falls short because of the rate limits. :-( And then my knight in shining armour, Ed Summers, came through with some real-life working examples in tweepy! I'm now 3 hours into gathering the full list of @Snowden's followers, and napkin math suggests I'm only halfway through. Fingers crossed for when I come back in on Monday!
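In case it saves someone else the hunt, the working approach boils down to something like this. A sketch with placeholder keys; Twitter only hands back 5,000 follower ids per request, 15 requests per 15 minutes, which is why a 2.12 million follower list takes hours:

import tweepy

# Your own Twitter API credentials go here
auth = tweepy.OAuthHandler('YOUR_CONSUMER_KEY', 'YOUR_CONSUMER_SECRET')
auth.set_access_token('YOUR_ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN_SECRET')

# wait_on_rate_limit tells tweepy to sleep through each rate-limit window
# instead of dying, which is the whole trick with a follower list this big
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

with open('snowden-follower-ids.txt', 'w') as out:
    for page in tweepy.Cursor(api.followers_ids, screen_name='snowden').pages():
        for follower_id in page:
            out.write(str(follower_id) + '\n')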
I'd already spent too much time on this, though I've learned tons along the way, and will be able to utilize a lot of this in the future. But I still wanted to find out which US senators followed @snowden. Back to t and I find the command "t does_follow". Hell, there's only 100 senators, so I ended up doing it manually, which in the end only took about 10-15 minutes. And here's what I got for each and every one:
ppival$ t does_follow chriscoons snowden
No, @chriscoons does not follow @snowden.
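Had I been a bit more patient I could have scripted even that. Here's a sketch with tweepy again, assuming a senators.txt file with one handle per line, pulled from the paper's data set:

import tweepy

# Same credential dance as in the follower-list sketch above
auth = tweepy.OAuthHandler('YOUR_CONSUMER_KEY', 'YOUR_CONSUMER_SECRET')
auth.set_access_token('YOUR_ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN_SECRET')
api = tweepy.API(auth, wait_on_rate_limit=True)

# Check each senator's friendship with @snowden, one at a time
with open('senators.txt') as handles:
    for handle in (h.strip() for h in handles if h.strip()):
        source, target = api.show_friendship(source_screen_name=handle,
                                             target_screen_name='snowden')
        verdict = 'follows' if source.following else 'does not follow'
        print('@%s %s @snowden' % (handle, verdict))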
Talk about an echo chamber! I guess if he's still officially a traitor it looks bad for a senator to be acknowledging him? Sure, if he says something useful I'm sure someone would share it with the VIP, but c'mon, you can't even hear what he has to say or acknowledge his existence by following him on twitter? That's pretty lame, IMHO. btw, neither Hillary nor Donald follows @snowden either. Nor do Hillary and Donald follow each other. Whatever.
Thanks also to John Brosz for helping me work through code, and Andrew Pasterfield for the same.
A while back I started using Buffer to schedule posts to my social media accounts. I was always kind of annoyed when someone spammed my twitter feed with 20 posts during the 5 minutes they found themselves on twitter, and I didn't want to be that person myself. First world problem, I know ;-) Anyhoo, Buffer allowed me to set a schedule and fire posts into a queue, thus ensuring my feed would be more polite, and I could appear to be posting tweets even while riding my bike, thus amazing my friends! I was using the free account, and probably could've continued to happily do so, but sometimes ran up against the limit of only being able to have 10 posts in my queue. I thought I'd fire off an email to see if they'd extend academic pricing to me, and they did!
So now I subscribe to the Awesome Plan, which gets me up to 200 posts in my queue, and I can link to additional social media accounts. If I wanted, the Awesome Plan also allows two people to share an account (hint hint small libraries). Oh, and the reason I'm telling you this is that you, too, can most likely receive this 50% discount.
If you're an educator or student, email them at [email protected] to get set up. To verify your educational status, they'll ask that you have a Buffer account set up with your .edu email address or that you email them a copy of your Student/Educator ID.
This is not a sponsored post - I happily ponied up the price of the discounted Awesome Plan, and now you can too!
See you online!
A week or so ago I came across a series of posts from Spencer Greenhalgh in which he described how he used R to take a large group of collected tweets around the terrorist attacks in Paris and geolocate them on a map. The trouble is that very few users actually include geolocation information in their tweets, so Spencer had to figure out a way to grab an approximation of that information by mining each user's twitter profile page for his/her stated home location. This wouldn't indicate where the users were when they actually tweeted, but it gave him a good idea of where they were usually based (if the profile page could be trusted), which was good enough for his experiments.
I've been playing with Open Refine of late, so thought I'd take a stab at this same experiment with that hammer. I figured out all the pieces, but was unsuccessful in the end, and I can't tell you exactly why. I got Spencer's archive of tweets, and was easily able to import it into Open Refine. The problem came when I started to pull in the user profile pages - no matter how I sliced and diced the archive I would always time out on some section or another. The full data set contains just over 6,000 lines, so I tried breaking that in half by date, into smaller pieces by client used to send the tweet, and by unique vs duplicate users. I was able to get a full scrape of the user profile pages for about 600 users, but every other group I tried would simply time out (I tried all sorts of variations on the throttle delay, from 250 to 10,000 milliseconds, but my scrape would always hang, depending on the set I was trying, either around 67% or 89%). Maybe it was a memory issue? Because it simply hung, I never got an error message to work with. If you have an idea what might've been the problem I'd LOVE to know!
The GREL expression I did eventually use digs out the user location from the profiles I could get by searching the scraped profile page HTML for the CSS class around the user-stated location and returning just that bit of info, which would look like this:
<span class="ProfileHeaderCard-locationText u-dir" dir="ltr"> Atlanta, GA </span>
It did take me a while to figure this bit out, so for future reference I found this page most useful.
So now we've got a column of messy data, which I cleaned with an HTML-stripping expression I found at https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML; once run, it leaves behind the nice clean string of text we're after (in this case, plain old "Atlanta, GA").
And from there it would be trivial to run that column through any number of free online mapping tools in order to generate a map of where users say they're from.
So the biggest headache was getting all the HTML from the user profile page. JSON would've been MUCH cleaner, but like Spencer I wanted to try to do this without using the Twitter API, so AFAIK I was stuck with HTML.
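For what it's worth, the same scrape-and-extract is only a handful of lines of Python, which might sidestep the Open Refine timeouts entirely. This is a sketch only, assuming Twitter's profile markup still uses that ProfileHeaderCard-locationText class, with a deliberately polite pause between requests (the usernames.txt file is something you'd pull out of the tweet archive first):

import time
import requests
from bs4 import BeautifulSoup

locations = {}
with open('usernames.txt') as handles:
    for handle in (h.strip() for h in handles if h.strip()):
        html = requests.get('https://twitter.com/' + handle, timeout=30).text
        span = BeautifulSoup(html, 'html.parser').find(
            'span', class_='ProfileHeaderCard-locationText')
        locations[handle] = span.get_text(strip=True) if span else ''
        time.sleep(2)  # be polite; this is scraping, not the API

for handle, location in locations.items():
    print(handle + '\t' + location)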
Fast forward to today, when I see Nick Ruest's post, A look at 14,939,154 #paris #Bataclan #parisattacks #porteouverte tweets. I was having trouble with 6,000 tweets, and Nick was churning through almost 15 million tweets with a far more appropriate tool called twarc.
twarc is a command line tool and Python library for archiving Twitter JSON data. Each tweet is represented as a JSON object that is exactly what was returned from the Twitter API. Tweets are stored as line-oriented JSON. twarc runs in three modes: search, filter stream and hydrate. When running in each mode twarc will stop and resume activity in order to work within the Twitter API's rate limits.
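To give you a taste of the search mode from the Python side, here's a minimal sketch (placeholder keys again) that writes the same line-oriented JSON the command-line tool produces:

import json
from twarc import Twarc

t = Twarc('YOUR_CONSUMER_KEY', 'YOUR_CONSUMER_SECRET',
          'YOUR_ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN_SECRET')

# Search mode: grab everything the search API will return for a hashtag,
# one JSON object per line, pausing automatically at the rate limits
with open('paris-tweets.jsonl', 'w') as out:
    for tweet in t.search('#paris'):
        out.write(json.dumps(tweet) + '\n')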
I had tried to work with twarc a few months ago but couldn't get it running on the shared server I was on; now that I've seen just what it's capable of, though, I will get it up and running, hog frog or dog!
My takeaway from this is that while there may be many different ways to solve a problem, they're certainly not all equal. Spencer stuck with what he knew and got something usable out of it. I tried with what I knew and failed, and Nick used what he knew and kicked both our butts. Spencer, do look at twarc too! :-)
Upward and onward!
Way back in 2011 I thought it'd be fun to note the books I had read the previous year. I kept the info in a text file and created a couple of infographics to go along with the post. I did the same in 2012, but never got around to 2013. I have been logging all my reads in Goodreads though, so it's time for a quick update. I wish they provided some graphics to go along with it, but c'est la vie.
I do almost all of my reading on my Kindle Paperwhite now, with most books coming from Overdrive at Calgary Public Library, loaded through Calibre. Any remainders would be in paper. It'd still be interesting (to me) to do the breakdown by month and genre, as I had done originally, so maybe I'll update this post over the holidays.
The latest issue of Library Hi Tech News has the following article, which contains a pretty good list of free and freemium tools that may be new to you as many were to me.
Christine Palma Forbes (2014), "Free Web-based Tools for Information Literacy Instruction", Library Hi Tech News, Vol. 31, Iss. 10.