Latest Posts | Preliminary Inventory of Digital Collections by Jason Ronallo

This page has permanently moved to http://ronallo.com/page/4/

The stale content remains below for a limited time, for your convenience while DNS caches expire.

Page 4 of 5

Slides from Code4lib Presentation on HTML5 Microdata and Schema.org

Feb 7, 2012

Update: You can find the video on Livestream at 01:05:30.

Here are my slides from my Code4Lib 2012 presentation on HTML5 Microdata and Schema.org. If you want the speaker notes, please DM me on Twitter @ronallo.

Common Crawl, Web Data Commons, and Microdata

Jan 30, 2012

The other day I discovered the Web Data Commons, which is building on top of the Common Crawl to extract Microformat, Microdata, and RDFa data and make it available for free download. This means that free structured data from a big portion of the Web is starting to become available for anyone to play with at very low cost. Common Crawl takes care of the crawling, and Web Data Commons then does the data extraction. This opens up new possibilities for services, specialized search, and aggregations of content. Big web data is being opened up for small startups and individuals.

Listing Published Octopress Posts

Jan 24, 2012

In converting my blog from WordPress to Octopress, I had a lot of old posts I was leaving unpublished. I wanted to keep them around but don’t see the need to republish them right now. I also want to be able to create a lot of drafts of ideas and leave them unpublished. Then whenever I’m ready to work on a post, they’re all right there in my repository already.

The problem is that I find it hard to read through the filenames of posts and remember which have been published and which have not. So in order to see the publication status of all my posts, I created this rake task. I just dropped it at the end of my Rakefile, and now I run rake listpub.
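The task itself is not reproduced in this excerpt. As a rough idea of what such a task could look like, here is a minimal sketch. It assumes posts live in Octopress's source/_posts directory and that an unpublished post carries published: false in its YAML front matter. The helper name post_statuses is hypothetical and not taken from the original post.

```ruby
# Hypothetical helper: returns [filename, published?] pairs for every file
# in posts_dir. Assumption: a post is unpublished only when its YAML front
# matter contains a "published: false" line; anything else counts as published.
def post_statuses(posts_dir)
  paths = Dir.glob(File.join(posts_dir, '*')).select { |path| File.file?(path) }
  paths.sort.map do |path|
    text = File.read(path)
    # Front matter is the block between the leading pair of "---" lines.
    front_matter = text[/\A---\s*\n(.*?)\n---\s*$/m, 1].to_s
    unpublished = front_matter.match?(/^published:\s*false\s*$/)
    [File.basename(path), !unpublished]
  end
end

# Only define the task when this file is loaded by rake, so the helper can
# also be exercised from plain Ruby.
if defined?(Rake)
  desc 'List every post with its publication status'
  task :listpub do
    post_statuses('source/_posts').each do |name, published|
      puts format('%-11s %s', published ? 'published' : 'unpublished', name)
    end
  end
end
```

Appended to the end of a Rakefile, this would be run as rake listpub and print one line per post, with the status in the first column so that drafts are easy to spot.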

Solving the Item-Level Problem on the Web

Jan 23, 2012

Digital Collections Services Through Using Web Crawls

Digital libraries have attempted to provide various aggregations of their content. Usually the participants in the aggregation already make that content accessible on the open web. Past approaches to aggregating content have relied on hosting institutions to provide their metadata in new ways and to support additional infrastructure and workflows. An alternative approach to creating aggregations is to perform targeted crawls and reuse the content on the pages. The problem with the crawler approach is identifying items in the collection as opposed to other pages. This document presents a few possibilities for how to identify items.

DPLA Strawman Technical Proposal

Jan 23, 2012

Collection Achievements and Profiles System and DPLA Crawler Services

This is a quick strawman proposal for what the Digital Public Library of America should build as the first parts of a generative platform. This document is not in a finished state, but just as the DPLA has been good at opening up its process with the Beta Sprint, I wanted to release this document early even in this unfinished state.

I attended the December DPLA Technical Workshop in Cambridge and was inspired by the discussion there. I hope that this document clarifies some of the approaches that I and others at that meeting were advocating. I shared this with the DPLA Interim Development Team a couple of weeks ago, and now that development has started I thought I would share it here as well.

While the first iteration of the DPLA platform may be set and on its way, I still wanted to share one vision of what a generative platform for aggregations might involve. The main point is to get the DPLA to the aggregations they likely need to present at some point. This document leaves aside the question of whether creating aggregations is a good idea. The desire to create aggregations is a big, often unquestioned, assumption of big digital library projects. I think what is set out below is one simple architecture for accomplishing aggregations in a very Web-centered way while potentially having more reuse outside of just aggregations.
