This weekend, someone posted an article describing how to use Python and YQL to scrape Repopular for a list of popular GitHub projects. Since mashups are all the rage these days, I thought I would implement this in Factor.
The Yahoo! way
The YQL that was used in the original article is:
use 'http://yqlblog.net/samples/data.html.cssselect.xml' as data.html.cssselect;
select * from data.html.cssselect where url="repopular.com" and css="div.pad a"
We can use the http.client vocabulary to send this query to Yahoo!, parse the returned JSON with json.reader, and extract the HREFs of the popular projects. We then keep only the links that point to GitHub.
USING: assocs http.client json.reader kernel sequences ;

: the-yahoo-way ( -- seq )
    ! the YQL query above, URL-encoded, asking for JSON output
    "http://query.yahooapis.com/v1/public/yql?q=use%20'http%3A%2F%2Fyqlblog.net%2Fsamples%2Fdata.html.cssselect.xml'%20as%20data.html.cssselect%3B%20select%20*%20from%20data.html.cssselect%20where%20url%3D%22repopular.com%22%20and%20css%3D%22div.pad%20a%22&format=json&diagnostics=true&callback="
    http-get nip json>                                     ! fetch, drop the response, parse the body
    { "query" "results" "results" "a" } [ swap at ] each  ! walk down to the list of <a> elements
    [ "href" swap at ] map                                 ! take each link's href
    [ "http://github.com" head? ] filter ;                 ! keep only GitHub links
We can run this to see which projects are currently trending on GitHub:
( scratchpad ) the-yahoo-way [ . ] each
"http://github.com/sinatra/sinatra"
"http://github.com/Sutto/barista"
"http://github.com/pypy/pypy"
"http://github.com/dysinger/apparatus"
"http://github.com/videlalvaro/Thumper"
"http://github.com/alunny/sleight"
"http://github.com/vimpr/vimperator-plugins"
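To see what the JSON walk in the-yahoo-way does without touching the network, here is a minimal sketch that runs the same pipeline over a small, hand-written JSON string. The words sample-yql-json and sample-yql-links and the sample data are hypothetical. Only the nesting (query, then results, then results, then a) mirrors the path used above.

```factor
USING: assocs json.reader kernel sequences ;

! Hypothetical, hand-written stand-in for the YQL response:
! one GitHub link and one non-GitHub link under query/results/results/a
: sample-yql-json ( -- str )
    "{\"query\":{\"results\":{\"results\":{\"a\":[{\"href\":\"http://github.com/sinatra/sinatra\"},{\"href\":\"http://example.com/elsewhere\"}]}}}}" ;

: sample-yql-links ( -- seq )
    sample-yql-json json>                                  ! parse the string into nested assocs
    { "query" "results" "results" "a" } [ swap at ] each  ! look up each key in turn
    [ "href" swap at ] map                                 ! pull out each href
    [ "http://github.com" head? ] filter ;                 ! drop non-GitHub links
```

Calling sample-yql-links in the listener should leave only the sinatra link, since the example.com href fails the head? test.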
The other way
We can also do this another way, with the html.parser vocabulary. Given some knowledge of the HTML that Repopular returns (the project links sit between the first and last aside tags), we can extract the HREFs directly:
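USING: accessors assocs html.parser http.client kernel sequences ;

: the-other-way ( -- seq )
    "http://repopular.com" http-get nip parse-html  ! fetch the page and parse it into tags
    ! slice the tags between the first and last aside tags
    [ [ name>> "aside" = ] find drop ]
    [ [ name>> "aside" = ] find-last drop ]
    [ <slice> ] tri
    [ name>> "a" = ] filter                         ! keep only the <a> tags
    [ attributes>> "href" swap at ] map             ! take each link's href
    [ "http://github.com" head? ] filter ;          ! keep only GitHub links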
We can see it produces the same results:
( scratchpad ) the-other-way [ . ] each
"http://github.com/sinatra/sinatra"
"http://github.com/Sutto/barista"
"http://github.com/pypy/pypy"
"http://github.com/dysinger/apparatus"
"http://github.com/videlalvaro/Thumper"
"http://github.com/alunny/sleight"
"http://github.com/vimpr/vimperator-plugins"
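The slicing step in the-other-way can be checked offline the same way. This sketch runs the same pipeline over a small hypothetical HTML fragment (not Repopular's real markup). The GitHub link before the first aside tag falls outside the slice, and the example.com link inside it is removed by the final filter. The words sample-page-html and sample-aside-links are hypothetical.

```factor
USING: accessors assocs html.parser kernel sequences ;

! Hypothetical fragment: one GitHub link outside any aside,
! and two links (one GitHub, one not) inside the first aside
: sample-page-html ( -- str )
    "<html><body><a href=\"http://github.com/top-nav\">nav</a><aside><a href=\"http://github.com/sinatra/sinatra\">sinatra</a><a href=\"http://example.com/ad\">ad</a></aside><aside>more</aside></body></html>" ;

: sample-aside-links ( -- seq )
    sample-page-html parse-html
    ! from the first <aside> tag up to (not including) the last aside tag
    [ [ name>> "aside" = ] find drop ]
    [ [ name>> "aside" = ] find-last drop ]
    [ <slice> ] tri
    [ name>> "a" = ] filter                  ! keep only the <a> tags
    [ attributes>> "href" swap at ] map      ! take each link's href
    [ "http://github.com" head? ] filter ;   ! keep only GitHub links
```

Only the sinatra link should survive: the top-nav link is cut off by the slice, and the example.com link by the filter.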