Sunday 26 September 2010

afternoon papers


content-based stuff now:

Dmitry Bogdanov, Martín Haro, Ferdinand Fuhrmann, Emilia Gómez and Perfecto Herrera

Dmitry presenting
  • similarity is not the same thing as recommendation, though recommendation still needs similarity
  • can we improve content-based rec by merging in preference data?
  • GMM + preference model
  • process:
  1. ask the user for a small set of tracks that specify their preference by example
  2. extract bag-of-frames features from these
  3. SVMs to get semantics (probabilistic) [rough sketch after this list]
  4. in this semantic space, search for tracks
  • can search in a variety of ways (use of Pearson's correlation is taken from prev work)
  • for eval, compare our method to a bunch of existing methods: content-based, contextual, random
  • some users did a test to get the preference set (varies from 19 to 178 tracks per user); this takes a long time
  • get lots of tracks from all the methods, shuffle, stick in front of the user, ask lots of Qs per track
  • created three categories based on the evals: Hits, trusts, fails
  1. hits - user likes it and it's new to them
  2. trusts - user likes it but it's not new
  3. fails - no to all
  4. unclear - the rest (18%)
  • A good system should provide many hits and some trusts while avoiding fails
  • in the results, last.fm (via api) is very good for hits and trusts
  • everyone else was bad at trusts
  • the new method was the best of the non-last.fm approaches for hits, but last.fm draws on a different set of music so it comes out better
  • proposed semantics offer an improvement over pure timbral features
  • but still inferior to industrial approaches, though this proposed work improves things considerably; perhaps a good way to handle cold start
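
[A minimal sketch of how I read the semantic step (points 2-3 above): one probabilistic SVM per semantic label over bag-of-frames summaries. The labels, features and scikit-learn setup here are my own illustration, not the authors' code.]

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative semantic labels only; the real system uses many more descriptors.
SEMANTIC_LABELS = ["rock", "jazz", "electronic", "relaxed", "aggressive"]

def train_semantic_models(X_train, Y_train):
    """Train one probabilistic SVM per semantic label.

    X_train: (n_tracks, n_features) bag-of-frames summaries (e.g. MFCC stats)
    Y_train: (n_tracks, n_labels) binary matrix, 1 if a track has that label
    """
    models = []
    for j in range(Y_train.shape[1]):
        clf = SVC(kernel="rbf", probability=True)  # Platt scaling gives probabilities
        clf.fit(X_train, Y_train[:, j])
        models.append(clf)
    return models

def semantic_descriptor(models, x):
    """Map one track's feature vector to a vector of label probabilities."""
    x = np.asarray(x).reshape(1, -1)
    # predict_proba column 1 is the probability of the positive (label present) class
    return np.array([m.predict_proba(x)[0, 1] for m in models])
```
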
Q (Oscar): I don't understand the last.fm part - why didn't you use it for similarity?
we tried, couldn't get enough info
(Oscar, follow-up) there's low trust in the content-based results - do you think that's tied to a lack of transparency?
maybe, but our definition of trust just meant the user likes and knows the track.

Q: was SEM-ALL about finding songs that are close to any or to all of the preference tracks?
any
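
[If I've understood that exchange, "close to any" vs "close to all" of the preference examples could be aggregated roughly like this over Pearson correlations in the semantic space - purely my illustration, the function and names aren't from the paper.]

```python
import numpy as np
from scipy.stats import pearsonr

def rank_candidates(pref_vecs, cand_vecs, aggregate="any"):
    """Rank candidate tracks against the user's preference examples.

    pref_vecs: (n_pref, n_dims) semantic descriptors of the preference tracks
    cand_vecs: (n_cand, n_dims) semantic descriptors of candidate tracks
    aggregate: "any" = score by the best-matching single preference track,
               "all" = score by the average match over the whole preference set
    """
    scores = []
    for cand in cand_vecs:
        corrs = [pearsonr(cand, pref)[0] for pref in pref_vecs]
        scores.append(max(corrs) if aggregate == "any" else float(np.mean(corrs)))
    return np.argsort(scores)[::-1]  # candidate indices, best first
```
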


UPDATE (~5pm):

Pedro Mercado and Hanna Lukashevich
Hanna is presenting

  • clustering can help you swim in the sea of data
  • users can fix incorrect clusters, positive feedback
  • system diagram: [shown on a slide, not captured here]
  • similarity can be considered as a graph; then you can do random walks, calculate eigenvalues, etc. [rough sketch after these notes]
  • but what if this user doesn't care about some things? user-preference-based feature selection
  • in the given space you can then find distances (the paper uses Pearson's, but other distances could be used)
  • constrain the space (tricky maths, see the paper...)
  • eval: used the MIREX 04 content description data
  • constraints from genre labels
  • using a train/test split as an example: what's in the constraint space, what isn't
  • mutual information, something else I didn't catch
  • some graphs showing that there's more awesome with the presented method
  • when looking at outliers, things are less clear but still seem positive
  • [graphs are page 6 of the pdf, have a look for details]
  • to wrap up: ML approaches can improve recs at least with our simulated user...
  • our clustering methods are speedy; scale is tricky, but since our matrix is sparse it should be doable
  • Way better than random constraints
  • future work: stick the constraints into the feature selector; we did this, to appear at ICML, gives a significant improvement but causes some trouble, read the paper for details [excellent ICML tease...]
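
[The graph/eigenvalue bullets above are essentially spectral clustering; one common, simple way of folding must-link/cannot-link feedback in is to edit the affinity matrix before taking Laplacian eigenvectors. A generic sketch of that idea follows - it is not the authors' constrained formulation, which is in their paper.]

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def constrained_spectral_clustering(affinity, must_link, cannot_link, n_clusters):
    """Generic constrained spectral clustering sketch.

    affinity:    (n, n) symmetric track-similarity matrix
    must_link:   (i, j) pairs the user says belong in the same cluster
    cannot_link: (i, j) pairs the user says should be separated
    """
    W = np.asarray(affinity, dtype=float).copy()
    for i, j in must_link:            # force a strong edge between the pair
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:          # cut the edge between the pair
        W[i, j] = W[j, i] = 0.0

    # symmetrically normalised graph Laplacian
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

    # eigenvectors for the smallest eigenvalues carry the cluster structure
    _, vecs = eigh(L_sym)
    embedding = vecs[:, :n_clusters]
    embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
```
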
-- coffee and demos now...
