Sunday 26 September 2010

afternoon papers


content-based stuff now:

Dmitry Bogdanov, Martín Haro, Ferdinand Fuhrmann, Emilia Gómez and Perfecto Herrera

Dmitry presenting
  • similarity is not the same thing as recommendation, though recommendation still needs similarity
  • can we improve content-based rec by merging in preference data?
  • GMM + preference model
  • process:
  1. ask the user for a small set of tracks that specify their preference by example
  2. extract bag-of-frames features from these
  3. SVMs to get semantics (probabilistic) [rough sketch after this list]
  4. in this semantic space, search for tracks
  • can search in a variety of ways (use of Pearson's correlation is taken from prev work)
  • for eval, compare our method to a bunch of existing methods: content-based, contextual, random
  • some users did a test to get the preference set (varies from 19 to 178 tracks per user); this takes a long time
  • get lots of tracks from all the methods, shuffle, stick in front of the user, ask lots of Qs per track
  • created three categories based on the evals: Hits, trusts, fails
  1. hits - user likes it and it's new to them
  2. trusts - user likes it but it's not new
  3. fails - no to all
  4. unclear - the rest (18%)
  • A good system should provide many hits and some trusts while avoiding fails
  • in the results, last.fm (via api) is very good for hits and trusts
  • everyone else was bad at trusts
  • the new method was the best of the non-last.fm approaches for hits, but last.fm draws on a different set of music so it comes out better
  • proposed semantics offer an improvement over pure timbral features
  • but still inferior to industrial approaches, though this proposed work improves things considerably; perhaps a good way to handle cold start
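
[A minimal sketch of how I read the semantic step (points 2-3 above): one probabilistic SVM per semantic label over bag-of-frames summaries. The labels, features and scikit-learn setup here are my own illustration, not the authors' code.]

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative semantic labels only; the real system uses many more descriptors.
SEMANTIC_LABELS = ["rock", "jazz", "electronic", "relaxed", "aggressive"]

def train_semantic_models(X_train, Y_train):
    """Train one probabilistic SVM per semantic label.

    X_train: (n_tracks, n_features) bag-of-frames summaries (e.g. MFCC stats)
    Y_train: (n_tracks, n_labels) binary matrix, 1 if a track has that label
    """
    models = []
    for j in range(Y_train.shape[1]):
        clf = SVC(kernel="rbf", probability=True)  # Platt scaling gives probabilities
        clf.fit(X_train, Y_train[:, j])
        models.append(clf)
    return models

def semantic_descriptor(models, x):
    """Map one track's feature vector to a vector of label probabilities."""
    x = np.asarray(x).reshape(1, -1)
    # predict_proba column 1 is the probability of the positive (label present) class
    return np.array([m.predict_proba(x)[0, 1] for m in models])
```
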
Q (Oscar): I don't understand the last.fm part - why didn't you use it for similarity?
we tried, couldn't get enough info
(Oscar, follow-up) there's low trust in the content-based results - do you think that's tied to a lack of transparency?
maybe, but our definition of trust just meant the user likes and knows the track.

Q: was SEM-ALL about finding songs that are close to any or to all of the preference tracks?
any
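
[If I've understood that exchange, "close to any" vs "close to all" of the preference examples could be aggregated roughly like this over Pearson correlations in the semantic space - purely my illustration, the function and names aren't from the paper.]

```python
import numpy as np
from scipy.stats import pearsonr

def rank_candidates(pref_vecs, cand_vecs, aggregate="any"):
    """Rank candidate tracks against the user's preference examples.

    pref_vecs: (n_pref, n_dims) semantic descriptors of the preference tracks
    cand_vecs: (n_cand, n_dims) semantic descriptors of candidate tracks
    aggregate: "any" = score by the best-matching single preference track,
               "all" = score by the average match over the whole preference set
    """
    scores = []
    for cand in cand_vecs:
        corrs = [pearsonr(cand, pref)[0] for pref in pref_vecs]
        scores.append(max(corrs) if aggregate == "any" else float(np.mean(corrs)))
    return np.argsort(scores)[::-1]  # candidate indices, best first
```
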


UPDATE (~5pm):

Pedro Mercado and Hanna Lukashevich
Hanna is presenting

  • clustering can help you swim in the sea of data
  • users can fix incorrect clusters, positive feedback
  • system diagram: [shown on a slide, not captured here]
  • similarity can be considered as a graph; then you can do random walks, calculate eigenvalues, etc. [rough sketch after these notes]
  • but what if this user doesn't care about some things? user-preference-based feature selection
  • in the given space you can then find distances (the paper uses Pearson's, but other distances could be used)
  • constrain the space (tricky maths, see the paper...)
  • eval: used the MIREX 04 content description data
  • constraints from genre labels
  • using a train/test split as an example: what's in the constraint space, what isn't
  • mutual information, something else I didn't catch
  • some graphs showing that there's more awesome with the presented method
  • when looking at outliers, things are less clear but still seem positive
  • [graphs are page 6 of the pdf, have a look for details]
  • to wrap up: ML approaches can improve recs at least with our simulated user...
  • our clustering methods are speedy; scale is tricky, but since our matrix is sparse it should be doable
  • Way better than random constraints
  • future work: stick the constraints into the feature selector; we did this, to appear at ICML, gives a significant improvement but causes some trouble, read the paper for details [excellent ICML tease...]
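
[The graph/eigenvalue bullets above are essentially spectral clustering; one common, simple way of folding must-link/cannot-link feedback in is to edit the affinity matrix before taking Laplacian eigenvectors. A generic sketch of that idea follows - it is not the authors' constrained formulation, which is in their paper.]

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def constrained_spectral_clustering(affinity, must_link, cannot_link, n_clusters):
    """Generic constrained spectral clustering sketch.

    affinity:    (n, n) symmetric track-similarity matrix
    must_link:   (i, j) pairs the user says belong in the same cluster
    cannot_link: (i, j) pairs the user says should be separated
    """
    W = np.asarray(affinity, dtype=float).copy()
    for i, j in must_link:            # force a strong edge between the pair
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:          # cut the edge between the pair
        W[i, j] = W[j, i] = 0.0

    # symmetrically normalised graph Laplacian
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

    # eigenvectors for the smallest eigenvalues carry the cluster structure
    _, vecs = eigh(L_sym)
    embedding = vecs[:, :n_clusters]
    embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
```
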
-- coffee and demos now...
