Trying to build an anime recommender system based on the MyAnimeList Kaggle dataset, with the plan to eventually host it on a website. So far, I've spent the day hacking together a short script that already kinda works, although I haven't fine-tuned anything yet.
I created "recommendations" for myself based on my profile in the dataset, but since the dataset is already more than a year old and I am notoriously bad at keeping my anime list up to date, it's mostly stuff I've already watched.
The top 50 "recommendations" for my (outdated) profile:
Cowboy Bebop, Death Note, Fullmetal Alchemist: Brotherhood, Toki wo Kakeru Shoujo, Howl no Ugoku Shiro, Samurai Champloo, Byousoku 5 Centimeter, Ghost in the Shell, Bakemonogatari, Akira, Tonari no Totoro, Evangelion: 1.0 You Are (Not) Alone, Higurashi no Naku Koro ni, Summer Wars, Mushishi, Hotaru no Haka, Paprika, Evangelion: 2.0 You Can (Not) Advance, Clannad, Serial Experiments Lain, Clannad: After Story, Trigun, Kaze no Tani no Nausicaä, Fate/Zero, Toradora!, K-On!, Angel Beats!, Higashi no Eden, Ergo Proxy, Cowboy Bebop: Tengoku no Tobira, Perfect Blue, Lucky☆Star, Great Teacher Onizuka, Soul Eater, Shingeki no Kyojin, Tenkuu no Shiro Laputa, Ano Hi Mita Hana no Namae wo Bokutachi wa Mada Shiranai., Fate/Zero 2nd Season, Ghost in the Shell: Stand Alone Complex, Kara no Kyoukai 2: Satsujin Kousatsu (Zen), Hellsing Ultimate, Higurashi no Naku Koro ni Kai, Monster, Kara no Kyoukai 5: Mujun Rasen, Majo no Takkyuubin, Kara no Kyoukai 3: Tsuukaku Zanryuu, Psycho-Pass, Suzumiya Haruhi no Yuuutsu (2009), Sayonara Zetsubou Sensei, Dragon Ball Z
Certainly some good recommendations. I've watched most of these and enjoyed pretty much everything, so I guess that shows the system already works quite well? Another possibility is that it's just recommending stuff everybody likes.
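One rough way to check the "stuff everybody likes" suspicion is to see how much of a recommendation list overlaps with the most-rated titles in the dataset. The sketch below is only an illustration: the `(user, anime, score)` row layout, the function name `popularity_overlap` and the toy data are my assumptions here, not the actual dataset schema or anything from my script.

```python
from collections import Counter


def popularity_overlap(recommended, rating_rows, top_n=50):
    """Fraction of `recommended` titles that are among the top_n most-rated titles.

    rating_rows is assumed to be an iterable of (user, anime, score) tuples.
    """
    # Count how many ratings each title received (a crude popularity measure).
    counts = Counter(anime for _user, anime, _score in rating_rows)
    popular = {anime for anime, _count in counts.most_common(top_n)}
    if not recommended:
        return 0.0
    hits = sum(1 for anime in recommended if anime in popular)
    return hits / len(recommended)


# Toy data, made up purely for illustration.
rows = [
    ("u1", "Death Note", 9), ("u2", "Death Note", 8), ("u3", "Death Note", 10),
    ("u1", "Cowboy Bebop", 10), ("u2", "Cowboy Bebop", 9),
    ("u3", "Mushishi", 9),
]
recs = ["Death Note", "Mushishi"]
overlap = popularity_overlap(recs, rows, top_n=2)
print(overlap)
```

An overlap close to 1.0 for many different users would suggest the system is mostly echoing a popularity ranking; a noticeably lower and user-dependent overlap would hint at some actual personalization.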
Next up:
- webscrape the up-to-date anime list of users I want to build recommendations for, so that I can remove anime that are already on their list from the recommendations
- maybe I should scrape together a whole new up-to-date dataset while I am at it
- atm I am using matrix factorization to predict the scores users would give anime they haven't watched yet. Of course that only works for users who are already part of the dataset, so I need to think of a way to predict scores for users who aren't in it
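To make the last point more concrete, here is a minimal sketch of one common option for new users, usually called "folding in": keep the learned item factors fixed and fit only a new user vector to the scores that user has already given, via a small ridge regression. Everything here is an assumption for illustration: the plain SGD factorization, the hyperparameters, the integer user/item indices and the toy ratings are not what my script necessarily does. The `recommend` helper also shows the "remove what's already on their list" step from the first bullet.

```python
import numpy as np


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Plain SGD matrix factorization: score(u, i) is approximated by P[u] @ Q[i]."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))  # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            p_u = P[u].copy()  # keep the old user vector for the item update
            err = r - p_u @ Q[i]
            P[u] += lr * (err * Q[i] - reg * p_u)
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q


def fold_in_user(Q, item_ids, scores, reg=0.05):
    """Fit a vector for an unseen user with the item factors Q held fixed.

    Solves the ridge regression (Q_s^T Q_s + reg * I) p = Q_s^T r,
    where Q_s holds the factors of the items the user has scored.
    """
    Q_seen = Q[item_ids]
    k = Q.shape[1]
    A = Q_seen.T @ Q_seen + reg * np.eye(k)
    b = Q_seen.T @ np.asarray(scores, dtype=float)
    return np.linalg.solve(A, b)


def recommend(Q, user_vec, watched, top_n=3):
    """Rank items by predicted score, skipping anything already watched."""
    preds = Q @ user_vec
    for i in watched:
        preds[i] = -np.inf  # push watched items to the end of the ranking
    return [int(i) for i in np.argsort(-preds)[:top_n]]


# Toy (user, item, score) triples, made up purely for illustration.
ratings = [
    (0, 0, 9), (0, 1, 8), (0, 2, 3),
    (1, 0, 8), (1, 3, 9),
    (2, 1, 7), (2, 2, 2), (2, 4, 8),
    (3, 2, 9), (3, 3, 4), (3, 4, 3),
]
P, Q = train_mf(ratings, n_users=4, n_items=5)

# A user who is not in the training data but has scored items 0 and 2.
watched = {0, 2}
new_vec = fold_in_user(Q, item_ids=[0, 2], scores=[9, 3])
recs = recommend(Q, new_vec, watched, top_n=2)
print(recs)
```

The appeal of this approach is that nothing has to be retrained when a new user shows up; the cost is that the item factors never learn anything from that user's scores until the next full training run.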