I'm not sure if this is the right place to ask about this -- if it should go somewhere else, let me know.
A while ago I was working with some rating data and found that FactorizationRecommender's SGD seemed to be decaying the learning rate too fast. (The results kept improving as I made the initial step size larger, which seemed odd.) I compared it against a library (scikit-surprise) that uses vanilla SGD with a constant learning rate, and vanilla SGD reached a much lower RMSE.
I've reproduced the same issue here with synthetic data that a factorization model should be able to fit exactly (scikit-surprise's SVD recommender fits it with negligible error, but graphlab's factorization recommender does not). See that notebook for more details.
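For reference, here is a minimal sketch of what "vanilla SGD with a constant learning rate" means for this kind of exactly low-rank data. It is plain NumPy matrix factorization without bias terms, not scikit-surprise's actual SVD code, and the helper names (`make_low_rank_ratings`, `sgd_mf`, `rmse`) and all sizes and hyperparameters are hypothetical choices for illustration.

```python
import numpy as np


def make_low_rank_ratings(n_users=50, n_items=40, rank=3, seed=0):
    """Synthetic (user, item, rating) triples from an exact rank-`rank` model."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 1.0, (n_users, rank))
    Q = rng.normal(0.0, 1.0, (n_items, rank))
    R = P @ Q.T
    # Every entry is observed, so a rank-`rank` factorization can fit it exactly.
    return [(u, i, R[u, i]) for u in range(n_users) for i in range(n_items)]


def sgd_mf(ratings, n_users, n_items, rank=3, lr=0.02, reg=0.0, epochs=100, seed=1):
    """Vanilla SGD: the step size `lr` stays constant for every update."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, (n_users, rank))
    Q = rng.normal(0.0, 0.1, (n_items, rank))
    order = np.arange(len(ratings))
    for _ in range(epochs):
        rng.shuffle(order)  # visit ratings in a fresh random order each epoch
        for k in order:
            u, i, r = ratings[k]
            err = r - P[u] @ Q[i]
            pu = P[u].copy()  # use the pre-update user vector for the item step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the given triples."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])))


if __name__ == "__main__":
    ratings = make_low_rank_ratings()
    P, Q = sgd_mf(ratings, n_users=50, n_items=40)
    print(rmse(ratings, P, Q))
```

Because the data is noise-free and exactly rank 3, every per-rating gradient vanishes at a perfect fit, so a constant step size can keep driving the training error toward zero instead of stalling.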
The FactorizationRecommender class has a bunch of attributes that aren't described in the documentation but which can still be set with create(), some of which control aspects of the solver. One of them, step_size_decrease_rate, seems to control "n" in a formula like
learning rate = init_rate / (lambda * t)^n
A similar formula appears in this paper, which is linked in the documentation here. lambda seems to be fixed at a value close to 1 and I can't find any attribute that controls it.
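To make the role of "n" concrete, the sketch below evaluates the schedule as written above for a few values of n, assuming lambda = 1 and t counting from 1. The helper `learning_rate` is hypothetical and only mirrors the guessed formula; it is not graphlab's internal code.

```python
def learning_rate(init_rate, t, n, lam=1.0):
    """Step size at step t (t >= 1) under init_rate / (lam * t) ** n.

    Hypothetical helper mirroring the guessed formula, not graphlab's code.
    n = 0 gives a constant step size, i.e. vanilla SGD.
    """
    return init_rate / (lam * t) ** n


if __name__ == "__main__":
    for n in (0.0, 0.5, 1.0):
        rates = [round(learning_rate(0.1, t, n), 4) for t in (1, 10, 100)]
        print(n, rates)
    # 0.0 [0.1, 0.1, 0.1]
    # 0.5 [0.1, 0.0316, 0.01]
    # 1.0 [0.1, 0.01, 0.001]
```

Under this assumed form, even n = 0.5 shrinks the step to a tenth of its initial value by t = 100, which would fit the observation that larger initial step sizes gave better results.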
Am I doing something wrong? Because of this, I've stopped using graphlab's recommenders. It would be nice to know how to make them fall back to vanilla SGD (which would correspond to n = 0, but step_size_decrease_rate doesn't go below 0.5), or how to control the learning rate schedule more fully.