Use case:
Given a piece of software, which other software is it most closely related to?
How is this done?
1- Compute topics for the corpus from the software descriptions (e.g., with Latent Dirichlet Allocation, LDA).
2- For each topic, you get the probability that a document belongs to it. Grouping documents by these probabilities (e.g., by most probable topic) creates clusters of software.
3- Given a new query (in this case a series of keywords), calculate which cluster it is most similar to.
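The three steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the software names and descriptions are made up, the topic model is a small collapsed Gibbs sampler for LDA (one common way to fit it, not necessarily the one intended here), the hyperparameters are arbitrary, and the query is scored with a naive product of topic-word probabilities rather than full LDA inference.

```python
import random
from collections import Counter

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling. docs: list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                        # tokens per topic
    assign = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # theta[d][k]: probability that document d belongs to topic k (step 2).
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    # phi[k][w]: probability of word w under topic k.
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return theta, phi

def query_topics(keywords, phi):
    """Naive query scoring (an assumption): product of phi over known keywords."""
    scores = [1.0] * len(phi)
    for w in keywords:
        if w in phi[0]:  # keywords outside the vocabulary are ignored
            scores = [s * phi[k][w] for k, s in enumerate(scores)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical corpus: software name -> description.
descriptions = {
    "imgtool": "image processing filter image resize pixel",
    "photolab": "image pixel filter photo image editing",
    "netscan": "network packet socket network scan port",
    "sockd": "socket network server packet port",
}
names = list(descriptions)
docs = [descriptions[n].split() for n in names]

# Step 1: topics for the corpus.
theta, phi = fit_lda(docs, n_topics=2)

# Step 2: cluster each software by its most probable topic.
clusters = {
    name: max(range(len(row)), key=row.__getitem__)
    for name, row in zip(names, theta)
}

# Step 3: find the cluster closest to a keyword query.
q = query_topics(["network", "port"], phi)
best = max(range(len(q)), key=q.__getitem__)
related = [name for name, k in clusters.items() if k == best]
print(clusters, related)
```

Hard assignment to the most probable topic is the simplest way to turn the per-document topic probabilities into clusters. Comparing the query distribution `q` against each row of `theta` with a distance measure would be a softer alternative.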
We could also define a metric based on graph similarity (still to be explored).