Week 8
Meeting 10
(August 4th, 2022)
GSOC 2022 weekly update
Attendees
- Gaurav Mishra
- Shaheem Azmal M MD
- Vivek Kumar
- Kaushlendra Pratap
- Ayush Bhardwaj
- Vasudev Maduri
- Sushant Kumar
- Rohit Pandey
- Feng Wenhan
- Soham Banerjee
- Samuel Dushimimana
- Krishna Mahato
Updates
- Doc2vec Model requires tagged document for training the model, such that it can make a vector for each document that can be further use to calculate cosine similarity for the license statement. Due to lack of memory in my computer I was not able to tag whole dataset. So currently for checking the working of model, this week I have trained a smaller part of dataset on Doc2vec Model.
- Created agent for the same trained model on atarashi and got the accuracy score of 8 percent.
Total files scanned = 100
Successfully matched = 8
++++++++++++++++++ Result ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++
---> Total time elapsed: 13.02 Seconds <---
---> Accuracy: 8.0% <---
++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++
- Working of agent is shown below:
{
"file": "/home/shushant/check.py",
"results": [
{
"description": "",
"shortname": "BSD-1-Clause",
"sim_score": 0.5472090244293213,
"sim_type": "semanticTextSim"
}
]
}
Conclusion and Further Plans
- Trainig has been done on a smaller part of dataset. Will train Doc2vec Model on whole dataset and see if any further improvement is needed or not.
- Will start working on the documentation for final evalutation.