Coding Week 1 Meeting
Attendees
- Anupam Ghosh
- Gaurav Mishra
- Vasudev
- Ayush Bharadwaj
- Shreya Singh
- Kaushlendra Pratap Singh
- Omar AbdelSamea
Discussions
- Get started with implementing NER and POS tagging on a sample copyright statement.
- Implement NER and POS tagging on a larger dataset obtained from FOSSology itself.
- Look for relations that can be implemented and defined as our logical checks for the detection of false positives.
- Use of spaCy, NLTK, or other libraries.
Week 1 Progress
- spaCy was chosen as the library for NER and POS tagging, since it supports both functions in other languages as well and should hold up well in the future.
- Pre-trained NER and POS tagging models were set up and run on test copyright statements.
- Set up FOSSology and gathered a synthetic dataset using the existing Copyright agent, which gave us a dataset covering all four quadrants of the prediction (true positives, false positives, true negatives, and false negatives).
- The codebase has been written in a way that helps in comparing the results and finding the relations.
- The wiki has been updated.
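As a rough illustration of the NER and POS tagging step noted above, here is a minimal sketch. It assumes the small English pre-trained model `en_core_web_sm`; the notes do not say which model was used. The sample statement and the helper names `tag_statement` and `demo` are hypothetical.

```python
from collections import namedtuple

# Simple container for the two kinds of annotations we collect.
TaggedStatement = namedtuple("TaggedStatement", ["pos_tags", "entities"])


def tag_statement(nlp, text):
    """Run a spaCy pipeline on one statement and collect POS tags and entities."""
    doc = nlp(text)
    # Coarse-grained part-of-speech tag for every token.
    pos_tags = [(token.text, token.pos_) for token in doc]
    # Named entities found by the pipeline's NER component.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return TaggedStatement(pos_tags, entities)


def demo():
    """Tag one hypothetical copyright statement with a pre-trained model.

    Requires: pip install spacy
              python -m spacy download en_core_web_sm
    """
    import spacy  # imported here so tag_statement stays usable without it

    nlp = spacy.load("en_core_web_sm")  # assumed model, not confirmed by the notes
    sample = "Copyright (C) 2015 Example Corp."  # hypothetical statement
    result = tag_statement(nlp, sample)
    for text, pos in result.pos_tags:
        print(f"{text}\t{pos}")
    for text, label in result.entities:
        print(f"{text}\t{label}")
```

Calling `demo()` prints each token with its POS tag, followed by the detected entities. The labels depend on the model version, so no expected output is shown here.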
The Colaboratory gist can be visited.
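The "four quadrants of the prediction" mentioned for the synthetic dataset can be tallied as in the sketch below. It assumes each sample is a pair of booleans: whether the Copyright agent flagged the text, and whether it really is a copyright statement. The function names and this data layout are hypothetical, not taken from the project codebase.

```python
from collections import Counter


def quadrant(predicted, actual):
    """Name the prediction quadrant for one (predicted, actual) pair."""
    if predicted and actual:
        return "TP"  # flagged, and really a copyright statement
    if predicted and not actual:
        return "FP"  # flagged, but not a copyright statement
    if not predicted and actual:
        return "FN"  # missed a real copyright statement
    return "TN"      # correctly left unflagged


def tally_quadrants(pairs):
    """Count how many samples fall into each of the four quadrants."""
    counts = Counter({"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for predicted, actual in pairs:
        counts[quadrant(predicted, actual)] += 1
    return dict(counts)
```

A dataset is useful for studying false positives only if the `FP` count is nonzero, so a tally like this is a quick sanity check before looking for relations.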
Conclusion and Further Plans
Implementation of POS tagging and NER was planned for the next few weeks.