Coding Week 6 Meeting
Attendees
- Anupam Ghosh
- Gaurav Mishra
- Vasudev
- Ayush Bharadwaj
- Shreya Singh
- Kaushlendra Pratap Singh
- Omar AbdelSamea
Discussions
- Checking results manually and understanding the edge cases.
- Implementation of the edge cases like Differentiating between ['Date'] and ['Cardinal].
- Figuring out the solutions for the DATE and CARDINAL anomaly.
- Go through different manually checked copyright CSV provided and The final CSV provided by Michael.
- Generating the Accuracy score for true positives.
- Thoughts for implementing our own NER model and creating an entity-based table according to our copyrights.
Week 6 Progress
- [Date] needed to be an important entity for copyright recognition. Various solutions have been implemented to get more and more accurate about it.
- Solution was: ['Cardinal'] --> was included into the NER list and then a python date format check has been implemented which will check whether the date is present in the NER["Entity"] list. (This is reducing the accuracy that means it is not working)
- I am working on another solution using regex which will help in removing one more kind of edge case.
- Divided the datasets into chunks of 10,000 and will traverse through it and check the ideal results on all over it.
- Accuracy Score for True Positives was calculated: 87.6% which will further increase after removing few more edge cases.
- The dataset also contains human errors and it is impacting our accuracy score for TP as well.
- Wiki has been Updated
Conclusion and Further Plans
Understanding the edge cases and calculating the accuracy score for statements more than 50 thousand statements.