Coding Week 6 Meeting

Attendees

  • Anupam Ghosh
  • Gaurav Mishra
  • Vasudev
  • Ayush Bharadwaj
  • Shreya Singh
  • Kaushlendra Pratap Singh
  • Omar AbdelSamea

Discussions

  • Checking results manually and understanding the edge cases.
  • Implementing handling for edge cases, such as differentiating between ['Date'] and ['Cardinal'].
  • Figuring out solutions for the DATE and CARDINAL anomaly.
  • Going through the manually checked copyright CSVs and the final CSV provided by Michael.
  • Generating the accuracy score for true positives.
  • Thoughts on implementing our own NER model and creating an entity-based table tailored to our copyrights.

Week 6 Progress

  • [Date] needs to be an important entity for copyright recognition. Several solutions have been tried to recognize it more accurately.
  • First solution: ['Cardinal'] was added to the NER list, and a Python date format check was implemented to check whether a date is present in the NER["Entity"] list. (This reduced the accuracy, so it is not working.)
  • I am working on another solution using regex, which will help remove one more kind of edge case.
  • Divided the dataset into chunks of 10,000 statements; each chunk will be traversed and its results checked against the ideal results.
  • Accuracy score for true positives was calculated: 87.6%. It is expected to increase after a few more edge cases are removed.
  • The dataset also contains human errors, which lower our accuracy score for TP as well.
  • Wiki has been updated.
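
The DATE/CARDINAL handling mentioned above could be approached roughly as in the sketch below. It is only an illustration, not the project's actual code: it assumes the NER output arrives as (text, label) pairs, and the names YEAR_OR_RANGE, is_year_like, and relabel_cardinals, as well as the year bounds, are hypothetical. The idea is to relabel a CARDINAL entity as DATE only when its text looks like a plausible year or year range.

```python
import re

# Assumed pattern (not from the project): a four-digit year, optionally
# followed by a second year forming a range such as "2004-2010".
YEAR_OR_RANGE = re.compile(r"^\s*(\d{4})(?:\s*-\s*(\d{4}))?\s*$")


def is_year_like(text, min_year=1900, max_year=2100):
    """Return True if text is a year or year range within assumed bounds."""
    match = YEAR_OR_RANGE.match(text)
    if match is None:
        return False
    years = [int(group) for group in match.groups() if group is not None]
    return all(min_year <= year <= max_year for year in years)


def relabel_cardinals(entities):
    """Relabel CARDINAL entities that look like years as DATE.

    entities is assumed to be a list of (text, label) pairs.
    """
    relabelled = []
    for text, label in entities:
        if label == "CARDINAL" and is_year_like(text):
            label = "DATE"
        relabelled.append((text, label))
    return relabelled


sample = [("2004-2010", "CARDINAL"), ("42", "CARDINAL"), ("2015", "DATE")]
print(relabel_cardinals(sample))
```

Restricting the check to four-digit values inside a year window is one way to keep ordinary numbers, such as version or page numbers, from being turned into dates. The window itself would need tuning against the manually checked CSVs.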

Conclusion and Further Plans

Understanding the remaining edge cases and calculating the accuracy score for more than 50 thousand statements.
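
A chunked evaluation over a large set of statements could look roughly like the sketch below. It rests on assumptions that the notes do not confirm: each row is taken to be an (expected, predicted) pair of booleans from the manually checked data and from the pipeline, and "accuracy score for true positives" is interpreted as the share of expected positives that the pipeline also flags. The function names are hypothetical.

```python
def chunks(rows, size=10_000):
    """Yield consecutive slices of rows with at most size items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def true_positive_accuracy(rows):
    """Share of expected positives that were also predicted positive.

    rows is assumed to be a list of (expected, predicted) booleans.
    Returns None when the rows contain no expected positives.
    """
    predictions = [predicted for expected, predicted in rows if expected]
    if not predictions:
        return None
    return sum(predictions) / len(predictions)


def evaluate_in_chunks(rows, size=10_000):
    """Return the per-chunk scores and the overall score."""
    per_chunk = [true_positive_accuracy(chunk) for chunk in chunks(rows, size)]
    return per_chunk, true_positive_accuracy(rows)


# Tiny illustrative data set with a chunk size of 2 instead of 10,000.
rows = [(True, True), (True, False), (False, False), (True, True), (True, True)]
per_chunk, overall = evaluate_in_chunks(rows, size=2)
print(per_chunk, overall)
```

Reporting the score per chunk as well as overall makes it easier to spot chunks where human labelling errors or a specific edge case pull the number down.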