Coding Week 6 Meeting

Checking results manually and understanding the edge cases.
Implementation of the edge cases like Differentiating between ['Date'] and ['Cardinal].
Figuring out the solutions for the DATE and CARDINAL anomaly.
Go through different manually checked copyright CSV provided and The final CSV provided by Michael.
Generating the Accuracy score for true positives.
Thoughts for implementing our own NER model and creating an entity-based table according to our copyrights.

[Date] needed to be an important entity for copyright recognition. Various solutions have been implemented to get more and more accurate about it.
Solution was: ['Cardinal'] --> was included into the NER list and then a python date format check has been implemented which will check whether the date is present in the NER["Entity"] list. (This is reducing the accuracy that means it is not working)
I am working on another solution using regex which will help in removing one more kind of edge case.
Divided the datasets into chunks of 10,000 and will traverse through it and check the ideal results on all over it.
Accuracy Score for True Positives was calculated: 87.6% which will further increase after removing few more edge cases.
The dataset also contains human errors and it is impacting our accuracy score for TP as well.
Wiki has been Updated

Understanding the edge cases and calculating the accuracy score for statements more than 50 thousand statements.