Skip to main content

Coding Week 4&5 Meeting

Attendees

  • Anupam Ghosh
  • Gaurav Mishra
  • Vasudev
  • Ayush Bharadwaj
  • Shreya Singh
  • Kaushlendra Pratap Singh
  • Omar AbdelSamea

Discussions

  • Checking results manually and understanding the edge cases.
  • Implementation of the edge cases like (c) --> copyright, Date needed to be mandatory.
  • Go through different manually checked copyright CSV provided and The final CSV provided by Michael.
  • Traversing the CSV provided by Michael and Implementing the algorithm over it.
  • Implementing a performance score with which the algorithms performance to detect the copyrights is been calculated.

Week 4&5 Progress

  • All the results from the different CSVs were traversed and few edge cases were predicted: Org/person name in a different language is impacting, (c) was not been predicted as copyright and it was excluded, [Date] needed to be an important entity for copyright recognition.
  • Solution was: (c) --> has been changed to "copyright" string and then it was fed to the algorithm, [Date] check has been implemented inside all the checks which individually helps as a final check before calling it a hit.
  • CSV provided by Michael contained 13lakh+ datasets that were not ideal to traverse through all of it once (Jupyter server crashed after continuous 10 hours running).
  • Divided the datasets into chunks of 10,000 and will traverse through it and check the ideal results on all over it.
  • Performance score was calculated {hitscore/No.of copyrights in list}*100, which came out as 82.65%
  • Wiki has been Updated

Conclusion and Further Plans

Understanding the edge cases and calculating the accuracy score over True Positives.