Skip to main content

Week 7

Attendees:

Updates from contributors

  • Soham Banerjee

    • Worked on update endpoint for customize page.
    • The GET endpoint now returns consistent type of data.
    • Will be working on file info page endpoints.
      • Found a bug in the customize page, integer fields are accepting strings.
  • Sushant Kumar

    • Was working on different ways to run ScanCode with different parameters.
    • Integrated the changes for running ScanCode via API, demonstrated the same in the meeting.
      • Worked on copyrights for now. And found API is faster (13 seconds) than CLI (23 seconds) as shown in demo.
    • Still need to work on reading output from the Python script and updating DB.
    • Request from mentor is to push the updated code for ScanCode for recommending updates.
      • Also, to work on changes requested on CDX PR #2507
  • Kavya Shukla

    • Updated the meeting minutes.
    • License is almost finished, will be focusing on Obligations.
    • Work on audit is not done, but test cases is done for other endpoints.
    • Can discuss on E2E testing in next call.
  • Abdelrahman jamal

    • Created a labled result for FOSSology's repo. The copyrights are color coded based on True Positive (green), False Positive (red), Different lang (blue), not actual copyright (grey), confusing (orange).
    • Used this data (14k +ve and 5K -ve) to train classifiers. Started with tf-idf and trained SVM, Random Forest, Navie Bayes.
      • NB can be told to have certial level of confidence before classifying a string.
    • Results are very good, >95% accuracy. Higher recall is aimed on identifying +ve copyrights.
    • Tested out Bert, but is slow and not very performant given the amount of data.
    • More data provided by the mentors.
  • Samuel Dushimimana

    • Working on unlinking folders endpoints.
    • Updating existing PRs.
    • Next steps for the project will be to write testcases and migrate to v2 of API.