Week 7

Attendees:

Updates from contributors

Soham Banerjee
- Worked on update endpoint for customize page.
- The GET endpoint now returns a consistent data type.
- Will be working on file info page endpoints.
  - Found a bug in the customize page: integer fields are accepting strings.
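
The integer-field bug noted above could be guarded against with a strict type check before a value is stored. The following is only a minimal sketch in Python, not the project's actual code; the function name `validate_int_field` and the field name `limit` are hypothetical.

```python
def validate_int_field(name, value):
    """Return value unchanged if it is a real integer, otherwise raise ValueError.

    Hypothetical helper for illustration only.
    """
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(
            f"{name} must be an integer, got {type(value).__name__}"
        )
    return value


# An integer passes through; a numeric string such as "5" is rejected
# instead of being silently accepted.
validate_int_field("limit", 5)
```
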
Sushant Kumar
- Was working on different ways to run ScanCode with different parameters.
- Integrated the changes for running ScanCode via its API and demonstrated them in the meeting.
  - Worked only on copyrights for now. The API was faster (13 seconds) than the CLI (23 seconds), as shown in the demo.
- Still needs to work on reading the output from the Python script and updating the DB.
- The mentor requested that the updated ScanCode code be pushed so that updates can be recommended.
  - Also to work on the changes requested on CDX PR #2507.
Kavya Shukla
- Updated the meeting minutes.
- License is almost finished; will focus on Obligations next.
- Work on audit is not done, but test cases are done for the other endpoints.
- E2E testing can be discussed in the next call.
Abdelrahman Jamal
- Created labeled results for FOSSology's repo. The copyrights are color coded: True Positive (green), False Positive (red), different language (blue), not an actual copyright (grey), confusing (orange).
- Used this data (14k positive and 5k negative samples) to train classifiers. Started with TF-IDF features and trained SVM, Random Forest, and Naive Bayes.
  - NB can be configured to require a certain level of confidence before classifying a string.
- Results are very good, with >95% accuracy. The aim is higher recall when identifying positive copyrights.
- Tested BERT, but it is slow and not very performant given the amount of data.
- More data provided by the mentors.
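
The point about Naive Bayes requiring a confidence level before it classifies a string can be sketched as below. This is a toy illustration under stated assumptions, not the contributor's classifier: it uses raw word counts instead of TF-IDF, the four training strings are made up, and the names `train_nb`, `posterior`, and `classify` are hypothetical.

```python
import math
from collections import Counter

# Made-up training strings for illustration only (not the project's data).
SAMPLES = [
    ("copyright 2020 acme inc", "pos"),
    ("copyright 2019 jane doe", "pos"),
    ("the copyright holder may", "neg"),
    ("see the license file", "neg"),
]


def train_nb(samples):
    """Collect per-class word counts, class counts, and the vocabulary."""
    word_counts = {}
    class_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def posterior(text, model):
    """Return P(label | text) under multinomial NB with add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n / total)  # class prior
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    # Normalise the log scores into probabilities in a numerically stable way.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}


def classify(text, model, threshold=0.9):
    """Return the most likely label, or None if the model is not confident enough."""
    probs = posterior(text, model)
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else None


model = train_nb(SAMPLES)
```

With this toy data, `posterior("copyright 2021 acme", model)` gives "pos" a probability of 0.75, so `classify` returns "pos" at `threshold=0.7` and abstains (returns `None`) at `threshold=0.9`. Lowering the threshold accepts more strings as classified, which is one way to trade precision for the higher recall mentioned above.
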
Samuel Dushimimana
- Working on the endpoints for unlinking folders.
- Updating existing PRs.
- Next steps for the project are to write test cases and migrate to v2 of the API.
