Week 7
(July,12,2023)
Attendees:
Updates:
Script Development
- Scancodes Library: Developed a script that utilizes the Scancodes library to extract copyrights from a directory. The script can be accessed here.
- Comparison Script: Created a second script that contrasts the copyrights identified by the Scancodes library with those identified by Fossology. This script can be found here.
Scancodes Library Observations
- Notably, the Scancodes library does not extract the copyright text verbatim from its source file. Instead, it identifies copyrights and then reconstructs them based on internal grammar rules. This necessitated modifications in the comparison code, such as converting the copyright symbol © to variants like (c), (C), and the word "copyright" followed by (c). Further examples are provided in the second gist.
Fossology Repository
- Almost concluded the review of copyrights in the Fossology repository. These copyrights are diverse and demanded heightened scrutiny compared to other repositories.
Conclusion and Further Plans:
Fossology Repository
- Conclude the review of copyrights.
Next Steps
- Transition to the task of copyright classification.