Week 19
(October 04, 2023)
Attendees:
Updates:
1. Integration of copyrightfpd into Fossology:
- Resolved speed issues from the previous week.
- Evaluated the model's performance on open-source projects from GitHub:
  - Ansible:
    - Initial count: 510 copyrights.
    - After false positive removal: 435.
    - Notable overlooked false positives:
      © b=eñyei',(c) for c in cmd))(c) for c in cmd), verbosity=1)© error',
  - Linux:
    - Initial count: 23,419 copyrights.
    - After false positive removal: 22,780.
    - Sample of overlooked errors:
      copyright/by:(c) | Contending |(c) container_of(c, struct wf_lm75_sensor, sens)(C) clock] */ clock-frequency = <12288000>; pwms = <&tpu 0 81 0>;(C) clock](c) (c->hva_dev->dev)
2. Enhancements in Decluttering using NER:
- Expanded labeled dataset for better NER performance.
- Integrated decluttering functionality into copyrightfpd and Fossology. Encountered minor integration issues, which are currently under investigation.
- Showcase of decluttering performance (highlighted parts are recognized copyright material):
Copyright (c) InQuant GmbH Stefan Eletzhofer <stefan.eletzhofer@inquant.de>
Copyright (c) 2001 Bill Bumgarner <bbum@friday.com>License: MIT, see below.
Copyright (C) 2001 Python Software Foundation, www.python.org Taken from Python2.2, License: PSF - see below.
Copyright (C) 2001 Python Software Foundation, www.python.orgTaken from Python2.2, License: PSF - see below.
copyright, i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation; All Rights Reserved" are retained in Python alone or in any derivative version prepared by Licensee.
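For the counts reported under item 1, the share of detections removed as false positives can be worked out directly. A minimal sketch in Python, using only the figures quoted in these notes; the names `counts` and `removal_stats` are hypothetical and are not part of copyrightfpd or Fossology:

```python
# Counts copied from the evaluation in item 1:
# (initial count, count after false positive removal).
counts = {
    "Ansible": (510, 435),
    "Linux": (23419, 22780),
}


def removal_stats(initial, remaining):
    """Return (number removed, percentage of the initial count removed)."""
    removed = initial - remaining
    return removed, 100.0 * removed / initial


for project, (initial, remaining) in counts.items():
    removed, share = removal_stats(initial, remaining)
    print(f"{project}: {removed} removed ({share:.1f}% of {initial})")
```

This gives 75 removed entries for Ansible (14.7%) and 639 for Linux (2.7%). These numbers only describe how many entries were filtered out; they do not measure how many of the remaining entries are still false positives, such as the overlooked samples listed above.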
Conclusion and Next Steps:
1. Renaming Task
- Rebrand copyrightfpd so that its name better reflects its Fossology integration.
2. Documentation
- Focus on updating and improving GSoC documentation.
3. Code Organization
- Document and structure the scattered code across Python notebooks for future readability and exploration.