Week 19
(October 04, 2023)
Attendees:
Updates:
1. Integration of copyrightfpd into Fossology:
- Resolved speed issues from the previous week.
- Evaluated the model's performance on open-source projects from GitHub:
- Ansible:
- Initial count: 510 copyrights.
- After false positive removal: 435.
- Notable overlooked false positives:
- © b=eñyei',
- (c) for c in cmd))
- (c) for c in cmd), verbosity=1)
- © error',
 
 
- Linux:
- Initial count: 23,419 copyrights.
- After false positive removal: 22,780.
- Sample of overlooked errors:
- copyright/by:
- (c) | Contending |
- (c) container_of(c, struct wf_lm75_sensor, sens)
- (C) clock] */ clock-frequency = <12288000>; pwms = <&tpu 0 81 0>;
- (C) clock]
- (c) (c->hva_dev->dev)
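
Note on the overlooked false positives: most of the samples above are code fragments that merely contain a "(c)" or "©" marker. A minimal sketch of a rule-based post-filter that could flag such strings is given below. This is not how copyrightfpd works (it is a trained model); the function names `looks_like_code` and `has_unbalanced_brackets` and the token list are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical heuristic, NOT part of copyrightfpd: flag candidates whose
# text after the copyright marker looks like source code.
MARKER = re.compile(r"\(c\)|©", re.IGNORECASE)
# Tokens that rarely occur in a real copyright notice (assumed list):
# "->", "=", "|", ";", or an identifier directly followed by "(".
CODE_TOKENS = re.compile(r"->|[=|;]|\w\(")
CLOSERS = {")": "(", "]": "[", "}": "{"}


def has_unbalanced_brackets(text: str) -> bool:
    """Return True if (), [] or {} are not properly paired in text."""
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return True
    return bool(stack)


def looks_like_code(candidate: str) -> bool:
    """Heuristically decide whether a copyright candidate is a code fragment."""
    # Drop the first marker so that "(c)" itself is not counted as a bracket pair.
    rest = MARKER.sub("", candidate, count=1)
    return bool(CODE_TOKENS.search(rest)) or has_unbalanced_brackets(rest)


if __name__ == "__main__":
    samples = [
        "(c) for c in cmd))",
        "(c) (c->hva_dev->dev)",
        "Copyright (c) 2001 Bill Bumgarner <bbum@friday.com>",
    ]
    for s in samples:
        print(looks_like_code(s), s)
```

The sketch does not cover every sample: strings without code-like tokens, such as "© error'," or "copyright/by:", pass through unflagged, so a filter like this could only complement the model, not replace it.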
 
 
 
2. Enhancements in Decluttering using NER:
- Expanded labeled dataset for better NER performance.
- Integrated decluttering functionality into copyrightfpd and Fossology. Encountered minor integration issues, which are currently under investigation.
- Showcase of decluttering performance (highlighted parts are recognized copyright material):
- Copyright (c) InQuant GmbH Stefan Eletzhofer <stefan.eletzhofer@inquant.de>
- Copyright (c) 2001 Bill Bumgarner <bbum@friday.com> License: MIT, see below.
- Copyright (C) 2001 Python Software Foundation, www.python.org Taken from Python2.2, License: PSF - see below.
- Copyright (C) 2001 Python Software Foundation, www.python.org- Taken from Python2.2, License: PSF - see below.
- copyright, i.e., "- Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation; All Rights Reserved" are retained in Python alone or in any derivative version prepared by Licensee.
 
Conclusion and Next Steps:
1. Renaming Task
- Rebrand copyrightfpd so that its name better reflects its Fossology integration.
2. Documentation
- Focus on updating and improving GSoC documentation.
3. Code Organization
- Document and structure the scattered code across Python notebooks for future readability and exploration.