Week 19
(October 04, 2023)
Attendees:
Updates:
1. Integration of copyrightfpd into Fossology:
- Resolved speed issues from the previous week.
- Evaluated the model's performance on open-source projects from GitHub:
- Ansible:
- Initial count: 510 copyrights.
- After false positive removal: 435.
- Notable overlooked false positives:
© b=eñyei',
(c) for c in cmd))
(c) for c in cmd), verbosity=1)
© error',
- Linux:
- Initial count: 23,419 copyrights.
- After false positive removal: 22,780.
- Sample of overlooked false positives:
copyright/by:
(c) | Contending |
(c) container_of(c, struct wf_lm75_sensor, sens)
(C) clock] */ clock-frequency = <12288000>; pwms = <&tpu 0 81 0>;
(C) clock]
(c) (c->hva_dev->dev)
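The counts above can be turned into removal rates with a short script. This is only a sketch for summarizing the reported numbers: the `FilterResult` class and its field names are hypothetical and are not part of copyrightfpd or Fossology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterResult:
    """Hypothetical container for one project's evaluation counts."""
    project: str
    initial: int    # copyright candidates before false positive removal
    remaining: int  # candidates left after false positive removal

    @property
    def removed(self) -> int:
        # Number of candidates the model discarded as false positives.
        return self.initial - self.remaining

    @property
    def removed_pct(self) -> float:
        # Share of the initial candidates that were discarded, in percent.
        return 100.0 * self.removed / self.initial


# Counts taken from the evaluation reported in these notes.
results = [
    FilterResult("Ansible", 510, 435),
    FilterResult("Linux", 23_419, 22_780),
]

for r in results:
    print(f"{r.project}: removed {r.removed} of {r.initial} ({r.removed_pct:.1f}%)")
```

With the reported counts, this gives 75 removed for Ansible (about 14.7%) and 639 removed for Linux (about 2.7%).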
2. Enhancements in Decluttering using NER:
- Expanded labeled dataset for better NER performance.
- Integrated decluttering functionality into copyrightfpd and Fossology. Encountered minor integration issues, which are currently under investigation.
- Showcase of decluttering performance (highlighted parts are recognized copyright material):
Copyright (c) InQuant GmbH Stefan Eletzhofer <stefan.eletzhofer@inquant.de>
Copyright (c) 2001 Bill Bumgarner <bbum@friday.com> License: MIT, see below.
Copyright (C) 2001 Python Software Foundation, www.python.org Taken from Python2.2, License: PSF - see below.
Copyright (C) 2001 Python Software Foundation, www.python.orgTaken from Python2.2, License: PSF - see below.
copyright, i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation; All Rights Reserved" are retained in Python alone or in any derivative version prepared by Licensee.
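The notes do not describe how the NER output is post-processed, so the following is only a minimal sketch of the decluttering step under one assumption: that the model returns character spans marking recognized copyright material. The `declutter` helper and the span below are hypothetical; the span is hand-written for illustration and is not actual model output.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def declutter(text: str, spans: List[Span]) -> str:
    """Keep only the text covered by the given entity spans."""
    # Sort spans so the kept pieces appear in document order.
    parts = [text[start:end].strip() for start, end in sorted(spans)]
    # Drop empty pieces and join the rest with single spaces.
    return " ".join(p for p in parts if p)


raw = "Copyright (c) 2001 Bill Bumgarner <bbum@friday.com> License: MIT, see below."

# Hand-written span (assumption): everything before the license remark.
spans = [(0, raw.index(" License:"))]

print(declutter(raw, spans))
```

Working on character spans rather than tokens keeps the original spacing and punctuation of the statement intact, which matters for email addresses and year lists.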
Conclusion and Next Steps:
1. Renaming Task
- Rebrand copyrightfpd to be more reflective of its Fossology integration.
2. Documentation
- Focus on updating and improving GSoC documentation.
3. Code Organization
- Document and structure the scattered code across Python notebooks for future readability and exploration.