Skip to main content

Week 19

(October,04,2023)

Attendees:

Updates:

1. Integration of copyrightfpd into Fossology:

  • Resolved speed issues from the previous week.
  • Evaluated the model's performance on open-source projects from GitHub:
    • Ansible:
      • Initial count: 510 copyrights.
      • After false positive removal: 435.
      • Notable overlooked false positives:
        • © b=eñyei',
        • (c) for c in cmd))
        • (c) for c in cmd), verbosity=1)
        • © error',
    • Linux:
      • Initial count: 23,419 copyrights.
      • After false positive removal: 22,780.
      • Sample of overlooked errors:
        • copyright/by:
        • (c) | Contending |
        • (c) container_of(c, struct wf_lm75_sensor, sens)
        • (C) clock] */ clock-frequency = <12288000>; pwms = <&tpu 0 81 0>;
        • (C) clock]
        • (c) (c->hva_dev->dev)

2. Enhancements in Decluttering using NER:

  • Expanded labeled dataset for better NER performance.
  • Integrated decluttering functionality into copyrightfpd and Fossology. Encountered minor integration issues which are currently under investigation.
  • Showcase of decluttering performance (highlighted parts are recognized copyright material):
    1. Copyright (c) InQuant GmbH Stefan Eletzhofer <stefan.eletzhofer@inquant.de>
    2. Copyright (c) 2001 Bill Bumgarner <bbum@friday.com> License: MIT, see below.
    3. Copyright (C) 2001 Python Software Foundation, www.python.org Taken from Python2.2, License: PSF - see below.
    4. Copyright (C) 2001 Python Software Foundation , www.python.org Taken from Python2.2, License: PSF - see below.
    5. copyright, i.e., " Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation ; All Rights Reserved" are retained in Python alone or in any derivative version prepared by Licensee.

Conclusion and Next Steps:

1. Renaming Task Rebrand

  • copyrightfpd to be more reflective of its Fossology integration.

2. Documentation

  • Focus on updating and improving GSoC documentation.

3. Code Organization

  • Document and structure the scattered code across Python notebooks for future readability and exploration.