Skip to main content

Week 5

(June 26, 2025 - July 02, 2025)

Meeting 1

(June 30, 2025)

Attendees

Discussions

  • Clarified the decision tree regarding the license-possibility pipeline.
  • Discussed how to improve the performance of KeywordAgent.

Meeting 2

(July 2, 2025)

Attendees

Discussions

  • Completed implementation of the KeywordAgent and raised a PR:
  • Evaluated KeywordAgent’s performance with and without licenseRef strings.
  • Found performance bottlenecks due to licenseRef expansion (~1600 keywords).
  • Identified bugs in Nirjas Agent that forced full file scans, further degrading performance.

However, all was not bad,

  • Despite challenges, observed significant improvement in certain scenarios:
    • On No_license_found files, using only regex-based patterns, scan time was reduced by ~50%.

Results

  1. Overall KeywordAgent accuracy: image

The overall accuracy was increased to a whopping 99.6%


  1. Without KeywordAgent being invoked: image

  2. With KeywordAgent being invoked: image

This led to ~49% increase in speed approximately, which is significant for large codebases.


  1. Without 1600 licenseRef keywords: image

  2. With 1600 licenseRef keywords: image

This is around 3.7 times more time consuming which is insanely more, which needs to be addressed.


Suggested Improvements

  1. Regex Pattern Enhancements:

    • New patterns proposed:
      Exception
      -[0-9]+\.[0-9]+
      -only-or-later
      Version\s[0-9]+\.[0-9]+
      Version-[0-9]+\.[0-9]+
      SPDX-License-Identifier
    • Goal: Reduce dependency on verbose keyword list and increase detection precision.
  2. Nirjas Bug Fixes:

    • Fixing the bug would allow comment-only scanning, avoiding unnecessary full-text analysis.
  3. Broader Integration Plan:

    • Outlined future development milestones:
      1. Separate download sources for files.
      2. Fix Nirjas to extract relevant portions only.
      3. Use KeywordAgent to skip non-relevant files based on keyword absence.
      4. Integrate the system with FOSSology.
      5. Handle edge cases and further optimize Nirjas agent.

image

Next Steps

  • Fix Nirjas comment-extraction bug.
  • Finalize updated regex patterns.
  • Refactor and optimize KeywordAgent to use refined patterns.