Week 5
(June 26, 2025 - July 02, 2025)
Meeting 1
(June 30, 2025)
Attendees
Discussions
- Clarified the decision tree regarding the license-possibility pipeline.
- Discussed how to improve the performance of KeywordAgent.
Meeting 2
(July 2, 2025)
Attendees
Discussions
- Completed implementation of the KeywordAgent and raised a PR:
- Evaluated KeywordAgent’s performance with and without licenseRef strings.
- Found performance bottlenecks due to licenseRef expansion (~1600 keywords).
- Identified bugs in Nirjas Agent that forced full file scans, further degrading performance.
- Despite these challenges, some scenarios showed significant improvement:
- On No_license_found files, using only regex-based patterns, scan time was reduced by ~50%.
- On
Results
- Overall KeywordAgent accuracy: increased to 99.6%.
- Without KeywordAgent being invoked:
- With KeywordAgent being invoked:
Invoking KeywordAgent made scans roughly 49% faster, which is significant for large codebases.
- Without 1600 licenseRef keywords:
- With 1600 licenseRef keywords:
Scanning with the licenseRef keywords takes around 3.7 times as long, a slowdown that needs to be addressed.
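The with/without comparisons above can be reproduced with a small timing harness. The following is a minimal sketch, not the project's benchmark code: `build_matcher`, `time_scan`, `slowdown_factor` and `percent_time_saved` are hypothetical helpers, and the keyword lists and input texts are whatever the caller supplies. The only idea taken from the notes is comparing scan time for a short pattern list against a much longer keyword list.

```python
import re
import time


def build_matcher(keywords):
    """Compile a case-insensitive alternation over literal keywords."""
    # Longest first, so a longer keyword wins over one of its prefixes.
    ordered = sorted(set(keywords), key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)


def time_scan(matcher, texts, repeats=3):
    """Return the best wall-clock time (seconds) to search every text once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            matcher.search(text)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown_factor(baseline_seconds, candidate_seconds):
    """How many times as long the candidate takes compared to the baseline."""
    return candidate_seconds / baseline_seconds


def percent_time_saved(before_seconds, after_seconds):
    """Percentage of the original scan time that was eliminated."""
    return 100.0 * (before_seconds - after_seconds) / before_seconds
```

Note that the two measures are not interchangeable: a 50% reduction in scan time corresponds to a slowdown factor of 2 in the other direction, so it helps to state which one a reported figure refers to.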
Suggested Improvements
- Regex Pattern Enhancements:
  - New patterns proposed:
    - Exception
    - -[0-9]+\.[0-9]+
    - -only-or-later
    - Version\s[0-9]+\.[0-9]+
    - Version-[0-9]+\.[0-9]+
    - SPDX-License-Identifier
  - Goal: Reduce dependency on the verbose keyword list and increase detection precision.
- Nirjas Bug Fixes:
  - Fixing the bug would allow comment-only scanning, avoiding unnecessary full-text analysis.
- Broader Integration Plan:
  - Outlined future development milestones:
    - Separate download sources for files.
    - Fix Nirjas to extract relevant portions only.
    - Use KeywordAgent to skip non-relevant files based on keyword absence.
    - Integrate the system with FOSSology.
    - Handle edge cases and further optimize Nirjas agent.
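The proposed patterns and the "skip files with no keyword hit" milestone can be sketched together as a small pre-filter. This is an illustration under stated assumptions, not the KeywordAgent implementation: `has_license_hint` and `select_files_for_full_scan` are hypothetical names, the input is assumed to be comment text that has already been extracted (how Nirjas does that is outside the sketch), and the listed `-only-or-later` entry is read here as the two SPDX suffixes `-only` and `-or-later`.

```python
import re

# Patterns taken from the notes. "-only-or-later" is read as the two SPDX
# suffixes "-only" and "-or-later"; that split is an assumption.
PROPOSED_PATTERNS = [
    r"Exception",
    r"-[0-9]+\.[0-9]+",
    r"-only",
    r"-or-later",
    r"Version\s[0-9]+\.[0-9]+",
    r"Version-[0-9]+\.[0-9]+",
    r"SPDX-License-Identifier",
]

# One compiled alternation, so each text is searched a single time.
LICENSE_HINT = re.compile("|".join(f"(?:{p})" for p in PROPOSED_PATTERNS))


def has_license_hint(text):
    """True if any of the proposed patterns occurs in the text."""
    return LICENSE_HINT.search(text) is not None


def select_files_for_full_scan(comments_by_file):
    """Keep only the files whose comment text contains a license hint.

    comments_by_file maps a file name to its already-extracted comments.
    """
    return [name for name, comments in comments_by_file.items()
            if has_license_hint(comments)]


# Hypothetical input: one file with an SPDX tag, one without any hint.
example = {
    "a.c": "SPDX-License-Identifier: GPL-2.0-only",
    "b.py": "helper functions for parsing dates",
}
print(select_files_for_full_scan(example))  # ['a.c']
```

The patterns are matched case-sensitively, as written in the notes. A short list like this is cheap to evaluate but broad: a bare `Exception` also matches comments unrelated to licensing, so files that pass the filter would still need the full scanner.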
Next Steps
- Fix Nirjas comment-extraction bug.
- Finalize updated regex patterns.
- Refactor and optimize KeywordAgent to use refined patterns.