Week 4
(June 17, 2025 - June 25, 2025)
Meeting 1
(June 25, 2025)
Attendees
Discussions
- Demonstrated the working KeywordAgent, built using
SRINGS.in
from the Nomos agent. - Shared the evaluator script results tested on NomosTestFiles — yielded ~99.5% accuracy.
- Debugged cases with true negatives and realized the need for additional keyword sources.
- Finalized a two-stage filtering architecture for license detection pre-check.
- Walked through code improvements and refactoring done on the Atarashi base code.
KeywordAgent Implementation
Implemented a rule-based keyword detection agent using regex patterns derived from SRINGS.in
. Keywords included:
acknowledg(e|ement|ements)?
agreement
as[\s-]is
copyright
damages
deriv(e|ed|ation|ative|es|ing)
redistribut(e|ion|able|ing)?|distribut(e|ion|able|ing)?
free software
grant
indemnif(i|y|ied|ication|ying)?
intellectual propert(y|ies)?
[^e]liabilit(y|ies)?
licencs?
mis[- ]?represent
open source
patent
permission
public[\s-]domain
require(s|d|ment|ments)?
same terms
see[\s:-]*(https?://|file://|www.|[A-Za-z0-9._/-]+)
source (and|or)? ?binary
source code
subject to
terms and conditions
warrant(y|ies|ed|ing)?
without (fee|restrict(ion|ed)?|limit(ation|ed)?)
severability clause
Evaluation Results
- Ran the KeywordAgent on NomosTestFiles.
- Achieved ~99.5% accuracy, confirming robustness of regex pattern matching.
- Detected minor edge cases (true negatives) which informed the next steps for keyword expansion.
Code Improvements
- Refactored parts of the Atarashi codebase:
- Applied Python best practices (docstrings, function decomposition, consistent naming).
- Improved error handling and logging in preprocessing and agent workflows.
- Commit: https://github.com/fossology/atarashi/compare/master...rajuljha:atarashi:feat/newagent/Keyword
Two-Stage Detection Plan
Decided on a layered pre-check system for Atarashi Scanner:
- Stage 1: Match against the initial keyword list (from SRINGS.in).
- Stage 2: If Stage 1 fails, match against license shortnames and FOSSology license_ref strings.
- If neither stage matches, the file is skipped from scanning.
- If either matches, it is scanned using Atarashi.