Skip to main content

Week 4

(June 17, 2025 - June 25, 2025)

Meeting 1

(June 25, 2025)

Attendees

Discussions

  • Demonstrated the working KeywordAgent, built using SRINGS.in from the Nomos agent.
  • Shared the evaluator script results tested on NomosTestFiles — yielded ~99.5% accuracy.
  • Debugged cases with true negatives and realized the need for additional keyword sources.
  • Finalized a two-stage filtering architecture for license detection pre-check.
  • Walked through code improvements and refactoring done on the Atarashi base code.

KeywordAgent Implementation

Implemented a rule-based keyword detection agent using regex patterns derived from SRINGS.in. Keywords included:

acknowledg(e|ement|ements)?
agreement
as[\s-]is
copyright
damages
deriv(e|ed|ation|ative|es|ing)
redistribut(e|ion|able|ing)?|distribut(e|ion|able|ing)?
free software
grant
indemnif(i|y|ied|ication|ying)?
intellectual propert(y|ies)?
[^e]liabilit(y|ies)?
licencs?
mis[- ]?represent
open source
patent
permission
public[\s-]domain
require(s|d|ment|ments)?
same terms
see[\s:-]*(https?://|file://|www.|[A-Za-z0-9._/-]+)
source (and|or)? ?binary
source code
subject to
terms and conditions
warrant(y|ies|ed|ing)?
without (fee|restrict(ion|ed)?|limit(ation|ed)?)
severability clause

Evaluation Results

  • Ran the KeywordAgent on NomosTestFiles.
  • Achieved ~99.5% accuracy, confirming robustness of regex pattern matching.
  • Detected minor edge cases (true negatives) which informed the next steps for keyword expansion.

Code Improvements

  • Refactored parts of the Atarashi codebase:
    • Applied Python best practices (docstrings, function decomposition, consistent naming).
    • Improved error handling and logging in preprocessing and agent workflows.

Two-Stage Detection Plan

image

Decided on a layered pre-check system for Atarashi Scanner:

  1. Stage 1: Match against the initial keyword list (from SRINGS.in).
  2. Stage 2: If Stage 1 fails, match against license shortnames and FOSSology license_ref strings.
  3. If neither stage matches, the file is skipped from scanning.
  4. If either matches, it is scanned using Atarashi.