Skip to main content

Meeting 8

(July 18,2024)

Attendees:

Discussion:

  • Improvements on semantic search (somehow)
  • test nomos Discussed several interesting ideas including license compatibility and obligations and so on

Evaluation of Semantic Search on Nomos Test Dataset

  • Dataset Overview:
    The Nomos test dataset consists of 2054 rows, but I evaluated the algorithm using only 1000 rows to gather initial results.

  • Initial Results:

    1. Accuracy: 65.1%
    2. Coverage: 58.2%

    Challenges:

    • My parser isn't perfect, and there are cases where the algorithm matches minor variations of licenses (e.g., AFL-1.2 instead of AFL-1.1), which is counted as incorrect.
    • The real accuracy is likely higher, in the range of 70-75%, considering these minor variations.
  • Next Steps:

    1. Refine the parser to better handle license versions and variations.
    2. Reevaluate accuracy with a more comprehensive dataset to improve these metrics.

Work on License Obligations

  • Introduction to License Obligations:
    License obligations refer to the specific legal requirements imposed by open-source licenses to ensure compliance. I began working on this aspect to explore its potential in the project.

    Key Concepts:
    OSADL (Open Source Automation Development Lab) has developed the "Open Source License Obligations Checklists" project, which helps organizations comply with open-source licenses by:

    1. Encoding Obligations: Defining what actions are required or prohibited by different licenses.
    2. Creating Checklists: Structured lists of obligations for various licenses.
    3. Evaluating Compatibility: Assessing how different licenses interact and whether they can be used together.
  • Progress:
    I started by using OSADL’s checklist as a framework for obligations. Here’s a link to OSADL's obligations for various licenses.

    Example Obligation for the Academic Free License v2.0 (AFL-2.0):

    USE CASE Source code delivery OR Binary delivery  
    YOU MUST Reference License text
    YOU MUST Search License acceptance
    ATTRIBUTE Reasonable
    IF Software modification
    YOU MUST Forward Copyright notices
    YOU MUST Forward Patent notice
    YOU MUST Forward Trademark notice
    YOU MUST Forward License notice
    YOU MUST Provide Modification notice
    YOU MUST NOT Promote
    IF Modified work Under Original license
    EITHER
    YOU MUST Include Source code Of Modified work
    ATTRIBUTE Machine-readable
    OR
    YOU MUST Provide Delayed source code delivery Of Modified work
    ATTRIBUTE Machine-readable
    ATTRIBUTE Via Internet
    ATTRIBUTE No profit
    ATTRIBUTE Duration As long as distributed
    YOU MUST Reference Source code
    ATTRIBUTE Below Copyright notices
    PATENT HINTS Yes

Experimenting with License Obligation Conversion via LLM

  • Objective:
    I attempted to evaluate whether an LLM could generate license obligations similar to OSADL’s checklist through prompt engineering.

  • Initial Results:
    Using prompt engineering, I tested multiple LLMs to generate obligations directly from license texts. However, the results were inconsistent due to:

  1. Ambiguity in Interpretation: OSADL's obligations are highly detailed and rely on specific legal interpretations. These are difficult for an LLM to replicate without detailed context.
  2. LLM Limitations: While the LLM could generate obligations, they were not structured or detailed enough to meet the OSADL standard.
  • Challenges and Next Steps:
  1. The LLM produced obligations, but the results did not meet the precision required, mainly due to different interpretations.
  2. Moving forward, I plan to refine the prompt further and explore additional ways to guide the LLM towards creating obligations more aligned with legal standards.

Conclusions and Next Steps

  • Further Work on Obligations:
    I will continue working on converting licenses into obligations, refining the prompt-engineering process and exploring different approaches to improve the LLM’s performance.

  • Other Tasks:

  1. Begin acknowledging licenses from notice files, extending the current capability of license identification.
  2. Continue refining the parser and matching algorithm to better handle edge cases and improve overall accuracy and speed.

Additional Discussions and Potential Advancements

In this meeting, several potential directions forward were discussed regarding obligations and other interesting license-related applications:

  1. Validate Obligation Rule Against License Text:

    • Provide an obligation rule (e.g., "YOU MUST Provide Modification report") to the LLM.
    • Ask the LLM to check if this rule exists within the license text and provide an explanation.
  2. Potential Advancements

    1. Create Human-Readable Obligations from Machine-Readable Obligations:

      • Convert OSADL's structured, machine-readable obligations into a more user-friendly format.
    2. Check License Compatibility Using Obligations or License Text:

      • Explore using obligations or the raw license text to determine if different licenses are compatible with each other.
    3. Convert License Text to Obligations Using LLM:

      • Continue refining the prompt engineering process to improve the LLM's ability to generate obligations directly from license text.

    These potential advancements offer various avenues for further exploration and development within the project.