Skip to main content

Meeting 10

(August 1,2024)

Attendees:

Discussion:

  • Continued work on license obligations and compatibility
  • Example analyses using LLM for license obligations conversion
  • Evaluating accuracy of LLM-based approaches for legal obligations

Experimenting with License Obligation Conversion via LLM

  • Objective:
    I focused on converting license texts to obligations using LLMs through a structured prompt. The goal was to extract obligations similar to OSADL's structured format.

  • Prompt Used:

    Convert the Following License Text to License Obligations License Text: {text}
  • Results:
    Several examples were tested, and while the LLM was able to generate obligations in most cases, they were not always fully aligned with the structured approach required by OSADL.

Example 1: Universal Permissive License (UPL), Version 1.0:
Obligations from LLM:

Include Copyright notices
Include License text OR Reference To https://unlicense.org/

Actual Obligations:

USE CASE Source code delivery OR Binary delivery
YOU MUST Provide Copyright notices
EITHER
YOU MUST Provide License text
OR
YOU MUST Reference License text

Example 2: Bitstream Vera Font License:
Obligations from LLM:

Include Copyright notices
Include Trademark notice
Include License text
Rename Font if Modified AND Font Name Contains "Bitstream" OR "Vera"
Do Not Sell Font By Itself
Do Not Promote

Actual Obligations:

USE CASE Font delivery
YOU MUST Provide Copyright notices
ATTRIBUTE In Font File
YOU MUST Provide Trademark notice
ATTRIBUTE In Font File
YOU MUST Provide License text
ATTRIBUTE In Font File
IF Font Modification
YOU MUST NOT Use "Bitstream" OR "Vera" In Font Name
YOU MUST NOT Sell Font
ATTRIBUTE Stand-alone
YOU MUST NOT Promote

Example 3: Attribution Assurance License:
Obligations from LLM:

Source code delivery
Provide License text
Binary delivery
Provide License text
Display Attribution notice
Name
Professional identification
URL
No Trademark endorsement

Actual Obligations: None as this license doesn't have a corresponding OSADL obligation

Challenges:
The LLM struggles with complex license structures and nuances. For example, legal experts are often required to interpret certain clauses accurately. With legal supervision, LLMs can help as an initial draft tool but are not a replacement for expert evaluation.

Evaluation of Clause Accuracy in License Obligations

  • Objective:
    I developed a prompt to check if specific clauses from obligations match the license text.

  • Prompt Used:

    [Task]
    Evaluate the accuracy of each clause within the provided set of obligations against the given open-source license text. Determine if each clause is valid (supported by the license), invalid (contradicts the license), or partially valid (partially accurate or open to interpretation).

    [Instructions]
    1. Carefully analyze the open-source license text.
    2. Examine each clause within the provided obligations.
    3. Compare each clause to the relevant sections of the license text.
    4. Categorize each clause as valid, invalid, or partially valid.
    5. Provide a clear explanation for each assessment, citing specific license text sections and any interpretations.
    6. Present your analysis in the following list format only, without any additional text or commentary:
    Clause: [Clause text]
    Result: [valid/invalid/partially valid]
    Explanation: [Your detailed explanation, citing specific license text sections and any relevant interpretations]
    7. Be Concise and only follow the output format provided in Instruction 6 without any introductions or conclusions

    [Additional Notes]
    1. if a clause contains multiple parts, assess each part separately and then as a whole.
    2. Be concise yet thorough in your explanations.
    3. Use clear, unambiguous language.
    4. Maintain a professional, objective tone.

    [License Text]
    {license_text}

    [Corresponding Obligations]
    {obligations}
  • Results:
    The LLM was able to categorize clauses as valid, invalid, or partially valid with ~93% accuracy. False negatives were primarily due to minor ambiguities or interpretation issues. After manual review, the accuracy is estimated to be closer to 95-96%.

Example: W3C Software License
License Text:

This work is being provided by the copyright holders under the following license.

License
By obtaining and/or copying this work, you (the licensee) agree that you have read, understood, and will comply with the following terms and conditions.

Permission to copy, modify, and distribute this work, with or without modification, for any purpose and without fee or royalty is hereby granted, provided that you include the following on ALL copies of the work or portions thereof, including modifications:

• The full text of this NOTICE in a location viewable to users of the redistributed or derivative work.
• Any pre-existing intellectual property disclaimers, notices, or terms and conditions. If none exist, the W3C Software and Document Short Notice should be included.
• Notice of any changes or modifications, through a copyright statement on the new code or document such as "This software or document includes material copied from or derived from [title and URI of the W3C document]. Copyright (c) [YEAR] W3C® (MIT, ERCIM, Keio, Beihang)."

Disclaimers
THIS WORK IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE OR DOCUMENT WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE SOFTWARE OR DOCUMENT.

The name and trademarks of copyright holders may NOT be used in advertising or publicity pertaining to the work without specific, written prior permission. Title to copyright in this work will at all times remain with copyright holders.

Obligations:

USE CASE Source code delivery OR Binary delivery
YOU MUST Provide License text
ATTRIBUTE Viewable
YOU MUST Provide Legal notices
IF NOT Legal notices
YOU MUST Provide Copyright notice "W3C Software and Document Short Notice https://www.w3.org/Consortium/Legal/2015/copyright-software-short-notice.html"
IF Software modification
YOU MUST Provide Copyright notice AND Modification notice
YOU MUST NOT Promote

LLM Response:

Clause: YOU MUST Provide License text
Result: Valid
Explanation: The license text explicitly states that the licensee agrees to comply with the terms and conditions, which includes providing the full text of the NOTICE (license text) on ALL copies of the work or portions thereof, including modifications. (Section: Permission to copy, modify, and distribute this work)

Clause: ATTRIBUTE Viewable
Result: Partially Valid
Explanation: The license text does not explicitly mention the requirement to attribute the work as "Viewable". However, it does require the inclusion of the full text of the NOTICE (license text) on ALL copies of the work or portions thereof, including modifications. This implies that the attribution should be viewable, but the exact wording is not specified. (Section: Permission to copy, modify, and distribute this work)

Clause: YOU MUST Provide Legal notices
Result: Valid
Explanation: The license text requires the inclusion of any pre-existing intellectual property disclaimers, notices, or terms and conditions on ALL copies of the work or portions thereof, including modifications. This implies that the licensee must provide legal notices. (Section: Permission to copy, modify, and distribute this work)

Clause: IF NOT Legal notices
Result: Valid
Explanation: The license text requires the inclusion of the W3C Software and Document Short Notice if no pre-existing intellectual property disclaimers, notices, or terms and conditions exist. This implies that the licensee must provide a legal notice if none exist. (Section: Permission to copy, modify, and distribute this work)

Clause: IF Software modification
Result: Valid
Explanation: The license text requires the inclusion of the full text of the NOTICE (license text) on ALL copies of the work or portions thereof, including modifications. Additionally, it requires the inclusion of a copyright notice and modification notice if the software is modified. This implies that the licensee must provide a copyright notice and modification notice if the software is modified. (Section: Permission to copy, modify, and distribute this work)

Clause: YOU MUST NOT Promote
Result: Valid
Explanation: The license text explicitly states that the name and trademarks of copyright holders may NOT be used in advertising or publicity pertaining to the work without specific, written prior permission. This implies that the licensee must not promote the work without permission. (Section: Disclaimers)

Challenges:
Some clauses were marked as partially valid due to wording differences or interpretation differences, which could be refined with further work.

Work on License Compatibility Analysis

  • Objective:
    I experimented with checking license compatibility using both license texts and obligations. The analysis was done by comparing attributes like attribution requirements, copyleft conditions, and modification rights.

  • Prompt Used:

    License Obligation Analysis

    Analyze the obligations imposed on licensees by the two license texts provided below. Consider the following key aspects:

    1. **Attribution:**
    * Do the licenses require attribution of the original author(s)? If so, what specific information must be included, and in what format?
    * Are there any restrictions on how the attribution can be presented (e.g., size, placement)?

    2. **Copyleft/ShareAlike:**
    * Do the licenses include copyleft or ShareAlike provisions? If so, what obligations do these impose on the distribution of derivative works?
    * Are there any exceptions or exemptions to the copyleft/ShareAlike requirements?

    3. **Modification and Distribution:**
    * Do the licenses allow modification of the software? If so, are there any conditions or restrictions on how the modified software can be distributed?
    * Are there any requirements for the disclosure of source code or changes made to the software?

    4. **Commercial Use:**
    * Do the licenses explicitly permit or restrict commercial use of the software?
    * If commercial use is permitted, are there any specific obligations or limitations associated with it?

    5. **Patent Grants:**
    * Do the licenses include any patent grants or licenses? If so, what rights do they grant, and under what conditions?

    6. **Liability and Warranty Disclaimers:**
    * Do the licenses include disclaimers of liability or warranty? If so, what are the specific terms and limitations of these disclaimers?

    Based on your analysis, provide a comprehensive summary of the obligations imposed by each license. Highlight any potential conflicts or ambiguities between the obligations of the two licenses.

    Overall Verdict:

    Based on the analysis above, provide an overall verdict on the relationship between the obligations of the two licenses. Are the obligations generally compatible, or are there significant conflicts that would make it difficult or impossible to comply with both licenses simultaneously? Briefly explain the reasoning behind your verdict.

    License Text A:
    [Insert the full text of License A here]

    License Text B:
    [Insert the full text of License B here]
  • Results:
    The LLM produced somewhat accurate results but often misinterpreted certain compatibility conflicts. While it was able to detect simple compatibility issues, it struggled with more nuanced cases.

Challenges:
The LLM occasionally implied conflicts where none existed and missed actual conflicts in more complex cases. Further refinement of the prompt and evaluation approach is needed.

Next Steps and Conclusion

  • Refinement of License Identification Process:
    I plan to shift focus away from full license identification using LLMs. Instead, the goal will be to have the LLM extract license-identifying parts from comments, with further analysis done using semantic search or algorithms to match the text to known licenses.

  • PR to Atarashi Repository:
    I will make a pull request (PR) to the Fossology Atarashi repository with the semantic search functionality, but I will focus less on working on it going forward.

Overall, the project continues to show promise, with LLMs proving capable of initial obligation drafting, but further work is needed to ensure precision and reduce interpretation ambiguities.