Skip to main content

Meeting 12

(August 15,2024)

Attendees:

  • No meeting held due to a national holiday.

Discussion:

Atarashi Integration and Build Process

  • Build Errors and Code Integration:

    1. Approach: Fixed the errors with the Atarashi build process, successfully integrating the semantic search code into the codebase.

    2. Evaluation: Began evaluating the integrated semantic search agent within Atarashi. Some issues were identified where the agent missed certain cases, and efforts were made to refine these areas to improve performance.

LLM Experimentation for License-Relevant Text Detection

  • LLM Selection and Testing:

    1. Objective: Continued experimenting with various LLMs to determine which was best suited for detecting license-relevant text in code files.

    2. Results: Tested Mistral 7b, Gemma 2 9b, and LLama 3 8b. All models performed well in general, but from my experience, Gemma 2 9b was the best performer. While no formal metrics were used, Gemma 2 9b appeared to handle more nuanced cases better than the other two models. Being the latest release, this was not unexpected, but all models were largely able to detect the license-relevant text without significant issues.

Conclusions and Next Steps

  • Refine Atarashi Agent: Continue to work on refining the semantic search agent in Atarashi to address the missed cases and improve overall accuracy.

  • LLM Selection: Focus on further experimenting with Gemma 2 9b for more comprehensive testing and continue exploring ways to quantify its performance in comparison to the other models.