Meeting 12
(August 15, 2024)
Attendees:
- No meeting held due to a national holiday.
Discussion:
- Atarashi Integration and Build Process
  - Build Errors and Code Integration:
    - Approach: Fixed the errors in the Atarashi build process and successfully integrated the semantic search code into the codebase.
    - Evaluation: Began evaluating the integrated semantic search agent within Atarashi. The agent missed certain cases, and work was done to refine these areas and improve performance.
- LLM Experimentation for License-Relevant Text Detection
  - LLM Selection and Testing:
    - Objective: Continued experimenting with various LLMs to determine which is best suited for detecting license-relevant text in code files.
    - Results: Tested Mistral 7B, Gemma 2 9B, and Llama 3 8B. No formal metrics were used, so the comparison is qualitative. All three models were largely able to detect the license-relevant text without significant issues, but in my experience Gemma 2 9B performed best and appeared to handle nuanced cases better than the other two. As Gemma 2 is the latest release of the three, this was not unexpected.
Conclusions and Next Steps
- Refine Atarashi Agent: Continue to work on refining the semantic search agent in Atarashi to address the missed cases and improve overall accuracy.
- LLM Selection: Focus on further experimenting with Gemma 2 9b for more comprehensive testing and continue exploring ways to quantify its performance in comparison to the other models.
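
One possible way to quantify the comparison is to hand-label a small set of text snippets as license-relevant or not, then score each model's yes/no answers with precision, recall, and F1. The sketch below is only an illustration under that assumption: the `score` helper, the `Scores` type, the labels, and the `model_a`/`model_b` names are all hypothetical placeholders, not results from the models tested.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score(predicted: list[bool], expected: list[bool]) -> Scores:
    """Compare a model's yes/no answers against hand-labeled ground truth."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(p and e for p, e in zip(predicted, expected))
    fp = sum(p and not e for p, e in zip(predicted, expected))
    fn = sum(not p and e for p, e in zip(predicted, expected))

    # Guard against division by zero when a model never answers "yes"
    # or when the labeled set contains no positive examples.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return Scores(precision, recall, f1)


# Hypothetical ground truth: True means the snippet is license-relevant.
expected = [True, True, False, True, False]

# Hypothetical model answers for the same five snippets (placeholder data).
predictions = {
    "model_a": [True, True, False, False, False],
    "model_b": [True, True, True, True, False],
}

for name, predicted in predictions.items():
    s = score(predicted, expected)
    print(f"{name}: precision={s.precision:.2f} recall={s.recall:.2f} f1={s.f1:.2f}")
```

Recall is likely the more important number here, since a missed license statement is usually costlier than a false alarm, but that weighting is a judgment call for the project.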