Week 4

Meeting 6

(July 07th, 2022)

GSoC 2022 weekly update

Attendees

Discussions

+-- linearsvc
│   +-- LICENSE
│   +-- MANIFEST.in
│   +-- README.md
│   +-- setup.py
│   +-- src
│       +-- linearsvc
│       │   +-- data
│       │   │   +-- linearsvc
│       │   +-- __init__.py
│       +-- model_train.py
+-- logreg
    +-- LICENSE
    +-- MANIFEST.in
    +-- README.md
    +-- setup.py
    +-- src
        +-- logreg
        │   +-- data
        │   │   +-- logreg
        │   +-- __init__.py
        +-- model_train.py

  • Modified __init__.py under the src folder of both Python packages as suggested:

    • As the code below shows, the linearsvc class has two methods:
      1. linearsvc.classify() returns the model classifier, which can then be used to predict the license shortname for the Atarashi agent by calling its predict() function.
      2. linearsvc.predict_shortname() runs the prediction on the preprocessed file passed to the constructor and returns the license shortname directly.
    • Similar functions have been implemented for the logreg model as well.

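        import pickle

        from pkg_resources import resource_filename  # assumed import; not shown in the original snippet


        class linearsvc():
            def __init__(self, preprocessed_file):
                self.preprocessed_file = preprocessed_file

            def classify(self):
                # Locate the pickled model shipped inside the package and load it.
                data = resource_filename("linearsvc", "data/linearsvc")
                with open(data, 'rb') as f:
                    classifier = pickle.load(f)
                return classifier

            def predict_shortname(self):
                # Predict the license shortname for the preprocessed input.
                predictor = self.classify()
                return predictor.predict(self.preprocessed_file)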
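
The two call styles described above can be sketched as follows. This is only an illustration: the pickled model is not available here, so a hypothetical StubClassifier stands in for it, LinearSVCSketch is a made-up name that mirrors the shape of the package class, and the input is assumed to be a list of preprocessed license texts.

```python
class StubClassifier:
    """Hypothetical stand-in for the unpickled model (not the real classifier)."""

    def predict(self, samples):
        # Return one fixed, made-up label per input sample.
        return ["MIT" for _ in samples]


class LinearSVCSketch:
    """Same two-method shape as the package class, with the pickle load stubbed out."""

    def __init__(self, preprocessed_file):
        self.preprocessed_file = preprocessed_file

    def classify(self):
        # The package class unpickles the bundled model; the sketch returns the stub.
        return StubClassifier()

    def predict_shortname(self):
        return self.classify().predict(self.preprocessed_file)


# Assumed input shape: a list of preprocessed license texts.
texts = ["permission is hereby granted free of charge"]
model = LinearSVCSketch(texts)

# Route 1: fetch the classifier, then call predict() on it.
via_classifier = model.classify().predict(texts)

# Route 2: let the wrapper perform both steps.
via_wrapper = model.predict_shortname()
```

Both routes go through the same classifier, so they should return the same result; the wrapper only saves the caller one step.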

Conclusion and Further Plans

  • Will make changes according to further suggestions.
  • Will start implementing okapi_BM25 in place of TfidfTransformer to rank the license texts in the dataset used for training the models, then compare which of the two works better on that dataset.
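
To make the planned comparison concrete, here is a minimal pure-Python sketch of the two weighting schemes. It is not the project's implementation: the function names are hypothetical, the corpus is a made-up toy example, k1=1.5 and b=0.75 are just commonly used BM25 defaults, and the TF-IDF variant uses a smoothed idf without the L2 normalization that a TfidfTransformer would normally apply.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many tokenized documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def tfidf_weights(docs):
    """Raw term count times smoothed idf, per term per document (unnormalized)."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


def bm25_weights(docs, k1=1.5, b=0.75):
    """Okapi BM25 term weights with a non-negative idf variant."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # Longer-than-average documents are penalized through this term.
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        weights.append({
            term: math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            * count * (k1 + 1) / (count + length_norm)
            for term, count in tf.items()
        })
    return weights


# Toy corpus: made-up token lists, not the real license dataset.
corpus = [
    "permission is hereby granted free of charge".split(),
    "redistribution and use in source and binary forms".split(),
    "free free free free software".split(),
]
tfidf = tfidf_weights(corpus)
bm25 = bm25_weights(corpus)
```

The main behavioral difference is saturation: in this TF-IDF variant a term repeated four times gets four times the weight, whereas the BM25 weight grows more slowly with repetition and is also adjusted for document length. Whether that helps on license texts is exactly what the planned comparison would need to show.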