New Model Tutorial
This example walks through adding a new model and scoring it on existing benchmarks. Everything can be developed locally with full access to publicly available benchmarks, but we strongly encourage you to submit your model to Brain-Score to make it accessible to the community, and to make it testable on future benchmarks.
If you haven’t already, check out other models and the docs.
Adding the model plugin
We require models to implement the ArtificialSubject API. This interface is the central communication point between models and benchmarks and guarantees that your model can be evaluated on all available benchmarks. It includes three central methods that set the model up to perform a behavioral task, set it up to perform neural recordings, and have it digest text with behavioral and/or neural outputs. A model does not have to implement all three methods; it can, for instance, engage only in behavior, or only in neural recordings.
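As a rough illustration of the interface shape, here is a toy subject with the three kinds of methods described above. This is a sketch only, not the real ArtificialSubject class; the exact method names and signatures of the real API may differ, and this toy subject engages only in behavior:

```python
# Toy subject sketching an ArtificialSubject-style interface (illustrative only).
# It supports a behavioral task but not neural recordings -- a valid choice,
# since a model does not have to implement all three methods.

class ToySubject:
    def start_behavioral_task(self, task):
        # set the model up to perform a behavioral task, e.g. next-word prediction
        self.task = task

    def start_neural_recording(self, recording_target, recording_type):
        # set the model up to output neural recordings from a brain region
        raise NotImplementedError("this toy subject only engages in behavior")

    def digest_text(self, text):
        # digest text and return behavioral and/or neural outputs
        # (here: trivially predict the same next word for every input part)
        return {'behavior': ['word' for _ in text]}

subject = ToySubject()
subject.start_behavioral_task(task='next_word')
predictions = subject.digest_text(['the quick brown', 'fox'])['behavior']
```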
HuggingFace models
For models hosted on HuggingFace, we provide a simple HuggingfaceSubject wrapper that lets you instantiate models in very few lines of code, since the wrapper takes care of implementing the ArtificialSubject interface.
The main choice you will have to make is which layer corresponds to which brain region.
For instance, the following is an excerpt from adding GPT models:
from brainscore_language import model_registry
from brainscore_language.artificial_subject import ArtificialSubject
from brainscore_language.model_helpers.huggingface import HuggingfaceSubject

model_registry['distilgpt2'] = lambda: HuggingfaceSubject(model_id='distilgpt2', region_layer_mapping={
    ArtificialSubject.RecordingTarget.language_system: 'transformer.h.5.mlp.dropout'})
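Note the lambda in the registry entry: the registry stores a zero-argument constructor rather than a model instance, so the (potentially expensive) model is only built when it is actually requested. A minimal sketch of this lazy-registry pattern, with hypothetical names that are not part of Brain-Score:

```python
# Sketch of the lazy-registry pattern (ExpensiveModel and model_registry
# here are hypothetical stand-ins, not Brain-Score code).

class ExpensiveModel:
    instantiated = 0  # track how many times construction actually ran

    def __init__(self, model_id):
        ExpensiveModel.instantiated += 1
        self.model_id = model_id

model_registry = {}
# store a constructor, not an instance -- nothing is built at registration time
model_registry['my-model'] = lambda: ExpensiveModel(model_id='my-model')

# the model is only instantiated when it is looked up and called
model = model_registry['my-model']()
```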
Unit tests
As with all plugins, please provide a test.py file to ensure the continued validity of your model. For instance, the following is an excerpt from the tests for gpt2-xl:
import numpy as np

from brainscore_language import load_model
from brainscore_language.artificial_subject import ArtificialSubject


def test_next_word():
    model = load_model('gpt2-xl')
    text = ['the quick brown fox', 'jumps over', 'the lazy']
    expected_next_words = ['jumps', 'the', 'dog']
    model.start_behavioral_task(task=ArtificialSubject.Task.next_word)
    next_word_predictions = model.digest_text(text)['behavior']
    np.testing.assert_array_equal(next_word_predictions, expected_next_words)
Running your model on benchmarks
You can now run your model on benchmarks locally (see Submit to Brain-Score for running models on the Brain-Score platform). Call the score function, passing in the desired benchmark identifier(s) and the identifier of your model.
For instance, you might run:
from brainscore_language import score
model_score = score(model_identifier='distilgpt2', benchmark_identifier='Futrell2018-pearsonr')
Submit to Brain-Score
To share your model plugin with the community and to make it accessible for continued benchmark evaluation, please submit it to the platform.
There are two main ways to do that:
By uploading a zip file on the website
By submitting a GitHub pull request with the proposed changes
Both options lead to the same outcome: your plugin is automatically tested and, once it passes the tests, added to the codebase.