Brain-Score Language

Brain-Score is a collection of benchmarks and models: benchmarks combine neural/behavioral data with a metric to score models on their alignment to humans, and models are evaluated as computational hypotheses of human brain processing.

The Brain-Score Language library contains benchmarks that can easily be used to test language models on their alignment to human behavioral and internal brain processing, as well as language models that can easily be tested on new behavioral or neural data. This makes experimental data accessible to modelers, and computational models accessible to experimenters, accelerating progress in discovering ever-more-accurate models of the human brain and mind.
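For example, both directions of access go through the library's plugin loaders. The sketch below is illustrative: it assumes the distilgpt2 model plugin and the Futrell2018-pearsonr benchmark plugin are installed; substitute any registered identifiers.

    # Minimal sketch of two-way access (identifiers are illustrative).
    from brainscore_language import load_benchmark, load_model

    # Modelers: load an existing benchmark, which bundles human data
    # with a comparison metric.
    benchmark = load_benchmark('Futrell2018-pearsonr')

    # Experimenters: load an existing model as a testable subject.
    model = load_model('distilgpt2')

    # A benchmark is a callable that evaluates an ArtificialSubject.
    model_score = benchmark(model)
    print(model_score)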

The score function is the primary entry point to score a model on a benchmark.

brainscore_language.score(model_identifier: str, benchmark_identifier: str, conda_active: bool = False) → Score

Score the model referenced by the model_identifier on the benchmark referenced by the benchmark_identifier. The model needs to implement the ArtificialSubject interface so that the benchmark can interact with it. The benchmark will be looked up from the benchmark registry and evaluates the model (looked up from the model registry) on how brain-like it is under that benchmark's experimental paradigm, human measurements, comparison metric, and ceiling. This results in a quantitative Score ranging from 0 (least brain-like) to 1 (most brain-like under this benchmark).

Parameters:
  • model_identifier – the identifier for the model

  • benchmark_identifier – the identifier for the benchmark to test the model against

Returns:

a Score of how brain-like the candidate model is under this benchmark. The score is normalized by this benchmark’s ceiling such that 1 means the model matches the data to ceiling level.
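A minimal usage sketch of the entry point itself, under the same assumption that the named model and benchmark plugins are installed (the identifiers are illustrative):

    from brainscore_language import score

    # Score a model on a benchmark; the returned Score is already
    # normalized by the benchmark's ceiling, so 1 means ceiling-level
    # alignment with the human data.
    model_score = score(model_identifier='distilgpt2',
                        benchmark_identifier='Futrell2018-pearsonr')
    print(model_score)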