analysis – Refbank

Programmatic access

You can access refbank data in R through the refbankr package, or directly in Python using the Redivis API.

The codebook contains descriptions of all tables and what each column means.

Install the refbankr R package:

remotes::install_github("refbank/refbankr")

Generate and set an API token.
Access the data:

library(refbankr)

# each get_*() function loads a table as a tidyverse tibble

# information about each dataset
datasets <- get_datasets()

# experiment-level information about the conditions
conditions <- get_conditions()

# trial-level information about who was in the trial, what the stimuli were, and when in a game it occurred
trials <- get_trials()

# information about participant choices on each trial
choices <- get_choices()

# language data for each trial
messages <- get_messages()

# metadata about stimuli
images <- get_images()

# metadata about participants
participants <- get_players()

There are also pre-computed summary tables that give a quick overview of the data without downloading full tables.

# summary stats aggregated per game
per_game_summary <- get_per_game_summary()

# counts of data per dataset and condition
dataset_summary <- get_dataset_summary()
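If you need a per-game statistic that the pre-computed summary does not include, you can aggregate trial-level data yourself. A minimal pandas sketch on a toy table, assuming hypothetical columns `game_id` and `n_words` (check the codebook for the real column names):

```python
import pandas as pd

# Toy trial-level data; column names and values are hypothetical,
# not the real refbank schema.
trials = pd.DataFrame({
    "game_id": ["g1", "g1", "g1", "g2", "g2"],
    "n_words": [12, 8, 5, 20, 9],
})

# Aggregate per game: number of trials and mean words per trial.
per_game = (
    trials.groupby("game_id")
    .agg(n_trials=("n_words", "count"), mean_words=("n_words", "mean"))
    .reset_index()
)
print(per_game)
```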

By default, functions return all datasets in the current version of the data, but you can specify a different version, or a specific set of datasets.

# learn what the current version number is
get_current_version()

# specify a version, specific datasets, or both
trials <- get_trials(version="7.3", datasets=c("hawkins2023_frompartners", "hawkins2021_respect"))

For testing, you can limit the number of results retrieved.

trials <- get_trials(max_results=100)
# which rows are returned is not deterministic
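Because `max_results` returns an arbitrary subset, it is not suited to reproducible analyses. One alternative is to download the table once and draw a seeded random sample locally. A pandas sketch on a toy stand-in table (the `trial_id` column is hypothetical):

```python
import pandas as pd

# Toy stand-in for a downloaded trials table; the column name is hypothetical.
trials = pd.DataFrame({"trial_id": range(1000)})

# A fixed random_state returns the same 100 rows on every run.
sample_a = trials.sample(n=100, random_state=42)
sample_b = trials.sample(n=100, random_state=42)
```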

For convenience, some tables can be retrieved with related tables already joined in.

messages <- get_messages(include_trial_data=TRUE,
                         include_player_data=TRUE,
                         include_image_data=TRUE,
                         include_condition_data=TRUE)

choices <- get_choices(include_trial_data=TRUE,
                       include_player_data=TRUE,
                       include_image_data=TRUE,
                       include_condition_data=TRUE)

trials <- get_trials(include_image_data=TRUE,
                     include_condition_data=TRUE)
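Conceptually, each `include_*_data` flag saves you a join you could otherwise do yourself. A minimal pandas sketch of what `include_trial_data` amounts to, on toy tables with a hypothetical shared key `trial_id` (the actual join keys are described in the codebook):

```python
import pandas as pd

# Toy tables; column names and values are hypothetical, not the real refbank schema.
messages = pd.DataFrame({
    "trial_id": [1, 1, 2],
    "text": ["the tall one", "yes", "looks like a bird"],
})
trials = pd.DataFrame({
    "trial_id": [1, 2],
    "game_id": ["g1", "g1"],
    "rep_num": [1, 2],
})

# Left join keeps every message and attaches its trial-level columns.
# validate="many_to_one" raises an error if trial_id is duplicated in `trials`.
messages_with_trials = messages.merge(
    trials, on="trial_id", how="left", validate="many_to_one"
)
```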

You can also download the image files where available.

download_image_files(destination="images/")

In addition to the primary data tables, there are also derived tables of processed data, including vector embeddings, cosine similarities, linguistic parses, and message annotations.

embeddings <- get_sbert_embeddings()

# available sim_type values:
# "to_last", "to_next", "to_first", "diverge", "diff", "idiosyncrasy"
similarities <- get_cosine_similarities(sim_type = c("to_last", "to_first"))

# stanza-parsed linguistic output for each message
parsed <- get_parsed_messages()

# human or model annotations for messages
annotated <- get_annotated_messages()
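The cosine similarity between two embedding vectors u and v is u·v / (‖u‖ ‖v‖). Which messages each `sim_type` compares is defined in the codebook; the numpy sketch below only illustrates the computation, using toy 3-dimensional vectors and a hypothetical "to_next"-style comparison of each round with the following one:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two vectors: u.v / (|u| |v|)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d "embeddings" for one referent across four rounds; real SBERT
# vectors are much longer, and these values are invented for illustration.
rounds = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

# Hypothetical "to_next"-style comparison: each round vs. the following round.
to_next = [cosine_similarity(rounds[i], rounds[i + 1]) for i in range(len(rounds) - 1)]
```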

Install the redivis-python client library:

pip install --upgrade redivis

Generate and set an API token.
Access the data:

import redivis

organization = redivis.organization("datapages")
dataset = organization.dataset("refbank")
table = dataset.table("summary")

# Load table as a dataframe
df = table.to_pandas_dataframe()
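To keep only particular datasets, as the `datasets` argument does in R, one option is to filter the resulting DataFrame locally. A sketch on a toy stand-in for `df`, assuming the table has a column identifying the dataset, here called `dataset_id` (hypothetical; check the codebook for the real name):

```python
import pandas as pd

# Toy stand-in for `df`; "dataset_id" is a hypothetical column name
# and "other_dataset" is a placeholder row.
df = pd.DataFrame({
    "dataset_id": ["hawkins2023_frompartners", "hawkins2021_respect", "other_dataset"],
})

keep = ["hawkins2023_frompartners", "hawkins2021_respect"]
subset = df[df["dataset_id"].isin(keep)].reset_index(drop=True)
```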
