Developers’ Reference#

The library installs a couple of commands to your system. The documentation for these commands can be found below or by executing ms3 -h.

When using ms3 as a module, we are dealing with four main object types:

MSCX objects hold the information of a single parsed MuseScore file;
Annotations objects hold a set of annotation labels which can be either attached to a score (i.e., contained in its XML structure), or detached.
Both types of objects are contained within a Score object. For example, a set of Annotations read from a TSV file can be attached to the XML of an MSCX object, which can then be output as a MuseScore file.
To manipulate many Score objects at once, for example those of an entire corpus, we use Parse objects.

Since MSCX and Annotations objects are always attached to a Score, the documentation starts with this central class.

The Parse class#

class ms3.parse.Parse(directory: str | Collection[str] | None = None, recursive: bool = True, only_metadata_pieces: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_re: Pattern | str | None = None, folder_re: Pattern | str | None = None, exclude_re: Pattern | str | None = None, file_paths: Collection[str] | None = None, labels_cfg: dict = {}, ms=None, **logger_cfg)[source]#

Class for creating one or several Corpus objects and performing actions on all of them.

__init__(directory: str | Collection[str] | None = None, recursive: bool = True, only_metadata_pieces: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_re: Pattern | str | None = None, folder_re: Pattern | str | None = None, exclude_re: Pattern | str | None = None, file_paths: Collection[str] | None = None, labels_cfg: dict = {}, ms=None, **logger_cfg)[source]#

Initialize a Parse object and try to create corpora if directories and/or file paths are specified.

Parameters:

directory – Path to scan for corpora.
recursive – Pass False if you don’t want to scan directory for subcorpora, but force making it a corpus instead.
only_metadata_pieces – The default view excludes piece names that are not listed in the corpus’ metadata.tsv file (e.g. when none was found). Pass False to include all pieces regardless. This might be needed when setting recursive to False.
include_convertible – The default view excludes scores that would need conversion to MuseScore format prior to parsing. Pass True to include convertible scores in .musicxml, .midi, .cap or any other format that MuseScore 3 can open. For on-the-fly conversion, however, the parameter ms needs to be set.
include_tsv – The default view includes TSV files. Pass False to disregard them and parse only scores.
exclude_review – The default view excludes files and folders whose name contains ‘review’. Pass False to include these as well.
file_re – Pass a regular expression if you want to create a view filtering out all files that do not contain it.
folder_re – Pass a regular expression if you want to create a view filtering out all folders that do not contain it.
exclude_re – Pass a regular expression if you want to create a view filtering out all files or folders that contain it.
file_paths – If directory is specified, the file names of these paths are used to create a filtering view excluding all other files. Otherwise, all paths are expected to be part of the same parent corpus, which will be inferred from the first path by looking for the first parent directory that either contains a ‘metadata.tsv’ file or is a git repository. This parameter is deprecated and file_re should be used instead.
labels_cfg – Pass a configuration dict to detect only certain labels or change their output format.
ms – If you pass the path to your local MuseScore 3 installation, ms3 will attempt to parse musicXML, MuseScore 2, and other formats by temporarily converting them. If you’re using the standard path, you may try ‘auto’, or ‘win’ for Windows, ‘mac’ for macOS, or ‘mscore’ for Linux. If you do not pass file_re and the MuseScore executable is detected, all convertible files are automatically selected; otherwise, only those that can be parsed without conversion are selected.
**logger_cfg – Keyword arguments for changing the logger configuration. E.g. level='d' to see all debug messages.
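The constructor parameters above can be combined as in the following minimal sketch. It assumes that ms3 is installed and uses only names documented in this reference (Parse, parse(), n_parsed_scores, n_parsed_tsvs, get_facet()). The helper name load_notes and its arguments are hypothetical, and the import sits inside the function so that the snippet can be loaded without ms3 being present.

```python
import os


def load_notes(directory, file_re=None):
    """Parse everything detected under `directory` and return the 'notes' facet.

    Hypothetical helper for illustration; not part of ms3.
    """
    from ms3.parse import Parse  # requires ms3 to be installed

    # file_re narrows the view to files whose names contain the expression
    p = Parse(os.path.expanduser(directory), file_re=file_re)
    p.parse()  # shorthand for parsing scores and TSV files
    print(f"{p.n_parsed_scores} scores and {p.n_parsed_tsvs} TSV files parsed")
    # with the default concatenate=True, one DataFrame is returned
    return p.get_facet("notes")
```

Calling load_notes() with a directory that holds one or several corpora would, under these assumptions, return the notes of all selected pieces.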

corpus_paths: Dict[str, str]#: {corpus_name -> path} dictionary with each corpus’s base directory. Generally speaking, each corpus path is expected to contain a metadata.tsv and, possibly, to be a git repository.

corpus_objects: Dict[str, Corpus]#: {corpus_name -> Corpus} dictionary with one object per corpus_path.

labels_cfg: dict#: Configuration dictionary to determine the output format of labels and expanded tables. The dictionary is passed to Score upon parsing.

property ms: str#: Path or command of the local MuseScore 3 installation if specified by the user and recognized.

property n_detected: int#: Number of detected files aggregated from all Corpus objects without taking views into account. Excludes metadata files.

property n_orphans: int#: Number of files that are always disregarded because they could not be attributed to any of the pieces.

property n_parsed: int#: Number of parsed files aggregated from all Corpus objects without taking views into account. Excludes metadata files.

property n_parsed_scores: int#: Number of parsed scores aggregated from all Corpus objects without taking views into account. Excludes metadata files.

property n_parsed_tsvs: int#: Number of parsed TSV files aggregated from all Corpus objects without taking views into account. Excludes metadata files.

property n_pieces: int#: Number of all available pieces (‘pieces’), independent of the view.

property n_unparsed_scores: int#: Number of all detected but not yet parsed scores, aggregated from all Corpus objects without taking views into account. Excludes metadata files.

property n_unparsed_tsvs: int#: Number of all detected but not yet parsed TSV files, aggregated from all Corpus objects without taking views into account. Excludes metadata files.

property view: View#: Retrieve the current View object. Shorthand for get_view().

property views: None#: Display a short description of the available views.

property view_name: str#: Get the name of the active view.

add_corpus() creates a Corpus object which scans the given directory for parseable files. It inherits all Views from the Parse object.

Parameters:

directory – Directory to scan for files.
corpus_name – By default, the folder name of directory is used as name for this corpus. Pass a string to use a different identifier.
**logger_cfg – Keyword arguments for configuring the logger of the new Corpus object. E.g. level='d' to see all debug messages. Note that the logger is a child logger of this Parse object’s logger and propagates, so it might filter debug messages. You can use _.change_logger_cfg(level='d') to change the level post hoc.

This method decides whether the given directory contains several corpora or is a corpus itself, and calls add_corpus() for each corpus.

Parameters:

directory – Directory to scan for corpora.
recursive – By default, if any of the first-level subdirectories contains a ‘metadata.tsv’ or is a git repository, all first-level subdirectories of directory are treated as corpora, i.e. one Corpus object per folder is created. Pass False to prevent this, which is equivalent to calling add_corpus(directory).
**logger_cfg – Keyword arguments for configuring the logger of the new Corpus objects. E.g. level='d' to see all debug messages. Note that the loggers are child loggers of this Parse object’s logger and propagate, so it might filter debug messages. You can use _.change_logger_cfg(level='d') to change the level post hoc.
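The decision described for the recursive parameter can be illustrated with the standard library alone. This is a sketch of the stated rule, not ms3’s implementation; the function names are hypothetical, and skipping hidden folders is an assumption made here for simplicity.

```python
from pathlib import Path


def is_corpus(directory: Path) -> bool:
    """A folder counts as a corpus if it holds a metadata.tsv or is a git repository."""
    return (directory / "metadata.tsv").is_file() or (directory / ".git").exists()


def find_corpus_dirs(directory: Path, recursive: bool = True) -> list:
    """Return the folders that would each be turned into one corpus."""
    # assumption: hidden folders such as .git are not corpus candidates
    subdirs = sorted(
        d for d in directory.iterdir() if d.is_dir() and not d.name.startswith(".")
    )
    if recursive and any(is_corpus(d) for d in subdirs):
        # one corpus per first-level subdirectory
        return subdirs
    # otherwise the directory itself is treated as a single corpus
    return [directory]
```

Note that a single qualifying subdirectory is enough for all first-level subdirectories to be treated as corpora, which is why recursive=False exists as an escape hatch.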

add_files(file_paths: str | Collection[str], corpus_name: str | None = None) → None[source]#

Deprecated: To deal with particular files only, use add_corpus() passing the directory containing them and configure the View accordingly. This method does it for you but easily leads to unexpected behaviour. It expects the file paths to point to files located in a shared corpus folder on some higher level or in folders for which Corpus objects have already been created.

Parameters:

file_paths – Collection of file paths. Only existing files can be added.
corpus_name –
- By default, I will try to attribute the files to existing Corpus objects based on their paths. This makes sense only when new files have been created after the directories were scanned.
- For paths that do not contain an existing corpus_path, I will try to detect the parent directory that is a corpus (based on it being a git repository or containing a metadata.tsv). If this is without success for the first path, I will raise an error. Otherwise, all subsequent paths will be considered to be part of that same corpus (watch out for meaningless relative paths!).
- You can pass a folder name contained in the first path to create a new corpus, assuming that all other paths are contained in it (watch out for meaningless relative paths!).
- Pass an existing corpus_name to add the files to a particular corpus. Note that all parseable files under the corpus_path are detected anyway, and if you add files from other directories, it will lead to invalid relative paths that work only on your system. If you’re adding files that have been created after the Corpus object has, you can leave this parameter empty; paths will be attributed to the existing corpora automatically.

change_labels_cfg(labels_cfg=(), staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#

Update Parse.labels_cfg and retrieve new ‘labels’ tables accordingly.

Parameters:

labels_cfg (dict) – Pass an entire dictionary or, to change only particular options, use the following keyword arguments:
staff – Arguments as they will be passed to get_labels()
voice – Arguments as they will be passed to get_labels()
harmony_layer – Arguments as they will be passed to get_labels()
positioning – Arguments as they will be passed to get_labels()
decode – Arguments as they will be passed to get_labels()
column_name – Arguments as they will be passed to get_labels()

compare_labels(key: str = 'detached', new_color: str = 'ms3_darkgreen', old_color: str = 'ms3_darkred', detached_is_newer: bool = False, add_to_rna: bool = True, view_name: str | None = None, metadata_update: dict | None = None, force_metadata_update: bool = False) → Tuple[int, int][source]#

Compare detached labels key to the ones attached to the Score to create a diff. By default, the attached labels are considered as the reviewed version and labels that have changed or been added in comparison to the detached labels are colored in green; whereas the previous versions of changed labels are attached to the Score in red, just like any deleted label.

Parameters:

key – Key of the detached labels you want to compare to the ones in the score.
new_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
old_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
detached_is_newer – Pass True if the detached labels are to be added with new_color whereas the attached changed labels will turn old_color, as opposed to the default.
add_to_rna – By default, new labels are attached to the Roman Numeral layer. Pass False to attach them to the chord layer instead.
metadata_update – Dictionary containing metadata that is to be included in the comparison score. Notably, ms3 uses the key ‘compared_against’ when the comparison is performed against a given git_revision.
force_metadata_update – By default, the metadata is only updated if the comparison yields at least one difference to avoid outputting comparison scores not displaying any changes. Pass True to force the metadata update, which results in the property changed being set to True.

Returns:

Number of scores in which labels have changed.
Number of scores in which no label has changed.
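The roles of new_color, old_color and detached_is_newer can be pictured with a deliberately simplified set comparison. This is only a sketch: representing a label as a (position, text) tuple and the function name diff_labels are assumptions made for illustration, and ms3’s actual comparison may work differently.

```python
def diff_labels(attached, detached, detached_is_newer=False):
    """Split labels into (new, old, unchanged) sets.

    Each label is a hypothetical (position, text) tuple.
    """
    attached, detached = set(attached), set(detached)
    if detached_is_newer:
        newer, older = detached, attached
    else:
        # default: the attached labels are the reviewed (newer) version
        newer, older = attached, detached
    new = newer - older        # would be shown in new_color
    old = older - newer        # would be shown in old_color
    unchanged = newer & older  # identical labels remain untouched
    return new, old, unchanged
```

With attached labels {(1, "I"), (2, "V7")} and detached labels {(1, "I"), (2, "V")}, the default treats (2, "V7") as new and (2, "V") as old; passing detached_is_newer=True swaps the two.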

count_extensions(view_name: str | None = None, per_piece: bool = False, include_metadata: bool = False)[source]#

Count file extensions.

Parameters:

keys (str or Collection, optional) – Key(s) for which to count file extensions. By default, all keys are selected.
ids (Collection) – If you pass a collection of IDs, keys is ignored and only the selected extensions are counted.
per_key (bool, optional) – If set to True, the results are returned as a dict {key: Counter}, otherwise the counts are summed up in one Counter.
per_subdir (bool, optional) – If set to True, the results are returned as {key: {subdir: Counter} }. per_key=True is therefore implied.

Returns:

By default, the function returns a Counter of file extensions (Counters are converted to dicts). If per_key is set to True, a dictionary {key: Counter} is returned, separating the counts. If per_subdir is set to True, a dictionary {key: {subdir: Counter} } is returned.

Return type:

dict
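Counting extensions and converting the Counter to a dict, as the return description states, can be sketched with the standard library. The function name count_exts is hypothetical and the snippet does not reproduce ms3’s grouping options.

```python
import os
from collections import Counter


def count_exts(paths):
    """Count file extensions and return the counts as a plain dict."""
    counts = Counter(os.path.splitext(path)[1] for path in paths)
    return dict(counts)
```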

count_pieces(view_name: str | None = None) → int[source]#: Number of selected pieces under the given view.

disambiguate_facet(facet: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], view_name: str | None = None, ask_for_input=True) → None[source]#: Calls the method on every selected corpus.

get_dataframes(notes: bool = False, rests: bool = False, notes_and_rests: bool = False, measures: bool = False, events: bool = False, labels: bool = False, chords: bool = False, expanded: bool = False, form_labels: bool = False, cadences: bool = False, view_name: str | None = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False, include_empty: bool = False) → DataFrame | Dict[Tuple[str, str], Dict[str, List[Tuple[File, DataFrame]]] | List[Tuple[File, DataFrame]]][source]#: Renamed to get_facets().

get_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], view_name: str | None = None, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False, concatenate: bool = True) → Dict[str, Tuple[File, DataFrame]] | DataFrame[source]#: Retrieves exactly one DataFrame per piece, if available.

get_piece(corpus_name: str, piece: str) → Piece[source]#: Returns an existing Piece object.

get_view(view_name: str | None = None, **config) → View[source]#: Retrieve an existing or create a new View object, potentially while updating the config.

insert_detached_labels(view_name: str | None = None, key: str = 'detached', staff: int = None, voice: Literal[1, 2, 3, 4] = None, harmony_layer: Literal[0, 1, 2] | None = None, check_for_clashes: bool = True)[source]#

Attach all Annotations objects that are reachable via Score.key to their respective Score, altering the XML in memory. Calling store_parsed_scores() will output MuseScore files where the annotations show in the score.

Parameters:

key – Key under which the Annotations objects to be attached are stored in the Score objects. Defaults to ‘detached’.
staff (int, optional) – If you pass a staff ID, the labels will be attached to that staff where 1 is the upper staff. By default, the staves indicated in the ‘staff’ column of ms3.annotations.Annotations.df will be used.
voice ({1, 2, 3, 4}, optional) – If you pass the ID of a notational layer (where 1 is the upper voice, blue in MuseScore), the labels will be attached to that one. By default, the notational layers indicated in the ‘voice’ column of ms3.annotations.Annotations.df will be used.
harmony_layer (int, optional) – By default, the labels are written to the layer specified as an integer in the column harmony_layer. Pass an integer to select a particular layer:
- 0 to attach them as absolute (‘guitar’) chords, meaning that when opened next time, MuseScore will split and encode those beginning with a note name (resulting in ms3-internal harmony_layer 3).
- 1 to write the labels into the staff’s layer for Roman Numeral Analysis.
- 2 to have MuseScore interpret them as Nashville Numbers.
check_for_clashes (bool, optional) – By default, warnings are thrown when there already exists a label at a position (and in a notational layer) where a new one is attached. Pass False to deactivate these warnings.

iter_corpora(view_name: str | None = None) → Iterator[Tuple[str, Corpus]][source]#: Iterate through corpora under the current or specified view.

iter_independent_corpora(view_name: str | None = None) → Iterator[Tuple[str, Corpus]][source]#: Like iter_corpora() but creating new Corpus objects that are not stored in this Parse object to avoid filling up memory when parsing many files.

keys() → List[str][source]#: Return the names of all corpus objects.

load_ignored_warnings(path: str) → None[source]#

Adds filters to all loggers included in an IGNORED_WARNINGS file.

Parameters:

path – Path of the IGNORED_WARNINGS file.

set_view(active: View = None, **views: View)[source]#: Register one or several view_name=View pairs.

update_metadata_tsv_from_parsed_scores(root_dir: str | None = None, suffix: str = '', markdown_file: str | None = 'README.md', view_name: str | None = None) → List[str][source]#

Gathers the metadata from parsed and currently selected scores and updates ‘metadata.tsv’ with the information.

Parameters:

root_dir – In case you want to output the metadata to a folder different from corpus_path.
suffix – Added to the filename: ‘metadata{suffix}.tsv’. Defaults to ‘’. Metadata files with suffix may be used to store views with particular subselections of pieces.
markdown_file – By default, a subset of metadata columns will be written to ‘README.md’ in the same folder as the TSV file. If the file exists, it will be scanned for a line containing the string ‘# Overview’ and overwritten from that line onwards.
view_name – The view under which you want to update metadata from the selected parsed files. Defaults to None, i.e. the active view.

Returns:

The file paths to which metadata was written.
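The file naming and the README behaviour described for suffix and markdown_file can be sketched as follows. Both function names are hypothetical, and appending the table when no ‘# Overview’ line exists is an assumption of this sketch, not documented behaviour.

```python
def metadata_filename(suffix: str = "") -> str:
    """Build the file name following the pattern 'metadata{suffix}.tsv'."""
    return f"metadata{suffix}.tsv"


def replace_overview(readme_text: str, new_overview: str) -> str:
    """Overwrite the README from the first line containing '# Overview' onwards."""
    kept = []
    for line in readme_text.splitlines():
        if "# Overview" in line:
            break  # this line and everything after it is replaced
        kept.append(line)
    # assumption: without an '# Overview' line, the new section is appended
    return "\n".join(kept + [new_overview])
```

Anything a user has written above the ‘# Overview’ heading survives an update, whereas manual edits below it are lost.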

update_score_metadata_from_tsv(view_name: str | None = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', write_empty_values: bool = False, remove_unused_fields: bool = False, write_text_fields: bool = False, update_instrumentation: bool = False) → List[File][source]#

Update metadata fields of parsed scores with the values from the corresponding row in metadata.tsv.

Parameters:

view_name
force
choose
write_empty_values – If set to True, existing values are overwritten even if the new value is empty, in which case the field will be set to ‘’.
remove_unused_fields – If set to True, all non-default fields that are not among the columns of metadata.tsv (anymore) are removed.
write_text_fields – If set to True, ms3 will write updated values from the columns title_text, subtitle_text, composer_text, lyricist_text, and part_name_text into the score headers.
update_instrumentation – Set to True to update the score’s instrumentation based on changed values from ‘staff__instrument’ columns.

Returns:

List of File objects of those scores of which the XML structure has been modified.

update_scores(root_dir: str | None = None, folder: str = '.', suffix: str = '', overwrite: bool = False) → List[str][source]#

Update scores created with an older MuseScore version to the latest MuseScore 3 version.

Parameters:

root_dir – In case you want to create output paths for the updated MuseScore files based on a folder different from corpus_path.
folder –
- The default ‘.’ has the updated scores written to the same directory as the old ones, effectively overwriting them if root_dir is None.
- If folder is None, the files will be written to {root_dir}/scores/.
- If folder is an absolute path, root_dir will be ignored.
- If folder is a relative path starting with a dot . the relative path is appended to the file’s subdir. For example, ../scores will resolve to a sibling directory of the one where the file is located.
- If folder is a relative path that does not begin with a dot ., it will be appended to the root_dir.
suffix – String to append to the file names of the updated files, e.g. ‘_updated’.
overwrite – By default, existing files are not overwritten. Pass True to allow this.

Returns:

A list of all up-to-date paths, whether they had to be converted or were already in the latest version.
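The rules for the folder parameter can be summarized in a small resolver. This is a sketch of the documented rules rather than ms3’s code: the function name, the subdir argument (the file’s directory relative to the corpus root) and the use of posixpath for reproducible results are assumptions, and root_dir is expected to be already set (for example to corpus_path).

```python
import posixpath


def resolve_output_dir(root_dir, subdir, folder=".", default_folder="scores"):
    """Return the directory an updated score would be written to."""
    if folder is None:
        # files go to {root_dir}/scores/
        return posixpath.join(root_dir, default_folder)
    if posixpath.isabs(folder):
        # absolute paths ignore root_dir
        return folder
    if folder.startswith("."):
        # relative to the directory in which the file is located
        return posixpath.normpath(posixpath.join(root_dir, subdir, folder))
    # any other relative path is appended to root_dir
    return posixpath.join(root_dir, folder)
```

For a score located in the subdirectory MS3 of /corpus, the default ‘.’ yields /corpus/MS3 (the file’s own directory), whereas ‘../scores’ yields the sibling directory /corpus/scores.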

update_tsvs_on_disk(facets: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'] | Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']] = 'tsv', view_name: str | None = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto') → List[str][source]#

Update existing TSV files corresponding to one or several facets with information freshly extracted from a parsed score, but only if the contents are identical. Otherwise, the existing TSV file is not overwritten and the differences are displayed in a log warning. The purpose is to safely update the format of existing TSV files (for instance, with respect to column order) while making sure that the content doesn’t change.

Parameters:

facets
view_name
force – By default, only TSV files that have already been parsed are updated. Set to True in order to force-parse for each facet one of the TSV files included in the given view, if necessary.
choose

Returns:

List of paths that have been overwritten.

metadata_tsv(view_name: str | None = None) → DataFrame[source]#: Concatenates the ‘metadata.tsv’ (as they come) files for all corpora with a [corpus, piece] MultiIndex. If you need metadata that filters out pieces according to the current view, use metadata().

store_extracted_facets(view_name: str | None = None, root_dir: str | None = None, measures_folder: str | None = None, notes_folder: str | None = None, rests_folder: str | None = None, notes_and_rests_folder: str | None = None, labels_folder: str | None = None, expanded_folder: str | None = None, form_labels_folder: str | None = None, cadences_folder: str | None = None, events_folder: str | None = None, chords_folder: str | None = None, metadata_suffix: str | None = None, markdown: bool = True, simulate: bool = False, unfold: bool = False, interval_index: bool = False)[source]#

Store facets extracted from parsed scores as TSV files.

Parameters:

view_name
root_dir

One folder parameter exists per facet (‘measures’, ‘notes’, ‘rests’, ‘notes_and_rests’, ‘labels’, ‘expanded’, ‘form_labels’, ‘cadences’, ‘events’, ‘chords’):

measures_folder
notes_folder
rests_folder
notes_and_rests_folder
labels_folder
expanded_folder
form_labels_folder – Specify directory where to store the corresponding TSV files.
cadences_folder – Specify directory where to store the corresponding TSV files.
events_folder – Specify directory where to store the corresponding TSV files.
chords_folder – Specify directory where to store the corresponding TSV files.
metadata_suffix – Specify a suffix to update the ‘metadata{suffix}.tsv’ file for each corpus. For the main file, pass ‘’.
markdown – By default, when metadata_path is specified, a markdown file called README.md containing the columns [file_name, measures, labels, standard, annotators, reviewers] is created. If it exists already, this table will be appended or overwritten after the heading # Overview.
simulate
unfold – By default, repetitions are not unfolded. Pass True to duplicate values so that they correspond to a full playthrough, including correct positioning of first and second endings.
interval_index

Returns:

store_parsed_scores(view_name: str | None = None, only_changed: bool = True, root_dir: str | None = None, folder: str = '.', suffix: str = '', overwrite: bool = False, simulate=False) → Dict[str, List[str]][source]#

Stores all parsed scores under this view as MuseScore 3 files.

Parameters:

view_name – Name of another view if another than the current one is to be used.
only_changed – By default, only scores that have been modified since parsing are written. Set to False to store all scores regardless.
root_dir – Directory where to re-build the sub-directory tree of the Corpus in question.
folder – Different behaviours are available. Note that only the last option (a relative path starting with a dot) ensures that file paths are distinct for files that have identical pieces but are located in different subdirectories of the same corpus.
- If folder is None (default), the files’ type will be appended to the root_dir.
- If folder is an absolute path, root_dir will be ignored.
- If folder is a relative path that does not begin with a dot ., it will be appended to the root_dir.
- If folder is a relative path starting with a dot . the relative path is appended to the file’s subdir. For example, ..\notes will resolve to a sibling directory of the one where the file is located.
suffix – Suffix to append to the original file name.
overwrite – Pass True to overwrite existing files.
simulate – Set to True if no files are to be written.

Returns:

Paths of the stored files.

parse(view_name=None, level=None, parallel=True, only_new=True, labels_cfg={}, cols={}, infer_types=None, **kwargs)[source]#: Shorthand for executing parse_scores() and parse_tsv() in one go.

parse_scores(level: str = None, parallel: bool = True, only_new: bool = True, labels_cfg: dict = {}, view_name: str = None, choose: Literal['all', 'auto', 'ask'] = 'all')[source]#

Parse MuseScore 3 files (MSCX or MSCZ) and store the resulting read-only Score objects. If they need to be writeable, e.g. for removing or adding labels, pass parallel=False which takes longer but prevents having to re-parse at a later point.

Parameters:

keys (str or Collection, optional) – For which key(s) to parse all MSCX files.
ids (Collection) – To parse only particular files, pass their IDs. keys and fexts are ignored in this case.
level ({'W', 'D', 'I', 'E', 'C', 'WARNING', 'DEBUG', 'INFO', 'ERROR', 'CRITICAL'}, optional) – Pass a level name for which (and above which) you want to see log records.
parallel (bool, optional) – Defaults to True, meaning that all CPU cores are used simultaneously to speed up the parsing. It implies that the resulting Score objects are in read-only mode and that you might not be able to use the computer during parsing. Pass False to parse one score after the other, which uses more memory but will allow making changes to the scores.
only_new (bool, optional) – By default, scores that have already been parsed are not parsed again. Pass False to parse them, too.

Return type:

None
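The trade-off around parallel can be sketched as follows, assuming ms3 is installed. The helper name parse_writeable is hypothetical; the calls themselves (Parse, parse_scores(), store_parsed_scores()) are the ones documented in this reference, and the import is placed inside the function so that the snippet loads without ms3.

```python
def parse_writeable(directory):
    """Parse scores one after the other so that they can be modified afterwards.

    Hypothetical helper for illustration; not part of ms3.
    """
    from ms3.parse import Parse  # requires ms3 to be installed

    p = Parse(directory, include_tsv=False)  # disregard TSV files, scores only
    p.parse_scores(parallel=False)  # slower, but the Score objects are not read-only
    # dry run: simulate=True means that no files are written
    paths = p.store_parsed_scores(only_changed=False, simulate=True)
    return p, paths
```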

parse_tsv(view_name=None, level=None, cols={}, infer_types=None, only_new=True, choose: Literal['all', 'auto', 'ask'] = 'all', **kwargs)[source]#

Parse TSV files (or other value-separated files such as CSV) to be able to do something with them.

Parameters:

keys (str or Collection, optional) – Key(s) for which to parse all non-MSCX files. By default, all keys are selected.
ids (Collection) – To parse only particular files, pass their IDs. keys and fexts are ignored in this case.
fexts (str or Collection, optional) – If you want to parse only files with one or several particular file extension(s), pass the extension(s).
cols (dict, optional) – By default, if a column called 'label' is found, the TSV is treated as an annotation table and turned into an Annotations object. Pass one or several column name(s) to treat them as label columns instead. If you pass {} or no label column is found, the TSV is parsed as a “normal” table, i.e. a DataFrame.
infer_types (dict, optional) – To recognize one or several custom label type(s), pass {name: regEx}.
level ({'W', 'D', 'I', 'E', 'C', 'WARNING', 'DEBUG', 'INFO', 'ERROR', 'CRITICAL'}, optional) – Pass a level name for which (and above which) you want to see log records.
**kwargs – Arguments for pandas.DataFrame.to_csv(). Defaults to {'sep': '\t', 'index': False}. In particular, you might want to update the default dictionaries for dtypes and converters used in load_tsv().

Returns:

None
Args – only_new: view_name:

__iter__() → Iterator[Tuple[str, Corpus]][source]#

Iterate through all (corpus_name, Corpus) tuples, regardless of any Views.

Yields: (corpus_name, Corpus) tuples

__repr__()[source]#: Show the info() under the active view.

property parsed_mscx: DataFrame#: Deprecated property. Replaced by n_parsed_scores

property parsed_tsv: DataFrame#: Deprecated property. Replaced by n_parsed_tsvs

add_detached_annotations(*args, **kwargs)[source]#: Deprecated method. Replaced by insert_detached_labels().

count_annotation_layers(*args, **kwargs)[source]#: Deprecated method.

count_labels(*args, **kwargs)[source]#: Deprecated method.

get_lists(*args, **kwargs)[source]#: Deprecated method. Replaced by get_facets().

iter(*args, **kwargs)[source]#: Deprecated method. Replaced by ms3.corpus.Corpus.iter_facets().

parse_mscx(*args, **kwargs)[source]#: Deprecated method. Replaced by parse_scores().

pieces(*args, **kwargs)[source]#: Deprecated method. Replaced by info().

store_scores(*args, **kwargs)[source]#: Deprecated method. Replaced by store_parsed_scores().

update_metadata(*args, **kwargs)[source]#: Deprecated method. Replaced by update_score_metadata_from_tsv().

The Corpus class#

class ms3.corpus.Corpus(directory: str, view: View = None, only_metadata_pieces: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_re: Pattern | str | None = None, folder_re: Pattern | str | None = None, exclude_re: Pattern | str | None = None, paths: Collection[str] | None = None, labels_cfg={}, ms=None, **logger_cfg)[source]#

Collection of scores and TSV files that can be matched to each other based on their file names.

corpus_path: str#: Path where the corpus is located.

name#: Folder name of the corpus.

repo: Repo | None#: If the corpus is part of a git repository, this attribute holds the corresponding git.Repo object.

files: List[File]#: [File] list of File data objects containing information on the file location etc. for all detected files.

labels_cfg: dict#: Configuration dictionary to determine the output format of labels and expanded tables. The dictionary is passed to Score upon parsing.

metadata_tsv: DataFrame#: The parsed ‘metadata.tsv’ file for the corpus.

metadata_ix: int#: The index of the ‘metadata.tsv’ file for the corpus.

ix2pname: Dict[int, str]#: {ix -> piece name} dict for associating files with the piece they have been matched to. None for indices that could not be matched, e.g. metadata.

ix2metadata_file: Dict[int, File]#: {ix -> File} dict for collecting all metadata files.

ix2orphan_file: Dict[int, File]#: {ix -> File} dict for collecting all orphan files, i.e. files that could not be attributed to any of the pieces.

score_fnames: List[str]#: Sorted list of unique file names of all detected scores

property pnames: List[str]#: All piece names including those of scores that are not listed in metadata.tsv

property n_pieces: int#: Number of all available pieces (‘pieces’), independent of the view.

add_dir(directory: str, filter_other_pieces: bool = False, file_re: str = '.*', folder_re: str = '.*', exclude_re: str = '^(\\.|_)') → List[File][source]#

Add additional files pertaining to the already existing pieces of the corpus.

If you want to use a directory with other pieces, create another Corpus object or combine several corpora in a Parse object.

Parameters:

directory – Directory to scan for parseable (score or TSV) files. Only those that begin with one of the corpus’s pieces will be matched and registered, the others will be kept under ix2orphan_file.
filter_other_pieces – Set to True if you want to filter out all pieces that were not matched up with one of the added files. This can be useful if you’re loading TSV files with labels and want to parse only the scores for which you have added labels.
file_re – Regular expressions for filtering certain file names or folder names. The regEx are checked with search(), not match(), allowing for fuzzy search.
folder_re – Regular expressions for filtering certain file names or folder names. The regEx are checked with search(), not match(), allowing for fuzzy search.
exclude_re – Exclude files and folders containing this regular expression.
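The difference between search() and match() mentioned above can be illustrated with the standard library alone. The following is a minimal sketch, not the library's implementation: passes_filters() is a hypothetical helper and the file names are invented; only the default exclusion pattern is taken from the add_dir() signature.

```python
import re

# Default exclusion pattern from the add_dir() signature:
# names starting with "." or "_" are excluded.
EXCLUDE_RE = r"^(\.|_)"


def passes_filters(name: str, include_re: str = ".*", exclude_re: str = EXCLUDE_RE) -> bool:
    """Hypothetical helper: keep `name` if include_re is found anywhere in it
    and exclude_re is not found."""
    if re.search(exclude_re, name):
        return False
    return re.search(include_re, name) is not None


# Invented file and folder names for illustration.
names = ["K279-1.mscx", "_backup.mscx", ".git", "notes_K279-1.tsv"]
kept = [n for n in names if passes_filters(n, include_re=r"K279")]
print(kept)  # ['K279-1.mscx', 'notes_K279-1.tsv']

# re.match() only looks at the beginning of the string, so it would miss
# a pattern that occurs in the middle of a name.
print(re.match(r"K279", "notes_K279-1.tsv") is None)  # True
```

Because the pattern is searched anywhere in the name, an expression such as K279 also selects files where the piece name is preceded by a prefix.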

Returns:

List of File objects pertaining to the matched, newly added paths.

add_file_paths(paths: Collection[str]) → List[File][source]#

Iterates through the given paths, converts those that correspond to parseable files to File objects (trying to infer their type from the path), and appends those to files.

Parameters:: paths – File paths that are to be registered with this Corpus object.
Returns:: A list of File objects corresponding to parseable files (based on their extensions).

collect_fnames_from_scores() → None[source]#: Construct sorted list of pieces from all detected scores.

create_metadata_tsv(suffix='', view_name: str | None = None, overwrite: bool = False, force: bool = True) → str | None[source]#: Creates a ‘metadata.tsv’ file for the current view.

create_pieces(pnames: Collection[str] | str = None) → None[source]#: Creates and stores one Piece object per piece.

detect_parseable_files() → None[source]#: Walks through the corpus_path and collects information on all parseable files.

disambiguate_facet(facet: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], view_name: str | None = None, ask_for_input=True) → None[source]#

Make sure that, for a given facet, the current view includes only one or zero files. If at least one piece has more than one file, the user will be asked which ones to use. The others will be excluded from the view.

Parameters:

facet – Which facet to disambiguate.
ask_for_input – By default, if there is anything to disambiguate, the user is asked to select a group of files. Pass False to see only the questions and choices without actually disambiguating.

extract_facets(facets: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'] | Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']] = None, view_name: str | None = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False, flat=False) → Dict[str, Dict[str, List[Tuple[File, DataFrame]]] | List[Tuple[File, DataFrame]]][source]#: Retrieve a dictionary with the selected feature matrices extracted from the parsed scores. If you want to retrieve parsed TSV files, use get_all_parsed().

find_and_load_metadata() → None[source]#: Checks if a ‘metadata.tsv’ is present at the default path and parses it.

pieces_in_metadata(metadata_ix: int | None = None) → List[str][source]#: Piece names (file names without extension and suffix) serve as IDs for pieces. Retrieve those that are listed in the ‘metadata.tsv’ file for this corpus. The argument is simply self.metadata_ix and serves caching of the results for multiple metadata.tsv files.

pieces_not_in_metadata() → List[str][source]#: Piece names (file names without extension and suffix) serve as IDs for pieces. Retrieve those that are not listed in the ‘metadata.tsv’ file for this corpus.

get_dataframes(notes: bool = False, rests: bool = False, notes_and_rests: bool = False, measures: bool = False, events: bool = False, labels: bool = False, chords: bool = False, expanded: bool = False, form_labels: bool = False, cadences: bool = False, view_name: str | None = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False, include_empty: bool = False) → Dict[str, Dict[str, Tuple[File, DataFrame]] | List[Tuple[File, DataFrame]]][source]#: Renamed to get_facets().

get_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], view_name: str | None = None, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False, concatenate: bool = True) → Dict[str, Tuple[File, DataFrame]] | DataFrame[source]#: Retrieves exactly one DataFrame per piece, if available.

get_facets(facets: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'] | Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']] = None, view_name: str | None = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False, include_empty: bool = False) → Dict[str, Dict[str, Tuple[File, DataFrame]] | List[Tuple[File, DataFrame]]][source]#

Parameters:

facets
view_name
force – Only relevant when choose='all'. By default, only scores and TSV files that have already been parsed are taken into account. Set force=True to force-parse all scores and TSV files selected under the given view.
choose
unfold
interval_index
flat
include_empty

Returns:

get_all_pnames(pieces_in_metadata: bool = True, pieces_not_in_metadata: bool = True) → List[str][source]#

Piece names (file names without extension and suffix) serve as IDs for pieces. Use this function to retrieve the comprehensive list, ignoring views.

Parameters:

pieces_in_metadata – pieces that are listed in the ‘metadata.tsv’ file for this corpus, if present
pieces_not_in_metadata – pieces that are not listed in the ‘metadata.tsv’ file for this corpus

Returns:

The file names included in ‘metadata.tsv’ and/or those of all other scores.

get_pieces(view_name: str | None = None) → List[str][source]#: Retrieve pieces included in the current or selected view.

get_piece(pname) → Piece[source]#: Returns the Piece object for the piece name pname.

get_view(view_name: str | None = None, **config) → View[source]#: Retrieve an existing or create a new View object, potentially while updating the config.

iter_facets(facets: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'] | Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']] = None, view_name: str | None = None, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False, include_files: bool = False) → Iterator[source]#

Iterate through (piece, *DataFrame) tuples containing exactly one or zero DataFrames per requested facet.

Parameters:

facets
view_name
choose
unfold
interval_index
include_files

Returns:

(piece, *DataFrame) tuples containing exactly one or zero DataFrames per requested facet per piece.

iter_pieces(view_name: str | None = None) → Iterator[Tuple[str, Piece]][source]#: Iterate through (name, Piece) tuples under the current or specified view.

load_facet_into_scores(facet: Literal['expanded', 'labels'], view_name: str | None = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto', git_revision: str | None = None, key: str = 'detached', infer: bool = True, **cols) → int[source]#: Loads annotations from at most one TSV file into at most one score per piece. Each score will contain the annotations as a ‘detached’ annotation object accessible via the indicated key (defaults to ‘detached’).

look_for_ignored_warnings(directory: str | None = None)[source]#: Looks for a text file called IGNORED_WARNINGS and, if it exists, loads it, configuring loggers as indicated.

load_ignored_warnings(path: str) → Tuple[List[Logger], List[str]][source]#: Loads in a text file containing warnings that are to be ignored, i.e., wrapped in DEBUG messages. The purpose is to mark certain warnings as OK, warranted by a human, to allow checks to pass regardless.

load_metadata_file(file: File, allow_prefixed: bool = False) → None[source]#: Loads the TSV file at the given path and stores it as metadata. If the file is called ‘metadata.tsv’ it will be treated as the corpus’ main file for determining pieces. Otherwise it is expected to be named ‘metadata{suffix}.tsv’ and the suffix will be used as name for an additionally created view.

parse(view_name=None, level=None, parallel=True, only_new=True, labels_cfg={}, cols={}, infer_types=None, **kwargs)[source]#: Shorthand for executing parse_scores and parse_tsv in one go.

parse_mscx(*args, **kwargs)[source]#: Renamed to parse_scores().

parse_scores(level: str = None, parallel: bool = True, only_new: bool = True, labels_cfg: dict = {}, view_name: str = None, choose: Literal['all', 'auto', 'ask'] = 'all')[source]#

Parameters:

level ({'W', 'D', 'I', 'E', 'C', 'WARNING', 'DEBUG', 'INFO', 'ERROR', 'CRITICAL'}, optional) – Pass a level name for which (and above which) you want to see log records.
parallel (bool, optional) – Defaults to True, meaning that all CPU cores are used simultaneously to speed up the parsing. It implies that the resulting Score objects are in read-only mode and that you might not be able to use the computer during parsing. Set to False to parse one score after the other, which uses more memory but will allow making changes to the scores.
only_new (bool, optional) – By default, scores that have already been parsed are not parsed again. Pass False to parse them, too.

Return type:

None

parse_tsv(view_name: str | None = None, cols={}, infer_types=None, level=None, only_new: bool = True, choose: Literal['all', 'auto', 'ask'] = 'all', **kwargs)[source]#

Parse TSV files to be able to do something with them.

Parameters:

keys (str or Collection, optional) – Key(s) for which to parse all non-MSCX files. By default, all keys are selected.
ids (Collection) – To parse only particular files, pass their IDs. keys and fexts are ignored in this case.
fexts (str or Collection, optional) – If you want to parse only files with one or several particular file extension(s), pass the extension(s)
cols (dict, optional) – By default, if a column called 'label' is found, the TSV is treated as an annotation table and turned into an Annotations object. Pass one or several column name(s) to treat them as label columns instead. If you pass {} or no label column is found, the TSV is parsed as a “normal” table, i.e. a DataFrame.
infer_types (dict, optional) – To recognize one or several custom label type(s), pass {name: regEx}.
level ({'W', 'D', 'I', 'E', 'C', 'WARNING', 'DEBUG', 'INFO', 'ERROR', 'CRITICAL'}, optional) – Pass a level name for which (and above which) you want to see log records.
**kwargs – Arguments for pandas.DataFrame.to_csv(). Defaults to {'sep': ' ', 'index': False}. In particular, you might want to update the default dictionaries for dtypes and converters used in load_tsv(). Passing kwargs prevents ms3 from parsing TSVs in parallel, so it will be a bit slower.

Return type:

None

register_files_with_pieces(files: List[File] | None = None, pnames: str | Collection[str] | None = None) → None[source]#

Iterates through the files, tries to match each of them with one of the pieces, and registers the matched File objects with the corresponding Piece objects (unless already registered).

By default, the method uses this object’s files and pieces. To match with a Piece, the file name (without extension) needs to start with the Piece’s name; otherwise, it will be stored under ix2orphan_file.

Parameters:

files – File objects to register with the corresponding Piece objects based on their file names.
pnames – Names of the pieces that the files are to be matched to. Those that don’t match any will be stored under ix2orphan_file.
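The prefix rule described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the method's actual code: match_piece() is a hypothetical helper, the paths are invented, and preferring the longest matching piece name is an assumption made here to resolve ambiguous prefixes.

```python
import os
from typing import Dict, List, Optional


def match_piece(file_path: str, pnames: List[str]) -> Optional[str]:
    """Hypothetical helper: return the piece name that the file name
    (without extension) starts with, or None if there is no match.
    Choosing the longest candidate is an assumption of this sketch."""
    stem = os.path.splitext(os.path.basename(file_path))[0]
    candidates = [p for p in pnames if stem.startswith(p)]
    return max(candidates, key=len) if candidates else None


# Invented piece names and file paths for illustration.
pnames = ["K279-1", "K279-2"]
paths = ["notes/K279-1.tsv", "reviewed/K279-2_reviewed.mscx", "README.md"]

matched: Dict[str, List[str]] = {}
orphans: List[str] = []  # plays the role of ix2orphan_file in this sketch
for path in paths:
    pname = match_piece(path, pnames)
    if pname is None:
        orphans.append(path)
    else:
        matched.setdefault(pname, []).append(path)

print(matched)
print(orphans)
```

A file carrying a suffix, such as K279-2_reviewed.mscx, still matches its piece because only the beginning of the file name is compared.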

metadata(view_name: str | None = None, choose: Literal['auto', 'ask'] | None = None) → DataFrame[source]#: Returns metadata.tsv but only for pieces included in the current or indicated view. If no TSV file is present, get metadata from the current scores.

set_view(active: View = None, **views: View)[source]#: Register one or several view_name=View pairs.

update_scores(root_dir: str | None = None, folder: str | None = '.', suffix: str = '', overwrite: bool = False) → List[str][source]#

Update scores created with an older MuseScore version to the latest MuseScore 3 version.

Parameters:

root_dir – In case you want to create output paths for the updated MuseScore files based on a folder different from corpus_path.
folder –
- The default ‘.’ has the updated scores written to the same directory as the old ones, effectively overwriting them if root_dir is None.
- If folder is None, the files will be written to {root_dir}/scores/.
- If folder is an absolute path, root_dir will be ignored.
- If folder is a relative path starting with a dot . the relative path is appended to the file’s subdir. For example, ../scores will resolve to a sibling directory of the one where the file is located.
- If folder is a relative path that does not begin with a dot ., it will be appended to the root_dir.
suffix – String to append to the file names of the updated files, e.g. ‘_updated’.
overwrite – By default, existing files are not overwritten. Pass True to allow this.
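The rules for folder listed above can be summarized as a small path-resolution function. The sketch below is only one reading of those rules: resolve_output_dir() and its parameter names are hypothetical, the paths are invented, and it is assumed that a missing root_dir falls back to the corpus path. POSIX-style paths are used so that the results are the same on every platform.

```python
import posixpath
from typing import Optional


def resolve_output_dir(
    corpus_path: str, subdir: str, root_dir: Optional[str], folder: Optional[str]
) -> str:
    """Hypothetical helper mirroring the documented rules for `folder`.
    `subdir` is the directory of the original file relative to the corpus."""
    base = corpus_path if root_dir is None else root_dir  # assumed fallback
    if folder is None:
        return posixpath.join(base, "scores")
    if posixpath.isabs(folder):
        return folder  # root_dir is ignored
    if folder.startswith("."):
        # relative to the directory where the file is located
        return posixpath.normpath(posixpath.join(base, subdir, folder))
    return posixpath.join(base, folder)


print(resolve_output_dir("/corpus", "MS3", None, "."))          # /corpus/MS3
print(resolve_output_dir("/corpus", "MS3", None, "../scores"))  # /corpus/scores
print(resolve_output_dir("/corpus", "MS3", "/out", None))       # /out/scores
print(resolve_output_dir("/corpus", "MS3", "/out", "updated"))  # /out/updated
```

With the defaults (root_dir=None, folder='.'), the output directory equals the directory of the original file, which is why the updated scores would replace the old ones when overwrite=True.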

Returns:

A list of all up-to-date paths, whether they had to be converted or were already in the latest version.

Parameters:

facets
view_name
force – By default, only TSV files that have already been parsed are updated. Set to True in order to force-parse for each facet one of the TSV files included in the given view, if necessary.
choose

Returns:

List of paths that have been overwritten.

Parameters:

key – Key under which the Annotations objects to be attached are stored in the Score objects. Defaults to ‘detached’.
staff (int, optional) – If you pass a staff ID, the labels will be attached to that staff where 1 is the upper staff. By default, the staves indicated in the ‘staff’ column of ms3.annotations.Annotations.df will be used.
voice ({1, 2, 3, 4}, optional) – If you pass the ID of a notational layer (where 1 is the upper voice, blue in MuseScore), the labels will be attached to that one. By default, the notational layers indicated in the ‘voice’ column of ms3.annotations.Annotations.df will be used.
harmony_layer (int, optional) –

By default, the labels are written to the layer specified as an integer in the column harmony_layer.

Pass an integer to select a particular layer:

* 0 to attach them as absolute (‘guitar’) chords, meaning that when opened next time, MuseScore will split and encode those beginning with a note name (resulting in ms3-internal harmony_layer 3).
* 1 the labels are written into the staff’s layer for Roman Numeral Analysis.
* 2 to have MuseScore interpret them as Nashville Numbers
check_for_clashes (bool, optional) – By default, warnings are thrown when there already exists a label at a position (and in a notational layer) where a new one is attached. Pass False to deactivate these warnings.

change_labels_cfg(labels_cfg=(), staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#

Update Corpus.labels_cfg and retrieve new ‘labels’ tables accordingly.

Parameters:

labels_cfg (dict) – Using an entire dictionary or, to change only particular options, choose from:
staff – Arguments as they will be passed to get_labels()
voice – Arguments as they will be passed to get_labels()
harmony_layer – Arguments as they will be passed to get_labels()
positioning – Arguments as they will be passed to get_labels()
decode – Arguments as they will be passed to get_labels()
column_name – Arguments as they will be passed to get_labels()

Parameters:

key – Key of the detached labels you want to compare to the ones in the score.
new_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
old_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
detached_is_newer – Pass True if the detached labels are to be added with new_color whereas the attached changed labels will turn old_color, as opposed to the default.
add_to_rna – By default, new labels are attached to the Roman Numeral layer. Pass False to attach them to the chord layer instead.
metadata_update – Dictionary containing metadata that is to be included in the comparison score. Notably, ms3 uses the key ‘compared_against’ when the comparison is performed against a given git_revision.
force_metadata_update – By default, the metadata is only updated if the comparison yields at least one difference to avoid outputting comparison scores not displaying any changes. Pass True to force the metadata update, which results in the property changed being set to True.

Returns:

Number of scores in which labels have changed. Number of scores in which no label has changed.

count_annotation_layers(keys=None, which='attached', per_key=False)[source]#

Counts the labels for each annotation layer defined as (staff, voice, harmony_layer). By default, only labels attached to a score are counted.

Parameters:

keys (str or Collection, optional) – Key(s) for which to count annotation layers. By default, all keys are selected.
which ({'attached', 'detached', 'tsv'}, optional) – ‘attached’: Counts layers from annotations attached to a score. ‘detached’: Counts layers from annotations that are in a Score object, but detached from the score. ‘tsv’: Counts layers from Annotation objects that have been loaded from or into annotation tables.
per_key (bool, optional) – If set to True, the results are returned as a dict {key: Counter}, otherwise the counts are summed up in one Counter. If which='detached', the keys are keys from Score objects, otherwise they are keys from this Corpus object.

Returns:

By default, the function returns a Counter of labels for every annotation layer (staff, voice, harmony_layer). If per_key is set to True, a dictionary {key: Counter} is returned, separating the counts.

Return type:

dict or collections.Counter
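The two return shapes can be pictured with collections.Counter directly. This is a sketch with invented data, not output of the library: the keys and the (staff, voice, harmony_layer) tuples below are made up for illustration.

```python
from collections import Counter

# Invented example data: for each key, one (staff, voice, harmony_layer)
# tuple per label.
layers_by_key = {
    "K279-1": [(1, 1, 1), (1, 1, 1), (2, 1, 0)],
    "K279-2": [(1, 1, 1), (2, 1, 0), (2, 1, 0)],
}

# Shape corresponding to per_key=True: one Counter per key.
per_key = {key: Counter(layers) for key, layers in layers_by_key.items()}

# Shape corresponding to per_key=False: all counts summed into one Counter.
total = sum(per_key.values(), Counter())

print(per_key["K279-1"][(1, 1, 1)])  # 2
print(total[(1, 1, 1)])              # 3
print(total[(2, 1, 0)])              # 3
```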

count_pieces(view_name: str | None = None) → int[source]#: Number of selected pieces under the given view.

count_labels(keys=None, per_key=False)[source]#

Count label types.

Parameters:

keys (str or Collection, optional) – Key(s) for which to count label types. By default, all keys are selected.
per_key (bool, optional) – If set to True, the results are returned as a dict {key: Counter}, otherwise the counts are summed up in one Counter.

Returns:

By default, the function returns a Counter of label types. If per_key is set to True, a dictionary {key: Counter} is returned, separating the counts.

Return type:

dict or collections.Counter

count_tsv_types(keys=None, per_key=False)[source]#

Count inferred TSV types.

Parameters:

keys (str or Collection, optional) – Key(s) for which to count inferred TSV types. By default, all keys are selected.
per_key (bool, optional) – If set to True, the results are returned as a dict {key: Counter}, otherwise the counts are summed up in one Counter.

Returns:

By default, the function returns a Counter of inferred TSV types. If per_key is set to True, a dictionary {key: Counter} is returned, separating the counts.

Return type:

dict or collections.Counter

detach_labels(view_name: str | None = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto', key: str = 'removed', staff: int = None, voice: Literal[1, 2, 3, 4] = None, harmony_layer: Literal[0, 1, 2, 3] | None = None, delete: bool = True)[source]#: Calls Score.detach_labels() on every parsed score under the current or selected view.

keys() → List[str][source]#: Return the names of all Piece objects.

Store facets extracted from parsed scores as TSV files.

Parameters:

view_name
root_dir
measures_folder
notes_folder
rests_folder
notes_and_rests_folder
labels_folder
expanded_folder

form_labels_folder – Specify directory where to store the corresponding TSV files.
cadences_folder – Specify directory where to store the corresponding TSV files.
events_folder – Specify directory where to store the corresponding TSV files.
chords_folder – Specify directory where to store the corresponding TSV files.
metadata_suffix – Specify a suffix to update the ‘metadata{suffix}.tsv’ file for this corpus. For the main file, pass ‘’.
markdown – By default, when metadata_path is specified, a markdown file called README.md containing the columns [file_name, measures, labels, standard, annotators, reviewers] is created. If it exists already, this table will be appended or overwritten after the heading # Overview.

Parameters:

simulate
unfold – By default, repetitions are not unfolded. Pass True to duplicate values so that they correspond to a full playthrough, including correct positioning of first and second endings.
interval_index
frictionless – If True (default), the file is written together with a frictionless resource descriptor JSON file whose column schema is used to validate the stored TSV file.
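What unfolding means for a table can be shown with a toy example. The following sketch does not use ms3: the measure numbers, the play order, and the column names mc and mc_playthrough are assumptions chosen for illustration. Measure 2 stands for a first ending and measure 3 for a second ending.

```python
from typing import Dict, List

# Invented notes table: one dict per row, "mc" is the measure the note is in.
notes = [
    {"mc": 1, "pitch": "C4"},
    {"mc": 2, "pitch": "D4"},  # first ending
    {"mc": 3, "pitch": "E4"},  # second ending
    {"mc": 4, "pitch": "G4"},
]

# Assumed order in which the measures are played when the repeat is taken.
play_order = [1, 2, 1, 3, 4]

# Group the rows by measure so they can be duplicated per playthrough position.
rows_by_mc: Dict[int, List[dict]] = {}
for row in notes:
    rows_by_mc.setdefault(row["mc"], []).append(row)

unfolded = [
    dict(row, mc_playthrough=position)
    for position, mc in enumerate(play_order, start=1)
    for row in rows_by_mc.get(mc, [])
]

print([row["pitch"] for row in unfolded])  # ['C4', 'D4', 'C4', 'E4', 'G4']
```

The rows of measure 1 appear twice in the unfolded table, once before each ending, whereas the folded table lists every measure exactly once.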

Returns:

A list of files stored to disk. If frictionless=True (default), it will be the list of descriptor file paths describing the stored TSV files (i.e., the list contains one file for every two files written to disk). Otherwise, it will be the list of TSV file paths.

update_metadata_tsv_from_parsed_scores(root_dir: str | None = None, suffix: str = '', markdown_file: str | None = 'README.md', view_name: str | None = None) → List[str][source]#

Gathers the metadata from parsed and currently selected scores and updates ‘metadata.tsv’ with the information.

Parameters:

root_dir – In case you want to output the metadata to a folder different from corpus_path.
suffix – Added to the filename: ‘metadata{suffix}.tsv’. Defaults to ‘’. Metadata files with suffix may be used to store views with particular subselections of pieces.
markdown_file – By default, a subset of metadata columns will be written to ‘README.md’ in the same folder as the TSV file. If the file exists, it will be scanned for a line containing the string ‘# Overview’ and overwritten from that line onwards.
view_name – The view under which you want to update metadata from the selected parsed files. Defaults to None, i.e. the active view.
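The README behaviour described for markdown_file amounts to replacing everything from the ‘# Overview’ line onwards. A minimal sketch of that idea, assuming plain string handling: replace_overview() is a hypothetical helper, the README contents are invented, and appending at the end when no heading is found is an assumption of this sketch.

```python
def replace_overview(existing: str, new_overview: str, heading: str = "# Overview") -> str:
    """Hypothetical helper: drop everything from the first line containing
    `heading` onwards and put `new_overview` in its place."""
    lines = existing.splitlines()
    for i, line in enumerate(lines):
        if heading in line:
            kept = lines[:i]
            break
    else:
        kept = lines  # assumption: without the heading, append at the end
    return "\n".join(kept + [new_overview]) + "\n"


# Invented README content and overview table for illustration.
readme = "# My corpus\n\nSome intro.\n\n# Overview\n\n|old|table|\n"
overview = "# Overview\n\n|file_name|measures|\n|---|---|\n|K279-1|100|"

updated = replace_overview(readme, overview)
print(updated)
```

Text placed before the heading survives the update, so manual documentation should go above the ‘# Overview’ line.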

Returns:

The file paths to which metadata was written.

Update metadata fields of parsed scores with the values from the corresponding row in metadata.tsv.

Parameters:

view_name
force
choose
write_empty_values – If set to True, existing values are overwritten even if the new value is empty, in which case the field will be set to ‘’.
remove_unused_fields – If set to True, all non-default fields that are not among the columns of metadata.tsv (anymore) are removed.
write_text_fields – If set to True, ms3 will write updated values from the columns title_text, subtitle_text, composer_text, lyricist_text, and part_name_text into the score headers.
update_instrumentation – Set to True to update the score’s instrumentation based on changed values from ‘staff__instrument’ columns.

Returns:

List of File objects of those scores of which the XML structure has been modified.

Stores all parsed scores under this view as MuseScore 3 files.

Parameters:

view_name
only_changed – By default, only scores that have been modified since parsing are written. Set to False to store all scores regardless.
root_dir
folder
suffix – Suffix to append to the original file name.
overwrite – Pass True to overwrite existing files.
simulate – Set to True if no files are to be written.

Returns:

Paths of the stored files.

ms3.corpus.parse_musescore_file(file: File, logger: Logger, logger_parent: Logger, labels_cfg: dict = {}, logger_cfg: dict = {}, read_only: bool = False, ms: str | None = None) → Score[source]#

Performs a single parse and returns the resulting Score object or None.

Parameters:

file – File object with path information of a score that can be opened (or converted) with MuseScore 3.
logger – Logger to be used within this function (not for the parsing itself).
logger_cfg – Logger config for the new Score object (and therefore for the parsing itself).
read_only – Pass True to return smaller objects that do not keep a copy of the original XML structure in memory. In order to make changes to the score after parsing, this needs to be False (default).
ms – MuseScore executable in case the file needs to be converted.

Returns:

The parsed score.

The Piece class#

class ms3.piece.Piece(pname: str, view: View | None = None, labels_cfg: dict | None = None, ms=None, **logger_cfg)[source]#

Wrapper around Score for associating it with parsed TSV files

facet2files: Dict[str, List[File]]#: {typ -> [File]} dict storing file information for associated types.

ix2file: Dict[int, File]#: {ix -> File} dict storing the registered file information for access via index.

facet2parsed: Dict[str, Dict[int, Score | DataFrame]]#: {typ -> {ix -> pandas.DataFrame | Score}} dict storing parsed files for associated types.

ix2parsed: Dict[int, Score | DataFrame]#: {ix -> pandas.DataFrame | Score} dict storing the parsed files for access via index.

ix2parsed_score: Dict[int, Score]#: Subset of ix2parsed

ix2parsed_tsv: Dict[int, DataFrame]#: Subset of ix2parsed

ix2annotations: Dict[int, Annotations]#: {ix -> Annotations} dict storing Annotations objects for the parsed labels and expanded labels.

labels_cfg#: dict Configuration dictionary to determine the output format of labels and expanded tables. The dictionary is passed to Score upon parsing.

all_facets_present(view_name: str | None = None, selected_facets: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'] | Collection[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown']] | None = None) → bool[source]#

Checks if parsed TSV files have been detected for all selected facets under the active or indicated view.

Parameters:

view_name – Name of the view to check.
selected_facets – If passed, needs to be a subset of the facets selected by the view, otherwise the result will be False. If no selected_facets are passed, check for those selected by the active or indicated view.

Returns:

True if for each selected facet at least one file has been registered.

property has_changed_scores: bool#: Whether any of the parsed scores has been altered.

score_metadata(view_name: str | None, choose: Literal['auto', 'ask'], as_dict: Literal[False]) → Series[source]#

score_metadata(view_name: str | None, choose: Literal['auto', 'ask'], as_dict: Literal[True]) → dict

Parameters:

choose
as_dict – Set to True to change the return type from pandas.Series to dict.

Returns:

property tsv_metadata: Dict[str, str] | None#: If the Corpus has metadata_tsv, this field will contain the {column: value} pairs of the row pertaining to this piece.

metadata(view_name: str | None = None) → Series | None[source]#: If a row of ‘metadata.tsv’ has been stored, return that, otherwise extract from a (force-)parsed score.

set_view(active: View = None, **views: View)[source]#: Register one or several view_name=View pairs.

get_view(view_name: str | None = None, **config) → View[source]#: Retrieve an existing or create a new View object, potentially while updating the config.

change_labels_cfg(labels_cfg=(), staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#

Update Piece.labels_cfg and retrieve new ‘labels’ tables accordingly.

Parameters:

labels_cfg (dict) – Using an entire dictionary or, to change only particular options, choose from:
staff – Arguments as they will be passed to get_labels()
voice – Arguments as they will be passed to get_labels()
harmony_layer – Arguments as they will be passed to get_labels()
positioning – Arguments as they will be passed to get_labels()
decode – Arguments as they will be passed to get_labels()
column_name – Arguments as they will be passed to get_labels()

Parameters:

key – Key of the detached labels you want to compare to the ones in the score.
new_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
old_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
detached_is_newer – Pass True if the detached labels are to be added with new_color whereas the attached changed labels will turn old_color, as opposed to the default.
add_to_rna – By default, new labels are attached to the Roman Numeral layer. Pass False to attach them to the chord layer instead.
metadata_update – Dictionary containing metadata that is to be included in the comparison score. Notably, ms3 uses the key ‘compared_against’ when the comparison is performed against a given git_revision.
force_metadata_update – By default, the metadata is only updated if the comparison yields at least one difference to avoid outputting comparison scores not displaying any changes. Pass True to force the metadata update, which results in the property changed being set to True.

Returns:

Number of scores in which labels have changed. Number of scores in which no label has changed.

count_detected(include_empty: bool = False, view_name: str | None = None, prefix: bool = False) → Dict[str, int][source]#

Count how many files per facet have been detected.

Parameters:

include_empty – By default, facets without files are not included in the dict. Pass True to include zero counts.
view_name
prefix – Pass True if you want the facets prefixed with ‘detected_’.

Returns:

{facet -> count of detected files}

extract_facets(facets: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'] | Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']] = None, view_name: str | None = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False) → Dict[str, List[Tuple[File, DataFrame]]] | List[Tuple[File, DataFrame]][source]#: Retrieve a dictionary with the selected feature matrices extracted from the parsed scores. If you want to retrieve parsed TSV files, use get_all_parsed().

get_facets(facets: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'] | Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']] = None, view_name: str | None = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False) → Dict[str, Tuple[File, DataFrame]] | List[Tuple[File, DataFrame]][source]#

Retrieve score facets both freshly extracted from parsed scores and from parsed TSV files, depending on the parameters and the view in question.

If choose != ‘all’, the goal will be to return one DataFrame per facet. Preference is given to a DataFrame freshly extracted from an already parsed score; otherwise, from an already parsed TSV file. If both are not available, preference will be given to a force-parsed TSV, then to a force-parsed score.
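The order of preference stated above can be written down as a simple lookup. This sketch only restates the documented order; choose_source() and the source labels are hypothetical names and do not correspond to identifiers in ms3.

```python
from typing import Dict, Optional

# Documented order of preference when one DataFrame per facet is requested:
# already parsed score, already parsed TSV, force-parsed TSV, force-parsed score.
PREFERENCE = ("parsed_score", "parsed_tsv", "unparsed_tsv", "unparsed_score")


def choose_source(available: Dict[str, bool]) -> Optional[str]:
    """Hypothetical helper: return the first available source in the
    documented order, or None if nothing is available."""
    for source in PREFERENCE:
        if available.get(source, False):
            return source
    return None


print(choose_source({"parsed_tsv": True, "unparsed_score": True}))    # parsed_tsv
print(choose_source({"unparsed_tsv": True, "unparsed_score": True}))  # unparsed_tsv
print(choose_source({}))                                              # None
```

Note that the order flips for unparsed files: a TSV file is preferred over a score when something needs to be force-parsed.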

Parameters:

facets
view_name
force – Only relevant when choose='all'. By default, only scores and TSV files that have already been parsed are taken into account. Set force=True to force-parse all scores and TSV files selected under the given view.
choose
unfold
interval_index
flat

Returns:

get_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], view_name: str | None = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False) → Tuple[File | None, DataFrame | None][source]#: Retrieve a DataFrame from a parsed score or, if unavailable, from a parsed TSV. If none have been parsed, first force-parse a TSV and, if not included in the given view, force-parse a score.

get_file(facet: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], view_name: str | None = None, parsed: bool = True, unparsed: bool = True, choose: Literal['auto', 'ask'] = 'auto') → File | None[source]#

Parameters:

facet
choose

Returns:

A {file_type -> [File]} dict containing the selected Files or, if flat=True, just a list.

get_files(facets: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'] | Literal['tsv', 'tsvs'] | Collection[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown']] = None, view_name: str | None = None, parsed: bool = True, unparsed: bool = True, choose: Literal['all', 'auto', 'ask'] = 'all', flat: bool = False, include_empty: bool = False) → Dict[str, List[File]] | List[File][source]#

Parameters:: facets
Returns:: A {file_type -> [File]} dict containing the selected Files or, if flat=True, just a list.

get_parsed(facet: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], view_name: str | None = None, choose: Literal['auto', 'ask'] = 'auto', git_revision: str | None = None, unfold: bool = False, interval_index: bool = False) → Tuple[File | None, Score | DataFrame | None][source]#

Retrieve exactly one parsed score or TSV file. If none has been parsed, parse one automatically.

Parameters:

facet
view_name
choose
git_revision

Returns:

get_all_parsed(facets: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'] | Literal['tsv', 'tsvs'] | Collection[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown']] = None, view_name: str | None = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', flat: bool = False, include_empty: bool = False, unfold: bool = False, interval_index: bool = False) → Dict[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], List[Tuple[File, Score | DataFrame]]] | List[Tuple[File, Score | DataFrame]][source]#: Return multiple parsed files.

iter_extracted_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], view_name: str | None = None, force: bool = False, unfold: bool = False, interval_index: bool = False) → Iterator[Tuple[File | None, DataFrame | None]][source]#: Iterate through the selected facet extracted from all parsed or yet-to-parse scores.

iter_extracted_facets(facets: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'] | Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']], view_name: str | None = None, force: bool = False, unfold: bool = False, interval_index: bool = False) → Iterator[Tuple[File, Dict[str, DataFrame]]][source]#: Iterate through the selected facets extracted from all parsed or yet-to-parse scores.

iter_facet2files(view_name: str | None = None, include_empty: bool = False) → Iterator[Tuple[str, List[File]]][source]#: Iterating through facet2files under the current or specified view.

iter_facet2parsed(view_name: str | None = None, include_empty: bool = False) → Iterator[Dict[str, List[File]]][source]#: Iterating through facet2parsed under the current or specified view and selecting only parsed files.

iter_files(facets: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'] | Literal['tsv', 'tsvs'] | Collection[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown']] = None, view_name: str | None = None, parsed: bool = True, unparsed: bool = True, choose: Literal['all', 'auto', 'ask'] = 'all', flat: bool = False, include_empty: bool = False) → Iterator[Dict[str, File]] | Iterator[List[File]][source]#: Equivalent to iterating through the result of get_files().

keys() → List[int][source]#: Return the indices of all Files registered with this Piece.

load_annotation_table_into_score(ix: int | None = None, df: DataFrame | None = None, view_name: str | None = None, choose: Literal['auto', 'ask'] = 'auto', key: str = 'detached', infer: bool = True, **cols) → None[source]#

Attach an Annotations object to the score and make it available as Score.{key}. It can be an existing object or one newly created from the TSV file tsv_path.

Parameters:

ix – Either pass the index of a TSV file containing annotations, or
df – A DataFrame containing annotations.
key – Specify a new key for accessing the set of annotations. The string needs to be usable as an identifier, e.g. not start with a number, not contain special characters etc. In return you may use it as a property: For example, passing 'chords' lets you access the Annotations as Score.chords. The key ‘annotations’ is reserved for all annotations attached to the score.
infer – By default, the label types are inferred in the currently configured order (see name2regex). Pass False to not add and not change any label types.
**cols – If the columns in the specified TSV file diverge from the standard column names, pass them as standard_name=’custom name’ keywords.

store_extracted_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], root_dir: str | None = None, folder: str | None = None, view_name: str | None = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, frictionless: bool = True, raise_exception: bool = True, write_or_remove_errors_file: bool = True)[source]#

Extract a facet from one or several available scores and store the results as TSV files, the paths of which are computed from the respective score’s location.

Parameters:

facet
root_dir – Defaults to None, meaning that the path is constructed based on the corpus_path. Pass a directory to construct the path relative to it instead. If folder is an absolute path, root_dir is ignored.
folder –

If folder is None (default), the files’ type will be appended to the root_dir.

If folder is an absolute path, root_dir will be ignored.

If folder is a relative path starting with a dot ., the relative path is appended to the file’s subdir. For example, ..\notes will resolve to a sibling directory of the one where the file is located.

If folder is a relative path that does not begin with a dot ., it will be appended to the root_dir.
view_name
force
choose
unfold
interval_index
frictionless – If True (default), the file is written together with a frictionless resource descriptor JSON file whose column schema is used to validate the stored TSV file.
raise_exception – If True (default), raise if the resource is not valid. Only relevant when frictionless=True (i.e., by default).
write_or_remove_errors_file – If True (default), write a .errors file if the resource is not valid, otherwise remove it if it exists. Only relevant when frictionless=True (i.e., by default).
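The interplay of root_dir and folder can be sketched as follows. This is a minimal illustration of the four documented cases, not the library's own code: the function resolve_output_dir and its parameters are hypothetical, it assumes that the file's subdir is relative to the corpus path, and it uses POSIX-style forward slashes.

```python
import posixpath
from typing import Optional


def resolve_output_dir(
    corpus_path: str,
    file_subdir: str,
    facet: str,
    root_dir: Optional[str] = None,
    folder: Optional[str] = None,
) -> str:
    """Compute the directory in which an extracted facet would be stored."""
    # root_dir=None means the path is based on the corpus path.
    base = corpus_path if root_dir is None else root_dir
    if folder is None:
        # Case 1: append the facet name (the file's type) to the base.
        return posixpath.join(base, facet)
    if posixpath.isabs(folder):
        # Case 2: an absolute folder overrides root_dir entirely.
        return folder
    if folder.startswith("."):
        # Case 3: resolve relative to the subdirectory of the score file.
        return posixpath.normpath(posixpath.join(base, file_subdir, folder))
    # Case 4: any other relative folder is appended to the base.
    return posixpath.join(base, folder)


default_dir = resolve_output_dir("/corpus", "MS3", "notes")
sibling_dir = resolve_output_dir("/corpus", "MS3", "notes", folder="../notes")
```

With a score located in /corpus/MS3, both calls resolve to /corpus/notes: the first because the facet name is appended to the corpus path, the second because ../notes points to a sibling of the MS3 folder.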

Returns:

store_parsed_score_at_ix(ix, root_dir: str | None = None, folder: str = '.', suffix: str = '', overwrite: bool = False, simulate=False) → str | None[source]#

Creates a MuseScore file from the Score object at the given index.

Parameters:

ix
folder
suffix – Suffix to append to the original file name.
root_dir
overwrite – Pass True to overwrite existing files.
simulate – Set to True if no files are to be written.

Returns:

Path of the stored file.

Update metadata fields of parsed scores with the values from the corresponding row in metadata.tsv.

Parameters:

view_name
force
choose
write_empty_values – If set to True, existing values are overwritten even if the new value is empty, in which case the field will be set to ‘’.
remove_unused_fields – If set to True, all non-default fields that are not among the columns of metadata.tsv (anymore) are removed.
write_text_fields – If set to True, ms3 will write updated values from the columns title_text, subtitle_text, composer_text, lyricist_text, and part_name_text into the score header.
update_instrumentation – Set to True to update the score’s instrumentation based on changed values from ‘staff__instrument’ columns.

Returns:

List of File objects of those scores of which the XML structure has been modified.

Parameters:

facets
view_name
force – By default, only TSV files that have already been parsed are updated. Set to True in order to force-parse for each facet one of the TSV files included in the given view, if necessary.
choose

Returns:

List of paths that have been overwritten.

get_dataframe(*args, **kwargs) → None[source]#: Deprecated method. Replaced by get_parsed(), extract_facet(), and get_facet().

The View class#

ms3.view.empty_counts()[source]#: Array for counting kept items, discarded items, and their sum.

class ms3.view.View(view_name: str | None = 'all', only_metadata_pieces: bool = False, include_convertible: bool = True, include_tsv: bool = True, exclude_review: bool = False, **logger_cfg)[source]#

Object storing regular expressions and filter lists, and keeping track of what has been filtered out.

is_default(relax_for_cli: bool = False) → bool[source]#: Checks includes and excludes that may influence the selection of pieces. Returns True if the settings do not filter out any pieces. Only if relax_for_cli is set to True, the filters include_convertible and exclude_review are permitted, too.

copy(new_name: str | None = None) → View[source]#: Returns a copy of this view, i.e., a new View object.

Update the configuration of the View. This is a shorthand for issuing several calls to include() and exclude() at once.

Parameters:

view_name – New name of the view.
only_metadata_pieces – Whether or not pieces that are not included in a metadata.tsv should be excluded.
include_convertible – Whether or not scores that need conversion via MuseScore before parsing should be included.
include_tsv – Whether or not TSV files should be included.
exclude_review – Whether or not files and folders that include ‘review’ should be excluded.
file_paths – The exact file names will be extracted and used as an exclusive filter, that is, all files that do not have one of these file names will be excluded. This is regardless of any relative or absolute paths included in the argument.
file_re – Include only files whose file name includes this regular expression.
folder_re – Include only files from folders whose name includes this regular expression.
exclude_re – Exclude all files and folders whose name includes this regular expression.
folder_paths – Include only files from these folders.
**logger_cfg

Returns:

check_token(category: Literal['corpora', 'folders', 'pieces', 'files', 'suffixes', 'facets', 'paths'], token: str) → bool[source]#: Checks if a string pertaining to a certain category should be included in the view or not.

check_file(file: File) → Tuple[bool, str][source]#

Check if an individual File passes all filters w.r.t. its subdirectories, file name and suffix.

Parameters:: file
Returns:: False if file is to be discarded from this view. The criterion based on which the file is being excluded.
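To illustrate how regex-based filters of this kind could produce a (bool, criterion) pair, here is a minimal sketch using only the standard library. It is not the View implementation: the function check_path, the order in which the filters are applied, and the empty string returned for kept files are all assumptions made for the example.

```python
import posixpath
import re
from typing import Optional, Tuple


def check_path(
    path: str,
    file_re: Optional[str] = None,
    folder_re: Optional[str] = None,
    exclude_re: Optional[str] = None,
) -> Tuple[bool, str]:
    """Return (keep, criterion), where criterion names the filter that excluded the path."""
    folder, name = posixpath.split(path)
    # Exclusion applies to both folder and file names.
    if exclude_re is not None and (re.search(exclude_re, name) or re.search(exclude_re, folder)):
        return False, "exclude_re"
    # Inclusion filters: the name has to contain a match to be kept.
    if file_re is not None and not re.search(file_re, name):
        return False, "file_re"
    if folder_re is not None and not re.search(folder_re, folder):
        return False, "folder_re"
    return True, ""


kept = check_path("corpus/MS3/K279-1.mscx", file_re=r"K279")
discarded = check_path("corpus/reviewed/K279-1.mscx", exclude_re=r"review")
```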

filter_by_token(category: Literal['corpora', 'folders', 'pieces', 'files', 'suffixes', 'facets', 'paths'], tuples: Iterable[tuple]) → Iterator[tuple][source]#: Filters out those tuples where the token (first element) does not pass _.check_token(category, token).

filtered_tokens(category: Literal['corpora', 'folders', 'pieces', 'files', 'suffixes', 'facets', 'paths'], tokens: Collection[str]) → List[str][source]#: Applies filter_by_token() to a collection of tokens.

filtered_file_list(files: Collection[File], key: str = None) → List[File][source]#

Keep only the files that pass _.check_file().

Parameters:

files – File objects to be filtered.
key – Aggregate results from several filter runs under this dictionary key.

Returns:

class ms3.view.DefaultView(view_name: str | None = 'default', only_metadata_pieces: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, **logger_cfg)[source]#

is_default(relax_for_cli: bool = False) → bool[source]#: Checks includes and excludes that may influence the selection of pieces. Returns True if the settings do not filter out any pieces. Only if relax_for_cli is set to True, the filters include_convertible and exclude_review are permitted, too.

ms3.view.create_view_from_parameters(only_metadata_pieces: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_paths=None, file_re=None, folder_re=None, exclude_re=None, level=None) → View[source]#: From the arguments of an __init__ method, create either a DefaultView or a custom view.

The Score class#

class ms3.score.Score(musescore_file=None, match_regex=['dcml', 'form_labels'], read_only=False, labels_cfg={}, parser='bs4', ms=None, **logger_cfg)[source]#

Object representing a score.

ABS_REGEX = '^\\(?[A-G|a-g](b*|#*).*?(/[A-G|a-g](b*|#*))?$'#: str Class variable with a regular expression that recognizes absolute chord symbols in their decoded (string) form; they start with a note name.

NASHVILLE_REGEX = '^(b*|#*)(\\d).*$'#: str Class variable with a regular expression that recognizes labels representing a Nashville numeral, which MuseScore is able to encode.

RN_REGEX = '^$'#: str Class variable with a regular expression for Roman numerals that currently matches nothing because ms3 tries interpreting Roman numerals as DCML harmony annotations.
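The behaviour of these three expressions can be checked with the standard re module. The patterns below are copied from the class variables above; the dictionary keys and the example labels are chosen for illustration only.

```python
import re

# Patterns as documented for Score.ABS_REGEX, Score.NASHVILLE_REGEX and Score.RN_REGEX.
patterns = {
    "absolute": r"^\(?[A-G|a-g](b*|#*).*?(/[A-G|a-g](b*|#*))?$",
    "nashville": r"^(b*|#*)(\d).*$",
    "roman": r"^$",
}


def matching_patterns(label: str) -> list:
    """Names of all patterns that match the given label."""
    return [name for name, regex in patterns.items() if re.match(regex, label)]


results = {label: matching_patterns(label) for label in ("Bb/D", "4", "V7", "b6")}
```

Here "Bb/D" is recognized as an absolute chord and "4" as a Nashville numeral, whereas "V7" matches none of the three. A label such as "b6" satisfies both the absolute and the Nashville pattern, which is one reason why the order in which types are inferred matters.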

native_formats = ('mscx', 'mscz')#: tuple Formats that MS3 reads without having to convert.

convertible_formats = ('cap', 'capx', 'midi', 'mid', 'musicxml', 'mxl', 'xml')#: tuple Formats that have to be converted before parsing.

parseable_formats = ('mscx', 'mscz', 'cap', 'capx', 'midi', 'mid', 'musicxml', 'mxl', 'xml')#: tuple Formats that ms3 can parse.

read_only#: bool, optional Defaults to False, meaning that the parsing is slower and uses more memory in order to allow for manipulations of the score, such as adding and deleting labels. Set to True if you’re only extracting information.

full_paths#: dict {KEY: {i: full_path}} dictionary holding the full paths of all parsed MuseScore and TSV files, including file names. Handled internally by _handle_path().

paths#: dict {KEY: {i: file path}} dictionary holding the paths of all parsed MuseScore and TSV files, excluding file names. Handled internally by _handle_path().

files#: dict {KEY: {i: file name with extension}} dictionary holding the complete file name of each parsed file, including the extension. Handled internally by _handle_path().

fnames#: dict {KEY: {i: file name without extension}} dictionary holding the file name of each parsed file, without its extension. Handled internally by _handle_path().

fexts#: dict {KEY: {i: file extension}} dictionary holding the file extension of each parsed file. Handled internally by _handle_path().

ms#: str Path or command of the local MuseScore 3 installation if specified by the user.

_mscx: MSCX#: The object representing the parsed MuseScore file.

_detached_annotations#: dict {(key, i): Annotations object} dictionary for accessing all detached Annotations objects.

_types_to_infer#: list Current order in which types are being recognized.

_regex_name_description#: dict Mapping regex names to their descriptions.

_name2regex#: dict Mapping names to their corresponding regex, e.g. ‘dcml’: utils.DCML_REGEX. Managed via the property name2regex.

labels_cfg#: dict Configuration dictionary to determine the output format of the Annotations objects contained in the current object, especially when calling Score.mscx.labels(). The default options correspond to the default parameters of Annotations.get_labels().

parser#: {‘bs4’} Currently only one XML parser has been implemented which uses BeautifulSoup 4.

review_report#: pandas.DataFrame After calling color_non_chord_tones(), this DataFrame contains the expanded chord labels plus the six additional columns [‘n_colored’, ‘n_untouched’, ‘count_ratio’, ‘dur_colored’, ‘dur_untouched’, ‘dur_ratio’] representing the statistics of chord (untouched) vs. non-chord (colored) notes.

comparison_report#: pandas.DataFrame DataFrame showing the labels modified (‘new’) and added (‘old’) by compare_labels().

property name2regex#: list or dict, optional The order in which label types are to be inferred. Assigning a new value results in a call to infer_types(). Passing a {label type: regex} dictionary is a shortcut to update type regexes or to add new ones. The inference will take place in the order in which they appear in the dictionary. To reuse an existing regex while updating others, you can refer to them as None, e.g. {'dcml': None, 'my_own': r'^(PAC|HC)$'}.
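Why the order of the dictionary matters can be shown with a small stand-alone sketch. It assumes that the first matching regex determines a label's type; the function infer_type and the ‘catch_all’ type are hypothetical and not part of ms3.

```python
import re
from typing import Dict, Optional


def infer_type(label: str, name2regex: Dict[str, str]) -> Optional[str]:
    """Return the name of the first regex (in dictionary order) that matches the label."""
    for name, regex in name2regex.items():
        if re.match(regex, label):
            return name
    return None


specific_first = {"my_own": r"^(PAC|HC)$", "catch_all": r"^.+$"}
general_first = {"catch_all": r"^.+$", "my_own": r"^(PAC|HC)$"}

type_a = infer_type("PAC", specific_first)
type_b = infer_type("PAC", general_first)
```

The same label is classified as ‘my_own’ in the first configuration and as ‘catch_all’ in the second, simply because the more general expression is tried first.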

property has_detached_annotations#: bool Is True as long as the score contains Annotations objects that are not attached to the MSCX object.

property mscx: MSCX#: Standard way of accessing the parsed MuseScore file.

property types#: dict Shows the mapping of label types to their descriptions.

attach_labels(key, staff=None, voice=None, harmony_layer=None, check_for_clashes=True, remove_detached=True)[source]#

Insert detached labels key into this score’s MSCX object.

Parameters:

key (str) – Key of the detached labels you want to insert into the score.
staff (int, optional) – By default, labels are added to staves as specified in the TSV or to -1 (lowest). Pass an integer to specify a staff.
voice (int, optional) – By default, labels are added to voices (notational layers) as specified in the TSV or to 1 (main voice). Pass an integer to specify a voice.
harmony_layer (int, optional) – By default, the labels are written to the layer specified as an integer in the column harmony_layer. Pass an integer to select a particular layer:

* 0 to attach them as absolute (‘guitar’) chords, meaning that when opened next time, MuseScore will split and encode those beginning with a note name (resulting in ms3-internal harmony_layer 3).
* 1 the labels are written into the staff’s layer for Roman Numeral Analysis.
* 2 to have MuseScore interpret them as Nashville Numbers
check_for_clashes (bool, optional) – Defaults to True, meaning that the positions where the labels will be inserted will be checked for existing labels.
remove_detached (bool, optional) – By default, the detached Annotations object is removed after successfully attaching it. Pass False to have it remain in detached state.

Returns:

int – Number of newly attached labels.
int – Number of labels that were to be attached.

change_labels_cfg(labels_cfg={}, staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#

Update Score.labels_cfg and MSCX.labels_cfg.

Parameters:

labels_cfg (dict) – Using an entire dictionary or, to change only particular options, choose from:
staff – Arguments as they will be passed to get_labels()
voice – Arguments as they will be passed to get_labels()
harmony_layer – Arguments as they will be passed to get_labels()
positioning – Arguments as they will be passed to get_labels()
decode – Arguments as they will be passed to get_labels()

check_labels(keys='annotations', regex=None, regex_name='dcml', **kwargs)[source]#

Tries to match the labels keys against the given regex or the one of the registered regex_name. Returns wrong labels.

Parameters:

keys (str or Collection, optional) – The key(s) of the Annotation objects you want to check. Defaults to ‘annotations’, the attached labels.
regex (str, optional) – Pass a regular expression against which to check the labels if you don’t want to use the one of an existing regex_name or in order to register a new one on the fly by passing the new name as regex_name.
regex_name (str, optional) – To use the regular expression of a registered type, pass its name, defaults to ‘dcml’. Pass a new name and a regex to register a new label type on the fly.
kwargs – Parameters passed to check_labels().

Returns:

Labels not matching the regex.

Return type:

pandas.DataFrame

color_non_chord_tones(color_name: str = 'red') → DataFrame | None[source]#

Iterates through the attached labels, tries to interpret them as DCML harmony labels, colors the notes in the parsed score that are not expressed by the respective label for a score segment, and stores a report under review_report.

Parameters:: color_name – Name the color that the non-chord tones should get, defaults to ‘red’. Name can be a CSS color or a MuseScore color (see utils.MS3_COLORS).
Returns:: A coloring report which is the original df with the appended columns ‘n_colored’, ‘n_untouched’, ‘count_ratio’, ‘dur_colored’, ‘dur_untouched’, ‘dur_ratio’. They contain the counts and durations of the colored vs. untouched notes as well as the ratio of each pair. Note that the report does not take into account notes that reach into a segment, nor does it correct the duration of notes that reach into the subsequent segment.

compare_labels(key: str = 'detached', new_color: str = 'ms3_darkgreen', old_color: str = 'ms3_darkred', detached_is_newer: bool = False, add_to_rna: bool = True, metadata_update: dict | None = None, force_metadata_update: bool = False) → Tuple[int, int][source]#

Parameters:

key – Key of the detached labels you want to compare to the ones in the score.
new_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
old_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
detached_is_newer – Pass True if the detached labels are to be added with new_color whereas the attached changed labels will turn old_color, as opposed to the default.
add_to_rna – By default, new labels are attached to the Roman Numeral layer. Pass False to attach them to the chord layer instead.
metadata_update – Dictionary containing metadata that is to be included in the comparison score. Notably, ms3 uses the key ‘compared_against’ when the comparison is performed against a given git_revision.
force_metadata_update – By default, the metadata is only updated if the comparison yields at least one difference to avoid outputting comparison scores not displaying any changes. Pass True to force the metadata update, which results in the property changed being set to True.

Returns:

Number of attached labels that were not present in the old version and whose color has been changed. Number of added labels that are not present in the current version any more and which have been added as a consequence.

detach_labels(key, staff=None, voice=None, harmony_layer=None, delete=True, inverse=False, regex=None)[source]#

Detach all annotations labels from this score’s MSCX object or just a selection of them, without taking labels_cfg into account (don’t decode the labels). The extracted labels are stored as a new Annotations object that is accessible via Score.{key}. By default, delete is set to True, meaning that if you call store_scores() afterwards, the created MuseScore file will not contain the detached labels.

Parameters:

key (str) – Specify a new key for accessing the detached set of annotations. The string needs to be usable as an identifier, e.g. not start with a number, not contain special characters etc. In return you may use it as a property: For example, passing 'chords' lets you access the detached labels as Score.chords. The key ‘annotations’ is reserved for all annotations attached to the score.
staff (int, optional) – Pass a staff ID to select only labels from this staff. The upper staff has ID 1.
voice ({1, 2, 3, 4}, optional) – Can be used to select only labels from one of the four notational layers. Layer 1 is MuseScore’s main, ‘upper voice’ layer, coloured in blue.
harmony_layer (int or str, optional) – Select one of the harmony layers {0,1,2,3} to select only these.
delete (bool, optional) – By default, the labels are removed from the XML structure in MSCX. Pass False if you want them to remain. This could be useful if you only want to extract a subset of the annotations for storing them separately but without removing the labels from the score.

get_infer_regex()[source]#

Returns:: Mapping of label types to the corresponding regular expressions in the order in which they are currently set to be inferred.
Return type:: dict

get_labels(key: str | None = None, interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing all Labels, i.e., all <Harmony> tags, of the score or another set of annotations. Corresponds to calling get_labels() on the selected object (by default, the one representing labels attached to the score) with the current _labels_cfg. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, volta, harmony_layer, label, offset_x, offset_y, regex_match

Parameters:

key
interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.

Returns:

DataFrame representing all Labels, i.e., all <Harmony> tags in the score.

new_type(name, regex, description='', infer=True)[source]#

Declare a custom label type. A type consists of a name, a regular expression and, optionally, a description.

Parameters:

name (str or int) – Name of the custom label type.
regex (str) – Regular expression that matches all labels of the custom type.
description (str, optional) – Human readable description that appears when calling the property Score.types.
infer (bool, optional) – By default, the labels of all Annotations objects are matched against the new type. Pass False to not change any label’s type.

load_annotations(tsv_path: str | None = None, anno_obj: Annotations | None = None, df: DataFrame | None = None, key: str = 'detached', infer: bool = True, **cols) → None[source]#

Attach an Annotations object to the score and make it available as Score.{key}. It can be an existing object or one newly created from the TSV file tsv_path.

Parameters:

tsv_path – If you want to create a new Annotations object from a TSV file, pass its path.
anno_obj – Instead, you can pass an existing object.
df – Or you can automatically create one from a given DataFrame.
key – Specify a new key for accessing the set of annotations. The string needs to be usable as an identifier, e.g. not start with a number, not contain special characters etc. In return you may use it as a property: For example, passing 'chords' lets you access the Annotations as Score.chords. The key ‘annotations’ is reserved for all annotations attached to the score.
infer – By default, the label types are inferred in the currently configured order (see name2regex). Pass False to not add and not change any label types.
**cols – If the columns in the specified TSV file diverge from the standard column names, pass them as standard_name=’custom name’ keywords.
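Whether a string qualifies as a key in the sense described for the key parameter can be approximated with Python's own identifier check. This is a sketch of the stated rule only; the helper is_usable_key and the set RESERVED_KEYS are hypothetical, and ms3 may apply different or additional checks.

```python
import keyword

# 'annotations' is documented as reserved for the labels attached to the score.
RESERVED_KEYS = {"annotations"}


def is_usable_key(key: str) -> bool:
    """True if the key could serve as an attribute name such as Score.chords."""
    return (
        key.isidentifier()
        and not keyword.iskeyword(key)
        and key not in RESERVED_KEYS
    )


checks = {key: is_usable_key(key) for key in ("chords", "1st_try", "my-key", "annotations")}
```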

store_annotations(key='annotations', tsv_path=None, **kwargs)[source]#

Save a set of annotations as TSV file. While store_list stores attached labels only, this method can also store detached labels by passing a key.

Parameters:

key (str, optional) – Key of the Annotations object which you want to output as TSV file. By default, the annotations attached to the score (key=’annotations’) are stored.
tsv_path (str, optional) – Path of the newly created TSV file including the file name. By default, the TSV file is stored next to t
kwargs – Additional keyword arguments will be passed to the function pandas.DataFrame.to_csv() to customise the format of the created file (e.g. to change the separator to commas instead of tabs, you would pass sep=',').

write_score_to_handler(file_handler)[source]#

Write the current MSCX object to a file handler. Just a shortcut for Score.mscx.write_score_to_handler().

Parameters:: file_handler – File handler to write to.

store_score(filepath)[source]#

Store the current MSCX object attached to this score as uncompressed MuseScore file. Just a shortcut for Score.mscx.store_scores().

Parameters:: filepath – Path of the newly created MuseScore file, including the file name ending in ‘.mscx’. Compressed files (‘.mscz’) are not supported.

_handle_path(path, key=None)[source]#

Puts the path into paths, files, fnames, fexts dicts with the given key.

Parameters:

path (str) – Full file path.
key (str, optional) – The key chosen by the user. By default, the key is automatically assigned to be the file’s extension.
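The decomposition that feeds the dictionaries full_paths, paths, files, fnames and fexts can be sketched with the standard library. The helper split_score_path is hypothetical, the example path is invented, and whether ms3 stores the extension with or without the leading dot is an assumption here.

```python
import posixpath
from typing import Dict, Optional, Tuple


def split_score_path(path: str, key: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Split a full path into the components described for the path dictionaries."""
    directory, file = posixpath.split(path)
    fname, fext = posixpath.splitext(file)
    if key is None:
        # Default key: the file's extension without the dot (assumption).
        key = fext[1:]
    components = {
        "full_path": path,
        "path": directory,  # without the file name
        "file": file,       # file name with extension
        "fname": fname,     # file name without extension
        "fext": fext,       # extension, here including the dot
    }
    return key, components


key, components = split_score_path("/home/user/scores/K279-1.mscx")
```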

parse_mscx(musescore_file=None, read_only=None, parser=None, labels_cfg={})[source]#

This method is called by __init__() to parse the score. It checks the file extension and in the case of a compressed MuseScore file (.mscz), a temporary uncompressed file is generated which is removed after the parsing process. Essentially, parsing means to instantiate an MSCX object and to make it available as Score.mscx and, if the score includes annotations, to instantiate an Annotations object that can be accessed as Score.annotations. The method doesn’t systematically clean up data from a hypothetical previous parse.

Parameters:

musescore_file (str, optional) – Path to the MuseScore file to be parsed.
read_only (bool, optional) – Defaults to False, meaning that the parsing is slower and uses more memory in order to allow for manipulations of the score, such as adding and deleting labels. Set to True if you’re only extracting information.
parser ('bs4', optional) – The only XML parser currently implemented is BeautifulSoup 4.
labels_cfg (dict, optional) – Store a configuration dictionary to determine the output format of the Annotations object representing the currently attached annotations. See MSCX.labels_cfg.
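The handling of compressed files can be pictured as follows: an .mscz file is a ZIP archive that contains the uncompressed .mscx, so it can be unpacked into a temporary location before parsing. The sketch below is not the code used by parse_mscx(); the helper extract_mscx is hypothetical, and the demo archive is created on the fly with placeholder content rather than a real score.

```python
import os
import tempfile
import zipfile


def extract_mscx(mscz_path: str, target_dir: str) -> str:
    """Extract the first .mscx member of a .mscz archive and return its path."""
    with zipfile.ZipFile(mscz_path) as archive:
        member = next(name for name in archive.namelist() if name.endswith(".mscx"))
        return archive.extract(member, path=target_dir)


with tempfile.TemporaryDirectory() as tmp:
    # Build a stand-in .mscz containing a placeholder XML document.
    mscz_path = os.path.join(tmp, "demo.mscz")
    with zipfile.ZipFile(mscz_path, "w") as archive:
        archive.writestr("demo.mscx", '<museScore version="3.02"></museScore>')

    extracted_path = extract_mscx(mscz_path, os.path.join(tmp, "unzipped"))
    extracted_name = os.path.basename(extracted_path)
    with open(extracted_path, encoding="utf-8") as f:
        content = f.read()
# Leaving the with-block removes the temporary files again.
```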

output_mscx(**kwargs) → None[source]#: Deprecated method. Replaced by store_score().

The MSCX class#

This class defines the user interface for accessing score information via Score.mscx. It consists mainly of shortcuts for interacting with the parser in use, currently BeautifulSoup exclusively.

class ms3.score.MSCX(mscx_src, read_only=False, parser='bs4', labels_cfg={}, parent_score=None, **logger_cfg)[source]#

Object for interacting with the XML structure of a MuseScore 3 file. Is usually attached to a Score object and exposed as Score.mscx. An object is only created if a score was successfully parsed.

changed#: bool Switches to True as soon as the original XML structure is changed. Does not automatically switch back to False.

read_only#: bool, optional Shortcut for MSCX.parsed.read_only. Defaults to False, meaning that the parsing is slower and uses more memory in order to allow for manipulations of the score, such as adding and deleting labels. Set to True if you’re only extracting information.

parent_score#: Score The Score object to which this MSCX object is attached.

parser#: {‘bs4’} The currently used parser.

labels_cfg#: dict Configuration dictionary to determine the output format of the Annotations object representing the labels that are attached to a score (stored as _annotations). The options correspond to the parameters of Annotations.get_labels().

mscx_src#: str Full path of the parsed MuseScore file.

cadences(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#: pandas.DataFrame DataFrame representing all cadence annotations in the score.

chords(mode: Literal['auto', 'strict'] = 'auto', interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame of Chords representing all <Chord> tags contained in the MuseScore file (all <note> tags come within one) and attached score information and performance marks, e.g. lyrics, dynamics, articulations, slurs (see the explanation for the mode parameter for more details). Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, volta, chord_id, dynamics, articulation, staff_text, slur, Ottava:8va, Ottava:8vb, pedal, TextLine, decrescendo_hairpin, diminuendo_line, crescendo_line, crescendo_hairpin, tempo, qpm, lyrics:1, Ottava:15mb.

Parameters:

mode – Defaults to ‘auto’, meaning that additional performance markers available in the score are to be included, namely lyrics, dynamics, fermatas, articulations, slurs, staff_text, system_text, tempo, and spanners (e.g. slurs, 8va lines, pedal lines). This results in NaN values in the column ‘chord_id’ for those markers that are not part of a <Chord> tag, e.g. <Dynamic>, <StaffText>, or <Tempo>. To prevent that, pass ‘strict’, meaning that only <Chord> tags are included, i.e. the column ‘chord_id’ will have no empty values.
interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.

Returns:

DataFrame of Chords representing all <Chord> tags contained in the MuseScore file.

events(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing a raw skeleton of the score’s XML structure, containing all events encoded in it. It is the original tabular representation of the MuseScore file’s source code from which all other tables, except measures, are generated.

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame containing the original tabular representation of all events encoded in the MuseScore file.

expanded(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing Expanded labels, i.e., all annotations encoded in <Harmony> tags which could be matched against one of the registered regular expressions and split into feature columns. Currently this method is hard-coded to return expanded DCML harmony labels only but it takes into account the current _labels_cfg. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, volta, label, alt_label, offset_x, offset_y, regex_match, globalkey, localkey, pedal, chord, numeral, form, figbass, changes, relativeroot, cadence, phraseend, chord_type, globalkey_is_minor, localkey_is_minor, chord_tones, added_tones, root, bass_note

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing all Labels, i.e., all <Harmony> tags in the score.

property has_annotations#: bool Shortcut for MSCX.parsed.has_annotations. Is True as long as at least one label is attached to the current XML.

property n_form_labels#: int Shortcut for MSCX.parsed.n_form_labels. Number of StaffTexts that seem to constitute form labels; greater than zero if at least one was detected.

form_labels(detection_regex: str = None, exclude_harmony_layer: bool = False, interval_index: bool = False, expand: bool = True, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing form labels (or other) that have been encoded as <StaffText>s rather than in the <Harmony> layer. This function essentially filters all StaffTexts matching the detection_regex and adds the standard position columns.

Parameters:

detection_regex – By default, detects all labels starting with one or two digits followed by a colon (see the regex). Pass another regex to retrieve only StaffTexts matching this one.
exclude_harmony_layer – By default, form labels are detected even if they have been encoded as Harmony labels (rather than as StaffText). Pass True in order to retrieve only StaffText form labels.
interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.

Returns:

DataFrame containing all StaffTexts matching the detection_regex
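The filtering step can be pictured as follows. This is a sketch under stated assumptions: ILLUSTRATIVE_FORM_REGEX is a literal reading of "one or two digits followed by a colon" and is not the library's actual default regex, and filter_form_labels is a hypothetical helper operating on plain strings rather than on StaffText elements.

```python
import re

# Hypothetical stand-in for the default detection regex (NOT taken from ms3).
ILLUSTRATIVE_FORM_REGEX = re.compile(r"^\d{1,2}:")


def filter_form_labels(staff_texts, regex=ILLUSTRATIVE_FORM_REGEX):
    """Keep only those texts that match the detection regex."""
    return [text for text in staff_texts if regex.search(text)]


staff_texts = ["1: A", "12:b", "dolce", "123: x", "cresc."]
form_like = filter_form_labels(staff_texts)
```

Passing a different compiled pattern as regex corresponds to passing a custom detection_regex.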

labels(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing all Labels, i.e., all <Harmony> tags in the score, as returned by calling get_labels() on the object at _annotations with the current _labels_cfg. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, volta, harmony_layer, label, offset_x, offset_y, regex_match

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing all Labels, i.e., all <Harmony> tags in the score.

measures(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing the Measures of the MuseScore file (which can be incomplete measures). Comes with the columns mc, mn, quarterbeats, duration_qb, keysig, timesig, act_dur, mc_offset, volta, numbering_offset, dont_count, barline, breaks, repeats, next

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing the measures of the MuseScore file (which can be incomplete measures).

offset_dict(all_endings: bool = False, unfold: bool = False) → dict[source]#: {mc -> offset} dictionary measuring each MC’s distance from the piece’s beginning (0) in quarter notes.
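To illustrate what such an {mc -> offset} dictionary contains, here is a minimal sketch that accumulates measure lengths. It assumes that each measure's actual duration is known as a fraction of a whole note and ignores repeats and alternative endings (i.e. what all_endings and unfold deal with); build_offset_dict is a hypothetical helper.

```python
from fractions import Fraction


def build_offset_dict(actual_durations):
    """Map each MC to its distance from the beginning, in quarter notes.

    actual_durations: {mc -> measure length as a fraction of a whole note}
    """
    offsets = {}
    position = Fraction(0)
    for mc in sorted(actual_durations):
        offsets[mc] = position
        # Convert whole-note fractions to quarter notes.
        position += actual_durations[mc] * 4
    return offsets


# A 3/4 piece starting with a quarter-note upbeat (MC 1).
durations = {1: Fraction(1, 4), 2: Fraction(3, 4), 3: Fraction(3, 4)}
offsets = build_offset_dict(durations)
```

In this example the upbeat starts at 0, MC 2 starts one quarter note later, and MC 3 three quarter notes after that.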

property metadata#: dict Shortcut for MSCX.parsed.metadata. Metadata from and about the MuseScore file.

notes(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing the Notes of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, tied, tpc, midi, volta, chord_id

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing the Notes of the MuseScore file.

notes_and_rests(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing the Notes and Rests of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, tied, tpc, midi, volta, chord_id

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing the Notes and Rests of the MuseScore file.

property parsed: _MSCX_bs4#: _MSCX_bs4 Standard way of accessing the object exposed by the current parser. MSCX uses this object’s interface for requesting manipulations of and information from the source XML.

rests(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing the Rests of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, nominal_duration, scalar, volta

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing the Rests of the MuseScore file.

property staff_ids#: list of int The staff IDs contained in the score, usually just a list of increasing numbers starting at 1.

property style#: Style Can be used like a dictionary to change the information within the score’s <Style> tag.

property version#: str MuseScore version that the file was created with.

property volta_structure#: dict {first_mc -> {volta_number -> [mc1, mc2…]} } dictionary.
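A small sketch of how a dictionary of this shape can be queried. The example values and the helper volta_of are hypothetical; only the {first_mc -> {volta_number -> [mc1, mc2…]}} layout is taken from the description above.

```python
# First and second ending starting at MC 8, two measures each (made-up data).
volta_structure = {8: {1: [8, 9], 2: [10, 11]}}


def volta_of(mc, structure):
    """Return the volta number a given MC belongs to, or None."""
    for endings in structure.values():
        for volta_number, mcs in endings.items():
            if mc in mcs:
                return volta_number
    return None


second_ending = volta_of(10, volta_structure)
outside = volta_of(3, volta_structure)
```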

add_labels(annotations_object)[source]#

Receives the labels from an Annotations object and adds them to the XML structure representing the MuseScore file that might be written to a file afterwards.

Parameters:: annotations_object (Annotations) – Object of labels to be added.
Returns:: Number of actually added labels.
Return type:: int

change_label_color(mc, mc_onset, staff, voice, label, color_name=None, color_html=None, color_r=None, color_g=None, color_b=None, color_a=None)[source]#

Shortcut for MSCX.parsed.change_label_color().

Parameters:

mc (int) – Measure count of the label
mc_onset (fractions.Fraction) – Onset position to which the label is attached.
staff (int) – Staff to which the label is attached.
voice (int) – Notational layer to which the label is attached.
label (str) – (Decoded) label.
color_name (str, optional) – Two ways of specifying the color.
color_html (str, optional) – Two ways of specifying the color.
color_r (int or str, optional) – To specify an RGB color instead, pass at least the first three. color_a (alpha = opacity) defaults to 255.
color_g (int or str, optional) – To specify an RGB color instead, pass at least the first three. color_a (alpha = opacity) defaults to 255.
color_b (int or str, optional) – To specify an RGB color instead, pass at least the first three. color_a (alpha = opacity) defaults to 255.
color_a (int or str, optional) – To specify an RGB color instead, pass at least the first three. color_a (alpha = opacity) defaults to 255.

change_labels_cfg(labels_cfg={}, staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#

Update MSCX.labels_cfg.

Parameters:

labels_cfg (dict) – Using an entire dictionary or, to change only particular options, choose from:
staff – Arguments as they will be passed to get_labels()
voice – Arguments as they will be passed to get_labels()
harmony_layer – Arguments as they will be passed to get_labels()
positioning – Arguments as they will be passed to get_labels()
decode – Arguments as they will be passed to get_labels()

color_non_chord_tones(df: DataFrame, color_name: str = 'red', chord_tone_cols: Collection[str] = ['chord_tones', 'added_tones'], color_nan: bool = True) → DataFrame[source]#

Iterates backwards through the rows of the given DataFrame, interpreting each row as a score segment, and colors all notes that do not correspond to one of the tonal pitch classes (TPC) indicated in one of the tuples contained in the chord_tone_cols. The columns ‘mc’ and ‘mc_onset’ are taken to indicate each score segment’s start, which reaches to the subsequent one (the last segment reaching to the end of the score). Only notes whose onsets lie within the respective segment are colored, meaning that those whose durations reach into a segment are not taken into account.

Parameters:

df – A DataFrame with the columns [‘mc’, ‘mc_onset’] + chord_tone_cols
color_name – Name the color that the non-chord tones should get, defaults to ‘red’. Name can be a CSS color or a MuseScore color (see utils.MS3_COLORS).
chord_tone_cols – Names of the columns containing tuples of chord tones, expressed as TPC. Note that in the expanded tables extracted by default, these columns correspond to intervals relative to the local tonic. The absolute representation required here can be obtained using Annotations.expand_dcml with absolute=True.
color_nan – By default, if all of the chord_tone_cols contain a NaN value, all notes in the segment will be colored. Pass False to add the segment to the previous one instead.

Returns:

A coloring report which is the original df with the appended columns ‘n_colored’, ‘n_untouched’, ‘count_ratio’, ‘dur_colored’, ‘dur_untouched’, ‘dur_ratio’. They contain the counts and durations of the colored vs. untouched notes as well as the ratio of each pair. Note that the report does not take into account notes that reach into a segment, nor does it correct the duration of notes that reach into the subsequent segment.
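The segment logic can be sketched in plain Python. This is a simplified illustration, not the library's code: it works on lists of dictionaries instead of DataFrames, uses a single ‘chord_tones’ entry per segment, counts notes without coloring anything, and the name non_chord_tone_report is hypothetical.

```python
from fractions import Fraction


def non_chord_tone_report(segments, notes, color_nan=True):
    """Count colored vs. untouched notes per segment, iterating backwards."""
    report = []
    segment_end = None  # None means "until the end of the score"
    for seg in reversed(segments):
        start = (seg["mc"], seg["mc_onset"])
        tones = seg["chord_tones"]
        if tones is None and not color_nan:
            # Merge into the previous segment: its end stays where it was.
            report.append((start, None, None))
            continue
        in_segment = [
            n for n in notes
            if start <= (n["mc"], n["mc_onset"])
            and (segment_end is None or (n["mc"], n["mc_onset"]) < segment_end)
        ]
        colored = [n for n in in_segment if tones is None or n["tpc"] not in tones]
        report.append((start, len(colored), len(in_segment) - len(colored)))
        segment_end = start
    return report[::-1]


# TPC as fifths with C = 0 (assumption): C major triad, then G major triad.
segments = [
    {"mc": 1, "mc_onset": Fraction(0), "chord_tones": (0, 4, 1)},
    {"mc": 2, "mc_onset": Fraction(0), "chord_tones": (1, 5, 2)},
]
notes = [
    {"mc": 1, "mc_onset": Fraction(0), "tpc": 0},
    {"mc": 1, "mc_onset": Fraction(1, 4), "tpc": 2},
    {"mc": 1, "mc_onset": Fraction(1, 2), "tpc": 4},
    {"mc": 2, "mc_onset": Fraction(0), "tpc": 1},
    {"mc": 2, "mc_onset": Fraction(1, 2), "tpc": 0},
]
report = non_chord_tone_report(segments, notes)
```

Each report entry holds the segment start, the number of notes that would be colored, and the number left untouched; only note onsets are considered, as described above.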

delete_labels(df)[source]#

Delete a set of labels from the current XML.

Parameters:: df (pandas.DataFrame) – A DataFrame with the columns [‘mc’, ‘mc_onset’, ‘staff’, ‘voice’]

replace_labels(annotations_object)[source]#

Parameters:: annotations_object (Annotations) – Object of labels to be added.

delete_empty_labels()[source]#: Remove all empty labels from the attached annotations.

get_chords(staff=None, voice=None, mode='auto', lyrics=False, staff_text=False, dynamics=False, articulation=False, spanners=False, **kwargs)[source]#

Retrieve a customized chord list, e.g. one including fewer of the processed features or additional, unprocessed ones compared to the standard chord list.

Parameters:

staff (int) – Get information from a particular staff only (1 = upper staff)
voice (int) – Get information from a particular voice only (1 = only the first layer of every staff)
mode ({'auto', 'all', 'strict'}, optional) –
- ‘auto’ (default), meaning that those aspects are automatically included that occur in the score; the resulting DataFrame has no empty columns except for those parameters that are set to True.
- ’all’: Columns for all aspects are created, even if they don’t occur in the score (e.g. lyrics).
- ’strict’: Create columns for exactly those parameters that are set to True, regardless which aspects occur in the score.
lyrics (bool, optional) – Include lyrics.
staff_text (bool, optional) – Include staff text such as tempo markings.
dynamics (bool, optional) – Include dynamic markings such as f or p.
articulation (bool, optional) – Include articulation such as arpeggios.
spanners (bool, optional) – Include spanners such as slurs, 8va lines, pedal lines etc.
**kwargs (bool, optional) – Set a particular keyword to True in order to include all columns from the _events DataFrame whose names include that keyword. Column names include the tag names from the MSCX source code.

Returns:

DataFrame representing all <Chord> tags in the score with the selected features.

Return type:

pandas.DataFrame

get_raw_labels()[source]#

Shortcut for MSCX.parsed.get_raw_labels(). Retrieve a “raw” list of labels, meaning that label types reflect only those defined within <Harmony> tags which can be 1 (MuseScore’s Roman Numeral display), 2 (Nashville) or undefined (in the case of ‘normal’ chord labels, defaulting to 0).

Returns:: DataFrame with raw label features (i.e. as encoded in XML)
Return type:: pandas.DataFrame

infer_mc(mn, mn_onset=0, volta=None)[source]#

Shortcut for MSCX.parsed.infer_mc(). Tries to convert a (mn, mn_onset) into a (mc, mc_onset) tuple on the basis of this MuseScore file. In other words, a human-readable score position such as “measure number 32b (i.e., a second ending), beat 3” needs to be converted to (32, 1/2, 2) if “beat” has length 1/4, or, if the meter is, say, 9/8 and “beat” has a length of 3/8, to (32, 6/8, 2). The resulting (mc, mc_onset) tuples are required for attaching a label to a score. This is only necessary for labels that were not originally extracted by ms3.

Parameters:

mn (int or str) – Measure number as in a reference print edition.
mn_onset (fractions.Fraction, optional) – Distance of the requested position from beat 1 of the complete measure (MN), expressed as fraction of a whole note. Defaults to 0, i.e. the position of beat 1.
volta (int, optional) – In the case of first and second endings, which bear the same measure number, a MN might have to be disambiguated by passing 1 for first ending, 2 for second, and so on. Alternatively, the MN can be disambiguated traditionally by passing it as string with a letter attached. In other words, infer_mc(mn=32, volta=1) is equivalent to infer_mc(mn='32a').

Returns:

int – Measure count (MC), denoting particular <Measure> tags in the score.
fractions.Fraction
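The preparatory arithmetic for such a call can be sketched as follows. Both helpers are hypothetical and do not perform the actual MN-to-MC lookup, which requires the score's measure table; they only show how a beat becomes an mn_onset and how a lettered measure number maps onto the volta argument.

```python
import re
from fractions import Fraction


def beat_to_mn_onset(beat: int, beat_length: Fraction) -> Fraction:
    """Distance of a (1-based) beat from beat 1, as a fraction of a whole note."""
    return (beat - 1) * beat_length


def split_measure_number(mn):
    """Split '32b' into (32, 2); a plain number yields (number, None)."""
    match = re.fullmatch(r"(\d+)([a-z]?)", str(mn))
    if match is None:
        raise ValueError(f"Cannot interpret measure number {mn!r}")
    number, letter = match.groups()
    volta = ord(letter) - ord("a") + 1 if letter else None
    return int(number), volta


onset_simple = beat_to_mn_onset(3, Fraction(1, 4))    # beat 3, quarter-note beats
onset_compound = beat_to_mn_onset(3, Fraction(3, 8))  # beat 3 in 9/8
mn, volta = split_measure_number("32b")
```

With these values, the position from the example above is expressed as measure number 32, onset 1/2 (or 6/8 in 9/8 meter), second ending.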

write_score_to_handler(file_handler: IO) → bool[source]#

Shortcut for MSCX.parsed.write_score_to_handler(). Write the current XML structure to a file handler.

Parameters:: file_handler – File handler to write to.
Returns:: Whether the file was successfully created.

store_score(filepath: str) → bool[source]#

Shortcut for MSCX.parsed.store_scores(). Store the current XML structure as an uncompressed MuseScore file.

Parameters:: filepath – Path of the newly created MuseScore file, including the file name ending on ‘.mscx’. Compressed files (‘.mscz’) are not supported.
Returns:: Whether the file was successfully created.

Store an excerpt of the current score as a new .mscx file by defining start and end measure. If no end measure is specified, the excerpt will include everything following the start measure. The original score header and metadata are kept. Start and end measure both can be specified either as MC (the number in MuseScore’s status bar) or as MN (the number as displayed in the score).

Parameters:

start_mc – Measure count of the first measure to be included in the excerpt. If start_mc is given, start_mn must be None.
start_mn – Measure number of the first measure to be included in the excerpt. If start_mn is given, start_mc must be None.
start_mc_onset – The starting onset value in the first measure. Every note with onset value strictly smaller than start_mc_onset will be removed from the excerpt.
end_mc – Measure count of the last measure to be included in the excerpt. If end_mc is given, end_mn must be None.
end_mn – Measure number of the last measure to be included in the excerpt. If end_mn is given, end_mc must be None.
end_mc_onset – The ending onset value in the last measure. Every note with onset value strictly greater than end_mc_onset will be removed from the excerpt.
exclude_start – If set to True, the note corresponding to start_mc_onset will be removed as well.
exclude_end – If set to True, the note corresponding to end_mc_onset will be removed as well.
metronome_tempo – Optional[float], optional Setting this value will override the tempo at the beginning of the excerpt which, otherwise, is created automatically according to the tempo in effect at that moment in the score. This is achieved by inserting a hidden metronome marking with a value that depends on the specified “beats per minute”, where “beat” depends on the value of the metronome_beat_unit parameter.
metronome_beat_unit – Optional[Fraction | float], optional Defaults to 1/4, which stands for a quarter note. Please note that for now, the combination of beat unit and tempo is converted and expressed as quarter notes per minute in the (invisible) metronome marking. For example, specifying 1/8=100 will effectively result in 1/4=50 (which is equivalent).
directory – Path to the folder where the excerpts are to be stored.
suffix – String to be inserted into the excerpt’s file name, which follows the pattern filename[suffix]_[start_mc]-[end_mc]

Returns:

None. If it was impossible to find a quarterbeat value for the given start measure, the function will not produce an excerpt.

Return type:

Optional[None]
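The conversion between metronome_beat_unit and quarter notes per minute mentioned above (1/8=100 being equivalent to 1/4=50) amounts to a single ratio. The helper name to_quarters_per_minute is hypothetical; the sketch only reproduces the arithmetic, not the insertion of the hidden metronome marking.

```python
from fractions import Fraction


def to_quarters_per_minute(tempo, beat_unit=Fraction(1, 4)):
    """Express 'beat_unit = tempo' as quarter notes per minute."""
    return tempo * Fraction(beat_unit) / Fraction(1, 4)


eighth_100 = to_quarters_per_minute(100, Fraction(1, 8))
dotted_quarter_60 = to_quarters_per_minute(60, Fraction(3, 8))
```

With the default beat unit of 1/4 the tempo value is returned unchanged.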

get_phrase_boundaries()[source]#

This method uses the expanded and unfolded labels to find all the phrase boundaries where a beginning is defined by an opening bracket { and the end is defined by a cadence. This cadence can either come with a closing bracket } or after the end of a phrase and before the beginning of the next one. The start and end point are also associated with onset values to precisely know the position of the labels within the measure in order to be able to trim “unrelated” notes later on.

Returns:: A list of all unique maps that identify the phrases in the score. Each map has three keys: “mcs”, “start_onset”, and “end_onset”. The first corresponds to a tuple containing all the measure counts included in the phrase, the second is the onset value of the starting label, and the last is the onset value of the ending label.
Return type:: list

Store excerpts based on the phrase annotations contained in the score, if any. For this purpose, the self.find_phrases() method is called; for each pair of start and end MC an excerpt will be stored. The resulting excerpts will be named [original_filename]_phrase_[start_mc]-[end_mc].mscx by default or [original_filename]_[suffix]_[start_mc]-[end_mc].mscx if suffix is specified.

Parameters:

metronome_tempo – Optional[float], optional Setting this value will override the tempo at the beginning of the excerpt which, otherwise, is created automatically according to the tempo in effect at that moment in the score. This is achieved by inserting a hidden metronome marking with a value that depends on the specified “beats per minute”, where “beat” depends on the value of the metronome_beat_unit parameter.
metronome_beat_unit – Optional[Fraction | float], optional Defaults to 1/4, which stands for a quarter note. Please note that for now, the combination of beat unit and tempo is converted and expressed as quarter notes per minute in the (invisible) metronome marking. For example, specifying 1/8=100 will effectively result in 1/4=50 (which is equivalent).
directory – Optional[str], optional name of the directory you want the excerpt saved to, by default None
suffix – Optional[str], optional It is the string “category identifier” of your excerpts. For instance the name of the output files will in general be [original_filename]_[suffix]_[start_mc]-[end_mc].mscx
random_skip – Optional[bool], optional If True, the method randomly skips some of the extracted excerpts instead of generating them. Defaults to False.

This method takes a tuple containing the MCs of the measures that are to be contained in the excerpt. The method will infer the active global and local keys, relative to the excerpt, from the annotations. It will then store the excerpt in the given (or default) directory with the name [original_filename]_[suffix]_[start_mc]-[end_mc].mscx.

Parameters:

included_mcs – Tuple[int] The mc values of the measures to be included in the excerpt
start_mc_onset – Optional[Fraction | float], optional The value of the chosen onset for the true start of the excerpt. If onset is None or 0, then the excerpt will normally begin on the onset of the first included measure. In the case where this value should be different, for example 1/2 or .5, then all the notes with onset strictly smaller than this value will be removed from the first measure.
end_mc_onset – Optional[Fraction | float], optional This has the same behaviour as the previous parameter. This means that if it is set to None or to the value of the last onset in the measure, then the excerpt will normally finish at the end of the last included measure. In the case where this value should be different, for example 1/2 or .5, then all notes with onset strictly greater than this value will be removed from the last measure.
exclude_start – Optional[bool], optional If set to True the note (in first measure) with onset value equal to start_mc_onset will also be removed thus excluding the first onset (i.e. the start)
exclude_end – Optional[bool], optional If set to True the note (in last measure) with onset value equal to end_mc_onset will also be removed thus excluding the last onset (i.e. the end)
metronome_tempo – Optional[float], optional Setting this value will override the tempo at the beginning of the excerpt which, otherwise, is created automatically according to the tempo in effect at that moment in the score. This is achieved by inserting a hidden metronome marking with a value that depends on the specified “beats per minute”, where “beat” depends on the value of the metronome_beat_unit parameter.
metronome_beat_unit – Optional[Fraction | float], optional Defaults to 1/4, which stands for a quarter note. Please note that for now, the combination of beat unit and tempo is converted and expressed as quarter notes per minute in the (invisible) metronome marking. For example, specifying 1/8=100 will effectively result in 1/4=50 (which is equivalent).
directory – Optional[str], optional name of the directory you want the excerpt saved to, by default None
suffix – Optional[str], optional It is the string “category identifier” of your excerpts. For instance the name of the output files will in general be [original_filename]_[suffix]_[start_mc]-[end_mc].mscx
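The trimming rules for start_mc_onset, end_mc_onset, exclude_start, and exclude_end can be summarized in a small predicate. This is a sketch of the described behaviour under the assumption that notes are given as (mc, mc_onset) pairs; keep_note is a hypothetical helper and does not touch any XML.

```python
from fractions import Fraction


def keep_note(mc, mc_onset, included_mcs, start_mc_onset=None, end_mc_onset=None,
              exclude_start=False, exclude_end=False):
    """Decide whether a note at (mc, mc_onset) survives the trimming."""
    if mc not in included_mcs:
        return False
    first_mc, last_mc = included_mcs[0], included_mcs[-1]
    if mc == first_mc and start_mc_onset is not None:
        if mc_onset < start_mc_onset:
            return False  # strictly before the chosen start
        if exclude_start and mc_onset == start_mc_onset:
            return False
    if mc == last_mc and end_mc_onset is not None:
        if mc_onset > end_mc_onset:
            return False  # strictly after the chosen end
        if exclude_end and mc_onset == end_mc_onset:
            return False
    return True


notes = [(4, Fraction(0)), (4, Fraction(1, 2)), (5, Fraction(0)),
         (6, Fraction(1, 4)), (6, Fraction(3, 4))]
bounds = dict(included_mcs=(4, 5, 6), start_mc_onset=Fraction(1, 2),
              end_mc_onset=Fraction(1, 4))
kept = [n for n in notes if keep_note(*n, **bounds)]
kept_exclusive = [n for n in notes
                  if keep_note(*n, exclude_start=True, exclude_end=True, **bounds)]
```

With the inclusive defaults, the notes exactly on the start and end onsets are kept; with both exclude flags set, only the note in MC 5 remains.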

Extract random snippets from the given score. The snippets have the constraint that they must strictly lie within a phrase. This means that within this type of excerpt neither phrase beginnings nor phrase endings will be considered. Not even cadences. By default, it extracts all possible snippets and stores them at the optional directory path. The resulting excerpts will be named [original_filename]_within_phrase_[start_mc]-[end_mc].mscx.

Parameters:

metronome_tempo – Optional[float], optional The value that the user wants to set as the tempo of the excerpts. The tag will be added to XML tree of the excerpt’s file and will have the desired tempo
metronome_beat_unit – Optional[Fraction | float], optional To obtain the correct value for the tempo it is important to specify the beat unit that corresponds to the given tempo value. Since MuseScore works in quarter-beats, the convention is that 1 indicates that the unit is the quarter beat and all other values are relative to this one (i.e. 1/2 would be the eighth note etc.)
directory – Optional[str], optional name of the directory you want the excerpt saved to, by default None
suffix – Optional[str], optional It is the string “category identifier” of your excerpts. For instance the name of the output files will in general be [original_filename]_[suffix]_[start_mc]-[end_mc].mscx
random_skip – Optional[bool], optional If True, the method randomly skips some of the extracted excerpts instead of generating them. Defaults to False.

Calls the self.find_phrase_endings() method to find all phrase endings contained in the score, then stores all corresponding excerpts. A phrase ending is specified to finish on a cadence and to start 2 MCs before the corresponding closing bracket that indicates the “end” of the phrase. The resulting excerpts will be named [original_filename]_[suffix]_[start_mc]-[end_mc].mscx.

Parameters:

metronome_tempo – Optional[float], optional The value that the user wants to set as the tempo of the excerpts. The tag will be added to XML tree of the excerpt’s file and will have the desired tempo
metronome_beat_unit – Optional[Fraction | float], optional To obtain the correct value for the tempo it is important to specify the beat unit that corresponds to the given tempo value. Since MuseScore works in quarter-beats, the convention is that 1 indicates that the unit is the quarter beat and all other values are relative to this one (i.e. 1/2 would be the eighth note etc.)
max_excerpt_length – Optional[int], optional This parameter specifies the maximum number of measures to be included in the excerpt. For example, if max_excerpt_length is set to 3, all phrase endings excerpts will contain a max. of 3 measures.
directory – Optional[str], optional name of the directory you want the excerpt saved to, by default None
suffix – Optional[str], optional It is the string “category identifier” of your excerpts. For instance the name of the output files will in general be [original_filename]_[suffix]_[start_mc]-[end_mc].mscx
random_skip – Optional[bool], optional If True, the method randomly skips some of the extracted excerpts instead of generating them. Defaults to False.

Method that stores n_excerpts random excerpts, each mc_length measures long. If n_excerpts is not specified then the method will create the maximum possible number of different excerpts containing mc_length measures each.

Parameters:

n_excerpts – The number of random excerpts to be created
mc_length – The allowed number of measures for each excerpt
metronome_tempo – The tempo value that the user might specify to overwrite the original piece tempo
metronome_beat_unit – Beat unit value that goes with the specified tempo value. Might be 1/4 if the unit is the quarter note, 1/8 if the unit is the eighth note and so on.
directory – Name of the directory into which the excerpts need to be stored
suffix – Suffix to be added to the name of the generated excerpts

update_metadata(composer: str | None = None, workTitle: str | None = None, movementNumber: str | None = None, movementTitle: str | None = None, workNumber: str | None = None, poet: str | None = None, lyricist: str | None = None, arranger: str | None = None, copyright: str | None = None, creationDate: str | None = None, mscVersion: str | None = None, platform: str | None = None, source: str | None = None, translator: str | None = None, compared_against: str | None = None, **kwargs)[source]#: Update the metadata tags of the parsed score.

The Annotations class#

class ms3.annotations.Annotations(tsv_path=None, df=None, cols={}, index_col=None, sep='\t', mscx_obj=None, infer_types=None, read_only=False, **logger_cfg)[source]#

Class for storing, converting and manipulating annotation labels.

property harmony_layer_counts#: Returns the counts of the harmony_layers as dict.

get_labels(staff: int | None = None, voice: Literal[1, 2, 3, 4] | None = None, harmony_layer: Literal[0, 1, 2, 3] | None = None, positioning: bool = False, decode: bool = True, drop: bool = False, inverse: bool = False, column_name: str | None = None, color_format: Literal['html', 'rgb', 'rgba', 'name'] | None = None, regex=None)[source]#

Returns a DataFrame of annotation labels.

Parameters:

staff (int, optional) – Select harmonies from a given staff only. Pass staff=1 for the upper staff.
harmony_layer ({0, 1, 2, 3, 'dcml', ...}, optional) –

If MuseScore’s harmony feature has been used, you can filter harmony types by passing
- 0 for unrecognized strings
- 1 for Roman Numeral Analysis
- 2 for Nashville Numbers
- 3 for encoded absolute chords
- ‘dcml’ for labels from the DCML harmonic annotation standard
- … self-defined types that have been added to self.regex_dict through the use of self.infer_types()
positioning (bool, optional) – Set to True if you want to include information about how labels have been manually positioned.
decode (bool, optional) – Set to False if you want to keep labels of harmony_layer 0, 2, and 3 in their original form as encoded by MuseScore (e.g., with root and bass as TPC (tonal pitch class) where C = 14 for layer 0).
drop (bool, optional) – Set to True to delete the returned labels from this object.
column_name (str, optional) – Can be used to rename the columns holding the labels.
color_format ({'html', 'rgb', 'rgba', 'name', None}) – If label colors are encoded, determine how they are displayed.

expand_dcml(drop_others=True, warn_about_others=True, drop_empty_cols=False, chord_tones=True, relative_to_global=False, absolute=False, all_in_c=False, **kwargs)[source]#

Expands all labels where the regex_match has been inferred as ‘dcml’ and stores the DataFrame in self._expanded.

Parameters:

drop_others (bool, optional) – Set to False if you want to keep labels in the expanded DataFrame whose regex_match is not ‘dcml’.
warn_about_others (bool, optional) – Set to False to suppress warnings about labels whose regex_match is not ‘dcml’. Is automatically set to False if drop_others is set to False.
drop_empty_cols (bool, optional) – Return without unused columns
chord_tones (bool, optional) – Pass True if you want to add four columns that contain information about each label’s chord, added, root, and bass tones. The pitches are expressed as intervals relative to the respective chord’s local key or, if relative_to_global=True, to the globalkey. The intervals are represented as integers that represent stacks of fifths over the tonic, such that 0 = tonic, 1 = dominant, -1 = subdominant, 2 = supertonic etc.
relative_to_global (bool, optional) – Pass True if you want all labels expressed with respect to the global key. This levels and eliminates the features localkey and relativeroot.
absolute (bool, optional) – Pass True if you want to transpose the relative chord_tones to the global key, which makes them absolute so they can be expressed as actual note names. This implies prior conversion of the chord_tones (but not of the labels) to the global tonic.
all_in_c (bool, optional) – Pass True to transpose chord_tones to C major/minor. This performs the same transposition of chord tones as relative_to_global but without transposing the labels, too. This option clashes with absolute=True.
kwargs – Additional arguments are passed to get_labels() to define the original representation.

Returns:

Expanded DCML labels

Return type:

pandas.DataFrame
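
The stack-of-fifths encoding described for chord_tones can be illustrated with a small sketch. The helpers fifths_to_name and chord_tones_to_names are hypothetical and not part of ms3. The sketch assumes that the tonic is given as a tonal pitch class with C = 0, G = 1, F = -1, and that making relative chord tones absolute amounts to adding the tonic's position on the line of fifths.

```python
# Hypothetical helpers for illustration only; not part of the ms3 API.
NOTE_NAMES = "FCGDAEB"  # one seven-note segment of the line of fifths


def fifths_to_name(tpc: int) -> str:
    """Spell a position on the line of fifths (assumed: C = 0, G = 1, F = -1)."""
    shifted = tpc + 1                    # shift so that F sits at index 0
    name = NOTE_NAMES[shifted % 7]
    accidentals = shifted // 7           # > 0: sharps, < 0: flats
    symbol = "#" if accidentals > 0 else "b"
    return name + symbol * abs(accidentals)


def chord_tones_to_names(chord_tones, tonic_tpc):
    """Transpose relative chord tones (0 = tonic, 1 = dominant, ...) to a tonic."""
    return [fifths_to_name(tonic_tpc + tone) for tone in chord_tones]


# A major triad on the tonic is (0, 4, 1): root, major third, fifth.
g_major_tonic_triad = chord_tones_to_names([0, 4, 1], tonic_tpc=1)
```

With tonic_tpc=1 (G), the relative tones 0, 4, and 1 land on G, B, and D.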

The BeautifulSoup parser#

class ms3.bs4_parser._MSCX_bs4(soup: BeautifulSoup, read_only: bool = False, logger_cfg: dict | None = None)[source]#

This sister class implements MSCX’s methods for a score parsed with beautifulsoup4.

mscx_src#

Path to the uncompressed MuseScore 3 file (MSCX) to be parsed.

Type:: str

measure_nodes#: {staff -> {MC -> tag} }

tags#

Nested dictionary allowing to access the score’s XML elements in a convenient and structured manner:

{MC -> {staff -> {voice -> {mc_onset -> [{“name” -> str, “duration” -> Fraction, “tag” -> bs4.Tag}, …]}}}}
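
As a rough illustration of this nesting, the following sketch builds a toy dictionary of the same shape and walks it level by level. The data and the helper iter_events are made up for the example, and plain strings stand in for the bs4.Tag objects.

```python
from fractions import Fraction

# Toy stand-in for the nested structure; strings replace the bs4.Tag objects.
tags = {
    1: {  # MC
        1: {  # staff
            1: {  # voice
                Fraction(0): [
                    {"name": "Chord", "duration": Fraction(1, 4), "tag": "<Chord>"}
                ],
                Fraction(1, 4): [
                    {"name": "Rest", "duration": Fraction(1, 4), "tag": "<Rest>"}
                ],
            }
        }
    }
}


def iter_events(nested):
    """Yield (mc, staff, voice, mc_onset, event) for every stored event."""
    for mc, staves in nested.items():
        for staff, voices in staves.items():
            for voice, onsets in voices.items():
                for mc_onset, events in onsets.items():
                    for event in events:
                        yield mc, staff, voice, mc_onset, event


rows = list(iter_events(tags))
```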

staff2drum_map: Dict[int, DataFrame]#: For each staff that is to be treated as a drumset score, keep a mapping from MIDI pitch (DataFrame index) to note and instrument features. The columns typically include [‘head’, ‘line’, ‘voice’, ‘name’, ‘stem’, ‘shortcut’]. When creating note tables, the ‘name’ column will be populated with the names here rather than note names.

property has_voltas: bool#: Return True if the score includes first and second endings. Otherwise, no ‘volta’ columns will be added to facets.

property volta_structure: Dict[int, Dict[int, List[int]]]#: {first_mc -> {volta_number -> [MC] } }

add_label(label, mc, mc_onset, staff=1, voice=1, **kwargs)[source]#

Adds a single label to the current XML in form of a new <Harmony> (and maybe also <location>) tag.

Parameters:

label
mc
mc_onset
staff
voice
kwargs

add_standard_cols(df: DataFrame) → DataFrame[source]#: Ensures that the DataFrame’s first columns are [‘mc’, ‘mn’, (‘volta’), ‘timesig’, ‘mc_offset’]

change_label_color(mc, mc_onset, staff, voice, label, color_name=None, color_html=None, color_r=None, color_g=None, color_b=None, color_a=None)[source]#

Change the color of an existing label.

Parameters:

mc (int) – Measure count of the label
mc_onset (fractions.Fraction) – Onset position to which the label is attached.
staff (int) – Staff to which the label is attached.
voice (int) – Notational layer to which the label is attached.
label (str) – (Decoded) label.
color_name (str, optional) – Two ways of specifying the color.
color_html (str, optional) – Two ways of specifying the color.
color_r (int or str, optional) – To specify a RGB color instead, pass at least, the first three. color_a (alpha = opacity) defaults to 255.
color_g (int or str, optional) – To specify a RGB color instead, pass at least, the first three. color_a (alpha = opacity) defaults to 255.
color_b (int or str, optional) – To specify a RGB color instead, pass at least, the first three. color_a (alpha = opacity) defaults to 255.
color_a (int or str, optional) – To specify a RGB color instead, pass at least, the first three. color_a (alpha = opacity) defaults to 255.
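
The relation between the RGB(A) arguments and a six-character HTML color can be sketched as follows. Both helpers are hypothetical and only illustrate the conventions described in this reference (values given as int or str, alpha defaulting to 255); they are not the conversion routines used by ms3.

```python
# Hypothetical helpers for illustration only; not part of the ms3 API.
def complete_rgba(r, g, b, a=None):
    """Return an (r, g, b, a) tuple of ints, with alpha defaulting to 255."""
    channels = [int(r), int(g), int(b), 255 if a is None else int(a)]
    for value in channels:
        if not 0 <= value <= 255:
            raise ValueError(f"Channel value out of range: {value}")
    return tuple(channels)


def rgb_to_html(r, g, b):
    """Return the six-character hexadecimal form of an RGB color."""
    red, green, blue, _ = complete_rgba(r, g, b)
    return f"{red:02x}{green:02x}{blue:02x}"
```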

chords(mode: Literal['auto', 'strict'] = 'auto', interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

Parameters:

mode – Defaults to ‘auto’, meaning that additional performance markers available in the score are to be included, namely lyrics, dynamics, fermatas, articulations, slurs, staff_text, system_text, tempo, and spanners (e.g. slurs, 8va lines, pedal lines). This results in NaN values in the column ‘chord_id’ for those markers that are not part of a <Chord> tag, e.g. <Dynamic>, <StaffText>, or <Tempo>. To prevent that, pass ‘strict’, meaning that only <Chord> tags are included, i.e. the column ‘chord_id’ will have no empty values.
interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.

Returns:

DataFrame of Chords representing all <Chord> tags contained in the MuseScore file.

cl(recompute: bool = False) → DataFrame[source]#: Get the raw Chords without adding quarterbeat columns.

color_notes(from_mc: int, from_mc_onset: Fraction, to_mc: int | None = None, to_mc_onset: Fraction | None = None, midi: List[int] = [], tpc: List[int] = [], inverse: bool = False, color_name: str | None = None, color_html: str | None = None, color_r: int | None = None, color_g: int | None = None, color_b: int | None = None, color_a: int | None = None) → Tuple[List[Fraction], List[Fraction]][source]#

Colors all notes occurring in a particular score segment in one particular color, or only those (not) pertaining to a collection of MIDI pitches or Tonal Pitch Classes (TPC).

Parameters:

from_mc – MC in which the score segment starts.
from_mc_onset – mc_onset where the score segment starts.
to_mc – MC in which the score segment ends. If not specified, the segment ends at the end of the score.
to_mc_onset – If to_mc is defined, the mc_onset where the score segment ends.
midi – Collection of MIDI numbers to use as a filter or an inverse filter (depending on inverse).
tpc – Collection of Tonal Pitch Classes (C=0, G=1, F=-1 etc.) to use as a filter or an inverse filter (depending on inverse).
inverse – By default, only notes where all specified filters (midi and/or tpc) apply are colored. Set to True to color only those notes where none of the specified filters match.
color_name – Specify the color either as a name, or as HTML color, or as RGB(A). Name can be a CSS color or a MuseScore color (see utils.MS3_COLORS).
color_html – Specify the color either as a name, or as HTML color, or as RGB(A). An HTML color needs to be string of length 6.
color_r – If you specify the color as RGB(A), you also need to specify color_g and color_b.
color_g – If you specify the color as RGB(A), you also need to specify color_r and color_b.
color_b – If you specify the color as RGB(A), you also need to specify color_r and color_g.
color_a – If you have specified an RGB color, the alpha value defaults to 255 unless specified otherwise.

Returns:

List of durations (in fractions) of all notes that have been colored. List of durations (in fractions) of all notes that have not been colored.
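
The interplay of the midi, tpc, and inverse arguments can be sketched as a predicate on a single note. The function is_selected is hypothetical and reflects one reading of the parameter descriptions above. In particular, treating a call without any filter as "select every note" is an assumption.

```python
# Hypothetical predicate for illustration only; not part of the ms3 API.
def is_selected(note_midi, note_tpc, midi=(), tpc=(), inverse=False):
    """Decide whether a note would be colored under the given filters."""
    checks = []
    if midi:
        checks.append(note_midi in midi)
    if tpc:
        checks.append(note_tpc in tpc)
    if not checks:
        return True  # assumption: without filters, every note is selected
    if inverse:
        return not any(checks)  # color only if none of the filters match
    return all(checks)          # color only if all specified filters match
```

For example, is_selected(60, 0, midi=[60], tpc=[1]) is False because the TPC filter does not match, although the MIDI filter does.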

delete_label(mc, staff, voice, mc_onset, empty_only=False)[source]#

Delete a label from a particular position (if there is one).

Parameters:

mc (int) – Measure count.
staff – Notational layer in which to delete the label.
voice – Notational layer in which to delete the label.
mc_onset (fractions.Fraction) – mc_onset
empty_only (bool, optional) – Set to True if you want to delete only empty harmonies. Since normally all labels at the defined position are deleted, this flag is needed to prevent deleting non-empty <Harmony> tags.

Returns:

Whether a label was deleted or not.

Return type:

bool

events(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame containing the original tabular representation of all events encoded in the MuseScore file.

form_labels(detection_regex: str = None, exclude_harmony_layer: bool = False, interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing form labels (or other) that have been encoded as <StaffText>s rather than in the <Harmony> layer (see argument exclude_harmony_layer). This function essentially filters all StaffTexts matching the detection_regex and adds the standard position columns.

Parameters:

detection_regex – By default, detects all labels starting with one or two digits followed by a colon (see the regex). Pass another regex to retrieve only StaffTexts matching this one.
exclude_harmony_layer – By default, form labels are detected even if they have been encoded as Harmony labels (rather than as StaffText). Pass True in order to retrieve only StaffText form labels.
interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
unfold – Pass True to retrieve a Dat

Returns:

DataFrame containing all StaffTexts matching the detection_regex
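
To make the default detection rule concrete, here is a minimal sketch of filtering staff texts with a regular expression. The pattern ASSUMED_FORM_REGEX is a literal reading of "one or two digits followed by a colon" and is an assumption; the regex that ms3 actually uses by default may be more permissive. The example texts are invented.

```python
import re

# Assumption: a literal reading of the documented rule, not ms3's own regex.
ASSUMED_FORM_REGEX = re.compile(r"^\d{1,2}:")

staff_texts = ["1: exposition", "12: coda", "dolce", "123: too many digits"]

# Keep only the texts that look like form labels under the assumed rule.
form_labels = [text for text in staff_texts if ASSUMED_FORM_REGEX.match(text)]
```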

fl(detection_regex: str = None, exclude_harmony_layer=False) → DataFrame[source]#

Get the raw Form labels (or other) that match the detection_regex, but without adding quarterbeat columns.

Parameters:: detection_regex – By default, detects all labels starting with one or two digits followed by a colon (see the regex). Pass another regex to retrieve only StaffTexts matching this one.

Returns:: DataFrame containing all StaffTexts matching the detection_regex or None

get_chords(staff: int | None = None, voice: Literal[1, 2, 3, 4] | None = None, mode: Literal['auto', 'strict'] = 'auto', lyrics: bool = False, dynamics: bool = False, articulation: bool = False, staff_text: bool = False, system_text: bool = False, tempo: bool = False, spanners: bool = False, thoroughbass: bool = False, **kwargs) → DataFrame[source]#

Retrieve a customized chord list, e.g. one including fewer of the processed features or additional, unprocessed ones.

Parameters:

staff – Get information from a particular staff only (1 = upper staff)
voice – Get information from a particular voice only (1 = only the first layer of every staff)
mode –

Defaults to ‘auto’, meaning that those aspects are automatically included that occur in the score; the resulting DataFrame has no empty columns except for those parameters that are set to True.

’strict’: Create columns for exactly those parameters that are set to True, regardless whether they occur in the score or not (in which case the column will be empty).
lyrics – Include lyrics.
dynamics – Include dynamic markings such as f or p.
articulation – Include articulation such as arpeggios.
staff_text – Include expression text such as ‘dolce’ and free-hand staff text such as ‘div.’.
system_text – Include system text such as movement titles.
tempo – Include tempo markings.
spanners – Include spanners such as slurs, 8va lines, pedal lines etc.
thoroughbass – Include thoroughbass figures’ levels and durations.
**kwargs

Returns:

DataFrame representing all <Chord> tags in the score with the selected features.

get_raw_labels()[source]#

Returns a list of <harmony> tags from the parsed score.

Return type:: pandas.DataFrame

get_texts(only_header: bool = True) → Dict[str, str][source]#: Process <Text> nodes (normally attached to <Staff id="1">).

_get_metadata()[source]#

Return type:: dict

get_instrumentation() → Dict[str, str][source]#: Returns a {staff__instrument -> instrument_name} dict.

infer_mc(mn, mn_onset=0, volta=None)[source]#: mn_onset and needs to be converted to mc_onset

Create an excerpt by removing all <Measure> tags that are not selected in included_mcs. The order of the given integers is inconsequential because measures are always printed in the order in which they appear in the score. Also, it is assumed that the MCs are consecutive, i.e. there are no gaps between them; otherwise the excerpt will not show correct measure numbers and might be incoherent in terms of missing key and time signatures.

Parameters:

included_mcs – List of measure counts to be included in the excerpt. Pass a single integer to get an excerpt from that MC to the end of the piece.
globalkey – If the excerpt has chord labels, make sure the first label starts with the given global key, e.g. ‘F#’ for F sharp major or ‘ab’ for A flat minor.
localkey – If the excerpt has chord labels, make sure the first label starts with the given local key, e.g. ‘I’ for the major tonic key or ‘#iv’ for the raised subdominant minor key or ‘bVII’ for the lowered subtonic major key.
start_mc_onset – Onset value (either Fraction or float) specified as the “true” start of the first measure. Every note with strictly smaller onset value will be “removed” (i.e. mutated into rest)
end_mc_onset – Onset value (either Fraction or float) specified as the “true” end of the last measure. Every note with strictly greater onset value will be “removed” (i.e. mutated into rest)
exclude_start – If set to True, the first note corresponding to start_mc_onset will also be “removed”
exclude_end – If set to True, the last note corresponding to end_mc_onset will also be “removed”
metronome_tempo (Optional[float], optional) – Setting this value will override the tempo at the beginning of the excerpt which, otherwise, is created automatically according to the tempo in effect at that moment in the score. This is achieved by inserting a hidden metronome marking with a value that depends on the specified “beats per minute”, where “beat” depends on the value of the metronome_beat_unit parameter.
metronome_beat_unit (Optional[Fraction | float], optional) – Defaults to 1/4, which stands for a quarter note. Please note that for now, the combination of beat unit and tempo is converted and expressed as quarter notes per minute in the (invisible) metronome marking. For example, specifying 1/8=100 will effectively result in 1/4=50 (which is equivalent).
decompose_repeat_tags – If set to True, the XML tree will be cleansed of all tags referring to repeat-like structures to avoid possible “broken” structures within the excerpt.
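
The conversion of a tempo with an arbitrary beat unit into quarter notes per minute can be written out as a short sketch. The function quarter_notes_per_minute is hypothetical and only reproduces the arithmetic of the 1/8=100 example above; it is not the routine used by ms3.

```python
from fractions import Fraction


# Hypothetical helper for illustration only; not part of the ms3 API.
def quarter_notes_per_minute(tempo, beat_unit=Fraction(1, 4)):
    """Express `tempo` beats per minute, with beats of length `beat_unit`
    (as a fraction of a whole note), as quarter notes per minute."""
    return Fraction(tempo) * Fraction(beat_unit) / Fraction(1, 4)


eighths = quarter_notes_per_minute(100, Fraction(1, 8))  # 1/8 = 100
halves = quarter_notes_per_minute(60, Fraction(1, 2))    # 1/2 = 60
```

Here eighths evaluates to 50 and halves to 120 quarter notes per minute.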

_make_measure_list(sections=True, secure=True, reset_index=True)[source]#: Regenerate the measure list from the parsed score with advanced options.

make_standard_chordlist()[source]#: Stores the result of self.get_chords(mode=’strict’)

measures(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing the measures of the MuseScore file (which can be incomplete measures).

ml(recompute: bool = False) → DataFrame[source]#

Get the raw Measures without adding quarterbeat columns.

Parameters:: recompute – By default, the measures are cached. Pass True to enforce recomputing anew.

Create a new tag with the given name, value and attributes and insert it into the score relative to a given tag. Only one of after, before, append_within and prepend_within can be specified.

Parameters:

name – <name></name>
value – <name>value</name> (if specified)
attributes – <name key=value, …></name>
after – Insert the tag as sibling following the given tag.
before – Insert the tag as sibling preceding the given tag.
append_within – Insert the tag as last child of the given tag.
prepend_within – Insert the tag as first child of the given tag.

Returns:

The new tag.

notes(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing the Notes of the MuseScore file.

nl(recompute: bool = False) → DataFrame[source]#

Get the raw Notes without adding quarterbeat columns.

Parameters:: recompute – By default, the notes are cached. Pass True to enforce recomputing anew.

notes_and_rests(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing the Notes and Rests of the MuseScore file.

nrl(recompute: bool = False) → DataFrame[source]#

Get the raw Notes and Rests without adding quarterbeat columns.

Parameters:: recompute – By default, the notes and rests are cached. Pass True to enforce recomputing anew.

offset_dict(all_endings: bool = False, unfold: bool = False) → dict[source]#

Dictionary mapping MCs (measure counts) to their quarterbeat offset from the piece’s beginning. Used for computing quarterbeats for other facets.

Parameters:

all_endings – If a piece has alternative endings, by default, only the second ending is taken into account for computing quarterbeats in order to make the timeline correspond to a rendition without performing repeats. Events in other endings, notably the first, receive value NA so that they can be filtered out. For score addressability, one might want to apply a continuous timeline to all measures, in which case one would pass True to use the column ‘quarterbeats_all_endings’ of the measures table if it has one. If not, falls back to the default ‘quarterbeats’.
unfold – Pass True to compute quarterbeats for a mc_playthrough column resulting from unfolding repeats. The parameter all_endings is ignored in this case because the unfolded version brings each ending in its correct place.

Returns:

{MC -> quarterbeat_offset}. Offsets are Fractions. If all_endings is not set to True, values for MCs that are part of a first ending (or third or larger) are NA.
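
The idea of an {MC -> quarterbeat_offset} mapping can be sketched for the simplest case, a score without alternative endings or unfolded repeats. The function make_offset_dict and its input format (measure lengths as fractions of a whole note) are assumptions made for the example and do not reproduce how ms3 computes the offsets.

```python
from fractions import Fraction


# Hypothetical helper for illustration only; not part of the ms3 API.
def make_offset_dict(measure_lengths):
    """Map each MC to its distance from the beginning, in quarter notes.

    `measure_lengths` maps MC -> actual length as a fraction of a whole note.
    Assumes no alternative endings, so no offset is ever NA.
    """
    offsets = {}
    position = Fraction(0)
    for mc in sorted(measure_lengths):
        offsets[mc] = position
        position += Fraction(measure_lengths[mc]) * 4  # whole notes -> quarters
    return offsets


# A quarter-note upbeat followed by two 3/4 measures.
offsets = make_offset_dict({1: Fraction(1, 4), 2: Fraction(3, 4), 3: Fraction(3, 4)})
```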

rests(interval_index: bool = False, unfold: bool = False) → DataFrame | None[source]#

DataFrame representing the Rests of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, nominal_duration, scalar, volta

Parameters:: interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
Returns:: DataFrame representing the Rests of the MuseScore file.

rl(recompute: bool = False) → DataFrame[source]#

Get the raw Rests without adding quarterbeat columns.

Parameters:: recompute – By default, the rests are cached. Pass True to enforce recomputing anew.

parse_soup() → None[source]#: First step of parsing the MuseScore source. Involves discovering the <staff> tags and storing the <Measure> tags of each in the measure_nodes dictionary. Also stores the drum_map for each Drumset staff.

parse_measures()[source]#: Converts the score into the three DataFrames self._measures, self._events, and self._notes

perform_checks()[source]#: Perform a series of checks after parsing and emit warnings registered by the ms3 check command (and, by extension, by ms3 review, too).

store_score(filepath: str) → bool[source]#: Store the score as an MSCX file.

ms3.bs4_parser.replace_chord_tag_with_rest(target_tag)[source]#

This function takes a chord tag from the XML tree and mutates it into a rest tag of the same duration. This is useful for trimming excerpts with more control over which musical elements are extracted, and it leaves the relative positions of the notes from the original score unchanged.

Parameters:: target_tag (bs4.Tag) – The chord tag that needs to be mutated into a rest tag of the same duration

class ms3.bs4_parser.Excerpt(soup: BeautifulSoup, measures: Tuple[int] | int, read_only: bool = False, logger_cfg: dict | None = None, first_mn: int | None = None, first_timesig: str | None = None, first_keysig: int | None = None, first_harmony_values: Dict[str, str] | None = None, first_tempo_tag: Tag | None = None, staff2clef: Dict[int, Dict[str, str]] | None = None, final_barline: bool = False, globalkey: str | None = None, localkey: str | None = None, start_mc_onset: Fraction | None = None, end_mc_onset: Fraction | None = None, exclude_start: bool | None = False, exclude_end: bool | None = False, metronome_tempo: float | None = None, metronome_beat_unit: Fraction | None = Fraction(1, 1), decompose_repeat_tags: bool | None = True)[source]#

Takes a copy of _MSCX_bs4.soup and eliminates all <Measure> tags that do not correspond to the given list of MCs.

set_tempo(first_tempo_tag, metronome_tempo, metronome_beat_unit)[source]#

This method enforces the tempo at the beginning of the excerpt. If a metronome mark was found in the piece from which the excerpt was taken, was still active, and no tempo was specified by the user, it is set again in the first measure of the excerpt. If the user did specify a tempo along with a beat unit, a custom metronome mark is added to the beginning of the excerpt, overwriting any pre-existing metronome mark.

Parameters:

first_tempo_tag – The last active metronome mark found in the original piece (if any was found)
metronome_tempo (Optional[float], optional) – Setting this value will override the tempo at the beginning of the excerpt which, otherwise, is created automatically according to the tempo in effect at that moment in the score. This is achieved by inserting a hidden metronome marking with a value that depends on the specified “beats per minute”, where “beat” depends on the value of the metronome_beat_unit parameter.
metronome_beat_unit (Optional[Fraction | float], optional) – Defaults to 1/4, which stands for a quarter note. Please note that for now, the combination of beat unit and tempo is converted and expressed as quarter notes per minute in the (invisible) metronome marking. For example, specifying 1/8=100 will effectively result in 1/4=50 (which is equivalent).

trim(start_mc_onset: Fraction | None = None, end_mc_onset: Fraction | None = None, exclude_start: bool | None = False, exclude_end: bool | None = False)[source]#

This method handles the trimming of the excerpt where notes outside of the set onset boundaries are mutated into rests (to not change the relative positions of the notes in the whole excerpt).

Parameters:

start_mc_onset – The onset value before which we want to mutate all other notes (associated with first measure)
end_mc_onset – The onset value after which we want to mutate all other notes (associated with last measure)
exclude_start – If set to True, the note corresponding to the start_mc_onset in the first measure will also be removed
exclude_end – If set to True, the note corresponding to the end_mc_onset in the last measure will also be removed
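
The boundary rules for trimming can be sketched as two predicates on note onsets. The functions keep_in_first_measure and keep_in_last_measure are hypothetical; they only encode the rule stated above (strictly earlier or later onsets are turned into rests, and the boundary onset itself only if the corresponding exclude flag is set).

```python
from fractions import Fraction


# Hypothetical predicates for illustration only; not part of the ms3 API.
def keep_in_first_measure(onset, start_mc_onset, exclude_start=False):
    """True if a note in the first measure stays a note (is not made a rest)."""
    if exclude_start:
        return onset > start_mc_onset
    return onset >= start_mc_onset


def keep_in_last_measure(onset, end_mc_onset, exclude_end=False):
    """True if a note in the last measure stays a note (is not made a rest)."""
    if exclude_end:
        return onset < end_mc_onset
    return onset <= end_mc_onset


onsets = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
kept = [o for o in onsets if keep_in_first_measure(o, Fraction(1, 4))]
kept_excl = [o for o in onsets if keep_in_first_measure(o, Fraction(1, 4), True)]
```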

get_onset_zero_harmony(return_layer: Literal[False]) → Tag | None[source]#
get_onset_zero_harmony(return_layer: Literal[True]) → Tuple[Tag | None, int, int]: Iterate through all tags at mc_onset 0 for all notational (staff, voice) layers and return the first <Harmony> tag or None.

set_clefs(staff2clef: Dict[int, Dict[str, str]])[source]#: Set the initial clefs for the given staves.

set_first_keysig(first_keysig: int)[source]#: Set the key signature of the first measure to the given value.

set_first_mn(first_mn: int)[source]#: Set the measure number of the first measure to the given value.

Given the specific onset and measure values, this method silences all notes that are not within the onset bounds. More specifically, notes that appear before start_onset in start_mc are mutated to rests (i.e. silenced). The same goes for end_mc: all notes found after end_onset are also mutated to rests.

Parameters:

start_onset – onset value set for the first measure. Everything before this will be silenced
end_onset – onset value set for the last measure. Everything after this will be silenced
exclude_start – If set to True, the note corresponding to start_onset in the first measure will also be silenced
exclude_end – If set to True, the note corresponding to end_onset in the last measure will also be silenced

Creates the artificial hidden metronome mark that either comes from the last active metronome mark of the original piece or from some specified tempo and beat unit values specified by the user.

Parameters:

piece_tempo_tag
metronome_tempo (Optional[float], optional) – Setting this value will override the tempo at the beginning of the excerpt which, otherwise, is created automatically according to the tempo in effect at that moment in the score. This is achieved by inserting a hidden metronome marking with a value that depends on the specified “beats per minute”, where “beat” depends on the value of the metronome_beat_unit parameter.
metronome_beat_unit (Optional[Fraction | float], optional) – Defaults to 1/4, which stands for a quarter note. Please note that for now, the combination of beat unit and tempo is converted and expressed as quarter notes per minute in the (invisible) metronome marking. For example, specifying 1/8=100 will effectively result in 1/4=50 (which is equivalent).
user_call

Returns:

decompose_repeat_tags()[source]#: Decomposes all tags that refer to repeat structures of any kind in the XML tree of the excerpt. This is a safety measure to avoid ending up with broken repeat structures that would alter the proper “timeline” of the excerpt itself.

class ms3.bs4_parser.ParsedParts(soup: BeautifulSoup, **logger_cfg)[source]#

Stores the parts found in a parsed BeautifulSoup object.

Parameters:: soup – bs4.BeautifulSoup, BeautifulSoup object to parse

logger_cfg (dict, optional) – The following options are available: ‘name’: LOGGER_NAME -> by default the logger name is based on the parsed file(s) ‘level’: {‘W’, ‘D’, ‘I’, ‘E’, ‘C’, ‘WARNING’, ‘DEBUG’, ‘INFO’, ‘ERROR’, ‘CRITICAL’} ‘file’: PATH_TO_LOGFILE to store all log messages under the given path.

property staff2part: dict[list, str]#

Allows users to determine the corresponding part based on the staff number

Example

Returns {[2, 3]: ‘part_1’} for staves 2 and 3 of part 1

Returns:: the dictionary mapping parts to staves
Return type:: dict[list, str]

ms3.bs4_parser.get_enlarged_default_dict() → Dict[str, dict][source]#

Allows users to point to an instrument not only with a ‘trackName’, but also with ‘id’, ‘longName’, ‘shortName’, ‘instrumentId’, ‘part_trackName’.

Returns:

Dictionary mapping any of the possible fields (‘id’, ‘longName’, ‘shortName’, ‘trackName’, ‘instrumentId’, ‘part_trackName’) corresponding to an instrument into complete information about the instrument (‘id’, ‘longName’, ‘shortName’, ‘trackName’, ‘instrumentId’, ‘part_trackName’, ‘ChannelName’, ‘ChannelValue’).

Return type:

Dict[str, dict]

class ms3.bs4_parser.Instrumentation(soup: BeautifulSoup, **logger_cfg)[source]#

Easy way to read and write the instrumentation of a score, that is ‘id’, ‘longName’, ‘shortName’, ‘trackName’, ‘instrumentId’, ‘part_trackName’, ‘ChannelName’, ‘ChannelValue’.

soup_references() → dict[str, dict[str, Tag]][source]#

Stores tags references for each staff

Returns: the dictionary in the format {‘staff_1’: {‘id’: None, ‘longName’: None, ‘shortName’: None, ‘trackName’: None, ‘instrumentId’: None, ‘part_trackName’: None, ‘ChannelName’: None, ‘ChannelValue’: None}, ‘staff_2’: {…}, …} containing the BeautifulSoup tags

property fields#

Extracts information from the tag and stores it for each staff

get_instrument_name(staff_name: str | int)[source]#

Allows users to access the instrument trackName attributed to the staff staff_name.

Parameters:: staff_name – a number or a string in the format ‘staff_1’ defining the staff of interest

Returns:: trackName extracted from tag for the staff staff_name
Return type:: str
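
Since a staff can be addressed either by number or by a string such as ‘staff_1’, a caller may want to normalize the two forms. The helper normalize_staff_key below is hypothetical and merely illustrates that convention; it says nothing about how Instrumentation handles the argument internally.

```python
# Hypothetical helper for illustration only; not part of the ms3 API.
def normalize_staff_key(staff_name):
    """Return the 'staff_<i>' form for an int or an already formatted string."""
    if isinstance(staff_name, int):
        return f"staff_{staff_name}"
    if isinstance(staff_name, str) and staff_name.startswith("staff_"):
        return staff_name
    raise ValueError(f"Cannot interpret {staff_name!r} as a staff")
```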

add_suffix(new_values, suffix)[source]#

Adds a suffix to the instrument names.

Parameters:

new_values – the dictionary of fields to update
suffix – the string containing version

Returns:: the dictionary with updated names with versions

modify_drumset_tags(staff_type, value, changed_part, field_to_change)[source]#

Sets tags specific for Drumset instruments.

Parameters:

staff_type – the tags containing info of the field
value – new value of the field
changed_part – the index of part to update
field_to_change – the name of field to update

modify_list_tags(changed_part, found, value)[source]#

Sets instruments if there is a list of values to update.

Parameters:

changed_part – number of part of soup file where to find and update in the original file
found – parts of soup containing channel info in the original file
value – new values to set

Returns:: corrected list of parts of the same length as value list

set_instrument(staff_id: str | int, trackname)[source]#

Modifies the instrument and all its corresponding information in the soup source file

Parameters:

staff_id – an integer number i or a string in the format ‘staff_i’ defining the staff of interest
trackname – key defining the new value of the instrument, can be one of (‘id’, ‘longName’, ‘shortName’, ‘trackName’, ‘instrumentId’, ‘part_trackName’)

class ms3.bs4_parser.Metatags(soup)[source]#: Easy way to read and write any style information in a parsed MSCX score.

class ms3.bs4_parser.Style(soup)[source]#: Easy way to read and write any style information in a parsed MSCX score.

class ms3.bs4_parser.Prelims(soup: BeautifulSoup, **logger_cfg)[source]#

Easy way to read and write the preliminaries of a score, that is Title, Subtitle, Composer, Lyricist, and ‘Instrument Name (Part)’.

property text_tags: Dict[str, Tag]#: Returns a {key->tag} dict reflecting the <Text> tags currently present in the first <VBox>.

property fields: Dict[str, str]#: Returns a {key->value} dict reflecting the currently set <text> values.

ms3.bs4_parser.get_duration_event(elements)[source]#: Receives a list of dicts representing the events for a given mc_onset and returns the index and name of the first event that has a duration, so either a Chord or a Rest.
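
The lookup described here can be sketched in a few lines. The function first_duration_event is a hypothetical re-implementation for illustration; in particular, returning (None, None) when no Chord or Rest is present is an assumption, as is the "name" key (borrowed from the tags dictionary described earlier).

```python
# Hypothetical re-implementation for illustration only; not ms3's own code.
def first_duration_event(elements):
    """Return (index, name) of the first event that is a Chord or a Rest."""
    for index, element in enumerate(elements):
        if element["name"] in ("Chord", "Rest"):
            return index, element["name"]
    return None, None  # assumption: nothing with a duration at this onset


events = [{"name": "Harmony"}, {"name": "Chord"}, {"name": "Rest"}]
found = first_duration_event(events)
```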

ms3.bs4_parser.get_vbox(soup: BeautifulSoup, logger=None) → Tag | None[source]#: Returns the first <VBox> tag contained in the first staff, if any, which usually corresponds to the vertical box at the top of a MuseScore file which contains the prelims (title, composer, etc.)

ms3.bs4_parser.get_part_info(part_tag, start_staff_id=1)[source]#

Instrument names come in different forms in different places. This function extracts the information from a <Part> tag and returns it as a dictionary.

start_staff_id is used as the base for staff numbering when the inner <Staff> tags lack an id attribute (MuseScore 4 format), where the canonical IDs live on the top-level <Staff id="N"> siblings instead. MuseScore numbers staves sequentially across parts, so callers should pass a running counter.

ms3.bs4_parser.make_spanner_cols(df: DataFrame, spanner_types: Collection[str] | None = None, logger=None) → DataFrame[source]#

From a raw chord list as returned by get_chords(spanners=True), create a DataFrame with Spanner IDs for all chords for all spanner types they are associated with.

Parameters:: spanner_types – If this parameter is passed, only the listed spanner types [‘Slur’, ‘HairPin’, ‘Pedal’, ‘Ottava’] are included.

History of this algorithm#

At first, spanner IDs were written to Chords of the same layer until a prev/location was found. Initially, this caused some spanners to continue until the end of the piece because endings were missing when selecting based on the subtype column (endings don’t specify subtype). After fixing this, there were still mistakes, particularly for slurs, because (1) endings can be missing, (2) endings can occur in a different voice than they should, and (3) endings can be expressed with different values than the beginning (all three cases are found in ms3/tests/test_local_files/MS3/stabat_03_coloured.mscx). Therefore, the new algorithm simply ends spanners after their given duration.

ms3.bs4_parser.safe_update(old, new)[source]#: Update dict without replacing values.

ms3.bs4_parser.recurse_node(node, prepend=None, exclude_children=None)[source]#

The heart of the XML -> DataFrame conversion. Changes may have ample repercussions!

Returns:: Keys are combinations of tag (& attribute) names, values are value strings.
Return type:: dict

ms3.bs4_parser.decode_harmony_tag(tag)[source]#: Decode a <Harmony> tag into a string.

ms3.bs4_parser.text_tag2str(tag: Tag) → str[source]#: Transforms a <text> tag into a string that potentially includes written-out HTML tags.

ms3.bs4_parser.text_tag2str_components(tag: Tag) → List[str][source]#: Recursively traverses a <text> tag and returns all string components, effectively removing all HTML markup.

ms3.bs4_parser.text_tag2str_recursive(tag: Tag, join_char: str = '') → str[source]#: Gets all string components from a <text> tag and joins them with join_char.

ms3.bs4_parser.tag2text(tag: Tag) → Tuple[str, str][source]#: Takes the <Text> from a MuseScore file’s header and returns its style and string.

ms3.bs4_parser.get_thoroughbass_symbols(item_tag: Tag) → Tuple[str, str][source]#: Returns the prefix and suffix of a <FiguredBassItem> tag if present, empty strings otherwise.

ms3.bs4_parser.thoroughbass_item(item_tag: Tag) → str[source]#: Turns a <FiguredBassItem> tag into a string by concatenating brackets, prefix, digit and suffix.

ms3.bs4_parser.process_thoroughbass(thoroughbass_tag: Tag) → Tuple[List[str], Fraction | None][source]#: Turns a <FiguredBass> tag into a list of component strings, one per level, and a duration.

ms3.bs4_parser.get_row_at_quarterbeat(df: DataFrame, quarterbeat: Literal[None]) → DataFrame[source]#

ms3.bs4_parser.get_row_at_quarterbeat(df: DataFrame, quarterbeat: float) → Series | None

Returns the row of a DataFrame that is active at a given quarterbeat by interpreting subsequent intervals of the given dataframe’s “quarterbeat” column as activation intervals. That is, the rows are interpreted as consecutive, non-overlapping events and the duration_qb column is not taken into account for computing the activation intervals. The last interval’s right boundary is np.inf, so that all values higher than the latest event resolve to the latest event without needing to know the end of the piece.

Parameters:

df – DataFrame in which the column “quarterbeat” is monotonically increasing.
quarterbeat – The position for which the active row will be returned. If the position does not exist because it’s before the first event, None is returned. If None is passed (default), the whole dataframe is returned.

Returns:

The row of the dataframe.
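
The lookup described above can be sketched with a binary search over the onsets. This is a minimal illustration under the stated rules, not the ms3 source; the function name row_at_quarterbeat_sketch and the example table are hypothetical.

```python
import numpy as np
import pandas as pd


def row_at_quarterbeat_sketch(df, quarterbeat=None):
    """Hypothetical re-implementation of the documented behaviour."""
    if quarterbeat is None:
        return df  # no position given: hand back the whole table
    onsets = df["quarterbeat"].to_numpy(dtype=float)
    # Position of the last onset that is <= quarterbeat; -1 if there is none.
    pos = int(np.searchsorted(onsets, quarterbeat, side="right")) - 1
    if pos < 0:
        return None  # the position lies before the first event
    return df.iloc[pos]


events = pd.DataFrame({"quarterbeat": [0.0, 4.0, 6.0], "label": ["I", "V", "I6"]})
active = row_at_quarterbeat_sketch(events, 5.0)       # row of the event starting at 4.0
after_end = row_at_quarterbeat_sketch(events, 100.0)  # resolves to the last event
```

Because only the onsets are searched, a position far beyond the last event still resolves to the last row, which mirrors the np.inf right boundary mentioned above.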

The expand_dcml module#

This is the same code as in the corpora repo as copied on September 24, 2020 and then adapted.

class ms3.expand_dcml.SliceMaker[source]#

This class serves for storing slice notation such as :3 as a variable or passing it as function argument.

Examples

SM = SliceMaker()
some_function( slice_this, SM[3:8] )

select_all = SM[:]
df.loc[select_all]
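
A class like this can be written in a few lines, presumably by returning whatever is passed between the square brackets. The sketch below shows that idea with a hypothetical class name; it is not taken from the ms3 source.

```python
class SliceMakerSketch:
    """Hypothetical stand-in: returns the object passed via square brackets."""

    def __getitem__(self, item):
        return item


SM = SliceMakerSketch()
first_three = SM[:3]        # a plain slice object that can be stored or passed on
letters = list("abcdef")
picked = letters[SM[3:5]]   # equivalent to letters[3:5]
```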

ms3.expand_dcml.expand_labels(df, column='label', regex=None, rename={}, dropna=False, propagate=True, volta_structure=None, relative_to_global=False, chord_tones=True, absolute=False, all_in_c=False, skip_checks=False, logger=None)[source]#

Splits harmony labels complying with the DCML syntax into columns holding their various features and allows for additional computations and transformations.

Uses: compute_chord_tones(), features2type(), labels2global_tonic(), propagate_keys(), propagate_pedal(), replace_special(), roman_numeral2fifths(), split_alternatives(), split_labels(), transform(), transpose()

Parameters:

df (pandas.DataFrame) – Dataframe where one column contains DCML chord labels.
column (str) – Name of the column that holds the harmony labels.
regex (re.Pattern) – Compiled regular expression used to split the labels. It needs to have named groups. The group names are used as column names unless replaced via rename.
rename (dict, optional) – Dictionary to map the regex’s group names to deviating column names of your choice.
dropna (bool, optional) – Pass True if you want to drop rows where column is NaN/<NA>
propagate (bool, optional) – By default, information about global and local keys and about pedal points is spread throughout the DataFrame. Pass False if you only want to split the labels into their features. This ignores all following parameters because their expansions depend on information about keys.
volta_structure (dict, optional) – {first_mc -> {volta_number -> [mc1, mc2…]} } dictionary as you can get it from Score.mscx.volta_structure. This allows for correct propagation into second and other voltas.
relative_to_global (bool, optional) – Pass True if you want all labels expressed with respect to the global key. This levels and eliminates the features localkey and relativeroot.
chord_tones (bool, optional) – Pass True if you want to add four columns that contain information about each label’s chord, added, root, and bass tones. The pitches are expressed as intervals relative to the respective chord’s local key or, if relative_to_global=True, to the globalkey. The intervals are represented as integers that represent stacks of fifths over the tonic, such that 0 = tonic, 1 = dominant, -1 = subdominant, 2 = supertonic etc.
absolute (bool, optional) – Pass True if you want to transpose the relative chord_tones to the global key, which makes them absolute so they can be expressed as actual note names. This implies prior conversion of the chord_tones (but not of the labels) to the global tonic.
all_in_c (bool, optional) – Pass True to transpose chord_tones to C major/minor. This performs the same transposition of chord tones as relative_to_global but without transposing the labels, too. This option clashes with absolute=True.

Returns:

Original DataFrame plus additional columns with split features.

Return type:

pandas.DataFrame

ms3.expand_dcml.extract_features_from_labels(S: Series, regex: Pattern | str | None = None) → DataFrame[source]#: Applies .str.extract(regex) on the Series and returns a DataFrame with all named capturing groups.
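
To illustrate how named capturing groups become columns, here is a small sketch using pandas' Series.str.extract. The pattern TOY_REGEX is a deliberately simplified, hypothetical stand-in and not the DCML regular expression that ms3 uses.

```python
import pandas as pd

# Simplified, hypothetical pattern: numeral, optional form, optional figured
# bass, and an optional relative root after a slash.
TOY_REGEX = (
    r"^(?P<numeral>[b#]*[IViv]+)"
    r"(?P<form>[o+%M])?"
    r"(?P<figbass>7|65|43|2|64|6)?"
    r"(?:/(?P<relativeroot>[b#]*[IViv]+))?$"
)

labels = pd.Series(["V7", "viio65/V", "I6", "ii%43"])
# One column per named group; groups that did not take part in a match are missing values.
features = labels.str.extract(TOY_REGEX)
```

The group names (numeral, form, figbass, relativeroot) appear as column names of the result.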

ms3.expand_dcml.split_labels(df, label_column='label', regex=None, rename={}, dropna=False, inplace=False, skip_checks=False, logger=None)[source]#

Split harmony labels complying with the DCML syntax into columns holding their various features.

Parameters:

df (pandas.DataFrame) – Dataframe where one column contains DCML chord labels.
label_column (str) – Name of the column that holds the harmony labels.
regex (re.Pattern) – Compiled regular expression used to split the labels. It needs to have named groups. The group names are used as column names unless replaced via rename.
rename (dict) – Dictionary to map the regex’s group names to deviating column names.
dropna (bool, optional) – Pass True if you want to drop rows where label_column is NaN/<NA>
inplace (bool, optional) – Pass True if you want to mutate df.

ms3.expand_dcml.values_into_df(df: DataFrame, new_values: DataFrame) → DataFrame[source]#: Updates the given DataFrame with the values from the other DataFrame by updating existing columns and concatenating new columns. The returned DataFrame has the columns of new_values on the right-hand side as if they had been concatenated.

ms3.expand_dcml.features2type(numeral, form=None, figbass=None, logger=None)[source]#

Turns a combination of the three chord features into a chord type.

Returns:

‘M’ (Major triad)
’m’ (Minor triad)
’o’ (Diminished triad)
’+’ (Augmented triad)
’mm7’ (Minor seventh chord)
’Mm7’ (Dominant seventh chord)
’MM7’ (Major seventh chord)
’mM7’ (Minor major seventh chord)
’o7’ (Diminished seventh chord)
’%7’ (Half-diminished seventh chord)
’+7’ (Augmented (minor) seventh chord)
’+M7’ (Augmented major seventh chord)
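
One way such a mapping could work is sketched below. It rests on assumptions that are not stated in this reference: a lowercase numeral is taken to indicate a minor chord, a figbass of 7, 65, 43, or 2 a seventh chord, and form is one of o, +, %, M, or +M. The function chord_type_sketch is hypothetical and not the ms3 implementation.

```python
def chord_type_sketch(numeral, form=None, figbass=None):
    """Hypothetical mapping from (numeral, form, figbass) to a chord type."""
    is_minor = numeral.lstrip("#b").islower()        # assumption: case encodes the triad quality
    is_seventh = figbass in ("7", "65", "43", "2")   # assumption: seventh-chord inversions
    form = form or ""
    if not is_seventh:
        if form in ("o", "+"):
            return form                              # diminished or augmented triad
        return "m" if is_minor else "M"
    if form in ("o", "%", "+"):
        return form + "7"                            # 'o7', '%7', '+7'
    if form == "+M":
        return "+M7"
    if form == "M":
        return "mM7" if is_minor else "MM7"          # major seventh over a minor or major triad
    return "mm7" if is_minor else "Mm7"
```

For example, chord_type_sketch('V', figbass='7') yields 'Mm7' and chord_type_sketch('vii', 'o', '65') yields 'o7'.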

ms3.expand_dcml.replace_special(df, regex, merge=False, inplace=False, cols={}, special_map={}, logger=None)[source]#

Move special symbols in the numeral column to a separate column and replace them by the explicit chords they stand for. In particular, this function replaces the symbols It, Ger, and Fr.

Uses: merge_changes()

Parameters:

df (pandas.DataFrame) – Dataframe containing DCML chord labels that have been split by split_labels().
regex (re.Pattern) – Compiled regular expression used to split the labels replacing the special symbols. It needs to have named groups. The group names are used as column names unless replaced by cols.
merge (bool, optional) – False: By default, existing values, except figbass, are overwritten. True: Merge existing with new values (for changes and relativeroot).

cols (dict, optional) –

The special symbols appear in the column numeral and are moved to the column special. In case the column names for ['numeral','form', 'figbass', 'changes', 'relativeroot', 'special'] deviate, pass a dict, such as

{'numeral':         'numeral_col_name',
 'form':            'form_col_name',
 'figbass':         'figbass_col_name',
 'changes':         'changes_col_name',
 'relativeroot':    'relativeroot_col_name',
 'special':         'special_col_name'}

special_map (dict, optional) – In case you want to add or alter special symbols to be replaced, pass a replacement map, e.g. {‘N’: ‘bII6’}. The column ‘figbass’ is only altered if it’s None to allow for inversions of special chords.
inplace (bool, optional) – Pass True if you want to mutate df.

ms3.expand_dcml.merge_changes(left, right, *args)[source]#

Merge two changes into one, e.g. b3 and +#7 to +#7b3.

Uses: changes2list()

ms3.expand_dcml.propagate_keys(df, volta_structure=None, globalkey='globalkey', localkey='localkey', add_bool=True, logger=None)[source]#

Propagate information about global keys and local keys throughout the dataframe.

Pass split harmonies for one piece at a time. For concatenated pieces, use apply().

Uses: series_is_minor()

Parameters:

df (pandas.DataFrame) – Dataframe containing DCML chord labels that have been split by split_labels().
volta_structure (dict, optional) – {first_mc -> {volta_number -> [mc1, mc2…]} } dictionary as you can get it from Score.mscx.volta_structure. This allows for correct propagation into second and other voltas.
globalkey (str, optional) – In case you renamed the columns, pass column names.
localkey (str, optional) – In case you renamed the columns, pass column names.
add_bool (bool, optional) – Pass True if you want to add two boolean columns which are true if the respective key is a minor key.
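
The core idea of the propagation, leaving aside voltas, can be sketched as a forward fill. The example assumes that keys are only written where they change and that lowercase key names denote minor keys; it is a simplified illustration, not the ms3 implementation.

```python
import pandas as pd

# Hypothetical split labels: keys appear only in the rows where they change.
split = pd.DataFrame(
    {
        "globalkey": ["F", None, None, None],
        "localkey": ["I", None, "V", None],
        "numeral": ["I", "V", "I", "ii"],
    }
)

propagated = split.copy()
# Spread each key downwards until the next explicit key indication.
propagated[["globalkey", "localkey"]] = propagated[["globalkey", "localkey"]].ffill()
# Boolean helper columns (assumption: lowercase means minor).
propagated["globalkey_is_minor"] = propagated["globalkey"].str.islower()
propagated["localkey_is_minor"] = propagated["localkey"].str.islower()
```

For concatenated pieces, the same operation would have to run per piece so that keys do not leak from one piece into the next.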

ms3.expand_dcml.propagate_pedal(df, relative=True, drop_pedalend=True, cols={}, logger=None)[source]#

Propagate the pedal note for all chords within square brackets. By default, the note is expressed in relation to each label’s localkey.

Uses: rel2abs_key(), abs2rel_key()

Parameters:

df (pandas.DataFrame) – Dataframe containing DCML chord labels that have been split by split_labels() and where the keys have been propagated using propagate_keys().
relative (bool, optional) – Pass False if you want the pedal note to stay the same even if the localkey changes.
drop_pedalend (bool, optional) – Pass False if you don’t want the column with the ending brackets to be dropped.

cols (dict, optional) –

In case the column names for ['pedal','pedalend', 'globalkey', 'localkey'] deviate, pass a dict, such as

{'pedal':       'pedal_col_name',
 'pedalend':    'pedalend_col_name',
 'globalkey':   'globalkey_col_name',
 'localkey':    'localkey_col_name'}

Utils#

Transformations#

Functions for transforming DataFrames as output by ms3.

ms3.transformations.make_note_name_and_octave_columns(notes: DataFrame, staff2drums: Dict[int, dict | DataFrame | Series] | None = None) → Tuple[Series, Series][source]#: Takes a notelist and maybe a {staff -> {midi_pitch -> ‘instrument_name’}} mapping and returns two columns named ‘name’ and ‘octave’.

ms3.transformations.add_quarterbeats_col(df: DataFrame, offset_dict: Series | dict, offset_dict_all_endings: Series | dict | None = None, interval_index: bool = False, name: str | None = None, logger=None) → DataFrame[source]#

Insert a column measuring the distance of events from MC 1 in quarter notes. If no ‘mc_onset’ column is present, the column corresponds to the values given in the offset_dict.

Parameters:

df – DataFrame with an mc or mc_playthrough column, and an mc_onset column.
offset_dict –

If unfolded: {mc_playthrough -> offset}

Otherwise: {mc -> offset}

You can create the dict using the functions make_continuous_offset_series() or make_offset_dict_from_measures().

It is not required if the column ‘quarterbeats’ exists already.
offset_dict_all_endings – Argument added later as a straightforward way to add two quarterbeats columns, the second one being the ‘quarterbeats_all_endings’ which is so important that with ms3 v2.2.0 it is included by default. It is independent from unfolding because its main purpose is score addressability.
interval_index – Defaults to False. Pass True to replace the index with a pandas.IntervalIndex (depends on the successful creation of the column duration_qb).
name – If specified, name of the added column. Defaults to ‘quarterbeats’ for normal, and ‘quarterbeats_playthrough’ for unfolded dataframes.
logger

Returns:

The DataFrame with quarterbeats and duration_qb columns added.

ms3.transformations.make_quarterbeats_column(mc_column: Series, mc_onset_column: Series | None, offset_dict: Series | dict, name: str = 'quarterbeats') → Series[source]#

Turn each combination of mc and mc_onset into a quarterbeat value using the offset_dict that maps mc to the measure’s quarterbeat position (distance from the beginning of the piece).

Parameters:

mc_column – A sequence of MC values, each of which will be mapped to its quarterbeats value in offset_dict.
mc_onset_column – If specified, these values will be added to the mapped quarterbeats values.
offset_dict – {mc -> quarterbeats}, can be a Series.
name – Name of the returned Series.

Returns:

Quarterbeats column.
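
The mapping can be sketched as a dictionary lookup followed by an addition. The sketch assumes that mc_onset values are expressed as fractions of a whole note and therefore multiplies them by 4 before adding; that unit, like the name quarterbeats_sketch, is an assumption and not stated above.

```python
from fractions import Fraction

import pandas as pd


def quarterbeats_sketch(mc, mc_onset, offset_dict, name="quarterbeats"):
    """Hypothetical illustration: map each MC to its offset, then add the onset."""
    quarterbeats = mc.map(offset_dict)
    if mc_onset is not None:
        # Assumption: onsets are measured in whole notes, so 1/4 equals one quarter.
        quarterbeats = quarterbeats + mc_onset * 4
    return quarterbeats.rename(name)


mc = pd.Series([1, 1, 2])
mc_onset = pd.Series([Fraction(0), Fraction(1, 2), Fraction(1, 4)])
offsets = {1: Fraction(0), 2: Fraction(4)}  # MC 2 starts four quarters into the piece
qb = quarterbeats_sketch(mc, mc_onset, offsets)
```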

ms3.transformations.add_weighted_grace_durations(notes, weight=0.5, logger=None)[source]#

For a given notes table, change the ‘duration’ value of all grace notes, weighting it by weight.

Parameters:

notes (pandas.DataFrame) – Notes table containing the columns ‘duration’, ‘nominal_duration’, ‘scalar’
weight (Fraction or float) – Value by which to weight duration of all grace notes. Defaults to a half.

Returns:

Copy of notes with altered duration values.

Return type:

pandas.DataFrame

ms3.transformations.compute_chord_tones(df: DataFrame, bass_only: bool = False, expand: bool = False, cols: dict | None = None, logger: Logger | None = None) → DataFrame | Series[source]#

Compute the chord tones for DCML harmony labels. They are returned as lists of tonal pitch classes in close position, starting with the bass note. The tonal pitch classes represent intervals relative to the local tonic:

-2: second below tonic
-1: fifth below tonic
0: tonic
1: fifth above tonic
2: second above tonic, etc.

The labels need to have undergone split_labels() and propagate_keys(). Pedal points are not taken into account.

Uses: features2tpcs()

Parameters:

df – Dataframe containing DCML chord labels that have been split by split_labels() and where the keys have been propagated using propagate_keys(add_bool=True).
bass_only – Pass True if you need only the bass note.
expand – Pass True if you need chord tones and added tones in separate columns. Otherwise a Series is returned.
cols – In case the column names for ['mc', 'numeral', 'form', 'figbass', 'changes', 'relativeroot', 'localkey', 'globalkey'] deviate, pass a dict, such as
logger

Returns:

For every row of df one tuple with chord tones, expressed as tonal pitch classes. If expand is True, the function returns a DataFrame with four columns: Two with tuples for chord tones and added tones, one with the chord root, and one with the bass note.
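
To make the stack-of-fifths representation concrete, the following sketch spells such intervals as note names for a given tonic. The helper fifths2name_sketch and its tonic_fifths parameter are hypothetical and only serve as illustration.

```python
def fifths2name_sketch(fifths, tonic_fifths=0):
    """Spell a stack-of-fifths interval as a note name.

    tonic_fifths is the tonic's own position on the line of fifths
    (0 = C, 1 = G, -1 = F); helper and parameter are hypothetical.
    """
    position = fifths + tonic_fifths + 1   # shift so that F sits at position 0
    letter = "FCGDAEB"[position % 7]
    accidentals = position // 7            # positive: sharps, negative: flats
    if accidentals > 0:
        return letter + "#" * accidentals
    return letter + "b" * -accidentals


# A dominant seventh in root position relative to the tonic: root, third, fifth, seventh.
dominant_seventh = (1, 5, 2, -1)
in_c = [fifths2name_sketch(f) for f in dominant_seventh]
in_a = [fifths2name_sketch(f, tonic_fifths=3) for f in dominant_seventh]
```

The same tuple describes the chord in every key; only the tonic's position on the line of fifths changes the spelling.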

ms3.transformations.dfs2quarterbeats(dfs: DataFrame | List[DataFrame], measures: DataFrame, unfold=False, quarterbeats=True, interval_index=True, logger=None) → List[DataFrame][source]#

Pass one or several DataFrames and one measures table to unfold repeats and/or add quarterbeats columns and/or index.

Parameters:

dfs – DataFrame(s) that are to be unfolded and/or receive quarterbeats.
measures
unfold
quarterbeats
interval_index

Returns:

Altered copies of dfs.

ms3.transformations.get_chord_sequences(at, major_minor=True, level=None, column='chord', logger=None)[source]#

Transforms an annotation table into lists of chord symbols for n-gram analysis. If your table represents several pieces, make sure to pass the groupby parameter level to avoid including non-existent transitions.

Parameters:

at (pandas.DataFrame) – Annotation table.
major_minor (bool, optional) –

Defaults to True: the length of the chord sequences corresponds to localkey segments. The result comes as dict of dicts.

If you pass False, chord sequences are returned as they are, potentially including incorrect transitions, e.g., when the localkey changes. The result comes as list of lists, where the sublists result from the groupby if you specified level.
level (int or list) – Argument passed to pandas.DataFrame.groupby(). Defaults to -1, resulting in a GroupBy by all levels except the last. Conversely, you can pass, for instance, 2 to group by the first two levels.
column (str) – Name of the column containing the chord symbols that compose the sequences.

Returns:

If major_minor is True, the sequences are returned as {int -> {‘localkey’ -> str, ’localkey_is_minor’ -> bool, ‘sequence’ -> list} }. If False, the sequences are returned as a list of lists.

Return type:

dict of dict or list of list
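
The segmentation by local key can be sketched with itertools.groupby. The function below is a hypothetical illustration of the documented return shape for major_minor=True; the integer keys starting at 0 and the lowercase-means-minor test are assumptions.

```python
from itertools import groupby


def chord_sequences_sketch(localkeys, chords):
    """Hypothetical illustration: one chord sequence per localkey segment."""
    sequences = {}
    pairs = zip(localkeys, chords)
    segments = groupby(pairs, key=lambda pair: pair[0])  # consecutive identical keys
    for i, (key, segment) in enumerate(segments):
        sequences[i] = {
            "localkey": key,
            "localkey_is_minor": key.islower(),  # assumption: lowercase means minor
            "sequence": [chord for _, chord in segment],
        }
    return sequences


localkeys = ["I", "I", "I", "vi", "vi", "I"]
chords = ["I", "V7", "I", "V7", "i", "V"]
sequences = chord_sequences_sketch(localkeys, chords)
```

Since each sequence ends where the local key changes, n-grams computed within a sequence never span a key change.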

ms3.transformations.group_annotations_by_features(at, features='numeral', logger=None)[source]#

Drop exact repetitions of one or several feature columns when occurring under the same localkey (and pedal point). For example, pass features = ['numeral', 'form', 'figbass'] to drop rows where all three features are identical with the previous row _and_ the localkey stays the same. If the column duration_qb is present, it is updated with the new durations, as would be the IntervalIndex if there is one. Uses: nan_eq()

Parameters:

at (pandas.DataFrame) – Annotation table
features (str or list) – Feature or feature combination for which to remove immediate repetitions
dropna (bool) – Also subsumes rows for which all features are NaN, rather than treating them as a new value.

Return type:

pandas.DataFrame

Example

>>> df
+--------------+--------------+-------------+----------+---------------+---------+------+---------+---------+--------------+
|              | quarterbeats | duration_qb | localkey | chord         | numeral | form | figbass | changes | relativeroot |
+==============+==============+=============+==========+===============+=========+======+=========+=========+==============+
| [37.5, 38.5) | 75/2         | 1.0         | I        | viio65(6b3)/V | vii     | o    | 65      | 6b3     | V            |
+--------------+--------------+-------------+----------+---------------+---------+------+---------+---------+--------------+
| [38.5, 40.5) | 77/2         | 2.0         | I        | Ger           | vii     | o    | 65      | b3      | V            |
+--------------+--------------+-------------+----------+---------------+---------+------+---------+---------+--------------+
| [40.5, 41.5) | 81/2         | 1.0         | I        | V(7v4)        | V       |      |         | 7v4     |              |
+--------------+--------------+-------------+----------+---------------+---------+------+---------+---------+--------------+
| [41.5, 43.5) | 83/2         | 2.0         | I        | V(64)         | V       |      |         | 64      |              |
+--------------+--------------+-------------+----------+---------------+---------+------+---------+---------+--------------+
| [43.5, 44.5) | 87/2         | 1.0         | I        | V7(9)         | V       |      | 7       | 9       |              |
+--------------+--------------+-------------+----------+---------------+---------+------+---------+---------+--------------+
| [44.5, 46.5) | 89/2         | 2.0         | I        | V7            | V       |      | 7       |         |              |
+--------------+--------------+-------------+----------+---------------+---------+------+---------+---------+--------------+
| [46.5, 48.0) | 93/2         | 1.5         | I        | I             | I       |      |         |         |              |
+--------------+--------------+-------------+----------+---------------+---------+------+---------+---------+--------------+

>>> group_annotations_by_features(df)
+--------------+--------------+-------------+----------+--------------+---------+-------+
|              | quarterbeats | duration_qb | localkey | relativeroot | numeral | chord |
+==============+==============+=============+==========+==============+=========+=======+
| [37.5, 40.5) | 75/2         | 3.0         | I        | V            | vii     | vii/V |
+--------------+--------------+-------------+----------+--------------+---------+-------+
| [40.5, 46.5) | 81/2         | 6.0         | I        | NaN          | V       | V     |
+--------------+--------------+-------------+----------+--------------+---------+-------+
| [46.5, 48.0) | 93/2         | 1.5         | I        | NaN          | I       | I     |
+--------------+--------------+-------------+----------+--------------+---------+-------+

ms3.transformations.labels2global_tonic(df, cols={}, inplace=False, logger=None)[source]#

Transposes all numerals to their position in the global major or minor scale. This eliminates localkeys and relativeroots. The resulting chords are defined by [numeral, figbass, changes, globalkey_is_minor] (and pedal).

Uses: transform(), rel2abs_key(), resolve_relative_keys() -> str_is_minor(), transpose_changes(), series_is_minor()

Parameters:

df (pandas.DataFrame) – Dataframe containing DCML chord labels that have been split by split_labels() and where the keys have been propagated using propagate_keys(add_bool=True).

cols (dict, optional) –

In case the column names for ['numeral', 'form', 'figbass', 'changes', 'relativeroot', 'localkey', 'globalkey'] deviate, pass a dict, such as

{'chord':           'chord_col_name',
 'pedal':           'pedal_col_name',
 'numeral':         'numeral_col_name',
 'form':            'form_col_name',
 'figbass':         'figbass_col_name',
 'changes':         'changes_col_name',
 'relativeroot':    'relativeroot_col_name',
 'localkey':        'localkey_col_name',
 'globalkey':       'globalkey_col_name'}

inplace (bool, optional) – Pass True if you want to mutate the input.

Returns:

If inplace=False, the relevant features of the transposed chords are returned. Otherwise, the original DataFrame is mutated.

Return type:

pandas.DataFrame

ms3.transformations.make_chord_col(df, cols=None)[source]#: The ‘chord’ column contains the chord part of a DCML label, i.e. without indications of key, pedal, cadence, or phrase. This function can re-create this column, e.g. if the feature columns were changed. To that aim, the function takes a DataFrame and the column names that it adds together, creating new strings. Column names ‘changes’ and ‘relativeroot’, if present, are treated specially (see the code).

ms3.transformations.make_gantt_data(at, last_mn=None, relativeroots=True, mode_agnostic_adjacency=True, logger=None)[source]#

Takes an expanded DCML annotation table and returns a DataFrame with timings of the included key segments, based on the column localkey. The column names are suited for the plotly library. Uses: rel2abs_key, resolve_relative_keys, roman_numeral2fifths, roman_numerals2semitones, labels2global_tonic

Parameters:

at (pandas.DataFrame) – Expanded DCML annotation table.
last_mn (int, optional) – By default, the column quarterbeats is used for computing Start and Finish unless the column is not present, in which case a continuous version of measure numbers (MN) is used. In the latter case you should pass the last measure number of the piece in order to calculate the correct duration of the last key segment; otherwise it will go until the end of the last label’s MN. As soon as you pass a value, the column quarterbeats is ignored even if present. If you want to ignore it but don’t know the last MN, pass -1.
relativeroots (bool, optional) – By default, additional rows are added based on the column relativeroot. Pass False to prevent that.
mode_agnostic_adjacency (bool, optional) – By default (if relativeroots is True), additional rows are added for labels adjacent to temporarily tonicized roots, no matter if the mode is identical or not. For example, before and after a V/V, all V _and_ v labels will be grouped as adjacent segments. Pass False to group only labels with the same mode (only V labels in the example), or None to include no adjacency at all.

ms3.transformations.notes2pcvs(notes, pitch_class_format='tpc', normalize=False, long=False, fillna=True, additional_group_cols=None, ensure_columns=None, logger=None)[source]#

Parameters:

notes (pandas.DataFrame) – Note table to be transformed into a wide or long table of Pitch Class Vectors by grouping via the first (or only) index level. The DataFrame needs to contain at least the columns ‘duration_qb’ and ‘tpc’ or ‘midi’, depending on pitch_class_format.
pitch_class_format (str, optional) –

Defines the type of pitch classes to use for the vectors.

’tpc’ (default): tonal pitch class, such that -1=F, 0=C, 1=G etc.

’name’: tonal pitch class as spelled pitch, e.g. ‘C’, ‘F#’, ‘Abb’ etc.

’pc’: chromatic pitch classes where 0=C, 1=C#/Db, … 11=B/Cb.

’midi’: original MIDI numbers; the results are pitch vectors, not pitch class vectors.
normalize (bool, optional) – By default, the PCVs contain absolute durations in quarter notes. Pass True to normalize the PCV for each group.
long (bool, optional) – By default, the resulting DataFrames have wide format, i.e. each row contains the PCV for one slice. Pass True if you need long format instead, i.e. with a non-unique pandas.IntervalIndex and two columns, [('tpc'|'midi'), 'duration_qb'] where the first column’s name depends on pitch_class_format.
fillna (bool, optional) – By default, if a Pitch class does not appear in a PCV, its value will be 0. Pass False if you want NA instead.
additional_group_cols ((list of) str) – If you would like to maintain some information from other columns of notes in additional index levels, pass their names.
ensure_columns (Iterable, optional) – By default, pitch classes that don’t appear don’t get a column. Pass a value if you want to ensure the presence of particular columns, even if empty. For example, if pitch_class_format='pc' you could pass ensure_columns=range(12).
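
The wide format can be sketched with a groupby followed by unstack. This is a simplified illustration using the documented column names on a made-up note table, not the ms3 implementation.

```python
import pandas as pd

# Hypothetical note table: the first (and only) index level identifies the group.
notes = pd.DataFrame(
    {"tpc": [0, 1, 0, 4], "duration_qb": [1.0, 0.5, 2.0, 1.0]},
    index=pd.Index(["a", "a", "b", "b"], name="piece"),
)

# Wide format: one row per group, one column per pitch class, summed durations.
pcvs = notes.groupby(["piece", "tpc"])["duration_qb"].sum().unstack(fill_value=0.0)

# Counterpart of normalize=True: every row sums to 1.
normalized = pcvs.div(pcvs.sum(axis=1), axis=0)

# Counterpart of ensure_columns: add columns for pitch classes that never occur.
with_all_columns = pcvs.reindex(columns=range(-1, 6), fill_value=0.0)
```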

ms3.transformations.resolve_all_relative_numerals(at, additional_columns=None, inplace=False)[source]#

Resolves Roman numerals that include slash notation such as ‘#vii/ii’ => ‘#i’ or ‘V/V/V’ => ‘VI’ in a major and ‘#VI’ in a minor key. The function expects the columns [‘globalkey_is_minor’, ‘localkey_is_minor’] to be present. The former is necessary only if the column ‘localkey’ is present and needs resolving. Execution will be slightly faster if performed on the entire DataFrame rather than using transform_multiple().

Parameters:

at (pandas.DataFrame) – Annotation table.
additional_columns (str or list) – By default, the function resolves, if present, the columns [‘relativeroot’, ‘pedal’] but here you can name other columns, too. They will be resolved based on the localkey’s mode.
inplace (bool, optional) – By default, a manipulated copy of at is returned. Pass True to mutate instead.

ms3.transformations.segment_by_adjacency_groups(df, cols, na_values='group', group_keys=False, logger=None)[source]#

Drop exact adjacent repetitions within one or a combination of several feature columns and adapt the IntervalIndex and the column ‘duration_qb’ accordingly. Uses: adjacency_groups(), reduce_dataframe_duration_to_first_row()

Parameters:

df (pandas.DataFrame) – DataFrame to be reduced, expected to contain the column duration_qb. In order to use the result as a segmentation, it should have a pandas.IntervalIndex.
cols (list) – Feature columns whose exact, adjacent repetitions should be grouped into a segment, keeping only the first row.
na_values ((list of) str or Any, optional) –

Either pass a list of equal length as cols or a single value that is passed to adjacency_groups() for each. Not dealing with NA values will lead to wrongly grouped segments. The default option is the safest.

’group’ creates individual groups for NA values

’backfill’ or ‘bfill’ groups NA values with the subsequent group

’pad’, ‘ffill’ groups NA values with the preceding group

Any other value works like ‘group’, with the difference that the created groups will be named with this value.
group_keys (bool, optional) – By default, the grouped values will be returned as an appended MultiIndex, differentiating groups via ascending integers. If you want to duplicate the columns’ values, e.g. to account for a custom filling value for na_values, pass True. Beware that this most often results in non-unique index levels.

Returns:

Reduced DataFrame with updated ‘duration_qb’ column and pandas.IntervalIndex on the first level (if present).

Return type:

pandas.DataFrame
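
The idea of adjacency groups can be sketched with a shift-and-cumsum idiom: a new group starts wherever a value differs from its predecessor. The example below is a simplified illustration for a single column without an IntervalIndex and is not the ms3 implementation.

```python
import pandas as pd

df = pd.DataFrame(
    {"numeral": ["V", "V", "I", "V"], "duration_qb": [1.0, 2.0, 1.5, 1.0]}
)

# Ascending integer per run of identical adjacent values.
segment = (df["numeral"] != df["numeral"].shift()).cumsum().rename("segment")

# Keep the first row of every run and sum up the durations it covers.
reduced = df.groupby(segment).agg({"numeral": "first", "duration_qb": "sum"})
```

Note that the two V segments stay separate because they are not adjacent. In this idiom every missing value would start its own group, since NaN never equals NaN, which is why the handling of NA values needs an explicit decision.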

ms3.transformations.segment_by_criterion(df: DataFrame, boolean_mask: Series | array, warn_na: bool = False, logger=None) → DataFrame[source]#

Drop all rows where the boolean mask does not match, and adapt the IntervalIndex and the column ‘duration_qb’ accordingly.

Parameters:

df – DataFrame to be reduced, expected to come with the column duration_qb and a pandas.IntervalIndex.
boolean_mask – Boolean mask where every True value starts a new segment.
warn_na – If the boolean mask starts with any number of False, this first group will be missing from the result. Set warn_na to True if you want the logger to throw a warning in this case.

Returns:

Reduced DataFrame with updated ‘duration_qb’ column and pandas.IntervalIndex on the first level.
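
A minimal sketch of this kind of segmentation follows, assuming a plain integer index instead of an IntervalIndex. The cumulative sum of the mask assigns every row to the segment opened by the most recent True; rows before the first True end up in segment 0, which is dropped.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "label": ["x", "a", "b", "c", "d"],
        "duration_qb": [1.0, 1.0, 2.0, 1.0, 0.5],
    }
)
mask = pd.Series([False, True, False, False, True])

segment_id = mask.cumsum()                              # 0, 1, 1, 1, 2
summed = df["duration_qb"].groupby(segment_id).sum()    # total duration per segment

reduced = df[mask].copy()                               # only rows that start a segment
reduced["duration_qb"] = summed.loc[segment_id[mask]].to_numpy()
```

The first row, which precedes the first True, is missing from the result, as described for warn_na above.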

ms3.transformations.segment_by_interval_index(df, idx, truncate=True)[source]#

Segment a DataFrame into chunks based on a given IntervalIndex.

Parameters:

df (pandas.DataFrame) – DataFrame that has a pandas.IntervalIndex to allow for its segmentation.
idx (pandas.IntervalIndex or pandas.MultiIndex) – Intervals by which to segment df. The index will be prepended to differentiate between segments. If idx is a pandas.MultiIndex, the first level is expected to be a pandas.IntervalIndex.
truncate (bool, optional) – By default, the intervals of the segmented DataFrame will be cut off at segment boundaries and the event’s ‘duration_qb’ will be adapted accordingly. Pass False to prevent that and duplicate overlapping events without adapting their Intervals and ‘duration_qb’.

Returns:

A copy of df where the index levels idx have been prepended and only rows of df with overlapping intervals are included.

Return type:

pandas.DataFrame

ms3.transformations.slice_df(df: DataFrame, quarters_per_slice: float | None = None) → Dict[Interval, DataFrame][source]#

Returns a sliced version of the DataFrame. Slices appear in the IntervalIndex and the contained events’ durations within the slice are shown in the column ‘duration_qb’.

Parameters:

df (pandas.DataFrame) – The DataFrame is expected to come with an IntervalIndex and contain the columns ‘quarterbeats’ and ‘duration_qb’. Those can be obtained through Parse.get_lists(interval_index=True) or Parse.iter_transformed(interval_index=True).
quarters_per_slice (float, optional) – By default, the slices have variable size, from onset to onset. If you pass a value, the slices will have that constant size, measured in quarter notes. For example, pass 1.0 for all slices to have size 1 quarter.

Return type:

pandas.DataFrame

ms3.transformations.transform_multiple(df, func, level=-1, logger=None, **kwargs)[source]#

Applies transformation(s) separately to concatenated pieces that can be differentiated by index level(s).

Parameters:

df (pandas.DataFrame) – Concatenated tables with pandas.MultiIndex.
func (Callable or str) – Function to be applied to the individual tables. For convenience, you can pass strings to call the standard transformers for a particular table type. For example, pass ‘annotations’ to call transform_annotations.
level (int or list) – Argument passed to pandas.DataFrame.groupby(). Defaults to -1, resulting in a GroupBy by all levels except the last. Conversely, you can pass, for instance, 2 to group by the first two levels.
kwargs – Keyword arguments passed to func.

Return type:

pandas.DataFrame
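
The grouping logic behind the default level=-1 (group by all index levels except the last, transform each piece separately, concatenate again) can be sketched in plain pandas. The helper apply_per_group, the function add_share, and the index names below are hypothetical and only illustrate the idea; they are not part of ms3.

```python
import pandas as pd


def apply_per_group(df, func, **kwargs):
    """Apply func to each sub-table identified by all index levels but the last."""
    n_levels = df.index.nlevels - 1
    levels = list(range(n_levels))
    results = {}
    for key, group in df.groupby(level=levels):
        # Depending on the pandas version, a single grouping level may yield 1-tuples.
        if isinstance(key, tuple) and len(key) == 1:
            key = key[0]
        results[key] = func(group.droplevel(levels), **kwargs)
    return pd.concat(results, names=df.index.names[:n_levels])


def add_share(table):
    """Example transformation: each event's share of the piece's total duration."""
    table = table.copy()
    table["share"] = table["duration_qb"] / table["duration_qb"].sum()
    return table


index = pd.MultiIndex.from_tuples(
    [
        ("corpusA", "piece1", 0),
        ("corpusA", "piece1", 1),
        ("corpusA", "piece2", 0),
        ("corpusB", "piece1", 0),
    ],
    names=["corpus", "piece", "i"],
)
df = pd.DataFrame({"duration_qb": [1.0, 3.0, 2.0, 4.0]}, index=index)
result = apply_per_group(df, add_share)
```

Because the shares are computed per piece, the two events of corpusA/piece1 receive 0.25 and 0.75, whereas the single events of the other pieces receive 1.0.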

ms3.transformations.transform_annotations(at, groupby_features=None, resolve_relative=False)[source]#

Wrapper for applying several transformations to an annotation table.

Parameters:

at (pandas.DataFrame) – Annotation table corresponding to a single piece.
groupby_features (str or list) – Argument features passed to group_annotations_by_features().
resolve_relative (bool) – Resolves slash notation (e.g. ‘vii/V’) from Roman numerals in the columns [‘localkey’, ‘relativeroot’, ‘pedal’].

Return type:

pandas.DataFrame

ms3.transformations.transpose_notes_to_localkey(notes)[source]#

Transpose the columns ‘tpc’ and ‘midi’ such that they reflect the local key as if it was C major/minor. This operation is typically required for creating pitch class profiles. Uses: transform(), name2fifths(), roman_numeral2fifths()

Parameters:

notes (pandas.DataFrame) – DataFrame that has at least the columns [‘globalkey’, ‘localkey’, ‘tpc’, ‘midi’].

Returns:

A copy of notes where the columns ‘tpc’ and ‘midi’ are shifted in such a way that tpc=0 and midi=60 match the local tonic (e.g. for the local key A major/minor, each pitch A will have tpc=0 and midi % 12 = 0).

Return type:

pandas.DataFrame
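
The arithmetic behind such a transposition can be illustrated as follows. This is a simplified sketch rather than ms3's code: the helper shift_to_tonic is hypothetical, it takes the local tonic directly as a line-of-fifths value (C = 0, G = 1, A = 3) instead of deriving it from ‘globalkey’ and ‘localkey’, and it assumes that MIDI pitches are always shifted downwards.

```python
import pandas as pd


def shift_to_tonic(notes, tonic_tpc):
    """Shift 'tpc' and 'midi' so that the given tonic maps to tpc=0 and midi % 12 == 0."""
    # One step on the line of fifths corresponds to 7 semitones.
    tonic_pc = (tonic_tpc * 7) % 12
    shifted = notes.copy()
    shifted["tpc"] = notes["tpc"] - tonic_tpc
    shifted["midi"] = notes["midi"] - tonic_pc
    return shifted


# A4, E5 and C#5, i.e. an A major triad.
notes = pd.DataFrame({"tpc": [3, 4, 7], "midi": [69, 76, 73]})
in_a = shift_to_tonic(notes, tonic_tpc=3)
```

After the shift, the triad reads like C, G, E: the ‘tpc’ column becomes 0, 1, 4 and the ‘midi’ column 60, 67, 64.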

ms3.transformations.transform_columns(df, func, columns=None, param2col=None, inplace=False, **kwargs)[source]#

Wrapper function to use transform() on df[columns], leaving the other columns untouched.

Parameters:

df (pandas.DataFrame) – DataFrame where columns (or column combinations) work as function arguments.
func (callable) – Function you want to apply to all elements in columns.
columns (list) – Columns to which you want to apply func.
param2col (dict or list, optional) – Mapping from parameter names of func to column names. If you pass a list of column names, the columns’ values are passed as positional arguments. Pass None if you want to use all columns as positional arguments.
inplace (bool, optional) – Pass True if you want to mutate df rather than getting an altered copy.
**kwargs – Keyword arguments for transform().
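
To picture what a mapping from parameter names to column names does, here is a minimal pandas sketch. The helper apply_with_mapping and the function describe are hypothetical and do not reproduce ms3's transform(); they only show how each row's column values end up as keyword arguments of func.

```python
import pandas as pd


def apply_with_mapping(df, func, param2col, **kwargs):
    """Call func once per row, feeding column values to the mapped parameters."""
    columns = list(param2col.values())
    records = df[columns].to_dict("records")
    values = [
        func(**{param: record[col] for param, col in param2col.items()}, **kwargs)
        for record in records
    ]
    return pd.Series(values, index=df.index)


def describe(fifths, minor, prefix=""):
    """Toy function with two row-wise parameters and one constant keyword argument."""
    return f"{prefix}{fifths}{'m' if minor else 'M'}"


df = pd.DataFrame({"root": [0, -2], "localkey_is_minor": [False, True]})
labels = apply_with_mapping(
    df, describe, {"fifths": "root", "minor": "localkey_is_minor"}, prefix="r"
)
```

The other columns of df are not touched; only the mapped ones are read.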

ms3.transformations.transform_note_columns(df, to, note_cols=['chord_tones', 'added_tones', 'bass_note', 'root'], minor_col='localkey_is_minor', inplace=False, logger=None, **kwargs)[source]#

Turns columns with line-of-fifth tonal pitch classes into another representation.

Uses: transform_columns()

Parameters:

df (pandas.DataFrame) – DataFrame where columns (or column combinations) work as function arguments.
to ({'name', 'iv', 'pc', 'sd', 'rn'}) – The tone representation that you want to get from the note_cols.
- ‘name’: Note names. Should only be used if the stacked fifths actually represent absolute tonal pitch classes rather than intervals over the local tonic. In other words, make sure to use ‘name’ only if 0 means C rather than I.
- ‘iv’: Intervals such that 0 = ‘P1’, 1 = ‘P5’, 4 = ‘M3’, -3 = ‘m3’, 6 = ‘A4’, -6 = ‘D5’ etc.
- ‘pc’: (Relative) chromatic pitch class, or distance from tonic in semitones.
- ‘sd’: Scale degrees such that 0 = ‘1’, -1 = ‘4’, -2 = ‘b7’ in major, ‘7’ in minor etc. This representation requires a boolean column minor_col which is True in those rows where the stacks of fifths occur in a local minor context and False for the others. Alternatively, if all pitches are in the same mode or you simply want to express them as degrees of a particular mode, you can pass the boolean keyword argument minor.
- ‘rn’: Roman numerals such that 0 = ‘I’, -2 = ‘bVII’ in major, ‘VII’ in minor etc. Requires boolean ‘minor’ values, see ‘sd’.
note_cols (list, optional) – List of columns that hold integers or collections of integers that represent stacks of fifths (0 = tonal center, 1 = fifth above, -1 = fourth above, etc).
minor_col (str, optional) – If to is ‘sd’ or ‘rn’, specify a boolean column where the value is True in those rows where the stacks of fifths occur in a local minor context and False for the others.
**kwargs – Keyword arguments for transform().
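
Two of the representations listed above, ‘name’ and ‘pc’, follow directly from line-of-fifths arithmetic. The sketch below is not ms3's implementation; the helpers fifths_to_name, fifths_to_pc and convert_stack are hypothetical and assume that 0 stands for C.

```python
import pandas as pd

NOTE_LETTERS = "FCGDAEB"  # consecutive letters on the line of fifths


def fifths_to_name(fifths):
    """0 -> 'C', 1 -> 'G', -1 -> 'F', 6 -> 'F#', -2 -> 'Bb'."""
    step, accidentals = (fifths + 1) % 7, (fifths + 1) // 7
    sign = "#" if accidentals > 0 else "b"
    return NOTE_LETTERS[step] + sign * abs(accidentals)


def fifths_to_pc(fifths):
    """Chromatic pitch class: each fifth adds 7 semitones modulo 12."""
    return (7 * fifths) % 12


def convert_stack(stack, converter):
    """Convert every element of a collection of fifths."""
    return tuple(converter(fifths) for fifths in stack)


df = pd.DataFrame({"chord_tones": [(0, 4, 1), (-2, 2, -1)]})
names = df["chord_tones"].map(lambda stack: convert_stack(stack, fifths_to_name))
pcs = df["chord_tones"].map(lambda stack: convert_stack(stack, fifths_to_pc))
```

The first row spells a C major triad (C, E, G), the second one a B flat major triad (Bb, D, F).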

ms3.transformations.transpose_chord_tones_by_localkey(df, by_global=False)[source]#

Returns a copy of the expanded table where the scale degrees in the chord tone columns have been transposed by localkey (i.e., they express all chord tones as scale degrees of the globalkey) or, if by_global is set to True, additionally by globalkey (i.e., chord tones as tonal pitch classes, TPC).

Parameters:

df (pandas.DataFrame) – Expanded labels with chord tone columns.
by_global (bool) – By default, the transformed chord tone columns express chord tones as scale degrees (or intervals) of the global tonic. If set to True, they correspond to tonal pitch classes and can be further transformed to note names using transform_note_columns().

Return type:

pandas.DataFrame
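
On the line of fifths, this kind of transposition amounts to an addition. The following sketch is an illustration under that assumption, not ms3's code; transpose_stack is a hypothetical helper that receives the local and global tonics as numbers of fifths instead of reading them from the ‘localkey’ and ‘globalkey’ columns.

```python
def transpose_stack(chord_tones, localkey_fifths, globalkey_fifths=None):
    """Shift chord tones by the local key and, optionally, by the global key."""
    shift = localkey_fifths + (globalkey_fifths or 0)
    return tuple(tone + shift for tone in chord_tones)


tonic_triad = (0, 4, 1)  # root, major third and fifth over the local tonic

# Local key V (one fifth above the global tonic): degrees of the global key.
relative_to_global = transpose_stack(tonic_triad, localkey_fifths=1)

# Additionally shifted by the global key D (two fifths above C): tonal pitch classes.
as_tpc = transpose_stack(tonic_triad, localkey_fifths=1, globalkey_fifths=2)
```

In this example relative_to_global is (1, 5, 2) and as_tpc is (3, 7, 4), which corresponds to the pitches A, C# and E.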

The commandline interface#

Welcome to ms3 parsing

The library offers you the following commands. Add the flag -h to one of them to learn about its parameters.

usage: ms3 [-h] [--version]
           {add,check,compare,convert,empty,extract,metadata,review,transform,update,precommit}
           ...

Positional Arguments#

action

Possible choices: add, check, compare, convert, empty, extract, metadata, review, transform, update, precommit

The action that you want to perform.

Named Arguments#

--version: show program’s version number and exit

Sub-commands#

add#

Add labels from annotation tables to scores.

ms3 add [-h] [--ask] [--use {expanded,labels}] [-d DIR] [-o OUT_DIR] [-n] [-a]
        [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed]
        [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}]
        [--log [LOG]] [-t] [-v] [-s SUFFIX] [--replace]

Named Arguments#

--ask

If several files are available for the selected facet (default: ‘expanded’, see --use), I will pick one automatically. Add --ask if you want me to have you select which ones to compare with the scores.

Default: False

--use

Possible choices: expanded, labels

Which type of labels you want to compare with the ones in the score. Defaults to ‘expanded’, i.e., DCML labels. Set --use labels to use other labels available as TSV and set --ask if several sets of labels are available that you want to choose from.

Default: 'expanded'

-d, --dir

Folder(s) that will be scanned for input files. Defaults to the current working directory if no individual files are passed via --files.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-s, --suffix

Suffix of the new scores with inserted labels. Defaults to _annotated.

Default: '_annotated'

--replace

Remove existing labels from the scores prior to adding. Like calling ms3 empty first.

Default: False

check#

Parse MSCX files and look for errors. In particular, check DCML harmony labels for syntactic correctness.

ms3 check [-h] [--ignore_scores] [--ignore_labels] [--fail]
          [--ignore_metronome] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX]
          [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed]
          [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}]
          [--log [LOG]] [-t] [-v]

Named Arguments#

--ignore_scores

Don’t check scores for encoding errors.

Default: False

--ignore_labels

Don’t check DCML labels for syntactic correctness.

Default: False

--fail

If you pass this argument the process will deliberately fail with an AssertionError when there are any mistakes.

Default: False

--ignore_metronome

Pass this flag if you want the check to pass (not fail) even if there is a warning about a missing metronome mark in the first bar of the score.

Default: False

-d, --dir

Folder(s) that will be scanned for input files. Defaults to the current working directory if no individual files are passed via --files.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

compare#

For MSCX files for which annotation tables exist, create another MSCX file with a coloured label comparison if differences are found.

ms3 compare [-h] [--ask] [--use {expanded,labels}] [--flip] [--safe] [--force]
            [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX]
            [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative]
            [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v] [-c GIT_REVISION]
            [-s SUFFIX]

Named Arguments#

--ask

Default: False

--use

Possible choices: expanded, labels

Default: 'expanded'

--flip

Pass this flag to treat the annotation tables as if updating the scores instead of the other way around, effectively resulting in a swap of the colors in the output files.

Default: False

--safe

Don’t overwrite existing files.

Default: True

--force

Output comparison files even when no differences are found.

Default: False

-d, --dir

Folder(s) that will be scanned for input files. Defaults to the current working directory if no individual files are passed via --files.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-c, --compare

By default, the _reviewed file displays removed labels in red and added labels in green, compared to the version currently represented in the present TSV files, if any. If instead you want a comparison with the TSV files from another Git commit, additionally pass its specifier, e.g. ‘HEAD~3’, <branch-name>, <commit SHA> etc. LATEST_VERSION is accepted as a revision specifier and will result in a comparison with the TSV files at the tag with the highest version number (falling back to HEAD if no tags have been assigned to the repository).

Default: ''

-s, --suffix

Suffix of the newly created comparison files. Defaults to _compared.

Default: '_compared'

convert#

Use your local install of MuseScore to convert MuseScore files.

ms3 convert [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX]
            [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]]
            [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v]
            [-s SUFFIX] [--format FORMAT]
            [--extensions EXTENSIONS [EXTENSIONS ...]] [--safe]

Named Arguments#

-d, --dir

Folder(s) that will be scanned for input files. Defaults to the current working directory if no individual files are passed via --files.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-s, --suffix

Suffix of the converted files. Defaults to an empty string.

Default: ''

--format

Output format of converted files. Defaults to mscx. Other options are {png, svg, pdf, mscz, wav, mp3, flac, ogg, musicxml, mxl, mid}

Default: 'mscx'

--extensions

The file extensions that you want to be converted, separated by spaces. Defaults to mscx mscz.

Default: ['mscx', 'mscz']

--safe

Don’t overwrite existing files.

Default: True

empty#

Remove harmony annotations and store the MuseScore files without them.

ms3 empty [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX]
          [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]]
          [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v]
          [-s SUFFIX]

Named Arguments#

-d, --dir

Folder(s) that will be scanned for input files. Defaults to the current working directory if no individual files are passed via --files.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-s, --suffix

Suffix of the new scores with removed labels. Defaults to _clean.

Default: '_clean'

extract#

Extract selected information from MuseScore files and store it in TSV files.

ms3 extract [-h] [-M [folder]] [-N [folder]] [-R [folder]] [-L [folder]]
            [-X [folder]] [-F [folder]] [-E [folder]] [-C [folder]]
            [-J [folder]] [-D [suffix]] [-p] [--raw] [-u] [--interval_index]
            [--corpuswise] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX]
            [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed]
            [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}]
            [--log [LOG]] [-t] [-v]

Named Arguments#

-M, --measures

Folder where to store TSV files with measure information needed for tasks such as unfolding repetitions.

-N, --notes

Folder where to store TSV files with information on all notes.

-R, --rests

Folder where to store TSV files with information on all rests.

-L, --labels

Folder where to store TSV files with information on all annotation labels.

-X, --expanded

Folder where to store TSV files with expanded DCML labels.

-F, --form_labels

Folder where to store TSV files with all form labels.

-E, --events

Folder where to store TSV files with all events (chords, rests, articulation, etc.) without further processing.

-C, --chords

Folder where to store TSV files with <chord> tags, i.e. groups of notes in the same voice with identical onset and duration. The tables include lyrics, dynamics, articulation, staff- and system texts, tempo marking, spanners, and thoroughbass figures.

-J, --joined_chords

Like -C except that all Chords are substituted with the actual Notes they contain. This is useful, for example, for relating slurs to the notes they group, or bass figures to their bass notes.

-D, --metadata

Set -D to update the ‘metadata.tsv’ files of the respective corpora with the parsed scores. Add a suffix if you want to update ‘metadata{suffix}.tsv’ instead.

-p, --positioning

When extracting labels, include manually shifted position coordinates in order to restore them when re-inserting.

Default: False

--raw

When extracting labels, leave chord symbols encoded instead of turning them into a single column of strings.

Default: True

-u, --unfold

Unfold the repeats for all stored DataFrames.

Default: False

--interval_index

Prepend a column with [start, end) intervals to the TSV files.

Default: False

--corpuswise

Parse one corpus after the other rather than all at once.

Default: False

-d, --dir

Folder(s) that will be scanned for input files. Defaults to the current working directory if no individual files are passed via --files.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False
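
When ms3 extract is driven from a Python script, the command line can be assembled as a list of arguments. The sketch below only composes and prints such a command using flags documented above; the helper build_extract_command and the folder name my_corpus are hypothetical, and nothing is executed.

```python
import shlex


def build_extract_command(directory, facets=("-N", "-X"), unfold=False, dry_run=True):
    """Compose an 'ms3 extract' call from flags listed in this section."""
    command = ["ms3", "extract", "-d", directory, *facets]
    if unfold:
        command.append("-u")  # unfold repeats
    if dry_run:
        command.append("-t")  # test mode: no data is written to disk
    return command


command = build_extract_command("my_corpus", unfold=True)
print(shlex.join(command))
```

With ms3 installed, the resulting list could be handed to subprocess.run(command, check=True); dropping -t would then actually write the TSV files.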

metadata#

Update MSCX files with changes made to metadata.tsv (created via ms3 extract -D [-a]).

ms3 metadata [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX]
             [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]]
             [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v]
             [-s SUFFIX] [-p] [--instrumentation] [--empty] [--remove]

Named Arguments#

-d, --dir

Folder(s) that will be scanned for input files. Defaults to the current working directory if no individual files are passed via --files.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-s, --suffix

Suffix of the new scores with updated metadata fields.

-p, --prelims

Pass this flag if, in addition to updating metadata fields, you also want score headers to be updated from the columns title_text, subtitle_text, composer_text, lyricist_text, part_name_text.

Default: False

--instrumentation

Pass this flag to update the score’s instrumentation based on changed values from ‘staff__instrument’ columns.

Default: False

--empty

Set this flag to also allow empty values to be used for overwriting existing ones.

Default: False

--remove

Set this flag to remove non-default metadata fields that are not columns in the metadata.tsv file anymore.

Default: False

review#

Extract facets, check labels, and create _reviewed files.

ms3 review [-h] [--ignore_scores] [--ignore_labels] [--fail]
           [--ignore_metronome] [--ask] [--use {expanded,labels}] [--flip]
           [--safe] [--force] [-M [folder]] [-N [folder]] [-R [folder]]
           [-L [folder]] [-X [folder]] [-F [folder]] [-E [folder]]
           [-C [folder]] [-J [folder]] [-D [suffix]] [-p] [--raw] [-u]
           [--interval_index] [--corpuswise] [-d DIR] [-o OUT_DIR] [-n] [-a]
           [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed]
           [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}]
           [--log [LOG]] [-t] [-v] [-c [GIT_REVISION]] [--threshold THRESHOLD]

Named Arguments#

--ignore_scores

Don’t check scores for encoding errors.

Default: False

--ignore_labels

Don’t check DCML labels for syntactic correctness.

Default: False

--fail

If you pass this argument the process will deliberately fail with an AssertionError when there are any mistakes.

Default: False

--ignore_metronome

Pass this flag if you want the check to pass (not fail) even if there is a warning about a missing metronome mark in the first bar of the score.

Default: False

--ask

Default: False

--use

Possible choices: expanded, labels

Default: 'expanded'

--flip

Pass this flag to treat the annotation tables as if updating the scores instead of the other way around, effectively resulting in a swap of the colors in the output files.

Default: False

--safe

Don’t overwrite existing files.

Default: True

--force

Output comparison files even when no differences are found.

Default: False

-M, --measures

Folder where to store TSV files with measure information needed for tasks such as unfolding repetitions.

-N, --notes

Folder where to store TSV files with information on all notes.

-R, --rests

Folder where to store TSV files with information on all rests.

-L, --labels

Folder where to store TSV files with information on all annotation labels.

-X, --expanded

Folder where to store TSV files with expanded DCML labels.

-F, --form_labels

Folder where to store TSV files with all form labels.

-E, --events

Folder where to store TSV files with all events (chords, rests, articulation, etc.) without further processing.

-C, --chords

-J, --joined_chords

Like -C except that all Chords are substituted with the actual Notes they contain. This is useful, for example, for relating slurs to the notes they group, or bass figures to their bass notes.

-D, --metadata

Set -D to update the ‘metadata.tsv’ files of the respective corpora with the parsed scores. Add a suffix if you want to update ‘metadata{suffix}.tsv’ instead.

-p, --positioning

When extracting labels, include manually shifted position coordinates in order to restore them when re-inserting.

Default: False

--raw

When extracting labels, leave chord symbols encoded instead of turning them into a single column of strings.

Default: True

-u, --unfold

Unfold the repeats for all stored DataFrames.

Default: False

--interval_index

Prepend a column with [start, end) intervals to the TSV files.

Default: False

--corpuswise

Parse one corpus after the other rather than all at once.

Default: False

-d, --dir

Folder(s) that will be scanned for input files. Defaults to the current working directory if no individual files are passed via --files.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'
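The one-letter choices read like abbreviations of Python's standard logging levels. The mapping below is a sketch under that assumption; LEVELS and resolve_level are hypothetical names, and ms3 may resolve the letters differently.

```python
import logging

# Assumed correspondence between the one-letter choices and logging levels.
LEVELS = {
    "c": logging.CRITICAL,  # documented as "none": only critical messages would pass
    "e": logging.ERROR,
    "w": logging.WARNING,
    "i": logging.INFO,      # the documented default
    "d": logging.DEBUG,     # documented as "maximum"
}


def resolve_level(letter: str = "i") -> int:
    """Translate a letter such as 'w' into the numeric logging level."""
    return LEVELS[letter.lower()]


print(logging.getLevelName(resolve_level("w")))
```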

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-c, --compare

Pass -c if you want the _reviewed file to display removed labels in red and added labels in green, compared to the version currently represented in the present TSV files, if any. If instead you want a comparison with the TSV files from another Git commit, additionally pass its specifier, e.g. ‘HEAD~3’, <branch-name>, <commit SHA> etc. LATEST_VERSION is accepted as a revision specifier and will result in a comparison with the TSV files at the tag with the highest version number (falling back to HEAD if no tags have been assigned to the repository).
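The LATEST_VERSION behaviour (highest version tag, otherwise HEAD) can be sketched as follows. This is not ms3's code: the function name latest_version_tag is hypothetical, the tag format (an optional "v" followed by dot-separated numbers) is an assumption, and the list of tags is passed in rather than read from a repository.

```python
import re


def latest_version_tag(tags, fallback="HEAD"):
    """Pick the tag with the highest version number, or the fallback if none qualifies."""
    versioned = []
    for tag in tags:
        # Assumed tag format: optional "v", then numbers separated by dots.
        match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", tag)
        if match:
            key = tuple(int(part) for part in match.group(1).split("."))
            versioned.append((key, tag))
    # Comparing integer tuples ranks v1.10 above v1.2, unlike string comparison.
    return max(versioned)[1] if versioned else fallback


print(latest_version_tag(["v1.2", "v1.10", "v0.9"]))
print(latest_version_tag([]))
```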

--threshold

Harmony segments where the ratio of non-chord tones vs. chord tones lies above this threshold will be printed in a warning and will cause the check to fail if the --fail flag is set. Defaults to 0.6 (3:2).

Default: 0.6
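The default pairs 0.6 with 3:2. One reading that reconciles the two numbers, offered here as an assumption rather than as ms3's actual computation, is that the threshold applies to the share of non-chord tones among all tones: 3 parts non-chord to 2 parts chord gives 3 / (3 + 2) = 0.6. The function names are hypothetical, and whether ms3 counts notes or weighs them by duration is not stated here.

```python
from fractions import Fraction


def non_chord_share(non_chord, chord):
    """Share of non-chord tones among all tones of a segment (assumed measure)."""
    total = non_chord + chord
    return Fraction(non_chord, total) if total else Fraction(0)


def exceeds_threshold(non_chord, chord, threshold=Fraction(3, 5)):
    """True if the share lies strictly above the threshold (3/5 equals the default 0.6)."""
    return non_chord_share(non_chord, chord) > threshold


print(non_chord_share(3, 2))     # exactly at the default threshold
print(exceeds_threshold(3, 2))   # not above it, so no warning under this reading
print(exceeds_threshold(4, 2))   # 2/3 lies above 3/5
```

Exact fractions are used so that a segment sitting precisely on the threshold is not pushed over it by floating-point rounding.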

transform#

Concatenate and transform TSV data from one or several corpora. Available transformations are unfolding repeats and adding an interval index.
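The interval index uses [start, end) intervals, i.e. intervals that include their start but not their end. A small stdlib-only sketch of that convention follows; the function names are hypothetical, and deriving intervals from onsets and durations is an assumption about the data, not a description of how ms3 builds the column.

```python
from fractions import Fraction


def left_closed_intervals(onsets, durations):
    """Pair each onset with onset + duration to form [start, end) intervals."""
    return [(onset, onset + duration) for onset, duration in zip(onsets, durations)]


def contains(interval, position):
    """Membership test for a left-closed, right-open interval."""
    start, end = interval
    return start <= position < end


half = Fraction(1, 2)
intervals = left_closed_intervals([Fraction(0), half], [half, half])
print(contains(intervals[0], half))  # the end point is excluded
print(contains(intervals[1], half))  # the start point is included
```

Because the end is excluded, two adjacent events never both claim the position at which one stops and the next begins.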

ms3 transform [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX]
              [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]]
              [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v] [-M]
              [-N] [-R] [-L] [-X] [-F [folder]] [-E] [-C] [-D]
              [-s [SUFFIX ...]] [-u] [--interval_index] [--resources] [--safe]
              [--uncompressed] [--dirty]

Named Arguments#

-d, --dir

Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-M, --measures

Concatenate measures TSVs for all selected pieces.

Default: False

-N, --notes

Concatenate notes TSVs for all selected pieces.

Default: False

-R, --rests

Concatenate rests TSVs for all selected pieces (use ms3 extract -R to create those).

Default: False

-L, --labels

Concatenate raw harmony label TSVs for all selected pieces (use ms3 extract -L to create those).

Default: False

-X, --expanded

Concatenate expanded DCML label TSVs for all selected pieces.

Default: False

-F, --form_labels

Concatenate form label TSVs for all selected pieces.

-E, --events

Concatenate events TSVs (notes, rests, articulation, etc.) for all selected pieces (use ms3 extract -E to create those).

Default: False

-C, --chords

Concatenate chords TSVs (<chord> tags group notes in the same voice with identical onset and duration) including lyrics, dynamics, articulation, staff- and system texts, tempo marking, spanners, and thoroughbass figures, for all selected pieces (use ms3 extract -C to create those).

Default: False

-D, --metadata

Output ‘concatenated_metadata.tsv’ with one row per selected piece.

Default: False

-s, --suffix

Pass -s to use standard suffixes or -s SUFFIX to choose your own. In the latter case they will be assigned to the extracted aspects in the order in which they are listed above (capital letter arguments).

-u, --unfold

Unfold the repeats for all concatenated DataFrames.

Default: False

--interval_index

Prepend a column with [start, end) intervals to the TSV files.

Default: False

--resources

Store the concatenated DataFrames as TSV files with resource descriptors rather than in a ZIP with a package descriptor.

Default: False

--safe

Don’t overwrite existing files.

Default: True

--uncompressed

Store the transformed files as uncompressed TSVs rather than writing them into a ZIP file.

Default: False

--dirty

Allows overriding the ‘This repository is dirty’ blocker.

Default: False
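As a usage sketch, the following assembles an ms3 transform call from the flags documented above (-d, -o, -N, -M, -u, --interval_index). The helper transform_command and the directory names are hypothetical; the list it returns could be handed to subprocess.run once ms3 is installed.

```python
import shlex


def transform_command(corpus_dir, out_dir, unfold=False, interval_index=False):
    """Assemble an `ms3 transform` call that concatenates notes and measures."""
    args = ["ms3", "transform", "-d", corpus_dir, "-o", out_dir, "-N", "-M"]
    if unfold:
        args.append("-u")
    if interval_index:
        args.append("--interval_index")
    return args


cmd = transform_command("my_corpus", "transformed", unfold=True)
print(shlex.join(cmd))  # ms3 transform -d my_corpus -o transformed -N -M -u
```

Keeping the command as a list rather than a single string avoids quoting problems when paths contain spaces.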

update#

Convert MSCX files to the latest MuseScore version and move all chord annotations to the Roman Numeral Analysis layer. This command overwrites existing files!!!

ms3 update [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX]
           [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]]
           [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v]
           [-s SUFFIX] [--above] [--safe] [--staff STAFF] [--type TYPE]

Named Arguments#

-d, --dir

Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-s, --suffix

Add this suffix to the filename of every new file.

--above

Display Roman Numerals above the system.

Default: False

--safe

Only moves labels if their temporal positions stay intact.

Default: False

--staff

Which staff you want to move the annotations to. 1=upper staff; -1=lowest staff (default)

Default: -1

--type

Defaults to 1, i.e., moves labels to the Roman Numeral layer. Other types have not been tested!

Default: 1
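Since ms3 update overwrites existing files, it can be worth assembling a dry run first using the documented -t flag (no data is written to disk). The helper below is a hypothetical sketch that only builds the argument list; combining -s with a suffix to write new files instead of overwriting is an assumption based on the option's description.

```python
def update_command(directory, suffix=None, dry_run=True):
    """Assemble an `ms3 update` call; by default nothing is written (-t)."""
    args = ["ms3", "update", "-d", directory]
    if suffix:
        # Documented as: add this suffix to the filename of every new file.
        args += ["-s", suffix]
    if dry_run:
        args.append("-t")
    return args


print(update_command("my_corpus"))
print(update_command("my_corpus", suffix="_updated", dry_run=False))
```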

precommit#

Like ms3 review but also adds the resulting files to the Git index.

ms3 precommit [-h] [--ignore_scores] [--ignore_labels] [--fail]
              [--ignore_metronome] [--ask] [--use {expanded,labels}] [--flip]
              [--safe] [--force] [-M [folder]] [-N [folder]] [-R [folder]]
              [-L [folder]] [-X [folder]] [-F [folder]] [-E [folder]]
              [-C [folder]] [-J [folder]] [-D [suffix]] [-p] [--raw] [-u]
              [--interval_index] [--corpuswise] [-d DIR] [-o OUT_DIR] [-n]
              [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed]
              [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}]
              [--log [LOG]] [-t] [-v] [-c [GIT_REVISION]]
              [--threshold THRESHOLD]
              FILE [FILE ...]

Positional Arguments#

FILE: Shadows the --files argument because pre-commit passes files as positional arguments.
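What "shadowing" means here can be illustrated with a small argparse parser. This is a hypothetical sketch, not the parser ms3 defines: only the FILE positional and the --files option come from the text, and the attribute name positional_files is invented for the illustration.

```python
import argparse

# Hypothetical parser: only FILE and --files are taken from the documentation.
parser = argparse.ArgumentParser(prog="precommit-sketch")
parser.add_argument("positional_files", metavar="FILE", nargs="+")
parser.add_argument("--files", nargs="+", default=None)


def parse(argv):
    """Parse argv and let the positional file names stand in for --files."""
    args = parser.parse_args(argv)
    # pre-commit appends the staged file names as positional arguments,
    # so they take the place of whatever --files would have carried.
    args.files = args.positional_files
    return args


args = parse(["MS3/K279-1.mscx", "MS3/K279-2.mscx"])
print(args.files)
```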

Named Arguments#

--ignore_scores

Don’t check scores for encoding errors.

Default: False

--ignore_labels

Don’t check DCML labels for syntactic correctness.

Default: False

--fail

If you pass this argument the process will deliberately fail with an AssertionError when there are any mistakes.

Default: False

--ignore_metronome

Pass this flag if you want the check to pass (not fail) even if there is a warning about a missing metronome mark in the first bar of the score.

Default: False

--ask

Default: False

--use

Possible choices: expanded, labels

Default: 'expanded'

--flip

Pass this flag to treat the annotation tables as if updating the scores instead of the other way around, effectively resulting in a swap of the colors in the output files.

Default: False

--safe

Don’t overwrite existing files.

Default: True

--force

Output comparison files even when no differences are found.

Default: False

-M, --measures

Folder where to store TSV files with measure information needed for tasks such as unfolding repetitions.

-N, --notes

Folder where to store TSV files with information on all notes.

-R, --rests

Folder where to store TSV files with information on all rests.

-L, --labels

Folder where to store TSV files with information on all annotation labels.

-X, --expanded

Folder where to store TSV files with expanded DCML labels.

-F, --form_labels

Folder where to store TSV files with all form labels.

-E, --events

Folder where to store TSV files with all events (chords, rests, articulation, etc.) without further processing.

-C, --chords

-J, --joined_chords

Like -C except that all Chords are substituted with the actual Notes they contain. This is useful, for example, for relating slurs to the notes they group, or bass figures to their bass notes.

-D, --metadata

Set -D to update the ‘metadata.tsv’ files of the respective corpora with the parsed scores. Add a suffix if you want to update ‘metadata{suffix}.tsv’ instead.

-p, --positioning

When extracting labels, include manually shifted position coordinates in order to restore them when re-inserting.

Default: False

--raw

When extracting labels, leave chord symbols encoded instead of turning them into a single column of strings.

Default: True

-u, --unfold

Unfold the repeats for all stored DataFrames.

Default: False

--interval_index

Prepend a column with [start, end) intervals to the TSV files.

Default: False

--corpuswise

Parse one corpus after the other rather than all at once.

Default: False

-d, --dir

Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.

Default: the current working directory

-o, --out

Output directory. For conversion, an absolute path will result in a copy of the original sub-folder structure, whereas a relative path will contain all converted files next to each other.

-n, --nonrecursive

Treat DIR as single corpus even if it contains corpus directories itself.

Default: False

-a, --all

By default, only files listed in the ‘piece’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.

Default: False

-i, --include

Select only files whose names include this string or regular expression.

-e, --exclude

Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.

-f, --folders

Select only folders whose names include this string or regular expression.

-m, --musescore

Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).

--reviewed

By default, review files and folders are excluded from parsing. With this option, they will be included, too.

Default: False

--files

--iterative

Do not use all available CPU cores in parallel to speed up batch jobs.

Default: False

-l, --level

Choose how many log messages you want to see: c (none), e, w, i, d (maximum)

Default: 'i'

--log

Can be a file path or directory path. Relative paths are interpreted relative to the current directory.

-t, --test

No data is written to disk.

Default: False

-v, --verbose

Show more output such as files discarded from parsing.

Default: False

-c, --compare

--threshold

Default: 0.6

Unittests#

ms3 has a test suite that uses the pytest library.

Install dependencies#

Install the library via pip install ms3[testing].

Configuring the tests#

In order to run the tests you need to

clone the unittest_metacorpus including submodules (ask for permission)
in the configuration file new_tests/conftest.py, change the value of CORPUS_DIR to the path containing your clone of the metacorpus (defaults to the user’s home directory)
in the line below, copy the commit SHA of TEST_COMMIT, e.g. 51e4cb5, and check out that commit in your metacorpus (e.g., git checkout 51e4cb5).
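The configuration steps above can be sketched as follows. Only the names CORPUS_DIR and TEST_COMMIT and the example SHA come from the text; the exact form of the lines in new_tests/conftest.py, the helper checkout_command, and the assumption that the clone sits in a folder called unittest_metacorpus are illustrative.

```python
import os

# Hypothetical excerpt in the spirit of new_tests/conftest.py.
CORPUS_DIR = os.path.expanduser("~")  # directory containing your clone of the metacorpus
TEST_COMMIT = "51e4cb5"               # commit the metacorpus clone should be checked out at


def checkout_command(corpus_dir=CORPUS_DIR, commit=TEST_COMMIT):
    """Build the git call that puts the metacorpus clone at the expected commit."""
    repo = os.path.join(corpus_dir, "unittest_metacorpus")  # assumed folder name
    return ["git", "-C", repo, "checkout", commit]


print(checkout_command())
```

The returned list can be passed to subprocess.run, which is equivalent to running git checkout 51e4cb5 inside the clone.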

Running the tests#

In the commandline, head to your ms3 folder and call pytest new_tests. Alternatively, some IDEs allow you to right-click on the folder new_tests and select something like Run pytest in new_tests.
