ms3.utils package
Submodules
ms3.utils.concat_metadata module
ms3.utils.constants module
- ms3.utils.constants.COMPUTED_METADATA_COLUMNS = ['TimeSig', 'KeySig', 'last_mc', 'last_mn', 'length_qb', 'last_mc_unfolded', 'last_mn_unfolded', 'length_qb_unfolded', 'volta_mcs', 'all_notes_qb', 'n_onsets', 'n_onset_positions', 'guitar_chord_count', 'form_label_count', 'label_count', 'annotated_key']#
Automatically computed columns
- ms3.utils.constants.DCML_METADATA_COLUMNS = ['harmony_version', 'annotators', 'reviewers', 'score_integrity', 'composed_start', 'composed_end', 'composed_source']#
Arbitrary column names used in the DCML corpus initiative
- ms3.utils.constants.MUSESCORE_METADATA_FIELDS = ['composer', 'workTitle', 'movementNumber', 'movementTitle', 'workNumber', 'poet', 'lyricist', 'arranger', 'copyright', 'creationDate', 'mscVersion', 'platform', 'source', 'translator']#
Default fields available in the File -> Score Properties… menu.
- ms3.utils.constants.VERSION_COLUMNS = ['musescore', 'ms3_version']#
Software versions
- ms3.utils.constants.MUSESCORE_HEADER_FIELDS = ['title_text', 'subtitle_text', 'lyricist_text', 'composer_text', 'part_name_text']#
Default text fields in MuseScore
- ms3.utils.constants.AUTOMATIC_COLUMNS = ['TimeSig', 'KeySig', 'last_mc', 'last_mn', 'length_qb', 'last_mc_unfolded', 'last_mn_unfolded', 'length_qb_unfolded', 'volta_mcs', 'all_notes_qb', 'n_onsets', 'n_onset_positions', 'guitar_chord_count', 'form_label_count', 'label_count', 'annotated_key', 'musescore', 'ms3_version', 'has_drumset', 'ambitus', 'subdirectory', 'rel_path']#
This combination of column names is excluded when updating metadata fields in MuseScore files via ms3 metadata.
- ms3.utils.constants.METADATA_COLUMN_ORDER = ['piece', 'TimeSig', 'KeySig', 'last_mc', 'last_mn', 'length_qb', 'last_mc_unfolded', 'last_mn_unfolded', 'length_qb_unfolded', 'volta_mcs', 'all_notes_qb', 'n_onsets', 'n_onset_positions', 'guitar_chord_count', 'form_label_count', 'label_count', 'annotated_key', 'harmony_version', 'annotators', 'reviewers', 'score_integrity', 'composed_start', 'composed_end', 'composed_source', 'composer', 'workTitle', 'movementNumber', 'movementTitle', 'workNumber', 'poet', 'lyricist', 'arranger', 'copyright', 'creationDate', 'mscVersion', 'platform', 'source', 'translator', 'title_text', 'subtitle_text', 'lyricist_text', 'composer_text', 'part_name_text', 'musescore', 'ms3_version', 'subdirectory', 'rel_path', 'has_drumset', 'ambitus']#
The default order in which columns of metadata.tsv files are to be sorted.
- ms3.utils.constants.STANDARD_NAMES = ['notes_and_rests', 'rests', 'notes', 'measures', 'events', 'labels', 'chords', 'expanded', 'harmonies', 'cadences', 'form_labels', 'MS3', 'scores']#
list – Indicators for corpora: if a folder contains any file or folder whose name begins or ends with any of these names, the function iterate_corpora() considers it a corpus.
- ms3.utils.constants.DCML_REGEX = re.compile('\n^(\\.?\n ((?P<globalkey>[a-gA-G](b*|\\#*))\\.)?\n ((?P<localkey>((b*|\\#*)(VII|VI|V|IV|III|II|I|vii|vi|v|iv|iii|ii|i)/?)+)\\.)?\n ((?P<pedal>((b*|\\#*)(VII|VI|V|IV|III|II|I|vii|vi|v|iv|iii, re.VERBOSE)#
str – Constant with a regular expression that recognizes labels conforming to the DCML harmony annotation standard, excluding those consisting of two alternatives.
- ms3.utils.constants.DCML_DOUBLE_REGEX = re.compile('\n ^(?P<first>\n (\\.?\n ((?P<globalkey>[a-gA-G](b*|\\#*))\\.)?\n , re.VERBOSE)#
str – Constant with a regular expression that recognizes complete labels conforming to the DCML harmony annotation standard, including those consisting of two alternatives, without having to split them. It is simply a doubled version of DCML_REGEX.
- ms3.utils.constants.FORM_DETECTION_REGEX = '^\\d{1,2}.*?:'#
str – Following Gotham & Ireland (@ISMIR 20 (2019): “Taking Form: A Representation Standard, Conversion Code, and Example Corpus for Recording, Visualizing, and Studying Analyses of Musical Form”), detects form labels as those strings that start with an indication of the hierarchical level (one or two digits) followed by a colon. By extension (Llorens et al., forthcoming), allows one or more ‘i’ characters or any other alphabet character to further specify the level.
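As a minimal sketch of how this pattern behaves, the snippet below copies the regular expression from the constant above and applies it with the standard `re` module. The helper name `looks_like_form_label` and the sample labels are made up for illustration and are not part of ms3.

```python
import re

# Pattern copied from the constant documented above.
FORM_DETECTION_REGEX = r"^\d{1,2}.*?:"


def looks_like_form_label(label: str) -> bool:
    """Hypothetical helper: True if the string starts with a level indication and a colon."""
    return re.match(FORM_DETECTION_REGEX, label) is not None


print(looks_like_form_label("1: first theme"))   # one digit followed by a colon
print(looks_like_form_label("2ii: transition"))  # level further specified by letters
print(looks_like_form_label("V7"))               # no leading digit, so not a form label
```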
- ms3.utils.constants.SLICE_INTERVAL_REGEX = '[\\[\\)]((?:\\d+\\.?\\d*)|(?:\\.\\d+)), ((?:\\d+\\.?\\d*)|(?:\\.\\d+))[\\)\\]]'#
Regular expression for slice interval in open/closed notation and any flavour of floating point numbers, e.g. [0, 1.5) or (.5, 2.]
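The following sketch shows one way to use the pattern for parsing an interval string into two floats. The regular expression is copied from the constant above; the helper `parse_slice_interval` is hypothetical. The inputs all open with `[`, because the first character class of the pattern, as printed, lists `[` and `)`.

```python
import re

# Pattern copied from the constant documented above.
SLICE_INTERVAL_REGEX = r"[\[\)]((?:\d+\.?\d*)|(?:\.\d+)), ((?:\d+\.?\d*)|(?:\.\d+))[\)\]]"


def parse_slice_interval(text: str):
    """Hypothetical helper: return (start, end) as floats, or None if the text does not match."""
    match = re.fullmatch(SLICE_INTERVAL_REGEX, text)
    if match is None:
        return None
    start, end = match.groups()
    return float(start), float(end)


print(parse_slice_interval("[0, 1.5)"))
print(parse_slice_interval("[.5, 2.]"))
print(parse_slice_interval("0, 1.5"))  # missing brackets
```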
- ms3.utils.constants.rgba#
alias of RGBA
ms3.utils.frictionless_helpers module
- ms3.utils.frictionless_helpers.make_frictionless_schema_descriptor(column_names: Iterable[str], primary_key: Iterable[str] | None = None, **custom_data) dict[source]#
- ms3.utils.frictionless_helpers.make_valid_frictionless_name(name: str, replace_char='_') str[source]#
- ms3.utils.frictionless_helpers.assemble_resource_descriptor(resource_name: str, filepath: str, schema: str | dict, innerpath: str | None = None, **kwargs)[source]#
- ms3.utils.frictionless_helpers.get_truncated_hash(S: str | ~typing.Iterable[str], hash_func=<built-in function openssl_sha1>, length=10) str[source]#
Computes the given hash function for the given string(s) and truncates the result to the given length.
- Raises:
ValueError – If the hash function cannot be computed for any of the strings in S.
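The idea of a truncated hash can be sketched with the standard library alone. This is not the ms3 implementation: how several strings are combined, the UTF-8 encoding, and the use of the hexadecimal digest are assumptions made for this sketch, and `truncated_hash_sketch` is a hypothetical name.

```python
import hashlib
from typing import Iterable, Union


def truncated_hash_sketch(
    S: Union[str, Iterable[str]], hash_func=hashlib.sha1, length: int = 10
) -> str:
    """Hash one or several strings and keep the first `length` hex characters.

    Assumption: several strings are fed into one hasher in the given order.
    """
    strings = [S] if isinstance(S, str) else list(S)
    hasher = hash_func()
    for s in strings:
        try:
            hasher.update(s.encode("utf-8"))
        except AttributeError as e:
            # Non-string elements have no .encode(); mirror the documented ValueError.
            raise ValueError(f"Cannot compute hash for {s!r}") from e
    return hasher.hexdigest()[:length]


identifier = truncated_hash_sketch(("mc", "mn", "timesig"))
print(identifier, len(identifier))
```

Because the result depends only on the sequence of strings, the same column names always yield the same identifier, which is what makes such a hash usable as a schema file name.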
- ms3.utils.frictionless_helpers.get_schema_or_url(facet: str, column_names: Tuple[str], index_levels: Tuple[str] | None = None, base_local_path='/home/docs/checkouts/readthedocs.org/user_builds/ms3/checkouts/latest/frictionless_schemas', base_url='https://raw.githubusercontent.com/DCMLab/frictionless_schemas/main/', **kwargs) str | dict[source]#
Given a facet name (= subfolder) and a tuple of [index column names +] column names, compute an identifier. If that schema exists under <base_url>/<facet>/<identifier>.schema.yaml, return that URL; otherwise create the frictionless schema descriptor based on the column names, using the descriptions and types that ms3 stores for known column names (see make_frictionless_schema_descriptor()) and treating unknown columns as string fields. In the latter case, the YAML file is written to <base_local_path>/<facet>/<identifier>.schema.yaml, so specify local and remote bases such that the latter can be easily updated with the former. Both types of function output can be used for the schema key in a frictionless table resource descriptor.
- Parameters:
facet – Name of the subfolder where to store the schema descriptor.
column_names – Names of the schema fields. Used for generating the hash-based identifier and for creating the actual frictionless descriptor based on known field names. Unknown field names are assumed to be strings.
index_levels – Additional index column names prepended to the column names. They are specified separately because in the frictionless schema they are declared under primaryKey, meaning that the IDs will be validated on the basis of being required and unique.
base_local_path – Schema descriptors will be created locally under <base_local_path>/<facet>/<identifier>.schema.yaml, unless the schema is found online. The purpose of this is to allow for easy updating of the online repository by setting this argument to a local clone. The default value is <ms3>/frictionless_schemas/, a submodule (if initialized) corresponding to DCMLab/frictionless_schemas.
base_url – If the schema descriptor is found at <base_url>/<facet>/<identifier>.schema.yaml, the function returns the URL rather than the descriptor dict.
**kwargs – Arbitrary key-value pairs that will be added to the frictionless schema descriptor as “custom” metadata.
Returns:
- ms3.utils.frictionless_helpers.get_schema(df: DataFrame, facet: str, include_index_levels: bool = False, base_local_path='/home/docs/checkouts/readthedocs.org/user_builds/ms3/checkouts/latest/frictionless_schemas', base_url='https://raw.githubusercontent.com/DCMLab/frictionless_schemas/main/', **kwargs) dict | str[source]#
Given a dataframe and a facet name, return a frictionless schema descriptor for the dataframe. If the schema with the exact same sequence of columns (and index levels) is accessible online at base_url/facet/<identifier>.schema.yaml, return that URL, otherwise return the descriptor itself as a dict. In both cases, the schema is stored at base_local_path/facet/<identifier>.schema.yaml if it does not exist.
- Parameters:
df – Dataframe to create a schema for.
facet – Facet that the dataframe describes, used as subfolder and added as custom metadata to the schema.
include_index_levels – If False (default), the index levels are not described, assuming that they will not be written to disk (otherwise, validation error). Set to True to add all index levels to the described columns and, in addition, to make them the primaryKey (which, in frictionless, implies the constraints “required” & “unique”).
base_local_path – Schema descriptors will be created locally under <base_local_path>/<facet>/<identifier>.schema.yaml, unless the schema is found online. The purpose of this is to allow for easy updating of the online repository by setting this argument to a local clone. The default value is <ms3>/frictionless_schemas/, a submodule (if initialized) corresponding to DCMLab/frictionless_schemas.
base_url – If the schema descriptor is found at <base_url>/<facet>/<identifier>.schema.yaml, the function returns the URL rather than the descriptor dict.
**kwargs – Arbitrary key-value pairs that will be added to the frictionless schema descriptor as “custom” metadata.
Returns:
- ms3.utils.frictionless_helpers.store_as_json_or_yaml(descriptor_dict: dict, descriptor_path: str, logger=None)[source]#
- ms3.utils.frictionless_helpers.make_resource_descriptor(df: DataFrame, piece_name: str, facet: str, filepath: str | None = None, innerpath: str | None = None, include_index_levels: bool = False, **kwargs) dict[source]#
Given a dataframe and information about resource name and type and relative paths, create a frictionless resource descriptor.
- Parameters:
df – Dataframe to be described. It is assumed (that is, blindly trusted) that it corresponds, or will correspond, to the file at filepath (or innerpath if zipped, see below).
piece_name – Will be combined with facet to form the resource name.
facet – Will be combined with piece_name to form the resource name. Specified separately because it is relevant for the schema.
filepath – The relative path to the resource stored on disk, relative to the descriptor’s location. Defaults to “<piece_name>.<facet>.tsv”. Can be a path to a ZIP file, in which case the resource is stored in the ZIP file at innerpath.
innerpath – If filepath is a ZIP file, the resource is stored in the ZIP file at innerpath. Defaults to “<piece_name>.<facet>.tsv”.
**kwargs – Additional keyword arguments written as metadata into the descriptor.
- Returns:
A frictionless resource descriptor dictionary.
- ms3.utils.frictionless_helpers.make_and_store_resource_descriptor(df: DataFrame, directory: str, piece_name: str, facet: str, filepath: str | None = None, innerpath: str | None = None, descriptor_extension: Literal['json', 'yaml'] = 'json', include_index_levels: bool = False, logger=None, **kwargs) str[source]#
Make a resource descriptor for a given dataframe, store it to disk, and return the filepath.
- Parameters:
df – Dataframe to be described.
directory – Where to store the descriptor file.
piece_name – Will be combined with facet to form the resource name.
facet – Will be combined with piece_name to form the resource name. Specified separately because it is relevant for the schema.
filepath – The relative path to the resource stored on disk, relative to the descriptor’s location. Defaults to “<piece_name>.<facet>.tsv”. Can be a path to a ZIP file, in which case the resource is stored in the ZIP file at innerpath.
innerpath – If filepath is a ZIP file, the resource is stored in the ZIP file at innerpath. Defaults to “<piece_name>.<facet>.tsv”.
**kwargs – Additional keyword arguments written as metadata into the descriptor.
- Returns:
The path to the stored descriptor file.
- ms3.utils.frictionless_helpers.validate_descriptor_at_path(descriptor_path: str, raise_exception: bool = True, write_or_remove_errors_file: bool = True, logger=None) Report[source]#
- ms3.utils.frictionless_helpers.make_and_store_and_validate_resource_descriptor(df: DataFrame, directory: str, piece_name: str, facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], filepath: str | None = None, innerpath: str | None = None, descriptor_extension: Literal['json', 'yaml'] = 'json', include_index_levels: bool = False, raise_exception: bool = True, write_or_remove_errors_file: bool = True, logger=None, **kwargs) Report[source]#
Make a resource descriptor for a given dataframe, store it to disk, and return a validation report.
- Parameters:
df – Dataframe to be described.
directory – Where to store the descriptor file.
piece_name – Will be combined with facet to form the resource name.
facet – Will be combined with piece_name to form the resource name. Specified separately because it is relevant for the schema.
filepath – The relative path to the resource stored on disk, relative to the descriptor’s location. Defaults to “<piece_name>.<facet>.tsv”. Can be a path to a ZIP file, in which case the resource is stored in the ZIP file at innerpath.
innerpath – If filepath is a ZIP file, the resource is stored in the ZIP file at innerpath. Defaults to “<piece_name>.<facet>.tsv”.
include_index_levels – If False (default), the index levels are not described, assuming that they will not be written to disk (otherwise, validation error). Set to True to add all index levels to the described columns and, in addition, to make them the primaryKey (which, in frictionless, implies the constraints “required” & “unique”). In order to include the index levels as columns, but not as primaryKey, simply pass df.reset_index() to the function.
raise_exception – If True (default), raise if the resource is not valid. Only relevant when frictionless=True (i.e., by default).
write_or_remove_errors_file – If True (default), write a .errors file if the resource is not valid, otherwise remove it if it exists. Only relevant when frictionless=True (i.e., by default).
**kwargs – Additional keyword arguments written as metadata into the descriptor.
- Returns:
A frictionless validation report. report.valid returns a boolean that is True if successfully validated.
- ms3.utils.frictionless_helpers.is_range_index_equivalent(idx: Index) bool[source]#
Check if a given index is a RangeIndex with the same start, stop, and step as the default RangeIndex.
- ms3.utils.frictionless_helpers.store_dataframe_resource(df: DataFrame, directory: str, piece_name: str, facet: str, pre_process: bool = True, zipped: bool = False, frictionless: bool = True, descriptor_extension: Literal['json', 'yaml', None] = 'json', raise_exception: bool = True, write_or_remove_errors_file: bool = True, logger=None, custom_metadata: dict = None, **kwargs) str | None[source]#
Write a DataFrame to a TSV or CSV file together with its frictionless resource descriptor. If the resource comes with a single RangeIndex level, the index will be omitted from the TSV and the descriptor. If it comes with more than one level (a MultiIndex), the levels will be included as the left-most columns and declared as “primaryKey” in the descriptor. Uses: write_tsv()
- Parameters:
df – DataFrame to write to disk and to store a descriptor for (if default frictionless=True).
directory – Where to write the file(s).
piece_name – Name of the piece, used for the file name(s).
facet – Name of the facet, used for the file name(s).
pre_process – By default, DataFrame cells containing lists and tuples will be transformed to strings and Booleans will be converted to 0 and 1 (otherwise they will be written out as True and False). Pass False to prevent this.
zipped – If set to True, the TSV file will be written into a zip archive called <piece_name>.zip.
frictionless – If True (default), a frictionless resource descriptor will be written to disk as well.
raise_exception – If True (default), raise if the resource is not valid. Only relevant when frictionless=True (i.e., by default).
write_or_remove_errors_file – If True (default), write a .errors file if the resource is not valid, otherwise remove it if it exists. Only relevant when frictionless=True (i.e., by default).
**kwargs – Additional keyword arguments will be passed on to pandas.DataFrame.to_csv(). Default arguments are index=False and sep='\t' (assuming extension ‘.tsv’, see above) and, if zipped=True, to the corresponding arguments.
- Returns:
If frictionless=False, the path to the written resource. If frictionless=True, the path to the written descriptor, or None if it could not be generated.
- ms3.utils.frictionless_helpers.store_dataframes_package(dataframes: DataFrame | Iterable[DataFrame], facets: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'] | Literal['metadata', 'unknown'] | Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'metadata', 'unknown']], directory: str, piece_name: str, pre_process: bool = True, zipped: bool = True, frictionless: bool = True, descriptor_extension: Literal['json', 'yaml', None] = 'json', raise_exception: bool = True, write_or_remove_errors_file: bool = True, logger=None, custom_metadata: dict | None = None)[source]#
Write a DataFrame to a TSV or CSV file together with its frictionless resource descriptor. Uses: write_tsv()
- Parameters:
dataframes – DataFrames to write into the same zip archive forming a datapackage.
facets – Name of the facets, one per given dataframe. Appended to the file names of the TSV files in the form <piece_name>.<facet>.tsv.
directory – Where to create the ZIP file and its descriptor.
piece_name – Name of the piece, used both for the names of the ZIP file and of the TSV files it includes.
pre_process – By default, DataFrame cells containing lists and tuples will be transformed to strings and Booleans will be converted to 0 and 1 (otherwise they will be written out as True and False). Pass False to prevent this.
zipped – If set to False, the TSV file will not be written into a zip archive called <piece_name>.zip.
frictionless – If True (default), the package is written together with a frictionless package descriptor JSON/YAML file that includes column schemas of the included TSV files, which are used to validate them all at once.
raise_exception – If True (default), raise if the resource is not valid. Only relevant when frictionless=True (i.e., by default).
write_or_remove_errors_file – If True (default) write a .errors file if the resource is not valid, otherwise remove it if it exists. Only relevant when frictionless=True (i.e., by default).
ms3.utils.functions module
- class ms3.utils.functions.map_dict[source]#
Bases: dict
Such a dictionary can be mapped to a Series to replace its values while leaving the values that are absent from the dict keys intact.
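One common way to obtain this behaviour is a dict subclass that defines `__missing__` and returns the key itself. The sketch below illustrates that pattern with plain Python; `MapDictSketch` and the sample values are hypothetical, and whether ms3 implements `map_dict` exactly this way is an assumption.

```python
class MapDictSketch(dict):
    """Hypothetical stand-in for map_dict: keys that are absent map to themselves."""

    def __missing__(self, key):
        return key


replacements = MapDictSketch({"Ger": "German sixth", "It": "Italian sixth"})
values = ["Ger", "V7", "It"]

# Values found among the keys are replaced, all others pass through unchanged.
replaced = [replacements[v] for v in values]
print(replaced)
```

pandas documents that `Series.map()` respects `__missing__` on dict subclasses instead of inserting NaN, which is why such a dictionary leaves unmapped values intact when mapped to a Series.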
- ms3.utils.functions.assert_all_lines_equal(before, after, original, tmp_file)[source]#
Compares two multiline strings to test equality.
- ms3.utils.functions.assert_dfs_equal(old, new, exclude=[])[source]#
Compares the common columns of two DataFrames to test equality. Uses: nan_eq()
- ms3.utils.functions.ambitus2oneliner(ambitus)[source]#
Turns a metadata['parts'][staff_id] dictionary into a string.
- ms3.utils.functions.changes2list(changes, sort=True)[source]#
Splits a string of changes into a list of 4-tuples.
Example
>>> changes2list('+#7b5')
[('+#7', '+', '#', '7'), ('b5', '', 'b', '5')]
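The 4-tuples in the example above have the shape that `re.findall()` returns for a pattern with one outer and three inner groups. The following sketch reproduces the example under that assumption; the pattern `CHANGE_REGEX` and the function name are hypothetical, and the `sort` parameter is not covered.

```python
import re

# Hypothetical pattern: optional prefix (+, -, ^, v), optional accidentals, then a number.
CHANGE_REGEX = r"((\+|-|\^|v)?(#+|b+)?(\d+))"


def changes2list_sketch(changes: str):
    """Split a string of changes into (full, prefix, accidentals, number) tuples."""
    return re.findall(CHANGE_REGEX, changes)


print(changes2list_sketch("+#7b5"))
```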
- ms3.utils.functions.changes2tpc(changes, numeral, minor=False, root_alterations=False, logger=None)[source]#
Given a numeral and changes, computes the intervals that the changes represent. Changes do not express absolute intervals but instead depend on the numeral and the mode.
Uses: split_scale_degree(), changes2list()
- Parameters:
changes (str) – A string of changes following the DCML harmony standard.
numeral (str) – Roman numeral. If it is preceded by accidentals, it depends on the parameter root_alterations whether these are taken into account.
minor (bool, optional) – Set to True if the numeral occurs in a minor context.
root_alterations (bool, optional) – Set to True if accidentals of the root should change the result.
- ms3.utils.functions.check_labels(df, regex, column='label', split_regex=None, return_cols=['mc', 'mc_onset', 'staff', 'voice'])[source]#
Checks the labels in column against regex and returns those that don’t match.
- Parameters:
df (pandas.DataFrame) – DataFrame containing a column with labels.
regex (str) – Regular expression that incorrect labels don’t match.
column (str, optional) – Column name where the labels are. Defaults to ‘label’.
split_regex (str, optional) – If you pass a regular expression (or simple string), it will be used to split the labels before checking the resulting column separately. Instead, pass True to use the default (a ‘-’ that does not precede a scale degree).
return_cols (list, optional) – Pass a list of the DataFrame columns that you want to be displayed for the wrong labels.
- Returns:
df – DataFrame with wrong labels.
- Return type:
- ms3.utils.functions.color2rgba(c)[source]#
Pass a RGB or RGBA tuple, HTML color or name to convert it to RGBA
- ms3.utils.functions.color_name2format(n, format='rgb')[source]#
Converts a single CSS3 name into one of ‘HTML’, ‘rgb’, or ‘rgba’
- ms3.utils.functions.color_params2rgba(color_name=None, color_html=None, color_r=None, color_g=None, color_b=None, color_a=None, logger=None)[source]#
For functions where the color can be specified in four different ways (HTML string, CSS name, RGB, or RGBA), convert the given parameters to RGBA.
- Parameters:
color_name (str, optional) – As a name you can use CSS colors or MuseScore colors (see MS3_COLORS).
color_html (str, optional) – An HTML color needs to be a string of length 6.
color_r (int, optional) – If you specify the color as RGB(A), you also need to specify color_g and color_b.
color_g (int, optional) – If you specify the color as RGB(A), you also need to specify color_r and color_b.
color_b (int, optional) – If you specify the color as RGB(A), you also need to specify color_r and color_g.
color_a (int, optional) – If you have specified an RGB color, the alpha value defaults to 255 unless specified otherwise.
- Returns:
namedtuple with four integers.
- Return type:
rgba
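To illustrate one of the four input forms, the sketch below converts a six-character HTML color string into a named tuple of four integers, using the documented alpha default of 255. The `RGBA` stand-in, its field names, and `html2rgba_sketch` are hypothetical and only mimic the return value described above.

```python
from collections import namedtuple

# Stand-in for the library's named tuple; the field names are assumed.
RGBA = namedtuple("RGBA", ["r", "g", "b", "a"])


def html2rgba_sketch(color_html: str, color_a: int = 255) -> RGBA:
    """Convert a six-character hex string such as 'ff8000' into an RGBA tuple."""
    if len(color_html) != 6:
        raise ValueError(f"Expected a string of length 6, got {color_html!r}")
    # Each pair of hex digits encodes one channel.
    r, g, b = (int(color_html[i : i + 2], 16) for i in (0, 2, 4))
    return RGBA(r, g, b, color_a)


print(html2rgba_sketch("ff8000"))
```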
- ms3.utils.functions.commonprefix(paths, sep='/')[source]#
Returns common prefix of a list of paths. Uses: allnamesequal(), itertools.takewhile()
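The two helpers named above suggest the well-known recipe that compares paths level by level rather than character by character. A sketch of that recipe, assuming this is what the function does (the name `commonprefix_sketch` and the example paths are made up):

```python
from itertools import takewhile


def allnamesequal(names):
    """True if all components at one directory level are identical."""
    return all(n == names[0] for n in names[1:])


def commonprefix_sketch(paths, sep="/"):
    """Return the longest prefix of complete path components shared by all paths."""
    by_level = zip(*[p.split(sep) for p in paths])
    return sep.join(level[0] for level in takewhile(allnamesequal, by_level))


print(commonprefix_sketch(["/home/user/corpus/a.mscx", "/home/user/corpus/sub/b.mscx"]))
```

Comparing whole components avoids the pitfall of character-wise prefixes, where `/data/corpus1` and `/data/corpus2` would share the misleading prefix `/data/corpus`.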
- ms3.utils.functions.compute_mn(measures: DataFrame) Series[source]#
Compute measure number integers from a measures table.
- Parameters:
measures – Measures table with columns [‘mc’, ‘dont_count’, ‘numbering_offset’].
Returns:
- ms3.utils.functions.compute_mn_playthrough(measures: DataFrame, logger=None) Series[source]#
Compute measure number strings from an unfolded measures table, such that the first occurrence of a measure number ends on ‘a’, the second one on ‘b’ etc.
The function requires the column ‘dont_count’ in order to correctly number the return of a completing MC after an incomplete MC with “endrepeat” sign. For example, if a repeated section begins with an upbeat that at first completes MN 16 it will have mn_playthrough ‘16a’ the first time and ‘32a’ the second time (assuming it completes the incomplete MN 32).
- Parameters:
measures – Measures table with columns [‘mc’, ‘mn’, ‘dont_count’]
- Returns:
‘mn_playthrough’ Series of disambiguated measure number strings. If no measure repeats, the result will be equivalent to converting column ‘mn’ to strings and appending ‘a’ to all of them.
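The basic disambiguation can be sketched in plain Python by counting how often each measure number has already occurred. This is a simplified illustration only: it works on a list of measure numbers instead of a measures table, ignores the ‘dont_count’ handling described above, and supports at most 26 occurrences per measure number. The function name is hypothetical.

```python
from collections import Counter
from string import ascii_lowercase


def mn_playthrough_sketch(mns):
    """Append 'a' to the first occurrence of each measure number, 'b' to the second, etc."""
    seen = Counter()
    result = []
    for mn in mns:
        result.append(f"{mn}{ascii_lowercase[seen[mn]]}")
        seen[mn] += 1
    return result


# Measures 1-2 are repeated before the piece continues with measure 3.
print(mn_playthrough_sketch([1, 2, 1, 2, 3]))
```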
- ms3.utils.functions.convert(old, new, MS='mscore', logger=None)[source]#
Calls “MS -fo new old”, which converts old to new with the given MuseScore executable.
- ms3.utils.functions.convert_to_ms4(old, new, MS='mscore', logger=None)[source]#
Calls “MS -fo new old”, which converts old to new with the given MuseScore executable. This function offers a workaround for MuseScore 4’s half-baked commandline conversion.
- ms3.utils.functions.convert_folder(directory=None, file_paths=None, target_dir=None, extensions=[], target_extension='mscx', regex='.*', suffix=None, recursive=True, ms='mscore', overwrite=False, parallel=False, logger=None)[source]#
Convert all files in dir that have one of the extensions to .mscx format using the executable MS.
- Parameters:
directory (str) – Directory in which to look for files to convert.
file_paths (list of dir) – List of file paths to convert. These are not filtered by any means.
target_dir (str) – Directory where to store converted files. Defaults to directory.
extensions (list, optional) – If you want to convert only certain formats, give those, e.g. [‘mscz’, ‘xml’].
recursive (bool, optional) – Subdirectories as well.
MS (str, optional) – Give the path to the MuseScore executable on your system. Needed only if the command ‘mscore’ does not execute MuseScore on your system.
- ms3.utils.functions.convert_from_metadata_tsv(directory=None, file_paths=None, target_dir=None, extensions=[], target_extension='mscx', regex='.*', suffix=None, recursive=True, ms='mscore', overwrite=False, parallel=False, logger=None)[source]#
- ms3.utils.functions.decode_harmonies(df, label_col='label', keep_layer=True, return_series=False, alt_cols='alt_label', alt_separator='-', logger=None)[source]#
MuseScore stores types 2 (Nashville) and 3 (absolute chords) in several columns. This function returns a copy of the DataFrame Annotations.df where the label column contains the strings corresponding to these columns.
- Parameters:
df (pandas.DataFrame) – DataFrame with encoded harmony labels as stored in an Annotations object.
label_col (str, optional) – Column name where the main components (<name> tag) are stored, defaults to ‘label’.
keep_layer (bool, optional) – Defaults to True, retaining the ‘harmony_layer’ column with original layers.
return_series (bool, optional) – If set to True, only the decoded labels column is returned as a Series rather than a copy of df.
alt_cols (str or list, optional) – Column(s) with alternative labels that are joined with the label columns using alt_separator. Defaults to ‘alt_label’. Suppress by passing None.
alt_separator (str, optional) – Separator for joining alt_cols.
- Returns:
Decoded harmony labels.
- Return type:
- ms3.utils.functions.df2md(df: DataFrame, name: str = 'Overview') MarkdownTableWriter[source]#
Alias for dataframe2markdown().
- ms3.utils.functions.dataframe2markdown(df: DataFrame, name: str | None = None) MarkdownTableWriter[source]#
Turns a DataFrame into a Markdown table. The returned writer can be converted into a string.
- ms3.utils.functions.dict2oneliner(d: dict) str[source]#
Turns a dictionary into a single-line string without brackets.
- ms3.utils.functions.resolve_form_abbreviations(token: str, abbreviations: dict, mc: int | str | None = None, fallback_to_lowercase: bool = True, logger=None) str[source]#
Checks for each consecutive substring of the token if it matches one of the given abbreviations and replaces it with the corresponding long name. Trailing numbers are separated by a space in this case.
- Parameters:
token – Individual token after splitting alternative readings.
abbreviations – {abbreviation -> long name} dict for string replacement.
fallback_to_lowercase – By default, the substrings are checked against the dictionary keys and, if unsuccessful, again in lowercase. Pass False to use only the original string.
Returns:
- ms3.utils.functions.distribute_tokens_over_levels(levels: Collection[str], tokens: Collection[str], mc: int | str | None = None, abbreviations: dict = {}, logger=None) Dict[Tuple[str, str], str][source]#
Takes the regex matches of one label and turns them into as many {layer -> token} pairs as the label contains tokens.
- Parameters:
levels – Collection of strings indicating analytical layers.
tokens – Collection of tokens coming along, same size as levels.
mc – Pass the label’s MC to display it in error messages.
abbreviations – {abbreviation -> long name} mapping abbreviations to what they are to be replaced with.
- Returns:
A {(form_tree, level) -> token} dict where form_tree is either ‘’ or a letter between a-h identifying one of several trees annotated in parallel.
- ms3.utils.functions.expand_single_form_label(label: str, default_abbreviations=True, **kwargs) Dict[Tuple[str, str], str][source]#
Splits a form label and applies distribute_tokens_over_levels()
- Parameters:
label – Complete form label including indications of analytical layer(s).
default_abbreviations – By default, each token component is checked against a mapping from abbreviations to long names. Pass False to prevent that.
**kwargs – Abbreviation=’long name’ mappings to resolve individual abbreviations
- Returns:
A DataFrame with one column added per hierarchical layer of analysis, starting from level 0.
- ms3.utils.functions.expand_form_labels(fl: DataFrame, fill_mn_until: int = None, default_abbreviations=True, logger=None, **kwargs) DataFrame[source]#
Expands form labels into a hierarchical view of levels in a table.
- Parameters:
fl – A DataFrame containing raw form labels as retrieved from ms3.Score.mscx.form_labels().
fill_mn_until – Pass the last measure number if you want every measure of the piece to have a row in the tree view, even if it doesn’t come with a form label. This may be desired for a more intuitive view of proportions, rather than seeing all form labels right below each other. In order to add the empty rows without knowing the number of measures, pass -1.
default_abbreviations – By default, each token component is checked against a mapping from abbreviations to long names. Pass False to prevent that.
**kwargs – Abbreviation=’long name’ mappings to resolve individual abbreviations
- Returns:
A DataFrame with one column added per hierarchical layer of analysis, starting from level 0.
- ms3.utils.functions.add_collections(left: Series, right: Collection, dtype: Dtype) Series[source]#
- ms3.utils.functions.add_collections(left: ndarray[Any, dtype[_ScalarType_co]], right: Collection, dtype: Dtype) ndarray[Any, dtype[_ScalarType_co]]
- ms3.utils.functions.add_collections(left: list, right: Collection, dtype: Dtype) list
- ms3.utils.functions.add_collections(left: tuple, right: Collection, dtype: Dtype) tuple
Zip-adds together the strings (by default) contained in two collections regardless of their types (think of adding two columns together element-wise). Pass another dtype if you want the values to be converted to another datatype before adding them together.
- ms3.utils.functions.cast2collection(coll: Series, func: Callable, *args, **kwargs) Series[source]#
- ms3.utils.functions.cast2collection(coll: ndarray[Any, dtype[_ScalarType_co]], func: Callable, *args, **kwargs) ndarray[Any, dtype[_ScalarType_co]]
- ms3.utils.functions.cast2collection(coll: list, func: Callable, *args, **kwargs) list
- ms3.utils.functions.cast2collection(coll: tuple, func: Callable, *args, **kwargs) tuple
- ms3.utils.functions.fifths2acc(fifths: int) str[source]#
- ms3.utils.functions.fifths2acc(fifths: Series) Series
- ms3.utils.functions.fifths2acc(fifths: ndarray[Any, dtype[int]]) ndarray[Any, dtype[str]]
- ms3.utils.functions.fifths2acc(fifths: List[int]) List[str]
- ms3.utils.functions.fifths2acc(fifths: Tuple[int]) Tuple[str]
Returns accidentals for a stack of fifths that can be combined with a basic representation of the seven steps.
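A minimal sketch of the scalar case, assuming the usual line-of-fifths convention in which the seven unaltered steps occupy the positions -1 (F) through 5 (B) and every further block of seven fifths adds one sharp or flat. The function name and this formula are assumptions for illustration, not taken from the ms3 source.

```python
def fifths2acc_sketch(fifths: int) -> str:
    """Return the accidentals ('#', 'bb', ...) implied by a position on the line of fifths."""
    # Positions -1..5 need no accidental; floor division counts the blocks of seven beyond them.
    acc = (fifths + 1) // 7
    return "#" * acc if acc >= 0 else "b" * -acc


for fifths in (0, 6, -2, 13, -9):
    print(fifths, repr(fifths2acc_sketch(fifths)))
```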
- ms3.utils.functions.fifths2iv(fifths: int, smallest: bool, perfect: str, major: str, minor: str, augmented: str, diminished: str) str | None[source]#
- ms3.utils.functions.fifths2iv(fifths: Series, smallest: bool, perfect: str, major: str, minor: str, augmented: str, diminished: str) Series | None
- ms3.utils.functions.fifths2iv(fifths: ndarray[Any, dtype[int]], smallest: bool, perfect: str, major: str, minor: str, augmented: str, diminished: str) ndarray[Any, dtype[str]] | None
- ms3.utils.functions.fifths2iv(fifths: List[int], smallest: bool, perfect: str, major: str, minor: str, augmented: str, diminished: str) List[str] | None
- ms3.utils.functions.fifths2iv(fifths: Tuple[int], smallest: bool, perfect: str, major: str, minor: str, augmented: str, diminished: str) Tuple[str] | None
Return interval name of a stack of fifths such that 0 = ‘P1’, -1 = ‘P4’, -2 = ‘m7’, 4 = ‘M3’ etc. If you pass smallest=True, intervals of a fifth or greater will be inverted (e.g. ‘m6’ => ‘-M3’ and ‘D5’ => ‘-A4’).
- Parameters:
fifths – Number of fifths representing the interval.
smallest – Pass True if you want to wrap intervals of a fifth and larger to their downward counterpart.
perfect – String representing the perfect interval quality, defaults to ‘P’.
major – String representing the major interval quality, defaults to ‘M’.
minor – String representing the minor interval quality, defaults to ‘m’.
augmented – String representing the augmented interval quality, defaults to ‘a’.
diminished – String representing the diminished interval quality, defaults to ‘d’.
- Returns:
Name of the interval as a string.
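The mapping described above can be illustrated with a minimal sketch. It is not ms3's implementation: the helper name `fifths_to_interval` is hypothetical, the `smallest` option is left out, and only the documented default quality strings are reused.

```python
def fifths_to_interval(fifths: int, perfect: str = "P", major: str = "M",
                       minor: str = "m", augmented: str = "a",
                       diminished: str = "d") -> str:
    """Name the interval spanned by a stack of fifths (hypothetical helper)."""
    # Generic interval numbers along the line of fifths, starting at -1 (the fourth).
    numbers = (4, 1, 5, 2, 6, 3, 7)
    number = numbers[(fifths + 1) % 7]
    if -1 <= fifths <= 1:
        quality = perfect
    elif 2 <= fifths <= 5:
        quality = major
    elif -5 <= fifths <= -2:
        quality = minor
    elif fifths > 5:
        # Every further 7 fifths upwards add one level of augmentation.
        quality = augmented * ((fifths + 1) // 7)
    else:
        # Every further 7 fifths downwards add one level of diminution.
        quality = diminished * ((1 - fifths) // 7)
    return f"{quality}{number}"


print([fifths_to_interval(f) for f in (0, -1, -2, 4)])
```

The index arithmetic relies on Python's modulo always returning a non-negative result, so negative stacks of fifths need no special casing.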
- ms3.utils.functions.tpc2name(tpc: int, ms: bool = False, minor: bool = False) str | None[source]#
- ms3.utils.functions.tpc2name(tpc: Series, ms: bool = False, minor: bool = False) Series | None
- ms3.utils.functions.tpc2name(tpc: ndarray[Any, dtype[int]], ms: bool = False, minor: bool = False) ndarray[Any, dtype[str]] | None
- ms3.utils.functions.tpc2name(tpc: List[int], ms: bool = False, minor: bool = False) List[str] | None
- ms3.utils.functions.tpc2name(tpc: Tuple[int], ms: bool = False, minor: bool = False) Tuple[str] | None
Turn a tonal pitch class (TPC) into a name or perform the operation on a collection of integers.
- Parameters:
tpc – Tonal pitch class(es) to turn into a note name.
ms – Pass True if tpc is a MuseScore TPC, i.e. C = 14.
minor – Pass True if the string is to be returned as lowercase.
Returns:
- ms3.utils.functions.tpc2scale_degree(tpc: int | Series | ndarray[Any, dtype[int]] | List[int] | Tuple[int], localkey: str, globalkey: str) str | Series | ndarray[Any, dtype[str]] | List[str] | Tuple[str] | None[source]#
For example, tonal pitch class 3 (fifths, i.e. “A”) is scale degree ‘#3’ in the localkey of ‘iv’ within ‘c’ minor.
- Parameters:
tpc – Tonal pitch class(es) to turn into scale degree(s).
localkey – Local key in which the pitch classes are situated, as Roman numeral (can include slash notation such as V/ii).
globalkey – Global key as a note name, e.g. ‘Ab’ for Ab major, or ‘c#’ for C# minor.
- Returns:
The given tonal pitch class(es), expressed as scale degree(s).
- ms3.utils.functions.fifths2name(fifths: int, midi: int | None, ms: bool, minor: bool) str | None[source]#
- ms3.utils.functions.fifths2name(fifths: Series, midi: Series | None, ms: bool, minor: bool) Series | None
- ms3.utils.functions.fifths2name(fifths: ndarray[Any, dtype[int]], midi: ndarray[Any, dtype[int]] | None, ms: bool, minor: bool) ndarray[Any, dtype[str]] | None
- ms3.utils.functions.fifths2name(fifths: List[int], midi: List[int] | None, ms: bool, minor: bool) List[str] | None
- ms3.utils.functions.fifths2name(fifths: Tuple[int], midi: Tuple[int] | None, ms: bool, minor: bool) Tuple[str] | None
Return note name of a stack of fifths such that 0 = C, -1 = F, -2 = Bb, 1 = G etc. This is a wrapper of tpc2name() that additionally accepts the argument midi, which allows for adding octave information.
- Parameters:
fifths – Tonal pitch class(es) to turn into a note name.
midi – In order to include the octave into the note name, pass the corresponding MIDI pitch(es).
ms – Pass True if fifths is a MuseScore TPC, i.e. C = 14.
minor – Pass True if the string is to be returned as lowercase.
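As a rough illustration of the fifths-to-name mapping (0 = C, -1 = F, -2 = Bb, 1 = G), here is a minimal sketch. The function `fifths_to_name` is a hypothetical stand-in, not the ms3 code, and it omits the octave handling via `midi`.

```python
def fifths_to_name(fifths: int, ms: bool = False, minor: bool = False) -> str:
    """Spell a position on the line of fifths as a note name (hypothetical helper)."""
    if ms:
        fifths -= 14  # MuseScore TPCs place C at 14
    step = "FCGDAEB"[(fifths + 1) % 7]
    # Floor division: positive values count sharps, negative values count flats.
    accidentals = (fifths + 1) // 7
    suffix = "#" * accidentals if accidentals > 0 else "b" * -accidentals
    if minor:
        step = step.lower()
    return step + suffix


print([fifths_to_name(f) for f in (0, -1, -2, 1)])
```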
- ms3.utils.functions.fifths2pc(fifths)[source]#
Turn a stack of fifths into a chromatic pitch class. Uses: map2elements()
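The underlying arithmetic is short: a perfect fifth spans 7 semitones, so the chromatic pitch class is the stack of fifths times 7, modulo 12. A sketch with the hypothetical name `fifths_to_pc`, for scalars only:

```python
def fifths_to_pc(fifths: int) -> int:
    """Chromatic pitch class (0-11) of a stack of fifths, with C = 0."""
    return (7 * fifths) % 12


print([fifths_to_pc(f) for f in (0, 1, -1, 2)])
```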
- ms3.utils.functions.fifths2rn(fifths, minor=False, auto_key=False)[source]#
Return Roman numeral of a stack of fifths such that 0 = I, -1 = IV, 1 = V, -2 = bVII in major, VII in minor, etc. Uses: map2elements(), is_minor_mode()
- Parameters:
auto_key (bool, optional) – By default, the returned Roman numerals are uppercase. Pass True to return upper- or lowercase according to the position in the scale.
- ms3.utils.functions.fifths2sd(fifths, minor=False)[source]#
Return scale degree of a stack of fifths such that 0 = ‘1’, -1 = ‘4’, -2 = ‘b7’ in major, ‘7’ in minor etc. Uses: map2elements(), fifths2str()
- ms3.utils.functions.get_git_commit(repo_path: str, git_revision: str | None, logger=None) Commit | None[source]#
Returns the git commit object for the given revision.
- Parameters:
repo_path
git_revision – Any specifier that git understands (branch, tag, commit hash, “HEAD”, etc.). In addition, “LATEST_VERSION” can be passed to get the tag with the highest version number.
logger
- Returns:
git.Commit object that corresponds to the given revision specifier.
- ms3.utils.functions.get_git_repo(directory: str | Path, search_parent_directories: bool = True, logger: Logger | str | None = None) Repo | None[source]#
- ms3.utils.functions.get_git_revision(repo: Repo | None = None, repo_path: str | None = None) str[source]#
- ms3.utils.functions.get_git_tag(repo: Repo | None, repo_path: str | None, always: Literal[True]) str[source]#
- ms3.utils.functions.get_git_tag(repo: Repo | None, repo_path: str | None, always: Literal[False]) str | None
If always is set to True and no tags are found, the commit short hash is returned instead.
- ms3.utils.functions.get_git_version_info(repo: Repo | None = None, repo_path: str | None = None, only_if_clean: bool = True)[source]#
- ms3.utils.functions.git_repo_is_clean(repo: Repo | None = None, repo_path: str | None = None) bool[source]#
- ms3.utils.functions.get_musescore(MS: str | Literal['auto', 'win', 'mac'] = 'auto', logger=None) str | None[source]#
Tests whether a MuseScore executable can be found on the system. Uses: test_binary()
- Parameters:
MS – A path to the executable, installed command, or one of the keywords {‘auto’, ‘win’, ‘mac’}
- Returns:
Path to the executable if found or None.
- ms3.utils.functions.get_path_component(path, after)[source]#
Returns only the path’s subfolders below after. If after is the last component, ‘.’ is returned.
- ms3.utils.functions.group_id_tuples(list_of_pairs)[source]#
Turns a list of (key, ix) into a {key: [ix]}
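The transformation amounts to a standard grouping idiom. A minimal sketch, assuming plain (key, ix) tuples as input; `group_pairs` is a hypothetical name:

```python
from collections import defaultdict


def group_pairs(list_of_pairs):
    """Collect the second elements of (key, ix) pairs in one list per key."""
    grouped = defaultdict(list)
    for key, ix in list_of_pairs:
        grouped[key].append(ix)
    return dict(grouped)


print(group_pairs([("a", 0), ("b", 0), ("a", 1)]))
```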
- ms3.utils.functions.html2format(df, format='name', html_col='color_html')[source]#
Converts the HTML column of a DataFrame into ‘name’, ‘rgb’, or ‘rgba’.
- ms3.utils.functions.html_color2format(h, format='name')[source]#
Converts a single HTML color into ‘name’, ‘rgb’, or ‘rgba’.
- ms3.utils.functions.html_color2name(h)[source]#
Converts an HTML color into its CSS3 name, or returns it unchanged if there is none.
- ms3.utils.functions.interval_overlap(a, b, closed=None)[source]#
Returns the overlap of two pd.Intervals as a new pd.Interval.
- Parameters:
a (pandas.Interval) – Intervals for which to compute the overlap.
b (pandas.Interval) – Intervals for which to compute the overlap.
closed ({'left', 'right', 'both', 'neither'}, optional) – If no value is passed, the closure of the returned interval is inferred from a and b.
- Return type:
- ms3.utils.functions.interval_overlap_size(a, b, decimals=3)[source]#
Returns the size of the overlap of two pd.Intervals.
- ms3.utils.functions.is_any_row_equal(df1, df2)[source]#
Returns True if any two rows of the two DataFrames contain the same value tuples.
- ms3.utils.functions.is_minor_mode(fifths, minor=False)[source]#
Returns True if the scale degree fifths naturally has a minor third in the scale.
- ms3.utils.functions.iter_nested(nested)[source]#
Iterate through any nested structure of lists and tuples from left to right.
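A left-to-right traversal of this kind is typically written as a recursive generator. The sketch below uses the hypothetical name `iter_flat` and only descends into lists and tuples, as described:

```python
def iter_flat(nested):
    """Yield the leaves of arbitrarily nested lists and tuples from left to right."""
    for element in nested:
        if isinstance(element, (list, tuple)):
            yield from iter_flat(element)
        else:
            yield element


print(list(iter_flat([1, (2, [3, 4]), 5])))
```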
- ms3.utils.functions.iter_selection(collectio, selector=None, opposite=False)[source]#
Returns a generator of collectio. selector can be a collection of index numbers to select or unselect elements, depending on opposite.
- ms3.utils.functions.first_level_subdirs(path)[source]#
Returns the directory names contained in path.
- ms3.utils.functions.first_level_files_and_subdirs(path)[source]#
Returns the directory names and filenames contained in path.
- ms3.utils.functions.get_first_level_corpora(path: str, logger=None) List[str][source]#
Checks the first-level subdirectories of path for indicators of being a corpus. If one of them shows an indicator (presence of a ‘metadata.tsv’ file, or of a ‘.git’ folder or any of the default folder names), returns a list of all subdirectories.
- ms3.utils.functions.join_tsvs(dfs, sort_cols=False, logger=None)[source]#
Performs outer join on the passed DataFrames based on ‘mc’ and ‘mc_onset’, if any. Uses: functools.reduce(), sort_cols(), sort_note_lists()
- Parameters:
dfs (Collection) – Collection of DataFrames to join.
sort_cols (bool, optional) – If you pass True, the columns after those defined in STANDARD_COLUMN_ORDER will be sorted alphabetically.
- ms3.utils.functions.eval_string_to_nested_list(s)[source]#
Tries to parse a string encoding a nested list, returns the input if it fails.
- ms3.utils.functions.parse_interval_index_column(df, column=None, closed='left')[source]#
Turns a column of strings in the form ‘[0.0, 1.1)’ into a pandas.IntervalIndex.
- Parameters:
df (pandas.DataFrame)
column (str, optional) – Name of the column containing strings. If not specified, use the index.
closed (str, optional) – On which side the intervals should be closed. Defaults to ‘left’.
- Return type:
- ms3.utils.functions.load_tsv(path, index_col=None, sep='\t', converters={}, dtype={}, stringtype=False, **kwargs) DataFrame | None[source]#
Loads the TSV file path while applying correct type conversion and parsing tuples.
- Parameters:
path (str) – Path to a TSV file as output by format_data().
index_col (list, optional) – By default, the first two columns are loaded as MultiIndex. The first level distinguishes pieces and the second level the elements within.
converters (dict, optional) – Enhances or overwrites the mapping from column names to types included in the constants.
dtype (dict, optional) – Enhances or overwrites the mapping from column names to types included in the constants.
stringtype (bool, optional) – If you’re using pandas >= 1.0.0 you might want to set this to True in order to use the new string datatype that includes the new null type pd.NA.
- ms3.utils.functions.make_csvw_jsonld(title: str, columns: Collection[str], urls: str | Collection[str], description: str | None = None) dict[source]#
W3C’s CSV on the Web Primer: https://www.w3.org/TR/tabular-data-primer/
- ms3.utils.functions.store_csvw_jsonld(corpus: str, folder: str, facet: str, columns: Collection[str], files: str | Collection[str]) str[source]#
- ms3.utils.functions.make_continuous_offset_series(measures: DataFrame, quarters: bool = True, negative_anacrusis: Fraction | None = None, logger: Logger | str | None = None) Series[source]#
Accepts a measure table without ‘quarterbeats’ column and computes each MC’s offset from the piece’s beginning. Deal with voltas before passing the table.
If you need an offset_dict and the measures already come with a ‘quarterbeats’ column, you can call make_offset_dict_from_measures().
- Parameters:
measures – A measures table with ‘normal’ RangeIndex containing the column ‘act_durs’ and one of ‘mc’ or ‘mc_playthrough’ (if repeats were unfolded).
quarters – By default, the continuous offsets are expressed in ♩. Pass False to leave them as fractions of a whole note.
negative_anacrusis – By default, the first value is 0. If you pass a fraction here, the first value will be its negative and the second value will be 0.
logger
- Returns:
Cumulative sum of the actual durations, shifted down by 1. Compared to the original DataFrame it has length + 2 because it adds the end value twice, once with the next index value, and once with the index ‘end’. Otherwise the end value would be lost due to the shifting.
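The shifted cumulative sum and the doubled end value can be illustrated without pandas. This is a sketch under assumptions, not the ms3 implementation: `continuous_offsets` is a hypothetical name, the input is a plain dict of {mc: actual duration in whole notes} in playing order, and `negative_anacrusis` is not modelled.

```python
from fractions import Fraction


def continuous_offsets(act_durs, quarters=True):
    """Map each MC to its offset from the beginning (hypothetical helper)."""
    factor = 4 if quarters else 1  # quarter notes per whole note
    offsets = {}
    position = Fraction(0)
    last_mc = None
    for mc, duration in act_durs.items():
        offsets[mc] = position  # the offset is the sum of all *previous* durations
        position += Fraction(duration) * factor
        last_mc = mc
    if last_mc is not None:
        # The end value appears twice: under the next MC and under 'end'.
        offsets[last_mc + 1] = position
        offsets["end"] = position
    return offsets


# An upbeat of one quarter note followed by two 4/4 measures.
durations = {1: Fraction(1, 4), 2: Fraction(1), 3: Fraction(1)}
print(continuous_offsets(durations))
```

For three measures the result has five entries, which matches the "length + 2" described above.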
- ms3.utils.functions.make_offset_dict_from_measures(measures: DataFrame, all_endings: bool = False) dict[source]#
Turn a measure table that comes with a ‘quarterbeats’ column into a dictionary that maps MCs (measure counts) to their quarterbeat offset from the piece’s beginning, used for computing quarterbeats for other facets.
This function is used for the default case. If you need more options, e.g. an offset dict from unfolded measures or expressed in whole notes or with negative anacrusis, use make_continuous_offset_series() instead.
- Parameters:
measures – Measures table containing a ‘quarterbeats’ column.
all_endings – Uses the column ‘quarterbeats_all_endings’ of the measures table if it has one, otherwise falls back to the default ‘quarterbeats’.
- Returns:
{MC -> quarterbeat_offset}. Offsets are Fractions. If all_endings is not set to True, values for MCs that are part of a first ending (or third or larger) are NA.
- ms3.utils.functions.make_id_tuples(key, n)[source]#
For a given key, this function returns index tuples in the form [(key, 0), …, (key, n)]
- Returns:
indices in the form [(key, 0), …, (key, n)]
- Return type:
- ms3.utils.functions.make_interval_index_from_breaks(S, end_value=None, closed='left', name='interval', logger=None)[source]#
Interpret a Series as interval breaks and make an IntervalIndex out of it.
- Parameters:
S (pandas.Series) – Interval breaks. It is assumed that the breaks are sorted.
end_value (numeric, optional) – Often you want to pass the right border of the last interval.
closed (str, optional) – Defaults to ‘left’. Argument passed to pandas.IntervalIndex.from_breaks().
name (str, optional) – Name of the created index. Defaults to ‘interval’.
- Return type:
- ms3.utils.functions.make_name_columns(df)[source]#
Relies on the columns localkey and globalkey to transform the columns root and bass_notes from scale degrees (expressed as fifths) to absolute note names, e.g. in C major: 0 => ‘C’, 7 => ‘C#’, -5 => ‘Db’. Uses: transform(), scale_degree2name()
- ms3.utils.functions.make_playthrough2mc(measures: DataFrame, logger=None) Series | None[source]#
Turns the column ‘next’ into a mapping of playthrough_mc -> mc.
- ms3.utils.functions.make_playthrough_info(measures: DataFrame, logger=None) DataFrame | Series | None[source]#
Turns a measures table into a DataFrame or Series that can be passed as argument to unfold_repeats(). The return type is DataFrame if the unfolded measures table contains an ‘mn_playthrough’ column, otherwise it is equal to the result of make_playthrough2mc(). Hence, the purpose of the function is to add an ‘mn_playthrough’ column to unfolded facets whenever possible.
- ms3.utils.functions.map2elements(e, f, *args, **kwargs)[source]#
If e is an iterable, f is applied to all elements.
- ms3.utils.functions.merge_ties(df, return_dropped=False, perform_checks=True, logger=None)[source]#
In a note list, merge tied notes to single events with accumulated durations.
Input dataframe needs columns [‘duration’, ‘tied’, ‘midi’, ‘staff’]. This function does not correctly handle overlapping ties on the same pitch since it doesn’t take into account the notational layers (‘voice’).
- Parameters:
df
return_dropped
- ms3.utils.functions.merge_chords_and_notes(chords_table: DataFrame, notes_table: DataFrame) DataFrame[source]#
Performs an outer join between a chords table and a notes table, based on the column ‘chord_id’. If the chords come with an ‘event’ column, all chord events matched with at least one note will be renamed to ‘Note’. Markup displayed in individual rows (‘Dynamic’, ‘Spanner’, ‘StaffText’, ‘SystemText’, ‘Tempo’, ‘FiguredBass’) is placed, or remains, before the note(s) with the same onset. Markup showing up in a Chord event’s row (e.g. a Spanner ID) will be duplicated for each note pertaining to that chord, i.e., only for notes in the same staff and voice.
- Parameters:
chords_table
notes_table
- Returns:
Merged DataFrame.
- ms3.utils.functions.metadata2series(metadata: dict) Series[source]#
Turns a metadata dict into a pd.Series() (for storing in a DataFrame) Uses: ambitus2oneliner(), dict2oneliner(), parts_info()
- Returns:
A series allowing for storing metadata as a row of a DataFrame.
- Return type:
- ms3.utils.functions.midi_and_tpc2octave(midi: int, tpc: int) int[source]#
- ms3.utils.functions.midi_and_tpc2octave(midi: Series, tpc: Series) Series
- ms3.utils.functions.midi_and_tpc2octave(midi: ndarray[Any, dtype[int]], tpc: ndarray[Any, dtype[int]]) ndarray[Any, dtype[int]]
- ms3.utils.functions.midi_and_tpc2octave(midi: List[int], tpc: List[int]) List[int]
- ms3.utils.functions.midi_and_tpc2octave(midi: Tuple[int], tpc: Tuple[int]) Tuple[int]
- ms3.utils.functions.midi2octave(midi: int, fifths: int | None) int[source]#
- ms3.utils.functions.midi2octave(midi: Series, fifths: Series | None) Series
- ms3.utils.functions.midi2octave(midi: ndarray[Any, dtype[int]], fifths: ndarray[Any, dtype[_ScalarType_co]] | None) ndarray[Any, dtype[int]]
- ms3.utils.functions.midi2octave(midi: List[int], fifths: List[int] | None) List[int]
- ms3.utils.functions.midi2octave(midi: Tuple[int], fifths: Tuple[int] | None) Tuple[int]
For a given MIDI pitch, calculate the octave. Middle octave = 4.
Uses: midi_and_tpc2octave(), map2elements()
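A minimal sketch of the octave computation, with middle C (MIDI 60) in octave 4. The helper `midi_to_octave` is hypothetical, and the way the spelling is taken into account (removing the accidentals before dividing, so that B# and Cb stay in the octave of their letter) is an assumption about what the optional fifths argument is for, not a transcription of the ms3 code.

```python
def midi_to_octave(midi: int, fifths=None) -> int:
    """Octave of a MIDI pitch, optionally corrected for its spelling (hypothetical helper)."""
    if fifths is None:
        return midi // 12 - 1
    # Number of sharps (positive) or flats (negative) implied by the position in fifths.
    accidentals = (fifths + 1) // 7
    # Undo the accidentals to find the octave of the natural letter name.
    return (midi - accidentals) // 12 - 1


print(midi_to_octave(60), midi_to_octave(60, fifths=12), midi_to_octave(59, fifths=-7))
```

In the usage line, MIDI 60 spelled as B# (fifths = 12) falls into octave 3, while MIDI 59 spelled as Cb (fifths = -7) falls into octave 4.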
- ms3.utils.functions.mn2int(mn_series)[source]#
Turn a series of measure numbers parsed as strings into two integer columns ‘mn’ and ‘volta’.
- ms3.utils.functions.name2format(df, format='html', name_col='color_name')[source]#
Converts a column with CSS3 names into ‘html’, ‘rgb’, or ‘rgba’.
- ms3.utils.functions.name2fifths(nn, logger=None)[source]#
Turn a note name such as Ab into a tonal pitch class, such that -1=F, 0=C, 1=G etc. Uses: split_note_name()
- ms3.utils.functions.name2pc(nn, logger=None)[source]#
Turn a note name such as Ab into a chromatic pitch class (0-11), such that C=0, C#=1, D=2 etc. Uses: split_note_name()
- ms3.utils.functions.nan_eq(a, b)[source]#
Returns True if a and b are equal or both null. Works on two Series or two elements.
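For two single elements, the comparison can be sketched as follows. The names `is_null` and `null_eq` are hypothetical, only None and float NaN are treated as null here, and the Series case is not covered.

```python
import math


def is_null(x) -> bool:
    """True for None and for float NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def null_eq(a, b) -> bool:
    """True if both values are null or if they compare equal."""
    return (is_null(a) and is_null(b)) or a == b


print(null_eq(float("nan"), float("nan")), null_eq(1, 1), null_eq(1, None))
```

The explicit null check is needed because NaN never compares equal to itself.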
- ms3.utils.functions.next2sequence(next_col: Series, logger=None) List[int] | None[source]#
Turns a ‘next’ column into the correct sequence of MCs corresponding to unfolded repetitions. Requires that the Series’ index be the MCs as in measures.set_index('mc').next.
- ms3.utils.functions.no_collections_no_booleans(df: DataFrame, collection_columns: Collection[str] | None = None, boolean_columns: Collection[str] | None = None, logger=None)[source]#
Cleans the DataFrame columns [‘next’, ‘chord_tones’, ‘added_tones’, ‘volta_mcs’] from tuples and the columns [‘globalkey_is_minor’, ‘localkey_is_minor’] from booleans, converting them all to integers.
- ms3.utils.functions.parts_info(d)[source]#
Turns a (nested) metadata['parts'] dict into a flat dict based on staves.
Example
>>> d = s.mscx.metadata
>>> parts_info(d['parts'])
{'staff_1_instrument': 'Voice',
 'staff_1_ambitus': '66-76 (F#4-E5)',
 'staff_2_instrument': 'Voice',
 'staff_2_ambitus': '55-69 (G3-A4)',
 'staff_3_instrument': 'Voice',
 'staff_3_ambitus': '48-67 (C3-G4)',
 'staff_4_instrument': 'Voice',
 'staff_4_ambitus': '41-60 (F2-C4)'}
- ms3.utils.functions.path2type(path, logger=None)[source]#
Determine a file’s type by scanning its path for default components in the constant STANDARD_NAMES.
- Parameters:
path
- ms3.utils.functions.pretty_dict(ugly_dict: dict, heading_key: str = None, heading_value: str = None) str[source]#
Turns a dictionary into a string where the keys are printed in a column, separated by ‘->’.
- ms3.utils.functions.resolve_dir(d)[source]#
Resolves ‘~’ to HOME directory and turns d into an absolute path.
- ms3.utils.functions.rgb2format(df, format='html', r_col='color_r', g_col='color_g', b_col='color_b')[source]#
Converts three RGB columns into a color_html or color_name column.
- ms3.utils.functions.rgb_tuple2format(t, format='html')[source]#
Converts a single RGB tuple into ‘HTML’ or ‘name’.
- ms3.utils.functions.rgb_tuple2name(t)[source]#
Converts a single RGB tuple into its CSS3 name or to HTML if there is none.
- ms3.utils.functions.roman_numeral2fifths(rn, global_minor=False, logger=None)[source]#
Turn a Roman numeral into a TPC interval (e.g. for transposition purposes). Uses: split_scale_degree()
- ms3.utils.functions.roman_numeral2semitones(rn, global_minor=False, logger=None)[source]#
Turn a Roman numeral into a semitone distance from the root (0-11). Uses: split_scale_degree()
- ms3.utils.functions.scale_degree2name(fifths: int, localkey: str, globalkey: str) str[source]#
- ms3.utils.functions.scale_degree2name(fifths: Series, localkey: str, globalkey: str) Series
- ms3.utils.functions.scale_degree2name(fifths: ndarray[Any, dtype[int]], localkey: str, globalkey: str) ndarray[Any, dtype[str]]
- ms3.utils.functions.scale_degree2name(fifths: List[int], localkey: str, globalkey: str) List[str]
- ms3.utils.functions.scale_degree2name(fifths: Tuple[int], localkey: str, globalkey: str) Tuple[str]
For example, scale degree -1 (fifths, i.e. the subdominant) of the localkey of ‘VI’ within ‘e’ minor is ‘F’.
- Parameters:
fifths – Scale degree expressed as distance from the tonic in fifths.
localkey – Local key in which the scale degree is situated, as Roman numeral (can include slash notation such as V/ii).
globalkey – Global key as a note name, e.g. ‘Ab’ for Ab major, or ‘c#’ for C# minor.
- Returns:
The given scale degree(s), expressed as note name(s).
- ms3.utils.functions.scan_directory(directory: str, file_re: str = '.*', folder_re: str = '.*', exclude_re: str = '^(\\.|_)', recursive: bool = True, subdirs: bool = False, progress: bool = False, exclude_files_only: bool = False, return_metadata: bool = False, logger=None) Iterator[str | Tuple[str, str]][source]#
Generator of filtered file paths in directory.
- Parameters:
directory – Directory to be scanned for files.
file_re – Regular expression for filtering file names. It is checked with search(), not match(), allowing for fuzzy search.
folder_re – Regular expression for filtering folder names. It is checked with search(), not match(), allowing for fuzzy search.
exclude_re – Exclude files and folders (unless exclude_files_only=True) containing this regular expression.
recursive – By default, sub-directories are recursively scanned. Pass False to scan only directory.
subdirs – By default, full file paths are returned. Pass True to return (path, name) tuples instead.
progress – Pass True to display the progress (useful for large directories).
exclude_files_only – By default, exclude_re excludes files and folders. Pass True to exclude only files matching the regEx.
return_metadata – If set to True, ‘metadata.tsv’ files are always yielded regardless of file_re.
- Yields:
Full file path or, if subdirs=True, (path, file_name) pairs in random order.
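The filtering idea (regular expressions applied with search() while walking the tree) can be sketched with the standard library. This is a simplified illustration and not the ms3 generator: `scan` is a hypothetical name, and folder_re, subdirs, progress, exclude_files_only and return_metadata are left out.

```python
import os
import re
import tempfile


def scan(directory, file_re=".*", exclude_re=r"^(\.|_)", recursive=True):
    """Yield paths of files whose names match file_re and not exclude_re."""
    for root, dirs, files in os.walk(directory):
        # Prune in place so that os.walk() skips excluded (or all) sub-directories.
        dirs[:] = [d for d in dirs if recursive and not re.search(exclude_re, d)]
        for name in files:
            if re.search(exclude_re, name):
                continue
            if re.search(file_re, name):
                yield os.path.join(root, name)


# Usage on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "notes"))
    os.makedirs(os.path.join(tmp, ".git"))
    for parts in (("metadata.tsv",), ("notes", "piece.tsv"),
                  ("notes", "_draft.tsv"), (".git", "config")):
        open(os.path.join(tmp, *parts), "w").close()
    found = sorted(os.path.relpath(p, tmp) for p in scan(tmp, file_re=r"\.tsv$"))
    top_only = sorted(os.path.relpath(p, tmp) for p in scan(tmp, recursive=False))

print(found)
```

The results are sorted in the usage part because the order in which a directory walk yields files is not guaranteed.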
- ms3.utils.functions.column_order(df, first_cols=None, sort=True)[source]#
Sort DataFrame columns so that they start with the order of first_cols, followed by those not included.
- ms3.utils.functions.sort_note_list(df, mc_col='mc', mc_onset_col='mc_onset', midi_col='midi', duration_col='duration')[source]#
Sort every measure (MC) by [‘mc_onset’, ‘midi’, ‘duration’] while leaving gracenotes’ order (duration=0) intact.
- Parameters:
df
mc_col
mc_onset_col
midi_col
duration_col
- ms3.utils.functions.sort_tpcs(tpcs, ascending=True, start=None)[source]#
Sort tonal pitch classes by order on the piano.
Uses: fifths2pc()
- ms3.utils.functions.split_alternatives(df, column='label', regex='-(?!(\\d|b+\\d|\\#+\\d))', max=2, inplace=False, alternatives_only=False, logger=None)[source]#
Splits labels that come with an alternative separated by ‘-’ and adds a new column. Only one alternative is taken into account. df is mutated only if inplace=True.
- Parameters:
df (pandas.DataFrame) – Dataframe where one column contains DCML chord labels.
column (str, optional) – Name of the column that holds the harmony labels.
regex (str, optional) – The regular expression (or simple string) that detects the character combination used to separate alternative annotations. By default, alternatives are separated by a ‘-’ that does not precede a scale degree such as ‘b6’ or ‘3’.
max (int, optional) – Maximum number of admitted alternatives, defaults to 2.
inplace (bool, optional) – Pass True if you want to mutate df.
alternatives_only (bool, optional) – By default the alternatives are added to the original DataFrame (inplace or not). Pass True if you just need the split alternatives.
Example
>>> import pandas as pd
>>> labels = pd.read_csv('labels.csv')
>>> split_alternatives(labels, inplace=True)
- ms3.utils.functions.split_note_name(nn, count=False, logger=None)[source]#
Splits a note name such as ‘Ab’ into accidentals and name.
- ms3.utils.functions.split_scale_degree(sd, count=False, logger=None) Tuple[int | None, str | None][source]#
Splits a scale degree such as ‘bbVI’ or ‘b6’ into accidentals and numeral.
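Splitting accidentals from the numeral can be sketched with one regular expression. The helper `split_degree` is hypothetical; it accepts Roman or Arabic numerals, and returning (None, None) for unparseable input is an assumption suggested by the return annotation above.

```python
import re


def split_degree(sd: str, count: bool = False):
    """Split e.g. 'bbVI' into ('bb', 'VI'); with count=True into (-2, 'VI')."""
    match = re.fullmatch(r"(b*|#*)([ivIV]+|\d+)", sd)
    if match is None:
        return None, None
    accidentals, numeral = match.groups()
    if count:
        # Sharps count upwards, flats count downwards.
        accidentals = accidentals.count("#") - accidentals.count("b")
    return accidentals, numeral


print(split_degree("bbVI"), split_degree("b6", count=True))
```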
- ms3.utils.functions.transform(df, func, param2col=None, column_wise=False, **kwargs)[source]#
Compute a function for every row of a DataFrame, using several cols as arguments.
The result is the same as using df.apply(lambda r: func(param1=r.col1, param2=r.col2…), axis=1) but it optimizes the procedure by precomputing func for all occurring parameter combinations. Uses: inspect.getfullargspec()
- Parameters:
df (pandas.DataFrame or pandas.Series) – Dataframe containing function parameters.
func (callable) – The result of this function for every row will be returned.
param2col (dict or list, optional) – Mapping from parameter names of func to column names. If you pass a list of column names, the columns’ values are passed as positional arguments. Pass None if you want to use all columns as positional arguments.
column_wise (bool, optional) – Pass True if you want to map func to the elements of every column separately. This is simply an optimized version of df.apply(func) but allows for naming columns to use as function arguments. If param2col is None, func is mapped to the elements of all columns, otherwise to all columns that are not named as parameters in param2col. In the case where func does not require a positional first element and you want to pass the elements of the various columns as keyword argument, give it as param2col={‘function_argument’: None}.
inplace (bool, optional) – Pass True if you want to mutate df rather than getting an altered copy.
**kwargs – Other parameters passed to func.
- ms3.utils.functions.adjacency_groups(S: Series, na_values: str | None = 'group', prevent_merge: bool = False, logger=None) Tuple[Series, Dict[int, Any]][source]#
Turns a Series into a Series of ascending integers starting from 1 that reflect groups of successive equal values. There are several options of how to deal with NA values.
- Parameters:
S – Series in which to group identical adjacent values with each other.
na_values –
- ‘group’ creates individual groups for NA values (default).
- ‘backfill’ or ‘bfill’ groups NA values with the subsequent group.
- ‘pad’ or ‘ffill’ groups NA values with the preceding group.
- Any other string works like ‘group’, with the difference that the groups will be named with this value.
- Passing None means NA values & ranges are being ignored, i.e. they will also be present in the output and the subsequent value will be based on the preceding value.
prevent_merge – By default, if you use the na_values argument to fill NA values, they might lead to two groups merging. Pass True to prevent this. For example, take the sequence [‘a’, NA, ‘a’] with na_values='ffill': By default, it will be merged to one single group [1, 1, 1], {1: 'a'}. However, passing prevent_merge=True will result in [1, 1, 2], {1: 'a', 2: 'a'}.
- Returns:
A series with increasing integers that can be used for grouping. A dictionary mapping the integers to the grouped values.
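The core idea, numbering runs of equal adjacent values from 1 and remembering which value each number stands for, can be sketched in plain Python. This is an illustration only: `adjacency_groups_sketch` is a hypothetical name, it works on a list instead of a Series, and none of the NA options are modelled.

```python
from itertools import groupby


def adjacency_groups_sketch(values):
    """Return (group ids, {group id: value}) for runs of equal adjacent values."""
    group_ids, names = [], {}
    for group_id, (value, run) in enumerate(groupby(values), start=1):
        names[group_id] = value
        group_ids.extend(group_id for _ in run)
    return group_ids, names


print(adjacency_groups_sketch(["a", "a", "b", "a"]))
```

Note that the second run of 'a' gets its own group because grouping only considers adjacent values.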
- ms3.utils.functions.unfold_measures_table(measures: DataFrame, logger=None) DataFrame | None[source]#
Returns a copy of a measures table that corresponds to the succession of MCs when playing all repeats. To distinguish between repeated MCs and MNs, it adds the continuous column ‘mc_playthrough’ (starting at 1) and ‘mn_playthrough’, which contains the values of ‘mn’ as strings with letters {‘a’, ‘b’, …} appended.
- Parameters:
measures – Measures table with columns [‘mc’, ‘next’, ‘dont_count’]
Returns:
- ms3.utils.functions.unfold_repeats(df: DataFrame, playthrough_info: Series | DataFrame, logger=None) DataFrame[source]#
Use a succession of MCs to bring a DataFrame into this succession. MCs may repeat.
- Parameters:
df – DataFrame needs to have the column ‘mc’. If ‘mn’ is present, the column ‘mn_playthrough’ will be added, too.
playthrough_info – A Series of the format {mc_playthrough: mc} where mc_playthrough corresponds to continuous MC.
- Returns:
A copy of the dataframe with the columns ‘mc_playthrough’ and ‘mn_playthrough’ (if ‘mn’ is present) inserted.
- ms3.utils.functions.capture_parse_logs(logger_object: Logger, level: str | int = 'w', logger=None) LogCapturer[source]#
Within the context, the given logger will have an additional handler that captures all messages with level level or higher. At the end of the context, retrieve the message list via LogCapturer.content_list.
Example
with capture_parse_logs(logger, level='d') as capturer:
    # do the stuff of which you want to capture the log messages of the given level (and above)
    all_messages = capturer.content_list
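The mechanism described here, a temporary handler that collects messages at or above a level, can be sketched with the standard logging module. `ListHandler` and `capture_logs` are hypothetical names; this is not the LogCapturer class of ms3, and it takes numeric logging levels rather than abbreviations such as 'w' or 'd'.

```python
import logging
from contextlib import contextmanager


class ListHandler(logging.Handler):
    """Handler that stores formatted messages in a list."""

    def __init__(self, level):
        super().__init__(level)
        self.content_list = []

    def emit(self, record):
        self.content_list.append(self.format(record))


@contextmanager
def capture_logs(logger, level=logging.WARNING):
    """Attach a ListHandler for the duration of the context."""
    handler = ListHandler(level)
    logger.addHandler(handler)
    try:
        yield handler
    finally:
        logger.removeHandler(handler)  # always detach, even if the body raises


demo_logger = logging.getLogger("capture_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False  # keep the demo output out of the root logger

with capture_logs(demo_logger, level=logging.WARNING) as capturer:
    demo_logger.info("below the threshold, not captured")
    demo_logger.warning("kept")

captured = capturer.content_list
print(captured)
```

Because the handler object outlives the context, the captured list remains available after the handler has been removed from the logger.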
- ms3.utils.functions.write_metadata(metadata_df: DataFrame, path: str, index=False, logger=None) bool[source]#
Write the DataFrame metadata_df to path, updating an existing file rather than overwriting it.
- Parameters:
metadata_df – DataFrame with one row per piece and an index of strings identifying pieces. The index is used for updating a potentially pre-existent file, from which the first column ∈ (‘piece’, ‘fname’, ‘fnames’, ‘name’, ‘names’) will be used as index.
path – If folder path, the filename ‘metadata.tsv’ will be appended; file_path will be used as is but a warning is thrown if the extension is not .tsv
index – Pass True if you want the first column of the output to be a RangeIndex starting from 0.
- Returns:
True if the metadata were successfully written, False otherwise.
- ms3.utils.functions.enforce_piece_index_for_metadata(metadata_df: DataFrame, append=False, logger=None) DataFrame[source]#
Returns a copy of the DataFrame that has an index level called ‘piece’.
- ms3.utils.functions.overwrite_overview_section_in_markdown_file(file_path, md_str, logger=None)[source]#
- ms3.utils.functions.write_markdown(metadata_df: DataFrame, file_path: str, logger=None) None[source]#
Write a subset of the DataFrame metadata_df to file_path in markdown format. If the file exists, it will be scanned for a line containing the string ‘# Overview’ and overwritten from that line onwards.
- Parameters:
metadata_df – DataFrame containing metadata.
file_path – Path of the markdown file.
- ms3.utils.functions.ensure_correct_column_types(df: DataFrame, exclude_columns: Collection[str] | None = None) DataFrame[source]#
- ms3.utils.functions.write_tsv(df: DataFrame, file_path: str, pre_process: bool = True, logger=None, **kwargs)[source]#
Write a DataFrame to a TSV or CSV file based on the extension of ‘file_path’. By default, the index is not included, unless you pass index=True as additional keyword argument.
Uses: no_collections_no_booleans()
- Parameters:
df – DataFrame to write to disk.
file_path – File to create or overwrite. If the extension is .tsv, the argument ‘sep’ will be set to ‘\t’; otherwise the extension is expected to be .csv and the default separator ‘,’ will be used. The extension ‘zip’ is also allowed, but you need to provide the kwargs yourself, especially something like compression = dict(method='zip', archive_name='innername.csv').
pre_process – By default, DataFrame cells containing lists and tuples will be transformed to strings and Booleans will be converted to 0 and 1 (otherwise they will be written out as True and False). Pass False to prevent this.
**kwargs – Additional keyword arguments will be passed on to pandas.DataFrame.to_csv(). Default arguments are index=False and sep='\t' (assuming extension ‘.tsv’, see above).
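How the defaults and user-supplied keyword arguments might combine can be sketched without pandas. The helper `to_csv_kwargs` below is hypothetical and only assembles the dictionary one would pass on to pandas.DataFrame.to_csv(); it assumes that user kwargs take precedence over the defaults.

```python
import os


def to_csv_kwargs(file_path: str, **kwargs) -> dict:
    """Assemble keyword arguments for DataFrame.to_csv() (hypothetical helper)."""
    ext = os.path.splitext(file_path)[1].lower()
    defaults = {"index": False}
    if ext == ".tsv":
        defaults["sep"] = "\t"
    elif ext == ".csv":
        defaults["sep"] = ","
    # For any other extension (e.g. .zip) the caller supplies what is needed.
    defaults.update(kwargs)  # assumption: user arguments override the defaults
    return defaults


print(to_csv_kwargs("notes.tsv"))
print(to_csv_kwargs("notes.csv", index=True))
print(to_csv_kwargs("notes.zip", sep="\t",
                    compression=dict(method="zip", archive_name="notes.tsv")))
```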
- ms3.utils.functions.abs2rel_key(absolute: str, localkey: str, global_minor: bool = False, logger=None) str[source]#
Expresses a Roman numeral as scale degree relative to a given localkey. The result changes depending on whether Roman numeral and localkey are interpreted within a global major or minor key.
Uses: split_scale_degree()
- Parameters:
absolute – Absolute key expressed as Roman scale degree of the global key.
localkey – The local key in terms of which absolute will be expressed.
global_minor – Has to be set to True if absolute and localkey are scale degrees of a global minor key.
Examples
In a minor context, the key of II would appear within the key of vii as #III.
>>> abs2rel_key('iv', 'VI', global_minor=False)
'bvi'       # F minor expressed with respect to A major
>>> abs2rel_key('iv', 'vi', global_minor=False)
'vi'        # F minor expressed with respect to A minor
>>> abs2rel_key('iv', 'VI', global_minor=True)
'vi'        # F minor expressed with respect to Ab major
>>> abs2rel_key('iv', 'vi', global_minor=True)
'#vi'       # F minor expressed with respect to Ab minor
>>> abs2rel_key('VI', 'IV', global_minor=False)
'III'       # A major expressed with respect to F major
>>> abs2rel_key('VI', 'iv', global_minor=False)
'#III'      # A major expressed with respect to F minor
>>> abs2rel_key('VI', 'IV', global_minor=True)
'bIII'      # Ab major expressed with respect to F major
>>> abs2rel_key('VI', 'iv', global_minor=True)
'III'       # Ab major expressed with respect to F minor
- ms3.utils.functions.rel2abs_key(relative: str, localkey: str, global_minor: bool = False, logger=None) str | None[source]#
Expresses a Roman numeral that is expressed relative to a localkey as scale degree of the global key. For local keys {III, iii, VI, vi, VII, vii} the result changes depending on whether the global key is major or minor.
Uses: split_scale_degree()
- Parameters:
relative – Relative key or chord expressed as Roman scale degree of the local key.
localkey – The local key to which relative is relative.
global_minor – Has to be set to True if localkey is a scale degree of a global minor key.
Examples
If the label viio6/VI appears in the context of the local key VI or vi, the absolute key to which viio6 applies depends on the global key. The comments express the examples in relation to global C major or C minor.
>>> rel2abs_key('vi', 'VI', global_minor=False)
'#iv'       # vi of A major = F# minor
>>> rel2abs_key('vi', 'vi', global_minor=False)
'iv'        # vi of A minor = F minor
>>> rel2abs_key('vi', 'VI', global_minor=True)
'iv'        # vi of Ab major = F minor
>>> rel2abs_key('vi', 'vi', global_minor=True)
'biv'       # vi of Ab minor = Fb minor
The same examples hold if you’re expressing in terms of the global key the root of a VI-chord within the local keys VI or vi.
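Both conversions (abs2rel_key() and rel2abs_key()) can be understood as arithmetic on the line of fifths. The following is a minimal sketch of that idea, not the ms3 implementation: the function names, the lookup tables and the restriction to plain numerals with leading accidentals are assumptions made for illustration.

```python
import re

# Position of each scale degree on the line of fifths, relative to the tonic.
MAJOR = {"I": 0, "II": 2, "III": 4, "IV": -1, "V": 1, "VI": 3, "VII": 5}
MINOR = {"I": 0, "II": 2, "III": -3, "IV": -1, "V": 1, "VI": -4, "VII": -2}


def is_minor(numeral: str) -> bool:
    """Lowercase numerals denote minor keys."""
    return numeral.lstrip("#b").islower()


def numeral2fifths(numeral: str, minor_context: bool) -> int:
    """Fifths between the tonic and the root of `numeral`, e.g. 'bVI' or '#iv'."""
    accidentals, degree = re.fullmatch(r"([#b]*)([IViv]+)", numeral).groups()
    base = (MINOR if minor_context else MAJOR)[degree.upper()]
    return base + 7 * (accidentals.count("#") - accidentals.count("b"))


def fifths2numeral(fifths: int, minor_context: bool, lowercase: bool) -> str:
    """Inverse of numeral2fifths: exactly one degree matches modulo 7."""
    for degree, base in (MINOR if minor_context else MAJOR).items():
        shift, remainder = divmod(fifths - base, 7)
        if remainder == 0:
            accidentals = "#" * shift if shift > 0 else "b" * -shift
            return accidentals + (degree.lower() if lowercase else degree)
    raise ValueError(fifths)  # not reachable: the seven degrees cover all residues


def rel2abs_sketch(relative: str, localkey: str, global_minor: bool = False) -> str:
    total = (numeral2fifths(localkey, global_minor)
             + numeral2fifths(relative, is_minor(localkey)))
    return fifths2numeral(total, global_minor, is_minor(relative))


def abs2rel_sketch(absolute: str, localkey: str, global_minor: bool = False) -> str:
    diff = (numeral2fifths(absolute, global_minor)
            - numeral2fifths(localkey, global_minor))
    return fifths2numeral(diff, is_minor(localkey), is_minor(absolute))


print(rel2abs_sketch("vi", "VI"), abs2rel_sketch("iv", "VI"))
```

The mode of the local key decides which table is used for the relative numeral, and the mode of the global key decides which table is used for the local key and for the absolute result; this is why the results differ for III, VI and VII.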
- ms3.utils.functions.make_interval_index_from_durations(df, position_col='quarterbeats', duration_col='duration_qb', closed='left', round=None, name='interval', logger=None)[source]#
Given an annotations table with positions and durations, create a pandas.IntervalIndex. Returns None if any row is underspecified.
- Parameters:
df (pandas.DataFrame) – Annotation table containing the columns of position_col (default: ‘quarterbeats’) and duration_col (default: ‘duration_qb’).
position_col (str, optional) – Name of the column containing positions, used as left boundaries.
duration_col (str, optional) – Name of the column containing durations which will be added to the positions to obtain right boundaries.
closed (str, optional) – ‘left’, ‘right’ or ‘both’, defining the interval boundaries.
round (int, optional) – To how many decimal places to round the intervals’ boundary values.
name (str, optional) – Name of the created index. Defaults to ‘interval’.
- Returns:
A copy of df with the original index replaced and underspecified rows removed (those where no interval could be computed).
- ms3.utils.functions.replace_index_by_intervals(df, position_col='quarterbeats', duration_col='duration_qb', closed='left', filter_zero_duration=False, round=None, name='interval', logger=None)[source]#
Given an annotations table with positions and durations, replaces its index with a pandas.IntervalIndex. Underspecified rows are removed.
- Parameters:
df (pandas.DataFrame) – Annotation table containing the columns of position_col (default: ‘quarterbeats’) and duration_col (default: ‘duration_qb’).
position_col (str, optional) – Name of the column containing positions.
duration_col (str, optional) – Name of the column containing durations.
closed (str, optional) – ‘left’, ‘right’ or ‘both’, defining the interval boundaries.
filter_zero_duration (bool, optional) – Defaults to False, meaning that rows with zero durations are maintained. Pass True to remove them.
round (int, optional) – To how many decimal places to round the intervals’ boundary values.
name (str, optional) – Name of the created index. Defaults to ‘interval’.
- Returns:
A copy of df with the original index replaced and underspecified rows removed (those where no interval could be computed).
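The construction of such an index can be sketched with plain pandas calls. This is an illustration under assumptions (a toy table, missing values standing in for underspecified rows), not the code of the two functions above.

```python
import pandas as pd

# Toy annotation table; the third row is underspecified (no position).
df = pd.DataFrame({
    "quarterbeats": [0.0, 1.0, None, 3.0],
    "duration_qb": [1.0, 2.0, 1.0, 0.0],
})

# Drop rows for which no interval can be computed.
complete = df.dropna(subset=["quarterbeats", "duration_qb"])

# Positions are the left boundaries, position + duration the right ones.
left = complete["quarterbeats"].to_numpy(dtype=float)
right = left + complete["duration_qb"].to_numpy(dtype=float)
intervals = pd.IntervalIndex.from_arrays(left, right, closed="left", name="interval")

indexed = complete.set_index(intervals)
print(indexed)
```

Note that the row with zero duration is kept here and yields an interval whose left and right boundaries coincide.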
- ms3.utils.functions.boolean_mode_col2strings(S) Series[source]#
Turn the boolean is_minor columns into string columns such that True => ‘minor’, False => ‘major’.
- ms3.utils.functions.replace_boolean_mode_by_strings(df) DataFrame[source]#
Replaces boolean ‘_is_minor’ columns with string columns renamed to ‘_mode’. Example: df[‘some_col’, ‘some_name_is_minor’] => df[‘some_col’, ‘some_name_mode’]
- ms3.utils.functions.resolve_relative_keys(relativeroot, minor=False, logger=None)[source]#
Resolve nested relative keys, e.g. ‘V/V/V’ => ‘VI’ if minor is False (default) or ‘#VI’ if True.
Uses:
rel2abs_key(),str_is_minor()
- ms3.utils.functions.series_is_minor(S, is_name=True)[source]#
Returns a boolean Series where every value in S representing a minor key/chord is True.
- ms3.utils.functions.str_is_minor(tone, is_name=True)[source]#
Returns True if tone represents a minor key or chord.
- ms3.utils.functions.transpose_changes(changes, old_num, new_num, old_minor=False, new_minor=False, logger=None)[source]#
Since the interval sizes expressed by the changes of the DCML harmony syntax depend on the numeral’s position in the scale, these may change if the numeral is transposed. This function expresses the same changes for the new position. Chord tone alterations (of 3 and 5) stay untouched.
Uses: changes2tpc()
- Parameters:
changes (str) – A string of changes following the DCML harmony standard.
old_num (str) – Old numeral.
new_num (str) – New numeral.
old_minor (bool, optional) – Pass True if the old numeral occurs in a minor context.
new_minor (bool, optional) – Pass True if the new numeral occurs in a minor context.
- ms3.utils.functions.features2tpcs(numeral, form=None, figbass=None, changes=None, relativeroot=None, key='C', minor=None, merge_tones=True, bass_only=False, mc=None, logger=None)[source]#
Given the features of a chord label, this function returns the chord tones in the order of the inversion, starting from the bass note. The tones are expressed as tonal pitch classes, where -1=F, 0=C, 1=G etc.
Uses: changes2list(), name2fifths(), resolve_relative_keys(), roman_numeral2fifths(), sort_tpcs(), str_is_minor()
- Parameters:
numeral (str) – Roman numeral of the chord’s root.
form ({None, 'M', 'o', '+', '%'}, optional) – Indicates the chord type if not a major or minor triad (for which form is None). ‘%’ and ‘M’ can only occur as tetrads, not as triads.
figbass ({None, '6', '64', '7', '65', '43', '2'}, optional) – Indicates the chord’s inversion. Pass None for triad root position.
changes (str, optional) – Added steps such as ‘+6’ or suspensions such as ‘4’ or any combination such as (9+64). Numbers need to be in descending order.
relativeroot (str, optional) – Pass a Roman scale degree if numeral is to be applied to a different scale degree of the local key, as in ‘V65/V’.
key (str or int, optional) – The local key expressed as the root’s note name or a tonal pitch class. If it is a name and minor is None, uppercase means major and lowercase minor. If it is a tonal pitch class, minor needs to be specified.
minor (bool, optional) – Pass True for minor and False for major. Can be omitted if key is a note name. This affects calculation of chords related to III, VI and VII.
merge_tones (bool, optional) – Pass False if you want the function to return two tuples, one with (potentially suspended) chord tones and one with added notes.
bass_only (bool, optional) – Return only the bass note instead of all chord tones.
mc (int or str) – Pass measure count to display it in warnings.
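The tonal pitch class encoding (-1=F, 0=C, 1=G etc.) counts steps on the line of fifths. The sketch below illustrates it for plain triads only; `name_to_tpc` and `triad` are hypothetical helpers and cover none of the forms, changes or relative roots handled by features2tpcs().

```python
FIFTHS = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}


def name_to_tpc(name: str) -> int:
    """Note name -> tonal pitch class; each '#' adds 7 fifths, each 'b' subtracts 7."""
    letter, accidentals = name[0].upper(), name[1:]
    return FIFTHS[letter] + 7 * (accidentals.count("#") - accidentals.count("b"))


def triad(root_tpc: int, minor: bool = False, inversion: int = 0) -> tuple:
    """Chord tones in the order of the inversion, starting from the bass note."""
    third = root_tpc - 3 if minor else root_tpc + 4  # minor third = -3 fifths, major third = +4
    tones = [root_tpc, third, root_tpc + 1]          # perfect fifth = +1
    return tuple(tones[inversion:] + tones[:inversion])


print(triad(name_to_tpc("C")))               # C major, root position
print(triad(name_to_tpc("G"), inversion=1))  # G major, first inversion
```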
- ms3.utils.functions.path2parent_corpus(path)[source]#
Walk up the path and return the name of the first superdirectory that is a git repository or contains a ‘metadata.tsv’ file.
- ms3.utils.functions.chord2tpcs(chord: str, regex: Pattern | None = None, logger: Logger | None = None, **kwargs)[source]#
Split a chord label into its features and apply features2tpcs().
Uses: features2tpcs()
- Parameters:
chord – Chord label that can be split into the features [‘numeral’, ‘form’, ‘figbass’, ‘changes’, ‘relativeroot’].
regex – Compiled regex with named groups for the five features. By default, the current version of the DCML harmony annotation standard is used.
**kwargs – arguments for features2tpcs (pass mc=MC to show it in warnings!)
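Splitting a label into the five features with named groups could look roughly as follows. The pattern is a deliberately simplified stand-in and an assumption made for illustration; it is not the regular expression of the DCML harmony annotation standard.

```python
import re

# Simplified pattern with one named group per feature (assumption, not the DCML regex).
SIMPLE_CHORD_REGEX = re.compile(
    r"^(?P<numeral>[b#]*(?:VII|VI|V|IV|III|II|I|vii|vi|v|iv|iii|ii|i))"
    r"(?P<form>[%o+M])?"
    r"(?P<figbass>65|64|43|7|6|2)?"
    r"(?:\((?P<changes>[^)]*)\))?"
    r"(?:/(?P<relativeroot>.+))?$"
)


def split_chord(label: str):
    """Return a dict of the five features, or None if the label does not match."""
    match = SIMPLE_CHORD_REGEX.match(label)
    return match.groupdict() if match else None


print(split_chord("V65/V"))
print(split_chord("viio6/VI"))
```

Features that are absent from a label come back as None, which matches the defaults of features2tpcs() for form, figbass, changes and relativeroot.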
- ms3.utils.functions.parse_ignored_warnings(messages: Collection[str]) Iterator[Tuple[str, Tuple[int]]][source]#
Turns a list of log messages into an iterator of (logger_name, (message_info, …)) pairs. Log messages consist of a header of the shape WARNING_ENUM_MEMBER (enum_value, [mc, more_info…]) ms3.(Parse|Corpus).corpus.piece [– potentially more, irrelevant stuff]. The header might be followed by several lines of comments, each beginning with a space or tab.
- ms3.utils.functions.ignored_warnings2dict(messages: Collection[str]) Dict[str, List[Tuple[int]]][source]#
- Parameters:
messages
- Returns:
{logger_name -> [ignored_warnings]} dict.
- ms3.utils.functions.parse_ignored_warnings_file(path: str) Dict[str, List[Tuple[int, Tuple[int]]]][source]#
Parses a file containing the log messages that are to be ignored and returns them as a dict. Expected structure of a message: warning_type (warning_type_id, *integers) file
Example of a message: INCORRECT_VOLTA_MN_WARNING (2, 94) ms3.Parse.mixed_files.Did03M-Son_regina-1762-Sarti.mscx.MeasureList
- Parameters:
path (str) – Path to IGNORED_WARNINGS
- Returns:
{logger_name: [(message_id, label_of_message), (message_id, label_of_message), …]}.
- Return type:
dict
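The header shape described for these functions lends itself to a regular expression. The following is a rough sketch, assuming that all values inside the parentheses are integers and that comment lines start with a space or tab; the names `HEADER` and `parse_headers` are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

HEADER = re.compile(r"^(?P<warning>[A-Z_]+) \((?P<info>[^)]*)\) (?P<logger>ms3\.\S+)")


def parse_headers(lines: Iterable[str]) -> Iterator[Tuple[str, Tuple[int, ...]]]:
    """Yield (logger_name, message_info) for every header line."""
    for line in lines:
        if line[:1] in (" ", "\t"):
            continue  # comment line following a header
        match = HEADER.match(line)
        if match is None:
            continue  # empty or unrelated line
        info = tuple(int(value) for value in match["info"].split(","))
        yield match["logger"], info


messages = [
    "INCORRECT_VOLTA_MN_WARNING (2, 94) "
    "ms3.Parse.mixed_files.Did03M-Son_regina-1762-Sarti.mscx.MeasureList",
    "\tA comment explaining why the warning is ignored.",
]
ignored: Dict[str, List[Tuple[int, ...]]] = {}
for logger_name, info in parse_headers(messages):
    ignored.setdefault(logger_name, []).append(info)
print(ignored)
```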
- ms3.utils.functions.overlapping_chunk_per_interval(df: DataFrame, intervals: List[Interval], truncate: bool = True) Dict[Interval, DataFrame][source]#
For each interval, create a chunk of the given DataFrame based on its IntervalIndex. This is an optimized algorithm compared to calling IntervalIndex.overlaps(interval) for each given interval, with the additional advantage that it will not discard rows where the interval is zero, such as [25.0, 25.0).
- Parameters:
df (pandas.DataFrame) – The DataFrame is expected to come with an IntervalIndex and contain the columns ‘quarterbeats’ and ‘duration_qb’. Those can be obtained through Parse.get_lists(interval_index=True) or Parse.iter_transformed(interval_index=True).
intervals (list of pd.Interval) – The intervals defining the chunks’ dimensions. Expected to be non-overlapping and monotonically increasing.
truncate (bool, optional) – Defaults to True, meaning that the interval index and the ‘duration_qb’ will be adapted for overlapping intervals. Pass False to get chunks with all overlapping intervals as they are.
- Returns:
dict {interval -> chunk}
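The chunking and truncation behaviour can be illustrated with a naive loop over (position, duration) pairs. This sketch is not the optimized algorithm mentioned above; it uses plain tuples instead of a DataFrame, assumes left-closed intervals, and the helper `overlapping_rows` is hypothetical.

```python
def overlapping_rows(rows, interval, truncate=True):
    """Return the (position, duration) rows overlapping the left-closed `interval`."""
    left, right = interval
    chunk = []
    for start, duration in rows:
        end = start + duration
        if duration == 0:
            # Zero-length rows such as [25.0, 25.0) are kept if they start inside.
            overlaps = left <= start < right
        else:
            overlaps = start < right and end > left
        if not overlaps:
            continue
        if truncate:
            new_start, new_end = max(start, left), min(end, right)
            chunk.append((new_start, new_end - new_start))
        else:
            chunk.append((start, duration))
    return chunk


rows = [(0.0, 4.0), (4.0, 2.0), (6.0, 0.0), (6.0, 3.0)]
chunks = {iv: overlapping_rows(rows, iv) for iv in [(0.0, 5.0), (5.0, 8.0)]}
print(chunks)
```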
- ms3.utils.functions.infer_tsv_type(df: DataFrame) str | None[source]#
Infers the contents of a DataFrame from the presence of particular columns.
- ms3.utils.functions.reduce_dataframe_duration_to_first_row(df: DataFrame) DataFrame[source]#
Reduces a DataFrame to its first row and updates the duration_qb column to reflect the reduced duration.
- Parameters:
df – Dataframe of which to keep only the first row. If it has an IntervalIndex, the interval is updated to reflect the whole duration.
- Returns:
DataFrame with one row.
- class ms3.utils.functions.File(ix: int, type: str, file: str, piece: str, fext: str, subdir: str, corpus_path: str, rel_path: str, full_path: str, directory: str, suffix: str = '', commit_sha: str = '')[source]#
Bases: object
Storing path and file name information for one file.
- corpus_path: str#
Absolute path of the file’s parent directory that is considered as corpus directory.
- suffix: str = ''#
Upon registering the File with a Piece, if the current piece has a suffix compared to the Piece’s piece, the suffix is removed from the File object’s piece field and added to the suffix field.
- commit_sha: str = ''#
If the file has been retrieved from a particular git revision, this is set to the revision’s hash.
- classmethod from_corpus_path(corpus_path: str, filename: str, ftype: str | None = None, subdir='.', ix: int = -1)[source]#
Creates File object from individual components
- Parameters:
corpus_path – Root directory of the file’s corpus.
filename – Full file name including suffixes and extensions.
ftype – File type (used as default folder name for creating file_paths).
subdir – relative directory appended to corpus_path, defaults to ‘.’, i.e. no subfolder.
ix – Arbitrary index number, defaults to -1.
- ms3.utils.functions.automatically_choose_from_disambiguated_files(disambiguated_choices: Dict[str, File], piece: str, file_type: str, logger=None) File[source]#
- ms3.utils.functions.ask_user_to_choose(query: str, choices: Collection[Any]) Any | None[source]#
Ask user to input an integer and return the nth choice selected by the user.
- ms3.utils.functions.ask_user_to_choose_from_disambiguated_files(disambiguated_choices: Dict[str, File], piece: str, file_type: str = '') File | None[source]#
- ms3.utils.functions.disambiguate_files(files: Collection[File], piece: str, file_type: str, choose: Literal['auto', 'ask'] = 'auto', logger=None) File | None[source]#
Receives a collection of File objects with the aim to pick one of them. First, a dictionary is created where the keys are disambiguation strings based on the files’ paths and suffixes.
- Parameters:
files
choose – If ‘auto’ (default), the file with the shortest disambiguation string is chosen. Set to ‘ask’ if you want to be asked to manually choose a file.
- Returns:
The selected file.
- ms3.utils.functions.files2disambiguation_dict(files: Collection[File], include_disambiguator: bool = False, logger=None) Dict[str, File][source]#
Takes a list of File objects and returns a dictionary of distinct disambiguating strings, based on path components, that distinguish files pertaining to the same type.
- ms3.utils.functions.literal_type2tuple(typ: TypeVar) Tuple[str][source]#
Turns the first Literal included in the TypeVar into a list of values. The first literal value needs to be a string, otherwise the function may lead to unexpected behaviour.
- ms3.utils.functions.argument_and_literal_type2list(argument: str | Tuple[str] | Literal[None], typ: TypeVar | Tuple[str] | None = None, none_means_all: bool = True, logger=None) List[str] | None[source]#
Makes sure that an input value is a list of strings and that all strings are valid w.r.t. the type’s expected literal values (strings).
- Parameters:
argument – If string, wrapped in a list, otherwise expected to be a tuple of strings (passing a list will fail). If None, a list of all possible values according to the type is returned if none_means_all.
typ – A typing.Literal declaration or a TypeVar where the first component is one, or a tuple of allowed values. All allowed values should be strings.
none_means_all – By default, None values are replaced with all allowed values, if specified. Pass False to return None in this case.
- Returns:
The list of accepted strings. The list of rejected strings.
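Validating an argument against a typing.Literal declaration can be sketched with typing.get_args(). The `Facet` type and the helper `as_validated_list` are hypothetical; the sketch silently drops rejected values, which is an assumption rather than documented behaviour.

```python
from typing import List, Literal, Optional, Tuple, Union, get_args

Facet = Literal["measures", "notes", "rests"]  # hypothetical literal type


def as_validated_list(
    argument: Union[str, Tuple[str, ...], None],
    typ=Facet,
    none_means_all: bool = True,
) -> Optional[List[str]]:
    allowed = get_args(typ)  # the literal values as a tuple
    if argument is None:
        return list(allowed) if none_means_all else None
    if isinstance(argument, str):
        argument = (argument,)  # wrap a single string
    return [value for value in argument if value in allowed]


print(as_validated_list("notes"))
print(as_validated_list(None))
print(as_validated_list(("notes", "bogus")))
```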
- ms3.utils.functions.check_argument_against_literal_type(argument: str, typ: L, logger=None) L | None[source]#
- ms3.utils.functions.resolve_facets_param(facets, facet_type_var: TypeVar = typing.Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], none_means_all=True, logger=None)[source]#
Like argument_and_literal_type2list(), but also resolves ‘tsv’ to all non-score facets.
- ms3.utils.functions.available_views2str(views_dict: Dict[str, View], active_view_name: str = None) str[source]#
- ms3.utils.functions.unpack_json_paths(paths: Collection[str], logger=None) None[source]#
Mutates the list with paths by replacing .json files with the list (of paths) contained in them.
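In-place replacement of .json entries might be sketched as follows, assuming that each JSON file contains a plain list of path strings. The function name and the file names are made up for the example.

```python
import json
import os
import tempfile


def unpack_json_paths_sketch(paths: list) -> None:
    """Replace every .json entry with the paths listed in that file (in place)."""
    unpacked = []
    for path in paths:
        if path.endswith(".json"):
            with open(path, encoding="utf-8") as f:
                unpacked.extend(json.load(f))  # assumption: the file holds a list
        else:
            unpacked.append(path)
    paths[:] = unpacked  # slice assignment mutates the caller's list


with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "selection.json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(["a.mscx", "b.mscx"], f)
    paths = ["first.mscx", json_path, "last.mscx"]
    unpack_json_paths_sketch(paths)

print(paths)
```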
- ms3.utils.functions.resolve_paths_argument(paths: str | Collection[str], files: bool = True, logger=None) List[str][source]#
Makes sure that the given path(s) exist(s) and filters out those that don’t.
- Parameters:
paths – One or several paths given as strings.
files – By default, only file paths are returned. Set to False to return only folders.
Returns:
- ms3.utils.functions.compute_path_from_file(file: File, root_dir: str | None = None, folder: str | None = None, logger=None) str[source]#
Constructs a path based on the arguments.
- Args:
file: This function uses the fields corpus_path, subdir, and type.
root_dir: Defaults to None, meaning that the path is constructed based on the corpus_path. Pass a directory to construct the path relative to it instead. If folder is an absolute path, root_dir is ignored.
folder:
* If folder is None (default), the files’ type will be appended to the root_dir.
* If folder is an absolute path, root_dir will be ignored.
* If folder is a relative path starting with a dot ., the relative path is appended to the file’s subdir. For example, ``..\notes`` will resolve to a sibling directory of the one where the file is located.
* If folder is a relative path that does not begin with a dot ., it will be appended to the root_dir.
* If folder == ‘’ (empty string), the result will be root_dir.
- Returns:
The constructed directory path.
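The rules for folder can be read as a small decision cascade. The sketch below takes the relevant File fields as plain strings and uses posixpath so that the result is the same on every platform; the function name and the exact joining and normalisation are assumptions, not the ms3 code.

```python
import posixpath
from typing import Optional


def resolve_directory(corpus_path: str, subdir: str, ftype: str,
                      root_dir: Optional[str] = None,
                      folder: Optional[str] = None) -> str:
    root = corpus_path if root_dir is None else root_dir
    if folder is None:
        return posixpath.join(root, ftype)      # file type as folder name
    if folder == "":
        return root                             # empty string: the root itself
    if posixpath.isabs(folder):
        return folder                           # absolute path: root is ignored
    if folder.startswith("."):
        # relative to the file's subdir, e.g. '../notes' -> sibling directory
        return posixpath.normpath(posixpath.join(root, subdir, folder))
    return posixpath.join(root, folder)         # relative to the root


print(resolve_directory("/corpus", "MS3", "notes"))
print(resolve_directory("/corpus", "MS3", "notes", folder="../notes"))
print(resolve_directory("/corpus", "MS3", "notes", root_dir="/tmp/x", folder="results"))
```

A file path as built by make_file_path() would then presumably result from joining this directory with piece + suffix + fext.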
- ms3.utils.functions.make_file_path(file: File, root_dir=None, folder: str = None, suffix: str = '', fext: str = '.tsv')[source]#
Constructs a file path based on the arguments.
- Args:
file: This function uses the fields piece, corpus_path, subdir, and type.
root_dir: Defaults to None, meaning that the path is constructed based on the corpus_path. Pass a directory to construct the path relative to it instead. If folder is an absolute path, root_dir is ignored.
folder: Different behaviours are available. Note that only the third option ensures that file paths are distinct for files that have identical pieces but are located in different subdirectories of the same corpus.
* If folder is None (default), the files’ type will be appended to the root_dir.
* If folder is an absolute path, root_dir will be ignored.
* If folder is a relative path starting with a dot ., the relative path is appended to the file’s subdir. For example, ``..\notes`` will resolve to a sibling directory of the one where the file is located.
* If folder is a relative path that does not begin with a dot ., it will be appended to the root_dir.
suffix: String to append to the file’s piece.
fext: File extension to append to the (piece+suffix). Defaults to .tsv.
- Returns:
The constructed file path.
- ms3.utils.functions.string2identifier(s: str, remove_leading_underscore: bool = True) str[source]#
Transform a string in a way that it can be used as identifier (variable or attribute name). Solution by Kenan Banks on https://stackoverflow.com/a/3303361
- ms3.utils.functions.resolve_git_revision(repo_path: str, git_revision: str | None, logger=None) str | None[source]#
Returns the commit hash for the given revision.
- Parameters:
repo_path
git_revision – Any specifier that git understands (branch, tag, commit hash, “HEAD”, etc.). In addition, “LATEST_VERSION” can be passed to get the tag with the highest version number. None defaults to “HEAD”.
logger
- Returns:
Hash of the commit that corresponds to the given revision specifier.
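Picking the tag with the highest version number requires a numeric rather than alphabetical comparison. The sketch below assumes tag names of the form v1.2.3 (with an optional leading v); both this naming scheme and the helper `latest_version_tag` are assumptions, and resolving the chosen tag to a commit hash is left to git.

```python
import re
from typing import Iterable, Optional


def latest_version_tag(tags: Iterable[str]) -> Optional[str]:
    """Return the tag with the highest dotted version number, or None."""
    versioned = []
    for tag in tags:
        match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", tag)
        if match:  # tags that are not version numbers are skipped
            version = tuple(int(part) for part in match.group(1).split("."))
            versioned.append((version, tag))
    return max(versioned)[1] if versioned else None


print(latest_version_tag(["v1.2.0", "v1.10.0", "v1.9.3", "nightly"]))
```

Comparing integer tuples ranks v1.10.0 above v1.9.3, which a plain string comparison would not.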
- ms3.utils.functions.parse_tsv_file_at_git_revision(file: File, git_revision: str, repo_path: str | None = None, logger=None) Tuple[File | None, DataFrame | None][source]#
Pass a File object of a TSV file and an identifier for a git revision to retrieve the parsed TSV file at that commit. The file needs to have existed at the revision in question.
- Args:
file: git_revision: repo_path:
Returns:
- ms3.utils.functions.write_messages_to_file_or_remove(warnings_file: str, warnings: List[str], header: str, logger=None) bool[source]#
- ms3.utils.functions.write_warnings_to_file(warnings_file: str, warnings: List[str], header: str | None = None, logger=None)[source]#
- ms3.utils.functions.write_validation_errors_to_file(errors_file: str, errors: List[str], header: str | None = None, logger=None)[source]#
- ms3.utils.functions.make_oneliner(node)[source]#
Pass a tag of which the layout does not spread over several lines.
- ms3.utils.functions.format_node(node, indent)[source]#
Recursively format Beautifulsoup tag as in an MSCX file.
- ms3.utils.functions.bs4_to_mscx(soup: BeautifulSoup)[source]#
Turn the BeautifulSoup into a string representing an MSCX file
- ms3.utils.functions.write_score_to_handler(soup: BeautifulSoup, file_handler: IO, logger=None) bool[source]#
- ms3.utils.functions.write_soup_to_mscx_file(soup: Tag, mscx_path: str, overwrite: bool = False, logger=None) bool[source]#
- ms3.utils.functions.update_relative_paths_with_corpus_dirs(concatenated: DataFrame) None[source]#
Assumes that the first index level includes folder names and adds them to the relative paths. The first column to be updated is “subdirectory” (default name) or “rel_paths” (old name). The second column, if present, is “rel_path”. The operation is performed in-place.
- ms3.utils.functions.concat_metadata_dfs(corpus2metadata_df: Dict[str, DataFrame]) DataFrame[source]#
Concats the dataframes corresponding to the metadata.tsv files of sub-corpora. The corpus names will be prepended as an additional index level and to the relative file paths in the column “subdirectory” (default name) or “rel_paths” (old name).
- Parameters:
corpus2metadata_df – Dictionary mapping corpus names (i.e., folder names) to parsed metadata.tsv files.