Transforms
The forward and inverse transforms follow the librosa convention: time-domain signals have the shape (*batch, samples), and time-frequency representations have the shape (*batch, bins, frames).
Forward Transforms
Functions that take in time-domain signals and output time-frequency representations.
korvax.stft
stft(
x,
/,
n_fft=2048,
hop_length=None,
win_length=None,
window="hann",
center=True,
pad_kwargs=dict(),
)
Compute the short-time Fourier transform (STFT) of a time-domain signal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Float[ArrayLike, '*channels n_samples'] | Input signal. | required |
| n_fft | int | FFT size (number of samples per frame). | 2048 |
| hop_length | int \| None | Hop (step) length between adjacent frames. If None, defaults to | None |
| win_length | int \| None | Length of the analysis window. If None, defaults to | None |
| window | _WindowSpec | Either a 1d array containing the window to apply to each frame, or a window specification (see get_window). | 'hann' |
| center | bool | If True, pad the input so that frames are centered on their timestamps. | True |
| pad_kwargs | dict[str, Any] | Additional keyword arguments forwarded to pad_center. | dict() |
Returns:
| Type | Description |
|---|---|
| Complex[Array, '*channels {n_fft}//2+1 n_frames'] | STFT coefficients. |
Source code in src/korvax/transforms/fourier.py
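The shape convention above can be sanity-checked with a little arithmetic. The sketch below does not call korvax. The helper name `stft_output_shape` is hypothetical, `hop_length` is passed explicitly, and the frame count assumes librosa-style framing in which `center=True` pads `n_fft // 2` samples on each side. Only the bin count `n_fft // 2 + 1` comes from the return annotation above.

```python
def stft_output_shape(n_samples, n_fft=2048, hop_length=512, center=True, batch=()):
    """Expected (*batch, bins, frames) shape under assumed librosa-style framing."""
    # One-sided spectrum: matches the '{n_fft}//2+1' axis in the return annotation.
    n_bins = n_fft // 2 + 1
    if center:
        # Assumption: centering pads n_fft // 2 samples on both ends.
        n_samples = n_samples + 2 * (n_fft // 2)
    if n_samples < n_fft:
        raise ValueError("signal is shorter than a single frame")
    # Number of full frames that fit when stepping by hop_length.
    n_frames = 1 + (n_samples - n_fft) // hop_length
    return (*batch, n_bins, n_frames)


# One second of stereo audio at 22050 Hz.
shape = stft_output_shape(22050, n_fft=2048, hop_length=512, batch=(2,))
print(shape)
```

Comparing this tuple against the shape of the array returned by the real transform is a quick way to confirm that the bins axis precedes the frames axis.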
korvax.spectrogram
spectrogram(
x,
/,
n_fft=2048,
hop_length=None,
win_length=None,
window="hann",
center=True,
power=2.0,
pad_kwargs=dict(),
)
Compute the magnitude spectrogram of a time-domain signal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Float[ArrayLike, '*channels n_samples'] | Input signal. | required |
| n_fft | int | FFT size (number of samples per frame). | 2048 |
| hop_length | int \| None | Hop (step) length between adjacent frames. If None, defaults to | None |
| win_length | int \| None | Length of the analysis window. If None, defaults to | None |
| window | _WindowSpec | Either a 1d array containing the window to apply to each frame, or a window specification (see get_window). | 'hann' |
| center | bool | If True, pad the input so that frames are centered on their timestamps. | True |
| power | float \| int \| None | Exponent for the magnitude spectrogram. If 2.0, returns power spectrogram. If None, returns complex STFT coefficients. | 2.0 |
| pad_kwargs | dict[str, Any] | Additional keyword arguments forwarded to pad_center. | dict() |
Returns:
| Type | Description |
|---|---|
| Inexact[Array, '*channels {n_fft}//2+1 n_frames'] | Magnitude spectrogram. |
Source code in src/korvax/transforms/fourier.py
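To make the role of `power` concrete, here is a minimal sketch of the exponent applied to a handful of complex coefficients. The helper `apply_power` is hypothetical and only mirrors the behaviour described in the table above (magnitude raised to `power`, or the complex values passed through when `power` is None). It is not korvax's implementation.

```python
def apply_power(coeffs, power=2.0):
    """Raise coefficient magnitudes to `power`; pass complex values through if None."""
    if power is None:
        # Mirrors "If None, returns complex STFT coefficients".
        return list(coeffs)
    return [abs(z) ** power for z in coeffs]


frame = [3 + 4j, 1j, 0j]                      # three made-up STFT bins
magnitude = apply_power(frame, power=1.0)     # magnitude spectrogram values
energy = apply_power(frame, power=2.0)        # power spectrogram values
unchanged = apply_power(frame, power=None)    # complex coefficients
```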
korvax.mel_spectrogram
mel_spectrogram(
x,
/,
sr,
n_fft,
n_mels=128,
fmin=0.0,
fmax=None,
hop_length=None,
win_length=None,
window="hann",
center=True,
power=2.0,
pad_kwargs=dict(),
)
Compute a mel-scaled spectrogram from a time-domain signal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Float[ArrayLike, '*channels n_samples'] | Input signal. | required |
| sr | float | Sample rate of the audio signal. | required |
| n_fft | int | FFT size (number of samples per frame). | required |
| n_mels | int | Number of mel bands to generate. | 128 |
| fmin | float | Minimum frequency (Hz). | 0.0 |
| fmax | float \| None | Maximum frequency (Hz). If None, defaults to | None |
| hop_length | int \| None | Hop (step) length between adjacent frames. If None, defaults to | None |
| win_length | int \| None | Length of the analysis window. If None, defaults to | None |
| window | _WindowSpec | Either a 1d array containing the window to apply to each frame, or a window specification (see get_window). | 'hann' |
| center | bool | If True, pad the input so that frames are centered on their timestamps. | True |
| power | float \| int | Exponent for the magnitude spectrogram. If 2.0, returns power spectrogram. | 2.0 |
| pad_kwargs | dict[str, Any] | Additional keyword arguments forwarded to pad_center. | dict() |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*channels {n_mels} n_frames'] | Mel-scale spectrogram. |
Source code in src/korvax/transforms/mel.py
korvax.mfcc
mfcc(
x,
/,
sr,
n_fft,
n_mfcc=20,
norm="ortho",
mag_scale="db",
lifter=0.0,
n_mels=128,
fmin=0.0,
fmax=None,
hop_length=None,
win_length=None,
window="hann",
center=True,
power=2.0,
pad_kwargs=dict(),
)
Compute mel-frequency cepstral coefficients (MFCCs) from a time-domain signal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Float[ArrayLike, '*channels n_samples'] | Input signal. | required |
| sr | float | Sample rate of the audio signal. | required |
| n_fft | int | FFT size (number of samples per frame). | required |
| n_mfcc | int | Number of MFCCs to return. | 20 |
| norm | Literal['backward', 'ortho'] \| None | Normalization mode for DCT. | 'ortho' |
| mag_scale | Literal['linear', 'log', 'db'] | Magnitude scaling to apply before DCT. Options are "linear" (no scaling), "log" (natural logarithm), or "db" (decibels). | 'db' |
| lifter | float | If greater than 0, apply liftering (cepstral filtering) with the specified coefficient. | 0.0 |
| n_mels | int | Number of mel bands to generate. | 128 |
| fmin | float | Minimum frequency (Hz). | 0.0 |
| fmax | float \| None | Maximum frequency (Hz). If None, defaults to | None |
| hop_length | int \| None | Hop (step) length between adjacent frames. If None, defaults to | None |
| win_length | int \| None | Length of the analysis window. If None, defaults to | None |
| window | _WindowSpec | Either a 1d array containing the window to apply to each frame, or a window specification (see get_window). | 'hann' |
| center | bool | If True, pad the input so that frames are centered on their timestamps. | True |
| power | float \| int | Exponent for the magnitude spectrogram. If 2.0, returns power spectrogram. | 2.0 |
| pad_kwargs | dict[str, Any] | Additional keyword arguments forwarded to pad_center. | dict() |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*channels {n_mfcc} n_frames'] | Mel-frequency cepstral coefficients. |
Source code in src/korvax/transforms/mel.py
korvax.cqt
cqt(
x,
/,
sr,
hop_length=512,
fmin=32.7,
fmax=None,
n_bins=84,
bins_per_octave=12,
filter_scale=1.0,
norm_kernels=1,
power=2.0,
window="hann",
center=True,
normalization_type="librosa",
pad_kwargs=dict(),
)
Compute the Constant-Q Transform (CQT) of a time-domain signal.
The CQT is a time-frequency representation with logarithmically-spaced frequency bins, making it well-suited for music analysis. This is a convenience wrapper that calls vqt with gamma=0.
Source code in src/korvax/transforms/_cqt.py
korvax.vqt
vqt(
x,
/,
sr,
hop_length=512,
fmin=32.7,
fmax=None,
n_bins=84,
gamma=0.0,
bins_per_octave=12,
filter_scale=1.0,
norm_kernels=1,
power=2.0,
window="hann",
center=True,
normalization_type="librosa",
pad_kwargs=dict(),
)
Compute the Variable-Q Transform (VQT) of a time-domain signal.
The VQT is a generalization of the Constant-Q Transform (CQT) that allows for variable bandwidth via the gamma parameter. When gamma=0, this is equivalent to the CQT.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Float[Array, ' n_samples'] | Input signal. | required |
| sr | float | Sample rate of the input signal. | required |
| hop_length | | Hop (step) length between adjacent frames. | 512 |
| fmin | float | Minimum frequency (Hz). | 32.7 |
| fmax | float \| None | Maximum frequency (Hz). If None, determined by | None |
| n_bins | int | Number of frequency bins. Ignored if | 84 |
| gamma | float | Bandwidth offset parameter. When gamma=0, this reduces to CQT. | 0.0 |
| bins_per_octave | int | Number of bins per octave. | 12 |
| filter_scale | float \| int | Scale factor for filter bandwidths. | 1.0 |
| norm_kernels | float \| int | Normalization mode for the filter kernels (p-norm to use). | 1 |
| power | int \| float \| None | Exponent for the magnitude spectrogram. If 2.0, returns power spectrogram. If None, returns complex VQT coefficients. | 2.0 |
| window | str \| float \| tuple | Window specification (see get_window). | 'hann' |
| center | bool | If True, pad the input so that frames are centered on their timestamps. | True |
| normalization_type | Literal['librosa', 'convolutional', 'wrap'] | Type of normalization to apply ("librosa", "convolutional", or "wrap"). | 'librosa' |
| pad_kwargs | | Additional keyword arguments forwarded to pad_center. | dict() |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' n_bins n_frames'] | VQT coefficients. |
Source code in src/korvax/transforms/_cqt.py
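The effect of `gamma` is easiest to see through filter bandwidths. The sketch below uses the textbook form, bandwidth = alpha * f + gamma, with alpha = 2^(1/bins_per_octave) - 1. This is an assumption for illustration: korvax may compute its relative bandwidth differently, and the helper name `vqt_bandwidths` is hypothetical.

```python
def vqt_bandwidths(n_bins=84, fmin=32.7, bins_per_octave=12, gamma=0.0):
    """Return (center_frequency, bandwidth) pairs for a textbook VQT filterbank."""
    # Assumed relative bandwidth of one bin (classic constant-Q definition).
    alpha = 2.0 ** (1.0 / bins_per_octave) - 1.0
    pairs = []
    for k in range(n_bins):
        freq = fmin * 2.0 ** (k / bins_per_octave)  # geometrically spaced centers
        bandwidth = alpha * freq + gamma            # gamma adds a constant offset
        pairs.append((freq, bandwidth))
    return pairs


# With gamma=0 the quality factor Q = f / bandwidth is the same for every bin.
q_constant = [f / b for f, b in vqt_bandwidths(gamma=0.0)]
# With gamma>0 low bins get relatively wider filters, so Q grows with frequency.
q_variable = [f / b for f, b in vqt_bandwidths(gamma=20.0)]
```

Wider low-frequency filters correspond to shorter kernels in time, which is the usual motivation for choosing a nonzero `gamma`.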
Inverse Transforms
Functions that take in time-frequency representations and output time-domain signals.
korvax.istft
istft(
x,
/,
n_fft=None,
hop_length=None,
win_length=None,
window="hann",
center=True,
length=None,
)
Compute the inverse short-time Fourier transform (ISTFT).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Complex[ArrayLike, '*channels n_freqs n_frames'] | STFT coefficients. | required |
| n_fft | int \| None | FFT size (number of samples per frame). | None |
| hop_length | int \| None | Hop (step) length between adjacent frames. If None, defaults to | None |
| win_length | int \| None | Length of the analysis window. If None, defaults to | None |
| window | _WindowSpec | Either a 1d array containing the window to apply to each frame, or a window specification (see get_window). | 'hann' |
| center | bool | If | True |
| length | int \| None | If provided, the output will be trimmed or zero-padded to exactly this length. | None |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*channels n_samples'] | Reconstructed time-domain signal. |
Source code in src/korvax/transforms/fourier.py
korvax.griffin_lim
griffin_lim(
S,
/,
key=None,
n_iter=32,
n_fft=None,
hop_length=None,
win_length=None,
window="hann",
center=True,
length=None,
momentum=0.99,
pad_kwargs=dict(),
)
Reconstruct a time-domain signal from a magnitude spectrogram using the Griffin-Lim algorithm.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S | Float[ArrayLike, '*channels n_freqs n_frames'] | Magnitude spectrogram. | required |
| key | PRNGKeyArray \| None | JAX PRNG key for random phase initialization. If None, uses zero phase initialization. | None |
| n_iter | int | Number of Griffin-Lim iterations to perform. | 32 |
| n_fft | int \| None | FFT size (number of samples per frame). If None, inferred from spectrogram shape. | None |
| hop_length | int \| None | Hop (step) length between adjacent frames. If None, defaults to | None |
| win_length | int \| None | Length of the analysis window. If None, defaults to | None |
| window | _WindowSpec | Either a 1d array containing the window to apply to each frame, or a window specification (see get_window). | 'hann' |
| center | bool | If True, frames are assumed to be centered in time. If False, they are assumed to be left-aligned in time. | True |
| length | int \| None | If provided, the output will be trimmed or zero-padded to exactly this length. | None |
| momentum | float | Momentum parameter for fast Griffin-Lim (typically between 0 and 1). | 0.99 |
| pad_kwargs | dict[str, Any] | Additional keyword arguments forwarded to pad_center. | dict() |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*channels n_samples'] | Reconstructed time-domain signal. |
Source code in src/korvax/transforms/fourier.py
Frequency Transforms
These functions take in frequency-domain representations and output modified frequency-domain representations. They are used in the above time-to-frequency transforms, but can also be used standalone.
korvax.cepstral_coefficients
cepstral_coefficients(
S, /, n_cc=20, norm="ortho", mag_scale="db", lifter=0.0
)
Compute cepstral coefficients from a spectrogram via discrete cosine transform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S | Float[Array, '*channels n_freqs n_frames'] | Input spectrogram. | required |
| n_cc | int | Number of cepstral coefficients to return. | 20 |
| norm | Literal['backward', 'ortho'] \| None | Normalization mode for DCT. | 'ortho' |
| mag_scale | Literal['linear', 'log', 'db'] | Magnitude scaling to apply before DCT. Options are "linear" (no scaling), "log" (natural logarithm), or "db" (decibels). | 'db' |
| lifter | float | If greater than 0, apply liftering (cepstral filtering) with the specified coefficient. | 0.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*channels {n_cc} n_frames'] | Cepstral coefficients. |
Source code in src/korvax/transforms/mel.py
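For a single frame, the pipeline described above (magnitude scaling, then a DCT, then keeping the first `n_cc` coefficients) can be sketched in plain Python. The sketch assumes a type-II DCT with orthonormal scaling and a simple dB conversion with a floor of `amin`. The DCT type, the helper names and the `amin` default are assumptions, not confirmed details of korvax.

```python
import math


def dct2_ortho(x):
    """Type-II DCT with orthonormal scaling (assumed to correspond to norm='ortho')."""
    n = len(x)
    out = []
    for k in range(n):
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # The k=0 term gets a different scale so that the transform is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out


def cepstrum_for_frame(power_frame, n_cc=20, amin=1e-10):
    """dB-scale one power-spectrum frame, take the DCT, keep the first n_cc values."""
    db = [10.0 * math.log10(max(p, amin)) for p in power_frame]
    return dct2_ortho(db)[:n_cc]


# A made-up 8-band frame whose energy decays with band index.
frame = [1.0, 0.5, 0.25, 0.125, 0.06, 0.03, 0.015, 0.008]
coeffs = cepstrum_for_frame(frame, n_cc=4)
```

Because the orthonormal DCT preserves energy, truncating to `n_cc` coefficients keeps the smooth spectral envelope and discards fine detail.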
korvax.to_mel_scale
to_mel_scale(
S, /, sr, n_fft, n_mels=128, fmin=0.0, fmax=None
)
Convert a linear-frequency spectrogram to mel scale.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S | Float[Array, '*channels n_freqs n_frames'] | Input spectrogram. | required |
| sr | float | Sample rate of the audio signal. | required |
| n_fft | int | FFT size (number of samples per frame). | required |
| n_mels | int | Number of mel bands to generate. | 128 |
| fmin | float | Minimum frequency (Hz). | 0.0 |
| fmax | float \| None | Maximum frequency (Hz). If None, defaults to | None |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*channels {n_mels} n_frames'] | Mel-scale spectrogram. |
Source code in src/korvax/transforms/mel.py
Perceptual Loudness Weighting
korvax.A_weighting
A_weighting(frequencies, /, min_db=-80.0)
Compute A-weighting curve for given frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | Float[ArrayLike, ' n_freqs'] | Frequencies in Hz. | required |
| min_db | float \| None | Minimum decibel value for clipping. If None, no clipping applied. | -80.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' n_freqs'] | A-weighting values in dB. |
Source code in src/korvax/convert.py
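As a point of reference, the standard analog A-weighting response can be evaluated for a single frequency as follows. This is a sketch of the commonly published formula with its usual pole frequencies and +2.0 dB offset. Whether korvax uses exactly these constants is an assumption, and the lowercase helper name `a_weighting` is hypothetical.

```python
import math


def a_weighting(freq_hz, min_db=-80.0):
    """A-weighting in dB for one frequency, optionally clipped from below at min_db."""
    f_sq = float(freq_hz) ** 2
    if f_sq == 0.0:
        db = -math.inf  # the curve tends to minus infinity at DC
    else:
        # Squared pole frequencies of the standard A-weighting filter.
        c1, c2, c3, c4 = 12194.217**2, 20.598997**2, 107.65265**2, 737.86223**2
        db = 2.0 + 20.0 * (
            math.log10(c1)
            + 2.0 * math.log10(f_sq)
            - math.log10(f_sq + c1)
            - math.log10(f_sq + c2)
            - 0.5 * math.log10(f_sq + c3)
            - 0.5 * math.log10(f_sq + c4)
        )
    # min_db=None disables clipping, as described in the parameter table.
    return db if min_db is None else max(db, min_db)


weights = [a_weighting(f) for f in (31.5, 100.0, 1000.0, 8000.0)]
```

The curve is normalized to roughly 0 dB at 1 kHz and falls off steeply at low frequencies, which is why the lower clip matters for bins near DC.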
korvax.B_weighting
B_weighting(frequencies, /, min_db=-80.0)
Compute B-weighting curve for given frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | Float[ArrayLike, ' n_freqs'] | Frequencies in Hz. | required |
| min_db | float \| None | Minimum decibel value for clipping. If None, no clipping applied. | -80.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' n_freqs'] | B-weighting values in dB. |
Source code in src/korvax/convert.py
korvax.C_weighting
C_weighting(frequencies, /, min_db=-80.0)
Compute C-weighting curve for given frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | Float[ArrayLike, ' n_freqs'] | Frequencies in Hz. | required |
| min_db | float \| None | Minimum decibel value for clipping. If None, no clipping applied. | -80.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' n_freqs'] | C-weighting values in dB. |
Source code in src/korvax/convert.py
korvax.D_weighting
D_weighting(frequencies, /, min_db=-80.0)
Compute D-weighting curve for given frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | Float[ArrayLike, ' n_freqs'] | Frequencies in Hz. | required |
| min_db | float \| None | Minimum decibel value for clipping. If None, no clipping applied. | -80.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' n_freqs'] | D-weighting values in dB. |
Source code in src/korvax/convert.py
Utilities
korvax.mel_filterbank
mel_filterbank(
*,
sr,
n_fft,
n_mels=128,
fmin=0.0,
fmax=None,
htk=False,
norm="slaney",
dtype=None,
)
Create a mel-scale filterbank.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sr | float | Sample rate of the audio signal. | required |
| n_fft | int | FFT size (number of samples per frame). | required |
| n_mels | int | Number of mel bands to generate. | 128 |
| fmin | float | Minimum frequency (Hz). | 0.0 |
| fmax | float \| None | Maximum frequency (Hz). If None, defaults to | None |
| htk | bool | If True, use HTK formula for mel scale. Otherwise, use Slaney formula. | False |
| norm | Literal['slaney'] \| float \| None | Normalization mode. If "slaney", use Slaney-style normalization. If a float, use L-norm normalization. If None, no normalization. | 'slaney' |
| dtype | DTypeLike \| None | Data type for the filterbank. If None, defaults to default float type. | None |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' {n_mels} {n_fft}//2+1'] | Mel filterbank matrix. |
Source code in src/korvax/transforms/mel.py
korvax.mel_to_hz
mel_to_hz(mels, /, htk=False)
Convert mel scale to frequencies in Hz.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mels | Float[ArrayLike, '*dims'] | Mel-scale values. | required |
| htk | bool | If True, use HTK formula. Otherwise, use Slaney formula. | False |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*dims'] | Frequencies in Hz. |
Source code in src/korvax/convert.py
korvax.hz_to_mel
hz_to_mel(frequencies, /, htk=False)
Convert frequencies in Hz to mel scale.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | Float[ArrayLike, '*dims'] | Frequencies in Hz. | required |
| htk | bool | If True, use HTK formula. Otherwise, use Slaney formula. | False |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*dims'] | Mel-scale values. |
Source code in src/korvax/convert.py
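The two mel formulas selected by `htk` can be sketched for scalar inputs as below. The HTK variant is the familiar 2595 * log10(1 + f / 700). The Slaney variant is linear below 1 kHz and logarithmic above it. The constants follow the commonly used definitions and are assumed, not confirmed, to match korvax. These scalar helpers only illustrate the array-valued functions documented above.

```python
import math

_F_SP = 200.0 / 3.0                  # Hz per mel in the linear region (Slaney)
_MIN_LOG_HZ = 1000.0                 # start of the logarithmic region
_MIN_LOG_MEL = _MIN_LOG_HZ / _F_SP   # equals 15 mels
_LOGSTEP = math.log(6.4) / 27.0      # log-region step size


def hz_to_mel(freq_hz, htk=False):
    """Convert one frequency in Hz to mels."""
    if htk:
        return 2595.0 * math.log10(1.0 + freq_hz / 700.0)
    if freq_hz >= _MIN_LOG_HZ:
        return _MIN_LOG_MEL + math.log(freq_hz / _MIN_LOG_HZ) / _LOGSTEP
    return freq_hz / _F_SP


def mel_to_hz(mel, htk=False):
    """Convert one mel value back to Hz (inverse of hz_to_mel)."""
    if htk:
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    if mel >= _MIN_LOG_MEL:
        return _MIN_LOG_HZ * math.exp(_LOGSTEP * (mel - _MIN_LOG_MEL))
    return mel * _F_SP


# Round trip through both variants for a few frequencies.
for f in (0.0, 440.0, 1000.0, 8000.0):
    for use_htk in (False, True):
        back = mel_to_hz(hz_to_mel(f, htk=use_htk), htk=use_htk)
        assert abs(back - f) < 1e-6
```

The two scales use different units, so mel values from one variant should not be passed to the inverse of the other.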
korvax.db_to_amplitude
db_to_amplitude(S_db, /, ref=1.0)
Convert a decibel-scale spectrogram to amplitude scale.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S_db | Float[ArrayLike, '*dims'] | Input spectrogram in dB. | required |
| ref | float | Reference value for decibel calculation. | 1.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*dims'] | Amplitude spectrogram. |
Source code in src/korvax/convert.py
korvax.amplitude_to_db
amplitude_to_db(S, /, ref=1.0, amin=1e-08, top_db=80.0)
Convert an amplitude spectrogram to decibel scale.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S | Inexact[ArrayLike, '*dims'] | Input amplitude spectrogram. | required |
| ref | Float[ArrayLike, ''] \| Callable[[Float[ArrayLike, '*']], Float[ArrayLike, '']] | Reference value for decibel calculation. Can be a scalar or callable that computes a reference from the input. | 1.0 |
| amin | float | Minimum threshold for input values. | 1e-08 |
| top_db | float \| None | Maximum decibel range. Values below | 80.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*dims'] | Amplitude spectrogram in dB. |
Source code in src/korvax/convert.py
korvax.power_to_db
power_to_db(S, /, ref=1.0, amin=1e-10, top_db=80.0)
Convert a power spectrogram to decibel scale.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S | Inexact[ArrayLike, '*dims'] | Input power spectrogram. | required |
| ref | Float[ArrayLike, ''] \| Callable[[Float[ArrayLike, '*']], Float[ArrayLike, '']] | Reference value for decibel calculation. Can be a scalar or callable that computes a reference from the input. | 1.0 |
| amin | float | Minimum threshold for input values. | 1e-10 |
| top_db | float \| None | Maximum decibel range. Values below | 80.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*dims'] | Power spectrogram in dB. |
Source code in src/korvax/convert.py
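The interaction of `ref`, `amin` and `top_db` is sketched below on a plain list of power values. The sketch assumes librosa-style semantics: inputs are floored at `amin`, and the result is clipped so that nothing lies more than `top_db` below the maximum. It handles only a scalar `ref`, not the callable form. The lowercase helper names mirror the documented functions but are standalone illustrations.

```python
import math


def power_to_db(values, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert power values to dB relative to ref, with a floor and a range limit."""
    db = [
        10.0 * math.log10(max(v, amin)) - 10.0 * math.log10(max(ref, amin))
        for v in values
    ]
    if top_db is not None:
        # Assumed behaviour: limit the dynamic range to top_db below the peak.
        floor = max(db) - top_db
        db = [max(d, floor) for d in db]
    return db


def db_to_power(db_values, ref=1.0):
    """Inverse mapping: dB back to power, scaled by ref."""
    return [ref * 10.0 ** (d / 10.0) for d in db_values]


levels = power_to_db([1.0, 0.1, 1e-12])  # the last value hits both the floor and the clip
restored = db_to_power(power_to_db([1.0, 0.5, 0.01]))
```

The amplitude variants follow the same pattern with a factor of 20 in place of 10, because power is proportional to amplitude squared.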
korvax.db_to_power
db_to_power(S_db, /, ref=1.0)
Convert a decibel-scale spectrogram to power scale.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S_db | Float[ArrayLike, '*dims'] | Input spectrogram in dB. | required |
| ref | Float[ArrayLike, ''] | Reference value for decibel calculation. | 1.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, '*dims'] | Power spectrogram. |
Source code in src/korvax/convert.py
korvax.fft_frequencies
fft_frequencies(*, sr=22050, n_fft=2048)
Compute the center frequencies of FFT bins.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sr | float | Sample rate of the audio signal. | 22050 |
| n_fft | int | FFT size (number of samples per frame). | 2048 |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' {n_fft}//2+1'] | Center frequencies of FFT bins in Hz. |
Source code in src/korvax/convert.py
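The bin centers of a one-sided FFT are evenly spaced from 0 Hz up to the Nyquist frequency, which a few lines of plain Python can reproduce. This standalone sketch mirrors the documented signature and return shape but is not the library function itself.

```python
def fft_frequencies(sr=22050, n_fft=2048):
    """Center frequency in Hz of each of the n_fft // 2 + 1 one-sided FFT bins."""
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]


freqs = fft_frequencies(sr=22050, n_fft=2048)
bin_spacing = freqs[1] - freqs[0]   # sr / n_fft
nyquist = freqs[-1]                 # sr / 2
```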
korvax.mel_frequencies
mel_frequencies(
n_mels=128, /, fmin=0.0, fmax=11025.0, htk=False
)
Compute an array of mel-spaced frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_mels | int | Number of mel bands. | 128 |
| fmin | float | Minimum frequency (Hz). | 0.0 |
| fmax | float | Maximum frequency (Hz). | 11025.0 |
| htk | bool | If True, use HTK formula. Otherwise, use Slaney formula. | False |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' {n_mels}'] | Array of frequencies in Hz. |
Source code in src/korvax/convert.py
korvax.cqt_frequencies
cqt_frequencies(
n_bins, /, fmin, bins_per_octave=12, tuning=0.0
)
Compute the center frequencies of constant-Q transform bins.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_bins | int | Number of frequency bins. | required |
| fmin | float | Minimum frequency (Hz). | required |
| bins_per_octave | int | Number of bins per octave. | 12 |
| tuning | float | Tuning offset in fractions of a bin. | 0.0 |
Returns:
| Type | Description |
|---|---|
| Float[Array, ' {n_bins}'] | Center frequencies of CQT bins in Hz. |
Source code in src/korvax/convert.py
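Constant-Q bins are geometrically spaced, so each bin is a fixed ratio above the previous one and the frequency doubles every `bins_per_octave` bins. The sketch below assumes that `tuning` shifts every bin by that fraction of a bin, which matches the description above and the librosa convention. It is a standalone illustration, not the library function.

```python
def cqt_frequencies(n_bins, fmin, bins_per_octave=12, tuning=0.0):
    """Geometrically spaced center frequencies in Hz, starting at fmin."""
    return [
        fmin * 2.0 ** ((k + tuning) / bins_per_octave)  # tuning in fractions of a bin
        for k in range(n_bins)
    ]


# Seven octaves at semitone resolution, starting near C1.
freqs = cqt_frequencies(84, fmin=32.7)
one_octave_up = freqs[12]          # twice fmin
semitone_ratio = freqs[1] / freqs[0]
```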