metrw-pitch

Tuesday, August 23, 2022

DFT_grid.m

Pitch detection by DFT grid

Acknowledgements

1. Learning Byzantine music at the Byzantine Music School of the Piraeus Metropolis (church of Greece) in 1990s led me to design a project to master chant. In this school I learned the main characteristics of musical sound: timbre, intensity, time, pitch, intervals, scales. I dropped out of this school in the second grade. By the way let me say that there I learned from the late vocal teacher Anastasios Gerou, to open my mouth wide and pronounce the letter "a" with a steady voice.

2. I studied DSP literature at the Eugenidis Foundation Library in the 1990s and 2004. I picked up the basics of DSP there.

3. I mastered DFT in 2016 by studying the book Understanding Digital Signal Processing by Mr. Richard Lyons, a mentor for DSP professionals.
4. I posed the question, "Can I test any frequency with DFT?" on dsp.stackexchange in 2017. *Yes* was the response given by Marcus Müller, a mentor for DSP specialists. I can now perform DFT grid, I learned.

5. I was recommended to use 16384 as the second argument in the function fft by Mr. Jeffrey Clark at matlab.

Introduction

1. I am unable to use this code in the DSP kit since I am homeless. I'm giving it away for free.

It pinpoints fundamental frequency (fundFreq), commonly referred to as pitch or the first harmonic, with far greater accuracy by using a DFT grid about fundFreq discovered by FFT or any other pitch-detecting technique. It functions as a micrometer.

My method is simple enough even for a music student to utilize; all it takes is for the user's voice pitch to be one tone (9/8) lower or higher than the frequency of the note. The Matlab's abs(fft()) function is then used to get indices up to 1.2 note's frequency. Among them, the first harmonic's abs is the biggest.

2. It should, in fact, be integrated into a project that uses a DSP kit in part.

3. There should be two sections to the project. One uses the DSP kit to execute DFT grid, while the other uses the PC's processor to execute graphics.

4. A rectangle is used to graphically indicate the interval between a musical note and its base when expressed in a unit like a 100th of a major tone (9/8). Its width indicates the duration (tempo) of the note at a given rate. Color indicates the volume or amplitude of sound. Rectangle travels to the left at a speed determined by its width and the tempo of the note. On the left side of the window, there is an arrow that moves up and down. Pitch of voice is shown by its height. To obtain a notion view the Metrw-pitch graphics.

5. A vertical scale with zero in the middle, one semitone represented by the top half, and one semitone represented by the bottom half might be designed. Zero designates the top edge of the note rectangle. This scale can perform micrometer tasks. A horizontal scroll bar is requored to enable the user to navigate back. It also needs a speed regulator.

6. We have to add a snippet in order to gauge the sound's volume. The volume of the sound also requires a third scale.

Accuracy error

1. Even the tiniest musical interval, diesis, must be given by singers, chanters, and violists. Nevertheless, projects that use FFT to determine fundFreq are unsatisfactory to musicians since fundFreq has accuracy (or inaccuracy) about a diesis.

Similar errors are made by other pitch detection techniques. The current code has an accuracy better than 0.1 diesis in detecting funbdFreq. Therefore, the user can view the pitch error of their voice in real time, eight times per second, and adjust the pitch of their voice to give diesis when this code is embedded in a project running in the DSP kit.

The issue is whether the precision of the MATLAB code running in the DSP kit is the same as that of the MATLAB online R2020b. Visit dsp.stackexchange to view Mr. Ben's response to my query query. It is positive.

2. The error is based on the Segm/SampFreq ratio. The accuracy increases with the ratio. However, the visuals' pitch iteration rate decreases.

3. FundFreq also affects error. Less fundFreq corresponds to more accuracy error. The apparent explanation for both is this: More FundFreq periods in Segm correspond to increased accuracy. Additionally, it is somewhat dependent on phase, amplitude, and upper harmonics.

General instructions

1. I advise sampling at a frequency of 96000 samples per second to avoid analog filters, which distort signals. A snippet for noise reduction is required.

2. Next, a low pass FIR filter with a 7000–8000 kHz transition band is applied.

3. SampFreq is lowered to 16,000 samples per second, with 1 sample chosen every 6.

4. Programmer is able to alter numbers.

5. See the responses to my question at dsp.stackexchange for a conical display of FFT amplitudes.

The code

format default;
SampFreq = 16000;
Segm = 1:2048;
baseFreq = 256; % it is example
noteFreq = baseFreq*100/81; % up to 2666, for 3rd harmonic < 8000
x = 1.1; % 8/9 < x < 9/8. It is example
voicePitch = x * noteFreq;
% Generating signal
FirstHarmAngles = voicePitch*2*pi/SampFreq*Segm+1.9*pi;
SinFirstHarmAngles = sin(FirstHarmAngles);
SecondHarmAngles = voicePitch*2*2*pi/SampFreq*Segm+2.9*pi;
SinSecondHarmAngles = sin(SecondHarmAngles);
ThirdHarmAngles = voicePitch*3*2*pi/SampFreq*Segm+0.3*pi;
SinThirdHarmAngles = sin(ThirdHarmAngles);
Xn=170000*SinFirstHarmAngles+220000*...
SinSecondHarmAngles+150000*...
SinThirdHarmAngles;
% Finding FFT pitch
magn = abs(fft(Xn, 16384));
OnePointTwo = round(noteFreq * 1.2);
[maxMagn, FFTpitch] = max(magn(1:OnePointTwo));
% Performing DFT grid
LowSpanLimit = 0.8*FFTpitch;
HighSpanLimit = 1.2*FFTpitch;
PartsOfSpan = 1001;
TestFreqs = logspace(log10(LowSpanLimit), log10(HighSpanLimit), PartsOfSpan);
MagnSqrd = ones(1, PartsOfSpan);
for f = 1:PartsOfSpan
   Angles = TestFreqs(f)*2*pi/SampFreq*Segm;
   XnCos = sum(Xn.*cos(Angles));
   XnSin = sum(Xn.*-sin(Angles));
   MagnSqrd(f) = XnCos.^2+XnSin.^2;
end
% Finding GRID pitch
[maxMagnSqrd, maxMagnSqrdIndex] = max(MagnSqrd);
GRIDpitch = TestFreqs(maxMagnSqrdIndex);
voiceRatioInterval = GRIDpitch/baseFreq;
% Following should be incorporated in graphics part.
% I use the cent of a major tone as an example of a tempering unit.
mainUnit = 9/8; % for cents of semitone it is 2
tempUnit = 100; % for cents of semitone it is 1200
noteRatioInterval = noteFreq / baseFreq;
tempNOTEpitch = log(noteRatioInterval)/log(mainUnit)*tempUnit;
tempGRIDpitch = log(voiceRatioInterval)/log(mainUnit)*tempUnit;
% Display results
noteFreq
voicePitch
FFTpitch
GRIDpitch
tempGRIDpitch
tempNOTEpitch

Tuesday, February 2, 2021

Metrw-pitch graphics

Indroduction

1. I'm homeless and can't implement this code in DSP kit. I offer it for free.

2. Project detects fundamental frequency (fundFreq), or pitch, performing DFT grid around fundFreq detected by FFT. Indeed it should be embedded in a project executing, in part, in DSP kit.

2. Project should have two parts. One, executing in DSP kit detects fundFreq (see DFT_grid.m) and other, executing in PC's processor, has the graphics.

3. Graphics part represents musical note as rectangular with height representing note's interval from base by some unit e.g. cent of semitone. Wide represents note's duration. Color represents sound's volume. At a specific speed, the rectangle (Quad) travels to the left.
4. At the left side of window a pointer moves up and down. Its point's height from base represents voice's pitch.

5. Another vertical scale with zero at its middle, upper half representing one semitone, and lower also one semitone, is micrometer. Zero represents the note rectangular's upper edge.

6. Third vertical scale - not shown in graphics - indicates the volume of the sound.

7. A speed regulator is a must, as well a horizontal scroll bar for the user can go back.

8. To get inspired, refer to this guide (which is outdated but requires you to set up GLFW only on your project) or this one (which is updated but requires you to set up GLFW and GLAD). Run follow code after that.

9. I got assistance from Mr. Dougbinks in learning how to make 2D graphics using OpenGL.

______________________________________________

#include <GLFW/glfw3.h>

#include <gl/GL.h>

#include <iostream>

void render_loop(float a, float b, float x, int lineHeight2, int lineHeight1)

{

glClearColor(0.2f, 0.2f, 1.0f, 0.1f);

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

glBegin(GL_QUADS);

glColor3f(0.0, 1.0, 0.0);

glVertex2f(a + x, 0.0f);

glVertex2f(b + x, 0.0f);

glVertex2f(b + x, 700.0f);

glVertex2f(a + x, 700.0f);

glColor3f(1.0, 1.0, 1.0);

glVertex2f(0, 0.0f);

glVertex2f(200, 0.0f);

glVertex2f(200, 900.0f);

glVertex2f(0, 900.0f);

glEnd();

glBegin(GL_LINES);

glColor3f(1.0, 0.0, 0.0);

glVertex2i(100, lineHeight2);

glVertex2i(200, lineHeight2);

glColor3f(1.0, 0.0, 1.0);

glVertex2i(0, lineHeight1);

glVertex2i(100, lineHeight1);

glColor3f(0.0, 0.0, 0.0);

glVertex2i(100, 0);

glVertex2i(100, 800);

glColor3f(0.0, 0.0, 0.0);

glVertex2i(0, 400);

glVertex2i(100, 400);

glEnd();

}

/* program entry */

int main(int argc, char* argv[])

{

GLFWwindow* window;

if (!glfwInit())

{

std::cout << "Failed to initialize GLFW\n";

exit(EXIT_FAILURE);

}

glfwWindowHint(GLFW_RESIZABLE, GL_FALSE);

window = glfwCreateWindow(1800, 800, "LearnSinging", NULL, NULL);

glfwSetWindowPos(window, 10, 40);

if (!window)

{

std::cout << "Failed to open GLFW window\n";

glfwTerminate();

exit(EXIT_FAILURE);

}

glfwMakeContextCurrent(window);

glfwSwapInterval(1);

// set up view

glViewport(0, 0, 1800, 800);

glMatrixMode(GL_PROJECTION);

glLoadIdentity();

// see https://www.opengl.org/sdk/docs/man2/xhtml/glOrtho.xml

glOrtho(0.0, 800.0, 0.0, 800.0, 0.0, 1.0); // this creates a canvas

// you can do 2D drawing

// Main loop

float x = 0.0f;

/*If screen's frequency is 60 hz, that is 60 iterations/second, then

speedFactor = 1 means

speed = 1 pixel / iteration, or 60 pixels/second.

In general speedFactor = speed/screen's frequency.

For example if you prefer whole note = 1 sec representing by 90 pixels,

it means speed = 90 pixels/sec, which needs speedFactor = 1.5.*/

float speedFactor = 1.3f;

float a{ 200.0f };

float b{ 750.0f };

int lineHeight2{ 500 };

int lineHeight1{ 500 };

int lastTime{ 0 };

while (!glfwWindowShouldClose(window))

{

if (glfwGetTime() >= lastTime)

{

lineHeight2 = rand() % 100 + 650;

lineHeight1 = ((lineHeight2 - 700) * 5) + 400;

lastTime = lastTime + 1;

}

x = x - speedFactor;

// Draw gears

render_loop(a, b, x, lineHeight2, lineHeight1);

// Swap buffers

glfwSwapBuffers(window);

glfwPollEvents();

}

// Terminate GLFW

glfwTerminate();

// Exit program

exit(EXIT_SUCCESS);

}

Wednesday, May 13, 2020

General formulas for the equal-tempered interval

It is copied from wikipedia's article "Equal temperament", now in "View history" section.

Background theory

Basic concepts:
From logarithms: The logarithm of a number is the exponent to which another fixed number, the base, must be raised to produce that number. For example: (98)² = 8164. The logarithm of 8164 to base 98 is 2, or

\log _{\frac {9}{8}}{\frac {81}{64}}=2.

To compare logarithms, we use some standard base, e.g. 10 or e. Usually, in mathematical notation "log" means "logarithm with base 10". For example: log 98 = log 1.125 = 0.0511525224473813, and, log (98)² = log 8164 = log 1.265625 = 0.1023050448947626. The latter logarithm divided by the former yields 2, the exponent as above.
From physics: For a given velocity of the wave on the string, length and frequency of the sound produced from a string when it is plucked, are inverse proportional. For the same tension in the string, and the same string’s linear density, the velocity of the wave on the string is the same.
From acoustics: Pitch is the distinctive quality of a sound, dependent primarily on the [fundamental] frequency of the sound waves produced by its source.^[45]
Psychoacoustical law: It was known to Pythagoras^[46]^[47] and perhaps to ancient Chinese philosophers, that our subjective sensation of the difference (interval) in pitch (tone) is inversely proportional to the exponent of the ratio of string (chord) lengths. In modern terms, it is proportional to the logarithm of the ratio of fundamental frequencies of sounds produced by two different lengths of the string.

Example

Consider two pairs of a string's lengths: 90 cm, 80 cm, and, 81 cm, 64 cm. Our subjective sensation of pitch difference (interval) between the sounds produced by string's lengths 81 cm and 64 cm, the ratio 8164 = (98)², is double that between 90 cm and 80 cm, the ratio 9080 = 98 (see above, From logarithms and Psychoacoustical law). However the pitch of sound from the 90 cm is lower than that from the 80 cm, and from the 81 cm is lower than that from the 64 cm. Now consider two pairs of fundamental frequencies (from other lengths): 90 Hz/80 Hz and 81 Hz/64 Hz. Our subjective sensation of the ratio 8164 = (98)² is also twice that of 9080 = 98, but the pitch of 90 Hz is higher than that of 80 Hz, and that of 81 Hz higher than that of 64 Hz. The greater the string's length, the lesser the fundamental frequency, and the lower the pitch.
Let A and B be two periodic sounds with stable fundamental frequencies (stimuli) ffreq_A and ffreq_B. Then, the ratio ffreq_Affreq_B is called the ratio interval. Our subjective sensation of the interval between these sounds is proportional to log ffreq_Affreq_B.
Let C and D be two other periodic sounds with stable fundamental frequencies: ffreq_C and ffreq_D. Their ratio interval is ffreq_Cffreq_D. Let

{\displaystyle m={\frac {\log {\frac {\mathrm {ffreq_{A}} }{\mathrm {ffreq_{B}} }}}{\log {\frac {\mathrm {ffreq_{C}} }{\mathrm {ffreq_{D}} }}}}}

(see above, logarithms, second example, and below, main formula for d = 1). Then

{\displaystyle log{\frac {\mathrm {ffreq_{A}} }{\mathrm {ffreq_{B}} }}=m\cdot \log {\frac {\mathrm {ffreq_{C}} }{\mathrm {ffreq_{D}} }}.}

Our subjective sensation of the sound interval A/B (represented by log ffreq_Affreq_B), is proportional, by proportion m, to our subjective sensation (represented by log ffreq_Cffreq_D) of the sound interval C/D. The proportion m is magnitude, and log ffreq_Cffreq_D the unit for the tempered interval A/B.
Indeed, if, in the last equation, we multiply m by d and divide log ffreq_Cffreq_D by d, that is

{\frac {\log {\frac {\mathrm {ffreq_{C}} }{\mathrm {ffreq_{D}} }}}{d}}

then md is the magnitude and

{\frac {\log {\frac {\mathrm {ffreq_{C}} }{\mathrm {ffreq_{D}} }}}{d}}

is the unit, as with every quantity, e.g. length L = 5 m. If the magnitude 5 is multiplied by 100, and m (certain length, serving as unit) divided by 100 (that is, cm), then L = 500 cm. When ffreq_Cffreq_D = 2 and d = 1200, unit is cent (one 1200th of octave)."
Let t(r) be the tempered interval of ratio interval r. Then we have:

t(r)=\log r=m\cdot {\frac {\log b}{d}},

where m is magnitude and log bd is the unit of tempered interval. The main formula below is obtained from this one.
In the example 2 of the main formula, it is

{\frac {\log {\frac {3}{2}}}{\log {\frac {9}{8}}}}=3.442...,

which means our subjective sensation of the ratio interval 32 (a perfect fifth) is 3.442 times our subjective sensation of the ratio interval 98 (a major second). The number 3.442 is the magnitude, and the tempered interval t(98) = log 98 is the unit of tempered interval t(32) = log 32. Multiplying by 100 yields 344.2 hundredths of the tempered interval t(98).
Note: Pitch depends to a lesser degree on the sound pressure level (loudness, volume) of the tone, especially at frequencies below 1,000 Hz and above 2,000 Hz. The pitch of lower tones gets lower as sound pressure increases. For instance, a tone of 200 Hz that is very loud seems one semitone lower in pitch than if it is just barely audible. Above 2,000 Hz, the pitch gets higher as the sound gets louder. See Pitch and frequency (last paragraph, "Pitch depends to a lesser degree...").

Main formula

m={\frac {\log r}{\log b}}\cdot d={\frac {\ln r}{\ln b}}\cdot d

where m is the magnitude of tempered interval, r the ratio interval, b the base interval (2 for octave) and d the divisor of base interval (1200 for cents).
Here are some examples of the formula:

Ratio interval r = 32, and unit cent (b = 2 and d = 1200), then m = 701.955... and t(32) = 701.955 cents.
Ratio interval r = 32, and unit hundredth of the whole tone 98 (b = 98 and d = 100), then m = 344.247... and t(32) = 344.247 hundredths of the whole tone.
Ratio interval r = 2 (octave) and unit hundredth of the whole tone 98 (b = 98 and d = 100), then m = 588.4949... and t(2) = 588.4949 hundredths of the whole tone.
Ratio interval r = 2 (octave), and unit hundredth of Pythagorean limma (b = 256243 and d = 100), then m = 1330 and t(2) = 1330 hundredths of Pythagorean limma.
Ratio interval r = 800729 (idiosyncratic of Byzantine music), and unit tenth of Pythagorean comma (b = 531441524288 and d = 10), then m = 68.58... and t(800729) = 68.58 tenths of Pythagorean comma.
Ratio interval r = 256243 (Pythagorean limma), and unit hundredth of the fourth (b = 43 and d = 100), then m = 18.1158... and t(r) = 18.1158 hundredths of the fourth.
Ratio interval r = 98 (whole tone), and unit hundredth of the fourth (b = 43 and d = 100), then m = 40.942... and t(98) = 40.942 hundredths of the fourth.
Ratio interval r = 1.359 (1 plus random decimal 359), and unit cent (b = 2 and d = 1200), then m = 531.05... and t(1.359) = 531.05 cents.
Ratio interval r = 0.052 (random decimal part), and unit cent (b = 2 and d = 1200), then m = -5118 and t(0.052) = −5118 cents.
Ratio interval r = 1, and unit cent (b = 2 and d = 1200), then m = 0 and t(1) = 0 cents.
Ratio interval r = 2 (octave), and unit thousandth of tempered 10 (b = 10 and d = 1000), then m = 301.03 and t(2) = 301.03 1000ths of t(10).
With natural logarithm (ln): Ratio interval r = 2 (octave) and unit thousandth of tempered e (b = e = 2.71828... and d = 1000), then m = 693 1000ths of t(e) with natural logarithm.

Remarks:
1. For 11th example: log(10) = 1, so log(b) (the unit) becomes 1 and is ignored. So we could call it "absolute" or "abstract" unit with log.
2. For 12th example: ln(e) = 1, so ln(b) (the unit) becomes 1 and is ignored. So we could call it "absolute" or "abstract" unit with ln.

Derived formula

r=b^{\frac {m}{d}}

where r, b, d, m are as in the main formula, above. This can be obtained by taking the exponent of each side in the main formula.

When b = 2 (octave), d = 1200, m = 100 (tempered semitone), then r = 1.0594630943592953, ratio interval of tempered semitone.
When b = 9 8 (whole tone), d = 100, m = 344.247..., then r = 1.5 = 32 (perfect fifth). Compare example 2 of main formula.
When b = 256243, d = 100, m = 226, then r = 1.125 = 98 (whole tone).
When b = 256243, d = 100, m = 1330, then r = 2 (octave). Compare example 4 of main formula.
When b = 256243, d = 100, m = 655 (random number), then r = 1.406859...
When t(r) = −351 (random) cents (b = 2, d = 1200), then r = 0.816485969595...
Let note NH = 250 Hz, and we need some note's frequency NH plus 98, plus 800729, minus (descending diesis) 13 of t(256243) (tempered Pythagorean lemma). Calculation is following:

Desired frequency = 250 Hz × 98 × 800729 × (256243)⁻¹³ = 250 Hz × 1.125 × 1.097393681 × 0.982778 = 303.3 Hz.

8. When b = 10, d = 1000, m = 301.03, then r = 2 (octave).

Notes

The numbers r, m, b, d could be random under following conditions:
1. The magnitude (m) can be any real number.
2. The ratio interval (r) must be a positive number: r > 0.
3. The base interval (b) must be greater than unity: b > 1.
4. The divisor (d) must be a natural number: d = 1, 2, 3, ...
The relations between r and m are:
- When 0 < r < 1, then m < 0.
- When r = 1, then m = 0.
- When r > 1, then m > 0.

as with logarithms in general.