Image from https://depts.washington.edu/wmatkins/kinetics/michaelis-menten.html

About

The main focus of this project was to find the model parameters that best fit a data set of reaction velocities and substrate concentrations. In an ideal situation, the data would be modeled perfectly by the Michaelis-Menten curve, shown in red in the image. The equation for this curve is shown below the graph. V0 is the initial reaction velocity v; how we obtain V0 from the data is discussed later. Vmax is the maximum reaction velocity, that is, the asymptote that the red curve approaches. Km is the substrate concentration at which the reaction velocity is half of Vmax. S is the substrate concentration.
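
For reference, the standard form of the Michaelis-Menten equation, written with the symbols defined above, is given here in LaTeX. The second line is the special case that defines Km.

```latex
V_0 = \frac{V_{\max}\, S}{K_m + S}
\qquad\text{and, at } S = K_m:\qquad
V_0 = \frac{V_{\max}\, K_m}{K_m + K_m} = \frac{V_{\max}}{2}
```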

In this project, we were given 5 enzymes to analyze: A, B, C, D and E. Each enzyme had a test 1 and a duplicate of test 1, so we had 10 sets of data in total. Our first step was to obtain the V0 values from each set of data. Each set of data covered 10 separate initial substrate concentrations, which let us determine 10 points on the graph (i.e. (S1, V1), (S2, V2)... (S10, V10)). The main challenge in this project was to determine the best-fit Vmax and Km parameters of the Michaelis-Menten equation for these 10 points. We combined two fundamental steps to solve this nonlinear regression problem. The first step involved an algebraic approach, while the second step involved mixing and matching the potential Vmax and Km candidates. Finally, we applied gradient descent to further improve the Vmax and Km values from the previous steps. Our ultimate goal was to be as accurate as established methods such as the Eadie–Hofstee diagram, Hanes–Woolf plot and Lineweaver–Burk plot.

The Excel data and the algorithm written in MATLAB can be found here.


Obtaining the reaction velocities V0
To simplify the explanation, we will focus on test 1 of Enzyme A (the same process was performed on the other enzymes and tests).

To obtain the 10 points (S1, V1), (S2, V2)... (S10, V10) for Enzyme A test 1, we have to determine the reaction velocities v0 from the data. At this point we only know the substrate concentrations S1, S2, ..., S10, which are 3.75 uM, 7.5 uM, ..., 2000 uM. The reaction velocity v0 for each substrate concentration is the initial rate of change of the plot of [P] (uM) vs Time (s). To find the reaction velocity v0 for the substrate concentration 3.75 uM, we plot [P] (uM) (column B) against Time (s) (column A). Keep in mind that this plot is not the same as the Michaelis-Menten curve shown in red above.

The reaction velocity v0 can be represented by the slope of the tangent line at the start of the [P] (uM) vs Time (s) plot. Based on our algorithm, we determined that it was most accurate to use the slope of the line between the initial point and the point at 5% of the recorded time span.

The plot of [P] (uM) (column B) vs Time (s) (column A) has data recorded up to 1480 seconds. The first 5% interval therefore runs from 0 seconds to 74 seconds. The value of [P] at 74 seconds is 1.492 uM, and the initial point is (0 seconds, 0 uM). The slope of the line is then (1.492 - 0) / (74 - 0), which yields a reaction velocity v0 of approximately 0.02016 uM / s. We have found our first point (S1, V1) as (3.75 uM, 0.02016 uM / s). The same process is used to find the rest of the points (S2, V2), (S3, V3)...(S10, V10).
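
The 5% rule described above can be sketched in Python (the original algorithm is in MATLAB, so this is an illustration rather than the project's code). The function name initial_velocity, the fraction parameter and the three-sample trace are assumptions made for this example; only the 74 s, 1.492 uM and 1480 s values come from the text.

```python
def initial_velocity(times, product, fraction=0.05):
    """Slope between the first sample and the sample at `fraction` of the run."""
    cutoff = times[0] + fraction * (times[-1] - times[0])
    # Index of the last sample at or before the cutoff (tiny tolerance for float noise).
    idx = max(i for i, t in enumerate(times) if t <= cutoff + 1e-9)
    if idx == 0:
        raise ValueError("no sample falls inside the initial interval")
    return (product[idx] - product[0]) / (times[idx] - times[0])


# Hypothetical trace: only the 0 s and 74 s samples are quoted in the text.
# The final [P] value is an arbitrary placeholder, not a measurement; it only
# marks the end of the 1480 s run and does not affect the slope.
times = [0.0, 74.0, 1480.0]
product = [0.0, 1.492, 10.0]

v0 = initial_velocity(times, product)
print(f"v0 = {v0:.5f} uM/s")
```

With a real trace, times and product would be the full columns A and B from the spreadsheet.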

Because we were given a test 1 and a test 1 duplicate for each enzyme, we ended up with two computed V0 values for each substrate concentration. We decided to average the two V0 values before moving forward with our analysis. For example, while Enzyme A test 1 yielded a first point of (3.75 uM, 0.02016 uM / s), Enzyme A test 1 duplicate yielded a first point of (3.75 uM, 0.02684 uM / s). We take the average and use the point (3.75 uM, 0.0235 uM / s) in our analysis.

The rest of the points for Enzyme A were determined by the same process and are as follows:
(3.75 uM, 0.0235 uM / s), (7.5 uM, 0.0437 uM / s), (15 uM, 0.0788 uM / s), (30 uM, 0.1552 uM / s), (65 uM, 0.2633 uM / s), (125 uM, 0.3997 uM / s), (250 uM, 0.5553 uM / s), (500 uM, 0.6730 uM / s), (1000 uM, 0.7908 uM / s), (2000 uM, 0.8794 uM / s).

Nonlinear Regression
Plotting the points determined in the section 'Obtaining the reaction velocities V0' gives a set of points that appears to follow a Michaelis-Menten curve similar to the red curve at the top of this page.

The Michaelis-Menten equation has two unknown parameters: Vmax and Km. In the Desmos animation, the variable y represents V0 (reaction velocity v), x represents the substrate concentration [S], v represents Vmax and k represents Km. Our goal is to find the Vmax and Km that best model our data points. In an ideal world, there would be a value of Vmax and a value of Km that yield a curve passing through every single data point.
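
To make the roles of the two parameters concrete, here is a minimal Python sketch of the model. The function name michaelis_menten and the parameter values are illustrative assumptions, not values from the project.

```python
def michaelis_menten(s, vmax, km):
    """Predicted reaction velocity at substrate concentration s."""
    return vmax * s / (km + s)


# Illustrative parameter values only.
vmax, km = 1.0, 50.0

half = michaelis_menten(km, vmax, km)       # velocity at S = Km
near_max = michaelis_menten(1e6, vmax, km)  # velocity at very large S

print(half, near_max)
```

At S = Km the velocity is half of Vmax, and for very large S it approaches Vmax without reaching it, which is the asymptote described earlier.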

Step 1: The Algebraic Approach
The main idea involves solving a system of equations. At the moment, we are solving for two unknown variables Vmax and Km. Because we need two equations to solve two unknowns, we can pick any two coordinates from our previously determined 10 points to form two equations. The working is shown with the final equations highlighted.
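
The algebra behind this step can be reconstructed as follows. This is a derivation from the Michaelis-Menten equation using the symbols in the text; the exact form of the highlighted equations in the original working may differ, but any correct form is algebraically equivalent to the last two lines.

```latex
\begin{aligned}
V_1 &= \frac{V_{\max} S_1}{K_m + S_1}, \qquad
V_2 = \frac{V_{\max} S_2}{K_m + S_2} \\[4pt]
V_{\max} &= \frac{V_1 (K_m + S_1)}{S_1} = \frac{V_2 (K_m + S_2)}{S_2} \\[4pt]
K_m \frac{V_1}{S_1} + V_1 &= K_m \frac{V_2}{S_2} + V_2 \\[4pt]
K_m &= \frac{V_2 - V_1}{\dfrac{V_1}{S_1} - \dfrac{V_2}{S_2}}
     = \frac{S_1 S_2 (V_2 - V_1)}{V_1 S_2 - V_2 S_1} \\[4pt]
V_{\max} &= \frac{V_1 (K_m + S_1)}{S_1}
\end{aligned}
```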

For example, we might test the points (S7, V7) and (S10, V10). Their values are (250 uM, 0.5553 uM / s) and (2000 uM, 0.8794 uM / s) respectively. Based on the highlighted final equations, we should calculate Km first, since that equation contains no unknown values. It does not matter which point is labeled (S1, V1) and which is labeled (S2, V2), but the labeling must stay consistent once chosen. The Km computed from these two coordinates is 181.925 uM, and the Vmax is 0.9594 uM / s. This method can be repeated for other combinations of points, e.g. (S7, V7), (S8, V8) or (S7, V7), (S9, V9), etc.
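
The two-point calculation can be checked with a short Python sketch. The function name solve_two_points is an assumption for this example, and the formulas are the ones obtained by solving the Michaelis-Menten equation at two points, not necessarily the exact form used in the MATLAB code.

```python
def solve_two_points(p1, p2):
    """Solve the Michaelis-Menten equation exactly through two (S, V) points."""
    (s1, v1), (s2, v2) = p1, p2
    # Km first: its formula contains only known quantities.
    km = (v2 - v1) / (v1 / s1 - v2 / s2)
    # Then Vmax from either point; point 1 is used here.
    vmax = v1 * (km + s1) / s1
    return vmax, km


# (S7, V7) and (S10, V10) for Enzyme A, taken from the text.
vmax, km = solve_two_points((250.0, 0.5553), (2000.0, 0.8794))
print(f"Km = {km:.3f} uM, Vmax = {vmax:.4f} uM/s")
```

Swapping the two points leaves the result unchanged, because the numerator and denominator of the Km formula both change sign.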

With this method, the challenge was to determine which two of our 10 points would best represent the model. Trying every possible pair gives 0.5*n*(n-1) combinations (where n is the total number of points). With n = 10, we therefore have 45 different combinations to test.

The first iteration of this idea was to find the best pair of coordinates out of the 45 combinations which yielded Vmax and Km values that resulted in the least SSE (Sum of Squared Errors). While this method was fairly accurate, it lacked precision (especially when we ran the algorithm on all 5 enzymes). This led us to reconsider our approach and ultimately led to Step 2: mixing and matching.
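
This first iteration can be sketched in Python as follows: solve every pair of points, then keep the pair whose parameters give the lowest SSE. The helper names (solve_two_points, sse, POINTS_A) are assumptions for this example, and the data are the rounded Enzyme A points from the text, so results can differ slightly from those of the original MATLAB run.

```python
from itertools import combinations

# Averaged Enzyme A points (S in uM, V in uM/s), as listed in the text.
POINTS_A = [
    (3.75, 0.0235), (7.5, 0.0437), (15, 0.0788), (30, 0.1552), (65, 0.2633),
    (125, 0.3997), (250, 0.5553), (500, 0.6730), (1000, 0.7908), (2000, 0.8794),
]


def solve_two_points(p1, p2):
    """Exact (Vmax, Km) for the curve through two (S, V) points."""
    (s1, v1), (s2, v2) = p1, p2
    km = (v2 - v1) / (v1 / s1 - v2 / s2)
    return v1 * (km + s1) / s1, km


def sse(vmax, km, points):
    """Sum of squared differences between predicted and measured velocities."""
    return sum((vmax * s / (km + s) - v) ** 2 for s, v in points)


# One (Vmax, Km) candidate per unordered pair of points: 10 choose 2 = 45.
candidates = [solve_two_points(p1, p2) for p1, p2 in combinations(POINTS_A, 2)]

best_vmax, best_km = min(candidates, key=lambda c: sse(c[0], c[1], POINTS_A))
best_sse = sse(best_vmax, best_km, POINTS_A)
print(len(candidates), best_vmax, best_km, best_sse)
```

Noisy data can produce pairs with a zero denominator or a negative Km; this sketch does not guard against that because it does not occur for the Enzyme A points used here.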

Step 2: Mix and Match / Minimizing SSE
Rather than testing only the 45 (Km, Vmax) pairs produced in Step 1, we finalized our algorithm by testing every possible combination of them: each of the 45 Km candidates is paired with each of the 45 Vmax candidates, and every resulting pair goes through the evaluation to minimize the SSE.


This gives 45 * 45 = 2025 possible combinations of Km and Vmax for each of the 5 enzymes. Each of the 2025 pairs is passed to a function that evaluates the SSE: the sum of the squared differences between the velocities predicted from the given Km and Vmax and the measured velocities at the 10 data points determined at the beginning.
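
A minimal Python sketch of the mix-and-match search is shown below. It assumes the candidates are the Vmax and Km values obtained by solving each pair of points exactly; the helper names are hypothetical, and the rounded Enzyme A points from the text are used, so the selected pair may not match the reported result to every digit.

```python
from itertools import combinations, product

# Averaged Enzyme A points (S in uM, V in uM/s), as listed in the text.
POINTS_A = [
    (3.75, 0.0235), (7.5, 0.0437), (15, 0.0788), (30, 0.1552), (65, 0.2633),
    (125, 0.3997), (250, 0.5553), (500, 0.6730), (1000, 0.7908), (2000, 0.8794),
]


def solve_two_points(p1, p2):
    """Exact (Vmax, Km) for the curve through two (S, V) points."""
    (s1, v1), (s2, v2) = p1, p2
    km = (v2 - v1) / (v1 / s1 - v2 / s2)
    return v1 * (km + s1) / s1, km


def sse(vmax, km, points):
    """Sum of squared differences between predicted and measured velocities."""
    return sum((vmax * s / (km + s) - v) ** 2 for s, v in points)


pairs = [solve_two_points(p1, p2) for p1, p2 in combinations(POINTS_A, 2)]
vmax_candidates = [vmax for vmax, _ in pairs]
km_candidates = [km for _, km in pairs]

# Every Vmax candidate combined with every Km candidate: 45 * 45 = 2025.
grid = list(product(vmax_candidates, km_candidates))
best_vmax, best_km = min(grid, key=lambda c: sse(c[0], c[1], POINTS_A))
best_sse = sse(best_vmax, best_km, POINTS_A)
print(len(grid), best_vmax, best_km, best_sse)
```

Because the 45 original pairs are a subset of the 2025 combinations, the mixed search can never do worse than the pair-only search from Step 1.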

Enzyme A Results before Gradient Descent
Vmax: 0.920935 uM/s
Km: 162.448487 uM
SSE: 0.001414

Data Points:
(3.75 uM, 0.0235 uM / s)
(7.5 uM, 0.0437 uM / s)
(15 uM, 0.0788 uM / s)
(30 uM, 0.1552 uM / s)
(65 uM, 0.2633 uM / s)
(125 uM, 0.3997 uM / s)
(250 uM, 0.5553 uM / s)
(500 uM, 0.6730 uM / s)
(1000 uM, 0.7908 uM / s)
(2000 uM, 0.8794 uM / s)
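
As a sanity check, the SSE reported for Enzyme A can be recomputed in Python from the data points listed above. The function name sse is an assumption for this example. Because the listed points are rounded to four decimals, agreement with the reported SSE is expected only to within a few units in the last printed digit.

```python
# Averaged Enzyme A points (S in uM, V in uM/s), as listed above.
POINTS_A = [
    (3.75, 0.0235), (7.5, 0.0437), (15, 0.0788), (30, 0.1552), (65, 0.2633),
    (125, 0.3997), (250, 0.5553), (500, 0.6730), (1000, 0.7908), (2000, 0.8794),
]


def sse(vmax, km, points):
    """Sum of squared differences between predicted and measured velocities."""
    return sum((vmax * s / (km + s) - v) ** 2 for s, v in points)


# Parameters reported for Enzyme A before gradient descent.
sse_a = sse(0.920935, 162.448487, POINTS_A)
print(f"SSE = {sse_a:.6f}")
```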

Enzyme B Results before Gradient Descent
Vmax: 0.855029 uM/s
Km: 362.841333 uM
SSE: 0.000988

Data Points:
(3.75 uM, 0.0095 uM / s)
(7.5 uM, 0.0180 uM / s)
(15 uM, 0.0361 uM / s)
(30 uM, 0.0677 uM / s)
(65 uM, 0.1314 uM / s)
(125 uM, 0.2179 uM / s)
(250 uM, 0.3473 uM / s)
(500 uM, 0.4816 uM / s)
(1000 uM, 0.6532 uM / s)
(2000 uM, 0.7131 uM / s)

Enzyme C Results before Gradient Descent
Vmax: 1.194908 uM/s
Km: 201.247645 uM
SSE: 0.000501

Data Points:
(3.75 uM, 0.0231 uM / s)
(7.5 uM, 0.0431 uM / s)
(15 uM, 0.0837 uM / s)
(30 uM, 0.1510 uM / s)
(65 uM, 0.2883 uM / s)
(125 uM, 0.4506 uM / s)
(250 uM, 0.6640 uM / s)
(500 uM, 0.8687 uM / s)
(1000 uM, 0.9832 uM / s)
(2000 uM, 1.0841 uM / s)

Enzyme D Results before Gradient Descent
Vmax: 1.550868 uM/s
Km: 299.282898 uM
SSE: 0.000672

Data Points:
(3.75 uM, 0.0202 uM / s)
(7.5 uM, 0.0395 uM / s)
(15 uM, 0.0770 uM / s)
(30 uM, 0.1472 uM / s)
(65 uM, 0.2802 uM / s)
(125 uM, 0.4683 uM / s)
(250 uM, 0.7040 uM / s)
(500 uM, 0.9684 uM / s)
(1000 uM, 1.1802 uM / s)
(2000 uM, 1.3662 uM / s)

Enzyme E Results before Gradient Descent
Vmax: 1.589693 uM/s
Km: 165.594468 uM
SSE: 0.001004

Data Points:
(3.75 uM, 0.0350 uM / s)
(7.5 uM, 0.0717 uM / s)
(15 uM, 0.1335 uM / s)
(30 uM, 0.2301 uM / s)
(65 uM, 0.4597 uM / s)
(125 uM, 0.6790 uM / s)
(250 uM, 0.9527 uM / s)
(500 uM, 1.2123 uM / s)
(1000 uM, 1.3478 uM / s)
(2000 uM, 1.4749 uM / s)

Step 3: Gradient Descent
The results from the previous two steps look pretty good. Could we improve on them, though? By using gradient descent, we can edge even closer to the optimal values of Vmax and Km.

We start from the previously found "optimal values" of Vmax and Km shown in the results above. Because these may be suboptimal, in other words values that sit near a local minimum of the Sum of Squared Errors (SSE) rather than at the best achievable fit, the plan is to explore ±10% around the current values. As the Testing Pairs show, we try 5 values for each parameter within that range. With both Vmax and Km, this results in 5 * 5 = 25 starting pairs. Each pair is run through gradient descent, and the SSE is evaluated after convergence. The converged pair with the lowest SSE is kept.
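
The refinement step can be sketched in Python as below. Several details are assumptions, since the text does not specify them: the five multipliers are taken to be evenly spaced between 0.90 and 1.10, each parameter gets its own learning rate (lr_vmax, lr_km) because Vmax and Km differ in scale by two orders of magnitude, and a step is halved until it lowers the SSE. This is a gradient descent variant chosen for robustness, not necessarily the scheme in the original MATLAB code.

```python
# Averaged Enzyme A points (S in uM, V in uM/s), as listed in the text.
POINTS_A = [
    (3.75, 0.0235), (7.5, 0.0437), (15, 0.0788), (30, 0.1552), (65, 0.2633),
    (125, 0.3997), (250, 0.5553), (500, 0.6730), (1000, 0.7908), (2000, 0.8794),
]


def sse(vmax, km, points):
    """Sum of squared differences between predicted and measured velocities."""
    return sum((vmax * s / (km + s) - v) ** 2 for s, v in points)


def gradient(vmax, km, points):
    """Partial derivatives of the SSE with respect to Vmax and Km."""
    g_vmax = g_km = 0.0
    for s, v in points:
        resid = vmax * s / (km + s) - v
        g_vmax += 2.0 * resid * s / (km + s)
        g_km += 2.0 * resid * (-vmax * s / (km + s) ** 2)
    return g_vmax, g_km


def descend(vmax, km, points, lr_vmax=0.05, lr_km=5000.0, max_steps=500):
    """Gradient descent with step halving; the SSE never increases."""
    current = sse(vmax, km, points)
    for _ in range(max_steps):
        g_vmax, g_km = gradient(vmax, km, points)
        scale = 1.0
        while scale > 1e-10:
            new_vmax = vmax - scale * lr_vmax * g_vmax
            new_km = km - scale * lr_km * g_km
            if new_vmax > 0 and new_km > 0:
                new_sse = sse(new_vmax, new_km, points)
                if new_sse < current:
                    vmax, km, current = new_vmax, new_km, new_sse
                    break
            scale *= 0.5
        else:
            break  # no step size improved the SSE: treat as converged
    return vmax, km, current


# Start from the Enzyme A result before gradient descent (from the text).
start_vmax, start_km = 0.920935, 162.448487
multipliers = [0.90, 0.95, 1.00, 1.05, 1.10]  # assumed spacing within +/-10%

results = [
    descend(start_vmax * mv, start_km * mk, POINTS_A)
    for mv in multipliers
    for mk in multipliers
]
best_vmax, best_km, best_sse = min(results, key=lambda r: r[2])
print(best_vmax, best_km, best_sse)
```

Since the unmodified starting pair is one of the 25 starts and every accepted step lowers the SSE, the kept result can never be worse than the Step 2 result it started from.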

For a deeper understanding of how gradient descent can enhance model optimization and to see its application in other contexts, consider exploring my other project blog post Neural Network Architecture from Scratch.

Enzyme A Results after Gradient Descent
Vmax: 0.934056 uM/s
Km: 170.568012 uM
SSE: 0.001286

Data Points:
(3.75 uM, 0.0235 uM / s)
(7.5 uM, 0.0437 uM / s)
(15 uM, 0.0788 uM / s)
(30 uM, 0.1552 uM / s)
(65 uM, 0.2633 uM / s)
(125 uM, 0.3997 uM / s)
(250 uM, 0.5553 uM / s)
(500 uM, 0.6730 uM / s)
(1000 uM, 0.7908 uM / s)
(2000 uM, 0.8794 uM / s)

Enzyme B Results after Gradient Descent
Vmax: 0.855870 uM/s
Km: 362.842262 uM
SSE: 0.000987

Data Points:
(3.75 uM, 0.0095 uM / s)
(7.5 uM, 0.0180 uM / s)
(15 uM, 0.0361 uM / s)
(30 uM, 0.0677 uM / s)
(65 uM, 0.1314 uM / s)
(125 uM, 0.2179 uM / s)
(250 uM, 0.3473 uM / s)
(500 uM, 0.4816 uM / s)
(1000 uM, 0.6532 uM / s)
(2000 uM, 0.7131 uM / s)

Enzyme C Results after Gradient Descent
Vmax: 1.194148 uM/s
Km: 201.249568 uM
SSE: 0.000499

Data Points:
(3.75 uM, 0.0231 uM / s)
(7.5 uM, 0.0431 uM / s)
(15 uM, 0.0837 uM / s)
(30 uM, 0.1510 uM / s)
(65 uM, 0.2883 uM / s)
(125 uM, 0.4506 uM / s)
(250 uM, 0.6640 uM / s)
(500 uM, 0.8687 uM / s)
(1000 uM, 0.9832 uM / s)
(2000 uM, 1.0841 uM / s)

Enzyme D Results after Gradient Descent
Vmax: 1.554397 uM/s
Km: 299.280365 uM
SSE: 0.000646

Data Points:
(3.75 uM, 0.0202 uM / s)
(7.5 uM, 0.0395 uM / s)
(15 uM, 0.0770 uM / s)
(30 uM, 0.1472 uM / s)
(65 uM, 0.2802 uM / s)
(125 uM, 0.4683 uM / s)
(250 uM, 0.7040 uM / s)
(500 uM, 0.9684 uM / s)
(1000 uM, 1.1802 uM / s)
(2000 uM, 1.3662 uM / s)

Enzyme E Results after Gradient Descent
Vmax: 1.590849 uM/s
Km: 165.598473 uM
SSE: 0.001000

Data Points:
(3.75 uM, 0.0350 uM / s)
(7.5 uM, 0.0717 uM / s)
(15 uM, 0.1335 uM / s)
(30 uM, 0.2301 uM / s)
(65 uM, 0.4597 uM / s)
(125 uM, 0.6790 uM / s)
(250 uM, 0.9527 uM / s)
(500 uM, 1.2123 uM / s)
(1000 uM, 1.3478 uM / s)
(2000 uM, 1.4749 uM / s)