CS 370

Floating Point

Computers have a limited amount of memory, so they cannot represent all numbers.

S \subset \mathbb{R}

Assume we want to compute e^{-5.5}. Using the Taylor expansion, we know that e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + .... We cannot include infinitely many terms, so we must truncate the series. With limited precision, the large alternating terms at x = -5.5 cancel catastrophically, and the computed result is 0.0026363.

However, if we instead use e^{-x} = \frac{1}{1 + x + \frac{x^2}{2!} + ...}, we get a result of 0.0040865, which is much closer to the true value e^{-5.5} \approx 0.0040868.
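A minimal MATLAB sketch of this effect, simulating limited precision by rounding every intermediate result to 5 significant digits (the chop5 helper, the choice of 5 digits, and the 30 terms are my own; the exact digits produced depend on how the rounding is simulated, but the alternating sum loses most of its accuracy while the reciprocal form stays close to the true value):

```matlab
% Compare the two ways of computing e^{-5.5} when every intermediate
% value is rounded to 5 significant digits.
chop5 = @(v) round(v, 5, 'significant');   % crude stand-in for low precision
x = 5.5;
s_direct = 0;   % sum of the series for e^{-x} (alternating terms)
s_recip  = 0;   % sum of the series for e^{+x} (all positive terms)
for k = 0:30
    term = chop5(x^k / factorial(k));
    s_direct = chop5(s_direct + chop5((-1)^k * term));
    s_recip  = chop5(s_recip + term);
end
[s_direct, chop5(1/s_recip), exp(-5.5)]    % cancellation spoils the first sum
```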

Let us consider the number 12.25. This is equal to 1\cdot 10^1 + 2\cdot 10^0 + 2\cdot 10^{-1} + 5\cdot10^{-2}. We can generalize: with digits d_k \in \{0, 1, ..., 9\}, the number d_0d_1...d_{j-1}.d_jd_{j+1}...d_{j+i-1} is represented as d_0\cdot 10^{j-1} + d_1\cdot 10^{j-2} + ... + d_{j+i-1} \cdot 10^{-i}. Computers only understand bits, so d_k \in \{0, 1\}, and instead we have (\sum_{k=0}^{j+i-1}d_k\cdot 2^{-k})\cdot 2^{j-1}. We have some limitations because j and i cannot be arbitrarily large.
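As a quick check of the binary form, 12.25 = (1100.01)_2, so j = 4, i = 2 and \left(1 + \frac{1}{2} + \frac{1}{32}\right)\cdot 2^{3} = 12.25.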

We need to truncate: we fix an upper bound on the exponent and on the number of significant digits we can represent.

What numbers can we represent?

Example: \beta = 2, m = 3. d_k \in \{0, 1\}, 0 \le j \le 3.

\overline{x} = \left(\sum_{k=0}^2 d_k 2^{-k}\right)\cdot 2^{j-1}

Assume d_0 > 0, so d_0 = 1 (normalized floating point convention). We can simplify the above expression.

\overline{x} = 2^{j-1} + \left(\sum_{k=1}^2 d_k 2^{-k}\right)\cdot 2^{j-1}

We minimize \overline{x} with j=0 and d_1 = d_2 = 0, giving \overline{x} = \frac{1}{2}. We maximize with j=3 and all d_k = 1, giving \overline{x} = 7.

In general, given \beta, m, j_{max}, j_{min}, the minimum is \beta^{j_{min} - 1} and the maximum is (\beta - \beta^{1 - m}) \cdot \beta^{j_{max} - 1}. So all representable positive normalized numbers lie in the range [\beta^{j_{min} - 1}, (\beta - \beta^{1 - m})\cdot \beta^{j_{max} - 1}].

The minimum significand is 1 and the maximum significand is \beta - \beta^{1-m} (just below \beta). Fixing j, the spacing between consecutive representable numbers is \Delta = \beta^{j-m}.

The number of floating point numbers within the interval [\beta^{j-1}, \beta^j) is \frac{\beta^j - \beta^{j-1}}{\Delta} = (\beta - 1) \beta^{m-1}. So the rounding error is bounded by |fl(x) - x| \le \frac{1}{2} \beta^{-(m-1)}\beta^{j-1}, with relative error \frac{|fl(x) - x|}{|x|} \le \frac{1}{2}\beta^{-(m-1)}.
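For example, with \beta = 2, m = 3, j = 1 the spacing is \Delta = 2^{1-3} = \frac{1}{4}, and the interval [1, 2) contains exactly (\beta - 1)\beta^{m-1} = 4 floating point numbers: 1, 1.25, 1.5, 1.75.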

Errors

We can have error when an arithmetic operation produces a result that needs m^\prime > m significant digits to be represented exactly.

Example: with m = 4, \overline{x} = 1.235, \overline{y} = 1.234. The exact product \overline{x}\,\overline{y} = 1.523\mathbf{99} needs more than four significant digits, so it must be rounded before it can be stored.

Example: I_n = \int_{0}^1 \frac{x^n}{x+\alpha}dx

I_0 = \int_0^1 \frac{1}{x+\alpha}dx = \log(1+\alpha) - \log(\alpha) = \log\left(\frac{1 + \alpha}{\alpha}\right). We can compute recursively: I_n = \frac{1}{n} - \alpha I_{n-1}.
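A small MATLAB sketch of this forward recurrence (\alpha = 2 and 20 steps are illustrative choices of mine, not values from the notes):

```matlab
% Forward recurrence I_n = 1/n - alpha*I_{n-1}, starting from I_0.
alpha = 2;
I = log((1 + alpha)/alpha);    % I_0, already stored with a small rounding error
for n = 1:20
    I = 1/n - alpha*I;         % each step multiplies the inherited error by -alpha
    fprintf('I_%d approx %.6f\n', n, I);
end
% When alpha > 1 the initial error is amplified by alpha^n, so the computed
% values eventually become meaningless (they can even go negative, although
% the exact I_n are positive and decreasing).
```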

Polynomial Interpolation

Given n points (x_i, y_i) where the x_i are distinct, find a polynomial p(x) of degree at most n-1 such that p(x_i) = y_i.

We want p(x) = c_1 + c_2x + ... + c_nx^{n-1}. To find p(x), we set up n linear equations in the n unknowns c_1, ..., c_n; the coefficient matrix of this system is the Vandermonde matrix V.

  1. Does an interpolating polynomial always exist?
  2. Are interpolating polynomials unique?

We can answer both questions by showing any of the following: that setting all y_i = 0 forces all c_i = 0, that we can always construct a solution explicitly (the Lagrange form below), or that \det V = \prod_{i<j}(x_j - x_i) \neq 0 when the x_i are distinct.
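A minimal MATLAB sketch of the monomial-basis approach, solving the Vandermonde system directly (the sample points are made up for illustration):

```matlab
x = [1 2 3 4];                    % distinct nodes (illustrative)
y = [1 2 1 0];
V = fliplr(vander(x));            % columns 1, x, x^2, x^3, matching c_1 + c_2 x + ...
c = V \ y';                       % coefficients c_1, ..., c_n
p = @(t) polyval(flipud(c), t);   % polyval expects the highest power first
p(x)                              % should reproduce y (up to rounding)
```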

Lagrange Form of Interpolating Polynomial

We can always construct an interpolating polynomial as follows: p(x) = \sum_{i=1}^n y_i L_i(x), where each L_i(x) is a polynomial of degree at most n-1 which satisfies L_i(x_i) = 1 and L_i(x_j) = 0 for all j \neq i.

L_i(x) = \frac{(x - x_1)(x - x_2)...(x - x_{i-1})(x - x_{i+1})...(x - x_n)}{(x_i - x_1)(x_i - x_2)...(x_i - x_{i-1})(x_i - x_{i+1})...(x_i - x_n)}
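A direct MATLAB sketch of evaluating the Lagrange form (the function name and variables are mine, chosen for illustration):

```matlab
function p = lagrange_eval(x, y, t)
% Evaluate p(t) = sum_i y(i) * L_i(t) for nodes x and data values y.
    n = length(x);
    p = zeros(size(t));
    for i = 1:n
        Li = ones(size(t));
        for j = [1:i-1, i+1:n]
            Li = Li .* (t - x(j)) / (x(i) - x(j));   % build L_i(t) factor by factor
        end
        p = p + y(i) * Li;
    end
end
```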

A1 Hint Aside

We compute I_0, I_1, ..., I_n using the recurrence I_n = \frac{1}{n} - \alpha I_{n-1}.

If our starting value is inexact, we want to look at the error at every step. How does the error grow? Subtracting the exact recurrence from the computed one gives a recurrence for the error itself.

Hermite Interpolation

Given n points (x_i, y_i) with derivative values s_i, we attempt to find S(x) of degree at most 2n-1 satisfying S(x_i) = y_i, S^\prime(x_i) = s_i.

Cubic Splines

Polynomial interpolation does not tend to give answers which keep the shape of the given points; a high-degree interpolant can oscillate between them. We want a function which follows the data smoothly.

We want a function S(x) which satisfies the following conditions.

  1. In each interval, S(x) is a cubic polynomial.
  2. S(x) interpolates the points, S(x_i) = y_i.
  3. S^\prime(x), S^{\prime\prime}(x) exist and are continuous on [x_1, x_n].
  4. Two additional boundary conditions.

S(x) = \begin{cases} S_1(x),\ &x_1 \le x \le x_2\\ S_2(x),\ &x_2 \le x \le x_3 \\ ... \\ S_{n-1}(x),\ &x_{n-1} \le x \le x_n\\ \end{cases}

Each piece S_i(x) has its own unknown coefficients; S(x) is still well defined at the shared nodes x_i because both neighbouring pieces interpolate the same value y_i there.

How many unknowns? 4(n-1). How many conditions? 2(n-1) interpolation conditions, 2(n-2) derivative continuity conditions (S^\prime and S^{\prime\prime} at the interior nodes), and 2 boundary conditions, for a total of 4(n-1).

Boundary Conditions

Natural Boundaries

S^{\prime\prime}(x_1) = S^{\prime\prime}(x_n) = 0.

Clamped

S^{\prime}(x_1) = s_1, S^{\prime}(x_n) = s_n, where s_1, s_n are constants.

Periodic

S^\prime(x_1) = S^\prime(x_n), S^{\prime\prime}(x_1) = S^{\prime\prime}(x_n).

Not-a-knot

S^{\prime\prime\prime} is continuous at x_2 and x_{n-1}.

Computing Cubic Splines Fast

We don't want to solve a system of 4(n-1) unknowns.

  1. Treat the derivatives s_i at the n points as unknowns.
  2. Each S_i(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x - x_i)^3.

In general, writing \Delta x_i = x_{i+1} - x_i and y_i^\prime = \frac{y_{i+1} - y_i}{\Delta x_i}, we know that:

  1. a_i = y_i.
  2. b_i = s_i.
  3. c_i = \frac{3y_i^\prime - 2s_i - s_{i+1}}{\Delta x_i}.
  4. d_i = \frac{s_i + s_{i+1} - 2y_i^\prime}{\Delta x_i^2}.

For natural splines.

  1. Determined by first boundary condition.

    S^{\prime\prime}(x_1) = 0 \Rightarrow 2c_1 = 0. So then 2s_1 + s_2 = 3y_1^\prime.

  2. Determined by second boundary condition.

    S^{\prime\prime}(x_n) = 0 \Rightarrow S_{n-1}^{\prime\prime}(x_n) = 0. So 2c_{n-1} + 6d_{n-1}(x_n - x_{n-1}) = 0. \begin{aligned}\frac{3y_{n-1}^\prime - 2s_{n-1} - s_n}{\Delta x_{n-1}} + \frac{3(s_{n-1} + s_n - 2y_{n-1}^\prime)}{\Delta x_{n-1}^2} \Delta x_{n-1} &= 0 \\ s_{n-1} + 2s_n &= 3y_{n-1}^\prime\end{aligned}

  3. Based on the interior points.

    \begin{aligned} S^{\prime\prime}_{i-1}(x_i) &= S^{\prime\prime}_i(x_i) \\ 2c_{i-1}+6d_{i-1}(x_i - x_{i-1}) &= 2c_i \\ c_{i-1} + 3d_{i-1}\Delta x_{i-1} &= c_i \\ \frac{3y_{i-1}^\prime - 2s_{i-1} - s_i}{\Delta x_{i-1}} + \frac{3(s_{i-1} + s_i - 2y_{i-1}^\prime)}{\Delta x_{i-1}^2}\Delta x_{i-1} &= \frac{3y_{i}^\prime - 2s_i - s_{i+1}}{\Delta x_i} \\ \Delta x_i s_{i-1} + 2(\Delta x_{i-1} + \Delta x_i)s_i + \Delta x_{i-1} s_{i+1} &= 3y_i^\prime \Delta x_{i-1} + 3y_{i-1}^\prime \Delta x_i \\ \end{aligned}

\begin{bmatrix} 2 & 1 & 0 & \cdots & 0 & 0 \\ \star & \star & \star & \cdots & 0 & 0 \\ 0 & \star & \star & \star & \cdots & 0 \\ & & \ddots & \ddots & \ddots & \\ 0 & 0 & \cdots & \star & \star & \star \\ 0 & 0 & \cdots & 0 & 1 & 2 \end{bmatrix}\begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ \vdots \\ s_{n-1} \\ s_n \end{bmatrix} = \begin{bmatrix} 3y_1^\prime \\ \star \\ \star \\ \vdots \\ \star \\ 3y_n^\prime \end{bmatrix}

where, for 1 < i < n, row i has the entries \Delta x_i,\ 2(\Delta x_{i-1} + \Delta x_i),\ \Delta x_{i-1} in columns i-1, i, i+1, and right-hand side 3y_i^\prime \Delta x_{i-1} + 3y_{i-1}^\prime \Delta x_i.

Example: (1, 1), (2, 2), (3, 1), (4, 0), (5, 3), (6, 3).

\Delta x_{i} = 1. y_1^\prime = 1, y_2^\prime = -1, y_3^\prime = -1, y_4^\prime = 3, y_5^\prime = 0.

\left[\begin{array}{cccccc|c} 2 & 1 & 0 & 0 & 0 & 0 & 3\\ 1 & 4 & 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 4 & 1 & 0 & 0 & -6\\ 0 & 0 & 1 & 4 & 1 & 0 & 6\\ 0 & 0 & 0 & 1 & 4 & 1 & 9\\ 0 & 0 & 0 & 0 & 1 & 2 & 0\\ \end{array}\right]
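A MATLAB sketch that builds and solves this tridiagonal system for the example points and recovers the slopes s_i (a dense matrix is used here only for clarity; a tridiagonal solver would be more efficient):

```matlab
x = 1:6;  y = [1 2 1 0 3 3];
n  = length(x);
dx = diff(x);                      % Delta x_i
yp = diff(y) ./ dx;                % y_i' = (y_{i+1} - y_i) / Delta x_i
A = zeros(n);  r = zeros(n, 1);
A(1, 1:2)   = [2 1];   r(1) = 3*yp(1);        % natural boundary at x_1
A(n, n-1:n) = [1 2];   r(n) = 3*yp(n-1);      % natural boundary at x_n
for i = 2:n-1                                  % interior continuity equations
    A(i, i-1:i+1) = [dx(i), 2*(dx(i-1) + dx(i)), dx(i-1)];
    r(i) = 3*yp(i)*dx(i-1) + 3*yp(i-1)*dx(i);
end
s = A \ r                          % nodal slopes s_1, ..., s_6
```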

Parametric Curves

Given (x_i, y_i) as points on a curve, we want to find (x(t), y(t)), t_1 \le t \le t_n.

  1. Spline interpolate (i, x_i) to get x(t), then spline interpolate (i, y_i) to get y(t). The curve is (x(t), y(t)).
  2. t changes according to the distance between consecutive points (chord length): t_1 = 0, t_{i+1} = t_i + \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}.

A1 Hint Aside

Question 6

  1. Create a set of (x, y) points for a given curve. Likely more than one curve.
  2. For each set of points representing a curve, create your arc length parameter t.
  3. Create a spline function in MATLAB for x using (t_i, x_i). X = csape(t, x, conditions). An example of conditions is 'periodic'.
  4. Create a spline function in MATLAB for y using (t_i, y_i). Y = csape(t, y, conditions).
  5. xpoints = ppval(X, tpoints), ypoints = ppval(Y, tpoints).
  6. plot(xpoints, ypoints).
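Putting these steps together, a minimal sketch with made-up points (the actual curve data and the appropriate end conditions come from the assignment, not from these notes):

```matlab
x = [0 1 2 1 0];  y = [0 1 0 -1 0];          % hypothetical closed curve
t = [0, cumsum(hypot(diff(x), diff(y)))];    % chord-length parameter t_i
X = csape(t, x, 'periodic');                 % spline for x(t)
Y = csape(t, y, 'periodic');                 % spline for y(t)
tt = linspace(t(1), t(end), 200);
plot(ppval(X, tt), ppval(Y, tt));            % trace out the parametric curve
```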

Initial Value Problems for Ordinary Differential Equations

Examples.

  1. \frac{dy}{dt} = ky(t), y(0) = y_0.
  2. y^\prime(t) = -ty(t).
  3. x^\prime(t) = x(t)(a - \alpha y(t)). y^\prime(t) = y(t)(-b + \beta x(t)).

Solving means solving numerically: we want pairs (t_k, y_k) such that y_k \approx y(t_k).

Three methods are provided.

  1. Forward Euler.
  2. Modified or Improved Euler.
  3. Trapezoid Rule.

In each we will assume we are given a constant time increment h = \Delta t, so that t_k = t_0 + k \cdot h.
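As a minimal sketch, Forward Euler applied to the first example y^\prime = ky(t), y(0) = y_0 (the constants k, y_0, h, and the step count below are illustrative choices of mine):

```matlab
k = -1;  y0 = 1;  h = 0.1;  N = 50;
t = zeros(1, N+1);  y = zeros(1, N+1);
t(1) = 0;  y(1) = y0;
for j = 1:N
    y(j+1) = y(j) + h * (k * y(j));    % y_{j+1} = y_j + h * f(t_j, y_j)
    t(j+1) = t(j) + h;
end
plot(t, y, t, y0*exp(k*t), '--');      % compare with the exact solution y0*e^{kt}
```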