Assignable or not

NumPy arrays are mutable: their values can be modified in place after creation. TensorFlow tensors, by contrast, are immutable: once a tensor has been created, its values cannot be changed.

For example, a NumPy array supports in-place assignment:

 
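A minimal sketch of such an assignment (the array values here are assumed for illustration):

```python
import numpy as np

x = np.ones(shape=(2, 2))  # a 2x2 array of ones
x[0, 0] = 0.0              # in-place assignment works on NumPy arrays
print(x)                   # [[0. 1.], [1. 1.]]
```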

In contrast, the same assignment on a TensorFlow tensor fails:

 
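A minimal sketch of the failing assignment (same shape as above, assumed for illustration):

```python
import tensorflow as tf

x = tf.ones(shape=(2, 2))
x[0, 0] = 0.0  # raises TypeError: an EagerTensor does not support item assignment
```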

To create a tensor whose state can be updated, use the tf.Variable class:

 
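A minimal sketch of updating a tf.Variable (the values are assumed for illustration):

```python
import tensorflow as tf

v = tf.Variable(initial_value=tf.ones(shape=(2, 2)))
v.assign(tf.zeros(shape=(2, 2)))     # replace all entries
v[0, 0].assign(3.0)                  # update a single entry
v.assign_add(tf.ones(shape=(2, 2)))  # in-place addition
```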

Gradient Computation Capabilities

NumPy cannot retrieve the gradient of a differentiable expression with respect to its inputs. In TensorFlow, to apply a computation to one or more input tensors and retrieve the gradient of the result with respect to those inputs, open a GradientTape scope as below:
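A minimal sketch, assuming a scalar variable and the square function as the computation:

```python
import tensorflow as tf

input_var = tf.Variable(initial_value=3.0)
with tf.GradientTape() as tape:
    result = tf.square(input_var)            # computation recorded by the tape
gradient = tape.gradient(result, input_var)  # d(x^2)/dx = 2x = 6.0
```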

When dealing with a constant tensor, it needs to be explicitly marked for tracking by calling watch() on it. This is because storing the information required to compute the gradient of anything with respect to anything would be too expensive to do preemptively; by default, the tape only tracks trainable variables. The following example uses watch() so that the GradientTape knows what to monitor without wasting resources:
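A minimal sketch with a constant tensor, assuming the same square computation as above:

```python
import tensorflow as tf

input_const = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(input_const)                      # mark the constant for tracking
    result = tf.square(input_const)
gradient = tape.gradient(result, input_const)    # 6.0
```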

The GradientTape is capable of computing second-order gradients (the gradient of a gradient).

  • The gradient of the position of an object with respect to time is the speed of that object.
  • The second-order gradient is its acceleration.

Here is an example:

 
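A minimal sketch using nested tapes, assuming a falling object whose position is 4.9 * t**2:

```python
import tensorflow as tf

time = tf.Variable(0.0)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        position = 4.9 * time ** 2
    speed = inner_tape.gradient(position, time)  # first-order: 9.8 * time
acceleration = outer_tape.gradient(speed, time)  # second-order: 9.8
```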


De Moivre's formula is:

$$ [r(\cos\theta + i\sin\theta)]^n = r^n(\cos n\theta + i\sin n\theta) $$

 

The expressions on both sides combine a Real Number and an Imaginary Number; such combinations are complex numbers. The imaginary terms are:

$$ i\sin\theta $$

$$ i\sin n\theta $$

  • A Real Number is an ordinary number such as 1.4, 5/8, -2390, or 0.
  • An Imaginary Number gives a negative result when squared: i^2 = -1

As an example, take the complex number

$$ 4 + 3i $$

 

Its magnitude r is

$$ r = \sqrt{4^2+3^2}=\sqrt{25}=5 $$

Its angle (in radians) is

$$ \theta = \tan^{-1}(y/x) = \tan^{-1}(3/4) = 0.6435 $$

x is

$$ \cos\theta = x/r $$

$$ x = r\cos\theta = 5\cos(0.6435) = 4 $$

y is

$$ \sin\theta = y/r $$

$$ y = r\sin\theta = 5\sin(0.6435) = 3 $$

 

A complex number is commonly written in the following form:

$$ x + iy = r(\cos\theta + i\sin\theta) = r\,\mathrm{cis}\,\theta $$

  • Note that the combination of cos and sin is often shortened to 'cis'

In this case, therefore, the complex number can be written as follows:

$$ 4 + 3i = 5\,\mathrm{cis}(0.6435) $$

 

In De Moivre's formula,

$$ [r(\cos\theta + i\sin\theta)]^n = r^n(\cos n\theta + i\sin n\theta) $$

the magnitude becomes

$$ r^n $$

and the angle (in radians) becomes

$$ n\theta $$

 

Applying De Moivre's formula to the example above with n = 2:

$$ (5\,\mathrm{cis}(0.6435))^2 = 5^2\,\mathrm{cis}(2 * 0.6435) = 25\,\mathrm{cis}(1.287) $$

So the magnitude is 25 and the angle is 1.287 radians.
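The result can be checked numerically with Python's cmath module (a sketch, not part of the original derivation):

```python
import cmath

z = 4 + 3j
r, theta = cmath.polar(z)                      # r = 5.0, theta ~ 0.6435
direct = z ** 2                                # (7+24j)
via_de_moivre = cmath.rect(r ** 2, 2 * theta)  # r^n cis(n*theta) with n = 2
# both give approximately 7+24j: magnitude 25, angle ~ 1.287 radians
```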


To avoid significant noise amplification when the number of training samples is small, one approach is to add an extra term (an extra constraint) to the least-squares cost function.

  • The extra term penalises the norm of the coefficient vector.

Modifying cost functions to favour structured solutions is called regularisation. Least-squares regression combined with l2-norm regularisation is known as ridge regression in statistics and as Tikhonov regularisation in the literature on inverse problems.

 

In the simplest case, a positive multiple of the sum of squares of the variables is added to the cost function:

$$ \sum_{i=1}^{k}(a_i^Tx-b_i)^2+\rho \sum_{i=1}^{n}x_i^2 $$

where

$$ \rho>0 $$

  • The extra term results in a sensible solution in cases where minimising the first sum alone does not.

To refine the choice among Pareto optimal solutions, the objective function landscape can be adjusted by adding specific terms.
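A minimal sketch of the resulting ridge solution, assuming the cost above written in matrix form as ||Ax - b||^2 + rho * ||x||^2 (the helper name ridge_solution is hypothetical):

```python
import numpy as np

def ridge_solution(A, b, rho):
    # Setting the gradient of ||Ax - b||^2 + rho * ||x||^2 to zero
    # gives the normal equations (A^T A + rho * I) x = A^T b.
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ b)
```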


Data representations for neural networks

  1. Scalars (rank-0 tensors)
    e.g. np.array(2)
  2. Vectors (rank-1 tensors, or 1D tensors)
    e.g. np.array([2, 3])
  3. Matrices (rank-2 tensors, or 2D tensors)
    e.g. np.array([[2, 3], [4, 5]])
  4. Rank-3 tensors (3D tensors)
    e.g. np.array([[[2, 3, 4], [5, 6, 7]], [[8, 9, 10], [11, 12, 13]], [[14, 15, 16], [17, 18, 19]]])
           shape: (3, 2, 3), i.e. 3 samples, each a 2 x 3 matrix

Tensors are a generalisation of matrices to an arbitrary number of dimensions (a dimension is often called an axis).

The rank of a tensor is its number of axes.
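A short check of these definitions in NumPy, using the rank-3 example above:

```python
import numpy as np

x = np.array([[[2, 3, 4], [5, 6, 7]],
              [[8, 9, 10], [11, 12, 13]],
              [[14, 15, 16], [17, 18, 19]]])
print(x.ndim)   # 3 -> the rank, i.e. the number of axes
print(x.shape)  # (3, 2, 3)
```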

 

Tensor operations

  • Build a model by stacking Dense layers on top of each other (a runnable sketch follows this list)
    • Hidden layer: tensorflow.keras.layers.Dense(512, activation="relu")
      • This layer computes: relu(dot(input, W) + b)
      • dot is a dot product between the input tensor and a tensor named W
      • + is an addition between the resulting matrix and a vector b
      • "relu" stands for "rectified linear unit", and relu(x) is equivalent to max(x, 0)
      • "relu" activation is usually used in the hidden layer, especially in the first layer
      • The number of units (neurons), 512 in this case, should be chosen according to the complexity of the data, not at random.
    • Output layer: tensorflow.keras.layers.Dense(10, activation="softmax")
      • This last layer is a 10-way softmax classification layer, which means it will return an array of 10 probability scores (summing to 1).
  • Make the model ready for training at the compilation step
    • optimizer: The mechanism through which the model will update itself based on the training data it sees, to improve its performance.
    • loss: How the model will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction. The purpose of loss functions is to compute the quantity that a model should seek to minimise during training.
    • metrics: The measures used to judge the performance of the model, monitored during training and testing.
  • Fit the model to its training data
    • epochs
    • batch_size
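A minimal sketch of the three steps above; the optimizer, loss, and the placeholder training data are assumptions, not taken from the linked notebook:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder data standing in for preprocessed MNIST: shape (60000, 784)
train_images = np.random.rand(60000, 784).astype("float32")
train_labels = np.random.randint(0, 10, size=(60000,))

# Build: stack Dense layers on top of each other
model = tf.keras.Sequential([
    layers.Dense(512, activation="relu"),    # hidden layer
    layers.Dense(10, activation="softmax"),  # 10-way classification output
])

# Compile: optimizer, loss, and metrics (these choices are assumptions)
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Fit: 5 epochs with batch size 128, matching the update count below
model.fit(train_images, train_labels, epochs=5, batch_size=128)
```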

 

The full code is available at https://github.com/we1c0me2s0rapark/UoL-MLNN/blob/main/MNIST.ipynb

 

In the above case, given training data of shape (60000, 784), each epoch performs ceil(60000/128) = 469 gradient updates, so 5 epochs give a total of 469 * 5 = 2345 updates.
