ECE5883/6883 Assignment
Modelling the noise in nanopore-based DNA sequencers
April 2025
Your submission must be in the form of a report, detailing all the steps and reasoning for each of the questions below.
In this assignment, we consider a dataset collected from the nanopore DNA sequencer produced by Oxford Nanopore Technologies [1]. The dataset is provided in the MATLAB file assignment data.mat, which contains a cell array in which each cell holds a different signal [2].
The portion of each signal that you will analyse corresponds to the calibration preamble, which is used to equalise the different nanopores to a constant level. We will assume that the measurement signals can be modelled as
$$y_i(n) = \mu + x_i + z_i(n), \quad i = 1, \dots, K, \quad n = 1, \dots, L_i,$$
where $x_i$ is a random fluctuation around $\mu$ due to the variability of the different pores generating the signals, and $z_i(n)$ is an independent and identically distributed (i.i.d.) realisation of the additive noise process $z(n)$ that is related to other effects in the measurement process. In the given dataset, there are $K = 20000$ unique signals, and the length $L_i$ of each signal varies. For instance, the first signal has length $L_1 = 242$, the second signal has length $L_2 = 269$, and so on. The length of each measured signal varies because the translocation speed of the DNA strand through the nanopore DNA sequencer is random.
The goal of this assignment is to model the random processes involved, so that the measurement process can be reproduced with statistical characteristics as close as possible to those of the data.
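To make the model concrete, the following Python sketch draws synthetic signals of the form $y_i(n) = \mu + x_i + z_i(n)$ with variable lengths. Every numerical value (mean level, standard deviations, length range) and the function name `simulate_signals` are illustrative assumptions, not values taken from the dataset, and $z_i(n)$ is drawn as white Gaussian noise purely as a placeholder.

```python
import random

def simulate_signals(K, mu, sigma_x, sigma_z, min_len, max_len, seed=0):
    """Draw K synthetic signals y_i(n) = mu + x_i + z_i(n) with variable lengths."""
    rng = random.Random(seed)
    signals = []
    for _ in range(K):
        x_i = rng.gauss(0.0, sigma_x)        # per-pore offset, constant over n
        L_i = rng.randint(min_len, max_len)  # variable signal length (inclusive bounds)
        # Placeholder noise: white Gaussian samples (an assumption, not the real z(n)).
        signals.append([mu + x_i + rng.gauss(0.0, sigma_z) for _ in range(L_i)])
    return signals

# Hypothetical parameter values, chosen only for illustration.
signals = simulate_signals(K=5, mu=100.0, sigma_x=5.0, sigma_z=2.0,
                           min_len=200, max_len=300)
lengths = [len(y) for y in signals]
```

Once the noise has been characterised from the data, the placeholder white noise would be replaced by a process with the estimated statistics.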
Questions
1. [3 marks] For each $y_i(n)$, compute $(\mu + x_i)$, assuming that $z(n)$ is a zero-mean process. Estimate the PDF of $(\mu + x_i)$. How well does it fit a uniform or a Gaussian PDF? Remove outliers if necessary.
2. [2 marks] Assuming $x_i$ is normally distributed with zero mean, find $\mu$ and the variance of the process $x_i$. You may need to use the values computed in Q1.
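One possible starting point for Q1 and Q2 is sketched below in Python. It assumes that the sample mean of each signal is used as the estimate of $(\mu + x_i)$, that a normalised histogram serves as the PDF estimate, and that $\mu$ and the variance of $x_i$ are taken as the sample mean and sample variance of those per-signal means. The helper names and the toy data are hypothetical. The sample variance of the means also contains a small contribution from averaging the noise over a finite number of samples, which this sketch ignores, and no outlier removal is shown.

```python
import statistics

def signal_means(signals):
    """Sample mean of each signal: an estimate of mu + x_i when z(n) has zero mean."""
    return [statistics.fmean(y) for y in signals]

def histogram_pdf(values, num_bins):
    """Histogram normalised to unit area, as a crude PDF estimate."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        k = min(int((v - lo) / width), num_bins - 1)  # put the maximum in the last bin
        counts[k] += 1
    centres = [lo + (k + 0.5) * width for k in range(num_bins)]
    density = [c / (len(values) * width) for c in counts]
    return centres, density

# Toy data standing in for the cell array of signals (hypothetical values).
toy_signals = [[1.0, 3.0], [2.0, 4.0], [6.0, 8.0]]
means = signal_means(toy_signals)               # estimates of mu + x_i
centres, density = histogram_pdf(means, num_bins=2)
mu_hat = statistics.fmean(means)                # estimate of mu, since E[x_i] = 0
var_x_hat = statistics.variance(means, mu_hat)  # unbiased sample variance of the means
```

The histogram can then be compared against a uniform PDF over the observed range and a Gaussian PDF with mean `mu_hat` and variance `var_x_hat`.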
3. [3 marks] The signal $y_i(n)$ happens to be corrupted by additional noise due to edge effects. Assume there are only $N = 128$ samples in the middle that are not corrupted by edge effects.
(a) Design a rectangular window $w_R(n)$ that mitigates these edge effects.
(b) Write the frequency response $W_R(e^{j\omega})$ of the rectangular window $w_R(n)$ and plot it.
(c) Calculate the frequency resolution when the modified periodogram method is used with $w_R(n)$.
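As a hedged illustration of Q3, the Python sketch below assumes the window is centred on each signal, so that it keeps the $N$ middle samples, and evaluates the frequency response of a length-$N$ rectangular window directly from its definition as a sum of complex exponentials. The function names are hypothetical. The resolution line uses the commonly quoted textbook approximation of a 3 dB main-lobe width of about $0.89 \cdot 2\pi/N$ rad/sample for the rectangular window; treat it as an assumption to be checked against the course notes.

```python
import cmath
import math

def middle_segment(y, N):
    """Apply a centred length-N rectangular window: keep the N middle samples of y."""
    start = (len(y) - N) // 2
    return y[start:start + N]

def rect_window_response(N, omega):
    """W_R(e^{j omega}) = sum_{n=0}^{N-1} exp(-j omega n) for a length-N window."""
    return sum(cmath.exp(-1j * omega * n) for n in range(N))

N = 128
kept = middle_segment(list(range(10)), 4)                    # toy example: [3, 4, 5, 6]
peak = abs(rect_window_response(N, 0.0))                     # main-lobe peak, equals N
first_null = abs(rect_window_response(N, 2 * math.pi / N))   # first zero of the main lobe
resolution = 0.89 * 2 * math.pi / N                          # approx. 3 dB width, rad/sample
```

Evaluating `rect_window_response` on a dense grid of `omega` values between $-\pi$ and $\pi$ gives the data needed for the plot in part (b).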
4. [4 marks] Use Bartlett’s method with segment length $L = 64$ and the rectangular window in Q3 to compute periodograms for $z_i(n)$.
(a) Compute the mean and variance of the periodograms.
(b) Determine a method to reduce the variance of these estimates without changing L.
(c) Estimate the PSD of $z(n)$ with sufficient justification and plot it.
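A minimal sketch of Bartlett's method in Python follows, assuming non-overlapping segments, a rectangular window, and a direct DFT (adequate for segments of 64 samples). The function names are hypothetical and the input is synthetic white noise, not the assignment data. In practice an estimate of $z_i(n)$ would first be needed, for example by subtracting the sample mean from the windowed $y_i(n)$.

```python
import cmath
import math
import random

def periodogram(x):
    """Periodogram |X(k)|^2 / N of one segment, evaluated at the N DFT frequencies."""
    N = len(x)
    out = []
    for k in range(N):
        X_k = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        out.append(abs(X_k) ** 2 / N)
    return out

def bartlett_psd(x, seg_len):
    """Average the periodograms of non-overlapping length-seg_len segments of x."""
    num_segs = len(x) // seg_len
    segs = [x[m * seg_len:(m + 1) * seg_len] for m in range(num_segs)]
    pers = [periodogram(s) for s in segs]
    return [sum(p[k] for p in pers) / num_segs for k in range(seg_len)]

# Synthetic stand-in for one noise realisation of 128 samples (two segments of 64).
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(128)]
psd = bartlett_psd(z, seg_len=64)
avg_power = sum(v * v for v in z) / len(z)  # equals the mean of psd by Parseval's relation
```

With only two segments per signal the estimate remains noisy. Since the $z_i(n)$ are modelled as i.i.d. realisations of the same process, one option worth considering for part (b) is to pool periodograms from many signals before averaging.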
5. [2 marks] Use the PSD estimate in Q4 to estimate the autocorrelation function $r_z(k)$ and plot it. Does this calculation require that $z(n)$ is wide-sense stationary (WSS)?
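For a WSS process the PSD and the autocorrelation function form a Fourier transform pair, so one way to approach Q5 is an inverse DFT of the PSD samples. The Python sketch below assumes the PSD is sampled at the $N$ DFT frequencies and uses a flat toy PSD rather than the estimate from Q4; the function name is hypothetical. Because the PSD is only sampled, the result is a circularly aliased autocorrelation with period $N$, so index $N - k$ corresponds to lag $-k$.

```python
import cmath
import math

def autocorr_from_psd(psd):
    """Inverse DFT of PSD samples at the N DFT frequencies; returns r(0), ..., r(N-1)."""
    N = len(psd)
    r = []
    for k in range(N):
        acc = sum(psd[m] * cmath.exp(2j * math.pi * m * k / N) for m in range(N))
        r.append((acc / N).real)  # real part: the PSD of a real process is real and even
    return r

# Toy check: a flat PSD (white noise of power 2) gives r(0) = 2 and zero elsewhere.
flat_psd = [2.0] * 8
r = autocorr_from_psd(flat_psd)
```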
6. [6 marks] Assume the noise can be modelled as $z(n) = p(n) + w(n)$ where:
• $p(n)$ follows a power-law spectrum with PSD $\beta / f^{\alpha}$ for frequency $f$, scaling constant $\beta > 0$, and exponent $\alpha > 0$
• $w(n)$ is uncorrelated white Gaussian noise with power $\sigma^2$.
(a) Derive the PSD of $z(n)$ for arbitrary parameters $\alpha$, $\beta$ and $\sigma^2$.
(b) Estimate the parameters $\alpha$, $\beta$ and $\sigma^2$ by visual inspection of the PSD estimate in Q4.
(c) Derive a filter that whitens the noise $z(n)$ [3].
(d) Verify the whitening filter by whitening $z_i(n)$ and re-computing the PSD estimate as in Q4.
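The Python sketch below illustrates the idea behind Q6 under one stated assumption: if $p(n)$ and $w(n)$ are uncorrelated, their PSDs add, giving a model PSD of $\beta / f^{\alpha} + \sigma^2$ for $f > 0$. A whitening filter then needs a squared magnitude response equal to the reciprocal of that PSD. The function names and parameter values are hypothetical placeholders, not estimates from the data, and only the magnitude response is shown; a causal filter design would need a further step such as spectral factorisation.

```python
import math

def model_psd(f, alpha, beta, sigma2):
    """Assumed PSD of z(n): power-law term plus white-noise floor, valid for f > 0."""
    return beta / f ** alpha + sigma2

def whitening_gain(f, alpha, beta, sigma2):
    """Magnitude response |H(f)| = 1 / sqrt(P_z(f)) of a whitening filter."""
    return 1.0 / math.sqrt(model_psd(f, alpha, beta, sigma2))

# Placeholder parameters (hypothetical values, chosen only for illustration).
alpha, beta, sigma2 = 1.0, 0.5, 2.0
freqs = [k / 64 for k in range(1, 33)]  # normalised frequencies, excluding f = 0
whitened = [model_psd(f, alpha, beta, sigma2) * whitening_gain(f, alpha, beta, sigma2) ** 2
            for f in freqs]             # PSD after filtering: flat and equal to 1
```

On a log-log plot of the model PSD, the low-frequency region approaches a straight line with slope $-\alpha$ and the high-frequency region flattens towards $\sigma^2$, which is what makes the visual inspection in part (b) feasible.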
7. [5 marks contributing to the continuous assessment total (max 50)]
Optional research question - Suggest a possible physical interpretation of why the PSD of z(n) presents this shape. Provide at least one relevant reference to a research paper that supports this reasoning.
References
[1] Nanopore DNA Sequencing, https://nanoporetech.com/applications/dna-nanopore-sequencing
[2] MATLAB Cell Arrays, https://au.mathworks.com/help/matlab/cell-arrays.html
[3] Whitening filter, https://www.sciencedirect.com/topics/computer-science/whitening-filter
2025-05-14