A Sample Problem — — Train to be an Interview Problem Solver (1)

Recently I’ve started interviewing for different companies, and for a software engineer role, there would definitely be rounds for technical interviews (well, also known as whiteboard coding interview, notoriously). In this kind of interview, you’ll have to write some codes to solve a given problem, analyzing different solutions and come up with the best methods based on the given conditions. Usually, a well-trained interviewer (engineer) will evaluate a candidate by the following observations:

1. Problem analysis
2. Algorithm & data structure
3. Coding ability
4. Culture fitting & Cleverness

One could mostly learn the second and the third from school courses, however, it is from practicing and experiences that you can know how to tackle a problem and analyze from different aspects.

Here, I would like to show a sample problem that could be tackled from different approaches, while each could be applied in different scenarios.

So, let’s get down to the problem.

You have two inputs: N, M
Write a function that could print out M ordered numbers sampled from 0 ~ N-1
The probability for choosing each combination should be the same.
For example -> N = 3, M = 2

The function should print out [0,1] or [0,2] or [1,2]
The probability for printing out [0,1], [0,2] or [1,2] should be the same.
[2, 1] will not be considered because we want it to be ordered.

Well, a really, really brute force way is to generate all possible sequences, and then randomly choose one of them. To write a recursive function to search all possible subsequences is trivial to implement. But that way the time complexity will be O(N!) which seems terrible. We need a better way.

After second thought, one should easily figure out that if you want an ordered sequence, you can not choose a number before itself. For example, if N = 5 and M = 2, the probability to choose 0 should be 2/5, and then for the probability to choose 1 is more subtle — — if 0 is chosen, then it should 1/4 otherwise 2/4. In conclusion, we will choose the next number based on R/S, where R denotes to the remaining slots to select and S denotes to the remaining number available to choose.

If we know the pattern to represent a probability M / N:

if (rand() % N < M) { … }

Then we should be able to come up with this approach:

This takes O(N) time and O(1) space, or O(M) if we save the result.
Here comes the tradeoff: what if N is too large?
It seems that this kind of approach will be broken right? Can we do better?

Another approach is to randomly select a number inside a collection until the size reaches M. We have to detect duplication while selecting numbers. A possible implementation using C++ is:

Note here, the implementation of std::set is a binary search tree where the container is ordered and could support O(log N) insertion operation. If duplication happens, the selected number will be discarded and pick another again.

Can we do better?

Observing the implementation carefully, we could find that the usage of line 7, the insertion operation might end up multiple collisions if M is near N because most numbers will be selected before the algorithm ends.

There’s an elegant and concise way to improve this. See the approach below:

Clever, right? This implementation will limit the collision check for at most once for each number selected, and the probability to generate the subsequences will still be even. To illustrate the intuition behind this, check the post.

Are we done here? Well, basically, an interviewer will expect you to come up with the first two approaches and analyze the use cases then you’re good. For the third one, if you come up with that without knowing it, you’re a genius. But the most important take away from this sample problem is, to tackle a problem, there should be various approaches with various performances. Try to leverage the advantage and disadvantage for different algorithms, and make the best choice.

Also, be creative. You may use sorting on a collection to made it ordered. Or first sampled N-M numbers and then take M numbers that are not in the sampling collection. These are possible paths towards a solution.

Anyway, computer science is all about trade-off sciences. A trade-off between time complexity, space complexity, and coding complexity.

And that’s all that we would do as a software engineer.

Reference:

  1. Programing Pearls, Second Edition (Very Recommended!)

2. Robert Floyd’s Tiny and Beautiful Algorithm for Sampling without Replacement

3. The full c++ sample program