Minimal topological sort of a labeled DAG

13

Consider the problem where we are given as input a directed acyclic graph $G = (V, E)$, a labeling function $\lambda$ from $V$ to some set $L$ with a total order $<_L$ (e.g., the integers), and where we are asked to compute the lexicographically smallest topological sort of $G$ in terms of $\lambda$. More precisely, a topological sort of $G$ is an enumeration of $V$ as $\mathbf{v} = v_1, \ldots, v_n$, such that for all $i \neq j$, whenever there is a path from $v_i$ to $v_j$ in $G$, we must have $i < j$. The label of such a topological sort is the sequence of elements of $L$ obtained as $\mathbf{l} = \lambda(v_1), \ldots, \lambda(v_n)$. The lexicographical order on such sequences (which all have length $|V|$) is defined as $\mathbf{l} <_{\text{LEX}} \mathbf{l'}$ iff there is some position $i$ such that $l_i <_L l'_i$ and $l_j = l'_j$ for all $j < i$. Pay attention to the fact that each label in $L$ can be assigned to multiple vertices in $V$ (otherwise the problem is trivial).
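To make the definition concrete, here is a minimal brute-force sketch in Python. It enumerates every permutation of the vertices, keeps those that are topological sorts, and returns the lexicographically smallest label sequence. The function names and the tiny example DAG are hypothetical, chosen only for illustration, and the running time is exponential, so this is a specification of the problem rather than a practical algorithm.

```python
from itertools import permutations

def is_topological(order, edges):
    # In a DAG, respecting every edge is equivalent to respecting every path.
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)

def min_label_sequence(vertices, edges, label):
    # Try all enumerations of V; Python compares lists lexicographically.
    best = None
    for order in permutations(vertices):
        if is_topological(order, edges):
            seq = [label[v] for v in order]
            if best is None or seq < best:
                best = seq
    return best

# Hypothetical example: two sources share the label 1, so the first choice matters.
vertices = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "d")]
label = {"a": 1, "b": 1, "c": 0, "d": 2}
print(min_label_sequence(vertices, edges, label))  # [1, 0, 1, 2]
```

Starting with `b` instead of `a` also begins with label 1, but the best continuation is then `[1, 1, 0, 2]`, which is larger. This is the kind of tie that repeated labels introduce.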

This problem can be stated either in a computation variant ("compute the lexicographically minimal topological sort") or in a decision variant ("is this input word the minimal topological sort?"). My question is, what is the complexity of this problem? Is it in PTIME (or in FP, for the computation variant) or is it NP-hard? If the general problem is NP-hard, I am also interested in the version where the set $L$ of possible labels is fixed in advance (i.e., there are only a constant number of possible labels).

Remarks:

Here is a small real-world example to motivate the problem. We can see the DAG as representing the tasks of a project (with a dependency relationship between them), and the labels are integers representing the number of days that each task takes. To finish the project, it will take me the same total amount of time no matter which order I choose for the tasks. However, I would like to impress my boss, and to do this I want to finish as many tasks as possible as fast as possible (in a greedy manner, even if it means being very slow at the end because the harder tasks remain). Choosing the lexicographically minimal order optimises the following criterion: I want to choose an order $o$ such that there is no other order $o'$ and number of days $n$ where after $n$ days I would have finished more tasks with order $o'$ than with order $o$ (i.e., if my boss looks at time $n$, I give a better impression with $o'$), but for all $m < n$ I have finished no fewer tasks with order $o'$ than with order $o$.

To give some insight about the problem: I already know from previous answers that the following related problem is hard: "is there a topological sort which achieves the following sequence?" However, the fact that I want a sequence which is minimal for this lexicographic order seems to greatly constrain the possible topological orders that may achieve it (in particular, the reductions in those other answers no longer seem to work). Intuitively, there are far fewer situations where we have a choice to make.

Note that there seem to be interesting rephrasings of the problem in terms of set cover (when restricting the problem to DAGs that are bipartite, i.e., have height two): given a set of sets, enumerate them in an order $S_1, \ldots, S_n$ that lexicographically minimises the sequence $|S_1|$, $|S_2 \setminus S_1|$, $|S_3 \setminus (S_1 \cup S_2)|$, $\ldots$, $|S_n \setminus (S_1 \cup \cdots \cup S_{n-1})|$. The problem can also be rephrased on undirected graphs (progressively expand a connected area of the graph following the order that minimises the lexicographic sequence of the uncovered labels). However, because the sequence has to be greedy at all times by definition of the lexicographic order, I can't get reductions (e.g., from Steiner tree) to work.
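As an illustration of the set formulation, the following Python sketch computes the sequence $|S_1|$, $|S_2 \setminus S_1|$, $\ldots$ for a given order and minimises it by brute force over all orders. The function names and the example sets are hypothetical, and the exhaustive search is only meant to pin down the objective, not to suggest an efficient method.

```python
from itertools import permutations

def marginal_sizes(sets_in_order):
    # For each set, count the elements not covered by the earlier sets.
    covered = set()
    sizes = []
    for s in sets_in_order:
        sizes.append(len(s - covered))
        covered |= s
    return sizes

def min_marginal_sequence(sets):
    # Exhaustive search over all orders; lists compare lexicographically.
    return min(marginal_sizes(order) for order in permutations(sets))

# Hypothetical example.
sets = [{2, 3}, {1}, {1, 2}]
print(marginal_sizes(sets))         # [2, 1, 0]
print(min_marginal_sequence(sets))  # [1, 1, 1]
```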

Thanks in advance for your ideas!

— a3nm

12

With multiple copies of the same label allowed, the problem is NP-hard, via a reduction from cliques in graphs. Given a graph $G$ in which you want to find a $k$-clique, make a DAG with a source vertex for each vertex of $G$, a sink vertex for each edge of $G$, and a directed edge $xy$ whenever $x$ is a vertex of $G$ that forms an endpoint of edge $y$. Give the vertices of $G$ the label value $1$ and the edges of $G$ the label value $0$.

Then, there is a $k$-clique in $G$ if and only if the lexicographically first topological order begins with a sequence of $k$ $1$'s and $\binom{k}{2}$ $0$'s, with $i-1$ $0$'s following the $i$th $1$. E.g., a six-vertex clique would be represented by the sequence $110100100010000100000$. This is the lexicographically smallest sequence that could possibly begin a topological ordering of a labeled DAG given by this construction (replacing any of the $1$'s by $0$'s would give a sequence with more edges than could be found in a simple graph with that many vertices), and it can only be the beginning of a topological ordering when $G$ contains the desired clique.

— David Eppstein
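The construction in the answer above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the node encoding (`("v", x)` for vertices, `("e", x, y)` for edges) and all function names are hypothetical, and the minimal topological sort is found by exhaustive search, so it only runs on very small graphs.

```python
from itertools import permutations

def clique_prefix(k):
    # The i-th 1 is followed by i-1 zeros, for i = 1..k.
    seq = []
    for i in range(1, k + 1):
        seq.append(1)
        seq.extend([0] * (i - 1))
    return seq

def reduction_dag(graph_vertices, graph_edges):
    # One source per vertex (label 1), one sink per edge (label 0).
    nodes = [("v", x) for x in graph_vertices]
    nodes += [("e", x, y) for x, y in graph_edges]
    arcs = []
    for x, y in graph_edges:
        arcs.append((("v", x), ("e", x, y)))
        arcs.append((("v", y), ("e", x, y)))
    label = {n: (1 if n[0] == "v" else 0) for n in nodes}
    return nodes, arcs, label

def min_label_sequence(nodes, arcs, label):
    # Brute force over all enumerations (exponential time).
    best = None
    for order in permutations(nodes):
        pos = {n: i for i, n in enumerate(order)}
        if all(pos[u] < pos[v] for u, v in arcs):
            seq = [label[n] for n in order]
            if best is None or seq < best:
                best = seq
    return best

def has_clique_via_sort(graph_vertices, graph_edges, k):
    nodes, arcs, label = reduction_dag(graph_vertices, graph_edges)
    prefix = clique_prefix(k)
    return min_label_sequence(nodes, arcs, label)[:len(prefix)] == prefix

print("".join(map(str, clique_prefix(6))))  # 110100100010000100000
triangle = ([1, 2, 3], [(1, 2), (1, 3), (2, 3)])
path = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
print(has_clique_via_sort(*triangle, 3))  # True
print(has_clique_via_sort(*path, 3))      # False
```

For the path, the minimal sequence is `1101010`: after two adjacent vertices and their edge, every further `0` needs a new `1` first, so the prefix `110100` is never reached.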

Oh, I hadn't thought about cliques. That's a nice reduction, thanks a lot! So this shows that the computation problem is NP-hard (even with the fixed label alphabet $\{0, 1\}$). It also implies that the decision problem "is the lexicographically smallest sequence less than this one" is NP-hard as well (you can use it to compute the minimum with binary search). The only additional question I see is whether the problem "is this exact input sequence the minimal one" is NP-hard as well. (With it, you cannot test easily if the minimal word starts with a prefix.) Do you have any idea for that one?

— a3nm

1

My suspicion is that the problem "is this exact sequence achievable" is NP-complete, but I don't have a reduction at hand. "Is this exact sequence the minimal one" should be at the second level of the polynomial hierarchy, since it requires a combination of existential quantification (is it achievable) and universal quantification (are all achievable sequences at least as big).

— David Eppstein

In fact I already know that testing whether an exact sequence is achievable is NP-hard (on an alphabet with 3 labels) by a reduction from unary 3-partition by Marzio de Biasi sketched here: cstheory.stackexchange.com/a/19415. But I think it doesn't settle the status of the problem "is this the minimal achievable sequence": when asking whether a certain sequence is achievable, in general it will stand little chance of being minimal in some lexicographic order. Either way, what your reduction shows is still very interesting, thanks again! :)

— a3nm

2

According to this reference (1), the lexicographically first topological order problem is NLOG-complete.

You may want to take a more thorough look at the article to ensure that it covers the case(s) that you're interested in. In particular, based on the technical report version (pdf) of that article, it appears that they're treating the lexicographic ordering of the vertices as strict (i.e., in your notation, $\lambda(u) \neq \lambda(v)$ for $u \neq v$), but I'm not sure if this affects the applicability of the result.

(1) Shoudai, Takayoshi. "The lexicographically first topological order problem is NLOG-complete." Information Processing Letters 33.3 (1989): 121-124.

— mhum

4

NLOG is a subset of polynomial time, so any NLOG-complete problem is solvable in polynomial time, and (per the "Pay attention" sentence in the first paragraph of the problem) making the labels of the vertices distinct makes the problem easily solvable by a polynomial-time greedy algorithm. The real question is what happens when the labels are not distinct.

— David Eppstein
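For the distinct-label case mentioned in the comment above, one standard way to realise such a greedy algorithm is Kahn's algorithm with a priority queue: repeatedly output the available vertex with the smallest label. The sketch below assumes all labels are distinct; the function name and example are hypothetical.

```python
import heapq

def greedy_min_topological_sort(vertices, edges, label):
    # Assumes distinct labels, so the smallest available label is always
    # the unique right choice for the next position.
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = [(label[v], v) for v in vertices if indegree[v] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, u = heapq.heappop(ready)
        order.append(u)
        for w in successors[u]:
            indegree[w] -= 1
            if indegree[w] == 0:
                heapq.heappush(ready, (label[w], w))
    if len(order) != len(vertices):
        raise ValueError("input graph contains a cycle")
    return order

# Hypothetical example with distinct labels.
vertices = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "d")]
label = {"a": 3, "b": 1, "c": 0, "d": 2}
order = greedy_min_topological_sort(vertices, edges, label)
print(order)                      # ['b', 'd', 'a', 'c']
print([label[v] for v in order])  # [1, 2, 3, 0]
```

With repeated labels the heap would break ties arbitrarily, and the choice among equally labeled vertices can change which labels become available later, which is where the hardness discussed in this thread comes from.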

That's a fair point. It's now clear from your answer that the repetition of labels makes the problem more difficult than the case of unique labels.

— mhum