Proof that dead code cannot be detected by compilers

32

I'm planning to teach a winter course on a varying number of topics, one of which is going to be compilers. I came across this problem while thinking of assignments to give throughout the quarter, but it has me stumped, so I might use it as an example instead.

public class DeadCode {
  public static void main(String[] args) {
     return;
     System.out.println("This line won't print.");
  }
}

In the program above, it's obvious that the print statement will never execute because of the return. Compilers sometimes give warnings or errors about dead code. For example, the code above will not compile in Java. The javac compiler, however, will not detect all instances of dead code in every program. How would I prove that no compiler can possibly do so?

computability proof-techniques compilers

— thomas

29

What is your background, and what is the context you will be teaching in? To be blunt, I'm slightly worried that you have to ask this, seeing as you are going to teach. But good call asking here!

— Raphael

5

Similar question at Stack Overflow

— Moyli

9

@MichaelKjorling Dead code detection is impossible even without those considerations.

— David Richerby

2

BigInteger i = 0; while(isCollatzConjectureTrueFor(i)) i++; printf("Hello world\n");

— user253751

2

@immibis The question asks for a proof that dead code detection is impossible. You have given an example where correct dead code detection requires solving an open problem in mathematics. That does not prove that dead code detection is impossible.

— David Richerby

57

It all comes from the undecidability of the halting problem. Suppose we have a "perfect" dead code function, some Turing Machine M, some input string x, and a procedure that looks something like this:

Run M on input x;
print "Finished running input";

If M runs forever, then we delete the print statement, since we will never reach it. If M doesn't run forever, then we need to keep the print statement. Thus, if we had a perfect dead-code remover, it would also let us solve the halting problem, so we know there can be no such dead-code remover.

The way we get around this is by "conservative approximation". So, in my Turing Machine example above, we can assume that running M on x might finish, so we play it safe and do not remove the print statement. In your example, we know that regardless of which functions do or don't halt, there is no way we will reach that print statement.

Usually, this is done by constructing a "control-flow graph". We make simplifying assumptions, such as "the end of a while loop is connected to the beginning and to the statement after", even if it runs forever or runs only once and doesn't visit both. Similarly, we assume that an if-statement can reach all of its branches, even if in reality some are never used. These kinds of simplifications allow us to remove "obviously dead code" like the example you give, while remaining decidable.
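
As a rough illustration of that conservative approach, here is a minimal Java sketch of reachability over a control-flow graph. The class name, the node labels, and the hand-built graphs are all made up for this example; a real compiler would derive the graph from the syntax tree.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ControlFlowReachability {
    // Returns every node reachable from `entry` by following edges.
    // Nodes of the graph that are missing from the result are "obviously dead".
    static Set<String> reachable(Map<String, List<String>> edges, String entry) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (seen.add(node)) {
                for (String next : edges.getOrDefault(node, List.of())) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Graph for the question's program: `return` has no outgoing edge,
        // so the println node is never reached.
        Map<String, List<String>> returnThenPrint = Map.of(
            "entry", List.of("return"),
            "return", List.of(),
            "println", List.of("exit"));
        System.out.println(reachable(returnThenPrint, "entry").contains("println")); // false

        // Graph for `while (cond) body; after`: the loop head is assumed to reach
        // both the body and the statement after the loop, whatever cond does.
        Map<String, List<String>> loop = Map.of(
            "entry", List.of("while"),
            "while", List.of("body", "after"),
            "body", List.of("while"),
            "after", List.of());
        System.out.println(reachable(loop, "entry").contains("after")); // true
    }
}
```

The second graph shows the price of staying decidable: the statement after the loop is reported as reachable even if the loop condition happens to be true forever.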

To clear up some confusion from the comments:

Nitpick: for fixed M, this is always decidable. M has to be the input.

As Raphael says, in my example we treat the Turing Machine as an input. The idea is that, if we had a perfect DCE algorithm, we could construct the code snippet I give for any Turing Machine, and running the DCE on it would solve the halting problem.

not convinced. return as a blunt statement in a no-branch straight-forward execution is not hard to decide. (and my compiler tells me it is capable of figuring this out)

As for the issue njzk2 raises: you are absolutely right, in this case you can determine that there is no way for a statement after the return to be reached. This is because it is simple enough that we can describe its unreachability using control-flow graph constraints (i.e. there are no outgoing edges from the return statement). But there is no perfect dead code eliminator, one which eliminates all unused code.

I do not take input-dependent proof as a proof. If there exists some kind of user input that can allow the code to be finite, it is correct for the compiler to assume that the following branch is not dead. I can't see what all these upvotes are for, it's both obvious (e.g. endless stdin) and wrong.

For TomášZato: it's not really an input-dependent proof. Rather, interpret it as a "forall". It works as follows: assume we have a perfect DCE algorithm. If you give me an arbitrary Turing Machine M and input x, I can use my DCE algorithm to determine whether M halts, by constructing the code snippet above and seeing whether the print statement is removed. This technique, of leaving a parameter arbitrary in order to prove a forall statement, is common in math and logic.
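
The shape of that reduction can be sketched in Java. Everything here is hypothetical: `PerfectDeadCodeDetector` stands for the analyzer that is assumed to exist for the sake of contradiction, programs are passed around as plain strings, and the `toy` detector in `main` only recognises two invented machine names so that the sketch runs.

```java
public class HaltingViaDeadCode {
    // Hypothetical perfect analyzer: reports whether the LAST statement of
    // `program` can never execute. The argument shows no such analyzer can exist.
    interface PerfectDeadCodeDetector {
        boolean lastStatementIsDead(String program);
    }

    // Builds the two-line procedure from the answer for a given machine and input.
    static String buildProgram(String machine, String input) {
        return "run " + machine + " on input " + input + ";\n"
             + "print \"Finished running input\";";
    }

    // The reduction: M halts on x exactly when the print statement is live.
    static boolean halts(PerfectDeadCodeDetector dce, String machine, String input) {
        return !dce.lastStatementIsDead(buildProgram(machine, input));
    }

    public static void main(String[] args) {
        // Stand-in detector that only understands two toy machine names.
        PerfectDeadCodeDetector toy = program -> program.startsWith("run LOOP_FOREVER ");
        System.out.println(halts(toy, "HALT_AT_ONCE", "x")); // true
        System.out.println(halts(toy, "LOOP_FOREVER", "x")); // false
    }
}
```

Note that `halts` does no work of its own beyond building a program and asking the detector one question, so a detector that was correct on every program would make `halts` a decider for the halting problem.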

I don't fully understand TomášZato's point about code being finite. Surely the code is finite, but a perfect DCE algorithm must apply to all code, which is an infinite set. Likewise, while the code itself is finite, the potential sets of input are infinite, as is the potential running time of the code.

As for considering the final branch not dead: it is safe in terms of the "conservative approximation" I talk about, but it is not enough to detect all instances of dead code as the OP asks for.
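
To make the gap concrete, here is a small Java sketch of a program that javac accepts even though one branch can never run. The class and method names are invented for the example; the point is only that the compiler's reachability rules treat the body of an if-statement as reachable without evaluating the method call in its condition.

```java
public class UndetectedDeadCode {
    // Always returns false, but javac does not evaluate method calls
    // when deciding which statements are reachable.
    static boolean squareIsNegative(int n) {
        return (long) n * n < 0;
    }

    static String run() {
        StringBuilder out = new StringBuilder();
        if (squareIsNegative(4)) {
            out.append("dead "); // never executes, yet the program compiles
        }
        out.append("live");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // live
    }
}
```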

Consider code like this:

while (true)
  print "Hello"
print "goodbye"

Clearly we can remove print "goodbye" without changing the behaviour of the program. Thus, it is dead code. But if there is a different function call instead of (true) in the while condition, then we don't know whether we can remove it or not, leading to the undecidability.

Note that I am not coming up with this on my own. It is a well-known result in the theory of compilers. It's discussed in The Tiger Book. (You might be able to see where they talk about it in Google Books.)

— jmite

1

@njzk2: We're trying to show it's impossible to build a dead code eliminator that eliminates all dead code, not that it's impossible to build a dead code eliminator that eliminates some dead code. The print-after-return example can be eliminated easily using control-flow graph techniques, but not all dead code can be eliminated this way.

— user2357112 supports Monica

4

This answer references comments. As I read the answer, I need to jump down into the comments, then return to the answer. This is confusing (doubly so when you consider that comments are fragile and may be lost). A self-contained answer would be far easier to read.

— TRiG

1

@TomášZato - consider the program that increments a variable $n$ and checks whether or not $n$ is an odd perfect number, terminating only when it finds such a number. Clearly this program does not depend on any external input. Are you asserting that it can easily be determined whether or not this program terminates?

— Gregory J. Puleo

3

@TomášZato You are mistaken in your understanding of the halting problem. Given a finite Turing Machine $M$, and finite input $x$, it's impossible to determine whether $M$ infinitely loops while running on $x$. I haven't proven this rigorously because it has been proved over and over again, and is a fundamental principle of computer science. There's a nice sketch of the proof on Wikipedia.

— jmite

1

jmite, please incorporate valid comments into the answer so that the answer stands on its own. Then flag all comments that are obsolete as such so we can clean up. Thanks!

— Raphael

14

This is a twist on jmite's answer that circumvents the potential confusion about non-termination. I'll give a program that always halts and may have dead code, but for which we cannot (always) algorithmically decide whether it does.

Consider the following class of inputs for the dead-code identifier:

simulateMx(n) {
  simulate TM M on input x for n steps
  if M did halt
    return 0
  else
    return 1
}

Since M and x are fixed, simulateMx has dead code with return 0 if and only if M does not halt on x.

This immediately gives us a reduction from the halting problem to dead-code checking: given TM $M$ as halting-problem instance, create above program with x the code of $M$ -- it has dead code if and only if $M$ does not halt on its own code.

Hence, dead-code checking is not computable.
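
For readers who prefer something runnable, here is a minimal Java sketch of the bounded simulation. It does not model real Turing machines: a "machine" is assumed to be just a step function on a `long` state, and `HALTED` is an invented marker value for the halting state.

```java
import java.util.function.LongUnaryOperator;

public class BoundedSimulation {
    static final long HALTED = -1; // assumed marker for "the machine has halted"

    // Runs the machine for at most n steps from `start`; always terminates.
    // Returns 0 if the machine halted within n steps, otherwise 1.
    static int simulate(LongUnaryOperator step, long start, int n) {
        long state = start;
        for (int i = 0; i < n && state != HALTED; i++) {
            state = step.applyAsLong(state);
        }
        if (state == HALTED) {
            return 0; // dead for every n exactly when the machine never halts
        } else {
            return 1;
        }
    }

    public static void main(String[] args) {
        LongUnaryOperator countdown = s -> s == 0 ? HALTED : s - 1; // halts
        LongUnaryOperator spin = s -> s;                            // never halts
        System.out.println(simulate(countdown, 5, 3));    // 1
        System.out.println(simulate(countdown, 5, 10));   // 0
        System.out.println(simulate(spin, 5, 1_000_000)); // 1
    }
}
```

Each call finishes after at most n steps, so non-termination of the analysed program plays no role; the only open question is whether the `return 0` line is ever taken for some n.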

In case you are unfamiliar with reduction as a proof technique in this context, I recommend our reference material.

— Raphael

5

A simple way to demonstrate this kind of property without getting bogged down in details is to use the following lemma:

Lemma: For any compiler C for a Turing-complete language, there exists a function undecidable_but_true() which takes no arguments and returns the boolean true, such that C cannot predict whether undecidable_but_true() returns true or false.

Note that the function depends on the compiler. Given a function undecidable_but_true1(), a compiler can always be augmented with the knowledge of whether this function returns true or false; but there is always some other function undecidable_but_true2() that won't be covered.

Proof: by Rice's theorem, the property “this function returns true” is undecidable. Therefore any static analysis algorithm is unable to decide this property for all possible functions.

Corollary: Given a compiler C, the following program contains dead code which cannot be detected:

if (!undecidable_but_true()) {
    do_stuff();
}

A note about Java: the Java language mandates that compilers reject certain programs that contain unreachable code, while sensibly mandating that code is provided at all reachable points (e.g. control flow in a non-void function must end with a return statement). The language specifies exactly how the unreachable code analysis is performed; if it didn't then it would be impossible to write portable programs. Given a program of the form

some_method () {
    <code whose continuation is unreachable>
    // is throw InternalError() needed here?
}

it is necessary to specify in which cases the unreachable code must be followed by some other code and in which cases it must not be followed by any code. An example of a Java program that contains code that is unreachable, but not in a way that Java compilers are allowed to notice, comes up in Java 101:

String day_of_week(int n) {
    switch (n % 7) {
    case 0: return "Sunday";
    case 1: case -6: return "Monday";
    …
    case 6: case -1: return "Saturday";
    }
    // return or throw is required here, even though this point is unreachable
}

— Gilles 'SO- stop being evil'

Note that some compilers for some languages may be able to detect that the end of day_of_week is unreachable.

— user253751

@immibis Yes, for example CS101 students can do it in my experience (though admittedly CS101 students aren't a sound static analyzer, they usually forget about the negative cases). That's part of my point: it's an example of a program with unreachable code that a Java compiler will not detect (at least, may warn about, but may not reject).

— Gilles 'SO- stop being evil'

1

I'm afraid the phrasing of the Lemma is misleading at best, with a tint of wrongness to it. Undecidability only makes sense if you phrase it in terms of (infinite) sets of instances. (The compiler does produce an answer for every function, and we know that it can not be always correct, but saying that there's a single undecidable instance is off.) Your paragraph between the Lemma and the Proof (which does not quite match the Lemma as stated) tries to fix this, but I think it would be better to formulate a clearly correct lemma.

— Raphael

@Raphael Uh? No, the compiler need not produce an answer to the question “is this function constant?”. It doesn't need to distinguish “I don't know” from “no” to produce working code, but that's not relevant here since we're only interested in the static analysis part of the compiler, not in the code translation part. I don't understand what you find misleading or incorrect about the statement of the lemma — unless your point is that I should write “static analyzer” instead of “compiler”?

— Gilles 'SO- stop being evil'

The statement sounds like "undecidability means that there is an instance that can not be solved", which is wrong. (I know you don't mean to say that, but that's how it can read to the unwary/novices, imho.)

— Raphael

3

jmite's answer applies to whether the program will ever exit a calculation--just because it's infinite I wouldn't call the code after it dead.

However, there's another approach: A problem for which there is an answer but it's unknown:

public void Demo()
{
  if (Chess.Evaluate(new Chessboard(), int.MaxValue) != 0)
    MessageBox.Show("Chess is unfair!");
  else
    MessageBox.Show("Chess is fair!");
}

public class Chess
{
  // static so that the call Chess.Evaluate(...) in Demo() resolves
  public static Int64 Evaluate(Chessboard Board, int SearchDepth)
  {
  ...
  }
}

This routine without a doubt does contain dead code--the function will return an answer that executes one path but not the other. Good luck finding it, though! My memory is no theoretical computer can solve this within the lifespan of the universe.

In more detail:

The Evaluate() function computes which side wins a chess game if both sides play perfectly (with maximum search depth).

Chess evaluators normally look ahead at every possible move some specified depth and then attempt to score the board at that point (sometimes expanding certain branches farther as looking halfway through an exchange or the like can produce a very skewed perception.) Since the actual maximum depth is 17695 half-moves the search is exhaustive, it will traverse every possible chess game. Since all the games end there's no issue of trying to decide how good a position each board is (and thus no reason to look at the board evaluation logic--it will never be called), the result is either a win, a loss or a draw. If the result is a draw the game is fair, if the result is not a draw it's an unfair game. To expand it a bit we get:

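public static Int64 Evaluate(Chessboard Board, int SearchDepth)
{
  // Keep the best score over all moves instead of returning after the first one.
  Int64 Best = -int.MaxValue;
  foreach (ChessMove Move in Board.GetPossibleMoves())
    {
      Chessboard NewBoard = Board.MakeMove(Move);
      Int64 Value;
      if (NewBoard.Checkmate()) Value = int.MaxValue;
      else if (NewBoard.Draw()) Value = 0;
      else if (SearchDepth == 0) Value = NewBoard.Score();
      else Value = -Evaluate(NewBoard, SearchDepth - 1);
      if (Value > Best) Best = Value;
    }
  return Best;
}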

Note, also, that it will be virtually impossible for the compiler to realize that Chessboard.Score() is dead code. A knowledge of the rules of chess allows us humans to figure this out but to figure this out you have to know that MakeMove can never increase the piece count and that Chessboard.Draw() will return true if the piece count remains static for too long.

Note that the search depth is in half-moves, not whole moves. This is normal for this sort of AI routine as it's an O(x^n) routine--adding one more search ply has a major effect on how long it takes to run.

— Loren Pechtel

8

You assume that a checking algorithm would have to perform the calculation. A common fallacy! No, you don't get to assume anything about how a checker would work, otherwise you can not refute its existence.

— Raphael

6

The question requests a proof that it is impossible to detect dead code. Your post contains an example of a case where you suspect it would be difficult to detect dead code. That isn't an answer to the question at hand.

— David Richerby

2

@LorenPechtel I don't know, but that's not a proof. See also here; a cleaner example of your misconception.

— Raphael

3

If it helps, consider that there's nothing theoretically stopping someone from running their compiler for more than the lifetime of the universe; the only limitation is practicality. A decidable problem is a decidable problem, even if it's in the complexity class NONELEMENTARY.

— Pseudonym

4

In other words, this answer is at best a heuristic intended to show why it's probably not easy to build a compiler that detects all dead code -- but it's not a proof of impossibility. This kind of example might be useful as a way to build intuition for students, but it is not a proof. By presenting itself as a proof, it does a disservice. The answer should be edited to state that it is an intuition-building example but not a proof of impossibility.

— D.W.

-3

I think in a computing course, the notion of dead code is interesting in the context of understanding the difference between compile time and run time!

A compiler can determine when you've got code that can in no compile-time scenario ever be traversed, but it cannot do so for runtime. A simple while-loop with user input for the loop-break test shows that.

If a compiler could actually determine runtime dead code (i.e. discern Turing complete) then there's an argument that the code never needs be run, because the job's already done!

If nothing else, the existence of code that passes compile-time dead code checks illustrates the need for pragmatic bounds-checking on inputs and general coding hygiene (in the real world of real projects.)

— dwoz

1

The question asks for a proof that it is impossible to detect dead code. You have not answered that question.

— David Richerby

Also, your assertion that "A compiler can determine when you've got code that can in no compile-time scenario ever be traversed" is incorrect and directly contradicts what the question asks you to prove.

— David Richerby

@David Richerby, I think you may be misreading me. I'm not suggesting that compile-time checking can find ALL dead code, quite definitely not. I'm suggesting that there is a subset of the set of all dead code that is discernable at compile time. If I write: if(true==false){ print("something");}, that print statement will be discernable at compile time to be dead code. Do you disagree that this is a counterexample to your assertion?

— dwoz

Sure, you can determine some dead code. But if you're going to say "determine when [you have dead code]" with no qualifications then that, to me, means find all the dead code, not just some of it.

— David Richerby