Fix a latent query-of-death in a GFE+QUIC+Leto codepath This CL addresses a long-standing, rare, and mysterious GFE production crash, first observed during the initial Leto-for-QUIC rollout in 2019 and undiagnosed until now. The crash occurs in the following situation: - Client sends an initial inchoate CHLO - GFE responds with a REJ containing a ServerConfig. - A client sends a CHLO containing a PUBS (public value) field which is incorrectly-sized (or possibly corrupt in some other way, but the size issue is easiest to understand). - GFE forwards the PUBS value to Leto to complete the DH exchange, but Leto returns an error since the value is corrupt. - GFE is configured in a temporary mode where it holds copies of Leto's private keys, to provide transparent fallback if Leto goes down. GFE tries to complete the DH exchange with its local copy of the key, which fails for the same reason as above. At this point GFE *should* give up and reject the handshake. Instead, it incorrectly goes down a codepath which was intended for the situation where GFE *does not* hold the ServerConfig private keys. In that case, each GFE generates a local (i.e. not shared with other GFEs) "fallback" ServerConfig and private key, so that if Leto cannot be reached, the GFE can send a REJ containing this "fallback" ServerConfig, and allow the handshake to proceed without Leto. GFE only creates a "fallback" ServerConfig if it intends to use Leto exclusively, without holding the private keys for all of its "normal" ServerConfigs. So, in the situation described above, GFE has not created any "fallback" ServerConfig at all, yet the fallback codepaths assume that it has, and segfaults. The fix is simple - check whether a fallback ServerConfig exists before trying to use it. Testing the fix is ugly. I found no existing test tools for generating malformed or otherwise adversarial QUIC messages. So, after consulting with fayang@, I went with a testvalue-driven approach - the test client generates a kosher CHLO, and I use a testvalue callback to corrupt its PUBS field inside the server. This requires an invasive change to core QUIC code, as well as undesirably adding a subset of the testvalue API to quic/platform. My intention is that this method would just be stubbed out and useless on the Chromium side. I would welcome feedback on whether this is a workable approach, or what alternatives I should use instead. Protected by Default true --quic_reloadable_flag_quic_check_fallback_null. PiperOrigin-RevId: 333740049 Change-Id: I2abfa3b5f45ea69b1f252e79b14e9237656f4d99
diff --git a/quic/core/crypto/quic_crypto_server_config.cc b/quic/core/crypto/quic_crypto_server_config.cc index 5835e94..ec535cf 100644 --- a/quic/core/crypto/quic_crypto_server_config.cc +++ b/quic/core/crypto/quic_crypto_server_config.cc
@@ -47,6 +47,7 @@ #include "net/third_party/quiche/src/quic/platform/api/quic_logging.h" #include "net/third_party/quiche/src/quic/platform/api/quic_reference_counted.h" #include "net/third_party/quiche/src/quic/platform/api/quic_socket_address.h" +#include "net/third_party/quiche/src/quic/platform/api/quic_testvalue.h" #include "net/third_party/quiche/src/common/platform/api/quiche_string_piece.h" #include "net/third_party/quiche/src/common/platform/api/quiche_text_utils.h" @@ -797,6 +798,11 @@ return; } + // Allow testing a specific adversarial case in which a client sends a public + // value of incorrect size. + AdjustTestValue("quic::QuicCryptoServerConfig::public_value_adjust", + &public_value); + const AsynchronousKeyExchange* key_exchange = configs.requested->key_exchanges[key_exchange_index].get(); std::string* initial_premaster_secret = @@ -824,9 +830,11 @@ << QuicVersionToString(context->transport_version()); if (found_error) { - // If we are already using the fallback config, just bail out of the - // handshake. - if (context->signed_config()->config == configs.fallback || + // If we are already using the fallback config, or there is no fallback + // config to use, just bail out of the handshake. + if ((GetQuicReloadableFlag(quic_check_fallback_null) && + configs.fallback == nullptr) || + context->signed_config()->config == configs.fallback || !GetQuicReloadableFlag( send_quic_fallback_server_config_on_leto_error)) { context->Fail(QUIC_INVALID_CRYPTO_MESSAGE_PARAMETER,
diff --git a/quic/platform/api/quic_testvalue.h b/quic/platform/api/quic_testvalue.h new file mode 100644 index 0000000..0df0c40 --- /dev/null +++ b/quic/platform/api/quic_testvalue.h
@@ -0,0 +1,25 @@ +// Copyright 2020 The Chromium Authors. All rights reserved. +// Use of this source code is governed by a BSD-style license that can be +// found in the LICENSE file. + +#ifndef QUICHE_QUIC_PLATFORM_API_QUIC_TESTVALUE_H_ +#define QUICHE_QUIC_PLATFORM_API_QUIC_TESTVALUE_H_ + +#include "net/quic/platform/impl/quic_testvalue_impl.h" +#include "net/third_party/quiche/src/common/platform/api/quiche_string_piece.h" + +namespace quic { + +// Interface allowing injection of test-specific code in production codepaths. +// |label| is an arbitrary value identifying the location, and |var| is a +// pointer to the value to be modified. +// +// Note that this method does nothing in Chromium. +template <class T> +void AdjustTestValue(quiche::QuicheStringPiece label, T* var) { + AdjustTestValueImpl(label, var); +} + +} // namespace quic + +#endif // QUICHE_QUIC_PLATFORM_API_QUIC_TESTVALUE_H_