Replica-exchange (RE) methods are a powerful class of algorithms for enhanced configurational and energetic sampling in a range of physical systems. Computationally, they are applications with multiple scales of communication. At a fine-grained level, communication often occurs within a replica, which is typically an MPI application. At a coarser-grained level, both in frequency and in the amount of data exchanged, replicas communicate with other replicas. In this paper, we outline a novel framework that supports the large-scale, flexible execution of many replicas. The framework is flexible in two senses: it supports different coupling schemes between replicas, and it is agnostic to the underlying simulation, whether classical or quantum, single-core or parallel. To measure the scalability of the framework, we report the number of nanoseconds simulated per day as a function of the number of replicas. Although communication and coordination requirements grow with the number of replicas, our framework supports the execution of a thousand replicas without significant overhead. Furthermore, as an indication of the framework's efficiency, the number of nanoseconds simulated in twenty-four hours remains essentially constant as the number of replicas increases. Although several specific aspects will benefit from further optimization, this first working prototype has the ability to fundamentally change the scale of replica-exchange simulations possible on production distributed cyberinfrastructure such as XSEDE, and to support novel usage modes on such infrastructure. This paper also represents the release of the framework to the broader biophysical simulation community and provides details on its usage.
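The abstract does not state which exchange criterion the framework uses. To illustrate the coarse-grained coupling step between replicas, the sketch below implements one common scheme, temperature replica exchange with the standard Metropolis acceptance rule min(1, exp[(β_i − β_j)(E_i − E_j)]), applied to neighboring temperature pairs. It is a minimal, framework-independent illustration. The function names `exchange_probability` and `attempt_neighbor_exchanges` are hypothetical and are not part of the released framework's API.

```python
import math
import random


def exchange_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for swapping the configurations held
    at inverse temperatures beta_i and beta_j (standard temperature RE rule)."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0:
        return 1.0  # exp(delta) >= 1, so the swap is always accepted
    return math.exp(delta)


def attempt_neighbor_exchanges(betas, energies, offset, rng=random):
    """Attempt swaps between disjoint neighboring pairs (k, k+1), starting at
    `offset` (0 or 1, alternated between exchange rounds in practice).

    betas[k] is the inverse temperature of slot k; energies[c] is the potential
    energy of configuration c, which initially sits in slot c.
    Returns a list whose entry k is the configuration now assigned to slot k.
    """
    assignment = list(range(len(betas)))
    for k in range(offset, len(betas) - 1, 2):
        i, j = assignment[k], assignment[k + 1]
        p = exchange_probability(betas[k], betas[k + 1], energies[i], energies[j])
        if rng.random() < p:
            assignment[k], assignment[k + 1] = j, i  # accept: swap configurations
    return assignment


if __name__ == "__main__":
    betas = [1.0, 0.9, 0.8, 0.7]          # hypothetical temperature ladder
    energies = [-120.0, -118.5, -117.0, -115.2]
    print(attempt_neighbor_exchanges(betas, energies, offset=0))
```

Because the pairs in each round are disjoint, every exchange in a round depends only on two replicas' energies. This is one reason the replica-to-replica coupling is coarse-grained relative to the MPI communication inside each replica.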