Medical AI applications invariably involve gathering, integrating, and processing large, typically multimodal, databases through complex methods. Their efficient execution is key to their evolution: more computing power lets them use increasingly complex methods and models that, along with larger datasets, tend to improve the quality of their results. In this context, computing systems are in charge of meeting the urgency of each application executed, hiding from the user (patient, doctor, or manager) the intrinsic difficulties that arise from the diverse set of AI applications used in the medical domain. The main challenge of this track is therefore: “How to transparently provide the computing resources required to meet the urgency and scale of medical AI applications?”
This is a challenging question from the systems perspective, including hardware and computer networks, because it involves complex decisions about “where should computing take place (in which data center, using which type of processing element: CPU, GPU, FPGA) in order to meet the application requirements?” The three main aspects that affect this decision are related to the data gravity concept: (1) the input data size, (2) the computational intensity, and (3) the urgency of the results, which is crucial in medical applications. From the systems perspective, application requirements must be met regardless of application characteristics, and these requirements vary widely. For instance, in anamnesis (with fusion of laboratory tests, wearables, and medical records) there are applications with a typical urgency on the order of minutes, small input data (KB-MB), and medium to low computational intensity. In tele-surgery, on the other hand, the urgency is on the order of milliseconds, the data is larger (MB-GB), and the computational intensity of the methods is higher. Dealing with this diversity of applications transparently in complex distributed computing systems is very challenging. Following the data gravity concept, larger data tends to attract computation closer to it, while the computational intensity of the models may attract the data to specific processors. Urgency, in turn, tightens or relaxes these relationships by changing computational and input/output demands.
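The interplay of input size, computational intensity, and urgency can be illustrated with a minimal sketch. It is not the placement algorithm of this project: the `Site` and `Task` types, the cost model (network latency plus transfer time plus compute time), and every number below are hypothetical values chosen only to show how the three aspects pull a task toward different locations.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    """A candidate execution location (all figures are illustrative assumptions)."""
    name: str
    bandwidth_mb_s: float     # bandwidth from the data source to this site
    latency_s: float          # network latency to reach this site
    throughput_gflop_s: float # sustained compute rate of the site


@dataclass(frozen=True)
class Task:
    """An AI task described by the three data gravity aspects."""
    name: str
    input_mb: float    # (1) input data size
    work_gflop: float  # (2) computational intensity
    deadline_s: float  # (3) urgency of the result


def completion_time(task: Task, site: Site) -> float:
    """Estimated time to move the input to the site and run the task there."""
    transfer = site.latency_s + task.input_mb / site.bandwidth_mb_s
    compute = task.work_gflop / site.throughput_gflop_s
    return transfer + compute


def choose_site(task: Task, sites: Sequence[Site]) -> Optional[Site]:
    """Return the fastest site that meets the deadline, or None if none does."""
    timed = [(completion_time(task, s), s) for s in sites]
    feasible = [(t, s) for t, s in timed if t <= task.deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda pair: pair[0])[1]


# Hypothetical sites: data already resides on the wearable (no transfer cost).
sites = [
    Site("wearable", bandwidth_mb_s=float("inf"), latency_s=0.0, throughput_gflop_s=1.0),
    Site("edge", bandwidth_mb_s=100.0, latency_s=0.002, throughput_gflop_s=500.0),
    Site("cloud", bandwidth_mb_s=20.0, latency_s=0.05, throughput_gflop_s=20000.0),
]

# Hypothetical tasks loosely following the two examples in the text.
anamnesis = Task("anamnesis", input_mb=2.0, work_gflop=600.0, deadline_s=120.0)
surgery = Task("tele-surgery frame", input_mb=5.0, work_gflop=10.0, deadline_s=0.1)

for task in (anamnesis, surgery):
    site = choose_site(task, sites)
    print(task.name, "->", site.name if site else "no feasible site")
```

Under these assumed numbers the relaxed deadline lets the anamnesis task travel to the most powerful site, while the millisecond-scale deadline of the tele-surgery task rules out the distant data center and keeps the computation near the data.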
The choice of where to execute AI methods and process medical data, as well as how to move data and/or code, is affected by the architecture of the processing elements and by how they communicate. This motivates the development of new processors or domain-specific devices, and several of the architectural decisions involved are open problems. Most current AI tools focus on local optimizations of particular algorithms or computing patterns, whereas this project will approach efficient execution from a global perspective.
As the final output or product, we will develop a heterogeneous platform, composed of embedded devices and cloud and local servers, that adapts to the urgency, data volume, and processing characteristics of health care applications and decides where to perform each AI task efficiently. To achieve this result, this project will: (1) implement or extend a workload manager for distributed computing environments (e.g., SLURM) that handles the concurrent execution of applications while being aware of their urgency; (2) implement algorithms for determining the computing location at both a macro level (wearable or data center, and which data center [127]) and a micro level (within the data center or device); (3) propose and implement scheduling strategies for hybrid environments equipped with CPUs, GPUs, FPGAs, etc. (in previous work [128] we demonstrated the potential of such strategies for maximizing processing throughput; here, however, the applications’ urgency demands new global approaches capable of dealing with deadlines, data locality, and new forms of computational heterogeneity); (4) propose and develop new processing and I/O architectures (neuromorphic computing, non-von Neumann architectures, and new transistor technologies, e.g., QCA [129]) to accelerate the execution of relevant processing patterns that are not performed efficiently by off-the-shelf processors; (5) design protocols for reliable, low-latency, and high-bandwidth transport, and propose new standards and architectures for communication networks [130], new I/O devices [131], and efficient algorithms for the self-management of communication networks using AI techniques [132]; and (6) develop new data acquisition sensors with precision and sampling frequency appropriate to specific problems and with a low degree of invasiveness.
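To make the idea of urgency-aware scheduling on hybrid hardware more concrete, the sketch below orders jobs by deadline (earliest deadline first) and greedily assigns each one to the device on which it would finish soonest. This is a textbook-style heuristic used here purely for illustration, not the strategy this project will propose, and it does not use the SLURM API. The `Job` type, the per-device runtime estimates, and the device names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    """A job with a deadline and an estimated runtime per device kind (assumed known)."""
    name: str
    deadline_s: float
    runtime_s: Dict[str, float]  # device kind ("cpu", "gpu", ...) -> seconds


# One plan entry: (job name, device id, start time, finish time, deadline met?)
PlanEntry = Tuple[str, Optional[str], Optional[float], Optional[float], bool]


def schedule(jobs: List[Job], devices: List[Tuple[str, str]]) -> List[PlanEntry]:
    """Earliest-deadline-first ordering with greedy earliest-finish device choice.

    `devices` is a list of (device_id, device_kind) pairs.
    """
    kind_of = dict(devices)
    free_at = {dev_id: 0.0 for dev_id in kind_of}  # when each device becomes idle
    plan: List[PlanEntry] = []
    for job in sorted(jobs, key=lambda j: j.deadline_s):  # most urgent first
        candidates = [
            (free_at[dev] + job.runtime_s[kind_of[dev]], dev)
            for dev in free_at
            if kind_of[dev] in job.runtime_s  # skip devices that cannot run the job
        ]
        if not candidates:
            plan.append((job.name, None, None, None, False))
            continue
        finish, dev = min(candidates)  # device with the earliest finish time
        start = free_at[dev]
        free_at[dev] = finish
        plan.append((job.name, dev, start, finish, finish <= job.deadline_s))
    return plan


# Hypothetical hybrid node and workload.
devices = [("cpu0", "cpu"), ("gpu0", "gpu")]
jobs = [
    Job("records_fusion", deadline_s=60.0, runtime_s={"cpu": 5.0, "gpu": 6.0}),
    Job("surgery_frame", deadline_s=0.05, runtime_s={"cpu": 0.2, "gpu": 0.02}),
    Job("ecg_triage", deadline_s=1.0, runtime_s={"cpu": 0.3, "gpu": 0.1}),
]

plan = schedule(jobs, devices)
for name, dev, start, finish, ok in plan:
    print(f"{name}: device={dev} start={start} finish={finish} deadline_met={ok}")
```

A throughput-oriented scheduler would have no reason to run the short, urgent job first. Ordering by deadline is the simplest way to express urgency, although it ignores data locality and transfer costs, which the global approaches discussed above would also need to account for.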
127. Neto JLD, Yu S, Macedo DF, Nogueira JMS, Langar R, Secci S. ULOOF: A User Level Online Offloading Framework for Mobile Edge Computing. IEEE Trans Mob Comput. 2018 Nov;17(11):2660–74.
128. Teodoro G, Hartley TDR, Catalyurek U. Run-time optimizations for replicated dataflows on heterogeneous environments. Proceedings of the 19th [Internet]. 2010; Available from: https://dl.acm.org/doi/abs/10.1145/1851476.1851479
129. Cesar TF, Vieira LFM, Vieira MAM, Neto OPV. Cellular automata-based byte error correction in QCA. Nano Commun Netw. 2020 Feb 1;23:100278.
130. Matheus L, Pires L, Vieira A, Vieira LFM, Vieira MAM, Nacif JA. The internet of light: Impact of colors in LED-to-LED visible light communication systems [Internet]. Vol. 2, Internet Technology Letters. 2019. p. e78. Available from: http://dx.doi.org/10.1002/itl2.78
131. Boito FZ, Inacio EC, Bez JL, Navaux POA. A checkpoint of research on parallel I/O for high-performance computing. ACM Computing [Internet]. 2018; Available from: https://dl.acm.org/doi/abs/10.1145/3152891
132. Moura HD, Macedo DF, Vieira MAM. Wireless control using reinforcement learning for practical web QoE. Comput Commun. 2020 Mar 15;154:331–46.