Huasong Shan
Staff Scientist
JD.com American Technologies Corporation
Mountain View, CA, USA
E-Mail: monadynshy AT gmail DOT com

Research Projects
  • Applications of AI in Infrastructure & Operation (AIOps): Gartner defines AIOps as "the application of machine learning (ML) and data science to IT operations problems. AIOps platforms combine big data and ML functionality to enhance and partially replace all primary IT operations functions, including availability and performance monitoring, event correlation and analysis, and IT service management and automation". Research directions in AIInfra and AIOps include anomaly detection and root-cause diagnosis (e.g., ϵ-Diagnosis, eDiagnosis), scalable tracing (e.g., JCallGraph), scalable services (e.g., ContainerDNS, Code of ContainerDNS), real-time tracing (e.g., IoTTracing), trend prediction, adaptive alerting, autoscaling, and resource scheduling (e.g., ClusterScheduling-Patent, ContainerResourceAllocation-Patent, ScientificGateway, SIMULOCEAN).

  • Cloud Security: Performance interference in cloud platforms, stemming from resource contention among co-located VMs, remains an open problem. We investigate these performance vulnerabilities inside the cloud (e.g., Tail Amplification, my thesis) to protect the fast responsiveness of cloud services. We were the first to identify the "Tail Amplification" phenomenon, which results from transient cross-resource contention inside the cloud.

  • System Security: Large-scale systems typically adopt a distributed architecture, such as n-tier systems for Web applications. Because of complex dependencies among distributed nodes, such systems can suffer performance problems that are hard to trace. I dig into these performance vulnerabilities (e.g., VSI-DDoS Attacks and Tail Attacks) and investigate solutions for fast responsiveness in Web applications (e.g., AsynchronousServer). We were the first to propose "Tail Attacks", which exploit the complex resource dependencies among distributed nodes and the transient bottleneck resources of the target web system. These attacks can cause significant long-tail latency in web applications while presenting an "Unsaturated illusion" to state-of-the-art IDS/IPS tools, which makes them highly stealthy.

Research Impacts