


We have built our vision and strategic plan on extensive consultation with colleagues, students, alumni, and regional and national stakeholders. This approach has ensured a shared creation of ideas and allowed us to test the resilience of the priorities that we have identified. It has also enabled us to explore and refine how to deliver them.

The strategic plan reinforces what already makes The University of Manchester distinctive: our excellence, openness and inclusivity, our longstanding commitment to social responsibility, our scale and breadth, our tradition of innovation, and our very close bonds with, and location at the heart of, Manchester. It points to a future where we will expand our world-leading research to address the most challenging global questions; exploit our capability for interdisciplinary research; transform the way our students learn to make them the most employable graduates and truly global citizens; and ensure that all our activities make a positive difference to society. It builds on our strengths while taking the University in new directions. The foundation of this vision and strategic plan remains our three core goals of research and discovery, teaching and learning, and social responsibility, which are encapsulated in our motto: knowledge, wisdom and humanity. Our success will be evidenced through independent measures of our core activities. This is a vision and strategic plan of substance, supported by detailed delivery plans.

# Visual Understanding and Interaction with Language and Environment #

The visual world around us is highly structured. As 2D projections of our world, images are also structured: there are usually a background and some foreground objects (e.g., kites and birds in the sky, sheep and cows on the grass). Moreover, objects usually interact with each other in predictable ways (e.g., mugs are on tables, keyboards are below computer monitors, the sky is in the background). This structure in our world manifests itself in the visual data that captures the world around us.

In this talk, I will discuss how to leverage this structure for visual understanding and interaction with language and environment. Specifically, I will present: 1) how to learn to prune a dense graph and perform relational modeling for scene graph generation; 2) how to leverage structure in images for more grounded caption generation and question generation, to actively acquire more information from humans; and 3) how to learn a moving strategy for an embodied visual system in 3D environments to achieve better visual perception through actions. Finally, I will briefly talk about my ongoing and future work, which is aimed at connecting vision, language, and environment towards better visual understanding and interaction.

Bio: The speaker is a Ph.D. candidate in the College of Computing at Georgia Tech. His main research is about computer vision and its combination with language and embodiment. Specifically, his work focuses on how to extract structure from visual data and then how to leverage interactions with humans and the environment to further improve the visual system. He has also worked on leveraging structural understanding of images for other tasks, such as visual question answering and visual question generation. In the past, he has interned at Facebook AI Research (FAIR), Snap Research, and the MIT-IBM Watson AI Lab.
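The "prune a dense graph" idea for scene graph generation mentioned above can be illustrated with a toy sketch: start from a fully connected graph over detected objects, score every candidate edge, and keep only the top-scoring few. Everything here — the object names, the hand-written prior table, and the `score_pair` lookup — is an illustrative assumption for exposition; the speaker's actual approach learns the pruning and relational modeling from data.

```python
import itertools

def dense_candidate_pairs(objects):
    """Every ordered pair of distinct detected objects is a candidate relation edge."""
    return [(s, o) for s, o in itertools.permutations(objects, 2)]

def score_pair(subj, obj, prior):
    """Stand-in for a learned relatedness score; here just a lookup in a toy prior."""
    return prior.get((subj, obj), 0.0)

def prune(pairs, prior, k):
    """Keep only the top-k highest-scoring candidate edges (the sparse scene graph)."""
    ranked = sorted(pairs, key=lambda p: score_pair(p[0], p[1], prior), reverse=True)
    return ranked[:k]

objects = ["mug", "table", "keyboard", "monitor"]
# Toy prior: a few plausible subject-object pairs get high scores, the rest get 0.
prior = {("mug", "table"): 0.9, ("keyboard", "monitor"): 0.8, ("mug", "monitor"): 0.1}

pairs = dense_candidate_pairs(objects)  # 4 objects -> 4 * 3 = 12 dense candidates
sparse = prune(pairs, prior, k=2)       # -> [("mug", "table"), ("keyboard", "monitor")]
```

The design point the sketch captures is the quadratic blow-up: with N detected objects there are N(N−1) candidate edges, so real scene graph models must prune aggressively before doing any expensive relational reasoning on the surviving edges.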
