OpenMPCon 2015 Talk Series 4

OpenMPCon this month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such we have three keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here. We also posted the second of three keynotes by Professor William Tang of Princeton University as well as the second and third series of talks. The third keynote is also here, and we will now describe the fourth series of talks.

Want to know how OpenMP is used in US National Labs, especially at NERSC? NERSC is the primary supercomputing facility for the Office of Science in the US Department of Energy (DOE). NERSC's next production system will be an Intel Xeon Phi Knights Landing (KNL) system, with 60+ cores per node and 4 hardware threads per core. The recommended programming model is hybrid MPI/OpenMP, which also promotes portability across different system architectures.

OpenMP usage statistics on current NERSC production systems, such as the percentage of codes using OpenMP and the typical number of threads used, will be analyzed. The speakers will describe how they advise their users to use OpenMP efficiently with multiple compilers on the various NERSC systems, including how to obtain the best process and thread affinity for hybrid MPI/OpenMP, memory locality with NUMA domains, programming tips for adding OpenMP, strategies for improving OpenMP scaling, how to use nested OpenMP, and the tools available for OpenMP. Tuning examples from real scientific user codes will also be presented to show how OpenMP performance can be improved.
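To give a flavour of what this looks like in practice, here is a minimal hybrid MPI/OpenMP sketch of my own (not NERSC material): each MPI rank spawns an OpenMP team, and process/thread affinity is then typically controlled with the standard OMP_PLACES and OMP_PROC_BIND environment variables introduced in OpenMP 4.0.

    /* Minimal hybrid MPI/OpenMP sketch (illustration only, not NERSC code).
     * Affinity is typically set in the job script before launch, e.g.:
     *   export OMP_NUM_THREADS=8
     *   export OMP_PLACES=cores
     *   export OMP_PROC_BIND=spread
     */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            /* Each MPI rank runs its own OpenMP team; printing the
             * rank/thread pair is a quick way to check the binding. */
            printf("rank %d, thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }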

Manuel Arenaz will demonstrate a success case using Parallware. The manual parallelization of existing code is usually a tedious and error-prone task, especially in the case of large projects. Parallware is the first commercial OpenMP-enabling source-to-source compiler: it automatically adds OpenMP capabilities to scientific programs. The compiler discovers the parallelism available in sequential codes written in the C programming language and produces human-readable code annotated with OpenMP directives, instead of a binary executable file. In this work they analyze the parallelization of the program EP from the NAS Parallel Benchmarks (NPB) suite. They show through performance results that, starting from the original sequential version and applying some simple code refactorings, Parallware is able to generate efficient OpenMP parallel code automatically.
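Parallware's actual output is not reproduced here, but the kind of human-readable, directive-annotated code described above looks roughly like this hand-written sketch: a sequential reduction loop, in the spirit of the EP kernel's accumulation step, with an OpenMP directive added.

    /* Hand-written illustration of directive-annotated output; this is
     * NOT Parallware's real output nor the actual NPB EP kernel. */
    double sum_of_squares(const double *x, int n)
    {
        double sum = 0.0;
        /* The key transformation is recognising the reduction pattern
         * and emitting the corresponding clause. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += x[i] * x[i];
        return sum;
    }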

Please consider attending by signing up here. In the meantime, we are looking for student volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.

OpenMPCon 2015 Talk Series 3

OpenMPCon this month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such we have three keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here. We also posted the second of three keynotes by Professor William Tang of Princeton University as well as the second series of talks. The third keynote is also here, and we will now describe the third series of talks.

Want to know what it takes to port codes from OpenACC 2.0 to OpenMP 4.0? Oscar Hernandez of Oak Ridge National Laboratory has done it and can show you the way as he presents code comparisons of how each API is used to parallelize representative code fragments. Furthermore, he will give guidelines for developers wishing to convert codes from OpenACC 2.0 to OpenMP 4.0.
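To give a feel for the kind of code comparison involved, here is a simplified example of my own (not taken from the talk): a SAXPY loop written with OpenACC 2.0 directives and a rough OpenMP 4.0 equivalent using the target, teams and distribute constructs.

    /* Simplified illustration of an OpenACC-to-OpenMP 4.0 port
     * (my own example, not from the talk). */

    /* OpenACC 2.0 version */
    void saxpy_acc(int n, float a, const float *x, float *y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    /* Rough OpenMP 4.0 equivalent */
    void saxpy_omp(int n, float a, const float *x, float *y)
    {
        #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
        #pragma omp teams distribute parallel for
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

The mapping is not always this mechanical; the guidelines in the talk cover the cases where the two models' execution and data models diverge.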

Alice Koniges of Lawrence Berkeley National Laboratory will describe what it takes to enable application portability across HPC platforms using open standards, with an aim towards user-oriented goals for OpenMP. Portability plus performance are key requirements for large-scale scientific simulations on the path to exascale. Users of high-end computing facilities such as the National Energy Research Scientific Computing Center (NERSC) and the Oak Ridge Leadership Computing Facility (OLCF) are demanding portable standards that enable their codes to run on differing high performance computing (HPC) architectures with relatively little user intervention between the differing versions that have been optimized for performance.

The emerging OpenMP standards are poised to offer such portability. In this presentation, she will discuss several important goals and requirements of portable standards in the context of OpenMP.

Want to know about effective OpenMP SIMD vectorization for Intel Xeon and Xeon Phi architectures? There is no better guru than Intel's Xinmin Tian, who will show how to efficiently exploit SIMD vector units to achieve high performance in application code running on Intel® Xeon and Xeon Phi™. In this talk, he will present the Intel® compiler framework that supports the OpenMP 4.0/4.1 SIMD extensions, along with a set of key vectorization techniques such as function vectorization, masking support, uniformity and linearity propagation, alignment optimization, gather/scatter optimization, and remainder and peeling loop vectorization, all implemented inside the Intel® C/C++ and Fortran product compilers for Intel® Xeon processors and Xeon Phi™ coprocessors.
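As a small taste of the directives involved (my own example, not Xinmin's material), OpenMP 4.0 offers "omp declare simd" for function vectorization and "omp simd" with clauses such as aligned and reduction for loop vectorization:

    /* Small illustration of OpenMP 4.0 SIMD directives (my own example). */
    #include <stddef.h>

    /* Ask the compiler to generate a SIMD-callable version of this function. */
    #pragma omp declare simd
    static inline float scale_shift(float x, float a, float b)
    {
        return a * x + b;
    }

    float sum_scaled(const float *restrict in, float *restrict out,
                     size_t n, float a, float b)
    {
        float total = 0.0f;
        /* Vectorize the loop; 'aligned' promises the compiler that the
         * pointers are 64-byte aligned (an assumption the caller must honour). */
        #pragma omp simd aligned(in, out: 64) reduction(+:total)
        for (size_t i = 0; i < n; i++) {
            out[i] = scale_shift(in[i], a, b);
            total += out[i];
        }
        return total;
    }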

Please consider attending by signing up here. In the meantime, we are looking for student volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.

OpenMPCon’s third keynote on Embedded Computing with OpenMP

OpenMPCon this month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such we have three keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here. We also posted the second of three keynotes by Professor William Tang of Princeton University as well as the second series of three talks.

In this post, I would like to announce the third keynote, by Texas Instruments' Eric Stotzer, on Towards Programming Embedded Systems with OpenMP.

Software for embedded systems is more complex than in the past, as more functions are implemented on the same device. This talk will provide an overview of the characteristics of embedded systems and discuss features that could be added to OpenMP to enable it to better serve as a programming model for these systems. Embedded systems are typically constrained by, among other things, real-time deadlines, power limitations and limited memory resources. Today OpenMP is not able to express these types of constraints. Embedded systems applications can be broadly classified as event-driven or compute and data intensive. OpenMP is well suited to expressing the parallel execution that is demanded by compute and data intensive applications. However, extensions are needed for event-driven applications, such as automotive embedded systems, where the behavior is characterized by a defined sequence of responses to a given incoming event from the environment. While the actions performed may not be compute or data intensive, they tend to have strict real-time constraints. The use of multicore technology has increased the design space and performance of Multiprocessor Systems-on-Chip (MPSoCs) targeted at embedded applications. A natural extension is to adapt the device construct added in OpenMP 4.0 to support the mapping of different software tasks, or components, to various processor cores.
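The OpenMP 4.0 device construct already allows a first step in this direction on today's heterogeneous MPSoCs. A hedged sketch of my own (not from the talk) of placing one component of an application on an accelerator device, say a DSP core, might look like this:

    /* Sketch of using the OpenMP 4.0 target construct with a device
     * clause to place one component on an accelerator (illustration only;
     * 'dsp_device' is whatever device number the platform assigns). */
    void filter_block(float *samples, int n, int dsp_device)
    {
        #pragma omp target device(dsp_device) map(tofrom: samples[0:n])
        {
            #pragma omp parallel for
            for (int i = 0; i < n; i++)
                samples[i] *= 0.5f;
        }
    }

What this cannot yet express, and what the talk argues for, are the real-time and event-driven constraints around such a component.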

Eric Stotzer (Ph.D. Computer Science) is a member of the Software Development Organization's Compiler team. He has been at TI for 25 years, working on software development tools, compilers, architectures, and parallel programming models.

Please consider attending by signing up here. In the meantime, we are looking for student volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.

Why are we charging for the SG14 Games Dev/Low Latency meeting at CppCon 2015? (and how you can get in for free)

We are less than a month away from CppCon 2015 near Seattle, the premier C++ conference. I have two talks scheduled: one on the Memory Model and Atomics in C++11/14/17 and a second on the Birth of SG14. I am also on a Grill the Committee panel.

But I am most excited about a new add-on event. We showed evidence of interest to the standards committee at the Lenexa meeting and were approved to form SG14, a study group for games development, low latency, real time, simulation, and, I might add, banking/finance. That name is a bit too long to be usable, so we have shortened it to the first two labels. As such, we will be chairing a full-day meeting of SG14 in a room at the Meydenbauer Center, concurrent with CppCon, on Wednesday, Sept 23rd. This enables game developers who cannot attend C++ standards meetings to participate, with committee members present to evaluate their proposals. A second meeting is already set up for March 14-18, 2016 at GDC 2016, hosted by Sony (thank you, Sony).

If you have been registering, you will have noticed on the registration page for CppCon that we are charging $25 to attend SG14. You may wonder why there is a cost to attend an SG14 meeting. The reason is really simple, and I will show you a way to get in for free simply for reading this blog!

The SG14 meeting is a real C++ standards working meeting where we will be triaging and evaluating proposals and giving feedback to authors all day (or until all proposals are done). The room we are given is limited in seating capacity (about 50, I am told), so we need to give preference to paper authors, C++ committee members and truly interested attendees. The conference organizers told me, and I agree, that offering a free but limited-seating event would be asking for it to be filled (because what is the harm in signing up if you can just show up and leave?) and would leave no space for the people who really need to be there.

So, if you are a paper author or a C++ committee member who intends to be there most of the day to help evaluate proposals, just send me an email reply on the C++ standards reflector and I will add you to the protected list of those who can get in for free. Many people are already on that list, but those 50 seats run out fast.

Who are the paper authors? They have been busy discussing on the SG14 reflector some of the following topics, each of which is likely to have a paper ready for discussion. Those authors also have a bye into SG14, and I know who they are.

  • flat_map
  • fixed point
  • uninitialized algorithms
  • string stuff
  • rolling queues
  • intrusive containers
  • EH costs
  • Compare virtual function and see if a class has implementation or not
  • thread safe STL

If you fit none of those categories (not a C++ standards committee member, not on SG14, not a paper author) and are still interested in attending, you should join the SG14 discussion first, then email me on the SG14 reflector, here, or at my gmail address and ask for a free ticket, with some justification as to why you would stick through it all. I will be happy to grant it: the aim is truly not to make any money, but this is one of the few gatekeeping methods we have to keep a limited-capacity room from being flooded, assuming we have the luxury of having that problem. :)

Finally, CppCon 2015 will have a number of talks that appear to be games-related. I triaged the list with the help of Sean Middleditch and Nicolas Guillemot. I can't say for sure, as I have not contacted each author yet, but the likely candidates are the following (thanks to Jon Kalb for sending me the correct CppCon links):

  • Definitely games related:

    • C++ for cross-platform VR development:

http://cppcon2015.sched.org/event/7212a9da0198fcfd8de5c05be21b667c

    • Testing Battle.net (before deploying to millions of players):

http://cppcon2015.sched.org/event/ac2534ecb08510c5810e7df34cdddb94

    • The current memory and C++ debugging tools used at Electronic Arts:

http://cppcon2015.sched.org/event/a9bccd0c3f6beb05752b36a4197a1deb

    • The Birth of SG14:

http://cppcon2015.sched.org/event/0404d7fede126851710420c16218cdb9

  • Probably interesting to games developers:

    • Live lock-free or deadlock (practical Lock-free programming)

http://cppcon2015.sched.org/event/595740ce3bab0220cc3c22fa92777830

    • Reflection techniques in C++:

http://cppcon2015.sched.org/event/1d5b459ba8433d8e5effad7a862d599a

    • Cross-Platform Mobile App Development with Visual C++ 2015

http://cppcon2015.sched.org/event/7104a3140c2ba28cdd0a68e323f78eb2

    • How to make your data structures wait-free for reads:

http://cppcon2015.sched.org/event/34d0ca4052e1acad959c725584329dd7

    • C++11/14/17 Atomics the Deep dive: the gory details, before the story consumes you!

http://cppcon2015.sched.org/event/6f91922313cebd5a25369c05a56d4359

    • C++ Atomics: The Sad Story of memory_order_consume: A Happy Ending at Last?

http://cppcon2015.sched.org/event/6d97f88ae259e8103f23830ae350dc30

    • C++ in the Audio Industry

http://cppcon2015.sched.org/event/1cded491a6eeea3a5e5f1541af80a2a7

    • 3D Face Tracking and Reconstruction using Modern C++

http://cppcon2015.sched.org/event/d5f2c8bdd2fbdee420fa24f166f8bdec

    • Implementation of a component-based entity system in modern C++14

http://cppcon2015.sched.org/event/eb915d37a737d8ace0fbb9e4b5892f6d

  • Probably less interesting to games developers:

    • C++ Multi-dimensional Arrays for Computational Physics and Applied Mathematics

http://cppcon2015.sched.org/event/3ec0f48e8500cb20789d2935facca8c5

    • CopperSpice: A Pure C++ GUI Library

http://cppcon2015.sched.org/event/e27044d13660bf65b8a799dac1eff177

I hope to see you at the conference, and I hope you will attend SG14 despite this $25 charge, because now you know how to get in for free!

OpenMPCon 2015 Talk Series 2

OpenMPCon next month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such we have three keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here, and the second keynote here.

I would like to take this chance to look at three more talks.

Simon McIntosh-Smith of the University of Bristol will present a performance comparison between OpenMP 4.0 and OpenCL.

There are many discussions about whether different programming models can achieve high performance and how easy that is for the programmer. For HPC applications, where performance is critical, this question is especially interesting in the context of OpenMP 4.0 and OpenCL, which offer different constructs for describing data parallelism.
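To make the comparison concrete, here is a minimal example of my own (not Simon's benchmark): the same vector addition in both models. The OpenCL host-side setup, buffer management and kernel launch are omitted; OpenMP 4.0 expresses the whole thing with directives on an ordinary loop.

    /* OpenCL C kernel (compiled separately): one work-item per element. */
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *c)
    {
        int i = get_global_id(0);
        c[i] = a[i] + b[i];
    }

    /* OpenMP 4.0: directives on an ordinary C loop. */
    void vadd_omp(int n, const float *a, const float *b, float *c)
    {
        #pragma omp target teams distribute parallel for \
                map(to: a[0:n], b[0:n]) map(from: c[0:n])
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }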

Martin Jambor from SUSE Linux will offer a talk on GCC support for compiling OpenMP 4 target constructs for Heterogeneous System Architecture (HSA) accelerators, summarizing their experience from the ongoing development of a GCC branch that takes OpenMP code and compiles it so that it runs on HSA GPGPUs. He will outline the architecture of the process, with emphasis on the differences between compilation for the host and for HSA accelerators. Furthermore, he will discuss what kinds of input they can compile in a straightforward manner and, conversely, which cases are problematic and how they have tackled them. They intend to merge the branch into GCC 6, so the talk will also serve as a preview of what will be available in the GCC released in spring 2016.

Dmitry Prohorov of Intel will describe OpenMP analysis in Intel® VTune™ Amplifier XE. This talk will present what they have been doing in VTune Amplifier XE to support OpenMP performance analysis that shows results in terms of the OpenMP constructs the programmer actually works with, instead of offering general tuning paradigms that can confuse OpenMP programmers rather than help them understand the real problems in their code.

Please consider attending by signing up here. In the meantime, we are looking for student volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.

OpenMPCon Keynote: OpenMP-Enabled Scalable Scientific Software for Extreme Scale Applications: Fusion Energy Science

OpenMPCon next month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such we have three keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here. I would like to take this chance to describe the second of three keynotes, by Professor William Tang of Princeton University.

A major challenge for supercomputing today is to demonstrate how advances in HPC technology translate to accelerated progress in key application domains, especially with respect to reduction in "time-to-solution" and also "energy-to-solution" of advanced codes that model complex physical systems. In order to effectively address the extreme concurrency present in modern supercomputing hardware, one of the most efficient algorithmic approaches has been to adopt OpenMP to facilitate efficient multi-threading. This presentation describes the deployment of OpenMP-enabled scalable scientific software for extreme scale applications, with a focus on fusion energy science as an illustrative application domain.

Computational advances in magnetic fusion energy research have produced particle-in-cell (PIC) simulations of turbulent kinetic dynamics for which computer run-time and problem size scale very well with the number of processors on massively parallel many-core supercomputers. For example, the GTC-Princeton (GTC-P) code, which has been developed with a "co-design" focus, has demonstrated effective use of the full power of current leadership-class computational platforms worldwide at the petascale and beyond to produce efficient nonlinear PIC simulations that have advanced progress in understanding the complex nature of plasma turbulence and confinement in fusion systems. The results provide great encouragement for including increasingly realistic dynamics in extreme-scale computing campaigns, with the goal of enabling predictive simulations characterized by the unprecedented physics realism needed to help accelerate progress in delivering clean energy. In particular, OpenMP usage experience and associated best practices in achieving these advances will be described.

Prof. William Tang of Princeton University's Department of Astrophysical Sciences serves on the Executive Board of the University's interdisciplinary Princeton Institute for Computational Science and Engineering (PICSciE), which he helped establish as Associate Director (2003-2009). He is also a Principal Research Physicist at the Princeton Plasma Physics Laboratory [the DOE national laboratory for fusion energy research, for which he served as Chief Scientist (1997-2009)] and was recently appointed Distinguished Visiting Professor at the Shanghai Jiao Tong University's HPC Center and NVIDIA Center of Excellence. He is a Fellow of the American Physical Society and has received the Chinese Institute of Engineers-USA Distinguished Achievement Award (2005) and the HPC Innovation Excellence Award from the International Data Corporation (2013). He is internationally recognized for expertise in the mathematical formalism as well as the associated computational applications dealing with electromagnetic kinetic plasma behavior in complex geometries, and has an "h-index" or "impact factor" of more than 45 on the Web of Science, including well over 7000 citations. Prof. Tang has taught for over 30 years at Princeton University and has supervised numerous Ph.D. students, including recipients of the Presidential Early Career Award for Scientists and Engineers in 2000 and 2005.

Please consider attending by signing up here. In the meantime, we are looking for student volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.

Clang 3.7 will have full OpenMP 3.1 support followed by OpenMP 4

Hi all, one of my jobs as CEO of OpenMP is to manage the improvement of OpenMP, the de facto parallel programming API for C, C++, and Fortran. For the last several years, we have been working in collaboration with Intel, IBM and many other companies to add OpenMP support to clang/llvm. OpenMP is already well supported by many proprietary compilers from IBM, Intel, Oracle, PathScale and TI, as well as GNU and others. With clang/llvm, however, we had to start from scratch, building on Intel's OpenMP GitHub branch, which we have been collaborating on developing and upstreaming to clang. So over the last two years you have seen more and more OpenMP support appear in clang/llvm in the LLVM 3.5 and 3.6 releases.

I am happy to announce that this collaboration has achieved full OpenMP 3.1 support in clang/llvm 3.7, which will be released at the end of August.

The 3.7 release branch has OpenMP 3.1 fully supported, but disabled by default (due to the timing of a runtime library switch; it will likely be on by default by the next clang release). To enable it, please use the "-fopenmp=libomp" command line option. Your feedback (positive or negative) on using OpenMP-enabled clang would be much appreciated; please share it on either the cfe-dev or openmp-dev mailing list.
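For the curious, trying it out is a one-liner once clang 3.7 and the libomp runtime are installed; a minimal sketch:

    /* hello_omp.c -- minimal OpenMP smoke test for clang 3.7.
     * Build (assuming clang 3.7 and libomp are installed):
     *   clang -fopenmp=libomp hello_omp.c -o hello_omp
     */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        #pragma omp parallel
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
        return 0;
    }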

In addition to OpenMP 3.1, several important elements of version 4.0 of the standard are supported as well (a short sketch of a few of them follows the list):
  • "omp simd", "omp for simd" and "omp parallel for simd" pragmas
  • atomic constructs
  • the "proc_bind" clause of the "omp parallel" pragma
  • the "depend" clause of the "omp task" pragma (except for array sections)
  • the "omp cancel" and "omp cancellation point" pragmas
  • the "omp taskgroup" pragma
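Here is a hand-written sketch of a few of the 4.0 features above, just to show the shape of the directives (my own example, not from the clang test suite):

    /* Illustration of a few OpenMP 4.0 features listed above. */
    #include <stdio.h>

    void demo(float *a, const float *b, int n)
    {
        /* "omp simd": ask for vectorization of this loop. */
        #pragma omp simd
        for (int i = 0; i < n; i++)
            a[i] += b[i];

        /* "depend" on the scalar x orders the two sibling tasks, and the
         * "taskgroup" waits for both before the single region ends. */
        int x = 0;
        #pragma omp parallel
        #pragma omp single
        #pragma omp taskgroup
        {
            #pragma omp task depend(out: x)   /* producer */
            x = 42;

            #pragma omp task depend(in: x)    /* consumer */
            printf("x = %d\n", x);
        }
    }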

Clang 3.7 now fully supports OpenMP 3.1 and is reported to work on many platforms, including x86, x86-64 and Power, thanks to the collaboration of the many companies listed below.

Our intention is to have clang/llvm, along with the GNU compilers, be the reference implementations, allowing them to follow OpenMP closely as it rapidly develops new features. Commercial compilers have their own specific pressures and customer requirements and can typically follow as they need.

We plan to continue work on 4.0 support for clang 3.8. Please see this link for up-to-date status. OpenMP 4.1 is sure to follow too, as we are just now releasing the public comment draft and aim to ratify OpenMP 4.1, with significant enhancements for accelerators, by SuperComputing 2015 in November.

Contributors to this work include AMD, Argonne National Lab., IBM, Intel, Texas Instruments, University of Houston and many others.

OpenMPCon/IWOMP 2015 Developers Conference will be in Aachen, Germany

It is with great pleasure that I announce OpenMPCon 2015, which is paired and co-located with its sister conference IWOMP 2015 on Sept 28-Oct 2. As OpenMP evolves rapidly, with more releases to keep pace with the rapidly evolving hardware and parallel world (one technical report followed by one ratified specification every year; we are close to delivering OpenMP 4.1, which you can preview at OpenMPCon), OpenMPCon aims to be the central place for the latest OpenMP language developments, tutorials, tips and tricks, as well as for developers connecting with language designers and compiler/tool vendors. This will be done on Monday and Tuesday, and partially on Wednesday, where the tutorial is shared with IWOMP and can be attended by both conferences.

IWOMP continues its rich history of refereed academic research paper presentations on Thursday and Friday, covering the latest research that will move OpenMP forward into the next generation, as well as experimental data on current OpenMP usage that will allow the Language Committee to refine existing constructs. These two combined conferences form a very healthy way for OpenMP to remain agile and keep moving forward.

Whether you are an OpenMP developer, student, consultant, educator, manager, director, team leader, vendor company, sponsor, book publisher, or career seeker or recruiter, here are reasons to attend or sponsor. Note that sponsorship of OpenMPCon also implies automatic sponsorship of IWOMP and vice versa.

The latest program schedule is online and contains a wealth of full-day tutorials, advanced tutorials, and talks from users, developers and Language Committee members. The conference has long break and lunch times for connecting with all the attendees. There will be lightning talks and posters held at the breaks, as well as evening sessions where you can Grill the Committee/CEO on the latest draft of the OpenMP 4.1 specification, which should be out as a comment draft before the conference.

Currently, the first of three keynotes is offered by Intel's Chief Evangelist and book author extraordinaire James Reinders. I would like to take this chance to describe the first tutorial, to be hosted by past OpenMP CEO Tim Mattson all day on Monday, Sept 28.

For this tutorial it is assumed that you know C (OpenMP supports Fortran and C++, but the tutorial restricts itself to C) and are relatively new to parallel programming. The tutorial is based on active learning and will mix short lectures with short exercises. It offers a comprehensive overview of OpenMP from one of the most expert and entertaining instructors of OpenMP, covering everything that beginner to intermediate users would want in order to get started with OpenMP and learn more of the deep details of tasks and loop parallelism.
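If you have never seen OpenMP, the flavour of what such a tutorial starts from is roughly this (my own example, not Tim's material): take a serial loop and let a team of threads share its iterations.

    /* The classic first OpenMP exercise (my own example). */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        double sum = 0.0;

        /* Split the iterations across a team of threads and combine the
         * partial sums safely with a reduction. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;

        printf("harmonic sum = %f (up to %d threads)\n",
               sum, omp_get_max_threads());
        return 0;
    }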

If you are interested in the latest hot topics on exascale computing, or just in how to program GPUs or do heterogeneous computing, then you need to attend the talk comparing the latest state of OpenMP and OpenACC, offered by James Beyer, who, having been positioned in both groups, offers unique insight into both. As both an OpenMP and OpenACC insider, he will present his opinion of the current status of these two directive sets for programming "accelerators". Insights into why some decisions were made in initial releases and why they were changed later will be used to explain the trade-offs required to achieve agreement on these complex directive sets.

One of the main aims of OpenMPCon is for users and developers to learn the latest tips, tricks and gotchas from OpenMP gurus, and there is none better than past OpenMP Language Chair Mark Bull from EPCC. This talk will present a series of practical hints for OpenMP programmers, collected from many years' experience of teaching OpenMP and of answering questions on the OpenMP Forum. He will describe some common pitfalls and tactics for how to avoid or work around them, plus some helpful hints that will hopefully make your life as an OpenMP programmer that little bit easier!
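A classic example of the kind of pitfall such a talk covers (my illustration, not Mark's): a temporary declared outside the loop is shared by default, which quietly turns a correct serial loop into a data race.

    /* 'tmp' is shared by default, so the first version races. */
    void scale_buggy(float *y, const float *x, int n)
    {
        float tmp;
        #pragma omp parallel for          /* BUG: tmp is shared */
        for (int i = 0; i < n; i++) {
            tmp = 2.0f * x[i];
            y[i] = tmp + 1.0f;
        }
    }

    /* Fix: make the temporary private to each thread
     * (or simply declare it inside the loop body). */
    void scale_fixed(float *y, const float *x, int n)
    {
        float tmp;
        #pragma omp parallel for private(tmp)
        for (int i = 0; i < n; i++) {
            tmp = 2.0f * x[i];
            y[i] = tmp + 1.0f;
        }
    }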

If you want performance from OpenMP, then you need to attend this talk on where your performance went. Mark Bull will give you an understanding of how and why OpenMP programs lose performance. In this talk he will attempt to enumerate all the possible ways that OpenMP programs can deliver less than ideal speedup, divided into six main categories: lack of parallelism, load imbalance, synchronisation, communication, hardware contention and compiler non-optimisation. For each category, he will explain why it happens and offer some possible solutions.
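To pick just one of those categories: load imbalance often appears when iterations have very different costs and the default static schedule hands every thread an equal chunk. A small sketch of the usual remedy (my example; expensive_work is a hypothetical routine whose cost varies strongly with i):

    /* Dynamic scheduling as a remedy for load imbalance (illustration;
     * expensive_work is a placeholder for uneven per-iteration work). */
    extern double expensive_work(int i);

    double total_work(int n)
    {
        double total = 0.0;
        #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
        for (int i = 0; i < n; i++)
            total += expensive_work(i);
        return total;
    }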

In this blog post, I have described the first of OpenMPCon's three keynotes, as well as the Monday full-day beginner/intermediate tutorial and three of the talks accepted at OpenMPCon 2015.

In the next set of blog posts, I will cover the following:

  • Second keynote + next 3 accepted talks
  • Wednesday Advanced Tutorial + next 3 accepted talks
  • Third keynote + next 3 accepted talks
  • Evening sessions + Lightning/Poster talks

Please consider attending by signing up here. In the meantime, we are looking for student volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.