I am now a Codeplayer


Hi all, I have decided to change companies and move to Codeplay. I am very excited to join this fast-growing company, led by Andrew Richards, as their VP of Research and Development. Please read about it.

I will be even more involved in the C++ Standard and my job will now be even more tightly integrated with SG14. I have decided to resign from all my OpenMP duties, in order to devote all my attention to my new position.

As clang/llvm is very important to us, I will continue to be active in that area, while adding to my portfolio meetings within Khronos, with a focus on self-driving cars, big-data, financial, embedded devices and other forms of heterogeneous devices and massive parallel dispatch, with special interest in leading a future programming model for these interesting devices.

I might even get a chance to lead a Canadian division for Codeplay.

Thank you to all who have supported me and I look forward to continuing to work with you, or seeing you at a future conference or on some collaborative telecons.



C++17 content prediction (pre-Jacksonville and post-Kona report)

C++17 content (a prediction)

C++17 content will likely close in the next 2 meetings. For the upcoming meeting in Jacksonville, the committee has already asked any interested party who wants a major feature in C++17 to submit a position paper, or an opposition paper. We call this the land rush into C++17. Please see the previous meeting blog from Lenexa for background.

At Jacksonville, this question will focus our work for the week, with the intention of voting out a Committee Draft (CD) at the following meeting in Oulu in June for National Body comments. The CD stage is where most comments should happen: it is the national bodies’ first chance to inspect and provide feedback on the committee’s work. After the ballot ends, the comments are collected and the committee has to work through them and respond with a Disposition of Comments. The comments on this CD will come back at the Issaquah meeting in November. If there are no major surprises, a DIS could be issued out of that meeting, but it is unlikely to go that fast. There will likely be nontrivial changes, such that either we will take 2 meetings to address these comments or a 2nd CD will be issued. With 2 meetings to address comments, the DIS will be issued at the latest at the Feb 2017 Kona meeting, which will result in an approval vote at the July 2017 Toronto meeting. There is no need for an FDIS if the DIS passes cleanly. All of this has to happen cleanly for the Standard to appear for sale on the ISO and National Standard body websites just before the end of 2017.

C++17 already contains a large number of language and library features as a result of integrating changes from Evolution Working Group, and Library Evolution Working Group. Please see below for a current list of the features.




But few of them are wide-ranging in their effect across all of the programming domains of C++. They still add excellent value to a new Standard, but people are beginning to wonder if it will be as large a change as C++11, not that having that many large changes is necessarily a good thing. Without the addition of something like Concepts or other major features that are broadly applicable to many domains, the two questions everyone has are: will it contain any truly major features, and if so, will they be wide-ranging in their effect on the C++ community, i.e. broadly applicable to all domains, as opposed to a few? The outlook, as they say, is stay tuned.

Cppreference gives a great list of all the current projects on hand.


From these I can form a similar table stating their current status just before the Jacksonville meeting, and their likelihood of making it into C++17, as we now have much greater clarity and insight.

ISO number | Name | Status | Links | C++17?
ISO/IEC TR 18015:2006 | Technical Report on C++ Performance | Published 2006 (ISO store). Draft: TR18015 (2006-02-15) | std::hardware | No
ISO/IEC TR 19768:2007 | Technical Report on C++ Library Extensions | Published 2007-11-15 (ISO store). Draft: n1745 (2005-01-17). TR 29124 split off, the rest merged into C++11 | | N/A (mostly already included into C++11)
ISO/IEC TR 29124:2010 | Extensions to the C++ Library to support mathematical special functions | Published 2010-09-03 (ISO Store). Final draft: n3060 (2010-03-06). Under consideration to merge into C++17 by p0226 (2016-02-10) | special math | May be pending special vote and discussion on Monday
ISO/IEC TR 24733:2011 | Extensions for the programming language C++ to support decimal floating-point arithmetic | Published 2011-10-25 (ISO Store). Draft: n2849 (2009-03-06). May be superseded by a future Decimal TS or merged into C++ by n3871 | |
ISO/IEC TS 18822:2015 | C++ File System Technical Specification | Published 2015-06-18 (ISO store). Final draft: n4100 (2014-07-04) | filesystem | May be pending special vote and discussion on Monday
ISO/IEC TS 19570:2015 | C++ Extensions for Parallelism | Published 2015-06-24 (ISO Store). Final draft: n4507 (2015-05-05) | parallelism | May be pending special vote and discussion on Monday
ISO/IEC TS 19841:2015 | Transactional Memory TS | Published 2015-09-16 (ISO Store). Final draft: n4514 (2015-05-08) | transactional memory | No
ISO/IEC TS 19568:2015 | C++ Extensions for Library Fundamentals | Published 2015-09-30 (ISO Store). Final draft: n4480 (2015-04-07) | library extensions | May be pending special vote and discussion on Monday
ISO/IEC TS 19217:2015 | C++ Extensions for Concepts | Published 2015-11-13 (ISO Store). Final draft: n4553 (2015-10-02) | constraints and concepts | May be pending special vote and discussion on Monday
ISO/IEC TS 19571:2016 | C++ Extensions for Concurrency | Published 2016-01-19 (ISO Store). Final draft: p0159r0 (2015-10-22) | concurrency | No
ISO/IEC DTS 19568:xxxx | C++ Extensions for Library Fundamentals, Version 2 | DTS. Draft: n4564 (2015-11-05) | library extensions 2 | No
ISO/IEC DTS 21425:xxxx | Ranges TS | In development. Draft: n4569 (2016-02-15) | | No
ISO/IEC DTS 19216:xxxx | Networking TS | In development. Draft: n4575 (2016-02-15) | | No
| Modules | In development. Drafts: p0142r0 (2016-02-15) and p0143r1 (2016-02-15) | | No
| Numerics TS | Early development. Draft: p0101 (2015-09-27) | | No
ISO/IEC DTS 19571:xxxx | Concurrency TS part 2 | Early development | | No
ISO/IEC DTS 19570:xxxx | Parallelism TS part 2 | Early development. Draft: n4578 (2016-02-22) | | No
ISO/IEC DTS 19841:xxxx | Transactional Memory TS part 2 | Early development | | No
| Graphics TS | Early development. Draft: p0267r0 (2016-02-12) | | No
ISO/IEC DTS 19569:xxxx | Array Extensions TS | Under overhaul. Abandoned draft: n3820 (2013-10-10) | | No


On the pre-meeting call, in which I and other senior leaders of the C++ committee participate, one of the first items was a planned discussion, on the first day of plenary at Jacksonville, of several major projects. The goal is to use what precious time we have to focus on those that have a good chance of making it:

  • FileSystems TS
  • Parallelism TS1
  • Library Fundamentals TS1
  • Concepts TS
  • Special Math IS

These are the most likely candidates for C++17: each has already existed as a TS (or, for Special Math, as its own IS), may have enough usage experience, and is of definite interest to some group.

Of these, the FileSystems TS and Parallelism TS will likely have the most support to move forward, mostly intact, but you never know. LF TS1 will likely have some parts of it separated out according to the following table, courtesy of Marshall Clow:

Feature | Sections | Notes | Implementations
apply() | 3.2.2 | | gcc
Variable Templates For Type Traits | 3.3.1 | Already added to C++17 |
Invocation type traits | 3.3.2 | | None?
optional | 5 | | gcc
any | 6 | | gcc
string_view | 7 | | gcc
shared_ptr with array support | 8.2 | | gcc
polymorphic memory resources (PMRs) | 8.4, 2.1, 4.2, 9.2, 9.3 | | gcc (partial)
search | 4.3, 10.2 | | gcc
shuffle | 10.3 | | gcc


For Concepts, the main question is how important separate checking is, and whether adopting the current design leaves a way forward for definition checking of templates. Is definition checking a big deal, given that it was one of the reasons the original C++0x Concepts was pulled in the “Frankfurt Accord”? Will it take a long time to be added after the current Concepts “Lite” constraint checking is added? I personally do not see definition checking as a big impediment, and if people want definition checking, it will be done. Waiting for it is the essence of “perfect is the enemy of good”, and good in this case is good enough.

Special Math is interesting as it has been its own IS for some time, and there is now interest in adding it to C++17. There is now much more High Performance Computing interest in C++, so it is no longer a small domain. C++ has a real chance to be enabled for this important domain, which covers not just nuclear research or astronomy, but oil and gas, and many consumer domains. It will be good for C++ to have this.

In order to move forward, there will be a three-way vote on each of these questions to judge how many people are firmly decided that the feature should be in C++17, how many are firmly decided that it should not, and how many are still undecided, pending more work. This will lead to some items being dropped from consideration immediately if there is an overwhelming majority against them, to allow time for other, more likely candidates to move ahead.

We already had a number of proposals in this and other areas put forward by SG14:

P0249R0 Input Devices For 2D Graphics Brett Searles 2016-02-05 2016-02   SG14
P0267R0 A Proposal to Add 2D Graphics Rendering and Display to C++, Michael McLaughlin 2016-02-12 2016-02 N4073 SG14


P0037R1 Fixed point real numbers John McFarlane 2016-02-11 2016-02 P0037R0 Library Evolution, SG14  
P0040R1 Extending memory management tools Brent Friedman 2016-01-10 2016-02 P0040R0 Library Evolution, SG14
P0059R1 Add rings to the Standard Library Guy Davidson, Arthur O’Dwyer 2016-02-09 2016-02 P0059R0 SG14, Library Evolution
P0203R0 Considerations for the design of expressive portable SIMD vectors Mathias Gaunard 2016-01-26 2016-02   SG14
P0230R0 SG14 Games Dev/Low Latency/Financial Meeting Minutes 2015/10/14-2015/02/10 Michael Wong 2016-02-12 2016-02   SG14
P0232R0 A Concurrency ToolKit for Structured Deferral/Optimistic Speculation Paul McKenney, Michael Wong, Maged Michael 2016-02-12 2016-02   Concurrency, SG14, Evolution  
P0233R0 Hazard Pointers: Safe Reclamation for Optimistic Concurrency Maged M. Michael, Michael Wong 2016-02-12 2016-02   Concurrency, SG14, Library Evolution  
P0234R0 Towards Massive Parallelism(aka Heterogeneous Devices/Accelerators/GPGPU) support in C++ Michael Wong, Hartmut Kaiser, Thomas Heller 2016-02-12 2016-02   Concurrency, SG14, Evolution  
P0235R0 A Packaging System for C++ Guy Somberg, Brian Fitzgerald 2016-02-05 2016-02   Evolution, SG14  
P0236R0 Khronos’s OpenCL SYCL to support Heterogeneous Devices for C++ Michael Wong, Andrew Richards, Maria Rovatsou, Ruyman Reyes 2016-02-12 2016-02   Concurrency, SG14  
P0237R0 On the standardization of fundamental bit manipulation Vincent Reverdy, Robert J. Brunner 2016-02-12 2016-02   Library Evolution, SG14
P0193R0 Where is Vectorization in C++‽ JF Bastien, Hans Boehm 2016-01-21 2016-02   Concurrency

Kona trip report look back

The last time we were in Kailua-Kona for a C++ Standard meeting was just after C++11 was released, and we had successfully justified the requirement for a new Study Group, SG5 Transactional Memory. We are now back in Kona again, always a favorite destination and sure to draw many more people to the Standard meeting. We have just completed publication of SG5’s Transactional Memory Technical Specification, and are starting a new Study Group, SG14, on Games Dev/Low Latency/Financial.

Kailua-Kona on the Big Island of Hawaii is in some sense trapped in time, as over the last fifteen years I have been coming here, it has barely changed. Most of the same shops are still in the same place. But the rest of the Big Island has changed (with new volcanic eruptions) and C++ is very different from when I first started coming here.

Today, C++ has 14 Study Groups, has over 100 people attending its meetings, and is being updated every five years or less. We are closing in on the cutoff date for C++17 features, which will likely fall within the next two meetings in 2016. After that, we will likely be drafting the final changes for ratification in time for C++17 publication, which can have a pipeline as long as eight months.

Over the last decade, the Canadian delegation has grown from just me to 10. At this meeting, there were so many Canadians attending that I would need to do a special Canadian count during formal country ISO votes. We had the following Subject Matter Experts from Canada:

  • Michael Wong, ISOCPP.org, OpenMP CEO (HoD)
  • Hubert Tong, IBM
  • Tony Van Eerd, Christie Digital
  • Botond Ballo, Mozilla
  • JF Bastien, Google
  • Patrice Roy, Université de Sherbrooke
  • Eric Fiselier, Bloomberg
  • Michael Park, pending
  • Xing Xue, IBM
  • Chris Cambly, IBM

Whereas before I was the only one writing a trip report while covering multiple rooms, now some of them have written great trip reports and can help me cover the other rooms, which run simultaneously.

I mostly hang out in SG1 Concurrency, where I co-chair some sessions, while sometimes chairing my own SG5 Transactional Memory and SG14 Low Latency. I attend Evolution, Library Evolution, Core, or Library sessions as needed to advance individual proposals.

For the first time, we have a report in French, written by Patrice Roy, a professor from Université de Sherbrooke:


Botond Ballo, meanwhile, has continued his in-depth coverage of the Evolution Working Group, along with an excellent general overview of the entire proceedings.


At this session, I worked to advance a number of proposals for SG14, which means that, unlike previously, when I spent most of my time in SG1 Parallelism/Concurrency, I moved around between all the groups. You can see here for an SG14 paper status report out of Kona.

Description of the major projects

Given the many well-done blogs on the content of these meetings, it seems most useful to focus this blog on which major features will get into C++17.

What will definitely be in C++17, in essentially complete form

The Filesystems TS is the first major TS accomplished since TSes became the way the Evolution Groups (both Language and Library) explore major features without getting bogged down. This feature enables a uniform way of accessing filesystems, hiding the differences between Unix and Windows (for example, the back vs. forward slash, and case sensitivity). It has been a Boost library for quite some time under Beman Dawes, and its addition to C++17 is certain. There will likely be a second version of this because, as it stands, it does not deal with filesystems such as network filesystems, or those which exist on mainframe systems. These will likely be added in future TSes.
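As a rough illustration of the kind of portable code this enables, here is a minimal sketch. It assumes the spelling `std::filesystem` that a C++17 compiler would provide (the TS itself places everything in `std::experimental::filesystem`), and the helper names `make_config_path`, `touch_file` and `list_names` are made up for this example.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;  // assumption: C++17 spelling, not the TS namespace

// Build a path from components without hard-coding '/' or '\\';
// operator/ inserts the platform's preferred separator.
fs::path make_config_path(const fs::path& base) {
    return base / "settings" / "app.cfg";
}

// Create any missing parent directories and an empty file,
// then report whether a regular file now exists at that path.
bool touch_file(const fs::path& file) {
    if (!file.parent_path().empty())
        fs::create_directories(file.parent_path());
    std::ofstream out(file);  // creates the file if it is absent
    out.close();
    return fs::exists(file) && fs::is_regular_file(file);
}

// Collect the names of the entries in a directory, sorted for a stable order
// (directory_iterator itself makes no ordering promise).
std::vector<std::string> list_names(const fs::path& dir) {
    std::vector<std::string> names;
    for (const auto& entry : fs::directory_iterator(dir))
        names.push_back(entry.path().filename().string());
    std::sort(names.begin(), names.end());
    return names;
}
```

The same source should behave the same way on Unix and Windows, which is the point of the feature; only the separator that appears when a path is printed differs.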

The Parallelism TS contains parallel (and parallel plus vectorized) versions of the STL algorithms. It has been published with many implementations, and there is no reason to think it will need any changes for inclusion into C++17 fully as is. It contains the seed that will form the future of massive parallelism as we move forward to support accelerators.

Library Fundamentals adds a number of useful facilities to the Library, in the form of optional, any, string_view and much more. It has been published, and parts of it are certain to be in C++17. The controversial parts would be left out.
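To give a flavor of these three facilities, here is a small sketch. It assumes the C++17 headers `<optional>`, `<any>` and `<string_view>` (in the TS they live under `std::experimental`), and the function names `parse_digit`, `first_word` and `int_or` are invented for illustration.

```cpp
#include <any>
#include <optional>
#include <string>
#include <string_view>

// optional: express "no result" without a sentinel value or an out-parameter.
std::optional<int> parse_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return std::nullopt;
}

// string_view: a non-owning view of characters; taking a substring
// does not allocate or copy. The view is only valid while the
// underlying characters are alive.
std::string_view first_word(std::string_view text) {
    const auto end = text.find(' ');
    return text.substr(0, end);  // npos means "up to the end"
}

// any: a type-erased holder for a single value. The pointer form of
// any_cast returns nullptr on a type mismatch instead of throwing.
int int_or(const std::any& value, int fallback) {
    if (const int* p = std::any_cast<int>(&value)) return *p;
    return fallback;
}
```

For instance, `parse_digit('x')` yields an empty optional, `first_word("hello world")` views just `hello`, and `int_or` falls back when the `any` holds something other than an `int`.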

What may make it into C++17, at least some part of it in some form.

The Concepts (Lite) TS is a constrained-template mechanism: it gives the template side a prototype-like way to state what it requires, which finally solves the “error novel” problem that occurs whenever a template client does not match what the template requires. This has been a long-standing problem in C++ that would have been solved with the original Concepts proposal. But in Frankfurt back in 2009, it was shown that the existing design would mean even simple uses of templates would require knowledge of Concepts. That was because it contained extensive template definition checking as well as a Concept Map mechanism for abstraction of archetypes. That original Concepts proposal was removed in a critical vote, which will forever be known as the “Frankfurt Accord”, although few people know about that.

What will not make it into C++17, but will be on deck for C++20 or C++22.

The Concurrency TS contains improvements to futures, latches and barriers, and atomic smart pointers. Although there is implementation experience, it was just approved and is too fresh to be voted into C++17. The futures improvements are language changes but are already well tested in Microsoft’s Visual C++ and C# compilers. The other two features are library additions. Latches and barriers have been used at Google, and atomic smart pointers are just syntactic sugar on top of smart pointers, giving them a first-class name. In future, it will be married with the Parallelism TS to form the basis of SG14’s design for accelerator support for massive parallelism. The future mechanism is ideal in conjunction with parallel STL to enable dependencies with bulk dispatch.

Library Fundamentals TS2 already contains a rich amount of content, and the work is still ongoing. It already has source code information and various utilities. As it is being balloted by National Bodies, it will be approved before the C++17 cutoff in the next two meetings, but it is too fresh to be considered for C++17.

One of my two groups, SG5, has produced the Transactional Memory TS, a well-defined specification that has already been approved for publication. It will very soon have an implementation in GCC 6, in the first half of 2016. We have also been exploring Wyatt Technology’s usage in industry and learning from their experience of using TM in their code. We still need more usage experience with the TS form, and GCC will give us that. Without that usage experience, SG5 has voluntarily not proposed adding this to C++17. If any part is to be included, the synchronized form of the construct is the most likely candidate. Unlike the atomic form, it enables a simple replacement for locks that is composable (whereas locks do not compose) and usable immediately, in that it works with any current code (even transaction-unsafe code). Of course, the downside is that it becomes irrevocable as soon as a transaction-unsafe action is performed, and will commit all of its actions. This means it may not scale well.

The Networking TS is based on established practice in Boost’s ASIO library by Christopher Kohlhoff. It is a socket library, and much of it was reviewed last year in a special Library meeting, with the current wording review in progress. It is a large proposal and, despite its well-formed historical basis, some details have changed from the original Boost.ASIO. It will not make it into C++17.

The Ranges TS has been called STL2, and will use range-based algorithms instead of iterators. It was specially commissioned by ISOCPP.org and designed by Eric Niebler, an expert long-time Boost developer. Its design review is also complete, with wording review in progress. It is a very large enhancement, and while there is wide interest and support behind it, it is still too fresh to make it into C++17.

Parallelism TS2 is already on the deck with major features such as Task blocks, agents, progress guarantees, and SIMD. In fact, a joint SG14/SG1 call last week just established SIMD in a form that essentially accepts implicit wavefront as the way to move forward, modulo a few semantics corner cases with multiple threads writing to the same SIMD lane, causing either undefined or well defined industry behavior (ordered by whichever writes last). At some point in the future, this TS may also add mapreduce and pipeline capabilities.

The Numerics TS (SG6) is still under development, with additions for fixed-point arithmetic, DFP, and future support for IEEE 754 128-bit floating point. The group has been meeting on and off, as the chair cannot attend every meeting. This will likely have less chance of making it into C++17, although individual proposals such as fixed point still may.

Array Extensions has had a checkered past and likely a similarly checkered future. It is a stack array whose size is not known at compile time. It was started, then pulled out; then at Kona, after a special discussion in EWG, direction was given that can be used as a framework for a future proposal. It will not be in C++17.

See you after the Jacksonville meeting. Thanks.

C++ Standard SG14 meeting (Games Dev/Low Latency/Financials) at CPPCON 2016

ISO Study Group 14 “Game Development/Low Latency/Financials” will be meeting co-located with and hosted by CppCon on Wednesday, Sept 21st from 8:30 am to 5:30 pm. Conference attendance is not required to participate in the SG14 meeting.

About CPPCON, the arrangement will be similar to last year which seemed to have worked out quite well.
Here is a trip report:

We will have a room at The Meydenbauer Center for a full-day meeting to review the papers. We will have a signup (with free entrance if you notify me here; otherwise you would have to pay a nominal amount if you just proceed through the CPPCon registration), as we only have space for about 50 people. This is a working meeting, so only people who really want to participate in reviewing and offering feedback on papers, or who have proposals, should attend. We can’t stop people from dropping in and out, but paper authors will need to be present to give their proposals.

CPPCon and the ISOCPP.org foundation have already agreed to pay for the room and the breaks, which will just piggy-back on CPPCon.

Meeting paper submission deadlines.
In order to have a useful meeting, we need people to submit their ideas and proposals well before the 2 SG14 meetings. That means we will have SG14 paper submission deadlines (just like a normal C++ Std meeting). These deadlines enable papers to be reviewed by others before the meeting, and meaningful feedback to be given at the meeting.


C++ Standard SG14 meeting (Games Dev/Low Latency/Financial) at GDC 2016

I finally have news of SG14 on the place and date for the SG14 meeting at GDC.

JF Bastien and Google have offered us a place in downtown San Francisco at their Google office (if you ever wanted to visit Google, here it is), which is about a 17-minute walk from GDC:

345 Spear Street
San Francisco, CA 94105

On the 7th floor. Google will provide breakfast, refreshment, and lunch.

We are also planning a social event with drinks and food (self-paid) afterwards, or whenever the meeting ends.

The meeting has to be held on Monday, March 14 8:30am-5:30pm as that is the only time Google has space and time. I wanted to release this information immediately so people can make plans.

We will need a sign-up list of the people intending to attend so that badges can be printed. Please sign up by emailing me. Let me know if you intend to stay for the social.
If you intend to present papers, the mailing deadline to this forum is Feb 29.  Please email me too with paper name and author(s) and submit using SG14 reflector mail.

Please distribute to the Games community.
We welcome new papers and review of previous papers with new adjusted content to get feedback before we bring it to the C++ Standard meeting in June at Oulu, Finland.


The view from the SG14 Games Dev/Low Latency meeting at CPPCon 2015

SG14 report
In short, we had a super successful meeting. The room was full the entire day and the discussions were healthy and constructive. Thank you to all who helped to make this so much fun and enjoyable. I predict there will be even more participation next year, as people seem to really enjoy participating in a design process and not just the mere act of listening to presentations.
I wish to thank everyone who made it to the meeting, my co-chairs Sean Middleditch (Wargaming) and Nicolas Guillemot (Intel), as well as the scribes for the meeting, who worked under difficult conditions:
Billy Baker (Flight Safety)
John McFarlane (Zoox.com)
Guy Davidson (Creative Assembly)

We plan to continue monthly telecons to conduct paper reviews. At Kona, we will have a Wednesday evening session to update everyone.

Please see the slides for the report here which was presented in the Friday session by Sean and Nicolas which also outlines the problem domain, our motivation as well as the status of the discussion.

The Birth of SG14

We have 8 papers upstreamed from the meeting:
P0037R0 Fixed point real numbers John McFarlane LEWG SG14/SG6
P0038R0 Flat Containers Sean Middleditch LEWG SG14
P0039R0 Extending raw_storage_iterator Brent Friedman LEWG SG14
P0040R0 Extending memory management tools Brent Friedman LEWG SG14
P0041R0 Unstable remove algorithms Brent Friedman LEWG SG14
P0048R0 Games Dev/Low Latency/Financial Trading/Banking Meeting Minutes 2015/08/12-2015/09/23 Michael Wong SG14
P0059R0 Add rings to the Standard Library Guy Davidson LEWG SG14
P0130R0 Comparing virtual functions Scott Wardle, Roberto Parolin EWG SG14

We had a major brainstorm on why games programmers tend to turn off exception handling, and a brainstorming session on what options and ideas might address their concerns.

We also briefly discussed the scope and coverage of the group, and noted a common interest in the following areas:

  • Real Time Graphics
  • Interactive simulation
  • Low-Latency
  • Constrained Resources

All are areas of interest to Games Dev, but they also intersect Financial Trading, Simulation and Embedded devices as can be seen from slide 12 of The Birth of SG14 talk.

As such a more appropriate name for SG14 group may be: Games Dev/Financial Trading/Banking/Simulation and we wish to invite representatives from these industries to join.

In addition to Exception/RTTI costs, we intend to bring the following topics to discussion, some of which cross into other groups such as SG1 Concurrency, SG6 Numerics and SG7 Reflection.

There are a few papers in the current mailing that are of interest to SG14.

GPU Accelerator support
P0069R0 A C++ Compiler for Heterogeneous Computing

We actually intend to study the design of C++AMP, OpenMP Accelerator/OpenACC, OpenCL, OpenGL and Vulkan for a GPU accelerator design to support gamers. We plan to review this paper in a co-located SG14 session, which can run with SG1, but we are interested in taking this work further.
P0089R0 Quantifying Memory-Allocation Strategies
Coroutines and games
P0054R0 Coroutines: reports from the fields
P0055R0 On Interactions Between Coroutines and Networking Library
P0070R0 Coroutines: Return Before Await
P0071R0 Coroutines: Keyword alternatives
P0073R0 On unifying the coroutines and resumable functions proposals
P0099R0 A low-level API for stackful context switching
Intrusive containers


function multiversioning

low latency for financial/trading

matrix operations

SIMD vectors
P0076R0 Vector and Wavefront Policies


P0106R0 C++ Binary Fixed-Point Arithmetic
Our next telecon call is next Wednesday. You can get the detail by watching this mailing list:
And at the next C++ Standard meeting, we will be discussing the papers above. Thanks.

OpenMPCon 2015 Advanced tutorials

OpenMPCon this month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such, we have 3 keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here. We also posted the second of three keynotes by Professor William Tang of Princeton University, as well as the second series, third series, and fourth series of talks. The third keynote is also here, as are the evening sessions on Grill the Committee and Plan the next OpenMPCon.

I will now describe one of the final gems of attending the OpenMP Developers conference, along with all the other great talks that reveal the nuts and bolts of OpenMP. The tutorial material offers the latest way to fast-track you to being a guru at using OpenMP in your work, taught by committee members and educators who are plugged into the design of the specification. We offer a full education range, starting on Monday with Tim Mattson’s thoroughly popular and well-tested beginner/intermediate hands-on tutorial covering all of OpenMP. The tutorial is based on active learning and will mix short lectures with short exercises.

This tutorial is based on a long series of tutorials presented at Supercomputing conferences, and on a course he teaches with Kurt Keutzer of UC Berkeley.

On Wednesday, along with a regular series of talks and keynotes, one of the tracks will showcase OpenMP senior educator Ruud van der Pas teaching why OpenMP REALLY scales. In his characteristically entertaining and anecdote-filled manner, Ruud will take on a difficult topic, how to make OpenMP scale, because unfortunately it is a very widespread myth that OpenMP Does Not Scale, a myth we intend to dispel in this talk.

Tasking models are now everywhere in many standards and specifications, as they are used to deal with irregular workloads that cannot be captured in a parallel loop. Yet some are heavyweight and some are lightweight. Michael Klemm and Christian Terboven, the OpenMPCon and IWOMP Program Chairs respectively, will show what OpenMP offers for tasks, along with insider information on how to best take advantage of them.
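For readers who have not seen OpenMP tasks, here is a minimal sketch of an irregular, recursive workload expressed with them. It is not taken from the tutorial; the function names `fib` and `parallel_fib` and the serial cutoff of 20 are assumptions chosen for illustration. Built without OpenMP support, the pragmas are simply ignored and the code runs serially with the same result.

```cpp
#include <cstdint>

// Recursive Fibonacci: a classic irregular workload that does not fit
// a parallel loop. Assumes n >= 0.
std::uint64_t fib(int n) {
    if (n < 2) return static_cast<std::uint64_t>(n);
    // Hypothetical cutoff: below it, task overhead would dominate the work.
    if (n < 20) return fib(n - 1) + fib(n - 2);

    std::uint64_t a = 0, b = 0;
    #pragma omp task shared(a)   // child task computes one branch
    a = fib(n - 1);
    #pragma omp task shared(b)   // sibling task computes the other
    b = fib(n - 2);
    #pragma omp taskwait         // wait for both children before combining
    return a + b;
}

// One thread creates the root of the task tree; the other threads in the
// team pick up the tasks it spawns.
std::uint64_t parallel_fib(int n) {
    std::uint64_t result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib(n);
    return result;
}
```

The cutoff is the usual trick for keeping tasks from becoming too fine-grained; the right value depends on the machine and the runtime.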

If you think OpenMP is merely about threading, then you might be interested in the latest features of OpenMP 4.x that exploit the SIMD capabilities of modern processors. Since processors tend to spend more die space on SIMD with every new generation, so-called “vectorization” becomes more important. Whereas threading is already covered well, vectorization is still an underdog. In this tutorial we provide an introduction to the vectorization extensions of OpenMP 4.0 and the upcoming version. Simplified examples extracted from recent Intel Parallel Computing Center projects will be used as demonstrations. Attendees will get a set of different examples to become accustomed to the different vectorization techniques of the latest OpenMP standards.
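As a taste of what these extensions look like, here is a small sketch using the OpenMP 4.0 `simd` construct. It is not one of the tutorial examples; the function names `saxpy` and `dot` are just illustrative. Without OpenMP enabled, the pragmas are ignored and the loops run as ordinary scalar code.

```cpp
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i], over the common length of x and y.
// The simd pragma asserts that iterations are independent, so the
// compiler may execute several of them per vector instruction.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];
}

// Dot product: the reduction clause gives each SIMD lane a private
// partial sum and combines the lanes after the loop.
float dot(const std::vector<float>& x, const std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    const float* yp = y.data();
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i)
        sum += xp[i] * yp[i];
    return sum;
}
```

Note that a vectorized reduction may add the terms in a different order than the scalar loop, so floating-point results can differ in the last bits for general inputs.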

Want more? OpenMP is the dominant programming model for shared-memory parallelism in C, C++ and Fortran due to its easy-to-use directive-based style, portability and broad support by compiler vendors. Compute-intensive application regions are increasingly being accelerated using specialized heterogeneous devices and a programming model with similar characteristics is needed here. This tutorial on OpenMP 4.x Accelerator Model will focus on the OpenMP 4.0 accelerator model that provides such a programming model.

For this half-day tutorial we assume attendees have a basic understanding of OpenMP concepts. We quickly review OpenMP programming topics that are most relevant to the accelerator model. We focus on how the OpenMP execution and memory models were extended to support heterogeneous devices. We cover the new device constructs and API routines that were added in OpenMP 4.0, and we work through some example code. Finally, we preview some of the upcoming features coming in OpenMP 4.1.
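To make the device constructs a little more concrete, here is a minimal sketch of offloading a loop with the OpenMP 4.0 `target` and `map` syntax. It is not from the tutorial material; the function name `vector_add` is hypothetical, and it assumes `a` and `b` are at least as long as `c`. If no device is available, or OpenMP is not enabled at all, the loop simply runs on the host.

```cpp
#include <vector>

// c[i] = a[i] + b[i] for every element of c.
// Assumption: a.size() >= c.size() and b.size() >= c.size().
void vector_add(const std::vector<double>& a,
                const std::vector<double>& b,
                std::vector<double>& c) {
    const int n = static_cast<int>(c.size());
    const double* pa = a.data();  // map clauses work on raw array sections
    const double* pb = b.data();
    double* pc = c.data();

    // map(to:)   copies the inputs to the device before the region runs;
    // map(from:) copies the result back to the host afterwards.
    #pragma omp target teams distribute parallel for \
        map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
    for (int i = 0; i < n; ++i)
        pc[i] = pa[i] + pb[i];
}
```

The explicit `map` clauses are where the extended memory model shows up: the device may have its own memory, so the program states which data moves in which direction.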

Please consider attending by signing up here. In the meantime, we are looking for students and volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.

Grill the OpenMP CEO/ARB/Language Committee members and Plan the next OpenMPCon

OpenMPCon this month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such, we have 3 keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here. We also posted the second of three keynotes by Professor William Tang of Princeton University, as well as the second series, third series, and fourth series of talks. The third keynote is also here, and we will now describe the evening sessions on Grill the Committee and Plan the next OpenMPCon.

So what would you like to know about how the OpenMP Specification happens (especially OpenMP 4.1, which is scheduled to be released in a month at SC 15), or about the membership and organizational changes to the OpenMP ARB? Or maybe you would just like to grill the current or past CEOs?

On Tuesday evening, right after the Tuesday talks and before the dinner, the Grill the Committee session will offer that chance: the panel is made up of members of the OpenMP ARB and Language Committee, and the audience asks the questions. Members currently expected to be there are:

CEOs past and present:

  • Michael Wong
  • Larry Meadows
  • Tim Mattson

ARB and Language members present:

  • Eric Stotzer
  • James Beyer
  • Xinmin Tian
  • Alice Koniges
  • Oscar Hernandez
  • Mark Bull
  • Michael Klemm
  • Yun He

and many others. You ask the questions, and give feedback about OpenMP.

On Wednesday evening, at the end of all the talks, if you still want to stick around (which means you are really interested) or are waiting for IWOMP to start the next day, we will hold a Planning the next OpenMPCon panel. The next OpenMPCon is tentatively planned for Kyoto, Japan, sometime in Oct 2016. We would like to hear from you what we did well and what needs improving, but best of all we would welcome your volunteer help in organizing the next OpenMPCon.

Please consider attending by signing up here. In the meantime, we are looking for students and volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.

OpenMPCon 2015 Talk Series 4

OpenMPCon this month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such we have 3 keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here. We also posted the second of three keynotes by Professor William Tang of Princeton University as well as the second series and third series of talks. The third keynote is also here, and we will now describe the fourth series of talks.

Want to know how OpenMP is used in US National Labs, especially at NERSC? NERSC is the primary supercomputing facility for the Office of Science in the US Department of Energy (DOE). Its next production system will be an Intel Xeon Phi Knights Landing (KNL) system, with 60+ cores per node and 4 hardware threads per core. The recommended programming model is hybrid MPI/OpenMP, which also promotes portability across different system architectures.

OpenMP usage statistics on current NERSC production systems, such as the percentage of codes using OpenMP and the typical number of threads used, will be analyzed. The presenters will describe how they advise users to use OpenMP efficiently with multiple compilers on various NERSC systems, including how to obtain the best process and thread affinity for hybrid MPI/OpenMP, memory locality with NUMA domains, programming tips for adding OpenMP, strategies for improving OpenMP scaling, how to use nested OpenMP, and tools available for OpenMP. Tuning examples that improve OpenMP performance in real scientific user codes will also be presented.

Manuel Arenaz will demonstrate A Success Case using Parallware. The manual parallelization of existing code is usually a tedious and error-prone task, especially in the case of large projects. Parallware is the first commercial OpenMP-enabling source-to-source compiler that automatically adds OpenMP capabilities to scientific programs. The compiler automatically discovers the parallelism available in sequential codes written in the C programming language. It produces human-readable code annotated with OpenMP directives, instead of a binary executable file. This work analyzes the parallelization of the program EP from the NAS Parallel Benchmarks (NPB) suite. The performance results show that, starting from the original sequential version and applying some simple code refactorings, Parallware is able to generate efficient OpenMP parallel code automatically.

Please consider attending by signing up here. In the meantime, we are looking for students and volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.

OpenMPCon 2015 talks Series 3

OpenMPCon this month aims to bring a stellar lineup of the latest industry gurus, users and developers together with the language designers. As such we have 3 keynotes along with two full-day tutorials and a day and a half of talks. You can see the first keynote, tutorial and the first of three talks here. We also posted the second of three keynotes by Professor William Tang of Princeton University as well as the second series of talks. The third keynote is also here, and we will now describe the third series of talks.

Want to know what it takes to port OpenACC 2.0 to OpenMP 4.0? Oscar Hernandez of Oak Ridge NL has done it and can show you the way as he presents code comparisons to show how each API is used to parallelize representative code fragments. Furthermore, he will give guidelines for developers wishing to convert codes from OpenACC 2.0 to OpenMP 4.0.
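To illustrate the kind of side-by-side comparison such a port involves, here is a small sketch of a dot product. It is not from the talk; the function name `dot_product` is hypothetical, and the OpenACC directive appears only as a comment. The pairing shown (`parallel loop` with `target teams distribute parallel for`, `copyin` with `map(to:)`) is a commonly used approximate correspondence, not an exact equivalence.

```cpp
#include <cassert>
#include <vector>

// Illustrative helper (hypothetical name): dot product of a and b.
double dot_product(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const double* ap = a.data();
    const double* bp = b.data();
    const int n = static_cast<int>(a.size());
    double sum = 0.0;

    // An OpenACC 2.0 version of this loop might be annotated like this:
    //   #pragma acc parallel loop reduction(+:sum) copyin(ap[0:n], bp[0:n])
    //
    // A rough OpenMP 4.0 counterpart: 'copyin' becomes 'map(to:)', and the
    // reduction result is mapped back explicitly with 'map(tofrom:)'.
    #pragma omp target teams distribute parallel for reduction(+:sum) \
        map(to: ap[0:n], bp[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; ++i) {
        sum += ap[i] * bp[i];
    }
    return sum;
}
```

Without OpenMP enabled, the pragma is ignored and the loop runs sequentially on the host, which makes it easy to check the ported code against a reference result.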

Alice Koniges of Lawrence Berkeley NL will describe what it takes to Enable Application portability across HPC platforms using Open Standards with an aim towards User-oriented goals for OpenMP. Portability plus performance are key requirements for large-scale scientific simulations on the path to exascale. Users of the high-end computing facilities such as the National Energy Research Scientific Computing Center (NERSC) and the Oak Ridge Leadership Computing Facility (OLCF) are demanding portable standards to enable their codes to run on differing high performance computing (HPC) architectures with relatively little user intervention between differing versions that have been optimized for performance.

The emerging OpenMP standards are poised to offer such portability. In this presentation, she will discuss several important goals and requirements of portable standards in the context of OpenMP.

Want to learn about Effective OpenMP SIMD Vectorization for Intel Xeon and Xeon Phi Architectures? There is no better guru than Intel’s Xinmin Tian, who will show how to efficiently exploit SIMD vector units to achieve high performance in application code running on Intel® Xeon and Xeon Phi™. In this talk, he will present the Intel® compiler framework that supports the OpenMP 4.0/4.1 SIMD extensions, and also present a set of key vectorization techniques, such as function vectorization, masking support, uniformity and linearity propagation, alignment optimization, gather/scatter optimization, and remainder and peeling loop vectorization, that are implemented inside the Intel® C/C++ and Fortran product compilers for Intel® Xeon processors and Xeon Phi™ coprocessors.
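Function vectorization and uniformity, two of the techniques listed above, surface in user code through the OpenMP 4.0 `declare simd` directive. The sketch below is an illustration only and says nothing about how the Intel compilers implement these techniques; the names `scaled_square` and `sum_scaled_squares` are made up.

```cpp
#include <cstddef>
#include <vector>

// 'declare simd' asks the compiler to also generate a vector variant of this
// function. 'uniform(scale)' states that scale has the same value in every
// SIMD lane; 'notinbranch' states the function is never called conditionally
// inside a SIMD loop, so no masked variant is needed.
#pragma omp declare simd uniform(scale) notinbranch
float scaled_square(float v, float scale) {
    return scale * v * v;
}

// Illustrative caller (hypothetical name): sums scale * v * v over the input.
float sum_scaled_squares(const std::vector<float>& in, float scale) {
    const float* p = in.data();
    const std::size_t n = in.size();
    float sum = 0.0f;

    // The simd loop can call the vector variant of scaled_square, and the
    // reduction clause lets each lane keep a partial sum.
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += scaled_square(p[i], scale);
    }
    return sum;
}
```

Because a SIMD reduction may reorder floating-point additions, results can differ slightly from a strictly sequential sum for general inputs.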

Please consider attending by signing up here. In the meantime, we are looking for students and volunteers to help with the conference. Please connect with OpenMPCon if you wish to help.