You've picked a hard topic for a video!
::rubs hands together::  :nerd:
The first thing is that it's very hard to say what is important when it comes to performance. If you have something (Spine or otherwise) that takes a lot of resources, that in itself is not a problem. It is only important IF the total of all the resources your app uses exceeds a reasonable amount. Everything the app does needs to be considered, not just the Spine portion. Your super inefficient app is perfectly fine and requires no special attention right up until you exceed that reasonable amount. Only then does performance matter.
Spending some effort on constructing things in an efficient way can be helpful to make it more likely that you'll never hit that unreasonable threshold. Also if you do exceed that threshold, it may be easier to make adjustments. It's a very fine line to walk, trying to worry only the correct amount.
The problem most people fall into is that they worry too much about performance long before they are anywhere near the threshold where they should begin to worry. That worry can add up to a HUGE amount of wasted effort. As you probably know, it is called "premature optimization". Great effort can be put into doing things extremely efficiently, making everything about building the app harder, and very often NONE of that was actually necessary.
Discussing performance optimizations is good, but it's important to focus on the efforts that make the most difference. Since most people will be doing preemptive and premature optimization, it's most helpful to discuss potential problems that are most likely to cause you to exceed the unreasonable performance threshold. Give people ways to help avoid the worst performance problems with the least effort. That least effort part is important, because they don't actually have any performance problems yet!
Once you do have performance problems, there are still plenty of ways to waste huge amounts of time and effort. There is no point putting effort into making areas that are already fast even faster, even if you could make those areas much more efficient. For example, say you can reduce the time one action takes by 99%, and another by only 25%. However, if the first took 10ms, it now takes 0.1ms and you probably can't tell the difference. If the other took 4s, it goes down to 3s, which is noticeable. Prioritizing the areas that are causing your problems (identifying your worst bottlenecks) is the first step of performance optimization, and to do that you almost always need to take measurements of your actual app. I know many of you are going to ignore that last part, so please read it a few extra times! Due to that, watching a video about various optimizations is unlikely to be helpful unless you happen to have the exact problem covered by the video.
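To make the "measure first" point concrete, here is a minimal Python sketch. It is not tied to any Spine runtime: the `timed` helper, the `section_totals` dictionary and the section names are all made up for illustration. It accumulates wall-clock time per named section so the slowest one can be found, and it redoes the arithmetic from the example above.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds per named section (section names are hypothetical).
section_totals = defaultdict(float)

@contextmanager
def timed(section):
    """Add the wall-clock time spent inside the block to section_totals."""
    start = time.perf_counter()
    try:
        yield
    finally:
        section_totals[section] += time.perf_counter() - start

def worst_bottlenecks(totals):
    """Return (section, seconds) pairs, slowest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def absolute_saving(duration_s, reduction_fraction):
    """Seconds saved when a task is made faster by the given fraction."""
    return duration_s * reduction_fraction

# The example from the text: a 99% cut of 10 ms versus a 25% cut of 4 s.
fast_saving = absolute_saving(0.010, 0.99)  # roughly 0.0099 s, not noticeable
slow_saving = absolute_saving(4.0, 0.25)    # 1.0 s, noticeable
```

In an app you would wrap the real work, for example `with timed("update skeletons"): ...`, run for a while on a target device, then look at `worst_bottlenecks(section_totals)` before deciding what to optimize. A real profiler gives better data; this only shows the idea.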
Enough blah blah, on to your questions!
warmanw wrote: Use Linked meshes when creating sequence of meshes, and use them with skins when parts are using same silhouette
  
This isn't really related to performance. Linked meshes are better than duplicating a mesh many times, because some of the same mesh information is shared (bones, vertices, triangles, UVs, hull length, edges, width, height). However, you'll still have an attachment per frame in your sequence. It would be a bit better to use a single mesh attachment with Sequence checked. Then you have only one mesh attachment and you don't need a timeline to change attachments. That means less data in the skeleton file, less memory needed to load attachments and keys. CPU and GPU performance aren't affected though.
warmanw wrote: I have heard Constraints are heavy, and using them sparingly is a great Idea. any difference between constraints? for example maybe path constraint that has 10 bones and multiple vertices is much heavier than path with 10 bones and 2 vertices? And what if we enable stretching for 2 bone IK? does any of this make a big impact?
 
There is no reason to avoid constraints. They can cause a few more bone transformations, but those are pretty cheap. IK and transform constraints don't take much processing at all, no matter their settings. Path constraints require more CPU than those, especially if you have many bones following the path, but even if you use many path constraints it's unlikely to be your worst bottleneck.
warmanw wrote: Deforming meshes in animation are super bad for CPU, but is it bad if we deform them in setup pose?
 
Applying deform keys does not use a lot of CPU, it's just a simple float array copy, then an addition per vertex. That's not free, nothing is, but it's not a big deal. What's bad about deform keys is they use a lot of memory (and size in the data file) to store values for every bone weighted to each mesh vertex. A few keys isn't a big deal, but consider if you key all the meshes on your character 5 times in 10 animations: you've increased the mesh vertices that need to be stored by 50x! One of the largest parts of the skeleton data is the mesh vertices, so you have likely increased the entire size of your skeleton data by nearly 50x. This is how people get 25MB+ skeleton data files. It's easily avoided by using weights. Use deform keys sparingly or not at all.
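A rough back-of-the-envelope sketch of both points, in Python. The function names and the 2000-value mesh size are hypothetical, the copy-then-add step is only one plausible reading of the description above, and the storage estimate assumes every key stores a value for every mesh value, ignoring any trimming an exporter might do.

```python
BYTES_PER_FLOAT = 4  # assuming 32-bit floats

def apply_deform(setup_vertices, key_offsets):
    """Copy the setup values, then add the keyed offset to each one."""
    deformed = list(setup_vertices)        # the float array copy
    for i, offset in enumerate(key_offsets):
        deformed[i] += offset              # one addition per value
    return deformed

def deform_key_floats(mesh_floats, keys_per_animation, animations):
    """Floats stored by deform keys if every key covers every mesh value."""
    return mesh_floats * keys_per_animation * animations

# Hypothetical character: 2000 vertex values across all of its meshes.
mesh_floats = 2000
extra_floats = deform_key_floats(mesh_floats, keys_per_animation=5, animations=10)
multiplier = extra_floats // mesh_floats        # 50
extra_bytes = extra_floats * BYTES_PER_FLOAT    # 400000
```

Applying a key is cheap work per frame, but the stored data grows with keys times animations, which is where the 50x comes from.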
warmanw wrote: Blending mode of the slots, does any of them is heavier?
 
If you render using PMA then normal and additive can be used without any performance difference. Otherwise, generally changing blend modes causes a batch flush. For example to render a single attachment with a different blend mode, you cause 2 batch flushes: normal rendering, flush, other blend mode rendering, flush, more normal rendering. Like everything else, a few extra batch flushes are fine. It doesn't matter until you are flushing way too many times per frame, which depends on the performance of the devices you target.
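The flush counting described above can be sketched in a few lines of Python. This is a simplified model, not how any particular renderer is written: `count_blend_flushes` and the mode strings are hypothetical, only blend mode changes are considered (texture changes also break batches in practice), and the final flush at the end of the frame is not counted.

```python
def count_blend_flushes(slot_blend_modes, pma=False):
    """Count batch flushes caused by blend mode changes in draw order."""
    def batch_key(mode):
        # With premultiplied alpha, normal and additive can share a batch.
        if pma and mode in ("normal", "additive"):
            return "normal/additive"
        return mode

    flushes = 0
    previous = None
    for mode in slot_blend_modes:
        key = batch_key(mode)
        if previous is not None and key != previous:
            flushes += 1  # the batch so far must be sent before switching
        previous = key
    return flushes

# One attachment with a different blend mode in the middle costs 2 flushes.
single_odd_slot = count_blend_flushes(["normal", "normal", "multiply", "normal"])
# With PMA, mixing normal and additive costs none.
pma_mixed = count_blend_flushes(["normal", "additive", "normal"], pma=True)
```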
Maybe you could explain the different runtime parts in your video. The GPU has a few: geometry submission, draw calls (batching), fill rate. You could break CPU down into the (generally) most expensive operations: clipping, bone and vertex transforms, etc.
warmanw wrote: What about tint black? what would be heavier if we enable Tint Black for the slot or change blending mode to additive?
 
Tint black causes more data to be sent per vertex. Disabling tint black entirely for the renderer is more efficient. This only matters if you are sending way too much geometry to the GPU each frame. That is unlikely because 2D doesn't need much compared to 3D.
warmanw wrote: How about the keys in the timeline. any difference between the timelines? or keys just hold numbers?
  
The keys just hold numbers, so more keys means a bigger data file and more memory to hold the data at runtime. Applying timelines takes some CPU though (it's a binary search to find the next key for the current animation time), so fewer timelines is better. However, you're unlikely to notice the difference in most cases. Maybe if you are applying many timelines for many skeletons then you'd see a lot of CPU usage for thousands of timelines, but in that case you probably can't easily reduce the number of timelines. Removing a few won't make much difference, because each doesn't take much processing. You probably can't remove say 50% of your timelines, because then you won't get the animation you wanted.
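For a feel of what applying a timeline involves, here is a small Python sketch of the key lookup using the standard `bisect` module. The `sample_linear` function and its data layout are hypothetical, and linear interpolation between keys is an assumption made to keep the example short; real timelines also support other curve types.

```python
from bisect import bisect_right

def sample_linear(key_times, key_values, time):
    """Find the keys around `time` by binary search and interpolate linearly."""
    if time <= key_times[0]:
        return key_values[0]
    if time >= key_times[-1]:
        return key_values[-1]
    i = bisect_right(key_times, time) - 1  # last key at or before `time`
    t0, t1 = key_times[i], key_times[i + 1]
    alpha = (time - t0) / (t1 - t0)
    return key_values[i] + (key_values[i + 1] - key_values[i]) * alpha

# A hypothetical rotation timeline with three keys.
times = [0.0, 0.5, 1.0]
rotations = [0.0, 10.0, 30.0]
quarter = sample_linear(times, rotations, 0.25)         # 5.0
three_quarters = sample_linear(times, rotations, 0.75)  # 20.0
```

The search is logarithmic in the number of keys, so the cost per timeline is small. The cost comes from doing it for every timeline of every skeleton each frame.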
One thing you can do is apply half the animations every other frame (or a similar scheme). That reduces your timeline applying time by 50%.
If you have many skeletons on screen, you may be able to get away with animating say 10 skeletons, then drawing those each 10 times to make an army of 100 skeletons on screen. You've reduced your timeline applying time by 90%! With so many skeletons visible at once, it may not be noticeable that many have the same pose.
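Both tricks boil down to simple index bookkeeping. A minimal Python sketch, with hypothetical function names and no real runtime calls:

```python
def skeletons_to_update(skeleton_count, frame_index, groups=2):
    """Indices of the skeletons whose animations are applied this frame."""
    turn = frame_index % groups
    return [i for i in range(skeleton_count) if i % groups == turn]

def pose_index_for(instance_index, animated_count):
    """Which animated skeleton a drawn instance borrows its pose from."""
    return instance_index % animated_count

# Half the skeletons on even frames, the other half on odd frames.
even_frame = skeletons_to_update(6, frame_index=0)  # [0, 2, 4]
odd_frame = skeletons_to_update(6, frame_index=1)   # [1, 3, 5]

# An army of 100 drawn from only 10 animated skeletons.
army_poses = [pose_index_for(i, 10) for i in range(100)]
```

One thing to watch (an assumption about how you would wire this up, not something from a specific runtime): a skeleton that skips a frame should be advanced by the accumulated delta time on its next update, otherwise it plays at half speed.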
warmanw wrote: What if we shear an art will that use bigger rectangle to render?
 
Shear has no special cost. The size of your art affects the fill rate of the GPU.
warmanw wrote: I will also talk about the vertex transforms, question is which is heavier, one vertex with 3 weights or 3 vertices with one weight each?
 
They are the same cost. 1×3 or 3×1 results in 3 vertex transforms, which is where the cost is.
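A small Python sketch of why one vertex with three weights and three vertices with one weight each cost the same. The data layout is made up for illustration: each bone is a 2D affine matrix given as `(a, b, c, d, tx, ty)` and each influence is `(bone_index, x, y, weight)`.

```python
def skin_vertex(influences, bones):
    """World position of one vertex: one bone transform per influence."""
    world_x = world_y = 0.0
    for bone_index, x, y, weight in influences:
        a, b, c, d, tx, ty = bones[bone_index]
        world_x += (x * a + y * b + tx) * weight
        world_y += (x * c + y * d + ty) * weight
    return world_x, world_y

def transform_count(weights_per_vertex):
    """Total bone transforms needed for a mesh."""
    return sum(weights_per_vertex)

# Two hypothetical bones: identity, and identity shifted 10 units in x.
bones = [(1, 0, 0, 1, 0, 0), (1, 0, 0, 1, 10, 0)]
position = skin_vertex([(0, 1.0, 2.0, 0.5), (1, 1.0, 2.0, 0.5)], bones)  # (6.0, 2.0)

one_vertex_three_weights = transform_count([3])          # 3
three_vertices_one_weight = transform_count([1, 1, 1])   # 3
```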
warmanw wrote: I know about the clipping as well, that number of clipped slots and vertices of the mask affect the performance. anything else?
 
Just what's described on the clipping attachment doc page.
warmanw wrote: What about inherit transform, rotation and scale? is it bad? looks like simple to calculate.
 
There is not a big difference between any of the combinations.
warmanw wrote: If we set the alpha of the slot to 0 will it continue to draw? or what if we scale the bone to 0, will containing art continue to draw? what if we have 2 slots one with 100% the other with 99% transparency. are they similar performance wise?
 
Setting the alpha to 0 (or 99%) will still draw the image and use your fill rate exactly the same as alpha of 1, unless the runtime has a special case that notices the 0 alpha and skips rendering. It's better to hide attachments.
Scaling an attachment to 0 will still send geometry to the GPU, but nothing will be rendered. Just hide the attachments, it's more clear what the intent is.
warmanw wrote: Skel and json are different in Size. is there a difference when code parses them? I know that json is much more easy
 
Parsing JSON is not easier! It is more complex, parsing is slow, and it uses a lot more memory during parsing.
The largest part of skeleton data is usually mesh vertices, which are mostly lists of numbers. Storing all that as text is not great. In binary, each number is 4 bytes and parsing it into a number is very easy. In JSON, it can take 8-9 bytes and parsing the characters into a number is slow when done thousands of times.
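The size difference is easy to see with a few made-up vertex values in Python. This is not the actual skel or JSON layout that Spine exports, just the same numbers stored as packed 32-bit floats versus compact JSON text.

```python
import json
import struct

def binary_size(values):
    """Bytes needed to store the values as little-endian 32-bit floats."""
    return len(struct.pack("<%df" % len(values), *values))

def json_size(values):
    """Bytes needed to store the values as compact JSON text."""
    return len(json.dumps(values, separators=(",", ":")).encode("utf-8"))

def read_binary(data):
    """Reading binary back is a fixed-size unpack, no character parsing."""
    return list(struct.unpack("<%df" % (len(data) // 4), data))

# Hypothetical vertex values.
vertices = [123.45678, -0.333333, 98.7654, 4521.125]
as_binary = binary_size(vertices)  # 16 bytes, 4 per number
as_json = json_size(vertices)      # 38 bytes for this list
```

Note that 32-bit floats round the values slightly, which is fine for vertex positions.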
Binary (skel) is better than JSON in every way: it's smaller on disk and it's faster to parse. There is no good reason to use JSON in a production application. Only use JSON if you need a human to be able to read it or you need to process the data with other tools. Importing JSON data into a different version of Spine is a little more forgiving than binary, but that is not officially supported.