What is the precise definition of "Stage / Staging" in Computer Science / Engineering?

I am starting in Data Science and I come from math/stats/economics. I am very used to precise definitions even if it means going a bit deeper into the theory to explain something as simple as a function.

I tried to look for precise definitions of Stage / Staging when used as:

  • Staging area
  • Staging environment
  • Staging models
  • Staging file
  • a staging step in git
  • etc.

For example: https://githowto.com/staging_and_committing Here, I can understand the context, of course, but I'd like an abstract computer engineering explanation of what it is as if you were learning the theory to build a "stage" on your own.

However, none of the explanations were able to precisely define what it is and where it comes from. For example, if you are an electronic or computer engineer or computer scientist, how would you define it, and would you mind pointing out research papers or a famous textbook where you learned it?

I am working in the context of "data", but I would argue that the concept is independent of the field, because it is a computer unit after all, as I understand it. I may be wrong.

Thank you!

There are 2 best solutions below

0
On

It's an analogy.

I think of staging data like an actor's script on a theater stage. As soon as the actor (the ETL job) enters the stage, they need a script (data) to work with. Putting data on stage is like handing an actor a new script: they know how to read, interpret and perform, but they don't know the text yet. So providing the text ("staging" the data) happens well before the play (the process/job) actually begins, but it can also happen between scenes. The picture might be a little odd, but I think you get the point.

  • EXTRACT data -> put it onto stage
  • TRANSFORM data -> let the actors play and create something new
  • LOAD data -> deliver the experience

I doubt there is a precise definition for it, but technically the staging area, also called a landing zone, is the intermediate storage that sits between extracting the data and loading it in an ETL process.

Generally this data is treated as non-persistent: it is overwritten or deleted before or after an ETL job. However, there are also cases in which staging data becomes metadata, parameters or comparison data for the next job run, depending on the ETL architecture. I prefer keeping it non-persistent wherever possible.
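To make the ETL case concrete, here is a minimal sketch of a non-persistent staging table, using Python's built-in sqlite3 module with an in-memory database. The table names (stg_orders, orders), the columns and the run_etl function are hypothetical and chosen only for illustration. Real pipelines usually stage into files or a separate schema.

```python
import sqlite3


def run_etl(source_rows):
    """Extract -> stage -> transform -> load, then clear the staging table."""
    con = sqlite3.connect(":memory:")
    # Hypothetical staging table: raw, untyped values exactly as received.
    con.execute("CREATE TABLE stg_orders (id TEXT, amount TEXT)")
    # Hypothetical target table: cleaned and typed.
    con.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER)"
    )

    # EXTRACT: land the source rows in staging without changing them.
    con.executemany("INSERT INTO stg_orders VALUES (?, ?)", source_rows)

    # TRANSFORM + LOAD: cast and filter while moving staging -> target.
    con.execute(
        """
        INSERT INTO orders (id, amount_cents)
        SELECT CAST(id AS INTEGER),
               CAST(ROUND(CAST(amount AS REAL) * 100) AS INTEGER)
        FROM stg_orders
        WHERE TRIM(amount) <> ''
        """
    )

    # Non-persistent staging: empty it once the load has succeeded.
    con.execute("DELETE FROM stg_orders")
    con.commit()

    loaded = con.execute(
        "SELECT id, amount_cents FROM orders ORDER BY id"
    ).fetchall()
    staged_left = con.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
    con.close()
    return loaded, staged_left


loaded, staged_left = run_etl([("1", "9.99"), ("2", ""), ("3", "20")])
```

The row with an empty amount is landed in staging but filtered out during the transform, and the staging table is empty again after the job. Had the architecture needed comparison data for the next run, the final DELETE would be the step to drop.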

In git, staging would be the "get on stage and be ready" (think of the theatre stage behind the closed curtain) and committing would be (again) the "delivery" to the audience.
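The git case can be sketched as a toy model with three areas: the working files, the staging area (git calls it the index), and the list of commits. The ToyRepo class and its method names are hypothetical, and this is not how git stores data internally. It only shows that a commit records what was staged, not what is currently in the working files.

```python
class ToyRepo:
    """Toy model of git's three areas; an illustration, not git's internals."""

    def __init__(self):
        self.working = {}  # path -> content: files as currently edited
        self.index = {}    # path -> content: staging area (next commit's snapshot)
        self.commits = []  # list of (message, snapshot) tuples

    def write(self, path, content):
        # Edit a file in the working area only.
        self.working[path] = content

    def add(self, path):
        # Like `git add`: copy the current working content into the index.
        self.index[path] = self.working[path]

    def commit(self, message):
        # Like `git commit`: record the index as it is, ignoring unstaged edits.
        self.commits.append((message, dict(self.index)))


repo = ToyRepo()
repo.write("a.txt", "v1")
repo.add("a.txt")          # "get on stage and be ready"
repo.write("a.txt", "v2")  # edited again after staging, but not re-added
repo.commit("first")       # "delivery": only the staged "v1" is recorded
```

After this sequence the commit holds "v1" while the working file holds "v2", which is why the stage is best understood as an intermediate snapshot rather than a view of the current files.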

0
On

"Staging" generally is an intermediate place where you put something. I believe the derivation is from military phrases like "staging ground" and "staging area".

It doesn't have a precise technical meaning.

"Staged changes" are source code changes that have been added to git's index (the staging area) with `git add` but not yet committed.

"Staging data" is data that has been extracted from a source system and landed in a database table before being transformed and loaded into a target table.

A "staging environment" is an environment where a complete application is deployed for final testing ahead of a production deployment.