Thursday 27 July 2017

Using a value file in a parameter set in Information Server DataStage

Question

How do I create and use a value file within a parameter set in DataStage?

Answer

Using a value file in a parameter set allows you to set parameter values dynamically. For example:

  • Job A updates the value file. Job B uses a parameter set that points to that value file.
  • When moving jobs from development to test or production, you can update the value file to reflect the different values for each environment without having to recompile the job (see the sketch after this list).
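
For example, here is a sketch of what two environment-specific copies of a value file might look like on the Engine tier. The file name Param_test_data and the parameter names Database and Table match the example later in this post; the DEV_PROJECT/PROD_PROJECT project names and the DEVDB/PRODDB values are hypothetical:

$ more /opt/IBM/InformationServer/Server/Projects/DEV_PROJECT/ParameterSets/Param_test/Param_test_data
Database=DEVDB
Table=SYSTABLES
$ more /opt/IBM/InformationServer/Server/Projects/PROD_PROJECT/ParameterSets/Param_test/Param_test_data
Database=PRODDB
Table=SYSTABLES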

To create a new Parameter Set, select File > New and select "Create new Parameter Set".

This launches the Parameter Set dialog.

Fill out the appropriate information on the General tab and then proceed to the Parameters tab.

On this tab, enter the parameters you wish to include in the Parameter Set. Note that you can also add existing environment variables.

The last tab, Values, allows you to specify a Value File name. This is the name of the file that will automatically be created on the Engine tier. This tab also allows you to view/edit values located in the value file.

Click OK to save the Parameter set.
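
Once the parameter set has been added on a job's Parameters tab, the individual values are typically referenced in stage properties as #ParameterSetName.ParameterName#, for example #Param_test.Database# and #Param_test.Table# (names taken from the example below).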

Once the Parameter Set is created, you can view or edit the value file on the Engine tier. The value file can be found in the following location: ../Projects/<project name>/ParameterSets/<Parameter Set Name>. For example:
$ pwd
/opt/IBM/InformationServer/Server/Projects/PROJECT_NAME/ParameterSets/Param_test
$ ls
Param_test_data
$ more Param_test_data
Database=SYSIBM
Table=SYSTABLES
$

Any changes made to the value file are picked up by the Parameter Set automatically.
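
As an illustration of the "Job A updates the value file" pattern above, a before-job script could simply rewrite the file before Job B runs. A minimal sketch, assuming the project and parameter set from the example above (the new Table value is hypothetical):

#!/bin/sh
# Hypothetical sketch: overwrite the value file so the next run of the job
# that uses the Param_test parameter set picks up the new values.
VALFILE=/opt/IBM/InformationServer/Server/Projects/PROJECT_NAME/ParameterSets/Param_test/Param_test_data
cat > "$VALFILE" <<EOF
Database=SYSIBM
Table=SYSCOLUMNS
EOF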


A DataStage job does not use the new value that is put in the Parameter set.

Problem (Abstract)

A DataStage job does not use the new value that is put in the Parameter set.

Cause

If you make any changes to a parameter set object, those changes are reflected in job designs that use the object only up until the time the job is compiled. The parameter values that a job was compiled with are the ones available when the job is run (although if you change the design after compilation, the job will once again link to the current version of the parameter set).

Diagnosing the problem

Examine the log entry "Environment variable settings" for parameter sets. If the parameter set specifies the value "(As predefined)", the parameter set is using the value that was used during the last compile.

Resolving the problem

If the values in the parameter set may change, specify a Value File for the parameters or set the parameters in the parameter set (including encrypted parameters) to $PROJDEF.
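
If jobs are started from the command line, the value file to use can also be selected at run time. A minimal sketch using dsjob (PROJECT_NAME and job_B are placeholders, and the exact option syntax should be verified against the dsjob documentation for your release):

$ cd /opt/IBM/InformationServer/Server/DSEngine
$ . ./dsenv
$ bin/dsjob -run -param Param_test=Param_test_data PROJECT_NAME job_B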


How to set default values for Environment Variables without re-compiling DataStage jobs

Question

Is it possible to set/change the default values for an environment variable without re-compiling the DataStage job?

Answer

Yes, it is possible to set/change the default value for an environment variable without recompiling the job.

You can manage all of your environment variables from the DataStage Administrator client. To do this, follow these steps:

  1. Open the Administrator client, select the project you are working in, and click Properties.
  2. On the General tab, click Environment.
  3. Create a new environment variable under the "User Defined" section, or update the variable if it already exists. Set the value of the variable to whatever you want the DataStage job to inherit (a command-line alternative is sketched after this list).
  4. Once you do this, close the Administrator client so the variable is saved.
  5. Next, open the DataStage Designer client and navigate to the Job Properties.
  6. Add the environment variable name that you just created in the DataStage Administrator client.
  7. Set the value of the new variable to $PROJDEF. This inherits whatever default value you have set in the Administrator client, and it also allows you to update that default value in the Administrator client without having to recompile the job.
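
If you prefer scripting this instead of using the Administrator client, project-level defaults can typically also be managed with the dsadmin command. A minimal sketch, assuming a user-defined variable named MY_DB_SCHEMA in project PROJECT_NAME (verify the option names with dsadmin -help on your release):

$ cd /opt/IBM/InformationServer/Server/DSEngine
$ . ./dsenv
$ bin/dsadmin -envadd MY_DB_SCHEMA -type STRING -prompt "Database schema" -value SYSIBM PROJECT_NAME
$ bin/dsadmin -envset MY_DB_SCHEMA -value SYSCAT PROJECT_NAME
$ bin/dsadmin -listenv PROJECT_NAME | grep MY_DB_SCHEMA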

Wednesday 26 July 2017

How to fix missing and under-replicated blocks - HDFS

$ su - <$hdfs_user>

# List every file that has under-replicated blocks (fsck prints one line per affected file)
$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files

# Reset the replication factor on each affected file
$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ; hadoop fs -setrep 3 $hdfsfile; done

In the above command, 3 is the replication factor. If you are using a single DataNode, it must be 1.
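
To confirm the fix, you can re-run fsck and watch the summary counters, or use the -w flag on setrep to wait for re-replication of a specific path. A short sketch (the replication factor of 3 mirrors the command above; /path/to/hdfsfile is a placeholder):

$ hdfs fsck / | egrep 'Under-replicated|Missing|Corrupt blocks'
$ hadoop fs -setrep -w 3 /path/to/hdfsfile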