Welcome to SKA Science Data Challenge Scoring’s documentation!

This package is an open-source implementation of the code used to score and rank the submissions for the SKA Science Data Challenges (SDC).

sdc1

The original IDL code is available at: https://astronomers.skatelescope.org/ska-science-data-challenge-1/

To score a submission for SDC1, first instantiate a scorer. There are two ways to do this, depending on the format of the input data.

If your input catalogues are in text format, use the class method ska_sdc.sdc1.sdc1_scorer.Sdc1Scorer.from_txt(). For example:

from ska_sdc.sdc1.sdc1_scorer import Sdc1Scorer

sub_cat_path = "/path/to/submission/catalogue.txt"
truth_cat_path = "/path/to/truth/catalogue.txt"

# from_txt is a class method, so it is called on the Sdc1Scorer class itself
scorer = Sdc1Scorer.from_txt(sub_cat_path, truth_cat_path, freq=1400)

However, if your input catalogues are already dataframes, call the ska_sdc.sdc1.sdc1_scorer.Sdc1Scorer constructor directly:

from ska_sdc.sdc1.sdc1_scorer import Sdc1Scorer

# sub_df and truth_df are pandas DataFrames loaded beforehand
scorer = Sdc1Scorer(sub_df, truth_df, freq=1400)

where sub_df is the submission catalogue and truth_df is the truth catalogue, both as pandas dataframes.

When the class has been instantiated, the ska_sdc.sdc1.sdc1_scorer.Sdc1Scorer.run() method can be called to run the scoring pipeline:

result = scorer.run()

which returns an instance of the Score class ska_sdc.sdc1.models.sdc1_score.Sdc1Score containing all the details related to the run.

The Sdc1Scorer class

class ska_sdc.sdc1.sdc1_scorer.Sdc1Scorer(sub_df, truth_df, freq)

The SDC1 scorer class.

Parameters:
  • sub_df (pandas.DataFrame) – The submission catalogue DataFrame of detected sources and properties
  • truth_df (pandas.DataFrame) – The truth catalogue DataFrame
  • freq (int) – Image frequency band (560, 1400 or 9200 MHz)
classmethod from_txt(sub_path, truth_path, freq, sub_skiprows=1, truth_skiprows=0)

Create an SDC1 scorer class from two source catalogues in text format.

Parameters:
  • sub_path (str) – The path of the submission catalogue of detected sources and properties
  • truth_path (str) – The path of the truth catalogue
  • freq (int) – Image frequency band (560, 1400 or 9200 MHz)
  • sub_skiprows (int, optional) – Number of rows to skip in submission catalogue. Defaults to 1.
  • truth_skiprows (int, optional) – Number of rows to skip in truth catalogue. Defaults to 0.
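
The skiprows arguments depend on how many non-data rows precede the first data row in each file. As a rough aid for choosing them, the sketch below counts leading rows that are not purely numeric. It assumes a whitespace-delimited catalogue whose data rows contain only numbers; the helper name and the sample column names are hypothetical and not part of ska_sdc.

```python
import io


def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def count_leading_non_data_rows(lines):
    """Count the rows that come before the first all-numeric row.

    Assumption: data rows are whitespace-delimited and purely numeric.
    Blank lines before the first data row are counted as rows to skip.
    """
    count = 0
    for line in lines:
        fields = line.split()
        if fields and all(_is_number(field) for field in fields):
            break
        count += 1
    return count


# Made-up three-line catalogue with a single header row.
sample = io.StringIO("id ra dec flux\n1 0.5 -30.1 0.002\n2 0.6 -30.2 0.004\n")
skiprows = count_leading_non_data_rows(sample)
```

The result could then be passed as sub_skiprows or truth_skiprows, after checking it against the file by eye.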
run(mode=0, train=False, detail=False)

Run the scoring pipeline.

Parameters:
  • mode (int, optional) – 0 or 1 to use core or centroid positions for scoring
  • train (bool, optional) – If True, will only evaluate score based on training area, else will exclude training area
  • detail (bool, optional) – If True, will return the catalogue of matches and per source scores.
Returns:The calculated SDC1 score object
Return type:ska_sdc.sdc1.models.sdc1_score.Sdc1Score

score

Get the resulting Sdc1Score object.

Returns:The calculated SDC1 score object
Return type:ska_sdc.sdc1.models.sdc1_score.Sdc1Score
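
Because run() accepts a mode argument (0 for core positions, 1 for centroid positions), it can be useful to score the same catalogues both ways. The following is a minimal sketch of that call pattern. The helper name is hypothetical, and a stand-in scorer is used so the snippet runs without ska_sdc installed; in practice an Sdc1Scorer instance would be passed instead.

```python
def run_both_position_modes(scorer, train=False):
    """Run the scorer once per position mode; return results keyed by mode name."""
    mode_names = {0: "core", 1: "centroid"}
    return {
        name: scorer.run(mode=mode, train=train, detail=False)
        for mode, name in mode_names.items()
    }


class _FakeScorer:
    """Stand-in exposing the documented run() signature (not part of ska_sdc)."""

    def run(self, mode=0, train=False, detail=False):
        # A real scorer returns an Sdc1Score; a dict is enough to show the flow.
        return {"mode": mode, "train": train, "detail": detail}


results = run_both_position_modes(_FakeScorer())
```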

The Sdc1Score class

class ska_sdc.sdc1.models.sdc1_score.Sdc1Score(mode=0, train=False, detail=False)

Simple data container class for collating data relating to an SDC1 score.

This is created by the SDC1 Scorer’s run method.

acc_pc

The average score per match (%).

Returns:float64
detail

If True, has returned the catalogue of matches and per source scores.

Returns:bool
match_df

Dataframe of matched sources.

Returns:pandas.DataFrame
mode

The position used for scoring (0==core, 1==centroid)

Returns:int
n_bad

Number of candidate matches rejected during data cleansing.

Returns:int
n_det

The total number of detected sources in the submission.

Returns:int
n_false

Number of false detections.

Returns:int
n_match

Number of candidate matches below threshold.

Returns:int
score_det

The sum of the scores.

Returns:float64
scores_df

Dataframe containing the scores.

Returns:pandas.DataFrame
train

If True, has evaluated score based on training area, else excludes training area.

Returns:bool
value

The score for the last run.

Returns:float64
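
The scalar attributes listed above can be gathered into a dictionary for logging or comparison between runs. This is a minimal sketch: the helper names are hypothetical, and the stand-in object carries arbitrary placeholder numbers so the snippet runs without ska_sdc. With the real package, the object returned by scorer.run() would be passed in.

```python
from types import SimpleNamespace

# Attribute names taken from the Sdc1Score reference above.
SUMMARY_FIELDS = (
    "value", "n_det", "n_match", "n_bad", "n_false", "score_det", "acc_pc",
)


def summarise_score(score):
    """Collect the scalar attributes of a score object into a plain dict."""
    return {name: getattr(score, name) for name in SUMMARY_FIELDS}


def format_summary(summary):
    """Render the summary as aligned 'name : value' lines."""
    width = max(len(name) for name in summary)
    return "\n".join(
        f"{name:<{width}} : {value}" for name, value in summary.items()
    )


# Placeholder values for illustration only; they are not real results.
example_score = SimpleNamespace(
    value=4.0, n_det=10, n_match=8, n_bad=1, n_false=2,
    score_det=6.0, acc_pc=75.0,
)

summary = summarise_score(example_score)
print(format_summary(summary))
```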

sdc2

This is a skeleton framework for SDC2.

To score a submission for SDC2, first instantiate a scorer. There are two ways to do this, depending on the format of the input data.

If your input catalogues are in text format, use the class method ska_sdc.sdc2.sdc2_scorer.Sdc2Scorer.from_txt(). For example:

from ska_sdc.sdc2.sdc2_scorer import Sdc2Scorer

sub_cat_path = "/path/to/submission/catalogue.txt"
truth_cat_path = "/path/to/truth/catalogue.txt"

# from_txt is a class method, so it is called on the Sdc2Scorer class itself
scorer = Sdc2Scorer.from_txt(sub_cat_path, truth_cat_path)

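However, if your input catalogues are already dataframes, call the ska_sdc.sdc2.sdc2_scorer.Sdc2Scorer constructor directly:

from ska_sdc.sdc2.sdc2_scorer import Sdc2Scorer

# cat_sub and cat_truth are pandas DataFrames loaded beforehand
scorer = Sdc2Scorer(cat_sub, cat_truth)

where cat_sub is the submission catalogue and cat_truth is the truth catalogue, both as pandas dataframes.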

When the class has been instantiated, the ska_sdc.sdc2.sdc2_scorer.Sdc2Scorer.run() method can be called to run the scoring pipeline:

result = scorer.run()

which returns an instance of the Score class ska_sdc.sdc2.models.sdc2_score.Sdc2Score containing all the details related to the run.

The Sdc2Scorer class

class ska_sdc.sdc2.sdc2_scorer.Sdc2Scorer(cat_sub, cat_truth)

The SDC2 scorer class.

Parameters:
  • cat_sub (pandas.DataFrame) – The submission catalogue.
  • cat_truth (pandas.DataFrame) – The truth catalogue.
classmethod from_txt(sub_path, truth_path, sub_skiprows=0, truth_skiprows=0)

Create an SDC2 scorer class from two source catalogues in text format.

The catalogues must have a header row of column names that matches the expected column names in the config file.

Parameters:
  • sub_path (str) – Path to the submission catalogue.
  • truth_path (str) – Path to the truth catalogue.
  • sub_skiprows (int, optional) – Number of rows to skip in submission catalogue. Defaults to 0.
  • truth_skiprows (int, optional) – Number of rows to skip in truth catalogue. Defaults to 0.
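
Since the header row has to match the expected column names, a quick check before scoring can catch naming mistakes early. The sketch below compares a header line against a list of expected names. It assumes a whitespace-delimited header; the helper name and the column names used here are hypothetical, and the real expected names come from the package's config file.

```python
def check_header(header_line, expected_columns):
    """Return (missing, unexpected) column names for a header row.

    Assumption: the header is whitespace-delimited.
    """
    found = header_line.split()
    missing = [name for name in expected_columns if name not in found]
    unexpected = [name for name in found if name not in expected_columns]
    return missing, unexpected


# Hypothetical column names, for illustration only.
missing, unexpected = check_header(
    "id ra dec extra", ["id", "ra", "dec", "flux"]
)
```

Both lists being empty indicates that the header contains exactly the expected names, although this sketch does not check their order.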
run(train=False, detail=False)

Run the scoring pipeline.

Returns:The calculated SDC2 score object
Return type:ska_sdc.sdc2.models.sdc2_score.Sdc2Score
score

Get the resulting Sdc2Score object.

Returns:The calculated SDC2 score object
Return type:ska_sdc.sdc2.models.sdc2_score.Sdc2Score

The Sdc2Score class

class ska_sdc.sdc2.models.sdc2_score.Sdc2Score(train=False, detail=False)

Simple data container class for collating data relating to an SDC2 score.

This is created by the SDC2 Scorer’s run method.

acc_pc

The average score per match (%).

Returns:float64
detail

If True, has returned the catalogue of matches and per source scores.

Returns:bool
match_df

Dataframe of matched sources.

Returns:pandas.DataFrame
n_bad

Number of candidate matches rejected during data cleansing.

Returns:int
n_det

The total number of detected sources in the submission.

Returns:int
n_false

Number of false detections.

Returns:int
n_match

Number of candidate matches below threshold.

Returns:int
score_det

The sum of the scores.

Returns:float64
scores_df

Dataframe containing the scores.

Returns:pandas.DataFrame
train

If True, has evaluated score based on training area, else excludes training area.

Returns:bool
value

The score for the last run.

Returns:float64

Scoring pipeline

The SDC scoring pipeline proceeds sequentially via four steps:

Crossmatch preprocessing

class ska_sdc.sdc2.utils.xmatch_preprocessing.XMatchPreprocessing(step_names=[])

Prepare catalogues for crossmatching.

__init__(step_names=[])
Parameters:step_names (list) – Name of the steps to be imported from ska_sdc.sdc2.utils.xmatch_preprocessing_steps
preprocess(*args, **kwargs)

A wrapper function that calls each of the crossmatch preprocessing steps named in step_names in sequence.

Returns:Preprocessed catalogue.
Return type:pandas.DataFrame

Crossmatch preprocessing steps

class ska_sdc.sdc2.utils.xmatch_preprocessing_steps.XMatchPreprocessingStepStub(*args, **kwargs)

Stub class for a preprocessing step.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

execute()

Execute the step.

Returns:Processed catalogue.
Return type:pandas.DataFrame
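
The pattern described here, a list of step names resolved to classes that each expose execute(), can be sketched as follows. This is an illustration of the general idea under stated assumptions, not the package's implementation: all class names are hypothetical, steps are looked up in a local registry rather than imported from a module, and a list of dicts stands in for a pandas DataFrame so the snippet needs only the standard library.

```python
class Step:
    """Hypothetical base step: holds a catalogue and transforms it."""

    def __init__(self, cat):
        self.cat = cat

    def execute(self):
        raise NotImplementedError


class DropIncompleteRows(Step):
    """Remove rows in which any field is None."""

    def execute(self):
        return [row for row in self.cat if None not in row.values()]


class SortById(Step):
    """Order rows by their 'id' field."""

    def execute(self):
        return sorted(self.cat, key=lambda row: row["id"])


# Name-to-class lookup, standing in for importing steps from a module by name.
STEP_REGISTRY = {cls.__name__: cls for cls in (DropIncompleteRows, SortById)}


class StepRunner:
    """Run the named steps in order, feeding each output into the next step."""

    def __init__(self, step_names=None):
        # None instead of [] avoids sharing one mutable default between instances.
        self.step_names = list(step_names or [])

    def run(self, cat):
        for name in self.step_names:
            cat = STEP_REGISTRY[name](cat).execute()
        return cat


catalogue = [
    {"id": 2, "flux": 1.0},
    {"id": 1, "flux": None},
    {"id": 0, "flux": 3.0},
]
processed = StepRunner(["DropIncompleteRows", "SortById"]).run(catalogue)
```

Keeping each step behind a common execute() method means steps can be added or reordered by editing the list of names alone.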

Catalogue crossmatching

Crossmatch postprocessing

class ska_sdc.sdc2.utils.xmatch_postprocessing.XMatchPostprocessing(step_names=[])

Postprocess crossmatched catalogue.

__init__(step_names=[])
Parameters:step_names (list) – Name of the steps to be imported from ska_sdc.sdc2.utils.xmatch_postprocessing_steps
postprocess(*args, **kwargs)

A wrapper function that calls each of the crossmatch postprocessing steps named in step_names in sequence.

Returns:Postprocessed catalogue.
Return type:pandas.DataFrame

Crossmatch postprocessing steps

class ska_sdc.sdc2.utils.xmatch_postprocessing_steps.XMatchPostprocessingStepStub(*args, **kwargs)

Stub class for a postprocessing step.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

execute()

Execute the step.

Returns:Processed catalogue.
Return type:pandas.DataFrame

Score computation

ska_sdc.sdc2.utils.create_score.create_sdc_score(config, sieved_sub_df, n_det, train, detail)

Complete the scoring pipeline using the data generated by the previous steps. This requires the prepared truth and submission catalogues, and the candidate match catalogues created from the crossmatch step.

Parameters:
  • sieved_sub_df (pandas.DataFrame) – The processed and sieved candidate match catalogue between submission and truth.
  • n_det (int) – Total number of detected sources.
  • train (bool) – Whether the score is determined based on training area only
  • detail (bool) – If True, will include the detailed score and match data with the returned Sdc2Score object.
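
Going by the attribute descriptions of the score classes (n_match as the number of matches, score_det as the sum of the scores, acc_pc as the average score per match in percent), the aggregate quantities relate to the per-match scores roughly as sketched below. The function name is hypothetical, the zero result for an empty match list is an assumption, and this is not the package's actual implementation. In particular, it does not attempt to reproduce how the final value is derived.

```python
def aggregate_scores(per_match_scores):
    """Derive summary quantities from a list of per-match scores."""
    n_match = len(per_match_scores)
    score_det = sum(per_match_scores)
    # Average score per match as a percentage; 0.0 for no matches is an assumption.
    acc_pc = 100.0 * score_det / n_match if n_match else 0.0
    return {"n_match": n_match, "score_det": score_det, "acc_pc": acc_pc}


# Made-up per-match scores, for illustration only.
aggregates = aggregate_scores([1.0, 0.5, 0.0])
```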
